Augustana University College

COMPUTING SCIENCE 370
Programming Languages


The Chomsky Hierarchy of Languages



Chomsky's Hierarchy

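Noam Chomsky defined four classes of grammars, which define four classes of languages: type 0 (unrestricted grammars, generating the recursively enumerable languages), type 1 (context-sensitive), type 2 (context-free) and type 3 (regular). These are arranged in a hierarchy in which each class includes the one below it. For example, every regular language is also context-free. The hierarchy is strict: each class contains languages that do not belong to the more restrictive class below it.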

The class of languages describable by BNF (or EBNF), including recursively defined nonterminals, is exactly Chomsky's class of context-free (type 2) languages.

By comparison, a regular language can be defined in EBNF without any recursive rules. The pattern notations used by the Unix shell and by utilities such as 'grep', 'sed' and 'awk' describe regular languages (apart from extensions such as back-references). For example, the set of all strings of alternating 1's and 0's that start and end with a 1 is a regular language. Both of the following regular expressions describe this set of strings:

1(01)*
(10)*1
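As a quick sanity check, the sketch below compares the two patterns, written in Python's re syntax as 1(01)* and (10)*1, against a direct definition of the set ("non-empty, starts and ends with 1, no two adjacent characters equal") for every binary string up to a chosen length. The names PATTERN_A, PATTERN_B, alternating_1_to_1 and agree_up_to are illustrative only. An exhaustive check over short strings is evidence, not a proof; the proof is that such a string of length 2k+1 is both 1(01)^k and (10)^k 1.

```python
import re
from itertools import product

# The two regular expressions from the text, in Python's re syntax.
PATTERN_A = re.compile(r"1(01)*")
PATTERN_B = re.compile(r"(10)*1")

def alternating_1_to_1(s: str) -> bool:
    """Direct definition: non-empty, starts and ends with '1',
    and no two adjacent characters are equal."""
    return (
        len(s) > 0
        and s[0] == "1"
        and s[-1] == "1"
        and all(a != b for a, b in zip(s, s[1:]))
    )

def agree_up_to(max_len: int) -> bool:
    """Return True if, for every binary string of length <= max_len,
    both patterns and the direct definition give the same answer."""
    for n in range(max_len + 1):
        for chars in product("01", repeat=n):
            s = "".join(chars)
            a = PATTERN_A.fullmatch(s) is not None
            b = PATTERN_B.fullmatch(s) is not None
            if not (a == b == alternating_1_to_1(s)):
                return False
    return True

if __name__ == "__main__":
    print(agree_up_to(12))
```

Using fullmatch rather than search matters here: the language is the set of whole strings matched, not strings that merely contain a match.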

Since recursion in a grammar allows the definition of nested syntactic structures, a language that permits arbitrarily deep nesting (as most programming languages do) is not a regular language; describing it requires at least a context-free grammar. For example, the set of strings consisting of balanced parentheses [like a LISP program with the alphanumerics removed] is a context-free language:

<expression> ::= <balanced>*
<balanced> ::= (<expression>)

Some examples of legal strings in this language are:

()
()()((()))
(()(()))
                    [That's the empty string.]
((()))()((()()))
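The grammar above translates almost line for line into a recursive-descent recognizer. In the sketch below (the function name is_balanced is illustrative), each nonterminal becomes a function, and the recursion from <balanced> back into <expression> uses the call stack to remember how deeply nested the input currently is. A finite automaton has only finitely many states, so it cannot keep track of unbounded nesting depth; that is the intuition behind the claim that this language is not regular. This is a sketch only: Python's default recursion limit means very deeply nested input would raise RecursionError rather than return an answer.

```python
def is_balanced(s: str) -> bool:
    """Recognize <expression> ::= <balanced>*  and
    <balanced> ::= ( <expression> )  by recursive descent.
    Returns True iff the whole string is an <expression>."""
    pos = 0  # index of the next unread character

    def expression() -> None:
        # <expression> ::= <balanced>*  : zero or more <balanced> in a row
        while pos < len(s) and s[pos] == "(":
            balanced()

    def balanced() -> None:
        # <balanced> ::= ( <expression> )
        nonlocal pos
        pos += 1                        # consume '(' (the caller checked it)
        expression()
        if pos < len(s) and s[pos] == ")":
            pos += 1                    # consume the matching ')'
        else:
            raise ValueError("missing ')'")

    try:
        expression()
    except ValueError:
        return False
    return pos == len(s)                # reject leftovers such as a stray ')'

for example in ["()", "()()((()))", "(()(()))", "", "((()))()((()()))"]:
    print(repr(example), is_balanced(example))
```

All five example strings from the text, including the empty string, are accepted, while inputs such as "(()" or ")(" are rejected.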

Some programming languages are not strictly context-free languages. For example, a procedure in Modula-2 may be defined in EBNF as:

<procedure> ::= PROCEDURE <identifier> [ <formal parameters> ] ; [ <declarations> ] BEGIN <statement sequence> END <identifier>

However, the language imposes a context-sensitive restriction that this grammar cannot express: the <identifier> after 'PROCEDURE' must be the same as the <identifier> after 'END'.
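Because the grammar alone accepts a procedure whose two identifiers differ, the matching condition has to be checked separately, for example by a later semantic pass over the parsed text. The sketch below shows only that extra check, under loudly simplified assumptions: the hypothetical function procedure_names_match receives a single procedure declaration with no comments, string literals or nested procedures, identifiers are a letter followed by letters or digits, and a trailing ';' after the closing identifier is tolerated. It is not how any particular Modula-2 compiler does it.

```python
import re

# Simplified identifier rule (an assumption): a letter, then letters or digits.
IDENT = re.compile(r"[A-Za-z][A-Za-z0-9]*")
# Tokens: identifiers/keywords, or any other single non-blank character.
TOKEN = re.compile(IDENT.pattern + r"|\S")

def procedure_names_match(text: str) -> bool:
    """Hypothetical check for one declaration of the form
    PROCEDURE <identifier> ... END <identifier>
    Returns True only if the two identifiers are identical."""
    tokens = TOKEN.findall(text)
    if tokens and tokens[-1] == ";":
        tokens.pop()                   # tolerate a trailing semicolon
    if len(tokens) < 4 or tokens[0] != "PROCEDURE" or tokens[-2] != "END":
        return False                   # not shaped like the rule above
    # The context-sensitive condition: same identifier at both ends.
    return IDENT.fullmatch(tokens[1]) is not None and tokens[1] == tokens[-1]

good = """PROCEDURE Swap(VAR a, b: INTEGER);
VAR t: INTEGER;
BEGIN
  t := a; a := b; b := t
END Swap"""

print(procedure_names_match(good))
print(procedure_names_match(good.replace("END Swap", "END Swp")))
```

The first call reports a match and the second does not, even though both texts fit the EBNF rule equally well.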

Copyright © 2000 Jonathan Mohr