Formal Languages - Grammar


Formal Languages
Grammar

Ryan Stansifer

Department of Computer Sciences
Florida Institute of Technology
Melbourne, Florida USA 32901

http://www.cs.fit.edu/~ryan/

17 October 2019

Ryan Stansifer (CS, Florida Tech) Formal Languages 17 October 2019 1 / 141

A formal language is a set of strings over an alphabet.


Summary of Four Paradigms of Formal Languages

• Ad-hoc mathematical descriptions {w ∈ Σ∗ | . . . }.
1 An automaton M accepts or recognizes a formal language L(M).
2 An expression x denotes a formal language L⟦x⟧.
3 A grammar G generates a formal language L(G).
4 A Post system P derives a formal language L(P).


A language may be described mathematically using set notation as L = {w ∈ Σ∗ | P(w)}. This means w ∈ L if, and only if, P(w) is true.

A language may be accepted by an automaton M.

L(M) = {w ∈ Σ∗ | d0 ⊢∗ df where df is a final state}

A language may be denoted by an expression x. We write L = L⟦x⟧.

A language may be generated by a grammar G.

L(G) = {w ∈ Σ∗ | S ⇒∗ w }


• Automata compute
• Expressions denote
• Trees demonstrate
• Grammars construct


Grammars


Grammar

A grammar is another formalism for defining and generating a formal language (a set of strings over an alphabet).

In the context of grammars it is traditional to call the alphabet a set of terminal symbols. Ordered sequences of terminal symbols are sometimes called sentences in this context, instead of strings.

Context-free grammars play an important role in defining programming languages and in the construction of compilers for programming languages.


Definition of Grammar

A grammar is a 4-tuple 〈T, V, P, S〉:
• T is the finite set of terminal symbols;
• V is the finite set of nonterminal symbols, also called variables or syntactic categories, T ∩ V = Ø;
• S ∈ V is the start symbol;
• P is the finite set of productions.

A production has the form α → β where α and β are ordered sequences (strings) of terminal and nonterminal symbols. (We write (T ∪ V)∗ for the set of ordered sequences of terminal and nonterminal symbols.) The LHS α of a production cannot be the empty sequence, but β might be.


Example

We are primarily interested in context-free grammars (which we formally introduce later). CFGs are simpler in that the LHS α is a single nonterminal.

The following five productions are for a grammar with T = {0, 1}, V = {S}, and start symbol S.

1 S → ε
2 S → 0
3 S → 1
4 S → 0 S 0
5 S → 1 S 1
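The 4-tuple definition can be written down directly as data. The following is only a sketch under stated assumptions: symbols are single characters, and the names Grammar, binaryPalindromes, and wellFormed are hypothetical, not from the slides. It records the five productions above and checks the side conditions of the definition (T ∩ V = Ø, S ∈ V, nonempty LHS).

```haskell
import Data.List (intersect)

-- A production α → β; symbols are single characters (an assumption of this sketch).
type Production = (String, String)

-- The 4-tuple 〈T, V, P, S〉 as a record (hypothetical field names).
data Grammar = Grammar
  { terminals    :: [Char]        -- T
  , nonterminals :: [Char]        -- V
  , productions  :: [Production]  -- P
  , start        :: Char          -- S
  }

-- The example grammar with T = {0, 1}, V = {S}, and its five productions.
binaryPalindromes :: Grammar
binaryPalindromes = Grammar
  { terminals    = "01"
  , nonterminals = "S"
  , productions  = [("S", ""), ("S", "0"), ("S", "1"), ("S", "0S0"), ("S", "1S1")]
  , start        = 'S'
  }

-- Side conditions of the definition: T and V are disjoint, the start symbol
-- is a nonterminal, no LHS is empty, and every symbol used is in T ∪ V.
wellFormed :: Grammar -> Bool
wellFormed g =
     null (terminals g `intersect` nonterminals g)
  && start g `elem` nonterminals g
  && all (not . null . fst) (productions g)
  && all (`elem` symbols) (concat [ l ++ r | (l, r) <- productions g ])
  where symbols = terminals g ++ nonterminals g
```

Here wellFormed binaryPalindromes evaluates to True; an ε-production is simply a pair whose second component is the empty string.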


Common Notational Conventions

To communicate more effectively and spare tedious repetition, it is convenient to establish some notational conventions.

1 Lower-case letters near the beginning of the alphabet, a, b, and so on, are terminal symbols. We shall also assume that non-letter symbols, like digits and +, are terminal symbols.
2 Upper-case letters near the beginning of the alphabet, A, B, and so on, are nonterminal symbols. Often the start symbol of a grammar is assumed to be named S, or sometimes it is the nonterminal on the left-hand side of the first production.
3 Lower-case letters near the end of the alphabet, such as w or z, are strings of terminals.
4 Upper-case letters near the end of the alphabet, such as X or Y, are a single symbol, either a terminal or a nonterminal.
5 Lower-case Greek letters, such as α and β, are sequences (possibly empty) of terminal and nonterminal symbols.


Common Notational Conventions (Recap)

To communicate more effectively and spare tedious repetition, it is convenient to establish some notational conventions.

1 a, b, c, . . . are terminals.
2 A, B, C, . . . are nonterminals.
3 . . ., X, Y, Z are terminal or nonterminal symbols.
4 . . ., w, x, y, z are strings/sequences of terminals only.
5 α, β, γ, . . . are strings/sequences of terminals or nonterminals.


Compact Notation

If the RHS of a production is a sequence with no symbols in it, we use ε to communicate that fact clearly. So, for example, A → ε is a production.

Definition. A production whose RHS is a sequence of length zero is called an ε-production.

The productions with the same LHS, A → α1, A → α2, . . . , A → αn, can be written using the notation A → α1 | α2 | · · · | αn.


Compact Notation

It is convenient to think of a production [in a CFG] as “belonging” to the variable [nonterminal] of its head [LHS]. We shall often use remarks like “the productions for A” or “A-productions” to refer to the productions whose head [LHS] is [the] variable A. We may write the productions for a grammar by listing each variable [nonterminal] once, and then listing all the bodies of the productions for that variable [nonterminal], separated by vertical bars.

HMU 3rd, §5.1, page 175.


Derivation

[cf. Kozen, page 133]

A grammar G = 〈T, V, P, S〉 gives rise naturally to a method of constructing strings of terminal and nonterminal symbols by application of the productions.

Definition. If α, β ∈ (T ∪ V)∗, we say that α derives β in one step, and we write

α 1⇒G β

if β can be obtained from α by replacing some occurrence of the substring δ in α with γ, where δ → γ is a production of G. In other words, α = α1δα2 and β = α1γα2 for some α1, α2 ∈ (T ∪ V)∗.
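The one-step relation can be computed by trying every split α = α1δα2 and every production δ → γ. This is a minimal sketch, assuming single-character symbols; the names step and palindromeRules are hypothetical and the rules are those of the earlier five-production example.

```haskell
import Data.List (inits, tails, stripPrefix)
import Data.Maybe (maybeToList)

-- A production δ → γ over single-character symbols (an assumption of this sketch).
type Production = (String, String)

-- All β such that α derives β in one step: split α as α1 ++ rest, and
-- wherever rest begins with δ, replace that occurrence of δ by γ.
step :: [Production] -> String -> [String]
step ps alpha =
  [ alpha1 ++ gamma ++ alpha2
  | (alpha1, rest) <- zip (inits alpha) (tails alpha)
  , (delta, gamma) <- ps
  , alpha2 <- maybeToList (stripPrefix delta rest)
  ]

-- The five productions of the earlier example (T = {0, 1}, V = {S}).
palindromeRules :: [Production]
palindromeRules = [("S", ""), ("S", "0"), ("S", "1"), ("S", "0S0"), ("S", "1S1")]

-- step palindromeRules "0S0" == ["00", "000", "010", "00S00", "01S10"]
```

Because the LHS δ is an arbitrary nonempty string here, the same function covers unrestricted grammars, not only context-free ones.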


We usually omit the grammar G, writing

α 1⇒ β

and leave it to the reader to figure out which grammar is meant.

We often omit the 1, writing

α ⇒ β

except to emphasize the distinction with other derivability relations.


Derivation

Perhaps we can recast the previous definition slightly more perspicuously as follows:

Definition. If δ → γ is a production of G, then α1δα2 derives α1γα2 in one step, and we write

α1δα2 1⇒G α1γα2

for all α1, α2 ∈ (T ∪ V)∗.


Derivation

Definition. Let ∗⇒G be the reflexive, transitive closure of the 1⇒G relation. That is:

α ∗⇒ α for any α

α ∗⇒ β if α ∗⇒ γ and γ 1⇒ β

This relation is sometimes called the “derives in zero or more steps” relation.
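The two clauses of the closure translate into a bounded search: level 0 is α itself (reflexivity), and each further level extends the previous one by a single step. The sketch below is illustrative only and assumes single-character symbols; formsWithin and derivesWithin are hypothetical names, and the bound n is needed because the full relation is infinite in general.

```haskell
import Data.List (inits, tails, stripPrefix, nub)
import Data.Maybe (maybeToList)

type Production = (String, String)

-- One-step derivation: replace one occurrence of δ by γ for a production δ → γ.
step :: [Production] -> String -> [String]
step ps alpha =
  [ alpha1 ++ gamma ++ alpha2
  | (alpha1, rest) <- zip (inits alpha) (tails alpha)
  , (delta, gamma) <- ps
  , alpha2 <- maybeToList (stripPrefix delta rest)
  ]

-- All forms derivable from α in at most n steps.
-- Level 0 is [α]; level k+1 applies one more step to every form of level k.
formsWithin :: Int -> [Production] -> String -> [String]
formsWithin n ps alpha =
  nub (concat (take (n + 1) (iterate (nub . concatMap (step ps)) [alpha])))

-- A bounded version of "α derives β in zero or more steps".
derivesWithin :: Int -> [Production] -> String -> String -> Bool
derivesWithin n ps alpha beta = beta `elem` formsWithin n ps alpha

palindromeRules :: [Production]
palindromeRules = [("S", ""), ("S", "0"), ("S", "1"), ("S", "0S0"), ("S", "1S1")]
```

For instance, derivesWithin 3 palindromeRules "S" "0110" holds (S ⇒ 0S0 ⇒ 01S10 ⇒ 0110), while the same query with bound 2 does not.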


Derivation – Inductively Defined Relation

    δ → γ ∈ P
------------------
 α1δα2 1⇒ α1γα2

 α ∈ (T ∪ V)∗
--------------
    α ∗⇒ α

 α ∗⇒ γ    γ 1⇒ β
-------------------
      α ∗⇒ β


Derivation

A string in (T ∪ V)∗ that is derivable from the start symbol S is called a sentential form. A sentential form is called a sentence if it consists only of terminal symbols.

Definition. The language generated by G, denoted L(G), is the set of all sentences:

L(G) = { x ∈ T∗ | S ∗⇒G x }


Outline

1 Different Kinds of Grammars

2 Context-Free Grammars
    Leftmost, Rightmost Derivations
    Ambiguity

3 Chomsky and Greibach Normal Forms

4 Cleaning Up CFG’s

5 Brute Force Membership Test

6 CYK Algorithm

Grammars (not languages)

[Figure: nested classes of grammars, each with an example grammar]
• context-free: S → SS | aSb | bSa | ε
    • linear: S → aSb | ε
        • regular
            • right-linear: S → aS | ε
            • left-linear: S → Sa | ε

Different Kinds of Grammars

Unrestricted grammar
• Context-sensitive grammar
• Context-free grammar
    • Grammar in Greibach normal form
        • simple-grammar, s-grammar
    • Grammar in Chomsky normal form
    • Linear grammar
        • Regular grammar
            • right-linear grammar
            • left-linear grammar
    • Compiler theory: LL(k), LR(k), etc.


Restrictions on Grammars

Grammar (Linz 6th, §1.2, definition 1.1, page 21).
• Context-sensitive grammar (Linz 6th, §11.4, definition 11.4, page 300).
• Context-free grammar (Linz 6th, §5.1, definition 5.1, page 130).
    • Greibach normal form (Linz 6th, §6.2, definition 6.5, page 174).
        • simple-grammar, s-grammar (Linz 6th, §5.3, definition 5.4, page 144).
    • Chomsky normal form (Linz 6th, §6.2, definition 6.4, page 171).
    • Linear grammar (Linz 6th, §3.3, page 93).
        • Regular grammar (Linz 6th, §3.3, definition 3.3, page 92).
            • right-linear grammar.
            • left-linear grammar.
    • Compiler theory: LL(k) (Linz 6th, §7.4, definition 7.5, page 210); LR(k) [Not defined in Linz 6th.]


Restrictions on Grammars (HMU)

• Context-sensitive grammar [Not defined in HMU].
• Context-free grammar (HMU 3rd, §5.1.2, page 173).
    • Greibach normal form (HMU 3rd, §7.1, page 277).
        • simple-grammar, s-grammar [Not defined in HMU].
    • Chomsky normal form (HMU 3rd, §7.1.5, page 272).
    • Linear grammar [Not defined in HMU].
        • Regular grammar [Not defined in HMU].
            • right-linear grammar [HMU 3rd, §5.1.7, exercise 5.1.4, page 182].
            • left-linear grammar.
    • Compiler theory: LL(k), LR(k), etc. [Not defined in HMU].


Restrictions on Grammars (Sudkamp)

• Context-sensitive grammar [Not defined in HMU].
• Context-free grammar (HMU 3rd, §5.1.2, page 173).
    • Greibach normal form (Sudkamp, §4.8, definition 4.8.1, page 131).
        • simple-grammar, s-grammar [Not defined in HMU].
    • Chomsky normal form (Sudkamp, §4.5, definition 4.5.1, page 122).
    • Linear grammar (Sudkamp, chapter 7, exercise 22, page 250).
        • Regular grammar (Sudkamp, page 81).
            • right-linear grammar [Sudkamp, chapter 3, exercise 40, page 102].
            • left-linear grammar.
    • Compiler theory: LL(k), LR(k), etc. [Not defined in HMU].


Restrictions on Grammars

• Context-sensitive grammar. Each production α → β is restricted to |α| ≤ |β|.
• Context-free grammar. Each production has the form A → β where A ∈ V.
    • Grammar in Greibach normal form. A → aγ where a ∈ T and γ ∈ V∗.
        • simple-grammar or s-grammar. Any pair 〈A, a〉 occurs at most once in the productions.
    • Grammar in Chomsky normal form. A → BC or A → a.
    • Linear grammar. The RHS β contains at most one nonterminal.
        • Regular grammar, either:
            • right-linear grammar. The nonterminal (if any) occurs to the right of (or after) any terminals.
            • left-linear grammar. The nonterminal (if any) occurs to the left of (or before) any terminals.

If ε ∈ L(G) we must allow a niggly exception S → ε.
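Several of these restrictions are purely syntactic and can be checked production by production. The sketch below is an illustration, not a definitive classifier: it assumes single-character symbols with upper-case letters as nonterminals (the slides' notational convention), and all function names are hypothetical. The four sample grammars are the ones shown in the earlier figure of grammar classes.

```haskell
import Data.Char (isUpper)

-- A production α → β; upper-case letters are nonterminals (an assumption).
type Production = (String, String)

isTerminal :: Char -> Bool
isTerminal = not . isUpper

-- Context-free: every LHS is a single nonterminal.
isContextFree :: [Production] -> Bool
isContextFree = all (\(lhs, _) -> length lhs == 1 && all isUpper lhs)

-- Linear: context-free, and each RHS contains at most one nonterminal.
isLinear :: [Production] -> Bool
isLinear ps = isContextFree ps && all ((<= 1) . length . filter isUpper . snd) ps

-- Right-linear: after the leading terminals at most one symbol remains,
-- so the nonterminal (if any) comes after all terminals.
isRightLinear :: [Production] -> Bool
isRightLinear ps =
  isContextFree ps && all ((<= 1) . length . dropWhile isTerminal . snd) ps

-- Left-linear: the mirror image of right-linear.
isLeftLinear :: [Production] -> Bool
isLeftLinear ps =
  isContextFree ps && all ((<= 1) . length . dropWhile isTerminal . reverse . snd) ps

cfExample, linearExample, rightExample, leftExample :: [Production]
cfExample     = [("S", "SS"), ("S", "aSb"), ("S", "bSa"), ("S", "")]
linearExample = [("S", "aSb"), ("S", "")]
rightExample  = [("S", "aS"), ("S", "")]
leftExample   = [("S", "Sa"), ("S", "")]
```

These predicates classify grammars, not languages: a language generated by a grammar that fails isRightLinear may still be regular.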


The s-languages are those languages recognized by a particular restricted form of deterministic pushdown automaton, called an s-machine. They are uniquely characterized by that subset of the standard-form grammars in which each rule has the form Z → aY1 . . . Yn, n ≥ 0, and for which the pairs (Z, a) are distinct among the rules. It is shown that the s-languages have the prefix property, and that they include the regular sets with end-markers. Finally, their closure properties and decision problems are examined, and it is found that their equivalence problem is solvable.

Korenjak, Hopcroft, Simple deterministic languages, 1968.


right-linear             A → xB or A → x     A, B ∈ V, x ∈ T∗
strongly right-linear    A → aB or A → ε     A, B ∈ V, a ∈ T
left-linear              A → Bx or A → x     A, B ∈ V, x ∈ T∗
strongly left-linear     A → Ba or A → ε     A, B ∈ V, a ∈ T


Context-Free Grammars (Overview)

• Definitions of a grammar and, specifically, a context-free grammar (Linz 6th, definition 5.1, page 130; HMU §5.1.2).
• Definition of a context-free language (Linz 6th, definition 5.1, page 130; HMU §5.1.5).
• Simplification of context-free grammars (Linz 6th, §6.1, page 156; HMU 3rd, §7.1, page 261).
• Normal Forms
    • Chomsky Normal Form (Linz 6th, §6.2, page 171; HMU 3rd, §7.1.5, page 272).
    • Greibach Normal Form (Linz 6th, §6.2, definition 6.6, page 174; HMU 3rd, §7.1, page 277).
• CYK algorithm (Linz 6th, §6.3, page 178; HMU 3rd, §7.4.4, page 303).


Context-Free Languages (Overview)

• Pushdown automata (Linz 6th, chapter 7, page 181; HMU 3rd, chapter 6, page 225).
• CFL pumping lemma (Linz 6th, §8.1, page 214; HMU 3rd, §7.2, page 279).
• Closure properties of CFL (Linz 6th, §8.2, page 219; HMU 3rd, §7.3, page 287).
• Decision properties of CFL (Linz 6th, §8.2, page 227; HMU 3rd, §7.4.2, page 301).


Context-Free Grammars

Definition. A grammar is context-free if all productions are of the form A → α where A ∈ V and (as always) α ∈ (T ∪ V)∗.

It is called context-free because there is no context surrounding the LHS nonterminal.

Linz 6th, §5.1, definition 5.1, page 130.
HMU 3rd, §5.1.2, page 173.
Sudkamp, §3.1, definition 3.1.3, page 70.


Context-Free Grammar

Definition. A subset L ⊆ T∗ is a context-free language (CFL) if L = L(G) for some context-free grammar G.


[Figure: nested classes of languages, from innermost to outermost: regular languages, context-free languages, context-sensitive languages, recursively enumerable languages]


Context-Free Grammars

Context-free grammars are so common and useful that when speaking of grammars one often assumes that context-free grammars are meant as opposed to unrestricted grammars.

The definitions of derivations and sentential forms, and so on, are, of course, the same for all grammars. However, they bear repeating in the special case of context-free grammars.


Derivation (CFG)

A context-free grammar G = 〈T, V, P, S〉 gives rise naturally to a method of constructing strings in T∗ by applying the productions.

If α, β ∈ (T ∪ V)∗, we say that α derives β in one step, and we write

α 1⇒G β

if β can be obtained from α by replacing some occurrence of a nonterminal A in the string α with γ, where A → γ is a production of G. In other words, α = α1Aα2 and β = α1γα2 for some α1, α2 ∈ (T ∪ V)∗.

We usually omit the grammar G, writing

α 1⇒ β

and leave it to the reader to figure out which grammar is meant.


Derivation (CFG)

Definition. Let ∗⇒G be the reflexive, transitive closure of the 1⇒G relation. That is:

α ∗⇒ α for any α

α ∗⇒ β if α ∗⇒ γ and γ 1⇒ β

This relation is sometimes called the “derives in zero or more steps” or simply the “derives” relation.


Linz 6th, Section 5.1, Example 5.1, page 131

Consider the context-free grammar G = 〈{a, b}, {S}, P, S〉 with productions

S → aSa
S → bSb
S → ε

Here are some derivations for G:

S 1⇒G aSa 1⇒G aaSaa 1⇒G aabSbaa 1⇒G aabbaa

S 1⇒G bSb 1⇒G baSab 1⇒G babSbab 1⇒G babaSabab 1⇒G babaabab

Since S ∗⇒ aabbaa and S ∗⇒ babaabab, we have aabbaa ∈ L(G) and babaabab ∈ L(G). It is clear that this is the language of even-length palindromes L(G) = {ww^R | w ∈ {a, b}∗}. Observe that this language was shown earlier not to be regular.


Notation

Derivations are inductively defined, so we can prove properties of derivations, and hence of the languages they construct, by induction.

Next we give an example inductive proof about the language constructed by a CFG. We will need variations on the following notation.

For all X ∈ T ∪ V we write either nX(α) or #X(α) for the number of occurrences of the symbol X in the sentential form α ∈ (T ∪ V)∗.


Example

Consider the following grammar G :

1 B → ε
2 B → ( R B
3 R → )
4 R → ( R R

L(G) = {w ∈ T ∗ | B ∗⇒ w}


Example Inductive Proof of CFG

(For the purposes of this example, it is slightly less awkward to write #L[α] for #([α].)

Definition. Let #L[α] be the number of left parentheses in α. And let #R[α] be the number of occurrences of the nonterminal R plus the number of right parentheses in α.

Definition. Let #[α] be #R[α] − #L[α]. In words, count the closing parentheses and the occurrences of the nonterminal R and subtract the number of opening parentheses.


α        #L[α]    #R[α]    #[α]
ε        0        0        0
(        1        0        −1
)        0        1        1
R        0        1        1
B        0        0        0
R(R      1        2        1
RRR      0        3        3
RRRB     0        3        3
)))B     0        3        3


We capture some (but not all) of the notion of what it means for an expression to use parentheses correctly.

Theorem. For all α such that B ∗⇒ α, #[α] = 0.

Corollary. For all w ∈ L(G), #[w] = 0.


Proof. Clearly each production preserves the count #, i.e., the count on the RHS of the production is the same as the count on the LHS.

1 #[B] = #[ε] = 0
2 #[B] = #[(RB] = 0
3 #[R] = #[)] = 1
4 #[R] = #[(RR] = 1

So, any step in the derivation preserves the count.

1 #[αBβ] = #[αβ]
2 #[αBβ] = #[α(RBβ]
3 #[αRβ] = #[α)β]
4 #[αRβ] = #[α(RRβ]

Given these observations, we now show that for all derivations B ∗⇒ γ it is the case that #[γ] = 0.


In each step of these derivations (read from top to bottom), the count

R + ) − (    (occurrences of R plus right parentheses minus left parentheses)

is preserved and is zero.

sentential form    R + ) − (    #
B                  (0+0)-0      0
(RB                (1+0)-1      0
(R(RB              (2+0)-2      0
(R()B              (1+1)-2      0
(R()               (1+1)-2      0
()()               (0+2)-2      0

sentential form    R + ) − (    #
B                  (0+0)-0      0
(RB                (1+0)-1      0
((RRB              (2+0)-2      0
(()RB              (1+1)-2      0
(())B              (0+2)-2      0
(())               (0+2)-2      0


The proof is by induction on the length of the derivation. The base case is a zero-step derivation from B. Obviously, for B ∗⇒ B, we have #[B] = 0. For the inductive step, the induction hypothesis says that for all sentential forms γ′, if B ∗⇒ γ′ then #[γ′] = 0. The last step of the derivation is

γ′ 1⇒ γ

Since each of the four productions preserves the count, #[γ] = #[γ′] = 0. QED
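The key observation of the proof, that every production preserves the count, is a finite check and can be replayed mechanically. The sketch below assumes single-character symbols; count, preserves, and parenRules are hypothetical names for #[·], the per-production check, and the four productions of G.

```haskell
-- A production of the parenthesis grammar G, over single-character symbols.
type Production = (String, String)

-- The four productions: B → ε, B → (RB, R → ), R → (RR.
parenRules :: [Production]
parenRules = [("B", ""), ("B", "(RB"), ("R", ")"), ("R", "(RR")]

-- #[α] = #R[α] − #L[α]: occurrences of R plus right parentheses,
-- minus left parentheses.
count :: String -> Int
count alpha = length (filter (`elem` ")R") alpha) - length (filter (== '(') alpha)

-- A production preserves the count if its LHS and RHS have the same count.
preserves :: Production -> Bool
preserves (lhs, rhs) = count lhs == count rhs

-- One of the derivations tabulated above, as a list of sentential forms.
sampleDerivation :: [String]
sampleDerivation = ["B", "(RB", "(R(RB", "(R()B", "(R()", "()()"]
```

Here all preserves parenRules is True, and every form in sampleDerivation has count 0. This checks instances only; the induction is still what establishes the theorem for all derivations.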


Linz 6th, Section 5.1, Example 5.2, page 132
L = {ab(bbaa)^n bba(ba)^n | 0 ≤ n} is a CFL

Linz 6th, Section 5.1, Example 5.3, page 132
L = {a^n b^m | n ≠ m} is a CFL

Linz 6th, Section 5.1, Example 5.4, page 133
L = {w | ∀u, v: uv = w implies #([u] ≥ #)[u], and #([w] = #)[w]} is a CFL


Linz 6th, Section 5.1, Example 5.3, page 132

Show the language L = {a^n b^m | n ≠ m} is context free. Observe

L = {a^n b^m | n < m} ∪ {a^n b^m | n > m}

So, let G be:
S → AS1 | S1B
S1 → aS1b | ε
A → aA | a
B → bB | b

Since L = L(G), L is a context-free language.


Concrete Syntax Trees

Every derivation gives rise to a figure called a concrete syntax tree. A node labeled with a nonterminal occurring on the left side of a production has children consisting of the symbols on the right side of that production.


Linz 6th, Section 5.1, Example 5.6, page 136


Derivation

In a grammar that is not linear, sentential forms with more than one nonterminal in them can be derived. In such cases, we have a choice of which nonterminal to replace next.

Definition. A derivation is said to be leftmost if in each step the leftmost nonterminal in the sentential form is replaced. We write

α ∗⇒lm β

Definition. A derivation is said to be rightmost if in each step the rightmost nonterminal in the sentential form is replaced. We write

α ∗⇒rm β


Derivation

A step in a derivation will be both leftmost and rightmost if the sentential form has only one nonterminal to begin with.
Hence, a derivation can be both leftmost and rightmost at the same time.
A derivation can be neither leftmost nor rightmost. This is the case when some steps are leftmost (but not rightmost) and some steps are rightmost (but not leftmost).
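Leftmost and rightmost steps differ only in which nonterminal is chosen, which a short sketch makes concrete. It assumes a context-free grammar over single-character symbols with upper-case letters as nonterminals; leftmostStep, rightmostStep, and linz55 are hypothetical names, and the sample productions are S → aAB, A → bBb, B → A | ε.

```haskell
import Data.Char (isUpper)

-- A context-free production A → γ; upper-case letters are nonterminals.
type Production = (Char, String)

-- Replace the leftmost nonterminal in every possible way.
leftmostStep :: [Production] -> String -> [String]
leftmostStep ps alpha =
  case break isUpper alpha of
    (w, a : rest) -> [ w ++ gamma ++ rest | (a', gamma) <- ps, a' == a ]
    _             -> []   -- no nonterminal left: α is a sentence

-- Replace the rightmost nonterminal: mirror the form and the right-hand
-- sides, take a leftmost step, and mirror the results back.
rightmostStep :: [Production] -> String -> [String]
rightmostStep ps alpha =
  map reverse (leftmostStep [ (a, reverse gamma) | (a, gamma) <- ps ] (reverse alpha))

linz55 :: [Production]
linz55 = [('S', "aAB"), ('A', "bBb"), ('B', "A"), ('B', "")]

-- leftmostStep  linz55 "aAB" == ["abBbB"]
-- rightmostStep linz55 "aAB" == ["aAA", "aA"]
```

On a form with a single nonterminal, such as abBb, the two functions return the same list, which is the first remark above.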


Derivation

Theorem. All derivations in linear grammars are both leftmost and rightmost.
Theorem. If there is a derivation at all, then there is a leftmost derivation and a rightmost derivation.
Theorem. If there is a rightmost derivation, then there is a leftmost derivation, and vice versa.
Theorem. Some derivations are neither leftmost nor rightmost.


Linz 6th, Section 5.1, Example 5.5, page 134

Consider the grammar with productions

1 S → aAB
2 A → bBb
3 B → A
4 B → ε

Here are some distinct derivations (of the same string abbbb):

S 1⇒ aAB 1⇒lm abBbB 1⇒lm abAbB 1⇒lm abbBbbB 1⇒lm abbbbB 1⇒ abbbb

S 1⇒ aAB 1⇒lm abBbB 1⇒rm abBb 1⇒ abAb 1⇒ abbBbb 1⇒ abbbb

S 1⇒ aAB 1⇒rm aA 1⇒ abBb 1⇒ abAb 1⇒ abbBbb 1⇒ abbbb


The derivations are not essentially different. This can be easily seen if we view a derivation as a tree. A tree can be built from a derivation by taking the LHS nonterminal of each production A → α used in the derivation as an internal node with a child for each of the |α| grammar symbols in the RHS. Terminal symbols are leaves in the tree (they have no children).

The derivations on the previous slide are instructions for building a tree using the productions 1, 2, 3, 2, 4, 4 with inconsequential differences in order.

Effectively, a leftmost derivation builds the tree depth-first with the internal nodes (nonterminals) ordered left to right, while a rightmost derivation builds the tree with the internal nodes (nonterminals) ordered right to left.


[Picture]


Three nonterminals and four productions lead to three mutually recursive, algebraic type declarations in Haskell with four constructors.

class Yield a where
  yield :: a -> String


data S = P1 A B       -- S -> a A B
data A = P2 B         -- A -> b B b
data B = P3 A | P4    -- B -> A | [epsilon]

instance Yield S where
  yield (P1 a b) = 'a' : yield a ++ yield b

instance Yield A where
  yield (P2 b) = 'b' : yield b ++ ['b']

instance Yield B where
  yield (P3 a) = yield a
  yield P4     = ""

d :: S
d = P1 (P2 (P3 (P2 P4))) P4

y :: String
y = yield d   -- == "abbbb"


data S = P1 S | P2 S S | P3   -- S -> a S b | S S | [epsilon]

instance Yield S where
  yield (P1 s)     = 'a' : yield s ++ ['b']
  yield (P2 s1 s2) = yield s1 ++ yield s2   -- concatenate the yields of both subtrees
  yield P3         = ""

d1, d2 :: S
d1 = P1 (P1 P3)
d2 = P2 P3 (P1 (P1 P3))

-- yield d1 == "aabb" and yield d2 == "aabb"


Linz 6th, section 5.2, definition 5.5, page 145.

Definition. A context-free grammar G is ambiguous if there is more than one leftmost (or rightmost) derivation of the same sentence.

Different leftmost (or rightmost) derivations arise from different choices of production for a single nonterminal in a sentential form, not just from the order in which different occurrences of nonterminals in a sentential form are expanded.


Linz 6th, Section 5.2, Example 5.10, page 146

Consider the grammar with productions S → aSb | SS | ε.
The sentence aabb has two distinct leftmost derivations.

S 1⇒ aSb 1⇒ aaSbb 1⇒ aabb

S 1⇒ SS 1⇒lm S 1⇒ aSb 1⇒ aaSbb 1⇒ aabb

The first derivation is a leftmost derivation (and a rightmost derivation); the second derivation is a leftmost derivation of the same string (aabb). The derivations are different: the first does not use the production S → SS while the second does.
Hence, the grammar is ambiguous.
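Ambiguity of this grammar can be witnessed by enumerating leftmost derivations of aabb up to a bound on the number of steps. The following sketch is an assumption-laden illustration: symbols are single characters, upper-case letters are nonterminals, and leftmostStep, leftmostDerivations, and ambiguousRules are hypothetical names. The bound is needed because S → SS and S → ε together allow arbitrarily long leftmost derivations of the same sentence.

```haskell
import Data.Char (isUpper)

-- A context-free production A → γ; upper-case letters are nonterminals.
type Production = (Char, String)

-- The grammar S → aSb | SS | ε.
ambiguousRules :: [Production]
ambiguousRules = [('S', "aSb"), ('S', "SS"), ('S', "")]

-- Replace the leftmost nonterminal in every possible way.
leftmostStep :: [Production] -> String -> [String]
leftmostStep ps alpha =
  case break isUpper alpha of
    (w, a : rest) -> [ w ++ gamma ++ rest | (a', gamma) <- ps, a' == a ]
    _             -> []

-- All leftmost derivations of the sentence w from α using at most n steps,
-- each given as its list of sentential forms.
leftmostDerivations :: Int -> [Production] -> String -> String -> [[String]]
leftmostDerivations n ps alpha w
  | alpha == w = [[alpha]]
  | n == 0     = []
  | otherwise  = [ alpha : d
                 | beta <- leftmostStep ps alpha
                 , d <- leftmostDerivations (n - 1) ps beta w ]
```

With a bound of 3 steps there is exactly one leftmost derivation of aabb from S, the first one shown above; raising the bound to 5 also yields the second one, S, SS, S, aSb, aaSbb, aabb, so more than one leftmost derivation exists.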


(Linz 6th, Section 5.2, Example 5.10, page 146)

Linz 6th, Section 5.2, Example 5.11, page 146–147

Linz 6th, Section 5.2, Example 5.12, page 147


Linz 6th, section 5.2, definition 5.5, page 146.

Definition. A context-free language L is inherently ambiguous if there is no unambiguous context-free grammar G for which L = L(G).


Linz 6th, Section 5.2, Example 5.13, page 149

L = {a^n b^n c^m} ∪ {a^n b^m c^m}

Notice that L = L1 ∪ L2 where L1 and L2 are generated by the following context-free grammar with S → S1 | S2 where

S1 → S1c | A
A → aAb | ε

S2 → aS2 | B
B → bBc | ε

This grammar is ambiguous as S1 ∗⇒ abc and S2 ∗⇒ abc. But this does not by itself mean the language is inherently ambiguous. A rigorous argument that L is inherently ambiguous is technical and can be found, for example, in Harrison 1978.


Exhaustive Search

Linz 6th, Section 5.2, page 141.

Algorithm. Given a grammar G without ε-productions and unit productions and a string w, systematically construct all possible leftmost (or rightmost) derivations to determine if w ∈ L(G).

We postpone this until we have given all the necessary definitions.


1 Chomsky Normal Form
Definition: A grammar is said to be in CNF if all its productions are in one of two forms: A → BC or A → a.

Linz 6th, §6.2, definition 6.4, page 171
HMU 3rd, §7.1.5, page 272
Kozen, Lecture 21, definition 21.1, page 140
Sudkamp 3rd, §4.5, definition 4.5.1, page 122

2 Greibach Normal Form
Definition: A grammar is said to be in GNF if all its productions are of the form A → aX where a ∈ T and X ∈ V∗.

Linz 6th §6.2, definition 6.5, page 174
HMU 3rd, §7.1, page 277
Kozen, Lecture 21, definition 21.1, page 140
Sudkamp 3rd, §4.8, definition 4.8.1, page 131

NB ε ∉ L(G).
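Both normal forms are conditions on the shape of each right-hand side, so they are easy to test. This is a sketch only, assuming single-character symbols with upper-case letters as nonterminals; isCNF, isGNF, and the two sample grammars are hypothetical and do not come from the cited textbooks.

```haskell
import Data.Char (isUpper)

-- A context-free production A → γ; upper-case letters are nonterminals.
type Production = (Char, String)

-- Chomsky normal form: every RHS is two nonterminals or one terminal.
isCNF :: [Production] -> Bool
isCNF = all (cnf . snd)
  where
    cnf [b, c] = isUpper b && isUpper c
    cnf [a]    = not (isUpper a)
    cnf _      = False

-- Greibach normal form: every RHS is one terminal followed by
-- zero or more nonterminals.
isGNF :: [Production] -> Bool
isGNF = all (gnf . snd)
  where
    gnf (a : gamma) = not (isUpper a) && all isUpper gamma
    gnf []          = False

-- Hypothetical sample grammars for illustration.
cnfSample, gnfSample :: [Production]
cnfSample = [('S', "AB"), ('A', "a"), ('B', "b")]
gnfSample = [('S', "aSB"), ('S', "aB"), ('B', "b")]
```

Neither predicate accepts an ε-production, which matches the remark that ε ∉ L(G) for grammars in these forms.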

There is another interesting normal form for grammars. This form, called Greibach Normal Form, after Sheila Greibach, has several interesting consequences. Since each use of a production introduces exactly one terminal into a sentential form, a string of length n has a derivation of exactly n steps. Also, if we apply the PDA construction to a Greibach-Normal grammar, then we get a PDA with no ε-rules, thus showing that it is always possible to eliminate such transitions of a PDA.

HMU 3rd, section 7.1, page 277


Sheila Adele Greibach (b. 1939)

Sheila Greibach was born in New York City and received the A.B. degree from Radcliffe College in Linguistics and Applied Mathematics summa cum laude in 1960. She received the Ph.D. in Applied Mathematics from Harvard University in 1963. She joined the UCLA Faculty in 1969 and the Computer Science Department in 1970 and is now Emeritus Professor.


We will return to GNF after we introduce PDA’s.

Going back to CNF, it is easy to put a grammar in Chomsky Normal Form. And, because of that, there is a very convenient and efficient algorithm to determine membership in any CFL. This algorithm is known as the CYK Algorithm and has many uses, e.g., in natural language processing.


Before we put grammars in Chomsky normal form, we wish to address some petty annoyances in “untidy” or “meandering” grammars.


Cleaning Up CFG’s

Before we can study context-free languages in greater depth, we must attend to some technical matters. The definition of a context-free grammar imposes no restriction whatsoever on the right side of a production. However, complete freedom is not necessary and, in fact, is a detriment in some arguments.

Linz 6th, Chapter 6, page 156.

Ryan Stansifer (CS, Florida Tech) Formal Languages 17 October 2019 73 / 141

Cleaning Up CFG’s

We must get rid of all ε-productions A → ε and unit productions A → B. These are bothersome because they make it hard to determine whether applying a production makes any progress toward deriving a string of terminals. For instance, with unit productions, there can be loops in the derivation, and with ε-productions, one can generate very long strings of nonterminals and then erase them all.

Kozen, page 141.

An Example of a “Meandering” Grammar

S → AS | B
A → B | ε
B → A | b

Needless looping:
S ⇒ B ⇒ b
S ⇒ B ⇒ A ⇒ B ⇒ b
S ⇒ B ⇒ A ⇒ B ⇒ A ⇒ B ⇒ b

Vanishing nonterminals:
S ⇒ AS ⇒∗ AAAS ⇒ AAAB ⇒∗ B ⇒ b
S ⇒ AS ⇒∗ AAAAAS ⇒ AAAAAB ⇒∗ B ⇒ b

Another Example

S → W | X | Z
W → A
A → B
B → C
C → Aa
X → C
Y → aY | a
Z → ε

1 The nonterminal Y can never be derived from the start symbol S. It is useless.
2 The production S → X can easily be made redundant by skipping directly to C since X → C.
3 The nonterminal A cannot derive any string in T∗ because any sentential form with A cannot get rid of the A.
4 We do not need the nonterminal Z, because it can generate nothing but the empty string.

It seems counterproductive to generate a long sentential form only to cut it back later. The grammar is too complicated: the only string in L(G) is ε.

A Useful Substitution Rule

Linz 6th, section 6.1, theorem 6.1, page 157.

Theorem. For any grammar G = 〈T, V, P, S〉, and for any nonterminal B ∈ V such that B ⇒∗ γ in G, if A → αBβ ∈ P, then L(G) = L(G′) for the grammar G′ = 〈T, V, P ∪ {A → αγβ}, S〉.

Adding a shortcut does not change the language generated by the grammar.

Do not overlook the special case in which γ = ε: if there is a production B → ε, then there is a shortcut A → αβ.

Summary of the Process

It is possible to systematically modify grammars to eliminate productions that are not helpful, making them simpler to use. A two-step process gets rid of the most blatant problems:

1 Remove ε and unit productions.
2 Remove unreachable and nonproductive nonterminals.

In what follows we elaborate and give algorithms.

Definition

For a context-free grammar G = 〈T, V, P, S〉:
• A nonterminal A ∈ V is said to be nullable if A ⇒∗ ε.
• A nonterminal A ∈ V is said to be reachable if S ⇒∗ αAβ for some α, β ∈ (T ∪ V)∗.
• A nonterminal A ∈ V is said to be productive if A ⇒∗ w for some w ∈ T∗.
• A nonterminal A ∈ V is said to be useful if it is both reachable and productive.
• A pair of nonterminals A, B ∈ V is said to be a unit pair if A ⇒∗ B.

Nullability. Any nonterminal A is nullable if there is a production of the form A → ε. A is also nullable if there is a production of the form A → V1V2 · · · Vk and every Vi is nullable for 1 ≤ i ≤ k.

Reachability. The start symbol S is reachable. A is reachable if there is a production of the form S → αAβ, or if there is any B ∈ V where B → αAβ and B is reachable.

Productiveness/Generating. A nonterminal A ∈ V is productive if there is some production of the form A → w for some w ∈ T∗, or if there is some production A → α for which all the nonterminals in α are productive.

Vn := ∅;                          -- assume nothing is nullable
loop
   Vo := Vn;
   for X → α ∈ P loop
      add X to Vn if α ∈ Vo∗;
   end loop;
   exit when Vn = Vo;
end loop;

Compute the set Vo of all nullable nonterminals
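The fixed-point loop for nullable nonterminals can be sketched in Python. This is only an illustration under assumptions that are not in the slides: a grammar is a list of (lhs, rhs) pairs, each symbol is one character, and the empty string stands for an ε production. The name `nullable_nonterminals` is hypothetical. Unlike the pseudocode, the sketch updates one set in place instead of keeping an old and a new copy; the result is the same.

```python
def nullable_nonterminals(productions):
    """Return the set of nonterminals that derive the empty string.

    productions: iterable of (lhs, rhs) pairs; rhs is a string of
    one-character symbols, and "" stands for an epsilon production.
    """
    nullable = set()
    changed = True
    while changed:                      # repeat until a pass adds nothing
        changed = False
        for lhs, rhs in productions:
            # rhs is in nullable* when every symbol is already nullable;
            # all() over an empty rhs is True, which covers A -> epsilon.
            if lhs not in nullable and all(sym in nullable for sym in rhs):
                nullable.add(lhs)
                changed = True
    return nullable


# The "meandering" grammar: S -> AS | B, A -> B | epsilon, B -> A | b
meandering = [("S", "AS"), ("S", "B"), ("A", "B"), ("A", ""),
              ("B", "A"), ("B", "b")]
print(sorted(nullable_nonterminals(meandering)))
```

In the meandering grammar every nonterminal turns out to be nullable, because A → ε, B → A, and S → B.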


import Data.List (nub)

-- grammar :: [(Char, [Char])] is assumed to be defined elsewhere as a list
-- of (nonterminal, right-hand side) pairs; an ε production has rhs "".

nullableRHS :: Eq a => [a] -> [a] -> Bool
nullableRHS set rhs = all (`elem` set) rhs

close :: [Char] -> [Char]
close set
  | set == next = set
  | otherwise   = close next
  where next = nub $ set ++
          [non | (non, rhs) <- grammar, nullableRHS set rhs]

Sudkamp 3rd, section 4.2, algorithm 4.2.1, page 108.

The set nullable is an inductively defined set.

\[
\frac{A \to \varepsilon \in P}{A \in \mathit{nullable}}
\qquad
\frac{A \to \alpha \in P \quad \alpha \in \mathit{nullable}^{*}}{A \in \mathit{nullable}}
\]

Floyd, 1994, Section 5.3, page 338.
Figure 5.4: An algorithm for testing which variables in a CFG derive Λ.

Vn := {S};                        -- assume just S is reachable
loop
   Vo := Vn;
   for X → α ∈ P loop
      if X ∈ Vn then
         for N ∈ V in α loop
            add N to Vn;
         end loop;
      end if;
   end loop;
   exit when Vn = Vo;
end loop;

Compute the set of all reachable nonterminals
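Reachability can also be computed with a worklist instead of repeated passes over P. The following Python sketch assumes (these conventions are not from the slides) that productions are (lhs, rhs) pairs over one-character symbols and that uppercase letters are the nonterminals; `reachable_nonterminals` is a hypothetical name.

```python
def reachable_nonterminals(productions, start="S"):
    """Return the nonterminals that occur in some sentential form from start."""
    reachable = {start}
    worklist = [start]                  # reachable nonterminals not yet expanded
    while worklist:
        current = worklist.pop()
        for lhs, rhs in productions:
            if lhs != current:
                continue
            for sym in rhs:
                # assumption of this sketch: uppercase letters are nonterminals
                if sym.isupper() and sym not in reachable:
                    reachable.add(sym)
                    worklist.append(sym)
    return reachable


# S -> W | X | Z, W -> A, A -> B, B -> C, C -> Aa, X -> C, Y -> aY | a, Z -> epsilon
untidy = [("S", "W"), ("S", "X"), ("S", "Z"), ("W", "A"), ("A", "B"),
          ("B", "C"), ("C", "Aa"), ("X", "C"), ("Y", "aY"), ("Y", "a"),
          ("Z", "")]
print(sorted(reachable_nonterminals(untidy)))
```

On this grammar the nonterminal Y is the only one left out, since no production reachable from S mentions it.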


The set reachable is an inductively defined set.

\[
\frac{}{S \in \mathit{reachable}}
\qquad
\frac{A \to \alpha B \beta \in P \quad A \in \mathit{reachable}}{B \in \mathit{reachable}}
\]

Vn := Ø;                          -- assume nothing is productive
loop
   Vo := Vn;
   for X → α ∈ P loop
      add X to Vn if α ∈ (T ∪ Vo)∗;
   end loop;
   exit when Vn = Vo;
end loop;

Compute the set of all productive nonterminals
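A minimal Python sketch of the productive-nonterminal computation follows. It assumes one-character symbols with uppercase letters as nonterminals and everything else as terminals; these conventions and the name `productive_nonterminals` are illustrative only. A production qualifies once every nonterminal on its right-hand side is already known to be productive.

```python
def productive_nonterminals(productions):
    """Return the nonterminals that derive at least one terminal string."""
    productive = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            # terminals always pass; nonterminals must already be productive
            ok = all((not sym.isupper()) or sym in productive for sym in rhs)
            if ok and lhs not in productive:
                productive.add(lhs)
                changed = True
    return productive


# S -> W | X | Z, W -> A, A -> B, B -> C, C -> Aa, X -> C, Y -> aY | a, Z -> epsilon
untidy = [("S", "W"), ("S", "X"), ("S", "Z"), ("W", "A"), ("A", "B"),
          ("B", "C"), ("C", "Aa"), ("X", "C"), ("Y", "aY"), ("Y", "a"),
          ("Z", "")]
print(sorted(productive_nonterminals(untidy)))
```

Here A, B, C, W, and X are not productive because every derivation from them keeps a nonterminal forever; S is productive only through S → Z.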


Sudkamp 3rd, section 4.4, algorithm 4.4.2, page 117.

Additional and Pointless Algorithm

Zn := {〈Y, Y〉 | Y ∈ V};           -- Y ⇒∗ Y for all Y ∈ V
loop
   Zo := Zn;
   for A → B ∈ P loop              -- unit production
      for 〈B, C〉 ∈ Zn loop        -- B ⇒∗ C
         add 〈A, C〉 to Zn;
      end loop;
   end loop;
   exit when Zn = Zo;
end loop;

Compute the set of all unit pairs A ⇒∗ B
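For completeness, the unit-pair closure can be sketched in Python as well. The representation (productions as (lhs, rhs) string pairs, an explicit set of nonterminals) and the name `unit_pairs` are assumptions made for this illustration.

```python
def unit_pairs(productions, nonterminals):
    """Return all pairs (A, B) of nonterminals with A =>* B via unit productions."""
    pairs = {(n, n) for n in nonterminals}      # Y =>* Y in zero steps
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            if rhs in nonterminals:             # lhs -> rhs is a unit production
                for b, c in list(pairs):        # copy: pairs grows inside the loop
                    if b == rhs and (lhs, c) not in pairs:
                        pairs.add((lhs, c))     # lhs => rhs =>* c
                        changed = True
    return pairs


# S -> AS | B, A -> B | epsilon, B -> A | b
meandering = [("S", "AS"), ("S", "B"), ("A", "B"), ("A", ""),
              ("B", "A"), ("B", "b")]
print(sorted(unit_pairs(meandering, {"S", "A", "B"})))
```

The loop A → B → A in the meandering grammar shows up as the two pairs (A, B) and (B, A).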


An ε production is one of the form A → ε for some nonterminal A.
All ε productions can be removed from a grammar (except possibly S → ε).
A nonterminal N is said to be nullable if N ⇒∗ ε.
Find all nullable nonterminals. Replace every production with a nullable nonterminal in the RHS with two productions: one with and one without the nullable nonterminal. If leaving out the nonterminal results in an ε production, then do not add it.
If a production has n nullable nonterminals in the RHS, then it is replaced by at most 2^n productions (or 2^n − 1 productions when the RHS consists entirely of nullable nonterminals, since the ε production is not added).
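The counting argument can be made concrete with a small Python sketch that expands a single production over every way of keeping or dropping its nullable nonterminals. The function name `expand_nullable` and the string representation of right-hand sides are assumptions of this sketch, not part of the slides.

```python
from itertools import product


def expand_nullable(lhs, rhs, nullable):
    """Return every variant of lhs -> rhs with nullable symbols kept or dropped.

    The epsilon variant (everything dropped) is skipped, as the text prescribes.
    """
    # each nullable symbol contributes two choices: drop it ("") or keep it
    choices = [("", sym) if sym in nullable else (sym,) for sym in rhs]
    variants = set()
    for combo in product(*choices):
        new_rhs = "".join(combo)
        if new_rhs:                     # do not add an epsilon production
            variants.add((lhs, new_rhs))
    return variants


print(sorted(expand_nullable("S", "AB", {"A", "B"})))
print(sorted(expand_nullable("A", "aAA", {"A"})))
```

For S → AB with both symbols nullable there are 2^2 − 1 = 3 variants. For A → aAA the two ways of dropping a single A coincide, so only three distinct productions remain, which is why 2^n is an upper bound rather than an exact count.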


Pn := P;                               -- keep all the original productions
loop
   Po := Pn;                           -- remember previous productions
   for X → αNβ ∈ Pn loop
      if N → ε ∈ Pn then
         add X → αβ to Pn;
      end if;
   end loop;
   for X → B ∈ Pn (where B ∈ V) loop
      if B → β ∈ Pn then
         add X → β to Pn;
      end if;
   end loop;
   exit when Po = Pn;                  -- repeat until no changes
end loop;

Make ε and unit productions superfluous
Slightly more explicit version

Pn := P;                               -- keep all the original productions
loop
   Po := Pn;                           -- remember previous productions
   for X → αNβ ∈ Pn loop
      add X → αβ to Pn if N → ε ∈ Pn;
   end loop;
   for X → B ∈ Pn (where B ∈ V) loop
      add X → β to Pn if B → β ∈ Pn;
   end loop;
   exit when Po = Pn;                  -- repeat until no changes
end loop;

Make ε and unit productions superfluous
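A runnable Python sketch of this shortcut-adding loop, followed by the step that drops the now superfluous productions, might look as follows. The names `add_shortcuts` and `drop_epsilon_and_unit` are hypothetical, and the sketch assumes one-character symbols with uppercase nonterminals and "" for ε.

```python
def add_shortcuts(productions):
    """Close a production set under the epsilon and unit shortcuts."""
    prods = set(productions)
    while True:
        new = set()
        for lhs, rhs in prods:
            # X -> alpha N beta with N -> epsilon gives X -> alpha beta
            for i, sym in enumerate(rhs):
                if (sym, "") in prods:
                    new.add((lhs, rhs[:i] + rhs[i + 1:]))
            # X -> B with B -> beta gives X -> beta
            if len(rhs) == 1 and rhs.isupper():
                new |= {(lhs, body) for head, body in prods if head == rhs}
        if new <= prods:                # no changes: fixed point reached
            return prods
        prods |= new


def drop_epsilon_and_unit(prods, start="S"):
    """Remove unit productions and all epsilon productions except start -> epsilon."""
    return {(lhs, rhs) for lhs, rhs in prods
            if not (len(rhs) == 1 and rhs.isupper())
            and (rhs != "" or lhs == start)}


# S -> AB, A -> aAA | epsilon, B -> bBB | epsilon
g = [("S", "AB"), ("A", "aAA"), ("A", ""), ("B", "bBB"), ("B", "")]
cleaned = drop_epsilon_and_unit(add_shortcuts(g))
for lhs, rhs in sorted(cleaned):
    print(lhs, "->", rhs or "epsilon")
```

The loop terminates for the reason given on the slides: no added right-hand side is longer than the longest original one, so only finitely many productions can ever be added.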


1 Examples first.
2 Proof of correctness afterward.

Remove the ε and unit productions from the following grammar:

S → AB
A → aAA | ε
B → bBB | ε

Add shortcuts for the ε productions:

S → A | B | AB | ε
A → a | aA | aAA | ε
B → b | bB | bBB | ε

Add shortcuts for the unit productions:

S → A | B | AB | ε
S → a | aA | aAA
S → b | bB | bBB
A → a | aA | aAA | ε
B → b | bB | bBB | ε

Drop the unit productions and the ε productions (except S → ε, which stays because ε ∈ L(G)):

S → AB | ε
S → a | aA | aAA
S → b | bB | bBB
A → a | aA | aAA
B → b | bB | bBB

The previous algorithm terminates. The RHS of each new production added is not longer than the longest production of the original grammar. There is only a finite number of such possible productions.

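Theorem. No unit production is required in a minimal step derivation of any grammar constructed by the previous algorithm.
Suppose a unit production A → B is used in the shortest possible derivation of S ⇒∗ x. Then, for some production B → γ,

\[ S \Rightarrow^{m} \alpha A \beta \Rightarrow \alpha B \beta \Rightarrow^{n} \eta B \theta \Rightarrow \eta \gamma \theta \Rightarrow^{k} x \]

Since the algorithm added the shortcut A → γ, there is a derivation that is one step shorter, a contradiction:

\[ S \Rightarrow^{m} \alpha A \beta \Rightarrow \alpha \gamma \beta \Rightarrow^{n} \eta \gamma \theta \Rightarrow^{k} x \]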

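Theorem. No epsilon production is required in a minimal step derivation of any grammar constructed by the previous algorithm.
Suppose an ε production B → ε is used in the shortest possible derivation of S ⇒∗ x. Then, for some production A → αBβ,

\[ S \Rightarrow^{m} \eta A \theta \Rightarrow \eta \alpha B \beta \theta \Rightarrow^{n} \gamma B \delta \Rightarrow \gamma \delta \Rightarrow^{k} x \]

Since the algorithm added the shortcut A → αβ, there is a derivation that is one step shorter, a contradiction:

\[ S \Rightarrow^{m} \eta A \theta \Rightarrow \eta \alpha \beta \theta \Rightarrow^{n} \gamma \delta \Rightarrow^{k} x \]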

Theorem. Let G′ be the grammar after running the “eu” algorithm on context-free grammar G. If ε ∈ L(G), then S → ε is among the productions of G′.

Cleaning Up CFG

If a nonterminal A is useful, then S ⇒∗ αAβ ⇒∗ w for some α, β ∈ (T ∪ V)∗ and w ∈ T∗.

Theorem. Deleting all useless nonterminals and any production containing them on the LHS or RHS results in a grammar that generates the same language as the original.

Theorem

Every CFG G = 〈T, V, P, S〉 can be (effectively) transformed to one without cycles (unit productions), non-productive, or unreachable nonterminals. (This means that unit productions are unnecessary.)
• A nullable nonterminal A is one for which A ⇒∗ ε.
• A productive nonterminal A is one for which A ⇒∗ w for some w ∈ T∗.
• A reachable nonterminal A is one for which S ⇒∗ αAβ for some α, β ∈ (T ∪ V)∗.

All epsilon productions may also (effectively) be eliminated from a CFG, if the language does not contain the empty string. If the language contains the empty string, no epsilon productions are necessary save one: S → ε.

Chomsky Normal Form

Theorem. Every context-free grammar can be put in Chomsky Normal Form. In other words, for every context-free grammar G, there is a context-free grammar G′ in Chomsky Normal Form such that L(G) = L(G′).

Linz 6th, §6.2, Theorem 6.6, page 172.
HMU 3rd, §7.1.5, page 272.
Kozen, Lecture 21, page 140.

Algorithm to Put a Grammar in CNF

First, eliminate ε productions and unit productions. One might as well eliminate useless nonterminals (and their productions).

1 See that all RHS of length two or more consist only of nonterminals by introducing a nonterminal Ta for each terminal a and adding productions like Ta → a to the grammar.

2 For productions of length k greater than two, add a cascade of k − 2 new nonterminals and productions.

Algorithm to Put a Grammar in CNF

For productions of length k greater than two, X → A1A2 . . . Ak, add a cascade of k − 2 new nonterminals X1, . . . , Xk−2 and productions.

X → A1 X1
X1 → A2 X2
X2 → A3 X3
...
Xk−3 → Ak−2 Xk−2
Xk−2 → Ak−1 Ak
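The cascade can be sketched as a short Python function. Here a right-hand side is a list of nonterminal names, and fresh names are drawn from an iterator supplied by the caller; both conventions, and the name `binarize`, are assumptions of this sketch.

```python
from itertools import count


def binarize(lhs, rhs, fresh):
    """Split lhs -> A1 A2 ... Ak (k >= 2) into productions with two symbols each.

    rhs is a list of nonterminal names; fresh is an iterator yielding
    unused nonterminal names.
    """
    prods = []
    head = lhs
    while len(rhs) > 2:
        nxt = next(fresh)               # new nonterminal for the rest of rhs
        prods.append((head, [rhs[0], nxt]))
        head, rhs = nxt, rhs[1:]
    prods.append((head, list(rhs)))     # the final pair A(k-1) A(k)
    return prods


fresh_names = (f"X{i}" for i in count(1))
for head, body in binarize("X", ["A1", "A2", "A3", "A4", "A5"], fresh_names):
    print(head, "->", " ".join(body))
```

For k = 5 the sketch introduces the k − 2 = 3 new nonterminals X1, X2, X3 and emits k − 1 = 4 productions.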


Convert to CNF.

Linz, example 6.8, page 173. S → ABa, A → aab, B → Ac.

Kozen, example 21.4. L = {a^n b^n | n ≥ 1}

Kozen, example 21.5. Balanced parentheses.

S → [S] | SS | ε
S → ASB | SS | AB, A → [, B → ]
S → AC | SS | AB, A → [, B → ], C → SB

Put in Chomsky Normal Form:

S → ASA | aB
A → B | S
B → b | ε

Shortcuts for B → ε:

S → ASA | aB | a
A → B | S | ε
B → b | ε

Shortcuts for A → ε:

S → ASA | aB | a | AS | SA
A → B | S | ε
B → b | ε

Shortcuts for A → B and A → S:

S → ASA | aB | a | AS | SA
S → BSB | aB | a | BS | SB
S → SSS | aB | a | SS | SS
A → B | S | ε
B → b | ε

Do new unit rules like S → S matter?

S → ASA | aB | a | AS | SA
S → BSB | aB | a | BS | SB
S → SSS | aB | a | SS
B → b

Do we now have useless nonterminals?


Exhaustive Search

Linz 6th, Section 5.2, page 141.

Algorithm. Given a grammar G without ε-productions and unit productions and a string w, systematically construct all possible leftmost (or rightmost) derivations and see if the sentential forms are consistent with w.

Lemma. Given a grammar G without ε-productions. If S ⇒∗ α ⇒∗ x, then |α| ≤ |x|.

Proof. For all nonterminals N, if N ⇒∗ x, then 1 ≤ |x|.

Lemma. Given a grammar G without ε-productions and unit productions. Let #+(α) be the number of terminal symbols in α added to the length of α. If β ⇒ γ in one step, then #+(β) < #+(γ). Hence, if β ⇒∗ γ in one or more steps, then #+(β) < #+(γ).

Corollary. If α ⇒∗ w, then the number of steps is bounded by #+(w) = 2|w|.
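The exhaustive search can be sketched in Python as a breadth-first exploration of leftmost derivations. The pruning relies on the first lemma: in a grammar without ε-productions a sentential form longer than w can never derive w. The function name `derives`, the one-character symbol convention with uppercase nonterminals, and the ε-free test grammar below are assumptions chosen for this illustration.

```python
from collections import deque


def derives(productions, start, w):
    """Brute-force membership test by exploring leftmost derivations.

    Assumes the grammar has no epsilon productions, so sentential forms
    never shrink and forms longer than w can be discarded.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == w:
            return True
        # index of the leftmost nonterminal, if any
        i = next((k for k, s in enumerate(form) if s.isupper()), None)
        if i is None or form[:i] != w[:i]:
            continue                    # all terminals, or prefix disagrees with w
        for lhs, rhs in productions:
            if lhs == form[i]:
                nxt = form[:i] + rhs + form[i + 1:]
                if len(nxt) <= len(w) and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False


# an epsilon-free grammar written for this sketch: S -> SS | aSb | bSa | ab | ba
g = [("S", "SS"), ("S", "aSb"), ("S", "bSa"), ("S", "ab"), ("S", "ba")]
print(derives(g, "S", "aabb"), derives(g, "S", "aab"))
```

The search always stops because only finitely many sentential forms have length at most |w|, and each is visited once.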


Example of Exhaustive Search

Linz 6th, Section 5.2, Example 5.7, page 141.
Busch’s notes, class09, page 12ff.

Find a derivation of aabb in the grammar S → SS | aSb | bSa | ε.


Cocke-Younger-Kasami Algorithm (CYK)

Linz 6th, section 6.3, page 178
Busch’s notes, class10, page 22ff
HMU 3rd, section 7.4.4, page 303
Kozen, Lecture 27, page 191
Floyd, Section 5.11, Figure 5.20, page 390.

John Cocke and Jacob T. Schwartz. Programming Languages and their Compilers. Tech. rep. New York: NYU, Courant Institute, 1970.
T. Kasami. An efficient recognition and syntax analysis algorithm for context-free languages. Tech. rep. AFCRL-65-758. Air Force Cambridge Res. Lab., Bedford Mass., 1965.
D. H. Younger. “Recognition and Parsing of Context-Free Languages in Time n^3”. In: Information and Control 10 (1967), pp. 189–208.

CYK Algorithm

Let G = 〈T, V, P, S〉 be a CFG in Chomsky normal form. We can determine if a string s is in L(G). Suppose s = a0 a1 . . . an−1. We write s[i : j], where 0 ≤ i ≤ j ≤ n − 1, for the substring of s of length j − i + 1 beginning at position i and ending at position j.

We require a mutable, two-dimensional, triangular array M containing sets of nonterminals with the intention that A ∈ M[i, j] iff A ⇒∗ s[i : j].

A classic dynamic programming algorithm.

CYK Algorithm

        0         1         2        ···   i        ···   n−1
0     s[0:0]    s[0:1]    s[0:2]          s[0:i]          s[0:n−1]
1               s[1:1]    s[1:2]          s[1:i]          s[1:n−1]
2                         s[2:2]          s[2:i]          s[2:n−1]
...
r                                         s[r:i]          s[r:n−1]
...
n−1                                                       s[n−1:n−1]

CYK Algorithm – Initialization

for i in 0, . . . , n − 1 loop
   M[i, i] := ∅
   for N → a in P loop
      add N to M[i, i] if a = ai     -- N ⇒∗ s[i : i]
   end loop
end loop

CYK Algorithm

for g in 1, . . . , n − 1 loop               -- every substring length g + 1
   for r in 0, . . . , n − g − 1 loop         -- starting at r
      M[r, r + g] := ∅
      for m in r, . . . , r + g − 1 loop      -- every midpoint
         -- s[r : r + g] = s[r : m] ++ s[m + 1 : r + g]
         -- the nonterminals generating s[r : m]
         L = M[r, m]
         -- the nonterminals generating s[m + 1 : r + g]
         R = M[m + 1, r + g]
         for A → B C in P loop
            -- A ⇒∗ s[r : r + g]
            add A to M[r, r + g] if B ∈ L and C ∈ R
         end loop
      end loop
   end loop
end loop
return S ∈ M[0, n − 1]
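The two loops above translate almost line by line into Python. This sketch assumes a CNF grammar given as (lhs, rhs) pairs where rhs is either a single terminal character or a string of two one-character nonterminals; the dictionary M keyed by (i, j) plays the role of the triangular array. The name `cyk` and this representation are choices made for the illustration.

```python
def cyk(productions, start, s):
    """Return True iff the CNF grammar derives the string s."""
    n = len(s)
    if n == 0:
        return False                    # a CNF grammar without S -> epsilon
    M = {}
    for i in range(n):                  # initialization: substrings of length 1
        M[i, i] = {lhs for lhs, rhs in productions if rhs == s[i]}
    for g in range(1, n):               # g + 1 is the substring length
        for r in range(n - g):          # substring s[r : r + g]
            cell = set()
            for m in range(r, r + g):   # every midpoint
                left, right = M[r, m], M[m + 1, r + g]
                for lhs, rhs in productions:
                    if len(rhs) == 2 and rhs[0] in left and rhs[1] in right:
                        cell.add(lhs)
            M[r, r + g] = cell
    return start in M[0, n - 1]


# S -> AB, A -> BB | a, B -> AB | b
linz = [("S", "AB"), ("A", "BB"), ("A", "a"), ("B", "AB"), ("B", "b")]
print(cyk(linz, "S", "aabbb"))
```

The three nested loops over g, r, and m give the cubic running time in the length of s, with an extra factor for the number of productions.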

Russell & Norvig, Chapter 23, Figure 23.5, page 894

Example

Linz 6th, §6.3, example 6.11, page 179. Use the CYK method to determine if the string w = aabbb is in the language generated by the following grammar G in Chomsky Normal Form.

S → AB
A → BB | a
B → AB | b

Each row lists the substrings of one length, left to right, with the set of nonterminals deriving each:

a       a       b       b       b
{A}     {A}     {B}     {B}     {B}

aa      ab      bb      bb
Ø       {S,B}   {A}     {A}

aab     abb     bbb
{S,B}   {A}     {S,B}

aabb    abbb
{A}     {S,B}

aabbb
{S,B}

Since S ∈ M[0, 4], the string w = aabbb is in L(G).


Example

Linz 6th, §6.3, exercise 4, page 180 (solution page 421). Use the CYK method to determine if the string w = aaabbbb is in the language generated by the grammar S → aSb | b.

First, convert the grammar to CNF.

S → AC | b
C → SB
A → a
B → b

Example (Continued)

Linz 6th, §6.3, exercise 4, page 180.

a        a        a        b        b        b        b
{A}      {A}      {A}      {S,B}    {S,B}    {S,B}    {S,B}

aa       aa       ab       bb       bb       bb
Ø        Ø        Ø        {C}      {C}      {C}

aaa      aab      abb      bbb      bbb
Ø        Ø        {S}      Ø        Ø

aaab     aabb     abbb     bbbb
Ø        Ø        {C}      Ø

aaabb    aabbb    abbbb
Ø        {S}      Ø

aaabbb   aabbbb
Ø        {C}

aaabbbb
{S}

Since S ∈ M[0, 6], the string w = aaabbbb is in L(G).


Example

Example 7.34 from HMU, 3rd, page 306.
Consider the following CFG G in CNF.

S → AB | BC
A → BA | a
B → CC | b
C → AB | a

Let us test baaba for membership in L(G).

Example (Continued)

S → AB | BC
A → BA | a
B → CC | b
C → AB | a

b        a        a        b        a
{B}      {A,C}    {A,C}    {B}      {A,C}

ba       aa       ab       ba
{S,A}    {B}      {S,C}    {S,A}

baa      aab      aba
Ø        {B}      {B}

baab     aaba
Ø        {S,A,C}

baaba
{S,A,C}


Example (Continued)

S →AB | BCA→BA | aB→CC | bC→AB | a

b a a b a{B} {A,C} {A,C} {B} {A,C}ba aa ab ba{S,A} {B} {S,C} {S,A}baa aab abaØ {B} {B}

baab aaba

Ø {S,A,C}

baaba

{S,A,C}

Ryan Stansifer (CS, Florida Tech) Formal Languages 17 October 2019 124 / 141

Example (Continued)

S →AB | BCA→BA | aB→CC | bC→AB | a

b a a b a{B} {A,C} {A,C} {B} {A,C}ba aa ab ba{S,A} {B} {S,C} {S,A}baa aab abaØ {B} {B}

baab aabaØ

{S,A,C}

baaba

{S,A,C}

Ryan Stansifer (CS, Florida Tech) Formal Languages 17 October 2019 124 / 141

Example (Continued)

S →AB | BCA→BA | aB→CC | bC→AB | a

b a a b a{B} {A,C} {A,C} {B} {A,C}ba aa ab ba{S,A} {B} {S,C} {S,A}baa aab abaØ {B} {B}

baab aabaØ {S,A,C}

baaba

{S,A,C}

Ryan Stansifer (CS, Florida Tech) Formal Languages 17 October 2019 124 / 141

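Example (Continued)

S → AB | BC
A → BA | a
B → CC | b
C → AB | a

Substring  b      a      a      b      a
Variables  {B}    {A,C}  {A,C}  {B}    {A,C}

Substring  ba     aa     ab     ba
Variables  {S,A}  {B}    {S,C}  {S,A}

Substring  baa    aab    aba
Variables  Ø      {B}    {B}

Substring  baab   aaba
Variables  Ø      {S,A,C}

Substring  baaba
Variables  {S,A,C}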

Ryan Stansifer (CS, Florida Tech) Formal Languages 17 October 2019 124 / 141
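The table above can be reproduced mechanically. The following is a minimal Python sketch of the CYK table fill, assuming the grammar is in CNF and is supplied as two lists of rules; the names `cyk_table`, `unit_rules`, and `pair_rules` are illustrative and do not come from HMU or the slides.

```python
def cyk_table(word, unit_rules, pair_rules):
    """Return table[(i, length)] = set of variables deriving word[i:i+length]."""
    n = len(word)
    table = {}
    # Length-1 substrings: use the rules X -> terminal.
    for i, ch in enumerate(word):
        table[(i, 1)] = {lhs for lhs, terminal in unit_rules if terminal == ch}
    # Longer substrings: try every split into a left part and a right part.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            cell = set()
            for split in range(1, length):
                left = table[(i, split)]
                right = table[(i + split, length - split)]
                for lhs, b, c in pair_rules:
                    if b in left and c in right:
                        cell.add(lhs)
            table[(i, length)] = cell
    return table


# The CNF grammar of Example 7.34.
unit_rules = [("A", "a"), ("B", "b"), ("C", "a")]
pair_rules = [
    ("S", "A", "B"), ("S", "B", "C"),
    ("A", "B", "A"),
    ("B", "C", "C"),
    ("C", "A", "B"),
]

table = cyk_table("baaba", unit_rules, pair_rules)
accepted = "S" in table[(0, 5)]  # membership test: is S in the top cell?
print(sorted(table[(0, 5)]), accepted)
```

Since S appears in the cell for the whole string, baaba ∈ L(G).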

Example

Kozen, Lecture 27, page 192. Is aabbab ∈ L(G)?

S → AB | BA | SS | AC | BD
A → a
B → b
C → SB
D → SA

The grammar is in Chomsky Normal Form.

Ryan Stansifer (CS, Florida Tech) Formal Languages 17 October 2019 125 / 141

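Substring  a       a       b       b       a       b
Variables  {A}     {A}     {B}     {B}     {A}     {B}

Substring  aa      ab      bb      ba      ab
Variables  Ø       {S}     Ø       {S}     {S}

Substring  aab     abb     bba     bab
Variables  Ø       {C}     Ø       {C}

Substring  aabb    abba    bbab
Variables  {S}     {S}     Ø

Substring  aabba   abbab
Variables  {D}     {C}

Substring  aabbab
Variables  {S}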

Ryan Stansifer (CS, Florida Tech) Formal Languages 17 October 2019 126 / 141
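A CYK table answers the membership question, but with one extra piece of bookkeeping it also yields a parse tree. The sketch below is one possible way to do this, not Kozen's presentation: each cell remembers, for every variable, one split and rule that produced it. The names `cyk_parse`, `leaves`, and the tuple encoding of trees are assumptions made for illustration.

```python
def cyk_parse(word, unit_rules, pair_rules, start="S"):
    """Return one parse tree for word as nested tuples, or None if word is rejected."""
    n = len(word)
    # back[(i, length)][X] records one way to derive word[i:i+length] from X.
    back = {}
    for i, ch in enumerate(word):
        back[(i, 1)] = {lhs: ch for lhs, terminal in unit_rules if terminal == ch}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            cell = {}
            for split in range(1, length):
                left = back[(i, split)]
                right = back[(i + split, length - split)]
                for lhs, b, c in pair_rules:
                    if b in left and c in right and lhs not in cell:
                        cell[lhs] = (split, b, c)  # keep the first derivation found
            back[(i, length)] = cell

    def build(var, i, length):
        entry = back[(i, length)][var]
        if length == 1:
            return (var, entry)  # leaf: (variable, terminal)
        split, b, c = entry
        return (var, build(b, i, split), build(c, i + split, length - split))

    if n == 0 or start not in back[(0, n)]:
        return None
    return build(start, 0, n)


def leaves(tree):
    """Concatenate the terminals at the leaves of a tree, left to right."""
    if isinstance(tree[1], str):
        return tree[1]
    return leaves(tree[1]) + leaves(tree[2])


# The grammar from Kozen, Lecture 27.
unit_rules = [("A", "a"), ("B", "b")]
pair_rules = [
    ("S", "A", "B"), ("S", "B", "A"), ("S", "S", "S"),
    ("S", "A", "C"), ("S", "B", "D"),
    ("C", "S", "B"), ("D", "S", "A"),
]

tree = cyk_parse("aabbab", unit_rules, pair_rules)
print(tree)
```

The grammar may admit several parse trees for the same string; this sketch returns only the first one it encounters.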

Earley’s algorithm and CYK both run in time O(n^3) on general grammars. However, Earley’s algorithm does much better on grammars of practical interest. Although we will not prove it, Earley’s algorithm runs in time O(n^2) on unambiguous grammars and in time O(n) on LR(1) grammars. (We will not define LR(1) grammars, but they are equivalent to DCFLs. Furthermore, the syntax of almost every programming language is essentially an LR(1) grammar. For more information, see a textbook on compiler design.) In contrast, the CYK algorithm always takes time bounded above and below by multiples of n^3, regardless of the grammar.

Floyd, Section 5.12, page 392–393.
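The cubic lower bound for CYK can be made concrete by counting how many (start, length, split) triples the table-filling loops visit. The sketch below assumes the usual triple-loop formulation of CYK (it is not taken from Floyd); the function name `cyk_split_count` is illustrative. The count depends only on n, never on the grammar, and equals (n − 1)n(n + 1)/6.

```python
def cyk_split_count(n):
    """Count the (start, length, split) triples visited by CYK on an input of length n."""
    count = 0
    for length in range(2, n + 1):           # substring length
        for start in range(n - length + 1):  # substring start position
            for split in range(1, length):   # where to cut the substring in two
                count += 1
    return count


for n in (2, 3, 10, 100):
    closed_form = (n - 1) * n * (n + 1) // 6
    print(n, cyk_split_count(n), closed_form)
```

Each triple costs at least a constant amount of work, so the running time grows like n^3 even on grammars where Earley's algorithm would be linear.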

Ryan Stansifer (CS, Florida Tech) Formal Languages 17 October 2019 127 / 141

Theorem. For every fixed k ≥ 1: a language has an LR(k) grammar iff it is a DCFL.
Knuth, Section V on page 628.

Ryan Stansifer (CS, Florida Tech) Formal Languages 17 October 2019 129 / 141

The end. Ignore the rest.

Ryan Stansifer (CS, Florida Tech) Formal Languages 17 October 2019 130 / 141

The family of languages generated by LL(k) grammars is a proper subset of the DCFLs accepted by DPDAs.
Theorem. Linz 6th, §7.4, exercise 6, page 211. If L is generated by an LL(k) grammar for some k, then L ∈ DCFL.

Let L be the language a^n ∪ a^n b^n.
Let L be the language a^n c b^n ∪ a^n d b^{2n}.¹
Theorem. L is accepted by a DPDA.
Theorem. For no k is there an LL(k) grammar generating L.

¹ Lehtinen and Okhotin, LNCS, 2012, page 156.
Ryan Stansifer (CS, Florida Tech) Formal Languages 17 October 2019 131 / 141

Example of a Context-Sensitive Grammar

Linz 6th, §11.3, Example 11.2, page 301

1  S → aAbc
2  S → abc
3  Ab → bA
4  Ac → Bbcc
5  bB → Bb
6  aB → aa
7  aB → aaA

A means you must add an a, b, and c. B means you have added a b and a c and you must add an A.
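Because no rule of this grammar shortens a sentential form, every string of bounded length can be found by an exhaustive search that discards forms longer than the bound. The following Python sketch does this with a breadth-first search; the helper names `one_step` and `generate` are assumptions for illustration, and the convention that lowercase letters are terminals is specific to this example.

```python
from collections import deque

# The seven productions of the context-sensitive grammar above.
RULES = [
    ("S", "aAbc"), ("S", "abc"),
    ("Ab", "bA"), ("Ac", "Bbcc"),
    ("bB", "Bb"), ("aB", "aa"), ("aB", "aaA"),
]


def one_step(form, rules):
    """Yield every sentential form obtained from `form` by one rule application."""
    for lhs, rhs in rules:
        pos = form.find(lhs)
        while pos != -1:
            yield form[:pos] + rhs + form[pos + len(lhs):]
            pos = form.find(lhs, pos + 1)


def generate(rules, start, max_len):
    """Return all terminal strings of length <= max_len derivable from `start`.

    Pruning forms longer than max_len is sound only because no rule
    has a right-hand side shorter than its left-hand side.
    """
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        if form.islower():  # only terminals remain
            words.add(form)
            continue
        for nxt in one_step(form, rules):
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return words


words = generate(RULES, "S", 9)
print(sorted(words, key=len))
```

The same search would not terminate with a sound bound for a grammar containing contracting rules, which is one practical difference between context-sensitive and unrestricted grammars.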

Ryan Stansifer (CS, Florida Tech) Formal Languages 17 October 2019 132 / 141

Theorem. The family of context-free languages is a proper subset of the family of context-sensitive languages.

L = {a^n b^n c^n} is not context free.
Linz 6th,
HMU, 3rd, §7.2, Example 7.16, page 284.
Sudkamp, §7.4, Example 7.4.1, page 242.

L = {a^n b^n c^n} is context sensitive.
Linz 6th, §11.3, Example 11.2, page 301.

Ryan Stansifer (CS, Florida Tech) Formal Languages 17 October 2019 133 / 141

Unrestricted Grammars

We have seen so many restrictions on grammars, it becomes necessary to define:

Definition. An unrestricted grammar is a grammar without restrictions.

So an unrestricted grammar is simply a grammar. Although CFGs are the most common, useful, and intuitive of the grammars, we now turn our attention to unfettered grammars.

Ryan Stansifer (CS, Florida Tech) Formal Languages 17 October 2019 134 / 141

Example of an Unrestricted Grammar

Sudkamp, §10.1, Example 10.1.1, page 327

1  S → aAbc
2  S → ε
3  A → aAbC
4  A → ε
5  Cb → bC
6  Cc → cc

L(G) = {a^n b^n c^n | 0 ≤ n}

S ⇒ aAbc ⇒∗ a^i A (bC)^{i−1} bc ⇒∗ a^i (bC)^{i−1} bc ⇒∗ a^i b^i C^{i−1} c ⇒∗ a^i b^i c^i

Ryan Stansifer (CS, Florida Tech) Formal Languages 17 October 2019 135 / 141

Example of an Unrestricted Grammar

Sudkamp, §10.1, Example 10.1.2, page 327

S → aT[a] | bT[b] | []
T[ → aT[A | bT[B | [

Aa → aA
Ab → bA
Ba → aB
Bb → bB

A] → a]
B] → b]

L(G) = {w[w] | w ∈ {a, b}∗}

Ryan Stansifer (CS, Florida Tech) Formal Languages 17 October 2019 136 / 141

S ⇒ aT[a]
  ⇒ aaT[Aa]
  ⇒ aaT[aA]
  ⇒ aaT[aa]
  ⇒ aabT[Baa]
  ⇒ aabT[aBa]
  ⇒ aabT[aaB]
  ⇒ aabT[aab]
  ⇒ aab[aab]
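A derivation in an unrestricted grammar is just a chain of string rewrites, so it can be checked step by step. The Python sketch below checks that each sentential form in the derivation of aab[aab], starting from S, follows from the previous one by a single rule application. The names `is_one_step` and `check` are illustrative, and the rules are transcribed from Example 10.1.2 as single-character symbols.

```python
# Productions of the grammar for {w[w] | w in {a, b}*}.
RULES = [
    ("S", "aT[a]"), ("S", "bT[b]"), ("S", "[]"),
    ("T[", "aT[A"), ("T[", "bT[B"), ("T[", "["),
    ("Aa", "aA"), ("Ab", "bA"), ("Ba", "aB"), ("Bb", "bB"),
    ("A]", "a]"), ("B]", "b]"),
]

DERIVATION = [
    "S", "aT[a]", "aaT[Aa]", "aaT[aA]", "aaT[aa]",
    "aabT[Baa]", "aabT[aBa]", "aabT[aaB]", "aabT[aab]", "aab[aab]",
]


def is_one_step(before, after, rules):
    """True if `after` results from rewriting one occurrence of some lhs in `before`."""
    for lhs, rhs in rules:
        pos = before.find(lhs)
        while pos != -1:
            if before[:pos] + rhs + before[pos + len(lhs):] == after:
                return True
            pos = before.find(lhs, pos + 1)
    return False


def check(derivation, rules):
    """True if every consecutive pair of forms is a valid single derivation step."""
    return all(is_one_step(x, y, rules) for x, y in zip(derivation, derivation[1:]))


print(check(DERIVATION, RULES))
```

The variables A and B act as messengers: each one carries a copy of the newly generated letter rightward through the second half until it reaches the closing bracket.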

Ryan Stansifer (CS, Florida Tech) Formal Languages 17 October 2019 137 / 141

Theorem 11.6, Linz, 6th, page 293. Any language generated by an unrestricted grammar is recursively enumerable.

Kozen, Miscellaneous Exercise, #104, page 343. Show that the type 0 grammars (see Lecture 36) generate exactly the r.e. sets.

Ryan Stansifer (CS, Florida Tech) Formal Languages 17 October 2019 138 / 141

Example 11.1

Step  Derivation                     Rule applied                        Eq.      Comment
 1    S ⇒ S V_{␣␣}                   S → S V_{␣␣}                        (11.6)   add trailing blanks
 2      ⇒ T V_{␣␣}                   S → T                               (11.6)   generate input
 3      ⇒ T V_{aa} V_{␣␣}            T → T V_{aa}                        (11.7)   an input character
 4      ⇒ V_{a0a} V_{aa} V_{␣␣}      T → V_{a0a}                         (11.7)   start state, 1st character
 5      ⇒ V_{aa} V_{a0a} V_{␣␣}      V_{a0a} V_{aa} → V_{aa} V_{a0a}     (11.8)   〈q0, a, q0, a, R〉 ∈ ∆
 6      ⇒ V_{aa} V_{aa} V_{␣0␣}      V_{a0a} V_{␣␣} → V_{aa} V_{␣0␣}     (11.8)   〈q0, a, q0, a, R〉 ∈ ∆
 7      ⇒ V_{aa} V_{a1a} V_{␣␣}      V_{aa} V_{␣0␣} → V_{a1a} V_{␣␣}     (11.9)   〈q0, ␣, q1, ␣, L〉 ∈ ∆
 8      ⇒ V_{aa} a V_{␣␣}            V_{a1a} → a                         (11.10)  restore input
 9      ⇒ V_{aa} a ␣                 a V_{␣␣} → a ␣                      (11.11)  restore input
10      ⇒ a a ␣                      V_{aa} a → a a                      (11.12)  restore input
11      ⇒ a a                        ␣ → ε                               (11.13)  restore input

Ryan Stansifer (CS, Florida Tech) Formal Languages 17 October 2019 139 / 141

Example 11.1

S ⇒ S V_{␣␣}                   S → S V_{␣␣} (11.6)
  ⇒ T V_{␣␣}
  ⇒ T V_{aa} V_{␣␣}
  ⇒ V_{a0a} V_{aa} V_{␣␣}      ⊢ 〈ε, q0, aa〉
  ⇒ V_{aa} V_{a0a} V_{␣␣}      ⊢ 〈a, q0, a〉
  ⇒ V_{aa} V_{aa} V_{␣0␣}      ⊢ 〈aa, q0, ␣〉
  ⇒ V_{aa} V_{a1a} V_{␣␣}      ⊢ 〈a, q1, a␣〉
  ⇒ V_{aa} a V_{␣␣}
  ⇒ V_{aa} a ␣
  ⇒ a a ␣
  ⇒ a a                        ␣ → ε (11.13)

Ryan Stansifer (CS, Florida Tech) Formal Languages 17 October 2019 140 / 141

References I

John Cocke and Jacob T. Schwartz. Programming Languages and their Compilers. Tech. rep. New York: NYU, Courant Institute, 1970.

T. Kasami. An efficient recognition and syntax analysis algorithm for context-free languages. Tech. rep. AFCRL-65-758. Air Force Cambridge Res. Lab., Bedford Mass., 1965.

D. H. Younger. “Recognition and Parsing of Context-Free Languages in n^3”. In: Information and Control 10 (1967), pp. 189–208.

Ryan Stansifer (CS, Florida Tech) Formal Languages 17 October 2019 141 / 141