Context-free Grammars - Natural & Programming Languages

Laureats’ Visit, July 19, 2013

Outline: Linguistic Roots, ALGOL, Parsing, Other Syntax Models


Example of a Programming Language: Go

- designed at Google (version 1.0 released in 2012)
- documentation: specifies the syntax
- uses a context-free grammar


- tool shipped with Go: YACC
- generates a parser from a grammar
- allows for creating, editing, and adapting the syntax of programming languages


Pāṇini (∼350 BC): Aṣṭādhyāyī

- Sanskrit grammar
- about 4000 rules
- formal rules:

    A → B / C _ D

  “rewrite A to B in the context C _ D”
- auxiliary symbols


Chomsky (1956): Three Models for the Description of Language

1. finite-state automata

2. phrase-structure grammars

3. transformational grammars

[Photo: N. Chomsky]


Modeling

- a language is modeled as a set of sentences
- syntax vs. semantics:
    The child eats a tomato.
    A tomato eats the child.
    *A tomato the child eats.
- competence vs. performance:
    The child eats a nice tomato.
    The child eats a nice round tomato.
    The child eats a nice red round tomato.
    ...


Constituents Analysis

[[The child] [eats [a tomato]]].
[[The child] [eats [a [nice tomato]]]].


Constituents Analysis (ctd.)

P
  NP
    det  The
    AP
      n  child
  VP
    v  eats
    NP
      det  a
      AP
        adj  nice
        AP
          n  tomato


Context-free Grammars

Special case of phrase-structure grammars: empty contexts

P → NP VP
NP → det AP
VP → v NP
AP → adj AP | n
det → The | a
n → child | tomato
v → eats
adj → nice | red | round
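The rules above can be tried out directly. The following is a minimal sketch, assuming a naive top-down search is acceptable for such a small grammar; the names `GRAMMAR`, `parse`, and `derives` are illustrative and not taken from the slides.

```python
# Illustrative encoding of the grammar above: each nonterminal maps to its alternatives.
GRAMMAR = {
    "P":   [["NP", "VP"]],
    "NP":  [["det", "AP"]],
    "VP":  [["v", "NP"]],
    "AP":  [["adj", "AP"], ["n"]],
    "det": [["The"], ["a"]],
    "n":   [["child"], ["tomato"]],
    "v":   [["eats"]],
    "adj": [["nice"], ["red"], ["round"]],
}


def parse(symbol, words, i):
    """Yield every position j such that `symbol` derives words[i:j]."""
    if symbol not in GRAMMAR:              # terminal: must match the next word
        if i < len(words) and words[i] == symbol:
            yield i + 1
        return
    for rhs in GRAMMAR[symbol]:            # try each alternative in turn
        positions = [i]
        for part in rhs:                   # thread positions through the right-hand side
            positions = [k for j in positions for k in parse(part, words, j)]
        yield from positions


def derives(sentence):
    """True if the whole sentence can be derived from the start symbol P."""
    words = sentence.split()
    return len(words) in parse("P", words, 0)


print(derives("The child eats a nice red round tomato"))
print(derives("a tomato The child eats"))
```

The search terminates here because the grammar has no left-recursive rule; a left-recursive grammar would need a different strategy (for instance a chart parser).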


Backus (1959); Naur (1960): Algol 60

- ALGOrithmic Language
- standard syntax

〈statement〉 → 〈unconditional statement〉 | 〈conditional statement〉
〈unconditional statement〉 → 〈for statement〉
〈conditional statement〉 → 〈if statement〉 | 〈if statement〉 else 〈statement〉
〈if statement〉 → 〈if clause〉 〈unconditional statement〉
〈if clause〉 → if 〈boolean expression〉 then

[Photo: J. Backus]


Ginsburg and Rice (1962): Two families of languages related to ALGOL

- connection between Algol and Chomsky’s work
- multidisciplinary research:
  - linguistics
  - programming languages
  - theoretical computer science (Chomsky, 1959; Bar-Hillel et al., 1961; Chomsky and Schützenberger, 1963, ...)

[Photos: Y. Bar-Hillel, M.P. Schützenberger]


Pushdown Automata

Yngve (1960); Oettinger (1961); Chomsky (1962)

- operational model, easy implementation
- expressivity equivalent to that of context-free grammars
- idea of parsing: generate a pushdown automaton from a grammar


Pushdown Automata (ctd.)

(q, ε, ⊥, ε, qf)
(q, ε, P, NP VP, q)
(q, ε, NP, det AP, q)
...
(q, ε, det, The, q)
(q, ε, det, a, q)
(q, The, The, ε, q)
(q, a, a, ε, q)
...
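The construction can be simulated in a few lines. This is a sketch under stated assumptions: each transition is read as (state, word read, symbol popped, symbols pushed, next state), the initial stack holds P on top of ⊥, and nondeterminism is resolved by exhaustive search. The function names are illustrative.

```python
# Grammar from the earlier slide; one expansion transition is built per production.
GRAMMAR = {
    "P":   [["NP", "VP"]],
    "NP":  [["det", "AP"]],
    "VP":  [["v", "NP"]],
    "AP":  [["adj", "AP"], ["n"]],
    "det": [["The"], ["a"]],
    "n":   [["child"], ["tomato"]],
    "v":   [["eats"]],
    "adj": [["nice"], ["red"], ["round"]],
}


def build_transitions(grammar):
    """Return transitions (state, read, pop, push, state); read=None stands for ε."""
    delta = [("q", None, "⊥", [], "qf")]               # pop the bottom marker and stop
    words = set()
    for lhs, alternatives in grammar.items():
        for rhs in alternatives:
            delta.append(("q", None, lhs, rhs, "q"))    # (q, ε, A, α, q)
            words.update(s for s in rhs if s not in grammar)
    for w in sorted(words):
        delta.append(("q", w, w, [], "q"))              # (q, w, w, ε, q)
    return delta


def accepts(sentence, grammar=GRAMMAR, start="P"):
    """Explore all runs of the automaton; accept in qf with the input consumed."""
    words = sentence.split()
    delta = build_transitions(grammar)
    # A configuration is (state, input position, stack); the stack top is the last item.
    todo = [("q", 0, ["⊥", start])]
    while todo:
        state, i, stack = todo.pop()
        if state == "qf" and i == len(words):
            return True
        if not stack:
            continue
        for source, read, pop, push, target in delta:
            if source != state or pop != stack[-1]:
                continue
            if read is None:                             # ε-move: expand or finish
                todo.append((target, i, stack[:-1] + push[::-1]))
            elif i < len(words) and words[i] == read:    # consume one input word
                todo.append((target, i + 1, stack[:-1] + push[::-1]))
    return False


print(accepts("The child eats a nice tomato"))
print(accepts("The child eats"))
```

The pushed symbols are reversed so that the leftmost symbol of a right-hand side ends up on top of the stack. The exhaustive search is only guaranteed to stop for grammars without left recursion, which is the case here.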


Issues

- Floyd (1962b): Algol 60 is not context-free:

    begin
      real x;
      y := 3
    end

  is only correct if the two identifiers x and y are the same.
- separation into lexical analysis, parsing, and semantic analysis

[Photo: R.W. Floyd]


Issues (ctd.)

- Cantor (1962); Floyd (1962a): Algol 60 is ambiguous: several possible analyses for some programs
- inherently ambiguous languages (Parikh, 1966; Ginsburg and Ullian, 1966)
- undecidable properties

[Photo: R. Parikh]


Issues (ctd.)

- the first parsers impose very stringent restrictions on grammars (Irons, 1961)
- ideally: deterministic pushdown automata (Ginsburg and Greibach, 1966), but these cannot be derived from every grammar
- undecidable properties

[Photo: S. Greibach]


... and Answers

- parser generators for larger and larger classes of grammars
- Knuth (1965): LR parsing for all the deterministic languages
- DeRemer (1969): simplifications (SLR & LALR)
- YACC (Johnson, 1975): LALR(1) parser generator

[Photo: D.E. Knuth]


Today

All the mainstream programming languages are shipped with

- a context-free grammar that specifies their syntax
- a parser generator (most likely a YACC variant) for writing parsers for new languages


Syntax Models

- context-free grammars (rewriting systems)
- pushdown automata (transition systems)
- algebraic equations (equation systems)
- categorial grammars (proof systems)
- dynamic logic on trees (model theory)


Algebraic Equations

(Ginsburg and Rice, 1962; Chomsky and Schützenberger, 1963)

Minimal solutions of a system

P = NP · VP
NP = det · AP
VP = v · NP
AP = adj · AP ∪ n
det = {The} ∪ {a}
n = {child} ∪ {tomato}
v = {eats}
adj = {nice} ∪ {round} ∪ {red}
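The minimal solution can be approximated by iterating the equations from empty languages until nothing changes. The sketch below assumes a length bound (`MAX_LEN`, an illustrative parameter) because the solution for AP, and hence for P, is an infinite language; words are represented as tuples of tokens.

```python
MAX_LEN = 6  # assumed bound on sentence length, needed because the languages are infinite


def concat(xs, ys):
    """Concatenation of two languages, truncated to words of at most MAX_LEN tokens."""
    return {x + y for x in xs for y in ys if len(x + y) <= MAX_LEN}


def step(env):
    """Apply the right-hand side of every equation once."""
    return {
        "P":   concat(env["NP"], env["VP"]),
        "NP":  concat(env["det"], env["AP"]),
        "VP":  concat(env["v"], env["NP"]),
        "AP":  concat(env["adj"], env["AP"]) | env["n"],
        "det": {("The",), ("a",)},
        "n":   {("child",), ("tomato",)},
        "v":   {("eats",)},
        "adj": {("nice",), ("round",), ("red",)},
    }


def least_solution():
    """Iterate from the empty languages until a fixpoint is reached."""
    env = {name: set() for name in ("P", "NP", "VP", "AP", "det", "n", "v", "adj")}
    while True:
        new = step(env)
        if new == env:
            return env
        env = new


solution = least_solution()
print(("The", "child", "eats", "a", "nice", "tomato") in solution["P"])
```

Each step only adds words, and there are finitely many words of bounded length, so the iteration stops; without the bound it would converge only in the limit.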


Categorial Grammars

(Bar-Hillel, 1953; Lambek, 1958)

Categories built using left and right quotients over a finite set of symbols A:

γ ::= A | γ1\γ2 | γ1/γ2    (categories)

Deduction rules:

(Lexicon)   w ⊢ γ    (axiom, for the entries of the lexicon)

(\)   w1 ⊢ γ1    w2 ⊢ γ1\γ2
      ----------------------
          w1 · w2 ⊢ γ2

(/)   w1 ⊢ γ2/γ1    w2 ⊢ γ1
      ----------------------
          w1 · w2 ⊢ γ2

[Photo: J. Lambek]


Proofs

Example

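1. The ⊢ NP/n and child ⊢ n give The child ⊢ NP (rule /)
2. a ⊢ NP/n and tomato ⊢ n give a tomato ⊢ NP (rule /)
3. eats ⊢ (NP\P)/NP and a tomato ⊢ NP give eats a tomato ⊢ NP\P (rule /)
4. The child ⊢ NP and eats a tomato ⊢ NP\P give The child eats a tomato ⊢ P (rule \)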
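Proof search with the two quotient rules can be sketched as a chart parser. The code below is an illustration, not the method of the cited papers: it reads γ1\γ2 as in the deduction rules (argument γ1 on the left, result γ2), so the verb is entered as (NP\P)/NP, and the helper names `under`, `over`, `combine`, and `categories` are assumptions.

```python
def under(arg, res):
    """Category arg\\res: combines with `arg` on its left and yields `res`."""
    return ("\\", arg, res)


def over(res, arg):
    """Category res/arg: combines with `arg` on its right and yields `res`."""
    return ("/", res, arg)


# Illustrative lexicon for the example sentence.
LEXICON = {
    "The":    {over("NP", "n")},
    "a":      {over("NP", "n")},
    "child":  {"n"},
    "tomato": {"n"},
    "eats":   {over(under("NP", "P"), "NP")},
}


def combine(left, right):
    """Categories of w1·w2 obtained from one category of w1 and one of w2."""
    out = set()
    if isinstance(left, tuple) and left[0] == "/" and left[2] == right:
        out.add(left[1])                       # rule /
    if isinstance(right, tuple) and right[0] == "\\" and right[1] == left:
        out.add(right[2])                      # rule \
    return out


def categories(sentence):
    """All categories derivable for the whole sentence (bottom-up chart)."""
    words = sentence.split()
    n = len(words)
    chart = {(i, i + 1): set(LEXICON.get(w, ())) for i, w in enumerate(words)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            k = i + width
            cell = set()
            for j in range(i + 1, k):          # every way to split words[i:k]
                for l in chart[(i, j)]:
                    for r in chart[(j, k)]:
                        cell |= combine(l, r)
            chart[(i, k)] = cell
    return chart.get((0, n), set())


print("P" in categories("The child eats a tomato"))
```

A sentence is accepted when the category P is derivable for the full string; intermediate cells of the chart correspond to the intermediate conclusions of the proof.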


Logics on Trees

(Blackburn et al., 1993; Afanasiev et al., 2005)

Modal logic on a set of atomic propositions p

ϕ ::= ⊤ | p | ¬ϕ | ϕ1 ∧ ϕ2 | 〈π〉ϕ    (formulæ)
π ::= → | ← | ↓ | ↑ | π∗             (relations)

[Photo: P. Blackburn]


Models

In an ordered finite labeled tree t, at a node n:

t,n ⊨ ⊤
t,n ⊨ p          if the label of n is p
t,n ⊨ ¬ϕ         if t,n ⊭ ϕ
t,n ⊨ ϕ1 ∧ ϕ2    if t,n ⊨ ϕ1 and t,n ⊨ ϕ2
t,n ⊨ 〈π〉ϕ      if ∃n′, n π n′ and t,n′ ⊨ ϕ
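The satisfaction clauses translate almost line by line into a recursive checker. The sketch below makes several assumptions that are not stated on the slide: → and ← relate a node to its immediate right and left sibling, ↓ and ↑ to a child and the parent, and π∗ is the reflexive-transitive closure. Formulæ are encoded as tuples and all names are illustrative.

```python
class Node:
    """Node of an ordered labeled tree; children are kept in left-to-right order."""
    def __init__(self, label, *children):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def successors(node, path):
    """All nodes m such that `node path m` holds."""
    if path == "down":
        return list(node.children)
    if path == "up":
        return [node.parent] if node.parent is not None else []
    if path in ("left", "right"):
        if node.parent is None:
            return []
        siblings = node.parent.children
        i = siblings.index(node) + (1 if path == "right" else -1)
        return [siblings[i]] if 0 <= i < len(siblings) else []
    if isinstance(path, tuple) and path[0] == "star":   # reflexive-transitive closure
        seen, todo = {node}, [node]
        while todo:
            for m in successors(todo.pop(), path[1]):
                if m not in seen:
                    seen.add(m)
                    todo.append(m)
        return list(seen)
    raise ValueError(f"unknown relation: {path!r}")


def holds(node, phi):
    """Decide t,n ⊨ ϕ following the clauses above."""
    kind = phi[0]
    if kind == "top":
        return True
    if kind == "prop":
        return node.label == phi[1]
    if kind == "not":
        return not holds(node, phi[1])
    if kind == "and":
        return holds(node, phi[1]) and holds(node, phi[2])
    if kind == "dia":
        return any(holds(m, phi[2]) for m in successors(node, phi[1]))
    raise ValueError(f"unknown formula: {kind!r}")


def conj(first, *rest):
    """n-ary conjunction built from the binary one."""
    for f in rest:
        first = ("and", first, f)
    return first


TOP = ("top",)
# 〈↓〉(NP ∧ 〈→〉VP ∧ ¬〈←〉⊤ ∧ 〈→〉¬〈→〉⊤): the children are exactly NP followed by VP
rule_P = ("dia", "down", conj(("prop", "NP"),
                              ("dia", "right", ("prop", "VP")),
                              ("not", ("dia", "left", TOP)),
                              ("dia", "right", ("not", ("dia", "right", TOP)))))

tree = Node("P",
            Node("NP", Node("det", Node("The")), Node("AP", Node("n", Node("child")))),
            Node("VP", Node("v", Node("eats")),
                 Node("NP", Node("det", Node("a")), Node("AP", Node("n", Node("tomato"))))))

print(holds(tree, rule_P))
```

The formula `rule_P` is the body of the constraint for P: evaluated at the root of the constituent tree it holds, while at the VP node it does not.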


Formulæ

Example

P ∧ [↓∗][→∗](
    ⋁_{X ∈ Σ⊎N} (X ∧ ⋀_{Y ≠ X} ¬Y)
  ∧ ((¬〈↓〉⊤) ≡ ⋁_{a ∈ Σ} a)
  ∧ ((〈↓〉⊤) ≡ ⋁_{A ∈ N} A)
  ∧ (P ⊃ 〈↓〉(NP ∧ 〈→〉VP ∧ ¬〈←〉⊤ ∧ 〈→〉¬〈→〉⊤))
  ∧ (AP ⊃ 〈↓〉(adj ∧ 〈→〉AP ∧ ¬〈←〉⊤ ∧ 〈→〉¬〈→〉⊤)
          ∨ 〈↓〉(n ∧ ¬〈←〉⊤ ∧ ¬〈→〉⊤))
  ∧ (det ⊃ 〈↓〉(The ∧ ¬〈←〉⊤ ∧ ¬〈→〉⊤)
           ∨ 〈↓〉(a ∧ ¬〈←〉⊤ ∧ ¬〈→〉⊤))
  ∧ ...)


References

Afanasiev, L., Blackburn, P., Dimitriou, I., Gaiffe, B., Goris, E., Marx, M., and de Rijke, M., 2005. PDL for ordered trees. Journal of Applied Non-Classical Logic, 15(2):115–135. doi:10.3166/jancl.15.115-135.

Aho, A.V., Johnson, S.C., and Ullman, J.D., 1975. Deterministic parsing of ambiguous grammars. Communications of the ACM, 18(8):441–452. doi:10.1145/360933.360969.

Backus, J.W., 1959. The syntax and semantics of the proposed international algebraic language of the Zurich ACM-GAMM Conference. In IFIP Congress, pages 125–131.

Bar-Hillel, Y., Perles, M., and Shamir, E., 1961. On formal properties of simple phrase-structure grammars. Zeitschrift für Phonetik, Sprachwissenschaft, und Kommunikationsforschung, 14:143–172.

Bar-Hillel, Y., 1953. A quasi-arithmetical notation for syntactic description. Language, 29(1):47–58. doi:10.2307/410452.

Blackburn, P., Gardent, C., and Meyer-Viol, W., 1993. Talking about trees. In EACL ’93, pages 21–29. ACL Press. doi:10.3115/976744.976748.

Cantor, D.G., 1962. On the ambiguity problem of Backus systems. Journal of the ACM, 9(4):477–479. doi:10.1145/321138.321145.

Chomsky, N., 1956. Three models for the description of language. IEEE Transactions on Information Theory, 2(3):113–124. doi:10.1109/TIT.1956.1056813.

Chomsky, N., 1959. On certain formal properties of grammars. Information and Control, 2(2):137–167. doi:10.1016/S0019-9958(59)90362-6.

Chomsky, N., 1962. Context-free grammars and pushdown storage. Quarterly Progress Report 65, Research Laboratory of Electronics, M.I.T.

Chomsky, N. and Schützenberger, M.P., 1963. The algebraic theory of context-free languages. In Braffort, P. and Hirschberg, D., editors, Computer Programming and Formal Systems, volume 35 of Studies in Logic, pages 118–161. North-Holland Publishing. doi:10.1016/S0049-237X(08)72023-8.

DeRemer, F.L., 1969. Practical Translators for LR(k) Languages. PhD thesis, Massachusetts Institute of Technology, Cambridge, Massachusetts. http://www.lcs.mit.edu/publications/pubs/pdf/MIT-LCS-TR-065.pdf.

Earley, J., 1975. Ambiguity and precedence in syntax description. Acta Informatica, 4(2):183–192. doi:10.1007/BF00288747.

Floyd, R.W., 1962a. On ambiguity in phrase structure languages. Communications of the ACM, 5(10):526. doi:10.1145/368959.368993.

Floyd, R.W., 1962b. On the nonexistence of a phrase structure grammar for ALGOL 60. Communications of the ACM, 5(9):483–484. doi:10.1145/368834.368898.

Ginsburg, S. and Rice, H.G., 1962. Two families of languages related to ALGOL. Journal of the ACM, 9(3):350–371. doi:10.1145/321127.321132.

Ginsburg, S. and Greibach, S., 1966. Deterministic context-free languages. Information and Control, 9(6):620–648. doi:10.1016/S0019-9958(66)80019-0.


Ginsburg, S. and Ullian, J., 1966. Ambiguity in context free languages. Journal of the ACM, 13(1):62–89. doi:10.1145/321312.321318.

Irons, E.T., 1961. A syntax directed compiler for ALGOL 60. Communications of the ACM, 4(1):51–55. doi:10.1145/366062.366083.

Johnson, S.C., 1975. YACC: yet another compiler compiler. Computing science technical report 32, AT&T Bell Laboratories, Murray Hill, New Jersey.

Knuth, D.E., 1965. On the translation of languages from left to right. Information and Control, 8(6):607–639. doi:10.1016/S0019-9958(65)90426-2.

Lambek, J., 1958. The mathematics of sentence structure. American Mathematical Monthly, 65(3):154–170. doi:10.2307/2310058.

Naur, P., editor, 1960. Report on the algorithmic language ALGOL 60. Communications of the ACM, 3(5):299–314. doi:10.1145/367236.367262.

Oettinger, A.G., 1961. Automatic syntactic analysis and the pushdown store. In Structure of Language and its Mathematical Aspects, volume 12 of Proc. of Symposia in Applied Math., pages 104–129. AMS.

Parikh, R.J., 1966. On context-free languages. Journal of the ACM, 13(4):570–581. doi:10.1145/321356.321364.

Yngve, V.H., 1960. A model and an hypothesis for language structure. Proceedings of the American Philosophical Society, 104(5):444–466.
