Parsing Expression Grammars: A Recognition-Based Syntactic Foundation
Bryan Ford
Massachusetts Institute of Technology
Symposium on Principles of Programming Languages,
January 14-16, 2004, Venice, Italy
Abstract
For decades we have been using Chomsky's generative system of grammars,
particularly context-free grammars (CFGs) and regular expressions (REs),
to express the syntax of
programming languages and protocols.
The power of generative grammars to express ambiguity
is crucial to their original purpose of modelling natural languages,
but this very power makes it unnecessarily difficult
both to express and to parse machine-oriented languages using CFGs.
Parsing Expression Grammars (PEGs) provide
an alternative, recognition-based formal foundation
for describing machine-oriented syntax,
which solves the ambiguity problem
by not introducing ambiguity in the first place.
Where CFGs express nondeterministic choice between alternatives,
PEGs instead use prioritized choice.
PEGs address frequently felt expressiveness limitations
of CFGs and REs,
simplifying syntax definitions and making it unnecessary
to separate their lexical and hierarchical components.
A linear-time parser can be built for any PEG,
avoiding both the complexity and fickleness of LR parsers
and the inefficiency of generalized CFG parsing.
While PEGs provide a rich set of operators
for constructing grammars,
they are reducible
to two minimal recognition schemas developed around 1970,
TS/TDPL and gTS/GTDPL,
which are here proven equivalent in effective recognition power.