Nowadays what's the most advanced fully formalized formal grammar for a natural language?
Short answer: For a single natural language, the English Resource Grammar (ERG), currently at its May 2025 release, is generally regarded as the most advanced fully formalised grammar that actually runs on a computer. It is hand-built in the HPSG formalism, parses more than 90% of open-domain English text, and produces Minimal Recursion Semantics (MRS) representations suitable as input to logic-based processing. (Delph-in)
Why the ERG stands out
| Aspect | What the ERG offers |
|---|---|
| Theoretical foundation | Head-Driven Phrase Structure Grammar (HPSG) with a rich type hierarchy of 1,100+ lexical and phrase-structure types |
| Coverage & precision | ~35,000 lexemes, ~200 syntactic rules; parses ≈ 94% of Wall Street Journal sentences and > 90% of contemporary web text while staying linguistically faithful (iafisher.com) |
| Semantics | Direct compositional output in MRS, compatible with downstream inference, QA, RTE and MT pipelines (Delph-in) |
| Tool chain | Ships as a precompiled grammar image (“erg-2025-x86-64-0.9.34.dat”) for the ACE parser, which PyDelphin can drive from Python; the grammar source also loads in the LKB; also accessible through a REST API (Delph-in) |
| Sustained maintenance | Active DELPH-IN community, annual releases; latest tagged 2025 |
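To make the "Semantics" row more concrete, here is a minimal sketch of what an MRS looks like as a data structure. It is a hand-written illustration for "Every dog barks", not actual ERG output: the class names `EP` and `MRS`, the helper `predications_of`, and the specific handle and variable names (`h0`, `x3`, ...) are assumptions made for this example. The predicate symbols follow the ERG's naming style.

```python
from dataclasses import dataclass, field


@dataclass
class EP:
    """One elementary predication: a labelled predicate with role arguments."""
    label: str      # handle labelling this predication, e.g. "h7"
    predicate: str  # predicate symbol, e.g. "_dog_n_1"
    args: dict      # role name -> variable or handle


@dataclass
class MRS:
    """A flat bag of predications plus scope constraints (toy version)."""
    top: str
    eps: list
    hcons: list = field(default_factory=list)  # (hole, "qeq", label) triples

    def predications_of(self, variable):
        """Return the predicate symbols of every EP that mentions `variable`."""
        return [ep.predicate for ep in self.eps if variable in ep.args.values()]


# Hand-built illustration of "Every dog barks" (not produced by the ERG).
every_dog_barks = MRS(
    top="h0",
    eps=[
        EP("h4", "_every_q", {"ARG0": "x3", "RSTR": "h5", "BODY": "h6"}),
        EP("h7", "_dog_n_1", {"ARG0": "x3"}),
        EP("h1", "_bark_v_1", {"ARG0": "e2", "ARG1": "x3"}),
    ],
    # The quantifier's restriction scopes over "dog"; its body is left open.
    hcons=[("h0", "qeq", "h1"), ("h5", "qeq", "h7")],
)

print(every_dog_barks.predications_of("x3"))
# → ['_every_q', '_dog_n_1', '_bark_v_1']
```

The shared variable `x3` is what ties the quantifier, the noun and the verb's argument together, while the `qeq` constraints leave quantifier scope underspecified until a later stage resolves it.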
Because every rule, feature structure and semantic predicate is explicitly encoded and machine-checkable, the ERG is what linguists call a precision grammar: it aims to accept only well-formed sentences, and every sentence it accepts receives an internally consistent syntactic and semantic analysis. That property makes it the go-to resource for deep NLP experiments and for testing linguistic hypotheses.
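The mechanism behind that accept-or-reject behaviour is unification of feature structures. The following is a toy sketch only, assuming feature structures are plain nested dicts with string atoms; real HPSG grammars such as the ERG use typed feature structures with a type hierarchy and structure sharing, none of which is modelled here. The function `unify` and the feature names are illustrative.

```python
def unify(a, b):
    """Unify two feature structures (nested dicts); return None on a clash."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None  # incompatible values somewhere below
                result[feature] = merged
            else:
                result[feature] = value
        return result
    # Atomic values unify only if they are identical.
    return a if a == b else None


# Toy lexical entries: what each noun phrase provides as a subject.
subject_dog = {"HEAD": "noun", "AGR": {"PER": "3", "NUM": "sg"}}
subject_dogs = {"HEAD": "noun", "AGR": {"PER": "3", "NUM": "pl"}}

# What the verb form "barks" requires of its subject.
barks_requires_subject = {"HEAD": "noun", "AGR": {"NUM": "sg"}}

print(unify(subject_dog, barks_requires_subject))
# → {'HEAD': 'noun', 'AGR': {'PER': '3', 'NUM': 'sg'}}
print(unify(subject_dogs, barks_requires_subject))
# → None   ("the dogs barks" is rejected)
```

A successful unification yields a single structure satisfying both sets of constraints; a clash means no analysis is built, which is how a precision grammar declines to parse ill-formed input.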
How does it compare to other large formal grammars?
| Project | Formalism | Languages | Maturity & scope |
|---|---|---|---|
| ParGram / XLE grammars | Lexical-Functional Grammar (LFG) | ≈ 10 major languages (EN, FR, DE, JA…) | High-quality parallel grammars but smaller coverage per language; used mainly in grammar-checking research (iafisher.com) |
| GF Resource Grammar Library (RGL) | Grammatical Framework (type-theoretic) | 45+ languages | Elegant multilingual framework for generation & controlled NL; not yet as deep or wide-coverage per language as ERG (grammaticalframework.org) |
| OpenCCG / CCGbank grammar | Combinatory Categorial Grammar | English | Grammar induced from CCGbank; broad but automatically derived, with less hand-verified precision than ERG (GitHub) |
| Other DELPH-IN resource grammars (Jacy, SRG, etc.) | HPSG | Japanese, Spanish, Norwegian… | Built with the same methodology as ERG; none yet match ERG’s size or stability (e.g. Spanish RG revival 2024) |
In short, those projects show that formalised, executable grammars are feasible for many languages, but the ERG remains the flagship in terms of depth, breadth, and semantic integration.
Take-away
If you need a fully formal, broad-coverage, linguistically rigorous grammar that you can parse and generate with today, the ERG is the state of the art. For cross-lingual work, look toward the GF RGL or the ParGram LFG suite, but be prepared for more limited coverage or weaker semantic output.
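If you want to try parsing with the ERG from Python, one route is PyDelphin's wrapper around the ACE parser. The sketch below assumes PyDelphin is installed, the `ace` binary is on your PATH, and you have a compiled ERG grammar image; the function name `parse_first_mrs` and the path `erg.dat` are placeholders for this example.

```python
def parse_first_mrs(grammar_path, sentence):
    """Parse `sentence` with ACE; return the top-ranked MRS as SimpleMRS text.

    Returns None when the grammar finds no analysis for the sentence.
    """
    # Imported lazily so this module loads even without PyDelphin installed.
    from delphin import ace
    from delphin.codecs import simplemrs

    response = ace.parse(grammar_path, sentence)
    results = response.results()
    if not results:
        return None
    return simplemrs.encode(results[0].mrs(), indent=True)


# Example call (requires ACE and a compiled grammar image; path is a placeholder):
#   print(parse_first_mrs("erg.dat", "Every dog barks."))
```

Returning None for an unparsed sentence is a design choice of this sketch; with a precision grammar, "no analysis" is meaningful information and should be handled separately from tooling errors.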