Monday, July 21, 2025

Natural language formal grammars

What is currently the most advanced fully formalized grammar for a natural language?

Short answer: For a single natural language, the English Resource Grammar (ERG), currently at its May 2025 release, is generally regarded as the most advanced fully formalised grammar that actually runs on a computer. It is hand-built in the HPSG formalism, covers more than 90% of open-domain English, and produces logic-ready Minimal Recursion Semantics (MRS) representations. (Delph-in)


Why the ERG stands out

Aspect | What the ERG offers
--- | ---
Theoretical foundation | Head-driven Phrase Structure Grammar (HPSG) with a rich type hierarchy of 1,100+ lexical/phrase-structure types
Coverage & precision | ~35,000 lexemes, ~200 syntactic rules; parses ≈ 94% of Wall Street Journal sentences and > 90% of contemporary web text while staying linguistically faithful (iafisher.com)
Semantics | Direct compositional output in MRS, compatible with downstream inference, QA, RTE and MT pipelines (Delph-in)
Tool chain | Ships as a binary grammar (“erg-2025-x86-64-0.9.34.dat”) for the ACE parser and for LKB/pyDelphin; also accessible through a REST API (Delph-in)
Sustained maintenance | Active DELPH-IN community, annual releases; latest tagged 2025
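
To make the MRS output mentioned in the table more concrete, here is a minimal sketch of how such a representation could be modelled as plain data. The class names (`EP`, `MRS`), the helper methods, and the hand-written analysis of "The dog barks" are illustrative assumptions, not actual ERG output or the pyDelphin API; only the general shape (labelled predications, shared variables, qeq handle constraints) follows the MRS formalism.

```python
from dataclasses import dataclass, field


@dataclass
class EP:
    """One elementary predication: a labelled predicate with role arguments."""
    label: str       # handle that labels this predication, e.g. "h7"
    predicate: str   # predicate symbol, e.g. "_dog_n_1"
    args: dict       # role name -> variable or handle


@dataclass
class MRS:
    """A toy MRS: top handle, index variable, predications, handle constraints."""
    top: str
    index: str
    eps: list
    hcons: list = field(default_factory=list)  # (hi, "qeq", lo) triples

    def unresolved_qeqs(self):
        """Return qeq constraints whose low handle labels no predication."""
        labels = {ep.label for ep in self.eps}
        return [c for c in self.hcons if c[2] not in labels]

    def predicates_on(self, variable):
        """List the predicates that take the given variable as an argument."""
        return [ep.predicate for ep in self.eps if variable in ep.args.values()]


# Hand-written illustration for "The dog barks" (not produced by a parser).
the_dog_barks = MRS(
    top="h0",
    index="e2",
    eps=[
        EP("h4", "_the_q", {"ARG0": "x3", "RSTR": "h5", "BODY": "h6"}),
        EP("h7", "_dog_n_1", {"ARG0": "x3"}),
        EP("h1", "_bark_v_1", {"ARG0": "e2", "ARG1": "x3"}),
    ],
    hcons=[("h0", "qeq", "h1"), ("h5", "qeq", "h7")],
)

print(the_dog_barks.predicates_on("x3"))
print(the_dog_barks.unresolved_qeqs())
```

The shared variable `x3` is what ties the quantifier, the noun and the verb together, and the qeq constraints leave quantifier scope underspecified, which is why MRS is convenient for downstream inference.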

Because every rule, feature structure and semantic predicate is machine-checkable, the ERG is what linguists call a precision grammar: if it accepts a sentence, it guarantees an internally consistent syntactic and semantic analysis, and it is designed to reject strings that are not well-formed English. That property makes it the go-to resource for deep NLP experiments and for testing linguistic hypotheses.
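
The precision-grammar property can be illustrated with a toy unification grammar. This is a heavily simplified sketch: the four-word lexicon, the single subject-verb rule and the function names are hypothetical, and real HPSG grammars such as the ERG use typed feature structures with structure sharing rather than plain dictionaries.

```python
def unify(a, b):
    """Unify two feature structures (nested dicts with atomic string values).

    Returns the merged structure, or None when two values clash.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        merged = dict(a)
        for feature, value in b.items():
            if feature in merged:
                sub = unify(merged[feature], value)
                if sub is None:
                    return None  # clash somewhere below this feature
                merged[feature] = sub
            else:
                merged[feature] = value
        return merged
    # Atomic values (or a dict/atom mismatch) unify only if they are equal.
    return a if a == b else None


# Hypothetical toy lexicon; "bark" is treated as plural-only for simplicity.
LEXICON = {
    "dog": {"CAT": "noun", "AGR": {"NUM": "sg", "PER": "3"}},
    "dogs": {"CAT": "noun", "AGR": {"NUM": "pl", "PER": "3"}},
    "barks": {"CAT": "verb", "AGR": {"NUM": "sg", "PER": "3"}},
    "bark": {"CAT": "verb", "AGR": {"NUM": "pl"}},
}


def parse_subject_verb(subject, verb):
    """Apply one toy rule, S -> N V, which requires the AGR values to unify."""
    noun, head = LEXICON.get(subject), LEXICON.get(verb)
    if noun is None or head is None:
        return None  # unknown word
    if noun["CAT"] != "noun" or head["CAT"] != "verb":
        return None  # wrong categories for this rule
    agreement = unify(noun["AGR"], head["AGR"])
    if agreement is None:
        return None  # agreement clash: the string is rejected
    return {"CAT": "S", "AGR": agreement, "SUBJ": subject, "HEAD": verb}


print(parse_subject_verb("dog", "barks"))   # accepted, returns an analysis
print(parse_subject_verb("dogs", "barks"))  # rejected, returns None
```

An accepted string always comes with a consistent analysis, and an agreement violation yields no analysis at all rather than a best guess. That is the behaviour that distinguishes a precision grammar from a robust statistical parser.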


How does it compare to other large formal grammars?

Project | Formalism | Languages | Maturity & scope
--- | --- | --- | ---
ParGram / XLE grammars | Lexical-Functional Grammar (LFG) | ≈ 10 major languages (EN, FR, DE, JA…) | High-quality parallel grammars but smaller coverage per language; used mainly in grammar-checking research (iafisher.com)
GF Resource Grammar Library (RGL) | Type-theoretical GF | 45+ languages | Elegant multilingual framework for generation & controlled NL; not yet as deep or wide-coverage per language as ERG (grammaticalframework.org)
OpenCCG / CCGBank grammar | Combinatory Categorial Grammar | English | Grammar induced from CCGBank; broad but automatically derived, with less hand-verified precision than ERG (GitHub)
Resource Grammars (Jacy, SRG, etc.) | HPSG | Japanese, Spanish, Norwegian… | Built with the same methodology as ERG; none yet match ERG’s size or stability (e.g. Spanish RG revival 2024)

In short, those projects show that formalised, executable grammars are feasible for many languages, but the ERG remains the flagship in terms of depth, breadth, and semantic integration.


Take-away

If you need a fully formal, broad-coverage, linguistically rigorous grammar that you can parse and generate with today, the ERG is the state of the art. For cross-lingual work, look toward the GF RGL or the ParGram LFG suite, but be prepared for more limited coverage or weaker semantic output.
