refactor: split large grammar.js into modules #341

vergenzt · 2025-12-12T22:19:34Z

I tried to make no semantic changes to anything, but that could use some verifying. I did make substantial editorial decisions about what "belongs" where -- totally open to feedback on any of it.

I may try to do some sort of automated analysis to confirm that all of the rules expressions from before and after are the same. We'll see.

Would you be open to merging something like this?

I tried to make no semantic changes to anything, but that could use some verifying.

vergenzt · 2025-12-13T19:45:48Z

Okay I was able to figure out a good way to do that validation. I used an ast-grep rule to fetch all relevant grammar rule JS key/value pairs, normalized whitespace for each (since I dedented when extracting to the grammar/ folder), then sorted and wrote to a file and was able to diff the files.

On both main and this branch, I ran the following:

ast-grep scan --json --inline-rules='id: _
language: js
rule:
  kind: pair
  inside:
    kind: object
    inside:
      any:
        - pattern: module.exports = { $$$ }
        - kind: pair
          has:
            field: key
            regex: rules
' grammar* | jq '.[].text | gsub("\\\\s+"; " ")' | sort > rules-$(git rev-parse HEAD)

(edit 2025-12-29 to remove indent which arose from copy/paste)

This then produced the following files:

rules-224f3a49757b4997fc89b89fe31b84c635002534 (all rules detected at 224f3a4, tip of this branch)
rules-fe77f6868d6cdea593052a6af390116495093dc1 (all rules detected at fe77f68, current tip of main)

Then, git diff --no-index rules-fe77f6868d6cdea593052a6af390116495093dc1 rules-224f3a49757b4997fc89b89fe31b84c635002534 for me yields:

diff --git a/rules-fe77f6868d6cdea593052a6af390116495093dc1 b/rules-224f3a49757b4997fc89b89fe31b84c635002534
index c025468..320e991 100644
--- a/rules-fe77f6868d6cdea593052a6af390116495093dc1
+++ b/rules-224f3a49757b4997fc89b89fe31b84c635002534
@@ -148,7 +148,7 @@
 "direction: $ => choice($.keyword_desc, $.keyword_asc)"
 "distinct_from: $ => seq($.keyword_is, $.keyword_distinct, $.keyword_from)"
 "dollar_quote: () => /\\$[^\\$]*\\$/"
-"double: $ => choice( make_keyword(\"float8\"), unsigned_type($, parametric_type($, $.keyword_double, ['precision', 'scale'])), unsigned_type($, parametric_type($, seq($.keyword_double, $.keyword_precision), ['precision', 'scale'])), unsigned_type($, parametric_type($, $.keyword_real, ['precision', 'scale'])), )"
+"double: $ => choice( $.keyword_float8, unsigned_type($, parametric_type($, $.keyword_double, ['precision', 'scale'])), unsigned_type($, parametric_type($, seq($.keyword_double, $.keyword_precision), ['precision', 'scale'])), unsigned_type($, parametric_type($, $.keyword_real, ['precision', 'scale'])), )"
 "drop_column: $ => seq( $.keyword_drop, optional( $.keyword_column, ), optional($._if_exists), field('name', $.identifier), )"
 "drop_constraint: $ => seq( $.keyword_drop, $.keyword_constraint, optional($._if_exists), $.identifier, optional($._drop_behavior), )"
 "drop_database: $ => prec.left(seq( $.keyword_drop, $.keyword_database, optional($._if_exists), $.identifier, optional($.keyword_with), optional($.keyword_force), ))"
@@ -306,6 +306,7 @@
 "keyword_filter: _ => make_keyword(\"filter\")"
 "keyword_first: _ => make_keyword(\"first\")"
 "keyword_float: _ => make_keyword(\"float\")"
+"keyword_float8: _ => make_keyword(\"float8\")"
 "keyword_following: _ => make_keyword(\"following\")"
 "keyword_follows: _ => make_keyword(\"follows\")"
 "keyword_for: _ => make_keyword(\"for\")"

... which of course now reminds me that I did change one thing, which was to extract a separate keyword_float8 rule instead of the previous inline call to make_keyword. 😄

Looks like that's confirmed to be the only change though!!

DerekStride · 2025-12-15T19:37:28Z

I like the idea here, splitting everything out into separate modules seems nice, are you aware of other projects that make use of this pattern? We're already quite different in that we don't commit the generated code so I worry about adding more friction.

vergenzt · 2025-12-28T01:03:06Z

I like the idea here, splitting everything out into separate modules seems nice, are you aware of other projects that make use of this pattern? We're already quite different in that we don't commit the generated code so I worry about adding more friction.

From some searching, the following appear to be tree-sitter grammar definitions which require pieces of their definitions from other files:

To answer your question more directly though, no I'm not aware of specific other projects which split things out. I believe this is also the first major tree-sitter grammar I've attempted to make a contribution to -- but I'm pretty sure it's also the largest / most complex grammar I've encountered.

IMO this will substantially reduce friction for new contributors because potential contributions will have a clearer place to go.

to avoid changing any parsed output (which would require changes to highlights.scm and test/corpus)

vergenzt · 2025-12-29T18:41:16Z

FYI I pushed 8f8e5fd to revert the (admittedly minor) keyword change, and reran the diff script. Diff output from ebf08ba (now-current tip of main) to 8f8e5fd (new tip of this PR) is now empty.

(You can see full output here: https://gist.github.com/vergenzt/1614ef579a3ba452050b2338cfd7adcf)

vergenzt · 2025-12-29T18:53:57Z

If you're not interested in this would you at least be open to some sort of well-defined order for rules in grammar.js? Maybe a topological sort or something?

DerekStride · 2026-01-05T18:49:44Z

@matthias-Q Do you have any objections? If not, I'm good to merge these changes.

matthias-Q · 2026-01-05T19:16:45Z

No, I am totally fine with it.

vergenzt · 2026-01-09T01:01:24Z

Awesome! I'll try to get conflicts resolved asap and get this back to being ready to merge.

refactor: split large grammar.js into modules

224f3a4

I tried to make no semantic changes to anything, but that could use some verifying.

vergenzt added 2 commits December 27, 2025 20:03

update artifact builds to include grammar/**

3fb03c7

undo float8 keyword extraction

8f8e5fd

to avoid changing any parsed output (which would require changes to highlights.scm and test/corpus)

vergenzt added 2 commits December 29, 2025 13:41

Merge branch 'main' into vergenzt/modularize-grammar-js

6dac84f

whoops - correctly import make_keyword

36abeaa

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

refactor: split large grammar.js into modules #341

refactor: split large grammar.js into modules #341

Uh oh!

vergenzt commented Dec 12, 2025 •

edited

Loading

Uh oh!

vergenzt commented Dec 13, 2025 •

edited

Loading

Uh oh!

DerekStride commented Dec 15, 2025

Uh oh!

vergenzt commented Dec 28, 2025 •

edited

Loading

Uh oh!

vergenzt commented Dec 29, 2025 •

edited

Loading

Uh oh!

vergenzt commented Dec 29, 2025

Uh oh!

DerekStride commented Jan 5, 2026 •

edited

Loading

Uh oh!

matthias-Q commented Jan 5, 2026

Uh oh!

vergenzt commented Jan 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

refactor: split large grammar.js into modules #341

Are you sure you want to change the base?

refactor: split large grammar.js into modules #341

Uh oh!

Conversation

vergenzt commented Dec 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vergenzt commented Dec 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DerekStride commented Dec 15, 2025

Uh oh!

vergenzt commented Dec 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vergenzt commented Dec 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vergenzt commented Dec 29, 2025

Uh oh!

DerekStride commented Jan 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

matthias-Q commented Jan 5, 2026

Uh oh!

vergenzt commented Jan 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

vergenzt commented Dec 12, 2025 •

edited

Loading

vergenzt commented Dec 13, 2025 •

edited

Loading

vergenzt commented Dec 28, 2025 •

edited

Loading

vergenzt commented Dec 29, 2025 •

edited

Loading

DerekStride commented Jan 5, 2026 •

edited

Loading