-
Notifications
You must be signed in to change notification settings - Fork 86
refactor: split large grammar.js into modules #341
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
refactor: split large grammar.js into modules #341
Conversation
I tried to make no semantic changes to anything, but that could use some verifying.
|
Okay I was able to figure out a good way to do that validation. I used an On both ast-grep scan --json --inline-rules='id: _
language: js
rule:
kind: pair
inside:
kind: object
inside:
any:
- pattern: module.exports = { $$$ }
- kind: pair
has:
field: key
regex: rules
' grammar* | jq '.[].text | gsub("\\\\s+"; " ")' | sort > rules-$(git rev-parse HEAD)(edit 2025-12-29 to remove indent which arose from copy/paste) This then produced the following files:
Then, diff --git a/rules-fe77f6868d6cdea593052a6af390116495093dc1 b/rules-224f3a49757b4997fc89b89fe31b84c635002534
index c025468..320e991 100644
--- a/rules-fe77f6868d6cdea593052a6af390116495093dc1
+++ b/rules-224f3a49757b4997fc89b89fe31b84c635002534
@@ -148,7 +148,7 @@
"direction: $ => choice($.keyword_desc, $.keyword_asc)"
"distinct_from: $ => seq($.keyword_is, $.keyword_distinct, $.keyword_from)"
"dollar_quote: () => /\\$[^\\$]*\\$/"
-"double: $ => choice( make_keyword(\"float8\"), unsigned_type($, parametric_type($, $.keyword_double, ['precision', 'scale'])), unsigned_type($, parametric_type($, seq($.keyword_double, $.keyword_precision), ['precision', 'scale'])), unsigned_type($, parametric_type($, $.keyword_real, ['precision', 'scale'])), )"
+"double: $ => choice( $.keyword_float8, unsigned_type($, parametric_type($, $.keyword_double, ['precision', 'scale'])), unsigned_type($, parametric_type($, seq($.keyword_double, $.keyword_precision), ['precision', 'scale'])), unsigned_type($, parametric_type($, $.keyword_real, ['precision', 'scale'])), )"
"drop_column: $ => seq( $.keyword_drop, optional( $.keyword_column, ), optional($._if_exists), field('name', $.identifier), )"
"drop_constraint: $ => seq( $.keyword_drop, $.keyword_constraint, optional($._if_exists), $.identifier, optional($._drop_behavior), )"
"drop_database: $ => prec.left(seq( $.keyword_drop, $.keyword_database, optional($._if_exists), $.identifier, optional($.keyword_with), optional($.keyword_force), ))"
@@ -306,6 +306,7 @@
"keyword_filter: _ => make_keyword(\"filter\")"
"keyword_first: _ => make_keyword(\"first\")"
"keyword_float: _ => make_keyword(\"float\")"
+"keyword_float8: _ => make_keyword(\"float8\")"
"keyword_following: _ => make_keyword(\"following\")"
"keyword_follows: _ => make_keyword(\"follows\")"
"keyword_for: _ => make_keyword(\"for\")"... which of course now reminds me that I did change one thing, which was to extract a separate Looks like that's confirmed to be the only change though!! |
|
I like the idea here, splitting everything out into separate modules seems nice, are you aware of other projects that make use of this pattern? We're already quite different in that we don't commit the generated code so I worry about adding more friction. |
From some searching, the following appear to be tree-sitter grammar definitions which
To answer your question more directly though, no I'm not aware of specific other projects which split things out. I believe this is also the first major tree-sitter grammar I've attempted to make a contribution to -- but I'm pretty sure it's also the largest / most complex grammar I've encountered. IMO this will substantially reduce friction for new contributors because potential contributions will have a clearer place to go. |
to avoid changing any parsed output (which would require changes to highlights.scm and test/corpus)
|
FYI I pushed 8f8e5fd to revert the (admittedly minor) keyword change, and reran the diff script. Diff output from ebf08ba (now-current tip of main) to 8f8e5fd (new tip of this PR) is now empty. (You can see full output here: https://gist.github.com/vergenzt/1614ef579a3ba452050b2338cfd7adcf) |
|
If you're not interested in this would you at least be open to some sort of well-defined order for rules in |
|
@matthias-Q Do you have any objections? If not, I'm good to merge these changes. |
|
No, I am totally fine with it. |
|
Awesome! I'll try to get conflicts resolved asap and get this back to being ready to merge. |
I tried to make no semantic changes to anything, but that could use some verifying. I did make substantial editorial decisions about what "belongs" where -- totally open to feedback on any of it.
I may try to do some sort of automated analysis to confirm that all of the rules expressions from before and after are the same. We'll see.
Would you be open to merging something like this?