Skip to content

Conversation

@vergenzt
Copy link

@vergenzt vergenzt commented Dec 12, 2025

I tried to make no semantic changes to anything, but that could use some verifying. I did make substantial editorial decisions about what "belongs" where -- totally open to feedback on any of it.

I may try to do some sort of automated analysis to confirm that all of the rules expressions from before and after are the same. We'll see.

Would you be open to merging something like this?

I tried to make no semantic changes to anything, but that could use some verifying.
@vergenzt
Copy link
Author

vergenzt commented Dec 13, 2025

Okay I was able to figure out a good way to do that validation. I used an ast-grep rule to fetch all relevant grammar rule JS key/value pairs, normalized whitespace for each (since I dedented when extracting to the grammar/ folder), then sorted and wrote to a file and was able to diff the files.

On both main and this branch, I ran the following:

ast-grep scan --json --inline-rules='id: _
language: js
rule:
  kind: pair
  inside:
    kind: object
    inside:
      any:
        - pattern: module.exports = { $$$ }
        - kind: pair
          has:
            field: key
            regex: rules
' grammar* | jq '.[].text | gsub("\\\\s+"; " ")' | sort > rules-$(git rev-parse HEAD)

(edit 2025-12-29 to remove indent which arose from copy/paste)

This then produced the following files:

  • rules-224f3a49757b4997fc89b89fe31b84c635002534 (all rules detected at 224f3a4, tip of this branch)
  • rules-fe77f6868d6cdea593052a6af390116495093dc1 (all rules detected at fe77f68, current tip of main)

Then, git diff --no-index rules-fe77f6868d6cdea593052a6af390116495093dc1 rules-224f3a49757b4997fc89b89fe31b84c635002534 for me yields:

diff --git a/rules-fe77f6868d6cdea593052a6af390116495093dc1 b/rules-224f3a49757b4997fc89b89fe31b84c635002534
index c025468..320e991 100644
--- a/rules-fe77f6868d6cdea593052a6af390116495093dc1
+++ b/rules-224f3a49757b4997fc89b89fe31b84c635002534
@@ -148,7 +148,7 @@
 "direction: $ => choice($.keyword_desc, $.keyword_asc)"
 "distinct_from: $ => seq($.keyword_is, $.keyword_distinct, $.keyword_from)"
 "dollar_quote: () => /\\$[^\\$]*\\$/"
-"double: $ => choice( make_keyword(\"float8\"), unsigned_type($, parametric_type($, $.keyword_double, ['precision', 'scale'])), unsigned_type($, parametric_type($, seq($.keyword_double, $.keyword_precision), ['precision', 'scale'])), unsigned_type($, parametric_type($, $.keyword_real, ['precision', 'scale'])), )"
+"double: $ => choice( $.keyword_float8, unsigned_type($, parametric_type($, $.keyword_double, ['precision', 'scale'])), unsigned_type($, parametric_type($, seq($.keyword_double, $.keyword_precision), ['precision', 'scale'])), unsigned_type($, parametric_type($, $.keyword_real, ['precision', 'scale'])), )"
 "drop_column: $ => seq( $.keyword_drop, optional( $.keyword_column, ), optional($._if_exists), field('name', $.identifier), )"
 "drop_constraint: $ => seq( $.keyword_drop, $.keyword_constraint, optional($._if_exists), $.identifier, optional($._drop_behavior), )"
 "drop_database: $ => prec.left(seq( $.keyword_drop, $.keyword_database, optional($._if_exists), $.identifier, optional($.keyword_with), optional($.keyword_force), ))"
@@ -306,6 +306,7 @@
 "keyword_filter: _ => make_keyword(\"filter\")"
 "keyword_first: _ => make_keyword(\"first\")"
 "keyword_float: _ => make_keyword(\"float\")"
+"keyword_float8: _ => make_keyword(\"float8\")"
 "keyword_following: _ => make_keyword(\"following\")"
 "keyword_follows: _ => make_keyword(\"follows\")"
 "keyword_for: _ => make_keyword(\"for\")"

... which of course now reminds me that I did change one thing, which was to extract a separate keyword_float8 rule instead of the previous inline call to make_keyword. 😄

Looks like that's confirmed to be the only change though!!

@DerekStride
Copy link
Owner

I like the idea here, splitting everything out into separate modules seems nice, are you aware of other projects that make use of this pattern? We're already quite different in that we don't commit the generated code so I worry about adding more friction.

@vergenzt
Copy link
Author

vergenzt commented Dec 28, 2025

I like the idea here, splitting everything out into separate modules seems nice, are you aware of other projects that make use of this pattern? We're already quite different in that we don't commit the generated code so I worry about adding more friction.

From some searching, the following appear to be tree-sitter grammar definitions which require pieces of their definitions from other files:

To answer your question more directly though, no I'm not aware of specific other projects which split things out. I believe this is also the first major tree-sitter grammar I've attempted to make a contribution to -- but I'm pretty sure it's also the largest / most complex grammar I've encountered.

IMO this will substantially reduce friction for new contributors because potential contributions will have a clearer place to go.

to avoid changing any parsed output (which would require changes to highlights.scm and test/corpus)
@vergenzt
Copy link
Author

vergenzt commented Dec 29, 2025

FYI I pushed 8f8e5fd to revert the (admittedly minor) keyword change, and reran the diff script. Diff output from ebf08ba (now-current tip of main) to 8f8e5fd (new tip of this PR) is now empty.

(You can see full output here: https://gist.github.com/vergenzt/1614ef579a3ba452050b2338cfd7adcf)

@vergenzt
Copy link
Author

If you're not interested in this would you at least be open to some sort of well-defined order for rules in grammar.js? Maybe a topological sort or something?

@DerekStride
Copy link
Owner

DerekStride commented Jan 5, 2026

@matthias-Q Do you have any objections? If not, I'm good to merge these changes.

@matthias-Q
Copy link
Collaborator

No, I am totally fine with it.

@vergenzt
Copy link
Author

vergenzt commented Jan 9, 2026

Awesome! I'll try to get conflicts resolved asap and get this back to being ready to merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants