-
Notifications
You must be signed in to change notification settings - Fork 26
fix: naive engine parsing picked up properties with engine keyword in it #3440
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: naive engine parsing picked up properties with engine keyword in it #3440
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Summary by CodeRabbit
WalkthroughReplaced fragile substring-based ClickHouse CREATE TABLE ENGINE extraction with an AST-aware approach that uses sqlparser parsing and a tokenizer-based extractor (with dialect awareness) as fallback; added helpers for token/span navigation to correctly handle quoted strings, nested parentheses, and edge cases. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Disabled knowledge base sources:
📒 Files selected for processing (1)
🧰 Additional context used📓 Path-based instructions (3)**/*.rs📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Files:
apps/framework-cli/**/*.rs📄 CodeRabbit inference engine (apps/framework-cli/AGENTS.md)
Files:
**/framework-cli/src/**⚙️ CodeRabbit configuration file
Files:
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
🔇 Additional comments (2)
✏️ Tip: You can disable this entire section by setting Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs`:
- Around line 324-332: The function signature for slice_for_span uses an
unnecessary explicit lifetime 'a and triggers clippy::needless_lifetimes; remove
the explicit lifetime by changing fn slice_for_span<'a>(sql: &'a str, span:
Span) -> Option<&'a str'> to use elided lifetimes (fn slice_for_span(sql: &str,
span: Span) -> Option<&str>), leaving the body (calls to location_to_index and
slicing &sql[start..end]) unchanged so the borrow checker still infers the
correct lifetime.
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
🧰 Additional context used
📓 Path-based instructions (3)
**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Run
cargo clippyto ensure Rust code passes Clippy's linting standards before each commit
**/*.rs: Usethiserrorwith#[derive(thiserror::Error)]for error handling in Rust; define errors near the fallibility unit, never useanyhow::Result
Use snake_case for functions and variables, PascalCase for types and traits, SCREAMING_SNAKE_CASE for constants in Rust
Place constants inconstants.rsat the appropriate module level in Rust
Use tuple structs with validation constructors (newtypes) for Rust type safety (e.g.,struct UserId(String))
Write inline tests with#[cfg(test)]modules in Rust
Documentation is required for all public APIs in Rust
Runcargo clippy --all-targets -- -D warningsbefore committing Rust code; no warnings allowed
Files:
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
apps/framework-cli/**/*.rs
📄 CodeRabbit inference engine (apps/framework-cli/AGENTS.md)
apps/framework-cli/**/*.rs: Always runcargo clippy --all-targets -- -D warningsbefore commits; fix all warnings - no Clippy warnings may remain (treat warnings as errors)
Userustfmt --edition 2021for consistent formatting
Write meaningful names for functions, variables, and types
Document all public APIs and breaking changes
Usethiserrorcrate instead ofanyhow::Resultfor error handling
Define errors near their unit of fallibility (no global Error types)
Use#[derive(thiserror::Error)]with#[error()]messages for error definitions
Define newtypes as tuple structs:struct UserId(u64);
Add validation constructors for newtypes:UserId::new(id: u64) -> Result<Self, Error>
Derive standard traits for newtypes:#[derive(Debug, Clone, PartialEq)]
ImplementFrom/TryFromfor newtype conversions
Usederive_moreornutypeto reduce newtype boilerplate
Useconstfor static values (prefer overstatic)
UseUPPER_SNAKE_CASEnaming for constants
Scope constant visibility with preference:pub(crate)>pub(super)>pub
Group related constants together
Write unit tests for all public functions
Test error conditions and edge cases
Files:
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
**/framework-cli/src/**
⚙️ CodeRabbit configuration file
**/framework-cli/src/**: When reviewing changes to Moose CLI:
- Check if any user-facing changes were made (commands, flags, configs, apis, etc)
- If yes, verify the documentation for THAT SPECIFIC feature is updated in apps/framework-docs-v2/content
- If docs for that feature doesn't exist yet, it should be added. If the change removes public apis, the documentation for those should also be removed. Changing unrelated docs doesn't satisfy this requirement.
- In addition to reviewing for docs discrepancies, you should also review the code as per usual guidelines.
Files:
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
🪛 GitHub Actions: Test Framework CLI
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
[error] 325-325: the following explicit lifetimes could be elided: 'a
🪛 GitHub Check: Lints
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
[failure] 325-325:
the following explicit lifetimes could be elided: 'a
🔇 Additional comments (7)
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs (7)
7-14: LGTM!Imports are appropriate for the new AST-aware and tokenizer-based parsing approach.
179-200: LGTM!Solid two-tier strategy: AST parsing when possible, tokenizer fallback for complex ClickHouse DDL that
sqlparsercan't handle. Early returns are clear.
202-218: LGTM!The check for
SqlOption::NamedParenthesizedListis appropriate for ClickHouse ENGINE options. Any edge cases missed here fall through to the tokenizer-based extraction.
220-278: LGTM!The state machine (
saw_create,saw_table) correctly ensures ENGINE is only matched after the CREATE TABLE sequence, fixing the column-name false positive bug. Span-based extraction preserves original formatting.
280-322: LGTM!Helper functions are clean and correct.
find_matching_parenproperly handles nesting depth.
334-358: LGTM!Correctly maps 1-based line/column to byte index with proper UTF-8 handling via
char_indices().
1256-1270: LGTM!Solid regression test directly targeting the reported bug: column name
scripting_engineno longer causes false ENGINE matches.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
Note
Medium Risk
Changes core SQL engine extraction logic used for ClickHouse table introspection; while scoped and test-covered, it may alter behavior on unusual CREATE TABLE formatting or comments.
Overview
Fixes
extract_engine_from_create_tableto avoid naive substring matching ofENGINE(e.g., when an identifier containsengine).The function now uses
sqlparserparsing to confirm the statement is aCREATE TABLEwith an engine option, then extracts the engine definition via dialect-aware tokenization and span slicing (handling whitespace/=and nested parentheses). Adds a regression test for a column name containingengine.Written by Cursor Bugbot for commit 7c1e501. This will update automatically on new commits. Configure here.