Skip to content

Conversation

@callicles
Copy link
Collaborator

@callicles callicles commented Feb 3, 2026

Note

Medium Risk
Changes core SQL engine extraction logic used for ClickHouse table introspection; while scoped and test-covered, it may alter behavior on unusual CREATE TABLE formatting or comments.

Overview
Fixes extract_engine_from_create_table to avoid naive substring matching of ENGINE (e.g., when an identifier contains engine).

The function now uses sqlparser parsing to confirm the statement is a CREATE TABLE with an engine option, then extracts the engine definition via dialect-aware tokenization and span slicing (handling whitespace/= and nested parentheses). Adds a regression test for a column name containing engine.

Written by Cursor Bugbot for commit 7c1e501. This will update automatically on new commits. Configure here.

@linear
Copy link

linear bot commented Feb 3, 2026

@vercel
Copy link

vercel bot commented Feb 3, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
docs-v2 Ready Ready Preview, Comment Feb 3, 2026 6:13pm

Request Review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 3, 2026

Summary by CodeRabbit

  • Bug Fixes
    • More reliable ClickHouse CREATE TABLE ENGINE extraction using an AST-first, token-aware approach to avoid false positives (e.g., column names containing "engine").
    • Preserves original formatting and correctly handles ENGINE with or without parameter lists, nested parentheses, and quoted strings.
  • Tests
    • Added tests covering edge cases around ENGINE detection and complex parameter formats.

Walkthrough

Replaced fragile substring-based ClickHouse CREATE TABLE ENGINE extraction with an AST-aware approach that uses sqlparser parsing and a tokenizer-based extractor (with dialect awareness) as fallback; added helpers for token/span navigation to correctly handle quoted strings, nested parentheses, and edge cases.

Changes

Cohort / File(s) Summary
ClickHouse SQL parser / ENGINE extraction
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
Reworked ENGINE extraction: attempt sqlparser AST parse then use tokenizer/span-aware extractor if ENGINE option present; fallback to tokenizer when parsing fails. Added internal helpers: create_table_has_engine_option, extract_engine_from_tokens, slice_for_span, location_to_index, is_keyword, skip_whitespace, find_matching_paren. Updated imports to include sqlparser types and expanded tests for edge cases (e.g., column names containing "engine").

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • jmsuzuki
  • DatGuyJonathan

Poem

🔧 Substrings gave way to AST-lit streams,
Tokens keep spans and quoted dreams,
Parens matched, no false alarms,
Buffer tables find their arms,
Parser sings with steadier beams.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed Title accurately describes the core fix: replacing naive substring matching of ENGINE keyword with AST-aware tokenizer approach to avoid false positives.
Description check ✅ Passed Description clearly explains the problem (naive substring matching), the solution (AST-gated tokenizer approach), and notes medium risk from behavior changes.
Linked Issues check ✅ Passed Changes directly address ENG-2101 by fixing ClickHouse Buffer engine recognition through improved ENGINE clause parsing that avoids false positives from identifiers containing 'engine'.
Out of Scope Changes check ✅ Passed All changes are scoped to fixing the ENGINE extraction logic in sql_parser.rs with supporting helpers and regression tests; no unrelated modifications detected.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch nico/eng-2101-moose-not-recognising-existing-clickhouse-buffer-engine

📜 Recent review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 6a0621d and 7c1e501.

📒 Files selected for processing (1)
  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
🧰 Additional context used
📓 Path-based instructions (3)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Run cargo clippy to ensure Rust code passes Clippy's linting standards before each commit

**/*.rs: Use thiserror with #[derive(thiserror::Error)] for error handling in Rust; define errors near the fallibility unit, never use anyhow::Result
Use snake_case for functions and variables, PascalCase for types and traits, SCREAMING_SNAKE_CASE for constants in Rust
Place constants in constants.rs at the appropriate module level in Rust
Use tuple structs with validation constructors (newtypes) for Rust type safety (e.g., struct UserId(String))
Write inline tests with #[cfg(test)] modules in Rust
Documentation is required for all public APIs in Rust
Run cargo clippy --all-targets -- -D warnings before committing Rust code; no warnings allowed

Files:

  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
apps/framework-cli/**/*.rs

📄 CodeRabbit inference engine (apps/framework-cli/AGENTS.md)

apps/framework-cli/**/*.rs: Always run cargo clippy --all-targets -- -D warnings before commits; fix all warnings - no Clippy warnings may remain (treat warnings as errors)
Use rustfmt --edition 2021 for consistent formatting
Write meaningful names for functions, variables, and types
Document all public APIs and breaking changes
Use thiserror crate instead of anyhow::Result for error handling
Define errors near their unit of fallibility (no global Error types)
Use #[derive(thiserror::Error)] with #[error()] messages for error definitions
Define newtypes as tuple structs: struct UserId(u64);
Add validation constructors for newtypes: UserId::new(id: u64) -> Result<Self, Error>
Derive standard traits for newtypes: #[derive(Debug, Clone, PartialEq)]
Implement From/TryFrom for newtype conversions
Use derive_more or nutype to reduce newtype boilerplate
Use const for static values (prefer over static)
Use UPPER_SNAKE_CASE naming for constants
Scope constant visibility with preference: pub(crate) > pub(super) > pub
Group related constants together
Write unit tests for all public functions
Test error conditions and edge cases

Files:

  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
**/framework-cli/src/**

⚙️ CodeRabbit configuration file

**/framework-cli/src/**: When reviewing changes to Moose CLI:

  1. Check if any user-facing changes were made (commands, flags, configs, apis, etc)
  2. If yes, verify the documentation for THAT SPECIFIC feature is updated in apps/framework-docs-v2/content
  3. If docs for that feature doesn't exist yet, it should be added. If the change removes public apis, the documentation for those should also be removed. Changing unrelated docs doesn't satisfy this requirement.
  4. In addition to reviewing for docs discrepancies, you should also review the code as per usual guidelines.

Files:

  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test E2E TypeScript Tests Template (Node 24, npm)
  • GitHub Check: Test E2E TypeScript Tests Template (Node 22, npm)
  • GitHub Check: Test E2E TypeScript Tests Template (Node 20, npm)
🔇 Additional comments (2)
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs (2)

224-275: No false positive issue exists. The current implementation already correctly handles columns named "engine" because sqlparser distinguishes between keywords and identifiers at the token level. The existing test test_extract_engine_when_column_name_contains_engine confirms the code works as intended. The proposed fix is unnecessary.

Likely an incorrect or invalid review comment.


285-292: Docstring claims skipped comments but implementation doesn't handle them—fix the mismatch.

The skip_whitespace function's docstring says it skips comments, but the code only skips Token::Whitespace. This creates a real bug: inline comments like ENGINE /*...*/ = would break parsing.

However, the exact Token variant names for comments cannot be verified against the sqlparser fork without source access. The proposed fix references Token::LineComment and Token::BlockComment, but these variants don't appear in the codebase and may not match the fork's API. Verify the correct variant names in the forked sqlparser before committing.

🔧 Proposed fix (verify variant names first)
 fn skip_whitespace(tokens: &[sqlparser::tokenizer::TokenWithSpan], idx: &mut usize) {
     while *idx < tokens.len() {
         match tokens[*idx].token {
             Token::Whitespace(_) => *idx += 1,
+            Token::LineComment(_) | Token::BlockComment(_) => *idx += 1,
             _ => break,
         }
     }
 }

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@blacksmith-sh

This comment has been minimized.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs`:
- Around line 324-332: The function signature for slice_for_span uses an
unnecessary explicit lifetime 'a and triggers clippy::needless_lifetimes; remove
the explicit lifetime by changing fn slice_for_span<'a>(sql: &'a str, span:
Span) -> Option<&'a str'> to use elided lifetimes (fn slice_for_span(sql: &str,
span: Span) -> Option<&str>), leaving the body (calls to location_to_index and
slicing &sql[start..end]) unchanged so the borrow checker still infers the
correct lifetime.
📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c5f268f and 6a0621d.

📒 Files selected for processing (1)
  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
🧰 Additional context used
📓 Path-based instructions (3)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Run cargo clippy to ensure Rust code passes Clippy's linting standards before each commit

**/*.rs: Use thiserror with #[derive(thiserror::Error)] for error handling in Rust; define errors near the fallibility unit, never use anyhow::Result
Use snake_case for functions and variables, PascalCase for types and traits, SCREAMING_SNAKE_CASE for constants in Rust
Place constants in constants.rs at the appropriate module level in Rust
Use tuple structs with validation constructors (newtypes) for Rust type safety (e.g., struct UserId(String))
Write inline tests with #[cfg(test)] modules in Rust
Documentation is required for all public APIs in Rust
Run cargo clippy --all-targets -- -D warnings before committing Rust code; no warnings allowed

Files:

  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
apps/framework-cli/**/*.rs

📄 CodeRabbit inference engine (apps/framework-cli/AGENTS.md)

apps/framework-cli/**/*.rs: Always run cargo clippy --all-targets -- -D warnings before commits; fix all warnings - no Clippy warnings may remain (treat warnings as errors)
Use rustfmt --edition 2021 for consistent formatting
Write meaningful names for functions, variables, and types
Document all public APIs and breaking changes
Use thiserror crate instead of anyhow::Result for error handling
Define errors near their unit of fallibility (no global Error types)
Use #[derive(thiserror::Error)] with #[error()] messages for error definitions
Define newtypes as tuple structs: struct UserId(u64);
Add validation constructors for newtypes: UserId::new(id: u64) -> Result<Self, Error>
Derive standard traits for newtypes: #[derive(Debug, Clone, PartialEq)]
Implement From/TryFrom for newtype conversions
Use derive_more or nutype to reduce newtype boilerplate
Use const for static values (prefer over static)
Use UPPER_SNAKE_CASE naming for constants
Scope constant visibility with preference: pub(crate) > pub(super) > pub
Group related constants together
Write unit tests for all public functions
Test error conditions and edge cases

Files:

  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
**/framework-cli/src/**

⚙️ CodeRabbit configuration file

**/framework-cli/src/**: When reviewing changes to Moose CLI:

  1. Check if any user-facing changes were made (commands, flags, configs, apis, etc)
  2. If yes, verify the documentation for THAT SPECIFIC feature is updated in apps/framework-docs-v2/content
  3. If docs for that feature doesn't exist yet, it should be added. If the change removes public apis, the documentation for those should also be removed. Changing unrelated docs doesn't satisfy this requirement.
  4. In addition to reviewing for docs discrepancies, you should also review the code as per usual guidelines.

Files:

  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
🪛 GitHub Actions: Test Framework CLI
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs

[error] 325-325: the following explicit lifetimes could be elided: 'a

🪛 GitHub Check: Lints
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs

[failure] 325-325:
the following explicit lifetimes could be elided: 'a

🔇 Additional comments (7)
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs (7)

7-14: LGTM!

Imports are appropriate for the new AST-aware and tokenizer-based parsing approach.


179-200: LGTM!

Solid two-tier strategy: AST parsing when possible, tokenizer fallback for complex ClickHouse DDL that sqlparser can't handle. Early returns are clear.


202-218: LGTM!

The check for SqlOption::NamedParenthesizedList is appropriate for ClickHouse ENGINE options. Any edge cases missed here fall through to the tokenizer-based extraction.


220-278: LGTM!

The state machine (saw_create, saw_table) correctly ensures ENGINE is only matched after the CREATE TABLE sequence, fixing the column-name false positive bug. Span-based extraction preserves original formatting.


280-322: LGTM!

Helper functions are clean and correct. find_matching_paren properly handles nesting depth.


334-358: LGTM!

Correctly maps 1-based line/column to byte index with proper UTF-8 handling via char_indices().


1256-1270: LGTM!

Solid regression test directly targeting the reported bug: column name scripting_engine no longer causes false ENGINE matches.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

@callicles callicles added this pull request to the merge queue Feb 3, 2026
Merged via the queue into main with commit aef176f Feb 3, 2026
59 checks passed
@callicles callicles deleted the nico/eng-2101-moose-not-recognising-existing-clickhouse-buffer-engine branch February 3, 2026 19:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants