docs: convert data warehouse guide from TypeScript to Python #3368
base: main
…static report guide Replace <Note> JSX components with standard markdown blockquote syntax (>) to fix MDX compilation errors in next-mdx-remote. Custom JSX components require explicit component registration which wasn't configured. Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Wrap `<100ms` and `<1 second` in backticks to prevent MDX from parsing them as JSX tag names. The `<1` pattern was causing "Unexpected character 1 before name" compilation errors. Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Convert all code examples in the data warehouse guide to use Python and py-moose-lib instead of TypeScript. Changes include:
- Update project initialization to use Python venv and pip
- Convert all data models from TypeScript interfaces to Pydantic classes
- Add OlapTable definitions with correct ClickHouse engines
- Update Consumption API to use Python Api pattern
- Fix date dimension SQL syntax (CASE instead of formatDateTime)
- Fix fact table primary key issue (remove Key[int] from order_item_key)
- Add Python-specific best practices section
- Add technical notes explaining date dimension techniques
- Update file paths from .ts to .py and models/ to datamodels/
All conversions were tested end-to-end with working data loading, queries, date dimension population, and Consumption API. Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
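The backtick fix described in the second commit message above can be automated. What follows is a minimal sketch, not code from the PR: the helper name `escape_lt_numbers`, the regex, and the short list of unit words are all assumptions. It only handles inline text on a single line and skips spans that are already inside backticks.

```python
import re

# "<" followed by a number and an optional unit, e.g. "<100ms" or "<1 second".
# Assumption: these are the only patterns MDX misreads as JSX tag names.
_LT_NUMBER = re.compile(r"<\d+(?:\.\d+)?(?:\s?(?:ms|s|seconds?|minutes?))?\b")


def escape_lt_numbers(line: str) -> str:
    """Wrap bare '<number unit' spans in backticks, leaving code spans alone."""
    # Splitting on backticks puts text outside inline code at even indexes.
    parts = line.split("`")
    for i in range(0, len(parts), 2):
        parts[i] = _LT_NUMBER.sub(lambda m: f"`{m.group(0)}`", parts[i])
    return "`".join(parts)


print(escape_lt_numbers("Latency is <100ms and loads in <1 second."))
print(escape_lt_numbers("Already escaped: `<100ms` stays as it is."))
```

Component tags such as `<Note>` are not touched, because the pattern requires a digit right after `<`.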
Summary by CodeRabbit
Walkthrough
Documentation replatformed from TypeScript to Python patterns, converting model definitions, API examples, and export strategies across data warehouses and static report guides. Minor formatting updates to note styling and code emphasis.
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@apps/framework-docs-v2/content/guides/data-warehouses.mdx`:
- Around line 2171-2185: The Product model uses Pydantic v1 `@validator` which is
incompatible with pydantic>=2; replace it with the v2 field validator API:
import field_validator from pydantic, change the decorator on
Product.name_must_not_be_empty from `@validator('name')` to
`@field_validator('name')` and update the function signature to the v2 form
(accepting cls and value), perform the same strip/empty check and return the
cleaned value; ensure Product still imports BaseModel and Field as before.
- Around line 732-743: The snippet defines dim_date using MergeTreeEngine()
which contradicts the guide's recommendation to use ReplacingMergeTreeEngine()
for dimensions; update the OlapConfig for dim_date to use
ReplacingMergeTreeEngine() instead of MergeTreeEngine(), or if you intentionally
want the non-replacing engine, add a brief inline comment next to the dim_date
declaration (referencing dim_date, OlapTable, OlapConfig, MergeTreeEngine,
ReplacingMergeTreeEngine, and DateDimension) explaining why the date dimension
is static and why MergeTreeEngine() was chosen to avoid confusion.
- Around line 2284-2288: The test snippet uses pytest.raises(ValueError) but
Pydantic raises pydantic.ValidationError for missing required fields; update the
test to use pytest.raises(ValidationError) and add the appropriate import for
ValidationError from pydantic so the invalid Customer(...) (the Customer model
instantiation) is asserted correctly.
- Around line 1801-1830: The SQL query in sales_summary_handler uses untyped
placeholders; update the query string to use typed placeholders `{start_date:
String}` and `{end_date: String}` and keep the params mapping passed to
client.query.execute unchanged (i.e., still provide params.start_date and
params.end_date), so the call in client.query.execute continues to safely
parameterize values per the moose-lib API.
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
apps/framework-docs-v2/content/guides/data-warehouses.mdx
apps/framework-docs-v2/content/guides/static-report-generation.mdx
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: examples/cdp-analytics/CLAUDE.md:0-0
Timestamp: 2026-01-15T19:41:53.549Z
Learning: Data models should be compatible with major analytics platforms: GA4, Segment, Meta CAPI, and Square, with consistent field mappings documented in the data models table
📚 Learning: 2026-01-15T19:41:53.549Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: examples/cdp-analytics/CLAUDE.md:0-0
Timestamp: 2026-01-15T19:41:53.549Z
Learning: For slow dashboard queries (>500ms) that aggregate across multiple tables or need cohort-based breakdowns, implement a materialized view instead of real-time queries
Applied to files:
apps/framework-docs-v2/content/guides/static-report-generation.mdx
📚 Learning: 2026-01-15T19:41:53.549Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: examples/cdp-analytics/CLAUDE.md:0-0
Timestamp: 2026-01-15T19:41:53.549Z
Learning: Data models should be compatible with major analytics platforms: GA4, Segment, Meta CAPI, and Square, with consistent field mappings documented in the data models table
Applied to files:
apps/framework-docs-v2/content/guides/data-warehouses.mdx
📚 Learning: 2026-01-15T19:41:53.549Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: examples/cdp-analytics/CLAUDE.md:0-0
Timestamp: 2026-01-15T19:41:53.549Z
Learning: Applies to examples/cdp-analytics/app/views/*.ts : Use SummingMergeTree engine for materialized views that require incremental updates, and wrap aggregated metrics with SUM() when querying to properly merge partial rows
Applied to files:
apps/framework-docs-v2/content/guides/data-warehouses.mdx
📚 Learning: 2025-12-16T23:09:00.546Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:00.546Z
Learning: When changing MooseStack functionality, ALWAYS run end-to-end tests located in `apps/framework-cli-e2e`
Applied to files:
apps/framework-docs-v2/content/guides/data-warehouses.mdx
📚 Learning: 2026-01-15T19:41:53.549Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: examples/cdp-analytics/CLAUDE.md:0-0
Timestamp: 2026-01-15T19:41:53.549Z
Learning: Applies to examples/cdp-analytics/app/ingest/models.ts : Data models in ingest/models.ts should define interfaces using the Model naming convention, and create corresponding IngestPipeline instances with table, stream, and ingestApi options enabled
Applied to files:
apps/framework-docs-v2/content/guides/data-warehouses.mdx
🔇 Additional comments (9)
apps/framework-docs-v2/content/guides/static-report-generation.mdx (2)
195-195: LGTM: MDX syntax fixes. Replacing `<Note>` JSX with blockquotes fixes next-mdx-remote compilation issues. Also applies to: 290-290
877-878: LGTM: Escaped less-than signs. Wrapping `<100ms` and `<1 second` in backticks prevents MDX from interpreting them as malformed JSX tags.
apps/framework-docs-v2/content/guides/data-warehouses.mdx (7)
374-379: LGTM: Python project setup. Correct venv activation syntax for Unix. Windows alternative documented inline.
479-529: LGTM: Customer dimension model. Pydantic model with `Optional` defaults and `ReplacingMergeTreeEngine` for deduplication is correct for dimension tables.
671-707: LGTM: Fact table design. Correct approach: `order_item_key` is plain `int` (not `Key[int]`) since the composite `order_by_fields` becomes the ClickHouse primary key. Documentation at lines 771-773 explains this clearly. Also applies to: 758-769
777-802: LGTM: Module exports. All models and `OlapTable` instances properly exported with `__all__` list.
1270-1278: LGTM: Date dimension SQL fix. `CASE` statement correctly replaces `formatDateTime('%A')` which isn't supported in ClickHouse for day names.
624-625: LGTM: Technical notes. Clear explanations of `numbers()` table function and date dimension performance benefits. Also applies to: 1299-1300
469-469: LGTM: Python migration overall. Comprehensive conversion from TypeScript to Python. Pydantic models, OlapTable patterns, and Python conventions correctly applied throughout.
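The `CASE` fix noted for lines 1270-1278 can be illustrated by generating such an expression. This is a minimal sketch, not the guide's actual SQL: it assumes `day_name` is derived from ClickHouse's `toDayOfWeek()`, which by default returns 1 for Monday through 7 for Sunday. The helper `day_name_case` and the default column name `date` are made up for illustration.

```python
# Weekday names in the order toDayOfWeek() numbers them (1 = Monday).
DAY_NAMES = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def day_name_case(date_expr: str = "date") -> str:
    """Build a CASE expression mapping toDayOfWeek() numbers to day names."""
    branches = "\n".join(
        f"    WHEN {number} THEN '{name}'"
        for number, name in enumerate(DAY_NAMES, start=1)
    )
    return f"CASE toDayOfWeek({date_expr})\n{branches}\nEND"


print(day_name_case())
```

Generating the branches from one list keeps the number-to-name mapping in a single place, which is easier to review than seven hand-written `WHEN` lines.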
**DateDimension** (add to `app/datamodels/DateDimension.py`):
```python
from moose_lib import OlapTable, OlapConfig, MergeTreeEngine

dim_date = OlapTable[DateDimension](
    "dim_date",
    OlapConfig(
        engine=MergeTreeEngine(),
        order_by_fields=["date_key"],
    )
)
```
Inconsistent engine choice for date dimension.
Line 537 states: "For dimensions, use ReplacingMergeTreeEngine()". However, dim_date uses MergeTreeEngine(). While date dimensions are typically generated once and static, this contradicts the documented pattern and may confuse readers.
Consider either:
- Using ReplacingMergeTreeEngine() for consistency, or
- Adding a comment explaining why date dimension differs
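The trade-off behind this comment can be sketched without ClickHouse. The pure-Python simulation below is an illustration only. It assumes ReplacingMergeTree's documented behaviour of keeping the last inserted row per sorting key once parts are merged; the function `merge_replacing` and the sample rows are hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def merge_replacing(rows: list[Row], key: str) -> list[Row]:
    """Simulate a ReplacingMergeTree merge: keep the last row seen per key."""
    latest: dict[Any, Row] = {}
    for row in rows:  # later inserts overwrite earlier ones with the same key
        latest[row[key]] = row
    return sorted(latest.values(), key=lambda r: r[key])


# A date dimension populated once: every date_key is unique.
populated_once = [
    {"date_key": 20240101, "day_name": "Monday"},
    {"date_key": 20240102, "day_name": "Tuesday"},
]
# The same population step run twice: a plain MergeTree keeps all four rows.
populated_twice = populated_once + populated_once

print(len(populated_twice), "rows without replacing")
print(len(merge_replacing(populated_twice, "date_key")), "rows after a replacing merge")
```

When the table is populated exactly once, the merge leaves it unchanged, which is the case for plain MergeTree. The engines only diverge if the population step is re-run, so a one-line comment stating that assumption would address the inconsistency.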
+def sales_summary_handler(client: MooseClient, params: SalesSummaryParams):
+    """Get daily sales summary for date range"""
+
-export const salesSummaryApi = new Api<SalesSummaryParams, SalesSummaryResult[]>(
-  "sales-summary",
-  async ({ startDate, endDate }, { client, sql }) => {
-    const result = await client.query.execute<SalesSummaryResult>(sql`
-      SELECT
+    query = """
+        SELECT
             toString(d.date) as date,
             SUM(f.total_amount) as revenue,
             COUNT(DISTINCT f.order_id) as orders,
             SUM(f.quantity) as units
-      FROM fact_order_items f
-      JOIN dim_date d ON f.order_date_key = d.date_key
-      WHERE d.date BETWEEN ${startDate} AND ${endDate}
-      GROUP BY d.date
-      ORDER BY d.date
-    `);
-
-    return result.json();
-  }
-);
+        FROM fact_order_items f
+        JOIN dim_date d ON f.order_date_key = d.date_key
+        WHERE d.date BETWEEN {start_date} AND {end_date}
+        GROUP BY d.date
+        ORDER BY d.date
+    """
+
+    results = client.query.execute(
+        query,
+        {
+            "start_date": params.start_date,
+            "end_date": params.end_date,
+        }
+    )
+
+    return results
+
+sales_summary_api = Api[SalesSummaryParams, SalesSummaryResult](
+    name="sales_summary",
+    query_function=sales_summary_handler
+)
🧩 Analysis chain
🌐 Web query:
moose-lib Python client.query.execute SQL parameterization safety
💡 Result:
Short answer: use the library's typed placeholders — yes, client.query.execute in moose-lib’s Python API supports typed parameterization (e.g. {table: Identifier}, {limit: Int32}) and passing a params dict so values and identifiers are escaped/handled by the library rather than interpolated raw, which prevents normal SQL-injection vectors when used correctly. See the Python example in the moose-lib docs showing client.query.execute("SELECT * FROM {table: Identifier} LIMIT {limit: Int32}", { "table": ..., "limit": ... }). [1][2]
Notes / guidance:
- Use the provided typed placeholders exactly (Identifier for table/column names, numeric/string types for literals). [1]
- Do NOT build SQL by concatenating user input into raw strings; if you must allow dynamic identifiers, validate against an allowlist of known table/column names before passing them as Identifier. [2]
- If you want extra assurance, test edge cases (quotes, SQL control chars) and review the library’s source or docs for how each placeholder type is escaped/serialized.
Sources:
[1] moose-lib (PyPI) examples showing Python client.query.execute with typed placeholders.
[2] Modular Moose / Moose Query documentation (usage and query examples).
🌐 Web query:
moose framework client query execute parameters documentation
💡 Result:
There are two main Moose projects with a client.query.execute pattern — the 514-labs “Moose” analytics backend (moose-lib/moosestack) and the MooseStack/ORM clients. Key docs show how to call client.query.execute and pass parameters:
- TypeScript (moose-lib): use the sql tagged template with embedded identifiers/values; example:
  const result = await client.query.execute(sql`SELECT * FROM ${clickhouseTable} LIMIT ${limit}`).
  Source: official Moose docs / examples. [1]
- Python (moose-lib): call client.query.execute with a SQL string that uses Moose’s placeholder syntax and a parameters object. Example:
  return client.query.execute("SELECT * FROM {table: Identifier} LIMIT {limit: Int32}", {
      "table": clickhouse_table.name,
      "limit": params.limit,
  }).
  Source: moosestack GitHub README / examples. [2]
References
- Moose docs / examples (moose-lib): docs and code showing client.query.execute with the sql tagged template in TypeScript. [1]
- moosestack GitHub README / language examples (Python + TS parameterized calls). [2]
🏁 Script executed:
# Search repository for moose_lib or moose-client implementation/documentation
find . -type f -name "*.py" -o -name "*.md" -o -name "*.txt" | head -20
git ls-files | grep -iE "(moose|client)" | head -20
Repository: 514-labs/moosestack
Length of output: 2080
🏁 Script executed:
# Search for actual client.query.execute usage in the repository
rg "client\.query\.execute" -A 5 -B 2 --type py | head -50
Repository: 514-labs/moosestack
Length of output: 3027
🏁 Script executed:
# Look at the documentation file to get full context
head -1835 apps/framework-docs-v2/content/guides/data-warehouses.mdx | tail -40
Repository: 514-labs/moosestack
Length of output: 983
🏁 Script executed:
# Search for moose client examples in the codebase
fd -e py | xargs grep -l "client\.query" 2>/dev/null | head -10
Repository: 514-labs/moosestack
Length of output: 592
Use typed placeholders per moose-lib API.
client.query.execute() safely parameterizes values, but the example should use typed placeholders: {start_date: String} and {end_date: String} instead of untyped {start_date} and {end_date}.
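One way to catch this class of issue during review is a small lint-style check. The sketch below is an illustration under stated assumptions and is not part of moose-lib. `untyped_placeholders` is a hypothetical helper that treats `{name: Type}` as typed and a bare `{name}` as untyped, based only on the placeholder examples quoted above. It recognises simple one-word type names only.

```python
import re

# {name} with nothing after the identifier is untyped; {name: Type} is typed.
_PLACEHOLDER = re.compile(r"\{\s*(\w+)\s*(?::\s*(\w+))?\s*\}")


def untyped_placeholders(query: str) -> list[str]:
    """Return the names of placeholders that do not declare a type."""
    return [name for name, type_name in _PLACEHOLDER.findall(query) if not type_name]


# The query from the snippet under review, with typed placeholders applied.
QUERY = """
SELECT toString(d.date) AS date, SUM(f.total_amount) AS revenue
FROM fact_order_items f
JOIN dim_date d ON f.order_date_key = d.date_key
WHERE d.date BETWEEN {start_date: String} AND {end_date: String}
GROUP BY d.date
ORDER BY d.date
"""

print(untyped_placeholders(QUERY))
print(untyped_placeholders("WHERE d.date BETWEEN {start_date} AND {end_date}"))
```

The params dictionary passed alongside the query keeps the same keys (`start_date`, `end_date`) in either form; only the query string changes.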
```python
from pydantic import BaseModel, Field, validator
from datetime import datetime

class Product(BaseModel):
    product_key: int
    name: str
    price: float = Field(gt=0, description="Price must be positive")

    @validator('name')
    def name_must_not_be_empty(cls, v):
        if not v or not v.strip():
            raise ValueError('Product name cannot be empty')
        return v.strip()
```
Pydantic v2 syntax mismatch.
Line 2193 specifies pydantic>=2.0.0, but line 2172 uses @validator, which is Pydantic v1 syntax and is deprecated in v2. Pydantic v2 uses @field_validator, which has a different signature.
Fix for Pydantic v2
-from pydantic import BaseModel, Field, validator
+from pydantic import BaseModel, Field, field_validator
 from datetime import datetime

 class Product(BaseModel):
     product_key: int
     name: str
     price: float = Field(gt=0, description="Price must be positive")

-    @validator('name')
-    def name_must_not_be_empty(cls, v):
+    @field_validator('name')
+    @classmethod
+    def name_must_not_be_empty(cls, v: str) -> str:
         if not v or not v.strip():
             raise ValueError('Product name cannot be empty')
         return v.strip()
Also applies to: 2193-2198
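A self-contained version of the suggested fix, as a sketch that assumes Pydantic 2.x is installed. It reuses the `Product` model from the snippet under review; the sample values are made up for illustration.

```python
from pydantic import BaseModel, Field, ValidationError, field_validator


class Product(BaseModel):
    product_key: int
    name: str
    price: float = Field(gt=0, description="Price must be positive")

    @field_validator("name")
    @classmethod
    def name_must_not_be_empty(cls, v: str) -> str:
        # Reject empty or whitespace-only names, otherwise return the trimmed value.
        if not v or not v.strip():
            raise ValueError("Product name cannot be empty")
        return v.strip()


# A valid product: the validator trims surrounding whitespace.
product = Product(product_key=1, name="  Widget  ", price=9.99)
print(repr(product.name))

# A whitespace-only name is rejected with a ValidationError.
try:
    Product(product_key=2, name="   ", price=1.0)
except ValidationError as exc:
    print(exc.error_count(), "validation error")
```

The `@classmethod` decorator sits below `@field_validator`, matching the order used in the suggested diff.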
    # Invalid customer (missing required field)
    with pytest.raises(ValueError):
        Customer(customer_key=1, email="[email protected]")
Incorrect exception type in test example.
Pydantic raises ValidationError for missing required fields, not ValueError.
Fix exception type
+from pydantic import ValidationError

 def test_customer_validation():
     # ...
     # Invalid customer (missing required field)
-    with pytest.raises(ValueError):
+    with pytest.raises(ValidationError):
         Customer(customer_key=1, email="[email protected]")
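A runnable sketch of the corrected assertion, assuming Pydantic 2.x. The `Customer` model here is a hypothetical stand-in with a required `name` field, since the guide's full model is not shown in this hunk, and the email value is a placeholder. A plain `try`/`except` replaces `pytest.raises` to keep the example free of extra dependencies. Note that Pydantic's `ValidationError` subclasses `ValueError`, so the original `pytest.raises(ValueError)` would also pass; asserting `ValidationError` is the more specific check.

```python
from pydantic import BaseModel, ValidationError


class Customer(BaseModel):
    # Hypothetical stand-in: the guide's real Customer model has more fields.
    customer_key: int
    email: str
    name: str  # required, so omitting it must fail validation


def test_customer_validation() -> None:
    # Invalid customer (missing required field)
    try:
        Customer(customer_key=1, email="user@example.com")
    except ValidationError as exc:
        # Collect the locations of all "missing field" errors.
        missing = [err["loc"] for err in exc.errors() if err["type"] == "missing"]
        assert missing == [("name",)]
    else:
        raise AssertionError("expected ValidationError for the missing name field")


test_customer_validation()
print(issubclass(ValidationError, ValueError))
```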
Summary
Converts the complete data warehouse guide from TypeScript to Python, updating all code examples to use py-moose-lib and Pydantic models.
Changes
Code Conversions
Bug Fixes
- Fix fact table primary key issue (remove `Key[int]` from order_item_key)
Documentation Enhancements
Testing
All conversions were tested end-to-end:
Sample project published to: https://github.com/514-labs/data-warehouse-sample-guide
Impact
Files Changed
apps/framework-docs-v2/content/guides/data-warehouses.mdx
Co-Authored-By: Claude Sonnet 4.5 [email protected]
Note
Switch guide to Python implementation
- …`Customer`, `Product`, `DateDimension`, `Store`, `OrderItem`) and adds `OlapTable` configs (ReplacingMergeTree for dims, MergeTree for facts); updates discovery via `app/__init__.py`
- …`sales_summary_api`) and adjusts endpoints/params
- …`CASE` for `day_name`; clarifies `numbers()` technique and performance notes
Written by Cursor Bugbot for commit a55a203.