@514Ben 514Ben commented Jan 20, 2026

Summary

Converts the complete data warehouse guide from TypeScript to Python, updating all code examples to use py-moose-lib and Pydantic models.

Changes

Code Conversions

  • ✅ Updated project initialization to use Python venv and pip
  • ✅ Converted all 5 data models from TypeScript interfaces to Pydantic classes
    • Customer, Product, Store, DateDimension, OrderItem
  • ✅ Added OlapTable definitions with correct ClickHouse engines
    • ReplacingMergeTreeEngine for dimensions
    • MergeTreeEngine for facts
  • ✅ Converted Consumption API to Python Api pattern
  • ✅ Updated all file paths (.ts → .py, models/ → datamodels/)

Bug Fixes

  • 🐛 Fixed fact table primary key issue (removed Key[int] from order_item_key)
    • ClickHouse requires primary key to be a prefix of ORDER BY clause
  • 🐛 Fixed date dimension SQL syntax (CASE statement instead of formatDateTime '%A')
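The day-name fix is easy to check outside ClickHouse: the CASE expression maps `toDayOfWeek` (1 = Monday … 7 = Sunday) to names. A plain-Python mirror of that mapping (an illustrative sketch, not the guide's actual SQL — `date.isoweekday()` uses the same numbering):

```python
from datetime import date

# Mirror of the CASE expression used to populate dim_date.day_name.
# toDayOfWeek / isoweekday: 1 = Monday ... 7 = Sunday.
DAY_NAMES = {
    1: "Monday", 2: "Tuesday", 3: "Wednesday", 4: "Thursday",
    5: "Friday", 6: "Saturday", 7: "Sunday",
}

def day_name(d: date) -> str:
    return DAY_NAMES[d.isoweekday()]
```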

Documentation Enhancements

  • 📝 Added Python-specific best practices section
    • Virtual environments
    • Pydantic validation
    • Error handling
    • Testing with pytest
  • 📝 Added technical notes explaining date dimension techniques
    • numbers() function for sequence generation
    • Performance benefits (10-100x faster queries)
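The `numbers()` technique can be sanity-checked with a stdlib sketch. The SQL shape below is an approximation of the pattern (not quoted from the guide); the Python equivalent confirms the 4,018-row count for 2020-01-01 through 2030-12-31 (11 years plus 3 leap days):

```python
from datetime import date, timedelta

# ClickHouse can populate the date dimension in one INSERT, roughly:
#   SELECT toDate('2020-01-01') + number FROM numbers(4018)
# The same sequence in plain Python:
start, end = date(2020, 1, 1), date(2030, 12, 31)
dates = [start + timedelta(days=n) for n in range((end - start).days + 1)]
```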

Testing

All conversions were tested end-to-end:

  • ✅ Project initialization and setup
  • ✅ All 5 data models created successfully
  • ✅ Data loading (5 customers, 5 products, 4 stores, 6 order items)
  • ✅ Date dimension population (4,018 dates, 2020-2030)
  • ✅ Complex multi-table joins working
  • ✅ Consumption API returning JSON responses
  • ✅ Weekend/weekday analysis queries
  • ✅ Top products by revenue queries
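For reference, the "top products by revenue" query is a GROUP BY + ORDER BY aggregation over the fact table. The same logic in stdlib Python over made-up rows (the values here are invented for the sketch, not the guide's sample data):

```python
from collections import defaultdict

# Illustrative fact rows; product_key 1 appears twice and should win.
order_items = [
    {"product_key": 1, "total_amount": 30.0},
    {"product_key": 2, "total_amount": 50.0},
    {"product_key": 1, "total_amount": 40.0},
]

# Equivalent of: SELECT product_key, SUM(total_amount) ... GROUP BY ... ORDER BY ... DESC
revenue = defaultdict(float)
for item in order_items:
    revenue[item["product_key"]] += item["total_amount"]
top_products = sorted(revenue.items(), key=lambda kv: kv[1], reverse=True)
```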

Sample project published to: https://github.com/514-labs/data-warehouse-sample-guide

Impact

  • Guide now supports Python developers building data warehouses
  • All code examples are executable and validated
  • Maintains same learning flow and structure as TypeScript version
  • No breaking changes to existing TypeScript guide (separate guide)

Files Changed

  • apps/framework-docs-v2/content/guides/data-warehouses.mdx
    • +483 additions / -189 deletions

Co-Authored-By: Claude Sonnet 4.5 [email protected]


Note

Switch guide to Python implementation

  • Rewrites models to Pydantic (Customer, Product, DateDimension, Store, OrderItem) and adds OlapTable configs (ReplacingMergeTree for dims, MergeTree for facts); updates discovery via app/__init__.py
  • Replaces TypeScript setup with Python venv/pip workflow; updates project structure and commands
  • Adds Python consumption API (sales_summary_api) and adjusts endpoints/params
  • Updates date dimension SQL to use explicit CASE for day_name; clarifies numbers() technique and performance notes
  • Adds Python best practices section (venv, Pydantic validation, deps, error handling, testing)
  • Minor docs polish: convert custom Note blocks to blockquotes, tweak performance table formatting

Written by Cursor Bugbot for commit a55a203. This will update automatically on new commits.

514Ben and others added 3 commits January 19, 2026 14:41
…static report guide

Replace <Note> JSX components with standard markdown blockquote syntax (>) to fix MDX compilation errors in next-mdx-remote. Custom JSX components require explicit component registration which wasn't configured.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Wrap `<100ms` and `<1 second` in backticks to prevent MDX from parsing them as JSX tag names. The `<1` pattern was causing "Unexpected character 1 before name" compilation errors.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Convert all code examples in the data warehouse guide to use Python and
py-moose-lib instead of TypeScript. Changes include:

- Update project initialization to use Python venv and pip
- Convert all data models from TypeScript interfaces to Pydantic classes
- Add OlapTable definitions with correct ClickHouse engines
- Update Consumption API to use Python Api pattern
- Fix date dimension SQL syntax (CASE instead of formatDateTime)
- Fix fact table primary key issue (remove Key[int] from order_item_key)
- Add Python-specific best practices section
- Add technical notes explaining date dimension techniques
- Update file paths from .ts to .py and models/ to datamodels/

All conversions were tested end-to-end with working data loading,
queries, date dimension population, and Consumption API.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

vercel bot commented Jan 20, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Review | Updated (UTC) |
| --- | --- | --- | --- |
| docs-v2 | ✅ Ready | Preview, Comment | Jan 20, 2026 3:14am |


coderabbitai bot commented Jan 20, 2026

Summary by CodeRabbit

  • Documentation
    • Data warehouses guide updated with Python-based examples, API patterns, comprehensive model definitions, and step-by-step instructions for data loading, scheduling, and best practices.
    • Static report generation guide enhanced with improved formatting and visual clarity.


Walkthrough

Documentation replatformed from TypeScript to Python patterns, converting model definitions, API examples, and export strategies across data warehouses and static report guides. Minor formatting updates to note styling and code emphasis.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Data Warehouses Replatforming**<br>`apps/framework-docs-v2/content/guides/data-warehouses.mdx` | Replaced TypeScript interfaces, exports, and API examples with Python Pydantic models, OlapTable definitions, and Python-based patterns. Updated all code examples, file paths, CLI references, and narrative to reflect Python-centric workflows for data modeling, wrangling, loading, and scheduling. |
| **Static Report Formatting**<br>`apps/framework-docs-v2/content/guides/static-report-generation.mdx` | Converted HTML `<Note>` components to blockquote syntax and wrapped time values in backticks for code-style emphasis. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • callicles

Poem

🐍 From TypeScript shores to Python lands,
Models transform through skilled hands,
Pydantic fields dance with grace,
OlapTables find their place,
Docs reborn in new attire,
Code examples reach higher.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | Title accurately summarizes the primary change: converting the data warehouse guide from TypeScript to Python. |
| Description check | ✅ Passed | Description comprehensively covers the changeset: code conversions, bug fixes, documentation enhancements, testing details, and impact on developers. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |





@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@apps/framework-docs-v2/content/guides/data-warehouses.mdx`:
- Around lines 2171-2185: The Product model uses the Pydantic v1 `@validator` decorator, which is incompatible with pydantic>=2; replace it with the v2 field validator API: import `field_validator` from pydantic, change the decorator on `Product.name_must_not_be_empty` from `@validator('name')` to `@field_validator('name')`, and update the function signature to the v2 form (accepting cls and value); perform the same strip/empty check and return the cleaned value. Ensure Product still imports BaseModel and Field as before.
- Around line 732-743: The snippet defines dim_date using MergeTreeEngine()
which contradicts the guide's recommendation to use ReplacingMergeTreeEngine()
for dimensions; update the OlapConfig for dim_date to use
ReplacingMergeTreeEngine() instead of MergeTreeEngine(), or if you intentionally
want the non-replacing engine, add a brief inline comment next to the dim_date
declaration (referencing dim_date, OlapTable, OlapConfig, MergeTreeEngine,
ReplacingMergeTreeEngine, and DateDimension) explaining why the date dimension
is static and why MergeTreeEngine() was chosen to avoid confusion.
- Around line 2284-2288: The test snippet uses pytest.raises(ValueError) but
Pydantic raises pydantic.ValidationError for missing required fields; update the
test to use pytest.raises(ValidationError) and add the appropriate import for
ValidationError from pydantic so the invalid Customer(...) (the Customer model
instantiation) is asserted correctly.
- Around line 1801-1830: The SQL query in sales_summary_handler uses untyped
placeholders; update the query string to use typed placeholders `{start_date:
String}` and `{end_date: String}` and keep the params mapping passed to
client.query.execute unchanged (i.e., still provide params.start_date and
params.end_date), so the call in client.query.execute continues to safely
parameterize values per the moose-lib API.
📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between cc5196c and a55a203.

📒 Files selected for processing (2)
  • apps/framework-docs-v2/content/guides/data-warehouses.mdx
  • apps/framework-docs-v2/content/guides/static-report-generation.mdx
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: examples/cdp-analytics/CLAUDE.md:0-0
Timestamp: 2026-01-15T19:41:53.549Z
Learning: Data models should be compatible with major analytics platforms: GA4, Segment, Meta CAPI, and Square, with consistent field mappings documented in the data models table
📚 Learning: 2026-01-15T19:41:53.549Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: examples/cdp-analytics/CLAUDE.md:0-0
Timestamp: 2026-01-15T19:41:53.549Z
Learning: For slow dashboard queries (>500ms) that aggregate across multiple tables or need cohort-based breakdowns, implement a materialized view instead of real-time queries

Applied to files:

  • apps/framework-docs-v2/content/guides/static-report-generation.mdx
📚 Learning: 2026-01-15T19:41:53.549Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: examples/cdp-analytics/CLAUDE.md:0-0
Timestamp: 2026-01-15T19:41:53.549Z
Learning: Data models should be compatible with major analytics platforms: GA4, Segment, Meta CAPI, and Square, with consistent field mappings documented in the data models table

Applied to files:

  • apps/framework-docs-v2/content/guides/data-warehouses.mdx
📚 Learning: 2026-01-15T19:41:53.549Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: examples/cdp-analytics/CLAUDE.md:0-0
Timestamp: 2026-01-15T19:41:53.549Z
Learning: Applies to examples/cdp-analytics/app/views/*.ts : Use SummingMergeTree engine for materialized views that require incremental updates, and wrap aggregated metrics with SUM() when querying to properly merge partial rows

Applied to files:

  • apps/framework-docs-v2/content/guides/data-warehouses.mdx
📚 Learning: 2025-12-16T23:09:00.546Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:00.546Z
Learning: When changing MooseStack functionality, ALWAYS run end-to-end tests located in `apps/framework-cli-e2e`

Applied to files:

  • apps/framework-docs-v2/content/guides/data-warehouses.mdx
📚 Learning: 2026-01-15T19:41:53.549Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: examples/cdp-analytics/CLAUDE.md:0-0
Timestamp: 2026-01-15T19:41:53.549Z
Learning: Applies to examples/cdp-analytics/app/ingest/models.ts : Data models in ingest/models.ts should define interfaces using the Model naming convention, and create corresponding IngestPipeline instances with table, stream, and ingestApi options enabled

Applied to files:

  • apps/framework-docs-v2/content/guides/data-warehouses.mdx
🔇 Additional comments (9)
apps/framework-docs-v2/content/guides/static-report-generation.mdx (2)

195-195: LGTM: MDX syntax fixes.

Replacing <Note> JSX with blockquotes fixes next-mdx-remote compilation issues.

Also applies to: 290-290


877-878: LGTM: Escaped less-than signs.

Wrapping <100ms and <1 second in backticks prevents MDX from interpreting them as malformed JSX tags.

apps/framework-docs-v2/content/guides/data-warehouses.mdx (7)

374-379: LGTM: Python project setup.

Correct venv activation syntax for Unix. Windows alternative documented inline.


479-529: LGTM: Customer dimension model.

Pydantic model with Optional defaults and ReplacingMergeTreeEngine for deduplication is correct for dimension tables.


671-707: LGTM: Fact table design.

Correct approach: order_item_key is plain int (not Key[int]) since the composite order_by_fields becomes the ClickHouse primary key. Documentation at lines 771-773 explains this clearly.

Also applies to: 758-769


777-802: LGTM: Module exports.

All models and OlapTable instances properly exported with __all__ list.


1270-1278: LGTM: Date dimension SQL fix.

CASE statement correctly replaces formatDateTime('%A') which isn't supported in ClickHouse for day names.


624-625: LGTM: Technical notes.

Clear explanations of numbers() table function and date dimension performance benefits.

Also applies to: 1299-1300


469-469: LGTM: Python migration overall.

Comprehensive conversion from TypeScript to Python. Pydantic models, OlapTable patterns, and Python conventions correctly applied throughout.


Comment on lines +732 to +743
**DateDimension** (add to `app/datamodels/DateDimension.py`):
```python
from moose_lib import OlapTable, OlapConfig, MergeTreeEngine

dim_date = OlapTable[DateDimension](
    "dim_date",
    OlapConfig(
        engine=MergeTreeEngine(),
        order_by_fields=["date_key"],
    )
)
```

⚠️ Potential issue | 🟡 Minor

Inconsistent engine choice for date dimension.

Line 537 states: "For dimensions, use ReplacingMergeTreeEngine()". However, dim_date uses MergeTreeEngine(). While date dimensions are typically generated once and static, this contradicts the documented pattern and may confuse readers.

Consider either:

  1. Using ReplacingMergeTreeEngine() for consistency, or
  2. Adding a comment explaining why date dimension differs
🤖 Prompt for AI Agents
In `@apps/framework-docs-v2/content/guides/data-warehouses.mdx` around lines 732 -
743, The snippet defines dim_date using MergeTreeEngine() which contradicts the
guide's recommendation to use ReplacingMergeTreeEngine() for dimensions; update
the OlapConfig for dim_date to use ReplacingMergeTreeEngine() instead of
MergeTreeEngine(), or if you intentionally want the non-replacing engine, add a
brief inline comment next to the dim_date declaration (referencing dim_date,
OlapTable, OlapConfig, MergeTreeEngine, ReplacingMergeTreeEngine, and
DateDimension) explaining why the date dimension is static and why
MergeTreeEngine() was chosen to avoid confusion.
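To make the trade-off behind this comment concrete: ReplacingMergeTree deduplicates rows sharing the same sorting key after background merges, keeping one survivor (the last inserted, or the one with the highest version column). A minimal in-memory analogue of that semantics — an analogy only, not ClickHouse's actual merge algorithm:

```python
def replacing_merge(rows, key_field):
    """Keep only the last-inserted row per key, as after a ReplacingMergeTree merge."""
    merged = {}
    for row in rows:  # later inserts overwrite earlier ones for the same key
        merged[row[key_field]] = row
    return list(merged.values())

# An updated dimension row supersedes the original on merge.
rows = [
    {"customer_key": 1, "name": "Ada"},
    {"customer_key": 1, "name": "Ada Lovelace"},
]
```

A static, generate-once table like `dim_date` never produces duplicate keys, which is why plain MergeTree can be a deliberate choice there.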

Comment on lines +1801 to +1830
```python
def sales_summary_handler(client: MooseClient, params: SalesSummaryParams):
    """Get daily sales summary for date range"""

    query = """
    SELECT
        toString(d.date) as date,
        SUM(f.total_amount) as revenue,
        COUNT(DISTINCT f.order_id) as orders,
        SUM(f.quantity) as units
    FROM fact_order_items f
    JOIN dim_date d ON f.order_date_key = d.date_key
    WHERE d.date BETWEEN {start_date} AND {end_date}
    GROUP BY d.date
    ORDER BY d.date
    """

    results = client.query.execute(
        query,
        {
            "start_date": params.start_date,
            "end_date": params.end_date,
        }
    )

    return results


sales_summary_api = Api[SalesSummaryParams, SalesSummaryResult](
    name="sales_summary",
    query_function=sales_summary_handler
)
```

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

moose-lib Python client.query.execute SQL parameterization safety

💡 Result:

Short answer: use the library's typed placeholders — yes, client.query.execute in moose-lib’s Python API supports typed parameterization (e.g. {table: Identifier}, {limit: Int32}) and passing a params dict so values and identifiers are escaped/handled by the library rather than interpolated raw, which prevents normal SQL-injection vectors when used correctly. See the Python example in the moose-lib docs showing client.query.execute("SELECT * FROM {table: Identifier} LIMIT {limit: Int32}", { "table": ..., "limit": ... }). [1][2]

Notes / guidance:

  • Use the provided typed placeholders exactly (Identifier for table/column names, numeric/string types for literals). [1]
  • Do NOT build SQL by concatenating user input into raw strings; if you must allow dynamic identifiers, validate against an allowlist of known table/column names before passing them as Identifier. [2]
  • If you want extra assurance, test edge cases (quotes, SQL control chars) and review the library’s source or docs for how each placeholder type is escaped/serialized.

Sources:
[1] moose-lib (PyPI) examples showing Python client.query.execute with typed placeholders.
[2] Modular Moose / Moose Query documentation (usage and query examples).


🌐 Web query:

moose framework client query execute parameters documentation

💡 Result:

There are two main Moose projects with a client.query.execute pattern — the 514-labs “Moose” analytics backend (moose-lib/moosestack) and the MooseStack/ORM clients. Key docs show how to call client.query.execute and pass parameters:

  • TypeScript (moose-lib): use the sql tagged template with embedded identifiers/values; example:
    const result = await client.query.execute(sql`SELECT * FROM ${clickhouseTable} LIMIT ${limit}`).
    Source: official Moose docs / examples. [1]

  • Python (moose-lib): call client.query.execute with a SQL string that uses Moose’s placeholder syntax and a parameters object. Example:
    return client.query.execute("SELECT * FROM {table: Identifier} LIMIT {limit: Int32}", {
    "table": clickhouse_table.name,
    "limit": params.limit,
    }).
    Source: moosestack GitHub README / examples. [2]

References

  • Moose docs / examples (moose-lib): docs and code showing client.query.execute with sql`` tagged template in TypeScript. [1]
  • moosestack GitHub README / language examples (Python + TS parameterized calls). [2]

🏁 Script executed:

```shell
# Search repository for moose_lib or moose-client implementation/documentation
find . -type f -name "*.py" -o -name "*.md" -o -name "*.txt" | head -20
git ls-files | grep -iE "(moose|client)" | head -20
```

Repository: 514-labs/moosestack

Length of output: 2080


🏁 Script executed:

```shell
# Search for actual client.query.execute usage in the repository
rg "client\.query\.execute" -A 5 -B 2 --type py | head -50
```

Repository: 514-labs/moosestack

Length of output: 3027


🏁 Script executed:

```shell
# Look at the documentation file to get full context
head -1835 apps/framework-docs-v2/content/guides/data-warehouses.mdx | tail -40
```

Repository: 514-labs/moosestack

Length of output: 983


🏁 Script executed:

```shell
# Search for moose client examples in the codebase
fd -e py | xargs grep -l "client\.query" 2>/dev/null | head -10
```

Repository: 514-labs/moosestack

Length of output: 592


Use typed placeholders per moose-lib API.

client.query.execute() safely parameterizes values, but the example should use typed placeholders: {start_date: String} and {end_date: String} instead of untyped {start_date} and {end_date}.

🤖 Prompt for AI Agents
In `@apps/framework-docs-v2/content/guides/data-warehouses.mdx` around lines 1801
- 1830, The SQL query in sales_summary_handler uses untyped placeholders; update
the query string to use typed placeholders `{start_date: String}` and
`{end_date: String}` and keep the params mapping passed to client.query.execute
unchanged (i.e., still provide params.start_date and params.end_date), so the
call in client.query.execute continues to safely parameterize values per the
moose-lib API.
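For readers unfamiliar with the syntax: typed placeholders name both the parameter and its ClickHouse type, e.g. `{start_date: String}`. moose-lib and ClickHouse do the real binding and escaping server-side; the toy pattern below only illustrates what the placeholder syntax encodes:

```python
import re

# Extract (name, type) pairs from typed placeholders like {start_date: String}.
# Purely illustrative -- not how moose-lib parses or binds parameters.
PLACEHOLDER = re.compile(r"\{(\w+):\s*(\w+)\}")

clause = "WHERE d.date BETWEEN {start_date: String} AND {end_date: String}"
placeholders = dict(PLACEHOLDER.findall(clause))
```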

Comment on lines +2171 to +2185
```python
from pydantic import BaseModel, Field, validator
from datetime import datetime

class Product(BaseModel):
    product_key: int
    name: str
    price: float = Field(gt=0, description="Price must be positive")

    @validator('name')
    def name_must_not_be_empty(cls, v):
        if not v or not v.strip():
            raise ValueError('Product name cannot be empty')
        return v.strip()
```

⚠️ Potential issue | 🟠 Major

Pydantic v2 syntax mismatch.

Line 2193 specifies pydantic>=2.0.0, but line 2172 uses @validator, which is Pydantic v1 syntax. Pydantic v2 requires @field_validator with a different signature.

Fix for Pydantic v2
-from pydantic import BaseModel, Field, validator
+from pydantic import BaseModel, Field, field_validator
 from datetime import datetime

 class Product(BaseModel):
     product_key: int
     name: str
     price: float = Field(gt=0, description="Price must be positive")

-    `@validator`('name')
-    def name_must_not_be_empty(cls, v):
+    `@field_validator`('name')
+    `@classmethod`
+    def name_must_not_be_empty(cls, v: str) -> str:
         if not v or not v.strip():
             raise ValueError('Product name cannot be empty')
         return v.strip()

Also applies to: 2193-2198

🤖 Prompt for AI Agents
In `@apps/framework-docs-v2/content/guides/data-warehouses.mdx` around lines 2171
- 2185, The Product model uses Pydantic v1 `@validator` which is incompatible with
pydantic>=2; replace it with the v2 field validator API: import field_validator
from pydantic, change the decorator on Product.name_must_not_be_empty from
`@validator`('name') to `@field_validator`('name') and update the function signature
to the v2 form (accepting cls and value), perform the same strip/empty check and
return the cleaned value; ensure Product still imports BaseModel and Field as
before.

Comment on lines +2284 to +2288

```python
# Invalid customer (missing required field)
with pytest.raises(ValueError):
    Customer(customer_key=1, email="[email protected]")
```

⚠️ Potential issue | 🟡 Minor

Incorrect exception type in test example.

Pydantic raises ValidationError for missing required fields, not ValueError.

Fix exception type
```diff
+from pydantic import ValidationError

 def test_customer_validation():
     # ...

     # Invalid customer (missing required field)
-    with pytest.raises(ValueError):
+    with pytest.raises(ValidationError):
         Customer(customer_key=1, email="[email protected]")
```
🤖 Prompt for AI Agents
In `@apps/framework-docs-v2/content/guides/data-warehouses.mdx` around lines 2284
- 2288, The test snippet uses pytest.raises(ValueError) but Pydantic raises
pydantic.ValidationError for missing required fields; update the test to use
pytest.raises(ValidationError) and add the appropriate import for
ValidationError from pydantic so the invalid Customer(...) (the Customer model
instantiation) is asserted correctly.
