Skip to content

feat(security): WASM grammar validation and npm audit CI#834

Merged
carlos-alm merged 4 commits intomainfrom
feat/wasm-validation-and-audit
Apr 4, 2026
Merged

feat(security): WASM grammar validation and npm audit CI#834
carlos-alm merged 4 commits intomainfrom
feat/wasm-validation-and-audit

Conversation

@carlos-alm
Copy link
Copy Markdown
Contributor

Summary

  • WASM grammar validation: After each tree-sitter grammar is compiled to .wasm, validate it is a legitimate grammar — correct WASM magic bytes, exactly 2 exports (tree_sitter_<lang> + __wasm_apply_data_relocs), and only safe C runtime imports (memory, string, character classification — no I/O, no syscalls). Suspicious grammars are deleted with a security warning.
  • npm audit in CI: New audit job gates the CI pipeline on high+ severity vulnerabilities in production dependencies (npm audit --omit=dev --audit-level=high).

Context

Tree-sitter grammar packages are external dependencies that compile to WASM. While lockfile integrity hashes prevent tampered tarballs, a compromised grammar could theoretically produce a malicious .wasm binary. The validation ensures only legitimate tree-sitter grammars (with a strict, predictable structure) reach runtime.

Test plan

  • build-wasm.ts tested locally — all 22 grammars built and passed validation
  • CI audit job passes (may need npm audit fix for path-to-regexp — tracked separately)
  • Verify validation catches a bad WASM file (manually corrupt one to test rejection)

…o CI

After each tree-sitter grammar is compiled to .wasm, validate it is a
legitimate grammar: correct WASM magic bytes, exactly 2 exports
(tree_sitter_<lang> + __wasm_apply_data_relocs), and only safe C runtime
imports (memory, string, character classification — no I/O or syscalls).
Suspicious grammars are deleted with a security warning.

Add npm audit job to CI pipeline gating on high+ severity vulnerabilities
in production dependencies.
@claude
Copy link
Copy Markdown

claude bot commented Apr 4, 2026

Claude finished @carlos-alm's task —— View job


Reviewing PR: WASM grammar validation and npm audit CI

Review Progress

  • Read and analyze changed files
  • Phase 1: Root cause analysis
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity review
  • Phase 4: Code quality assessment
  • Phase 5: Backlog compliance evaluation
  • Final recommendation

Root Cause Analysis

Problem Being Solved: Supply chain security for tree-sitter WASM grammars. While npm lockfiles prevent tampered packages, a compromised grammar could theoretically produce malicious WASM binaries. The PR adds validation to ensure only legitimate tree-sitter grammars reach runtime.

Why This Approach: Post-build validation is the right defense-in-depth layer. The validation checks are appropriate:

  • WASM magic bytes (ensures legitimate WASM)
  • Exactly 2 exports (tree_sitter_<lang> + __wasm_apply_data_relocs - standard tree-sitter pattern)
  • Only safe C runtime imports (memory management, string ops, character classification - no I/O or syscalls)

Risk Assessment: The npm audit addition for CI is standard security practice. The WASM validation is proportionate - it catches obviously malicious grammars while allowing all legitimate ones.

Phase 2: Configuration Change Detection (✅ PASSED)

CI Changes Analysis:

  • No relaxed rules: The new audit job is additive security, not relaxation
  • Appropriate audit level: --audit-level=high focuses on production dependencies (--omit=dev)
  • Proper CI integration: Added to the needs: array in ci-pipeline job - will gate deployments on security audit failures
  • No timeout increases or coverage reductions

WASM Validation Changes:

  • Security enhancement: Validates WASM binaries against malicious imports/exports
  • Fail-safe behavior: Deletes suspicious binaries and continues (doesn't break builds)
  • Clear security messaging: Logs "SECURITY:" warnings for rejected grammars

These are defensive security improvements, not rule relaxations. No red flags detected.

Phase 3: Test Integrity Review (✅ PASSED)

Test Coverage Analysis:

  • No test weakening: No tests were modified or removed
  • No assertion changes: No weakened test expectations
  • Missing test coverage: No tests exist for validateGrammar(), but this is acceptable for security validation code that:
    • Has clear, deterministic logic (magic bytes, export counts, import allowlists)
    • Is manually tested per PR description ("all 22 grammars built and passed validation")
    • Fails safe (deletes suspicious files, continues operation)

Test Plan Assessment: The PR author tested locally with all 22 grammars. The validation logic is simple and deterministic.

Phase 4: Code Quality Assessment (✅ PASSED with minor observations)

Security Logic Review (scripts/build-wasm.ts:53-81):

  • WASM magic validation: Correctly checks 0061736d magic bytes
  • Export validation: Enforces exactly 2 exports (tree_sitter function + data relocs)
  • Import allowlist: Comprehensive C runtime allowlist, no I/O or syscalls
  • Error handling: Returns detailed error arrays, handles edge cases
  • File cleanup: Uses unlinkSync() to remove suspicious files

Minor Code Quality Notes:

  • _expectedName parameter is unused (scripts/build-wasm.ts:53) - could validate export name matches
  • Uses shell: true in execFileSync (line 133) - acceptable here since args are controlled
  • Could use more TypeScript typing for WebAssembly API calls, but current code is clear

CI Integration:

  • Proper npm audit flags: --omit=dev --audit-level=high is industry standard
  • No npm install: Audit job doesn't need dependencies, avoiding bootstrap issues

Backlog Compliance

  • Zero-dep: ✅ — Uses Node.js built-in WebAssembly API, no new runtime dependencies
  • Foundation-aligned: ✅ — Defensive security aligns with P1 (always-current) and P4 (zero-cost core)
  • Problem-fit: 3 — Security validation is useful but doesn't directly address "lost AI" problem
  • Breaking: No — Purely additive changes to build process and CI
  • Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Critical Concerns

None. This is a well-implemented security enhancement.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: Solid defensive security implementation. WASM validation logic is sound, CI audit integration follows best practices, and changes are purely additive. The unused _expectedName parameter and lack of tests are minor issues that don't affect security posture. This adds meaningful supply chain protection without breaking existing functionality.

Minor suggestions for follow-up (not blocking):

  • Consider using _expectedName to validate export function name matches expected pattern
  • Could add a simple integration test that validates a known-good WASM file passes and a crafted malicious one fails

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 4, 2026

Greptile Summary

This PR adds two supply-chain security controls: post-build WASM grammar validation (scripts/build-wasm.ts) that checks magic bytes, a named tree_sitter_* export matching the expected grammar, a total export count floor, and a strict C-runtime import allowlist; and a new CI audit job gating the pipeline on high-severity production vulnerabilities. All three concerns raised in prior review rounds (export-name verification, strict-vs-floor export count, and npm ci before audit) have been addressed in the follow-up commit.

Confidence Score: 5/5

Safe to merge; remaining findings are P2 style suggestions that do not affect correctness or security in practice.

All three prior P1 concerns are resolved. The two open items (shell: true anti-pattern and __wasm_apply_data_relocs check by count vs. by name) are non-blocking improvements — the import allowlist provides the real security boundary regardless, and grammarDir is constructed from hardcoded package paths.

scripts/build-wasm.ts — shell: true on execFileSync and __wasm_apply_data_relocs name check are worth a follow-up cleanup.

Important Files Changed

Filename Overview
scripts/build-wasm.ts Adds post-build WASM validation: magic bytes, single named tree_sitter_* export (now verified by name via expectedName), total export count >= 2, and a strict import allowlist. Two minor gaps remain: shell: true on execFileSync and the __wasm_apply_data_relocs presence check is by count not by name.
.github/workflows/ci.yml Adds audit job (npm ci + npm audit --omit=dev --audit-level=high) gated by ci-pipeline. npm ci step added per prior review feedback; audit job is wired into ci-pipeline needs. Straightforward and correct.
package-lock.json Lockfile updated to reflect path-to-regexp bump from 8.3.0 to 8.4.2 (ReDoS fix). No concerns.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[For each grammar in list] --> B[execFileSync: npx tree-sitter build --wasm]
    B -->|build fails| C[failed++\nwarn + continue]
    B -->|build succeeds| D{.wasm file exists?}
    D -->|no| E[failed++\nwarn + continue]
    D -->|yes| F[validateGrammar]
    F --> G{WASM magic bytes OK?}
    G -->|no| H[push error\nreturn early]
    G -->|yes| I[WebAssembly.compile]
    I --> J{Exactly 1 tree_sitter_ export?}
    J -->|no| K[push error]
    J -->|yes| L{Export name matches expectedName?}
    L -->|no| M[push error]
    L -->|yes| N{exports.length >= 2?}
    N -->|no| O[push error]
    N -->|yes| P{All imports allowlisted?}
    P -->|no| Q[push error]
    P -->|yes| R{errors.length > 0?}
    K --> R
    M --> R
    O --> R
    Q --> R
    R -->|yes| S[rejected++\nunlinkSync .wasm\nlog REJECTED]
    R -->|no| T[log OK validated]
Loading

Reviews (2): Last reviewed commit: "fix(security): verify export name, relax..." | Re-trigger Greptile


const WASM_MAGIC = '0061736d';

async function validateGrammar(wasmPath: string, _expectedName: string): Promise<string[]> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Export name never verified against expected grammar

_expectedName is accepted but intentionally discarded, so the validation never confirms that the exported tree_sitter_* symbol matches the grammar being compiled. A compromised package could produce a WASM that exports tree_sitter_evil instead of tree_sitter_javascript, pass every check here (magic bytes ✓, export count 2 ✓, allowlisted imports ✓), and quietly replace the legitimate grammar. The expected export name is deterministic — g.name.replace(/-/g, '_') — so the check is straightforward to add.

Suggested change
async function validateGrammar(wasmPath: string, _expectedName: string): Promise<string[]> {
async function validateGrammar(wasmPath: string, expectedName: string): Promise<string[]> {

Then after const tsExports = ...:

const expectedExport = expectedName.replace(/-/g, '_');
if (tsExports.length === 1 && tsExports[0] !== expectedExport) {
  errors.push(`expected export '${expectedExport}', found '${tsExports[0]}'`);
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b905377. The expectedName parameter is now wired in: after extracting the tree_sitter_* export, we derive the expected export name via expectedName.replace(/-/g, '_') and verify it matches. A mismatched export (e.g. tree_sitter_evil instead of tree_sitter_javascript) now produces a validation error and the WASM file is deleted.

Comment on lines +71 to +73
if (exports.length !== 2) {
errors.push(`expected 2 exports, found ${exports.length}: [${exports.join(', ')}]`);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Hard-coded export count may be too brittle

Pinning total exports to exactly 2 assumes tree-sitter's WASM ABI is frozen at this count. Some grammars already ship with additional exports (e.g. external scanner symbols prefixed tree_sitter_), and future tree-sitter versions or parser-specific helpers could add more. A more stable check is to enforce exports.length >= 2 (at least the one tree_sitter_* export plus __wasm_apply_data_relocs) rather than a strict equality, keeping the security signal while reducing the risk of spurious rejections when the 22 grammars are regenerated against a newer tree-sitter-cli.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b905377. Changed the export count check from exports.length !== 2 to exports.length < 2. This ensures at least the expected tree_sitter_* export and __wasm_apply_data_relocs are present, while allowing additional exports that future tree-sitter versions or parser-specific helpers may add.

Comment on lines +104 to +116
audit:
runs-on: ubuntu-latest
name: Security audit
steps:
- uses: actions/checkout@v6

- name: Setup Node.js
uses: actions/setup-node@v6
with:
node-version: 22

- name: Audit production dependencies
run: npm audit --omit=dev --audit-level=high
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 npm audit without npm install skips lockfile generation

npm audit works from package-lock.json so it doesn't need node_modules, but it does need npm install to have been run at least once to produce or refresh the lockfile. If the lockfile is stale relative to package.json (e.g. after a dependency bump without a corresponding lockfile commit), the audit runs against outdated resolution data and can silently miss newly introduced vulnerabilities. Adding npm install (or npm ci) before the audit step — mirroring the pattern used by lint, test, and typecheck — keeps the results authoritative and consistent.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b905377. Added an npm ci step before the audit step to ensure the lockfile is always fresh and consistent with package.json, matching the pattern used by lint, test, and typecheck jobs.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 4, 2026

Codegraph Impact Analysis

1 functions changed1 callers affected across 1 files

  • validateGrammar in scripts/build-wasm.ts:53 (1 transitive callers)

…audit job

- Wire expectedName param to validate tree_sitter_ export matches the
  grammar being built (prevents substitution attacks)
- Relax export count check from == 2 to >= 2 to accommodate future
  tree-sitter ABI changes while still catching empty modules
- Add npm ci before npm audit in CI to ensure lockfile is fresh
@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit 5abc131 into main Apr 4, 2026
12 of 13 checks passed
@carlos-alm carlos-alm deleted the feat/wasm-validation-and-audit branch April 4, 2026 22:04
@github-actions github-actions bot locked and limited conversation to collaborators Apr 4, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant