feat(security): WASM grammar validation and npm audit CI by carlos-alm · Pull Request #834 · optave/ops-codegraph-tool

carlos-alm · 2026-04-04T11:23:37Z

Summary

WASM grammar validation: After each tree-sitter grammar is compiled to .wasm, validate it is a legitimate grammar — correct WASM magic bytes, exactly 2 exports (tree_sitter_<lang> + __wasm_apply_data_relocs), and only safe C runtime imports (memory, string, character classification — no I/O, no syscalls). Suspicious grammars are deleted with a security warning.
npm audit in CI: New audit job gates the CI pipeline on high+ severity vulnerabilities in production dependencies (npm audit --omit=dev --audit-level=high).

Context

Tree-sitter grammar packages are external dependencies that compile to WASM. While lockfile integrity hashes prevent tampered tarballs, a compromised grammar could theoretically produce a malicious .wasm binary. The validation ensures only legitimate tree-sitter grammars (with a strict, predictable structure) reach runtime.

Test plan

build-wasm.ts tested locally — all 22 grammars built and passed validation
CI audit job passes (may need npm audit fix for path-to-regexp — tracked separately)
Verify validation catches a bad WASM file (manually corrupt one to test rejection)

…o CI After each tree-sitter grammar is compiled to .wasm, validate it is a legitimate grammar: correct WASM magic bytes, exactly 2 exports (tree_sitter_<lang> + __wasm_apply_data_relocs), and only safe C runtime imports (memory, string, character classification — no I/O or syscalls). Suspicious grammars are deleted with a security warning. Add npm audit job to CI pipeline gating on high+ severity vulnerabilities in production dependencies.

claude · 2026-04-04T11:23:50Z

Claude finished @carlos-alm's task —— View job

Reviewing PR: WASM grammar validation and npm audit CI

Review Progress

Read and analyze changed files
Phase 1: Root cause analysis
Phase 2: Configuration change detection
Phase 3: Test integrity review
Phase 4: Code quality assessment
Phase 5: Backlog compliance evaluation
Final recommendation

Root Cause Analysis

Problem Being Solved: Supply chain security for tree-sitter WASM grammars. While npm lockfiles prevent tampered packages, a compromised grammar could theoretically produce malicious WASM binaries. The PR adds validation to ensure only legitimate tree-sitter grammars reach runtime.

Why This Approach: Post-build validation is the right defense-in-depth layer. The validation checks are appropriate:

WASM magic bytes (ensures legitimate WASM)
Exactly 2 exports (tree_sitter_<lang> + __wasm_apply_data_relocs - standard tree-sitter pattern)
Only safe C runtime imports (memory management, string ops, character classification - no I/O or syscalls)

Risk Assessment: The npm audit addition for CI is standard security practice. The WASM validation is proportionate - it catches obviously malicious grammars while allowing all legitimate ones.

Phase 2: Configuration Change Detection (✅ PASSED)

CI Changes Analysis:

✅ No relaxed rules: The new audit job is additive security, not relaxation
✅ Appropriate audit level: --audit-level=high focuses on production dependencies (--omit=dev)
✅ Proper CI integration: Added to the needs: array in ci-pipeline job - will gate deployments on security audit failures
✅ No timeout increases or coverage reductions

WASM Validation Changes:

✅ Security enhancement: Validates WASM binaries against malicious imports/exports
✅ Fail-safe behavior: Deletes suspicious binaries and continues (doesn't break builds)
✅ Clear security messaging: Logs "SECURITY:" warnings for rejected grammars

These are defensive security improvements, not rule relaxations. No red flags detected.

Phase 3: Test Integrity Review (✅ PASSED)

Test Coverage Analysis:

✅ No test weakening: No tests were modified or removed
✅ No assertion changes: No weakened test expectations
✅ Missing test coverage: No tests exist for validateGrammar(), but this is acceptable for security validation code that:
- Has clear, deterministic logic (magic bytes, export counts, import allowlists)
- Is manually tested per PR description ("all 22 grammars built and passed validation")
- Fails safe (deletes suspicious files, continues operation)

Test Plan Assessment: The PR author tested locally with all 22 grammars. The validation logic is simple and deterministic.

Phase 4: Code Quality Assessment (✅ PASSED with minor observations)

Security Logic Review (scripts/build-wasm.ts:53-81):

✅ WASM magic validation: Correctly checks 0061736d magic bytes
✅ Export validation: Enforces exactly 2 exports (tree_sitter function + data relocs)
✅ Import allowlist: Comprehensive C runtime allowlist, no I/O or syscalls
✅ Error handling: Returns detailed error arrays, handles edge cases
✅ File cleanup: Uses unlinkSync() to remove suspicious files

Minor Code Quality Notes:

_expectedName parameter is unused (scripts/build-wasm.ts:53) - could validate export name matches
Uses shell: true in execFileSync (line 133) - acceptable here since args are controlled
Could use more TypeScript typing for WebAssembly API calls, but current code is clear

CI Integration:

✅ Proper npm audit flags: --omit=dev --audit-level=high is industry standard
✅ No npm install: Audit job doesn't need dependencies, avoiding bootstrap issues

Backlog Compliance

Zero-dep: ✅ — Uses Node.js built-in WebAssembly API, no new runtime dependencies
Foundation-aligned: ✅ — Defensive security aligns with P1 (always-current) and P4 (zero-cost core)
Problem-fit: 3 — Security validation is useful but doesn't directly address "lost AI" problem
Breaking: No — Purely additive changes to build process and CI
Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Critical Concerns

None. This is a well-implemented security enhancement.

Final Recommendation

Rating: ⭐⭐⭐⭐☆ (4/5)
Action: APPROVE
Reasoning: Solid defensive security implementation. WASM validation logic is sound, CI audit integration follows best practices, and changes are purely additive. The unused _expectedName parameter and lack of tests are minor issues that don't affect security posture. This adds meaningful supply chain protection without breaking existing functionality.

Minor suggestions for follow-up (not blocking):

Consider using _expectedName to validate export function name matches expected pattern
Could add a simple integration test that validates a known-good WASM file passes and a crafted malicious one fails

Fixes GHSA-j3q9-mxjg-w52f and GHSA-27v5-c462-wpq7 (high severity).

greptile-apps · 2026-04-04T11:25:48Z

Greptile Summary

This PR adds two supply-chain security controls: post-build WASM grammar validation (scripts/build-wasm.ts) that checks magic bytes, a named tree_sitter_* export matching the expected grammar, a total export count floor, and a strict C-runtime import allowlist; and a new CI audit job gating the pipeline on high-severity production vulnerabilities. All three concerns raised in prior review rounds (export-name verification, strict-vs-floor export count, and npm ci before audit) have been addressed in the follow-up commit.

Confidence Score: 5/5

Safe to merge; remaining findings are P2 style suggestions that do not affect correctness or security in practice.

All three prior P1 concerns are resolved. The two open items (shell: true anti-pattern and __wasm_apply_data_relocs check by count vs. by name) are non-blocking improvements — the import allowlist provides the real security boundary regardless, and grammarDir is constructed from hardcoded package paths.

scripts/build-wasm.ts — shell: true on execFileSync and __wasm_apply_data_relocs name check are worth a follow-up cleanup.

Important Files Changed

Filename	Overview
scripts/build-wasm.ts	Adds post-build WASM validation: magic bytes, single named tree_sitter_* export (now verified by name via expectedName), total export count >= 2, and a strict import allowlist. Two minor gaps remain: shell: true on execFileSync and the __wasm_apply_data_relocs presence check is by count not by name.
.github/workflows/ci.yml	Adds audit job (npm ci + npm audit --omit=dev --audit-level=high) gated by ci-pipeline. npm ci step added per prior review feedback; audit job is wired into ci-pipeline needs. Straightforward and correct.
package-lock.json	Lockfile updated to reflect path-to-regexp bump from 8.3.0 to 8.4.2 (ReDoS fix). No concerns.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[For each grammar in list] --> B[execFileSync: npx tree-sitter build --wasm]
    B -->|build fails| C[failed++\nwarn + continue]
    B -->|build succeeds| D{.wasm file exists?}
    D -->|no| E[failed++\nwarn + continue]
    D -->|yes| F[validateGrammar]
    F --> G{WASM magic bytes OK?}
    G -->|no| H[push error\nreturn early]
    G -->|yes| I[WebAssembly.compile]
    I --> J{Exactly 1 tree_sitter_ export?}
    J -->|no| K[push error]
    J -->|yes| L{Export name matches expectedName?}
    L -->|no| M[push error]
    L -->|yes| N{exports.length >= 2?}
    N -->|no| O[push error]
    N -->|yes| P{All imports allowlisted?}
    P -->|no| Q[push error]
    P -->|yes| R{errors.length > 0?}
    K --> R
    M --> R
    O --> R
    Q --> R
    R -->|yes| S[rejected++\nunlinkSync .wasm\nlog REJECTED]
    R -->|no| T[log OK validated]

_{Reviews (2): Last reviewed commit: "fix(security): verify export name, relax..." | Re-trigger Greptile}

greptile-apps · 2026-04-04T11:25:51Z

scripts/build-wasm.ts

+
+const WASM_MAGIC = '0061736d';
+
+async function validateGrammar(wasmPath: string, _expectedName: string): Promise<string[]> {


Export name never verified against expected grammar

_expectedName is accepted but intentionally discarded, so the validation never confirms that the exported tree_sitter_* symbol matches the grammar being compiled. A compromised package could produce a WASM that exports tree_sitter_evil instead of tree_sitter_javascript, pass every check here (magic bytes ✓, export count 2 ✓, allowlisted imports ✓), and quietly replace the legitimate grammar. The expected export name is deterministic — g.name.replace(/-/g, '_') — so the check is straightforward to add.

Suggested change

async function validateGrammar(wasmPath: string, _expectedName: string): Promise<string[]> {

async function validateGrammar(wasmPath: string, expectedName: string): Promise<string[]> {

Then after const tsExports = ...:

const expectedExport = expectedName.replace(/-/g, '_'); if (tsExports.length === 1 && tsExports[0] !== expectedExport) { errors.push(`expected export '${expectedExport}', found '${tsExports[0]}'`); }

Fixed in b905377. The expectedName parameter is now wired in: after extracting the tree_sitter_* export, we derive the expected export name via expectedName.replace(/-/g, '_') and verify it matches. A mismatched export (e.g. tree_sitter_evil instead of tree_sitter_javascript) now produces a validation error and the WASM file is deleted.

greptile-apps · 2026-04-04T11:25:52Z

scripts/build-wasm.ts

+  if (exports.length !== 2) {
+    errors.push(`expected 2 exports, found ${exports.length}: [${exports.join(', ')}]`);
+  }


Hard-coded export count may be too brittle

Pinning total exports to exactly 2 assumes tree-sitter's WASM ABI is frozen at this count. Some grammars already ship with additional exports (e.g. external scanner symbols prefixed tree_sitter_), and future tree-sitter versions or parser-specific helpers could add more. A more stable check is to enforce exports.length >= 2 (at least the one tree_sitter_* export plus __wasm_apply_data_relocs) rather than a strict equality, keeping the security signal while reducing the risk of spurious rejections when the 22 grammars are regenerated against a newer tree-sitter-cli.

Fixed in b905377. Changed the export count check from exports.length !== 2 to exports.length < 2. This ensures at least the expected tree_sitter_* export and __wasm_apply_data_relocs are present, while allowing additional exports that future tree-sitter versions or parser-specific helpers may add.

greptile-apps · 2026-04-04T11:25:53Z

.github/workflows/ci.yml

+  audit:
+    runs-on: ubuntu-latest
+    name: Security audit
+    steps:
+      - uses: actions/checkout@v6
+
+      - name: Setup Node.js
+        uses: actions/setup-node@v6
+        with:
+          node-version: 22
+
+      - name: Audit production dependencies
+        run: npm audit --omit=dev --audit-level=high


npm audit without npm install skips lockfile generation

npm audit works from package-lock.json so it doesn't need node_modules, but it does need npm install to have been run at least once to produce or refresh the lockfile. If the lockfile is stale relative to package.json (e.g. after a dependency bump without a corresponding lockfile commit), the audit runs against outdated resolution data and can silently miss newly introduced vulnerabilities. Adding npm install (or npm ci) before the audit step — mirroring the pattern used by lint, test, and typecheck — keeps the results authoritative and consistent.

Fixed in b905377. Added an npm ci step before the audit step to ensure the lockfile is always fresh and consistent with package.json, matching the pattern used by lint, test, and typecheck jobs.

github-actions · 2026-04-04T11:31:42Z

Codegraph Impact Analysis

1 functions changed → 1 callers affected across 1 files

validateGrammar in scripts/build-wasm.ts:53 (1 transitive callers)

…audit job - Wire expectedName param to validate tree_sitter_ export matches the grammar being built (prevents substitution attacks) - Relax export count check from == 2 to >= 2 to accommodate future tree-sitter ABI changes while still catching empty modules - Add npm ci before npm audit in CI to ensure lockfile is fresh

carlos-alm · 2026-04-04T20:24:34Z

@greptileai

fix(deps): update path-to-regexp 8.3.0 → 8.4.2 (ReDoS vulnerability)

7df7af0

Fixes GHSA-j3q9-mxjg-w52f and GHSA-27v5-c462-wpq7 (high severity).

greptile-apps bot reviewed Apr 4, 2026

View reviewed changes

Merge branch 'main' into feat/wasm-validation-and-audit

2f1f711

carlos-alm merged commit 5abc131 into main Apr 4, 2026
12 of 13 checks passed

carlos-alm deleted the feat/wasm-validation-and-audit branch April 4, 2026 22:04

github-actions bot locked and limited conversation to collaborators Apr 4, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(security): WASM grammar validation and npm audit CI#834

feat(security): WASM grammar validation and npm audit CI#834
carlos-alm merged 4 commits intomainfrom
feat/wasm-validation-and-audit

carlos-alm commented Apr 4, 2026

Uh oh!

claude bot commented Apr 4, 2026 •

edited

Loading

Uh oh!

greptile-apps bot commented Apr 4, 2026 •

edited

Loading

Uh oh!

greptile-apps bot Apr 4, 2026

Uh oh!

carlos-alm Apr 4, 2026

Uh oh!

greptile-apps bot Apr 4, 2026

Uh oh!

carlos-alm Apr 4, 2026

Uh oh!

greptile-apps bot Apr 4, 2026

Uh oh!

carlos-alm Apr 4, 2026

Uh oh!

github-actions bot commented Apr 4, 2026 •

edited

Loading

Uh oh!

carlos-alm commented Apr 4, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant


		const WASM_MAGIC = '0061736d';

		async function validateGrammar(wasmPath: string, _expectedName: string): Promise<string[]> {

Conversation

carlos-alm commented Apr 4, 2026

Summary

Context

Test plan

Uh oh!

claude bot commented Apr 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewing PR: WASM grammar validation and npm audit CI

Review Progress

Root Cause Analysis

Phase 2: Configuration Change Detection (✅ PASSED)

Phase 3: Test Integrity Review (✅ PASSED)

Phase 4: Code Quality Assessment (✅ PASSED with minor observations)

Backlog Compliance

Critical Concerns

Final Recommendation

Uh oh!

greptile-apps bot commented Apr 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps bot Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Apr 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codegraph Impact Analysis

Uh oh!

carlos-alm commented Apr 4, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude bot commented Apr 4, 2026 •

edited

Loading

greptile-apps bot commented Apr 4, 2026 •

edited

Loading

github-actions bot commented Apr 4, 2026 •

edited

Loading