feat(security): WASM grammar validation and npm audit CI#834
feat(security): WASM grammar validation and npm audit CI#834carlos-alm merged 4 commits intomainfrom
Conversation
…o CI After each tree-sitter grammar is compiled to .wasm, validate it is a legitimate grammar: correct WASM magic bytes, exactly 2 exports (tree_sitter_<lang> + __wasm_apply_data_relocs), and only safe C runtime imports (memory, string, character classification — no I/O or syscalls). Suspicious grammars are deleted with a security warning. Add npm audit job to CI pipeline gating on high+ severity vulnerabilities in production dependencies.
|
Claude finished @carlos-alm's task —— View job Reviewing PR: WASM grammar validation and npm audit CIReview Progress
Root Cause AnalysisProblem Being Solved: Supply chain security for tree-sitter WASM grammars. While npm lockfiles prevent tampered packages, a compromised grammar could theoretically produce malicious WASM binaries. The PR adds validation to ensure only legitimate tree-sitter grammars reach runtime. Why This Approach: Post-build validation is the right defense-in-depth layer. The validation checks are appropriate:
Risk Assessment: The npm audit addition for CI is standard security practice. The WASM validation is proportionate - it catches obviously malicious grammars while allowing all legitimate ones. Phase 2: Configuration Change Detection (✅ PASSED)CI Changes Analysis:
WASM Validation Changes:
These are defensive security improvements, not rule relaxations. No red flags detected. Phase 3: Test Integrity Review (✅ PASSED)Test Coverage Analysis:
Test Plan Assessment: The PR author tested locally with all 22 grammars. The validation logic is simple and deterministic. Phase 4: Code Quality Assessment (✅ PASSED with minor observations)Security Logic Review (scripts/build-wasm.ts:53-81):
Minor Code Quality Notes:
CI Integration:
Backlog Compliance
Critical ConcernsNone. This is a well-implemented security enhancement. Final Recommendation
Minor suggestions for follow-up (not blocking):
|
Fixes GHSA-j3q9-mxjg-w52f and GHSA-27v5-c462-wpq7 (high severity).
Greptile SummaryThis PR adds two supply-chain security controls: post-build WASM grammar validation ( Confidence Score: 5/5Safe to merge; remaining findings are P2 style suggestions that do not affect correctness or security in practice. All three prior P1 concerns are resolved. The two open items (shell: true anti-pattern and __wasm_apply_data_relocs check by count vs. by name) are non-blocking improvements — the import allowlist provides the real security boundary regardless, and grammarDir is constructed from hardcoded package paths. scripts/build-wasm.ts — shell: true on execFileSync and __wasm_apply_data_relocs name check are worth a follow-up cleanup. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[For each grammar in list] --> B[execFileSync: npx tree-sitter build --wasm]
B -->|build fails| C[failed++\nwarn + continue]
B -->|build succeeds| D{.wasm file exists?}
D -->|no| E[failed++\nwarn + continue]
D -->|yes| F[validateGrammar]
F --> G{WASM magic bytes OK?}
G -->|no| H[push error\nreturn early]
G -->|yes| I[WebAssembly.compile]
I --> J{Exactly 1 tree_sitter_ export?}
J -->|no| K[push error]
J -->|yes| L{Export name matches expectedName?}
L -->|no| M[push error]
L -->|yes| N{exports.length >= 2?}
N -->|no| O[push error]
N -->|yes| P{All imports allowlisted?}
P -->|no| Q[push error]
P -->|yes| R{errors.length > 0?}
K --> R
M --> R
O --> R
Q --> R
R -->|yes| S[rejected++\nunlinkSync .wasm\nlog REJECTED]
R -->|no| T[log OK validated]
Reviews (2): Last reviewed commit: "fix(security): verify export name, relax..." | Re-trigger Greptile |
scripts/build-wasm.ts
Outdated
|
|
||
| const WASM_MAGIC = '0061736d'; | ||
|
|
||
| async function validateGrammar(wasmPath: string, _expectedName: string): Promise<string[]> { |
There was a problem hiding this comment.
Export name never verified against expected grammar
_expectedName is accepted but intentionally discarded, so the validation never confirms that the exported tree_sitter_* symbol matches the grammar being compiled. A compromised package could produce a WASM that exports tree_sitter_evil instead of tree_sitter_javascript, pass every check here (magic bytes ✓, export count 2 ✓, allowlisted imports ✓), and quietly replace the legitimate grammar. The expected export name is deterministic — g.name.replace(/-/g, '_') — so the check is straightforward to add.
| async function validateGrammar(wasmPath: string, _expectedName: string): Promise<string[]> { | |
| async function validateGrammar(wasmPath: string, expectedName: string): Promise<string[]> { |
Then after const tsExports = ...:
const expectedExport = expectedName.replace(/-/g, '_');
if (tsExports.length === 1 && tsExports[0] !== expectedExport) {
errors.push(`expected export '${expectedExport}', found '${tsExports[0]}'`);
}There was a problem hiding this comment.
Fixed in b905377. The expectedName parameter is now wired in: after extracting the tree_sitter_* export, we derive the expected export name via expectedName.replace(/-/g, '_') and verify it matches. A mismatched export (e.g. tree_sitter_evil instead of tree_sitter_javascript) now produces a validation error and the WASM file is deleted.
scripts/build-wasm.ts
Outdated
| if (exports.length !== 2) { | ||
| errors.push(`expected 2 exports, found ${exports.length}: [${exports.join(', ')}]`); | ||
| } |
There was a problem hiding this comment.
Hard-coded export count may be too brittle
Pinning total exports to exactly 2 assumes tree-sitter's WASM ABI is frozen at this count. Some grammars already ship with additional exports (e.g. external scanner symbols prefixed tree_sitter_), and future tree-sitter versions or parser-specific helpers could add more. A more stable check is to enforce exports.length >= 2 (at least the one tree_sitter_* export plus __wasm_apply_data_relocs) rather than a strict equality, keeping the security signal while reducing the risk of spurious rejections when the 22 grammars are regenerated against a newer tree-sitter-cli.
There was a problem hiding this comment.
Fixed in b905377. Changed the export count check from exports.length !== 2 to exports.length < 2. This ensures at least the expected tree_sitter_* export and __wasm_apply_data_relocs are present, while allowing additional exports that future tree-sitter versions or parser-specific helpers may add.
| audit: | ||
| runs-on: ubuntu-latest | ||
| name: Security audit | ||
| steps: | ||
| - uses: actions/checkout@v6 | ||
|
|
||
| - name: Setup Node.js | ||
| uses: actions/setup-node@v6 | ||
| with: | ||
| node-version: 22 | ||
|
|
||
| - name: Audit production dependencies | ||
| run: npm audit --omit=dev --audit-level=high |
There was a problem hiding this comment.
npm audit without npm install skips lockfile generation
npm audit works from package-lock.json so it doesn't need node_modules, but it does need npm install to have been run at least once to produce or refresh the lockfile. If the lockfile is stale relative to package.json (e.g. after a dependency bump without a corresponding lockfile commit), the audit runs against outdated resolution data and can silently miss newly introduced vulnerabilities. Adding npm install (or npm ci) before the audit step — mirroring the pattern used by lint, test, and typecheck — keeps the results authoritative and consistent.
There was a problem hiding this comment.
Fixed in b905377. Added an npm ci step before the audit step to ensure the lockfile is always fresh and consistent with package.json, matching the pattern used by lint, test, and typecheck jobs.
Codegraph Impact Analysis1 functions changed → 1 callers affected across 1 files
|
…audit job - Wire expectedName param to validate tree_sitter_ export matches the grammar being built (prevents substitution attacks) - Relax export count check from == 2 to >= 2 to accommodate future tree-sitter ABI changes while still catching empty modules - Add npm ci before npm audit in CI to ensure lockfile is fresh
Summary
.wasm, validate it is a legitimate grammar — correct WASM magic bytes, exactly 2 exports (tree_sitter_<lang>+__wasm_apply_data_relocs), and only safe C runtime imports (memory, string, character classification — no I/O, no syscalls). Suspicious grammars are deleted with a security warning.auditjob gates the CI pipeline on high+ severity vulnerabilities in production dependencies (npm audit --omit=dev --audit-level=high).Context
Tree-sitter grammar packages are external dependencies that compile to WASM. While lockfile integrity hashes prevent tampered tarballs, a compromised grammar could theoretically produce a malicious
.wasmbinary. The validation ensures only legitimate tree-sitter grammars (with a strict, predictable structure) reach runtime.Test plan
build-wasm.tstested locally — all 22 grammars built and passed validationauditjob passes (may neednpm audit fixforpath-to-regexp— tracked separately)