Fix component tags and directives inside PHP blocks #61
Closed
calebporzio wants to merge 6 commits into main
The tokenizer's FSM was PHP-unaware — it matched <x-*> patterns inside
raw <?php ?> blocks (comments, strings, etc.), producing broken compiled
output. Blade comments ({{-- --}}) and @php blocks were already stripped
before tokenization, but raw <?php ?> blocks passed through untouched.
Two fixes:
1. Tokenizer: Build a PHP range map via token_get_all() before the FSM
loop. When handleTextState() encounters '<' inside a PHP range,
fast-forward past the block instead of attempting component matching.
Uses PHP's own tokenizer for 100% correct parsing (handles ?> inside
strings, heredocs, unclosed blocks at EOF).
2. Wrapper: Stop restoring rawBlocks early. Keeping @php/@verbatim
content as @__raw_block_N__@ placeholders through the compilation
pipeline protects it from downstream precompilers (e.g. Livewire's
morph precompiler) that would otherwise inject PHP tags into comment
content. Laravel's own restoreRawContent() handles final restoration.
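Fix 1 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: buildPhpRanges() and findComponentOffsets() are hypothetical names, the tokenizer's text state is reduced to a loop that only records "<x-" offsets, and only <?php and <?= open tags are considered.

```php
<?php

/**
 * Hypothetical stand-in for the PR's buildPhpRangeMap(): returns [start, end)
 * byte ranges of raw PHP blocks. Offsets come from summing token lengths,
 * since the tokens returned by token_get_all() cover every input byte.
 */
function buildPhpRanges(string $template): array
{
    // Early exit: without "<?" there is nothing to tokenize.
    if (strpos($template, '<?') === false) {
        return [];
    }

    $ranges = [];
    $offset = 0;
    $start = null;

    foreach (token_get_all($template) as $token) {
        $id = is_array($token) ? $token[0] : null;
        $text = is_array($token) ? $token[1] : $token;

        if ($start === null && ($id === T_OPEN_TAG || $id === T_OPEN_TAG_WITH_ECHO)) {
            $start = $offset;
        }

        $offset += strlen($text);

        if ($start !== null && $id === T_CLOSE_TAG) {
            $ranges[] = [$start, $offset];
            $start = null;
        }
    }

    // A block still open at EOF extends to the end of the template.
    if ($start !== null) {
        $ranges[] = [$start, $offset];
    }

    return $ranges;
}

/**
 * Hypothetical, heavily simplified text state: records offsets of "<x-"
 * and jumps over any PHP block that a "<" falls inside.
 */
function findComponentOffsets(string $template): array
{
    $ranges = buildPhpRanges($template);
    $found = [];

    for ($i = 0, $len = strlen($template); $i < $len; $i++) {
        if ($template[$i] !== '<') {
            continue;
        }

        foreach ($ranges as [$start, $end]) {
            if ($i >= $start && $i < $end) {
                $i = $end - 1; // the loop's $i++ resumes right after the block
                continue 2;
            }
        }

        if (substr($template, $i, 3) === '<x-') {
            $found[] = $i;
        }
    }

    return $found;
}

$template = "<x-card />\n<?php // <x-alert /> ?>\n<?php echo \"?>\"; ?>\n<x-button />";
print_r(findComponentOffsets($template));
```

Because every PHP range starts at the < of its open tag, the scan always reaches a range's first byte before anything inside it, so one jump per block is enough. The tag inside the comment and the ?> inside the string are both skipped.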
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Move tokenizer tests from CommentBugEndToEndTest → TokenizerTest
- Move integration/e2e tests from CommentBugEndToEndTest → IntegrationTest
- Delete CommentBugEndToEndTest.php (bug-named catchall)
- Remove unused BladeService::storeVerbatimBlocks() and restoreRawBlocks()
- Fix style: new Tokenizer() → app(Tokenizer::class), remove decorative separators
- Clarify Wrapper.php comment to be self-explanatory without PR context

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
calebporzio (author) commented:
@ganyicz - I don't see a better way. I think this will fix a lot of these hard-to-track-down issues we've been hearing about.
Problem
Blaze's tokenizer is a text-level FSM that scans for <x-*> patterns to identify component tags. It has zero PHP awareness: no handling of <?php ?> blocks, PHP comments, strings, heredocs, or any other PHP construct.

The compilation pipeline has two preprocessing steps that run before tokenization:

- preStoreUncompiledBlocks() replaces @php/@verbatim/@unblaze blocks with placeholders
- compileComments() strips {{-- --}} Blade comments

This means @php blocks and Blade comments are safely hidden from the tokenizer. But raw <?php ?> blocks are never touched; they pass through to the tokenizer verbatim.

What happens
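As a hypothetical illustration (the template below is made up, not taken from the issue), a PHP-unaware scan for <x- finds a component inside a raw PHP line comment, while PHP's own token_get_all() reports the same bytes as part of a T_COMMENT token. The sketch assumes PHP 8+ for str_contains().

```php
<?php

/*
 * Hypothetical template: a component tag sits inside a raw PHP line comment.
 */
$template = "<div>\n<?php // <x-alert /> ?>\n</div>\n";

// A PHP-unaware text scan, which is effectively what the FSM does, finds a tag.
$naiveOffset = strpos($template, '<x-');

// PHP's own tokenizer classifies the same bytes as part of a comment token.
$insideComment = false;
foreach (token_get_all($template) as $token) {
    if (is_array($token) && $token[0] === T_COMMENT && str_contains($token[1], '<x-alert')) {
        $insideComment = true;
    }
}

printf("naive match at offset %d, inside a PHP comment: %s\n", $naiveOffset, $insideComment ? 'yes' : 'no');
```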
The tokenizer splits the PHP comment into three tokens:
It compiles the component tag into multi-line PHP code:
Line 1's // only comments out the ensureCompiled call. Lines 2+ execute normally, attempting to require_once a file that was never compiled, which crashes.

The same bug affects /* */ block comments (the component is silently swallowed) and any component-like pattern inside PHP strings or arbitrary PHP code.

History
This was partially fixed before: PR #23 fixed @php blocks and PR #25 fixed Blade comments, but raw <?php ?> blocks were never addressed. Issue #55 shows users still hitting this in production. The Flux team worked around it by changing their own code instead of Blaze fixing the underlying issue.

Second issue: early raw block restoration
Separately, Wrapper::wrap() was calling restoreRawBlocks() too early in the pipeline, exposing @php block content to downstream precompilers (e.g. Livewire's morph precompiler) that can inject PHP tags into what should be protected content.

Related: #22, #55
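Why the restoration order matters can be shown with a hypothetical sketch. Only the @__raw_block_N__@ placeholder shape comes from the PR; hideRawBlocks(), revealRawBlocks(), the @php regex, and fakePrecompiler() (which injects a PHP tag before every @if) are invented for illustration and are not Blaze, Laravel, or Livewire code.

```php
<?php

/*
 * Hypothetical helpers. Only the @__raw_block_N__@ placeholder shape is
 * taken from the PR; everything else here is invented for illustration.
 */
function hideRawBlocks(string $template, array &$blocks): string
{
    // Swap each @php ... @endphp block for a numbered placeholder.
    return preg_replace_callback('/@php(.*?)@endphp/s', function (array $m) use (&$blocks) {
        $blocks[] = $m[0];
        return '@__raw_block_' . (count($blocks) - 1) . '__@';
    }, $template);
}

function revealRawBlocks(string $template, array $blocks): string
{
    // Put the original block text back in place of each placeholder.
    return preg_replace_callback('/@__raw_block_(\d+)__@/', fn (array $m) => $blocks[(int) $m[1]], $template);
}

// Stand-in for a downstream precompiler that injects a PHP tag before every @if.
function fakePrecompiler(string $template): string
{
    return str_replace('@if', '<?php /* marker */ ?>@if', $template);
}

$template = <<<'BLADE'
@php
    // notes: @if is just text here
@endphp
@if($show) Hi @endif
BLADE;

$blocks = [];
$stored = hideRawBlocks($template, $blocks);

// Restoring early: the precompiler also rewrites the @php body.
$early = fakePrecompiler(revealRawBlocks($stored, $blocks));

// Deferring restoration: the @php body never reaches the precompiler.
$late = revealRawBlocks(fakePrecompiler($stored), $blocks);

echo $early, "\n---\n", $late, "\n";
```

With early restoration, the injected tag lands inside the // comment of the @php body; once that body is compiled to PHP, the injected ?> ends PHP mode in the middle of the comment. With deferred restoration, the @php body is hidden behind its placeholder while the precompiler runs.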
Alternatives explored
1. Regex preprocessing + rawBlocks placeholders

Replace <?php ... ?> blocks with @__raw_block_N__@ placeholders using regex before tokenization. ~10 lines, follows the existing preStoreUncompiledBlocks() pattern.

Rejected: The regex /<\?(?:php|=).*?\?>/s fails on ?> inside PHP strings (e.g. in <?php echo "?>"; ?> the regex closes at the string's ?>). It also relies on Laravel's rawBlocks system, which has known issues (Issue #43: placeholders leaking to output during php artisan optimize).

2. token_get_all() preprocessing + rawBlocks placeholders

Same as #1 but uses PHP's own tokenizer instead of regex for 100% correct parsing.

Rejected: Correct parsing, but still depends on the fragile rawBlocks placeholder/restore mechanism. Also requires changes to all 4 compilation entry points in BlazeManager (or extracting a shared method).

3. Tokenizer inline <? skip (naive strpos)

In handleTextState(), detect <? and use strpos to find ?>, skipping the block.

Rejected: Same ?>-inside-strings problem as regex. Self-contained (1 file) but not correct.

4. New PHP_BLOCK FSM state

Add a TokenizerState::PHP_BLOCK state that accumulates text until ?>.

Rejected: More code than needed. Same ?> matching problem unless using token_get_all(). Adds complexity to the state machine for no benefit over the chosen approach.

5. Hybrid split-and-reassemble

Use token_get_all() to split the template, pass only T_INLINE_HTML segments to the tokenizer, and reassemble afterward.

Rejected: Significant refactor of the Parser/Tokenizer interface. Overkill for this bug.
6. Whitespace substitution

Replace PHP block content with whitespace, preserving the <?php ?> wrapper.

Rejected: Destroys PHP content. Downstream compilation may need it intact.
7. Upstream Laravel PR

Add <?php ?> blocks to Laravel's storeUncompiledBlocks().

Rejected: Laravel intentionally doesn't store raw PHP blocks; its token_get_all() step in compileString() expects to see them. Would break Laravel's own compilation. Not actionable as a standalone fix.

8. Tokenizer + token_get_all() range map ← Chosen

Before the FSM runs, call token_get_all() once to build a map of PHP block byte ranges. When handleTextState() sees < inside a range, fast-forward past it. See "Why Solution 8" below.

9. Document as limitation

Declare "use {{-- --}} instead."

Rejected: Users are hitting this in production (#55). IDEs and formatters can produce <?php ?> blocks. Not acceptable.

10. Centralize fix in Parser::parse()

Add preprocessing inside Parser::parse() instead of all 4 BlazeManager entry points.

Rejected: Restoration wiring is unclear; Parser shouldn't own preprocessing responsibility.
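To make the rejection of alternatives 1 and 3 concrete, the sketch below compares the regex quoted in alternative 1 with a token_get_all() scan, on the string example from alternative 1 and on a hypothetical heredoc fragment. A naive strpos search for ?> (alternative 3) stops at the same place the regex does. blockByRegex() and blockByTokens() are illustrative names only.

```php
<?php

/* Approach 1 (rejected): the non-greedy regex quoted in the PR. */
function blockByRegex(string $src): string
{
    return preg_match('/<\?(?:php|=).*?\?>/s', $src, $m) ? $m[0] : '';
}

/* Approach 8 (chosen): let PHP's tokenizer decide where the block closes. */
function blockByTokens(string $src): string
{
    $block = '';
    foreach (token_get_all($src) as $token) {
        $block .= is_array($token) ? $token[1] : $token;
        if (is_array($token) && $token[0] === T_CLOSE_TAG) {
            break;
        }
    }
    return $block;
}

$fragments = [
    'string'  => '<?php echo "?>"; ?>',             // example cited in the PR
    'heredoc' => "<?php echo <<<EOT\n?>\nEOT;\n?>", // hypothetical heredoc case
];

foreach ($fragments as $label => $src) {
    printf("%-8s regex: %s\n", $label, json_encode(blockByRegex($src)));
    printf("%-8s token: %s\n", $label, json_encode(blockByTokens($src)));
}
```

In both fragments the regex stops at the first ?> it sees, which sits inside a string or heredoc body, while the tokenizer-based scan runs to the real closing tag.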
Why Solution 8
- token_get_all() is PHP's own tokenizer: it handles ?> in strings, heredocs, and unclosed blocks at EOF. Zero false positives.
- The tokenizer fix touches only Tokenizer.php. No changes to BlazeManager, Parser, or BladeService.
- There is precedent: BladeService::compileStatementsMadePublic() already uses token_get_all() + T_INLINE_HTML in the hot path.
- token_get_all() adds ~67µs per call on a 49KB template (benchmarked). Templates without <? skip it entirely via an early strpos check, so the common case has zero overhead.

Solution
1. Tokenizer PHP range map (src/Parser/Tokenizer.php):

- $phpRanges property and buildPhpRangeMap() method
- token_get_all() builds a map of all <?php ... ?> byte ranges
- When handleTextState() encounters < inside a PHP range, it fast-forwards past the entire block (appending it to the text buffer as-is)
- An early strpos('<?') check means zero overhead for templates without PHP blocks

2. Defer raw block restoration (src/Compiler/Wrapper.php):

- Remove the restoreRawBlocks(), storeVerbatimBlocks(), and preStoreUncompiledBlocks() calls from the Wrapper
- @php/@verbatim content stays as @__raw_block_N__@ placeholders through the pipeline, protecting it from downstream precompilers
- Laravel's restoreRawContent() handles final restoration at the end of compileString()

3. Dead code removal (src/BladeService.php):

- Remove storeVerbatimBlocks() and restoreRawBlocks(), which are no longer called anywhere

Test plan
//)/* */)