Fix component tags and directives inside PHP blocks #61

Closed
calebporzio wants to merge 6 commits into main from fix-blade-inside-php-comments

calebporzio commented Feb 25, 2026

Problem

Blaze's tokenizer is a text-level FSM that scans for <x-*> patterns to identify component tags. It has zero PHP awareness — no handling of <?php ?> blocks, PHP comments, strings, heredocs, or any other PHP construct.

The compilation pipeline has two preprocessing steps that run before tokenization:

  1. preStoreUncompiledBlocks() — replaces @php/@verbatim/@unblaze blocks with placeholders
  2. compileComments() — strips {{-- --}} Blade comments

This means @php blocks and Blade comments are safely hidden from the tokenizer. But raw <?php ?> blocks are never touched — they pass through to the tokenizer verbatim.

What happens

<?php // <x-alert /> ?>
<div>Hello</div>

The tokenizer splits the PHP comment into three tokens:

TextToken:          "<?php // "
TagSelfCloseToken:  name=alert     ← parsed the commented-out tag
TextToken:          " ?>\n<div>..."

It compiles the component tag into multi-line PHP code:

<?php // <?php $__blaze->ensureCompiled(...); ?>
<?php require_once $__blaze->compiledPath.'/hash.php'; ?>
<?php _hash($__blaze, [], [], [], isset($this) ? $this : null); ?>
<?php $__blaze->popData(); ?> ?>
<div>Hello</div>

On line 1, the // comment ends at the first ?>, so it only comments out the ensureCompiled call. Lines 2 onward execute normally and try to require_once a file that was never compiled, which crashes.

The same bug affects /* */ block comments (component is silently swallowed) and any component-like pattern inside PHP strings or arbitrary PHP code.
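
The comment behaviour described above can be checked with PHP's own lexer. The snippet below is a small sketch, not part of the PR: it feeds the first two lines of the compiled output (copied verbatim, with the elided arguments and hash left as-is) to token_get_all() and records which text ended up inside a comment and where a live require_once token appears.

```php
<?php
/* First two lines of the broken compiled output shown above. */
$compiled = <<<'PHP'
<?php // <?php $__blaze->ensureCompiled(...); ?>
<?php require_once $__blaze->compiledPath.'/hash.php'; ?>
PHP;

$comments = [];
$requireLines = [];

foreach (token_get_all($compiled) as $token) {
    if (!is_array($token)) {
        continue; /* single-character tokens such as ";" carry no id */
    }

    [$id, $text, $line] = $token;

    if ($id === T_COMMENT) {
        $comments[] = $text;
    } elseif ($id === T_REQUIRE_ONCE) {
        $requireLines[] = $line;
    }
}

echo 'Commented out: ' . implode(' | ', $comments) . "\n";
echo 'Live require_once on line(s): ' . implode(', ', $requireLines) . "\n";
```

The lexer reports a single comment token that stops before the first close tag, and a real require_once on line 2, which matches the crash described above.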

History

This was partially fixed before: PR #23 fixed @php blocks and PR #25 fixed Blade comments, but raw <?php ?> blocks were never addressed. Issue #55 shows users still hitting this in production. The Flux team worked around it by changing their own code, leaving the underlying issue in Blaze unfixed.

Second issue: early raw block restoration

Separately, Wrapper::wrap() was calling restoreRawBlocks() too early in the pipeline, exposing @php block content to downstream precompilers (e.g. Livewire's morph precompiler) that can inject PHP tags into what should be protected content.

Related: #22, #55


Alternatives explored

1. Regex preprocessing + rawBlocks placeholders

Replace <?php ... ?> blocks with @__raw_block_N__@ placeholders using regex before tokenization. ~10 lines, follows existing preStoreUncompiledBlocks() pattern.

Rejected: Regex /<\?(?:php|=).*?\?>/s fails on ?> inside PHP strings (e.g. <?php echo "?>"; ?> — regex closes at the string's ?>). Also relies on Laravel's rawBlocks system which has known issues (Issue #43 — placeholders leaking to output during php artisan optimize).
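
To make the failure concrete, here is a short sketch (not code from the PR) that runs the rejected regex against the example template and compares it with what PHP's lexer considers the block. The trailing <x-alert /> is an added assumption so there is something after the block.

```php
<?php
/* A PHP block whose string literal contains a close tag, followed by a real component tag. */
$template = '<?php echo "?>"; ?><x-alert />';

/* The regex from alternative 1: lazy match up to the first close tag. */
preg_match('/<\?(?:php|=).*?\?>/s', $template, $match);
$regexBlock = $match[0];

/* Reference answer: every token PHP's lexer does not classify as inline HTML. */
$lexerBlock = '';
foreach (token_get_all($template) as $token) {
    if (is_array($token) && $token[0] === T_INLINE_HTML) {
        continue;
    }
    $lexerBlock .= is_array($token) ? $token[1] : $token;
}

echo "regex: {$regexBlock}\n";
echo "lexer: {$lexerBlock}\n";
```

The regex stops inside the string literal, so the rest of the block would be handed to the tokenizer as if it were template text.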

2. token_get_all() preprocessing + rawBlocks placeholders

Same as alternative 1, but uses PHP's own tokenizer instead of regex for 100% correct parsing.

Rejected: Correct parsing, but still depends on the fragile rawBlocks placeholder/restore mechanism. Also requires changes to all 4 compilation entry points in BlazeManager (or extracting a shared method).

3. Tokenizer inline <? skip (naive strpos)

In handleTextState(), detect <? and use strpos to find ?>, skipping the block.

Rejected: Same ?> inside strings problem as regex. Self-contained (1 file) but not correct.

4. New PHP_BLOCK FSM state

Add a TokenizerState::PHP_BLOCK state that accumulates text until ?>.

Rejected: More code than needed. Same ?> matching problem unless using token_get_all(). Adds complexity to the state machine for no benefit over the chosen approach.

5. Hybrid split-and-reassemble

Use token_get_all() to split the template, only pass T_INLINE_HTML segments to the tokenizer, reassemble afterward.

Rejected: Significant refactor of the Parser/Tokenizer interface. Overkill for this bug.
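
For reference, the splitting half of this alternative could look roughly like the sketch below. The function name splitTemplate() and the segment array shape are hypothetical and exist only for illustration. The refactor cost mentioned above comes from threading these segments through the Parser/Tokenizer, which is not shown.

```php
<?php
/**
 * Hypothetical helper: split a template into alternating HTML and PHP segments.
 * Each segment is ['html' => bool, 'text' => string].
 */
function splitTemplate(string $template): array
{
    $segments = [];

    foreach (token_get_all($template) as $token) {
        $isHtml = is_array($token) && $token[0] === T_INLINE_HTML;
        $text = is_array($token) ? $token[1] : $token;
        $last = count($segments) - 1;

        if ($last >= 0 && $segments[$last]['html'] === $isHtml) {
            /* Same kind as the previous token: extend the current segment. */
            $segments[$last]['text'] .= $text;
        } else {
            $segments[] = ['html' => $isHtml, 'text' => $text];
        }
    }

    return $segments;
}

$template = '<p>a</p><?php /* <x-alert /> */ ?><x-button />';
$segments = splitTemplate($template);

/* Only the 'html' segments would be passed to the component tokenizer. */
foreach ($segments as $segment) {
    echo ($segment['html'] ? 'html: ' : 'php:  ') . $segment['text'] . "\n";
}
```

Concatenating the text of all segments in order reproduces the original template, which is what the reassembly step would rely on.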

6. Whitespace substitution

Replace PHP block content with whitespace, preserving the <?php ?> wrapper.

Rejected: Destroys PHP content. Downstream compilation may need it intact.

7. Upstream Laravel PR

Add <?php ?> blocks to Laravel's storeUncompiledBlocks().

Rejected: Laravel intentionally doesn't store raw PHP blocks — its token_get_all() step in compileString() expects to see them. Would break Laravel's own compilation. Not actionable as a standalone fix.

8. Tokenizer + token_get_all() range map ← Chosen

Before the FSM runs, call token_get_all() once to build a map of PHP block byte ranges. When handleTextState() sees < inside a range, fast-forward past it. See "Why Solution 8" below.

9. Document as limitation

Declare "use {{-- --}} instead."

Rejected: Users are hitting this in production (#55). IDEs and formatters can produce <?php ?> blocks. Not acceptable.

10. Centralize fix in Parser::parse()

Add preprocessing inside Parser::parse() instead of all 4 BlazeManager entry points.

Rejected: Restoration wiring is unclear — Parser shouldn't own preprocessing responsibility.


Why Solution 8

| Factor | Detail |
| --- | --- |
| Correctness | token_get_all() is PHP's own tokenizer, so it handles ?> in strings, heredocs, and unclosed blocks at EOF. Zero false positives. |
| No rawBlocks | Avoids the fragile placeholder/restore system entirely. No store/restore dance, no collision risk, no relation to Issue #43. PHP blocks just pass through as text. |
| Self-contained | Only touches Tokenizer.php. No changes to BlazeManager, Parser, or BladeService. |
| Precedented | BladeService::compileStatementsMadePublic() already uses token_get_all() + T_INLINE_HTML in the hot path. |
| Performance | token_get_all() adds ~67µs per call on a 49KB template (benchmarked). Templates without <? skip it entirely via an early strpos check, so the common case pays no token_get_all() cost. |

Solution

1. Tokenizer PHP range map (src/Parser/Tokenizer.php):

  • Added $phpRanges property and buildPhpRangeMap() method
  • Before the FSM loop, token_get_all() builds a map of all <?php ... ?> byte ranges
  • When handleTextState() encounters < inside a PHP range, fast-forwards past the entire block (appending it to the text buffer as-is)
  • Early strpos('<?') check means zero overhead for templates without PHP blocks
  • Handles unclosed PHP blocks at EOF
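
The bullets above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the code from the PR: buildPhpRangeMap() is named after the method mentioned above but its body is guessed, the [start, end) range shape is assumed, and componentTagOffsets() is a hypothetical stand-in for the FSM that only looks for the literal "<x-" prefix.

```php
<?php
/**
 * Sketch: return [start, end) byte ranges covering every PHP block in $template.
 */
function buildPhpRangeMap(string $template): array
{
    /* Cheap early exit: without an open tag there is nothing to map. */
    if (strpos($template, '<?') === false) {
        return [];
    }

    $ranges = [];
    $offset = 0;
    $start = null;

    foreach (token_get_all($template) as $token) {
        $id = is_array($token) ? $token[0] : null;
        $text = is_array($token) ? $token[1] : $token;

        if ($start === null && ($id === T_OPEN_TAG || $id === T_OPEN_TAG_WITH_ECHO)) {
            $start = $offset;
        }

        $offset += strlen($text);

        if ($start !== null && $id === T_CLOSE_TAG) {
            $ranges[] = [$start, $offset];
            $start = null;
        }
    }

    /* An unclosed block runs to the end of the template. */
    if ($start !== null) {
        $ranges[] = [$start, strlen($template)];
    }

    return $ranges;
}

/**
 * Hypothetical stand-in for the FSM: offsets of "<x-" that lie outside PHP ranges.
 */
function componentTagOffsets(string $template): array
{
    $ranges = buildPhpRangeMap($template);
    $found = [];
    $pos = 0;

    while (($pos = strpos($template, '<x-', $pos)) !== false) {
        $skipTo = null;
        foreach ($ranges as [$start, $end]) {
            if ($pos >= $start && $pos < $end) {
                $skipTo = $end;
                break;
            }
        }

        if ($skipTo !== null) {
            $pos = $skipTo; /* fast-forward past the whole PHP block */
            continue;
        }

        $found[] = $pos;
        $pos += 3;
    }

    return $found;
}

$template = "<?php // <x-alert /> ?>\n<div><x-button /></div>";
$unclosed = "<p>hi</p><?php echo '<x-alert />';";

print_r(componentTagOffsets($template));
print_r(buildPhpRangeMap($unclosed));
```

In this sketch the commented-out tag is never reported, while the real <x-button /> after the block still is. A linear scan over the ranges is used for brevity; a real implementation could keep a cursor into the sorted ranges instead.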

2. Defer raw block restoration (src/Compiler/Wrapper.php):

  • Removed restoreRawBlocks(), storeVerbatimBlocks(), and preStoreUncompiledBlocks() calls from the Wrapper
  • @php/@verbatim content stays as @__raw_block_N__@ placeholders through the pipeline, protecting it from downstream precompilers
  • Laravel's own restoreRawContent() handles final restoration at the end of compileString()
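
The effect of deferring restoration can be illustrated with a toy pipeline. Everything in the sketch below is a hypothetical stand-in: storeRawBlocksSketch(), restoreRawBlocksSketch(), and fakePrecompiler() are not Blaze, Laravel, or Livewire APIs. Only the @__raw_block_N__@ placeholder format is taken from the description above.

```php
<?php
/* Hypothetical: swap @php ... @endphp blocks for numbered placeholders. */
function storeRawBlocksSketch(string $template, array &$rawBlocks): string
{
    return preg_replace_callback('/@php(.*?)@endphp/s', function (array $m) use (&$rawBlocks) {
        $rawBlocks[] = $m[0];

        return '@__raw_block_' . (count($rawBlocks) - 1) . '__@';
    }, $template);
}

/* Hypothetical: put the stored blocks back in place of their placeholders. */
function restoreRawBlocksSketch(string $template, array $rawBlocks): string
{
    return preg_replace_callback('/@__raw_block_(\d+)__@/', function (array $m) use ($rawBlocks) {
        return $rawBlocks[(int) $m[1]];
    }, $template);
}

/* Hypothetical precompiler that injects a PHP tag before every opening div. */
function fakePrecompiler(string $template): string
{
    return str_replace('<div', '<?php /* morph */ ?><div', $template);
}

$template = <<<'BLADE'
@php $html = '<div>raw</div>'; @endphp
<div>page</div>
BLADE;

/* Early restoration: the precompiler sees, and rewrites, the @php content. */
$blocks = [];
$early = fakePrecompiler(restoreRawBlocksSketch(storeRawBlocksSketch($template, $blocks), $blocks));

/* Deferred restoration: the precompiler only ever sees the placeholder. */
$blocks = [];
$deferred = restoreRawBlocksSketch(fakePrecompiler(storeRawBlocksSketch($template, $blocks)), $blocks);

echo "early:\n{$early}\n\ndeferred:\n{$deferred}\n";
```

With early restoration the injected tag lands inside the string in the @php block. With deferred restoration the block comes back byte-for-byte and only the real div is touched.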

3. Dead code removal (src/BladeService.php):

  • Removed storeVerbatimBlocks() and restoreRawBlocks() — no longer called anywhere

Test plan

  • Tokenizer skips component inside PHP single-line comment (//)
  • Tokenizer skips component inside PHP block comment (/* */)
  • Tokenizer skips component inside PHP string
  • Tokenizer skips component inside unclosed PHP block at EOF
  • Tokenizer handles multiple PHP blocks with real component between them
  • End-to-end: component inside PHP line comment is not compiled
  • End-to-end: component inside PHP block comment renders correctly
  • Control: Blade comments still correctly hide components
  • Control: @php blocks still correctly hide components
  • Precompilers cannot inject PHP tags into @php block content
  • All 127 tests pass, zero regressions

calebporzio and others added 3 commits February 24, 2026 19:23
The tokenizer's FSM was PHP-unaware — it matched <x-*> patterns inside
raw <?php ?> blocks (comments, strings, etc.), producing broken compiled
output. Blade comments ({{-- --}}) and @php blocks were already stripped
before tokenization, but raw <?php ?> blocks passed through untouched.

Two fixes:

1. Tokenizer: Build a PHP range map via token_get_all() before the FSM
   loop. When handleTextState() encounters '<' inside a PHP range,
   fast-forward past the block instead of attempting component matching.
   Uses PHP's own tokenizer for 100% correct parsing (handles ?> inside
   strings, heredocs, unclosed blocks at EOF).

2. Wrapper: Stop restoring rawBlocks early. Keeping @php/@verbatim
   content as @__raw_block_N__@ placeholders through the compilation
   pipeline protects it from downstream precompilers (e.g. Livewire's
   morph precompiler) that would otherwise inject PHP tags into comment
   content. Laravel's own restoreRawContent() handles final restoration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
calebporzio changed the title from "[WIP] Fix blade directives inside PHP comments" to "Fix component tags and directives inside PHP blocks" Feb 25, 2026
- Move tokenizer tests from CommentBugEndToEndTest → TokenizerTest
- Move integration/e2e tests from CommentBugEndToEndTest → IntegrationTest
- Delete CommentBugEndToEndTest.php (bug-named catchall)
- Remove unused BladeService::storeVerbatimBlocks() and restoreRawBlocks()
- Fix style: new Tokenizer() → app(Tokenizer::class), remove decorative separators
- Clarify Wrapper.php comment to be self-explanatory without PR context

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@calebporzio

@ganyicz - I don't see a better way. I think this will fix a lot of these hard-to-track-down issues we've been hearing about.

ganyicz closed this Feb 25, 2026
ganyicz deleted the fix-blade-inside-php-comments branch February 25, 2026 15:10