refactor(pest)!: greatly simplified grammar, removed expensive look-ahead that offer no real benefit #11

lalvarezt · 2025-11-05T17:43:25Z

This PR simplifies the template parser by removing expensive lookahead operations and consolidating multiple context-specific argument parsers into a single, unified approach.

The Problem

The previous grammar implementation attempted to be "smart" by using context-specific parsing rules:

simple_arg for basic operations (append, prepend, join)
regex_arg for regex patterns
split_arg for split separators
map_regex_arg for regex within map operations

Each parser had different lookahead logic to determine when special characters (|, :, {, }) should be treated as literals versus syntax. This required extensive operation_keyword lookahead and complex negative assertions like:

split_content = { !(":" ~ (number | range_part)) ~ !("|" ~ operation_keyword) ~ !("}" ~ EOI) ~ ANY }

The Solution

The refactor introduces a single unified argument parser:

argument     = { (escaped_char | normal_char)* }
normal_char  = { !("|" | "}" | "{" | ":" | "\\") ~ ANY }
escaped_char = { "\\" ~ ANY }

All operations now use the same escaping rules. Special characters require explicit escaping - no exceptions, no context-dependent behavior.

Why This is Better

Performance: Lookahead operations are expensive. Removing them provides immediate parsing performance improvements, especially for complex templates.

Maintainability: One set of rules to understand, test, and maintain instead of four context-specific parsers.

Predictability: Users now have clear, consistent rules. If a character is special (|, :, {, }, \), escape it. No need to understand parser internals or memorize context-specific exceptions.

Explicitness: Templates become more intentional. Compare:

{split:|:0}           # old: relies on smart parsing
{split:\|:0}          # new: explicit intent

The second version clearly shows what's a separator and what's syntax.

Robustness: Fewer edge cases means fewer bugs. Complex lookahead logic often fails in corner cases that are hard to predict.

The Trade-off

This is a breaking change. Templates that previously relied on smart escaping need updates:

Pipes in split separators: {split:|:0} → {split:\|:0}
Colons in regex patterns: {regex_extract:Version: (\d+):1} → {regex_extract:Version\: (\d+):1}

However, the migration path is straightforward, and the long-term benefits of a simpler, faster, more predictable parser far outweigh the one-time update cost.

* feat(bench): add comprehensive throughput analysis tool Add new `bench_throughput` binary for detailed performance analysis of string_pipeline operations at scale. Features: - 28+ comprehensive templates covering all operations - Real-world path processing templates (filename extraction, etc.) - Per-operation timing breakdown with --detailed flag - Latency statistics (min, p50, p95, p99, max, stddev) - JSON output format for tracking performance over time - Scaling analysis (sub-linear, linear, super-linear detection) - Operation-level metrics (call counts, time attribution) - Throughput measurements (paths/sec) - Parse cost analysis across input sizes Template Categories: - Core operations: split, join, upper, lower, trim, replace, etc. - Path operations: extract filename, directory, extension, basename - Complex chains: multi-operation pipelines - Map operations: nested transformations --------- Co-authored-by: Claude <[email protected]>

…zes (#12) * feat(ci): add manual baseline update workflow and update benchmark sizes - Remove automatic baseline updates from benchmark.yml (main branch pushes) - Add new update-baseline.yml workflow for manual baseline updates - Trigger via workflow_dispatch with ref and iterations inputs - Checkout specified git ref (tag/branch/commit) - Run benchmarks and update baseline artifact - Update benchmark sizes from 100,1000,10000 to 1000,5000,10000 - Update iterations from 50 to 100 - Baseline updates now require explicit manual trigger via GitHub Actions UI * refactor(ci): lower upper bound --------- Co-authored-by: Claude <[email protected]>

* fix(ci): resolve CI/CD inconsistencies and broken baseline storage This commit addresses several critical issues in the CI/CD workflows: 1. Fix broken baseline artifact storage: - benchmark.yml now downloads baseline from update-baseline.yml workflow - Previously tried to download from itself after baseline upload was removed 2. Standardize Rust toolchain versions to stable: - CI workflow: Changed test and rustfmt jobs from nightly to stable - CD workflow: Changed deb publish job from nightly to stable - Ensures consistent builds and testing across all jobs These changes restore benchmark comparisons and ensure consistent CI/CD behavior. * feat(ci): add /bench command for on-demand PR benchmark comparisons Add new workflow that allows repository owners to trigger benchmark comparisons between any two refs via PR comments. Features: - Command syntax: /bench <ref1> <ref2> - Owner-only security: Only repo owner can trigger - Works on PR comments only (not regular issues) - Compares any two commits, branches, or tags - Posts detailed comparison report to PR - Interactive reactions (👀 → 🚀 or 😕) Use cases: - Compare feature branch vs stable release - Validate optimizations between specific commits - Test performance across version boundaries - Ad-hoc performance debugging Documentation: - Updated scripts/README.md with complete workflow documentation - Added security model and usage examples - Documented all three benchmark workflows * refactor(ci): simplify benchmark system to single /bench command Removed automatic benchmarking workflows and consolidated to a single on-demand command-based system with enhanced flexibility. Changes: - Removed benchmark.yml (automatic benchmarks on every PR/push) - Removed update-baseline.yml (manual baseline updates) - Enhanced bench-command.yml with optional parameters: * /bench <ref1> <ref2> [iterations] [sizes] * Defaults: iterations=100, sizes=1000,5000,10000 * Examples: /bench main HEAD /bench v0.12.0 v0.13.0 100 1000,5000,10000 New features: - Graceful handling when benchmark tool doesn't exist in refs - Posts helpful error messages with commit info - Validates all parameters before running - Supports custom iterations and sizes per comparison - Build logs included in artifacts for debugging Benefits: - No automatic runs = lower CI costs - Owner controls when benchmarks run - Flexible parameters per comparison - Clear error messages for invalid refs - Only one workflow to maintain Documentation: - Completely rewrote scripts/README.md - Added comprehensive usage examples - Documented all error cases - Added use case scenarios - Simplified troubleshooting guide * fix(ci): improve bench workflow robustness and reliability Address critical issues to make the workflow less flaky and more reliable. Fixes: 1. Permissions - Added issues: write and pull-requests: write to check-permission job - Allows posting comments and reactions without permission errors 2. Concurrency control - Added concurrency group: bench-${{ issue.number }} - Prevents overlapping benchmark runs on the same PR - Uses cancel-in-progress: false to queue instead of cancel 3. Ref fetching - Added targeted git fetch before validation - Fetches refs, tags, and heads explicitly - Prevents "ref not found" errors for remote branches/tags 4. Shell robustness - Added set -euo pipefail to all bash scripts - Ensures early failure on any error - Prevents silent failures and undefined variables 5. Early parse feedback - Added continue-on-error to parse step - Posts immediate error message on invalid format - Shows usage examples with all parameters - Prevents workflow from proceeding with bad params Benefits: - More reliable ref resolution - Clear error messages for invalid input - No concurrent runs causing conflicts - Proper permissions for all operations - Fail-fast behavior prevents wasted CI time * fix(ci): fix YAML syntax errors and improve workflow behavior Critical fixes for YAML parsing and workflow reliability: 1. YAML Syntax Errors (CRITICAL) - Fixed template literals with markdown causing YAML parse errors - Converted all multi-line template literals to array.join() pattern - Lines starting with ** were interpreted as YAML alias markers - Affected: parse error message, acknowledgment, results, error handler 2. Graceful Error Handling - Removed process.exit(78) from "Post missing tool error" step - Non-zero exit triggers handle-error job unnecessarily - Existing if guards properly skip subsequent steps - Workflow now exits gracefully without false failures 3. Concurrency Behavior - Changed cancel-in-progress from false to true - Previous: Multiple /bench commands queue and run sequentially - Now: Newer /bench commands cancel older pending runs - Saves CI time and provides faster feedback All changes maintain functionality while fixing syntax and improving UX. * feat(ci): add intelligent commit ordering and same-commit detection Implement the intended benchmark logic to ensure correct comparisons: 1. Automatic commit ordering - Determines which ref is older (baseline) vs newer (current) - Uses git commit timestamps for ordering - Ensures baseline is always the older commit regardless of argument order - Example: /bench v0.13.0 main → baseline=v0.13.0, current=main (if main is newer) 2. Same-commit detection - Checks if both refs resolve to the same SHA - Exits early with helpful message if refs are identical - Prevents wasteful benchmark runs on identical code - Example: /bench main origin/main → detects same commit, posts message 3. Updated all steps to use determined ordering - Renamed steps: check_ref1_tool → check_baseline_tool, check_ref2_tool → check_current_tool - Updated benchmark steps to use baseline_ref and current_ref - Updated artifact names: benchmark_ref1.json → benchmark_baseline.json - Updated build logs: build_ref1.log → build_baseline.log - Results always show: Baseline (older) vs Current (newer) 4. Improved error messages - Missing tool errors now indicate "baseline (older)" vs "current (newer)" - Same-commit message explains refs are identical - Clear distinction between user input and determined ordering Benefits: - Consistent baseline/current semantics regardless of argument order - No wasted CI time on identical commits - Clear, predictable results in every scenario - Better user experience with informative messages --------- Co-authored-by: Claude <[email protected]>

…head that offer no real benefit expect some change in templates that relied on smart escaping, now it's more intentional

Reorganized operation alternatives based on actual benchmark usage patterns to optimize PEG parser performance. Most frequently used operations (split, join, upper, lower, trim, substring, reverse) are now tested first.

Add fast-path optimizations for filter, filter_not, and replace operations when patterns contain no regex metacharacters. This avoids unnecessary regex compilation and matching for simple literal string operations.

lalvarezt · 2025-11-07T19:20:43Z

/bench d264124 a194856

github-actions · 2025-11-07T19:20:51Z

🚀 Benchmark comparison started

Comparing:

Baseline: d2641242c4628bc725ea5fa0602db1f5a51d8f9d
Current: a194856553402e5645376917e43726d816a3b581

Parameters:

Iterations: 100
Sizes: 1000,5000,10000

Results will be posted here when complete...

github-actions · 2025-11-07T19:24:58Z

🔬 Benchmark Comparison Report

Requested by: @lalvarezt

Comparison:

Baseline (older): d2641242c4628bc725ea5fa0602db1f5a51d8f9d (d264124)
Current (newer): a194856553402e5645376917e43726d816a3b581 (a194856)

Parameters:

Iterations: 100
Sizes: 1000,5000,10000

📊 Benchmark Comparison Report

Input Size: 10,000 paths

Baseline Timestamp: 1762543388
Current Timestamp: 1762543498

Performance Comparison

Template	Avg/Path	Change	p95	Change	Throughput	Change
Basename no ext	1.22μs	➖ -1.3%	1.26μs	➖ -0.4%	817.14K/s	➖ +1.3%
Breadcrumb last 3	1.57μs	➖ -0.3%	1.61μs	➖ +0.6%	635.35K/s	➖ +0.2%
Chain: map complex	3.22μs	🟢 -52.0%	3.29μs	🟢 -52.5%	310.78K/s	🟢 +108.2%
Chain: split+filter+sort+join	5.73μs	➖ -1.2%	5.77μs	➖ -1.4%	174.46K/s	➖ +1.2%
Chain: trim+upper+pad	1.39μs	➖ +0.8%	1.42μs	➖ +0.7%	718.51K/s	➖ -0.8%
Extract directory	1.35μs	➖ +0.7%	1.37μs	➖ +0.3%	739.97K/s	➖ -0.7%
Extract filename	375ns	🟡 +4.2%	396ns	⚠️ +5.3%	2.67M/s	🟡 -4.0%
File extension	1.23μs	➖ -1.7%	1.28μs	🟢 -6.8%	812.71K/s	➖ +1.7%
Filter	5.27μs	➖ +0.5%	5.32μs	➖ +0.2%	189.64K/s	➖ -0.5%
Join	1.34μs	➖ -0.8%	1.36μs	✅ -2.9%	748.21K/s	➖ +0.8%
Lower	940ns	🟡 +3.6%	1.00μs	⚠️ +6.8%	1.06M/s	🟡 -3.5%
Normalize filename	1.07μs	✅ -2.3%	1.10μs	➖ -1.3%	930.86K/s	✅ +2.3%
Pad	1.20μs	➖ +1.6%	1.23μs	🟡 +2.4%	830.72K/s	➖ -1.6%
Regex extract filename	4.86μs	➖ -0.9%	4.91μs	✅ -3.6%	205.86K/s	➖ +0.9%
Remove hidden dirs	5.03μs	➖ +1.4%	5.07μs	➖ +0.7%	198.98K/s	➖ -1.4%
Replace complex	3.45μs	➖ +0.6%	3.51μs	➖ +1.7%	289.94K/s	➖ -0.6%
Replace simple	3.59μs	➖ -0.9%	3.65μs	➖ -1.5%	278.30K/s	➖ +0.9%
Reverse	920ns	➖ +0.4%	949ns	➖ +1.0%	1.09M/s	➖ -0.4%
Slug generation	493ns	🟢 -45.9%	506ns	🟢 -45.7%	2.03M/s	🟢 +84.7%
Sort	1.75μs	➖ +1.3%	1.77μs	➖ +0.9%	571.43K/s	➖ -1.3%
Split all	540ns	✅ -4.9%	563ns	🟢 -5.5%	1.85M/s	🟢 +5.2%
Split last index	353ns	🟢 -8.5%	367ns	🟢 -10.9%	2.83M/s	🟢 +9.4%
Strip ANSI	276ns	✅ -2.8%	279ns	🟢 -7.3%	3.62M/s	✅ +3.1%
Substring	1.11μs	➖ -0.2%	1.16μs	➖ +0.8%	898.03K/s	➖ +0.2%
Trim	1.04μs	➖ -0.1%	1.09μs	🟡 +2.2%	964.47K/s	➖ +0.1%
Unique	2.25μs	➖ -0.7%	2.28μs	➖ -0.4%	443.45K/s	➖ +0.7%
Upper	892ns	➖ -0.7%	920ns	➖ -1.0%	1.12M/s	➖ +0.7%
Uppercase all components	2.42μs	🟡 +2.0%	2.81μs	🔴 +17.4%	413.49K/s	➖ -2.0%

Summary

Total templates compared: 28
Improvements: 3 🟢
Regressions: 0 🔴
Neutral: 25 ➖

✨ Performance Improvements

Chain: map complex: 52.0% faster
Slug generation: 45.9% faster
Split last index: 8.5% faster

Legend

🟢 Significant improvement (>5% faster)
✅ Improvement (2-5% faster)
➖ Neutral (<2% change)
🟡 Caution (2-5% slower)
⚠️ Warning (5-10% slower)
🔴 Regression (>10% slower)

_{Triggered by /bench command}

lalvarezt self-assigned this Nov 5, 2025

lalvarezt added the enhancement New feature or request label Nov 5, 2025

This comment was marked as outdated.

Sign in to view

lalvarezt force-pushed the simplified-grammar branch 2 times, most recently from a936d14 to c1e6fa5 Compare November 6, 2025 09:32

This comment was marked as outdated.

Sign in to view

lalvarezt and others added 2 commits November 6, 2025 17:07

docs: refresh benchmarking documentation to reflect new system

b92e7e2

lalvarezt force-pushed the main branch from 21f1bc1 to b92e7e2 Compare November 6, 2025 16:07

lalvarezt and others added 3 commits November 7, 2025 20:18

refactor(pest)!: greatly simplified grammar, removed expensive look-a…

7d83007

…head that offer no real benefit expect some change in templates that relied on smart escaping, now it's more intentional

perf(grammar): reorder pest grammar operations by usage frequency

9d720af

Reorganized operation alternatives based on actual benchmark usage patterns to optimize PEG parser performance. Most frequently used operations (split, join, upper, lower, trim, substring, reverse) are now tested first.

perf(operations): add literal string fast paths for filter and replace

a194856

Add fast-path optimizations for filter, filter_not, and replace operations when patterns contain no regex metacharacters. This avoids unnecessary regex compilation and matching for simple literal string operations.

lalvarezt force-pushed the simplified-grammar branch from 5ddd68b to a194856 Compare November 7, 2025 19:18

lalvarezt force-pushed the main branch from dc06069 to df93f9b Compare November 9, 2025 08:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

refactor(pest)!: greatly simplified grammar, removed expensive look-ahead that offer no real benefit #11

refactor(pest)!: greatly simplified grammar, removed expensive look-ahead that offer no real benefit #11

lalvarezt commented Nov 5, 2025

Uh oh!

This comment was marked as outdated.

This comment was marked as outdated.

This comment was marked as outdated.

This comment was marked as outdated.

lalvarezt commented Nov 7, 2025

Uh oh!

github-actions bot commented Nov 7, 2025

Uh oh!

github-actions bot commented Nov 7, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

refactor(pest)!: greatly simplified grammar, removed expensive look-ahead that offer no real benefit #11

Are you sure you want to change the base?

refactor(pest)!: greatly simplified grammar, removed expensive look-ahead that offer no real benefit #11

Conversation

lalvarezt commented Nov 5, 2025

The Problem

The Solution

Why This is Better

The Trade-off

Uh oh!

This comment was marked as outdated.

This comment was marked as outdated.

This comment was marked as outdated.

This comment was marked as outdated.

lalvarezt commented Nov 7, 2025

Uh oh!

github-actions bot commented Nov 7, 2025

Uh oh!

github-actions bot commented Nov 7, 2025

🔬 Benchmark Comparison Report

📊 Benchmark Comparison Report

Performance Comparison

Summary

✨ Performance Improvements

Legend

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants