Skip to content

Conversation

@lalvarezt
Copy link
Owner

This PR simplifies the template parser by removing expensive lookahead operations and consolidating multiple context-specific argument parsers into a single, unified approach.

The Problem

The previous grammar implementation attempted to be "smart" by using context-specific parsing rules:

  • simple_arg for basic operations (append, prepend, join)
  • regex_arg for regex patterns
  • split_arg for split separators
  • map_regex_arg for regex within map operations

Each parser had different lookahead logic to determine when special characters (|, :, {, }) should be treated as literals versus syntax. This required extensive operation_keyword lookahead and complex negative assertions like:

split_content = { !(":" ~ (number | range_part)) ~ !("|" ~ operation_keyword) ~ !("}" ~ EOI) ~ ANY }

The Solution

The refactor introduces a single unified argument parser:

argument     = { (escaped_char | normal_char)* }
normal_char  = { !("|" | "}" | "{" | ":" | "\\") ~ ANY }
escaped_char = { "\\" ~ ANY }

All operations now use the same escaping rules. Special characters require explicit escaping - no exceptions, no context-dependent behavior.

Why This is Better

Performance: Lookahead operations are expensive. Removing them provides immediate parsing performance improvements, especially for complex templates.

Maintainability: One set of rules to understand, test, and maintain instead of four context-specific parsers.

Predictability: Users now have clear, consistent rules. If a character is special (|, :, {, }, \), escape it. No need to understand parser internals or memorize context-specific exceptions.

Explicitness: Templates become more intentional. Compare:

{split:|:0}           # old: relies on smart parsing
{split:\|:0}          # new: explicit intent

The second version clearly shows what's a separator and what's syntax.

Robustness: Fewer edge cases means fewer bugs. Complex lookahead logic often fails in corner cases that are hard to predict.

The Trade-off

This is a breaking change. Templates that previously relied on smart escaping need updates:

  • Pipes in split separators: {split:|:0}{split:\|:0}
  • Colons in regex patterns: {regex_extract:Version: (\d+):1}{regex_extract:Version\: (\d+):1}

However, the migration path is straightforward, and the long-term benefits of a simpler, faster, more predictable parser far outweigh the one-time update cost.

* feat(bench): add comprehensive throughput analysis tool

Add new `bench_throughput` binary for detailed performance analysis of
string_pipeline operations at scale.

Features:
- 28+ comprehensive templates covering all operations
- Real-world path processing templates (filename extraction, etc.)
- Per-operation timing breakdown with --detailed flag
- Latency statistics (min, p50, p95, p99, max, stddev)
- JSON output format for tracking performance over time
- Scaling analysis (sub-linear, linear, super-linear detection)
- Operation-level metrics (call counts, time attribution)
- Throughput measurements (paths/sec)
- Parse cost analysis across input sizes

Template Categories:
- Core operations: split, join, upper, lower, trim, replace, etc.
- Path operations: extract filename, directory, extension, basename
- Complex chains: multi-operation pipelines
- Map operations: nested transformations

---------

Co-authored-by: Claude <[email protected]>
@lalvarezt lalvarezt self-assigned this Nov 5, 2025
@lalvarezt lalvarezt added the enhancement New feature or request label Nov 5, 2025
@github-actions

This comment was marked as outdated.

…zes (#12)

* feat(ci): add manual baseline update workflow and update benchmark sizes

- Remove automatic baseline updates from benchmark.yml (main branch pushes)
- Add new update-baseline.yml workflow for manual baseline updates
  - Trigger via workflow_dispatch with ref and iterations inputs
  - Checkout specified git ref (tag/branch/commit)
  - Run benchmarks and update baseline artifact
- Update benchmark sizes from 100,1000,10000 to 1000,5000,10000
- Update iterations from 50 to 100

- Baseline updates now require explicit manual trigger via GitHub Actions UI

* refactor(ci): lower upper bound

---------

Co-authored-by: Claude <[email protected]>
@lalvarezt lalvarezt force-pushed the simplified-grammar branch 2 times, most recently from a936d14 to c1e6fa5 Compare November 6, 2025 09:32
@lalvarezt

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

lalvarezt and others added 2 commits November 6, 2025 17:07
* fix(ci): resolve CI/CD inconsistencies and broken baseline storage

This commit addresses several critical issues in the CI/CD workflows:

1. Fix broken baseline artifact storage:
   - benchmark.yml now downloads baseline from update-baseline.yml
workflow
   - Previously tried to download from itself after baseline upload was
removed

2. Standardize Rust toolchain versions to stable:
   - CI workflow: Changed test and rustfmt jobs from nightly to stable
   - CD workflow: Changed deb publish job from nightly to stable
   - Ensures consistent builds and testing across all jobs

These changes restore benchmark comparisons and ensure consistent CI/CD
behavior.

* feat(ci): add /bench command for on-demand PR benchmark comparisons

Add new workflow that allows repository owners to trigger benchmark
comparisons between any two refs via PR comments.

Features:
- Command syntax: /bench <ref1> <ref2>
- Owner-only security: Only repo owner can trigger
- Works on PR comments only (not regular issues)
- Compares any two commits, branches, or tags
- Posts detailed comparison report to PR
- Interactive reactions (👀 → 🚀 or 😕)

Use cases:
- Compare feature branch vs stable release
- Validate optimizations between specific commits
- Test performance across version boundaries
- Ad-hoc performance debugging

Documentation:
- Updated scripts/README.md with complete workflow documentation
- Added security model and usage examples
- Documented all three benchmark workflows

* refactor(ci): simplify benchmark system to single /bench command

Removed automatic benchmarking workflows and consolidated to a single
on-demand command-based system with enhanced flexibility.

Changes:
- Removed benchmark.yml (automatic benchmarks on every PR/push)
- Removed update-baseline.yml (manual baseline updates)
- Enhanced bench-command.yml with optional parameters:
  * /bench <ref1> <ref2> [iterations] [sizes]
  * Defaults: iterations=100, sizes=1000,5000,10000
  * Examples: /bench main HEAD
             /bench v0.12.0 v0.13.0 100 1000,5000,10000

New features:
- Graceful handling when benchmark tool doesn't exist in refs
- Posts helpful error messages with commit info
- Validates all parameters before running
- Supports custom iterations and sizes per comparison
- Build logs included in artifacts for debugging

Benefits:
- No automatic runs = lower CI costs
- Owner controls when benchmarks run
- Flexible parameters per comparison
- Clear error messages for invalid refs
- Only one workflow to maintain

Documentation:
- Completely rewrote scripts/README.md
- Added comprehensive usage examples
- Documented all error cases
- Added use case scenarios
- Simplified troubleshooting guide

* fix(ci): improve bench workflow robustness and reliability

Address critical issues to make the workflow less flaky and more
reliable.

Fixes:

1. Permissions
   - Added issues: write and pull-requests: write to check-permission
job
   - Allows posting comments and reactions without permission errors

2. Concurrency control
   - Added concurrency group: bench-${{ issue.number }}
   - Prevents overlapping benchmark runs on the same PR
   - Uses cancel-in-progress: false to queue instead of cancel

3. Ref fetching
   - Added targeted git fetch before validation
   - Fetches refs, tags, and heads explicitly
   - Prevents "ref not found" errors for remote branches/tags

4. Shell robustness
   - Added set -euo pipefail to all bash scripts
   - Ensures early failure on any error
   - Prevents silent failures and undefined variables

5. Early parse feedback
   - Added continue-on-error to parse step
   - Posts immediate error message on invalid format
   - Shows usage examples with all parameters
   - Prevents workflow from proceeding with bad params

Benefits:
- More reliable ref resolution
- Clear error messages for invalid input
- No concurrent runs causing conflicts
- Proper permissions for all operations
- Fail-fast behavior prevents wasted CI time

* fix(ci): fix YAML syntax errors and improve workflow behavior

Critical fixes for YAML parsing and workflow reliability:

1. YAML Syntax Errors (CRITICAL)
   - Fixed template literals with markdown causing YAML parse errors
   - Converted all multi-line template literals to array.join() pattern
   - Lines starting with ** were interpreted as YAML alias markers
   - Affected: parse error message, acknowledgment, results, error
handler

2. Graceful Error Handling
   - Removed process.exit(78) from "Post missing tool error" step
   - Non-zero exit triggers handle-error job unnecessarily
   - Existing if guards properly skip subsequent steps
   - Workflow now exits gracefully without false failures

3. Concurrency Behavior
   - Changed cancel-in-progress from false to true
   - Previous: Multiple /bench commands queue and run sequentially
   - Now: Newer /bench commands cancel older pending runs
   - Saves CI time and provides faster feedback

All changes maintain functionality while fixing syntax and improving UX.

* feat(ci): add intelligent commit ordering and same-commit detection

Implement the intended benchmark logic to ensure correct comparisons:

1. Automatic commit ordering
   - Determines which ref is older (baseline) vs newer (current)
   - Uses git commit timestamps for ordering
   - Ensures baseline is always the older commit regardless of argument
order
   - Example: /bench v0.13.0 main → baseline=v0.13.0, current=main (if
main is newer)

2. Same-commit detection
   - Checks if both refs resolve to the same SHA
   - Exits early with helpful message if refs are identical
   - Prevents wasteful benchmark runs on identical code
   - Example: /bench main origin/main → detects same commit, posts
message

3. Updated all steps to use determined ordering
   - Renamed steps: check_ref1_tool → check_baseline_tool,
check_ref2_tool → check_current_tool
   - Updated benchmark steps to use baseline_ref and current_ref
   - Updated artifact names: benchmark_ref1.json →
benchmark_baseline.json
   - Updated build logs: build_ref1.log → build_baseline.log
   - Results always show: Baseline (older) vs Current (newer)

4. Improved error messages
   - Missing tool errors now indicate "baseline (older)" vs "current
(newer)"
   - Same-commit message explains refs are identical
   - Clear distinction between user input and determined ordering

Benefits:
- Consistent baseline/current semantics regardless of argument order
- No wasted CI time on identical commits
- Clear, predictable results in every scenario
- Better user experience with informative messages

---------

Co-authored-by: Claude <[email protected]>
lalvarezt and others added 3 commits November 7, 2025 20:18
…head that offer no real benefit

expect some change in templates that relied on smart escaping, now it's
more intentional
Reorganized operation alternatives based on actual benchmark usage
patterns to optimize PEG parser performance. Most frequently used
operations (split, join, upper, lower, trim, substring, reverse) are now
tested first.
Add fast-path optimizations for filter, filter_not, and replace
operations when patterns contain no regex metacharacters. This avoids
unnecessary regex compilation and matching for simple literal string
operations.
@lalvarezt
Copy link
Owner Author

/bench d264124 a194856

@github-actions
Copy link

github-actions bot commented Nov 7, 2025

🚀 Benchmark comparison started

Comparing:

  • Baseline: d2641242c4628bc725ea5fa0602db1f5a51d8f9d
  • Current: a194856553402e5645376917e43726d816a3b581

Parameters:

  • Iterations: 100
  • Sizes: 1000,5000,10000

Results will be posted here when complete...

@github-actions
Copy link

github-actions bot commented Nov 7, 2025

🔬 Benchmark Comparison Report

Requested by: @lalvarezt

Comparison:

  • Baseline (older): d2641242c4628bc725ea5fa0602db1f5a51d8f9d (d264124)
  • Current (newer): a194856553402e5645376917e43726d816a3b581 (a194856)

Parameters:

  • Iterations: 100
  • Sizes: 1000,5000,10000

📊 Benchmark Comparison Report

Input Size: 10,000 paths

Baseline Timestamp: 1762543388
Current Timestamp: 1762543498

Performance Comparison

Template Avg/Path Change p95 Change Throughput Change
Basename no ext 1.22μs ➖ -1.3% 1.26μs ➖ -0.4% 817.14K/s ➖ +1.3%
Breadcrumb last 3 1.57μs ➖ -0.3% 1.61μs ➖ +0.6% 635.35K/s ➖ +0.2%
Chain: map complex 3.22μs 🟢 -52.0% 3.29μs 🟢 -52.5% 310.78K/s 🟢 +108.2%
Chain: split+filter+sort+join 5.73μs ➖ -1.2% 5.77μs ➖ -1.4% 174.46K/s ➖ +1.2%
Chain: trim+upper+pad 1.39μs ➖ +0.8% 1.42μs ➖ +0.7% 718.51K/s ➖ -0.8%
Extract directory 1.35μs ➖ +0.7% 1.37μs ➖ +0.3% 739.97K/s ➖ -0.7%
Extract filename 375ns 🟡 +4.2% 396ns ⚠️ +5.3% 2.67M/s 🟡 -4.0%
File extension 1.23μs ➖ -1.7% 1.28μs 🟢 -6.8% 812.71K/s ➖ +1.7%
Filter 5.27μs ➖ +0.5% 5.32μs ➖ +0.2% 189.64K/s ➖ -0.5%
Join 1.34μs ➖ -0.8% 1.36μs ✅ -2.9% 748.21K/s ➖ +0.8%
Lower 940ns 🟡 +3.6% 1.00μs ⚠️ +6.8% 1.06M/s 🟡 -3.5%
Normalize filename 1.07μs ✅ -2.3% 1.10μs ➖ -1.3% 930.86K/s ✅ +2.3%
Pad 1.20μs ➖ +1.6% 1.23μs 🟡 +2.4% 830.72K/s ➖ -1.6%
Regex extract filename 4.86μs ➖ -0.9% 4.91μs ✅ -3.6% 205.86K/s ➖ +0.9%
Remove hidden dirs 5.03μs ➖ +1.4% 5.07μs ➖ +0.7% 198.98K/s ➖ -1.4%
Replace complex 3.45μs ➖ +0.6% 3.51μs ➖ +1.7% 289.94K/s ➖ -0.6%
Replace simple 3.59μs ➖ -0.9% 3.65μs ➖ -1.5% 278.30K/s ➖ +0.9%
Reverse 920ns ➖ +0.4% 949ns ➖ +1.0% 1.09M/s ➖ -0.4%
Slug generation 493ns 🟢 -45.9% 506ns 🟢 -45.7% 2.03M/s 🟢 +84.7%
Sort 1.75μs ➖ +1.3% 1.77μs ➖ +0.9% 571.43K/s ➖ -1.3%
Split all 540ns ✅ -4.9% 563ns 🟢 -5.5% 1.85M/s 🟢 +5.2%
Split last index 353ns 🟢 -8.5% 367ns 🟢 -10.9% 2.83M/s 🟢 +9.4%
Strip ANSI 276ns ✅ -2.8% 279ns 🟢 -7.3% 3.62M/s ✅ +3.1%
Substring 1.11μs ➖ -0.2% 1.16μs ➖ +0.8% 898.03K/s ➖ +0.2%
Trim 1.04μs ➖ -0.1% 1.09μs 🟡 +2.2% 964.47K/s ➖ +0.1%
Unique 2.25μs ➖ -0.7% 2.28μs ➖ -0.4% 443.45K/s ➖ +0.7%
Upper 892ns ➖ -0.7% 920ns ➖ -1.0% 1.12M/s ➖ +0.7%
Uppercase all components 2.42μs 🟡 +2.0% 2.81μs 🔴 +17.4% 413.49K/s ➖ -2.0%

Summary

  • Total templates compared: 28
  • Improvements: 3 🟢
  • Regressions: 0 🔴
  • Neutral: 25 ➖

✨ Performance Improvements

  • Chain: map complex: 52.0% faster
  • Slug generation: 45.9% faster
  • Split last index: 8.5% faster

Legend

  • 🟢 Significant improvement (>5% faster)
  • ✅ Improvement (2-5% faster)
  • ➖ Neutral (<2% change)
  • 🟡 Caution (2-5% slower)
  • ⚠️ Warning (5-10% slower)
  • 🔴 Regression (>10% slower)

Triggered by /bench command

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants