-
Notifications
You must be signed in to change notification settings - Fork 1
refactor(pest)!: greatly simplified grammar, removed expensive look-ahead that offer no real benefit #11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
lalvarezt
wants to merge
7
commits into
main
Choose a base branch
from
simplified-grammar
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+2,530
−883
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
* feat(bench): add comprehensive throughput analysis tool Add new `bench_throughput` binary for detailed performance analysis of string_pipeline operations at scale. Features: - 28+ comprehensive templates covering all operations - Real-world path processing templates (filename extraction, etc.) - Per-operation timing breakdown with --detailed flag - Latency statistics (min, p50, p95, p99, max, stddev) - JSON output format for tracking performance over time - Scaling analysis (sub-linear, linear, super-linear detection) - Operation-level metrics (call counts, time attribution) - Throughput measurements (paths/sec) - Parse cost analysis across input sizes Template Categories: - Core operations: split, join, upper, lower, trim, replace, etc. - Path operations: extract filename, directory, extension, basename - Complex chains: multi-operation pipelines - Map operations: nested transformations --------- Co-authored-by: Claude <[email protected]>
This comment was marked as outdated.
This comment was marked as outdated.
…zes (#12) * feat(ci): add manual baseline update workflow and update benchmark sizes - Remove automatic baseline updates from benchmark.yml (main branch pushes) - Add new update-baseline.yml workflow for manual baseline updates - Trigger via workflow_dispatch with ref and iterations inputs - Checkout specified git ref (tag/branch/commit) - Run benchmarks and update baseline artifact - Update benchmark sizes from 100,1000,10000 to 1000,5000,10000 - Update iterations from 50 to 100 - Baseline updates now require explicit manual trigger via GitHub Actions UI * refactor(ci): lower upper bound --------- Co-authored-by: Claude <[email protected]>
a936d14 to
c1e6fa5
Compare
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
* fix(ci): resolve CI/CD inconsistencies and broken baseline storage
This commit addresses several critical issues in the CI/CD workflows:
1. Fix broken baseline artifact storage:
- benchmark.yml now downloads baseline from update-baseline.yml
workflow
- Previously tried to download from itself after baseline upload was
removed
2. Standardize Rust toolchain versions to stable:
- CI workflow: Changed test and rustfmt jobs from nightly to stable
- CD workflow: Changed deb publish job from nightly to stable
- Ensures consistent builds and testing across all jobs
These changes restore benchmark comparisons and ensure consistent CI/CD
behavior.
* feat(ci): add /bench command for on-demand PR benchmark comparisons
Add new workflow that allows repository owners to trigger benchmark
comparisons between any two refs via PR comments.
Features:
- Command syntax: /bench <ref1> <ref2>
- Owner-only security: Only repo owner can trigger
- Works on PR comments only (not regular issues)
- Compares any two commits, branches, or tags
- Posts detailed comparison report to PR
- Interactive reactions (👀 → 🚀 or 😕)
Use cases:
- Compare feature branch vs stable release
- Validate optimizations between specific commits
- Test performance across version boundaries
- Ad-hoc performance debugging
Documentation:
- Updated scripts/README.md with complete workflow documentation
- Added security model and usage examples
- Documented all three benchmark workflows
* refactor(ci): simplify benchmark system to single /bench command
Removed automatic benchmarking workflows and consolidated to a single
on-demand command-based system with enhanced flexibility.
Changes:
- Removed benchmark.yml (automatic benchmarks on every PR/push)
- Removed update-baseline.yml (manual baseline updates)
- Enhanced bench-command.yml with optional parameters:
* /bench <ref1> <ref2> [iterations] [sizes]
* Defaults: iterations=100, sizes=1000,5000,10000
* Examples: /bench main HEAD
/bench v0.12.0 v0.13.0 100 1000,5000,10000
New features:
- Graceful handling when benchmark tool doesn't exist in refs
- Posts helpful error messages with commit info
- Validates all parameters before running
- Supports custom iterations and sizes per comparison
- Build logs included in artifacts for debugging
Benefits:
- No automatic runs = lower CI costs
- Owner controls when benchmarks run
- Flexible parameters per comparison
- Clear error messages for invalid refs
- Only one workflow to maintain
Documentation:
- Completely rewrote scripts/README.md
- Added comprehensive usage examples
- Documented all error cases
- Added use case scenarios
- Simplified troubleshooting guide
* fix(ci): improve bench workflow robustness and reliability
Address critical issues to make the workflow less flaky and more
reliable.
Fixes:
1. Permissions
- Added issues: write and pull-requests: write to check-permission
job
- Allows posting comments and reactions without permission errors
2. Concurrency control
- Added concurrency group: bench-${{ issue.number }}
- Prevents overlapping benchmark runs on the same PR
- Uses cancel-in-progress: false to queue instead of cancel
3. Ref fetching
- Added targeted git fetch before validation
- Fetches refs, tags, and heads explicitly
- Prevents "ref not found" errors for remote branches/tags
4. Shell robustness
- Added set -euo pipefail to all bash scripts
- Ensures early failure on any error
- Prevents silent failures and undefined variables
5. Early parse feedback
- Added continue-on-error to parse step
- Posts immediate error message on invalid format
- Shows usage examples with all parameters
- Prevents workflow from proceeding with bad params
Benefits:
- More reliable ref resolution
- Clear error messages for invalid input
- No concurrent runs causing conflicts
- Proper permissions for all operations
- Fail-fast behavior prevents wasted CI time
* fix(ci): fix YAML syntax errors and improve workflow behavior
Critical fixes for YAML parsing and workflow reliability:
1. YAML Syntax Errors (CRITICAL)
- Fixed template literals with markdown causing YAML parse errors
- Converted all multi-line template literals to array.join() pattern
- Lines starting with ** were interpreted as YAML alias markers
- Affected: parse error message, acknowledgment, results, error
handler
2. Graceful Error Handling
- Removed process.exit(78) from "Post missing tool error" step
- Non-zero exit triggers handle-error job unnecessarily
- Existing if guards properly skip subsequent steps
- Workflow now exits gracefully without false failures
3. Concurrency Behavior
- Changed cancel-in-progress from false to true
- Previous: Multiple /bench commands queue and run sequentially
- Now: Newer /bench commands cancel older pending runs
- Saves CI time and provides faster feedback
All changes maintain functionality while fixing syntax and improving UX.
* feat(ci): add intelligent commit ordering and same-commit detection
Implement the intended benchmark logic to ensure correct comparisons:
1. Automatic commit ordering
- Determines which ref is older (baseline) vs newer (current)
- Uses git commit timestamps for ordering
- Ensures baseline is always the older commit regardless of argument
order
- Example: /bench v0.13.0 main → baseline=v0.13.0, current=main (if
main is newer)
2. Same-commit detection
- Checks if both refs resolve to the same SHA
- Exits early with helpful message if refs are identical
- Prevents wasteful benchmark runs on identical code
- Example: /bench main origin/main → detects same commit, posts
message
3. Updated all steps to use determined ordering
- Renamed steps: check_ref1_tool → check_baseline_tool,
check_ref2_tool → check_current_tool
- Updated benchmark steps to use baseline_ref and current_ref
- Updated artifact names: benchmark_ref1.json →
benchmark_baseline.json
- Updated build logs: build_ref1.log → build_baseline.log
- Results always show: Baseline (older) vs Current (newer)
4. Improved error messages
- Missing tool errors now indicate "baseline (older)" vs "current
(newer)"
- Same-commit message explains refs are identical
- Clear distinction between user input and determined ordering
Benefits:
- Consistent baseline/current semantics regardless of argument order
- No wasted CI time on identical commits
- Clear, predictable results in every scenario
- Better user experience with informative messages
---------
Co-authored-by: Claude <[email protected]>
…head that offer no real benefit expect some change in templates that relied on smart escaping, now it's more intentional
Reorganized operation alternatives based on actual benchmark usage patterns to optimize PEG parser performance. Most frequently used operations (split, join, upper, lower, trim, substring, reverse) are now tested first.
Add fast-path optimizations for filter, filter_not, and replace operations when patterns contain no regex metacharacters. This avoids unnecessary regex compilation and matching for simple literal string operations.
5ddd68b to
a194856
Compare
Owner
Author
|
🚀 Benchmark comparison started Comparing:
Parameters:
Results will be posted here when complete... |
🔬 Benchmark Comparison ReportRequested by: @lalvarezt Comparison:
Parameters:
📊 Benchmark Comparison ReportInput Size: 10,000 paths Baseline Timestamp: 1762543388 Performance Comparison
Summary
✨ Performance Improvements
Legend
Triggered by /bench command |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR simplifies the template parser by removing expensive lookahead operations and consolidating multiple context-specific argument parsers into a single, unified approach.
The Problem
The previous grammar implementation attempted to be "smart" by using context-specific parsing rules:
simple_argfor basic operations (append, prepend, join)regex_argfor regex patternssplit_argfor split separatorsmap_regex_argfor regex within map operationsEach parser had different lookahead logic to determine when special characters (
|,:,{,}) should be treated as literals versus syntax. This required extensiveoperation_keywordlookahead and complex negative assertions like:The Solution
The refactor introduces a single unified argument parser:
All operations now use the same escaping rules. Special characters require explicit escaping - no exceptions, no context-dependent behavior.
Why This is Better
Performance: Lookahead operations are expensive. Removing them provides immediate parsing performance improvements, especially for complex templates.
Maintainability: One set of rules to understand, test, and maintain instead of four context-specific parsers.
Predictability: Users now have clear, consistent rules. If a character is special (
|,:,{,},\), escape it. No need to understand parser internals or memorize context-specific exceptions.Explicitness: Templates become more intentional. Compare:
The second version clearly shows what's a separator and what's syntax.
Robustness: Fewer edge cases means fewer bugs. Complex lookahead logic often fails in corner cases that are hard to predict.
The Trade-off
This is a breaking change. Templates that previously relied on smart escaping need updates:
{split:|:0}→{split:\|:0}{regex_extract:Version: (\d+):1}→{regex_extract:Version\: (\d+):1}However, the migration path is straightforward, and the long-term benefits of a simpler, faster, more predictable parser far outweigh the one-time update cost.