feat(filters): add TOML filters for bq query and bq show with CSV compression#896
Open
fkztw wants to merge 1 commit intortk-ai:developfrom
Open
feat(filters): add TOML filters for bq query and bq show with CSV compression#896fkztw wants to merge 1 commit intortk-ai:developfrom
fkztw wants to merge 1 commit intortk-ai:developfrom
Conversation
- Filters out noise like gcloud update warnings and job progress status - Implements max_lines=40 and truncate_lines_at=120 to guard against large payloads - Registers these filters in discover/rules.rs to track savings - Adjusts test suite counts to account for the new filters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
This PR addresses token bloat from BigQuery commands
bq queryandbq showby introducing TOML filters designed to mitigate noise and aggressively compress output schema and large results.Inspired by the TOON (Token-Oriented Object Notation) project format architecture, this filter drops the inherently expensive ASCII table layouts and structural paddings typical of BigQuery CLI outputs.
By transforming the output into a more streamlined, CSV-like footprint during the proxy ingestion phase, we achieved the following optimizations:
gcloudupdate warnings, BQ job submission statuses, and purely decorative ASCII borders (+---+---+).|paddings with dense comma-separated syntax. This successfully drops raw token per-line consumption by up to 40-80% depending on row width.max_linessafely from standard bounds up to100 lines. This empowers LLM reasoning by providing 2.5x more rows of context for the same historical compute bandwidth.REPEATED RECORDSsmoothly translate effectively down into valid sparse rows.Testing & Verification
Inline unit tests were expanded to comprehensively cover various complex schemas (JSON multi-line payload representations and massive clustered dataset partition listings). Real-world anonymized queries gathered from our local engineering team have validated our aggressive savings metric assumptions.