statistics: add sample-based global stats for partitioned tables#66289
mjonss wants to merge 19 commits into pingcap:master
Conversation
Build global-level statistics from merged partition samples instead of merging already-built partition-level TopN and Histogram structures. The existing merge approach is O(P² × T × H) for TopN and has known accuracy issues (bucket overlap heuristics, missed globally frequent values). The sample-based approach reuses the same BuildHistAndTopN() code path used for per-partition stats, producing equivalent quality global stats ~750× faster by avoiding cross-partition reconciliation. Gated behind `SET tidb_enable_sample_based_global_stats = ON` (default OFF). Falls back to the existing merge path when samples are unavailable. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Skipping CI for Draft Pull Request.
/ok-to-test
Hi @mjonss. Thanks for your PR. PRs from untrusted users cannot be marked as trusted. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Codecov Report

❌ Patch coverage — additional details and impacted files:

@@ Coverage Diff @@
## master #66289 +/- ##
================================================
+ Coverage 77.7002% 78.2147% +0.5145%
================================================
Files 2006 1938 -68
Lines 548386 536249 -12137
================================================
- Hits 426097 419426 -6671
+ Misses 120629 116382 -4247
+ Partials 1660 441 -1219
Add 7 integration tests covering the sample-based global stats feature:
- Basic end-to-end with range partitions (column + index)
- TopN discovery with skewed data
- Composite index stats
- Feature disabled (merge-based fallback)
- Hash partitions
- V1 stats fallback
- Empty partition handling

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reorder fields in AnalyzeResults to reduce pointer bytes from 136 to 56, fixing the nogo fieldalignment lint error. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
/retest
…al stats Fix a critical bug where index FMSketch/NullCount/TotalSize were read from wrong offsets for tables without a clustered primary key. These tables include _tidb_rowid in the collector arrays but not in tableInfo.Columns, causing an off-by-one offset for index metadata. The fix derives the actual column count from sample rows via sampleColumnCount() instead of using len(tableInfo.Columns). Also adds memory cleanup for globalSampleCollectors after use, and strengthens test assertions with exact NDV and null_count checks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
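The offset fix above can be sketched as follows. This is a minimal, hypothetical reconstruction (the types and the `sampleColumnCount` helper here are illustrative, not TiDB's actual API): when a table has no clustered primary key, each sample row carries an extra slot for the hidden `_tidb_rowid` column, so the collector count must come from the rows themselves rather than from `len(tableInfo.Columns)`.

```go
package main

import "fmt"

// sampleRow stands in for one collected sample row: one datum per collector
// slot. For tables without a clustered primary key, the hidden _tidb_rowid
// column is also sampled, so a row can carry more slots than the schema
// declares.
type sampleRow struct {
	datums []string
}

// sampleColumnCount derives the real collector count from the sample rows,
// so index metadata (FMSketch/NullCount/TotalSize) is read at the correct
// offsets instead of being off by one.
func sampleColumnCount(samples []sampleRow, declaredCols int) int {
	if len(samples) == 0 {
		return declaredCols // no samples collected: fall back to the schema count
	}
	return len(samples[0].datums)
}

func main() {
	// The table declares 2 columns, but each sample carries 3 slots because
	// _tidb_rowid was appended; index collectors therefore start at offset 3.
	samples := []sampleRow{{datums: []string{"a", "b", "rowid"}}}
	fmt.Println(sampleColumnCount(samples, 2))
}
```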
…tsFromSamples Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
/retest
/retest
Add mysql.stats_samples system table and progressive pruning logic to persist pruned per-partition sample collectors during ANALYZE. This enables future incremental global stats rebuilds by loading saved samples for unchanged partitions instead of re-analyzing all partitions.

Key changes:
- New system table mysql.stats_samples with save/load/delete operations
- SamplePruner for budget-based progressive sample allocation across partitions
- PruneTo() method for correct A-Res sub-sampling to a smaller reservoir size
- Integration into the analyze flow: prune and persist before merge
- DDL cleanup hooks for drop/truncate/reorganize partition actions
- SavePartitionSamples added to the StatsReadWriter interface

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
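The A-Res sub-sampling behind PruneTo() can be sketched like this. In the Efraimidis–Spirakis A-Res scheme, each sampled row is tagged with a key u^(1/w), and a size-k reservoir keeps the k largest keys; shrinking a reservoir is therefore just keeping the top targetSize keys. The types and `pruneTo` name below are illustrative, not TiDB's actual implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// weightedSample mimics a reservoir entry: row payload plus its A-Res key
// u^(1/w) assigned when the row was first sampled.
type weightedSample struct {
	key float64
	row string
}

// pruneTo shrinks a reservoir to targetSize by keeping the samples with the
// largest A-Res keys, which is statistically equivalent to having sampled
// with the smaller reservoir from the start.
func pruneTo(samples []weightedSample, targetSize int) []weightedSample {
	if targetSize <= 0 {
		return nil // guard against the size-0 panic noted in this PR
	}
	if len(samples) <= targetSize {
		return samples
	}
	sort.Slice(samples, func(i, j int) bool { return samples[i].key > samples[j].key })
	return samples[:targetSize]
}

func main() {
	s := []weightedSample{{0.9, "a"}, {0.2, "b"}, {0.7, "c"}, {0.4, "d"}}
	for _, x := range pruneTo(s, 2) {
		fmt.Println(x.row) // keeps the two largest keys: a, then c
	}
}
```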
Add comprehensive tests for the new sample persistence feature:
- PruneTo and SamplePruner unit tests (20 tests in sample_test.go)
- Save/Load/Delete round-trip tests (8 tests in stats_samples_test.go)
- DDL cleanup tests verifying stats_samples rows are cleaned up on drop partition, truncate partition, drop table, and reorganize partition (4 tests in ddl_test.go)

Fix two bugs found during testing:
- PruneTo(0) caused an index-out-of-range panic because sampleZippedRow cannot handle MaxSampleSize=0. Add an early return for targetSize <= 0.
- The LoadSampleCollectorsFromStorage NOT IN clause failed with "unsupported argument" because the %? SQL escaper does not support []int64 slices. Convert to []string before passing to the query.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
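The second fix amounts to a small conversion before binding the slice to the query placeholder. A minimal sketch (the `int64sToStrings` helper name is illustrative; the source only says the %? escaper accepts []string but not []int64):

```go
package main

import (
	"fmt"
	"strconv"
)

// int64sToStrings converts partition IDs to decimal strings so they can be
// bound to a NOT IN (%?) placeholder by an escaper that supports []string
// but not []int64.
func int64sToStrings(ids []int64) []string {
	out := make([]string, 0, len(ids))
	for _, id := range ids {
		out = append(out, strconv.FormatInt(id, 10))
	}
	return out
}

func main() {
	fmt.Println(int64sToStrings([]int64{101, 102})) // [101 102]
}
```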
…-stats' into sample-based-global-stats
/retest
The new mysql.stats_samples system table shifts auto-assigned table IDs in test environments, which changes checksums and table counts in three existing tests:
- bootstrap_test: table count 52 -> 53
- collect_conflicts_test: classic checksum update
- importer_testkit_test: classic checksum update

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
/retest
…ageStats Update hardcoded system table count from 60 to 61 to account for the new mysql.stats_samples table. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update expected currentBootstrapVersion from 254 to 255 to account for the new upgradeToVer255 function that creates mysql.stats_samples. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
/retest
The new mysql.stats_samples system table shifts auto-assigned IDs by +2 in mock stores, affecting all hardcoded DDL job/schema/table IDs in the classic-path assertions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update reserved table count from 59 to 60 in TestNextgenBootstrap. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
/retest
@mjonss: The following test failed, say /retest to rerun all failed tests.

Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Remove the stats_samples system table that persisted per-partition sample collectors. Samples are still used in-memory during full table ANALYZE for building global stats, just no longer persisted to disk. This simplifies the codebase ahead of the new unified stats_global_merge_data table. Removed: table definition, bootstrap/upgrade registration, storage CRUD (stats_samples.go), interface method, subscriber cleanup calls, analyze.go persistence, DDL cleanup tests, and BUILD.bazel entries. Restored test values (table counts, checksums, DDL job IDs) that were adjusted when stats_samples was added. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
/retest
|
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing /approve in a comment.
Update expected system table count from 53 to 52. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
/retest
0xPoe
left a comment
I haven't dived into the code yet; I just had a quick glance at the document.
In auto-analyze and manual analyze, we usually analyze only one partition. This means there is a 99% chance we will fall back to the old implementation because not all samples are available.
Not sure if I understand it correctly.
Build global-level statistics from merged partition samples instead of merging already-built partition-level TopN and Histogram structures. The existing merge approach is O(P² × T × H) for TopN and has known accuracy issues (bucket overlap heuristics, missed globally frequent values). The sample-based approach reuses the same BuildHistAndTopN() code path used for per-partition stats, producing equivalent quality global stats ~750× faster by avoiding cross-partition reconciliation.
Gated behind SET tidb_enable_sample_based_global_stats = ON (default OFF). Falls back to the existing merge path when samples are unavailable.

What problem does this PR solve?
Issue Number: close #66220
Problem Summary: Global stats merging for partitioned tables is slow and produces lossy results. The current merge-based approach reconciles already-built partition-level TopN and Histogram structures, which is O(P² × T × H) and suffers from bucket overlap heuristics and missed globally frequent values.
What changed and how does it work?
New session variable: tidb_enable_sample_based_global_stats (global + session scope, TypeBool, default OFF).

When enabled and analyze version is 2:
- During ANALYZE TABLE, per-partition ReservoirRowSampleCollectors are accumulated and merged using weighted reservoir sampling (the A-Res algorithm) via MergeCollector().
- BuildGlobalStatsFromSamples() builds global-level TopN and Histograms directly from the merged samples, using the same BuildHistAndTopN() code path used for per-partition stats.

Key files changed:
- pkg/sessionctx/vardef/tidb_vars.go, pkg/sessionctx/variable/sysvar.go, pkg/sessionctx/variable/session.go — new session variable registration
- pkg/executor/analyze_col_v2.go — conditional retention of RowCollector samples
- pkg/executor/analyze.go — accumulation and merging of partition sample collectors
- pkg/executor/analyze_global_stats.go — sample-based path dispatch with merge-based fallback
- pkg/statistics/handle/globalstats/global_stats_sample.go — core implementation: builds column and index stats from merged samples
- pkg/statistics/handle/globalstats/global_stats_storage.go — WriteGlobalStatsToStorage() for persisting sample-based results
- pkg/statistics/analyze.go — RowCollector field on AnalyzeResults

Check List
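The accumulate-and-merge step can be sketched as follows. Because every retained sample in an A-Res reservoir already carries its key u^(1/w), two partition reservoirs can be merged by pooling their samples and keeping the largest keys up to the reservoir size. The types and `mergeReservoirs` name here are illustrative, not TiDB's actual ReservoirRowSampleCollector/MergeCollector() API.

```go
package main

import (
	"fmt"
	"sort"
)

// sample is a reservoir entry: the row payload plus the A-Res key u^(1/w)
// assigned when the row was sampled in its own partition.
type sample struct {
	key float64
	row string
}

// mergeReservoirs merges two per-partition reservoirs by pooling the samples
// and keeping the maxSize largest keys, which preserves the weighted
// reservoir-sampling guarantee across partitions.
func mergeReservoirs(a, b []sample, maxSize int) []sample {
	merged := append(append([]sample{}, a...), b...)
	sort.Slice(merged, func(i, j int) bool { return merged[i].key > merged[j].key })
	if len(merged) > maxSize {
		merged = merged[:maxSize]
	}
	return merged
}

func main() {
	p1 := []sample{{0.95, "p1-r1"}, {0.40, "p1-r2"}}
	p2 := []sample{{0.80, "p2-r1"}, {0.10, "p2-r2"}}
	// The merged reservoir is what the global histogram/TopN build runs on,
	// exactly as it would for a single partition's samples.
	for _, s := range mergeReservoirs(p1, p2, 3) {
		fmt.Println(s.row)
	}
}
```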
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.