
statistics: add sample-based global stats for partitioned tables #66289

Draft
mjonss wants to merge 19 commits into pingcap:master from mjonss:sample-based-global-stats

Conversation

Contributor

@mjonss mjonss commented Feb 15, 2026

Build global-level statistics from merged partition samples instead of merging already-built partition-level TopN and Histogram structures. The existing merge approach is O(P² × T × H) for TopN and has known accuracy issues (bucket overlap heuristics, missed globally frequent values). The sample-based approach reuses the same BuildHistAndTopN() code path used for per-partition stats, producing equivalent quality global stats ~750× faster by avoiding cross-partition reconciliation.

Gated behind SET tidb_enable_sample_based_global_stats = ON (default OFF). Falls back to the existing merge path when samples are unavailable.

What problem does this PR solve?

Issue Number: close #66220

Problem Summary: Global stats merging for partitioned tables is slow and produces lossy results. The current merge-based approach reconciles already-built partition-level TopN and Histogram structures, which is O(P² × T × H) and suffers from bucket overlap heuristics and missed globally frequent values.

What changed and how does it work?

New session variable: tidb_enable_sample_based_global_stats (global+session scope, TypeBool, default OFF)

When enabled and analyze version is 2:

  1. During ANALYZE TABLE, per-partition ReservoirRowSampleCollectors are accumulated and merged using weighted reservoir sampling (the A-Res algorithm) via MergeCollector(); see the sketch after this list.
  2. After all partitions are analyzed, BuildGlobalStatsFromSamples() builds global-level TopN and Histograms directly from the merged samples using the same BuildHistAndTopN() code path used for per-partition stats.
  3. Falls back transparently to the existing merge-based path when samples are unavailable (V1 stats, empty partitions, feature disabled).
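
For intuition, here is a minimal, self-contained sketch of the A-Res weighted reservoir merge idea from step 1. The names weightedItem, newItem, and mergeReservoirs are illustrative only and do not mirror TiDB's actual ReservoirRowSampleCollector / MergeCollector API.

```go
// Illustrative sketch only: merging per-partition A-Res reservoirs.
// Not TiDB's real collector types or method signatures.
package main

import (
	"fmt"
	"math"
	"math/rand"
	"sort"
)

// weightedItem carries the A-Res key rand^(1/weight) assigned when the row
// was first sampled; keeping the key is what makes cross-partition merging valid.
type weightedItem struct {
	row string  // stand-in for a sampled row
	key float64 // A-Res key
}

// newItem assigns the A-Res key for a row with the given sampling weight.
func newItem(row string, weight float64) weightedItem {
	return weightedItem{row: row, key: math.Pow(rand.Float64(), 1/weight)}
}

// mergeReservoirs pools the per-partition reservoirs and keeps the k items
// with the largest keys, which is what a single A-Res pass over all
// partitions together would have kept.
func mergeReservoirs(k int, reservoirs ...[]weightedItem) []weightedItem {
	var all []weightedItem
	for _, r := range reservoirs {
		all = append(all, r...)
	}
	sort.Slice(all, func(i, j int) bool { return all[i].key > all[j].key })
	if len(all) > k {
		all = all[:k]
	}
	return all
}

func main() {
	p0 := []weightedItem{newItem("p0-row1", 1), newItem("p0-row2", 1)}
	p1 := []weightedItem{newItem("p1-row1", 1), newItem("p1-row2", 1)}
	fmt.Println("merged sample size:", len(mergeReservoirs(3, p0, p1)))
}
```

The merged sample can then be fed to the same histogram/TopN builder that a single partition's sample would go through, as described in step 2.
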

Key files changed:

  • pkg/sessionctx/vardef/tidb_vars.go, pkg/sessionctx/variable/sysvar.go, pkg/sessionctx/variable/session.go — New session variable registration
  • pkg/executor/analyze_col_v2.go — Conditional retention of RowCollector samples
  • pkg/executor/analyze.go — Accumulation and merging of partition sample collectors
  • pkg/executor/analyze_global_stats.go — Sample-based path dispatch with merge-based fallback (see the sketch after this file list)
  • pkg/statistics/handle/globalstats/global_stats_sample.go — Core implementation: builds column and index stats from merged samples
  • pkg/statistics/handle/globalstats/global_stats_storage.go — WriteGlobalStatsToStorage() for persisting sample-based results
  • pkg/statistics/analyze.go — RowCollector field on AnalyzeResults
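
The fallback dispatch can be pictured with the sketch below; every identifier in it (buildGlobalStats, buildFromSamples, and so on) is hypothetical and only illustrates the dispatch order, not the code in pkg/executor/analyze_global_stats.go.

```go
// Hypothetical sketch of the dispatch order: try the sample-based build
// first, fall back to the existing merge-based path when samples are
// unavailable. None of these identifiers are TiDB's real API.
package main

import (
	"errors"
	"fmt"
)

var errNoSamples = errors.New("partition samples unavailable")

// globalStats stands in for the built global-level statistics.
type globalStats struct{ fromSamples bool }

// buildFromSamples fails when no usable samples were retained
// (V1 stats, empty partitions, or the feature being disabled).
func buildFromSamples(samples map[int64][]byte) (*globalStats, error) {
	if len(samples) == 0 {
		return nil, errNoSamples
	}
	return &globalStats{fromSamples: true}, nil
}

// buildByMerging stands in for the existing merge-based path.
func buildByMerging() *globalStats {
	return &globalStats{fromSamples: false}
}

// buildGlobalStats prefers the sample-based path and falls back transparently.
func buildGlobalStats(sampleBasedEnabled bool, samples map[int64][]byte) *globalStats {
	if sampleBasedEnabled {
		if gs, err := buildFromSamples(samples); err == nil {
			return gs
		}
	}
	return buildByMerging()
}

func main() {
	fmt.Println(buildGlobalStats(true, nil).fromSamples)                       // false: fell back
	fmt.Println(buildGlobalStats(true, map[int64][]byte{1: {0x1}}).fromSamples) // true
}
```
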

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

Add a new session variable `tidb_enable_sample_based_global_stats` (default OFF) that builds global-level statistics for partitioned tables directly from merged partition samples, improving both speed and accuracy compared to the existing merge-based approach.

Build global-level statistics from merged partition samples instead of
merging already-built partition-level TopN and Histogram structures.
The existing merge approach is O(P² × T × H) for TopN and has known
accuracy issues (bucket overlap heuristics, missed globally frequent
values). The sample-based approach reuses the same BuildHistAndTopN()
code path used for per-partition stats, producing equivalent quality
global stats ~750× faster by avoiding cross-partition reconciliation.

Gated behind `SET tidb_enable_sample_based_global_stats = ON` (default
OFF). Falls back to the existing merge path when samples are unavailable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ti-chi-bot ti-chi-bot bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. labels Feb 15, 2026

ti-chi-bot bot commented Feb 15, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Contributor Author

mjonss commented Feb 15, 2026

/ok-to-test

@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. ok-to-test Indicates a PR is ready to be tested. component/statistics sig/planner SIG: Planner labels Feb 15, 2026

tiprow bot commented Feb 15, 2026

Hi @mjonss. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo, meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.


codecov bot commented Feb 15, 2026

Codecov Report

❌ Patch coverage is 37.83784% with 184 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.2147%. Comparing base (2f9776e) to head (52fcf92).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #66289        +/-   ##
================================================
+ Coverage   77.7002%   78.2147%   +0.5145%     
================================================
  Files          2006       1938        -68     
  Lines        548386     536249     -12137     
================================================
- Hits         426097     419426      -6671     
+ Misses       120629     116382      -4247     
+ Partials       1660        441      -1219     
Flag Coverage Δ
integration 44.0727% <8.1632%> (-4.1139%) ⬇️
unit 76.6201% <37.8378%> (+0.2713%) ⬆️

Flags with carried forward coverage won't be shown.

Components Coverage Δ
dumpling 56.7974% <ø> (ø)
parser ∅ <ø> (∅)
br 48.8024% <ø> (-12.0713%) ⬇️

mjonss and others added 2 commits February 15, 2026 13:58
Add 7 integration tests covering the sample-based global stats feature:
- Basic end-to-end with range partitions (column + index)
- TopN discovery with skewed data
- Composite index stats
- Feature disabled (merge-based fallback)
- Hash partitions
- V1 stats fallback
- Empty partition handling

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reorder fields in AnalyzeResults to reduce pointer bytes from 136 to 56,
fixing the nogo fieldalignment lint error.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
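
For context on the fieldalignment change above: the nogo fieldalignment check counts "pointer bytes" as the prefix of the struct the garbage collector must scan, i.e. everything up to and including the last pointer-containing field, so grouping pointer fields at the front shrinks that prefix. The structs below are purely illustrative and are not the real AnalyzeResults layout.

```go
// Illustrative only: how field order changes the pointer-byte prefix that
// the fieldalignment analyzer reports. Not the real AnalyzeResults struct.
package main

import (
	"fmt"
	"unsafe"
)

// badOrder interleaves scalars and pointers; the last pointer (q) ends at
// offset 32 on 64-bit platforms, so 32 bytes must be scanned by the GC.
type badOrder struct {
	a int64
	p *int64
	b int64
	q *int64
	c int64
}

// goodOrder groups the pointers first; the scanned prefix shrinks to 16 bytes
// even though the total struct size is unchanged.
type goodOrder struct {
	p *int64
	q *int64
	a int64
	b int64
	c int64
}

func main() {
	// Same total size either way; only the pointer-byte prefix differs.
	fmt.Println(unsafe.Sizeof(badOrder{}), unsafe.Sizeof(goodOrder{}))
}
```
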
Contributor Author

mjonss commented Feb 15, 2026

/retest

mjonss and others added 2 commits February 15, 2026 22:02
…al stats

Fix a critical bug where index FMSketch/NullCount/TotalSize were read
from wrong offsets for tables without a clustered primary key. These
tables include _tidb_rowid in the collector arrays but not in
tableInfo.Columns, causing an off-by-one offset for index metadata.

The fix derives the actual column count from sample rows via
sampleColumnCount() instead of using len(tableInfo.Columns). Also adds
memory cleanup for globalSampleCollectors after use, and strengthens
test assertions with exact NDV and null_count checks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tsFromSamples

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
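
A sketch of the offset bug fixed in the first commit above: in the per-table collector arrays, column entries come first and index entries follow, so index i lives at slot columnCount+i. For tables without a clustered primary key the sampled rows carry the hidden _tidb_rowid column, so the column count has to come from the rows themselves rather than from tableInfo.Columns. The names below (sampleColumnCount, indexSlot) are illustrative, not TiDB's code.

```go
// Illustrative sketch of the off-by-one offset described above; not TiDB's
// actual collector layout or helper names.
package main

import "fmt"

// sampleColumnCount derives the per-row value count from the first non-empty
// sample row, which already includes the hidden _tidb_rowid when present.
func sampleColumnCount(sampleRows [][]any, schemaColumns int) int {
	for _, row := range sampleRows {
		if len(row) > 0 {
			return len(row)
		}
	}
	return schemaColumns // no samples: fall back to the schema column count
}

// indexSlot locates index idx in a collector array laid out as
// [column entries..., index entries...].
func indexSlot(columnCount, idx int) int {
	return columnCount + idx
}

func main() {
	// Two schema columns, no clustered primary key: every sampled row carries
	// a third value for _tidb_rowid.
	rows := [][]any{{int64(1), "a", int64(1001)}}
	schemaCols := 2

	wrong := indexSlot(schemaCols, 0)                          // 2: still a column slot
	right := indexSlot(sampleColumnCount(rows, schemaCols), 0) // 3: first index slot
	fmt.Println("wrong slot:", wrong, "correct slot:", right)
}
```
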
@ti-chi-bot ti-chi-bot bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Feb 15, 2026
Contributor Author

mjonss commented Feb 15, 2026

/retest

Contributor Author

mjonss commented Feb 16, 2026

/retest

mjonss and others added 4 commits February 16, 2026 10:14
Add mysql.stats_samples system table and progressive pruning logic to
persist pruned per-partition sample collectors during ANALYZE. This
enables future incremental global stats rebuilds by loading saved
samples for unchanged partitions instead of re-analyzing all partitions.

Key changes:
- New system table mysql.stats_samples with save/load/delete operations
- SamplePruner for budget-based progressive sample allocation across partitions
- PruneTo() method for correct A-Res sub-sampling to smaller reservoir size
- Integration into analyze flow: prune and persist before merge
- DDL cleanup hooks for drop/truncate/reorganize partition actions
- SavePartitionSamples added to StatsReadWriter interface

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add comprehensive tests for the new sample persistence feature:
- PruneTo and SamplePruner unit tests (20 tests in sample_test.go)
- Save/Load/Delete round-trip tests (8 tests in stats_samples_test.go)
- DDL cleanup tests verifying stats_samples rows are cleaned up on
  drop partition, truncate partition, drop table, and reorganize
  partition (4 tests in ddl_test.go)

Fix two bugs found during testing:
- PruneTo(0) caused an index out of range panic because
  sampleZippedRow cannot handle MaxSampleSize=0. Add early return
  for targetSize <= 0.
- LoadSampleCollectorsFromStorage NOT IN clause failed with
  "unsupported argument" because the %? SQL escaper does not support
  []int64 slices. Convert to []string before passing to the query.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
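
The PruneTo() sub-sampling and the targetSize <= 0 guard from the two commits above can be sketched as follows: because every reservoir item keeps its A-Res key, shrinking to a smaller budget is just keeping the items with the largest keys, which is equivalent to having run A-Res with the smaller reservoir from the start. Types and names here are illustrative, not the actual collector API.

```go
// Illustrative sketch of A-Res sub-sampling to a smaller reservoir, with the
// targetSize <= 0 guard mentioned above. Not TiDB's actual PruneTo code.
package main

import (
	"fmt"
	"sort"
)

type sampleItem struct {
	row string
	key float64 // A-Res key assigned when the row was sampled
}

// pruneTo shrinks the reservoir to at most targetSize items by keeping the
// largest keys. Note that it sorts (and so reorders) the input slice.
func pruneTo(items []sampleItem, targetSize int) []sampleItem {
	if targetSize <= 0 {
		return nil // guard: nothing to keep, avoid slicing past the end
	}
	if len(items) <= targetSize {
		return items
	}
	sort.Slice(items, func(i, j int) bool { return items[i].key > items[j].key })
	return items[:targetSize]
}

func main() {
	reservoir := []sampleItem{
		{"r1", 0.91}, {"r2", 0.12}, {"r3", 0.57}, {"r4", 0.78},
	}
	fmt.Println(pruneTo(reservoir, 2)) // keeps r1 and r4
	fmt.Println(pruneTo(reservoir, 0)) // empty, thanks to the guard
}
```
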
Contributor Author

mjonss commented Feb 16, 2026

/retest

The new mysql.stats_samples system table shifts auto-assigned table IDs
in test environments, which changes checksums and table counts in three
existing tests:
- bootstrap_test: table count 52 -> 53
- collect_conflicts_test: classic checksum update
- importer_testkit_test: classic checksum update

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor Author

mjonss commented Feb 16, 2026

/retest

mjonss and others added 2 commits February 16, 2026 14:01
…ageStats

Update hardcoded system table count from 60 to 61 to account for the
new mysql.stats_samples table.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update expected currentBootstrapVersion from 254 to 255 to account
for the new upgradeToVer255 function that creates mysql.stats_samples.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor Author

mjonss commented Feb 16, 2026

/retest

mjonss and others added 2 commits February 17, 2026 00:22
The new mysql.stats_samples system table shifts auto-assigned IDs by +2
in mock stores, affecting all hardcoded DDL job/schema/table IDs in
the classic-path assertions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update reserved table count from 59 to 60 in TestNextgenBootstrap.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor Author

mjonss commented Feb 17, 2026

/retest


ti-chi-bot bot commented Feb 17, 2026

@mjonss: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-unit-test-ddlv1 bf94494 link true /test pull-unit-test-ddlv1

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Remove the stats_samples system table that persisted per-partition
sample collectors. Samples are still used in-memory during full table
ANALYZE for building global stats, just no longer persisted to disk.
This simplifies the codebase ahead of the new unified
stats_global_merge_data table.

Removed: table definition, bootstrap/upgrade registration, storage
CRUD (stats_samples.go), interface method, subscriber cleanup calls,
analyze.go persistence, DDL cleanup tests, and BUILD.bazel entries.
Restored test values (table counts, checksums, DDL job IDs) that were
adjusted when stats_samples was added.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor Author

mjonss commented Feb 17, 2026

/retest


ti-chi-bot bot commented Feb 17, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign 0xpoe, yudongusa for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Update expected system table count from 53 to 52.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor Author

mjonss commented Feb 17, 2026

/retest

Member

@0xPoe 0xPoe left a comment

I haven't dived into the code yet. I just had a quick glance at the document.
In auto-analyze and manual analyze, we usually analyze only one partition. This means there is a 99% chance we will fall back to the old implementation because not all samples are available.

Not sure if I understand it correctly.


Labels

component/statistics do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. ok-to-test Indicates a PR is ready to be tested. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/planner SIG: Planner size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.


Development

Successfully merging this pull request may close these issues.

Enhance Global Statistics merging

2 participants