perf(index): optimize indexing time by ~35% for large repositories #1035

Closed
clemlesne wants to merge 1 commit into sourcegraph:main from clemlesne:perf/optimize-indexing-time

@clemlesne
Contributor

Summary

  • Merge the postings and lastOffsets maps into a single map[ngram]*postingEntry. This reduces map operations from 4 to 1 per rune in the trigram generation hot loop (mapassign_fast64 dropped from 1.64s to 0.30s)
  • Pre-allocate the postings map (200K capacity hint) and the posting data slices (64-byte initial capacity) to reduce growslice and madvise overhead
  • Pipeline document creation with builder processing: a goroutine pre-fetches git blobs via a buffered channel, overlapping I/O with CPU-bound trigram generation (~1.6s saved)

Benchmark

Measured on the kubernetes repository (26K files, ~200MB content, shallow bare clone) using zoekt-git-index with -disable_ctags:

|        | Time  | Change |
|--------|-------|--------|
| Before | 13.2s |        |
| After  | 8.6s  | -35%   |

CPU profiles were collected with -cpu_profile, and each experiment was measured as the median of 3-5 runs. Of 16 experiments in total, 4 were kept after measurement. Guard: go test ./index/... ./gitindex/... passed after each change.

Test plan

  • go test ./index/... passes
  • go test ./gitindex/... passes
  • CI passes
  • Verify indexing produces identical shards (same search results)

🤖 Generated with Claude Code

Reduce indexing time from 13.2s to 8.6s (-35%) on the kubernetes repository
(26K files, ~200MB content) through four targeted optimizations identified
via CPU profiling.

## Changes

### 1. Merge postings and lastOffsets maps into single map (index/shard_builder.go)

The trigram generation hot loop (`newSearchableString`) previously used two
separate maps: `postings map[ngram][]byte` and `lastOffsets map[ngram]uint32`.
Each rune required 2 map reads + 2 map writes across both maps.

Replaced with a single `map[ngram]*postingEntry` where `postingEntry` holds
both the data slice and last offset. After the initial map lookup, all
modifications go through the pointer — reducing map operations from 4 to 1
per rune. This cut `mapassign_fast64` time from 1.64s to 0.30s (5.5x).
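
The merged-map idea can be sketched as follows. This is a minimal illustration, not the code from this PR: the `ngram` underlying type, the field names `data` and `lastOffset`, the delta-varint encoding, and the helpers `addPosting` and `decodeOffsets` are all assumptions made for the example.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// ngram is a stand-in for the packed trigram key (assumed to be a uint64 here).
type ngram uint64

// postingEntry bundles the two values that would otherwise live in separate maps.
type postingEntry struct {
	data       []byte // delta-encoded offsets (encoding is an assumption of this sketch)
	lastOffset uint32 // last offset written, used to compute the next delta
}

// addPosting records one occurrence of ng at offset off.
// The hot path performs a single map lookup; a map insert happens only
// the first time a trigram is seen. All later updates go through the pointer.
func addPosting(postings map[ngram]*postingEntry, ng ngram, off uint32) {
	e, ok := postings[ng]
	if !ok {
		e = &postingEntry{}
		postings[ng] = e
	}
	e.data = binary.AppendUvarint(e.data, uint64(off-e.lastOffset))
	e.lastOffset = off
}

// decodeOffsets reverses the delta encoding, for checking the result.
func decodeOffsets(data []byte) []uint32 {
	var offs []uint32
	var last uint32
	for len(data) > 0 {
		delta, n := binary.Uvarint(data)
		if n <= 0 {
			break // malformed input; stop decoding
		}
		last += uint32(delta)
		offs = append(offs, last)
		data = data[n:]
	}
	return offs
}

func main() {
	postings := make(map[ngram]*postingEntry)
	for _, off := range []uint32{3, 10, 300} {
		addPosting(postings, ngram(7), off)
	}
	fmt.Println(decodeOffsets(postings[ngram(7)].data)) // prints [3 10 300]
}
```

Storing a pointer as the map value is what lets the slice and the offset be updated without writing back into the map; the trade-off is one small heap allocation per distinct trigram.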

### 2. Pre-allocate postings map and data slices (index/shard_builder.go)

- Pre-allocate the postings map with a 200K capacity hint (a typical shard
  contains 50K-200K unique trigrams), avoiding repeated map growth.
- Pre-allocate each `postingEntry.data` byte slice with a 64-byte initial
  capacity, avoiding the first several grow operations per trigram.
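
A rough sketch of the two pre-allocations, using the sizes quoted above. The constant names and the constructors `newPostings` and `newPostingEntry` are hypothetical and only illustrate where the hints would be applied.

```go
package main

import "fmt"

// ngram and postingEntry are simplified stand-ins for the types described above.
type ngram uint64

type postingEntry struct {
	data       []byte
	lastOffset uint32
}

const (
	postingsMapHint   = 200_000 // capacity hint for the map (value from the description)
	postingInitialCap = 64      // initial byte capacity per posting list (value from the description)
)

// newPostings sizes the map up front so it does not have to grow
// repeatedly while the first trigrams are inserted.
func newPostings() map[ngram]*postingEntry {
	return make(map[ngram]*postingEntry, postingsMapHint)
}

// newPostingEntry reserves room for the first appends so that early
// writes to data do not trigger a reallocation.
func newPostingEntry() *postingEntry {
	return &postingEntry{data: make([]byte, 0, postingInitialCap)}
}

func main() {
	postings := newPostings()
	e := newPostingEntry()
	postings[ngram(42)] = e
	e.data = append(e.data, 1, 2, 3) // stays within the reserved capacity
	fmt.Println(len(postings), len(e.data), cap(e.data)) // prints 1 3 64
}
```

Both values are hints rather than limits: the map and the slices still grow past them when needed, so the cost of a poor guess is extra memory for rare trigrams, not incorrect behavior.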

### 3. Pipeline document creation with builder processing (gitindex/index.go)

Document creation (git blob reading + decompression) and builder processing
(trigram generation + shard building) were fully sequential. Added a
goroutine pipeline: a producer reads git blobs ahead of the main loop via a
buffered channel (64 slots), overlapping I/O with CPU-bound processing.
This alone saved ~1.6s (-16%).
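
The producer/consumer shape described above might look roughly like this. It is a sketch under stated assumptions: `document`, `loadBlob`, `produce`, and `indexAll` are hypothetical names, `loadBlob` fakes the blob read, and error handling and cancellation are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// document is a hypothetical stand-in for the indexer's document type.
type document struct {
	name    string
	content []byte
}

// loadBlob is a placeholder for reading and decompressing a git blob.
// It fabricates content whose length equals the length of the name.
func loadBlob(name string) document {
	return document{name: name, content: []byte(strings.Repeat("x", len(name)))}
}

// produce starts a goroutine that loads documents ahead of the consumer.
// The channel buffer bounds how far ahead the producer may run.
func produce(names []string, buffer int) <-chan document {
	out := make(chan document, buffer)
	go func() {
		defer close(out) // lets the consumer's range loop terminate
		for _, n := range names {
			out <- loadBlob(n)
		}
	}()
	return out
}

// indexAll drains the channel in order; the loop body stands in for the
// CPU-bound builder step. It returns the document count and total bytes.
func indexAll(names []string) (count, total int) {
	for doc := range produce(names, 64) {
		count++
		total += len(doc.content)
	}
	return count, total
}

func main() {
	n, b := indexAll([]string{"a.go", "bb.go", "ccc.go"})
	fmt.Println(n, b) // prints 3 15
}
```

With a single producer the channel preserves document order, so the builder sees files in the same sequence as in the sequential version. A real implementation would also need a way to stop the producer if the consumer returns early, since otherwise the goroutine blocks once the buffer is full.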

### 4. Update write.go for new postingEntry struct (index/write.go)

Updated `writePostings` to access `s.postings[k].data` instead of
`s.postings[k]` to match the new `postingEntry` struct.

## Profiling methodology

- CPU profiled with the `-cpu_profile` flag on a shallow bare clone of kubernetes
- Each experiment measured as the median of 3-5 runs
- Guard: `go test ./index/... ./gitindex/...` run after each change
- 16 experiments total: 4 kept, 12 discarded after measurement
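
For reference, taking the median of a handful of runs can be done as below. The helper `medianSeconds` is hypothetical, and the timings in `main` are made-up placeholder values, not measurements from this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// medianSeconds returns the median of the given wall-clock timings in seconds.
// It copies the input so the caller's slice is left unsorted.
func medianSeconds(runs []float64) float64 {
	if len(runs) == 0 {
		return 0
	}
	s := append([]float64(nil), runs...)
	sort.Float64s(s)
	mid := len(s) / 2
	if len(s)%2 == 1 {
		return s[mid]
	}
	// Even number of runs: average the two middle values.
	return (s[mid-1] + s[mid]) / 2
}

func main() {
	// Placeholder timings for illustration only.
	runs := []float64{3, 1, 2}
	fmt.Println(medianSeconds(runs)) // prints 2
}
```

The median is less sensitive than the mean to a single slow run caused by cache misses or background load, which matters when only 3-5 runs are taken.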

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@clemlesne
Contributor Author

Closing — upstream has already landed equivalent (and more comprehensive) optimizations in PRs #1020 and #1021 that supersede all 4 changes here.

@clemlesne clemlesne closed this Apr 1, 2026
@clemlesne clemlesne deleted the perf/optimize-indexing-time branch April 1, 2026 13:28