An interactive educational tool to visualize how Approximate Nearest Neighbor (ANN) search algorithms work, specifically focusing on the Inverted File Index (IVF) method.
This application demonstrates the trade-off between search speed (scanned points) and accuracy (recall) using real-time interactive visualizations.
- Generative Datasets: Uses Google Gemini 2.0 Flash to generate semantic datasets (e.g., "Sci-Fi Movies", "Spicy Foods") with meaningful 2D coordinates.
- Interactive Visualization:
- D3.js powered graph showing data points and cluster boundaries (Voronoi cells).
- Draggable "Query Target" to see nearest neighbors update in real-time.
- Algorithm Comparison:
- Brute Force: Visualizes checking every single point (100% accuracy, slow).
- IVF (Inverted File Index): Visualizes the "Cluster & Probe" approach (faster, approximate).
- Real-time Tuning: Adjust
N Clusters,N Probes, andTop Kto observe how they affect the search radius and accuracy.
- Frontend: React 19, TypeScript
- Visualization: D3.js
- AI Integration: Google GenAI SDK (
@google/genai) - Styling: Tailwind CSS, Lucide React
The core visualization showing Voronoi cells, data points, and the search radius.

Generate semantic datasets using Gemini.

Toggle between Brute Force and IVF, and tune parameters like N-Probes.

Real-time feedback on points scanned versus accuracy (Recall).

The resulting items found by the algorithm.

-
Install Dependencies
npm install
-
Set API Key You need a Google Gemini API Key.
export API_KEY="your_gemini_api_key_here"
-
Start the Dev Server
npm run start
- Generate Data: Enter a prompt (e.g., "50 famous rock bands") and click "Generate". Gemini will create data points positioned by semantic meaning.
- Drag the Target: Move the pink "Target" dot around the canvas.
- Switch Algorithms:
- Select Brute Force to see the baseline (checking all points).
- Select IVF to see the optimization.
- Tune Parameters:
- Total Clusters: Re-runs K-Means to divide the space into more/fewer cells.
- N Probes: Controls how many neighboring cells are checked. Increase this to improve accuracy (Recall) at the cost of scanning more points.
Inverted File Index (IVF) works by clustering the vector space.
- Training: We use K-Means to group points into clusters (Voronoi cells).
- Indexing: Each point is assigned to a specific cluster ID.
- Searching:
- Instead of checking every point, we identify which cluster the query point falls into.
- We only scan points within that cluster (and
nnearest neighboring clusters/probes). - This drastically reduces the number of distance calculations required.