A self‑hosted semantic audio search stack that indexes your sound library, auto‑tags files, and lets you find and preview audio with natural‑language queries.
Stack: FastAPI (Python) · CLAP embeddings · Qdrant vector DB · SQLite · Gradio UI · Docker
- Semantic search: Type things like “applause” or “bad feedback buzzer” and get the right sounds first.
- Auto‑metadata: Lightweight auto‑tagging powered by text–audio similarity. Optionally merges with your own tags/descriptions.
- Hybrid ranking: Combines vector similarity with keyword matches (filename, tags, description).
- Library sync: Reindex or incremental rescan to keep Qdrant and SQLite in sync with your library folder.
- Inline preview: Play audio directly in the UI and copy/download the file URL.
- Simple admin: Edit tags/description, delete tracks, bulk update via API.
+-----------+ +------------------+ +-----------------+
| Gradio | <---> | FastAPI (CLAP, | upsert | Qdrant |
| UI | API | auto-tags, DB) | <-----> | vector search |
+-----------+ +------------------+ +-----------------+
| |
| +---- SQLite (metadata)
|
+---- Library folder (audio files; served via /media)
- Embeddings:
laion/clap-htsat-unfusedfor both audio and text. - Auto‑tags: Matches each audio file against a configurable label list via CLAP text embeddings.
- Metadata: Stored in SQLite (
trackstable) and mirrored as Qdrant payload for reranking.
git clone <your-fork> semantic-audio-search
cd semantic-audio-search
docker compose up -d --buildServices:
- API at http://localhost:8000
- UI at http://localhost:7860
The first run downloads a CLAP model and may take a few minutes.
By default the compose file mounts a host folder into the API container at /app/library. You have two options:
Edit docker-compose.yml:
api:
volumes:
- ./library:/app/library # put .wav/.mp3 here
- ./data:/app/data
- ./config:/app/configCreate the folder and drop a few audio files:
mkdir -p library data config
cp /path/to/sounds/*.wav library/If you already sync your sounds with Seafile or WebDAV (via rclone), mount that path instead of ./library:
api:
volumes:
- /mnt/seafile/semantic-audio-search/sounds:/app/library:rw
- ./data:/app/data
- ./config:/app/configIf you use the UI Upload tab, ensure the library path is a persistent bind mount; otherwise uploaded files will disappear when containers are removed.
These are set in docker-compose.yml. Override as needed.
| Variable | Default | Purpose |
|---|---|---|
QDRANT_URL |
http://qdrant:6333 |
Qdrant endpoint |
COLLECTION_NAME |
sfx |
Qdrant collection name |
DB_PATH |
/app/data/meta.sqlite3 |
SQLite path (persist this) |
LIBRARY_DIR |
/app/library |
Mounted audio library |
CLAP_MODEL |
laion/clap-htsat-unfused |
Hugging Face model id |
SYNONYMS_PATH |
/app/config/synonyms.json |
Query synonyms |
SIM_WEIGHT |
0.5 |
Weight for vector similarity |
KW_WEIGHT |
0.5 |
Weight for keyword bonus |
AUTO_TAGS |
1 |
Enable/disable auto-tags |
AUTO_TAGS_OVERWRITE |
1 |
Reindex may overwrite empty fields |
AUTO_TAGS_TOPK |
5 |
Max tags to pick |
CAPTION_MODE |
basic |
Basic description template |
AUTO_TAGS_LABELS_PATH |
/app/config/auto_tags_labels.json |
Custom label list (optional) |
AUTO_TAGS_MIN_SIM |
0.28 |
Similarity floor before picking extra labels |
AUTOTAGS_MODE |
merge |
merge | fill_missing | overwrite |
API_URL (UI) |
http://api:8000 |
API URL inside the Docker network |
API_URL_PUBLIC (UI) |
unset | Public base URL for clickable links (e.g., http://localhost:8000) |
Custom labels: create /app/config/auto_tags_labels.json with either:
["applause","ui click","buzzer","whoosh"]or
{"labels": ["applause","ui click","buzzer","whoosh"]}Base URL: http://localhost:8000
Health probe.
Semantic search with hybrid reranking.
Returns: list of results {id, filename, path, score, duration, tags, description, url}.
Fields: file (audio), tags (str, optional), description (str, optional), auto (0/1).
Behavior depends on AUTOTAGS_MODE: merge (default), fill_missing, or overwrite.
Full scan of LIBRARY_DIR. Inserts new files and (optionally) overwrites empty metadata.
Only updates changed/new files; deletes records for missing files.
Serves audio files from the library for inline playback.
Downloads a file by track id with the original filename.
List all tracks {id, filename, tags, description} (paginated via limit/offset).
Update tags/description for a single track.
Delete a track (DB + Qdrant + file on disk if present).
Export all track metadata as JSON.
Bulk upsert by id or filename:
{"items":[{"id":"...","tags":"ui tap","description":"short chime"}]}Fill in missing tags/descriptions via auto‑tags.
Regenerate tags/descriptions for all rows (obeys AUTO_TAGS_OVERWRITE).
Garbage‑collect Qdrant points that don’t exist in SQLite.
- Open http://localhost:7860.
- Search tab:
- Enter a query → see results with score, tags, description.
- Select a row to preview and download.
- Edit tags/description and Save.
- Rescan library to ingest new files.
- Manage Library tab:
- Upload a new file (optionally disable auto‑tagging with the checkbox).
- Prefer mounting a persistent library path so uploads survive container restarts.
- Qdrant collection uses HNSW index; you can tune
mandef_constructat creation andhnsw_efat search time. - The API caches text embeddings (
@lru_cache) to speed up repeated queries. - For larger libraries, consider running the API with multiple workers (e.g.,
--workers 2) and giving Qdrant more RAM. - The audio embedder truncates to ~10s mono @48kHz to keep inference fast and memory low.
- Modern React/Next.js frontend
- Sorting/pagination, dark mode, bulk tagging
- Optional segment‑level embeddings (long files → playable snippets)
- Auth + HTTPS for public demos
- CLAP:
laion/clap-htsat-unfused(Hugging Face) - Qdrant: vector database
- Gradio: quick admin UI
- Zenodo pretrained assets downloaded in the API Dockerfile are public resources from their respective authors.
- No external calls at query time except model downloads on first run.
- Keep your library and
data/on trusted storage; no analytics or telemetry are collected. - If you expose the API publicly, put it behind a reverse proxy with HTTPS and auth.
MIT. See LICENSE for details.