This project implements fast, robust chord recognition from raw guitar audio using Rust. Frequency-domain analysis is performed via FFT-based signal processing, and chord classification is structured around extendable music-theoretic templates. All chord samples were recorded on an unamplified acoustic guitar (4 s per sample), providing a controlled dataset for iterative algorithm development.
The architecture is designed to be modular and testable, with a clear path toward real-time analysis.
- Background & Theory
- Audio I/O and FFT
- Chord Matching Engine
- Batch-Matching Diagnostics & Logging Pipeline
- Project Structure
- Implemented Features
- Project Status & Future Work
- Contributing
In the 12-tone equal temperament system, each pitch class of the chromatic scale can be represented by an integer modulo 12:
| Pitch | C | C# | D | D# | E | F | F# | G | G# | A | A# | B |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Integer | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
The cycle of fifths is built by repeatedly moving a perfect fifth. Since a perfect fifth equals 7 semitones, the transformation on pitch classes is given by:

$$n \mapsto 7n \bmod 12$$

By performing calculations modulo 12, any value that exceeds 11 wraps around to start again at 0, ensuring that the pitch classes form a continuous cyclic system. Starting with C (0), we obtain:
- $\text{C} \to 7 \times 0 \mod 12 = 0 \ (\text{C})$
- $\text{G} \to 7 \times 1 \mod 12 = 7 \ (\text{G})$
- $\text{D} \to 7 \times 2 \mod 12 = 2 \ (\text{D})$
- $\dots$ and so on (as illustrated in the LH figure below)
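This mapping is easy to check in code. Below is a minimal, self-contained sketch that walks the cycle; it redefines the sharp-only note names locally (the project exposes the same names via the `NOTES` array shown later in this README):

```rust
/// Sharp-only pitch-class names, matching the NOTES table used elsewhere in this README.
const NOTES: [&str; 12] = [
    "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B",
];

fn main() {
    // Walk the cycle of fifths: step n lands on pitch class (7 * n) mod 12.
    for n in 0usize..12 {
        let pc = (7 * n) % 12;
        println!("step {:>2} -> pitch class {:>2} ({})", n, pc, NOTES[pc]);
    }
    // Output order: C, G, D, A, E, B, F#, C#, G#, D#, A#, F, then back to C.
}
```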
The RH figure maps a permutation of the 12 pitch classes onto the unit circle, ordered so that:

- Keys with fewer accidentals (sharps or flats) are grouped near the top,
- Clockwise movement around the circle accumulates sharps,
- Counter-clockwise movement accumulates flats.
Two notes are said to be enharmonically equivalent if they have the same pitch on a tempered instrument. In this project the two spellings are not distinguished; flat names are collapsed to their sharp equivalents:
- $\text{C}\sharp$ and $\text{D}\flat$ $\to \text{C}\sharp$
- $\text{F}\sharp$ and $\text{G}\flat$ $\to \text{F}\sharp$
- $\text{G}\sharp$ and $\text{A}\flat$ $\to \text{G}\sharp$
- $\text{A}\sharp$ and $\text{B}\flat$ $\to \text{A}\sharp$
Chords are built by choosing specific pitches (or scale degrees) from a scale. For example, in the key of C major:
- A major chord is constructed by taking the $1^{\text{st}}$, $3^{\text{rd}}$, and $5^{\text{th}}$ degrees (e.g., $\text{C, E, G}$ in `C major`)
- A minor chord is built using the $1^{\text{st}}$, $\flat 3^{\text{rd}}$, and $5^{\text{th}}$ degrees (e.g., $\text{C}, \text{E}\flat, \text{G}$ in `C minor`)
More complex chords include an additional note:
- Dominant 7th: $1^{\text{st}}$, $3^{\text{rd}}$, $5^{\text{th}}$ and $\flat 7^{\text{th}}$ — $\text{C, E, G, B}\flat$ for `C7`
- Major 7th: $1^{\text{st}}$, $3^{\text{rd}}$, $5^{\text{th}}$ and $7^{\text{th}}$ — $\text{C, E, G, B}$ for `Cmaj7`
- Minor 7th: $1^{\text{st}}$, $\flat 3^{\text{rd}}$, $5^{\text{th}}$ and $\flat 7^{\text{th}}$ — $\text{C, E}\flat, \text{G, B}\flat$ for `Cmin7`
The above interval structures can be represented as lists of semitone distances relative to a root note at index 0:
- Major: `[0, 4, 7]`
- Minor: `[0, 3, 7]`
- Dominant 7th: `[0, 4, 7, 10]`
- Major 7th: `[0, 4, 7, 11]`
- Minor 7th: `[0, 3, 7, 10]`
In Rust, we define an enum `ChordType` to build a library of `CHORD_TEMPLATES`, which is instantiated at runtime as a `HashMap` (more details can be found in `chords.md`):
/// Types of chords supported, such as major, minor, dominant 7th, etc.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ChordType {
Major,
Minor,
Dominant7,
Major7,
Minor7,
}
impl ChordType {
/// Returns the semitone intervals defining the chord relative to its root.
/// In code, each chord type is represented by its semitone intervals:
pub fn intervals(&self) -> &'static [u8] {
match self {
ChordType::Major => &[0, 4, 7],
ChordType::Minor => &[0, 3, 7],
ChordType::Dominant7 => &[0, 4, 7, 10],
ChordType::Major7 => &[0, 4, 7, 11],
ChordType::Minor7 => &[0, 3, 7, 10],
}
}
}

`CHORD_TEMPLATES` is fully extendable and will support suspended chords in the future. Chord classification is a complex subject: see [Wikipedia - List of chords](https://en.wikipedia.org/wiki/List_of_chords).
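The actual construction of `CHORD_TEMPLATES` lives in `theory/chords.rs` (see `chords.md`). As a rough sketch of the idea only, the runtime map could be built by pairing every chromatic root with every quality; the function name `build_chord_templates` and the label format (`"Amaj7"`, `"Dmin"`, …) are guesses, while `ChordType` and `NOTES` are assumed to be in scope:

```rust
use std::collections::HashMap;

/// Illustrative builder: one entry per (root, quality) pair, keyed by a
/// label such as "Amaj7" (label format is a guess), valued by the chord's
/// pitch-class set.
fn build_chord_templates() -> HashMap<String, Vec<u8>> {
    let qualities = [
        (ChordType::Major, "maj"),
        (ChordType::Minor, "min"),
        (ChordType::Dominant7, "7"),
        (ChordType::Major7, "maj7"),
        (ChordType::Minor7, "min7"),
    ];

    let mut templates = HashMap::new();
    for (root, root_name) in NOTES.iter().enumerate() {
        for (quality, suffix) in &qualities {
            // Shift each interval by the root and wrap modulo 12.
            let pcs: Vec<u8> = quality
                .intervals()
                .iter()
                .map(|&i| (root as u8 + i) % 12)
                .collect();
            templates.insert(format!("{root_name}{suffix}"), pcs);
        }
    }
    templates
}
```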
When we analyse a sound using the FFT, we obtain a spectrum that shows energy spread across a continuous range of frequencies.
Music theory divides this continuous range into discrete steps; most commonly, 12 semitones per octave. This division is logarithmic; each semitone corresponds to a frequency ratio of $2^{1/12} \approx 1.0595$.
The MIDI tuning system formalises this idea by quantising the continuous frequency spectrum into discrete steps. If $f$ is a frequency in Hz, its MIDI note number is

$$m = 69 + 12\log_2\!\left(\frac{f}{440}\right)$$

where $69$ is the MIDI number of the reference pitch $\text{A}4$ (440 Hz), and rounding $m$ to the nearest integer gives the closest equal-tempered pitch.
We define a function in Rust to quantise frequencies:
/// List of note names representing the 12 chromatic pitch classes.
pub static NOTES: [&str; 12] = [
"C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"
];
pub fn freq_to_note(freq: f32) -> String {
let a4 = 440.0;
let midi = (69.0 + 12.0 * (freq / a4).log2()).round() as i32;
let note_name = NOTES[(midi as usize) % 12];
let octave = (midi / 12) - 1;
format!("{}{}", note_name, octave)
}

An inverse function `note_to_freq` was also developed. More information can be found in `pitch.md`.
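For illustration, the inverse mapping could look roughly like the sketch below, assuming the same "name + octave" string format produced by `freq_to_note`; the name `note_to_freq_sketch` is hypothetical and the project's real `note_to_freq` may differ in signature and error handling:

```rust
/// Illustrative inverse of freq_to_note: parse a name such as "A4" or "C#3"
/// back into a frequency in Hz (sketch only).
pub fn note_to_freq_sketch(note: &str) -> Option<f32> {
    // Split the pitch-class name from the trailing octave (which may be negative).
    let split = note.find(|c: char| c.is_ascii_digit() || c == '-')?;
    let (name, octave_str) = note.split_at(split);
    let octave: i32 = octave_str.parse().ok()?;

    // Index into the same NOTES table used by freq_to_note.
    let pc = NOTES.iter().position(|&n| n == name)? as i32;

    // Invert m = 69 + 12 * log2(f / 440):  f = 440 * 2^((m - 69) / 12).
    let midi = (octave + 1) * 12 + pc;
    Some(440.0 * 2_f32.powf((midi - 69) as f32 / 12.0))
}
```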
The hound crate is used to read .wav files and extract audio samples as Vec<f32> along with the sample rate in Hz. Both 16-bit integer and 32-bit float formats are supported.
See `read_wav_file()` for details. Usage:
let path = Path::new("assets/guitar_samples/A_chords/Amaj.wav");
let (sample_rate, samples) = read_wav_file(path).unwrap();
// Debug/sanity check
println!("Sample rate: {}", sample_rate);
println!("First 10 samples: {:?}", &samples[..10]);The FFT rustfft crate with wrapper function compute_fft following NumPy's norm='forward' convention.
Usage:
let fft_output: Vec<Complex<f32>> = compute_fft(&samples);

The energy spectral density (ESD, or power) of `fft_output` is defined per bin as

$$\text{ESD}_k = |\widetilde{V}_k|^2$$
The power vector is real-valued. Hence,

let power: Vec<f32> = fft_output.iter().map(|c| c.norm_sqr()).collect();

To build the frequency axis and shift the zero-frequency component to the centre of the spectrum:
let freqs = fftfreq(samples.len(), 1.0 / sample_rate as f32);
let shifted_freqs = fftshift_real(&freqs);
let shifted_power = fftshift_real(&power);

These core helpers aim to replicate the `numpy.fft` backend. See `fftfreq` and `fftshift` for details.
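As a reference for what these two helpers compute, here are NumPy-equivalent sketches (the `_sketch` names are hypothetical; the project's actual implementations may differ):

```rust
/// Sketch of a NumPy-style `fftfreq(n, d)`: bin k maps to k / (n * d) for the
/// first half of the spectrum and to negative frequencies for the second half.
pub fn fftfreq_sketch(n: usize, d: f32) -> Vec<f32> {
    let val = 1.0 / (n as f32 * d);
    (0..n)
        .map(|k| {
            if k < (n + 1) / 2 {
                // Non-negative frequencies ...
                k as f32 * val
            } else {
                // ... then wrap around to the negative frequencies.
                (k as isize - n as isize) as f32 * val
            }
        })
        .collect()
}

/// Sketch of a real-valued `fftshift`: rotate the spectrum so the
/// zero-frequency bin sits in the centre.
pub fn fftshift_real_sketch(x: &[f32]) -> Vec<f32> {
    let mid = (x.len() + 1) / 2;
    x[mid..].iter().chain(x[..mid].iter()).copied().collect()
}
```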
The figure below, produced with the `plotters` crate, shows the ESD of `Amaj7.wav`. See `plot_energy_spectral_density` for details.
The resonance artefact in the figure above sits at roughly 110–115 Hz (a known broad guitar-body resonance) and is culled in `ChordMatcher`'s peak filtering step:
// Cull known broad guitar-body resonance (≈110–115 Hz)
let body_filtered: Vec<PeakData> = peaks
.iter()
.filter(|p| !(p.freq >= 110.0 && p.freq <= 115.00))
.cloned()
.collect();

To classify chords, we want to capture tuples of (frequency, amplitude).
This process is documented ad nauseam in `peak_detection.md`.
At a high level, we:
- Isolate spectral peaks whose amplitudes exceed a specified `threshold_percentile`.
- Apply a `min_spacing` constraint to avoid closely spaced duplicates.
- Select the top $M$ peaks by amplitude after filtering.
- Map each frequency to its closest equal-tempered pitch class.
- Return an ordered list of `PeakData` containing `freq`, `amplitude`, and `note`.
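A rough sketch of the first three steps is shown below. It is illustrative only: the function name `filter_candidate_peaks` and the nearest-rank percentile are assumptions, and the project's real `detect_peaks_from_fft` additionally maps frequencies to notes and returns `PeakData`:

```rust
/// Illustrative core of the peak filter: percentile threshold, minimum
/// spacing in Hz, then keep the loudest `top_n` peaks.
fn filter_candidate_peaks(
    freqs: &[f32],
    power: &[f32],
    threshold_percentile: f32,
    min_spacing: f32,
    top_n: usize,
) -> Vec<(f32, f32)> {
    // 1. Percentile threshold over the power values (nearest-rank style).
    let mut sorted = power.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let idx = ((threshold_percentile / 100.0) * (sorted.len() - 1) as f32) as usize;
    let threshold = sorted[idx];

    // 2. Keep positive-frequency bins whose power clears the threshold.
    let mut candidates: Vec<(f32, f32)> = Vec::new();
    for (&f, &p) in freqs.iter().zip(power.iter()) {
        if f > 0.0 && p >= threshold {
            candidates.push((f, p));
        }
    }

    // 3. Loudest first; drop any peak closer than `min_spacing` Hz to one
    //    already accepted, stopping once `top_n` peaks survive.
    candidates.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    let mut peaks: Vec<(f32, f32)> = Vec::new();
    for (f, p) in candidates {
        if peaks.iter().all(|&(pf, _)| (pf - f).abs() >= min_spacing) {
            peaks.push((f, p));
            if peaks.len() == top_n {
                break;
            }
        }
    }
    peaks
}
```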
Usage:
pub struct PeakData {
pub freq: f32,
pub amplitude: f32,
pub note: String,
}
let opts = RunOptions {
top_n: 6,
threshold_percentile: 99.5,
min_spacing: 10.0,
};
let peaks = detect_peaks_from_fft(
&shifted_freqs,
&shifted_power,
opts.threshold_percentile,
opts.min_spacing,
opts.top_n
);

For `Amaj7.wav` we have:
Top 6 peaks:
Freq: 208.25 Hz, Amp: 1.56e-4, Note: G#3
Freq: 108.75 Hz, Amp: 1.40e-4, Note: A2
Freq: 164.50 Hz, Amp: 4.70e-5, Note: E3
Freq: 220.50 Hz, Amp: 3.12e-5, Note: A3
Freq: 278.00 Hz, Amp: 2.89e-5, Note: C#4
Freq: 330.75 Hz, Amp: 8.91e-6, Note: E4
`src/analysis/spectrogram.rs` module:
This module implements a windowed STFT to compute a dB-scaled spectrogram using a real-to-complex FFT (realfft) over overlapping frames.
The goal is to analyse how the spectral content of a signal evolves over time. Unlike a single FFT, which returns global frequency components, the STFT slides a window along the signal to produce a time–frequency representation.
- Input: time-domain signal as `Vec<f32>`
- Output: `Spectrogram` struct with
  - Time axis (`Vec<f32>`)
  - Frequency axis up to `max_freq` (`Vec<f32>`)
  - 2D dB matrix (`Array2<f32>`)
- Crates used: `realfft`, `ndarray`
- **Frame-wise windowing**

  The input signal is segmented into overlapping frames of length `window_size`. Successive frames are offset by `hop_size` samples. The number of frames is computed as:

  $$n_{\text{frames}} = \left\lfloor \frac{L - N}{H} \right\rfloor + 1$$

  where
  - $L$ is the total sample length
  - $N$ is `window_size`
  - $H$ is `hop_size`
- **Hann window application**

  Each frame is multiplied elementwise with a Hann window:

  $$w[n] = 0.5 \left(1 - \cos\left(\frac{2\pi n}{N-1}\right)\right)$$

  This reduces discontinuities at frame boundaries and mitigates spectral leakage.
- **Real-to-complex FFT**

  We compute the FFT of each windowed frame using:

      let mut planner = RealFftPlanner::<f32>::new();
      let r2c = planner.plan_fft_forward(window_size);

  Only the first $N/2 + 1$ frequency bins are kept due to the Hermitian symmetry of the real FFT.
- **Power spectrum → dB scale**

  For each frequency bin $k$ of each frame, compute:

  $$P_k = |\widetilde{V}_k|^2$$

  The power at each frequency bin is then converted to a decibel (dB) scale by applying a logarithm. A small constant is added to the power values beforehand to avoid taking the log of zero.
- **Matrix construction**

  All frame-wise dB spectra are stacked into an output matrix of shape:

  $$\text{shape} = \left[n_{\text{freq bins}},\ n_{\text{frames}}\right]$$

  The frequency axis is truncated to remove bins with $f > \text{max}_{\text{freq}}$.
pub struct Spectrogram {
pub freqs: Vec<f32>, // Frequency axis (Hz)
pub times: Vec<f32>, // Time axis (s)
pub data: Array2<f32>, // Spectrogram in dB [freq_bins x time_bins]
}

- A reference STFT crate was considered but abandoned due to outdated or broken dependencies.
- This implementation ensures full control over axis truncation, scale handling, and matrix layout.
- The returned spectrogram is suitable for visualisation with `plotters`.
| Parameter | Meaning | Example |
|---|---|---|
| `window_size` | Length of each FFT frame (samples) | 1024 |
| `hop_size` | Step size between frames (samples) | 256 |
| `max_freq` | Upper limit for frequency axis (Hz) | 3000.0 |
These correspond to `nperseg` and `noverlap = nperseg - hop_size` in SciPy’s `spectrogram()`.
For `Amaj7.wav` we have:
`src/analysis/matching.rs` module
ChordMatcher is a pluggable scoring engine that converts a small set of top-N spectral peaks into a chord label. Its design deliberately separates fast maths from music-theory heuristics, making it easy to extend. It is exposed via the public API identify_chord.
| Layer | File / Type | What it does | How to extend |
|---|---|---|---|
| Peak prep | `PeakData` (`processing::peak_detection`) | FFT peaks are de-noised → `[PeakData]`. | Tune `top_n`, `threshold_percentile`, or swap in a smarter peak-finder. |
| Metadata cache | `ScoringMetadata` | One pass builds: `pitch_classes`, `pc_counts`, `max_amplitudes`, `bass_pc`. | Add new derived features (e.g., spectral centroid) without touching the scoring loop. |
| Template loop | `ChordMatcher::score_chord` | Iterates over `CHORD_TEMPLATES`; computes a weighted score. | Add a new weight term by editing one function; all templates inherit it automatically. |
| Heuristic knobs | `ChordMatcher { amplitude_scale, penalty_extra, … }` | Field values drive the formula. | Expose them via CLI flags or a TOML config for easy tuning. |
| Filtering | `filter_peaks` + `Voicing` enum | Drops impossible bass notes (`Open`, `Barre`, etc.). | Add `Voicing::PowerChord`, `Voicing::DropD`, etc. |
| Logging plug-in | `Box<dyn MatchLogger>` | Injects `FileLogger`, `NullLogger`, or your own implementation. | Swap in a `JsonLogger` or `tracing` subscriber without recompiling the matcher. |
total_score =
base_score // template-note recall
+ amplitude_scale × Σ loudness(matched) // louder matches = better
+ order_bonus (root in bass) // voicing preference
+ repetition_weight × extra occurrences // open-string drones, etc.
− penalty_extra × extraneous notes // stray peaks hurt
All coefficients are public fields, so experiments in Python can overwrite these values via serde/TOML and re-run without code edits.
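To make the recipe concrete, here is an illustrative sketch of how those terms could combine. The field names (`amplitude_scale`, `order_bonus`, `repetition_weight`, `penalty_extra`) match the heuristic knobs listed above, but the function name and the exact derivation of the inputs are assumptions; the real `score_chord` in `analysis/matching.rs` carries additional logic:

```rust
/// Illustrative weighted sum mirroring the scoring recipe above.
/// The counts and loudness totals would be derived from the detected
/// peak list and the chord template being scored.
fn total_score_sketch(
    matcher: &ChordMatcher,
    base_score: f32,
    matched_loudness: f32, // Σ loudness over template notes found in the peaks
    root_in_bass: bool,
    repeated_notes: u32,   // extra occurrences (open-string drones, octaves)
    extraneous_notes: u32, // detected pitch classes outside the template
) -> f32 {
    let mut score = base_score;
    score += matcher.amplitude_scale * matched_loudness;
    if root_in_bass {
        score += matcher.order_bonus;
    }
    score += matcher.repetition_weight * repeated_notes as f32;
    score -= matcher.penalty_extra * extraneous_notes as f32;
    score
}
```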
Usage:
use serde::Deserialize;
use std::fs;

#[derive(Deserialize)]
struct MatcherCfg {
amplitude_scale: f32,
penalty_extra: f32,
order_bonus: f32,
repetition_weight: f32,
}
let cfg: MatcherCfg = toml::from_str(&fs::read_to_string("weights.toml")?)?;
let matcher = ChordMatcher {
amplitude_scale: cfg.amplitude_scale,
penalty_extra: cfg.penalty_extra,
order_bonus: cfg.order_bonus,
repetition_weight: cfg.repetition_weight,
..Default::default()
};

New qualities (sus2, add9, diminished, …) live in `theory/chords.rs`:
ChordType::Sus2 => &[0, 2, 7],

The matcher auto-discovers them at run-time; no other changes needed.
TL;DR – ChordMatcher is a self-contained, hot-swappable component:
adjust weights, add templates, or plug in alternative loggers without
ripple effects across the codebase.
The batch-matching toolchain ties together three key modules:
| Rust file | Responsibility |
|---|---|
| `analysis/matching.rs` | Core heuristic engine — converts FFT peak lists into a best-guess chord plus a rich `MatchDetails` breakdown. |
| `utils/logging.rs` | Writes per-sample logs, full Markdown diagnostics, and a CSV summary (`results.csv`). Location defaults to `output/logs/`, but tests can override via `SA_LOG_BASE`. |
| `dev/debug_batch_matching.rs` | Convenience runner that loops over a folder of samples, feeds each one through the matcher, and hands the results to the logger. |
A thin CLI wrapper lives in src/bin/test_batch_matching.rs.
Run it from project root:
cargo run --bin test_batch_matching

output/
├── diagrams/
│ └── group/ # SVG images of the chord diagram
└── logs/
├── group/ # one folder per chromatic root
│ ├─ passes/ # .log files for correct predictions
│ ├─ failures/ # .log files for mismatches
│ └─ full_scores/ # Markdown tables of every candidate ≥2-note match
└─ results.csv # flat file with 1 row per sample
- `.log` — one-liner score breakdown (quick grep-able view).
- `full_scores/<label>.md` — rich Markdown report with peak lists and a sortable table of candidate chords.
- `results.csv` — machine-readable dataset for later analysis.
- Forked crate: `pineapple-bois/chord-gen`
| What it does | Key points |
|---|---|
| Generates a guitar-fret SVG (e.g. `F#sus2.svg`) and drops it into `output/diagrams/…`. | • Built on a fork of `whostolemyhat/chord-gen`. • Added Rust-native API; no more CLI calls. • Optional background rectangle for light/dark integration. |
| Picks the correct shape family (Barre5, Barre6, Open), then transposes & sets the barre-fret. | • `assets/voicings.json` holds base shapes. • `BARRE5_OFF` / `BARRE6_OFF` offset tables ensure the barre is drawn at the right fret. • Graceful fallback (Barre5 → Barre6 → Open) if a quality is missing. |
| Lets callers decide where the file goes via `PathMode`. | • `PathMode::Test` → `output/diagrams/test/{file}.svg` • `PathMode::Group` → `output/diagrams/{root}_group/{file}.svg` |
| Integrates with the matcher in one line (only for `Certain` matches). | `render_diagram(label, &name, false, PathMode::Group)?;` |
use utils::chord_diagrams::{render_diagram, PathMode};
render_diagram(
"Bmaj_sample", // stem
"Bmaj", // chord label
true, // background on
PathMode::Group, // or Test
)?;

Internal logic handles the creation of Open or Barre chord shapes.
| Upstream `chord-gen` | Our fork |
|---|---|
| CLI-only (`--frets …`) | Added `types::Chord` + `render_svg()` (returns `u64` tmp-name). |
| Random filenames | We rename to `{label}.svg` after render. |
| Fixed footer credit | Template stripped; MIT licence left intact. |
TL;DR — drop `render_diagram()` into any `MatchOutcome` branch and you'll get ready-to-embed SVG chord charts alongside the log files.
The CSV is designed for painless import into pandas / Jupyter:
import pandas as pd
df = pd.read_csv("output/logs/results.csv")
df.head()

- Feature importance — correlate `base_score`, `amplitude_bonus`, etc. with `is_match` to spot over-weighted or under-weighted terms.
- Threshold sweeps — grid-search `penalty_extra`, `order_bonus`, etc. to maximise accuracy.
- Harmonic library — mine recurring “extraneous” notes to seed a lookup table of expected overtones (e.g., open-string drones, body resonances).
Once tuned, we'll feed the new constants back into analysis/matching.rs (or promote them to a config file) and re-run the batch script. Rinse & repeat until the hit-rate nudges past the current 80 % success mark!
A Python data analysis project was created to run some grid searches, and we eventually entered a feedback loop for weights.toml updates. Therefore, new scoring features were developed in ChordMatcher:
- `detect_triad` — Returns `true` if root–3rd–5th or root–2nd–5th are all present in the detected set.
- `seventh_evidence`
  - $+$ score when the 7th is confidently present,
  - $-$ score when expected but weak / missing.
- Normalised peak amplitudes — Help with scoring transparency.
These optimisations led to a 100% success rate for the library of samples (now extended with sus2 chords). However, there were some borderline cases. Hence, it was necessary to measure confidence in a ChordMatcher match. Two parameters were defined:
- `DELTA_S_STAR` (0.45): $\Delta\text{S} = \text{S}_1 - \text{S}_2$, where
  - $\text{S}_1 =$ `best_match`
  - $\text{S}_2 =$ `second_match`
- `CONF_STAR` (0.75): $\Sigma\text{S} = \exp(\text{S}_1) \,/\, \left(\exp(\text{S}_1) + \exp(\text{S}_2)\right)$
  - Soft‑max confidence of the best chord over the runner‑up
The constants come from the fifth‑percentile of each metric measured on the clean 100%‑accurate sample set.
In practice:

- A loud rogue peak that makes a second template almost as plausible drops ΔS, failing the gate.
- A low‑fret recording where amplitudes are flatter shrinks ΣS, also failing.
- Clean, confident matches clear both thresholds and are logged as passes.
We define a `MatchOutcome` enum:
#[derive(Debug)]
pub enum MatchOutcome {
Certain { name: String, details: MatchDetails, conf: f32 },
Ambiguous { name: String, second: String, delta: f32, conf: f32 },
NoMatch,
}

`NoMatch` is returned if and only if the peak data is invalid. A mechanism will therefore need to be written to handle `Ambiguous` when transitioning to streaming audio.
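For illustration, the two gates might feed `MatchOutcome` roughly like the sketch below; the function name `gate_outcome_sketch` is hypothetical, while the constants (0.45, 0.75) and the variant fields follow the definitions above. The actual candidate ranking and gating live in `analysis/matching.rs`:

```rust
/// Illustrative confidence gate over the two best-scoring candidates.
fn gate_outcome_sketch(
    best: (String, f32),   // (chord name, S1)
    second: (String, f32), // (runner-up name, S2)
    details: MatchDetails,
) -> MatchOutcome {
    const DELTA_S_STAR: f32 = 0.45;
    const CONF_STAR: f32 = 0.75;

    let delta = best.1 - second.1;
    // Soft-max confidence of the best chord over the runner-up.
    let conf = best.1.exp() / (best.1.exp() + second.1.exp());

    if delta >= DELTA_S_STAR && conf >= CONF_STAR {
        MatchOutcome::Certain { name: best.0, details, conf }
    } else {
        MatchOutcome::Ambiguous { name: best.0, second: second.0, delta, conf }
    }
}
```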
The chord library was recorded using 5th-fret barre shapes and a capo (to make it as painless as possible). Ambiguous matches by the above metrics:
There's some distance between the highest and lowest chord total_scores. Therefore, further grid sweeps are required to push that confidence threshold north.
signal_analysis/
├── assets/
│ ├── figures/
│ └── guitar_samples/ # Sample guitar recordings (WAV files)
├── notes/ # Markdown notes documenting modules
│ ├── analysis/
│ ├── processing/
│ ├── theory/
│ └── utils.md
├── output/ # Plots and results (built by Rust)
├── src/
│ ├── analysis/ # Visualisation and STFT
│ │ ├── mod.rs
│ │ ├── spectrogram.rs
│ │ └── visualisation.rs
│ ├── audio/ # (Placeholder - future audio features)
│ │ └── mod.rs
│ ├── processing/ # FFT utilities, spectral analysis
│ │ ├── mod.rs
│ │ ├── fft_utils.rs
│ │ └── peak_detection.rs
│ ├── theory/ # Static music theory templates
│ │ ├── mod.rs
│ │ ├── chords.rs
│ │ └── notes.rs
│ ├── utils/ # Sample loading, WAV file reading
│ │ ├── mod.rs
│ │ ├── run_samples.rs
│ │ ├── sample_loader.rs
│ │ └── wav_reader.rs
│ ├── lib.rs # Library API (re-exports modules)
│ └── main.rs # CLI runner for experiments
├── Cargo.toml
└── README.md
- WAV file parsing — Reads normalised `Vec<f32>` samples + sample rate via `hound`
- Static chord template library — Defines `CHORD_TEMPLATES` for major, minor, dominant 7th, major 7th and minor 7th structures
- FFT computation — Forward transform using `rustfft` (NumPy `norm='forward'` convention)
- Energy spectral density — Computes $|\widetilde{V}|^2$ from the complex FFT
- Frequency bin generation — Replicates NumPy’s `fftfreq`
- Spectrum shifting — Centres zero frequency using a real-valued `fftshift`
- Peak detection — Extracts top-$N$ spectral peaks using percentile thresholding and spacing constraints
- Pitch class quantisation — Logarithmic MIDI mapping with equal-tempered tuning
- Short-Time Fourier Transform (STFT) — Hann-windowed, frame-wise time–frequency analysis
- Spectrogram plotting — High-resolution dB-scaled figures via `plotters` + `colorous`
- Unit tests — Coverage for FFT utilities, peak filtering and chord template logic
This project is currently a research and prototyping platform for chord recognition from raw guitar audio. Development is organised around the following milestones:
A full pipeline exists for loading .wav samples, computing FFTs, detecting peaks, matching chord templates and writing structured logs. Current work focuses on hyperparameter optimisation and confidence metrics.
A self-contained STFT implementation produces dB-scaled spectrograms suitable for plotting or later real-time visualisation.
A dedicated CLI is being built to:
- Load and process an audio file directly from the command line
- Display classification results in the terminal
- Optionally save plots and logs to disk
Future integration may include argument parsing (clap) and progress bars (indicatif).
Infrastructure exists to scan a directory of samples, process each file, and aggregate results to CSV/Markdown. JSON-based persistence via serde is supported.
The architecture was designed with streaming support in mind. Planned extensions include:
- Real-time audio capture via
cpal - Incremental FFT and chord matching
- Real-time confidence gating
Longer-term, the goal is to provide a lightweight graphical interface for:
- Drag-and-drop audio files
- Real-time spectrogram visualisation
- Live chord identification
Potential frameworks include egui/eframe or iced.
Contributions are welcome. Areas of interest include:
- Improved DSP algorithms (FFT/STFT performance, peak analysis)
- Extensions to the chord template library (sus, add, diminished, etc.)
- Additional heuristics or confidence measurements for the matcher
- CLI improvements and configuration interfaces
- Real-time or GUI-based features
- Python bindings or integration examples
If you are interested in collaborating, please open an issue or submit a pull request. The repository is structured for clarity and extensibility, and internal modules are documented with the intention of lowering the barrier to entry for contributors.
Built with 🦀 Rust and 🎸 music in mind.
Brian Eno articulates well the complexity in objectively quantifying musical experience.
"As far as your mind is concerned, nothing happens the same twice, even if in every technical sense, the thing is identical. Your perception is constantly shifting. It doesn’t stay in one place."1





