Most audio libraries expose samples as raw numeric buffers. In Python,
audio is typically represented as a NumPy array whose dtype is
explicit, but whose meaning is not: sample rate, amplitude range,
memory interleaving, and PCM versus floating-point semantics are tracked
externally, if at all. In Rust, the situation is reversed but not
resolved. Libraries provide fast and safe low-level primitives, yet
users are still responsible for managing raw buffers, writing ad hoc
conversion code, and manually preserving invariants across crates.
AudioSamples closes this gap with a strongly typed audio representation that encodes sample format, numeric domain, channel structure, and layout in the type system. All operations preserve or explicitly update these invariants, supporting both exploratory workflows and system-level use without requiring users to remember hidden conventions or reimplement common audio logic.
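The general idea of carrying invariants in types can be illustrated with a minimal, hypothetical sketch. The types and methods below are illustrative only, not the crate's actual API: once the sample rate and non-emptiness are validated at construction, derived quantities such as duration cannot drift out of sync with the buffer they describe.

```rust
// Hypothetical sketch of type-level invariants (not the audio_samples API).

#[derive(Clone, Copy, Debug, PartialEq)]
struct SampleRate(u32);

impl SampleRate {
    // Validate the invariant once, at construction.
    fn new(hz: u32) -> Option<SampleRate> {
        if hz > 0 { Some(SampleRate(hz)) } else { None }
    }
}

struct Mono<T> {
    samples: Vec<T>,
    rate: SampleRate,
}

impl<T> Mono<T> {
    // A Mono<T> can only exist with a non-empty buffer and a valid rate.
    fn new(samples: Vec<T>, rate: SampleRate) -> Option<Mono<T>> {
        if samples.is_empty() { None } else { Some(Mono { samples, rate }) }
    }

    // Derived from data the type already guarantees is consistent.
    fn duration_secs(&self) -> f64 {
        self.samples.len() as f64 / self.rate.0 as f64
    }
}

fn main() {
    let rate = SampleRate::new(44_100).unwrap();
    let audio = Mono::new(vec![0.0f32; 44_100], rate).unwrap();
    println!("duration: {} s", audio.duration_secs());
}
```

The crate applies the same principle more broadly, additionally encoding sample format, numeric domain, channel structure, and layout.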
## Installation

```sh
cargo add audio_samples
```

The default feature set (bare-bones) includes only the core types and
traits. Add features for the operations you need; see Features below.
## Quick start

```rust
use audio_samples::{sample_rate, AudioTypeConversion, cosine_wave, sine_wave};
use std::time::Duration;

fn main() {
    let sr = sample_rate!(44_100);
    let duration = Duration::from_secs_f64(1.0);

    // Generate a 440 Hz sine wave as i16 PCM, then convert to f32
    let float_sine = sine_wave::<i16>(440.0, duration, sr, 0.5).as_f32();

    // Mix with a 220 Hz cosine wave
    let cosine = cosine_wave::<f32>(220.0, duration, sr, 0.5);
    let _mixed = float_sine + cosine;
}
```

Enable the `transforms` feature for spectral analysis:
```sh
cargo add audio_samples --features transforms
```

```rust
use audio_samples::{AudioSamples, AudioTransforms, nzu, sample_rate, sine_wave};
use spectrograms::{ChromaParams, CqtParams, MfccParams, StftParams, WindowType};
use std::time::Duration;

fn main() -> audio_samples::AudioSampleResult<()> {
    let sr = sample_rate!(44100);
    let audio: AudioSamples<'static, f64> =
        sine_wave::<f64>(440.0, Duration::from_millis(200), sr, 0.8);

    let _fft = audio.fft(nzu!(8192))?;
    let stft_params = StftParams::new(nzu!(1024), nzu!(256), WindowType::Hanning, true)?;
    let stft = audio.stft(&stft_params)?;
    let _mfcc = audio.mfcc(&stft_params, nzu!(40), &MfccParams::speech_standard())?;
    let _chroma = audio.chromagram(&stft_params, &ChromaParams::music_standard())?;
    let (_freqs, _psd) = audio.power_spectral_density(nzu!(1024), 0.5)?;
    let _cqt = audio.constant_q_transform(&CqtParams::new(nzu!(12), nzu!(7), 32.7)?, nzu!(256))?;

    // Round-trip via inverse STFT
    let _reconstructed = AudioSamples::<f64>::istft(stft)?;
    Ok(())
}
```

## Constructing `AudioSamples`

`AudioSamples` creation returns a `Result` because validity requires
consistent buffer length, channel count, and sample rate. The
`sample_rate!` macro and `non_empty_vec!` macro guarantee these invariants at
construction:
```rust
use audio_samples::{AudioSamples, sample_rate};
use non_empty_slice::non_empty_vec;

let audio = AudioSamples::from_mono_vec(
    non_empty_vec![0.1f32, 0.2, 0.3],
    sample_rate!(44100),
);
```

For multi-channel audio:
```rust
use audio_samples::{AudioSamples, sample_rate};
use ndarray::array;

let stereo = AudioSamples::<f32>::new_multi_channel(
    array![[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    sample_rate!(44100),
).unwrap();
```

## Features

The default feature is bare-bones: the core types and traits with no
optional dependencies. Enable features as needed:
| Feature | Description |
|---|---|
| `statistics` | Descriptive statistics: peak, RMS, mean, variance |
| `processing` | Normalization, scaling, clipping (requires `statistics`) |
| `editing` | Trim, pad, reverse, perturb, concatenate (requires `statistics`, `random-generation`) |
| `channels` | Interleave/deinterleave, mono↔stereo conversion |
| `iir-filtering` | IIR filter design and application |
| `parametric-eq` | Parametric EQ bands (requires `iir-filtering`) |
| `dynamic-range` | Compression, limiting, expansion |
| `envelopes` | Amplitude, RMS, and attack-decay envelopes |
| `vad` | Voice activity detection |

| Feature | Description |
|---|---|
| `transforms` | FFT, STFT, MFCC, chromagram, CQT, PSD |
| `pitch-analysis` | YIN and autocorrelation pitch detection (requires `transforms`) |
| `onset-detection` | Onset detection (requires `transforms`, `peak-picking`, `processing`) |
| `beat-tracking` | Beat tracking |
| `peak-picking` | Peak picking on onset envelopes |
| `decomposition` | Audio decomposition (requires `onset-detection`) |

| Feature | Description |
|---|---|
| `resampling` | Sample-rate conversion via `rubato` |
| `random-generation` | Noise and random audio generation |
| `fixed-size-audio` | Fixed-size buffer support (no heap allocation) |
| `plotting` | Interactive HTML plots via `plotly` |
| `static-plots` | PNG/SVG export (requires `plotting`; see PLOTTING.md) |
| `simd` | SIMD acceleration (nightly only) |

| Feature | Description |
|---|---|
| `full` | All features |
| `full_no_plotting` | All features except `plotting` |
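Features can also be declared directly in `Cargo.toml`. A sketch, using the feature names listed above (pin a real version rather than `*` in practice):

```toml
[dependencies]
# Hypothetical selection: enable only what you use.
audio_samples = { version = "*", features = ["processing", "transforms"] }
```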
## Documentation and examples

Full API documentation: https://docs.rs/audio_samples

The repository includes runnable examples in `examples/`. Each is
self-contained and annotated with the required feature flags.
Companion crates:

- `audio_samples_io`: audio file decoding and encoding
- `audio_samples_playback`: device-level playback
- `audio_samples_python`: Python bindings
- `spectrograms`: spectrogram and time–frequency transforms (used by the `transforms` feature)
- `i24`: 24-bit signed integer type for Rust
- `dtmf_tones`: `no_std` DTMF keypad frequencies
## License

MIT License.

## Citation

If you use AudioSamples in research, please cite:
```bibtex
@inproceedings{geraghty2026audio,
  author    = {Geraghty, Jack and Golpayegani, Fatemeh and Hines, Andrew},
  title     = {Audio Made Simple: A Modern Framework for Audio Processing},
  booktitle = {ACM Multimedia Systems Conference 2026 (MMSys '26)},
  year      = {2026},
  month     = apr,
  publisher = {ACM},
  address   = {Hong Kong, Hong Kong},
  doi       = {10.1145/3793853.3799811},
  note      = {Accepted for publication}
}
```

Contributions are welcome. Please submit a pull request and see CONTRIBUTING.md for guidance.
