27 changes: 27 additions & 0 deletions .gitignore
@@ -0,0 +1,27 @@

# Ignore OS metadata files (macOS and Windows)
.DS_Store
._*
Thumbs.db

# Ignore Python bytecode
__pycache__/
*.py[cod]

# Ignore logs and backup files
*.log
_backup/

# Ignore output files like spectrograms
*.png

# Ignore directories created by the script
sample-shrinker-python/_backup/
sample-shrinker-python/*.log
sample-shrinker-python/*.png

# Virtual environment files
venv/
env/
.venv
sample-shrinker_venv/
191 changes: 191 additions & 0 deletions sample-shrinker-python/README.md
@@ -0,0 +1,191 @@
# Sample Shrinker

A Python script to conditionally batch-convert audio samples into minimal `.wav` files and manage duplicate audio files. This script is useful for saving storage space, reducing I/O stress during simultaneous real-time streaming of multiple `.wav` files, and cleaning up duplicate samples across your library.

## Features

### Sample Conversion
- **Conditional Conversion**: Only converts samples that don't meet the target criteria (bit depth, channels, etc.)
- **Auto-Mono**: Automatically convert stereo samples to mono if the content is effectively mono
- **Backup and Spectrogram Generation**: Original files are backed up with the folder structure preserved, optionally alongside before/after spectrograms
- **Pre-Normalization**: Optionally normalize samples before downsampling bit depth
- **Parallel Processing**: Process multiple files simultaneously for faster conversions
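
As a rough illustration of the conditional conversion and auto-mono features, the sketch below decides which changes a file would need. It is not the script's actual code: `AudioInfo`, `is_effectively_mono`, `plan_changes`, and the `tolerance` value are hypothetical names and assumptions, and the sketch assumes files are only ever reduced, never upsampled.

```python
from dataclasses import dataclass


@dataclass
class AudioInfo:
    """Hypothetical container for the properties the script inspects."""
    bit_depth: int
    channels: int
    sample_rate: int


def is_effectively_mono(left, right, tolerance=1e-4):
    """Return True when both channels carry (almost) the same signal."""
    if len(left) != len(right):
        return False
    return all(abs(l - r) <= tolerance for l, r in zip(left, right))


def plan_changes(info, target_bit_depth=16, target_sample_rate=44100,
                 auto_mono=False, left=None, right=None):
    """List the conversions a file needs; an empty list means 'skip it'."""
    changes = []
    # Assumption: only reduce, never increase, bit depth or sample rate.
    if info.bit_depth > target_bit_depth:
        changes.append(f"bit depth {info.bit_depth} -> {target_bit_depth}")
    if info.sample_rate > target_sample_rate:
        changes.append(f"sample rate {info.sample_rate} -> {target_sample_rate}")
    if (auto_mono and info.channels == 2 and left is not None
            and right is not None and is_effectively_mono(left, right)):
        changes.append("auto-mono")
    return changes
```

A 24-bit stereo file whose channels are identical would yield `["bit depth 24 -> 16", "auto-mono"]`, while a file that already meets the targets yields an empty list and is left untouched.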

### Duplicate Management
- **Multi-Level Detection**: Finds duplicates at both directory and file levels
- **Intelligent Matching**: Uses file size, content hashes, and audio fingerprinting
- **Audio Fingerprinting**: Uses spectral analysis to detect similar audio content
- **Safe Defaults**: Moves duplicates to backup instead of deleting
- **Directory Structure**: Maintains original folder structure in backup directory

## Requirements

- Python 3.10 or later
- Required Python packages (install with `pip install -r requirements.txt`):
```
librosa==0.10.2.post1
matplotlib==3.9.2
numpy==2.1.3
pydub==0.25.1
questionary==2.0.1
rich==13.9.4
scipy==1.14.1
soundfile==0.12.1
```
- `ffmpeg` or `libav` installed for audio processing

Install system dependencies:
```bash
# macOS with Homebrew
brew install ffmpeg

# Ubuntu/Debian
sudo apt install ffmpeg
```

## Usage

### Interactive Mode (Recommended)
Simply run the script without arguments:
```bash
python sample-shrinker.py
```

The interactive interface will guide you through:
1. Choosing between sample conversion or duplicate removal
2. Selecting directories/files to process (add multiple paths)
3. Configuring operation-specific options
4. Setting advanced parameters

### Command Line Mode
For automation or scripting:
```bash
python sample-shrinker.py [options] FILE|DIRECTORY ...
```

## Sample Conversion Options

### Interactive Configuration
When choosing "Shrink samples", configure:
- Target bit depth (8, 16, or 24 bit)
- Channel count (mono or stereo)
- Sample rate (22050, 44100, or 48000 Hz)
- Advanced options:
  - Auto-mono conversion
  - Pre-normalization
  - Spectrogram generation
  - Parallel processing
  - Minimum sample rate
  - Minimum bit depth
  - Dry run preview

### Command Line Options
- `-b BIT_DEPTH`: Set target bit depth (default: 16)
- `-B MIN_BIT_DEPTH`: Set minimum bit depth
- `-c CHANNELS`: Set target channels (1=mono, 2=stereo)
- `-r SAMPLERATE`: Set target sample rate (default: 44100)
- `-R MIN_SAMPLERATE`: Set minimum sample rate
- `-a`: Enable auto-mono conversion
- `-p`: Enable pre-normalization
- `-j JOBS`: Set number of parallel jobs
- `-n`: Preview changes without converting
- `-d BACKUP_DIR`: Set backup directory (default: _backup)
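
The flags above could be declared with `argparse` roughly as follows. This is a sketch, not the script's real parser: the `dest` names and `build_parser` are hypothetical, and only the defaults documented above (`16`, `44100`, `_backup`) come from this README. The remaining defaults are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented command line options."""
    parser = argparse.ArgumentParser(prog="sample-shrinker.py")
    parser.add_argument("paths", nargs="+", metavar="FILE|DIRECTORY")
    parser.add_argument("-b", dest="bit_depth", type=int, default=16)
    parser.add_argument("-B", dest="min_bit_depth", type=int, default=None)
    parser.add_argument("-c", dest="channels", type=int, choices=(1, 2),
                        default=None)  # assumption: unset means "keep as is"
    parser.add_argument("-r", dest="samplerate", type=int, default=44100)
    parser.add_argument("-R", dest="min_samplerate", type=int, default=None)
    parser.add_argument("-a", dest="auto_mono", action="store_true")
    parser.add_argument("-p", dest="pre_normalize", action="store_true")
    parser.add_argument("-j", dest="jobs", type=int, default=1)  # assumed default
    parser.add_argument("-n", dest="dry_run", action="store_true")
    parser.add_argument("-d", dest="backup_dir", default="_backup")
    return parser


args = build_parser().parse_args(["-c", "1", "-b", "16", "-a", "samples/"])
print(args.channels, args.bit_depth, args.auto_mono, args.paths)
```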

## Duplicate Removal Options

### Interactive Configuration
When choosing "Remove duplicates", configure:
- Audio matching options:
  - Similarity threshold (80-95%)
  - File length comparison
  - Sample rate comparison
  - Channel count comparison
- Filename handling:
  - Match by name and content
  - Match by content only
- Duplicate handling:
  - Move to backup (safe)
  - Delete immediately
  - Preview only

### Detection Process
1. **Directory Level**:
   - Finds directories with matching names
   - Compares file counts and total sizes
   - Verifies exact content matches
   - Keeps oldest copy, moves others to backup

2. **File Level**:
   - Groups files by size (fast initial filter)
   - Performs quick hash comparison for exact matches
   - Uses audio fingerprinting for similar content detection
   - Maintains original directory structure in backup
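
The first two file-level steps (size grouping, then hashing) can be sketched with the standard library alone. The function names are hypothetical, and SHA-256 is an assumption: the README does not say which hash the script uses.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_hash(path, chunk_size=65536):
    """Hash a file in chunks so large samples are never fully loaded."""
    digest = hashlib.sha256()  # assumption: the real script may use another hash
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Return groups of byte-identical files found under `root`."""
    by_size = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file() and not path.is_symlink():
            by_size[path.stat().st_size].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have an exact duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_hash(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Grouping by size first means most files are never hashed at all, which is why it works as a fast initial filter.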

### Audio Fingerprinting
- Converts audio to mono for comparison
- Generates spectral fingerprints
- Compares frequency content
- Provides similarity scores as percentages
- Configurable similarity threshold
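
To make the fingerprinting idea concrete, here is a deliberately small, dependency-free sketch. It is an assumption-laden stand-in, not the script's algorithm: it averages coarse frequency-band energies with a naive DFT and compares fingerprints by cosine similarity. A real implementation would more likely use `numpy` or `librosa`, and all function names here are hypothetical.

```python
import cmath
import math


def to_mono(left, right):
    """Average two channels into one for comparison."""
    return [(l + r) / 2 for l, r in zip(left, right)]


def band_energies(frame, bands=8):
    """Sum naive-DFT magnitudes into `bands` coarse frequency bands."""
    n = len(frame)
    half = n // 2
    mags = [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(half)]
    width = half // bands  # requires len(frame) >= 2 * bands
    return [sum(mags[b * width:(b + 1) * width]) for b in range(bands)]


def fingerprint(samples, frame_size=64, bands=8):
    """Average band energies over all full frames of the signal."""
    totals = [0.0] * bands
    frames = 0
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        energies = band_energies(samples[start:start + frame_size], bands)
        totals = [t + e for t, e in zip(totals, energies)]
        frames += 1
    return [t / frames for t in totals] if frames else totals


def similarity_percent(fp_a, fp_b):
    """Cosine similarity of two fingerprints, expressed as a percentage."""
    dot = sum(a * b for a, b in zip(fp_a, fp_b))
    norm = (math.sqrt(sum(a * a for a in fp_a))
            * math.sqrt(sum(b * b for b in fp_b)))
    return 100.0 * dot / norm if norm else 0.0
```

Because cosine similarity ignores overall level, a quieter copy of the same sound still scores close to 100%, whereas sounds with different frequency content score low. Two files would count as similar when the score meets the configured threshold.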

### Safety Features
- Dry run option to preview changes
- Backup by default instead of deletion
- Verification of file accessibility
- Symlink detection
- Lock checking
- Detailed progress reporting
- Original folder structure preserved in backups
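
Several of these safeguards can be combined in a single helper, sketched below under stated assumptions. `is_safe_to_move` and `move_to_backup` are hypothetical names, the helper defaults to a dry run, and lock checking is omitted because it is platform-specific.

```python
import os
import shutil
from pathlib import Path


def is_safe_to_move(path):
    """Accept only readable regular files that are not symlinks."""
    path = Path(path)
    return (path.is_file() and not path.is_symlink()
            and os.access(path, os.R_OK))


def move_to_backup(path, root, backup_dir, dry_run=True):
    """Move `path` under `backup_dir`, mirroring its position below `root`.

    Returns the destination, or None if the file was skipped. With
    dry_run=True nothing on disk changes.
    """
    path, root = Path(path), Path(root)
    if not is_safe_to_move(path):
        return None
    destination = Path(backup_dir) / root.name / path.relative_to(root)
    if not dry_run:
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(destination))
    return destination
```

Returning the destination even during a dry run lets the caller print exactly what would happen before any file is touched.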

## Examples

### Basic Sample Conversion
```bash
# Interactive mode with guided configuration
python sample-shrinker.py

# Command line with specific options
python sample-shrinker.py -c 1 -b 16 -a samples/
```

### Duplicate Removal
```bash
# Interactive mode (recommended)
python sample-shrinker.py

# Preview duplicate detection
python sample-shrinker.py samples/ -n
```

### Output Example
```
Processing file: samples/drums/kick.wav
samples/drums/kick.wav [CHANGED]: bit depth 24 -> 16, auto-mono

Found duplicate directories named 'drums' with 10 files (1.2MB):
Keeping oldest copy: samples/drums (created: Thu Mar 21 10:00:00 2024)
Moving duplicate: samples/backup/drums (created: Thu Mar 21 11:30:00 2024)

Found similar files: 'snare.wav' (250KB)
Similarity scores:
snare_old.wav: 92.5% similar
snare_copy.wav: 95.8% similar
Keeping oldest copy: samples/snare.wav
Moving similar files to backup...
```

## Directory Structure
```
samples/                    # Original directory
  drums/
    kick.wav
    snare.wav
_backup/                    # Backup directory
  samples/                  # Original structure preserved
    drums/
      kick.wav.old          # Original files
      kick.wav.old.png      # Spectrograms
      kick.wav.new.png
```
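
The backup layout shown above can be derived from a source path with `pathlib`. The helper below is a hypothetical sketch of that naming scheme, not code taken from the script.

```python
from pathlib import Path


def backup_paths(source, root, backup_dir="_backup"):
    """Compute where a converted file's backup and spectrograms would live."""
    source, root = Path(source), Path(root)
    # Mirror the file's position below `root` inside the backup directory.
    mirrored = Path(backup_dir) / root.name / source.relative_to(root)
    original = mirrored.with_name(mirrored.name + ".old")
    return {
        "original": original,
        "spectrogram_old": original.with_name(original.name + ".png"),
        "spectrogram_new": mirrored.with_name(mirrored.name + ".new.png"),
    }


for label, path in backup_paths("samples/drums/kick.wav", "samples").items():
    print(label, path)
```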

## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
8 changes: 8 additions & 0 deletions sample-shrinker-python/requirements.txt
@@ -0,0 +1,8 @@
librosa==0.10.2.post1
matplotlib==3.9.2
numpy==2.1.3
pydub==0.25.1
questionary==2.0.1
rich==13.9.4
scipy==1.14.1
soundfile==0.12.1