
Commit 10fa098

Add file-level chunking for large audio files (fixes #158) (#256)

* Add file-level chunking for large audio files (#158)
  - Add AudioChunker class for splitting/merging audio files
  - Add chunk_duration parameter to Separator class
  - Implement _process_with_chunking() for chunk-based processing
  - Add --chunk_duration CLI option
  - Chunks are processed sequentially with simple concatenation
  - GPU cache cleared between chunks to manage memory

  This implementation follows the reference approach from issue #44 without overlap/crossfade (can be added in a future PR if needed).

* Add comprehensive unit tests for AudioChunker
  - 17 test cases covering initialization, splitting, merging
  - Tests for edge cases (short files, exact multiples, boundaries)
  - Integration tests with actual audio segments
  - Mock tests for error handling
  - All tests passing

* docs: Add documentation for large file processing with chunking
  Updated README.md with a comprehensive section on using the --chunk_duration option for processing large audio files. Documents the split-process-merge workflow, benefits, recommendations, and the limitation (simple concatenation). Relates to #158

* fix: Ensure chunk files are saved to temp directory and cleaned up properly
  Fixed two issues in chunk processing:
  1. Convert relative output file paths to absolute paths when collecting chunk results
  2. Temporarily change both Separator and model_instance output_dir to the temp directory during chunk processing

  This ensures chunk files are saved to the temp directory and automatically cleaned up, keeping the output directory clean with only the final merged files. Relates to #158

* refactor: Simplify docstrings and remove redundant comments
  - Simplified the audio_chunking.py module docstring to match the existing codebase style
  - Removed self-explanatory inline comments in _process_with_chunking
  - Kept important comments for non-obvious operations (GPU cache clearing)
  - All unit tests passing (17/17)

  Relates to #158

* fix: Adjust log levels for internal operations
  - Changed 'Loading audio file' to debug (internal AudioChunker operation)
  - Changed 'Created temporary directory' to debug (low-priority detail)
  - Removed the 'File-level chunking enabled' log from __init__ (unnecessary, already logged when actually used)
  - INFO level now shows only user-relevant information: splitting/merging progress, chunk processing status, completion messages

  Relates to #158

* refactor: Remove redundant 'Successfully split' log message
  The completion message after splitting is redundant since we already log the splitting operation. Follows the existing codebase pattern where 'completed' messages are typically debug level. Relates to #158

* Fix multi-stem model support in chunking
  - Replace hardcoded primary/secondary lists with a dynamic stem dictionary
  - Extract stem names from chunk filenames using a regex pattern
  - Support 2-stem, 4-stem, and 6-stem models (MDX, Demucs, Demucs 6s)
  - Add the re module import for stem name extraction
  - Update README to document multi-stem support

  Previously, only the first 2 stems were preserved when using chunking with 4-stem or 6-stem models. This fix ensures all stems are correctly processed and merged. The chunking feature now supports:
  - 2-stem models (e.g., MDX): Vocals + Instrumental
  - 4-stem models (e.g., Demucs): Drums, Bass, Other, Vocals
  - 6-stem models (e.g., Demucs 6s): Bass, Drums, Other, Vocals, Guitar, Piano

* Add comprehensive unit tests for chunking functionality
  Add 15 new unit tests for Separator chunking logic, covering:

  **Basic Functionality (6 tests):**
  - 2-stem model compatibility (Vocals, Instrumental)
  - 4-stem Demucs model (Drums, Bass, Other, Vocals)
  - 6-stem Demucs model (all 6 stems)
  - Stem name extraction from filenames with regex
  - Fallback handling for non-matching patterns
  - Sorted stem order in merged output

  **Internal Logic & State Management (6 tests):**
  - State restoration after chunking (chunk_duration, output_dir)
  - GPU cache clearing between chunks
  - Temporary directory cleanup verification
  - State restoration on error (exception handling)
  - AudioChunker initialization with correct parameters
  - custom_output_names parameter handling

  **Edge Cases (3 tests):**
  - Empty output handling (no stems produced)
  - Inconsistent stem counts across chunks
  - Filename pattern match failure with fallback naming

  Total test coverage: 32 tests (17 AudioChunker + 15 Separator chunking). All tests passing.

* Fix test_cli.py to include chunk_duration parameter
  The common_expected_args fixture was missing the new chunk_duration parameter added to Separator.__init__(), causing all CLI tests to fail in CI. Also corrected the parameter order to match the actual constructor (use_soundfile before use_autocast).
  Fixes:
  - Added chunk_duration: None to the expected args
  - Reordered use_soundfile/use_autocast to match Separator.__init__()

  All tests/unit/test_cli.py tests now pass (13 passed, 2 skipped).

* Fix ruff lint errors in test_audio_chunking.py
  Addressed CodeRabbit feedback:
  1. Renamed the unused mock parameter mock_makedirs to _mock_makedirs (ARG002)
  2. Renamed the unused lambda parameter key to _ (ARG005)
  3. Replaced hardcoded /tmp paths with the pytest tmp_path fixture (S108)

  All 17 tests still pass after these changes.
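
As a rough sketch of what the commit message describes, the new `chunk_duration` parameter would be used from Python roughly like this, assuming the library's usual `load_model()` / `separate()` workflow; the model filename and input file below are placeholders:

```python
from audio_separator.separator import Separator

# Files longer than chunk_duration (seconds) are split, processed
# chunk-by-chunk, and merged back together; shorter files are untouched.
separator = Separator(chunk_duration=600)

# Placeholder model filename; any supported model should behave the same way.
separator.load_model(model_filename="UVR-MDX-NET-Inst_HQ_3.onnx")

# Returns the paths of the final merged stem files.
output_files = separator.separate("long_podcast.wav")
print(output_files)
```
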
1 parent bfa7380 commit 10fa098

File tree

7 files changed: +1263 -1 lines changed


README.md

Lines changed: 40 additions & 0 deletions
@@ -278,6 +278,46 @@ For programmatic use, you can output the model list in JSON format:
audio-separator -l --list_format=json
```

### Processing Large Files

For very long audio files (>1 hour), you may encounter out-of-memory errors. The `--chunk_duration` option automatically splits large files into smaller chunks, processes them separately, and merges the results:

```sh
# Process an 8-hour podcast in 10-minute chunks
audio-separator long_podcast.wav --chunk_duration 600

# Adjust chunk size based on available memory
audio-separator very_long_audio.wav --chunk_duration 300  # 5-minute chunks
```

#### How It Works

1. **Split**: The input file is split into fixed-duration chunks (e.g., 10 minutes)
2. **Process**: Each chunk is processed separately, reducing peak memory usage
3. **Merge**: The results are merged back together with simple concatenation

The chunking feature supports all model types:
- **2-stem models** (e.g., MDX): Vocals + Instrumental
- **4-stem models** (e.g., Demucs): Drums, Bass, Other, Vocals
- **6-stem models** (e.g., Demucs 6s): Bass, Drums, Other, Vocals, Guitar, Piano

#### Benefits

- **Prevents OOM errors**: Process files of any length without running out of memory
- **Predictable memory usage**: Memory usage stays bounded regardless of file length
- **No quality loss**: Each chunk is fully processed with the selected model
- **Multi-stem support**: Works seamlessly with 2, 4, and 6-stem models

#### Recommendations

- **Files > 1 hour**: Use `--chunk_duration 600` (10 minutes)
- **Limited memory systems**: Use smaller chunks (300-600 seconds)
- **Ample memory**: You may not need chunking at all

#### Note on Audio Quality

Chunks are concatenated without crossfading, which may result in minor artifacts at chunk boundaries in rare cases. For most use cases, these are not noticeable. The simple concatenation approach keeps processing time minimal while solving out-of-memory issues.

### Full command-line interface options
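
To make the recommendations above concrete, here is a small back-of-the-envelope helper (illustrative only, not part of the library) showing how many chunks a given `--chunk_duration` yields:

```python
import math

def estimated_chunks(total_seconds: float, chunk_duration_seconds: float) -> int:
    """Rough chunk count: the splitter uses ceiling division over the file length."""
    return math.ceil(total_seconds / chunk_duration_seconds)

print(estimated_chunks(8 * 3600, 600))  # 8-hour podcast, 10-minute chunks -> 48
print(estimated_chunks(8 * 3600, 300))  # same file, 5-minute chunks -> 96
print(estimated_chunks(45 * 60, 600))   # 45-minute file -> 5 (little need to chunk)
```
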
audio_separator/separator/audio_chunking.py

Lines changed: 141 additions & 0 deletions
@@ -0,0 +1,141 @@
"""Audio chunking utilities for processing large audio files to prevent OOM errors."""

import os
import logging
from typing import List
from pydub import AudioSegment


class AudioChunker:
    """
    Handles splitting and merging of large audio files.

    This class provides utilities to:
    - Split large audio files into fixed-duration chunks
    - Merge processed chunks back together with simple concatenation
    - Determine if a file should be chunked based on its duration

    Example:
        >>> chunker = AudioChunker(chunk_duration_seconds=600)  # 10-minute chunks
        >>> chunk_paths = chunker.split_audio("long_audio.wav", "/tmp/chunks")
        >>> # Process each chunk...
        >>> output_path = chunker.merge_chunks(processed_chunks, "output.wav")
    """

    def __init__(self, chunk_duration_seconds: float, logger: logging.Logger = None):
        """
        Initialize the AudioChunker.

        Args:
            chunk_duration_seconds: Duration of each chunk in seconds
            logger: Optional logger instance for logging operations
        """
        self.chunk_duration_ms = int(chunk_duration_seconds * 1000)
        self.logger = logger or logging.getLogger(__name__)

    def split_audio(self, input_path: str, output_dir: str) -> List[str]:
        """
        Split audio file into fixed-size chunks.

        Args:
            input_path: Path to the input audio file
            output_dir: Directory where chunk files will be saved

        Returns:
            List of paths to the created chunk files

        Raises:
            FileNotFoundError: If input file doesn't exist
            IOError: If there's an error reading or writing audio files
        """
        if not os.path.exists(input_path):
            raise FileNotFoundError(f"Input file not found: {input_path}")

        if not os.path.exists(output_dir):
            os.makedirs(output_dir)

        self.logger.debug(f"Loading audio file: {input_path}")
        audio = AudioSegment.from_file(input_path)

        total_duration_ms = len(audio)
        chunk_paths = []

        # Calculate number of chunks
        num_chunks = (total_duration_ms + self.chunk_duration_ms - 1) // self.chunk_duration_ms
        self.logger.info(f"Splitting {total_duration_ms / 1000:.1f}s audio into {num_chunks} chunks of {self.chunk_duration_ms / 1000:.1f}s each")

        # Get file extension from input
        _, ext = os.path.splitext(input_path)
        if not ext:
            ext = ".wav"  # Default to WAV if no extension

        # Split into chunks
        for i in range(num_chunks):
            start_ms = i * self.chunk_duration_ms
            end_ms = min(start_ms + self.chunk_duration_ms, total_duration_ms)

            chunk = audio[start_ms:end_ms]
            chunk_filename = f"chunk_{i:04d}{ext}"
            chunk_path = os.path.join(output_dir, chunk_filename)

            self.logger.debug(f"Exporting chunk {i + 1}/{num_chunks}: {start_ms / 1000:.1f}s - {end_ms / 1000:.1f}s to {chunk_path}")
            chunk.export(chunk_path, format=ext.lstrip('.'))
            chunk_paths.append(chunk_path)

        return chunk_paths

    def merge_chunks(self, chunk_paths: List[str], output_path: str) -> str:
        """
        Merge processed chunks with simple concatenation.

        Args:
            chunk_paths: List of paths to chunk files to merge
            output_path: Path where the merged output will be saved

        Returns:
            Path to the merged output file

        Raises:
            ValueError: If chunk_paths is empty
            FileNotFoundError: If any chunk file doesn't exist
            IOError: If there's an error reading or writing audio files
        """
        if not chunk_paths:
            raise ValueError("Cannot merge empty list of chunks")

        # Verify all chunks exist
        for chunk_path in chunk_paths:
            if not os.path.exists(chunk_path):
                raise FileNotFoundError(f"Chunk file not found: {chunk_path}")

        self.logger.info(f"Merging {len(chunk_paths)} chunks into {output_path}")

        # Start with empty audio segment
        combined = AudioSegment.empty()

        # Concatenate all chunks
        for i, chunk_path in enumerate(chunk_paths):
            self.logger.debug(f"Loading chunk {i + 1}/{len(chunk_paths)}: {chunk_path}")
            chunk = AudioSegment.from_file(chunk_path)
            combined += chunk  # Simple concatenation

        # Get output format from file extension
        _, ext = os.path.splitext(output_path)
        output_format = ext.lstrip('.') if ext else 'wav'

        self.logger.info(f"Exporting merged audio ({len(combined) / 1000:.1f}s) to {output_path}")
        combined.export(output_path, format=output_format)

        return output_path

    def should_chunk(self, audio_duration_seconds: float) -> bool:
        """
        Determine if file is large enough to benefit from chunking.

        Args:
            audio_duration_seconds: Duration of the audio file in seconds

        Returns:
            True if the file should be chunked, False otherwise
        """
        return audio_duration_seconds > (self.chunk_duration_ms / 1000)
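
A quick round-trip sketch of the class above, using a synthetic silent clip so it runs without a real recording (assumes pydub is installed, which the module already requires):

```python
import tempfile
from pydub import AudioSegment
from audio_separator.separator.audio_chunking import AudioChunker

# 25 seconds of silence stands in for a real recording.
audio = AudioSegment.silent(duration=25_000)

with tempfile.TemporaryDirectory() as tmp:
    src = f"{tmp}/example.wav"
    audio.export(src, format="wav")

    chunker = AudioChunker(chunk_duration_seconds=10)  # tiny chunks for the demo
    print(chunker.should_chunk(25))   # True: file is longer than one chunk

    chunk_paths = chunker.split_audio(src, f"{tmp}/chunks")
    print(len(chunk_paths))           # 3 chunks: 10s + 10s + 5s

    merged = chunker.merge_chunks(chunk_paths, f"{tmp}/merged.wav")
    print(len(AudioSegment.from_file(merged)) / 1000)  # 25.0 seconds again
```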

audio_separator/separator/separator.py

Lines changed: 130 additions & 0 deletions
@@ -10,6 +10,7 @@
import warnings
import importlib
import io
import re
from typing import Optional

import hashlib

@@ -94,6 +95,7 @@ def __init__(
        use_soundfile=False,
        use_autocast=False,
        use_directml=False,
        chunk_duration=None,
        mdx_params={"hop_length": 1024, "segment_size": 256, "overlap": 0.25, "batch_size": 1, "enable_denoise": False},
        vr_params={"batch_size": 1, "window_size": 512, "aggression": 5, "enable_tta": False, "enable_post_process": False, "post_process_threshold": 0.2, "high_end_process": False},
        demucs_params={"segment_size": "Default", "shifts": 2, "overlap": 0.25, "segments_enabled": True},

@@ -182,6 +184,11 @@ def __init__(
        self.use_autocast = use_autocast
        self.use_directml = use_directml

        self.chunk_duration = chunk_duration
        if chunk_duration is not None:
            if chunk_duration <= 0:
                raise ValueError("chunk_duration must be greater than 0")

        # These are parameters which users may want to configure so we expose them to the top-level Separator class,
        # even though they are specific to a single model architecture
        self.arch_specific_params = {"MDX": mdx_params, "VR": vr_params, "Demucs": demucs_params, "MDXC": mdxc_params}

@@ -866,6 +873,18 @@ def _separate_file(self, audio_file_path, custom_output_names=None):
        Returns:
        - output_files (list of str): A list containing the paths to the separated audio stem files.
        """
        # Check if chunking is enabled and file is large enough
        if self.chunk_duration is not None:
            import librosa
            duration = librosa.get_duration(path=audio_file_path)

            from audio_separator.separator.audio_chunking import AudioChunker
            chunker = AudioChunker(self.chunk_duration, self.logger)

            if chunker.should_chunk(duration):
                self.logger.info(f"File duration {duration:.1f}s exceeds chunk size {self.chunk_duration}s, using chunked processing")
                return self._process_with_chunking(audio_file_path, custom_output_names)

        # Log the start of the separation process
        self.logger.info(f"Starting separation process for audio_file_path: {audio_file_path}")
        separate_start_time = time.perf_counter()

@@ -899,6 +918,117 @@ def _separate_file(self, audio_file_path, custom_output_names=None):

        return output_files

    def _process_with_chunking(self, audio_file_path, custom_output_names=None):
        """
        Process large file by splitting into chunks.

        This method splits a large audio file into smaller chunks, processes each chunk
        separately, and merges the results back together. This helps prevent out-of-memory
        errors when processing very long audio files.

        Parameters:
        - audio_file_path (str): The path to the audio file.
        - custom_output_names (dict, optional): Custom names for the output files. Defaults to None.

        Returns:
        - output_files (list of str): A list containing the paths to the separated audio stem files.
        """
        import tempfile
        import shutil
        from audio_separator.separator.audio_chunking import AudioChunker

        # Create temporary directory for chunks
        temp_dir = tempfile.mkdtemp(prefix="audio-separator-chunks-")
        self.logger.debug(f"Created temporary directory for chunks: {temp_dir}")

        try:
            # Split audio into chunks
            chunker = AudioChunker(self.chunk_duration, self.logger)
            chunk_paths = chunker.split_audio(audio_file_path, temp_dir)

            # Process each chunk
            processed_chunks_by_stem = {}

            for i, chunk_path in enumerate(chunk_paths):
                self.logger.info(f"Processing chunk {i+1}/{len(chunk_paths)}: {chunk_path}")

                original_chunk_duration = self.chunk_duration
                original_output_dir = self.output_dir
                self.chunk_duration = None
                self.output_dir = temp_dir

                if self.model_instance:
                    original_model_output_dir = self.model_instance.output_dir
                    self.model_instance.output_dir = temp_dir

                try:
                    output_files = self._separate_file(chunk_path, custom_output_names)

                    # Dynamically group chunks by stem name
                    for stem_path in output_files:
                        # Extract stem name from filename: "chunk_0000_(Vocals).wav" → "Vocals"
                        filename = os.path.basename(stem_path)
                        match = re.search(r'_\(([^)]+)\)', filename)
                        if match:
                            stem_name = match.group(1)
                        else:
                            # Fallback: use index-based name if pattern not found
                            stem_index = len([k for k in processed_chunks_by_stem.keys() if k.startswith('stem_')])
                            stem_name = f"stem_{stem_index}"
                            self.logger.warning(f"Could not extract stem name from {filename}, using {stem_name}")

                        if stem_name not in processed_chunks_by_stem:
                            processed_chunks_by_stem[stem_name] = []

                        # Ensure absolute path
                        abs_path = stem_path if os.path.isabs(stem_path) else os.path.join(temp_dir, stem_path)
                        processed_chunks_by_stem[stem_name].append(abs_path)

                    if not output_files:
                        self.logger.warning(f"Chunk {i+1} produced no output files")

                finally:
                    self.chunk_duration = original_chunk_duration
                    self.output_dir = original_output_dir
                    if self.model_instance:
                        self.model_instance.output_dir = original_model_output_dir

                # Clear GPU cache between chunks
                if self.model_instance:
                    self.model_instance.clear_gpu_cache()

            # Merge chunks for each stem dynamically
            base_name = os.path.splitext(os.path.basename(audio_file_path))[0]
            output_files = []

            for stem_name in sorted(processed_chunks_by_stem.keys()):
                chunk_paths_for_stem = processed_chunks_by_stem[stem_name]

                if not chunk_paths_for_stem:
                    self.logger.warning(f"No chunks found for stem: {stem_name}")
                    continue

                # Determine output filename
                if custom_output_names and stem_name in custom_output_names:
                    output_filename = custom_output_names[stem_name]
                else:
                    output_filename = f"{base_name}_({stem_name})"

                output_path = os.path.join(self.output_dir, f"{output_filename}.{self.output_format.lower()}")

                self.logger.info(f"Merging {len(chunk_paths_for_stem)} chunks for stem: {stem_name}")
                chunker.merge_chunks(chunk_paths_for_stem, output_path)
                output_files.append(output_path)

            self.logger.info(f"Chunked processing completed. Output files: {output_files}")
            return output_files

        finally:
            # Clean up temporary directory
            if os.path.exists(temp_dir):
                self.logger.debug(f"Cleaning up temporary directory: {temp_dir}")
                shutil.rmtree(temp_dir, ignore_errors=True)

    def download_model_and_data(self, model_filename):
        """
        Downloads the model file without loading it into memory.
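
The stem grouping in `_process_with_chunking()` hinges on a single regex over the chunk output filenames. A standalone sketch of just that extraction step (the helper name is hypothetical; the filenames are illustrative examples of the `chunk_NNNN_(Stem)` pattern the chunked run produces):

```python
import re
from typing import Optional

def extract_stem_name(filename: str) -> Optional[str]:
    """Mirror of the grouping step above: 'chunk_0000_(Vocals).wav' -> 'Vocals'."""
    match = re.search(r'_\(([^)]+)\)', filename)
    return match.group(1) if match else None

print(extract_stem_name("chunk_0000_(Vocals).wav"))         # Vocals
print(extract_stem_name("chunk_0003_(Instrumental).flac"))  # Instrumental
print(extract_stem_name("chunk_0007_(Drums).wav"))          # Drums
print(extract_stem_name("unexpected_name.wav"))             # None -> caller falls back to stem_<index>
```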

audio_separator/utils/cli.py

Lines changed: 3 additions & 0 deletions
@@ -59,6 +59,7 @@ def main():
    sample_rate_help = "Modify the sample rate of the output audio (default: %(default)s). Example: --sample_rate=44100"
    use_soundfile_help = "Use soundfile to write audio output (default: %(default)s). Example: --use_soundfile"
    use_autocast_help = "Use PyTorch autocast for faster inference (default: %(default)s). Do not use for CPU inference. Example: --use_autocast"
    chunk_duration_help = "Split audio into chunks of this duration in seconds (default: %(default)s = no chunking). Useful for processing very long audio files on systems with limited memory. Recommended: 600 (10 minutes) for files >1 hour. Chunks are concatenated without overlap/crossfade. Example: --chunk_duration=600"
    custom_output_names_help = 'Custom names for all output files in JSON format (default: %(default)s). Example: --custom_output_names=\'{"Vocals": "vocals_output", "Drums": "drums_output"}\''

    common_params = parser.add_argument_group("Common Separation Parameters")

@@ -69,6 +70,7 @@ def main():
    common_params.add_argument("--sample_rate", type=int, default=44100, help=sample_rate_help)
    common_params.add_argument("--use_soundfile", action="store_true", help=use_soundfile_help)
    common_params.add_argument("--use_autocast", action="store_true", help=use_autocast_help)
    common_params.add_argument("--chunk_duration", type=float, default=None, help=chunk_duration_help)
    common_params.add_argument("--custom_output_names", type=json.loads, default=None, help=custom_output_names_help)

    mdx_segment_size_help = "Larger consumes more resources, but may give better results (default: %(default)s). Example: --mdx_segment_size=256"

@@ -200,6 +202,7 @@ def main():
        sample_rate=args.sample_rate,
        use_soundfile=args.use_soundfile,
        use_autocast=args.use_autocast,
        chunk_duration=args.chunk_duration,
        mdx_params={
            "hop_length": args.mdx_hop_length,
            "segment_size": args.mdx_segment_size,
