-
-
Notifications
You must be signed in to change notification settings - Fork 161
Add file-level chunking for large audio files (fixes #158) #256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add file-level chunking for large audio files (fixes #158) #256
Conversation
- Add AudioChunker class for splitting/merging audio files - Add chunk_duration parameter to Separator class - Implement _process_with_chunking() for chunk-based processing - Add --chunk_duration CLI option - Chunks are processed sequentially with simple concatenation - GPU cache cleared between chunks to manage memory This implementation follows the reference approach from issue nomadkaraoke#44 without overlap/crossfade (can be added in future PR if needed).
- 17 test cases covering initialization, splitting, merging - Tests for edge cases (short files, exact multiples, boundaries) - Integration tests with actual audio segments - Mock tests for error handling - All tests passing
Updated README.md with comprehensive section on using --chunk_duration option for processing large audio files. Documents the split-process-merge workflow, benefits, recommendations, and limitation (simple concatenation). Relates to nomadkaraoke#158
…operly Fixed two issues in chunk processing: 1. Convert relative output file paths to absolute paths when collecting chunk results 2. Temporarily change both Separator and model_instance output_dir to temp directory during chunk processing This ensures chunk files are saved to the temp directory and automatically cleaned up, keeping the output directory clean with only final merged files. Relates to nomadkaraoke#158
- Simplified audio_chunking.py module docstring to match existing codebase style - Removed self-explanatory inline comments in _process_with_chunking - Kept important comments for non-obvious operations (GPU cache clearing) - All unit tests passing (17/17) Relates to nomadkaraoke#158
- Changed 'Loading audio file' to debug (internal AudioChunker operation) - Changed 'Created temporary directory' to debug (low-priority detail) - Removed 'File-level chunking enabled' log from __init__ (unnecessary, already logged when actually used) - INFO level now shows only user-relevant information: - Splitting/merging progress - Chunk processing status - Completion messages Relates to nomadkaraoke#158
The completion message after splitting is redundant since we already log the splitting operation. Follows existing codebase pattern where 'completed' messages are typically debug level. Relates to nomadkaraoke#158
- Replace hardcoded primary/secondary lists with dynamic stem dictionary - Extract stem names from chunk filenames using regex pattern - Support 2-stem, 4-stem, and 6-stem models (MDX, Demucs, Demucs 6s) - Add re module import for stem name extraction - Update README to document multi-stem support Previously, only the first 2 stems were preserved when using chunking with 4-stem or 6-stem models. This fix ensures all stems are correctly processed and merged. The chunking feature now supports: - 2-stem models (e.g., MDX): Vocals + Instrumental - 4-stem models (e.g., Demucs): Drums, Bass, Other, Vocals - 6-stem models (e.g., Demucs 6s): Bass, Drums, Other, Vocals, Guitar, Piano
Add 15 new unit tests for Separator chunking logic, covering: **Basic Functionality (6 tests):** - 2-stem model compatibility (Vocals, Instrumental) - 4-stem Demucs model (Drums, Bass, Other, Vocals) - 6-stem Demucs model (all 6 stems) - Stem name extraction from filenames with regex - Fallback handling for non-matching patterns - Sorted stem order in merged output **Internal Logic & State Management (6 tests):** - State restoration after chunking (chunk_duration, output_dir) - GPU cache clearing between chunks - Temporary directory cleanup verification - State restoration on error (exception handling) - AudioChunker initialization with correct parameters - custom_output_names parameter handling **Edge Cases (3 tests):** - Empty output handling (no stems produced) - Inconsistent stem counts across chunks - Filename pattern match failure with fallback naming Total test coverage: 32 tests (17 AudioChunker + 15 Separator chunking) All tests passing.
WalkthroughAdds configurable audio chunking: new AudioChunker utility, integration into Separator with optional chunk_duration (CLI flag), per-chunk processing and merging flow, extensive unit/integration tests for chunking, and README documentation describing the chunk/split/merge workflow. Changes
Sequence Diagram(s)sequenceDiagram
actor User
participant CLI as CLI
participant Sep as Separator
participant Chunk as AudioChunker
participant Model as Separator._separate_file
participant FS as FileSystem
User->>CLI: run with --chunk_duration=300
CLI->>Sep: Separator(chunk_duration=300)
Sep->>Chunk: should_chunk(input_duration)
Chunk-->>Sep: True
Sep->>Chunk: split_audio(input_path, temp_dir)
Chunk->>FS: load audio & write chunk files
Chunk-->>Sep: [chunk_0000, chunk_0001...]
loop per chunk
Sep->>Model: _separate_file(chunk_N)
Model-->>Sep: per-chunk stem outputs
Sep->>FS: store per-chunk outputs grouped by stem
Sep->>Sep: torch.cuda.empty_cache() / mem cleanup
end
loop per stem
Sep->>Chunk: merge_chunks(stem_chunk_paths, final_stem_output)
Chunk->>FS: read & concatenate chunk stems
Chunk->>FS: export merged stem file
Chunk-->>Sep: merged_stem_path
end
Sep->>FS: cleanup temp_dir
Sep-->>User: final merged output files
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
📜 Recent review detailsConfiguration used: Organization UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used📓 Path-based instructions (2)**/*.py📄 CodeRabbit inference engine (.cursor/rules/specify-rules.mdc)
Files:
tests/**/*.py📄 CodeRabbit inference engine (.cursor/rules/specify-rules.mdc)
Files:
🧬 Code graph analysis (1)tests/unit/test_audio_chunking.py (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
🔇 Additional comments (8)
✏️ Tip: You can disable this entire section by setting Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@tests/unit/test_audio_chunking.py`:
- Around line 57-60: The test functions (e.g., test_split_audio_basic) declare a
patched parameter named mock_makedirs that is unused and triggers ruff ARG002;
rename that parameter to _mock_makedirs (or _) in the function signature for all
affected tests (including the one at the other occurrence around line 88) so the
`@patch` order remains unchanged but the unused arg is explicitly ignored.
- Line 65: Replace the unused lambda parameter name with an underscore to
satisfy ARG005: update the Mock side_effect assignments (the lines setting
mock_audio.__getitem__ = Mock(side_effect=lambda key: mock_audio) and the
similar occurrence around the other test) to use lambda _: mock_audio (or lambda
*_: mock_audio) so the unused arg is named `_`; keep the rest of the mocking
logic unchanged.
- Around line 111-115: Replace hardcoded "/tmp" usages in the split-audio tests
with pytest's tmp_path fixture: update test_split_audio_file_not_found to accept
tmp_path and call self.chunker.split_audio("nonexistent.wav", tmp_path) (or
tmp_path.as_posix() if a string path is required), and make the same replacement
for the other split_audio tests that currently pass "/tmp" so they use tmp_path
for secure, isolated temp directories; ensure test signatures include the
tmp_path parameter and any assertions use the fixture path.
🧹 Nitpick comments (6)
audio_separator/separator/separator.py (3)
876-886: Consider caching the duration check to avoid redundant I/O.The
librosa.get_duration()call loads the audio file to determine its duration, and thenAudioChunker.split_audio()will load the file again via pydub. For very large files, this double-load could be noticeable.💡 Optional optimization
Consider passing the pre-computed duration to the chunker or having
split_audioreturn the duration as a side effect to avoid loading the file twice. However, for the scope of this PR (preventing OOM), the current approach is acceptable since the duration check is lightweight compared to the actual separation.
952-965: State modification pattern is correct but could benefit from a context manager.The pattern of saving original values, modifying state, and restoring in finally is correct. However, the restoration happens inside a nested try/finally which makes the flow complex.
Note: The current implementation works correctly. A future refactor could extract this into a context manager for cleaner state management.
1026-1030: Cleanup uses ignore_errors=True which silently swallows errors.Using
ignore_errors=Trueinshutil.rmtreemeans any cleanup failures will be silently ignored. This is acceptable for temporary files, but consider logging a warning if cleanup fails.💡 Optional: Add cleanup failure logging
finally: # Clean up temporary directory if os.path.exists(temp_dir): self.logger.debug(f"Cleaning up temporary directory: {temp_dir}") - shutil.rmtree(temp_dir, ignore_errors=True) + try: + shutil.rmtree(temp_dir) + except OSError as e: + self.logger.warning(f"Failed to clean up temporary directory {temp_dir}: {e}")audio_separator/separator/audio_chunking.py (3)
25-34: Type hint should use explicit Optional instead of implicit None default.The static analysis correctly flags this as PEP 484 violation. The parameter allows None but the type hint doesn't reflect this.
♻️ Fix type hint
- def __init__(self, chunk_duration_seconds: float, logger: logging.Logger = None): + def __init__(self, chunk_duration_seconds: float, logger: logging.Logger | None = None):
67-82: Consider validating that the file extension is supported by pydub/ffmpeg.The code defaults to
.wavif no extension is present, but if an unsupported extension is provided,chunk.export()will fail at runtime with a potentially confusing error.This is a minor edge case since most users will provide standard audio formats. The current behavior delegates validation to pydub/ffmpeg which will provide appropriate errors.
113-120: Concatenation loop may be inefficient for many chunks.The
combined += chunkpattern creates a new AudioSegment on each iteration. For a large number of chunks (e.g., 48 chunks for an 8-hour file with 10-minute chunks), this could be slow.💡 Alternative using reduce or sum
# Alternative approach that may be more efficient: from functools import reduce combined = reduce(lambda acc, path: acc + AudioSegment.from_file(path), chunk_paths, AudioSegment.empty())However, for typical use cases (tens of chunks), the performance difference is negligible. This can be addressed later if profiling shows it's a bottleneck.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
README.mdaudio_separator/separator/audio_chunking.pyaudio_separator/separator/separator.pyaudio_separator/utils/cli.pytests/unit/test_audio_chunking.pytests/unit/test_separator_chunking.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/specify-rules.mdc)
**/*.py: Use Python 3.11+ syntax and features
Follow standard Python conventions for Python 3.11+
Use PyTorch for deep learning operations
Use librosa for audio loading and processing
Use soundfile for audio I/O operations
Use numpy for numerical array operations
Use onnxruntime for model inference when applicable
Use ruff for code linting and formatting checks
Files:
audio_separator/separator/audio_chunking.pytests/unit/test_audio_chunking.pyaudio_separator/separator/separator.pyaudio_separator/utils/cli.pytests/unit/test_separator_chunking.py
tests/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/specify-rules.mdc)
Run tests using pytest
Files:
tests/unit/test_audio_chunking.pytests/unit/test_separator_chunking.py
🧬 Code graph analysis (1)
audio_separator/separator/separator.py (1)
audio_separator/separator/audio_chunking.py (4)
AudioChunker(9-141)should_chunk(131-141)split_audio(36-85)merge_chunks(87-129)
🪛 Ruff (0.14.11)
audio_separator/separator/audio_chunking.py
25-25: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
52-52: Avoid specifying long messages outside the exception class
(TRY003)
104-104: Avoid specifying long messages outside the exception class
(TRY003)
109-109: Avoid specifying long messages outside the exception class
(TRY003)
tests/unit/test_audio_chunking.py
60-60: Unused method argument: mock_makedirs
(ARG002)
65-65: Unused lambda argument: key
(ARG005)
88-88: Unused method argument: mock_makedirs
(ARG002)
93-93: Unused lambda argument: key
(ARG005)
114-114: Probable insecure usage of temporary file or directory: "/tmp"
(S108)
137-137: Probable insecure usage of temporary file or directory: "/tmp/output.wav"
(S108)
139-139: Probable insecure usage of temporary file or directory: "/tmp/output.wav"
(S108)
145-145: Probable insecure usage of temporary file or directory: "/tmp/output.wav"
(S108)
153-153: Probable insecure usage of temporary file or directory: "/tmp/output.wav"
(S108)
audio_separator/separator/separator.py
190-190: Avoid specifying long messages outside the exception class
(TRY003)
tests/unit/test_separator_chunking.py
321-321: Unused function argument: chunk_path
(ARG001)
321-321: Unused function argument: custom_names
(ARG001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: run-integration-test
- GitHub Check: test-ubuntu (3.13)
- GitHub Check: test-ubuntu (3.10)
- GitHub Check: test-ubuntu (3.12)
- GitHub Check: test-macos (3.10)
- GitHub Check: test-windows (3.12)
- GitHub Check: test-macos (3.13)
- GitHub Check: test-ubuntu (3.11)
- GitHub Check: test-windows (3.13)
- GitHub Check: test-windows (3.10)
- GitHub Check: test-windows (3.11)
- GitHub Check: test-macos (3.12)
- GitHub Check: test-macos (3.11)
🔇 Additional comments (15)
audio_separator/separator/separator.py (3)
187-191: LGTM! Validation for chunk_duration parameter.The validation correctly rejects non-positive values. The parameter is optional (defaults to None) which allows backward compatibility.
996-998: GPU cache clearing should happen after state restoration.Currently,
clear_gpu_cache()is called after the finally block that restores state. This is the correct behavior to ensure the GPU cache is cleared even if state restoration encounters issues. Good defensive coding.
968-985: The path handling at line 984 is correct and necessary.The paths returned from
_separate_file()are relative filenames (e.g., "chunk_0000_(Vocals).wav"), not absolute paths. Althoughwrite_audio_pydub()andwrite_audio_soundfile()joinoutput_dirwith the path locally (lines 309 and 397 in common_separator.py), this modified path is not propagated back to the caller. Theget_stem_output_path()method returns only the filename, which is then appended tooutput_filesbefore being passed back. Therefore, the normalization check at line 984 correctly handles both cases: if a path is somehow absolute (edge case), it's used as-is; if relative, it's joined withtemp_dir.audio_separator/utils/cli.py (2)
62-73: LGTM! CLI option well-documented and properly integrated.The help text is comprehensive, includes practical recommendations (600 seconds for files >1 hour), and notes the concatenation behavior. The
type=floatallows flexibility for fractional durations.
205-205: LGTM! Parameter correctly passed to Separator.The
chunk_durationis properly wired from CLI args to the Separator constructor.audio_separator/separator/audio_chunking.py (3)
51-58: LGTM! Input validation and directory creation.The method correctly validates input existence and creates the output directory if needed. Loading via
AudioSegment.from_filesupports multiple formats.
63-65: Ceiling division correctly calculates chunk count.The formula
(total + chunk_size - 1) // chunk_sizeis the correct ceiling division pattern to ensure the last partial chunk is included.
131-141: LGTM! Simple and correct threshold check.The
should_chunkmethod correctly compares audio duration against the configured chunk duration threshold.README.md (1)
281-320: LGTM! Comprehensive and user-friendly documentation.The new section thoroughly covers:
- Practical usage examples with realistic scenarios
- Clear explanation of the three-step workflow
- Multi-stem model support (2/4/6 stems)
- Benefits and recommendations based on system resources
- Honest disclosure about potential boundary artifacts
This documentation aligns well with the implementation and will help users understand when and how to use chunking.
tests/unit/test_separator_chunking.py (6)
17-83: LGTM! Good test coverage for 2-stem model chunking.The test properly mocks the AudioChunker and verifies:
- Correct number of merge_chunks calls (2 for 2 stems)
- Output contains expected stem names
- State management through mock separator
195-236: LGTM! Thorough regex pattern testing.Tests cover:
- Standard stem patterns (Vocals, Instrumental, Drums, etc.)
- Different file extensions
- Multi-word stem names ("Backing Vocals")
- Fallback scenarios when pattern doesn't match
319-328: Unused arguments are intentional for mock callback signature.The static analysis flags
chunk_pathandcustom_namesas unused, but they're required to match the_separate_filemethod signature for the mock. This is a false positive.
430-461: Good test for error handling and state restoration.This test verifies that the state is correctly restored even when an exception occurs during chunk processing. This is critical for ensuring the Separator remains usable after a failed operation.
589-631: Test for inconsistent stem counts is valuable but may not reflect real behavior.The test verifies graceful handling when chunks produce different stem counts. In practice, the same model should always produce the same number of stems. However, this defensive test is good for robustness.
633-668: LGTM! Tests fallback naming behavior.Good coverage of the edge case where output files don't match the expected
_(StemName)pattern. The test verifies the fallback tostem_0naming works correctly.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
The common_expected_args fixture was missing the new chunk_duration parameter added to Separator.__init__(), causing all CLI tests to fail in CI. Also corrected parameter order to match actual constructor (use_soundfile before use_autocast). Fixes: - Added chunk_duration: None to expected args - Reordered use_soundfile/use_autocast to match Separator.__init__() All tests/unit/test_cli.py tests now pass (13 passed, 2 skipped).
Addressed CodeRabbit feedback: 1. Renamed unused mock parameter mock_makedirs to _mock_makedirs (ARG002) 2. Renamed unused lambda parameter key to _ (ARG005) 3. Replaced hardcoded /tmp paths with pytest tmp_path fixture (S108) All 17 tests still pass after these changes.
|
Thank you for contributing this @shimpeiws :) |
Fixes #158
Summary
Adds automatic file-level chunking to process large audio files without OOM errors. Follows the approach outlined in the maintainer's reference implementation (#44).
Changes
Core Implementation
audio_chunking.pywithAudioChunkerclass for splitting and mergingSeparatorclass now supportschunk_durationparameter--chunk_durationfor configurable chunk size (default: no chunking)Multi-Stem Support
Documentation
Testing
Unit Tests (32 tests, all passing)
Manual Testing
Tested with 30-second audio file:
All tests completed successfully without errors.
Design Decisions
Simple Concatenation (No Crossfade)
Dynamic Stem Handling
Example Usage
Performance Impact
Related
--use_soundfileoption for output OOM mitigationChecklist
Summary by CodeRabbit
New Features
Documentation
CLI
Tests
✏️ Tip: You can customize this high-level summary in your review settings.