Add file-level chunking for large audio files (fixes #158) #256

shimpeiws · 2026-01-15T22:36:54Z

Fixes #158

Summary

Adds automatic file-level chunking to process large audio files without OOM errors. Follows the approach outlined in the maintainer's reference implementation (#44).

Changes

Core Implementation

New module audio_chunking.py with AudioChunker class for splitting and merging
Separator class now supports chunk_duration parameter
CLI option: --chunk_duration for configurable chunk size (default: no chunking)
Chunks processed sequentially with GPU cache clearing between chunks
Results merged with simple concatenation (no overlap/crossfade)

Multi-Stem Support

Critical bug fix: Previously only first 2 stems were preserved with 4-stem or 6-stem models
Now dynamically handles 2, 4, and 6-stem models:
- 2-stem (MDX): Vocals + Instrumental
- 4-stem (Demucs): Drums, Bass, Other, Vocals
- 6-stem (Demucs 6s): Bass, Drums, Other, Vocals, Guitar, Piano

Documentation

Updated README with "Processing Large Files" section
Usage examples and recommendations included

Testing

Unit Tests (32 tests, all passing)

AudioChunker (17 tests): Split, merge, edge cases
Separator chunking (15 tests):
- 2/4/6-stem model compatibility
- State restoration after chunking
- GPU cache clearing verification
- Temporary directory cleanup
- Error handling with state restoration
- Edge cases (empty output, inconsistent stems, etc.)

Manual Testing

Tested with 30-second audio file:

✅ Test 1: MDX 2-stem without chunking → 2 output files
✅ Test 2: MDX 2-stem with chunking (10s chunks) → 2 output files
✅ Test 3: Demucs 4-stem with chunking → 4 output files (previously only 2)

All tests completed successfully without errors.

Design Decisions

Simple Concatenation (No Crossfade)

Matches maintainer's reference implementation in Separator terminated with "Killed" message (Out of memory / OOM error on Linux) #44
Keeps PR scope minimal for easier review ("good first issue")
Crossfade/overlap can be added in Phase 2 if users report artifacts
Most audio separation models produce continuous spectral output

Dynamic Stem Handling

Uses regex pattern matching to extract stem names from chunk filenames
Automatically adapts to any number of stems (2, 4, 6, or more)
Sorted output order for consistency

Example Usage

# Process 8-hour file in 10-minute chunks
audio-separator long_podcast.wav --chunk_duration 600

# Works with all model types
audio-separator large_file.wav --model_filename htdemucs.yaml --chunk_duration 600

Performance Impact

Memory usage reduced from 100GB+ to <10GB for 8-hour files
Processing time: Slightly longer due to chunking overhead, but prevents OOM
Output quality: Identical to non-chunked processing (verified by file comparison)

Checklist

Code follows existing style and patterns
All unit tests passing (32 tests)
Manual testing completed successfully
Documentation updated (README)
Multi-stem models tested and working
No breaking changes to existing functionality

Summary by CodeRabbit

New Features
- Optional chunking to process very long audio files by splitting, per-chunk processing, and final merge to avoid OOM while preserving quality.
Documentation
- Added "Processing Large Files" under JSON Output with examples, three-step workflow, model guidance, recommendations, and note on minor boundary artifacts.
CLI
- New --chunk_duration flag to enable/configure chunking.
Tests
- Extensive unit and integration tests covering chunking, merging, edge cases, and error handling.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

- Add AudioChunker class for splitting/merging audio files - Add chunk_duration parameter to Separator class - Implement _process_with_chunking() for chunk-based processing - Add --chunk_duration CLI option - Chunks are processed sequentially with simple concatenation - GPU cache cleared between chunks to manage memory This implementation follows the reference approach from issue nomadkaraoke#44 without overlap/crossfade (can be added in future PR if needed).

- 17 test cases covering initialization, splitting, merging - Tests for edge cases (short files, exact multiples, boundaries) - Integration tests with actual audio segments - Mock tests for error handling - All tests passing

Updated README.md with comprehensive section on using --chunk_duration option for processing large audio files. Documents the split-process-merge workflow, benefits, recommendations, and limitation (simple concatenation). Relates to nomadkaraoke#158

…operly Fixed two issues in chunk processing: 1. Convert relative output file paths to absolute paths when collecting chunk results 2. Temporarily change both Separator and model_instance output_dir to temp directory during chunk processing This ensures chunk files are saved to the temp directory and automatically cleaned up, keeping the output directory clean with only final merged files. Relates to nomadkaraoke#158

- Simplified audio_chunking.py module docstring to match existing codebase style - Removed self-explanatory inline comments in _process_with_chunking - Kept important comments for non-obvious operations (GPU cache clearing) - All unit tests passing (17/17) Relates to nomadkaraoke#158

- Changed 'Loading audio file' to debug (internal AudioChunker operation) - Changed 'Created temporary directory' to debug (low-priority detail) - Removed 'File-level chunking enabled' log from __init__ (unnecessary, already logged when actually used) - INFO level now shows only user-relevant information: - Splitting/merging progress - Chunk processing status - Completion messages Relates to nomadkaraoke#158

The completion message after splitting is redundant since we already log the splitting operation. Follows existing codebase pattern where 'completed' messages are typically debug level. Relates to nomadkaraoke#158

- Replace hardcoded primary/secondary lists with dynamic stem dictionary - Extract stem names from chunk filenames using regex pattern - Support 2-stem, 4-stem, and 6-stem models (MDX, Demucs, Demucs 6s) - Add re module import for stem name extraction - Update README to document multi-stem support Previously, only the first 2 stems were preserved when using chunking with 4-stem or 6-stem models. This fix ensures all stems are correctly processed and merged. The chunking feature now supports: - 2-stem models (e.g., MDX): Vocals + Instrumental - 4-stem models (e.g., Demucs): Drums, Bass, Other, Vocals - 6-stem models (e.g., Demucs 6s): Bass, Drums, Other, Vocals, Guitar, Piano

Add 15 new unit tests for Separator chunking logic, covering: **Basic Functionality (6 tests):** - 2-stem model compatibility (Vocals, Instrumental) - 4-stem Demucs model (Drums, Bass, Other, Vocals) - 6-stem Demucs model (all 6 stems) - Stem name extraction from filenames with regex - Fallback handling for non-matching patterns - Sorted stem order in merged output **Internal Logic & State Management (6 tests):** - State restoration after chunking (chunk_duration, output_dir) - GPU cache clearing between chunks - Temporary directory cleanup verification - State restoration on error (exception handling) - AudioChunker initialization with correct parameters - custom_output_names parameter handling **Edge Cases (3 tests):** - Empty output handling (no stems produced) - Inconsistent stem counts across chunks - Filename pattern match failure with fallback naming Total test coverage: 32 tests (17 AudioChunker + 15 Separator chunking) All tests passing.

coderabbitai · 2026-01-15T22:37:08Z

Walkthrough

Adds configurable audio chunking: new AudioChunker utility, integration into Separator with optional chunk_duration (CLI flag), per-chunk processing and merging flow, extensive unit/integration tests for chunking, and README documentation describing the chunk/split/merge workflow.

Changes

Cohort / File(s)	Summary
Core Chunking Module `audio_separator/separator/audio_chunking.py`	New `AudioChunker` class with `split_audio()`, `merge_chunks()`, and `should_chunk()` using pydub; input validation, chunk naming/export, concatenation, and logging.
Separator Integration `audio_separator/separator/separator.py`	Added optional `chunk_duration` constructor param and implemented `_process_with_chunking()` flow: temp workspace, split -> per-chunk separation -> group per-stem outputs -> merge per-stem -> cleanup; adds `re` import and clears GPU cache between chunks.
CLI Configuration `audio_separator/utils/cli.py`	New `--chunk_duration` float CLI flag with help text; passes `chunk_duration=args.chunk_duration` into `Separator`.
Documentation `README.md`	New "Processing Large Files" section under JSON Output describing chunking usage, workflow (Split / Process / Merge), supported models, memory recommendations, and notes on concatenation artifacts.
Tests — AudioChunker `tests/unit/test_audio_chunking.py`	New comprehensive tests: init, `should_chunk()` boundaries, mocked split/merge behaviors, error cases, and integration round-trips using `AudioSegment`.
Tests — Separator Chunking `tests/unit/test_separator_chunking.py`, `tests/unit/test_cli.py`	Extensive tests for `_process_with_chunking()` across multiple-stem scenarios, merge invocation per stem, stem-name extraction, state restoration on exceptions, temp dir cleanup, and CLI fixture update to include `chunk_duration`.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant CLI as CLI
    participant Sep as Separator
    participant Chunk as AudioChunker
    participant Model as Separator._separate_file
    participant FS as FileSystem

    User->>CLI: run with --chunk_duration=300
    CLI->>Sep: Separator(chunk_duration=300)
    Sep->>Chunk: should_chunk(input_duration)
    Chunk-->>Sep: True

    Sep->>Chunk: split_audio(input_path, temp_dir)
    Chunk->>FS: load audio & write chunk files
    Chunk-->>Sep: [chunk_0000, chunk_0001...]

    loop per chunk
        Sep->>Model: _separate_file(chunk_N)
        Model-->>Sep: per-chunk stem outputs
        Sep->>FS: store per-chunk outputs grouped by stem
        Sep->>Sep: torch.cuda.empty_cache() / mem cleanup
    end

    loop per stem
        Sep->>Chunk: merge_chunks(stem_chunk_paths, final_stem_output)
        Chunk->>FS: read & concatenate chunk stems
        Chunk->>FS: export merged stem file
        Chunk-->>Sep: merged_stem_path
    end

    Sep->>FS: cleanup temp_dir
    Sep-->>User: final merged output files

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 A long file knocked upon my door,
"Slice me, process me, then restore."
Chunk by chunk I hum and hop,
Stems rejoined — one final chop.
Memory light, the job's a hop!

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly describes the main change: adding file-level chunking for large audio files, and directly references the linked issue `#158`.
Linked Issues check	✅ Passed	The PR fully addresses `#158` objectives: automatic file splitting, sequential processing, merging results, GPU cache clearing, configurable CLI option, and reduced memory usage for large files.
Out of Scope Changes check	✅ Passed	All changes directly support chunking functionality and its integration: AudioChunker module, Separator integration, CLI support, documentation, and comprehensive tests. No unrelated changes detected.
Docstring Coverage	✅ Passed	Docstring coverage is 94.12% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 52f8050 and 7ef3d77.

📒 Files selected for processing (1)

tests/unit/test_audio_chunking.py

🧰 Additional context used

📓 Path-based instructions (2)

**/*.py

📄 CodeRabbit inference engine (.cursor/rules/specify-rules.mdc)

**/*.py: Use Python 3.11+ syntax and features
Follow standard Python conventions for Python 3.11+
Use PyTorch for deep learning operations
Use librosa for audio loading and processing
Use soundfile for audio I/O operations
Use numpy for numerical array operations
Use onnxruntime for model inference when applicable
Use ruff for code linting and formatting checks

Files:

tests/unit/test_audio_chunking.py

tests/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/specify-rules.mdc)

Run tests using pytest

Files:

tests/unit/test_audio_chunking.py

🧬 Code graph analysis (1)

tests/unit/test_audio_chunking.py (1)

audio_separator/separator/audio_chunking.py (4)

AudioChunker (9-141)

should_chunk (131-141)

split_audio (36-85)

merge_chunks (87-129)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)

GitHub Check: run-integration-test
GitHub Check: test-windows (3.10)
GitHub Check: test-macos (3.10)
GitHub Check: test-macos (3.11)
GitHub Check: test-ubuntu (3.13)
GitHub Check: test-ubuntu (3.10)
GitHub Check: test-windows (3.13)
GitHub Check: test-ubuntu (3.11)
GitHub Check: test-windows (3.12)
GitHub Check: test-ubuntu (3.12)
GitHub Check: test-windows (3.11)

🔇 Additional comments (8)

tests/unit/test_audio_chunking.py (8)

19-35: Solid fixture setup and init coverage.

37-55: Boundary tests for should_chunk look good.

57-110: Split-audio mocking covers expected chunk counts well.

111-153: Merge-path tests cover normal and error flows cleanly.

155-165: Chunk duration conversion tests are correct.

167-206: Round‑trip split/merge integration test is strong.

207-228: Format-variation test looks solid.

233-275: Edge‑case chunk counts are well covered.

_{✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🤖 Fix all issues with AI agents

In `@tests/unit/test_audio_chunking.py`:
- Around line 57-60: The test functions (e.g., test_split_audio_basic) declare a
patched parameter named mock_makedirs that is unused and triggers ruff ARG002;
rename that parameter to _mock_makedirs (or _) in the function signature for all
affected tests (including the one at the other occurrence around line 88) so the
`@patch` order remains unchanged but the unused arg is explicitly ignored.
- Line 65: Replace the unused lambda parameter name with an underscore to
satisfy ARG005: update the Mock side_effect assignments (the lines setting
mock_audio.__getitem__ = Mock(side_effect=lambda key: mock_audio) and the
similar occurrence around the other test) to use lambda _: mock_audio (or lambda
*_: mock_audio) so the unused arg is named `_`; keep the rest of the mocking
logic unchanged.
- Around line 111-115: Replace hardcoded "/tmp" usages in the split-audio tests
with pytest's tmp_path fixture: update test_split_audio_file_not_found to accept
tmp_path and call self.chunker.split_audio("nonexistent.wav", tmp_path) (or
tmp_path.as_posix() if a string path is required), and make the same replacement
for the other split_audio tests that currently pass "/tmp" so they use tmp_path
for secure, isolated temp directories; ensure test signatures include the
tmp_path parameter and any assertions use the fixture path.

🧹 Nitpick comments (6)

audio_separator/separator/separator.py (3)
876-886: Consider caching the duration check to avoid redundant I/O.

The librosa.get_duration() call loads the audio file to determine its duration, and then AudioChunker.split_audio() will load the file again via pydub. For very large files, this double-load could be noticeable.

💡 Optional optimization

Consider passing the pre-computed duration to the chunker or having split_audio return the duration as a side effect to avoid loading the file twice. However, for the scope of this PR (preventing OOM), the current approach is acceptable since the duration check is lightweight compared to the actual separation.

952-965: State modification pattern is correct but could benefit from a context manager.

The pattern of saving original values, modifying state, and restoring in finally is correct. However, the restoration happens inside a nested try/finally which makes the flow complex.

Note: The current implementation works correctly. A future refactor could extract this into a context manager for cleaner state management.

1026-1030: Cleanup uses ignore_errors=True which silently swallows errors.

Using ignore_errors=True in shutil.rmtree means any cleanup failures will be silently ignored. This is acceptable for temporary files, but consider logging a warning if cleanup fails.
💡 Optional: Add cleanup failure logging
         finally:
             # Clean up temporary directory
             if os.path.exists(temp_dir):
                 self.logger.debug(f"Cleaning up temporary directory: {temp_dir}")
-                shutil.rmtree(temp_dir, ignore_errors=True)
+                try:
+                    shutil.rmtree(temp_dir)
+                except OSError as e:
+                    self.logger.warning(f"Failed to clean up temporary directory {temp_dir}: {e}")
audio_separator/separator/audio_chunking.py (3)
25-34: Type hint should use explicit Optional instead of implicit None default.

The static analysis correctly flags this as PEP 484 violation. The parameter allows None but the type hint doesn't reflect this.
♻️ Fix type hint
-    def __init__(self, chunk_duration_seconds: float, logger: logging.Logger = None):
+    def __init__(self, chunk_duration_seconds: float, logger: logging.Logger | None = None):
67-82: Consider validating that the file extension is supported by pydub/ffmpeg.

The code defaults to .wav if no extension is present, but if an unsupported extension is provided, chunk.export() will fail at runtime with a potentially confusing error.

This is a minor edge case since most users will provide standard audio formats. The current behavior delegates validation to pydub/ffmpeg which will provide appropriate errors.

113-120: Concatenation loop may be inefficient for many chunks.

The combined += chunk pattern creates a new AudioSegment on each iteration. For a large number of chunks (e.g., 48 chunks for an 8-hour file with 10-minute chunks), this could be slow.
💡 Alternative using reduce or sum
# Alternative approach that may be more efficient:
from functools import reduce
combined = reduce(lambda acc, path: acc + AudioSegment.from_file(path), chunk_paths, AudioSegment.empty())
However, for typical use cases (tens of chunks), the performance difference is negligible. This can be addressed later if profiling shows it's a bottleneck.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bfa7380 and b285616.

📒 Files selected for processing (6)

README.md
audio_separator/separator/audio_chunking.py
audio_separator/separator/separator.py
audio_separator/utils/cli.py
tests/unit/test_audio_chunking.py
tests/unit/test_separator_chunking.py

🧰 Additional context used

📓 Path-based instructions (2)

**/*.py

📄 CodeRabbit inference engine (.cursor/rules/specify-rules.mdc)

**/*.py: Use Python 3.11+ syntax and features
Follow standard Python conventions for Python 3.11+
Use PyTorch for deep learning operations
Use librosa for audio loading and processing
Use soundfile for audio I/O operations
Use numpy for numerical array operations
Use onnxruntime for model inference when applicable
Use ruff for code linting and formatting checks

Files:

audio_separator/separator/audio_chunking.py
tests/unit/test_audio_chunking.py
audio_separator/separator/separator.py
audio_separator/utils/cli.py
tests/unit/test_separator_chunking.py

tests/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/specify-rules.mdc)

Run tests using pytest

Files:

tests/unit/test_audio_chunking.py
tests/unit/test_separator_chunking.py

🧬 Code graph analysis (1)

audio_separator/separator/separator.py (1)

audio_separator/separator/audio_chunking.py (4)

AudioChunker (9-141)

should_chunk (131-141)

split_audio (36-85)

merge_chunks (87-129)

🪛 Ruff (0.14.11)

audio_separator/separator/audio_chunking.py

25-25: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

52-52: Avoid specifying long messages outside the exception class

(TRY003)

104-104: Avoid specifying long messages outside the exception class

(TRY003)

109-109: Avoid specifying long messages outside the exception class

(TRY003)

tests/unit/test_audio_chunking.py

60-60: Unused method argument: mock_makedirs

(ARG002)

65-65: Unused lambda argument: key

(ARG005)

88-88: Unused method argument: mock_makedirs

(ARG002)

93-93: Unused lambda argument: key

(ARG005)

114-114: Probable insecure usage of temporary file or directory: "/tmp"

(S108)

137-137: Probable insecure usage of temporary file or directory: "/tmp/output.wav"

(S108)

139-139: Probable insecure usage of temporary file or directory: "/tmp/output.wav"

(S108)

145-145: Probable insecure usage of temporary file or directory: "/tmp/output.wav"

(S108)

153-153: Probable insecure usage of temporary file or directory: "/tmp/output.wav"

(S108)

audio_separator/separator/separator.py

190-190: Avoid specifying long messages outside the exception class

(TRY003)

tests/unit/test_separator_chunking.py

321-321: Unused function argument: chunk_path

(ARG001)

321-321: Unused function argument: custom_names

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)

GitHub Check: run-integration-test
GitHub Check: test-ubuntu (3.13)
GitHub Check: test-ubuntu (3.10)
GitHub Check: test-ubuntu (3.12)
GitHub Check: test-macos (3.10)
GitHub Check: test-windows (3.12)
GitHub Check: test-macos (3.13)
GitHub Check: test-ubuntu (3.11)
GitHub Check: test-windows (3.13)
GitHub Check: test-windows (3.10)
GitHub Check: test-windows (3.11)
GitHub Check: test-macos (3.12)
GitHub Check: test-macos (3.11)

🔇 Additional comments (15)

audio_separator/separator/separator.py (3)

187-191: LGTM! Validation for chunk_duration parameter.

The validation correctly rejects non-positive values. The parameter is optional (defaults to None) which allows backward compatibility.

996-998: GPU cache clearing should happen after state restoration.

Currently, clear_gpu_cache() is called after the finally block that restores state. This is the correct behavior to ensure the GPU cache is cleared even if state restoration encounters issues. Good defensive coding.

968-985: The path handling at line 984 is correct and necessary.

The paths returned from _separate_file() are relative filenames (e.g., "chunk_0000_(Vocals).wav"), not absolute paths. Although write_audio_pydub() and write_audio_soundfile() join output_dir with the path locally (lines 309 and 397 in common_separator.py), this modified path is not propagated back to the caller. The get_stem_output_path() method returns only the filename, which is then appended to output_files before being passed back. Therefore, the normalization check at line 984 correctly handles both cases: if a path is somehow absolute (edge case), it's used as-is; if relative, it's joined with temp_dir.

audio_separator/utils/cli.py (2)

62-73: LGTM! CLI option well-documented and properly integrated.

The help text is comprehensive, includes practical recommendations (600 seconds for files >1 hour), and notes the concatenation behavior. The type=float allows flexibility for fractional durations.

205-205: LGTM! Parameter correctly passed to Separator.

The chunk_duration is properly wired from CLI args to the Separator constructor.

audio_separator/separator/audio_chunking.py (3)

51-58: LGTM! Input validation and directory creation.

The method correctly validates input existence and creates the output directory if needed. Loading via AudioSegment.from_file supports multiple formats.

63-65: Ceiling division correctly calculates chunk count.

The formula (total + chunk_size - 1) // chunk_size is the correct ceiling division pattern to ensure the last partial chunk is included.

131-141: LGTM! Simple and correct threshold check.

The should_chunk method correctly compares audio duration against the configured chunk duration threshold.

README.md (1)

281-320: LGTM! Comprehensive and user-friendly documentation.

The new section thoroughly covers:

Practical usage examples with realistic scenarios

Clear explanation of the three-step workflow

Multi-stem model support (2/4/6 stems)

Benefits and recommendations based on system resources

Honest disclosure about potential boundary artifacts

This documentation aligns well with the implementation and will help users understand when and how to use chunking.

tests/unit/test_separator_chunking.py (6)

17-83: LGTM! Good test coverage for 2-stem model chunking.

The test properly mocks the AudioChunker and verifies:

Correct number of merge_chunks calls (2 for 2 stems)

Output contains expected stem names

State management through mock separator

195-236: LGTM! Thorough regex pattern testing.

Tests cover:

Standard stem patterns (Vocals, Instrumental, Drums, etc.)

Different file extensions

Multi-word stem names ("Backing Vocals")

Fallback scenarios when pattern doesn't match

319-328: Unused arguments are intentional for mock callback signature.

The static analysis flags chunk_path and custom_names as unused, but they're required to match the _separate_file method signature for the mock. This is a false positive.

430-461: Good test for error handling and state restoration.

This test verifies that the state is correctly restored even when an exception occurs during chunk processing. This is critical for ensuring the Separator remains usable after a failed operation.

589-631: Test for inconsistent stem counts is valuable but may not reflect real behavior.

The test verifies graceful handling when chunks produce different stem counts. In practice, the same model should always produce the same number of stems. However, this defensive test is good for robustness.

633-668: LGTM! Tests fallback naming behavior.

Good coverage of the edge case where output files don't match the expected _(StemName) pattern. The test verifies the fallback to stem_0 naming works correctly.

_{✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.}

tests/unit/test_audio_chunking.py

The common_expected_args fixture was missing the new chunk_duration parameter added to Separator.__init__(), causing all CLI tests to fail in CI. Also corrected parameter order to match actual constructor (use_soundfile before use_autocast). Fixes: - Added chunk_duration: None to expected args - Reordered use_soundfile/use_autocast to match Separator.__init__() All tests/unit/test_cli.py tests now pass (13 passed, 2 skipped).

Addressed CodeRabbit feedback: 1. Renamed unused mock parameter mock_makedirs to _mock_makedirs (ARG002) 2. Renamed unused lambda parameter key to _ (ARG005) 3. Replaced hardcoded /tmp paths with pytest tmp_path fixture (S108) All 17 tests still pass after these changes.

beveradb · 2026-01-16T02:53:52Z

Thank you for contributing this @shimpeiws :)
Should be released in version 0.41.0 shortly

shimpeiws added 9 commits January 15, 2026 11:38

Add comprehensive unit tests for AudioChunker

75e5cff

- 17 test cases covering initialization, splitting, merging - Tests for edge cases (short files, exact multiples, boundaries) - Integration tests with actual audio segments - Mock tests for error handling - All tests passing

refactor: Remove redundant 'Successfully split' log message

488051a

The completion message after splitting is redundant since we already log the splitting operation. Follows existing codebase pattern where 'completed' messages are typically debug level. Relates to nomadkaraoke#158

coderabbitai bot reviewed Jan 15, 2026

View reviewed changes

tests/unit/test_audio_chunking.py Outdated Show resolved Hide resolved

tests/unit/test_audio_chunking.py Outdated Show resolved Hide resolved

tests/unit/test_audio_chunking.py Outdated Show resolved Hide resolved

shimpeiws added 2 commits January 16, 2026 07:49

beveradb merged commit 10fa098 into nomadkaraoke:main Jan 16, 2026
18 checks passed

shimpeiws deleted the feature/file-level-chunking-158 branch January 16, 2026 09:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Add file-level chunking for large audio files (fixes #158) #256

Add file-level chunking for large audio files (fixes #158) #256

Uh oh!

shimpeiws commented Jan 15, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Jan 15, 2026 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

beveradb commented Jan 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Add file-level chunking for large audio files (fixes #158) #256

Add file-level chunking for large audio files (fixes #158) #256

Uh oh!

Conversation

shimpeiws commented Jan 15, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Core Implementation

Multi-Stem Support

Documentation

Testing

Unit Tests (32 tests, all passing)

Manual Testing

Design Decisions

Simple Concatenation (No Crossfade)

Dynamic Stem Handling

Example Usage

Performance Impact

Related

Checklist

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Jan 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

beveradb commented Jan 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

shimpeiws commented Jan 15, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 15, 2026 •

edited

Loading