Skip to content

Conversation

@shimpeiws
Copy link
Contributor

@shimpeiws shimpeiws commented Jan 15, 2026

Fixes #158

Summary

Adds automatic file-level chunking to process large audio files without OOM errors. Follows the approach outlined in the maintainer's reference implementation (#44).

Changes

Core Implementation

  • New module audio_chunking.py with AudioChunker class for splitting and merging
  • Separator class now supports chunk_duration parameter
  • CLI option: --chunk_duration for configurable chunk size (default: no chunking)
  • Chunks processed sequentially with GPU cache clearing between chunks
  • Results merged with simple concatenation (no overlap/crossfade)

Multi-Stem Support

  • Critical bug fix: Previously only first 2 stems were preserved with 4-stem or 6-stem models
  • Now dynamically handles 2, 4, and 6-stem models:
    • 2-stem (MDX): Vocals + Instrumental
    • 4-stem (Demucs): Drums, Bass, Other, Vocals
    • 6-stem (Demucs 6s): Bass, Drums, Other, Vocals, Guitar, Piano

Documentation

  • Updated README with "Processing Large Files" section
  • Usage examples and recommendations included

Testing

Unit Tests (32 tests, all passing)

  • AudioChunker (17 tests): Split, merge, edge cases
  • Separator chunking (15 tests):
    • 2/4/6-stem model compatibility
    • State restoration after chunking
    • GPU cache clearing verification
    • Temporary directory cleanup
    • Error handling with state restoration
    • Edge cases (empty output, inconsistent stems, etc.)

Manual Testing

Tested with 30-second audio file:

  • Test 1: MDX 2-stem without chunking → 2 output files
  • Test 2: MDX 2-stem with chunking (10s chunks) → 2 output files
  • Test 3: Demucs 4-stem with chunking → 4 output files (previously only 2)

All tests completed successfully without errors.

Design Decisions

Simple Concatenation (No Crossfade)

Dynamic Stem Handling

  • Uses regex pattern matching to extract stem names from chunk filenames
  • Automatically adapts to any number of stems (2, 4, 6, or more)
  • Sorted output order for consistency

Example Usage

# Process 8-hour file in 10-minute chunks
audio-separator long_podcast.wav --chunk_duration 600

# Works with all model types
audio-separator large_file.wav --model_filename htdemucs.yaml --chunk_duration 600

Performance Impact

  • Memory usage reduced from 100GB+ to <10GB for 8-hour files
  • Processing time: Slightly longer due to chunking overhead, but prevents OOM
  • Output quality: Identical to non-chunked processing (verified by file comparison)

Related

Checklist

  • Code follows existing style and patterns
  • All unit tests passing (32 tests)
  • Manual testing completed successfully
  • Documentation updated (README)
  • Multi-stem models tested and working
  • No breaking changes to existing functionality

Summary by CodeRabbit

  • New Features

    • Optional chunking to process very long audio files by splitting, per-chunk processing, and final merge to avoid OOM while preserving quality.
  • Documentation

    • Added "Processing Large Files" under JSON Output with examples, three-step workflow, model guidance, recommendations, and note on minor boundary artifacts.
  • CLI

    • New --chunk_duration flag to enable/configure chunking.
  • Tests

    • Extensive unit and integration tests covering chunking, merging, edge cases, and error handling.

✏️ Tip: You can customize this high-level summary in your review settings.

- Add AudioChunker class for splitting/merging audio files
- Add chunk_duration parameter to Separator class
- Implement _process_with_chunking() for chunk-based processing
- Add --chunk_duration CLI option
- Chunks are processed sequentially with simple concatenation
- GPU cache cleared between chunks to manage memory

This implementation follows the reference approach from issue nomadkaraoke#44
without overlap/crossfade (can be added in future PR if needed).
- 17 test cases covering initialization, splitting, merging
- Tests for edge cases (short files, exact multiples, boundaries)
- Integration tests with actual audio segments
- Mock tests for error handling
- All tests passing
Updated README.md with comprehensive section on using --chunk_duration
option for processing large audio files. Documents the split-process-merge
workflow, benefits, recommendations, and limitation (simple concatenation).

Relates to nomadkaraoke#158
…operly

Fixed two issues in chunk processing:
1. Convert relative output file paths to absolute paths when collecting chunk results
2. Temporarily change both Separator and model_instance output_dir to temp directory during chunk processing

This ensures chunk files are saved to the temp directory and automatically
cleaned up, keeping the output directory clean with only final merged files.

Relates to nomadkaraoke#158
- Simplified audio_chunking.py module docstring to match existing codebase style
- Removed self-explanatory inline comments in _process_with_chunking
- Kept important comments for non-obvious operations (GPU cache clearing)
- All unit tests passing (17/17)

Relates to nomadkaraoke#158
- Changed 'Loading audio file' to debug (internal AudioChunker operation)
- Changed 'Created temporary directory' to debug (low-priority detail)
- Removed 'File-level chunking enabled' log from __init__ (unnecessary, already logged when actually used)
- INFO level now shows only user-relevant information:
  - Splitting/merging progress
  - Chunk processing status
  - Completion messages

Relates to nomadkaraoke#158
The completion message after splitting is redundant since we already
log the splitting operation. Follows existing codebase pattern where
'completed' messages are typically debug level.

Relates to nomadkaraoke#158
- Replace hardcoded primary/secondary lists with dynamic stem dictionary
- Extract stem names from chunk filenames using regex pattern
- Support 2-stem, 4-stem, and 6-stem models (MDX, Demucs, Demucs 6s)
- Add re module import for stem name extraction
- Update README to document multi-stem support

Previously, only the first 2 stems were preserved when using chunking
with 4-stem or 6-stem models. This fix ensures all stems are correctly
processed and merged.

The chunking feature now supports:
- 2-stem models (e.g., MDX): Vocals + Instrumental
- 4-stem models (e.g., Demucs): Drums, Bass, Other, Vocals
- 6-stem models (e.g., Demucs 6s): Bass, Drums, Other, Vocals, Guitar, Piano
Add 15 new unit tests for Separator chunking logic, covering:

**Basic Functionality (6 tests):**
- 2-stem model compatibility (Vocals, Instrumental)
- 4-stem Demucs model (Drums, Bass, Other, Vocals)
- 6-stem Demucs model (all 6 stems)
- Stem name extraction from filenames with regex
- Fallback handling for non-matching patterns
- Sorted stem order in merged output

**Internal Logic & State Management (6 tests):**
- State restoration after chunking (chunk_duration, output_dir)
- GPU cache clearing between chunks
- Temporary directory cleanup verification
- State restoration on error (exception handling)
- AudioChunker initialization with correct parameters
- custom_output_names parameter handling

**Edge Cases (3 tests):**
- Empty output handling (no stems produced)
- Inconsistent stem counts across chunks
- Filename pattern match failure with fallback naming

Total test coverage: 32 tests (17 AudioChunker + 15 Separator chunking)
All tests passing.
@coderabbitai
Copy link

coderabbitai bot commented Jan 15, 2026

Walkthrough

Adds configurable audio chunking: new AudioChunker utility, integration into Separator with optional chunk_duration (CLI flag), per-chunk processing and merging flow, extensive unit/integration tests for chunking, and README documentation describing the chunk/split/merge workflow.

Changes

Cohort / File(s) Summary
Core Chunking Module
audio_separator/separator/audio_chunking.py
New AudioChunker class with split_audio(), merge_chunks(), and should_chunk() using pydub; input validation, chunk naming/export, concatenation, and logging.
Separator Integration
audio_separator/separator/separator.py
Added optional chunk_duration constructor param and implemented _process_with_chunking() flow: temp workspace, split -> per-chunk separation -> group per-stem outputs -> merge per-stem -> cleanup; adds re import and clears GPU cache between chunks.
CLI Configuration
audio_separator/utils/cli.py
New --chunk_duration float CLI flag with help text; passes chunk_duration=args.chunk_duration into Separator.
Documentation
README.md
New "Processing Large Files" section under JSON Output describing chunking usage, workflow (Split / Process / Merge), supported models, memory recommendations, and notes on concatenation artifacts.
Tests — AudioChunker
tests/unit/test_audio_chunking.py
New comprehensive tests: init, should_chunk() boundaries, mocked split/merge behaviors, error cases, and integration round-trips using AudioSegment.
Tests — Separator Chunking
tests/unit/test_separator_chunking.py, tests/unit/test_cli.py
Extensive tests for _process_with_chunking() across multiple-stem scenarios, merge invocation per stem, stem-name extraction, state restoration on exceptions, temp dir cleanup, and CLI fixture update to include chunk_duration.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant CLI as CLI
    participant Sep as Separator
    participant Chunk as AudioChunker
    participant Model as Separator._separate_file
    participant FS as FileSystem

    User->>CLI: run with --chunk_duration=300
    CLI->>Sep: Separator(chunk_duration=300)
    Sep->>Chunk: should_chunk(input_duration)
    Chunk-->>Sep: True

    Sep->>Chunk: split_audio(input_path, temp_dir)
    Chunk->>FS: load audio & write chunk files
    Chunk-->>Sep: [chunk_0000, chunk_0001...]

    loop per chunk
        Sep->>Model: _separate_file(chunk_N)
        Model-->>Sep: per-chunk stem outputs
        Sep->>FS: store per-chunk outputs grouped by stem
        Sep->>Sep: torch.cuda.empty_cache() / mem cleanup
    end

    loop per stem
        Sep->>Chunk: merge_chunks(stem_chunk_paths, final_stem_output)
        Chunk->>FS: read & concatenate chunk stems
        Chunk->>FS: export merged stem file
        Chunk-->>Sep: merged_stem_path
    end

    Sep->>FS: cleanup temp_dir
    Sep-->>User: final merged output files
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 A long file knocked upon my door,
"Slice me, process me, then restore."
Chunk by chunk I hum and hop,
Stems rejoined — one final chop.
Memory light, the job's a hop!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly describes the main change: adding file-level chunking for large audio files, and directly references the linked issue #158.
Linked Issues check ✅ Passed The PR fully addresses #158 objectives: automatic file splitting, sequential processing, merging results, GPU cache clearing, configurable CLI option, and reduced memory usage for large files.
Out of Scope Changes check ✅ Passed All changes directly support chunking functionality and its integration: AudioChunker module, Separator integration, CLI support, documentation, and comprehensive tests. No unrelated changes detected.
Docstring Coverage ✅ Passed Docstring coverage is 94.12% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings


📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 52f8050 and 7ef3d77.

📒 Files selected for processing (1)
  • tests/unit/test_audio_chunking.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/specify-rules.mdc)

**/*.py: Use Python 3.11+ syntax and features
Follow standard Python conventions for Python 3.11+
Use PyTorch for deep learning operations
Use librosa for audio loading and processing
Use soundfile for audio I/O operations
Use numpy for numerical array operations
Use onnxruntime for model inference when applicable
Use ruff for code linting and formatting checks

Files:

  • tests/unit/test_audio_chunking.py
tests/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/specify-rules.mdc)

Run tests using pytest

Files:

  • tests/unit/test_audio_chunking.py
🧬 Code graph analysis (1)
tests/unit/test_audio_chunking.py (1)
audio_separator/separator/audio_chunking.py (4)
  • AudioChunker (9-141)
  • should_chunk (131-141)
  • split_audio (36-85)
  • merge_chunks (87-129)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: run-integration-test
  • GitHub Check: test-windows (3.10)
  • GitHub Check: test-macos (3.10)
  • GitHub Check: test-macos (3.11)
  • GitHub Check: test-ubuntu (3.13)
  • GitHub Check: test-ubuntu (3.10)
  • GitHub Check: test-windows (3.13)
  • GitHub Check: test-ubuntu (3.11)
  • GitHub Check: test-windows (3.12)
  • GitHub Check: test-ubuntu (3.12)
  • GitHub Check: test-windows (3.11)
🔇 Additional comments (8)
tests/unit/test_audio_chunking.py (8)

19-35: Solid fixture setup and init coverage.


37-55: Boundary tests for should_chunk look good.


57-110: Split-audio mocking covers expected chunk counts well.


111-153: Merge-path tests cover normal and error flows cleanly.


155-165: Chunk duration conversion tests are correct.


167-206: Round‑trip split/merge integration test is strong.


207-228: Format-variation test looks solid.


233-275: Edge‑case chunk counts are well covered.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@tests/unit/test_audio_chunking.py`:
- Around line 57-60: The test functions (e.g., test_split_audio_basic) declare a
patched parameter named mock_makedirs that is unused and triggers ruff ARG002;
rename that parameter to _mock_makedirs (or _) in the function signature for all
affected tests (including the one at the other occurrence around line 88) so the
`@patch` order remains unchanged but the unused arg is explicitly ignored.
- Line 65: Replace the unused lambda parameter name with an underscore to
satisfy ARG005: update the Mock side_effect assignments (the lines setting
mock_audio.__getitem__ = Mock(side_effect=lambda key: mock_audio) and the
similar occurrence around the other test) to use lambda _: mock_audio (or lambda
*_: mock_audio) so the unused arg is named `_`; keep the rest of the mocking
logic unchanged.
- Around line 111-115: Replace hardcoded "/tmp" usages in the split-audio tests
with pytest's tmp_path fixture: update test_split_audio_file_not_found to accept
tmp_path and call self.chunker.split_audio("nonexistent.wav", tmp_path) (or
tmp_path.as_posix() if a string path is required), and make the same replacement
for the other split_audio tests that currently pass "/tmp" so they use tmp_path
for secure, isolated temp directories; ensure test signatures include the
tmp_path parameter and any assertions use the fixture path.
🧹 Nitpick comments (6)
audio_separator/separator/separator.py (3)

876-886: Consider caching the duration check to avoid redundant I/O.

The librosa.get_duration() call loads the audio file to determine its duration, and then AudioChunker.split_audio() will load the file again via pydub. For very large files, this double-load could be noticeable.

💡 Optional optimization

Consider passing the pre-computed duration to the chunker or having split_audio return the duration as a side effect to avoid loading the file twice. However, for the scope of this PR (preventing OOM), the current approach is acceptable since the duration check is lightweight compared to the actual separation.


952-965: State modification pattern is correct but could benefit from a context manager.

The pattern of saving original values, modifying state, and restoring in finally is correct. However, the restoration happens inside a nested try/finally which makes the flow complex.

Note: The current implementation works correctly. A future refactor could extract this into a context manager for cleaner state management.


1026-1030: Cleanup uses ignore_errors=True which silently swallows errors.

Using ignore_errors=True in shutil.rmtree means any cleanup failures will be silently ignored. This is acceptable for temporary files, but consider logging a warning if cleanup fails.

💡 Optional: Add cleanup failure logging
         finally:
             # Clean up temporary directory
             if os.path.exists(temp_dir):
                 self.logger.debug(f"Cleaning up temporary directory: {temp_dir}")
-                shutil.rmtree(temp_dir, ignore_errors=True)
+                try:
+                    shutil.rmtree(temp_dir)
+                except OSError as e:
+                    self.logger.warning(f"Failed to clean up temporary directory {temp_dir}: {e}")
audio_separator/separator/audio_chunking.py (3)

25-34: Type hint should use explicit Optional instead of implicit None default.

The static analysis correctly flags this as PEP 484 violation. The parameter allows None but the type hint doesn't reflect this.

♻️ Fix type hint
-    def __init__(self, chunk_duration_seconds: float, logger: logging.Logger = None):
+    def __init__(self, chunk_duration_seconds: float, logger: logging.Logger | None = None):

67-82: Consider validating that the file extension is supported by pydub/ffmpeg.

The code defaults to .wav if no extension is present, but if an unsupported extension is provided, chunk.export() will fail at runtime with a potentially confusing error.

This is a minor edge case since most users will provide standard audio formats. The current behavior delegates validation to pydub/ffmpeg which will provide appropriate errors.


113-120: Concatenation loop may be inefficient for many chunks.

The combined += chunk pattern creates a new AudioSegment on each iteration. For a large number of chunks (e.g., 48 chunks for an 8-hour file with 10-minute chunks), this could be slow.

💡 Alternative using reduce or sum
# Alternative approach that may be more efficient:
from functools import reduce
combined = reduce(lambda acc, path: acc + AudioSegment.from_file(path), chunk_paths, AudioSegment.empty())

However, for typical use cases (tens of chunks), the performance difference is negligible. This can be addressed later if profiling shows it's a bottleneck.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bfa7380 and b285616.

📒 Files selected for processing (6)
  • README.md
  • audio_separator/separator/audio_chunking.py
  • audio_separator/separator/separator.py
  • audio_separator/utils/cli.py
  • tests/unit/test_audio_chunking.py
  • tests/unit/test_separator_chunking.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/specify-rules.mdc)

**/*.py: Use Python 3.11+ syntax and features
Follow standard Python conventions for Python 3.11+
Use PyTorch for deep learning operations
Use librosa for audio loading and processing
Use soundfile for audio I/O operations
Use numpy for numerical array operations
Use onnxruntime for model inference when applicable
Use ruff for code linting and formatting checks

Files:

  • audio_separator/separator/audio_chunking.py
  • tests/unit/test_audio_chunking.py
  • audio_separator/separator/separator.py
  • audio_separator/utils/cli.py
  • tests/unit/test_separator_chunking.py
tests/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/specify-rules.mdc)

Run tests using pytest

Files:

  • tests/unit/test_audio_chunking.py
  • tests/unit/test_separator_chunking.py
🧬 Code graph analysis (1)
audio_separator/separator/separator.py (1)
audio_separator/separator/audio_chunking.py (4)
  • AudioChunker (9-141)
  • should_chunk (131-141)
  • split_audio (36-85)
  • merge_chunks (87-129)
🪛 Ruff (0.14.11)
audio_separator/separator/audio_chunking.py

25-25: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


52-52: Avoid specifying long messages outside the exception class

(TRY003)


104-104: Avoid specifying long messages outside the exception class

(TRY003)


109-109: Avoid specifying long messages outside the exception class

(TRY003)

tests/unit/test_audio_chunking.py

60-60: Unused method argument: mock_makedirs

(ARG002)


65-65: Unused lambda argument: key

(ARG005)


88-88: Unused method argument: mock_makedirs

(ARG002)


93-93: Unused lambda argument: key

(ARG005)


114-114: Probable insecure usage of temporary file or directory: "/tmp"

(S108)


137-137: Probable insecure usage of temporary file or directory: "/tmp/output.wav"

(S108)


139-139: Probable insecure usage of temporary file or directory: "/tmp/output.wav"

(S108)


145-145: Probable insecure usage of temporary file or directory: "/tmp/output.wav"

(S108)


153-153: Probable insecure usage of temporary file or directory: "/tmp/output.wav"

(S108)

audio_separator/separator/separator.py

190-190: Avoid specifying long messages outside the exception class

(TRY003)

tests/unit/test_separator_chunking.py

321-321: Unused function argument: chunk_path

(ARG001)


321-321: Unused function argument: custom_names

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: run-integration-test
  • GitHub Check: test-ubuntu (3.13)
  • GitHub Check: test-ubuntu (3.10)
  • GitHub Check: test-ubuntu (3.12)
  • GitHub Check: test-macos (3.10)
  • GitHub Check: test-windows (3.12)
  • GitHub Check: test-macos (3.13)
  • GitHub Check: test-ubuntu (3.11)
  • GitHub Check: test-windows (3.13)
  • GitHub Check: test-windows (3.10)
  • GitHub Check: test-windows (3.11)
  • GitHub Check: test-macos (3.12)
  • GitHub Check: test-macos (3.11)
🔇 Additional comments (15)
audio_separator/separator/separator.py (3)

187-191: LGTM! Validation for chunk_duration parameter.

The validation correctly rejects non-positive values. The parameter is optional (defaults to None) which allows backward compatibility.


996-998: GPU cache clearing should happen after state restoration.

Currently, clear_gpu_cache() is called after the finally block that restores state. This is the correct behavior to ensure the GPU cache is cleared even if state restoration encounters issues. Good defensive coding.


968-985: The path handling at line 984 is correct and necessary.

The paths returned from _separate_file() are relative filenames (e.g., "chunk_0000_(Vocals).wav"), not absolute paths. Although write_audio_pydub() and write_audio_soundfile() join output_dir with the path locally (lines 309 and 397 in common_separator.py), this modified path is not propagated back to the caller. The get_stem_output_path() method returns only the filename, which is then appended to output_files before being passed back. Therefore, the normalization check at line 984 correctly handles both cases: if a path is somehow absolute (edge case), it's used as-is; if relative, it's joined with temp_dir.

audio_separator/utils/cli.py (2)

62-73: LGTM! CLI option well-documented and properly integrated.

The help text is comprehensive, includes practical recommendations (600 seconds for files >1 hour), and notes the concatenation behavior. The type=float allows flexibility for fractional durations.


205-205: LGTM! Parameter correctly passed to Separator.

The chunk_duration is properly wired from CLI args to the Separator constructor.

audio_separator/separator/audio_chunking.py (3)

51-58: LGTM! Input validation and directory creation.

The method correctly validates input existence and creates the output directory if needed. Loading via AudioSegment.from_file supports multiple formats.


63-65: Ceiling division correctly calculates chunk count.

The formula (total + chunk_size - 1) // chunk_size is the correct ceiling division pattern to ensure the last partial chunk is included.


131-141: LGTM! Simple and correct threshold check.

The should_chunk method correctly compares audio duration against the configured chunk duration threshold.

README.md (1)

281-320: LGTM! Comprehensive and user-friendly documentation.

The new section thoroughly covers:

  • Practical usage examples with realistic scenarios
  • Clear explanation of the three-step workflow
  • Multi-stem model support (2/4/6 stems)
  • Benefits and recommendations based on system resources
  • Honest disclosure about potential boundary artifacts

This documentation aligns well with the implementation and will help users understand when and how to use chunking.

tests/unit/test_separator_chunking.py (6)

17-83: LGTM! Good test coverage for 2-stem model chunking.

The test properly mocks the AudioChunker and verifies:

  • Correct number of merge_chunks calls (2 for 2 stems)
  • Output contains expected stem names
  • State management through mock separator

195-236: LGTM! Thorough regex pattern testing.

Tests cover:

  • Standard stem patterns (Vocals, Instrumental, Drums, etc.)
  • Different file extensions
  • Multi-word stem names ("Backing Vocals")
  • Fallback scenarios when pattern doesn't match

319-328: Unused arguments are intentional for mock callback signature.

The static analysis flags chunk_path and custom_names as unused, but they're required to match the _separate_file method signature for the mock. This is a false positive.


430-461: Good test for error handling and state restoration.

This test verifies that the state is correctly restored even when an exception occurs during chunk processing. This is critical for ensuring the Separator remains usable after a failed operation.


589-631: Test for inconsistent stem counts is valuable but may not reflect real behavior.

The test verifies graceful handling when chunks produce different stem counts. In practice, the same model should always produce the same number of stems. However, this defensive test is good for robustness.


633-668: LGTM! Tests fallback naming behavior.

Good coverage of the edge case where output files don't match the expected _(StemName) pattern. The test verifies the fallback to stem_0 naming works correctly.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

The common_expected_args fixture was missing the new chunk_duration
parameter added to Separator.__init__(), causing all CLI tests to fail
in CI. Also corrected parameter order to match actual constructor
(use_soundfile before use_autocast).

Fixes:
- Added chunk_duration: None to expected args
- Reordered use_soundfile/use_autocast to match Separator.__init__()

All tests/unit/test_cli.py tests now pass (13 passed, 2 skipped).
Addressed CodeRabbit feedback:
1. Renamed unused mock parameter mock_makedirs to _mock_makedirs (ARG002)
2. Renamed unused lambda parameter key to _ (ARG005)
3. Replaced hardcoded /tmp paths with pytest tmp_path fixture (S108)

All 17 tests still pass after these changes.
@beveradb beveradb merged commit 10fa098 into nomadkaraoke:main Jan 16, 2026
18 checks passed
@beveradb
Copy link
Collaborator

Thank you for contributing this @shimpeiws :)
Should be released in version 0.41.0 shortly

@shimpeiws shimpeiws deleted the feature/file-level-chunking-158 branch January 16, 2026 09:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Insufficient memory when processing large files

2 participants