An MPV script to skip intro sequences in media by fingerprinting audio and video.
When you mark an intro in one episode, the script can search for that same intro in other episodes (using either video or audio matching) and skip it automatically.
- Audio Fingerprinting: Uses Constellation Hashing to find identical audio patterns, robust to noise and distortion. (Recommended/Default)
- Video Fingerprinting: Uses PDQ Hash (Perceptual Hashing) to find visually similar intros.
- High Performance:
  - Uses LuaJIT FFI for zero-allocation data processing to handle large audio/video datasets efficiently.
  - Optimized Pure-Lua Fallback for environments without LuaJIT (e.g., some Linux builds), achieving ~2.5x faster FFTs than standard implementations.
- Async Execution: Scans run in the background using mpv coroutines and async subprocesses, ensuring the player remains responsive.
- Cross-Platform: Supports Windows, Linux, and macOS (with appropriate dependencies).
- ffmpeg (required) must be in your system PATH. (Install Instructions)
- LuaJIT (optional) is highly recommended and is standard in many mpv builds. The script uses FFI C-arrays for audio processing to avoid massive Garbage Collection overhead. (Install Instructions)
- 'bit' library (optional): Standard in LuaJIT. Used for faster processing if available.
Run the following command in PowerShell:

    irm https://raw.githubusercontent.com/jjangsangy/intro-fingerprint/main/installers/install.ps1 | iex

Run the following command in your terminal:

    curl -fsSL https://raw.githubusercontent.com/jjangsangy/intro-fingerprint/main/installers/install.sh | sh

- Download the latest release.
- Extract the contents directly into your mpv configuration directory:
  - Windows: `%APPDATA%\mpv\`
  - Linux/macOS: `~/.config/mpv/`

Note: Automatic install scripts do not work for `portable_config` directories. If you are using a portable config, you must install it manually.
- Open a video that contains the intro you want to skip.
- Seek to the very end of the intro.
- Press `Ctrl+i` to save the fingerprint. This captures both video frame and audio spectrogram data to temporary files.
- Open another video (e.g., the next episode).
- Press `Ctrl+s` (Audio scan) or `Ctrl+Shift+s` (Video scan) to find and skip the intro.
- `Ctrl+i`: Save Intro. Captures the current timestamp as the intro fingerprint (saves video frame and audio data to temp files).
- `Ctrl+s`: Skip Intro (Audio). Scans the audio stream for a match based on the saved audio fingerprint.
  - Note: Audio fingerprinting is significantly faster and is the default method. However, if the intro music changes between episodes while the video remains the same, use Video Skip instead.
- `Ctrl+Shift+s`: Skip Intro (Video). Scans the current video for a match based on the saved video fingerprint.
You can customize the key bindings using either the intro-fingerprint.conf file or input.conf.

You can change the default key bindings by setting the following options in your intro-fingerprint.conf file:

    key_save_intro=Ctrl+i
    key_skip_audio=Ctrl+s
    key_skip_video=Ctrl+Shift+s

You can map any key to the script's named bindings in your input.conf file. The internal binding names are:
- `save-intro`
- `skip-intro-audio`
- `skip-intro-video`

Example input.conf:

    Alt+i script-binding save-intro
    Alt+s script-binding skip-intro-audio
    Alt+Shift+s script-binding skip-intro-video

You can customize the script by creating intro-fingerprint.conf in your mpv script-opts folder.
| Option | Default | Description |
|---|---|---|
| `debug` | `no` | Enable console debug printing for performance stats and scan info. |
| Option | Default | Description |
|---|---|---|
| `audio_threshold` | `10` | Minimum magnitude for frequency peaks and minimum matches for a valid skip. |
| `audio_min_match_ratio` | `0.30` | Minimum ratio of matching hashes required (0.0 - 1.0). |
| `audio_concurrency` | `4` | Number of parallel FFmpeg workers for audio scanning. |
| `audio_scan_limit` | `900` | Maximum seconds of the file to scan for audio matches. |
| `audio_sample_rate` | `11025` | Sample rate for audio extraction. |
| `audio_segment_duration` | `15` | Duration (seconds) of each audio scan segment for the linear scan. |
| `audio_fingerprint_duration` | `10` | Duration (seconds) of the audio fingerprint to capture. |
| `audio_fft_size` | `2048` | FFT size for audio processing. |
| `audio_hop_size` | `1024` | Hop size (overlap) between FFT frames. |
| `audio_target_t_min` | `10` | Minimum delay in frames for peak pairs in constellation hashing. |
| `audio_target_t_max` | `100` | Maximum delay in frames for peak pairs in constellation hashing. |
| Option | Default | Description |
|---|---|---|
| `audio_silence_threshold` | `0.005` | RMS amplitude threshold below which audio is considered silence. |
| `audio_sparsity_threshold` | `0.10` | Minimum signal density (non-zero samples ratio). |
| `audio_min_complexity` | `50` | Minimum number of hashes required for a valid fingerprint. |
| Option | Default | Description |
|---|---|---|
| `video_hash_size` | `64` | Hash size (64x64 input -> 16x16 DCT -> 256 bit hash). |
| `video_threshold` | `50` | Tolerance for Hamming Distance (0-256). Lower is stricter. |
| `video_interval` | `0.20` | Time interval (seconds) between checked frames during video scan. |
| `video_search_window` | `10` | Initial seconds before/after saved timestamp to search. |
| `video_max_search_window` | `300` | Maximum seconds to expand the search window. |
| `video_window_step` | `30` | Step size (seconds) when expanding the video search window. |
| Option | Default | Description |
|---|---|---|
| `video_min_brightness` | `15` | Minimum mean brightness (0-255). |
| `video_max_brightness` | `240` | Maximum mean brightness (0-255). |
| `video_min_contrast` | `10.0` | Minimum standard deviation. |
| `video_min_entropy` | `4.0` | Minimum entropy (0-8). |
| `video_min_quality` | `50` | Minimum PDQ quality score (0-100). |
| Option | Default | Description |
|---|---|---|
| `audio_temp_filename` | `mpv_intro_skipper_audio.dat` | Name of temp file used for audio. |
| `video_temp_filename` | `mpv_intro_skipper_video.dat` | Name of temp file used for video. |
| Option | Default | Description |
|---|---|---|
| `key_save_intro` | `Ctrl+i` | Key binding to save the intro fingerprint. |
| `key_skip_video` | `Ctrl+Shift+s` | Key binding to skip using video fingerprinting. |
| `key_skip_audio` | `Ctrl+s` | Key binding to skip using audio fingerprinting. |
To prevent false positives and wasted scans, the script validates media quality before creating a fingerprint.
If the audio is too simple or quiet, you will see an "Audio Rejected" message. This happens if:
- Silence Detected: Audio is too quiet (RMS < 0.005).
- Signal Too Sparse: Audio is mostly silence (< 10% active samples).
- Low Complexity: Audio lacks distinct frequency peaks (< 50 hashes generated).
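
The three audio checks above can be illustrated with a short Python sketch. This is not the script's Lua code: the function name `validate_audio`, the normalization of samples to the range -1.0 to 1.0, and the order in which the checks run are assumptions made for illustration. The default thresholds are the documented option defaults.

```python
import math

def validate_audio(samples, hash_count,
                   silence_threshold=0.005,
                   sparsity_threshold=0.10,
                   min_complexity=50):
    """Return (ok, reason) for samples normalized to [-1.0, 1.0].

    Hypothetical helper: mirrors the documented checks, not the script's code.
    """
    if not samples:
        return False, "Silence Detected"
    # Root-mean-square amplitude of the whole clip.
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms < silence_threshold:
        return False, "Silence Detected"
    # Signal density: share of non-zero samples.
    density = sum(1 for s in samples if s != 0) / len(samples)
    if density < sparsity_threshold:
        return False, "Signal Too Sparse"
    # Too few hashes means too few distinct frequency peaks.
    if hash_count < min_complexity:
        return False, "Low Complexity"
    return True, "OK"
```

For example, a clip in which only 5% of the samples are non-zero is rejected as sparse even when its RMS is above the silence threshold.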
To ensure robust matching, the system automatically validates frames before creating a fingerprint. A frame is rejected if it fails any of the following checks:
- Extreme Darkness/Brightness: The image is almost entirely black (`Mean < 15`) or white (`Mean > 240`).
- Low Contrast: The image looks flat with little variation in brightness (`StdDev < 10.0`).
- Low Structure: The image lacks distinct edges or consists of smooth gradients (`PDQ Quality < 50`).
- Low Information: The image is too simple or repetitive (`Entropy < 4.0`).
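
As a rough illustration of the frame checks listed above, the following Python sketch computes mean brightness, standard deviation, and Shannon entropy for a flat list of 8-bit grayscale pixels. It is not the script's Lua code: the function names are hypothetical, and the PDQ quality score is passed in as a precomputed number because how it is computed is not described here.

```python
import math

def frame_stats(pixels):
    """Return (mean, stddev, entropy) for 8-bit grayscale values (0-255)."""
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    # Histogram-based Shannon entropy in bits (0 to 8 for 256 levels).
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    entropy = -sum((c / n) * math.log2(c / n) for c in hist if c)
    return mean, math.sqrt(variance), entropy

def validate_frame(pixels, pdq_quality,
                   min_brightness=15, max_brightness=240,
                   min_contrast=10.0, min_entropy=4.0, min_quality=50):
    """Hypothetical helper applying the documented default thresholds."""
    mean, stddev, entropy = frame_stats(pixels)
    if mean < min_brightness or mean > max_brightness:
        return False, "Extreme Darkness/Brightness"
    if stddev < min_contrast:
        return False, "Low Contrast"
    if pdq_quality < min_quality:
        return False, "Low Structure"
    if entropy < min_entropy:
        return False, "Low Information"
    return True, "OK"
```

A frame made of only two gray levels has high contrast but just 1 bit of entropy, so it is rejected as low information.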
[Figure: original frame and what PDQ Hash sees]
Reason: High Quality. The image has distinct edges, good contrast, and clear shapes that remain visible even after resizing. This produces a strong, unique fingerprint.
[Figure: original frame and what PDQ Hash sees]
Reason: Extremely Dark & Low Contrast. The scene is too dim to extract meaningful features. The PDQ algorithm effectively sees a black square, which would match any other dark scene.
[Figure: original frame and what PDQ Hash sees]
Reason: Lack of Sharp Edges. The image consists of smooth color transitions (gradients) without any sharp lines. PDQ Hash relies on edge detection, so smooth blurs result in a weak fingerprint that fails the Gradient Quality check.
[Figure: original frame and what PDQ Hash sees]
Reason: Low Feature Density. While this frame technically passes the rejection thresholds, it is a borderline candidate. Large areas of the image are flat color (low texture), meaning the hash has fewer "anchors" than a highly detailed scene. It is better to choose a frame with more complex details if possible.
Tip: Always choose a frame with clear shapes, high contrast, and distinct objects. If you encounter errors, try moving the playback position slightly forward or backward to a more complex part of the intro.
The script uses two primary methods for fingerprinting:
- Algorithm: Extracts audio using FFmpeg (s16le, mono) and performs FFT to identify peak frequencies in time-frequency bins.
- Hashing: Pairs peaks to form hashes: `[f1][f2][delta_time]`.
- Matching: Uses a Global Offset Histogram. Every match calculates $\text{Offset} = T_{\text{file}} - T_{\text{query}}$, and the script looks for the largest cluster (peak) of consistent offsets.
- Filtering: Implements Match Ratio filtering (default 30%) to ensure the match is an exact fingerprint overlap rather than just similar-sounding music.
- Search Strategy: Concurrent Linear Scan. The timeline is divided into contiguous segments (15 s by default, set by `audio_segment_duration`). Each segment is processed by a concurrent worker with sufficient padding to ensure no matches are lost at segment boundaries. Hashes are filtered to prevent double-counting in overlapping regions.
- Optimization:
  - Concurrency: Launches multiple parallel FFmpeg workers to utilize all CPU cores.
  - Inverted Index: Uses an $O(1)$ hash-map for near-instant lookup of fingerprints during the scan.
  - Optimal Stopping: Scans terminate immediately once a high-confidence match is confirmed and the signal gradient drops.
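
The hashing and offset-histogram matching described in the list above can be sketched in a few lines of Python. This is an illustration under assumptions, not the script's Lua code: peaks are assumed to arrive as `(frame, freq_bin)` tuples sorted by frame, every peak is paired with every later peak inside the delay window (a real implementation would likely cap the number of pairs per anchor), and the names `make_hashes` and `best_offset` are hypothetical.

```python
from collections import Counter, defaultdict

def make_hashes(peaks, t_min=10, t_max=100):
    """Pair peaks into ((f1, f2, delta_time), anchor_frame) entries.

    peaks: list of (frame, freq_bin) tuples sorted by frame.
    """
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt > t_max:
                break  # peaks are sorted, so later ones are even further away
            if dt >= t_min:
                hashes.append(((f1, f2, dt), t1))
    return hashes

def best_offset(query_hashes, file_hashes):
    """Return (offset, match_ratio) for the strongest offset cluster."""
    # Inverted index: hash -> anchor frames in the saved fingerprint.
    index = defaultdict(list)
    for h, t_query in query_hashes:
        index[h].append(t_query)
    # Global offset histogram: one vote per matching hash.
    offsets = Counter()
    for h, t_file in file_hashes:
        for t_query in index.get(h, ()):
            offsets[t_file - t_query] += 1
    if not offsets:
        return None, 0.0
    offset, votes = offsets.most_common(1)[0]
    return offset, votes / len(query_hashes)
```

A true match produces many votes at a single offset, while accidental hash collisions scatter across unrelated offsets. Comparing the returned ratio against `audio_min_match_ratio` then corresponds to the Match Ratio filter.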
- Algorithm: Downsamples frames to 512x512, converts to grayscale (Luma), and applies a 2-pass Jarosz filter. Then, resizes to 64x64 and computes the Discrete Cosine Transform (DCT) of the rows and columns. A 256-bit hash (32 bytes) is generated from the low-frequency 16x16 coefficients by comparing each coefficient against the median value.
- Matching: Uses Hamming Distance (count of differing bits). It is robust against color changes, small aspect ratio variations, and high-frequency noise.
- Search Strategy: The search starts around the timestamp of the saved fingerprint and expands outward.
- Optimization: FFmpeg video decoding is the most expensive part of the pipeline. By assuming the intro is at a similar location (common in episodic content), we avoid decoding the entire stream, resulting in much faster scans.
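
One way to picture the expanding search is the Python sketch below. The exact expansion policy is an assumption: it starts with a window of `video_search_window` seconds on each side of the saved timestamp, grows the radius by `video_window_step` up to `video_max_search_window`, and only yields the newly uncovered ranges so that no part of the file is decoded twice. The function name `expanding_windows` is hypothetical.

```python
def expanding_windows(center, duration, initial=10, maximum=300, step=30):
    """Yield (start, end) ranges in seconds, nearest to `center` first.

    Assumed policy for illustration; ranges are clamped to [0, duration].
    """
    lo = max(0.0, center - initial)
    hi = min(duration, center + initial)
    yield (lo, hi)
    radius = initial
    while radius < maximum and (lo > 0.0 or hi < duration):
        radius = min(maximum, radius + step)
        new_lo = max(0.0, center - radius)
        new_hi = min(duration, center + radius)
        if new_lo < lo:
            yield (new_lo, lo)   # newly uncovered range before the window
        if new_hi > hi:
            yield (hi, new_hi)   # newly uncovered range after the window
        lo, hi = new_lo, new_hi
```

With a saved timestamp of 90 s, the first range is 80 to 100 s, followed by 50 to 80 s and 100 to 130 s, and so on until a match is found or the maximum radius is reached.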
The script approximates the Jarosz filter (essential for PDQ robustness) using an optimized FFmpeg filter chain: `scale=512:512:flags=bilinear`, `colorchannelmixer` (exact luminance), `avgblur=sizeX=4:sizeY=4` (applied twice), and `scale=64:64:flags=neighbor`. This configuration closely matches the official PDQ C++ implementation but is not identical to it.
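
The hashing step that follows the filter chain (a DCT of the 64x64 image, a median threshold over 16x16 low-frequency coefficients, and Hamming-distance comparison) can be illustrated in pure Python. This is a simplified sketch, not the script's Lua code and not the official PDQ implementation: the function names are hypothetical, the DCT is left unnormalized, and starting at frequency 1 (skipping the constant term) is this sketch's own choice.

```python
import math

def dct_matrix(out_size, in_size):
    """Rows of an unnormalized DCT-II basis for frequencies 1..out_size."""
    return [[math.cos(math.pi * (2 * x + 1) * k / (2 * in_size))
             for x in range(in_size)]
            for k in range(1, out_size + 1)]

def perceptual_hash(gray, size=64, keep=16):
    """gray: size x size rows of luma values. Returns keep*keep bits."""
    d = dct_matrix(keep, size)
    # Transform columns: tmp = D * gray  (keep x size).
    tmp = [[sum(d[k][y] * gray[y][x] for y in range(size))
            for x in range(size)] for k in range(keep)]
    # Transform rows: coeffs = tmp * D^T  (keep x keep, flattened).
    coeffs = [sum(tmp[k][x] * d[l][x] for x in range(size))
              for k in range(keep) for l in range(keep)]
    median = sorted(coeffs)[len(coeffs) // 2]
    return [1 if c > median else 0 for c in coeffs]

def hamming(a, b):
    """Number of differing bits between two equal-length bit lists."""
    return sum(x != y for x, y in zip(a, b))
```

Because each bit only records whether a coefficient lies above the median, a uniform brightness shift leaves the hash essentially unchanged, while an unrelated frame differs in roughly half of the 256 bits. This is why a `video_threshold` of 50 separates the two cases.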
The script is heavily optimized for LuaJIT and high-performance processing.
- Zero-Allocation Data Processing: Critical hot paths use LuaJIT FFI C-arrays (`double[]`, `int16_t[]`) instead of Lua tables. This prevents massive Garbage Collection (GC) pauses that would occur if creating millions of small table objects for audio samples and hashes.
- Flattened Data Structures: 2D data (like spectrogram peaks) is flattened into 1D C-arrays to ensure memory contiguity and cache friendliness.
- Direct Memory Access: Raw audio and video buffers from FFmpeg are cast directly to C-structs using FFI, avoiding any copying or string manipulation in Lua.
The script uses highly optimized internal FFT implementations:
- Stockham Auto-Sort Algorithm: Avoids the expensive bit-reversal permutation step, maximizing FFI performance.
- Radix-4 & Mixed-Radix: Processes 4 points at a time to reduce complex multiplications, with Radix-2 fallback passes to handle non-power-of-4 sizes (e.g., 2048).
- Cache-Aware Loop Tiling: Ensures unit-stride memory access for maximum memory throughput.
- Zero-Allocation Processing: Replaces table churn with reusable buffers to minimize Garbage Collection overhead.
- Fused Scrambling: Combines Hann windowing and bit-reversal into a single pass.
- Precomputed Lookups: Uses pre-calculated trig tables and bit-reversal maps to avoid redundant math inside hot loops.
- Speedup: Achieves approximately 2.5x faster processing compared to naive Lua implementations.
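
To show what "auto-sort" means in the Stockham approach, here is a minimal radix-2 Python sketch. It is a simplification for illustration only: the script is described as using radix-4 and mixed-radix passes in Lua, whereas this version uses plain radix-2, recomputes twiddle factors instead of using lookup tables, and the function name `stockham_fft` is hypothetical. The point is that the output comes out in natural order by ping-ponging between two buffers, so no bit-reversal permutation is needed.

```python
import cmath

def stockham_fft(x):
    """Radix-2 Stockham auto-sort FFT for power-of-two input lengths."""
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    src = [complex(v) for v in x]
    dst = [0j] * n
    length, stride = n, 1
    while length > 1:
        half = length // 2
        for p in range(half):
            w = cmath.exp(-2j * cmath.pi * p / length)  # twiddle factor
            for q in range(stride):
                a = src[q + stride * p]
                b = src[q + stride * (p + half)]
                dst[q + stride * (2 * p)] = a + b
                dst[q + stride * (2 * p + 1)] = (a - b) * w
        src, dst = dst, src  # swap buffers instead of permuting in place
        length, stride = half, stride * 2
    return src
```

Each pass reads with one stride and writes with another, which performs the reordering implicitly as part of the butterfly step.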
- Inverted Index Matching: Fingerprints are stored in a hash map ($O(1)$ lookup), allowing the scanner to instantly find potential matches without iterating through the reference data.
- Precomputed Population Count: A 256-entry lookup table is used to calculate Hamming distances for video hashes, replacing bit-twiddling loops with a single table lookup per byte.
- Gradient-Based Early Stopping: The scanner monitors the "match strength" gradient. Once a peak is found and the signal begins to fade, the scan aborts immediately, saving CPU time.
- Asynchronous Concurrency: Uses `mpv` coroutines and multiple parallel FFmpeg workers to utilize all CPU cores without blocking the player UI.
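
The precomputed population count mentioned above is easy to illustrate. The Python sketch below is not the script's Lua code, and the names `POPCOUNT` and `hamming_bytes` are hypothetical. It builds a 256-entry table once and then computes the Hamming distance between two 32-byte hashes with one XOR and one table lookup per byte.

```python
# Number of set bits for every possible byte value, computed once.
POPCOUNT = [bin(i).count("1") for i in range(256)]

def hamming_bytes(a, b):
    """Hamming distance between two equal-length byte strings."""
    return sum(POPCOUNT[x ^ y] for x, y in zip(a, b))
```

For a 256-bit hash this takes 32 lookups per comparison rather than a loop over 256 individual bits.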
This script relies on ffmpeg being available in your system's PATH.
Using a package manager (recommended):

Winget:

    winget install ffmpeg

Chocolatey:

    choco install ffmpeg

Scoop:

    scoop install ffmpeg

Using Homebrew:

    brew install ffmpeg

Debian/Ubuntu:

    sudo apt update && sudo apt install ffmpeg

Fedora:

    sudo dnf install ffmpeg

Arch Linux:

    sudo pacman -S ffmpeg

- "Audio Rejected" / "Frame Rejected":
  - Cause: The scene is too simple (silence, black screen, featureless background) to generate a unique fingerprint.
  - Solution: Seek forward or backward by a few seconds to a scene with clear audio (dialogue/music) or visual detail, then press `Ctrl+i` again.
- "FFmpeg failed during scan":
  - Cause: `ffmpeg` is missing or not in system PATH.
  - Solution: Install FFmpeg and verify it runs from a terminal.
- No match found:
  - Video: Try increasing `video_threshold` in config, or ensure the intro is visually identical.
  - Audio: Ensure the intro music is consistent. If the intro has variable music but same video, use Video Skip (`Ctrl+Shift+s`).
This script is highly optimized for LuaJIT. While it includes a fallback for standard Lua (5.1/5.2), using LuaJIT provides significantly faster performance, especially for audio scanning.
To check if your mpv build uses LuaJIT, run the following command in your terminal:
Windows:

    mpv --version -v | findstr luajit

macOS / Linux:

    mpv --version -v | grep luajit

If the command returns a line containing luajit, you are good to go. If it returns nothing, you are likely using standard Lua.

If luajit is missing:
- Windows: These package managers typically install the shinchiro builds (or equivalent) which include LuaJIT support.
  - Scoop: `scoop bucket add extras`, then `scoop install mpv`
  - Chocolatey: `choco install mpvio`
  - Winget: `winget install "mpv (Unofficial)"`
  - Or download the official builds directly from mpv.io (select the shinchiro builds).
- macOS: Install via Homebrew (`brew install mpv`).
- Linux:
  - Arch Linux: Install with Pacman (`pacman -S mpv`).
  - Ubuntu: The default `mpv` package in apt often lacks LuaJIT support or is outdated. Use the ubuntuhandbook1/mpv PPA: `sudo add-apt-repository ppa:ubuntuhandbook1/mpv`, then `sudo apt update`, then `sudo apt install mpv`.
  - Fedora: The default repositories may lack full codec support or features. Use RPMFusion: `sudo dnf install https://mirrors.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm`, then `sudo dnf install mpv`.
  - Other Distributions: Install via Flatpak from Flathub.
You can use the provided VS Code DevContainer to test the script in a pre-configured Linux environment:
- Open the project in VS Code.
- Click Reopen in Container when prompted.
- The container comes with `mpv`, `ffmpeg`, and `xvfb` pre-installed.
- To test: `xvfb-run mpv --script=main.lua videos`
  - Note: Place your test videos in the `videos/` folder in the project root to have them available inside the container.







