intro-fingerprint

An MPV script to skip intro sequences in media by fingerprinting audio and video.

When you mark an intro in one episode, the script can search for that same intro in other episodes (using either video or audio matching) and skip it automatically.

Features

Audio Fingerprinting: Uses Constellation Hashing to find identical audio patterns, robust to noise and distortion. (Recommended/Default)
Video Fingerprinting: Uses PDQ Hash (Perceptual Hashing) to find visually similar intros.
High Performance:
- Uses LuaJIT FFI for zero-allocation data processing to handle large audio/video datasets efficiently.
- Optimized pure-Lua fallback for environments without LuaJIT (e.g., some Linux builds), with FFTs roughly 2.5x faster than a naive Lua implementation.
Async Execution: Scans run in the background using mpv coroutines and async subprocesses, ensuring the player remains responsive.
Cross-Platform: Supports Windows, Linux, and macOS (with appropriate dependencies).

Requirements

ffmpeg (required): must be in your system PATH. See the Install FFmpeg section below.
LuaJIT (optional) is highly recommended and is standard in most mpv builds. The script uses FFI C-arrays for audio processing to avoid heavy garbage-collection overhead. See the Verifying LuaJIT Support section below.
'bit' library (optional): Standard in LuaJIT. Used for faster processing if available.

Installation

Automatic (Windows)

Run the following command in PowerShell:

irm https://raw.githubusercontent.com/jjangsangy/intro-fingerprint/main/installers/install.ps1 | iex

Automatic (Linux / macOS)

Run the following command in your terminal:

curl -fsSL https://raw.githubusercontent.com/jjangsangy/intro-fingerprint/main/installers/install.sh | sh

Manual

Download the latest release.
Extract the contents directly into your mpv configuration directory:
- Windows: %APPDATA%\mpv\
- Linux/macOS: ~/.config/mpv/

Note: The automatic install scripts do not work with portable_config directories. If you use a portable config, install the script manually.

Usage

Open a video that contains the intro you want to skip.
Seek to the very end of the intro.
Press Ctrl+i to save the fingerprint. This captures both a video frame and audio spectrogram data to temporary files.
Open another video (e.g., the next episode).
Press Ctrl+s (Audio scan) or Ctrl+Shift+s (Video scan) to find and skip the intro.

Key Bindings

Ctrl+i: Save Intro. Captures the current timestamp as the intro fingerprint (saves video frame and audio data to temp files).
Ctrl+s: Skip Intro (Audio). Scans the audio stream for a match based on the saved audio fingerprint.
- Note: Audio fingerprinting is significantly faster and is the default method. However, if the intro music changes between episodes while the video remains the same, use Video Skip instead.
Ctrl+Shift+s: Skip Intro (Video). Scans the current video for a match based on the saved video fingerprint.

Customizing Key Bindings

You can customize the key bindings using either the intro-fingerprint.conf file or input.conf.

1. Using `intro-fingerprint.conf`

You can change the default key bindings by setting the following options in your intro-fingerprint.conf file:

key_save_intro=Ctrl+i
key_skip_audio=Ctrl+s
key_skip_video=Ctrl+Shift+s

2. Using `input.conf`

You can map any key to the script's named bindings in your input.conf file. The internal binding names are:

save-intro
skip-intro-audio
skip-intro-video

Example input.conf:

Alt+i script-binding save-intro
Alt+s script-binding skip-intro-audio
Alt+Shift+s script-binding skip-intro-video

Configuration

You can customize the script by creating intro-fingerprint.conf in your mpv script-opts folder.

General

Option	Default	Description
`debug`	`no`	Enable console debug printing for performance stats and scan info.

Audio Options

Option	Default	Description
`audio_threshold`	`10`	Minimum magnitude for frequency peaks and minimum matches for a valid skip.
`audio_min_match_ratio`	`0.30`	Minimum ratio of matching hashes required (0.0 - 1.0).
`audio_concurrency`	`4`	Number of parallel FFmpeg workers for audio scanning.
`audio_scan_limit`	`900`	Maximum seconds of the file to scan for audio matches.
`audio_sample_rate`	`11025`	Sample rate for audio extraction.
`audio_segment_duration`	`15`	Duration (seconds) of each audio scan segment for the linear scan.
`audio_fingerprint_duration`	`10`	Duration (seconds) of the audio fingerprint to capture.
`audio_fft_size`	`2048`	FFT size for audio processing.
`audio_hop_size`	`1024`	Hop size (step in samples) between consecutive FFT frames; 1024 with a 2048-point FFT gives 50% overlap.
`audio_target_t_min`	`10`	Minimum delay in frames for peak pairs in constellation hashing.
`audio_target_t_max`	`100`	Maximum delay in frames for peak pairs in constellation hashing.
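
To make the units in this table concrete, the following Python sketch converts the default values into seconds and hertz. It is only arithmetic on the defaults listed above; the helper names are hypothetical, and the frame count assumes no padding at the edges of the clip, which the script may handle differently.

```python
# Defaults copied from the Audio Options table.
SAMPLE_RATE = 11025   # audio_sample_rate (Hz)
FFT_SIZE = 2048       # audio_fft_size (samples)
HOP_SIZE = 1024       # audio_hop_size (samples)
TARGET_T_MIN = 10     # audio_target_t_min (frames)
TARGET_T_MAX = 100    # audio_target_t_max (frames)


def frame_seconds(hop=HOP_SIZE, rate=SAMPLE_RATE):
    """Time advanced by one FFT frame (one hop)."""
    return hop / rate


def bin_hz(fft=FFT_SIZE, rate=SAMPLE_RATE):
    """Width of one FFT frequency bin."""
    return rate / fft


def overlap_ratio(fft=FFT_SIZE, hop=HOP_SIZE):
    """Fraction of each frame shared with the next one."""
    return 1.0 - hop / fft


def frame_count(seconds, fft=FFT_SIZE, hop=HOP_SIZE, rate=SAMPLE_RATE):
    """Full FFT frames in a clip, assuming no edge padding (an assumption)."""
    samples = int(seconds * rate)
    if samples < fft:
        return 0
    return (samples - fft) // hop + 1


# Peak pairs are separated by 10 to 100 frames, i.e. roughly 0.93 s to 9.3 s.
pair_window_seconds = (TARGET_T_MIN * frame_seconds(),
                       TARGET_T_MAX * frame_seconds())
```

With the defaults, one frame step is about 93 ms and one frequency bin is about 5.4 Hz wide, so raising `audio_hop_size` trades time resolution for fewer frames to process.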

Audio Validation Options

Option	Default	Description
`audio_silence_threshold`	`0.005`	RMS amplitude threshold below which audio is considered silence.
`audio_sparsity_threshold`	`0.10`	Minimum signal density (non-zero samples ratio).
`audio_min_complexity`	`50`	Minimum number of hashes required for a valid fingerprint.

Video Options

Option	Default	Description
`video_hash_size`	`64`	Hash size (64x64 input -> 16x16 DCT -> 256 bit hash).
`video_threshold`	`50`	Tolerance for Hamming Distance (0-256). Lower is stricter.
`video_interval`	`0.20`	Time interval (seconds) between checked frames during video scan.
`video_search_window`	`10`	Initial seconds before/after saved timestamp to search.
`video_max_search_window`	`300`	Maximum seconds to expand the search window.
`video_window_step`	`30`	Step size (seconds) when expanding the video search window.
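
One plausible reading of how the three window options interact is sketched below in Python. The function name `search_windows` is hypothetical, and scanning only the newly added ring on each expansion (rather than rescanning the whole window) is an assumption, not something the option descriptions confirm.

```python
def search_windows(center, initial=10.0, step=30.0, maximum=300.0, duration=None):
    """Yield (start, end) ranges in seconds, nearest to `center` first.

    initial -> video_search_window, step -> video_window_step,
    maximum -> video_max_search_window. `duration` optionally clamps
    ranges to the end of the file.
    """
    if step <= 0:
        raise ValueError("step must be positive")

    def clamp(t):
        t = max(0.0, t)
        return t if duration is None else min(t, duration)

    inner = 0.0
    outer = min(initial, maximum)
    while True:
        if inner == 0.0:
            ranges = [(center - outer, center + outer)]
        else:
            # Only the parts not covered by the previous, smaller window.
            ranges = [(center - outer, center - inner),
                      (center + inner, center + outer)]
        for start, end in ranges:
            start, end = clamp(start), clamp(end)
            if end > start:
                yield (start, end)
        if outer >= maximum:
            break
        inner, outer = outer, min(outer + step, maximum)


# Example: the fingerprint was saved at 90 s into the reference episode.
windows = list(search_windows(90.0))
```

For a fingerprint saved at 90 s, the first range is 80 s to 100 s; later ranges move outward in 30-second rings until the radius reaches 300 s.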

Video Validation Options

Option	Default	Description
`video_min_brightness`	`15`	Minimum mean brightness (0-255).
`video_max_brightness`	`240`	Maximum mean brightness (0-255).
`video_min_contrast`	`10.0`	Minimum standard deviation.
`video_min_entropy`	`4.0`	Minimum entropy (0-8).
`video_min_quality`	`50`	Minimum PDQ quality score (0-100).

File Paths

Option	Default	Description
`audio_temp_filename`	`mpv_intro_skipper_audio.dat`	Name of temp file used for audio
`video_temp_filename`	`mpv_intro_skipper_video.dat`	Name of temp file used for video

Key Bindings

Option	Default	Description
`key_save_intro`	`Ctrl+i`	Key binding to save the intro fingerprint.
`key_skip_video`	`Ctrl+Shift+s`	Key binding to skip using video fingerprinting.
`key_skip_audio`	`Ctrl+s`	Key binding to skip using audio fingerprinting.

Quality Validation

To prevent false positives and wasted scans, the script validates media quality before creating a fingerprint.

Audio Validation

If the audio is too simple or quiet, you will see an "Audio Rejected" message. This happens if:

Silence Detected: Audio is too quiet (RMS < 0.005).
Signal Too Sparse: Audio is mostly silence (< 10% active samples).
Low Complexity: Audio lacks distinct frequency peaks (< 50 hashes generated).
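
The three checks above can be sketched in Python as follows. This is a minimal illustration using the documented default thresholds; the function name, the normalization of samples to the range -1.0 to 1.0, and the order of the checks are assumptions.

```python
import math


def validate_audio(samples, hash_count,
                   silence_threshold=0.005,   # audio_silence_threshold
                   sparsity_threshold=0.10,   # audio_sparsity_threshold
                   min_complexity=50):        # audio_min_complexity
    """Return (accepted, reason) for a fingerprint candidate.

    samples: floats in [-1.0, 1.0] (normalization is an assumption).
    hash_count: number of constellation hashes generated from the clip.
    """
    if not samples:
        return (False, "Silence Detected")

    # Root-mean-square amplitude of the whole clip.
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms < silence_threshold:
        return (False, "Silence Detected")

    # Signal density: ratio of non-zero samples, per the option description.
    density = sum(1 for s in samples if s != 0) / len(samples)
    if density < sparsity_threshold:
        return (False, "Signal Too Sparse")

    if hash_count < min_complexity:
        return (False, "Low Complexity")

    return (True, "OK")
```

A clip with a few loud clicks in otherwise digital silence passes the RMS check but fails the density check, which is why the two tests are kept separate in this sketch.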

Video Validation (Frame Rejection)

To ensure robust matching, the system automatically validates frames before creating a fingerprint. A frame is rejected if it fails any of the following checks:

Extreme Darkness/Brightness: The image is almost entirely black (Mean < 15) or white (Mean > 240).
Low Contrast: The image looks flat with little variation in brightness (StdDev < 10.0).
Low Structure: The image lacks distinct edges or consists of smooth gradients (PDQ Quality < 50).
Low Information: The image is too simple or repetitive (Entropy < 4.0).
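
As a rough illustration of the brightness, contrast, and entropy checks, here is a Python sketch operating on a flat list of 8-bit grayscale pixels. The function names are hypothetical, the population standard deviation and the 256-bin Shannon entropy are assumptions about how the statistics are defined, and the PDQ quality check is omitted.

```python
import math
from collections import Counter


def frame_stats(pixels):
    """Return (mean, standard deviation, entropy in bits) of 0-255 pixels."""
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    counts = Counter(pixels)
    # Shannon entropy of the intensity histogram; at most 8 bits for 256 levels.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return mean, math.sqrt(variance), entropy


def validate_frame(pixels,
                   min_brightness=15,    # video_min_brightness
                   max_brightness=240,   # video_max_brightness
                   min_contrast=10.0,    # video_min_contrast
                   min_entropy=4.0):     # video_min_entropy
    """Return (accepted, reason). The PDQ quality check is not modeled here."""
    mean, stddev, entropy = frame_stats(pixels)
    if mean < min_brightness or mean > max_brightness:
        return (False, "Extreme Darkness/Brightness")
    if stddev < min_contrast:
        return (False, "Low Contrast")
    if entropy < min_entropy:
        return (False, "Low Information")
    return (True, "OK")
```

A two-tone frame can have acceptable brightness and contrast yet carry only one bit of entropy, so it is rejected by the last check.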

Examples

1. Good Frame (Accepted)

Original Frame	What PDQ Hash Sees

Reason: High Quality. The image has distinct edges, good contrast, and clear shapes that remain visible even after resizing. This produces a strong, unique fingerprint.

2. Bad Frame (Too Dark & Flat)

Original Frame	What PDQ Hash Sees

Reason: Extremely Dark & Low Contrast. The scene is too dim to extract meaningful features. The PDQ algorithm effectively sees a black square, which would match any other dark scene.

3. Bad Frame (Low Structure)

Original Frame	What PDQ Hash Sees

Reason: Lack of Sharp Edges. The image consists of smooth color transitions (gradients) without any sharp lines. The PDQ quality score is driven by local brightness gradients, so smooth blurs produce a weak fingerprint that fails the quality check (PDQ Quality < 50).

4. Weak Frame (Low Texture)

Original Frame	What PDQ Hash Sees

Reason: Low Feature Density. While this frame technically passes the rejection thresholds, it is a borderline candidate. Large areas of the image are flat color (low texture), meaning the hash has fewer "anchors" than a highly detailed scene. It is better to choose a frame with more complex details if possible.

Tip: Always choose a frame with clear shapes, high contrast, and distinct objects. If you encounter errors, try moving the playback position slightly forward or backward to a more complex part of the intro.

How it Works

The script uses two primary methods for fingerprinting:

1. Audio Fingerprinting (Constellation Hashing)

Algorithm: Extracts audio using FFmpeg (s16le, mono) and performs FFT to identify peak frequencies in time-frequency bins.
Hashing: Pairs peaks to form hashes: [f1][f2][delta_time].
Matching: Uses a Global Offset Histogram. Every match calculates $Offset = T_{file} - T_{query}$, and the script looks for the largest cluster (peak) of consistent offsets.
Filtering: Implements Match Ratio filtering (default 30%) to ensure the match is an exact fingerprint overlap rather than just similar-sounding music.
Search Strategy: Concurrent Linear Scan. The timeline is divided into contiguous segments (`audio_segment_duration`, 15s by default). Each segment is processed by a concurrent worker with sufficient padding to ensure no matches are lost at segment boundaries. Hashes are filtered to prevent double-counting in overlapping regions.
Optimization:
- Concurrency: Launches multiple parallel FFmpeg workers (`audio_concurrency`, default 4) to spread the scan across CPU cores.
- Inverted Index: Uses a hash map with $O(1)$ average-case lookup to find matching fingerprint hashes during the scan.
- Optimal Stopping: Scans terminate immediately once a high-confidence match is confirmed and the signal gradient drops.
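
The hashing and matching steps described above can be sketched in Python. This is an illustration of the general constellation technique, not the script's code: the bit layout of the packed hash, the function names, and the way the match ratio is computed (votes divided by the number of query hashes) are assumptions.

```python
from collections import Counter, defaultdict


def pair_hashes(peaks, t_min=10, t_max=100):
    """Pair spectrogram peaks into (hash, anchor_frame) tuples.

    peaks: iterable of (frame_index, frequency_bin).
    t_min/t_max correspond to audio_target_t_min/audio_target_t_max.
    """
    peaks = sorted(peaks)
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt > t_max:
                break  # peaks are time-sorted, so later ones are even further
            if dt >= t_min:
                # [f1][f2][delta_time]; bit widths are illustrative only.
                hashes.append(((f1 << 20) | (f2 << 8) | dt, t1))
    return hashes


def build_index(query_hashes):
    """Inverted index: hash -> list of frame positions in the saved intro."""
    index = defaultdict(list)
    for h, t in query_hashes:
        index[h].append(t)
    return index


def best_offset(index, file_hashes, query_hash_count, min_ratio=0.30):
    """Vote on Offset = T_file - T_query and return the dominant offset."""
    histogram = Counter()
    for h, t_file in file_hashes:
        for t_query in index.get(h, ()):
            histogram[t_file - t_query] += 1
    if not histogram:
        return None
    offset, votes = histogram.most_common(1)[0]
    ratio = votes / query_hash_count
    return (offset, votes, ratio) if ratio >= min_ratio else None


# Toy data: the "intro" reappears 500 frames later among unrelated peaks.
intro_peaks = [(t, (t * 37) % 200 + 10) for t in range(0, 100, 5)]
other_peaks = [(t, (t * 13) % 300) for t in range(0, 400, 7)]
episode_peaks = other_peaks + [(t + 500, f) for t, f in intro_peaks]

query = pair_hashes(intro_peaks)
result = best_offset(build_index(query), pair_hashes(episode_peaks), len(query))
```

Because every true match votes for the same offset while accidental hash collisions scatter across many offsets, the histogram peak separates a real intro from music that merely sounds similar.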

2. Video Fingerprinting (PDQ Hash)

Algorithm: Downsamples frames to 512x512, converts to grayscale (Luma), and applies a 2-pass Jarosz filter. Then, resizes to 64x64 and computes the Discrete Cosine Transform (DCT) of the rows and columns. A 256-bit hash (32 bytes) is generated from the low-frequency 16x16 coefficients by comparing each coefficient against the median value.
Matching: Uses Hamming Distance (count of differing bits). It is robust against color changes, small aspect ratio variations, and high-frequency noise.
Search Strategy: The search starts around the timestamp of the saved fingerprint and expands outward.
Optimization: FFmpeg video decoding is the most expensive part of the pipeline. By assuming the intro is at a similar location (common in episodic content), we avoid decoding the entire stream, resulting in much faster scans.
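
The final hashing stage (64x64 luma to 16x16 DCT to 256 bits) can be sketched in pure Python. This is a simplified illustration: it starts from an already filtered 64x64 grayscale frame, skips the DC term of the DCT, and uses the upper median as the threshold. Those three choices, and the function names, are assumptions rather than details confirmed by the description above.

```python
import math


def dct_basis(n_out=16, n_in=64):
    """Rows 1..n_out of a DCT-II basis (DC row skipped, an assumption)."""
    return [[math.cos(math.pi / (2 * n_in) * k * (2 * j + 1))
             for j in range(n_in)]
            for k in range(1, n_out + 1)]


def pdq_style_hash(gray64):
    """gray64: 64 rows of 64 luma values. Returns the hash as a 256-bit int."""
    d = dct_basis()
    # Transform the columns first: tmp is 16 x 64.
    tmp = [[sum(d[k][i] * gray64[i][j] for i in range(64)) for j in range(64)]
           for k in range(16)]
    # Then the rows: 16 x 16 low-frequency coefficients, flattened.
    coeffs = [sum(tmp[u][j] * d[v][j] for j in range(64))
              for u in range(16) for v in range(16)]
    median = sorted(coeffs)[len(coeffs) // 2]
    bits = 0
    for c in coeffs:
        bits = (bits << 1) | (1 if c > median else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


# Toy frames: a synthetic pattern and its photographic negative.
frame = [[(x * x + 3 * x * y + 7 * y) % 256 for x in range(64)] for y in range(64)]
negative = [[255 - p for p in row] for row in frame]
h_frame = pdq_style_hash(frame)
h_negative = pdq_style_hash(negative)
```

Thresholding against the median yields a hash with roughly half its bits set, which keeps Hamming distances between unrelated frames near 128 and makes a tolerance such as `video_threshold` = 50 meaningful.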

Jarosz Filter Approximation

The script approximates the Jarosz filter (essential for PDQ robustness) using an optimized FFmpeg filter chain: scale=512:512:flags=bilinear, colorchannelmixer (exact luminance), avgblur=sizeX=4:sizeY=4 (applied twice), and scale=64:64:flags=neighbor. This configuration closely approximates, but does not exactly reproduce, the official PDQ C++ implementation.

Performance & Technical Details

The script is heavily optimized for LuaJIT and high-performance processing.

1. LuaJIT FFI & Memory Management

Zero-Allocation Data Processing: Critical hot paths use LuaJIT FFI C-arrays (double[], int16_t[]) instead of Lua tables. This prevents massive Garbage Collection (GC) pauses that would occur if creating millions of small table objects for audio samples and hashes.
Flattened Data Structures: 2D data (like spectrogram peaks) is flattened into 1D C-arrays to ensure memory contiguity and cache friendliness.
Direct Memory Access: Raw audio and video buffers from FFmpeg are cast directly to C-structs using FFI, avoiding any copying or string manipulation in Lua.

2. Audio FFT Processing

The script uses highly optimized internal FFT implementations:

For LuaJIT (FFI-Optimized)

Stockham Auto-Sort Algorithm: Avoids the expensive bit-reversal permutation step, maximizing FFI performance.
Radix-4 & Mixed-Radix: Processes 4 points at a time to reduce complex multiplications, with Radix-2 fallback passes to handle non-power-of-4 sizes (e.g., 2048).
Cache-Aware Loop Tiling: Ensures unit-stride memory access for maximum memory throughput.
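
For readers unfamiliar with the Stockham auto-sort structure, here is a minimal radix-2 sketch in Python. It is not the script's radix-4/mixed-radix FFI code; it only shows how ping-ponging between two buffers produces naturally ordered output without a bit-reversal pass. The function names are hypothetical, and `naive_dft` is included purely as a reference for checking results.

```python
import cmath


def stockham_fft(values):
    """Radix-2 Stockham auto-sort FFT; input length must be a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    src = [complex(v) for v in values]
    dst = [0j] * n
    m, stride = n, 1
    while m > 1:
        half = m // 2
        for p in range(half):
            w = cmath.exp(-2j * cmath.pi * p / m)  # twiddle for this butterfly group
            # Inner loop walks q with unit stride in both buffers.
            for q in range(stride):
                a = src[q + stride * p]
                b = src[q + stride * (p + half)]
                dst[q + stride * (2 * p)] = a + b
                dst[q + stride * (2 * p + 1)] = (a - b) * w
        src, dst = dst, src  # ping-pong: this pass's output feeds the next pass
        m, stride = half, stride * 2
    return src


def naive_dft(values):
    """O(n^2) reference transform used only to check the result."""
    n = len(values)
    return [sum(values[j] * cmath.exp(-2j * cmath.pi * j * k / n)
                for j in range(n))
            for k in range(n)]
```

Each pass reads one buffer and writes the other in already sorted positions, so the reordering cost of a classic in-place FFT is folded into the butterflies themselves.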

For Standard Lua (Interpreter-Optimized)

Zero-Allocation Processing: Replaces table churn with reusable buffers to minimize Garbage Collection overhead.
Fused Scrambling: Combines Hann windowing and bit-reversal into a single pass.
Precomputed Lookups: Uses pre-calculated trig tables and bit-reversal maps to avoid redundant math inside hot loops.
Speedup: Achieves approximately 2.5x faster processing compared to naive Lua implementations.
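
The interpreter-oriented ideas above (reusable buffers, a fused window and bit-reversal pass, and precomputed tables) can be sketched in Python as follows. This is an illustration of the general approach rather than the script's code; the function names and the symmetric Hann window definition are assumptions.

```python
import math


def make_tables(n):
    """Precompute bit-reversal map, Hann window, and twiddle tables for size n."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    bits = n.bit_length() - 1
    rev = [int(format(i, "0{}b".format(bits))[::-1], 2) for i in range(n)]
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    cos_t = [math.cos(2 * math.pi * k / n) for k in range(n // 2)]
    sin_t = [math.sin(2 * math.pi * k / n) for k in range(n // 2)]
    return rev, hann, cos_t, sin_t


def windowed_fft(samples, tables, re, im):
    """In-place radix-2 FFT into caller-owned buffers `re` and `im`."""
    rev, hann, cos_t, sin_t = tables
    n = len(rev)
    # Fused pass: apply the window and scatter to bit-reversed positions at once.
    for i in range(n):
        j = rev[i]
        re[j] = samples[i] * hann[i]
        im[j] = 0.0
    size = 2
    while size <= n:
        half, step = size // 2, n // size
        for start in range(0, n, size):
            for k in range(half):
                wr, wi = cos_t[k * step], -sin_t[k * step]  # table lookup, no trig
                i, j = start + k, start + k + half
                tr = wr * re[j] - wi * im[j]
                ti = wr * im[j] + wi * re[j]
                re[j], im[j] = re[i] - tr, im[i] - ti
                re[i], im[i] = re[i] + tr, im[i] + ti
        size *= 2


def naive_windowed_dft(samples, hann):
    """O(n^2) reference used only to check the result."""
    n = len(samples)
    out = []
    for k in range(n):
        acc = 0j
        for i in range(n):
            angle = -2 * math.pi * i * k / n
            acc += samples[i] * hann[i] * complex(math.cos(angle), math.sin(angle))
        out.append(acc)
    return out


# The tables and buffers are created once and reused for every frame.
tables = make_tables(16)
re_buf, im_buf = [0.0] * 16, [0.0] * 16
signal = [float((i * 7) % 11 - 5) for i in range(16)]
windowed_fft(signal, tables, re_buf, im_buf)
```

Allocating the tables and output buffers once and overwriting them per frame is what removes the per-frame table churn that a naive implementation would incur.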

3. Algorithmic Optimizations

Inverted Index Matching: Fingerprints are stored in a hash map ($O(1)$ lookup), allowing the scanner to instantly find potential matches without iterating through the reference data.
Precomputed Population Count: A 256-entry lookup table is used to calculate Hamming distances for video hashes, replacing bit-twiddling loops with a single table lookup per byte.
Gradient-Based Early Stopping: The scanner monitors the "match strength" gradient. Once a peak is found and the signal begins to fade, the scan aborts immediately, saving CPU time.
Asynchronous Concurrency: Uses mpv coroutines and multiple parallel FFmpeg workers (`audio_concurrency`, default 4) to spread work across CPU cores without blocking the player UI.
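
The precomputed population count can be illustrated with a short Python sketch. The names are hypothetical, hashes are assumed to be stored as 32-byte strings, and treating a distance equal to the threshold as a match is an assumption.

```python
# 256-entry table: POPCOUNT[b] is the number of set bits in the byte value b.
POPCOUNT = [bin(i).count("1") for i in range(256)]


def hamming_distance(hash_a, hash_b):
    """Count differing bits between two equal-length byte strings."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    # One XOR and one table lookup per byte instead of a per-bit loop.
    return sum(POPCOUNT[x ^ y] for x, y in zip(hash_a, hash_b))


def is_match(hash_a, hash_b, threshold=50):
    """threshold corresponds to video_threshold (0-256, lower is stricter)."""
    return hamming_distance(hash_a, hash_b) <= threshold


# Two 32-byte (256-bit) hashes that differ in the low 4 bits of the first 10 bytes.
reference = bytes(range(32))
candidate = bytes(b ^ 0x0F if i < 10 else b for i, b in enumerate(reference))
distance = hamming_distance(reference, candidate)
```

For a 256-bit hash this costs 32 lookups per comparison, which matters because the video scan compares one hash per sampled frame.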

Install FFmpeg

This script relies on ffmpeg being available in your system's PATH.

Windows

Using a package manager (recommended):

Winget:

winget install ffmpeg

Chocolatey:

choco install ffmpeg

Scoop:

scoop install ffmpeg

macOS

Using Homebrew:

brew install ffmpeg

Linux

Debian/Ubuntu:

sudo apt update && sudo apt install ffmpeg

Fedora:

sudo dnf install ffmpeg

Arch Linux:

sudo pacman -S ffmpeg

Troubleshooting

"Audio Rejected" / "Frame Rejected":
- Cause: The scene is too simple (silence, black screen, featureless background) to generate a unique fingerprint.
- Solution: Seek forward or backward by a few seconds to a scene with clear audio (dialogue/music) or visual detail, then press Ctrl+i again.
"FFmpeg failed during scan":
- Cause: ffmpeg is missing or not in system PATH.
- Solution: Install FFmpeg and verify it runs from a terminal.
No match found:
- Video: Try increasing video_threshold in config, or ensure the intro is visually identical.
- Audio: Ensure the intro music is consistent. If the intro has variable music but the same video, use Video Skip (Ctrl+Shift+s).

Verifying LuaJIT Support

This script is highly optimized for LuaJIT. While it includes a fallback for standard Lua (5.1/5.2), using LuaJIT provides significantly faster performance, especially for audio scanning.

To check if your mpv build uses LuaJIT, run the following command in your terminal:

Windows:

mpv --version -v | findstr luajit

macOS / Linux:

mpv --version -v | grep luajit

If the command returns a line containing luajit, you are good to go. If it returns nothing, you are likely using standard Lua.

If luajit is missing:

Windows: The following package managers typically install the shinchiro builds (or equivalent), which include LuaJIT support.
- Scoop:
```
scoop bucket add extras
scoop install mpv
```
- Chocolatey: choco install mpvio
- Winget: winget install "mpv (Unofficial)"
- Or download the official builds directly from mpv.io (select the shinchiro builds).
macOS: Install via Homebrew (brew install mpv).
Linux:
- Arch Linux: Install with Pacman (pacman -S mpv)
- Ubuntu: The default mpv package in apt often lacks LuaJIT support or is outdated. Use the ubuntuhandbook1/mpv PPA:
```
sudo add-apt-repository ppa:ubuntuhandbook1/mpv
sudo apt update
sudo apt install mpv
```
- Fedora: The default repositories may lack full codec support or features. Use RPMFusion:
```
sudo dnf install https://mirrors.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm
sudo dnf install mpv
```
- Other Distributions: Install via Flatpak from Flathub.

Development & Testing

You can use the provided VS Code DevContainer to test the script in a pre-configured Linux environment:

Open the project in VS Code.
Click Reopen in Container when prompted.
The container comes with mpv, ffmpeg, and xvfb pre-installed.
To test: xvfb-run mpv --script=main.lua videos
- Note: Place your test videos in the videos/ folder in the project root to have them available inside the container.
