
An MPV script to skip intro sequences in videos by fingerprinting audio and video


jjangsangy/intro-fingerprint


intro-fingerprint


Platform: mpv | LuaJIT Optimized | License: MIT

An MPV script to skip intro sequences in media by fingerprinting audio and video.

When you mark an intro in one episode, the script can search for that same intro in other episodes (using either video or audio matching) and skip it automatically.

Features

  • Audio Fingerprinting: Uses Constellation Hashing to find identical audio patterns, robust to noise and distortion. (Recommended/Default)
  • Video Fingerprinting: Uses PDQ Hash (Perceptual Hashing) to find visually similar intros.
  • High Performance:
    • Uses LuaJIT FFI for zero-allocation data processing to handle large audio/video datasets efficiently.
    • Optimized Pure-Lua Fallback for environments without LuaJIT (e.g., some Linux builds), with FFTs roughly 2.5x faster than naive pure-Lua implementations.
  • Async Execution: Scans run in the background using mpv coroutines and async subprocesses, ensuring the player remains responsive.
  • Cross-Platform: Supports Windows, Linux, and macOS (with appropriate dependencies).

Requirements

  • ffmpeg (required) must be in your system PATH. (See Install FFmpeg below.)
  • LuaJIT (optional, but highly recommended; included in many mpv builds). The script uses FFI C-arrays for audio processing to avoid the heavy garbage-collection overhead of Lua tables. (See Verifying LuaJIT Support below.)
  • 'bit' library (optional): Standard in LuaJIT. Used for faster processing if available.

Installation

Automatic (Windows)

Run the following command in PowerShell:

irm https://raw.githubusercontent.com/jjangsangy/intro-fingerprint/main/installers/install.ps1 | iex

Automatic (Linux / macOS)

Run the following command in your terminal:

curl -fsSL https://raw.githubusercontent.com/jjangsangy/intro-fingerprint/main/installers/install.sh | sh

Manual

  1. Download the latest release.
  2. Extract the contents directly into your mpv configuration directory:
    • Windows: %APPDATA%\mpv\
    • Linux/macOS: ~/.config/mpv/

Note: Automatic install scripts do not work for portable_config directories. If you are using a portable config, you must install it manually.

Usage

  1. Open a video that contains the intro you want to skip.
  2. Seek to the very end of the intro.
  3. Press Ctrl+i to save the fingerprint. This captures both the video frame and the audio spectrogram data to temporary files.
  4. Open another video (e.g., the next episode).
  5. Press Ctrl+s (Audio scan) or Ctrl+Shift+s (Video scan) to find and skip the intro.

Key Bindings

  • Ctrl+i: Save Intro. Captures the current timestamp as the intro fingerprint (saves video frame and audio data to temp files).
  • Ctrl+s: Skip Intro (Audio). Scans the audio stream for a match based on the saved audio fingerprint.
    • Note: Audio fingerprinting is significantly faster and is the default method. However, if the intro music changes between episodes while the video remains the same, use Video Skip instead.
  • Ctrl+Shift+s: Skip Intro (Video). Scans the current video for a match based on the saved video fingerprint.

Customizing Key Bindings

You can customize the key bindings using either an intro-fingerprint.conf file or input.conf.

1. Using intro-fingerprint.conf

You can change the default key bindings by setting the following options in your intro-fingerprint.conf file:

key_save_intro=Ctrl+i
key_skip_audio=Ctrl+s
key_skip_video=Ctrl+Shift+s

2. Using input.conf

You can map any key to the script's named bindings in your input.conf file. The internal binding names are:

  • save-intro
  • skip-intro-audio
  • skip-intro-video

Example input.conf:

Alt+i script-binding save-intro
Alt+s script-binding skip-intro-audio
Alt+Shift+s script-binding skip-intro-video

Configuration

You can customize the script by creating intro-fingerprint.conf in your mpv script-opts folder (e.g., ~/.config/mpv/script-opts/ on Linux/macOS or %APPDATA%\mpv\script-opts\ on Windows).

General

Option | Default | Description
debug | no | Enable console debug printing for performance stats and scan info.

Audio Options

Option | Default | Description
audio_threshold | 10 | Minimum magnitude for frequency peaks, and minimum number of matching hashes for a valid skip.
audio_min_match_ratio | 0.30 | Minimum ratio of matching hashes required (0.0 - 1.0).
audio_concurrency | 4 | Number of parallel FFmpeg workers for audio scanning.
audio_scan_limit | 900 | Maximum number of seconds of the file to scan for audio matches.
audio_sample_rate | 11025 | Sample rate (Hz) for audio extraction.
audio_segment_duration | 15 | Duration (seconds) of each segment in the linear audio scan.
audio_fingerprint_duration | 10 | Duration (seconds) of the audio fingerprint to capture.
audio_fft_size | 2048 | FFT size (samples) for audio processing.
audio_hop_size | 1024 | Hop size (samples) between the starts of consecutive FFT frames; with the default FFT size this gives 50% overlap.
audio_target_t_min | 10 | Minimum delay (in FFT frames) between paired peaks in constellation hashing.
audio_target_t_max | 100 | Maximum delay (in FFT frames) between paired peaks in constellation hashing.
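Here is what the frame-based options mean in seconds. With the defaults, one FFT hop lasts 1024 / 11025 ≈ 0.093 s. So audio_target_t_min and audio_target_t_max pair peaks roughly 0.93 s to 9.3 s apart.

The Python sketch below does that unit conversion and plans the concurrent linear scan up to audio_scan_limit. It is illustrative only:

- The helper names are hypothetical.
- The padding rule is an assumption, not the script's documented behavior. Under that rule, each worker also decodes audio_target_t_max frames past its segment end, so peak pairs that straddle a boundary still form. It then keeps only hashes anchored inside its own segment.

```python
from __future__ import annotations

from dataclasses import dataclass

# Defaults taken from the Audio Options table.
SAMPLE_RATE = 11025       # audio_sample_rate (Hz)
HOP_SIZE = 1024           # audio_hop_size (samples)
TARGET_T_MAX = 100        # audio_target_t_max (FFT frames)
SEGMENT_DURATION = 15.0   # audio_segment_duration (s)
SCAN_LIMIT = 900.0        # audio_scan_limit (s)


@dataclass
class Segment:
    start: float     # first second this worker owns (hashes anchored here are kept)
    end: float       # end of the owned span (exclusive)
    read_end: float  # padded end actually decoded (assumption: t_max frames of padding)


def frame_seconds(sample_rate: int = SAMPLE_RATE, hop: int = HOP_SIZE) -> float:
    """Duration of one FFT hop, i.e. the time step between spectrogram frames."""
    return hop / sample_rate


def plan_segments(duration: float) -> list[Segment]:
    """Split the first min(duration, audio_scan_limit) seconds into worker segments."""
    limit = min(duration, SCAN_LIMIT)
    pad = TARGET_T_MAX * frame_seconds()  # longest peak-pair span, in seconds
    segments: list[Segment] = []
    start = 0.0
    while start < limit:
        end = min(start + SEGMENT_DURATION, limit)
        segments.append(Segment(start, end, min(duration, end + pad)))
        start = end
    return segments


if __name__ == "__main__":
    segs = plan_segments(24 * 60.0)  # a 24-minute episode
    print(len(segs), segs[0])
```

For a 24-minute episode, only the first 900 s are planned. That gives sixty 15-second segments, which would be spread over audio_concurrency workers.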

Audio Validation Options

Option | Default | Description
audio_silence_threshold | 0.005 | RMS amplitude threshold below which audio is considered silence.
audio_sparsity_threshold | 0.10 | Minimum signal density (non-zero samples ratio).
audio_min_complexity | 50 | Minimum number of hashes required for a valid fingerprint.

Video Options

Option | Default | Description
video_hash_size | 64 | Hash size (64x64 input -> 16x16 DCT -> 256-bit hash).
video_threshold | 50 | Tolerance for Hamming Distance (0-256). Lower is stricter.
video_interval | 0.20 | Time interval (seconds) between checked frames during video scan.
video_search_window | 10 | Initial seconds before/after saved timestamp to search.
video_max_search_window | 300 | Maximum seconds to expand the search window.
video_window_step | 30 | Step size (seconds) when expanding the video search window.
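The Python sketch below shows one plausible reading of the three window options. The function name and the exact band order are assumptions for illustration. The schedule works like this:

- The scan first covers video_search_window seconds on each side of the saved timestamp.
- It then widens by video_window_step until it reaches video_max_search_window.
- Each time it widens, it decodes only the newly added bands.

```python
from __future__ import annotations


def search_bands(saved_ts: float, duration: float, initial: float = 10.0,
                 step: float = 30.0, max_window: float = 300.0) -> list[tuple[float, float]]:
    """Return (start, end) ranges to decode, nearest to saved_ts first, with no overlap.

    initial, step and max_window mirror video_search_window, video_window_step and
    video_max_search_window. The expansion schedule itself is an assumption.
    """
    bands: list[tuple[float, float]] = []
    lo = hi = saved_ts          # region already covered
    window = initial
    while True:
        new_lo = max(0.0, saved_ts - window)
        new_hi = min(duration, saved_ts + window)
        if new_lo < lo:         # newly exposed band before the covered region
            bands.append((new_lo, lo))
        if new_hi > hi:         # newly exposed band after the covered region
            bands.append((hi, new_hi))
        lo, hi = new_lo, new_hi
        if window >= max_window:
            break
        window = min(window + step, max_window)
    return bands
```

Example: for a fingerprint saved at 600 s, the first bands are (590, 600), (600, 610), (560, 590) and (610, 640). A scanner would stop at the first band that yields a match, so most of the stream is never decoded.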

Video Validation Options

Option | Default | Description
video_min_brightness | 15 | Minimum mean brightness (0-255).
video_max_brightness | 240 | Maximum mean brightness (0-255).
video_min_contrast | 10.0 | Minimum standard deviation.
video_min_entropy | 4.0 | Minimum entropy (0-8).
video_min_quality | 50 | Minimum PDQ quality score (0-100).

File Paths

Option | Default | Description
audio_temp_filename | mpv_intro_skipper_audio.dat | Name of the temp file used for audio.
video_temp_filename | mpv_intro_skipper_video.dat | Name of the temp file used for video.

Key Bindings

Option | Default | Description
key_save_intro | Ctrl+i | Key binding to save the intro fingerprint.
key_skip_video | Ctrl+Shift+s | Key binding to skip using video fingerprinting.
key_skip_audio | Ctrl+s | Key binding to skip using audio fingerprinting.

Quality Validation

To prevent false positives and wasted scans, the script validates media quality before creating a fingerprint.

Audio Validation

If the audio is too simple or quiet, you will see an "Audio Rejected" message. This happens if:

  • Silence Detected: Audio is too quiet (RMS < 0.005).
  • Signal Too Sparse: Audio is mostly silence (< 10% active samples).
  • Low Complexity: Audio lacks distinct frequency peaks (< 50 hashes generated).
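Here is a minimal Python sketch of these three checks. It rests on these assumptions:

- Samples are signed 16-bit integers, normalized to [-1, 1] for the RMS test.
- "Active" means non-zero, as the audio_sparsity_threshold description says.
- The function name is hypothetical.
- The hash count is taken as an input rather than computed here.

```python
from __future__ import annotations

import math

SILENCE_THRESHOLD = 0.005   # audio_silence_threshold (RMS, samples scaled to [-1, 1])
SPARSITY_THRESHOLD = 0.10   # audio_sparsity_threshold (fraction of non-zero samples)
MIN_COMPLEXITY = 50         # audio_min_complexity (number of hashes)


def validate_audio(samples_s16: list[int], hash_count: int) -> str | None:
    """Return None if the fingerprint is usable, else the rejection reason."""
    if not samples_s16:
        return "Silence Detected"
    # RMS of the normalized signal.
    rms = math.sqrt(sum((s / 32768.0) ** 2 for s in samples_s16) / len(samples_s16))
    if rms < SILENCE_THRESHOLD:
        return "Silence Detected"
    # Fraction of samples that carry any signal at all.
    density = sum(1 for s in samples_s16 if s != 0) / len(samples_s16)
    if density < SPARSITY_THRESHOLD:
        return "Signal Too Sparse"
    if hash_count < MIN_COMPLEXITY:
        return "Low Complexity"
    return None
```

The checks run from cheapest to most expensive. Silent or sparse clips are rejected before any FFT or hashing work would be needed.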

Video Validation (Frame Rejection)

To ensure robust matching, the system automatically validates frames before creating a fingerprint. A frame is rejected if it fails any of the following checks:

  1. Extreme Darkness/Brightness: The image is almost entirely black (Mean < 15) or white (Mean > 240).
  2. Low Contrast: The image looks flat with little variation in brightness (StdDev < 10.0).
  3. Low Structure: The image lacks distinct edges or consists of smooth gradients (PDQ Quality < 50).
  4. Low Information: The image is too simple or repetitive (Entropy < 4.0).
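The four frame checks can be sketched in Python on a 64x64 grayscale frame:

- Mean, standard deviation and Shannon entropy follow directly from the descriptions above. Entropy is computed over a 256-bin histogram in bits, hence the 0-8 range.
- The quality function is an assumption. It is modeled on the gradient-based metric in the PDQ reference implementation and may not match the script's exact formula.
- All names are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def frame_stats(pixels: list[list[int]]) -> tuple[float, float, float]:
    """Mean, population standard deviation and Shannon entropy (bits) of a gray frame."""
    flat = [p for row in pixels for p in row]
    n = len(flat)
    mean = sum(flat) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in flat) / n)
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(flat).values())
    return mean, std, entropy


def pdq_quality(pixels: list[list[int]]) -> int:
    """Gradient-based quality in [0, 100] (assumed to resemble PDQ's metric)."""
    h, w = len(pixels), len(pixels[0])
    grad = 0.0
    for i in range(h - 1):          # vertical neighbor differences
        for j in range(w):
            grad += abs((pixels[i][j] - pixels[i + 1][j]) * 100 / 255)
    for i in range(h):              # horizontal neighbor differences
        for j in range(w - 1):
            grad += abs((pixels[i][j] - pixels[i][j + 1]) * 100 / 255)
    return min(100, int(grad / 90))


def reject_reason(pixels, min_b=15, max_b=240, min_c=10.0, min_q=50, min_e=4.0):
    """Apply the checks in the documented order; return None if the frame is accepted."""
    mean, std, entropy = frame_stats(pixels)
    if mean < min_b or mean > max_b:
        return "Extreme Darkness/Brightness"
    if std < min_c:
        return "Low Contrast"
    if pdq_quality(pixels) < min_q:
        return "Low Structure"
    if entropy < min_e:
        return "Low Information"
    return None
```

Under this sketch, a smooth left-to-right gradient passes the brightness and contrast checks but fails on structure. This matches the "Low Structure" example below.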

Examples

1. Good Frame (Accepted)

[Figure: original frame and what the PDQ hash sees, both accepted]

Reason: High Quality. The image has distinct edges, good contrast, and clear shapes that remain visible even after resizing. This produces a strong, unique fingerprint.

2. Bad Frame (Too Dark & Flat)

[Figure: original frame and what the PDQ hash sees, both rejected]

Reason: Extremely Dark & Low Contrast. The scene is too dim to extract meaningful features. The PDQ algorithm effectively sees a black square, which would match any other dark scene.

3. Bad Frame (Low Structure)

[Figure: original frame ("Waves") and what the PDQ hash sees]

Reason: Lack of Sharp Edges. The image consists of smooth color transitions (gradients) without any sharp lines. PDQ's quality score is computed from the strength of local brightness changes between neighboring pixels, so smooth blurs produce a weak fingerprint that fails the quality check (video_min_quality).

4. Weak Frame (Low Texture)

[Figure: original frame ("Betrayal") and what the PDQ hash sees]

Reason: Low Feature Density. While this frame technically passes the rejection thresholds, it is a borderline candidate. Large areas of the image are flat color (low texture), meaning the hash has fewer "anchors" than a highly detailed scene. It is better to choose a frame with more complex details if possible.

Tip: Always choose a frame with clear shapes, high contrast, and distinct objects. If you encounter errors, try moving the playback position slightly forward or backward to a more complex part of the intro.

How it Works

The script uses two primary methods for fingerprinting:

1. Audio Fingerprinting (Constellation Hashing)


  • Algorithm: Extracts audio with FFmpeg (s16le, mono) and performs an FFT to identify peak frequencies in time-frequency bins.
  • Hashing: Pairs peaks to form hashes: [f1][f2][delta_time].
  • Matching: Uses a Global Offset Histogram. For every matching hash, the script computes $\text{Offset} = T_{\text{file}} - T_{\text{query}}$ and then looks for the largest cluster (peak) of consistent offsets.
  • Filtering: Applies a minimum match ratio (default 30%, audio_min_match_ratio) so that a match reflects a genuine overlap with the saved fingerprint rather than merely similar-sounding music.
  • Search Strategy: Concurrent Linear Scan. The timeline is divided into contiguous segments (15 s by default, set by audio_segment_duration). Each segment is processed by a concurrent worker with enough padding that no matches are lost at segment boundaries, and hashes are filtered to prevent double-counting in the overlapping regions.
  • Optimization:
    • Concurrency: Launches multiple parallel FFmpeg workers (audio_concurrency, default 4) to use multiple CPU cores.
    • Inverted Index: Uses a hash map with $O(1)$ average-case lookup to find fingerprint hashes almost instantly during the scan.
    • Optimal Stopping: Scans terminate immediately once a high-confidence match is confirmed and the signal gradient drops.
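The hashing and matching steps above can be sketched in Python. This is an illustration under stated assumptions, not the script's code:

- Peaks are given as (frame, frequency_bin) tuples.
- Each anchor pairs with at most fan_out later peaks. fan_out is a hypothetical cap, common in constellation schemes.
- A match is accepted when the winning offset has at least audio_threshold votes and covers at least audio_min_match_ratio of the query hashes.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def constellation_hashes(peaks, t_min=10, t_max=100, fan_out=5):
    """peaks: (frame, freq_bin) tuples. Returns [((f1, f2, dt), t1), ...]."""
    peaks = sorted(peaks)
    out = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt > t_max:          # sorted by time, so no later peak can qualify
                break
            if dt >= t_min:
                out.append(((f1, f2, dt), t1))
                paired += 1
                if paired == fan_out:
                    break
    return out


def build_index(query_hashes):
    """Inverted index: hash -> list of anchor frames in the saved fingerprint."""
    index = defaultdict(list)
    for h, t in query_hashes:
        index[h].append(t)
    return index


def best_offset(index, n_query, file_hashes, min_matches=10, min_ratio=0.30):
    """Vote on T_file - T_query; return (offset_frames, votes) or None."""
    votes = Counter()
    for h, t_file in file_hashes:
        for t_query in index.get(h, ()):
            votes[t_file - t_query] += 1
    if not votes:
        return None
    offset, count = votes.most_common(1)[0]
    if count < min_matches or count / n_query < min_ratio:
        return None
    return offset, count
```

The returned offset is in FFT frames. Multiplying it by audio_hop_size / audio_sample_rate converts it to seconds. Because every true match votes for the same offset, while chance collisions scatter across many offsets, the histogram peak stands out even in noisy audio.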

2. Video Fingerprinting (PDQ Hash)


  • Algorithm: Downsamples frames to 512x512, converts to grayscale (Luma), and applies a 2-pass Jarosz filter. Then, resizes to 64x64 and computes the Discrete Cosine Transform (DCT) of the rows and columns. A 256-bit hash (32 bytes) is generated from the low-frequency 16x16 coefficients by comparing each coefficient against the median value.
  • Matching: Uses Hamming Distance (count of differing bits). It is robust against color changes, small aspect ratio variations, and high-frequency noise.
  • Search Strategy: The search starts around the timestamp of the saved fingerprint and expands outward.
  • Optimization: FFmpeg video decoding is the most expensive part of the pipeline. By assuming the intro is at a similar location (common in episodic content), we avoid decoding the entire stream, resulting in much faster scans.
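Here is a compact Python sketch of the DCT-and-median step. It makes these assumptions:

- It uses an orthonormal DCT-II.
- It keeps the 16x16 lowest-frequency block, including the DC term.
- pdq_like_hash is a hypothetical name.

The official PDQ implementation and the script may choose coefficients or scaling differently, so treat this as a simplified illustration.

```python
from __future__ import annotations

import math
import statistics

N, K = 64, 16
# First K rows of an orthonormal DCT-II basis for length-N signals.
DCT = [[math.sqrt((1 if k == 0 else 2) / N) * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
        for n in range(N)] for k in range(K)]


def pdq_like_hash(pixels: list[list[float]]) -> bytes:
    """pixels: 64x64 luma values. Returns a 32-byte (256-bit) hash."""
    # Vertical pass: DCT (16x64) @ pixels (64x64) -> 16x64
    tmp = [[sum(DCT[k][n] * pixels[n][j] for n in range(N)) for j in range(N)]
           for k in range(K)]
    # Horizontal pass: tmp (16x64) @ DCT^T (64x16) -> 16x16 low-frequency block
    coeffs = [[sum(tmp[k][j] * DCT[l][j] for j in range(N)) for l in range(K)]
              for k in range(K)]
    flat = [c for row in coeffs for c in row]
    med = statistics.median(flat)
    bits = 0
    for i, c in enumerate(flat):
        if c > med:                 # one bit per coefficient: above the median or not
            bits |= 1 << i
    return bits.to_bytes(32, "little")
```

The DCT is linear and orthonormal, so adding a constant brightness to every pixel changes only the DC coefficient. A uniformly brightened copy of a frame therefore usually hashes identically. This is one reason the hash tolerates global tone shifts.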

Jarosz Filter Approximation

The script approximates the Jarosz filter (essential for PDQ robustness) using an optimized FFmpeg filter chain: scale=512:512:flags=bilinear, colorchannelmixer (exact luminance weights), avgblur=sizeX=4:sizeY=4 (applied twice), and scale=64:64:flags=neighbor. This configuration closely matches, but is not identical to, the official PDQ C++ implementation.
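For reference, this Python helper assembles an FFmpeg command with that filter chain. The helper name is hypothetical. These parts are assumptions, and the script's actual invocation may differ:

- the BT.601 luma weights passed to colorchannelmixer;
- the trailing format=gray;
- the seek and output flags.

```python
from __future__ import annotations

LUMA = (0.299, 0.587, 0.114)  # assumed BT.601 weights


def pdq_frame_cmd(path: str, timestamp: float) -> list[str]:
    """Build an FFmpeg argv that writes one 64x64 8-bit grayscale frame to stdout."""
    r, g, b = LUMA
    # Every output channel gets the same weighted sum, i.e. the luminance.
    mixer = ":".join(f"{ch}r={r}:{ch}g={g}:{ch}b={b}" for ch in "rgb")
    filters = ",".join([
        "scale=512:512:flags=bilinear",
        f"colorchannelmixer={mixer}",
        "avgblur=sizeX=4:sizeY=4",      # first box-blur pass
        "avgblur=sizeX=4:sizeY=4",      # second pass approximates the Jarosz filter
        "scale=64:64:flags=neighbor",
        "format=gray",
    ])
    return ["ffmpeg", "-hide_banner", "-loglevel", "error",
            "-ss", f"{timestamp:.3f}", "-i", path,
            "-frames:v", "1", "-vf", filters,
            "-f", "rawvideo", "-pix_fmt", "gray", "-"]
```

You could run the result with subprocess.run(cmd, capture_output=True). Its stdout should then hold 64 × 64 = 4096 bytes, one luma value per pixel, ready for the DCT step.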

Performance & Technical Details

The script is heavily optimized for LuaJIT and high-performance processing.

1. LuaJIT FFI & Memory Management

  • Zero-Allocation Data Processing: Critical hot paths use LuaJIT FFI C-arrays (double[], int16_t[]) instead of Lua tables. This prevents massive Garbage Collection (GC) pauses that would occur if creating millions of small table objects for audio samples and hashes.
  • Flattened Data Structures: 2D data (like spectrogram peaks) is flattened into 1D C-arrays to ensure memory contiguity and cache friendliness.
  • Direct Memory Access: Raw audio and video buffers from FFmpeg are cast directly to C-structs using FFI, avoiding any copying or string manipulation in Lua.

2. Audio FFT Processing

The script uses highly optimized internal FFT implementations:

For LuaJIT (FFI-Optimized)

  • Stockham Auto-Sort Algorithm: Avoids the expensive bit-reversal permutation step, maximizing FFI performance.
  • Radix-4 & Mixed-Radix: Processes 4 points at a time to reduce complex multiplications, with Radix-2 passes to handle sizes that are not powers of 4 (e.g., 2048 = 2 × 4^5).
  • Cache-Aware Loop Tiling: Ensures unit-stride memory access for maximum memory throughput.
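The Stockham idea can be seen in a simple radix-2 version, written here in Python for readability:

- It alternates between two buffers, so the output comes out in natural order with no bit-reversal pass.
- The innermost loop reads contiguous (unit-stride) indices.

The script's radix-4/mixed-radix FFI kernel is more involved. This sketch only illustrates the autosort structure, and the function name is hypothetical.

```python
from __future__ import annotations

import cmath


def stockham_fft(x: list[complex]) -> list[complex]:
    """Radix-2 Stockham autosort FFT (decimation in frequency); len(x) must be a power of 2."""
    n = len(x)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [complex(v) for v in x]
    b = [0j] * n
    size, stride = n, 1
    while size > 1:
        half = size // 2
        for p in range(half):
            w = cmath.exp(-2j * cmath.pi * p / size)   # twiddle factor for this butterfly row
            for q in range(stride):                    # contiguous reads: a[q + stride*p]
                u = a[q + stride * p]
                v = a[q + stride * (p + half)]
                b[q + stride * 2 * p] = u + v
                b[q + stride * (2 * p + 1)] = (u - v) * w
        a, b = b, a                                    # swap buffers instead of permuting
        size, stride = half, stride * 2
    return a
```

Each pass writes into the other buffer in an order that leaves the final result sorted. This is why the expensive bit-reversal permutation disappears.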

For Standard Lua (Interpreter-Optimized)

  • Zero-Allocation Processing: Replaces table churn with reusable buffers to minimize Garbage Collection overhead.
  • Fused Scrambling: Combines Hann windowing and bit-reversal into a single pass.
  • Precomputed Lookups: Uses pre-calculated trig tables and bit-reversal maps to avoid redundant math inside hot loops.
  • Speedup: Achieves approximately 2.5x faster processing compared to naive Lua implementations.
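Here is a Python sketch of the interpreter-friendly approach. It has three parts:

- precomputed bit-reversal, Hann-window and twiddle tables;
- one fused pass that windows each sample and writes it straight to its bit-reversed slot;
- an in-place radix-2 FFT into caller-supplied buffers, which can be reused across frames.

The periodic Hann window and all names are assumptions for illustration.

```python
from __future__ import annotations

import math


def make_tables(n: int):
    """Precompute bit-reversal map, periodic Hann window and twiddle tables."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    bits = n.bit_length() - 1
    rev = [int(format(i, f"0{bits}b")[::-1], 2) for i in range(n)]
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    cos_t = [math.cos(2 * math.pi * k / n) for k in range(n // 2)]
    sin_t = [-math.sin(2 * math.pi * k / n) for k in range(n // 2)]
    return rev, hann, cos_t, sin_t


def windowed_fft(samples, tables, re, im):
    """Fill the reusable buffers re/im with the FFT of the Hann-windowed samples."""
    rev, hann, cos_t, sin_t = tables
    n = len(rev)
    # Fused pass: window each sample and store it at its bit-reversed slot.
    for i in range(n):
        j = rev[i]
        re[j] = samples[i] * hann[i]
        im[j] = 0.0
    size = 2
    while size <= n:                       # iterative radix-2 butterflies
        half, step = size // 2, n // size
        for start in range(0, n, size):
            for k in range(half):
                wr, wi = cos_t[k * step], sin_t[k * step]  # table lookup, no trig
                a, b = start + k, start + k + half
                tr = re[b] * wr - im[b] * wi
                ti = re[b] * wi + im[b] * wr
                re[b], im[b] = re[a] - tr, im[a] - ti
                re[a], im[a] = re[a] + tr, im[a] + ti
        size *= 2
    return re, im
```

The tables are built once per FFT size. The per-frame work then does no allocation and no trigonometry, which is where an interpreter-only Lua gains most.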

3. Algorithmic Optimizations

  • Inverted Index Matching: Fingerprints are stored in a hash map ($O(1)$ lookup), allowing the scanner to instantly find potential matches without iterating through the reference data.
  • Precomputed Population Count: A 256-entry lookup table is used to calculate Hamming distances for video hashes, replacing bit-twiddling loops with a single table lookup per byte.
  • Gradient-Based Early Stopping: The scanner monitors the "match strength" gradient. Once a peak is found and the signal begins to fade, the scan aborts immediately, saving CPU time.
  • Asynchronous Concurrency: Uses mpv coroutines and multiple parallel FFmpeg workers to use multiple CPU cores without blocking the player UI.
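The popcount table is easy to illustrate in Python: XOR the two 32-byte hashes and sum one table lookup per byte. The names are hypothetical. Treating a distance at or below video_threshold as a match is an assumption about the comparison direction.

```python
from __future__ import annotations

# Number of set bits for every possible byte value, computed once.
POPCOUNT = [bin(i).count("1") for i in range(256)]


def hamming(h1: bytes, h2: bytes) -> int:
    """Count differing bits between two equal-length hashes, one lookup per byte."""
    if len(h1) != len(h2):
        raise ValueError("hashes must have the same length")
    return sum(POPCOUNT[a ^ b] for a, b in zip(h1, h2))


def is_match(h1: bytes, h2: bytes, threshold: int = 50) -> bool:
    """Assumed rule: distances up to video_threshold (0-256) count as a match."""
    return hamming(h1, h2) <= threshold
```

A 256-bit hash needs only 32 lookups per comparison. That matters because a video scan compares the saved hash against every sampled frame (one every video_interval seconds).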

Install FFmpeg

This script relies on ffmpeg being available in your system's PATH.

Windows

Using a package manager (recommended):

Winget:

winget install ffmpeg

Chocolatey:

choco install ffmpeg

Scoop:

scoop install ffmpeg

macOS

Using Homebrew:

brew install ffmpeg

Linux

Debian/Ubuntu:

sudo apt update && sudo apt install ffmpeg

Fedora:

sudo dnf install ffmpeg

Arch Linux:

sudo pacman -S ffmpeg

Troubleshooting

  • "Audio Rejected" / "Frame Rejected":

    • Cause: The scene is too simple (silence, black screen, featureless background) to generate a unique fingerprint.
    • Solution: Seek forward or backward by a few seconds to a scene with clear audio (dialogue/music) or visual detail, then press Ctrl+i again.
  • "FFmpeg failed during scan":

    • Cause: ffmpeg is missing or not in system PATH.
    • Solution: Install FFmpeg and verify it runs from a terminal.
  • No match found:

    • Video: Try increasing video_threshold in config, or ensure the intro is visually identical.
    • Audio: Ensure the intro music is consistent. If the intro has variable music but same video, use Video Skip (Ctrl+Shift+s).

Verifying LuaJIT Support

This script is highly optimized for LuaJIT. While it includes a fallback for standard Lua (5.1/5.2), using LuaJIT provides significantly faster performance, especially for audio scanning.

To check if your mpv build uses LuaJIT, run the following command in your terminal:

Windows:

mpv --version -v | findstr luajit

macOS / Linux:

mpv --version -v | grep luajit

If the command returns a line containing luajit, you are good to go. If it returns nothing, you are likely using standard Lua.

If luajit is missing:

  • Windows: These package managers typically install the shinchiro builds (or equivalent), which include LuaJIT support.
    • Scoop:
      scoop bucket add extras
      scoop install mpv
    • Chocolatey: choco install mpvio
    • Winget: winget install "mpv (Unofficial)"
    • Or download a build linked from mpv.io (select the shinchiro builds).
  • macOS: Install via Homebrew (brew install mpv).
  • Linux:
    • Arch Linux: Install with Pacman (sudo pacman -S mpv).
    • Ubuntu: The default mpv package in apt is often outdated or lacks LuaJIT support. Use the ubuntuhandbook1/mpv PPA:
      sudo add-apt-repository ppa:ubuntuhandbook1/mpv
      sudo apt update
      sudo apt install mpv
    • Fedora: The default repositories may lack full codec support or features. Use RPMFusion:
      sudo dnf install https://mirrors.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm
      sudo dnf install mpv
    • Other Distributions: Install via Flatpak from Flathub.

Development & Testing

You can use the provided VS Code DevContainer to test the script in a pre-configured Linux environment:

  1. Open the project in VS Code.
  2. Click Reopen in Container when prompted.
  3. The container comes with mpv, ffmpeg, and xvfb pre-installed.
  4. To test: xvfb-run mpv --script=main.lua videos
    • Note: Place your test videos in the videos/ folder in the project root to have them available inside the container.
