doccstat/llm-watermark-adaptive

A Likelihood Based Approach for Watermark Detection

Implementation of the methods described in "A Likelihood Based Approach for Watermark Detection" by Xingchi Li, Guanxun Li, Xianyang Zhang.

Proceedings of Machine Learning Research OpenReview

Prerequisites

Python environments
  • Cython==3.0.10
  • datasets==2.19.1
  • huggingface_hub==0.23.0
  • nltk==3.8.1
  • numpy==1.26.4
  • sacremoses==0.0.53
  • scipy==1.13.0
  • sentencepiece==0.2.0
  • tokenizers==0.19.1
  • torch==2.3.0.post100
  • torchaudio==2.3.0
  • torchvision==0.18.0
  • tqdm==4.66.4
  • transformers==4.40.2
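To confirm that an existing environment matches the pinned versions above, a small check along the following lines may help. This is a minimal sketch, not part of the repository: the `PINNED` dictionary is copied from the list above, and `find_mismatches` is a hypothetical helper name. It relies only on the standard-library `importlib.metadata` module.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions copied from the prerequisites list above.
PINNED = {
    "Cython": "3.0.10",
    "datasets": "2.19.1",
    "huggingface_hub": "0.23.0",
    "nltk": "3.8.1",
    "numpy": "1.26.4",
    "sacremoses": "0.0.53",
    "scipy": "1.13.0",
    "sentencepiece": "0.2.0",
    "tokenizers": "0.19.1",
    "torch": "2.3.0.post100",
    "torchaudio": "2.3.0",
    "torchvision": "0.18.0",
    "tqdm": "4.66.4",
    "transformers": "4.40.2",
}


def find_mismatches(pinned):
    """Return {package: (expected, installed)} for packages that differ.

    `installed` is None when the package is not installed at all.
    (Hypothetical helper, not provided by the repository.)
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in sorted(find_mismatches(PINNED).items()):
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

The comparison is an exact string match, so a build-specific version string (for example, a PyTorch build that reports a different local suffix) will be listed as a mismatch even when the release is the same; treat the output as a prompt to inspect, not as a failure.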

Set up environments

# PyTorch: https://pytorch.org/get-started/locally
# Transformers: https://huggingface.co/docs/transformers/en/installation
conda install cython scipy nltk sentencepiece sacremoses

Instructions

All experiments are conducted using the Slurm workload manager. The expected running time and memory usage are given in the corresponding sbatch scripts.

Important

Before running the experiments, modify the paths and Slurm mail options in the sbatch scripts and adjust the GPU resources to match your cluster.
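To see which Slurm options each script currently sets before editing them, the `#SBATCH` lines can be listed with a short script. This is a minimal sketch under stated assumptions: `sbatch_directives` is a hypothetical helper, not part of the repository, and it assumes the scripts sit in the current directory with a `.sh` extension. It stops at the first non-comment line because Slurm ignores `#SBATCH` directives that appear after the first command in a script.

```python
from pathlib import Path


def sbatch_directives(script_text):
    """Return the option strings from the #SBATCH lines of a batch script.

    Only the leading comment header is scanned, since Slurm stops reading
    directives at the first non-comment, non-blank line.
    (Hypothetical helper, not provided by the repository.)
    """
    directives = []
    for line in script_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines do not end the header
        if not stripped.startswith("#"):
            break  # first command reached; later directives are ignored
        if stripped.startswith("#SBATCH"):
            directives.append(stripped[len("#SBATCH"):].strip())
    return directives


if __name__ == "__main__":
    # Assumes the sbatch scripts are in the current working directory.
    for path in sorted(Path(".").glob("*.sh")):
        options = sbatch_directives(path.read_text(errors="replace"))
        if options:
            print(path.name)
            for option in options:
                print(f"  {option}")
```

Run from the repository root, this prints each script name followed by its directives, which makes it easier to spot the mail and GPU settings that need changing. Paths used inside the script bodies still have to be reviewed by hand.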

# Set up pyx.
sbatch 1-setup.sh

# Download models to local storage.
sbatch 2-download.sh

# Text generation.
bash 3-textgen-helper.sh
sbatch 3-textgen.sh

# Watermark detection.
bash 4-detect-helper.sh
sbatch 4-detect.sh

# Result analysis and plotting.
Rscript 5-analyze.R

Citation

@InProceedings{pmlr-v258-li25d,
  title = 	 {A Likelihood Based Approach for Watermark Detection},
  author =       {Li, Xingchi and Li, Guanxun and Zhang, Xianyang},
  booktitle = 	 {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {1675--1683},
  year = 	 {2025},
  editor = 	 {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume = 	 {258},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {03--05 May},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v258/main/assets/li25d/li25d.pdf},
  url = 	 {https://proceedings.mlr.press/v258/li25d.html},
  abstract = 	 {Watermarking techniques embed statistical signals within content generated by large language models to help trace its source. Although existing methods perform well on long texts, their effectiveness significantly decreases for shorter texts. We introduce a statistical detection approach that improves the power of watermark detection, particularly in shorter texts. Our method leverages both the watermark key sequence and the next token probabilities (NTPs) to determine whether a text is generated by a large language model. We demonstrate the optimality of our approach and analyze its power properties. We also investigate an approach to estimating NTPs and extend our method to scenarios where texts face potential attacks such as substitutions, insertions, or deletions. We validate the effectiveness of our technique using texts generated by Meta-Llama-3-8B from Meta and Mistral-7B-v0.1 from Mistral AI, utilizing prompts extracted from Google’s C4 dataset. In scenarios without attacks and with short text lengths, our method demonstrates approximately 65% power improvement compared to the baseline method on average. We release all code publicly at \url{https://github.com/doccstat/llm-watermark-adaptive}.}
}
OpenReview
@inproceedings{
  li2025a,
  title={A Likelihood Based Approach for Watermark Detection},
  author={Xingchi Li and Guanxun Li and Xianyang Zhang},
  booktitle={The 28th International Conference on Artificial Intelligence and Statistics},
  year={2025},
  url={https://openreview.net/forum?id=S2QoDt4bw4}
}
