Converting the surprisingness of language into musical compositions
Live Demo: https://surprisal.onrender.com
Demo video: `cat.mp4`
Surprisal theory suggests that the more surprising a word is in context, the longer it takes the human brain to process it. Consider these sentences:
- "The man fed the cat some tuna." (low surprisal)
- "The lawyer presented the cat with a lawsuit." (high surprisal)
The word "cat" is far more surprising in the legal context! This "surprisingness" can be quantified using Claude Shannon's information theory formula:
Surprisal(x) = -log₂ P(x | context)
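For instance, here is a minimal sketch of how per-word surprisal can be computed with the Hugging Face transformers library and GPT-2. This is illustrative only, not the app's actual code:

```python
# Minimal sketch: -log2 P(word | context) with GPT-2 (illustrative, not the app's code).
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def surprisal_bits(context: str, word: str) -> float:
    """Return the surprisal of `word` given `context`, summed over the word's tokens."""
    ids = tokenizer(context, return_tensors="pt").input_ids
    word_ids = tokenizer(" " + word, add_special_tokens=False).input_ids
    total = 0.0
    with torch.no_grad():
        for token_id in word_ids:
            logits = model(ids).logits[0, -1]                 # next-token logits
            log_probs = torch.log_softmax(logits, dim=-1)
            total += -log_probs[token_id].item() / math.log(2)  # nats -> bits
            ids = torch.cat([ids, torch.tensor([[token_id]])], dim=-1)
    return total

print(surprisal_bits("The man fed the", "cat"))            # low surprisal
print(surprisal_bits("The lawyer presented the", "cat"))   # higher surprisal
```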
- Text → Music: Input text → Calculate word surprisal → Map the numeric values to musical pitches → Generate melody (a sketch of one such pitch mapping appears after this list)
- Music → Text: Play musical notes → Find words that would have a similar surprisal value in the given context → Generate text
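As a rough illustration of the Text → Music step, here is one possible way to map surprisal values (in bits) onto pitches. The scale, range, and octave span are assumptions made for this sketch and are not necessarily what the app uses:

```python
# One possible surprisal-to-pitch mapping (assumed values; the app's scheme may differ).
def surprisal_to_midi(surprisal: float, low: float = 0.0, high: float = 20.0,
                      base_note: int = 48, scale=(0, 2, 4, 5, 7, 9, 11)) -> int:
    """Map a surprisal value in bits to a MIDI note on a C-major scale.

    Higher surprisal -> higher pitch, clamped to [low, high] bits and spread
    over two octaves starting at base_note (C3 = MIDI 48).
    """
    clamped = max(low, min(high, surprisal))
    steps = int(round((clamped - low) / (high - low) * (2 * len(scale) - 1)))
    octave, degree = divmod(steps, len(scale))
    return base_note + 12 * octave + scale[degree]

# Example: turn per-word surprisal values for a sentence into a melody
melody = [surprisal_to_midi(s) for s in [2.1, 4.8, 13.5, 6.2]]
print(melody)
```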
This is meant as a fun experiment to help build intuition about how humans process natural language as well as how LLMs model the compositional features of communication. The surprisal data of a sentence could be abstracted and presented in many different ways, but we thought musical melody would be a form where the abstraction actually uses some similar properties of perception and processing.
The results vary with the model used, both because of each model's tokenization process and because of the statistical patterns it learns during training. Making these differences audible (and interactive) has been a fun way to build new intuition and make the "black box" of the models' inner workings more accessible.
We have chosen to focus on small models, partly to lower the computational overhead required, but also to get a sense of how these little guys are trying to squeeze as much coherence as possible out of their training. The live demo only exposes one model, but cloning the repo and running it locally would allow you to experiment with the other models we have selected or to choose your own!
```bash
# Clone and setup
git clone https://github.com/wobblybits/surprisal.git
cd surprisal
pip install -r requirements.txt
python app.py
```

The first time you run it, the transformers library will download and cache the model tensors, which combined total roughly 3 GB. You can disable certain models in config.py. If you want to add your own models, you will need to edit app.py to provide the configuration details as well as enable them in config.py.
```
├── app.py                    # Main Flask application
├── assets/js/
│   ├── config.js             # Configuration and presets
│   ├── surprisal-app.js      # Main application logic
│   └── utilities.js          # Helper functions and error handling
├── templates/wireframe.html  # Main UI template
├── requirements.txt          # Python dependencies
└── .env.example
```
| Model | Size | Description |
|---|---|---|
| GPT-2 | 124M | OpenAI's foundational model |
| DistilGPT-2 | 88M | A distilled version of GPT-2 |
| SmolLM | 135M | Hugging Face's optimized small model |
| Nano Mistral | 170M | Compact Mistral variant |
| Qwen 2.5 | 494M | Multilingual question-answering model |
| Flan T5 | 74M | Google's text-to-text transformer |
Each model has different tokenization and surprisal characteristics, leading to unique musical interpretations.
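A quick way to see where those differences come from is to compare how two of the models split the same sentence into tokens. The model IDs below are the Hugging Face hub names for GPT-2 and SmolLM; this snippet is illustrative and not part of the app:

```python
# Compare tokenizations of the same sentence across two models (illustrative).
from transformers import AutoTokenizer

sentence = "The lawyer presented the cat with a lawsuit."
for name in ["gpt2", "HuggingFaceTB/SmolLM-135M"]:
    tok = AutoTokenizer.from_pretrained(name)
    print(name, tok.tokenize(sentence))
```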
- Testing the Predictions of Surprisal Theory in 11 Languages (Wilcox et al., TACL 2023)
- Expectation-based syntactic comprehension (Levy, Cognition 2008)
- A mathematical theory of communication (Shannon, 1948)
- Language models from Hugging Face
- Audio synthesis with Tone.js
- Icons from Flaticon
- Fonts from Google Fonts and Old School PC Fonts
- Sound effects from Pixabay
More detailed attributions are included at the bottom of the main HTML file (templates/wireframe.html).
Built with ❤️ at the Recurse Center.
