A pure Python implementation of a Hash Array Mapped Trie (HAMT), providing an immutable, persistent mapping data structure with efficient updates and structural sharing.
Key Advantage: HAMT is slower than dict for single operations, but it is up to 14x faster and uses ~99.8% less memory than dict.copy() when creating many variants of a mapping, making it ideal for functional programming and version management.
HAMT (Hash Array Mapped Trie) is a data structure that combines the benefits of hash tables and tries to create an efficient immutable mapping. It was popularized by languages like Clojure and Scala for implementing persistent data structures.
- Immutability: All operations return new instances, original remains unchanged
- Persistence: Old versions remain available after updates
- Structural Sharing: New versions share unchanged parts with old versions
- O(log32 n) Operations: Effectively constant time for practical data sizes
- Thread-Safe: Immutability makes it inherently safe for concurrent access
- Memory Efficient: Dramatically reduces memory use when creating many similar versions
- Python 3.11 or higher
- Virtual environment (recommended)
- Clone the repository:

```bash
git clone <repository-url>
cd hamt
```

- Create and activate a virtual environment:

```bash
python3 -m venv env
source env/bin/activate  # On Windows: env\Scripts\activate
```

- Install dependencies:
```bash
make install
```

Basic usage:

```python
from hamt import HAMT

# Create an empty HAMT
h = HAMT()

# Add key-value pairs (returns a new instance)
h1 = h.set('key1', 'value1')
h2 = h1.set('key2', 'value2')

# The original remains unchanged
assert len(h) == 0
assert len(h1) == 1
assert len(h2) == 2

# Access values
value = h2['key1']                 # 'value1'
value = h2.get('key3', 'default')  # 'default'

# Check membership
if 'key1' in h2:
    print("Key exists")

# Delete keys (returns a new instance)
h3 = h2.delete('key1')
assert 'key1' not in h3
assert 'key1' in h2  # Original unchanged

# Iteration
for key in h2:
    print(key, h2[key])

# Get all keys, values, and items
keys = h2.keys()
values = h2.values()
items = h2.items()
```

Constructing from existing data:

```python
# From a dictionary
d = {'a': 1, 'b': 2, 'c': 3}
h = HAMT(d)

# From a list of tuples
items = [('a', 1), ('b', 2), ('c', 3)]
h = HAMT(items)
```

Structural sharing vs. copying:

```python
# Traditional dict approach - full copies
base_dict = {'a': 1, 'b': 2, 'c': 3, ...}  # 10,000 items
variants = []
for i in range(500):
    variant = base_dict.copy()  # Full copy every time!
    variant[f'item_{i}'] = i
    variants.append(variant)
# Result: 500 full copies, ~140MB memory, 42ms

# HAMT approach - structural sharing
base_hamt = HAMT({'a': 1, 'b': 2, 'c': 3, ...})  # 10,000 items
variants = []
for i in range(500):
    variant = base_hamt.set(f'item_{i}', i)  # Shares 99.9% of structure!
    variants.append(variant)
# Result: 500 variants sharing structure, ~0.3MB memory, 3ms

# The magic: each variant only stores the differences.
# All unchanged nodes are shared between versions.
```

This structural sharing makes HAMT ideal for:
- Implementing undo/redo with full state snapshots
- Creating "what-if" scenarios without duplicating data
- Building version control systems for data structures
- Maintaining audit trails with complete state history
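The undo/redo item above can be sketched with a small history wrapper. This is a minimal sketch (the `History` class is my own, not part of this library) that works with any mapping whose updates return a new instance, such as this library's `HAMT.set`. For the runnable demo below, dict copies stand in for HAMT snapshots; with HAMT, each snapshot would share structure instead of being a full copy.

```python
class History:
    """Keep immutable snapshots for undo/redo."""

    def __init__(self, initial_state):
        self._states = [initial_state]  # snapshot stack
        self._pos = 0                   # index of the current snapshot

    @property
    def current(self):
        return self._states[self._pos]

    def commit(self, new_state):
        """Record a new snapshot, discarding any redo branch."""
        self._states = self._states[: self._pos + 1]
        self._states.append(new_state)
        self._pos += 1

    def undo(self):
        if self._pos > 0:
            self._pos -= 1
        return self.current

    def redo(self):
        if self._pos < len(self._states) - 1:
            self._pos += 1
        return self.current


# Demo with dict copies standing in for HAMT snapshots.
# With HAMT you would write: history.commit(history.current.set('count', 1))
history = History({'count': 0})
history.commit({**history.current, 'count': 1})
history.commit({**history.current, 'count': 2})
assert history.undo()['count'] == 1
assert history.redo()['count'] == 2
```

Because every snapshot is immutable, old states can be kept indefinitely at low cost when backed by a structure-sharing map.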
Run the test suite:

```bash
# Run all tests
make test

# Run with verbose output
make test-verbose

# Run with coverage report
make test-coverage
```

Run the benchmarks:

```bash
# Small benchmark (1K operations, ~1 second)
make benchmark-small

# Medium benchmark (10K operations, ~5 seconds) - Default
make benchmark

# Large benchmark (100K operations, ~30 seconds)
make benchmark-large
```

The benchmarks test:
- Insertions: Adding key-value pairs
- Lookups: Reading values by key
- Deletions: Removing key-value pairs
- Iteration: Traversing all keys
- Structural sharing vs Copying: Comparing HAMT variants vs dict.copy()
- Hash collisions: Performance with colliding hash values
Benchmark sizes:
- Small: 1,000 insertions, 500 deletions, 10 variants
- Medium: 10,000 insertions, 5,000 deletions, 100 variants
- Large: 100,000 insertions, 25,000 deletions, 500 variants
From the large benchmark (100K items, 500 variants):

- Single operations:
  - Insertions: dict 0.02s vs HAMT 0.69s (36x slower)
  - Lookups: dict 0.01s vs HAMT 0.12s (11x slower)
- Creating variants:
  - dict.copy(): 42ms total, ~140MB memory
  - HAMT: 3ms total, ~0.3MB memory
  - → HAMT is 14x faster and uses 99.8% less memory
Development commands:

```bash
# Format code
make format

# Check formatting and linting
make lint

# Clean build artifacts
make clean
```

| Operation | Average Case | Worst Case |
|---|---|---|
| get | O(log32 n) | O(log32 n) |
| set | O(log32 n) | O(log32 n) |
| delete | O(log32 n) | O(log32 n) |
| contains | O(log32 n) | O(log32 n) |
| len | O(1) | O(1) |
Note: log32 n is effectively constant for practical data sizes (log32 of 1 billion ≈ 6)
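As a quick sanity check of that figure, the trie depth for a billion keys works out to about six levels:

```python
import math

# Depth of a 32-way trie holding one billion keys
depth = math.log(10**9, 32)
print(round(depth, 2))  # ~5.98, i.e. at most 6 levels of pointer chasing
```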
- Single HAMT: O(n) where n is the number of key-value pairs
- Multiple versions: Efficient due to structural sharing
- Creating k versions with m changes each: O(n + k·m) instead of O(k·n)
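Plugging in the benchmark scenario above (n = 10,000 items, k = 500 variants, m = 1 change each) shows how large the gap gets:

```python
n, k, m = 10_000, 500, 1

full_copies = k * n   # entries stored when each variant is a full dict copy
shared = n + k * m    # entries stored with structural sharing

print(full_copies, shared)  # 5000000 vs 10500 - roughly a 500x difference
```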
Based on our comprehensive benchmarks:
- Insertions: 25-40x slower than dict
- Lookups: 10-25x slower than dict
- Deletions: 25-40x slower than dict
- Iteration: 10-20x slower than dict
When creating multiple versions with small changes:
- Speed: HAMT is 2.7-14x faster than dict.copy()
- Memory: HAMT uses ~99.8% less memory through structural sharing
- Example: Creating 500 variants of a 10,000 item collection:
- Dict approach: 42ms, ~140MB memory
- HAMT approach: 3ms, ~0.3MB memory (shares 99.9% of structure)
- Immutable data structures for functional programming
- Version history - keeping multiple versions of data
- Thread safety without locks
- Undo/redo functionality in applications
- Structural sharing for memory efficiency
- Creating many variants of a base data structure
- Copy-on-write semantics with better performance than dict.copy()
- Maximum performance for single-threaded operations
- Mutable in-place operations
- Simple key-value storage without versioning
- Best performance for single instances without copying
HAMT excels in scenarios like:
- Configuration management: Track config changes across deployments
- State management: Redux-style state trees in Python applications
- Caching with history: Keep previous cache states for rollback
- Parallel processing: Safe concurrent access without locks
- Event sourcing: Efficient storage of incremental state changes
This HAMT implementation uses:
- 32-way branching factor for the trie nodes
- Bitmap indexing for sparse arrays
- Collision nodes for hash collisions
- Path copying for persistence
The trie structure uses 5 bits of the hash at each level, creating up to 32 branches per node. This provides a good balance between tree depth and node size.
- `HAMT(items=None)` - Create a new HAMT, optionally initialized with items.
- `set(key, value)` - Return a new HAMT with the key-value pair added/updated.
- `get(key, default=None)` - Get value for key, return default if not found.
- `h[key]` - Get value for key, raise KeyError if not found.
- `delete(key)` - Return a new HAMT with the key removed.
- `key in h` - Check if key exists in the HAMT.
- `len(h)` - Return the number of key-value pairs.
- `iter(h)` - Iterate over keys.
- `keys()` - Return list of all keys.
- `values()` - Return list of all values.
- `items()` - Return list of (key, value) tuples.
- `==` - Check equality with another HAMT.
- `repr(h)` - Return string representation of the HAMT.
The test suite includes:
- Basic functionality tests
- Edge cases (empty keys, special characters, Unicode)
- Hash collision handling
- Large dataset tests
- Concurrency tests
- Memory efficiency tests
- Performance tests
Run tests with `make test-verbose` to see all test cases.
Contributions are welcome! Please ensure:
- All tests pass (`make test`)
- Code is formatted (`make format`)
- New features include tests
- Documentation is updated
See LICENSE
The HAMT data structure was originally described by Phil Bagwell in "Ideal Hash Trees" (2001). This implementation is inspired by Clojure's PersistentHashMap and Python's built-in HAMT (used in contextvars).