Bitcask - Python Implementation

A Python implementation of Bitcask, a log-structured hash table for fast key-value storage, originally developed at Basho Technologies for Riak.

Features

  • Fast writes: Append-only design ensures O(1) disk writes
  • Fast reads: In-memory hash table provides O(1) lookups
  • Crash recovery: Append-only logs enable consistent recovery
  • Automatic compaction: Merge process reclaims space from deleted/overwritten keys
  • Hint files: Speed up startup time by avoiding full data file scans
  • TTL support: Automatic key expiration with time-to-live settings
  • String support: Automatic conversion between strings and bytes with configurable encoding
  • Snapshots: Create thread-safe read-only point-in-time views of the database
  • Single writer, multiple readers: Ensures consistency while allowing concurrent reads
  • Configurable modes: Read-only, write-only, and sync-on-put options
  • MutableMapping interface: Dict-like API for ease of use

Installation

# Clone the repository
git clone <repository-url>
cd bitcask

# Install with development dependencies
pip install -e ".[dev,test]"

# Or just install the package
pip install -e .

# Run tests to verify installation
make test

Quick Start

from bitcask import Bitcask

# Basic usage - supports both bytes and strings
with Bitcask("mydb") as db:
    # Write data (strings are auto-converted to bytes)
    db.put("key", "value")
    
    # Read data (returns bytes)
    value = db.get("key")
    print(value)  # b"value"
    
    # Can also use bytes directly
    db.put(b"binary_key", b"binary\x00value")
    
    # Delete data
    db.delete("key")
    
    # List all keys (returns bytes)
    for key in db.keys():
        print(key)

API Reference

Core Operations

put(key: bytes|str, value: bytes|str, ttl: int|None = None) -> int

Store a key-value pair with optional TTL (time-to-live) in seconds. Strings are automatically converted to bytes. Returns timestamp.

# Using strings (auto-converted to bytes)
timestamp = db.put("user:123", "John Doe")
timestamp = db.put("session:456", "data", ttl=3600)  # Expires in 1 hour

# Using bytes directly
timestamp = db.put(b"binary_key", b"binary\x00value")

get(key: bytes|str) -> bytes

Retrieve a value by key. Strings are auto-converted to bytes for lookup. Always returns bytes. Raises KeyError if not found.

try:
    value = db.get("user:123")  # Returns b"John Doe"
    # Decode if you need a string
    text = value.decode('utf-8')
except KeyError:
    print("Key not found")

delete(key: bytes|str)

Mark a key as deleted (tombstone). Strings are auto-converted to bytes. Raises KeyError if not found.

db.delete("user:123")  # String key
db.delete(b"binary_key")  # Bytes key

keys() -> List[bytes]

Return all active keys (excludes deleted keys).

all_keys = db.keys()

fold(function, initial) -> Any

Fold over all key-value pairs.

def sum_values(key, value, acc):
    # value is bytes; int() accepts ASCII digits in a bytes object
    return acc + int(value)

total = db.fold(sum_values, 0)

expire(key: bytes|str, ttl: int) -> int

Set TTL on an existing key. Strings are auto-converted to bytes. Returns timestamp.

db.expire("user:123", 3600)  # Expire in 1 hour

ttl(key: bytes|str) -> int

Get remaining TTL in seconds. Strings are auto-converted to bytes. Returns -1 if no TTL, -2 if key doesn't exist.

remaining = db.ttl("session:456")
if remaining > 0:
    print(f"Expires in {remaining} seconds")

persist(key: bytes|str) -> int

Remove TTL from a key, making it persistent. Strings are auto-converted to bytes. Returns timestamp.

db.persist("session:456")  # Remove expiration

snapshot() -> Bitcask

Create a read-only snapshot of the current database state. Returns a new Bitcask instance with a deep copy of the keydir, providing a consistent point-in-time view that remains unchanged even as the original database is modified. Each snapshot has its own file handle and lock for thread-safe concurrent access.

with Bitcask("mydb") as db:
    db.put("key", "value1")
    
    # Create point-in-time snapshot
    snap = db.snapshot()
    
    # Modify original
    db.put("key", "value2")
    
    # Snapshot still sees original value
    assert snap.get("key") == b"value1"
    assert db.get("key") == b"value2"
    
    # Clean up snapshot when done
    snap.close()

Configuration Options

Bitcask(
    path="mydb",              # Database directory
    read_only=False,          # Open in read-only mode
    write_only=False,         # Open in write-only mode (no reads)
    sync_on_put=False,        # Sync to disk after each write
    max_file_size=1048576,    # Max size before file rotation (1MB default)
    encoding='utf-8'          # Encoding for string to bytes conversion
)

Context Manager

# Automatic open/close with context manager
with Bitcask("mydb") as db:
    db.put("key", "value")  # Strings are auto-converted

# Manual open/close
db = Bitcask("mydb")
db.open()
db.put("key", "value")
db.close()

Dictionary Interface

Bitcask implements Python's MutableMapping protocol:

with Bitcask("mydb") as db:
    # Dict-like operations with strings or bytes
    db["key"] = "value"        # Strings auto-converted
    value = db["key"]          # Returns bytes
    del db["key"]              # Delete
    
    # Mix strings and bytes
    db[b"binary"] = b"data"
    db["text"] = "content"
    
    # Iteration (keys are always bytes)
    for key in db:
        print(key, db[key])
    
    # Length
    print(len(db))
    
    # Membership test (strings auto-converted)
    if "key" in db:
        print("Key exists")

Operating Modes

Read-Only Mode

Perfect for read-heavy workloads with shared access:

with Bitcask("mydb", read_only=True) as reader:
    value = reader.get("key")  # Strings work here too
    # reader.put("key", "value")  # Raises ReadOnly exception

Write-Only Mode

Optimized for high-throughput append operations (logging, ingestion):

with Bitcask("mydb", write_only=True) as writer:
    writer.put("log1", "data")  # Fast, no keydir overhead
    # writer.get("log1")  # Raises WriteOnly exception

Sync-On-Put Mode

Ensures durability for critical data:

with Bitcask("mydb", sync_on_put=True) as db:
    db.put("critical", "data")  # Immediately synced to disk

File Management

Merge (Compaction)

Reclaim space by removing old versions and deleted keys:

db = Bitcask("mydb")
db.merge()  # The database must be closed while merge runs
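
A fuller lifecycle, as a sketch against the API documented above: write while open, close, compact, then reopen.

from bitcask import Bitcask

with Bitcask("mydb") as db:
    for i in range(10_000):
        db.put(f"key{i}", "v1")
    for i in range(10_000):
        db.put(f"key{i}", "v2")   # old versions become garbage on disk
    for i in range(5_000):
        db.delete(f"key{i}")      # tombstones remain until merged

# The context manager has closed the database, so merging is safe
db = Bitcask("mydb")
db.merge()  # rewrites live entries; drops old versions and tombstones

with Bitcask("mydb", read_only=True) as db:
    print(len(db))  # 5000 live keys remain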

File Structure

mydb/
├── bitcask.data           # Active write file
├── bitcask.data.lock      # Write lock file
├── 1.bitcask.data         # Immutable data file
├── 1.bitcask.hint         # Hint file for faster loading
├── 2.bitcask.data         # Immutable data file
└── 2.bitcask.hint         # Hint file

Data Format

Data File Entry

+-----+----------+--------+--------+-----+-------+
| CRC | Tstamp   | KeySz  | ValSz  | Key | Value |
+-----+----------+--------+--------+-----+-------+
| 4B  | 4B       | 2B     | 4B     | var | var   |
+-----+----------+--------+--------+-----+-------+

Hint File Entry

+-----+----------+--------+--------+----------+-----+
| CRC | Tstamp   | KeySz  | ValSz  | ValuePos | Key |
+-----+----------+--------+--------+----------+-----+
| 4B  | 4B       | 2B     | 4B     | 8B       | var |
+-----+----------+--------+--------+----------+-----+
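
To make the layout concrete, here is a hypothetical encoder/decoder for a data file entry using struct. The field widths match the tables above; the byte order and the exact span the CRC covers are assumptions for illustration, not guarantees about this implementation's on-disk format.

import struct
import zlib

# Assumed: big-endian fields, CRC32 computed over everything after the CRC
HEADER = ">IIHI"  # crc:4B, tstamp:4B, keysz:2B, valsz:4B

def encode_entry(timestamp: int, key: bytes, value: bytes) -> bytes:
    body = struct.pack(">IHI", timestamp, len(key), len(value)) + key + value
    crc = zlib.crc32(body) & 0xFFFFFFFF
    return struct.pack(">I", crc) + body

def decode_entry(buf: bytes):
    crc, tstamp, keysz, valsz = struct.unpack_from(HEADER, buf, 0)
    start = struct.calcsize(HEADER)
    key = buf[start:start + keysz]
    value = buf[start + keysz:start + keysz + valsz]
    if zlib.crc32(buf[4:start + keysz + valsz]) & 0xFFFFFFFF != crc:
        raise ValueError("CRC mismatch: corrupted entry")
    return tstamp, key, value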

Concurrency Model

Thread Safety

  • Thread-Safe: All operations are protected by an internal RLock
  • Multi-threaded: Safe to share a single Bitcask instance across threads
  • Performance: ~8% overhead for writes, ~20% for reads vs single-threaded

Process Safety

  • Single Writer: Only one process can write at a time (enforced via file lock)
  • Multiple Readers: Any number of processes can read simultaneously
  • Snapshot Isolation: Readers see a point-in-time snapshot

import threading

# Thread-safe: Single instance, multiple threads
db = Bitcask("mydb")
db.open()

def worker(thread_id):
    for i in range(1000):
        db.put(f"key_{thread_id}_{i}".encode(), b"value")
        value = db.get(f"key_{thread_id}_{i}".encode())

# Safe to use from multiple threads
threads = [threading.Thread(target=worker, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

db.close()

# Process-safe: Multiple processes
# Writer process
with Bitcask("mydb").open_context() as writer:
    writer.put(b"key", b"value")

# Multiple reader processes (can run simultaneously)
with Bitcask("mydb", read_only=True).open_context() as reader:
    value = reader.get(b"key")

Performance Considerations

Memory Usage

  • Keydir stores all keys in memory: ~145 bytes overhead per key in Python
  • 1 million keys ≈ 145MB RAM (see the estimate below)
  • Use write-only mode for append-only workloads to avoid keydir overhead
  • Snapshots deep-copy the keydir, doubling memory for keys (data files are shared)
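
A back-of-the-envelope check of the numbers above (145 bytes is the README's per-key estimate, not a measured constant):

KEYDIR_OVERHEAD = 145  # approximate per-key overhead in bytes, per the estimate above

def estimate_keydir_mb(num_keys: int, avg_key_len: int = 0) -> float:
    """Rough keydir RAM estimate in decimal megabytes."""
    return num_keys * (KEYDIR_OVERHEAD + avg_key_len) / 1_000_000

print(estimate_keydir_mb(1_000_000))  # 145.0 MB, matching the figure above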

Write Performance

  • Sequential writes to append-only log
  • Configurable file size for rotation
  • Optional sync-on-put for durability vs speed trade-off
  • Thread-safe with ~8% overhead vs single-threaded

Read Performance

  • O(1) key lookups via in-memory hash table
  • Single disk seek per read (see the sketch below)
  • Hint files speed up startup time
  • Thread-safe with ~20% overhead vs single-threaded
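
Conceptually, a read is one in-memory dict lookup followed by one positioned file read. A simplified sketch of that path; the entry fields mirror the Bitcask design, but the names are illustrative rather than this implementation's internals.

from dataclasses import dataclass
from typing import BinaryIO

@dataclass
class KeydirEntry:
    file_id: int     # which data file holds the latest value
    value_pos: int   # byte offset of the value within that file
    value_size: int  # length of the value in bytes
    timestamp: int

def read_value(keydir: dict, data_files: dict[int, BinaryIO], key: bytes) -> bytes:
    entry = keydir[key]            # O(1) hash lookup, entirely in RAM
    f = data_files[entry.file_id]  # already-open handle to an immutable file
    f.seek(entry.value_pos)        # the single disk seek
    return f.read(entry.value_size)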

Benchmark Results

On a typical development machine:

  • Sequential writes: ~50,000 ops/sec
  • Random reads: ~100,000 ops/sec
  • Mixed workload: ~30,000 ops/sec

Run make benchmark to test on your hardware.

Testing

The project uses pytest for testing with comprehensive test coverage:

# Run all tests with pytest
make test

# Run with verbose output
make test-verbose

# Run with coverage report
make test-coverage

# Run tests in parallel (faster)
make test-parallel

# Run specific test file
env/bin/python -m pytest bitcask/__tests__/test_basic_operations.py

# Clean up test artifacts
make clean

# Run benchmarks
make benchmark            # Standard benchmark
make benchmark-threaded   # Multi-threaded benchmark

# Show all available targets
make help

Test coverage includes:

  • ✅ Basic CRUD operations (9 tests)
  • ✅ File rotation and merging (8 tests)
  • ✅ Corruption recovery (10 tests)
  • ✅ Concurrent access and thread safety (14 tests)
  • ✅ Edge cases (empty values, large keys) (10 tests)
  • ✅ Mode restrictions (8 tests)
  • ✅ Error handling (10 tests)
  • ✅ TTL support (12 tests)
  • ✅ String conversion (10 tests)
  • ✅ Snapshot functionality (11 tests)
  • ✅ Legacy compatibility tests (24 tests)

Total: 126 tests with ~85% code coverage

Examples

Point-in-Time Snapshots

from bitcask import Bitcask
import threading

def analyze_snapshot(snap):
    """Analyze data in snapshot without affecting live database"""
    total_keys = len(snap.keys())
    print(f"Snapshot has {total_keys} keys")
    
    # Perform analysis on consistent view
    for key in snap.keys():
        value = snap.get(key)
        # Process data...
    # Note: the shared snapshot is closed by the caller, not per thread

with Bitcask("production_db") as db:
    # Add initial data
    for i in range(1000):
        db.put(f"key{i}", f"value{i}")
    
    # Create snapshot for analysis
    snap = db.snapshot()
    
    # Multiple threads can safely read from the same snapshot
    threads = []
    for i in range(5):
        thread = threading.Thread(target=analyze_snapshot, args=(snap,))
        threads.append(thread)
        thread.start()
    
    # Continue modifying the database while snapshot analysis runs
    for i in range(1000):
        db.put(f"key{i}", f"modified{i}")
    
    # Wait for analysis to complete
    for thread in threads:
        thread.join()
    
    # Snapshots saw the original state, not the modifications
    snap.close()  # Close the shared snapshot once, after all readers finish

Simple Cache

class Cache:
    def __init__(self, path):
        self.db = Bitcask(path)
        self.db.open()
    
    def set(self, key: str, value: str, ttl=None):
        # Strings are auto-converted, TTL is supported
        self.db.put(key, value, ttl=ttl)
    
    def get(self, key: str) -> str | None:
        try:
            # Get returns bytes, decode to string
            return self.db.get(key).decode('utf-8')
        except KeyError:
            return None
    
    def close(self):
        self.db.close()

Log Storage

# High-speed log ingestion - strings work seamlessly
import time

with Bitcask("logs", write_only=True) as logs:
    for i in range(1_000_000):
        timestamp = time.time()
        logs.put(f"log:{timestamp}", log_data)  # String key; log_data is your payload

# Later, process logs
with Bitcask("logs", read_only=True) as logs:
    for key in logs.keys():  # Keys are bytes
        if key.startswith(b"log:"):
            process_log(logs.get(key))

Session Store with TTL

import json

class SessionStore:
    def __init__(self):
        self.db = Bitcask("sessions", sync_on_put=True)
        self.db.open()
    
    def save_session(self, session_id: str, data: dict, ttl=3600):
        key = f"session:{session_id}"  # String key
        value = json.dumps(data)  # String value
        # Sessions automatically expire after TTL seconds
        self.db.put(key, value, ttl=ttl)
    
    def get_session(self, session_id: str) -> dict | None:
        key = f"session:{session_id}"
        try:
            value = self.db.get(key)  # Returns bytes
            return json.loads(value.decode('utf-8'))
        except KeyError:
            return None  # Session expired or doesn't exist
    
    def extend_session(self, session_id: str, ttl=3600):
        key = f"session:{session_id}"
        self.db.expire(key, ttl)  # Strings work with all methods
    
    def delete_session(self, session_id: str):
        key = f"session:{session_id}"
        self.db.delete(key)

Cache with TTL

class Cache:
    def __init__(self, path="cache", encoding='utf-8'):
        # Pass custom encoding if needed
        self.db = Bitcask(path, encoding=encoding)
        self.encoding = encoding  # keep a copy rather than reading private state
        self.db.open()
    
    def set(self, key: str, value: str, ttl=300):
        # Default 5-minute TTL, strings auto-converted
        self.db.put(key, value, ttl=ttl)
    
    def get(self, key: str) -> str | None:
        try:
            return self.db.get(key).decode(self.encoding)
        except KeyError:
            return None  # Cache miss or expired
    
    def close(self):
        self.db.close()

Limitations

  • Key Size: Maximum 65KB (2-byte length field; see the sketch after this list)
  • Value Size: Maximum 4GB (4-byte length field)
  • Memory: All keys must fit in memory
  • No Transactions: Single key operations only
  • Single Machine: No built-in replication/distribution
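
The key and value caps fall directly out of the 2-byte and 4-byte length fields in the entry header. A quick demonstration of the key-size boundary, reusing the struct layout assumed in the Data Format section:

import struct

MAX_KEY_SIZE = 2**16 - 1    # 65,535 bytes: the largest count a 2-byte field holds
MAX_VALUE_SIZE = 2**32 - 1  # ~4GB: the largest count a 4-byte field holds

struct.pack(">H", MAX_KEY_SIZE)      # fits
try:
    struct.pack(">H", MAX_KEY_SIZE + 1)
except struct.error as e:
    print(e)  # 'H' format requires 0 <= number <= 65535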

Implementation Details

This implementation follows the Bitcask paper with some Python-specific adaptations:

  1. CRC Validation: All entries are CRC32 protected
  2. Crash Recovery: Corrupted entries are skipped with logging
  3. File Locking: OS-level locks prevent multiple writers
  4. Tombstones: Deleted keys marked with special value
  5. Hint Files: Created during merge for faster startup
  6. Python Optimizations: Uses bytearray for efficient I/O
  7. Snapshots: Each snapshot gets its own file handle and lock for thread-safe access
  8. TTL Support: Value-embedded expiration timestamps with lazy deletion (sketched after this list)
  9. String Support: Automatic encoding/decoding with configurable character sets
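
As an illustration of items 4 and 8, a minimal sketch of how tombstones and value-embedded TTLs could be resolved lazily at read time; the sentinel constant and field names are hypothetical, not this implementation's:

import time

TOMBSTONE = b"__bitcask_tombstone__"  # hypothetical sentinel marking a deleted key

def resolve(raw_value: bytes, expires_at: float | None) -> bytes:
    """Decide whether a stored entry is still live (lazy deletion)."""
    if raw_value == TOMBSTONE:
        raise KeyError("key was deleted")
    if expires_at is not None and time.time() >= expires_at:
        # Expired entries read as missing; their space is reclaimed at merge
        raise KeyError("key has expired")
    return raw_value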

Contributing

  1. Install development dependencies: pip install -e ".[dev,test]"
  2. Run existing tests: make test
  3. Add tests for new features in bitcask/__tests__/
  4. Ensure all tests pass and coverage is maintained
  5. Follow existing code style (no comments unless necessary)
  6. Run benchmarks to ensure performance is not degraded

License

See LICENSE

Acknowledgments

Based on the Bitcask paper by Justin Sheehy and David Smith at Basho Technologies.
