This repository contains implementations of two seminal distributed systems papers:
- MapReduce: Simplified Data Processing on Large Clusters
- Raft: In Search of an Understandable Consensus Algorithm
Both projects were developed as part of the 6.5840 (Distributed Systems) course in Spring 2024 at MIT.
```
/
├── Makefile
├── .check-build
└── src/
    ├── labgob/     # Go encoder/decoder utilities
    ├── labrpc/     # Go RPC framework for distributed communication
    ├── main/       # Main programs for MapReduce coordination and testing
    ├── models/     # Model files for the key/value store in Raft
    ├── mr/         # MapReduce coordinator, worker, and RPC implementation
    ├── mrapps/     # Applications (e.g., WordCount, Indexer) for MapReduce
    ├── porcupine/  # Linearizability checker for testing Raft consistency
    └── raft/       # Raft implementation and tests
```
The MapReduce implementation follows the original Google paper, using a coordinator-worker architecture for parallel, fault-tolerant data processing. The project includes:
- A coordinator that distributes Map and Reduce tasks.
- Workers that execute tasks, write intermediate files, and retry failed tasks.
- Applications such as WordCount and Indexer for testing.

Key features:
- Fault Tolerance: tasks assigned to failed workers are automatically reassigned.
- Parallelism: multiple workers execute Map and Reduce tasks concurrently.
- Tests: validation scripts check correctness and performance.
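The lab leaves the exact task-handoff protocol between coordinator and workers to the implementer, so the RPC types below (`TaskType`, `TaskArgs`, `TaskReply`) are an illustrative sketch, not the repository's actual definitions. The `ihash` helper is the FNV-1a hash the lab's worker skeleton uses to assign each intermediate key to one of `NReduce` partitions:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Hypothetical task protocol: the coordinator hands a worker either a
// Map task, a Reduce task, a request to wait, or permission to exit.
type TaskType int

const (
	MapTask TaskType = iota
	ReduceTask
	WaitTask // no task ready yet; worker retries after a pause
	ExitTask // all work is done
)

type TaskArgs struct{ WorkerId int }

type TaskReply struct {
	Type    TaskType
	TaskNum int
	File    string // input file for a Map task
	NReduce int    // number of reduce partitions
}

// ihash mirrors the FNV-1a helper in mr/worker.go: a Map worker uses
// ihash(key) % NReduce to pick which intermediate file
// (mr-<mapTask>-<reduceTask>) a key/value pair belongs to.
func ihash(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() & 0x7fffffff)
}

func main() {
	reply := TaskReply{Type: MapTask, TaskNum: 0, File: "pg-grimm.txt", NReduce: 10}
	bucket := ihash("hello") % reply.NReduce
	fmt.Printf("key %q from Map task %d goes to mr-%d-%d\n",
		"hello", reply.TaskNum, reply.TaskNum, bucket)
}
```

Partitioning by hash guarantees that every occurrence of a given key, regardless of which Map task emitted it, lands in the same Reduce task's input.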
Build the WordCount plugin:

```shell
cd src/main
go build -buildmode=plugin ../mrapps/wc.go
```
Start the coordinator:

```shell
go run mrcoordinator.go pg-*.txt
```
Start one or more workers:

```shell
go run mrworker.go wc.so
```
Validate the results:

```shell
cat mr-out-* | sort
```
For a complete test, run:

```shell
bash src/main/test-mr.sh
```

The Raft implementation follows the extended Raft paper, covering leader election, log replication, and fault tolerance. It is the foundation for building a distributed key-value store.
Key features:
- Leader Election: a new leader is elected within 5 seconds of a failure.
- Log Replication: logs are replicated consistently across all nodes.
- Fault Tolerance: state is persisted so nodes can recover from crashes.
- Snapshotting: memory usage is reduced by discarding log entries covered by a snapshot.
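Leader election hinges on the RequestVote RPC. The argument and reply fields below come from Figure 2 of the extended Raft paper; the `upToDate` helper is an illustrative name for the paper's section 5.4.1 rule that a voter only supports a candidate whose log is at least as up-to-date as its own:

```go
package main

import "fmt"

// RequestVote RPC structures, with field names as in Figure 2 of the
// extended Raft paper.
type RequestVoteArgs struct {
	Term         int // candidate's term
	CandidateId  int // candidate requesting the vote
	LastLogIndex int // index of candidate's last log entry
	LastLogTerm  int // term of candidate's last log entry
}

type RequestVoteReply struct {
	Term        int  // currentTerm, for the candidate to update itself
	VoteGranted bool // true means the candidate received the vote
}

// upToDate applies the election restriction (section 5.4.1): compare
// the terms of the last log entries; if equal, the longer log wins.
func upToDate(args *RequestVoteArgs, myLastTerm, myLastIndex int) bool {
	if args.LastLogTerm != myLastTerm {
		return args.LastLogTerm > myLastTerm
	}
	return args.LastLogIndex >= myLastIndex
}

func main() {
	args := &RequestVoteArgs{Term: 5, CandidateId: 1, LastLogIndex: 10, LastLogTerm: 4}
	// Same last term, candidate's log is longer: vote may be granted.
	fmt.Println(upToDate(args, 4, 8)) // true
	// Voter's last term is higher: vote must be denied.
	fmt.Println(upToDate(args, 5, 3)) // false
}
```

This restriction is what prevents a leader with a stale log from being elected and overwriting committed entries.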
Run tests for Part 3A (Leader Election):

```shell
cd src/raft
go test -run 3A
```
Run tests for Part 3B (Log Replication):

```shell
go test -run 3B
```
Run tests for Part 3C (Persistence):

```shell
go test -run 3C
```
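Part 3C persists the state that Figure 2 of the Raft paper marks persistent: `currentTerm`, `votedFor`, and the log. The repository's `labgob` package is a thin wrapper around Go's `encoding/gob`, which the standalone sketch below uses directly; the `persistentState`, `encodeState`, and `decodeState` names are illustrative, not the lab's API:

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// A log entry carries the term it was received in plus the command.
type LogEntry struct {
	Term    int
	Command string
}

// The state a Raft peer must write to stable storage before replying
// to any RPC, so it survives a crash and restart.
type persistentState struct {
	CurrentTerm int
	VotedFor    int
	Log         []LogEntry
}

func encodeState(s persistentState) []byte {
	w := new(bytes.Buffer)
	if err := gob.NewEncoder(w).Encode(s); err != nil {
		panic(err)
	}
	return w.Bytes()
}

func decodeState(data []byte) persistentState {
	var s persistentState
	if err := gob.NewDecoder(bytes.NewBuffer(data)).Decode(&s); err != nil {
		panic(err)
	}
	return s
}

func main() {
	before := persistentState{CurrentTerm: 7, VotedFor: 2,
		Log: []LogEntry{{Term: 7, Command: "x=1"}}}
	after := decodeState(encodeState(before))
	fmt.Println(after.CurrentTerm, after.VotedFor, len(after.Log)) // 7 2 1
}
```

The 3C tests crash and restart peers, then check that the restarted peers resume from exactly this saved state.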
Run tests for Part 3D (Snapshotting):

```shell
go test -run 3D
```
A successful run of the 3A tests looks like this:

```
Test (3A): initial election ...
  ... Passed --   3.5  3   58   16840    0
Test (3A): election after network failure ...
  ... Passed --   5.4  3  118   25269    0
PASS
ok  	6.5840/raft	16.265s
```

Each "Passed" line reports the test's real time in seconds, the number of Raft peers, the number of RPCs sent, the total bytes in RPC messages, and the number of log entries reported committed.
- These implementations were homework assignments for 6.5840 and adhere to the course’s collaboration policy.
- The provided tests simulate real-world distributed environments, including network delays, crashes, and restarts.