Portfolio Note: This is a portfolio recreation of a production fraud detection system built at Omfys Technologies.
Real-time fraud detection system that validates 500K+ transactions per day with 92% accuracy and <100 ms p95 latency, built with Spark (Scala), XGBoost, and ONNX Runtime.
Key metrics:
- Volume: 500K+ transactions/day
- Accuracy: 92%
- False Positive Rate: <2%
- Latency: <100ms p95
- Fraud losses prevented: ₹10L+ (₹1M+) annually
Tech stack:
- Processing: Spark (Scala), Kafka Streams
- Storage: HBase, HDFS (Parquet), Hive
- ML: XGBoost, Random Forest, ONNX Runtime
- ETL: Apache Sqoop
- Monitoring: Prometheus, Grafana
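Stream processing:
- Spark jobs written in Scala, running natively on the JVM without the Python-to-JVM serialization overhead of PySpark UDFs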
- Exactly-once processing with checkpointing
- Write-ahead logs for fault tolerance
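Exactly-once processing with checkpointing relies on committing the consumed offset and the state it produced together, so a restart neither skips nor re-applies an event. The Python sketch below illustrates that idea in a single process. It assumes a JSON checkpoint file and a per-card counter as the state. It is not Spark's checkpoint implementation, and `save_checkpoint`, `load_checkpoint`, and `process` are hypothetical names.

```python
import json
import os
import tempfile


def save_checkpoint(path, offset, state):
    """Persist the next offset to read together with the state it produced."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"offset": offset, "state": state}, f)
        f.flush()
        os.fsync(f.fileno())
    # Replace the old checkpoint in one rename (atomic on POSIX), so a crash
    # never leaves a half-written checkpoint behind.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return (offset, state); start from the beginning if no checkpoint exists."""
    if not os.path.exists(path):
        return 0, {}
    with open(path) as f:
        checkpoint = json.load(f)
    return checkpoint["offset"], checkpoint["state"]


def process(events, path, crash_at=None):
    """Count transactions per card, resuming from the last checkpoint after a failure."""
    offset, counts = load_checkpoint(path)
    for i in range(offset, len(events)):
        if crash_at is not None and i == crash_at:
            raise RuntimeError(f"simulated crash before event {i}")
        card = events[i]
        counts[card] = counts.get(card, 0) + 1
        # Offset and state are committed in one write, so a replay never double-counts.
        save_checkpoint(path, i + 1, counts)
    return counts
```

Running `process` once with `crash_at=2` and then again without it yields each card counted exactly once, because the second run resumes at offset 2 with the state saved alongside it.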
Feature engineering:
- Velocity checks: Transaction frequency patterns
- Geographical anomalies: Location-based risk
- Amount deviation: Statistical outlier detection
- Merchant patterns: Historical analysis
- Device fingerprinting: Behavioral analysis
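Three of the feature checks listed above can be sketched in plain Python. This is a minimal illustration under assumed choices, not the logic in `FeatureEngineering.scala`. It uses a one-hour trailing window for velocity, a z-score against the card's history for amount deviation, and haversine distance divided by elapsed time for geographical anomalies. The thresholds in the hypothetical `risk_flags` helper (more than 5 transactions per hour, |z| > 3, faster than 900 km/h) are illustrative assumptions.

```python
import math
from statistics import mean, pstdev


def velocity_count(timestamps, now, window_s=3600):
    """Transactions for one card in the trailing window (now - window_s, now], Unix seconds."""
    return sum(1 for t in timestamps if now - window_s < t <= now)


def amount_zscore(history, amount):
    """Standard deviations between a new amount and the card's historical mean."""
    if len(history) < 2:
        return 0.0  # not enough history to estimate spread
    sd = pstdev(history)
    if sd == 0:
        return 0.0  # constant history: no meaningful deviation
    return (amount - mean(history)) / sd


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, using a mean Earth radius of 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def implied_speed_kmh(prev, cur):
    """Speed needed to travel between two transactions given as (lat, lon, unix_ts)."""
    km = haversine_km(prev[0], prev[1], cur[0], cur[1])
    hours = max(cur[2] - prev[2], 1) / 3600  # clamp to 1 s to avoid division by zero
    return km / hours


def risk_flags(timestamps, history, amount, prev_loc, cur_loc):
    """Hypothetical thresholds; production values are not stated in this README."""
    now = cur_loc[2]
    return {
        "high_velocity": velocity_count(timestamps, now) > 5,
        "amount_outlier": abs(amount_zscore(history, amount)) > 3,
        "impossible_travel": implied_speed_kmh(prev_loc, cur_loc) > 900,
    }
```

For example, a card used in Mumbai and then in Delhi an hour later implies a speed above 1,100 km/h, which the sketch flags as impossible travel.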
ML models:
- Stacked ensemble of Random Forest and XGBoost
- Models exported to ONNX format for 3x faster inference
- A/B testing framework
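An A/B testing framework for fraud models needs a stable way to split traffic between the current model and a candidate. One common approach, sketched here as an assumption rather than this system's actual design, hashes the experiment name together with the card ID so a given card always lands in the same variant. `assign_variant`, `score`, and the experiment name `stacked-v2` are hypothetical.

```python
import hashlib


def assign_variant(entity_id, experiment, challenger_pct):
    """Deterministically route an entity (e.g. a card ID) to 'champion' or 'challenger'.

    Hashing experiment name + ID keeps assignment stable across requests and
    independent between experiments. challenger_pct is a percentage (0-100).
    """
    key = f"{experiment}:{entity_id}".encode("utf-8")
    # 10,000 buckets give 0.01% granularity for the traffic split.
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % 10_000
    return "challenger" if bucket < challenger_pct * 100 else "champion"


def score(txn, models, experiment="stacked-v2", challenger_pct=10.0):
    """Score a transaction with the model its card is assigned to; returns (variant, score)."""
    variant = assign_variant(txn["card_id"], experiment, challenger_pct)
    return variant, models[variant](txn)
```

Routing by card rather than by transaction keeps each cardholder's experience consistent within an experiment, which makes per-variant fraud and false-positive rates easier to compare.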
Profile store:
- HBase for <10ms p95 lookups
- 10M+ transaction profiles
- Bloom filters to skip HBase store files that cannot contain a requested row, reducing disk reads per lookup
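HBase configures Bloom filters per column family (the `BLOOMFILTER` attribute, e.g. `ROW`), which lets a read skip store files that cannot hold the requested row. To show why this saves work on point lookups, here is a minimal standalone Bloom filter in Python. The sizing formulas are the standard ones; the SHA-256 double-hashing scheme is an illustrative choice and not how HBase hashes keys.

```python
import hashlib
import math


class BloomFilter:
    """Bit-array Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, capacity, fp_rate=0.01):
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hash functions.
        self.m = math.ceil(-capacity * math.log(fp_rate) / (math.log(2) ** 2))
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, key):
        # Double hashing (h1 + i * h2) derives k bit indexes from one SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so steps vary
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        # False means "definitely absent", so the store file can be skipped.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))
```

A `False` from `might_contain` is always correct, so a lookup can safely skip that file; a `True` may be a false positive and still requires reading it.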
Project structure:

```
fraud-detection-system/
├── src/
│   ├── main/scala/              # Spark Scala code
│   │   ├── FraudDetector.scala
│   │   ├── FeatureEngineering.scala
│   │   └── KafkaStreaming.scala
│   └── python/                  # ML training
│       ├── model_training.py
│       └── onnx_conversion.py
├── config/
│   ├── hbase-site.xml
│   └── application.conf
├── build.sbt
├── requirements.txt
└── README.md
```
Quick start:

```bash
git clone https://github.com/Amanroy666/fraud-detection-system.git
cd fraud-detection-system

# Build Scala project
sbt clean compile assembly

# Install Python dependencies
pip install -r requirements.txt
```

Performance:

| Metric | Value |
|---|---|
| Accuracy | 92% |
| Precision | 89% |
| Recall | 94% |
| F1-Score | 91.5% |
| False Positive Rate | 1.8% |
| Latency (p95) | 98ms |
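The classification metrics in the table follow from a confusion matrix over labelled transactions, and p95 latency from the distribution of per-transaction scoring times. This sketch shows the standard definitions. The function names are hypothetical, any counts or samples passed to them here are illustrative rather than the project's evaluation data, and nearest-rank is only one of several percentile conventions.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Standard metrics from confusion-matrix counts (fraud = positive class)."""
    precision = tp / (tp + fp)  # share of flagged transactions that were fraud
    recall = tp / (tp + fn)  # share of fraud cases caught
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),  # share of legitimate transactions flagged
    }


def percentile_nearest_rank(samples, p):
    """p-th percentile (0 < p <= 100) by nearest rank, e.g. p=95 for p95 latency."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # integer math first avoids float drift
    return ordered[rank - 1]
```

Because accuracy is dominated by the much larger class of legitimate transactions, precision, recall, and false positive rate are usually the more informative figures for fraud detection.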
Aman Roy - Data Engineer at Omfys Technologies
📧 contactaman000@gmail.com | 💼 LinkedIn | 🐙 @Amanroy666