This is the naive implementation of a fraud detection system WITHOUT using a feature store like Featureform. It demonstrates the problems that arise when building ML systems without proper feature infrastructure.
```
naive/
├── docker-compose.yml   # Simple Postgres setup
├── load_data.py         # Load data with train/test split
├── train_naive.py       # Train model with inline features
├── inference_naive.py   # Run predictions (SLOW!)
└── README.md            # This file
```
Get a Kaggle API key from the Kaggle website, then install it locally:

```bash
mkdir -p ~/.kaggle/
mv kaggle.json ~/.kaggle/kaggle.json
chmod 600 ~/.kaggle/kaggle.json
```
```bash
# Create virtual environment
python3.9 -m venv venv

# Activate
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```

```bash
cd naive/
docker-compose up -d
```

You need the IEEE-CIS Fraud Detection dataset from Kaggle:
```bash
# Setup Kaggle credentials first
pip install kaggle
python download_dataset.py
```

```bash
# Quick mode: 50K rows
python load_data.py --quick

# Full dataset: 590K rows
python load_data.py
```

```bash
python train_naive.py
```

```bash
# Single prediction (random test transaction)
python inference_naive.py --random

# Specific transaction
python inference_naive.py --transaction-id 2987000

# Batch predictions (watch it get slow!)
python inference_naive.py --batch 20
```

In Training (train_naive.py):
```python
def compute_features(df):
    # Card aggregates: transaction count, average amount, and fraud rate per card
    card_stats = df.groupby('card1').agg({
        'transaction_amt': ['count', 'mean'],
        'is_fraud': 'mean'
    })
    # Flatten the MultiIndex columns so the merge works cleanly
    card_stats.columns = ['card_transaction_count', 'card_avg_amt', 'card_fraud_rate']
    df = df.merge(card_stats, on='card1')
    ...
```

In Inference (inference_naive.py):
```python
def compute_features_for_inference(transaction_id, conn):
    # Must duplicate the SAME logic, this time in SQL
    card_query = """
        SELECT COUNT(*)             AS card_transaction_count,
               AVG(transaction_amt) AS card_avg_amt,
               AVG(is_fraud)        AS card_fraud_rate
        FROM transactions
        WHERE card1 = %s
    """
    ...
```

Result:
- ✗ Easy to introduce bugs (logic diverges over time)
- ✗ Training-serving skew (different implementations; see the sketch below)
- ✗ No version control
- ✗ Hard to maintain
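As a concrete example of how the two implementations silently diverge: pandas' per-column `count` skips missing values, while SQL's `COUNT(*)` counts every row. A minimal sketch with hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical transactions for one card, with one missing amount
df = pd.DataFrame({
    'card1': [1001, 1001, 1001],
    'transaction_amt': [50.0, np.nan, 20.0],
})

# Training path (pandas): count() skips the NaN row -> 2
pandas_count = df.groupby('card1')['transaction_amt'].count().iloc[0]

# Serving path (SQL): SELECT COUNT(*) counts every row -> 3
sql_count = len(df)  # what COUNT(*) would return for this card

print(pandas_count, sql_count)  # 2 3 -- same "feature", different values
```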
Every prediction requires:
- Query all transactions for the card → SLOW
- Query all transactions for the email → SLOW
- Compute aggregates from scratch → SLOW
- No caching whatsoever → SLOW
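A minimal sketch of how this breakdown is measured; `compute_features_for_inference` is the SQL path above, and `model` stands in for whatever trained sklearn-style model `train_naive.py` produced:

```python
import time

start = time.perf_counter()
feature_vector = compute_features_for_inference(transaction_id, conn)  # SQL above
feature_ms = (time.perf_counter() - start) * 1000

start = time.perf_counter()
prediction = model.predict([feature_vector])  # hypothetical trained model
model_ms = (time.perf_counter() - start) * 1000

total_ms = feature_ms + model_ms
print(f"Feature computation: {feature_ms:7.2f}ms ({feature_ms / total_ms:5.1%})")
print(f"Model inference:     {model_ms:7.2f}ms ({model_ms / total_ms:5.1%})")
print(f"Total:               {total_ms:7.2f}ms")
```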
Example output:

```
Feature computation: 847.23ms (97.8%)
Model inference:      18.45ms ( 2.2%)
Total:               865.68ms
```

⚠️ 97.8% of time spent recomputing features!
Scaling problems:

- 20 predictions = 17 seconds
- 1,000 predictions ≈ 14 minutes
- 100,000 predictions ≈ 24 hours
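The extrapolation is just the measured per-prediction latency scaled linearly:

```python
per_prediction_s = 17 / 20  # 0.85s, measured on the 20-prediction batch

print(f"{1_000 * per_prediction_s / 60:.0f} minutes for 1,000 predictions")      # ~14 minutes
print(f"{100_000 * per_prediction_s / 3600:.1f} hours for 100,000 predictions")  # ~23.6 hours
```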
This is NOT suitable for production!
In the naive approach, you could accidentally use future data:

```python
# WRONG: this includes transactions AFTER the one we're predicting!
card_stats = df.groupby('card1')['is_fraud'].mean()
```

This causes data leakage and artificially inflates model performance.
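Avoiding the leak by hand means restricting every aggregate to transactions that happened strictly before the one being scored. A minimal sketch, assuming the same `df` plus a `transaction_dt` timestamp column:

```python
# Sort chronologically so "earlier rows" means "earlier transactions"
df = df.sort_values('transaction_dt')

# For each transaction, the fraud rate over the card's PRIOR transactions only:
# shift(1) drops the current row, expanding().mean() averages everything before it
df['card_fraud_rate'] = (
    df.groupby('card1')['is_fraud']
      .transform(lambda s: s.shift(1).expanding().mean())
)
# The first transaction per card has no history -> NaN, which also must be handled
```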
When you update features:
- Old models break
- No rollback capability
- Hard to A/B test features
- Can't compare model versions fairly
Every inference runs expensive aggregate queries:
- Database CPU spikes
- Other queries get slow
- Not scalable
- Production DB at risk
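What a feature store automates is exactly the pre-computation that takes this load off the database: run the aggregation once in batch, push the results to a fast key-value store, and serve lookups from there. A minimal sketch of that idea, assuming a local Redis, the transactions table above, and a hypothetical Postgres DSN:

```python
import json
import psycopg2
import redis

conn = psycopg2.connect("dbname=fraud user=postgres")  # hypothetical DSN
r = redis.Redis()

# 1. Materialize: run the expensive aggregation ONCE, not per prediction
with conn.cursor() as cur:
    cur.execute("""
        SELECT card1,
               COUNT(*)             AS card_transaction_count,
               AVG(transaction_amt) AS card_avg_amt,
               AVG(is_fraud)        AS card_fraud_rate
        FROM transactions
        GROUP BY card1
    """)
    for card1, count, avg_amt, fraud_rate in cur:
        r.set(f"card_features:{card1}",
              json.dumps({"count": count,
                          "avg_amt": float(avg_amt),
                          "fraud_rate": float(fraud_rate)}))

# 2. Serve: each prediction is now a single O(1) key lookup
features = json.loads(r.get("card_features:1001"))  # hypothetical card id
```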
Naive approach:

| Metric | Value |
|---|---|
| Single prediction latency | 500-1000ms |
| Throughput | 1-2 predictions/sec |
| Database load | Very high |
| Feature computation | 95%+ of time |
| Scalability | Poor |
With Featureform:

| Metric | Value |
|---|---|
| Single prediction latency | 10-50ms |
| Throughput | 20-100 predictions/sec |
| Database load | Low (pre-computed) |
| Feature computation | <10% of time |
| Scalability | Excellent |
Result: 10-100x faster with Featureform!
```python
# Features are materialized once
# Served from Redis (milliseconds)
features = client.features(...).serve(entity_id)
```

```python
# Define feature ONCE
@ff.entity
class Transaction:
    card_avg_amt = ff.Feature(
        transformation[["entity_id", "card_avg_amt"]],
        variant="v1"
    )
```

```python
# Training uses v1
training_set = client.training_set("fraud_detection", "v1")

# Deploy v2 without breaking v1
@ff.feature(variant="v2")
def card_avg_amt_v2():
    ...
```

Featureform automatically handles temporal correctness, so no data leakage is possible. You also get lineage and monitoring built in:

- Track feature usage across models
- Monitor feature drift
- Understand dependencies
- Debug issues quickly
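For reference, a rough sketch of what serving and training reads look like with Featureform's Python client; the feature names, variants, entity key, and host here are illustrative, assuming features have been registered and materialized as above:

```python
import featureform as ff

# Serving: features come from the inference store (e.g., Redis), not the warehouse
serving = ff.ServingClient(host="localhost:7878", insecure=True)
features = serving.features(
    [("card_avg_amt", "v1"), ("card_fraud_rate", "v1")],  # illustrative names
    {"transaction": "2987000"},
)

# Training: the same definitions produce a point-in-time correct training set
dataset = serving.training_set("fraud_detection", "v1")
for row in dataset:
    print(row.features(), row.label())
    break
```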
To see the difference yourself:

```bash
cd naive/
python inference_naive.py --batch 20
# Watch: ~17 seconds for 20 predictions
```

```bash
cd ../  # Go to the Featureform implementation
python inference.py --batch 20
# Watch: ~1 second for 20 predictions
```

Without Featureform:
- 800ms per prediction
- Can handle ~1,200 transactions/hour per server
- Need 8 servers for 10,000 transactions/hour
- Cost: ~$3,200/month (8 × $400)
- Risk: Database overload during peak times
With Featureform:
- 20ms per prediction
- Can handle ~180,000 transactions/hour per server
- Need 1 server for 10,000 transactions/hour
- Cost: ~$400/month
- Risk: Minimal, scales easily
Savings: $2,800/month, plus fewer operational headaches.
Without a feature store:

- ✗ Slow predictions (100ms-1000ms)
- ✗ Duplicated feature logic
- ✗ Risk of training-serving skew
- ✗ Database becomes bottleneck
- ✗ Hard to maintain
- ✗ Not scalable
- ✗ No versioning
- ✗ Manual point-in-time handling

With Featureform:

- ✓ Fast predictions (10-50ms)
- ✓ Single feature definition
- ✓ Guaranteed consistency
- ✓ Database barely touched
- ✓ Easy to maintain
- ✓ Horizontally scalable
- ✓ Automatic versioning
- ✓ Point-in-time correctness built-in
- Run the naive approach to see the problems firsthand
- Compare with Featureform implementation in parent directory
- Measure the difference in latency and throughput
- Calculate the ROI for your use case
This naive implementation intentionally shows bad practices to demonstrate why feature stores like Featureform are essential for production ML systems.
For the proper Featureform implementation, see the parent directory.