Graph embeddings are supposed to enhance tabular models. But on Elliptic++ Bitcoin fraud detection, adding Node2Vec embeddings to XGBoost actually decreases performance by 2%.
This repository demonstrates why, and validates that rich tabular features already encode graph structure, making explicit graph embeddings redundant.
Main Result: XGBoost (tabular-only) achieves PR-AUC 0.669, while XGBoost + Node2Vec (fusion) achieves only 0.656.
Why? Features AF1-93 (local transaction attributes) combined with the baseline's AF94-182 (neighbor aggregates) already capture graph topology.
Conclusion: Graph embeddings don't add value when tabular features already encode neighborhood information, a negative result that's scientifically valuable.
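To make that concrete, here is a minimal illustrative sketch (not code from this repository) of how neighbor-aggregate features in the spirit of AF94-182 can be derived from the local features plus the edge list. File paths follow the Quick Start below; the column names (txId, txId1, txId2) and the mean aggregation are assumptions.

```python
import pandas as pd

features = pd.read_csv("data/Elliptic++ Dataset/txs_features.csv")   # per-transaction features
edges = pd.read_csv("data/Elliptic++ Dataset/txs_edgelist.csv")      # assumed columns: txId1, txId2

# Treat every non-id column as a node attribute (exact layout assumed).
local_cols = [c for c in features.columns if c != "txId"]

# Average each node's neighbors' attributes: an explicit tabular encoding
# of the 1-hop neighborhood, i.e. the kind of signal aggregate features carry.
neighbor_means = (
    edges.merge(features, left_on="txId2", right_on="txId")
         .groupby("txId1")[local_cols]
         .mean()
         .add_prefix("nbr_mean_")
)

fused = features.merge(neighbor_means, left_on="txId", right_index=True, how="left")
print(fused.shape)
```

Once neighborhood statistics live in columns like these, a tree ensemble can consume graph context directly, which is exactly why separate graph embeddings have little left to add.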
We trained a fusion model using strict temporal splits (no leakage) on the Elliptic++ dataset:
| Model | Features | PR-AUC ↑ | ROC-AUC | F1 | Recall@1% |
|---|---|---|---|---|---|
| XGBoost (Baseline) | Tabular only (AF1-93) | **0.669** | 0.888 | 0.699 | - |
| XGBoost + Node2Vec | Tabular + 64-dim embeddings | 0.656 | 0.861 | 0.688 | 17.5% |
| Random Forest | Tabular only | 0.658 | 0.877 | 0.694 | - |
| MLP | Tabular only | 0.364 | 0.830 | 0.486 | - |
Figure 2: Multi-metric comparison across models. Fusion (blue) consistently underperforms baseline XGBoost (green).
⚠️ Key Insight: The 2% performance drop (0.669 → 0.656) when adding graph embeddings indicates that tabular features already capture neighborhood information effectively.
Figure 3: Graph-Tabular Fusion pipeline showing leakage-free embedding generation and feature concatenation.
Fusion Protocol A (Implemented):
- Temporal splits: 60% train / 20% val / 20% test (from baseline)
- Graph embeddings: Node2Vec (64-dim) generated per-split to prevent leakage
- Tabular features: Local features (AF1-93) to avoid double-encoding
- Fusion: Concatenate embeddings + features → 157 total dimensions
- Model: XGBoost with early stopping on validation PR-AUC (sketched just below)
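A minimal sketch of the fusion step under these assumptions follows. It is illustrative only: the actual trainer lives in src/train/ and scripts/train_fusion.py, and the split indices (train_ids, val_ids), labels (y_train, y_val), and the first-93-columns layout are assumptions, not this repo's API.

```python
import pandas as pd
import xgboost as xgb

emb = pd.read_parquet("data/embeddings.parquet")                       # 64-dim Node2Vec vectors per txId
feats = pd.read_csv("data/Elliptic++ Dataset/txs_features.csv").set_index("txId")
local = feats.iloc[:, :93]                                             # AF1-93 local block (layout assumed)

def fuse(ids):
    # Concatenate tabular features with embeddings: 93 + 64 = 157 dimensions.
    return pd.concat([local.loc[ids], emb.loc[ids]], axis=1)

# train_ids / val_ids and y_train / y_val come from the baseline's temporal splits (assumed here).
pos_weight = float((y_train == 0).sum()) / max(1, (y_train == 1).sum())  # class weight from training data only

clf = xgb.XGBClassifier(
    n_estimators=1000,
    eval_metric="aucpr",             # early stopping monitors validation PR-AUC
    early_stopping_rounds=50,
    scale_pos_weight=pos_weight,
    random_state=42,
)
clf.fit(fuse(train_ids), y_train, eval_set=[(fuse(val_ids), y_val)], verbose=False)
```

Dropping the embedding columns from fuse() recovers the tabular-only baseline, which is what makes the two-column comparison in the table above a like-for-like test.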
Leakage Prevention:
- ✅ Embeddings computed separately for train/val/test using within-split edges only (see the sketch after this checklist)
- ✅ No future information used in random walks
- ✅ Same temporal splits as baseline for fair comparison
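One way to realize this per-split, within-split-edges protocol is sketched below using the community node2vec package (built on NetworkX + Gensim, as credited in the acknowledgements). The repo's own implementation is in src/embeddings/, so the walk parameters and split-id variables here are assumptions.

```python
import networkx as nx
import pandas as pd
from node2vec import Node2Vec   # pip package; NetworkX + Gensim under the hood

edges = pd.read_csv("data/Elliptic++ Dataset/txs_edgelist.csv")   # assumed columns: txId1, txId2

def embed_split(node_ids, dim=64, seed=42):
    # Keep only within-split edges: no information crosses the temporal boundary.
    mask = edges["txId1"].isin(node_ids) & edges["txId2"].isin(node_ids)
    g = nx.from_pandas_edgelist(edges[mask], source="txId1", target="txId2")
    g.add_nodes_from(node_ids)                       # isolated nodes still get a vector
    n2v = Node2Vec(g, dimensions=dim, walk_length=30, num_walks=10, seed=seed, quiet=True)
    model = n2v.fit(window=10, min_count=1, seed=seed)
    return pd.DataFrame({n: model.wv[str(n)] for n in g.nodes}).T.add_prefix("emb_")

# train_ids / val_ids / test_ids come from the baseline's temporal splits (assumed here).
emb_train, emb_val, emb_test = (embed_split(ids) for ids in (train_ids, val_ids, test_ids))
```

scripts/generate_embeddings.py is the actual entry point; the point of the sketch is simply that fitting one Node2Vec model per split-induced subgraph is what keeps the random walks leakage-free.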
Three reasons embeddings are redundant:

1. Tabular features already encode graph structure
   - Local features (AF1-93) capture transaction characteristics
   - Baseline aggregate features (AF94-182) explicitly encode neighbor statistics
   - Graph topology is implicitly represented in the data

2. Node2Vec embeddings approximate what the features already have
   - Random walk embeddings learn neighborhood structure
   - Similar patterns to the pre-computed neighbor aggregates
   - No unique signal beyond the tabular representation (a hypothetical redundancy probe is sketched after this list)

3. Rich feature engineering beats architectural complexity
   - 166 engineered features per node (local + aggregates)
   - Significant domain knowledge encoded in the features
   - Graph structure less informative than node attributes
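To make the second point testable, a hypothetical redundancy probe (not something this repository runs) could check how well the pre-computed aggregates linearly reconstruct each embedding dimension; a high R² would mean the embeddings add little unique signal. X_agg and E are assumed to be row-aligned arrays of aggregate features and Node2Vec embeddings.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# X_agg: (n_nodes, n_aggregate_features) neighbor aggregates (AF94-182),
# E:     (n_nodes, 64) Node2Vec embeddings, aligned by txId (assumed).
scores = [
    cross_val_score(Ridge(alpha=1.0), X_agg, E[:, d], cv=5, scoring="r2").mean()
    for d in range(E.shape[1])
]
print(f"Median R^2 of aggregates -> embedding dims: {np.median(scores):.2f}")
```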
Key Metrics (Test Set; see the evaluation sketch after this list):
- PR-AUC: 0.656 vs 0.669 baseline (-2%)
- ROC-AUC: 0.861 vs 0.888 baseline (-3%)
- F1 Score: 0.688 vs 0.699 baseline (-2%)
- Training Time: Similar (~2 minutes on CPU)
- Features: 157 vs 93 (the 64 added embedding dimensions contributed no value)
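These metrics follow the baseline's evaluation protocol; a self-contained sketch of how they can be computed with scikit-learn is below. The repo's own implementation lives in src/utils/ and src/eval/, so the helper name and the Recall@1% definition here are assumptions.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score, f1_score

def evaluate(y_true, y_score, threshold=0.5, k_frac=0.01):
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    y_pred = (y_score >= threshold).astype(int)

    # Recall@1%: fraction of all fraud captured in the top 1% highest-risk transactions.
    k = max(1, int(k_frac * len(y_score)))
    top_k = np.argsort(-y_score)[:k]
    recall_at_k = y_true[top_k].sum() / max(1, y_true.sum())

    return {
        "pr_auc": average_precision_score(y_true, y_score),
        "roc_auc": roc_auc_score(y_true, y_score),
        "f1": f1_score(y_true, y_pred),
        "recall@1%": float(recall_at_k),
    }

# e.g. metrics = evaluate(y_test, clf.predict_proba(X_test)[:, 1])
```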
Validation Performance:
- PR-AUC: 0.965 (excellent learning)
- Slight overfitting from validation to test
- Python 3.8+
- 2GB disk space for dataset + embeddings
- Optional: GPU for faster embedding generation (CPU works, ~30 min)
```bash
# 1️⃣ Clone repository
git clone https://github.com/BhaveshBytess/GraphTabular-FraudFusion.git
cd GraphTabular-FraudFusion

# 2️⃣ Setup environment
python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate
pip install -r requirements.txt

# 3️⃣ Download Elliptic++ dataset (NOT included)
# Get from: https://drive.google.com/drive/folders/1MRPXz79Lu_JGLlJ21MDfML44dKN9R08l
# Place files in: data/Elliptic++ Dataset/
# ├── txs_features.csv
# ├── txs_classes.csv
# └── txs_edgelist.csv

# 4️⃣ Verify dataset
python src/data/verify_dataset.py "data/Elliptic++ Dataset"

# 5️⃣ Generate embeddings (~30 min on CPU, ~5 min on GPU)
python scripts/generate_embeddings.py

# 6️⃣ Train fusion model (~2 min)
python scripts/train_fusion.py

# 7️⃣ View results
ls reports/         # Metrics and model
ls reports/plots/   # Visualizations
```

Expected Output:
- Embeddings: data/embeddings.parquet (70 MB, 203K nodes × 64 dims)
- Model: reports/xgb_fusion.json
- Metrics: reports/metrics.json (PR-AUC ≈ 0.656 ± 0.01; see the quick check below)
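A quick sanity check of these artifacts might look like the following; the exact schema of metrics.json is not assumed, so it is simply printed.

```python
import json
import pandas as pd

emb = pd.read_parquet("data/embeddings.parquet")
print("embeddings:", emb.shape)                  # expected on the order of 203K rows x 64 dims

with open("reports/metrics.json") as f:
    print(json.dumps(json.load(f), indent=2))    # fusion PR-AUC should land near 0.656
```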
```text
graph-tabular-fusion/
├── data/
│   ├── Elliptic++ Dataset/            # User-provided (see Quick Start)
│   └── embeddings.parquet             # Generated Node2Vec embeddings
├── notebooks/
│   ├── 01_generate_embeddings.ipynb   # Kaggle-ready
│   ├── 02_fusion_xgb.ipynb            # Kaggle-ready
│   └── 03_ablation_studies.ipynb      # Optional experiments
├── src/
│   ├── data/                          # Loaders, splits, verification
│   ├── embeddings/                    # Node2Vec implementation
│   ├── train/                         # XGBoost fusion trainer
│   ├── eval/                          # Comparison reports
│   └── utils/                         # Metrics, seeding, logging
├── configs/                           # YAML configurations
├── reports/
│   ├── metrics.json                   # Evaluation results
│   ├── metrics_summary.csv            # Consolidated comparison
│   ├── plots/                         # Visualizations
│   └── xgb_fusion.json                # Trained model
├── scripts/                           # Execution pipelines
└── docs/                              # Specifications, provenance
```
- Seed: 42 (fixed for all random operations)
- Splits: Temporal 60/20/20 (imported from baseline; a minimal sketch follows these lists)
- Embeddings: Deterministic Node2Vec (fixed seed)
- Metrics: Same evaluation protocol as baseline
- Per-split embedding generation: Train/val/test embeddings computed independently
- Within-split edges only: No cross-split information in random walks
- Temporal isolation: No future information leaks to past
- Same splits as baseline (exact txId alignment)
- Same metrics (PR-AUC, ROC-AUC, F1, Recall@K)
- Same class weighting (computed from training data)
- No hyperparameter tuning (baseline config reused)
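For reference, here is a minimal sketch of the fixed-seed, temporal 60/20/20 setup. The canonical splits are imported from the baseline repository, so the time-column name and the quantile cut points below are illustrative assumptions.

```python
import random
import numpy as np
import pandas as pd

SEED = 42
random.seed(SEED)
np.random.seed(SEED)

def temporal_split(df: pd.DataFrame, time_col: str = "Time step"):
    """Cut at the 60% / 80% time quantiles so later periods never leak into earlier ones."""
    t = df[time_col]
    t_train, t_val = t.quantile(0.60), t.quantile(0.80)
    train = df[t <= t_train]
    val = df[(t > t_train) & (t <= t_val)]
    test = df[t > t_val]
    return train, val, test
```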
- Use tabular features alone - simpler, faster, equally effective
- Graph embeddings ≠ automatic improvement - validate with strong baselines
- Feature engineering > model complexity for fraud detection
- XGBoost on rich features often beats sophisticated graph methods
- Negative results are valuable - demonstrate when fusion doesn't help
- Baseline comparison is critical - always test against best tabular methods
- Feature redundancy matters - check what's already in your data
- Honest reporting builds credibility - report findings, not hopes
- Production-ready: Simpler XGBoost preferred (no embeddings needed)
- Deployment: Tabular-only approach easier to maintain and debug
- Cost: Save computation (no embedding generation required)
- Interpretability: XGBoost feature importance more actionable
This work contributes:
- Empirical validation of when graph methods don't help
- Rigorous methodology for fusion model evaluation
- Honest reporting of negative results (often unpublished)
- Reproducible pipeline for graph-tabular fusion experiments
- Portfolio demonstration of scientific thinking and rigor
Publication-worthy aspects:
- Leakage-free temporal evaluation framework
- Comprehensive baseline comparison
- Clear interpretation of negative results
- Reproducible experimental design
- Practical guidance for practitioners
This extension builds on the baseline project:
Baseline Repository: FRAUD-DETECTION-GNN
Baseline Finding:
"XGBoost (tabular) beats GraphSAGE (GNN) by 49% because features already encode neighbor information."
This Extension Validates:
"Adding explicit graph embeddings doesn't help because tabular features already capture graph structure."
Provenance: See docs/baseline_provenance.json for baseline commit SHA and imported artifacts.
Not implemented but interesting:
- Protocol B: Test with full features (AF1-182) + embeddings
- Embedding dimensions: Sweep 16/32/128 (does size matter?)
- GraphSAGE export: Compare supervised vs unsupervised embeddings
- MLP fusion learner: Alternative to XGBoost
- Explainability: SHAP analysis on fusion features
- Temporal embeddings: Time-aware graph learning
- Cross-dataset: Test on Ethereum phishing networks
If you use this work, please cite:

```bibtex
@software{graphtabular_fusion_2025,
  title={Graph-Tabular Fusion on Elliptic++ Bitcoin Fraud Detection},
  author={Your Name},
  year={2025},
  url={https://github.com/BhaveshBytess/GraphTabular-FraudFusion}
}
```

Dataset Citation:

```bibtex
@article{weber2019anti,
  title={Anti-money laundering in bitcoin: Experimenting with graph convolutional networks for financial forensics},
  author={Weber, Mark and Domeniconi, Giacomo and Chen, Jie and Weidele, Daniel Karl I and Bellei, Claudio and Robinson, Tom and Leiserson, Charles E},
  journal={arXiv preprint arXiv:1908.02591},
  year={2019}
}
```

MIT License - See LICENSE for details.
Educational/demonstrative use. Respect Elliptic++ dataset terms and conditions.
- Elliptic for the Elliptic++ dataset
- Baseline project for splits, metrics, and utilities
- PyTorch Geometric & XGBoost communities
- NetworkX & Gensim for Node2Vec implementation
For questions, issues, or collaboration:
- GitHub Issues: Open an issue
- Email: [Your email]
- LinkedIn: [Your profile]
Status: ✅ Complete (E1-E3) | Results validated | Portfolio-ready
Last Updated: November 2025 | Version: 1.0.0




