A product-focused Data Scientist with over 6 years of experience building and shipping ML systems for CPG, retail, and real-estate domains. Proven expertise in demand forecasting, anomaly detection, and trade-promotion optimization, having trained 300 000+ models on multi-TB data to deliver $2.4 M in client savings. End-to-end ownership across problem framing, distributed model development, API deployment, monitoring, and stakeholder enablement.
- Domains: CPG · Retail · Real Estate
- Impact: 300 000+ models · $2.4 M savings · 100 M+ rows/day pipelines
- End-to-End: Framing · Feature Engineering · Model Training · Deployment · Monitoring · Enablement


| Category | Skills |
|---|---|
| Machine Learning & Analytics | Forecasting · Time Series · Anomaly Detection · Trade Promotion Optimization · Product ML · Model Monitoring |
| Languages & Frameworks | Python · SQL · PySpark · Pandas · NumPy · scikit-learn · PyTorch · TensorFlow · FastAPI · Streamlit |
| Data & Infrastructure | Databricks · Spark · Dask · MongoDB · MySQL · Azure · AWS · Docker · CI/CD · Git · Hyperparameter Tuning · Distributed Training |
| NLP & LLMs | RAG · Vector Stores · Semantic Search · OpenAI · Hugging Face |
| Tools & Libraries | SHAP · PyOD · Vaex · Plotly · pre-commit |

| Repository | Description |
|---|---|
| resume-builder | AI-powered resume optimization for ATS compatibility and job matching, using LLMs and semantic parsing. |
| Comparison_Segmentation_models | Comparative study of segmentation models, served via FastAPI and Streamlit, Dockerized. |
| FaceMatcher | Face recognition and matching utility using Python CV libraries. |
| Optical-Character-Recognition | Benchmarking multiple OCR libraries for accuracy and performance. |
| Projects | Sandbox for miscellaneous personal and experimental projects. |
| Linux_Programming | C language examples and projects for Linux system programming. |
| Learn_Python | Curated Python learning exercises and examples. |
| Learn_C-CPP | C/C++ data structure implementations and project assignments. |
| Machine-Learning | Collection of Python notebooks covering classic ML algorithms and tutorials. |
| remote_car | Control a car via Wi-Fi, Bluetooth, and computer vision. |
| InterFusion_updated | Fork of KDD’21 “Multivariate Time Series Anomaly Detection and Interpretation” with hierarchical embedding. |
| MNIST_AutoEncoders | Autoencoder architectures for MNIST digit compression and reconstruction. |
| Segmentation_frontend | Streamlit-based frontend for segmentation use cases (Heroku-deployed). |
| Segmentation_backend | Backend API for segmentation model inference, decoupled service design. |
| style-transfer | FastAPI + Streamlit web app for neural style transfer, Docker-ready. |
| ashish-surve.github.io | Personal website codebase built with JavaScript, HTML, SCSS, and CSS. |
| Data-Science--Cheat-Sheet | Forked collection of cheat sheets covering core data science concepts and commands. |
- Trained 300 000+ models on multi-TB datasets, yielding $2.4 M in client savings.
- Scaled forecasting pipelines to process 100 M+ rows per day.
- Reduced anomaly-detection false positives by 35% through advanced monitoring.
- Problem Framing & Data Ingestion
- Distributed Feature Engineering (PySpark/Databricks)
- Model Development & Training (scikit-learn, PyTorch, TensorFlow)
- Hyperparameter Tuning & Automated CI/CD
- API Deployment (FastAPI + Docker)
- Monitoring & Alerting (Prometheus, Custom Dashboards)
- Stakeholder Enablement
- GitHub: github.com/Ashish-Surve
- LinkedIn: linkedin.com/in/ashish-surve
- Email: [email protected]
Let’s leverage data to drive impactful solutions!

