This repository demonstrates an end-to-end MLOps pipeline for a sentiment analysis model trained on the IMDB movie review dataset. It applies software engineering and machine learning practices (version control, experiment tracking, data versioning, CI/CD, and containerization) to build a reproducible, deployable ML system.

| Category | Tools/Services Used |
|---|---|
| Version Control | Git, GitHub |
| Virtual Environment | Conda |
| Experiment Tracking | MLflow (hosted on DagsHub) |
| Data Versioning | DVC |
| Model Serving | Flask API |
| CI/CD | GitHub Actions |
| Containerization | Docker, Docker Hub |
| Cloud Storage / Registry | AWS S3 (DVC remote), AWS ECR (container image registry) |
Clone the repository and create the Conda environment:

```bash
git clone https://github.com/AdArya125/MLOPS-Capstone-Project.git && cd MLOPS-Capstone-Project
conda create -n atlas python=3.10 -y
conda activate atlas
pip install cookiecutter dagshub mlflow "dvc[s3]" awscli flask pipreqs
```

Generate the project template with Cookiecutter Data Science (v1) and rename `src/models` to `src/model`:

```bash
cookiecutter -c v1 https://github.com/drivendata/cookiecutter-data-science
mv src/models src/model
```

Set up experiment tracking on DagsHub:

- Create a DagsHub repo and connect your GitHub repo.
- Copy the MLflow tracking URI and auth token.
- Run the experiment notebooks and track metrics via MLflow.
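For a binary sentiment classifier, "tracking metrics" usually means logging accuracy, precision, recall and F1 per run. The sketch below computes them in plain Python; the function name and the convention of returning 0.0 when a denominator is zero are assumptions made here, not necessarily what the project's notebooks do.

```python
def binary_classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels.

    Returns a flat dict of floats, the shape that mlflow.log_metrics() accepts.
    Zero denominators yield 0.0 (a convention chosen for this sketch).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Tally the confusion-matrix cells relative to the positive label.
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Example: 1 = positive review, 0 = negative review.
metrics = binary_classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
# accuracy = 4/6; precision = recall = f1 = 2/3
```

Inside a `with mlflow.start_run():` block, the whole dict can be logged in one call with `mlflow.log_metrics(metrics)`.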
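A minimal tracking setup (MLflow reads `MLFLOW_TRACKING_USERNAME` and `MLFLOW_TRACKING_PASSWORD` for authentication; set them to your DagsHub username and token before running):

```python
from dagshub import dagshub_logger  # DagsHub logging helper
import mlflow

# Point MLflow at the DagsHub-hosted tracking server.
mlflow.set_tracking_uri("https://dagshub.com/<user>/<repo>.mlflow")
mlflow.set_experiment("sentiment-analysis")

with mlflow.start_run():
    ...  # training code; log results with mlflow.log_param / mlflow.log_metric
```

Initialize DVC with a local directory as the default remote:

```bash
dvc init
mkdir local_s3
dvc remote add -d mylocal local_s3
```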
- Implement the pipeline components inside `src/`: `logger.py`, `data_ingestion.py`, `data_preprocessing.py`, `feature_engineering.py`, `model_building.py`, `model_evaluation.py`, `register_model.py`.
- Add `dvc.yaml` and `params.yaml`.
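As an illustration of what `data_preprocessing.py` might contain, here is a minimal text-cleaning function for IMDB reviews. The function name, the regular expressions and the small stop-word list are assumptions for this sketch; negations such as "not" are deliberately kept because they carry sentiment.

```python
import re

# Hypothetical, deliberately small stop-word list. Negations such as "not"
# and "no" are intentionally absent because they flip sentiment.
STOP_WORDS = frozenset(
    {"a", "an", "the", "and", "or", "is", "was", "it", "this", "of", "to", "in"}
)

HTML_TAG = re.compile(r"<[^>]+>")  # IMDB reviews contain <br /> tags
URL = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER = re.compile(r"[^a-z\s]")


def clean_review(text: str) -> str:
    """Normalize one raw review into space-separated lowercase tokens."""
    text = text.lower()
    text = HTML_TAG.sub(" ", text)
    text = URL.sub(" ", text)
    text = text.replace("'", "")  # "didn't" -> "didnt"
    text = NON_LETTER.sub(" ", text)  # punctuation and digits -> spaces
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


print(clean_review("This movie was NOT good.<br /><br />I didn't like it: http://example.com"))
# movie not good i didnt like
```

If the reviews are loaded into a pandas DataFrame with a `review` column (an assumed column name), the function can be applied with `df["review"].apply(clean_review)`.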
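Run the pipeline, check its status, and commit:

```bash
dvc repro
dvc status
git add .
git commit -m "DVC and MLFlow tracking setup"
git push
```

To use S3 as the DVC remote, configure AWS credentials and add the bucket as the default remote:

```bash
aws configure
dvc remote add -d myremote s3://<your-bucket-name>
```

Ensure your IAM user has the `AmazonS3FullAccess` policy.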
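Conceptually, `dvc status` and `dvc repro` decide what to re-run by comparing the content hashes recorded in `dvc.lock` against the current files. The sketch below reproduces only that idea in plain Python; it is not DVC's implementation, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_md5(path):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stale_dependencies(recorded):
    """Return paths whose current MD5 no longer matches the recorded one.

    `recorded` maps file path -> MD5 from the last successful run, playing
    the role dvc.lock plays for a real stage. Missing files count as stale.
    """
    return [
        path
        for path, old_md5 in recorded.items()
        if not Path(path).is_file() or file_md5(path) != old_md5
    ]
```

In this model a stage needs re-running when `stale_dependencies(...)` is non-empty. DVC itself also compares the stage's parameter values from `params.yaml`, its outputs and its command.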
Run the Flask app locally:

```bash
cd flask_app
pip install flask
python app.py
```

Set up CI with GitHub Actions:

- Add `.github/workflows/ci.yaml`.
- Generate a DagsHub token and save it in GitHub secrets as `CAPSTONE_TEST`.
- Add the AWS-related secrets:
  - `AWS_ACCESS_KEY_ID`
  - `AWS_SECRET_ACCESS_KEY`
  - `AWS_REGION`
  - `AWS_ACCOUNT_ID`
  - `ECR_REPOSITORY`
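A missing secret in CI usually surfaces later as a confusing authentication error. A small preflight check can fail fast instead. The sketch below assumes the workflow maps each GitHub secret into the job's environment under the same name (GitHub Actions does not do this automatically; it needs an `env:` mapping); the function names are hypothetical.

```python
import os

# Secret names listed in this README. The check assumes each one is exposed
# to the CI job as an environment variable of the same name.
REQUIRED_VARS = (
    "CAPSTONE_TEST",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION",
    "AWS_ACCOUNT_ID",
    "ECR_REPOSITORY",
)


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def require_env_vars(env=None, required=REQUIRED_VARS):
    """Raise RuntimeError listing missing variable names; values are never included."""
    missing = missing_env_vars(env, required)
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
```

Calling `require_env_vars()` at the start of a CI step stops the job with a clear message that names only the missing variables, so no secret values end up in the logs.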
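Generate `flask_app/requirements.txt`, then build and run the Docker image:

```bash
pipreqs flask_app --force
docker build -t capstone-app:latest .
docker run -p 8888:5000 -e CAPSTONE_TEST=your_token capstone-app:latest
```

The app is then served at http://localhost:8888 (host port 8888 maps to port 5000 in the container).

(Optional) Push the image to Docker Hub. After `docker login`, tag the local image with your Docker Hub namespace before pushing:

```bash
docker tag capstone-app:latest youruser/capstone-app:latest
docker push youruser/capstone-app:latest
```

For AWS ECR:

- Create an ECR repository in AWS.
- Ensure the IAM user has the `AmazonEC2ContainerRegistryFullAccess` policy.
- Add ECR build/push steps to the CI pipeline.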
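The `docker run` command passes the DagsHub token into the container as `CAPSTONE_TEST`. One way the Flask app could turn that token into MLflow credentials is sketched below. The function name and passing the DagsHub username as an argument are assumptions; `MLFLOW_TRACKING_USERNAME` and `MLFLOW_TRACKING_PASSWORD` are the environment variables MLflow reads for HTTP basic auth.

```python
import os


def configure_dagshub_auth(username, env=None):
    """Export MLflow basic-auth credentials built from the CAPSTONE_TEST token.

    MLflow reads MLFLOW_TRACKING_USERNAME and MLFLOW_TRACKING_PASSWORD for
    HTTP basic auth; the DagsHub token serves as the password.
    """
    env = os.environ if env is None else env
    token = env.get("CAPSTONE_TEST")
    if not token:
        raise RuntimeError(
            "CAPSTONE_TEST is not set; pass it with `docker run -e CAPSTONE_TEST=...`"
        )
    env["MLFLOW_TRACKING_USERNAME"] = username
    env["MLFLOW_TRACKING_PASSWORD"] = token
    return env
```

In `app.py`, calling `configure_dagshub_auth("<user>")` before `mlflow.set_tracking_uri(...)` would let the containerized app authenticate to the DagsHub MLflow server without baking the token into the image.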
Use the `tests/` and `scripts/` directories for unit tests, and run them with pytest:

```bash
pytest tests/
```

Aditya Arya 🌐 LinkedIn 🔗 GitHub
This project is licensed under the MIT License. See the LICENSE file for details.