Advanced machine learning approach for automated breast cancer diagnosis using neural networks
Dataset • Features • Quick Start • Results • Documentation • Deployment • Testing
- Get Started: Run `python setup.py` for automated setup
- Learn: Explore comprehensive documentation (15,000+ lines)
- Train Model: Open `models/training.ipynb` in Jupyter
- Deploy: Check deployment guide for production
- Contribute: Read contributing guidelines
- Help: Check FAQ for common questions
This project implements a sophisticated neural network classifier for breast cancer diagnosis using the Wisconsin Breast Cancer Dataset. The system analyzes cell nucleus characteristics from Fine Needle Aspiration (FNA) samples to distinguish between malignant and benign breast masses with high accuracy.
- High Accuracy: Achieves 96.5% classification accuracy with comprehensive evaluation
- Clinical Relevance: Based on real medical diagnostic procedures (FNA)
- Deep Learning: Advanced neural network architectures with regularization
- Comprehensive Documentation: 15,000+ lines of detailed documentation
- Production Ready: Complete deployment pipeline from local to cloud
- Educational Focus: Structured for learning with detailed explanations
- Automated Setup: One-command project setup with environment verification
- Testing Framework: Comprehensive testing with 90%+ coverage goals
- Multiple Deployment Options: Jupyter, Web App, API, Docker, Cloud platforms
Breast cancer is the most commonly diagnosed cancer among women worldwide. Early detection through accurate diagnosis is crucial for successful treatment outcomes. This project uses machine learning to help medical professionals make more consistent and accurate diagnoses based on quantitative cell nucleus analysis.
The dataset is based on FNA samples, a minimally invasive procedure that:
- Uses a thin needle to extract cell samples from breast masses
- Provides cell nucleus characteristics for analysis
- Enables computer-aided diagnosis to support medical decisions
The Wisconsin Breast Cancer Dataset contains 569 samples with 30 features each, derived from digitized images of FNA samples. Each feature represents a characteristic of cell nuclei present in the image.
- Radius: Mean of distances from center to points on the perimeter
- Texture: Standard deviation of gray-scale values
- Perimeter: Nucleus perimeter measurements
- Area: Nucleus area measurements
- Smoothness: Local variation in radius lengths
- Compactness: Perimeter² / area - 1.0
- Concavity: Severity of concave portions of the contour
- Concave Points: Number of concave portions of the contour
- Symmetry: Nucleus symmetry measurements
- Fractal Dimension: "Coastline approximation" - 1
Each category includes:
- Mean: Average value
- Standard Error: Standard error of the mean
- Worst: Mean of the three largest values
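The 30 features follow from the two lists above: 10 nucleus characteristics, each reported as three statistics. The sketch below enumerates them and works through the compactness formula for a perfect circle. The feature-name strings and the `compactness` helper are illustrative assumptions, not the column names used in `breast_cancer_dataset.csv`.

```python
import math

# The ten nucleus characteristics and three statistics described above.
CHARACTERISTICS = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal dimension",
]
STATISTICS = ["mean", "standard error", "worst"]

# Illustrative naming scheme (an assumption): "<statistic> <characteristic>".
feature_names = [f"{stat} {name}" for stat in STATISTICS for name in CHARACTERISTICS]
print(len(feature_names))  # 10 characteristics x 3 statistics


def compactness(perimeter: float, area: float) -> float:
    """Compactness as defined above: perimeter^2 / area - 1.0."""
    return perimeter ** 2 / area - 1.0


# Worked example: for a circle, perimeter^2 / area = 4*pi regardless of
# radius, so the compactness is 4*pi - 1, the smallest value any shape can have.
r = 3.0
circle_compactness = compactness(2 * math.pi * r, math.pi * r ** 2)
print(round(circle_compactness, 3))
```

Irregular contours have a longer perimeter for the same area, so their compactness is larger than the circle's value.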
- Python 3.8 or higher
- pip package manager
- Jupyter Notebook (optional, for interactive analysis)
- Clone the repository
  git clone https://github.com/NhanPhamThanh-IT/Neural-Network-Breast-Cancer-Classification.git
  cd Neural-Network-Breast-Cancer-Classification
- Automated Setup (Recommended)
  python setup.py
This automated script will:
- Check Python version (3.8+ required)
- Install all dependencies from requirements.txt
- Verify package installations
- Test imports and functionality
- Check TensorFlow GPU availability
- Set up custom Jupyter kernel
- Create necessary project directories
- Provide next steps guidance
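The first few checks in this list can be approximated with the standard library alone. This is a minimal sketch, not the contents of `setup.py`: the function names and the list of modules to probe are assumptions chosen for illustration.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the prerequisites


def python_version_ok(version_info=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    current = tuple((version_info or sys.version_info)[:2])
    return current >= minimum


def missing_packages(module_names):
    """Return the top-level modules that cannot be found on this interpreter."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Hypothetical module list; the real one would come from requirements.txt.
required = ["tensorflow", "pandas", "numpy"]

print("Python version OK:", python_version_ok())
print("Missing packages:", missing_packages(required))
```

`importlib.util.find_spec` only locates a module without importing it, so the check stays fast even for heavy packages such as TensorFlow.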
- Manual Setup (Alternative)
  # Create virtual environment (recommended)
  python -m venv venv
  # Windows
  venv\Scripts\activate
  # macOS/Linux
  source venv/bin/activate
  # Install dependencies
  pip install -r requirements.txt
  # For development (optional)
  pip install -r requirements-dev.txt
python setup.py
This script will automatically:
- Check Python version compatibility
- Install all required dependencies
- Verify package installations
- Set up Jupyter kernel
- Create necessary directories
- Run basic functionality tests
- Install dependencies
  pip install -r requirements.txt
- Launch Jupyter Notebook
  jupyter notebook
- Open the training notebook
  - Navigate to `models/training.ipynb`
  - Run all cells to train and evaluate the model
- Explore the results
  - View model performance metrics
  - Analyze feature importance
  - Examine prediction confidence
Neural-Network-Breast-Cancer-Classification/
├── docs/                          # Comprehensive documentation
│   ├── dataset.md                 # Dataset analysis and medical context (1,237 lines)
│   ├── neural-network.md          # Neural network theory and implementation (875 lines)
│   ├── pandas.md                  # Data manipulation guide (875 lines)
│   └── tensorflow.md              # TensorFlow implementation reference (897 lines)
├── models/                        # Model training and data
│   ├── training.ipynb             # Main training notebook (46 cells)
│   ├── breast_cancer_dataset.csv  # Wisconsin Breast Cancer Dataset
│   ├── saved_models/              # Trained model files (created during training)
│   └── checkpoints/               # Training checkpoints (created during training)
├── logs/                          # Training logs and metrics (auto-created)
├── outputs/                       # Generated outputs and results (auto-created)
├── plots/                         # Visualization plots (auto-created)
├── requirements.txt               # Production dependencies
├── requirements-dev.txt           # Development and testing dependencies
├── setup.py                       # Automated project setup script (250+ lines)
├── README.md                      # Project overview (this file)
├── CONTRIBUTING.md                # Comprehensive contribution guidelines
├── CHANGELOG.md                   # Version history and release notes
├── API.md                         # Complete API documentation (800+ lines)
├── TESTING.md                     # Testing framework and strategies (1,000+ lines)
├── DEPLOYMENT.md                  # Deployment guide from local to cloud (1,200+ lines)
├── FAQ.md                         # Frequently asked questions (800+ lines)
├── PROJECT_STRUCTURE.md           # Detailed project organization guide
├── PROJECT_SUMMARY.md             # Complete project overview and highlights
└── LICENSE                        # MIT License
- Total Documentation: 15,000+ lines across multiple files
- Comprehensive Guides: 4 detailed technical guides in docs/
- API Reference: Complete API documentation with examples
- Testing Documentation: Full testing framework and best practices
- Deployment Guide: From local development to cloud production
The model implements a sophisticated deep learning architecture:
- Input Layer: 30 features (cell nucleus characteristics)
- Hidden Layers: Multiple dense layers with dropout for regularization
- Activation Functions: ReLU for hidden layers, Sigmoid for output
- Optimization: Adam optimizer with adaptive learning rate
- Loss Function: Binary crossentropy for binary classification
- Dropout Regularization: Prevents overfitting
- Batch Normalization: Stabilizes training
- Early Stopping: Prevents overtraining
- Learning Rate Scheduling: Adaptive learning rate adjustment
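To make the layer and loss choices above concrete, here is a dependency-free sketch of a forward pass through dense layers with ReLU hidden activations, a sigmoid output, and binary cross-entropy. The hidden layer sizes (16 and 8), the random weights, and all function names are assumptions for illustration; the notebook builds and trains its network with TensorFlow, and dropout and batch normalization are omitted here.

```python
import math
import random


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    # Numerically stable form that avoids overflow for large negative x.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W @ inputs + b)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Random weights scaled by sqrt(2 / n_in), zero biases."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Loss for a single sample; probabilities are clipped to avoid log(0)."""
    p = min(max(y_prob, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


rng = random.Random(0)
layer_sizes = [30, 16, 8, 1]  # 30 inputs, two assumed hidden layers, 1 output
layers = [init_layer(a, b, rng) for a, b in zip(layer_sizes, layer_sizes[1:])]


def predict(features):
    """Return the probability of the positive class for one sample."""
    activations = features
    for index, (weights, biases) in enumerate(layers):
        is_output = index == len(layers) - 1
        activations = dense(activations, weights, biases, sigmoid if is_output else relu)
    return activations[0]


sample = [rng.random() for _ in range(30)]  # stand-in for one scaled feature vector
probability = predict(sample)
loss = binary_crossentropy(1, probability)
print(probability, loss)
```

The sigmoid output keeps the prediction in (0, 1), which is what lets binary cross-entropy treat it as a class probability.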
| Metric | Score |
|---|---|
| Accuracy | 96.5% |
| Precision | 95.8% |
| Recall | 97.2% |
| F1-Score | 96.5% |
| AUC-ROC | 0.987 |
- Most Important Features: Worst perimeter, worst area, worst concave points
- Model Robustness: Consistent performance across different data splits
- Clinical Relevance: Results align with medical understanding of cancer characteristics
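The reported metrics are related by simple formulas, and the F1-score can be cross-checked from the precision and recall in the table. The helper names and the confusion-matrix counts in this sketch are hypothetical; only the 95.8% and 97.2% figures come from the results above.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are truly positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of true positives that the model finds."""
    return tp / (tp + fn)


def accuracy(tp, tn, fp, fn):
    """Fraction of all samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Cross-check: precision 95.8% and recall 97.2% from the table.
reported_f1 = f1_score(0.958, 0.972)
print(round(reported_f1 * 100, 1))  # 96.5, matching the table

# Hypothetical confusion-matrix counts, for illustration only.
tp, tn, fp, fn = 40, 40, 10, 10
print(precision(tp, fp), recall(tp, fn), accuracy(tp, tn, fp, fn))
```

In a diagnostic setting recall is usually watched most closely, because a false negative means a malignant mass goes undetected.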
- One-Command Setup: `python setup.py` handles everything
- Environment Verification: Python version and dependency checks
- GPU Detection: Automatic TensorFlow GPU configuration detection
- Jupyter Integration: Custom kernel setup for the project
- Directory Creation: Automatic creation of necessary project directories
- Comprehensive exploratory data analysis
- Statistical correlation analysis
- Feature importance ranking
- Data visualization and insights
- Medical context integration
- Deep neural network implementation
- Advanced preprocessing pipeline
- Cross-validation and model selection
- Hyperparameter optimization
- Regularization techniques (Dropout, Batch Normalization)
- Multiple performance metrics
- Confusion matrix analysis
- ROC curve and AUC analysis
- Model interpretability features
- Comprehensive testing framework
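For the ROC and AUC analysis mentioned above, one way to read the AUC is as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison. The function name and the toy labels and scores are assumptions; a library routine would normally be used in practice.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: three of the four positive/negative pairs are ordered correctly.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a dataset of 569 rows.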
- Modular and maintainable code
- Comprehensive documentation (15,000+ lines)
- Production-ready implementation
- Easy reproducibility
- CI/CD ready with testing framework
- Local Development: Jupyter notebook interface
- Web Application: Flask-based web interface
- API Deployment: RESTful API with health checks
- Docker: Containerized deployment
- Cloud Platforms: AWS, GCP, Azure deployment guides
- Model Serving: TensorFlow Serving for production
Comprehensive documentation is available throughout the project:
- Dataset Guide: Complete dataset analysis and medical context (1,237 lines)
- Neural Network Guide: Architecture and implementation details (875 lines)
- Pandas Guide: Data manipulation and preprocessing techniques (875 lines)
- TensorFlow Guide: Deep learning implementation with TensorFlow (897 lines)
- API Reference: Complete API documentation with examples (800+ lines)
- Testing Guide: Comprehensive testing framework and strategies (1,000+ lines)
- Deployment Guide: From local to cloud deployment (1,200+ lines)
- Contributing Guidelines: Complete development guidelines
- FAQ: Frequently asked questions and troubleshooting (800+ lines)
- Project Structure: Detailed project organization guide
- Project Summary: Complete project overview and highlights
- Changelog: Version history and release notes
This project serves as a comprehensive learning resource for:
- Machine Learning Students: Complete ML pipeline implementation
- Medical AI Researchers: Healthcare applications and considerations
- Data Scientists: Professional-grade project structure and documentation
- Software Engineers: Production deployment and testing strategies
We welcome contributions! This project includes comprehensive guidelines for contributors.
- Fork the repository
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Please read our detailed guides:
- Contributing Guidelines: Complete development setup and processes
- Testing Guide: Testing framework and best practices
- API Documentation: API reference for developers
- Project Structure: Understanding the project organization
# Clone and setup development environment
git clone https://github.com/NhanPhamThanh-IT/Neural-Network-Breast-Cancer-Classification.git
cd Neural-Network-Breast-Cancer-Classification
# Run automated setup
python setup.py
# Install development dependencies
pip install -r requirements-dev.txt
# Run tests
pytest
# Format code
black .
flake8 .
- Bug Reports: Report issues or unexpected behavior
- Feature Requests: Suggest improvements or new features
- Documentation: Improve or expand documentation
- Testing: Add or improve tests
- Code: Fix bugs or implement new features
- Data Analysis: Improve data processing or analysis
- Model Improvements: Enhance neural network architecture
- Follow PEP 8 style guidelines
- Add comprehensive docstrings
- Include unit tests for new features
- Update documentation as needed
- Use conventional commit messages
This project is licensed under the MIT License - see the LICENSE file for details.
- University of Wisconsin: For providing the breast cancer dataset
- Medical Community: For advancing computer-aided diagnosis research
- Open Source Community: For the amazing tools and libraries
- Contributors: Thank you to all who contribute to this educational resource
Important: This project is for educational and research purposes only. It should never be used for actual medical diagnosis or treatment decisions. Always consult qualified healthcare professionals for medical advice, diagnosis, or treatment.
This project serves multiple educational purposes:
- Academic Research: Baseline implementation for medical AI research
- Learning Resource: Complete ML pipeline for students and professionals
- Best Practices: Professional-grade project structure and documentation
- Industry Reference: Production-ready deployment strategies
- 15,000+ lines of comprehensive documentation
- Production-ready deployment configurations
- Comprehensive testing framework with 90%+ coverage goals
- Multi-platform support (Windows, macOS, Linux)
- Cloud deployment ready (AWS, GCP, Azure)
- Educational focus with detailed explanations throughout
Nhan Pham Thanh
- GitHub: @NhanPhamThanh-IT
- Project Link: Neural-Network-Breast-Cancer-Classification
⭐ Star this repository if you find it helpful!
Made with ❤️ for advancing medical AI research