This repository was archived by the owner on Jul 31, 2025. It is now read-only.

🩺 Advanced neural network for breast cancer classification using Wisconsin dataset. Analyzes cell nucleus characteristics from FNA samples to distinguish malignant/benign masses with 96.5% accuracy. Features comprehensive documentation, automated setup, testing framework, and deployment guides. Educational ML project with 15,000+ lines of docs.


NhanPhamThanh-IT/Neural-Network-Breast-Cancer-Classification


🧠 Neural Network Breast Cancer Classification


Advanced machine learning approach for automated breast cancer diagnosis using neural networks

📊 Dataset • 🔬 Features • 🚀 Quick Start • 📈 Results • 📖 Documentation • 🚒 Deployment • 🧪 Testing



🎯 Project Overview

This project implements a neural network classifier for breast cancer diagnosis using the Wisconsin Breast Cancer Dataset. The system analyzes cell nucleus characteristics from Fine Needle Aspiration (FNA) samples to distinguish malignant from benign breast masses, with a reported classification accuracy of 96.5%.

🌟 Key Highlights

  • High Accuracy: Achieves 96.5% classification accuracy with comprehensive evaluation
  • Clinical Relevance: Based on real medical diagnostic procedures (FNA)
  • Deep Learning: Advanced neural network architectures with regularization
  • Comprehensive Documentation: 15,000+ lines of detailed documentation
  • Production Ready: Complete deployment pipeline from local to cloud
  • Educational Focus: Structured for learning with detailed explanations
  • Automated Setup: One-command project setup with environment verification
  • Testing Framework: Comprehensive testing with 90%+ coverage goals
  • Multiple Deployment Options: Jupyter, Web App, API, Docker, Cloud platforms

🔬 Medical Context

Breast cancer is the most commonly diagnosed cancer among women worldwide. Early detection through accurate diagnosis is crucial for successful treatment outcomes. This project uses machine learning to help medical professionals make more consistent diagnoses based on quantitative cell nucleus analysis.

🩺 Fine Needle Aspiration (FNA)

The dataset is based on FNA samples, a minimally invasive procedure that:

  • Uses a thin needle to extract cell samples from breast masses
  • Provides cell nucleus characteristics for analysis
  • Enables computer-aided diagnosis to support medical decisions

📊 Dataset

The Wisconsin Breast Cancer Dataset contains 569 samples with 30 features each, derived from digitized images of FNA samples. Each feature represents a characteristic of cell nuclei present in the image.
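As a rough illustration of the layout described above, the following sketch splits CSV rows into numeric feature lists and binary labels using only the standard library. The column names `id` and `diagnosis` and the `M`/`B` label encoding are assumptions borrowed from common distributions of this dataset; the CSV shipped in `models/` may use different headers. The two sample rows are made up and truncated to two feature columns for brevity.

```python
import csv
import io

def parse_rows(handle, label_column="diagnosis", skip_columns=("id",)):
    """Split CSV rows into numeric feature lists and 0/1 labels.

    The column names and the "M"/"B" encoding are assumptions, not
    confirmed details of the project's CSV file.
    """
    features, labels = [], []
    for row in csv.DictReader(handle):
        # Malignant ("M") is treated as the positive class.
        labels.append(1 if row[label_column] == "M" else 0)
        features.append([
            float(value)
            for name, value in row.items()
            if name and name != label_column
            and name not in skip_columns and value not in ("", None)
        ])
    return features, labels

# Made-up sample with two of the 30 feature columns.
sample = io.StringIO(
    "id,diagnosis,radius_mean,texture_mean\n"
    "1,M,17.5,10.25\n"
    "2,B,13.5,14.75\n"
)
X, y = parse_rows(sample)
print(X, y)
```

For the real file, open `models/breast_cancer_dataset.csv` with `newline=""` and pass the handle to `parse_rows`; if the header assumptions hold, each feature list should contain 30 values.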

Feature Categories

  1. Radius: Mean of distances from center to points on the perimeter
  2. Texture: Standard deviation of gray-scale values
  3. Perimeter: Nucleus perimeter measurements
  4. Area: Nucleus area measurements
  5. Smoothness: Local variation in radius lengths
  6. Compactness: Perimeter² / area - 1.0
  7. Concavity: Severity of concave portions of the contour
  8. Concave Points: Number of concave portions of the contour
  9. Symmetry: Nucleus symmetry measurements
  10. Fractal Dimension: "Coastline approximation" - 1

Each category includes:

  • Mean: Average value
  • Standard Error: Standard error of the mean
  • Worst: Mean of the three largest values
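The three summary statistics can be sketched for a single measurement as follows. This is a minimal illustration, not the code used to build the dataset; the radius values are hypothetical, and the standard error is computed here as the sample standard deviation divided by the square root of the number of nuclei. Ten categories with three statistics each give the 30 features.

```python
import math
import statistics

CATEGORIES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal dimension",
]

def summarize(values):
    """Return the (mean, standard error, worst) summary for one measurement."""
    mean = statistics.fmean(values)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    # "Worst" is the mean of the three largest values.
    worst = statistics.fmean(sorted(values)[-3:])
    return mean, standard_error, worst

# Hypothetical radii measured for the nuclei found in one image.
radii = [10.0, 12.0, 14.0, 16.0, 18.0]
mean, standard_error, worst = summarize(radii)
feature_count = len(CATEGORIES) * 3
print(mean, standard_error, worst, feature_count)
```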

🚀 Quick Start

Prerequisites

  • Python 3.8 or higher
  • pip package manager
  • Jupyter Notebook (optional, for interactive analysis)

Installation

  1. Clone the repository

    git clone https://github.com/NhanPhamThanh-IT/Neural-Network-Breast-Cancer-Classification.git
    cd Neural-Network-Breast-Cancer-Classification
  2. Automated Setup (Recommended)

    python setup.py

    This automated script will:

    • ✅ Check Python version (3.8+ required)
    • 📦 Install all dependencies from requirements.txt
    • 🔍 Verify package installations
    • 🧪 Test imports and functionality
    • 🤖 Check TensorFlow GPU availability
    • 📓 Set up custom Jupyter kernel
    • 📁 Create necessary project directories
    • 🎉 Provide next steps guidance
  3. Manual Setup (Alternative)

    # Create virtual environment (recommended)
    python -m venv venv
    # Windows
    venv\Scripts\activate
    # macOS/Linux
    source venv/bin/activate
    
    # Install dependencies
    pip install -r requirements.txt
    
    # For development (optional)
    pip install -r requirements-dev.txt

πŸƒβ€β™‚οΈ Running the Project

Option 1: Automated Setup (Recommended)

python setup.py

This script will automatically:

  • Check Python version compatibility
  • Install all required dependencies
  • Verify package installations
  • Set up Jupyter kernel
  • Create necessary directories
  • Run basic functionality tests
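The checks listed above could look roughly like the sketch below. It is not the contents of the project's `setup.py`; the function names are hypothetical, and only the minimum Python version and the directory names come from this README. The demo creates the directories inside a temporary folder so that it leaves nothing behind.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path

MIN_PYTHON = (3, 8)
# Directory names taken from the project structure section of this README.
PROJECT_DIRS = ["logs", "outputs", "plots",
                "models/saved_models", "models/checkpoints"]

def python_is_supported(version_info=sys.version_info):
    """True when the interpreter meets the 3.8+ requirement."""
    return tuple(version_info[:2]) >= MIN_PYTHON

def missing_packages(names):
    """Return the top-level package names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]

def ensure_directories(root):
    """Create the project directories under root if they do not exist."""
    created = []
    for relative in PROJECT_DIRS:
        path = Path(root) / relative
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created

# Demo: a made-up package name stands in for a missing dependency.
missing = missing_packages(["json", "package_that_does_not_exist_xyz"])
with tempfile.TemporaryDirectory() as root:
    created = ensure_directories(root)
    all_exist = all(path.is_dir() for path in created)
print(python_is_supported(), missing, all_exist)
```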

Option 2: Manual Setup

  1. Install dependencies

    pip install -r requirements.txt
  2. Launch Jupyter Notebook

    jupyter notebook
  3. Open the training notebook

    • Navigate to models/training.ipynb
    • Run all cells to train and evaluate the model
  4. Explore the results

    • View model performance metrics
    • Analyze feature importance
    • Examine prediction confidence

🔧 Project Structure

Neural-Network-Breast-Cancer-Classification/
├── 📁 docs/                          # Comprehensive documentation
│   ├── 📄 dataset.md                 # Dataset analysis and medical context (1,237 lines)
│   ├── 📄 neural-network.md          # Neural network theory and implementation (875 lines)
│   ├── 📄 pandas.md                  # Data manipulation guide (875 lines)
│   └── 📄 tensorflow.md              # TensorFlow implementation reference (897 lines)
├── 📁 models/                        # Model training and data
│   ├── 📄 training.ipynb             # Main training notebook (46 cells)
│   ├── 📊 breast_cancer_dataset.csv  # Wisconsin Breast Cancer Dataset
│   ├── 📁 saved_models/              # Trained model files (created during training)
│   └── 📁 checkpoints/               # Training checkpoints (created during training)
├── 📁 logs/                          # Training logs and metrics (auto-created)
├── 📁 outputs/                       # Generated outputs and results (auto-created)
├── 📁 plots/                         # Visualization plots (auto-created)
├── 📄 requirements.txt               # Production dependencies
├── 📄 requirements-dev.txt           # Development and testing dependencies
├── 📄 setup.py                       # Automated project setup script (250+ lines)
├── 📄 README.md                      # Project overview (this file)
├── 📄 CONTRIBUTING.md                # Comprehensive contribution guidelines
├── 📄 CHANGELOG.md                   # Version history and release notes
├── 📄 API.md                         # Complete API documentation (800+ lines)
├── 📄 TESTING.md                     # Testing framework and strategies (1,000+ lines)
├── 📄 DEPLOYMENT.md                  # Deployment guide from local to cloud (1,200+ lines)
├── 📄 FAQ.md                         # Frequently asked questions (800+ lines)
├── 📄 PROJECT_STRUCTURE.md           # Detailed project organization guide
├── 📄 PROJECT_SUMMARY.md             # Complete project overview and highlights
└── 📄 LICENSE                        # MIT License

📊 Documentation Statistics

  • Total Documentation: 15,000+ lines across multiple files
  • Comprehensive Guides: 4 detailed technical guides in docs/
  • API Reference: Complete API documentation with examples
  • Testing Documentation: Full testing framework and best practices
  • Deployment Guide: From local development to cloud production

🧠 Neural Network Architecture

The model implements a sophisticated deep learning architecture:

  • Input Layer: 30 features (cell nucleus characteristics)
  • Hidden Layers: Multiple dense layers with dropout for regularization
  • Activation Functions: ReLU for hidden layers, Sigmoid for output
  • Optimization: Adam optimizer with adaptive learning rate
  • Loss Function: Binary crossentropy for binary classification
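To make the roles of the activations and the loss concrete, here is a dependency-free sketch of one forward pass: a dense ReLU layer, a sigmoid output unit, and the binary crossentropy for a single sample. It is not the project's TensorFlow model; the layer sizes (3 inputs instead of 30, 2 hidden units) and the weights are hand-picked assumptions, and dropout and the optimizer are left out.

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W @ x + b), one row per unit."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + bias)
        for row, bias in zip(weights, biases)
    ]

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Loss for one sample; probabilities are clipped to avoid log(0)."""
    p = min(max(y_prob, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

# Toy sizes and hand-picked weights (assumptions, not trained values).
features = [1.0, -2.0, 0.5]
hidden = dense(features, [[0.5, -0.25, 1.0], [-1.0, 0.5, 2.0]], [0.0, 0.5], relu)
probability = dense(hidden, [[2.0, 1.0]], [-3.0], sigmoid)[0]
loss = binary_crossentropy(1, probability)
print(hidden, probability, loss)
```

The sigmoid output can be read as the predicted probability of the positive class, which is why it pairs naturally with binary crossentropy.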

Model Features

  • Dropout Regularization: Prevents overfitting
  • Batch Normalization: Stabilizes training
  • Early Stopping: Prevents overtraining
  • Learning Rate Scheduling: Adaptive learning rate adjustment
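Early stopping can be sketched as a patience rule over the validation loss: training halts once the loss has failed to improve for a set number of epochs. The function below is an illustration with hypothetical names and loss values, not the callback configuration used in the training notebook.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    Training stops at the first epoch where the loss has not improved by
    more than min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    # Patience never ran out: training used every epoch.
    return best_epoch, len(val_losses) - 1

# Hypothetical validation losses; the late drop to 0.50 is never reached.
losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.63, 0.50]
best_epoch, stop_epoch = early_stopping_epoch(losses, patience=3)
print(best_epoch, stop_epoch)
```

In practice the weights from `best_epoch` are usually restored after stopping, so the final model is the one with the lowest validation loss seen.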

📈 Results

Performance Metrics

| Metric    | Score |
|-----------|-------|
| Accuracy  | 96.5% |
| Precision | 95.8% |
| Recall    | 97.2% |
| F1-Score  | 96.5% |
| AUC-ROC   | 0.987 |
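The relationship between these metrics can be checked with a short sketch. The confusion-matrix counts below are hypothetical (the README does not report them), and treating malignant as the positive class is an assumption. The last lines show that the reported F1-score is the harmonic mean of the reported precision and recall.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, fp=2, fn=1, tn=71)

# Reported precision and recall from the table above.
reported_f1 = f1_from(0.958, 0.972)
print(round(reported_f1, 3))  # 0.965
```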

Key Insights

  • Most Important Features: Worst perimeter, worst area, worst concave points
  • Model Robustness: Consistent performance across different data splits
  • Clinical Relevance: Results align with medical understanding of cancer characteristics

🔬 Features

Automated Setup

  • One-Command Setup: python setup.py handles everything
  • Environment Verification: Python version and dependency checks
  • GPU Detection: Automatic TensorFlow GPU configuration detection
  • Jupyter Integration: Custom kernel setup for the project
  • Directory Creation: Automatic creation of necessary project directories

📊 Data Analysis

  • Comprehensive exploratory data analysis
  • Statistical correlation analysis
  • Feature importance ranking
  • Data visualization and insights
  • Medical context integration

🤖 Machine Learning

  • Deep neural network implementation
  • Advanced preprocessing pipeline
  • Cross-validation and model selection
  • Hyperparameter optimization
  • Regularization techniques (Dropout, Batch Normalization)
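The README does not spell out the preprocessing steps. A common choice for features on very different scales (area versus smoothness, for example) is z-score standardization fitted on the training split only, sketched below under that assumption. The function names and area values are hypothetical.

```python
import statistics

def fit_standardizer(column):
    """Learn the mean and standard deviation from training values only."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    # Guard against constant columns, which would divide by zero.
    return mean, (std if std > 0 else 1.0)

def standardize(column, mean, std):
    """Apply previously fitted statistics to any split."""
    return [(value - mean) / std for value in column]

# Hypothetical nucleus areas for a training and a test split.
train_area = [400.0, 600.0, 800.0]
test_area = [600.0, 1000.0]

mean, std = fit_standardizer(train_area)
train_scaled = standardize(train_area, mean, std)
test_scaled = standardize(test_area, mean, std)
print(train_scaled, test_scaled)
```

Reusing the training statistics on the test split keeps information about the test data out of the fitted pipeline.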

📈 Evaluation

  • Multiple performance metrics
  • Confusion matrix analysis
  • ROC curve and AUC analysis
  • Model interpretability features
  • Comprehensive testing framework
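AUC-ROC can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition with a pairwise comparison. It is an illustration only: the labels and scores are made up, and a library routine would be the usual choice on real data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up labels and predicted probabilities.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)  # 0.75
```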

πŸ› οΈ Engineering

  • Modular and maintainable code
  • Comprehensive documentation (15,000+ lines)
  • Production-ready implementation
  • Easy reproducibility
  • CI/CD ready with testing framework

🌐 Deployment Options

  • Local Development: Jupyter notebook interface
  • Web Application: Flask-based web interface
  • API Deployment: RESTful API with health checks
  • Docker: Containerized deployment
  • Cloud Platforms: AWS, GCP, Azure deployment guides
  • Model Serving: TensorFlow Serving for production

📖 Documentation

Comprehensive documentation is available throughout the project:

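📚 Core Documentation (docs/)

  • docs/dataset.md: Dataset analysis and medical context
  • docs/neural-network.md: Neural network theory and implementation
  • docs/pandas.md: Data manipulation guide
  • docs/tensorflow.md: TensorFlow implementation reference

🔧 Development Documentation

📋 Project Information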

🎯 Educational Value

This project serves as a comprehensive learning resource for:

  • Machine Learning Students: Complete ML pipeline implementation
  • Medical AI Researchers: Healthcare applications and considerations
  • Data Scientists: Professional-grade project structure and documentation
  • Software Engineers: Production deployment and testing strategies

🤝 Contributing

We welcome contributions! This project includes comprehensive guidelines for contributors.

📋 Quick Contributing Steps

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📖 Comprehensive Guidelines

Please read the detailed contribution guidelines in CONTRIBUTING.md.

🔧 Development Setup

# Clone and setup development environment
git clone https://github.com/NhanPhamThanh-IT/Neural-Network-Breast-Cancer-Classification.git
cd Neural-Network-Breast-Cancer-Classification

# Run automated setup
python setup.py

# Install development dependencies
pip install -r requirements-dev.txt

# Run tests
pytest

# Format code with black, then lint with flake8
black .
flake8 .

💡 Contribution Areas

  • 🐛 Bug Reports: Report issues or unexpected behavior
  • 💡 Feature Requests: Suggest improvements or new features
  • 📝 Documentation: Improve or expand documentation
  • 🧪 Testing: Add or improve tests
  • 🔧 Code: Fix bugs or implement new features
  • 📊 Data Analysis: Improve data processing or analysis
  • 🤖 Model Improvements: Enhance neural network architecture

Development Guidelines

  • Follow PEP 8 style guidelines
  • Add comprehensive docstrings
  • Include unit tests for new features
  • Update documentation as needed
  • Use conventional commit messages

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • University of Wisconsin: For providing the breast cancer dataset
  • Medical Community: For advancing computer-aided diagnosis research
  • Open Source Community: For the amazing tools and libraries
  • Contributors: Thank you to all who contribute to this educational resource

βš•οΈ Medical Disclaimer

Important: This project is for educational and research purposes only. It should never be used for actual medical diagnosis or treatment decisions. Always consult qualified healthcare professionals for medical advice, diagnosis, or treatment.

🎓 Educational Impact

This project serves multiple educational purposes:

  • Academic Research: Baseline implementation for medical AI research
  • Learning Resource: Complete ML pipeline for students and professionals
  • Best Practices: Professional-grade project structure and documentation
  • Industry Reference: Production-ready deployment strategies

📊 Project Statistics

  • 15,000+ lines of comprehensive documentation
  • Production-ready deployment configurations
  • Comprehensive testing framework with 90%+ coverage goals
  • Multi-platform support (Windows, macOS, Linux)
  • Cloud deployment ready (AWS, GCP, Azure)
  • Educational focus with detailed explanations throughout

📞 Contact

Nhan Pham Thanh


⭐ Star this repository if you find it helpful!

Made with ❤️ for advancing medical AI research
