This project implements a convolutional neural network (CNN) based on the AlexNet architecture for image classification, with an interactive web interface for demonstration. The implementation is inspired by the landmark paper "ImageNet Classification with Deep Convolutional Neural Networks" by Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton (2012).
The project includes:
- Implementation of the AlexNet architecture (adapted for smaller datasets)
- A simplified version with fewer parameters for faster training
- Training and evaluation on CIFAR-10 or MNIST datasets
- Visualization of training progress and model predictions
- Interactive web interface for image classification
Requirements:
- Python 3.6+
- PyTorch 1.7+
- torchvision
- Flask (for web interface)
- numpy
- matplotlib
- tqdm
- Pillow
Install the required packages using:
pip install -r requirements.txt
Project structure:
- main.py: Main script to run training and testing
- model.py: CNN model definitions (AlexNet and SimplifiedAlexNet)
- data_utils.py: Utilities for loading and preprocessing datasets
- train_utils.py: Functions for training, evaluation, and visualization
- app.py: Flask web application for the interactive demo
- templates/: HTML templates for the web interface
- static/: Static files (CSS, JS, uploaded images)
- requirements.txt: Required Python packages
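As the structure suggests, data_utils.py most likely wraps the standard torchvision dataset loaders. A minimal sketch, assuming a hypothetical function name and normalization statistics that are not taken from the project's code:

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def get_cifar10_loaders(batch_size=128, num_workers=2, data_dir='./data'):
    # Hypothetical helper; the real data_utils.py may differ.
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465),   # CIFAR-10 channel means
                             (0.2470, 0.2435, 0.2616)),  # CIFAR-10 channel stds
    ])
    train_set = datasets.CIFAR10(data_dir, train=True, download=True, transform=transform)
    test_set = datasets.CIFAR10(data_dir, train=False, download=True, transform=transform)
    train_loader = DataLoader(train_set, batch_size=batch_size,
                              shuffle=True, num_workers=num_workers)
    test_loader = DataLoader(test_set, batch_size=batch_size,
                             shuffle=False, num_workers=num_workers)
    return train_loader, test_loader
```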
Train a simplified AlexNet on CIFAR-10:
python main.py
Or specify the dataset, model, and hyperparameters explicitly:
python main.py --dataset cifar10 --model simplified --batch-size 128 --epochs 20 --lr 0.01
Command-line options:
- --dataset: Dataset to use (cifar10 or mnist)
- --model: Model architecture (alexnet or simplified)
- --batch-size: Batch size for training
- --epochs: Number of epochs to train
- --lr: Learning rate
- --momentum: Momentum for the SGD optimizer
- --weight-decay: Weight decay for regularization
- --num-workers: Number of workers for data loading
- --save-dir: Directory to save results
- --no-cuda: Disable CUDA training
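These flags map naturally onto Python's argparse. A minimal sketch of how main.py might declare them; the default values shown are assumptions, not necessarily the project's actual defaults:

```python
import argparse

def parse_args():
    # Hypothetical argument parser mirroring the options listed above.
    parser = argparse.ArgumentParser(description='Train AlexNet-style CNNs')
    parser.add_argument('--dataset', choices=['cifar10', 'mnist'], default='cifar10')
    parser.add_argument('--model', choices=['alexnet', 'simplified'], default='simplified')
    parser.add_argument('--batch-size', type=int, default=128)
    parser.add_argument('--epochs', type=int, default=20)
    parser.add_argument('--lr', type=float, default=0.01)
    parser.add_argument('--momentum', type=float, default=0.9)
    parser.add_argument('--weight-decay', type=float, default=5e-4)
    parser.add_argument('--num-workers', type=int, default=2)
    parser.add_argument('--save-dir', default='results')
    parser.add_argument('--no-cuda', action='store_true')
    return parser.parse_args()
```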
After training the model, run the web application:
python app.py
Then open a web browser and navigate to http://localhost:5000 to access the interactive demo.
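For reference, here is a hypothetical sketch of the kind of prediction route app.py might expose. The /predict endpoint name, the 'image' form field, the checkpoint path, and the SimplifiedAlexNet constructor signature are all assumptions, not the project's actual code:

```python
import io

import torch
import torch.nn.functional as F
from flask import Flask, request, jsonify
from PIL import Image
from torchvision import transforms

from model import SimplifiedAlexNet  # class name from model.py; constructor assumed

app = Flask(__name__)

CLASSES = ['airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']  # CIFAR-10 labels

model = SimplifiedAlexNet(num_classes=10)                       # assumed signature
model.load_state_dict(torch.load('results/model.pth', map_location='cpu'))  # assumed path
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

@app.route('/predict', methods=['POST'])
def predict():
    # Read the uploaded image, preprocess it, and return the top prediction.
    image = Image.open(io.BytesIO(request.files['image'].read())).convert('RGB')
    x = preprocess(image).unsqueeze(0)      # add batch dimension
    with torch.no_grad():
        probs = F.softmax(model(x), dim=1)[0]
    conf, idx = probs.max(dim=0)
    return jsonify({'class': CLASSES[idx.item()], 'confidence': conf.item()})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```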
The web interface provides:
- Image upload for classification
- Real-time prediction with confidence scores
- Visualization of model architecture
- Display of model performance metrics
- Information about the AlexNet architecture and CIFAR-10 dataset
The original AlexNet was designed for 224x224 images, but this implementation is adapted for 32x32 images (CIFAR-10/MNIST). The architecture includes:
- 5 convolutional layers with ReLU activations
- 3 max pooling layers
- 3 fully connected layers
- Dropout for regularization
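A minimal PyTorch sketch of such an adapted architecture; the channel sizes are assumptions for illustration, and the actual definitions in model.py may differ:

```python
import torch
import torch.nn as nn

class AlexNetSmall(nn.Module):
    """Hypothetical AlexNet-style network adapted to 32x32 inputs."""
    def __init__(self, num_classes=10, in_channels=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=1),   # conv1
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                                        # 32 -> 16
            nn.Conv2d(64, 192, kernel_size=3, padding=1),           # conv2
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                                        # 16 -> 8
            nn.Conv2d(192, 384, kernel_size=3, padding=1),          # conv3
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),          # conv4
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),          # conv5
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                                        # 8 -> 4
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(256 * 4 * 4, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)      # flatten before the fully connected layers
        return self.classifier(x)
```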
SimplifiedAlexNet is a more compact version with:
- 4 convolutional layers
- 3 max pooling layers
- 2 fully connected layers
- Fewer parameters for faster training
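To compare the two models' sizes, the trainable parameters can be counted directly; a small helper sketch (the helper name is hypothetical and works with either model class):

```python
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    # Sum the number of elements in every trainable tensor of the model.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```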
After training completes, the script will:
- Display training and testing loss/accuracy curves
- Save these plots to the specified directory
- Save the trained model
- Visualize predictions on test images
These results can also be viewed in the web interface.
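A hypothetical sketch of the post-training bookkeeping that train_utils.py might perform; the output file names and the function signature are assumptions:

```python
import os

import torch
import matplotlib
matplotlib.use('Agg')  # render plots to files; no display needed
import matplotlib.pyplot as plt

def save_results(model, train_acc, test_acc, save_dir='results'):
    os.makedirs(save_dir, exist_ok=True)
    # Accuracy curves over epochs
    plt.figure()
    plt.plot(train_acc, label='train')
    plt.plot(test_acc, label='test')
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.legend()
    plt.savefig(os.path.join(save_dir, 'accuracy.png'))
    plt.close()
    # Trained weights, later loadable by the web app
    torch.save(model.state_dict(), os.path.join(save_dir, 'model.pth'))
```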
- Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-1105).
- CIFAR-10 dataset: https://www.cs.toronto.edu/~kriz/cifar.html
- MNIST dataset: http://yann.lecun.com/exdb/mnist/