A deep learning model for American Sign Language (ASL) alphabet recognition using MobileNetV3Large architecture with transfer learning. This project achieves high accuracy in classifying ASL hand signs for letters A-Z and special characters.
- Overview
- Features
- Dataset
- Model Architecture
- Installation
- Usage
- Training Process
- Results
- Model Export
- Requirements
- License
This project implements a deep learning model for recognizing American Sign Language alphabet gestures. The model uses MobileNetV3Large as the base architecture with a custom classification head, trained in two phases:
- Phase 1: Training the classifier head with frozen base model
- Phase 2: Fine-tuning the entire network with reduced learning rate
The model is optimized for both accuracy and deployment, with support for:
- Keras format for training and evaluation
- TensorFlow Lite format for mobile and edge device deployment
- Transfer Learning: Leverages pre-trained MobileNetV3Large on ImageNet
- Data Augmentation: Random rotation, zoom, contrast, and brightness adjustments
- Class Balancing: Automatic class weight calculation for imbalanced datasets
- Comprehensive Evaluation: Detailed metrics, confusion matrix, and visualizations
- Mobile-Ready: TensorFlow Lite export for on-device inference
- GPU Acceleration: Mixed precision training support for faster training
- Learning Rate Scheduling: Adaptive learning rate reduction on plateau
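The class weights mentioned above are typically derived from label frequencies. A minimal sketch using scikit-learn's `compute_class_weight` (the dummy labels here stand in for the real training labels, which the notebook would take from the loaded dataset):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

NUM_CLASSES = 29

# Dummy integer labels standing in for the training set's class indices.
rng = np.random.default_rng(0)
train_labels = rng.integers(0, NUM_CLASSES, size=1000)

# Weight each class inversely to its frequency so under-represented
# classes contribute more to the loss.
weights = compute_class_weight(
    class_weight="balanced",
    classes=np.arange(NUM_CLASSES),
    y=train_labels,
)
class_weight = dict(enumerate(weights))  # Keras expects {class_index: weight}
# Later passed to model.fit(..., class_weight=class_weight) in Phase 1.
```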
The model is trained on the ASL Alphabet Dataset, which includes:
- 26 letters (A-Z)
- 3 special characters (space, delete, nothing)
- Total: 29 classes
The dataset is split as follows:
- Training: 70% of the dataset
- Validation: 15% of the dataset
- Test: 15% of the dataset
Expected dataset structure:

```
dataset/
├── A/
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ...
├── B/
├── C/
...
├── Z/
├── space/
├── del/
└── nothing/
```
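With this layout, `tf.keras.utils.image_dataset_from_directory` can infer the 29 class names from the folder names. A minimal loading sketch producing the 70/15/15 split quoted above (the two-step split of the 30% holdout is an assumption about how the notebook implements it):

```python
import tensorflow as tf

IMG_SIZE = (200, 200)
BATCH_SIZE = 64

# 70% for training; the remaining 30% is held out and split below.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset",
    validation_split=0.3,
    subset="training",
    seed=42,
    image_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    label_mode="categorical",  # one-hot labels for categorical crossentropy
)
holdout_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset",
    validation_split=0.3,
    subset="validation",
    seed=42,
    image_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    label_mode="categorical",
)

class_names = holdout_ds.class_names  # inferred from the folder names

# Split the 30% holdout in half: 15% validation, 15% test.
n_batches = holdout_ds.cardinality().numpy()
val_ds = holdout_ds.take(n_batches // 2)
test_ds = holdout_ds.skip(n_batches // 2)
```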
The model consists of three parts (a Keras sketch follows the list):

1. Base Model: MobileNetV3Large (pre-trained on ImageNet)
   - Input shape: 200x200x3
   - Pooling: Global Average Pooling
   - Initial state: Frozen (Phase 1)

2. Custom Head:
   - Dropout layer (0.2)
   - Dense layer (29 units, softmax activation)

3. Training Configuration:
   - Phase 1: Adam optimizer (lr=0.001), 15 epochs
   - Phase 2: Adam optimizer (lr=0.00002), 15 epochs
   - Loss: Categorical Crossentropy
   - Callbacks: ModelCheckpoint, EarlyStopping, ReduceLROnPlateau
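A minimal sketch of this architecture, using the hyperparameters listed above (note that `tf.keras.applications.MobileNetV3Large` applies its own input preprocessing by default; how the notebook reconciles that with the [0, 1] rescaling listed under Data Augmentation is not specified here):

```python
import tensorflow as tf

IMG_SIZE = (200, 200)
NUM_CLASSES = 29

# Pre-trained base; global average pooling collapses the spatial grid.
base_model = tf.keras.applications.MobileNetV3Large(
    input_shape=IMG_SIZE + (3,),
    include_top=False,
    weights="imagenet",
    pooling="avg",
)
base_model.trainable = False  # frozen for Phase 1

# Custom classification head: dropout + 29-way softmax.
inputs = tf.keras.Input(shape=IMG_SIZE + (3,))
x = base_model(inputs, training=False)
x = tf.keras.layers.Dropout(0.2)(x)
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```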
Prerequisites:
- Python 3.8+
- TensorFlow 2.x
- CUDA-compatible GPU (optional, but recommended)
```bash
pip install numpy pandas matplotlib seaborn scikit-learn tensorflow
```

Or install from a requirements file:

```bash
pip install -r requirements.txt
```
1. Open the notebook:

   ```bash
   jupyter notebook asl-model.ipynb
   ```

2. Update the dataset path in the notebook to point to your ASL dataset location.

3. Run all cells to:
   - Load and prepare the dataset
   - Train the model
   - Evaluate performance
   - Export models
To run inference with the exported Keras model:

```python
import numpy as np
import tensorflow as tf
from PIL import Image

# Load the trained model
model = tf.keras.models.load_model('models/model.keras')

# Load and preprocess the image: 200x200 RGB, scaled to [0, 1] as in training
img = Image.open('test_image.jpg').convert('RGB').resize((200, 200))
img_array = np.array(img, dtype=np.float32) / 255.0
img_array = np.expand_dims(img_array, axis=0)  # add a batch dimension

# Make a prediction
predictions = model.predict(img_array)
predicted_class = int(np.argmax(predictions[0]))

# Map the class index back to a label
with open('models/training_set_labels.txt', 'r') as f:
    class_names = [line.strip() for line in f]

print(f"Predicted: {class_names[predicted_class]}")
print(f"Confidence: {predictions[0][predicted_class]:.2%}")
```

The training follows a two-phase approach (a code sketch follows the phase lists below):
Phase 1 (classifier head):
- Base model layers are frozen
- Only the classification head is trained
- Higher learning rate (0.001)
- Class weights applied for imbalanced data

Phase 2 (fine-tuning):
- All layers are unfrozen
- The entire network is fine-tuned
- Lower learning rate (0.00002)
- Learning rate reduction on plateau
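A minimal sketch of the two phases, continuing from the earlier sketches (`model`, `base_model`, `train_ds`, `val_ds`, `class_weight`); the checkpoint filenames come from the Results section below, while the EarlyStopping and ReduceLROnPlateau patience values are assumptions:

```python
import tensorflow as tf

# Phase 1: train only the classification head; the base model stays frozen.
callbacks = [
    tf.keras.callbacks.ModelCheckpoint(
        "models/best_model_phase1.keras", save_best_only=True),
    tf.keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True),
    tf.keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=2),
]
model.fit(train_ds, validation_data=val_ds, epochs=15,
          class_weight=class_weight, callbacks=callbacks)

# Phase 2: unfreeze everything and recompile with a much lower learning rate.
base_model.trainable = True
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
callbacks[0] = tf.keras.callbacks.ModelCheckpoint(
    "models/best_model_final.keras", save_best_only=True)
model.fit(train_ds, validation_data=val_ds, epochs=15, callbacks=callbacks)
```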
The following augmentations are applied during training to improve generalization:
- Random rotation (±10%)
- Random zoom (±10%)
- Random contrast (±20%)
- Random brightness (±20%)
- Rescaling to [0, 1]
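These map directly onto Keras preprocessing layers. A minimal sketch (the factor values mirror the list above; the layer order is an assumption):

```python
import tensorflow as tf

data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomRotation(0.1),    # ±10% of a full rotation
    tf.keras.layers.RandomZoom(0.1),        # ±10% zoom
    tf.keras.layers.RandomContrast(0.2),    # ±20% contrast
    tf.keras.layers.RandomBrightness(0.2),  # ±20% brightness
    tf.keras.layers.Rescaling(1.0 / 255),   # map pixel values to [0, 1]
])

# Applied to the training pipeline only, never to validation or test data:
# train_ds = train_ds.map(lambda x, y: (data_augmentation(x, training=True), y))
```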
The model achieves high accuracy on the test set with robust performance across all ASL alphabet classes.
- best_model_phase1.keras: Best model from Phase 1
- best_model_final.keras: Final best model after Phase 2
- training_results.png: Visualization of training metrics
- training_history.json: Complete training history
- model_metadata.json: Model information and metadata
Training plots include:
- Training vs Validation Accuracy
- Training vs Validation Loss
- Final metrics summary
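A minimal sketch of reproducing these plots from the saved history (the JSON is assumed to mirror Keras's `History.history` dict, with `accuracy`, `val_accuracy`, `loss`, and `val_loss` keys):

```python
import json
import matplotlib.pyplot as plt

with open("models/training_history.json") as f:
    history = json.load(f)

fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(12, 4))

ax_acc.plot(history["accuracy"], label="training")
ax_acc.plot(history["val_accuracy"], label="validation")
ax_acc.set_title("Accuracy")
ax_acc.legend()

ax_loss.plot(history["loss"], label="training")
ax_loss.plot(history["val_loss"], label="validation")
ax_loss.set_title("Loss")
ax_loss.legend()

fig.savefig("training_results.png")
```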
The notebook automatically exports models in multiple formats:

Keras format:
- Full model with architecture and weights
- Use for continued training or Python inference
- Location: models/model.keras

TensorFlow Lite format:
- Optimized for mobile and edge devices
- Smaller file size with quantization
- Location: models/model.tflite
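A minimal sketch of the TensorFlow Lite conversion (default post-training quantization is an assumption; the notebook may use a different quantization scheme):

```python
import tensorflow as tf

model = tf.keras.models.load_model("models/model.keras")

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # post-training quantization
tflite_model = converter.convert()

with open("models/model.tflite", "wb") as f:
    f.write(tflite_model)
```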
Supporting files:
- training_set_labels.txt: Class names mapping
- model_metadata.json: Model configuration and metrics
- training_history.json: Complete training logs
Contents of requirements.txt:

```
numpy>=1.19.0
pandas>=1.2.0
matplotlib>=3.3.0
seaborn>=0.11.0
scikit-learn>=0.24.0
tensorflow>=2.8.0
pillow>=8.0.0
```
Key hyperparameters that can be adjusted:
```python
BATCH_SIZE = 64            # Batch size for training
IMG_SIZE = (200, 200)      # Input image dimensions
EPOCHS_PHASE1 = 15         # Training epochs for Phase 1
EPOCHS_PHASE2 = 15         # Training epochs for Phase 2
LEARNING_RATE_1 = 0.001    # Phase 1 learning rate
LEARNING_RATE_2 = 0.00002  # Phase 2 learning rate
```

Use cases:
- Mobile Applications: Real-time ASL recognition on smartphones
- Educational Tools: Interactive ASL learning applications
- Accessibility Solutions: Communication aids for deaf and hard-of-hearing individuals
- Research: Baseline for gesture recognition research
Contributions are welcome! Please feel free to submit a Pull Request.
This project is available for educational and research purposes.
- ASL Alphabet Dataset on Kaggle
- TensorFlow and Keras teams
- MobileNetV3 architecture by Google Research
For questions or feedback, please open an issue on GitHub.
Made with ❤️ for the deaf and hard-of-hearing community