Skip to content

Akith-002/ASL_Model_DL

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

7 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

🀟 ASL Alphabet Recognition Model

A deep learning model for American Sign Language (ASL) alphabet recognition using MobileNetV3Large architecture with transfer learning. This project achieves high accuracy in classifying ASL hand signs for letters A-Z and special characters.

πŸ“‹ Table of Contents

🎯 Overview

This project implements a state-of-the-art deep learning model for recognizing American Sign Language alphabet gestures. The model uses MobileNetV3Large as the base architecture with custom classification layers, trained in two phases:

  1. Phase 1: Training the classifier head with frozen base model
  2. Phase 2: Fine-tuning the entire network with reduced learning rate

The model is optimized for both accuracy and deployment, with support for:

  • Keras format for training and evaluation
  • TensorFlow Lite format for mobile and edge device deployment

✨ Features

  • 🧠 Transfer Learning: Leverages pre-trained MobileNetV3Large on ImageNet
  • 🎨 Data Augmentation: Random rotation, zoom, contrast, and brightness adjustments
  • βš–οΈ Class Balancing: Automatic class weight calculation for imbalanced datasets
  • πŸ“Š Comprehensive Evaluation: Detailed metrics, confusion matrix, and visualizations
  • πŸ“± Mobile-Ready: TensorFlow Lite export for on-device inference
  • πŸš€ GPU Acceleration: Mixed precision training support for faster training
  • πŸ“ˆ Learning Rate Scheduling: Adaptive learning rate reduction on plateau

πŸ“Š Dataset

The model is trained on the ASL Alphabet Dataset which includes:

  • 26 letters (A-Z)
  • 3 special characters (space, delete, nothing)
  • Total: 29 classes

Data Split

  • Training: 70% of the dataset
  • Validation: 15% of the dataset
  • Test: 15% of the dataset

Expected dataset structure:

dataset/
β”œβ”€β”€ A/
β”‚   β”œβ”€β”€ image1.jpg
β”‚   β”œβ”€β”€ image2.jpg
β”‚   └── ...
β”œβ”€β”€ B/
β”œβ”€β”€ C/
...
β”œβ”€β”€ Z/
β”œβ”€β”€ space/
β”œβ”€β”€ del/
└── nothing/

πŸ—οΈ Model Architecture

The model consists of:

  1. Base Model: MobileNetV3Large (pre-trained on ImageNet)

    • Input shape: 200x200x3
    • Pooling: Global Average Pooling
    • Initial state: Frozen (Phase 1)
  2. Custom Head:

    • Dropout layer (0.2)
    • Dense layer (29 units, softmax activation)
  3. Training Configuration:

    • Phase 1: Adam optimizer (lr=0.001), 15 epochs
    • Phase 2: Adam optimizer (lr=0.00002), 15 epochs
    • Loss: Categorical Crossentropy
    • Callbacks: ModelCheckpoint, EarlyStopping, ReduceLROnPlateau

πŸš€ Installation

Prerequisites

  • Python 3.8+
  • TensorFlow 2.x
  • CUDA-compatible GPU (optional, but recommended)

Install Dependencies

pip install numpy pandas matplotlib seaborn scikit-learn tensorflow

Or install from a requirements file:

pip install -r requirements.txt

πŸ’» Usage

Running the Notebook

  1. Open the notebook:

    jupyter notebook asl-model.ipynb
  2. Update dataset path in the notebook to point to your ASL dataset location

  3. Run all cells to:

    • Load and prepare the dataset
    • Train the model
    • Evaluate performance
    • Export models

Using the Trained Model

import tensorflow as tf
import numpy as np
from PIL import Image

# Load the model
model = tf.keras.models.load_model('models/model.keras')

# Load and preprocess image
img = Image.open('test_image.jpg').resize((200, 200))
img_array = np.array(img) / 255.0
img_array = np.expand_dims(img_array, axis=0)

# Make prediction
predictions = model.predict(img_array)
predicted_class = np.argmax(predictions[0])

# Load class names
with open('models/training_set_labels.txt', 'r') as f:
    class_names = [line.strip() for line in f.readlines()]

print(f"Predicted: {class_names[predicted_class]}")
print(f"Confidence: {predictions[0][predicted_class]:.2%}")

πŸŽ“ Training Process

The training follows a two-phase approach:

Phase 1: Transfer Learning (15 epochs)

  • Base model layers are frozen
  • Only the classification head is trained
  • Higher learning rate (0.001)
  • Class weights applied for imbalanced data

Phase 2: Fine-tuning (15 epochs)

  • All layers are unfrozen
  • Entire network is fine-tuned
  • Lower learning rate (0.00002)
  • Learning rate reduction on plateau

Data Augmentation

Applied during training to improve generalization:

  • Random rotation (Β±10%)
  • Random zoom (Β±10%)
  • Random contrast (Β±20%)
  • Random brightness (Β±20%)
  • Rescaling to [0, 1]

πŸ“ˆ Results

The model achieves high accuracy on the test set with robust performance across all ASL alphabet classes.

Training Outputs

  • best_model_phase1.keras: Best model from Phase 1
  • best_model_final.keras: Final best model after Phase 2
  • training_results.png: Visualization of training metrics
  • training_history.json: Complete training history
  • model_metadata.json: Model information and metadata

Visualization

Training plots include:

  • Training vs Validation Accuracy
  • Training vs Validation Loss
  • Final metrics summary

πŸ“¦ Model Export

The notebook automatically exports models in multiple formats:

1. Keras Format (.keras)

  • Full model with architecture and weights
  • Use for continued training or Python inference
  • Location: models/model.keras

2. TensorFlow Lite Format (.tflite)

  • Optimized for mobile and edge devices
  • Smaller file size with quantization
  • Location: models/model.tflite

3. Supporting Files

  • training_set_labels.txt: Class names mapping
  • model_metadata.json: Model configuration and metrics
  • training_history.json: Complete training logs

πŸ“‹ Requirements

numpy>=1.19.0
pandas>=1.2.0
matplotlib>=3.3.0
seaborn>=0.11.0
scikit-learn>=0.24.0
tensorflow>=2.8.0
pillow>=8.0.0

πŸ”§ Configuration

Key hyperparameters that can be adjusted:

BATCH_SIZE = 64          # Batch size for training
IMG_SIZE = (200, 200)    # Input image dimensions
EPOCHS_PHASE1 = 15       # Training epochs for Phase 1
EPOCHS_PHASE2 = 15       # Training epochs for Phase 2
LEARNING_RATE_1 = 0.001  # Phase 1 learning rate
LEARNING_RATE_2 = 0.00002 # Phase 2 learning rate

🎯 Use Cases

  • Mobile Applications: Real-time ASL recognition on smartphones
  • Educational Tools: Interactive ASL learning applications
  • Accessibility Solutions: Communication aids for deaf and hard-of-hearing individuals
  • Research: Baseline for gesture recognition research

🀝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

πŸ“„ License

This project is available for educational and research purposes.

πŸ™ Acknowledgments

  • ASL Alphabet Dataset on Kaggle
  • TensorFlow and Keras teams
  • MobileNetV3 architecture by Google Research

πŸ“ž Contact

For questions or feedback, please open an issue on GitHub.


Made with ❀️ for the deaf and hard-of-hearing community

About

No description, website, or topics provided.

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published