
SangeetAI -- Emotion Based Music Player

An Android application that recommends and plays songs based on the user's emotion detected from a facial image. Powered by a custom deep learning model (VGG19, TensorFlow Lite) trained on FER2013.

Screenshots

The app preprocesses a photo of the user taken with the camera, runs emotion classification using an on-device TFLite model, and then recommends and plays songs matching the detected mood.

Install & try the app: Download APK


Overall ML & App Flow

flowchart TD
    A[Dataset<br/>FER2013] --> B[Image Preprocessing]
    B --> C[Normalization]
    B --> D[Augmentation] 
    B --> E[Image Resize<br/>224x224]
    C --> F[Training of<br/>VGG19 Model]
    D --> F
    E --> F
    F --> G[Saving model in<br/>.h5 format]
    F --> H[Testing Model in<br/>Colab]
    F --> I[TensorFlow Lite]
    I --> J[Saving Model as<br/>tflite file]
    J --> K[Sangeet-AI<br/>Android App]
    K --> L[Emotion<br/>Detection using<br/>front camera]
    K --> M[Music/Media<br/>Recommendation<br/>based on user's<br/>mood]
    classDef goldBox fill:#B8860B,stroke:#8B7355,stroke-width:2px,color:#000
    classDef blackBox fill:#2F2F2F,stroke:#555,stroke-width:2px,color:#fff
    classDef grayBox fill:#D3D3D3,stroke:#A9A9A9,stroke-width:2px,color:#000
    classDef whiteBox fill:#fff,stroke:#000,stroke-width:2px,color:#000
    class B,G,H,I,J goldBox
    class F,K blackBox
    class A grayBox
    class C,D,E,L,M whiteBox

Deep Technical Overview

ML Development (ml-notebooks)

1. Dataset Acquisition and Setup

  • Direct Kaggle Integration: Dataset downloaded directly to Google Colab using Kaggle API
  • FER2013 Dataset: 35,887 grayscale images (48x48) across 7 emotion classes
    • Classes: Angry, Disgust, Fear, Happy, Neutral, Sad, Surprise
  • Data Split: 80% training, 10% validation, 10% testing using splitfolders (see the sketch below)
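
A minimal sketch of the split step, assuming the standard split-folders package and that the FER2013 class folders have already been downloaded into Colab via the Kaggle API (folder names here are illustrative):

import splitfolders

# Split the per-class image folders into 80% train / 10% val / 10% test
splitfolders.ratio("fer2013", output="fer2013_split",
                   seed=42, ratio=(0.8, 0.1, 0.1))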

2. Advanced Image Preprocessing Pipeline (CaseStudyImagePreprocessing.ipynb)

2.1 Normalization Implementation

import numpy as np

def normalization(image):
    # Min-max scaling: map pixel values into the [0, 1] range
    Imax = np.max(image)
    Imin = np.min(image)
    return (image - Imin) / (Imax - Imin)
  • Min-Max Scaling: Pixel values normalized to [0, 1] range
  • Batch Processing: All images processed through normalization pipeline
  • Format Conversion: Images converted from JPG to PNG for consistency (see the batch sketch below)
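
A minimal sketch of such a batch pass, assuming OpenCV reads each JPG, applies the normalization function above, and re-saves the result as PNG (paths are illustrative):

import glob
import os
import cv2
import numpy as np

for jpg_path in glob.glob("fer2013/train/*/*.jpg"):
    image = cv2.imread(jpg_path, cv2.IMREAD_GRAYSCALE).astype(np.float32)
    normalized = normalization(image)                      # function defined above
    # Rescale to 0-255 so the PNG stores valid 8-bit pixel values
    png_path = os.path.splitext(jpg_path)[0] + ".png"
    cv2.imwrite(png_path, (normalized * 255).astype(np.uint8))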

2.2 Data Augmentation Techniques

  • Rotation: Images rotated at 15-20 degree intervals using imutils.rotate_bound
  • Horizontal Flipping: Mirror images using numpy.fliplr
  • Affine Transformation: Custom transformation matrices for geometric variations
  • Output: 3x augmentation per original image (rotation + h_flip + affine_transform); see the sketch below
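
A minimal sketch of the three augmentations for a single image, assuming OpenCV, NumPy, and imutils are available; the rotation angle and affine points are illustrative:

import cv2
import imutils
import numpy as np

image = cv2.imread("face.png", cv2.IMREAD_GRAYSCALE)
rows, cols = image.shape

# 1. Rotation: rotate_bound keeps the whole rotated image inside the frame
rotated = imutils.rotate_bound(image, 15)

# 2. Horizontal flip (mirror image)
flipped = np.fliplr(image)

# 3. Affine transformation: map three source points to shifted destinations
src = np.float32([[0, 0], [cols - 1, 0], [0, rows - 1]])
dst = np.float32([[0, rows * 0.1], [cols * 0.9, 0], [cols * 0.1, rows * 0.9]])
M = cv2.getAffineTransform(src, dst)
warped = cv2.warpAffine(image, M, (cols, rows))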

2.3 Image Resizing and Preprocessing

  • Target Size: All images resized to 224x224 for VGG19 compatibility
  • Batch Processing: Efficient processing using OpenCV and NumPy arrays
  • Memory Optimization: Images saved as .npy files for faster loading
  • Pixel Normalization: Final rescaling by 1./255 for neural network input (see the sketch below)
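
A minimal sketch of this stage, assuming the images are resized with OpenCV, stacked into a NumPy array, rescaled, and cached as a .npy file (folder and file names are illustrative):

import glob
import cv2
import numpy as np

images = []
for path in sorted(glob.glob("fer2013_split/train/*/*.png")):
    img = cv2.imread(path)                       # 3-channel BGR, since VGG19 expects 3 channels
    img = cv2.resize(img, (224, 224))            # VGG19 input size
    images.append(img)

data = np.stack(images).astype(np.float32) / 255.0   # rescale pixels to [0, 1]
np.save("train_images.npy", data)                    # cache for faster reloading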

3. Model Architecture and Training (CaseStudyProject.ipynb)

3.1 VGG19 Transfer Learning Implementation

from tensorflow.keras.applications import VGG19
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model

# Load the VGG19 convolutional base pre-trained on ImageNet, without its classifier
vgg = VGG19(input_shape=[224, 224, 3], weights='imagenet', include_top=False)

# Freeze pre-trained layers
for layer in vgg.layers:
    layer.trainable = False

# Custom classifier head: 7 emotion classes
x = Flatten()(vgg.output)
prediction = Dense(7, activation='softmax')(x)
model = Model(inputs=vgg.input, outputs=prediction)

3.2 Training Configuration

  • Loss Function: Sparse Categorical Crossentropy
  • Optimizer: Adam optimizer for adaptive learning rate
  • Metrics: Accuracy tracking for performance monitoring
  • Early Stopping: Patience=5 on validation loss to prevent overfitting
  • Batch Size: 32 for efficient GPU memory utilization (see the training sketch below)
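
A minimal training sketch matching this configuration; the generator names (train_set, val_set), epoch count, and saved file name are illustrative:

from tensorflow.keras.callbacks import EarlyStopping

model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

# Stop training once validation loss has not improved for 5 epochs
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

# Batch size (32) is configured on the data generators themselves
history = model.fit(train_set,
                    validation_data=val_set,
                    epochs=50,                   # upper bound; early stopping usually ends sooner
                    callbacks=[early_stop])

model.save('emotion_vgg19.h5')                   # saved in .h5 format, as in the flow above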

3.3 Model Evaluation and Metrics

  • Classification Report: Precision, recall, F1-score per emotion class
  • Confusion Matrix: Detailed error analysis across emotion categories
  • Accuracy Scoring: Final model performance validation
  • Visualization: Training/validation loss and accuracy curves (see the evaluation sketch below)
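
A minimal evaluation sketch, assuming preprocessed test images x_test with integer labels y_test and the history object returned by model.fit (names are illustrative):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Predicted class = index of the highest softmax probability
y_pred = np.argmax(model.predict(x_test), axis=1)

print(classification_report(y_test, y_pred))     # precision / recall / F1 per class
print(confusion_matrix(y_test, y_pred))          # error analysis across emotions
print("Accuracy:", accuracy_score(y_test, y_pred))

# Training/validation loss curves from the History object
plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='val loss')
plt.legend()
plt.show()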

4. TensorFlow Lite Conversion and Optimization

import tensorflow as tf

# Convert the trained Keras model to a TensorFlow Lite flatbuffer
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tfmodel = converter.convert()
with open('linear.tflite', 'wb') as f:
    f.write(tfmodel)
  • Model Conversion: Keras model (.h5) converted to TensorFlow Lite (.tflite)
  • Mobile Optimization: Model optimized for on-device inference
  • Size Reduction: Significant model size reduction for mobile deployment
  • Inference Speed: Optimized for real-time on-device emotion detection (a quick interpreter sanity check is sketched below)
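
A minimal sketch for sanity-checking the converted linear.tflite in Colab with the TFLite Interpreter; preprocessed_image stands in for a single 224x224 RGB image already scaled to [0, 1]:

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='linear.tflite')
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Add a batch dimension: shape (1, 224, 224, 3), dtype float32
sample = np.expand_dims(preprocessed_image, axis=0).astype(np.float32)

interpreter.set_tensor(input_details[0]['index'], sample)
interpreter.invoke()

probabilities = interpreter.get_tensor(output_details[0]['index'])[0]
print("Predicted class:", int(np.argmax(probabilities)))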

Android Integration

Key Android Libraries & Their Roles

1. Camera & Image Capture

  • Otalia Studios CameraView

    • Used for high-level camera integration (front/back camera switching, easy photo capture).
    • Handles permission checks, lifecycle management, and provides a simple API for camera events.
    • Allows for capturing images directly as Bitmap objects, suitable for ML preprocessing.
  • AndroidX AppCompat and Core Libraries

    • Standard Android compatibility, UI, and lifecycle management.

2. Image Preprocessing & ML Inference

  • TensorFlow Lite
    • Loads and runs the custom-trained emotion detection model.
    • Provides APIs for model loading, input/output tensor management, and fast on-device inference.
    • Used with TensorImage, ImageProcessor, ResizeOp, NormalizeOp, and TensorBuffer for preprocessing and inference.

3. Music Playback & UI

  • JcPlayer

    • Provides a modern, feature-rich music player interface for Android.
    • Supports playlists, notifications, background playback, and easy integration with URIs/URLs.
    • Used to stream songs from Firebase Storage.
    • Customizable UI components for play/pause, next/previous, and notifications.
  • Android MediaPlayer

    • Used internally (by JcPlayer or custom logic) for audio playback control.

4. UI Components & Navigation

  • ConstraintLayout, RelativeLayout, ListView, ImageView, TextView (Android SDK)

    • For flexible and responsive UI design.
    • Used to display detected emotion, emoji, and song lists.
  • Intent & Activity Navigation

    • Used to transition between the main camera/emotion detection screen and the song recommendation/playback screen.

5. Utility Libraries

  • Android Handler, Runnable

    • Used for timed UI effects (e.g., ripple animation) and asynchronous operations.
  • Android Permission Management

    • Ensures camera, microphone, and storage permissions are checked and requested as needed.

TFLite Model Loading and Inference

private int loadTfLiteModel(Bitmap croppedBitmap) throws IOException {
    // Load the TFLite model bundled with the app (asset name is illustrative)
    MappedByteBuffer modelBuffer = FileUtil.loadMappedFile(this, "emotion_model.tflite");
    Interpreter tflite = new Interpreter(modelBuffer);

    // Preprocess the image for the model: resize to 224x224 and scale pixels to [0, 1],
    // matching the training pipeline
    TensorImage tensorImage = TensorImage.fromBitmap(croppedBitmap);
    ImageProcessor processor = new ImageProcessor.Builder()
        .add(new ResizeOp(224, 224, ResizeOp.ResizeMethod.NEAREST_NEIGHBOR))
        .add(new NormalizeOp(0.0f, 255.0f))
        .build();
    TensorImage preprocessedImage = processor.process(tensorImage);

    // Prepare the output buffer: 1 x 7 emotion probabilities
    TensorBuffer outputBuffer = TensorBuffer.createFixedSize(new int[]{1, 7}, DataType.FLOAT32);

    // Run inference
    tflite.run(preprocessedImage.getBuffer(), outputBuffer.getBuffer());

    // Postprocessing: find the class with the highest probability
    float[] outputs = outputBuffer.getFloatArray();
    int index = 0;
    float max = 0.0f;
    for (int i = 0; i < outputs.length; i++) {
        if (outputs[i] > max) {
            max = outputs[i];
            index = i;
        }
    }

    tflite.close();
    return index; // index = predicted emotion class
}
  • Preprocessing matches the training pipeline: resize, normalize.
  • Output is a probability array [p1, p2, ..., p7]. The index with the max value is chosen as the emotion.
  • Postprocessing maps the index to human-readable emotion (e.g., 0=Angry, 1=Disgust, ..., 6=Neutral).

UI & Recommendation Logic

  • MainActivity:

    • Handles camera permissions and image capture.
    • Calls loadTfLiteModel() on the captured image.
    • Gets the predicted emotion index.
    • Sets AppController.currentMood accordingly.
  • ExpressionDisplayActivity:

    • Reads currentMood.
    • Sets emoji, background, and playlist based on emotion.
    • Plays a recommended song via JcPlayer.

Project Structure

  • app/src/main/java/blog/cosmos/home/sangeetai/: Main application logic
  • app/src/main/java/blog/cosmos/home/sangeetai/activity/: App screens/activities
  • app/src/main/java/blog/cosmos/home/sangeetai/utils/: Utility classes (image processing, TFLite inference)
  • app/src/main/java/blog/cosmos/home/sangeetai/constants/: Mood constants
  • app/src/main/java/blog/cosmos/home/sangeetai/interfaces/: Callbacks/interfaces
  • ml-notebooks/: Jupyter notebooks for data prep, training, and TFLite conversion

Example Song URIs

Sad Songs:

Happy Songs:


App Flow Summary

  1. User launches app.
  2. CameraView opens, user takes a selfie.
  3. TensorFlow Lite processes the image with the custom VGG19 model and outputs emotion probabilities.
  4. The predicted emotion index is mapped to a mood (e.g., Happy, Sad, Angry).
  5. The JcPlayer UI displays the matching playlist and playback controls.
  6. The user plays, pauses, or skips songs; the UI shows an emoji and background matching the emotion.

Permissions & Testing

  • Camera, microphone, and storage permissions managed and checked at runtime.
  • Unit tests and instrumented tests in app/src/test and app/src/androidTest.

Extensibility

  • New emotions: Retrain model with additional classes and update mapping logic in Android code.
  • More songs: Extend playlists/HashMaps in AppController.
  • Model upgrades: Replace .tflite file and update inference logic for better accuracy.

Contributions

Feel free to open issues, make feature requests, or submit pull requests!


This README covers comprehensive technical aspects of both the ML pipeline and Android implementation, providing detailed documentation for the complete SangeetAI system.
