
📊 DataLens AI - Transform raw datasets into stunning visual insights. Powered by Gemini AI for automated cleaning, interactive dashboards, and professional visualizations with intelligent analysis.


Adinath-Jagtap/DataLens-AI-Intelligent-Data-Analytics-Agent


📊 DataLens AI

Intelligent Data Analytics Agent

Python Jupyter Gemini

An autonomous AI-powered data analytics system that transforms raw datasets into professional visualizations and interactive dashboards

🚀 Live Demo (V1) • 📺 Video Demo

Capstone Project for Google's 5-Day AI Agents Intensive Course


🎯 Problem & Solution

The Challenge

Modern data analysis faces critical barriers:

  • Complexity: Multiple tools required for cleaning, analysis, and visualization
  • Technical Skills: Demands expertise in Python, pandas, and visualization libraries
  • Time Investment: Manual processes consume hours of productive time
  • Accessibility: Non-technical users locked out of advanced analytics
  • Inconsistency: Variable quality based on individual expertise

Our Solution

DataLens AI democratizes data analysis through AI automation:

Raw Data → AI Processing → Professional Insights
   ↓            ↓                    ↓
Upload → Gemini Analysis → Interactive Dashboard

Key Benefits:

  • 🤖 AI-Driven: Leverages Google's Gemini API for intelligent processing
  • ⚡ Fast: Hours of work reduced to minutes
  • 🎯 Complete: End-to-end pipeline in a single notebook
  • 🚀 No-Code: Upload and process without manual coding
  • 📊 Professional: Publication-quality visualizations


🏗️ System Architecture

Pipeline Workflow

graph LR
    A[🔧 Setup Environment] --> B[🤖 Initialize Gemini AI]
    B --> C[📁 Load Data]
    C --> D[🧠 AI Analysis & Cleaning]
    D --> E[📊 Generate Visualizations]
    E --> F[📈 Interactive Dashboard]

    style A fill:#e3f2fd,stroke:#1976d2,stroke-width:3px,color:#000000
    style B fill:#f3e5f5,stroke:#7b1fa2,stroke-width:3px,color:#000000
    style C fill:#fff3e0,stroke:#f57c00,stroke-width:3px,color:#000000
    style D fill:#e8f5e9,stroke:#388e3c,stroke-width:3px,color:#000000
    style E fill:#fce4ec,stroke:#c2185b,stroke-width:3px,color:#000000
    style F fill:#e0f2f1,stroke:#00796b,stroke-width:3px,color:#000000

Component Architecture

┌─────────────────┐    ┌──────────────────┐    ┌──────────────────┐
│   Data Input    │    │  AI Processing   │    │ Output Generation│
│                 │    │                  │    │                  │
│ • CSV/Excel     │───▶│ • Gemini AI      │───▶│ • Visualizations │
│ • Raw Data      │    │ • Analysis       │    │ • Dashboard      │
│ • File Upload   │    │ • Code Generation│    │ • Reports        │
└─────────────────┘    └──────────────────┘    └──────────────────┘
         │                        │                        │
         └────────────────────────┼────────────────────────┘
                                  │
                         ┌──────────────────┐
                         │ Data Processing  │
                         │                  │
                         │ • Cleaning       │
                         │ • Transformation │
                         │ • Encoding       │
                         └──────────────────┘


✨ Core Capabilities

πŸ€– AI-Driven Intelligence

Gemini API for automated quality assessment and insights generation

🧹 Smart Data Cleaning

Intelligent cleaning code generation based on data profiling

πŸ“Š Advanced Visualizations

10+ chart types with professional styling and interactivity

πŸ“ˆ Interactive Dashboard

Real-time filters, KPI cards, and auto-updating charts

🏭 Production-Ready

ML-ready datasets with encoding and standardization

πŸ“ Seamless Upload

Interactive widget supporting CSV and Excel formats
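
As a rough illustration of the CSV and Excel support, the sketch below picks a pandas reader from the file extension. `load_table` is a hypothetical helper written for this README, not a function taken from the notebook; reading Excel files relies on the `openpyxl` and `xlrd` packages from the installation list.

```python
from pathlib import Path

import pandas as pd


def load_table(path):
    """Read a CSV or Excel file into a DataFrame (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix in (".xlsx", ".xls"):
        # pandas delegates to openpyxl (.xlsx) or xlrd (.xls)
        return pd.read_excel(path)
    raise ValueError(f"Unsupported file type: {suffix or 'no extension'}")
```

Rejecting unknown extensions early keeps a bad upload from reaching the later pipeline stages.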


🚀 Quick Start

Prerequisites

| Requirement | Version | Status |
|---|---|---|
| Python | 3.8+ | ✅ Required |
| Google Colab | - | 🌟 Recommended |
| Gemini API Key | - | 🔑 Required |

Installation

# google-genai provides the google.genai module imported in the usage example
pip install pandas numpy matplotlib seaborn plotly scikit-learn ipywidgets \
            jsonschema google-genai google-generativeai google-auth google-auth-oauthlib \
            openpyxl xlrd jupyterlab

Setup Steps

1. Access the Notebook

# Open in Google Colab
File → Upload → Select the .ipynb notebook

2. Configure API Key

# Add Gemini API key to Colab Secrets
# 1. Click 🔑 in the left sidebar
# 2. Add new secret: GEMINI_API_KEY = "your_api_key"

3. Run the Pipeline

# Execute cells sequentially:
# Cells 1-2:  Environment setup
# Cells 3-4:  AI initialization  
# Cell 5:     Data upload
# Cells 6-9:  AI cleaning
# Cells 10-14: Visualizations
# Cells 15-17: Dashboard
# Cells 18-19: Reports

Usage Example

🔧 Initialize Environment
# Cell 1: Install dependencies
# google-genai provides the google.genai module imported in Cells 3-4
!pip install pandas numpy matplotlib seaborn plotly scikit-learn ipywidgets \
            jsonschema google-genai google-generativeai --quiet

⏱️ ~2 minutes

🤖 Initialize Gemini AI
# Cells 3-4: Configure API
from google.colab import userdata  # Colab-only module for reading Secrets
from google import genai           # provided by the google-genai package

api_key = userdata.get("GEMINI_API_KEY")
client = genai.Client(api_key=api_key)

⏱️ ~30 seconds

📁 Load Dataset
# Cell 5: Upload and analyze
df = upload_dataset()                           # helper defined in the notebook
dataset_summary = generate_dataset_summary(df)  # helper defined in the notebook

⏱️ Variable (depends on file size)

🧹 AI-Powered Cleaning
# Cell 6-7: Automated cleaning
cleaning_prompt = build_cleaning_prompt(dataset_summary)
cleaning_output = ask_gemini_cleaning(cleaning_prompt)

⏱️ ~1 minute

📊 Generate Visualizations
# Cells 10-14: Create charts
viz_code = prompt_gemini(viz_prompt)
exec(viz_code)  # runs the generated code in the current namespace

# Cells 15-17: Build dashboard
dashboard_code = prompt_gemini(dash_prompt)
exec(dashboard_code)

⏱️ ~2 minutes


🎨 Features

πŸ” Automated Data Analysis

  • Comprehensive Summary: Statistical metrics, missing values, data type profiling
  • AI Quality Assessment: Gemini-powered evaluation
  • Column-wise Analysis: Detailed numeric and categorical insights

🧹 Smart Data Cleaning

Feature Description Status
Missing Value Detection Automatic identification and handling βœ…
Outlier Management 99th percentile statistical capping βœ…
Data Normalization Column standardization and value scaling βœ…
Categorical Encoding One-hot encoding for ML readiness βœ…
Negative Value Handling Automatic conversion to absolute values βœ…
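
In the notebook the cleaning code is generated by Gemini, so its exact form varies per dataset. As a hand-written sketch of three rows from the table (absolute values, 99th-percentile capping, one-hot encoding), assuming plain pandas, the steps could look like this. The function names are hypothetical, and missing-value handling and scaling are left out.

```python
import pandas as pd


def clean_numeric(df, cap_quantile=0.99):
    """Apply the numeric steps: absolute values, then upper-tail capping."""
    out = df.copy()
    for col in out.select_dtypes(include="number").columns:
        out[col] = out[col].abs()               # negative values -> absolute
        cap = out[col].quantile(cap_quantile)   # 99th percentile by default
        out[col] = out[col].clip(upper=cap)     # cap values above it
    return out


def encode_categoricals(df):
    """One-hot encode object, string, and category columns."""
    return pd.get_dummies(df)
```

Taking absolute values is only appropriate when negatives are data-entry errors. For columns where a negative sign is meaningful, such as profit or temperature, that step should be skipped.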

📊 Visualization Suite

📊 Chart Types

  • Histograms
  • Bar charts
  • Line charts
  • Scatter plots
  • Box plots
  • Heatmaps
  • Pie charts
  • Correlation matrices
  • Violin plots
  • Area charts

🎛️ Interactive Features

  • Real-time filtering
  • KPI cards
  • Multi-select widgets
  • Auto-updating charts
  • Dynamic interactions
  • Responsive design

✨ Professional Quality

  • Custom styling
  • Proper titles
  • Axis labels
  • Legends
  • Color schemes
  • Export-ready
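
The filtering and KPI cards listed above come down to two small data operations. The sketch below assumes a pandas DataFrame and multi-select style inputs; `filter_rows` and `kpi_cards` are hypothetical names, and the widget wiring itself is omitted.

```python
import pandas as pd


def filter_rows(df, selections):
    """Keep rows whose column values are in the selected sets.

    `selections` maps a column name to the values chosen in a widget;
    an empty selection leaves that column unfiltered.
    """
    mask = pd.Series(True, index=df.index)
    for col, chosen in selections.items():
        if chosen:
            mask &= df[col].isin(list(chosen))
    return df[mask]


def kpi_cards(df, value_col):
    """Compute the numbers a row of KPI cards would display."""
    values = df[value_col]
    return {
        "rows": int(len(df)),
        "total": float(values.sum()),
        "average": float(values.mean()) if len(df) else 0.0,
    }
```

In a dashboard, a change handler on a widget such as `ipywidgets.SelectMultiple` could call `filter_rows`, then redraw the charts and KPI cards from the filtered frame.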

🤖 AI Integration

🧠 Gemini 2.5 Pro

Advanced data analysis and cleaning recommendations

🔴 Thorough & Comprehensive

⚡ Gemini 2.5 Flash

Fast visualization code generation

🟢 Quick & Efficient


🔧 Technical Stack

Core Dependencies

📊 Data Processing

pandas      # Data manipulation
numpy       # Numerical operations

📈 Visualization

matplotlib  # Static plots
seaborn     # Statistical graphics
plotly      # Interactive charts

🤖 Machine Learning

scikit-learn  # Preprocessing & encoding

🧠 AI Integration

google-genai         # Gemini API client (provides google.genai)
google-generativeai  # Gemini API

🎛️ Interactive Components

ipywidgets  # Dashboard widgets

✅ Validation

jsonschema  # Data validation

Notebook Structure

📓 Cells 1-2:   Environment Setup (Dependencies)
🤖 Cells 3-4:   AI Initialization (Gemini Config)
📁 Cell 5:      Data Loading (Upload & Profile)
🧹 Cells 6-9:   AI Cleaning (Quality Improvement)
📊 Cells 10-14: Visualization (Chart Generation)
📈 Cells 15-17: Dashboard (Interactive Interface)
📋 Cells 18-19: Reporting (Insights & Recommendations)


📊 Output Deliverables

1️⃣ Cleaned Dataset

ML-ready with encoding and standardization

2️⃣ Visualizations

10+ professional charts

3️⃣ Interactive Dashboard

Real-time analytics with KPIs

4️⃣ Analysis Report

Automated insights & recommendations


📈 Use Cases

💼 Business Intelligence

  • Sales analysis & forecasting
  • Performance tracking
  • KPI monitoring & dashboards
  • Revenue analysis
  • Market trend identification

🔬 Data Science

  • Automated ETL pipelines
  • Feature engineering
  • Model preparation
  • Data preprocessing
  • Exploratory data analysis

📊 Research Analytics

  • Statistical analysis
  • Correlation studies
  • Pattern recognition
  • Hypothesis testing
  • Trend analysis

📋 Reporting Automation

  • Automated report generation
  • Executive dashboards
  • Periodic reporting
  • Stakeholder presentations
  • Business intelligence insights


🛡️ Security

╔═══════════════════════════════════════════════════════════╗
║               SECURITY & PRIVACY MEASURES                 ║
╠═══════════════════════════════════════════════════════════╣
║  ✅  Secure API Handling                                  ║
║      → API keys stored in Colab secrets                   ║
║                                                           ║
║  ✅  No Hardcoded Credentials                             ║
║      → Secure authentication practices                    ║
║                                                           ║
║  ✅  Data Privacy                                         ║
║      → Local processing without external transmission     ║
╚═══════════════════════════════════════════════════════════╝


📚 Project Structure

DataLens-AI-Intelligent-Data-Analytics-Agent/
│
└── 📊 DataLens AI - Intelligent Data Analytics Agent.ipynb
    (Version 2 - Optimized for Google Colab)

🚀 Deployment

🌐 Version 1 - Live Demo

Deployed on Hugging Face Spaces

📦 Version 2 - Current

Available in this Repository

🚨 Important Notes

⚠️  REQUIREMENTS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

✓ Google Colab environment recommended
✓ Gemini API key configured in Colab secrets
✓ Supports CSV and Excel file formats
✓ Automatic dependency installation

🎓 Capstone Project

Google's 5-Day AI Agents Intensive Course

Built Using

Gemini Python Stack

Transform your data into insights with AI ✨

Made by Adinath Somnath Jagtap & Prajwal Ashok Zolage
