An autonomous AI-powered data analytics system that transforms raw datasets into professional visualizations and interactive dashboards
Live Demo (V1) • Video Demo
Capstone Project for Google's 5-Day AI Agents Intensive Course
- Problem & Solution
- System Architecture
- Core Capabilities
- Quick Start
- Features
- Technical Stack
- Output Deliverables
- Use Cases
- Security
- Project Structure
Modern data analysis faces critical barriers:
- Complexity: Multiple tools required for cleaning, analysis, and visualization
- Technical Skills: Demands expertise in Python, pandas, and visualization libraries
- Time Investment: Manual processes consume hours of productive time
- Accessibility: Non-technical users are locked out of advanced analytics
- Inconsistency: Variable quality based on individual expertise
DataLens AI democratizes data analysis through AI automation:
Raw Data → AI Processing → Professional Insights
   ↓            ↓                  ↓
 Upload  → Gemini Analysis → Interactive Dashboard
Key Benefits:
- AI-Driven: Leverages Google's Gemini API for intelligent processing
- Fast: Hours of work reduced to minutes
- Complete: End-to-end pipeline in a single notebook
- No-Code: Upload and process without manual coding
- Professional: Publication-quality visualizations
graph LR
A[Setup Environment] --> B[Initialize Gemini AI]
B --> C[Load Data]
C --> D[AI Analysis & Cleaning]
D --> E[Generate Visualizations]
E --> F[Interactive Dashboard]
style A fill:#e3f2fd,stroke:#1976d2,stroke-width:3px,color:#000000
style B fill:#f3e5f5,stroke:#7b1fa2,stroke-width:3px,color:#000000
style C fill:#fff3e0,stroke:#f57c00,stroke-width:3px,color:#000000
style D fill:#e8f5e9,stroke:#388e3c,stroke-width:3px,color:#000000
style E fill:#fce4ec,stroke:#c2185b,stroke-width:3px,color:#000000
style F fill:#e0f2f1,stroke:#00796b,stroke-width:3px,color:#000000
┌─────────────────┐    ┌──────────────────┐    ┌──────────────────┐
│   Data Input    │    │  AI Processing   │    │ Output Generation│
│                 │    │                  │    │                  │
│ • CSV/Excel     │───▶│ • Gemini AI      │───▶│ • Visualizations │
│ • Raw Data      │    │ • Analysis       │    │ • Dashboard      │
│ • File Upload   │    │ • Code Generation│    │ • Reports        │
└─────────────────┘    └──────────────────┘    └──────────────────┘
         │                      │                       │
         └──────────────────────┼───────────────────────┘
                                │
                       ┌──────────────────┐
                       │ Data Processing  │
                       │                  │
                       │ • Cleaning       │
                       │ • Transformation │
                       │ • Encoding       │
                       └──────────────────┘
- Gemini API for automated quality assessment and insights generation
- Intelligent cleaning code generation based on data profiling
- 10+ chart types with professional styling and interactivity
- Real-time filters, KPI cards, and auto-updating charts
- ML-ready datasets with encoding and standardization
- Interactive widget supporting CSV and Excel formats
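
The upload capability accepts both CSV and Excel files. As a minimal sketch of how uploaded bytes could be turned into a DataFrame, the helper below dispatches on the file extension. The function name `read_uploaded_table` is hypothetical and is not taken from the notebook; it assumes the upload widget hands over a filename and the raw file content.

```python
import io
from pathlib import Path

import pandas as pd


def read_uploaded_table(filename: str, content: bytes) -> pd.DataFrame:
    """Parse uploaded bytes into a DataFrame based on the file extension.

    Hypothetical helper for illustration; the notebook's own loader may differ.
    """
    suffix = Path(filename).suffix.lower()
    buffer = io.BytesIO(content)
    if suffix == ".csv":
        return pd.read_csv(buffer)
    if suffix in {".xlsx", ".xls"}:
        # pandas needs openpyxl (.xlsx) or xlrd (.xls) installed for this call.
        return pd.read_excel(buffer)
    raise ValueError(f"Unsupported file type: {suffix or 'no extension'}")
```

Rejecting unknown extensions early keeps later pipeline steps from failing on a file that was never parsed.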
| Requirement | Version | Status |
|---|---|---|
| Python | 3.8+ | Required |
| Google Colab | - | Recommended |
| Gemini API Key | - | Required |
pip install pandas numpy matplotlib seaborn plotly scikit-learn ipywidgets \
    jsonschema google-generativeai google-auth google-auth-oauthlib \
    openpyxl xlrd jupyterlab

1. Access the Notebook

# Open in Google Colab
File → Upload → Select the .ipynb notebook

2. Configure API Key

# Add Gemini API key to Colab Secrets
# 1. Click the key icon in the left sidebar
# 2. Add new secret: GEMINI_API_KEY = "your_api_key"

3. Run the Pipeline

# Execute cells sequentially:
# Cells 1-2: Environment setup
# Cells 3-4: AI initialization
# Cell 5: Data upload
# Cells 6-9: AI cleaning
# Cells 10-14: Visualizations
# Cells 15-17: Dashboard
# Cells 18-19: Reports

Initialize Environment

# Cell 1: Install dependencies
!pip install pandas numpy matplotlib seaborn plotly scikit-learn ipywidgets \
    jsonschema google-generativeai --quiet

Time: ~2 minutes

Initialize Gemini AI

# Cells 3-4: Configure API
# Note: genai.Client is provided by the google-genai SDK (pip install google-genai)
from google.colab import userdata
from google import genai
api_key = userdata.get("GEMINI_API_KEY")
client = genai.Client(api_key=api_key)

Time: ~30 seconds
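
`userdata.get` only works inside Google Colab. As a sketch for running the same cells elsewhere, the helper below tries Colab Secrets first and then falls back to an environment variable. The helper name `load_gemini_api_key` and the environment-variable fallback are assumptions for illustration, not part of the notebook.

```python
import os


def load_gemini_api_key(secret_name: str = "GEMINI_API_KEY") -> str:
    """Return the API key from Colab Secrets, else from an environment variable.

    Hypothetical helper; the fallback to os.environ is an assumption.
    """
    key = None
    try:
        # Importable only inside Google Colab.
        from google.colab import userdata
        key = userdata.get(secret_name)
    except Exception:
        # Not running in Colab, or the secret is missing or not shared.
        key = None
    key = key or os.environ.get(secret_name)
    if not key:
        raise RuntimeError(
            f"{secret_name} was not found in Colab Secrets or the environment"
        )
    return key
```

The returned string can be passed to `genai.Client(api_key=...)` exactly as in the cell above, so the key never appears in the notebook source.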

Load Dataset

# Cell 5: Upload and analyze
df = upload_dataset()
dataset_summary = generate_dataset_summary(df)

Time: variable (depends on file size)

AI-Powered Cleaning

# Cells 6-7: Automated cleaning
cleaning_prompt = build_cleaning_prompt(dataset_summary)
cleaning_output = ask_gemini_cleaning(cleaning_prompt)

Time: ~1 minute

Generate Visualizations

# Cells 10-14: Create charts
viz_code = prompt_gemini(viz_prompt)
exec(viz_code)
# Cells 15-17: Build dashboard
dashboard_code = prompt_gemini(dash_prompt)
exec(dashboard_code)

Time: ~2 minutes
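
The pipeline passes model-generated code straight to `exec`. Model replies often wrap code in markdown fences and can be syntactically invalid, so a small pre-check can help. The sketch below strips a fence if one is present and verifies that the result parses as Python. The name `extract_python_code` is hypothetical; the check only catches syntax errors and is not a sandbox.

```python
import ast
import re

# Build the fence marker programmatically so this snippet stays fence-safe.
FENCE = "`" * 3
_FENCED_BLOCK = re.compile(
    FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE,
    re.DOTALL | re.IGNORECASE,
)


def extract_python_code(response_text: str) -> str:
    """Return the code from a model reply, raising SyntaxError if it is invalid.

    Hypothetical helper: takes the first fenced block if one exists,
    otherwise treats the whole reply as code.
    """
    match = _FENCED_BLOCK.search(response_text)
    code = match.group(1) if match else response_text
    code = code.strip()
    ast.parse(code)  # syntax check only; nothing is executed here
    return code
```

A call such as `exec(extract_python_code(viz_code))` would then fail early with a `SyntaxError` instead of partway through a chart.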
- Comprehensive Summary: Statistical metrics, missing values, data type profiling
- AI Quality Assessment: Gemini-powered evaluation
- Column-wise Analysis: Detailed numeric and categorical insights
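
The profiling items above can be sketched with plain pandas. The function below is an illustration only: `summarize_dataset` is a hypothetical name, and the notebook's own `generate_dataset_summary` may collect different fields.

```python
import pandas as pd


def summarize_dataset(df: pd.DataFrame) -> dict:
    """Build a JSON-friendly profile: shape, dtypes, missing values, column stats."""
    numeric = df.select_dtypes(include="number")
    categorical = df.select_dtypes(exclude="number")
    return {
        "rows": int(df.shape[0]),
        "columns": int(df.shape[1]),
        "dtypes": {col: str(dtype) for col, dtype in df.dtypes.items()},
        "missing": {col: int(n) for col, n in df.isna().sum().items()},
        # Basic statistics per numeric column (NaN values are skipped).
        "numeric": {
            col: {
                "mean": float(s.mean()),
                "min": float(s.min()),
                "max": float(s.max()),
            }
            for col, s in numeric.items()
        },
        # Number of distinct non-null values per non-numeric column.
        "categorical": {col: int(s.nunique()) for col, s in categorical.items()},
    }
```

Returning plain Python types keeps the summary easy to serialize into a prompt for the model.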
| Feature | Description | Status |
|---|---|---|
| Missing Value Detection | Automatic identification and handling | Supported |
| Outlier Management | 99th percentile statistical capping | Supported |
| Data Normalization | Column standardization and value scaling | Supported |
| Categorical Encoding | One-hot encoding for ML readiness | Supported |
| Negative Value Handling | Automatic conversion to absolute values | Supported |
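
Several rows of the table map directly onto pandas operations. The sketch below shows one possible ordering: absolute values, missing-value fill, 99th percentile capping, then one-hot encoding. The function name `clean_dataframe` is hypothetical, and filling missing numbers with the median is an assumption, since the table does not say how missing values are handled. In the actual pipeline the cleaning code is generated by Gemini and may differ.

```python
import pandas as pd


def clean_dataframe(df: pd.DataFrame) -> pd.DataFrame:
    """Apply illustrative cleaning steps and return a new DataFrame."""
    cleaned = df.copy()
    numeric_cols = cleaned.select_dtypes(include="number").columns
    for col in numeric_cols:
        # Negative value handling: convert to absolute values.
        cleaned[col] = cleaned[col].abs()
        # Missing values: median fill (assumption, not stated in the README).
        cleaned[col] = cleaned[col].fillna(cleaned[col].median())
        # Outlier management: cap values above the 99th percentile.
        cap = cleaned[col].quantile(0.99)
        cleaned[col] = cleaned[col].clip(upper=cap)
    # Categorical encoding: one-hot encode every non-numeric column.
    categorical_cols = list(cleaned.select_dtypes(exclude="number").columns)
    return pd.get_dummies(cleaned, columns=categorical_cols, dtype=int)
```

Working on a copy leaves the uploaded DataFrame untouched, which makes it possible to compare the data before and after cleaning.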
- Histograms
- Real-time filtering
- Custom styling
- Advanced data analysis and cleaning recommendations (thorough and comprehensive)
- Fast visualization code generation (quick and efficient)
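
The two roles above suggest routing each task to a different model. The sketch below shows one way to do that with the `client.models.generate_content` call from the google-genai SDK. The helper name `prompt_for_task`, the task labels, and the routing dictionary are assumptions for illustration; the model names are left to the caller because the README does not list them here.

```python
from typing import Any, Mapping


def prompt_for_task(client: Any, task: str, prompt: str,
                    model_by_task: Mapping[str, str]) -> str:
    """Send `prompt` to the model configured for `task` and return the reply text.

    Hypothetical helper: `client` is expected to be a google-genai Client,
    and `model_by_task` maps task labels to model names chosen by the caller.
    """
    if task not in model_by_task:
        raise KeyError(f"No model configured for task: {task!r}")
    response = client.models.generate_content(
        model=model_by_task[task],
        contents=prompt,
    )
    if not response.text:
        raise ValueError("The model returned an empty response")
    return response.text
```

Keeping the routing in a dictionary means the thorough model and the fast model can be swapped without touching the calling cells.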
pandas               # Data manipulation
numpy                # Numerical operations
matplotlib           # Static plots
seaborn              # Statistical graphics
plotly               # Interactive charts
scikit-learn         # Preprocessing & encoding
google-generativeai  # Gemini API
ipywidgets           # Dashboard widgets
jsonschema           # Data validation
Cells 1-2: Environment Setup (Dependencies)
Cells 3-4: AI Initialization (Gemini Config)
Cell 5: Data Loading (Upload & Profile)
Cells 6-9: AI Cleaning (Quality Improvement)
Cells 10-14: Visualization (Chart Generation)
Cells 15-17: Dashboard (Interactive Interface)
Cells 18-19: Reporting (Insights & Recommendations)
- ML-ready dataset with encoding and standardization
- 10+ professional charts
- Real-time analytics with KPIs
- Automated insights & recommendations
Security & Privacy Measures
- Secure API Handling: API keys stored in Colab secrets
- No Hardcoded Credentials: Secure authentication practices
- Data Privacy: Local processing without external transmission
DataLens-AI-Intelligent-Data-Analytics-Agent/
│
└── DataLens AI - Intelligent Data Analytics Agent.ipynb
    (Version 2 - Optimized for Google Colab)
- Version 1 - Live Demo: Deployed on Hugging Face Spaces
- Version 2 - Current: Available in this Repository
Requirements
- Google Colab environment recommended
- Gemini API key configured in Colab secrets
- Supports CSV and Excel file formats
- Automatic dependency installation







