This handbook curates a list of AI-native tools, workflows, and resources to help data scientists accelerate their careers in the age of AI.

andresvourakis/ai-data-scientist-handbook

🔖 AI Data Scientist Handbook 2026

This handbook aims to cut through the noise and curate the AI tools, workflows, and resources that can actually help data scientists accelerate their growth in the age of AI.

Rather than covering general tools and resources or trying to provide an exhaustive list, the aim is to surface the resources that are most relevant to data scientists right now.

If you want to read more about the motivation behind this project, see the About section.

🛠️ Tools

Modern BI & Analytics Tools

This section highlights modern, forward-leaning BI and analytics tools that go beyond traditional dashboarding.
The focus is on tools that emphasize semantic layers, metrics-as-code, search-driven analytics, notebooks, and tighter integration with modern data and AI workflows.

Well-known legacy BI platforms (Looker, Power BI, Qlik, Tableau) are intentionally excluded to keep this list high-signal and oriented toward how analytics is evolving rather than how it has historically been done.

| Tool | Category | What It's Good At | Why It Matters for DS |
| --- | --- | --- | --- |
| Omni | Semantic BI / Metrics Layer | SQL-first BI with strong semantic modeling | Bridges analytics and engineering workflows |
| Steep | Metrics & Analytics | Lightweight, modern metrics exploration | Faster iteration than traditional BI |
| Lightdash | Open-source BI | dbt-native BI with metrics-as-code | Fits modern analytics stacks |
| Evidence | Analytics Reporting | Markdown + SQL driven reports | BI as code, versionable insights |
| Hex | Notebook BI | Notebooks, dashboards, collaboration; AI-powered conversational analytics | Analyst-to-stakeholder friendly with self-serve AI features |
| Mode | Analytics Platform | SQL + Python with reporting | Strong hybrid DS / BI workflows |
| Preset | Open-source BI | Managed Apache Superset | Scalable, customizable BI |
| Metabase | Open-source BI | Simple querying and dashboards | Fast exploration with low friction |
| ThoughtSpot | Search-Driven Analytics | Natural language search with AI-assisted insights | Brings search-style analytics to large datasets |

Standalone Semantic Layer Tools

Dedicated semantic layer platforms that sit between your data warehouse and consumption tools (BI, AI, applications). They provide a single source of truth for metrics, dimensions, and business logic (decoupled from any specific BI tool).

| Tool | Type | What It's Good At | Why It Matters |
| --- | --- | --- | --- |
| Cube | Open-source / Cloud | Headless semantic layer with REST, GraphQL, MDX, and SQL APIs; caching and pre-aggregations | API-first, works with any BI tool or AI agent; strong for embedded analytics |
| AtScale | Enterprise | Universal semantic layer with MDX/DAX support; integrates with Power BI, Tableau, Excel | Enterprise-grade; bridges legacy BI tools with modern cloud warehouses |
| dbt Semantic Layer (MetricFlow) | Open-source / Cloud | Metrics-as-code defined alongside dbt models; integrates with dbt Cloud | Tight integration with dbt workflows; single place for transforms + metrics |
| Malloy | Open-source | Semantic modeling language from Google; compiles to SQL for BigQuery, Snowflake, Postgres, etc. | Lightweight, expressive; good for teams wanting code-first semantic models |

If you want to know more about the semantic layer, check out this article.
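
To make the "single source of truth" idea concrete, here is a minimal sketch of metrics and dimensions defined once and compiled to SQL on demand. It is not the API of any tool listed above; the `orders` table, the dimension and metric names, and the `compile_query` helper are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """A named aggregation defined once, independent of any BI tool."""
    name: str
    sql: str  # aggregate expression, e.g. "SUM(amount)"


# Hypothetical model: the table, dimensions, and metrics are made up.
TABLE = "orders"
DIMENSIONS = {
    "region": "region",
    "order_month": "strftime('%Y-%m', ordered_at)",
}
METRICS = {
    "revenue": Metric("revenue", "SUM(amount)"),
    "order_count": Metric("order_count", "COUNT(*)"),
}


def compile_query(metrics: list[str], dimensions: list[str]) -> str:
    """Turn metric and dimension names into one SQL string.

    Unknown names raise KeyError, so consumers cannot redefine
    business logic on their own.
    """
    dim_cols = [f"{DIMENSIONS[d]} AS {d}" for d in dimensions]
    metric_cols = [f"{METRICS[m].sql} AS {m}" for m in metrics]
    query = f"SELECT {', '.join(dim_cols + metric_cols)} FROM {TABLE}"
    if dimensions:
        query += " GROUP BY " + ", ".join(DIMENSIONS[d] for d in dimensions)
    return query
```

With these definitions, `compile_query(["revenue"], ["region"])` returns `SELECT region AS region, SUM(amount) AS revenue FROM orders GROUP BY region`. A dashboard, a notebook, and an AI agent would all request `revenue` by name instead of each re-deriving the aggregation.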

Conversational Analytics Tools

Tools that let you talk to your data via conversational AI, natural-language querying, or AI-assisted analytics, but that don't quite fit into the modern BI category.

| Tool | What it does |
| --- | --- |
| PandasAI | AI-driven interface for Python dataframes; ask questions and get insights and visuals. |
| Julius AI | Conversational AI analyst for data insights and charts (supports spreadsheets/CSV/sheet imports). |
| Zerve AI | Conversational interface for querying and exploring data. |
| DataGPT | Conversational AI data analyst that generates insights and deep analysis from business data. |
| FineChatBI | Conversational analytics tool to ask questions and build dashboards and visualizations. |
| Vanna AI | Natural-language chat interface for querying SQL databases; generates SQL and charts/summaries. |
| Powerdrill (Chat with Database) | Chat-based analytics interface for asking questions and analyzing data without writing SQL. |
| Wren AI | Natural-language interface for querying and interacting with data sources. |
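
Several of these tools follow a question-to-SQL-to-result loop. The sketch below shows that loop in miniature, under loud assumptions: `fake_generate_sql` is a hypothetical stand-in for a model call and returns canned SQL, the `sales` table is made up, and SQLite's `query_only` pragma is used as one possible guard so that generated SQL cannot modify data. None of this reflects how any listed tool is actually implemented.

```python
import sqlite3
from typing import Callable


def ask(conn: sqlite3.Connection, question: str,
        generate_sql: Callable[[str], str]) -> list[tuple]:
    """Translate a question to SQL with `generate_sql`, then run it read-only."""
    sql = generate_sql(question)
    conn.execute("PRAGMA query_only = ON")  # SQLite rejects writes while this is on
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.execute("PRAGMA query_only = OFF")


def fake_generate_sql(question: str) -> str:
    """Hypothetical stand-in for an LLM call; returns canned SQL."""
    if "drop" in question.lower():
        return "DELETE FROM sales"  # simulates a harmful generation
    return "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"


# Made-up data for the example; autocommit mode keeps transactions out of the way.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("west", 5.0), ("east", 2.5)],
)
```

Calling `ask(conn, "revenue by region", fake_generate_sql)` returns `[("east", 12.5), ("west", 5.0)]`, while a question that makes the stub emit `DELETE` raises an `sqlite3` error and leaves the table untouched.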

Miscellaneous Tools

This section includes tools built specifically with data scientists in mind that don’t fit into the other categories but are still highly relevant to modern, AI-native data science workflows.

| Tool | What it’s for | Why it belongs here |
| --- | --- | --- |
| Google Agent Development Kit (ADK) | Framework for building structured, tool-using agents | Designed for analytical and reasoning-heavy workflows, not just chatbots |
| MCP Toolbox for Databases | Standardized way to connect agents to databases | Directly addresses a core DS need: safe, structured access to data sources |
| Metaflow | DS-first workflow and experiment framework | Built to let data scientists move from notebooks to production without heavy infrastructure |
| cleanlab | Data quality and label issue detection | Focuses on a uniquely DS problem: silent data and label errors that hurt model performance |
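
As an illustration of the label-error problem mentioned for cleanlab, here is a simplified sketch in the spirit of confident learning, the approach cleanlab is built around. It does not use cleanlab's API; `find_label_issues` here is a hypothetical pure-Python helper, and the inputs are assumed to be given labels plus out-of-sample predicted probabilities from any classifier.

```python
def find_label_issues(labels, pred_probs):
    """Return indices of examples whose given label looks wrong.

    labels: list of int, the given class for each example
    pred_probs: list of lists, predicted class probabilities per example
    """
    n_classes = len(pred_probs[0])

    # Per-class threshold: average confidence in class k
    # among the examples that are labeled k.
    thresholds = []
    for k in range(n_classes):
        probs_k = [p[k] for y, p in zip(labels, pred_probs) if y == k]
        thresholds.append(sum(probs_k) / len(probs_k) if probs_k else 1.0)

    issues = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        # Classes the model is confident about for this example.
        confident = [k for k in range(n_classes) if p[k] >= thresholds[k]]
        if confident:
            best = max(confident, key=lambda k: p[k])
            if best != y:  # confidently predicted class disagrees with the label
                issues.append(i)
    return issues
```

An example is flagged only when the model is confident about a class other than the given label, so low-confidence disagreements are left alone. In practice the flagged indices are candidates for human review rather than automatic relabeling.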

🤖 Foundation Models

This section lists foundation models (open-source and commercial) that aim to solve core data science problems, including models for tabular data, time series, recommendations, and multimodal analysis.

Use this section as a starting point for exploring foundation models and their capabilities.

| Model | Domain | Organization | Access Type | Primary Use Case |
| --- | --- | --- | --- | --- |
| TimeGPT | Time series | Nixtla | API / Open | Forecasting and anomaly detection |
| TimesFM | Time series | Google | Open | Zero-shot forecasting |
| Chronos | Time series | Amazon AWS | Open | General forecasting |
| Moirai | Time series | Salesforce | Open | Multi-domain forecasting |
| Toto | Observability | Datadog | Open | High-cardinality forecasting |
| MOMENT | Time series | CMU | Open | Multi-task (forecasting, anomaly, etc.) |
| Granite TTM-R2 | Time series | IBM | Open | Sequential prediction |
| TabPFN | Tabular | Prior Labs | Open | Classification and regression |
| TableGPT2 | Tabular / NLP | Zhejiang Univ. | Open | Table question answering and code generation |
| Netflix RecSys Model | Recommendations | Netflix | Proprietary | Personalization at scale |
| Spotify 2T-HGNN | Recommendations | Spotify | Proprietary | Cross-modal recommendations |

If you want to know more about foundation models for data science, check out this article.

📚 Learning Resources

This section lists learning resources that go beyond generic theory and either align with AI-native data workflows or teach applied data science with modern AI tools.

Newsletters

  • Future Proof Data Science: A weekly newsletter for data scientists who want to stay relevant and grow their careers in the age of AI (and beyond).
  • Jam with AI: A newsletter inspired by real-world AI/ML events & projects
  • To Data & Beyond: A newsletter for mastering Data Science & AI—Beyond the Basics
  • Daily Dose of Data Science: A free newsletter for continuous learning about data science and ML, lesser-known techniques, and how to apply them in 2 minutes.
  • Neural Pulse: A 5-minute, human-curated newsletter delivering the best in AI, ML, and data science (twice a week).

Courses

| Resource | What It Covers | Why It Belongs Here |
| --- | --- | --- |
| AI Workflows Bootcamp | A cohort-based program that helps data scientists master AI workflows and automation to 10× productivity, stay relevant, and accelerate their careers. | Built for data scientists, by data scientists. |
| DeepLearning.AI Courses | AI/ML foundations and applied developer workflows | Useful for DS learners who need conceptual grounding alongside applied workflows. |
| Building AI Agents and Agentic Workflows Specialization (Coursera) | Building and orchestrating agent-based AI systems (LangChain, LangGraph, tool calling) | Focuses on agentic workflows that map directly to DS productivity scenarios. |
| Introduction to LangGraph (LangChain Academy) | Building stateful, multi-actor agents with LangGraph | Hands-on course for building agentic workflows directly applicable to DS automation. |

YouTube Channels

  • Data Neighbor Podcast: Hosted by industry veterans Hai Guan, Sravya Madipalli, and Shane Butler, covering data science careers, AI trends, and professional growth.
  • AI Engineer: Official channel from the AI Engineer conference/community, featuring talks on AI engineering, agents, and applied AI development.

Feel free to reach out to me if you have any suggestions for channels that should be added to this list!

🏆 Conferences

United States

| Conference | Date | Location | Details |
| --- | --- | --- | --- |
| ODSC AI East 2026 | April 28-29, 2026 | Boston, MA | Various tracks including ML, NLP, MLOps, and Data Visualization. 250+ speakers. |
| IBM Think 2026 | May 4-7, 2026 | Boston, MA | Focuses on AI productivity, trusted data, scalable AI architectures, and cost optimization. |
| Machine Learning Week 2026 | May 5-6, 2026 | San Francisco, CA | Focuses on making AI products robust and deployment-worthy. |
| The Data Science Conference | May 28-29, 2026 | Chicago, IL | Vendor-free, sponsor-free, and recruiter-free conference for data science professionals. |
| Data + AI Summit 2026 | June 15-18, 2026 | San Francisco, CA | Hosted by Databricks. Includes discussions, networking, and hands-on training. |
| AI Engineer World's Fair 2026 | June 30 - July 2, 2026 | San Francisco, CA | Largest technical AI conference with 20 tracks, 250 speakers, 6,000+ attendees. |
| The AI Conference 2026 | Sept 30 - Oct 1, 2026 | San Francisco, CA | Vendor-neutral event by the creators of MLconf. Features AI research, engineering, and applied ML. |
| ODSC AI West 2026 | Oct 27-29, 2026 | Burlingame, CA | Focuses on AI and data science with workshops, hands-on training, and strategic insights. |

Europe

| Conference | Date | Location | Details |
| --- | --- | --- | --- |
| World AI Cannes Festival 2026 | Feb 12-13, 2026 | Cannes, France | Focuses on AI, ML, and data science. Features AI technologies and global innovators. |
| AI Engineer Europe 2026 | April 8-10, 2026 | London, UK | First official AI Engineer Europe event. Large multitrack technical AI conference for 1000+ AI engineers. |
| Data Innovation Summit 2026 | May 6-8, 2026 | Stockholm, Sweden | Covers data governance, literacy, machine learning, with speakers from major companies. |
| DATA 2026 | July 16-18, 2026 | Porto, Portugal | International conference on data science, technology, and applications. |
| ECML PKDD 2026 | Sept 7-11, 2026 | Naples, Italy | Premier European conference on machine learning and knowledge discovery in databases. |

Contributing

If you want to add to the repository or you find an issue, feel free to open a PR. Please make sure each addition is placed in the relevant section or category.

About

This repo exists because data science is entering a new phase.

AI tools are no longer “nice to have” side experiments. They are becoming part of how we actually do the work, from analysis and exploration to production workflows.

As demand grows for data scientists who understand how to integrate AI into their existing workflows, the signal-to-noise ratio is getting worse. There are endless tools, ideas, and opinions, many of them generic or borrowed from other fields.

The goal of this repo is to cut through that noise. It’s a curated set of resources that are either built specifically for data scientists or closely aligned with how we already work. Not an exhaustive list, and not a guide, just a focused snapshot of what’s worth paying attention to right now.

FAQs

  1. How is curation done? Curation is based on thorough research, recommendations from people I trust, my 7+ years of experience as a Data Scientist, and extensive work integrating AI into data science workflows.
  2. Are all resources free? Most resources here will be free, but I will also include paid alternatives if they are truly valuable to your career development.
  3. How often is the repository updated? I plan to come back here as often as possible to check that all resources are still available and relevant, and to add new ones.

If you have questions or feedback, send me a message through here. Enjoy!
