This is the code repository for Databricks ML in Action, published by Packt.
Learn how Databricks supports the entire ML lifecycle end-to-end, from data ingestion to model deployment
Discover what makes the Databricks Data Intelligence Platform the go-to choice for top-tier machine learning solutions. Databricks ML in Action presents cloud-agnostic, end-to-end examples with hands-on illustrations of executing data science, machine learning, and generative AI projects on the Databricks Platform.
This book covers the following exciting features:
- Set up a workspace for a data team planning to perform data science
- Monitor data quality and detect drift
- Use autogenerated code for ML modeling and data exploration
- Operationalize ML with feature engineering client, AutoML, VectorSearch, Declarative Pipelines, AutoLoader, and Lakeflow Jobs
- Integrate open-source and third-party applications, such as OpenAI’s ChatGPT, into your AI projects
- Communicate insights through Databricks SQL dashboards and Delta Sharing
- Explore data and models through the Databricks marketplace
If you feel this book is for you, get your copy today!
All of the code is organized into folders, one per chapter. For example, Chapter 02.
The code will look like the following:
import opendatasets as od
od.download("https://www.kaggle.com/competitions/store-sales-time-series-forecasting/data", raw_data_path)
dbutils.fs.ls(raw_data_path + "/store-sales-time-series-forecasting/")
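The snippet above uses `raw_data_path` without showing where it comes from. As a minimal sketch, assuming the raw files land in a Unity Catalog volume, the path could be assembled as follows. The helper `build_raw_data_path` and the catalog, schema, and volume names are hypothetical placeholders, not names from the book; substitute the ones from your own workspace.

```python
from pathlib import PurePosixPath

# Folder name used in the snippet above for the downloaded competition data.
COMPETITION = "store-sales-time-series-forecasting"


def build_raw_data_path(catalog: str, schema: str, volume: str) -> str:
    """Return a volume path of the form /Volumes/<catalog>/<schema>/<volume>.

    Hypothetical helper: the three arguments are placeholders for
    whatever catalog, schema, and volume exist in your workspace.
    """
    return str(PurePosixPath("/Volumes") / catalog / schema / volume)


# Assumed example values; replace with your own names.
raw_data_path = build_raw_data_path("my_catalog", "my_schema", "raw_data")

# Directory that would be listed after the download completes.
competition_dir = f"{raw_data_path}/{COMPETITION}/"
print(competition_dir)
```

With `raw_data_path` defined this way, the download and listing calls shown earlier can be run unchanged inside a Databricks notebook, where `dbutils` is available.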
Following is what you need for this book: This book is for machine learning engineers, data scientists, and technical managers seeking hands-on expertise in implementing and leveraging the Databricks Data Intelligence Platform and its Lakehouse architecture to create data products.
With the following software and hardware list, you can run all code files present in the book (Chapters 2-8).
| Chapter | Software required | OS required |
|---|---|---|
| 2-8 | Databricks | Windows, macOS, or Linux |
| 2-8 | Python and its associated libraries | Windows, macOS, or Linux |
Each chapter folder contains the code examples from the book, each built on one of our data sources, along with the links listed in its README file:
- Chapter 1 - Getting Started
- Chapter 2 - Designing Databricks Day One
- Chapter 3 - Building the Bronze Layer
- Chapter 4 - Getting to Know Your Data
- Chapter 5 - Feature Engineering on the Lakehouse
- Chapter 6 - Tools for Model Training and Experimenting
- Chapter 7 - Productionizing Machine Learning
- Chapter 8 - Monitoring, Evaluating, and More
You will need a Databricks environment and permissions to run a cluster in order to follow along. You can use the Databricks Free Edition to run the provided notebooks and code, although some features may not work. The scope of the Free Edition changes over time, so check its current limitations before you start.
Stephanie Alba (Rivera) has worked in big data and machine learning since 2011. She collaborates with teams and companies as they design their data intelligence platform as a Sr. Solutions Architect for Databricks.
Previously, Stephanie was the VP, Data Intelligence for a global company, ingesting 20+ terabytes of data daily. She led the data science, data engineering, and business intelligence teams.
Her data career has also included contributing to and leading a team in creating software that teaches people to explore fictional planets using data science algorithms. Stephanie authored numerous sections of Booz Allen Hamilton’s publication, The Field Guide to Data Science.
I want to thank my loving partner, Rami Alba Lucio, Databricks coworkers, family, and friends for their unwavering support.
Mandy Baker began her career in data 8 years ago. She loves leveraging her skills as a data scientist to orchestrate transformative journeys for companies across diverse industries as a Solutions Architect for Databricks. Her experiences have brought her from large corporations to small startups and everything in between. Mandy is a graduate of Carnegie Mellon University and the University of Washington.
Thank you to my partner Emmanuel, my parents, sisters, and friends for their enduring love and support.
Hayley Horn started her data career 15 years ago as a data quality consultant on enterprise data integration projects. As a data scientist, she specialized in customer insights and strategy and presented at Data Science and AI conferences in the US and Europe. She is currently a Sr. Solutions Architect for Databricks, with expertise in data science and technology modernization.
A graduate of the MS Data Science program at Southern Methodist University in Dallas, Texas, USA, she is now a capstone advisor to students in their final semesters of the program.
I’d like to thank my husband, Kevin, and my sons Dyson and Dalton for their encouragement and enthusiastic support.
Anastasia Prokaieva began her career nine years ago as a research scientist at CEA (France), focusing on large-scale data analysis and satellite data assimilation, handling terabytes of data. She has been working within the big data analysis and machine learning domain since then. In 2021, she joined Databricks and became the regional AI subject matter expert.
On a daily basis, Anastasia advises Databricks users on best practices for implementing AI projects end-to-end, and she delivers training and workshops to democratize AI. Anastasia holds two Master of Science degrees in theoretical physics and energy science.
I would like to thank my partner, Julien, and my family for their tremendous support. My gratitude to my talented teammates all around the globe, as you inspire me every day!
Hope you enjoy it!

