Databricks ML in Action

This is the code repository for Databricks ML in Action, published by Packt.

Learn how Databricks supports the entire ML lifecycle end to end, from data ingestion to model deployment

What is this book about?

Discover what makes the Databricks Data Intelligence Platform the go-to choice for top-tier machine learning solutions. Databricks ML in Action presents cloud-agnostic, end-to-end examples with hands-on illustrations of executing data science, machine learning, and generative AI projects on the Databricks Platform.

This book covers the following exciting features:

Set up a workspace for a data team planning to perform data science
Monitor data quality and detect drift
Use autogenerated code for ML modeling and data exploration
Operationalize ML with the Feature Engineering client, AutoML, Vector Search, Declarative Pipelines, Auto Loader, and Lakeflow Jobs
Integrate open-source and third-party applications, such as OpenAI’s ChatGPT, into your AI projects
Communicate insights through Databricks SQL dashboards and Delta Sharing
Explore data and models through the Databricks Marketplace

If you feel this book is for you, get your copy today!

Instructions and Navigations

All of the code is organized into folders. For example, Chapter 02.

The code will look like the following:

import opendatasets as od

od.download("https://www.kaggle.com/competitions/store-sales-time-series-forecasting/data", raw_data_path)

dbutils.fs.ls(raw_data_path + "/store-sales-time-series-forecasting/")
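The snippet above assumes that `raw_data_path` is already defined and that `dbutils` is available, as it is inside a Databricks notebook. As a minimal sketch for checking the same download folder outside a notebook, the standard library can do the listing instead. The helper name `list_downloaded_files` and the example path are hypothetical and not part of this repository.

```python
from pathlib import Path
from typing import List

# Subfolder name taken from the dbutils.fs.ls call shown above.
DATASET_DIR = "store-sales-time-series-forecasting"


def list_downloaded_files(raw_data_path: str) -> List[str]:
    """Return the sorted file names found in the downloaded dataset folder.

    Returns an empty list if the folder does not exist yet.
    """
    dataset_path = Path(raw_data_path) / DATASET_DIR
    if not dataset_path.is_dir():
        return []
    return sorted(p.name for p in dataset_path.iterdir() if p.is_file())


if __name__ == "__main__":
    # Hypothetical local path; replace it with the location you downloaded to.
    print(list_downloaded_files("/tmp/raw_data"))
```

An empty result usually means the download step has not run yet or `raw_data_path` points somewhere else.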

Following is what you need for this book: This book is for machine learning engineers, data scientists, and technical managers seeking hands-on expertise in implementing and leveraging the Databricks Data Intelligence Platform and its Lakehouse architecture to create data products.

With the following software and hardware list, you can run all code files present in the book (Chapters 2-8).

Software and Hardware List

Chapter	Software required	OS required
2-8	Databricks	Windows, macOS, or Linux
2-8	Python and its associated libraries	Windows, macOS, or Linux

Related products

Data Lakehouse in Action [Packt] [Amazon]
Azure Databricks Cookbook [Packt] [Amazon]

Code Samples and Chapter Links

Each chapter folder contains the code examples shared in the book, each using one of our data sources, along with the links shared in the README file.

What do you need to run the examples?

You will need a Databricks environment and permissions to run a cluster in order to follow along. You can use Databricks Free Edition to run the provided notebooks and code, although some features may not work. The scope of the Free Edition changes over time, so check its current limitations.
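Because the notebooks rely on Databricks-only utilities such as `dbutils`, it can help to fail fast when code is run elsewhere. The following is a minimal sketch, not part of this repository: the helper name `running_on_databricks` is hypothetical, and it assumes that your Databricks compute sets the `DATABRICKS_RUNTIME_VERSION` environment variable, which is worth verifying in your own workspace.

```python
import os
from typing import Mapping, Optional


def running_on_databricks(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if the environment looks like a Databricks runtime."""
    env = os.environ if env is None else env
    # Assumption: Databricks compute sets DATABRICKS_RUNTIME_VERSION.
    return bool(env.get("DATABRICKS_RUNTIME_VERSION"))


if __name__ == "__main__":
    if running_on_databricks():
        print("Databricks runtime detected.")
    else:
        print("No Databricks runtime detected; code that uses dbutils will not run here.")
```

Passing the environment in as a mapping keeps the check easy to test without changing real environment variables.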

About the authors

Stephanie Alba (Rivera) has worked in big data and machine learning since 2011. She collaborates with teams and companies as they design their data intelligence platform as a Sr. Solutions Architect for Databricks.

Previously, Stephanie was the VP, Data Intelligence for a global company, ingesting 20+ terabytes of data daily. She led the data science, data engineering, and business intelligence teams.

Her data career has also included contributing to and leading a team in creating software that teaches people to explore fictional planets using data science algorithms. Stephanie authored numerous sections of Booz Allen Hamilton’s publication, The Field Guide to Data Science.

I want to thank my loving partner, Rami Alba Lucio, Databricks coworkers, family, and friends for their unwavering support.

Mandy Baker began her career in data 8 years ago. She loves leveraging her skills as a data scientist to orchestrate transformative journeys for companies across diverse industries as a Solutions Architect for Databricks. Her experiences have brought her from large corporations to small startups and everything in between. Mandy is a graduate of Carnegie Mellon University and the University of Washington.

Thank you to my partner Emmanuel, my parents, sisters, and friends for their enduring love and support.

Hayley Horn started her data career 15 years ago as a data quality consultant on enterprise data integration projects. As a data scientist, she specialized in customer insights and strategy and presented at Data Science and AI conferences in the US and Europe. She is currently a Sr. Solutions Architect for Databricks, with expertise in data science and technology modernization.

A graduate of the MS Data Science program at Southern Methodist University in Dallas, Texas, USA, she is now a capstone advisor to students in their final semesters of the program.

I’d like to thank my husband, Kevin, and my sons Dyson and Dalton for their encouragement and enthusiastic support.

Anastasia Prokaieva began her career nine years ago as a research scientist at CEA (France), focusing on large-scale data analysis and satellite data assimilation, handling terabytes of data. She has been working within the big data analysis and machine learning domain since then. In 2021, she joined Databricks and became the regional AI subject matter expert.
On a daily basis, Anastasia advises Databricks users on best practices for implementing AI projects end to end, and she delivers training and workshops to democratize AI. Anastasia holds two Master of Science degrees, in theoretical physics and energy science.

I would like to thank my partner, Julien, and my family for their tremendous support. My gratitude to my talented teammates all around the globe, as you inspire me every day!

Thanks for purchasing the book!

Hope you enjoy it!
