Dremio and all dependencies to set up self-analytics
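This guide provides step-by-step instructions to set up a local data lakehouse environment with Docker and Docker Compose, using MinIO for object storage, Nessie as the catalog, and Dremio as the query engine. Dremio can access the data in MinIO either through a Nessie source or through an Amazon S3 source.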
- Prerequisites
- Installation Steps
- Additional Resources
Prerequisites
- Docker installed on your machine.
- Docker Compose installed.
- Docker: Get Docker
- Docker Compose: Install Docker Compose
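To confirm both prerequisites before starting, a small shell check like the one below can help. It is a sketch, not part of the repository: `have` and `check_prereqs` are hypothetical helper names, and it accepts either the standalone `docker-compose` binary used in this guide or the Compose v2 plugin invoked as `docker compose`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): check the prerequisites.

# have NAME: succeed if NAME resolves to a command on PATH.
have() {
  command -v "$1" >/dev/null 2>&1
}

# check_prereqs: report Docker and Docker Compose; return 1 if either is missing.
check_prereqs() {
  local status=0
  if have docker; then
    echo "docker: found"
  else
    echo "docker: missing" >&2
    status=1
  fi
  # Accept the standalone binary (docker-compose) or the v2 plugin (docker compose).
  if have docker-compose || { have docker && docker compose version >/dev/null 2>&1; }; then
    echo "docker compose: found"
  else
    echo "docker compose: missing" >&2
    status=1
  fi
  return "$status"
}
```

Run `check_prereqs` after sourcing the snippet; a non-zero exit status means one of the tools still needs to be installed.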
Installation Steps
Clone the repository containing the docker-compose.yml file:
git clone <repository-url>
cd <repository-directory>
From the directory containing the docker-compose.yml file, start the services with Docker Compose:
docker-compose up -d
This command starts the MinIO, Nessie, and Dremio services in detached mode.
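`docker-compose up -d` returns before the services are ready, and Dremio in particular can take a while to start. A minimal polling sketch, assuming `curl` is installed on the host and that docker-compose.yml publishes the ports used in this guide (9000 for MinIO, 9047 for Dremio); `wait_for_http` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll an HTTP endpoint until it answers or a timeout expires.

# wait_for_http URL [TIMEOUT_SECONDS]
# Returns 0 once URL sends any HTTP response, 1 after roughly TIMEOUT_SECONDS
# (only the sleeps are counted, not the time curl itself spends).
wait_for_http() {
  local url="$1" timeout="${2:-120}" waited=0
  # Without --fail, curl exits 0 for any HTTP status (including 3xx/4xx),
  # so a redirect or a login page still counts as "the server is up".
  until curl --silent --output /dev/null --max-time 5 "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${waited}s waiting for $url" >&2
      return 1
    fi
    sleep 2
    waited=$((waited + 2))
  done
  echo "up: $url"
}
```

For example, `wait_for_http http://localhost:9000 && wait_for_http http://localhost:9047 300` blocks until MinIO and then Dremio respond, giving Dremio up to about five minutes.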
Access the MinIO console by navigating to http://localhost:9000 in your web browser and log in with the default credentials:
- Username: admin
- Password: password
Create a new bucket named datalake.
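If you prefer the command line to the console, the bucket can also be created with the MinIO client. This sketch assumes `mc` is installed on the host, that host port 9000 maps to MinIO's S3 API (the same port Dremio uses as `minio:9000` inside the Docker network; override `MINIO_URL` if your compose file differs), and the default admin/password credentials. The alias name `local` and the function name are arbitrary, hypothetical choices.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create the datalake bucket with the MinIO client (mc).

# Assumed defaults; override via environment variables if your setup differs.
MINIO_URL="${MINIO_URL:-http://localhost:9000}"
MINIO_USER="${MINIO_USER:-admin}"
MINIO_PASSWORD="${MINIO_PASSWORD:-password}"

create_datalake_bucket() {
  if ! command -v mc >/dev/null 2>&1; then
    echo "mc (MinIO client) not found on PATH" >&2
    return 1
  fi
  # Register the server under the alias "local", then create the bucket.
  # --ignore-existing makes the second command succeed if the bucket already exists.
  mc alias set local "$MINIO_URL" "$MINIO_USER" "$MINIO_PASSWORD" &&
    mc mb --ignore-existing local/datalake
}
```

After `create_datalake_bucket`, `mc ls local` should list the `datalake` bucket.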
Access the Dremio UI by navigating to http://localhost:9047 in your web browser. Follow the setup wizard to complete the initial configuration.
Ensure all services are running correctly by checking their respective UIs:
- MinIO: http://localhost:9000
- Dremio: http://localhost:9047
You should be able to interact with each service without issues.
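The same check can be scripted. This sketch prints the HTTP status of one endpoint per service and returns non-zero if any of them does not answer with a 2xx or 3xx status. The endpoints are assumptions: `/minio/health/live` is MinIO's liveness probe on the S3 API port, `/api/v2/config` is an endpoint of Nessie's REST API, and this guide does not say that Nessie's port 19120 is published to the host, so check docker-compose.yml before relying on that line. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report the HTTP status of each service endpoint.

# http_status URL: print the status code; curl prints 000 when nothing answers.
http_status() {
  curl --silent --output /dev/null --max-time 5 --write-out '%{http_code}' "$1"
}

# check_services: one status line per service; return 1 if any is not healthy.
check_services() {
  local failed=0 name url code
  while read -r name url; do
    code="$(http_status "$url")"
    printf '%-7s %s %s\n' "$name" "${code:-000}" "$url"
    case "$code" in
      2??|3??) ;;        # answered with success or a redirect
      *) failed=1 ;;     # unreachable, curl missing, or an error status
    esac
  done <<'EOF'
minio  http://localhost:9000/minio/health/live
nessie http://localhost:19120/api/v2/config
dremio http://localhost:9047/
EOF
  return "$failed"
}
```

A status of `000` on a line means nothing answered on that port, which usually points to a container that is still starting or a port that is not published.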
Next, add MinIO to Dremio as a source, using either option:
- To configure Nessie as a source in Dremio using MinIO, follow 7.1. Configure Nessie Source in Dremio using MinIO.
- To configure an Amazon S3 source in Dremio using MinIO, follow 7.2. Configure S3 Source in Dremio using MinIO.
7.1. Configure Nessie Source in Dremio using MinIO
- Access Dremio UI: Navigate to http://localhost:9047 and log in if you haven't already.
- Add a New Source:
  - Click the + icon next to Sources in the left-hand menu.
  - Select Nessie from the list of available sources.
- Configure the Nessie Source:
  - Name: Enter a name for the Nessie source, e.g., NessieSource.
  - Nessie Server URL: Enter http://nessie:19120/api/v2.
  - Authentication Type: Select None (or configure as needed).
- Go to Storage inside the Nessie source configuration and set:
  - AWS root path: Enter datalake.
  - AWS Access Key: Enter admin.
  - AWS Secret Key: Enter password.
- Under Other, set the following Connection Properties:
  - fs.s3a.path.style.access: Enter true
  - fs.s3a.endpoint: Enter minio:9000
  - dremio.s3.compat: Enter true
- Save the Configuration: Click Save to add the Nessie source.
- Verify the Source:
  - Navigate to the Sources section in Dremio.
  - Click the newly created NessieSource to ensure it connects and displays the contents of the datalake bucket.
This completes the configuration of Nessie as a source in Dremio using MinIO.
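The hostname `nessie` in the Nessie Server URL is a Docker Compose service name, so that URL works from the Dremio container but generally not from the host. To inspect the catalog directly from the host, the same REST API can be queried with curl. This sketch assumes port 19120 is also published on localhost (not stated in this guide); `/api/v2/trees` lists branches and tags and `/api/v2/trees/{ref}/entries` lists the contents of a branch in Nessie's REST API v2. The function names and the `NESSIE_API` variable are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for reading the Nessie catalog over its REST API v2.

# Assumed host-side base URL; override if port 19120 is mapped differently.
NESSIE_API="${NESSIE_API:-http://localhost:19120/api/v2}"

# nessie_url SEGMENT...: join the API base URL with path segments.
nessie_url() {
  local url="$NESSIE_API" part
  for part in "$@"; do
    url="$url/$part"
  done
  printf '%s\n' "$url"
}

# nessie_refs: list branches and tags (by default a new server has only "main").
nessie_refs() {
  curl --silent --fail "$(nessie_url trees)"
}

# nessie_entries [REF]: list namespaces and tables on a branch (default: main).
nessie_entries() {
  curl --silent --fail "$(nessie_url trees "${1:-main}" entries)"
}
```

After the write test further down has been run against NessieSource, `nessie_entries main` should include the people table under the datalake namespace.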
7.2. Configure S3 Source in Dremio using MinIO
- Access Dremio UI: Navigate to http://localhost:9047 and log in if you haven't already.
- Add a New Source:
  - Click the + icon next to Sources in the left-hand menu.
  - Select Amazon S3 from the list of available sources.
- Configure the S3 Source:
  - Name: Enter a name for the S3 source, e.g., S3Source.
  - Authentication Type: Select AWS Access Key.
  - AWS Access Key: Enter admin.
  - AWS Secret Key: Enter password.
  - Disable the Encrypt connection option (MinIO in this setup is served over plain HTTP).
- Go to the Advanced Options tab and set the following Connection Properties:
  - fs.s3a.path.style.access: Enter true
  - fs.s3a.endpoint: Enter minio:9000
  - dremio.s3.compat: Enter true
- In Cache Options, disable the option Enable local caching when possible.
- Save the Configuration: Click Save to add the S3 source.
This completes the configuration of Amazon S3 as a source in Dremio using MinIO.
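Why `fs.s3a.path.style.access` is set to `true`: by default the S3A client addresses buckets virtual-hosted style, putting the bucket name in the hostname (`datalake.minio:9000`), and no such hostname is defined on the Docker network. Path-style addressing keeps the hostname as `minio:9000` and puts the bucket in the URL path. The snippet below only illustrates the two URL shapes; the function name and the object key `people/metadata.json` are hypothetical.

```bash
#!/usr/bin/env bash
# Illustration only: build the URL an S3 client would request for one object.
# Plain http, because Encrypt connection is disabled for MinIO in this setup.

# s3_object_url STYLE ENDPOINT BUCKET KEY
#   STYLE=path    -> http://ENDPOINT/BUCKET/KEY   (fs.s3a.path.style.access=true)
#   STYLE=virtual -> http://BUCKET.ENDPOINT/KEY   (virtual-hosted style)
s3_object_url() {
  local style="$1" endpoint="$2" bucket="$3" key="$4"
  case "$style" in
    path)    printf 'http://%s/%s/%s\n' "$endpoint" "$bucket" "$key" ;;
    virtual) printf 'http://%s.%s/%s\n' "$bucket" "$endpoint" "$key" ;;
    *)       echo "unknown style: $style" >&2; return 1 ;;
  esac
}

s3_object_url path    minio:9000 datalake people/metadata.json   # → http://minio:9000/datalake/people/metadata.json
s3_object_url virtual minio:9000 datalake people/metadata.json   # → http://datalake.minio:9000/people/metadata.json
```

Only the first form resolves inside the Compose network, which is why both the Nessie and the S3 source set the property.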
To verify that writing to the source works, follow these steps, replacing <source_name> with NessieSource or S3Source as appropriate:
- Access Dremio SQL Editor:
  - Navigate to http://localhost:9047 and log in if you haven't already.
  - Click the SQL Editor tab at the top of the page.
- Create a New Table:
  - In the SQL Editor, enter the following SQL command to create a new table in the source:
    CREATE TABLE <source_name>.datalake.people (id INT, first_name VARCHAR, last_name VARCHAR, age INT) PARTITION BY (truncate(1, last_name));
- Execute the Command:
  - Click the Run button to execute the SQL command.
- Verify the Table Creation:
  - Navigate to the Sources section in Dremio.
  - Click <source_name> and then datalake to ensure the people table has been created successfully.
This step confirms that you can write to the source configured in Dremio using MinIO.
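The same write test can be run without the UI through Dremio's REST API: log in with `POST /apiv2/login` to obtain a token, submit SQL with `POST /api/v3/sql` (which returns a job id), and poll `GET /api/v3/job/{id}` until the job finishes. This is a sketch under stated assumptions: curl is installed, the Dremio user created in the setup wizard is supplied through the hypothetical `DREMIO_USER`/`DREMIO_PASSWORD` variables (without quotes or backslashes), and each SQL statement is a single line. The sed-based `json_string` is a deliberately minimal stand-in for a real JSON tool such as jq; it extracts one string field and assumes the key appears once.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: run SQL through Dremio's REST API with curl.

DREMIO_URL="${DREMIO_URL:-http://localhost:9047}"

# json_string FIELD: read JSON on stdin, print the string value of FIELD.
# Minimal sed extraction: FIELD must occur once and contain no escaped quotes.
json_string() {
  tr -d '\n' | sed -n "s/.*\"$1\" *: *\"\([^\"]*\)\".*/\1/p"
}

# sql_payload SQL: wrap one line of SQL in a {"sql": ...} body, escaping \ and ".
sql_payload() {
  local escaped
  escaped="$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')"
  printf '{"sql": "%s"}' "$escaped"
}

# dremio_login: print an auth token for DREMIO_USER / DREMIO_PASSWORD.
dremio_login() {
  curl --silent --fail -X POST "$DREMIO_URL/apiv2/login" \
    -H 'Content-Type: application/json' \
    -d "{\"userName\": \"$DREMIO_USER\", \"password\": \"$DREMIO_PASSWORD\"}" |
    json_string token
}

# dremio_sql TOKEN SQL: submit SQL, poll the job, print its final state.
dremio_sql() {
  local auth="Authorization: _dremio$1" job state
  job="$(curl --silent --fail -X POST "$DREMIO_URL/api/v3/sql" \
    -H "$auth" -H 'Content-Type: application/json' \
    -d "$(sql_payload "$2")" | json_string id)"
  [ -n "$job" ] || { echo "submit failed" >&2; return 1; }
  while :; do
    state="$(curl --silent --fail -H "$auth" "$DREMIO_URL/api/v3/job/$job" | json_string jobState)"
    case "$state" in
      COMPLETED) echo "$state"; return 0 ;;
      FAILED|CANCELED|"") echo "job $job: ${state:-no status}" >&2; return 1 ;;
    esac
    sleep 1   # still queued, planning, or running
  done
}
```

For example, after `export DREMIO_USER=... DREMIO_PASSWORD=...` and `TOKEN="$(dremio_login)"`, running `dremio_sql "$TOKEN" "INSERT INTO NessieSource.datalake.people VALUES (1, 'Ada', 'Lovelace', 36)"` should print `COMPLETED` if the write succeeded. `dremio_sql` reports only the job state; query rows can be fetched separately from `GET /api/v3/job/{id}/results`.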
Additional Resources
For more information and advanced configurations, refer to the following resources:
These resources provide deeper insights and extended functionalities that you can explore to enhance your data lakehouse setup.
This project is licensed under the MIT License. See the LICENSE file for details.