This repository provides a minimal, reproducible example of how to use ClearML to build machine learning pipelines, track experiments, and manage datasets using both task-based pipelines and function-based pipelines.
```
.
├── model_artifacts/                  # Example outputs or saved models
├── work_dataset/                     # Dataset samples and usage examples
├── demo_functions.py                 # Base functions from ClearML
├── demo_using_artifacts_example.py   # Demonstrates artifact loading
├── main.py                           # Entry point
├── pipeline_from_tasks.py            # Pipeline built from existing ClearML Tasks
├── step1_dataset_artifact.py         # Step 1: Upload dataset as artifact
├── step2_data_preprocessing.py       # Step 2: Preprocess dataset
└── step3_train_model.py              # Step 3: Train model using preprocessed data
```
- ✅ Task-based pipeline using `PipelineController.add_step(...)`
- [TBD] Function-based pipeline using `PipelineController.add_function_step(...)`
- ✅ Reusable ClearML Task templates
- ✅ Dataset and model artifact management with ClearML
- ✅ End-to-end ML workflow: Dataset → Preprocessing → Training
- ✅ Fully compatible with ClearML Hosted and ClearML Server
Install ClearML:

```bash
pip install clearml
```

Set up ClearML by running:

```bash
clearml-init
```

You will be prompted to enter:

- Your ClearML credentials

Use https://app.clear.ml to register for a free account if needed.
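After `clearml-init` completes, the credentials are stored in `~/clearml.conf`. A minimal `api` section looks roughly like the following (the endpoint URLs are the ClearML hosted defaults; the key values are placeholders):

```
api {
    web_server: https://app.clear.ml
    api_server: https://api.clear.ml
    files_server: https://files.clear.ml
    credentials {
        "access_key" = "YOUR_ACCESS_KEY"
        "secret_key" = "YOUR_SECRET_KEY"
    }
}
```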
- Install the ClearML agent on your machine or server:

```bash
pip install clearml-agent
```

To use a task-based pipeline, follow these steps:
Before running the pipeline, execute the following scripts **once** to create reusable ClearML Tasks:
Note: When running for the first time, comment out `task.execute_remotely()` in each of the three task scripts so the task template is created successfully.
```bash
# Step 1: Upload dataset
python step1_dataset_artifact.py

# Step 2: Preprocess dataset
python step2_data_preprocessing.py

# Step 3: Train model
python step3_train_model.py
```

These tasks will appear in your ClearML dashboard and serve as base tasks for the pipeline.
Create a queue named `pipeline` (or a custom name of your choice), and make sure it matches the queue used in `pipeline_from_tasks.py`:

```python
pipe.start(queue="pipeline")
```
Run a ClearML agent worker for the queue:
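A typical worker invocation (assuming the queue name `pipeline` used above) is:

```shell
# Start a ClearML agent that pulls and executes tasks from the "pipeline" queue
clearml-agent daemon --queue pipeline

# Or run it in the background as a detached service
clearml-agent daemon --queue pipeline --detached
```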

Once all base tasks are registered, run the pipeline:

```bash
python main.py  # executes run_pipeline()
```

This version demonstrates using `add_function_step(...)` to wrap Python logic as pipeline steps.
You can also run each task separately.

Note: When running for the first time, comment out `task.execute_remotely()` in the code file so the task template is created successfully.
```bash
# Step 1: Upload dataset
python step1_dataset_artifact.py

# Step 2: Preprocess data
python step2_data_preprocessing.py

# Step 3: Train model
python step3_train_model.py
```

This project is developed and maintained by:
- Jacoo-Zhao (GitHub: @Jacoo-Zhao)
- Zoe Lin (GitHub: @Zoe Lin)
This project is licensed under the MIT License. See the LICENSE file for details.
