
🧠 AI-Studio-ClearML

This repository provides a minimal, reproducible example of how to use ClearML to build machine learning pipelines, track experiments, and manage datasets using both task-based pipelines and function-based pipelines.


📦 Project Structure

```
├── model_artifacts/                 # Example outputs and saved models
├── work_dataset/                    # Dataset samples and usage examples
├── demo_functions.py                # Base function examples from ClearML
├── demo_using_artifacts_example.py  # Demonstrates artifact loading
├── main.py                          # Entry point
├── pipeline_from_tasks.py           # Pipeline built from existing ClearML Tasks
├── step1_dataset_artifact.py        # Step 1: upload dataset as an artifact
├── step2_data_preprocessing.py      # Step 2: preprocess the dataset
├── step3_train_model.py             # Step 3: train a model on the preprocessed data
```

🧪 Features

  • ✅ Task-based pipeline using PipelineController.add_step(...)
  • [TBD] Function-based pipeline using PipelineController.add_function_step(...)
  • ✅ Reusable ClearML Task templates
  • ✅ Dataset and model artifact management with ClearML
  • ✅ End-to-end ML workflow: Dataset → Preprocessing → Training
  • ✅ Fully compatible with ClearML Hosted and ClearML Server

🚀 Getting Started

1. Install Dependencies

```
pip install clearml
```

2. Configure ClearML

Set up ClearML by running:

```
clearml-init
```

You will be prompted to enter your ClearML credentials.

If you don't have an account yet, register for free at https://app.clear.ml.
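After registration, `clearml-init` writes your credentials to `~/clearml.conf`. The `api` section of that file looks roughly like the fragment below (server URLs shown are the ClearML Hosted defaults; the keys are placeholders):

```
api {
    web_server: https://app.clear.ml
    api_server: https://api.clear.ml
    files_server: https://files.clear.ml
    credentials {
        "access_key" = "YOUR_ACCESS_KEY"
        "secret_key" = "YOUR_SECRET_KEY"
    }
}
```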


3. Create a ClearML Agent

Install the ClearML agent on your machine or server:

```
pip install clearml-agent
```

🛠️ How to Use

If you are using Colab, refer to ClearML_Pipeline_Demo.ipynb.

🔁 Option 1: Pipeline from Predefined ClearML Tasks

To use a task-based pipeline, follow these steps:

Step 1: Register the Base Tasks

Before running the pipeline, execute the following scripts **once** to create reusable ClearML Tasks:

Note: When running for the first time, comment out `task.execute_remotely()` in each of the three step scripts so that the task template is created successfully.

```
# Step 1: Upload dataset
python step1_dataset_artifact.py

# Step 2: Preprocess dataset
python step2_data_preprocessing.py

# Step 3: Train model
python step3_train_model.py
```

These will appear in your ClearML dashboard and serve as base tasks for the pipeline.
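For orientation, a base-task script along the lines of step1_dataset_artifact.py can be sketched as below. The project name, task name, and dataset path are illustrative placeholders, not necessarily the ones used in this repo:

```python
# Sketch of a base task that registers a dataset file as a ClearML artifact.
def create_dataset_task():
    # Imported here so the sketch can be read and executed without a
    # configured ClearML environment; in a real script the import sits
    # at module level.
    from clearml import Task

    task = Task.init(project_name="AI-Studio-Demo",       # placeholder project
                     task_name="step1 dataset artifact")  # placeholder name
    # On the first local run, keep the next line commented out so the task
    # template gets created; afterwards it re-enqueues the task to an agent.
    # task.execute_remotely(queue_name="pipeline")
    task.upload_artifact(name="dataset",
                         artifact_object="work_dataset/data.csv")  # hypothetical path
    task.close()
```

The other two step scripts follow the same pattern, pulling the previous step's artifact before doing their own work.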

Step 1.5: Initialize the ClearML Queue

Create a queue named `pipeline` (or a custom name of your choice) and make sure it matches the queue used in pipeline_from_tasks.py:

```
pipe.start(queue="pipeline")
```
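For context, a task-based pipeline like the one in pipeline_from_tasks.py generally follows the pattern below; the project and task names are placeholders, and the actual script may differ in detail:

```python
# Sketch of a task-based pipeline built from previously registered base tasks.
def run_pipeline():
    # Deferred import: a real script would import at module level, but this
    # sketch can then be read without a configured ClearML environment.
    from clearml import PipelineController

    pipe = PipelineController(
        name="AI-Studio pipeline",  # pipeline name shown in the dashboard
        project="AI-Studio-Demo",   # placeholder project name
        version="1.0",
    )
    # Each step clones one of the base tasks registered in steps 1-3.
    pipe.add_step(name="stage_data",
                  base_task_project="AI-Studio-Demo",
                  base_task_name="step1 dataset artifact")
    pipe.add_step(name="stage_process", parents=["stage_data"],
                  base_task_project="AI-Studio-Demo",
                  base_task_name="step2 data preprocessing")
    pipe.add_step(name="stage_train", parents=["stage_process"],
                  base_task_project="AI-Studio-Demo",
                  base_task_name="step3 train model")
    # The queue name must match the queue your agent is listening on.
    pipe.start(queue="pipeline")
```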

Then run a ClearML agent as a worker for that queue.
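Assuming `clearml-agent` is installed and the queue is named `pipeline`, one common way to start a worker is:

```
clearml-agent daemon --queue pipeline
```

The agent will then pull and execute any tasks the pipeline enqueues.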

Step 2: Run the Pipeline

Once all base tasks are registered, run the pipeline:

```
python main.py  # executes run_pipeline()
```

🔧 [TBD] Option 2: Pipeline from Local Python Functions

This version demonstrates using add_function_step(...) to wrap Python logic as pipeline steps.
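Once implemented, the function-based variant would look roughly like the sketch below; the function names and return values are illustrative only:

```python
# Sketch of a function-based pipeline: plain Python functions become steps.
def run_function_pipeline():
    # Deferred import so the sketch can be read without a configured
    # ClearML environment; a real script would import at module level.
    from clearml import PipelineController

    def make_dataset():
        # The return value is registered as a step artifact named "dataset".
        return [1, 2, 3]

    def train(dataset):
        print(f"training on {len(dataset)} samples")

    pipe = PipelineController(name="function pipeline",
                              project="AI-Studio-Demo",  # placeholder project
                              version="1.0")
    pipe.add_function_step(name="make_dataset",
                           function=make_dataset,
                           function_return=["dataset"])
    # "${make_dataset.dataset}" wires the upstream return value into this step.
    pipe.add_function_step(name="train",
                           function=train,
                           function_kwargs={"dataset": "${make_dataset.dataset}"},
                           parents=["make_dataset"])
    pipe.start(queue="pipeline")
```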


🧩 Run Individual Pipeline Steps

You can run each task separately as well:

Note: When running for the first time, comment out `task.execute_remotely()` in each script so that the task template is created successfully.

```
# Step 1: Upload dataset
python step1_dataset_artifact.py

# Step 2: Preprocess data
python step2_data_preprocessing.py

# Step 3: Train model
python step3_train_model.py
```

🙌 Acknowledgments

This project is developed and maintained by:


📄 License

This project is licensed under the MIT License. See the LICENSE file for details.
