Project for my master's thesis.

It automates insurance claim modeling in Kedro: data processing, sampling, feature engineering, feature selection, GLM/LightGBM tuning, recalibration, and validation. Everything is configurable via YAML, and the project supports generating, scheduling, and tracking hundreds of experiments. Outputs are metrics and charts. Built with Kedro, MLflow, and Python.
The dataset used in this project can be found at: Dataset Link
This script creates a Python virtual environment and installs the required dependencies.

Usage:

```shell
./create_venv.sh
```

This script sets up a PostgreSQL database for an MLflow server. It creates a database named `mlflow_db` and a user named `mlflow_user` with the specified password, and grants the user all privileges on the database.

Usage:

```shell
./setup_postgres.sh
```

The script will prompt you for the PostgreSQL password for the `mlflow_user` user.
This script starts the MLflow server for tracking experiments.

Usage:

```shell
./run_mlflow.sh
```

This script starts a JupyterLab server for interactive data analysis and development.

Usage:

```shell
./run_jupyterlab.sh
```

This script runs the Kedro pipeline defined in the project.

Usage:

```shell
./kedro_run.sh [OPTIONS]
```

Options:
- `--pipeline`, `-p`: Specify the pipeline to run (default: `__default__`).
- `--mlflow-run-id`: Continue the MLflow run with the given run ID.
1. In the `experiments` directory, copy the `experiments_dir_template` directory and rename it to your desired experiment name, e.g., `my_experiment_name`.
2. Edit the files in `experiments/my_experiment_name/templates/` (`parameters.yml` and `mlflow.yml`) to define the parameters for your experiment.
3. In any notebook in the project (e.g., `notebooks/my_experiment_analysis.ipynb`), run the following method to create a new experiment run:

   ```python
   create_experiment_run(
       experiment_name=experiment_name,
       run_name=run_name,
       template_parameters=template_parameters
   )
   ```

   where:
   - `experiment_name` is the name of your experiment directory,
   - `run_name` is the name of the new MLflow run,
   - `template_parameters` is a dictionary of parameters whose values will replace the template tags in the files located in `experiments/<experiment_name>/templates/`.

   You can call this method multiple times with different pairs of `run_name` and `template_parameters` to create multiple runs for the same experiment. Import the method in your notebook as follows:

   ```python
   from claim_modelling_kedro.experiments.experiment import (
       create_experiment_run,
       default_run_name_from_run_no
   )
   ```

4. Run the experiment using the provided `run_experiment.sh` script. See usage below.
5. Restore the default configuration files using the `restore_default_config.sh` script. See usage below.
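The `template_parameters` substitution can be pictured with a small, self-contained sketch. The `{{ name }}` tag syntax and the `render_template` helper below are illustrative assumptions, not the project's actual implementation — check `experiments_dir_template` for the real tag convention:

```python
def render_template(text: str, template_parameters: dict) -> str:
    """Replace every '{{ name }}' tag with the matching parameter value.

    The tag syntax is an assumed convention for illustration only.
    """
    for name, value in template_parameters.items():
        text = text.replace("{{ " + name + " }}", str(value))
    return text


# Hypothetical fragment of experiments/<experiment_name>/templates/parameters.yml:
template = "model: {{ model_name }}\nlearning_rate: {{ lr }}\n"
rendered = render_template(template, {"model_name": "lightgbm", "lr": 0.05})
print(rendered)  # -> model: lightgbm / learning_rate: 0.05
```

Each call to `create_experiment_run` with a different `template_parameters` dictionary produces a differently rendered configuration, which is what makes one experiment directory drive many runs.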
Useful methods for viewing experiment results in your notebook:
- From `claim_modelling_kedro.experiments.experiment`: `get_run_mlflow_id`
- From `claim_modelling_kedro.pipelines.utils.dataframes`: `load_metrics_table_from_mlflow`, `load_predictions_and_target_from_mlflow`, `load_metrics_cv_stats_from_mlflow`
- From `claim_modelling_kedro.pipelines.utils.datasets`: `get_partition`, `get_mlflow_run_id_for_partition`
This script runs an experiment for different pipelines.
It requires the experiment name and the name of the first pipeline to run as positional arguments.
Optionally, you can specify:
- another pipeline to run for all subsequent runs,
- a specific run name or multiple run names, and
- a run name from which run_experiment.sh should continue.
The script copies the rendered templates from experiments/<experiment_name>/templates/ to the Kedro configuration directory.
Usage:
```shell
./run_experiment.sh <experiment_name> <first_pipeline> [--other-pipeline OTHER_PIPELINE] [--run-name RUN_NAME [RUN_NAME ...]] [--from-run-name FROM_RUN_NAME]
```

- `<experiment_name>`: Name of the experiment directory (required)
- `<first_pipeline>`: Name of the first pipeline to run (required)
- `--other-pipeline OTHER_PIPELINE`: (optional) Name of the second pipeline to run after the first
- `--run-name RUN_NAME [RUN_NAME ...]`: (optional) One or more run names to use for the experiment(s)
- `--from-run-name FROM_RUN_NAME`: (optional) Use this run name as a template for the new run(s)
Example:

```shell
./run_experiment.sh sev_001_dummy_mean_regressor ds
```

or with additional options:

```shell
./run_experiment.sh sev_001_dummy_mean_regressor all_to_test --other-pipeline smpl_to_test --run-name my_run_1 my_run_2
```

This script restores the default configuration files for the project from `claim_modelling_kedro/conf/default/`.
Usage:
```shell
./restore_default_config.sh
```

This script allows you to delete, restore, or permanently delete (purge) an MLflow experiment.
It reads the tracking URI from `claim_modelling_kedro/conf/local/mlflow.yml`.
Supported actions:
- delete – soft-deletes the experiment (marks it as deleted)
- restore – restores a soft-deleted experiment
- purge – permanently deletes the experiment
⚠️ Requires MLflow ≥ 2.7 and a SQL backend.
Usage:
```shell
./manage_mlflow_experiment.sh <delete|restore|purge> (--name <experiment_name> | --id <experiment_id>)
```

Examples:
- Soft-delete by name: `./manage_mlflow_experiment.sh delete --name sev_001_dummy_mean_regressor`
- Restore by ID: `./manage_mlflow_experiment.sh restore --id 12`
- Permanently delete by name: `./manage_mlflow_experiment.sh purge --name sev_001_dummy_mean_regressor`
This script lists all MLflow experiments along with their name, ID, and lifecycle stage.
It uses the tracking URI configured in claim_modelling_kedro/conf/local/mlflow.yml.
Usage:
```shell
./list_mlflow_experiments.sh
```

Output includes:
- experiment name
- experiment ID
- status (active or deleted)