AIMO Math Finetuning using DeepSeek-R1-Distill-Qwen-7B

Description

This project focuses on fine-tuning the deepseek-ai/DeepSeek-R1-Distill-Qwen-7B model for mathematical problem-solving, specifically targeting problems similar to those found in the AI Mathematical Olympiad (AIMO) competition on Kaggle.

The fine-tuning process utilizes the Floppanacci/QWQ-LongCOT-AIMO dataset. The resulting fine-tuned model is available as Floppanacci/DeepSeek-R1-Distill-Qwen-7B-Floppanacci, and a 4-bit AWQ quantized version is available at Floppanacci/DeepSeek-R1-Distill-Qwen-7B-Floppanacci-AWQ.

Installation

Both notebooks were executed on Google Colab, using an L4 GPU for the fine-tuning and an A100 for the quantization (the L4's VRAM was insufficient for the quantization step). Both notebooks would require modifications to run on different hardware. Notably, the torch.bfloat16 dtype is only supported on NVIDIA GPUs with compute capability 8.0 or higher (e.g. L4, A100, H100); on other hardware such as T4 GPUs, torch.float16 must be used instead.
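
If you adapt the notebooks to other GPUs, a small runtime check (not part of the original notebooks, shown here only as a convenience) can pick the appropriate dtype automatically:

    import torch

    # Use bfloat16 where the GPU supports it (Ampere-class and newer, e.g. L4,
    # A100, H100); fall back to float16 on older cards such as the T4.
    dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
    print(f"Selected dtype: {dtype}")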

Data

The fine-tuning process uses the Floppanacci/QWQ-LongCOT-AIMO dataset available on Hugging Face.

  • Size: 36.8k samples (29.5k train, 3.68k validation, 3.68k test).
  • Format: CSV/Parquet, containing mathematical questions and detailed step-by-step answers (Chain-of-Thought style).
  • The dataset can be loaded using the datasets library.
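
A minimal loading sketch (the split names follow the sizes listed above; the exact column names are defined by the dataset itself):

    from datasets import load_dataset

    # Download the dataset from the Hugging Face Hub; it exposes train,
    # validation and test splits.
    dataset = load_dataset("Floppanacci/QWQ-LongCOT-AIMO")
    print(dataset)              # split names and sizes
    print(dataset["train"][0])  # one question / chain-of-thought answer pair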

Fine-tuning

The supervised-fine-tuning.ipynb notebook contains the code for Supervised Fine-tuning (SFT).

  1. Configure Training: Modify the TrainingArguments within the notebook (a condensed configuration sketch follows this list). Key parameters to adjust include:
    • output_dir: Directory to save checkpoints and the final model.
    • num_train_epochs: Number of training epochs (e.g., 3).
    • per_device_train_batch_size: Batch size per GPU (e.g., 4).
    • gradient_accumulation_steps: Steps for gradient accumulation (e.g., 8).
    • learning_rate: Learning rate (e.g., 3e-5).
    • Ensure you configure report_to="wandb" (or another experiment tracker) and set an appropriate project name for monitoring.
  2. Run Fine-tuning: Execute the cells in the notebook. The script loads the base model (deepseek-ai/DeepSeek-R1-Distill-Qwen-7B), prepares the dataset, and runs the training using the SFTTrainer from the TRL library.
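
The following is a condensed sketch of that setup, not a verbatim copy of the notebook; recent TRL versions replace TrainingArguments with SFTConfig, so adjust to your installed version:

    from datasets import load_dataset
    from transformers import TrainingArguments
    from trl import SFTTrainer

    dataset = load_dataset("Floppanacci/QWQ-LongCOT-AIMO")

    args = TrainingArguments(
        output_dir="./deepseek-r1-aimo-sft",   # hypothetical output directory
        num_train_epochs=3,
        per_device_train_batch_size=4,
        gradient_accumulation_steps=8,
        learning_rate=3e-5,
        bf16=True,                             # see the dtype note under Installation
        report_to="wandb",
    )

    trainer = SFTTrainer(
        model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",  # loaded by the trainer
        args=args,
        train_dataset=dataset["train"],
        eval_dataset=dataset["validation"],
    )
    trainer.train()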

Models
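
Two model artifacts are published on the Hugging Face Hub: Floppanacci/DeepSeek-R1-Distill-Qwen-7B-Floppanacci (the fine-tuned model) and Floppanacci/DeepSeek-R1-Distill-Qwen-7B-Floppanacci-AWQ (the 4-bit AWQ quantized version). A minimal inference sketch for the fine-tuned model (the prompt and generation settings are illustrative, not taken from the repository):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Floppanacci/DeepSeek-R1-Distill-Qwen-7B-Floppanacci"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    # Illustrative question in the style of the AIMO training data.
    prompt = "What is the remainder when 2^10 is divided by 7?"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

The AWQ variant can be loaded in the same way, provided AWQ support (e.g. the autoawq package) is installed in the environment.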
