This project explores two distinct fine-tuning methodologies, Weighted Full Fine-tuning and LoRA (Low-Rank Adaptation), applied to the specialized Mental-BERT-Base-Uncased model. The goal is to compare the performance, training efficiency, and parameter footprint of the two methods on a downstream mental health text classification task, the domain implied by the base model.
The foundation of this work is the Mental-BERT-Base-Uncased model, a BERT-based model specifically pre-trained on mental health-related texts.
- Hugging Face Link: mental/mental-bert-base-uncased
This specialized model is leveraged to ensure better domain-specific understanding and performance compared to general-purpose language models.
This is the standard approach, in which every parameter of the pre-trained model is updated on the new dataset. Weighted Full Fine-tuning additionally incorporates sample (or class) weights into the loss function to counter class imbalance in the training data, ensuring the model pays adequate attention to underrepresented classes.
- Approach: All parameters in the pre-trained backbone are updated.
- Key Feature: Use of weights in the loss function to mitigate class imbalance.
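As a minimal sketch of how class weighting enters the loss (the label counts and the inverse-frequency weighting scheme below are illustrative assumptions, not the project's actual data or configuration), weights can be passed directly to PyTorch's `CrossEntropyLoss`:

```python
import torch
import torch.nn as nn

# Illustrative, imbalanced label distribution (NOT the project's data)
labels = torch.tensor([0] * 70 + [1] * 20 + [2] * 10)

# Inverse-frequency class weights, scaled so the average
# per-sample weight over this dataset equals 1
counts = torch.bincount(labels).float()
weights = counts.sum() / (len(counts) * counts)

# The weighted criterion up-weights errors on minority classes
loss_fn = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 3)           # dummy model outputs
targets = torch.randint(0, 3, (8,))  # dummy gold labels
loss = loss_fn(logits, targets)
```

With the Hugging Face `Trainer`, the same idea is usually implemented by subclassing it and overriding `compute_loss` to apply a weighted criterion like this one.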
LoRA is a parameter-efficient fine-tuning (PEFT) technique. It freezes the pre-trained model weights and injects trainable rank-decomposition matrices (A and B) into the transformer layers, drastically reducing the number of trainable parameters while largely retaining the performance of full fine-tuning.
- Approach: Only the newly added low-rank matrices are updated; the original model weights are frozen.
- Key Feature: Significantly reduced memory usage and faster training due to fewer trainable parameters.
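The rank-decomposition idea can be sketched in a few lines of PyTorch (the dimensions, rank, and scaling factor below are illustrative choices, not the project's exact configuration):

```python
import torch

d, r, alpha = 768, 8, 16      # hidden size, LoRA rank, scaling (illustrative)
W = torch.randn(d, d)         # frozen pre-trained weight, never updated

A = torch.randn(r, d) * 0.01  # trainable down-projection, small random init
B = torch.zeros(d, r)         # trainable up-projection, zero init

x = torch.randn(d)
# LoRA forward pass: frozen path plus scaled low-rank update
h = W @ x + (alpha / r) * (B @ (A @ x))

# Only A and B are trained: 2 * r * d parameters per adapted matrix
trainable = A.numel() + B.numel()
```

Because B is initialized to zero, the adapted layer starts out identical to the frozen one, so training begins from the pre-trained model's behavior. In practice this is configured through the `peft` library's `LoraConfig` and `get_peft_model` rather than written by hand.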
The following figures illustrate the performance metrics (e.g., accuracy, F1-score, loss) achieved by both fine-tuning methods on the test set.
The analysis of the results demonstrates the efficiency and effectiveness of the LoRA approach.
LoRA achieved results comparable to full fine-tuning while training only 4.67% of the parameters. This validates LoRA as a highly efficient method for adapting large, specialized language models like Mental-BERT to new tasks without the computational burden of updating the entire model.
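Trainable-parameter percentages such as the 4.67% reported above can be verified with a small helper (the function name is ours, and the toy two-layer model is purely for demonstration, not Mental-BERT):

```python
import torch.nn as nn

def trainable_fraction(model: nn.Module) -> float:
    """Return the percentage of parameters with requires_grad=True."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return 100.0 * trainable / total

# Toy demo: two identical layers, one frozen -> 50% trainable
model = nn.Sequential(nn.Linear(10, 10), nn.Linear(10, 10))
for p in model[0].parameters():
    p.requires_grad = False
```

Applied to the PEFT-wrapped model, the same computation is what `peft`'s `print_trainable_parameters()` reports.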

