The paper "Fighting Randomness with Randomness: Mitigating Optimisation Instability of Fine-Tuning using Delayed Ensemble and Noisy Interpolation", produced by experts from our consortium member KInIT together with the Faculty of Information Technology of the Brno University of Technology, introduces Delayed Ensemble with Noisy Interpolation (DENI), a novel strategy for mitigating performance instability during the fine-tuning of pre-trained language models (PLMs) such as BERT and RoBERTa.
Fine-tuning often exhibits high variance in performance due to randomness in initialization, data shuffling, and model behavior, especially in low-resource settings. DENI builds on noise regularization, ensembling, and model interpolation to reduce this instability while maintaining computational efficiency.
Key Findings and Contributions:
- Performance Stability:
DENI effectively reduces variability in results and outperforms nine established mitigation strategies, including ensembles and noise-based methods, across seven datasets. The strategy improves mean performance while requiring only 37% of the computational cost of full ensembling.
- Parameter-Efficient Fine-Tuning (PEFT):
DENI shows marked benefits for PEFT methods such as LoRA, IA3, and UniPELT, reducing deviations and even outperforming full fine-tuning in some cases (see the sketch after this list).
- Synergy with Data Augmentation:
Combining DENI with data augmentation (e.g., paraphrased samples) further enhances performance, particularly in low-resource scenarios, though it incurs a higher computational cost.
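To make the PEFT setting concrete, here is a minimal, hypothetical sketch (not taken from the paper or its repository) of wrapping a classifier with LoRA using the Hugging Face peft library. The model name and LoRA hyperparameters are illustrative; the point is that only a small set of adapter and classifier parameters is trainable, which is exactly the subset a DENI-style perturbation and ensembling step has to touch.

```python
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

# Illustrative base model and LoRA hyperparameters (not the paper's exact setup).
base = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
lora_config = LoraConfig(task_type=TaskType.SEQ_CLS, r=8, lora_alpha=16, lora_dropout=0.1)
model = get_peft_model(base, lora_config)

# Only the LoRA adapters (plus the classification head) require gradients, so a
# noise-based perturbation or ensembling step only needs this small subset.
trainable = [p for p in model.parameters() if p.requires_grad]
print(f"trainable parameters to perturb: {sum(p.numel() for p in trainable):,}")
```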
Core Methodology:
- Delayed Ensemble (DE): Instead of training multiple models independently, DE creates the ensemble from a single trained model by perturbing its parameters with noise, which greatly reduces the computational overhead of ensembling.
- Noisy Interpolation (NI): During training, the model is repeatedly perturbed with noise and the resulting copies are interpolated back into a single model, enhancing robustness against randomness while remaining computationally efficient (a minimal sketch of both mechanisms follows this list).
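The two mechanisms can be sketched roughly as follows. This is a minimal PyTorch illustration, not the authors' implementation (see the code repository linked at the end): it assumes isotropic Gaussian noise with a fixed scale sigma and uniform parameter averaging, whereas the paper tunes the noise magnitude and schedule, and the helper names are invented for this example.

```python
import copy
import torch

def perturb(model, sigma=0.01):
    """Return a copy of `model` with Gaussian noise added to every trainable
    parameter (illustrative noise scale; the paper tunes this)."""
    noisy = copy.deepcopy(model)
    with torch.no_grad():
        for p in noisy.parameters():
            if p.requires_grad:
                p.add_(torch.randn_like(p) * sigma)
    return noisy

def delayed_ensemble(trained_model, n_members=10, sigma=0.01):
    """Delayed Ensemble (DE), sketched: instead of training n_members models
    from scratch, spawn the ensemble late by perturbing one trained model."""
    return [perturb(trained_model, sigma) for _ in range(n_members)]

def interpolate(models):
    """Uniformly average the parameters of several (perturbed) copies back into
    a single model, as Noisy Interpolation does after a few training steps."""
    merged = copy.deepcopy(models[0])
    with torch.no_grad():
        for merged_p, *copies in zip(merged.parameters(),
                                     *(m.parameters() for m in models)):
            merged_p.copy_(torch.stack([p.data for p in copies]).mean(dim=0))
    return merged
```

Roughly, NI would periodically branch the current model into a few perturbed copies with perturb, train each copy for a handful of steps, and merge them back with interpolate; DE is applied once, near the end of training, and the resulting members' predictions are averaged at inference time.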
Evaluation:
The method was benchmarked against:
- Nine mitigation strategies, including Stochastic Weight Averaging (SWA), Mixout, and data augmentation.
- Baselines such as standard fine-tuning (Default) and training with full datasets (All Data).
Results:
- DENI demonstrated consistent performance gains, with a statistically significant improvement of up to 2.11% over existing strategies like ensembling.
- PEFT approaches combined with DENI outperformed full fine-tuning in certain datasets, making it a viable option for resource-constrained environments.
Practical Implications:
DENI’s efficiency and robustness make it ideal for applications requiring stable fine-tuning of PLMs, particularly in low-resource settings where variance and instability are more pronounced. Its compatibility with existing methods like data augmentation further solidifies its utility.
Link to Zenodo: https://zenodo.org/records/14247076
Code repository: https://github.com/kinit-sk/DENI