
LoRA vs. Full Fine-Tuning: An Illusion of Equivalence


Fine-tuning is a crucial paradigm for adapting pre-trained large language models to downstream tasks. Recently, methods like Low-Rank Adaptation (LoRA) have been shown to match the performance of fully fine-tuned models on various tasks with an extreme reduction in the number of trainable parameters. Even in settings where both methods learn similarly accurate models, are their learned solutions really equivalent? We study how different fine-tuning methods change pre-trained models by analyzing the model's weight matrices through the lens of their spectral properties. We find that full fine-tuning and LoRA yield weight matrices whose singular value decompositions exhibit very different structure; moreover, the fine-tuned models themselves show distinct generalization behaviors when tested outside the adaptation task's distribution. More specifically, we first show that the weight matrices trained with LoRA have new, high-ranking singular vectors, which we call intruder dimensions. Intruder dimensions do not appear during full fine-tuning. Second, we show that LoRA models with intruder dimensions, despite achieving similar performance to full fine-tuning on the target task, become worse models of the pre-training distribution and adapt less robustly to multiple tasks sequentially. Higher-rank, rank-stabilized LoRA models closely mirror full fine-tuning, even when performing on par with lower-rank LoRA models on the same tasks. These results suggest that models updated with LoRA and full fine-tuning access different parts of parameter space, even when they perform equally on the fine-tuned distribution. We conclude by examining why intruder dimensions appear in LoRA fine-tuned models, why they are undesirable, and how their effects can be minimized.
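To make the idea of an intruder dimension concrete, here is a minimal NumPy sketch on a toy matrix. The abstract only describes intruder dimensions as new, high-ranking singular vectors; the specific test used here (a top singular vector of the tuned matrix whose maximum absolute cosine similarity to every pre-trained singular vector falls below a threshold) is an assumption made for illustration, as are the function name `intruder_dimensions`, the parameters `top_k` and `epsilon`, and the synthetic weights.

```python
import numpy as np


def intruder_dimensions(w_pre, w_tuned, top_k=5, epsilon=0.5):
    """Flag top singular vectors of w_tuned that resemble no singular vector of w_pre.

    Hypothetical helper: the cosine-similarity criterion and the default
    values of top_k and epsilon are illustrative assumptions.
    """
    u_pre, _, _ = np.linalg.svd(w_pre, full_matrices=False)
    u_tuned, _, _ = np.linalg.svd(w_tuned, full_matrices=False)
    # Singular vectors have unit norm, so dot products are cosine similarities.
    # Absolute value removes the sign ambiguity of the SVD.
    sims = np.abs(u_tuned[:, :top_k].T @ u_pre)
    max_sim = sims.max(axis=1)
    return np.flatnonzero(max_sim < epsilon), max_sim


# Toy "pre-trained" weight: 64 x 16 with known singular vectors and values.
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.standard_normal((64, 17)))
u, u_new = q[:, :16], q[:, 16]  # u_new is orthogonal to every column of u
v, _ = np.linalg.qr(rng.standard_normal((16, 16)))
s = np.linspace(2.0, 1.0, 16)
w_pre = (u * s) @ v.T

# Toy "fine-tuned" weight: a large rank-1 update along the new direction.
w_tuned = w_pre + 20.0 * np.outer(u_new, v[:, 0])

idx, max_sim = intruder_dimensions(w_pre, w_tuned)
print("intruder indices:", idx)
print("max cosine similarity per top vector:", max_sim.round(3))
```

In this toy setup the rank-1 update dominates the spectrum, so only the highest-ranked singular vector of `w_tuned` should be flagged, while the remaining top vectors still match pre-trained ones. Real model weights will not separate this cleanly, and the choice of `epsilon` would need to be tuned.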

By Reece Shuttleworth and 3 other authors.

