Uncensor any LLM with abliteration

A Blog post by Maxime Labonne on Hugging Face

In this example, we'll use mlabonne/Daredevil-8B, a mega-merge created with DARE TIES (see my article about model merging) that has the highest MMLU score on the Open LLM Leaderboard in the 8B category.

If you can't find a layer that satisfies these requirements, you might want to test other residual streams in the previous selected_layers list, other instructions, additional blocks, etc.

Andy Arditi, Oscar Obeso, Aaquib111, wesg, Neel Nanda, "Refusal in LLMs is mediated by a single direction," LessWrong, 2024.
