Poster
Adaptive Localization of Knowledge Negation for Continual LLM Unlearning
Abudukelimu Wuerkaixi · Qizhou Wang · Sen Cui · Wutong Xu · Bo Han · Gang Niu · Masashi Sugiyama · Changshui Zhang
With the growing deployment of large language models (LLMs) across diverse domains, concerns regarding their safety have intensified. LLM unlearning has emerged as a pivotal approach to removing harmful or unlawful content while maintaining model utility. Despite increasing interest, the challenges of continual unlearning, which is common in real-world scenarios, remain underexplored. Successive unlearning tasks often lead to intensified utility degradation. To effectively unlearn targeted knowledge while preserving LLM utility, it is essential to minimize changes to model parameters by selectively updating only those linked to the target knowledge, thereby leaving other knowledge unaffected. Building on the task vector framework, we propose a new method named ALKN (Adaptive Localization of Knowledge Negation), which uses dynamic masking to sparsify training gradients and adaptively adjusts unlearning intensity based on inter-task relationships. Comprehensive experiments across three well-established LLM unlearning datasets demonstrate that our approach consistently outperforms baseline methods in both unlearning effectiveness and utility retention under continual unlearning settings.
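As a rough illustration of the gradient-sparsification idea mentioned in the abstract (not the authors' ALKN implementation), the sketch below masks out all but the largest-magnitude gradient entries in each parameter tensor, so that an unlearning update touches only the weights most associated with the target knowledge. The helper name mask_gradients, the keep_ratio parameter, and the per-tensor top-k criterion are illustrative assumptions.

```python
# Minimal sketch of selective gradient masking for unlearning updates.
# Assumptions: a per-tensor top-k magnitude criterion and a fixed keep_ratio;
# the real method adapts masking and unlearning intensity across tasks.
import torch


def mask_gradients(model: torch.nn.Module, keep_ratio: float = 0.05) -> None:
    """Zero out all but the largest-magnitude gradient entries in each parameter."""
    for param in model.parameters():
        if param.grad is None:
            continue
        grad = param.grad
        k = max(1, int(keep_ratio * grad.numel()))
        # Magnitude threshold for the top-k entries of this tensor
        # (kthvalue is 1-indexed over the flattened absolute gradients).
        threshold = grad.abs().flatten().kthvalue(grad.numel() - k + 1).values
        mask = (grad.abs() >= threshold).to(grad.dtype)
        param.grad = grad * mask


# Hypothetical usage within one unlearning step on the forget set:
#   loss = forget_loss(model, batch)   # negated per the task-vector framework
#   loss.backward()
#   mask_gradients(model, keep_ratio=0.05)
#   optimizer.step()
```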