

Poster

Asymmetric Decision-Making in Online Knowledge Distillation: Unifying Consensus and Divergence

Zhaowei Chen · Borui Zhao · Yuchen Ge · Yuhao Chen · Renjie Song · Jiajun Liang

Tue 15 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract:

Online Knowledge Distillation (OKD) methods streamline distillation into a single training stage, removing the need to transfer knowledge from a pretrained teacher network to a more compact student network. In contrast to existing logits-based OKD methods, this paper presents an approach that leverages intermediate spatial representations. Our analysis of the intermediate features from both teacher and student models reveals two pivotal insights: (1) the features that students and teachers share are concentrated predominantly on foreground objects, and (2) teacher models emphasize foreground objects more strongly than students do. Building on these findings, we propose Asymmetric Decision-Making (ADM) to enhance feature consensus learning for student models while continuously promoting feature diversity in teacher models. Specifically, Consensus Learning for student models prioritizes spatial features with high consensus relative to teacher models. Conversely, Divergence Learning for teacher models highlights spatial features with low similarity to the student models, regions where the teacher models perform better. Consequently, ADM helps the student models catch up with the teacher models' feature learning. Extensive experiments demonstrate that ADM consistently surpasses existing OKD methods across various online knowledge distillation settings and also achieves superior results when transferred to offline knowledge distillation, semantic segmentation, and diffusion distillation tasks.
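The sketch below illustrates one way the asymmetric split described above could be realized in PyTorch; it is not the authors' released code. It assumes student and teacher feature maps of shape (B, C, H, W), per-location cosine similarity as the consensus measure, a top-k split of spatial locations into consensus and divergence regions, and illustrative loss choices (MSE feature matching for the student, a cosine-similarity penalty for the teacher); the `ratio` hyperparameter and the function name `adm_losses` are hypothetical.

```python
# Minimal sketch of asymmetric consensus/divergence masking (not the paper's code).
import torch
import torch.nn.functional as F


def adm_losses(f_student, f_teacher, ratio=0.5):
    """Return (student consensus loss, teacher divergence loss) on spatial features."""
    B, C, H, W = f_student.shape

    # Per-location cosine similarity between student and teacher features.
    sim = F.cosine_similarity(f_student, f_teacher, dim=1)      # (B, H, W)
    sim_flat = sim.flatten(1)                                    # (B, H*W)

    # Consensus mask: the top-`ratio` fraction of locations where both models agree.
    k = max(1, int(ratio * sim_flat.size(1)))
    consensus_idx = sim_flat.topk(k, dim=1).indices
    consensus_mask = torch.zeros_like(sim_flat).scatter_(1, consensus_idx, 1.0)
    divergence_mask = 1.0 - consensus_mask                       # remaining locations

    consensus_mask = consensus_mask.view(B, 1, H, W)
    divergence_mask = divergence_mask.view(B, 1, H, W)

    # Student (Consensus Learning): pull the student toward the detached teacher
    # only on high-consensus regions.
    loss_student = (F.mse_loss(f_student, f_teacher.detach(), reduction="none")
                    * consensus_mask).sum() / consensus_mask.sum().clamp(min=1) / C

    # Teacher (Divergence Learning): penalize teacher-student similarity on
    # low-similarity regions, preserving the teacher's feature diversity
    # (one plausible reading of the divergence objective, assumed here).
    loss_teacher = ((F.cosine_similarity(f_teacher, f_student.detach(), dim=1)
                     .unsqueeze(1) * divergence_mask).sum()
                    / divergence_mask.sum().clamp(min=1))

    return loss_student, loss_teacher


if __name__ == "__main__":
    fs = torch.randn(2, 64, 7, 7, requires_grad=True)
    ft = torch.randn(2, 64, 7, 7, requires_grad=True)
    ls, lt = adm_losses(fs, ft)
    print(float(ls), float(lt))
```

In practice, both losses would be added to each network's task loss during joint online training; the asymmetry comes from masking the student's imitation term to consensus regions while the teacher's term acts only on divergence regions.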
