Soft Label & Hard Label

Published in 學以廣才 · Jul 5, 2021

Compiled with reference to material from Prof. Hung-yi Lee (李宏毅) and write-ups by many other experts.

Hard Label

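This is the most common way to label classes. One-hot encoding defines a vector of length N in which each class gets its own binary code, for example: dog [1, 0, 0], cat [0, 1, 0], others [0, 0, 1].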
Hard Label = binary encoded e.g. cat is [dog:0, cat:1, others:0]
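
As a minimal sketch of the encoding above, the helper below builds one-hot vectors for the three classes in the example. The function name `one_hot` and the `CLASSES` list are illustrative choices, not part of any particular library.

```python
def one_hot(index, num_classes):
    """Return a hard label: 1 at position `index`, 0 everywhere else."""
    if not 0 <= index < num_classes:
        raise ValueError("index out of range")
    return [1 if i == index else 0 for i in range(num_classes)]


# Class order follows the example in the text: dog, cat, others.
CLASSES = ["dog", "cat", "others"]

# Map each class name to its hard label.
hard_labels = {name: one_hot(i, len(CLASSES)) for i, name in enumerate(CLASSES)}

print(hard_labels["cat"])  # [0, 1, 0]
```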

Soft Label

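Given a vector of length N, a soft label describes a sample as belonging to several classes at once. It expresses, as a distribution, how likely the sample is to belong to each class, for example: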
Soft Label = probability encoded e.g. cat is [dog:0.3, cat:0.5, others:0.2]
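A probabilistic label has elements that sum to 1, and every value is non-negative.
An unrestricted label places no constraint on its elements; each can be any real value, including negative numbers.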
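
The two kinds of soft label can be told apart with a simple check, and a probabilistic label can be used directly as the target of a cross-entropy loss. The sketch below assumes plain Python lists; the `prediction` and `unrestricted` values are made up for illustration.

```python
import math


def is_probabilistic_label(label, tol=1e-9):
    """True if every element is non-negative and the elements sum to 1."""
    return all(v >= 0 for v in label) and abs(sum(label) - 1.0) <= tol


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy -sum_i target_i * log(predicted_i); eps avoids log(0)."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


soft_cat = [0.3, 0.5, 0.2]        # soft label from the text (dog, cat, others)
hard_cat = [0, 1, 0]              # hard label for the same sample
unrestricted = [1.7, -0.4, 2.0]   # hypothetical unrestricted label
prediction = [0.25, 0.6, 0.15]    # hypothetical model output

print(is_probabilistic_label(soft_cat))      # True
print(is_probabilistic_label(unrestricted))  # False

# With a hard label only the true class contributes to the loss;
# with a soft label every class with non-zero weight contributes.
print(cross_entropy(hard_cat, prediction))
print(cross_entropy(soft_cat, prediction))
```

A hard label passes the same check, so it can be seen as a special case of a probabilistic label in which one class carries all the weight.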

Applications

Hard labels are common in ordinary classification tasks. Soft labels can be used for ambiguous cases and in Knowledge Distillation, and the two are sometimes used together.

A brief introduction to knowledge distillation (excerpted from this article)

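Models with good performance and generalization ability are usually complex, or are ensembles of several networks, while a small model has limited expressive power because of its size. Knowledge distillation uses the knowledge learned by the large model to guide the small one, so that after training the small model performs comparably to the large model with far fewer parameters, which gives both model compression and acceleration. Model compression can broadly be divided into: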

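Model pruning: remove components that contribute little to the result, such as reducing the number of attention heads or dropping layers that have little effect;
Quantization: for example, reducing float32 to float8;
Knowledge distillation: distill the teacher's ability into a student, which is generally smaller than the teacher. A large, deep network can be distilled into a small one, and an ensemble of networks can also be distilled into a single small network;
Parameter sharing: reduce the number of network parameters by sharing them, as ALBERT does across its Transformer layers;
Parameter matrix approximation: reduce the number of parameters in a matrix through low-rank factorization or other methods.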
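
To make the last item concrete, here is a small arithmetic sketch of why low-rank factorization saves parameters: an m × n weight matrix is replaced by the product of an m × r and an r × n matrix. The sizes below are hypothetical, and the sketch only counts parameters; it does not compute the factorization itself.

```python
def dense_params(m, n):
    """Number of parameters in a full m x n weight matrix."""
    return m * n


def low_rank_params(m, n, r):
    """Number of parameters when W (m x n) is approximated by A (m x r) @ B (r x n)."""
    return r * (m + n)


# Hypothetical layer size and rank, chosen only for illustration.
m, n, r = 1024, 1024, 64

full = dense_params(m, n)            # 1048576
factored = low_rank_params(m, n, r)  # 131072
ratio = factored / full              # 0.125

print(full, factored, ratio)
```

The factorized form is smaller only when r is less than m·n / (m + n), so the chosen rank trades approximation quality against size.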

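Hard-target: the one-hot label annotated in the original dataset; the positive label is 1 and all negative labels are 0.
Soft-target: the class probabilities output by the Teacher model's softmax layer. Every class is assigned a probability, and the positive label typically receives the highest one.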
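
The sketch below shows one common way the two targets are combined when training a student. It assumes the temperature-scaled softmax formulation often used for distillation, which the text above does not spell out; the `temperature` and `alpha` parameters, the function names, and all logit values are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits / temperature; a higher temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy -sum_i target_i * log(predicted_i); eps avoids log(0)."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


def distillation_loss(student_logits, teacher_logits, hard_target,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target loss and a hard-target loss (assumed weighting)."""
    soft_target = softmax(teacher_logits, temperature)   # teacher's soft target
    student_soft = softmax(student_logits, temperature)  # student at the same temperature
    student_hard = softmax(student_logits)               # student at temperature 1
    soft_loss = cross_entropy(soft_target, student_soft)
    hard_loss = cross_entropy(hard_target, student_hard)
    return alpha * soft_loss + (1 - alpha) * hard_loss


# Hypothetical logits for the classes (dog, cat, others).
teacher_logits = [2.0, 4.0, 1.0]
student_logits = [1.0, 2.0, 0.5]
hard_target = [0, 1, 0]  # one-hot label: cat

print(softmax(teacher_logits, temperature=2.0))  # soft target
print(distillation_loss(student_logits, teacher_logits, hard_target))
```

Some formulations also multiply the soft-target term by the square of the temperature so that its gradients stay on a comparable scale; that factor is left out here to keep the sketch short.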

References and further reading

Knowledge Distillation: A Survey (arxiv.org)

About the definition of "soft label" and "hard label" (ai.stackexchange.com)

HackMD - Collaborative Markdown Knowledge Base (hackmd.io)

深度学习中的知识蒸馏技术（上） (zhuanlan.zhihu.com)

如何理解soft target这一做法？ (www.zhihu.com)

從軟標籤來思考Few Shot Learning (yulongtsai.medium.com)

Self-training當道：對比Pre-training的優缺點 (medium.com)

Noisy Student: Knowledge Distillation強化Semi-supervise Learning