Loss functions: CosineEmbeddingLoss and HingeEmbeddingLoss

CosineEmbeddingLoss

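A cosine-similarity loss, used to judge whether two input vectors are similar. It is commonly used for learning nonlinear embeddings and for semi-supervised learning.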

For a batch $D(a, b, y)$ of $N$ samples, $a$ and $b$ are the two input vectors and $y$ is the ground-truth label, taking values in $\{1, -1\}$ for similar and dissimilar pairs respectively. The loss of the $i$-th sample is:

$$l_i=\begin{cases}1-\cos(a_i,b_i), & \text{if } y_i=1\\ \max\left(0,\ \cos(a_i,b_i)-\text{margin}\right), & \text{if } y_i=-1\end{cases}$$

Here $\cos(a_i, b_i)$ is the cosine of the angle between the vectors $a_i$ and $b_i$.

When $y_i=-1$ and $\cos(a_i,b_i)<\text{margin}$, $l_i=0$. The pair is dissimilar and its cosine is already small, so it is an easy sample and contributes nothing to the loss.

When $y_i=-1$ and $\cos(a_i,b_i)>\text{margin}$, $l_i=\cos(a_i,b_i)-\text{margin}$.

When $y_i=1$, $l_i=1-\cos(a_i,b_i)$. In particular, when the angle between $a_i$ and $b_i$ is 0, $l_i=0$.
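To make the piecewise definition concrete, here is a minimal sketch that evaluates $l_i$ directly from the formula and compares it with PyTorch's built-in `F.cosine_embedding_loss` using `reduction='none'`. The helper `cosine_embedding_loss_manual` is hypothetical, the tensor sizes and labels are arbitrary, and the helper assumes `y` contains only 1 and −1. PyTorch's own implementation may differ in the last floating-point digits (for example through a tiny epsilon in the denominator), so the comparison uses a tolerance.

```python
import torch
import torch.nn.functional as F

def cosine_embedding_loss_manual(a, b, y, margin=0.0):
    """Per-sample loss l_i from the piecewise formula (hypothetical helper).

    a, b: (N, D) tensors; y: (N,) tensor assumed to contain only 1 and -1.
    Returns the unreduced losses, shape (N,).
    """
    # cos(a_i, b_i) = <a_i, b_i> / (||a_i|| * ||b_i||), one value per row
    cos = (a * b).sum(dim=1) / (a.norm(dim=1) * b.norm(dim=1))
    pos = 1 - cos                           # used where y_i = 1
    neg = torch.clamp(cos - margin, min=0)  # used where y_i = -1
    return torch.where(y == 1, pos, neg)

torch.manual_seed(0)
a = torch.randn(8, 16)
b = torch.randn(8, 16)
y = torch.tensor([1., -1.] * 4)

manual = cosine_embedding_loss_manual(a, b, y, margin=0.2)
builtin = F.cosine_embedding_loss(a, b, y, margin=0.2, reduction="none")
print(torch.allclose(manual, builtin, atol=1e-6))
```

Passing the same vector as both inputs with label 1 gives a loss of (numerically) 0, matching the zero-angle case above.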

import torch.nn.functional as F
from torch.nn.modules.loss import _Loss

class CosineEmbeddingLoss(_Loss):
    __constants__ = ['margin', 'reduction']
    def __init__(self, margin=0., size_average=None, reduce=None, reduction='mean'):
        # size_average and reduce are deprecated; reduction supersedes them
        super(CosineEmbeddingLoss, self).__init__(size_average, reduce, reduction)
        self.margin = margin
    def forward(self, input1, input2, target):
        # target holds the labels y in {1, -1}
        return F.cosine_embedding_loss(input1, input2, target, margin=self.margin, reduction=self.reduction)

In PyTorch this loss is implemented by the class `torch.nn.CosineEmbeddingLoss`; the function `F.cosine_embedding_loss` can also be called directly. The `size_average` and `reduce` arguments in the code are deprecated. `reduction` takes one of three values, `mean`, `sum`, or `none`, each giving a different return value $\ell(a, b, y)$. The default is `mean`. In terms of the per-sample losses defined above:

$$L=\{l_1,\ldots,l_N\}$$

$$\ell(a,b,y)=\begin{cases}L, & \text{if reduction}=\text{'none'}\\ \frac{1}{N}\sum_{i=1}^{N}l_i, & \text{if reduction}=\text{'mean'}\\ \sum_{i=1}^{N}l_i, & \text{if reduction}=\text{'sum'}\end{cases}$$

`margin` should lie in $[-1, 1]$; values in $[0, 0.5]$ are recommended.
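The relationship between the three `reduction` modes can be checked numerically. A small sketch, assuming arbitrary random inputs of size 10×32 and alternating labels: the `'none'` output is the vector $L$, and the `'sum'` and `'mean'` outputs equal its sum and mean.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
a = torch.randn(10, 32)
b = torch.randn(10, 32)
y = torch.tensor([1., -1.] * 5)

# One loss module per reduction mode, all with the same margin
losses = {r: nn.CosineEmbeddingLoss(margin=0.2, reduction=r)(a, b, y)
          for r in ("none", "mean", "sum")}

L = losses["none"]                               # per-sample losses l_1, ..., l_N
print(L.shape)                                   # torch.Size([10])
print(torch.allclose(losses["sum"], L.sum()))    # 'sum'  = sum of l_i
print(torch.allclose(losses["mean"], L.mean()))  # 'mean' = (1/N) * sum of l_i
```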

Example:

import torch
import torch.nn as nn

torch.manual_seed(20)
cosine_loss = nn.CosineEmbeddingLoss(margin=0.2)
a = torch.randn(100, 128, requires_grad=True)
b = torch.randn(100, 128, requires_grad=True)
print(a.size())
print(b.size())
y = 2 * torch.empty(100).random_(2) - 1
output = cosine_loss(a, b, y)
print(output.item())

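# reduction="none": call the new instance, not the original cosine_loss
cosine_loss_none = nn.CosineEmbeddingLoss(margin=0.2, reduction="none")
output_none = cosine_loss_none(a, b, y)
print(output_none.size())  # one loss per sample, so .item() is not applicable

Output:

torch.Size([100, 128])
torch.Size([100, 128])
0.49418219923973083
torch.Size([100])

Note that `reduction='none'` does work. If the new instance is created but the old `cosine_loss` object is called again, the loss still uses `reduction='mean'` and prints the same scalar a second time. Calling the `reduction='none'` instance returns a tensor of shape `(N,)` with one loss per sample; `.item()` cannot be used on it, since `.item()` only converts single-element tensors to Python numbers.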

HingeEmbeddingLoss

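Used to judge whether two vectors are similar, taking the distance between the two vectors as input. It is commonly used for learning nonlinear embeddings and for semi-supervised learning.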

For a batch $D(x, y)$ of $N$ samples, $x$ holds the distances between pairs of vectors and $y$ holds the ground-truth labels, whose elements take values in $\{1, -1\}$ for similar and dissimilar pairs respectively. The loss of the $i$-th sample is:

$$l_i=\begin{cases}x_i, & \text{if } y_i=1\\ \max\left(0,\ \text{margin}-x_i\right), & \text{if } y_i=-1\end{cases}$$

When $y_i=-1$, i.e. the two vectors are dissimilar, and the distance $x_i$ is greater than `margin`, the sample is easy to classify and does not contribute to the loss: $l_i=0$.
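As with the cosine loss, the piecewise formula can be evaluated by hand and compared with `F.hinge_embedding_loss`. This is a minimal sketch: the helper `hinge_embedding_loss_manual` is hypothetical, the distances and labels are made up, and `y` is assumed to contain only 1 and −1. The third pair is dissimilar with a distance above the margin, so it is the "easy" case and contributes 0.

```python
import torch
import torch.nn.functional as F

def hinge_embedding_loss_manual(x, y, margin=1.0):
    """Per-sample loss l_i from the piecewise formula (hypothetical helper).

    x: (N,) distances; y: (N,) labels assumed to contain only 1 and -1.
    """
    pos = x                               # y_i = 1: the distance itself
    neg = torch.clamp(margin - x, min=0)  # y_i = -1: max(0, margin - x_i)
    return torch.where(y == 1, pos, neg)

# Made-up distances and labels
x = torch.tensor([0.3, 0.3, 1.5, 0.1])
y = torch.tensor([1., -1., -1., 1.])

manual = hinge_embedding_loss_manual(x, y, margin=1.0)
builtin = F.hinge_embedding_loss(x, y, margin=1.0, reduction="none")
print(manual)  # per-sample losses 0.3, 0.7, 0.0, 0.1
print(torch.allclose(manual, builtin))
```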

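import torch.nn.functional as F
from torch.nn.modules.loss import _Loss

class HingeEmbeddingLoss(_Loss):
    __constants__ = ['margin', 'reduction']
    def __init__(self, margin=1.0, size_average=None, reduce=None, reduction='mean'):
        # size_average and reduce are deprecated; reduction supersedes them
        super(HingeEmbeddingLoss, self).__init__(size_average, reduce, reduction)
        self.margin = margin
    def forward(self, input, target):
        # input holds the distances x, target the labels y in {1, -1}
        return F.hinge_embedding_loss(input, target, margin=self.margin, reduction=self.reduction)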

In PyTorch this loss is implemented by the class `torch.nn.HingeEmbeddingLoss`; the function `F.hinge_embedding_loss` can also be called directly. The `size_average` and `reduce` arguments in the code are deprecated. `reduction` takes one of three values, `mean`, `sum`, or `none`, each giving a different return value $\ell(x, y)$. The default is `mean`, and `margin` defaults to 1. In terms of the per-sample losses defined above:

$$L=\{l_1,\ldots,l_N\}$$

$$\ell(x,y)=\begin{cases}L, & \text{if reduction}=\text{'none'}\\ \frac{1}{N}\sum_{i=1}^{N}l_i, & \text{if reduction}=\text{'mean'}\\ \sum_{i=1}^{N}l_i, & \text{if reduction}=\text{'sum'}\end{cases}$$
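The choice of `margin` decides which dissimilar pairs are still penalized: a pair with $y_i=-1$ only stops contributing once its distance $x_i$ reaches the margin. A minimal sketch with made-up distances for three dissimilar pairs, comparing `margin=0.2` with the default:

```python
import torch
import torch.nn as nn

# Made-up distances for three dissimilar pairs (all labels are -1)
x = torch.tensor([0.1, 0.5, 1.2])
y = -torch.ones(3)

print(nn.HingeEmbeddingLoss().margin)  # the default margin

results = {}
for margin in (0.2, 1.0):
    loss_fn = nn.HingeEmbeddingLoss(margin=margin, reduction="none")
    results[margin] = loss_fn(x, y)  # per-sample max(0, margin - x_i)
    print(margin, results[margin])
```

With `margin=0.2` only the pair at distance 0.1 is penalized (loss 0.1); with the default margin of 1 the pairs at 0.1 and 0.5 are both penalized (losses 0.9 and 0.5), and only the pair at 1.2 contributes 0.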

Example:

import torch
import torch.nn as nn

torch.manual_seed(20)
hinge_loss = nn.HingeEmbeddingLoss(margin=0.2)
a = torch.randn(100, 128, requires_grad=True)
b = torch.randn(100, 128, requires_grad=True)
# define the distance x between a and b as 1 - cosine similarity
x = 1 - torch.cosine_similarity(a, b)
print(x.size())
y = 2 * torch.empty(100).random_(2) - 1
output = hinge_loss(x, y)
print(output.item())

hinge_loss = nn.HingeEmbeddingLoss(margin=0.2, reduction="none")
output = hinge_loss(x, y)
print(output)

Output:

torch.Size([100])
0.4938560426235199
tensor([0.0000, 1.0821, 0.0000, 1.0337, 1.0798, 0.0000, 1.0582, 0.0000, 0.8795,
        0.0000, 1.1377, 0.0000, 0.9727, 1.0088, 0.0000, 0.0000, 0.0000, 0.0000,
        0.0000, 0.9941, 1.0539, 0.0000, 0.0000, 0.0000, 1.1907, 0.9647, 0.8875,
        0.8585, 0.9471, 0.0000, 0.0000, 0.9677, 0.0000, 0.0000, 0.0000, 0.8393,
        0.0000, 0.9900, 1.1510, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
        0.9491, 0.9202, 0.0000, 0.9338, 1.0044, 0.0000, 1.1716, 1.0480, 0.8654,
        0.8302, 0.0000, 0.8969, 0.0000, 0.0000, 1.0293, 0.0000, 1.1107, 0.8257,
        0.9162, 1.0796, 1.0330, 0.0000, 0.9933, 0.0000, 0.0000, 1.0066, 0.0000,
        0.0000, 0.0000, 0.0000, 0.9410, 0.8609, 1.0060, 0.0000, 0.8454, 0.0000,
        1.0362, 0.0000, 1.0253, 1.0560, 1.0759, 0.9888, 0.0000, 1.0147, 0.8566,
        0.9453, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.9874, 0.0000, 0.0000,
        1.0352], grad_fn=<AddBackward0>)
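About half of the entries in the printed tensor are exactly 0. The reason: $x_i = 1-\cos(a_i,b_i)$ stays close to 1 for random 128-dimensional vectors, so every dissimilar pair has $x_i$ well above `margin=0.2` and gets $l_i=0$, while every similar pair gets $l_i=x_i$. The sketch below checks this, assuming the same seed and data generation as the example (gradients are not needed for the check, so `requires_grad` is dropped); the check does not depend on the exact random values, and the variable name `per_sample` is illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(20)
a = torch.randn(100, 128)
b = torch.randn(100, 128)
x = 1 - torch.cosine_similarity(a, b)  # distances, concentrated around 1
y = 2 * torch.empty(100).random_(2) - 1

per_sample = nn.HingeEmbeddingLoss(margin=0.2, reduction="none")(x, y)

print(x.min().item())                                  # smallest distance vs. margin 0.2
print(torch.equal(per_sample == 0, y == -1))           # zeros exactly at dissimilar pairs?
print(torch.allclose(per_sample[y == 1], x[y == 1]))   # similar pairs: l_i = x_i?
```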
