
GloVe Formula Derivation

Notation: let $X_{i,j}$ be the number of times word $j$ appears in the context of word $i$ (the co-occurrence matrix), and let $N$ be the vocabulary size. Define:

$$X_i = \sum_{j=1}^{N} X_{i,j}, \qquad P_{i,k} = \frac{X_{i,k}}{X_i}, \qquad ratio_{i,j,k} = \frac{P_{i,k}}{P_{j,k}}$$
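A minimal sketch of these definitions in plain Python, assuming the co-occurrence matrix is stored as a list of lists `X` with `X[i][j]` holding $X_{i,j}$. The helper names `row_sums`, `cond_prob`, and `ratio` and the 3×3 counts are hypothetical, chosen only for illustration.

```python
def row_sums(X):
    """X_i = sum_j X[i][j]: total co-occurrence count of word i."""
    return [sum(row) for row in X]


def cond_prob(X):
    """P[i][k] = X[i][k] / X_i: probability that word k appears in the context of word i."""
    totals = row_sums(X)
    return [[x / totals[i] for x in row] for i, row in enumerate(X)]


def ratio(P, i, j, k):
    """ratio_{i,j,k} = P[i][k] / P[j][k]."""
    return P[i][k] / P[j][k]


# Hypothetical symmetric 3x3 co-occurrence matrix (made-up counts).
X = [[2.0, 6.0, 2.0],
     [6.0, 4.0, 10.0],
     [2.0, 10.0, 8.0]]
P = cond_prob(X)
print(row_sums(X))        # [10.0, 20.0, 20.0]
print(ratio(P, 0, 1, 2))  # (2/10) / (10/20) = 0.4
```

Each row of `P` sums to 1, and $ratio_{i,i,k} = 1$ for any $k$.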

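| Value of $ratio_{i,j,k}$ | Words $j$, $k$ related | Words $j$, $k$ unrelated |
|---|---|---|
| Words $i$, $k$ related | Close to 1 | Very large |
| Words $i$, $k$ unrelated | Very small | Close to 1 |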
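To make the table concrete, the sketch below computes $ratio_{i,j,k}$ for $i$ = "ice" and $j$ = "steam" over four context words. The counts are invented for illustration, and $X_i$ is summed over these four context words only, a simplification of the full row sum.

```python
# Hypothetical counts (made up for illustration): how often each context
# word k co-occurs with i = "ice" and with j = "steam".
contexts = ["solid", "gas", "water", "fashion"]
ice = {"solid": 80, "gas": 4, "water": 120, "fashion": 2}
steam = {"solid": 4, "gas": 70, "water": 110, "fashion": 2}


def ratio(row_i, row_j, k):
    """ratio_{i,j,k} = P_{i,k} / P_{j,k}, with P_{i,k} = X_{i,k} / X_i."""
    p_ik = row_i[k] / sum(row_i.values())
    p_jk = row_j[k] / sum(row_j.values())
    return p_ik / p_jk


for k in contexts:
    print(f"{k:8s} {ratio(ice, steam, k):.3f}")
```

"solid" (related to ice only) gives a large ratio, "gas" (related to steam only) a small one, and "water" (related to both) and "fashion" (related to neither) stay close to 1, matching the four cells of the table.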

Derivation:

Suppose the word vectors have already been learned; they should then agree closely with the co-occurrence matrix. Given the word vectors $w_i, w_j, w_k$, let $g(w_i, w_j, w_k)$ be the function that computes $ratio_{i,j,k}$ from them. Then:

$$\frac{P_{i,k}}{P_{j,k}} = ratio_{i,j,k} = g(w_i, w_j, w_k)$$

The two sides of this equation should be as close as possible, so the cost function is:

$$J = \sum_{i,j,k=1}^{N} \left( \frac{P_{i,k}}{P_{j,k}} - g(w_i, w_j, w_k) \right)^2$$

However, this model involves three words at a time, so the cost function has $N \times N \times N$ terms, a complexity of $O(N^3)$.
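A sketch of the naive cost, assuming $P$ is given as a list of lists and $g$ is any function of three vectors (the name `naive_cost` and the example values are hypothetical). It loops over every triple $(i, j, k)$, which makes the $N^3$ term count explicit:

```python
import itertools


def naive_cost(P, W, g):
    """J = sum over all (i, j, k) of (P[i][k] / P[j][k] - g(w_i, w_j, w_k))^2.

    Returns (J, number_of_terms) so the N^3 growth is visible.
    """
    n = len(P)
    J, terms = 0.0, 0
    for i, j, k in itertools.product(range(n), repeat=3):
        diff = P[i][k] / P[j][k] - g(W[i], W[j], W[k])
        J += diff * diff
        terms += 1
    return J, terms


N = 4
P = [[1.0 / N] * N for _ in range(N)]  # uniform P, so every ratio is exactly 1
W = [[0.1 * i, -0.2 * i] for i in range(N)]
J, terms = naive_cost(P, W, lambda wi, wj, wk: 1.0)
print(J, terms)  # 0.0 64
```

Doubling $N$ multiplies the number of terms by 8, which is why the derivation looks for a simpler form of $g$.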

How can this be simplified?

  1. To capture the relationship between words $i$ and $j$, $g$ should probably involve the difference $w_i - w_j$;
  2. $ratio_{i,j,k}$ is a scalar, so $g$ must also return a scalar; hence $g$ should contain the dot product $(w_i - w_j)^T w_k$;
  3. Finally, wrapping this in an exponential $\exp()$, which turns a difference into a ratio, gives $g(w_i, w_j, w_k) = \exp\left((w_i - w_j)^T w_k\right)$.
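A sketch of this choice of $g$ in plain Python (the helper names `dot` and `g` are illustrative, and the vectors are arbitrary). Besides returning a scalar, it mirrors two properties of the ratio: swapping $i$ and $j$ inverts it ($ratio_{i,j,k} \cdot ratio_{j,i,k} = 1$), and $i = j$ gives 1. The last line checks the factorization $\exp((w_i - w_j)^T w_k) = \exp(w_i^T w_k) / \exp(w_j^T w_k)$.

```python
import math


def dot(u, v):
    """Dot product u^T v of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def g(w_i, w_j, w_k):
    """g(w_i, w_j, w_k) = exp((w_i - w_j)^T w_k), a single scalar."""
    diff = [a - b for a, b in zip(w_i, w_j)]
    return math.exp(dot(diff, w_k))


w_i, w_j, w_k = [0.5, -1.0], [0.2, 0.3], [1.0, 2.0]
print(g(w_i, w_j, w_k))                     # a scalar, about exp(-2.3)
print(g(w_i, w_j, w_k) * g(w_j, w_i, w_k))  # about 1.0: swapping i and j inverts g
print(g(w_i, w_i, w_k))                     # 1.0: i = j gives ratio 1
print(abs(g(w_i, w_j, w_k) - math.exp(dot(w_i, w_k)) / math.exp(dot(w_j, w_k))) < 1e-12)  # True
```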

$$\begin{aligned}
\frac{P_{i,k}}{P_{j,k}} &= g(w_i, w_j, w_k) \\
&= \exp\left((w_i - w_j)^T w_k\right) \\
&= \exp\left(w_i^T w_k - w_j^T w_k\right) \\
&= \frac{\exp(w_i^T w_k)}{\exp(w_j^T w_k)}
\end{aligned}$$

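The equality holds if the numerator and denominator match separately, that is, if:

$$P_{i,j} = \exp(w_i^T w_j)$$

Taking logarithms and substituting $P_{i,j} = X_{i,j} / X_i$:

$$\log(X_{i,j}) - \log(X_i) = w_i^T w_j$$

The term $\log(X_i)$ does not depend on $j$, so it can be absorbed into a bias $b_i$ for word $i$; adding a bias $b_j$ for word $j$ keeps the expression symmetric in $i$ and $j$:

$$\log(X_{i,j}) = w_i^T w_j + b_i + b_j$$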

The loss function becomes:

$$J = \sum_{i,j=1}^{N} \left( w_i^T w_j + b_i + b_j - \log(X_{i,j}) \right)^2$$
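A sketch of this unweighted loss, assuming one vector $w_i$ and one bias $b_i$ per word as in the formula above (the names `dot` and `unweighted_loss` are illustrative). Because $\log(0)$ is undefined, pairs with $X_{i,j} = 0$ are skipped, an assumption of this sketch. The example builds $X$ from known parameters, so the loss is zero up to rounding:

```python
import math


def dot(u, v):
    """Dot product u^T v."""
    return sum(a * b for a, b in zip(u, v))


def unweighted_loss(X, W, b):
    """J = sum_{i,j} (w_i^T w_j + b_i + b_j - log X_ij)^2 over pairs with X_ij > 0."""
    J = 0.0
    n = len(X)
    for i in range(n):
        for j in range(n):
            if X[i][j] > 0:  # log(0) is undefined, so zero counts are skipped
                err = dot(W[i], W[j]) + b[i] + b[j] - math.log(X[i][j])
                J += err * err
    return J


# Build X so the model fits it exactly: X_ij = exp(w_i^T w_j + b_i + b_j).
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
b = [0.1, -0.2, 0.3]
X = [[math.exp(dot(W[i], W[j]) + b[i] + b[j]) for j in range(3)] for i in range(3)]
print(unweighted_loss(X, W, b))  # about 0.0 (rounding error only)
```

Every pair contributes with weight 1 here, regardless of how often it was observed.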

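This is essentially a matrix factorization of $\log(X_{i,j})$, and it has a drawback: every word pair carries the same weight in the loss, whether it co-occurs often or rarely.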

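Following the principle that word pairs that co-occur more frequently should receive larger weights, a weighting term is added to the loss: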

$$J = \sum_{i,j=1}^{N} f(X_{i,j}) \left( w_i^T w_j + b_i + b_j - \log(X_{i,j}) \right)^2$$

$$f(x) = \begin{cases} (x / x_{max})^{0.75}, & \text{if } x < x_{max} \\ 1, & \text{if } x \ge x_{max} \end{cases}$$
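A minimal sketch of the weighting function, the weighted loss, and full-batch gradient descent on it, in plain Python. Following this document's formula, each word has a single vector $w_i$ and bias $b_i$; the original GloVe model keeps separate word and context vectors (and biases) and trains with AdaGrad, both of which this sketch replaces with the simpler setup. $x_{max} = 100$ is the value used in the GloVe paper; the counts, learning rate, dimension, and step count are hypothetical. Since $f(0) = 0$, only pairs with $X_{i,j} > 0$ are summed, which also avoids $\log(0)$.

```python
import math
import random


def f(x, x_max=100.0, alpha=0.75):
    """Weight: (x / x_max)^alpha below x_max, capped at 1 at or above x_max."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def weighted_loss(X, W, b):
    """J = sum_{i,j} f(X_ij) (w_i^T w_j + b_i + b_j - log X_ij)^2 over X_ij > 0."""
    J = 0.0
    for i, row in enumerate(X):
        for j, x in enumerate(row):
            if x > 0:
                err = dot(W[i], W[j]) + b[i] + b[j] - math.log(x)
                J += f(x) * err * err
    return J


def train(X, dim=2, lr=0.01, steps=200, seed=0):
    """Full-batch gradient descent on weighted_loss (one vector and bias per word)."""
    rng = random.Random(seed)
    n = len(X)
    W = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(n)]
    b = [0.0] * n
    for _ in range(steps):
        gW = [[0.0] * dim for _ in range(n)]
        gb = [0.0] * n
        for i in range(n):
            for j in range(n):
                x = X[i][j]
                if x <= 0:
                    continue
                # d/d(err) of f * err^2 is 2 * f * err
                coef = 2.0 * f(x) * (dot(W[i], W[j]) + b[i] + b[j] - math.log(x))
                for d in range(dim):
                    gW[i][d] += coef * W[j][d]  # d(w_i^T w_j)/dw_i = w_j
                    gW[j][d] += coef * W[i][d]  # d(w_i^T w_j)/dw_j = w_i
                gb[i] += coef
                gb[j] += coef
        for i in range(n):
            for d in range(dim):
                W[i][d] -= lr * gW[i][d]
            b[i] -= lr * gb[i]
    return W, b


# Hypothetical symmetric co-occurrence counts (made up for illustration).
X = [[10, 40, 0],
     [40, 5, 120],
     [0, 120, 30]]
W0, b0 = train(X, steps=0)    # initial parameters (same seed)
W1, b1 = train(X, steps=200)
print(weighted_loss(X, W0, b0), weighted_loss(X, W1, b1))  # the second should be smaller
```

The cap at 1 keeps very frequent pairs such as (1, 2) from dominating the loss, while rare pairs such as (1, 1) get small weights.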

