[Paper Notes] Reasoning about Entailment with Neural Attention

2023-03-08 04:08:26

Reasoning about Entailment with Neural Attention

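The paper's main claim is that it is the first to apply deep learning and obtain better results than the hand-engineered features in use at the time (September 2015). The model architecture is roughly: LSTM → Attention → FC classifier.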

https://arxiv.org/pdf/1509.06664v1.pdf

LSTM layer

They use two different LSTMs to run the forward pass over the premise and the hypothesis, respectively. The first cell state $c_0$ of $LSTM_{hypothesis}$ is initialized with the last cell state of $LSTM_{premise}$. Their argument is that there is no need to encode the hypothesis again on its own (here "encode" means passing the hypothesis sentence through the same LSTM as the premise); with this setup, $LSTM_{hypothesis}$ can focus more on processing the semantic relation to the premise.
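
The conditioning described above can be sketched in NumPy as follows. This is only an illustrative sketch, not the authors' code: the helper names (`init_lstm`, `run_lstm`, `encode_pair`), the random parameters, the toy dimensions, and the choice to start the hypothesis LSTM's output state $h_0$ at zero are all assumptions made for the example. The only part taken from the notes is that $c_0$ of the hypothesis LSTM is the last cell state of the premise LSTM.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_lstm(rng, input_dim, k):
    """Random parameters for one LSTM (hypothetical helper); gates are stacked row-wise."""
    return {
        "W": rng.normal(scale=0.1, size=(4 * k, input_dim + k)),
        "b": np.zeros(4 * k),
    }

def run_lstm(params, X, c0, h0):
    """Run an LSTM over X (T x input_dim). Returns outputs (T x k) and the last cell state."""
    k = c0.shape[0]
    h, c = h0, c0
    outputs = []
    for x in X:
        z = params["W"] @ np.concatenate([x, h]) + params["b"]
        i = sigmoid(z[:k])            # input gate
        f = sigmoid(z[k:2 * k])       # forget gate
        o = sigmoid(z[2 * k:3 * k])   # output gate
        g = np.tanh(z[3 * k:])        # candidate cell update
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs), c

def encode_pair(premise, hypothesis, p_params, h_params, k):
    """Two separate LSTMs; the hypothesis LSTM starts from the premise LSTM's last cell state."""
    zeros = np.zeros(k)
    Y, c_last = run_lstm(p_params, premise, zeros, zeros)
    # Assumption: only the cell state is carried over; h_0 is reset to zeros.
    H, _ = run_lstm(h_params, hypothesis, c_last, zeros)
    return Y, H

rng = np.random.default_rng(0)
d, k = 5, 4                              # toy embedding and hidden sizes
premise = rng.normal(size=(6, d))        # 6 premise words
hypothesis = rng.normal(size=(3, d))     # 3 hypothesis words
p_params = init_lstm(rng, d, k)
h_params = init_lstm(rng, d, k)
Y, H = encode_pair(premise, hypothesis, p_params, h_params, k)
print(Y.shape, H.shape)  # (6, 4) (3, 4)
```

Here `Y` stores one premise output per row; the attention equations in these notes use the transposed layout, with one output per column.
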

Attention layer

They propose two methods.

Conventional attention: the outputs of $LSTM_{premise}$ are stacked into a matrix $Y$, which serves as the input to attend over, and the last output $h_N$ of $LSTM_{hypothesis}$ serves as the query vector. Attention is computed with an additive model:

$$M = \tanh(W^y Y + W^h h_N \otimes e_L)$$

$$\alpha = \mathrm{softmax}(w^T M)$$

$$r = Y \alpha^T$$

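Here $\otimes$ is the outer product. Its effect is the same as multiplying $W^h h_N$ (of size $k \times 1$) by a $1 \times L$ all-ones vector, which repeats the column $W^h h_N$ $L$ times so that it can be added to the $k \times L$ matrix $W^y Y$.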

Finally, the output used for classification is computed as

$$h^* = \tanh(W^p r + W^x h_N)$$
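
The four equations of the conventional attention can be written almost line for line in NumPy. The following is a minimal sketch under stated assumptions: the function name `attention`, the random weights, and the toy sizes are made up for illustration, and `Y` is laid out as $k \times L$ with one premise output per column.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

def attention(Y, h_N, W_y, W_h, w, W_p, W_x):
    """Y: k x L premise outputs, h_N: (k,) last hypothesis output."""
    L = Y.shape[1]
    e_L = np.ones(L)
    # np.outer repeats the column W_h @ h_N across all L premise positions.
    M = np.tanh(W_y @ Y + np.outer(W_h @ h_N, e_L))   # k x L
    alpha = softmax(w @ M)                             # (L,) attention weights
    r = Y @ alpha                                      # (k,) weighted premise summary
    h_star = np.tanh(W_p @ r + W_x @ h_N)              # (k,) input to the classifier
    return alpha, r, h_star

rng = np.random.default_rng(0)
k, L = 4, 6
Y = rng.normal(size=(k, L))
h_N = rng.normal(size=k)
W_y, W_h, W_p, W_x = (rng.normal(scale=0.5, size=(k, k)) for _ in range(4))
w = rng.normal(size=k)

alpha, r, h_star = attention(Y, h_N, W_y, W_h, w, W_p, W_x)
print(alpha.shape, r.shape, h_star.shape)  # (6,) (4,) (4,)
print(np.isclose(alpha.sum(), 1.0))        # True
```

Because `alpha` is stored as a one-dimensional array, `Y @ alpha` plays the role of $Y\alpha^T$ in the notes.
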
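
Word-by-word attention: their idea is that using only the last output $h_N$ as the query runs into the LSTM's bottleneck in remembering earlier inputs. They therefore iterate over every output $h_t$ of $LSTM_{hypothesis}$, compute attention for each one with the method above, and feed the result of the previous step (i.e. $r_{t-1}$) into each computation. The final representation $r_{L_h}$ is then processed in the same way as before.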

$$M_t = \tanh\left(W^y Y + (W^h h_t + W^r r_{t-1}) \otimes e_L\right)$$

$$\alpha_t = \mathrm{softmax}(w^T M_t)$$

$$r_t = Y \alpha_t^T + \tanh(W^t r_{t-1})$$

$$h^* = \tanh(W^p r_L + W^x h_N)$$
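
The recurrence over hypothesis positions can be sketched as a simple loop. This is a hedged illustration, not the paper's implementation: the function name `word_by_word_attention`, the random weights, and the toy sizes are invented for the example, and starting from $r_0 = 0$ is an assumption, since these notes do not say how $r_0$ is initialized.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

def word_by_word_attention(Y, H, W_y, W_h, W_r, W_t, w, W_p, W_x):
    """Y: k x L premise outputs (columns); H: N x k hypothesis outputs (rows)."""
    k, L = Y.shape
    e_L = np.ones(L)
    r = np.zeros(k)          # assumption: r_0 is the zero vector
    WyY = W_y @ Y            # does not depend on t, so compute it once
    alphas = []
    for h_t in H:
        # The query now mixes the current hypothesis output with the previous r.
        M_t = np.tanh(WyY + np.outer(W_h @ h_t + W_r @ r, e_L))
        alpha_t = softmax(w @ M_t)
        r = Y @ alpha_t + np.tanh(W_t @ r)
        alphas.append(alpha_t)
    # Final pair representation from the last r and the last hypothesis output.
    h_star = np.tanh(W_p @ r + W_x @ H[-1])
    return np.stack(alphas), r, h_star

rng = np.random.default_rng(1)
k, L, N = 4, 6, 3
Y = rng.normal(size=(k, L))
H = rng.normal(size=(N, k))
W_y, W_h, W_r, W_t, W_p, W_x = (rng.normal(scale=0.5, size=(k, k)) for _ in range(6))
w = rng.normal(size=k)

alphas, r_last, h_star = word_by_word_attention(Y, H, W_y, W_h, W_r, W_t, w, W_p, W_x)
print(alphas.shape)  # (3, 6): one attention distribution per hypothesis word
```

Each row of `alphas` is a distribution over premise positions for one hypothesis word, which is what makes this variant convenient to inspect word by word.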

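They also feed the premise and the hypothesis into the model in swapped order and combine the two final outputs for classification, which they call two-way attention. This did not improve performance. Their analysis is that entailment is an asymmetric relation, so encoding the hypothesis a second time with the same model may introduce noise (I do not fully understand this point yet).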
