This is an AAAI 2018 paper that applies GAN and reinforcement learning (RL) to photo-stream storytelling. Paper link: https://pdfs.semanticscholar.org/977b/eecdf0b5c3487d03738cff501c79770f0858.pdf. I have not yet found the authors' homepage or any released code. Title: Show, Reward and Tell: Automatic Generation of Narrative Paragraph from Photo Stream by Adversarial Training.
Personal take: two reasons for reading this paper
- The task is one of the more interesting cross-media tasks.
- The paper uses both GAN and reinforcement learning (RL).
What the paper does (photo-stream storytelling):
Input: a photo stream (several images). Output: a narrative paragraph.
An example from the paper is shown below.
![](https://img.laitimes.com/img/9ZDMuAjOiMmIsIjOiQnIsICM38CXlZHbvN3cpR2Lc1TPB10QGtWUCpEMJ9CXsxWam9CXwADNvwVZ6l2c052bm9CXUJDT1wkNhVzLcRnbvZ2LcNTQq5UdsdUZxolMMBjVtJWd0ckW65UbM5WOHJWa5kHT20ESjBjUIF2LcRHelR3LcJzLctmch1mclRXY39TNyMDNwIjMwETMyQDM4EDMy8CX0Vmbu4GZzNmLn9Gbi1yZtl2Lc9CX6MHc0RHaiojIsJye.jpg)
A comparison with state-of-the-art methods and the ground truth is shown below.
Method
The paper's framework is shown below.
A few key points in the paper:
Multi-modal Discriminator (sentence-level): pushes the generator toward sentences relevant to the image. It classifies the concatenation of an image feature with a sentence into three classes: paired, unpaired, or generated (discriminate concatenation).
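A minimal NumPy sketch of this input construction (not the authors' code; the feature dimensions and the single linear layer are illustrative assumptions — the paper's discriminator is a learned neural network):

```python
import numpy as np

rng = np.random.default_rng(0)

IMG_DIM, SENT_DIM, N_CLASSES = 4, 6, 3  # toy sizes; classes: paired/unpaired/generated

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def multimodal_discriminator(img_feat, sent_feat, W, b):
    """Classify one (image, sentence) pair as paired / unpaired / generated."""
    x = np.concatenate([img_feat, sent_feat])  # the "discriminate concatenation"
    return softmax(W @ x + b)

# Illustrative untrained parameters
W = rng.standard_normal((N_CLASSES, IMG_DIM + SENT_DIM))
b = np.zeros(N_CLASSES)

img = rng.standard_normal(IMG_DIM)
sent = rng.standard_normal(SENT_DIM)
probs = multimodal_discriminator(img, sent, W, b)
print(probs.shape)  # (3,) — a distribution over the three classes
```

The probability assigned to the "paired" class is what later serves as the sentence-level relevance signal.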
Language-style Discriminator (paragraph-level): pushes the generator toward human-level stories. It classifies paragraphs into three classes: ground-truth stories (gt), random combinations of ground-truth sentences (random), and generated narrative paragraphs (generated) (discriminate paragraph).
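A sketch of how the three paragraph classes could be assembled for training this discriminator (the function name, labels, and sampling scheme are my assumptions, not the paper's exact procedure):

```python
import random

def build_paragraph_classes(gt_stories, generated_paragraphs, seed=0):
    """Return (paragraph, label) pairs with labels gt / random / generated."""
    rng = random.Random(seed)
    examples = [(story, "gt") for story in gt_stories]
    # "random" negatives: sentences drawn across different ground-truth stories
    pool = [s for story in gt_stories for s in story]
    for story in gt_stories:
        examples.append((rng.sample(pool, len(story)), "random"))
    examples += [(p, "generated") for p in generated_paragraphs]
    return examples

examples = build_paragraph_classes([["s1", "s2"]], [["g1", "g2"]])
print([label for _, label in examples])  # ['gt', 'random', 'generated']
```

The "random" class forces the discriminator to attend to inter-sentence coherence, not just per-sentence fluency.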
Reward Function: rewards sentences that are relevant to the images and paragraphs that read like human-level stories; the two discriminators' outputs serve as rewards for reinforcement-learning (policy-gradient) updates of the generator.
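A toy sketch of combining the two signals into one reward (the linear combination and weight `lam` are my assumptions for illustration; the paper defines its own reward from the discriminator scores):

```python
def story_reward(p_paired, p_gt, lam=0.5):
    """Combine sentence-level relevance (p_paired, from the multi-modal
    discriminator) and paragraph-level style (p_gt, from the language-style
    discriminator) into a scalar reward for the generator."""
    return lam * p_paired + (1.0 - lam) * p_gt

r = story_reward(0.8, 0.6)  # high relevance, decent style -> reward near 0.7
```

This scalar would then weight the log-likelihood of the sampled sentence in a REINFORCE-style gradient estimate.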