This is an AAAI 2018 paper that applies GAN and reinforcement learning (RL) to photo-stream storytelling. Paper link: https://pdfs.semanticscholar.org/977b/eecdf0b5c3487d03738cff501c79770f0858.pdf. I have not yet found the authors' homepage or any released code. Title: Show, Reward and Tell: Automatic Generation of Narrative Paragraph from Photo Stream by Adversarial Training.
Personal take: two reasons for reading this paper
- The task is one of the more interesting cross-media tasks.
- The paper uses both GAN and reinforcement learning (RL).
What the paper does (photo-stream storytelling):
Input: a photo stream (several images). Output: a narrative paragraph.
An example from the paper is shown below.
![](https://img.laitimes.com/img/9ZDMuAjOiMmIsIjOiQnIsICM38CXlZHbvN3cpR2Lc1TPB10QGtWUCpEMJ9CXsxWam9CXwADNvwVZ6l2c052bm9CXUJDT1wkNhVzLcRnbvZ2LcNTQq5UdsdUZxolMMBjVtJWd0ckW65UbM5WOHJWa5kHT20ESjBjUIF2LcRHelR3LcJzLctmch1mclRXY39TNyMDNwIjMwETMyQDM4EDMy8CX0Vmbu4GZzNmLn9Gbi1yZtl2Lc9CX6MHc0RHaiojIsJye.jpg)
A comparison with state-of-the-art methods and the ground truth is shown below.
Method
The paper's framework is shown below.
A few key points in the paper:
Multi-modal Discriminator (sentence-level): pushes the generator toward sentences relevant to the image. It classifies the concatenation of an image feature with a sentence into three classes: paired, unpaired, or generated (discriminate concatenation).
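A minimal NumPy sketch of this input construction (not the authors' code; the feature dimensions and the single linear layer are illustrative assumptions — the paper's discriminator is a learned neural network):

```python
import numpy as np

rng = np.random.default_rng(0)

IMG_DIM, SENT_DIM, N_CLASSES = 4, 6, 3  # toy sizes; classes: paired/unpaired/generated

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def multimodal_discriminator(img_feat, sent_feat, W, b):
    """Classify one (image, sentence) pair as paired / unpaired / generated."""
    x = np.concatenate([img_feat, sent_feat])  # the "discriminate concatenation"
    return softmax(W @ x + b)

# Illustrative untrained parameters
W = rng.standard_normal((N_CLASSES, IMG_DIM + SENT_DIM))
b = np.zeros(N_CLASSES)

img = rng.standard_normal(IMG_DIM)
sent = rng.standard_normal(SENT_DIM)
probs = multimodal_discriminator(img, sent, W, b)
print(probs.shape)  # (3,) — a distribution over the three classes
```

The probability assigned to the "paired" class is what later serves as the sentence-level relevance signal.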
Language-style Discriminator (paragraph-level): pushes the generator toward human-level stories. It classifies paragraphs into three classes: ground-truth stories (gt), random combinations of ground-truth sentences (random), and generated narrative paragraphs (generated) (discriminate paragraph).
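A sketch of how the three paragraph classes could be assembled for training this discriminator (the function name, labels, and sampling scheme are my assumptions, not the paper's exact procedure):

```python
import random

def build_paragraph_classes(gt_stories, generated_paragraphs, seed=0):
    """Return (paragraph, label) pairs with labels gt / random / generated."""
    rng = random.Random(seed)
    examples = [(story, "gt") for story in gt_stories]
    # "random" negatives: sentences drawn across different ground-truth stories
    pool = [s for story in gt_stories for s in story]
    for story in gt_stories:
        examples.append((rng.sample(pool, len(story)), "random"))
    examples += [(p, "generated") for p in generated_paragraphs]
    return examples

examples = build_paragraph_classes([["s1", "s2"]], [["g1", "g2"]])
print([label for _, label in examples])  # ['gt', 'random', 'generated']
```

The "random" class forces the discriminator to attend to inter-sentence coherence, not just per-sentence fluency.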
Reward Function: rewards sentences that are relevant to the images and paragraphs that read like human-level stories; the two discriminators' outputs serve as rewards for reinforcement-learning (policy-gradient) updates of the generator.
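A toy sketch of combining the two signals into one reward (the linear combination and weight `lam` are my assumptions for illustration; the paper defines its own reward from the discriminator scores):

```python
def story_reward(p_paired, p_gt, lam=0.5):
    """Combine sentence-level relevance (p_paired, from the multi-modal
    discriminator) and paragraph-level style (p_gt, from the language-style
    discriminator) into a scalar reward for the generator."""
    return lam * p_paired + (1.0 - lam) * p_gt

r = story_reward(0.8, 0.6)  # high relevance, decent style -> reward near 0.7
```

This scalar would then weight the log-likelihood of the sampled sentence in a REINFORCE-style gradient estimate.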