Learning Real-World Robot Policies by Dreaming: A Quick Paper Read

2023-03-08 07:27:42


Contents

- Preface:
- title: Learning Real World Robot Policies by Dreaming
- Main Idea
- How it differs from model-based RL, in the authors' own words:
- Information flow diagram
- Experimental settings:
- Results:
- Contact:

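Preface:

It has been a long time since I read a paper closely. I recently got curious about a new area and collected a dozen or so papers on it.

Reading all of them carefully would take more time than I have, so this is a quick read.

Let's see whether two hours is enough to get through one paper I find interesting.

The template is borrowed directly from Liu Jiajun (劉嘉俊).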

title: Learning Real World Robot Policies by Dreaming

Paper: http://arxiv.org/abs/1805.07813

Website: https://piergiaj.github.io/robot-dreaming-policy/

Keywords

data efficiency, real-world, dreaming model(world model)

Main Idea

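The authors design a dreaming model and let the robot interact with it instead of interacting directly with the real world.

It can handle previously unseen scenes, which is the interesting part.

Task scenarios:

Task 1: navigate to the target.

Task 2: avoid the target.

The whole scene is only a meter or two across, and getting within 0.2 m of the target counts as success, so the task is rather…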
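The success criterion for the navigation task can be written down in a few lines. This is only a sketch: the 0.2 m threshold comes from the text above, while the 2-D coordinates, the Euclidean distance, and the function name `reached_target` are assumptions made for illustration, not the paper's evaluation code.

```python
import math

SUCCESS_RADIUS_M = 0.2  # threshold mentioned in the text above


def reached_target(robot_xy, target_xy, radius=SUCCESS_RADIUS_M):
    """Return True if the robot is within `radius` meters of the target.

    Assumes planar (x, y) positions in meters; this is an illustrative
    helper, not the paper's evaluation code.
    """
    dx = robot_xy[0] - target_xy[0]
    dy = robot_xy[1] - target_xy[1]
    return math.hypot(dx, dy) <= radius


# A robot about 0.14 m from the target counts as a success; one 1 m away does not.
near = reached_target((0.0, 0.0), (0.1, 0.1))
far = reached_target((0.0, 0.0), (1.0, 0.0))
```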

Pre-training:

we collect a dataset consisting of 40,000 images (400 random trajectories)

Training:

… except initial random action policy samples in all our experiments

How it differs from model-based RL, in the authors' own words:

We use “dreaming” to refer to far more than just model-based RL. What our “dreaming” model does is learns a state-transition model that we can randomly sample previously unseen trajectories from (i.e. what we call dreaming).

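The dreaming model consists of an FCNN, a VAE, and an action-conditioned future regressor (ACFR).

ACFR: it simulates how the state changes after the robot executes an action. This means that, compared with earlier model-based methods, the dreaming model replaces real trajectories with imagined trajectories, which is why the authors use the word 'dreaming' rather than 'model-based'. See the debate on Reddit for details.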
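To make "sampling imagined trajectories" concrete, here is a toy sketch of a dreaming rollout. Everything in it is a stand-in: `G`, `F`, `ACTION_CODES`, and the 2-D state are hypothetical placeholders for the learned action encoder, the learned future regressor, and the VAE latent state. The point is only the control flow: starting from one state, random actions are fed through the transition model and no real environment is queried.

```python
import random

# Hypothetical action codes: a made-up mapping from action id to a 2-D vector.
ACTION_CODES = {0: (0.1, 0.0), 1: (0.0, 0.1), 2: (-0.1, 0.0)}


def G(action):
    """Stand-in for the learned action encoder."""
    return ACTION_CODES[action]


def F(state, action_code):
    """Stand-in for the learned future regressor: next state from state and action code."""
    return tuple(s + a for s, a in zip(state, action_code))


def dream_trajectory(initial_state, horizon, rng):
    """Roll the transition model forward with random actions (no real-world steps)."""
    states = [initial_state]
    actions = []
    for _ in range(horizon):
        action = rng.choice(sorted(ACTION_CODES))
        actions.append(action)
        states.append(F(states[-1], G(action)))
    return states, actions


rng = random.Random(0)
states, actions = dream_trajectory((0.0, 0.0), horizon=5, rng=rng)
```

A policy learner would then consume `states` and `actions` exactly as it would consume a real rollout.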

Below is a visualization of the imagined trajectories generated by dreaming:

[Figure: imagined trajectories generated by the dreaming model]

It is really awesome, isn’t it?

Information flow diagram

Now let's look at how this marvelous dreaming is actually implemented.

[Figure: information flow diagram of the dreaming model]

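The state images are encoded with a VAE rather than a plain autoencoder, which gives the model some generative ability and lets it handle unseen scenes. The drawback is that the generated images are quite blurry.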
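What separates a VAE from a plain autoencoder is that the encoder outputs a distribution rather than a point, and a KL term pulls that distribution toward a standard normal. The sketch below shows those two pieces in plain Python under the usual diagonal-Gaussian assumption. The function names and list-based vectors are illustrative, and the paper's actual encoder is a convolutional network that is not reproduced here.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, 1) ) summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


rng = random.Random(0)
z = sample_latent([0.0, 1.0], [0.0, -1.0], rng)   # one latent sample
kl_zero = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])
kl_shifted = kl_to_standard_normal([1.0], [0.0])
```

Because new latents can be sampled near the encoded ones, the decoder can produce plausible states it never saw exactly, which is the generative ability mentioned above.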


Opinion

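I have long thought that generative networks such as VAEs and GANs can be used to improve data efficiency in RL, and this paper is indeed moving in that direction. In practice, though, GANs take a long time to train and consume a lot of resources, so whether they help or hurt RL depends on the specific use case.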

Encode the state images with a VAE.
Build a state-transition model that takes $s_t, a_t$ as input and outputs $s_{t+1}$, making it action-conditioned: $s_{t+1} = f(s_t, a_t) = F(s_t, G(a_t))$
Total loss: $L = L_{VAE} + \gamma L_f$
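The combined objective $L = L_{VAE} + \gamma L_f$ can be sketched as follows. Two things here are assumptions rather than details taken from the paper: the future-regression term $L_f$ is written as a mean squared error between the predicted and the encoded next state, and the value `gamma=0.5` is arbitrary. The VAE loss is passed in as a precomputed number.

```python
def mse(predicted, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def total_loss(vae_loss, predicted_next_state, encoded_next_state, gamma=0.5):
    """L = L_VAE + gamma * L_f, with L_f assumed to be an MSE on the next state."""
    future_loss = mse(predicted_next_state, encoded_next_state)
    return vae_loss + gamma * future_loss


# predicted_next_state plays the role of F(s_t, G(a_t));
# encoded_next_state plays the role of the VAE encoding of the real next frame.
loss = total_loss(1.0, [1.0, 2.0], [1.0, 4.0], gamma=0.5)
```

The weight $\gamma$ trades off reconstruction quality against prediction accuracy: a larger value pushes the latent space toward being easy to predict forward in time.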

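Experimental settings:

Not much to say here. The idea is decent, but the results are not compelling enough for me.

Results:

I don't even feel like pasting the figures.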

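Contact:

PS: anyone working on reinforcement learning is welcome to join the group and study together:

Deep Reinforcement Learning (DRL) group: 799378128

Follow my Zhihu account: 未入門的煉丹學徒

CSDN account: https://blog.csdn.net/hehedadaq

Minimal spinup + HER + PER code implementation: https://github.com/kaixindelele/DRLib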
