
Robotics-related academic digest [9.10]


cs.RO (Robotics): 26 papers in total

【1】 Leveraging Local Domains for Image-to-Image Translation

Link: https://arxiv.org/abs/2109.04468

Authors: Anthony Dell'Eva, Fabio Pizzati, Massimo Bertozzi, Raoul de Charette

Affiliations: VisLab, Parma, Italy; Inria, Paris, France; University of Parma, Parma, Italy

Note: Submitted to conference

Abstract: Image-to-image (i2i) networks struggle to capture local changes because they do not affect the global scene structure. For example, translating from highway scenes to offroad, i2i networks easily focus on global color features but ignore obvious traits for humans like the absence of lane markings. In this paper, we leverage human knowledge about spatial domain characteristics which we refer to as 'local domains' and demonstrate its benefit for image-to-image translation. Relying on a simple geometrical guidance, we train a patch-based GAN on few source data and hallucinate a new unseen domain which subsequently eases transfer learning to target. We experiment on three tasks ranging from unstructured environments to adverse weather. Our comprehensive evaluation setting shows we are able to generate realistic translations, with minimal priors, and training only on a few images. Furthermore, when trained on our translated images, we show that all tested proxy tasks are significantly improved, without ever seeing the target domain at training.

【2】 NEAT: Neural Attention Fields for End-to-End Autonomous Driving

Link: https://arxiv.org/abs/2109.04456

Authors: Kashyap Chitta, Aditya Prakash, Andreas Geiger

Affiliations: Max Planck Institute for Intelligent Systems, Tübingen; University of Tübingen

Note: ICCV 2021

Abstract: Efficient reasoning about the semantic, spatial, and temporal structure of a scene is a crucial prerequisite for autonomous driving. We present NEural ATtention fields (NEAT), a novel representation that enables such reasoning for end-to-end imitation learning models. NEAT is a continuous function which maps locations in Bird's Eye View (BEV) scene coordinates to waypoints and semantics, using intermediate attention maps to iteratively compress high-dimensional 2D image features into a compact representation. This allows our model to selectively attend to relevant regions in the input while ignoring information irrelevant to the driving task, effectively associating the images with the BEV representation. In a new evaluation setting involving adverse environmental conditions and challenging scenarios, NEAT outperforms several strong baselines and achieves driving scores on par with the privileged CARLA expert used to generate its training data. Furthermore, visualizing the attention maps for models with NEAT intermediate representations provides improved interpretability.

【3】 Mini Cheetah, the Falling Cat: A Case Study in Machine Learning and Trajectory Optimization for Robot Acrobatics

Link: https://arxiv.org/abs/2109.04424

Authors: Vince Kurtz, He Li, Patrick M. Wensing, Hai Lin

Affiliations: University of Notre Dame

Abstract: Seemingly in defiance of basic physics, cats consistently land on their feet after falling. In this paper, we design a controller that lands the Mini Cheetah quadruped robot on its feet as well. Specifically, we explore how trajectory optimization and machine learning can work together to enable highly dynamic bioinspired behaviors. We find that a reflex approach, in which a neural network learns entire state trajectories, outperforms a policy approach, in which a neural network learns a mapping from states to control inputs. We validate our proposed controller in both simulation and hardware experiments, and are able to land the robot on its feet from falls with initial pitch angles between -90 and 90 degrees.

【4】 Dynamic Modeling of Hand-Object Interactions via Tactile Sensing

Link: https://arxiv.org/abs/2109.04378

Authors: Qiang Zhang, Yunzhu Li, Yiyue Luo, Wan Shou, Michael Foshey, Junchi Yan, Joshua B. Tenenbaum, Wojciech Matusik, Antonio Torralba

Note: IROS 2021. First two authors contributed equally. Project page: this http URL

Abstract: Tactile sensing is critical for humans to perform everyday tasks. While significant progress has been made in analyzing object grasping from vision, it remains unclear how we can utilize tactile sensing to reason about and model the dynamics of hand-object interactions. In this work, we employ a high-resolution tactile glove to perform four different interactive activities on a diversified set of objects. We build our model on a cross-modal learning framework and generate the labels using a visual processing pipeline to supervise the tactile model, which can then be used on its own at test time. The tactile model aims to predict the 3d locations of both the hand and the object purely from the touch data by combining a predictive model and a contrastive learning module. This framework can reason about the interaction patterns from the tactile data, hallucinate the changes in the environment, estimate the uncertainty of the prediction, and generalize to unseen objects. We also provide detailed ablation studies regarding different system designs as well as visualizations of the predicted trajectories. This work takes a step on dynamics modeling in hand-object interactions from dense tactile sensing, which opens the door for future applications in activity learning, human-computer interactions, and imitation learning for robotics.

【5】 Learning Vision-Guided Dynamic Locomotion Over Challenging Terrains

Link: https://arxiv.org/abs/2109.04322

Authors: Zhaocheng Liu, Fernando Acero, Zhibin Li

Affiliations: Institute of Perception, Action and Behaviour, School of Informatics, University of Edinburgh

Note: 9 pages, 27 figures, 1 table

Abstract: Legged robots are becoming increasingly powerful and popular in recent years for their potential to bring the mobility of autonomous agents to the next level. This work presents a deep reinforcement learning approach that learns a robust Lidar-based perceptual locomotion policy in a partially observable environment using Proximal Policy Optimisation. Visual perception is critical to actively overcome challenging terrains, and to do so, we propose a novel learning strategy: Dynamic Reward Strategy (DRS), which serves as effective heuristics to learn a versatile gait using a neural network architecture without the need to access the history data. Moreover, in a modified version of the OpenAI gym environment, the proposed approach is evaluated and achieves over a 90% success rate in all tested challenging terrains.

【6】 OPIRL: Sample Efficient Off-Policy Inverse Reinforcement Learning via Distribution Matching

Link: https://arxiv.org/abs/2109.04307

Authors: Hana Hoshino, Kei Ota, Asako Kanezaki, Rio Yokota

Affiliations: School of Computing, Department of Computer Science, Tokyo Institute of Technology

Note: Under submission

Abstract: Inverse Reinforcement Learning (IRL) is attractive in scenarios where reward engineering can be tedious. However, prior IRL algorithms use on-policy transitions, which require intensive sampling from the current policy for stable and optimal performance. This limits IRL applications in the real world, where environment interactions can become highly expensive. To tackle this problem, we present Off-Policy Inverse Reinforcement Learning (OPIRL), which (1) adopts off-policy data distribution instead of on-policy and enables significant reduction of the number of interactions with the environment, (2) learns a stationary reward function that is transferable with high generalization capabilities on changing dynamics, and (3) leverages mode-covering behavior for faster convergence. We demonstrate that our method is considerably more sample efficient and generalizes to novel environments through the experiments. Our method achieves better or comparable results on policy performance baselines with significantly fewer interactions. Furthermore, we empirically show that the recovered reward function generalizes to different tasks where prior arts are prone to fail.

【7】 Energy-Efficient Mobile Robot Control via Run-time Monitoring of Environmental Complexity and Computing Workload

Link: https://arxiv.org/abs/2109.04285

Authors: Sherif A. S. Mohamed, Mohammad-Hashem Haghbayan, Antonio Miele, Onur Mutlu, Juha Plosila

Affiliations: University of Turku

Note: Accepted for publication at the 2021 International Conference on Intelligent Robots and Systems (IROS)

Abstract: We propose an energy-efficient controller to minimize the energy consumption of a mobile robot by dynamically manipulating the mechanical and computational actuators of the robot. The mobile robot performs real-time vision-based applications based on an event-based camera. The actuators of the controller are CPU voltage/frequency for the computation part and motor voltage for the mechanical part. We show that independently considering speed control of the robot and voltage/frequency control of the CPU does not necessarily result in an energy-efficient solution. In fact, to obtain the highest efficiency, the computation and mechanical parts should be controlled together in synergy. We propose a fast hill-climbing optimization algorithm to allow the controller to find the best CPU/motor configuration at run-time and whenever the mobile robot is facing a new environment during its travel. Experimental results on a robot with Brushless DC Motors, Jetson TX2 board as the computing unit, and a DAVIS-346 event-based camera show that the proposed control algorithm can save battery energy by an average of 50.5%, 41%, and 30%, in low-complexity, medium-complexity, and high-complexity environments, over baselines.
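
To make the run-time search concrete, here is a minimal greedy hill-climbing sketch in Python over a discrete grid of CPU-frequency/motor-voltage configurations. The cost callback energy_per_task and the 4-neighbourhood are hypothetical stand-ins, not the paper's controller.

def hill_climb(cpu_freqs, motor_volts, energy_per_task, start):
    # Greedy local search over (cpu_freq index, motor_volt index) pairs.
    # energy_per_task((i, j)) is assumed to return the measured energy (J)
    # per perception task at cpu_freqs[i] and motor_volts[j], returning
    # float('inf') for configurations that miss the task deadline.
    current = start
    best = energy_per_task(current)
    improved = True
    while improved:
        improved = False
        i, j = current
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # grid neighbours
            ni, nj = i + di, j + dj
            if 0 <= ni < len(cpu_freqs) and 0 <= nj < len(motor_volts):
                cost = energy_per_task((ni, nj))
                if cost < best:
                    best, current, improved = cost, (ni, nj), True
    return cpu_freqs[current[0]], motor_volts[current[1]], best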

【8】 Solving Simultaneous Target Assignment and Path Planning Efficiently with Time-Independent Execution

Link: https://arxiv.org/abs/2109.04264

Authors: Keisuke Okumura, Xavier Défago

Affiliations: School of Computing, Tokyo Institute of Technology, Tokyo, Japan

Note: 19 pages, preprint

Abstract: Real-time planning for a combined problem of target assignment and path planning for multiple agents, also known as the unlabeled version of Multi-Agent Path Finding (MAPF), is crucial for high-level coordination in multi-agent systems, e.g., pattern formation by robot swarms. This paper studies two aspects of unlabeled-MAPF: (1) offline scenario: solving large instances by centralized approaches with small computation time, and (2) online scenario: executing unlabeled-MAPF despite timing uncertainties of real robots. For this purpose, we propose TSWAP, a novel complete algorithm consisting of target assignment with lazy evaluation and path planning with target swapping. TSWAP can adapt to both offline and online scenarios. We empirically demonstrate that Offline TSWAP is highly scalable, providing near-optimal solutions while reducing runtime by orders of magnitude compared to existing approaches. In addition, we present the benefits of Online TSWAP, such as delay tolerance, through real-robot demos.
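
As a rough illustration of the target-swapping idea (not the paper's TSWAP rule, which swaps targets during execution when agents block one another), the Python sketch below greedily assigns targets and then swaps a pair whenever the swap shrinks the larger of the two agent-to-target distances.

import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def greedy_assign(agents, targets):
    # Assign each agent the nearest unclaimed target (fast, not optimal).
    remaining = list(range(len(targets)))
    assignment = {}
    for i, a in enumerate(agents):
        j = min(remaining, key=lambda t: dist(a, targets[t]))
        assignment[i] = j
        remaining.remove(j)
    return assignment

def swap_improve(agents, targets, assignment):
    # Swap two agents' targets whenever that lowers the larger of their
    # two distances; repeat until no swap helps (this always terminates).
    changed = True
    while changed:
        changed = False
        for i in assignment:
            for k in assignment:
                if i >= k:
                    continue
                ti, tk = assignment[i], assignment[k]
                before = max(dist(agents[i], targets[ti]), dist(agents[k], targets[tk]))
                after = max(dist(agents[i], targets[tk]), dist(agents[k], targets[ti]))
                if after < before:
                    assignment[i], assignment[k] = tk, ti
                    changed = True
    return assignment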

【9】 Learning Forceful Manipulation Skills from Multi-modal Human Demonstrations

Link: https://arxiv.org/abs/2109.04222

Authors: An T. Le, Meng Guo, Niels van Duijkeren, Leonel Rozo, Robert Krug, Andras G. Kupcsik, Mathias Buerger

Affiliations: University of Stuttgart; Bosch Center for Artificial Intelligence (BCAI)

Abstract: Learning from Demonstration (LfD) provides an intuitive and fast approach to program robotic manipulators. Task parameterized representations allow easy adaptation to new scenes and online observations. However, this approach has been limited to pose-only demonstrations and thus only skills with spatial and temporal features. In this work, we extend the LfD framework to address forceful manipulation skills, which are of great importance for industrial processes such as assembly. For such skills, multi-modal demonstrations including robot end-effector poses, force and torque readings, and operation scene are essential. Our objective is to reproduce such skills reliably according to the demonstrated pose and force profiles within different scenes. The proposed method combines our previous work on task-parameterized optimization and attractor-based impedance control. The learned skill model consists of (i) the attractor model that unifies the pose and force features, and (ii) the stiffness model that optimizes the stiffness for different stages of the skill. Furthermore, an online execution algorithm is proposed to adapt the skill execution to real-time observations of robot poses, measured forces, and changed scenes. We validate this method rigorously on a 7-DoF robot arm over several steps of an E-bike motor assembly process, which require different types of forceful interaction such as insertion, sliding and twisting.
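
For context, attractor-based impedance control pulls the end-effector toward an attractor pose with a tunable stiffness. A minimal Cartesian sketch in Python follows; the attractor, gains and feed-forward force are placeholders for what the learned skill model would supply (orientation and torque features are omitted).

import numpy as np

def impedance_wrench(x, xd, x_attr, K, D, f_feedforward):
    # x, xd: current end-effector position and velocity (3-vectors)
    # x_attr: attractor position taken from the skill model
    # K, D: 3x3 stiffness and damping matrices (stage-dependent in the paper)
    # f_feedforward: demonstrated force profile for the current skill phase
    return K @ (x_attr - x) - D @ xd + f_feedforward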

【10】 Performance, Precision, and Payloads: Adaptive Nonlinear MPC for Quadrotors

Link: https://arxiv.org/abs/2109.04210

Authors: Drew Hanover, Philipp Foehn, Sihao Sun, Elia Kaufmann, Davide Scaramuzza

Affiliations: University of Zurich and ETH Zurich

Note: 8 pages, 5 figures, submitted to RAL + ICRA 22

Abstract: Agile quadrotor flight in challenging environments has the potential to revolutionize shipping, transportation, and search and rescue applications. Nonlinear model predictive control (NMPC) has recently shown promising results for agile quadrotor control, but relies on highly accurate models for maximum performance. Hence, model uncertainties in the form of unmodeled complex aerodynamic effects, varying payloads and parameter mismatch will degrade overall system performance. In this paper, we propose L1-NMPC, a novel hybrid adaptive NMPC to learn model uncertainties online and immediately compensate for them, drastically improving performance over the non-adaptive baseline with minimal computational overhead. Our proposed architecture generalizes to many different environments from which we evaluate wind, unknown payloads, and highly agile flight conditions. The proposed method demonstrates immense flexibility and robustness, with more than 90% tracking error reduction over non-adaptive NMPC under large unknown disturbances and without any gain tuning. In addition, the same controller with identical gains can accurately fly highly agile racing trajectories exhibiting top speeds of 70 km/h, offering tracking performance improvements of around 50% relative to the non-adaptive NMPC baseline. We will release our code fully open-sourced upon acceptance.
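
The adaptive augmentation can be pictured as estimating a lumped disturbance online and handing it to the NMPC as a model offset. The sketch below is a plain low-pass disturbance observer, not the paper's L1 formulation (which uses a state predictor with piecewise-constant adaptation); it is only meant to convey the estimate-and-compensate loop.

import numpy as np

class DisturbanceEstimator:
    def __init__(self, dim=6, cutoff_hz=5.0, dt=0.002):
        # First-order low-pass filter coefficient for the given sample time.
        self.alpha = dt / (dt + 1.0 / (2.0 * np.pi * cutoff_hz))
        self.d_hat = np.zeros(dim)

    def update(self, accel_measured, accel_model, inertia):
        # Unmodeled wrench = inertia * (measured - model-predicted acceleration),
        # low-pass filtered and returned as an additive offset for the MPC model.
        raw = inertia @ (accel_measured - accel_model)
        self.d_hat = (1.0 - self.alpha) * self.d_hat + self.alpha * raw
        return self.d_hat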

【11】 DAN: Decentralized Attention-based Neural Network to Solve the MinMax Multiple Traveling Salesman Problem

Link: https://arxiv.org/abs/2109.04205

Authors: Yuhong Cao, Zhanhong Sun, Guillaume Sartoretti

Note: Submitted to IEEE Robotics and Automation Letters (RA-L) on September 9, 2021

Abstract: The multiple traveling salesman problem (mTSP) is a well-known NP-hard problem with numerous real-world applications. In particular, this work addresses MinMax mTSP, where the objective is to minimize the max tour length (sum of Euclidean distances) among all agents. The mTSP is normally considered as a combinatorial optimization problem, but due to its computational complexity, search-based exact and heuristic algorithms become inefficient as the number of cities increases. Encouraged by the recent developments in deep reinforcement learning (dRL), this work considers the mTSP as a cooperative task and introduces a decentralized attention-based neural network method to solve the MinMax mTSP, named DAN. In DAN, agents learn fully decentralized policies to collaboratively construct a tour, by predicting the future decisions of other agents. Our model relies on the Transformer architecture, and is trained using multi-agent RL with parameter sharing, which provides natural scalability to the numbers of agents and cities. We experimentally demonstrate our model on small- to large-scale mTSP instances, which involve 50 to 1000 cities and 5 to 20 agents, and compare against state-of-the-art baselines. For small-scale problems (fewer than 100 cities), DAN is able to closely match the performance of the best solver available (OR Tools, a meta-heuristic solver) given the same computation time budget. In larger-scale instances, DAN outperforms both conventional and dRL-based solvers, while keeping computation times low, and exhibits enhanced collaboration among agents.
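
The MinMax objective itself is easy to state: assuming a shared depot and closed tours, it is the length of the longest tour among all agents, as in the short Python sketch below.

import math

def tour_length(tour, depot):
    # Euclidean length of one agent's tour: depot -> cities -> depot.
    pts = [depot] + list(tour) + [depot]
    return sum(math.dist(pts[i], pts[i + 1]) for i in range(len(pts) - 1))

def minmax_cost(tours, depot):
    # MinMax mTSP objective: the longest tour among all agents.
    return max(tour_length(t, depot) for t in tours)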

【12】 Comfort and Sickness while Virtually Aboard an Autonomous Telepresence Robot

Link: https://arxiv.org/abs/2109.04177

Authors: Markku Suomalainen, Katherine J. Mimnaugh, Israel Becerra, Eliezer Lozano, Rafael Murrieta-Cid, Steven M. LaValle

Affiliations: Center for Ubiquitous Computing, University of Oulu, Oulu, Finland; Centro de Investigacion en Matematicas (CIMAT), Guanajuato, Mexico

Note: Accepted for publication in EuroXR 2021

Abstract: In this paper, we analyze how different path aspects affect a user's experience, mainly VR sickness and overall comfort, while immersed in an autonomously moving telepresence robot through a virtual reality headset. In particular, we focus on how the robot turns and the distance it keeps from objects, with the goal of planning suitable trajectories for an autonomously moving immersive telepresence robot in mind; rotational acceleration is known for causing the majority of VR sickness, and distance to objects modulates the optical flow. We ran a within-subjects user study (n = 36, women = 18) in which the participants watched three panoramic videos recorded in a virtual museum while aboard an autonomously moving telepresence robot taking three different paths varying in aspects such as turns, speeds, or distances to walls and objects. We found a moderate correlation between the users' sickness as measured by the SSQ and comfort on a 6-point Likert scale across all paths. However, we detected no association between sickness and the choice of the most comfortable path, showing that sickness is not the only factor affecting the comfort of the user. The subjective experience of turn speed did not correlate with either the SSQ scores or comfort, even though people often mentioned turning speed as a source of discomfort in the open-ended questions. Through exploring the open-ended answers more carefully, a possible reason is that the length and lack of predictability also play a large role in making people observe turns as uncomfortable. A larger subjective distance from walls and objects increased comfort and decreased sickness both in quantitative and qualitative data. Finally, the SSQ subscales and total weighted scores showed differences by age group and by gender.

【13】 Safe, Deterministic Trajectory Planning for Unstructured and Partially Occluded Environments

Link: https://arxiv.org/abs/2109.04175

Authors: Sebastian vom Dorff, Maximilian Kneissl, Martin Fränzle

Abstract: Ensuring safe behavior for automated vehicles in unregulated traffic areas poses a complex challenge for the industry. It is an open problem to provide scalable and certifiable solutions to this challenge. We derive a trajectory planner based on model predictive control which interoperates with a monitoring system for pedestrian safety based on cellular automata. The combined planner-monitor system is demonstrated on the example of a narrow indoor parking environment. The system features deterministic behavior, mitigating the immanent risk of black boxes and offering full certifiability. By using fundamental and conservative prediction models of pedestrians the monitor is able to determine a safe drivable area in the partially occluded and unstructured parking environment. The information is fed to the trajectory planner which ensures the vehicle remains in the safe drivable area at any time through constrained optimization. We show how the approach enables solving plenty of situations in tight parking garage scenarios. Even though conservative prediction models are applied, evaluations indicate a performant system for the tested low-speed navigation.

【14】 Self-supervised Reinforcement Learning with Independently Controllable Subgoals

Link: https://arxiv.org/abs/2109.04150

Authors: Andrii Zadaianchuk, Georg Martius, Fanny Yang

Affiliations: Max Planck Institute for Intelligent Systems, Tübingen, Germany; Department of Computer Science, ETH Zurich

Abstract: To successfully tackle challenging manipulation tasks, autonomous agents must learn a diverse set of skills and how to combine them. Recently, self-supervised agents that set their own abstract goals by exploiting the discovered structure in the environment were shown to perform well on many different tasks. In particular, some of them were applied to learn basic manipulation skills in compositional multi-object environments. However, these methods learn skills without taking the dependencies between objects into account. Thus, the learned skills are difficult to combine in realistic environments. We propose a novel self-supervised agent that estimates relations between environment components and uses them to independently control different parts of the environment state. In addition, the estimated relations between objects can be used to decompose a complex goal into a compatible sequence of subgoals. We show that, by using this framework, an agent can efficiently and automatically learn manipulation tasks in multi-object environments with different relations between objects.

【15】 Robot Localization and Navigation through Predictive Processing using LiDAR

Link: https://arxiv.org/abs/2109.04139

Authors: Daniel Burghardt, Pablo Lanillos

Affiliations: Radboud University, NL; Donders Institute for Brain, Department of Artificial Intelligence

Note: 2nd International Workshop on Active Inference IWAI2021, European Conference on Machine Learning (ECML/PKDD 2021)

Abstract: Knowing the position of the robot in the world is crucial for navigation. Nowadays, Bayesian filters, such as Kalman and particle-based, are standard approaches in mobile robotics. Recently, end-to-end learning has allowed for scaling-up to high-dimensional inputs and improved generalization. However, there are still limitations to providing reliable laser navigation. Here we show a proof-of-concept of the predictive processing-inspired approach to perception applied for localization and navigation using laser sensors, without the need for odometry. We learn the generative model of the laser through self-supervised learning and perform both online state-estimation and navigation through stochastic gradient descent on the variational free-energy bound. We evaluated the algorithm on a mobile robot (TIAGo Base) with a laser sensor (SICK) in Gazebo. Results showed improved state-estimation performance when comparing to a state-of-the-art particle filter in the absence of odometry. Furthermore, in contrast to standard Bayesian estimation approaches, our method also enables the robot to navigate when providing the desired goal by inferring the actions that minimize the prediction error.
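
The core mechanism, estimating the state by descending a prediction-error objective, can be sketched as below. This is a simplified stand-in for the variational free-energy bound: the generative model g and its Jacobian are assumed given (in the paper, g is learned by self-supervision), and precision weighting is omitted.

import numpy as np

def estimate_state(y, g, jac_g, x0, lr=0.05, iters=50):
    # y: observed laser scan; g(x): predicted scan for state x.
    # Gradient descent on E(x) = 0.5 * ||y - g(x)||^2.
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        err = y - g(x)              # prediction error
        x += lr * jac_g(x).T @ err  # move the state to reduce the error
    return x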

【16】 Learning Cross-Scale Visual Representations for Real-Time Image Geo-Localization

Link: https://arxiv.org/abs/2109.04087

Authors: Tianyi Zhang, Matthew Johnson-Roberson

Affiliations: T. Zhang is with the Robotics Institute, University of Michigan; M. Johnson-Roberson is with the Department of Naval Architecture and Marine Engineering

Abstract: Robot localization remains a challenging task in GPS denied environments. State estimation approaches based on local sensors, e.g. cameras or IMUs, are prone to drift for long-range missions as error accumulates. In this study, we aim to address this problem by localizing image observations in a 2D multi-modal geospatial map. We introduce the cross-scale dataset and a methodology to produce additional data from cross-modality sources. We propose a framework that learns cross-scale visual representations without supervision. Experiments are conducted on data from two different domains, underwater and aerial. In contrast to existing studies in cross-view image geo-localization, our approach a) performs better on smaller-scale multi-modal maps; b) is more computationally efficient for real-time applications; c) can serve directly in concert with state estimation pipelines.

【17】 Risk-Averse Decision Making Under Uncertainty

Link: https://arxiv.org/abs/2109.04082

Authors: Mohamadreza Ahmadi, Ugo Rosolia, Michel D. Ingham, Richard M. Murray, Aaron D. Ames

Affiliations: Control and Dynamical Systems (CDS), California Institute of Technology

Note: arXiv admin note: substantial text overlap with arXiv:2012.02423

Abstract: A large class of decision making under uncertainty problems can be described via Markov decision processes (MDPs) or partially observable MDPs (POMDPs), with application to artificial intelligence and operations research, among others. Traditionally, policy synthesis techniques are proposed such that a total expected cost or reward is minimized or maximized. However, optimality in the total expected cost sense is only reasonable if system behavior in the large number of runs is of interest, which has limited the use of such policies in practical mission-critical scenarios, wherein large deviations from the expected behavior may lead to mission failure. In this paper, we consider the problem of designing policies for MDPs and POMDPs with objectives and constraints in terms of dynamic coherent risk measures, which we refer to as the constrained risk-averse problem. For MDPs, we reformulate the problem into an inf-sup problem via the Lagrangian framework and propose an optimization-based method to synthesize Markovian policies. For MDPs, we demonstrate that the formulated optimization problems are in the form of difference convex programs (DCPs) and can be solved by the disciplined convex-concave programming (DCCP) framework. We show that these results generalize linear programs for constrained MDPs with total discounted expected costs and constraints. For POMDPs, we show that, if the coherent risk measures can be defined as a Markov risk transition mapping, an infinite-dimensional optimization can be used to design Markovian belief-based policies. For stochastic finite-state controllers (FSCs), we show that the latter optimization simplifies to a (finite-dimensional) DCP and can be solved by the DCCP framework. We incorporate these DCPs in a policy iteration algorithm to design risk-averse FSCs for POMDPs.
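
As a concrete example, conditional value-at-risk (CVaR) is a standard coherent risk measure: the expected cost over the worst alpha-fraction of outcomes. The paper works with dynamic/Markov coherent risk measures, but a static sample estimate (Python sketch below) already shows how a risk-averse objective differs from the mean.

import numpy as np

def cvar(costs, alpha=0.1):
    # Mean of the worst alpha-fraction of sampled costs.
    costs = np.sort(np.asarray(costs, dtype=float))
    threshold = np.quantile(costs, 1.0 - alpha)  # value-at-risk level
    return costs[costs >= threshold].mean()

# A risk-neutral planner scores a policy by costs.mean(); a risk-averse one
# penalises cvar(costs, alpha), the average of the worst 10% of rollouts.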

【18】 Keeping an Eye on Things: Deep Learned Features for Long-Term Visual Localization

Link: https://arxiv.org/abs/2109.04041

Authors: Mona Gridseth, Timothy D. Barfoot

Affiliations: University of Toronto Institute for Aerospace Studies (UTIAS)

Abstract: In this paper, we learn visual features that we use to first build a map and then localize a robot driving autonomously across a full day of lighting change, including in the dark. We train a neural network to predict sparse keypoints with associated descriptors and scores that can be used together with a classical pose estimator for localization. Our training pipeline includes a differentiable pose estimator such that training can be supervised with ground truth poses from data collected earlier, in our case from 2016 and 2017 gathered with multi-experience Visual Teach and Repeat (VT&R). We then insert the learned features into the existing VT&R pipeline to perform closed-loop path-following in unstructured outdoor environments. We show successful path following across all lighting conditions despite the robot's map being constructed using daylight conditions. Moreover, we explore generalizability of the features by driving the robot across all lighting conditions in two new areas not present in the feature training dataset. In all, we validated our approach with 30 km of autonomous path-following experiments in challenging conditions.

【19】 Taxim: An Example-based Simulation Model for GelSight Tactile Sensors

Link: https://arxiv.org/abs/2109.04027

Authors: Zilin Si, Wenzhen Yuan

Affiliations: Robotics Institute, Carnegie Mellon University

Abstract: Simulation is widely used in robotics for system verification and large-scale data collection. However, simulating sensors, including tactile sensors, has been a long-standing challenge. In this paper, we propose Taxim, a realistic and high-speed simulation model for a vision-based tactile sensor, GelSight. A GelSight sensor uses a piece of soft elastomer as the medium of contact and embeds optical structures to capture the deformation of the elastomer, which infers the geometry and forces applied at the contact surface. We propose an example-based method for simulating GelSight: we simulate the optical response to the deformation with a polynomial look-up table. This table maps the deformed geometries to pixel intensity sampled by the embedded camera. In order to simulate the surface markers' motion that is caused by the surface stretch of the elastomer, we apply the linear elastic deformation theory and the superposition principle. The simulation model is calibrated with less than 100 data points from a real sensor. The example-based approach enables the model to easily migrate to other GelSight sensors or its variations. To the best of our knowledge, our simulation framework is the first to incorporate marker motion field simulation that derives from elastomer deformation together with the optical simulation, creating a comprehensive and computationally efficient tactile simulation framework. Experiments reveal that our optical simulation has the lowest pixel-wise intensity errors compared to prior work and can run online with CPU computing.
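
The calibration step can be pictured as fitting a polynomial from contact geometry to pixel intensity. The 1-D Python sketch below is a simplification: the actual look-up table is indexed by 2-D surface gradients and calibrated per pixel and colour channel.

import numpy as np

def fit_intensity_table(surface_angles, intensities, degree=3):
    # Fit a polynomial mapping a local surface angle to observed intensity,
    # using calibration data collected from the real sensor.
    return np.polyfit(surface_angles, intensities, degree)

def render_intensity(table, surface_angles):
    # Evaluate the calibrated polynomial on new contact geometry.
    return np.polyval(table, surface_angles)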

【20】 Active Multi-Object Exploration and Recognition via Tactile Whiskers

Link: https://arxiv.org/abs/2109.03976

Authors: Chenxi Xiao, Shujia Xu, Wenzhuo Wu, Juan Wachs

Affiliations: School of Industrial Engineering, Purdue University

Abstract: Robotic exploration under uncertain environments is challenging when optical information is not available. In this paper, we propose an autonomous solution of exploring an unknown task space based on tactile sensing alone. We first designed a whisker sensor based on MEMS barometer devices. This sensor can acquire contact information by interacting with the environment non-intrusively. This sensor is accompanied by a planning technique to generate exploration trajectories by using mere tactile perception. This technique relies on a hybrid policy for tactile exploration, which includes a proactive informative path planner for object searching, and a reactive Hopf oscillator for contour tracing. Results indicate that the hybrid exploration policy can increase the efficiency of object discovery. Last, scene understanding was facilitated by object segmentation and classification. A classifier was developed to recognize the object categories based on the geometric features collected by the whisker sensor. Such an approach demonstrates that the whisker sensor, together with tactile intelligence, can provide sufficiently discriminative features to distinguish objects.
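
The Hopf oscillator used for contour tracing is a standard limit-cycle generator; one Euler step of its normal form is sketched below (how the paper couples it to whisker contact feedback is not reproduced here).

import numpy as np

def hopf_step(x, y, mu=1.0, omega=2.0 * np.pi, dt=0.01):
    # Stable limit cycle of radius sqrt(mu) at angular frequency omega.
    r2 = x * x + y * y
    dx = (mu - r2) * x - omega * y
    dy = (mu - r2) * y + omega * x
    return x + dt * dx, y + dt * dy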

【21】 Quality-Diversity Meta-Evolution: customising behaviour spaces to a meta-objective

Link: https://arxiv.org/abs/2109.03918

Authors: David M. Bossens, Danesh Tarapore

Abstract: Quality-Diversity (QD) algorithms evolve behaviourally diverse and high-performing solutions. To illuminate the elite solutions for a space of behaviours, QD algorithms require the definition of a suitable behaviour space. If the behaviour space is high-dimensional, a suitable dimensionality reduction technique is required to maintain a limited number of behavioural niches. While current methodologies for automated behaviour spaces focus on changing the geometry or on unsupervised learning, there remains a need for customising behavioural diversity to a particular meta-objective specified by the end-user. In the newly emerging framework of QD Meta-Evolution, or QD-Meta for short, one evolves a population of QD algorithms, each with different algorithmic and representational characteristics, to optimise the algorithms and their resulting archives to a user-defined meta-objective. Despite promising results compared to traditional QD algorithms, QD-Meta has yet to be compared to state-of-the-art behaviour space automation methods such as Centroidal Voronoi Tessellations Multi-dimensional Archive of Phenotypic Elites Algorithm (CVT-MAP-Elites) and Autonomous Robots Realising their Abilities (AURORA). This paper performs an empirical study of QD-Meta on function optimisation and multilegged robot locomotion benchmarks. Results demonstrate that QD-Meta archives provide improved average performance and faster adaptation to a priori unknown changes to the environment when compared to CVT-MAP-Elites and AURORA. A qualitative analysis shows how the resulting archives are tailored to the meta-objectives provided by the end-user.

【22】 Dynamic Locomotion Teleoperation of a Wheeled Humanoid Robot Reduced Model with a Whole-Body Human-Machine Interface

Link: https://arxiv.org/abs/2109.03906

Authors: Sunyu Wang, Joao Ramos

Affiliations: Department of Mechanical Science and Engineering, University of Illinois at Urbana-Champaign; Robotics Institute, Carnegie Mellon University

Abstract: Bilateral teleoperation provides humanoid robots with human planning intelligence while enabling the human to feel what the robot feels. It has the potential to transform physically capable humanoid robots into dynamically intelligent ones. However, dynamic bilateral locomotion teleoperation remains as a challenge due to the complex dynamics it involves. This work presents our initial step to tackle this challenge via the concept of wheeled humanoid robot locomotion teleoperation by body tilt. Specifically, we developed a force-feedback-capable whole-body human-machine interface (HMI), and designed a force feedback mapping and two teleoperation mappings that map the human's body tilt to the robot's velocity or acceleration. We compared the two mappings and studied the force feedback's effect via an experiment, where seven human subjects teleoperated a simulated robot with the HMI to perform dynamic target tracking tasks. The experimental results suggest that all subjects accomplished the tasks with both mappings after practice, and the force feedback improved their performances. However, the subjects exhibited two distinct teleoperation styles, which benefited from the force feedback differently. Moreover, the force feedback affected the subjects' preferences on the teleoperation mappings, though most subjects performed better with the velocity mapping.
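
The two teleoperation mappings can be read as a simple scaling of the operator's body tilt into a robot command; an illustrative sketch with hypothetical gains:

def tilt_to_command(tilt_rad, k_vel=2.0, k_acc=4.0, mode="velocity"):
    # Map the human's sagittal body tilt (from the whole-body HMI) either to
    # a velocity command or to an acceleration command; gains are made up.
    if mode == "velocity":
        return k_vel * tilt_rad   # m/s
    return k_acc * tilt_rad       # m/s^2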

【23】 Interpretable Run-Time Prediction and Planning in Co-Robotic Environments

Link: https://arxiv.org/abs/2109.03893

Authors: Rahul Peddi, Nicola Bezzo

Affiliations: Department of Systems and Information Engineering and the Charles L. Brown Department of Electrical and Computer Engineering, University of Virginia

Note: Final version to be presented at IROS 2021

Abstract: Mobile robots are traditionally developed to be reactive and avoid collisions with surrounding humans, often moving in unnatural ways without following social protocols, forcing people to behave very differently from human-human interaction rules. Humans, on the other hand, are seamlessly able to understand why they may interfere with surrounding humans and change their behavior based on their reasoning, resulting in smooth, intuitive avoiding behaviors. In this paper, we propose an approach for a mobile robot to avoid interfering with the desired paths of surrounding humans. We leverage a library of previously observed trajectories to design a decision-tree based interpretable monitor that: i) predicts whether the robot is interfering with surrounding humans, ii) explains what behaviors are causing either prediction, and iii) plans corrective behaviors if interference is predicted. We also propose a validation scheme to improve the predictive model at run-time. The proposed approach is validated with simulations and experiments involving an unmanned ground vehicle (UGV) performing go-to-goal operations in the presence of humans, demonstrating non-interfering behaviors and run-time learning.

【24】 SORNet: Spatial Object-Centric Representations for Sequential Manipulation

Link: https://arxiv.org/abs/2109.03891

Authors: Wentao Yuan, Chris Paxton, Karthik Desingh, Dieter Fox

Affiliations: University of Washington, NVIDIA

Abstract: Sequential manipulation tasks require a robot to perceive the state of an environment and plan a sequence of actions leading to a desired goal state, where the ability to reason about spatial relationships among object entities from raw sensor inputs is crucial. Prior works relying on explicit state estimation or end-to-end learning struggle with novel objects. In this work, we propose SORNet (Spatial Object-Centric Representation Network), which extracts object-centric representations from RGB images conditioned on canonical views of the objects of interest. We show that the object embeddings learned by SORNet generalize zero-shot to unseen object entities on three spatial reasoning tasks: spatial relationship classification, skill precondition classification and relative direction regression, significantly outperforming baselines. Further, we present real-world robotic experiments demonstrating the usage of the learned object embeddings in task planning for sequential manipulation.

【25】 Integrated and Adaptive Guidance and Control for Endoatmospheric Missiles via Reinforcement Learning

Link: https://arxiv.org/abs/2109.03880

Authors: Brian Gaudet, Isaac Charcos, Roberto Furfaro

Affiliations: University of Arizona, Tucson, Arizona

Abstract: We apply the meta reinforcement learning framework to optimize an integrated and adaptive guidance and flight control system for an air-to-air missile, implementing the system as a deep neural network (the policy). The policy maps observations directly to commanded rates of change for the missile's control surface deflections, with the observations derived with minimal processing from the computationally stabilized line of sight unit vector measured by a strap down seeker, estimated rotational velocity from rate gyros, and control surface deflection angles. The system induces intercept trajectories against a maneuvering target that satisfy control constraints on fin deflection angles, and path constraints on look angle and load. We test the optimized system in a six degrees-of-freedom simulator that includes a non-linear radome model and a strapdown seeker model. Through extensive simulation, we demonstrate that the system can adapt to a large flight envelope and off-nominal flight conditions that include perturbation of aerodynamic coefficient parameters and center of pressure locations. Moreover, we find that the system is robust to the parasitic attitude loop induced by radome refraction, imperfect seeker stabilization, and sensor scale factor errors. Finally, we compare our system's performance to two benchmarks: a proportional navigation guidance system benchmark in a simplified 3-DOF environment, which we take as an upper bound on performance attainable with separate guidance and flight control systems, and a longitudinal model of proportional navigation coupled with a three loop autopilot. We find that our system moderately outperforms the former, and outperforms the latter by a large margin.
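
The proportional-navigation benchmark refers to the classical guidance law in which commanded acceleration is proportional to the line-of-sight rotation rate. A pure-PN sketch in vector form follows (the paper's exact 3-DOF benchmark is not reproduced here).

import numpy as np

def pure_pro_nav(r_missile, v_missile, r_target, v_target, N=3.0):
    # Commanded acceleration, perpendicular to the missile velocity:
    # a = N * (LOS rotation rate vector) x (missile velocity).
    r_rel = np.asarray(r_target) - np.asarray(r_missile)    # line of sight
    v_rel = np.asarray(v_target) - np.asarray(v_missile)    # relative velocity
    omega = np.cross(r_rel, v_rel) / np.dot(r_rel, r_rel)   # LOS rate vector
    return N * np.cross(omega, np.asarray(v_missile))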

【26】 Recurrent Neural Network Controllers Synthesis with Stability Guarantees for Partially Observed Systems

Link: https://arxiv.org/abs/2109.03861

Authors: Fangda Gu, He Yin, Laurent El Ghaoui, Murat Arcak, Peter Seiler, Ming Jin

Affiliations: University of California, Berkeley; University of Michigan, Ann Arbor; Virginia Tech, Blacksburg, Virginia

Abstract: Neural network controllers have become popular in control tasks thanks to their flexibility and expressivity. Stability is a crucial property for safety-critical dynamical systems, while stabilization of partially observed systems, in many cases, requires controllers to retain and process long-term memories of the past. We consider the important class of recurrent neural networks (RNN) as dynamic controllers for nonlinear uncertain partially-observed systems, and derive convex stability conditions based on integral quadratic constraints, S-lemma and sequential convexification. To ensure stability during the learning and control process, we propose a projected policy gradient method that iteratively enforces the stability conditions in the reparametrized space taking advantage of mild additional information on system dynamics. Numerical experiments show that our method learns stabilizing controllers while using fewer samples and achieving higher final performance compared with policy gradient.
