參考維基百科: https://en.wikipedia.org/wiki/Simultaneous_localization_and_mapping
你們叫他SLAM,我還是習慣叫他三維重建......當夏日的中午,你看到一隻鳥兒飛在叢林中,鳥兒偶爾被樹葉遮擋,時隐時現,但你總能估計出鳥兒下一時刻大緻的位置。在很多時刻,人腦潛意識下使用了濾波,機率密度估計的最大似然,整個過程表現模型為EM方法。
在機器人學中,SLAM為面對未知環境進行即時位置和地圖重建。現階段最流行的方法為基于視覺的濾波方法,比如離子濾波和EKF方法。基于雷達的方法也可以用于稀疏SLAM問題,卻缺乏一定的适用性和親人性。
在AI行業裡,總會這樣說,怎樣才能達到人的水準。SLAM問題對應了人熟悉未知環境并根據位置環境不斷确定自身位置的過程,這個過程中所獲的資訊一般隻是視覺資訊,是以,建構類人的方法,使用純視覺傳感器來完成SLAM成為一個更接近于AI的研究方向,且其可達性顯而易見。
古老的曆史
基于雷射雷達建構場景是古老并且可信的方法,雷射雷達也是精确場景使用最多的SLAM傳感器。它們提供機器人本體與周圍環境障礙物間的距離資訊。常見的雷射雷達,例如SICK、Velodyne還有我們國産的rplidar等,都可以拿來做SLAM。雷射雷達能以很高精度測出機器人周圍障礙點的角度和距離,進而很友善地實作SLAM、避障等功能。
Problem definition
Given a series of sensor observations o t {\displaystyle o_{t}}

over discrete time stepst {\displaystyle t}
, the SLAM problem is to compute an estimate of the agent's location x t {\displaystyle x_{t}}
and a map of the environmentm t {\displaystyle m_{t}}
. All quantities are usually probabilistic, so the objective is to compute:
P ( m t , x t | o 1 : t ) {\displaystyle P(m_{t},x_{t}|o_{1:t})}
Applying Bayes' rule gives a framework for sequentially updating the location posteriors, given a map and a transition functionP ( x t | x t − 1 ) {\displaystyle P(x_{t}|x_{t-1})}
,
P ( x t | o 1 : t , m t ) = ∑ m t − 1 P ( o t | x t , m t ) ∑ x t − 1 P ( x t | x t − 1 ) P ( x t − 1 | m t , o 1 : t − 1 ) / Z {\displaystyle P(x_{t}|o_{1:t},m_{t})=\sum _{m_{t-1}}P(o_{t}|x_{t},m_{t})\sum _{x_{t-1}}P(x_{t}|x_{t-1})P(x_{t-1}|m_{t},o_{1:t-1})/Z}
Similarly the map can be updated sequentially by
P ( m t | x t , o 1 : t ) = ∑ x t ∑ m t P ( m t | x t , m t − 1 , o t ) P ( m t − 1 , x t | o 1 : t − 1 , m t − 1 ) {\displaystyle P(m_{t}|x_{t},o_{1:t})=\sum _{x_{t}}\sum _{m_{t}}P(m_{t}|x_{t},m_{t-1},o_{t})P(m_{t-1},x_{t}|o_{1:t-1},m_{t-1})}
Like many inference problems, the solutions to inferring the two variables together can be found, to a local optimum solution, by alternating updates of the two beliefs in a form ofEM algorithm.
被看作一個EM方法
Algorithms
Statistical techniques used to approximate the above equations includeKalman filters,particle filters (aka. Monte Carlo methods) and scan matching of range data. They provide an estimation of the posterior probability function for the pose of the robot and for the parameters of the map.Set-membership techniques are mainly based on interval constraint propagation.[2][3] They provide a set which encloses the pose of the robot and a set approximation of the map.
Bundle adjustment is another popular technique for SLAM using image data. Bundle adjustment jointly estimates poses and landmark positions, increasing map fidelity, and is used in many recently commercialized SLAM systems such as Google'sProject Tango.
New SLAM algorithms remain an active research area, and are often driven by differing requirements and assumptions about the types of maps, sensors and models as detailed below. Many SLAM systems can be viewed as combinations of choices from each of these aspects.
兩種方法:濾波方法和平差方法。
Mapping
Topological maps are a method of environment representation which capture the connectivity (i.e.,topology) of the environment rather than creating a geometrically accurate map. Topological SLAM approaches have been used to enforce global consistency in metric SLAM algorithms.[4]
In contrast, grid maps use arrays (typically square or hexagonal) of discretized cells to represent a topological world, and make inferences about which cells are occupied. Typically the cells are assumed to be statistically independent in order to simplify computation. Under such assumption, P ( m t | x t , m t − 1 , o t ) {\displaystyle P(m_{t}|x_{t},m_{t-1},o_{t})}
are set to 1 if the new map's cells are consistent with the observation o t {\displaystyle o_{t}}

at locationx t {\displaystyle x_{t}}
and 0 if inconsistent.
地圖:表示真實環境的資料映射,一般使用拓撲圖地圖表示全局的地圖結構。另外的一種表達方式為網格地圖,網格地圖為離散坐标系的全覆寫。拓撲地圖和網格地圖有各自的優點,并适用于不同的環境。
Sensing
See also: 3D scanner
SLAM will always use several different types of sensors, and the powers and limits of various sensor types have been a major driver of new algorithms.[5] Statistical independence is the mandatory requirement to cope with metric bias and with noise in measures. Different types of sensors give rise to different SLAM algorithms whose assumptions are which are most appropriate to the sensors. At one extreme, laser scans or visual features provide details of a great many points within an area, sometimes rendering SLAM inference unnecessary because shapes in these point clouds can be easily and unambiguously aligned at each step viaimage registration. At the opposite extreme, tactile sensors are extremely sparse as they contain only information about points very close to the agent, so they require strong prior models to compensate in purely tactile SLAM. Most practical SLAM tasks fall somewhere between these visual and tactile extremes.
Sensor models divide broadly into landmark-based and raw-data approaches. Landmarks are uniquely identifiable objects in the world whose location can be estimated by a sensor—such as wifi access points or radio beacons. Raw-data approaches make no assumption that landmarks can be identified, and instead modelP ( o t | x t ) {\displaystyle P(o_{t}|x_{t})}
directly as a function of the location.
Optical sensors may be one-dimensional (single beam) or 2D- (sweeping)laser rangefinders, 3D High Definition LiDAR,3D Flash LIDAR, 2D or 3Dsonar sensors and one or more 2D cameras.[5] Since 2005, there has been intense research into VSLAM (visual SLAM) using primarily visual (camera) sensors, because of the increasing ubiquity of cameras such as those in mobile devices.[6] Visual and LIDAR sensors are informative enough to allow for landmark extraction in many cases. Other recent forms of SLAM include tactile SLAM[7] (sensing by local touch only), radar SLAM,[8] and wifi-SLAM (sensing by strengths of nearby wifi access points). Recent approaches apply quasi-optical wireless ranging for multi-lateration (RTLS) ormulti-angulation in conjunction with SLAM as a tribute to erratic wireless measures. A kind of SLAM for human pedestrians uses a shoe mountedinertial measurement unit as the main sensor and relies on the fact that pedestrians are able to avoid walls to automatically build floor plans of buildings. by anindoor positioning system.[9]
For some outdoor applications, the need for SLAM has been almost entirely removed due to high precision differentialGPS sensors. From a SLAM perspective, these may be viewed as location sensors whose likelihoods are so sharp that they completely dominate the inference. However GPS sensors may go down entirely or in performance on occasions, especially during times of military conflict which are of particular interest to some robotics applications.
傳感器:SLAM有時需要不同的傳感器,在資料擷取上表現為高斯獨立的性質。傳感器模型可分為基于标記的模型和基于原始資料的模型。其差別在于有沒有對資料進行标記和假設。
光學傳感器有一維和二維、三維雷射雷達,或者更多的是圖像傳感器。随着移動攝影裝置的普及,基于視覺的SLAM成為主流。
Kinematics modeling
The P ( x t | x t − 1 ) {\displaystyle P(x_{t}|x_{t-1})}
term represents the kinematics of the model, which usually include information about action commands given to a robot. As a part of the model, the kinematics of the robot is included, to improve estimates of sensing under conditions of inherent and ambient noise. The dynamic model balances the contributions from various sensors, various partial error models and finally comprises in a sharp virtual depiction as a map with the location and heading of the robot as some cloud of probability. Mapping is the final depicting of such model, the map is either such depiction or the abstract term for the model.
For 2D robots, the kinematics are usually given by a mixture of rotation and "move forward" commands, which are implemented with additional motor noise. Unfortunately the distribution formed by independent noise in angular and linear directions is non-Gaussian, but is often approximated by a Gaussian. An alternative approach is to ignore the kinematic term and read odometry data from robot wheels after each command—such data may then be treated as one of the sensors rather than as kinematics.
動力學模型:轉移機率用以描述環境模型,包含了運動模型和一部分環境噪音模型。對于2d機器人,動力學模型通常用一個旋轉和移動的混合模型進行表述,并附加一定的獨立噪音。由于噪音的分布一般不是高斯的,但接近于高斯,是以使用一個傳感器模型相對于動力學模型更好一些。
Multiple objects
The related problems of data association and computational complexity are among the problems yet to be fully resolved, for example the identification of multiple confusable landmarks. A significant recent advance in the feature-based SLAM literature involved the re-examination of the probabilistic foundation for Simultaneous Localisation and Mapping (SLAM) where it was posed in terms of multi-object Bayesian filtering with random finite sets that provide superior performance to leading feature-based SLAM algorithms in challenging measurement scenarios with high false alarm rates and high missed detection rates without the need for data association.[10]
Popular techniques for handling multiple objects includeJoint Probabilistic Data Association Filter (JPDAF) and probability hypothesis density filter (PHD).
多目标:資料關聯 和計算複雜度估計使用到 多個待分析的目标。一個有意義的提升是基于特征的SLAM的使用,廣泛使用機器學習的方法(如一個貝葉斯分類器)。
Moving objects
Non-static environments, such as those containing other vehicles or pedestrians, continue to present research challenges. SLAM with DATMO is a model which tracks moving objects in a similar way to the agent itself.[citation needed]
Loop closure
Loop closure is the problem of recognizing a previously-visited location and updates the beliefs accordingly. This can be a problem because model or algorithm errors can assign low priors to the location. Typical loop closure methods apply a second algorithm to measure some type of sensor similarity, and re-set the location priors when a match is detected. For example, this can be done by storing and comparingbag of words vectors of SIFT features from each previously visited location.
閉環檢測:是一個識别是否進入先前環境的過程,檢測到閉環可以相應的提高可信度,降低誤差。經典的閉環檢測的算法是使用基于局部特征SIFT的詞包模型,比對每一個走過的場景。
Exploration
"Active SLAM" studies the combined problem of SLAM with deciding where to move next in order to build the map as efficiently as possible. The need for active exploration is especially pronounced in sparse sensing regimes such as tactile SLAM. Active SLAM is generally performed by approximating the entropy of the map under hypothetical actions. "Multi agent SLAM" extends this problem to the case of multiple robots coordinating themselves to explore optimally.
動态 SLAM 把 Agent ”将要去哪兒“ 這個議題結合起來。
Biological inspiration
In neuroscience, the hippocampus appears to be involved in SLAM-like computations giving rise to place cells and has formed the basis for bio-inspired SLAM systems such as RatSLAM. Bio-inspired methods are not currently competitive with engineering approaches however[according to whom?].
Complexity
Researchers and experts in artificial intelligence have struggled to solve the SLAM problem in practical settings: that is, it required a great deal of computational power to sense a sizable area and process the resulting data to both map and localize.[11] A 2008 review of the topic summarized: "[SLAM] is one of the fundamental challenges of robotics . . . [but it] seems that almost all the current approaches can not perform consistent maps for large areas, mainly due to the increase of the computational cost and due to the uncertainties that become prohibitive when the scenario becomes larger."[12] Generally, complete 3D SLAM solutions are highly computationally intensive as they use complex real-time particle filters, sub-mapping strategies or hierarchical combination of metric topological representations, etc.[13] Robots that use embedded systems cannot fully implement SLAM because of their limitation in computing power. Nguyen V., Harati A., & Siegwart R. (2007) presented a fast, lightweight solution called OrthoSLAM, which breaks down the complexity of the environment into orthogonal planes. By mapping only the planes that are orthogonal to each other, the structure of most indoor environments can be estimated fairly accurately. OrthoSLAM algorithm reduces SLAM to a linear estimation problem since only a single line is processed at a time.[13]
複雜度:SLAM是一個計算複雜的問題,持續的曾來那個地圖和環境增加了過程的複雜度.........
Implementations
Various SLAM algorithms are implemented in the open-sourcerobot operating system (ROS) libraries.
執行環境:一般的SLAM算法,都實作了對ROS的相容,可以在ROS上運作。
History
A seminal work in SLAM is the research of R.C. Smith and P. Cheeseman on the representation and estimation of spatial uncertainty in 1986.[14][15] Other pioneering work in this field was conducted by the research group of Hugh F. Durrant-Whyte in the early 1990s.[16] which showed that solutions to SLAM exist in the infinite data limit. This finding motivates the search for algorithms which are computationally tractable and approximate the solution. The self-driving STANLEY car won the DARPA Grand Challenge and included a SLAM system, bringing SLAM to worldwide attention. Mass-market SLAM implementations can now be found in consumer robot vacuum cleaners such as the Neato XV11.Self-driving cars by Google and others have now received licenses to drive on public roads in some US states.
SLAM問題最先是由Smith Self 和Cheeseman 在1988年提出來的,被認為是實作真正全自主移動機器人的關鍵。 由 Smith, R.C. and P. Cheeseman, 提出的論文:On the Representation and Estimation of Spatial Uncertainty. 《International Journal of Robotics Research》, 1986. 5(4): p. 56 -- 68. 以濾波的形式表示SLAM問題。
基于統計的濾波器思路,把SLAM寫成一個運動方程和觀測方程,以最小化這兩個方程中的噪聲項為目的,使用典型的濾波器思路來解決SLAM問題。
具體方法:當一個幀到達時,我們能(通過碼盤或IMU)測出該幀與上一幀的相對運動,但是存在噪聲,是為運動方程。同時,通過傳感器對路标的觀測,我們測出了機器人與路标間的位姿關系,同樣也帶有噪聲,是為觀測方程。通過這兩者資訊,我們可以預測出機器人在目前時刻的位置。同樣,根據以往記錄的路标點,我們又能計算出一個卡爾曼增益,以補償噪聲的影響。于是,對目前幀和路标的估計,即是這個預測與更新的不斷疊代的過程。
随後的時間裡,Se, S., D. Lowe and J. Little,的論文: Mobile robot localization and mapping with uncertainty using scale-invariant visual landmarks. 《The international Journal of robotics Research》, 2002. 21(8): p. 735--758.中使用了卡爾曼濾波的方法。
使用EKF的方法是一段時間内的SLAM主流方法。
EKF來由:KF使用高斯分布表示後驗機率
,這個高斯分布的均值為ut,協方差記為Et. 高斯線性的主要問題在于,他們隻有在運動模型f 和 測量模型h 是線性的情況下才比較接近實際情況。對于非線性的f 和h ,濾波器更新的結果往往不是高斯的。是以使用KF的定位算法需要将運動模型和傳感器模型進行線性化(linearize),使用一個線性函數對非線性函數進行局部近似。通過一個一階泰勒展開式對 f 和 h 進行線性化的卡爾曼濾波器為EKF。
稀疏圖集:SALM問題通過不同的機率技術得到解決,包括EKF方法。使用EKF方法隻需增加狀态向量和環境中地标的位置,且EKF方法對地圖和狀态向量進行二次更新(O(n^2))。對于小地圖,計算方法是可行的。對于大地圖,則常常使用圖松弛-稀疏圖集的方法進行地圖更新。
BA方法:21世紀之後,SLAM研究者開始借鑒SFM(Structure from Motion)問題中的方法,把平差方法--捆集優化BA(Bundle Adjustment)引入到SLAM中來。優化方法和濾波器方法有根本上的不同。它并不是一個疊代的過程,而是考慮過去所有幀中的資訊,這是一個最小二乘和最大似然的差別。通過優化,把誤差平均分到每一次觀測當中。在SLAM中的Bundle Adjustment常常以圖的形式給出,是以研究者亦稱之為圖優化方法(Graph Optimization)。圖優化可以直覺地表示優化問題,可利用稀疏代數進行快速的求解,表達回環也十分的友善,因而成為現今視覺SLAM中主流的優化方法。
The Future
SLAM始終是一個計算複雜的問題,增量地圖的更新帶來的計算代價是不可避免的,正是由于計算能力的提升,以及殘差和函數矩陣的稀疏性的認識,BA方法在視覺SLAM才有更廣泛的流行。實時性需要有強有力的計算能力作為支撐,硬體的提升是促進SLAM發展的潛力所在。