MovingPandas是一個基于Python和GeoPandas的開源地理時空資料處理庫,用于處理移動物體的軌迹資料。它提供了一組強大的工具,可以輕松地加載、分析和可視化移動物體的軌迹。通過使用MovingPandas,使用者可以輕松地處理和分析移動對象資料,并從中提取有關行為、模式和趨勢的見解。無論是處理交通流量資料、物流軌迹資料還是動物遷徙資料,MovingPandas都是一個強大的地理可視化工具。
MovingPandas庫具有以下主要特性:
- 軌迹對象:MovingPandas引入了Trajectory對象,用于表示和管理軌迹資料。該對象包含了時間、位置和屬性資訊,友善對軌迹進行操作和分析。
- 時空切片:MovingPandas支援時空切片操作,可以按照時間和空間條件對軌迹資料進行篩選和分割。
- 軌迹分析:MovingPandas提供了豐富的軌迹分析功能,包括計算軌迹長度、速度、方向、加速度等名額,以及軌迹聚類、互動和相似性分析。
- 軌迹可視化:MovingPandas可以友善地将軌迹資料可視化,支援在地圖上繪制軌迹線、點、熱力圖等,幫助使用者更直覺地了解移動物體的行為。
- 資料格式相容:MovingPandas支援多種常見的地理資料格式,包括GeoJSON、Shapefile等,友善使用者加載和儲存軌迹資料。
MovingPandas官方倉庫位址為:movingpandas。MovingPandas官方示例代碼倉庫位址為:movingpandas-examples。本文所有實驗資料來自于:movingpandas-examples-data。
本文所有代碼見:Python-Study-Notes
MovingPandas作者推薦在Python 3.8及以上環境下安裝MovingPandas,并建議使用conda進行安裝。可以使用以下指令來安裝MovingPandas:
conda install -c conda-forge movingpandas
由于MovingPandas的依賴環境較為複雜,是以不推薦使用pip進行安裝。如果堅持使用pip進行安裝,可以按照以下指令來安裝MovingPandas:
pip install movingpandas
# 本文必安裝第三方庫
pip install contextily
# 以下第三方庫可選
pip install hvplot
pip install cartopy
pip install geoviews
下面的代碼展示了MovingPandas的版本資訊,本文所用Python版本為Python3.10。
# jupyter notebook環境去除warning
import warnings
warnings.filterwarnings("ignore")
# 檢視movingpandas版本及依賴庫版本資訊
import movingpandas as mpd
mpd.show_versions()
MovingPandas 0.16.1
SYSTEM INFO
-----------
python : 3.10.10 (main, Mar 21 2023, 18:45:11) [GCC 11.2.0]
executable : /opt/conda/envs/python35-paddle120-env/bin/python
machine : Linux-5.4.0-104-generic-x86_64-with-glibc2.31
GEOS, GDAL, PROJ INFO
---------------------
GEOS : None
GEOS lib : None
GDAL : 3.6.4
GDAL data dir: /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/fiona/gdal_data
PROJ : 9.2.1
PROJ data dir: /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/pyproj/proj_dir/share/proj
PYTHON DEPENDENCIES
-------------------
geopandas : 0.13.2
pandas : 2.0.2
fiona : 1.9.4.post1
numpy : 1.24.3
shapely : 2.0.1
rtree : 1.0.1
pyproj : 3.6.0
matplotlib : 3.7.1
mapclassify: None
geopy : 2.3.0
holoviews : None
hvplot : None
geoviews : None
stonesoup : None
[toc]
1 使用入門
1.1 基礎結構
# 加載第三方庫
import pandas as pd
import geopandas as gpd
import movingpandas as mpd
import shapely as shp
# hvPlot是建立在資料可視化庫holoviews的一個進階封裝庫
# import hvplot.pandas
from geopandas import GeoDataFrame, read_file
from shapely.geometry import Point, LineString, Polygon
from datetime import datetime, timedelta
# hvplot通常與HoloViews內建使用
# from holoviews import opts
# 設定繪圖界面
# opts.defaults(opts.Overlay(active_tools=['wheel_zoom'], frame_width=500, frame_height=400))
1.1.1 Trajectory類
MovingPandas提供Trajectory對象用于表示單個軌迹的對象。它由一系列坐标點組成。Trajectory對象初始化參數如下:
class Trajectory:
def __init__(
self,
df,
traj_id,
obj_id=None,
t=None,
x=None,
y=None,
crs="epsg:4326",
parent=None,
):
其中各個參數解釋如下:
- df:具有GeoPandas的geometry坐标列和時間戳索引的GeoDataFrame,或Pandas的DataFrame。必填參數。
- traj_id:任意類型,表示軌迹的唯一辨別符。必填參數。
- obj_id:任意類型,表示移動物體的唯一辨別符。預設為 None。
- t:表示包含時間戳的列名,預設為 None。
- x:表示包含x坐标的列名,使用Pandas的DataFrame需指定。預設為 None。
- y:表示包含y坐标的列名,使用Pandas的DataFrame需指定。預設為 None。
- crs:表示 x/y 坐标的坐标參考系統。預設為 "epsg:4326",即 WGS84。
- parent:一個Trajectory 對象,表示父軌迹。預設為 None。
由Trajectory軌迹對象的初始化參數可知,MovingPandas用相關點位置資訊和其時間資訊來表示軌迹。下面用一個執行個體代碼來展示Trajectory對象的使用。
首先建立一個用于表示位置和其對應時間資訊的GeoDataFrame對象。
import pandas as pd
from shapely.geometry import Point
from geopandas import GeoDataFrame
import movingpandas as mpd
from datetime import datetime
# 建立DataFrame對象
df = pd.DataFrame([
{'geometry': Point(0, 0), 't': datetime(2023, 7, 1, 12, 0, 0)},
{'geometry': Point(6, 0), 't': datetime(2023, 7, 1, 12, 6, 0)},
{'geometry': Point(6, 6), 't': datetime(2023, 7, 1, 12, 10, 0)},
{'geometry': Point(9, 9), 't': datetime(2023, 7, 1, 12, 15, 0)}])
# 設定索引軸
df = df.set_index('t')
# 建立GeoDataFrame對象
# EPSG:3857是線上地圖常用的投影坐标系,以米為機關
gdf = GeoDataFrame(df, crs='EPSG:3857')
gdf
geometry | |
t | |
2023-07-01 12:00:00 | POINT (0.000 0.000) |
2023-07-01 12:06:00 | POINT (6.000 0.000) |
2023-07-01 12:10:00 | POINT (6.000 6.000) |
2023-07-01 12:15:00 | POINT (9.000 9.000) |
然後基于GeoDataFrame對象建立軌迹對象。
# 建立Trajectory對象
# 1表示軌迹辨別符
toy_traj = mpd.Trajectory(gdf, 1)
toy_traj
Trajectory 1 (2023-07-01 12:00:00 to 2023-07-01 12:15:00) | Size: 4 | Length: 16.2m
Bounds: (0.0, 0.0, 9.0, 9.0)
LINESTRING (0 0, 6 0, 6 6, 9 9)
接下來可以可視化軌迹路徑,橫縱軸表示點的xy坐标。
# 可視化路徑
toy_traj.plot()
<Axes: >
# 互動式展示
# toy_traj.hvplot()
此外也可以直接調用toy_traj的GeoPandas對象進行可視化展示。
toy_traj.df.plot()
<Axes: >
1.1.2 TrajectoryCollection類
TrajectoryCollection類是一個集合,用于存儲多個Trajectory 對象。它提供了對整個軌迹集合進行操作和分析的功能。可以使用 TrajectoryCollection來合并多個軌迹為一個對象、篩選軌迹、進行空間查詢、可視化等。TrajectoryCollection對象初始化參數如下:
class TrajectoryCollection:
def __init__(
self,
data,
traj_id_col=None,
obj_id_col=None,
t=None,
x=None,
y=None,
crs="epsg:4326",
min_length=0,
min_duration=None,
)
其中大部分參數與Trajectory對象的參數一緻,不同參數解釋如下:
- min_length: 軌迹的最小長度,低于該長度的軌迹将被丢棄。
- min_duration: 軌迹的期望最小持續時間,低于該續時間的軌迹将被丢棄。
直接建立TrajectoryCollection對象
# 建立兩條軌迹
df = pd.DataFrame(
{
"t": pd.date_range("2023-07-01", periods=5, freq="min"),
"trajectory_id": [1, 1, 2, 2, 2],
"geometry": [Point(0, 0), Point(0, 1), Point(1, 2), Point(1, 3), Point(2, 4)],
}
)
gdf = gpd.GeoDataFrame(df, crs='EPSG:3857')
gdf
t | trajectory_id | geometry | |
2023-07-01 00:00:00 | 1 | POINT (0.000 0.000) | |
1 | 2023-07-01 00:01:00 | 1 | POINT (0.000 1.000) |
2 | 2023-07-01 00:02:00 | 2 | POINT (1.000 2.000) |
3 | 2023-07-01 00:03:00 | 2 | POINT (1.000 3.000) |
4 | 2023-07-01 00:04:00 | 2 | POINT (2.000 4.000) |
# 建立TrajectoryCollection對象,用trajectory_id作為單個軌迹的唯一辨別
tc = mpd.TrajectoryCollection(gdf, traj_id_col='trajectory_id', t='t')
tc
TrajectoryCollection with 2 trajectories
# 繪制軌迹, column指定軌迹對象,legend展示軌迹持續時間
import matplotlib.cm as cm
tc.plot(column='trajectory_id', legend=True)
<Axes: >
# 互動式展示軌迹
# tc.hvplot()
此外我們還可以從TrajectoryCollection提取單個軌迹,或者篩選所需要的軌迹。
# 從TrajectoryCollection提取單個軌迹
my_traj = tc.trajectories[1]
print(my_traj)
Trajectory 2 (2023-07-01 00:02:00 to 2023-07-01 00:04:00) | Size: 3 | Length: 2.4m
Bounds: (1.0, 2.0, 2.0, 4.0)
LINESTRING (1 2, 1 3, 2 4)
# 篩選路徑持續時間大于一分鐘的
min_duration = timedelta(minutes=1)
tc.trajectories = [traj for traj in tc if traj.get_duration() > min_duration]
print(tc.trajectories)
[Trajectory 2 (2023-07-01 00:02:00 to 2023-07-01 00:04:00) | Size: 3 | Length: 2.4m
Bounds: (1.0, 2.0, 2.0, 4.0)
LINESTRING (1 2, 1 3, 2 4)]
從檔案資料建立TrajectoryCollection對象
TrajectoryCollection可以通過GeoPandas函數從各種地理空間資料存儲中讀取資料來建立,也可以利用Pandas從各類檔案類型中讀取資料進行建立。
下面展示了直接通過csv檔案建立TrajectoryCollection對象,資料下載下傳位址為:movingpandas-examples-data
# 讀取資料
df = pd.read_csv('data/geolife_small.csv', delimiter=';')
df.head()
X | Y | fid | id | sequence | trajectory_id | tracker | t | |
116.391305 | 39.898573 | 1 | 1 | 1 | 1 | 19 | 2008-12-11 04:42:14+00 | |
1 | 116.391317 | 39.898617 | 2 | 2 | 2 | 1 | 19 | 2008-12-11 04:42:16+00 |
2 | 116.390928 | 39.898613 | 3 | 3 | 3 | 1 | 19 | 2008-12-11 04:43:26+00 |
3 | 116.390833 | 39.898635 | 4 | 4 | 4 | 1 | 19 | 2008-12-11 04:43:32+00 |
4 | 116.389410 | 39.898723 | 5 | 5 | 5 | 1 | 19 | 2008-12-11 04:43:47+00 |
# 轉換為traj_collection對象
traj_collection = mpd.TrajectoryCollection(df, 'trajectory_id', t='t', x='X', y='Y')
print(traj_collection)
TrajectoryCollection with 5 trajectories
# 繪制軌迹
traj_collection.plot(column='trajectory_id', legend=True, figsize=(9,5))
<Axes: >
1.2 基礎函數
MovingPandas中的函數可以同時适用于單個軌迹(Trajectory)對象和軌迹集合(TrajectoryCollection)對象,以實作對單條和多條軌迹的分析。
1.2.1 移動資訊添加函數
MovingPandas提供add_speed、add_distance、add_direction、add_timedelta、add_timedelta和add_acceleration等移動資訊添加函數來處理移動資料。這些函數的接口如下:
add_speed(overwrite=False, name=SPEED_COL_NAME, units=UNITS())
add_distance(overwrite=False, name=DISTANCE_COL_NAME, units=None)
add_direction(overwrite=False, name=DIRECTION_COL_NAME)
add_angular_difference(overwrite=False,name=ANGULAR_DIFFERENCE_COL_NAM):
add_timedelta(overwrite=False, name=TIMEDELTA_COL_NAME)
add_acceleration(overwrite=False, name=ACCELERATION_COL_NAME, units=UNITS())
所有函數均使用系統預設值或全局變量,具體函數介紹如下:
- add_speed: 用于計算移動對象的速度。根據兩個時間點之間的距離和時間差來計算速度。參數overwrite表示是否覆寫現有的速度列,name表示速度列的名稱,units表示速度的機關,名字和機關不指定則使用預設值。
- add_distance: 用于計算移動對象的距離。根據兩個時間點之間的直線距離異來計算距離。參數overwrite表示是否覆寫現有的距離列,name表示距離列的名稱,units表示距離的機關。
- add_direction: 用于計算移動對象的方向。根據兩個時間點之間的坐标差異來計算方向。參數overwrite表示是否覆寫現有的方向列,name表示方向列的名稱。方向值以度為機關,從北開始順時針旋轉,範圍值為[0,360)。
- add_angular_difference: 用于計算兩個對象之間的角度差。根據移動對象的方向資訊來計算兩個對象之間的角度差。參數overwrite表示是否覆寫現有的方向列,name表示方向列的名稱。角度範圍值為[0,180]。
- add_timedelta: 用于計算移動對象的時間差。根據時間戳之間的差異來計算時間差。參數overwrite表示是否覆寫現有的時間差列,name表示時間差列的名稱。時間差為目前行與前一行之差。
- add_acceleration: 用于計算移動對象的加速度。根據速度和時間差來計算加速度。參數overwrite表示是否覆寫現有的加速度列,name表示加速度列的名稱,units表示加速度的機關。
以下代碼詳細介紹了這些函數的使用。
# 建立軌迹
df = pd.DataFrame([
{'geometry':Point(0,0), 't':datetime(2023,7,1,12,0,0)},
{'geometry':Point(6,0), 't':datetime(2023,7,1,12,0,6)},
{'geometry':Point(6,6), 't':datetime(2023,7,1,12,0,11)},
{'geometry':Point(9,9), 't':datetime(2023,7,1,12,0,14)}
]).set_index('t')
gdf = GeoDataFrame(df, crs='EPSG:3857')
toy_traj = mpd.Trajectory(gdf, 1)
toy_traj
Trajectory 1 (2023-07-01 12:00:00 to 2023-07-01 12:00:14) | Size: 4 | Length: 16.2m
Bounds: (0.0, 0.0, 9.0, 9.0)
LINESTRING (0 0, 6 0, 6 6, 9 9)
# 檢視軌迹的坐标系,以米為機關
toy_traj.crs
<Projected CRS: EPSG:3857>
Name: WGS 84 / Pseudo-Mercator
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: World between 85.06°S and 85.06°N.
- bounds: (-180.0, -85.06, 180.0, 85.06)
Coordinate Operation:
- name: Popular Visualisation Pseudo-Mercator
- method: Popular Visualisation Pseudo Mercator
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
# 添加速度列,不指定機關則以米為機關
toy_traj.add_speed(overwrite=True)
toy_traj.df
geometry | speed | |
t | ||
2023-07-01 12:00:00 | POINT (0.000 0.000) | 1.000000 |
2023-07-01 12:00:06 | POINT (6.000 0.000) | 1.000000 |
2023-07-01 12:00:11 | POINT (6.000 6.000) | 1.200000 |
2023-07-01 12:00:14 | POINT (9.000 9.000) | 1.414214 |
# 添加自定義定機關的速度列
toy_traj.add_speed(overwrite=True, name="speed (ft/min)", units=("ft", "min"))
toy_traj.df
geometry | speed | speed (ft/min) | |
t | |||
2023-07-01 12:00:00 | POINT (0.000 0.000) | 1.000000 | 196.850394 |
2023-07-01 12:00:06 | POINT (6.000 0.000) | 1.000000 | 196.850394 |
2023-07-01 12:00:11 | POINT (6.000 6.000) | 1.200000 | 236.220472 |
2023-07-01 12:00:14 | POINT (9.000 9.000) | 1.414214 | 278.388497 |
# 其他列的添加
toy_traj.add_distance(overwrite=True, name="distance (m)", units="m")
toy_traj.add_direction(overwrite=True)
toy_traj.add_timedelta(overwrite=True)
toy_traj.add_angular_difference(overwrite=True)
toy_traj.add_acceleration(overwrite=True, name="acceleration (mph/s)", units=("mi", "h", "s"))
toy_traj.df
geometry | speed | speed (ft/min) | distance (m) | direction | timedelta | angular_difference | acceleration (mph/s) | |
t | ||||||||
2023-07-01 12:00:00 | POINT (0.000 0.000) | 1.000000 | 196.850394 | 0.000000 | 90.0 | NaT | 0.0 | 0.000000 |
2023-07-01 12:00:06 | POINT (6.000 0.000) | 1.000000 | 196.850394 | 6.000000 | 90.0 | 0 days 00:00:06 | 0.0 | 0.000000 |
2023-07-01 12:00:11 | POINT (6.000 6.000) | 1.200000 | 236.220472 | 6.000000 | 0.0 | 0 days 00:00:05 | 90.0 | 0.089477 |
2023-07-01 12:00:14 | POINT (9.000 9.000) | 1.414214 | 278.388497 | 4.242641 | 45.0 | 0 days 00:00:03 | 45.0 | 0.159727 |
# 可視化某列結果繪制結果
toy_traj.plot(column='speed', linewidth=5, capstyle='round', figsize=(9,3), legend=True, vmax=20)
<Axes: >
1.2.2 軌迹位置提取函數
MovingPandas提供add_speed、add_distance、add_direction、add_timedelta、add_timedelta和add_acceleration等軌迹位置提取函數用于提取軌迹位置相關資訊資料。具體函數介紹如下:
- get_start_location(): 該函數用于擷取移動對象的起始位置。它傳回移動對象的起始位置坐标。
- get_end_location(): 該函數用于擷取移動對象的結束位置。它傳回移動對象的結束位置坐标。
- get_position_at(t, method="interpolated"): 該函數用于在給定的時間點(t)處擷取移動對象的位置。參數t表示所需位置的時間點,而可選的參數method指定了用于擷取位置的方法,預設為"interpolated",表示使用插值方法擷取位置。
- get_segment_between(t1, t2): 這個函數用于擷取兩個給定時間點(t1和t2)之間的移動對象的片段。它傳回包含這段時間内移動對象的子集。
- clip(polygon, point_based=False): 該函數用于剪裁移動對象,使其僅保留在指定多邊形區域内的部分。參數polygon表示用于剪裁的多邊形區域,而可選的參數point_based指定剪裁是否基于點而不是線段。預設情況下,剪裁是基于線段的。此外,MovingPandas也提供了intersection函數來算軌迹資料與空間幾何圖形之間的交集,效果類似clip函數,具體使用可以參考官方文檔。
以下代碼詳細介紹了這些函數的使用。
# 建立軌迹
df = pd.DataFrame([
{'geometry':Point(0,0), 't':datetime(2023,7,1,12,0,0)},
{'geometry':Point(6,0), 't':datetime(2023,7,1,12,6,6)},
{'geometry':Point(6,6), 't':datetime(2023,7,1,12,10,11)},
{'geometry':Point(9,9), 't':datetime(2023,7,1,12,19,14)}
]).set_index('t')
gdf = GeoDataFrame(df, crs='EPSG:3857')
toy_traj = mpd.Trajectory(gdf, 1)
toy_traj
Trajectory 1 (2023-07-01 12:00:00 to 2023-07-01 12:19:14) | Size: 4 | Length: 16.2m
Bounds: (0.0, 0.0, 9.0, 9.0)
LINESTRING (0 0, 6 0, 6 6, 9 9)
起始位置和結束位置
ax = toy_traj.plot()
# 提取軌迹起始點和結束點
gpd.GeoSeries(toy_traj.get_start_location()).plot(ax=ax, color='blue')
gpd.GeoSeries(toy_traj.get_end_location()).plot(ax=ax, color='red')
<Axes: >
中間位置
# 提取軌迹中的一個時間點的位置
t = datetime(2023,7,1,12,6,0)
# 改時間點不同位置資訊擷取方式
print(toy_traj.get_position_at(t, method="nearest"))
print(toy_traj.get_position_at(t, method="interpolated"))
print(toy_traj.get_position_at(t, method="ffill"))
print(toy_traj.get_position_at(t, method="bfill"))
POINT (6 0)
POINT (5.9016393442622945 0)
POINT (0 0)
POINT (6 0)
point = toy_traj.get_position_at(t, method="interpolated")
ax = toy_traj.plot()
gpd.GeoSeries(point).plot(ax=ax, color='red', markersize=100)
<Axes: >
軌迹片段
# 提取軌迹中的一個片段
segment = toy_traj.get_segment_between(datetime(2023,7,1,12,6,0), datetime(2023,7,1,12,12,0))
print(segment)
Trajectory 1_2023-07-01 12:06:00 (2023-07-01 12:06:06 to 2023-07-01 12:10:11) | Size: 2 | Length: 6.0m
Bounds: (6.0, 0.0, 6.0, 6.0)
LINESTRING (6 0, 6 6)
ax = toy_traj.plot()
segment.plot(ax=ax, color='red', linewidth=5)
<Axes: >
軌迹區域
xmin, xmax, ymin, ymax = 2, 8, -10, 5
# 設定區域
polygon = Polygon([(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin), (xmin, ymin)])
# 提取位于該區域的軌迹
intersections = toy_traj.clip(polygon)
intersections
TrajectoryCollection with 1 trajectories
# 繪制軌迹區域
ax = toy_traj.plot()
gpd.GeoSeries(polygon).plot(ax=ax, color='lightgray')
intersections.plot(ax=ax, color='red', linewidth=5, capstyle='round')
<Axes: >
# 單獨繪制軌迹區域
intersections = toy_traj.clip(polygon)
intersections.plot()
<Axes: >
1.2.3 與GeoPandas互動
MovingPandas提供to_point_gdf,to_line_gdf和to_traj_gdf函數以按照不同規則将軌迹類Trajectories和TrajectoryCollections轉換回為GeoPandas的GeoDataFrames結構。這些函數解釋如下:
- to_point_gdf(return_orig_tz=False),将軌迹對象轉換為與每個點相關的GeoDataFrame。return_orig_tz用于指定是否傳回與原始時區比對的時間戳。預設為False,時間戳會根據本地時區進行轉換。
- to_line_gdf(),将軌迹對象轉換為每條線相關的GeoDataFrame。
- to_traj_gdf(wkt=False, agg=False),傳回一個GeoDataFrame,其每一行表示一條軌迹。wkt用于指定是否将軌迹幾何表示為Well-Known Text (WKT)格式。預設為False,傳回幾何對象。agg用于指定是否對軌迹進行聚合。預設為False,不進行聚合,agg可選參數見代碼。
# 讀取資料
df = pd.read_csv('data/geolife_small.csv', delimiter=';')
df.head()
# 轉換為traj_collection對象
traj_collection = mpd.TrajectoryCollection(df, 'trajectory_id', t='t', x='X', y='Y')
print(traj_collection)
TrajectoryCollection with 5 trajectories
# 轉換為點GeoDataFrame
traj_collection.to_point_gdf().head()
fid | id | sequence | trajectory_id | tracker | geometry | |
t | ||||||
2008-12-11 04:42:14 | 1 | 1 | 1 | 1 | 19 | POINT (116.39131 39.89857) |
2008-12-11 04:42:16 | 2 | 2 | 2 | 1 | 19 | POINT (116.39132 39.89862) |
2008-12-11 04:43:26 | 3 | 3 | 3 | 1 | 19 | POINT (116.39093 39.89861) |
2008-12-11 04:43:32 | 4 | 4 | 4 | 1 | 19 | POINT (116.39083 39.89863) |
2008-12-11 04:43:47 | 5 | 5 | 5 | 1 | 19 | POINT (116.38941 39.89872) |
# 轉換為線GeoDataFrame
traj_collection.to_line_gdf().head()
fid | id | sequence | trajectory_id | tracker | t | prev_t | geometry | |
2 | 2 | 2 | 1 | 19 | 2008-12-11 04:42:16 | 2008-12-11 04:42:14 | LINESTRING (116.39131 39.89857, 116.39132 39.8... | |
1 | 3 | 3 | 3 | 1 | 19 | 2008-12-11 04:43:26 | 2008-12-11 04:42:16 | LINESTRING (116.39132 39.89862, 116.39093 39.8... |
2 | 4 | 4 | 4 | 1 | 19 | 2008-12-11 04:43:32 | 2008-12-11 04:43:26 | LINESTRING (116.39093 39.89861, 116.39083 39.8... |
3 | 5 | 5 | 5 | 1 | 19 | 2008-12-11 04:43:47 | 2008-12-11 04:43:32 | LINESTRING (116.39083 39.89863, 116.38941 39.8... |
4 | 6 | 6 | 6 | 1 | 19 | 2008-12-11 04:43:50 | 2008-12-11 04:43:47 | LINESTRING (116.38941 39.89872, 116.39052 39.8... |
# 每條軌迹單獨聚合GeoDataFrame
traj_collection.to_traj_gdf()
traj_id | start_t | end_t | geometry | length | direction | |
1 | 2008-12-11 04:42:14 | 2008-12-11 05:15:46 | LINESTRING (116.39131 39.89857, 116.39132 39.8... | 6207.020261 | 186.681376 | |
1 | 2 | 2009-06-29 07:02:25 | 2009-06-29 11:13:12 | LINESTRING (116.59096 40.07196, 116.59091 40.0... | 38764.575483 | 250.585295 |
2 | 3 | 2009-02-04 04:32:53 | 2009-02-04 11:20:12 | LINESTRING (116.38569 39.89977, 116.38565 39.8... | 12745.157506 | 304.115160 |
3 | 4 | 2009-03-10 10:36:45 | 2009-03-10 12:01:07 | LINESTRING (116.38805 39.90342, 116.38804 39.9... | 14363.780551 | 300.732843 |
4 | 5 | 2009-02-25 09:47:03 | 2009-02-25 14:31:24 | LINESTRING (116.38526 39.90027, 116.38525 39.9... | 39259.779560 | 305.200501 |
1.3 軌迹處理
1.3.1 軌迹提取
MovingPandas提供了TemporalSplitter、ObservationGapSplitter,StopSplitter,SpeedSplitter類來根據不同規則從軌迹中提取指定軌迹片段。具體如下:
- TemporalSplitter:使用規則的時間間隔将軌迹拆分為子區間。
- ObservationGapSplitter:根據觀測時間之間的間隔将軌迹拆分為子區間。如果兩個連續觀測之間的間隔超過指定的門檻值,則認為是該軌迹需要拆分。
- StopSplitter:根據停留點的定義将軌迹拆分為子區間。停留點是指軌迹在相對較小的區域内停留了一段時間。
- SpeedSplitter:據速度門檻值将軌迹拆分為子區間。如果軌迹在某段速度低于指定門檻值,則需要在此拆分軌迹。
這些類都隻需待處理軌迹進行初始化,然後調用split函數進行軌迹提取。具體使用見下列代碼。
import matplotlib.pyplot as plt
# 讀取資料
gdf = read_file('data/geolife_small.gpkg').to_crs('EPSG:3857')
tc = mpd.TrajectoryCollection(gdf, 'trajectory_id', t='t')
# 共五條軌迹
tc.to_traj_gdf()
traj_id | start_t | end_t | geometry | length | direction | |
1 | 2008-12-11 04:42:14 | 2008-12-11 05:15:46 | LINESTRING (12956620.805 4851214.113, 12956622... | 8101.428690 | 186.679744 | |
1 | 2 | 2009-06-29 07:02:25 | 2009-06-29 11:13:12 | LINESTRING (12978845.964 4876404.973, 12978840... | 50621.731208 | 250.500509 |
2 | 3 | 2009-02-04 04:32:53 | 2009-02-04 11:20:12 | LINESTRING (12955995.635 4851388.236, 12955991... | 16626.383723 | 304.099365 |
3 | 4 | 2009-03-10 10:36:45 | 2009-03-10 12:01:07 | LINESTRING (12956258.794 4851917.156, 12956257... | 18739.337154 | 300.716597 |
4 | 5 | 2009-02-25 09:47:03 | 2009-02-25 14:31:24 | LINESTRING (12955947.434 4851460.354, 12955946... | 51327.869585 | 305.185128 |
TemporalSplitter
# 将軌迹切分為以每小時為機關的軌迹片段,min_length表示删除軌迹長度小于該值的軌迹片段,min_lwngth取決于軌迹的機關
split = mpd.TemporalSplitter(tc).split(mode="hour", min_length=20000)
split.to_traj_gdf()
traj_id | start_t | end_t | geometry | length | direction | |
2_2009-06-29 07:00:00 | 2009-06-29 07:02:25 | 2009-06-29 07:59:55 | LINESTRING (12978845.964 4876404.973, 12978840... | 40106.105641 | 245.123482 | |
1 | 5_2009-02-25 10:00:00 | 2009-02-25 10:00:04 | 2009-02-25 10:54:47 | LINESTRING (12955251.019 4851210.485, 12955248... | 25455.673254 | 337.129155 |
# 繪制軌迹
fig, axes = plt.subplots(nrows=1, ncols=len(split), figsize=(6,4))
for i, traj in enumerate(split):
traj.plot(ax=axes[i], linewidth=3.0, capstyle='round', column='speed', vmax=20)
ObservationGapSplitter
# 如果兩個連續觀測超過間隔gap,如30分鐘,則認為該軌迹需要拆分
split = mpd.ObservationGapSplitter(tc).split(gap=timedelta(minutes=30),min_length=0)
split.to_traj_gdf()
traj_id | start_t | end_t | geometry | length | direction | |
1_0 | 2008-12-11 04:42:14 | 2008-12-11 05:15:46 | LINESTRING (12956620.805 4851214.113, 12956622... | 8101.428690 | 186.679744 | |
1 | 2_0 | 2009-06-29 07:02:25 | 2009-06-29 08:20:15 | LINESTRING (12978845.964 4876404.973, 12978840... | 47469.321958 | 252.952783 |
2 | 2_1 | 2009-06-29 10:57:17 | 2009-06-29 11:13:12 | LINESTRING (12948649.439 4867034.108, 12948650... | 3040.348707 | 139.615947 |
3 | 3_0 | 2009-02-04 04:32:53 | 2009-02-04 04:35:03 | LINESTRING (12955995.635 4851388.236, 12955991... | 311.729231 | 42.937430 |
4 | 3_1 | 2009-02-04 10:03:21 | 2009-02-04 11:20:12 | LINESTRING (12956011.999 4851497.646, 12956043... | 16228.264596 | 303.229612 |
5 | 4_0 | 2009-03-10 10:36:45 | 2009-03-10 12:01:07 | LINESTRING (12956258.794 4851917.156, 12956257... | 18739.337154 | 300.716597 |
6 | 5_0 | 2009-02-25 09:47:03 | 2009-02-25 10:54:47 | LINESTRING (12955947.434 4851460.354, 12955946... | 27149.500896 | 335.377580 |
7 | 5_1 | 2009-02-25 13:30:22 | 2009-02-25 14:31:24 | LINESTRING (12945965.972 4873487.115, 12945952... | 24074.321167 | 165.727187 |
StopSplitter
# 如果軌迹在某一點為圓心,直徑為10範圍内停留60s,則認為軌迹需要在該段分割
split = mpd.StopSplitter(tc).split(max_diameter=10, min_duration=timedelta(seconds=60))
split.to_traj_gdf()
traj_id | start_t | end_t | geometry | length | direction | |
1_2008-12-11 04:42:14 | 2008-12-11 04:42:14 | 2008-12-11 05:15:46 | LINESTRING (12956620.805 4851214.113, 12956622... | 8101.428690 | 186.679744 | |
1 | 2_2009-06-29 07:02:25 | 2009-06-29 07:02:25 | 2009-06-29 07:05:30 | LINESTRING (12978845.964 4876404.973, 12978840... | 608.233016 | 29.527683 |
2 | 2_2009-06-29 07:06:55 | 2009-06-29 07:06:55 | 2009-06-29 08:02:40 | LINESTRING (12979026.970 4876730.251, 12979022... | 41655.491556 | 246.215181 |
3 | 2_2009-06-29 08:03:40 | 2009-06-29 08:03:40 | 2009-06-29 11:13:12 | LINESTRING (12949605.674 4863764.794, 12949579... | 8333.942283 | 357.660458 |
4 | 3_2009-02-04 04:32:53 | 2009-02-04 04:32:53 | 2009-02-04 11:20:12 | LINESTRING (12955995.635 4851388.236, 12955991... | 16626.383723 | 304.099365 |
5 | 4_2009-03-10 10:36:45 | 2009-03-10 10:36:45 | 2009-03-10 12:01:07 | LINESTRING (12956258.794 4851917.156, 12956257... | 18739.337154 | 300.716597 |
6 | 5_2009-02-25 09:47:03 | 2009-02-25 09:47:03 | 2009-02-25 14:31:24 | LINESTRING (12955947.434 4851460.354, 12955946... | 51327.869585 | 305.185128 |
SpeedSplitter
# 把超過30分鐘速度低于1地理系統機關/秒的軌迹分開
split = mpd.SpeedSplitter(tc).split(speed=1, duration=timedelta(minutes=30))
split.to_traj_gdf()
traj_id | start_t | end_t | geometry | length | direction | |
1_0 | 2008-12-11 04:42:14 | 2008-12-11 05:15:46 | LINESTRING (12956620.805 4851214.113, 12956622... | 8048.160604 | 186.679744 | |
1 | 2_0 | 2009-06-29 07:02:25 | 2009-06-29 08:20:15 | LINESTRING (12978845.964 4876404.973, 12978840... | 47336.977010 | 252.952783 |
2 | 2_1 | 2009-06-29 10:57:22 | 2009-06-29 11:10:07 | LINESTRING (12948650.441 4867044.718, 12948642... | 2915.988294 | 138.780873 |
3 | 3_0 | 2009-02-04 04:32:53 | 2009-02-04 04:35:03 | LINESTRING (12955995.635 4851388.236, 12955991... | 310.440780 | 42.937430 |
4 | 3_1 | 2009-02-04 10:03:23 | 2009-02-04 11:20:12 | LINESTRING (12956043.836 4851524.490, 12956025... | 15962.930350 | 302.882421 |
5 | 4_0 | 2009-03-10 10:36:45 | 2009-03-10 12:01:07 | LINESTRING (12956258.794 4851917.156, 12956257... | 18349.431950 | 300.716597 |
6 | 5_0 | 2009-02-25 09:47:03 | 2009-02-25 10:54:47 | LINESTRING (12955947.434 4851460.354, 12955946... | 27081.554127 | 335.377580 |
7 | 5_1 | 2009-02-25 13:30:23 | 2009-02-25 14:31:24 | LINESTRING (12945952.057 4873516.928, 12945956... | 24006.683028 | 165.708568 |
1.3.2 軌迹壓縮與平滑
MovingPandas提供了各種軌迹壓縮類和軌迹平滑類,以減少軌迹對象的大小(點的數量),進而加快處理速度。
壓縮軌迹類都隻需待處理軌迹進行初始化,然後調用generalize函數進行軌迹壓縮。用于壓縮軌迹的類有:
- MinDistanceGeneralizer(最小距離):通過删除原始資料中的一些點來壓縮軌迹,被删除的點與相鄰點之間的距離必須小于指定的最小距離。
- DouglasPeuckerGeneralizer(道格拉斯-普克):道格拉斯-普克算法根據指定的門檻值,逐漸删除原始資料中的部分點,進而生成一個近似的簡化線串或軌迹。
- MaxDistanceGeneralizer(最大距離):删除原始資料中與相鄰點之間距離超過指定最大距離的點,進而實作軌迹壓縮。
- MinTimeDeltaGeneralizer(最小時間間隔):通過删除兩個連續時間戳之間時間間隔小于指定最小時間間隔的資料點來實作軌迹壓縮。
- TopDownTimeRatioGeneralizer(自頂向下時間比率):根據預先設定的時間比率,逐漸删除原始資料中的資料點。
平滑軌迹類目前隻提供了KalmanSmootherCV類進行軌迹平滑,KalmanSmootherCV需要額外安裝第三庫且處理流程麻煩,是以一般都是效果相近的壓縮軌迹類。
# 讀取資料
gdf = read_file('data/geolife_small.gpkg').to_crs('EPSG:3857')
tc = mpd.TrajectoryCollection(gdf, 'trajectory_id', t='t')
# 提取單條軌迹進行操作
original_traj = tc.trajectories[1]
# 可以看到軌迹包括897個點
print(original_traj)
Trajectory 2 (2009-06-29 07:02:25 to 2009-06-29 11:13:12) | Size: 897 | Length: 50621.7m
Bounds: (12948595.449314836, 4861831.088215791, 12979030.643375682, 4877940.244020148)
LINESTRING (12978845.964340456 4876404.972802613, 12978840.175726935 4876411.664456934, 12978837.281
# tolerance設定連續點之間的最小距離
mpd.MinDistanceGeneralizer(original_traj).generalize(tolerance=1.0)
Trajectory 2 (2009-06-29 07:02:25 to 2009-06-29 11:13:12) | Size: 818 | Length: 50611.6m
Bounds: (12948595.449314836, 4861831.088215791, 12979030.643375682, 4877940.244020148)
LINESTRING (12978845.964340456 4876404.972802613, 12978840.175726935 4876411.664456934, 12978837.281
# tolerance表示距離公差,具體使用見算法介紹
mpd.DouglasPeuckerGeneralizer(original_traj).generalize(tolerance=1.0)
Trajectory 2 (2009-06-29 07:02:25 to 2009-06-29 11:13:12) | Size: 564 | Length: 50606.8m
Bounds: (12948595.449314836, 4861831.088215791, 12979030.643375682, 4877940.244020148)
LINESTRING (12978845.964340456 4876404.972802613, 12978837.281420175 4876414.573873267, 12978852.086
# tolerance設定連續點之間的最大距離
mpd.MaxDistanceGeneralizer(original_traj).generalize(tolerance=1.0)
Trajectory 2 (2009-06-29 07:02:25 to 2009-06-29 11:13:12) | Size: 324 | Length: 50166.8m
Bounds: (12948595.449314836, 4861831.088215791, 12979029.752819754, 4877940.244020148)
LINESTRING (12978845.964340456 4876404.972802613, 12978837.281420175 4876414.573873267, 12978852.086
# tolerance連續軌迹最小的時間距離
mpd.MinTimeDeltaGeneralizer(original_traj).generalize(tolerance=timedelta(minutes=1))
Trajectory 2 (2009-06-29 07:02:25 to 2009-06-29 11:13:12) | Size: 76 | Length: 47565.6m
Bounds: (12948636.414887449, 4862053.18880025, 12979029.30754179, 4877912.7458354365)
LINESTRING (12978845.964340456 4876404.972802613, 12978815.79675845 4876446.868451926, 12978780.3971
# tolerance距離公差,該類處理速度很慢
mpd.TopDownTimeRatioGeneralizer(original_traj).generalize(tolerance=1.0)
Trajectory 2 (2009-06-29 07:02:25 to 2009-06-29 11:13:12) | Size: 756 | Length: 50616.4m
Bounds: (12948595.449314836, 4861831.088215791, 12979030.643375682, 4877940.244020148)
LINESTRING (12978845.964340456 4876404.972802613, 12978840.175726935 4876411.664456934, 12978837.281
1.3.3 軌迹停留檢測
MovingPandas提供了軌迹停留檢測的功能,事實上軌迹停留并不意味着軌迹運動速度為零,事實上由于跟蹤不準确,物體移動速度很少達到0,例如GPS軌迹往往會在物體的停止位置周圍不斷移動。是以MovingPandas檢測軌迹停留的方式為如果物體停留在指定大小的區域内超過一段時間,則判定為軌迹停留。具體使用如下:
# 讀取資料
gdf = read_file('data/geolife_small.gpkg').to_crs('EPSG:3857')
tc = mpd.TrajectoryCollection(gdf, 'trajectory_id', t='t')
# 使用單個軌迹進行運算,也可以軌迹集合
my_traj = tc.trajectories[0]
my_traj
Trajectory 1 (2008-12-11 04:42:14 to 2008-12-11 05:15:46) | Size: 466 | Length: 8101.4m
Bounds: (12955985.950308602, 4845963.532957837, 12956871.051579898, 4851235.877839145)
LINESTRING (12956620.805364596 4851214.112520242, 12956622.141198486 4851220.49700885, 12956578.8379
# 建立停留檢測器
detector = mpd.TrajectoryStopDetector(tc)
# 擷取在某個直徑為10的區域停留60s以上的軌迹時間範圍
stop_time_ranges = detector.get_stop_time_ranges(min_duration=timedelta(seconds=60), max_diameter=10)
for x in stop_time_ranges:
print(x)
Traj 2: 2009-06-29 07:05:30 - 2009-06-29 07:06:55 (duration: 0 days 00:01:25)
Traj 2: 2009-06-29 08:02:40 - 2009-06-29 08:03:40 (duration: 0 days 00:01:00)
# 擷取在某個直徑為10的區域停留60s以上的軌迹點
stop_points = detector.get_stop_points(min_duration=timedelta(seconds=60), max_diameter=10)
stop_points
geometry | start_time | end_time | traj_id | duration_s | |
stop_id | |||||
2_2009-06-29 07:05:30 | POINT (12979027.415 4876729.960) | 2009-06-29 07:05:30 | 2009-06-29 07:06:55 | 2 | 85.0 |
2_2009-06-29 08:02:40 | POINT (12949605.785 4863764.939) | 2009-06-29 08:02:40 | 2009-06-29 08:03:40 | 2 | 60.0 |
# 擷取在某個直徑為10的區域停留60s以上的軌迹片段
stops = detector.get_stop_segments(min_duration=timedelta(seconds=60), max_diameter=10)
stops
TrajectoryCollection with 2 trajectories
1.3.4 軌迹聚類
Tmovingpandas提供TrajectoryCollectionAggregator類,用于将多個軌迹集合(TrajectoryCollection)合并為一個更大的軌迹集合。具體使用如下:
# 讀取資料
gdf = read_file('data/geolife_small.gpkg').to_crs('EPSG:3857')
tc = mpd.TrajectoryCollection(gdf, 'trajectory_id', t='t')
# 簡化軌迹
generalized = mpd.MinDistanceGeneralizer(tc).generalize(tolerance=100)
# 聚類
aggregator = mpd.TrajectoryCollectionAggregator(generalized, max_distance=1000, min_distance=100, min_stop_duration=timedelta(minutes=5))
# 從合并的軌迹集合中提取顯著點,并傳回一個Geopandas的GeoDataFrame對象
pts = aggregator.get_significant_points_gdf()
pts.head()
geometry | |
POINT (12956620.805 4851214.113) | |
1 | POINT (12956054.412 4846377.879) |
2 | POINT (12956409.855 4851235.878) |
3 | POINT (12956533.642 4851120.522) |
4 | POINT (12956436.794 4851148.817) |
# 合并的軌迹集合視為一個整體,并将其劃分為不同的簇
# 傳回每個簇的質心結果
clusters = aggregator.get_clusters_gdf()
clusters.head()
geometry | n | |
POINT (12955779.665 4851264.376) | 30 | |
1 | POINT (12956533.123 4846314.334) | 6 |
2 | POINT (12956511.072 4849943.391) | 4 |
3 | POINT (12956768.526 4848886.514) | 2 |
4 | POINT (12956668.895 4847819.306) | 2 |
# 将合并後的軌迹資料轉換為一個GeoDataFrame對象,其中每一條記錄代表着一次移動(即一段軌迹)
flows = aggregator.get_flows_gdf()
flows.head()
geometry | weight | |
LINESTRING (12955779.665 4851264.376, 12956511... | 1 | |
1 | LINESTRING (12956511.072 4849943.391, 12956768... | 1 |
2 | LINESTRING (12956768.526 4848886.514, 12956668... | 1 |
3 | LINESTRING (12956668.895 4847819.306, 12956533... | 1 |
4 | LINESTRING (12978848.347 4876830.605, 12978353... | 1 |
1.3.5 軌迹距離計算
MovingPandas提供了distance函數和hausdorff_distance函數來計算兩個幾何對象的距離。distance函數基于shapely-objectdistance計算距離,而hausdorff_distance傳回豪斯多夫距離。使用示例如下:
# 定義軌迹1
df = pd.DataFrame([
{'geometry':Point(0,0), 't':datetime(2023,7,1,12,0,0)},
{'geometry':Point(6,0), 't':datetime(2023,7,1,12,6,0)},
{'geometry':Point(6,6), 't':datetime(2023,7,1,12,10,0)},
{'geometry':Point(9,9), 't':datetime(2023,7,1,12,15,0)}
]).set_index('t')
# 機關為米
geo_df = GeoDataFrame(df, crs='EPSG:3857')
toy_traj = mpd.Trajectory(geo_df, 1)
# 定義軌迹2
df = pd.DataFrame([
{'geometry':Point(3,3), 't':datetime(2023,7,1,12,0,0)},
{'geometry':Point(3,9), 't':datetime(2023,7,1,12,6,0)},
{'geometry':Point(2,9), 't':datetime(2023,7,1,12,10,0)},
{'geometry':Point(0,7), 't':datetime(2023,7,1,12,15,0)}
]).set_index('t')
geo_df = GeoDataFrame(df, crs='EPSG:3857')
toy_traj2 = mpd.Trajectory(geo_df, 1)
# 展示軌迹
ax = toy_traj.plot()
toy_traj2.plot(ax=ax, color='red')
<Axes: >
# 計算距離,預設機關由坐标系機關決定,該坐标系機關為米
print(toy_traj.distance(toy_traj2))
# 計算hausdorff距離,預設機關由坐标系機關決定。
print(toy_traj.hausdorff_distance(toy_traj2))
3.0
6.082762530298219
# 計算距離,units自定義機關為厘米
print(toy_traj.distance(toy_traj2, units="cm"))
# 計算hausdorff距離,units自定義機關為英寸
print(toy_traj.hausdorff_distance(toy_traj2, units="ft"))
300.0
19.95656998129337
2 軌迹繪圖執行個體之海鷗遷徙軌迹分析
基于海鷗遷徙資料來分析其移動軌迹。資料下載下傳位址:movingpandas-examples-data。
step1 加載資料
df = read_file('data/gulls.gpkg')
# 展示資料
df.head()
event-id | visible | timestamp | location-long | location-lat | sensor-type | individual-taxon-canonical-name | tag-local-identifier | individual-local-identifier | study-name | geometry | |
1082620685 | true | 2009-05-27 14:00:00.000 | 24.58617 | 61.24783 | gps | Larus fuscus | 91732 | 91732A | Navigation experiments in lesser black-backed ... | POINT (24.58617 61.24783) | |
1 | 1082620686 | true | 2009-05-27 20:00:00.000 | 24.58217 | 61.23267 | gps | Larus fuscus | 91732 | 91732A | Navigation experiments in lesser black-backed ... | POINT (24.58217 61.23267) |
2 | 1082620687 | true | 2009-05-28 05:00:00.000 | 24.53133 | 61.18833 | gps | Larus fuscus | 91732 | 91732A | Navigation experiments in lesser black-backed ... | POINT (24.53133 61.18833) |
3 | 1082620688 | true | 2009-05-28 08:00:00.000 | 24.58200 | 61.23283 | gps | Larus fuscus | 91732 | 91732A | Navigation experiments in lesser black-backed ... | POINT (24.58200 61.23283) |
4 | 1082620689 | true | 2009-05-28 14:00:00.000 | 24.58250 | 61.23267 | gps | Larus fuscus | 91732 | 91732A | Navigation experiments in lesser black-backed ... | POINT (24.58250 61.23267) |
# 檢視資料次元
df.shape
(89867, 11)
# 展示資料坐标點
df.plot()
<Axes: >
# 海鷗唯一id
df['individual-local-identifier'].unique()
array(['91732A', '91733A', '91734A', '91735A', '91737A', '91738A',
'91739A', '91740A', '91741A', '91742A', '91743A', '91744A',
'91745A', '91746A', '91747A', '91748A', '91749A', '91750A',
'91751A', '91752A', '91754A', '91755A', '91756A', '91758A',
'91759A', '91761A', '91762A', '91763A', '91764A', '91765A',
'91766A', '91767A', '91769A', '91771A', '91774A', '91775A',
'91776A', '91777A', '91778A', '91779A', '91780A', '91781A',
'91782A', '91783A', '91785A', '91786A', '91787A', '91788A',
'91789A', '91794A', '91795A', '91797A', '91798A', '91799A',
'91800A', '91802A', '91803A', '91807A', '91809A', '91810A',
'91811A', '91812A', '91813A', '91814A', '91815A', '91816A',
'91819A', '91821A', '91823A', '91824A', '91825A', '91826A',
'91827A', '91828A', '91829A', '91830A', '91831A', '91832A',
'91835A', '91836A', '91837A', '91838A', '91839A', '91843A',
'91845A', '91846A', '91848A', '91849A', '91852A', '91854A',
'91858A', '91861A', '91862A', '91864A', '91865A', '91866A',
'91870A', '91871A', '91872A', '91875A', '91876A', '91877A',
'91878A', '91880A', '91881A', '91884A', '91885A', '91893A',
'91894A', '91897A', '91900A', '91901A', '91903A', '91907A',
'91908A', '91910A', '91911A', '91913A', '91916A', '91918A',
'91919A', '91920A', '91921A', '91924A', '91929A', '91930A'],
dtype=object)
# 海鷗數量
len(df['individual-local-identifier'].unique())
126
# 檢視海鷗個體軌迹記錄數,可以看到呈長尾分布
df['individual-local-identifier'].value_counts().plot(kind='bar', figsize=(16,4))
<Axes: xlabel='individual-local-identifier'>
# 建立軌迹集合,以海鷗編号為軌迹編号
tc = mpd.TrajectoryCollection(df, traj_id_col='individual-local-identifier', t='timestamp', min_length=100, crs='EPSG:3857')
tc
TrajectoryCollection with 125 trajectories
# 壓縮軌迹
tc = mpd.MinTimeDeltaGeneralizer(tc).generalize(tolerance=timedelta(days=1))
step2 個體分析
# 挑選個體編号91916A的海鷗軌迹,該海鷗軌迹記錄數最多
filtered = tc.filter('individual-local-identifier', '91916A')
my_traj = filtered.trajectories[0].copy()
my_traj
Trajectory 91916A (2009-08-15 15:00:00 to 2015-08-27 09:00:00) | Size: 2134 | Length: 77841407.0m
Bounds: (7.915, 20.70533, 39.51317, 61.54817)
LINESTRING (7.915 54.18533, 9.44183 54.87233, 9.4425 54.87283, 9.41167 54.8555, 9.39583 54.88, 9.396
# 展示軌迹資訊
# my_traj.hvplot(title='Movement', line_width=2)
my_traj.plot()
<Axes: >
由上圖可以看到該海鷗一直呈現季節性來回移動。是以按照年份,将該海鷗的軌迹資訊進行拆分。
trips_by_year = mpd.TemporalSplitter(filtered).split(mode='year')
trips_by_year.to_traj_gdf()
traj_id | start_t | end_t | geometry | length | direction | |
91916A_2009-12-31 00:00:00 | 2009-08-15 15:00:00 | 2009-12-31 19:00:00 | LINESTRING (7.91500 54.18533, 9.44183 54.87233... | 5.352375e+06 | 131.939002 | |
1 | 91916A_2010-12-31 00:00:00 | 2010-01-01 19:00:00 | 2010-12-31 07:00:00 | LINESTRING (39.18417 21.17583, 39.18833 21.170... | 1.232428e+07 | 281.047091 |
2 | 91916A_2011-12-31 00:00:00 | 2011-01-01 07:00:00 | 2011-12-31 04:00:00 | LINESTRING (39.17000 21.18017, 39.16883 21.156... | 1.441793e+07 | 189.238229 |
3 | 91916A_2012-12-31 00:00:00 | 2012-01-01 04:00:00 | 2012-12-31 19:00:00 | LINESTRING (39.16933 21.16667, 39.17567 21.142... | 1.324811e+07 | 62.132640 |
4 | 91916A_2013-12-31 00:00:00 | 2013-01-01 19:00:00 | 2013-12-31 13:00:00 | LINESTRING (39.18167 21.17333, 39.18100 21.173... | 1.293261e+07 | 273.165851 |
5 | 91916A_2014-12-31 00:00:00 | 2014-01-01 13:00:00 | 2014-12-31 19:00:00 | LINESTRING (39.17083 21.15400, 39.17100 21.157... | 1.300973e+07 | 33.742967 |
6 | 91916A_2015-12-31 00:00:00 | 2015-01-01 19:00:00 | 2015-08-27 09:00:00 | LINESTRING (39.18167 21.16967, 39.18233 21.169... | 6.551740e+06 | 343.887905 |
# 按照年份顯示軌迹
from matplotlib.cm import tab20
ax = None
for index,tmp in enumerate(trips_by_year):
if ax is None:
ax = tmp.plot(color=tab20(index))
else:
ax= tmp.plot(ax=ax,color=tab20(index))
我們可以單獨提取某一年的資料進行可視化,如下展示某一年速度:
one_year = trips_by_year.get_trajectory('91916A_2010-12-31 00:00:00')
one_year.add_speed(overwrite=True,units=("km", "h"))
# 可視化當年的軌迹
one_year.plot(column='speed', cmap='tab20', legend=True)
<Axes: >
當然也可以可是軌迹中某一天的結果,如下所示:
fig, ax = plt.subplots(figsize=(5, 5))
one_year.plot(ax=ax, linewidth=2.0, color='black')
GeoDataFrame([one_year.get_row_at(datetime(2010,9,1))]).plot(ax=ax, color='red',zorder=2)
GeoDataFrame([one_year.get_row_at(datetime(2010,10,1))]).plot(ax=ax, color='red',zorder=2)
GeoDataFrame([one_year.get_row_at(datetime(2010,11,1))]).plot(ax=ax, color='red',zorder=2)
<Axes: >
此外也可以選擇一個區域,比如栖息地。以确定某一軌迹到達該區域的次數及進出時間。
area_of_interest = Polygon([(30, 25), (50, 25), (50, 15), (30, 15), (30, 25)])
plotted_area_of_interest = GeoDataFrame(pd.DataFrame([{'geometry': area_of_interest, 'id': 1}]), crs='EPSG:3857')
# 輸入的是單個軌迹得到的單個軌迹位于該區域的片段
arrivals = [traj for traj in my_traj.clip(area_of_interest)]
# 位于該區域的軌迹片段數
print("Found {} arrivals".format({len(arrivals)}))
# 提取每條軌迹到達該區域的時間
for traj in arrivals:
name = traj.df['individual-local-identifier'].iloc[0]
start_time = traj.get_start_time()
end_time = traj.get_end_time()
print("Individual {} arrived at {} left at {}".format(name, start_time, end_time))
Found {6} arrivals
Individual 91916A arrived at 2009-12-23 02:58:36.946571 left at 2010-04-17 14:31:42.170632
Individual 91916A arrived at 2010-10-30 20:55:36.697832 left at 2011-04-03 11:40:12.803103
Individual 91916A arrived at 2011-11-09 20:25:19.550486 left at 2012-04-04 09:12:15.852988
Individual 91916A arrived at 2012-10-14 11:25:28.063070 left at 2013-03-30 04:22:18.125679
Individual 91916A arrived at 2013-10-08 04:17:33.524488 left at 2014-03-29 18:26:54.311106
Individual 91916A arrived at 2014-10-28 19:05:32.941608 left at 2015-04-07 22:59:45.284499
step3 群體分析
step2中對單個軌迹Trajectory類進行分析,同樣的函數也可以應用到軌迹集合TrajectoryCollection類中以對多條軌迹進行分析。
year_of_interest = 2010
# 輸入的是多個軌迹得到的位于該區域的單個軌迹
trajs_in_aoi = tc.clip(area_of_interest)
# 提取2010到達該區域的軌迹
relevant = [ traj for traj in trajs_in_aoi if traj.get_start_time().year <= year_of_interest
and traj.get_end_time().year >= year_of_interest]
# 檢視有多少隻海鷗,也就是多少條單個軌迹到達該區域
print("Found {} arrivals".format(len(relevant)))
Found 16 arrivals
# 檢視哪些海鷗到達該區域以及停留時間
for traj in relevant:
print("Individual '{}' arrived at {} (duration: {})".format(
traj.df['individual-local-identifier'].iloc[0], traj.get_start_time().date(),
traj.get_end_time()-traj.get_start_time()))
Individual '91732A' arrived at 2010-04-10 (duration: 5 days, 21:27:11.670985)
Individual '91737A' arrived at 2009-12-08 (duration: 140 days, 11:11:29.473206)
Individual '91761A' arrived at 2010-04-11 (duration: 12 days, 6:10:03.857850)
Individual '91761A' arrived at 2010-10-04 (duration: 6 days, 10:42:00.340093)
Individual '91762A' arrived at 2010-04-19 (duration: 42 days, 1:28:22.699066)
Individual '91771A' arrived at 2009-12-23 (duration: 66 days, 8:05:31.053782)
Individual '91789A' arrived at 2009-11-10 (duration: 550 days, 20:39:18.769409)
Individual '91824A' arrived at 2010-05-06 (duration: 12:53:53.409236)
Individual '91832A' arrived at 2010-04-21 (duration: 3 days, 5:48:46.276706)
Individual '91832A' arrived at 2010-09-23 (duration: 1 day, 1:40:25.175880)
Individual '91837A' arrived at 2010-05-04 (duration: 1 day, 18:38:46.170554)
Individual '91846A' arrived at 2010-05-15 (duration: 10 days, 2:50:28.505732)
Individual '91862A' arrived at 2010-01-06 (duration: 248 days, 6:10:16.962620)
Individual '91910A' arrived at 2010-09-28 (duration: 2 days, 20:34:31.117736)
Individual '91916A' arrived at 2009-12-23 (duration: 115 days, 11:33:05.224061)
Individual '91916A' arrived at 2010-10-30 (duration: 154 days, 14:44:36.105271)
對于'91761A'這隻海鷗,可以看到進入該區域兩次。我們可以展示這隻海鷗在當年的飛行軌迹并添加背景地圖以更好展示這隻海鷗的行進情況。
my_traj = tc.get_trajectory('91761A')
# 提取當年飛行結果
my_traj = my_traj.get_segment_between(datetime(year_of_interest,1,1), datetime(year_of_interest,12,31))
# 壓縮軌迹,tolerance越大壓縮比例越大
my_traj = mpd.DouglasPeuckerGeneralizer(my_traj).generalize(tolerance=2.0)
my_traj.plot()
<Axes: >
通過如下代碼繪制該海鷗的飛行軌迹,可以看到其在去途和歸途都穿過了該地區,其中添加的背景地圖具體使用方法見GeoPandas疊加背景地圖。
import contextily as cx
fig, ax = plt.subplots(figsize=(3, 5))
# 添加飛行軌迹
ax = my_traj.plot(ax=ax,color='blue', capstyle='round')
# 添加停留區域
ax = gpd.GeoSeries(area_of_interest).plot(ax=ax, color='lightgray')
# 繪制1月到11月的位置,12月該鳥飛回原地
for i in range(1,12):
# 紅點表示上半年,綠點表示下半年
color = 'red' if i<6 else 'green'
GeoDataFrame([my_traj.get_row_at(datetime(year_of_interest,i,1))]).plot(ax=ax, color=color,zorder=2)
# 添加背景地圖,zoom越大越精細,這裡使用自适應zoom。
cx.add_basemap(ax = ax , source=cx.providers.Stamen.TonerLite,crs=my_traj.df.crs, zoom='auto',attribution="")
# 儲存圖檔
fig.savefig("res.jpg",dpi=300)
3 參考
- movingpandas
- movingpandas-examples
- movingpandas-examples-data
- shapely-objectdistance
- 豪斯多夫距離
- GeoPandas疊加背景地圖