MovingPandas是一個基于Python和GeoPandas的開源地理時空資料處理庫，用于處理移動物體的軌迹資料。關于MovingPandas的使用見文章：MovingPandas入門指北，本文主要介紹三個MovingPandas的繪圖執行個體。 MovingPandas官方倉庫位址為：movingpandas。MovingPandas官方示例代碼倉庫位址為：movingpandas-examples。本文所有實驗資料來自于：movingpandas-examples-data。

本文所有代碼見：Python-Study-Notes

MovingPandas作者推薦在Python 3.8及以上環境下安裝MovingPandas，并建議使用conda進行安裝。可以使用以下指令來安裝MovingPandas：

conda install -c conda-forge movingpandas

由于MovingPandas的依賴環境較為複雜，是以不推薦使用pip進行安裝。如果堅持使用pip進行安裝，可以按照以下指令來安裝MovingPandas：

pip install movingpandas
# 本文必安裝第三方庫
pip install contextily
pip install seaborn
# 以下第三方庫可選
pip install hvplot
pip install cartopy
pip install geoviews

下面的代碼展示了MovingPandas的版本資訊，本文所用Python版本為Python3.10。

# jupyter notebook環境去除warning
import warnings
warnings.filterwarnings("ignore")

# 檢視movingpandas版本及依賴庫版本資訊
import movingpandas as mpd
mpd.show_versions()

MovingPandas 0.16.1

SYSTEM INFO
-----------
python     : 3.10.10 (main, Mar 21 2023, 18:45:11) [GCC 11.2.0]
executable : /opt/conda/envs/python35-paddle120-env/bin/python
machine    : Linux-5.4.0-109-generic-x86_64-with-glibc2.31

GEOS, GDAL, PROJ INFO
---------------------
GEOS       : None
GEOS lib   : None
GDAL       : 3.6.4
GDAL data dir: /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/fiona/gdal_data
PROJ       : 9.2.1
PROJ data dir: /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/pyproj/proj_dir/share/proj

PYTHON DEPENDENCIES
-------------------
geopandas  : 0.13.2
pandas     : 2.0.2
fiona      : 1.9.4.post1
numpy      : 1.24.3
shapely    : 2.0.1
rtree      : 1.0.1
pyproj     : 3.6.0
matplotlib : 3.7.1
mapclassify: None
geopy      : 2.3.0
holoviews  : None
hvplot     : None
geoviews   : None
stonesoup  : None

以下代碼用于加載繪圖所需第三方庫。

import numpy as np
import pandas as pd
import geopandas as gpd
import movingpandas as mpd
import shapely as shp
import matplotlib.pyplot as plt

from geopandas import GeoDataFrame, read_file
from shapely.geometry import Point, LineString, Polygon
from datetime import datetime, timedelta
from os.path import exists
from urllib.request import urlretrieve
import contextily as cx
import warnings
import seaborn as sns
warnings.filterwarnings('ignore')

plot_defaults = {'linewidth':5, 'capstyle':'round', 'figsize':(9,3), 'legend':True}

1 船舶資料分析示例

船舶自動識别系統（Automatic Identification System，簡稱AIS）是一種用于船舶間的自動通信系統。它通過無線電信号在船舶之間傳輸資訊，讓附近的其他船舶和岸上監控站能夠獲得船舶的資訊，如船名、船籍國、船舶類型、船舶位置、航向、航速等。本教程使用丹麥海事局釋出的AIS資料，所提取的AIS記錄樣本涵蓋了2017年7月5日哥德堡附近的船舶交通資料。本章通過對AIS資料進行軌迹資料分析，能夠獲得有關船舶交通有價值的見解。

step1 加載資料

# 加載資料
df = read_file('data/ais.gpkg')
# 檢視資料
# Timestamp: 時間戳，訓示AIS資料記錄的時間。它表示了記錄被建立或接收的日期和時間。
# MMSI: 船舶識别碼（Maritime Mobile Service Identity），是一個唯一的數字辨別符。
# NavStatus: 導航狀态，訓示船舶目前的導航狀态或活動狀态。
# SOG: 對地航速（Speed Over Ground），指船舶相對于地面的速度。
# COG: 對地航向（Course Over Ground），表示船舶相對于地面的航向。
# Name: 船名，是船舶的名稱或辨別符。
# ShipType: 船舶類型，表示船舶的類别或類型。
# geometry: 幾何資訊，表示船舶的空間位置。
df.head()

Timestamp	MMSI	NavStatus	SOG	COG	Name	ShipType	geometry
05/07/2017 00:00:03	219632000	Under way using engine	0.0	270.4	NaN	Undefined	POINT (11.85958 57.68817)
1	05/07/2017 00:00:05	265650970	Under way using engine	0.0	0.5	NaN	Undefined	POINT (11.84175 57.66150)
2	05/07/2017 00:00:06	265503900	Under way using engine	0.0	0.0	NaN	Undefined	POINT (11.90650 57.69077)
3	05/07/2017 00:00:14	219632000	Under way using engine	0.0	188.4	NaN	Undefined	POINT (11.85958 57.68817)
4	05/07/2017 00:00:19	265519650	Under way using engine	0.0	357.2	NaN	Undefined	POINT (11.87192 57.68233)

# 檢視資料的坐标系
df_crs = df.crs
df_crs

<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

# 可視化資料點
df.plot()

<Axes: >

[資料分析與可視化] Python繪制資料地圖5-MovingPandas繪圖執行個體

# 檢視各船舶的對地航速
df['SOG'].hist(bins=100, figsize=(12,3))

<Axes: >

對于資料分析，首先删除那些對地航速為0的樣本：

# 列印資料次元
print("原始資料樣本數：{}".format(len(df)))
df = df[df.SOG>0]
print("删除對地航速為0後資料樣本數：{}".format(len(df)))

原始資料樣本數：84702
删除對地航速為0後資料樣本數：33593

檢視船隻有哪些類型：

df['ShipType'].value_counts().plot(kind='bar', figsize=(12,3))

<Axes: xlabel='ShipType'>

df['t'] = pd.to_datetime(df['Timestamp'], format='%d/%m/%Y %H:%M:%S')
# 建立軌迹，軌迹資料點數最少為100
traj_collection = mpd.TrajectoryCollection(df, 'MMSI', t='t', min_length=100)
# 壓縮軌迹
traj_collection = mpd.MinTimeDeltaGeneralizer(traj_collection).generalize(tolerance=timedelta(minutes=1))
traj_collection

TrajectoryCollection with 77 trajectories

step2 資料可視化

我們可以展示不同類型船隻的航行軌迹。其中添加的背景地圖具體使用方法見GeoPandas疊加背景地圖。

shiptype_to_color = {'Passenger': 'blue', 'HSC': 'green', 'Tanker': 'red', 'Cargo': 'orange'}
ax = traj_collection.plot(column='ShipType', column_to_color=shiptype_to_color, linewidth=1, capstyle='round')
# 添加背景地圖,zoom越大越精細，這裡使用自适應zoom。
cx.add_basemap(ax = ax , source=cx.providers.Stamen.TonerLite,crs=traj_collection.trajectories[0].crs, zoom='auto',attribution="")

# 單獨展示某種類型船隻的航線軌迹
passenger = traj_collection.filter('ShipType', 'Passenger')
passenger.plot()

<Axes: >

也可以單獨展示某條軌迹。

my_traj = traj_collection.trajectories[0]
my_traj.df.head()

Timestamp	MMSI	NavStatus	SOG	COG	Name	ShipType	geometry
t
2017-07-05 17:32:18	05/07/2017 17:32:18	210035000	Under way using engine	9.8	52.8	NORDIC HAMBURG	Cargo	POINT (11.80462 57.67612)
2017-07-05 17:33:18	05/07/2017 17:33:18	210035000	Under way using engine	9.5	58.9	NORDIC HAMBURG	Cargo	POINT (11.80875 57.67773)
2017-07-05 17:34:18	05/07/2017 17:34:18	210035000	Under way using engine	9.3	70.5	NORDIC HAMBURG	Cargo	POINT (11.81311 57.67879)
2017-07-05 17:35:28	05/07/2017 17:35:28	210035000	Under way using engine	9.5	71.1	NORDIC HAMBURG	Cargo	POINT (11.81855 57.67968)
2017-07-05 17:36:28	05/07/2017 17:36:28	210035000	Under way using engine	9.4	71.3	NORDIC HAMBURG	Cargo	POINT (11.82334 57.68044)

# 繪制該軌迹
ax = my_traj.plot() 
ax.set_title('Trajectory {}'.format(my_traj.id))

Text(0.5, 1.0, 'Trajectory 210035000')

我們也可以檢視感興趣區域的軌迹。

area_of_interest = Polygon([(11.89935, 57.69270), (11.90161, 57.68902), (11.90334, 57.68967), (11.90104, 57.69354)
, (11.89935, 57.69270)])
# 傳回與感興趣區域相交的軌迹
intersecting = traj_collection.get_intersecting(area_of_interest)
intersecting

TrajectoryCollection with 20 trajectories

trips = mpd.ObservationGapSplitter(passenger).split(gap=timedelta(minutes=5))
trips.plot()

<Axes: >

step3 感興趣區域分析

我們可以确定一個感興趣區域，如海事局。然後分析離開或達到該區域的船隻時間和類型。

# 對于單個軌迹，如果其兩個連續觀測超過間隔gap，如5分鐘，則認為該軌迹需要拆分為
trips = mpd.ObservationGapSplitter(traj_collection).split(gap=timedelta(minutes=5))
# 設定感興趣區域，如海事局的坐标
area_of_interest = Polygon([(11.86815, 57.68273), (11.86992, 57.68047), (11.87419, 57.68140), (11.87288, 57.68348), (11.86815, 57.68273)])
# 獲得軌迹起點在感興趣區域的軌迹
departures = [traj for traj in trips if traj.get_start_location().intersects(area_of_interest) and traj.get_length() > 100]

# 合并軌迹
tc = mpd.TrajectoryCollection(departures)
ax = tc.plot()
gpd.GeoSeries(area_of_interest).plot(ax=ax,color='red')

<Axes: >

這樣我們可以看到各艘船隻從該區域出發的時間。

for traj in departures:
    print(f"{traj.df['ShipType'].iloc[0]} vessel '{traj.df['Name'].iloc[0]}' departed at {traj.get_start_time()}")

Law enforcement vessel 'KBV 010' departed at 2017-07-05 10:36:03
Law enforcement vessel 'KBV 010' departed at 2017-07-05 14:33:02
Law enforcement vessel 'KBV 048' departed at 2017-07-05 10:20:44
Pilot vessel 'PILOT 794 SE' departed at 2017-07-05 01:21:07
Pilot vessel 'PILOT 794 SE' departed at 2017-07-05 04:15:04
Pilot vessel 'PILOT 794 SE' departed at 2017-07-05 06:58:56
Pilot vessel 'PILOT 794 SE' departed at 2017-07-05 08:45:08
Pilot vessel 'PILOT 794 SE' departed at 2017-07-05 12:02:18
Pilot vessel 'PILOT 794 SE' departed at 2017-07-05 13:34:42
Pilot vessel 'PILOT 794 SE' departed at 2017-07-05 22:32:47
Pilot vessel 'PILOT 218 SE' departed at 2017-07-05 09:27:24
Pilot vessel 'PILOT 218 SE' departed at 2017-07-05 16:10:29

此外，我們也可以獲得到達該區域各船隻的時間。

arrivals = [traj for traj in trips if traj.get_end_location().intersects(area_of_interest) and traj.get_length() > 100]
print(f"Found {len(arrivals)} arrivals")

for traj in arrivals:
    print(f"{traj.df['ShipType'].iloc[0]} vessel '{traj.df['Name'].iloc[0]}' arrived at {traj.get_end_time()}")

Found 12 arrivals
Law enforcement vessel 'KBV 010' arrived at 2017-07-05 10:51:03
Law enforcement vessel 'KBV 048' arrived at 2017-07-05 10:26:44
Pilot vessel 'PILOT 794 SE' arrived at 2017-07-05 01:36:56
Pilot vessel 'PILOT 794 SE' arrived at 2017-07-05 04:45:36
Pilot vessel 'PILOT 794 SE' arrived at 2017-07-05 08:16:46
Pilot vessel 'PILOT 794 SE' arrived at 2017-07-05 08:54:34
Pilot vessel 'PILOT 794 SE' arrived at 2017-07-05 13:06:37
Pilot vessel 'PILOT 794 SE' arrived at 2017-07-05 16:44:06
Pilot vessel 'PILOT 794 SE' arrived at 2017-07-05 23:58:49
Pilot vessel 'PILOT 218 SE' arrived at 2017-07-05 10:07:23
Pilot vessel 'PILOT 218 SE' arrived at 2017-07-05 17:46:12
Tanker vessel 'DANA' arrived at 2017-07-05 08:35:42

step4 軌迹起始點聚類分析

# 加載sklearn
from sklearn.cluster import DBSCAN
from geopy.distance import great_circle
from shapely.geometry import MultiPoint

首先提取各條軌迹的起始點xy坐标。

origins = trips.get_start_locations()
origins['lat'] = origins.geometry.y
origins['lon'] = origins.geometry.x
matrix = origins[['lat','lon']].values
matrix.shape

(302, 2)

然後聚類各個起始點。

# 經緯度距離換算：每弧度距離約為6371.0088公裡
kms_per_radian = 6371.0088

# DBSCAN的鄰域半徑設定：根據實際情況，将0.1公裡轉換為對應的弧度值
epsilon = 0.1 / kms_per_radian

# 使用Ball Tree算法和haversine（球面距離）作為度量方式，對經緯度資料進行DBSCAN聚類
db = DBSCAN(eps=epsilon, min_samples=1, algorithm='ball_tree', metric='haversine').fit(np.radians(matrix))

# 擷取DBSCAN聚類結果的标簽
cluster_labels = db.labels_

# 計算聚類的數量
num_clusters = len(set(cluster_labels))

# 将聚類的資料劃分到不同的簇中，儲存為pandas的Series資料結構
clusters = pd.Series([matrix[cluster_labels == n] for n in range(num_clusters)])

# 輸出聚類的數量
print(f'聚類的數量：{num_clusters}')

聚類的數量：69

提取起始點聚類結果的中心。

# 将聚類标簽添加到origins資料中
origins['cluster'] = cluster_labels

# 定義一個函數，用于擷取簇中心點
def get_centermost_point(cluster):
    # 計算簇的質心
    centroid = (MultiPoint(cluster).centroid.x, MultiPoint(cluster).centroid.y)
    # 找到距離質心最近的點作為簇的中心點
    centermost_point = min(cluster, key=lambda point: great_circle(point, centroid).m)
    # 傳回中心點的坐标（經度，緯度）構成的Point對象
    return Point(tuple(centermost_point)[1], tuple(centermost_point)[0])

# 對每個簇應用get_centermost_point函數，得到各個簇的中心點集合
centermost_points = clusters.map(get_centermost_point)
centermost_points.shape

(69,)

彙總聚類結果中心的資訊。

# 通過cluster列對origins資料框進行分組，得到一個按照簇(cluster)标簽進行分組的DataFrame
origins_by_cluster = pd.DataFrame(origins).groupby(['cluster'])

# 建立一個新的DataFrame，用于存儲每個簇的彙總資訊
summary = origins_by_cluster['ShipType'].unique().to_frame(name='types')

# 在彙總DataFrame中添加n列，表示每個簇中資料點的數量
summary['n'] = origins_by_cluster.size()

# 在彙總DataFrame中添加sog列，表示每個簇中資料點的SOG（Speed Over Ground）平均值
summary['sog'] = origins_by_cluster['SOG'].mean()

# 在彙總DataFrame中添加geometry列，表示每個簇的中心點坐标
summary['geometry'] = centermost_points

# 從彙總DataFrame中移除資料點數量小于1的簇，并按照n列進行降序排序
summary = summary[summary['n']>1].sort_values(by='n', ascending=False)

# 顯示彙總DataFrame中前幾行的資料
summary.head()

types	n	sog	geometry
cluster
5	[Tanker, Passenger, Undefined, Fishing, Cargo]	52	9.217308	POINT (11.911787 57.69663)
28	[Passenger, Undefined, HSC]	47	0.804255	POINT (11.84232 57.661593)
[Cargo, Tanker, Tug, Passenger]	28	11.946429	POINT (11.80495 57.676108)
27	[Passenger, Undefined, HSC]	24	15.9875	POINT (11.819332 57.648027)
11	[SAR, Passenger]	19	10.736842	POINT (11.804653 57.654408)

我們可以檢視某一個聚類簇各個船隻的資訊。

cluster_of_interest_id = 28
ax = origins[origins['cluster']==cluster_of_interest_id].plot( column='ShipType')

最後我們可以繪制軌迹資訊和圖中各個軌迹起點聚類的中心。

fig, ax = plt.subplots(figsize=(9, 6))
ax = trips.plot(ax=ax, color='gray', line_width=1) 
# 添加簇中心
ax = GeoDataFrame(summary,crs=df_crs).plot(ax=ax, column='sog', cmap='RdYlGn', markersize=summary['n'].values*3,zorder=2)
# 添加背景地圖,zoom越大越精細，這裡使用自适應zoom。
cx.add_basemap(ax = ax , source=cx.providers.Stamen.TonerLite,crs=my_traj.df.crs, zoom='auto',attribution="")
cx.add_basemap(ax = ax , source=cx.providers.Stamen.TonerLabels,crs=my_traj.df.crs, zoom='auto',attribution="")
# 儲存圖檔
fig.savefig("res.jpg",dpi=300)

2 馬頸圈資料分析示例

馬頸圈Horse collar是一種用于馴服和控制馬匹的裝置。它通常是由皮革、尼龍或其他堅固材料制成的環狀物，以圍繞馬的頸部。馬頸圈的設計目的是使馬匹更容易受到駕馭者的控制并按照駕馭者的訓示移動。本章基于哥本哈根大學和丹麥某市政技術與環境中心提供的馬頸圈跟蹤資料進行馬匹行為分析，其分析方法可用于其他類似資料集。

step1資料導入

加載資料，并删除不需要的列。

df = read_file('data/horse_collar.gpkg')
df['t'] = pd.to_datetime(df['timestamp'])
df = df.set_index('t').tz_localize(None)
# 去除不使用的列
df = df.drop(columns=['LMT_Date', 'LMT_Time',
       'Origin', 'SCTS_Date', 'SCTS_Time', 'Latitude [?]', 'Longitude [?]',
       'FixType', 'Main [V]', 'Beacon [V]', 'Sats', 'Sat',
       'C/N', 'Sat_1', 'C/N_1', 'Sat_2', 'C/N_2', 'Sat_3', 'C/N_3', 'Sat_4',
       'C/N_4', 'Sat_5', 'C/N_5', 'Sat_6', 'C/N_6', 'Sat_7', 'C/N_7', 'Sat_8',
       'C/N_8', 'Sat_9', 'C/N_9', 'Sat_10', 'C/N_10', 'Sat_11', 'C/N_11',
       'Easting', 'Northing',], axis=1)

# 檢視資料
# No: 每條記錄的序号。
# CollarID: 馬頸圈的唯一辨別符，每一個馬頸圈代表一匹馬
# UTC_Date: 記錄資料的日期，使用世界協調時間（UTC）标準表示。
# UTC_Time: 記錄資料的時間，使用世界協調時間（UTC）标準表示。
# lat: 資料點的緯度資訊。
# long: 資料點的經度資訊。
# Mort. Status: 
# Temp [?C]: 馬頸圈資料點的溫度。
# Activity: 馬頸圈資料點的活動級别。
# timestamp: 記錄資料的時間戳。
# geometry: 馬頸圈資料點的GeoPandas位置幾何資訊。
df.head()

No	CollarID	UTC_Date	UTC_Time	lat	long	Mort. Status	Temp [?C]	Activity	timestamp	geometry
t
2018-11-14 12:01:08	299	30788	14-11-2018	12:01:08	54.743331	11.916987	NaN	22.0	NaN	2018-11-14 12:01:08	POINT (687757.574 6070134.334)
2018-11-14 12:15:09	300	30788	14-11-2018	12:15:09	54.676884	11.910876	NaN	22.0	NaN	2018-11-14 12:15:09	POINT (687671.088 6062727.428)
2018-11-14 12:30:08	301	30788	14-11-2018	12:30:08	54.627018	11.957852	NaN	21.0	NaN	2018-11-14 12:30:08	POINT (690932.614 6057307.716)
2018-11-14 13:00:33	302	30788	14-11-2018	13:00:33	54.625893	11.953686	NaN	17.0	NaN	2018-11-14 13:00:33	POINT (690669.038 6057171.248)
2018-11-14 13:30:09	303	30788	14-11-2018	13:30:09	54.626171	11.954280	NaN	17.0	NaN	2018-11-14 13:30:09	POINT (690706.056 6057203.814)

# 檢視資料次元
df.shape

(17517, 11)

# 檢視資料的坐标系
original_crs  = df.crs
original_crs

<Projected CRS: EPSG:25832>
Name: ETRS89 / UTM zone 32N
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: Europe between 6°E and 12°E: Austria; Belgium; Denmark - onshore and offshore; Germany - onshore and offshore; Norway including - onshore and offshore; Spain - offshore.
- bounds: (6.0, 38.76, 12.01, 84.33)
Coordinate Operation:
- name: UTM zone 32N
- method: Transverse Mercator
Datum: European Terrestrial Reference System 1989 ensemble
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich

#　隻有一個馬頸圈辨別符
print(df['CollarID'].unique())
collar_id = df['CollarID'].unique()[0]

[30788]

# 隻有一個馬頸圈運作狀态
df['Activity'].unique()

array([nan])

step2 資料概覽

該步主要檢查資料是否符合邏輯。

位置概覽

馬頸圈資料集個資料點的坐标應該比較接近。按照這一規則，可以删除位置離群點。

# EPSG:4326是常用的經緯度坐标系WGS84
df.to_crs({'init': 'EPSG:4326'}).plot()

<Axes: >

我們可以看到有兩個馬頸圈的坐标和其他馬頸圈的坐标相隔較遠，需要删除這兩項資料以删除離群資料。

# 将資料按照資料點經度從小到大排列
# 經度最高的兩個點即為離群點
df.sort_values('lat').tail(2)

No	CollarID	UTC_Date	UTC_Time	lat	long	Mort. Status	Temp [?C]	Activity	timestamp	geometry
t
2018-11-14 12:15:09	300	30788	14-11-2018	12:15:09	54.676884	11.910876	NaN	22.0	NaN	2018-11-14 12:15:09	POINT (687671.088 6062727.428)
2018-11-14 12:01:08	299	30788	14-11-2018	12:01:08	54.743331	11.916987	NaN	22.0	NaN	2018-11-14 12:01:08	POINT (687757.574 6070134.334)

按照這種方式删除這兩個離群點。

df = df.sort_values('lat')[:-2]
# 按照編号排序
df = df.sort_values('No')
df.head()

No	CollarID	UTC_Date	UTC_Time	lat	long	Mort. Status	Temp [?C]	Activity	timestamp	geometry
t
2018-11-14 12:30:08	301	30788	14-11-2018	12:30:08	54.627018	11.957852	NaN	21.0	NaN	2018-11-14 12:30:08	POINT (690932.614 6057307.716)
2018-11-14 13:00:33	302	30788	14-11-2018	13:00:33	54.625893	11.953686	NaN	17.0	NaN	2018-11-14 13:00:33	POINT (690669.038 6057171.248)
2018-11-14 13:30:09	303	30788	14-11-2018	13:30:09	54.626171	11.954280	NaN	17.0	NaN	2018-11-14 13:30:09	POINT (690706.056 6057203.814)
2018-11-14 14:00:38	304	30788	14-11-2018	14:00:38	54.626167	11.954662	NaN	17.0	NaN	2018-11-14 14:00:38	POINT (690730.771 6057204.431)
2018-11-14 14:30:08	305	30788	14-11-2018	14:30:08	54.626427	11.955650	NaN	18.0	NaN	2018-11-14 14:30:08	POINT (690793.288 6057235.998)

繪制删除離群點後的資料。

df.to_crs({'init': 'EPSG:4326'}).plot()

<Axes: >

最後我們可以大體計算馬群活動面積。

temp = df.to_crs(original_crs)
# 膨脹資料
# 對temp DataFrame中的幾何對象進行緩沖處理。
# 對geometry列中的每個幾何對象應用buffer(5)，即在每個幾何對象周圍建立一個半徑為5的緩沖區。
temp['geometry'] = temp['geometry'].buffer(5)
# dissolve将所有項按照CollarID列的資訊合并
# area計算面積，original_crs使用的機關是米，那麼這裡面積的機關應該是平方米
total_area = temp.dissolve(by='CollarID').area 
# 将平方米轉換為平方公頃(平方百米)
total_area = total_area[collar_id]/10000
print("資料覆寫的總面積為: {:,.2f} 公頃".format(total_area))

資料覆寫的總面積為: 65.19 公頃

時間概覽

# 檢視資料點的時間範圍
print("資料點的時間範圍為 {} 到 {}.".format(df.index.min(), df.index.max()))

資料點的時間範圍為 2018-11-14 12:30:08 到 2019-11-07 05:30:10.

# 檢視資料點的時間跨度
print("資料點的時間跨度為 {}".format(df.index.max() - df.index.min()))

資料點的時間跨度為 357 days 17:00:02

# 統計每一天的馬頸圈資料
df['No'].resample('1d').count()

t
2018-11-14    23
2018-11-15    66
2018-11-16    58
2018-11-17    70
2018-11-18    79
              ..
2019-11-03    48
2019-11-04    48
2019-11-05    48
2019-11-06    48
2019-11-07    12
Freq: D, Name: No, Length: 359, dtype: int64

# 可視化資料量
df['No'].resample('1d').count().plot()

<Axes: xlabel='t'>

此外，也可以進一步統計各個月份的樣本數，代碼如下：

# 添加年份-月份列
df['Y-M'] = df.index.to_period('M')    
for i in df['Y-M'].unique():
    print("時間{}，樣本數{}".format(i,len(df[df['Y-M']==i])))

時間2018-11，樣本數938
時間2018-12，樣本數1486
時間2019-01，樣本數1489
時間2019-02，樣本數1346
時間2019-03，樣本數1488
時間2019-04，樣本數1441
時間2019-05，樣本數1650
時間2019-06，樣本數1474
時間2019-07，樣本數1556
時間2019-08，樣本數1499
時間2019-09，樣本數1439
時間2019-10，樣本數1407
時間2019-11，樣本數302

step3 資料分析

采樣間隔分析

對于資料的采樣，很重要的一點是确定資料是否為間隔采樣。如果是間隔采樣，那麼采樣間隔是多少？

# 重置索引并提取't'列
t = df.reset_index()['t']
# 計算時間間隔，當每一項資料與前一項資料的時間差
df['delta_t'] = t.diff().values
# 将時間間隔轉換為分鐘
df['delta_t'] = df['delta_t'].dt.total_seconds() / 60
df['delta_t']

t
2018-11-14 12:30:08          NaN
2018-11-14 13:00:33    30.416667
2018-11-14 13:30:09    29.600000
2018-11-14 14:00:38    30.483333
2018-11-14 14:30:08    29.500000
                         ...    
2019-11-07 03:30:38    30.466667
2019-11-07 04:00:09    29.516667
2019-11-07 04:30:08    29.983333
2019-11-07 05:00:10    30.033333
2019-11-07 05:30:10    30.000000
Name: delta_t, Length: 17515, dtype: float64

如下繪制時間間隔柱狀圖，檢視時間間隔集中哪個值。結果是采樣方式為均勻采樣，采樣間隔通常在30分鐘左右。

# 繪制時間間隔柱狀圖
# bins=60表示将資料範圍劃分成60個等寬的區間，每一個bin表示一個分鐘
#range=(0, 60)表示設定直方圖顯示的資料範圍在0到60之間。
df['delta_t'].hist(bins=60, range=(0, 60))

<Axes: >

運動速度和方向分析

通過最大運動速度判斷資料中是否包含無法達到的速度。

# 'CollarID'隻有一個，是以軌迹隻有一個
tc = mpd.TrajectoryCollection(df, 'CollarID')
traj = tc.trajectories[0]
# 添加速度，機關是坐标系機關/s
traj.add_speed()
# 計算速度的最大值
max_speed = traj.df.speed.max()
print("The highest computed speed is {:,.2f} m/s ({:,.2f} km/h)".format(max_speed, max_speed*3600/1000))

The highest computed speed is 0.82 m/s (2.94 km/h)

基于速度，我們可以檢視速度分布，如下所示：

pd.DataFrame(traj.df)['speed'].hist(bins=100)

<Axes: >

進而，我們也可以分析運動方向分布。

traj.add_direction(overwrite=True)
pd.DataFrame(traj.df)['direction'].hist(bins=360)

<Axes: >

時間趨勢分析

如果我們想了解馬匹運動趨勢與時間的關系，可以通過如下代碼實作。

# 分析時間和速度的關系
tmp = pd.DataFrame(traj.df)[['speed']]
tmp = tmp.reindex(pd.DataFrame(traj.df).index)
# 生成小時和月份列
tmp["hour"] = tmp.index.hour
tmp["month"] = tmp.index.month
tmp.head()

speed	hour	month
t
2018-11-14 12:30:08	0.162635	12	11
2018-11-14 13:00:33	0.162635	13	11
2018-11-14 13:30:09	0.027761	13	11
2018-11-14 14:00:38	0.013517	14	11
2018-11-14 14:30:08	0.039568	14	11

然後我們可以彙總不同月份不同小時下馬匹平均速度。結果發現一年中按小時劃分的運動速度表現出明顯的不同，馬匹夏季的運動較早且時間跨度大，冬季的運動較晚且時間跨度小。

# 使用pivot_table函數計算平均速度在不同小時和月份的彙總資料
pivot_table = tmp.pivot_table(values='speed', index='hour', columns='month', aggfunc='mean')
# 建立熱圖
plt.figure(figsize=(9, 12))
# 橫軸表示月份，縱軸表示小時，
sns.heatmap(pivot_table, annot=False, cmap='coolwarm')

<Axes: xlabel='month', ylabel='hour'>

溫度趨勢分析

除了時間，資料集還包含每個記錄的溫度資訊，我們也可以對溫度進行分析。

# 分析溫度和速度的關系
tmp = pd.DataFrame(traj.df)[['Temp [?C]', 'speed']]
tmp = tmp.reindex(pd.DataFrame(traj.df).index)
# 生成月份列
tmp["month"] = tmp.index.month
tmp.head()

Temp [?C]	speed	month
t
2018-11-14 12:30:08	21.0	0.162635	11
2018-11-14 13:00:33	17.0	0.162635	11
2018-11-14 13:30:09	17.0	0.027761	11
2018-11-14 14:00:38	17.0	0.013517	11
2018-11-14 14:30:08	18.0	0.039568	11

如上資料所示，每一行表示某月某個時間出現某個溫度一次。于此，可以新增清單示次數，但都是一次。

tmp['n'] = 1
tmp.head()

Temp [?C]	speed	month	n
t
2018-11-14 12:30:08	21.0	0.162635	11	1
2018-11-14 13:00:33	17.0	0.162635	11	1
2018-11-14 13:30:09	17.0	0.027761	11	1
2018-11-14 14:00:38	17.0	0.013517	11	1
2018-11-14 14:30:08	18.0	0.039568	11	1

基于溫度和月份，我們可以統計每個月份每個溫度出現的次數，如下所示：

# 使用pivot_table函數統計不同月份下不同溫度值出現次數
pivot_table = tmp.pivot_table(values='n', index='Temp [?C]', columns='month', aggfunc='sum')
# 建立熱圖
plt.figure(figsize=(9, 12))
# 橫軸表示月份，縱軸表示溫度值
sns.heatmap(pivot_table, annot=False, cmap='coolwarm')

<Axes: xlabel='month', ylabel='Temp [?C]'>

同樣我們也可以統計不同月份不同溫度下速度的平均值。

# 使用pivot_table函數統計不同月份下不同溫度值下速度平均值
pivot_table = tmp.pivot_table(values='speed', index='Temp [?C]', columns='month', aggfunc='mean')
# 建立熱圖
plt.figure(figsize=(9, 12))
# 橫軸表示月份，縱軸表示溫度值
sns.heatmap(pivot_table, annot=False, cmap='coolwarm')

<Axes: xlabel='month', ylabel='Temp [?C]'>

step3 軌迹分析

本步着眼于個體軌迹的分析。是以，需要将連續軌迹分割成單獨的軌迹。分析結果取決于連續流如何劃分為軌迹、位置和事件。

軌迹長度分析

# 按天為機關分割軌迹
daily = mpd.TemporalSplitter(tc).split(mode='day')
daily

TrajectoryCollection with 358 trajectories

# 計算每條軌迹的長度
daily_lengths = [traj.get_length() for traj in daily]
# 計算每條軌迹的開始時間
daily_t = [traj.get_start_time() for traj in daily]

# 建立軌迹開始時間和長度的dataframe
daily_lengths = pd.DataFrame(daily_lengths, index=daily_t, columns=['length'])
daily_lengths.head()

length
2018-11-14 12:30:08	1090.598526
2018-11-15 00:00:08	4219.980813
2018-11-16 00:00:10	3198.209140
2018-11-17 00:00:09	4307.483500
2018-11-18 00:00:09	3548.902314

可視化軌迹開始時間和軌迹長度的關系，可以看到季節趨勢與之前發現的季節運動速度模式非常一緻：冬季軌迹往往比夏季軌迹短。

daily_lengths.plot()

<Axes: >

停留檢測分析

我們可以對某日軌迹進行單獨分析，并通過停留檢測以提取軌迹的停留段，如下所示：

# 設定最大直徑為100
MAX_DIAMETER = 100

# 設定最小持續時間為3小時
MIN_DURATION = timedelta(hours=3)

# 擷取指定日期（2018年11月17日）的軌迹資料，并指派給變量one_day
one_day = daily.get_trajectory('30788_2018-11-17 00:00:00')

# 使用停留段檢測器(mpd.TrajectoryStopDetector)擷取指定軌迹資料的停留段，
# 并根據最小持續時間和最大直徑進行篩選，結果存儲在變量one_day_stops中
one_day_stops = mpd.TrajectoryStopDetector(one_day).get_stop_segments(
    min_duration=MIN_DURATION, max_diameter=MAX_DIAMETER)

繪制軌迹停留可視化結果代碼如下：

# 繪制軌迹
# 建立一個繪圖區域(ax)，并将指定日期（one_day）的軌迹資料以灰色(slategray)進行繪制
ax = one_day.plot(color='slategray')
# 将停留段資料（one_day_stops）以深粉色(deeppink)的顔色進行繪制
ax = one_day_stops.plot(ax=ax, color='deeppink')
# 擷取停留段的起始位置，并将這些起始位置以紅色(red)進行繪制
ax = one_day_stops.get_start_locations().plot(ax=ax, color='red')

我們也可以對所有時間段的軌迹進行停留識别，代碼如下：

# 獲得停留軌迹點
stops = mpd.TrajectoryStopDetector(tc).get_stop_points(min_duration=MIN_DURATION, max_diameter=MAX_DIAMETER)
len(stops)

# 繪制這些停留的軌迹點
stops.plot(color='deeppink',alpha=0.2)

<Axes: >

此外還可以繪制各停留時間對應的停留次數：

stops['duration_h'] = (stops['end_time']-stops['start_time']).dt.total_seconds() / 3600
# 橫軸為停留時間，縱軸為停留次數
pd.DataFrame(stops)['duration_h'].hist(bins=12)

<Axes: >

3 足球分析示例

本教程使用利物浦足球比賽視訊片段中提取的資料，資料每一行代表利物浦隊在2019年一次比賽中的一個時間點的資料。資料集下載下傳位址為：liverpool_2019.csv。

step1 資料處理

input_file = "data/liverpool_2019.csv"
df = pd.read_csv(input_file)
# 删除列
df.drop(columns=['Unnamed: 0'], inplace=True)
print("樣本數",len(df))
df.tail()

樣本數 74936

bgcolor	dx	dy	edgecolor	frame	play	player	player_num	team	x	y	z
74931	blue	0.0	0.0	white	120	Leicester 0 - [3] Liverpool	10267	NaN	defense	98.724826	53.720353	0.0
74932	blue	0.0	0.0	white	121	Leicester 0 - [3] Liverpool	10267	NaN	defense	98.724826	53.720353	0.0
74933	blue	0.0	0.0	white	122	Leicester 0 - [3] Liverpool	10267	NaN	defense	98.724826	53.720353	0.0
74934	blue	0.0	0.0	white	123	Leicester 0 - [3] Liverpool	10267	NaN	defense	98.724826	53.720353	0.0
74935	blue	0.0	0.0	white	124	Leicester 0 - [3] Liverpool	10267	NaN	defense	98.724826	53.720353	0.0

所讀取的資料列名解釋如下：

play：進球後的比分情況。進球的球隊位于括号旁邊。
frame：目前位置的幀編号。提供的資料每秒有20幀。
player：球員的編号。此編号在一次比賽中保持一緻，但在不同比賽之間可能會有變化。
player_num：球員的球衣号碼。這個号碼是官方号碼，在2019年的利物浦隊沒有發生變化。
x, y：球員/足球的坐标。球場坐标在每個軸上從0到100。
dx, dy：從上一幀到目前幀的（x，y）坐标變化。
z：高度，從0到1.5（僅填充足球的高度資訊）。
bgcolor：球隊的主要顔色（用作背景色）。
edgecolor：輔助顔色（用作邊緣顔色）。

進一步我們可以對資料進行處理，如下所示：

# 擷取唯一plays清單
plays = list(df.play.unique())
# 獲得時間戳
def to_timestamp(row):
    day = plays.index(row.play) + 1
    start_time = datetime(2019, 1, day, 12, 0, 0)
    # 将frame轉換為時間戳
    td = timedelta(milliseconds=1000 / 20 * row.frame)
    return start_time + td

# 将frame轉換為時間戳，并添加到DataFrame的新列'time'中
df['time'] = df.apply(to_timestamp, axis=1)
# 将'time'列設為索引，以便後續使用時間作為索引來通路資料
df.set_index('time', inplace=True)

# 根據足球場地的标準尺寸，将x和y坐标縮放到合适的範圍
# 許多職業球隊的足球場地标準尺寸為105米乘以68米
pitch_length = 105
pitch_width = 68
df.x = df.x / 100 * pitch_length 
df.y = df.y / 100 * pitch_width

# 将以下列轉化為分類資料
df['team'] = df['team'].astype('category').cat.as_ordered()
df['player'] = df['player'].astype('category').cat.as_ordered()
df['player_num'] = df['player_num'].astype('category').cat.as_ordered()

df.tail()

bgcolor	dx	dy	edgecolor	frame	play	player	player_num	team	x	y	z
time
2019-01-19 12:00:06.000	blue	0.0	0.0	white	120	Leicester 0 - [3] Liverpool	10267	NaN	defense	103.661067	36.52984	0.0
2019-01-19 12:00:06.050	blue	0.0	0.0	white	121	Leicester 0 - [3] Liverpool	10267	NaN	defense	103.661067	36.52984	0.0
2019-01-19 12:00:06.100	blue	0.0	0.0	white	122	Leicester 0 - [3] Liverpool	10267	NaN	defense	103.661067	36.52984	0.0
2019-01-19 12:00:06.150	blue	0.0	0.0	white	123	Leicester 0 - [3] Liverpool	10267	NaN	defense	103.661067	36.52984	0.0
2019-01-19 12:00:06.200	blue	0.0	0.0	white	124	Leicester 0 - [3] Liverpool	10267	NaN	defense	103.661067	36.52984	0.0

得到了資料後，可以轉換為軌迹：

CRS = None
tc = mpd.TrajectoryCollection(df, 'player', x='x', y='y', crs=CRS)
tc

TrajectoryCollection with 364 trajectories

軌迹分析

我們可以對單個球員的軌迹進行分析，代碼如下所示：

# 獲得球員軌迹
PLAY = 2
play_trajs = tc.filter('play', plays[PLAY])
play_trajs

TrajectoryCollection with 20 trajectories

# 繪制其進攻或防守軌迹
play_trajs.plot(column='team', colormap={'attack':'hotpink', 'defense':'turquoise'})

<Axes: >

# 簡化軌迹
generalized = mpd.MinTimeDeltaGeneralizer(play_trajs).generalize(tolerance=timedelta(seconds=0.5))
generalized.add_speed()
# 繪制軌迹速度
generalized.plot(column='speed',legend=True)

<Axes: >

# 添加完整軌迹資料
ax = generalized.plot(column='team', colormap={'attack':'limegreen', 'defense':'purple'})
ax = generalized.get_start_locations().plot(ax = ax ,label='start', color='orange',zorder=2)
import matplotlib.image as mpimg

# 添加背景圖檔
# 下載下傳位址https://github.com/movingpandas/movingpandas/raw/main/tutorials/data/soccer_field.png
img = mpimg.imread('data/soccer_field.png')
ax.imshow(img, extent=[0, pitch_length, 0, pitch_width])

<matplotlib.image.AxesImage at 0x7f0a747a1ed0>

4 參考

MovingPandas入門指北
liverpool_2019.csv
movingpandas
movingpandas-examples
movingpandas-examples-data
GeoPandas疊加背景地圖

[資料分析與可視化] Python繪制資料地圖5-MovingPandas繪圖執行個體

1 船舶資料分析示例

2 馬頸圈資料分析示例

3 足球分析示例

4 參考

繼續閱讀

《方舟生存飛升》今日上線新地圖希臘世界、新生物巨型海龜

“靜安區公共法律服務地圖(雙語版)” 上線啦！

KK官方對戰平台超爽地圖《蒼天隕落》解鎖防守新方式！這次來點不一樣的！

高德地圖的本地圍城：被指景區門票造假，還暗藏“特殊服務”擦邊

這兩天，地圖服務商“心慌慌”

10月上半月“蚊子地圖”公布！注意防蚊滅蚊！

跟着地圖去打卡|在長甯溫水遊泳池，世界冠軍邁出征戰泳壇第一步

歧路旅人大陸的霸者：授予一切之人第3章，劇情有那味了，附地圖

帶上這份賞葉地圖開啟秋遊模式

測繪通報 | 劉傑：2015—2023年地圖稽核服務資料分析與問題應對

七年級上冊地理，期中考試90%學生易丢分考點《地圖》通關訓練

不能忍，勇士主場中國地圖抹去台灣，篡改中國國旗，莫雷事件重演

過年不躺平，一起解鎖“年味新地圖”！

Cache作者宣布地圖已完成重置！TYLOO三圖憾負GL

五一假期自駕去哪裡？2025國内12條精華自駕路線地圖攻略分享

五一假期去哪裡？2025國内15條精華自駕路線地圖攻略分享（2）