MovingPandas is an open-source geospatio-temporal data processing library based on Python and GeoPandas, which is used to process the trajectory data of moving objects. For more information about the use of MovingPandas, see the article: Getting Started with MovingPandas, which mainly introduces three drawing examples of MovingPandas. The official address of the MovingPandas repository is: movingpandas. The address of the official sample code repository for MovingPandas is: movingpandas-examples. All experimental data in this article comes from: movingpandas-examples-data.

All the code in this article can be found at: Python-Study-Notes

The author recommends installing MovingPandas in Python 3.8 or above, and recommends using conda for installation. MovingPandas can be installed using the following command:

conda install -c conda-forge movingpandas

Due to the complexity of MovingPandas' dependency environment, pip is not recommended for installation. If you insist on using pip for installation, you can install MovingPandas by following the following command:

pip install movingpandas
# 本文必安装第三方库
pip install contextily
pip install seaborn
# 以下第三方库可选
pip install hvplot
pip install cartopy
pip install geoviews

The following code shows the version of MovingPandas, which is Python 3.10.

# jupyter notebook环境去除warning
import warnings
warnings.filterwarnings("ignore")

# 查看movingpandas版本及依赖库版本信息
import movingpandas as mpd
mpd.show_versions()

MovingPandas 0.16.1

SYSTEM INFO
-----------
python     : 3.10.10 (main, Mar 21 2023, 18:45:11) [GCC 11.2.0]
executable : /opt/conda/envs/python35-paddle120-env/bin/python
machine    : Linux-5.4.0-109-generic-x86_64-with-glibc2.31

GEOS, GDAL, PROJ INFO
---------------------
GEOS       : None
GEOS lib   : None
GDAL       : 3.6.4
GDAL data dir: /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/fiona/gdal_data
PROJ       : 9.2.1
PROJ data dir: /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/pyproj/proj_dir/share/proj

PYTHON DEPENDENCIES
-------------------
geopandas  : 0.13.2
pandas     : 2.0.2
fiona      : 1.9.4.post1
numpy      : 1.24.3
shapely    : 2.0.1
rtree      : 1.0.1
pyproj     : 3.6.0
matplotlib : 3.7.1
mapclassify: None
geopy      : 2.3.0
holoviews  : None
hvplot     : None
geoviews   : None
stonesoup  : None

The following code is used to load the third-party libraries required for the drawing.

import numpy as np
import pandas as pd
import geopandas as gpd
import movingpandas as mpd
import shapely as shp
import matplotlib.pyplot as plt

from geopandas import GeoDataFrame, read_file
from shapely.geometry import Point, LineString, Polygon
from datetime import datetime, timedelta
from os.path import exists
from urllib.request import urlretrieve
import contextily as cx
import warnings
import seaborn as sns
warnings.filterwarnings('ignore')

plot_defaults = {'linewidth':5, 'capstyle':'round', 'figsize':(9,3), 'legend':True}

1 Example of ship data analysis

The Automatic Identification System (AIS) is an automatic communication system used between ships. It transmits information between ships through radio signals, allowing other nearby ships and shore monitoring stations to obtain information about the ship, such as ship name, flag country, ship type, ship location, heading, speed, etc. This tutorial uses AIS data published by the Danish Maritime Authority, and the extracted sample of AIS records covers vessel traffic data near Gothenburg on July 5, 2017. This chapter provides valuable insights into vessel traffic by analyzing trajectory data from AIS data.

Step1 Load the data

# 加载数据
df = read_file('data/ais.gpkg')
# 查看数据
# Timestamp: 时间戳，指示AIS数据记录的时间。它表示了记录被创建或接收的日期和时间。
# MMSI: 船舶识别码（Maritime Mobile Service Identity），是一个唯一的数字标识符。
# NavStatus: 导航状态，指示船舶当前的导航状态或活动状态。
# SOG: 对地航速（Speed Over Ground），指船舶相对于地面的速度。
# COG: 对地航向（Course Over Ground），表示船舶相对于地面的航向。
# Name: 船名，是船舶的名称或标识符。
# ShipType: 船舶类型，表示船舶的类别或类型。
# geometry: 几何信息，表示船舶的空间位置。
df.head()

Timestamp	MMSI	NavStatus	UNDERTOW	COG	Name	ShipType	geometry
05/07/2017 00:00:03	219632000	Under way using engine	0.0	270.4	In	Undefined	POINT (11.85958 57.68817)
1	05/07/2017 00:00:05	265650970	Under way using engine	0.0	0.5	In	Undefined	POINT (11.84175 57.66150)
2	05/07/2017 00:00:06	265503900	Under way using engine	0.0	0.0	In	Undefined	POINT (11.90650 57.69077)
3	05/07/2017 00:00:14	219632000	Under way using engine	0.0	188.4	In	Undefined	POINT (11.85958 57.68817)
4	05/07/2017 00:00:19	265519650	Under way using engine	0.0	357.2	In	Undefined	POINT (11.87192 57.68233)

# 查看数据的坐标系
df_crs = df.crs
df_crs

<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

# 可视化数据点
df.plot()

<Axes: >

[Data Analysis & Visualization] Python Plotting Data Map 5 - MovingPandas Plot Example

# 查看各船舶的对地航速
df['SOG'].hist(bins=100, figsize=(12,3))

<Axes: >

For data analysis, first delete those samples with a ground speed of 0:

# 打印数据维度
print("原始数据样本数：{}".format(len(df)))
df = df[df.SOG>0]
print("删除对地航速为0后数据样本数：{}".format(len(df)))

原始数据样本数：84702
删除对地航速为0后数据样本数：33593

See what types of vessels there are:

df['ShipType'].value_counts().plot(kind='bar', figsize=(12,3))

<Axes: xlabel='ShipType'>

df['t'] = pd.to_datetime(df['Timestamp'], format='%d/%m/%Y %H:%M:%S')
# 创建轨迹，轨迹数据点数最少为100
traj_collection = mpd.TrajectoryCollection(df, 'MMSI', t='t', min_length=100)
# 压缩轨迹
traj_collection = mpd.MinTimeDeltaGeneralizer(traj_collection).generalize(tolerance=timedelta(minutes=1))
traj_collection

TrajectoryCollection with 77 trajectories

Step 2 Data visualization

We can show the trajectory of different types of vessels. For details on how to use the background map added in it, see GeoPandas Overlay Background Map.

shiptype_to_color = {'Passenger': 'blue', 'HSC': 'green', 'Tanker': 'red', 'Cargo': 'orange'}
ax = traj_collection.plot(column='ShipType', column_to_color=shiptype_to_color, linewidth=1, capstyle='round')
# 添加背景地图,zoom越大越精细，这里使用自适应zoom。
cx.add_basemap(ax = ax , source=cx.providers.Stamen.TonerLite,crs=traj_collection.trajectories[0].crs, zoom='auto',attribution="")

# 单独展示某种类型船只的航线轨迹
passenger = traj_collection.filter('ShipType', 'Passenger')
passenger.plot()

<Axes: >

You can also display a track individually.

my_traj = traj_collection.trajectories[0]
my_traj.df.head()

Timestamp	MMSI	NavStatus	UNDERTOW	COG	Name	ShipType	geometry
t
2017-07-05 17:32:18	05/07/2017 17:32:18	210035000	Under way using engine	9.8	52.8	NORDIC HAMBURG	Cargo	POINT (11.80462 57.67612)
2017-07-05 17:33:18	05/07/2017 17:33:18	210035000	Under way using engine	9.5	58.9	NORDIC HAMBURG	Cargo	POINT (11.80875 57.67773)
2017-07-05 17:34:18	05/07/2017 17:34:18	210035000	Under way using engine	9.3	70.5	NORDIC HAMBURG	Cargo	POINT (11.81311 57.67879)
2017-07-05 17:35:28	05/07/2017 17:35:28	210035000	Under way using engine	9.5	71.1	NORDIC HAMBURG	Cargo	POINT (11.81855 57.67968)
2017-07-05 17:36:28	05/07/2017 17:36:28	210035000	Under way using engine	9.4	71.3	NORDIC HAMBURG	Cargo	POINT (11.82334 57.68044)

# 绘制该轨迹
ax = my_traj.plot() 
ax.set_title('Trajectory {}'.format(my_traj.id))

Text(0.5, 1.0, 'Trajectory 210035000')

We can also look at the trajectory of the region of interest.

area_of_interest = Polygon([(11.89935, 57.69270), (11.90161, 57.68902), (11.90334, 57.68967), (11.90104, 57.69354)
, (11.89935, 57.69270)])
# 返回与感兴趣区域相交的轨迹
intersecting = traj_collection.get_intersecting(area_of_interest)
intersecting

TrajectoryCollection with 20 trajectories

trips = mpd.ObservationGapSplitter(passenger).split(gap=timedelta(minutes=5))
trips.plot()

<Axes: >

STEP3 Region of Interest Analysis

We can identify an area of interest, such as the Maritime Administration. The time and type of vessels leaving or arriving in the area are then analyzed.

# 对于单个轨迹，如果其两个连续观测超过间隔gap，如5分钟，则认为该轨迹需要拆分为
trips = mpd.ObservationGapSplitter(traj_collection).split(gap=timedelta(minutes=5))
# 设置感兴趣区域，如海事局的坐标
area_of_interest = Polygon([(11.86815, 57.68273), (11.86992, 57.68047), (11.87419, 57.68140), (11.87288, 57.68348), (11.86815, 57.68273)])
# 获得轨迹起点在感兴趣区域的轨迹
departures = [traj for traj in trips if traj.get_start_location().intersects(area_of_interest) and traj.get_length() > 100]

# 合并轨迹
tc = mpd.TrajectoryCollection(departures)
ax = tc.plot()
gpd.GeoSeries(area_of_interest).plot(ax=ax,color='red')

<Axes: >

This way we can see when the ships depart from the area.

for traj in departures:
    print(f"{traj.df['ShipType'].iloc[0]} vessel '{traj.df['Name'].iloc[0]}' departed at {traj.get_start_time()}")

Law enforcement vessel 'KBV 010' departed at 2017-07-05 10:36:03
Law enforcement vessel 'KBV 010' departed at 2017-07-05 14:33:02
Law enforcement vessel 'KBV 048' departed at 2017-07-05 10:20:44
Pilot vessel 'PILOT 794 SE' departed at 2017-07-05 01:21:07
Pilot vessel 'PILOT 794 SE' departed at 2017-07-05 04:15:04
Pilot vessel 'PILOT 794 SE' departed at 2017-07-05 06:58:56
Pilot vessel 'PILOT 794 SE' departed at 2017-07-05 08:45:08
Pilot vessel 'PILOT 794 SE' departed at 2017-07-05 12:02:18
Pilot vessel 'PILOT 794 SE' departed at 2017-07-05 13:34:42
Pilot vessel 'PILOT 794 SE' departed at 2017-07-05 22:32:47
Pilot vessel 'PILOT 218 SE' departed at 2017-07-05 09:27:24
Pilot vessel 'PILOT 218 SE' departed at 2017-07-05 16:10:29

In addition, we can also get the time of the ships arriving in the area.

arrivals = [traj for traj in trips if traj.get_end_location().intersects(area_of_interest) and traj.get_length() > 100]
print(f"Found {len(arrivals)} arrivals")

for traj in arrivals:
    print(f"{traj.df['ShipType'].iloc[0]} vessel '{traj.df['Name'].iloc[0]}' arrived at {traj.get_end_time()}")

Found 12 arrivals
Law enforcement vessel 'KBV 010' arrived at 2017-07-05 10:51:03
Law enforcement vessel 'KBV 048' arrived at 2017-07-05 10:26:44
Pilot vessel 'PILOT 794 SE' arrived at 2017-07-05 01:36:56
Pilot vessel 'PILOT 794 SE' arrived at 2017-07-05 04:45:36
Pilot vessel 'PILOT 794 SE' arrived at 2017-07-05 08:16:46
Pilot vessel 'PILOT 794 SE' arrived at 2017-07-05 08:54:34
Pilot vessel 'PILOT 794 SE' arrived at 2017-07-05 13:06:37
Pilot vessel 'PILOT 794 SE' arrived at 2017-07-05 16:44:06
Pilot vessel 'PILOT 794 SE' arrived at 2017-07-05 23:58:49
Pilot vessel 'PILOT 218 SE' arrived at 2017-07-05 10:07:23
Pilot vessel 'PILOT 218 SE' arrived at 2017-07-05 17:46:12
Tanker vessel 'DANA' arrived at 2017-07-05 08:35:42

Step4 Cluster analysis at the beginning of the trajectory

# 加载sklearn
from sklearn.cluster import DBSCAN
from geopy.distance import great_circle
from shapely.geometry import MultiPoint

Firstly, the XY coordinates of the starting point of each trajectory are extracted.

origins = trips.get_start_locations()
origins['lat'] = origins.geometry.y
origins['lon'] = origins.geometry.x
matrix = origins[['lat','lon']].values
matrix.shape

(302, 2)

The individual starting points are then clustered.

# 经纬度距离换算：每弧度距离约为6371.0088公里
kms_per_radian = 6371.0088

# DBSCAN的邻域半径设置：根据实际情况，将0.1公里转换为对应的弧度值
epsilon = 0.1 / kms_per_radian

# 使用Ball Tree算法和haversine（球面距离）作为度量方式，对经纬度数据进行DBSCAN聚类
db = DBSCAN(eps=epsilon, min_samples=1, algorithm='ball_tree', metric='haversine').fit(np.radians(matrix))

# 获取DBSCAN聚类结果的标签
cluster_labels = db.labels_

# 计算聚类的数量
num_clusters = len(set(cluster_labels))

# 将聚类的数据划分到不同的簇中，保存为pandas的Series数据结构
clusters = pd.Series([matrix[cluster_labels == n] for n in range(num_clusters)])

# 输出聚类的数量
print(f'聚类的数量：{num_clusters}')

聚类的数量：69

Extract the center of the starting point clustering results.

# 将聚类标签添加到origins数据中
origins['cluster'] = cluster_labels

# 定义一个函数，用于获取簇中心点
def get_centermost_point(cluster):
    # 计算簇的质心
    centroid = (MultiPoint(cluster).centroid.x, MultiPoint(cluster).centroid.y)
    # 找到距离质心最近的点作为簇的中心点
    centermost_point = min(cluster, key=lambda point: great_circle(point, centroid).m)
    # 返回中心点的坐标（经度，纬度）构成的Point对象
    return Point(tuple(centermost_point)[1], tuple(centermost_point)[0])

# 对每个簇应用get_centermost_point函数，得到各个簇的中心点集合
centermost_points = clusters.map(get_centermost_point)
centermost_points.shape

(69,)

Summarize the information for the clustered result center.

# 通过cluster列对origins数据框进行分组，得到一个按照簇(cluster)标签进行分组的DataFrame
origins_by_cluster = pd.DataFrame(origins).groupby(['cluster'])

# 创建一个新的DataFrame，用于存储每个簇的汇总信息
summary = origins_by_cluster['ShipType'].unique().to_frame(name='types')

# 在汇总DataFrame中添加n列，表示每个簇中数据点的数量
summary['n'] = origins_by_cluster.size()

# 在汇总DataFrame中添加sog列，表示每个簇中数据点的SOG（Speed Over Ground）平均值
summary['sog'] = origins_by_cluster['SOG'].mean()

# 在汇总DataFrame中添加geometry列，表示每个簇的中心点坐标
summary['geometry'] = centermost_points

# 从汇总DataFrame中移除数据点数量小于1的簇，并按照n列进行降序排序
summary = summary[summary['n']>1].sort_values(by='n', ascending=False)

# 显示汇总DataFrame中前几行的数据
summary.head()

types	n	undertow	geometry
cluster
5	[Tanker, Passenger, Undefined, Fishing, Cargo]	52	9.217308	POINT (11.911787 57.69663)
28	[Passenger, Undefined, HSC]	47	0.804255	POINT (11.84232 57.661593)
[Cargo, Tanker, Tug, Passenger]	28	11.946429	POINT (11.80495 57.676108)
27	[Passenger, Undefined, HSC]	24	15.9875	POINT (11.819332 57.648027)
11	[SAR, Passenger]	19	10.736842	POINT (11.804653 57.654408)

We can view information about individual vessels in a cluster.

cluster_of_interest_id = 28
ax = origins[origins['cluster']==cluster_of_interest_id].plot( column='ShipType')

Finally, we can plot the trajectory information and the center of the clustering of the starting points of each trajectory in the graph.

fig, ax = plt.subplots(figsize=(9, 6))
ax = trips.plot(ax=ax, color='gray', line_width=1) 
# 添加簇中心
ax = GeoDataFrame(summary,crs=df_crs).plot(ax=ax, column='sog', cmap='RdYlGn', markersize=summary['n'].values*3,zorder=2)
# 添加背景地图,zoom越大越精细，这里使用自适应zoom。
cx.add_basemap(ax = ax , source=cx.providers.Stamen.TonerLite,crs=my_traj.df.crs, zoom='auto',attribution="")
cx.add_basemap(ax = ax , source=cx.providers.Stamen.TonerLabels,crs=my_traj.df.crs, zoom='auto',attribution="")
# 保存图片
fig.savefig("res.jpg",dpi=300)

2 Example of data analysis of horse collars

A horse collar is a device used to tame and control horses. It is usually a ring made of leather, nylon, or other sturdy material to wrap around the horse's neck. Horse collars are designed to make it easier for horses to be controlled by the rider and move according to the rider's instructions. This chapter analyzes horse behavior based on horse collar tracking data provided by the University of Copenhagen and a municipal center for technology and the environment in Denmark, and its analysis methods can be used for other similar datasets.

Step 1Data Import

Load the data, and remove the columns you don't need.

df = read_file('data/horse_collar.gpkg')
df['t'] = pd.to_datetime(df['timestamp'])
df = df.set_index('t').tz_localize(None)
# 去除不使用的列
df = df.drop(columns=['LMT_Date', 'LMT_Time',
       'Origin', 'SCTS_Date', 'SCTS_Time', 'Latitude [?]', 'Longitude [?]',
       'FixType', 'Main [V]', 'Beacon [V]', 'Sats', 'Sat',
       'C/N', 'Sat_1', 'C/N_1', 'Sat_2', 'C/N_2', 'Sat_3', 'C/N_3', 'Sat_4',
       'C/N_4', 'Sat_5', 'C/N_5', 'Sat_6', 'C/N_6', 'Sat_7', 'C/N_7', 'Sat_8',
       'C/N_8', 'Sat_9', 'C/N_9', 'Sat_10', 'C/N_10', 'Sat_11', 'C/N_11',
       'Easting', 'Northing',], axis=1)

# 查看数据
# No: 每条记录的序号。
# CollarID: 马颈圈的唯一标识符，每一个马颈圈代表一匹马
# UTC_Date: 记录数据的日期，使用世界协调时间（UTC）标准表示。
# UTC_Time: 记录数据的时间，使用世界协调时间（UTC）标准表示。
# lat: 数据点的纬度信息。
# long: 数据点的经度信息。
# Mort. Status: 
# Temp [?C]: 马颈圈数据点的温度。
# Activity: 马颈圈数据点的活动级别。
# timestamp: 记录数据的时间戳。
# geometry: 马颈圈数据点的GeoPandas位置几何信息。
df.head()

No	CollarID	UTC_Date	UTC_Time	years	long	Mort. Status	Temp [? C]	Activity	timestamp	geometry
t
2018-11-14 12:01:08	299	30788	14-11-2018	12:01:08	54.743331	11.916987	In	22.0	In	2018-11-14 12:01:08	POINT (687757.574 6070134.334)
2018-11-14 12:15:09	300	30788	14-11-2018	12:15:09	54.676884	11.910876	In	22.0	In	2018-11-14 12:15:09	POINT (687671.088 6062727.428)
2018-11-14 12:30:08	301	30788	14-11-2018	12:30:08	54.627018	11.957852	In	21.0	In	2018-11-14 12:30:08	POINT (690932.614 6057307.716)
2018-11-14 13:00:33	302	30788	14-11-2018	13:00:33	54.625893	11.953686	In	17.0	In	2018-11-14 13:00:33	POINT (690669.038 6057171.248)
2018-11-14 13:30:09	303	30788	14-11-2018	13:30:09	54.626171	11.954280	In	17.0	In	2018-11-14 13:30:09	POINT (690706.056 6057203.814)

# 查看数据维度
df.shape

(17517, 11)

# 查看数据的坐标系
original_crs  = df.crs
original_crs

<Projected CRS: EPSG:25832>
Name: ETRS89 / UTM zone 32N
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: Europe between 6°E and 12°E: Austria; Belgium; Denmark - onshore and offshore; Germany - onshore and offshore; Norway including - onshore and offshore; Spain - offshore.
- bounds: (6.0, 38.76, 12.01, 84.33)
Coordinate Operation:
- name: UTM zone 32N
- method: Transverse Mercator
Datum: European Terrestrial Reference System 1989 ensemble
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich

#　只有一个马颈圈标识符
print(df['CollarID'].unique())
collar_id = df['CollarID'].unique()[0]

[30788]

# 只有一个马颈圈运行状态
df['Activity'].unique()

array([nan])

Step 2 Data Overview

This step checks whether the data is logical.

Location at a glance

The coordinates of the data points in the horse collar dataset should be relatively close. Following this rule, positional outliers can be removed.

# EPSG:4326是常用的经纬度坐标系WGS84
df.to_crs({'init': 'EPSG:4326'}).plot()

<Axes: >

We can see that there are two coordinates of the collar that are far apart from the coordinates of the other collar, and we need to delete these two data to remove the outliers.

# 将数据按照数据点经度从小到大排列
# 经度最高的两个点即为离群点
df.sort_values('lat').tail(2)

No	CollarID	UTC_Date	UTC_Time	years	long	Mort. Status	Temp [? C]	Activity	timestamp	geometry
t
2018-11-14 12:15:09	300	30788	14-11-2018	12:15:09	54.676884	11.910876	In	22.0	In	2018-11-14 12:15:09	POINT (687671.088 6062727.428)
2018-11-14 12:01:08	299	30788	14-11-2018	12:01:08	54.743331	11.916987	In	22.0	In	2018-11-14 12:01:08	POINT (687757.574 6070134.334)

Remove these two outliers in this way.

df = df.sort_values('lat')[:-2]
# 按照编号排序
df = df.sort_values('No')
df.head()

No	CollarID	UTC_Date	UTC_Time	years	long	Mort. Status	Temp [? C]	Activity	timestamp	geometry
t
2018-11-14 12:30:08	301	30788	14-11-2018	12:30:08	54.627018	11.957852	In	21.0	In	2018-11-14 12:30:08	POINT (690932.614 6057307.716)
2018-11-14 13:00:33	302	30788	14-11-2018	13:00:33	54.625893	11.953686	In	17.0	In	2018-11-14 13:00:33	POINT (690669.038 6057171.248)
2018-11-14 13:30:09	303	30788	14-11-2018	13:30:09	54.626171	11.954280	In	17.0	In	2018-11-14 13:30:09	POINT (690706.056 6057203.814)
2018-11-14 14:00:38	304	30788	14-11-2018	14:00:38	54.626167	11.954662	In	17.0	In	2018-11-14 14:00:38	POINT (690730.771 6057204.431)
2018-11-14 14:30:08	305	30788	14-11-2018	14:30:08	54.626427	11.955650	In	18.0	In	2018-11-14 14:30:08	POINT (690793.288 6057235.998)

Plot the data after removing outliers.

df.to_crs({'init': 'EPSG:4326'}).plot()

<Axes: >

Finally, we can roughly calculate the area of the herd.

temp = df.to_crs(original_crs)
# 膨胀数据
# 对temp DataFrame中的几何对象进行缓冲处理。
# 对geometry列中的每个几何对象应用buffer(5)，即在每个几何对象周围创建一个半径为5的缓冲区。
temp['geometry'] = temp['geometry'].buffer(5)
# dissolve将所有项按照CollarID列的信息合并
# area计算面积，original_crs使用的单位是米，那么这里面积的单位应该是平方米
total_area = temp.dissolve(by='CollarID').area 
# 将平方米转换为平方公顷(平方百米)
total_area = total_area[collar_id]/10000
print("数据覆盖的总面积为: {:,.2f} 公顷".format(total_area))

数据覆盖的总面积为: 65.19 公顷

Time overview

# 查看数据点的时间范围
print("数据点的时间范围为 {} 到 {}.".format(df.index.min(), df.index.max()))

数据点的时间范围为 2018-11-14 12:30:08 到 2019-11-07 05:30:10.

# 查看数据点的时间跨度
print("数据点的时间跨度为 {}".format(df.index.max() - df.index.min()))

数据点的时间跨度为 357 days 17:00:02

# 统计每一天的马颈圈数据
df['No'].resample('1d').count()

t
2018-11-14    23
2018-11-15    66
2018-11-16    58
2018-11-17    70
2018-11-18    79
              ..
2019-11-03    48
2019-11-04    48
2019-11-05    48
2019-11-06    48
2019-11-07    12
Freq: D, Name: No, Length: 359, dtype: int64

# 可视化数据量
df['No'].resample('1d').count().plot()

<Axes: xlabel='t'>

In addition, you can further count the number of samples for each month, and the code is as follows:

# 添加年份-月份列
df['Y-M'] = df.index.to_period('M')    
for i in df['Y-M'].unique():
    print("时间{}，样本数{}".format(i,len(df[df['Y-M']==i])))

时间2018-11，样本数938
时间2018-12，样本数1486
时间2019-01，样本数1489
时间2019-02，样本数1346
时间2019-03，样本数1488
时间2019-04，样本数1441
时间2019-05，样本数1650
时间2019-06，样本数1474
时间2019-07，样本数1556
时间2019-08，样本数1499
时间2019-09，样本数1439
时间2019-10，样本数1407
时间2019-11，样本数302

Step3 Data analysis

Sampling interval analysis

For sampling data, it is important to determine whether the data is spaced or not. If it's interval sampling, what is the sampling interval?

# 重置索引并提取't'列
t = df.reset_index()['t']
# 计算时间间隔，当每一项数据与前一项数据的时间差
df['delta_t'] = t.diff().values
# 将时间间隔转换为分钟
df['delta_t'] = df['delta_t'].dt.total_seconds() / 60
df['delta_t']

t
2018-11-14 12:30:08          NaN
2018-11-14 13:00:33    30.416667
2018-11-14 13:30:09    29.600000
2018-11-14 14:00:38    30.483333
2018-11-14 14:30:08    29.500000
                         ...    
2019-11-07 03:30:38    30.466667
2019-11-07 04:00:09    29.516667
2019-11-07 04:30:08    29.983333
2019-11-07 05:00:10    30.033333
2019-11-07 05:30:10    30.000000
Name: delta_t, Length: 17515, dtype: float64

Draw a time interval histogram below to see which value is in the interval set. The result is that the sampling method is uniform, and the sampling interval is usually about 30 minutes.

# 绘制时间间隔柱状图
# bins=60表示将数据范围划分成60个等宽的区间，每一个bin表示一个分钟
#range=(0, 60)表示设置直方图显示的数据范围在0到60之间。
df['delta_t'].hist(bins=60, range=(0, 60))

<Axes: >

Movement speed and direction analysis

The maximum speed of movement is used to determine whether unattainable speeds are included in the data.

# 'CollarID'只有一个，所以轨迹只有一个
tc = mpd.TrajectoryCollection(df, 'CollarID')
traj = tc.trajectories[0]
# 添加速度，单位是坐标系单位/s
traj.add_speed()
# 计算速度的最大值
max_speed = traj.df.speed.max()
print("The highest computed speed is {:,.2f} m/s ({:,.2f} km/h)".format(max_speed, max_speed*3600/1000))

The highest computed speed is 0.82 m/s (2.94 km/h)

Based on the velocity, we can look at the velocity distribution as follows:

pd.DataFrame(traj.df)['speed'].hist(bins=100)

<Axes: >

In turn, we can also analyze the distribution of the direction of motion.

traj.add_direction(overwrite=True)
pd.DataFrame(traj.df)['direction'].hist(bins=360)

<Axes: >

Time trend analysis

If we want to understand the relationship between horse movement trends and time, we can do so with the following code.

# 分析时间和速度的关系
tmp = pd.DataFrame(traj.df)[['speed']]
tmp = tmp.reindex(pd.DataFrame(traj.df).index)
# 生成小时和月份列
tmp["hour"] = tmp.index.hour
tmp["month"] = tmp.index.month
tmp.head()

speed	hour	month
t
2018-11-14 12:30:08	0.162635	12	11
2018-11-14 13:00:33	0.162635	13	11
2018-11-14 13:30:09	0.027761	13	11
2018-11-14 14:00:38	0.013517	14	11
2018-11-14 14:30:08	0.039568	14	11

Then we can aggregate the average speed of horses in different months and hours. It was found that the speed of movement by hour showed a significant difference in the number of years in which horses moved earlier and over a larger time span in the summer and later and less in the winter.

# 使用pivot_table函数计算平均速度在不同小时和月份的汇总数据
pivot_table = tmp.pivot_table(values='speed', index='hour', columns='month', aggfunc='mean')
# 创建热图
plt.figure(figsize=(9, 12))
# 横轴表示月份，纵轴表示小时，
sns.heatmap(pivot_table, annot=False, cmap='coolwarm')

<Axes: xlabel='month', ylabel='hour'>

Temperature trend analysis

In addition to time, the dataset also contains temperature information for each record, which we can also analyze.

# 分析温度和速度的关系
tmp = pd.DataFrame(traj.df)[['Temp [?C]', 'speed']]
tmp = tmp.reindex(pd.DataFrame(traj.df).index)
# 生成月份列
tmp["month"] = tmp.index.month
tmp.head()

Temp [? C]	speed	month
t
2018-11-14 12:30:08	21.0	0.162635	11
2018-11-14 13:00:33	17.0	0.162635	11
2018-11-14 13:30:09	17.0	0.027761	11
2018-11-14 14:00:38	17.0	0.013517	11
2018-11-14 14:30:08	18.0	0.039568	11

As you can see in the data above, each row represents a certain temperature at a certain time in a month. In this case, you can add a column to indicate the number of times, but it is all once.

tmp['n'] = 1
tmp.head()

Temp [? C]	speed	month	n
t
2018-11-14 12:30:08	21.0	0.162635	11	1
2018-11-14 13:00:33	17.0	0.162635	11	1
2018-11-14 13:30:09	17.0	0.027761	11	1
2018-11-14 14:00:38	17.0	0.013517	11	1
2018-11-14 14:30:08	18.0	0.039568	11	1

Based on the temperature and month, we can count the number of times each temperature occurs in each month, as follows:

# 使用pivot_table函数统计不同月份下不同温度值出现次数
pivot_table = tmp.pivot_table(values='n', index='Temp [?C]', columns='month', aggfunc='sum')
# 创建热图
plt.figure(figsize=(9, 12))
# 横轴表示月份，纵轴表示温度值
sns.heatmap(pivot_table, annot=False, cmap='coolwarm')

<Axes: xlabel='month', ylabel='Temp [?C]'>

Similarly, we can calculate the average of the velocity at different temperatures in different months.

# 使用pivot_table函数统计不同月份下不同温度值下速度平均值
pivot_table = tmp.pivot_table(values='speed', index='Temp [?C]', columns='month', aggfunc='mean')
# 创建热图
plt.figure(figsize=(9, 12))
# 横轴表示月份，纵轴表示温度值
sns.heatmap(pivot_table, annot=False, cmap='coolwarm')

<Axes: xlabel='month', ylabel='Temp [?C]'>

STEP3 Trajectory analysis

This step focuses on the analysis of individual trajectories. Therefore, it is necessary to split the continuous trajectories into separate trajectories. The results of the analysis depend on how the continuous flow is divided into trajectories, locations, and events.

Trajectory length analysis

# 按天为单位分割轨迹
daily = mpd.TemporalSplitter(tc).split(mode='day')
daily

TrajectoryCollection with 358 trajectories

# 计算每条轨迹的长度
daily_lengths = [traj.get_length() for traj in daily]
# 计算每条轨迹的开始时间
daily_t = [traj.get_start_time() for traj in daily]

# 建立轨迹开始时间和长度的dataframe
daily_lengths = pd.DataFrame(daily_lengths, index=daily_t, columns=['length'])
daily_lengths.head()

length
2018-11-14 12:30:08	1090.598526
2018-11-15 00:00:08	4219.980813
2018-11-16 00:00:10	3198.209140
2018-11-17 00:00:09	4307.483500
2018-11-18 00:00:09	3548.902314

Visualizing the relationship between the start time and the length of the trajectory, we can see that the seasonal trend is very consistent with the previously discovered pattern of seasonal movement speed: winter tracks tend to be shorter than summer tracks.

daily_lengths.plot()

<Axes: >

Dwell detection analysis

We can analyze the trajectory of a certain day separately and extract the dwell segment of the trajectory by dwell detection as follows:

# 设置最大直径为100
MAX_DIAMETER = 100

# 设置最小持续时间为3小时
MIN_DURATION = timedelta(hours=3)

# 获取指定日期（2018年11月17日）的轨迹数据，并赋值给变量one_day
one_day = daily.get_trajectory('30788_2018-11-17 00:00:00')

# 使用停留段检测器(mpd.TrajectoryStopDetector)获取指定轨迹数据的停留段，
# 并根据最小持续时间和最大直径进行筛选，结果存储在变量one_day_stops中
one_day_stops = mpd.TrajectoryStopDetector(one_day).get_stop_segments(
    min_duration=MIN_DURATION, max_diameter=MAX_DIAMETER)

The code for drawing the trajectory stop visualization is as follows:

# 绘制轨迹
# 创建一个绘图区域(ax)，并将指定日期（one_day）的轨迹数据以灰色(slategray)进行绘制
ax = one_day.plot(color='slategray')
# 将停留段数据（one_day_stops）以深粉色(deeppink)的颜色进行绘制
ax = one_day_stops.plot(ax=ax, color='deeppink')
# 获取停留段的起始位置，并将这些起始位置以红色(red)进行绘制
ax = one_day_stops.get_start_locations().plot(ax=ax, color='red')

We can also perform stop recognition for all time periods, the code is as follows:

# 获得停留轨迹点
stops = mpd.TrajectoryStopDetector(tc).get_stop_points(min_duration=MIN_DURATION, max_diameter=MAX_DIAMETER)
len(stops)

# 绘制这些停留的轨迹点
stops.plot(color='deeppink',alpha=0.2)

<Axes: >

In addition, you can plot the number of dwells for each dwell time:

stops['duration_h'] = (stops['end_time']-stops['start_time']).dt.total_seconds() / 3600
# 横轴为停留时间，纵轴为停留次数
pd.DataFrame(stops)['duration_h'].hist(bins=12)

<Axes: >

3 Examples of football analysis

This tutorial uses data extracted from video footage of Liverpool's football matches, with each row representing a point in time for the Liverpool team in a match in 2019. The dataset can be downloaded from liverpool_2019.csv.

STEP1 Data processing

input_file = "data/liverpool_2019.csv"
df = pd.read_csv(input_file)
# 删除列
df.drop(columns=['Unnamed: 0'], inplace=True)
print("样本数",len(df))
df.tail()

样本数 74936

bgcolor	dx	two	edgecolor	frame	play	player	player_num	team	x	and	with
74931	blue	0.0	0.0	white	120	Leicester 0 - [3] Liverpool	10267	In	defense	98.724826	53.720353	0.0
74932	blue	0.0	0.0	white	121	Leicester 0 - [3] Liverpool	10267	In	defense	98.724826	53.720353	0.0
74933	blue	0.0	0.0	white	122	Leicester 0 - [3] Liverpool	10267	In	defense	98.724826	53.720353	0.0
74934	blue	0.0	0.0	white	123	Leicester 0 - [3] Liverpool	10267	In	defense	98.724826	53.720353	0.0
74935	blue	0.0	0.0	white	124	Leicester 0 - [3] Liverpool	10267	In	defense	98.724826	53.720353	0.0

The column names of the read data are explained as follows:

play: The score situation after the goal is scored. The team that scored the goal is placed next to the brackets.
frame: the frame number of the current position. The data provided has 20 frames per second.
player: The player's number. This number is consistent from race to match, but may vary from race to match.
player_num: The player's jersey number. This number is the official number and has not changed in the 2019 Liverpool team.
x, y: the coordinates of the player/football. The court coordinates range from 0 to 100 on each axis.
dx, dy: the change in (x,y) coordinates from the previous frame to the current frame.
z: height, from 0 to 1.5 (only the height information of the football is filled).
bgcolor: The main color of the team (used as the background color).
edgecolor: The secondary color (used as the edge color).

Further we can process the data as follows:

# 获取唯一plays列表
plays = list(df.play.unique())
# 获得时间戳
def to_timestamp(row):
    day = plays.index(row.play) + 1
    start_time = datetime(2019, 1, day, 12, 0, 0)
    # 将frame转换为时间戳
    td = timedelta(milliseconds=1000 / 20 * row.frame)
    return start_time + td

# 将frame转换为时间戳，并添加到DataFrame的新列'time'中
df['time'] = df.apply(to_timestamp, axis=1)
# 将'time'列设为索引，以便后续使用时间作为索引来访问数据
df.set_index('time', inplace=True)

# 根据足球场地的标准尺寸，将x和y坐标缩放到合适的范围
# 许多职业球队的足球场地标准尺寸为105米乘以68米
pitch_length = 105
pitch_width = 68
df.x = df.x / 100 * pitch_length 
df.y = df.y / 100 * pitch_width

# 将以下列转化为分类数据
df['team'] = df['team'].astype('category').cat.as_ordered()
df['player'] = df['player'].astype('category').cat.as_ordered()
df['player_num'] = df['player_num'].astype('category').cat.as_ordered()

df.tail()

bgcolor	dx	two	edgecolor	frame	play	player	player_num	team	x	and	with
time
2019-01-19 12:00:06.000	blue	0.0	0.0	white	120	Leicester 0 - [3] Liverpool	10267	In	defense	103.661067	36.52984	0.0
2019-01-19 12:00:06.050	blue	0.0	0.0	white	121	Leicester 0 - [3] Liverpool	10267	In	defense	103.661067	36.52984	0.0
2019-01-19 12:00:06.100	blue	0.0	0.0	white	122	Leicester 0 - [3] Liverpool	10267	In	defense	103.661067	36.52984	0.0
2019-01-19 12:00:06.150	blue	0.0	0.0	white	123	Leicester 0 - [3] Liverpool	10267	In	defense	103.661067	36.52984	0.0
2019-01-19 12:00:06.200	blue	0.0	0.0	white	124	Leicester 0 - [3] Liverpool	10267	In	defense	103.661067	36.52984	0.0

Once you have the data, you can convert it into a trajectory:

CRS = None
tc = mpd.TrajectoryCollection(df, 'player', x='x', y='y', crs=CRS)
tc

TrajectoryCollection with 364 trajectories

Trajectory analysis

We can analyze the trajectory of a single player as follows:

# 获得球员轨迹
PLAY = 2
play_trajs = tc.filter('play', plays[PLAY])
play_trajs

TrajectoryCollection with 20 trajectories

# 绘制其进攻或防守轨迹
play_trajs.plot(column='team', colormap={'attack':'hotpink', 'defense':'turquoise'})

<Axes: >

# 简化轨迹
generalized = mpd.MinTimeDeltaGeneralizer(play_trajs).generalize(tolerance=timedelta(seconds=0.5))
generalized.add_speed()
# 绘制轨迹速度
generalized.plot(column='speed',legend=True)

<Axes: >

# 添加完整轨迹数据
ax = generalized.plot(column='team', colormap={'attack':'limegreen', 'defense':'purple'})
ax = generalized.get_start_locations().plot(ax = ax ,label='start', color='orange',zorder=2)
import matplotlib.image as mpimg

# 添加背景图片
# 下载地址https://github.com/movingpandas/movingpandas/raw/main/tutorials/data/soccer_field.png
img = mpimg.imread('data/soccer_field.png')
ax.imshow(img, extent=[0, pitch_length, 0, pitch_width])

<matplotlib.image.AxesImage at 0x7f0a747a1ed0>

4 See also

MovingPandas入门指北
liverpool_2019.csv
movingpandas
movingpandas-examples
movingpandas-examples-data
GeoPandas overlay background map

[Data Analysis & Visualization] Python Plotting Data Map 5 - MovingPandas Plot Example

1 Example of ship data analysis

2 Example of data analysis of horse collars

3 Examples of football analysis

4 See also

Read on

From now on, Great Wall brand advanced drawing erasers and H&M brand umbrellas are recalled

Great Wall brand advanced drawing eraser and H&M brand umbrella are recalled

FGO: FES2024 Servant adds new drawings to the handsome Musashi and Bedwell

A list of desperate zero full-staff explosion-clad and full-life paintings, and high-definition full-fledged drawings of all staff-bursting clothes

How the AI software of the AI painting generator generates cyberpunk drawing style picture works with one click

It's cute, and Mei is crying! This is the classic of mountains and seas that children love to see! Full-color graphic edition!

Triple screen design Jiyue 07 interior hand-drawn drawings were released

How to use the image generation function of AI software? Batch generate similar images with different drawing styles

Recognize flowers and map, and residents' autonomy makes the community beautiful to a new height

CAD绘图软件 Autodesk AutoCAD 2004-2025.0.0 珊瑚の海精简版

1. Draw pictures, my gallery and gallery sharing functions to generate pictures with different drawing styles

Taylor · Swift posted in support of Harris, and Musk's response sparked controversy. United States actress Taylor · Swift told her 283 million followers on Instagram

Pan Zhanle received the hand-drawn drawings drawn by the students, and in the face of reporters' questions, he began to "scold" people again!

Intelligently generate images with multiple drawing styles and multiple vertical sizes, a creative accelerator for AI painting experts

10.13 Sunday warm heart blessings, exquisite hand-painted blessings, may happiness be accompanied, and the year will be safe

Blue Route: Donghuang Ship Niang SR Ding'an Painter Drawing Disclosure The figure has been enhanced at an epic level

"Ark Survival Ascension" today launched a new map, the Greece world, and a new creature giant turtle

"Jing'an District Public Legal Service Map (Bilingual Version)" is online!

KK's official battle platform, the super cool map "Skyfall", unlocks a new way to defend! This time it's a little different!

The local siege of Gaode Map: It was accused of fraudulent tickets for scenic spots, and there were also hidden "special services" to wipe the edges

In the past two days, map service providers have been "panicking"

In the first half of October, the "Mosquito Map" was announced! Pay attention to mosquito prevention and control!

Follow the map to check in|In the heated swimming pool in Changning, the world champion took the first step in the swimming world

The Overlord of the Octopath Traveler Continent: The Man Who Granted Everything Chapter 3, the plot has that flavor, with a map

Take this foliage map with you to start the autumn travel mode

Surveying and Mapping Bulletin | Liu Jie: 2015-2023 map review service data analysis and problem response

In the seventh grade, 90% of students in the mid-term exam are easy to lose points and pass the "map" test center

I can't bear it, the Warriors' home China map erased Taiwan, tampered with the Chinese flag, and the Morey incident was repeated

Don't lie down during the New Year, let's unlock the "New Year's New Map" together!

The Cache author announces that the map has been reset! TYLOO lost to GL in three pictures

Where to go by car during the May Day holiday? 2025 domestic 12 essence self-driving route map strategy sharing

Where to go during the May Day holiday? 2025 15 Essence Self-driving Route Map Sharing (2)