MovingPandas is an open-source geospatio-temporal data processing library based on Python and GeoPandas, which is used to process the trajectory data of moving objects. It provides a powerful set of tools that make it easy to load, analyze, and visualize the trajectory of moving objects. By using MovingPandas, users can easily process and analyze moving object data and extract insights about behavior, patterns, and trends from it. Whether you're working with traffic flow data, logistics trajectory data, or animal migration data, MovingPandas is a powerful geovisualization tool.
The MovingPandas library has the following key features:
- Trajectory Objects: MovingPandas introduces Trajectory objects for representing and managing trajectory data. The object contains time, position, and attribute information to facilitate manipulation and analysis of the trajectory.
- Spatiotemporal slicing: MovingPandas supports spatiotemporal slicing operations, which can filter and segment trajectory data according to temporal and spatial conditions.
- Trajectory analysis: MovingPandas provides a wealth of trajectory analysis functions, including calculating trajectory length, velocity, direction, acceleration and other indicators, as well as trajectory clustering, interaction and similarity analysis.
- Trajectory visualization: MovingPandas can easily visualize trajectory data and support drawing trajectory lines, points, and heat maps on the map, helping users understand the behavior of moving objects more intuitively.
- Data format compatibility: MovingPandas supports a variety of common geographic data formats, including GeoJSON, Shapefile, etc., which is convenient for users to load and save trajectory data.
The official address of the MovingPandas repository is: movingpandas. The address of the official sample code repository for MovingPandas is: movingpandas-examples. All experimental data in this article comes from: movingpandas-examples-data.
All the code in this article can be found at: Python-Study-Notes
The author recommends installing MovingPandas in Python 3.8 or above, and recommends using conda for installation. MovingPandas can be installed using the following command:
conda install -c conda-forge movingpandas
Due to the complexity of MovingPandas' dependency environment, pip is not recommended for installation. If you insist on using pip for installation, you can install MovingPandas by following the following command:
pip install movingpandas
# 本文必安装第三方库
pip install contextily
# 以下第三方库可选
pip install hvplot
pip install cartopy
pip install geoviews
The following code shows the version of MovingPandas, which is Python 3.10.
# jupyter notebook环境去除warning
import warnings
warnings.filterwarnings("ignore")
# 查看movingpandas版本及依赖库版本信息
import movingpandas as mpd
mpd.show_versions()
MovingPandas 0.16.1
SYSTEM INFO
-----------
python : 3.10.10 (main, Mar 21 2023, 18:45:11) [GCC 11.2.0]
executable : /opt/conda/envs/python35-paddle120-env/bin/python
machine : Linux-5.4.0-104-generic-x86_64-with-glibc2.31
GEOS, GDAL, PROJ INFO
---------------------
GEOS : None
GEOS lib : None
GDAL : 3.6.4
GDAL data dir: /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/fiona/gdal_data
PROJ : 9.2.1
PROJ data dir: /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/pyproj/proj_dir/share/proj
PYTHON DEPENDENCIES
-------------------
geopandas : 0.13.2
pandas : 2.0.2
fiona : 1.9.4.post1
numpy : 1.24.3
shapely : 2.0.1
rtree : 1.0.1
pyproj : 3.6.0
matplotlib : 3.7.1
mapclassify: None
geopy : 2.3.0
holoviews : None
hvplot : None
geoviews : None
stonesoup : None
[toc]
1 Getting Started
1.1 Infrastructure
# 加载第三方库
import pandas as pd
import geopandas as gpd
import movingpandas as mpd
import shapely as shp
# hvPlot是建立在数据可视化库holoviews的一个高级封装库
# import hvplot.pandas
from geopandas import GeoDataFrame, read_file
from shapely.geometry import Point, LineString, Polygon
from datetime import datetime, timedelta
# hvplot通常与HoloViews集成使用
# from holoviews import opts
# 设置绘图界面
# opts.defaults(opts.Overlay(active_tools=['wheel_zoom'], frame_width=500, frame_height=400))
1.1.1 Trajectory类
MovingPandas provides a Trajectory object to represent an object for a single trajectory. It consists of a series of coordinate points. The Trajectory object initialization parameters are as follows:
class Trajectory:
def __init__(
self,
df,
traj_id,
obj_id=None,
t=None,
x=None,
y=None,
crs="epsg:4326",
parent=None,
):
The individual parameters are explained below:
- df: GeoDataFrame with Geometry coordinate column and timestamp index of GeoPandas, or DataFrame of Pandas. Required parameters.
- traj_id: Any type that represents a unique identifier for a track. Required parameters.
- obj_id: Any type, a unique identifier for a moving object. The default is None.
- t: indicates the column name that contains the timestamp, which is None by default.
- x: indicates the name of the column containing the x coordinates, which must be specified in the DataFrame of Pandas. The default is None.
- y: indicates the name of the column containing the y coordinates, which must be specified when using the DataFrame of Pandas. The default is None.
- CRS: A coordinate reference system that represents the X/Y coordinates. The default is "epsg:4326", which is WGS84.
- parent: A Trajectory object, representing the parent trajectory. The default is None.
As can be seen from the initialization parameters of the Trajectory object, MovingPandas represents the trajectory with the relevant point position information and its time information. Here's an example of how to use the Trajectory object.
First, create a GeoDataFrame object that represents the location and its corresponding time information.
import pandas as pd
from shapely.geometry import Point
from geopandas import GeoDataFrame
import movingpandas as mpd
from datetime import datetime
# 创建DataFrame对象
df = pd.DataFrame([
{'geometry': Point(0, 0), 't': datetime(2023, 7, 1, 12, 0, 0)},
{'geometry': Point(6, 0), 't': datetime(2023, 7, 1, 12, 6, 0)},
{'geometry': Point(6, 6), 't': datetime(2023, 7, 1, 12, 10, 0)},
{'geometry': Point(9, 9), 't': datetime(2023, 7, 1, 12, 15, 0)}])
# 设置索引轴
df = df.set_index('t')
# 创建GeoDataFrame对象
# EPSG:3857是在线地图常用的投影坐标系,以米为单位
gdf = GeoDataFrame(df, crs='EPSG:3857')
gdf
geometry | |
t | |
2023-07-01 12:00:00 | POINT (0.000 0.000) |
2023-07-01 12:06:00 | POINT (6.000 0.000) |
2023-07-01 12:10:00 | POINT (6.000 6.000) |
2023-07-01 12:15:00 | POINT (9.000 9.000) |
A trajectory object is then created based on the GeoDataFrame object.
# 创建Trajectory对象
# 1表示轨迹标识符
toy_traj = mpd.Trajectory(gdf, 1)
toy_traj
Trajectory 1 (2023-07-01 12:00:00 to 2023-07-01 12:15:00) | Size: 4 | Length: 16.2m
Bounds: (0.0, 0.0, 9.0, 9.0)
LINESTRING (0 0, 6 0, 6 6, 9 9)
Next, the trajectory path can be visualized, and the horizontal and vertical axes represent the xy coordinates of the points.
# 可视化路径
toy_traj.plot()
<Axes: >
# 交互式展示
# toy_traj.hvplot()
In addition, you can also directly call the toy_traj GeoPandas object for visualization.
toy_traj.df.plot()
<Axes: >
1.1.2 TrajectoryCollection类
The TrajectoryCollection class is a collection that is used to store multiple Trajectory objects. It provides the ability to manipulate and analyze the entire collection of trajectories. You can use TrajectoryCollection to merge multiple tracks into a single object, filter tracks, do spatial querying, visualize, and more. The initialization parameters of the TrajectoryCollection object are as follows:
class TrajectoryCollection:
def __init__(
self,
data,
traj_id_col=None,
obj_id_col=None,
t=None,
x=None,
y=None,
crs="epsg:4326",
min_length=0,
min_duration=None,
)
Most of the parameters are the same as those of the Trajectory object, and the different parameters are explained as follows:
- min_length: The minimum length of the track, below which tracks will be discarded.
- min_duration: The expected minimum duration of the track, tracks below that duration will be discarded.
Create a TrajectoryCollection object directly
# 创建两条轨迹
df = pd.DataFrame(
{
"t": pd.date_range("2023-07-01", periods=5, freq="min"),
"trajectory_id": [1, 1, 2, 2, 2],
"geometry": [Point(0, 0), Point(0, 1), Point(1, 2), Point(1, 3), Point(2, 4)],
}
)
gdf = gpd.GeoDataFrame(df, crs='EPSG:3857')
gdf
t | trajectory_id | geometry | |
2023-07-01 00:00:00 | 1 | POINT (0.000 0.000) | |
1 | 2023-07-01 00:01:00 | 1 | POINT (0.000 1.000) |
2 | 2023-07-01 00:02:00 | 2 | POINT (1.000 2.000) |
3 | 2023-07-01 00:03:00 | 2 | POINT (1.000 3.000) |
4 | 2023-07-01 00:04:00 | 2 | POINT (2.000 4.000) |
# 创建TrajectoryCollection对象,用trajectory_id作为单个轨迹的唯一标识
tc = mpd.TrajectoryCollection(gdf, traj_id_col='trajectory_id', t='t')
tc
TrajectoryCollection with 2 trajectories
# 绘制轨迹, column指定轨迹对象,legend展示轨迹持续时间
import matplotlib.cm as cm
tc.plot(column='trajectory_id', legend=True)
<Axes: >
# 交互式展示轨迹
# tc.hvplot()
In addition, we can extract individual trajectories from the TrajectoryCollection, or filter the desired trajectories.
# 从TrajectoryCollection提取单个轨迹
my_traj = tc.trajectories[1]
print(my_traj)
Trajectory 2 (2023-07-01 00:02:00 to 2023-07-01 00:04:00) | Size: 3 | Length: 2.4m
Bounds: (1.0, 2.0, 2.0, 4.0)
LINESTRING (1 2, 1 3, 2 4)
# 筛选路径持续时间大于一分钟的
min_duration = timedelta(minutes=1)
tc.trajectories = [traj for traj in tc if traj.get_duration() > min_duration]
print(tc.trajectories)
[Trajectory 2 (2023-07-01 00:02:00 to 2023-07-01 00:04:00) | Size: 3 | Length: 2.4m
Bounds: (1.0, 2.0, 2.0, 4.0)
LINESTRING (1 2, 1 3, 2 4)]
Create a TrajectoryCollection object from file data
TrajectoryCollections can be created by reading data from a variety of geospatial data stores using the GeoPandas function, or by reading data from a variety of file types using Pandas.
The following shows how to create a TrajectoryCollection object from a csv file and download the data from movingpandas-examples-data
# 读取数据
df = pd.read_csv('data/geolife_small.csv', delimiter=';')
df.head()
X | And | Fid | id | sequence | trajectory_id | tracker | t | |
116.391305 | 39.898573 | 1 | 1 | 1 | 1 | 19 | 2008-12-11 04:42:14+00 | |
1 | 116.391317 | 39.898617 | 2 | 2 | 2 | 1 | 19 | 2008-12-11 04:42:16+00 |
2 | 116.390928 | 39.898613 | 3 | 3 | 3 | 1 | 19 | 2008-12-11 04:43:26+00 |
3 | 116.390833 | 39.898635 | 4 | 4 | 4 | 1 | 19 | 2008-12-11 04:43:32+00 |
4 | 116.389410 | 39.898723 | 5 | 5 | 5 | 1 | 19 | 2008-12-11 04:43:47+00 |
# 转换为traj_collection对象
traj_collection = mpd.TrajectoryCollection(df, 'trajectory_id', t='t', x='X', y='Y')
print(traj_collection)
TrajectoryCollection with 5 trajectories
# 绘制轨迹
traj_collection.plot(column='trajectory_id', legend=True, figsize=(9,5))
<Axes: >
1.2 Basic Functions
The functions in MovingPandas can be applied to both a single Trajectory object and a TrajectoryCollection object to enable the analysis of single and multiple trajectories.
1.2.1 Move Info Add Function
MovingPandas provides mobile information adding functions such as add_speed, add_distance, add_direction, add_timedelta, add_timedelta, and add_acceleration to process moving data. The interfaces for these functions are as follows:
add_speed(overwrite=False, name=SPEED_COL_NAME, units=UNITS())
add_distance(overwrite=False, name=DISTANCE_COL_NAME, units=None)
add_direction(overwrite=False, name=DIRECTION_COL_NAME)
add_angular_difference(overwrite=False,name=ANGULAR_DIFFERENCE_COL_NAM):
add_timedelta(overwrite=False, name=TIMEDELTA_COL_NAME)
add_acceleration(overwrite=False, name=ACCELERATION_COL_NAME, units=UNITS())
All functions use system defaults or global variables, and the specific functions are described below:
- add_speed: Used to calculate the speed of moving objects. The velocity is calculated based on the distance and time difference between the two time points. The overwrite parameter indicates whether to overwrite the existing velocity column, name indicates the name of the velocity column, units indicates the unit of velocity, and the default value is used if the name and unit are not specified.
- add_distance: Used to calculate the distance of moving objects. The distance is calculated based on the straight-line distance difference between two time points. The overwrite parameter indicates whether to overwrite the existing distance column, name indicates the name of the distance column, and units indicates the unit of distance.
- add_direction: Used to calculate the direction of moving objects. The direction is calculated based on the difference in coordinates between two time points. The overwrite parameter indicates whether to overwrite the existing direction column, and name indicates the name of the direction column. The direction value is in degrees, rotating clockwise from north, with a range value of [0,360].
- add_angular_difference: Used to calculate the angular difference between two objects. The angular difference between two objects is calculated based on the orientation information of the moving object. The overwrite parameter indicates whether to overwrite the existing direction column, and name indicates the name of the direction column. The angular range value is [0,180].
- add_timedelta: Used to calculate the time difference of moving objects. The time difference is calculated based on the difference between the timestamps. The overwrite parameter indicates whether to overwrite the existing time difference column, and name indicates the name of the time difference column. The time difference is the difference between the current row and the previous row.
- add_acceleration: Used to calculate the acceleration of a moving object. Acceleration is calculated based on velocity and time difference. The overwrite parameter indicates whether to overwrite the existing acceleration column, name indicates the name of the acceleration column, and units indicates the unit of acceleration.
The following code describes the use of these functions in detail.
# 创建轨迹
df = pd.DataFrame([
{'geometry':Point(0,0), 't':datetime(2023,7,1,12,0,0)},
{'geometry':Point(6,0), 't':datetime(2023,7,1,12,0,6)},
{'geometry':Point(6,6), 't':datetime(2023,7,1,12,0,11)},
{'geometry':Point(9,9), 't':datetime(2023,7,1,12,0,14)}
]).set_index('t')
gdf = GeoDataFrame(df, crs='EPSG:3857')
toy_traj = mpd.Trajectory(gdf, 1)
toy_traj
Trajectory 1 (2023-07-01 12:00:00 to 2023-07-01 12:00:14) | Size: 4 | Length: 16.2m
Bounds: (0.0, 0.0, 9.0, 9.0)
LINESTRING (0 0, 6 0, 6 6, 9 9)
# 查看轨迹的坐标系,以米为单位
toy_traj.crs
<Projected CRS: EPSG:3857>
Name: WGS 84 / Pseudo-Mercator
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: World between 85.06°S and 85.06°N.
- bounds: (-180.0, -85.06, 180.0, 85.06)
Coordinate Operation:
- name: Popular Visualisation Pseudo-Mercator
- method: Popular Visualisation Pseudo Mercator
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
# 添加速度列,不指定单位则以米为单位
toy_traj.add_speed(overwrite=True)
toy_traj.df
geometry | speed | |
t | ||
2023-07-01 12:00:00 | POINT (0.000 0.000) | 1 |
2023-07-01 12:00:06 | POINT (6.000 0.000) | 1 |
2023-07-01 12:00:11 | POINT (6.000 6.000) | 1.200000 |
2023-07-01 12:00:14 | POINT (9.000 9.000) | 1.414214 |
# 添加自定义定单位的速度列
toy_traj.add_speed(overwrite=True, name="speed (ft/min)", units=("ft", "min"))
toy_traj.df
geometry | speed | speed (ft/min) | |
t | |||
2023-07-01 12:00:00 | POINT (0.000 0.000) | 1 | 196.850394 |
2023-07-01 12:00:06 | POINT (6.000 0.000) | 1 | 196.850394 |
2023-07-01 12:00:11 | POINT (6.000 6.000) | 1.200000 | 236.220472 |
2023-07-01 12:00:14 | POINT (9.000 9.000) | 1.414214 | 278.388497 |
# 其他列的添加
toy_traj.add_distance(overwrite=True, name="distance (m)", units="m")
toy_traj.add_direction(overwrite=True)
toy_traj.add_timedelta(overwrite=True)
toy_traj.add_angular_difference(overwrite=True)
toy_traj.add_acceleration(overwrite=True, name="acceleration (mph/s)", units=("mi", "h", "s"))
toy_traj.df
geometry | speed | speed (ft/min) | distance (m) | direction | timedelta | angular_difference | acceleration (mph/s) | |
t | ||||||||
2023-07-01 12:00:00 | POINT (0.000 0.000) | 1 | 196.850394 | 0.000000 | 90.0 | Night | 0.000000 | 0.000000 |
2023-07-01 12:00:06 | POINT (6.000 0.000) | 1 | 196.850394 | 6.000000 | 90.0 | 0 days 00:00:06 | 0.000000 | 0.000000 |
2023-07-01 12:00:11 | POINT (6.000 6.000) | 1.200000 | 236.220472 | 6.000000 | 0.000000 | 0 days 00:00:05 | 90.0 | 0.089477 |
2023-07-01 12:00:14 | POINT (9.000 9.000) | 1.414214 | 278.388497 | 4.242641 | 45.0 | 0 days 00:00:03 | 45.0 | 0.159727 |
# 可视化某列结果绘制结果
toy_traj.plot(column='speed', linewidth=5, capstyle='round', figsize=(9,3), legend=True, vmax=20)
<Axes: >
1.2.2 Trajectory position extraction function
MovingPandas provides track position extraction functions such as add_speed, add_distance, add_direction, add_timedelta, add_timedelta, and add_acceleration to extract track position-related information. The specific functions are described as follows:
- get_start_location(): This function is used to get the starting position of the moving object. It returns the coordinates of the starting position of the moving object.
- get_end_location(): This function is used to get the end position of the moving object. It returns the coordinates of the end position of the moving object.
- get_position_at(t, method="interpolated"): This function is used to get the position of a moving object at a given point in time (t). The parameter t indicates the point in time of the desired location, while the optional parameter method specifies the method used to get the location, which defaults to "interpolated", which means that the location is obtained using the interpolation method.
- get_segment_between(t1, t2): This function is used to get a fragment of a moving object between two given points in time (t1 and t2). It returns a subset of objects that were moved during that time.
- clip(polygon, point_based=False): This function is used to clip the moving object so that it only retains the part within the specified polygon area. The parameter polygon represents the area of the polygon used for clipping, while the optional parameter point_based specify whether clipping is based on points rather than segments. By default, clipping is segment-based. In addition, MovingPandas also provides an intersection function to calculate the intersection between the trajectory data and the space geometry, which is similar to the clip function, which can be used in the official documentation.
The following code describes the use of these functions in detail.
# 创建轨迹
df = pd.DataFrame([
{'geometry':Point(0,0), 't':datetime(2023,7,1,12,0,0)},
{'geometry':Point(6,0), 't':datetime(2023,7,1,12,6,6)},
{'geometry':Point(6,6), 't':datetime(2023,7,1,12,10,11)},
{'geometry':Point(9,9), 't':datetime(2023,7,1,12,19,14)}
]).set_index('t')
gdf = GeoDataFrame(df, crs='EPSG:3857')
toy_traj = mpd.Trajectory(gdf, 1)
toy_traj
Trajectory 1 (2023-07-01 12:00:00 to 2023-07-01 12:19:14) | Size: 4 | Length: 16.2m
Bounds: (0.0, 0.0, 9.0, 9.0)
LINESTRING (0 0, 6 0, 6 6, 9 9)
Start and end positions
ax = toy_traj.plot()
# 提取轨迹起始点和结束点
gpd.GeoSeries(toy_traj.get_start_location()).plot(ax=ax, color='blue')
gpd.GeoSeries(toy_traj.get_end_location()).plot(ax=ax, color='red')
<Axes: >
Intermediate position
# 提取轨迹中的一个时间点的位置
t = datetime(2023,7,1,12,6,0)
# 改时间点不同位置信息获取方式
print(toy_traj.get_position_at(t, method="nearest"))
print(toy_traj.get_position_at(t, method="interpolated"))
print(toy_traj.get_position_at(t, method="ffill"))
print(toy_traj.get_position_at(t, method="bfill"))
POINT (6 0)
POINT (5.9016393442622945 0)
POINT (0 0)
POINT (6 0)
point = toy_traj.get_position_at(t, method="interpolated")
ax = toy_traj.plot()
gpd.GeoSeries(point).plot(ax=ax, color='red', markersize=100)
<Axes: >
Track fragments
# 提取轨迹中的一个片段
segment = toy_traj.get_segment_between(datetime(2023,7,1,12,6,0), datetime(2023,7,1,12,12,0))
print(segment)
Trajectory 1_2023-07-01 12:06:00 (2023-07-01 12:06:06 to 2023-07-01 12:10:11) | Size: 2 | Length: 6.0m
Bounds: (6.0, 0.0, 6.0, 6.0)
LINESTRING (6 0, 6 6)
ax = toy_traj.plot()
segment.plot(ax=ax, color='red', linewidth=5)
<Axes: >
Trajectory area
xmin, xmax, ymin, ymax = 2, 8, -10, 5
# 设置区域
polygon = Polygon([(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin), (xmin, ymin)])
# 提取位于该区域的轨迹
intersections = toy_traj.clip(polygon)
intersections
TrajectoryCollection with 1 trajectories
# 绘制轨迹区域
ax = toy_traj.plot()
gpd.GeoSeries(polygon).plot(ax=ax, color='lightgray')
intersections.plot(ax=ax, color='red', linewidth=5, capstyle='round')
<Axes: >
# 单独绘制轨迹区域
intersections = toy_traj.clip(polygon)
intersections.plot()
<Axes: >
1.2.3 与GeoPandas交互
MovingPandas provides to_point_gdf, to_line_gdf, and to_traj_gdf functions to convert the trajectory classes Trajectories and TrajectoryCollections back to GeoDataFrames structures for GeoPandas according to different rules. These functions are explained as follows:
- to_point_gdf (return_orig_tz=False) to convert the track object to a GeoDataFrame associated with each point. return_orig_tz is used to specify whether to return a timestamp that matches the original time zone. The default is False, and the timestamp is converted according to the local time zone.
- to_line_gdf() to convert the track object to a GeoDataFrame associated with each line.
- to_traj_gdf(wkt=False, agg=False) returns a GeoDataFrame, each row representing a track. wkt is used to specify whether the trajectory geometry is represented in the Well-Known Text (WKT) format. Defaults to False, which returns geometric objects. AGG is used to specify whether or not to aggregate tracks. The default value is False, no aggregation is performed, and the optional parameters of agg are shown in the code.
# 读取数据
df = pd.read_csv('data/geolife_small.csv', delimiter=';')
df.head()
# 转换为traj_collection对象
traj_collection = mpd.TrajectoryCollection(df, 'trajectory_id', t='t', x='X', y='Y')
print(traj_collection)
TrajectoryCollection with 5 trajectories
# 转换为点GeoDataFrame
traj_collection.to_point_gdf().head()
Fid | id | sequence | trajectory_id | tracker | geometry | |
t | ||||||
2008-12-11 04:42:14 | 1 | 1 | 1 | 1 | 19 | POINT (116.39131 39.89857) |
2008-12-11 04:42:16 | 2 | 2 | 2 | 1 | 19 | POINT (116.39132 39.89862) |
2008-12-11 04:43:26 | 3 | 3 | 3 | 1 | 19 | POINT (116.39093 39.89861) |
2008-12-11 04:43:32 | 4 | 4 | 4 | 1 | 19 | POINT (116.39083 39.89863) |
2008-12-11 04:43:47 | 5 | 5 | 5 | 1 | 19 | POINT (116.38941 39.89872) |
# 转换为线GeoDataFrame
traj_collection.to_line_gdf().head()
Fid | id | sequence | trajectory_id | tracker | t | prev_t | geometry | |
2 | 2 | 2 | 1 | 19 | 2008-12-11 04:42:16 | 2008-12-11 04:42:14 | LINESTRING(116.39131 39.89857, 116.39132 39.8... | |
1 | 3 | 3 | 3 | 1 | 19 | 2008-12-11 04:43:26 | 2008-12-11 04:42:16 | LINESTRING(116.39132 39.89862, 116.39093 39.8... |
2 | 4 | 4 | 4 | 1 | 19 | 2008-12-11 04:43:32 | 2008-12-11 04:43:26 | LINESTRING(116.39093 39.89861, 116.39083 39.8... |
3 | 5 | 5 | 5 | 1 | 19 | 2008-12-11 04:43:47 | 2008-12-11 04:43:32 | LINESTRING(116.39083 39.89863, 116.38941 39.8... |
4 | 6.000000 | 6.000000 | 6.000000 | 1 | 19 | 2008-12-11 04:43:50 | 2008-12-11 04:43:47 | LINESTRING (116.38941 39.89872, 116.39052 39.8... |
# 每条轨迹单独聚合GeoDataFrame
traj_collection.to_traj_gdf()
traj_id | start_t | end_t | geometry | length | direction | |
1 | 2008-12-11 04:42:14 | 2008-12-11 05:15:46 | LINESTRING(116.39131 39.89857, 116.39132 39.8... | 6207.020261 | 186.681376 | |
1 | 2 | 2009-06-29 07:02:25 | 2009-06-29 11:13:12 | LINESTRING(116.59096 40.07196, 116.59091 40.0... | 38764.575483 | 250.585295 |
2 | 3 | 2009-02-04 04:32:53 | 2009-02-04 11:20:12 | LINESTRING (116.38569 39.89977, 116.38565 39.8... | 12745.157506 | 304.115160 |
3 | 4 | 2009-03-10 10:36:45 | 2009-03-10 12:01:07 | LINESTRING (116.38805 39.90342, 116.38804 39.9... | 14363.780551 | 300.732843 |
4 | 5 | 2009-02-25 09:47:03 | 2009-02-25 14:31:24 | LINESTRING(116.38526 39.90027, 116.38525 39.9... | 39259.779560 | 305.200501 |
1.3 Trajectory processing
1.3.1 Trajectory extraction
MovingPandas provides TemporalSplitter, ObservationGapSplitter, StopSplitter, and SpeedSplitter classes to extract specified track segments from tracks according to different rules. The details are as follows:
- TemporalSplitter: Splits the track into sub-intervals using regular time intervals.
- ObservationGapSplitter: Splits the trajectory into sub-intervals based on the interval between observation times. If the interval between two consecutive observations exceeds the specified threshold, the trajectory is considered to need to be split.
- StopSplitter: Splits the trajectory into sub-intervals based on the definition of the stops. A stop is when a trajectory stays in a relatively small area for a period of time.
- SpeedSplitter: Splits the trajectory into sub-sections based on the speed threshold. If the trajectory velocity falls below a specified threshold at a certain segment, you need to split the trajectory here.
These classes only need to be initialized with the pending tracks, and then the split function is called to extract the traces. See the following code for specific use.
import matplotlib.pyplot as plt
# 读取数据
gdf = read_file('data/geolife_small.gpkg').to_crs('EPSG:3857')
tc = mpd.TrajectoryCollection(gdf, 'trajectory_id', t='t')
# 共五条轨迹
tc.to_traj_gdf()
traj_id | start_t | end_t | geometry | length | direction | |
1 | 2008-12-11 04:42:14 | 2008-12-11 05:15:46 | LINESTRING(12956620.805 4851214.113, 12956622... | 8101.428690 | 186.679744 | |
1 | 2 | 2009-06-29 07:02:25 | 2009-06-29 11:13:12 | LINESTRING(12978845.964 4876404.973, 12978840... | 50621.731208 | 250.500509 |
2 | 3 | 2009-02-04 04:32:53 | 2009-02-04 11:20:12 | LINESTRING(12955995.635 4851388.236, 12955991... | 16626.383723 | 304.099365 |
3 | 4 | 2009-03-10 10:36:45 | 2009-03-10 12:01:07 | LINESTRING(12956258.794 4851917.156, 12956257... | 18739.337154 | 300.716597 |
4 | 5 | 2009-02-25 09:47:03 | 2009-02-25 14:31:24 | LINESTRING(12955947.434 4851460.354, 12955946... | 51327.869585 | 305.185128 |
TemporalSplitter
# 将轨迹切分为以每小时为单位的轨迹片段,min_length表示删除轨迹长度小于该值的轨迹片段,min_lwngth取决于轨迹的单位
split = mpd.TemporalSplitter(tc).split(mode="hour", min_length=20000)
split.to_traj_gdf()
traj_id | start_t | end_t | geometry | length | direction | |
2_2009-06-29 07:00:00 | 2009-06-29 07:02:25 | 2009-06-29 07:59:55 | LINESTRING(12978845.964 4876404.973, 12978840... | 40106.105641 | 245.123482 | |
1 | 5_2009-02-25 10:00:00 | 2009-02-25 10:00:04 | 2009-02-25 10:54:47 | LINESTRING(12955251.019 4851210.485, 12955248... | 25455.673254 | 337.129155 |
# 绘制轨迹
fig, axes = plt.subplots(nrows=1, ncols=len(split), figsize=(6,4))
for i, traj in enumerate(split):
traj.plot(ax=axes[i], linewidth=3.0, capstyle='round', column='speed', vmax=20)
ObservationGapSplitter
# 如果两个连续观测超过间隔gap,如30分钟,则认为该轨迹需要拆分
split = mpd.ObservationGapSplitter(tc).split(gap=timedelta(minutes=30),min_length=0)
split.to_traj_gdf()
traj_id | start_t | end_t | geometry | length | direction | |
1_0 | 2008-12-11 04:42:14 | 2008-12-11 05:15:46 | LINESTRING(12956620.805 4851214.113, 12956622... | 8101.428690 | 186.679744 | |
1 | 2_0 | 2009-06-29 07:02:25 | 2009-06-29 08:20:15 | LINESTRING(12978845.964 4876404.973, 12978840... | 47469.321958 | 252.952783 |
2 | 2_1 | 2009-06-29 10:57:17 | 2009-06-29 11:13:12 | LINESTRING(12948649.439 4867034.108, 12948650... | 3040.348707 | 139.615947 |
3 | 3_0 | 2009-02-04 04:32:53 | 2009-02-04 04:35:03 | LINESTRING(12955995.635 4851388.236, 12955991... | 311.729231 | 42.937430 |
4 | 3_1 | 2009-02-04 10:03:21 | 2009-02-04 11:20:12 | LINESTRING(12956011.999 4851497.646, 12956043... | 16228.264596 | 303.229612 |
5 | 4_0 | 2009-03-10 10:36:45 | 2009-03-10 12:01:07 | LINESTRING(12956258.794 4851917.156, 12956257... | 18739.337154 | 300.716597 |
6.000000 | 5_0 | 2009-02-25 09:47:03 | 2009-02-25 10:54:47 | LINESTRING(12955947.434 4851460.354, 12955946... | 27149.500896 | 335.377580 |
7 | 5_1 | 2009-02-25 13:30:22 | 2009-02-25 14:31:24 | LINESTRING(12945965.972 4873487.115, 12945952... | 24074.321167 | 165.727187 |
StopSplitter
# 如果轨迹在某一点为圆心,直径为10范围内停留60s,则认为轨迹需要在该段分割
split = mpd.StopSplitter(tc).split(max_diameter=10, min_duration=timedelta(seconds=60))
split.to_traj_gdf()
traj_id | start_t | end_t | geometry | length | direction | |
1_2008-12-11 04:42:14 | 2008-12-11 04:42:14 | 2008-12-11 05:15:46 | LINESTRING(12956620.805 4851214.113, 12956622... | 8101.428690 | 186.679744 | |
1 | 2_2009-06-29 07:02:25 | 2009-06-29 07:02:25 | 2009-06-29 07:05:30 | LINESTRING(12978845.964 4876404.973, 12978840... | 608.233016 | 29.527683 |
2 | 2_2009-06-29 07:06:55 | 2009-06-29 07:06:55 | 2009-06-29 08:02:40 | LINESTRING(12979026.970 4876730.251, 12979022... | 41655.491556 | 246.215181 |
3 | 2_2009-06-29 08:03:40 | 2009-06-29 08:03:40 | 2009-06-29 11:13:12 | LINESTRING(12949605.674 4863764.794, 12949579... | 8333.942283 | 357.660458 |
4 | 3_2009-02-04 04:32:53 | 2009-02-04 04:32:53 | 2009-02-04 11:20:12 | LINESTRING(12955995.635 4851388.236, 12955991... | 16626.383723 | 304.099365 |
5 | 4_2009-03-10 10:36:45 | 2009-03-10 10:36:45 | 2009-03-10 12:01:07 | LINESTRING(12956258.794 4851917.156, 12956257... | 18739.337154 | 300.716597 |
6.000000 | 5_2009-02-25 09:47:03 | 2009-02-25 09:47:03 | 2009-02-25 14:31:24 | LINESTRING(12955947.434 4851460.354, 12955946... | 51327.869585 | 305.185128 |
SpeedSplitter
# 把超过30分钟速度低于1地理系统单位/秒的轨迹分开
split = mpd.SpeedSplitter(tc).split(speed=1, duration=timedelta(minutes=30))
split.to_traj_gdf()
traj_id | start_t | end_t | geometry | length | direction | |
1_0 | 2008-12-11 04:42:14 | 2008-12-11 05:15:46 | LINESTRING(12956620.805 4851214.113, 12956622... | 8048.160604 | 186.679744 | |
1 | 2_0 | 2009-06-29 07:02:25 | 2009-06-29 08:20:15 | LINESTRING(12978845.964 4876404.973, 12978840... | 47336.977010 | 252.952783 |
2 | 2_1 | 2009-06-29 10:57:22 | 2009-06-29 11:10:07 | LINESTRING(12948650.441 4867044.718, 12948642... | 2915.988294 | 138.780873 |
3 | 3_0 | 2009-02-04 04:32:53 | 2009-02-04 04:35:03 | LINESTRING(12955995.635 4851388.236, 12955991... | 310.440780 | 42.937430 |
4 | 3_1 | 2009-02-04 10:03:23 | 2009-02-04 11:20:12 | LINESTRING(12956043.836 4851524.490, 12956025... | 15962.930350 | 302.882421 |
5 | 4_0 | 2009-03-10 10:36:45 | 2009-03-10 12:01:07 | LINESTRING(12956258.794 4851917.156, 12956257... | 18349.431950 | 300.716597 |
6.000000 | 5_0 | 2009-02-25 09:47:03 | 2009-02-25 10:54:47 | LINESTRING(12955947.434 4851460.354, 12955946... | 27081.554127 | 335.377580 |
7 | 5_1 | 2009-02-25 13:30:23 | 2009-02-25 14:31:24 | LINESTRING(12945952.057 4873516.928, 12945956... | 24006.683028 | 165.708568 |
1.3.2 Trajectory compression and smoothing
MovingPandas provides a variety of trajectory compression classes and trajectory smoothing classes to reduce the size of the trajectory object (the number of points) for faster processing.
The compressed trajectory class only needs to be initialized with the desired trajectory, and then the generalize function is called to compress the trajectory. The classes used to compress the trajectory are:
- MinDistanceGeneralizer: Compresses the track by deleting some points in the original data, and the distance between the deleted points and adjacent points must be less than the specified minimum distance.
- DouglasPeuckerGeneralizer (Douglas-Pucker): The DouglasPeucker algorithm gradually deletes some points from the original data based on a specified threshold, resulting in an approximate simplified line string or trajectory.
- MaxDistanceGeneralizer: Tracks are compressed by removing points in the original data that are more than a specified maximum distance from adjacent points.
- MinTimeDeltaGeneralizer (minimum time interval): Track compression is achieved by removing data points where the time interval between two consecutive timestamps is less than the specified minimum time interval.
- TopDownTimeRatioGeneralizer: Gradually deletes data points from the original data based on a preset time ratio.
At present, the KalmanSmootherCV class only provides the KalmanSmootherCV class for trajectory smoothing, and the KalmanSmootherCV requires the installation of a third library and the processing process is troublesome, so it is generally a compression trajectory class with similar effects.
# 读取数据
gdf = read_file('data/geolife_small.gpkg').to_crs('EPSG:3857')
tc = mpd.TrajectoryCollection(gdf, 'trajectory_id', t='t')
# 提取单条轨迹进行操作
original_traj = tc.trajectories[1]
# 可以看到轨迹包括897个点
print(original_traj)
Trajectory 2 (2009-06-29 07:02:25 to 2009-06-29 11:13:12) | Size: 897 | Length: 50621.7m
Bounds: (12948595.449314836, 4861831.088215791, 12979030.643375682, 4877940.244020148)
LINESTRING (12978845.964340456 4876404.972802613, 12978840.175726935 4876411.664456934, 12978837.281
# tolerance设置连续点之间的最小距离
mpd.MinDistanceGeneralizer(original_traj).generalize(tolerance=1.0)
Trajectory 2 (2009-06-29 07:02:25 to 2009-06-29 11:13:12) | Size: 818 | Length: 50611.6m
Bounds: (12948595.449314836, 4861831.088215791, 12979030.643375682, 4877940.244020148)
LINESTRING (12978845.964340456 4876404.972802613, 12978840.175726935 4876411.664456934, 12978837.281
# tolerance表示距离公差,具体使用见算法介绍
mpd.DouglasPeuckerGeneralizer(original_traj).generalize(tolerance=1.0)
Trajectory 2 (2009-06-29 07:02:25 to 2009-06-29 11:13:12) | Size: 564 | Length: 50606.8m
Bounds: (12948595.449314836, 4861831.088215791, 12979030.643375682, 4877940.244020148)
LINESTRING (12978845.964340456 4876404.972802613, 12978837.281420175 4876414.573873267, 12978852.086
# tolerance设置连续点之间的最大距离
mpd.MaxDistanceGeneralizer(original_traj).generalize(tolerance=1.0)
Trajectory 2 (2009-06-29 07:02:25 to 2009-06-29 11:13:12) | Size: 324 | Length: 50166.8m
Bounds: (12948595.449314836, 4861831.088215791, 12979029.752819754, 4877940.244020148)
LINESTRING (12978845.964340456 4876404.972802613, 12978837.281420175 4876414.573873267, 12978852.086
# tolerance连续轨迹最小的时间距离
mpd.MinTimeDeltaGeneralizer(original_traj).generalize(tolerance=timedelta(minutes=1))
Trajectory 2 (2009-06-29 07:02:25 to 2009-06-29 11:13:12) | Size: 76 | Length: 47565.6m
Bounds: (12948636.414887449, 4862053.18880025, 12979029.30754179, 4877912.7458354365)
LINESTRING (12978845.964340456 4876404.972802613, 12978815.79675845 4876446.868451926, 12978780.3971
# tolerance距离公差,该类处理速度很慢
mpd.TopDownTimeRatioGeneralizer(original_traj).generalize(tolerance=1.0)
Trajectory 2 (2009-06-29 07:02:25 to 2009-06-29 11:13:12) | Size: 756 | Length: 50616.4m
Bounds: (12948595.449314836, 4861831.088215791, 12979030.643375682, 4877940.244020148)
LINESTRING (12978845.964340456 4876404.972802613, 12978840.175726935 4876411.664456934, 12978837.281
1.3.3 Trajectory dwell detection
MovingPandas provides the function of trajectory dwell detection, in fact the track dwell does not mean that the trajectory movement speed is zero, in fact the object movement speed rarely reaches 0 due to inaccurate tracking, for example GPS tracks tend to keep moving around the stop position of the object. Therefore, the way MovingPandas detects trajectory dwelling is that if an object stays in an area of a specified size for more than a certain period of time, it is judged to be a trajectory dwelling. The specific use is as follows:
# 读取数据
gdf = read_file('data/geolife_small.gpkg').to_crs('EPSG:3857')
tc = mpd.TrajectoryCollection(gdf, 'trajectory_id', t='t')
# 使用单个轨迹进行运算,也可以轨迹集合
my_traj = tc.trajectories[0]
my_traj
Trajectory 1 (2008-12-11 04:42:14 to 2008-12-11 05:15:46) | Size: 466 | Length: 8101.4m
Bounds: (12955985.950308602, 4845963.532957837, 12956871.051579898, 4851235.877839145)
LINESTRING (12956620.805364596 4851214.112520242, 12956622.141198486 4851220.49700885, 12956578.8379
# 创建停留检测器
detector = mpd.TrajectoryStopDetector(tc)
# 获取在某个直径为10的区域停留60s以上的轨迹时间范围
stop_time_ranges = detector.get_stop_time_ranges(min_duration=timedelta(seconds=60), max_diameter=10)
for x in stop_time_ranges:
print(x)
Traj 2: 2009-06-29 07:05:30 - 2009-06-29 07:06:55 (duration: 0 days 00:01:25)
Traj 2: 2009-06-29 08:02:40 - 2009-06-29 08:03:40 (duration: 0 days 00:01:00)
# 获取在某个直径为10的区域停留60s以上的轨迹点
stop_points = detector.get_stop_points(min_duration=timedelta(seconds=60), max_diameter=10)
stop_points
geometry | start_time | end_time | traj_id | duration_s | |
stop_id | |||||
2_2009-06-29 07:05:30 | POINT (12979027.415 4876729.960) | 2009-06-29 07:05:30 | 2009-06-29 07:06:55 | 2 | 85.0 |
2_2009-06-29 08:02:40 | POINT (12949605.785 4863764.939) | 2009-06-29 08:02:40 | 2009-06-29 08:03:40 | 2 | 60.0 |
# 获取在某个直径为10的区域停留60s以上的轨迹片段
stops = detector.get_stop_segments(min_duration=timedelta(seconds=60), max_diameter=10)
stops
TrajectoryCollection with 2 trajectories
1.3.4 Trajectory clustering
Tmovingpandas provides the TrajectoryCollectionAggregator class, which is used to merge multiple TrajectoryCollections into one larger Trajectory Collection. The specific use is as follows:
# 读取数据
gdf = read_file('data/geolife_small.gpkg').to_crs('EPSG:3857')
tc = mpd.TrajectoryCollection(gdf, 'trajectory_id', t='t')
# 简化轨迹
generalized = mpd.MinDistanceGeneralizer(tc).generalize(tolerance=100)
# 聚类
aggregator = mpd.TrajectoryCollectionAggregator(generalized, max_distance=1000, min_distance=100, min_stop_duration=timedelta(minutes=5))
# 从合并的轨迹集合中提取显著点,并返回一个Geopandas的GeoDataFrame对象
pts = aggregator.get_significant_points_gdf()
pts.head()
geometry | |
POINT (12956620.805 4851214.113) | |
1 | POINT (12956054.412 4846377.879) |
2 | POINT (12956409.855 4851235.878) |
3 | POINT (12956533.642 4851120.522) |
4 | POINT (12956436.794 4851148.817) |
# 合并的轨迹集合视为一个整体,并将其划分为不同的簇
# 返回每个簇的质心结果
clusters = aggregator.get_clusters_gdf()
clusters.head()
geometry | n | |
POINT (12955779.665 4851264.376) | 30 | |
1 | POINT (12956533.123 4846314.334) | 6.000000 |
2 | POINT (12956511.072 4849943.391) | 4 |
3 | POINT (12956768.526 4848886.514) | 2 |
4 | POINT (12956668.895 4847819.306) | 2 |
# 将合并后的轨迹数据转换为一个GeoDataFrame对象,其中每一条记录代表着一次移动(即一段轨迹)
flows = aggregator.get_flows_gdf()
flows.head()
geometry | weight | |
LINESTRING(12955779.665 4851264.376, 12956511... | 1 | |
1 | LINESTRING(12956511.072 4849943.391, 12956768... | 1 |
2 | LINESTRING(12956768.526 4848886.514, 12956668... | 1 |
3 | LINESTRING(12956668.895 4847819.306, 12956533... | 1 |
4 | LINESTRING(12978848.347 4876830.605, 12978353... | 1 |
1.3.5 Trajectory distance calculation
MovingPandas provides a distance function and a hausdorff_distance function to calculate the distance between two geometric objects. The distance function calculates the distance based on shapely-objectdistance, while the hausdorff_distance returns the Hausdorf distance. The following is an example:
# 定义轨迹1
df = pd.DataFrame([
{'geometry':Point(0,0), 't':datetime(2023,7,1,12,0,0)},
{'geometry':Point(6,0), 't':datetime(2023,7,1,12,6,0)},
{'geometry':Point(6,6), 't':datetime(2023,7,1,12,10,0)},
{'geometry':Point(9,9), 't':datetime(2023,7,1,12,15,0)}
]).set_index('t')
# 单位为米
geo_df = GeoDataFrame(df, crs='EPSG:3857')
toy_traj = mpd.Trajectory(geo_df, 1)
# 定义轨迹2
df = pd.DataFrame([
{'geometry':Point(3,3), 't':datetime(2023,7,1,12,0,0)},
{'geometry':Point(3,9), 't':datetime(2023,7,1,12,6,0)},
{'geometry':Point(2,9), 't':datetime(2023,7,1,12,10,0)},
{'geometry':Point(0,7), 't':datetime(2023,7,1,12,15,0)}
]).set_index('t')
geo_df = GeoDataFrame(df, crs='EPSG:3857')
toy_traj2 = mpd.Trajectory(geo_df, 1)
# 展示轨迹
ax = toy_traj.plot()
toy_traj2.plot(ax=ax, color='red')
<Axes: >
# 计算距离,默认单位由坐标系单位决定,该坐标系单位为米
print(toy_traj.distance(toy_traj2))
# 计算hausdorff距离,默认单位由坐标系单位决定。
print(toy_traj.hausdorff_distance(toy_traj2))
3.0
6.082762530298219
# 计算距离,units自定义单位为厘米
print(toy_traj.distance(toy_traj2, units="cm"))
# 计算hausdorff距离,units自定义单位为英寸
print(toy_traj.hausdorff_distance(toy_traj2, units="ft"))
300.0
19.95656998129337
2 Analysis of seagull migration trajectory in an example of trajectory mapping
The trajectory of seagulls was analyzed based on their migration data. The data can be downloaded from movingpandas-examples-data.
Step1 Load the data
df = read_file('data/gulls.gpkg')
# 展示数据
df.head()
event-id | visible | timestamp | staðsetning-löng | location-lat | sensor-type | individual-taxon-canonical-name | tag-local-identifier | individual-local-identifier | study-name | geometry | |
1082620685 | true | 2009-05-27 14:00:00.000 | 24.58617 | 61.24783 | gps | Larus fuscus | 91732 | 91732A | Navigation experiments in lesser black-backed ... | POINT (24.58617 61.24783) | |
1 | 1082620686 | true | 2009-05-27 20:00:00.000 | 24.58217 | 61.23267 | gps | Larus fuscus | 91732 | 91732A | Navigation experiments in lesser black-backed ... | POINT (24.58217 61.23267) |
2 | 1082620687 | true | 2009-05-28 05:00:00.000 | 24.53133 | 61.18833 | gps | Larus fuscus | 91732 | 91732A | Navigation experiments in lesser black-backed ... | POINT (24.53133 61.18833) |
3 | 1082620688 | true | 2009-05-28 08:00:00.000 | 24.58200 | 61.23283 | gps | Larus fuscus | 91732 | 91732A | Navigation experiments in lesser black-backed ... | POINT (24.58200 61.23283) |
4 | 1082620689 | true | 2009-05-28 14:00:00.000 | 24.58250 | 61.23267 | gps | Larus fuscus | 91732 | 91732A | Navigation experiments in lesser black-backed ... | POINT (24.58250 61.23267) |
# 查看数据维度
df.shape
(89867, 11)
# 展示数据坐标点
df.plot()
<Axes: >
# 海鸥唯一id
df['individual-local-identifier'].unique()
array(['91732A', '91733A', '91734A', '91735A', '91737A', '91738A',
'91739A', '91740A', '91741A', '91742A', '91743A', '91744A',
'91745A', '91746A', '91747A', '91748A', '91749A', '91750A',
'91751A', '91752A', '91754A', '91755A', '91756A', '91758A',
'91759A', '91761A', '91762A', '91763A', '91764A', '91765A',
'91766A', '91767A', '91769A', '91771A', '91774A', '91775A',
'91776A', '91777A', '91778A', '91779A', '91780A', '91781A',
'91782A', '91783A', '91785A', '91786A', '91787A', '91788A',
'91789A', '91794A', '91795A', '91797A', '91798A', '91799A',
'91800A', '91802A', '91803A', '91807A', '91809A', '91810A',
'91811A', '91812A', '91813A', '91814A', '91815A', '91816A',
'91819A', '91821A', '91823A', '91824A', '91825A', '91826A',
'91827A', '91828A', '91829A', '91830A', '91831A', '91832A',
'91835A', '91836A', '91837A', '91838A', '91839A', '91843A',
'91845A', '91846A', '91848A', '91849A', '91852A', '91854A',
'91858A', '91861A', '91862A', '91864A', '91865A', '91866A',
'91870A', '91871A', '91872A', '91875A', '91876A', '91877A',
'91878A', '91880A', '91881A', '91884A', '91885A', '91893A',
'91894A', '91897A', '91900A', '91901A', '91903A', '91907A',
'91908A', '91910A', '91911A', '91913A', '91916A', '91918A',
'91919A', '91920A', '91921A', '91924A', '91929A', '91930A'],
dtype=object)
# 海鸥数量
len(df['individual-local-identifier'].unique())
126
# 查看海鸥个体轨迹记录数,可以看到呈长尾分布
df['individual-local-identifier'].value_counts().plot(kind='bar', figsize=(16,4))
<Axes: xlabel='individual-local-identifier'>
# 创建轨迹集合,以海鸥编号为轨迹编号
tc = mpd.TrajectoryCollection(df, traj_id_col='individual-local-identifier', t='timestamp', min_length=100, crs='EPSG:3857')
tc
TrajectoryCollection with 125 trajectories
# 压缩轨迹
tc = mpd.MinTimeDeltaGeneralizer(tc).generalize(tolerance=timedelta(days=1))
Step2 Individual analysis
# 挑选个体编号91916A的海鸥轨迹,该海鸥轨迹记录数最多
filtered = tc.filter('individual-local-identifier', '91916A')
my_traj = filtered.trajectories[0].copy()
my_traj
Trajectory 91916A (2009-08-15 15:00:00 to 2015-08-27 09:00:00) | Size: 2134 | Length: 77841407.0m
Bounds: (7.915, 20.70533, 39.51317, 61.54817)
LINESTRING (7.915 54.18533, 9.44183 54.87233, 9.4425 54.87283, 9.41167 54.8555, 9.39583 54.88, 9.396
# 展示轨迹信息
# my_traj.hvplot(title='Movement', line_width=2)
my_traj.plot()
<Axes: >
As you can see from the image above, the seagull has been moving back and forth seasonally. Therefore, according to the year, the trajectory information of the seagull is split.
trips_by_year = mpd.TemporalSplitter(filtered).split(mode='year')
trips_by_year.to_traj_gdf()
traj_id | start_t | end_t | geometry | length | direction | |
91916A_2009-12-31 00:00:00 | 2009-08-15 15:00:00 | 2009-12-31 19:00:00 | LINESTRING (7.91500 54.18533, 9.44183 54.87233... | 5.352375E+06 | 131.939002 | |
1 | 91916A_2010-12-31 00:00:00 | 2010-01-01 19:00:00 | 2010-12-31 07:00:00 | LINESTRING(39.18417 21.17583, 39.18833 21.170... | 1.232428E+07 | 281.047091 |
2 | 91916A_2011-12-31 00:00:00 | 2011-01-01 07:00:00 | 2011-12-31 04:00:00 | LINESTRING(39.17000 21.18017, 39.16883 21.156... | 1.441793E+07 | 189.238229 |
3 | 91916A_2012-12-31 00:00:00 | 2012-01-01 04:00:00 | 2012-12-31 19:00:00 | LINESTRING(39.16933 21.16667, 39.17567 21.142... | 1.324811e+07 | 62.132640 |
4 | 91916A_2013-12-31 00:00:00 | 2013-01-01 19:00:00 | 2013-12-31 13:00:00 | LINESTRING(39.18167 21.17333, 39.18100 21.173... | 1.293261E+07 | 273.165851 |
5 | 91916A_2014-12-31 00:00:00 | 2014-01-01 13:00:00 | 2014-12-31 19:00:00 | LINESTRING(39.17083 21.15400, 39.17100 21.157... | 1.300973e+07 | 33.742967 |
6.000000 | 91916A_2015-12-31 00:00:00 | 2015-01-01 19:00:00 | 2015-08-27 09:00:00 | LINESTRING(39.18167 21.16967, 39.18233 21.169... | 6.551740E+06 | 343.887905 |
# 按照年份显示轨迹
from matplotlib.cm import tab20
ax = None
for index,tmp in enumerate(trips_by_year):
if ax is None:
ax = tmp.plot(color=tab20(index))
else:
ax= tmp.plot(ax=ax,color=tab20(index))
We can extract the data of a specific year separately for visualization, and show the rate of a certain year as follows:
one_year = trips_by_year.get_trajectory('91916A_2010-12-31 00:00:00')
one_year.add_speed(overwrite=True,units=("km", "h"))
# 可视化当年的轨迹
one_year.plot(column='speed', cmap='tab20', legend=True)
<Axes: >
Of course, it can also be the result of a day in the trajectory, as shown below:
fig, ax = plt.subplots(figsize=(5, 5))
one_year.plot(ax=ax, linewidth=2.0, color='black')
GeoDataFrame([one_year.get_row_at(datetime(2010,9,1))]).plot(ax=ax, color='red',zorder=2)
GeoDataFrame([one_year.get_row_at(datetime(2010,10,1))]).plot(ax=ax, color='red',zorder=2)
GeoDataFrame([one_year.get_row_at(datetime(2010,11,1))]).plot(ax=ax, color='red',zorder=2)
<Axes: >
You can also choose an area, such as a habitat. to determine the number of times a track reaches the area and the time it takes to enter and exit.
area_of_interest = Polygon([(30, 25), (50, 25), (50, 15), (30, 15), (30, 25)])
plotted_area_of_interest = GeoDataFrame(pd.DataFrame([{'geometry': area_of_interest, 'id': 1}]), crs='EPSG:3857')
# 输入的是单个轨迹得到的单个轨迹位于该区域的片段
arrivals = [traj for traj in my_traj.clip(area_of_interest)]
# 位于该区域的轨迹片段数
print("Found {} arrivals".format({len(arrivals)}))
# 提取每条轨迹到达该区域的时间
for traj in arrivals:
name = traj.df['individual-local-identifier'].iloc[0]
start_time = traj.get_start_time()
end_time = traj.get_end_time()
print("Individual {} arrived at {} left at {}".format(name, start_time, end_time))
Found {6} arrivals
Individual 91916A arrived at 2009-12-23 02:58:36.946571 left at 2010-04-17 14:31:42.170632
Individual 91916A arrived at 2010-10-30 20:55:36.697832 left at 2011-04-03 11:40:12.803103
Individual 91916A arrived at 2011-11-09 20:25:19.550486 left at 2012-04-04 09:12:15.852988
Individual 91916A arrived at 2012-10-14 11:25:28.063070 left at 2013-03-30 04:22:18.125679
Individual 91916A arrived at 2013-10-08 04:17:33.524488 left at 2014-03-29 18:26:54.311106
Individual 91916A arrived at 2014-10-28 19:05:32.941608 left at 2015-04-07 22:59:45.284499
Step3 Population analysis
In step 2, the same function can also be applied to the trajectory collection class to analyze multiple trajectories.
year_of_interest = 2010
# 输入的是多个轨迹得到的位于该区域的单个轨迹
trajs_in_aoi = tc.clip(area_of_interest)
# 提取2010到达该区域的轨迹
relevant = [ traj for traj in trajs_in_aoi if traj.get_start_time().year <= year_of_interest
and traj.get_end_time().year >= year_of_interest]
# 查看有多少只海鸥,也就是多少条单个轨迹到达该区域
print("Found {} arrivals".format(len(relevant)))
Found 16 arrivals
# 查看哪些海鸥到达该区域以及停留时间
for traj in relevant:
print("Individual '{}' arrived at {} (duration: {})".format(
traj.df['individual-local-identifier'].iloc[0], traj.get_start_time().date(),
traj.get_end_time()-traj.get_start_time()))
Individual '91732A' arrived at 2010-04-10 (duration: 5 days, 21:27:11.670985)
Individual '91737A' arrived at 2009-12-08 (duration: 140 days, 11:11:29.473206)
Individual '91761A' arrived at 2010-04-11 (duration: 12 days, 6:10:03.857850)
Individual '91761A' arrived at 2010-10-04 (duration: 6 days, 10:42:00.340093)
Individual '91762A' arrived at 2010-04-19 (duration: 42 days, 1:28:22.699066)
Individual '91771A' arrived at 2009-12-23 (duration: 66 days, 8:05:31.053782)
Individual '91789A' arrived at 2009-11-10 (duration: 550 days, 20:39:18.769409)
Individual '91824A' arrived at 2010-05-06 (duration: 12:53:53.409236)
Individual '91832A' arrived at 2010-04-21 (duration: 3 days, 5:48:46.276706)
Individual '91832A' arrived at 2010-09-23 (duration: 1 day, 1:40:25.175880)
Individual '91837A' arrived at 2010-05-04 (duration: 1 day, 18:38:46.170554)
Individual '91846A' arrived at 2010-05-15 (duration: 10 days, 2:50:28.505732)
Individual '91862A' arrived at 2010-01-06 (duration: 248 days, 6:10:16.962620)
Individual '91910A' arrived at 2010-09-28 (duration: 2 days, 20:34:31.117736)
Individual '91916A' arrived at 2009-12-23 (duration: 115 days, 11:33:05.224061)
Individual '91916A' arrived at 2010-10-30 (duration: 154 days, 14:44:36.105271)
For the '91761A' seagull, it can be seen to enter the area twice. We can show the seagull's trajectory during the year and add a background map to better show the seagull's travel.
my_traj = tc.get_trajectory('91761A')
# 提取当年飞行结果
my_traj = my_traj.get_segment_between(datetime(year_of_interest,1,1), datetime(year_of_interest,12,31))
# 压缩轨迹,tolerance越大压缩比例越大
my_traj = mpd.DouglasPeuckerGeneralizer(my_traj).generalize(tolerance=2.0)
my_traj.plot()
<Axes: >
By plotting the flight trajectory of the seagull in the following code, you can see that it passed through the area on the way to and from the destination, and the background map added is shown in the GeoPandas overlay background map for details.
import contextily as cx
fig, ax = plt.subplots(figsize=(3, 5))
# 添加飞行轨迹
ax = my_traj.plot(ax=ax,color='blue', capstyle='round')
# 添加停留区域
ax = gpd.GeoSeries(area_of_interest).plot(ax=ax, color='lightgray')
# 绘制1月到11月的位置,12月该鸟飞回原地
for i in range(1,12):
# 红点表示上半年,绿点表示下半年
color = 'red' if i<6 else 'green'
GeoDataFrame([my_traj.get_row_at(datetime(year_of_interest,i,1))]).plot(ax=ax, color=color,zorder=2)
# 添加背景地图,zoom越大越精细,这里使用自适应zoom。
cx.add_basemap(ax = ax , source=cx.providers.Stamen.TonerLite,crs=my_traj.df.crs, zoom='auto',attribution="")
# 保存图片
fig.savefig("res.jpg",dpi=300)
3 See also
- movingpandas
- movingpandas-examples
- movingpandas-examples-data
- shapely-objectdistance
- Hausdorf distance
- GeoPandas overlay background map