laitimes

[Data Analysis & Visualization] Python Mapping Data 4 - Getting Started with MovingPandas Pointing North

MovingPandas is an open-source geospatio-temporal data processing library based on Python and GeoPandas, which is used to process the trajectory data of moving objects. It provides a powerful set of tools that make it easy to load, analyze, and visualize the trajectory of moving objects. By using MovingPandas, users can easily process and analyze moving object data and extract insights about behavior, patterns, and trends from it. Whether you're working with traffic flow data, logistics trajectory data, or animal migration data, MovingPandas is a powerful geovisualization tool.

The MovingPandas library has the following key features:

  • Trajectory Objects: MovingPandas introduces Trajectory objects for representing and managing trajectory data. The object contains time, position, and attribute information to facilitate manipulation and analysis of the trajectory.
  • Spatiotemporal slicing: MovingPandas supports spatiotemporal slicing operations, which can filter and segment trajectory data according to temporal and spatial conditions.
  • Trajectory analysis: MovingPandas provides a wealth of trajectory analysis functions, including calculating trajectory length, velocity, direction, acceleration and other indicators, as well as trajectory clustering, interaction and similarity analysis.
  • Trajectory visualization: MovingPandas can easily visualize trajectory data and support drawing trajectory lines, points, and heat maps on the map, helping users understand the behavior of moving objects more intuitively.
  • Data format compatibility: MovingPandas supports a variety of common geographic data formats, including GeoJSON, Shapefile, etc., which is convenient for users to load and save trajectory data.

The official address of the MovingPandas repository is: movingpandas. The address of the official sample code repository for MovingPandas is: movingpandas-examples. All experimental data in this article comes from: movingpandas-examples-data.

All the code in this article can be found at: Python-Study-Notes

The author recommends installing MovingPandas in Python 3.8 or above, and recommends using conda for installation. MovingPandas can be installed using the following command:

conda install -c conda-forge movingpandas           

Due to the complexity of MovingPandas' dependency environment, pip is not recommended for installation. If you insist on using pip for installation, you can install MovingPandas by following the following command:

pip install movingpandas
# 本文必安装第三方库
pip install contextily
# 以下第三方库可选
pip install hvplot
pip install cartopy
pip install geoviews           

The following code shows the version of MovingPandas, which is Python 3.10.

# jupyter notebook环境去除warning
import warnings
warnings.filterwarnings("ignore")

# 查看movingpandas版本及依赖库版本信息
import movingpandas as mpd
mpd.show_versions()           
MovingPandas 0.16.1

SYSTEM INFO
-----------
python     : 3.10.10 (main, Mar 21 2023, 18:45:11) [GCC 11.2.0]
executable : /opt/conda/envs/python35-paddle120-env/bin/python
machine    : Linux-5.4.0-104-generic-x86_64-with-glibc2.31

GEOS, GDAL, PROJ INFO
---------------------
GEOS       : None
GEOS lib   : None
GDAL       : 3.6.4
GDAL data dir: /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/fiona/gdal_data
PROJ       : 9.2.1
PROJ data dir: /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/pyproj/proj_dir/share/proj

PYTHON DEPENDENCIES
-------------------
geopandas  : 0.13.2
pandas     : 2.0.2
fiona      : 1.9.4.post1
numpy      : 1.24.3
shapely    : 2.0.1
rtree      : 1.0.1
pyproj     : 3.6.0
matplotlib : 3.7.1
mapclassify: None
geopy      : 2.3.0
holoviews  : None
hvplot     : None
geoviews   : None
stonesoup  : None           

[toc]

1 Getting Started

1.1 Infrastructure

# 加载第三方库
import pandas as pd
import geopandas as gpd
import movingpandas as mpd
import shapely as shp
# hvPlot是建立在数据可视化库holoviews的一个高级封装库
# import hvplot.pandas 

from geopandas import GeoDataFrame, read_file
from shapely.geometry import Point, LineString, Polygon
from datetime import datetime, timedelta
# hvplot通常与HoloViews集成使用
# from holoviews import opts

# 设置绘图界面
# opts.defaults(opts.Overlay(active_tools=['wheel_zoom'], frame_width=500, frame_height=400))           

1.1.1 Trajectory类

MovingPandas provides a Trajectory object to represent an object for a single trajectory. It consists of a series of coordinate points. The Trajectory object initialization parameters are as follows:

class Trajectory:
    def __init__(
        self,
        df,
        traj_id,
        obj_id=None,
        t=None,
        x=None,
        y=None,
        crs="epsg:4326",
        parent=None,
    ):           

The individual parameters are explained below:

  • df: GeoDataFrame with Geometry coordinate column and timestamp index of GeoPandas, or DataFrame of Pandas. Required parameters.
  • traj_id: Any type that represents a unique identifier for a track. Required parameters.
  • obj_id: Any type, a unique identifier for a moving object. The default is None.
  • t: indicates the column name that contains the timestamp, which is None by default.
  • x: indicates the name of the column containing the x coordinates, which must be specified in the DataFrame of Pandas. The default is None.
  • y: indicates the name of the column containing the y coordinates, which must be specified when using the DataFrame of Pandas. The default is None.
  • CRS: A coordinate reference system that represents the X/Y coordinates. The default is "epsg:4326", which is WGS84.
  • parent: A Trajectory object, representing the parent trajectory. The default is None.

As can be seen from the initialization parameters of the Trajectory object, MovingPandas represents the trajectory with the relevant point position information and its time information. Here's an example of how to use the Trajectory object.

First, create a GeoDataFrame object that represents the location and its corresponding time information.

import pandas as pd
from shapely.geometry import Point
from geopandas import GeoDataFrame
import movingpandas as mpd
from datetime import datetime

# 创建DataFrame对象
df = pd.DataFrame([
    {'geometry': Point(0, 0), 't': datetime(2023, 7, 1, 12, 0, 0)},
    {'geometry': Point(6, 0), 't': datetime(2023, 7, 1, 12, 6, 0)},
    {'geometry': Point(6, 6), 't': datetime(2023, 7, 1, 12, 10, 0)},
    {'geometry': Point(9, 9), 't': datetime(2023, 7, 1, 12, 15, 0)}])
# 设置索引轴
df = df.set_index('t')

# 创建GeoDataFrame对象
# EPSG:3857是在线地图常用的投影坐标系,以米为单位
gdf = GeoDataFrame(df, crs='EPSG:3857')
gdf           
geometry
t
2023-07-01 12:00:00 POINT (0.000 0.000)
2023-07-01 12:06:00 POINT (6.000 0.000)
2023-07-01 12:10:00 POINT (6.000 6.000)
2023-07-01 12:15:00 POINT (9.000 9.000)

A trajectory object is then created based on the GeoDataFrame object.

# 创建Trajectory对象
# 1表示轨迹标识符
toy_traj = mpd.Trajectory(gdf, 1)
toy_traj           
Trajectory 1 (2023-07-01 12:00:00 to 2023-07-01 12:15:00) | Size: 4 | Length: 16.2m
Bounds: (0.0, 0.0, 9.0, 9.0)
LINESTRING (0 0, 6 0, 6 6, 9 9)           

Next, the trajectory path can be visualized, and the horizontal and vertical axes represent the xy coordinates of the points.

# 可视化路径
toy_traj.plot()           
<Axes: >           
[Data Analysis & Visualization] Python Mapping Data 4 - Getting Started with MovingPandas Pointing North
# 交互式展示
# toy_traj.hvplot()           

In addition, you can also directly call the toy_traj GeoPandas object for visualization.

toy_traj.df.plot()           
<Axes: >           
[Data Analysis & Visualization] Python Mapping Data 4 - Getting Started with MovingPandas Pointing North

1.1.2 TrajectoryCollection类

The TrajectoryCollection class is a collection that is used to store multiple Trajectory objects. It provides the ability to manipulate and analyze the entire collection of trajectories. You can use TrajectoryCollection to merge multiple tracks into a single object, filter tracks, do spatial querying, visualize, and more. The initialization parameters of the TrajectoryCollection object are as follows:

class TrajectoryCollection:
    def __init__(
        self,
        data,
        traj_id_col=None,
        obj_id_col=None,
        t=None,
        x=None,
        y=None,
        crs="epsg:4326",
        min_length=0,
        min_duration=None,
    )           

Most of the parameters are the same as those of the Trajectory object, and the different parameters are explained as follows:

  • min_length: The minimum length of the track, below which tracks will be discarded.
  • min_duration: The expected minimum duration of the track, tracks below that duration will be discarded.

Create a TrajectoryCollection object directly

# 创建两条轨迹
df = pd.DataFrame(
    {
        "t": pd.date_range("2023-07-01", periods=5, freq="min"),
        "trajectory_id": [1, 1, 2, 2, 2],
        "geometry": [Point(0, 0), Point(0, 1), Point(1, 2), Point(1, 3), Point(2, 4)],
    }
)
gdf = gpd.GeoDataFrame(df, crs='EPSG:3857')
gdf           
t trajectory_id geometry
2023-07-01 00:00:00 1 POINT (0.000 0.000)
1 2023-07-01 00:01:00 1 POINT (0.000 1.000)
2 2023-07-01 00:02:00 2 POINT (1.000 2.000)
3 2023-07-01 00:03:00 2 POINT (1.000 3.000)
4 2023-07-01 00:04:00 2 POINT (2.000 4.000)
# 创建TrajectoryCollection对象,用trajectory_id作为单个轨迹的唯一标识
tc = mpd.TrajectoryCollection(gdf, traj_id_col='trajectory_id', t='t')
tc           
TrajectoryCollection with 2 trajectories           
# 绘制轨迹, column指定轨迹对象,legend展示轨迹持续时间
import matplotlib.cm as cm
tc.plot(column='trajectory_id', legend=True)           
<Axes: >           
[Data Analysis & Visualization] Python Mapping Data 4 - Getting Started with MovingPandas Pointing North
# 交互式展示轨迹
# tc.hvplot()           

In addition, we can extract individual trajectories from the TrajectoryCollection, or filter the desired trajectories.

# 从TrajectoryCollection提取单个轨迹
my_traj = tc.trajectories[1]
print(my_traj)           
Trajectory 2 (2023-07-01 00:02:00 to 2023-07-01 00:04:00) | Size: 3 | Length: 2.4m
Bounds: (1.0, 2.0, 2.0, 4.0)
LINESTRING (1 2, 1 3, 2 4)           
# 筛选路径持续时间大于一分钟的
min_duration = timedelta(minutes=1)
tc.trajectories = [traj for traj in tc if traj.get_duration() > min_duration]
print(tc.trajectories)           
[Trajectory 2 (2023-07-01 00:02:00 to 2023-07-01 00:04:00) | Size: 3 | Length: 2.4m
Bounds: (1.0, 2.0, 2.0, 4.0)
LINESTRING (1 2, 1 3, 2 4)]           

Create a TrajectoryCollection object from file data

TrajectoryCollections can be created by reading data from a variety of geospatial data stores using the GeoPandas function, or by reading data from a variety of file types using Pandas.

The following shows how to create a TrajectoryCollection object from a csv file and download the data from movingpandas-examples-data

# 读取数据
df = pd.read_csv('data/geolife_small.csv', delimiter=';')
df.head()           
X And Fid id sequence trajectory_id tracker t
116.391305 39.898573 1 1 1 1 19 2008-12-11 04:42:14+00
1 116.391317 39.898617 2 2 2 1 19 2008-12-11 04:42:16+00
2 116.390928 39.898613 3 3 3 1 19 2008-12-11 04:43:26+00
3 116.390833 39.898635 4 4 4 1 19 2008-12-11 04:43:32+00
4 116.389410 39.898723 5 5 5 1 19 2008-12-11 04:43:47+00
# 转换为traj_collection对象
traj_collection = mpd.TrajectoryCollection(df, 'trajectory_id', t='t', x='X', y='Y')
print(traj_collection)           
TrajectoryCollection with 5 trajectories           
# 绘制轨迹
traj_collection.plot(column='trajectory_id', legend=True, figsize=(9,5))           
<Axes: >           
[Data Analysis & Visualization] Python Mapping Data 4 - Getting Started with MovingPandas Pointing North

1.2 Basic Functions

The functions in MovingPandas can be applied to both a single Trajectory object and a TrajectoryCollection object to enable the analysis of single and multiple trajectories.

1.2.1 Move Info Add Function

MovingPandas provides mobile information adding functions such as add_speed, add_distance, add_direction, add_timedelta, add_timedelta, and add_acceleration to process moving data. The interfaces for these functions are as follows:

add_speed(overwrite=False, name=SPEED_COL_NAME, units=UNITS())
add_distance(overwrite=False, name=DISTANCE_COL_NAME, units=None)
add_direction(overwrite=False, name=DIRECTION_COL_NAME)
add_angular_difference(overwrite=False,name=ANGULAR_DIFFERENCE_COL_NAM):
add_timedelta(overwrite=False, name=TIMEDELTA_COL_NAME)
add_acceleration(overwrite=False, name=ACCELERATION_COL_NAME, units=UNITS())           

All functions use system defaults or global variables, and the specific functions are described below:

  • add_speed: Used to calculate the speed of moving objects. The velocity is calculated based on the distance and time difference between the two time points. The overwrite parameter indicates whether to overwrite the existing velocity column, name indicates the name of the velocity column, units indicates the unit of velocity, and the default value is used if the name and unit are not specified.
  • add_distance: Used to calculate the distance of moving objects. The distance is calculated based on the straight-line distance difference between two time points. The overwrite parameter indicates whether to overwrite the existing distance column, name indicates the name of the distance column, and units indicates the unit of distance.
  • add_direction: Used to calculate the direction of moving objects. The direction is calculated based on the difference in coordinates between two time points. The overwrite parameter indicates whether to overwrite the existing direction column, and name indicates the name of the direction column. The direction value is in degrees, rotating clockwise from north, with a range value of [0,360].
  • add_angular_difference: Used to calculate the angular difference between two objects. The angular difference between two objects is calculated based on the orientation information of the moving object. The overwrite parameter indicates whether to overwrite the existing direction column, and name indicates the name of the direction column. The angular range value is [0,180].
  • add_timedelta: Used to calculate the time difference of moving objects. The time difference is calculated based on the difference between the timestamps. The overwrite parameter indicates whether to overwrite the existing time difference column, and name indicates the name of the time difference column. The time difference is the difference between the current row and the previous row.
  • add_acceleration: Used to calculate the acceleration of a moving object. Acceleration is calculated based on velocity and time difference. The overwrite parameter indicates whether to overwrite the existing acceleration column, name indicates the name of the acceleration column, and units indicates the unit of acceleration.

The following code describes the use of these functions in detail.

# 创建轨迹
df = pd.DataFrame([
  {'geometry':Point(0,0), 't':datetime(2023,7,1,12,0,0)},
  {'geometry':Point(6,0), 't':datetime(2023,7,1,12,0,6)},
  {'geometry':Point(6,6), 't':datetime(2023,7,1,12,0,11)},
  {'geometry':Point(9,9), 't':datetime(2023,7,1,12,0,14)}
]).set_index('t')
gdf = GeoDataFrame(df, crs='EPSG:3857')
toy_traj = mpd.Trajectory(gdf, 1)
toy_traj           
Trajectory 1 (2023-07-01 12:00:00 to 2023-07-01 12:00:14) | Size: 4 | Length: 16.2m
Bounds: (0.0, 0.0, 9.0, 9.0)
LINESTRING (0 0, 6 0, 6 6, 9 9)           
# 查看轨迹的坐标系,以米为单位
toy_traj.crs           
<Projected CRS: EPSG:3857>
Name: WGS 84 / Pseudo-Mercator
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: World between 85.06°S and 85.06°N.
- bounds: (-180.0, -85.06, 180.0, 85.06)
Coordinate Operation:
- name: Popular Visualisation Pseudo-Mercator
- method: Popular Visualisation Pseudo Mercator
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich           
# 添加速度列,不指定单位则以米为单位
toy_traj.add_speed(overwrite=True)
toy_traj.df           
geometry speed
t
2023-07-01 12:00:00 POINT (0.000 0.000) 1
2023-07-01 12:00:06 POINT (6.000 0.000) 1
2023-07-01 12:00:11 POINT (6.000 6.000) 1.200000
2023-07-01 12:00:14 POINT (9.000 9.000) 1.414214
# 添加自定义定单位的速度列
toy_traj.add_speed(overwrite=True, name="speed (ft/min)", units=("ft", "min"))
toy_traj.df           
geometry speed speed (ft/min)
t
2023-07-01 12:00:00 POINT (0.000 0.000) 1 196.850394
2023-07-01 12:00:06 POINT (6.000 0.000) 1 196.850394
2023-07-01 12:00:11 POINT (6.000 6.000) 1.200000 236.220472
2023-07-01 12:00:14 POINT (9.000 9.000) 1.414214 278.388497
# 其他列的添加
toy_traj.add_distance(overwrite=True, name="distance (m)", units="m")
toy_traj.add_direction(overwrite=True)
toy_traj.add_timedelta(overwrite=True)
toy_traj.add_angular_difference(overwrite=True)
toy_traj.add_acceleration(overwrite=True, name="acceleration (mph/s)", units=("mi", "h", "s"))
toy_traj.df           
geometry speed speed (ft/min) distance (m) direction timedelta angular_difference acceleration (mph/s)
t
2023-07-01 12:00:00 POINT (0.000 0.000) 1 196.850394 0.000000 90.0 Night 0.000000 0.000000
2023-07-01 12:00:06 POINT (6.000 0.000) 1 196.850394 6.000000 90.0 0 days 00:00:06 0.000000 0.000000
2023-07-01 12:00:11 POINT (6.000 6.000) 1.200000 236.220472 6.000000 0.000000 0 days 00:00:05 90.0 0.089477
2023-07-01 12:00:14 POINT (9.000 9.000) 1.414214 278.388497 4.242641 45.0 0 days 00:00:03 45.0 0.159727
# 可视化某列结果绘制结果
toy_traj.plot(column='speed', linewidth=5, capstyle='round', figsize=(9,3), legend=True, vmax=20)           
<Axes: >           
[Data Analysis & Visualization] Python Mapping Data 4 - Getting Started with MovingPandas Pointing North

1.2.2 Trajectory position extraction function

MovingPandas provides track position extraction functions such as add_speed, add_distance, add_direction, add_timedelta, add_timedelta, and add_acceleration to extract track position-related information. The specific functions are described as follows:

  • get_start_location(): This function is used to get the starting position of the moving object. It returns the coordinates of the starting position of the moving object.
  • get_end_location(): This function is used to get the end position of the moving object. It returns the coordinates of the end position of the moving object.
  • get_position_at(t, method="interpolated"): This function is used to get the position of a moving object at a given point in time (t). The parameter t indicates the point in time of the desired location, while the optional parameter method specifies the method used to get the location, which defaults to "interpolated", which means that the location is obtained using the interpolation method.
  • get_segment_between(t1, t2): This function is used to get a fragment of a moving object between two given points in time (t1 and t2). It returns a subset of objects that were moved during that time.
  • clip(polygon, point_based=False): This function is used to clip the moving object so that it only retains the part within the specified polygon area. The parameter polygon represents the area of the polygon used for clipping, while the optional parameter point_based specify whether clipping is based on points rather than segments. By default, clipping is segment-based. In addition, MovingPandas also provides an intersection function to calculate the intersection between the trajectory data and the space geometry, which is similar to the clip function, which can be used in the official documentation.

The following code describes the use of these functions in detail.

# 创建轨迹
df = pd.DataFrame([
  {'geometry':Point(0,0), 't':datetime(2023,7,1,12,0,0)},
  {'geometry':Point(6,0), 't':datetime(2023,7,1,12,6,6)},
  {'geometry':Point(6,6), 't':datetime(2023,7,1,12,10,11)},
  {'geometry':Point(9,9), 't':datetime(2023,7,1,12,19,14)}
]).set_index('t')
gdf = GeoDataFrame(df, crs='EPSG:3857')
toy_traj = mpd.Trajectory(gdf, 1)
toy_traj           
Trajectory 1 (2023-07-01 12:00:00 to 2023-07-01 12:19:14) | Size: 4 | Length: 16.2m
Bounds: (0.0, 0.0, 9.0, 9.0)
LINESTRING (0 0, 6 0, 6 6, 9 9)           

Start and end positions

ax = toy_traj.plot()
# 提取轨迹起始点和结束点
gpd.GeoSeries(toy_traj.get_start_location()).plot(ax=ax, color='blue')
gpd.GeoSeries(toy_traj.get_end_location()).plot(ax=ax, color='red')           
<Axes: >           
[Data Analysis & Visualization] Python Mapping Data 4 - Getting Started with MovingPandas Pointing North

Intermediate position

# 提取轨迹中的一个时间点的位置
t = datetime(2023,7,1,12,6,0)
# 改时间点不同位置信息获取方式
print(toy_traj.get_position_at(t, method="nearest"))
print(toy_traj.get_position_at(t, method="interpolated"))
print(toy_traj.get_position_at(t, method="ffill"))
print(toy_traj.get_position_at(t, method="bfill"))           
POINT (6 0)
POINT (5.9016393442622945 0)
POINT (0 0)
POINT (6 0)           
point = toy_traj.get_position_at(t, method="interpolated")
ax = toy_traj.plot()
gpd.GeoSeries(point).plot(ax=ax, color='red', markersize=100)           
<Axes: >           
[Data Analysis & Visualization] Python Mapping Data 4 - Getting Started with MovingPandas Pointing North

Track fragments

# 提取轨迹中的一个片段
segment = toy_traj.get_segment_between(datetime(2023,7,1,12,6,0), datetime(2023,7,1,12,12,0))
print(segment)           
Trajectory 1_2023-07-01 12:06:00 (2023-07-01 12:06:06 to 2023-07-01 12:10:11) | Size: 2 | Length: 6.0m
Bounds: (6.0, 0.0, 6.0, 6.0)
LINESTRING (6 0, 6 6)           
ax = toy_traj.plot()
segment.plot(ax=ax, color='red', linewidth=5)           
<Axes: >           
[Data Analysis & Visualization] Python Mapping Data 4 - Getting Started with MovingPandas Pointing North

Trajectory area

xmin, xmax, ymin, ymax = 2, 8, -10, 5
# 设置区域
polygon = Polygon([(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin), (xmin, ymin)])           
# 提取位于该区域的轨迹
intersections = toy_traj.clip(polygon)
intersections           
TrajectoryCollection with 1 trajectories           
# 绘制轨迹区域
ax = toy_traj.plot()
gpd.GeoSeries(polygon).plot(ax=ax, color='lightgray')
intersections.plot(ax=ax, color='red', linewidth=5, capstyle='round')           
<Axes: >           
[Data Analysis & Visualization] Python Mapping Data 4 - Getting Started with MovingPandas Pointing North
# 单独绘制轨迹区域
intersections = toy_traj.clip(polygon)
intersections.plot()           
<Axes: >           
[Data Analysis & Visualization] Python Mapping Data 4 - Getting Started with MovingPandas Pointing North

1.2.3 与GeoPandas交互

MovingPandas provides to_point_gdf, to_line_gdf, and to_traj_gdf functions to convert the trajectory classes Trajectories and TrajectoryCollections back to GeoDataFrames structures for GeoPandas according to different rules. These functions are explained as follows:

  • to_point_gdf (return_orig_tz=False) to convert the track object to a GeoDataFrame associated with each point. return_orig_tz is used to specify whether to return a timestamp that matches the original time zone. The default is False, and the timestamp is converted according to the local time zone.
  • to_line_gdf() to convert the track object to a GeoDataFrame associated with each line.
  • to_traj_gdf(wkt=False, agg=False) returns a GeoDataFrame, each row representing a track. wkt is used to specify whether the trajectory geometry is represented in the Well-Known Text (WKT) format. Defaults to False, which returns geometric objects. AGG is used to specify whether or not to aggregate tracks. The default value is False, no aggregation is performed, and the optional parameters of agg are shown in the code.
# 读取数据
df = pd.read_csv('data/geolife_small.csv', delimiter=';')
df.head()
# 转换为traj_collection对象
traj_collection = mpd.TrajectoryCollection(df, 'trajectory_id', t='t', x='X', y='Y')
print(traj_collection)           
TrajectoryCollection with 5 trajectories           
# 转换为点GeoDataFrame
traj_collection.to_point_gdf().head()           
Fid id sequence trajectory_id tracker geometry
t
2008-12-11 04:42:14 1 1 1 1 19 POINT (116.39131 39.89857)
2008-12-11 04:42:16 2 2 2 1 19 POINT (116.39132 39.89862)
2008-12-11 04:43:26 3 3 3 1 19 POINT (116.39093 39.89861)
2008-12-11 04:43:32 4 4 4 1 19 POINT (116.39083 39.89863)
2008-12-11 04:43:47 5 5 5 1 19 POINT (116.38941 39.89872)
# 转换为线GeoDataFrame
traj_collection.to_line_gdf().head()           
Fid id sequence trajectory_id tracker t prev_t geometry
2 2 2 1 19 2008-12-11 04:42:16 2008-12-11 04:42:14 LINESTRING(116.39131 39.89857, 116.39132 39.8...
1 3 3 3 1 19 2008-12-11 04:43:26 2008-12-11 04:42:16 LINESTRING(116.39132 39.89862, 116.39093 39.8...
2 4 4 4 1 19 2008-12-11 04:43:32 2008-12-11 04:43:26 LINESTRING(116.39093 39.89861, 116.39083 39.8...
3 5 5 5 1 19 2008-12-11 04:43:47 2008-12-11 04:43:32 LINESTRING(116.39083 39.89863, 116.38941 39.8...
4 6.000000 6.000000 6.000000 1 19 2008-12-11 04:43:50 2008-12-11 04:43:47 LINESTRING (116.38941 39.89872, 116.39052 39.8...
# 每条轨迹单独聚合GeoDataFrame
traj_collection.to_traj_gdf()           
traj_id start_t end_t geometry length direction
1 2008-12-11 04:42:14 2008-12-11 05:15:46 LINESTRING(116.39131 39.89857, 116.39132 39.8... 6207.020261 186.681376
1 2 2009-06-29 07:02:25 2009-06-29 11:13:12 LINESTRING(116.59096 40.07196, 116.59091 40.0... 38764.575483 250.585295
2 3 2009-02-04 04:32:53 2009-02-04 11:20:12 LINESTRING (116.38569 39.89977, 116.38565 39.8... 12745.157506 304.115160
3 4 2009-03-10 10:36:45 2009-03-10 12:01:07 LINESTRING (116.38805 39.90342, 116.38804 39.9... 14363.780551 300.732843
4 5 2009-02-25 09:47:03 2009-02-25 14:31:24 LINESTRING(116.38526 39.90027, 116.38525 39.9... 39259.779560 305.200501

1.3 Trajectory processing

1.3.1 Trajectory extraction

MovingPandas provides TemporalSplitter, ObservationGapSplitter, StopSplitter, and SpeedSplitter classes to extract specified track segments from tracks according to different rules. The details are as follows:

  • TemporalSplitter: Splits the track into sub-intervals using regular time intervals.
  • ObservationGapSplitter: Splits the trajectory into sub-intervals based on the interval between observation times. If the interval between two consecutive observations exceeds the specified threshold, the trajectory is considered to need to be split.
  • StopSplitter: Splits the trajectory into sub-intervals based on the definition of the stops. A stop is when a trajectory stays in a relatively small area for a period of time.
  • SpeedSplitter: Splits the trajectory into sub-sections based on the speed threshold. If the trajectory velocity falls below a specified threshold at a certain segment, you need to split the trajectory here.

These classes only need to be initialized with the pending tracks, and then the split function is called to extract the traces. See the following code for specific use.

import matplotlib.pyplot as plt

# 读取数据
gdf = read_file('data/geolife_small.gpkg').to_crs('EPSG:3857')
tc = mpd.TrajectoryCollection(gdf, 'trajectory_id', t='t')
# 共五条轨迹
tc.to_traj_gdf()           
traj_id start_t end_t geometry length direction
1 2008-12-11 04:42:14 2008-12-11 05:15:46 LINESTRING(12956620.805 4851214.113, 12956622... 8101.428690 186.679744
1 2 2009-06-29 07:02:25 2009-06-29 11:13:12 LINESTRING(12978845.964 4876404.973, 12978840... 50621.731208 250.500509
2 3 2009-02-04 04:32:53 2009-02-04 11:20:12 LINESTRING(12955995.635 4851388.236, 12955991... 16626.383723 304.099365
3 4 2009-03-10 10:36:45 2009-03-10 12:01:07 LINESTRING(12956258.794 4851917.156, 12956257... 18739.337154 300.716597
4 5 2009-02-25 09:47:03 2009-02-25 14:31:24 LINESTRING(12955947.434 4851460.354, 12955946... 51327.869585 305.185128

TemporalSplitter

# 将轨迹切分为以每小时为单位的轨迹片段,min_length表示删除轨迹长度小于该值的轨迹片段,min_lwngth取决于轨迹的单位
split = mpd.TemporalSplitter(tc).split(mode="hour", min_length=20000)
split.to_traj_gdf()           
traj_id start_t end_t geometry length direction
2_2009-06-29 07:00:00 2009-06-29 07:02:25 2009-06-29 07:59:55 LINESTRING(12978845.964 4876404.973, 12978840... 40106.105641 245.123482
1 5_2009-02-25 10:00:00 2009-02-25 10:00:04 2009-02-25 10:54:47 LINESTRING(12955251.019 4851210.485, 12955248... 25455.673254 337.129155
# 绘制轨迹
fig, axes = plt.subplots(nrows=1, ncols=len(split), figsize=(6,4))
for i, traj in enumerate(split):
    traj.plot(ax=axes[i], linewidth=3.0, capstyle='round', column='speed', vmax=20)           
[Data Analysis & Visualization] Python Mapping Data 4 - Getting Started with MovingPandas Pointing North

ObservationGapSplitter

# 如果两个连续观测超过间隔gap,如30分钟,则认为该轨迹需要拆分
split = mpd.ObservationGapSplitter(tc).split(gap=timedelta(minutes=30),min_length=0)
split.to_traj_gdf()           
traj_id start_t end_t geometry length direction
1_0 2008-12-11 04:42:14 2008-12-11 05:15:46 LINESTRING(12956620.805 4851214.113, 12956622... 8101.428690 186.679744
1 2_0 2009-06-29 07:02:25 2009-06-29 08:20:15 LINESTRING(12978845.964 4876404.973, 12978840... 47469.321958 252.952783
2 2_1 2009-06-29 10:57:17 2009-06-29 11:13:12 LINESTRING(12948649.439 4867034.108, 12948650... 3040.348707 139.615947
3 3_0 2009-02-04 04:32:53 2009-02-04 04:35:03 LINESTRING(12955995.635 4851388.236, 12955991... 311.729231 42.937430
4 3_1 2009-02-04 10:03:21 2009-02-04 11:20:12 LINESTRING(12956011.999 4851497.646, 12956043... 16228.264596 303.229612
5 4_0 2009-03-10 10:36:45 2009-03-10 12:01:07 LINESTRING(12956258.794 4851917.156, 12956257... 18739.337154 300.716597
6.000000 5_0 2009-02-25 09:47:03 2009-02-25 10:54:47 LINESTRING(12955947.434 4851460.354, 12955946... 27149.500896 335.377580
7 5_1 2009-02-25 13:30:22 2009-02-25 14:31:24 LINESTRING(12945965.972 4873487.115, 12945952... 24074.321167 165.727187

StopSplitter

# 如果轨迹在某一点为圆心,直径为10范围内停留60s,则认为轨迹需要在该段分割
split = mpd.StopSplitter(tc).split(max_diameter=10, min_duration=timedelta(seconds=60))
split.to_traj_gdf()           
traj_id start_t end_t geometry length direction
1_2008-12-11 04:42:14 2008-12-11 04:42:14 2008-12-11 05:15:46 LINESTRING(12956620.805 4851214.113, 12956622... 8101.428690 186.679744
1 2_2009-06-29 07:02:25 2009-06-29 07:02:25 2009-06-29 07:05:30 LINESTRING(12978845.964 4876404.973, 12978840... 608.233016 29.527683
2 2_2009-06-29 07:06:55 2009-06-29 07:06:55 2009-06-29 08:02:40 LINESTRING(12979026.970 4876730.251, 12979022... 41655.491556 246.215181
3 2_2009-06-29 08:03:40 2009-06-29 08:03:40 2009-06-29 11:13:12 LINESTRING(12949605.674 4863764.794, 12949579... 8333.942283 357.660458
4 3_2009-02-04 04:32:53 2009-02-04 04:32:53 2009-02-04 11:20:12 LINESTRING(12955995.635 4851388.236, 12955991... 16626.383723 304.099365
5 4_2009-03-10 10:36:45 2009-03-10 10:36:45 2009-03-10 12:01:07 LINESTRING(12956258.794 4851917.156, 12956257... 18739.337154 300.716597
6.000000 5_2009-02-25 09:47:03 2009-02-25 09:47:03 2009-02-25 14:31:24 LINESTRING(12955947.434 4851460.354, 12955946... 51327.869585 305.185128

SpeedSplitter

# 把超过30分钟速度低于1地理系统单位/秒的轨迹分开
split = mpd.SpeedSplitter(tc).split(speed=1, duration=timedelta(minutes=30))
split.to_traj_gdf()           
traj_id start_t end_t geometry length direction
1_0 2008-12-11 04:42:14 2008-12-11 05:15:46 LINESTRING(12956620.805 4851214.113, 12956622... 8048.160604 186.679744
1 2_0 2009-06-29 07:02:25 2009-06-29 08:20:15 LINESTRING(12978845.964 4876404.973, 12978840... 47336.977010 252.952783
2 2_1 2009-06-29 10:57:22 2009-06-29 11:10:07 LINESTRING(12948650.441 4867044.718, 12948642... 2915.988294 138.780873
3 3_0 2009-02-04 04:32:53 2009-02-04 04:35:03 LINESTRING(12955995.635 4851388.236, 12955991... 310.440780 42.937430
4 3_1 2009-02-04 10:03:23 2009-02-04 11:20:12 LINESTRING(12956043.836 4851524.490, 12956025... 15962.930350 302.882421
5 4_0 2009-03-10 10:36:45 2009-03-10 12:01:07 LINESTRING(12956258.794 4851917.156, 12956257... 18349.431950 300.716597
6.000000 5_0 2009-02-25 09:47:03 2009-02-25 10:54:47 LINESTRING(12955947.434 4851460.354, 12955946... 27081.554127 335.377580
7 5_1 2009-02-25 13:30:23 2009-02-25 14:31:24 LINESTRING(12945952.057 4873516.928, 12945956... 24006.683028 165.708568

1.3.2 Trajectory compression and smoothing

MovingPandas provides a variety of trajectory compression classes and trajectory smoothing classes to reduce the size of the trajectory object (the number of points) for faster processing.

The compressed trajectory class only needs to be initialized with the desired trajectory, and then the generalize function is called to compress the trajectory. The classes used to compress the trajectory are:

  • MinDistanceGeneralizer: Compresses the track by deleting some points in the original data, and the distance between the deleted points and adjacent points must be less than the specified minimum distance.
  • DouglasPeuckerGeneralizer (Douglas-Pucker): The DouglasPeucker algorithm gradually deletes some points from the original data based on a specified threshold, resulting in an approximate simplified line string or trajectory.
  • MaxDistanceGeneralizer: Tracks are compressed by removing points in the original data that are more than a specified maximum distance from adjacent points.
  • MinTimeDeltaGeneralizer (minimum time interval): Track compression is achieved by removing data points where the time interval between two consecutive timestamps is less than the specified minimum time interval.
  • TopDownTimeRatioGeneralizer: Gradually deletes data points from the original data based on a preset time ratio.

At present, the KalmanSmootherCV class only provides the KalmanSmootherCV class for trajectory smoothing, and the KalmanSmootherCV requires the installation of a third library and the processing process is troublesome, so it is generally a compression trajectory class with similar effects.

# 读取数据
gdf = read_file('data/geolife_small.gpkg').to_crs('EPSG:3857')
tc = mpd.TrajectoryCollection(gdf, 'trajectory_id', t='t')
# 提取单条轨迹进行操作
original_traj = tc.trajectories[1]
# 可以看到轨迹包括897个点
print(original_traj)           
Trajectory 2 (2009-06-29 07:02:25 to 2009-06-29 11:13:12) | Size: 897 | Length: 50621.7m
Bounds: (12948595.449314836, 4861831.088215791, 12979030.643375682, 4877940.244020148)
LINESTRING (12978845.964340456 4876404.972802613, 12978840.175726935 4876411.664456934, 12978837.281           
# tolerance设置连续点之间的最小距离
mpd.MinDistanceGeneralizer(original_traj).generalize(tolerance=1.0)           
Trajectory 2 (2009-06-29 07:02:25 to 2009-06-29 11:13:12) | Size: 818 | Length: 50611.6m
Bounds: (12948595.449314836, 4861831.088215791, 12979030.643375682, 4877940.244020148)
LINESTRING (12978845.964340456 4876404.972802613, 12978840.175726935 4876411.664456934, 12978837.281           
# tolerance表示距离公差,具体使用见算法介绍
mpd.DouglasPeuckerGeneralizer(original_traj).generalize(tolerance=1.0)           
Trajectory 2 (2009-06-29 07:02:25 to 2009-06-29 11:13:12) | Size: 564 | Length: 50606.8m
Bounds: (12948595.449314836, 4861831.088215791, 12979030.643375682, 4877940.244020148)
LINESTRING (12978845.964340456 4876404.972802613, 12978837.281420175 4876414.573873267, 12978852.086           
# tolerance设置连续点之间的最大距离
mpd.MaxDistanceGeneralizer(original_traj).generalize(tolerance=1.0)           
Trajectory 2 (2009-06-29 07:02:25 to 2009-06-29 11:13:12) | Size: 324 | Length: 50166.8m
Bounds: (12948595.449314836, 4861831.088215791, 12979029.752819754, 4877940.244020148)
LINESTRING (12978845.964340456 4876404.972802613, 12978837.281420175 4876414.573873267, 12978852.086           
# tolerance连续轨迹最小的时间距离
mpd.MinTimeDeltaGeneralizer(original_traj).generalize(tolerance=timedelta(minutes=1))           
Trajectory 2 (2009-06-29 07:02:25 to 2009-06-29 11:13:12) | Size: 76 | Length: 47565.6m
Bounds: (12948636.414887449, 4862053.18880025, 12979029.30754179, 4877912.7458354365)
LINESTRING (12978845.964340456 4876404.972802613, 12978815.79675845 4876446.868451926, 12978780.3971           
# tolerance距离公差,该类处理速度很慢
mpd.TopDownTimeRatioGeneralizer(original_traj).generalize(tolerance=1.0)           
Trajectory 2 (2009-06-29 07:02:25 to 2009-06-29 11:13:12) | Size: 756 | Length: 50616.4m
Bounds: (12948595.449314836, 4861831.088215791, 12979030.643375682, 4877940.244020148)
LINESTRING (12978845.964340456 4876404.972802613, 12978840.175726935 4876411.664456934, 12978837.281           

1.3.3 Trajectory dwell detection

MovingPandas provides the function of trajectory dwell detection, in fact the track dwell does not mean that the trajectory movement speed is zero, in fact the object movement speed rarely reaches 0 due to inaccurate tracking, for example GPS tracks tend to keep moving around the stop position of the object. Therefore, the way MovingPandas detects trajectory dwelling is that if an object stays in an area of a specified size for more than a certain period of time, it is judged to be a trajectory dwelling. The specific use is as follows:

# 读取数据
gdf = read_file('data/geolife_small.gpkg').to_crs('EPSG:3857')
tc = mpd.TrajectoryCollection(gdf, 'trajectory_id', t='t')
# 使用单个轨迹进行运算,也可以轨迹集合
my_traj = tc.trajectories[0]
my_traj           
Trajectory 1 (2008-12-11 04:42:14 to 2008-12-11 05:15:46) | Size: 466 | Length: 8101.4m
Bounds: (12955985.950308602, 4845963.532957837, 12956871.051579898, 4851235.877839145)
LINESTRING (12956620.805364596 4851214.112520242, 12956622.141198486 4851220.49700885, 12956578.8379           
# 创建停留检测器
detector = mpd.TrajectoryStopDetector(tc)
# 获取在某个直径为10的区域停留60s以上的轨迹时间范围
stop_time_ranges = detector.get_stop_time_ranges(min_duration=timedelta(seconds=60), max_diameter=10)
for x in stop_time_ranges: 
    print(x)           
Traj 2: 2009-06-29 07:05:30 - 2009-06-29 07:06:55 (duration: 0 days 00:01:25)
Traj 2: 2009-06-29 08:02:40 - 2009-06-29 08:03:40 (duration: 0 days 00:01:00)           
# 获取在某个直径为10的区域停留60s以上的轨迹点
stop_points = detector.get_stop_points(min_duration=timedelta(seconds=60), max_diameter=10)
stop_points           
geometry start_time end_time traj_id duration_s
stop_id
2_2009-06-29 07:05:30 POINT (12979027.415 4876729.960) 2009-06-29 07:05:30 2009-06-29 07:06:55 2 85.0
2_2009-06-29 08:02:40 POINT (12949605.785 4863764.939) 2009-06-29 08:02:40 2009-06-29 08:03:40 2 60.0
# 获取在某个直径为10的区域停留60s以上的轨迹片段
stops = detector.get_stop_segments(min_duration=timedelta(seconds=60), max_diameter=10)
stops           
TrajectoryCollection with 2 trajectories           

1.3.4 Trajectory clustering

Tmovingpandas provides the TrajectoryCollectionAggregator class, which is used to merge multiple TrajectoryCollections into one larger Trajectory Collection. The specific use is as follows:

# 读取数据
gdf = read_file('data/geolife_small.gpkg').to_crs('EPSG:3857')
tc = mpd.TrajectoryCollection(gdf, 'trajectory_id', t='t')
# 简化轨迹
generalized = mpd.MinDistanceGeneralizer(tc).generalize(tolerance=100)
# 聚类
aggregator = mpd.TrajectoryCollectionAggregator(generalized, max_distance=1000, min_distance=100, min_stop_duration=timedelta(minutes=5))           
# 从合并的轨迹集合中提取显著点,并返回一个Geopandas的GeoDataFrame对象
pts = aggregator.get_significant_points_gdf()
pts.head()           
geometry
POINT (12956620.805 4851214.113)
1 POINT (12956054.412 4846377.879)
2 POINT (12956409.855 4851235.878)
3 POINT (12956533.642 4851120.522)
4 POINT (12956436.794 4851148.817)
# 合并的轨迹集合视为一个整体,并将其划分为不同的簇
# 返回每个簇的质心结果
clusters = aggregator.get_clusters_gdf()
clusters.head()           
geometry n
POINT (12955779.665 4851264.376) 30
1 POINT (12956533.123 4846314.334) 6.000000
2 POINT (12956511.072 4849943.391) 4
3 POINT (12956768.526 4848886.514) 2
4 POINT (12956668.895 4847819.306) 2
# 将合并后的轨迹数据转换为一个GeoDataFrame对象,其中每一条记录代表着一次移动(即一段轨迹)
flows = aggregator.get_flows_gdf()
flows.head()           
geometry weight
LINESTRING(12955779.665 4851264.376, 12956511... 1
1 LINESTRING(12956511.072 4849943.391, 12956768... 1
2 LINESTRING(12956768.526 4848886.514, 12956668... 1
3 LINESTRING(12956668.895 4847819.306, 12956533... 1
4 LINESTRING(12978848.347 4876830.605, 12978353... 1

1.3.5 Trajectory distance calculation

MovingPandas provides a distance function and a hausdorff_distance function to calculate the distance between two geometric objects. The distance function calculates the distance based on shapely-objectdistance, while the hausdorff_distance returns the Hausdorf distance. The following is an example:

# 定义轨迹1
df = pd.DataFrame([
  {'geometry':Point(0,0), 't':datetime(2023,7,1,12,0,0)},
  {'geometry':Point(6,0), 't':datetime(2023,7,1,12,6,0)},
  {'geometry':Point(6,6), 't':datetime(2023,7,1,12,10,0)},
  {'geometry':Point(9,9), 't':datetime(2023,7,1,12,15,0)}
]).set_index('t')
# 单位为米
geo_df = GeoDataFrame(df, crs='EPSG:3857')
toy_traj = mpd.Trajectory(geo_df, 1)

# 定义轨迹2
df = pd.DataFrame([
  {'geometry':Point(3,3), 't':datetime(2023,7,1,12,0,0)},
  {'geometry':Point(3,9), 't':datetime(2023,7,1,12,6,0)},
  {'geometry':Point(2,9), 't':datetime(2023,7,1,12,10,0)},
  {'geometry':Point(0,7), 't':datetime(2023,7,1,12,15,0)}
]).set_index('t')
geo_df = GeoDataFrame(df, crs='EPSG:3857')
toy_traj2 = mpd.Trajectory(geo_df, 1)           
# 展示轨迹
ax = toy_traj.plot()
toy_traj2.plot(ax=ax, color='red')           
<Axes: >           
[Data Analysis & Visualization] Python Mapping Data 4 - Getting Started with MovingPandas Pointing North
# 计算距离,默认单位由坐标系单位决定,该坐标系单位为米
print(toy_traj.distance(toy_traj2))
# 计算hausdorff距离,默认单位由坐标系单位决定。
print(toy_traj.hausdorff_distance(toy_traj2))           
3.0
6.082762530298219           
# 计算距离,units自定义单位为厘米
print(toy_traj.distance(toy_traj2, units="cm"))
# 计算hausdorff距离,units自定义单位为英寸
print(toy_traj.hausdorff_distance(toy_traj2, units="ft"))           
300.0
19.95656998129337           

2 Analysis of seagull migration trajectory in an example of trajectory mapping

The trajectory of seagulls was analyzed based on their migration data. The data can be downloaded from movingpandas-examples-data.

Step1 Load the data

df = read_file('data/gulls.gpkg')
# 展示数据
df.head()           
event-id visible timestamp staðsetning-löng location-lat sensor-type individual-taxon-canonical-name tag-local-identifier individual-local-identifier study-name geometry
1082620685 true 2009-05-27 14:00:00.000 24.58617 61.24783 gps Larus fuscus 91732 91732A Navigation experiments in lesser black-backed ... POINT (24.58617 61.24783)
1 1082620686 true 2009-05-27 20:00:00.000 24.58217 61.23267 gps Larus fuscus 91732 91732A Navigation experiments in lesser black-backed ... POINT (24.58217 61.23267)
2 1082620687 true 2009-05-28 05:00:00.000 24.53133 61.18833 gps Larus fuscus 91732 91732A Navigation experiments in lesser black-backed ... POINT (24.53133 61.18833)
3 1082620688 true 2009-05-28 08:00:00.000 24.58200 61.23283 gps Larus fuscus 91732 91732A Navigation experiments in lesser black-backed ... POINT (24.58200 61.23283)
4 1082620689 true 2009-05-28 14:00:00.000 24.58250 61.23267 gps Larus fuscus 91732 91732A Navigation experiments in lesser black-backed ... POINT (24.58250 61.23267)
# 查看数据维度
df.shape           
(89867, 11)           
# 展示数据坐标点
df.plot()           
<Axes: >           
[Data Analysis & Visualization] Python Mapping Data 4 - Getting Started with MovingPandas Pointing North
# 海鸥唯一id
df['individual-local-identifier'].unique()           
array(['91732A', '91733A', '91734A', '91735A', '91737A', '91738A',
       '91739A', '91740A', '91741A', '91742A', '91743A', '91744A',
       '91745A', '91746A', '91747A', '91748A', '91749A', '91750A',
       '91751A', '91752A', '91754A', '91755A', '91756A', '91758A',
       '91759A', '91761A', '91762A', '91763A', '91764A', '91765A',
       '91766A', '91767A', '91769A', '91771A', '91774A', '91775A',
       '91776A', '91777A', '91778A', '91779A', '91780A', '91781A',
       '91782A', '91783A', '91785A', '91786A', '91787A', '91788A',
       '91789A', '91794A', '91795A', '91797A', '91798A', '91799A',
       '91800A', '91802A', '91803A', '91807A', '91809A', '91810A',
       '91811A', '91812A', '91813A', '91814A', '91815A', '91816A',
       '91819A', '91821A', '91823A', '91824A', '91825A', '91826A',
       '91827A', '91828A', '91829A', '91830A', '91831A', '91832A',
       '91835A', '91836A', '91837A', '91838A', '91839A', '91843A',
       '91845A', '91846A', '91848A', '91849A', '91852A', '91854A',
       '91858A', '91861A', '91862A', '91864A', '91865A', '91866A',
       '91870A', '91871A', '91872A', '91875A', '91876A', '91877A',
       '91878A', '91880A', '91881A', '91884A', '91885A', '91893A',
       '91894A', '91897A', '91900A', '91901A', '91903A', '91907A',
       '91908A', '91910A', '91911A', '91913A', '91916A', '91918A',
       '91919A', '91920A', '91921A', '91924A', '91929A', '91930A'],
      dtype=object)           
# 海鸥数量
len(df['individual-local-identifier'].unique())           
126           
# 查看海鸥个体轨迹记录数,可以看到呈长尾分布
df['individual-local-identifier'].value_counts().plot(kind='bar', figsize=(16,4))           
<Axes: xlabel='individual-local-identifier'>           
[Data Analysis & Visualization] Python Mapping Data 4 - Getting Started with MovingPandas Pointing North
# 创建轨迹集合,以海鸥编号为轨迹编号
tc = mpd.TrajectoryCollection(df, traj_id_col='individual-local-identifier', t='timestamp', min_length=100, crs='EPSG:3857')    
tc           
TrajectoryCollection with 125 trajectories           
# 压缩轨迹
tc = mpd.MinTimeDeltaGeneralizer(tc).generalize(tolerance=timedelta(days=1))           

Step2 Individual analysis

# 挑选个体编号91916A的海鸥轨迹,该海鸥轨迹记录数最多
filtered = tc.filter('individual-local-identifier', '91916A')
my_traj = filtered.trajectories[0].copy()
my_traj           
Trajectory 91916A (2009-08-15 15:00:00 to 2015-08-27 09:00:00) | Size: 2134 | Length: 77841407.0m
Bounds: (7.915, 20.70533, 39.51317, 61.54817)
LINESTRING (7.915 54.18533, 9.44183 54.87233, 9.4425 54.87283, 9.41167 54.8555, 9.39583 54.88, 9.396           
# 展示轨迹信息
# my_traj.hvplot(title='Movement', line_width=2) 
my_traj.plot()           
<Axes: >           
[Data Analysis & Visualization] Python Mapping Data 4 - Getting Started with MovingPandas Pointing North

As you can see from the image above, the seagull has been moving back and forth seasonally. Therefore, according to the year, the trajectory information of the seagull is split.

trips_by_year = mpd.TemporalSplitter(filtered).split(mode='year')
trips_by_year.to_traj_gdf()           
traj_id start_t end_t geometry length direction
91916A_2009-12-31 00:00:00 2009-08-15 15:00:00 2009-12-31 19:00:00 LINESTRING (7.91500 54.18533, 9.44183 54.87233... 5.352375E+06 131.939002
1 91916A_2010-12-31 00:00:00 2010-01-01 19:00:00 2010-12-31 07:00:00 LINESTRING(39.18417 21.17583, 39.18833 21.170... 1.232428E+07 281.047091
2 91916A_2011-12-31 00:00:00 2011-01-01 07:00:00 2011-12-31 04:00:00 LINESTRING(39.17000 21.18017, 39.16883 21.156... 1.441793E+07 189.238229
3 91916A_2012-12-31 00:00:00 2012-01-01 04:00:00 2012-12-31 19:00:00 LINESTRING(39.16933 21.16667, 39.17567 21.142... 1.324811e+07 62.132640
4 91916A_2013-12-31 00:00:00 2013-01-01 19:00:00 2013-12-31 13:00:00 LINESTRING(39.18167 21.17333, 39.18100 21.173... 1.293261E+07 273.165851
5 91916A_2014-12-31 00:00:00 2014-01-01 13:00:00 2014-12-31 19:00:00 LINESTRING(39.17083 21.15400, 39.17100 21.157... 1.300973e+07 33.742967
6.000000 91916A_2015-12-31 00:00:00 2015-01-01 19:00:00 2015-08-27 09:00:00 LINESTRING(39.18167 21.16967, 39.18233 21.169... 6.551740E+06 343.887905
# 按照年份显示轨迹
from matplotlib.cm import tab20
ax = None
for index,tmp in enumerate(trips_by_year):
    if ax is None:
        ax = tmp.plot(color=tab20(index))
    else:
        ax= tmp.plot(ax=ax,color=tab20(index))           
[Data Analysis & Visualization] Python Mapping Data 4 - Getting Started with MovingPandas Pointing North

We can extract the data of a specific year separately for visualization, and show the rate of a certain year as follows:

one_year = trips_by_year.get_trajectory('91916A_2010-12-31 00:00:00')
one_year.add_speed(overwrite=True,units=("km", "h"))
# 可视化当年的轨迹
one_year.plot(column='speed', cmap='tab20', legend=True)           
<Axes: >           
[Data Analysis & Visualization] Python Mapping Data 4 - Getting Started with MovingPandas Pointing North

Of course, it can also be the result of a day in the trajectory, as shown below:

fig, ax = plt.subplots(figsize=(5, 5))
one_year.plot(ax=ax, linewidth=2.0, color='black')
GeoDataFrame([one_year.get_row_at(datetime(2010,9,1))]).plot(ax=ax, color='red',zorder=2)
GeoDataFrame([one_year.get_row_at(datetime(2010,10,1))]).plot(ax=ax, color='red',zorder=2)
GeoDataFrame([one_year.get_row_at(datetime(2010,11,1))]).plot(ax=ax, color='red',zorder=2)           
<Axes: >           
[Data Analysis & Visualization] Python Mapping Data 4 - Getting Started with MovingPandas Pointing North

You can also choose an area, such as a habitat. to determine the number of times a track reaches the area and the time it takes to enter and exit.

area_of_interest = Polygon([(30, 25), (50, 25), (50, 15), (30, 15), (30, 25)])
plotted_area_of_interest = GeoDataFrame(pd.DataFrame([{'geometry': area_of_interest, 'id': 1}]), crs='EPSG:3857')
# 输入的是单个轨迹得到的单个轨迹位于该区域的片段
arrivals = [traj for traj in my_traj.clip(area_of_interest)]
# 位于该区域的轨迹片段数
print("Found {} arrivals".format({len(arrivals)}))
# 提取每条轨迹到达该区域的时间
for traj in arrivals:
    name = traj.df['individual-local-identifier'].iloc[0]
    start_time = traj.get_start_time()
    end_time = traj.get_end_time()
    print("Individual {} arrived at {} left at {}".format(name, start_time, end_time))           
Found {6} arrivals
Individual 91916A arrived at 2009-12-23 02:58:36.946571 left at 2010-04-17 14:31:42.170632
Individual 91916A arrived at 2010-10-30 20:55:36.697832 left at 2011-04-03 11:40:12.803103
Individual 91916A arrived at 2011-11-09 20:25:19.550486 left at 2012-04-04 09:12:15.852988
Individual 91916A arrived at 2012-10-14 11:25:28.063070 left at 2013-03-30 04:22:18.125679
Individual 91916A arrived at 2013-10-08 04:17:33.524488 left at 2014-03-29 18:26:54.311106
Individual 91916A arrived at 2014-10-28 19:05:32.941608 left at 2015-04-07 22:59:45.284499           

Step3 Population analysis

In step 2, the same function can also be applied to the trajectory collection class to analyze multiple trajectories.

year_of_interest = 2010
# 输入的是多个轨迹得到的位于该区域的单个轨迹
trajs_in_aoi = tc.clip(area_of_interest)
# 提取2010到达该区域的轨迹
relevant = [ traj for traj in trajs_in_aoi if traj.get_start_time().year <= year_of_interest 
            and traj.get_end_time().year >= year_of_interest]
# 查看有多少只海鸥,也就是多少条单个轨迹到达该区域
print("Found {} arrivals".format(len(relevant)))           
Found 16 arrivals           
# 查看哪些海鸥到达该区域以及停留时间
for traj in relevant:
    print("Individual '{}' arrived at {} (duration: {})".format(
        traj.df['individual-local-identifier'].iloc[0], traj.get_start_time().date(), 
        traj.get_end_time()-traj.get_start_time()))           
Individual '91732A' arrived at 2010-04-10 (duration: 5 days, 21:27:11.670985)
Individual '91737A' arrived at 2009-12-08 (duration: 140 days, 11:11:29.473206)
Individual '91761A' arrived at 2010-04-11 (duration: 12 days, 6:10:03.857850)
Individual '91761A' arrived at 2010-10-04 (duration: 6 days, 10:42:00.340093)
Individual '91762A' arrived at 2010-04-19 (duration: 42 days, 1:28:22.699066)
Individual '91771A' arrived at 2009-12-23 (duration: 66 days, 8:05:31.053782)
Individual '91789A' arrived at 2009-11-10 (duration: 550 days, 20:39:18.769409)
Individual '91824A' arrived at 2010-05-06 (duration: 12:53:53.409236)
Individual '91832A' arrived at 2010-04-21 (duration: 3 days, 5:48:46.276706)
Individual '91832A' arrived at 2010-09-23 (duration: 1 day, 1:40:25.175880)
Individual '91837A' arrived at 2010-05-04 (duration: 1 day, 18:38:46.170554)
Individual '91846A' arrived at 2010-05-15 (duration: 10 days, 2:50:28.505732)
Individual '91862A' arrived at 2010-01-06 (duration: 248 days, 6:10:16.962620)
Individual '91910A' arrived at 2010-09-28 (duration: 2 days, 20:34:31.117736)
Individual '91916A' arrived at 2009-12-23 (duration: 115 days, 11:33:05.224061)
Individual '91916A' arrived at 2010-10-30 (duration: 154 days, 14:44:36.105271)           

For the '91761A' seagull, it can be seen to enter the area twice. We can show the seagull's trajectory during the year and add a background map to better show the seagull's travel.

my_traj = tc.get_trajectory('91761A')
# 提取当年飞行结果
my_traj = my_traj.get_segment_between(datetime(year_of_interest,1,1), datetime(year_of_interest,12,31))
# 压缩轨迹,tolerance越大压缩比例越大
my_traj = mpd.DouglasPeuckerGeneralizer(my_traj).generalize(tolerance=2.0)
my_traj.plot()           
<Axes: >           
[Data Analysis & Visualization] Python Mapping Data 4 - Getting Started with MovingPandas Pointing North

By plotting the flight trajectory of the seagull in the following code, you can see that it passed through the area on the way to and from the destination, and the background map added is shown in the GeoPandas overlay background map for details.

import contextily as cx
fig, ax = plt.subplots(figsize=(3, 5))
# 添加飞行轨迹
ax = my_traj.plot(ax=ax,color='blue', capstyle='round')
# 添加停留区域
ax = gpd.GeoSeries(area_of_interest).plot(ax=ax, color='lightgray')
# 绘制1月到11月的位置,12月该鸟飞回原地
for i in range(1,12):
    # 红点表示上半年,绿点表示下半年
    color = 'red' if i<6 else 'green'
    GeoDataFrame([my_traj.get_row_at(datetime(year_of_interest,i,1))]).plot(ax=ax, color=color,zorder=2)

# 添加背景地图,zoom越大越精细,这里使用自适应zoom。
cx.add_basemap(ax = ax , source=cx.providers.Stamen.TonerLite,crs=my_traj.df.crs, zoom='auto',attribution="")
# 保存图片
fig.savefig("res.jpg",dpi=300)           
[Data Analysis & Visualization] Python Mapping Data 4 - Getting Started with MovingPandas Pointing North

3 See also

  • ​​movingpandas​​
  • ​​movingpandas-examples​​
  • ​​movingpandas-examples-data​​
  • ​​shapely-objectdistance​​
  • Hausdorf distance
  • GeoPandas overlay background map