
"Shared Bicycle Data Topic" Technical points of shared bicycle data analysis

Author: Cube Data Society

In the previous two articles, we covered the applications of shared bicycle data and how to obtain it. This third article in the series covers the technical points of shared bicycle data analysis.

Two kinds of information can be mined from shared bicycle data: riding and parking, which are analogous to the occupied and vacant states in taxi data. Because a shared bicycle generates a record only when its lock is opened or closed, its data differ somewhat from taxi GPS data, as shown in Figure 1.

Figure 1 One day of data for one bicycle

Recall the two types of shared bicycle data introduced in the first article: lock/unlock records and order records.

Key fields of the lock/unlock records:

  • Bike ID
  • Latitude and longitude
  • Time
  • Lock status

Key fields of the order record:

  • Bike ID
  • Unlock latitude and longitude
  • Unlocking time
  • Locking latitude and longitude
  • Locking time

Looking at Figure 1 together with the key fields of the two record types, and comparing both types with taxi GPS data, we can make the following observations:

(1) Lock/unlock data compared with taxi GPS data

(a) Lock/unlock data are collected only when the lock is opened or closed, and a single record carries start and end information for both parking and riding. Each unlock record marks the end of the previous parking period and the start of the next ride; each lock record marks the end of the previous ride and the start of the next parking period.

(b) Lock/unlock data contain no records at the beginning and end of the observation period. The first record usually appears at the bicycle's first ride, yet between the start of the observation period and that first ride the bicycle is parked, and this parking period is not recorded. A bicycle that is not ridden at all during the observation period leaves no record in the dataset.

(2) Order data compared with taxi GPS data

Each row of order data contains the time and location at which a ride starts and ends, so it resembles the order information extracted from taxi GPS data, and the traveler's origin-destination (OD) information can be read off directly.
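As a minimal sketch of how OD information can be read from order records, the plain-Python example below turns each order row into an origin, a destination, and a riding time. The field names (`start_time`, `start_lon`, and so on) and the sample values are assumptions made for illustration; a real dataset will use its own column names.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Hypothetical order records; field names and values are invented for illustration.
orders = [
    {"bike_id": "B001",
     "start_time": "2018-08-27 08:00:10", "start_lon": 121.470, "start_lat": 31.230,
     "end_time": "2018-08-27 08:09:40", "end_lon": 121.480, "end_lat": 31.235},
    {"bike_id": "B001",
     "start_time": "2018-08-27 12:30:00", "start_lon": 121.480, "start_lat": 31.235,
     "end_time": "2018-08-27 12:38:00", "end_lon": 121.475, "end_lat": 31.240},
]

def order_to_trip(order):
    """Turn one order row into an OD trip with its riding time in seconds."""
    start = datetime.strptime(order["start_time"], FMT)
    end = datetime.strptime(order["end_time"], FMT)
    return {
        "bike_id": order["bike_id"],
        "origin": (order["start_lon"], order["start_lat"]),
        "destination": (order["end_lon"], order["end_lat"]),
        "duration_s": (end - start).total_seconds(),
    }

trips = [order_to_trip(o) for o in orders]
for trip in trips:
    print(trip)
```

Because every order row already pairs an unlock with the following lock, no reconstruction is needed before computing ride-level statistics.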

Therefore, if you have order data, you can analyze the travel chain directly from the spatio-temporal information of each order. If you have lock/unlock data, you first need to reconstruct the travel chain so that it clearly reflects the bicycle's activity over the day. Below we use lock/unlock data as an example to introduce the ideas and steps of travel chain reconstruction.

01 Travel chain reconstruction goal

If the lock/unlock data are plotted in three dimensions, as in the left panel of Figure 2, the x and y axes are latitude and longitude, the z axis is time, red dots mark locking events, and blue dots mark unlocking events. Our goal is to turn these scattered points into the travel trajectory shown in the right panel. For example, if the bicycle is unlocked at time A and locked at time B, AB is a riding segment; similarly, if it is locked at time C and unlocked at time D, CD is a parking segment.

Once the spatio-temporal trajectory of a shared bicycle, that is, its travel chain, has been extracted, the riding segments and parking segments can be analyzed separately. Riding segments reveal cycling demand through the number of rides, riding distance, riding time, and so on. Parking segments are just as important: parked bicycles occupy a lot of public land, so parking location and parking duration can be used to analyze land occupation and parking demand, and position changes that occur while a bicycle is parked can be used to analyze dispatching needs.

Figure 2 Schematic of converting shared bicycle order data points into a spatio-temporal trajectory

02 Travel chain reconstruction operation

Since so much analysis becomes possible once the travel chain is built, how do we reconstruct the travel trajectory from the scattered lock/unlock data? Below we describe how to turn lock/unlock data into a travel chain.

Figure 3 Example of lock/unlock data for one bicycle

First, extract the data of one bicycle for inspection (Figure 3). After sorting the records chronologically, we can observe the following:

  • The lock state (LOCK_STATUS) alternates 0, 1, 0, 1; that is, a record is produced only when the bicycle's lock state changes.
  • The meaning of the values 0 and 1 of LOCK_STATUS is unknown, probably because the data came without a field description, so it has to be inferred from the data. A bicycle is generally not ridden for several hours at a time; a ride usually lasts only 5 to 10 minutes. In Figure 3, LOCK_STATUS is 1 at 11:49:24 and 0 at 16:10:53, so more than 4 hours pass between the 1 and the following 0. That interval must be parking, so 0 means unlocking (the ride starts) and 1 means locking.
  • The data also show that the location can change between a 1 (locking) and the next 0 (unlocking); that is, the bicycle's position changes while it is parked. There are two possible reasons:

(1) GPS positioning is inaccurate, and the bicycle has not actually moved;

(2) The bicycle has been moved by someone, for example during dispatching: when the operator finds that one place has too many idle bicycles while another place has none but many people who want to ride, it may send a truck to move bicycles between them.
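The inference about what 0 and 1 mean can be sketched in code: the time between one record and the next is the duration of the state that the first record started, so the status followed by the shorter typical gap should be the one that starts a ride. In the example below, only the 11:49:24 and 16:10:53 rows come from Figure 3; the other rows and the function name `median_gap_by_status` are invented for illustration.

```python
from datetime import datetime
from statistics import median

FMT = "%H:%M:%S"

# (time, LOCK_STATUS) for one bicycle, sorted by time. Only the 11:49:24 and
# 16:10:53 rows come from the article; the others are invented for illustration.
records = [
    ("11:40:02", 0),
    ("11:49:24", 1),
    ("16:10:53", 0),
    ("16:18:30", 1),
    ("18:02:11", 0),
    ("18:09:45", 1),
]

def median_gap_by_status(records):
    """Median number of seconds that pass after a record of each status."""
    gaps = {0: [], 1: []}
    for (t0, s0), (t1, _) in zip(records, records[1:]):
        delta = datetime.strptime(t1, FMT) - datetime.strptime(t0, FMT)
        gaps[s0].append(delta.total_seconds())
    return {status: median(values) for status, values in gaps.items() if values}

gap = median_gap_by_status(records)
# The status followed by the shorter typical gap is taken to start a ride.
riding_status = min(gap, key=gap.get)
print(gap, "riding starts with status", riding_status)
```

On real data the same comparison would be run over many bicycles, since a single bicycle may have too few records to give a stable median.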

If one day of data for a shared bicycle is drawn as a timeline (Figure 3), with blue dots for LOCK_STATUS 0 and red dots for LOCK_STATUS 1, it is hard to tell which intervals are riding and which are parking. That is why the travel chain needs to be reconstructed.

Reconstructing the travel chain takes two steps (Figure 4). The first step is to add a constructed record of the opposite state one second before each record. The second step is to add a constructed record at the beginning (00:00) and at the end (24:00) of the day.

Figure 4 Schematic of the travel chain reconstruction steps for one bicycle

Here's a brief discussion of each step:

(1) Why add a record of the opposite state one second before each record? The original data contain only the points at which the state changes, so segments cannot be identified from them directly. To construct the segments, we insert a point of the opposite state just before each data point; riding segments and parking segments can then be built from the resulting pairs (Figure 4).
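A minimal sketch of this first step in plain Python follows. The record layout, the sample values, and the function name `add_opposite_records` are assumptions for illustration. The article does not say which position the inserted record should carry; the sketch assumes it reuses the position of the record it precedes, so that a parking segment ends where the next unlock happens.

```python
from datetime import datetime, timedelta

# Hypothetical records for one bicycle, sorted by time:
# (time, lon, lat, lock_status), where 0 = unlock and 1 = lock.
records = [
    (datetime(2018, 8, 27, 8, 0, 0), 121.470, 31.230, 0),
    (datetime(2018, 8, 27, 8, 10, 0), 121.480, 31.235, 1),
]

def add_opposite_records(records):
    """Before each record, insert a copy one second earlier with the opposite status."""
    chain = []
    for time, lon, lat, status in records:
        # Assumption: the inserted record reuses the position of the record it precedes.
        chain.append((time - timedelta(seconds=1), lon, lat, 1 - status))
        chain.append((time, lon, lat, status))
    return chain

chain = add_opposite_records(records)
for row in chain:
    print(row)
```

The sketch assumes consecutive records are more than one second apart; otherwise the inserted record could fall before the previous original record and the chain would need re-sorting.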

(2) Add records at the beginning and end of the day. We also add a record for each bicycle at 00:00 and at 24:00 of the day. The record added at 00:00 takes the lock state of the first record of the day, and the record added at 24:00 takes the lock state of the last record of the day. This yields a complete travel chain for the bicycle.
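The second step can be sketched as below, assuming the input chain already contains the opposite-state records from the first step, so that its first record describes the state the bicycle was in before its first event. The function name `pad_day`, the record layout, and the sample values are hypothetical; 24:00 is represented as 00:00 of the following day.

```python
from datetime import datetime

# Hypothetical chain after step one: (time, lon, lat, lock_status),
# where 0 = unlock and 1 = lock.
chain = [
    (datetime(2018, 8, 27, 7, 59, 59), 121.470, 31.230, 1),
    (datetime(2018, 8, 27, 8, 0, 0), 121.470, 31.230, 0),
    (datetime(2018, 8, 27, 8, 9, 59), 121.480, 31.235, 0),
    (datetime(2018, 8, 27, 8, 10, 0), 121.480, 31.235, 1),
]

def pad_day(chain, day_start, day_end):
    """Add boundary records that copy the status and position of the first and last records."""
    if not chain:
        return []
    _, lon_first, lat_first, status_first = chain[0]
    _, lon_last, lat_last, status_last = chain[-1]
    head = (day_start, lon_first, lat_first, status_first)
    tail = (day_end, lon_last, lat_last, status_last)
    return [head] + chain + [tail]

padded = pad_day(chain, datetime(2018, 8, 27), datetime(2018, 8, 28))
for row in padded:
    print(row)
```

In this example the bicycle is shown as parked from 00:00 until its first unlock and again from its last lock until 24:00.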

With the travel chain reconstructed, we know when during the day the bicycle is parked and when it is ridden. We can then mine riding characteristics (number of rides, riding distance, riding time, and so on) as well as parking characteristics, analyzing parking demand and parking duration, and analyzing dispatching needs through position changes during parking.
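As one hedged illustration of the position-change analysis, the sketch below measures how far a bicycle moved during each parking segment and flags large displacements as possible relocations. The 100 m threshold, the field names, and the sample coordinates are assumptions, not values from the article; in practice the threshold would be tuned against the observed GPS error.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

# Hypothetical parking segments: (lon, lat) at locking and at the next unlocking.
parkings = [
    {"lock_pos": (121.4700, 31.2300), "unlock_pos": (121.4701, 31.2300)},  # about 10 m apart
    {"lock_pos": (121.4700, 31.2300), "unlock_pos": (121.4900, 31.2400)},  # a couple of km apart
]

MOVE_THRESHOLD_M = 100  # assumed cut-off between GPS drift and a real move

for p in parkings:
    p["moved_m"] = haversine_m(*p["lock_pos"], *p["unlock_pos"])
    p["possibly_relocated"] = p["moved_m"] > MOVE_THRESHOLD_M
    print(round(p["moved_m"]), p["possibly_relocated"])
```

A single threshold cannot tell dispatching apart from other kinds of movement, so flagged segments are best treated as candidates for further checking.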

03 Summary of travel chain reconstruction ideas

Travel chain reconstruction is divided into two steps:

(1) Add a record of the opposite state one second before each record of the original data. Each original record carries two pieces of information: the end of the previous state and the beginning of the next state. Adding the opposite-state record therefore makes the start and end of each riding and parking phase explicit.

(2) Building on the previous step, insert a record for each bicycle at the start time and at the end time of the observation period. For the start time, the status and position of the inserted record are the same as those of the bicycle's first record; for the end time, they are the same as those of its last record. This step identifies the state of the bicycle at the beginning and end of the observation period.

The idea of travel chain reconstruction is shown in Figure 5. In the reconstructed travel chain, each unlocked or locked state appears as two consecutive records with the same status: one marks the beginning of the state and the other marks its end. This makes it much easier to tell when the bicycle is being ridden and when it is parked.

Figure 5 Travel chain reconstruction idea
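Because the reconstructed chain consists of consecutive pairs of records with the same status, segments can be read off two records at a time. The sketch below assumes a chain of (time, lock_status) tuples with 0 for riding and 1 for parking; the sample values and the function name `chain_to_segments` are hypothetical.

```python
from datetime import datetime

KIND = {0: "riding", 1: "parking"}

# Hypothetical reconstructed chain for one bicycle: (time, lock_status).
chain = [
    (datetime(2018, 8, 27, 0, 0, 0), 1),
    (datetime(2018, 8, 27, 7, 59, 59), 1),
    (datetime(2018, 8, 27, 8, 0, 0), 0),
    (datetime(2018, 8, 27, 8, 9, 59), 0),
    (datetime(2018, 8, 27, 8, 10, 0), 1),
    (datetime(2018, 8, 28, 0, 0, 0), 1),
]

def chain_to_segments(chain):
    """Pair records (start, end) two at a time into riding and parking segments."""
    segments = []
    for (t_start, s_start), (t_end, s_end) in zip(chain[0::2], chain[1::2]):
        if s_start != s_end:
            raise ValueError("chain is not made of same-status pairs")
        segments.append({
            "kind": KIND[s_start],
            "start": t_start,
            "end": t_end,
            "duration_s": (t_end - t_start).total_seconds(),
        })
    return segments

segments = chain_to_segments(chain)
for seg in segments:
    print(seg["kind"], seg["start"], seg["end"], seg["duration_s"])
```

The status check guards against chains that were not built by both reconstruction steps, where the pairing would otherwise silently mix states.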

Note that travel chain reconstruction increases the data volume and the amount of computation, and it is not a required step in real data processing: riding and parking information can also be extracted directly from the raw data. Its main role here is to help us see how the riding and parking states change over the day, clarify the logic, and make sure the subsequent code is correct.
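One way to extract segments directly from the raw data, sketched here under the assumption that 0 means unlock and 1 means lock, is to pair each record with the next one: the status of the earlier record gives the state that lasts until the later record. The record layout, sample values, and function name `raw_to_segments` are invented for illustration.

```python
from datetime import datetime

# Hypothetical raw lock/unlock records for one bicycle, sorted by time:
# (time, lock_status), where 0 = unlock and 1 = lock.
records = [
    (datetime(2018, 8, 27, 8, 0, 0), 0),
    (datetime(2018, 8, 27, 8, 10, 0), 1),
    (datetime(2018, 8, 27, 12, 30, 0), 0),
    (datetime(2018, 8, 27, 12, 38, 0), 1),
]

def raw_to_segments(records):
    """Pair each record with the next; the earlier record's status names the segment."""
    segments = []
    for (t0, s0), (t1, _) in zip(records, records[1:]):
        kind = "riding" if s0 == 0 else "parking"
        segments.append((kind, t0, t1, (t1 - t0).total_seconds()))
    return segments

segments = raw_to_segments(records)
for seg in segments:
    print(seg)
```

Unlike the reconstructed chain, this shortcut does not cover the parking periods before the first record and after the last record of the day.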

04 Advanced Learning

1. Video version of shared bicycle data analysis content

If you are interested in the shared bicycle data analysis described above and want to learn more, you can watch the video version of this material on the Cube Data Institute's Bilibili (B station) account (account name: Cube Data Academy).

2. "Traffic Big Data Analysis Practice" course

If you want to learn how to analyze taxi GPS data, shared bicycle data, metro IC card data, and bus GPS data, you can enroll in the "Traffic Big Data Analysis Practice" course offered by the Cube Data Academy. Course details are available through the "Cube Data Society" official account.