laitimes

59. Time series data summary

author:Lee Kwang Joo

Case import

In the following table, the first column is used as the index column and the engine is openpyxl. Please summarize the measurements for each day. Add code comments.

Time Measurements
2022-03-18 00:00:00 5
2022-03-18 01:00:00 9
2022-03-18 02:00:00 6
2022-03-18 03:00:00 9
2022-03-18 04:00:00 8
2022-03-18 05:00:00 10
2022-03-18 06:00:00 4
2022-03-18 07:00:00 3
2022-03-18 08:00:00 6
2022-03-18 09:00:00 3
2022-03-18 10:00:00 1
2022-03-18 11:00:00 6
2022-03-18 12:00:00 7
2022-03-18 13:00:00 4
2022-03-18 14:00:00 1
2022-03-18 15:00:00 7
2022-03-18 16:00:00 4
2022-03-18 17:00:00 10
2022-03-18 18:00:00 8
2022-03-18 19:00:00 6
2022-03-18 20:00:00 6
2022-03-18 21:00:00 1
2022-03-18 22:00:00 5
2022-03-18 23:00:00 4
2022-03-19 00:00:00 5
2022-03-19 01:00:00 5
2022-03-19 02:00:00 4
2022-03-19 03:00:00 1
2022-03-19 04:00:00 8
2022-03-19 05:00:00 6
2022-03-19 06:00:00 1
2022-03-19 07:00:00 2
2022-03-19 08:00:00 8
2022-03-19 09:00:00 4
2022-03-19 10:00:00 9
2022-03-19 11:00:00 1
2022-03-19 12:00:00 8
2022-03-19 13:00:00 2
2022-03-19 14:00:00 5
2022-03-19 15:00:00 2
2022-03-19 16:00:00 6
2022-03-19 17:00:00 9
2022-03-19 18:00:00 6
2022-03-19 19:00:00 2
2022-03-19 20:00:00 1
2022-03-19 21:00:00 8
2022-03-19 22:00:00 1
2022-03-19 23:00:00 2
2022-03-20 00:00:00 6
2022-03-20 01:00:00 6
2022-03-20 02:00:00 10
2022-03-20 03:00:00 9
2022-03-20 04:00:00 6
2022-03-20 05:00:00 7
2022-03-20 06:00:00 8
2022-03-20 07:00:00 10
2022-03-20 08:00:00 1
2022-03-20 09:00:00 3
2022-03-20 10:00:00 1
2022-03-20 11:00:00 3
2022-03-20 12:00:00 2
2022-03-20 13:00:00 10
2022-03-20 14:00:00 9
2022-03-20 15:00:00 5
2022-03-20 16:00:00 2
2022-03-20 17:00:00 4
2022-03-20 18:00:00 10
2022-03-20 19:00:00 7
2022-03-20 20:00:00 6
2022-03-20 21:00:00 10
2022-03-20 22:00:00 4
2022-03-20 23:00:00 7
2022-03-21 00:00:00 7
2022-03-21 01:00:00 9
2022-03-21 02:00:00 8
2022-03-21 03:00:00 6
2022-03-21 04:00:00 2
2022-03-21 05:00:00 6
2022-03-21 06:00:00 10
2022-03-21 07:00:00 3
2022-03-21 08:00:00 2
2022-03-21 09:00:00 1
2022-03-21 10:00:00 7
2022-03-21 11:00:00 2
2022-03-21 12:00:00 9
2022-03-21 13:00:00 1
2022-03-21 14:00:00 7
2022-03-21 15:00:00 2
2022-03-21 16:00:00 2
2022-03-21 17:00:00 10
2022-03-21 18:00:00 5
2022-03-21 19:00:00 5
2022-03-21 20:00:00 4
2022-03-21 21:00:00 6
2022-03-21 22:00:00 4
2022-03-21 23:00:00 1
2022-03-22 00:00:00 5
2022-03-22 01:00:00 10
2022-03-22 02:00:00 6
2022-03-22 03:00:00 9

Wenxin is a prompt word

Open Baidu's "Wenxin Yiyan" and enter the prompt word: "You are a pandas expert." The file path is: D:/Samples/Time Series.xlsx. Use pandas to import the data in the 1st worksheet in the Excel file, use the 1st column as the index column, and the engine is "openpyxl". Please summarize the measurements for each day. Add code comments. ”

The results are as follows:

First of all, you need to make sure that you have installed the pandas library and the OpenPyXL engine (which is used to read .xlsx files). If you haven't already, you can use pip to install them:

pip install pandas openpyxl           

Next, you can use the following Python code to accomplish your needs:

import pandas as pd

# 文件路径
file_path = 'D:/Samples/时间序列.xlsx'

# 读取Excel文件中第1个工作表的数据,第1列作为索引列,使用'openpyxl'作为引擎
# parse_dates参数用于指定哪一列或多列应该被解析为日期,这里假设日期在第一行且列名为'Date'
# 如果你的日期列没有特定的列名,而是直接作为第一列(即索引列),则不需要指定parse_dates
df = pd.read_excel(file_path, sheet_name=0, index_col=0, engine='openpyxl')

# 假设'Value'是你要汇总的测量值的列名
# 如果你的列名不同,请替换为实际的列名
# 使用resample方法按天汇总数据,这里使用'sum'作为聚合函数
# 如果你的数据已经是按天排序的,可以省略base参数
# 如果你的索引不是日期时间类型,你需要先将其转换为日期时间类型
df.index = pd.to_datetime(df.index)  # 确保索引是日期时间类型
daily_summary = df['Value'].resample('D').sum()  # 假设你的测量值在'Value'列中

# 输出每天的测量值汇总
print(daily_summary)

# 如果需要将结果保存回Excel文件,可以使用下面的代码
output_path = 'D:/Samples/时间序列_每日汇总.xlsx'
daily_summary.to_excel(output_path)           

Caution:

  1. In the code above, I'm assuming that your date column is an index column and your measurements are in a column named 'Value'. You need to modify these parameters according to your actual Excel file structure.
  2. If your date column is not an index column, but a normal data column, you need to set it as an index column first, or specify the on parameter as the column name of the date column when using the resample method.
  3. If your date column doesn't have a specific column name (e.g., it's directly the first column, and you want to index it), then index_col=0 is enough and you don't need to specify additional parse_dates.
  4. resample('D') resamples the data by the number of days. This requires your index to be of the datetime type. If not, you'll need to convert it to a datetime type first, as shown in the code above.
  5. I used sum as an aggregate function to summarize the measurements. If your needs are to find an average, maximum, minimum, etc., you can change the aggregate function accordingly.

Python running results

59. Time series data summary

Figure 1. Python running results

The 'value' in the code example given by Wenxin Yiyan needs to be changed to 'measured value' according to the actual situation.