
62. Resampling

Case introduction

The data table below is stored in an Excel file. Read it with column 1 as the index column and openpyxl as the engine, then resample the measurement data by day, using the mean as the aggregation function. Add code comments.

Time Measurements
2022-03-18 00:00:00 5
2022-03-18 01:00:00 9
2022-03-18 02:00:00 6
2022-03-18 03:00:00 9
2022-03-18 04:00:00 8
2022-03-18 05:00:00 10
2022-03-18 06:00:00 4
2022-03-18 07:00:00 3
2022-03-18 08:00:00 6
2022-03-18 09:00:00 3
2022-03-18 10:00:00 1
2022-03-18 11:00:00 6
2022-03-18 12:00:00 7
2022-03-18 13:00:00 4
2022-03-18 14:00:00 1
2022-03-18 15:00:00 7
2022-03-18 16:00:00 4
2022-03-18 17:00:00 10
2022-03-18 18:00:00 8
2022-03-18 19:00:00 6
2022-03-18 20:00:00 6
2022-03-18 21:00:00 1
2022-03-18 22:00:00 5
2022-03-18 23:00:00 4
2022-03-19 00:00:00 5
2022-03-19 01:00:00 5
2022-03-19 02:00:00 4
2022-03-19 03:00:00 1
2022-03-19 04:00:00 8
2022-03-19 05:00:00 6
2022-03-19 06:00:00 1
2022-03-19 07:00:00 2
2022-03-19 08:00:00 8
2022-03-19 09:00:00 4
2022-03-19 10:00:00 9
2022-03-19 11:00:00 1
2022-03-19 12:00:00 8
2022-03-19 13:00:00 2
2022-03-19 14:00:00 5
2022-03-19 15:00:00 2
2022-03-19 16:00:00 6
2022-03-19 17:00:00 9
2022-03-19 18:00:00 6
2022-03-19 19:00:00 2
2022-03-19 20:00:00 1
2022-03-19 21:00:00 8
2022-03-19 22:00:00 1
2022-03-19 23:00:00 2
2022-03-20 00:00:00 6
2022-03-20 01:00:00 6
2022-03-20 02:00:00 10
2022-03-20 03:00:00 9
2022-03-20 04:00:00 6
2022-03-20 05:00:00 7
2022-03-20 06:00:00 8
2022-03-20 07:00:00 10
2022-03-20 08:00:00 1
2022-03-20 09:00:00 3
2022-03-20 10:00:00 1
2022-03-20 11:00:00 3
2022-03-20 12:00:00 2
2022-03-20 13:00:00 10
2022-03-20 14:00:00 9
2022-03-20 15:00:00 5
2022-03-20 16:00:00 2
2022-03-20 17:00:00 4
2022-03-20 18:00:00 10
2022-03-20 19:00:00 7
2022-03-20 20:00:00 6
2022-03-20 21:00:00 10
2022-03-20 22:00:00 4
2022-03-20 23:00:00 7
2022-03-21 00:00:00 7
2022-03-21 01:00:00 9
2022-03-21 02:00:00 8
2022-03-21 03:00:00 6
2022-03-21 04:00:00 2
2022-03-21 05:00:00 6
2022-03-21 06:00:00 10
2022-03-21 07:00:00 3
2022-03-21 08:00:00 2
2022-03-21 09:00:00 1
2022-03-21 10:00:00 7
2022-03-21 11:00:00 2
2022-03-21 12:00:00 9
2022-03-21 13:00:00 1
2022-03-21 14:00:00 7
2022-03-21 15:00:00 2
2022-03-21 16:00:00 2
2022-03-21 17:00:00 10
2022-03-21 18:00:00 5
2022-03-21 19:00:00 5
2022-03-21 20:00:00 4
2022-03-21 21:00:00 6
2022-03-21 22:00:00 4
2022-03-21 23:00:00 1
2022-03-22 00:00:00 5
2022-03-22 01:00:00 10
2022-03-22 02:00:00 6
2022-03-22 03:00:00 9
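
For readers who do not have the Excel file at hand, the following is a minimal sketch that rebuilds the table above in memory and applies the same daily mean resampling. The variable names (values, sample, daily_mean) are illustrative only and are not part of the original case; the timestamps are generated on the assumption that the readings are exactly one hour apart, as the table shows.

```python
import pandas as pd

# Hourly measurements copied from the table above (100 readings).
values = [
    5, 9, 6, 9, 8, 10, 4, 3, 6, 3, 1, 6, 7, 4, 1, 7, 4, 10, 8, 6, 6, 1, 5, 4,    # 2022-03-18
    5, 5, 4, 1, 8, 6, 1, 2, 8, 4, 9, 1, 8, 2, 5, 2, 6, 9, 6, 2, 1, 8, 1, 2,      # 2022-03-19
    6, 6, 10, 9, 6, 7, 8, 10, 1, 3, 1, 3, 2, 10, 9, 5, 2, 4, 10, 7, 6, 10, 4, 7, # 2022-03-20
    7, 9, 8, 6, 2, 6, 10, 3, 2, 1, 7, 2, 9, 1, 7, 2, 2, 10, 5, 5, 4, 6, 4, 1,    # 2022-03-21
    5, 10, 6, 9,                                                                  # 2022-03-22 (partial day)
]

# Build hourly timestamps starting at 2022-03-18 00:00:00.
# A Timedelta is passed as the frequency instead of a string alias.
index = pd.date_range(
    "2022-03-18", periods=len(values), freq=pd.Timedelta(hours=1), name="Time"
)
sample = pd.DataFrame({"Measurements": values}, index=index)

# Group the hourly rows into calendar days and average each group.
daily_mean = sample.resample("D").mean()
print(daily_mean)
```

Note that 2022-03-22 has only four hourly readings in the table, so its daily mean is the average of those four values (7.5) rather than of a full 24 hours.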

Wenxin Yiyan prompt

Open Baidu's "Wenxin Yiyan" and enter the following prompt: "You are a pandas expert. The file path is: D:/Samples/Time Series.xlsx. Use pandas to import the data from the 1st worksheet of the Excel file, use the 1st column as the index column, and use 'openpyxl' as the engine. Resample the measurement data by day, using the mean as the aggregation function. Add code comments."

The results are as follows:

To import the data from the Excel file with pandas as you requested, using column 1 as the index column and openpyxl as the engine, and then resample the data by day and calculate the mean, you can refer to the following code:

import pandas as pd

# Set the file path
file_path = 'D:/Samples/Time Series.xlsx'

# Read the Excel file with pandas, using column 1 as the index column and the openpyxl engine
# Note: openpyxl is used for .xlsx files; for .xls files you may need the xlrd engine instead
df = pd.read_excel(file_path, sheet_name=0, index_col=0, engine='openpyxl')

# This assumes the time data is of datetime type and sits in the index
# If it is not of datetime type, you may need to convert the index first
# For example: df.index = pd.to_datetime(df.index)

# Check the data type of the index
print(df.index.dtype)

# If the index is not a DatetimeIndex (dtype datetime64[ns]), convert it
# if not isinstance(df.index, pd.DatetimeIndex):
#     df.index = pd.to_datetime(df.index)

# Resample the data by day and compute the mean
# This resamples the whole DataFrame rather than a single column
resampled_df = df.resample('D').mean()

# Print the result
print(resampled_df)

# To save the result to a new Excel file
output_path = 'D:/Samples/Time Series_resampled.xlsx'
resampled_df.to_excel(output_path, engine='openpyxl')

Notes:

  1. Before running the above code, make sure that the openpyxl library is installed. If it is not, install it with pip install openpyxl.
  2. The index column of the Excel file (i.e. column 1) should contain data that can be converted to a datetime type; otherwise resampling will fail.
  3. The above code assumes that the entire DataFrame needs to be resampled by day. If your DataFrame contains multiple columns and only some of them need to be resampled, apply the resample method to those specific columns.
  4. If the index column in the raw data is not of the datetime type, use the pd.to_datetime function to convert it. The code above includes commented-out sample code for this; uncomment it if you need it.
  5. The code also includes a snippet that saves the resampled data to a new Excel file. If you don't need this step, delete it or comment it out.
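
Notes 3 and 4 can be sketched as follows. This is a minimal illustration, not part of the Wenxin Yiyan response: the small DataFrame, its text timestamps, and the extra "Sensor" column are hypothetical and exist only to show the conversion and the single-column resample.

```python
import pandas as pd

# Hypothetical frame: timestamps stored as text, plus an extra text column
# ("Sensor") that is not in the original data and should not be averaged.
df = pd.DataFrame(
    {"Measurements": [5, 9, 6, 5, 5], "Sensor": ["A", "A", "A", "A", "A"]},
    index=[
        "2022-03-18 00:00:00",
        "2022-03-18 01:00:00",
        "2022-03-18 02:00:00",
        "2022-03-19 00:00:00",
        "2022-03-19 01:00:00",
    ],
)

# Note 4: resample() needs a datetime-like index, so convert text timestamps first.
if not isinstance(df.index, pd.DatetimeIndex):
    df.index = pd.to_datetime(df.index)

# Note 3: select the column of interest, then resample only that column by day.
daily = df["Measurements"].resample("D").mean()
print(daily)
```

Selecting the column before calling resample also sidesteps the text column; in recent pandas versions, taking the mean of a whole DataFrame that contains non-numeric columns may raise an error.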

Python run results

Figure 1. Python run results

The tools used in this series are as follows:

Wenxin Yiyan: Version 3.5

Python Editor: 3.9.0

Disclaimer: The above results come from actually running the code. If the code fails to run for you, you can leave a message in the comment area. Part of the data comes from the Internet; if there is any infringement, please inform us by private message. Thank you!