Experimental data:
The series has one data point per minute, and 7 days of data were collected.
![](https://img.laitimes.com/img/__Qf2AjLwojIjJCLyojI0JCLiAzNfRHLGZkRGZkRfJ3bs92YsYTMfVmepNHL6NmeOJTT61EMJpHW4Z0MMBjVtJWd0ckW65UbM5WOHJWa5kHT20ESjBjUIF2X0hXZ0xCMx81dvRWYoNHLrdEZwZ1Rh5WNXp1bwNjW1ZUba9VZwlHdssmch1mclRXY39CXldWYtlWPzNXZj9mcw1ycz9WL49zZuBnL5ITM4UDO0UTM3EDNwAjMwIzLc52YucWbp5GZzNmLn9Gbi1yZtl2Lc9CX6MHc0RHaiojIsJye.png)
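
The data layout described above (one point per minute over 7 days, in a DataFrame with a time index and a `val` column) can be mocked up as follows. This is only a sketch for trying out the code: the start date and the sine-plus-noise values are invented for illustration and are not the data shown in the figure.

```python
import numpy as np
import pandas as pd

# One point per minute for 7 days: 7 * 24 * 60 = 10080 rows.
# The start date is an arbitrary assumption.
index = pd.date_range(start="2020-01-01", periods=7 * 24 * 60, freq="min")

# Made-up daily cycle plus noise, standing in for the real measurements.
rng = np.random.default_rng(0)
minutes_of_day = (index.hour * 60 + index.minute).to_numpy()
values = 10 + 5 * np.sin(2 * np.pi * minutes_of_day / 1440) + rng.normal(0, 0.5, len(index))

data = pd.DataFrame({"val": values}, index=index)
print(data.shape)
```

A frame shaped like this has the DatetimeIndex and the `val` column that the forecasting function expects.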

    import datetime

    import numpy as np
    import pandas as pd
    from xgboost import XGBRegressor


    def xgboost_model_forecast(data, step):
        """
        Split the raw data into a training part and an evaluation part
        (the most recent three days by default), then forecast the next
        `step` points at one point per minute.
        :param data: DataFrame with a DatetimeIndex and a 'val' column
        :param step: number of one-minute points to forecast
        :return: Series of forecast values indexed by forecast time
        """
        # First timestamp to forecast: one minute after the last observation
        latest_date = data.index[-1] + datetime.timedelta(minutes=1)
        forecast_times = pd.date_range(start=latest_date, periods=step, freq='min')
        # Everything up to three days before the end is used for training
        split_date = latest_date - datetime.timedelta(days=3)
        train_data = data.loc[data.index <= split_date].copy()
        evaluation_data = data.loc[data.index > split_date].copy()
        # Placeholder frame that only carries the future timestamps
        forecast_data = pd.DataFrame({'val': np.zeros(len(forecast_times))}, index=forecast_times)
        x_train, y_train = create_features(train_data, label='val')
        x_eval, y_eval = create_features(evaluation_data, label='val')
        x_forecast = create_features(forecast_data)
        # xgboost 1.6+ takes early_stopping_rounds in the constructor;
        # passing it to fit() was removed in xgboost 2.0
        reg = XGBRegressor(n_estimators=1000, early_stopping_rounds=50)
        reg.fit(x_train, y_train, eval_set=[(x_train, y_train), (x_eval, y_eval)], verbose=False)
        # Predictions on the evaluation window, kept for inspection
        evaluation_data['val'] = reg.predict(x_eval)
        forecast_data['val'] = reg.predict(x_forecast)
        data_forecasted = pd.Series(dict(zip(forecast_times, forecast_data['val'])))
        return data_forecasted


    def create_features(df, label=None):
        # Build calendar features from the DatetimeIndex (adds columns to df in place)
        df['date'] = df.index
        df['hour'] = df['date'].dt.hour
        df['quarter'] = df['date'].dt.quarter
        df['minute'] = df['date'].dt.minute
        df['month'] = df['date'].dt.month
        df['year'] = df['date'].dt.year
        df['day_of_year'] = df['date'].dt.dayofyear
        df['day_of_month'] = df['date'].dt.day
        # dt.weekofyear was removed in pandas 2.0; isocalendar() needs pandas 1.1+
        df['week_of_year'] = df['date'].dt.isocalendar().week.astype(int)
        X = df[['hour', 'quarter', 'minute', 'month', 'year', 'day_of_year', 'day_of_month', 'week_of_year']]
        if label:
            y = df[label]
            return X, y
        return X

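
To make the three-day split in `xgboost_model_forecast` concrete, here is a small sketch that applies the same cut to a 7-day, one-minute series. `split_recent_days` is a hypothetical helper written for this illustration (it is not part of the original code), and the index and zero values are synthetic placeholders.

```python
import datetime

import numpy as np
import pandas as pd


def split_recent_days(data: pd.DataFrame, days: int = 3):
    """Hypothetical helper: hold out the most recent `days` days for evaluation."""
    # Same reference point as the forecasting function: one minute past the last row.
    latest_date = data.index[-1] + datetime.timedelta(minutes=1)
    split_date = latest_date - datetime.timedelta(days=days)
    train = data.loc[data.index <= split_date].copy()
    evaluation = data.loc[data.index > split_date].copy()
    return train, evaluation


# Synthetic 7-day series, one point per minute (values are placeholders).
index = pd.date_range(start="2020-01-01", periods=7 * 24 * 60, freq="min")
data = pd.DataFrame({"val": np.zeros(len(index))}, index=index)

train, evaluation = split_recent_days(data)
print(len(train), len(evaluation))
```

With 7 days of input, this leaves roughly the first four days for training and the last three for evaluation, and every training timestamp precedes every evaluation timestamp, so the model is never evaluated on data older than what it was trained on.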
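As the figure above shows, one week of historical data is used to forecast the following day, and the forecast stays fairly close to the actual values.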
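
"Close" can also be expressed as a number instead of being judged from the plot. The sketch below computes the mean absolute error (MAE) and root mean squared error (RMSE) between a forecast and the observed values. `forecast_errors` is a hypothetical helper, and the three sample points are invented purely to show the calculation; they are not results from the experiment.

```python
import numpy as np
import pandas as pd


def forecast_errors(actual: pd.Series, predicted: pd.Series) -> dict:
    """Hypothetical helper: MAE and RMSE over the timestamps both series share."""
    aligned = pd.concat([actual, predicted], axis=1, keys=["actual", "predicted"]).dropna()
    diff = aligned["predicted"] - aligned["actual"]
    return {
        "mae": float(diff.abs().mean()),
        "rmse": float(np.sqrt((diff ** 2).mean())),
    }


# Invented sample values, only to demonstrate the calculation.
times = pd.date_range(start="2020-01-08", periods=3, freq="min")
actual = pd.Series([1.0, 2.0, 3.0], index=times)
predicted = pd.Series([1.0, 2.0, 5.0], index=times)

errors = forecast_errors(actual, predicted)
print(errors)
```

In practice `predicted` would be the Series returned by `xgboost_model_forecast`, and `actual` would be the values observed for the same timestamps once they become available.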
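References:
https://blog.csdn.net/kewei168/article/details/90375743: theory plus code
https://blog.csdn.net/guolindonggld/article/details/87826024: includes some code; splits the dataset at a specified split point
https://www.cnblogs.com/zongfa/p/9324684.html: detailed explanation of the theory, worth a close read
https://blog.csdn.net/ljzology/article/details/82154143: parameter explanations and model parameter tuning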
https://blog.csdn.net/qq_20412595/article/details/82621744