
Time Series Forecasting with XGBoost

Experimental data:

The data contains one point per minute, collected over 7 days (10,080 points in total).
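
The original dataset is not included here, so the following is only a sketch of a stand-in with the same shape: a DataFrame with a per-minute DatetimeIndex and a single `val` column covering 7 days. The function name `make_minute_series`, the start date, and the sine-plus-noise signal are all assumptions made for illustration; they are not the data used in the experiment.

```python
import numpy as np
import pandas as pd


def make_minute_series(start="2024-01-01", days=7, seed=0):
    """Build a synthetic per-minute series (hypothetical stand-in data)."""
    # One timestamp per minute for the requested number of days.
    index = pd.date_range(start=start, periods=days * 24 * 60, freq="min")

    # A daily sine cycle plus Gaussian noise; purely illustrative.
    minutes_of_day = (index.hour * 60 + index.minute).to_numpy()
    daily_cycle = 10 * np.sin(2 * np.pi * minutes_of_day / (24 * 60))
    noise = np.random.default_rng(seed).normal(scale=1.0, size=len(index))

    return pd.DataFrame({"val": 50 + daily_cycle + noise}, index=index)


data = make_minute_series()
print(data.shape)
```

A frame built this way has the `index, val` layout that `xgboost_model_forecast` expects as its `data` argument.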

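import datetime

import numpy as np
import pandas as pd
from xgboost import XGBRegressor


def xgboost_model_forecast(data, step):
    """
    Split the raw data into two parts, one for training and one for model
    evaluation (by default the most recent three days), then forecast the
    next `step` minutes.
    :param data: DataFrame with a DatetimeIndex and a 'val' column
    :param step: number of one-minute points to forecast
    :return: Series mapping each forecast timestamp to its predicted value
    """
    # Forecast timestamps start one minute after the last observation.
    latest_date = data.index[-1] + datetime.timedelta(minutes=1)
    forecast_times = pd.date_range(start=latest_date, periods=step, freq='min')

    # Hold out the most recent three days for evaluation.
    split_date = latest_date - datetime.timedelta(days=3)
    train_data = data.loc[data.index <= split_date].copy()
    evaluation_data = data.loc[data.index > split_date].copy()

    # Placeholder frame: only its index is used to build the forecast features.
    forecast_data = pd.DataFrame({'val': np.zeros(len(forecast_times))}, index=forecast_times)

    x_train, y_train = create_features(train_data, label='val')
    x_eval, y_eval = create_features(evaluation_data, label='val')
    x_forecast = create_features(forecast_data)

    # Recent xgboost releases take early_stopping_rounds in the constructor;
    # older releases accepted it as an argument to fit().
    reg = XGBRegressor(n_estimators=1000, early_stopping_rounds=50)

    # Early stopping monitors the last entry in eval_set (the held-out window).
    reg.fit(x_train, y_train, eval_set=[(x_train, y_train), (x_eval, y_eval)], verbose=False)

    # Predictions on the held-out window (not returned).
    evaluation_data['val'] = reg.predict(x_eval)
    forecast_data['val'] = reg.predict(x_forecast)

    data_forecasted = pd.Series(forecast_data['val'].to_numpy(), index=forecast_times)

    return data_forecasted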


def create_features(df, label=None):
    """
    Add calendar features derived from the DatetimeIndex of df (in place).
    Returns X, or (X, y) when a label column name is given.
    """
    df['date'] = df.index  # index: DatetimeIndex
    df['hour'] = df['date'].dt.hour  # dt: DatetimeProperties, hour: Series
    df['quarter'] = df['date'].dt.quarter
    df['minute'] = df['date'].dt.minute
    df['month'] = df['date'].dt.month
    df['year'] = df['date'].dt.year
    df['day_of_year'] = df['date'].dt.dayofyear
    df['day_of_month'] = df['date'].dt.day
    # Series.dt.weekofyear was removed in pandas 2.0; isocalendar().week replaces it.
    df['week_of_year'] = df['date'].dt.isocalendar().week.astype(int)

    X = df[['hour', 'quarter', 'minute', 'month', 'year', 'day_of_year', 'day_of_month', 'week_of_year']]
    if label:
        y = df[label]
        return X, y
    return X
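
The time-based split used in `xgboost_model_forecast` can be checked in isolation. The sketch below repeats the same split logic on a synthetic 7-day, per-minute index; the helper name `split_by_recent_days`, the `eval_days` parameter, and the start date are assumptions introduced for this example only.

```python
import datetime

import pandas as pd


def split_by_recent_days(data, eval_days=3):
    """Split a time-indexed frame, holding out the last eval_days days."""
    # One minute past the last observation, as in xgboost_model_forecast.
    latest_date = data.index[-1] + datetime.timedelta(minutes=1)
    split_date = latest_date - datetime.timedelta(days=eval_days)

    train = data.loc[data.index <= split_date]
    evaluation = data.loc[data.index > split_date]
    return train, evaluation


# Synthetic index: 7 days at one point per minute (illustrative values only).
index = pd.date_range("2024-01-01", periods=7 * 24 * 60, freq="min")
data = pd.DataFrame({"val": range(len(index))}, index=index)

train, evaluation = split_by_recent_days(data)
print(len(train), len(evaluation))  # 5761 4319
```

Because the comparison is `<= split_date`, the point that falls exactly on the split timestamp goes to the training set, so the training window holds four days plus one point and the evaluation window holds three days minus one point.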
           
[Figure: forecast results compared with the actual values]

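As the figure above shows, one week of historical data is used to forecast the following day, and the predicted values track the actual values closely.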
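
The comparison between forecast and actual values is made visually in the figure. One way to put a number on it, offered here only as a sketch, is to compute the mean absolute error (MAE) and root mean squared error (RMSE) over the forecast window. The helper `forecast_errors` is a hypothetical name, and the sample values are made up for illustration; they are not results from the experiment.

```python
import numpy as np


def forecast_errors(actual, predicted):
    """Return MAE and RMSE between two equally shaped value sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if actual.shape != predicted.shape:
        raise ValueError("actual and predicted must have the same shape")

    diff = predicted - actual
    return {
        "mae": float(np.mean(np.abs(diff))),
        "rmse": float(np.sqrt(np.mean(diff ** 2))),
    }


# Made-up values, only to show the call.
errors = forecast_errors([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(errors)
```

With real data, `actual` would be the observed values for the forecast day and `predicted` the values of the Series returned by `xgboost_model_forecast`.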

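References:

https://blog.csdn.net/kewei168/article/details/90375743 (theory plus code)

https://blog.csdn.net/guolindonggld/article/details/87826024 (includes partial code; splits the dataset at a specified split point)

https://www.cnblogs.com/zongfa/p/9324684.html (a detailed explanation of the theory, worth a close read)

https://blog.csdn.net/ljzology/article/details/82154143 (parameter explanations and model parameter tuning)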

https://blog.csdn.net/qq_20412595/article/details/82621744
