
Time Series Forecasting with XGBoost

Experimental data:

The data is sampled at one point per minute, and 7 days of data were collected.
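
The original dataset is not included here, so the following is only a minimal sketch of a stand-in with the same shape: one point per minute over 7 days, in a DataFrame with a DatetimeIndex and a single `val` column. The function name `make_sample_data`, the start date, and the sine-plus-noise signal are all assumptions made for illustration, not the author's data.

```python
import numpy as np
import pandas as pd


def make_sample_data(days=7, start="2020-01-01"):
    """Build a synthetic minute-level series (hypothetical stand-in data)."""
    # One timestamp per minute for the requested number of days.
    index = pd.date_range(start=start, periods=days * 24 * 60, freq="min")
    # Minute of the day (0..1439), used to create a repeating daily cycle.
    minutes_of_day = (index.hour * 60 + index.minute).to_numpy()
    daily_cycle = 10 * np.sin(2 * np.pi * minutes_of_day / (24 * 60))
    # Seeded noise so the sample is reproducible.
    noise = np.random.default_rng(0).normal(0, 1, len(index))
    return pd.DataFrame({"val": 50 + daily_cycle + noise}, index=index)


data = make_sample_data()
print(data.shape)
print(data.head())
```

A frame built this way has 7 * 24 * 60 = 10080 rows, which matches the input format the forecasting function below expects.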

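import datetime

import numpy as np
import pandas as pd
from xgboost import XGBRegressor


def xgboost_model_forecast(data, step):
    """
    Split the original data into two parts, one for training and one for
    model evaluation (the most recent three days by default), then forecast
    the next `step` points (one point per minute).
    :param data: DataFrame with a DatetimeIndex and a 'val' column
    :param step: number of one-minute points to forecast
    :return: Series of forecast timestamps and predicted values
    """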
    latest_date = data.index[-1] + datetime.timedelta(minutes=1)
    # 'min' is the minute alias; the older 'T' alias is deprecated in recent pandas.
    forecast_times = pd.date_range(start=latest_date, periods=step, freq='min')

    # Hold out the most recent three days for evaluation.
    split_date = latest_date - datetime.timedelta(days=3)
    train_data = data.loc[data.index <= split_date].copy()
    evaluation_data = data.loc[data.index > split_date].copy()

    forecast_data = pd.DataFrame({'val': np.zeros(len(forecast_times))}, index=forecast_times)

    x_train, y_train = create_features(train_data, label='val')
    x_eval, y_eval = create_features(evaluation_data, label='val')
    x_forecast = create_features(forecast_data)

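    # Early stopping monitors the last entry in eval_set (the evaluation data).
    # xgboost >= 1.6 accepts early_stopping_rounds in the constructor; newer
    # releases no longer accept it as an argument to fit().
    reg = XGBRegressor(n_estimators=1000, early_stopping_rounds=50)

    reg.fit(x_train, y_train,
            eval_set=[(x_train, y_train), (x_eval, y_eval)],
            verbose=False)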

    evaluation_data['val'] = reg.predict(x_eval)
    forecast_data['val'] = reg.predict(x_forecast)

    data_forecasted = pd.Series(forecast_data['val'].to_numpy(), index=forecast_times)

    return data_forecasted


def create_features(df, label=None):
    """Add calendar features derived from the DatetimeIndex (modifies df in place)."""
    df['date'] = df.index # index: DatetimeIndex
    df['hour'] = df['date'].dt.hour # dt: DatetimeProperties, hour: Series
    df['quarter'] = df['date'].dt.quarter
    df['minute'] = df['date'].dt.minute
    df['month'] = df['date'].dt.month
    df['year'] = df['date'].dt.year
    df['day_of_year'] = df['date'].dt.dayofyear
    df['day_of_month'] = df['date'].dt.day
    # Series.dt.weekofyear was removed in pandas 2.0; isocalendar() needs pandas >= 1.1.
    df['week_of_year'] = df['date'].dt.isocalendar().week.astype(int)

    X = df[['hour', 'quarter', 'minute', 'month', 'year', 'day_of_year', 'day_of_month', 'week_of_year']]
    if label:
        y = df[label]
        return X, y
    return X
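
The forecasting function splits the data by time rather than at random, so the model is always evaluated on points that come after everything it was trained on. The following is a minimal sketch of that split in isolation, assuming the same minute-level layout; the helper name `split_by_time`, the `eval_days` parameter, and the dummy values are introduced here for illustration only.

```python
import datetime

import pandas as pd


def split_by_time(data, eval_days=3):
    """Return (train, evaluation), holding out the most recent `eval_days` days."""
    # One minute past the last observation, i.e. where forecasting would start.
    latest_date = data.index[-1] + datetime.timedelta(minutes=1)
    split_date = latest_date - datetime.timedelta(days=eval_days)
    train = data.loc[data.index <= split_date]
    evaluation = data.loc[data.index > split_date]
    return train, evaluation


# Seven days of dummy minute-level data (values are placeholders).
index = pd.date_range("2020-01-01", periods=7 * 24 * 60, freq="min")
data = pd.DataFrame({"val": range(len(index))}, index=index)

train, evaluation = split_by_time(data)
print(len(train), len(evaluation))
print(train.index.max(), evaluation.index.min())
```

Because the boundary timestamp itself goes to the training side, the evaluation part is one point short of three full days.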
           
[Figure: time series forecasting with XGBoost]

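As the figure above shows, one week of historical data is used to forecast the following day, and the predicted values are quite close to the actual values.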
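
The closeness of forecast and actual values can also be expressed as a number instead of being judged by eye. The sketch below computes two common error measures, mean absolute error and root mean squared error, in plain Python. The post does not report any metrics, so the function names and the sample values here are purely illustrative and are not results from the experiment.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference; penalizes large misses more."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up values for illustration only.
actual = [10.0, 12.0, 11.0, 13.0]
predicted = [9.0, 12.0, 12.0, 15.0]

print("MAE:", mean_absolute_error(actual, predicted))
print("RMSE:", root_mean_squared_error(actual, predicted))
```

With real data, `actual` would be the observed values for the forecast period and `predicted` the values returned by the forecasting function.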

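References:

https://blog.csdn.net/kewei168/article/details/90375743, theory plus code

https://blog.csdn.net/guolindonggld/article/details/87826024, includes some code; splits the dataset at a specified split point

https://www.cnblogs.com/zongfa/p/9324684.html, a very detailed explanation of the theory; worth reading carefully

https://blog.csdn.net/ljzology/article/details/82154143, parameter explanations and model tuning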

https://blog.csdn.net/qq_20412595/article/details/82621744
