Time Series Forecasting with XGBoost

2023-03-19 10:29:38

Experimental data:

The data has one point per minute; seven days of data were collected.
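
For readers who want to run the code in this post without the original data, here is a minimal sketch of a stand-in dataset with the same shape: one point per minute over seven days, in a DataFrame with a DatetimeIndex and a single `val` column. The start date, the daily sine pattern, and the noise level are all assumptions made for illustration; they do not describe the actual experimental data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the experiment data (NOT the author's data):
# 7 days at one point per minute = 7 * 24 * 60 = 10080 rows.
rng = np.random.default_rng(0)
index = pd.date_range("2023-03-12 00:00", periods=7 * 24 * 60, freq="min")

# Assumed signal: a daily sine cycle plus Gaussian noise.
minutes_of_day = (index.hour * 60 + index.minute).to_numpy()
values = (
    10
    + 5 * np.sin(2 * np.pi * minutes_of_day / 1440)
    + rng.normal(0, 0.5, len(index))
)

# Same layout the forecasting function expects: DatetimeIndex + 'val' column.
data = pd.DataFrame({"val": values}, index=index)
```

A frame built this way can be passed as the `data` argument of the forecasting function shown in this post.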


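import numpy as np
import pandas as pd
from xgboost import XGBRegressor


def xgboost_model_forecast(data, step):
    """
    Split the raw data into two parts, one for training and one for model
    evaluation (by default the most recent three days), then forecast the
    next `step` points at one point per minute.
    :param data: DataFrame with a DatetimeIndex and a 'val' column
    :param step: number of one-minute points to forecast
    :return: Series of forecast values indexed by forecast time
    """
    latest_date = data.index[-1] + pd.Timedelta(minutes=1)
    forecast_times = pd.date_range(start=latest_date, periods=step, freq='min')

    # Hold out the most recent three days for evaluation.
    split_date = latest_date - pd.Timedelta(days=3)
    train_data = data.loc[data.index <= split_date].copy()
    evaluation_data = data.loc[data.index > split_date].copy()

    # Placeholder frame for the future timestamps; 'val' is filled in below.
    forecast_data = pd.DataFrame({'val': np.zeros(len(forecast_times))}, index=forecast_times)

    x_train, y_train = create_features(train_data, label='val')
    x_eval, y_eval = create_features(evaluation_data, label='val')
    x_forecast = create_features(forecast_data)

    # xgboost >= 1.6 takes early_stopping_rounds in the constructor;
    # passing it to fit() was removed in xgboost 2.0.
    reg = XGBRegressor(n_estimators=1000, early_stopping_rounds=50)

    # Early stopping monitors the last entry of eval_set, i.e. the held-out days.
    reg.fit(x_train, y_train, eval_set=[(x_train, y_train), (x_eval, y_eval)], verbose=False)

    # Predictions on the evaluation window (not returned; handy for plotting).
    evaluation_data['val'] = reg.predict(x_eval)
    forecast_data['val'] = reg.predict(x_forecast)

    data_forecasted = pd.Series(forecast_data['val'].to_numpy(), index=forecast_times)

    return data_forecasted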


def create_features(df, label=None):
    """Build calendar features from the DatetimeIndex; adds columns to df in place."""
    df['date'] = df.index  # index: DatetimeIndex
    df['hour'] = df['date'].dt.hour  # dt: DatetimeProperties, hour: Series
    df['quarter'] = df['date'].dt.quarter
    df['minute'] = df['date'].dt.minute
    df['month'] = df['date'].dt.month
    df['year'] = df['date'].dt.year
    df['day_of_year'] = df['date'].dt.dayofyear
    df['day_of_month'] = df['date'].dt.day
    # Series.dt.weekofyear was removed in pandas 2.0; isocalendar().week replaces it.
    df['week_of_year'] = df['date'].dt.isocalendar().week.astype(int)

    X = df[['hour', 'quarter', 'minute', 'month', 'year', 'day_of_year', 'day_of_month', 'week_of_year']]
    if label:
        y = df[label]
        return X, y
    return X
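
The forecasting function splits the series by time, not at random, so the model is always evaluated on data that comes after everything it was trained on. The sketch below isolates that step. `split_by_recent_days` is a hypothetical helper name, and the toy frame is made up for illustration. It measures the three days back from the last observed timestamp, a slight variant of the split point used in the function.

```python
import pandas as pd


def split_by_recent_days(data, eval_days=3):
    """Hypothetical helper: hold out the last `eval_days` days for evaluation."""
    split_date = data.index[-1] - pd.Timedelta(days=eval_days)
    train = data.loc[data.index <= split_date]
    evaluation = data.loc[data.index > split_date]
    return train, evaluation


# Toy data (an assumption): 7 days at one point per minute.
index = pd.date_range("2023-03-12 00:00", periods=7 * 24 * 60, freq="min")
data = pd.DataFrame({"val": range(len(index))}, index=index)

train, evaluation = split_by_recent_days(data)

# Every training timestamp precedes every evaluation timestamp.
print(len(train), len(evaluation))
print(train.index.max(), evaluation.index.min())
```

With seven days of per-minute data this leaves four days for training and three for evaluation.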

[Figure: forecast values versus actual values]

As the figure above shows, one week of historical data is used to forecast the following day, and the forecast tracks the actual values closely.
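
The visual comparison can be backed up with numbers. Below is a minimal sketch of two common error measures, mean absolute error (MAE) and root mean squared error (RMSE), computed with NumPy. `forecast_errors` is a hypothetical helper and the sample values are invented for illustration; they are not results from this experiment.

```python
import numpy as np


def forecast_errors(actual, predicted):
    """Hypothetical helper: return (MAE, RMSE) for two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residual = predicted - actual
    mae = np.mean(np.abs(residual))
    rmse = np.sqrt(np.mean(residual ** 2))
    return mae, rmse


# Invented sample values, only to show the call.
mae, rmse = forecast_errors([10.0, 12.0, 11.0], [11.0, 12.0, 9.0])
print(f"MAE={mae:.3f} RMSE={rmse:.3f}")
```

In practice `actual` would be the observed values for the forecast window and `predicted` the Series returned by the forecasting function.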

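References:

https://blog.csdn.net/kewei168/article/details/90375743 (theory plus code)

https://blog.csdn.net/guolindonggld/article/details/87826024 (includes some code; splits the dataset at a specified split point)

https://www.cnblogs.com/zongfa/p/9324684.html (a very detailed explanation of the theory; worth reading closely)

https://blog.csdn.net/ljzology/article/details/82154143 (explanation of the model parameters and how to tune them)

https://blog.csdn.net/qq_20412595/article/details/82621744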
