Python time series analysis by month: python – time series analysis with statsmodels

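I'd really like to see a data sample and a code snippet that reproduces your error.

Without them, my suggestion won't address your specific error message. It will, however, let you run a multiple regression analysis on a set of time series stored in a pandas dataframe. Assuming your time series hold continuous rather than categorical values, here is how to do it with pandas and statsmodels:

A dataframe with random values: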

# Imports
import pandas as pd
import numpy as np
import statsmodels.api as sm

# Reproducible sample data: 12 daily observations of y, x1, x2 and x3
np.random.seed(1)
rows = 12
listVars = ['y', 'x1', 'x2', 'x3']
rng = pd.date_range('1/1/2017', periods=rows, freq='D')
df = pd.DataFrame(np.random.randint(100, 150, size=(rows, len(listVars))), columns=listVars)
df = df.set_index(rng)
print(df)

Output – some data to work with:

              y   x1   x2   x3
2017-01-01  137  143  112  108
2017-01-02  109  111  105  115
2017-01-03  100  116  101  112
2017-01-04  107  145  106  125
2017-01-05  120  137  118  120
2017-01-06  111  142  128  129
2017-01-07  114  104  123  123
2017-01-08  141  149  130  132
2017-01-09  122  113  141  109
2017-01-10  107  122  101  100
2017-01-11  117  108  124  113
2017-01-12  147  142  108  130

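The function below lets you specify the source dataframe, a dependent variable y, and a selection of independent variables such as x1 and x2. Using statsmodels, some of the desired results are stored in a dataframe. There, R2 is a single value (formatted to four decimals), while the regression coefficients and p-values are lists, because the number of these estimates varies with the number of independent variables you include in the analysis.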

def LinReg(df, y, x, const):
    # Keep the list of regressor names; x is overwritten below when a constant is added
    betas = x.copy()

    # Model with or without a constant
    if const:
        x = sm.add_constant(df[x])
        model = sm.OLS(df[y], x).fit()
    else:
        model = sm.OLS(df[y], df[x]).fit()

    # Estimates of R2 and p
    res1 = {'Y': [y],
            'R2': [format(model.rsquared, '.4f')],
            'p': [model.pvalues.tolist()],
            'start': [df.index[0]],
            'stop': [df.index[-1]],
            'obs' : [df.shape[0]],
            'X': [betas]}
    df_res1 = pd.DataFrame(data = res1)

    # Regression coefficients, formatted to two decimals
    theParams = model.params[0:]
    coefs = theParams.to_frame()
    df_coefs = pd.DataFrame(coefs.T)
    xNames = list(df_coefs)
    xValues = list(df_coefs.loc[0].values)
    xValues2 = [ '%.2f' % elem for elem in xValues ]
    res2 = {'Independent': [xNames],
            'beta': [xValues2]}
    df_res2 = pd.DataFrame(data = res2)

    # All results, transposed into a single 'results' column
    df_res = pd.concat([df_res1, df_res2], axis = 1)
    df_res = df_res.T
    df_res.columns = ['results']
    return df_res

Here is a test run:

df_regression = LinReg(df = df, y = 'y', x = ['x1', 'x2'], const = True)
print(df_regression)

Output:

             results
R2           0.3650
X            [x1, x2]
Y            y
obs          12
p            [0.7417691742514285, 0.07989515781898897, 0.25...
start        2017-01-01 00:00:00
stop         2017-01-12 00:00:00
Independent  [const, x1, x2]
beta         [16.29, 0.47, 0.37]

Here is the whole thing for an easy copy-paste:

# Imports
import pandas as pd
import numpy as np
import statsmodels.api as sm

# Reproducible sample data: 12 daily observations of y, x1, x2 and x3
np.random.seed(1)
rows = 12
listVars = ['y', 'x1', 'x2', 'x3']
rng = pd.date_range('1/1/2017', periods=rows, freq='D')
df = pd.DataFrame(np.random.randint(100, 150, size=(rows, len(listVars))), columns=listVars)
df = df.set_index(rng)

def LinReg(df, y, x, const):
    # Keep the list of regressor names; x is overwritten below when a constant is added
    betas = x.copy()

    # Model with or without a constant
    if const:
        x = sm.add_constant(df[x])
        model = sm.OLS(df[y], x).fit()
    else:
        model = sm.OLS(df[y], df[x]).fit()

    # Estimates of R2 and p
    res1 = {'Y': [y],
            'R2': [format(model.rsquared, '.4f')],
            'p': [model.pvalues.tolist()],
            'start': [df.index[0]],
            'stop': [df.index[-1]],
            'obs' : [df.shape[0]],
            'X': [betas]}
    df_res1 = pd.DataFrame(data = res1)

    # Regression coefficients, formatted to two decimals
    theParams = model.params[0:]
    coefs = theParams.to_frame()
    df_coefs = pd.DataFrame(coefs.T)
    xNames = list(df_coefs)
    xValues = list(df_coefs.loc[0].values)
    xValues2 = [ '%.2f' % elem for elem in xValues ]
    res2 = {'Independent': [xNames],
            'beta': [xValues2]}
    df_res2 = pd.DataFrame(data = res2)

    # All results, transposed into a single 'results' column
    df_res = pd.concat([df_res1, df_res2], axis = 1)
    df_res = df_res.T
    df_res.columns = ['results']
    return df_res

df_regression = LinReg(df = df, y = 'y', x = ['x1', 'x2'], const = True)
print(df_regression)