
Python third-party modules: statistics

I. The statsmodels module

Official documentation: https://www.statsmodels.org/stable/user-guide.html and https://www.statsmodels.org/stable/api.html

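1. Overview

(1) Introduction:

For more features, see: https://zhuanlan.zhihu.com/p/91384305

statsmodels is a Python module for statistical analysis. It originated with Jonathan Taylor, a professor of statistics at Stanford University, and Skipper Seabold and Josef Perktold formally created the project in 2010. It contains many classical algorithms from statistics and econometrics, mainly:
①Regression models: linear regression, generalized linear models, robust linear models, linear mixed-effects models, etc.
②Analysis of variance (ANOVA)
③Time-series analysis and state-space models: AR, ARMA, ARIMA, VAR, etc.
④Generalized method of moments (GMM)
⑤Nonparametric methods: kernel density estimation, kernel regression
⑥Visualization of statistical model results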
           

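(2) Relationship to other modules:

①patsy: inspired by R's formula system, Nathaniel Smith created the patsy project, which provides the formula/model specification framework used by statsmodels
②scikit-learn: statsmodels focuses more on statistical inference, whereas sklearn focuses more on prediction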
           

(3) Installation:

pip install statsmodels
           

(4) Comparison of the different import paths:

See: https://www.statsmodels.org/stable/api-structure.html#import-paths-and-structure

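2. Cross-sectional study: the array-based interface

#Notes:
①This interface is recommended for interactive use
②The classes/functions are actually defined elsewhere; sm merely re-exports them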
           

(1) Import:

#Conventionally imported as sm:
import statsmodels.api as sm
           

(2) Regression:

Ordinary Least Squares (OLS): class sm.OLS(<endog>,<exog>[,missing='none',hasconst=None,**kwargs])
  #Defined as class statsmodels.regression.linear_model.OLS
  #Parameters:
    endog: the y values of the data points; a 1-D array-like
    exog: the x values of the data points; an n×k array-like, where n=len(<endog>) and k is the number of features
    missing: how to handle missing values; one of "none" (do not check for NaN), "drop" (drop the affected records), "raise" (raise an error)
    hasconst: whether the right-hand side includes a user-supplied constant; None or bool. If True, a constant is not checked for, k_constant is set to 1, and all result statistics are calculated as if a constant is present. If False, a constant is not checked for and k_constant is set to 0
    kwargs: extra arguments passed in when the formula interface is used

######################################################################################################################

Generalized Least Squares (GLS): class sm.GLS(<endog>,<exog>[,sigma=None,missing='none',hasconst=None,**kwargs])
  #Defined as class statsmodels.regression.linear_model.GLS
  #Parameters: the other parameters are the same as for sm.OLS
    sigma: the weighting matrix of the covariance; None, a scalar, or an array
      #The default is None, for no scaling
      #If sigma is a scalar, it is treated as an n x n diagonal matrix with that scalar as every diagonal element
      #If sigma is an n-length vector, it is treated as a diagonal matrix with the given values on the diagonal
      #This should be the same as WLS

######################################################################################################################

Generalized Least Squares with AR covariance structure (GLSAR): class sm.GLSAR(<endog>,<exog>[,rho=1,missing='none',hasconst=None,**kwargs])
  #Defined as class statsmodels.regression.linear_model.GLSAR

######################################################################################################################

Weighted Least Squares (WLS): class sm.WLS(<endog>,<exog>[,weights=1.0,missing='none',hasconst=None,**kwargs])
  #Defined as class statsmodels.regression.linear_model.WLS
  #Parameters: the other parameters are the same as for sm.OLS
    weights: the weights; a scalar or a 1-D array-like

######################################################################################################################

Recursive Least Squares: class sm.RecursiveLS(<endog>,<exog>[,constraints=None,**kwargs])
  #Defined as class statsmodels.regression.recursive_ls.RecursiveLS

######################################################################################################################

Rolling Ordinary Least Squares: class sm.RollingOLS(<endog>,<exog>[,window=None,min_nobs=None,missing='drop',expanding=False])
  #Defined as class statsmodels.regression.rolling.RollingOLS

######################################################################################################################

Rolling Weighted Least Squares: class sm.RollingWLS(<endog>,<exog>[,window=None,weights=None,min_nobs=None,missing='drop',expanding=False])
  #Defined as class statsmodels.regression.rolling.RollingWLS
           

(3) Handling missing values (Imputation):

Bayesian imputation using a Gaussian model: class sm.BayesGaussMI(<data>[,mean_prior=None,cov_prior=None,cov_prior_df=1])
  #Defined as class statsmodels.imputation.bayes_mi.BayesGaussMI
Generalized linear mixed model with Bayesian estimation (binomial): class sm.BinomialBayesMixedGLM(<endog>,<exog>,<exog_vc>,<ident>[,vcp_p=1,fe_p=2,fep_names=None,vcp_names=None,vc_names=None])
  #Defined as class statsmodels.genmod.bayes_mixed_glm.BinomialBayesMixedGLM
Factor analysis: class sm.Factor([endog=None,n_factor=1,corr=None,method='pa',smc=True,endog_names=None,nobs=None,missing='drop'])
  #Defined as class statsmodels.multivariate.factor.Factor
Multiple imputation (MI) based on a specified imputer: class sm.MI(<imp>,<model>[,model_args_fn=None,model_kwds_fn=None,formula=None,fit_args=None,fit_kwds=None,xfunc=None,burn=100,nrep=20,skip=10])
  #Defined as class statsmodels.imputation.bayes_mi.MI
Multiple imputation with chained equations (MICE): class sm.MICE(<model_formula>,<model_class>,<data>[,n_skip=3,init_kwds=None,fit_kwds=None])
  #Defined as class statsmodels.imputation.mice.MICE
Wrap a data set so that sm.MICE can handle its missing values: class sm.MICEData(<data>[,perturbation_method='gaussian',k_pmm=20,history_callback=None])
  #Defined as class statsmodels.imputation.mice.MICEData
           

(4) Generalized Estimating Equations (GEE):

Marginal regression model using GEE: class sm.GEE(<endog>,<exog>,<groups>[,time=None,family=None,cov_struct=None,missing='none',offset=None,exposure=None,dep_data=None,constraint=None,update_dep=True,weights=None,**kwargs])
  #Defined as class statsmodels.genmod.generalized_estimating_equations.GEE
Nominal response marginal regression model using GEE: class sm.NominalGEE(<endog>,<exog>,<groups>[,time=None,family=None,cov_struct=None,missing='none',offset=None,dep_data=None,constraint=None,**kwargs])
  #Defined as class statsmodels.genmod.generalized_estimating_equations.NominalGEE
Ordinal response marginal regression model using GEE: class sm.OrdinalGEE(<endog>,<exog>,<groups>[,time=None,family=None,cov_struct=None,missing='none',offset=None,dep_data=None,constraint=None,**kwargs])
  #Defined as class statsmodels.genmod.generalized_estimating_equations.OrdinalGEE
           

(5) Generalized Linear Models (GLM):

Generalized linear model (GLM): class sm.GLM(<endog>,<exog>[,family=None,offset=None,exposure=None,freq_weights=None,var_weights=None,missing='none',**kwargs])
  #Defined as class statsmodels.genmod.generalized_linear_model.GLM
Generalized additive model (GAM): class sm.GLMGam(<endog>,<exog>[,smoother=None,alpha=0,family=None,offset=None,exposure=None,missing='none',**kwargs])
  #Defined as class statsmodels.gam.generalized_additive_model.GLMGam
Generalized linear mixed model with Bayesian estimation (Poisson): class sm.PoissonBayesMixedGLM(<endog>,<exog>,<exog_vc>,<ident>[,vcp_p=1,fe_p=2,fep_names=None,vcp_names=None,vc_names=None])
  #Defined as class statsmodels.genmod.bayes_mixed_glm.PoissonBayesMixedGLM
           

(6) Discrete and Count Models:

Generalized Poisson model: class sm.GeneralizedPoisson(<endog>,<exog>[,p=1,offset=None,exposure=None,missing='none',check_rank=True,**kwargs])
  #Defined as class statsmodels.discrete.discrete_model.GeneralizedPoisson
Logit model: class sm.Logit(<endog>,<exog>[,check_rank=True,**kwargs])
  #Defined as class statsmodels.discrete.discrete_model.Logit
Multinomial logit model: class sm.MNLogit(<endog>,<exog>[,check_rank=True,**kwargs])
  #Defined as class statsmodels.discrete.discrete_model.MNLogit
Poisson model: class sm.Poisson(<endog>,<exog>[,offset=None,exposure=None,missing='none',check_rank=True,**kwargs])
  #Defined as class statsmodels.discrete.discrete_model.Poisson
Probit model: class sm.Probit(<endog>,<exog>[,check_rank=True,**kwargs])
  #Defined as class statsmodels.discrete.discrete_model.Probit
Negative binomial model: class sm.NegativeBinomial(<endog>,<exog>[,loglike_method='nb2',offset=None,exposure=None,missing='none',check_rank=True,**kwargs])
  #Defined as class statsmodels.discrete.discrete_model.NegativeBinomial
Generalized negative binomial (NB-P) model: class sm.NegativeBinomialP(<endog>,<exog>[,p=2,offset=None,exposure=None,missing='none',check_rank=True,**kwargs])
  #Defined as class statsmodels.discrete.discrete_model.NegativeBinomialP
Zero-inflated generalized Poisson model: class sm.ZeroInflatedGeneralizedPoisson(<endog>,<exog>[,exog_infl=None,offset=None,exposure=None,inflation='logit',p=2,missing='none',**kwargs])
  #Defined as class statsmodels.discrete.count_model.ZeroInflatedGeneralizedPoisson
Zero-inflated generalized negative binomial model: class sm.ZeroInflatedNegativeBinomialP(<endog>,<exog>[,exog_infl=None,offset=None,exposure=None,inflation='logit',p=2,missing='none',**kwargs])
  #Defined as class statsmodels.discrete.count_model.ZeroInflatedNegativeBinomialP
Zero-inflated Poisson model: class sm.ZeroInflatedPoisson(<endog>,<exog>[,exog_infl=None,offset=None,exposure=None,inflation='logit',missing='none',**kwargs])
  #Defined as class statsmodels.discrete.count_model.ZeroInflatedPoisson
           

(7) Multivariate Models:

Multivariate analysis of variance (MANOVA): class sm.MANOVA(<endog>,<exog>[,missing='none',hasconst=None,**kwargs])
  #Defined as class statsmodels.multivariate.manova.MANOVA
Principal component analysis (PCA): class sm.PCA(<data>[,ncomp=None,standardize=True,demean=True,normalize=True,gls=False,weights=None,method='svd',missing=None,tol=5e-08,max_iter=1000,tol_em=5e-08,max_em_iter=100])
  #Defined as class statsmodels.multivariate.pca.PCA
           

(8) Other Models (Misc Models):

Linear mixed-effects model: class sm.MixedLM(<endog>,<exog>,<groups>[,exog_re=None,exog_vc=None,use_sqrt=True,missing='none',**kwargs])
  #Defined as class statsmodels.regression.mixed_linear_model.MixedLM
Cox proportional hazards regression model: class sm.PHReg(<endog>,<exog>[,status=None,entry=None,strata=None,offset=None,ties='breslow',missing='drop',**kwargs])
  #Defined as class statsmodels.duration.hazard_regression.PHReg
Quantile regression: class sm.QuantReg(<endog>,<exog>[,**kwargs])
  #Defined as class statsmodels.regression.quantile_regression.QuantReg
Robust linear model: class sm.RLM(<endog>,<exog>[,M=None,missing='none',**kwargs])
  #Defined as class statsmodels.robust.robust_linear_model.RLM
Estimation and inference for a survival function: class sm.SurvfuncRight(<time>,<status>[,entry=None,title=None,freq_weights=None,exog=None,bw_factor=1.0])
  #Defined as class statsmodels.duration.survfunc.SurvfuncRight
           

(9) Graphics:

Q-Q and P-P probability plots: class sm.ProbPlot(<data>[,dist=<scipy.stats._continuous_distns.norm_gen object>,fit=False,distargs=(),a=0,loc=0,scale=1])
  #Defined as class statsmodels.graphics.gofplots.ProbPlot
Plot a reference line for a qqplot: sm.qqline(<ax>,<line>[,x=None,y=None,dist=None,fmt='r-',**lineoptions])
  #Defined as statsmodels.graphics.gofplots.qqline()
Q-Q plot of the quantiles of x versus the quantiles/ppf of a distribution: sm.qqplot(<data>[,dist=<scipy.stats._continuous_distns.norm_gen object>,distargs=(),a=0,loc=0,scale=1,fit=False,line=None,ax=None,**plotkwargs])
  #Defined as statsmodels.graphics.gofplots.qqplot()
Q-Q plot of two samples' quantiles: sm.qqplot_2samples(<data1>,<data2>[,xlabel=None,ylabel=None,line=None,ax=None])
  #Defined as statsmodels.graphics.gofplots.qqplot_2samples()
           

(10) Tools:

Run the test suite: sm.test([extra_args=None,exit=False])
  #Defined as statsmodels.__init__.test()
Add a column of ones to an array: sm.add_constant(<data>[,prepend=True,has_constant='skip'])
  #Defined as statsmodels.tools.tools.add_constant()
Load a previously saved object: sm.load_pickle(<fname>)
  #Defined as statsmodels.iolib.smpickle.load_pickle()
List the versions of statsmodels and any installed dependencies: sm.show_versions([show_dirs=True])
  #Defined as statsmodels.tools.print_version.show_versions()
Open a browser and display the online documentation: sm.webdoc([func=None,stable=None])
  #Defined as statsmodels.tools.web.webdoc()
           

3. Time-series study

(1) Import:

#Conventionally imported as tsa:
import statsmodels.tsa.api as tsa
           

(2) Statistics and Tests:

Autocorrelation function: tsa.acf(<x>[,adjusted=False,nlags=None,qstat=False,fft=None,alpha=None,missing='none'])
  #Defined as statsmodels.tsa.stattools.acf()
Estimate the autocovariance: tsa.acovf(<x>[,adjusted=False,demean=True,fft=None,missing='none',nlag=None])
  #Defined as statsmodels.tsa.stattools.acovf()
Augmented Dickey-Fuller unit root test: tsa.adfuller(<x>[,maxlag=None,regression='c',autolag='AIC',store=False,regresults=False])
  #Defined as statsmodels.tsa.stattools.adfuller()
BDS test statistic for independence of a time series: tsa.bds(<x>[,max_dim=2,epsilon=None,distance=1.5])
  #Defined as statsmodels.tsa.stattools.bds()
Cross-correlation function: tsa.ccf(<x>,<y>[,adjusted=True])
  #Defined as statsmodels.tsa.stattools.ccf()
Cross-covariance between two series: tsa.ccovf(<x>,<y>[,adjusted=True,demean=True])
  #Defined as statsmodels.tsa.stattools.ccovf()
Test for no-cointegration of a univariate equation: tsa.coint(<y0>,<y1>[,trend='c',method='aeg',maxlag=None,autolag='aic',return_results=None])
  #Defined as statsmodels.tsa.stattools.coint()
Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test: tsa.kpss(<x>[,regression='c',nlags=None,store=False])
  #Defined as statsmodels.tsa.stattools.kpss()
Partial autocorrelation estimate: tsa.pacf(<x>[,nlags=None,method='ywadjusted',alpha=None])
  #Defined as statsmodels.tsa.stattools.pacf()
Partial autocorrelation estimated with OLS: tsa.pacf_ols(<x>[,nlags=None,efficient=True,adjusted=False])
  #Defined as statsmodels.tsa.stattools.pacf_ols()
Partial autocorrelation estimated with non-recursive Yule-Walker: tsa.pacf_yw(<x>[,nlags=None,method='adjusted'])
  #Defined as statsmodels.tsa.stattools.pacf_yw()
Ljung-Box Q statistic: tsa.q_stat(<x>,<nobs>[,type=None])
  #Defined as statsmodels.tsa.stattools.q_stat()
           

(3) Univariate Time-Series Analysis:

Autoregressive (AR) model: class tsa.AutoReg(<endog>,<lags>[,trend='c',seasonal=False,exog=None,hold_back=None,period=None,missing='none',deterministic=None,old_names=None])
  #Defined as class statsmodels.tsa.ar_model.AutoReg
Autoregressive integrated moving average (ARIMA) model: class tsa.ARIMA(<endog>[,exog=None,order=(0,0,0),seasonal_order=(0,0,0,0),trend=None,enforce_stationarity=True,enforce_invertibility=True,concentrate_scale=False,trend_offset=1,dates=None,freq=None,missing='none',validate_specification=True])
  #Defined as class statsmodels.tsa.arima.model.ARIMA
Seasonal autoregressive integrated moving average model with exogenous regressors (SARIMAX): class tsa.SARIMAX(<endog>[,exog=None,order=(0,0,0),seasonal_order=(0,0,0,0),trend=None,measurement_error=False,time_varying_regression=False,mle_regression=True,simple_differencing=False,enforce_stationarity=True,enforce_invertibility=True,hamilton_representation=False,concentrate_scale=False,trend_offset=1,use_exact_diffuse=False,dates=None,freq=None,missing='none',validate_specification=True,**kwargs])
  #Defined as class statsmodels.tsa.statespace.sarimax.SARIMAX
Compute information criteria for many ARMA models: tsa.arma_order_select_ic(<y>[,max_ar=4,max_ma=2,ic='bic',trend='c',model_kw=None,fit_kw=None])
  #Defined as statsmodels.tsa.stattools.arma_order_select_ic()
Simulate data from an ARMA process: tsa.arma_generate_sample(<ar>,<ma>,<nsample>[,scale=1,distrvs=None,axis=0,burnin=0])
  #Defined as statsmodels.tsa.arima_process.arma_generate_sample()
Theoretical properties of an ARMA process for specified lag polynomials: class tsa.ArmaProcess([ar=None,ma=None,nobs=100])
  #Defined as class statsmodels.tsa.arima_process.ArmaProcess
           

(4) Exponential Smoothing:

Holt-Winters exponential smoothing: class tsa.ExponentialSmoothing(<endog>[,trend=None,damped_trend=False,seasonal=None,seasonal_periods=None,initialization_method=None,initial_level=None,initial_trend=None,initial_seasonal=None,use_boxcox=None,bounds=None,dates=None,freq=None,missing='none'])
  #Defined as class statsmodels.tsa.holtwinters.ExponentialSmoothing
Holt's exponential smoothing: class tsa.Holt(<endog>[,exponential=False,damped_trend=False,initialization_method=None,initial_level=None,initial_trend=None])
  #Defined as class statsmodels.tsa.holtwinters.Holt
Simple exponential smoothing: class tsa.SimpleExpSmoothing(<endog>[,initialization_method=None,initial_level=None])
  #Defined as class statsmodels.tsa.holtwinters.SimpleExpSmoothing
Linear exponential smoothing models (state-space form): class tsa.statespace.ExponentialSmoothing(<endog>[,trend=False,damped_trend=False,seasonal=None,initialization_method='estimated',initial_level=None,initial_trend=None,initial_seasonal=None,bounds=None,concentrate_scale=True,dates=None,freq=None,missing='none'])
  #Defined as class statsmodels.tsa.statespace.exponential_smoothing.ExponentialSmoothing
  #Note: the top-level name tsa.ExponentialSmoothing refers to the Holt-Winters class listed first
ETS models: class tsa.ETSModel(<endog>[,error='add',trend=None,damped_trend=False,seasonal=None,seasonal_periods=None,initialization_method='estimated',initial_level=None,initial_trend=None,initial_seasonal=None,bounds=None,dates=None,freq=None,missing='none'])
  #Defined as class statsmodels.tsa.exponential_smoothing.ets.ETSModel
           

(5) Multivariate Time Series Models:

Dynamic factor model: class tsa.DynamicFactor(<endog>,<k_factors>,<factor_order>[,exog=None,error_order=0,error_var=False,error_cov_type='diagonal',enforce_stationarity=True,**kwargs])
  #Defined as class statsmodels.tsa.statespace.dynamic_factor.DynamicFactor
Dynamic factor model estimated with the Expectation-Maximization (EM) algorithm: class tsa.DynamicFactorMQ(<endog>[,k_endog_monthly=None,factors=1,factor_orders=1,factor_multiplicities=None,idiosyncratic_ar1=True,standardize=True,endog_quarterly=None,init_t0=False,obs_cov_diag=False,**kwargs])
  #Defined as class statsmodels.tsa.statespace.dynamic_factor_mq.DynamicFactorMQ
Fit a VAR(p) process and select the lag order: class tsa.VAR(<endog>[,exog=None,dates=None,freq=None,missing='none'])
  #Defined as class statsmodels.tsa.vector_ar.var_model.VAR
Vector autoregressive moving average model with exogenous regressors (VARMAX): class tsa.VARMAX(<endog>[,exog=None,order=(1,0),trend='c',error_cov_type='unstructured',measurement_error=False,enforce_stationarity=True,enforce_invertibility=True,trend_offset=1,**kwargs])
  #Defined as class statsmodels.tsa.statespace.varmax.VARMAX
Fit a VAR process and estimate the structural components A and B: class tsa.SVAR(<endog>,<svar_type>[,dates=None,freq=None,A=None,B=None,missing='none'])
  #Defined as class statsmodels.tsa.vector_ar.svar_model.SVAR
Vector error correction model (VECM): class tsa.VECM(<endog>[,exog=None,exog_coint=None,dates=None,freq=None,missing='none',k_ar_diff=1,coint_rank=1,deterministic='nc',seasons=0,first_season=0])
  #Defined as class statsmodels.tsa.vector_ar.vecm.VECM
Univariate unobserved components time-series model: class tsa.UnobservedComponents(<endog>[,...,exog=None,irregular=False,stochastic_level=False,stochastic_trend=False,stochastic_seasonal=True,stochastic_freq_seasonal=None,stochastic_cycle=False,damped_cycle=False,cycle_period_bounds=None,mle_regression=True,use_exact_diffuse=False,**kwargs])
  #Defined as class statsmodels.tsa.statespace.structural.UnobservedComponents
           

(6) Filters and Decompositions:

Seasonal decomposition using moving averages: tsa.seasonal_decompose(<x>[,model='additive',filt=None,period=None,two_sided=True,extrapolate_trend=0])
  #Defined as statsmodels.tsa.seasonal.seasonal_decompose()
Season-trend decomposition using LOESS (STL): class tsa.STL(<endog>[,period=None,seasonal=7,trend=None,low_pass=None,seasonal_deg=0,trend_deg=0,low_pass_deg=0,robust=False,seasonal_jump=1,trend_jump=1,low_pass_jump=1])
  #Defined as class statsmodels.tsa.seasonal.STL
Baxter-King bandpass filter: tsa.bkfilter(<x>[,low=6,high=32,K=12])
  #Defined as statsmodels.tsa.filters.bk_filter.bkfilter()
Christiano-Fitzgerald asymmetric, random walk filter: tsa.cffilter(<x>[,low=6,high=32,drift=True])
  #Defined as statsmodels.tsa.filters.cf_filter.cffilter()
Hodrick-Prescott filter: tsa.hpfilter(<x>[,lamb=1600])
  #Defined as statsmodels.tsa.filters.hp_filter.hpfilter()
           

(7) Markov Regime Switching Models:

Markov switching autoregression model: class tsa.MarkovAutoregression(<endog>,<k_regimes>,<order>[,trend='c',exog=None,exog_tvtp=None,switching_ar=True,switching_trend=True,switching_exog=False,switching_variance=False,dates=None,freq=None,missing='none'])
  #Defined as class statsmodels.tsa.regime_switching.markov_autoregression.MarkovAutoregression
First-order k-regime Markov switching regression model: class tsa.MarkovRegression(<endog>,<k_regimes>[,trend='c',exog=None,order=0,exog_tvtp=None,switching_trend=True,switching_exog=True,switching_variance=False,dates=None,freq=None,missing='none'])
  #Defined as class statsmodels.tsa.regime_switching.markov_regression.MarkovRegression
           

(8) Forecasting:

Model-based forecasting using STL to remove seasonality: class tsa.STLForecast(<endog>,<model>[,model_kwargs=None,period=None,seasonal=7,trend=None,low_pass=None,seasonal_deg=1,trend_deg=1,low_pass_deg=1,robust=False,seasonal_jump=1,trend_jump=1,low_pass_jump=1])
  #Defined as class statsmodels.tsa.forecasting.stl.STLForecast
The Theta forecasting model of Assimakopoulos and Nikolopoulos (2000): class tsa.ThetaModel(<endog>[,period=None,deseasonalize=True,use_test=True,method='auto',difference=False])
  #Defined as class statsmodels.tsa.forecasting.theta.ThetaModel
           

(9) Time-Series Tools:

Return an array with lags included, given an array: tsa.add_lag(<x>[,col=None,lags=1,drop=False,insert=True])
  #Defined as statsmodels.tsa.tsatools.add_lag()
Add a trend and/or constant to an array: tsa.add_trend(<x>[,trend='c',prepend=False,has_constant='skip'])
  #Defined as statsmodels.tsa.tsatools.add_trend()
Detrend an array with a trend of given order along axis 0 or 1: tsa.detrend(<x>[,order=1,axis=0])
  #Defined as statsmodels.tsa.tsatools.detrend()
Create a 2-D array of lags: tsa.lagmat(<x>,<maxlag>[,trim='forward',original='ex',use_pandas=False])
  #Defined as statsmodels.tsa.tsatools.lagmat()
Generate a lag matrix for a 2-D array, with columns arranged by variable: tsa.lagmat2ds(<x>,<maxlag0>[,maxlagex=None,dropex=0,trim='forward',use_pandas=False])
  #Defined as statsmodels.tsa.tsatools.lagmat2ds()
Container class for deterministic terms: class tsa.DeterministicProcess(<index>[,period=None,constant=False,order=0,seasonal=False,fourier=0,additional_terms=(),drop=False])
  #Defined as class statsmodels.tsa.deterministic.DeterministicProcess
           

(10) X12/X13 Interface:

Perform x13-arima analysis for monthly or quarterly data: tsa.x13_arima_analysis(<endog>[,maxorder=(2,1),maxdiff=(2,1),diff=None,exog=None,log=None,outlier=True,trading=False,forecast_periods=None,retspec=False,speconly=False,start=None,freq=None,print_stdout=False,x12path=None,prefer_x13=True])
  #Defined as statsmodels.tsa.x13.x13_arima_analysis()
Perform automatic seasonal ARIMA order identification using x12/x13 ARIMA: tsa.x13_arima_select_order(<endog>[,maxorder=(2,1),maxdiff=(2,1),diff=None,exog=None,log=None,outlier=True,trading=False,forecast_periods=None,start=None,freq=None,print_stdout=False,x12path=None,prefer_x13=True])
  #Defined as statsmodels.tsa.x13.x13_arima_select_order()
           

4. The formula-based interface (Formula Interface)

A convenience interface for specifying models using formula strings and DataFrames. This API directly exposes the from_formula class method of models that support the formula API
           

(1) Import:

#Conventionally imported as smf:
import statsmodels.formula.api as smf
           

(2) Models:
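
As a minimal sketch of the formula interface, the lowercase smf.ols takes a formula string and a DataFrame instead of endog/exog arrays. The DataFrame and its column names are made up for illustration.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative, noise-free data: y = 1 + 2*x
df = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0, 4.0],
                   "y": [1.0, 3.0, 5.0, 7.0, 9.0]})

# The formula adds an intercept automatically (no add_constant needed)
formula_results = smf.ols("y ~ x", data=df).fit()
print(formula_results.params)  # indexed by "Intercept" and "x"
```

Unlike the array interface, the intercept is part of the formula; writing "y ~ x - 1" would remove it.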

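II. The patsy module

Official documentation: https://pypi.org/project/patsy/

1. Overview

(1) Introduction:

patsy is a Python library for describing statistical models (especially linear models, or models that have a linear component) and building design matrices. It is inspired by and compatible with the formula mini-language used in R and S, and brings the convenience of R "formulas" to Python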
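
To make "building design matrices" concrete, here is a small sketch using patsy's dmatrices on a made-up DataFrame with one numeric and one categorical column. The column names are hypothetical.

```python
import pandas as pd
from patsy import dmatrices

# Illustrative data: numeric predictor x and categorical predictor g
df = pd.DataFrame({"y": [1.0, 2.0, 3.0, 4.0],
                   "x": [0.5, 1.5, 2.5, 3.5],
                   "g": ["a", "b", "a", "b"]})

# The left side of "~" becomes the response, the right side the design matrix
y, X = dmatrices("y ~ x + g", data=df)

# The string column g is dummy-coded and an intercept column is added
print(X.design_info.column_names)
```

The resulting matrices can be passed straight to array-based model classes such as sm.OLS.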
           

(2) Installation:

pip install patsy
           

2. Usage