Three Types of Centering (Mean Centering)

Source: http://blog.sina.com.cn/s/blog_6d078e2b0101f9bn.html

Centering

HLM offers the option to use predictors as they are, or to use them after grand-mean or group-mean centering.

The choice of centering method is dictated by the question studied, and great care should be taken to select a form of centering appropriate to the model considered, because the interpretation of the coefficients in the model depends on the type of centering used.

A number of options concerning the location of predictors can be considered. In some cases, a proper choice of location will be required in order to ensure numerical stability in estimating hierarchical linear models. Predictors can, for example, be transformed to deviations from the grand mean or to deviations from the group means. In the case of Model 2, the firm mean of the log of the assets over the years for which measurements are available can be used.

However, centering in multilevel models can have different and unexpected results, depending on the way in which the variables are centered.

Two main advantages of centering the predictors are:

1. Obtaining estimates of the intercept and other effects that are easier to interpret, so that the statistical results can be related to the theoretical concerns that motivate the research.

2. Removing high correlations between the random intercept and slopes, and high correlations between first- and second-level variables and cross-level interactions (for a detailed discussion of this aspect, see Kreft and De Leeuw (1998), pp. 135 to 137).

In fact, as my previous post showed, centering can also be used to deal with multicollinearity.

Model 3: Grand mean centering

In the grand mean centered model, the explanatory variable(s) are centered around the overall mean. Continuing with the financial data example given in model 2, the log of assets (LNASTS) used in model 2 is replaced with deviations from $\overline{\mathrm{LNASTS}}_{..}$, which represents the grand mean of all the log of assets values, irrespective of firm.

In this model, the intercept $\beta_{0j}$ is the expected value of $\mathrm{ROA}_{ij}$ given $\mathrm{LNASTS}_{ij} = \overline{\mathrm{LNASTS}}_{..}$, that is, the expected value of a measurement i from firm j with log of assets equal to the grand mean of all the log(assets) values. Each $\beta_{0j}$ is interpreted as the mean outcome for the j-th level-2 unit adjusted for differences in LNASTS within this level-2 unit.

The variance of $\beta_{0j}$, written $\tau_{00}$, has a different interpretation, too. It is now the variance between level-2 units in the adjusted means.
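For reference, the grand mean centered model described here can be written out in full. This is only a sketch: it assumes a two-level model with LNASTS as the single level-1 predictor and with a random intercept and a random slope. The actual Model 3 may contain further terms, and the symbols follow common HLM notation rather than equations shown in the source.

```latex
\begin{aligned}
\text{Level 1:}\quad & \mathrm{ROA}_{ij} = \beta_{0j}
  + \beta_{1j}\,\bigl(\mathrm{LNASTS}_{ij} - \overline{\mathrm{LNASTS}}_{..}\bigr) + r_{ij},
  \qquad r_{ij} \sim N(0, \sigma^2) \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + u_{0j}, \qquad
  \beta_{1j} = \gamma_{10} + u_{1j} \\
& \operatorname{Var}(u_{0j}) = \tau_{00}, \quad
  \operatorname{Var}(u_{1j}) = \tau_{11}, \quad
  \operatorname{Cov}(u_{0j}, u_{1j}) = \tau_{01}
\end{aligned}
```

Setting $\mathrm{LNASTS}_{ij}$ equal to the grand mean makes the slope term vanish, which is why $\beta_{0j}$ can be read as the expected outcome at the grand mean.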

It can be shown that the two models considered thus far, the raw score model and the grand mean centered model, are equivalent linear models (see Kreft et al. (1995) and Kreft and De Leeuw (1998), pp. 106 to 114). Equivalent models will not have the same parameter estimates, but the estimates from one can be translated into the estimates from the other. They will have the same fit, the same predicted values and the same residuals.
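The idea of equivalent models can be checked on a small scale. The sketch below uses ordinary single-level least squares as a stand-in for the multilevel fit, with made-up numbers: `x` plays the role of LNASTS and `y` the role of ROA, and the helper `ols` is hypothetical. It shows only the fixed-effects analogue of the equivalence, not the behaviour of the variance components.

```python
from statistics import mean

def ols(x, y):
    """Simple least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical data, invented for illustration only.
x = [4.0, 5.0, 6.5, 7.0, 8.5, 9.0]
y = [9.0, 8.0, 7.5, 6.0, 5.5, 4.0]

grand_mean = mean(x)
x_centered = [xi - grand_mean for xi in x]

a_raw, b_raw = ols(x, y)           # raw score model
a_cen, b_cen = ols(x_centered, y)  # grand mean centered model

fitted_raw = [a_raw + b_raw * xi for xi in x]
fitted_cen = [a_cen + b_cen * xi for xi in x_centered]

print("slopes:", b_raw, b_cen)
print("intercepts:", a_raw, a_cen)
# The centered intercept is the raw intercept moved to the grand mean.
print("translated raw intercept:", a_raw + b_raw * grand_mean)
```

The slope and the fitted values agree between the two fits; only the intercept changes, and it can be translated from one parameterisation to the other.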

Model 4: Group mean centering

In the group mean centered model, the explanatory variable(s) are centered around the group mean, in this case the firm mean. In this model, $\overline{\mathrm{LNASTS}}_{.j}$ represents the group mean of all the log of assets values from firm j.

In this model, the between-group variation of the log of assets (LNASTS) is no longer estimated separately. As a result, the level-2 intercept relationships simply represent the group (firm) level relationship between the level-2 predictor and the outcome variable, as can be seen from the interpretation of the intercept given above.

It can also be shown that the group mean centered model described here is not linearly equivalent to either the raw score or the grand mean centered model. The two special cases in which the group centered model is equivalent to the other models are described in Kreft and De Leeuw (1998), p. 109. In all other cases, the lack of linear equivalence implies that the fit of the group centered model, as well as its estimates, differs from that of the other models.

Comparison of centered models

The results for the raw score, grand mean centered and group mean centered models are given in the table below. From the table, we see that the estimate of the fixed intercept $\gamma_{00}$ in the case of the raw score model is 13.3978. Recall that this is the expected return on assets for a measurement with the log of assets, LNASTS, equal to 0. The level-2 residual $u_{0j}$ associated with the intercept has a variance of 179.7382, which is the estimated variation in intercepts over the firms.

For the grand mean centered model, $\gamma_{00}$ is estimated at 5.0709. This is the adjusted mean return on assets. $\tau_{00}$ is estimated at 14.3438, and represents the variance among the groups in the adjusted means.

In the case of the group mean centered model, $\gamma_{00}$ is the average of the return on assets across the population of firms. Here $\tau_{00}$ is estimated as 20.0190, where $u_{0j}$ is the unique increment to the intercept associated with firm j.

In the case of the LNASTS slope $\gamma_{10}$, the estimates obtained from the raw score and grand mean centered models are essentially the same. The estimate from the group centered model, however, is smaller.

In the grand mean centered model, $\gamma_{10}$ is the pooled-within regression coefficient for LNASTS.

In the raw score model, $\gamma_{10}$ is the main effect for the LNASTS slope.

In the group mean centered model, however, $\gamma_{10}$ is the average LNASTS-ROA regression slope across firms.

Turning to the variance components, we find that $\sigma^2$ and $\tau_{11}$ were the same for the raw score and grand mean centered models, but different for the group mean centered model. In the case of the group mean centered model, the variance at level-1, $\sigma^2$, is now the residual variance after controlling for the level-1 predictor within each group. Because group-mean centering individual measurements leads to smaller differences between the original measurements and the group mean than would be obtained when differences from the grand mean are calculated, a reduced level-1 variance component is not unexpected. In the table below, a small illustration of the effect of group mean and grand mean centering is given for 2 hypothetical level-2 units.
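The effect of the two kinds of centering on the size of the deviations can be sketched with two hypothetical level-2 units. The firm names and values below are invented for illustration and are not the figures from the original table.

```python
from statistics import mean

# Hypothetical predictor values for two level-2 units (e.g. two firms).
units = {
    "firm_A": [2.0, 3.0, 4.0],
    "firm_B": [8.0, 9.0, 10.0],
}

all_values = [v for values in units.values() for v in values]
grand_mean = mean(all_values)

grand_ss = 0.0  # sum of squared deviations from the grand mean
group_ss = 0.0  # sum of squared deviations from each unit's own mean
for name, values in units.items():
    group_mean = mean(values)
    for v in values:
        grand_dev = v - grand_mean
        group_dev = v - group_mean
        grand_ss += grand_dev ** 2
        group_ss += group_dev ** 2
        print(f"{name}: x={v:4.1f}  grand-centered={grand_dev:+5.1f}  "
              f"group-centered={group_dev:+5.1f}")

print("sum of squared grand-mean deviations:", grand_ss)
print("sum of squared group-mean deviations:", group_ss)
```

With these numbers the group-centered deviations never exceed 1 in absolute value, while the grand-centered deviations range from 2 to 4, because group-mean centering removes the difference between the two unit means from the predictor.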

As centering reduces the correlation between, in this example, the intercept and the LNASTS slope at level-2, the covariance $\tau_{01}$ decreases dramatically.

As is clear from this illustration, the type of centering used has different consequences. A researcher should consider these options carefully, and refrain from centering simply to improve statistical stability. If group-mean centering is used, it is advisable to add the group means for the centered predictor to the model as well, as illustrated above, in order to avoid estimating an uncorrected between-groups effect.
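The advice to add the group means back into the model can be illustrated with a small sketch. It uses plain least squares rather than a multilevel fit, and the data are made up without noise so that the within-firm slope (2) and the between-firm slope (-1) are known in advance. The helper `ols2` and all variable names are hypothetical.

```python
from statistics import mean

def ols2(x1, x2, y):
    """Least squares for y = a + b1*x1 + b2*x2 via the 2x2 normal equations."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2

# Hypothetical firms; x plays the role of LNASTS.
firms = {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0], "C": [7.0, 8.0, 9.0]}

x_raw, x_within, x_between, y = [], [], [], []
for values in firms.values():
    firm_mean = mean(values)
    for v in values:
        x_raw.append(v)
        x_within.append(v - firm_mean)  # group-mean centered predictor
        x_between.append(firm_mean)     # firm mean, added back as a predictor
        # Invented outcome: within-firm slope 2, between-firm slope -1, no noise.
        y.append(10.0 + 2.0 * (v - firm_mean) - 1.0 * firm_mean)

intercept, b_within, b_between = ols2(x_within, x_between, y)

# For comparison: the slope from the raw predictor alone blends both effects.
mx, my = mean(x_raw), mean(y)
b_raw = (sum((a - mx) * (c - my) for a, c in zip(x_raw, y))
         / sum((a - mx) ** 2 for a in x_raw))

print("within-firm slope:", b_within)
print("between-firm slope:", b_between)
print("raw-score slope:", b_raw)
```

With the firm mean in the model, the within-firm and between-firm relationships are estimated separately, whereas the single raw-score slope is a weighted mixture of the two.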

Further reading:

Bryk, A.S. & Raudenbush, S.W., Hierarchical Linear Models, Sage, (1992): General centering information is given on pages 25 to 28 of this text. A group mean centered model is discussed in detail in Chapter 2, while a grand mean centered example can be found in Chapter 4.

Kreft, I. & De Leeuw, J., Introducing Multilevel Modeling, Sage, (1998): A detailed discussion of the consequences of centering, with examples of the different options using one data set, can be found on pages 106 to 114.

Snijders, T. & Bosker, R., Sage, (2000): pp. 80-81 and pp. 52-54 of this text cover the topic of centering and provide a simple, yet clear example of the use of a level-1 predictor and its mean as covariate at level-2 of the hierarchy.

Hofmann, D.A. & Gavin, Mark B., Centering Decisions in Hierarchical Linear Models: Implications for Research in Organizations, Journal of Management, 1998, Vol 24, No 5, pages 623-641.

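1. What is multicollinearity?

Multicollinearity refers to exact or high correlation among the explanatory variables of a linear regression model, which distorts the model estimates or makes them hard to estimate accurately. In general, limitations of economic data lead to a poorly designed model, so that the explanatory variables in the design matrix are widely correlated with one another.

In classical linear regression analysis we assume that the matrix of explanatory variables X has full rank, that is, that there is no exact linear relationship among the explanatory variables. This guarantees that $(X'X)^{-1}$ exists and therefore that ordinary least squares is feasible.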
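The role of the full rank assumption can be seen in a two-column example. The sketch below is illustrative only: the helper `gram_det` and the data are hypothetical, and it computes the determinant of $X'X$ for a design matrix with two columns and no intercept. A determinant of zero means $X'X$ cannot be inverted, so the least-squares solution is not unique.

```python
def gram_det(col1, col2):
    """Determinant of X'X for a two-column design matrix X = [col1, col2]."""
    s11 = sum(a * a for a in col1)
    s22 = sum(b * b for b in col2)
    s12 = sum(a * b for a, b in zip(col1, col2))
    return s11 * s22 - s12 * s12

x1 = [1.0, 2.0, 3.0, 4.0]
x2_exact = [2.0 * v for v in x1]   # exact linear dependence: x2 = 2 * x1
x2_other = [1.0, -1.0, 1.0, -1.0]  # not a multiple of x1

det_exact = gram_det(x1, x2_exact)  # zero: X is rank deficient
det_other = gram_det(x1, x2_other)  # non-zero: X has full column rank

print("det with exact collinearity:", det_exact)
print("det without collinearity:", det_other)
```

High but not exact correlation gives a determinant that is small relative to the scale of the data rather than zero, which is the situation in which estimates become unstable.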

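2. Solutions

Multicollinearity was introduced by R. Frisch in 1934. Most of the research on it was carried out in the 1960s and 1970s, but the problem has still not been completely solved. The methods most commonly used in the Chinese literature to handle severe collinearity are ridge regression (RR), principal component regression (PCR), stepwise regression, partial least squares (PLS) and the group method of data handling (GMDH).

When the independent variables are categorical and you need to determine which of the many factors affecting the dependent variable play a major role and which play a minor role, or to decide which of several alternatives is best, UNIANOVA can be used. Examples include comparing the promotional effect of different types of advertising, analyzing which of several machine operating methods raises labor efficiency the most, and analyzing which of the many factors affecting product quality, output or sales have a significant effect. When the independent variables are interval-scale variables and you need to study the regularities among the variables in order to predict or control the results of production or scientific experiments, an exact relationship among the variables has to be obtained, and Linear can be chosen. Examples include the relationship between fertilizer application and yield in agriculture, between household savings and household income, and between construction cost and the quantity of work in an engineering project.

3. How to handle multicollinearity in hierarchical linear models

HLM handles multicollinearity fairly conveniently, mainly through group-mean centering and grand-mean centering.

4. Why raise this question?

Mainly because, when we look for functions f(x) to fit a nonlinear relationship, we often find that the VIF among the resulting f(x) terms exceeds 10, and HLM is very sensitive to this.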
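A common instance of this problem is a quadratic term. The sketch below is a single-level illustration with an invented predictor, not an HLM run. It assumes exactly two predictors, in which case the VIF of either one is $1/(1 - r^2)$ with $r$ their Pearson correlation. The helpers `pearson` and `vif_two_predictors` are hypothetical names.

```python
from statistics import mean

def pearson(a, b):
    """Pearson correlation between two equally long sequences."""
    ma, mb = mean(a), mean(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / (saa * sbb) ** 0.5

def vif_two_predictors(a, b):
    """VIF when the model contains exactly two predictors: 1 / (1 - r^2)."""
    r = pearson(a, b)
    return 1.0 / (1.0 - r * r)

# Hypothetical predictor taking the values 1..10.
x = [float(v) for v in range(1, 11)]

# Raw scores: x and x^2 are strongly correlated.
vif_raw = vif_two_predictors(x, [v ** 2 for v in x])

# Grand-mean center x first, then square the centered values.
x_mean = mean(x)
xc = [v - x_mean for v in x]
vif_centered = vif_two_predictors(xc, [v ** 2 for v in xc])

print("VIF with raw x and x^2:", vif_raw)
print("VIF with centered x and its square:", vif_centered)
```

For these values the raw VIF is about 19.9, above the usual threshold of 10, while centering before squaring brings it down to 1 because the centered values are symmetric around zero. With asymmetric data the reduction is usually substantial but not complete.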
