
Three Types of Mean Centering

Source link: http://blog.sina.com.cn/s/blog_6d078e2b0101f9bn.html

Centering

HLM offers the option to use predictors as they are, or to use them after grand-mean or group-mean centering.

The choice of centering method is dictated by the research question, and great care should be taken to select a form of centering appropriate to the model considered, because the interpretation of the coefficients in the model depends on the type of centering used.

A number of options concerning the location of predictors can be considered. In some cases, a proper choice of location will be required in order to ensure numerical stability in estimating hierarchical linear models. Predictors can, for example, be transformed to deviations from the grand mean or to deviations from the group means. In the case of Model 2, the firm mean of the log of the assets over the years for which measurements are available can be used.

However, centering in multilevel models can have different and unexpected results, depending on the way in which the variables are centered.

Two main advantages of centering the predictors are:

1. Obtaining estimates of the intercept $\beta_{0j}$ and other effects that are easier to interpret, so that the statistical results can be related to the theoretical concerns that motivate the research.

2. Removing high correlations between the random intercept and slopes, and high correlations between first- and second-level variables and cross-level interactions (for a detailed discussion of this aspect, see Kreft and De Leeuw (1998), pp. 135 to 137).

In fact, as my previous post showed, centering can also be used to deal with multicollinearity.

Model 3: Grand mean centering

In the grand mean centered model, the explanatory variable(s) are centered around the overall mean. Continuing with the financial data example given in model 2, the log of assets (LNASTS) used in model 2 is replaced with deviations from $\overline{\text{LNASTS}}_{..}$, which represents the grand mean of all the log of assets values, irrespective of firm.
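The model equation that accompanied this paragraph did not survive in the source. As a sketch only, assuming the standard two-level notation with outcome ROA, a random intercept and a random LNASTS slope (the level-2 equations in particular are an assumption, not taken from the original), the grand mean centered model could be written as:

$$
\begin{aligned}
\text{ROA}_{ij} &= \beta_{0j} + \beta_{1j}\,(\text{LNASTS}_{ij} - \overline{\text{LNASTS}}_{..}) + r_{ij} \\
\beta_{0j} &= \gamma_{00} + u_{0j} \\
\beta_{1j} &= \gamma_{10} + u_{1j}
\end{aligned}
$$

Here $i$ indexes measurements and $j$ indexes firms.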

In this model, the intercept $\beta_{0j}$ is the expected value of $\text{ROA}_{ij}$ given $\text{LNASTS}_{ij} = \overline{\text{LNASTS}}_{..}$, that is, the expected value of measurement i from firm j with log of assets equal to the grand mean of all the log(assets) values. Each $\beta_{0j}$ is interpreted as the mean outcome for the j-th level-2 unit adjusted for differences in LNASTS within this level-2 unit.
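The centering step itself is simple arithmetic. The following is a minimal sketch in plain Python; the firm names and values are invented for illustration and are not the data behind the article's results.

```python
from statistics import mean

# Hypothetical log-asset measurements keyed by firm (illustrative values only).
lnasts_by_firm = {
    "firm_a": [2.0, 3.0, 4.0],
    "firm_b": [6.0, 7.0, 8.0],
}

def grand_mean_center(groups):
    """Subtract the mean of all observations, irrespective of group."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    centered = {
        name: [x - grand_mean for x in values]
        for name, values in groups.items()
    }
    return centered, grand_mean

centered, grand_mean = grand_mean_center(lnasts_by_firm)
print(grand_mean)  # 5.0
print(centered)    # firm_a lies below zero, firm_b above zero
```

After grand mean centering, a value of zero corresponds to an observation at the overall mean, and differences between firms are still visible in the centered values.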

The variance of the level-2 residual $u_{0j}$ has a different interpretation, too. It is now the variance between level-2 units in the adjusted means.

It can be shown that the two models considered thus far are equivalent linear models (see Kreft et al. (1995) and Kreft and De Leeuw (1998) pages 106 to 114). Equivalent models will not have the same parameter estimates, but estimates from one can be translated into the estimates from another. They will have the same fit, same predicted values and the same residuals.
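The idea of equivalent models can be illustrated with a much simpler case than a multilevel model. The sketch below uses an ordinary single-level least-squares line (an assumption made to keep the example self-contained; it is not the HLM estimator) on invented data: the raw and centered fits have different intercepts, but the intercepts translate into each other and the fitted values are identical.

```python
from statistics import mean

def ols_fit(x, y):
    """Return (intercept, slope) of a simple least-squares line."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Invented data for illustration.
x = [1.0, 2.0, 4.0, 5.0, 8.0]
y = [3.0, 4.5, 7.0, 9.5, 15.0]

x_bar = mean(x)
x_centered = [xi - x_bar for xi in x]

a_raw, b_raw = ols_fit(x, y)
a_cen, b_cen = ols_fit(x_centered, y)

fitted_raw = [a_raw + b_raw * xi for xi in x]
fitted_cen = [a_cen + b_cen * xi for xi in x_centered]

print("slopes:", b_raw, b_cen)                      # same slope
print("intercepts:", a_raw, a_cen)                  # different intercepts
print("translated:", a_raw + b_raw * x_bar, a_cen)  # intercepts map onto each other
```

The centered intercept equals the raw intercept plus the slope times the mean of x, which is the kind of translation between parameter estimates described above.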

Model 4: Group mean centering

In the group mean centered model, the explanatory variable(s) are centered around the group mean, in this case the firm mean. In the model given below, $\overline{\text{LNASTS}}_{.j}$ represents the group mean of all the log of assets values from firm j.
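The original model equation is missing from the source. As a sketch only, assuming the standard two-level notation and a level-1 equation in which LNASTS is centered at the firm mean (any level-2 predictors the original model contained are not recoverable and are not shown), the level-1 part could be written as:

$$
\text{ROA}_{ij} = \beta_{0j} + \beta_{1j}\,(\text{LNASTS}_{ij} - \overline{\text{LNASTS}}_{.j}) + r_{ij}
$$

With this coding, $\beta_{0j}$ is the expected outcome for a measurement at firm j's own mean log of assets.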

In this model, the between-group variation of the log of assets (LNASTS) is no longer estimated separately. As a result, the level-2 intercept relationships simply represent the group (firm) level relationship between the level-2 predictor and the outcome variable, as can be seen from the interpretation given above.
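A minimal sketch of group mean centering in plain Python shows why the between-group variation disappears from the centered predictor. The firm names and values are invented for illustration.

```python
from statistics import mean

# Hypothetical log-asset measurements keyed by firm (illustrative values only).
lnasts_by_firm = {
    "firm_a": [2.0, 3.0, 4.0],
    "firm_b": [6.0, 7.0, 8.0],
}

def group_mean_center(groups):
    """Subtract each group's own mean from its observations."""
    group_means = {name: mean(values) for name, values in groups.items()}
    centered = {
        name: [x - group_means[name] for x in values]
        for name, values in groups.items()
    }
    return centered, group_means

centered, group_means = group_mean_center(lnasts_by_firm)

# After centering, every firm has mean zero, so the centered predictor
# carries no information about differences between firms.
centered_means = {name: mean(values) for name, values in centered.items()}

print(group_means)
print(centered)
print(centered_means)
```

In this toy data the two firms have very different asset levels, yet their centered values are identical, so only within-firm variation remains.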

It can also be shown that the group-mean centered model described here is not linearly equivalent to either the raw score or the grand mean centered model. The two special cases in which the group centered model is equivalent to the other models are described in Kreft & De Leeuw (1998), p. 109. In all other cases the lack of linear equivalence implies that both the fit and the estimates of the group centered model differ from those of the other two models.

Comparison of centered models

The results for the raw score, grand mean centered and group mean centered models are given in the table below. From the table, we see that the estimate of $\gamma_{00}$ in the case of the raw score model is 13.3978. Recall that this is the expected return on assets for a measurement with the log of assets, LNASTS, equal to 0. The level-2 residual $u_{0j}$ associated with the intercept $\beta_{0j}$ has a variance of 179.7382, which is the estimated variation in intercepts over the firms.

For the grand mean centered model, $\gamma_{00}$ is estimated at 5.0709. This is the adjusted mean return on assets. The intercept variance $\tau_{00}$ is estimated at 14.3438, and represents the variance among the groups in the adjusted means.

In the case of the group mean centered model, $\gamma_{00}$ is the average of the return on assets across the population of firms. Here $\text{var}(u_{0j})$ is estimated as 20.0190, with $u_{0j}$ the unique increment to the intercept associated with firm j.

In the case of the slope $\gamma_{10}$, the estimates obtained from the raw score and grand mean centered models are essentially the same. The estimate from the group centered model, however, is smaller.

In the grand mean centered model, $\gamma_{10}$ is the pooled-within regression coefficient for LNASTS.

In the raw score model, $\gamma_{10}$ is the main effect for the LNASTS slope.

In the group mean centered model, however, $\gamma_{10}$ is the average LNASTS-ROA regression slope across firms.

Turning to the variance components, we find that the level-1 variance $\sigma^2$ and the slope variance $\tau_{11}$ were the same for the raw score and grand mean centered models, but different for the group mean centered model. In the case of the group mean centered model, the level-1 variance of $r_{ij}$ is now the residual variance after controlling for the level-1 predictor within each group. Because group-mean centering individual measurements leads to smaller differences between the original measurements and the group mean than would be obtained when differences from the grand mean are calculated, a reduced level-1 variance component is not unexpected. In the table below, a small illustration of the effect of group mean and grand mean centering is given for 2 hypothetical level-2 units.
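The original illustration table is not available in the source. The following sketch rebuilds the idea with two invented level-2 units (the values are assumptions, not the original table): it lists each measurement with its deviation from the grand mean and from its own unit mean, and compares the sums of squared deviations.

```python
from statistics import mean

# Two hypothetical level-2 units (values invented for illustration).
units = {
    "unit_1": [1.0, 2.0, 3.0],
    "unit_2": [7.0, 8.0, 9.0],
}

all_values = [x for values in units.values() for x in values]
grand_mean = mean(all_values)

# One row per measurement: (unit, raw, grand-centered, group-centered).
rows = []
for name, values in units.items():
    unit_mean = mean(values)
    for x in values:
        rows.append((name, x, x - grand_mean, x - unit_mean))

ss_grand = sum(row[2] ** 2 for row in rows)
ss_group = sum(row[3] ** 2 for row in rows)

print("unit     raw  grand-centered  group-centered")
for name, x, dev_grand, dev_group in rows:
    print(f"{name}  {x:4.1f}  {dev_grand:14.1f}  {dev_group:14.1f}")
print("sum of squared deviations:", ss_grand, ss_group)  # 58.0 4.0
```

The deviations from the unit means are much smaller than the deviations from the grand mean. This is only arithmetic on the predictor and not a variance estimate from a fitted model, but it shows the kind of shrinkage the paragraph describes.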

As centering reduces the correlation between, in this example, the intercept and the LNASTS slope at level-2, the intercept-slope covariance $\tau_{01}$ decreases dramatically.
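Why the intercept-related variance components change can be sketched with a short derivation. It assumes the standard two-level random-slope notation (not shown in the surviving text) and covers only centering at a constant $c$, such as the grand mean; it does not apply to group mean centering, where the subtracted value varies by group. Rewriting the raw score model $Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij}$ in terms of $X_{ij} - c$ gives:

$$
\begin{aligned}
Y_{ij} &= (\beta_{0j} + c\,\beta_{1j}) + \beta_{1j}\,(X_{ij} - c) + r_{ij} \\
u_{0j}^{*} &= u_{0j} + c\,u_{1j} \\
\tau_{00}^{*} &= \tau_{00} + 2c\,\tau_{01} + c^{2}\,\tau_{11} \\
\tau_{01}^{*} &= \tau_{01} + c\,\tau_{11} \\
\tau_{11}^{*} &= \tau_{11}
\end{aligned}
$$

The starred quantities belong to the centered model. The slope variance and the level-1 variance are untouched, while the intercept variance and the intercept-slope covariance depend on where the predictor is centered.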

As is clear from this illustration, the type of centering used has different consequences. A researcher should consider these options carefully, and refrain from centering simply to improve statistical stability. If group-mean centering is used, it is advisable to add the group means for the centered predictor to the model as well, as illustrated above, in order to avoid estimating an uncorrected between-groups effect.
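The reason for adding the group means back can be sketched with single-level least squares on invented data (an assumption for illustration; this is not the HLM estimator and not the article's data). The data are built so that the relationship within firms is negative while the relationship between firm means is positive.

```python
from statistics import mean

def slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

# Invented data: x falls with y inside each firm, rises with y across firms.
data = {
    "firm_a": {"x": [1.0, 2.0, 3.0], "y": [10.0, 9.0, 8.0]},
    "firm_b": {"x": [5.0, 6.0, 7.0], "y": [14.0, 13.0, 12.0]},
}

x_raw, x_within, x_between, y_all = [], [], [], []
for firm in data.values():
    firm_mean = mean(firm["x"])
    for xi, yi in zip(firm["x"], firm["y"]):
        x_raw.append(xi)
        x_within.append(xi - firm_mean)  # group-mean centered predictor
        x_between.append(firm_mean)      # group mean, repeated per row
        y_all.append(yi)

# The centered predictor sums to zero within every firm, so it is
# orthogonal to the firm means and to the constant. The two slopes can
# therefore be computed separately and still match a joint regression.
within_slope = slope(x_within, y_all)
between_slope = slope(x_between, y_all)
raw_slope = slope(x_raw, y_all)

print(within_slope, between_slope, raw_slope)
```

The within-firm slope is -1 and the between-firm slope is +1, while the slope on the raw predictor is a blend of the two. Using only the centered predictor would discard the between-firm relationship, which is why the group means are added to the model.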

Further reading:

Bryk, A.S. & Raudenbush, S.W., Hierarchical Linear Models, Sage, (1992): General centering information is given on pages 25 to 28 of this text. A group mean centered model is discussed in detail in Chapter 2, while a grand mean centered example can be found in Chapter 4.

Kreft, I. & De Leeuw, J., Introducing Multilevel Modeling, Sage, (1998): A detailed discussion of the consequences of centering, with examples of the different options using one data set, can be found on pages 106 to 114.

Snijders, T. & Bosker, R., Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling, Sage, (2000): pp. 80-81 and pp. 52-54 of this text cover the topic of centering and provide a simple, yet clear example of the use of a level-1 predictor and its mean as covariate at level-2 of the hierarchy.

Hofmann, D.A. & Gavin, Mark B., Centering Decisions in Hierarchical Linear Models: Implications for Research in Organizations, Journal of Management, 1998, Vol 24, No 5, pages 623-641.

1. What is multicollinearity?

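Multicollinearity refers to the situation in which the explanatory variables of a linear regression model are exactly or highly correlated with one another, so that the model estimates are distorted or hard to obtain accurately. Typically, limitations in economic data lead to a poorly designed model, so that the explanatory variables in the design matrix are broadly correlated with each other.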

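In classical linear regression analysis we assume that the matrix of explanatory variables X has full rank, that is, that there is no exact linear relationship among the explanatory variables. This guarantees that $(X'X)^{-1}$ exists and that ordinary least squares is feasible.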
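The full-rank assumption can be illustrated with a tiny example. The sketch below uses invented two-column data and plain Python: when one column is an exact multiple of the other, the determinant of X'X is zero, so X'X cannot be inverted and the least-squares solution is not unique.

```python
def gram_matrix(columns):
    """Return X'X for a design matrix given as a list of columns."""
    return [
        [sum(a * b for a, b in zip(ci, cj)) for cj in columns]
        for ci in columns
    ]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Invented predictors for illustration.
x1 = [1.0, 2.0, 3.0, 4.0]
x2_dependent = [2.0 * v for v in x1]  # exact linear function of x1
x2_independent = [1.0, 0.0, 2.0, 1.0]  # not a multiple of x1

det_dependent = det2(gram_matrix([x1, x2_dependent]))
det_independent = det2(gram_matrix([x1, x2_independent]))

print(det_dependent)    # 0.0, so X'X is singular
print(det_independent)  # 59.0, so X'X can be inverted
```

High but imperfect correlation gives a determinant close to zero rather than exactly zero, which is why estimates become unstable instead of impossible.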

2. Remedies

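The concept of multicollinearity was introduced by R. Frisch in 1934. Most of the research on it was carried out in the 1960s and 1970s, but the problem has still not been completely solved. The methods most commonly used in the Chinese literature for handling severe collinearity are ridge regression (RR), principal component regression (PCR), stepwise regression, partial least squares (PLS), and the group method of data handling (GMDH).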

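When the independent variables are categorical and the goal is to determine which of the many factors affecting the dependent variable play a major role and which play a minor role, or which of several alternatives is best, UNIANOVA can be used. Examples include comparing the promotional effect of different types of advertising, analyzing which of several machine operating methods raises labor efficiency the most, and analyzing which of the many factors affecting product quality, output, or sales volume have a significant effect. When the independent variables are interval-scale variables and the relationship between variables has to be studied in order to predict or control the outcome of production or of a scientific experiment, an explicit equation relating the variables must be obtained, and Linear can be used. Examples include the relationship between fertilizer application and yield in agriculture, the relationship between household savings and household income, and the relationship between construction cost and quantity of work in engineering projects.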

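3. How to handle multicollinearity in hierarchical linear models

HLM makes it fairly convenient to handle multicollinearity, mainly through group mean centering and grand mean centering.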

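4. Why raise this question?

Mainly because, when we search for functions f(x) to fit a nonlinear relationship, the VIF among the f(x) terms we find often exceeds 10, and HLM is very sensitive to this.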
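As a rough illustration of this situation, assume the f(x) terms are x and x squared (an assumption; the post does not say which functions were used). With exactly two predictors the VIF of each is 1 / (1 - r^2), where r is their correlation, so it can be computed in plain Python before and after centering x.

```python
from statistics import mean

def correlation(a, b):
    """Pearson correlation between two equal-length sequences."""
    a_bar, b_bar = mean(a), mean(b)
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / (saa * sbb) ** 0.5

def vif_two_predictors(a, b):
    """VIF shared by both predictors when the model has exactly two."""
    r = correlation(a, b)
    return 1.0 / (1.0 - r ** 2)

# Invented predictor values 1..10 for illustration.
x = [float(v) for v in range(1, 11)]
x_sq = [v ** 2 for v in x]

x_mean = mean(x)
x_c = [v - x_mean for v in x]   # centered predictor
x_c_sq = [v ** 2 for v in x_c]  # square of the centered predictor

vif_raw = vif_two_predictors(x, x_sq)
vif_centered = vif_two_predictors(x_c, x_c_sq)

print(vif_raw)       # well above the usual cutoff of 10
print(vif_centered)  # close to 1
```

Because these x values are symmetric around their mean, centering removes the correlation between the linear and squared terms almost entirely. With skewed data the reduction is usually smaller, so this shows the mechanism rather than a guaranteed outcome.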
