Assumptions and Role of Mixed-Effects Models

It was Eisenhart (1947) who realized that there were actually two fundamentally different sorts of categorical explanatory variables: he called these fixed effects and random effects. It will take a good deal of practice before you are confident in deciding whether a particular categorical explanatory variable should be treated as a fixed effect or a random effect, but in essence:

fixed effects influence only the mean of y;

random effects influence only the variance of y.

The important point is that because the random effects come from a large population, there is not much point in concentrating on estimating means of our small subset of factor levels, and no point at all in comparing individual pairs of means for different factor levels. Much better to recognize them for what they are, random samples from a much larger population, and to concentrate on their variance. This is the added variation caused by differences between the levels of the random effects.
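To make the idea of "added variation" concrete, here is a minimal sketch in plain Python of the classical ANOVA (method-of-moments) estimator for a balanced design with one random effect. It is not how lme4 or nlme fit models (they use likelihood-based methods). The function name `variance_components` and the example data are hypothetical, chosen only for illustration.

```python
from statistics import mean


def variance_components(groups):
    """ANOVA (method-of-moments) estimates for a balanced one-way random-effects layout.

    groups: list of equally sized lists of measurements, one list per sampled
    level of the random effect.
    Returns (between_var, within_var, pct_between).
    """
    k = len(groups)      # number of factor levels sampled
    n = len(groups[0])   # replicates per level
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes a balanced design")

    grand = mean(x for g in groups for x in g)
    level_means = [mean(g) for g in groups]

    # Mean squares from the one-way ANOVA table
    ms_between = n * sum((m - grand) ** 2 for m in level_means) / (k - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, level_means) for x in g
    ) / (k * (n - 1))

    within_var = ms_within
    # E[MS_between] = sigma^2 + n * sigma_b^2, so solve for sigma_b^2;
    # a negative estimate is truncated at zero.
    between_var = max((ms_between - ms_within) / n, 0.0)

    total = between_var + within_var
    pct_between = 100.0 * between_var / total if total > 0 else 0.0
    return between_var, within_var, pct_between


# Hypothetical data: three randomly sampled levels, two replicates each
example = [[10, 12], [20, 22], [30, 32]]
between, within, pct = variance_components(example)
print(f"between={between:.2f} within={within:.2f} share={pct:.1f}%")
```

The percentage returned is the share of the total variation attributable to differences between levels of the random effect. For unbalanced or nested data the simple mean-square algebra no longer applies, which is one reason to use a mixed-effects fitting routine instead.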

Variance components analysis is all about estimating the size of this variance, and working out its percentage contribution to the overall variation.

There are five fundamental assumptions of linear mixed-effects models:

Within-group errors are independent with mean zero and variance σ².

Within-group errors are independent of the random effects.

The random effects are normally distributed with mean zero and covariance matrix Ψ.

The random effects are independent in different groups.

The covariance matrix does not depend on the group.
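The five assumptions can be written compactly in the standard notation for a linear mixed-effects model. This is a sketch under stated assumptions: the symbols X_i, Z_i, β, b_i and M are introduced here for illustration, Ψ denotes the covariance matrix of the random effects, and normality of the within-group errors is the usual additional assumption rather than something stated in the list above.

```latex
\begin{aligned}
y_i &= X_i \beta + Z_i b_i + \varepsilon_i, \qquad i = 1, \dots, M, \\
\varepsilon_i &\sim N(0, \sigma^2 I), \qquad b_i \sim N(0, \Psi), \\
b_i &\perp \varepsilon_i, \qquad b_i \perp b_j \quad (i \neq j).
\end{aligned}
```

Here y_i is the response vector for group i, β holds the fixed effects and b_i the random effects for that group. Ψ carries no subscript i, which is the fifth assumption: the covariance matrix is the same for every group.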

The tricks with mixed-effects models are:

learning which variables are random effects;

specifying the fixed and random effects in the model formula;

getting the nesting structure of the random effects right;

remembering to get library(lme4) or library(nlme) at the outset.

The issues fall into two broad categories: questions about experimental design and the management of experimental error (e.g. where does most of the variation occur, and where would increased replication be most profitable?); and questions about hierarchical structure, and the relative magnitude of variation at different levels within the hierarchy (e.g. studies on the genetics of individuals within families, families within parishes, and parishes within counties, to discover the relative importance of genetic and phenotypic variation).

Most ANOVA models are based on the assumption that there is a single error term. But in hierarchical studies and nested experiments, where the data are gathered at two or more different spatial scales, there is a different error variance for each different spatial scale.

There are two reasonably clear-cut sets of circumstances where your first choice would be to use a linear mixed-effects model: you want to do variance components analysis because all your explanatory variables are categorical random effects and you do not have any fixed effects; or you do have fixed effects, but you also have pseudoreplication of one sort or another (e.g. temporal pseudoreplication resulting from repeated measurements on the same individuals; see p. 699).

To test whether one should use a model with mixed effects or just a plain old linear model, Douglas Bates wrote in the R help archive: ‘I would recommend the likelihood ratio test against a linear model fit by lm. The p-value returned from this test will be conservative because you are testing on the boundary of the parameter space.’
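The arithmetic behind that remark can be sketched in a few lines of plain Python. The test statistic is twice the difference in log-likelihood, and the naive p-value refers it to a chi-square distribution with one degree of freedom. Because a variance cannot be negative, the null value (zero) sits on the edge of the parameter space. A commonly used approximation for a single variance component, which is not part of the quotation, is to halve the naive p-value. The function name and the log-likelihood values below are hypothetical, and both fits are assumed to use maximum likelihood rather than REML on the same data.

```python
from math import erfc, sqrt


def lrt_single_variance(loglik_lm, loglik_mixed):
    """Likelihood ratio test of a linear model against the same model
    plus one random-effect variance.

    Returns (statistic, naive_p, boundary_p).
    """
    # Twice the gain in log-likelihood; guard against tiny negative values
    stat = max(2.0 * (loglik_mixed - loglik_lm), 0.0)

    # Upper tail of a chi-square with 1 df: P(Z^2 > stat) = erfc(sqrt(stat / 2))
    naive_p = erfc(sqrt(stat / 2.0))

    # Approximate null: 50:50 mixture of a point mass at zero and chi-square(1)
    boundary_p = 0.5 * naive_p if stat > 0 else 1.0
    return stat, naive_p, boundary_p


# Hypothetical log-likelihoods from two maximum likelihood fits
stat, naive_p, boundary_p = lrt_single_variance(-120.0, -118.0)
print(f"LR statistic={stat:.2f} naive p={naive_p:.4f} halved p={boundary_p:.4f}")
```

The naive p-value is always at least as large as the halved one, which is the sense in which the unadjusted test is conservative. The halving rule applies to one added variance parameter; models with several random-effect parameters call for different mixtures or a simulation-based test.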
