Hi,
Where to start... This is not an easy subject, but I will try. Others should
correct me. For the most part everything is described in standard regression
and ANOVA terms, because these are one case of generalized linear models
(GLMs). GLMs are more general, accounting for all distributions in the
exponential family (Normal, Poisson, Gamma, Binomial) and, through
quasi-likelihood, some that aren't.
Note also that the threshold model for binary data is just the most basic
binomial GLM, without the flexibility of the general GLM framework.
> I have also read the various answers regarding aspects of running GLMMs
>(on-line and in the manual), and have a fairly general understanding of
>what's going on (I think). However, I am a bit confused as to exactly what
>information the Deviance value is telling you (and why you calculate it
> twice per iteration used to estimate the VC) and how this is related to
> dispersion.
Find a copy of 'Modelling Binary Data' by Collett - it is now back in print. It
is very good and rather basic. To my mind, it is the best book on the standard
analysis of binary data. A passable next step (nothing to do with the fact that
the author has an Aussie address :-)) is 'An Introduction to Generalized Linear
Models' by AJ Dobson. The key reference is 'Generalized Linear Models' by
McCullagh and Nelder - it is rather heavy going. I have yet to find any other
books as good as these.
Deviance is a measure of the difference in fit between a full model and a
reduced model (twice the difference in their log-likelihoods). Under normality
with identity link (i.e. your standard regression: Y ~ mu etc), this is just
the residual sum of squares.
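As a rough check of that last statement, here is a minimal Python sketch. The data, the helper names and the choice of unit variance are all made up for illustration; it just computes twice the log-likelihood difference between the full (one parameter per observation) model and a fitted straight line, and compares it with the residual sum of squares.

```python
import math

def normal_loglik(y, mu, sigma2=1.0):
    # Normal log-likelihood with known variance sigma2 (assumed 1 here)
    return sum(-0.5 * math.log(2 * math.pi * sigma2)
               - (yi - mi) ** 2 / (2 * sigma2)
               for yi, mi in zip(y, mu))

def least_squares_fit(x, y):
    # Ordinary least-squares line; returns the fitted values
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return [intercept + slope * xi for xi in x]

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.8, 5.3]

mu = least_squares_fit(x, y)

# Full model: fitted value equals the observation itself
deviance = 2 * (normal_loglik(y, y) - normal_loglik(y, mu))
rss = sum((yi - mi) ** 2 for yi, mi in zip(y, mu))

print(deviance, rss)  # the two numbers agree up to rounding error
```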
General asymptotic theory for GLMs: the deviance is approximately Chi-squared
distributed. So the difference between two deviances (for nested models) is
also approximately Chi-squared, leading to the analysis of deviance (very
similar to the ANOVA).
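To make the analysis of deviance concrete, here is a small Python sketch under stated assumptions: the counts are invented, the two models (one common proportion versus one proportion per group) are chosen because their maximum likelihood fits are just pooled proportions, and the helper names are mine. The difference in deviance is referred to a Chi-squared distribution with 1 degree of freedom.

```python
import math

def xlogy(x, y):
    # x * log(y), with the usual convention that 0 * log(0) = 0
    return 0.0 if x == 0 else x * math.log(y)

def binomial_deviance(data, fitted_p):
    # data: list of (successes, trials); fitted_p: fitted probability per row
    total = 0.0
    for (y, n), p in zip(data, fitted_p):
        total += xlogy(y, y / (n * p))
        total += xlogy(n - y, (n - y) / (n * (1 - p)))
    return 2.0 * total

def pooled_p(rows):
    # MLE of a common proportion is total successes / total trials
    return sum(y for y, _ in rows) / sum(n for _, n in rows)

# Hypothetical grouped binomial data, for illustration only
group_a = [(12, 20), (9, 20), (11, 20)]
group_b = [(15, 20), (17, 20), (14, 20)]
data = group_a + group_b

# Reduced model: one common proportion (residual df = 6 - 1)
p_reduced = [pooled_p(data)] * len(data)
# Larger model: one proportion per group (residual df = 6 - 2)
p_larger = ([pooled_p(group_a)] * len(group_a)
            + [pooled_p(group_b)] * len(group_b))

dev_reduced = binomial_deviance(data, p_reduced)
dev_larger = binomial_deviance(data, p_larger)

diff = dev_reduced - dev_larger
df_diff = 1
# Upper tail of a Chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(diff / 2.0))

print(dev_reduced, dev_larger, diff, p_value)
```

The erfc shortcut only works for 1 degree of freedom; for more you would need a proper Chi-squared tail function.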
The deviance divided by the residual degrees of freedom is an estimate of the
scale parameter. In Normal theory, this is just the residual variance.
In the pure binary case, the scale parameter is one:
if the estimated scale parameter is sufficiently greater than 1, it is called
overdispersion;
if sufficiently less than 1, underdispersion - rather rare, and perhaps
due to incorrect model assumptions?
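A quick Python sketch of the deviance / residual df calculation, again with invented grouped binomial counts and my own helper names. The groups are deliberately more variable than a single common proportion would allow, so the ratio comes out well above 1.

```python
import math

def xlogy(x, y):
    # x * log(y), with 0 * log(0) taken as 0
    return 0.0 if x == 0 else x * math.log(y)

# Hypothetical (successes, trials) per group, for illustration only
data = [(2, 20), (18, 20), (5, 20), (15, 20), (10, 20)]

# Fit a single common proportion (its MLE is the pooled proportion)
p_hat = sum(y for y, _ in data) / sum(n for _, n in data)

deviance = 2.0 * sum(
    xlogy(y, y / (n * p_hat)) + xlogy(n - y, (n - y) / (n * (1 - p_hat)))
    for y, n in data
)

residual_df = len(data) - 1          # 5 observations, 1 fitted parameter
dispersion = deviance / residual_df  # estimate of the scale parameter

print(deviance, residual_df, dispersion)
```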
Overdispersion suggests that other sources of variance are present. For
example, a sire model (assuming a purely additive genetic model) is
overdispersed, as pointed out by others (Tempelman and Gianola, 1996,
Biometrics 52:265 is the only published paper I know of where it is clearly
stated): the residual variance contains 3/4 of the additive genetic variance,
whereas the animal model accounts for all the additive genetic variance. I
believe this point was ignored by Mayer (1995, Genet. Sel. Evol. 27:423),
since sire and animal models are not equivalent models.
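The 3/4 bookkeeping can be sketched in a few lines of Python. The variance values are made up, and this assumes a purely additive model on a linear (or underlying) scale, where the sire variance is 1/4 of the additive genetic variance.

```python
# Hypothetical variance components, for illustration only
sigma2_a = 0.4   # additive genetic variance
sigma2_e = 0.6   # environmental (residual) variance in the animal model

# Sire model: a sire passes on half its genes, so sire variance = 1/4 sigma2_a
sigma2_sire = sigma2_a / 4.0
# The remaining 3/4 of sigma2_a ends up in the sire-model residual
sire_model_residual = sigma2_e + 0.75 * sigma2_a

# Animal model: all of sigma2_a is fitted, so the residual is just sigma2_e
animal_model_residual = sigma2_e

print(sigma2_sire, sire_model_residual, animal_model_residual)
```

Both models add up to the same total variance; they just split it differently between the fitted effect and the residual.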
Sorry, I have to rush, but I will continue this later.
Please feel free to ask further questions,
Bruce Southey
--
Asreml mailinglist archive: http://www.chiswick.anprod.csiro.au/lists/asreml