Re: DF when there are missing factor levels

> The problem: only a subsection of the full data has a particular effect
> recorded. We wish to estimate this effect without subsetting the data,
> given that other effects of significance are estimated from the complete
> data, and these estimates behave poorly in the subset. So, we wish to
> analyse a data set where factor levels are missing for all animals without
> this effect.
> Previously, Arthur indicated that missing factor levels are treated as a
> zero level. In our case we have four factor levels (+ the one that is
> missing). The DF reported are 4 instead of 3, and all levels have a
> non-zero solution reported. No equations are generated for level zero,
> which probably explains why there is no zero solution for level zero.
> How are these results to be interpreted? Are the solutions meaningful, and
> what exactly do they represent a deviation from for the effect of interest?
> I can't really see that we should have four degrees of freedom for
> starters, or that you can generate a legitimate DF by having missing
> records. There is no equation fitting a dummy effect for all animals where
> the level is missing (unless this became one of the singularities
> reported?). We get identical solutions when we specify the effect is only
> to be fit for animals with records (e.g. using at(yesrecord,1).effect).

If the level is really missing, then this is just an analysis of a data set
with missing cells (Type IV sums of squares in SAS parlance; I don't
remember how Genstat handles it). In that case there is no reason to
restrict the analysis to the subset. You just have to make sure that you are
dealing with estimable functions, especially when interactions are involved.
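
One way to check estimability numerically is sketched below. It relies on the standard result that a linear function k'beta is estimable exactly when k lies in the row space of the design matrix X. The 2 x 2 layout, the column ordering and the helper name is_estimable are hypothetical and chosen only for illustration; this is not how ASREML or SAS does it internally.

```python
import numpy as np

# Hypothetical 2 x 2 layout in which cell (a2, b1) has no observations.
# One row per filled cell is enough: replicating rows does not change the row space.
# Columns: mu, a1, a2, b1, b2, a1b1, a1b2, a2b1, a2b2
X = np.array([
    [1, 1, 0, 1, 0, 1, 0, 0, 0],  # cell (a1, b1)
    [1, 1, 0, 0, 1, 0, 1, 0, 0],  # cell (a1, b2)
    [1, 0, 1, 0, 1, 0, 0, 0, 1],  # cell (a2, b2)
], dtype=float)


def is_estimable(k, X):
    """Return True if k'beta is estimable, i.e. k lies in the row space of X."""
    k = np.asarray(k, dtype=float)
    # Appending k as an extra row leaves the rank unchanged only if k is
    # already a linear combination of the rows of X.
    return bool(np.linalg.matrix_rank(np.vstack([X, k])) == np.linalg.matrix_rank(X))


# Difference between the two observed cells within a1: uses filled cells only.
k_b_within_a1 = X[0] - X[1]

# The usual interaction contrast needs the empty cell (a2, b1).
k_interaction = [0, 0, 0, 0, 0, 1, -1, -1, 1]

print(is_estimable(k_b_within_a1, X))  # True
print(is_estimable(k_interaction, X))  # False
```

Any contrast that puts a non-zero weight on an empty cell fails this check, which is why interactions need the most care.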

From your description, you don't have a missing cell when you
consider this term as a main effect (or it is not correctly coded in
ASREML).  All five levels have some observations present.  Consider the
following structure, with the number of observations in each cell:

                Factor2
Factor1      A       B   Total1
  L          10      20    30
  M          10      20    30
  N           0      30    30

Total2       20      70    90

In a main effects model there is no problem, since the missing
combination doesn't enter the calculation of the sums of squares.
However, if you fitted the interaction, you would need to find the
estimable contrasts. The interaction will have only 1 df (rather than the
2 it would have with all six cells filled), and generally you
can find alternative ways to calculate the sums of squares associated with
the main effects. The interpretation remains the same once you allow for
the fact that you cannot say anything about how level N of Factor1
interacts with Factor2.
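
The degrees of freedom quoted above can be checked by comparing design-matrix ranks. The sketch below builds an over-parameterised design (one dummy column per level and per cell) from the cell counts in the example and uses NumPy to compute the ranks. The helper names design_row and model_rank are made up for illustration, and this is only a rank count, not a reproduction of what ASREML or SAS computes.

```python
import numpy as np

# Cell counts from the example: Factor1 levels L, M, N by Factor2 levels A, B.
counts = {
    ("L", "A"): 10, ("L", "B"): 20,
    ("M", "A"): 10, ("M", "B"): 20,
    ("N", "A"): 0,  ("N", "B"): 30,   # (N, A) is the empty cell
}
f1_levels = ["L", "M", "N"]
f2_levels = ["A", "B"]


def design_row(f1, f2, with_interaction):
    """Design-matrix row for one observation, one dummy per level (and per cell)."""
    row = [1.0]                                             # intercept
    row += [1.0 if f1 == a else 0.0 for a in f1_levels]     # Factor1 dummies
    row += [1.0 if f2 == b else 0.0 for b in f2_levels]     # Factor2 dummies
    if with_interaction:
        row += [1.0 if (f1, f2) == (a, b) else 0.0
                for a in f1_levels for b in f2_levels]      # cell dummies
    return row


def model_rank(with_interaction):
    """Rank of the design matrix built from one row per observation."""
    rows = [design_row(f1, f2, with_interaction)
            for (f1, f2), n in counts.items()
            for _ in range(n)]
    return int(np.linalg.matrix_rank(np.array(rows)))


rank_main = model_rank(False)   # intercept + Factor1 + Factor2
rank_full = model_rank(True)    # plus the Factor1 by Factor2 interaction
interaction_df = rank_full - rank_main

print(rank_main, rank_full, interaction_df)  # 4 5 1
```

The full model can fit at most one parameter per filled cell (5), and the main effects model already uses 4 of them, which leaves the single interaction df.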

If your library has it, see Milliken and Johnson (1992), 'Analysis of
Messy Data, Volume 1: Designed Experiments'. Note that they use the GLM
procedure of SAS throughout, so it is a little dated, but the general
concepts remain valid.

