Re: DF when there are missing factor levels

Dear Kim,

I gather the response variable is complete, so I am unclear what
you mean by 'a particular effect is recorded', unless you mean that
the factor has 4 possible states (say Red, Grn, Ylw, Blu)
but the state is not recorded
for a proportion of the data, creating a fifth state (UNKNOWN).

So there are actually five levels of the factor and therefore 4 DF.

The fact that 4 are explicitly coded and the fifth is missing
just means that you are formally constraining the 5 levels
so that the UNKNOWN has an effect of zero
and the other 4 are then calculated relative to the UNKNOWN group.
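
As a rough illustration of the DF count (a sketch only, not a description
of ASReml's internals), the snippet below builds a design matrix with a
mean column plus one indicator column per coded level, leaving the
unrecorded rows as all zeros. The records and the names rank, design_row
and LEVELS are made up for the example.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix by exact Gauss-Jordan elimination."""
    m = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    for col in range(len(m[0])):
        # Find a row at or below position rk with a non-zero entry in this column.
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        # Clear this column in every other row.
        for i in range(len(m)):
            if i != rk and m[i][col] != 0:
                f = m[i][col] / m[rk][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk

LEVELS = ["Red", "Grn", "Ylw", "Blu"]

# Hypothetical records; None means the factor was not recorded.
records = ["Red", "Grn", None, "Ylw", "Blu", None, "Red", "Blu"]

def design_row(level):
    """Mean column plus one indicator per coded level; UNKNOWN gets all zeros."""
    return [1] + [1 if level == k else 0 for k in LEVELS]

# With unrecorded rows present, all 4 indicator columns are estimable.
df_with_unknown = rank([design_row(r) for r in records]) - 1

# With only recorded rows, the indicators sum to the mean column.
df_recorded_only = rank([design_row(r) for r in records if r is not None]) - 1

print(df_with_unknown, df_recorded_only)
```

The unrecorded rows are what separate the mean column from the sum of
the four indicators, which is why the factor takes 4 DF here and only 3
when every record has a level.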

You would expect that the UNKNOWNs would be a weighted average of
the 4 real levels.  So, the average of the 4 real effects
should be near zero.
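
A small arithmetic sketch of that expectation, using made-up level means
and a made-up mix of true levels among the unrecorded animals (every
number and name here is hypothetical):

```python
# Hypothetical true means for the 4 real levels.
means = {"Red": 10.0, "Grn": 12.0, "Ylw": 9.0, "Blu": 13.0}

# Hypothetical proportions of each true level among the UNKNOWN animals.
weights = {"Red": 0.4, "Grn": 0.3, "Ylw": 0.2, "Blu": 0.1}

# The UNKNOWN group mean is the weighted average of the real level means.
unknown_mean = sum(weights[k] * means[k] for k in means)

# Effects as reported when UNKNOWN is the zero (reference) level.
effects = {k: means[k] - unknown_mean for k in means}

# Weighted by the same proportions, the effects average to zero.
weighted_avg_effect = sum(weights[k] * effects[k] for k in means)

# The simple (unweighted) average is only near zero, not exactly zero.
simple_avg_effect = sum(effects.values()) / len(effects)

print(weighted_avg_effect, simple_avg_effect)
```

The weighted average is zero up to rounding, while the simple average
departs from zero to the extent that the mix among the unrecorded
animals is unbalanced.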

You might want to force the UNKNOWNs to be at the average of the others
in which case you could use  c(FACTOR)   and only fit 3 degrees
of freedom.
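
To show what such a constraint amounts to, here is a generic
sum-to-zero coding, assuming that is what the constrained fit does; it
is a sketch rather than ASReml's actual coding, and contrast_row,
free_effects and the effect values are invented for the example.

```python
LEVELS = ["Red", "Grn", "Ylw", "Blu"]

def contrast_row(level):
    """Sum-to-zero coding: 3 columns for 4 levels; unrecorded rows are all zeros."""
    if level is None:
        return [0, 0, 0]
    if level == LEVELS[-1]:
        # The last level is minus the sum of the others.
        return [-1, -1, -1]
    return [1 if level == k else 0 for k in LEVELS[:-1]]

# Only 3 free parameters are fitted (hypothetical values).
free_effects = [1.5, -0.5, 2.0]

def level_effect(level):
    """Effect implied for a level by the 3 free parameters."""
    return sum(c * e for c, e in zip(contrast_row(level), free_effects))

effects = {k: level_effect(k) for k in LEVELS}
mean_effect = sum(effects.values()) / len(LEVELS)
unknown_effect = level_effect(None)

print(effects, mean_effect, unknown_effect)
```

The four implied effects sum to zero, and the unrecorded animals sit at
zero, that is, at the average of the four recorded levels, using 3 DF
rather than 4.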

If  FACTOR has a large effect, you might expect the UNKNOWN animals
to have an inflated residual variance.
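
The reason is that the UNKNOWN group is a mixture of the real levels, so
its variance picks up the spread between the level means. A sketch with
made-up numbers (level means, mixing proportions and within-level
variance are all hypothetical):

```python
# Hypothetical level means and mix of true levels among the UNKNOWN animals.
means = {"Red": 10.0, "Grn": 12.0, "Ylw": 9.0, "Blu": 13.0}
weights = {"Red": 0.4, "Grn": 0.3, "Ylw": 0.2, "Blu": 0.1}

# Hypothetical residual variance within any one recorded level.
within_var = 1.0

# Mean of the mixture of levels hidden in the UNKNOWN group.
unknown_mean = sum(weights[k] * means[k] for k in means)

# Between-level spread that is not removed for the UNKNOWN animals.
between_var = sum(weights[k] * (means[k] - unknown_mean) ** 2 for k in means)

# Variance of a mixture: within-level variance plus between-level variance.
unknown_var = within_var + between_var

print(within_var, unknown_var)
```

The larger the differences between the level means, the larger the
between-level term and the more the UNKNOWN residual variance is
inflated.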


> Hi all,
> we have been over the DF question before, but I have another result I am
> curious about.
> The problem: only a subsection of the full data has a particular effect
> recorded. We wish to estimate this effect without subsetting the data,
> given that other effects of significance are estimated from the complete
> data, and these estimates behave poorly in the subset. So, we wish to
> analyse a data set where factor levels are missing for all animals without
> this effect.
> Previously, Arthur indicated that missing factor levels are treated as a
> zero level. In our case we have four factor levels (+ the one that is
> missing). The DF reported are 4 instead of three, and all levels have a
> non-zero solution reported. No equations are generated for level zero,
> which probably explains why there is no zero solution for level zero.
> How are these results to be interpreted? Are the solutions meaningful, and
> what exactly do they represent a deviation from for the effect of interest?
> I can't really see that we should have four degrees of freedom for
> starters, or that you can generate a legitimate DF by having missing
> records. There is no equation fitting a dummy effect for all animals where
> the level is missing (unless this became one of the singularities
> reported?). We get identical solutions when we specify the effect is only
> to be fit for animals with records (eg using at(yesrecord,1).effect).
> Any comments much appreciated.
> Cheers
> Kim
> Kim Bunter
> Research Scientist
> Animal Genetics and Breeding Unit
> University of New England
> Armidale, NSW, 2351
> Ph (ISD): -61-2-67733788
> Fax (ISD): -61-2-67733266
> email:

Arthur Gilmour PhD        
Principal Research Scientist (Biometrics)            fax: <61> 2 6391 3899
NSW Agriculture                                           <61> 2 6391 3922
Orange Agricultural Institute             telephone work: <61> 2 6391 3815
Forest Rd, ORANGE, 2800, AUSTRALIA                  home: <61> 2 6362 0046
                                                   Cargo: <61> 2 6364 3288
Until July 2001, I will be mainly building a house at Cargo.

ASREML is still free by anonymous ftp from pub/aar on
    or point your web browser at 

                        <> <> <> <> <> <> <>
"Blessed are the peacemakers,
        for they shall be called sons of God"   Jesus; Matthew 5:9
