*To*: kbunter@metz.une.edu.au
*Subject*: Re: DF when there are missing factor levels
*From*: Arthur Gilmour <gilmoua@agric.nsw.gov.au>
*Date*: Wed, 30 May 2001 16:23:57 +1000 (EST)

Dear Kim,

I gather the response variable is complete, so I am unclear what it means that 'a particular effect is recorded', unless you mean that the factor has 4 possible states (say Red, Grn, Ylw, Blu) but the state is not recorded on a proportion of the data, creating a fifth state (UNKNOWN). So there are actually five levels of the factor and therefore 4 DF. The fact that 4 are explicitly coded and the fifth is missing just means that you are formally constraining the 5 levels so that UNKNOWN has an effect of zero, and the other 4 are then calculated relative to the UNKNOWN group.

You would expect the UNKNOWNs to be a weighted average of the 4 real levels, so the average of the 4 real effects should be near zero. You might want to force the UNKNOWNs to be at the average of the others, in which case you could use c(FACTOR) and fit only 3 degrees of freedom.

If FACTOR has a large effect, you might expect the UNKNOWN animals to have an inflated residual variance.

Arthur

> Hi all,
>
> we have been over the DF question before, but I have another result I am
> curious about.
>
> The problem: only a subsection of the full data has a particular effect
> recorded. We wish to estimate this effect without subsetting the data,
> given that other effects of significance are estimated from the complete
> data, and these estimates behave poorly in the subset. So, we wish to
> analyse a data set where factor levels are missing for all animals without
> this effect.
>
> Previously, Arthur indicated that missing factor levels are treated as a
> zero level. In our case we have four factor levels (+ the one that is
> missing). The DF reported are 4 instead of three, and all levels have a
> non-zero solution reported. No equations are generated for level zero,
> which probably explains why there is no zero solution for level zero.
>
> How are these results to be interpreted? Are the solutions meaningful, and
> what exactly do they represent a deviation from for the effect of interest?
> I can't really see that we should have four degrees of freedom for
> starters, or that you can generate a legitimate DF by having missing
> records. There is no equation fitting a dummy effect for all animals where
> the level is missing (unless this became one of the singularities
> reported?). We get identical solutions when we specify the effect is only
> to be fit for animals with records (eg using at(yesrecord,1).effect).
>
> Any comments much appreciated.
>
> Cheers
>
> Kim
> Kim Bunter
> Research Scientist
> Animal Genetics and Breeding Unit
> University of New England
> Armidale, NSW, 2351
> AUSTRALIA
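
The argument about the UNKNOWN level can be written out as a short derivation. This is a sketch under an assumption not stated in the original: the UNKNOWN animals are taken to be a mixture of the four recorded levels in proportions $p_i$ with $\sum_i p_i = 1$, where $i = 1, \dots, 4$ indexes Red, Grn, Ylw and Blu ($p_i$ is our notation).

$$
\begin{aligned}
y_{ij} &= \mu + a_i + e_{ij}, \qquad a_{\mathrm{UNKNOWN}} = 0,\\
\mathrm{E}[\bar{y}_{\mathrm{UNKNOWN}}] &= \mu = \sum_{i=1}^{4} p_i(\mu + a_i) \;\Longrightarrow\; \sum_{i=1}^{4} p_i a_i = 0.
\end{aligned}
$$

With $a_{\mathrm{UNKNOWN}} = 0$ there are 4 free effects (4 DF), each a contrast with the UNKNOWN group, and their $p_i$-weighted average is zero. Imposing the unweighted constraint $\sum_{i=1}^{4} a_i = 0$ instead places UNKNOWN at the simple average of the recorded levels and removes one free parameter, leaving 3 DF.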

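To see what the solutions represent, the Python sketch below fits both parametrizations by ordinary least squares to a small hypothetical data set. The colours, responses and helper names are invented for illustration. It is a one-way fixed-effects model solved from the normal equations, not an ASReml run, and the sum-to-zero coding is our assumption for what "force the UNKNOWNs to be at the average of the others" amounts to; it does not reproduce the c(FACTOR) term itself.

```python
from fractions import Fraction

# Hypothetical data: (recorded colour, response). None marks an animal whose
# colour was not recorded, i.e. the implicit fifth level UNKNOWN.
RECORDS = [
    ("Red", 10), ("Red", 12), ("Grn", 14), ("Grn", 16),
    ("Ylw", 20), ("Ylw", 22), ("Blu", 6), ("Blu", 8),
    (None, 11), (None, 15), (None, 21), (None, 9),
]
LEVELS = ["Red", "Grn", "Ylw", "Blu"]


def solve(a, b):
    """Solve a x = b exactly by Gauss-Jordan elimination (a must be nonsingular)."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[r][n] for r in range(n)]


def ols(rows, y):
    """Least-squares solutions from the normal equations X'X b = X'y, plus the RSS."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    b = solve(xtx, xty)
    rss = sum((v - sum(bi * xi for bi, xi in zip(b, r))) ** 2 for r, v in zip(rows, y))
    return b, rss


def baseline_row(level):
    # Intercept plus one indicator per recorded level. UNKNOWN gets no column,
    # so its effect is fixed at zero and it becomes the reference group (4 DF).
    return [1] + [1 if level == lv else 0 for lv in LEVELS]


def sum_to_zero_row(level):
    # Intercept plus 3 columns: the recorded effects are constrained to sum to
    # zero (Blu = -(Red + Grn + Ylw)), so UNKNOWN sits at their average (3 DF).
    if level == "Blu":
        return [1, -1, -1, -1]
    return [1] + [1 if level == lv else 0 for lv in LEVELS[:3]]


def group_variance(level):
    # Sample variance of the responses within one group (None = UNKNOWN).
    vals = [v for lv, v in RECORDS if lv == level]
    mean = Fraction(sum(vals), len(vals))
    return sum((v - mean) ** 2 for v in vals) / (len(vals) - 1)


y = [v for _, v in RECORDS]
b4, rss4 = ols([baseline_row(lv) for lv, _ in RECORDS], y)
b3, rss3 = ols([sum_to_zero_row(lv) for lv, _ in RECORDS], y)
effects3 = b3[1:] + [-sum(b3[1:])]  # recover the constrained Blu effect

print("4 DF:", len(b4) - 1, "mu =", b4[0], {lv: str(e) for lv, e in zip(LEVELS, b4[1:])}, "RSS =", rss4)
print("3 DF:", len(b3) - 1, "mu =", b3[0], {lv: str(e) for lv, e in zip(LEVELS, effects3)}, "RSS =", rss3)
print("within-group variance:", {str(lv): str(group_variance(lv)) for lv in LEVELS + [None]})
```

With these numbers the 4 DF fit gives an intercept of 14, which is exactly the UNKNOWN group mean, and effects of -3, 1, 7 and -7 for Red, Grn, Ylw and Blu, i.e. deviations from the UNKNOWN group; their average is -0.5, near zero. The 3 DF fit gives effects of -5/2, 3/2, 15/2 and -13/2, which sum to zero, at the cost of a slightly larger residual sum of squares (278/3 against 92). The UNKNOWN group's within-group variance (28) is far larger than that of any recorded level (2), which is the inflated residual variance mentioned in the reply.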
Arthur Gilmour PhD
Principal Research Scientist (Biometrics)
NSW Agriculture
Orange Agricultural Institute

