Re: Binary Data with missing values

From: Bruce Southey <bsouthey_at_GMAIL.COM>
Date: Wed, 8 Jul 2009 21:07:21 -0500

On Wed, Jul 8, 2009 at 5:16 AM, Clempson, Andrew Martin wrote:
> Dear All
> I wonder if someone could provide advice on dealing with missing values in
> binary data models.

Avoid it! :-)

> I am in the process of performing association studies between SNPs and
> pregnancy status in cows. I have genotypic information on approximately 90%
> of animals and pregnancy status data on approximately 80% of animals (coded
> as 0 = pregnant, 1 = not pregnant).

First some observations regarding missing records without considering the SNPs:
You should probably remove all animals with missing pregnancy status
from your data file but not from the pedigree file. That just ensures
the correct data will be used.
If you have repeated records on the same animal, you must fit that,
otherwise the genetic parameters are biased. If only a few animals have
repeated records, you probably need to delete the repeated records.
Since you are fitting an animal model, you need pedigree relationships
for all animals. If you have animals with no or limited genetic
relations you probably need to remove those.
Then fit your model without any SNPs to ensure that it does run and
the results are sensible (including solutions of animal effects).
Beware of the possibility for the occurrence of the extreme value
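
The data checks above could be scripted before the model is run. The
following is only a minimal sketch in plain Python, not part of any
particular package: the record layout, the animal and parent IDs, and the
helper names (drop_missing_status, repeated_animals,
animals_without_pedigree) are all hypothetical and would need adapting to
your own data and pedigree files.

```python
from collections import Counter

# Hypothetical records: (animal_id, pregnancy_status); None marks a missing status.
records = [
    ("A1", 0), ("A2", 1), ("A3", None), ("A4", 1), ("A2", 0), ("A5", None),
]

# Hypothetical pedigree: animal -> (sire, dam). It is never edited below.
pedigree = {
    "A1": ("S1", "D1"), "A2": ("S1", "D2"), "A3": ("S2", "D1"),
    "A4": ("S2", "D3"), "A5": ("S1", "D3"),
}


def drop_missing_status(rows):
    """Keep only the records that have an observed pregnancy status."""
    return [(animal, status) for animal, status in rows if status is not None]


def repeated_animals(rows):
    """Return the animals that appear in more than one record."""
    counts = Counter(animal for animal, _ in rows)
    return sorted(animal for animal, n in counts.items() if n > 1)


def animals_without_pedigree(rows, ped):
    """Return the animals that have data but no pedigree entry."""
    return sorted({animal for animal, _ in rows if animal not in ped})


clean = drop_missing_status(records)
print("records kept:", len(clean))
print("animals with repeated records:", repeated_animals(clean))
print("animals lacking pedigree:", animals_without_pedigree(clean, pedigree))
print("pedigree entries (unchanged):", len(pedigree))
```

Whether to fit the repeated records or delete them is still a modelling
decision; the sketch only reports how many animals are affected.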

It is not a good idea to fit the actual SNPs; rather, you should do
association mapping (as has been addressed on this list). That way you
will not have missing data due to missing SNPs.

If you really, really must fit the actual SNPs, then remove any SNP that
is very rare, because of the extreme category problem. If an SNP
is missing then you can impute it by different methods, especially if
you know the location and surrounding SNPs - there is a genome now!
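
As a rough illustration of those two steps, here is a minimal sketch in
plain Python. Everything in it is an assumption made for the example: the
0/1/2 genotype coding, the SNP names, the 0.10 frequency cutoff, and the
helper names. The imputation shown is the crudest possible one (fill with
the most common observed genotype) and ignores map location and
neighbouring SNPs, so it only stands in for the location-aware methods
mentioned above.

```python
from collections import Counter

# Hypothetical genotypes coded as copies of one allele (0, 1, 2); None = missing.
genotypes = {
    "snp1": [0, 1, 2, 1, None, 1, 0, 2],
    "snp2": [0, 0, 0, 0, 0, None, 0, 1],
}

MAF_THRESHOLD = 0.10  # assumed cutoff for "very rare"; choose your own


def minor_allele_freq(calls):
    """Minor allele frequency computed from the non-missing genotype calls."""
    observed = [g for g in calls if g is not None]
    p = sum(observed) / (2 * len(observed))
    return min(p, 1 - p)


def impute_mode(calls):
    """Replace missing calls with the most common observed genotype."""
    observed = [g for g in calls if g is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if g is None else g for g in calls]


# Drop rare SNPs first, then impute whatever is still missing.
kept = {
    name: impute_mode(calls)
    for name, calls in genotypes.items()
    if minor_allele_freq(calls) >= MAF_THRESHOLD
}

for name, calls in kept.items():
    print(name, calls)
```

In this toy data snp2 is dropped because only one copy of its minor allele
is observed, and the single missing call for snp1 is filled with the
heterozygote, the most common genotype at that SNP.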

