Re: editing outliers

To: kbunter@metz.une.edu.au
Subject: Re: editing outliers
From: Arthur Gilmour <gilmoua@ornsun.agric.nsw.gov.au>
Date: Thu, 21 Jan 1999 14:14:20 +1100 (EST)
Cc: asreml@ram.chiswick.anprod.csiro.au
Reply-To: Arthur Gilmour <gilmoua@ornsun.agric.nsw.gov.au>
Sender: asreml-owner@ram.chiswick.anprod.csiro.au

Dear Kim

Best wishes for 1999
> 
> Hi all,
> 
> I am interested in using asreml's feature of identifying outliers to edit my
> data, rather than editing my data prior to analyses. So I thought I
> might canvass people's ideas about what is the most appropriate strategy!
> 
> For example, when you are developing a model for analyses, can you use the
> number of outliers as an indication of whether your model is getting better
> or worse? (in the absence of R2 values and assuming the same data of course)

NO.  This would not work because the criterion for detecting outliers
changes between models.  ASREML takes the average absolute residual and
calculates a criterion of 3 standard deviations based on that figure,
assuming independent normality.  Residuals, however, are not independent
and often not normal.  Because changing the model changes the criterion,
the number of outliers cannot be used to assess the fit of the model.
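To illustrate why the outlier count moves with the model, here is a minimal sketch of the rule as described above. It is an assumption, not ASReml's code: sigma is estimated from the mean absolute residual using the normal-theory identity E|e| = sigma * sqrt(2/pi), and anything beyond 3 sigma is flagged. The function name `outlier_flags` and the residual values are hypothetical.

```python
import math
import statistics


def outlier_flags(residuals, k=3.0):
    """Flag residuals larger in absolute value than k 'standard deviations'.

    A sketch of the rule as described above, not ASReml's own code:
    sigma is estimated from the mean absolute residual, using the
    normal-theory identity E|e| = sigma * sqrt(2 / pi) for independent
    N(0, sigma^2) errors (the very assumption that fails when residuals
    are correlated or non-normal).
    """
    mean_abs = statistics.fmean(abs(r) for r in residuals)
    sigma_hat = mean_abs * math.sqrt(math.pi / 2)
    threshold = k * sigma_hat
    return [abs(r) > threshold for r in residuals], threshold


# Toy residuals for the same five records under two hypothetical models.
# Model B fits the first four records much better, which lowers sigma_hat,
# so the unchanged residual of 2.5 on the last record is now flagged.
flags_a, t_a = outlier_flags([1.0, -1.0, 1.0, -1.0, 2.5])
flags_b, t_b = outlier_flags([0.1, -0.1, 0.1, -0.1, 2.5])
print(sum(flags_a), round(t_a, 2))  # 0 4.89
print(sum(flags_b), round(t_b, 2))  # 1 2.18
```

The better-fitting model reports more "outliers", not fewer, because the yardstick itself shrank.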

> 
> If you know that your raw data values lie within a sensible distribution
> (assuming close to normal distribution), should you then remove outliers
> based on their residual solutions once you have the appropriate model
> established? (Which comes first: the best model or the identification of
> outliers?)

ASREML only identifies possible outliers.  
Such points should not be automatically deleted.
They should be investigated and deleted if you conclude that the
data value is not a plausible value.  

If you drop all the 'outliers' and repeat the analysis, you are likely
to get further 'outliers' identified.  Thus you need to establish that
the value is implausible before deleting it.
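A small sketch of why blind deletion keeps finding new 'outliers'. It assumes the same mean-absolute-residual criterion as above and, as a simplification, simply re-applies it to the remaining residuals instead of refitting the model; the function names and the data are hypothetical.

```python
import math
import statistics


def flag_threshold(residuals, k=3.0):
    # Assumed rule: k * sigma_hat, with sigma_hat = mean|e| * sqrt(pi / 2).
    mean_abs = statistics.fmean(abs(r) for r in residuals)
    return k * mean_abs * math.sqrt(math.pi / 2)


def drop_until_clean(residuals, k=3.0, max_rounds=20):
    """Repeatedly delete flagged points; return the count removed per round.

    Simplification: residuals are not recomputed from a refitted model,
    only the threshold is recomputed from what is left.
    """
    data = list(residuals)
    removed_per_round = []
    for _ in range(max_rounds):
        t = flag_threshold(data, k)
        kept = [r for r in data if abs(r) <= t]
        n_removed = len(data) - len(kept)
        if n_removed == 0:
            break
        removed_per_round.append(n_removed)
        data = kept
    return removed_per_round, data


# One gross error (30.0) masks a milder value (3.0); the milder value is
# only flagged after the first deletion lowers the threshold.
rounds, remaining = drop_until_clean([0.1] * 20 + [3.0, 30.0])
print(rounds)  # [1, 1]
```

Each deletion tightens the criterion, which is why every flagged value needs to be checked for plausibility rather than removed automatically.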

In most cases that really matter, genuine outliers will be evident in the 
raw data as well as in the residuals.  Outliers are easier
to detect when there is good replication.  Data plots are often useful
in determining if a point is an outlier.


> 
> I know the usual approach is to edit your data before analyses based on raw
> values and perhaps within levels of fixed effects if things are getting
> hairy. However, this editing is usually done with no knowledge of animal
> (random) effects, and when you have unbalanced data it seems to me that
> using asreml to identify outliers (fitting both fixed and random effects
> simultaneously) may be a better option. Otherwise, I would use SAS
> facilities for the fixed effect model development, and asreml to include
> random effects.
>
I would just use ASREML but then I am a bit one-eyed.  SAS, Genstat and Splus
probably have tools like QQ plots to help investigate the normality
of the residuals.
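For anyone without those packages to hand, the points of a normal QQ plot are easy to compute. This is a minimal sketch assuming the plotting positions (i - 0.5)/n, one common convention (other packages use slightly different ones); the function name and residuals are hypothetical. Points lying close to a straight line suggest approximately normal residuals, while points far off the line are candidates for closer investigation.

```python
from statistics import NormalDist


def normal_qq_points(residuals):
    """Return (theoretical normal quantile, observed residual) pairs.

    Plotting positions (i - 0.5) / n are an assumed convention; they always
    lie strictly between 0 and 1, so inv_cdf is well defined.
    """
    observed = sorted(residuals)
    n = len(observed)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i - 0.5) / n), y)
            for i, y in enumerate(observed, start=1)]


# Hypothetical residuals; the large last value should sit well off the
# line traced by the others when these pairs are plotted.
resid = [-1.2, -0.4, 0.1, 0.3, 0.8, -0.7, 0.5, 4.9]
for q, r in normal_qq_points(resid):
    print(f"{q:6.2f} {r:6.2f}")
```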
