
- *To*: kbunter@metz.une.edu.au
- *Subject*: Re: editing outliers
- *From*: Arthur Gilmour <gilmoua@ornsun.agric.nsw.gov.au>
- *Date*: Thu, 21 Jan 1999 14:14:20 +1100 (EST)
- *Cc*: asreml@ram.chiswick.anprod.csiro.au
- *Reply-To*: Arthur Gilmour <gilmoua@ornsun.agric.nsw.gov.au>
- *Sender*: asreml-owner@ram.chiswick.anprod.csiro.au

Dear Kim

Best wishes for 1999

> Hi all,
>
> I am interested in using asreml's feature of identifying outliers to edit my
> data - rather than editing my data prior to analyses. So - I thought I
> might canvass people's ideas about what is the most appropriate strategy!
>
> For example, when you are developing a model for analyses, can you use the
> number of outliers as an indication of whether your model is getting better
> or worse? (in the absence of R2 values and assuming the same data of course)

NO. The criterion for detecting outliers changes between models, so the
number of outliers cannot be used to assess the fit of the model.
ASREML takes the average absolute residual and calculates a criterion of
3 standard deviations based on that figure, assuming independent normality.
Residuals are, however, not independent and often not normal.

> If you know that your raw data values lie within a sensible distribution
> (assuming close to normal distribution), should you then remove outliers
> based on their residual solutions once you have the appropriate model
> established. (What came first - the best model or the identification of
> outliers?)

ASREML only identifies possible outliers. Such points should not be
automatically deleted. They should be investigated, and deleted only if you
conclude that the data value is not a plausible value. If you drop all
the 'outliers' and repeat the analysis, you are likely to get further
'outliers' identified. Thus you need to establish that the value is
implausible.

In most cases that really matter, genuine outliers will be evident in the
raw data as well as in the residuals. Outliers are easier to detect when
there is good replication. Data plots are often useful in determining
whether a point is an outlier.

> I know the usual approach is to edit your data before analyses based on raw
> values and perhaps within levels of fixed effects if things are getting
> hairy. However, this editing is usually done with no knowledge of animal
> (random) effects, and when you have unbalanced data it seems to me that
> using asreml to identify outliers (fitting both fixed and random effects
> simultaneously) may be a better option. Otherwise, I would use SAS
> facilities for the fixed effect model development, and asreml to include
> random effects.
>

I would just use ASREML, but then I am a bit one-eyed. SAS, Genstat and
Splus probably have tools like QQ plots to help investigate the normality
of the residuals.
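
To illustrate why the outlier count is not comparable between models, here is a minimal Python sketch of a criterion of the kind described above. It is not ASREML's actual code. The function name `flag_possible_outliers`, the residual values, and in particular the conversion from the average absolute residual to a standard deviation (the normal-theory factor sqrt(pi/2)) are assumptions made for illustration only.

```python
import math


def flag_possible_outliers(residuals, n_sd=3.0):
    """Return (threshold, indices of residuals whose size exceeds it).

    Assumption: sigma is estimated from the mean absolute residual using
    the normal-theory identity E|e| = sigma * sqrt(2 / pi). This scaling
    is illustrative and is not taken from ASREML itself.
    """
    if not residuals:
        return 0.0, []
    mean_abs = sum(abs(e) for e in residuals) / len(residuals)
    sigma = mean_abs * math.sqrt(math.pi / 2.0)
    threshold = n_sd * sigma
    flagged = [i for i, e in enumerate(residuals) if abs(e) > threshold]
    return threshold, flagged


# Hypothetical residuals from two models fitted to the same data.
# The last record has a residual of 10 under both models.
residuals_model_a = [1.0, -1.0] * 10 + [10.0]   # tighter fit elsewhere
residuals_model_b = [3.0, -3.0] * 10 + [10.0]   # looser fit elsewhere

threshold_a, flagged_a = flag_possible_outliers(residuals_model_a)
threshold_b, flagged_b = flag_possible_outliers(residuals_model_b)

print("model A:", round(threshold_a, 3), flagged_a)
print("model B:", round(threshold_b, 3), flagged_b)
```

The threshold is computed from the residuals of the model being fitted, so the same record can be flagged under one model and not under the other. The count of flagged points therefore reflects the criterion as much as the fit.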
