Data Format and Model Design help needed




Hi,

I wish to use ASREML to analyse a large dataset of horse race results.  I
have previously tried to use DFREML for this dataset with unsatisfactory
results, as it seems unable to handle the volume of data and often fails to
find results or just crashes.

I should add that I have virtually no comprehension of advanced statistics
and I am far from confident in my ability to use ASREML appropriately.  I
apologise for the possible naivety of the questions below, but I don't
wish to misuse the product.  If anyone is prepared to offer me advice
privately rather than through this list, I would be most grateful.

Below are approximate volumes of the data I possess.

Total horses:              100,000
Horses with race results:  60,000  (the others are Sires, Dams or Damsires)
Individual race results:   500,000 (one "result" for each horse in each race)

For each race result I have the following (conceptual) data structure:

Horse, Sire, Dam, Damsire, Age, Sex, Race Distance, Track Condition, Year,
Score1, Score2, Score3, Score4, Score5

where Race Distance is in metres, Track Condition consists of five
categories of how wet/dry the track was, and Year indicates whether the
result was in the first or second year for which I have data.
Scores 1 through 5 are assessments of the horse's performance using
different techniques such as time, earnings etc.
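
To make the structure above concrete, here is a minimal Python sketch of
how I picture it: one row per race result, with the horse and pedigree
identifiers repeated on every row.  The field names, the comma-separated
layout and the sample values are illustrative assumptions only, and I do not
know whether this is the layout ASREML expects.

```python
import csv
import io
from dataclasses import dataclass, astuple, fields


@dataclass
class RaceResult:
    """One row per horse per race (hypothetical field names)."""
    horse: str
    sire: str
    dam: str
    damsire: str
    age: int
    sex: str
    distance_m: int        # Race Distance in metres
    track_condition: int   # one of five wet/dry categories, coded 1..5 here
    year: int              # 1 = first year of data, 2 = second year
    score1: float
    score2: float
    score3: float
    score4: float
    score5: float


def to_long_format(results):
    """Write results as delimited text: a header line, then one line per result."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([f.name for f in fields(RaceResult)])
    for r in results:
        writer.writerow(astuple(r))
    return buf.getvalue()


# Made-up sample data: horse H1 has two results, horse H2 has one.
results = [
    RaceResult("H1", "S1", "D1", "DS1", 3, "M", 1200, 2, 1, 0.5, 1.0, 1.5, 2.0, 2.5),
    RaceResult("H1", "S1", "D1", "DS1", 3, "M", 1600, 4, 1, 0.7, 1.1, 1.4, 2.2, 2.6),
    RaceResult("H2", "S2", "D2", "S1", 4, "F", 1400, 1, 2, 0.3, 0.9, 1.2, 1.8, 2.1),
]
text = to_long_format(results)
print(text)
```

With this layout a horse with 70 results simply occupies 70 rows, so the
number of results per horse does not change the number of columns.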

It should be noted that my race results are all within one generation - no
sires or dams have results.  Some sires do however also exist as damsires.

Desired analyses:
1) Heritability estimates for each technique of determining Scores
2) Correlation / regression analyses for the techniques of determining
scores
3) Estimation of any maternal effect

Questions:
1) How do I construct a data structure (and appropriate .as file) for the
input which supports a highly variable number of race results per horse (1
to about 70) and also allows the multiple regression analyses?  Do these
analyses need to be performed separately?
2) Will ASREML be able to handle this volume of data for these calculations?
I am currently using a midrange PC running Windows.
3) Any suggestions on how to model Race Distance?  In previous simple
analysis of variance calculations I used distance ranges, but I have been
instructed not to do this if possible.  I don't consider Distance to be a
fixed effect, as different horses excel at different distances, although for
assessments such as race times or speed it clearly has a close to fixed
effect.
4) What is the correct Pin file for calculating heritability using Sire, Dam
and Damsire simultaneously?
 
Any assistance most appreciated.

regards
Stuart Williamson

PhD student
University of Melbourne


--
Asreml mailinglist archive: http://www.chiswick.anprod.csiro.au/lists/asreml