Re: Analysis of Microarray Data using ASReml

From: Arthur Gilmour <arthur.gilmour_at_AGRIC.NSW.GOV.AU>
Date: Wed, 8 Sep 2004 07:30:17 +1000

Dear Fisrek
re I am working on a gene expression (microarray) data set of 398000
observations. I fitted the following mixed model to normalize the data:
log2i ~ mu dye !r slide slide*dye
The model itself is not complicated, but ASReml failed to finish the
analysis. I increased the memory allocation three-fold (-S3), but that did
not help either. I received the following error message: "Severe <157>,
Program Exception-access violation". Is there a way to overcome this
problem? Can ASReml handle data sets this large?

*** On a PC, you can normally go up to -S8, and you probably need -S7
[1Mbyte] for this job. However, if physical memory is small, the job may
take a while to run because it will need to page heavily.

My second question is about fitting a gene-by-gene model using the
normalized data. In SAS this can be done using a 'BY' statement before
the model statements, as follows:

proc mixed data=&ds cl lognote;
     BY geneid;
     class dye slide sample;
     model log2in=dye sample/outp=array.gene_res;
     random slide;
run;

SAS runs the model for each gene separately. However, when the data are
large, the log window overflows and the job takes a long time to finish.
Is there a way in ASReml to fit the same model to each gene in turn and
to automate the job? Any comments will be appreciated.

*** It depends on how many levels of geneid you want to process. If there
are just a few, the model could be set up using

datafile !FILTER geneid !SELECT $1
log2in ~ mu dye !r slide

[I do not understand SAS code so may not have the correct model here]

Then run this from the command line [say in a .bat file, or just from the
command prompt]:

ASReml -s5r jobname 01 02 03

would do three runs [geneid==01, geneid==02, geneid==03], generating three
sets of output files. The arguments [01, 02, 03] are substituted into the
job in place of $1 [01 in the first run, 02 in the second, 03 in the third].
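When the list of gene IDs is long, typing them by hand becomes impractical. Below is a minimal Python sketch that shows one way the command lines could be generated instead. Several things in it are assumptions and are not part of the original reply:

- The data file is whitespace-delimited, and geneid sits in a known 0-based column.
- The function names `distinct_gene_ids` and `batch_commands` are hypothetical.
- The chunk size of 50 IDs per command is an arbitrary choice, meant only to keep each command line short.

The `-s5r` option string is copied from the example above and is not otherwise checked against the ASReml documentation.

```python
from itertools import islice
from typing import Iterable, List


def distinct_gene_ids(lines: Iterable[str], column: int, header_lines: int = 0) -> List[str]:
    """Return distinct gene IDs in order of first appearance.

    Assumes whitespace-delimited records with geneid in the 0-based
    `column`; the first `header_lines` lines are skipped.
    """
    seen: dict = {}  # dicts keep insertion order, so IDs stay in file order
    for line in islice(lines, header_lines, None):
        fields = line.split()
        if len(fields) > column:  # ignore blank or short lines
            seen.setdefault(fields[column], None)
    return list(seen)


def batch_commands(jobname: str, gene_ids: List[str],
                   options: str = "-s5r", per_call: int = 50) -> List[str]:
    """Build one 'ASReml <options> <jobname> id1 id2 ...' line per chunk of IDs.

    The chunk size `per_call` is a hypothetical limit chosen only to keep
    each command line short; it is not an ASReml setting.
    """
    if per_call < 1:
        raise ValueError("per_call must be at least 1")
    return [
        " ".join(["ASReml", options, jobname, *gene_ids[i:i + per_call]])
        for i in range(0, len(gene_ids), per_call)
    ]


if __name__ == "__main__":
    # Tiny made-up example: columns are geneid dye slide sample log2in.
    sample = [
        "geneid dye slide sample log2in",
        "1 1 1 1 7.2",
        "1 2 1 2 7.5",
        "2 1 2 1 6.1",
        "3 2 2 2 8.0",
    ]
    ids = distinct_gene_ids(sample, column=0, header_lines=1)
    # Each printed line could be pasted into a .bat file.
    print("\n".join(batch_commands("jobname", ids, per_call=2)))
```

Run on the sample above, this prints two lines, one for genes 1 and 2 and one for gene 3. Printing the commands, rather than launching ASReml from Python, keeps the sketch independent of how ASReml is installed.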

Using a DIAG structure to get variance components for each GENE, as
suggested by Bruce, is probably the best approach, again if I have
interpreted the problem correctly.

Best wishes in the Name of Christ Jesus,
the Saviour of all who believe.

Arthur Gilmour PhD
Principal Research Scientist (Biometrics)
NSW Department of Primary Industries
Orange Agricultural Institute, Forest Rd, ORANGE, 2800, AUSTRALIA
fax: 02 6391 3899; 02 6391 3922 Australia +61
telephone work: 02 6391 3815; home: 02 6364 3288;
mobile: 04 2764 3288 (no reception at home)

I expect to be on leave from 27 September to 8 October.

Received on Wed Sep 08 2004 - 07:30:17 EST

This webpage is part of the ASReml-l discussion list archives 2004-2010. More information on ASReml can be found at the VSN website. This discussion list is now deprecated - please use the VSN forum for discussion on ASReml. (These online archives were generated using the hypermail package.)