Dear All,
We tried to use ASREML to model a multivariate outcome. We simulated five correlated traits on four members of independent nuclear families. Each family has four individuals, two parents and two children. We simulated the genotype of a SNP, and tried to model the relationship between the SNP and the multivariate phenotype (P1 - P5).
Below is an extract of the first and last few lines of the data file:
PID
P1
P2
P3
P4
P5
geno
1
-1.604
-0.069
-1.274
0.652
-0.978
0
2
1.023
1.933
1.972
1.352
0.438
-1
3
1.221
-0.157
0.494
0.099
-0.318
0
4
-0.332
0.08
-0.158
1.457
-0.036
-1
5
0.909
0.381
1.722
1.701
-0.226
0
6
1.031
-0.369
0.064
0.233
-0.199
1
7
0.704
0.587
1.358
1.312
0.732
1
8
0.115
1.217
0.34
-0.46
0.476
1
9
-0.204
0.024
1.836
0.658
0.26
0
10
0.444
-0.121
1.01
0.372
0.438
1
.
.
.
391
1.81
1.833
1.516
2.874
0.74
0
392
-0.632
-0.377
-0.289
-0.803
-1.201
-1
393
0.562
0.178
0.293
-0.237
-0.322
0
394
-0.835
-1.441
1.779
0.266
-0.389
1
395
0.098
-0.565
-0.536
-1.34
-1.005
0
396
-0.771
0.554
-2.058
-1.563
-0.733
1
397
0.956
-0.793
1.419
-0.589
-0.002
0
398
-0.035
-0.545
-1.247
-0.473
-0.578
0
399
-1.125
-0.554
0.387
-1.833
0.788
-1
400
-0.087
0.43
0.378
0.223
-0.887
0
Our simulation and analysis models are the same as follows: For each family, ykl, the kth measurement on the lth person in that family is distributed as follows:
where k is additive genetic effect size, with X(.) being the additive score of the coded allele. is the residual covariance matrix as the follows:
where and are the polygenic and error variances of the kth phenotype respectively, the multivariate phenotype variance within an individual, is the correlation between phenotype k and l attributable to multivariate phenotype variance, and ¥Õ is the kinship coefficient between the two person (in this case, ¨ù), and Please see the attached EXCEL file (COV.xls) for details about
We tried the following model to fit the above model. The Fortran code and corresponding outputs are pasted. For your convenience, one simulation data set (widedataset.txt) and the pedigree file (Family.ped.txt) are also attached.
Model:
(R for multivariate phenotype variance within an individual, G for polygenic and error variances)
ASReml Fortran
PID !P # individuals coded as per pedigree and with associated Ainverse
P1
P2
P3
P4
P5
geno
Family.ped.txt !skip 1 # The pedigree is read and Ainverse is formed
widedataset.txt !skip 1 !MAXIT 500 !ASUV !DDF
P1 P2 P3 P4 P5 ~ Trait.geno !r PID.Trait Trait.PID
1 2 2
400 0 ID
5 0 CORGV
10*0.1
0.1 !S2==1
PID.Trait 2
PID 0 AINV
Trait 0 CORUH 1
5*0.4
Trait.PID 1
2000 0 IDV 0.5
ASReml 2.00a [01 Jul 2006] ASReml Fortran
Build: v [27 Feb 2007] 64 bit
29 Oct 2009 18:06:02.607 32.00 Mbyte Linux (x64) asreml1
Licensed to: School of Medicine
***********************************************************
* SYNTAX change: A/B now means A A.B *
* *
* Contact support_at_asreml.co.uk for licensing and support *
***************************************************** ARG *
Folder: /data/home/wuh/ASReml
PID !P
A-inverse retrieved from ainverse.bin
PEDIGREE [Family.ped.txt ] has 400 identities, 900 Non zero elements
QUALIFIERS: !SKIP 1 !MAXIT 500 !ASUV !DDF
Reading widedataset.txt FREE FORMAT skipping 1 lines
Multivariate analysis of P1 P2 P3 P4
Multivariate analysis of P5
Warning: The default action with missing values in multivariate data
treated as univariate is to drop the missing values.
You should instead include the mv model term if different
residual variances are fitted.
Using 400 records of 400 read
Model term Size #miss #zero MinNon0 Mean MaxNon0
1 PID !P 400 0 0 1.000 200.5 400.0
2 P1 Variate 0 0 -3.004 0.5083E-02 2.757
3 P2 Variate 0 0 -3.176 -0.7188E-02 2.351
4 P3 Variate 0 0 -2.335 0.5909E-01 3.339
5 P4 Variate 0 0 -2.990 0.5701E-01 3.110
6 P5 Variate 0 0 -3.006 0.5356E-01 3.303
7 geno 0 201 -1.000 0.7500E-02 1.000
8 Trait 5
9 Trait.geno 5 8 Trait : 5 7 geno : 1
10 PID.Trait 2000 1 PID : 400 8 Trait : 5
11 Trait.PID 2000 8 Trait : 5 1 PID : 400
400 identity
5 CORRelation 0.1000 0.1000 0.1000 0.1000 0.1000 0.1000
0.1000 0.1000 0.1000 0.1000 0.1000
2000 records assumed pre-sorted 5 within 400
Warning: Unrecognised qualifier at character 5 !S2==1
400 Ainverse
5 CORRelation 1.0000 0.4000 0.4000 0.4000 0.4000 0.4000
Structure for PID.Trait has 2000 levels defined
2000 identity 0.5000
Structure for Trait.PID has 2000 levels defined
Forming 4005 equations: 5 dense.
Initial updates will be shrunk by factor 0.045
Notice: Algebraic ANOVA Denominator DF calculation is not available
Numerical derivatives will be used.
1 LogL=-768.581 S2= 1.0198 1995 df
Notice: 2 singularities appeared in Average Information matrix
This could be a problem of scale or a problem with the model.
It is preferable to revise the model to remove the singularity.
Specify !AISING qualifier to force the job to continue.
Source Model terms Gamma Component Comp/SE % C
Variance 2000 1995 1.00000 1.01512 0.00 0 S
Residual CORRelat 5 0.100000 0.100000 0.00 0 S
Residual CORRelat 5 -0.272305E-01 -0.272305E-01 -0.10 -17 U
Residual CORRelat 5 -0.276731E-01 -0.276731E-01 -0.10 -17 U
Residual CORRelat 5 -0.399757E-01 -0.399757E-01 -0.15 -18 U
Residual CORRelat 5 0.150869E-01 0.150869E-01 0.06 -12 U
Residual CORRelat 5 -0.237994E-01 -0.237994E-01 -0.09 -16 U
Residual CORRelat 5 0.614697E-02 0.614697E-02 0.02 -13 U
Residual CORRelat 5 0.181364E-01 0.181364E-01 0.07 -11 U
Residual CORRelat 5 -0.273173E-01 -0.273173E-01 -0.10 -16 U
Residual CORRelat 5 -0.351864E-01 -0.351864E-01 -0.13 -17 U
Residual CORRelat 5 0.501177 0.508752 0.96 61 U
PID.Trait CORRelat 5 0.978676 0.978676 15.16 -27 U
PID.Trait CORRelat 5 0.400151 0.406199 4.58 0 U
PID.Trait CORRelat 5 0.361202 0.366662 3.82 -32 U
PID.Trait CORRelat 5 0.414570 0.420837 4.90 14 U
PID.Trait CORRelat 5 0.429757 0.436253 5.24 29 U
PID.Trait CORRelat 5 0.408595 0.414771 4.76 8 U
Trait.PID identity 2000 0.905429E-01 0.919115E-01 0.18 -65 U
Warning: Code B - fixed at a boundary (!GP) F - fixed by user
? - liable to change from P to B P - positive definite
C - Constrained by user (!VCC) U - unbounded
S - Singular Information matrix
S means there is no information in the data for this parameter.
Very small components with Comp/SE ratios of zero sometimes indicate poor
scaling. Consider rescaling the design matrix in such cases.
Analysis of Variance NumDF DenDF F_inc Prob
9 Trait.geno 5 NA 2.84 NA
Notice: The DenDF values are calculated ignoring fixed/boundary/singular
variance parameters using numerical derivatives.
Estimate Standard Error T-value T-prev
9 Trait.geno
1 0.248071 0.736959E-01 3.37
2 0.308683E-01 0.733995E-01 0.42 -2.82
3 0.254145E-01 0.737932E-01 0.34 -0.07
4 0.457996E-01 0.739074E-01 0.62 0.26
5 0.287065E-01 0.737565E-01 0.39 -0.22
10 PID.Trait 2000 effects fitted
11 Trait.PID 2000 effects fitted
2 possible outliers: in section 11 (see .res file)
Finished: 29 Oct 2009 18:06:03.228 Singularity in Average Information Matrix
The questions we have are,
1. Does this program reflect our model correctly? If not, can you suggest a better one?
2. How to fix the singularity issue?
Thanks for your help. If you need more information, please feel free to let us know.
Hongsheng
This webpage is part of the ASReml-l discussion list archives 2004-2010. More information on ASReml can be found at the VSN website. This discussion list is now deprecated - please use the VSN forum for discussion on ASReml. (These online archives were generated using the hypermail package.)