Multivariate analysis

From: Wu, Hong-Sheng <wuh_at_WIT.EDU>
Date: Sun, 1 Nov 2009 22:52:00 -0500

Dear All,

We tried to use ASREML to model a multivariate outcome. We simulated five correlated traits on four members of independent nuclear families. Each family has four individuals, two parents and two children. We simulated the genotype of a SNP, and tried to model the relationship between the SNP and the multivariate phenotype (P1 - P5).

Below is an extract of the first and last few lines of the data file:

 

PID

P1

P2

P3

P4

P5

geno

1

-1.604

-0.069

-1.274

0.652

-0.978

0

2

1.023

1.933

1.972

1.352

0.438

-1

3

1.221

-0.157

0.494

0.099

-0.318

0

4

-0.332

0.08

-0.158

1.457

-0.036

-1

5

0.909

0.381

1.722

1.701

-0.226

0

6

1.031

-0.369

0.064

0.233

-0.199

1

7

0.704

0.587

1.358

1.312

0.732

1

8

0.115

1.217

0.34

-0.46

0.476

1

9

-0.204

0.024

1.836

0.658

0.26

0

10

0.444

-0.121

1.01

0.372

0.438

1

.

.

.

                                                
391

1.81

1.833

1.516

2.874

0.74

0

392

-0.632

-0.377

-0.289

-0.803

-1.201

-1

393

0.562

0.178

0.293

-0.237

-0.322

0

394

-0.835

-1.441

1.779

0.266

-0.389

1

395

0.098

-0.565

-0.536

-1.34

-1.005

0

396

-0.771

0.554

-2.058

-1.563

-0.733

1

397

0.956

-0.793

1.419

-0.589

-0.002

0

398

-0.035

-0.545

-1.247

-0.473

-0.578

0

399

-1.125

-0.554

0.387

-1.833

0.788

-1

400

-0.087

0.43

0.378

0.223

-0.887

0

 

Our simulation and analysis models are the same as follows: For each family, ykl, the kth measurement on the lth person in that family is distributed as follows:

 

where k is additive genetic effect size, with X(.) being the additive score of the coded allele. is the residual covariance matrix as the follows:

 

where and are the polygenic and error variances of the kth phenotype respectively, the multivariate phenotype variance within an individual, is the correlation between phenotype k and l attributable to multivariate phenotype variance, and is the kinship coefficient between the two person (in this case, ), and Please see the attached EXCEL file (COV.xls) for details about

We tried the following model to fit the above model. The Fortran code and corresponding outputs are pasted. For your convenience, one simulation data set (widedataset.txt) and the pedigree file (Family.ped.txt) are also attached.

Model:

(R for multivariate phenotype variance within an individual, G for polygenic and error variances)

 

ASReml Fortran

 PID !P # individuals coded as per pedigree and with associated Ainverse

 P1

 P2

 P3

 P4

 P5

 geno

Family.ped.txt !skip 1 # The pedigree is read and Ainverse is formed

widedataset.txt !skip 1 !MAXIT 500 !ASUV !DDF

P1 P2 P3 P4 P5 ~ Trait.geno !r PID.Trait Trait.PID

1 2 2

400 0 ID

5 0 CORGV

10*0.1

0.1 !S2==1

PID.Trait 2

PID 0 AINV

Trait 0 CORUH 1

5*0.4

Trait.PID 1

2000 0 IDV 0.5

 

 

ASReml 2.00a [01 Jul 2006] ASReml Fortran

     Build: v [27 Feb 2007] 64 bit

 29 Oct 2009 18:06:02.607 32.00 Mbyte Linux (x64) asreml1

 Licensed to: School of Medicine

 ***********************************************************

 * SYNTAX change: A/B now means A A.B *

 * *

 * Contact support_at_asreml.co.uk for licensing and support *

 ***************************************************** ARG *

 Folder: /data/home/wuh/ASReml

  PID !P

 A-inverse retrieved from ainverse.bin

 PEDIGREE [Family.ped.txt ] has 400 identities, 900 Non zero elements

 QUALIFIERS: !SKIP 1 !MAXIT 500 !ASUV !DDF

 Reading widedataset.txt FREE FORMAT skipping 1 lines

 

 Multivariate analysis of P1 P2 P3 P4

 

 Multivariate analysis of P5

 

 Warning: The default action with missing values in multivariate data

          treated as univariate is to drop the missing values.

          You should instead include the mv model term if different

          residual variances are fitted.

 Using 400 records of 400 read

 

  Model term Size #miss #zero MinNon0 Mean MaxNon0

   1 PID !P 400 0 0 1.000 200.5 400.0

   2 P1 Variate 0 0 -3.004 0.5083E-02 2.757

   3 P2 Variate 0 0 -3.176 -0.7188E-02 2.351

   4 P3 Variate 0 0 -2.335 0.5909E-01 3.339

   5 P4 Variate 0 0 -2.990 0.5701E-01 3.110

   6 P5 Variate 0 0 -3.006 0.5356E-01 3.303

   7 geno 0 201 -1.000 0.7500E-02 1.000

   8 Trait 5

   9 Trait.geno 5 8 Trait : 5 7 geno : 1

  10 PID.Trait 2000 1 PID : 400 8 Trait : 5

  11 Trait.PID 2000 8 Trait : 5 1 PID : 400

    400 identity

      5 CORRelation 0.1000 0.1000 0.1000 0.1000 0.1000 0.1000

    0.1000 0.1000 0.1000 0.1000 0.1000

    2000 records assumed pre-sorted 5 within 400

 Warning: Unrecognised qualifier at character 5 !S2==1

    400 Ainverse

      5 CORRelation 1.0000 0.4000 0.4000 0.4000 0.4000 0.4000

 

 Structure for PID.Trait has 2000 levels defined

   2000 identity 0.5000

 Structure for Trait.PID has 2000 levels defined

 Forming 4005 equations: 5 dense.

 Initial updates will be shrunk by factor 0.045

 Notice: Algebraic ANOVA Denominator DF calculation is not available

         Numerical derivatives will be used.

   1 LogL=-768.581 S2= 1.0198 1995 df

 Notice: 2 singularities appeared in Average Information matrix

          This could be a problem of scale or a problem with the model.

          It is preferable to revise the model to remove the singularity.

          Specify !AISING qualifier to force the job to continue.

 

 Source Model terms Gamma Component Comp/SE % C

 Variance 2000 1995 1.00000 1.01512 0.00 0 S

 Residual CORRelat 5 0.100000 0.100000 0.00 0 S

 Residual CORRelat 5 -0.272305E-01 -0.272305E-01 -0.10 -17 U

 Residual CORRelat 5 -0.276731E-01 -0.276731E-01 -0.10 -17 U

 Residual CORRelat 5 -0.399757E-01 -0.399757E-01 -0.15 -18 U

 Residual CORRelat 5 0.150869E-01 0.150869E-01 0.06 -12 U

 Residual CORRelat 5 -0.237994E-01 -0.237994E-01 -0.09 -16 U

 Residual CORRelat 5 0.614697E-02 0.614697E-02 0.02 -13 U

 Residual CORRelat 5 0.181364E-01 0.181364E-01 0.07 -11 U

 Residual CORRelat 5 -0.273173E-01 -0.273173E-01 -0.10 -16 U

 Residual CORRelat 5 -0.351864E-01 -0.351864E-01 -0.13 -17 U

 Residual CORRelat 5 0.501177 0.508752 0.96 61 U

 PID.Trait CORRelat 5 0.978676 0.978676 15.16 -27 U

 PID.Trait CORRelat 5 0.400151 0.406199 4.58 0 U

 PID.Trait CORRelat 5 0.361202 0.366662 3.82 -32 U

 PID.Trait CORRelat 5 0.414570 0.420837 4.90 14 U

 PID.Trait CORRelat 5 0.429757 0.436253 5.24 29 U

 PID.Trait CORRelat 5 0.408595 0.414771 4.76 8 U

 Trait.PID identity 2000 0.905429E-01 0.919115E-01 0.18 -65 U

 Warning: Code B - fixed at a boundary (!GP) F - fixed by user

               ? - liable to change from P to B P - positive definite

               C - Constrained by user (!VCC) U - unbounded

               S - Singular Information matrix

 S means there is no information in the data for this parameter.

 Very small components with Comp/SE ratios of zero sometimes indicate poor

           scaling. Consider rescaling the design matrix in such cases.

 

 Analysis of Variance NumDF DenDF F_inc Prob

   9 Trait.geno 5 NA 2.84 NA

 Notice: The DenDF values are calculated ignoring fixed/boundary/singular

             variance parameters using numerical derivatives.

 

                     Estimate Standard Error T-value T-prev

   9 Trait.geno

                    1 0.248071 0.736959E-01 3.37

                    2 0.308683E-01 0.733995E-01 0.42 -2.82

                    3 0.254145E-01 0.737932E-01 0.34 -0.07

                    4 0.457996E-01 0.739074E-01 0.62 0.26

                    5 0.287065E-01 0.737565E-01 0.39 -0.22

  10 PID.Trait 2000 effects fitted

  11 Trait.PID 2000 effects fitted

         2 possible outliers: in section 11 (see .res file)

 Finished: 29 Oct 2009 18:06:03.228 Singularity in Average Information Matrix

 

The questions we have are,

1. Does this program reflect our model correctly? If not, can you suggest a better one?

2. How to fix the singularity issue?

Thanks for your help. If you need more information, please feel free to let us know.

Hongsheng

 

image019.png
image020.png
image021.png
image022.png
image023.png
image024.png
image025.png
image026.png
image027.png
Received on Mon Nov 01 2009 - 22:52:00 EST

This webpage is part of the ASReml-l discussion list archives 2004-2010. More information on ASReml can be found at the VSN website. This discussion list is now deprecated - please use the VSN forum for discussion on ASReml. (These online archives were generated using the hypermail package.)