# Multivariate analysis

From: Wu, Hong-Sheng <wuh_at_WIT.EDU>
Date: Sun, 1 Nov 2009 22:52:00 -0500

Dear All,

We tried to use ASREML to model a multivariate outcome. We simulated five correlated traits on four members of independent nuclear families. Each family has four individuals, two parents and two children. We simulated the genotype of a SNP, and tried to model the relationship between the SNP and the multivariate phenotype (P1 - P5).

Below is an extract of the first and last few lines of the data file:

PID

P1

P2

P3

P4

P5

geno

1

-1.604

-0.069

-1.274

0.652

-0.978

0

2

1.023

1.933

1.972

1.352

0.438

-1

3

1.221

-0.157

0.494

0.099

-0.318

0

4

-0.332

0.08

-0.158

1.457

-0.036

-1

5

0.909

0.381

1.722

1.701

-0.226

0

6

1.031

-0.369

0.064

0.233

-0.199

1

7

0.704

0.587

1.358

1.312

0.732

1

8

0.115

1.217

0.34

-0.46

0.476

1

9

-0.204

0.024

1.836

0.658

0.26

0

10

0.444

-0.121

1.01

0.372

0.438

1

.

.

.

391

1.81

1.833

1.516

2.874

0.74

0

392

-0.632

-0.377

-0.289

-0.803

-1.201

-1

393

0.562

0.178

0.293

-0.237

-0.322

0

394

-0.835

-1.441

1.779

0.266

-0.389

1

395

0.098

-0.565

-0.536

-1.34

-1.005

0

396

-0.771

0.554

-2.058

-1.563

-0.733

1

397

0.956

-0.793

1.419

-0.589

-0.002

0

398

-0.035

-0.545

-1.247

-0.473

-0.578

0

399

-1.125

-0.554

0.387

-1.833

0.788

-1

400

-0.087

0.43

0.378

0.223

-0.887

0

Our simulation and analysis models are the same as follows: For each family, ykl, the kth measurement on the lth person in that family is distributed as follows:

where k is additive genetic effect size, with X(.) being the additive score of the coded allele. is the residual covariance matrix as the follows:

where and are the polygenic and error variances of the kth phenotype respectively, the multivariate phenotype variance within an individual, is the correlation between phenotype k and l attributable to multivariate phenotype variance, and ¥Õ is the kinship coefficient between the two person (in this case, ¨ù), and Please see the attached EXCEL file (COV.xls) for details about

We tried the following model to fit the above model. The Fortran code and corresponding outputs are pasted. For your convenience, one simulation data set (widedataset.txt) and the pedigree file (Family.ped.txt) are also attached.

Model:

(R for multivariate phenotype variance within an individual, G for polygenic and error variances)

ASReml Fortran

PID !P # individuals coded as per pedigree and with associated Ainverse

P1

P2

P3

P4

P5

geno

Family.ped.txt !skip 1 # The pedigree is read and Ainverse is formed

widedataset.txt !skip 1 !MAXIT 500 !ASUV !DDF

P1 P2 P3 P4 P5 ~ Trait.geno !r PID.Trait Trait.PID

1 2 2

400 0 ID

5 0 CORGV

10*0.1

0.1 !S2==1

PID.Trait 2

PID 0 AINV

Trait 0 CORUH 1

5*0.4

Trait.PID 1

2000 0 IDV 0.5

ASReml 2.00a [01 Jul 2006] ASReml Fortran

Build: v [27 Feb 2007] 64 bit

29 Oct 2009 18:06:02.607 32.00 Mbyte Linux (x64) asreml1

Licensed to: School of Medicine

***********************************************************

* SYNTAX change: A/B now means A A.B *

* *

* Contact support_at_asreml.co.uk for licensing and support *

***************************************************** ARG *

Folder: /data/home/wuh/ASReml

PID !P

A-inverse retrieved from ainverse.bin

PEDIGREE [Family.ped.txt ] has 400 identities, 900 Non zero elements

QUALIFIERS: !SKIP 1 !MAXIT 500 !ASUV !DDF

Reading widedataset.txt FREE FORMAT skipping 1 lines

Multivariate analysis of P1 P2 P3 P4

Multivariate analysis of P5

Warning: The default action with missing values in multivariate data

treated as univariate is to drop the missing values.

You should instead include the mv model term if different

residual variances are fitted.

Using 400 records of 400 read

Model term Size #miss #zero MinNon0 Mean MaxNon0

1 PID !P 400 0 0 1.000 200.5 400.0

2 P1 Variate 0 0 -3.004 0.5083E-02 2.757

3 P2 Variate 0 0 -3.176 -0.7188E-02 2.351

4 P3 Variate 0 0 -2.335 0.5909E-01 3.339

5 P4 Variate 0 0 -2.990 0.5701E-01 3.110

6 P5 Variate 0 0 -3.006 0.5356E-01 3.303

7 geno 0 201 -1.000 0.7500E-02 1.000

8 Trait 5

9 Trait.geno 5 8 Trait : 5 7 geno : 1

10 PID.Trait 2000 1 PID : 400 8 Trait : 5

11 Trait.PID 2000 8 Trait : 5 1 PID : 400

400 identity

5 CORRelation 0.1000 0.1000 0.1000 0.1000 0.1000 0.1000

0.1000 0.1000 0.1000 0.1000 0.1000

2000 records assumed pre-sorted 5 within 400

Warning: Unrecognised qualifier at character 5 !S2==1

400 Ainverse

5 CORRelation 1.0000 0.4000 0.4000 0.4000 0.4000 0.4000

Structure for PID.Trait has 2000 levels defined

2000 identity 0.5000

Structure for Trait.PID has 2000 levels defined

Forming 4005 equations: 5 dense.

Initial updates will be shrunk by factor 0.045

Notice: Algebraic ANOVA Denominator DF calculation is not available

Numerical derivatives will be used.

1 LogL=-768.581 S2= 1.0198 1995 df

Notice: 2 singularities appeared in Average Information matrix

This could be a problem of scale or a problem with the model.

It is preferable to revise the model to remove the singularity.

Specify !AISING qualifier to force the job to continue.

Source Model terms Gamma Component Comp/SE % C

Variance 2000 1995 1.00000 1.01512 0.00 0 S

Residual CORRelat 5 0.100000 0.100000 0.00 0 S

Residual CORRelat 5 -0.272305E-01 -0.272305E-01 -0.10 -17 U

Residual CORRelat 5 -0.276731E-01 -0.276731E-01 -0.10 -17 U

Residual CORRelat 5 -0.399757E-01 -0.399757E-01 -0.15 -18 U

Residual CORRelat 5 0.150869E-01 0.150869E-01 0.06 -12 U

Residual CORRelat 5 -0.237994E-01 -0.237994E-01 -0.09 -16 U

Residual CORRelat 5 0.614697E-02 0.614697E-02 0.02 -13 U

Residual CORRelat 5 0.181364E-01 0.181364E-01 0.07 -11 U

Residual CORRelat 5 -0.273173E-01 -0.273173E-01 -0.10 -16 U

Residual CORRelat 5 -0.351864E-01 -0.351864E-01 -0.13 -17 U

Residual CORRelat 5 0.501177 0.508752 0.96 61 U

PID.Trait CORRelat 5 0.978676 0.978676 15.16 -27 U

PID.Trait CORRelat 5 0.400151 0.406199 4.58 0 U

PID.Trait CORRelat 5 0.361202 0.366662 3.82 -32 U

PID.Trait CORRelat 5 0.414570 0.420837 4.90 14 U

PID.Trait CORRelat 5 0.429757 0.436253 5.24 29 U

PID.Trait CORRelat 5 0.408595 0.414771 4.76 8 U

Trait.PID identity 2000 0.905429E-01 0.919115E-01 0.18 -65 U

Warning: Code B - fixed at a boundary (!GP) F - fixed by user

? - liable to change from P to B P - positive definite

C - Constrained by user (!VCC) U - unbounded

S - Singular Information matrix

S means there is no information in the data for this parameter.

Very small components with Comp/SE ratios of zero sometimes indicate poor

scaling. Consider rescaling the design matrix in such cases.

Analysis of Variance NumDF DenDF F_inc Prob

9 Trait.geno 5 NA 2.84 NA

Notice: The DenDF values are calculated ignoring fixed/boundary/singular

variance parameters using numerical derivatives.

Estimate Standard Error T-value T-prev

9 Trait.geno

1 0.248071 0.736959E-01 3.37

2 0.308683E-01 0.733995E-01 0.42 -2.82

3 0.254145E-01 0.737932E-01 0.34 -0.07

4 0.457996E-01 0.739074E-01 0.62 0.26

5 0.287065E-01 0.737565E-01 0.39 -0.22

10 PID.Trait 2000 effects fitted

11 Trait.PID 2000 effects fitted

2 possible outliers: in section 11 (see .res file)

Finished: 29 Oct 2009 18:06:03.228 Singularity in Average Information Matrix

The questions we have are,

1. Does this program reflect our model correctly? If not, can you suggest a better one?

2. How to fix the singularity issue?

Thanks for your help. If you need more information, please feel free to let us know.

Hongsheng

• application/vnd.ms-excel attachment: COV.xls
Received on Mon Nov 01 2009 - 22:52:00 EST

This webpage is part of the ASReml-l discussion list archives 2004-2010. More information on ASReml can be found at the VSN website. This discussion list is now deprecated - please use the VSN forum for discussion on ASReml. (These online archives were generated using the hypermail package.)