Re: Further problem with prediction

To: "N.W. Galwey" <ngalwey@cyllene.uwa.edu.au>
Subject: Re: Further problem with prediction
From: "B. Cullis" <cullisb@wagsun.agric.nsw.gov.au>
Date: Fri, 30 Oct 1998 08:29:38 +1100 (EST)
cc: asreml@ram.chiswick.anprod.csiro.au
In-Reply-To: <199810290505.NAA13300@cyllene.uwa.edu.au>
Sender: owner-asreml@chiswick.anprod.csiro.au

Dear Nick,
We have a similar analysis in the SPLines paper we are presenting in 
South Africa. It may be useful to read it and look at the attached .as 
file that produced the analysis.

On Thu, 29 Oct 1998, N.W. Galwey wrote:

> I would be grateful for feedback on the ASREML command file and output
> embedded below.   The objective is to determine the relationship between
> field establishment (plants/m2) and grain yield (t/ha) in three varieties of
> lupin.   
> 
> I'd like to know whether the output indicates that the model I am fitting is
> appropriate.   In particular:
> 
> i.  I put two starting values after the codeword AR with the intention of
> getting first- and second-lag autocorrelations.   Is this the correct method?
yes I believe so.
> 
> ii.  If so, does the output indicate that the inclusion of second-lag
> autocorrelation gives a significantly better fit?   I am assuming that it
> does because the Compnt/StndErr values for both AR terms are quite large
> (>3).   
The most reliable way to do this is to drop the ar(2) parameter and 
examine the change in REMLlog lik. 
> 
> iii.   Why are the gamma and component values for these two terms so
> similar?   Is this just coincidence?
I presume you mean the coeffs for the row and column terms. this is no 
coincidence. They are the same. These are not components, but correlations. 

> > iv.   I don't want to fit any column effects.   So is 
the codeword IDEN
> okay?   How should I decide whether the identify matrix should be scaled?
> 
No scaling is required we are fitting a separable model to the row.column 
terms. This is usually the error term as it uniquely defines the plots. 
The model is
var(vec(e)) = \sigma^2 C_c x C_r
where the C_i's are the correlation matrices for cols and rows.



> v.  Is it okay to fit a linear effect of the variate ESTAB (and its
> interaction with Variety), to account for any overall trend, before
> including the spline in the model?
This must be done, otherwise you dont have a natural cubic smoothing 
spline (NCS). The spl() term only includes the nonlinear component. You 
dont need c(variety) in the random part, as this is only relevant for fixed 
effects.


> 
> vi.  Should the spline (and its interaction with Variety) be put in the
> random effects model?   If so, will the variance explained by the spline be
> included in the error against which the linear trend is tested? 
> 
Definitely! Yes. But my comment above stills holds. Dropping linear terms 
implies you no longer are fitting a NCS.


> vii.  Am I right in thinking that it makes no difference to the fitted value
> of a data point whether a term (e.g. spl(ESTAB)) is fitted as a fixed or a
> random effect?
> 
No. it must be a random effect. I make a few more points further down in 
the analysis which are relevant.


> The desired outcome is a curve relating yield to establishment in each
> variety, and it is therefore necessary to obtain predicted values.   I would
> like to avoid the complexities of setting up a .pin file, and the ASREML
> manual suggests adding some "missing data" and fitting them with "missing
> values" (p 86, version of 2 October 1998).   However, it seems to me that
> this approach won't work for a spatial analysis, as each observation must be
> allocated to a row and column.   I had hoped that the row and column values
> would not influence the fitted value (just as the variogram is not
> influenced by whether AR terms are fitted), but it turns out that they do
> have an effect.   Any suggestions?
> 

see further down



> __________________
> 
> 96wh14.as
> 
> Restricted branching lupin - density trials
>  Variety   3 !A
>  TDENS     6
>  Rep       5
>  PLOT     90
>  COL       2
>  ROW     101
>  THA
>  ESTAB
> 
> C:\DOCS\Dracup_M\96WH14\96WH14.dat !skip 1 !maxit 20
> 
> THA ~ mu mv c(Variety) ESTAB c(Variety).ESTAB,
>  !r Rep spl(ESTAB) c(Variety).spl(ESTAB) 
> 1 2
> ROW ROW AR .1 .1
> COL COL IDEN
> ______________________
> 96wh14.asr
> 
>   ASREML [30 Sep 1998]  Restricted branching lupin - density trials
> 
>  10/29/98 12:09:43.22       8.00 Mbyte    c:\DOCS\Dracup_M\96WH14\96wh14.as
>  QUALIFIERS: !skip 1                                                     
>  Reading C:\DOCS\Dracup_M\96WH14\96WH14.dat  FREE FORMAT skipping            1
>  lines
>  Univariate analysis of THA                 
>  Using      108 records [of     108 read from     108 lines of
> C:\DOCS\Dracup_M\96W]
>   Model term      Size Type    COL   Minimum    Mean      Maximum   #zero #miss
>    1 Variety         3 Factor    1      1     2.0000          3         0    18
>    2 TDENS           6 Factor    2     25    71.6667        125         0    18
>   WARNING - More levels in factor than expected
>    3 Rep             5 Factor    3      1     3.0000          5         0    18
>    4 PLOT           90 Factor    4      1    45.5000         90         0    18
>    5 COL             2 Factor    5      1     1.5000          2         0     0
>    6 ROW            54 Factor    6      1    27.5000         54         0     0
>    7 THA             1 Variate   7 0.7530      1.014      1.428         0    20
>    8 ESTAB           1 Covariat  8  19.60      71.19      137.6         0    18
>    9 mu              1 Constant Term
>   10 mv_estimates   20 Missing value
>   11 c(Variety)      2 Factor    1      1     2.0000          3         0    18
>   12 c(Variety).E    2 Interaction 11 c(Vari:    2    8 ESTAB         :    1
>   13 spl(ESTAB)     47 Spline    8   19.60     71.19      137.6         0    18
>   14 c(Variety).s   94 Interaction 11 c(Vari:    2   13 spl(ESTAB)    :   47
>     54  AR=AutoR    0.10    0.10
>      2  identity
>  Forming          173 equations:           27 dense
>  Initial updates will be shrunk by factor    0.548
>  LogL= 80.6926     S2= 0.11703E-01     82 df  0.10000    0.10000    0.10000    
>    1.0000    0.10000    0.10000    
>  LogL= 104.813     S2= 0.11911E-01     82 df  0.16662    0.10000E-010.10000E-01
>    1.0000    0.21445    0.27837    
>  LogL= 116.546     S2= 0.13897E-01     82 df  0.30356    0.10000E-020.10000E-02
>    1.0000    0.30181    0.35295    
>  LogL= 121.136     S2= 0.15483E-01     82 df  0.40011    0.10000E-030.10000E-03
>    1.0000    0.35163    0.34838    
>  LogL= 122.124     S2= 0.14720E-01     82 df  0.44461    0.58089E-040.10000E-04
>    1.0000    0.35465    0.31151    
>  LogL= 122.722     S2= 0.18048E-01     82 df  0.42452    0.49216E-040.10000E-06
>    1.0000    0.34659    0.41749    
>  LogL= 122.719     S2= 0.20917E-01     82 df  0.36671    0.45774E-040.10000E-06
>    1.0000    0.40451    0.39840    
>  Final parameter values                       0.56309    0.65929E-040.10000E-06
>    1.0000    0.37353    0.37798    
> 
>    Source               Model  terms     Gamma     Component    Compnt/StndErr
>   Rep                       5      5  0.563094      0.117781E-01      1.77 P  
>   spl(ESTAB)               47     47  0.659291E-04  0.137902E-05      0.75 P  
>   c(Variety).spl(ESTAB     94     94  0.100000E-06  0.209167E-08      1.86 B  
>    Variance               108     82   1.00000      0.209167E-01      1.86 P  
>    Residual           AR=AutoR    54  0.373534      0.373534          3.20 U  
>    Residual           AR=AutoR    54  0.377980      0.377980          3.20 U  
################################################
     POINTS TO NOTE
###############################################

1. I wouldnt normally include Rep as a fixed effect in a spatial 
analysis! This would need some reasonable diagnostic to convince me that 
it was necessary. The gamma seems large so I am suspicious that there is 
something strange with the field layout. Were the replicates adjacent. Is 
the field layout correct. The data must be exhibiting large trend!!

2. the gamma for variety.spl() is zero. This could mean that ESTAB may 
need scaling or it is zero. Try rerunning the job with ESTAB /100. Gammas 
for spl() terms are scale dependant, just like regression coeff's so be 
careful.

3. There are too many knots in ESTAB (ie 47!). I presume this trial was 
designed 
as a density by variety trial, with target densities. I would try the 
target densities as input for the spline. This is an interesting, as yet 
unsolved problem. My pragmatic approach is to choose knot points for 
which there is coverage of the design space for the 'x' but only enough 
to be sensible. Marx and Eilers have done some work in this area but I 
haven't fully investigated the issue. I certainly dont think ASREML has 
it right yet. We (ie Robin and I) have some notes on how to fit models 
with more data than knots but it is not in ASREML yet. This data set is 
an exmaple of one where you should nominate knots and then use the 
results. The current strategy for these data would be to use the target 
density as a sensible 'x'.

4. Getting fitted splines!!!!!! Well this is tricky. I dont use pin files 
(sorry Arthur) I prefer to reconstruct the spline myself. The output in 
the .asr file is all you need, along with some code to generate the 
spline design matrix (Z_s) which we have, if you would like it. You can 
turn it into a GENSTAT proc if you like ie pick up the output from .asr 
and use GENSTAT to get the fitted spline. The .vrb file can be used to 
get prediction intervals.
The output below gives the values of Z_su_s for spl() and variety.spl().
You simply add these to the linear part ie for variety 1

fitspl = 157.88 - .0355 - .001966*ESTAB +.000582*ESTAB + spl() + 
var1.spl()
          mu      MERRIT    ESTAB          c(var).ESTAB   spl() v1.spl()



Ie there are bits from the usual linear part and then add the two spl() 
parts. 

4. I get nervous about ROW and COL as you have it. I always do it as I 
have it in the attached file. The data is sorted into rows within cols 
and then I cant be worried about whether ASREML is fitting the model I 
want.


Hope this helps


cheers
brian





>  WARN ING: Code B - fixed at a boundary (!GP)
>                C - Constrained by user (!CON)
>                S - Singular Information matrix
>  S means there is no information in the data for this parameter.
>  Very small components with Comp/SE ratios of zero sometimes indicate poor
>   scaling.  Consider rescaling the design matrix in such cases
>  Fitted Spline  49 (X) for spl(ESTAB)          
>     19.600    21.800    24.400    27.050    28.200    29.800    32.800    37.400
>     41.200    42.900    44.000    45.200    46.600    48.400    50.350    52.400
>     54.200    56.000    57.700    59.000    60.600    63.350    64.733    69.900
>     71.400    72.800    76.000    79.500    80.800    82.400    85.000    87.267
>     89.400    91.600    93.400    95.600    98.600   100.000   103.800   105.800
>    111.000   115.267   116.900   118.800   121.000   122.700   128.400   130.800
>    137.600
>  Fitted Spline  49 (Y) for spl(ESTAB)            1
>   0.028825  0.025985  0.022633  0.019228  0.017760  0.015734  0.011984  0.006362
>   0.001891 -0.000024 -0.001228 -0.002504 -0.003938 -0.005686 -0.007444 -0.009122
>  -0.010445 -0.011620 -0.012588 -0.013231 -0.013901 -0.014735 -0.015003 -0.015187
>  -0.015038 -0.014829 -0.014125 -0.013057 -0.012588 -0.011960 -0.010830 -0.009747
>  -0.008657 -0.007466 -0.006443 -0.005138 -0.003267 -0.002361  0.000183  0.001561
>   0.005231  0.008293  0.009465  0.010819  0.012369  0.013552  0.017448  0.019085
>   0.023756
>  ',                                              ,    0.03
>    '',,                                        ,'     0.02
>        '                                  ,,,''       0.01
>         ',,                            ,,'            0.00
>            ''',,                  ,,,''               0.00
>                 ''',,,,,,,,,,,''''                   -0.01
>                                                      -0.02
>                                                      -0.03
>  Fitted Spline  49 (Y) for c(Variety).spl(ESTAB  1
>  -0.74E-06 -0.58E-06 -0.40E-06 -0.23E-06 -0.16E-06 -0.58E-07  0.14E-06  0.50E-06
>   0.89E-06  0.11E-05  0.13E-05  0.15E-05  0.17E-05  0.20E-05  0.24E-05  0.29E-05
>   0.32E-05  0.35E-05  0.38E-05  0.39E-05  0.39E-05  0.37E-05  0.35E-05  0.20E-05
>   0.14E-05  0.73E-06 -0.97E-06 -0.30E-05 -0.37E-05 -0.45E-05 -0.57E-05 -0.65E-05
>  -0.70E-05 -0.74E-05 -0.76E-05 -0.76E-05 -0.72E-05 -0.69E-05 -0.59E-05 -0.51E-05
>  -0.26E-05 -0.24E-06  0.73E-06  0.19E-05  0.32E-05  0.42E-05  0.76E-05  0.91E-05
>   0.13E-04
>  '''''''''''''''''''''''''''''''''''''''''''''''''    0.00
>                                                       0.00
>                                                       0.00
>                                                      -0.01
>                                                      -0.01
>                                                      -0.01
>                                                      -0.01
>                                                      -0.01
>  Fitted Spline  49 (Y) for c(Variety).spl(ESTAB  2
>  -0.61E-05 -0.52E-05 -0.41E-05 -0.31E-05 -0.26E-05 -0.20E-05 -0.93E-06  0.40E-06
>   0.11E-05  0.13E-05  0.14E-05  0.16E-05  0.17E-05  0.17E-05  0.18E-05  0.17E-05
>   0.16E-05  0.15E-05  0.13E-05  0.12E-05  0.94E-06  0.53E-06  0.33E-06 -0.14E-06
>  -0.18E-06 -0.17E-06  0.13E-07  0.44E-06  0.65E-06  0.93E-06  0.14E-05  0.19E-05
>   0.22E-05  0.26E-05  0.28E-05  0.29E-05  0.30E-05  0.30E-05  0.27E-05  0.24E-05
>   0.13E-05  0.12E-06 -0.38E-06 -0.10E-05 -0.18E-05 -0.24E-05 -0.46E-05 -0.55E-05
>  -0.82E-05
>  '''''''''''''''''''''''''''''''''''''''''''''''''    0.00
>                                                       0.00
>                                                       0.00
>                                                      -0.01
>                                                      -0.01
>                                                      -0.01
>                                                      -0.01
>                                                      -0.01
>                      Solution       Standard Error    T-value     T-prev
>   12 c(Variety).ESTAB         2        3.46        3.46             [DF
> F_inc F_all]
>                     1   0.582442E-03   0.385741E-03      1.51
>                     2  -0.101474E-02   0.388952E-03     -2.61     -2.41
>    8 ESTAB                    1       23.84       24.87             [DF
> F_inc F_all]
>                     3  -0.196641E-02   0.394323E-03     -4.99
>   11 c(Variety)               2        0.37        2.08  0.5136E-01 [DF F_i
> F_a SED]
>  MERRIT                -0.305525E-01   0.302952E-01     -1.01
>  85SO46-37              0.607833E-01   0.298523E-01      2.04      1.73
>   10 mv_estimates            20       13.49       14.01             [DF
> F_inc F_all]
>                     6  -0.882628       0.916597E-01     -9.63
>                     7  -0.647879       0.941482E-01     -6.88      1.92
>                     8   -1.07879       0.116429         -9.27     -2.83
>                     9   -1.11726       0.122686         -9.11     -0.34
>                    10   -1.10920       0.133636         -8.30      0.07
>                    11   -1.12127       0.138845         -8.08     -0.10
>                    12   -1.12293       0.143739         -7.81     -0.01
>                    13   -1.12842       0.147031         -7.67     -0.05
>                    14   -1.13130       0.149708         -7.56     -0.02
>                    15   -1.13465       0.151712         -7.48     -0.03
>                    16   -1.13715       0.153298         -7.42     -0.02
>                    17   -1.13950       0.154532         -7.37     -0.02
>                    18   -1.14145       0.155512         -7.34     -0.02
>                    19   -1.14317       0.156290         -7.31     -0.01
>                    20   -1.14464       0.156913         -7.29     -0.01
>                    21   -1.14592       0.157416         -7.28     -0.01
>                    22   -1.14703       0.157823         -7.27     -0.01
>                    23   -1.14799       0.158154         -7.26     -0.01
>                    24   -1.14882       0.158426         -7.25     -0.01
>                    25   -1.14953       0.158650         -7.25     -0.01
>    9 mu                       1      157.88      287.57             [DF
> F_inc F_all]
>                    26    1.15412       0.680586E-01     16.96
>    3 Rep                      5 effects fitted
>   13 spl(ESTAB)              47 effects fitted
>   14 c(Variety).spl(ESTAB    94 effects fitted
>  Finished: 12:10:59.39                LogL Converged
> 
> _____________________________________________________________________
> N.W. Galwey,
> Faculty of Agriculture,
> University of Western Australia,
> Nedlands, WA 6709, Australia.
> 
> Tel.: +61 9 380 1959 (direct line)
>       +61 9 380 2554 (switchboard)
> Fax:  +61 9 380 1108
> 
> 

...............................................................................
      Brian Cullis                       Tel: 02 6938 1855
      NSW Agriculture                    Fax: 02 6938 1809
      Wagga Agricultural Institute     email: brian.cullis@agric.nsw.gov.au
      Pine Gully Rd
      WAGGA WAGGA  NSW 2650

met cowl
 num 1
 row 27
 column 6
 variety 9 !A
 seedrate 7 !I
 lincol 1
 linrow 1
 rcol 6
 rrow 27
 fixrow 4
 x 1
 site 11
 yield 1
 splrow 1
 trt 62 !A
cowlred.asd !skip 1 !dense 99
yield ~ mu mv c(site) x site.x,
site|2/lincol site|5/lincol,
site|6/lincol site|8/lincol,
site|9/lincol site|10/lincol,
site|3/linrow site|5/linrow,
site|7/linrow site|8/linrow,
site|2/c(fixrow) site|3/c(fixrow),
site|5/c(fixrow) site|7/c(fixrow),
site|8/c(fixrow) site|9/c(fixrow) site|11/c(fixrow),
!r spl(x) .00349 ,
variety .002 va.x .00046 va.spl(x) .000757,
si.va .018 si.va.x .00021,
site.spl(x) .000450 -si.va.spl(x) .00002,
-seedrate .001 -va.se .001 -si.se .001 -si.va.se .000113,
site|3/rrow .001 site|6/rrow .0017,
site|7/rrow .00046,
site|9/rrow .0045,
site|1/rcol .01 site|2/rcol .025 site|3/rcol .082,
-site|7/rcol .0031 site|9/rcol .0042 site|10/rcol .00043,
-rcol -rrow -rcol.rrow,
si|5/rc.rr .0042,
si|7/rc.rr .013,
si|8/rc.rr .013,
-si|10/rc.rr .0024
11 2 2
6 column AR .15 !S2=.034
16 row AR .21
6 column AR .16 !S2=.039
16 row AR .31
4 column AR .35 !S2=.067
16 row AR .17
6 column AR .04 !S2=.035
16 row AR .08
6 column AR .07 !S2=.018
16 row AR .58
6 column AR .22 !S2=.024
27 row AR .34
6 column AR .51 !S2=.041
27 row AR .92
6 column AR .33 !S2=.020
27 row AR .74
6 column AR -.02 !S2=.018
27 row AR .24
6 column AR .18 !S2=.0088
27 row AR .53
6 column AR .082 !S2=.054
27 row AR .21
variety 2
2 0 CORR .67 .0053 .000409
9 0 0
si.va 2
2 0 CORR .995 .018 .000178
99 0 0

References:
- Further problem with prediction
  - From: "N.W. Galwey" <ngalwey@cyllene.uwa.edu.au>

Prev by Date: Re: Thanks and another question
Next by Date: Re: Further problem with prediction
Prev by thread: Further problem with prediction
Next by thread: Re: Further problem with prediction
Index(es):
- Date
- Thread