Programme
The plan is for the workshop to start at …
Session 1: Chaired by Prof Jianxin Pan (The … University)
Talk: Modelling treatment-effect heterogeneity in complex intervention trials: mediation and moderation of psychological treatment effects
Talk: Minimum description length model selection and smoothing under linear mixed models
Session 2: Chaired by Prof Erkki Liski (…)
Talk: Efficiency of two-phase designs for dealing with measurement error
Talk: Multivariate nonparametric tests in a randomized complete block design
Session 3: Chaired by Prof Tata Subba-Rao (The …)
Talk: Analysis of Multivariate Growth Curves Using Smoothing Splines
Talk: Smoothing non-stationary correlated data
Session 4: Chaired by Dr Simo Puntanen (… Technology)
Talk: MDL knot selection for penalized splines
Talk: Frequency domain approach to linear and nonlinear regression in the case of correlated stationary errors with applications
Session 5: Chaired by Alex Donev (The …)
Talk: Estimation and Testing in Models with Multiple Parameter Changes
Talk: Some comments on the Watson efficiency in a linear model
Session 6: Chaired by Dr Jyrki Mottonen (…)
Talk: Statistical Challenges in Drug Development
Talk: Invariance of the BLUE under the linear fixed and mixed effects models
Session 7: Chaired by Dr Peter Foster (The …)
Talk: A Decade of Joint Mean-Covariance Modelling
Session 8: Chaired by Dr Tapio Nummi (…)
Talk: Variable selection in joint modelling of mean and covariance structures for longitudinal data
Details of Talks:
1. Modelling treatment-effect heterogeneity in complex intervention trials: mediation and moderation of psychological treatment effects
Graham Dunn
Health Methodology Research Group
Randomised controlled trials (RCTs) of complex interventions (such as psychotherapy) can serve both a pragmatic and an explanatory function. Through intelligent analysis of the mediating and/or modifying effects of intervening variables we should obtain important insights about the way interventions might work. At its best the complex intervention trial will be a sophisticated clinical experiment designed to test the theories motivating the intervention and gain further understanding of the nature of the clinical problem being treated. I describe instrumental variable (IV) methods for the estimation of the ‘dose’-response effects of psychological interventions in RCTs in which there is variability in the number of sessions of therapy attended, the effect of which is modified by the strength of the therapeutic alliance between patients and their therapists. The IV methods allow for (a) hidden confounding (selection effects), (b) measurement errors and (c) that alliance is only measured in those receiving treatment.
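To make the IV idea concrete — this is an illustrative numpy sketch with simulated data, not the estimator used in the talk — randomisation can serve as an instrument for the number of sessions attended when a hidden confounder drives both attendance and outcome:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
z = rng.integers(0, 2, n).astype(float)       # randomised offer (instrument)
u = rng.normal(size=n)                        # hidden confounder (selection effect)
d = 3.0 * z + u + rng.normal(size=n)          # sessions attended ('dose')
y = 0.5 * d + 2.0 * u + rng.normal(size=n)    # outcome; true dose effect is 0.5

# naive OLS of y on d is biased because u drives both d and y
ols = np.cov(y, d)[0, 1] / np.var(d, ddof=1)

# IV (Wald) estimator: randomisation affects y only through d
iv = np.cov(y, z)[0, 1] / np.cov(d, z)[0, 1]
```

Here the IV estimate is close to the true dose effect, while the naive slope absorbs the confounding.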
2. Minimum description length model selection and smoothing under linear mixed models
Erkki Liski
Department of Mathematics and Statistics
Models can be considered as statistical descriptions of observed data. Comparison between competing models is based on the stochastic complexity (SC) of each description. The normalized maximum likelihood form of the SC (Rissanen 1996) contains a component that may be interpreted as the parametric complexity of the model class. The SC for the data, relative to a class of suggested models, is calculated and the optimal model with the smallest SC is selected. This provides the minimum description length (MDL) model selection criterion (Rissanen 1978, 2007).
Many richly-parametrised models can be expressed as linear mixed models (LMM). Smoothing methods that use basis functions with penalisation can be formulated as ML estimators and best predictors in a LMM framework (Ruppert et al. 2003). The advent of mixed model software has made this approach successful also in the field of smoothing.
We derive an MDL model selection criterion for the Gaussian LMM and interpret the smoothing model as a LMM. In smoothing we control three modelling parameters: the degree of the regression spline, the number of knots and the smoothing parameter. A model is specified by this triple, and the values of the three modelling parameters should be determined in an optimal way. A model estimator is obtained by minimizing the selection criterion with respect to the model, that is, with respect to these three parameters, using numerical optimization routines.
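As a loose illustration of criterion-based selection of the number of knots — here a BIC-style complexity term stands in for the parametric complexity of the MDL criterion, and the basis, penalty and data are simplified assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

def spline_rss(x, y, n_knots, lam=1e-4, degree=3):
    """RSS of a penalized truncated-power-basis spline fit
    (ridge penalty on the knot coefficients only, echoing the
    mixed-model formulation of penalized splines)."""
    knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])
    X = np.column_stack([x ** p for p in range(degree + 1)] +
                        [np.clip(x - k, 0, None) ** degree for k in knots])
    P = np.diag([0.0] * (degree + 1) + [lam] * n_knots)
    beta = np.linalg.solve(X.T @ X + P, X.T @ y)
    return np.sum((y - X @ beta) ** 2)

# BIC-style criterion: data-fit term plus a per-parameter complexity term
n = x.size
scores = {K: n * np.log(spline_rss(x, y, K) / n) + (K + 4) * np.log(n)
          for K in (2, 5, 10, 20, 40)}
best_K = min(scores, key=scores.get)
```

The complexity term prevents the criterion from always preferring the most heavily knotted fit.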
3. Efficiency of two-phase designs for dealing with measurement error
Roseanne McNamee
Health Methodology Research Group
Measurement error can lead, for example, to attenuation of exposure-response relationships and to bias in estimation of disease prevalence. Accurate exposure measurement can be expensive. A two-phase study design for dealing with measurement error is one in which the first phase uses an error-prone exposure measure, Z, possibly in conjunction with other variables, while the second phase measures true exposure, X, in a subset of subjects. While there is a substantial literature on the analysis of data from two-phase designs, an important question for the researcher is whether there is any gain in efficiency, relative to cost, compared to using X alone. Optimal designs and the efficiency of a number of different designs – for prevalence estimation, for odds ratio estimation and for estimation of the linear regression of an outcome Y on X – will be discussed.
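A simulated sketch (illustrative assumptions only, not from the talk) of why a second phase helps: the naive slope of Y on the error-prone Z is attenuated, while a regression-calibration correction using a small validation subsample with true X recovers the target slope:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000
x = rng.normal(size=n)                    # true exposure X
z = x + rng.normal(scale=1.0, size=n)     # error-prone phase-one measure Z
y = 1.0 * x + rng.normal(size=n)          # outcome; true slope is 1.0

# naive regression of Y on Z: attenuated roughly by var(X)/var(Z)
naive = np.cov(y, z)[0, 1] / np.var(z, ddof=1)

# phase two: measure X on a 10% validation subsample, then apply
# regression calibration: regress X on Z in the subsample, use E[X|Z]
idx = rng.choice(n, size=n // 10, replace=False)
gamma = np.cov(x[idx], z[idx])[0, 1] / np.var(z[idx], ddof=1)
x_hat = gamma * z
corrected = np.cov(y, x_hat)[0, 1] / np.var(x_hat, ddof=1)
```

With these settings the naive slope is near 0.5 while the corrected slope is near the true value of 1.0, even though X is measured on only a tenth of subjects.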
4. Multivariate nonparametric tests in a randomized complete block design
Jyrki Mottonen
Department of Mathematics and Statistics
Multivariate extensions of the Friedman and Page tests for the comparison of several treatments are introduced. Related unadjusted and adjusted treatment effect estimates for the multivariate response variable are also found and their properties discussed. The test statistics and estimates are analogous to the traditional univariate methods. In test constructions, the univariate ranks are replaced by multivariate spatial ranks. Asymptotic theory is developed to provide approximations for the limiting distributions of the test statistics and estimates. Limiting efficiencies of the tests and treatment effect estimates are found in the multivariate normal and distribution cases. The theory is illustrated by an example.
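The multivariate spatial ranks that replace the univariate ranks in the test constructions can be computed directly; this is a generic sketch, not the authors' code:

```python
import numpy as np

def spatial_ranks(X):
    """Multivariate spatial ranks: R(x_i) = ave_j u(x_i - x_j),
    where u(v) = v / ||v|| is the spatial sign (with u(0) = 0)."""
    D = X[:, None, :] - X[None, :, :]              # all pairwise differences
    norms = np.linalg.norm(D, axis=2, keepdims=True)
    U = np.divide(D, norms, out=np.zeros_like(D), where=norms > 0)
    return U.mean(axis=1)

rng = np.random.default_rng(3)
R = spatial_ranks(rng.normal(size=(100, 3)))
```

By construction the ranks sum exactly to zero (antisymmetry of the spatial signs) and each rank has norm strictly below one.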
5. Analysis of Multivariate Growth Curves Using Smoothing Splines
Tapio Nummi
Department of Mathematics and Statistics
Growth curve data arise when repeated measurements are observed on a number of individuals with an ordered dimension for occasions. Such data arise in almost all fields in which statistical models are applied. Often, for example in medicine, more than one variable is measured on each occasion. However, statistical analyses applied to such data often emphasize the exploration of one variable only. The consequence is that the information contained in the between-variables correlation structure is lost. In this study we show how cubic smoothing splines can be applied to multivariate growth curve data. Some ideas for testing the mean curves are also presented. The analyses are illustrated by multivariate paper quality data.
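An illustrative sketch (with invented data, not the paper quality data) of fitting variable-by-variable cubic smoothing splines and then inspecting the between-variables correlation of the residuals — exactly the structure a purely univariate analysis would discard:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(4)
t = np.linspace(0, 10, 50)                         # measurement occasions
# two responses per occasion with correlated errors (hypothetical example)
e = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=t.size)
h = 10 + 2 * t + e[:, 0]
w = 5 + 0.5 * t ** 1.2 + e[:, 1]

# variable-by-variable cubic smoothing splines; s sets the roughness budget
sh = UnivariateSpline(t, h, k=3, s=t.size)
sw = UnivariateSpline(t, w, k=3, s=t.size)
resid = np.column_stack([h - sh(t), w - sw(t)])
# the between-variables correlation left after removing the smooth trends
cross_corr = np.corrcoef(resid.T)[0, 1]
```

The fitted trends look adequate variable by variable, yet the residuals remain strongly correlated — motivation for the joint multivariate treatment of the talk.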
6. Smoothing non-stationary correlated data
Peter Foster
The smoothing technique we will be focussing on is local linear regression, while the non-stationary covariance model we assume is the structured ante-dependence model of order one, denoted SAD(1). An expression for the asymptotic MISE of the smoother is derived and shown to depend on the parameters defining the SAD(1) model. A non-linear least squares approach, which will be described in detail, is used to nonparametrically fit the SAD(1) model. Finally, simulations show the benefits of modelling the non-stationarity in the data rather than erroneously assuming stationarity.
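A minimal numpy sketch of the local linear smoother itself (the SAD(1) covariance modelling is not reproduced here; kernel, bandwidth and data are illustrative choices):

```python
import numpy as np

def local_linear(x, y, x0, h):
    """Local linear regression estimate at x0
    with a Gaussian kernel and bandwidth h."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    X = np.column_stack([np.ones_like(x), x - x0])
    WX = X * w[:, None]
    beta = np.linalg.solve(X.T @ WX, WX.T @ y)
    return beta[0]                 # intercept = fitted value at x0

rng = np.random.default_rng(5)
x = np.sort(rng.uniform(0, 1, 300))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)
fit = np.array([local_linear(x, y, x0, h=0.05) for x0 in x])
```

Local linear fitting is preferred over the local constant (Nadaraya–Watson) smoother partly because of its better boundary behaviour.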
7. Minimum description length knot selection for penalized splines
Antti Liski
Department of Signal Processing
There is a well-known connection between penalized splines and mixed models, which makes it possible to exploit mixed models in the estimation of penalized splines. Liski & Liski (2008) derived a model selection criterion for mixed models by using the normalized maximum likelihood (NML) function (Rissanen 1996). They also demonstrated the performance of the criterion in spline fitting using a truncated power basis. In this paper we focus on the problem of finding the number of knots by the NML criterion. We also compare the performance of the selection criterion with that of others in use.
8. Frequency domain approach to linear and nonlinear regression in the case of correlated stationary errors with applications
Tata Subba-Rao
Here we discuss the estimation of parameters of linear and nonlinear regression models when the errors are correlated and stationary. The approach we use is the frequency domain approach advocated by Hannan. We show how the methods can be extended to multivariate multiple regression situations. The advantage of these methods is that they do not depend on any distributional assumptions on the errors (except for the existence of second-order moments). In multivariate situations, the classical assumption one makes is that the random vector has a multivariate normal distribution, and hence the marginals are normal. In many real situations, for example in the case of environmental series, geophysics, etc., this is unrealistic. In such situations we advocate using frequency domain methods. We illustrate the methodology with real data sets; the data we consider for illustration are maximum/minimum temperatures and ozone level concentrations. The applications to other data sets are obvious.
Briefly, we point out how, with small changes and additional assumptions, the estimation of regression in the case of longitudinal data sets can be handled.
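A toy sketch of the frequency-domain idea for a single regressor with AR(1) errors — here the error spectrum is assumed known for simplicity, whereas in practice it would be estimated; this is not the talk's methodology, only the weighting principle:

```python
import numpy as np

rng = np.random.default_rng(11)
n = 2048
x = rng.normal(size=n)
# AR(1) errors with rho = 0.8
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.8 * e[t - 1] + rng.normal()
y = 2.0 * x + e                       # true regression coefficient is 2.0

# frequency-domain weighted estimator: weight the cross-periodogram
# by the inverse of the error spectral density (known here)
w = 2 * np.pi * np.arange(n) / n
f_e = 1.0 / np.abs(1 - 0.8 * np.exp(-1j * w)) ** 2
X, Y = np.fft.fft(x), np.fft.fft(y)
beta = np.sum(np.real(np.conj(X) * Y) / f_e) / np.sum(np.abs(X) ** 2 / f_e)
```

Down-weighting the low frequencies, where the AR(1) error spectrum is large, plays the role of generalized least squares in the time domain.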
9. Estimation and Testing in Models with Multiple Parameter Changes
Alastair Hall
This presentation reviews recent work on estimation and testing in two classes of parametric models in which there are multiple parameter regimes. The first class consists of linear models with endogenous regressors that are estimated via Two Stage Least Squares. The second class consists of nonlinear regression models estimated via Least Squares.
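A minimal sketch (illustrative only, with exogenous regressors and a single break) of least-squares estimation of a break point in a linear regression with a slope change:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
x = rng.normal(size=n)
# slope changes from 1.0 to 3.0 at observation 120
beta = np.where(np.arange(n) < 120, 1.0, 3.0)
y = beta * x + rng.normal(scale=0.5, size=n)

def ssr(xs, ys):
    """Sum of squared residuals of a no-intercept OLS fit."""
    b = xs @ ys / (xs @ xs)
    r = ys - b * xs
    return r @ r

# least-squares break-point estimate: minimise the combined segment SSRs
# over all admissible split points (trimming 20 observations at each end)
cands = range(20, n - 20)
split = min(cands, key=lambda k: ssr(x[:k], y[:k]) + ssr(x[k:], y[k:]))
```

With a sizeable slope change the split point is localised very sharply.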
10. Some comments on the Watson efficiency in a linear model
Simo Puntanen
Department of Mathematics and Statistics
We consider the estimation of regression coefficients in a partitioned linear model and focus on questions concerning the Watson efficiency of the OLS estimator of a subset of the parameters with respect to the best linear unbiased estimator. Certain submodels are also considered. The conditions under which the Watson efficiency in the full model splits into a function of some other Watson efficiencies are given special attention. In particular, a new decomposition of the Watson efficiency into a product of three particular factors appears to be very useful.
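The (total) Watson efficiency of OLS relative to the BLUE can be computed directly from the design matrix and error covariance; a small numerical sketch with an assumed AR(1) covariance (not from the talk):

```python
import numpy as np

n = 30
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])  # intercept + trend
# AR(1)-type covariance for the errors
rho = 0.6
V = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

# Watson efficiency of OLS relative to the BLUE (GLS):
# eff = det(X'X)^2 / (det(X'VX) * det(X'V^{-1}X)), with 0 < eff <= 1
Vi = np.linalg.inv(V)
num = np.linalg.det(X.T @ X) ** 2
den = np.linalg.det(X.T @ V @ X) * np.linalg.det(X.T @ Vi @ X)
eff = num / den
```

Equality eff = 1 holds exactly when the OLS estimator coincides with the BLUE; here the AR(1) structure keeps the efficiency strictly below one.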
11. Statistical Challenges in Drug Development
Alex Donev
Drug development is a long, complex and expensive process of identifying chemical compounds that possess desirable biological properties and testing their potential to become drugs. We will focus on its early stages and discuss the design of high-throughput screening (HTS) and of comparative and confirmatory bioassays. We will also show how choosing an appropriate experimental design can substantially reduce the cost of the studies without compromising the quality of the data obtained.
12. Invariance of the BLUE under the linear fixed and mixed effects models
Jarkko Isotalo
We consider the estimation of a parametric function under the partitioned linear fixed effects model and under the linear mixed effects model, in which part of the coefficient vector is treated as a random vector. In particular, we consider when the best linear unbiased estimator, BLUE, of this function under the linear fixed effects model equals the corresponding BLUE under the linear mixed effects model.
Joint work with Märt Möls.
13. Two scatter matrices for multivariate nonparametrics
Klaus Nordhausen
Several multivariate extensions of univariate signs and ranks, and the tests based upon them, lack the property of being invariant under affine transformations. Recently Tyler, Critchley, Dümbgen and Oja (2008) and Oja, Sirkiä and Eriksson (2006) showed how two different scatter matrices can be used for data transformation. In a multivariate model without any further assumptions, this transformation yields an invariant coordinate system. If, however, the data come from an independent component model, the transformation recovers the independent components, provided the scatter matrices used have the so-called independence property.
This talk will briefly describe the properties of this transformation and discuss in greater detail its use in multivariate nonparametric tests for location. Assuming a symmetric independent component model, one can use the transformation to construct optimal location tests based on marginal signed ranks. Even without the independent component assumption, invariant nonparametric tests can still be obtained.
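A rough numerical sketch of the two-scatter transformation, using the ordinary covariance matrix together with a fourth-moment-based scatter (details simplified; the data and the particular scatter pair are illustrative assumptions, not the authors' code):

```python
import numpy as np

def cov4(X):
    """Scatter matrix based on fourth moments: a weighted covariance
    with weights given by squared Mahalanobis distances."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S1i = np.linalg.inv(np.cov(Xc, rowvar=False))
    r2 = np.einsum('ij,jk,ik->i', Xc, S1i, Xc)   # squared Mahalanobis distances
    return (Xc * r2[:, None]).T @ Xc / (n * (p + 2))

rng = np.random.default_rng(10)
# independent sources with clearly different kurtoses, then mixed
S = np.column_stack([rng.normal(size=500) ** 3,
                     rng.uniform(-1, 1, 500),
                     rng.exponential(size=500)])
A = rng.normal(size=(3, 3))                       # unknown mixing matrix
X = S @ A.T

S1 = np.cov(X, rowvar=False)
S2 = cov4(X)
# invariant coordinates: eigenvectors of S1^{-1} S2
evals, B = np.linalg.eig(np.linalg.inv(S1) @ S2)
Z = (X - X.mean(axis=0)) @ np.real(B)
```

Because the eigenvectors solve the generalized eigenproblem S2 b = λ S1 b, the new coordinates are uncorrelated with respect to S1 whenever the eigenvalues are distinct.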
14. A Decade of Joint Mean-Covariance Modelling
Gilbert MacKenzie
Centre for Biostatistics
The conventional approach to modelling longitudinal data places considerable emphasis on estimation of the mean structure and less on the covariance structure between repeated measurements on the same subject. Often the covariance structure is thought to be a nuisance parameter, or at least not of primary scientific interest, and little effort is expended on elucidating its structure.
A decade on, we shall argue that these ideas are rather pass\'{e} and that from an inferential standpoint the problem is symmetrical in both parameters $\mu$ and $\Sigma$. Throughout, we distinguish carefully between joint estimation which is routine and joint model selection which is not.
At first sight the task of estimating the structure of $\Sigma$, from the data, rather than from a pre-specified menu, may seem daunting, whence the idea of searching the entire covariance model space, $\{\cal C\}$, for $\Sigma$, may seem prohibitive. Thus, the final demand that we conduct a simultaneous search of the Cartesian product of the mean-covariance model space, $\{\cal M\} \times \{\cal C\}$, may seem impossible. However, below, we shall accomplish all three tasks elegantly for a particular, but very general, class of covariance structures, $\{\cal C^*\}$, defined below.
The technique is based on a modified Cholesky decomposition of the usual marginal covariance matrix ${ \Sigma}(t,{ \theta})$, where $t$ represents time and $\theta$ is a low-dimensional vector of parameters describing dependence on time. The decomposition leads to a reparametrization, ${\Sigma}(t, { \varsigma},{ \phi})$, in which the new parameters have an obvious statistical interpretation in terms of the natural logarithms of the innovation variances, ${\varsigma}$, and autoregressive coefficients, ${\phi}$. These unconstrained parameters are modelled, parsimoniously, as different polynomial functions of time.
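The modified Cholesky decomposition underlying this reparametrization can be sketched numerically: below, T is unit lower triangular (its below-diagonal entries are minus the autoregressive coefficients) and D holds the innovation variances, so that T Σ T' = D. A generic numpy sketch, not the authors' code:

```python
import numpy as np

def modified_cholesky(S):
    """Modified Cholesky decomposition T S T' = D of a covariance matrix S:
    T is unit lower triangular, D is diagonal (innovation variances)."""
    L = np.linalg.cholesky(S)          # S = L L'
    d = np.diag(L)
    T = np.linalg.inv(L / d)           # unit lower triangular
    D = np.diag(d ** 2)
    return T, D

rng = np.random.default_rng(8)
A = rng.normal(size=(5, 5))
S = A @ A.T + 5 * np.eye(5)            # a positive definite 'covariance'
T, D = modified_cholesky(S)
```

The entries of T and D are unconstrained, which is what makes it possible to model the log innovation variances and the autoregressive coefficients as simple polynomial functions of time.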
We trace the history of the development of joint mean-covariance modeling over the last decade, from Pourahmadi's seminal paper in 1999 to recent times, and discuss the scope for further research in this paradigm.
Key References
Pan J. X. and MacKenzie G (2003). On modelling mean-covariance structures in longitudinal studies. Biometrika, 90, 239-244.
Pourahmadi, M. (1999). Joint mean-covariance models with applications to longitudinal data: Unconstrained parameterisation. Biometrika, 86, 677-90.
15. Variable selection in joint modelling of mean and covariance structures for longitudinal data
Recently, joint modelling of mean and covariance structures for longitudinal/clustered data has attracted increasing attention. It is well known that correct modelling of covariance structures leads to more efficient statistical inferences for the mean, and in some circumstances it helps to remove bias in the mean parameter estimates, for example when data contain missing values. Like the mean structure, the covariance may change from group to group, or more generally may depend on various covariates of interest. Selection of the covariates that significantly affect the modelling of the mean and/or covariance structures is the main concern of this work. In this talk we will start with why and how covariance structures are modelled together with the mean, and then discuss how one should select the most important covariates for the mean and covariance structures using LASSO, HARD thresholding and SCAD techniques. Asymptotic properties, including consistency and normality, will be discussed. Various numerical comparisons will be made through real data analysis and simulation studies.
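A small scikit-learn sketch of LASSO-type variable selection for a mean structure (simulated data and tuning parameter are illustrative; the joint mean-covariance selection in the talk is substantially more involved):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(9)
n, p = 200, 10
X = rng.normal(size=(n, p))
# only the first two covariates enter the mean model
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

# the L1 penalty shrinks small coefficients exactly to zero,
# performing estimation and variable selection simultaneously
fit = Lasso(alpha=0.1).fit(X, y)
selected = set(np.flatnonzero(fit.coef_ != 0).tolist())
```

SCAD and HARD thresholding pursue the same goal with penalties that reduce the bias the LASSO induces on large coefficients.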