Programme
The plan is for the workshop to start at …
Session 1: Chaired by Prof Jianxin Pan (The … University)
Talk: Modelling treatment-effect heterogeneity in complex intervention trials: mediation and moderation of psychological treatment effects
Talk: Minimum description length model selection and smoothing under linear mixed models
Session 2: Chaired by Prof Erkki Liski (…)
Talk: Efficiency of two-phase designs for dealing with measurement error
Talk: Multivariate nonparametric tests in a randomized complete block design
Session 3: Chaired by Prof Tata Subba-Rao (The …)
Talk: Analysis of Multivariate Growth Curves Using Smoothing Splines
Talk: Smoothing non-stationary correlated data
Session 4: Chaired by Dr Simo Puntanen (… Technology)
Talk: MDL knot selection for penalized splines
Talk: Frequency domain approach to linear and nonlinear regression in the case of correlated stationary errors with applications
Session 5: Chaired by Alex Donev (The …)
Talk: Estimation and Testing in Models with Multiple Parameter Changes
Talk: Some comments on the Watson efficiency in a linear model
Session 6: Chaired by Dr Jyrki Mottonen (…)
Talk: Statistical Challenges in Drug Development
Talk: Invariance of the BLUE under the linear fixed and mixed effects models
Session 7: Chaired by Dr Peter Foster (The …)
Talk: A Decade of Joint Mean-Covariance Modelling
Session 8: Chaired by Dr Tapio Nummi (…)
Talk: Variable selection in joint modelling of mean and covariance structures for longitudinal data
Details of Talks:
1. Modelling treatment-effect heterogeneity in complex intervention trials: mediation and moderation of psychological treatment effects
Graham Dunn
Health Methodology Research Group
Randomised controlled trials (RCTs) of complex interventions (such as psychotherapy) can serve both a pragmatic and an explanatory function. Through intelligent analysis of the mediating and/or modifying effects of intervening variables we should obtain important insights about the way interventions might work. At its best the complex intervention trial will be a sophisticated clinical experiment designed to test the theories motivating the intervention and gain further understanding of the nature of the clinical problem being treated. I describe instrumental variable (IV) methods for the estimation of the ‘dose’-response effects of psychological interventions in RCTs in which there is variability in the number of sessions of therapy attended, the effect of which is modified by the strength of the therapeutic alliance between patients and their therapists. The IV methods allow for (a) hidden confounding (selection effects), (b) measurement errors and (c) that alliance is only measured in those receiving treatment.
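To make the IV idea concrete — this is an illustrative numpy sketch with simulated data, not the estimator used in the talk — randomisation can serve as an instrument for the number of sessions attended when a hidden confounder drives both attendance and outcome:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
z = rng.integers(0, 2, n).astype(float)       # randomised offer (instrument)
u = rng.normal(size=n)                        # hidden confounder (selection effect)
d = 3.0 * z + u + rng.normal(size=n)          # sessions attended ('dose')
y = 0.5 * d + 2.0 * u + rng.normal(size=n)    # outcome; true dose effect is 0.5

# naive OLS of y on d is biased because u drives both d and y
ols = np.cov(y, d)[0, 1] / np.var(d, ddof=1)

# IV (Wald) estimator: randomisation affects y only through d
iv = np.cov(y, z)[0, 1] / np.cov(d, z)[0, 1]
```

Here the IV estimate is close to the true dose effect, while the naive slope absorbs the confounding.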
2. Minimum description length model selection and smoothing under linear mixed models
Erkki Liski
Department of Mathematics and Statistics
Models can be considered as statistical descriptions of observed data. Comparison between competing models is based on the stochastic complexity (SC) of each description. The normalized maximum likelihood form of the SC (Rissanen 1996) contains a component that may be interpreted as the parametric complexity of the model class. The SC for the data, relative to a class of suggested models, is calculated and the optimal model with the smallest SC is selected. This provides the minimum description length (MDL) model selection criterion (Rissanen 1978, 2007).
Many richly-parametrised models can be expressed as linear mixed models (LMM). Smoothing methods that use basis functions with penalisation can be formulated as ML estimators and best predictors in a LMM framework (Ruppert et al. 2003). The advent of mixed model software has made this approach successful also in the field of smoothing.
We derive an MDL model selection criterion for the Gaussian LMM and interpret the smoothing model as a LMM. In smoothing we control three modelling parameters: the degree of the regression spline, the number of knots and the smoothing parameter. A model is specified by this triple, and the values of the three modelling parameters should be determined in an optimal way. A model estimator is obtained by minimizing the selection criterion with respect to the model, that is, with respect to these three parameters, using numerical optimization routines.
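As a loose illustration of criterion-based selection of the number of knots — here a BIC-style complexity term stands in for the parametric complexity of the MDL criterion, and the basis, penalty and data are simplified assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

def spline_rss(x, y, n_knots, lam=1e-4, degree=3):
    """RSS of a penalized truncated-power-basis spline fit
    (ridge penalty on the knot coefficients only, echoing the
    mixed-model formulation of penalized splines)."""
    knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])
    X = np.column_stack([x ** p for p in range(degree + 1)] +
                        [np.clip(x - k, 0, None) ** degree for k in knots])
    P = np.diag([0.0] * (degree + 1) + [lam] * n_knots)
    beta = np.linalg.solve(X.T @ X + P, X.T @ y)
    return np.sum((y - X @ beta) ** 2)

# BIC-style criterion: data-fit term plus a per-parameter complexity term
n = x.size
scores = {K: n * np.log(spline_rss(x, y, K) / n) + (K + 4) * np.log(n)
          for K in (2, 5, 10, 20, 40)}
best_K = min(scores, key=scores.get)
```

The complexity term prevents the criterion from always preferring the most heavily knotted fit.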
3. Efficiency of two-phase designs for dealing with measurement error
Roseanne McNamee
Health Methodology Research Group
Measurement error can lead, for example, to attenuation of exposure-response relationships and to bias in estimation of disease prevalence. Accurate exposure measurement can be expensive. A two-phase study design for dealing with measurement error is one in which the first phase uses an error-prone exposure measure, Z, possibly in conjunction with other variables, while the second phase measures true exposure, X, in a subset of subjects. While there is a substantial literature on the analysis of data from two-phase designs, an important question for the researcher is whether there is any gain in efficiency, relative to cost, compared to using X alone. Optimal designs and the efficiency of a number of different designs – for prevalence estimation, for odds ratio estimation and for estimation of the linear regression of an outcome Y on X – will be discussed.
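A simulated sketch (illustrative assumptions only, not from the talk) of why a second phase helps: the naive slope of Y on the error-prone Z is attenuated, while a regression-calibration correction using a small validation subsample with true X recovers the target slope:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000
x = rng.normal(size=n)                    # true exposure X
z = x + rng.normal(scale=1.0, size=n)     # error-prone phase-one measure Z
y = 1.0 * x + rng.normal(size=n)          # outcome; true slope is 1.0

# naive regression of Y on Z: attenuated roughly by var(X)/var(Z)
naive = np.cov(y, z)[0, 1] / np.var(z, ddof=1)

# phase two: measure X on a 10% validation subsample, then apply
# regression calibration: regress X on Z in the subsample, use E[X|Z]
idx = rng.choice(n, size=n // 10, replace=False)
gamma = np.cov(x[idx], z[idx])[0, 1] / np.var(z[idx], ddof=1)
x_hat = gamma * z
corrected = np.cov(y, x_hat)[0, 1] / np.var(x_hat, ddof=1)
```

With these settings the naive slope is near 0.5 while the corrected slope is near the true value of 1.0, even though X is measured on only a tenth of subjects.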
4. Multivariate nonparametric tests in a randomized complete block design
Jyrki Mottonen
Department of Mathematics and Statistics
Multivariate extensions of the Friedman and Page tests for the comparison of several treatments are introduced. Related unadjusted and adjusted treatment effect estimates for the multivariate response variable are also found and their properties discussed. The test statistics and estimates are analogous to the traditional univariate methods. In test constructions, the univariate ranks are replaced by multivariate spatial ranks. Asymptotic theory is developed to provide approximations for the limiting distributions of the test statistics and estimates. Limiting efficiencies of the tests and treatment effect estimates are found in the multivariate normal and distribution cases. The theory is illustrated by an example.
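The multivariate spatial ranks that replace the univariate ranks in the test constructions can be computed directly; this is a generic sketch, not the authors' code:

```python
import numpy as np

def spatial_ranks(X):
    """Multivariate spatial ranks: R(x_i) = ave_j u(x_i - x_j),
    where u(v) = v / ||v|| is the spatial sign (with u(0) = 0)."""
    D = X[:, None, :] - X[None, :, :]              # all pairwise differences
    norms = np.linalg.norm(D, axis=2, keepdims=True)
    U = np.divide(D, norms, out=np.zeros_like(D), where=norms > 0)
    return U.mean(axis=1)

rng = np.random.default_rng(3)
R = spatial_ranks(rng.normal(size=(100, 3)))
```

By construction the ranks sum exactly to zero (antisymmetry of the spatial signs) and each rank has norm strictly below one.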
5. Analysis of Multivariate Growth Curves Using Smoothing Splines
Tapio Nummi
Department of Mathematics and Statistics
Growth curve data arise when repeated measurements are observed on a number of individuals with an ordered dimension for occasions. Such data arise in almost all fields in which statistical models are applied. Often, for example in medicine, more than one variable is measured on each occasion. However, statistical analyses applied to such data often emphasize the exploration of one variable only. The consequence is that the information contained in the between-variables correlation structure is lost. In this study we show how cubic smoothing splines can be applied to multivariate growth curve data. Some ideas for testing the mean curves are also presented. The analyses are illustrated by multivariate paper quality data.
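An illustrative sketch (with invented data, not the paper quality data) of fitting variable-by-variable cubic smoothing splines and then inspecting the between-variables correlation of the residuals — exactly the structure a purely univariate analysis would discard:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(4)
t = np.linspace(0, 10, 50)                         # measurement occasions
# two responses per occasion with correlated errors (hypothetical example)
e = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=t.size)
h = 10 + 2 * t + e[:, 0]
w = 5 + 0.5 * t ** 1.2 + e[:, 1]

# variable-by-variable cubic smoothing splines; s sets the roughness budget
sh = UnivariateSpline(t, h, k=3, s=t.size)
sw = UnivariateSpline(t, w, k=3, s=t.size)
resid = np.column_stack([h - sh(t), w - sw(t)])
# the between-variables correlation left after removing the smooth trends
cross_corr = np.corrcoef(resid.T)[0, 1]
```

The fitted trends look adequate variable by variable, yet the residuals remain strongly correlated — motivation for the joint multivariate treatment of the talk.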
6. Smoothing non-stationary correlated data
Peter Foster
The smoothing technique we will be focussing on is local linear regression, while the non-stationary covariance model we assume is the structured ante-dependence model of order one, denoted SAD(1). An expression for the asymptotic MISE of the smoother is derived and shown to depend on the parameters defining the SAD(1) model. A non-linear least squares approach, which will be described in detail, is used to nonparametrically fit the SAD(1) model. Finally, simulations show the benefits of modelling the non-stationarity in the data rather than erroneously assuming stationarity.
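A minimal numpy sketch of the local linear smoother itself (the SAD(1) covariance modelling is not reproduced here; kernel, bandwidth and data are illustrative choices):

```python
import numpy as np

def local_linear(x, y, x0, h):
    """Local linear regression estimate at x0
    with a Gaussian kernel and bandwidth h."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    X = np.column_stack([np.ones_like(x), x - x0])
    WX = X * w[:, None]
    beta = np.linalg.solve(X.T @ WX, WX.T @ y)
    return beta[0]                 # intercept = fitted value at x0

rng = np.random.default_rng(5)
x = np.sort(rng.uniform(0, 1, 300))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)
fit = np.array([local_linear(x, y, x0, h=0.05) for x0 in x])
```

Local linear fitting is preferred over the local constant (Nadaraya–Watson) smoother partly because of its better boundary behaviour.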
7. Minimum description length knot selection for penalized splines
Antti Liski
Department of Signal Processing
There is a well-known connection between penalized splines and mixed models, which makes it possible to exploit mixed models in the estimation of penalized splines. Liski & Liski (2008) derived a model selection criterion for mixed models by using the normalized maximum likelihood (NML) function (Rissanen 1996). They also demonstrated the performance of the criterion in spline fitting using a truncated power basis. In this paper we focus on the problem of finding the number of knots by the NML criterion. We also compare the performance of the selection criterion with that of others in use.
8. Frequency domain approach to linear and nonlinear regression in the case of correlated stationary errors with applications
Tata Subba-Rao
Here we discuss the estimation of parameters of linear and nonlinear regression models when the errors are correlated and stationary. The approach we use is the frequency domain approach advocated by Hannan. We show how the methods can be extended to multivariate multiple regression situations. The advantage of these methods is that they do not depend on any distributional assumptions on the errors (except for the existence of second-order moments). In multivariate situations, the classical assumption one makes is that the random vector has a multivariate normal distribution, and hence the marginals are normal. In many real situations, for example in the case of environmental series, geophysics, etc., this is unrealistic. In such situations we advocate using frequency domain methods. We illustrate the methodology with real data sets; the data we consider for illustration are maximum/minimum temperatures and ozone level concentrations. The applications to other data sets are obvious.
Briefly, we point out how, with small changes and additional assumptions, the estimation of regression in the case of longitudinal data sets can be handled.
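A toy sketch of the frequency-domain idea for a single regressor with AR(1) errors — here the error spectrum is assumed known for simplicity, whereas in practice it would be estimated; this is not the talk's methodology, only the weighting principle:

```python
import numpy as np

rng = np.random.default_rng(11)
n = 2048
x = rng.normal(size=n)
# AR(1) errors with rho = 0.8
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.8 * e[t - 1] + rng.normal()
y = 2.0 * x + e                       # true regression coefficient is 2.0

# frequency-domain weighted estimator: weight the cross-periodogram
# by the inverse of the error spectral density (known here)
w = 2 * np.pi * np.arange(n) / n
f_e = 1.0 / np.abs(1 - 0.8 * np.exp(-1j * w)) ** 2
X, Y = np.fft.fft(x), np.fft.fft(y)
beta = np.sum(np.real(np.conj(X) * Y) / f_e) / np.sum(np.abs(X) ** 2 / f_e)
```

Down-weighting the low frequencies, where the AR(1) error spectrum is large, plays the role of generalized least squares in the time domain.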
9. Estimation and Testing in Models with Multiple Parameter Changes
Alastair Hall
This presentation reviews recent work on estimation and testing in two classes of parametric models in which there are multiple parameter regimes. The first class consists of linear models with endogenous regressors that are estimated via Two Stage Least Squares. The second class consists of nonlinear regression models estimated via Least Squares.
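A minimal sketch (illustrative only, with exogenous regressors and a single break) of least-squares estimation of a break point in a linear regression with a slope change:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
x = rng.normal(size=n)
# slope changes from 1.0 to 3.0 at observation 120
beta = np.where(np.arange(n) < 120, 1.0, 3.0)
y = beta * x + rng.normal(scale=0.5, size=n)

def ssr(xs, ys):
    """Sum of squared residuals of a no-intercept OLS fit."""
    b = xs @ ys / (xs @ xs)
    r = ys - b * xs
    return r @ r

# least-squares break-point estimate: minimise the combined segment SSRs
# over all admissible split points (trimming 20 observations at each end)
cands = range(20, n - 20)
split = min(cands, key=lambda k: ssr(x[:k], y[:k]) + ssr(x[k:], y[k:]))
```

With a sizeable slope change the split point is localised very sharply.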
10. Some comments on the Watson efficiency in a linear model
Simo Puntanen
Department of Mathematics and Statistics
We consider the estimation of regression coefficients in a partitioned linear model and focus on questions concerning the Watson efficiency of the OLS estimator of a subset of the parameters with respect to the best linear unbiased estimator. Certain submodels are also considered. The conditions under which the Watson efficiency in the full model splits into a function of some other Watson efficiencies are given special attention. In particular, a new decomposition of the Watson efficiency into a product of three particular factors appears to be very useful.
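The (total) Watson efficiency of OLS relative to the BLUE can be computed directly from the design matrix and error covariance; a small numerical sketch with an assumed AR(1) covariance (not from the talk):

```python
import numpy as np

n = 30
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])  # intercept + trend
# AR(1)-type covariance for the errors
rho = 0.6
V = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

# Watson efficiency of OLS relative to the BLUE (GLS):
# eff = det(X'X)^2 / (det(X'VX) * det(X'V^{-1}X)), with 0 < eff <= 1
Vi = np.linalg.inv(V)
num = np.linalg.det(X.T @ X) ** 2
den = np.linalg.det(X.T @ V @ X) * np.linalg.det(X.T @ Vi @ X)
eff = num / den
```

Equality eff = 1 holds exactly when the OLS estimator coincides with the BLUE; here the AR(1) structure keeps the efficiency strictly below one.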
11. Statistical Challenges in Drug Development
Alex Donev
Drug development is a long, complex and expensive process of identifying chemical compounds that possess desirable biological properties and testing their potential to become drugs. We will focus on its early stages and discuss the design of high-throughput screening (HTS) and of comparative and confirmatory bioassays. We will also show how choosing an appropriate experimental design can substantially reduce the cost of the studies without compromising the quality of the data obtained.
12. Invariance of the BLUE under the linear fixed and mixed effects models
Jarkko Isotalo
We consider the estimation of a parametric function under the partitioned linear fixed effects model and under the linear mixed effects model, in which part of the coefficient vector is treated as a random vector. In particular, we consider when the best linear unbiased estimator, BLUE, of this function under the linear fixed effects model equals the corresponding BLUE under the linear mixed effects model.
Joint work with Märt Möls.
13. Two scatter matrices for multivariate nonparametrics
Klaus Nordhausen
Several multivariate extensions of univariate signs and ranks, and the tests based upon them, lack the property of being invariant under affine transformations. Recently Tyler, Critchley, Dümbgen and Oja (2008) and Oja, Sirkiä and Eriksson (2006) showed how two different scatter matrices can be used for data transformation. In a multivariate model without any further assumptions, this transformation yields an invariant coordinate system. If, however, the data come from an independent component model, the transformation recovers the independent components, provided the scatter matrices used have the so-called independence property.
This talk will briefly describe the properties of this transformation and discuss in greater detail its use in multivariate nonparametric tests for location. Assuming a symmetric independent component model, one can use the transformation to construct optimal location tests based on marginal signed ranks. Even without the independent component assumption, invariant nonparametric tests can still be obtained.
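A rough numerical sketch of the two-scatter transformation, using the ordinary covariance matrix together with a fourth-moment-based scatter (details simplified; the data and the particular scatter pair are illustrative assumptions, not the authors' code):

```python
import numpy as np

def cov4(X):
    """Scatter matrix based on fourth moments: a weighted covariance
    with weights given by squared Mahalanobis distances."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S1i = np.linalg.inv(np.cov(Xc, rowvar=False))
    r2 = np.einsum('ij,jk,ik->i', Xc, S1i, Xc)   # squared Mahalanobis distances
    return (Xc * r2[:, None]).T @ Xc / (n * (p + 2))

rng = np.random.default_rng(10)
# independent sources with clearly different kurtoses, then mixed
S = np.column_stack([rng.normal(size=500) ** 3,
                     rng.uniform(-1, 1, 500),
                     rng.exponential(size=500)])
A = rng.normal(size=(3, 3))                       # unknown mixing matrix
X = S @ A.T

S1 = np.cov(X, rowvar=False)
S2 = cov4(X)
# invariant coordinates: eigenvectors of S1^{-1} S2
evals, B = np.linalg.eig(np.linalg.inv(S1) @ S2)
Z = (X - X.mean(axis=0)) @ np.real(B)
```

Because the eigenvectors solve the generalized eigenproblem S2 b = λ S1 b, the new coordinates are uncorrelated with respect to S1 whenever the eigenvalues are distinct.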
14. A Decade of Joint Mean-Covariance Modelling
Gilbert MacKenzie
Centre for Biostatistics
The conventional approach to modelling longitudinal data places considerable emphasis on estimation of the mean structure and less on the covariance structure between repeated measurements on the same subject. Often the covariance structure is thought to be a nuisance parameter, or at least not of primary scientific interest, and little effort is expended on elucidating its structure.
A decade on, we shall argue that these ideas are rather pass\'{e} and that from an inferential standpoint the problem is symmetrical in both parameters $\mu$ and $\Sigma$. Throughout, we distinguish carefully between joint estimation which is routine and joint model selection which is not.
At first sight the task of estimating the structure of $\Sigma$, from the data, rather than from a pre-specified menu, may seem daunting, whence the idea of searching the entire covariance model space, $\{\cal C\}$, for $\Sigma$, may seem prohibitive. Thus, the final demand that we conduct a simultaneous search of the Cartesian product of the mean-covariance model space, $\{\cal M\} \times \{\cal C\}$, may seem impossible. However, below, we shall accomplish all three tasks elegantly for a particular, but very general, class of covariance structures, $\{\cal C^*\}$, defined below.
The technique is based on a modified Cholesky decomposition of the usual marginal covariance matrix ${ \Sigma}(t,{ \theta})$, where $t$ represents time and $\theta$ is a low-dimensional vector of parameters describing dependence on time. The decomposition leads to a reparametrization, ${\Sigma}(t, { \varsigma},{ \phi})$, in which the new parameters have an obvious statistical interpretation in terms of the natural logarithms of the innovation variances, ${\varsigma}$, and autoregressive coefficients, ${\phi}$. These unconstrained parameters are modelled, parsimoniously, as different polynomial functions of time.
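The modified Cholesky decomposition underlying this reparametrization can be sketched numerically: below, T is unit lower triangular (its below-diagonal entries are minus the autoregressive coefficients) and D holds the innovation variances, so that T Σ T' = D. A generic numpy sketch, not the authors' code:

```python
import numpy as np

def modified_cholesky(S):
    """Modified Cholesky decomposition T S T' = D of a covariance matrix S:
    T is unit lower triangular, D is diagonal (innovation variances)."""
    L = np.linalg.cholesky(S)          # S = L L'
    d = np.diag(L)
    T = np.linalg.inv(L / d)           # unit lower triangular
    D = np.diag(d ** 2)
    return T, D

rng = np.random.default_rng(8)
A = rng.normal(size=(5, 5))
S = A @ A.T + 5 * np.eye(5)            # a positive definite 'covariance'
T, D = modified_cholesky(S)
```

The entries of T and D are unconstrained, which is what makes it possible to model the log innovation variances and the autoregressive coefficients as simple polynomial functions of time.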
We trace the history of the development of joint mean-covariance modeling over the last decade, from Pourahmadi's seminal paper in 1999 to recent times, and discuss the scope for further research in this paradigm.
Key References
Pan J. X. and MacKenzie G (2003). On modelling mean-covariance structures in longitudinal studies. Biometrika, 90, 239-244.
Pourahmadi, M. (1999). Joint mean-covariance models with applications to longitudinal data: Unconstrained parameterisation. Biometrika, 86, 677-90.
15. Variable selection in joint modelling of mean and covariance structures for longitudinal data
Recently, joint modelling of mean and covariance structures for longitudinal/clustered data has attracted increasing attention. It is well known that correct modelling of covariance structures leads to more efficient statistical inferences for the mean, and in some circumstances it helps to remove bias in the mean parameter estimates, for example when data contain missing values. Like the mean structure, the covariance may change from group to group, or more generally may depend on various covariates of interest. Selection of the covariates that significantly affect the modelling of the mean and/or covariance structures is the main concern of this work. In this talk we will start with why and how covariance structures are modelled together with the mean, and then discuss how one should select the most important covariates for the mean and covariance structures using LASSO, HARD thresholding and SCAD techniques. Asymptotic properties, including consistency and normality, will be discussed. Various numerical comparisons will be made through real data analysis and simulation studies.
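A small scikit-learn sketch of LASSO-type variable selection for a mean structure (simulated data and tuning parameter are illustrative; the joint mean-covariance selection in the talk is substantially more involved):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(9)
n, p = 200, 10
X = rng.normal(size=(n, p))
# only the first two covariates enter the mean model
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

# the L1 penalty shrinks small coefficients exactly to zero,
# performing estimation and variable selection simultaneously
fit = Lasso(alpha=0.1).fit(X, y)
selected = set(np.flatnonzero(fit.coef_ != 0).tolist())
```

SCAD and HARD thresholding pursue the same goal with penalties that reduce the bias the LASSO induces on large coefficients.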