
# Programme And Abstracts For Tuesday 12^th^ Of December {#Tuesday .unnumbered}
<div id = "talk_198"><p class="keynoteBanner">Keynote: Tuesday 12<sup>th</sup> 9:10 098 Lecture Theatre (260-098)</p></div>
## Could Do Better &hellip; A Report Card For Statistical Computing {.unnumbered}
<p style="text-align:center">
Ross Ihaka and Brendon McArdle<br />
University of Auckland<br />
</p>
<span>**Abstract:**</span> Since the introduction of R, research in Statistical Computing has
plateaued. Although R is, at best, a stop-gap system, there appears to
be very little active research on creating better computing
environments for Statistics.

When work on R commenced there were a multitude of software systems
for statistical data analysis in use and under development. There was
friendly competition and collaboration between developers. While R can
be seen as providing a useful unification for users, its success and
dominance can be viewed as now holding back research and the
development of new systems.

In this talk we'll examine what might be behind this and also look at
some research aimed at exploring some of the design space for new
systems. The aim is to show constructively that new work in the area
is still possible.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_025"><p class="contribBanner">Tuesday 12<sup>th</sup> 10:30 098 Lecture Theatre (260-098)</p></div>
## R&amp;D Policy Regimes In France: New Evidence From A Spatio-Temporal Analysis {.unnumbered}
<p style="text-align:center">
Benjamin Montmartin^1^, Marcos Herrera^2^, and Nadine Massard^3^<br />
^1^GREDEG CNRS<br />
^2^CONICET<br />
^3^GAEL<br />
</p>
<span>**Abstract:**</span> Using a unique database containing
information on the amount of R&D tax credits and regional, national and
European subsidies received by firms in French NUTS3 regions over the
period 2001-2011, we provide new evidence on the efficiency of R&D
policies taking into account spatial dependency across regions. By
estimating a spatial Durbin model with regimes and fixed effects, we
show that in a context of yardstick competition between regions,
national subsidies are the only instrument that displays a total leverage
effect. For the other instruments, internal and external effects offset
each other, resulting in insignificant total effects. Structural breaks
corresponding to tax credit reforms are also revealed.
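
For orientation, the generic fixed-effects spatial Durbin panel model is sketched below in its textbook form. The regime-specific coefficients, the choice of the spatial weights matrix $W$ and the exact specification estimated by the authors are not given in the abstract and are not reproduced here. In this sketch $y_t$ is the $N$-vector of regional outcomes in year $t$, $X_t$ holds the policy instruments, $\mu$ contains regional fixed effects and $\alpha_t$ is an assumed time effect:

$$
y_t = \rho W y_t + X_t \beta + W X_t \theta + \mu + \alpha_t \iota_N + \varepsilon_t ,
$$

$$
\text{total effect of instrument } k = \frac{1}{N}\, \iota_N^{\top} \left(I_N - \rho W\right)^{-1} \left(\beta_k I_N + \theta_k W\right) \iota_N .
$$

When $W$ is row-standardised, $W \iota_N = \iota_N$ and the total effect simplifies to $(\beta_k + \theta_k)/(1 - \rho)$. Coefficients $\beta_k$ and $\theta_k$ of opposite sign can therefore largely cancel, which is one way offsetting internal and external effects can produce an insignificant total effect.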

<span>**Keywords:**</span> Additionality, French policy mix, Spatial
panel, Structural break

<span>**References:**</span>

Pesaran, M. H. (2007). A simple panel unit root test in the presence of
cross-section dependence. *Journal of Applied Econometrics*, **22**,
265–312.

Hendry, D. F. (1979). Predictive failure and econometric modelling in
macroeconomics: The transactions demand for money. In: P. Ormerod
(Ed.), *Economic Modelling: Current Issues and Problems in Macroeconomic
Modelling in the UK and the US*, **9**, 217–242. Heinemann Education
Books, London.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_084"><p class="contribBanner">Tuesday 12<sup>th</sup> 10:30 OGGB4 (260-073)</p></div>
## Analysing Scientific Collaborations Of New Zealand Institutions Using Scopus Bibliometric Data {.unnumbered}
<p style="text-align:center">
Samin Aref^1^, David Friggens^2^, and Shaun Hendy^1^<br />
^1^University of Auckland<br />
^2^Ministry of Business, Innovation & Employment<br />
</p>
<span>**Abstract:**</span> Scientific collaborations are among the main
enablers of development in small national science systems. Although
analysing scientific collaborations is a well-established subject in
scientometrics, evaluations of collaborative activities of countries
remain speculative, with studies based on a limited number of fields or
using data inadequate to fully represent collaborations at a
national level. This study provides a unique view on the collaborative
aspect of scientific activities in New Zealand. We perform a
quantitative study based on all Scopus publications in all subjects for
over 1500 New Zealand institutions over a period of 6 years to generate
an extensive mapping of New Zealand scientific collaborations. The
comparative results reveal the levels of collaboration between New
Zealand institutions and business enterprises, government institutions,
higher education providers, and private not-for-profit organisations in
2010-2015. Constructing a collaboration network of institutions, we
observe a power-law distribution indicating that a small number of New
Zealand institutions account for a large proportion of national
collaborations. Network centrality measures are deployed to identify the
most influential institutions of the country in terms of scientific
collaboration. We also provide comparative results on 15 universities
and crown research institutes based on 27 subject classifications. This
study was based on Scopus custom data and supported by the Te Pūnaha
Matatini internship program at the Ministry of Business, Innovation &
Employment.

arXiv preprint: https://arxiv.org/pdf/1709.02897

<span>**Keywords:**</span> Big data modelling, Scientific collaboration,
Scientometrics, Network analysis, Scopus, New Zealand
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_170"><p class="contribBanner">Tuesday 12<sup>th</sup> 10:30 OGGB5 (260-051)</p></div>
## Family Structure And Academic Achievements Of High School Students In Tonga {.unnumbered}
<p style="text-align:center">
Losana Vao Latu Latu<br />
University of Canterbury<br />
</p>
<span>**Abstract:**</span> In this study we examine how family structure affects the academic
achievement of secondary-school-age students in Tonga. It is a
comparative study aiming to find out whether there is a significant
difference between the academic achievements of students from a
traditional family and those from a non-traditional family. We define a
Tongan traditional family as two biological parents (or adoptive parents
from birth), one male and one female, whereas a non-traditional family
can be a single-parent family or one in which the student has no parent
present (for example, they are staying with relatives or friends). We
examine the key drivers of success and seek to understand the
relationship between academic achievement and family structure. We hope
the study will provide evidence-based information to help
administrators, other educators and parents adopt the best practices and
actions for students. The target population for this study is high
school students aged 13 to 18 in Tonga. The study is limited to the high
schools on the main island of Tonga, Tongatapu, which has 12 high
schools: two are government schools and the others are private schools
run by different religious denominations. In April we surveyed 360
students, 60 from each of 6 high schools, and here we present our
preliminary results.

<span>**Keywords:**</span> Education, policy, stratified sampling
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_017"><p class="contribBanner">Tuesday 12<sup>th</sup> 10:30 Case Room 2 (260-057)</p></div>
## Analysis Of Multivariate Binary Longitudinal Data: Metabolic Syndrome During Menopausal Transition {.unnumbered}
<p style="text-align:center">
Geoff Jones<br />
Massey University<br />
</p>
<span>**Abstract:**</span> Metabolic syndrome (MetS) is a major
multifactorial condition that predisposes adults to type 2 diabetes and
cardiovascular disease. It is defined as having at least three of five
cardiometabolic risk components: 1) high fasting triglyceride level, 2)
low high-density lipoprotein (HDL) cholesterol, 3) elevated fasting
plasma glucose, 4) large waist circumference (abdominal obesity) and 5)
hypertension. In the US Study of Women’s Health Across the Nation
(SWAN), a 15-year multi-centre prospective cohort study of women from
five racial/ethnic groups, the incidence of MetS increased as midlife
women underwent the menopausal transition (MT). A model is sought to
examine the interdependent progression of the five MetS components and
the influence of demographic covariates.
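
The MetS definition above is a deterministic rule applied to five binary indicators, so each woman contributes a five-dimensional binary vector at every visit. A minimal Python sketch of that rule follows; the class and function names are hypothetical, and the clinical cut-offs that turn raw measurements into the five indicators are assumed to have been applied already.

```python
from dataclasses import dataclass


@dataclass
class RiskComponents:
    """Presence (True) or absence (False) of each MetS risk component at one visit.

    Hypothetical structure: clinical cut-offs are assumed to be applied upstream.
    """
    high_triglycerides: bool
    low_hdl: bool
    elevated_glucose: bool
    large_waist: bool
    hypertension: bool

    def count(self) -> int:
        # bools are ints in Python, so summing counts the components present
        return sum([self.high_triglycerides, self.low_hdl, self.elevated_glucose,
                    self.large_waist, self.hypertension])


def has_mets(visit: RiskComponents, threshold: int = 3) -> bool:
    """MetS is defined as having at least `threshold` (three) of the five components."""
    return visit.count() >= threshold


# Two different component patterns with the same MetS status
a = RiskComponents(True, True, True, False, False)
b = RiskComponents(False, False, True, True, True)
print(has_mets(a), has_mets(b))  # True True
```

Because distinct combinations of components map to the same MetS status, modelling the five indicators jointly retains information that the single derived indicator discards.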

<span>**Keywords:**</span> Multivariate binary data, longitudinal
analysis, metabolic syndrome
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_169"><p class="contribBanner">Tuesday 12<sup>th</sup> 10:30 Case Room 3 (260-055)</p></div>
## Clustering Of Curves On A Spatial Domain Using A Bayesian Partitioning Model {.unnumbered}
<p style="text-align:center">
Chae Young Lim<br />
Seoul National University<br />
</p>
<span>**Abstract:**</span> We propose a Bayesian hierarchical model for
spatial clustering of high-dimensional functional data based on the
effects of functional covariates. We couple the functional mixed-effects
model with a generalized spatial partitioning method for: (1)
identifying subregions for the high-dimensional spatio-functional data;
(2) improving the computational feasibility via parallel computing over
subregions or multi-level partitions; and (3) addressing the
near-boundary ambiguity in model-based spatial clustering techniques.
The proposed model extends the existing spatial clustering techniques to
produce spatially contiguous partitions for spatio-functional data. The
model successfully captured the regional effects of atmospheric and
cloud properties on spectral radiance measurements. This highlights
the importance of considering spatially contiguous partitions for
identifying regional effects and small-scale variability.

<span>**Keywords:**</span> spatial clustering, Bayesian wavelets,
Voronoi tessellation, functional covariates
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_044"><p class="contribBanner">Tuesday 12<sup>th</sup> 10:30 Case Room 4 (260-009)</p></div>
## The Uncomfortable Entrepreneurs: Bad Working Conditions And Entrepreneurial Commitment {.unnumbered}
<p style="text-align:center">
Catherine Laffineur<br />
Université Côte d'Azur, GREDEG-CNRS<br />
</p>
<span>**Abstract:**</span> In contrast to previous models, which define
necessity entrepreneurs as individuals facing push factors due to a lack
of employment, we consider the possibility of push factors faced by
employed individuals (Folta et al., 2010). The theoretical model yields
distinctive predictions relating occupation characteristics to the
probability of entry into entrepreneurship. Using PSED and O\*NET data,
we investigate how the characteristics of individuals' primary
occupations affect the effort nascent entrepreneurs put into venture
creation. The empirical evidence shows that necessity entrepreneurs are
not confined to unemployed individuals. We find compelling evidence that
individuals facing arduous working conditions (e.g. a stressful
environment and physical tiredness) have a higher likelihood of entering
and succeeding in self-employment than others. Conversely, individuals
who experience a high degree of self-realization, independence and
responsibility in the workplace are less committed to their business
than individuals exposed to arduous working conditions. These findings
have strong implications for how we interpret and analyze necessity
entrepreneurs and provide novel insights into the role of occupational
experience in the process of venture emergence.

<span>**Keywords:**</span> Entrepreneurship, Motivation,
Occupational characteristics, Employment choice.

<span>**References:**</span>

Folta, T. B., Delmar, F., & Wennberg, K. (2010). Hybrid entrepreneurship.
*Management Science*, **56**(2), 253–269.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_028"><p class="contribBanner">Tuesday 12<sup>th</sup> 10:50 098 Lecture Theatre (260-098)</p></div>
## Spatial Surveillance With Scan Statistics By Controlling The False Discovery Rate {.unnumbered}
<p style="text-align:center">
Xun Xiao<br />
Massey University<br />
</p>
<span>**Abstract:**</span> In this paper, I investigate the false
discovery approach proposed by Li et al. (2016), which uses spatial scan
statistics to detect spatial disease clusters in a geographical region.
The incidence of disease is assumed to follow the inhomogeneous Poisson
model discussed in Kulldorff (1997). I show that, although spatial scan
statistics are highly correlated, the simple Benjamini-Hochberg (linear
step-up) procedure still controls their false discovery rate, by proving
that the multivariate Poisson distribution satisfies the PRDS condition
(positive regression dependence on a subset) of Benjamini and Yekutieli
(2001).
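
The Benjamini-Hochberg linear step-up procedure referred to above is short enough to state in code. The following is a generic Python sketch with an illustrative function name; in the scan-statistic setting the p-values would come from the candidate clusters, which is not modelled here. The PRDS result is what justifies applying this unmodified procedure to dependent scan statistics.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Indices rejected by the Benjamini-Hochberg linear step-up procedure at FDR level q."""
    m = len(p_values)
    # Rank hypotheses by p-value, smallest first
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up: find the LARGEST rank k with p_(k) <= k * q / m
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values
    return sorted(order[:k_max])


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))   # [0]
print(benjamini_hochberg([0.02, 0.03, 0.04, 0.045]))  # [0, 1, 2, 3]
```

The second call shows the step-up behaviour: only the largest p-value (0.045) falls below its threshold $4q/4 = 0.05$, yet all four hypotheses are rejected.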

<span>**Keywords:**</span> False Discovery Rate, Poisson Distribution,
PRDS, Spatial Scan Statistics

<span>**References:**</span>

Benjamini, Y. and Yekutieli, D. (2001). The control of the false
discovery rate in multiple testing under dependency. *Annals of
Statistics*, **29**(4), 1165–1188.

Kulldorff, M. (1997). A spatial scan statistic. *Communications in
Statistics - Theory and Methods*, **26**(6), 1481–1496.

Li, Y., Shu, L., and Tsung, F. (2016). A false discovery approach for
scanning spatial disease clusters with arbitrary shapes. *IIE
Transactions*, **48**(7), 684–698.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_113"><p class="contribBanner">Tuesday 12<sup>th</sup> 10:50 OGGB4 (260-073)</p></div>
## Statistical Models For The Source Attribution Of Zoonotic Diseases: A Study Of Campylobacteriosis {.unnumbered}
<p style="text-align:center">
Sih-Jing Liao, Martin Hazelton, Jonathan Marshall, and Nigel French<br />
Massey University<br />
</p>
<span>**Abstract:**</span> Preventing and controlling zoonoses with a
public health policy depends on the knowledge scientists have about the
transmitted pathogens. Modelling jointly the epidemiological data and
genetic information provides a methodology for tracing back the source
of infection. However, the increased model complexity makes it
difficult to assess how the genetic information contributes to the final
statistical inferences. To explore the genetic effects in the joint
model, we develop a genetic-free model and compare it to the joint
model. We apply the two models to a recent campylobacteriosis study to
estimate the attribution probability for each source. A spatial
covariate is also included in the models in order to investigate the
effect of the level of rurality on the source attributions. Comparing
the attributions generated by the two models, we find that: i) the
genetic information integrated in the joint model gives slightly more
precise inference for the sparse cases observed in highly rural areas
than the genetic-free model; ii) on the logit scale, source attribution
probabilities follow linear trends against level of rurality; and iii)
poultry is the dominant source of campylobacteriosis in urban centres,
whereas ruminants are the most attributable source in rural areas.

<span>**Keywords:**</span> source attribution, *Campylobacter*,
multinomial model, Dirichlet prior, HPD interval, DIC

<span>**References:**</span>

Bronowski, C., James, C.E. and Winstanley, C. (2014). Role of
environmental survival in transmission of *Campylobacter jejuni*. *FEMS
Microbiol Lett*, **356**(1):8–19.

Dingle, K.E., Colles, F.M., Wareing, D.R., Ure, R., Fox, A.J., Bolton,
F.E., Bootsma, H.J., Willems, R.J. and Maiden, M.C. (2001). Multilocus
sequence typing system for *Campylobacter jejuni*. *J Clin Microbiol*,
**39**(1):14–23.

Marshall, J.C. and French, N.P. (2015). Source attribution January to
December 2014 of human *Campylobacter jejuni* cases from the Manawatu.
*Technical Report*.

Wilson, D.J., Gabriel, E., Leatherbarrow, A.J., Cheesbrough, J., Gee,
S., Bolton, E., Fox, A., Fearnhead, P., Hart, C.A. and Diggle, P.J.
(2008). Tracing the source of campylobacteriosis. *PLoS Genet*,
**4**(9):e1000203.

Wagenaar, J.A., French, N.P. and Havelaar, A.H. (2013). Preventing
*Campylobacter* at the source: why is it so difficult? *Clin Infect
Dis*, **57**(11):1600–1606.

Biggs, P.J., Fearnhead, P., Hotter, G., Mohan, V., Collins-Emerson, J.,
Kwan, E., Besser, T.E., Cookson, A., Carter, P.E. and French, N.P.
(2011). Whole-genome comparison of two *Campylobacter jejuni* isolates
of the same sequence type reveals multiple loci of different ancestral
lineage. *PLoS One*, **6**(11):e27121.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_039"><p class="contribBanner">Tuesday 12<sup>th</sup> 10:50 OGGB5 (260-051)</p></div>
## Towards An Informal Test For Goodness-Of-Fit {.unnumbered}
<p style="text-align:center">
Anna Fergusson and Maxine Pfannkuch<br />
University of Auckland<br />
</p>
<span>**Abstract:**</span> Informal approaches to goodness-of-fit tests often involve examining the
visual fit of the model to data ‘by eye’. From a pedagogical
perspective, such approaches are problematic for Year 13 and
undergraduate students and teachers, as key aspects such as sample size,
the number of categories and the expected variation of sample
proportions are difficult to consider. In formal tests for
goodness-of-fit, a test statistic is compared with its sampling
distribution to decide whether the model distribution can be rejected.
In general, a numeric test statistic does not have an obvious graphical
representation within the data itself. This talk presents a new
informal goodness-of-fit test that uses a simulation-based modelling
tool. Drawing on ideas from graphical inference, the proposed test uses
plots, rather than numerical test statistics, as test statistics.
Comparisons of performance demonstrate that the proposed test leads to
similar decisions about the fit of the model distribution as the
chi-square goodness-of-fit test. A research study with Year 13 teachers
indicated that there could be pedagogical benefits to using this
informal goodness-of-fit test for introducing important modelling and
hypothesis test concepts.
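
For comparison, the formal benchmark mentioned above, Pearson's chi-square goodness-of-fit test, can be computed against a simulated rather than an asymptotic reference distribution, which echoes the simulate-from-the-model idea behind the informal test. This Python sketch still uses a numeric statistic, so it is the conventional test and not the authors' plot-based procedure; the function names are illustrative.

```python
import random


def chi_square_stat(observed, probs):
    """Pearson's statistic: sum over categories of (O - E)^2 / E, with E = n * p."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))


def simulated_p_value(observed, probs, reps=2000, seed=1):
    """Monte Carlo p-value: how often samples drawn from the model look at least as extreme."""
    rng = random.Random(seed)
    n, k = sum(observed), len(probs)
    t_obs = chi_square_stat(observed, probs)
    extreme = 0
    for _ in range(reps):
        # Draw n observations from the model distribution and tally each category
        draws = rng.choices(range(k), weights=probs, k=n)
        counts = [draws.count(j) for j in range(k)]
        if chi_square_stat(counts, probs) >= t_obs:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive
    return (extreme + 1) / (reps + 1)


print(chi_square_stat([10, 20, 30, 40], [0.25] * 4))  # 20.0
```

With 100 observations spread as 10, 20, 30 and 40 across four equally likely categories, almost no simulated sample is as extreme, so the model distribution would be rejected.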
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_024"><p class="contribBanner">Tuesday 12<sup>th</sup> 10:50 Case Room 2 (260-057)</p></div>
## Identifying Clusters Of Patients With Diabetes Using A Markov Birth-Death Process {.unnumbered}
<p style="text-align:center">
Mugdha Manda, Thomas Lumley, and Susan Wells<br />
University of Auckland<br />
</p>
<span>**Abstract:**</span> Estimating disease trajectories has
become increasingly important for clinical practitioners seeking to
administer effective treatment to their patients. Part of describing
disease trajectories involves taking patients’ medical histories and
sociodemographic factors into account and grouping patients into
clusters of similar individuals. Advances in computerised patient
databases have paved the way for identifying such trajectories by
recording a patient’s medical history over a long period of time
(longitudinal data). We studied data from the PREDICT-CVD dataset, a
national primary-care cohort from which people with diabetes from
2002-2015 were identified through routine clinical practice. We fitted a
Bayesian hierarchical linear model with latent clusters to the repeated
measurements of HbA$_{1c}$ and eGFR, using the Markov birth-death process
proposed by Stephens (2000) to handle the changes in dimensionality as
clusters were added or removed.

<span>**Keywords:**</span> Diabetes management, longitudinal data,
Markov chain Monte Carlo, birth-death process, mixture model, Bayesian
analysis, latent clusters, hierarchical models, primary care, clinical
practice

<span>**References:**</span>

Stephens, M. (2000). Bayesian Analysis of Mixture Models with an Unknown
Number of Components: An Alternative to Reversible Jump Methods.
*The Annals of Statistics*, **28**(1), 40–74.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_174"><p class="contribBanner">Tuesday 12<sup>th</sup> 10:50 Case Room 3 (260-055)</p></div>
## Bayesian Temporal Density Estimation Using Autoregressive Species Sampling Models {.unnumbered}
<p style="text-align:center">
Youngin Jo^1^, Seongil Jo^2^, and Jaeyong Lee^3^<br />
^1^Kakao Corporation<br />
^2^Chonbuk National University<br />
^3^Seoul National University<br />
</p>
<span>**Abstract:**</span> We propose a Bayesian nonparametric (BNP)
model, which is built on a class of species sampling models, for
estimating density functions of temporal data. In particular, we
introduce species sampling mixture models with temporal dependence. To
accommodate temporal dependence, we define dependent species sampling
models by modeling random support points and weights through an
autoregressive model, and then we construct the mixture models based on
the collection of these dependent species sampling models. We propose an
algorithm to generate posterior samples and present simulation studies
to compare the performance of the proposed models with competitors that
are based on Dirichlet process mixture models. We apply our method to
the estimation of densities for apartment prices in Seoul, the closing
price of the Korea Composite Stock Price Index (KOSPI), and climate
variables (daily maximum temperature and precipitation) around the
Korean peninsula.

<span>**Keywords:**</span> Autoregressive species sampling models;
Dependent random probability measures; Mixture models; Temporal
structured data

<span>**Acknowledgements:**</span> This work is a part of the first author’s Ph.D. thesis at Seoul National University. Research of Seongil Jo was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2017R1D1A3B03035235). Research of Jaeyong Lee was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2011-0030811).
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_047"><p class="contribBanner">Tuesday 12<sup>th</sup> 10:50 Case Room 4 (260-009)</p></div>
## How Does The Textile Set Describe Geometric Structures Of Data? {.unnumbered}
<p style="text-align:center">
Ushio Tanaka^1^ and Tomonari Sei^2^<br />
^1^Osaka Prefecture University<br />
^2^University of Tokyo<br />
</p>
<span>**Abstract:**</span> The textile set is defined from the textile
plot proposed by Kumasaka and Shibata (2007, 2008), which is a powerful
tool for visualizing high dimensional data. The textile plot is based on
a parallel coordinate plot, where the ordering, locations and scales of
each axis are simultaneously chosen so that all connecting lines, each
of which signifies an observation, are aligned as horizontally as
possible. The textile plot transforms a data matrix in order to
delineate a parallel coordinate plot. Using the geometric properties of
the textile set derived by Sei and Tanaka (2015), we show that the
textile set describes intrinsic geometric structures of data.

<span>**Keywords:**</span> Parallel coordinate plot, Textile set,
Differentiable manifold

<span>**References:**</span>

Kumasaka, N. and Shibata, R. (2007). The Textile Plot Environment,
*Proceedings of the Institute of Statistical Mathematics*, **55**,
47–68.

Kumasaka, N. and Shibata, R. (2008). High-dimensional data
visualisation: The textile plot, *Computational Statistics and Data
Analysis*, **52**, 3616–3644.

Sei, T. and Tanaka, U. (2015). Geometric Properties of Textile Plot.
In: *Geometric Science of Information*, *Lecture Notes in Computer Science*,
**9389**, 732–739.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_046"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:10 098 Lecture Theatre (260-098)</p></div>
## Intensity Estimation Of Spatial Point Processes Based On Area-Aggregated Data {.unnumbered}
<p style="text-align:center">
Hsin-Cheng Huang and Chi-Wei Lai<br />
Academia Sinica<br />
</p>
<span>**Abstract:**</span> We consider estimation of the intensity function
of a spatial point process based on area-aggregated data. A standard
approach for estimating the intensity function of a spatial point
pattern is to use a kernel estimator. However, when data are only
available in spatially aggregated form, as numbers of events in
geographical subregions, traditional methods developed for
individual-level event data become infeasible. In this research, we
propose a kernel-based method that produces a smooth intensity function
from aggregated count data. Numerical examples are provided to
demonstrate the effectiveness of the proposed method.
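
The standard individual-level kernel estimator that the abstract contrasts with can be written as $\hat\lambda(s) = \sum_i k_h(s - s_i)$. The Python sketch below assumes an isotropic Gaussian kernel with bandwidth $h$ and applies no edge correction; it needs every event location $s_i$, which is exactly what area-aggregated counts do not provide. The authors' aggregated-data method is not reproduced here, and the function name is illustrative.

```python
import math


def kernel_intensity(events, s, h):
    """Gaussian-kernel estimate of a 2-D intensity at location s (no edge correction).

    events: list of (x, y) event locations; s: (x, y); h: bandwidth.
    """
    norm = 1.0 / (2.0 * math.pi * h * h)  # makes each kernel integrate to one
    total = 0.0
    for x, y in events:
        d2 = (s[0] - x) ** 2 + (s[1] - y) ** 2
        total += norm * math.exp(-d2 / (2.0 * h * h))
    return total


# Crude check: a Riemann sum of the estimate over a large grid recovers the event count
events = [(0.0, 0.0), (1.0, 0.5), (-0.5, 2.0)]
step = 0.1
grid = [i * step for i in range(-100, 101)]
mass = sum(kernel_intensity(events, (gx, gy), h=1.0) for gx in grid for gy in grid) * step ** 2
print(round(mass, 3))  # 3.0
```

Integrating $\hat\lambda$ over the plane returns the number of events because each kernel integrates to one; inside a bounded study region, edge effects make this only approximate.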

<span>**Keywords:**</span> Area censoring, inhomogeneous spatial point
processes, kernel density estimation
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_115"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:10 OGGB4 (260-073)</p></div>
## Bayesian Inference For Population Attributable Measures {.unnumbered}
<p style="text-align:center">
Sarah Pirikahu, Geoff Jones, Martin Hazelton, and Cord Heuer<br />
Massey University<br />
</p>
<span>**Abstract:**</span> Epidemiologists often wish to determine the population impact of an
intervention to remove or reduce a risk factor. Population attributable
measures, such as the population attributable risk (PAR) and
population attributable fraction (PAF), provide a means of assessing
this impact in a way that is accessible to a non-statistical audience.
To apply these concepts to epidemiological data, the calculation of
estimates and confidence intervals for these measures should take into
account the study design (cross-sectional, case-control, survey) and any
sources of uncertainty (such as measurement error in exposure to the
risk factor). We provide methods to produce estimates and Bayesian
credible intervals for the PAR and PAF from common epidemiological study
types and assess their frequentist properties. The model is then extended
by incorporating uncertainty due to the use of imperfect diagnostic
tests for disease or exposure. The resulting model can be
non-identifiable, causing convergence problems for common MCMC samplers,
such as Gibbs and Metropolis-Hastings. An alternative importance
sampling method performs much better for these non-identifiable models
and can be used to explore the limiting posterior distribution. The data
used to estimate these population attributable measures may include
multiple risk factors in addition to the one being considered for
removal. Uncertainty regarding the distribution of these risk factors in
the population affects the inference for PAR and PAF. To allow for this
we propose a methodology involving the Bayesian bootstrap. We also
extend the analysis to allow for complex survey designs with unequal
weights, stratification and clustering.
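
For a cross-sectional design with exposure $E$ and disease $D$, the usual definitions are $\mathrm{PAR} = P(D) - P(D \mid \bar E)$ and $\mathrm{PAF} = \mathrm{PAR}/P(D)$. The Python sketch below computes both from a $2 \times 2$ table and draws PAF posterior samples under an assumed uniform Dirichlet prior on the four cell probabilities. It is a simple conjugate illustration only: it ignores misclassification, additional risk factors and survey weights, which are the extensions the abstract addresses, and the function names are hypothetical.

```python
import random


def par_paf(a, b, c, d):
    """Point estimates of PAR and PAF from a cross-sectional 2x2 table.

    a: exposed & diseased    b: exposed & not diseased
    c: unexposed & diseased  d: unexposed & not diseased
    """
    p_d = (a + c) / (a + b + c + d)   # P(D)
    p_d_unexposed = c / (c + d)       # P(D | not exposed)
    par = p_d - p_d_unexposed
    return par, par / p_d


def paf_posterior(a, b, c, d, draws=4000, seed=1):
    """Sorted posterior draws of the PAF under a Dirichlet(1, 1, 1, 1) prior on the cells."""
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        # Posterior is Dirichlet(a+1, b+1, c+1, d+1): draw independent gammas, then normalise
        g = [rng.gammavariate(n + 1, 1.0) for n in (a, b, c, d)]
        p_d = (g[0] + g[2]) / sum(g)
        p_d_unexposed = g[2] / (g[2] + g[3])
        samples.append((p_d - p_d_unexposed) / p_d)
    return sorted(samples)


print(par_paf(30, 70, 10, 90))  # (0.1, 0.5)
post = paf_posterior(30, 70, 10, 90)
interval = (post[int(0.025 * len(post))], post[int(0.975 * len(post))])
```

With 30 exposed and 10 unexposed cases among 100 exposed and 100 unexposed people, $P(D) = 0.2$ and $P(D \mid \bar E) = 0.1$, so $\mathrm{PAR} = 0.1$ and $\mathrm{PAF} = 0.5$. The 2.5% and 97.5% quantiles of the posterior draws give an equal-tailed credible interval.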
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_147"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:10 OGGB5 (260-051)</p></div>
## An Information Criterion For Prediction With Auxiliary Variables Under Covariate Shift {.unnumbered}
<p style="text-align:center">
Takahiro Ido^1^, Shinpei Imori^1,2^, and Hidetoshi Shimodaira^2,3^<br />
^1^Osaka University<br />
^2^RIKEN Center for Advanced Intelligence Project (AIP)<br />
^3^Kyoto University<br />
</p>
<span>**Abstract:**</span> When modeling data of interest, it is
beneficial to exploit secondary information. Such secondary information,
called auxiliary variables, may not be observed in test data because it
is not of primary interest. In this paper, we incorporate the auxiliary
variables into a supervised learning framework. Furthermore, we consider
a covariate shift situation, in which the density function of the
covariates may differ between the test and training data. It is known
that the Maximum Log-likelihood Estimate (MLE) is not a good estimator
under model misspecification and covariate shift. This problem can be
resolved by the Maximum Weighted Log-likelihood Estimate (MWLE).
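
In the standard covariate-shift formulation (as in Shimodaira, 2000), training covariates have density $q_0(x)$ and test covariates have density $q_1(x)$, and the MWLE reweights each training observation by the density ratio. This is the generic form only; how the authors incorporate the auxiliary variables is not shown:

$$
\hat\theta_{w} = \arg\max_{\theta} \sum_{i=1}^{n} w(x_i) \log p(y_i \mid x_i; \theta), \qquad w(x) = \frac{q_1(x)}{q_0(x)} .
$$

Setting $w(x) \equiv 1$ recovers the MLE, which is why a criterion derived for the MLE, such as AIC, need not remain valid for the MWLE.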

When there are multiple candidate models, we need to select the best
candidate model, where optimality is measured by the expected
Kullback-Leibler (KL) divergence. The Akaike information criterion (AIC)
is a well-known criterion based on the KL divergence and the MLE.
Therefore, its validity is not guaranteed when the MWLE is used under
covariate shift. An information criterion under covariate shift was
proposed in Shimodaira (2000, JSPI), but this criterion does not take
the use of auxiliary variables into account. We resolve this problem by
deriving a new criterion. In addition, we conduct simulations to
examine the improvement.

<span>**Keywords:**</span> Auxiliary variables; Covariate shift;
Information criterion; Kullback-Leibler divergence; Misspecification;
Predictions.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_118"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:10 Case Room 2 (260-057)</p></div>
## Analysis Of A Brief Telephone Intervention For Problem Gambling And Examining The Impact On Co-Existing Depression {.unnumbered}
<p style="text-align:center">
Nick Garrett, Maria Bellringer, and Max Abbott<br />
Auckland University of Technology<br />
</p>
<span>**Abstract:**</span> This study investigated the outcomes of a brief telephone intervention
for problem gambling. A total of 150 callers were recruited and followed
for 36 months. After giving consent, participants received a baseline
assessment followed by a manualised version of the helpline’s standard
care. Eighty-six percent of participants were re-assessed at three
months, 79% … Depression is often associated with problem gambling
behaviour, so we used logistic regression to examine the impact of a
brief telephone intervention for problem gambling on rates of
depression. At baseline, depression was found to be associated with
gender, problem gambling risk (PGSI), and deprivation (NZiDep). A
multivariable model found that PGSI and mental health medication best
explained depression at baseline. A repeated measures logistic
regression using all 36 months of data found that PGSI, NZiDep, and
mental health medication were the best variables to explain the change
over time. We conclude that the intervention’s impact on problem
gambling behaviour also changed depression rates; however, deprivation
and mental health medication also contributed.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_175"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:10 Case Room 3 (260-055)</p></div>
## Prior-Based Bayesian Information Criterion {.unnumbered}
<p style="text-align:center">
M. J. Bayarri^1^, James Berger^2^, Woncheol Jang^3^, Surajit Ray^4^, Luis Pericchi^5^, and Ingmar Visser^6^<br />
^1^University of Valencia<br />
^2^Duke University<br />
^3^Seoul National University<br />
^4^University of Glasgow<br />
^5^University of Puerto Rico<br />
^6^University of Amsterdam<br />
</p>
<span>**Abstract:**</span> We present a new approach to model selection
and Bayes factor determination, based on Laplace expansions (as in BIC),
which we call Prior-based Bayes Information Criterion (PBIC). In this
approach, the Laplace expansion is only done with the likelihood
function, and then a suitable prior distribution is chosen to allow
exact computation of the (approximate) marginal likelihood arising from
the Laplace approximation and the prior. The result is a closed-form
expression similar to BIC, but one that now involves a term arising from the
prior distribution (which BIC ignores) and also incorporates the idea
that different parameters can have different effective sample sizes
(whereas BIC only allows one overall sample size $n$). We also consider
a modification of PBIC which is more favorable to complex models.
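For reference, the standard BIC that PBIC builds on penalises every parameter by the same $\log n$. The Python sketch below is our own illustration, not the authors' PBIC: it computes BIC for two hypothetical models and the usual Schwarz approximation to their Bayes factor. The log-likelihood values are made up for illustration.

```python
import math


def bic(log_lik: float, n_params: int, n_obs: int) -> float:
    """Standard BIC: -2 log L + k log n, using one overall sample size n."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


# Hypothetical maximised log-likelihoods for two nested models, n = 100.
bic_small = bic(log_lik=-120.0, n_params=2, n_obs=100)
bic_large = bic(log_lik=-118.5, n_params=4, n_obs=100)

# Schwarz approximation: Bayes factor (small vs large) ~ exp((BIC_large - BIC_small) / 2).
approx_bayes_factor = math.exp((bic_large - bic_small) / 2.0)
preferred = "small" if bic_small < bic_large else "large"
print(f"BIC small={bic_small:.2f}, large={bic_large:.2f}, BF~{approx_bayes_factor:.1f}")
```

PBIC replaces the common $k \log n$ penalty with parameter-specific effective sample sizes and a prior-derived term. Those details are not reproduced here.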

<span>**Keywords:**</span> Bayes factors, model selection, Cauchy
priors, consistency, effective sample size, Fisher information, Laplace
expansions, robust priors
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_208"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:10 Case Room 4 (260-009)</p></div>
## Early Childhood Dental Decay {.unnumbered}
<p style="text-align:center">
Sarah Sonal<br />
University of Canterbury<br />
</p>
<span>**Abstract:**</span> Our teeth are some of our most useful tools. They let us eat tasty food, take those plastic tags off new clothes and enhance our smiles to convey joy. They also have to last us a lifetime and need to be looked after. Teeth are a mutually supportive structure; even one extraction can destabilize the remaining teeth. Early intervention in oral health can prevent a lifetime of discomfort, embarrassment and expensive treatments. An issue facing dentists in New Zealand and abroad is preschool children missing treatment appointments. These children are reported to have more dental issues in later childhood.

The research question I aim to answer is: Does early dental neglect increase dental issues in later childhood? My thesis will use traditional statistics along with data mining and machine learning techniques to investigate these anecdotal claims.

Using the geographical information in the dataset, I will use deprivation data from Statistics New Zealand to investigate whether these children come from more deprived neighborhoods.

<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_067"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:30 098 Lecture Theatre (260-098)</p></div>
## Geographically Weighted Principal Component Analysis For Spatio-Temporal Statistical Dataset {.unnumbered}
<p style="text-align:center">
Narumasa Tsutsumida^1^, Paul Harris^2^, and Alexis Comber^3^<br />
^1^Kyoto University<br />
^2^Rothamsted Research<br />
^3^University of Leeds<br />
</p>
<span>**Abstract:**</span> Spatio-temporal statistical datasets are
becoming widely available for social, economic, and environmental
research; however, they are often difficult to summarize, and their
complexity makes hidden spatial and temporal patterns hard to uncover. Geographically
weighted principal component analysis (GWPCA), which uses a moving
window or kernel to apply localized PCAs across geographical space, may
be well suited to this task, but optimizing the kernel bandwidth and
determining the number of components to retain (NCR) have been the main concerns
(Tsutsumida et al., 2017). In this research we determine both
simultaneously by minimizing the leave-one-out residual
coefficient of variation of GWPCA over bandwidth sizes and NCR.
As a case study we use annual goat population statistics across 341
administrative units in Mongolia over 1990–2012, and show spatio-temporal
variations in the data, especially those influenced by natural disasters.
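GWPCA repeats a PCA at each location using observations weighted by a distance-decay kernel. The Python sketch below illustrates one such local step for two variables. It assumes a bisquare kernel (a common choice in geographically weighted models, but an assumption here) and a weighted covariance normalised by the sum of the weights. The function names and toy data are hypothetical, and the bandwidth and NCR selection described in the talk is not shown.

```python
import math


def bisquare_weight(distance: float, bandwidth: float) -> float:
    """Bisquare kernel: (1 - (d/b)^2)^2 inside the bandwidth, 0 outside."""
    if distance >= bandwidth:
        return 0.0
    return (1.0 - (distance / bandwidth) ** 2) ** 2


def local_covariance_2d(points, values, centre, bandwidth):
    """Kernel-weighted covariance of two variables around `centre`.

    points: (x, y) coordinates; values: (v1, v2) observations at those points.
    Returns (a, b, c) for the symmetric matrix [[a, b], [b, c]],
    normalised by the sum of the weights.
    """
    w = [bisquare_weight(math.dist(p, centre), bandwidth) for p in points]
    total = sum(w)
    m1 = sum(wi * v[0] for wi, v in zip(w, values)) / total
    m2 = sum(wi * v[1] for wi, v in zip(w, values)) / total
    a = sum(wi * (v[0] - m1) ** 2 for wi, v in zip(w, values)) / total
    b = sum(wi * (v[0] - m1) * (v[1] - m2) for wi, v in zip(w, values)) / total
    c = sum(wi * (v[1] - m2) ** 2 for wi, v in zip(w, values)) / total
    return a, b, c


def eigenvalues_2x2(a, b, c):
    """Eigenvalues of [[a, b], [b, c]], largest first (local component variances)."""
    half_trace = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)
    return half_trace + radius, half_trace - radius


# Hypothetical data: the far point (3, 3) lies outside the bandwidth and gets weight 0,
# so locally the two variables are perfectly collinear.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 3.0)]
vals = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (10.0, 0.0)]
big_eig, small_eig = eigenvalues_2x2(*local_covariance_2d(pts, vals, (0.0, 0.0), 2.0))
share_first = big_eig / (big_eig + small_eig)  # proportion of local variance on PC1
```

Repeating this at every location, and varying the bandwidth and the number of components kept, gives the kind of search the abstract describes.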

<span>**Keywords:**</span> Geographically weighted model,
Spatio-temporal data, Parameter optimization

<span>**References:**</span>

Tsutsumida, N., Harris, P. and Comber, A. (2017). The Application of a
Geographically Weighted Principal Component Analysis for Exploring
Twenty-three Years of Goat Population Change across Mongolia. *Annals of
the American Association of Geographers*, **107(5)**, 1060–1074.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_116"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:30 OGGB4 (260-073)</p></div>
## Dimensionality Reduction Of Multivariate Data For Bayesian Analysis {.unnumbered}
<p style="text-align:center">
Anjali Gupta^1^, James Curran^1^, Sally Coulson^2^, and Christopher Triggs^1^<br />
^1^University of Auckland<br />
^2^ESR<br />
</p>
<span>**Abstract:**</span> In 2004, Aitken and Lucy published an article detailing a two-level
likelihood ratio for multivariate trace evidence. This model has been
adopted in a number of forensic disciplines such as the interpretation
of glass, drugs (MDMA), and ink. Modern instrumentation is capable of
measuring many elements in very low quantities and, not surprisingly,
forensic scientists wish to exploit the potential of this extra
information to increase the weight of this evidence. The issue, from a
statistical point of view, is that the increase in the number of
variables (dimension) in the problem leads to increased data demand to
understand both the variability within a source and the variability between sources.
Such information will come in time, but usually we don’t have enough.
One solution to this problem is to attempt to reduce the dimensionality
through methods such as principal component analysis. This practice is
quite common in high dimensional machine learning problems. In this
talk, I will describe a study where we attempt to quantify the effects
of this approach on the resulting likelihood ratios using data
obtained from an SEM-EDX instrument.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_016"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:30 OGGB5 (260-051)</p></div>
## An EWMA Chart For Monitoring Covariance Matrix Based On Dissimilarity Index {.unnumbered}
<p style="text-align:center">
Longcheen Huwang<br />
National Tsing Hua University<br />
</p>
<span>**Abstract:**</span> In this talk, we propose an EWMA chart for
monitoring the covariance matrix based on the dissimilarity index of two
matrices. It differs from conventional EWMA charts for
monitoring the covariance matrix, which compare the sum,
the product, or both, of the eigenvalues of the estimated EWMA covariance
matrix with those of the in-control (IC) covariance matrix. The proposed chart
essentially monitors the covariance matrix by comparing the individual
eigenvalues of the estimated EWMA covariance matrix with those of the
covariance matrix estimated from the IC phase I data. We evaluate the
performance of the proposed chart by comparing it with the best existing
chart under a multivariate normal process. Furthermore, to prevent the
control limit of the proposed EWMA chart, estimated from limited IC phase I
data, from producing excessive false alarms, we use a bootstrap
method to adjust the control limit so that the proposed chart
has an actual IC average run length no less than the nominal value with
a specified probability. Finally, we use an example to demonstrate the
applicability and implementation of the proposed chart.
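A minimal Python sketch of the EWMA covariance recursion that charts of this kind are built on (in the spirit of Hawkins and Maboudou-Tchao, 2008). It assumes bivariate observations already standardized against the IC mean and covariance, so the IC covariance is the identity and both IC eigenvalues equal 1. It does not implement the dissimilarity-index statistic, the control limit, or the bootstrap adjustment from the talk. The smoothing constant and the deterministic toy shift are illustrative assumptions.

```python
import math


def ewma_covariance_2d(observations, lam=0.1):
    """EWMA covariance S_t = (1 - lam) S_{t-1} + lam x_t x_t^T for 2-D data.

    Assumes observations are standardized with the IC mean and covariance,
    so the recursion starts at the IC value S_0 = I. Each matrix is returned
    as (a, b, c), meaning [[a, b], [b, c]].
    """
    a, b, c = 1.0, 0.0, 1.0
    history = []
    for x1, x2 in observations:
        a = (1 - lam) * a + lam * x1 * x1
        b = (1 - lam) * b + lam * x1 * x2
        c = (1 - lam) * c + lam * x2 * x2
        history.append((a, b, c))
    return history


def eigenvalues_2x2(a, b, c):
    """Eigenvalues of [[a, b], [b, c]], largest first."""
    half_trace = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)
    return half_trace + radius, half_trace - radius


# Hypothetical out-of-control shift: variances become 1.5 and 0.5, so the
# trace stays at its IC value of 2 while the individual eigenvalues separate.
s1, s2 = math.sqrt(1.5), math.sqrt(0.5)
obs = [(s1, s2) if t % 2 == 0 else (s1, -s2) for t in range(100)]
history = ewma_covariance_2d(obs, lam=0.1)
l1, l2 = eigenvalues_2x2(*history[-1])
print(f"trace={history[-1][0] + history[-1][2]:.3f}, eigenvalues=({l1:.3f}, {l2:.3f})")
```

In this toy sequence a statistic based only on the sum of the eigenvalues sees no change, while the individual eigenvalues move to roughly 1.5 and 0.5 (their product drops to about 0.75). This illustrates why comparing individual eigenvalues can matter.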

<span>**Keywords:**</span> Average run length, dissimilarity index,
EWMA, out-of-control

<span>**References:**</span>

Hawkins, D.M. and Maboudou-Tchao, E.M. (2008). Multivariate exponentially
weighted moving covariance matrix. <span>*Technometrics*</span>,
<span>**50**</span>, 155-166.

Kano, M., Hasebe, S. and Hashimoto, I. (2002). Statistical process
monitoring based on dissimilarity of process data. <span>*AIChE
Journal*</span>, <span>**48**</span>, 1231-1240.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_162"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:30 Case Room 2 (260-057)</p></div>
## Adjusting For Linkage Bias In The Analysis Of Record-Linked Data {.unnumbered}
<p style="text-align:center">
Patrick Graham<br />
Stats NZ and Bayesian Research<br />
</p>
<span>**Abstract:**</span> Data formed from record-linkage of two or
more datasets are an increasingly important source of data for public
health and social science research. For example, a study cohort may be
linked to administrative data in order to add outcome or covariate
information to data collected directly from study participants. However,
regardless of the linkage method, it is often the case that not all
records are linked. Further, linkage rates usually vary with
characteristics of analytical interest and this differential linkage can
bias analyses restricted just to linked records. While linked records
have full outcome and covariate information, unlinked records exhibit
“block-missingness” whereby the values for the entire block of variables
contained in the file that is linked to are missing for unlinked
records. Similar missing data structures occur in other contexts,
including panel studies when participants decline participation in one
or more study waves. In this paper, I consider the problem of adjusting
for linkage bias from both Bayesian and frequentist perspectives. A
basic distinction is whether analysis is based on all available data or
just the linked cases. The Bayesian perspective leads to the former
option and to Gibbs sampling and multiple imputation as reasonable
methods. Basing analysis only on the linked cases seems to require a
frequentist perspective and leads to inverse probability of linkage
weighting and conditional maximum likelihood as reasonable approaches.
The implications of the assumption of ignorable linkage also differ
somewhat between the approaches. A simulation investigation confirms
that, assuming ignorable linkage given observed data, multiple
imputation, conditional maximum likelihood and inverse probability of
linkage weighting all succeed in adjusting for linkage bias and achieve
nominal interval coverage rates. Conditional maximum likelihood is
slightly more efficient than inverse probability of linkage weighting,
and multiple imputation can be more efficient than conditional
maximum likelihood. Extensions to the case of non-ignorable linkage are
also considered.
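Inverse probability of linkage weighting can be sketched in a few lines of Python. The sketch assumes linkage is ignorable given a single observed covariate group. It estimates each record's linkage probability as the linked fraction in its group, a simple stand-in for the linkage model one would fit in practice. The toy records are hypothetical: the outcome is 1 in group A, which links completely, and 0 in group B, which mostly fails to link.

```python
from collections import defaultdict


def ipw_linked_mean(records):
    """Inverse-probability-of-linkage weighted mean of an outcome.

    Each record is a dict with keys 'group' (covariate observed for everyone),
    'linked' (bool) and 'y' (outcome, only meaningful when linked).
    P(linked | group) is estimated by the linked fraction in that group.
    """
    n_total = defaultdict(int)
    n_linked = defaultdict(int)
    for r in records:
        n_total[r["group"]] += 1
        n_linked[r["group"]] += int(r["linked"])

    weighted_sum = 0.0
    weight_total = 0.0
    for r in records:
        if r["linked"]:
            weight = n_total[r["group"]] / n_linked[r["group"]]  # = 1 / p_hat
            weighted_sum += weight * r["y"]
            weight_total += weight
    return weighted_sum / weight_total


# Hypothetical cohort: group A links fully, group B mostly fails to link.
records = (
    [{"group": "A", "linked": True, "y": 1.0} for _ in range(10)]
    + [{"group": "B", "linked": True, "y": 0.0} for _ in range(2)]
    + [{"group": "B", "linked": False, "y": None} for _ in range(8)]
)
linked_y = [r["y"] for r in records if r["linked"]]
naive_mean = sum(linked_y) / len(linked_y)   # linked records only
adjusted_mean = ipw_linked_mean(records)     # reweighted by 1 / P(linked | group)
print(naive_mean, adjusted_mean)
```

Restricting to linked records gives 10/12. If the unlinked group B records also have outcome 0, weighting each linked group B record by 5 recovers the full-cohort mean of 0.5.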

<span>**Keywords:**</span> Record linkage, Missing data, Bayesian
inference, Gibbs sampler, Multiple imputation
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_176"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:30 Case Room 3 (260-055)</p></div>
## Bayesian Semiparametric Hierarchical Models For Longitudinal Data Analysis With Application To Dose-Response Studies {.unnumbered}
<p style="text-align:center">
Taeryon Choi<br />
Korea University<br />
</p>
<span>**Abstract:**</span> In this work, we propose semiparametric
Bayesian hierarchical additive mixed effects models for analyzing either
longitudinal data or clustered data with applications to dose-response
studies. In the semiparametric mixed effects model structure, we
estimate nonparametric smoothing functions of continuous covariates by
using a spectral representation of Gaussian processes and the
subject-specific random effects by using Dirichlet process mixtures. In
this framework, we develop semiparametric mixed effects models that
include normal regression and quantile regressions with or without shape
restrictions. In addition, we consider Bayesian nonparametric
measurement error models, or errors-in-variables regression models, using
Fourier series and Dirichlet process mixtures, in which the true
covariate is unobservable and only a surrogate of the true covariate is
observed. The proposed methodology is compared with other existing
approaches to additive mixed models in simulation studies and benchmark
data examples. More importantly, we consider a real data application for
dose-response analysis, in which measurement errors and shape
constraints in the regression functions need to be incorporated with
inter-study variability.

<span>**Keywords:**</span> Cadmium toxicity, Cosine series,
Dose-response study, Hierarchical Model, Measurement errors, Shape
restriction
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_091"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:30 Case Room 4 (260-009)</p></div>
## Optimizing Junior Rugby Weight Limits {.unnumbered}
<p style="text-align:center">
Emma Campbell, Ankit Patel, and Paul Bracewell<br />
DOT Loves Data<br />
</p>
<span>**Abstract:**</span> The New Zealand rugby community is aware of safety issues within the
junior game and has applied weight limits for each tackle grade to
minimize injury risk. However, for heavier children this can create an
uncomfortable situation as they may no longer be playing with their peer
group. The study evaluated almost 13,000 observations from junior rugby
players across three seasons (2015-2017) using data supplied by
Wellington Rugby. To protect privacy, the data was structured so that an
individual could not be readily identified but could be tracked across
seasons to determine churn. As data for several consecutive seasons was
available, we could determine the likelihood of a junior player
returning the following season and isolate the drivers of this
behaviour. Using logistic regression and repeated measures analysis,
the study determined whether children who are over the specified weight limit
for their age group are more likely to leave the game. Furthermore,
assuming the importance of playing with peers, the study identified the
impact of age in relation to the date-of-birth cut-off of January 1st.
This is of interest given that a child playing above their age-weight
grade could be competing against individuals three school years above
them. The study primarily focuses on determining the optimal age-weight
bands while the secondary focus is on determining the likelihood of a
junior Wellington rugby player returning the following season and
isolating the drivers of this behaviour.

<span>**Keywords:**</span> Logistic regression, repeated measures, player retention,
optimization
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_144"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:50 098 Lecture Theatre (260-098)</p></div>
## Spatial Scan Statistics For Matched Case-Control Data {.unnumbered}
<p style="text-align:center">
Inkyung Jung<br />
Yonsei University College of Medicine<br />
</p>
<span>**Abstract:**</span> Spatial scan statistics are widely used for
cluster detection analysis in geographical disease surveillance. While
the method has been developed for various types of data such as binary,
count and continuous data, spatial scan statistics for matched
case-control data, which often arise in spatial epidemiology, have not
been considered yet. In this paper, we propose two spatial scan
statistics for matched case-control data. The proposed test statistics
properly consider the correlations between matched pairs. We evaluate
statistical power and cluster detection accuracy of the proposed methods
through simulations, comparing them with the Bernoulli-based method. We
illustrate the methods with a real data example.
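For comparison, the Bernoulli-based spatial scan statistic scores each candidate zone by a likelihood ratio. The Python sketch below computes the standard Kulldorff Bernoulli log-likelihood ratio for a single zone. It is our illustration; the matched-pair statistics proposed in the talk are not reproduced. In a full analysis the maximum over all candidate zones is compared with its Monte Carlo distribution under the null, which is also omitted here. The counts are hypothetical.

```python
import math


def _xlogy(x: float, y: float) -> float:
    """x * log(y), using the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def bernoulli_scan_llr(c: int, n: int, C: int, N: int) -> float:
    """Kulldorff's Bernoulli log-likelihood ratio for one candidate zone.

    c, n: cases and total individuals inside the zone;
    C, N: cases and total individuals in the whole study region.
    Returns 0 unless the case proportion inside exceeds that outside.
    """
    c_out, n_out = C - c, N - n
    if n == 0 or n_out == 0 or c / n <= c_out / n_out:
        return 0.0
    ll_alt = (_xlogy(c, c / n) + _xlogy(n - c, (n - c) / n)
              + _xlogy(c_out, c_out / n_out)
              + _xlogy(n_out - c_out, (n_out - c_out) / n_out))
    ll_null = _xlogy(C, C / N) + _xlogy(N - C, (N - C) / N)
    return ll_alt - ll_null


# Hypothetical zone with 12 cases among 20 people, in a region with 40 cases among 200.
llr = bernoulli_scan_llr(12, 20, 40, 200)
print(round(llr, 2))
```

This statistic treats every individual as independent. The correlation within matched case-control pairs, which the proposed methods account for, is exactly what it ignores.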

<span>**Keywords:**</span> Spatial epidemiology, cluster detection,
SaTScan, McNemar test, conditional logistic regression
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_124"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:50 OGGB4 (260-073)</p></div>
## Whitebait In All Its Varieties: One Fish, Two Fish, Three, Four, Five Fish. {.unnumbered}
<p style="text-align:center">
Bridget Armstrong<br />
University of Canterbury<br />
</p>
<span>**Abstract:**</span> Five species of fish of the genus *Galaxias* make up whitebait catches in New Zealand, although one species (*G. maculatus*) makes up &gt;90% of the catch. Whitebait are immature post-larval fish that have yet to develop the distinctive morphological traits of adults, so at this tiny stage the five species are difficult to tell apart. There are also distinct spatial (among rivers) and temporal (across months of the whitebait fishing season) differences among species and even within species. To manage the fishery better, it is necessary to identify regional differences in the species composition of catches, which is difficult because of the time and effort required to sample catches and identify species morphologically or genetically. In my study, I will use a recently compiled database of 17,000 entries on whitebait samples, species composition, and variability to develop a statistical model that predicts the likely species composition of catches throughout New Zealand. This probabilistic model could become a powerful tool for managing the fishery and conserving whitebait species, some of which are considered threatened.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_191"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:50 OGGB5 (260-051)</p></div>
## Latent Variable Models And Multivariate Binomial Data {.unnumbered}
<p style="text-align:center">
John Holmes<br />
University of Otago<br />
</p>
<span>**Abstract:**</span> A large body of work has been devoted to
latent variable models applicable to multivariate binary data. However,
little work has been put into extending these models to cases where the
observed data are multivariate binomial. In this paper, we will first
show that models that use either a logit or a probit link function offer
the same level of modelling flexibility in the binary case, but only the
logit link fits into a data augmentation approach that compactly extends
from binary to binomial. Secondly, we will demonstrate that multivariate
binomial data provides greater flexibility in how the link function can
be represented. Lastly, we will consider properties of the implied
distribution of latent probabilities under a logit link.

<span>**Keywords:**</span> Multivariate binomial data, principal
components/factor analysis, item response theory, link functions,
logit-normal distributions

<span>**References:**</span>

Bartholomew, D. J., Knott, M. and Moustaki, I. (2011). *Latent
Variable Models and Factor Analysis: A Unified Approach* (3rd ed.). Chichester:
John Wiley & Sons.

Johnson, N. L. (1949). Systems of Frequency Curves Generated by Methods
of Translation. *Biometrika*, **36**, 149–176.

Polson, N. G., Scott, J. G. and Windle, J. (2013). Bayesian inference
for logistic models using Pólya-gamma latent
variables. *Journal of the American Statistical Association*, **108**,
1339–1349.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_193"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:50 Case Room 2 (260-057)</p></div>
## Asking About Sex In General Health Surveys: Comparing The Methods And Findings Of The 2010 Health Survey For England With Those Of The Third National Survey Of Sexual Attitudes And Lifestyles {.unnumbered}
<p style="text-align:center">
Philip Prah^1^, Anne Johnson^2^, Soazig Clifton^2^, Jennifer Mindell^2^, Andrew Copas^2^, Chloe Robinson^3^, Rachel Craig^3^, Sarah Woodhall^2^, Wendy Macdowall^4^, Elizabeth Fuller^3^, Bob Erens^2^, Pam Sonnenberg^2^, Kaye Wellings^4^, Catherine Mercer^2^, and Anthony Nardone^5^<br />
^1^Auckland University of Technology<br />
^2^University College London<br />
^3^NatCen<br />
^4^London School of Hygiene & Tropical Medicine<br />
^5^Public Health England<br />
</p>
<span>**Abstract:**</span> Including questions about sexual health in the annual Health Survey for
England (HSE) provides opportunities for regular measurement of key
public health indicators, augmenting Britain’s decennial National Survey
of Sexual Attitudes and Lifestyles (Natsal). However, contextual and
methodological differences may limit comparability of the findings. For
instance, both surveys used self-completion for administering sexual
behaviour questions, but this was via computer-assisted self-interview
(CASI) in Natsal-3 and a pen-and-paper questionnaire in HSE 2010. We
examine the extent of these differences between HSE 2010 and Natsal-3
(undertaken 2010-2012) and investigate their impact on parameter
estimates. We restricted the analysis to
men and women in the 2010 HSE (n = 2,782 men and 3,588 women) and
Natsal-3 (n = 4,882 men and 6,869 women) aged 16-69 years and resident
in England. We compared their demographic characteristics, the amount of
non-response to, and estimates from, sexual health questions. We used
complex survey analysis to take into account stratification, clustering,
and weighting of the data in each survey. Logistic regression was used
to measure the extent to which sexual health estimates differ in HSE
2010 relative to Natsal-3, with multivariable models to adjust for
significant demographic confounders. We additionally investigated
age-group interactions to see whether differences between the surveys varied
by age. The surveys achieved similar response rates, both around 60% … While
a relatively high response to sexual health questions in HSE 2010
demonstrates the feasibility of asking such questions in a general
health survey, differences with Natsal-3 do exist. These are likely due
to the HSE’s context as a general health survey and methodological
limitations such as its current use of pen-and-paper questionnaires.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_132"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:50 Case Room 3 (260-055)</p></div>
## Bayesian Continuous Space-Time Model Of Burglaries {.unnumbered}
<p style="text-align:center">
Chaitanya Joshi, Paul Brown, and Stephen Joe<br />
University of Waikato<br />
</p>
<span>**Abstract:**</span> A model of crime with
good predictive accuracy has great value in enabling efficient use of
policing resources and reducing crime. Building such models is not
straightforward, however, because of the dynamic nature of the crime process.
Crime not only evolves over both space and time, but is also related
to several complex socio-economic factors, not all of which can be
measured directly and accurately. The last decade or more has seen a
surge in efforts to model crime more accurately, yet many of the models
developed so far have failed to capture crime with a high degree of
accuracy. The main reasons may be that these models discretise
space using grid cells and that they are spatial rather than spatio-temporal.
We fit a log Gaussian Cox process model using the INLA-SPDE approach.
This not only allows us to capture crime as a process continuous in both
space and time, but also allows us to include socio-economic factors as
well as the ’near repeat’ phenomenon. In this talk, we will discuss the
model building process and the accuracy achieved.

<span>**Keywords:**</span> Bayesian spatio-temporal model, INLA-SPDE,
predicting crime
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_164"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:50 Case Room 4 (260-009)</p></div>
## Tolerance Limits For The Reliability Of Semiconductor Devices Using Longitudinal Data {.unnumbered}
<p style="text-align:center">
Vera Hofer^1^, Johannes Leitner^1^, Horst Lewitschnig^2^, and Thomas Nowak^1^<br />
^1^University of Graz<br />
^2^Infineon Technologies Austria AG<br />
</p>
<span>**Abstract:**</span> Especially in the automotive industry, semiconductor devices are key components for the proper functioning of the entire vehicle. Therefore, issues concerning the reliability of these components are of crucial
importance to manufacturers of semiconductor devices.

In this quality control task, we consider longitudinal data from high-temperature
operating life tests. Manufacturers need to find
appropriate tolerance limits for their final electrical product tests,
such that the proper functioning of their devices is ensured. Based on
these datasets, we compute tolerance limits that could then be used by
automated test equipment for the ongoing quality control process.
Devices with electrical parameters within their respective tolerance
limits can successfully finish the production line, while all other
devices will be discarded. In calculating these tolerance limits, our
approach consists of two steps: First, the observed measurements are
transformed in order to capture measurement biases and gauge
repeatability and reproducibility. Then, in the second step, we compute
tolerance limits based on a multivariate copula model with skew-normal
margins. To solve the resulting optimization
problem, we propose a new derivative-free optimization procedure.

The capability of the model is demonstrated by computing optimal
tolerance limits for several drift patterns that are expected to cover a
wide range of scenarios. Based on these computations, we show the
resulting yield losses and analyze the performance of the tolerance
limits in a large simulation study.
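The margins in the copula model are skew-normal. As a reminder of that family, here is the Azzalini skew-normal density in Python, $f(x) = \tfrac{2}{\omega}\,\phi(z)\,\Phi(\alpha z)$ with $z = (x-\xi)/\omega$. The parameter names are ours, and the copula, measurement transformation and derivative-free optimization from the talk are not shown.

```python
import math


def skew_normal_pdf(x: float, loc: float = 0.0, scale: float = 1.0, shape: float = 0.0) -> float:
    """Azzalini skew-normal density: (2 / scale) * phi(z) * Phi(shape * z)."""
    z = (x - loc) / scale
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal pdf at z
    cdf = 0.5 * (1.0 + math.erf(shape * z / math.sqrt(2.0)))  # standard normal cdf at shape*z
    return 2.0 / scale * phi * cdf


# Sanity check: the density integrates to about 1 (trapezoidal rule on [-10, 10]).
step = 0.001
grid = [-10.0 + i * step for i in range(20001)]
dens = [skew_normal_pdf(x, shape=3.0) for x in grid]
area = step * (sum(dens) - 0.5 * (dens[0] + dens[-1]))
```

With `shape = 0` the density reduces to the normal, and changing the sign of `shape` mirrors it, so a single parameter controls the asymmetry of each margin.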

<span>**Acknowledgment**</span>

This work was supported by the ECSEL Joint Undertaking under grant
agreement No. 662133 - PowerBase. This Joint Undertaking receives
support from the European Union’s Horizon 2020 research and innovation
programme and Austria, Belgium, Germany, Italy, the Netherlands, Norway,
Slovakia, Spain and the United Kingdom.

<span>**Keywords:**</span> quality control, tolerance limits, copulas,
skew normal distribution
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_189"><p class="contribBanner">Tuesday 12<sup>th</sup> 16:00 098 Lecture Theatre (260-098)</p></div>
## Model-Checking For Regressions: A Local Smoothing-Based Global Smoothing Test {.unnumbered}
<p style="text-align:center">
Lingzhu Li and Lixing Zhu<br />
Hong Kong Baptist University<br />
</p>
<span>**Abstract:**</span> Local smoothing tests and global smoothing tests,
the two main classes of methods for checking model specification,
exhibit different characteristics. Compared with global smoothing tests,
local smoothing tests can only detect local alternatives distinct from
the null hypothesis at a much slower rate when the dimension of the
predictor vector is high, but can be more sensitive to high-frequency
alternatives. We suggest a projection-based test that builds a bridge
between the local and global smoothing methodologies to benefit from the
advantages of both. The test construction is based on a kernel
estimation-based local smoothing method, and the resulting test becomes a
distance-based global smoothing test. A closed-form expression of the
test statistic is derived and its asymptotic properties are
investigated. Simulations and a real data analysis are conducted to
evaluate the performance of the test in finite samples.
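Local smoothing tests such as that of Zheng (1996) are built on a kernel-weighted U-statistic of regression residuals. The Python sketch below computes that unstandardized statistic for a one-dimensional predictor with a Gaussian kernel. The kernel, bandwidth and toy data are our assumptions, and neither the projection-based statistic proposed in the talk nor the standardization needed for a formal test is reproduced. Residuals that share signs among nearby $x$ values, as when a straight line is fitted to curved data, push the statistic upward.

```python
import math


def zheng_statistic(x, residuals, bandwidth):
    """Unstandardized kernel U-statistic in the style of Zheng (1996), 1-D predictor.

    V_n = 1 / (n (n - 1)) * sum_{i != j} K((x_i - x_j) / h) / h * e_i * e_j,
    with a Gaussian kernel K and bandwidth h.
    """
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                u = (x[i] - x[j]) / bandwidth
                k = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
                total += k / bandwidth * residuals[i] * residuals[j]
    return total / (n * (n - 1))


# Misspecified fit: a straight line fitted by least squares to y = x^2.
xs = [i / 20 for i in range(20)]
ys = [v * v for v in xs]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)
intercept = my - slope * mx
res_quad = [b - (intercept + slope * a) for a, b in zip(xs, ys)]

# Residuals with no smooth pattern in x (alternating signs), for contrast.
res_alt = [(-1.0) ** i for i in range(20)]

v_quad = zheng_statistic(xs, res_quad, bandwidth=0.1)
v_alt = zheng_statistic(xs, res_alt, bandwidth=0.1)
```

The double loop makes the cost quadratic in $n$. Its slow detection rate in high dimensions, as the abstract notes, is the weakness the projection-based construction targets.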

<span>**Keywords:**</span> Global smoothing test, projection-based
methods, local smoothing test

<span>**References:**</span>

Zheng, J. X. (1996). A consistent test of functional form via
nonparametric estimation techniques. *Journal of Econometrics*, **75(2)**,
263–289.

Bierens, H. J. (1982). Consistent model specification tests. *Journal of
Econometrics*, **20**, 105–134.

Lavergne, P. and Patilea, V. (2012). One for all and all for one:
regression checks with many regressors. *Journal of Business & Economic
Statistics*, **30(1)**, 41–52.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_135"><p class="contribBanner">Tuesday 12<sup>th</sup> 16:00 OGGB4 (260-073)</p></div>
## Breeding Value Estimation In Partially-Genotyped Populations {.unnumbered}
<p style="text-align:center">
Alastair Lamont<br />
University of Otago<br />
</p>
<span>**Abstract:**</span> In livestock, a primary goal is the identification of individuals’
breeding values, a measure of their genetic worth. This identification
can be used to aid selective breeding, but is non-trivial because
datasets can be very large.

Measured traits are typically modelled as being caused by both breeding
values and also environmental fixed effects. An efficient method for
fitting this model was developed by Henderson (1984), based upon
generalized least squares. This method could be applied to data where
the pedigree (how the animals are related to one another) was fully
known.

Improvements in technology have allowed the genetic information of an
animal to be directly measured. These measurements can be taken very
early in life, with the goal of informing selective breeding faster and
more efficiently. Meuwissen (2001) adapted the standard model to
incorporate genetic data, and additionally developed multiple fitting
methods for this model.

Modern datasets are frequently only partially genotyped. The methods of
Meuwissen cannot be used for these data, as they are only applicable to
populations in which every individual is genotyped. Modern fitting
approaches aim to make use of the available genetic information without
requiring all individuals be genotyped.

These approaches tend to either impute or average over missing genotype
data, which can affect the overall accuracy of breeding value
estimation. We are developing an alternative which instead incorporates
missing data within the model, rather than having to adapt fitting
approaches to accommodate it.

Preliminary results suggest that approaching fitting in this way can
lead to improved accuracy of estimation in certain situations.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_151"><p class="contribBanner">Tuesday 12<sup>th</sup> 16:00 OGGB5 (260-051)</p></div>
## BIVAS: A Scalable Bayesian Method For Bi-Level Variable Selection {.unnumbered}
<p style="text-align:center">
Mingxuan Cai^1^, Mingwei Dai^2^, Jingsi Ming^1^, Jin Liu^3^, Can Yang^4^, and Heng Peng^1^<br />
^1^Hong Kong Baptist University<br />
^2^Xi'an Jiaotong University<br />
^3^Duke-NUS Medical School<br />
^4^Hong Kong University of Science and Technology<br />
</p>
<span>**Abstract:**</span> In this paper we propose a bi-level variable selection approach, BIVAS,
for linear regression under the Bayesian framework. The model assumes
that each variable is assigned to a pre-specified group and that only a
subset of the groups truly contributes to the response variable.
Moreover, within the active groups, only a small number of variables are
important. A hierarchical formulation is adopted to mimic this pattern,
in which spike-and-slab priors are placed at both the individual-variable
level and the group level. A computationally efficient algorithm is
developed using variational inference. Both simulation studies and real
examples are analyzed, through which we illustrate the advantages of our
method for both variable selection and parameter estimation under
certain conditions.
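To illustrate the bi-level structure described above, the following Python sketch draws regression coefficients from a two-level spike-and-slab prior. It is a minimal illustration with assumed, hypothetical parameter names (`pi_group`, `alpha_var`, `sigma_beta`), not the BIVAS implementation, and it does not show the variational inference used for fitting.

```python
import random


def sample_bilevel_coefficients(group_sizes, pi_group, alpha_var, sigma_beta, seed=None):
    """Draw coefficients from a bi-level spike-and-slab prior (illustrative only).

    Assumed hierarchy (hypothetical parametrisation):
      group k is active with probability pi_group,
      a variable in an active group is active with probability alpha_var,
      an active coefficient is N(0, sigma_beta^2); every other coefficient is 0.
    """
    rng = random.Random(seed)
    betas, group_active = [], []
    for size in group_sizes:
        eta = rng.random() < pi_group  # group-level indicator
        group_active.append(eta)
        for _ in range(size):
            gamma = rng.random() < alpha_var  # variable-level indicator
            betas.append(rng.gauss(0.0, sigma_beta) if eta and gamma else 0.0)
    return betas, group_active


# Four groups of five variables each.
betas, active = sample_bilevel_coefficients([5, 5, 5, 5], 0.5, 0.3, 1.0, seed=1)
```

Every coefficient in an inactive group is exactly zero, and active groups are themselves sparse when `alpha_var` is small, which is the pattern the hierarchy is meant to encode.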
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_078"><p class="contribBanner">Tuesday 12<sup>th</sup> 16:00 Case Room 2 (260-057)</p></div>
## Ranking Potential Shoplifters In Real Time {.unnumbered}
<p style="text-align:center">
Barry McDonald<br />
Massey University<br />
</p>
<span>**Abstract:**</span> A company with a focus on retail crime
prevention brought to MINZ (Mathematics in Industry in New Zealand) the
task of “<span>*Who is most likely to offend in my store, now*</span>”.
The company supplied an anonymised set of data on incidents and
offenders. The task, for the statisticians and mathematicians involved,
was to try to find ways to use the data to nominate, say, the top ten
likely offenders for any particular store and any particular time, using
up-to-the-minute information (real time). The problem was analogous to
finding a regression model when every row of data has response
identically 1 (an incident), and for many places and times there is no
data. This talk will describe how the problem was tackled.

<span>**Keywords:**</span> Retail crime, ranking, ZINB, regression, real
time
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_073"><p class="contribBanner">Tuesday 12<sup>th</sup> 16:00 Case Room 3 (260-055)</p></div>
## Two Stage Approach To Data-Driven Subgroup Identification In Clinical Trials {.unnumbered}
<p style="text-align:center">
Toshio Shimokawa and Kensuke Tanioka<br />
Wakayama Medical University<br />
</p>
<span>**Abstract:**</span> Personalized medicine has advanced through
the statistical analysis of big data such as registry data. In this
research area, subgroup identification analysis has attracted attention.
The purpose of the analysis is to detect subgroups in which the medical
treatment is effective, based on predictive factors for the treatment.

Foster et al. (2011) proposed a two-stage subgroup identification
method called the Virtual Twins (VT) method. In the first stage of VT,
the difference in treatment effect between the treatment group and the
control group is estimated for each subject by a random forest. In the
second stage, responders are identified using CART, with these estimated
differences as the response variable.

However, the prediction accuracy of random forests can be lower than
that of boosting. Therefore, we adopt generalized boosted models
(Ridgeway, 2006) in the first stage. In addition, the number of rules
tends to be large in the second stage when CART is used, so in this
paper we adopt the Apriori algorithm, in a similar way to SIDES
(Lipkovich et al., 2011).

<span>**Keywords:**</span> Apriori algorithm, boosting, personalized
medicine

<span>**References:**</span>

Foster, J.C., Taylor, J.M.G. and Ruberg, S.J. (2011). *Subgroup
identification from randomized clinical trial data.* Stat. Med.,
<span>**30**</span>, 2867–2880.

Lipkovich, I., Dmitrienko, A., Denne, J. and Enas, G. (2011). *Subgroup
identification based on differential effect search: a recursive
partitioning method for establishing response to treatment in patient
subpopulations*. Stat. Med., <span>**30**</span>, 2601–2621.

Ridgeway, G. (2006). *gbm: Generalized Boosted Regression Models*. R
package version 1.5-7. Available at
`http://www.i-pensieri.com/gregr/gbm.shtml`.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_184"><p class="contribBanner">Tuesday 12<sup>th</sup> 16:20 098 Lecture Theatre (260-098)</p></div>
## Inverse Regression For Multivariate Functional Data {.unnumbered}
<p style="text-align:center">
Ci-Ren Jiang^1^ and Lu-Hung Chen^2^<br />
^1^Academia Sinica<br />
^2^National Chung Hsing University<br />
</p>
<span>**Abstract:**</span> Inverse regression is an appealing dimension
reduction method for regression models with multivariate covariates.
Recently, it has been extended to the cases with functional or
longitudinal covariates. However, these extensions handle only one
functional/longitudinal covariate. In this work, we extend
functional inverse regression to the cases with multivariate functional
covariates. The asymptotic properties of the proposed estimators are
investigated. Simulation studies and data analysis are also provided to
demonstrate the performance of our method.

<span>**Keywords:**</span> Multidimensional/Multivariate Functional Data
Analysis, Inverse Regression, Parallel Computing, Smoothing
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_138"><p class="contribBanner">Tuesday 12<sup>th</sup> 16:20 OGGB4 (260-073)</p></div>
## Including Covariate Estimation Error When Predicting Species Distributions: A Simulation Exercise Using Template Model Builder {.unnumbered}
<p style="text-align:center">
Andrea Havron and Russell Millar<br />
University of Auckland<br />
</p>
<span>**Abstract:**</span> Ecological managers often require knowledge
about species distributions across a spatial region in order to
facilitate best management practices. Statistical models are frequently
used to infer relationships between species observations (e.g., presence,
abundance, biomass) and environmental covariates in order to
predict values at unobserved locations. Issues remain in situations
where covariate information is not available at a prediction location.
In these cases, spatial maps of covariates are often generated using
tools such as kriging; however, the uncertainties from this statistical
estimation are not carried through to the final species distribution
map. New advances in spatial modelling using the automatic
differentiation software Template Model Builder allow both the spatial
process of the environmental covariates and the observations to be
modelled simultaneously by maximizing the marginal likelihood of the
fixed effects with a Laplace approximation after integrating out the
random spatial effects. This method allows for the uncertainty of the
covariate estimation process to be included in the standard errors of
final predictions as well as any derived quantities, such as total
biomass for a spatial region. We intend to demonstrate this method and
compare our predictions to those from a model where regional covariate
information is supplied from a kriging model.

<span>**Keywords:**</span> spatial model, predicting covariates,
Template Model Builder

<span>**References:**</span>

Kristensen, K., Nielsen, A., Berg, C.W., Skaug, H. and Bell, B. (2016).
TMB: Automatic Differentiation and Laplace Approximation. *Journal
of Statistical Software*, **70**, 1–21.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_074"><p class="contribBanner">Tuesday 12<sup>th</sup> 16:20 OGGB5 (260-051)</p></div>
## Adjusted Adaptive Index Model For Binary Response {.unnumbered}
<p style="text-align:center">
Ke Wan^1^, Kensuke Tanioka^1^, Kun Yang^2^, and Toshio Shimokawa^1^<br />
^1^Wakayama Medical University<br />
^2^Southwest Jiaotong University<br />
</p>
<span>**Abstract:**</span> In questionnaire surveys, multiple regression analysis is usually used
to evaluate influential factors. In addition, data mining methods
such as Classification and Regression Trees (Breiman et al., 1984) are
also used. In tourism studies, however, it is difficult to derive
policies for factors such as landscape or buildings from the results. In
this paper, we call these factors “uncontrollable explanatory
variables”. On the other hand, policies for factors such as the amount of
garbage or residents' awareness can be derived from the results. We call
these factors “controllable explanatory variables”. The purpose of this
report is to grade each subject based on the controllable explanatory
variables while adjusting for the effects of the uncontrollable
explanatory variables. Concretely, we modify the AIM method (Tian and
Tibshirani, 2011) and grade subjects by the sum of the production rules
for the controllable explanatory variables, adjusting for the effects of
the uncontrollable explanatory variables.
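In an adaptive index model, a subject's grade is the number of simple production rules (cut-offs on single variables) that the subject satisfies. The Python sketch below shows only that counting step, assuming the rules have already been selected; the variable names and cut-offs are hypothetical, and the adjustment for uncontrollable explanatory variables is not shown.

```python
def aim_score(subject, rules):
    """Count how many production rules a subject satisfies.

    Each rule is a tuple (variable, direction, cutoff): direction ">" fires
    when subject[variable] > cutoff, "<" fires when subject[variable] < cutoff.
    """
    score = 0
    for variable, direction, cutoff in rules:
        value = subject[variable]
        if direction == ">":
            score += value > cutoff
        elif direction == "<":
            score += value < cutoff
        else:
            raise ValueError(f"unknown direction: {direction!r}")
    return score


# Hypothetical rules on controllable explanatory variables only.
rules = [("garbage_amount", "<", 3), ("resident_awareness", ">", 4)]
subject = {"garbage_amount": 2, "resident_awareness": 5, "landscape": 1}
grade = aim_score(subject, rules)  # both rules fire, so grade == 2
```

The uncontrollable variable (`landscape` here) is deliberately absent from the rules, so it does not contribute to the grade.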

<span>**Keywords:**</span> logistic regression, production rule, grading

<span>**References:**</span>

Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J. (1984).
*Classification and Regression Trees*. Wadsworth.

Tian, L., and Tibshirani, R. (2011). *Adaptive index models for
marker-based risk stratification.* Biostatistics, <span>**12**</span>,
68–86.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_109"><p class="contribBanner">Tuesday 12<sup>th</sup> 16:20 Case Room 2 (260-057)</p></div>
## Factors Influencing On Growth Of Garments Industry In Bangladesh {.unnumbered}
<p style="text-align:center">
Md. Shahidul Islam and Mohammad Sazzad Mosharrof<br />
Auckland University of Technology<br />
</p>
<span>**Abstract:**</span> If globalization provides the backdrop for drama, then the
achievements of the garment industry in Bangladesh are indeed dramatic.
The garment industry has played a pioneering role in the development of
the industrial sector of Bangladesh and has grown rapidly over the last
15 years, making Bangladesh one of the largest garment exporters in the
world. Our research examines the successful development process of the
Bangladesh garment industry and explores the keys to its success. To
this end, we collected primary and secondary data on garment
manufacturers and traders to investigate further the key role and
mechanism of technology transfer in operating a garment industry in
Bangladesh. We then apply statistical models, namely random-effects,
tobit and probit models, to assess the effects of our variables, and we
include year dummy variables in all of the models. The results of our
statistical models indicate that the high education of manufacturers is
highly significant for enterprise performance. We attribute this close
relationship to the need for manufacturers to upgrade their skills and
know-how continuously in order to survive the intense competition in the
world garment market, and to the high levels of general human capital
an entrepreneur needs to manage an increasing number of managers and
experts. The results also show that formal training received by the
garment entrepreneur in a foreign country and the entrepreneur’s
experience of working at a garment enterprise have only a small effect
on the growth of the garment industry. This is because garment workers
who had acquired skills and know-how could not readily help new
manufacturers or afford to start trading houses without good marketing
and communication skills. However, traders who received formal training
abroad have provided higher-value services to manufacturers and
contributed more to the proliferation of manufacturers. We have also
found that foreign-owned trading houses perform better than indigenous
trading houses, which suggests that there are still skills and know-how
to be learned from foreign countries. Technology transfer therefore
seems to be a long-term process whose effects also last over the long
term. Finally, our findings strongly suggest that the performance of
manufacturers and traders, as well as production technologies, have
great potential to drive high growth in industrial development.
Developing the garment industry offers a great opportunity to earn
foreign currency and contribute to economic development.

<span>**Keywords:**</span> Bangladesh Garments, Growth of Garment
Industry, Performance of Manufacturers and Traders, Statistical Model
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_103"><p class="contribBanner">Tuesday 12<sup>th</sup> 16:20 Case Room 3 (260-055)</p></div>
## Comparison Of Exact And Approximate Testing Procedures In Clinical Trials With Multiple Binary Endpoints {.unnumbered}
<p style="text-align:center">
Takuma Ishihara and Kouji Yamamoto<br />
Osaka City University<br />
</p>
<span>**Abstract:**</span> In confirmatory clinical trials, the efficacy of a test treatment is
sometimes assessed using multiple primary endpoints. We consider a
trial in which the efficacy of a test treatment is confirmed only when
it is superior to control for at least one of the endpoints and not
clinically inferior for the remaining endpoints. Nakazuru et al. (2014)
proposed a testing procedure that is applicable to this case when the
endpoints are continuous variables. In this presentation, we first
propose a testing procedure for the case in which all of the endpoints
are binary.

Westfall and Troendle (2008) proposed multivariate permutation tests.
Using this method, we also propose an exact multiple testing procedure.

Finally, we compare the exact and approximate testing procedures
proposed above. The performance of the proposed procedures is examined
through Monte Carlo simulations.
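As a rough illustration of an exact permutation approach for several binary endpoints, the Python sketch below enumerates every reallocation of the treatment labels and computes single-step max-T adjusted p-values for one-sided superiority. It is a simplification under stated assumptions, not the authors' procedure: the statistic is the raw difference in response rates, non-inferiority margins are ignored, and full enumeration is only feasible for small trials.

```python
from itertools import combinations


def rate_differences(rows, labels):
    """Per-endpoint difference in response rates (treatment minus control)."""
    treated = [r for r, g in zip(rows, labels) if g == 1]
    control = [r for r, g in zip(rows, labels) if g == 0]
    return [
        sum(r[j] for r in treated) / len(treated)
        - sum(r[j] for r in control) / len(control)
        for j in range(len(rows[0]))
    ]


def exact_maxt_pvalues(rows, labels, tol=1e-12):
    """Single-step max-T p-values from the full permutation distribution.

    rows[i] is the vector of binary endpoints for subject i and labels[i] is
    1 (treatment) or 0 (control). Whole rows are reallocated together, so the
    dependence between endpoints within a subject is preserved.
    """
    n, n_treated = len(rows), sum(labels)
    observed = rate_differences(rows, labels)
    exceed = [0] * len(observed)
    total = 0
    for treated in combinations(range(n), n_treated):
        chosen = set(treated)
        perm = [1 if i in chosen else 0 for i in range(n)]
        max_stat = max(rate_differences(rows, perm))
        total += 1
        for j, obs in enumerate(observed):
            if max_stat >= obs - tol:  # tolerance guards against rounding
                exceed[j] += 1
    return [e / total for e in exceed]


# Four treated subjects respond on both endpoints; four controls on neither.
rows = [[1, 1]] * 4 + [[0, 0]] * 4
labels = [1] * 4 + [0] * 4
pvals = exact_maxt_pvalues(rows, labels)  # only 1 of the C(8, 4) = 70 allocations is as extreme
```

Replacing the full enumeration with random shuffles of the labels gives the usual Monte Carlo approximation when a trial is too large to enumerate.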

<span>**Keywords:**</span> Clinical trial; Multivariate Bernoulli
distribution; Non-inferiority; Superiority.

<span>**References:**</span>

Nakazuru, Y., Sozu, T., Hamada, C. and Yoshimura, I. (2014). A new
procedure of one-sided test in clinical trials with
multiple endpoints. *Japanese Journal of Biometrics,* **35**, 17-35.

Westfall, P.H. and Troendle, J.F. (2008). Multiple testing with
minimal assumptions. *Biometrical Journal,* **50**(5), 745-755.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_185"><p class="contribBanner">Tuesday 12<sup>th</sup> 16:40 098 Lecture Theatre (260-098)</p></div>
## Multiple Function-On-Function Linear Regression With Application To Weather Forecast Calibration {.unnumbered}
<p style="text-align:center">
Min-Chia Huang, Xin-Hua Wang, and Lu-Hung Chen<br />
National Chung Hsing University<br />
</p>
<span>**Abstract:**</span> We suggest a direct approach to estimate the coefficient functions for
function-on-function linear regression models. To avoid the risk of
discarding useful information for regressions, the approach does not
depend on basis representations or dimension reductions. It can
accommodate multiple functional responses and multiple functional
predictors on different multidimensional domains, observed on dense or
irregular sparse grids. We demonstrate the performance of the approach
through simulation studies and a real application on calibrating numerical
weather forecasts.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_156"><p class="contribBanner">Tuesday 12<sup>th</sup> 16:40 OGGB4 (260-073)</p></div>
## Modelling The Distribution Of Lifetime Using Compound Time-Homogenous Poisson Process {.unnumbered}
<p style="text-align:center">
Kien Tran<br />
Victoria University of Wellington<br />
</p>
<span>**Abstract:**</span> Modelling the distribution of lifetime has traditionally been done by
constructing a deterministic function for the survival function and/or
force of mortality. This paper outlines previous research and presents
the author’s initial attempts to model the force of mortality and
remaining lifetime using time-homogenous compound Poisson processes.

The paper presents two models. In model 1, the force of mortality of an
individual is modelled as a random sum of i.i.d. random variables (i.e. a
compound Poisson process). In model 2, each individual is assumed to
have an initial normally distributed innate lifetime, and their
remaining life is a shifted compound Poisson process. In other words, we
assume that random events arriving at a constant rate modify
either the force of mortality or the remaining lifetime of individuals.
Simulations in R are then run to find the optimal parameters, and the
empirical survival function, force of mortality and distribution of
lifetime are constructed. Finally, these outputs are compared with
existing models and actual demographic data.

It turns out that for model 1, it is very difficult to model the force
of mortality using a time-homogenous compound Poisson process without
introducing additional complications such as the inclusion of event
times. For model 2, however, if we allow the event sizes to be Cauchy
random variables, then we can model the survival function of the New
Zealand population much better than several existing well-known
specifications such as the Weibull.
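One possible reading of model 2 can be simulated directly. The Python sketch below assumes that the remaining lifetime at time $t$ is $R(t) = L_0 - t + S(t)$, where $L_0$ is the normally distributed innate lifetime and $S(t)$ is a compound Poisson process with Cauchy event sizes, and records the first time $R(t)$ reaches zero. This formulation, the parameter values and the helper names are assumptions made for illustration, not taken from the paper (whose simulations were run in R).

```python
import math
import random


def cauchy(rng, scale):
    """Cauchy(0, scale) draw by inverting the CDF."""
    return scale * math.tan(math.pi * (rng.random() - 0.5))


def simulate_lifetime(mu, sd, rate, jump_sampler, rng, max_events=100_000):
    """Age at death when remaining lifetime is R(t) = L0 - t + S(t).

    L0 ~ N(mu, sd) is the innate lifetime; S(t) is a compound Poisson
    process with event rate `rate` and event sizes from jump_sampler(rng).
    Death is the first time R(t) <= 0.
    """
    remaining = rng.gauss(mu, sd)
    t = 0.0
    for _ in range(max_events):
        if remaining <= 0:
            return t
        wait = rng.expovariate(rate)  # time until the next event
        if wait >= remaining:  # R(t) hits zero before the next event
            return t + remaining
        t += wait
        remaining -= wait
        remaining += jump_sampler(rng)  # the event shifts the remaining lifetime
    return t + max(remaining, 0.0)  # cap reached: ignore any further events


def empirical_survival(lifetimes, age):
    """Proportion of simulated lifetimes exceeding `age`."""
    return sum(x > age for x in lifetimes) / len(lifetimes)


rng = random.Random(2017)
lifetimes = [
    simulate_lifetime(80.0, 10.0, 0.2, lambda r: cauchy(r, 0.5), rng)
    for _ in range(5_000)
]
```

Evaluating `empirical_survival` over a grid of ages gives the empirical survival curve that would be compared with, for example, a fitted Weibull curve.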

<span>**Keywords:**</span> Distribution of lifetime, force of mortality, survival
function, time-homogenous compound Poisson process, innate lifetime, R
simulation

<span>**References:**</span>

Khmaladze, E. (2013). *Statistical Methods with Applications to Demography
and Life Insurance*. CRC Press.

Weibull, W. (1939). A statistical theory of the strength of materials.
Generalstabens litografiska anstalts förlag, 1st edition.

Gompertz, B. (1825). On the Nature of the Function Expressive of the Law
of Human Mortality, and on a New Mode of Determining the Value of Life.
*Philosophical Transactions of the Royal Society of London*, 115, 513-583.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_072"><p class="contribBanner">Tuesday 12<sup>th</sup> 16:40 OGGB5 (260-051)</p></div>
## Detecting Change-Points In The Stress-Strength Reliability P(X&lt;Y) {.unnumbered}
<p style="text-align:center">
Hang Xu^1^, Philip L.H. Yu^1^, and Mayer Alvo^2^<br />
^1^University of Hong Kong<br />
^2^University of Ottawa<br />
</p>
<span>**Abstract:**</span> We address the statistical problem of
detecting change-points in the stress-strength reliability $R=P(X<Y)$ in
a sequence of paired variables $(X,Y)$. Without specifying their
underlying distributions, we embed this non-parametric problem into a
parametric framework and apply the maximum likelihood method via a
dynamic programming approach to determine the locations of the
change-points in $R$. Under some mild conditions, we show the consistency
and asymptotic properties of the procedure to locate the change-points.
Simulation experiments reveal that, in comparison with existing
parametric and non-parametric change-point detection methods, our
proposed method performs well in detecting both single and multiple
change-points in $R$ in terms of the accuracy of the location estimation
and the computation time. It offers robust and effective detection
capability without the need to specify the exact underlying distribution
of the variables. Applications to real data demonstrate the usefulness
of our proposed methodology for detecting the change-points in the
stress-strength reliability $R$.
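A simplified version of the idea can be sketched in Python. The sketch assumes that the parametric embedding is a Bernoulli model for the indicators $Z_i = 1\{X_i < Y_i\}$, so that $R$ is the success probability within each segment, and that the number of segments $K$ is known in advance; both are illustrative assumptions, not necessarily the authors' formulation. Dynamic programming then finds the maximum-likelihood segmentation in $O(Kn^2)$ time.

```python
import math


def segment_loglik(ones, n):
    """Maximised Bernoulli log-likelihood of n indicators containing `ones` ones."""
    ll = 0.0
    if ones > 0:
        ll += ones * math.log(ones / n)
    if ones < n:
        ll += (n - ones) * math.log((n - ones) / n)
    return ll


def stress_strength_change_points(x, y, n_segments):
    """Locate n_segments - 1 change-points in P(X < Y) by dynamic programming.

    Works on Z_i = 1{X_i < Y_i}; within each segment Z_i is treated as
    Bernoulli(R_k). Returns the start indices of segments 2..n_segments.
    """
    z = [1 if xi < yi else 0 for xi, yi in zip(x, y)]
    n = len(z)
    prefix = [0]
    for zi in z:
        prefix.append(prefix[-1] + zi)

    def seg(i, j):  # log-likelihood of the segment z[i:j]
        return segment_loglik(prefix[j] - prefix[i], j - i)

    neg_inf = float("-inf")
    # best[k][j]: best log-likelihood of z[:j] split into k segments;
    # start[k][j]: where the last of those k segments begins.
    best = [[neg_inf] * (n + 1) for _ in range(n_segments + 1)]
    start = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                if best[k - 1][i] == neg_inf:
                    continue
                cand = best[k - 1][i] + seg(i, j)
                if cand > best[k][j]:
                    best[k][j], start[k][j] = cand, i
    # Walk back through the stored segment starts.
    change_points, j = [], n
    for k in range(n_segments, 1, -1):
        j = start[k][j]
        change_points.append(j)
    return sorted(change_points)


x = [1] * 10 + [0] * 10
y = [0] * 10 + [1] * 10
cps = stress_strength_change_points(x, y, 2)  # -> [10]
```

In practice the number of change-points is unknown, so a penalty or information criterion would be needed to choose $K$; that step is not shown.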

<span>**Keywords:**</span> Multiple change-points detection;
Stress-strength model; Dynamic programming
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_111"><p class="contribBanner">Tuesday 12<sup>th</sup> 16:40 Case Room 2 (260-057)</p></div>
## New Zealand Crime And Victims Survey: Filling The Knowledge Gap {.unnumbered}
<p style="text-align:center">
Andrew Butcher and Michael Slyuzberg<br />
NZ Ministry of Justice<br />
</p>
<span>**Abstract:**</span> The key objective of the Ministry of Justice is to ensure that New
Zealand has a strong justice system that contributes to a safe and just
society. To achieve this objective, the ministry and the wider Justice
Sector need to know whether they are focusing their efforts in the right
places and really making a difference. This is often hard to judge because we
lack a crucial piece of information: how much crime is actually out
there. Administrative data does not provide an answer as only about 30
The New Zealand Crime and Victims Survey (NZCVS) has been introduced to fill
this knowledge gap. The survey, which is currently in its pilot phase, was
designed to meet the recommendations of Statistics New Zealand and the
demands of key stakeholders. It will interview about 8,000 New Zealand
residents aged 15 years and over and aims to: provide information
about the extent (volumes and prevalence) and nature of crime and
victimisation in New Zealand; provide a geographical breakdown of
victimisation; provide extensive victims' demographics; measure how
much crime gets reported to Police; understand the experiences of
victims; and measure crime trends in New Zealand.

The paper summarises the core requirements for the NZCVS obtained from
extended discussions with key stakeholders and describes key design
features to be implemented in order to meet these requirements. These
key requirements include, but are not limited to: measuring the
extent and nature of reported and unreported crime across New Zealand;
providing in-depth story-telling of victims' experiences; providing
frequent and timely information to support the Investment Approach for
Justice and wider decision making; and reducing information gaps by
matching the NZCVS with administrative data in Statistics New Zealand's
Integrated Data Infrastructure (IDI).

In particular, the paper discusses modular survey design which includes
core crime and victimisation questions and revolving modules added
annually, stratified random sampling, a new highly automated approach to
offence coding through extended screening, measuring harm from being
victimised, obtaining respondents' informed consent for data matching,
use of survey data for extended analysis and forecasting and other
important survey features.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>
<div id = "talk_127"><p class="contribBanner">Tuesday 12<sup>th</sup> 16:40 Case Room 3 (260-055)</p></div>
## Missing Data In Randomised Control Trials: Stepped Multiple Imputation {.unnumbered}
<p style="text-align:center">
Rose Sisk and Alain Vandal<br />
Auckland University of Technology<br />
</p>
<span>**Abstract:**</span> Missing data in Randomised Control Trials are usually
unavoidable and can present considerable problems for analysis in an
Intention-to-Treat (ITT) setting. Multiple imputation is often regarded
as the most appropriate method of handling missing data when compared
with simpler methods such as complete case analysis and mean/mode
imputation. However, in practice it can often be tricky to implement
when working with large longitudinal datasets. The Sodium Lowering in
Dialysate (SOLID) trial is a randomised control trial seeking to improve
cardiovascular and other outcomes by lowering the dialysate
concentration of sodium of patients on home haemodialysis. The trial
involves 99 participants and over 30 primary and secondary outcomes.
Missing data from various sources are present at baseline and at
follow-up time points. Attempting to multiply impute a large number of
outcomes, each measured at up to 4 follow-up times, proved to be a
challenging task in this study. Several attempts to obtain sensible
imputations were made but many of these failed due to the presence of
highly correlated outcomes which were often missing together. This
presentation discusses the approach taken to overcome this problem,
which involved defining sets of outcomes to impute in various rounds,
preventing sets of similar (highly correlated, missing together)
outcomes being imputed in the same round. Once a round of imputation was
completed, the next set of outcomes to be imputed was matched onto the
completed dataset. This process was repeated until the full ITT dataset
contained no missing values in any outcomes. We call this “stepped
imputation”. Theory from mixed models was also applied to seek measures
associated with the missingness mechanism, with the potential to include
them in the final model to further reduce any possible bias resulting
from missing data. Results from a simulation to test the validity of
“stepped imputation” will be presented. In this simulation, an attempt
is made to generate data related in a similar way to the outcomes in the
SOLID trial. Results from the “gold standard” analysis with no missing
data and from the complete case analysis are compared with those from
the stepped imputation method.
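The round-forming step described above can be viewed as a graph-colouring problem: outcomes are nodes, and an edge joins two outcomes that should not be imputed in the same round. The Python sketch below uses a simple greedy colouring to assign rounds. The conflict criterion and the outcome names are hypothetical, this is not necessarily how the rounds were chosen in the SOLID analysis, and the imputation within each round is not shown.

```python
def assign_imputation_rounds(outcomes, conflicts):
    """Greedily assign outcomes to imputation rounds.

    `conflicts` holds 2-element frozensets naming outcome pairs judged too
    similar (e.g. highly correlated and usually missing together) to be
    imputed in the same round. Outcomes with the most conflicts are placed
    first, each in the lowest-numbered round free of its conflicting outcomes.
    """
    neighbours = {o: set() for o in outcomes}
    for pair in conflicts:
        a, b = tuple(pair)
        neighbours[a].add(b)
        neighbours[b].add(a)
    rounds = {}
    for outcome in sorted(outcomes, key=lambda o: (-len(neighbours[o]), o)):
        taken = {rounds[other] for other in neighbours[outcome] if other in rounds}
        r = 0
        while r in taken:
            r += 1
        rounds[outcome] = r
    return rounds


# Hypothetical outcomes and conflicts.
outcomes = ["sbp", "dbp", "weight", "lvmi"]
conflicts = {frozenset({"sbp", "dbp"}), frozenset({"sbp", "lvmi"})}
rounds = assign_imputation_rounds(outcomes, conflicts)
# -> {'sbp': 0, 'dbp': 1, 'lvmi': 1, 'weight': 0}
```

Round 0 outcomes would be imputed first; round 1 outcomes would then be imputed after being matched onto the completed data, and so on.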
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>


<p class="pagebreak"></p>