# Programme And Abstracts For Tuesday 12^th^ Of December {#Tuesday .unnumbered}
<div id = "talk_198"><p class="keynoteBanner">Keynote: Tuesday 12<sup>th</sup> 9:10 098 Lecture Theatre (260-098)</p></div>
## Could Do Better … A Report Card For Statistical Computing {.unnumbered}
<p style="text-align:center">
Ross Ihaka and Brendon McArdle<br />
University of Auckland<br />
</p>
<span>**Abstract:**</span> Since the introduction of R, research in Statistical Computing has
plateaued. Although R is, at best, a stop-gap system, there appears to
be very little active research on creating better computing
environments for Statistics.
When work on R commenced there were a multitude of software systems
for statistical data analysis in use and under development. There was
friendly competition and collaboration between developers. While R can
be seen as providing a useful unification for users, its success and
dominance can now be viewed as holding back research and the
development of new systems.
In this talk we'll examine what might be behind this and also look at
research aimed at exploring part of the design space for new
systems. The aim is to show constructively that new work in the area
is still possible.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_025"><p class="contribBanner">Tuesday 12<sup>th</sup> 10:30 098 Lecture Theatre (260-098)</p></div>
## R&D Policy Regimes In France: New Evidence From A Spatio-Temporal Analysis {.unnumbered}
<p style="text-align:center">
Benjamin Montmartin^1^, Marcos Herrera^2^, and Nadine Massard^3^<br />
^1^GREDEG CNRS<br />
^2^CONICET<br />
^3^GAEL<br />
</p>
<span>**Abstract:**</span> Using a unique database containing
information on the amount of R&D tax credits and regional, national and
European subsidies received by firms in French NUTS3 regions over the
period 2001-2011, we provide new evidence on the efficiency of R&D
policies taking into account spatial dependency across regions. By
estimating a spatial Durbin model with regimes and fixed effects, we
show that, in a context of yardstick competition between regions,
national subsidies are the only instrument that displays a total leverage
effect. For the other instruments, internal and external effects offset each
other, resulting in insignificant total effects. Structural breaks
corresponding to tax credit reforms are also revealed.
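
For readers less familiar with the spatial Durbin model, its standard cross-sectional form and the usual summary of its direct, indirect and total effects are sketched below. This is a textbook illustration, not the authors' panel specification (which adds regimes and fixed effects), and reading the abstract's "internal" and "external" effects as the direct and indirect (spillover) effects is an assumption. Here $W$ is a spatial weights matrix, $\rho$ the spatial autoregressive parameter, and $\beta_k$, $\theta_k$ the coefficients on covariate $k$ and its spatial lag.

$$
\mathbf{y} = \rho W\mathbf{y} + X\boldsymbol{\beta} + WX\boldsymbol{\theta} + \boldsymbol{\varepsilon},
\qquad
S_k(W) = \frac{\partial \mathbf{y}}{\partial \mathbf{x}_k^{\top}} = (I_n - \rho W)^{-1}\left(I_n\beta_k + W\theta_k\right)
$$

$$
\text{direct}_k = \tfrac{1}{n}\operatorname{tr} S_k(W),
\qquad
\text{total}_k = \tfrac{1}{n}\mathbf{1}_n^{\top} S_k(W)\,\mathbf{1}_n = \frac{\beta_k + \theta_k}{1 - \rho},
\qquad
\text{indirect}_k = \text{total}_k - \text{direct}_k
$$

The closed form for the total effect uses a row-standardised $W$ (so $W\mathbf{1}_n = \mathbf{1}_n$ and hence $(I_n - \rho W)^{-1}\mathbf{1}_n = \mathbf{1}_n/(1-\rho)$). An "insignificant total effect" then corresponds to direct and indirect effects of opposite sign that roughly cancel.
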
<span>**Keywords:**</span> Additionality, French policy mix, Spatial
panel, Structural break
<span>**References:**</span>
Pesaran, M. H. (2007). A simple panel unit root test in the presence of
cross-section dependence. In: *Journal of Applied Econometrics*, **22**,
265–312.
Hendry, D. F. (1979). Predictive failure and econometric modelling in
macroeconomics: The transactions demand for money. In: *P. Ormerod
(Ed.), Economic Modelling: Current Issues and Problems in Macroeconomic
Modelling in the UK and the US*, **9**, 217–242. Heinemann Education
Books, London.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_084"><p class="contribBanner">Tuesday 12<sup>th</sup> 10:30 OGGB4 (260-073)</p></div>
## Analysing Scientific Collaborations Of New Zealand Institutions Using Scopus Bibliometric Data {.unnumbered}
<p style="text-align:center">
Samin Aref^1^, David Friggens^2^, and Shaun Hendy^1^<br />
^1^University of Auckland<br />
^2^Ministry of Business, Innovation & Employment<br />
</p>
<span>**Abstract:**</span> Scientific collaborations are among the main
enablers of development in small national science systems. Although
analysing scientific collaborations is a well-established subject in
scientometrics, evaluations of the collaborative activities of countries
remain speculative, as existing studies are based on a limited number of
fields or use data that cannot fully represent collaborations at the
national level. This study provides a unique view on the collaborative
aspect of scientific activities in New Zealand. We perform a
quantitative study based on all Scopus publications in all subjects for
over 1500 New Zealand institutions over a period of 6 years to generate
an extensive mapping of New Zealand scientific collaborations. The
comparative results reveal the levels of collaboration between New
Zealand institutions and business enterprises, government institutions,
higher education providers, and private not-for-profit organisations in
2010-2015. Constructing a collaboration network of institutions, we
observe a power-law distribution indicating that a small number of New
Zealand institutions account for a large proportion of national
collaborations. Network centrality measures are deployed to identify the
most influential institutions of the country in terms of scientific
collaboration. We also provide comparative results on 15 universities
and crown research institutes based on 27 subject classifications. This
study was based on Scopus custom data and supported by the Te Pūnaha
Matatini internship program at the Ministry of Business, Innovation &
Employment.
arXiv preprint: https://arxiv.org/pdf/1709.02897
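
The abstract does not say which centrality measures were used. As a minimal sketch of the idea, assuming degree centrality and a hypothetical list of papers (each given as the set of institutions on its author list), a collaboration network and its normalised degree centrality could be computed as follows.

```python
from itertools import combinations

def collaboration_network(papers):
    """Map each institution to the set of institutions it has co-published with.

    `papers` is an iterable of collections of institution names, one per paper.
    """
    neighbours = {}
    for institutions in papers:
        unique = sorted(set(institutions))
        for inst in unique:
            neighbours.setdefault(inst, set())    # keep institutions with no partners
        for a, b in combinations(unique, 2):      # every pair on the same paper is linked
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours

def degree_centrality(neighbours):
    """Number of distinct partners of each institution, divided by (n - 1)."""
    n = len(neighbours)
    if n < 2:
        return {inst: 0.0 for inst in neighbours}
    return {inst: len(partners) / (n - 1) for inst, partners in neighbours.items()}

# Hypothetical example: four papers involving five institutions A to E.
papers = [{"A", "B"}, {"A", "C"}, {"A", "B", "D"}, {"E"}]
centrality = degree_centrality(collaboration_network(papers))
for inst, score in sorted(centrality.items(), key=lambda item: -item[1]):
    print(inst, round(score, 2))
```

Tabulating the number of partners per institution from the same structure gives the degree distribution, whose heavy tail is what a power-law fit describes.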
<span>**Keywords:**</span> Big data modelling, Scientific collaboration,
Scientometrics, Network analysis, Scopus, New Zealand
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_170"><p class="contribBanner">Tuesday 12<sup>th</sup> 10:30 OGGB5 (260-051)</p></div>
## Family Structure And Academic Achievements Of High School Students In Tonga {.unnumbered}
<p style="text-align:center">
Losana Vao Latu Latu<br />
University of Canterbury<br />
</p>
<span>**Abstract:**</span> In this study we examine how family structure affects the academic
achievement of secondary school students in
Tonga. It is a comparative study aiming to find out whether there is a
significant difference between the academic achievements of students
from a traditional family and those from a non-traditional family. We
define a Tongan traditional family as two biological parents (or
adoptive parents from birth), one male and one female, whereas a
non-traditional family can be a single-parent family, or one in which the
student has no parent present (for example, they are staying with relatives or
friends). In our study we look at the key drivers of
success and try to understand the relationship between academic
achievement and family structure. We hope the study will provide
evidence-based information to help administrators, other educators
and parents adopt the best practices and actions for the students.
The target population for this study is high school students aged 13
to 18 in Tonga. The study is limited to the high schools on the main
island of Tonga, Tongatapu, which has 12 high schools: two are
government schools and the others are private schools run by
different religious denominations. In April we surveyed 360 students, 60 from each of
6 high schools, and present here our preliminary results.
<span>**Keywords:**</span> Education, policy, stratified sampling
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_017"><p class="contribBanner">Tuesday 12<sup>th</sup> 10:30 Case Room 2 (260-057)</p></div>
## Analysis Of Multivariate Binary Longitudinal Data: Metabolic Syndrome During Menopausal Transition {.unnumbered}
<p style="text-align:center">
Geoff Jones<br />
Massey University<br />
</p>
<span>**Abstract:**</span> Metabolic syndrome (MetS) is a major
multifactorial condition that predisposes adults to type 2 diabetes and
cardiovascular disease. It is defined as having at least three of five
cardiometabolic risk components: 1) high fasting triglyceride level, 2)
low high-density lipoprotein (HDL) cholesterol, 3) elevated fasting
plasma glucose, 4) large waist circumference (abdominal obesity) and 5)
hypertension. In the US Study of Women’s Health Across the Nation
(SWAN), a 15-year multi-centre prospective cohort study of women from
five racial/ethnic groups, the incidence of MetS increased as midlife
women underwent the menopausal transition (MT). A model is sought to
examine the interdependent progression of the five MetS components and
the influence of demographic covariates.
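
The MetS definition above is a simple counting rule over five binary components. A minimal sketch, assuming each component has already been dichotomised using clinical thresholds (which the abstract does not give) and using hypothetical visit data for one participant:

```python
def has_metabolic_syndrome(high_triglycerides, low_hdl, high_glucose,
                           large_waist, hypertension):
    """True when at least three of the five binary MetS risk components are present."""
    components = (high_triglycerides, low_hdl, high_glucose, large_waist, hypertension)
    return sum(bool(c) for c in components) >= 3

# Hypothetical repeated visits for one participant, components coded 0/1 in the order above.
visits = [
    (1, 0, 0, 1, 0),
    (1, 1, 0, 1, 0),
    (1, 1, 1, 1, 1),
]
status = [has_metabolic_syndrome(*visit) for visit in visits]
print(status)  # [False, True, True]
```

The derived indicator discards which components are present, which is why a joint model of the five component series over time can say more about how MetS develops.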
<span>**Keywords:**</span> Multivariate binary data, longitudinal
analysis, metabolic syndrome
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_169"><p class="contribBanner">Tuesday 12<sup>th</sup> 10:30 Case Room 3 (260-055)</p></div>
## Clustering Of Curves On A Spatial Domain Using A Bayesian Partitioning Model {.unnumbered}
<p style="text-align:center">
Chae Young Lim<br />
Seoul National University<br />
</p>
<span>**Abstract:**</span> We propose a Bayesian hierarchical model for
spatial clustering of high-dimensional functional data based on the
effects of functional covariates. We couple the functional mixed-effects
model with a generalized spatial partitioning method for: (1)
identifying subregions for the high-dimensional spatio-functional data;
(2) improving the computational feasibility via parallel computing over
subregions or multi-level partitions; and (3) addressing the
near-boundary ambiguity in model-based spatial clustering techniques.
The proposed model extends existing spatial clustering techniques to
produce spatially contiguous partitions for spatio-functional data. Applied
to spectral radiance measurements, the model successfully captured the
regional effects of atmospheric and cloud properties. This highlights
the importance of considering spatially contiguous partitions for
identifying regional effects and small-scale variability.
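
The keywords suggest that the spatial partition is generated by a Voronoi tessellation. Assuming that, the basic assignment step (each location joins the cell of its nearest centre) can be sketched as below. In a Bayesian partitioning model the number and positions of the centres would themselves be updated by MCMC, which is not shown, and the grid and centres here are hypothetical.

```python
import math
import random

def voronoi_labels(locations, centres):
    """Label each location with the index of its nearest centre (Euclidean distance).

    The points nearest to a given centre form that centre's Voronoi cell, a convex
    region of the plane, so each cluster occupies one connected area. Ties go to
    the lower index.
    """
    labels = []
    for x, y in locations:
        distances = [math.hypot(x - cx, y - cy) for cx, cy in centres]
        labels.append(distances.index(min(distances)))
    return labels

# Hypothetical example: a 10 x 10 grid of sites partitioned by three random centres.
rng = random.Random(42)
centres = [(rng.uniform(0, 9), rng.uniform(0, 9)) for _ in range(3)]
grid = [(i, j) for i in range(10) for j in range(10)]
labels = voronoi_labels(grid, centres)
print([labels.count(k) for k in range(3)])  # number of sites in each cell
```

Because every site is assigned by distance alone, moving, adding or removing a centre changes the partition locally, which is what makes such tessellations convenient proposals in a sampler.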
<span>**Keywords:**</span> spatial clustering, Bayesian wavelets,
Voronoi tessellation, functional covariates
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_044"><p class="contribBanner">Tuesday 12<sup>th</sup> 10:30 Case Room 4 (260-009)</p></div>
## The Uncomfortable Entrepreneurs: Bad Working Conditions And Entrepreneurial Commitment {.unnumbered}
<p style="text-align:center">
Catherine Laffineur<br />
Université Côte d'Azur, GREDEG-CNRS<br />
</p>
<span>**Abstract:**</span> In contrast to previous models, which define
necessity entrepreneurs as individuals pushed into entrepreneurship by a lack
of employment, we consider the possibility of push factors faced by
employed individuals (Folta et al., 2010). The theoretical model yields
distinctive predictions relating occupation characteristics and the
probability of entry into entrepreneurship. Using PSED and O\*NET data, we
investigate how the characteristics of individuals' primary occupations
affect the effort nascent entrepreneurs put into venture creation. The
empirical evidence shows that necessity entrepreneurs are not
confined to unemployed individuals. We find compelling evidence that
individuals facing arduous working conditions (e.g. a stressful
environment and physical tiredness) have a higher likelihood of entering
and succeeding in self-employment than others. Conversely, individuals
who experience a high degree of self-realization, independence and
responsibility in the workplace are less committed to their business
than individuals exposed to arduous working conditions. These findings
have strong implications for how we interpret and analyze necessity
entrepreneurs and provide novel insights into the role of occupational
experience in the process of venture emergence.
<span>**Keywords:**</span> Entrepreneurship, Motivation,
Occupational characteristics, Employment choice.
<span>**References:**</span>
Folta, T. B., Delmar, F., & Wennberg, K. (2010). Hybrid entrepreneurship.
*Management Science*, 56(2), 253-269.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_028"><p class="contribBanner">Tuesday 12<sup>th</sup> 10:50 098 Lecture Theatre (260-098)</p></div>
## Spatial Surveillance With Scan Statistics By Controlling The False Discovery Rate {.unnumbered}
<p style="text-align:center">
Xun Xiao<br />
Massey University<br />
</p>
<span>**Abstract:**</span> In this paper, I investigate the false
discovery approach proposed by Li et al. (2016), which uses spatial scan
statistics to detect spatial disease clusters in a geographical region.
The incidence of disease is assumed to follow the inhomogeneous
Poisson model discussed in Kulldorff (1997). I show that, although the spatial
scan statistics are highly correlated, the simple Benjamini-Hochberg
(linear step-up) procedure still controls their false discovery rate,
by proving that the multivariate Poisson distribution satisfies the PRDS
condition (positive regression dependence on a subset) of Benjamini and
Yekutieli (2001).
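
For reference, the Benjamini-Hochberg linear step-up procedure can be sketched as below. It takes one p-value per candidate cluster (how those p-values are obtained from the scan statistics, for example by Monte Carlo, is not shown) and rejects the k smallest, where k is the largest rank with p_(k) ≤ kq/m; the p-values in the example are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg linear step-up procedure.

    Returns a list of booleans, True where the hypothesis is rejected: find the
    largest rank k with p_(k) <= k * q / m and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # indices by increasing p-value
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank                                   # step-up: keep the largest passing rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected

# Hypothetical p-values for five candidate clusters.
print(benjamini_hochberg([0.01, 0.045, 0.025, 0.005, 0.2], q=0.05))
# [True, False, True, True, False]
```

The step-up search is what distinguishes this from a step-down rule: a p-value that misses its own threshold is still rejected if a larger p-value further down the ordering passes.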
<span>**Keywords:**</span> False Discovery Rate, Poisson Distribution,
PRDS, Spatial Scan Statistics
<span>**References:**</span>
Benjamini, Y. and Yekutieli, D. (2001). *The control of the false
discovery rate in multiple testing under dependency*, Annals of
Statistics, **29**(4), 1165–1188.
Kulldorff, M. (1997). *A spatial scan statistic*, Communications in
Statistics - Theory and Methods, **26**(6), 1481–1496.
Li, Y., Shu, L., and Tsung, F. (2016). *A false discovery approach for
scanning spatial disease clusters with arbitrary shapes*, IIE
Transactions, **48**(7), 684–698.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_113"><p class="contribBanner">Tuesday 12<sup>th</sup> 10:50 OGGB4 (260-073)</p></div>
## Statistical Models For The Source Attribution Of Zoonotic Diseases: A Study Of Campylobacteriosis {.unnumbered}
<p style="text-align:center">
Sih-Jing Liao, Martin Hazelton, Jonathan Marshall, and Nigel French<br />
Massey University<br />
</p>
<span>**Abstract:**</span> Preventing and controlling zoonoses with a
public health policy depends on the knowledge scientists have about the
transmitted pathogens. Jointly modelling the epidemiological data and
genetic information provides a methodology for tracing back the source
of infection. However, the increased model complexity makes it difficult
to assess how much the genetic information contributes to the final
statistical inferences. To explore the genetic effects in the joint
model, we develop a genetic-free model and compare it to the joint
model. We apply the two models to a recent campylobacteriosis study to
estimate the attribution probability for each source. A spatial
covariate is also included in the models in order to investigate the
effect of the level of rurality on the source attributions. Comparing
the attributions generated by the two models, we find that: i) the
genetic information integrated in the joint model gives slightly more
precise inference for the sparse cases observed in highly rural areas
than the genetic-free model does; ii) on the logit scale, source attribution
probabilities follow linear trends against level of rurality; and iii)
poultry is the dominant source of campylobacteriosis in urban centres,
whereas ruminants are the most attributable source in rural areas.
<span>**Keywords:**</span> source attribution, *Campylobacter*,
multinomial model, Dirichlet prior, HPD interval, DIC
<span>**References:**</span>
Bronowski, C., James, C.E. and Winstanley, C. (2014). Role of
environmental survival in transmission of *Campylobacter jejuni*. *FEMS
Microbiol Lett.*, **356**(1):8–19.
Dingle, K.E., Colles, F.M., Wareing, D.R., Ure, R., Fox, A.J., Bolton,
F.E., Bootsma, H.J., Willems, R.J. and Maiden, M.C. (2001). Multilocus
sequence typing system for *Campylobacter jejuni*. *J Clin Microbiol*,
**39**(1):14–23.
Marshall, J.C. and French, N.P. (2015). Source attribution January to
December 2014 of human *Campylobacter jejuni* cases from the Manawatu.
*Technical Report*.
Wilson, D.J., Gabriel, E., Leatherbarrow, A.J., Cheesbrough, J., Gee,
S., Bolton, E., Fox, A., Fearnhead, P., Hart, C.A. and Diggle, P.J.
(2008). Tracing the source of campylobacteriosis. *PLoS Genet*,
**4**(9):e1000203.
Wagenaar, J.A., French, N.P. and Havelaar, A.H. (2013). Preventing
*Campylobacter* at the source: why is it so difficult? *Clin Infect
Dis*, **57**(11):1600–1606.
Biggs, P.J., Fearnhead, P., Hotter, G., Mohan, V., Collins-Emerson, J.,
Kwan, E., Besser, T.E., Cookson, A., Carter, P.E. and French, N.P.
(2011). Whole-genome comparison of two *Campylobacter jejuni* isolates
of the same sequence type reveals multiple loci of different ancestral
lineage. *PLoS One*, **6**(11):e27121.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_039"><p class="contribBanner">Tuesday 12<sup>th</sup> 10:50 OGGB5 (260-051)</p></div>
## Towards An Informal Test For Goodness-Of-Fit {.unnumbered}
<p style="text-align:center">
Anna Fergusson and Maxine Pfannkuch<br />
University of Auckland<br />
</p>
<span>**Abstract:**</span> Informal approaches to goodness-of-fit tests often involve examining the
visual fit of the model to data 'by eye'. From a pedagogical perspective,
such approaches are problematic for Year 13 and undergraduate students and
their teachers, because key aspects such as sample size, the number
of categories and the expected variation of sample proportions are difficult
to take into account. In formal tests for goodness-of-fit, a test statistic is
used in reference to its sampling distribution to decide whether the model
distribution can be rejected. In general, a numeric test statistic does
not have an obvious graphical representation within the data itself.
This talk presents a new informal goodness-of-fit test that uses a
simulation-based modelling tool. Drawing on ideas from graphical
inference, the proposed test uses plots, rather than numerical values,
as test statistics. Comparisons of performance demonstrate that
the proposed test leads to decisions about the fit of the model
distribution similar to those of the chi-square goodness-of-fit test. A research study
with Year 13 teachers indicated that there could be pedagogical benefits
to using this informal goodness-of-fit test for introducing
important modelling and hypothesis test concepts.
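
The chi-square goodness-of-fit test used as the benchmark can itself be carried out by simulation, which makes the role of the sampling distribution explicit. The sketch below is that formal benchmark, not the authors' plot-based test; the category counts and model proportions are hypothetical, and all model proportions are assumed to be positive.

```python
import random

def chi_square_statistic(observed, expected_props):
    """Pearson's X^2 = sum (O - E)^2 / E with E = n * p for each category."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, expected_props))

def simulated_p_value(observed, expected_props, n_sims=2000, seed=1):
    """Share of samples drawn from the model whose X^2 is at least the observed X^2."""
    rng = random.Random(seed)
    n = sum(observed)
    k = len(expected_props)
    observed_stat = chi_square_statistic(observed, expected_props)
    at_least_as_extreme = 0
    for _ in range(n_sims):
        draws = rng.choices(range(k), weights=expected_props, k=n)  # one sample of size n from the model
        simulated = [draws.count(j) for j in range(k)]
        if chi_square_statistic(simulated, expected_props) >= observed_stat:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_sims

# Hypothetical counts in four categories, tested against equal model proportions.
print(simulated_p_value([30, 22, 26, 22], [0.25, 0.25, 0.25, 0.25]))
```

Each simulated sample is what a student would see from the model with the same sample size and number of categories, so the simulated statistics carry exactly the information that is hard to judge "by eye".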
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_024"><p class="contribBanner">Tuesday 12<sup>th</sup> 10:50 Case Room 2 (260-057)</p></div>
## Identifying Clusters Of Patients With Diabetes Using A Markov Birth-Death Process {.unnumbered}
<p style="text-align:center">
Mugdha Manda, Thomas Lumley, and Susan Wells<br />
University of Auckland<br />
</p>
<span>**Abstract:**</span> Estimating disease trajectories has
become increasingly important for clinical practitioners seeking to
administer effective treatment to their patients. Part of describing
disease trajectories involves taking patients’ medical histories and
sociodemographic factors into account and grouping patients with similar
characteristics into clusters. Advances in computerised patient databases have
paved the way for identifying such trajectories by recording a
patient’s medical history over a long period of time (longitudinal
data): we studied data from the PREDICT-CVD dataset, a national
primary-care cohort in which people with diabetes were identified
through routine clinical practice between 2002 and 2015. We fitted a Bayesian
hierarchical linear model with latent clusters to the repeated
measurements of HbA$_{1c}$ and eGFR, using the Markov birth-death process
proposed by Stephens (2000) to handle the changes in dimensionality as
clusters were added or removed.
<span>**Keywords:**</span> Diabetes management, longitudinal data,
Markov chain Monte Carlo, birth-death process, mixture model, Bayesian
analysis, latent clusters, hierarchical models, primary care, clinical
practice
<span>**References:**</span>
Stephens, M. (2000). Bayesian Analysis of Mixture Models with an Unknown
Number of Components - An Alternative to Reversible Jump Methods. In:
*The Annals of Statistics*, 28(1), 40-74.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_174"><p class="contribBanner">Tuesday 12<sup>th</sup> 10:50 Case Room 3 (260-055)</p></div>
## Bayesian Temporal Density Estimation Using Autoregressive Species Sampling Models {.unnumbered}
<p style="text-align:center">
Youngin Jo^1^, Seongil Jo^2^, and Jaeyong Lee^3^<br />
^1^Kakao Corporation<br />
^2^Chonbuk National University<br />
^3^Seoul National University<br />
</p>
<span>**Abstract:**</span> We propose a Bayesian nonparametric (BNP)
model, which is built on a class of species sampling models, for
estimating density functions of temporal data. In particular, we
introduce species sampling mixture models with temporal dependence. To
accommodate temporal dependence, we define dependent species sampling
models by modeling random support points and weights through an
autoregressive model, and then we construct the mixture models based on
the collection of these dependent species sampling models. We propose an
algorithm to generate posterior samples and present simulation studies
to compare the performance of the proposed models with competitors that
are based on Dirichlet process mixture models. We apply our method to
the estimation of densities for apartment prices in Seoul, the
closing price of the Korea Composite Stock Price Index (KOSPI), and climate
variables (daily maximum temperature and precipitation) around the
Korean peninsula.
<span>**Keywords:**</span> Autoregressive species sampling models;
Dependent random probability measures; Mixture models; Temporal
structured data
<span>**Acknowledgements:**</span> This work is a part of the first author’s Ph.D. thesis at Seoul National University. Research of Seongil Jo was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2017R1D1A3B03035235). Research of Jaeyong Lee was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2011-0030811).
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_047"><p class="contribBanner">Tuesday 12<sup>th</sup> 10:50 Case Room 4 (260-009)</p></div>
## How Does The Textile Set Describe Geometric Structures Of Data? {.unnumbered}
<p style="text-align:center">
Ushio Tanaka^1^ and Tomonari Sei^2^<br />
^1^Osaka Prefecture University<br />
^2^University of Tokyo<br />
</p>
<span>**Abstract:**</span> The textile set is defined from the textile
plot proposed by Kumasaka and Shibata (2007, 2008), which is a powerful
tool for visualizing high dimensional data. The textile plot is based on
a parallel coordinate plot, where the ordering, locations and scales of
each axis are simultaneously chosen so that all connecting lines, each
of which signifies an observation, are aligned as horizontally as
possible. The textile plot transforms a data matrix in order to
delineate a parallel coordinate plot. Using the geometric properties of
the textile set derived by Sei and Tanaka (2015), we show that the
textile set describes the intrinsic geometric structure of data.
<span>**Keywords:**</span> Parallel coordinate plot, Textile set,
Differentiable manifold
<span>**References:**</span>
Kumasaka, N. and Shibata, R. (2007). The Textile Plot Environment,
*Proceedings of the Institute of Statistical Mathematics*, **55**,
47–68.
Kumasaka, N. and Shibata, R. (2008). High-dimensional data
visualisation: The textile plot, *Computational Statistics and Data
Analysis*, **52**, 3616–3644.
Sei, T. and Tanaka, U. (2015). Geometric Properties of Textile Plot.
In: *Geometric Science of Information*, *Lecture Notes in Computer Science*,
**9389**, 732–739.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_046"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:10 098 Lecture Theatre (260-098)</p></div>
## Intensity Estimation Of Spatial Point Processes Based On Area-Aggregated Data {.unnumbered}
<p style="text-align:center">
Hsin-Cheng Huang and Chi-Wei Lai<br />
Academia Sinica<br />
</p>
<span>**Abstract:**</span> We consider estimation of the intensity function
for spatial point processes based on area-aggregated data. A standard
approach for estimating the intensity function of a spatial point
pattern is to use a kernel estimator. However, when data are only
available in a spatially aggregated form, with the numbers of events
recorded for geographical subregions, traditional methods developed for
individual-level event data become infeasible. In this research, we
propose a kernel-based method that produces a smooth intensity
function from aggregated count data. Numerical examples are
provided to demonstrate the effectiveness of the proposed method.
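
The abstract does not describe the proposed estimator. To make the problem concrete, here is one naive baseline, sketched in one dimension with intervals standing in for geographical subregions: spread each region's count uniformly over the region, then smooth with a Gaussian kernel, which has a closed form. This is an illustrative assumption only, not the authors' method, and the regions and counts are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def aggregated_kernel_intensity(x, intervals, counts, bandwidth):
    """Intensity estimate at x from event counts aggregated over intervals [a, b).

    Each count is spread uniformly over its interval and then smoothed with a
    Gaussian kernel of bandwidth h; the convolution has the closed form
    count / (b - a) * (Phi((x - a) / h) - Phi((x - b) / h)).
    """
    total = 0.0
    for (a, b), count in zip(intervals, counts):
        total += count / (b - a) * (normal_cdf((x - a) / bandwidth) - normal_cdf((x - b) / bandwidth))
    return total

# Hypothetical example: 4, 9 and 2 events recorded in three adjacent subregions.
intervals = [(0.0, 2.0), (2.0, 5.0), (5.0, 10.0)]
counts = [4, 9, 2]
for x in [1.0, 3.5, 7.5]:
    print(x, round(aggregated_kernel_intensity(x, intervals, counts, bandwidth=1.0), 3))
```

Because each term integrates to its region's count, the estimate preserves the total number of events. It does not, in general, reproduce the count within each individual region, because smoothing moves mass across region boundaries.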
<span>**Keywords:**</span> Area censoring, inhomogeneous spatial point
processes, kernel density estimation
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_115"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:10 OGGB4 (260-073)</p></div>
## Bayesian Inference For Population Attributable Measures {.unnumbered}
<p style="text-align:center">
Sarah Pirikahu, Geoff Jones, Martin Hazelton, and Cord Heuer<br />
Massey University<br />
</p>
<span>**Abstract:**</span> Epidemiologists often wish to determine the population impact of an
intervention to remove or reduce a risk factor. Population attributable
type measures, such as the population attributable risk (PAR) and
population attributable fraction (PAF), provide a means of assessing
this impact, in a way that is accessible to a non-statistical audience.
To apply these concepts to epidemiological data, the calculation of
estimates and confidence intervals for these measures should take into
account the study design (cross-sectional, case-control, survey) and any
sources of uncertainty (such as measurement error in exposure to the
risk factor). We provide methods to produce estimates and Bayesian
credible intervals for the PAR and PAF from common epidemiological study
types and assess their frequentist properties. The model is then extended
by incorporating uncertainty due to the use of imperfect diagnostic
tests for disease or exposure. The resulting model can be
non-identifiable, causing convergence problems for common MCMC samplers,
such as Gibbs and Metropolis-Hastings. An alternative importance
sampling method performs much better for these non-identifiable models
and can be used to explore the limiting posterior distribution. The data
used to estimate these population attributable measures may include
multiple risk factors in addition to the one being considered for
removal. Uncertainty regarding the distribution of these risk factors in
the population affects the inference for PAR and PAF. To allow for this
we propose a methodology involving the Bayesian bootstrap. We also
extend the analysis to allow for complex survey designs with unequal
weights, stratification and clustering.
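
In the simplest setting (a cross-sectional 2x2 table of exposure by disease, perfect diagnostic tests and no other risk factors), the PAR, the PAF and a Bayesian credible interval for the PAF can be sketched as below. The uniform Dirichlet prior on the four cell probabilities and the counts in the example are assumptions for illustration; the misclassification, Bayesian bootstrap and survey-design extensions described above are not shown.

```python
import random

def par_paf(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Point estimates of PAR = P(D) - P(D | unexposed) and PAF = PAR / P(D)."""
    p_disease = (exposed_cases + unexposed_cases) / (exposed_total + unexposed_total)
    p_disease_unexposed = unexposed_cases / unexposed_total
    par = p_disease - p_disease_unexposed
    return par, par / p_disease

def dirichlet_draw(alphas, rng):
    """One draw from a Dirichlet distribution via normalised gamma variates."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]

def paf_credible_interval(cells, n_draws=4000, level=0.95, seed=1):
    """Equal-tailed posterior interval for the PAF.

    `cells` = (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases);
    the cell probabilities get a uniform Dirichlet(1, 1, 1, 1) prior (an assumption).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        p_ed, p_en, p_ud, p_un = dirichlet_draw([c + 1 for c in cells], rng)
        p_disease = p_ed + p_ud
        p_disease_unexposed = p_ud / (p_ud + p_un)
        draws.append((p_disease - p_disease_unexposed) / p_disease)
    draws.sort()
    lower = draws[int((1 - level) / 2 * n_draws)]
    upper = draws[int((1 + level) / 2 * n_draws) - 1]
    return lower, upper

# Hypothetical cross-sectional study: 40/200 exposed and 20/300 unexposed have the disease.
print(par_paf(40, 200, 20, 300))                  # PAR = 0.12 - 1/15, PAF = 4/9
print(paf_credible_interval((40, 160, 20, 280)))
```

Working with the joint cell probabilities, rather than the two risks separately, keeps the estimated exposure prevalence uncertain too, which matters because the PAR depends on it.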
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_147"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:10 OGGB5 (260-051)</p></div>
## An Information Criterion For Prediction With Auxiliary Variables Under Covariate Shift {.unnumbered}
<p style="text-align:center">
Takahiro Ido^1^, Shinpei Imori^1,2^, and Hidetoshi Shimodaira^2,3^<br />
^1^Osaka University<br />
^2^RIKEN Center for Advanced Intelligence Project (AIP)<br />
^3^Kyoto University<br />
</p>
<span>**Abstract:**</span> When modeling data of interest, it is often
beneficial to exploit secondary information. Such secondary information,
called auxiliary variables, may not be observed in testing data because
it is not of primary interest. In this paper, we incorporate the
auxiliary variables into a framework of supervised learning.
Furthermore, we consider a covariate shift situation, in which the
density function of the covariates is allowed to change between testing
and training data. It is known that the Maximum Log-likelihood Estimate
(MLE) is not a good estimator under model misspecification and covariate
shift. This problem can be resolved by the Maximum Weighted
Log-likelihood Estimate (MWLE).
When there are multiple candidate models, the best candidate must be
selected, where optimality is measured by the expected
Kullback-Leibler (KL) divergence. The Akaike information criterion (AIC)
is a well-known criterion based on the KL divergence, but it assumes the
MLE; therefore, its validity is not guaranteed when the MWLE is used
under covariate shift. An information criterion under covariate shift
was proposed in Shimodaira (2000, JSPI), but this criterion does not take
the use of auxiliary variables into account. We resolve this problem by
deriving a new criterion, and we conduct simulations to examine the
improvement.
<span>**Keywords:**</span> Auxiliary variables; Covariate shift;
Information criterion; Kullback-Leibler divergence; Misspecification;
Predictions.
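
For orientation, here is the simplest form of Shimodaira's (2000) MWLE, with full density-ratio weighting. It weights each training log-likelihood term by the ratio of the test and training covariate densities, $q_1(x)$ and $q_0(x)$. It is shown next to the usual AIC. The notation is ours, and the new criterion from the talk is not reproduced.

$$
\hat{\theta}_w = \arg\max_{\theta} \sum_{i=1}^{n} \frac{q_1(x_i)}{q_0(x_i)} \log p(y_i \mid x_i; \theta),
\qquad
\mathrm{AIC} = -2 \sum_{i=1}^{n} \log p(y_i \mid x_i; \hat{\theta}) + 2k,
$$

Here $\hat{\theta}$ in AIC is the unweighted MLE and $k$ is the number of parameters. The $2k$ bias correction is derived for the MLE when training and test covariates share one distribution. For that reason it need not hold for $\hat{\theta}_w$.
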
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_118"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:10 Case Room 2 (260-057)</p></div>
## Analysis Of A Brief Telephone Intervention For Problem Gambling And Examining The Impact On Co-Existing Depression? {.unnumbered}
<p style="text-align:center">
Nick Garrett, Maria Bellringer, and Max Abbott<br />
Auckland University of Technology<br />
</p>
<span>**Abstract:**</span> This study investigated the outcomes of a brief telephone intervention
for problem gambling. A total of 150 callers were recruited and followed
for 36 months. After giving consent, participants received a baseline
assessment followed by a manualised version of the helpline’s standard
care. Eighty-six percent of participants were re-assessed at three
months, 79% […]. Depression is often found to be associated with problem
gambling behaviour, and analysis was undertaken to examine the impact of
a brief telephone intervention for problem gambling on rates of
depression using logistic regression. At baseline, depression was found
to be associated with gender, problem gambling risk (PGSI), and
deprivation (NZiDep). A multivariable model found that PGSI and
mental health medication best explained depression at baseline. A
repeated measures logistic regression utilising all 36 months of data
found that PGSI, NZiDep, and mental health medication were the best
variables to explain the change over time. We concluded that the
intervention’s impact on problem gambling behaviour also changed
depression rates, although deprivation and mental health medication also
contributed.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_175"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:10 Case Room 3 (260-055)</p></div>
## Prior-Based Bayesian Information Criterion {.unnumbered}
<p style="text-align:center">
M. J. Bayarri^1^, James Berger^2^, Woncheol Jang^3^, Surajit Ray^4^, Luis Pericchi^5^, and Ingmar Visser^6^<br />
^1^University of Valencia<br />
^2^Duke University<br />
^3^Seoul National University<br />
^4^University of Glasgow<br />
^5^University of Puerto Rico<br />
^6^University of Amsterdam<br />
</p>
<span>**Abstract:**</span> We present a new approach to model selection
and Bayes factor determination, based on Laplace expansions (as in BIC),
which we call Prior-based Bayes Information Criterion (PBIC). In this
approach, the Laplace expansion is only done with the likelihood
function, and then a suitable prior distribution is chosen to allow
exact computation of the (approximate) marginal likelihood arising from
the Laplace approximation and the prior. The result is a closed-form
expression similar to BIC, but it now involves a term arising from the
prior distribution (which BIC ignores) and also incorporates the idea
that different parameters can have different effective sample sizes
(whereas BIC only allows one overall sample size $n$). We also consider
a modification of PBIC which is more favorable to complex models.
<span>**Keywords:**</span> Bayes factors, model selection, Cauchy
priors, consistency, effective sample size, Fisher information, Laplace
expansions, robust priors
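
For contrast with PBIC, the usual route to BIC applies a Laplace approximation to the whole integrand $L(\theta)\pi(\theta)$. This is written for $k$ parameters, MLE $\hat{\theta}$ and observed information $\hat{H}$; it is a textbook derivation, not the talk's:

$$
\log m(x) = \log \int L(\theta)\,\pi(\theta)\,d\theta
\approx \log L(\hat{\theta}) + \log \pi(\hat{\theta}) + \frac{k}{2}\log(2\pi) - \frac{1}{2}\log\lvert\hat{H}\rvert .
$$

Writing $\lvert\hat{H}\rvert \approx n^{k}\lvert\hat{I}_1\rvert$, where $\hat{I}_1$ is the per-observation information, and dropping every term that stays bounded as $n$ grows gives $-2\log m(x) \approx -2\log L(\hat{\theta}) + k\log n = \mathrm{BIC}$. The discarded prior term and the single $n$ shared by all parameters are the two simplifications that the abstract says PBIC avoids.
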
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_208"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:10 Case Room 4 (260-009)</p></div>
## Early Childhood Dental Decay {.unnumbered}
<p style="text-align:center">
Sarah Sonal<br />
University of Canterbury<br />
</p>
<span>**Abstract:**</span> Our teeth are some of our most useful tools. They let us eat tasty food, take those plastic tags off new clothes and enhance our smiles to convey joy. They also have to last us a lifetime and need to be looked after. Teeth form a mutually supportive structure; even one extraction can destabilize the remaining teeth. Early intervention in oral health can prevent a lifetime of discomfort, embarrassment and expensive treatments. An issue facing dentists in New Zealand and abroad is preschool children missing treatment appointments. These children have more dental issues in later childhood.
The research question I aim to answer is: Does early dental neglect increase dental issues in later childhood? My thesis will use traditional statistics along with data mining and machine learning techniques to investigate these anecdotal claims.
Using the geographical information in the dataset, I will use deprivation data from Statistics New Zealand to investigate whether these children are from more deprived neighborhoods.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_067"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:30 098 Lecture Theatre (260-098)</p></div>
## Geographically Weighted Principal Component Analysis For Spatio-Temporal Statistical Dataset {.unnumbered}
<p style="text-align:center">
Narumasa Tsutsumida^1^, Paul Harris^2^, and Alexis Comber^3^<br />
^1^Kyoto University<br />
^2^Rothamsted Research<br />
^3^University of Leeds<br />
</p>
<span>**Abstract:**</span> Spatio-temporal statistical datasets are
becoming widely available for social, economic, and environmental
research; however, their complexity often makes it difficult to
summarize them and uncover hidden spatial/temporal patterns.
Geographically weighted principal component analysis (GWPCA), which uses
a moving window or kernel and applies localized PCAs across geographical
space, may be useful for this, although optimizing the kernel bandwidth
and determining the number of components to retain (NCR) have been the
main concerns (Tsutsumida et al., 2017). In this research we determine
both simultaneously so as to minimize the leave-one-out residual
coefficient of variation of GWPCA over bandwidth size and NCR.
As a case study we use annual goat population statistics across 341
administrative units in Mongolia from 1990 to 2012, and show
spatio-temporal variations in the data, especially those influenced by
natural disasters.
<span>**Keywords:**</span> Geographically weighted model,
Spatio-temporal data, Parameter optimization
<span>**References:**</span>
Tsutsumida, N., P. Harris, and A. Comber. 2017. The Application of a
Geographically Weighted Principal Component Analysis for Exploring
Twenty-three Years of Goat Population Change across Mongolia. *Annals of
the American Association of Geographers*, **107(5)**, 1060–1074.
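
The core GWPCA step at a single location can be sketched as a kernel-weighted PCA. This minimal Python illustration assumes a Gaussian kernel with a fixed bandwidth and Euclidean distances. The function name `local_pca` is hypothetical. Neither the leave-one-out search over bandwidth and NCR nor the goat data are reproduced.

```python
import numpy as np

def local_pca(coords, X, site, bandwidth):
    """Kernel-weighted PCA at one location (a minimal GWPCA-style sketch).

    coords: (n, 2) locations; X: (n, p) data; site: index of the focal
    location; bandwidth: Gaussian kernel bandwidth (fixed here, not optimized).
    Returns local eigenvalues (descending) and their proportions of variance.
    """
    d = np.linalg.norm(coords - coords[site], axis=1)   # distances to the focal site
    w = np.exp(-0.5 * (d / bandwidth) ** 2)             # Gaussian kernel weights
    w = w / w.sum()
    mu = w @ X                                          # locally weighted mean
    Xc = X - mu
    cov = (Xc * w[:, None]).T @ Xc                      # locally weighted covariance
    evals = np.linalg.eigvalsh(cov)[::-1]               # largest eigenvalue first
    return evals, evals / evals.sum()

# Toy data: ten sites on a line, two perfectly collinear variables.
t = np.arange(10.0)
coords = np.column_stack([t, np.zeros_like(t)])
X = np.column_stack([t, 2 * t])
evals, ptv = local_pca(coords, X, site=0, bandwidth=3.0)
```

Repeating `local_pca` at every location gives the local eigenvalues that GWPCA maps. In the approach described above, the bandwidth and the NCR would then be chosen jointly by the leave-one-out criterion rather than fixed by hand.
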
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_116"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:30 OGGB4 (260-073)</p></div>
## Dimensionality Reduction Of Multivariate Data For Bayesian Analysis {.unnumbered}
<p style="text-align:center">
Anjali Gupta^1^, James Curran^1^, Sally Coulson^2^, and Christopher Triggs^1^<br />
^1^University of Auckland<br />
^2^ESR<br />
</p>
<span>**Abstract:**</span> In 2004, Aitken and Lucy published an article detailing a two-level
likelihood ratio for multivariate trace evidence. This model has been
adopted in a number of forensic disciplines such as the interpretation
of glass, drugs (MDMA), and ink. Modern instrumentation is capable of
measuring many elements in very low quantities and, not surprisingly,
forensic scientists wish to exploit the potential of this extra
information to increase the weight of this evidence. The issue, from a
statistical point of view, is that the increase in the number of
variables (dimension) in the problem leads to increased data demand to
understand both the variability within a source and the variability between sources.
Such information will come in time, but usually we don’t have enough.
One solution to this problem is to attempt to reduce the dimensionality
through methods such as principal component analysis. This practice is
quite common in high dimensional machine learning problems. In this
talk, I will describe a study where we attempt to quantify the effects
of this approach on the resulting likelihood ratios using data
obtained from an SEM-EDX instrument.
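
The dimensionality-reduction step can be illustrated generically; this is not the study's pipeline. The sketch projects centred measurements onto their first $k$ principal components using NumPy's SVD. The simulated matrix stands in for elemental measurements, and `pca_reduce` is a hypothetical helper. The likelihood ratio would then be computed on the retained scores, which is not shown.

```python
import numpy as np

def pca_reduce(X, k):
    """Project centred data onto its first k principal components via SVD.

    Returns the (n, k) component scores and the proportion of total variance
    they retain. Standardizing columns first is a separate modeling choice,
    often made when elements are measured on very different scales.
    """
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:k].T                         # coordinates on the first k axes
    retained = (s[:k] ** 2).sum() / (s ** 2).sum() # variance share of those axes
    return scores, retained

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 8))        # simulated: 30 fragments x 8 element measurements
scores, retained = pca_reduce(X, k=3)
```
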
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_016"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:30 OGGB5 (260-051)</p></div>
## An EWMA Chart For Monitoring Covariance Matrix Based On Dissimilarity Index {.unnumbered}
<p style="text-align:center">
Longcheen Huwang<br />
National Tsing Hua University<br />
</p>
<span>**Abstract:**</span> In this talk, we propose an EWMA chart for
monitoring the covariance matrix based on the dissimilarity index of two
matrices. It differs from conventional EWMA charts for monitoring the
covariance matrix, which compare the sum, the product, or both, of the
eigenvalues of the estimated EWMA covariance matrix with those of the
in-control (IC) covariance matrix. The proposed chart essentially
monitors the covariance matrix by comparing the individual eigenvalues
of the estimated EWMA covariance matrix with those of the covariance
matrix estimated from the IC phase I data. We evaluate the performance
of the proposed chart by comparing it with the best existing chart under
a multivariate normal process. Furthermore, to prevent the proposed EWMA
chart, whose control limit is based on limited IC phase I data, from
producing excessive false alarms, we use a bootstrap method to adjust
the control limit so that the chart's actual IC average run length is no
less than the nominal value with a specified probability. Finally, we
use an example to demonstrate the applicability and implementation of
the proposed chart.
<span>**Keywords:**</span> Average run length, dissimilarity index,
EWMA, out-of-control
<span>**References:**</span>
Hawkins, D.M. and Maboudou-Tchao, E.M. (2008). Multivariate exponentially
weighted moving covariance matrix. <span>*Technometrics*</span>,
<span>**50**</span>, 155-166.
Kano, M., Hasebe, S. and Hashimoto, I. (2002). Statistical process
monitoring based on dissimilarity of process data. <span>*AIChE
Journal*</span>, <span>**48**</span>, 1231-1240.
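
The abstract names two ingredients, and both can be sketched in Python under stated assumptions. The EWMA covariance update follows the form $S_t = (1-\lambda)S_{t-1} + \lambda x_t x_t^{\top}$ used by Hawkins and Maboudou-Tchao (2008), with observations centred at the in-control (IC) mean. For the dissimilarity index we assume the DISSIM statistic of Kano et al. (2002), $D = \frac{4}{m}\sum_{j=1}^{m}(\lambda_j - 0.5)^2$. Here $\lambda_j$ are the eigenvalues of one matrix after both are transformed so that their weighted sum is the identity. The equal weights are our assumption. The talk's control limit and bootstrap adjustment are not shown.

```python
import numpy as np

def ewma_cov_update(S_prev, x, lam=0.1):
    """One EWMA step: S_t = (1 - lam) * S_prev + lam * x x^T.

    x is assumed to be already centred at the in-control mean.
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    return (1.0 - lam) * S_prev + lam * (x @ x.T)

def dissim(A, B, w_a=0.5):
    """DISSIM-style index between covariance matrices A and B (0 = identical).

    w_a is the weight given to A (e.g. its share of observations); B gets 1 - w_a.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    m = A.shape[0]
    R = w_a * A + (1.0 - w_a) * B          # weighted mixture covariance
    evals, evecs = np.linalg.eigh(R)
    T = evecs / np.sqrt(evals)             # whitening: T.T @ R @ T is the identity
    S_a = w_a * (T.T @ A @ T)              # A's share; B's share is I - S_a
    lam = np.linalg.eigvalsh(S_a)          # each eigenvalue lies in [0, 1]
    return 4.0 / m * np.sum((lam - 0.5) ** 2)

# Toy usage: one EWMA step from an IC covariance of I, then two index values.
S = ewma_cov_update(np.eye(2), [1.0, 2.0], lam=0.5)   # [[1.0, 1.0], [1.0, 2.5]]
d_same = dissim(np.eye(2), np.eye(2))                 # about 0.0
d_diag = dissim(np.diag([1.0, 3.0]), np.eye(2))       # about 0.125
```

In a chart of this kind, `dissim` would be computed at each time point between the current EWMA matrix and the phase I estimate. The result would then be compared with a control limit.
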
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_162"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:30 Case Room 2 (260-057)</p></div>
## Adjusting For Linkage Bias In The Analysis Of Record-Linked Data {.unnumbered}
<p style="text-align:center">
Patrick Graham<br />
Stats NZ and Bayesian Research<br />
</p>
<span>**Abstract:**</span> Data formed from record-linkage of two or
more datasets are an increasingly important source of data for public
health and social science research. For example, a study cohort may be
linked to administrative data in order to add outcome or covariate
information to data collected directly from study participants. However,
regardless of the linkage method, it is often the case that not all
records are linked. Further, linkage rates usually vary with
characteristics of analytical interest and this differential linkage can
bias analyses restricted just to linked records. While linked records
have full outcome and covariate information, unlinked records exhibit
“block-missingness” whereby the values for the entire block of variables
contained in the file that is linked to are missing for unlinked
records. Similar missing data structures occur in other contexts,
including panel studies when participants decline participation in one
or more study waves. In this paper, I consider the problem of adjusting
for linkage bias from both Bayesian and frequentist perspectives. A
basic distinction is whether analysis is based on all available data or
just the linked cases. The Bayesian perspective leads to the former
option and to Gibbs sampling and multiple imputation as reasonable
methods. Basing analysis only on the linked cases seems to require a
frequentist perspective and leads to inverse probability of linkage
weighting and conditional maximum likelihood as reasonable approaches.
The implications of the assumption of ignorable linkage also differ
somewhat between the approaches. A simulation investigation confirms
that, assuming ignorable linkage given observed data, multiple
imputation, conditional maximum likelihood and inverse probability of
linkage weighting all succeed in adjusting for linkage bias and achieve
nominal interval coverage rates. Conditional maximum likelihood is
slightly more efficient than inverse probability of linkage weighting,
and multiple imputation can be more efficient than conditional
maximum likelihood. Extensions to the case of non-ignorable linkage are
also considered.
<span>**Keywords:**</span> Record linkage, Missing data, Bayesian
inference, Gibbs sampler, Multiple imputation
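
Inverse probability of linkage weighting, one of the approaches listed above, can be illustrated with a toy example. The Python sketch assumes that linkage is ignorable given a single categorical covariate known for every record. It estimates linkage probabilities as the observed linkage rates within each category; a logistic model would be the usual generalisation. The function name and data are hypothetical.

```python
import numpy as np

def ipw_linked_mean(group, linked, y):
    """Inverse-probability-of-linkage weighted mean of y over linked records.

    group: stratum label for every record (observed for all records);
    linked: boolean linkage indicator; y: outcome, only used where linked.
    Linkage probabilities are estimated as observed linkage rates per stratum.
    """
    group = np.asarray(group)
    linked = np.asarray(linked, dtype=bool)
    y = np.asarray(y, dtype=float)
    p_link = {g: linked[group == g].mean() for g in np.unique(group)}
    w = np.array([1.0 / p_link[g] for g in group[linked]])  # weight = 1 / P(linked | group)
    return np.sum(w * y[linked]) / np.sum(w)

group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
linked = np.array([1, 0, 0, 0, 1, 1, 1, 1], dtype=bool)
y = np.array([10, np.nan, np.nan, np.nan, 0, 0, 0, 0])  # outcome missing when unlinked
naive = y[linked].mean()                       # 2.0
adjusted = ipw_linked_mean(group, linked, y)   # 5.0
```

Group 1 links more often, so the naive mean of linked records (2.0) over-represents it. Weighting each linked record by the inverse of its group's linkage rate recovers the average of the two equally sized groups (5.0).
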
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_176"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:30 Case Room 3 (260-055)</p></div>
## Bayesian Semiparametric Hierarchical Models For Longitudinal Data Analysis With Application To Dose-Response Studies {.unnumbered}
<p style="text-align:center">
Taeryon Choi<br />
Korea University<br />
</p>
<span>**Abstract:**</span> In this work, we propose semiparametric
Bayesian hierarchical additive mixed effects models for analyzing either
longitudinal data or clustered data with applications to dose-response
studies. In the semiparametric mixed effects model structure, we
estimate nonparametric smoothing functions of continuous covariates by
using a spectral representation of Gaussian processes and the
subject-specific random effects by using Dirichlet process mixtures. In
this framework, we develop semiparametric mixed effects models that
include normal regression and quantile regressions with or without shape
restrictions. In addition, we deal with Bayesian nonparametric
measurement error models, or errors-in-variables regression models, using
Fourier series and Dirichlet process mixtures, in which the true
covariate is not observable and only a surrogate of the true covariate
is observed. The proposed methodology is compared with other existing
approaches to additive mixed models in simulation studies and benchmark
data examples. More importantly, we consider a real data application for
dose-response analysis, in which measurement errors and shape
constraints in the regression functions need to be incorporated with
inter-study variability.
<span>**Keywords:**</span> Cadmium toxicity, Cosine series,
Dose-response study, Hierarchical Model, Measurement errors, Shape
restriction
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_091"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:30 Case Room 4 (260-009)</p></div>
## Optimizing Junior Rugby Weight Limits {.unnumbered}
<p style="text-align:center">
Emma Campbell, Ankit Patel, and Paul Bracewell<br />
DOT Loves Data<br />
</p>
<span>**Abstract:**</span> The New Zealand rugby community is aware of safety issues within the
junior game and has applied weight limits for each tackle grade to
minimize injury risk. However, for heavier children this can create an
uncomfortable situation as they may no longer be playing with their peer
group. The study evaluated almost 13,000 observations from junior rugby
players across three seasons (2015-2017) using data supplied by
Wellington Rugby. To protect privacy, the data was structured so that an
individual could not be readily identified but could be tracked across
seasons to determine churn. As data for several consecutive seasons was
available, we could determine the likelihood of a junior player
returning the following season and isolate the drivers of this
behaviour. Applying logistic regression and repeated measures analysis,
the study determined whether children who are over the specified weight limit
for their age group are more likely to leave the game. Furthermore,
assuming the importance of playing with peers, the study identified the
impact of age in relation to the date-of-birth cut-off of January 1st.
This is of interest given that a child playing above their age-weight
grade could be competing against individuals three school years above
them. The study primarily focuses on determining the optimal age-weight
bands while the secondary focus is on determining the likelihood of a
junior Wellington rugby player returning the following season and
isolating the drivers of this behaviour.
<span>**Keywords:**</span> Logistic regression, repeated measures, player retention,
optimization
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_144"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:50 098 Lecture Theatre (260-098)</p></div>
## Spatial Scan Statistics For Matched Case-Control Data {.unnumbered}
<p style="text-align:center">
Inkyung Jung<br />
Yonsei University College of Medicine<br />
</p>
<span>**Abstract:**</span> Spatial scan statistics are widely used for
cluster detection analysis in geographical disease surveillance. While
the method has been developed for various types of data such as binary,
count and continuous data, spatial scan statistics for matched
case-control data, which often arise in spatial epidemiology, have not
been considered yet. In this paper, we propose two spatial scan
statistics for matched case-control data. The proposed test statistics
properly consider the correlations between matched pairs. We evaluate
statistical power and cluster detection accuracy of the proposed methods
through simulations, comparing them with the Bernoulli-based method. We
illustrate the methods using a real data example.
<span>**Keywords:**</span> Spatial epidemiology, cluster detection,
SaTScan, McNemar test, conditional logistic regression
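
The proposed methods are compared against the Bernoulli-based scan statistic. For each candidate window, that statistic uses Kulldorff's log-likelihood ratio, which compares the case proportion inside and outside the window. A minimal Python version for a single window follows; the function names are ours. The scan statistic is the maximum over all windows, and significance is usually assessed by Monte Carlo replication. Neither the matched-pair statistics nor window generation are shown.

```python
import math

def xlogy(x, y):
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def bernoulli_llr(c, n, C, N):
    """Kulldorff's Bernoulli log-likelihood ratio for one window (requires n < N).

    c cases among n individuals inside the window; C cases among N overall.
    Returns 0 unless the case proportion inside exceeds the one outside.
    """
    inside, outside = c / n, (C - c) / (N - n)
    if inside <= outside:
        return 0.0
    ll_alt = (xlogy(c, inside) + xlogy(n - c, 1 - inside)
              + xlogy(C - c, outside) + xlogy(N - n - (C - c), 1 - outside))
    ll_null = xlogy(C, C / N) + xlogy(N - C, 1 - C / N)
    return ll_alt - ll_null

llr_flat = bernoulli_llr(c=5, n=10, C=50, N=100)   # 0.0: same proportion inside and outside
llr_hot = bernoulli_llr(c=8, n=10, C=20, N=100)    # positive: 0.8 inside vs 12/90 outside
```
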
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_124"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:50 OGGB4 (260-073)</p></div>
## Whitebait In All Its Varieties: One Fish, Two Fish, Three, Four, Five Fish. {.unnumbered}
<p style="text-align:center">
Bridget Armstrong<br />
University of Canterbury<br />
</p>
<span>**Abstract:**</span> Five species of fish of the genus *Galaxias* make up whitebait catches in New Zealand, although one species (*G. maculatus*) makes up >90% of the catch. Whitebait are immature post-larval fish that have yet to develop the distinctive morphological traits of adults, so in their tiny whitebait stages the five species are difficult to tell apart. There are also distinct spatial (rivers) and temporal (different months in the whitebait fishing season) differences among and even within species. To manage the fishery better, it is necessary to identify regional differences in the species composition of catches, which is difficult because of the time and effort required to sample catches and identify species morphologically or genetically. In my study, I will use a recently compiled database of 17,000 entries on whitebait samples, species composition, and variability to develop a statistical model that predicts the likely species composition of catches throughout New Zealand. This probabilistic model could potentially be a powerful tool for the fishery and the conservation of whitebait species, some of which are considered to be threatened.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_191"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:50 OGGB5 (260-051)</p></div>
## Latent Variable Models And Multivariate Binomial Data {.unnumbered}
<p style="text-align:center">
John Holmes<br />
University of Otago<br />
</p>
<span>**Abstract:**</span> A large body of work has been devoted to
latent variable models applicable to multivariate binary data. However
little work has been put into extending these models to cases where the
observed data is multivariate binomial. In this paper, we will first
show that models that use either a logit or probit link function offer
the same level of modelling flexibility in the binary case, but only the
logit link fits into a data augmentation approach that compactly extends
from binary to binomial. Secondly, we will demonstrate that multivariate
binomial data provides greater flexibility in how the link function can
be represented. Lastly, we will consider properties of the implied
distribution of latent probabilities under a logit link.
<span>**Keywords:**</span> Multivariate binomial data, principal
components/factor analysis, item response theory, link functions,
logit-normal distributions
<span>**References:**</span>
Bartholomew, D. J., Knott, M. and Moustaki, I. (2011). *Latent
Variable Models and Factor Analysis: A Unified Approach*, 3rd ed. Chichester:
John Wiley & Sons.
Johnson, N. L. (1949). Systems of Frequency Curves Generated by Methods
of Translation. *Biometrika*, **36**, 149–176.
Polson, N. G., Scott, J. G. and Windle, J. (2013). Bayesian inference
for logistic models using Pólya-gamma latent
variables. *Journal of the American Statistical Association*, **108**,
1339–1349.
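
The data augmentation mentioned in the abstract is likely the Pólya-gamma scheme of Polson, Scott and Windle (2013), which appears in the reference list. Its key identity, for $b > 0$, is

$$
\frac{(e^{\psi})^{a}}{(1 + e^{\psi})^{b}}
= 2^{-b} e^{\kappa \psi} \int_{0}^{\infty} e^{-\omega \psi^{2}/2}\, p(\omega \mid b, 0)\, d\omega,
\qquad \kappa = a - \frac{b}{2},
$$

Here $p(\omega \mid b, 0)$ is the $\mathrm{PG}(b, 0)$ density. For $y$ successes in $m$ trials with $\psi = \operatorname{logit}(\pi)$, the binomial likelihood is $\pi^{y}(1-\pi)^{m-y} = e^{y\psi}/(1+e^{\psi})^{m}$, so $a = y$ and $b = m$. Conditional on $\omega$, the likelihood is Gaussian in $\psi$. Moving from binary ($m = 1$) to binomial data changes only $b$.
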
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_193"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:50 Case Room 2 (260-057)</p></div>
## Asking About Sex In General Health Surveys: Comparing The Methods And Findings Of The 2010 Health Survey For England With Those Of The Third National Survey Of Sexual Attitudes And Lifestyles {.unnumbered}
<p style="text-align:center">
Philip Prah^1^, Anne Johnson^2^, Soazig Clifton^2^, Jennifer Mindell^2^, Andrew Copas^2^, Chloe Robinson^3^, Rachel Craig^3^, Sarah Woodhall^2^, Wendy Macdowall^4^, Elizabeth Fuller^3^, Bob Erens^2^, Pam Sonnenberg^2^, Kaye Wellings^4^, Catherine Mercer^2^, and Anthony Nardone^5^<br />
^1^Auckland University of Technology<br />
^2^University College London<br />
^3^NatCen<br />
^4^London School of Hygiene & Tropical Medicine<br />
^5^Public Health England<br />
</p>
<span>**Abstract:**</span> Including questions about sexual health in the annual Health Survey for
England (HSE) provides opportunities for regular measurement of key
public health indicators, augmenting Britain’s decennial National Survey
of Sexual Attitudes and Lifestyles (Natsal). However, contextual and
methodological differences may limit comparability of the findings. For
instance, both surveys used self-completion for administering sexual
behaviour questions, but via computer-assisted self-interview
(CASI) in Natsal-3 and a pen-and-paper questionnaire in HSE 2010. We
examine the extent of these differences between HSE 2010 and Natsal-3
(undertaken 2010-2012) and investigate their impact on parameter
estimates. We restricted this study to men and women aged 16-69 years
and resident in England in the 2010 HSE (n = 2,782 men and 3,588 women)
and Natsal-3 (n = 4,882 men and 6,869 women). We compared their
demographic characteristics, the amount of non-response to, and
estimates from, sexual health questions. We used complex survey analysis
to take into account stratification, clustering, and weighting of the
data in each survey. Logistic regression was used to measure the extent
to which sexual health estimates differ in HSE 2010 relative to Natsal-3,
with multivariable models to adjust for significant demographic
confounders. We additionally investigated age-group interactions to see
whether differences between the surveys varied by age. The surveys
achieved similar response rates, both around 60% […]. While
a relatively high response to sexual health questions in HSE 2010
demonstrates the feasibility of asking such questions in a general
health survey, differences with Natsal-3 do exist. These are likely due
to the HSE’s context as a general health survey and methodological
limitations such as its current use of pen-and-paper questionnaires.
<p style = "text-align: right">
<a href = "programme-at-a-glance.html#Tuesday-tbl">Return to Programme</a><br/><br/></p>
<p class="pagebreak"></p>
<div id = "talk_132"><p class="contribBanner">Tuesday 12<sup>th</sup> 11:50 Case Room 3 (260-055)</p></div>
## Bayesian Continuous Space-Time Model Of Burglaries {.unnumbered}
<p style="text-align:center">
Chaitanya Joshi, Paul Brown, and Stephen Joe<br />
University of Waikato<br />
</p>
<span>**Abstract:**</span> Building a predictive model of crime with
good predictive accuracy has great value in enabling efficient use of
policing resources and reduction in crime. Building such models is not