NSF Annual Report - CiteSeerX

NSF Annual Report

2009-2010

Submitted to the National Science Foundation

NSF Annual Progress Report for 2009-2010

As outlined in the terms of grant DMS-0635449, the following is the Annual

Progress Report for the Statistical and Applied Mathematical Sciences Institute

(SAMSI), for the period August 1, 2009 – July 31, 2010. Past activities that

concluded during this period and future activities of SAMSI are also discussed.

0. Executive Summary

A. Outline of Activities and Initiatives for 2009-2010 and the Future

B. Financial Overview

C. Directorate's Summary of Challenges and Responses

D. Synopsis of Research, Human Resource Development, and Education

E. Evaluation by the SAMSI Governing Board

Annual Report Table of Contents

A. Outline of Activities and Initiatives

1. 2009-2010 Programs and Activities Schedule

Space-time Analysis for Environmental Mapping, Epidemiology and Climate Change

o Summer School (7/28/09-8/1/09)

o Opening Workshop and Tutorials (9/13/09-9/16/09)

o GEOMED 2009 and Spatial Epidemiology (11/14/09-11/17/09)

o Climate Change Workshop (2/17/10-2/19/10)

o Fundamentals of Spatial Modeling Workshop (3/20/10-3/21/10)

o Workshop on Statistical Aspects of Environmental Risk (4/7/10-4/9/10)

o Transition Workshop (10/11/10-10/13/10)

Stochastic Dynamics

o Opening Workshop and Tutorials (8/30/09-9/2/09)

o Workshop on Self-Organization and Multi-Scale Mathematical Modeling of Active Biological Systems (10/26/09-10/28/09)

o Workshop on Theory and Qualitative Behavior of Stochastic Dynamics (2/8/10-2/10/10)

o Workshop on Molecular Motors, Neuron Models, and Epidemics on Networks (4/15/10-4/17/10)

o Transition Workshop (9/27/10-9/29/10)

Summer Program: Semiparametric Bayesian Inference, with Applications in

Pharmacokinetics and Pharmacodynamics

o Tutorials and Opening Workshop (7/12/10-7/16/10)

o Intensive Research Week (7/19/10-7/23/10)

A Planning Meeting (4/6/10) was held for the Uncertainty Quantification (UQ)

Program for 2011-12. The afternoon session of this meeting was a public meeting

discussing the formation of UQ committees in the ASA and SIAM.

Education and Outreach

2-Day Workshop for Undergraduates (10/30/09-10/31/09)

2-Day Workshop for Undergraduates (2/26/10-2/27/10)

Interdisciplinary Workshop for Undergraduates (5/17/10-5/21/10)

The Industrial Mathematical and Statistical Modeling Workshop for Graduate

Students (7/19/10-7/27/10)

Graduate Courses at SAMSI

o Stochastic Dynamics, Fall 2009

o Theory of Continuous Space and Space-Time Processes, Fall 2009

o Spatial Epidemiology, Fall 2009

o Spatial Statistics in Climate, Ecology and Atmospherics, Spring 2010

2. Programs for 2010-2011

Analysis of Object Data

o Opening Workshop & Tutorials (9/12/10 – 9/15/10)

o Interface Functional & Longitudinal Data Analysis (11/8/10 –

11/10/10)

o AOOD Meets Evolutionary Biology (4/30/11 – 5/2/11)

Complex Networks

o Opening Workshops & Tutorials (10/20/10 – 10/22/10)

o Dynamics of Networks (1/10/11 – 1/12/11)

o Pedestrian Traffic Flow (2/14/11 – 2/16/11)

o Dynamics on Networks (3/21/11 – 3/23/11)

Summer Program: SAMSI/Sandia Summer School of Uncertainty Quantification

Education and Outreach

2-Day Workshop for Undergraduates (10/29/10-10/30/10)

2-Day Workshop for Undergraduates (2/25/11-2/26/11)

Graduate Student Probability Workshop (4/29/11 – 5/1/11)

Interdisciplinary Workshop for Undergraduates (5/16/11-5/20/11)

The Industrial Mathematical and Statistical Modeling Workshop for Graduate

Students (7/7/11-7/15/11)

Graduate Courses at SAMSI

o Complex Networks, Fall 2010

o Analysis of Object Data, Fall 2010

o Analysis of Object Data, Spring 2011

3. Programs for 2011-2012

Uncertainty Quantification: Climate Change Workshop

o Opening Workshop & Tutorials (8/29/11 - 8/31/11)

Uncertainty Quantification: Methodology

o Opening Workshop & Tutorials (9/7/11 – 9/10/11)

Uncertainty Quantification: Engineering & Renewable Energy

o Opening Workshop & Tutorials (9/19/11 – 9/21/11)

Uncertainty Quantification: Geosciences Applications

o Opening Workshop & Tutorials (9/21/11 – 9/23/11)

B. Financial Overview

C. Directorate’s Summary of Challenges and Responses

After eight years of SAMSI, the scientific programs have been of high caliber, and have

led to significant new and ongoing research collaborations among statistics, applied

mathematics, and disciplinary sciences. There has been significant human resource

development, through the postdoctoral and graduate programs and through involvement

of senior researchers in new interdisciplinary areas. Many students across the country

have been shown the SAMSI vision through educational outreach programs and courses.

We feel that these successes are amply demonstrated throughout the report; some

highlights are given in section D of the Executive Summary. This section discusses the

challenges that arose this year and the Directorate's response to these challenges.

Changes in the Directorate: After eight years of sterling service as SAMSI's inaugural

Director, Jim Berger stood down from that position in 2010. Following a national search,

Dr. Richard Smith of the University of North Carolina (UNC) assumed the position on

July 1, 2010.

Richard's appointment as Director meant that the UNC representative on the Directorate

(previously Michael Minion) moved from the Mathematics department to Statistics and

Operations Research. To maintain the statistics/mathematics balance on the Directorate, it

was logical to seek a new Associate Director from the Mathematics department at Duke.

Dr. Rick Durrett took up this role after moving from Cornell to Duke during the summer

of 2010.

As was reported in last year's report, Dr. Pierre Gremaud of NCSU was appointed as the

full-time Deputy Director from January 1, 2010, with half of his salary provided by the

home university (as previously), while the other half is provided from SAMSI NSF

money.

As of Summer 2010, the fourth member of the Directorate continued to be Nell Sedransk,

Associate Director of NISS and also of SAMSI.

The current arrangements mean that at any one time, only three of the eight Triangle

departments that support SAMSI have a representative on the Directorate. To ensure

direct input from all the involved departments, a new Local Department Liaisons

committee was formed including one member from each of the five departments not

directly represented on the Directorate. This committee will meet with the Directorate at

least once a semester and will be actively consulted about how to optimize local faculty

involvement in SAMSI. The initial members of this committee are Drs. Greg Forest

(UNC Math), Stephen George (Duke Biostatistics), Robert L. Wolpert (Duke Statistics),

Donglin Zeng (UNC Biostatistics) and Helen Hao Zhang (NCSU Statistics).

Because this committee effectively takes over all the functions of the previous Local

Development committee, the latter committee no longer exists.

Space: The extension to the NISS/SAMSI building completed in Fall 2008 has greatly

expanded SAMSI's capacity to host visitors. The space has been well utilized during

2009-10 with the following numbers of visitors:

Long-term Participants in 2010-11 SAMSI Programs

Postdocs Postdoc Assoc

Graduate Visitor

Graduate Local Visitor

Faculty Fellow

1-2 Month Visitor

New Rschr Totals

Year-Long 8 12 3 12 1 36

Fall 4 16 25 1 46

Spring 1 1 13 11 26

Program Evaluation: Programs at SAMSI are evaluated through questionnaires

distributed at all workshops, and through annual surveys of SAMSI postdocs and

program participants. The information from these surveys is presented in Sections B and

C and in Appendix G.

Communication and Marketing

As part of SAMSI's branding efforts, an MP3 charger that can be used in an automobile

was given out at conferences, and at some of SAMSI's research workshops and Education

and Outreach workshops.

A tri-fold marketing brochure was produced that gives a brief explanation of what

SAMSI is and the programs that are offered. It is used as part of SAMSI's overall

marketing efforts and given to people who visit SAMSI's exhibits at mathematics and

statistics conferences.

Posters were designed and distributed for the two main programs and for the Education

and Outreach program. A special poster was designed for the IMSM workshop, which is

jointly sponsored by SAMSI, the NSF, CRSC and NCSU. The posters are distributed to

colleges and universities across the United States.

SAMSI's newsletter was mailed out quarterly to a list of about 460 colleges and

universities. Another 2,643 contacts received the email edition of the newsletter

through the Bronto e-mail marketing system. Roughly 22 messages were sent using the

Bronto system to communicate with our target market. In addition to the newsletter, these

messages included announcements of upcoming workshops and programs, and any

special announcements the organization wanted to convey.

Two press releases were written and sent to local and national media. The first was about

the hiring of additional postdoctoral fellows and was used as part of a larger effort by all

13 mathematical institutes to publicize the new fellows hired thanks

to an increase in NSF funding. This release went out on May 11, 2009 and was

picked up by 30 media outlets including dBusinessNews, Carolina Newswire, Business

Leader and others. The other press release publicized the hiring of Gordon Campbell as

the Operations Director for SAMSI. This release was distributed on July 13, 2009. This

release was picked up mainly by local media, such as the Durham Herald Sun and the

Duke Chronicle, but also was run in the ISM Bulletin.

SAMSI's efforts in the social media realm were recognized by the North Carolina chapter

of the Public Relations Society of America as the best social media campaign. SAMSI

uses three main social media platforms (LinkedIn, Facebook and Twitter) to

communicate with its target market. See below for statistics on the reach that SAMSI

had during fiscal year 2009-10.

A "Welcome to SAMSI" brochure was produced to give to new visitors to the area. The

brochure includes information such as area restaurants, places to live, places to shop,

recreation, places to see and a guide for parents. It was given to each new visitor to

SAMSI during the 2009-10 year.

SAMSI Twitter Followers

Date Number of Followers

5/15/09 28

8/2/09 45

10/27/09 406

11/16/09 446

1/5/10 510

8/24/10 578

Facebook Friends

Date Number of Friends Number of Visits that Week

3/25/10 87 53

4/19/10 95 98

7/26/10 123 88

8/30/10 135 68

D. Synopsis of Developments in Research, Human Resource

Development, and Education

1. Research

In this report, we cover both of SAMSI's full-year research programs for 2009-2010, as

well as the summer programs in both 2009 and 2010. It should be noted, however, that

the detailed statistics presented elsewhere in this report cover only the 2010 summer

program, not that from 2009.

The two main research programs were both highly successful. Stochastic Dynamics was

one of the few SAMSI programs up to this date to feature probability theory as a major

emphasis, though the program also included applied mathematics, statistics and several

fields of application, notably biology. Some major outcomes included new insights into

statistical aspects of multiscale computing methods; a collaboration among a probabilist,

an applied mathematician and a statistician to study “molecular motors”, and ongoing

collaborations on topics in stochastic neuronal dynamics and epidemic spread on

networks. The other program on Space-Time Analysis for Environmental Mapping,

Epidemiology and Climate Change was one of SAMSI's largest programs to date, with

no fewer than nine Working Groups resulting in numerous new theoretical and applied

developments in spatial and spatio-temporal statistics, as well as applications in both the

epidemiology and climate sub-themes. One outcome was the first fully Bayesian space-

time analysis for the problem of paleoclimatology – the reconstruction of historical

temperatures from proxies such as tree rings and ice cores. We also report on the

formation of a joint Working Group between the two programs – the first time that two

different SAMSI programs have combined forces in this way and further evidence of

progress toward SAMSI's goal of developing joint research among statisticians and applied

mathematicians.

In the following sections, we first present brief reports of the two summer programs,

followed by detailed summaries of the two year-long programs.

(a) SUMMER PROGRAM ON PSYCHOMETRIC MODELING AND

STATISTICAL INFERENCE

During the past 25 years, many new statistical methods have been introduced in

psychometrics. Examples include item response theory, structural equation models,

cognitive diagnosis models and generalized linear latent and mixed models. These

developments have been spearheaded by quantitative psychologists, who come primarily

from psychology and education departments. Within the same time frame, very similar

models have been developed by statisticians. The goal of this summer program was to

bring the two communities together to explore possible collaborations.

The program organizers were Valen Johnson and Richard Swartz (University of Texas

M.D. Anderson Cancer Center), assisted by Jimmy de la Torre (Rutgers), Charles

Cleeland (M.D. Anderson Cancer Center) and David Banks (Duke). There were 71

participants, including 36 who were students or new researchers. The first four days of

the program consisted of a series of tutorial and overview lectures. For the second week,

the workshop broke up into three Working Groups.

The first Working Group was on Peer Review. This started with a statistical analysis by

David Banks (who at the time was one of the editors of JASA - the Journal of the

American Statistical Association) of review times and outcomes at JASA, and another by

Valen Johnson of peer review scores in NIH R01 grant proposals. Further discussions

during the week focused on Bayesian approaches to the analysis of rating and ranking

data.

The second Working Group was on Patient Reported Outcome (PRO) data, particularly

concerned with the longitudinal collection and analysis of PROs in clinical research,

regulatory review and disease management.

The third Working Group discussed Cognitive Diagnostic Models. This group identified

three objectives: first, to survey the current state of the art; second, to develop a consensus

among the participants about the most promising lines for future research; and third, to

establish collaborative relationships among the participants to pursue these lines of

research.

All three groups met for initial presentations of the background on Monday, followed by

three days of discussions and collaboration, then a draft White Paper completed on

Friday. Each of the three White Papers was subsequently expanded into a full report,

copies of which are included in the final SAMSI Program Report for this program.

(b) SUMMER PROGRAM ON SEMIPARAMETRIC BAYESIAN INFERENCE:

APPLICATIONS IN PHARMACOKINETICS AND

PHARMACODYNAMICS

Pharmacokinetics (PK) is the study of the time course of drug concentration resulting

from a particular dosing regimen. PK is often studied in conjunction with

pharmacodynamics (PD). PD explores what the drug does to the body, i.e., the

relationship between drug concentrations and the resulting pharmacological effect.

Pharmacogenetics (PGx) studies the genetic variation that determines differing response

to drugs. Understanding the PK, PD and PGx of a drug is important for evaluating

efficacy and determining how best to use such agents clinically.

Hierarchical models have allowed great progress in statistical inference in many

application areas. Hierarchical models for PK and PD data that allow borrowing of

strength across a patient population are known as population PK/PD models. These

models have allowed investigators to learn about important sources of variation in drug

absorption, disposition, metabolism, and excretion, allowing the researchers to begin to

tailor drug therapy to individuals. Newer Bayesian non-parametric population models and

semi-parametric models offer the promise of individualizing therapy and discovering

subgroups among patients even further, by freeing modelers from restrictive assumptions

about underlying distributions of key parameters across the population. The purpose of

this program is to bring together a mix of experts in PK and PD modeling, non-

parametric Bayesian inference, and computation.
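
As a rough illustration of the population PK idea described above, the following Python sketch simulates concentration-time profiles for a few patients. It assumes a one-compartment intravenous bolus model, with each patient's clearance drawn from a log-normal distribution around a population value. The model choice, the parameter values (dose, volume, population clearance, variability omega) and the function names are hypothetical and chosen only for illustration; none of them comes from the program itself. The log-normal assumption is the kind of parametric restriction that non-parametric and semi-parametric population models aim to relax.

```python
import math
import random


def concentration(dose, volume, clearance, t):
    """One-compartment IV bolus model: C(t) = (dose / V) * exp(-(CL / V) * t)."""
    return (dose / volume) * math.exp(-(clearance / volume) * t)


def sample_patient_clearances(n_patients, pop_clearance=5.0, omega=0.3, seed=1):
    """Draw individual clearances log-normally around a population value.

    pop_clearance and omega are illustrative assumptions, not fitted values.
    """
    rng = random.Random(seed)
    return [pop_clearance * math.exp(rng.gauss(0.0, omega))
            for _ in range(n_patients)]


# Hypothetical dosing scenario: 100 units of drug, 20 units of volume.
clearances = sample_patient_clearances(5)
times = [0.0, 1.0, 2.0, 4.0, 8.0]
profiles = [[concentration(100.0, 20.0, cl, t) for t in times]
            for cl in clearances]

for cl, profile in zip(clearances, profiles):
    print(round(cl, 2), [round(c, 2) for c in profile])
```

Every simulated patient starts at the same concentration (dose divided by volume) and then decays at a patient-specific rate, which is the between-patient variation that a population model tries to describe.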

This program was jointly organized by Gary Rosner and Peter Mueller (M.D. Anderson

Cancer Center). The first week of the program (July 12-16, 2010) consisted of a

workshop featuring one day of tutorials followed by four days of intensive research

discussions; in the second week (July 19-23), participants split into five Working Groups

for intensive research discussions. The topics of the Working Groups were:

1. Model Fit and Comparison;

2. Joint PK and PD Modeling;

3. Nonparametric Bayes and Clustering for Population PK and PD;

4. PK and PD Simulation for Design and Dose Individualization;

5. Pharmacogenomics and Networks.

Each of the Working Groups met several times during the week and had lively and

intensive discussions. Several groups formulated plans for continued research activities in

the future.

(c) FULL-YEAR PROGRAM ON STOCHASTIC DYNAMICS

The SAMSI program on Stochastic Dynamics brought together researchers from many

different fields of mathematics, such as probability theory, statistics, numerical analysis

and mathematical modeling, to study the mathematical analysis, computation and

applications of systems governed by stochastic differential equations. Applications of

stochastic dynamics are widespread, some examples being mathematical biology

studying transport on a cellular level, understanding stochastic forcing in dynamical

systems, the statistics of dynamic networks, and the relationship between atomistic and

continuum physics. However, these areas of research are too often treated as distinct

topics, with not enough being done to disseminate research in one of those areas across the

spectrum of research in applied mathematics and statistics. The SAMSI program served

to bring together experts in different but highly inter-related research specializations

under the broader umbrella of stochastic dynamics in the hopes of creating collaborations

which could potentially lead to exciting advances in particular research areas.

Particular successes of the program include:

The establishment of a new collaboration, including two major grant proposals

submitted to date, involving a probabilist (Scott McKinley), an applied

mathematician (Peter Kramer), and a statistician (John Fricks) to develop a

systematic stochastic modeling framework for intracellular transport connecting

three important physical length scales.

A sustained research effort across SAMSI programs, with applied mathematicians

from the working group on Numerical Methods for Stochastic Systems (Peter

Kramer and Sorin Mitran) joining with statisticians from the program on Space-

Time Analysis for Environmental Mapping, Epidemiology, and Climate Change

(Jim Berger, Susie Bayarri, Murali Haran, Emily Kang, Hans Kuensch)

to explore how statistical aspects of multiscale computing methods could be

enhanced.

Two research programs toward new modeling approaches in microswimming

(Peter Kramer) and swarming (Amarjit Budhiraja, Oliver Diaz and Xin Liu) that

were nucleated by presentations at the workshop on Self-Organization and Multi-

Scale Mathematical Modeling of Active Biological Systems in October 2009.

Several ongoing collaborations between SAMSI postdocs on topics in stochastic

neuronal dynamics and epidemic spread on networks.

A project initiated and driven by two graduate students (Katherine Newhall and

Amanda Traud) participating in the working group on Network Dynamics, now

being prepared for submission for publication.

Two workshops in the spring which were designed to resonate with the specific

interests of the working groups, stimulating new ideas and helping to lead to

several submitted publications.

The Opening Workshop took place from August 30-September 2, 2009, and featured

tutorials from Peter Baxendale (University of Southern California), Mark Alber

(University of Notre Dame), Grant Lythe (Leeds University) and Andrew Stuart

(Warwick University). This was followed by five half-day research sessions, a poster

session and two “five-minute madness” sessions, and the formation of working groups.

Other workshops included one on Self-Organization and Multi-Scale Mathematical

Modeling of Active Biological Systems (October 26-28, 2009), one on Theory and

Qualitative Behavior of Stochastic Dynamics (February 8-10, 2010) and one on

Stochastic Dynamics: Molecular Motors, Neuron Models, and Epidemics on Networks

(April 15-17, 2010), as well as a transition workshop that took place at SAMSI

(November 17-19, 2010).

Working Group 1 was led by Jonathan Mattingly (Duke) and studied Qualitative

Behavior of Stochastic Dynamics. The objective of the group was to study a number of

topics in the qualitative description of stochastic dynamics. One example was some work

on planar dynamical systems by Avanti Athreya, Tiffany Kolba, and Jonathan Mattingly.

They studied two specific systems of differential equations that have regions of

instability without noise but appear to stabilize when noise is added. Another

example of this group's work was by Kevin Lin and Jonathan Mattingly, who were

concerned with simulation methods for estimating the sensitivity of solutions of SDEs to

input parameters. Specifically, they came up with a new method of coupling, where the

basic idea is to construct two correlated realizations of the SDE for nearby parameters in

a way that ensures the two solutions remain near each other most of the time. By

constructing the simulations in this way, estimates of the sensitivity, which depend on the

differences between the two realizations, have much smaller variance. The current focus of

the project is the numerical analysis of these methods. As a third example, Peter H.

Baxendale and Priscilla E. Greenwood have submitted a paper to the Journal of

Mathematical Biology that tries to explain why stochastic models for biological systems

often exhibit sustained oscillations at an amplitude well above the expected noise level.

Baxendale and Greenwood showed that in a general limiting sense the stochastic path

describes a circular motion modulated by a slowly varying Ornstein-Uhlenbeck process,

and gave numerical examples for several well-known models.
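
The variance reduction obtained by coupling two realizations, as described above for sensitivity estimation, can be illustrated with a minimal Python sketch. It uses the simplest classical coupling, namely driving both parameter values with the same Brownian increments (common random numbers). This is not the new coupling method of Lin and Mattingly, whose details are not given in this report. The test equation (an Ornstein-Uhlenbeck process), the parameter values and all function names are assumptions made for illustration only.

```python
import math
import random
import statistics


def brownian_increments(rng, n_steps, dt=0.01):
    """Independent N(0, dt) increments of a Brownian path."""
    return [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_steps)]


def euler_maruyama(theta, increments, x0=1.0, sigma=0.5, dt=0.01):
    """Euler-Maruyama for dX = -theta * X dt + sigma dW; returns X at final time."""
    x = x0
    for dw in increments:
        x += -theta * x * dt + sigma * dw
    return x


def sensitivity_samples(theta, eps, n_paths, n_steps, coupled, seed=0):
    """Finite-difference samples of d X_T / d theta.

    coupled=True reuses the same noise for theta and theta + eps;
    coupled=False draws independent noise for the two runs.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_paths):
        dw_a = brownian_increments(rng, n_steps)
        dw_b = dw_a if coupled else brownian_increments(rng, n_steps)
        diff = euler_maruyama(theta + eps, dw_a) - euler_maruyama(theta, dw_b)
        samples.append(diff / eps)
    return samples


coupled = sensitivity_samples(1.0, 0.05, 200, 100, coupled=True)
independent = sensitivity_samples(1.0, 0.05, 200, 100, coupled=False)
var_coupled = statistics.variance(coupled)
var_independent = statistics.variance(independent)
print("variance with shared noise:     ", var_coupled)
print("variance with independent noise:", var_independent)
```

Because the two coupled paths stay close to each other, their difference is small and the finite-difference estimator has far lower variance than when the two paths are simulated independently.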

Working Group 2 was on Numerical Methods for Stochastic Systems, led by Alina

Chertock and Sorin Mitran. This group discussed a number of multi-scale numerical

methods that mix stochastic models at small scales with deterministic models at larger

scales. Particular issues included the incorporation of adaptive time stepping and the

initialization of microscale simulations consistent with the macroscale constraints. In

response to these challenges, Sorin Mitran proposed a novel time-parallel continuum-

kinetic-molecular (tpCKM) simulation scheme, which is designed to handle multiscale

systems where the microscale statistics do not relax quickly relative to the macroscopic

time scale. A third, mesoscopic, level of description was added and updated by an

adaptive predictor-corrector approach through communication with both the macroscale

and microscale simulations. Mitran and his graduate student Jennifer Young had

separately been studying polymeric and cellular blebbing with this method, while the

working group examined a one-dimensional string model where bonds could form and

break on the microscale with rates dependent on the local stress.

This Working Group also collaborated with Working Group 3 of the Space-Time

Analysis program (described in more detail under that program), which developed a

focused interest in multiscale computing. The collaboration looked at ways to improve

the statistical aspect of the communication between the mesoscale and microscale

dynamics in the tpCKM method. Statistical issues in multiscale computing appear not to

have been much pursued until now, but the close interaction of applied mathematicians

and statisticians at SAMSI has identified new areas of research along this direction.

Working Group 3 was on Data Assimilation, led by Cindy Greenwood, John Harlim and

Peter Kramer. Data Assimilation involves the incorporation of observational data into

mathematical models and is an area of high interaction between applied mathematicians

and statisticians. This working group was motivated by two specific examples, one

arising from single neuron experiments where the input consists of an electrical current

and the output consists of voltage measurements across the neuron membrane, the other

by experimental ecologies consisting of micro-organisms observed in a laboratory

chemostat, where variables such as temperature and the supply of nutrients can be

controlled by the experimenter and are the inputs to the system. In both cases, the

problem is formulated as one of experiment design, i.e. how to vary the inputs so as to

achieve maximum information about unknown parameters of the system. The paradigm

that this group used to investigate such questions was optimal control theory. However,

there are some complications to the use of control theory in this context. The optimal

control is typically infinite, requiring some restriction on the range of admissible controls

or a penalty function. Another difficulty is that the Fisher information about unknown

parameters is very difficult to calculate and requires some approximation, such as the

information that would be obtained if the state of the system were exactly observable at

all times. Thus, the overall strategy is to simplify the problem to one that can be solved by

optimal control techniques.

Several papers are in preparation based on the work of this group. One approach uses an

implementation of particle filters for a nonlinear system. Another paper uses a linear

model but includes an analysis of the accuracy of the assumed approximations. A third

prospective paper involves a graduate student at Cornell University who is working on

deriving the optimal control in a discrete-state Hidden Markov model with application to

polymerase chain reactions.

Working Group 4 on Network Dynamics was led by Pierre Gremaud, Peter Kramer and

Peter Mucha. This group was concerned with understanding the way in which structural

properties of connections in a network of stochastic processes affect the behavior of the

network itself. The work was focused around three sub-themes.

Contagion Dynamics on Social Networks: This concerned the study of diseases and social

contagion on networks. One topic studied by John McSweeney and Bruce Rogers

concerned the spread of a disease through a population consisting of two communities,

where the probability of contact between two individuals of the same community is much

higher than the probability of contact between two individuals in opposite communities.

They showed a typical “stair step” solution, whereby the epidemic reaches equilibrium in

one community before spreading to the other. Work on this problem has continued into

the 2010-11 Complex Networks program; a paper is in preparation. Another topic studied

by Cindy Greenwood and Xueying Wang concerned pairwise closure approximations for

two types of epidemic model. One example they studied led to a limiting process

exhibiting stochastic oscillations in the form of an Ornstein-Uhlenbeck process composed

with a time-dependent rotation; this led to a paper and a presentation at the Transition

Workshop.
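
The "stair step" behavior described above for two weakly connected communities can be sketched with a simple deterministic caricature. The Python code below integrates a two-community SIS mean-field model in which within-community transmission is much stronger than cross-community transmission. This is an illustrative assumption, not the stochastic model studied by McSweeney and Rogers, and the rates (beta_in, beta_out, gamma), the initial conditions and the function names are all hypothetical.

```python
def simulate_two_community_sis(beta_in=1.0, beta_out=1e-4, gamma=0.2,
                               i1_0=1e-3, i2_0=0.0, dt=0.01, t_end=40.0):
    """Forward-Euler integration of a two-community SIS mean-field model.

    i1 and i2 are the infected fractions in communities 1 and 2.
    beta_in / beta_out are within- and cross-community transmission rates.
    """
    i1, i2 = i1_0, i2_0
    times, path1, path2 = [0.0], [i1], [i2]
    n_steps = int(round(t_end / dt))
    for step in range(1, n_steps + 1):
        force1 = beta_in * i1 + beta_out * i2  # infection pressure on community 1
        force2 = beta_in * i2 + beta_out * i1  # infection pressure on community 2
        d1 = force1 * (1.0 - i1) - gamma * i1
        d2 = force2 * (1.0 - i2) - gamma * i2
        i1, i2 = i1 + dt * d1, i2 + dt * d2
        times.append(step * dt)
        path1.append(i1)
        path2.append(i2)
    return times, path1, path2


def first_time_above(times, path, level):
    """First time at which the path reaches the given level, or None."""
    for t, value in zip(times, path):
        if value >= level:
            return t
    return None


times, path1, path2 = simulate_two_community_sis()
# Half of the symmetric endemic level 1 - gamma / (beta_in + beta_out).
half_level = 0.5 * (1.0 - 0.2 / (1.0 + 1e-4))
t1 = first_time_above(times, path1, half_level)
t2 = first_time_above(times, path2, half_level)
print("community 1 reaches half its endemic level at t =", t1)
print("community 2 reaches half its endemic level at t =", t2)
```

In this sketch the infection starts in community 1, which approaches its endemic level while community 2 is still almost untouched; the second rise comes later, giving the two-step profile.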

Neuronal Network Dynamics: the focus of this subgroup was understanding how the

collective behavior of a network of neurons is influenced by neuronal-level dynamics and

the topology of the network. One focus was on the extension of kinetic theory to account

for the spatial distribution of the neurons. Another development was initiated by graduate

students Katie Newhall and Mandi Traud, with input from their advisors Kramer and

Mucha, concerning the relationship between statistical properties of neuron cascades and

the underlying network of interconnectivity. The work led to two presentations and a

paper being written up for the journal Chaos.

Institute-Participation Data in Mathematics: This subgroup was concerned with

understanding the social network of participants in NSF mathematical sciences institutes.

The group made some advances in understanding statistical methodologies and software,

but there was no immediate access to relevant data.

Working Group 5 on Biological Stochastic Dynamics was led by Amarjit Budhiraja,

Cindy Greenwood and Peter Kramer. The work of this group was motivated by the

explosion of interest in mathematical modeling in biological sciences, where stochasticity

is present at every scale.

The most successful outcome of this group was on the modeling and analysis of systems

of molecular motors bound to a single cargo. The central question concerns how the

details of the stochastic model characterizing individual molecular motors, including their

response to applied forces and proclivity to bind or unbind from the microtubule track, are

reflected in the effective dynamics of the complex of the cargo bound to multiple

molecular motors. The group's work was focused on constructing a physically consistent

spatial model, and applying stochastic averaging techniques to derive effective equations

for the motor-cargo complex from physically and biologically based stochastic dynamical

equations for the individual components. Avanti Athreya, John Fricks, Peter Kramer, and

Scott McKinley are currently writing up their work for publication under the title

“Cooperative dynamics of kinesin and dynein type molecular motors”. They plan to

continue their collaboration, which has led to two major grants submitted to the National

Science Foundation's Focused Research Group (FRG) program and the Joint

DMS/NIGMS Initiative to Support Research in the Area of Mathematical Biology. These

grant proposals extend the work of the SAMSI group to a mathematically integrated

stochastic model for cellular transport across multiple spatial scales.

Several other subgroups worked on Stochastic Single Neuron Models. One project by

Cindy Greenwood together with short-term visitor Laura Sacerdote and collaborator

Maria Teresa Giraudo led to the paper “How sample paths of Leaky Integrate and Fire

models are influenced by the presence of a firing threshold”, submitted to Neural

Computation. In another project, Avanti Athreya, Badal Joshi, and Xueying Wang

considered coherence and self-induced stochastic resonance in Morris-Lecar models for

neurons. The group extended previous work of Rowat into regimes where the

deterministic model has only a stable fixed point but no stable limit cycles. Part of the

work concerned the study and classification of behaviors in small networks of two or

three neurons, with the idea that qualitative understanding of such small networks could

lead to insights in larger, more biologically realistic, network sizes. A number of specific

questions were studied by simulation and are leading to more detailed analytic work.

Another subgroup studied microswimming, which is concerned with the dynamics of

tracer particles in a flow generated by a suspension of swimming microorganisms. The

work of this subgroup did not immediately lead to results, but has subsequently been

taken up by Peter Kramer at his home institution together with a collaborator and a graduate

student.

(d) FULL-YEAR PROGRAM ON SPACE-TIME ANALYSIS FOR ENVIRONMENTAL MAPPING, EPIDEMIOLOGY AND CLIMATE CHANGE

The program on Space-Time Analysis was extremely lively and resulted in many

research advances. The Opening Workshop was one of the best-attended workshops in

the history of SAMSI, and was followed by four others as well as a Transition Workshop.

Two of the mid-program workshops took place away from SAMSI itself – the one on

spatial epidemiology was held in conjunction with the GEOMED conference, and hosted

by them at the Medical University of South Carolina. There was also a one-day workshop

on Objective Bayesian Analysis for Spatial-Temporal Models held in San Antonio,

Texas. SAMSI also played host to a joint workshop on environmental risk analysis, held

in conjunction with the European environmental statistics group SPRUCE. The fourth

mid-program workshop was on climate change and included a number of visiting climate

scientists as well as statisticians and applied mathematicians.

Educational activities hosted by this program included a pre-program summer school on

spatial statistics, featuring four visiting speakers who gave hands-on demonstrations of

software as well as lectures on the underlying theory. There were three regular graduate

student courses, and the program also contributed to SAMSI's undergraduate workshop

program.

The bulk of the research activity was spread over nine working groups, covering

every aspect of space-time statistics, from the very theoretical (e.g. WG6, which included

comparisons of inference approaches), through to more applied and computational topics

(WG4 on computation and visualization; WG7 on geostatistics), to other forms of

stochastic models (WG8 on point processes, WG9 on nonstationary and non-Gaussian

processes) and three working groups on applications (WG1 on paleoclimate, WG2 on

spatial health effects and WG5 on spatial extremes, which although covering some

theoretical themes was mostly motivated by the combined application of spatial statistics

and extreme value theory to climatological datasets). The remaining working group,

WG3, was a joint working group between this program and the program on Stochastic

Dynamics.

The remainder of the present description outlines the main achievements of each of the

nine working groups.

Working Group 1 was concerned with Paleoclimate: the science of reconstructing past

temperatures based on proxy records such as tree rings, lake sediments and ice cores. The

modern instrumental records of weather date from about 1850. Proxies such as the ones

just mentioned allow us to extend the record of reconstructed temperature back by about

1000 years. These reconstructions have drawn particular attention because of the apparent

“hockey stick” shape that so many of them show: a steady or even slightly declining

record of temperatures for about 900 years prior to the twentieth century, then a sharp

increase in temperatures during the twentieth century. This is often interpreted as

evidence that the recent increases are human caused. However, there is no universal

agreement about the best statistical method for making these reconstructions, or indeed

about the hockey stick shape itself, and all those that appear in the climate literature are limited when

it comes to giving realistic uncertainty estimates.

This Working Group went beyond previous reconstructions by considering the spatial as

well as temporal aspects of the data, in effect treating the problem as one of reconstructing the full

space-time field; and then using Bayesian Hierarchical Models (BHMs) to perform the

inference. Although BHMs have been used previously for paleoclimatic reconstructions,

they are still not a well-known technique among paleoclimatologists, but this group

effectively demonstrated the power of this methodology.

The main activity was carried out by a group of six core members led by Stanford visitor (and

former SAMSI postdoc) Bala Rajaratnam and current SAMSI (later NCAR) postdoc

Martin Tingley, and resulted in a substantial paper submitted to Journal of Climate. The group

has continued working together since the SAMSI program ended, including helping to

organize a workshop at the National Center for Atmospheric Research (NCAR) in March

2011. Among the follow-up work is a forthcoming paper on the reconstruction of

temperature extremes (see also WG5, below).

Working Group 2 (Spatial Exposures and Health Effects) was the main group to take up

the “epidemiology” component of the program's title, though some other working group

activities (notably in WG7 and WG8) were also motivated by spatial disease mapping

problems. Much of modern epidemiology is concerned with studying how some adverse

health effect (such as mortality, hospitalization, asthma attacks) depends on air pollution,

ozone and particulate matter being the most common air pollutants studied. In recent

years, both more refined data collection techniques (e.g., geographical information

systems for mapping pollution fields, geocoding of addresses to associate individuals

with specific latitude-longitude coordinates) and more refined techniques for spatial

statistics have greatly expanded the possibilities for studying spatial associations in

epidemiology, which are potentially much more informative than either time series or

cohort studies. However, these methods create problems of their own; for example,

characterizing exposure measurement error has become a major topic of research.

This Working Group included researchers from all three local universities as well as one

senior outside visitor (Kate Calder from Ohio State) and several postdocs, graduate

students and researchers from the Environmental Protection Agency. Within this group,

six separate projects were identified covering various aspects of the spatial exposure

problem and also different health effects. For example, two of the projects were

concerned with the effect of air pollution on preterm birth. One particularly novel

application was the projection of future deaths in the southeast US taking into account

the effects of climate change, not only the direct effects on weather but also projected

increases in ozone that might arise as a side-effect of warmer temperatures, thus neatly

tying together the “climate change” and “epidemiology” components of the program's

title.

One outcome of Working Group 2 was a jointly authored NIH proposal by five of the

participants, extending the climate change work just mentioned to propose a fully spatial-

temporal model for linking climate change, air pollution and human health outcomes.

Working Group 3 was on Interaction of Deterministic and Stochastic Models. This was

unusual in being a joint working group of the two SAMSI programs running in parallel,

Space-Time Analysis and Stochastic Dynamics.

The goal of this group was to work on statistical issues in complex deterministic and

stochastic models. Particular themes included developing approximations, emulation and

calibration for complex computer models, and extending the “usual” statistics of spatial

modeling (accounting for dependence) to capture spatial dynamic processes that are of

scientific interest.

One subgroup that made particular progress looked at the use of multiscale simulation

and computational methods to study multiscale dynamical systems or heterogeneous

multiscale methods (HMMs). HMMs are used for simulating physical processes that have

both “macroscale” and “microscale” dynamics. This has become an active research area

in applied mathematics and scientific computing in the past decade and has many

important scientific applications, for example modeling complex fluid dynamics and

climate systems. This appears to be a rich and still unexplored area of research for

statisticians. Hence, a significant outcome of this Working Group's activities is the fact

that they opened up possibilities for a new and potentially fruitful line of research. Specific

outcomes included a paper by SAMSI postdoc Emily Kang (whose training was in

statistics) and applied mathematician John Harlim on “A Fast Filtering Framework for

Assimilating Partially Observed Multiscale Systems”; plus several papers on simulation

and emulations in computer models, including applications to hydrology and climate

science.

Working Group 4 was on Computation, Visualization, and Dimension Reduction. This

was a very large Working Group that was formed to consider the computational and

visualization aspects of fitting complicated models to spatio-temporal data. Eventually

the group split into several subgroups. Brian Smith and Kate Cowles from the University

of Iowa worked on an R package “magma” (now publicly available) that enables parallel

matrix computations to be carried out on graphics processing units. Other subgroups

worked on specific scientific problems. For example, a group of seven participants led by

Noel Cressie (Ohio State) submitted a paper on “Dynamical random-set modeling of

concentrated precipitation in North America”, which used data from the North American

Regional Climate Change Assessment Project (NARCCAP), which is run out of NCAR,

viewing the set of locations where the precipitation exceeds a given threshold as a

random set. They then proposed a dynamical model for the temporal evolution of that

random set. The relevance of this work to climate change research is that the proposed

models can be used to project precipitation changes at smaller spatial scales than typical

global climate models; this is important because many of the potential impacts of climate

change involve intense precipitation and flooding at small spatial scales, and stochastic

models are needed to make meaningful statements about such events. Another subgroup

led by Bruno Sansó also used NARCCAP data to develop probabilistic projections of

future regional climate change by “blending” the projections from different climate

models; this paper has been submitted for publication and was also presented at a

NARCCAP users meeting in Boulder.

Working Group 5 was on Spatial Extremes. In recent years, extreme value theory –

widely used to study the statistical distribution of extremes in univariate time series – has

been extended first to the multivariate setting and then to spatial processes. One concept

that has been particularly widely studied is max-stable processes – a class of stochastic

processes that has the property of remaining invariant under the operation of forming

pointwise maxima, modulo location and scale transformations. However, statistical

inference for max-stable processes has until recently been considered problematic,

because of the nonavailability of a direct formula for the likelihood. Instead, recent

research has been concerned with a composite likelihood approach that requires only the

computation of pairwise joint densities. The resulting maximum composite likelihood

estimator (MCLE) has been proved asymptotically normal with a covariance matrix

defined by the well-known information sandwich formula; however, little is known about

its efficiency. This posed a number of problems for this Working Group, including the

development of a threshold-exceedances approach for estimating max-stable processes,

and Bayesian variants on the MCLE approach. From an applications point of view, most

of the motivation comes from the need to characterize the spatial distribution of extreme

meteorological events; two specific applications were the extension of paleoclimate

methods (see report of Working Group 1) to extreme events, and the recently developed

theory of “single-event attribution”, which is concerned with assessing the relative

probabilities of observed extreme events under either anthropogenic or natural causes.

Such calculations underlie statements about the extent to which extreme events such as

Hurricane Katrina or the 2010 Russian heatwave can be said to be “due to” human-

induced climate change.
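For reference, the pairwise composite likelihood and the sandwich covariance mentioned above are commonly written as follows. The notation is ours rather than the Working Group's: y_{k,i} denotes the k-th replicate at site i, f is the bivariate marginal density, D is the number of sites, n the number of independent replicates, and pair weights are omitted for simplicity.

```latex
% Pairwise composite log-likelihood over n independent replicates at D sites
\ell_P(\theta) = \sum_{k=1}^{n} \sum_{1 \le i < j \le D}
    \log f\bigl(y_{k,i}, y_{k,j}; \theta\bigr),
\qquad
\hat{\theta}_{\mathrm{MCLE}} = \arg\max_{\theta} \ell_P(\theta).

% Asymptotic normality with the information-sandwich covariance
\sqrt{n}\,\bigl(\hat{\theta}_{\mathrm{MCLE}} - \theta_0\bigr)
    \xrightarrow{d}
    N\bigl(0,\; H(\theta_0)^{-1} J(\theta_0) H(\theta_0)^{-1}\bigr),

% Sensitivity and variability matrices for a single replicate Y
H(\theta) = -\mathbb{E}\Bigl[\nabla_\theta^2 \sum_{i<j} \log f(Y_i, Y_j; \theta)\Bigr],
\qquad
J(\theta) = \operatorname{Var}\Bigl[\nabla_\theta \sum_{i<j} \log f(Y_i, Y_j; \theta)\Bigr].
```

For a full likelihood H and J coincide and the sandwich collapses to the inverse Fisher information; for a composite likelihood they generally differ, which is why the efficiency question raised above is not trivial.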

This group benefitted from close interactions with the spatial extremes research group at

EPFL (Lausanne, Switzerland); two researchers from EPFL visited the group and others

participated in the weekly webex meetings. Outcomes included a new approach to

threshold-based estimation of max-stable processes, and two contributions to Bayesian

estimation. Ben Shaby used Bayesian versions of the information sandwich approach to

develop valid inferences based on the composite likelihood, while graduate student Rob

Erhardt proposed an implementation of “Approximate Bayesian Computing” which, at

the cost of greatly increased computation time, avoids the problem of computing a

likelihood altogether. In simulations, he has succeeded in showing that the ABC method

improves on the composite likelihood method which, if the result is sustained in future

research, could lead to the ABC approach eventually displacing the composite likelihood

approach as the preferred method for fitting these models. A paper on this topic has been

submitted to Biometrika.
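To illustrate the general idea behind Approximate Bayesian Computing, here is a minimal rejection-sampling sketch in Python. It is not the method of the submitted paper: the max-stable model is replaced by a toy Gaussian mean problem, and the prior, summary statistic, tolerance and all function names are assumptions chosen for brevity.

```python
import numpy as np

def abc_rejection(observed, simulate, prior_sample, summary, n_draws, tol, rng):
    """Basic ABC rejection sampler: keep prior draws whose simulated
    summary statistic lands within `tol` of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)          # draw a parameter from the prior
        fake = simulate(theta, rng)        # simulate data; no likelihood is evaluated
        if abs(summary(fake) - s_obs) <= tol:
            accepted.append(theta)
    return np.array(accepted)

rng = np.random.default_rng(0)
true_mu = 2.0                              # toy "unknown" parameter
observed = rng.normal(true_mu, 1.0, size=50)

posterior = abc_rejection(
    observed,
    simulate=lambda mu, r: r.normal(mu, 1.0, size=50),
    prior_sample=lambda r: r.uniform(-5.0, 5.0),   # assumed flat prior
    summary=np.mean,                               # assumed summary statistic
    n_draws=20000,
    tol=0.05,
    rng=rng,
)
print(len(posterior), posterior.mean())
```

Only a small fraction of the prior draws is accepted, which reflects the trade-off described above: the likelihood is never computed, but many forward simulations are needed.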

Working Group 6 was on Fundamentals of Spatial Modeling, led by Dongchu Sun of

the University of Missouri. This was the most theoretical of the working groups in this

program, concerned with fundamental inference issues. One subgroup was entitled

“Important issues in spatial and temporal models”, and was particularly concerned with

the relationship between traditional geostatistical models, that can be used equally well

with gridded or non-gridded data, and models of the “CAR” type, that require gridded

data. As one example of a new approach, Paul Speckman of the University of Missouri

has developed a theory of “Irregular Second Order Gaussian Markov Random Fields”.

Gaussian Markov random fields are standard for analysis of areal data or data on a grid.

The common models for areal data (e.g., CAR models) are first order. However, it is

possible to construct higher order priors for gridded data. A general method is proposed

for constructing second order priors for general point-referenced data. The priors

approximate Gaussian processes arising as solutions to stochastic differential or partial

differential equations. They can have higher order smoothness properties; they are

constructed to have sparse precision matrices. The new class of priors generalizes higher

order Gaussian Markov random fields to arbitrary (gridded or not) data. The method has

been illustrated with an application to a rainfall data set.

A second subgroup worked on “Effects of spatial confounding”. This was motivated by

the fact that in spatial datasets there may well exist (measured or unmeasured)

confounding variables that also vary spatially; in this situation, it is hard to distinguish

the effect of the covariate from the effect of spatial dependence. This subgroup looked at

the effect of spatial confounding on covariate estimation.

A third subgroup was concerned with “Objective Bayesian and Fiducial Inference for

Spatial and Temporal Models”. This topic covers methods for determining prior

distributions in a Bayesian approach to have desirable frequentist properties. Another

approach is that of Generalized Fiducial Inference, that has been significantly extended in

recent years by Jan Hannig of UNC. A one-day workshop in San Antonio, TX, presented

recent results in both of these fields.

Working Group 7 on Geostatistics, led by Sudipto Banerjee of the University of

Minnesota, was another very large Working Group that split up into several subgroups.

One subgroup was on preferential sampling. This refers to situations where the selection

of sampling sites is not random but could be correlated with the outcome of interest in a

way that could possibly bias the results, e.g. when monitors are intentionally placed in

“hot spot” areas where pollution is expected to be high. The group planned to explore

underlying theoretical connections between preferential sampling and models for missing

data as well as the connections between spatial sampling designs and preferential

sampling. More generally, the group sought to understand the nature of preferential

sampling, modeling issues related to preferential sampling and the effect of preferential

sampling upon model performance and inference. Group member Jim Zidek has

continued the work of this subgroup since the end of the SAMSI program, and plans to

submit four research papers describing various theoretical and applied aspects of the topic.
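The bias that preferential sampling can introduce is easy to see in a small simulation. The sketch below is an illustration only, not the subgroup's model: the "pollution" values are hypothetical, spatial correlation is ignored, and monitors are assumed to be placed with probability proportional to the local level.

```python
import numpy as np

rng = np.random.default_rng(4)
n_sites = 10_000
# Hypothetical pollution levels at candidate sites (no spatial structure).
pollution = rng.lognormal(mean=0.0, sigma=1.0, size=n_sites)

# Design 1: monitors placed completely at random.
random_idx = rng.choice(n_sites, size=200, replace=False)

# Design 2: monitors placed preferentially in "hot spots",
# with selection probability proportional to the pollution level.
weights = pollution / pollution.sum()
preferential_idx = rng.choice(n_sites, size=200, replace=False, p=weights)

field_mean = pollution.mean()
random_mean = pollution[random_idx].mean()
preferential_mean = pollution[preferential_idx].mean()
print(field_mean, random_mean, preferential_mean)
```

Under these assumptions the preferential design overstates the average level considerably, while the random design does not.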

Another subgroup worked on estimation techniques, in particular, devising efficient

inference and precision methods when the size of the dataset is too large to allow exact

implementation of the O(n^3) matrix factorizations that are at the heart of traditional spatial

methods. One approach that the group explored was to combine effective modeling

aspects and inference methods. For instance, predictive process models provide

dimension-reduction while at the same time being amenable to approximate Bayesian

inference using Integrated Nested Laplace Approximation (INLA). Since the predictive

process models introduce no additional parameters over the regular (full) model

formulation, it is relatively straightforward to apply INLA. The speed-up compared with

MCMC is large. A technical report on this work has been produced by Eidsvik, Finley,

Banerjee and Rue.
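The dimension reduction offered by a predictive process can be sketched as follows. This is a minimal illustration under stated assumptions (an exponential covariance, knots taken as a subset of the data locations, hypothetical function names); it does not reproduce the technical report and does not involve INLA.

```python
import numpy as np

def exp_cov(a, b, sigma2=1.0, phi=1.0):
    """Exponential covariance between two sets of 2-D locations."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / phi)

def predictive_process_cov(sites, knots, **kw):
    """Low-rank covariance c(s, S*) C*^{-1} c(S*, s) induced by m knots S*."""
    c_sk = exp_cov(sites, knots, **kw)           # n x m cross-covariance
    c_kk = exp_cov(knots, knots, **kw)           # m x m knot covariance
    return c_sk @ np.linalg.solve(c_kk, c_sk.T)  # only an m x m system is solved

rng = np.random.default_rng(1)
sites = rng.uniform(0, 1, size=(200, 2))
knots = sites[:25]                               # assumed knot choice: first 25 sites

full = exp_cov(sites, sites)                     # full model: n x n, O(n^3) to factorize
approx = predictive_process_cov(sites, knots)    # rank at most m = 25
```

The approximation reuses the parameters of the full covariance (here sigma2 and phi), consistent with the remark above that no additional parameters are introduced. It matches the full covariance exactly at the knots and understates the variance elsewhere.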

Another project by this subgroup was parallel computing in the context of geostatistical

models. The group focused on composite likelihood models, where the structure is such

that parallelization is immediate. The variance of the estimate can be computed from the

Godambe sandwich estimate. Similar ideas, using composite likelihood models and the

sandwich estimate for variance, hold for prediction at unobserved sites too. A paper on

this work has been submitted by Eidsvik, Shaby and Reich.

Another subgroup of the Geostatistics working group focused on numerical model

evaluation. This was primarily concerned with developing Bayesian spatial methods for

calibrating deterministic forecasts to produce probabilistic forecasts. The work split in

two directions: spatial quantile regression and Bayesian spatial model averaging.

Numerical weather forecasts are often calibrated by adjusting the mean, and perhaps the

variance, of the predictive distribution. For non-Gaussian data such as precipitation, this

can lead to underestimation of extremes. To calibrate the entire conditional distribution

of precipitation given a forecast, the group proposed Bayesian spatial quantile regression.

The group also explored statistical methods to calibrate the spatial output generated by an

ensemble of numerical models. They extended previous work using Bayesian model

averaging (BMA) to produce probabilistic weather forecasts. This work fostered a

productive and hopefully long-lasting collaboration among Ingelin Steinsland, Veronica

Berrocal, and Brian Reich. The group has so far submitted two papers for publication.
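In its basic form, a BMA predictive distribution is a weighted mixture of kernels centred on the individual model forecasts. The sketch below assumes Gaussian kernels with a common spread purely for brevity (as noted above, precipitation calls for non-Gaussian treatment), and the forecasts, weights and spread are hypothetical values, not results from the group.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) at y."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def bma_density(y, forecasts, weights, sd):
    """BMA predictive density: weighted mixture of kernels centred on each forecast."""
    return sum(w * normal_pdf(y, f, sd) for f, w in zip(forecasts, weights))

forecasts = [18.0, 20.5, 23.0]   # hypothetical ensemble member forecasts
weights = [0.5, 0.3, 0.2]        # hypothetical model weights, summing to 1
sd = 1.5                         # hypothetical common kernel spread

# Predictive mean is the weighted average of the member forecasts.
bma_mean = sum(w * f for f, w in zip(forecasts, weights))

# Riemann-sum check that the mixture is a proper density on [10, 30].
grid = [10.0 + 0.01 * i for i in range(2001)]
area = sum(bma_density(y, forecasts, weights, sd) for y in grid) * 0.01
print(bma_mean, area)
```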

Working Group 8, on Point Patterns, was led by Alan Gelfand of Duke, with

substantive input from several SAMSI visitors. This group coalesced around three

primary research goals. The first (which yielded the “Mixture” subgroup) focused on

novel ways of introducing covariate information into modeling intensities. Special

settings here included mixture models for intensities, both parametric and Bayesian

nonparametric as well as generalized Neyman-Scott processes with covariate

information at parent level to drive locations and spreads for offspring.

The second goal (which yielded the “Practical” subgroup) was to bring to a more applied

level, spatial point processes with first and second order components (first order is a

usual nonhomogeneous Poisson process form λ(s), second order is a pairwise interaction

form γ(s,s'), inhibition, attraction). Issues here again included the introduction of

covariate information into such settings with interest in a fully model-based

implementation with associated technical questions such as integrability for the resulting

intensity. The third goal (which yielded the “Model diagnostics” subgroup) considered

an under-explored problem for point patterns, the question of model adequacy and model

comparison. For instance, how do we compare two nonhomogeneous point processes?

How do we compare a nonhomogeneous Poisson process with a pairwise interaction

process? What happens if we add dynamics, resulting in space time point processes?
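As a concrete instance of the first-order component, a nonhomogeneous Poisson process with a covariate-driven intensity λ(s) can be simulated by thinning. The sketch below is illustrative only: the log-linear intensity, its coefficients and the use of the x-coordinate as the covariate are assumptions, and no pairwise interaction term is included.

```python
import numpy as np

def simulate_nhpp(intensity, lam_max, rng, width=1.0, height=1.0):
    """Thinning on the rectangle [0, width] x [0, height].
    Requires intensity(x, y) <= lam_max everywhere on the rectangle."""
    n = rng.poisson(lam_max * width * height)            # homogeneous candidates
    xy = rng.uniform([0.0, 0.0], [width, height], size=(n, 2))
    # Keep each candidate with probability intensity / lam_max.
    keep = rng.uniform(size=n) < intensity(xy[:, 0], xy[:, 1]) / lam_max
    return xy[keep]

def intensity(x, y, beta0=5.0, beta1=2.0):
    """Assumed log-linear intensity; the covariate here is simply x."""
    return np.exp(beta0 + beta1 * x)

rng = np.random.default_rng(2)
lam_max = intensity(1.0, 0.0)      # the intensity is increasing in x, so this bounds it
points = simulate_nhpp(intensity, lam_max, rng)
print(len(points), points[:, 0].mean())
```

Simulated patterns of this kind are one way to examine the model adequacy questions listed above, for example by comparing summary statistics of observed and simulated patterns.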

Working Group 9, on Non-stationary and Non-Gaussian Processes, was led by David

Dunson (Duke), Jo Eidsvik (NTNU) and Montse Fuentes (NCSU), with considerable

input from visitors to SAMSI. The objective of the group was to explore non-stationarity

and non-Gaussianity for spatial and spatiotemporal models.

The main work here was split into three subgroups. The spectral domain subgroup was

based on using spectral methods to define new classes of covariance function. The group

proposed new models for nonstationarity by having a spectrum that is space-dependent.

More specifically, the group proposed a nonparametric estimate of the spectral density by

first defining a spatial periodogram estimate for each frequency, and then smoothing over

neighboring frequencies. The current methodology requires gridded data, but it could be

also extended to irregularly spaced data, by using a filtered spline representation across

space. This is work in progress. Open questions include asymptotic properties of the

estimators using fixed domain asymptotics, and extension to space-time processes.
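The two steps described above, a periodogram at each frequency followed by smoothing over neighboring frequencies, can be sketched for gridded data as follows. This is a simplified stationary version with an assumed uniform smoothing window and a white-noise test field; the space-dependent spectrum proposed by the subgroup is not reproduced here.

```python
import numpy as np

def periodogram_2d(field):
    """Raw periodogram of a mean-centred field on a regular grid."""
    n = field.size
    f = np.fft.fft2(field - field.mean())
    return np.abs(f) ** 2 / n

def smooth_neighbors(pgram, half_width=1):
    """Average each ordinate with its neighbours in frequency (with wrap-around)."""
    out = np.zeros_like(pgram)
    count = 0
    for dx in range(-half_width, half_width + 1):
        for dy in range(-half_width, half_width + 1):
            out += np.roll(pgram, (dx, dy), axis=(0, 1))
            count += 1
    return out / count

rng = np.random.default_rng(3)
field = rng.normal(size=(64, 64))   # white noise, whose true spectrum is flat
raw = periodogram_2d(field)
smoothed = smooth_neighbors(raw)
print(raw.var(), smoothed.var())
```

Smoothing leaves the average level of the estimate unchanged but reduces its variability across frequencies, which is the purpose of the second step.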

A second subgroup worked on mixture models. This group built on much work from

Dirichlet processes, stick breaking processes, and other non-parametric models that have

become popular because of large flexibility and computational tractability. Dirichlet

mixture models are non-Gaussian and nonstationary, but without replicates it is difficult to

determine if we have lack of stationarity or lack of normality. However, the Dirichlet

process (DP)-type of mixture models would allow for both. Mixture models are flexible

enough to characterize complex tail behavior, and can be used to model extremes through

a DP copula.

A third subgroup looked at non-stationarity models constructed by adding covariates in

the covariance function. Typically, the covariance structure is regarded as a nuisance that

we do not learn about, except specifying the scale and correlation range in some way.

The mean term, on the other hand, is constructed by sophisticated modeling from

available covariates. For prediction purposes, the covariates are important only through

the predictor, not the prediction variability. However, one can envision better prediction

and more useful prediction distributions by letting the covariance also depend on

covariates. One project that has been studied in the sub-group uses a mixture of Gaussian

processes with different scale and range, but weighted by covariates. The sub-group has

submitted a paper on this topic (Reich, Eidsvik, Guindani, Nail and Schmidt, 2010, A

class of covariate-dependent spatiotemporal covariance functions), applying the resulting

model to daily ozone levels in the South-Eastern US. It is currently under revision.

2. Human Resource Development

SAMSI's impact on human resource development is fully discussed in Sections I.B and

I.C, with impact on diversity highlighted in Section I.H. The individual program reports

also contain significant insight into human resource development.

SAMSI's postdoctoral fellows and associates again in 2009-10 have embraced the

interdisciplinary tenor of SAMSI programs and have engaged with visible enthusiasm in

the activities for graduate and undergraduate students. Most of those completing their

SAMSI fellowships are explicitly committed to continuation of interdisciplinary

collaborative research and/or interdisciplinary research with SAMSI collaborators.

The impact of new technology for remote participation in SAMSI working groups

continues to be steady; essentially every working group is actively using remote access to

working group meetings to include participants located outside the Triangle area, many

located outside the US. In some cases, even the working group leaders are remote.

An unexpected secondary success of incorporation of remote participants is the extension

of the lifetime of the working group. Numerous working groups from previous SAMSI

programs still operate, utilizing the SAMSI technology, even though none of their members are actually

present at SAMSI.

The detailed participant lists for concluded programs provide ample evidence of the

national and international draw of SAMSI activities. SAMSI programs attracted 50

research fellows (visitors from outside the Triangle who played a major role in program

activities) plus 3 new researcher fellows, 12 local faculty fellows, 16 postdoctoral fellows

and 11 graduate fellows. The research workshops associated with the major programs

attracted a total of 540 participants; the undergraduate education and outreach workshops

a total of 130; the graduate probability workshop had 134 participants; the IMSM

graduate research workshop had 51; the planning workshop for Uncertainty

Quantification had 54, and the summer (2010) program in PKPD had 62 participants.

Altogether, this makes over 1,000 individuals who participated in SAMSI in some form

during the year.

Diversity: SAMSI puts considerable emphasis on contributing to the NSF's effort to

broaden the participation from underrepresented groups in the mathematical sciences.

During the past year, we have organized and co-sponsored many diversity related

activities. SAMSI has also developed a web page devoted to our diversity activities. The

page advertises the various program activities related to minority outreach and has links

to other diversity related information outside of SAMSI.

Pierre Gremaud has been serving as SAMSI's representative to the NSF Institutes

Diversity Coordination Committee which was formed in 2006 by Chris Jones (SAMSI)

and Helen Moore (formerly of AIM), and is now chaired by David Auckly (MSRI). The

Institutes Diversity Coordination Committee has been working together to promote

diversity in the Mathematical Sciences at national conferences and through other special

events.

Together with the other NSF mathematical sciences institutes, SAMSI has recently

secured supplemental NSF funding that will allow for the long term planning of an entire

portfolio of activities including the Modern Math workshop series, the Workshop

Celebrating Diversity series, the Minority Professional Development workshop, the

Careers for Minorities workshops, the Careers for Women workshops, the Blackwell

Tapia Conferences and some activities in conjunction with the Association for Women in

Mathematics. SAMSI took part in the Modern Math program at the 2009 SACNAS

National Convention in Dallas, TX, which included a presentation by one of the SAMSI

postdoctoral fellows.

Minority Participation in SAMSI Programs

At the postdoc level, out of 15 appointments associated with the 2009-10 Research

Programs, five were women and one was an underrepresented minority. For the 2010-11

Research programs, of seven postdocs hired, two are women and two are minorities. The

large number of postdocs hired in 2009-10 resulted from the supplemental NSF funding

for that purpose made available to the institutes that year.

SAMSI creates additional possibilities for under-represented groups through its education

and outreach activities. Of the four Education and Outreach workshops in 2009-10 (3

undergraduate, 1 graduate), there were 147 total participants, of whom 59 (40%) were

female and 11 (7%) were members of underrepresented minorities.

Overall Participation in Workshops by Underrepresented Groups

The following chart shows an overall summary of the participation by underrepresented

groups at SAMSI events. The large spike in female and African-American participants in

2007-08 was partly due to the Infinite Possibilities conference, which focused specifically

on African-American women.

[Figure: Participation by underrepresented groups at SAMSI events. Series: % Female, % African-American, % Hispanic, % New Researcher-Students.]

Workshop Evaluations: Detailed evaluations of workshops are given in Appendix F.

Here are the summary graphs indicating the satisfaction of participants.

[Figure: Summary of Science at SAMSI Workshops (2002-April 2010). Percent of responses by year, 2002-03 through 2009-10, for the ratings Poor, Fair, Good, Very Good, Excellent.]

[Figure: Workshop Evaluations Summary 2009-2010. Ratings (Poor, Fair, Good, Very Good, Excellent) for Science, Staff, Facility, Lodging, Transport.]

[Figure: Undergraduate Programs Evaluations Summary 2009-2010. Ratings (Poor, Fair, Good, Very Good, Excellent) for Science, Staff, Facility, Lodging, Transport.]

It is SAMSI's policy always to attract and support the leading scientists, regardless of

nationality; but to otherwise focus resources on domestic participants. The table below

shows the nationality status of the participants who received some funding from SAMSI.

Year         US Citizen or         Foreign National    Foreign National      TOTAL
             Permanent Resident    Residing in US      Not Residing in US

2002-03      209                   87                  36                    332
2003-04      220                   90                  29                    339
2004-05      158                   71                  21                    250
2005-06      217                   101                 37                    355
2006-07      222                   146                 60                    428
2007-08      382                   124                 45                    551
2008-2009    248                   112                 66                    426
2009-2010    190                   107                 45                    342

TOTAL        1,846                 838                 339                   3,023

Percentage of all funded participants:
             61.1%                 27.7%               11.2%

Includes all funded visitors and workshop participants. Total funded = 377

(342 + 35 people with no citizenship information.)
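The column totals and percentages in the table can be re-derived from the yearly rows. The short Python check below uses only the figures printed in the table; the variable names are ours.

```python
# Funded participants by year:
# (US citizen or permanent resident,
#  foreign national residing in US,
#  foreign national not residing in US)
funded = {
    "2002-03": (209, 87, 36),
    "2003-04": (220, 90, 29),
    "2004-05": (158, 71, 21),
    "2005-06": (217, 101, 37),
    "2006-07": (222, 146, 60),
    "2007-08": (382, 124, 45),
    "2008-2009": (248, 112, 66),
    "2009-2010": (190, 107, 45),
}

# Sum each column over all years, then express as a share of the grand total.
totals = [sum(col) for col in zip(*funded.values())]
grand_total = sum(totals)
shares = [round(100 * t / grand_total, 1) for t in totals]

print(totals, grand_total, shares)
```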

Broadening the DMS research impact: SAMSI's national impact also depends on

Institutional Diversity and the inclusion of participants whose home institutions are not

already heavily supported by NSF funding through DMS. Such inclusion develops the

national research base by significantly increasing the number of individuals that can

engage in cutting edge research. The SAMSI record in this regard during 2009-10 is

excellent, as shown in the following table (for both funded participants and all

participants). The 'Other' category primarily includes individuals from other disciplines,

governmental agencies or laboratories, and industry.

2009-2010 SAMSI Participation

Funded Participants Home Institution by DMS Funding Level

                     Top 50 DMS Funded    51-200 DMS Funded    Other
# of Institutions    39                   39                   70
# of People          155                  96                   126
% People             41.1%                25.5%                33.4%

All Participants

                     Top 50 DMS Funded    51-200 DMS Funded    Other
# of Institutions    44                   52                   135
# of People          259                  174                  183
% People             42.10%               28.25%               29.71%

3. Education and Outreach

The impact of SAMSI courses and various components of the SAMSI Education and

Outreach program is documented in Section I.E, Part 4, and in various program reports.

We summarize here specific new initiatives and specific highlights of the program.

(i) Two outreach workshops were held to expose undergraduate students from

programs around the country to topics and research directions associated with

the SAMSI Programs on Space-time Analysis for Environmental Mapping,

Epidemiology and Climate Change and Stochastic Dynamics. One goal of

these workshops was to illustrate the application and synergy between

mathematics and statistics which goes far beyond that which students have

seen in coursework. The overall objective was to broaden the perspective of

students with regard to both future graduate studies and career choices.

(ii) The one-week SAMSI Workshop for Undergraduates encompassed three distinctive

components.

All tutorials and sessions were presented by SAMSI graduate students and

postdocs under close supervision of directorate members, members of the

Education and Outreach Committee, and local faculty.

The workshop provided students with an intensive introduction to the synergy

between applied mathematics and statistics in the context of physical applications.

During one of the sessions, the students were introduced to a variety of

experiments and each team collected their own physical data.

(iii) The overall goals of the ten-day Industrial Mathematical and Statistical Modeling

Workshop for Graduate Students were twofold:

Expose mathematics and statistics students to current research problems from

government laboratories and industry which have deterministic and stochastic

components;

Expose students to a team approach to problem solving.

For the 2009 workshop, research problems were presented by scientists from the National

Center for Atmospheric Research, the Environmental Protection Agency, the UNC

Institute for Pharmacogenomics, Sandia National Laboratories, MIT Lincoln Laboratory,

and the Republic Mortgage Insurance Co. Each team gave a 30-minute oral presentation

summarizing its results on the final day of the workshop, and the written reports were

compiled as SAMSI Technical Report 2010-02, which can be obtained at

http://www.samsi.info/communications/technical-reports

(iv) The Kenan Fellows Program pairs mentors from the SAMSI community with K-12

public school teachers who have been selected to be Kenan Fellows. The program’s goals

include promoting teacher leadership, developing and disseminating exciting new

curriculum in science, technology, and math education, and addressing the problem of

teacher retention in public schools.

No new Kenan Fellow was appointed in 2010 as a suitable match between candidates and

available projects could not be found. As of summer 2011, Lori Craven was appointed to

work with Richard Smith on Statistical Aspects of Climate Change.

E. Evaluation by the SAMSI Governing Board - 2010

(George Casella, Michael Crimmins, Donald Estep, James Landwehr, John

Simon, Daniel Solomon – Chair)

The Governing Board provides broad oversight for the Institute’s administration,

finances, and evaluation, and for relationships among the partnering institutions. This

year’s evaluation follows as responses to three broad questions:

1) What are some outcomes of the synthesis of applied mathematics and statistics?

Consistent with its founding vision, SAMSI continues to foster interaction between

applied mathematics and statistics through the creation of programs focused on topics

that involve both disciplines. Working groups established under these programs build

teams of researchers consisting of applied mathematicians and statisticians as well as

those from other areas of the mathematical sciences. The results of these efforts lie not

only in the production of papers and reports, but also in the continued interaction among

members of the teams after the formal program is completed, and, most importantly, in

the culture of multidisciplinary interaction it has established.

Some of the interactions between applied mathematics, statistics and other disciplines for

which primary activity ended or started in the past year, and that we noted in the annual

report, are listed below.

The most striking development was the formation of a joint working group

between the two year-long research programs. Working Group 3 of the Space-

Time program on Interaction of Deterministic and Stochastic Models was set up

in conjunction with several members of the Stochastic Dynamics program. Many

fields of research rely on complex computer models that are essentially

simulations of deterministic dynamics, but because of their complexity, behave in

many respects like stochastic models. There is by now a well-established

literature that treats static snapshots of such models as realizations of spatial

processes, but for simulations in time as well as space, more understanding is

needed of the dynamics of the process. One particular application was the use of

simulation methods to study heterogeneous multiscale methods (HMMs). This led

to a joint paper between a SAMSI postdoc in statistics, Emily Kang, and an

applied math faculty member, John Harlim.

Another compelling example of collaboration was among two applied

mathematicians, a probabilist and a statistician in the molecular motors subgroup

within the Stochastic Dynamics Working Group 5. Avanti Athreya, Peter Kramer,

John Fricks and Scott McKinley developed a systematic stochastic modeling

framework for intracellular transport connecting three physical length scales.

They are in the process of writing up their work for publication and have

submitted two joint grant proposals since the program ended.

Another fertile topic for interactions between statisticians and applied

mathematicians is data assimilation, which refers to the incorporation of

observational data into a dynamical model. A working group within the Stochastic

Dynamics program looked at two specific models, one motivated by neurobiology

and the other by ecology, where the problem is essentially to estimate unknown

parameters within a dynamical experiment. The statistical concept of Fisher

information is useful for determining how well parameters are estimated within a

given experimental set-up, and can therefore be used as a basis for optimal

experimental design. Key participants included the probabilist/statistician Cindy

Greenwood as well as applied mathematicians John Harlim and Peter Kramer, and

several papers are in preparation.

2) Are the impact and national recognition of SAMSI on science and human resources

commensurate with the scale of SAMSI?

Section D of the Executive Summary describes the developments in research, human

resource development, and education. As above, consistent with its founding vision

SAMSI continues to have a significant impact on disciplinary sciences. We highlight a

few areas below. SAMSI programs have also been influencing the research careers of

program participants, helping to refocus research directions for some senior researchers

and providing formative experiences for post-docs and other junior scientists.

Two research programs toward new modeling approaches in microswimming

(Peter Kramer) and swarming (Amarjit Budhiraja, Oliver Diaz and Xin Liu) that

were nucleated by presentations at the workshop on Self-Organization and Multi-

Scale Mathematical Modeling of Active Biological Systems in October, 2009.

Several ongoing collaborations between SAMSI postdocs on topics in stochastic

neuronal dynamics and epidemic spread on networks.

A project initiated and driven by two graduate students (Katherine Newhall and

Amanda Traud) participating in the working group on Network Dynamics, now

being prepared for submission for publication.

Working Group 1 within the Space-Time program was on Paleoclimatology, the

science of reconstructing historical temperatures from proxies such as tree rings

and ice cores. This working group involved a SAMSI postdoc (Martin Tingley)

whose training was in atmospheric science but whose main current research is in

the applications of Bayesian statistics, a SAMSI visitor (and former postdoc) from

Stanford (Bala Rajaratnam), three other visiting statisticians and another SAMSI

postdoc. At the end of the year they submitted a substantial paper to the Journal

of Climate, proposing a Bayesian hierarchical approach for the reconstruction of

the space-time field of temperatures.

Working Group 2 of the Space-Time program was the main group working on

epidemiology, with particular emphasis on the use of spatial statistical models for

exposure error. Another theme was the effect of climate change on human health,

on which topic three of the group members have submitted an NIH grant proposal

under NIH’s climate change initiative.

Several other groups within the Space-Time program interacted with climate

scientists to utilize the NARCCAP dataset, a large database of regional climate

models being created at the National Center for Atmospheric Research. Noel

Cressie and co-authors used this dataset to study how the spatial extent of a

rainfall field evolves over time; Bruno Sansó and colleagues took a different tack,

using the NARCCAP data to extend their previous work on inference from

ensembles of climate models.

The lists of refereed publications associated with SAMSI programs (see Section I.G. of

the full report) provide another measure of the impact on the mathematical and

disciplinary sciences. There were 40 accepted publications over the year. There were an

additional 43 papers submitted and 48 papers in preparation.

Earlier evaluations have noted some of the long-term impacts of SAMSI programs, e.g.

the establishment of research and education centers at the interface of applied

mathematics, statistics and disciplinary science. One such example is the Center for

Quantitative Sciences in Biomedicine (http://www.ncsu.edu/cqsb), whose director and

co-director are a statistician and an applied mathematician, respectively. Evidence that such

impacts are maturing is that CQSB (jointly with two other institutions) recently won a

$12.5M grant from the National Cancer Institute.

As another example of long-term impacts, in 2010 SAMSI hosted a National Town Hall

meeting for members of SIAM, ASA, and USACM (United States Association for

Computational Mechanics) about the joint establishment of Interest Groups in

Uncertainty Quantification in SIAM and ASA. Both societies subsequently established

these interest groups and USACM followed suit shortly after. As a consequence, SIAM

and ASA are cooperating to organize the SIAM/ASA Conference on Uncertainty

Quantification (to be held every two years). The first Conference will be held in April of

2012. The organizing committee and co-chairs are drawn evenly from ASA and SIAM

(as well as USACM). In addition, Jim Berger, Don Estep and Max Gunzburger

submitted a joint proposal to SIAM and ASA to create a Journal of Uncertainty

Quantification in Models that is being evaluated.

These activities developed as a direct result of SAMSI organizing the National Town

Hall meeting that brought together the statistics, mathematics and computing

communities around this important topic.

SAMSI’s strong commitment to the development of human resources in the mathematical

sciences is summarized in Section D.2 of the Executive Summary and detailed in

Sections I.B, I.C and I.H of the full report. Videoconference and WebEx technologies

have now been adopted by all working groups, some with an international reach. Indeed,

in some working groups, the majority of participants engage by these means, and some

working groups continue to be active after the end of the formal program.

Participation by women and other underrepresented groups remains high although a bit

lower than last year. In particular for 2009-2010, the participants were 31% female, 2%

African American and 4% Hispanic. Participation by new researchers and students this

year is at 52%. The SAMSI website now offers information about its diversity programs

at http://www.samsi.info/about/diversity-statement .

The inclusion in SAMSI programs of a substantial number of participants from

institutions not heavily supported by NSF-DMS funding is detailed in the full report.

Fewer than half of funded participants in SAMSI programs are from top 50 DMS funded

institutions.

The detailed participant lists for concluded programs provide ample evidence of the

national and international draw of SAMSI activities. SAMSI programs attracted 50

research fellows (non-local visitors who played a major role in program activities) plus 3

new researcher fellows, 12 local faculty fellows, 16 postdoctoral fellows, 11 graduate

fellows and an overall total of over 1,000 in all types of SAMSI activities.

3) Is the Directorate meeting the needs of an evolving SAMSI?

Founding Director James Berger stepped down in 2010 after eight years of distinguished

leadership that developed the effective SAMSI “model” that we have today. Following a

national search led by a broadly-based nomination committee, Richard Smith was named

Director effective July 1, 2010.

The directorate model continues to serve SAMSI very well, and transitions in the

directorate have gone smoothly. Pierre Gremaud became full-time Deputy Director on

January 1, 2010 and has taken responsibility for oversight of the program operations. He

is scheduled to remain in the position through the close of the current award period. Nell

Sedransk continued to serve as Associate Director from NISS while NISS Director Alan

Karr continued to work closely with SAMSI.

The Governing Board itself continues to operate in the expanded structure implemented

earlier that now includes two representatives from beyond the four SAMSI partner

institutions who are selected by the American Statistical Association (Casella) and the

Society for Industrial and Applied Mathematics (Estep). The UNC-Chapel Hill

representative on the Governing Board became Michael Crimmins, who took over from

Bruce Carney as Senior Associate Dean for Sciences at UNC. As a result, the Governing

Board now includes two domain scientists from chemistry (Crimmins and Simon).

The Governing Board Chair and the SAMSI Director continue to have a biweekly

telephone conference at which administrative and personnel matters are regularly

discussed and issues addressed where they have arisen. There is also excellent

cooperation among the partner universities and NISS to ensure that obligations are met

and that SAMSI continues to flourish. Most notable during the period of this report are

the successful three-university financial arrangements made in connection with the

change in home institution of the Director.

Table of Contents

0. Executive Summary

I. Annual Progress Report
   A. Program Personnel
      1. List of Programs and Organizers
      2. Program Core Participants
   B. Postdoctoral Fellows and Associates
      1. Overview
      2. Postdoc Mentoring Plan
      3. List of Postdocs 2009-10
      4. 2009-10 Postdoc Activities
      5. Postdoc Evaluations
   C. Graduate Student Participation
   D. Consulted Individuals
   E. Program Activities
      1. Stochastic Dynamics
      2. Space-Time Analysis for Environmental Mapping, Epidemiology, and Climate Change
      3. Semiparametric Bayesian Inference: Applications in Pharmacokinetics and Pharmacodynamics
      4. Education and Outreach Program
   F. Industrial and Governmental Participation
   G. Publications and Technical Reports
   H. Diversity Efforts
   I. External Support and Affiliates
   J. Advisory Committees
   K. Income and Expenditures
   L. Report of the Mathematical Sciences Institutes Directors Meeting

II. Special Report: Program Plan
   A. Programs for 2010-2011
      1. Analysis of Object Data
      2. Complex Networks
   B. Scientific Themes for Later Years
   C. Budget for 2009-2010
   D. Financial Plan for 2009-2010

Appendix
   A. Final Program Report: Sequential Monte Carlo Methods
   B. Final Program Report: Algebraic Methods in Systems Biology
   C. Final Program Report: Summer Program in Meta-Analysis
   D. Final Program Report: Summer Program in Psychometrics
   E. Workshop Participant List
   F. Workshop Programs and Abstracts
   G. Workshop Evaluations

I. Annual Progress Report

Year 8 programs were PKPD (Summer 2010); Stochastic Dynamics; Space-Time Analysis for

Environmental Mapping, Epidemiology and Climate Change; their final reports are in Appendices A

and B, respectively.

A. Program Personnel

1. Program and Activity Organizers

Program Organizers

Program                         Name                                  Affiliation                    Field
PKPD                            Peter Mueller                         M.D. Anderson Cancer Center    Biostats
(Summer 2010)                   Gary Rosner                           M.D. Anderson Cancer Center    Biostats

Stochastic Dynamics             Rick Durrett                          Cornell University             Statistics
(2009-10 SAMSI Program)         Alejandro Garcia                      San Jose State                 Physics
                                Cindy Greenwood                       Arizona State U                Math and Stats
                                Alan Karr                             NISS                           Statistics
                                Jonathan Mattingly                    Duke U                         Mathematics
                                Michael Minion                        SAMSI/UNC                      Mathematics
                                Peter Mucha                           UNC                            Mathematics
                                Hongyun Wang                          UC Santa Cruz                  Mathematics

Space-Time Analysis for         James Berger                          SAMSI                          Statistics
Environmental Mapping,          Noel Cressie                          Ohio State U                   Statistics
Epidemiology and Climate        Peter Diggle                          Lancaster U                    Biostatistics
Change                          Montse Fuentes                        NCSU                           Statistics
(2009-10 SAMSI Program)         Alan Gelfand                          Duke U                         Statistics
                                Peter Guttorp                         U Washington                   Statistics
                                Jun Liu                               Harvard U                      Statistics
                                Jesper Møller                         Aalborg U                      Mathematics
                                Richard Smith                         UNC                            Statistics
                                Michael Stein                         U Chicago                      Statistics
                                Dongchu Sun                           U Missouri                     Statistics
                                Jim Zidek                             U British Columbia             Statistics

Education & Outreach Program    Negash Begashaw                       Benedict College               Mathematical Sciences
                                Carlos Castillo-Chavez (ex officio)   Arizona State U                Mathematics
                                Karen Chiswell                        NCSU                           Statistics
                                Cammey Cole                           Meredith College               Mathematics & CS
                                Anne Fernando                         Norfolk State U                Mathematics
                                Pierre Gremaud                        NCSU/SAMSI                     Mathematics
                                Leona Harris                          College of NJ                  Mathematics
                                Marian Hukle                          U of Kansas                    Biological Sciences
                                Masilamani Sambandham                 Morehouse College              Mathematics

Activity Organizers

2008-09 Programs

Program Year

Activity Name(s)

Sequential Monte Carlo Methods

2008-09 Transition Workshop – November 9-10, 2009

Julien Cornebise (SAMSI), Arnaud Doucet (The Institute of Statistical Mathematics, Japan, and U of

British Columbia), Simon Godsill (Cambridge U), and Mike West (Duke U)

2010 Summer Program

2010 Program on Semiparametric Bayesian Inference in PKPD Analysis – July 12-23, 2010

Gary Rosner and Peter Mueller (M.D. Anderson Cancer Center)

2009-10 Programs

Program Year

Activity Name(s)

Space-time Analysis for Environmental Mapping, Epidemiology and Climate Change

2009-10 Opening Workshop -- September 13-16, 2009

Noel Cressie (Ohio State U), Peter Diggle (Lancaster U), Montse Fuentes (NCSU), Alan Gelfand (Duke U), Peter Guttorp (U of Washington), Richard Smith (UNC), Michael Stein (U of Chicago), Dongchu Sun (U of Missouri), Jim Zidek (Chair, UBC)

2009-10 GEOMED: Spatial Epidemiology 2009 Workshop (Charleston) – November 14-16, 2009

Andrew Lawson (MUSC), Lance Waller (Emory), Peter Rogerson (Buffalo), Robert Haining

(Cambridge), Sudipto Banerjee (Minnesota: SAMSI), Linda Young (Florida: SAMSI)

2009-10 Climate Change – February 17-19, 2010

Mark Berliner (Ohio State), Jim Zidek (British Columbia), Richard Smith (UNC), Bruno Sansó (UC Santa Cruz), Tapio Schneider (California Institute of Technology), Claudia Tebaldi (Climate Central), Ilya

Timofeyev (Houston).

2009-10 One-day workshop on Objective Bayesian for Spatial and Temporal Models (San Antonio) – March 20-21, 2010

James Berger (SAMSI)

2009-10 Statistical Aspects of Environmental Risk Workshop – April 7-9, 2010

Richard Smith (UNC), Montse Fuentes (NCSU), Sudipto Banerjee (U of Minnesota), Scott Holan (U of

Missouri), Marian Scott (Glasgow U), Feridun Turkman (U of Lisbon), Clive Anderson (Sheffield U)

Stochastic Dynamics

2009-10 Opening Workshop – August 30-September 2, 2009

Hongyun Wang (UC Santa Cruz), Alejandro Garcia (San Jose State), Cindy Greenwood (ASU), Alan Karr (NISS), Jonathan Mattingly (Duke), Peter Mucha (UNC), Michael Minion (SAMSI), Rick Durrett (Cornell)

2009-10 Self-Organization and Multi-Scale Mathematical Modeling of Active Biological Systems – October 26-28, 2009

Igor Aronson (Argonne National Laboratory), Leonid Berlyand (Penn State U); Directorate Liaison: Pierre

Gremaud (SAMSI)

2009-10 Theory and Qualitative Behavior of Stochastic Dynamics Workshop – February 8-10, 2010

Michael Cranston (U of California-Irvine), Yuri Bakhtin (Georgia Tech), Luc Rey-Bellet (U of Massachusetts),

Jonathan Mattingly (Duke U)

2009-10 Workshop on Molecular Motors, Neuron Models, and Epidemics on Networks – April 15-17, 2010

Priscilla Greenwood (ASU), John Fricks (Penn State), Lea Popovic (Concordia), Scott Schmidler (Duke)

Education and Outreach

2009-10 Two-Day Undergraduate Workshop -- October 30-31, 2009

Pierre Gremaud (NCSU), Murali Haran (Pennsylvania State U)

2009-10 Two-Day Undergraduate Workshop -- February 26-27, 2010

Pierre Gremaud (NCSU), Scott Schmidler (Duke U)

2009-10 4th Annual Graduate Student Probability Conference – April 30-May 2, 2010

Jessi Cisewski, Tiffany Kolba, Xin Liu, Dominik Reinhold, and Rachel Thomas under the supervision

of Prof. Amarjit Budhiraja and Prof. Jonathan Mattingly (Duke U)

2009-10 SAMSI/CRSC Interdisciplinary Workshop for Undergraduates -- May 17-21, 2010

Pierre Gremaud (SAMSI), Cammey Cole Manning (Meredith)

2011-12 Programs

2011-12 Uncertainty Quantification Electronic Townhall Meeting – April 6, 2010

Amy Braverman (JPL, Cal. Tech and UCLA), Don Estep (Colorado State), Roger Ghanem (USC), David Higdon (LANL), Christine Shoemaker (Cornell), Jeff Wu (Georgia Tech), James Nolan (Duke), Jan Hannig (UNC), Pierre Gremaud (SAMSI), Max Gunzburger (Florida State)

2. Program Core Participants and Targeted Experts

For each of the major programs, the following tables present the key participants.

The participants are categorized and coded as follows:

DL    Distinguished Lecturer             Program-affiliated speaker
FF    Faculty Fellow                     Teaching release from local university
FA    Faculty Associate                  Program-affiliated local faculty for which no release time is allocated
GF    Graduate Student Fellow            Student from local university, assigned to a specific program and paid a stipend
GA    Graduate Student Associate         Program-affiliated local student with no stipend
VGF   Visiting Graduate Fellow           Non-local student, paid only expenses
NRV   New Researcher Visitor             Non-local researchers (holding PhD 5 years or less) brought in for short intervals for interaction with program participants
NRC   New Researcher Core Visitor        Non-local researchers (including fellows) who play a major role in program activities
PF    Postdoctoral Fellow                Program-affiliated individual, paid a stipend in association with a local university
PA    Postdoctoral Associate             Program-affiliated individual with appointment shorter than 1 year
SV    Senior Visitor                     Researcher (holding PhD 6 or more years) brought in for short intervals for interaction with program participants
RF    Research Fellow                    Non-local researchers who play a major role in program activities
WG    Working Group Participant          Local participants of SAMSI working groups (not fellows, visitors or persons otherwise designated)
WGR   Remote Working Group Participant   Remote participants of SAMSI working groups (not otherwise designated)

Grey is used to indicate funds that are provided by partner university cost sharing.

Note: For visitors who have yet to visit SAMSI or who are still at SAMSI, the dollar amounts in the tables

below are the expense allotments for the visitors.

39

Last Name First Name Gender Affiliation Department Status

Assuncao Renato Male Universidade Federal de Minas Gerais Statistics RF

Aune Erland Male NTNU Statistics GF

Banerjee Sudipto Male U Minnesota Statistics RF

Bayarri Susie Female U Valencia Statistics RF

Bernardo Jose Male U Valencia Statistics RF

Berrett Candace Female Ohio State U Statistics GF

Berrocal Veronica Female SAMSI Statistics PF

Bhat K. Sham Male Penn State Statistics GF

Calder Kate Female Ohio State Statistics RF

Chang Howard Male SAMSI Mathematics PF

Chen Lisha Female Yale University Statistics RF

Conesa David Male U Valencia Statistics RF

Cressie Noel Male Ohio State Statistics RF

Das Sourish Male SAMSI Statistics PF

Eidsvik Jo Male NTNU Statistics RF

Ferreira Marco Male U Missouri Statistics RF

Fine Jason Male UNC Statistics FF

Finley Andrew Male Michigan State Physics RF

Fortes Annabel Female U Valencia Statistics GF

Franca Villoria Maria Female U Glasgow Statistics GF

Fuentes Montse Female NCSU Statistics FF

Gamerman Dani Male Math Institute Math RF

Gelfand Alan Male Duke U Statistics FF

Gomez-Rubio Virgilio Male Universidad de Castilla-La Mancha Math RF

Green Peter Male Bristol U Statistics RF

Guindani Michele Male U New Mexico Statistics RF

Haran Murali Male Penn State Statistics NRC

Harlim John Male NYU Math FF

He Chong Female U Missouri Statistics RF

Held Leo Male U Zurich Statistics RF

Herring Amy Female UNC Stats FF

Holan Scott Male U Missouri Stats RF

Ickstadt Katja Female Technische Universität Darmstadt Math RF

Jackson Monica Female American U Statistics RF

Kang Emily Female SAMSI Statistics PF

Kuensch Hans Male ETHZ Math RF

Lee Jaeyong Male Seoul Nat’l U Statistics RF

Levins Mihails Male Purdue Statistics RF

Li Linyuan Male U New Hampshire Statistics NRV

Liu Yajun Female U Missouri Statistics GF

Lopez Quilez Antonio Male U Valencia Statistics RF

Mannshardt-Shamseldin Elizabeth Female SAMSI Statistics PF

Manolopoulou Ioanna Female Duke University Statistics PA

Martinelli Gabriele Male Norwegian U Math GF

Martínez Beneíto Miguel Ángel Male Centro Superior de Investigación en Salud Pública Statistics RF

Matteson David Male Duke U Stats NRV

Micheas Sakis Male U Missouri Statistics RF

Nicolis Orietta Female University of Bergamo Math RF

Paulo Rui Male Technical U Lisbon Stats RF

Rajaratnam Bala Male Stanford Mathematics RF

Ratmann Oliver Male Duke U Stats PA

Reich Brian Male NCSU Statistics FF

Rios-Doria Daniel Male Arizona State University Mathematics and Statistics GF

Rue Håvard Male NTNU Statistics RF

Salazar Esther Female SAMSI Stats PF

Sanso Bruno Male UCSC Statistics RF

Schmidt Alexandra Female Universidade federal do Rio de Janeiro Statistics RF

Scott Marian Female U Glasgow Stats RF

Shaby Benjamin Male SAMSI Stats PF

Smith Richard Male UNC Statistics FF

Steinsland Ingelin Female Norwegian University of Science and Technology Statistics RF

Sun Dongchu Male U Missouri Statistics RF

Tingley Martin Male SAMSI Math PF

Yasamin Saieed Male Stanford University Statistics RF

Young Linda Female University of Florida Statistics RF

Zhang Jun Male SAMSI Statistics PF

Zidek Jim Male UBC Statistics RF

Last Name First Name Gender Affiliation Department Status

Athreya Avanti F Duke University Statistics PF

Bessaih Hakima F University of Wyoming Mathematics RF

Budhiraja Amarjit M UNC-CH Statistics FF

Chertock Alina F NC State University Mathematics FF

Cornebise Julien M Duke University Statistics PF

Deng Shaozhong M UNC-Charlotte Mathematics and Statistics RF

Deville Lee M UIUC Mathematics RF

deSilva Nuwan Indurwage M Rensselaer Polytechnic Institute Mathematics GF

Diaz Espinoza Oliver M Duke University Statistics PF

Fricks John M Pennsylvania State University Statistics RF

Greenwood Cindy F Arizona State University Mathematics and Statistics RF

Induruwage Nuwan M Rensselaer Polytechnic Institute Stat GF

Kramer Peter M Rensselaer Polytechnic Institute Mathematics RF

Lin Kevin M University of Arizona Mathematics RF

Mattingly Jonathan M Duke University Mathematics FF

McKinley Scott M Duke University Mathematics PA

McSweeney John M UNC-Chapel Hill Mathematics PF

Mitran Sorin M UNC-CH Mathematics FF

Mora Carlos M Univ. de Concepcion Mathematical Engineering RF

Mueller Carl M University of Rochester Mathematics RF

Newhall Katie F Rensselaer Polytechnic Institute Mathematics GF

Ng H. K. Tony M Southern Methodist University Statistics RF

Owen Megan F NC State University Mathematics PA

Pego Robert M Carnegie Mellon Mathematical Sciences RF

Popovic Lea F Concordia University Mathematics and Statistics RF

Rogers Bruce M Duke University Mathematics PF

Schmidler Scott M Duke University Statistics FF

Srinivasan Ravi M Duke University Mathematics PF

Sun Yi M NC State University Mathematics PA

Wang Xueying F NC State University Mathematics PF

Last Name First Name Gender Affiliation Department Status

Armagan Artin Male Duke University STAT NRG

Banks David Male Duke University STAT FP

Barbare Ralph Male Gad Consulting Services STAT NRG

Bayarri M.J. Female University of Valencia STAT FP

Bhattacharya Anirban Male Duke University STAT NRG

Bonate Peter Male GlaxoSmithKline OTHR FP

Boonstra Philip Male University of Michigan Department of Biostatistics STAT NRG

Burgette Lane Male Duke University STAT NRG

Chen Shuguang Male GlaxoSmithKline STAT FP

Choi Leena Female Vanderbilt University STAT FP

Cornebise Julien Male University of British Columbia STAT FP

Cui Kai Male Duke University STAT NRG

Dahl David Male Texas A&M University STAT FP

Davidian Marie Female North Carolina State University STAT FP

De Iorio Maria Female Imperial College STAT FP

Ding Lili Female Cincinnati Children's Hospital Medical Center STAT NRG

Duncan Thomas Male University of Southern California BIOSTAT FP

Dunson David Male Duke University STAT FP

Dutta Ritabrata Male Purdue University STAT NRG

Escobar Michael Male University of Toronto STAT FP

Fronczyk Kassie Female University of California STAT NRG

Gan Kevin Male GlaxoSmithKline STAT NRG

Ghosh Jayanta Male Purdue University STAT FP

Ghosh Joyee Female UNC Chapel Hill STAT NRG

Ghosh Kaushik Male University of Nevada - Las Vegas STAT FP

Guha Subharup Male University of Missouri STAT FP

Guindani Michele Male University of New Mexico STAT NRG

Johnson Brendan Male GlaxoSmithKline LIFE FP

Johnson Wesley Male University of California – Irvine STAT FP

Korman Alexander Male Duke University STAT NRG

Kottas Athanasios Male University of California, Santa Cruz STAT FP

Kundu Suprateek Male UNC STAT NRG

Lee Ju Hee Female Ohio State University STAT NRG

Lewandowski Andrew Male Purdue University STAT NRG

Li Lang Male Indiana University MED / MOLEC GENETICS FP

MacEachern Steven Male Ohio State University STAT FP

Mueller Peter Male MD Anderson Cancer Center STAT FP

Oluyede Broderick Male Georgia Southern University STAT FP

Omolo Bernard Male UNC Chapel Hill STAT FP

Pang Herbert Male Duke University STAT NRG

Pararai Mavis Female Indiana University of Pennsylvania MATH FP

Pati Debdeep Male Duke University STAT NRG

Quispe Vargas Walter Male University of Puerto Rico STAT NRG

Rizwan Ahsan Male UNC LIFE NRG

Rodriguez Abel Male University of California STAT NRG

Rosner Gary Male Johns Hopkins University STAT FP

Roura Jaime Male University of Puerto Rico, Rio Piedras STAT NRG

Tadesse Mahlet Female Georgetown University STAT FP

Telesca Donatello Male UCLA STAT NRG

Tokdar Surya Male Duke University STAT NRG

Tomasetti Cristian Male University of Maryland MATH NRG

Torres David Male University of Puerto Rico NRG

Vicini Paolo Male Pfizer Global Research and Development LIFE FP

Wahed Abdus Male University of Pittsburgh STAT FP

Wang Yanpin Female University of Florida STAT NRG

Wolpert Robert Male Duke University STAT FP

Wu Meihua University of Michigan BIOSTAT NRG

Xiaojing Wang Female Duke University STAT NRG

Xing Chuanhua Female Duke University STAT NRG

Xu Xinyi Female Ohio State University STAT FP

Yang Mingan Saint Louis University STAT NRG

Yao Ping Northern Illinois University STAT FP

Yu Li Female Medimmune STAT NRG

Zhu Hongxiao Female

University of Texas MD Anderson Cancer Center STAT NRG

Zou Yuanshu Female University of Cincinnati STAT NRG

B. Postdocs

1. Overview

SAMSI had an increased number of postdocs this year, thanks to additional funds received from

DMS in 2008 to compensate for the severe shortage in new PhD hiring opportunities. While this

additional funding was very welcome, the additional numbers overwhelmed the available mentoring

resources, as demonstrated by detailed reports in Sections B.4 and B.5. The Directorate is therefore in

the process of redefining the objectives and mentoring of the postdoc program. In this section, we

present our goals for the program, followed by the revised mentoring plan in Section B.2, and then the

detailed postdoc reports.

Our goals for postdocs are to:

(i) Facilitate a transition from being a graduate student who works on problems suggested by

their adviser to an independent researcher who can formulate their own problems.

(ii) Achieve a broader perspective on the use of mathematics in applications.

(iii) Improve their presentations on research through participation in the postdoc seminar and

the undergraduate workshops, giving talks at local universities, and at regional and national meetings.

(iv) Enhance their teaching skills by teaching in their second year.

(v) Experience the pleasures of service to the community and enhance their communication

skills through outreach activities.

Program participation is the primary activity, and the one that attracts outstanding postdocs to

SAMSI. Postdocs typically participate intensively in two Working Groups in their “home program,”

working with national and international leaders. One of these groups is their main research focus, while

the other may be a secondary interest or a topic about which they wish to learn. For one Working Group,

each postdoc is responsible for maintaining the web page and managing the use of WebEx.

Postdocs attend all courses associated with their program, which provides the background needed to

carry out their research and broadens their scientific horizons. They are required to attend all

workshops conducted by their home program, and encouraged to attend the opening workshops of

other concurrent programs, as well as events at NISS and the university partners.

Postdocs also actively engage in SAMSI’s Education & Outreach activities. They are presenters

and mentors in the week-long Undergraduate Modeling Workshop and the Two-Day Undergraduate

Workshop associated with their home program. The benefits to the postdocs are improved

communication skills and experience serving the mathematics and statistics community. Some

postdocs function as team mentors for the IMSM workshops for graduate students, and on occasion, as

team leaders.

Community engagement is the third key postdoc activity. From the day of their arrival at SAMSI,

postdocs join a community of SAMSI postdocs, NISS postdocs and other early career researchers in the

NISS-SAMSI complex. In some ways, the informal relationships are the most potent component of the

community. There are also more formal activities, notably a weekly postdoc seminar. A typical

seminar has two parts, one focusing on general postdoc issues and upcoming events, and the second

comprising a 30-minute presentation by one of the postdocs on his or her research.

2. Postdoc Mentoring Plan

Two mentors. Each SAMSI postdoc has two mentors, a Scientific Mentor and an Administrative

Mentor. The latter is a member of the SAMSI Directorate and is the same for both years. The Scientific

Mentor in the first year is one of the program organizers or a long-term visitor. In many cases the first

year Scientific Mentor is a visitor from outside the Research Triangle, while in the second year the

Scientific Mentor is a professor at the postdoc's associated university with closely related interests. As

the names indicate, the Scientific Mentor advises the postdoc on his or her research and in the second

year on teaching, while the Administrative Mentor is concerned with other aspects of professional

development and supervises the postdoc's performance of duties at SAMSI. For example, while the

Administrative Mentor will explain what happens in the job search process and when the steps occur,

the Scientific Mentor is in a better position to give advice about specific positions that the postdoc may

be considering.

Postdoc Orientation. Before the scientific program begins, the postdocs (and graduate students

associated with the current programs) come to SAMSI for an afternoon of orientation activities. Part of

this is technical – many people participate in Working Groups remotely using WebEx, and the Working

Group relies on its web page to facilitate the collaboration, so the postdocs and graduate students need

to be trained in the use of these tools. The orientation also covers other technical aspects such as

computing and the use of SAMSI's webpages. Another important aspect of orientation is to spell out in

detail what the postdocs will do during their first year.

The postdoc seminar plays an important role in the mentoring of postdocs. The seminar meets

once a week and is run by one or two members of the Directorate. All of the first year postdocs, and

some in their second year, participate in this seminar. In a typical meeting, the first fifteen minutes or so

are devoted to informal discussion of general postdoc issues and upcoming events. After this, there is a

30-minute presentation by one of the postdocs on his or her research. Owing to the wide variety of

research interests of the postdocs, this must be a talk for a general audience, which introduces

background and carefully explains new concepts. Participants are encouraged to ask questions when

they do not understand, and to give the speaker constructive criticism at the end of the presentation. An

important goal of the seminar is to teach the postdocs how to give clear talks accessible to a wide

audience at conferences and seminars.

On some occasions, there is a more structured discussion of some topic such as refereeing

papers, applying for jobs, or writing grant applications. The seminar is not an appropriate place to

discuss personal issues, so the Administrative Mentor will meet with each of his or her postdocs roughly

once a month in an informal setting over lunch or coffee to make sure things are going smoothly.

Postdoc reports are important for documenting activities and are also useful to ensure that

postdocs articulate their plans. Each first-year postdoc writes three reports: (i) a research plan for the

year due by October 15; (ii) a first progress report and updated research plan due by January 21; and (iii)

a final report on the year's activities due by July 1. A second-year postdoc writes two reports: (i) a plan

describing the postdoc's research objectives and job search due by October 15, and (ii) a final report, to

be turned in before leaving SAMSI, which indicates the postdoc's job placement and describes the work

done while at SAMSI.

3. List of 2009-10 Postdocs

Avanti Athreya (Ph.D., Applied Mathematics, 2009, University of Maryland)

SAMSI Program: Stochastic Dynamics

Research Mentor: Jonathan Mattingly

Administrative Mentor: Michael Minion

Veronica Berrocal (Ph.D., Statistics, 2007, University of Washington)

SAMSI Program: Space-Time Analysis

Research Mentor: Alan Gelfand

Administrative Mentor: Jim Berger

Howard Chang (Ph.D., Biostatistics, 2009, Johns Hopkins University)

SAMSI Program: Space-Time Analysis

Research Mentor: Alan Gelfand

Administrative Mentor: Jim Berger

Oliver Diaz-Espinosa (Ph.D., Applied Mathematics, 2006, University of Texas – Austin)

SAMSI Program: Stochastic Dynamics

Research Mentor: Jonathan Mattingly

Administrative Mentor: Michael Minion

Emily Kang (Ph.D., Statistics, 2009, Ohio State University)

SAMSI Program: Space-Time Analysis

Research Mentor: John Harlim

Administrative Mentor: Pierre Gremaud

John McSweeney (Ph.D., Mathematics, 2009, Ohio State University)

SAMSI Program: Stochastic Dynamics

Research Mentor: Cynthia Greenwood

Administrative Mentor: Michael Minion

Oliver Ratmann (Ph.D., Epidemiology, 2009, University College, London)

SAMSI Program: Stochastic Dynamics

Research Mentor: Katia Koelle

Administrative Mentor: Pierre Gremaud

Bruce Rogers (Ph.D., Mathematics, 2009, Arizona State University)

SAMSI Program: Stochastic Dynamics

Research Mentor: Peter Mucha

Administrative Mentor: Michael Minion

Benjamin Shaby (Ph.D., Statistics, 2009, Cornell University)

SAMSI Program: Space-Time Analysis

Research Mentor: Alan Gelfand

Administrative Mentor: Jim Berger

Yi Sun (Ph.D., Applied and Computational Mathematics, 2006, Princeton University)

SAMSI Program: Stochastic Dynamics

Research Mentor: Pierre Gremaud

Administrative Mentor: Pierre Gremaud

Martin Tingley (Ph.D., Earth and Planetary Science, 2009, Harvard University)

SAMSI Program: Space-Time Analysis

Research Mentor: Bala Rajaratnam

Administrative Mentor: Jim Berger

Xueying Wang (Ph.D., Mathematics, 2009, Ohio State University)

SAMSI Program: Stochastic Dynamics

Research Mentor: Cynthia Greenwood

Administrative Mentor: Pierre Gremaud

Jun Zhang (Ph.D., Statistics, 2009, University of Wisconsin – Madison)

SAMSI Program: Space-Time Analysis

Research Mentor: Richard Smith

Administrative Mentor: Nell Sedransk

4. Postdoc Activity Reports: See following pages.

SAMSI Postdoctoral Fellow Research Plan

Date: October 12, 2009

1. Name: Avanti Athreya

2. PhD Program

University and department: University of Maryland, College Park, Applied Mathematics

Dissertation advisor: Dr. Mark Freidlin

Year of PhD: 2009

3. SAMSI Research

SAMSI program title: Stochastic Dynamics

SAMSI research mentor: Jonathan Mattingly

SAMSI administrative mentor: Michael Minion

4a. SAMSI Activities

Courses (fall and spring): Stochastic Dynamics Course; presentation on large deviations given on October 7, 2009

Workshops attended (and workshop support tasks): Opening workshop, Stochastic Dynamics Program; workshop support: assistance to speakers

Postdoc seminar: Presentation on “Metastability in Nearly-Hamiltonian Systems,” Monday, September 21, 2009

Undergraduate Workshop: Will participate in upcoming undergraduate workshop on February 26–27, 2010

4b. Other activities: Involved in the Probability working group at Duke University; Large Deviations reading group at SAMSI

5a. Working group I (Qualitative Dynamics)

Special tasks for working group I: Webmaster

Presentations to working group: talk on large deviations and metastability, September 16, 2009.

Research area – plans: 1) Stochastic resonance in Morris-Lecar and Hodgkin-Huxley models; 2) large deviations and estimates of extinction in stochastic predator-prey models (possibly also joint with Biological working group); 3) averaging and large deviations for multiscale diffusion processes

5b. Working group II (Biological Systems)

Special tasks for working group II: None as yet

Presentations to working group: None as yet

Research area – plans: related to above topics in Qualitative Behavior

5c. Working group III (Network Dynamics)

Special tasks for working group III: None as yet

Presentations to working group: None as yet

Research area – plans: Understanding and extending the model of Eames and Keeling for disease spread in networks by adding stochasticity first to disease spread and next to network growth

6. Other research

Papers from PhD research: I’m working on deriving an LDP for a multidimensional diffusion process with a first integral; extending some results on metastable “distributions” for nearly-Hamiltonian systems; and analyzing metastability for the case of n independent one-degree-of-freedom oscillators.

Presentations of other research: Seminar talks at NCSU (October 5, 2009) and UNC (November 9, 2009); talk at the Third Workshop in Random Dynamical Systems in Bielefeld, Germany (November 18–20, 2009).

SAMSI Postdoctoral Fellow Research Plan

Date:

1. Name Veronica Berrocal

2. Ph.D. Program

University & Department: Department of Statistics, University of Washington

Dissertation Advisor: Adrian Raftery

Year of Ph.D.: 2007

3. SAMSI Research

SAMSI Program Title: Space-Time Analysis for Environmental Mapping, Epidemiology and Climate Change

SAMSI Research Mentor: Sudipto Banerjee

SAMSI Administrative Mentor: Jim Berger

4a. SAMSI Activities

Course(s) (fall & spring):

Fall 2009: Spatial Epidemiology

Fall 2009: Theory of Continuous Space and Space-Time Processes

Spring 2010: Spatial Statistics in Climate, Ecology and Atmospherics

Workshops Attended (and Workshop Support Tasks): Opening workshops for the Space-Time program (helped assist the speakers)

Postdoc-Grad Student Seminar – Presentation(s): Presentation on October 26, 2009: “On the use of a PM 2.5 exposure simulator to explain birthweight”

Undergraduate Workshop(s) – Participation (specifics to be added later): 50-minute talk at the Undergraduate workshop on October 30-31, 2009.

4b. Other Activities (e.g., teaching)

Guest lectured for Alan Gelfand's course on “Theory of continuous space and space-time processes” on October 15, 2009

5a. Working Group I

Model-based geostatistics and preferential sampling

Special Tasks for Working Group: Webmaster

Presentations to Working Group:

Research Area – Plans: I am planning on developing/extending previous modeling approaches to calibrate computer models (calibration of the input) and downscale the output from computer models to point level, thus addressing the problem of change of support between the two sources of data.

5b. Working Group II

Spatial Exposure and Health Effects

Special Tasks for Working Group:

Presentations to Working Group: Presentation on October 1, 2009 on approaches to model individual exposure to contaminants/pollutants accounting for spatial dependence

Research Area – Plans: I plan to work on assessing the effect of spatial aggregation on the estimates of risk factors. More specifically, I plan to study how inference changes when a spatial process occurring at a point level is included in an epidemiological model at some spatially aggregated unit, rather than at the point level.

Another research project that I am planning to work on is to develop spatio-temporal models that relate health effects to exposure to multiple pollutants while adjusting for other factors. In particular, one issue that I would like to address is how to include the entire distribution function of a risk factor (e.g., age) for the population living in a spatial unit as a covariate in an epidemiological model linking a spatially-aggregated response variable to exposure.

5c. Working Group III (if appropriate)

Non-Gaussian and non-stationary spatial models

Special Tasks for Working Group:

Presentations to Working Group:

Research Area – Plans:

I intend to work on employing Dirichlet processes to model non-stationary, non-Gaussian spatial processes. In particular, I am planning on combining predictive processes and Dirichlet processes for spatial prediction purposes.

6. Other Research

Work on Papers from Ph.D. Research:

Other Research started or continued at SAMSI:

Continuing Collaborations while at SAMSI: I am continuing my collaborations with Marie Lynn Miranda (from Duke University), Dave Holland (from EPA), Claudia Czado (from Technical University of Munich) and Tilmann Gneiting (from University of Heidelberg).

Presentations of Other Research:

Research Progress Report & SAMSI Program Final Report

Date:

7a-z. Research Contributions – Current Projects (grouped by Working Group)

Research Project Title:

Collaborator(s) & Mentor(s):

Specific Goals & Accomplishments (results):

Research Contributions (publication submissions, articles in preparation, etc.):

Presentations outside SAMSI (including invitations for future talks):

8. Future Research Plans (after completion of SAMSI Program)

Research Area – Plans:

Continuing Collaborations (if appropriate):

Presentations outside SAMSI:

SAMSI Postdoctoral Research Plan

Date: October 16 2009

Name: Howard Chang

Ph.D. Program:

Department of Biostatistics

Johns Hopkins Bloomberg School of Public Health

Advisor: Francesca Dominici

Co-advisor: Roger Peng

Year of Ph.D.: 2009

SAMSI Research:

SAMSI Program: Space-time analysis of environmental mapping, epidemiology,

and climate change

Research Mentor: Alan Gelfand

Admin. Mentor: Jim Berger

SAMSI Activities:

Courses: Spatial Epidemiology (Fall)

Theory of Continuous Space and Space-time Process (Fall)

Spatial Statistics in Climate, Ecology, and Atmospherics (Spring)

Workshop Attended: Sept 2009 Spatial Opening Workshop

Nov 2009 GEOMED

Undergraduate Workshop: Oct 2009 Two-day Undergraduate Workshop

Other Activities:

Teaching:

• Taught three sessions in the SAMSI Spatial Epidemiology class.

Topic: Multi-site Time Series Analysis.

• Will present at the Oct 2009 undergraduate workshop and lead a one-hour R lab session on related materials.

Working Group 1: Spatial Exposures and Health Effects

Tasks: Webmaster

Presentations: Presented Oct 8 on the challenges in defining appropriate exposure

measures for the analysis of air pollution and health data.

Research Plans:

(1) Spatial survival model with time-varying coefficients to study the effects of

particulate matter exposure on preterm birth. The data is obtained from the Southern

Center on Environmentally-Driven Disparities in Birth Outcomes at Duke University.

This project will be supervised by Brian Reich (NCSU) and Marie Lynn Miranda

(Duke).

(2) National spatial time series analysis of air pollution and mortality from 2001 to

2007. We will utilize county-level mortality data linked to daily exposure to air

pollutants derived from monitoring, numerical or fused data. The study design is

similar to the National Mortality, Morbidity and Air Pollution study. This project will

be supervised by Montse Fuentes (NCSU) and Lucas Neas (EPA).

Additional projects are still being discussed within the working group. I am particularly

interested in individual versus aggregated analysis and exposure assessment.

Working Group 2: Model-based Geostatistics

Tasks: Participant

Presentation: None

Research Plans: Interested in examining preferential sampling in air pollution monitoring

design and its implications in health effect analysis.

Other Research

Duke University

50% of my effort is with the Children’s Environmental Health Initiatives supervised by Alan

Gelfand and Marie Lynn Miranda. The following projects are currently planned:

1. Time series and case cross-over analysis of adverse birth outcomes and air pollution

in North Carolina 1999-2007.

2. Develop a spatial model for joint modeling of birth weight and gestational week.

3. A simulation study to examine methods for estimating the health effects of air

pollution using semi-individual data.

Thesis Research

Two papers have been submitted:

Chang HH, Peng RD, and Dominici F. Estimating the Acute Health Effects of Coarse

Particulate Matter Accounting for Exposure Measurement Error.

Chang HH, Dominici F, and Peng RD. Bayesian Model Averaging for Clustered Data.

Another paper is in progress comparing different exposure indicators for estimating the health

effects of coarse particulate matter.

SAMSI Postdoctoral Fellow Research Plan

Date: October 14, 2009

1. Name: Sourish Das

2. Ph.D. Program

University & Department: University of Connecticut, Department of Statistics

Dissertation Advisor: Dipak K Dey

Year of Ph.D.: 2008

3. SAMSI Research

SAMSI Program Title: Space Time Program

SAMSI Research Mentor: Sudipto Banerjee

SAMSI Administrative Mentor: Nell Sedransk

4a. SAMSI Activities

Course(s) (fall & spring):

1. Workshops Attended (and Workshop Support Tasks):

1. Adaptive Design, SMC and Computer Modeling Workshop, April 15-17, 2009

2. SAMSI/CRSC Undergraduate Workshop, May 18–20, 2009

3. Summer Program on Psychometrics, July 7 – 17, 2009

4. Summer School in Spatial Statistics, July 28 – August 1, 2009

2. Postdoc-Grad Student Seminar – Presentation(s):

Scheduled for December 7, 2009

3. Undergraduate Workshop(s) – Participation (specifics to be added later):

SAMSI/CRSC Undergraduate Workshop, May 18–20, 2009. Two presentations were given in the workshop.

a. Reflection on the Data Collection and Modeling Experiences

b. Statistical Analysis of Vibrating Beam: Inverse Problem

4. Poster presentation at the SMC Opening Workshop, September 8, 2008

5. Poster presentation at the Summer Program on Psychometrics, July 7 – 17, 2009

4b. Other Activities (e.g., teaching)

1. Teaching Stat 101 (class of __ students) in the Department of Statistical Science at Duke University, during Fall 2009.

2. Review papers for Journal of Multivariate Analysis

3. Review papers for Epidemiology

4. Review papers for Computational Statistics and Data Analysis

5. Review papers for Journal of Statistical Planning and Inference

6. Organizing invited session at JSM 2009 :

Speakers are: (i) Mike Daniels (U of Florida, Gainesville), (ii) Bani Mallick (Texas A&M), (iii) David Dunson (Duke University), (iv) Sourish Das (SAMSI and Duke University)

7. Organizing invited session at ENAR 2010: (Accepted)

Speakers are: (i) Sourish Das (SAMSI/Duke University), (ii) Peter Thall (M.D. Anderson), (iii) Mike Daniels (U of Florida, Gainesville), (iv) Peter Muller (M.D. Anderson)

8. Heading to India on December 16, 2009 for job interview talks at (i) Indian Institute of Technology (IIT), (ii) Indian Statistical Institute (ISI), (iii) Indian Institute of Management (IIM). Returning to India on January 15, 2010

5a. Working Group: Computation, Visualization, and Dimension Reduction in Spatio-Temporal Modeling (Spatio-Temporal Modeling)

Special Tasks for Working Group:

(i) Webmaster of the WG (ii) Send WebEx invitation to every member of the WG every week. (iii) Take minutes of the WG meeting

Presentations to Working Group: N/A

Research Area – Plans: Currently working with Sudipto Banerjee on “Space-Time Data Analysis using Sequential Monte Carlo Method”. Our objective is to implement the currently developed predictive process to reduce the dimension and take advantage of the parallel computation that is very natural to the SMC method.

Manuscript in preparation: Yes

6. Other Research

Work on Papers from Ph.D. Research:

Other Research started or continued at SAMSI:

1. Gaussian Process Latent Variables Dynamic Models (Work in progress with David Dunson)

2. Bayesian Methods for Dynamic Models with Dipak Dey (Ready for submission)

3. On Bayesian Inference for Generalized Multivariate Gamma Distribution with Dipak Dey (submitted to Statistics and Probability Letters)

4. Extreme Drinking Behavior of Patients suffering Alcohol Dependence Disorder using Pareto Regression with Harel, Dey, and Kranzler (resubmitted to Statistics in Medicine after reviewers' suggestions)

5. Adaptive Bayesian analysis of binomial proportions (2009) with Sonali Das (in press in South African Journal of Statistics)

6. Efficacy of Endoscopic Ultrasound (EUS) Guided Celiac Plexus Block (CPB) and Celiac Plexus Neurolysis (CPN) for Managing Abdominal Pain Associated with Chronic Pancreatitis and Pancreatic Cancer: a systematic review and meta-analysis (2009) with Kaufman, M., Singh, G., Das, S., Micames, C., and Gress, F. (accepted in Journal of Clinical Gastroenterology)

7. Elicitation of Expert Prior opinion in Context of Presidential Election (2009) with David Banks (work in progress: Tentative title)

Continuing Collaborations while at SAMSI:

Sonali Das, CSIR, South Africa

Marina Kaufman and Gurpreet Singh at SUNY downstate medical center, Brooklyn, NY 11203

Presentations of Other Research:

Research Progress Report & SAMSI Program Final Report

Date:

7a-z. Research Contributions – Current Projects (grouped by Working Group)

Research Project Title: Gaussian Process Latent Variable Models and Gaussian Process Dynamic Models

Collaborator(s) & Mentor(s): David Dunson

Specific Goals & Accomplishments (results):

Presentations outside SAMSI (including invitations for future talks): N/A

8. Future Research Plans (after completion of SAMSI Program)

Research Area – Plans:

Continuing Collaborations (if appropriate):

SAMSI Postdoctoral Fellow Research Plan

Date:

1. Name: Oliver Rodolfo Díaz Espinosa

2. Ph.D. Program

University & Department: University of Texas at Austin, Mathematics

Dissertation Advisor: Rafael de la Llave

Year of Ph.D.: 2006

3. SAMSI Research

SAMSI Program Title: Stochastic Dynamics

SAMSI Research Mentor: Jonathan Mattingly, Cindy Greenwood

SAMSI Administrative Mentor: Michael Minion

4a. SAMSI Activities

Course(s) (fall):

• Stochastic Dynamical Systems

Workshops Attended:

• Program on Stochastic Dynamics Opening Workshop (August 30-September 2)

• The Mathematics Institutes' Modern Math Workshop at SACNAS (Oct 14-Oct 15)

Postdoc-Grad Student Seminar – Presentation(s):

• Long wave expansions for water waves over a random bottom (at SAMSI)

Undergraduate Workshop(s) – Participation (specifics to be added later):

4b. Other Activities

Teaching: M103 Intermediate Calculus, Duke University

5a. Working Group I (Qualitative behavior of dynamical systems)

Special Tasks for Working Group:

Presentations to Working Group: Doob’s h-Transform (to be presented)

Research Area – Plans:

• Understanding of stochastic dynamical systems (maps) and the study of the notions of bifurcation in this setting.

• Develop some numerical techniques to implement the h-transform when conditioning over sets of small measure

5b. Working Group II (Biological Dynamics)

Special Tasks for Working Group: Webmaster

• Setting up WebEx for the weekly meetings and maintaining the web pages of the group and its subgroups

Presentations to Working Group:

Research Area – Plans:

• Understanding the dynamics of molecular motors

• Analysis of the mechanism behind oscillations in stochastic biological models in epidemiology and in population dynamics.

6. Other Research

Work on Papers from Ph.D. Research:

• Long wave expansion for water waves over a random surface

• Directed polymers in random environment with product structure

Other Research started or continued at SAMSI:

Continuing Collaborations while at SAMSI:

• Walter Craig (McMaster Univ.)

• Catherine Sulem (U. of Toronto)

• Konstantin Khanin (U. of Toronto)

Presentations of Other Research:

• Long wave expansion for water waves over a random bottom (at Modern Math Workshop at SACNAS)

Research Progress Report & SAMSI Program Final Report

Date:

7a-z. Research Contributions – Current Projects (grouped by Working Group)

Research Project Title:

Collaborator(s) & Mentor(s):

Specific Goals & Accomplishments (results):

Research Contributions (publication submissions, articles in preparation, etc.):

Presentations outside SAMSI (including invitations for future talks):

8. Future Research Plans (after completion of SAMSI Program)

Research Area – Plans:

Continuing Collaborations (if appropriate):

Presentations outside SAMSI:

SAMSI Postdoctoral Fellow Research Plan

Date:

1. Name: Elizabeth Mannshardt-Shamseldin

2. Ph.D. Program

University & Department: University of North Carolina at Chapel Hill, Department of Statistics and Operations Research (Statistics PhD).

Dissertation Advisor: Prof. Richard L. Smith

Year of Ph.D.: 2008

3. SAMSI Research

SAMSI Program Title: Space-Time Analysis for Environmental Mapping, Epidemiology and Climate Change

SAMSI Research Mentor: TBD very soon

SAMSI Administrative Mentor: Nell Sedransk

4a. SAMSI Activities

Course(s) (fall & spring):

Spring 2010 Course in Spatial Statistics in Climate, Ecology and Atmospherics.

Workshops Attended (and Workshop Support Tasks):

* Opening Workshop for Space-Time Analysis for Environmental Mapping, Epidemiology and Climate Change. Presented a poster “Severe Weather Under a Changing Climate: Large Scale Indicators of Extreme Events” at the poster session. Worked as support staff assisting the workshop presenters for the Monday morning session.

* Workshop on Environmental Climate Change, Feb 17-19, 2010. Presenter in Spatial Extremes Working Group session – topic TBD; Participant.

* Environmental Risk Workshop, Apr 7-9, 2010. Participant, possible presenter TBD.

Postdoc-Grad Student Seminar – Presentation(s): Oct 19th, 2009: “Severe Weather Under a Changing Climate: Large Scale Indicators of Extreme Events”. Joint work with Eric Gilleland, NCAR.

One of the more critical issues with a changing climate is the behavior of extreme weather events, as these can cause loss of life, and have huge economic impacts. It is generally thought that such events would increase under a changing climate. However, climate models are currently at too coarse a resolution to capture the very fine scale extreme events such as tornadoes or hurricanes. One approach is to look at the behavior of large scale indicators of severe weather. Here several factors are considered as large scale indicators of severe weather, including convective available potential energy and wind shear. This presents some interesting statistical issues. Numerous approaches, including the use of the generalized extreme value distribution for annual maxima, the generalized Pareto distribution for threshold excesses, a point process approach, and a Bayesian framework, are examined. Each approach is critiqued and compared for goodness of fit, model robustness, and predictive attributes on both re-analysis data and climate model output data. For the univariate case, it is relatively straightforward to analyze such data though numerous issues must be resolved. These issues include appropriate techniques for threshold selection and prior specification. A bivariate approach can also be considered. In addition, when analyzing weather extremes, one is faced with a spatial field. Predicting extreme weather events is an important, growing area of research and there remain many avenues for further exploration. Acknowledgements to Harold Brooks, Patrick Marsh and Matt Pocernich.

Graduate Workshop(s) – Participation:

This last July (2009) I served as a faculty mentor for the Industrial Mathematical and Statistical Modeling Workshop for graduate students at North Carolina State University. The objective of the workshop is “to expose graduate students in mathematics, engineering, and statistics to challenging and exciting real-world problems arising in industrial and government laboratory research.” Students get experience in team problem solving and are mentored by a faculty adviser and the industry problem presenter while working on the project for several days during the workshop. The group deliverable is a paper to be included in the workshop proceedings. In my role as a faculty adviser I developed and taught a short course in extreme value theory, suitable to the various backgrounds of the statistics, mathematics, and education majors in the group of graduate students. I also was on-site during the workshop to answer questions about concepts and programming techniques. The group of graduate students that I mentored, along with Prof. Richard Smith and Dr. Eric Gilleland from NCAR as the problem presenter, developed a Bayesian modeling approach for an extremes application, resulting in a paper to be submitted to Extremes. I am looking forward to participating as a faculty mentor again this coming year (July 19 – 27, 2010), and am also taking an active role in recruiting problem presenters from industry.

4b. Other Activities (e.g., teaching)

This academic year I am serving as the adviser for an undergraduate statistics major at Duke who is writing an honors thesis. The ultimate objective, via the Duke department honors requirements, is a publishable paper in an appropriate statistics or scientific journal. I have been working with the student to develop an appropriate project in the area of extreme value theory. My mentoring role includes everything from tutoring basic concepts and programming techniques, finding interesting data for a relevant application, critiquing and proof-reading progress, and steering the project as it develops towards a publishable product.

In the spring (2010), I will be teaching Sta10 – “Stats And Quantitative Literacy”, a concept-based introduction to statistics course for non-science majors at Duke. This has provided an interesting perspective on teaching introductory material. The course uses the text “Seeing Through Statistics” by Jessica Utts, which focuses on presenting statistical ideas as they relate to non-mathematical disciplines. Students are taught how to read and critique scientific and news reports for statistical appropriateness and accuracy. Very basic formulas are presented as necessary to aid students’ understanding of the relationships being investigated.

This last spring (2009), I proposed, developed, and taught a special topics graduate course in “Extreme Value Theory and Applications with Bayesian Methods” at Duke University. The course combined material from Coles’ 2001 book “An Introduction to Statistical Modeling of Extreme Values” and several relevant journal articles on emerging techniques. It was a great opportunity to further develop my pedagogical skills.

5a. Working Group I - Spatial Extremes (Working Group 5 – Mondays 11am)
Special Tasks for Working Group: Webmaster
Presentations to Working Group: Oct 19th – “Regional Modeling of Extreme Storms via Max-Stable Processes” by Stuart Coles, JRSS-B 1993.

“Max-Stable Processes: What’s Out There and Open Problems.” (working title. Abstract available soon.) Presentation to the workshop on Spatio-temporal Extremes and Applications, November 2009 at Ecole Polytechnique Fédérale de Lausanne, Switzerland.

Research Area – Plans: There are 8 projects proposed for the working group at the present time. I am looking into four of them, and will focus on one or two as the group progresses. (The following is an excerpt from Richard Smith’s working group proposal.)

1. Are max-stable processes the right mathematical approach? What alternatives are available?

5. Are there “threshold” versions of max-stable processes, and how should they be estimated?

6. One of the major aims of spatial statistics, as conventionally formulated, is to predict from observed to unobserved sites. How should we do that in the context of max-stable processes?

8. Find applications and analyze data! Examples: meteorological data (e.g. rainfall, windspeeds), environmental data (e.g. ozone), financial data.

5b. Working Group II - Paleoclimate (Working Group 1 – Thursdays 1pm)
Special Tasks for Working Group: Participant
Presentations to Working Group: Nothing scheduled at this time.

Research Area – Plans: Learning about paleoclimate data, proxy data, climate model development and output, and statistical challenges when dealing with climate model and proxy data. Very applicable to my own research as I deal regularly with climate model output.

6. Other Research

Work on Papers from Ph.D. Research:

“Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands.” Under Richard L. Smith, UNC-CH. Paper in preparation.

A difficulty that arises with traditional kriging methods is the fact that the standard formula for the mean squared prediction error does not take into account the estimation of the covariance parameters. This generally leads to underestimated prediction errors, even if the model is correct. Smith and Zhu (2004) establish a second-order expansion for predictive distributions in Gaussian processes with estimated covariances. In my dissertation, I establish a similar expansion for multivariate kriging with non-linear predictands. The Bayesian coverage probability bias for prediction intervals is computed and compared with the frequentist plug-in method using the restricted maximum likelihood estimates of the covariance parameters. The expected length of a Bayesian prediction interval is computed. The form of the Bayesian predictive distribution can be expressed as partial derivatives of the restricted log-likelihood. This important development leads to possible analytical evaluation using a bootstrap method to obtain predictions for general, non-linear predictands, and overcomes the difficulties with iterative methods, such as MCMC, when the closed form of the distribution is not easily obtained. Future work can explore the existence of a matching prior. Applications include spatial analysis, environmental statistics, and network design.

Other Research started or continued at SAMSI:

“Downscaling Extremes: A Comparison of Extreme Value Distributions in Point-Source and Gridded Precipitation Data”. Joint work with Richard L. Smith, Stephen Sain, Linda Mearns, and Daniel Cooley. To appear (2009) Annals of Applied Statistics. (Finished this last month while at SAMSI.)

There is substantial empirical and climatological evidence that precipitation extremes have become more extreme during the twentieth century, and that this trend is likely to continue as global warming becomes more intense. However, understanding these issues is limited by a fundamental issue of spatial scaling: most evidence of past trends comes from rain gauge data, whereas trends into the future are produced by climate models, which rely on gridded aggregates. To study this further, we fit the Generalized Extreme Value (GEV) distribution to the right tail of the distribution of both rain gauge and gridded events. The rain gauge data come from a network of 5873 U.S. stations, and the gridded data from a well-known re-analysis model (NCEP) and from climate model runs from NCAR's Community Climate System Model (CCSM). The results of this modeling exercise confirm, as expected, that return values computed from rain gauge data are typically higher than those computed from gridded data; however, the size of the difference is somewhat surprising, with the rain gauge data exhibiting return values sometimes two or three times that of the gridded data. The main contribution of this paper is the development of a family of regression relationships between the two sets of return values that also take spatial variations into account. Based on these results, we now believe it is possible to project future changes in precipitation extremes at the point-location level based on results from climate models.
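The return-value calculation described above can be illustrated with a small Python sketch. This is not the paper's method: it assumes a simple block-maxima fit to synthetic data, and the parameter values and the helper name `return_level` are hypothetical. Note that SciPy's `genextreme` parameterizes the shape as c = -ξ, the negative of the shape parameter used in most of the extremes literature.

```python
import numpy as np
from scipy.stats import genextreme

def return_level(shape_c, loc, scale, period_years):
    """N-year return value: the level exceeded with probability 1/N in a year."""
    return genextreme.ppf(1.0 - 1.0 / period_years, shape_c, loc=loc, scale=scale)

# Synthetic "annual maximum daily precipitation" in mm (made-up parameters).
rng = np.random.default_rng(0)
true_c, true_loc, true_scale = -0.1, 40.0, 10.0  # c = -xi in SciPy's convention
annual_max = genextreme.rvs(true_c, loc=true_loc, scale=true_scale,
                            size=200, random_state=rng)

# Maximum likelihood estimates of the three GEV parameters.
c_hat, loc_hat, scale_hat = genextreme.fit(annual_max)

rl_20 = return_level(c_hat, loc_hat, scale_hat, 20)
rl_100 = return_level(c_hat, loc_hat, scale_hat, 100)
print(f"20-year return value:  {rl_20:.1f} mm")
print(f"100-year return value: {rl_100:.1f} mm")
```

Repeating the fit on a point-source series and on a gridded series for the same location would give two sets of return values of the kind the regression relationships above are meant to link.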

Continuing Collaborations while at SAMSI:

“Severe Weather Under a Changing Climate: Large Scale Indicators of Extreme Events”. Joint work with Eric Gilleland, NCAR.

There is much that can still be considered for this project. Once the data is available, a bivariate approach will be explored. The results so far indicate spatial dependence as well as multivariate dependence among the different large-scale indicators. Spatial extremes is an expanding field, and dependence structures can be introduced in different capacities to address both of these issues. There are several methods that can be investigated to resolve this dependence. In addition, although threshold methods retain more data than the block-maxima approaches used in extreme value theory, extremes tend to occur in clusters and hence violate independence assumptions, so threshold methods are susceptible to the same “loss of data” issues that they are intended to resolve. The standard bias-variance trade-off is considered by Fawcett and Walshaw (2007), who provide an alternative approach to the standard errors which makes use of all of the data above the threshold while addressing the temporal dependence within extreme clusters. Heaton et al (2009, in preparation) considered a Bayesian modeling approach to address the tail behavior in physical regions for the indicators across the continental US, which could partially resolve the tail behavior issues occurring in the preliminary results for the global data. These approaches could both be considered.
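One common way of handling clustered exceedances is runs declustering, which keeps only the maximum of each cluster and is one place where data above the threshold are discarded. The Python sketch below illustrates that general technique only; it is not the approach of Fawcett and Walshaw, and the function name `runs_decluster`, the threshold, and the run length are assumptions chosen for the example.

```python
import numpy as np

def runs_decluster(series, threshold, run_length):
    """Return the indices of cluster maxima under runs declustering.

    Exceedances separated by fewer than `run_length` consecutive
    non-exceedances are assigned to the same cluster.
    """
    series = np.asarray(series, dtype=float)
    exceed_idx = np.flatnonzero(series > threshold)
    if exceed_idx.size == 0:
        return np.array([], dtype=int)
    # A gap of g non-exceedances shows up as an index difference of g + 1,
    # so a new cluster starts wherever the difference exceeds run_length.
    breaks = np.flatnonzero(np.diff(exceed_idx) > run_length) + 1
    clusters = np.split(exceed_idx, breaks)
    return np.array([c[np.argmax(series[c])] for c in clusters])

# Toy series with four exceedances of the threshold 4.
toy = np.array([0, 5, 6, 0, 0, 0, 7, 0, 8, 0])
peaks = runs_decluster(toy, threshold=4, run_length=2)
print("exceedances:", int((toy > 4).sum()), "cluster maxima:", peaks.size)
```

In this toy series the four exceedances collapse to two cluster maxima, which shows how declustering reduces the number of observations available for fitting.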

Presentations of Other Research: “Max-Stable Processes: What’s Out There and Open Problems.” (Working title, abstract available soon.) Presentation to the workshop on Spatio-temporal Extremes and Applications, November 2009 at Ecole Polytechnique Fédérale de Lausanne, Switzerland.

Research Progress Report & SAMSI Program Final Report
Date:

7a-z. Research Contributions – Current Projects (grouped by Working Group)
Research Project Title:
Collaborator(s) & Mentor(s):
Specific Goals & Accomplishments (results):
Research Contributions (publication submissions, articles in preparation, etc.):
Presentations outside SAMSI (including invitations for future talks):

8. Future Research Plans (after completion of SAMSI Program)
Research Area – Plans:
Continuing Collaborations (if appropriate):
Presentations outside SAMSI:

SAMSI Postdoctoral Fellow Research Plan
Date: 10/15/2009

1. Name: John McSweeney

2. Ph.D. Program
University & Department: The Ohio State University, Mathematics
Dissertation Advisor: Boris Pittel
Year of Ph.D.: 2009

3. SAMSI Research
SAMSI Program Title: Stochastic Dynamics
SAMSI Research Mentor: Priscilla Greenwood
SAMSI Administrative Mentor: Michael Minion

4a. SAMSI Activities
Course(s) (fall & spring): Stochastic Dynamics: theory, modeling, and computation (SAMSI); Jonathan Mattingly's probability course (Duke)

Workshops Attended (and Workshop Support Tasks): Stochastic Dynamics Opening workshop; technical support for speakers during Wednesday session

Postdoc-Grad Student Seminar – Presentation(s): None yet (scheduled for 11/23/09)
Undergraduate Workshop(s) – Participation (specifics to be added later): None yet

4b. Other Activities (e.g., teaching)
N/A

5a. Working Group I: Biological Stochastic Dynamics
Special Tasks for Working Group: local liaison for Lea Popovic, leader of the Reaction networks subgroup
Presentations to Working Group: 10/20/09: Stochastic Reaction Networks (planned)
Research Area – Plans:
-To understand the role of stochasticity in chemical and intracellular reaction networks (including multi-scale), based on the papers [BKPR] and [LW], under the theoretical framework presented in [K]
-To develop a single-neuron model for axonal transport (with B. Joshi, C. Greenwood, X. Wang)

5b. Working Group II: Network Dynamics
Special Tasks for Working Group: none

Presentations to Working Group:
9/30/09: Demonstration of iGraph software (with R interface)
10/21/09: Exploring the link between connectivity correlation properties of disease dynamics on graphs (as presented in [EK]) and second-order connectivity properties of neuronal networks (as presented in [LN])
Research Area – Plans:
-To develop stochastic versions of disease dynamics on graphs
-To establish links between social contagion networks and neuronal networks (based on suggestions by P. Mucha and J. Nolen)

5c. Working Group III (if appropriate): N/A
Special Tasks for Working Group:
Presentations to Working Group:
Research Area – Plans:

6. Other Research
Work on Papers from Ph.D. Research: N/A
Other Research started or continued at SAMSI: Informal working group on large deviations (with A. Athreya and X. Wang), based on the textbook by Dembo and Zeitouni

Continuing Collaborations while at SAMSI: With D. Pearl, D. Schmidt (Ohio State/MBI) and B. Joshi (Duke): Coalescent models for parapatric speciation

Presentations of Other Research:

Bibliography
[BKPR] Ball, K., Kurtz, T., Popovic, L., and Rempala, G. Asymptotic analysis of multiscale approximation to reaction networks. Annals of Applied Probability, vol. 16, no. 4, 1925-1961.
[EK] Eames, K.T.D. and Keeling, M.J. Modeling dynamic and network heterogeneities in the spread of sexually transmitted diseases. PNAS, vol. 99, no. 20, 2002.
[K] Kurtz, T. Averaging for martingale problems and stochastic approximation. 1992.
[LW] Lepzelter, D. and Wang, J. Exact probabilistic solution of spatial-dependent stochastics and associated spatial potential landscape for the bicoid protein. Phys. Rev. E, vol. 77, 041917, 2008.
[LN] Liu, C.-Y. and Nykamp, D.Q. A kinetic theory approach to capturing interneuronal correlation: the feed-forward case. J. Computational Neuroscience, vol. 26, 339-368, 2009.

SAMSI Postdoctoral Fellow Research Plan
Date: October 15, 2009

1. Name: Bruce Rogers

2. Ph.D. Program
University & Department: Arizona State, Mathematics
Dissertation Advisor: Thomas Taylor
Year of Ph.D.: 2009

3. SAMSI Research
SAMSI Program Title: Stochastic Dynamics
SAMSI Research Mentor: Peter Mucha
SAMSI Administrative Mentor: Michael Minion

4a. SAMSI Activities
Course(s) (fall & spring): Stochastic Dynamics (Fall)

Workshops Attended (and Workshop Support Tasks): Stochastic Dynamics Opening Workshop

Postdoc-Grad Student Seminar – Presentation(s): Poincare Inequalities on Markov Chains (Oct. 10)

Undergraduate Workshop(s) – Participation (specifics to be added later):

4b. Other Activities (e.g., teaching)
Teaching Math 32 (Calc 2) at Duke

5a. Working Group I
Special Tasks for Working Group: Data Assimilation
Presentations to Working Group: Intro. to Stochastic Control (Oct. 6)
Research Area – Plans: We've formulated a problem in experimental design in the language of control theory: how can one find a maximally efficient asymptotic estimator subject to some (multiplicative or additive) controls? Specifically, I would like to find an experimental regime that maximizes the power of estimating parameters in the Lotka-Volterra model (over some control space).

5b. Working Group II
Special Tasks for Working Group: Network Dynamics
Presentations to Working Group: Statnet Tutorial (Sept. 30)

Research Area – Plans: Eames and Keeling (PNAS vol 99 (20)) have introduced a method for incorporating network heterogeneities in a compartmental SIS model. In their deterministic model, they subcompartmentalize by the degree of the individual. If we build a "natural" stochastic differential equation model from the deterministic framework, will there be bimodality in the distribution of epidemic size as in the traditional stochastic SI(R) model? What level of network information is needed in the subcompartmentalized model to accurately capture the distribution of epidemic sizes?
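The bimodality in the traditional stochastic SIR model mentioned above can be seen in a small simulation. The Python sketch below is only an illustration under stated assumptions: it simulates the jump chain of a mean-field (fully mixed) Markov SIR model, not the subcompartmentalized or SDE model proposed here, and the population size, rates, and function name `final_epidemic_size` are hypothetical.

```python
import numpy as np

def final_epidemic_size(n, beta, gamma, rng, initial_infected=1):
    """Simulate the jump chain of a mean-field stochastic SIR model and
    return the total number of individuals ever infected."""
    s, i = n - initial_infected, initial_infected
    while i > 0:
        infection_rate = beta * s * i / n
        recovery_rate = gamma * i
        if rng.random() < infection_rate / (infection_rate + recovery_rate):
            s, i = s - 1, i + 1   # S -> I
        else:
            i -= 1                # I -> R
    return n - s

rng = np.random.default_rng(1)
n, beta, gamma = 500, 2.0, 1.0    # illustrative values, R0 = beta / gamma = 2
sizes = np.array([final_epidemic_size(n, beta, gamma, rng) for _ in range(1000)])

minor = np.mean(sizes < 0.1 * n)  # outbreaks that die out early
major = np.mean(sizes > 0.5 * n)  # outbreaks that reach much of the population
print(f"fraction of minor outbreaks: {minor:.2f}")
print(f"fraction of major outbreaks: {major:.2f}")
print(f"fraction in between:         {1 - minor - major:.2f}")
```

Only the sequence of events matters for the final size, so event times are not simulated. A degree-structured version would replace the single (S, I) pair with one pair per degree class.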

5c. Working Group III (if appropriate)
Special Tasks for Working Group:
Presentations to Working Group:
Research Area – Plans:

6. Other Research
Work on Papers from Ph.D. Research: Poincare Inequalities on Markov Chains (2 papers in preparation); Control of Consensus under Bounded Confidence (in preparation)
Other Research started or continued at SAMSI: Mathematical Models of Status Hierarchies; Optimal Number of Food Sharing Partners
Continuing Collaborations while at SAMSI: Network Effects on Computational Models of Generalized Reciprocity with Cecelia Menjivar (in preparation); Stochastic Agro-Ecosystem Model with David Murillo (nearly submitted); Modeling Internet Traffic with Matthew Hindman (book and several papers)
Presentations of Other Research:

Research Progress Report & SAMSI Program Final Report
Date:

7a-z. Research Contributions – Current Projects (grouped by Working Group)
Research Project Title:
Collaborator(s) & Mentor(s):
Specific Goals & Accomplishments (results):
Research Contributions (publication submissions, articles in preparation, etc.):
Presentations outside SAMSI (including invitations for future talks):

8. Future Research Plans (after completion of SAMSI Program)
Research Area – Plans:
Continuing Collaborations (if appropriate):
Presentations outside SAMSI:

SAMSI Postdoctoral Fellow Research Plan

Date: 10/16/2009

1. Name: Ben Shaby

2. Ph.D. Program
University & Department: Cornell University, Statistics
Dissertation Advisor: Martin Wells and David Ruppert
Year of Ph.D.: 2009

3. SAMSI Research
SAMSI Program Title: Space-Time Analysis for Environmental Mapping, Epidemiology and Climate Change
SAMSI Research Mentor: Officially, Alan Gelfand, but he is not involved in any of the working groups I have joined. The local faculty members who have been active in my working groups are David Dunson and Montse Fuentes, so I may end up being more involved with one or both of them.
SAMSI Administrative Mentor: Jim Berger

4a. SAMSI Activities
Course(s) (fall & spring): Theory of Continuous Space and Space-Time Processes
Workshops Attended (and Workshop Support Tasks): Spatial opening workshop, where I helped one or two speakers plug their USB thumb drives into the conference laptop.
Postdoc-Grad Student Seminar – Presentation(s): I'm scheduled for 11/9, when I'll talk about hierarchical Bayesian black box regression updating.
Undergraduate Workshop(s) – Participation (specifics to be added later): I'll be giving a short introduction to R.

4b. Other Activities (e.g., teaching)

5a. Working Group I: Computation, Visualization, and Dimension Reduction in Spatio-Temporal Modeling
Special Tasks for Working Group:
Presentations to Working Group: scheduled for 10/30 - “All you need to know about (parameter estimation using) covariance tapering”
Research Area – Plans: We don't seem to be quite far enough along to have specific research plans. This is a little frustrating because there is a lot of talent and expertise in the group, but it really hasn't gotten off the starting block.

5b. Working Group II: Non-Gaussian and Non-stationary Spatial Models
Special Tasks for Working Group: Webmaster
Presentations to Working Group:
Research Area – Plans: This is also a work in progress. So far it looks like there will be two foci: nonparametrics (via Dirichlet process type models), and models in the spectral domain. It's not yet clear how this will shape up.

5c. Working Group III (if appropriate): Spatial Extremes
Special Tasks for Working Group:
Presentations to Working Group:
Research Area – Plans:

6. Other Research
Work on Papers from Ph.D. Research: Submitted a paper to JASA and one to JCGS since I've been here. Also, I'm finishing up one for Biometrics.
Other Research started or continued at SAMSI: Working on some asymptotic theory for spatial parameter estimation, an “open-faced sandwich” adjustment for m-estimation within MCMC, and a new method for dealing with huge datasets.

Continuing Collaborations while at SAMSI: I am still finishing papers with the Cornell Lab of Ornithology. I am also just starting a collaboration with Petros Drineas from RPI computer science, and I have continued to work with Cari Kaufman from UC Berkeley.

Presentations of Other Research: I'll be giving the weekly theory seminar in the computer science department at RPI 9/3.

SAMSI Postdoctoral Fellow Research Plan
Date: 10/15/2009

1. Name: Ravi Srinivasan

2. Ph.D. Program
University & Department: Division of Applied Mathematics, Brown University
Dissertation Advisor: Govind Menon
Year of Ph.D.: 2009

3. SAMSI Research
SAMSI Program Title: Stochastic Dynamics
SAMSI Research Mentor: Jonathan Mattingly
SAMSI Administrative Mentor: N/A

4a. SAMSI Activities
Course(s) (fall & spring):
-Stochastic Dynamics: Theory, Modeling, and Computation
Workshops Attended (and Workshop Support Tasks):
-Opening Workshop, 2009 Program on Stochastic Dynamics
Postdoc-Grad Student Seminar – Presentation(s):
-09/21/2009--Implementation of wikis to replace working group websites, joint with Julien Cornebise (see below).
-11/30/2009 (pending)--Will present work on `Kinetic theory of shock clustering in Burgers turbulence.'
Undergraduate Workshop(s) – Participation (specifics to be added later):
-None at present time.

4b. Other Activities (e.g., teaching)
-Designed, created, and implemented wiki to replace websites for working groups, assisted current postdocs in the transition, and packaged it for use in future years (jointly with Julien Cornebise).

5a. Working Group I: Network Dynamics
Special Tasks for Working Group:
-Assisted in transition to wiki (see above).
Presentations to Working Group:
-09/30/2009--Presented intro to NetworkX software.
Research Area – Plans:
-A problem of current interest within the Network Dynamics group is to investigate the stochastic analogues of sub-compartmental models that arise in disease dynamics (and in other models of social contagion). At present, we have fairly detailed knowledge regarding the SDE corresponding to a mean-field SIR model, which exhibits bimodality in the distribution of total epidemic size. An interesting problem would be to consider similar distributional quantities in an SDE model obtained from a sub-compartmentalized SIR model (i.e., one that drops the mean-field assumption). This modification has already proven more effective than the mean-field model in recovering average epidemic size in a model of STD spread (see K. Eames and M. Keeling (2002)). What is the proper SDE model in this case? Can we show that this system also exhibits the expected bimodality in epidemic size, at least numerically? How robust would such a sub-compartmentalized model be with regard to the network structure? This project would be in collaboration with Peter Mucha and Cindy Greenwood, among others.

5b. Working Group II: Qualitative Behavior of Stochastic Dynamic Systems
Special Tasks for Working Group:
-Assisted in transition to wiki (see above).
Presentations to Working Group:
-10/21/2009 (pending)--Will present paper on `Rare Events in Stochastic Partial Differential Equations on Large Spatial Domains' by E. Vanden-Eijnden and M. Westdickenberg. The purpose is to investigate the role of entropy provided by the large spatial extent in phenomena exhibiting metastability, and how the large size of the system actually regularizes the transition through a law-of-large-numbers type of effect.
Research Area – Plans:
-There are several problems of interest that arise with regard to the paper to be presented on 10/21/2009. One problem involves a proper derivation of kink formation in a one-dimensional sharp interface limit of the stochastic Allen-Cahn equation with symmetric potential. The problem here reduces to an analysis of annihilating Brownian walkers on the real line. Although the dynamics of the model have been studied after kink formation has occurred (see, e.g., I. Fatkullin and E. Vanden-Eijnden (2003)), there is an issue with how kinks form to begin with: two Brownian walkers are randomly spawned from the same location but instantly annihilate a.s. due to sample path properties of Brownian motion. In the case of an asymmetric potential there may be a finite probability of non-annihilation which would provide a mechanism for kink formation in the model. I believe this would be an interesting problem to investigate.

Another problem is to find a rigorous derivation of the JMAK nucleation and growth model from the large system SPDE model discussed in the paper. This is a subtle problem that may not be feasible as a short-term project but it may be worth investigating with the working group on a longer time scale. It may also be quite interesting to consider the problem of a multi-well potential corresponding to a cascade of nucleations and how it relates to the PNG model, and in turn, the problem of largest eigenvalues of random matrices (this statement is explicitly made in the paper). Although these connections have already been studied in the literature, it may be worth reviewing the steps that lead to this relationship.

5c. Working Group III (if appropriate): Numerical Methods for Stochastic Systems
Special Tasks for Working Group:
-Assisted in transition to wiki (see above).
Presentations to Working Group:
-None at present time.
Research Area – Plans:
-My main purpose in attending this working group is to broaden my perspective on problems in numerical methods for multiscale systems (deterministic or stochastic). In particular I would like to obtain a deeper understanding of the relevance of kinetic theory in out-of-equilibrium problems that exhibit non-Gaussian statistics.

6. Other Research
Work on Papers from Ph.D. Research:
-Submitted joint work with Govind Menon on `Kinetic theory and a Lax pair for shock clustering and Burgers turbulence,' Sept. 2009.
-Edited/resubmitted work on `Rates of convergence for Smoluchowski's coagulation equations,' Oct. 2009.
Other Research started or continued at SAMSI:
-At present I am collaborating with Jonathan Mattingly and Scott McKinley on a project closely connected to my Ph.D. research. The relevant problem here is to consider the Burgers turbulence problem with a stationary stochastic forcing at the boundary of the half-infinite domain in order to understand how randomness is propagated through the system via the deterministic flux. That is, we seek to understand the asymptotic properties of the invariant measure in space, and hope to be able to do this via the Lax pair formulation investigated in the random initial data problem in my thesis. Although the problem is interesting from a purely mathematical perspective, it may be relevant in obtaining a deeper understanding of shell models of turbulence and other physically motivated problems.
-Another problem I am working on is that of complete integrability of the Burgers turbulence model with white noise initial data. This problem is of fundamental interest in a variety of contexts since it is essentially the study of the convex hull of Brownian motion with a parabolic drift. Here we seek an appropriate representation of the model as an infinite-dimensional Hamiltonian system, and to use classical methods in integrable systems theory (Riemann-Hilbert analysis, inverse scattering) to rederive the exact solution obtained by Groeneboom (1989) for white noise data. I have been discussing this problem with Stephanos Venekides at Duke.
Continuing Collaborations while at SAMSI:
-None at present time.
Presentations of Other Research:
-None at present time.

SAMSI Postdoctoral Fellow Research Plan

Date: 16 October 2008

1. Martin Tingley

2. Ph.D. Program
University & Department: Harvard University, Department of Earth and Planetary Sciences.
Dissertation Advisor: Peter Huybers.
Year of Ph.D.: 2009.

3. SAMSI Research
SAMSI Program Title: 2009-10 Program on Space-time Analysis for Environmental Mapping, Epidemiology and Climate Change.
SAMSI Research Mentor: Bala Rajaratnam, Noel Cressie.
SAMSI Administrative Mentor: Pierre Gremaud.

4a. SAMSI Activities
Courses: Theory of Continuous Space and Space-Time Processes (Fall), Spatial Statistics in Climate, Ecology and Atmospherics (Spring).
Workshops Attended: Opening workshop for the spatial program, climate workshop in February, transition workshop when it occurs.
Postdoc-Grad Student Seminar: Presentation: "A Bayesian approach to reconstructing climate fields from proxy time series, applied to 600 years of high northern latitude surface temperatures" (12 October 2009).
Undergraduate Workshop: Presentation at the 30-31 October workshop, "Paleoclimate! Reconstructing past temperatures using natural thermometers."

4b. Other Activities

I don't know what I'm supposed to put in this section.

5a. Working Group I : Paleoclimate

Special Tasks for Working Group: Webmaster.

Presentations to Working Group: "A primer on climate proxy data," one-hour presentation on 24 Sept 2009. I will lead the discussion on 29 October, "Open problems in paleoclimate research."

Research Area – Plans: This working group is well aligned with my doctoral work, which I plan on continuing and expanding over the next few years. My goals with regard to the paleoclimate working group and my own research are twofold: 1) learn more statistics from the other group members (and from the SAMSI community in general), to further my own research agenda, and 2) interest the larger statistical community in paleoclimate research. There is a need for more statistically sophisticated approaches to the analysis of paleoclimate data, and my background in both the earth sciences and statistics puts me in a unique position to make advancements in this regard. The paleoclimate working group at SAMSI is an excellent forum to make progress on the statistical (as opposed to data) side of climate reconstruction problems, and I plan on taking full advantage of the opportunity.

Our working group is still in the exploratory stage, with the meetings so far focusing on the sources of paleoclimate data and various statistical approaches used to combine proxy records to infer past temperatures. In the next few weeks, we plan on homing in on specific research questions, and I plan on being heavily involved in both this process and the actual research.

My graduate work focused on the development of BARCAST, a "Bayesian Algorithm for Reconstructing Climate Anomalies in Space and Time." As currently construed, this algorithm makes the simplest set of assumptions that are at all reasonable. The submitted manuscript that describes the algorithm (see section 6, below) includes a list of possible generalizations, and I would like to implement some of these over the next year. In particular, I currently make use of an isotropic, exponentially decaying spatial covariance form, and assume that each proxy observation is linearly related to the local (in space and time) value of the true field. Both of these assumptions can be generalized, and doing so will allow for the inclusion, for example, of proxy observations that represent some average of a climate field in space and time. A functioning algorithm that can combine proxies that relate to the underlying climate field on different spatial and temporal scales would be of broad interest to the paleoclimate community.
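To make the isotropic, exponentially decaying spatial covariance form concrete, here is a minimal Python sketch. It is not the BARCAST implementation: the function names, the parameters `sigma2` and `phi`, the use of great-circle distance, and the site coordinates are all assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed for this sketch)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    # Haversine formula; min() guards against rounding slightly above 1.
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def exp_covariance(dist_km, sigma2=1.0, phi=1.0 / 1000.0):
    """Isotropic exponential covariance: depends on distance only, sigma2 * exp(-phi * d)."""
    return sigma2 * math.exp(-phi * dist_km)


def covariance_matrix(sites, sigma2=1.0, phi=1.0 / 1000.0):
    """Covariance matrix for a list of (lat, lon) sites."""
    return [
        [exp_covariance(great_circle_km(*a, *b), sigma2, phi) for b in sites]
        for a in sites
    ]


# Hypothetical high-latitude sites as (lat, lon) in degrees.
sites = [(60.0, 0.0), (60.0, 10.0), (70.0, 100.0)]
cov = covariance_matrix(sites)
```

Under these assumptions the covariance between two sites depends only on their separation, which is what "isotropic" means here; relaxing that assumption would require a different covariance function.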

5b. Working Group II : Spatial Extremes

Special Tasks for Working Group: None.

Presentations to Working Group: Nothing planned at this time.

Research Area – Plans: I've been attending the meetings for this working group due to the importance of extreme events in the science of climate change. The frequencies of extreme temperature and precipitation events, and of extreme storms, are all conjectured to increase as greenhouse gas concentrations continue to rise. The notion of space-time extremes is therefore relevant to my own interests in studying the climate of the past as well as the current changes in the climate system. My hope is that I learn something relevant to these problems in this working group. So far, however, the meetings have been rather theoretical, and lacking a strong (actually, any) background in the statistics of extremes, I've felt rather lost. My hope is that at some point the group deals with some practical applications, which will ground some of the theoretical notions in reality.

5c. Working Group III : Computation, Visualization, and Dimension Reduction in Spatio-Temporal Modeling

Special Tasks for Working Group: None.

Presentations to Working Group: Nothing planned at this time.

Research Area – Plans: Paleoclimate data sets are large, in both space and time. A temperature reconstruction at a 5 by 5 degree resolution over the northern hemisphere involves 1296 spatial locations, and many hundreds to several thousand time points. The computational challenges posed by these data sets are considerable, and I hope to learn techniques, tricks, and insights for dealing with large space-time data sets. So far, the working group has lacked a cohesive focus, perhaps due to the large number of people involved and the broad array of subjects covered in presentations made at the group meetings. When the group begins to lay down more specific problems and challenges, and perhaps defines subgroups to investigate particular topics, I may become more involved.
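The figure of 1296 spatial locations quoted above follows from simple arithmetic, sketched below in Python. The function name and the 600-year annual record length are assumptions for illustration only.

```python
def grid_cell_count(resolution_deg, lat_min=0.0, lat_max=90.0,
                    lon_min=-180.0, lon_max=180.0):
    """Number of cells in a regular latitude-longitude grid over the given box."""
    n_lat = round((lat_max - lat_min) / resolution_deg)  # latitude bands
    n_lon = round((lon_max - lon_min) / resolution_deg)  # longitude bands
    return n_lat * n_lon


# Northern hemisphere at 5 x 5 degrees: 18 latitude bands x 72 longitude bands.
n_space = grid_cell_count(5.0)

# Hypothetical annually resolved record of 600 years: one value per cell per year.
n_values_600yr = n_space * 600
```

Even this modest hypothetical record length gives several hundred thousand space-time values, which is the scale behind the computational concerns in the paragraph above.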

With regard to the working groups in general, my goals are practical: to advance my own research objectives, to learn more statistics, and to interest more statisticians in the science of climate change. That said, I am, of course, more than open to working on new problems that might fall outside of my current research plans, provided they are motivated by interesting real-world applications. One difference I have noticed between the earth sciences and SAMSI is that research talks in the earth sciences are almost always motivated by a particular real-world problem, and the focus of the talk will be on advancing knowledge with regard to that problem. Theoretical developments and the fitting of complex models are considered steps towards a solution, rather than the main objects of intellectual interest. At SAMSI, I have seen a number of talks that are very well motivated by real-world applications (monitoring fish stocks in the Missouri river, tsunami prediction, ...), but which focus on theoretical developments, the construction of a sophisticated data model, or computational tricks that allow the analysis to be conducted. I have been somewhat alarmed at the number of times these talks have not come around full circle, to explain how the theory feeds back to advance understanding of the motivating problem. Perhaps this is a difference in academic cultures, whereby the earth science community sees the practical problem as of greatest interest, while the math/stats community often (it seems) considers the model building and theoretical developments to be of greatest interest. I recognize the importance of theoretical work, but am most interested in problems firmly grounded in practical, real-world applications, preferably real-world applications that have some form of societal relevance.

While these remarks are perhaps somewhat off topic, I include them in this research proposal as they provide background information about my own approach to research. The 'A' in SAMSI can just as well apply to the 'S' as to the 'M,' and that has certainly been, and will likely continue to be, the case in my own work.

6. Other Research

Work on Papers from Ph.D. Research:

Submitted manuscripts:

Tingley, Martin P. and Peter Huybers. A Bayesian Algorithm for Reconstructing Climate Anomalies in Space and Time. Part 1: Development and applications to paleoclimate reconstruction problems.

Tingley, Martin P. and Peter Huybers. A Bayesian Algorithm for Reconstructing Climate Anomalies in Space and Time. Part 2: Behavior and comparison with the regularized expectation-maximization algorithm.

Tingley, Martin P. and Peter Huybers. The spatial mean and dispersion of surface temperatures over the last 1200 years: warm intervals are also variable intervals.

The first two papers are in revision at Journal of Climate. We are in the process of responding to the second round of reviews, and the manuscripts will be resubmitted shortly. We are confident they will be accepted in short order.

The third paper, which concerns a project largely unrelated to the other papers and my main line of research, was submitted to Climatic Change. The review process took well over a year, and as a result, the data sets used in the paper are somewhat out of date. The reviewers have asked for significant changes to the data analysis we present, as well as major changes to the structure and focus of the paper. Dealing with these reviews will be a major effort, and while I do plan on modifying the paper and resubmitting, doing so is a secondary priority.

The final chapter of my thesis presented an analysis of a high northern latitudes multi-proxy temperature data set, which extends back to 1400. I have been talking about this work for a few months now, and writing it up for publication is a high priority. My collaborators on the project (Peter Huybers and Konrad Hughen) feel that we should aim for a high-profile publication, the rationale being that the analysis is the longest temperature reconstruction for the region that is both spatially complete and annually resolved, and it is the first practical application of a new approach (BARCAST) to reconstructing past climate. Our plan is to submit to Science, and if we are rejected, to Nature and then Geophysical Research Letters.

Presentations of Other Research, Continuing Collaborations while at SAMSI:

Here is a list of non-SAMSI research activities I have planned for the coming academic year.

- I gave the weekly seminar at UNC's Department of Geological Sciences on 15 October 2009.

- I will be attending a symposium on Arctic climate and meeting with collaborators (Peter Huybers, Konrad Hughen) at Harvard University, 9-11 November 2009.

- I will be visiting NCAR on 9-10 December 2009, to meet with Doug Nychka, Stephen Sain, Caspar Ammann and others about the work I will be conducting there next year.

- AGU, 14-18 December 2009. The format of my presentation, entitled "High northern latitude temperature extremes, 1400-1999," has yet to be assigned.

- I will be meeting with Michael Mann, Scott Rutherford, and other members of their research team at AGU, to discuss possible collaborations.

- Tentatively, I will be speaking at Lamont-Doherty Earth Observatory of Columbia University sometime in the spring, and exploring possible collaborations with a number of scientists there, including Ed Cook, Kevin Anchukaitis, and Jason Smerdon. That group has also been interacting with Andrew Gelman, so I am excited about possible research opportunities at the intersection of statistics and the earth sciences.

Other Research started or continued at SAMSI:

One of the recurring themes in the continuous space-time processes class being taught at SAMSI this semester is the inadequacy of separable models of space-time covariance structures in real data applications. In my own work on paleoclimate reconstructions, I assume a separable covariance form, and most other approaches do not even consider temporal autocorrelation. In short, the statistical techniques used to reconstruct past climate are primitive relative to the techniques discussed in class. At some point, it seems likely that the paleoclimate community will adopt more sophisticated statistical models. I am interested in starting a simple project about the issue of separability in the context of paleoclimate reconstructions.

I have no concrete plans at this time, but am thinking of something along the following lines. First, I am curious to see if the instrumental and proxy data sets I've been using display non-separable space-time covariance structures. I believe Alan Gelfand mentioned that tests for separability have been developed, so I would have to learn about those. If the data sets are separable, then this is an interesting result, as it would show that relatively simple statistical models are appropriate for the analysis. I suspect, however, that the tests will show that the data sets display non-separable space-time covariance. This is also an interesting result, as it shows that there is still room for improvement in the climate reconstruction business, and that current statistical models are not capturing the richness of the underlying space-time field. In addition, I am curious to see how BARCAST, my own climate reconstruction algorithm, which assumes separability, performs on decidedly non-separable data sets. The results of this research would (I think) form the basis of a nice paper, of interest to those who think about how to reconstruct the past climate.
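As a small illustration of what separability means, the Python sketch below contrasts a separable space-time covariance, which factors as C(h, u) = C_space(h) * C_time(u), with one hypothetical non-separable alternative. It is not one of the statistical tests for separability referred to above, and it is not taken from BARCAST; the function names, covariance forms, and scale parameters are assumptions for illustration.

```python
import math


def exp_cov(lag, scale):
    """One-dimensional exponential correlation function."""
    return math.exp(-abs(lag) / scale)


def separable_cov(h, u, space_scale=1000.0, time_scale=2.0):
    """Separable model: product of a purely spatial and a purely temporal term."""
    return exp_cov(h, space_scale) * exp_cov(u, time_scale)


def nonseparable_cov(h, u, space_scale=1000.0, time_scale=2.0):
    """Hypothetical non-separable model: space and time lags interact in one norm."""
    return math.exp(-math.sqrt((h / space_scale) ** 2 + (u / time_scale) ** 2))


def separability_gap(cov, h, u):
    """C(h,u)*C(0,0) - C(h,0)*C(0,u); this is zero whenever cov is separable."""
    return cov(h, u) * cov(0.0, 0.0) - cov(h, 0.0) * cov(0.0, u)


# Spatial lag of 500 km and temporal lag of 1 year (illustrative values).
gap_sep = separability_gap(separable_cov, 500.0, 1.0)
gap_nonsep = separability_gap(nonseparable_cov, 500.0, 1.0)
```

For the separable model the gap vanishes at every pair of lags, while for the non-separable example it does not; a test on real data would have to assess a quantity of this kind against sampling variability.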

Bala and I are also in the initial stages of defining a project we can work on together.

See also section 5a.

SAMSI Postdoctoral Fellow Research Plan

Date:

1. Name: Xueying Wang

2. Ph.D. Program

University & Department: The Ohio State University, Department of Mathematics
Dissertation Advisor: Dr. David Terman
Year of Ph.D.: 2009

3. SAMSI Research

SAMSI Program Title: Stochastic Dynamics
SAMSI Research Mentor:
SAMSI Administrative Mentor: Dr. Pierre Gremaud

4a. SAMSI Activities

Course(s) (fall & spring): Stochastic Dynamics: Theory, Modeling, and Computation
Workshops Attended (and Workshop Support Tasks): Opening workshop for the 2009-2010 program on Stochastic Dynamics
Postdoc-Grad Student Seminar – Presentation(s):
Undergraduate Workshop(s) – Participation (specifics to be added later):

4b. Other Activities (e.g., teaching)

5a. Working Group I

Special Tasks for Working Group: To understand network dynamics for stochastic systems.

Presentations to Working Group:

Research Area – Plans: Avanti Athreya, John McSweeney and I plan to develop stochastic compartmental models to study the epidemiology of infectious diseases under the supervision of Dr. Cindy Greenwood and Dr. Peter Mucha.

5b. Working Group II

Special Tasks for Working Group: To characterize the qualitative behavior of stochastic dynamical systems.

Presentations to Working Group: A sample path approach to stochastic resonance (based on Berglund and Gentz's work on stochastic resonance).

Research Area – Plans: Avanti Athreya and I are working on stochastic resonance in the Morris-Lecar model under the supervision of Dr. Cindy Greenwood. We would like to understand noise-induced resonance by comparing the firing patterns of the FitzHugh-Nagumo model with those of the Morris-Lecar model.

5c. Working Group III (if appropriate)

Special Tasks for Working Group:
Presentations to Working Group:
Research Area – Plans:

6. Other Research

Work on Papers from Ph.D. Research: I am writing a paper based on my Ph.D. thesis.
Other Research started or continued at SAMSI:
Continuing Collaborations while at SAMSI:
Presentations of Other Research:

Research Progress Report & SAMSI Program Final Report

Date:

7a-z. Research Contributions – Current Projects (grouped by Working Group)

Research Project Title:
Collaborator(s) & Mentor(s):
Specific Goals & Accomplishments (results):
Research Contributions (publication submissions, articles in preparation, etc.):
Presentations outside SAMSI (including invitations for future talks):

8. Future Research Plans (after completion of SAMSI Program)

Research Area – Plans:
Continuing Collaborations (if appropriate):
Presentations outside SAMSI:

SAMSI Postdoctoral Fellow Research Plan

Date:

1. Name: Jun Zhang

2. Ph.D. Program

University & Department: University of Wisconsin-Madison, Department of Statistics
Dissertation Advisor: Murray Clayton
Year of Ph.D.: 2009

3. SAMSI Research

SAMSI Program Title: Space-time Analysis for Environmental Mapping, Epidemiology and Climate Change
SAMSI Research Mentor: Richard Smith
SAMSI Administrative Mentor: Nell Sedransk

4a. SAMSI Activities

Course(s) (fall & spring): 1. Theory of Continuous Space and Space-Time Processes 2. Spatial Epidemiology
Workshops Attended (and Workshop Support Tasks): Opening workshop for the spatial program, Sept. 13-16, 2009.
Postdoc-Grad Student Seminar – Presentation(s): Oct. 26, 2009, on regression models for spatial images.
Undergraduate Workshop(s) – Participation (specifics to be added later): Oct. 30-31, 2009, Matlab tutorial lab.

4b. Other Activities (e.g., teaching)

None for this semester.

5a. Working Group I: Spatial extremes

Special Tasks for Working Group: None.

Presentations to Working Group: In discussion with the group leader.

Research Area – Plans: Possibly some topics related to max-stable processes; I am discussing the details with Prof. Smith.

5b. Working Group II: Fundamentals of spatial modeling

Special Tasks for Working Group: Webmaster for this group.

Presentations to Working Group: Nov. 11, on regression models for spatial images.

Research Area – Plans: So far the main activities are presentations about previous research or paper reviews by group members.

5c. Working Group III (if appropriate): Computation, Visualization and Dimension Reduction in Spatial-Temporal Modeling

Special Tasks for Working Group: None.

Presentations to Working Group: In discussion with the group leader.

Research Area – Plans: We haven't settled on any particular topics yet. So far, people have mainly given presentations about their previous research. I have an idea for dimension reduction with 3-dimensional data cubes that include spatial-temporal data. Currently I am experimenting with toy datasets, and the idea is still at an early stage.

6. Other Research

Work on Papers from Ph.D. Research: 1. Working on the final revision of a paper on regression models for spatial images. This paper was submitted to JABES earlier this year, and the current revision is a resubmission. 2. Will work on a new paper on how to deal with missing data when fitting regression models for spatial images.

Other Research started or continued at SAMSI: 1. Collaboration with Lin Xie (Dept. of EE, University of Wisconsin-Madison) on path segmentation identification for post-silicon time delay in chip design. The time delay data is a spatial dataset and can become very large. It is a challenging application problem.

Continuing Collaborations while at SAMSI: I am still collaborating with my Ph.D. adviser, Murray Clayton, on my second paper about regression models for spatial images.

Presentations of Other Research: I will think about this once I have worked on some new topics at SAMSI.

Research Progress Report & SAMSI Program Final Report

Date:

7a-z. Research Contributions – Current Projects (grouped by Working Group)

Research Project Title:
Collaborator(s) & Mentor(s):
Specific Goals & Accomplishments (results):
Research Contributions (publication submissions, articles in preparation, etc.):
Presentations outside SAMSI (including invitations for future talks):

8. Future Research Plans (after completion of SAMSI Program)

Research Area – Plans:
Continuing Collaborations (if appropriate):
Presentations outside SAMSI:

5. Postdoctoral Fellow Experience Evaluations

Avanti Athreya

1. Program Involvement:

Which SAMSI program(s) have you been involved with and at what level(s)? (e.g., “been doing active research on …” or “just went to a tutorial/workshop to learn about that area which is new to me”)

Stochastic Dynamics, with active research in molecular motors, qualitative behavior of SDEs, and neuron models.

2. Interactions with Other Institutions:

Describe your interactions with other institutions while at SAMSI. (e.g., a Triangle university, NISS or CRSC)

Gave talks at UNC and NC State; mentored an undergraduate PRUV fellow at Duke on a project in which we modeled escape times from a double-well potential with small noise and small deterministic oscillations, and level crossing from a breathing potential with noise.

3. High Points at SAMSI:

What have been the high points of your SAMSI experience?

Interacting with other faculty and postdocs in the working groups; it’s great that SAMSI brings together mathematicians and statisticians from all three major universities.

4. Suggestions for Improvement:

How could your SAMSI experience have been improved?

Extend the postdoctoral appointments by one additional year so that postdocs have an extra year before they apply for jobs; encourage more junior faculty (assistant professors) to participate---they face some of the same time and career pressures as the new postdocs, and it was very helpful to be able to work with them.

5. Mentoring:

SAMSI aims to provide solid mentoring of Postdocs. How successful has this been in your case?

I would have benefited from having a few more small-scale, soluble problems to address right away, just from the perspective of going on the job market so soon. But I realize that this isn’t always realistic, and I appreciated the scientific mentoring and constant encouragement I received from Jonathan Mattingly, Cindy Greenwood, Pete Kramer, and John Fricks. Also, I received helpful mathematical input, mentoring, and advice from the more senior postdocs (Scott McKinley, Oliver Diaz, and Matthias Heymann at Duke). I think it’s great to have this mix of senior faculty, junior faculty, and more advanced postdocs in the same place.

6. SAMSI Benefits for the Future:

Has your SAMSI experience met your expectations in preparing you for your next position? Are there (new) benefits that you now perceive coming from your SAMSI experience that are relevant to your career in the future?

Yes---I’ve been exposed to multiple avenues of research, particularly in more applied areas with which I was unfamiliar before coming to SAMSI. Also, by mentoring undergraduates during the workshops and through the PRUV program at Duke, I’ve learned quite a bit about appropriate problem selection and the distillation of technical ideas to more manageable levels for undergraduate teaching.

7. SAMSI in contrast with University Setting:

How do you think your SAMSI experience compares to what you might have encountered in a typical university setting?

I think it was better in many ways, because I was exposed to a wide variety of problems and got to interact with researchers from multiple institutions. Also, the opportunities to speak at workshops, participate in working groups, and lead discussions on specific problems were invaluable.

8. SAMSI in comparison to Other Experiences:

If you have had experience in an industrial, national or other lab setting, how would you compare that experience with your SAMSI experience?

9. Other Research while at SAMSI:

If you have spent considerable time during your appointment with some other research group, please comment on the benefits and drawbacks of this experience. (e.g., if your appointment was extended via an appointment at NISS, CRSC or elsewhere)

10. Other Issues:

Are there other issues or concerns you would like to bring up?

Julien Cornebise

1. Program Involvement:

Part of the Tracking and Population Monte Carlo working groups of the Sequential Monte Carlo program. See details in the mid-program report.

2. Interactions with Other Institutions:

Collaborations are ongoing with Duke University, with my former co-author from Lund University, and with my former Ph.D. advisor from Télécom ParisTech, France. See details in the mid-program report.

Past interactions include stays at the computer science engineering school ESIEA (5 years), Université Paris 6 (4 years), the national engineering school Télécom ParisTech (3 years), a private pharmaceutical company’s statistical company (6 months), and Lund University (1 month).

3. High Points at SAMSI:

I am truly amazed by the incredible opportunities to meet and interact with most of the best researchers in the field. People who were so far only (somewhat mythic) names on articles are now flesh-and-blood people, researchers, colleagues, which is a tremendous transition from graduate studies to post-doctoral research. From a scientific point of view, these interactions bring a deep re-evaluation of my earlier work, fitting it into a much broader view of the field, stressing its advantages, and showing how it can be extended to neighboring areas. From a more human point of view, they contribute to turning a finishing student into a full and mature (though young) member of the scientific community.

An especially remarkable high point is the consideration given to the post-docs as researchers. However rich and beneficial the experience as a grad student may have been, the role of a post-doc is definitely that of a full researcher, with the associated responsibilities (in terms of research, organization of working groups, and advice to graduate students) and the associated consideration and status. I cannot stress enough the extremely positive impact this change of perspective had on the very nature of my work and my approach to research. However personal this might look, this newly acquired self-confidence, and the realization that, yes, I belong here, I belong to the research community, are something whose full importance can hardly be expressed. Besides, discussions with several other postdocs have led me to realize that this relief and the former doubts seem to be common among young doctors, which helps all the more to reduce them and move on to the next stage.

On a more “local” scale, but nevertheless so important, I must thank the whole SAMSI staff, and especially the “Fantastic Four”, “SAMSI’s angels”, Denise Auger, Rita Fortune, Sue McDonald, and Terri Nida, for their precious help, from the first pre-hiring interview to the settling of all the practical and administrative details upon arrival (from lodging to visa issues, all the more important to a newly arrived foreigner), and continuing through day-to-day life at SAMSI. Never, ever, did I meet such a dedicated, patient, helpful and welcoming team.

4. Suggestions for Improvement:

I hardly see any point that could be improved at SAMSI. Concerning the setting and means offered for doing research, it seems it could not be more perfect than it is.

The only point I can find concerns the computer resources. SAMSI is gifted with great hardware, the brand new computers are “computational beasts”, and the number of computers is striking for such an institute. However, it could really benefit from centralized network administration, which would first and foremost allow remote access, hence making the computers real work platforms that one could rely on from anywhere.

Please don’t get me wrong: the person currently responsible for IT, James Thomas, is extremely willing to help, and I wish to thank him here for his responsiveness and the help he provides with everyday problems, from the failing printer to Matlab installation, from the need for a new mouse to the addition of new software. James Thomas fulfills these IT needs with a dedication that commands admiration: I saw him staying until the middle of the night setting up the new computers!

My only suggestion regarding improvements would be to evaluate whether the joint administration of both the NISS and SAMSI networks can be handled by a single administrator.

Though this is probably not the kind of advice expected from a post-doctoral fellow, I would recommend considering doubling the resources allocated to this task: SAMSI’s potential here is great, but still dormant.

My experience so far (both as a user, and as a system administrator for several years in a professional context) is that network administrators always work as a team of at least two people, who thereby provide complementary skills and knowledge (network administration being an ever broader field), as well as an indispensable second opinion and differing (and therefore enriching!) points of view on crucial technical choices, especially for networks as sensitive as those of NISS and SAMSI were described to be at the last postdoctoral lunch.

In addition to the gain in security (whose value can hardly be measured) and efficiency, the cost would most likely not increase as much as expected, as the spending on a salary would be partly offset by reduced reliance on external maintenance and services.

5. Mentoring:

I would like to express here my gratitude for the great understanding shown by the SAMSI directorate, first and foremost by its director (and my administrative advisor) Jim Berger. SAMSI allowed exceptional measures to adapt the post-doctoral contract to the late schedule of my dissertation (whose completion was originally planned just before the beginning of the SMC program), and permitted me to be part of this great research opportunity while still finishing my dissertation.

I will never forget the open-mindedness I witnessed (and benefited from!) when discussing this matter with Jim Berger, efficiently comparing the possible ways to tackle it, and the strong will I perceived to find a solution optimal for both SAMSI and its postdoc. I was afraid that these delays (for which I take entire responsibility) might put an end to the warm welcome I had found at SAMSI. On the contrary, an arrangement was worked out under which I would finish my 3-month stay here, then go back to France for a couple of months (time to finish writing the manuscript while keeping up to date with SAMSI’s program), and then come back as a full-time postdoctoral fellow.

This generous and (from my point of view) ideal agreement was a tremendous incentive to wrap up the remaining writing as fast as possible, and to give my best to SAMSI, both while back in France, keeping up with the progress of the working groups, and, even more importantly, now that I am back and done with the dissertation.

For this, again, I would like to renew my thanks to Jim Berger and the directorate of SAMSI.

On the scientific side, the mentoring from Arnaud Doucet has been extremely beneficial. His long stay during the whole Fall of 2008 was the occasion for fascinating and extremely enriching conversations, as well as an opportunity to become a reviewer, for an article to whose editor he recommended me. Beyond that, his mentoring continues now that he is in Japan, by means of email conversations and his support for a grant application to join him at the Institute for Statistical Mathematics in Japan for a two-month stay next Fall.

6. SAMSI Benefits for the Future:

The benefits from my SAMSI experience have been outlined at length in the answers above, and can be summarized as triggering the key transformation from a finishing grad student into a full-time researcher and active member of the community.

7. SAMSI in contrast with University Setting:

In my opinion, SAMSI’s uniqueness resides in the way it relies on the post-docs as the coordinators of its day-to-day scientific life, from keeping the working groups coherent to serving as a welcoming “anchor point” for visiting researchers and professors. In a university setting, my guess is that this role would be conferred on local assistant professors rather than postdocs. This responsibility, however, is the key benefit of SAMSI, as it pushes toward greater dynamism and increased interaction with the members of the program.

8. SAMSI in comparison to Other Experiences:

I lack research experience outside universities to make a comparison, e.g. with private research institutes. However, I can compare with the typical French position taken when no post-doctoral research position is found, that of “Teaching and Research Assistant” (A.T.E.R.), which is a one-year, once-renewable contract with a full teaching load. It is commonly reported by young French researchers that the young A.T.E.R. is mainly filling the gaps in the teaching schedule, and that the only achievable research during this year is finishing some articles taken from the Ph.D. dissertation. I did not experience this firsthand, so it is to be taken with caution, keeping in mind that, happily, this is only a tendency to which many exceptions exist.

No need to stress how much this lies at the opposite extreme of SAMSI.

9. Other Research while at SAMSI:

As stated in the “Mentoring” section, during my first 3-month stay, I was both active in two working groups and working on finishing my Ph.D. dissertation. Now that I have been back for a month, and as is further detailed in my mid-program report, the only other research planned while at SAMSI consists of finishing two articles drawn from my dissertation. Besides, these two articles are deeply related to SAMSI’s SMC program, as my whole Ph.D. is about Adaptive SMC methods. Therefore, this “other research while at SAMSI” can still be seen as part of the working group activities.

Here again, the benefits of being at SAMSI are obvious, in terms of the scientific maturity in the field gained through the never-ending interactions occurring there.

10. Other Comments:

I can only renew the expression of my joy to be part of SAMSI. From a professional and

scientific perspective, it brings incredible assets and (as far as I can judge) is a formidable booster

for a research career. From a human perspective, the dynamism and the energy flowing through SAMSI

are a precious gift. I already suspect that finding such a highly stimulating environment again will

not be easy!

Emily Kang

1. Program Involvement:

Which SAMSI program(s) have you been involved with and at what level(s)? (e.g., “been doing

active research on …” or “just went to a tutorial/workshop to learn about that area which is new to

me”)

I have been involved in the 2009 program on Space-Time Analysis for Environmental Mapping,

Epidemiology and Climate Change. I have been working with Johan Harlim on multiscale

computation strategies in stochastic spatio-temporal systems, which is new to me. Meanwhile, I am

involved in a subgroup of one of the working groups, where we are analyzing ash data from the Iceland

volcano eruptions.

2. Interactions with Other Institutions:

Describe your interactions with other institutions while at SAMSI. (e.g., a Triangle university, NISS

or CRSC)

I gave a presentation at the Bayesian Statistics Working Group in the Department of Statistics

at NC State University. I taught St371 at NC State University in the first summer session.

3. High Points at SAMSI:

What have been the high points of your SAMSI experience?

Opportunities to get to know and collaborate with people inside and outside my field.

4. Suggestions for Improvement:

How could your SAMSI experience have been improved?

I would prefer to be given one specific project to work on, while also being given enough time

and flexibility for our own interests.

5. Mentoring:

SAMSI aims to provide solid mentoring of Postdocs. How successful has this been in your case?

Good.

6. SAMSI Benefits for the Future:

Elizabeth Mannshardt Shamseldin

1. Program Involvement:

Which SAMSI program(s) have you been involved with and at what level(s)? (e.g., “been doing

active research on …” or “just went to a tutorial/workshop to learn about that area which is new

to me”)

I have been actively involved in the Space-Time Analysis for Environmental Mapping,

Epidemiology and Climate Change program for 2009-2010. I have actively participated in two

working groups, Spatial Extremes and Paleoclimate, as well as two sub-groups, Paleo Extremes

and Extreme Attribution. I presented a poster, “Severe Weather Under a Changing Climate: Large

Scale Indicators of Extreme Events”, at the Opening Workshop Poster Session and gave a

corresponding presentation on 10/19 for the SAMSI Postdoc seminar series. I also presented

“Extremes in Paleoclimate Proxy Data” on 2/15 for the Postdoc seminar. I will be presenting this

work at the International Meetings for Statistics in Climatology in Edinburgh in July, as well as

at JSM in August. I plan to participate in the SAMSI program transition workshop in October.

2. Interactions with Other Institutions:

Describe your interactions with other institutions while at SAMSI. (e.g., a Triangle university,

NISS or CRSC)

For the last two years, I have had a joint appointment as a Visiting Assistant Professor at

Duke University. In the spring (2010), I taught Sta10 – “Stats And Quantitative Literacy”, a

concept-based introduction to statistics course for non-science majors at Duke. I also served as

the adviser for an undergraduate statistics major at Duke who wrote a senior thesis from Fall 2009 to

Spring 2010. I worked with the student to develop an appropriate project in the area of extreme

value theory. It will appear as a technical report.

Has your SAMSI experience met your expectations in preparing you for your next position? Are

there (new) benefits that you now perceive coming from your SAMSI experience that are relevant to

your career in the future?

Yes. I started two new research projects and learned a great deal in several new fields.

7. SAMSI in contrast with University Setting:

How do you think your SAMSI experience compares to what you might have encountered in a

typical university setting?

I think that, compared to junior faculty, SAMSI postdocs may have more time for their own

research, and spend much less time on teaching or other service commitments.

8. SAMSI in comparison to Other Experiences:

If you have had experience in an industrial, national or other lab setting, how would you compare that

experience with your SAMSI experience?

Not applicable

9. Other Research while at SAMSI:

If you have spent considerable time during your appointment with some other research group, please

comment on the benefits and drawbacks of this experience. (e.g., if your appointment was extended

via an appointment at NISS, CRSC or elsewhere)

Not applicable

10. Other Issues:

Are there other issues or concerns you would like to bring up?

None

In Spring of 2009, I proposed, developed, and taught a special topics graduate course in

“Extreme Value Theory and Applications with Bayesian Methods” at Duke University.

In July of 2009, I served as a faculty mentor for the Industrial Mathematical and Statistical

Modeling (IMSM) Workshop for graduate students at North Carolina State University. The

group deliverable was a paper for the workshop proceedings, which developed into a

paper that is to appear in Environmetrics. I am looking forward to participating as a faculty

mentor again this coming year (July 19 – 27, 2010), and have also taken an active role in

recruiting problem presenters from industry.

I peer-reviewed two papers for a prominent journal, and also submitted an extensive

review for an NSF grant/request for funding for a $1.5 million project.

I also have continued to maintain contacts with researchers from NCAR, NOAA and

other institutes. I am finishing up the project with Eric Gilleland from NCAR, and will be

starting a project with Harold Brooks from the National Severe Storms Laboratory in conjunction

with the IMSM workshop.

3. High Points at SAMSI:

What have been the high points of your SAMSI experience?

The conferences and meetings that I am able to participate in due to my affiliation with SAMSI

and SAMSI’s support are definitely high points. To have exposure to the current research in my

area is extremely enriching, and the contacts made through these meetings – as well as through

the SAMSI working groups themselves – are invaluable for future research as well as career

prospects.

4. Suggestions for Improvement:

How could your SAMSI experience have been improved?

I thought the interactions among SAMSI members my second year here (2009-10) were

significantly different from those in 2008/9. My first year at SAMSI, substantial effort was

made (and was much appreciated) to have social interactions – dinners, lunches, etc – which

significantly added to the congenial atmosphere at SAMSI. I feel that atmosphere is a vital part

of what makes SAMSI so successful. This was lacking a bit the second year, possibly simply

due to personnel changes, with incoming staff not realizing that this had taken place previously.

5. Mentoring:

SAMSI aims to provide solid mentoring of Postdocs. How successful has this been in your case?

I personally did not have a postdoc adviser at SAMSI until my second year. This was in part due

to my involvement in the SAMSI program in 2008-2009 being minimal (as it was not my area),

but also in part due to not knowing whom to ask, etc. My current adviser, Peter Craigmile, is

fantastic and has provided many helpful comments and much guidance.

However, it was a bit tough to navigate on my own for that first year, while my colleagues

seemed to have someone to turn to for research advice, etc. Nell was very helpful and

approachable with regards to professional advice, but it is also useful to have someone in your

field.

In addition, in 2008-2009 the postdocs had regular question/answer “topics” lunches with the

SAMSI directorate. This was very helpful in many ways. First, it enabled the postdocs to feel

comfortable with the directorate and those they might be working with while at SAMSI. Second,

the topics were timely, useful subjects on which the directorate had very useful advice (CVs, job

interviews, how to publish, authorship, research ethics, etc.). Third, this also served as a regular

“touch-base” session for everyone to briefly discuss with “those in charge” where they were on

any particular issues.

6. SAMSI Benefits for the Future:

Has your SAMSI experience met your expectations in preparing you for your next position? Are

there (new) benefits that you now perceive coming from your SAMSI experience that are

relevant to your career in the future?

I believe that learning “how to research” is extremely important, and is something that is

provided by the SAMSI experience. You learn how to work with others, what projects may or

may not be suitable for a future publication, how to start/develop/hopefully complete a

project, etc. Postdocs often present their research several times - at meetings, conferences,

seminars - and in several different formats – posters, 15 min talks, hour sessions, etc – which is

very useful in becoming comfortable presenting as well as explaining your own research.

In addition, I peer-reviewed two papers and submitted an NSF grant review while at

SAMSI, which certainly is good experience in preparation for a future academic position.

7. SAMSI in contrast with University Setting:

How do you think your SAMSI experience compares to what you might have encountered in a

typical university setting?

I think most SAMSI postdocs have a different experience than in the university setting for two

main reasons. One, university positions might involve teaching responsibilities – as mine did –

which takes away from available research time. (Teaching does provide valuable experience in

and of itself however.) Two, university postdocs are often off by themselves, or perhaps with

one or two others under the same professor/lab. However, at SAMSI, the postdocs are offered

the opportunity for collaborative work – with other postdocs as well as with experienced

professors and researchers. This is invaluable in terms of the learning experience and getting a

research base established.

8. SAMSI in comparison to Other Experiences:

If you have had experience in an industrial, national or other lab setting, how would you compare

that experience with your SAMSI experience?

I think the SAMSI experience is much more collaborative and supportive than the typical

position. Most certainly when compared to an industry position, where the “bottom-line” or

“client satisfaction” is often the main goal over novel research and applications.

9. Other Research while at SAMSI:

If you have spent considerable time during your appointment with some other research group,

please comment on the benefits and drawbacks of this experience. (e.g., if your appointment was

extended via an appointment at NISS, CRSC or elsewhere)

I have not extended my position within SAMSI to another research group; however, I

have spent some of my time at SAMSI on collaborative projects with people from other

institutions. I am finishing up the project with Eric Gilleland from NCAR, and through IMSM

this July will be starting a project with Harold Brooks from the National Severe Storms Laboratory. I am

continuing next year in my position as a Visiting Assistant Professor at Duke.

10. Other Issues:

Are there other issues or concerns you would like to bring up?

The snacks in the break-room should have more chocolate items.

John McSweeney

1. Program Involvement:

Which SAMSI program(s) have you been involved with and at what level(s)? (e.g., “been doing

active research on …” or “just went to a tutorial/workshop to learn about that area which is new

to me”)

(a) Network Dynamics Group: I attended all the weekly meetings and was actively involved in

projects on epidemic spreads on networks. Specifically, I am working with Bruce Rogers now to

put together a paper on the analysis of SIS epidemics across a hierarchically structured

population. My talk and lab at the February undergraduate workshop were on these topics as

well.

(b) Dynamics of Biological Systems Group: I looked at the issue of reaction kinetics in a cellular

environment, things like determining applicability of kinetic models, chemical reaction systems

as applied to developmental biology, multiscale analysis of reactions. The most promising

direction was a problem proposed by R. Yamada which he, I, J. Fricks, and B. Joshi worked on.

It has to do with finding conditions in a gene expression network that will produce oscillations

that could potentially explain observed oscillations in nature such as circadian rhythms. We are

still working on this and I gave a talk at the 2010 SIAM Life Sciences conference about our

initial investigations into this.

2. Interactions with Other Institutions:

Describe your interactions with other institutions while at SAMSI. (e.g., a Triangle university,

NISS or CRSC)

I spent some time at Duke in 09-10, during which I sat in on some lectures (J. Mattingly's class

in the fall, S. McKinley's in the Winter). I also attended two instances of the social networks

weekly seminar.

I attended the 2010 Graduate Student Probability Conference in April/May, which was held at

Duke this year, and I gave a talk related to my work at SAMSI on “Closure approximations for

epidemics on networks”.

I attended the UNC-Chapel Hill applied math seminar once as well.

3. High Points at SAMSI:

What have been the high points of your SAMSI experience?

Having a constant stream of people coming in and out, but who stay long enough (months, as

opposed to just days) to be able to make meaningful progress on a project from scratch. Being

free from teaching to be able to focus on research 100%.

The staff was very helpful as well with all issues practical and logistical.

4. Suggestions for Improvement:

How could your SAMSI experience have been improved?

The working groups could have been given a little more direction. As a (very) junior researcher

in a somewhat new field, I don't mind being told what to do to a certain extent. It's very hard to

come up with good problems in an unfamiliar field, and it seems like while I learned a great deal

in everything I did, a lot of the work I put in didn't get me much closer to getting a paper

published, which of course has to be the ultimate byproduct of such work.

5. Mentoring:

SAMSI aims to provide solid mentoring of Postdocs. How successful has this been in your case?

Overall I thought the mentoring was very good; the mentors were enthusiastic and cared about

the postdocs. I think many of the postdocs could have used more direction/attention from faculty.

Although we were all assigned an academic mentor, it was not always easy to know where to go

for guidance: often the mentors were too busy with other things, or spent most of their time at

one of the local universities, or just had research interests that didn't line up particularly well. I

don't think it would hurt if the mentors delegated or gave specific tasks more (e.g. “read this

paper and tell me about it in a few days”), especially with postdocs who might appear to be

lacking direction on a particular project.

6. SAMSI Benefits for the Future:

Has your SAMSI experience met your expectations in preparing you for your next position? Are

there (new) benefits that you now perceive coming from your SAMSI experience that are

relevant to your career in the future?

Already I find I am able to get more out of conferences since there are so many people at them

that I may have met or at least heard of through SAMSI. It is too early to tell how important my

time at SAMSI will be in my current position because it's only just begun.

7. SAMSI in contrast with University Setting:

How do you think your SAMSI experience compares to what you might have encountered in a

typical university setting?

I think I got a decent amount of the university experience since I spent about 2 days a week at

Duke. It might have been nice to have the stability and continuity of having, say, a 3-year postdoc

at the same institution, which might have allowed me to form deeper professional relationships

but at the same time not exposed me to as many people. It's a question of quantity versus quality

I guess.

8. SAMSI in comparison to Other Experiences:

If you have had experience in an industrial, national or other lab setting, how would you compare

that experience with your SAMSI experience?

N/A

9. Other Research while at SAMSI:

If you have spent considerable time during your appointment with some other research group,

please comment on the benefits and drawbacks of this experience. (e.g., if your appointment was

extended via an appointment at NISS, CRSC or elsewhere)

N/A

10. Other Issues:

Are there other issues or concerns you would like to bring up?

Bruce Rogers

1. Program Involvement:

Which SAMSI program(s) have you been involved with and at what level(s)? I have been doing

research associated with the Stochastic Dynamics program in both the Network Dynamics and

Data Assimilation working groups.

2. Interactions with Other Institutions:

Describe your interactions with other institutions while at SAMSI. (e.g., a Triangle university,

NISS or CRSC)

My pay goes through Duke University. They have a nice library. My mentor is at UNC, and

they have free wireless for visitors.

3. High Points at SAMSI:

What have been the high points of your SAMSI experience?

Working at SAMSI is great. I have the freedom to pursue my own research with the support of

many experts working in my areas of interest. Plus there are always statisticians around to

answer questions.

4. Suggestions for Improvement:

How could your SAMSI experience have been improved?

The working groups were not well organized at the beginning of the year. We wasted a lot of

time because there weren't any senior researchers with specific tasks to bring to the table. Also,

see #5.

5. Mentoring:

SAMSI aims to provide solid mentoring of Postdocs. How successful has this been in your case?

My Scientific Mentor, Peter Mucha, has been incredible. I met with my Administrative Mentor

only one time. I'm not sure what an Administrative Mentor is supposed to do, but I could always

use some advice about preparing for the academic job market.

6. SAMSI Benefits for the Future:

Has your SAMSI experience met your expectations in preparing you for your next position? Are

there (new) benefits that you now perceive coming from your SAMSI experience that are

relevant to your career in the future?

My experience at SAMSI has more than met my expectations. Just having an additional year of

research experience is incredibly valuable. Also, the post-doc seminar has been great for helping

me improve my presentations.

7. SAMSI in contrast with University Setting:

How do you think your SAMSI experience compares to what you might have encountered in a

typical university setting?

There's certainly less time devoted to teaching at SAMSI, so I was able to devote more time to

research.

8. SAMSI in comparison to Other Experiences:

If you have had experience in an industrial, national or other lab setting, how would you compare

that experience with your SAMSI experience?

NA

9. Other Research while at SAMSI:

If you have spent considerable time during your appointment with some other research group,

please comment on the benefits and drawbacks of this experience. (e.g., if your appointment was

extended via an appointment at NISS, CRSC or elsewhere)

NA

10. Other Issues:

Are there other issues or concerns you would like to bring up?

No.

Benjamin Shaby

1. Program Involvement:

Which SAMSI program(s) have you been involved with and at what level(s)? (e.g., “been doing

active research on …” or “just went to a tutorial/workshop to learn about that area which is new

to me”)

I have been actively participating and contributing to the Spatial program.

2. Interactions with Other Institutions:

Describe your interactions with other institutions while at SAMSI. (e.g., a Triangle university,

NISS or CRSC)

This has been very limited. I have been to Duke only to sign employment papers and get a book

from the library. I've been to NCSU once or twice for meetings. I've spent more time at

Berkeley and UCSF working with collaborators there on projects that were not initiated through

SAMSI.

3. High Points at SAMSI:

What have been the high points of your SAMSI experience?

The workshops are great. I have been impressed with the number and quality of participants.

The best part of the experience, I think, has been getting to spend time with the visitors.

Interacting with them has really introduced me to a ton of new perspectives. In addition, it's nice

having contacts all over the world as a result of the visitor program.

4. Suggestions for Improvement:

How could your SAMSI experience have been improved?

Mentorship has been a real problem. Visitors are great, but those relationships are not conducive

to a longer-term mentorship model.

Working groups seem kind of scattered. I'm still not really clear on what they are supposed to

achieve. Two of the three in which I participated were run more like seminars than meetings

intended to produce actual work. That was frustrating. The third, I feel, was held together only

by Richard Smith's exceptional leadership. It seems that without someone really directing

traffic, the groups flounder. Maybe a problem is that they're too big? Or maybe there is not a

problem, and they are simply not intended to produce any actual research.

5. Mentoring:

SAMSI aims to provide solid mentoring of Postdocs. How successful has this been in your case?

Completely unsuccessful. I am not alone in this regard either. I think a big part of the problem

is structural – with faculty at their universities and us at SAMSI, there are really barriers to

interaction. I feel like I have been operating entirely on self-guidance. It would have been

great to have had a senior faculty member to talk to on a regular basis. I have not met with my

official scientific mentor once since I've been here. I don't hold it against him or anything – I

think the relationship was never clearly defined, and he's very busy and located in a different zip

code. This does not make for a successful mentorship relationship.

6. SAMSI Benefits for the Future:

Has your SAMSI experience met your expectations in preparing you for your next position? Are

there (new) benefits that you now perceive coming from your SAMSI experience that are

relevant to your career in the future?

It's impossible to say really. I think the big benefit has been meeting so many smart people from

all over the place. The networking opportunities have been immense. On the other hand, the

mentorship vacuum has probably been to my detriment.

Something else to consider is that the openness of the work structure is great for incubating new ideas

and following interesting paths. The other side of the coin, however, is that the lack of focus

does not lead to immediate productivity as easily. So in the short term, the breadth of the

experience could make it harder to get a job (I have no new papers this year), even as it broadens

my longer-term horizons.

7. SAMSI in contrast with University Setting:

How do you think your SAMSI experience compares to what you might have encountered in a

typical university setting?

The lack of structure and the exposure to so many people are wonderful and exciting, but they

come at the expense of mentorship and focused research that one would expect working closely

with a single faculty member at a university.

8. SAMSI in comparison to Other Experiences:

If you have had experience in an industrial, national or other lab setting, how would you compare

that experience with your SAMSI experience?

I have worked in a lab setting, and it's so different that it's difficult to compare to SAMSI. The

lab seems to be more of a cohesive unit, with a single person supplying the overall vision. The

SAMSI model is totally decentralized. I am starting to think that the relative chaos of the

SAMSI setting will serve me better in the future.

9. Other Research while at SAMSI:

If you have spent considerable time during your appointment with some other research group,

please comment on the benefits and drawbacks of this experience. (e.g., if your appointment was

extended via an appointment at NISS, CRSC or elsewhere)

I have not.

10. Other Issues:

Are there other issues or concerns you would like to bring up?

Xueying Wang

1. Program Involvement:

Which SAMSI program(s) have you been involved with and at what level(s)? (e.g., “been doing

active research on …” or “just went to a tutorial/workshop to learn about that area which is new

to me”)

I have participated in the following workshops at SAMSI:

Opening Workshop of the Stochastic Dynamics program

Workshop on Self-Organization and Multi-Scale Mathematical Modeling of Active Biological

Systems

Workshop on theory and qualitative behavior of stochastic dynamics

Workshop on molecular motors, neuron models, and epidemics on networks

I am very interested in stochastic dynamics. In particular, I have been working on

stochastic neuroscience and epidemiology on networks and I am writing papers with my

collaborators in those areas.

2. Interactions with Other Institutions:

Describe your interactions with other institutions while at SAMSI. (e.g., a Triangle university,

NISS or CRSC)

N/A

3. High Points at SAMSI:

What have been the high points of your SAMSI experience?

SAMSI has built a great environment for graduate students, post-docs and faculty fellows to

communicate and to conduct mathematical, statistical and interdisciplinary research. The year-

long programs are fantastic. In particular, the workshops and group meetings for the stochastic

dynamics program have been quite successful in making interesting discoveries and significant

progress.

4. Suggestions for Improvement:

How could your SAMSI experience have been improved?

It would be really nice if the network of links to real applications could be broadened.

5. Mentoring:

SAMSI aims to provide solid mentoring of Postdocs. How successful has this been in your case?

It has worked well in my case. I have benefited a lot from both my administrative and scientific

mentoring.

6. SAMSI Benefits for the Future:

Has your SAMSI experience met your expectations in preparing you for your next position? Are

there (new) benefits that you now perceive coming from your SAMSI experience that are

relevant to your career in the future?

Yes. One of the most important things I have learned at SAMSI is to be able to do independent

research, which definitely helps me a lot with the transition from a graduate student to a

postdoctoral fellow.

7. SAMSI in contrast with University Setting:

How do you think your SAMSI experience compares to what you might have encountered in a

typical university setting?

It was really great. Only the scientific resources and the computer maintenance at SAMSI are

different from what I experienced at a university.

8. SAMSI in comparison to Other Experiences:

If you have had experience in an industrial, national or other lab setting, how would you compare

that experience with your SAMSI experience?

N/A

9. Other Research while at SAMSI:

If you have spent considerable time during your appointment with some other research group,

please comment on the benefits and drawbacks of this experience. (e.g., if your appointment was

extended via an appointment at NISS, CRSC or elsewhere)

N/A

10. Other Issues:

Are there other issues or concerns you would like to bring up?

N/A

Jun Zhang

1. Program Involvement:

Which SAMSI program(s) have you been involved with and at what level(s)? (e.g., “been doing

active research on …” or “just went to a tutorial/workshop to learn about that area which is new

to me”)

I have been doing active research on spatial extremes and spatio-temporal point patterns at

SAMSI. I have now finished a spatial extremes project and am writing a paper. For

spatio-temporal point patterns, I have recommended using regional climate model output as the

data set, and we are currently looking at the precipitation patterns. I am also planning to start a

project on spatio-temporal max-stable processes.

2. Interactions with Other Institutions:

Describe your interactions with other institutions while at SAMSI. (e.g., a Triangle university,

NISS or CRSC)

It is very helpful to interact with NISS postdocs. Frank Zou, Feng Xingdong, Xia Fang and

Jessica Xia all provided very helpful suggestions and comments for my research and

presentations.

3. High Points at SAMSI:

What have been the high points of your SAMSI experience?

1) Met with many great statisticians and mathematicians from institutions all over the world.

2) Exposure to many interesting new topics.

3) Local staff members are very helpful. Karem helped me bind my print-outs. James helped me

with my computer issues.

4) Extremely clean and quiet working environment.

4. Suggestions for Improvement:

How could your SAMSI experience have been improved?

I can't think of any.

5. Mentoring:

SAMSI aims to provide solid mentoring of Postdocs. How successful has this been in your case?

I always felt lucky to have Nell Sedransk and Richard Smith as my mentors. Nell has been

wonderful and always gave me great tips on everything from presentation skills to research

skills. Richard has a deep understanding of extreme value theory and risk assessment and helped me

start my research project. Richard also helped me solve numerous technical problems.

6. SAMSI Benefits for the Future:

Has your SAMSI experience met your expectations in preparing you for your next position? Are

there (new) benefits that you now perceive coming from your SAMSI experience that are

relevant to your career in the future?

Yes, I think my SAMSI experience exceeded my expectations.

My SAMSI experience helped me find one of my research focuses: spatial extremes. I had zero

knowledge of the topic before I came to SAMSI. However, after several months of exposure to it,

including listening to the talks given by many wonderful researchers in that field, I started my

own project with the help of Prof. Richard Smith. In the future, I think I will do more research on

spatial extremes. Having a suitable research topic is very important to my future

career. On this point, my SAMSI experience helped a lot.

7. SAMSI in contrast with University Setting:

How do you think your SAMSI experience compares to what you might have encountered in a

typical university setting?

In a typical university setting, I could never meet so many great statisticians from all over the

world in person and online through WEBEX. Also, SAMSI uses WEBEX very well. WEBEX

created virtual research groups and connected the researchers at SAMSI with many researchers

in remote locations. In other words, at SAMSI, you feel connected with the global research

community all the time, without geographical limits, while in a typical university setting, your

connections are limited to visitors coming to the university or to a

conference.

8. SAMSI in comparison to Other Experiences:

If you have had experience in an industrial, national or other lab setting, how would you compare

that experience with your SAMSI experience?

I don't have such experience and I can't comment on this one.

9. Other Research while at SAMSI:

If you have spent considerable time during your appointment with some other research group,

please comment on the benefits and drawbacks of this experience. (e.g., if your appointment was

extended via an appointment at NISS, CRSC or elsewhere)

I did not have such an appointment, so I can't comment on this one.

10. Other Issues:

Are there other issues or concerns you would like to bring up?

No.

Esther Salazar

Postdoc Questionnaire

1. Program Involvement: Which SAMSI program(s) have you been involved with and at what level(s)? (e.g., “been doing active research on …” or “just went to a tutorial/workshop to learn about that area which is new to me”)

I have been involved with the SAMSI program: “Space-Time Analysis for Environmental Mapping, Epidemiology and Climate Change” for the period January – August, 2010.

2. Interactions with Other Institutions: Describe your interactions with other institutions while at SAMSI. (e.g., a Triangle university, NISS or CRSC)

I have been working actively with researchers from Duke University, University of Chicago, University of California Santa Cruz (UCSC) and North Carolina State University (NCSU). More specifically in the following activities:

SAMSI sub-working group “Substantive Scientific Problems”. Joint work with Bruno Sanso (UCSC) and Montserrat Fuentes (NCSU)

Working paper “Dynamic Matrix-Variate Spatial Factor Analysis”. Joint work with Mike West (Duke) and Hedibert Lopes (U. Chicago)

3. High Points at SAMSI: What have been the high points of your SAMSI experience?

The high points are the excellent infrastructure and the chance to talk and interact with top researchers who help me in many projects and with whom

I will continue to work over the next months in order to finish some papers.

4. Suggestions for Improvement: How could your SAMSI experience have been improved?

I feel that the postdocs are very isolated. On the other hand, the working group meetings should be more organized with more responsibilities for the members.

5. Mentoring: SAMSI aims to provide solid mentoring of Postdocs. How successful has this been in your case?

I feel comfortable with my academic mentor and interacting with other researchers. I think that it was very beneficial for me and for my future research plans.

6. SAMSI Benefits for the Future: Has your SAMSI experience met your expectations in preparing you for your next position? Are there (new) benefits that you now perceive coming from your SAMSI experience that are relevant to your career in the future?

Definitely met my expectations. More important, I could find an academic position because of the contacts that I made during my SAMSI experience.

7. SAMSI in contrast with University Setting: How do you think your SAMSI experience compares to what you might have encountered in a typical university setting?

Although the postdocs spend more time working at SAMSI, many of them (including myself) had the opportunity to work at nearby universities. That means that we can take advantage of the university infrastructure as well as SAMSI facilities.

8. SAMSI in comparison to Other Experiences: If you have had experience in an industrial, national or other lab setting, how would you compare that experience with your SAMSI experience?

I did not have other experience.

9. Other Research while at SAMSI: If you have spent considerable time during your appointment with some other research group, please comment on the benefits and drawbacks of this experience. (e.g., if your appointment was extended via an appointment at NISS, CRSC or elsewhere)

Parallel to my work at SAMSI I have been working on a project with researchers from Duke and University of Chicago. Benefit: I have had complete freedom to work on that project, managing my daily activities with the help of SAMSI researchers. Drawbacks: none.

10. Other Issues: Are there other issues or concerns you would like to bring up?

None.

I.C Graduate Fellows

1. Space-time Analysis for Environmental Mapping, Epidemiology and Climate Change

K Sham Bhat (Pennsylvania State University, dissertation advisor: Murali Haran)

Sham was a member of both the Interaction of Deterministic and Stochastic Models and the Computation, Visualization, and Dimension Reduction in Spatio-Temporal Modeling working

groups. He attended many of the SAMSI workshops and the courses “Theory of Continuous

Space and Space-Time Processes” and “Spatial Epidemiology.” His dissertation research

involves calibration of very large computer climate models, which has been furthered by his

participation in the working groups.

He has submitted a book chapter for “Frontiers of Statistical Decision Making and

Bayesian Analysis-In Celebrating James O. Berger's 60th Birthday”: “Computer model

calibration with multivariate spatial output: a case study in climate parameter learning” with

Murali Haran and Marlos Goes.

Laura Boehm (NCSU, SAMSI mentor: Brian Reich)

Laura was a member of the Spatial Health Effects working group and took the SAMSI course

“Spatial Epidemiology.” She also assisted with the undergraduate workshop held at SAMSI in

October. She plans to do research on the analysis of Texas birth defect data and air

pollution.

Avishek Chakraborty (Duke, dissertation advisor: Alan Gelfand)

Avishek was a member of the Spatial Point Patterns working group. He attended several

workshops at SAMSI and the courses “Theory of Continuous Space and Space-Time Processes” and “Spatial Statistics in

Climate, Ecology and Atmospherics.” Avishek was a key presenter in the undergraduate

workshop in October where he led a group as well as a lab.

Morgan Gieseke (NCSU, dissertation advisor: Len Stefanski)

Morgan was a member of the Spatial Exposures and Health Effects Working Group and gave a

presentation to the group on her work related to the geocoded data for the PIN3 study.

Soyoung Jeon (UNC-CH, dissertation advisor: Richard Smith)

Soyoung participated in the Spatial Extremes working group and took the courses “Theory of

Continuous Space and Space-Time Processes” and “Spatial Epidemiology.” Soyoung also

participated in several workshops. Her research focuses on extreme value theory for

spatial data. The modeling of extreme values can be applied in many fields, such as

environmental statistics and environmental epidemiology. One of the goals of her research is to

understand existing approaches for max-stable processes and to develop a new method or

extension that applies them to more realistic data sets.

Yajun Liu (University of Missouri, dissertation advisor: Dongchu Sun)

Yajun is participating in the Fundamentals of Spatial Modeling and the Computation,

Visualization, and Dimension Reduction in Spatio-Temporal Modeling working groups. She has

taken the SAMSI courses “Theory of Continuous Space and Space-Time Processes,” “Spatial

Epidemiology,” and “Spatial Statistics in Climate, Ecology and Atmospherics.” She also

participated in several SAMSI workshops and assisted with the undergraduate workshops.

Zhaowei Hua (UNC-CH, dissertation advisors: David B. Dunson and Hongtu Zhu)

Zhaowei is participating in the Spatial Health working group and has taken the SAMSI course “Theory of Continuous Space and Space-Time Processes.” Her general research area is methodology development for spatial and functional data, with applications to health effects. For her dissertation, she is developing semiparametric Bayesian methods for functional data.

2. Stochastic dynamics

Amogh Deshpande (NCSU, dissertation advisor: Min Kang)

Amogh was a member of the Qualitative Behavior of Stochastic Dynamical Systems working group. His research

deals with stochastic modeling and large deviation theory. He wrote a paper that applies

concepts from probability and stochastic analysis to finance and has submitted it to the journal

Methodology and Computing in Applied Probability.

Dan Fovargue (UNC-CH, dissertation advisor: Sorin Mitran)

Dan was a member of the Numerical Methods for Stochastic Systems working group. His research

involves multi-scale numerical methods relating computation at a microscopic level to

computation at a continuum level through probabilistic and stochastic methods. The goal is to

speed overall computation by averaging microscopic effects and allowing for larger continuum

level timesteps. His dissertation work involves numerical simulations of acoustic waves

propagating through non-uniform media. For non-uniformities at a microscopic scale, the

SAMSI research applies directly to the creation of a method to produce a continuum level

solution. Dan contributed presentations to the working group and was an active presenter at

the undergraduate workshop.

Nuwan Induruwage (Visiting graduate student from RPI, dissertation advisor: Pete

Kramer)

Nuwan attended several workshops and was working on how to effectively perform

averaging when there are metastable barriers in the potential landscape in the direction

of the fast variable.

Tiffany Kolba (Duke, dissertation advisor: Jonathan Mattingly)

Tiffany was a member of the Qualitative Behavior of Stochastic Dynamical Systems working

group. Tiffany’s work is related to noise induced stability for stochastic differential

equations. She made several presentations as part of her working group activities. Tiffany’s

poster at the graduate students poster day gave a wonderful description of her research

activities at SAMSI.

Katherine Newhall (Visiting graduate student from RPI, dissertation advisor: Pete Kramer)

Katie was a very active participant in the program. She was an invaluable member of the

Network Dynamics working group. Through her multiple presentations, Katie influenced the

research direction of the group in significant and very productive ways.

She investigated the role of stochastic input into a single neuron model. Whereas a single

Hodgkin-Huxley (HH) neuron driven by constant deterministic input transitions due to a

bifurcation from a quiescent state to one of periodic firing as the input increases in

intensity, the stochastically driven neuron with equivalent mean input can demonstrate

oscillations between bursting firing periods and non-firing periods, a state which is never

present in the deterministically driven case. It is thought that this is a result of

fluctuations in the input pushing the dynamics away from a fixed point in the phase space

and onto a limit cycle and then back to the fixed point. Her main goal is to work on an

analytic expression for the rate of transition between these two states.

Her thesis is also on oscillations, but in an entirely different form. She has looked at the

synchrony resulting from cascading total firing events, where every neuron in a network

of stochastically driven, integrate-and-fire (IF), pulse-coupled, excitatory neurons fires.

Although the mechanism for these network firing events is fundamentally different from

the mechanism responsible for bursting of the single HH neuron, she believes the same

analytical tools of first-passage time calculations and probabilistic analysis will facilitate

understanding the HH bursting neuron, and the role of stochastic input. It is the

stochastic input into the system that makes the synchrony she studies fundamentally

different from its deterministic counterpart.
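
The contrast between deterministic and stochastic input described above can be illustrated with a minimal sketch. It uses a leaky integrate-and-fire neuron rather than the Hodgkin-Huxley model, and the model form, parameter values, and function name are illustrative assumptions rather than anything taken from the work described. With a subthreshold mean input the noiseless neuron never fires, while the noisy neuron with the same mean input does, and its inter-spike intervals are samples of a first-passage time.

```python
import numpy as np

def lif_spike_times(mu, sigma, t_end=100.0, dt=1e-3, tau=1.0,
                    v_thresh=1.0, v_reset=0.0, seed=0):
    """Euler-Maruyama simulation of dV = (mu - V / tau) dt + sigma dW.

    A spike is recorded and V is reset whenever V reaches v_thresh.
    All parameter values are hypothetical, chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(t_end / dt))
    noise = rng.standard_normal(n_steps) * np.sqrt(dt)
    v = v_reset
    spikes = []
    for k in range(n_steps):
        v += (mu - v / tau) * dt + sigma * noise[k]
        if v >= v_thresh:
            spikes.append((k + 1) * dt)
            v = v_reset
    return np.array(spikes)

# Mean input 0.8 keeps the noiseless voltage below the threshold of 1.
deterministic = lif_spike_times(mu=0.8, sigma=0.0)
stochastic = lif_spike_times(mu=0.8, sigma=0.5)

# Inter-spike intervals are samples of the first-passage time to threshold.
intervals = np.diff(stochastic)
print(len(deterministic), len(stochastic))
if len(intervals) > 0:
    print("mean first-passage time:", intervals.mean())
```

An analytic treatment would replace the sampled intervals with a first-passage time calculation; the simulation only shows the qualitative effect.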

I.D Consulted Individuals

The individuals consulted for the broad selection of topics within programs and

workshops were the members of two groups:

The Program Organizers, listed in Section I.A.1

Members of the Advisory Committees, listed in Section I.J

The specific topics that Program Working Groups chose to pursue were, in general,

selected by the Working Group participants themselves, according to their combined

interests. In almost all cases, however, a Program Leader headed each working group, so

that specific research topics remained consistent with overall program goals. In Section

II.E, the various Working Groups, and their members, are discussed.

I.E PROGRAM ACTIVITIES

1. Stochastic Dynamics

2. Space-Time Analysis for Environmental Mapping, Epidemiology and Climate

Change

3. 2010 Summer Program: Semiparametric Bayesian Inference: Applications in

Pharmacokinetics and Pharmacodynamics

4. Education and Outreach

I.E.1 2009-2010 Program on Stochastic Dynamics

Program Leaders: Hongyun Wang (UC Santa Cruz), Martin Hairer (Warwick), Pete Kramer (RPI), Alejandro Garcia (San Jose State), Lea Popovic (Concordia), Cindy Greenwood (Arizona State University)

Local Scientific Coordinators: Alan Karr (NISS), Jonathan Mattingly (Duke), Peter Mucha (UNC)

Directorate Liaison: Michael Minion (SAMSI)

1. Introduction

This year-long SAMSI program was centered around the broad topic of Stochastic Dynamics with particular focus on analysis, computational methods, and applications of systems governed by stochastic differential equations. The term “Stochastic Dynamics” is one which resonates within many fields in statistics and applied mathematics. The numerical analyst designing algorithms for stochastic differential equations, the math biologist studying transport on the cellular level, the analyst trying to understand the effect of stochastic forcing and data in dynamical systems, the statistician trying to characterize the statistics of dynamic networks, and the mathematical modeler trying to bridge the gap between atomistic and continuum physics are all examples of research in stochastic dynamics. Unfortunately it is too often the case that research being done in one of the above scenarios is not being widely disseminated across the spectrum of statistics and applied math. This SAMSI program served to bring together experts in different but highly inter-related research specializations under the broader umbrella of stochastic dynamics in the hopes of creating collaborations which could potentially lead to exciting advances in particular research areas.

Particular successes of the program include:

• The establishment of a new collaboration, including two major grant proposals submitted to date, involving a probabilist (Scott McKinley), an applied mathematician (Peter Kramer), and a statistician (John Fricks) to develop a systematic stochastic modeling framework for intracellular transport connecting three important physical length scales.

• A sustained research effort across SAMSI programs, with applied mathematicians from the working group on Numerical Methods for Stochastic Systems (Peter Kramer and Sorin Mitran) joining with statisticians from the program on Space-Time Analysis for Environmental Mapping, Epidemiology, and Climate Change (Jim Berger, Susie Bayarri, Murali Haran, Emily Kang, Hans Kunsch) to explore how statistical aspects of multiscale computing methods could be enhanced.

• Two research programs toward new modeling approaches in microswimming (Peter Kramer) and swarming (Amarjit Budhiraja, Oliver Diaz and Xin Liu) that were nucleated by presentations at the workshop on Self-Organization and Multi-Scale Mathematical Modeling of Active Biological Systems in October, 2009.

• Several ongoing collaborations between SAMSI postdocs on topics in stochastic neuronal dynamics and epidemic spread on networks.

• A project initiated and driven by two graduate students (Katherine Newhall and Amanda Traud) participating in the working group on Network Dynamics, now being prepared for submission for publication.

• Two workshops in the spring which were designed to resonate with the specific interests of the working groups, stimulating new ideas and helping to lead to several submitted publications.

Below, we first review the content of the workshops, and then summarize the activities of the working groups.

2. Workshops

Five workshops were organized as part of the program.

2.1. Opening Workshop & Tutorials, August 30 - September 2, 2009

Following the usual style of an opening workshop, tutorials were given on Sunday, while from Monday to Wednesday, invited speakers gave presentations. Following the talks, the research working groups formed and had their initial meetings at SAMSI.

Speakers: Mark Alber (University of Notre Dame), Anna Amirdjanova (University of Michigan), David Anderson (University of Wisconsin), Peter Baxendale (University of Southern California), Gerard Ben Arous (New York University), David Cai (New York University), Sandra Cerrai (University of Maryland), Lee DeVille (University of Illinois), Tim Elston (University of North Carolina), Alejandro Garcia (San Jose State University), Cindy Greenwood (Arizona State University), Martin Hairer (New York University), Peter Imkeller (Humboldt University Berlin), Markos Katsoulakis (University of Massachusetts at Amherst), Peter Kotelenez (Case Western University), Rachel Kuske (University of British Columbia), Kevin Lin (University of Arizona), Gabriel Lord (Heriot Watt University), Grant Lythe (University of Leeds), Katie Newhall (Rensselaer Polytechnic Institute), Houman Owhadi (California Institute of Technology), Grigorios Pavliotis (Imperial College, London), Lea Popovic (Concordia University), Mike Reed (Duke University), Gareth Roberts (University of Warwick), Christof Schuette (Free University, Berlin), Andrew Stuart (University of Warwick), Anja Sturm (University of Delaware), Eric Vanden-Eijnden (New York University), Mike West (Duke University), Darren Wilkinson (Newcastle University).

2.2. Workshop on Self-Organization and Multi-Scale Mathematical Modeling of Active Biological Systems, October 26-28, 2009

Organizers: Igor Aronson (Argonne National Laboratory), Leonid Berlyand (Penn State University)

Topics: How do seemingly random mixtures of molecular components and swimming microorganisms organize themselves into large-scale cellular structures and synchronize to execute collective motions? What is common to these processes? Can we model them mathematically and simulate them computationally? What can we learn about the biofluids flows and self-organization of biological systems by developing and analyzing mathematical models? The workshop brought together mathematicians, statisticians, biophysicists and engineers to discuss these questions and focus on the following mathematical and statistical issues:

• Homogenization and upscaling in active systems (suspensions of swimmers, cytoskeletal networks, biological tissue),

• Analysis of mathematical models of self-organization in biological systems,

• Novel computational algorithms for simulations of biological processes.

Complementary research thrusts include the statistical properties of self-organized collective states and macroscopic effects of intrinsic stochasticity in active systems, such as velocity and density fluctuations, phase transitions, critical behavior, rheology and constitutive relations.

Speakers: Anna Balazs (University of Pittsburgh), Andrea Bertozzi (UCLA), John Brady (Caltech), Suncica Canic (University of Houston), Thierry Colin (University of Bordeaux), Margaret Gardel (University of Chicago), Jerry Gollub (Haverford College), Yulia Gorb (Texas A&M), Michael Graham (University of Wisconsin-Madison), Danielle Hilhorst (Universite Paris-Sud), James Keener (University of Utah), Simona Mancini (Universite d’Orleans), Tim Pedley (University of Cambridge), Benoit Perthame (Universite Pierre et Marie Curie), Jacques Prost (ESPCI), and Michael Shelley (Courant Institute).

2.3. Workshop on Theory and Qualitative Behavior of Stochastic Dynamics, February 8-10, 2010

Organizers: Michael Cranston (University of California-Irvine), Yuri Bakhtin (Georgia Tech), Luc Rey-Bellet (University of Massachusetts), Jonathan Mattingly (Duke University)

Topics: Stochastic modeling is becoming ubiquitous in science and engineering. This conference brought together speakers addressing a number of advances in describing the qualitative dynamics of stochastic systems such as stochastic bifurcation, structure of invariant measures, metastability, averaging, and sub- and super-diffusive behavior. Although the presentations emphasized theoretical aspects of the subject, there was a bias toward results which shed light on relevant models of physical, biological or technological interest.

Speakers: Yuri Bakhtin (Georgia Tech), Peter Baxendale (USC), Dima Dolgopyat (University of Maryland), Paul Dupuis (Brown University), Barbara Gentz (University of Bielefeld), Stanislav Molchanov (UNC-Charlotte), Alexei Novikov (Penn State University), David Nualart (University of Kansas), Grigorios Pavliotis (Imperial College London), Luc Rey-Bellet (University of Massachusetts), Michael Scheutzow (Berlin University), Richard Sowers (University of Illinois)

2.4. Workshop on Molecular Motors, Neuron Models, and Epidemics on Networks, April 15-17, 2010

Organizers: Priscilla Greenwood (ASU), John Fricks (Penn State), Lea Popovic (Concordia), Scott Schmidler (Duke)

Topics: The workshop concentrated on primary topics of research in three of the SAMSI working groups. In addition to lectures by the speakers listed below, there were reports from the following working groups.

• Molecular motors. These are biological molecular machines that convert energy into motion, producing movement in living organisms, operating at a scale where thermal noise is significant. Because motor events are stochastic, they may be modeled in terms of random walks or Markov processes. An active topic is the stochastic dynamics of such models.

• Neuron models. Several well-known models are being studied with the aim of comparing the stochastic dynamics of the models. A fundamental question is, in what cases the transition from firing to quiescence and back is governed by boundary crossing times of a stochastic process, as opposed to large deviation times. The larger question for stochastic dynamics is the nature of transition times between domains of attraction of locally stable configurations.

• Epidemics. They have been defined on networks in various ways. One approach is to fix a network and define a version of a classical mean field epidemic model on that network by specializing the interaction equations to specific local interactions. The challenge is to devise models where solutions, or at least some of their properties, are accessible.
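
As a hedged illustration of the last item, the sketch below fixes a network and runs a discrete-time stochastic SIR model in which infection can pass only along edges. The ring-lattice network, the update rule, the parameter values, and the function names are all assumptions made for this example; they are not models discussed at the workshop.

```python
import numpy as np

def sir_on_network(adjacency, beta, gamma, initial_infected, n_steps, seed=0):
    """Discrete-time stochastic SIR model with infection along network edges only.

    States: 0 = susceptible, 1 = infected, 2 = recovered.
    beta is the per-step infection probability per infected neighbour and
    gamma the per-step recovery probability (both hypothetical values).
    """
    rng = np.random.default_rng(seed)
    n = adjacency.shape[0]
    state = np.zeros(n, dtype=int)
    state[list(initial_infected)] = 1
    history = [np.bincount(state, minlength=3)]
    for _ in range(n_steps):
        infected = state == 1
        # Number of infected neighbours of every node (the local interaction).
        pressure = adjacency @ infected.astype(int)
        p_infect = 1.0 - (1.0 - beta) ** pressure
        new_infections = (state == 0) & (rng.random(n) < p_infect)
        recoveries = infected & (rng.random(n) < gamma)
        state[new_infections] = 1
        state[recoveries] = 2
        history.append(np.bincount(state, minlength=3))
    return np.array(history)

def ring_lattice(n, k=2):
    """Adjacency matrix of a ring where each node links to k neighbours per side."""
    adj = np.zeros((n, n), dtype=int)
    idx = np.arange(n)
    for offset in range(1, k + 1):
        adj[idx, (idx + offset) % n] = 1
        adj[(idx + offset) % n, idx] = 1
    return adj

counts = sir_on_network(ring_lattice(200), beta=0.3, gamma=0.1,
                        initial_infected=[0], n_steps=100)
print(counts[-1])  # final numbers of susceptible, infected, recovered nodes
```

Replacing the adjacency matrix with an all-ones matrix (minus the diagonal) recovers a well-mixed, mean-field-like setting, which makes the effect of the local interaction structure easy to compare.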

Speakers: Andrea Barreiro (U. of Washington), Meredith Betterton (U. of Colorado), Susanne Ditlevsen (Copenhagen), David Hunter (Penn State), David Kinderlehrer (Carnegie Mellon), Martina Morris (U. of Washington), Duane Nykamp (U. of Minnesota), Laura Sacerdote (Toronto), E. Volz (U. of Michigan), Hongyun Wang (UC Santa Cruz)

2.5. Transitions Workshop, November 17-19, 2010

Organizers: Peter Kramer (Rensselaer Polytechnic Institute), Jonathan Mattingly (Duke University) and Sorin Mitran (UNC-Chapel Hill)

Topics: The Stochastic Dynamics transition workshop was an opportunity for the participants in the program to reconnect with each other and present the progress of sustained research efforts that have proceeded beyond the end of the program at SAMSI. Presentations were given from each of the working groups: Network Dynamics, Data Assimilation, Numerical Methods for Stochastic Systems, Qualitative Behavior of Stochastic Dynamical Systems, and Biological Stochastic Dynamics.

Speakers: Meredith Betterton (Colorado), John Fricks (Penn State), Giles Hooker (Cornell), Peter Kramer (Rensselaer Polytechnic Institute), John Harlim (NC State), Peter Kotelenez (Case Western), Jonathan Mattingly (Duke University), Scott McKinley (Florida), Sorin Mitran (UNC Chapel Hill), Bruce Rogers (Duke), Ravi Srinivasan (Texas), Amanda Traud (NC State), Xueying Wang (Texas A&M), Jennifer Young (Rice)

3. Working Groups

Five research working groups were organized as part of the program.

3.1. Working Group on Qualitative Behavior of Stochastic Dynamics

Group Leader: Jonathan Mattingly (Duke)

Participants: Avanti Athreya (SAMSI), Hakima Bessaih (Univ. of Wyoming), Esteban Chavez (Duke), Cindy Greenwood (ASU), Oliver Diaz (Duke), Barbara Gentz (Universitat Bielefeld), Boumediene Hamzi (Duke), Badal Joshi (Duke), Tiffany Kolba (Duke), Jonathan Mattingly (Duke), Scott McKinley (Duke), Peter Mucha (UNC), Scott Schmidler (Duke), Ravi Srinivasan (Duke), Andrea Watkins (Duke)

Topics: The group studied a number of topics in the qualitative description of stochastic dynamics, such as stochastic resonance, Lyapunov exponents, quasi-invariant measures, h-transforms to condition stochastic systems, large deviation theory to understand nucleation in stochastic partial differential equations, structure of random attractors, and multiscale analysis in probability. Some of the work by members of this group is described under the heading of single neuron dynamics. A description of some of the other concrete results follows.

• Planar dynamical systems:

Avanti Athreya, Tiffany Kolba, and Jonathan Mattingly are currently examining two specific systems with small additive noise where (a) without noise, the unperturbed ODE has regions of instability and/or finite-time blowup; and (b) with noise, the system seems to stabilize, either in the sense that near-periodic orbits emerge or in the sense that there exists a unique invariant probability measure (Scheutzow gave a nice example of such a system in a 1991 paper). For one of the dynamical systems under consideration, we know by asymptotic analysis how the invariant measure decays far from the origin, and we’re still addressing the question of its existence. For the other, we are trying to see if we can reduce the problem of escape from a region of deterministic instability to a problem of escape from a double-well potential.

• Variance reduction for stochastic differential equations:

In many applications of stochastic modeling, the dynamics are governed by an SDE depending on certain parameters and key quantities of interest are expressed as the expected value of an observable with respect to the invariant probability measure of the SDE, and computed by time-averaging over long trajectories. In some situations, for example when one wishes to optimize the expected value with respect to the parameters, it is desirable to compute the change, or sensitivity of the expected value with respect to variations in the parameter. A challenge is that sensitivities are typically relatively small quantities, so very long trajectories may be required to estimate them accurately.

In an on-going effort to develop fast methods for computing sensitivities, Kevin Lin and Jonathan Mattingly are working on a class of methods based on couplings. The basic idea is to construct two correlated realizations of the SDE for nearby parameters in a way that ensures the two solutions remain near each other most of the time. One then obtains a direct, low-variance estimate of the sensitivity via integrating differences along the path. The current focus of the project is the numerical analysis of these methods. Specifically, an investigation of different coupling strategies and their effectiveness for some prototypical problems is being carried out, with the goal of understanding how details of the coupling construction and features of the stochastic dynamics together determine the scaling of numerical errors with respect to the parameter perturbation.

• Sustained oscillations for density dependent Markov processes:

A paper has been written by Peter H. Baxendale and Priscilla E. Greenwood and submitted to the Journal of Mathematical Biology. Simulations of models of epidemics, biochemical systems, and other bio-systems show that when deterministic models yield damped oscillations, stochastic counterparts show sustained oscillations at an amplitude well above the expected noise level. A characterization of damped oscillations in terms of the local linear structure of the associated dynamics is well known, but in general there remains the problem of identifying the stochastic process which is observed in stochastic simulations.

Here we show that in a general limiting sense the stochastic path describes a circular motion modulated by a slowly varying Ornstein-Uhlenbeck process. Numerical examples are shown for the Volterra predator-prey model, Sel’kov’s model for glycolysis, and a damped linear oscillator.
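
The coupling idea described under variance reduction above can be illustrated on a toy problem with a known answer. The sketch below is not the method under study by Lin and Mattingly; it makes several assumptions for illustration: an Ornstein-Uhlenbeck process dX = -θX dt + σ dW, the observable X², an Euler-Maruyama discretization, and the simplest possible coupling, in which both parameter values are driven by the same Brownian increments. Under the invariant measure E[X²] = σ²/(2θ), so the exact sensitivity with respect to θ is -σ²/(2θ²).

```python
import numpy as np

def time_average_x2(theta, sigma, dt, increments):
    """Time average of X^2 along an Euler-Maruyama path of dX = -theta X dt + sigma dW."""
    x = 0.0
    total = 0.0
    for dw in increments:
        x += -theta * x * dt + sigma * dw
        total += x * x
    return total / len(increments)

def sensitivity_estimates(theta=1.0, sigma=1.0, eps=0.05, dt=0.01,
                          n_steps=200_000, seed=0):
    """Finite-difference estimates of d/dtheta E[X^2] (hypothetical parameter values).

    coupled:     both trajectories share the same Brownian increments.
    independent: the perturbed trajectory uses fresh increments.
    """
    rng = np.random.default_rng(seed)
    dw_shared = rng.standard_normal(n_steps) * np.sqrt(dt)
    dw_fresh = rng.standard_normal(n_steps) * np.sqrt(dt)
    base = time_average_x2(theta, sigma, dt, dw_shared)
    coupled = (time_average_x2(theta + eps, sigma, dt, dw_shared) - base) / eps
    independent = (time_average_x2(theta + eps, sigma, dt, dw_fresh) - base) / eps
    exact = -sigma**2 / (2.0 * theta**2)
    return coupled, independent, exact

coupled, independent, exact = sensitivity_estimates()
print("coupled:", coupled, "independent:", independent, "exact:", exact)
```

With shared noise the two paths stay close, so the difference of time averages is small and smooth in the perturbation; with independent noise the same difference is dominated by sampling error, which is then amplified by the division by eps. Both estimates also carry a finite-difference bias of order eps.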

3.2. Working Group on Numerical Methods for Stochastic Systems

Group Leaders: Alina Chertock (NCSU), Sorin Mitran (UNC)

Participants: Anna Amirdjanova (Univ. of Michigan), Dave Anderson (Univ. of Wisconsin), Hakima Bessaih (Univ. of Wyoming), Alina Chertock (NCSU), Sean Cohen (UNCC), Dan Fovargue (UNC), John Fricks (Penn State), Peter Kramer (Rensselaer), Kevin Lin (Univ. of Arizona), Jonathan Mattingly (Duke), Michael Minion (UNC), Sorin Mitran (UNC), Carlos Mora (Universidad de Concepción), Anuj Mubayi (Univ. of Texas at Arlington), Hon Keung Tony Ng (SMU), Ravi Srinivasan (Duke), Ilya Timofeyev (Univ. of Houston)

Topics: The classical continuum equations arising in fluid flow, elasticity, or electromagnetic propagation in materials require constitutive laws to derive a closed-form system. The constitutive laws appropriate for a given set of equations can be derived in two ways: i) from phenomenological considerations such as the linear behavior underlying elastic deformation or Newtonian fluid flow, or ii) from averaging of kinetic theory results describing basic molecular dynamics. This has been possible for situations in which the microscopic behavior is close to thermodynamic equilibrium. In such cases, Gaussian statistics are well verified at the microscopic level and the moments of Gaussian distributions can be computed analytically.

However, many physical processes exhibit significant localized departure from Gaussian statistics. When a solid breaks, the motion of the atoms in the crystalline lattice along the crack propagation path is no longer governed by a Maxwell-Boltzmann distribution. When a material undergoes a phase change, large scale correlations among atoms are formed (or destroyed) which modify the typically Gaussian statistics of the equilibrium phases. Protein folding can be seen as a large-scale modulation imposed by polymer links of the Gaussian statistics of the component atoms. A common characteristic of these situations is that macroscopic features impose the departure from local thermodynamic equilibrium and macroscopic quantities are of practical interest. Crack propagation is initiated by a force acting on a solid and we wish to know how far the solid deforms before it breaks. In solidification, a heat flux evacuates energy from the melt at some rate and we wish to characterize the type of order arising in the material.

In all such cases, a basic problem is how to extract the statistical distributions of physical quantities when the system is away from thermodynamic equilibrium. Knowledge of the distribution would allow local constitutive laws to be formulated. Direct numerical simulation is prohibitively expensive. Continuum level simulation is incomplete due to lack of constitutive laws. Furthermore, while it is clear that higher-order moments characterizing the microscopic statistical distribution are required, it is not known how many of these moments are needed and what their persistence time might be. The microscopic dynamics are stochastic but subject to multiple macroscopic constraints. One major statistical challenge is how to characterize the microscopic motion in a manner that can be used to derive a constitutive law. A basic computational question is how to advance the system in time efficiently at both the microscopic and macroscopic level. An analysis challenge is how to combine this knowledge and form (e.g. through asymptotic expansions) particular constitutive laws.

The numerics group has been discussing a number of multi-scale numerical methods that mix stochastic models at small scales with deterministic models at larger scales. Particular issues on which we focused included the incorporation of adaptive time stepping and the initialization of microscale simulations consistent with the macroscale constraints. In spring, Sorin Mitran rolled out his novel time-parallel continuum-kinetic-molecular (tpCKM) simulation scheme, which is designed to handle multiscale systems where the microscale statistics do not relax quickly relative to the macroscopic time scale. A third, mesoscopic, level of description was added and updated by an adaptive predictor-corrector approach through communication with both the macroscale and microscale simulations. Mitran and his graduate student Jennifer Young had separately been studying polymeric and cellular blebbing with this method, while the working group examined a one-dimensional string model where bonds could form and break on the microscale with rates dependent on the local stress. Implementations of both the classical Heterogeneous Multiscale Method and tpCKM method were explored.

One of the working groups in the contemporaneous program on Space-Time Analysis for Environmental Mapping, Epidemiology, and Climate Change, concerned with the “Interaction of Deterministic and Stochastic Models”, developed a focused interest in multiscale computing and in particular the efforts of this Numerical Methods Working Group. This has led to a sustained collaboration across these two working groups, now dedicated to improving a statistical aspect of the communication between the mesoscale and microscale dynamics in the tpCKM method. Statistical issues in multiscale computing appear not to have been much pursued until now, but the close interaction of applied mathematicians and statisticians at SAMSI has identified new areas of research along this direction.

3.3. Working Group on Data Assimilation

Group Leaders: Cindy Greenwood (ASU), John Harlim (NCSU), Peter Kramer (RPI)

Participants: Cindy Greenwood (ASU), Boumendiene Hamzi (Duke), John Harlim (NCSU), Giles Hooker (Cornell), Peter Kramer (RPI), Huitian Lu (San Diego State University), Jonathan Mattingly (Duke), Anuj Mubayi (Univ. Texas at Arlington), Bruce Rogers (SAMSI)

Topics: This group has been striving to explore questions of interest to both applied mathematicians and statisticians. To this end, we have focused on the problem of optimizing the design of experiments based on incoming data in a stochastic system, which can be formulated as a stochastic control problem.

When collecting data from experimental systems, the experimenter may be able to control the quantities measured, the measurement times, estimation methods used for inference, and inputs into the system. We have focused on the control-type question of what process-informed input to a system will yield maximal information about parameters governing system dynamics. Two real-world applications have motivated this study:

• Single neuron experiments involve stimulating a neuron with an electrical current and measuring the electrical potential across the neuron membrane. Voltage measurements are typically available with very high frequency and accuracy; however, complex models involving the interaction of several ion channels can underpin neural dynamics. In this context, experiments typically use step-inputs. Since these inputs can be readily coupled to output systems, an automated control process can be readily designed to maximize information about parameters of interest.

• An experimental ecology typically consists of a few species of microorganism kept in a chemostat (approx. 500 ml). In such systems ecological dynamics are of interest: predator-prey interactions, trait evolution or phenotype plasticity. These ecologies can be “tuned” by setting their ambient temperature, the amount of nutrient provided to the ecology, and the rate at which the ecology is diluted. These parameters are typically held at constant values, chosen based on intuition, and changed, again based on intuition, after observing the ecological dynamics. For such systems, measurements are less frequent and much noisier than for neural experiments, creating a more challenging, but possibly more important, problem in providing principled programs for varying system settings depending on the observation history.

These two examples can be represented by a sequence of increasingly complex models for real-world experimental problems. In adapting stochastic control theory to the problem of experimental design, we have attempted to maximize the Fisher information about the parameters of interest.

The paradigm of input-control that attempts to maximize an objective can be represented within the framework of Control Theory. However, this is complicated by a number of factors:

• The optimal control is typically infinite, requiring either a restriction to a feasible range or some form of penalization. The use of these controls can also destroy the stationarity of the system being observed, making inferential methods more problematic.

• The evaluation of Fisher Information in noisily-observed nonlinear stochastic systems can be difficult and typically requires computationally expensive numerical approximation. Further, the Fisher Information is not representable as an integral over times, meaning the optimal control can depend on the entire sequence of past observations rather than, as is usually the case, the filtered estimate of the current state.

In order to derive algorithms to calculate the optimal control we have focused on two approximations. First, we represent the system in terms of discrete-valued states and input. This allows restrictions on the range of values that the input can take to be naturally enforced. Second, the Fisher Information is replaced with the Information that would be obtained were the states of the system to be observed exactly at all times. This framework reduces the problem to one of optimal control for noisily observed systems where it has been shown that the control may be chosen based on a filtered estimate of the current state.

It is, of course, important to understand the properties of these approximations. To that end we have investigated the case of a discrete-time linear system where algorithms to calculate the Fisher Information have been already developed. We have derived the optimal control for this problem and shown that the dependence of the optimal input on past observations decays exponentially as time moves forward. These calculations can be further used to suggest corrections to the approximate methods derived above.
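To make the two approximations concrete, the following is a minimal sketch, not the working group's algorithm. It assumes a hypothetical scalar model x_{t+1} = a x_t + u_t + sigma * eps_t whose states are observed exactly, so the information about a accumulates as the sum of x_t^2 / sigma^2. The input is restricted to a small discrete set, and a one-step greedy rule (an illustrative stand-in for the optimal control) picks the input that maximizes the expected next contribution. All function names and parameter values are assumptions made for illustration.

```python
import numpy as np

def greedy_policy(x, a_guess, inputs):
    # One-step greedy rule: E[x_{t+1}^2 | x_t] = (a x_t + u)^2 + sigma^2,
    # so choose the admissible input that maximizes (a x_t + u)^2.
    return max(inputs, key=lambda u: (a_guess * x + u) ** 2)

def zero_policy(x, a_guess, inputs):
    # Baseline: never drive the system.
    return 0.0

def simulate_information(a, sigma, inputs, n_steps, policy, rng):
    """Accumulate the full-observation information about `a` along one path."""
    x = 0.0
    info = 0.0
    for _ in range(n_steps):
        # With exactly observed states, -d^2/da^2 of the Gaussian
        # log-likelihood of x_{t+1} given x_t is x_t^2 / sigma^2.
        info += x * x / sigma**2
        u = policy(x, a, inputs)
        x = a * x + u + sigma * rng.standard_normal()
    return info

if __name__ == "__main__":
    inputs = (-1.0, 0.0, 1.0)  # hypothetical discrete input set
    driven = simulate_information(0.5, 1.0, inputs, 200, greedy_policy,
                                  np.random.default_rng(0))
    passive = simulate_information(0.5, 1.0, inputs, 200, zero_policy,
                                   np.random.default_rng(0))
    print(f"information with greedy input: {driven:.1f}")
    print(f"information with zero input:   {passive:.1f}")
```

Restricting the input to a finite set is what keeps the greedy choice bounded; without such a restriction the maximizing input would be unbounded, which is the first complication listed above.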

Further areas of investigation include solving the complete control problem for discrete-valued systems, the approximation of control variables by basis expansions for higher-dimensional systems, the use of particle filter methods to evaluate Fisher Information, and the choice of measurement times and quantities.

These investigations have led to a number of papers that are currently in preparation. One, based on the approximations above, requires specialized particle-filter code to be developed for nonlinear systems, but is already underway. A second, based on the linear model and the accuracy of the approximations above, is being prepared. A graduate student at Cornell University is currently working on deriving the optimal control in a discrete-state hidden Markov model with applications to polymerase chain reaction systems, and a paper resulting from this work is expected in the new year.

3.4. Working Group on Network Dynamics

Group Leaders: Pierre Gremaud (NCSU), Peter Kramer (RPI), Peter Mucha (UNC)

Participants: Avanti Athreya (SAMSI), Cindy Greenwood (ASU), Badal Joshi (Duke), Peter Kramer (RPI), Kevin Lin (Univ. of Arizona), Jonathan Mattingly (Duke), Scott McKinley (Duke), John McSweeney (SAMSI), Anuj Mubayi (Univ. of Texas at Arlington), Peter Mucha (UNC), Katie Newhall (Rensselaer), James Nolen (Duke), Bruce Rogers (ASU), Scott Schmidler (Duke), Ravi Srinivasan (Duke), Yi Sun (SAMSI), Rachel Thomas (Duke), Ilya Timofeyev (Univ. of Houston), Mandi Traud (UNC), Xueying Wang (SAMSI)

Topics: This group organized under the common expectation that stochastic processes coupled through a network of connections are affected by the underlying structure of such couplings, but with little a priori expectation about which structural characterizations might matter. Our working group discussions revolved around three topic areas: (i) biological diseases and social contagions, as examples of dynamics spreading on a network; (ii) neuronal network dynamics, as an example of dynamical systems coupled by selected network topologies; and (iii) NSF research institute participation data (this example of a social network is familiar to participants and might help develop ideas about the utility of network analysis for the interests of this group).

• Contagion Dynamics on Social Networks:

The first common broad system of interest within the working group was the study of diseases and social contagions on networks.

John McSweeney and Bruce Rogers have considered the contact process (SIS epidemic) on a strongly clustered graph, described as a stochastic block model. In the simplest case there are two communities, with the probability p1 of an edge connecting two individuals within the same community much larger than the probability p2 of an edge connecting two individuals in different communities. In a range of parameters, a plot of the number of infected sites versus time shows a stair step: the epidemic reaches equilibrium in one community before spreading to the other. Work on this problem is continuing in the Dynamics ON Networks working group of the Complex Networks program, in collaboration with new postdoc David Sivakoff and Rick Durrett. A proof has been sketched and work is proceeding on a paper.

Cindy Greenwood and Xueying Wang have been analyzing the pairwise closure approximation (PAC) theory for the stochastic SIRS model (where recovery from an infective state leads to a temporary immune state) [Ganna Rozhnova and Ana Nunes, 2009] and comparing it with the stochastic SIS model (where recovery from infection leads directly to a susceptible state) [Eames and Keeling] on regular graphs. In one limiting case of the PAC theory for the SIRS model, the two-dimensional principal component of the stochastic paths exhibits sustained stochastic oscillations, in the form of an Ornstein-Uhlenbeck process composed with a time-dependent rotation. This work has led to a paper and was the subject of one of the presentations at the Transition Workshop.
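The two-community contact process described above can be explored numerically. The sketch below is an illustration under stated assumptions rather than the model analyzed by the group: it uses a discrete-time approximation of SIS dynamics (per-step infection probability `beta` per infected neighbor, per-step recovery probability `gamma`), and the function names and parameter values are hypothetical. It records the number of infected sites in each community over time, which is the quantity whose plot would show the stair step.

```python
import numpy as np

def two_block_sbm(n_per_block, p_in, p_out, rng):
    """Sample a symmetric adjacency matrix with two equal-sized communities."""
    n = 2 * n_per_block
    block = np.repeat([0, 1], n_per_block)
    same = block[:, None] == block[None, :]
    prob = np.where(same, p_in, p_out)
    # Draw each unordered pair once (strict upper triangle), then symmetrize.
    upper = np.triu(rng.random((n, n)) < prob, k=1)
    return upper | upper.T, block

def simulate_sis(adj, block, beta, gamma, n_steps, rng):
    """Discrete-time SIS epidemic seeded at one site of community 0."""
    n = adj.shape[0]
    infected = np.zeros(n, dtype=bool)
    infected[0] = True
    counts = np.zeros((n_steps, 2), dtype=int)
    for t in range(n_steps):
        counts[t, 0] = np.sum(infected & (block == 0))
        counts[t, 1] = np.sum(infected & (block == 1))
        n_inf_neighbors = adj[:, infected].sum(axis=1)
        # Each infected neighbor transmits independently with probability beta.
        p_infect = 1.0 - (1.0 - beta) ** n_inf_neighbors
        newly_infected = ~infected & (rng.random(n) < p_infect)
        recovered = infected & (rng.random(n) < gamma)
        infected = (infected | newly_infected) & ~recovered
    return counts

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    adj, block = two_block_sbm(200, p_in=0.05, p_out=0.0002, rng=rng)
    counts = simulate_sis(adj, block, beta=0.05, gamma=0.1, n_steps=400, rng=rng)
    print(counts[::40])  # infected sites per community, every 40 steps
```

Whether a clear plateau appears before the second community is invaded depends on the parameter range, and the epidemic can also die out early from a single seed.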

• Neuronal Network Dynamics:

One subgroup has explored a key question in neuronal networks: how to derive quantitative descriptions and models for the collective behavior of a network of neurons starting from neuronal-level dynamical equations for the membrane voltage and a description of the topology of the network. Stochasticity enters the description typically to describe irregularity of signaling inputs, which are then propagated by the network dynamics. Particularly interesting collective behaviors include synchronous oscillations and the formation of mobile spatial patterns where neurons are firing intensely.

A starting point for the subgroup’s investigation is the kinetic theory approach developed by one of the opening workshop speakers, David Cai, and his co-workers to describe the overall network firing properties by coarse-graining the neuronal dynamics in a manner parallel to describing the evolution of a gas density by coarse-graining the dynamics of individual gas molecules.

The subgroup also found particular interest in recent work by Liu and Nykamp on a “second-order” kinetic theory that has some promise for describing large-scale synchrony. These neuronal kinetic theories have so far not accounted for the spatial distribution (apart from the topological description) of the neurons. The subgroup spent some time and effort thinking about how to incorporate a spatial component into the kinetic theories, or into some other mathematical framework that might make it possible to develop spatio-temporal continuum equations for the neuronal firing patterns that upscale, in a quantitative way, the neuronal-level description of the dynamics.

In a related investigation, Katie Newhall, Mandi Traud, Peter Kramer and Peter Mucha have worked on a project which explores different discretizations of simple integrate-and-fire dynamics and the relation between statistical properties of the cascades in these models and the underlying network of interconnectivity. This project was actually initiated by the graduate students Newhall and Traud at the intersection of their research interests, with their advisors Kramer and Mucha providing subsequent guidance and focus with regard to approaching the question they posed. This work has been presented by Traud in a minisymposium at the SIAM Conference on the Life Sciences in July 2010, by Newhall in a poster presentation at the opening workshop of the Complex Networks program at SAMSI in August 2010, and is in the process of being written up for submission to the journal Chaos.

• Institute-Participation Data in Mathematics:

In order to develop network analysis experience and expertise more broadly within the working group, we endeavored to supplement our stochastic dynamics interest with social network analysis of a specific data set. There was particular interest, leading to direct discussions with the NSF, in trying to efficiently gather public co-occurrence data about participants in programs and workshops at SAMSI and the other NSF-funded mathematical institutes. Working group discussions briefly explored standard software tools available for social network analysis which might be used on these data, along with more specialized tools for community detection and analysis of co-occurrence hypergraphs known to members of the working group and other specialists local to the Triangle area. However, no easy access to the data was immediately available from which to undertake this analysis within the group.

3.5. Working Group on Biological Stochastic Dynamics

Group Leaders: Amarjit Budhiraja (UNC), Cindy Greenwood (ASU), Peter Kramer (Rensselaer)

Participants: Mark Alber (Notre Dame), Avanti Athreya (SAMSI), Amarjit Budhiraja (UNC), Esteban Chavez (Duke), Amogh Deshpande (NCSU), Nuwan de Silva (Rensselaer), Oliver Diaz (Duke), John Fricks (Penn State), Cindy Greenwood (ASU), Kazi Ito (NCSU), Badal Joshi (Duke), Min Kang (NCSU), Peter Kramer (Rensselaer), Mimi Lin (Duke), Kevin Lin (Arizona), Huitian Liu (San Diego State), Scott McKinley (Duke), John McSweeney (SAMSI), Anuj Mubayi (Univ. of Texas at Arlington), Katie Newhall (Rensselaer), Lea Popovic (Concordia), Bruce Rogers (ASU), Ravi Srinivasan (Duke), Rachel Thomas (Duke), Xueying Wang (Ohio State University), Richard Yamada (University of Michigan)

Topics: The explosion of interest in mathematical and statistical modeling and computation in the biological and medical sciences, where stochasticity is present at nearly every scale, has been one of the most exciting trends in the biosciences in the last ten years. A sizable group has organized at SAMSI to study applications of stochastic models and methods to questions in biology, ranging from the population to the cellular and down to the molecular level.

To begin the year, members presented pedagogical overviews of stochastic methods in biology as well as illustrations of research problems they had worked on and were interested to explore further. Stimulated by these discussions and the presentations at the opening workshop, six project areas were identified as having substantial interest by midyear. These were each discussed on a rotating basis during the weekly group meetings.

The molecular motor topic crystallized into a sustained coherent research collaboration between four participants, with the ongoing work presented at two SIAM conferences thus far, a first publication being prepared, and two major grant proposals submitted.

The single neuron dynamics project has also continued beyond the end of the Stochastic Dynamics program, with at least one publication likely to be forthcoming.

While the microswimming project generated a sustained collaborative research focus on a recent experimental paper through the early spring, it was later abandoned as being less promising than originally thought. On the other hand, one of the visiting faculty has launched a new research project in this area with local collaborators through the stimulus provided by his time at SAMSI.

While the three other projects remained only in an exploratory phase, they helped broaden the exposure of the group, particularly the graduate students and postdocs, to a wider range of topics of stochastic modeling in biology.

• Molecular motors:

An active collaboration emerged from the working group concerning the modeling and analysis of systems of molecular motors bound to a single cargo. The central question concerns how the details of the stochastic model characterizing individual molecular motors, including their response to applied forces and proclivity to bind or unbind from the microtubule track, are reflected in the effective dynamics of the complex of the cargo bound to multiple molecular motors. The group’s effort has primarily focused thus far on constructing a serious, physically consistent spatial model, and applying stochastic averaging techniques to derive effective equations for the motor-cargo complex from physically and biologically based stochastic dynamical equations for the individual components.

Avanti Athreya, John Fricks, Peter Kramer, and Scott McKinley are currently writing up their work for publication under the title “Cooperative dynamics of kinesin and dynein type molecular motors”. We plan to sustain this collaboration beyond this paper, and to this end, Fricks, Kramer, and McKinley, together with bioengineer William Hancock, have submitted two major grants to the National Science Foundation’s Focused Research Group (FRG) program and the Joint DMS/NIGMS Initiative to Support Research in the Area of Mathematical Biology. The work proposed in these grants would use the mesoscopic modeling developed by the group at SAMSI as a scaffold from which to build a mathematically integrated stochastic model for cellular transport, from the nanoscale to the microscale of the whole cell.

• Stochastic Single Neuron Models:

Several collaborative subgroups worked between the biological stochastic dynamics group and the qualitative behavior working group in applying stochastic analysis techniques to single-neuron models.

Cindy Greenwood together with short-term visitor Laura Sacerdote and collaborator Maria Teresa Giraudo have submitted a paper “How sample paths of Leaky Integrate and Fire models are influenced by the presence of a firing threshold” to Neural Computation. This paper treats the statistics and simulation of sample voltage paths taken by a neuron conditioned on it not having fired (reached threshold) by a certain time.

Avanti Athreya, Badal Joshi, and Xueying Wang considered coherence and self-induced stochastic resonance in Morris-Lecar models for neurons. This project would extend the characterization of these phenomena by Vanden-Eijnden, Muratov, and DeVille in FitzHugh-Nagumo models to the more complex Morris-Lecar model. For a range of parameter values, the single neuron deterministic Morris-Lecar model has a stable fixed point, an unstable limit cycle and a stable limit cycle. Here the stable fixed point corresponds biologically to quiescence of the neuron while the stable limit cycle corresponds to firing an action potential or spiking. It has been shown by Rowat [Neural Computation 19, 1215–1250 (2007)] that noise plays a role in spontaneously switching between the two modes of quiescence and firing.

The group extended this work of Rowat into regimes where the deterministic model has only a stable fixed point but no stable limit cycles. Even in this parameter regime, having noise in the model can result in two modes, quiescent and actively firing, and the noise can cause spontaneous switching between the modes. Thus noise not only plays a role in switching, but introduces a completely new dynamic mode which does not exist in the deterministic model. This has important consequences for predicting the behavior of a neuron when an external current is applied. Along these lines, Avanti Athreya, Scott McKinley, and Ravi Srinivasan examined how to model escape from the stable equilibrium in Morris-Lecar models for a neuron, thinking of it as a “breathing potential” near the fixed point. It seems valid to take a small-noise limit in the Morris-Lecar model because the noise can come from ion channels or from input current, but large deviation approaches don’t seem quite appropriate. Besides internal or channel noise in single neurons, another important source of noise arises in the synapse or the connection between neurons.

In an ongoing project, the subgroup is studying and classifying the range of behaviors of small networks of two or three neurons, connected in a feedforward or a feedback manner. In such small networks, the interaction of synaptic noise and channel noise can give rise to a range of novel and interesting behaviors. The idea is to develop a framework that would help qualitatively understand such small networks, and then possibly extend the insights to larger, more biologically realistic network sizes.

Particular questions being addressed by this subgroup, first by simulation, then to be followed up by analytic work, are: How are the firing patterns generated by the Morris-Lecar and FitzHugh-Nagumo models related when sinusoidal forcing, ion channel noise, and synaptic noise are included in these models? Is there a large deviation cycling effect in the Morris-Lecar model as has been found in the FitzHugh-Nagumo model, and how is this related to neuron firing?

A related question being pursued by some of the neuron subgroup members is inference from the spike activity of two neurons. Suppose we have data in the form of the time points when spikes occur, as is usually depicted in raster plots. A cross-correlogram is a commonly used measure for quantifying the dependence structure, but it has its difficulties in interpretation, for example, in inferring the network architecture from the spiking activity of two neurons. For instance, if the spike times of two neurons are highly correlated, it can be because they are connected directly or indirectly to each other or it can happen because they receive common input. This question of how to distinguish these two cases is being approached from a modeling point of view. Suppose we start with two integrate and fire neurons which receive a common Poisson input. Can we predict the dependence structure in the spiking activity of the two neurons?
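The common-input scenario posed above can be set up in a few lines. The following is a minimal sketch under stated assumptions, not the subgroup's model: two unconnected leaky integrate-and-fire neurons are driven by a shared Poisson input plus independent private Poisson inputs, and a cross-correlogram is computed from the binned spike trains. The function names, the reset-to-zero rule, and all parameter values are hypothetical choices for illustration.

```python
import numpy as np

def simulate_pair(rate_common, rate_private, weight, tau, threshold,
                  dt, n_steps, rng):
    """Two unconnected leaky integrate-and-fire neurons with shared input."""
    v = np.zeros(2)
    spikes = np.zeros((n_steps, 2), dtype=bool)
    for t in range(n_steps):
        common = rng.poisson(rate_common * dt)            # seen by both neurons
        private = rng.poisson(rate_private * dt, size=2)  # one draw per neuron
        v += -v / tau * dt + weight * (common + private)
        fired = v >= threshold
        spikes[t] = fired
        v[fired] = 0.0  # reset after a spike
    return spikes

def cross_correlogram(spikes, max_lag):
    """Count coincidences of neuron 0 at bin t with neuron 1 at bin t + lag."""
    a = spikes[:, 0].astype(float)
    b = spikes[:, 1].astype(float)
    n = len(a)
    lags = np.arange(-max_lag, max_lag + 1)
    counts = np.array([
        np.sum(a[max(0, -k): n - max(0, k)] * b[max(0, k): n - max(0, -k)])
        for k in lags
    ])
    return lags, counts

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spikes = simulate_pair(rate_common=150.0, rate_private=100.0, weight=0.3,
                           tau=0.02, threshold=1.0, dt=0.001,
                           n_steps=20000, rng=rng)
    lags, counts = cross_correlogram(spikes, max_lag=20)
    print(dict(zip(lags.tolist(), counts.tolist())))
```

Varying the ratio of common to private input rate changes how sharply the correlogram peaks near zero lag. Adding a direct synaptic connection to the same sketch would allow the two explanations for correlated spiking to be compared.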

• Microswimming:

A workshop at SAMSI in Fall 2009 on Self-Organization and Multi-Scale Mathematical Modeling of Active Biological Systems has provided many stimulating ideas to the group, with two subgroups pursuing specific projects.

One subgroup studied in particular the paper “Dynamics of Enhanced Tracer Diffusion in Suspensions of Swimming Eukaryotic Microorganisms” by Leptos, Guasto, Gollub, et al. The results of this paper were presented by Gollub at the workshop. Experimental results for the dynamics of tracer particles in a flow generated by a suspension of swimming microorganisms showed a striking self-similar probability distribution for the displacement of a tracer with respect to time increment, which however did not correspond to any known stable law.

The subgroup sought to develop simplified stochastic models for the dynamics of these tracer particles that could explain this observation. This project was eventually abandoned when questions began to arise about how exactly the data were being collected and processed, and the group concluded it was too far removed from the experiment to understand these issues in adequate detail in order to proceed with confidence in modeling.

Another talk, by Michael Shelley on kinetic theories for suspensions of microswimmers, elicited a remark by Peter Mucha that, from his experience with non-living suspensions, a stochastic treatment of diffusion could lead to important differences from the deterministic treatment of diffusion often employed in kinetic theories. The issue of incorporating stochasticity more faithfully in the kinetic theory of Shelley was discussed in the working group, and has since been taken up by Peter Kramer at his home institution together with a collaborator, Patrick Underhill, and a graduate student.

• Modeling Biological Swarms using Nonlinear Diffusions:

Another subgroup was motivated by a different set of talks in the workshop on Self-Organization and Multi-Scale Mathematical Modeling of Active Biological Systems, particularly the one by Andrea Bertozzi on modeling the collective motion of biological organisms, e.g., locust swarms, fish schools, ant track formation, etc. Her approach employs, for example in work with Topaz, a two-dimensional continuum kinetic model with a nonlocal social interaction.

Amarjit Budhiraja, Oliver Diaz and Xin Liu are exploring the use of techniques from the theory of nonlinear diffusions to see if they can capture the key aspects of dynamical feature formation from the phenomenological continuum theories using a more fundamental stochastic particle system with a mean field type interaction. This nonlinear diffusion approach is expected to not only yield flexible and tractable simulation schemes but also give a greater insight into the underlying biological mechanism. The interacting particle models are expected to be related to the continuum models through a Propagation of Chaos limit. Rigorously establishing such hydrodynamic limit theorems and the associated fluctuation theorems is the long term goal of this working group. This work is currently in progress.
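As a rough illustration of what a stochastic particle system with a mean field type interaction looks like in simulation, the sketch below uses the simplest such interaction, a linear attraction of each particle toward the empirical mean, integrated with an Euler-Maruyama step. This interaction, the function name, and the parameter values are assumptions chosen for brevity; the source does not specify the group's interaction kernel.

```python
import numpy as np

def simulate_mean_field(x0, kappa, sigma, dt, n_steps, rng):
    """Euler-Maruyama for dX_i = -kappa (X_i - mean(X)) dt + sigma dW_i.

    Each particle feels the others only through the empirical mean,
    which is the mean-field structure; kappa sets the attraction strength.
    """
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        drift = -kappa * (x - x.mean(axis=0))
        noise = sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
        x = x + drift * dt + noise
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    start = rng.standard_normal((500, 2)) * 3.0  # 500 particles in the plane
    end = simulate_mean_field(start, kappa=1.0, sigma=0.5, dt=0.01,
                              n_steps=1000, rng=rng)
    print("initial spread:", start.std(axis=0))
    print("final spread:  ", end.std(axis=0))
```

As the number of particles grows, the empirical mean fluctuates less, so each particle behaves more like an independent copy of a single nonlinear diffusion; this is the heuristic content of a propagation of chaos limit.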

• Reaction networks:

John Fricks, Badal Joshi, John McSweeney, and Richard Yamada explored a specific genetic regulatory network in the suprachiasmatic nucleus, which is responsible for maintaining circadian rhythms. A detailed stochastic simulation of this network (which takes into account the finite number of molecules involved and the effectively random times at which they interact with each other) exhibits oscillations of concentration levels, a qualitative feature completely missed by a coarse-grained deterministic model for this network (based on mass-action principles). The group is seeking to explain the oscillations observed in the simulation through a coarse-grained theory that incorporates stochastic fluctuations, such as a van Kampen (near-Gaussian) expansion.
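The kind of detailed stochastic simulation referred to above, with finite molecule counts and random reaction times, is commonly carried out with a Gillespie-type algorithm. The sketch below applies it to a toy birth-death reaction, which is an assumption made for illustration and not the suprachiasmatic nucleus network; the function name and rate constants are hypothetical. The toy system does not oscillate, but it shows the copy-number fluctuations around the mass-action steady state k_birth / k_death that a deterministic model omits.

```python
import numpy as np

def gillespie_birth_death(k_birth, k_death, x0, t_end, rng):
    """Exact stochastic simulation of 0 -> X (rate k_birth), X -> 0 (rate k_death * x)."""
    t, x = 0.0, x0
    times, counts = [t], [x]
    while True:
        rates = np.array([k_birth, k_death * x])
        total = rates.sum()
        if total == 0.0:
            break  # no reaction can fire
        # Waiting time to the next reaction is exponential with rate `total`.
        t += rng.exponential(1.0 / total)
        if t > t_end:
            break
        # Choose which reaction fires in proportion to its rate.
        if rng.random() < rates[0] / total:
            x += 1
        else:
            x -= 1
        times.append(t)
        counts.append(x)
    return np.array(times), np.array(counts)

if __name__ == "__main__":
    times, counts = gillespie_birth_death(k_birth=20.0, k_death=1.0, x0=0,
                                          t_end=50.0,
                                          rng=np.random.default_rng(0))
    late = counts[times > 10.0]
    print("mass-action steady state:", 20.0 / 1.0)
    print("stochastic mean and std: ", late.mean(), late.std())
```

The printed mean is taken over reaction events rather than weighted by time, so it is only a rough summary of the late-time behavior.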

FINAL REPORT: SAMSI PROGRAM ON SPACE-TIME ANALYSIS FOR ENVIRONMENTAL MAPPING, EPIDEMIOLOGY AND CLIMATE CHANGE

Edited by Alan Gelfand, Montse Fuentes and Richard Smith

June 9, 2011

Organizers: Noel Cressie (Ohio State), Michael Stein (University of Chicago), Dongchu Sun (University of Missouri), Jim Zidek (University of British Columbia).

Scientific Advisory Committee: Peter Diggle (Lancaster University), Peter Guttorp (University of Washington), Jesper Møller (Aalborg)

Directorate Liaison: Jim Berger (Richard Smith after 07/01/2010)

Local Scientific Coordinators: Montse Fuentes (NCSU), Alan Gelfand (Duke), Richard Smith (UNC).

National Advisory Committee Liaison: Jun Liu (Harvard).

This 12 month SAMSI program focussed on problems encountered in dealing with random space-time fields, both those that arise in nature and those that are used as statistical representations of other processes. The sub-themes of environmental mapping, spatial epidemiology, and climate change are interrelated both in terms of key issues in underlying science and in the statistical and mathematical methodologies needed to address the science. Researchers from statistics, applied mathematics, environmental sciences, epidemiology and meteorology were involved, and the program promoted many opportunities for interdisciplinary, methodological and theoretical research.

The program began with a Summer School on Spatial Statistics (July 28-August 1, 2009, at SAMSI), which featured invited lectures and hands-on tutorials from Sudipto Banerjee (U. Minnesota), Reinhard Furrer (U. Zurich), Doug Nychka (National Center for Atmospheric Research) and Stephen Sain (National Center for Atmospheric Research).

The Opening Workshop took place from September 13–16, 2009, and featured the following tutorial lectures:

• Alan Gelfand, Duke University: Hierarchical Modeling for Analyzing Space and Space-time Data

• Jim Zidek, University of British Columbia: Designing Monitoring Networks

• Michael Stein, University of Chicago: Asymptotics for Spatial and Spatial-temporal Data

• Sylvia Richardson, Imperial College London: Introduction to Spatial Epidemiology

These were followed by a workshop that included five invited paper sessions on “Climate Change”, “Random Fields and Applications”, “Spatial Epidemiology”, “Spatial and Time-space Point Processes” and “Interface of Deterministic Modeling and Space-time Statistics”, as well as two New Researcher sessions.

On Thursday, September 17, 2009, SAMSI was host to a visit from several scientists at the National Climatic Data Center (NCDC). This included presentations from Matt Menne and John Bates of NCDC.

Other workshops organized in connection with the program were:

• GEOMED 2009 workshop (SAMSI was joint organizer), at the Medical University of South Carolina, Nov. 14–16, 2009

• Climate Change Workshop, at SAMSI, February 17-19, 2010

• One-day Workshop on Objective Bayesian for Spatial and Temporal Models, in San Antonio, TX, March 20-21, 2010

• Statistical Aspects of Environmental Risk, at SAMSI, April 7-9, 2010

• Transition Workshop, at SAMSI, October 11-13, 2010

Three graduate level courses were taught as part of the program:

• Spatial Epidemiology (Fall);

• Theory of Continuous Space and Space-Time Processes (Fall);

• Spatial Statistics in Climate, Ecology and Atmospherics (Spring).

As usual in SAMSI programs, the bulk of the research activity centered around the Working Groups. Nine Working Groups were formed, and the remainder of this report is concerned with reporting the research activities of each of these groups.

1 WG1: Paleoclimate

1.1 Goals and Outcomes of the Working group

A major challenge in climate work is the fact that surface temperatures before about 1850 can only be indirectly inferred from temperature-sensitive geological proxy data. The same is true of other climate variables, such as precipitation and the pressure field. An understanding of the climate of the past is necessary in order to understand the natural variability of the system, and to place observed changes in the system over the last century, and changes predicted for the future, in a broader context. Estimates of the past evolution of climate variables such as the surface temperature space-time field, along with estimates of the associated uncertainty, can be used to assess the extent to which currently observed changes are unusual with respect to the natural variability of the system, and to make probabilistic assessments about dynamical hypotheses of how climate varies and changes.

While much work has been done on this sort of problem, there remains room for substantial improvement in the statistical techniques used to reconstruct past climate from proxy data. The working group has three main objectives within which there are many sub-objectives:

1. Develop new and/or improve existing statistical methods for three different reconstruction problems, where each will involve combining incomplete paleoclimatic data and instrumental observations:

(a) The reconstruction of a univariate climate field, such as temperature.

(b) The reconstruction of a climate index, such as ENSO or the AO, that has major impacts on weather and climate on a global scale.

(c) The reconstruction of a multivariate climate field, such as temperature and precipitation.

2. Test these methods against existing ones and objectively assess their performance.

3. Produce new paleoclimatic analyses for the last several hundred to several thousand years, beginning with surface temperatures over different spatial domains.

1.2 Summary of working group Activities: 2009 Sep–2010 October

1.2.1 Membership

Core members: Peter F. Craigmile (Ohio State); Murali Haran (Penn State); Bo Li (Purdue University); Elizabeth Mannshardt-Shamseldin (Duke); Martin P. Tingley (SAMSI postdoc, now at NCAR); Bala Rajaratnam (Stanford, chair of working group).

All the above were in residence during Fall 2009.

Other members: Priscilla Greenwood (Arizona State); Rajib Paul (Western Michigan)

1.2.2 Summary of administrative working of group activities

• Standard Meeting Time: Thursday 1pm EST

• Weekly meetings during the academic year 2009-2010 and now biweekly meetings

• First semester Activities (“soul searching”): 1) Presentations from both statisticians and environmental/geoscientists and consultations with experts. 2) Identification of problem areas deemed to be important to the paleoclimate community. 3) Exploration of the feasibility of studying problems that were identified

• Second semester (and Summer) Activities: 1) Preliminary discussions of joint research work

I. Paleoclimate exploratory data analysis

II. Discussions of a theoretical nature and presentation of whether in the paleoclimate context one should use separable covariance functions or not


– Contributions at Climate Change workshop (February, 2010)

– Development and Writing of paper that is currently in review

– JSM (2010) representation: Joint session of paleoclimate and extremes working groups

1.2.3 Paleoclimate Working Group Research Output

The group has been extremely productive and there is a tangible, approximately 100 page research output from the working group in the following paper:

Tingley, M.P., Craigmile, P.F., Haran, M., Li, B., Mannshardt-Shamseldin, E. and Rajaratnam, B., “Piecing together the past: Statistical insights into paleoclimatic reconstructions”. Under review at Journal of Climate.

Available at http://statistics.stanford.edu/~ckirby/techreports/GEN/2010/2010-09.pdf

The paper identifies all the issues (as we see them) in the language of modern day statistics. The paper builds on recent work in the paleoclimate literature using Bayesian hierarchical models and gives a unifying framework. The “grand goal is to move paleoclimate reconstructions away from simple regression based approaches towards hierarchical models.” The aims of the paper are:

1. Establish a modeling and notational framework that is transparent to both the earth science and statistics communities.

2. Outline and distinguish between scientific and statistical challenges and indicate where modern statistics can contribute.

3. Offer some suggestions for model construction and explain how to perform the required statistical inference.

4. Identify issues that are important to both the earth science and applied statistics communities,and encourage greater collaboration between the two.

1.2.4 Other Research Activities

Craigmile, Peter and Bala Rajaratnam. DISCUSSION OF: A STATISTICAL ANALYSIS OF MULTIPLE TEMPERATURE PROXIES: ARE RECONSTRUCTIONS OF SURFACE TEMPERATURES OVER THE LAST 1000 YEARS RELIABLE? The Annals of Applied Statistics 2011, Vol. 5, No. 1, 88–90.

Haran, Murali and Nathan M. Urban. DISCUSSION OF: A STATISTICAL ANALYSIS OF MULTIPLE TEMPERATURE PROXIES: ARE RECONSTRUCTIONS OF SURFACE TEMPERATURES OVER THE LAST 1000 YEARS RELIABLE? The Annals of Applied Statistics 2011, Vol. 5, No. 1, 61–64.

Nychka, Douglas and Bo Li. DISCUSSION OF: A STATISTICAL ANALYSIS OF MULTIPLE TEMPERATURE PROXIES: ARE RECONSTRUCTIONS OF SURFACE TEMPERATURES OVER THE LAST 1000 YEARS RELIABLE? The Annals of Applied Statistics 2011, Vol. 5, No. 1, 80–82.


Smith, R.L. (2010), Understanding sensitivities in paleoclimate reconstructions. Submitted for publication.

Tingley, Martin P. SPURIOUS PREDICTIONS WITH RANDOM TIME SERIES: THE LASSO IN THE CONTEXT OF PALEOCLIMATIC RECONSTRUCTIONS. DISCUSSION OF: A STATISTICAL ANALYSIS OF MULTIPLE TEMPERATURE PROXIES: ARE RECONSTRUCTIONS OF SURFACE TEMPERATURES OVER THE LAST 1000 YEARS RELIABLE? The Annals of Applied Statistics 2011, Vol. 5, No. 1, 83–87.

Tingley, M.P. and Li, B. (2011), Comments on “Reconstructing the NH mean temperature: Can underestimation of trends and variability be avoided?” by Bo Christiansen; under review at Journal of Climate.

1.3 List of Working group members

Mark Berliner, Ohio State University
K Sham Bhat, Penn State
Peter Craigmile, Ohio State University
Noel Cressie, Ohio State University
Cindy Greenwood, Arizona State University
Michele Guindani, University of New Mexico
Murali Haran, Penn State
John Harlim, Courant Institute/NCSU
Gardar Johannesson, Lawrence Livermore National Laboratory
Karen Kafadar, Indiana University
Bo Li, Purdue
Ernst Linder, University of New Hampshire
Elizabeth Mannshardt-Shamseldin, SAMSI/Duke
Doug Nychka, NCAR
Garritt Page, Duke
Rajib Paul, Western Michigan University
Bala Rajaratnam, Stanford
Brian Reich, NCSU
Steven Roberts, Australian National University
Julia Salzman, Stanford
Jenise Swall, EPA
Richard Smith, UNC/SAMSI
Carolyn Snyder, Stanford
Martin Tingley, SAMSI/NCAR
Saeid Yasamin, SAMSI
Ian Wong, Stanford
Jun Zhang, SAMSI


2 WG2: Spatial Exposures and Health Effects (Montse Fuentes, Amy Herring and Brian Reich)

2.1 Summary

This working group developed and implemented spatial-temporal statistical models to study the impact of air pollution on human health under climate change conditions. We introduced Bayesian multivariate spatial dependence structures in the environmental stressors, climatic variables, and health outcomes, while taking into account different sources of uncertainty in models and data. We developed novel spatial quantile regression models for the climatic and pollution variables for better characterization of extremes, tail behavior, and complex dependences between weather and pollution. We developed Bayesian hierarchical shrinkage methods for assessing spatial associations between complex pollutant mixtures and health outcomes. We improved upon existing approaches by simultaneously accounting for different pollutant types, such as ozone and particulate matter (PM) or speciated PM, characterizing the spatial temporal structure of the susceptible periods of fetal development (for pregnancy outcomes) and the exposure lag (for mortality outcome), while taking into account different sources of uncertainty in models and data.

This work has led to several papers, the submission of an NIH proposal, and several conference presentations.

NIH Proposal: Space-time Modeling for Linking Climate Change, Pollutant Exposure, Built Environments, and Health Outcomes. PI: Montserrat Fuentes; co-PIs: Amy Herring, Alan Gelfand, Brian Reich and Marie Lynn Miranda. $1.1M, NIH R-01, submitted.

For the remainder of the report of this working group, we describe the outcomes of several individual projects that were conducted as part of the working group’s activities.

2.2 Estimating Individual Activity Spaces (Leader: Catherine Calder)

In this collaboration with sociologists and geographers, we are developing methods to estimate individual activity spaces, or the space-time context for human exposures, and examining the consequences of exposure to violence and disadvantage on health and behavioral outcomes.

Papers in preparation:

• Calder, C.A. and Darnieder, W.F. Bayesian Inference for Incomplete Marked Spatial Point Patterns.

• Krivo, L.J., Washington, H.M., Peterson, R.D., Browning, C.R., Calder, C.A., and Kwan, M.-P. Social Isolation of Disadvantage and Advantage: The Reproduction of Inequality in Urban Space.

Presentations:

• C.A. Calder, “Bayesian Estimation of Individual Activity Spaces from Incomplete Activity Pattern Data”. Conference on the Dynamics of Space-Time Use, The Ohio State University, Columbus, OH, October 2-3, 2009


• C.A. Calder, “Bayesian Inference for Incomplete Marked Spatial Point Patterns: Estimating Individual Activity Spaces”. SAMSI Program on Space-Time Analysis Closing Workshop, October 11-13, 2010

Posters:

• “Bayesian Inference for Incomplete Marked Spatial Point Patterns” by C.A. Calder. Presented at Valencia/ISBA Meeting, Benidorm, Spain, SAMSI Program on Space-Time Analysis Closing Workshop, June 3–8, 2010.

Grants: NSF and NIH

2.3 Effects of Air Pollution Exposure on Cardiovascular Health (Leader: Catherine Calder)

Using spatially-referenced longitudinal data collected as part of the Dallas Heart Study, we are estimating the effects of exposure to particulate matter on cardiovascular health.

Grants: in preparation for the NIH

2.4 Identifying Critical Windows in Studies of Air Pollution and Preterm Births (Josh Warren, Montserrat Fuentes, Amy Herring)

We developed a model for examining the relationship between exposure to PM2.5 and ozone and the probability of preterm birth, with a focus on identifying critical windows of pregnancy during which increased exposure to these pollutants may be particularly dangerous. Our methods have been applied to data from Texas vital records, are currently being applied to a Texas birth defects registry, and will soon be applied to the National Birth Defects Prevention Study, a ten-state study that is the largest study of the etiology of birth defects ever conducted. We are currently working on extensions of these methods that relax some distributional assumptions and that look at multivariate outcomes.

Paper:

• Warren, J., Fuentes, M., Herring, A., and Langlois, P. Air Pollution and Preterm Birth:Identifying Critical Windows of Exposure. Under Review.

2.5 Health Impacts of Future Ambient Ozone Levels due to Climate Change (Howard Chang, Jingwen Zhou, Montserrat Fuentes)

To quantify the health impacts of future air pollution due to climate change, we have developed a modeling framework that integrates data from climate model outputs, historical meteorology and ozone observations, and a health surveillance database. We applied this approach to examine the risks of future ozone levels on non-accidental mortality across 19 urban communities in Southeastern United States. We are currently extending the model to address the following statistical challenges: calibrating multivariate climate model outputs, spatial misalignment, and joint analysis of air quality and temperature.

Papers:


• Chang, H.H., Zhou, J. and Fuentes, M. (2010). Impact of Climate Change on Ambient Ozone Level and Mortality in Southeastern United States. International Journal of Environmental Research and Public Health 7: 2866-2880.

• Zhou, J. and Fuentes, M. Calibration of air quality deterministic models using nonparametric spatial density functions. In preparation.

Talks:

• Impact of climate change on ambient ozone level and mortality in Southeastern United States. Multi-pollutant Multi-city Working Group, US Environmental Protection Agency, June 2010. Durham, NC

• Impact of climate change on ambient ozone level and mortality in Southeastern United States. SAMSI Workshop on Statistical Aspects of Environmental Risk, April, 2010.

2.6 Time-to-event Analysis of Air Pollution and Preterm Birth (Howard Chang, Brian Reich, Marie Lynn Miranda)

Most studies of preterm birth use a binary indicator of preterm birth as the response. We propose a flexible spatial survival model to estimate the effect of environmental exposure on the risk of preterm birth. We view gestational age as time-to-event data where each pregnancy enters the risk set at a pre-specified time (e.g. the 32nd week). The pregnancy is then followed until either (1) a birth occurs before the 37th week (preterm); or (2) it reaches the 37th week and a full-term birth is expected. Our simulation study shows that this is a more powerful study design than more common time series and case-control designs. Also, by analyzing these data as survival data, we are able to study effects of both acute and long-term exposures jointly. The proposed approach is applied to geocoded births during 2002-2007 in North Carolina and we report an increased risk of preterm birth associated with exposure to fine particulate matter during the first and second trimesters.
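The risk-set construction described above can be illustrated with a small sketch. This is not the authors' implementation: it shows one common way to expand a pregnancy into discrete person-week records, under the assumption that gestational age is recorded in completed weeks. The function name `person_weeks`, its parameters, and the input format are hypothetical.

```python
def person_weeks(gestational_week, entry_week=32, term_week=37):
    """Expand one pregnancy into (week, event) records.

    gestational_week: completed week of gestation at birth (assumed input format).
    A pregnancy enters the risk set at entry_week and is followed until a
    birth before term_week (event = 1) or until it reaches term_week
    (treated as censored, so every record has event = 0).
    """
    if gestational_week < entry_week:
        # Birth occurred before the pregnancy entered the risk set.
        return []
    last_week = min(gestational_week, term_week - 1)
    records = []
    for week in range(entry_week, last_week + 1):
        is_preterm_birth = week == gestational_week and gestational_week < term_week
        records.append((week, 1 if is_preterm_birth else 0))
    return records


# A birth in week 34 contributes three at-risk weeks, the last with an event.
print(person_weeks(34))
# A full-term birth in week 39 contributes weeks 32 to 36 with no event.
print(person_weeks(39))
```

Records of this form can then be joined with week-specific exposure values, which is what allows acute and long-term exposures to be studied in one model.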

Papers:

• Chang, H.H., Reich, B.J., and Miranda, M.L. Spatial time-to-event analysis of fine particulate matter and preterm births. In preparation.

• Chang, H.H., Reich, B.J., and Miranda, M.L. Fine particle air pollution and preterm birth in North Carolina, 2003-2007. In preparation.

Talks:

• Fine particle air pollution and preterm birth in North Carolina, 2001-2005. Third North American Congress of Epidemiology, June 2011, Montreal, Canada

• Spatial time-to-event analysis of fine particulate matter and preterm births. Summer Research Conference, Southern Regional Council on Statistics, June 2010. Virginia Beach VA.

• Time-to-event analysis of preterm birth and fine particulate matter. Joint Statistical Meeting, August 2010. Vancouver Canada.


• Time-to-event analysis of preterm birth and fine particulate matter. Environmental Statistics Seminar, Harvard Department of Biostatistics, August 2010. Boston MA.

2.7 Exposure Measurement Error in Time Series Studies of Air Pollution and Health (Howard Chang, Montserrat Fuentes)

In a time series design, the health outcome is only available as the daily total number of adverse events in a community. Unbiased risk estimates require the exposure measure to coincide with the true average exposure experienced by all at-risk individuals in the community. We have examined the effects of exposure measurement error due to spatial heterogeneity in the pollutant concentrations, motivated by the health effects of coarse particulate matter. We are currently developing models to incorporate spatial distribution of the at-risk population and personal exposure simulations in the time series design. The proposed approach is used to study the association between personal exposure to fine particulate matter and mortality in the New York City metropolitan area.

Paper:

• Chang, H.H., Peng, R.D., and Dominici, F. Estimating the acute health effects of coarse particulate matter accounting for exposure measurement error. Under Revision.

Talk:

• Estimating the acute health effects of coarse particulate matter accounting for exposure measurement error. ENAR, 2010. New Orleans, LA.

2.8 Additional Activities

• Invited JABES highlight session at JSM 2011.

• 2011 ENAR short course: A Practical Introduction to the Analysis of Geocoded and Areal Health Data.

• Collaboration with EPA on a national multi-site time series analysis of daily mortality and chemical constituents of fine particulate matter. Paper in preparation:

Chang HH, Baxter LK, Fuentes M, and Neas LM. Daily mortality and chemical constituents of fine particle air pollution.

3 WG3: Interaction of Deterministic and Stochastic Models (Co-leaders: Murali Haran, Pennsylvania State University, and M.J. Bayarri, University of Valencia)

3.1 Overview

Core group members: M. J. Bayarri, University of Valencia, Spain; Jim Berger, Duke; Murali Haran, Penn State; John Harlim, NC State (from SAMSI stochastic dynamics program); Emily Kang, SAMSI postdoc; Peter Kramer, RPI (from SAMSI stochastic dynamics program); Hans Kuensch, ETH Zurich, Switzerland.

Other key members: Sham Bhat, Penn State; Anabel Forte, University of Valencia, Spain; Robert Wolpert, Duke; Danilo Lopes, Duke; Priscilla Greenwood, Arizona State.

The goal of our group was to work on statistical issues in complex deterministic and stochastic models. We identified the following potential areas:

1. Approximations, emulation and calibration for complex computer models, typically Gaussian process-based models, and inference for stochastic differential equations.

2. Bridging the gap between the “usual” statistics idea of spatial modeling (accounting for dependence) and capturing spatial dynamic processes that are of scientific interest, which we envisioned as a potential avenue for interactions between the spatial program and the SAMSI stochastic dynamics program.

3. Using deterministic models/process to construct flexible space-time models.

4. Deterministic models as surrogate for probability/process model (“computer likelihoods”).

5. Embedding deterministic models within a hierarchical framework to learn about parameters of interest.

6. Developing efficient computational approaches for simulating from multiscale models.

This working group was unique among the SAMSI space-time modeling working groups of 2009–2010 in that the research involved a close collaboration between two SAMSI programs in 2009–2010, the stochastic dynamics program, predominantly involving applied mathematicians, and the spatial program, primarily involving statisticians. Two applied mathematicians, Peter Kramer from Rensselaer Polytechnic Institute, and John Harlim from North Carolina State University, have been core members of the group, and have driven the research problem definition in conjunction with the statisticians. The focus of this collaboration was work in multi-scale dynamical systems; to our knowledge this is the first such collaboration between statisticians and applied mathematicians.

Summary of working group activities: The fall 2009 semester involved several research talks and discussions about potential projects by group members and collaborators of group members. Several of these problems included specific scientific applications:

(i) Modeling/inference for a component model for alcohol usage,

(ii) Dengue fever modeling based on a data set from Peru,

(iii) Exploring mathematics underlying a class of spatial dynamics models.

(iv) Studying multiscale dynamical systems or heterogeneous multiscale methods (HMMs).

(v) Emulation/calibration for complex computer models for multivariate spatial processes.

(vi) Bayesian model averaging of climate models for temperature projections.


Our group focused on projects (iv)-(vi); several related projects have either been completed or are ongoing projects involving the working group members. Projects (i)-(iii) represent future avenues for research. Several of our research project proposals will continue through our recently formed collaborations. Others may find their home in future SAMSI programs. In particular, many of the complex computer models problems may fit nicely into the “Uncertainty Quantification” SAMSI program in 2011.

3.2 Specific Activities

3.2.1 Multiscale modeling and simulation, spatio-temporal filtering

Project (iv) on multiscale modeling has been a focal point of research discussions. A core group of working group members listed above is meeting biweekly to discuss multiscale simulation/computation (HMM)-related research. HMMs are used for simulating physical processes that have both “macroscale” and “microscale” dynamics. This has become an active research area in applied mathematics and scientific computing in the past decade and has many important scientific applications, for example modeling complex fluid dynamics and climate systems. This appears to be a rich and still unexplored area of research for statisticians. Hence, a significant outcome of our working group’s activities is that we have opened up possibilities for a new and potentially fruitful line of research.

Research manuscripts are planned and the group intends to submit multi-institutional grant proposals to the National Science Foundation, among other funding sources. The group is working closely with Sorin Mitran, an applied mathematician and participant in the stochastic dynamics program at SAMSI, who is providing the specific examples and associated computer code and data which will act as a testbed for the new methods we develop.

Research manuscripts

J. Harlim, Numerical Strategies for Filtering Partially Observed Stiff Stochastic Differential Equations, J. Comput. Phys., 230(3), 744-762, 2011.

E.L. Kang and J. Harlim, A Fast Filtering Framework for Assimilating Partially Observed Multiscale Systems: The Macro-Micro-Filter, submitted to Mon. Wea. Rev., 2010.

3.2.2 Complex computer models with multivariate spatial output

Project (v) above, on Gaussian process models for computer model emulation and calibration, was another area of focus for the group.

Research manuscripts:

Bhat, K. S., Haran, M., Goes, M. (2010) Computer model calibration with multivariate spatial output: a case study in climate parameter learning. In Frontiers of Statistical Decision Making and Bayesian Analysis (Ming-Hui Chen, Dipak K. Dey, Peter Mueller, Dongchu Sun, and Keying Ye, eds.). Springer

Bhat, K. S., Haran, M., Keller, K., Tonkonojenkov, R. (2010) Inferring likelihoods and climate system characteristics from climate models and multiple tracers, submitted.


M.J. Bayarri (2010). A methodological review of computer models. In Frontiers of Statistical Decision Making and Bayesian Analysis (Ming-Hui Chen, Dipak K. Dey, Peter Mueller, Dongchu Sun, and Keying Ye, eds.). Springer, pp 157-168.

Reichert, P., White, G., Bayarri, M.J., Pitman, E.B. (2011). Mechanism-based emulation of dynamic simulation models: concept and application in hydrology. Computational Statistics and Data Analysis (in press).

Submitted grant proposal:

P. Kramer (RPI), Statistical Computing Methodologies in Dynamical Network and Multiscale Applications, co-PI K. Bennett; submitted to National Science Foundation, Division of Mathematical Sciences in December 2010; under review.

3.2.3 Bayesian model averaging for climate projections

Bhat, K. S., Haran, M., Keller, K. (2010) Combining multiple climate models using Bayesian model averaging to project surface temperature, in preparation.

4 WG4: Computation, Visualization, and Dimension Reduction (Frank Zou, NISS; Scott Holan, University of Missouri and Noel Cressie, Ohio State)

Our Working Group (WG) consists of approximately 60 members, though some are not active. It has approximately 10 postdocs and 8 graduate students.

Active postdocs: Sourish Das, Emily Kang, Esther Salazar, Ben Shaby, Xia Wang, and Frank Zou

Active graduate students: Sham Bhat, Dorit Hammerling, Matthias Katzfuss, and Yajun Liu

The WG has attracted several outside members, including: Kate Cowles, Hedibert Lopes, and Brian Smith

Meetings started on 9/25/09, with an average attendance of 14 (on site) and 10 (remote).

4.1 Schedule of meetings (September 2009 through March 2010)

9/25/09. Introductory/Organizational meeting

10/2/09. Two presentations, on high performance computing and dimension reduction: Andrew Finley (Michigan State University), Peter Kramer (Rensselaer Polytechnic Institute).

10/9/09. Two presentations, on parallel computing and S4 classes in R: Ginger Davis (University of Virginia) and Blair Christian (EPA)

10/16/09. Two presentations, on regional climate models and spatially varying coefficient models: Ernst Linder (University of New Hampshire) and Scott Holan (University of Missouri, Columbia)


10/23/09. Two presentations, on covariance tapering and petroleum applications: Jo Eidsvik (Norwegian University of Science and Technology) and Orietta Nicolis (University of Bergamo, Italy)

10/30/09. One presentation, on modeling dynamic controls on ice streams: Noel Cressie (The Ohio State University)

11/06/09. Two presentations, on parallel computing and hierarchical spatial process models: Virgilio Gomez-Rubio (Universidad de Castilla-La Mancha) and Sudipto Banerjee (University of Minnesota)

11/13/09. Two presentations on RAMPS, an R package for complex spatio-temporal data: Brian Smith (University of Iowa) and Kate Cowles (University of Iowa).

11/20/09. Two presentations, on independent component analysis for multivariate time series and Monte Carlo strategies for calibration in climate models: David Matteson (Cornell University) and Gabriel Huerta (University of New Mexico)

12/04/09. Two presentations, on ancillary sufficient interweaving scheme and Monte Carlo strategies for calibration in climate models: Yajun Liu (University of Missouri) and Murali Haran (Penn State)

12/11/09. One presentation, on understanding covariance tapering: Ben Shaby (SAMSI)

1/15/10. Two presentations, on spatio-temporal statistics and models that include covariates in covariance structures: Noel Cressie (Ohio State) and Alexandra Schmidt (Universidade Federal do Rio de Janeiro)

1/22/10. One presentation, on spatio-temporal analysis of total nitrate concentrations using dynamic statistical models: Sujit Ghosh (North Carolina State)

1/29/10. Two presentations, on spatial dynamic factor analysis and using temporal variability to improve spatial mapping with application to satellite data: Hedibert Lopes (University of Chicago) and Emily Kang (SAMSI)

2/5/10. Two presentations, on dynamic spatial models and Bayesian methods in syndromic surveillance: Dani Gamerman (Universidade Federal do Rio de Janeiro) and Frank Zou (NISS).

2/12/10. One presentation, on space-time modeling with Markov random fields, plus subgroup updates: Marco Ferreira (University of Missouri, Columbia)

2/26/10. One presentation on Ensemble Kalman Filters, plus subgroup updates: Hans Kuensch (Swiss Federal Institute of Technology)

3/12/10. One presentation, on surveillance for detecting emergent space-time clusters, plus subgroup updates: Renato Assuncao (Universidade Federal de Minas Gerais)

3/26/10. One presentation, on spatially varying autoregressive processes, plus subgroup updates: Bruno Sanso (University of California Santa Cruz).


In addition to the above, visits were paid to the CICS group at the National Climatic Data Center (NCDC) by Scott Holan (12/08/09) and by Bruno Sanso and Richard Smith (03/25/10). At these meetings, the potential was discussed for applying the new statistical techniques to new spatio-temporal datasets from NCDC, though it was subsequently decided to concentrate instead on regional climate model output from the NARCCAP group at the National Center for Atmospheric Research.

In May, 2010, the group split into four subgroups that subsequently met separately. The work of these subgroups will now be described.

4.2 Covariance Decomposition Using Wavelets (Leader: Orietta Nicolis)

The activities of the working group began by examining relevant papers on methods for reducing the rank of large covariance matrices. Initially, their attention was directed toward methods based on multiresolution techniques and wavelet functions. In particular, each member gave a presentation and wrote a summary of relevant papers.

In this context several issues were discussed, including:

1. Which wavelets are better than others at approximating spatial covariance functions.

2. How the computational burden can be reduced using reduced/fixed rank covariances.

3. How to extend these methods using a Bayesian approach.

4. How methods based on spatial multiresolution covariances can be extended to the spatio-temporal domain.

The group decided to propose a covariance model (based on wavelet functions) for analyzing satellite data. They considered the dataset of Aerosol and SO2 from the NASA website:

http://mirador.gsfc.nasa.gov/index.shtml.

The aim was to propose a spatio-temporal model for analyzing the ash plume of the Iceland volcano during the month of April 2010.

The group realized that the data needed pre-processing before being used and found several problems. The group decided to use simulated data, to start, and has ongoing meetings to discuss their progress.

4.3 Spatio-Temporal Point Processes (Leader: Noel Cressie)

Noel Cressie led the SG on spatio-temporal point processes. Their first activity was to carry out a literature survey. The group read and wrote summaries of about 20 papers, and has posted the summaries on “SG2 Central,” the subgroup website. After the literature survey, the group looked for an appropriate space-time dataset to analyze. The group eventually decided to use the NARCCAP dataset and to model the pixels of extreme rainfall. In fact, such thresholded pixels are better modeled as a random set whose “germs” are generated by a type of autoregressive (in time) Poisson spatial point process. Random sets are defined by their hitting functions and, as of this writing, the SG is calculating the hitting function of the proposed spatio-temporal set model.


The subgroup’s intention is to publish a joint paper. They communicate by telecon at a regular time (10am Mon) most weeks, and the minutes of the meetings are posted on SG2 Central.

Submitted paper:

N. Cressie, R. Assuncao, S.H. Holan, M. Levine, O. Nicolis, J. Zhang and J. Zou (2011), Dynamical random-set modeling of concentrated precipitation in North America. Technical Report No. 854, March, 2011, Department of Statistics, The Ohio State University, Columbus, OH. Submitted to Statistics and Its Interface.

4.4 Spatial and Spatio-Temporal Classes in R (Leaders: Kate Cowles and Brian Smith)

Brian Smith developed an R class of spatial objects that can be used with both geostatistical models and conditional autoregressive models.

Brian Smith developed the R package “magma” that enables parallelized matrix operations to be carried out on graphical processing units (http://cran.r-project.org/package=magma).

Kate Cowles and summer students beta tested and benchmarked the “magma” package and found speed-ups of up to 40 fold for operations related to spatial data analysis.

Kate Cowles identified relevant funding sources (from both government and industry) for research in spatiotemporal modeling and computation.

4.5 Space-time models for regional climate model data (Leader: Bruno Sanso)

Subgroup 4, which eventually reduced to Xia Wang (NISS), Andrew Finley (MSU), Ingelin Steinsland (NTNU), Dorit Hammerling (UMich), Esther Salazar (Duke), Paul Delamater (MSU), and Bruno Sanso (UCSC), has been meeting regularly to work on space-time models for regional climate output. The following paper has been submitted for publication:

Salazar, E., Sanso, B., Finley, A.O., Hammerling, D., Steinsland, I., Wang, X. and Delamater, P. (2011), Comparing and Blending Regional Climate Model Predictions for the American Southwest.

Bruno Sanso presented a talk based on this paper at the SAMSI spatial transition workshop as well as at the Workshop on Environmetrics in Boulder, CO, 10/11-13/2010. Dorit Hammerling presented a poster based on this paper at the 3rd NARCCAP user workshop, April 7-8 2011, Boulder, CO.

4.6 Syndromic Surveillance

Manuscripts:

Zou, J., Karr, A., Heaton, M., Banks, D., Datta, G., Lynch, J. and Vera, F. (2010), Bayesian Methodology for Spatio-Temporal Syndromic Surveillance. Submitted to Statistical Analysis and Data Mining.

Heaton, M., Banks, D., Zou, J., Karr, A., Datta, G., Lynch, J. and Vera, F. (2011), A Spatio-Temporal Absorbing State Model for Disease and Syndromic Surveillance. Submitted to Statistics in Medicine.


Presentations and Posters:

Bayesian Methods in Syndromic Surveillance, QMDNS Meeting, Fairfax, VA, May 2010.

Bayesian Methods in Syndromic Surveillance, DTRA/NSF Algorithm Workshop, Chapel Hill, NC, June 2010.

Modeling the Spatio-temporal Dynamics of Risk with Application to Disease and Syndromic Surveillance, DTRA/NSF Algorithm Workshop, Chapel Hill, NC, June 2010.

Hierarchical Bayesian modeling in syndromic surveillance, ISBA World Meeting, Benidorm, Spain, June 2010.

5 WG5: Spatial Extremes (Richard Smith)

Membership:

Juliette Blanchet (WSL-Institut fur Schnee, Switzerland)
Lisha Chen (Yale)
James Christian (EPA)
Dan Cooley (Colorado State)
Peter Craigmile (Ohio State)
Sourish Das (Duke/SAMSI)
Anthony Davison (EPFL, Switzerland)
Robert Erhardt (UNC)
Alan Gelfand (Duke)
Souparno Ghosh (Texas A&M)
Radu Herbei (Ohio State)
David Holland (EPA)
Scott Holan (Univ of Missouri)
Gabriel Huerta (Univ of New Mexico)
Janine Illian (Univ of St. Andrews)
Soyoung Jeon (UNC)
Matthias Katzfuss (Ohio State)
Linyuan Li (Univ of New Hampshire)
Kenny Lopiano (Univ of Florida)
Jim Lynch (Univ of South Carolina)
Elizabeth Mannshardt-Shamseldin (Duke/SAMSI)
Garritt Page (Duke)
Abel Rodriguez (UC Santa Cruz)
Huiyan Sang (Texas A&M)
Bruno Sanso (UC Santa Cruz)
Marian Scott (Univ of Glasgow, U.K.)
Benjamin Shaby (SAMSI)
Richard Smith (UNC/SAMSI)
Martin Tingley (SAMSI/NCAR)
Anand Vidyashankar (Cornell)
Maria Franco Villoria (Univ Glasgow, U.K.)
Jianqiang Wang (NISS)
Robert Wolpert (Duke)
Linda Young (Univ of Florida)
Jun Zhang (SAMSI)
Zhengjun Zhang (Univ of Wisconsin)
James Zidek (University of British Columbia)

Classical (univariate) extreme value theory revolves around the class of limiting distributions for sample extremes, which may be summarized by the Generalized Extreme Value (GEV) distribution

$$G(x;\, \mu, \psi, \xi) = \exp\left\{ -\left( 1 + \xi\, \frac{x - \mu}{\psi} \right)_+^{-1/\xi} \right\}, \qquad (y_+ = \max(y, 0)).$$

The extension to higher dimensions, known as Multivariate Extreme Value Theory, leads to multivariate distributions with the property that for any dimension $K \ge 1$ and for each $N \ge 1$ there exist constants $\{A_{Nk} > 0,\ B_{Nk},\ 1 \le k \le K\}$ such that

$$G^N(x_1, \ldots, x_K) = G(A_{N1} x_1 + B_{N1}, \ldots, A_{NK} x_K + B_{NK}).$$

Distributions satisfying this property are known as Max-Stable Distributions.
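As a numerical illustration of the max-stability property in the univariate case (K = 1), the following minimal sketch evaluates the GEV distribution function and checks that G^N(x) = G(A_N x + B_N). The constants A_N = N^(-ξ) and B_N = (1 - N^(-ξ))(μ - ψ/ξ) are derived here by matching terms in the GEV formula for ξ ≠ 0; they are not stated in the report, and the function names are hypothetical.

```python
import math


def gev_cdf(x, mu, psi, xi):
    """GEV distribution function G(x; mu, psi, xi), for xi != 0 and psi > 0."""
    t = 1.0 + xi * (x - mu) / psi
    if t <= 0.0:
        # Outside the support: below the lower endpoint when xi > 0,
        # above the upper endpoint when xi < 0.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-(t ** (-1.0 / xi)))


def max_stable_constants(n, mu, psi, xi):
    """Constants (A_N, B_N) with G(x)^N = G(A_N * x + B_N); derived, xi != 0."""
    a = n ** (-xi)
    b = (1.0 - a) * (mu - psi / xi)
    return a, b


mu, psi, xi, n, x = 1.0, 2.0, 0.3, 5, 3.0
a, b = max_stable_constants(n, mu, psi, xi)
lhs = gev_cdf(x, mu, psi, xi) ** n      # distribution of the maximum of n draws
rhs = gev_cdf(a * x + b, mu, psi, xi)   # same GEV at a rescaled argument
print(abs(lhs - rhs) < 1e-12)
```

The check works because raising G to the power N multiplies the exponent by N, which can be absorbed into an affine change of the argument.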

In studying spatial extremes, much attention recently has focussed on a class of processes known as Max-Stable Processes: a stochastic process over some index set (typically space or time) such that all the finite-dimensional distributions are max-stable.

In recent years, a number of specific examples of max-stable processes have been examined, including the “Smith model” (originally given in an unpublished 1990 paper), the class of models due to Schlather (2002), an extension of the Schlather model defined by Davison and Gholamrezaee, and the class of “Brown-Resnick” models and their extensions (Kabluchko, Schlather and de Haan, 2009).

Each of these models is defined by a stochastic representation that does not lead to closed-form solutions for the multivariate densities of the process. Therefore, direct methods of statistical estimation, such as maximum likelihood and Bayes, are not applicable. However, for each of the classes of max-stable models specified above, there do exist closed-form expressions for the bivariate densities. This suggests estimation based on maximizing the composite likelihood, where the composite likelihood in effect consists of the product of all bivariate densities (Padoan et al. 2010).
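The structure of the pairwise composite likelihood can be sketched in a few lines of Python. The sketch below is generic: it sums a user-supplied bivariate log-density over all pairs of sites and all replicates. The bivariate Gaussian density used here is only a stand-in to keep the example self-contained; in an application it would be replaced by the closed-form bivariate density of the chosen max-stable model. All function names are hypothetical.

```python
import itertools
import math

def pairwise_composite_loglik(data, log_bivariate_density, theta):
    """Sum log f(y_i, y_j; theta) over all site pairs and all replicates.

    data: list of replicates, each a list with one observation per site.
    log_bivariate_density: callable (yi, yj, i, j, theta) -> float.
    """
    total = 0.0
    for replicate in data:
        for i, j in itertools.combinations(range(len(replicate)), 2):
            total += log_bivariate_density(replicate[i], replicate[j], i, j, theta)
    return total

def gaussian_pair_logpdf(yi, yj, i, j, rho):
    """Stand-in density: standard bivariate normal with correlation rho**|i - j|."""
    r = rho ** abs(i - j)
    q = (yi * yi - 2.0 * r * yi * yj + yj * yj) / (1.0 - r * r)
    return -math.log(2.0 * math.pi) - 0.5 * math.log(1.0 - r * r) - 0.5 * q

# Evaluate the composite log-likelihood for one replicate observed at three sites
example_value = pairwise_composite_loglik([[0.3, -0.1, 0.8]], gaussian_pair_logpdf, 0.5)
```

Maximizing this function over theta gives the maximum composite likelihood estimate. Because the pairs are not independent, standard errors require a sandwich-type correction rather than the usual inverse information.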

This more or less defined the starting point for our working group: max-stable processes have been increasingly used over the last few years as a model for spatial extremes, and thanks to the composite likelihood theory, there is now a viable means of estimating them. That still leaves many open questions, including the development of a “threshold” theory of estimation (comparable with the extension of univariate and later multivariate extreme value theory from the generalized extreme value distributions for sample maxima to univariate and multivariate generalized Pareto distributions for exceedances over thresholds), the development of a Bayesian theory of estimation (with associated applications to, for example, prediction of extreme events), and a wider range of applications. Among the latter were a special focus on paleoclimatic extremes (motivated in part by the work of the parallel group on Paleoclimate, with whom we shared some common members) and the recently developed theory of “single-event attribution” in climate change, which is concerned with the problem of how well we can attribute a single extreme event to human-induced causes of climate change as opposed to natural variability.

After an initial series of meetings of the whole group, it was decided to focus future attention within four subgroups concentrating on these four specific themes.

5.1 Threshold methods

Membership: Dan Cooley, Soyoung Jeon, Elizabeth Mannshardt-Shamseldin, Richard Smith, Robert Wolpert.

The challenge is to develop a version of the composite likelihood approach for max-stable processes based on threshold exceedances rather than the block maxima approach, which is the main one considered previously. It turns out that there is a fairly straightforward construction to write down a possible composite likelihood in this setting, but understanding the properties of the resulting estimators is a much harder task. This group benefitted considerably from interactions with the spatial extremes group at EPFL (Lausanne, Switzerland), which included visits to SAMSI by Anthony Davison and Raphael Huser from EPFL; Huser gave a talk at the Transition Workshop on the approach that he has developed in collaboration with Davison. Meanwhile, SAMSI Graduate Fellow Soyoung Jeon worked independently on an alternative approach which uses the second-order theory of bivariate extremes to develop bias and variance approximations to the composite likelihood estimators in a threshold context; such approximations lead to theoretical expressions for the optimal threshold as well as establishing asymptotic normality of the resulting estimators. A paper is shortly to be submitted (Jeon and Smith 2011).

Independently, Robert Wolpert developed approximations and a simulation methodology for the joint distribution of k ≥ 2 points from a class of max-stable processes, and for maxima over regions. The method has potential to lead to exact maximum likelihood or Bayesian inference in certain cases (Wolpert 2010).

References:

Jeon, S. and Smith, R.L. (2011), Max-Stable Processes for Threshold Exceedances in Spatial Extremes. Manuscript, shortly to be submitted.

Wolpert, R. (2010), Spatial extremes. Preliminary version of a paper.

5.2 Bayesian methods

Membership: Dan Cooley, Peter Craigmile, Rob Erhardt, Matthias Katzfuss, Ben Shaby, Robert Wolpert.

Since there is no closed-form expression for the joint density of k observations from a max-stable process when k > 2, there is no direct method of Bayesian inference. For frequentist inference, the method of composite likelihood has become fashionable (Padoan, Ribatet and Sisson, 2010). This raises the question of whether the composite likelihood concept can be extended to a Bayesian context. A recent paper by Ribatet, Cooley and Davison (2010) has proposed such a modification. Independently, SAMSI postdoc Ben Shaby found an alternative construction (Shaby 2011). Both methods rely on “correcting” the composite likelihood function in the neighborhood of its maximum to derive a more realistic approximation to the true likelihood function.

An alternative to approximating the likelihood by a simpler form is to approximate the process itself. Reich and Shaby (2011) proposed a discrete approximation to one of the well-known classes of max-stable processes, resulting in a construction for which a standard MCMC algorithm is applicable.

A different kind of alternative approach is to abandon the likelihood function altogether and use the “approximate Bayesian computing” framework that has become popular for Bayesian analysis of problems with intractable likelihoods. The computational demands of this approach are considerable, but UNC graduate student Rob Erhardt has developed an approach using parallel computing that seems very promising. Although vastly more computationally intensive than the composite likelihood approach, recent simulations show that the resulting estimates are superior to those computed via maximum composite likelihood. A paper has been submitted for publication (Erhardt and Smith, 2011).
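The basic rejection form of approximate Bayesian computing can be sketched as follows. This is a toy illustration under stated assumptions, not the parallel algorithm of Erhardt and Smith: the model is a univariate Gumbel distribution with unknown location, the prior is uniform on (0, 10), and the summary statistic is the sample median. The helper names and the tolerance are hypothetical.

```python
import math
import random
import statistics

def simulate_gumbel(mu, n, rng):
    """Draw n Gumbel(mu, 1) values by inverting the distribution function."""
    draws = []
    for _ in range(n):
        u = max(rng.random(), 1e-12)  # keep u > 0 so both logarithms are finite
        draws.append(mu - math.log(-math.log(u)))
    return draws

def abc_rejection(observed, prior_sample, simulate, summary, n_draws, tolerance, rng):
    """Keep prior draws whose simulated summary is within `tolerance` of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)                      # propose from the prior
        s_sim = summary(simulate(theta, len(observed), rng))
        if abs(s_sim - s_obs) <= tolerance:            # no likelihood evaluation needed
            accepted.append(theta)
    return accepted

rng = random.Random(1)
observed = simulate_gumbel(3.0, 200, rng)  # synthetic data with true location 3
posterior_draws = abc_rejection(
    observed,
    prior_sample=lambda r: r.uniform(0.0, 10.0),
    simulate=simulate_gumbel,
    summary=statistics.median,
    n_draws=2000,
    tolerance=0.1,
    rng=rng,
)
```

Each proposal requires a full simulation of the data, which is why the computational cost grows quickly for spatial models. The proposals are independent of one another, so the loop parallelizes naturally.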

Papers:

Erhardt, R. and Smith, R.L. (2011), Approximate Bayesian Computing for Spatial Extremes. Submitted for publication.

Reich, B.J. and Shaby, B. (2011), A finite-dimensional construction of a max-stable process for spatial extremes. Submitted.

Ribatet, M., Cooley, D. and Davison, A.C. (2010), Bayesian inference from composite likelihoods, with an application to spatial extremes. Under revision for Statistica Sinica.

Shaby, B. (2011), The open-faced sandwich adjustment for estimating function-based MCMC. In preparation.

5.3 Paleoclimatic Extremes

Membership: Peter Craigmile, Elizabeth Mannshardt-Shamseldin, Martin Tingley, Jim Zidek.

This group examined the relationship between changing climate and extreme climate events. Possible trends identified in temperature reconstructions lead to several questions about long-term climate behavior, and how to interpret this behavior given the patterns seen in proxy series. For example, it is of interest to use proxy data to address questions such as “Is there evidence that the extreme events of recent decades are more extreme than previous decades?” This group looked at what the statistics of extremes has to offer the field of paleoclimatology through modeling of the original proxy series, and sought to address several of the emerging areas of research that intertwine extreme value analysis and paleoclimatic reconstruction.

Paper in preparation:

Mannshardt-Shamseldin, E., Craigmile, P. and Tingley, M. (2011), Paleoclimatic extremes in proxy data. Shortly to be submitted.

5.4 Operational (Single-Event) Attribution

Membership: Richard Smith, Martin Tingley, Xuan Li (UNC), Matthias Katzfuss, Michael Wehner (Lawrence Berkeley Lab.), Jun Zhang.

Detection and Attribution is a well-established field of climate change research that is concerned with deciding whether observed changes and trends in the climate can be attributed to human causes. In a typical application of the method, a climate model (or several climate models) is run under different combinations of forcing conditions, e.g. control runs (stationary climate; no external forcing), natural forcings (including solar fluctuations and volcanoes) and a combination of natural and anthropogenic forcings, where the anthropogenic forcings include greenhouse gases and sulfate aerosols. By regressing an observed climate signal on different combinations of climate models, it is possible to say when the anthropogenic signal is “detected” (usually characterized by a regression coefficient being statistically significantly different from zero), and in that case, the “attribution” refers to the partitioning of observed climate change to different forcing factors, as measured by their regression coefficients.

Recently, attention has shifted to the problem of attribution of single events, where a single extreme meteorological event may be studied for evidence that it was in some sense “caused by” anthropogenic climate change. Following a landmark paper by Stott, Stone and Allen (Nature, 2004), such evidence is often presented in terms of a fraction of attributable risk (FAR) of an extreme event that is derived from the anthropogenic components of a climate model (or models).
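For concreteness, the FAR is conventionally computed as 1 − P0/P1, where P0 is the probability of the extreme event under natural forcings only and P1 is its probability with anthropogenic forcings included. The short Python sketch below evaluates that ratio; the function and argument names are illustrative, and the probabilities in the usage line are made-up numbers rather than estimates from any climate model.

```python
def fraction_attributable_risk(p_natural, p_anthropogenic):
    """FAR = 1 - P0 / P1 for event probabilities P0 (natural) and P1 (with anthropogenic forcing)."""
    if not (0.0 <= p_natural <= 1.0 and 0.0 < p_anthropogenic <= 1.0):
        raise ValueError("probabilities must lie in [0, 1], with p_anthropogenic > 0")
    return 1.0 - p_natural / p_anthropogenic

# Hypothetical example: the event is four times as likely with anthropogenic forcing
far_example = fraction_attributable_risk(0.01, 0.04)
```

Because both probabilities refer to rare events, each is estimated with large uncertainty, and the uncertainty in their ratio is larger still.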

This working group sought a more rigorous derivation of FAR using extreme value theory. The idea is to estimate the probability of an observed extreme event based on comparative runs of climate models under different combinations of control, natural-forcings or anthropogenic-forcings scenarios; however, the confidence intervals associated with such probability estimates are invariably very wide. Smith and Wehner will shortly submit a paper proposing a profile likelihood solution to this problem; Li (2010) is working on an alternative Bayesian approach for her PhD dissertation.

References:

Li, Xuan (2010), Bias and Variability Assumptions in a Bayesian Analysis of Climate Models. PhD Dissertation Proposal, Department of Statistics and Operations Research, University of North Carolina, Chapel Hill.

Smith, R.L. and Wehner, M. (2011), A Threshold-based Extreme Value Approach to Event Attribution. In preparation.

5.5 Other activity of this working group

SAMSI postdoc Jun Zhang worked on a penalized likelihood approach for estimating return values from multiple precipitation stations, where the penalty function is designed to ensure smoothness in the variation of the GEV coefficients over space. This is an alternative to the popular Bayesian hierarchical approach to problems of this nature.

Montse Fuentes presented a talk on an alternative “nonparametric” approach to joint distributions of spatial extremes, avoiding the assumptions concerned with max-stable processes (Fuentes, Henry and Reich, 2011).

Anthony Davison gave an invited talk at the Climate workshop, and he, Juliette Blanchet and Mathieu Ribatet also contributed to working group discussions a number of times by Webex link from Switzerland.

References:

Blanchet, J. and Davison, A. C. (2011), Spatial modelling of extreme snow depth. Annals of Applied Statistics, to appear.

Davison, A. C., Padoan, S. and Ribatet, M. (2010), Statistical modelling of spatial extremes. Under revision for Statistical Science.

Davison, A. C. and Gholamrezaee, M. M. (2010), Geostatistics of extremes. About to be resubmitted.

Fuentes, M., Henry, J. and Reich, B. (2011), Nonparametric Spatial Models for Extremes: Application to Extreme Temperature Data. Extremes, accepted.

Zhang, J. and Zheng, W. (2011), A Penalized Likelihood Approach for m-year precipitation return values estimation. Submitted to Journal of Agricultural, Biological, and Environmental Statistics.

6 WG6: Fundamentals of Spatial Modeling (Leader: Dongchu Sun, University of Missouri)

6.1 Participants

Renato Assuncao, Universidade Federal; Susie Bayarri, Universitat de Valencia; Sudipto Banerjee, University of Minnesota; Jim Berger, SAMSI/Duke; Jose M Bernardo, University of Valencia; Howard Chang, SAMSI; Noel Cressie, Ohio State; Marco Ferreira, University of Missouri; Anabel Forte Deltell, Universitat de Valencia; Sherry Gao, Missouri Department of Conservation; Virgilio Gómez-Rubio, Universidad de Castilla-La Mancha; Zhuoqiong He, University of Missouri; Jay Ver Hoef, NOAA, Alaska; Monica Jackson, American University; Hannes Kazianka, Alpen Adria Universitat Klagenfurt; Andrew Lawson, Medical University of South Carolina; Jaeyong Lee, Seoul National University; Michael Levine, Purdue University; Ye Liang, University of Missouri; Yajun Liu, University of Missouri; Desheng Liu, Ohio State; Antonio López Quílez, Universitat de Valencia; Jim Lynch, University of South Carolina; Miguel A. Martinez-Beneito, Conselleria de Sanitat Generalitat Valenciana; Xiaoyi Min, University of Missouri; Shawn Ni, University of Missouri; Garritt Page, Duke; Rui Paulo, Universidade Tecnica de Lisboa; Gavino Puggioni, UNC; Bala Rajaratnam, Stanford; Brian Reich, NCSU; Cuirong Ren, South Dakota State; Chester Schmaltz, University of Missouri; Paul Speckman, University of Missouri; Dongchu Sun, University of Missouri; Anand Vidyashankar, Cornell; Jianqiang Wang, NISS; Xiaojing Wang, Duke; Chang Xu, University of Missouri; Jun Zhang, SAMSI; Jing Zhang, Miami University; Jian Zou, NISS

6.2 Presentations and Workshop

Weekly meetings took place on Wednesdays, 11:00–12:45, during the academic year, with irregular meetings thereafter. There were 12 meetings during Fall 2009 and 14 meetings during Spring 2010. Some of the meetings discussed existing literature; these led to discussions of ongoing research. Outside experts (Sudipto Banerjee, Bala Rajaratnam) were invited to give presentations, as were other group leaders.

The group organized a one-day Workshop on Objective Bayesian for Spatial and Temporal Models in San Antonio, TX, March 20, 2010.

6.3 Research Themes

6.3.1 Important issues in spatial and temporal models

• MCAR and Smoothed ANOVA with Spatial Effects (Sudipto Banerjee, University of Minnesota)

• Relation between spatial models and graphical models (Balakanapathy Rajaratnam, Stanford University)

• Spatio-temporal smoothing of risks based on spatial moving averages (Miguel Angel Martinez-Beneito)

• MCAR, joint modeling of spatial effects with given marginals, correlation between two nested and nonnested variables (Chester Schmaltz, University of Missouri)

• Irregular second order Gaussian Markov random fields (Paul Speckman, University of Missouri)

• Dynamic Multiscale Spatio-Temporal Models (Marco Ferreira, University of Missouri)

• Non-Gaussian Models, Zero-inflated Bayesian Spatial Models with Repeated Measurements (Jing Zhang, Miami University)

• Bayesian regression based on principal components for high dimensional data (Jaeyong Lee, Seoul National University)

An Example: Irregular Second Order Gaussian Markov Random Fields (Paul Speckman)

• Gaussian Markov random fields are standard for analysis of areal data or data on a grid.

• The common models for areal data (e.g., CAR models) are first order.

• On the other hand, there are higher order priors for gridded data.

• A general method is proposed for constructing second order priors for general point-referenced data.

• The priors approximate Gaussian processes arising as solutions to stochastic differential or partial differential equations, and can have higher order smoothness properties.

• They are constructed to have sparse precision matrices.

• The new class of priors generalizes higher order Gaussian MRFs to arbitrary (gridded or not) data. The method is illustrated with an application to a rainfall data set.
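To illustrate what a sparse second order precision matrix looks like in the simplest setting, the sketch below builds the precision matrix Q = DᵀD of a second-order random walk on n equally spaced points, where each row of D is a second difference (1, −2, 1). This is the familiar regular-grid case only; it is not the irregular-location construction proposed by Speckman, and the function name is hypothetical.

```python
def second_difference_precision(n):
    """Precision matrix Q = D^T D of a second-order random walk on n equally spaced points."""
    if n < 3:
        raise ValueError("need at least 3 grid points")
    stencil = (1.0, -2.0, 1.0)
    q = [[0.0] * n for _ in range(n)]
    for row in range(n - 2):  # each row of D is one second difference
        for a, ca in enumerate(stencil):
            for b, cb in enumerate(stencil):
                q[row + a][row + b] += ca * cb
    return q

Q = second_difference_precision(6)
```

The resulting matrix is banded (entries more than two positions off the diagonal are zero), and constants and linear trends lie in its null space, which is what makes the prior intrinsic and second order.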

6.3.2 Effects of spatial confounding

Core Members: Garritt Page, Duke; Yajun Liu, University of Missouri; Jaeyong Lee, Seoul National University, Korea; Dongchu Sun, University of Missouri.

Research themes:

• For a spatially-varying error structure, there often exist (possibly unmeasured) covariatesthat also vary spatially.

• In these scenarios it is hard to distinguish the effect of the covariate from that of the residualspatial variation.

• In an iid normal error setting, it is well known that dependence between a covariate and the error produces biased regression coefficient estimates. However, the consequences of this dependence in a spatial setting are not as well understood.

• There is a need to investigate the effect of spatial confounding on covariate estimation:

– the bias

– the variance

– the mean squared error

• Prediction

• Measurement errors

• Extension to the case of multiple (potentially unmeasured) covariates.

• Effect on the Bayes factors
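The iid baseline mentioned in the list above is easy to reproduce by simulation. In the sketch below, an unmeasured variable z enters both the covariate x = z + u and the error e = γz + v, with z, u and v independent standard normals; the least squares slope then converges to β + γ/2 rather than β. The simulation design and function names are illustrative assumptions, and the spatial case studied by the group is not covered.

```python
import random

def ols_slope(x, y):
    """Least squares slope of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate_confounded_slope(n, beta, gamma, seed):
    """Fit y = beta * x + error when covariate and error share an unmeasured component z."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)              # unmeasured confounder
        xi = z + rng.gauss(0.0, 1.0)         # covariate depends on z
        err = gamma * z + rng.gauss(0.0, 1.0)  # error also depends on z
        x.append(xi)
        y.append(beta * xi + err)
    return ols_slope(x, y)

biased = simulate_confounded_slope(5000, beta=1.0, gamma=1.0, seed=0)
unbiased = simulate_confounded_slope(5000, beta=1.0, gamma=0.0, seed=0)
```

With γ = 0 the covariate and error are independent and the estimate is close to β; with γ = 1 the estimate is shifted by roughly Cov(x, e)/Var(x) = 1/2.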

6.3.3 Objective Bayesian and Fiducial Inference for spatial and temporal models

Core Members: Jose M Bernardo, University of Valencia, Spain; Jim Berger, SAMSI/Duke University; Marco Ferreira, University of Missouri; Jan Hannig, UNC; Cuirong Ren, South Dakota State University; Jing Zhang, Miami University; Zhuoqiong He, University of Missouri; Shawn Ni, University of Missouri; Dongchu Sun, University of Missouri.

Other Participants: Victor De Oliveira, University of Texas at San Antonio; Rui Paulo, Universidade Tecnica de Lisboa, Portugal; Bruno Sanso, University of California at Santa Cruz; Stefano Cabras, University of Cagliari; Maria Eugenia Castellanos, Rey Carlos University, Spain; Brunero Liseo, University of Roma, Italy; Maria M. Barbieri, Universita Roma, Italy; Siva Sivaganesan, University of Cincinnati.

Research Objectives:

• In the absence of sufficiently quantifiable subjective knowledge concerning the unknown parameters, it is typical to resort to objective or noninformative priors; these priors allow most of the benefits of Bayesian analysis to be achieved, while avoiding the difficulty of obtaining a subjective prior.

• Alternatively, one could use Fiducial Analysis. Recent work by Jan Hannig (UNC) and his collaborators offers promising methods for spatial analysis.

6.3.4 One-day Workshop on Objective Bayesian for Spatial and Temporal Models at San Antonio, TX, March 20–21, 2010

• The main theme is to explore appropriate prior distributions for parameters of spatial and temporal models.

• Formal objective priors were introduced for AR(1) models by Berger and Yang (1994), and for geostatistical models by Berger, De Oliveira and Sanso (2001, JASA) and Paulo (2005, Annals of Statistics).

• However, the current results are mainly for situations in which there is no nugget effect and no measurement error; in practice, both are ubiquitous.

Current and Future Work

• Group members are working on a monograph summarizing the work during the year.

• To be published by Chapman and Hall.

• About 10 chapters: 5 papers on Spatial Models, one on Spatial Confounding, and 4 papers on Objective Bayesian and Fiducial Models.

7 WG7: Geostatistics (Sudipto Banerjee, Veronica Berrocal, Brian Reich and Alan Gelfand)

7.1 Working Group

Geostatistics: The Geostatistics working group is led by Sudipto Banerjee (UMN), Brian Reich (NCSU) and Alan E. Gelfand (Duke). This group will explore spatial and spatiotemporal models for geostatistical data analysis with special emphasis on issues relating to preferential sampling.

The participants in this group include (in alphabetical order):

Sudipto Banerjee (University of Minnesota), Jarrett Barber (University of Wyoming), Francisco M Beltran (University of California Santa Cruz), Candace Berrett (Ohio State University), Veronica Berrocal (SAMSI), Avishek Chakraborty (Duke University), Howard Chang (SAMSI), David Conesa (University De Valencia, Spain), Sourish Das (SAMSI), Dipak Dey (University of Connecticut), Yiping Dou (University of British Columbia), David Dunson (Duke University), Jo Eidsvik (Norwegian University of Science and Technology), Andrew Finley (Michigan State University), Alan Gelfand (Duke University), Michele Guindani (University of New Mexico), Pritam Gupta (University of Wyoming), Sandra Hurtado Rua (University of Connecticut), Janine Illian (University of St. Andrews, U.K.), Karen Kafadar (Indiana University), Hannes Kazianka (University of Klagenfurt, Austria), David Kessler (University of North Carolina), Kenny Lopiano (University of Florida), Gabriele Martinelli (Norwegian University of Science and Technology), Amy Nail (Duke University), Orietta Nicolis (University of Bergamo, Italy), Garritt Page (Duke University), Juergen Pilz (University of Klagenfurt, Austria), Ana Rappold (EPA), Brian Reich (North Carolina State University), Huiyan Sang (Texas A&M University), Alexandra Schmidt (University Federal do Rio de Janeiro, Brasil), Ron Smith (Centre for Ecology and Economics, Edinburgh, U.K.), Gunter Spoeck (University of Klagenfurt, Austria), Ingelin Steinsland (Norwegian University of Science and Technology), Jay Ver Hoef (NOAA), Linda Young (University of Florida), Chengwei Yuan (University of New Hampshire), Jing Zhang (Miami University), Zhengyuan Zhu (Iowa State University), James Zidek (University of British Columbia).

7.2 Research Activities and Findings

After the opening workshop and a series of weekly meetings, the working group identified some important issues for initial research. Given the large size and the broad spectrum of topics, the working group was further subdivided into smaller subgroups, with each subgroup focusing upon one of the following topics.

• Preferential sampling and point-process modeling: (Leader: Alan Gelfand). This group explored hierarchical spatial and spatiotemporal models that account for stochastic dependencies in settings where the processes generating the data and the sampling locations are dependent. The group planned to explore underlying theoretical connections between preferential sampling and models for missing data as well as the connections between spatial sampling designs and preferential sampling. More generally, the group sought to understand the nature of preferential sampling, modeling issues related to preferential sampling and the effect of preferential sampling upon model performance and inference. Integrating point process models into hierarchical frameworks for large datasets has also been explored by this group. There was substantial overlap in the activities of this subgroup and the group for point-processes.

• Spatial sampling designs: (Leader: Gunter Spoeck). This working group carried out methodological developments in spatial sampling design. In particular, this group explored deterministic algorithms based on spectral representations and experimental design ideas, and used convex design functionals and convex optimization theory to obtain efficiency estimates that show how good sampling designs actually are. The group also explored spatial sampling designs for non-stationary random fields having non-Gaussian skew distributions generated using spectral representations of random fields and making use of the experimental design theory for GLMMs. These methodologies were explored in applied contexts such as exploring the adequacy of existing pollution monitoring networks for protecting human health.

• Estimation techniques for geostatistical models: (Leader: Jo Eidsvik). This subgroup focused on fast computation methods for large geostatistical models. There have been a number of suggestions on how to solve the so-called “big-N” problem, i.e. to handle the O(n^3) problem of matrix factorization in large Gaussian or latent Gaussian models. One approach that the group explored was to combine effective modeling aspects and inference methods. For instance, predictive process models provide dimension reduction while at the same time being amenable to approximate Bayesian inference using Integrated Nested Laplace Approximation (INLA). Since the predictive process models introduce no additional parameters over the regular (full) model formulation, it is relatively straightforward to apply INLA. The speed-up compared with MCMC is large. At every evaluation the computational cost is reduced by using a predictive process model, also providing large speed-ups. The statistical performance is reduced a little for the estimates of covariance parameters, but we see only small effects on estimates for regression effects and predictions.

This group also explored parallel computing in the context of geostatistical models. Parallel computing is getting easier on modern computers, and can be implemented on the graphics processing unit (GPU) as well as on multiple central processing units (CPUs). One could approach this from a Bayesian or a frequentist viewpoint. The frequentist approach is somewhat similar to INLA, since an optimization routine is required. The group focused on composite likelihood models, where the structure is such that parallelization is immediate. The variance of the estimate can be computed from the Godambe sandwich estimate. Similar ideas, using composite likelihood models and the sandwich estimate for variance, hold for prediction at unobserved sites too. This group also explored complex hierarchical models for analyzing large geostatistical datasets arising in forestry and the environmental sciences.

• Ozone process models: (Leader: Amy Nail). What separated the goals of this working group from many ozone modeling efforts was the focus on developing statistical models of ozone that represent the chemical and physical mechanisms through which ozone is created, destroyed, and transported; that is, to do with a statistical model what is typically done with a differential-equation-based deterministic model.

The group separated into 5 sub-projects. The first project was led by Brian Reich and represents a cross-over project with the Non-Gaussian and Non-Stationary work group. The objective was to produce a methodological paper with application to ozone examining a new method for modeling the effects of covariates on the spatial covariance in a space-time model.

The purpose of the second project was to develop methodologies, with application to ozone, for nonparametric variable selection approaches that account for spatial correlation.

The third project sought to model the dispersion of NOx from multiple point sources, and to combine such models with information about area-source emissions and transport to predict NOx concentrations in space and time. Statistical challenges included adapting deterministic dispersion modeling techniques (differential-equation-based or otherwise) to a statistical model, tackling the change-of-support problem, performing sequential estimation while managing large spatial covariance matrices, and accounting for covariates in the covariance.

The group also looked at optimizing code for a dynamic linear model with spatial covariances. Since a process-based space-time model of either ozone or NOx will require the modeling of transport, sequential estimation will ensure a fully-model-based framework. Experiments were conducted using R code calling C code to implement a dynamic linear model of ozone with spatial covariances, but computation time proved formidable (especially if this model is to be applied to the whole U.S. or eastern U.S.). Dimension reduction approaches are currently being explored.

Note: The ozone process modeling workgroup was initiated by Amy Nail, who was employed at the U.S. EPA at the time of heading up the group, but when the group separated into the 5 sub-projects described above, Brian Reich took charge of the first one.

• Numerical model validation: (Leader: Brian Reich). The numerical models subgroup focused primarily on developing Bayesian spatial methods for calibrating deterministic forecasts to produce probabilistic forecasts. The work split in two directions: spatial quantile regression and Bayesian spatial model averaging. Numerical weather forecasts are often calibrated by adjusting the mean, and perhaps the variance, of the predictive distribution. For non-Gaussian data such as precipitation, this can lead to underestimation of extremes. To calibrate the entire conditional distribution of precipitation given a forecast, we propose Bayesian spatial quantile regression. Quantile regression allows a separate relationship between the forecast and each quantile of the predictive distribution, therefore calibrating the probabilistic forecast at each quantile level. We show this leads to good performance in both the center and tails of the predictive distribution.

Recognizing the several sources of uncertainty affecting the output of a numerical model, in recent years there has been a shift from deterministic predictions to probabilistic ones, and runs of a single numerical model have been replaced by runs of ensembles of numerical models. Therefore, the working group also explored statistical methods to calibrate the spatial output generated by an ensemble of numerical models. We extend previous work using Bayesian model averaging (BMA) to produce probabilistic weather forecasts. BMA assumes the predictive distribution is a mixture of density functions centered around each bias-corrected forecast, with BMA weights that depend on the past predictive performance of each ensemble member. The proposed method is an extension of BMA in that both the weights and the parameters of each mixture component are assumed to vary in space. This is intuitively appealing, as it is expected that the performance of each numerical model is not constant in space, but might vary from location to location.

The working group led to a productive and hopefully long-lasting collaboration between Ingelin Steinsland, Veronica Berrocal, and Brian Reich. The group also expects two papers to be submitted before the end of the year.

• Feedback: (Leader: Jarrett Barber). The “feedback” subgroup was interested in the cutting of edges in graphical models for the purpose of controlling undesirable feedback among model components. Focus was primarily on models of health effects and exposure in a Bayesian framework. Prototype R code was produced to begin simulation experiments to explore feedback control. The subgroup is currently conducting extensive simulation experiments.
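The Bayesian model averaging predictive distribution described under “Numerical model validation” above can be sketched for a single location as a weighted mixture of densities centered at the bias-corrected forecasts. The sketch assumes normal mixture components with a common standard deviation and a linear bias correction a_k + b_k f_k; these choices, and all names below, are illustrative assumptions. In the spatial extension proposed by the group, the weights and component parameters would vary with location rather than being fixed as here.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def bma_predictive_density(y, forecasts, weights, intercepts, slopes, sd):
    """Mixture density sum_k w_k * N(y; a_k + b_k * f_k, sd^2) over ensemble members k."""
    if abs(sum(weights) - 1.0) > 1e-9 or min(weights) < 0.0:
        raise ValueError("weights must be non-negative and sum to one")
    return sum(
        w * normal_pdf(y, a + b * f, sd)
        for w, a, b, f in zip(weights, intercepts, slopes, forecasts)
    )

# Hypothetical two-member ensemble with equal weights and no bias correction
density_at_1 = bma_predictive_density(
    1.0, forecasts=[0.0, 2.0], weights=[0.5, 0.5],
    intercepts=[0.0, 0.0], slopes=[1.0, 1.0], sd=1.0,
)
```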

The working group, and subsequently the subgroups, continued to meet on a weekly basis and focused on the above issues. The interactions produced several productive collaborations between statisticians, deterministic modelers, scientists and GIS experts who embarked upon several interesting projects. A non-exhaustive list of publications emerging from the Geostatistics group is provided below.

Banerjee, S. and Gelfand, A.E. (2010). Modelling spatial gradients on response surfaces. In Frontiers of Statistical Decision Making and Bayesian Analysis, eds. M.H. Chen, D.K. Dey, P. Mueller, D. Sun and K. Ye, New York: Springer.

Berrocal, V.J., Gelfand, A.E., and Holland, D.M. (2010). A bivariate space-time downscaler under space and time misalignment. Annals of Applied Statistics (in press).

Luke Bornn, Gavin Shaddick and James V Zidek, The effects non-stationarity and preferential sampling in modeling and measuring environmental fields for health effect analysis. Draft report in preparation.

Song Cai, Xiaoli Yu, Gavin S Shaddick and James V Zidek, Preferential sampling in the monitoring black smoke: a case study. Draft report nearly completed.

Chakraborty, A., Gelfand, A.E., Wilson, A.M., Latimer, A.M. and Silander, J.A. (2010). Modeling large-scale species abundance with latent spatial processes. Annals of Applied Statistics (in press).

Eidsvik, J., Shaby, B., Reich, B.J., Wheeler, M. and Niemi, J. (2010). Estimation and prediction in spatial models with block composite likelihoods using parallel computing. Submitted.

Eidsvik, J., Finley, A., Banerjee, S., and Rue, H. (2010). Approximate Bayesian inference for large spatial datasets using predictive process models. Under revision, Computational Statistics and Data Analysis.

Finley, A.O., Banerjee, S. and MacFarlane, D.W. (2010). A hierarchical model for predicting forest variables over large heterogeneous domains. Journal of the American Statistical Association 106, 31–48.

Guhaniyogi, R., Finley, A.O., Banerjee, S. and Gelfand, A.E. (2010). Adaptive Gaussian predictive process models for large spatial datasets. Environmetrics (in review).

Prates, M.O., Dey, D.K., Willig, M.R. and Yan, J. (2010), Intervention Analysis of Hurricane Effects on Snail Abundance in a Tropical Forest Using Long-Term Spatiotemporal Data, Journal of Agricultural, Biological, and Environmental Statistics, 1-15, doi: 10.1007/s13253-010-0039-1.

Prates, M.O., Dey, D.K., Willig, M.R., and Yan, J. (2011), Transformed Gaussian Markov RandomFields and Spatial Modeling. Submitted.

Reich, B., Eidsvik, J., Guindani, M., Nail, A. and Schmidt, A. (2010). A class of covariate-dependent spatiotemporal covariance functions. Accepted by Annals of Applied Statistics.

Reich, B. and Berrocal, V. (2010). Local Bayesian model averaging for numerical model calibration. To be submitted to Environmetrics.

Spoeck, G., Zhu, Z. and Pilz, J.: Simplifying objective functions and avoiding stochastic search algorithms in spatial sampling design, submitted to: Stochastic Environmental Research and Risk Assessment, (2011).

Spoeck, G., Hussain I.: Spatial sampling design based on convex design ideas and using external drift variables for a rainfall monitoring network in Pakistan. In: Statistical Methodology, New York: Elsevier Science Inc, doi: 10.1016/j.stamet.2011.01.004, (2


Steinsland, I., Berrocal, V. and Reich, B. (2010). Bayesian spatial quantile regression for numerical model calibration. To be submitted to Journal of the Royal Statistical Society: Series C.

Wang, X., Dey, D.K. and Banerjee, S. (2010). Non-Gaussian hierarchical generalized linear geostatistical model selection. In Frontiers of Statistical Decision Making and Bayesian Analysis, eds. M.H. Chen, D.K. Dey, P. Mueller, D. Sun and K. Ye, New York: Springer.

Xiaoli Yu, Gavin Shaddick and James V Zidek, Correcting for preferential sampling in monitoring environmental processes. Draft report of theory completed. Application pending.

Xiaoli Yu, Gavin Shaddick and James V Zidek, The effects of preferential sampling in environmental epidemiology. Draft report partially completed.

8 WG8: Point Patterns Working Group

8.1 Goals

The working group on point patterns coalesced around three primary research goals. The first (which yielded the “Mixture” subgroup) focused on novel ways of introducing covariate information into modeling intensities. Special settings here included mixture models for intensities, both parametric and Bayesian nonparametric, as well as generalized Neyman Scott processes with covariate information at parent level to drive locations and spreads for offspring. The second goal (which yielded the “Practical” subgroup) was to bring to a more applied level spatial point processes with first and second order components (first order is a usual nonhomogeneous Poisson process form λ(s), second order is a pairwise interaction form γ(s, s′), inhibition, attraction). Issues here again included the introduction of covariate information into such settings, with interest in a fully model-based implementation and associated technical questions such as integrability for the resulting intensity. The third goal (which yielded the “Model diagnostics” subgroup) considered an under-explored problem for point patterns, the question of model adequacy and model comparison. For instance, how do we compare two nonhomogeneous point processes? How do we compare a nonhomogeneous Poisson process with a pairwise interaction process? What happens if we add dynamics, resulting in space time point processes?

8.2 Active members

As a result of the three-pronged research agenda, three active subgroups emerged:

• “Mixture” subgroup - Micheas, Chakraborty, Assuncao, Matteson, Gelfand

• “Practical” subgroup - Assuncao, Lopes, Waller, Illian

• “Model diagnostics” subgroup - Gomez Rubio, Chakraborty, Gelfand

8.3 More detail on the work of the “Mixture” subgroup

Considering covariates in this setting requires distinguishing spatially indexed covariates X(s) from labels (marks). For instance, X(s) might be elevation, which might encourage more or fewer points, but this is apart from a mark which might label different species with associated point patterns.


Evidently, the issue is the direction of conditioning, i.e., the covariate can be a mark at a location, yielding $X \mid s$ as $X(s)$ and $\{X(s_i)\} \mid \theta$.

In this regard, Micheas (with Chakraborty and Gelfand), using the form $\lambda f(s; \theta)$ for the intensity, with $\{s_i\} \mid \lambda, \theta$, adds the covariate in the form of a discrete mark. In fact, he extends to a mixture form $\lambda f(s)$ where

\[ f(s) = \sum_l \pi_l f_l(s \mid \theta_l), \qquad f_l(s \mid \theta_l) = \mathrm{BivN}(\mu_l, \Sigma_l). \]

Work of Chakraborty introduced covariates to explain where locations will tend to be. Here, the usual form is $\log \lambda(s) = X(s)^T \beta + \omega(s)$. However, again consider $\lambda f(s)$ where

\[ f(s) = \sum_l \pi_l f_l(s \mid \theta_l) \]

with $f_l(s \mid \theta_l) = \mathrm{BivN}(\mu_l, \Sigma_l)$. So, the $\mu_l$ provide “centers” for the intensity and

\[ \pi_l = \frac{e^{X(\mu_l)^T \beta}}{\sum_j e^{X(\mu_j)^T \beta}}. \]
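As an illustration of the mixture form above, the following is a minimal sketch (not the subgroup's implementation) that evaluates $\lambda f(s)$ with bivariate normal components and covariate-driven weights $\pi_l$. The function names, the example covariate vectors and the value of $\beta$ are hypothetical and chosen only for illustration.

```python
import math


def bivariate_normal_pdf(s, mu, cov):
    """Density of BivN(mu, cov) at the point s = (x, y); cov is a 2x2 matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = s[0] - mu[0], s[1] - mu[1]
    # Quadratic form (s - mu)^T cov^{-1} (s - mu), using the closed-form 2x2 inverse.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def softmax_weights(center_covariates, beta):
    """pi_l = exp(X(mu_l)^T beta) / sum_j exp(X(mu_j)^T beta)."""
    scores = [sum(x * b for x, b in zip(xs, beta)) for xs in center_covariates]
    top = max(scores)  # subtract the maximum for numerical stability
    expd = [math.exp(v - top) for v in scores]
    total = sum(expd)
    return [e / total for e in expd]


def mixture_intensity(s, lam, centers, covs, weights):
    """Intensity lam * f(s), with f(s) = sum_l pi_l * BivN(s; mu_l, Sigma_l)."""
    f = sum(w * bivariate_normal_pdf(s, mu, cov)
            for w, mu, cov in zip(weights, centers, covs))
    return lam * f


# Hypothetical example: two centers, covariates (intercept, elevation) at each center.
identity = ((1.0, 0.0), (0.0, 1.0))
weights = softmax_weights([[1.0, 0.2], [1.0, 0.8]], beta=[0.0, 2.0])
value = mixture_intensity((0.5, 0.5), 100.0, [(0.0, 0.0), (3.0, 3.0)],
                          [identity, identity], weights)
```

Because the weights sum to one and each component integrates to one, $\lambda$ is the expected total number of points over the plane under this form.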

Assuncao and Lopes developed a cluster model for point processes, driven by covariates but not a mixture model, rather with interacting neighbors.

Papers:

Assuncao, R. and Lopes, D. (2010), Visualizing Marked Spatial and Origin-destination Point Patterns With Dynamically Linked Windows. Journal of Graphical and Computational Statistics, accepted.

Chakraborty, A., Gelfand, A.E., Wilson, A. M., Latimer A. M. and Silander, J.A. (2011), Point pattern modeling for degraded presence-only data over large regions. To appear, Journal of Royal Statistical Society, Series C.

Chakraborty, A. and Gelfand, A. E. (2011), Mixture modeling for spatial point patterns with location-specific covariates, manuscript in preparation.

Matteson, D.S. and Micheas, A.C., A Finite Mixture Model for Spatio-temporal Poisson Process. In preparation.

8.4 More detail on the work of the “Practical” subgroup

This group worked on the paper listed below. The context is a temporal point process with spatial marks in the form of binary indicators on a lattice. The application is to infection propagation. A very high level of missingness arises; only 6 out of 30 weeks are observed, necessitating a challenging data augmentation approach.

Jean Vaillant, Gavino Puggioni, Lance A Waller, and Jean Daugrois (2011), A spatio-temporal analysis of the spread of sugar cane yellow leaf virus. Journal of Time Series Analysis, to appear.

8.5 More detail on the work of the “Model diagnostics” subgroup

Again, we assert that model adequacy and model comparison are underdeveloped for point pattern data. What exists follows the lead of Baddeley, with follow-on work from Illian. Here, the widely used Ripley's K function is not what is needed because it is only applicable under complete spatial randomness, not a model of interest.

In comparing intensities we can envision various naive ideas such as binning and counts. We can imagine some version of AIC here. Also possible is a discretized integrated squared difference between the model intensity and an empirical intensity (say, a kernel intensity estimate). Another option is cross-validation, thinning the point pattern to provide a hold-out sample.
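Two of the ideas in the preceding paragraph can be sketched in a few lines. The code below is only an illustrative sketch under stated assumptions: thinning is taken to be independent, with each point retained with probability `keep_prob`, and the integrated squared difference is discretized with a midpoint rule on a rectangular window. All function and parameter names are hypothetical.

```python
import random


def thin_point_pattern(points, keep_prob, seed=0):
    """Split a point pattern into a retained (training) set and a hold-out set.

    Each point is kept independently with probability keep_prob.
    """
    rng = random.Random(seed)
    train, holdout = [], []
    for p in points:
        (train if rng.random() < keep_prob else holdout).append(p)
    return train, holdout


def integrated_squared_difference(model_intensity, empirical_intensity,
                                  bounds, n_cells=20):
    """Midpoint-rule approximation of the integral of (model - empirical)^2."""
    (x0, x1), (y0, y1) = bounds
    dx = (x1 - x0) / n_cells
    dy = (y1 - y0) / n_cells
    total = 0.0
    for i in range(n_cells):
        for j in range(n_cells):
            s = (x0 + (i + 0.5) * dx, y0 + (j + 0.5) * dy)
            diff = model_intensity(s) - empirical_intensity(s)
            total += diff * diff * dx * dy
    return total


# Hypothetical usage on a toy pattern in the unit square.
pattern = [(i / 10.0 % 1.0, i / 20.0 % 1.0) for i in range(100)]
train, holdout = thin_point_pattern(pattern, keep_prob=0.7, seed=1)
```

For a Poisson process with intensity $\lambda(s)$, independent thinning with retention probability $p$ gives a Poisson process with intensity $p\lambda(s)$, so an intensity fitted to the retained points would need to be rescaled by $1/p$ before comparison on the original scale.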


Through normalization, there is a density associated with a point process realization, $f(S; \theta)$.

For a homogeneous Poisson process it is $f(S) = \alpha \lambda^{n(S)}$.

For a nonhomogeneous Poisson process it is $f(S) = \alpha \prod_i \lambda(s_i)$.

For a pairwise interaction process, $f(S) = \alpha \prod_i \lambda(s_i) \prod_{i,j} \gamma(s_i, s_j)$.

From these, we can define the Papangelou conditional intensity: $\phi(u; S) = f(S \cup u)/f(S)$.
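As a short worked example, the ratio defining the conditional intensity can be evaluated for the three densities above, for a location $u$ not already in $S$. The sketch assumes that the interaction product runs over unordered pairs of distinct points; this convention is an assumption, since the report does not spell it out.

```latex
% Homogeneous Poisson: n(S \cup u) = n(S) + 1
\phi(u; S) = \frac{\alpha \lambda^{n(S)+1}}{\alpha \lambda^{n(S)}} = \lambda

% Nonhomogeneous Poisson: only the factor for the new point u survives
\phi(u; S) = \frac{\alpha\, \lambda(u) \prod_i \lambda(s_i)}{\alpha \prod_i \lambda(s_i)} = \lambda(u)

% Pairwise interaction (product over unordered pairs i < j assumed):
% the new point u contributes \lambda(u) and one factor \gamma(u, s_i) per existing point
\phi(u; S) = \lambda(u) \prod_i \gamma(u, s_i)
```

In each case the normalizing constant $\alpha$ cancels, which is what makes the conditional intensity convenient for diagnostics even when $\alpha$ is intractable.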

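In turn, we can create residuals (following ideas of Baddeley).

In particular, for a region $B$ there is a realized residual, $I(B; \theta) = n(S \cap B) - \int_B \phi(u; S; \theta)\,du$. Also, we have an estimated residual, $R(B; \hat{\theta}) = n(S \cap B) - \int_B \phi(u; S; \hat{\theta})\,du$, in which $\theta$ is replaced by its estimate $\hat{\theta}$. These residuals provide a different path for model assessment.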
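A residual of this kind can be approximated numerically. The following is a minimal sketch, assuming a rectangular region $B$ and a midpoint-rule approximation of the integral; `raw_residual` and the callable `conditional_intensity(u, points)` are hypothetical names, not part of any package.

```python
def raw_residual(points, conditional_intensity, region, n_cells=50):
    """Approximate n(S ∩ B) minus the integral of phi(u; S) over B.

    points: list of (x, y) locations S.
    conditional_intensity: callable (u, points) -> value of phi(u; S).
    region: ((x0, x1), (y0, y1)), the rectangle B.
    """
    (x0, x1), (y0, y1) = region
    # Count the observed points that fall in B (boundaries included).
    count = sum(1 for (x, y) in points if x0 <= x <= x1 and y0 <= y <= y1)
    # Midpoint-rule approximation of the integral of phi over B.
    dx = (x1 - x0) / n_cells
    dy = (y1 - y0) / n_cells
    integral = 0.0
    for i in range(n_cells):
        for j in range(n_cells):
            u = (x0 + (i + 0.5) * dx, y0 + (j + 0.5) * dy)
            integral += conditional_intensity(u, points) * dx * dy
    return count - integral


# Hypothetical usage: five points and a homogeneous Poisson model with rate 5
# on the unit square, for which phi(u; S) = 5 everywhere.
pattern = [(0.1, 0.1), (0.3, 0.7), (0.5, 0.5), (0.8, 0.2), (0.9, 0.9)]
residual = raw_residual(pattern, lambda u, s: 5.0, ((0.0, 1.0), (0.0, 1.0)))
```

Residuals near zero across many choices of $B$ are consistent with the fitted model; systematic positive or negative values over a subregion suggest that the intensity is under- or over-estimated there.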

8.6 Ongoing research problems

Finally, we mention some continuing open research ideas. In the point pattern setting, how can we extend AIC to DIC with spatial random effects; how would we do the computation? From above, can we implement a Bayesian analysis of conditional intensity, hence, of realized residuals? Such work would begin with Cox processes first, then proceed to interaction processes. There seems to be little experience in the literature with thinning and cross-validation ideas.

A novel way to model space time point patterns is through integro difference equations (space-time diffusions) and very recently through integral projection models (population demography). There is promising modeling opportunity here with considerable computational challenge.

There is also potential to revisit recent preferential sampling work (Diggle et al.) with regard to environmental exposure, raising the question of the effect of sampling design as well as where to put covariates in this investigation. There remain many open questions here.

Finally, Gomez Rubio has prepared a grant submission to study point patterns arising from patients with respiratory and circulatory illness in Madrid.

9 WG9: Non-stationary and Non-Gaussian Processes

This working group is led by David Dunson (Duke), Jo Eidsvik (NTNU) and Montse Fuentes (NCSU). The objective is to explore non-stationarity and non-Gaussianity for spatial and spatiotemporal models.

The most active participants in this group include (in alphabetical order): Veronica Berrocal (SAMSI), Kate Calder (Ohio State), David Dunson (Duke), Jo Eidsvik (NTNU), Montse Fuentes (NCSU), Michele Guindani (U of New Mexico), Scott Holan (U of Missouri), Katja Ickstadt (Technical University of Dortmund), Jaeyong Lee (Seoul U), Linyuan Li (U of New Hampshire), David Kessler (UNC), Amy Nail (Duke), Orietta Nicolis (U of Bergamo), Brian Reich (NCSU), Alexandra Schmidt (U Federal do Rio de Janeiro), Ben Shaby (SAMSI), Michael Stein (U of Chicago), Robert Wolpert (Duke).


9.1 Research

After the opening workshop the group had a series of weekly meetings covering a broad spectrum of topics. We learnt a lot during these meetings. They let us define common interests and set the stage for collaborations. In the late Fall the working group decided to split into 3 sub-groups. In the Spring the working group only had a handful of joint meetings. The 3 sub-groups were defined as follows:

• Spectral domain methods:

Spectral methods are powerful tools to introduce new classes of covariance functions. The sub-group proposes new models for nonstationarity by having a spectrum that is space-dependent. By introducing a spatial filtered periodogram we obtain a nonstationary nonparametric estimate of the covariance.

To every stationary $Z(s)$ there can be assigned a process $Y(\omega)$ with orthogonal increments, such that we have for each fixed $s$ the spectral representation:

\[ Z(s) = \int_{\mathbb{R}^2} e^{i s^T \omega}\, dY(\omega) \tag{1} \]

The Y process is called the spectral process associated with a stationary process Z.

The periodogram $I_n$ estimates the spectral density $f$ of a process $Z$ observed on an $n \times n$ grid,

\[ I_n(\omega) = (2\pi n)^{-2} \left| \sum_{j \in J} Z(j)\, e^{-i \omega^T j} \right|^2 \]

The sub-group looks at approaches for the spectral analysis of non-stationary spatial processes, $Z(x)$, based on the concept of spatial spectra, that is, spectral functions which are space dependent, $f_x(\omega)$.

The spectral representation of $Z(x)$ is always interpreted as its representation in the form of a superposition of sine and cosine waves of different frequencies $\omega$:

\[ Z(x) = \int_{\mathbb{R}^2} \exp(i x^T \omega)\, \phi_x(\omega)\, dY(\omega) \]

We propose a nonparametric estimate of the spectral density. We first define $J_x(\omega_0)$,

\[ J_x(\omega_0) = \Delta \sum_{u_1 = x_1 - n_1}^{x_1} \; \sum_{u_2 = x_2 - n_2}^{x_2} g(\Delta u)\, Z(\Delta(x - u)) \exp\{-i \Delta (x - u)^T \omega_0\}, \]

where $x = (x_1, x_2)$, $u = (u_1, u_2)$, and $\{g(u)\}$ is a filter. We refer to $|J_x(\omega)|^2$ as the spatial periodogram at a location $x$ for a frequency $\omega$. $|J_x(\omega)|^2$ is an approximately unbiased estimate of $f_x(\omega)$, but as its variance may be shown to be independent of $N$ it will not be a very useful estimate in practice. Then, we estimate $f_x(\omega)$ by “smoothing” the values of $|J_x(\omega)|^2$ over neighboring values of $x$.

The presented methodology requires gridded data, but it could also be extended to irregularly spaced data by using a filtered spline representation across space. This is work in progress.

There is a need to study the properties of the proposed nonparametric estimates using fixed domain asymptotics. This is work in progress.

These methods could be generalized to space and time, and they would allow for nonseparability. A reasonable approach would be semiparametric in space-time: with a parametric model for short distances in space, and a spline representation for large distances. This is work in progress.
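For concreteness, the stationary periodogram $I_n(\omega)$ defined earlier in this item can be evaluated directly from gridded data. The sketch below assumes that the index set $J$ is $\{0, \dots, n-1\}^2$ and uses a plain double sum rather than a fast Fourier transform; the function name is hypothetical.

```python
import cmath
import math


def periodogram(field, omega):
    """I_n(omega) = (2 pi n)^(-2) |sum_j Z(j) exp(-i omega^T j)|^2 on an n x n grid.

    field: list of n rows, each a list of n observations Z(j1, j2).
    omega: frequency pair (omega1, omega2).
    """
    n = len(field)
    total = 0j
    for j1 in range(n):
        for j2 in range(n):
            phase = omega[0] * j1 + omega[1] * j2
            total += field[j1][j2] * cmath.exp(-1j * phase)
    return abs(total) ** 2 / (2.0 * math.pi * n) ** 2


# Hypothetical usage: a constant field has all of its power at frequency zero.
grid = [[1.0] * 4 for _ in range(4)]
at_zero = periodogram(grid, (0.0, 0.0))
at_fourier = periodogram(grid, (2.0 * math.pi / 4, 0.0))
```

A space-dependent version along the lines described above would replace the sum over the full grid by a filtered sum over a neighborhood of each location $x$ and then smooth the result across neighboring locations.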

• Mixture models:

The sub-group builds on much work from Dirichlet processes, stick breaking processes, and other non-parametric models that have become popular because of large flexibility and computational tractability. Dirichlet mixture models are nonGaussian and nonstationary, but without replicates it is difficult to determine if we have lack of stationarity or lack of normality. However, the Dirichlet process (DP)-type of mixture models would allow for both. Mixture models are flexible enough to characterize complex tail behavior, and can be used to model extremes (i.e. a DP copula).

The stick-breaking prior for $F$ is the potentially infinite mixture $F \overset{d}{=} \sum_{i=1}^{m} p_i\, \delta(\theta_i)$, where $p_i$ are the mixture probabilities and $\delta(\theta_i)$ is the Dirac distribution with point mass at $\theta_i$. The mixture probabilities “break the stick” into $m$ pieces, so the sum of the pieces is one, $\sum_{i=1}^{m} p_i = 1$. $p_1 = V_1$, and the subsequent probabilities are $p_i = V_i \prod_{j=1}^{i-1} (1 - V_j)$, where $1 - \sum_{j=1}^{i-1} p_j = \prod_{j=1}^{i-1} (1 - V_j)$ is the probability not accounted for by the first $i - 1$ components.

The sub-group looks at covariance regression, where a covariance matrix is in the support of the space of covariance stochastic processes. Let $\Sigma(s) = \Lambda(s)\Lambda(s)^T + \Sigma_0$ be induced through

\[ y(s_i) = \Lambda(s_i)\eta_i + \varepsilon_i, \qquad \eta_i \sim N_k(0, I_k), \qquad \varepsilon_i \sim N_p(0, \Sigma_0). \]

We can let $\Lambda(s_i) = \Theta\, \xi(s_i)$, where $\Theta$ is a $p \times L$ matrix of coefficients relating the predictor-dependent factor loadings matrix to the dictionary functions comprising the $L \times k$ matrix $\xi(s)$. This nonparametric covariance model can also be modified to build upon a latent segmentation process providing the label indices at different locations. Then, $y(s_i) = \sum_k I[Z(s_i) \in B_k]\, \eta_k(s_i)$. The $\eta_k$ are Gaussian processes, while the labeling can take on many forms. This is related to work on latent stick breaking processes. There are working papers in these directions.

The nonparametric Bayesian spatial models should be extended to space and time to allow for nonseparable dependence. These models should also be extended to a multivariate setting, to allow for nonstationary cross-dependence.
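The stick-breaking construction described in this item is easy to check numerically. The sketch below takes the stick proportions $V_i$ as given inputs; the report does not state how they are distributed, so no distribution is assumed here. For a finite mixture with $m$ pieces, setting the last proportion to one is one way to make the probabilities sum exactly to one. The function name is hypothetical.

```python
def stick_breaking_weights(v):
    """Return (p, remaining), where p_i = V_i * prod_{j < i} (1 - V_j).

    remaining is the stick length left over after all pieces are broken off,
    i.e. prod_j (1 - V_j) = 1 - sum_i p_i.
    """
    weights = []
    remaining = 1.0  # length of stick not yet assigned
    for vi in v:
        weights.append(vi * remaining)
        remaining *= (1.0 - vi)
    return weights, remaining


# Hypothetical usage with m = 3 and the last proportion set to one.
p, leftover = stick_breaking_weights([0.5, 0.5, 1.0])
```

With these inputs the pieces are 0.5, 0.25 and 0.25, and nothing is left over.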

• Covariates in the covariance:


This sub-group looks at non-stationarity models constructed by adding covariates in the covariance function. Typically, the covariance structure is regarded as a nuisance that we do not learn about, except specifying the scale and correlation range in some way. The mean term, on the other hand, is constructed by sophisticated modeling from available covariates. For prediction purposes, the covariates are important only through the predictor, not the prediction variability. We envision better prediction and more useful prediction distributions by letting the covariance also depend on covariates.

One project that has been considered in the sub-group is mixtures of Gaussian processes with different scale and range, but weighted by covariates. Define the process as

\[ y(s_i) = \sum_k w(x(s_i); \alpha_k)\, \eta_k(s_i), \qquad \eta_k(s_i) \sim N(0, \Sigma_k), \]

where the weights $w(x(s_i); \alpha_k)$ of the mixture vary as a function of covariates $x(s_i)$. The processes are parameterized by different range and variance in $\Sigma_k$. For instance, when there is much wind, the weight could be highest for a process with large correlation range and larger variance. When there is little wind, the weight could be highest for a process with small correlation range and less variance. The model is similar to the spatially varying coefficient model and other models using spatially varying weights for a mixture of processes. The sub-group has submitted a paper on this topic (Reich, Eidsvik, Guindani, Nail and Schmidt, 2010, A class of covariate-dependent spatiotemporal covariance functions), applying the resulting model to daily ozone levels in the South-Eastern US. It is currently under revision.

The suggested mixture models for the covariance structures above naturally extend to include covariates in the covariance structure. For instance, the factors could depend on covariates. Further, it is straightforward to let the prior mean of the latent segmentation process $Z(s_i)$ depend on covariates. This would pull the labeling process to a high/low index in parts of the space, and would naturally show up in the covariance for locations where the labels become the same or not.
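To see how the covariate-weighted mixture of Gaussian processes in this item induces a nonstationary covariance, consider the following short derivation. It is a sketch under two assumptions that are not stated in the report: the component processes $\eta_k$ are mutually independent with mean zero, and the covariates are treated as fixed. Here $C_k(s_i, s_j)$ denotes the entry of $\Sigma_k$ for the pair of locations $(s_i, s_j)$.

```latex
\operatorname{Cov}\{y(s_i), y(s_j)\}
  = \sum_k \sum_l w(x(s_i); \alpha_k)\, w(x(s_j); \alpha_l)\,
      \operatorname{Cov}\{\eta_k(s_i), \eta_l(s_j)\}
  = \sum_k w(x(s_i); \alpha_k)\, w(x(s_j); \alpha_k)\, C_k(s_i, s_j)
```

Even if every $C_k$ is stationary, the resulting covariance depends on the locations through the covariates $x(s_i)$ and $x(s_j)$, so both the variance and the effective correlation range change with the covariates, as in the wind example.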

The sub-groups (or parts of sub-groups) continue to meet to finish ongoing work.

9.2 Deliverables

• Sharing knowledge.

• Papers submitted or in progress.

• Several new collaborations.

10 Other Products of the Program

Submitted grant proposal:

Network for Statistics in Atmospheric and Oceanic Sciences (PI: M. Fuentes, NCSU; co-PIs: M. Stein, University of Chicago and P. Guttorp, University of Washington). Submitted to Program Solicitation NSF 10-584 on Research Networks in the Mathematical Sciences, National Science Foundation, $5 million.

Papers:

Corberan, A. and Lawson, A. (2011), Conditional Predictive Inference for On-line Surveillance of Spatial Disease Incidence. Submitted to Statistics in Medicine.

Dennis, Fox, Fuentes, Gilliland, Hanna, Hogrete, Irwin, Rao, Scheffe, Schere, Steyn, Venkatram. (2010). A framework for evaluating regional-scale numerical photochemical modeling systems. Environmental Fluid Mechanics Journal, in press.

Fuentes, M., and Banerjee, S. (2011). Bayesian Modeling for Large Spatial Datasets. WIREs Computational Statistics, accepted.

Fuentes, M. and Foley, K. (2011). Ensembles methods, in upcoming Encyclopedia of Environmetrics.

Fuentes, M., Xi, B. and Cleveland, W. (2010) Trellis Display for Modeling Data from Designed Experiments. Journal of Statistical Analysis And Data Mining, in press.

Reich BJ, Fuentes M. Nonparametric Bayesian models for a spatial covariance. Statistical Methodology. Accepted.

Reich, B., Fuentes, M. and Dunson, D. (2011). Bayesian spatial quantile regression. JASA Case Studies, accepted.

Reich BJ, Kalendra E, Storlie CB, Bondell HD, Fuentes M., Variable selection for high-dimensional Bayesian density estimation: Application to human exposure simulation. Applied Statistics, accepted.

Zhang, J., Clayton, M. K., and Townsend, P. A., (2011), Missing Data and Functional Concurrent Linear Model for Spatial Images, in preparation.

Zhang, J., Clayton, M. K., and Townsend, P. A., (2010), Functional Concurrent Linear Model for Spatial Images, accepted by Journal of Agricultural, Biological, and Environmental Statistics.


Final Program Report

Semiparametric Bayesian Inference: Applications in Pharmacokinetics and Pharmacodynamics

Summer 2010

1 Objectives

Pharmacokinetics (PK) is the part of pharmacology that is concerned with what happens to a drug after it enters the body. One typically characterizes a drug’s PK through studies of the time course of drug concentrations measured over time. Studies of a drug’s pharmacodynamics (PD) explore the reaction of the body to the drug. PD studies often include measurements of a drug’s PK. Pharmacogenetics (PGx) studies the genetic variation that explains person-to-person variation in a drug’s PK or determines differing response to drugs. Understanding the PK, PD, and PGx of a drug is important for evaluating efficacy and determining how best to use the drug clinically.

Because PK and PD studies often include multiple measurements from the same patient over time, analyses of these studies often include models that allow accounting for within-subject and between-subject variation. Such statistical models are sometimes called population models or hierarchical models. Hierarchical models have allowed great progress in statistical inference in many application areas. One reason that hierarchical models are popular for PK and PD studies is that they allow borrowing of strength across a patient population, such as when one may have fewer measurements for some patients than for others. These models have allowed investigators to learn about important sources of variation in drug absorption, disposition, metabolism, and excretion, in some cases allowing clinicians to tailor drug therapy to individuals.

Typically, population PK and PD models characterize between-subject variability with a normal or lognormal distribution. While such distributions may be appropriate in some settings, it is not always the case that this model characterizes very well the heterogeneity in a population. For example, there may be subpopulations of extreme values that lie farther away from the central population average than a normal distribution would allow. Newer Bayesian nonparametric population models and semi-parametric models offer the promise of individualizing therapy and discovering subgroups among patients even further, by freeing modelers from restrictive assumptions about underlying distributions of key parameters across the population.

The purpose of this program was to bring together a mix of experts in PK and PD modeling, nonparametric Bayesian inference, and computation. The motivation for this program was a recent surge in research in Bayesian nonparametric (BNP) statistics. BNP has traditionally focused on basic inferential problems, such as density estimation, survival analysis, and more recently mixed-effects models. Now, however, BNP statistical methodology has sufficiently matured to allow addressing scientific research problems with all the complexities and complications that arise in real applications.

A workshop during the first week of the program served to introduce participants from the BNP research community to current problems and challenges in PK and PD modeling, as well as providing PK and PD modelers an introduction to BNP methodology. Tutorial reviews on Monday were important to introduce notation and to lay out basic problems and challenges. Tuesday through Friday mornings focused on several areas of concentration, with more diverse talks following in the afternoon. Focus areas included prior elicitation, joint modeling of PK and PD, dependent Dirichlet process models, and clustering. On Friday afternoon, we formed working groups for the second week.

Summary: The program successfully brought together researchers from different areas to initiate new projects that will hopefully lead to exciting advances in PK & PD research, driven by state-of-the-art BNP models. The program also served the purpose of establishing PK, PD, and PGx as natural application areas for BNP.


2 Workshop

The workshop was held July 12-16, 2010.

The first day was dedicated to tutorial introductions. The program started with an overview of current research, opportunities and challenges for BNP in PK & PD, presented by Gary Rosner. A first tutorial on BNP was prepared by Wes Johnson and introduced basic notions of BNP. The tutorial defined the most popular models, including Dirichlet process (DP), DP mixture, and Polya tree (PT) models. The tutorial also focused on the clustering structure that is implicitly defined by DP priors. This tutorial was followed by a second BNP tutorial by David Dunson, who showed some successful examples of biomedical research problems approached by BNP. The applications were not directly PK & PD, but the PK & PD modelers in the audience could recognize opportunities for similar approaches in PK & PD.

In the afternoon, there were two tutorials on PK & PD modeling, presented by David D’Argenio, University of Southern California, and Paolo Vicini, Pfizer Inc. Both tutorials provided an excellent introduction to research problems in PK & PD and were very well received by the BNP researchers in the audience.

Tuesday through Thursday and Friday morning featured researchers discussing their work on BNP methodology and PK, PD, and PGx. Some examples are prior elicitation, joint modeling of PK and PD, development of physiologically-based PD models, computational challenges in PK and PD models, dependent Dirichlet process models, and clustering.

On Friday afternoon, we formed working groups for the second week. The following groups were formed:

W1: Model Fit and Comparison;

W2: Joint PK & PD Modeling;

W3: Nonparametric Bayes & Clustering for Population PK & PD;

W4: PK & PD Simulation for Design and Dose Individualization; and

W5: Pharmacogenomics and Networks.

3 Working Groups and Outcomes

W1: Report on Model Fit and Comparison

Keith O’Rourke served as working group leader.

A brief summary follows of the deliberations and ongoing work of the SAMSI Summer PkPd Program’s Model Fit and Model Comparison working group (PkPdFit).

Monday. The members of the group introduced themselves and provided a very brief overview of their background and interests. Initially 16 members joined this working group; they were later joined by 3 additional members (list below). Most were from academic statistics or biostatistics departments, with a few from either a pharmaceutical firm, pharmaceutical consulting firm or regulatory authority. Additionally, Jim Berger (Duke University), Jayanta Ghosh (Purdue University) and Robert Wolpert (Duke University) attended one or more of the meetings. The group was led by Keith O’Rourke (Health Canada).

After discussions of the major interests of the working group, the group recognized that these themes were not fully reflected in its current name, such as the intrinsic interactions between model Fitting, Criticism, Understanding and the pragmatic reality of usually needing to Keep (select) just one of the models. The group, however, decided to just stick with the original name.

There was a brief discussion of some philosophy of science and vocabulary surrounding the identified themes, largely along the lines that all models are wrong (Fit) but can always be made less wrong (Criticism), and it was suggested that it would be better to speak of there being less or least wrong models (Understanding) rather than better or best models, but some wished to stick (Keep) with more commonly used statistical terms.


Most of the discussion was centered on the exact nature of Pk models and the Pk scientific process and especially the current Pk community (the kinds of experiments they do, the statistical models they currently use, why these are used or preferred, and trying to anticipate opportunities to enable the Pk community to adopt different models and approaches that would likely provide highly visible improvements). This was largely led by Peter Thall, drawing on his brief discussions about the nature of Pk models/communities with Paolo Vicini (Pfizer) at the end of the first week of the Summer PkPd program.

There was general agreement that there was a considerable amount of complexity, perhaps most visibly described as a sea of parameters to swim through with some lurking sea monsters, such as the “Greek letter fallacy” (i.e. β changing its meaning in a simple multiple regression model when other covariates are added), aliasing between parameters (i.e. parameters that could biologically cancel each other out) and both biological and mathematical constraints that might be very subtle but extremely important in Pk applications. Pk models need to represent physical/biological processes well (“what goes in must come out”, and only highly dependent, constrained life is actually possible), so priors, functions of covariates, etc. must adequately reflect these realities. Whereas all models are wrong, in Pk there are many subtle but critical aspects that can only be a little bit wrong. Additionally, as the data are very sparse in many Pk applications, we cannot depend on the data to drag us away from faulty priors.

There was some concern raised about a possible tension between individual and population parameters/models and a possible reluctance in the Pk community to pragmatically borrow information in order to do any hierarchical modeling. This was perhaps best expressed in a rhetorical question that was repeatedly raised: “they can’t really want a different model for each individual rat or even a different model within an individual rat at different instances of time?”

At the end of the meeting there seemed to be a general agreement that there was a need for some concrete Pk models and data sets for the group to utilize in working their interests, concerns and ideas through as a group. Russell Barbare (North Carolina State University) offered to try to obtain some data and present it at the next meeting.

Tuesday. The meeting was opened with a short 15 minute informal presentation by Keith O’Rourke (Health Canada) on a proposed general or default method for the plotting of distinguishable prior and data contributions in any complex statistical model. It was entitled “Painting inference pictures” and was based on obtaining marginal priors and marginal “almost individual observation” likelihoods for different parameters of interest or focus. The motivation was to clearly display, as is almost always done in the meta-analysis of separate study estimates, the consistency of the separate estimates and how they add together in the combined analysis to provide the overall estimates, and especially to do this directly in the parameter space of most interest, so that contributions are displayed in terms of what is of direct interest or convenience. It was also suggested as a general diagnostic plot for model fit in complex models. Peter Thall suggested that more than one parameter be displayed utilizing high dimensional graphical displays. There seemed to be the sense that it needed to be tried out for some concrete realistic Pk models, and Jim Berger (Duke University) suggested that calibration of the plots would be important.

Russell Barbare (North Carolina State University) then presented an available data set he had obtained and anticipated some basic Pk models that might be considered when analyzing this data. There was a discussion on simple one compartment versus two compartment Pk models with attendant parameterizations in terms of rates and volume parameters (often called micro parameters) versus exponential fit parameters (often called macro parameters). This led to the identification of a K parameter space (for the micro parameters) versus a Beta parameter space (for the macro parameters). Many questions then arose about the connections between them and the merits of these two apparently different spaces. Where can Pk people express priors most credibly? Where do they especially want to see the posteriors in the end? Are there one to one functions between them (i.e. are they just reparameterizations)? Are there valid reasons for two spaces, or is this just a historical artefact of previous numerical challenges?
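As an illustration of the question about one to one functions between the two spaces, the sketch below uses the textbook linear two compartment model with an intravenous bolus dose, for which the concentration in the central compartment is $C(t) = A e^{-\alpha t} + B e^{-\beta t}$. This specific model, the function names and the numerical values are assumptions made for illustration and are not taken from the group's discussion or data.

```python
import math


def micro_to_macro(k10, k12, k21, volume, dose):
    """Map rate/volume (micro) parameters to exponential (macro) parameters.

    Assumes an IV bolus into the central compartment of a linear
    two compartment model: C(t) = A exp(-alpha t) + B exp(-beta t).
    """
    total = k10 + k12 + k21
    root = math.sqrt(total * total - 4.0 * k10 * k21)
    alpha = 0.5 * (total + root)  # fast (distribution) rate
    beta = 0.5 * (total - root)   # slow (elimination) rate
    c0 = dose / volume            # initial concentration
    a = c0 * (alpha - k21) / (alpha - beta)
    b = c0 * (k21 - beta) / (alpha - beta)
    return a, alpha, b, beta


def macro_to_micro(a, alpha, b, beta, dose):
    """Inverse map, recovering (k10, k12, k21, volume) from macro parameters."""
    k21 = (a * beta + b * alpha) / (a + b)
    k10 = alpha * beta / k21
    k12 = alpha + beta - k21 - k10
    volume = dose / (a + b)
    return k10, k12, k21, volume


# Hypothetical parameter values, used only to check that the maps invert each other.
macro = micro_to_macro(k10=0.3, k12=0.2, k21=0.1, volume=10.0, dose=100.0)
micro = macro_to_micro(*macro, dose=100.0)
```

Under these assumptions the two parameterizations carry the same information, so a prior placed in one space induces a prior in the other through the change of variables; which space is more natural for eliciting priors is a separate question.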

After this, there was some discussion regarding the need to arrive at certain functions as being the model to be fit (e.g. functions for specifying the expected values of observations assumed to be Normally distributed) through solving differential equations. Succinctly: how to get the likelihood or data model in closed form. As it might only be possible to solve the differential equations numerically, discussions of exactly how to implement this in MCMC algorithms also arose. As for more complex Pk models than just two compartment models (up to 20 and beyond), and the possibility of different clusters within an experiment (i.e. a mixture of say two and three compartment models), a proposal for a simple mixture of say two models as a starting point was made by Mike Escobar (University of Toronto). In contrast to this, a more general model that allows clusters through parameter value settings, after some selection of a least wrong model from a well motivated set of about half a dozen or so models, was proposed by Steve MacEachern (Ohio State University). Both were then briefly discussed.

There was then general acknowledgement that the group needed a simple but useful set of concrete Pk models and data sets to work with for the next meeting. Paolo Vicini (Pfizer) was sought to provide this kind of input. After the meeting, Susie Bayarri (University of Valencia) reiterated the concern about borrowing information between subjects with the question “Does it make scientific sense to have a separate, entirely different scientific model for each individual?”

Wednesday. Paolo Vicini (Pfizer) gave an invited presentation of some simple concrete models and their connections between the K space and Beta space (all being one to one functions in the specific models he presented). This led to considerable discussion of parameterizations, various re-parameterizations and all kinds of seemingly different and possibly misleading notations in these models. However, there seemed to be a consensus that reasonably well motivated hierarchical models (that borrow information) were being used and could be improved on in numerous ways. He later provided some helpful online Pk resources and links to some well studied Pk examples and stylized statistical analyses that the group could utilize. There seemed to emerge a default preference for working in the K space for both priors and posteriors, as well as for the specification, fitting, criticising, understanding and selection of models needed to transform the priors into posteriors, if possible. Insurmountable challenges might arise that prevent this, given the way the Pk data is collected and certain model specifications. There again was a contrast between model selection among biological pieces (i.e. two versus three component models), the simple mixture model, and parameter values connecting the biological pieces (i.e. setting the k23 and k32 rate parameters to zero via spike and slab), the well selected, sufficiently flexible model.

Concerns again arose about having to get the likelihoods numerically (via numerical solution of differential equations) and anticipating MCMC implementation challenges because of this. Just before the meeting, Mike Escobar had met with William Gillespie (Metrum Instruments, who was participating in the Simulation for Design and Dose Individualization working group) to get a sense of what WinBugs Pk models he had been successful in implementing. At the end of the meeting there was an agreement to work through some simple models with standard but well used data sets as step one, with some of the more seasoned Bayes NPers hopefully getting an initial start on Bayes NP models.

Thursday. Mike Escobar outlined his progress in getting a set of Pk working examples in WinBugs. After a quick overview of WinBugs code specially modified by William Gillespie (Metrum Instruments), which is available in a package that can be downloaded, he commented that the new box William had programmed essentially dealt with the numerical differential equation solving needed to get the likelihoods. Essentially, the group was given a first glimpse of a set of working Pk models that could be put to use for parametric Bayesian Pk modeling and extended to various Non-Parametric models. Attention was then drawn to a Pk model example provided by Russel Barbare where the m:n function from K space to Beta space was not 1:1 (m > n), providing another reason to try and work in the K space if at all possible.

The group then sketched out a long list of existing goodness of fit and model comparison methods in the statistical literature, broken into Goodness of Fit (GOF), model comparison (MC) measures (e.g. BIC, AIC, DIC), posterior predictive checking, cross validation and plotting. Jim Berger pointed out that all the GOF and MC measures had problems for random effects models, and especially for the non-parametric random effects models the group hopes to work with. This was confirmed by David Dunson (Duke University), even for standard Non-Parametric random effects models in much simpler application areas. Some of the reasons for this were outlined by Steven MacEachern (Ohio State University) and Jayanta Ghosh (Purdue University). Thanasis Kottas (University of California, Santa Cruz) mentioned some posterior predictive checking one could “try and make do with” and David Dunson (Duke University) confirmed that his group so far had only used cross-validation/posterior predictive checking methods as a way to try to address model fit issues. At this point the group realized that they had at last come to the sought after insurmountable opportunities, and open research questions seemed to be occurring to most.
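For reference, two of the model comparison measures named above have simple standard definitions: AIC = 2k - 2 log L and BIC = k log n - 2 log L, where k is the number of free parameters, n the number of observations and log L the maximized log-likelihood. The sketch below just encodes those formulas; the log-likelihood values in the usage lines are hypothetical placeholders, not results from any fit.

```python
import math


def aic(log_lik, n_params):
    """Akaike information criterion: smaller values are preferred."""
    return 2.0 * n_params - 2.0 * log_lik


def bic(log_lik, n_params, n_obs):
    """Bayesian (Schwarz) information criterion: smaller values are preferred."""
    return n_params * math.log(n_obs) - 2.0 * log_lik


if __name__ == "__main__":
    # Hypothetical maximized log-likelihoods for two candidate Pk models.
    candidates = {"one compartment": (-104.2, 3), "two compartment": (-101.9, 5)}
    for name, (log_lik, k) in candidates.items():
        print(name, aic(log_lik, k), bic(log_lik, k, n_obs=40))
```

Both formulas need a parameter count k, and it is exactly this count that becomes ambiguous once random effects (and especially non-parametric random effects distributions) enter the model, which is one way to see the difficulty raised in the discussion.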

At this point, some members of the group have started on some new approaches to model fit assessment and will hopefully share their progress with the rest of the members, if and when appropriate. We are hoping to share some working examples amongst the group, hopefully initially in WinBugs, initially using parametric approaches where the Pk modelling issues can be more clearly explicated, and then working through various non-parametric models of interest. But perhaps because it is easier, if not more enjoyable, to critically evaluate models others have lovingly and endlessly slaved away to create, many in the FIT group would love the opportunity to criticize (constructively) other groups' Pk and even Pd models. It would be most helpful to us and hopefully others!

Keith O'Rourke (Health Canada) has offered to try and plot out these models. As for offering up our work for the consideration and criticism of others, Keith O'Rourke (Health Canada) has offered to circulate a draft of the mathematical basis for the elements in the “Painting inference plots”. A version of this draft was very kindly reviewed after the Tuesday meeting by Jayanta Ghosh (Purdue University), and there has been (will be) a subsequent addition of some clarification of how it can be related to Laplace approximations, as suggested by Jim Berger (Duke University) on Thursday.

W2: Report on Joint PK & PD Modeling

Michele Guindani, University of Texas M.D. Anderson Cancer Center, served as working group leader.

Monday. Paolo Vicini connected remotely to give a brief presentation of PK and PD models and present the case for joint PK/PD modeling versus separate estimation. The discussion covered the relevant features of common PK and PD models, with particular attention to the inferential interest of the investigators. In the last part of his presentation he explained how covariate information is usually introduced in the modeling framework. Afterward, the on-site participants discussed and reviewed some of the issues presented by Paolo Vicini, and analyzed in more detail the differences between the models presented by Paolo Vicini and the model presented by Michele Guindani during the SAMSI opening week. The working group discussed how joint PK/PD modeling can be pursued by considering the vector of PK and PD parameters as one joint vector, generated from a random effects distribution with a non-parametric Bayesian prior, for example a DP prior.

The features and limitations of such non-parametric Bayesian approaches were thoroughly discussed, from both a computational and a methodological perspective. Some participants briefly mentioned the possibility of modeling the PK and PD vectors separately, or of enforcing subclustering of the vector components by means of some extension of the DP.
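The idea of drawing each subject's joint PK/PD parameter vector from a random effects distribution with a DP prior can be sketched with a truncated stick-breaking construction. This is a generic illustration rather than the model discussed by the group: the truncation level, the concentration parameter and the two-component base measure (a hypothetical PK parameter and a hypothetical PD parameter, both on a log scale) are assumptions made for the example.

```python
import random


def stick_breaking_weights(alpha, n_atoms, rng):
    """Truncated stick-breaking weights for a DP with concentration alpha."""
    weights, remaining = [], 1.0
    for _ in range(n_atoms - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last atom takes the leftover mass
    return weights


def draw_truncated_dp(alpha, n_atoms, base_sampler, rng):
    """One random discrete distribution: weights plus atoms from the base measure."""
    weights = stick_breaking_weights(alpha, n_atoms, rng)
    atoms = [base_sampler(rng) for _ in range(n_atoms)]
    return weights, atoms


def sample_subjects(weights, atoms, n_subjects, rng):
    """Draw subject-level parameter vectors; ties between subjects form clusters."""
    labels = rng.choices(range(len(atoms)), weights=weights, k=n_subjects)
    return labels, [atoms[j] for j in labels]


if __name__ == "__main__":
    rng = random.Random(2010)

    def joint_base(r):
        # Hypothetical base measure for (log PK parameter, log PD parameter).
        return (r.gauss(0.0, 1.0), r.gauss(2.0, 0.5))

    weights, atoms = draw_truncated_dp(1.0, 25, joint_base, rng)
    labels, params = sample_subjects(weights, atoms, 40, rng)
    print("distinct clusters among 40 subjects:", len(set(labels)))
```

Because the drawn distribution is discrete, several subjects share the same joint vector, so clustering here is on the whole PK/PD vector at once; clustering PK and PD components separately would need one of the extensions mentioned above.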

Tuesday. After summarizing the discussion from Monday, participants continued the discussion of possible Bayesian NP models for modeling the PK and PD parameters jointly. In particular, the working group considered the possibility of separate non-parametric Bayesian models for PK and PD parameters. The possibility of specifying the DP prior for the PD parameters as a dependent DP, correlated with the random effects distribution for the PK parameters, was discussed. The working group considered the feasibility of local partition models to enforce subset clustering in the joint parameter vector. Reference was also made to the problem of differential weighting of the PK and PD. The problem had been hinted at by Paolo Vicini in Monday's discussion, and it is at the core of the paper by David Lunn, Nicky Best, David Spiegelhalter, Gordon Graham and Beat Neuenschwander, “Combining MCMC with sequential PKPD modeling”, J Pharmacokinet Pharmacodyn 2009. The group then started discussing the introduction of covariates in the model. We reviewed the paper by David Lunn, “Automated covariate selection and Bayesian model averaging in population PK/PD models”, J Pharmacokinet Pharmacodyn (2008) 35:85-100, as well as the paper by Wakefield J, Bennett J (1996), “The Bayesian modeling of covariates for population pharmacokinetic models”, J Am Statist Ass 91:917-927. We concluded with some hints at the possible use of sequential MCMC in the inference of mixture of DP models.

Wednesday. Steven MacEachern was invited to join our discussion, as some had expressed the desire to discuss further the differences among the various forms of the Dependent Dirichlet process: single-p (only the atoms are covariate dependent stochastic processes), single-theta (only the weights are covariate dependent) or full DDP (both weights and atoms may vary according to the covariate value). Then we discussed an e-mail from Paolo Vicini on different ways to include a dynamic component in the PK and PD parameters to better capture inter-occasion variability. The meeting concluded with a discussion of David Dahl's distance based MCMC scheme.
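The three variants can be written compactly in the usual stick-breaking notation. The symbols below (covariate value x, weights w_h, atoms theta_h) are a notational assumption made here for illustration and were not fixed in the working group notes.

```latex
\begin{align*}
\text{full DDP:}      \quad & G_x = \sum_{h=1}^{\infty} w_h(x)\,\delta_{\theta_h(x)} \\
\text{single-}p:      \quad & G_x = \sum_{h=1}^{\infty} w_h\,\delta_{\theta_h(x)}
                              && \text{(weights constant in } x\text{)} \\
\text{single-}\theta: \quad & G_x = \sum_{h=1}^{\infty} w_h(x)\,\delta_{\theta_h}
                              && \text{(atoms constant in } x\text{)}
\end{align*}
```

In each case the weights are non-negative and sum to one for every x, so G_x is a random probability measure indexed by the covariate.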

W3: Report on Nonparametric Bayes & Clustering for Population PK/PD

Siva Sivaganesan, University of Cincinnati, served as working group leader.

The PK/PD Cluster Working Group met daily at 11am. Working group activities included presentations and discussion of issues relating to the use of NPB clustering in PK modeling. This included some specific projects, as well as general directions for potential projects. The following gives some specifics of the activities of the working group.

Monday. J.K. Ghosh gave a talk on the fundamentals of the Dirichlet Process prior. Prof. Ghosh discussed the support of the DP prior and its ability to identify mixing distributions generated from any probabilistic mechanism, a related consistency result, the relationship between DP prior based clustering and K-Means clustering, and how DP with MCMC leads to clustering. Regarding the application of DP priors in population PK modeling, he referred to Donoho's six difficult curves and how they served as a testing ground for new density estimation methods, and suggested a similar approach: find some typical difficult PK data sets and evaluate DP modeling with those curves. He also suggested using David Dahl's approach of using prior information with the DP prior, and using SDE applied to DP.

Tuesday. Lili Ding gave a talk on Bayesian Semiparametric Population Pharmacokinetic Models, in which she gave an overview of population PK modeling, the use of semi-parametric models for population PK modeling using the DP prior, and the limitations of DP priors in capturing certain non-standard structure in covariate effects. She pointed out that the use of dependent DP priors may be promising for capturing these effects. There was discussion on the choice of modeling approach; the DDP and Conditional DP approaches were discussed.

Wednesday. Ju Hee Lee spoke on “Local-Mass Preserving Prior Distributions for Nonparametric Bayesian Models”. In this talk Ju Hee gave an approach to assigning a robust prior for the DP mass parameter and the scale parameter in the base measure. This approach was able to fit outliers as well as the center of the data better than the standard DP prior.

This talk was followed by a brief discussion of the paper “Cluster Analysis: An Alternative Method for Covariate Selection in Population Pharmacokinetic Modeling” that was given by Paolo Vicini.

Thursday. We continued discussion of the above paper, which uses hierarchical (different nomenclature here) clustering to select covariates, and does the modeling separately. It was felt that NPB can do the clustering of covariates and the PK modeling simultaneously, and pick a specific cluster a posteriori, for use in drug guidelines. Assigning new patients to a cluster was discussed. There was discussion of the need to come up with a simple rule, or guidelines on the use of the drug being studied, based on the important covariates, and it was suggested this could be accomplished even with a complex underlying PK/PD model.

Addressing model mis-specification using computer models and/or NPB methods was also discussed.

Jointly, Athanasios Kottas, Mahlet Tadesse, Michele Guindani, and David Dahl have started brainstorming about applying existing dependent nonparametric prior models (and possibly developing new priors) for clustering subjects for which more than one source of data is available. One possible application is to estimate separate, but related, clusterings of the subject specific PK and PD parameters in a joint PK/PD model. It is also likely that the ideas and issues discussed will lead to more participants taking up new projects based on the working group discussion and, more generally, on the experience of the whole workshop.

References. Some discussions in this working group were based on

http://www.springerlink.com/content/y0v14362r8j83274/

http://www.nature.com/clpt/journal/v83/n1/full/6100259a.html

Davidian, M. An Introduction to nonlinear mixed effects models and PK/PD analysis (ASA Biopharmaceutical Section webinar, April 2010)

Wakefield, J. and Walker, S. (1997). Bayesian nonparametric population models: formulation and comparison with likelihood approaches. Journal of Pharmacokinetics and Biopharmaceutics, 25(2):235-253.

De Iorio, M., Mueller, P., Rosner, G. and MacEachern, S.N. (2004). An ANOVA model for dependent random measures. JASA, 99(465):205-215.

Rosner, G., Mueller, P., Tang, F., Madden, T. and Andersson, B. (2004). Dose individualization for high-dose anti-cancer chemotherapy. Advanced Methods of Pharmacokinetic and Pharmacodynamic Systems Analysis Volume 3.

W4: Report on PK/PD Simulation for Design and Dose Individualization

Gary Rosner, Johns Hopkins University, served as working group leader.

On Monday, July 19, 2010, the PK and PD simulation and Dose Individualization working group began by discussing what we might try to accomplish during the second week of this SAMSI workshop/conference. We discussed the use of PK and PD models in clinical trial simulation. We reviewed terminology in drug development, particularly phase I, II, III, and IV clinical trials.

We then reviewed Bill Gillespie's slides from his presentation during the first week of the workshop. In that talk, Bill presented the steps one might go through when carrying out trial simulation (including PK and PD prediction) to design a phase II study. The group decided to put together the pieces to carry out a trial simulation based on a joint PK and PD model.

We looked at potential decision paths as possible things for us to look at in our trial simulation.

• Different treatment regimens
• Different trial designs
• Different overall development strategies
• Different indications
• Different drug candidates

We decided to look at the first three items. Specifically, we chose to pretend we were designing a phase II study with phase IIA and IIB objectives. The disease the study was targeting had a fairly short-term (2-4 weeks) efficacy outcome. There was a longitudinal biomarker measurement available to us for assessing efficacy, such as an early signal of lowering lipids. We considered simple linear pharmacokinetics and a direct-effect pharmacodynamic model. We would introduce complexity in the model of inter-individual variation.

When designing the phase II study, we considered that we might have available some or all of the following information.

• Animal safety, PK, & PBPK
• Animal model for the disease (some PD)
• (maybe) other drugs (competition) for disease
• In vitro studies (e.g., IC50, etc.)
• Phase 1 data (healthy volunteers): safety and PK
• A PK model
• (maybe) PD model
• (maybe) joint PK-PD model
• (maybe) disease-biomarker model

We then discussed a GlaxoSmithKline (GSK) phase IIa study of a new INI antiretroviral drug for treating HIV-positive patients. The drug is taken orally, once a day. Merck has a competitor drug called raltegravir. GSK has a PD model.

The group decided

• to use the GSK study as a template.
• to design a phase IIa & compare it to the one GSK actually carried out.
• to design a phase IIb & compare our “optimal” design to the design of GSK's ongoing phase IIb study.

Possible objectives of our simulation study:

• Include simulated data to allow for different subgroups and see how a Bayesian nonparametric prior model based on the Dirichlet process mixture performs.

• Increase the sample size to find out when the Bayesian NP model shows the different subgroups.

Need PK & PD models for underlying truth. One-compartment PK model (w/1st-order absorption). Use GSK PD model with inter-individual variation for one element.
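For a single oral dose, the one-compartment model with first-order absorption has the standard closed form C(t) = F D ka / (V (ka - ke)) (exp(-ke t) - exp(-ka t)). The sketch below encodes that formula as a possible building block for the simulated truth; the parameter names and values are hypothetical and are not those of the GSK study, and the case ka = ke is deliberately not handled.

```python
import math


def concentration(t, dose, ka, ke, volume, bioavailability=1.0):
    """Plasma concentration at time t after a single oral dose.

    ka: first-order absorption rate, ke: first-order elimination rate.
    Assumes ka != ke (the limiting case needs a separate formula).
    """
    if ka == ke:
        raise ValueError("ka must differ from ke in this closed form")
    scale = bioavailability * dose * ka / (volume * (ka - ke))
    return scale * (math.exp(-ke * t) - math.exp(-ka * t))


def time_of_peak(ka, ke):
    """Time at which the concentration curve reaches its maximum."""
    return math.log(ka / ke) / (ka - ke)


if __name__ == "__main__":
    # Hypothetical values chosen only to draw a plausible curve.
    ka, ke, volume, dose = 1.0, 0.2, 50.0, 100.0
    for hour in (0.5, 1, 2, 4, 8, 12, 24):
        print(hour, round(concentration(hour, dose, ka, ke, volume), 4))
    print("peak at t =", round(time_of_peak(ka, ke), 3))
```

Inter-individual variation could then be introduced by drawing ke or volume per subject before calling `concentration`, which is where a mixture or Dirichlet process prior would enter.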

We developed a list of tasks.

• Code PK & PD models and figure out which parameters need priors.
• Use BUGS code to simulate data and to fit model to data via separate calls to R.
• Get priors for the parameters in the models.
• Fit PK & PD model with mixture model priors for parameters.

We also considered the following design questions.

• When to sample viral load?
• Dose levels of drug
• How many dose levels in phase 2?
• How to distribute patients across the dose levels (adaptive later, much later)?
• Duration of treatment? (decision: 24 weeks in phase IIb)
• Incorporate development of resistance?
• What is objective of phase 2?
• Primary endpoint (efficacy)?
  – Keep viral load below 50 copies/mL
• Include safety considerations in the model? (need model)
• Include pharmacogenetics?
• How much heterogeneity is needed for individualization to pay off?

We then formed Task Groups:

• Code the PD model function with embedded PK model
• Write WinBUGS code to generate concentrations over time & viral load measurements over time
• Write program in R to call WinBUGS
• Estimate parameters:


The group met each afternoon during the week, with several subgroups working together at other times. By the end of the second week, we produced a program that ran in WinBUGS and generated PK and PD data similar to data we anticipate collecting as part of the clinical trials. We also had a WinBUGS program for fitting data with our joint PK & PD model that includes a Bayesian nonparametric prior model for the distribution of subject-specific parameters. We also have R code to call WinBUGS to simulate and then analyze these viral load data. We plan to run these programs under different design scenarios and thereby determine the optimal biologic dose and also the best design for a future clinical trial further evaluating this drug. The group will continue working on this project after the SAMSI program and, hopefully, write up our results.

W5: Report on Pharmacogenomics and Networks

Peter Muller, University of Texas M.D. Anderson Cancer Center, served as working group leader.

Discussions in this WG focused on network modeling & graphical models. In the initial organizing meeting we recognized that most WG members were new to the area and concluded that group activities should therefore be mostly tutorial introductions.

Monday. The first presentation was given by Donatello Telesca, UC Los Angeles, who delivered an introductory lecture on graphical models, introducing notation, terminology and some basic models.

Tuesday. The theme was continued on Tuesday in a lecture by Abel Rodriguez, UC Santa Cruz, on Priors on Graphs. The talk introduced informative priors for graphs, exploiting knowledge about properties of the experimental units represented by the nodes. In particular this included attributes specific to each node (gene, person, etc.) and information about clustering of units. Participants noted that the use of clustering to inform a probability model for a random graph was unusual in biomedical applications, and perhaps an interesting modeling approach.

Wednesday. Stan Young, NISS, gave a lecture on factorization of non-negative matrices. The lecture pointed out the relationship with traditional principal component analysis. The main difference between the two approaches is that the proposed factorization of non-negative matrices allows inference about the terms of a deconvolution of the non-negative matrix into a sum of matrices. This can be important in biomedical applications when a response arises as a sum of different mechanisms. Also, the solution tends to be sparse, with many zeros in the factor matrices. This might be useful in identifying structure and networks.
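A minimal sketch of one common way to compute such a factorization is given below. It uses the multiplicative update rules of Lee and Seung for the squared-error objective, which is an assumption made here for illustration; the lecture's own algorithm is not described in these notes. The product W H is a sum of rank-one non-negative matrices, one per column of W, which is the "sum of mechanisms" reading mentioned above.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius distance between v and the product w h."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix v (n x m) as w (n x rank) times h (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # Update h elementwise: h <- h * (w^T v) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # Update w elementwise: w <- w * (v h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


if __name__ == "__main__":
    # Small made-up non-negative matrix of rank two.
    v = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 0.0, 1.0], [3.0, 2.0, 5.0]]
    w, h = nmf(v, rank=2)
    print("reconstruction error:", frobenius_error(v, w, h))
```

Because the updates only multiply by non-negative ratios, the factors stay non-negative throughout, which is what distinguishes the result from a principal component decomposition.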

Thursday. The program of the working group concluded with a discussion led by Peter Muller on Reciprocal Graphs. The discussion tried to clarify the process of associating a graphical model with a probability model, and how posterior inference on the probability model implies posterior inference on the graph. Reciprocal graphs are a class of graphical models that are specifically suited for the representation of molecular networks in general, and in pharmacogenomics in particular. The models allow directed edges and allow the inclusion of cycles.


4 Feedback and Outcomes

Feedback from many participants seems to indicate that the program succeeded in the intention of connecting two traditionally separate research communities and drawing the attention of researchers in BNP to interesting opportunities in PK/PD related problems. In particular, we received feedback about the following tentative new research projects initiated as a result of the program.

Steve MacEachern, Ohio State University, reports conversations with David Dahl, Mahlet Tadesse and Thanasis Kottas about diagnostics for NP Bayesian models. They aim to develop graphical diagnostics for nonparametric Bayesian models.

Lang Li, IUPUI, reports initiating a collaboration with Wes Johnson, UCI. They are working on eliciting prior distributions for population PK/PD models.

Subha Guha is planning to work on several projects that resulted from attending the conference.

Peter Thall says the workshop “. . . turned out to be a really terrific learning experience, and it has changed the way I think about Bayesian data analysis.” He has initiated two projects during the workshop, one on prior equivalent sample size for nonparametric Bayes priors, and another on non-parametric Bayes for causal inference. Both are joint with Mike Escobar and Peter Muller.

Julien Cornebise says he learned a lot, “. . . and I am looking much forward to having other chances to collaborate soon!” He has initiated work on a paper discussing the use of SDE's for PK/PD modeling, with a population PK/PD model for illustration. Another project is on a “balanced Dirichlet process” model, jointly with Lane Burgette.


I.E.4 Education and Outreach Program

The SAMSI Education and Outreach (E&O) Program encompasses a variety of activities which have achieved national stature for both their scientific and pedagogical content. The annual activities include two-day Undergraduate Outreach Days held both in the Fall and in the Spring, a week-long Undergraduate Workshop (UGS) held in May, and the ten-day Industrial Mathematical and Statistical Modeling (IMSM) Workshop for Graduate Students that is held in July.

Undergraduate Outreach Days

The two outreach workshops are held annually to expose undergraduates from institutions around the country to topics and research directions associated with concurrent SAMSI programs. One goal of these workshops is to illustrate the application and synergy between mathematics and statistics which goes far beyond that which students have seen in coursework. The overall objective is to broaden the perspective of students with regard to both future graduate studies and career choices.

Space-Time Analysis for Environmental Mapping, Epidemiology and Climate Change: October 30-31, 2009

The Fall outreach workshop focused on topics from the SAMSI Program on Space-time analysis. The students were provided with an overview of SAMSI by Pierre Gremaud (SAMSI/NCSU) after which program leaders, participants, postdocs and students gave a variety of presentations and tutorials. During the Friday morning session, Murali Haran (Penn State) gave a general introduction to statistical methods for environmental science and climate change. Ben Shaby (SAMSI postdoc) then led a tutorial on R. This was followed by two applied presentations where the students were shown how the type of methods under study can be used in practice. First, Veronica Berrocal (SAMSI postdoc; now on the faculty at the U. of Michigan) discussed how to downscale outputs from computer models from averaged values to local ones. This presentation was also paired with an interactive R session in the afternoon. The first presentation in the afternoon was given by Martin Tingley (SAMSI postdoc; now at NCAR). Martin discussed the field of paleoclimatology, i.e., the reconstruction of the climate using the very partial data at our disposal. The presentation was also paired with a lab. For this, the students used MATLAB, after having received an introduction to it by Jun Zhang (SAMSI postdoc; now at NOAA/CICS). The day was concluded by an open discussion led by Pierre Gremaud on graduate school and career options. During dinner on Friday, members of the Directorate as well as SAMSI visitors and postdocs interacted with students to further discuss career opportunities. Two presentations were given on Saturday morning by Susie Bayarri (University of Valencia, Spain) on the construction of hazard maps for volcanic eruptions and by Howard Chang (SAMSI postdoc) on epidemiology. Howard's presentation was followed by an interactive session in R. Details regarding the workshop can be obtained at http://www.samsi.info/workshop/two-day-undergraduate-workshop-october-2009

There were 48 participants, which included 21 females, 2 African Americans and 3 Hispanics.

Stochastic Dynamics: February 26-27, 2010


The Spring outreach workshop was devoted to Stochastic Dynamics. The students were provided with an overview of SAMSI by Pierre Gremaud (SAMSI/NCSU) after which program leaders, participants, postdocs and students gave a variety of presentations and tutorials. The first talk in the workshop was given by Scott Schmidler (Duke) who gave a fascinating presentation on Markov chains, networks and molecules. This was followed by an introduction to R given by Xueying Wang (SAMSI postdoc; now at Texas A&M). John McSweeney (SAMSI postdoc; now at Concordia) introduced the students to models for epidemics in structured populations, i.e., models that take into account the (for instance) social interactions among the population members. The afternoon started with an interactive R session based on McSweeney's application. Avani Athreya (SAMSI postdoc) presented introductory material on random walks and Brownian motion while another SAMSI postdoc, Oliver Diaz, discussed the sensitivity of dynamical systems to small random perturbations. Finally, Xueying Wang introduced the students to the marvels of MATLAB. The day was concluded by an open discussion led by Pierre Gremaud on graduate school and career options. During dinner on Friday, members of the Directorate as well as SAMSI visitors and postdocs interacted with students to further discuss career opportunities. On Saturday morning, Cindy Greenwood, the overall program leader, discussed general issues pertaining to stochastic dynamics. Bruce Rogers (SAMSI) addressed connections between graph theory and linear algebra; more specifically, how the latter can be used very easily to get information on the former. This presentation was paired with a MATLAB interactive session. Details regarding the workshop can be obtained at http://www.samsi.info/workshop/two-day-undergraduate-workshop-february-2010

There were 35 participants, which included 14 females, 1 African American and 1 Hispanic.

Undergraduate modeling workshop: May 17-21, 2010

The undergraduate modeling workshop is a yearly week-long event. The focus is different from that of the outreach workshops. Here, we consider one application and guide the students through the entire modeling process, from the design of a model, its mathematical analysis, its statistical properties and its implementation to, finally, discussion of the results. All tutorials and sessions are presented by SAMSI graduate students and postdocs under close supervision of a member of the Directorate and local faculty. This allows the undergraduates to interact with peers within educational and research programs they are considering, and it provides valuable experience for the presenters, many of whom are considering academic careers. The workshop provides students with an intensive introduction to the synergy between applied mathematics and statistics within the context of timely physical applications and real data.

This year, we decided to build on the strengths present at SAMSI and consider an application related to epidemiology. This field is indeed at the intersection of the two 2009-10 programs. Further, it pertains to one of the 2010-11 programs (Complex Networks). The students were first guided through introductory presentations related to classical epidemiology models (SIR and the like). They were also given background in basic statistics and probability, R and MATLAB. A first interactive session allowed the students to test their skills on a “perfect” data set: a well known boarding school influenza example for which perfectly mixed models such as SIR work very well. After some background on complex network theory and parameter estimation methods, the students were made to work on actual CDC flu data. That data set corresponds to about ten regions in the US for which aggregated flu counts are given at regular time intervals. The students had the task of adapting simple models to this situation. For this part of the workshop, the students worked in groups (of about three). Some groups successfully adapted the models to specific regions, some others applied new tools from network theory. On the final day of the workshop, each student team presented the results obtained during the week.
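For readers unfamiliar with the perfectly mixed SIR model mentioned above, the sketch below integrates the standard equations dS/dt = -beta S I / N, dI/dt = beta S I / N - gamma I, dR/dt = gamma I with a fixed-step fourth-order Runge-Kutta scheme. It is an illustration only: the function name, population size and rate values are hypothetical and are not fitted to the boarding school or CDC data used in the workshop.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, steps_per_day=100):
    """Return a list of (day, S, I, R) for a perfectly mixed SIR epidemic."""
    n = s0 + i0 + r0
    h = 1.0 / steps_per_day

    def deriv(s, i):
        new_infections = beta * s * i / n
        recoveries = gamma * i
        return -new_infections, new_infections - recoveries, recoveries

    s, i, r = float(s0), float(i0), float(r0)
    path = [(0.0, s, i, r)]
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            k1 = deriv(s, i)
            k2 = deriv(s + 0.5 * h * k1[0], i + 0.5 * h * k1[1])
            k3 = deriv(s + 0.5 * h * k2[0], i + 0.5 * h * k2[1])
            k4 = deriv(s + h * k3[0], i + h * k3[1])
            s += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
            i += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
            r += h * (k1[2] + 2 * k2[2] + 2 * k3[2] + k4[2]) / 6.0
        path.append((float(day), s, i, r))
    return path


if __name__ == "__main__":
    # Hypothetical rates: beta / gamma = 3, so an outbreak is expected.
    for day, s, i, r in simulate_sir(1.5, 0.5, 999, 1, 0, days=20)[::4]:
        print(int(day), round(s, 1), round(i, 1), round(r, 1))
```

An outbreak grows only when beta / gamma exceeds one, which is the kind of threshold behaviour the students could check before moving on to regional data where perfect mixing no longer holds.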

The workshop was a resounding success, so much so in fact that the 2011 version will be devoted to epidemiology again. Further, in 2011, we will run a parallel workshop involving faculty from primarily teaching colleges with the goal of producing a teaching module. The 2010 workshop heavily involved 15 SAMSI postdocs and 12 SAMSI graduate fellows who interacted with the undergraduate students through lectures, interactive sessions and socially. In the middle of the week on Wednesday morning, two panels are assembled to discuss professional development issues. One panel gathers faculty with various backgrounds and addresses graduate school opportunities. The other panel is devoted to career options after studies are completed; the panelists come from various industries and companies. More details about the workshop can be found at http://www.samsi.info/workshop/2009-10-undergraduate-modeling-workshop

There were 27 participants which included 10 females and 3 African Americans.

SAMSI Graduate Fellow Poster day, April 14, 2010 Through the year, the 11 SAMSI graduate fellows are involved in research projects directly related to working group activities. To recognize their achievements, the entire SAMSI community gathers to celebrate their achievements in a poster reception. Each student presents his/her results in the form of a poster. The students are encouraged to present their work at relevant conferences and make use of their SAMSI poster there. 4th Annual Graduate Student Probability Conference: April 30-May 2, 2010 SAMSI is proud to provide support for this conference cycle which is organized by graduate students and is for graduate students. The objectives of the conference are (i) to provide graduate students and postdoctoral fellows with the opportunity to speak on an area of interest within probability, (ii) to foster discussions with a friendly and informal atmosphere, (iii) to establish connections for potential future collaborations and (iv) to learn about recent development in probability. More details about the conference can be found at http://www.math.duke.edu/%7Etkolba/GSPC/ Industrial Mathematical and Statistical Modeling (IMSM) Workshop: July 20-28, 2009 The ten-day Industrial Mathematical and Statistical Modeling Workshop for Graduate Students

was the 15th in the series and the 7th sponsored by SAMSI. The overall goals of the workshop

are twofold: (i) expose mathematics and statistics students to current research problems

from government laboratories and industry which have deterministic and stochastic

components, and (ii) expose students to a team approach to problem solving. During the

workshop, the students learn to communicate with scientists outside their discipline, allocate

tasks among team members, and disseminate results through both oral presentations and

written reports.

Both the undergraduate modeling workshop and the graduate IMSM workshop share (and

achieve) the following goals with respect to intellectual merit:

Students gain experience in teamwork. Teamwork is indispensable in the approach to problem solving, in producing a final written report, and in preparing an oral presentation.

The students learn to communicate with scientists who are not academic mathematicians.

The workshops present a unique combination of applied mathematics and statistics that is not part of the usual class work.

The IMSM workshop goes further:

Students work on genuine industrial research problems. These are not the kind of academic exercises often considered in classrooms. The projects tend to be open-ended and require fresh new insight for both formulation and solution. Sometimes the biggest challenge is to figure out what the real problem is. The students also learn how to derive a useful result under a tight deadline.

Students acquire crucial insight into the aspects of a non-academic career. Some presenters may know more about their problems and can guide the students away from dead ends, while other presenters may have brought open-ended problems and are searching along with the students. This combination of approaches exposes the students to the variety of challenges facing scientists in industry.

The IMSM workshop helps students to decide what kind of career they want. The IMSM workshop provides a unique experience of how mathematics and statistics are applied outside academia. In some cases the help has been in the form of direct hiring by the participating companies.

The broad impact of our Education and Outreach activities is substantial:

Our workshops help to attract students to and prepare them for a non-academic career, by exposing them to real-world industrial problems.

The participating students represent a nationally diverse group, with a substantial number of women and minorities.

The workshops strengthen the interaction between applied mathematics and statistics.

The workshops, and specifically IMSM, benefit government and industry research. Often the student teams come up with useful solutions to a project. Several projects initially presented at the IMSM workshop have resulted in long-term collaborations between students and faculty on the one side and the companies on the other. Furthermore, several companies have taken advantage of the recruitment opportunity provided through direct contact with some of the most talented students in the mathematical and statistical sciences. Many companies, large and small, have shown continued interest in and enthusiasm for the IMSM workshop.

For the 2009 IMSM workshop, there were 37 participants, including 14 women and 1 Hispanic participant.

The projects investigated during the workshop corresponded to current research problems presented by scientists from the Environmental Protection Agency, MIT Lincoln Laboratory, the National Center for Atmospheric Research, Republic Mortgage Insurance Company, Sandia National Laboratories and the UNC School of Pharmacy. Each team gave a 30-minute oral presentation summarizing their results on the final day of the workshop, and written reports were compiled as SAMSI Technical Report 2010-02, which is available at http://www.samsi.info/sites/default/files/tr2010-02.pdf Details regarding the workshop can be found at http://www.ncsu.edu/crsc/events/imsm09/

Diversity

See Section I.H for discussion of the efforts to promote diversity.

Courses

See the program reviews in Section I.E for discussion of the SAMSI courses. Three courses were

offered during the Fall 2009 semester and one during Spring 2010; each carries 3

credits/units at each of the participating universities.

The course attached to the Stochastic Dynamics program in Fall 2009 was taught by Priscilla Greenwood, Jonathan Mattingly, and guest lecturers. This course introduced background material in dynamical systems theory and some basic principles regarding the behavior of associated stochastic dynamical systems. Examples were drawn from a number of settings including population dynamics, modeling of biological systems, fluid mixing and transport, and simple physical systems. No background in any particular field of application was assumed. The course examined both analytical and numerical approaches and discussed the art of stochastic modeling. Attendance: 16 students registered

The Space-Time program generated two graduate courses in Fall 2009. Alan Gelfand (Duke), Montse Fuentes (NCSU), and several long-term SAMSI visitors offered a course on Theory of Continuous Space and Space-Time Processes. This course was intended to provide a strong theoretical foundation for space and space-time processes over continuous domains. Topics included continuous parameter stochastic process theory; spectral methods; spatial asymptotics; nonstationary spatial modeling; dynamic models and spatial time series; nonseparable space-time models; spatial design; space-time data fusion; low rank representations; nonparametric spatial methods; topics in shape analysis. Attendance: 27 students registered

In parallel with that course, Richard Smith (UNC) taught a course on Spatial Epidemiology. Much of modern epidemiology is concerned with relationships between environmental factors and various types of human health outcomes. When data are collected at many spatial locations, we may refer to the problem as one of spatial epidemiology. However, in most cases, this includes a temporal component as well. Since modeling spatial dependence is often critical to the method of statistical inference, it is necessary to use methods from spatial or spatio-temporal statistics. Very often health data are aggregated (e.g. into zip code or county totals), so models for data at discrete spatial locations, such as Markov random fields, are more appropriate than geostatistical methods. Another kind of problem is exemplified by the NMMAPS study (http://www.ihapss.jhsph.edu/): an air pollution-mortality relationship is developed initially for many time series at individual cities, but inferences are then drawn by combining data across spatial locations. A third kind of problem arises when there is uncertainty about the pollution field itself, for example, when data collected at monitors are interpolated to other locations. Sometimes this interpolation is performed by spatial statistics methods, but there is a growing trend to use air pollution models such as CMAQ (the EPA's Community Multiscale Air Quality model). Specific topics: Models for spatially distributed health data. Markov random fields; extensions to spatial-temporal processes. Multi-city time series studies; combining data across multiple studies at different spatial locations. Measurement error problems that involve spatial interpolation; use of air quality models. Attendance: 19 students registered

Finally, in Spring 2010, Montse Fuentes and several guest lecturers gave a course on Spatial Statistics in Climate, Ecology and Atmospherics. In this course, the instructors introduced the statistical methods used to characterize uncertainties in deterministic models of climate, weather, ecology and air pollution. They also presented statistical frameworks to combine disparate spatial data, from observations and output of deterministic models, and to measure the agreement between an artificially generated climate signal from a climate model and real data as measured by surface observation stations or satellites. Statistical methods for processing ensembles of climate models were covered. The instructors introduced different spatial-temporal modeling approaches to characterize trends in space and time, to estimate dependency structures, and to do space-time prediction for climate, weather, ecological and air pollution data. They also discussed state-of-the-art methods for dimension reduction, spatial extreme events in climate and weather, and the impact of climate change on mortality and human health. Attendance: 15 students registered

F. Industrial and Governmental Participation

Government and industry participation in SAMSI programs and activities reflects broad

interest in the SAMSI vision. Most SAMSI workshops had extensive participation by

individuals from industry and government. Here, we summarize only the more intensive

involvements, e.g., participation of such individuals in program working groups.

Risk Analysis, Extreme Events and Decision Theory: This program had working

group members from IBM, the Centers for Disease Control and Prevention (CDC), and

NCAR. Contact was also made with Genesys Lab of Alcatel-Lucent to obtain data for

testing methodology.

Environmental Sensor Networks: This program had working group members from

government agencies, laboratories, and industry, including EPA, CDC, Marine Biological

Laboratory, the IBM Watson Research Center, and the National Institute for Space

Research (Brazil).

Sequential Monte Carlo Methods:

The Tracking working group had close interactions with the UK's DSTL governmental

defense organization, with Nathan Green being on secondment from there. Mark Briers

was on secondment from QinetiQ Ltd. for his visit in Fall 08.

Algebraic Methods in Systems Biology and Statistics:

We had a number of participants who were from government agencies and industry:

Lawrence Cox (Amgen), Gilles Gnacadja (Amgen), and Richard Haney (Cellular

Statistics).

Education and Outreach Program: In the Industrial Mathematical and Statistical

Modeling Workshop, the attendees were divided into 6 teams to investigate current

research problems presented by scientists from GlaxoSmithKline, MIT Lincoln

Laboratory, the National Institute of Statistical Sciences, Republic Mortgage Insurance

Co and SAS.

G. Publications and Technical Reports

I. STOCHASTIC DYNAMICS

Publications and Technical Reports

Athreya, A., Srinivasan, R., McKinley, S., “Understanding Quiescence for a

Morris-Lecar Model”

Athreya, A., Mattingly, J., Kolba, T. “Noise-Induced Stabilization for Certain

Planar System”

Bhamidi S., and Greenwood, P.E., Variants of Brownian motion. Chapter for

Encyclopedia of Operations Research, to appear in 2011.

Baxendale, P.H., and Greenwood P.E., Sustained oscillations for density-

dependent Markov processes, Journal of Math. Bio. to appear

Cooke B., Mattingly J.C., McKinley S.A., Schmidler S.C., Geometric

Ergodicity of Two-dimensional Hamiltonian systems with a Lennard-Jones-

like Repulsive Potential. Communications in Mathematical Sciences,

submitted, 2011. http://arxiv.org/abs/1104.3842

E, Weinan, Engquist, B., and Sun, Y. Heterogeneous multiscale methods with

application to combustion, in Turbulent Combustion Modeling: Advances,

Trends and Perspectives (T. Echekki and E. Mastorakos eds.), Springer,

pp.439–459, 2011. http://www.springerlink.com/content/k031788726183244/

Giraudo, M.T., Greenwood P.E., and Sacerdote L., How sample paths of leaky

integrate and fire models are influenced by the presence of a firing threshold.

Comp. Neuroscience, to appear.

Gremaud, P. and Sun, Y. Numerical study of singularity formation in

relativistic Euler flows, SIAM J. Sci. Comput., in review, 2011.

http://www.cims.nyu.edu/~yisun/paper/SISC.pdf

Srinivasan, R., Menon, G., “Kinetic Theory and a Lax Pair for Shock

Clustering and Burgers Turbulence” Submitted Sept. 2009.

Srinivasan, R., “Rates of Convergence for Smoluchowski's Coagulation

Equations” Edited/resubmitted Oct. 2009

Sun, Y., Caflisch, R. and Engquist, B. A multiscale method for epitaxial

growth, SIAM Multiscale Model. Simul., 9, pp.335–354, 2011.

http://dx.doi.org/10.1137/090747749

Sun, Y. Rangan, A.V., Zhou, D., and Cai, D. Coarse-grained event tree

analysis for quantifying Hodgkin-Huxley neuronal network dynamics, Journal

of Computational Neuroscience, in press, 2011.

http://www.springerlink.com/content/j6k86h53470332x1/

Vivar, J. and Banks, D. Models for Networks: A Cross-Disciplinary Science,

in: Wiley Interdisciplinary Reviews: Computational Statistics, to appear in

2011

Wang, H., Banks, D., Vivar, J., and Dunson, D. The Multiple Bayesian Elastic

Net, Journal of Machine Learning, submitted, 2011.

http://stat.duke.edu/~hy35/MBEN.pdf.

Reports in Preparation

Athreya, A “Large Deviations for a Diffusion Process with a First Integral”

(in preparation)

Athreya, A. “Stochastic Resonance for Hamiltonian Systems Perturbed by

Noise” (in preparation)

Athreya, A., Kramer, P., Fricks, J., McKinley, S., “Cooperative and Tug-Of-

War Models for Molecular Motors” (article in preparation)

Diaz-Espinosa, O., de la Llave, R., “Random Perturbations of Holomorphic

Maps with Siegel Disks”

Diaz-Espinosa, O., Schmidler, S., “Metastability”

Diaz-Espinosa, O., Khanin, K., “Directed Polymers in Random Environment

with Product Structure”

Diaz-Espinosa, O., “Long Wave Expansion for Water Waves Over a Random

Surface”

Rogers, B., “Poincaré Inequalities on Markov Chains” 2 papers in

preparation

Rogers, B., “Control of Consensus under Bounded Confidence” in

preparation

Rogers, B., Menjivar, C., “Network Effects on Computational Models of

Generalized Reciprocity” (in preparation)

Rogers, B., Murillo, D., “Stochastic Agro-Ecosystem Model” (nearly

submitted)

Rogers, B., Hindman, M., “Modeling Internet Traffic” (book and several

papers)

Wang, X., “Pairwise Closure Approximations in Epidemic Models on

Networks”, in preparation.

Wang, X., “Stochastic Resonance Effects in Single and Coupled Morris-Lecar

Type Neurons”, in preparation.

II. SPACE-TIME ANALYSIS FOR ENVIRONMENTAL MAPPING,

EPIDEMIOLOGY AND CLIMATE CHANGE

Publications and Technical Reports

Assunçao, R. and Lopes, D. (2010), Visualizing Marked Spatial and Origin-

destination Point Patterns With Dynamically Linked Windows. Journal of

Computational and Graphical Statistics, accepted.

Banerjee, S., Gelfand, A. “Modelling Spatial Gradients on Response

Surfaces” Frontiers of Statistical Decision Making and Bayesian Analysis.

Eds. M.H. Chen, D.K. Dey, P. Mueller, D. Sun, K. Ye New York: Springer

(2010)

M.J. Bayarri (2010). A methodological review of computer models. In:

Frontiers of Statistical Decision Making and Bayesian Analysis (Ming-Hui

Chen, Dipak K. Dey, Peter Mueller, Dongchu Sun, and Keying Ye, eds.).

Springer, pp 157-168.

Berrocal, V., Gelfand, A., Holland, D.M. “A Bivariate Space-time

Downscaler Under Space and Time Misalignment” Annals of Applied

Statistics. In Press (2010)

Bhat, K. S., Haran, M., Goes, M. (2010). Computer model calibration with

multivariate spatial output: a case study in climate parameter learning. In:

Frontiers of Statistical Decision Making and Bayesian Analysis (Ming-Hui

Chen, Dipak K. Dey, Peter Mueller, Dongchu Sun, and Keying Ye, eds.).

Springer

Bhat, K. S., Haran, M., Keller, K., Tonkonojenkov, R. (2010). Inferring

likelihoods and climate system characteristics from climate models and

multiple tracers, submitted.

Blanchet, J. and Davison, A. C. (2011) Spatial modelling of extreme snow

depth. Annals of Applied Statistics, to appear.

Chakraborty, A., Gelfand, A., Wilson, A.E., Latimer, A.M., Silander, J.A.

"Modeling Large-scale Species Abundance with Latent Spatial Processes"

Annals of Applied Statistics. In Press (2010)

Chakraborty, A., Gelfand, A.E., Wilson, A. M., Latimer A. M. and Silander,

J.A. (2011), Point pattern modeling for degraded presence-only data over

large regions. To appear, Journal of Royal Statistical Society, Series C

Chang, H., Peng, R.D., Dominici, F. “Estimating the Acute Health Effects of

Coarse Particulate Matter Accounting for Exposure Measurement Error”

Under Revision 2010

Chang, H., Zhou, J., Fuentes, M. “Impact of Climate Change on Ambient

Ozone Level and Mortality in Southeastern United States” International

Journal of Environmental Research and Public Health 7:2866-2880 (2010)

Chang H, Dominici, F, and Peng, RD. “Bayesian Model Averaging for

Clustered Data” Submitted (2009)

Corberan, A. and Lawson, A. (2011), Conditional Predictive Inference for

On-line Surveillance of Spatial Disease Incidence. Submitted to Statistics in

Medicine.

Craigmile, Peter and Bala Rajaratnam (2011), Discussion of A statistical

analysis of multiple temperature proxies: Are reconstructions of surface

temperatures over the last 1000 years reliable? Annals of Applied Statistics,

Vol. 5, No. 1, 88--90.

N. Cressie, R. Assunçao, S.H. Holan, M. Levine, O. Nicolis, J. Zhang and J.

Zou (2011), Dynamical random-set modeling of concentrated precipitation in

North America. Technical Report No. 854, March, 2011, Department of

Statistics, The Ohio State University, Columbus, OH. Submitted to Statistics

and Its Interface.

Das, S., Dey, D., “Bayesian Methods for Dynamic Models” Submitted (2009)

Das, S., Kaufman, M., Singh, G., Micames, C., Gress, F.,“Efficacy of

Endoscopic Ultrasound (EUS) Guided Celiac Plexus Block (CPB) and Celiac

Plexus Neurolysis (CPN) for Managing Abdominal Pain Associated with

Chronic Pancreatitis and Pancreatic Cancer: a systematic review and Meta

analysis”. Accepted in Journal of Clinical Gastroenterology (2009)

Davison, A. C. and Gholamrezaee, M. M. (2010) Geostatistics of extremes.

Under revision.

Davison, A. C., Padoan, S. and Ribatet, M. (2010), Statistical modelling of

spatial extremes. Under revision for Statistical Science.

R. Dennis, Fox, M. Fuentes, A. Gilliland, A. Hanna, G. Hogrefe, Irwin,S. Rao,

R. Sheffe, H. Schere, Steyn, Venkatram. (2010). A framework for evaluating

regional-scale numerical photochemical modeling systems. Environmental

Fluid Mechanics Journal, in press.

Eidsvik, J., Finley, A., Banerjee, S., and Rue, H. (2010). Approximate

Bayesian inference for large spatial datasets using predictive process models.

Under revision, Computational Statistics and Data Analysis.

Eidsvik, J., Shaby, B., Reich, B.J., Wheeler, M. and Niemi, J. (2010).

Estimation and prediction in spatial models with block composite likelihoods

using parallel computing. Submitted.

Erhardt, R. and Smith, R.L. (2011), Approximate Bayesian Computing for

Spatial Extremes. Submitted for publication.

M. A. R. Ferreira, Using Proper Markov Random Fields to Build Highly

Structured Models for Gaussian Areal Data. Chapter for book edited by

Dongchu Sun, to appear.

M. A. R. Ferreira and J. C. Vivar, Spatiotemporal models for Poisson areal

data with an application to the AIDS epidemic in Rio de Janeiro. Journal of

the Royal Statistical Society - Series C, submitted.

Finley, A.O., Banerjee, S. and MacFarlane, D.W. (2010). A hierarchical

model for predicting forest variables over large heterogeneous domains.

Journal of the American Statistical Association 106, 31-48.

Fuentes, M., and Banerjee, S. (2011). Bayesian Modeling for Large Spatial

Datasets. WIREs Computational Statistics, accepted.

Fuentes, M. and Foley, K. (2011). Ensemble methods, in upcoming

Encyclopedia of Environmetrics.

Fuentes, M., Henry, J. and Reich, B. (2011), Nonparametric Spatial Models

for Extremes: Application to Extreme Temperature Data. Extremes, accepted.

Fuentes, W., Herring, A., Langlois, P. “Air Pollution and Preterm Birth:

Identifying Critical Windows of Exposure”. Under Review.

Fuentes, M., Xi, B. and Cleveland, W. (2010), Trellis Display for Modeling

Data from Designed Experiments. Journal of Statistical Analysis And Data

Mining, in press.

Ghosh, S.; Das, S. “Spatial Point Process Analysis of Maoist Insurgency in

India” June 19, 2010 SAMSI Tech Rep 2010-3

Guhaniyogi, R., Finley, A.O., Banerjee S., Gelfand, A.E. "Adaptive Gaussian

Predictive Process Models for Large Spatial Datasets" Environmetrics – In

Review (2010)

Haran, Murali and Nathan M. Urban (2011). Discussion of A statistical

analysis of multiple temperature proxies: Are reconstructions of surface

temperatures over the last 1000 years reliable? Annals of Applied Statistics,

Vol. 5, No. 1, 61-64.

J. Harlim (2011), Numerical Strategies for Filtering Partially Observed Stiff

Stochastic Differential Equations, J. Comput. Phys., 230(3), 744-762.

Heaton, M., Banks, D., Zou, J., Karr, A., Datta, G., Lynch, J. and Vera, F.

(2011), A Spatio-Temporal Absorbing State Model for Disease and Syndromic

Surveillance. Submitted to Statistics in Medicine.

Kang, E.L., Cressie, N., Sain, S.R. (2010) “Bayesian Hierarchical Models to

Combine Outputs from the NARCCAP Regional Climate Models”. Submitted

E.L. Kang and J. Harlim (2010), A Fast Filtering Framework for Assimilating

Partially Observed Multiscale Systems: The Macro-Micro-Filter, submitted to

Monthly Weather Review.

Mannshardt-Shamseldin, E., Smith, R., Sain, S., Mearns, L., Cooley, D.

“Downscaling Extremes: A Comparison of Extreme Value Distributions in

Point-Source and Gridded Precipitation Data”. Annals of Applied Statistics,

March 2010.

Martinez-Beneito, M.A., Botella-Rocamora, P. and Zurriaga, O. (2011), A

kernel-based spatio-temporal surveillance system for monitoring influenza-

like illness incidence. Statistical Methods in Medical Research 20, 103-118.

Nychka, Douglas and Bo Li (2011), Discussion of A statistical analysis of

multiple temperature proxies: Are reconstructions of surface temperatures

over the last 1000 years reliable? Annals of Applied Statistics, Vol. 5, No. 1,

80-82.

Prates, M.O., Dey, D.K., Willig, M.R. and Yan, J. (2010), Intervention

Analysis of Hurricane Effects on Snail Abundance in a Tropical Forest Using

Long-Term Spatiotemporal Data, Journal of Agricultural, Biological, and

Environmental Statistics, 1-15, doi: 10.1007/s13253-010-0039-1.

Prates, M.O., Dey, D.K., Willig, M.R., and Yan, J. (2011), Transformed

Gaussian Markov Random Fields and Spatial Modeling. Submitted.

Reich, B., Eidsvik, J., Guindani, M., Nail, A., Schmidt “A Class of

Covariate-dependent Spatiotemporal Covariance Functions” Submitted to

Annals of Applied Statistics 2010

Reich BJ, Fuentes M. Nonparametric Bayesian models for a spatial

covariance. Statistical Methodology. Accepted

Reich, B., Fuentes, M. and Dunson, D. (2011). Bayesian spatial quantile

regression. JASA Case Studies 406, 6-20.

Reich BJ, Kalendra E, Storlie CB, Bondell HD, Fuentes M., Variable

selection for high-dimensional Bayesian density estimation: Application to

human exposure simulation. Applied Statistics, accepted.

Reich, B.J. and Shaby, B. (2011), A finite-dimensional construction of a max-

stable process for spatial extremes. Submitted.

Reichert, P., White, G., Bayarri, M.J., Pitman, E.B. (2011). Mechanism-based

emulation of dynamic simulation models: concept and application in

hydrology. Computational Statistics and Data Analysis (in press).

Ribatet, M., Cooley, D. and Davison, A.C. (2010), Bayesian inference from

composite likelihoods, with an application to spatial extremes. Under revision

for Statistica Sinica.

Salazar, E., Lopes, H., Gamerman, D., “Generalized Spatial Dynamic Factor

Analysis” (Submitted)

Salazar, E., Ferreira, M., “Temporal Aggregation of Lognormal

Autoregressive Processes” (Submitted)

Salazar E., Ferreira, M., Migon, H., “Objective Bayesian Analysis For

Exponential Power Regression Model” (Submitted to Sankhya B)

Salazar E., Lopes, H., Schmidt, A., Gomes, M., Achkar, M., “Measuring

Vulnerability Via Spatially Hierarchical Factor Models” (Submitted)

Smith, R.L. (2010), Understanding sensitivities in paleoclimate

reconstructions. Under revision.

Spöck, G., Zhu, Z. and Pilz, J. (2011), Simplifying objective functions and

avoiding stochastic search algorithms in spatial sampling design, submitted

to: Stochastic Environmental Research and Risk Assessment, (2011).

Spöck, G., Hussain I.: Spatial sampling design based on convex design ideas

and using external drift variables for a rainfall monitoring network in

Pakistan. In: Statistical Methodology, New York: Elsevier Science Inc, doi:

10.1016/j.stamet.2011.01.004.

Steinsland, I., Berrocal, V., Reich, B. "Bayesian Spatial Quantile Regression

for Numerical Model Calibration" Journal of the Royal Statistical Society:

Series C. To be submitted 2010

Tingley, Martin P. Spurious predictions with random time series: the lasso in

the context of paleoclimatic reconstructions. Discussion of A statistical

analysis of multiple temperature proxies: Are reconstructions of surface

temperatures over the last 1000 years reliable? Annals of Applied Statistics,

Vol. 5, No. 1, 83-87.

Tingley, M.P., Craigmile, P.F., Haran, M., Li, B., Mannshardt-Shamseldin, E.

and Rajaratnam B., Piecing together the past: Statistical insights into

paleoclimatic reconstructions. Under revision for Journal of Climate.

Tingley, M., Huybers, P., “A Bayesian Algorithm for Reconstructing Climate

Anomalies in Space and Time. Part 1: Development and applications to

paleoclimate reconstruction problems” In revision. Journal of Climate 2009

Tingley, Martin P. and Peter Huybers. A Bayesian Algorithm for

Reconstructing Climate Anomalies in Space and Time. Part 2: Behavior and

comparison with the regularized expectation-maximization algorithm. In

revision. Journal of Climate 2009

Tingley, Martin P. and Peter Huybers. The spatial mean and dispersion of

surface temperatures over the last 1200 years: warm intervals are also

variable intervals. In revision Climatic Change 2009

Tingley, M.P. and Li, B. (2011), Comments on “Reconstructing the NH mean

temperature: Can underestimation of trends and variability be avoided?” by

Bo Christiansen; under review at Journal of Climate.

Jean Vaillant, Gavino Puggioni, Lance A Waller, and Jean Daugrois (2011), A

spatio-temporal analysis of the spread of sugar cane yellow leaf virus. Journal

of Time Series Analysis, to appear.

Wang, X., Dey, D.K., Banerjee, S. “Non-Gaussian Hierarchical Generalized

Linear Geostatistical Model Selection” Frontiers of Statistical Decision

Making and Bayesian Analysis, eds. M.H. Chen, D.K. Dey, P. Mueller, D.

Sun, K. Ye. New York: Springer 2010

Warren, J., Fuentes, M., Herring, A., Langlois, P. “Air Pollution and Preterm

Birth: Identifying Critical Windows of Exposure” Under Review 2010

Zhang, J., “Regression Models for Spatial Images” Submitted to JABES

Zhang, J., Clayton, M. K., and Townsend, P. A., (2010), Functional

Concurrent Linear Model for Spatial Images, accepted by Journal of

Agricultural, Biological, and Environmental Statistics.

Zhang, J., and Zheng, W., (2011), A Penalized Likelihood Approach for m-

year precipitation return values estimation. Submitted to Journal of

Agricultural, Biological, and Environmental Statistics.

Zou, J., Karr, A., Heaton, M., Banks, D., Datta, G., Lynch, J. and Vera, F.

(2010), Bayesian Methodology for Spatio-Temporal Syndromic Surveillance.

Submitted to Statistical Analysis and Data Mining.

Reports in Preparation

Bhat, K. S., Haran, M., Keller, K. (2010), Combining multiple climate models

using Bayesian model averaging to project surface temperature.

Luke Bornn, Gavin Shaddick and James V Zidek, The effects of non-stationarity

and preferential sampling in modeling and measuring environmental fields for

health effect analysis.

Song Cai, Xiaoli Yu, Gavin S Shaddick and James V Zidek, Preferential

sampling in the monitoring black smoke: a case study.

Calder, C.A., Darnieder, W.F. “Bayesian Inference for Incomplete Marked

Spatial Point Patterns” 2010

Chakraborty, A. and Gelfand, A. E. 2011 Mixture modeling for spatial point

patterns with location-specific covariates.

Chang, H.H., Baxter, L.K., Fuentes, M., and Neas, L.M., Daily mortality and

chemical constituents of fine particle air pollution.

Chang, H.H., Reich, B.J., Miranda, M.L. “Spatial Time-to-event Analysis of

Fine Particulate Matter and Preterm Births” 2010

Chang, H.H., Reich, B.J., Miranda, M.L. “Fine Particle Air Pollution and

Preterm Birth in North Carolina” 2010

Das, S., Banerjee, S., “Space-Time Data Analysis using Sequential Monte

Carlo Method”. In preparation

Das, S., Dunson, D., “Gaussian Process Latent Variables Dynamic Models”.

Work in progress

Das, S., Banks, D., “Elicitation of Expert Prior opinion in Context of

Presidential Election” (2009) Work in progress: Tentative title

Jeon, S. and Smith, R.L. (2011), Max-Stable Processes for Threshold

Exceedances in Spatial Extremes.

Krivo, L.J., Washington, H.M., Peterson, R.D., Browning, C.R., Calder, C.A.,

Kwan, M.P. “Social Isolation of Disadvantage and Advantage: The

Reproduction of Inequality in Urban Space” 2010

Li, Xuan and Smith, R.L., Bias and Variability Assumptions in a Bayesian

Analysis of Climate Models.

Mannshardt-Shamseldin, E., Craigmile, P. and Tingley, M, Paleoclimatic

extremes in proxy data.

Mannshardt-Shamseldin, E., Gilleland, E., “Severe Weather Under a

Changing Climate: Large Scale Indicators of Extreme Events”.

Mannshardt-Shamseldin, E., Smith, R., “Asymptotic Multivariate Kriging

Using Estimated Parameters with Bayesian Prediction Methods for Non-

linear Predictands.” Paper in preparation.

Matteson, D.S. and Micheas, A.C., A Finite Mixture Model for Spatio-

temporal Poisson Process.

Reich, B. and Berrocal, V. Local Bayesian model averaging for numerical

model calibration.

Salazar, E., Sansó, B., Finley, A.O., Hammerling, D., Steinsland, I., Wang, X.

and Delamater, P., Comparing and Blending Regional Climate Model

Predictions for the American Southwest.

Shaby, B., The open-faced sandwich adjustment for estimating function-based

MCMC.

Smith, R.L. and Wehner, M., A Threshold-based Extreme Value Approach to

Event Attribution.

Wolpert, R., Spatial extremes.

Xiaoli Yu, Gavin Shaddick and James V Zidek, Correcting for preferential

sampling in monitoring environmental processes.

Xiaoli Yu, Gavin Shaddick and James V Zidek, The effects of preferential

sampling in environmental epidemiology.

Zhang, J., “Penalized Concurrent GEV/GPD Likelihood Method”

Zhou, J., Fuentes, M. “Calibration of Air Quality Deterministic Models Using

Nonparametric Spatial Density Functions” In preparation.

III. SUMMER PROGRAM ON SEMIPARAMETRIC BAYESIAN INFERENCE:

APPLICATIONS IN PHARMACOKINETICS AND PHARMACODYNAMICS

Reports in Preparation

Cornebise, J., “Use of SDE's for PK/PD Modeling Population PK/PD Model

for Illustration” In preparation

Cornebise, J., Burgette, L., “Balanced Dirichlet Process Model” In

preparation

Lang, L., Johnson, W., “Eliciting Prior Distributions for Population PK/PD

Models” In preparation.

MacEachern, S., Dahl, D., Tadesse, M., Kottas, T., “Diagnostics for NP

Bayesian Models” In preparation

Thall, P., Escobar, M., Muller, P., “Prior Equivalent Sample Size for

Nonparametric Bayes Priors” In preparation

Thall, P., Escobar, M., Muller, P., “Non-Parametric Bayes for Causal

Inference” In preparation

H. Efforts to Achieve Diversity

SAMSI puts considerable emphasis on contributing to the NSF’s effort to broaden the participation from underrepresented groups in the mathematical sciences. During the past year, we have organized and co-sponsored many diversity related activities. SAMSI has also developed a web page devoted to our diversity activities. The page advertises the various program activities related to minority outreach and has links to other diversity related information outside of SAMSI.

Participation in the NSF Institutes’ Diversity Committee

Pierre Gremaud has been serving as SAMSI’s representative to the NSF Institutes Diversity Coordination Committee, which was formed in 2006 by Chris Jones (SAMSI) and Helen Moore (formerly of AIM), and is now chaired by David Auckly (MSRI). The Institutes Diversity Coordination Committee has been working together to promote diversity in the Mathematical Sciences at national conferences and through other special events. Together with the other NSF mathematical sciences institutes, SAMSI has recently secured supplemental NSF funding that will allow for the long term planning of an entire portfolio of activities including the Modern Math workshop series, the Workshop Celebrating Diversity series, the Minority Professional Development workshop, the Careers for Minorities workshops, the Careers for Women workshops, the Blackwell Tapia Conferences and some activities in conjunction with the Association for Women in Mathematics.

SAMSI took part in the Modern Math program at the 2009 SACNAS National Convention in Dallas, TX. This program was aimed at introducing young scientists to a variety of current research topics, providing mentorship and networking opportunities, and recruiting future participants in NSF Institute programs from underrepresented groups. Oliver Diaz-Espinosa, one of our SAMSI postdoctoral fellows and a member of the SAMSI program on Stochastic Dynamics, presented an overview of his research. SAMSI was again a participant in the Modern Math program in Oct. of 2010, which will be reported in the next Annual Report of SAMSI.

Minority Participation in SAMSI Programs

SAMSI Postdoctoral Positions: For the 2009-10 Research Programs, of 15 post-docs hired, five are women: Emily Kang, Esther Salazar, Xueying Wang, Veronica Berrocal, and Avanti Athreya, and one is an underrepresented minority: Oliver Diaz. For the 2010-11 Research program, of seven post-docs hired, two are women: Sylvie Tchumtchoua and Hongxiao Zhu, and two are minorities: David Degras and Sylvie Tchumtchoua. The large number of postdocs hired in 2009-10 resulted from the supplemental NSF funding for that purpose made available to the institutes that year.


Education and Outreach Programs: SAMSI continues to create opportunities, through its E&O Program, to enhance its diversity efforts by recruitment of under-represented participants. We recruit from HBCU's for all programs. We collaborate with our National Advisory and Education and Outreach Committees to reach out to the Hispanic and Native American communities. Further, new efforts have been made to engage both faculty and students at primarily teaching colleges. The diversity breakdown in specific E&O workshops is as follows.

Industrial Mathematical and Statistical Modeling (IMSM) Workshop (July 2009): From the 37 participants, 14 were female, and 1 was Hispanic.

2-Day Undergraduate Workshop (Oct 2009): From the 48 participants, 21 were female, 2 were African American, and 3 were Hispanic.

2-Day Undergraduate Workshop (Feb. 2010): From the 35 participants, 14 were female, 1 was African American, and 1 was Hispanic.

Undergraduate Workshop (May 2010): From the 27 participants, 10 were female and 3 were African American.


I. External Support and Affiliates

1. External Support

SAMSI receives extensive support through the home institutions of our long-term visitors. On

average, SAMSI pays for approximately 1/3 of a long-term visitor’s salary during the visit to

SAMSI; the remainder is provided by the home institution.

Kenan Foundation: provided $50,000 of supplementary support, focused primarily on the K-12

Kenan Fellows program.

Sequential Monte Carlo Methods: The Adaptive Design Workshop was jointly funded by the

NISS project on Computer Models for Geophysical Risks.

Affiliates: Significant support arose from the Affiliates Program, as discussed in the next

section.

2. Affiliate Involvement

2.1. Background

The NISS Affiliates Program and NISS/SAMSI University Affiliates Program are the largest

programs of their kind among the DMS-funded mathematical sciences research institutes. The

entire directorate interacts directly with affiliates; NISS director Alan Karr and associate director

Nell Sedransk have major responsibility for operation of these programs.

Recent new affiliates in 2009-10 include the Department of Statistical Science at Cornell

University, the Department of Mathematics and Statistics at the University of Maryland

Baltimore County, the Departments of Biostatistics and Statistics at the University of Pittsburgh

and the Department of Statistics at Virginia Tech. A complete listing of affiliates as of May 2011

appears below.

As a benefit of membership, affiliates may receive reimbursement for expenses to attend SAMSI

workshops as well as NISS events, many of which derive from SAMSI programs.

A central role of the affiliates is as a bridge from SAMSI to the statistics and applied

mathematics communities (as well as to industry and government), especially to inform the

development of SAMSI programs. To illustrate, the 2007-08 program on Risk Analysis, Extreme

Events and Decision Theory, as well as the 2006–07 program on Development, Assessment and

Utilization of Complex Computer Models, the National Defense and Homeland Security

program in 2005–06, the Latent Variable Models in the Social Sciences (LVSS) program in

2004–05 and the DMML program for 2003–04, all reflect affiliate interest to a significant

degree. All 2009-10 and 2010-11 programs contain components and activities suggested by affiliates.


2.2 NISS Affiliates and NISS/SAMSI Affiliates

Corporations: Avaya Labs, AT&T Labs Research, Bayer HealthCare, GlaxoSmithKline, Eli

Lilly, Merck Research Laboratories, MetaMetrics, Inc., PNYLAB, RTI International, SAS

Institute, and Yahoo! Labs

Government Agencies and National Laboratories: Bureau of Labor Statistics, Census Bureau,

Energy Information Administration, National Agricultural Statistics Service, National Center for

Education Statistics, National Center for Health Statistics/CDC, National Security Agency, and

Office of the Comptroller of the Currency

NISS/SAMSI University Affiliates: University of California Berkeley, Department of Statistics;

Carnegie Mellon University, Department of Statistics; Columbia University, Department of

Biostatistics; University of Connecticut, Department of Statistics; Cornell University,

Department of Statistical Science; Duke University, Departments of Mathematics and Statistical

Science; University of Florida, Department of Statistics; Florida State University, Department of

Statistics; George Mason University; Georgetown University Department of Biostatistics,

Bioinformatics, and Biomathematics; University of Georgia, Department of Statistics; University

of Illinois Urbana-Champaign, Department of Statistics; Indiana University, Department of

Statistics; Iowa State University, Department of Statistics; Johns Hopkins University,

Department of Applied Mathematics and Statistics; University of Maryland Baltimore County,

Department of Mathematics and Statistics; Medical University of South Carolina, Department of

Biostatistics, Bioinformatics & Epidemiology; University of Michigan, Departments of Statistics

and Biostatistics; University of Missouri Columbia, Department of Statistics; North Carolina

State University, Department of Statistics; North Carolina State University, Department of

Mathematics; University of North Carolina at Chapel Hill, Department of Biostatistics;

University of North Carolina at Chapel Hill, Department of Mathematics; University of North

Carolina at Chapel Hill, Department of Statistics & Operations Research; Oakland University,

Department of Mathematics and Statistics; Ohio State University, Department of Statistics;

Pennsylvania State University, Department of Statistics; University of Pittsburgh, Departments

of Biostatistics and Statistics; Purdue University, Department of Statistics; Rice University,

Department of Statistics; Rutgers University, Department of Statistics; University of South

Carolina, Department of Statistics; Stanford University, Department of Statistics; Texas A&M

University, Department of Statistics; Virginia Commonwealth University, Department of

Biostatistics; Virginia Tech, Department of Statistics

2.3 Affiliate Participation

All SAMSI programs and events during 2009-10 had strong affiliate participation, nearing one-

half of attendees at some workshops. Expenditures from Affiliates Reimbursement Accounts to

attend SAMSI events exceeded $60,000.


2.4 Plans for the Future

Several affiliates initiatives will increase the attractiveness of the program, as well as guide the

development of future SAMSI programs. These initiatives include information exchange for

internships, forums for sharing problems and catalyzing collaborations, and more intense focus

on activities such as distance learning.


J. Board and Advisory Committee Members

SAMSI

Governing Board Members

Bruce Carney UNC, Assoc. Dean Astronomy 2004-08

George Casella U of FL Stats 2005-08

Don Estep Colorado State U Math & Stats 2009-10

John Harer Duke, As.Provost Math 2002-04

Douglas Kelly UNC, Dean Stats 2002-04

Jon Kettenring Telcordia Math 2002-04

Tom Manteuffel U of CO Appl. Math 2005-08

James Landwehr NISS Trustees Chair Stats 2008-09

John Simon Duke, Asst. Provost Chemistry 2004-08

Daniel Solomon NCSU, Dean Stats 2002-08

NAC Members

Mary Ellen Bock Purdue Stats 2002-07

Peter Bickel UC Berkeley Stats 2002-05

Lawrence Brown Pennsylvania Stats 2002-07

Raymond Carroll Texas A&M Stats 2004-06

Carlos Castillo-Chavez Arizona State Math & Stats 2003-08

Ricardo Cortez Tulane U Math 2008-10

Rick Durrett Cornell Probability 2006-08

Jianqing Fan Princeton U Ops Research 2008-10

David Heckerman Microsoft CS & Stats 2002-04

Sallie Keller-McNulty LANL Stats 2002-05

Nancy Kopell Boston U Math 2005-08

John Lehoczky Carnegie-Mellon Probability 2002-05

Rod Little U of MI Biostats 2006-08

Jun Liu Harvard U Stats 2007-10

David Mumford Brown U Appl Math 2005-08

Susan Murphy U Michigan Stats 2008-10

George Papanicolaou Stanford Math 2002-03

Daryl Pregibon Google, Inc. CS 2003-08

Adrian Raftery Washington Stats 2002-03

G.W. Stewart Maryland CS 2003-08

Philippe Tondeur Illinois Math 2003-05

Mary Wheeler U of TX Math & Eng 2004-07

Shmuel Winograd IBM Math 2002-03

Margaret Wright NYU CS 2002-04

Bin Yu (Co-Chair) U of CA Berkeley Stats 2006-11

LDC Members

David Banks Duke Bioinfo 2003-08

H. Thomas Banks NCSU Math 2005-08


Andrea Bertozzi Duke Math 2002-03

Lloyd Edwards UNC Biostats 2003-08

Gregory Forest UNC Math 2002-08

Jean-Pierre Fouque NCSU Math 2002-04

Montserrat Fuentes NCSU Stats 2004-08

Alan Gelfand Duke Stats 2002-03

Mark Genton NCSU Stats 2002-03

John Harer Duke Math 2004-08

Jacqueline Hughes-Oliver NCSU Stats 2002-04

Joseph Ibrahim UNC Biostats 2002-03

Christopher Jones UNC Math 2002-03

Thomas Kepler Duke Bioinfo 2002-04

Sharon Lubkin NCSU Math 2004-08

Sally Morton RTI Epidemiology 2005-08

Andrew Noble UNC Stats 2002-04

David Schaeffer Duke Math 2002-03

Scott Schmidler Duke Bioinfo 2002-03

Ralph Smith NCSU Math 2002-03

Richard Smith UNC Stats 2004-08

John Trangenstein Duke Math 2003-04

Butch Tsiatis NCSU Stats 2004-08

Mike West Duke Bioinfo/Stats 2004-08

Chairs Committee Members

Jianwen Cai UNC Biostats 2005-06

Clarence E. Davis UNC Biostats 2003-05

Elizabeth DeLong Duke Biostats 2008-09

Patrick Eberlein UNC Math 2005-08

Jean-Pierre Fouque NCSU Math 2004-05

Alan Gelfand Duke Stats 2007-08

Richard Hain Duke Math 2004-06

Loek Helminck NCSU Math 2005-08

Christopher Jones UNC Math 2003-04

Michael Kosorok UNC Biostats 2007-08

Vidyadhar Kulkarni UNC Oper. Res. 2003-08

Bernard Mair NCSU Math 2003-04

David Morrison Duke Math 2003-04

Sastry Pantula NCSU Stats 2003-08

William Smith UNC Math 2004-05

Dalene Stangl Duke Stats 2003-06

Mark Stern Duke Math 2007-08

William Wilkerson Duke Biostats 2003-05

E&O Committee Members

H.T. Banks NCSU Math 2004-05

Negash Begashaw Benedict College Math Sci 2004-08


Carlos Castillo-Chavez Arizona State Math & Stats 2004-08

Karen Chiswell GSK Stats 2004-08

Cammey Cole Meredith College Math & CS 2004-10

Wei Feng UNC-Wilmington Math & Stats 2004-08

Anne Fernando Norfolk U Math 2010-

Pierre Gremaud NCSU Math 2008-10

Leona Harris College of NJ Math 2010-

Gabriel Huerta U of New Mexico Stats 2010-

Marian Hukle U of Kansas Bio Sciences 2004-08

Negash Medhin NCSU Math 2004-08

Masilamani Sambandham Morehouse College Math 2004-08

Ralph Smith NCSU Math 2007-08


L. REPORT ON MATHEMATICAL SCIENCES

INSTITUTES DIRECTORS MEETING

Minutes and Report

Mathematical Institutes Directors meeting

IAS

April 30 – May 1, 2010

Brian Zuckerman of STPI gave an Overview of Feasibility Study (attached).

--Where do institutes fit in with the balance of the NSF?

--Portfolio analysis of the Institutes within the field of math and in the area of training activities

--There is a range of available tools for self-evaluation. Each institute now has its own way of collecting data. Could processes be shared to improve best practices?

--Request to NSF for MIDs to see aggregate data that has been acquired from all of the math institutes. But there are privacy issues. An argument for use of data would have to be made to NSF counsel.

--Are the institutes the most appropriate place to allocate funds or should the funds be used in another way?

--60 data points suggested that the institutes do not have the same intellectual footprint. Almost no overlap among the participants at the institutes, i.e., very few people attend programs at more than one institute.

There was discussion of what areas of questions would produce useful data.

--How to track participants – some institutes (MSRI and IMA) use an identifier number, as opposed to an email address since email can change.

--The institutes responded quickly when federal funds became available and received a good amount of available funds. How can that be quantified?

--How do you know the effect of being at a particular institute several years down the road?

The institutes “enhance communities” – these relationships grow over the years and lead to collaboration. “Creating new knowledge” is very difficult to measure. Perhaps a qualitative as opposed to a quantitative approach should be used. David – change Goal 2 to “Increase interaction” (rather than impact) of math in other disciplines. Another goal could be the participation of mathematicians in non-mathematical fields and the participation of non-mathematicians in the math institutes.

How do the institutes affect the choice of where people end up?

It is important to add the collaboration of the institutes as a goal – create synergies. This can be a fifth goal. Add reaching out to the international community.

1A Change goal 3 from “solving problems” to “address the problems.”

Suggested that we not use a bibliometrics approach, understanding that it has been 'gamed.' But some would like to include this - at least compile a list of publications. One way to do this is to go through the lists of publications that the institutes provide – self-reporting. Another way is to evaluate journals to get this information. But a lot can be missed and it doesn't really show the influence of the institutes. Also this may leave out people who are doing research in new areas so they are not publishing much right now. After discussion, this approach was rejected, although it was agreed to have a list of publications.

'Expert judgment approach' was rejected: The experts would have to be “unconflicted” (no conflict of interest), which would mean that they wouldn't really know much about what the institutes do – which renders them “not expert.”

1b Stimulate development of new fields/disciplines

Problem – some effects that are cross disciplinary would not be discovered because the publications might be in a different discipline.

MSRI uses an exit interview: How many new contacts did you make? Motivation: In order to get their key deposit back, they have to complete the survey!

1d Leveraging new funding

Applying for interdisciplinary grants is fairly new.

4a Under-represented groups

Minority participants tend to be younger so if you ask about publications, you aren't going to get very many.

4b Effect of participating at an institute on the likelihood of getting an NSF grant.

4c Train students and postdocs to become leaders in mathematics

How did participation in an institute result in a different effect than non-participation? Looking at junior faculty, for instance.

Be careful of comparing positions with the same title that in fact may not be on the same level, such as an assistant professor at Princeton compared to an asst. professor at the U. of Maryland.

4d K-12 level and public understanding of mathematics

Not associated with NSF.

Synergies (process evaluation question)

--Across the Institutes (example: quick response to the stimulus money granted through NSF)

--Interchange across NSF development of/participation in interdisciplinary programs

--NSF used as a community resource

--More broadly, internationally

--Rotation of workshops on diversity that are already taking place

--In 2013 the collaboration on Climate Sustainability

--Joint website

The goal of STPI is to have the feasibility study done by the end of the summer.

Plan for ICERM – Proposed institute that is about interaction of math and computers/Jill Pipher

The space is provided by Brown University. Not all higher math. For graduate students and postdocs; also for undergraduates in the summer. Should be operational by 2011. Microsoft, Google and IBM have representatives on the board. Board of Trustees is a governing board. Initially a course in “Connectic theory and computation.” Math, Applied Math and Theoretical Computer Science. The programs will be a semester long. Postdoc positions may be a semester long or a year long.

MSRI tries to pair postdoc with a mentor before they get to campus. And then sometimes they encourage the postdoc to follow their mentor if the mentor leaves after a year.

All of the postdocs at IAS from the stimulus funds got jobs.

David reported on the COV. COV looked at the entire portfolio – 3,000 proposals. NSF gives a template of questions to look at the DMS portfolio. The purpose was to review the 'health' of the system. This committee was created in 2004. In 2007 it was recommended that the Math Institutes be reviewed, which resulted in STPI being here today. DMS wants to understand the value of the Institutes. They are interested in having some of the Institutes find additional sources of funding. The hope is that the released funding would be used to support new institutes.

David - What is being done in this meeting today is the beginning of a self-study, which will precede a review by DMS. Peter March wants some hard data so that he can defend the expenditure of funds for the Institutes.

4 items were reviewed:

--Looked at the study

--Looked in detail at interdisciplinary programs and ARA funding

--Looked at previous COV report

--There is a wide disparity in the way that the Institutes disseminate information.

It was suggested that the group come up with some ways to get hard data.

Review Action Items from 2009 meeting

IT Committee – Jim Kimmick of IPAM organized this.

Create a unified approach to video. It requires the directors to present the IT managers with a task to solve, or a directive. Suggestion: IT person at the host institution would give a report to the MIDs. Because MSRI is getting their videos organized into a usable database, their IT manager might be able to work on this project for all of the Institutes. Maybe NSF could fund this project.

Action Item: Continue to support Association for Women in Mathematics. Contribution was supposed to be $550 per year per institution. Uncertain whether or not those contributions were made. Jill will look into this.

Report by Russel Caflisch (get it from him)

--Plan for possibly sharing diversity programs

--Infinite Possibilities Conference was held.

--Modern Math workshop in Dallas was attended by about 120 people, including undergraduates.

--Apply to NSF for support of mini-course. If that isn't funded, each institute would need to fund about $1,000 each for a scaled down version.

Brief report on Joint Postdoctoral Initiative: All of the postdocs at IAS funded by this Initiative found jobs. Several of the Institutes had 2-year postdocs so they haven't been job hunting yet. Brief AIM had a workshop last summer that was very successful. The information is on the AIM website. MSRI also had a one-day workshop that was successful.

Recommendation for JMN theme and for organizers: This is a joint meeting with 15 math institutes.

Action item: The recommendation is Math That Connects the Dots or Mathematical Connections or Mathematical Networking. Russel Caflisch and Fadil Santosa volunteered to help organize the event. Bob Bryant and Marty Golubitsky will send the email with this information to the other math institutes.

Metrics

In collecting data for NSF, what is the definition of 'participant'? IMA has 'supported participants' and 'speaking participants'. The point of the question is how many people are benefiting from a workshop.

The directors want to continue to use surveys of participants/members in order to discern the effect of being at an institute on the participants' careers. It is difficult to standardize questions for participant/members a few years after they have left the institute. Perhaps look at one another's questionnaires. Action item: The questionnaires can be put on the MID password protected part of the website (MathInstitutes.org.) Write a brief description of what is done to collect the questionnaire.

Suggestions for ways to assess:

--Number of new collaborations that can be attributed to the institutes

--Number of new papers

--Has the research become interdisciplinary (use DMS definition of 'interdisciplinary')

--Number of hits on webpage (it was decided this is not useful)

--Number of hits on videos and length of time the videos were viewed

MSRI is tracking undergraduates who take their 6-week courses to see who continues on to grad school and who gets a Ph.D. They appoint a mentor for each class and that person is responsible for writing

an annual report.

There was a discussion of efforts to increase diversity.

NEXT YEAR – The MIDs will meet at AIM in Palo Alto.


II. Special Reports: Program Plan

A. Programs for 2010-2011

1. Analysis of Object Data

2. Complex Networks

INITIAL GOALS and OUTCOMES

STATEMENT for the Working Group

GEOMETRIC CORRESPONDENCE

Stephan Huckemann and Ross Whitaker (Administrators)

December 2010

Contents

1 Initial Goals

2 Series of Talks

3 Specific Research Projects for Current Pursuit

3.1 Generative Models for Registering Ensembles of Functions and Images

3.2 Minimax Correspondence for Shape Classification

3.3 Measuring Landmark Importance for Planar Shapes

3.4 Modeling Locus and Shape Variation by Partial Rotations

1 Initial Goals

In our first meetings we identified the following key goals

• alignment of functions
  – with and without knots
  – “vertical alignment”
  – of an ensemble of functions

• alignment of shapes
  – landmarks
    ∗ known labels but positions are free
      · pseudo landmarks
      · optimizing information theoretic content
    ∗ unknown labels but possibly fixed positions (e.g. protein matching/folding)
  – mismatching configurations
    ∗ robustness
    ∗ including occlusions
    ∗ topological differences
  – deterministic or not

• statistical issues
  – formulation
  – bias
  – overfitting
    ∗ tradeoff between deformation and variability
      · where is the info: deformation vs. residual
      · marginalizing over the deformation
  – automation based on statistical criteria
  – learning and classification by maximizing discriminality
  – hypothesis testing, regression, fair “p”-values

• applications to
  – protein matching/folding
    ∗ deformation is constrained by chemistry
    ∗ matching of “blobs” between structures
    ∗ 3D
  – gels
    ∗ mass and charge (2D)
    ∗ comparison between organisms
    ∗ register blobs with deformation and mismatch
    ∗ E. paper by Mardia
  – ensembles of surfaces of smooth biological shapes
    ∗ bones, brains, faces, etc.
    ∗ collections of hippocampi
  – alignment of functions for growth curves
  – wings/leaves
    ∗ branching structures
    ∗ discontinuities of shape
    ∗ related to skeletal modeling
      · graphs/trees
      · geometric trees
  – projective shapes/cameras
    ∗ allowing for projective transformations
    ∗ inheriting properties of the projective shape spaces


2 Series of Talks

As decided on in the first meetings, for the first period of approx. two months some group members gave kick off talks from their individual expertise, touching the above key goals. These talks were followed by extended discussions.

• Automated correspondences (Ross Whitaker),

• functional alignment (Daniel Gervini),

• curve matching landmarks (Leif Ellingson and Victor Patrangenaru),

• alignment within Gaussian processes (John Moriarty) and

• correspondence using random fields (Ian Dryden).

3 Specific Research Projects for Current Pursuit

In consequence of these talks and discussions, certain collaborative work materialized in specific research projects.

3.1 Generative Models for Registering Ensembles of Functions and Images

Suppose we have a set of functions $f_i$ of which we have sampled values $f_{ij}$. Underlying are unwarped signals $\tilde{f}_i = f_i \circ T_i^{-1}$ with a warping $T_i$. We want to estimate the $T_i$ from the data by setting up a generative process that, in the first instance, we consider Gaussian,

$$P(T, \tilde{f}) = G(\Sigma_{\tilde{f}}, \mu_{\tilde{f}})\, G(\Sigma_T, \mu_T),$$

which suggests an EM approach to identify hidden parameters. E.g., such an approach for images warps images not to each other but rather to a mean. Currently active researchers are Daniel Gervini and Ross Whitaker.
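The idea of warping each function to a mean rather than to the other functions can be illustrated with a deliberately simplified sketch. It is not the working group's model: the warps are assumed to be pure shifts T(t) = t + s, the Gaussian priors are dropped, and the EM iteration is replaced by a hard-assignment alternation between a mean template and the best shift per function. The names `unwarp`, `align_to_mean`, `offsets` and `candidates` are hypothetical.

```python
import numpy as np

def unwarp(f_vals, t, shift):
    """Evaluate f∘T^{-1} on the grid t for the shift warp T(t) = t + shift."""
    # (f ∘ T^{-1})(t) = f(t - shift); np.interp holds the end values outside the grid.
    return np.interp(t - shift, t, f_vals)

def align_to_mean(F, t, candidates, n_iter=5):
    """Alternate between a mean template and the best shift for each function."""
    shifts = np.zeros(len(F))
    for _ in range(n_iter):
        # Template step: average the currently unwarped functions.
        template = np.mean([unwarp(f, t, s) for f, s in zip(F, shifts)], axis=0)
        # Warp step: each function is matched to the template, not to the others.
        for i, f in enumerate(F):
            errs = [np.sum((unwarp(f, t, s) - template) ** 2) for s in candidates]
            shifts[i] = candidates[int(np.argmin(errs))]
    aligned = np.array([unwarp(f, t, s) for f, s in zip(F, shifts)])
    return shifts, aligned

# Toy ensemble (assumed for illustration): one bump displaced by three offsets.
t = np.linspace(0.0, 1.0, 201)
offsets = np.array([-0.1, 0.0, 0.1])
F = np.array([np.exp(-((t - 0.5 - a) / 0.08) ** 2) for a in offsets])
candidates = np.linspace(-0.2, 0.2, 41)
shifts, aligned = align_to_mean(F, t, candidates)
```

In this toy setting the estimated shifts undo the displacements, so the pointwise variance across the aligned ensemble collapses. A full treatment along the lines of the text would put Gaussian models on both the warps and the unwarped signals and average over the warps instead of picking a single best one.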

3.2 Minimax Correspondence for Shape Classification

Many applications deal with a specific classification problem. Here we present a generic method of addressing this issue; a very specific method is presented in Project 3.3 below. E.g., in forestry applications, assessing the variation of leaf shape over genotype, temperature gradient and leaf location in the tree (e.g. crown or breast height) is of great interest. Previous research using landmark based shape spaces has shown that for the power of tests the “correct” number and placement of landmarks is of great importance. In this project, for a given classification problem, an automated landmarking scheme is to be developed that places landmarks such that the intra-group variation is minimal while simultaneously the variation over the groups is maximal. This method is to be based on purely statistical features, building on Cates et al. (2006). Another potential application of this method lies in the early diagnosis of degenerative brain processes such as Alzheimer’s disease. Research in this project is currently at a very conceptual level, both working out a stringent mathematical formulation and making feasible a numerical approach. Currently active researchers are Stephan Huckemann, Ross Whitaker, Martin Styner.

3.3 Measuring Landmark Importance for Planar Shapes

Planar landmark based Kendall shape space is a complex projective space. A space with fewer landmarks is a lower dimensional complex projective space that can be embedded in various ways. From a given data sample with k landmarks, in previously unpublished research, a method has been developed to compute the best fitting abstract shape space with k − 1 landmarks and in this the best abstract shape space with k − 2 landmarks, etc. The entire procedure until the shape space of triangles is reached, or even further, up to a single geodesic and a single point, can be called a

nested backward shape space principal component analysis

(cf. Jung et al. (2010a)). One way of assessing landmark importance is to compare the total intrinsic variances in these abstract spaces with the total intrinsic variances of concrete spaces obtained from leaving out one or more specific landmarks. Algorithmic work has begun, and some applications to Ian Dryden’s digits 3 dataset (Dryden and Mardia (1998)) have been performed. Further applications are planned to leaf data (cf. Huckemann et al. (2010)) in order to automatically identify a minimal number of landmark placements for specific discrimination problems arising in forestry. Currently active researchers: Stephan Huckemann, Sungkyu Jung and Steve Marron.
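As background for the statement that planar Kendall shape space is a complex projective space, the following minimal sketch represents k planar landmarks as a complex vector, removes translation and scale, and measures the distance between two shapes in a way that ignores rotation. It illustrates the standard construction only and not the nested backward procedure described above; the function names and the example triangles are assumptions made for illustration.

```python
import numpy as np

def preshape(landmarks):
    """Map k planar landmarks (complex array) to a centred, unit-norm pre-shape."""
    z = np.asarray(landmarks, dtype=complex)
    z = z - z.mean()               # remove translation
    return z / np.linalg.norm(z)   # remove scale

def shape_distance(a, b):
    """Geodesic (Procrustes) distance between the shapes of a and b."""
    za, zb = preshape(a), preshape(b)
    # |<za, zb>| is unchanged when either pre-shape is multiplied by a unit
    # complex number, i.e. it is invariant under planar rotation.
    c = abs(np.vdot(za, zb))
    return float(np.arccos(min(c, 1.0)))

triangle = np.array([0 + 0j, 1 + 0j, 0.5 + 1j])
moved = 2.5 * np.exp(1j * 0.7) * triangle + (3 - 2j)   # rotated, scaled, translated copy
other = np.array([0 + 0j, 1 + 0j, 0.5 + 0.2j])          # a flatter triangle

d_same = shape_distance(triangle, moved)   # numerically close to zero
d_diff = shape_distance(triangle, other)   # clearly positive
```

Dropping a landmark changes the length of the complex vector, which is why shape spaces with fewer landmarks appear as lower dimensional complex projective spaces.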

3.4 Modeling Locus and Shape Variation by Partial Rotations

A goal in computational anatomy is to keep track of rigid body motions as well as of deformations of internal organs, one application being radiation therapy of the male prostate. In a novel approach, we are estimating a few axes of rotation of parts of organs in order to nearly exhaustively model rigid body rotation and rotations of parts of the organ, e.g. bending and pinching. Preliminary research based on modifications of principal arc analysis for spoke data on medial skeletons (cf. Jung et al. (2010b)) has shown very promising potential. Estimating dominating axes of (internal) rotations may also have applications to protein folding. Currently active researchers: Stephan Huckemann, Sungkyu Jung, Steve Marron and Steve Pizer.


References

Cates, J., Fletcher, P., Whitaker, R., 2006. Entropy-based particle systems for shape correspondence. Int Conf Med Image Comput Comput Assist Interv 9(WS), 90–99.

Dryden, I. L., Mardia, K. V., 1998. Statistical Shape Analysis. Wiley, Chichester.

Huckemann, S., Hotz, T., Munk, A., 2010. Intrinsic MANOVA for Riemannian manifolds with an application to Kendall’s space of planar shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (4), 593–603.

Jung, S., Dryden, I., Marron, J., 2010a. Analysis of Principal Nested Spheres. Submitted to Biometrika.

Jung, S., Foskey, M., Marron, J. S., 2010b. Principal arc analysis on direct product manifolds. Submitted to the Annals of Applied Statistics.


Analysis of Object Data Working Group: Dynamics and Inference

Working Group Leader: Giles Hooker (Cornell University)

After the initial workshop and some fruitful examination of background research, the working group has identified a number of directions for research during this year:

- Populations of Dynamic Systems: while there has been significant recent interest in statistical inference for dynamical systems, there is a relatively small literature on collections of dynamic systems such as would occur for immune dynamics observed in different individuals or epidemiological dynamics in different cities. There is also very limited use of population-level data to assess Markovian assumptions about dynamical systems. At this level, dynamical systems (and their parameterizations) represent objects of study; our concerns here are to develop methods to incorporate covariate-dependent random effects within population models of dynamic systems, and to make use of multiple time-varying parameters to assess model fit.

- Signatures of Stochastic Features: a stochastic dynamic system requires the specification not just of a deterministic drift structure, but also a complete probabilistic description of the system evolution. It is common that this specification is simplified from a computationally-intractable model, or chosen from a class of tractable models in order to ‘noise-up’ a deterministic system. This portion of the project examines the use of spectral methods to determine the qualitative features of system stochasticity – are disturbances ‘rough’ or ‘smooth’? Can we distinguish these disturbances from measurement error? Will the change in the spectrum over time yield insight into potential inhomogeneous stochasticity? Given the computational cost and complexity of performing inference in dynamical systems, the ability to determine the type of stochastic evolution that is supported by the dynamics of the model will yield substantial benefits to the application of these models to real-world systems.

- Summary Selection: a common approach to inference in dynamical systems is to replace the observed data with summaries that describe qualitative dynamical features thought to be relevant. This is particularly well developed in the field of Approximate Bayesian Computation (ABC), but has also been suggested in other contexts. A focus of this group will be in the development of principled means of choosing summary statistics, based on loss-of-information from the original data, convexity of the likelihood surface, and robustness to model choices and assumptions.

Specific working group activities will include:

- The development of methodology for non-parametric mixed-effects models in differential equation models for systems described by a multivariate state vector with unmeasured variables in the presence of covariates.

- The development of spectral-based tests for instantaneous noise in system forcing and for changes in noise variance.

- Methodology to select summary statistics that retain maximal information while remaining robust to specific model terms.

Working group participants: Giles Hooker (Cornell University), David Degras (SAMSI), Yuefeng Wu (Cornell University), James Ramsay (McGill University), Darren Wilkinson (University of Newcastle), Damla Senturk (Pennsylvania State University), Jiming Ding (Washington University of St Louis), Sophia Olhede (University College London), Simon Preston (University of Nottingham), Sy-Miin Chow (University of North Carolina, Chapel Hill), Oliver Rattman (Duke University), Snehalata Huzurbazar (University of Wisconsin), Jaeyong Lee (Seoul National University), Jiguo Cao (Simon Fraser University), David Campbell (Simon Fraser University), Mark Girolami (University College London), Serge Guillas (Georgia Tech), Hulin Wu (University of Rochester), Debashis Paul (University of California, Davis)

Analysis of Object Data: Working Group on Data Analysis on Sample Spaces with a Manifold Stratification

Working Group Leaders: During the kick-off workshop, Vic Patrangenaru (FSU) realized that the objects studied by the speakers, whether statistically or from a deterministic viewpoint, all shared the property of being representable as points on metric spaces that admit a manifold stratification. Arguably, all object data may be represented as points on such spaces, so these spaces should play a key role in twenty-first-century statistics. These sample spaces include Hilbert manifolds (functional data), orbifolds (shapes of all sorts), and folded Euclidean spaces (phylogenetic trees). Ezra Miller (Duke) took the lead on problems of understanding the singular part of a stratified space and the related geometry. We started our broadcast meetings with the goal of understanding the asymptotic

the early meetings, the working group has identified some important issues for initial research:

. The issue here is a new behavior

singular point, that does not change under small perturbations within the family, as first suggested by Steve Marron (UNC). This property was initially understood by a group

point of a spider (the space of trees with 3 leaves happens to be a spider with 3 legs); a proof of the stickiness of the intrinsic sample means from a distribution on a spider, resulting from the first group meeting, was formalized by Thomas Hotz (IMS, Germany) in a paper coauthored at that meeting and posted on the group website. In a subsequent paper, Stephan Huckemann (IMS, Germany) started formalizing the group discussion on stickiness of intrinsic means at singular points of higher-dimensional tree spaces. Extrinsic sample means are asymptotically sticky to a singular extrinsic mean as well. As a consequence estimation of

large.

Nonprametr Gromov defined Cartan-Alexandrov-Toponogov spaces of curvature bounded from above by k, also known as CAT(k) spaces, by comparing geodesic triangles on such spaces with their counterparts on model Riemannian spaces of constant curvature k. In the case of folded Euclidean models for tree spaces, which are CAT(0)

a singular point is a phenomenon that might not become evident when the samples are small. Therefore other, slower, consistent estimators that are easier to compute might be useful in guessing the nature of the population mean (singular or not). In the case of CAT(0) spaces, one such easily computed estimator is the so-called Sturm estimator, which was brought to our group's attention by Sean Skewer (UNC). The asymptotics of the Sturm estimator are not known. Given that usually the coverage error can be improved by bootstrapping a pivotal statistic based on

from this bootstrap principle. Megan Owen (Berkeley) made progress in bootstrapping Sturm estimators.
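
As a rough illustration of why the Sturm estimator is easy to compute, the sketch below evaluates it on a spider, where geodesics are explicit. It assumes the usual inductive-mean form of the estimator: start at the first observation and, at step n, move a fraction 1/n along the geodesic toward the n-th observation. The point encoding `(leg, r)` and the function names are hypothetical choices made for this example, not notation used by the working group.

```python
from typing import List, Tuple

# A point on a spider: (leg index, distance r >= 0 from the center).
# r == 0 is the center, whatever the leg index says.
Point = Tuple[int, float]


def geodesic_point(p: Point, q: Point, t: float) -> Point:
    """Point at fraction t (0 <= t <= 1) along the geodesic from p to q."""
    (leg_p, r_p), (leg_q, r_q) = p, q
    if leg_p == leg_q:
        # Same leg: the geodesic is an ordinary line segment.
        return (leg_p, r_p + t * (r_q - r_p))
    # Different legs: the geodesic passes through the center.
    s = t * (r_p + r_q)            # distance travelled from p
    if s <= r_p:
        return (leg_p, r_p - s)    # still on p's leg
    return (leg_q, s - r_p)        # crossed the center onto q's leg


def sturm_estimate(sample: List[Point]) -> Point:
    """Inductive mean: step n moves 1/n of the way toward observation n."""
    est = sample[0]
    for n, x in enumerate(sample[1:], start=2):
        est = geodesic_point(est, x, 1.0 / n)
    return est
```

When all observations lie on one leg the recursion reproduces the ordinary running average, which is a convenient sanity check. Each update costs a single geodesic evaluation, so no optimization over the stratified space is needed.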

Projective Shape. Projective shape is widely accepted as the most appropriate notion of shape by computer scientists and others dealing with machine vision (Hartley and Zisserman (2004)). Two viewpoints have been advanced in our group for dealing with a statistical analysis of projective shape. One view, advanced by Vic Patrangenaru (FSU), is an extension of the projective frame approach (presented in Comm. Statist. Theory Methods, Ann. Statist. 2005, J. Multivariate Anal. 2008, 2010, 2011), and was suggested by R. N. Bhattacharya. Consider a k-ad in general position in the m-dimensional real projective space, with k > m+2. Let I = {i(1),…, i(m+2)} be a set of m+2 indices in {1,…, k}. Consider the set P(I)∑(m,k) of all k-ads p = (p(1),…,p(k)) such that (p(i(1)),…, p(i(m+2))) is a projective frame, which is known to be a manifold diffeomorphic to a product of (k-m-2) copies of the projective space in m dimensions. It has been shown by R. N. Bhattacharya (2010) that the full projective space P(0)∑(m,k) of k-ads containing a projective frame is a differentiable manifold with smooth transitions among the P(I)∑(m,k), whose union over all index sets I covers P(0)∑(m,k). The other approach, briefly presented in the opening workshop by John T. Kent (U. of Leeds) and detailed by him in our group meetings in the particular case of a 4-ad on the projective line, is based on a representation of projective shape in a 2006 paper at the Leeds Annual Statistics Workshop that builds on a result of Tyler (Ann. Statist. 1987a,b) on the robust estimation of a covariance matrix about the origin for vector data.

Extrinsic vs Intrinsic Data Analysis. Another issue discussed in our group is which type of data analysis on manifolds, and by extension on stratified spaces, is computationally faster. This topic was discussed in the case of data analysis on a hyperbolic plane, including data on impedances, by Stephan Huckemann and Thomas Hotz (IMS, Germany), Huiling Le (U. Nottingham, UK), Peter Kim (U. of Guelph, Canada) and Vic Patrangenaru (FSU), who started a collaboration on this subject. Another example, discussed in the opening workshop by A. Schwartzman (Harvard) and Hongtu Zhu (UNC), namely the extrinsic or intrinsic analysis of Diffusion Tensor Imaging data, is also on our group agenda.

The goals and expected outcomes include:

The ultimate goal of our working group is a nonparametric statistical data analysis for the twenty-first century: an extension of nonparametric multivariate and functional data analysis to data analysis on spaces with a manifold stratification. An expected outcome is an extension to nonparametric data analysis on manifolds. Historically this was made possible by G. Watson, David G. Kendall (UK), and Herbert Ziezold (Germany), and it reached its current stage due to work by Rudy Beran (UC Davis), Ted Chang (U. Virginia), Hendriks (Radboud U., Netherlands), Zinoviy Landsman (Haifa U., Israel), Frits Ruymgaart (TTU), Rabi N. Bhattacharya (U. Arizona), Steve Marron (UNC), Peter Kim (U. Guelph, Canada), Vic Patrangenaru (FSU), A. Bhattacharya (Duke), A. Munk and Stephan Huckemann (IMS, Goettingen).

- A data analysis of intrinsic means on folded Euclidean models for spaces of trees with p leaves

- The role of the nonparametric bootstrap in data analysis on stratified spaces

- Derivation of the structure of the full projective shape space, and of (preferably equivariant) embeddings of this space leading to a projective analysis agreed upon by members of our group

- Guidelines regarding a preference for an extrinsic or intrinsic computational procedure in object data analysis

- Collaboration between applied mathematicians, bio-statisticians, statisticians and computer scientists

Analysis of Object Data Working Group: Metrics in Shape Spaces

A. Personnel. The main people who have been active in the group include:

John Kent (administrator), Anuj Srivastava (administrator), Sungkyu Jung (local organizer), Hans Mueller, Huiling Le, Ian Dryden, Irina Kogan, Hamid Krim, Ross Whittaker, Megan Owen, Sebastian Kurtek, Stephan Huckemann, Victor Panaretos, Valentina Staneva, Christof Seiler

B. History of talks. We have had weekly talks from members of the group to make people aware of one another's work and to suggest directions for collaboration.

1. Anuj Srivastava. Shape Analysis of Elastic Curves in Euclidean Spaces

2. Hans Mueller. Curve warping

3. Sebastian Kurtek. Riemannian Framework for Functional Analysis

4. Hamid Krim. Squigraph modeling of shapes

5. Irina Kogan. Geometric transformations, invariance, and congruence

6. John Kent. Basis expansions for shape variation

7. Valentina Staneva. Diffeomorphisms and RKHSs

C. Future talks

1. Anuj Srivastava. His new ideas on FDA

2. Irina Kogan. Curve matching

3. John Kent. Topology and metrics in projective shape spaces

4. Victor Panaretos. Shapes of closed curves and DNA geometry

5. Stephan Huckemann. Parallel transport on shape spaces

6. Ian Dryden. Matching unlabelled points using random fields.

D. Goals of the workgroup

The overall goal is to explore the implications of the choice of metric in various shape spaces. A metric often induces a Riemannian structure, which in turn leads to geodesics and parallel transport. The reason for our interest in metrics is that different metrics can highlight different features of shape differences.

Some of the main topics to be explored, and a brief summary of the progress so far, include the following.

1. Better understanding of differential geometry on shape manifolds. Good progress, see below.

2. Interaction between modeling strategies for shape and for FDA. Good progress, see below.

3. Development of metrics for new shape spaces such as projective shape space. Good progress, see below.

4. Robustness (e.g. L2 vs L1). Not much explored yet.

5. Basis expansions of shape variation. Simplest ideas done. More on implications and extensions to be explored.

6. Notions of deformation, e.g. landmarks, curves, surfaces, solids; all of these can also include the ambient space. Basic understanding clear. No major new insights yet.

7. How sophisticated do models need to be? E.g. (a) local (tangent space) vs global treatment of deformations; and (b) landmark vs continuum models. We have explored the current knowledge; nothing new yet.

E. Achievements. Several papers are in progress or near completion.

1. Splines on manifolds (Anuj Srivastava, Eric Klassen, Ian Dryden and Huiling Le)

2. Application of shape models for curves to functional data analysis (Anuj Srivastava, Sebastian Kurtek, Eric Klassen and Hans Mueller)

3. The geometry and topology of projective shape spaces (John Kent, Thomas Hotz, Stephan Huckemann and Ezra Miller)

4. Matching curves (Irina Kogan and student)

COMPLEX NETWORKS

Goals and Outcomes, Fall, 2010

DYNAMICS OF NETWORKS WORKING GROUP

The “Dynamics OF Networks” working group seeks to develop new collaborations to explore models, methods, and implications of dynamic processes which change network topologies. This working group formed as a natural outgrowth of common interests at the Opening Workshop, has met weekly throughout Fall 2010, and plans to continue to meet during Spring 2011. The group includes significant overlap of interests with the other CN working groups, including but not limited to analysis of longitudinal data (overlapping with the Geometrical/Spectral group), the modeling of networks via agent based models (overlapping with Sampling/Inference), and the exploration of abstract voter models (as just one specific project overlapping with the Dynamics ON Networks group). Indeed, because of the significant overlap with the ON group, we plan to merge the weekly meeting of the two groups for the Spring semester. That is, the ongoing subgroup projects of both groups will continue to be represented and brought together in a single weekly meeting, freeing up time for those subgroups to meet informally.

Starting at the Opening Workshop (OW), there has been intense early interest in the working group in “bridging” the different network perspectives commonly attributed to communities in statistics, probability, and statistical physics. Much of this interest has focused on the phenomenon of “explosive percolation” (described at the OW in the presentation of Raissa D’Souza), namely the observation that in some constructive models the giant component emerges in a seemingly sharp transition, in contrast to classic models such as the Erdős-Rényi random graph model. Quantifying this notion mathematically, understanding the scaling window, and understanding how small components merge into larger components, eventually leading to the formation of the giant component, bring in very interesting connections to the famous Smoluchowski equations in colloidal chemistry and to Markov processes such as the multiplicative coalescent. One of the hopes of the working group is to develop the techniques that will enable us to prove the first mathematically rigorous results in this context. Our early efforts in this direction after the opening workshop included conversations in the group led by Steve Fienberg and Raissa D’Souza, among others. Our ongoing efforts aimed towards rigorous results are currently led by Shirshendu Chatterjee, Charlie Brummitt, Shankar Bhamidi, Raissa D’Souza, and Rick Durrett.
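
The contrast between the two kinds of growth can be explored numerically with a small union-find simulation. The sketch below is only an illustration under stated assumptions: it uses the product rule (draw two candidate edges and keep the one whose endpoint components have the smaller product of sizes) as a representative "constructive" model, and plain uniform edge addition as the Erdős-Rényi baseline. The names `UnionFind` and `largest_fraction` are hypothetical, and self-loops or repeated edges are simply tolerated.

```python
import random


class UnionFind:
    """Disjoint sets with component sizes and the current largest size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.largest = 1

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        self.largest = max(self.largest, self.size[ra])


def largest_fraction(n, n_edges, product_rule, seed=0):
    """Fraction of vertices in the largest component after n_edges additions."""
    rng = random.Random(seed)
    uf = UnionFind(n)

    def random_edge():
        return rng.randrange(n), rng.randrange(n)

    def weight(edge):
        a, b = edge
        return uf.size[uf.find(a)] * uf.size[uf.find(b)]

    for _ in range(n_edges):
        edge = random_edge()
        if product_rule:
            other = random_edge()
            if weight(other) < weight(edge):
                edge = other          # keep the edge joining smaller components
        uf.union(*edge)
    return uf.largest / n
```

Plotting `largest_fraction(n, int(t * n), rule)` against the edge density t for both rules is one way to see the delayed and much steeper growth of the largest component under the product rule; such plots say nothing rigorous about the order of the transition.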

A second model problem of focus in the working group is the analysis of “Dynamic Voter Models” [Holme & Newman, 2006], hybrid models that incorporate both a rewiring dynamic, so the network changes in time, and a voter interaction process which modifies the node/actor opinion to drive agreement between connected nodes in the network. When the selected rewiring dynamic is to drop connections between disagreeing actors, the two processes cooperate in pushing the network towards a final state of complete consensus within each component, but the distribution of properties of those components depends on the relative frequency of activity of the rewiring and interaction processes (codified by a parameter). Our primary goals include the demonstration of the existence of a critical value for this rewiring-frequency parameter, at which the model changes behavior between the two extremes of a purely voter model dynamic and a purely rewiring dynamic; the development of model moment-closure equations for approximating this process; and the rigorous analysis of the process (especially at and near criticality). Numerical simulations are being used to explore the behavior of this system, including the different behaviors observed at different rewiring frequencies, edge densities, and opinion initial conditions. We are using a pair-approximation approach to gain a heuristic understanding of the process and the location of the critical value. We are also using a rigorous large deviations approach to demonstrate the existence of the critical value. This work is led by David Sivakoff, Bill Shi, Rick Durrett, Alun Lloyd, Peter Mucha, and Mason Porter.
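
A minimal simulation of such a hybrid process might look as follows. This is a sketch under assumptions, not the group's code: at each step a disagreeing edge is chosen uniformly, one endpoint u is chosen to act, and with probability `alpha` (the hypothetical name used here for the rewiring-frequency parameter) u drops the edge and reconnects to a random vertex sharing its opinion; otherwise u adopts its neighbor's opinion. Multiple edges are allowed for simplicity, and the discordant edges are recomputed at every step, which is adequate only for small networks.

```python
import random


def adaptive_voter(n, edges, opinions, alpha, seed=0, max_steps=100_000):
    """Run the rewire-or-imitate dynamic until no discordant edge remains."""
    rng = random.Random(seed)
    edges = [list(e) for e in edges]
    opinions = list(opinions)
    for _ in range(max_steps):
        discordant = [i for i, (u, v) in enumerate(edges)
                      if opinions[u] != opinions[v]]
        if not discordant:
            break                          # consensus within every component
        i = rng.choice(discordant)
        u, v = edges[i]
        if rng.random() < 0.5:
            u, v = v, u                    # pick which endpoint acts
        if rng.random() < alpha:
            # Rewiring: u drops v and links to a like-minded vertex.
            same = [w for w in range(n)
                    if w != u and opinions[w] == opinions[u]]
            if same:
                edges[i] = [u, rng.choice(same)]
        else:
            # Voter step: u imitates v.
            opinions[u] = opinions[v]
    return edges, opinions
```

With `alpha = 1` the opinions never change and the network splits into like-minded pieces; with `alpha = 0` the graph is fixed and the ordinary voter model runs to consensus on each connected component. Sweeping `alpha` between these extremes and recording the sizes of the final components is the kind of experiment in which a critical value would show up.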

A third subgroup project is investigating the role of concurrent partners (overlapping in time) in the spread of sexually transmitted diseases. STD transmission through the dynamic contact network must obviously obey an arrow of time, requiring that the start time and duration of each edge be respected, and the resulting questions about the influence of concurrency in enhancing rapid spread are important for modeling such systems and for properly targeting behavioral and biological interventions. Multiple researchers have developed competing measures for the level of concurrency in a network, and a wide variety of approaches have been used to study disease dynamics as a function of concurrency, including but not limited to extensive use of simulations. One part of this working group’s activity in this area is in addressing the utility of different network representations thereof. For instance, the temporal information can be used to create a directed network of possible transmissibility (either exactly or nearly obeying all temporal rules). An alternative approach is to explore the allowed permutations in a fully ordered edge list (those which do not change the available transmission paths). These representations will be used to develop formal models of dynamic sexual networks, targeting bounds on the size of the largest or expected disease outbreak as a function of an appropriate global measure of concurrency. This work is led primarily by Bruce Rogers, Alun Lloyd, Jim Moody, and Peter Mucha.
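
The arrow-of-time constraint can be made concrete with a small reachability computation. The sketch below assumes each partnership is a tuple `(u, v, start, end)` and, as a deliberately worst-case simplification, that transmission happens instantly whenever an infected person is in an active partnership. Under those assumptions (and with hypothetical names throughout) it returns the earliest possible infection time of every person reachable from a seed, which gives an upper bound on the outbreak.

```python
import heapq


def earliest_infection_times(partnerships, seed, t0=0.0):
    """Earliest time each node can be infected along time-respecting paths."""
    adj = {}
    for u, v, start, end in partnerships:
        adj.setdefault(u, []).append((v, start, end))
        adj.setdefault(v, []).append((u, start, end))

    times = {seed: t0}
    heap = [(t0, seed)]
    while heap:
        t, u = heapq.heappop(heap)
        if t > times[u]:
            continue                      # stale heap entry
        for v, start, end in adj.get(u, []):
            if t <= end:                  # partnership not yet over
                tv = max(t, start)        # wait for it to begin if needed
                if tv < times.get(v, float("inf")):
                    times[v] = tv
                    heapq.heappush(heap, (tv, v))
    return times


# Serial partnerships: A-B ends before B-C begins.
serial = [("A", "B", 0, 1), ("B", "C", 2, 3)]
# Concurrent partnerships: B has both partners at the same time.
concurrent = [("A", "B", 0, 3), ("B", "C", 0, 3)]
```

In the serial example an infection seeded at A can reach C, but one seeded at C cannot reach A, because the A-B partnership has ended by the time B is infected. In the concurrent example every person can reach every other, which is the asymmetry that measures of concurrency try to capture.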

Additional prospective subgroup projects also remain under discussion. Such continued exploration will remain an important emphasis for the working group in Spring 2011.

The Dynamics OF Networks workshop will be held at SAMSI in January 2011. The workshop mixes a collection of invited speakers with presentations by members of the working group, as well as extensive time blocked out for interaction with the visiting speakers on the existing subgroup projects and the possible development of new collaborations.

Dynamics ON Networks: Goals and Outcomes

Group Leaders: Rick Durrett and Alun Lloyd, Postdoc: David Sivakoff

Active group members: Shankar Bhamidi (UNC), Charlie Brummitt (UC Davis), Shirshendu Chatterjee (Cornell visiting Duke), Ariel Cintron-Arias (East Tennessee State U), Luis Gordillo (U Puerto Rico), Christian Gromoll (Virginia), Peter Kramer (Rensselaer), Jim Lynch (South Carolina), John McSweeney (Concordia), Peter Mucha (UNC), Mason Porter (Oxford), Michael Robert (NC State), Puck Rombach (Oxford), Bill Shi (UNC), Mandy Traud (NC State), Tipan Verella (Virginia)

At this point a number of projects have emerged for investigation.

SIS models on networks with communities. During the Stochastic Dynamics program John McSweeney and Bruce Rogers considered the contact process (SIS epidemic) on a strongly clustered graph, described as a stochastic block model. In the simplest case there are two communities, with probability p1 of an edge connecting two individuals within the same community and probability p2 of an edge connecting two individuals in different communities, where p2 is much smaller than p1. In a range of parameters, a plot of the number of infected sites versus time shows a stair step: the epidemic reaches equilibrium in one community before spreading to the other. Work on this problem is continuing in the Dynamics ON Networks working group with David Sivakoff and Rick Durrett. A proof has been sketched and work is proceeding on a paper.

Nonlinear voter models. The ordinary voter model with a linear response function is now fairly well understood on complex networks due to its duality with coalescing random walk. Redner et al. have studied various nonlinear versions. Chatterjee and Durrett will try to use ideas of Cox, Durrett, and Perkins to show that the mean field predictions for the nonlinear voter model are accurate when the nonlinear response function is almost linear. They think that this method can be used to cover the latent voter model, where voters who have just changed their opinions enter an inert state for an exponential amount of time with mean 1/λ, if the rate λ is large. Although this is a small perturbation of the voter model, the behavior changes discontinuously: the density converges to 1/2 starting from any positive level.

In Axelrod’s model, voters have a vector of F opinions, each of which can take one of q values. Each voter x at rate 1 picks a neighbor y and one of his opinions i. If the two agree on issue i then x picks a j and imitates y’s opinion on that issue. When q = F = 2 this reduces to the model with leftists (00), centrists (01, 10) and rightists (11). Lanchier has proved some results in one dimension about the clustering in the case q = F = 2 and freezing when q is larger than F. Durrett and J.C. Li will work on generalizing the second result to two dimensions, starting with the case when both q and F are large.

Pair approximation. Mandy Traud and Michael Robert are coding and testing the full pairwise approximation model for SIS epidemics using the equations exhibited in Keeling and Eames, Modeling Dynamic and Network Heterogeneities in the Spread of Sexually Transmitted Diseases. They plan to test actual network data with these equations, code a few different closure approximations, and test different selections of parameters.
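
For orientation, a pairwise SIS model with one common closure can be coded in a few lines. The sketch below is not taken from the cited paper: it assumes the simplest textbook setting of a network in which every node has n neighbors, tracks the counts [S], [I], [SS], [SI], [II] (pairs counted in both directions), and closes the triples with [ASB] ≈ ((n-1)/n) [AS][SB]/[S]. The parameter names `tau` (per-edge transmission rate) and `gamma` (recovery rate) and both function names are hypothetical.

```python
def pairwise_sis_rhs(y, tau, gamma, n):
    """Right-hand side of the closed pairwise SIS equations."""
    S, I, SS, SI, II = y
    k = (n - 1) / n
    # Closure for triples centred on a susceptible node.
    SSI = k * SS * SI / S if S > 0 else 0.0
    ISI = k * SI * SI / S if S > 0 else 0.0
    return [
        gamma * I - tau * SI,                          # d[S]/dt
        tau * SI - gamma * I,                          # d[I]/dt
        2 * gamma * SI - 2 * tau * SSI,                # d[SS]/dt
        gamma * (II - SI) + tau * (SSI - ISI - SI),    # d[SI]/dt
        -2 * gamma * II + 2 * tau * (ISI + SI),        # d[II]/dt
    ]


def integrate_pairwise_sis(N, n, I0, tau, gamma, t_end, dt=0.01):
    """Integrate with classical RK4 from a randomly mixed initial state."""
    S0 = N - I0
    y = [S0, I0, n * S0 * S0 / N, n * S0 * I0 / N, n * I0 * I0 / N]

    def shifted(state, slope, h):
        return [a + h * b for a, b in zip(state, slope)]

    for _ in range(round(t_end / dt)):
        k1 = pairwise_sis_rhs(y, tau, gamma, n)
        k2 = pairwise_sis_rhs(shifted(y, k1, dt / 2), tau, gamma, n)
        k3 = pairwise_sis_rhs(shifted(y, k2, dt / 2), tau, gamma, n)
        k4 = pairwise_sis_rhs(shifted(y, k3, dt), tau, gamma, n)
        y = [a + dt / 6 * (b + 2 * c + 2 * d + e)
             for a, b, c, d, e in zip(y, k1, k2, k3, k4)]
    return y  # [S, I, SS, SI, II] at time t_end
```

Two invariants are useful when testing any implementation of this kind: [S] + [I] stays equal to N, and [SS] + 2[SI] + [II] stays equal to nN. Swapping in a different closure only requires changing the two lines that define the triples.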

John McSweeney, Charlie Brummitt

There are four types of projects we are interested in:

Explaining Numerics. Two recent papers discuss approaches for analyzing processes on graphs that are a priori non-rigorous, but seem to give excellent results in certain circumstances. These are (i) the “unreasonable effectiveness” of tree-based theory to predict behavior on clustered networks (Melnik et al.) and (ii) “How accurate is mean field theory?” (Gleeson et al.). In the first case, they find that the mean inter-vertex distance is a good predictor of whether the tree-based theory will be valid, despite the presence of clustering. In the second case they conclude that mean-field theory is accurate when the mean of the size-biased degree distribution is large. We’d like to find out the theoretical reasoning behind this, in addition to finding out more about which statistics are predictive of the applicability of these techniques. Work on the contact process has shown that the mean field critical value is accurate when the number of neighbors is large, so perhaps a result can be proved in this direction for some systems.

Clustering. The clustering coefficient C of the graph, the ratio of triangles to triples in the graph, has been shown to be an important factor in determining properties of processes, for example the percolation threshold. However, depending on how the graph is built, this may increase or decrease the threshold; see the conflicting results of Newman (2009) and Gleeson (2009), for example. The latter paper identifies degree correlations as an important factor to consider as well (beyond the obviously important factor of mean degree). We would like to identify a type of “clustering” that is not captured by the usual definition of C. There are three main themes as far as these issues are concerned:

• How to insert clustering (or, more generally, “non-locally-tree-like-ness”)?

• How to summarize it (i.e., find low-dimensional statistics that are useful for predicting process behavior)?

• How to analyze it (i.e., come up with analytical formulas that include the quantity you want, and the right statistic of the graph)?

Another goal is to determine the appropriate “null models” for clustering. Recent papers have reached different conclusions regarding the effect of clustering on things like the bond percolation threshold. How do we isolate the effect of clustering from other effects (such as heterogeneity of degrees and degree-degree correlations)? There may be multiple useful null models, each with its own advantages and drawbacks.
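
For reference, the usual global clustering coefficient can be computed directly from an edge list. The sketch below assumes the standard convention C = 3 × (number of triangles) / (number of connected triples), so that a complete graph has C = 1; the function name is hypothetical and the input is taken to be an undirected simple graph.

```python
from itertools import combinations


def clustering_coefficient(edges):
    """Global clustering coefficient (transitivity) of an undirected graph."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue                      # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    triples = 0   # paths a - v - b centred at v
    closed = 0    # triples whose end points are also adjacent
    for v, nbrs in adj.items():
        for a, b in combinations(nbrs, 2):
            triples += 1
            if b in adj[a]:
                closed += 1
    # Each triangle is counted once per centre vertex, i.e. three times,
    # so closed / triples equals 3 * triangles / triples.
    return closed / triples if triples else 0.0
```

Two graphs with the same value of C can still differ in how their triangles overlap or in their degree correlations, which is one reason a single number may not capture the kind of clustering that matters for a given process.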

Thresholds for the Watts’ cascade process. In this model for the spread of a fad or new technology in a network, there are two transitions for the cascade threshold as edges are added: the first is around when the graph becomes sufficiently connected to even have a giant component, obviously a necessary condition for existence of a giant cascade. The second transition occurs when the vertex degrees get large enough so that having a single infected neighbor vertex is not often a high enough proportion to transmit excitation. Simulations in Gleeson and Cahalane indicate that when the degree distribution is Poisson this transition is discontinuous, and the cluster size distribution decays exponentially at the critical value in contrast to the usual power law behavior. We’d like to prove these conclusions rigorously and investigate their generality.

We would also like to investigate the issue of cascades on clustered networks, recently tackled by Ikeda, Hasegawa, and Nemoto (2010). Their preliminary conclusion is that clustering decreases the cascade threshold, in the following sense. Let ϕ be the “response function” for each vertex. Then they find an interval I = [ϕ1, ϕ2] such that for ϕ ∈ I, cascades happen with low but nonzero probability p on a clustered network, and are impossible on a similar but rewired unclustered network. We would like to find out what is going on in this middle region: is the behavior sensitive to seed size, as in Gleeson and Cahalane (2007)? Does p increase predictably as ϕ increases through I? Is there a way to analytically identify ϕ1 and ϕ2? More importantly, is clustering really the fundamental property dictating the changing threshold, or does it just happen to be correlated in that way for their specific model?
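
The threshold rule behind these cascades is simple to state in code. The sketch below assumes the common form of the model in which every vertex shares one threshold `phi` and becomes active, permanently, once the fraction of its neighbors that are active is at least `phi`; the function name and the dictionary-of-sets graph format are hypothetical choices for the example.

```python
def watts_cascade(adj, seeds, phi):
    """Final active set of a threshold cascade started from `seeds`.

    adj maps each vertex to the set of its neighbors (undirected graph).
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, nbrs in adj.items():
            if v in active or not nbrs:
                continue
            frac = sum(1 for u in nbrs if u in active) / len(nbrs)
            if frac >= phi:
                active.add(v)
                changed = True
    return active


# A path 0-1-2-3-4 and a star with centre 0 and leaves 1..4.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
```

On the star, a single active leaf activates the centre only if `phi` is at most 1/4, which is a toy version of the second transition described above: as degrees grow, one active neighbor is no longer a large enough fraction. Because activation is monotone, the final set does not depend on the order in which vertices are scanned.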

Other kinds of percolation. We are also thinking about other kinds of percolation. In what we are beginning to call 1/k-percolation, an edge from u to v percolates with probability inversely proportional to the degree of v. This arises in the sandpile model (a model of load on a system), and it loosely captures the idea that more important nodes have more edges and are less likely to join the cascade. This was studied five years ago [?], and recently a couple of members of this group studied it on modular networks using multi-type branching processes [?]. We are interested in extending this to non-tree-like networks with triangles or other subgraphs added, and perhaps in developing computational methods to predict cascade sizes for such networks.

Shankar Bhamidi has a number of ideas that are relevant to this group and to Dynamics OF Networks:

1. Flow processes through the network: Suppose the aim of the network is to transport flow between different parts of the network. We are interested in developing mathematical models and techniques to understand the build-up of congestion over the edges in the network in the large network limit, as well as understanding properties of the optimal path between different parts of the network.

2. Degree distribution of shortest path trees: One problem of great interest for practitioners is understanding properties of optimal path trees in networks with congestion across edges. Using techniques from first passage percolation and stochastic process theory we shall try to deliver asymptotic results for a number of different random graph models.

3. Networks evolving over time: In a number of models, networks evolve over time with the addition of new vertices arriving into the system and new edges being created between different parts of the network. One of the most popular models of such networks is the preferential attachment model, which has been used in a diverse range of fields, from statistical physics to modeling the internet and from linguistics to systems biology. Most authors have focused their attention only on the degree distribution. What we aim to do, however, is to develop a general set of techniques which give asymptotic information about a wide range of functionals, from local neighborhoods to global functionals such as the spectral distribution.

4. Boolean networks: Boolean networks are models inspired by biological applications. Only in the last few years has rigorous work been done in trying to prove the many conjectures proposed by experimentalists over the years. Our aim (along with Shirshendu Chatterjee) is to derive rigorous results when the underlying directed graph is inhomogeneous.
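
To make the growth mechanism of item 3 concrete, here is a minimal sketch of one common variant of preferential attachment. It assumes each arriving vertex attaches to m distinct existing vertices chosen with probability proportional to their current degree, starting from a complete graph on m+1 vertices; other variants differ in these details, and the function name is hypothetical.

```python
import random


def preferential_attachment(n, m, seed=0):
    """Edge list of an n-vertex graph grown by degree-proportional attachment."""
    rng = random.Random(seed)
    # Start from a complete graph on m + 1 vertices.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Every vertex appears in `endpoints` once per incident edge, so a
    # uniform draw from this list is a degree-proportional draw.
    endpoints = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in sorted(targets):
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges
```

The returned edge list can be fed to any routine that computes a functional of interest, for example the degree sequence or the spectrum of the adjacency matrix, to compare simulations with asymptotic predictions.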

The Dual of a Random Intersection Graph. Christian Gromoll and Tipan Verella are interested in studying the emergence of a giant component in the dual of a generalized random intersection graph. Conceptually, given a set of nodes and a set of attributes, a random intersection graph is a graph on the nodes induced by constructing a random bipartite graph between the nodes and the attributes. Interest in random intersection graphs has increased in recent years. However, very little attention has been devoted to studying the dual graph naturally constructed by the same process that produces the random intersection graph.

One of the interesting features of the dual of a random intersection graph is that it has naturally occurring triangles and other complete subgraphs of various sizes. One can however show that the two random graphs do not have the same distribution. We would like to study the emergence of a giant component in the dual of the random intersection graph. One of the difficulties of analyzing both generalized random intersection graphs and their duals comes from the fact that edge occurrences are dependent and not identically distributed, unlike in the Erdős-Rényi random graph. We will follow the approach in Chapter 10 of Alon and Spencer’s The Probabilistic Method, in which one defines a process Yt keeping track of the boundary of an exploration of the nodes of a given component of the graph.

Complex Networks: Geometry/Spectral Analysis Working Group

Group leaders: Mauro Maggioni (Duke), Michael Mahoney (Stanford)

Participants: Prakash Balachandran (Duke), Pietro Poggi Corradini (Kansas State), Mihail Cucuringu (Princeton), Raymond Falk, Mauro Maggioni (Duke), Michael Mahoney (Stanford), Peter Mucha (UNC Chapel Hill), Mason Porter (Oxford), Ali Shojaie (U. Wisconsin Madison), Blair Sullivan (Oak Ridge Lab).

The group has been studying a number of topics in spectral and multiscale techniques for graphs, and their geometric and algorithmic implications, with separate overviews/tutorials/open problems by Mason, Mauro, and Mike, as well as tree decomposition methods and their algorithmic applications (Blair).

It is now focusing on several problems that emerged as interesting to subgroups of people:

1. Multiscale conductance and time-series Networks: Kash, Mason, Mauro, Mike, Peter.

Inspired by recent work by Mason and Peter on multislice networks, and by Mike on

multiscale conductance, we would like to understand the behavior of multiscale methods (e.g.

conductance, community detection, diffusions) on multislice networks.

2. Soft clustering: Alan, Mason, Mike, Ray. Most existing community-detection algorithms find

hard partitions in the network, while in some situations it seems more natural to construct soft-

partitions. Few methods do that, but it is a new topic and little is well-understood. It would

also be interesting to apply new ideas to the case of multislice networks.

3. Tree decompositions. Alan, Ali, Blair, Mike. These decompositions simplify graphs, leading

to faster optimization/solution of problems which are computationally hard. They are related

to important problems in statistics (especially in graphical models) but such ties are not well-

understood. Moreover, these decomposition algorithms are right now very combinatorial in

nature, and one may wonder if they would become easier/more stable by weaking some of the

requirements, while still being useful.

4. Community Detection: Mihail, Mike, Mauro, Kash. Spectral techniques based on intermediate

eigenvectors of the Laplacian and/or modularity may be used to detect communities. Mike’s

paper provides an overview of the state of the art in the field, and some of the techniques could be used

on data sets that Mason can provide. Specific goals are still to be defined.

5. Scan Statistics/sparse regression: Ali, Mauro, Alan, Ray, Mike. Scan statistics on graphs are

not well understood and only a few techniques exist, although many applications seem to need

such tools, including epidemics (early detection) and genomics (identification of pathways

relevant in a disease). Ali and Mauro are working on a multiscale framework, with

applications to genetic data. Mason and Peter may provide charity network data where again

scan statistics may be a useful tool.

6. Epidemic/Reactive Networks: Pietro, Mauro. Pietro has collected and is processing data from

questionnaires that allow mapping social interactions and behavior in a community, both

under normal conditions and in the case of the emergence of an epidemic. Many questions

arise: would the network change enough to prevent the epidemic? How would the network

change? What would be efficacious ways of modifying the network to prevent the spread?
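Item 1 above refers to conductance as one of the quantities studied at multiple scales. As a minimal sketch, assuming the standard single-scale definition (edges cut divided by the smaller of the two side volumes) and an unweighted, undirected graph, it can be computed as follows. The function name `conductance` and the toy graph are illustrative only and are not taken from the working group's code.

```python
def conductance(adj, subset):
    """Conductance of `subset` in an undirected graph {node: set_of_neighbors}.

    Assumed definition: cut(S, complement) / min(vol(S), vol(complement)),
    where vol is the sum of degrees on that side.
    """
    subset = set(subset)
    # Edges with exactly one endpoint inside the subset.
    cut = sum(1 for u in subset for v in adj[u] if v not in subset)
    vol_in = sum(len(adj[u]) for u in subset)
    vol_out = sum(len(adj[u]) for u in adj if u not in subset)
    denom = min(vol_in, vol_out)
    if denom == 0:
        raise ValueError("conductance is undefined when one side has zero volume")
    return cut / denom


# Hypothetical toy graph: two triangles joined by the single edge 2-3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
phi_triangle = conductance(adj, {0, 1, 2})   # one cut edge, volume 7 on each side
phi_single = conductance(adj, {0})           # both edges of node 0 are cut
```

A low value, as for the triangle, indicates a well-separated group of nodes; a multiscale or multislice variant would repeat a computation of this kind across scales or slices.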

SAMSI Program on Complex Networks

Goals and Outcomes, Fall, 2010

WORKING GROUP ON SAMPLING, MODELING, & INFERENCE

The “Sampling/Modeling/Inference” working group is focused on topics that arguably involve some or all of “sampling,” “modeling,” and “inference” of networks. Much of our focus has been restricted to static networks, but some projects have nontrivial overlap with interests in the “Dynamics OF” working group as well.

Members of the working group are active on six group projects. Additionally, it appears to the organizers that many participants are doing individual research that is informed by our weekly discussion.

Project 1: Inference on Agent-Based Models for Networks

Agent-based models (ABMs) specify local interaction rules between individuals and implement these rules through computer simulation. Agents are placed together in a simulated environment, and one can then observe repeated realizations of the evolution of social networks among the agents. But such applications need methods for statistical inference that relate the parameters (agent rules) to the network behavior. Important open problems include how to estimate parameters given a time series of the (simulated) observables and how to assess the goodness-of-fit of the model from the given data. These challenges are difficult because each agent can have its own set of parameters, so the dimension of the parameter space grows geometrically in the number of agents. To perform dimension reduction in parameter space and make inference possible, David Banks, Bruce Rogers and Cosma Shalizi are building statistical emulators for agent-based models using a Bayesian framework for the calibration of computer models. As a proof of concept we are using the "Sugarscape" agent-based model from Epstein and Axtell's book Growing Artificial Societies.

Project 2: Fit Assessment for Attachment Models

Much of the recent literature on real-world networks concerns mathematical models which relate their characteristics, including degree distributions, small-worldness, and clustering coefficients, to parameters in various preferential attachment models. At the simplest level, new nodes enter the system and decide to link themselves to nodes in the pre-existing network with probability proportional to some functional of the vertices in the network, for example their present degree. Models based on such schemes arise in a number of areas, ranging from economic systems wherein nodes try to optimize two competing functionals to evolutionary biology (Yule processes). Often one looks at the degree distribution of the real network at hand and matches it to an appropriate attachment model and then uses various properties of the model to infer properties about the network (e.g., epidemic durations). However, there is no mathematical theory adequate to judge such fits. Shankar Bhamidi, Shirshendu Chatterjee and Eric Kolaczyk are exploring this area in order to provide the necessary inferential theory.
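The simplest attachment scheme described above, where each new node links to one existing node chosen with probability proportional to its present degree, can be sketched as follows. This is an illustrative simulation under assumed choices (one edge per new node, a two-node seed graph, hypothetical function names); it is not the project's model or its fit-assessment method.

```python
import random
from collections import Counter


def preferential_attachment(n_nodes, seed=0):
    """Grow a tree: each new node attaches to one existing node,
    chosen with probability proportional to that node's degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]
    # Each node appears in `endpoints` once per incident edge, so a uniform
    # draw from this list is a degree-proportional draw over nodes.
    endpoints = [0, 1]
    for new in range(2, n_nodes):
        target = rng.choice(endpoints)
        edges.append((new, target))
        endpoints.extend((new, target))
    return edges


def degree_counts(edges):
    """Map each degree value to the number of nodes having that degree."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())


edges = preferential_attachment(2000)
counts = degree_counts(edges)
for d in sorted(counts)[:5]:
    print(d, counts[d])
```

The empirical degree counts produced this way are the kind of summary that is typically matched against a real network; judging the quality of such a match is the open question the project addresses.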

Project 3: Baboon Grooming Data

A consortium of primatologists has collected data on social interactions in twelve baboon troops in Kenya. A subgroup consisting of David Banks, Yingbo Li, Bruce Rogers and Ali Shojaie is attempting to fit a dynamic latent space model to grooming relationships over time, using covariate information on kinship, relatedness, age, health, gender, reproductive status, dominance hierarchy, and rainfall. The key aim is to predict the observed fission in the baboon troop, which entails a block model for the proximity matrix in the latent space.

Project 4: Effective Sample Size in Network Modeling

Stochastic models for network structure and processes have applications spanning the social, informational, and biological sciences. However, complex networks have complex structure, and when sophisticated models reflect this in their own dependence structure, ordinary sampling and asymptotic results used to quantify uncertainty often do not apply. Indeed, the only unambiguously independent sample for many network processes is repeated independent observations of realizations of that network over the same set of actors --- something that is, with a few exceptions, not possible. And yet, observing more actors in a network clearly provides more information about the network's structure than observing fewer. Pavel Krivitsky, Sean Simpson, Crystal Linkletter, and Eric Kolaczyk are doing work that seeks to establish notions of effective sample size for network models, taking into account the dependence structure and parametrization of these models and the process by which the network is observed.

Project 5: Estimation of the Degree Distributions

The degree distribution of a sampled network can differ from the degree distribution of the underlying population network, as pointed out, for example, in “Subnets of Scale-Free Networks are Not Scale-Free: Sampling Properties of Networks” by Stumpf, Wiuf, and May (PNAS, 2005). Work by Frank has shown how to obtain unbiased estimates of the number of nodes of any degree given a simple random sample of nodes, and normalizing that distribution yields an estimate of the degree distribution. Bruce Spencer and Eric Kolaczyk are working to extend those results in two directions. First, the mean and variance of the degree distribution can be calculated from the triad census, and hence they can be estimated from an induced graph sample. Work is underway to improve Frank's estimators by adjusting them to estimates of the mean and variance based on the sample triads. Second, work is underway to extend the results from samples where nodes are included with equal probabilities to samples based on unequal probability sampling.
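The statement that the mean and variance of the degree distribution follow from the triad census can be checked on a fully observed graph. The sketch below assumes an undirected graph with N nodes and uses two counting identities: every edge lies in N - 2 triads, and the number of two-stars equals T2 + 3*T3, which is the sum over nodes of d(d - 1)/2. The function names are hypothetical, and the sample-based estimators under development in the project are not shown.

```python
from itertools import combinations


def triad_census(n, edges):
    """Count node triples of an undirected graph by number of edges present (0..3)."""
    edge_set = {frozenset(e) for e in edges}
    census = [0, 0, 0, 0]
    for a, b, c in combinations(range(n), 3):
        k = sum(frozenset(p) in edge_set for p in ((a, b), (a, c), (b, c)))
        census[k] += 1
    return census


def degree_moments_from_census(n, census):
    """Mean and (population) variance of the degrees, using only the census."""
    _, t1, t2, t3 = census
    m = (t1 + 2 * t2 + 3 * t3) / (n - 2)   # each edge is counted in n - 2 triads
    two_stars = t2 + 3 * t3                # equals sum_i d_i * (d_i - 1) / 2
    sum_sq = 2 * two_stars + 2 * m         # sum_i d_i^2
    mean = 2 * m / n
    return mean, sum_sq / n - mean ** 2


# Hypothetical 5-node graph with degrees 3, 2, 2, 2, 1.
n = 5
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4)]
census = triad_census(n, edges)
mean, var = degree_moments_from_census(n, census)
```

For this graph the degrees sum to 10 and their squares to 22, so the direct calculation gives mean 2 and variance 22/5 - 4 = 0.4, in agreement with the census-based values.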

Project 6: Comparison of Aggregated Relation Data and Ego-Centric Sampling

Tyler McCormick, Tian Zheng, and Eric Kolaczyk are studying a framework for inference in multi-relational data measured through standard surveys. Our goals are (i) formalizing the information contained in two sampling methods (aggregated relational data and ego-centric fixed-size nominations), (ii) exploring the relationship between these two methods, and (iii) learning about the relationship between two nested networks.

Mini-Project: Minimum Description Length Inference

Edo Airoldi and David Banks have written an NSF proposal that includes a study of minimum description length inference for a class of latent space models. If funded, this project will surely move forward; if not, its pursuit will depend upon available energy and time.

B. Scientific Themes for Later Years: Uncertainty Quantification: 2011-2012

1 Introduction

Mathematical models intended for computational simulation of complex real-world processes are a crucial ingredient in virtually every field of science, engineering, medicine, and business. Two related but independent phenomena have led to the near-ubiquity of models: the remarkable growth in computing power and the matching gains in algorithmic speed and accuracy. Together, these factors have vastly increased the applicability and reliability of simulation - not only by drastically reducing simulation time, thus permitting solution of larger and larger problems, but also by allowing simulation of previously intractable problems.

Utilization of such computer models of processes requires addressing Uncertainty Quantification, a greatly expanding field focusing on at least the following issues:

• Models are invariably not a completely accurate reflection of reality; this bias or discrepancy needs to be acknowledged in analysis and prediction.

• The models are often so expensive to run that model outputs may be available for only a few inputs; for practical purposes the model predictions are thus unknown at all other inputs.

• There are often uncertainties in the initial conditions for the models.

• The models typically have numerous unknown parameters.

• The models may have stochastic features.

• Predictions combining model output and noisy data (data assimilation) are often required.

The intellectual content of computational modeling comes from a variety of disciplines, including applied mathematics, statistics, probability, operations research, and computer science, and the application areas are remarkably diverse. Despite this diversity of methodology and application, there are a variety of common challenges, detailed below, in developing, evaluating and using complex computer models of processes, which directly relate to the mission of SAMSI.

Uncertainty Quantification (UQ) will be a year-long SAMSI program for 2011-2012. This nascent field and its recent developments are closely intertwined with several applications, most notably climate and engineering. To properly explore not only methodological issues but also applications to specific fields, SAMSI will devote its entire resources in 2011-2012 to the UQ program. The structure of the 2011-12 UQ program is indeed unique among SAMSI programs in that, instead of running two year-long programs in parallel, we will run one UQ program with not only (i) a main methodological theme but also, stemming from it, three distinct application fields, namely, (ii) climate modeling, (iii) engineering and renewable energy and (iv) geosciences. This will foster the seamless integration into the program of these fields and their corresponding research communities. For the sake of simplicity, we hereafter refer to each of these four themes as “programs”.

Current Overall Program Leaders: Amy Braverman (JPL, Cal. Tech and UCLA), Don Estep (Colorado State), Roger Ghanem (USC), David Higdon (LANL), Christine Shoemaker (Cornell), Jeff Wu (Georgia Tech)

Local Scientific Coordinators: James Nolen (Duke), Jan Hannig (UNC)

Directorate Liaison: Pierre Gremaud (NCSU)

National Advisory Committee Liaison: Max Gunzburger (Florida State)

2 Research foci and programs

2.1 Methodology program

The program will focus on five complementary themes described below. It is expected that the work done within the various working groups for this program will complement and address challenges of interest to the applied side of the program (see the three applied programs below).

Program leaders:

Dan Cooley (Colorado State), Don Estep (Colorado State), Max Gunzburger (Florida State), Jan Hannig (UNC), David Higdon (LANL), Jeff Wu (Georgia Tech)

2.1.1 Methodology theme: Foundations

Fundamental questions of interest include treatment of model error, inference with multi-models, treatment of multiscale models, numerical error estimation, reduced order modeling, and theoretical foundations of various UQ problems.

More specifically, multiscale, multiphysics systems raise well known challenges for accurate modeling and simulation and these all carry over to UQ. Particular challenges include combining different descriptions of uncertainties from different kinds of physical models and different kinds of data, treating the coupling and feedback between the uncertainties and errors generated in each component, and dealing with the effects of scale on uncertainty representations. Scale differences in behavior of different physical components may lead to scale issues in computing and representing uncertainties.

Statistical analysis of computer models (e.g. use of MCMC methods) and operational use of models (e.g. data assimilation, optimization, or use of the model for risk assessment) may require hundreds or thousands of model evaluations. When a single computer model run is time-consuming, such analysis and use may be intractable. But a solution that is often almost as good, or at least good enough for many purposes, can sometimes be obtained through model approximations, often called emulators, surrogates, or meta-models. In developing these approximations, ideas such as model reduction, dimension reduction and order reduction are very active research topics in several areas of statistics, applied mathematics, computer science, and operations research. One idea is to develop an exact but much less complex (or much smaller) model that retains the most important features of the original; another approach is to compute an approximate solution of the original problem that retains a provable degree of closeness to the exact solution; and obviously a combination of these ideas is possible. In all cases, a mixture of rigor and application-specific knowledge is needed to deal with increasingly high-dimensional spaces and data.
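To make the notion of an emulator concrete, the sketch below fits a cheap interpolating surrogate to a handful of runs of a stand-in "expensive" model. It assumes a one-dimensional input, a squared-exponential kernel with a fixed length scale, and only the predictive mean of a Gaussian-process-style interpolant (no predictive variance and no hyperparameter estimation). The names `expensive_model`, `fit_emulator` and `solve` are hypothetical and do not come from any SAMSI software.

```python
import math


def sq_exp_kernel(x1, x2, length=0.5):
    """Squared-exponential covariance between two scalar inputs."""
    return math.exp(-0.5 * ((x1 - x2) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_emulator(xs, ys, length=0.5, jitter=1e-10):
    """Return a function giving the interpolant's mean prediction at a new input."""
    K = [[sq_exp_kernel(a, b, length) + (jitter if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    weights = solve(K, ys)

    def predict(x):
        return sum(w * sq_exp_kernel(x, xi, length) for w, xi in zip(weights, xs))

    return predict


def expensive_model(x):
    """Stand-in for a costly simulator (assumption for illustration only)."""
    return math.sin(3 * x) + 0.5 * x


design = [0.0, 0.25, 0.5, 0.75, 1.0]                    # the few affordable runs
emulator = fit_emulator(design, [expensive_model(x) for x in design])
print(emulator(0.6), expensive_model(0.6))
```

Once fitted, the emulator can be evaluated thousands of times at negligible cost, which is what makes MCMC or optimization over the original model's inputs feasible in practice.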

For stochastic PDEs, efficient algorithms for the characterization of lower dimensional manifolds occupied by solutions to specific stochastic partial differential equations are starting to emerge, which is another useful type of dimension reduction.

2.1.2 Methodology theme: Representation and propagation of uncertainty

Representing uncertainty and error in ways that are both descriptive and useful for computation is an ongoing research challenge. Research is particularly needed in high fidelity representations of random quantities, e.g. polynomial chaos and probability distributions. Research is also needed in how to combine different representations of uncertainty in one form and the effect of changing scales between representations. The program will look at propagation of uncertainty through complex physical models, specifically in conjunction with the three applied programs described below. How to best exploit the mathematical structure of models for uncertainty propagation is also a promising avenue of investigation.

Sensitivity analysis generates essential information for model development, design optimization, parameter estimation, and model reduction, to name but a few. It has been demonstrated that forward sensitivities can be computed reliably and efficiently for many differential systems provided the number of parameters is not too large. For larger problems, the adjoint method is more appropriate but comes with additional complexity in its implementation. The proper use of additional information such as derivative information would open the way to more efficient algorithms. Open computational issues related to sensitivity analysis include analysis of the accuracy of the sensitivities themselves and their inclusion in general adaptive methods.
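As a small illustration of forward sensitivities, consider the assumed test problem dx/dt = -k x with x(0) = x0. Differentiating with respect to the parameter k gives a second equation for s = dx/dk, namely ds/dt = -x - k s with s(0) = 0, which is integrated alongside the state. The sketch below uses a hand-written classical Runge-Kutta step and compares with the known closed form s(t) = -t x0 exp(-k t); the problem and function names are chosen for illustration only.

```python
import math


def rhs(state, k):
    """Right-hand side of the augmented system (state x, sensitivity s = dx/dk)."""
    x, s = state
    return (-k * x, -x - k * s)


def rk4(state, k, t_end, steps):
    """Integrate the augmented system with the classical 4th-order Runge-Kutta method."""
    h = t_end / steps
    for _ in range(steps):
        k1 = rhs(state, k)
        k2 = rhs(tuple(y + 0.5 * h * d for y, d in zip(state, k1)), k)
        k3 = rhs(tuple(y + 0.5 * h * d for y, d in zip(state, k2)), k)
        k4 = rhs(tuple(y + h * d for y, d in zip(state, k3)), k)
        state = tuple(
            y + h / 6 * (a + 2 * b + 2 * c + d)
            for y, a, b, c, d in zip(state, k1, k2, k3, k4)
        )
    return state


x0, k, t_end = 2.0, 0.7, 1.5
x, s = rk4((x0, 0.0), k, t_end, steps=200)

exact_x = x0 * math.exp(-k * t_end)
exact_s = -t_end * x0 * math.exp(-k * t_end)
print(x - exact_x, s - exact_s)
```

Each additional parameter adds one more sensitivity equation of the same size as the state, which is why the forward approach becomes costly when the number of parameters is large and the adjoint method is preferred.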

Another important class of methods for uncertainty propagation is variously termed sequential Monte Carlo, particle methods, or ensemble methods. While SAMSI had a program on this topic in 2008-09, it focused on the development of the theory behind such methods, and not their utilization in dealing with uncertainty in complex computer models.

2.1.3 Methodology theme: Data assimilation

It is often crucial to combine data with computer models to provide accurate predictions. The quintessential example is weather forecasting, where current data is used (together with ‘ensemble’ runs from the model) to set the initial conditions. The computer model is then exercised from these initial conditions to make weather predictions. Data assimilation often involves very high dimensional data, and questions of modeling and computation abound, as well as the questions of how best to interface the data with the computer model, especially when data is of an unusual type, such as Lagrangian data. Other issues include incorporating information from disparate and sparse data sets, filtering and forecasting, response surface techniques, sampling based approaches, and derivative/variational-based approaches.
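The basic step of combining a model forecast with a noisy observation can be illustrated with the scalar Kalman update, one of the simplest filtering schemes. The sketch assumes a Gaussian forecast and a Gaussian observation error with known variances; the numbers are invented for illustration, and operational systems work with very high-dimensional states and ensemble or variational approximations instead.

```python
def kalman_update(prior_mean, prior_var, obs, obs_var):
    """Combine a Gaussian forecast with one noisy observation of the same quantity."""
    gain = prior_var / (prior_var + obs_var)      # weight given to the observation
    post_mean = prior_mean + gain * (obs - prior_mean)
    post_var = (1.0 - gain) * prior_var           # uncertainty shrinks after the update
    return post_mean, post_var


# Hypothetical example: the model forecasts 15.0 (variance 4.0) and an
# instrument reports 18.0 (variance 1.0).
mean, var = kalman_update(prior_mean=15.0, prior_var=4.0, obs=18.0, obs_var=1.0)
print(mean, var)
```

Here the gain is 4/(4+1) = 0.8, so the analysis moves most of the way toward the more accurate observation (17.4) and its variance drops to 0.8.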

2.1.4 Methodology theme: Inverse problem and calibration

Large-scale computational models are a powerful tool for planning and decision-making. Such models can be used to help assess important questions regarding the management of a particular system. Such issues are often posed as inverse or optimization problems. Use of models for this purpose requires understanding of the uncertainty and error in predicted behavior.

The fidelity of models to reality (often termed model validation) is central to their effectiveness in understanding and predicting real phenomena. However, techniques for validating complex models are limited and are often based on informal heuristics. Validation seems straightforward conceptually: data are collected that represent both the inputs and the outputs of the model, the model is run with those inputs, and the outputs are compared with the observed data corresponding to the outputs. In reality, complications abound: observed data may be expensive, scarce or noisy, the model may be so complex or time-consuming that only a few runs are possible, there can be severe confounding between calibration and validation, and uncertainty enters the process at every turn.

There is a very non-uniform understanding of which specific approach to use for model calibration depending on the models themselves. This is a field where the state-of-the-art methods in the deterministic and statistical communities are very different and where both could clearly benefit from learning more about the other. Further, issues such as error and uncertainty analysis for model calibration are still in their infancy. A better understanding of these is expected to both generate new methods and greatly enhance the efficiency of existing ones.

2.1.5 Methodology theme: Rare events

A very difficult challenge is accurately representing the effects of events that have very low probability of occurring but which have very large or catastrophic effects when they do occur. Specific issues include reduced order models, computational and physics models to detect rare events, investigation of relevant statistical methodologies, and optimization techniques.
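One relevant statistical methodology for rare events is importance sampling, shown here on an assumed toy problem: estimating P(Z > 4) for a standard normal Z, a probability of about 3e-5 that plain Monte Carlo would need millions of draws to resolve. Samples are drawn from a proposal centered at the threshold and reweighted by the likelihood ratio. The function name and the choice of proposal are illustrative assumptions.

```python
import math
import random


def rare_tail_probability(threshold, n_samples, seed=0):
    """Estimate P(Z > threshold), Z ~ N(0, 1), by importance sampling."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(threshold, 1.0)   # proposal N(threshold, 1) puts mass in the tail
        if z > threshold:
            # Likelihood ratio of target to proposal density:
            # phi(z) / phi(z - threshold) = exp(-threshold * z + threshold**2 / 2)
            total += math.exp(-threshold * z + 0.5 * threshold ** 2)
    return total / n_samples


estimate = rare_tail_probability(4.0, 20000)
exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))   # closed form for comparison
print(estimate, exact)
```

For an expensive computer model there is no closed form to compare against and each sample costs a full model run, which is why rare-event estimation is often paired with reduced order models or emulators.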

2.1.6 Prospective participants

Applied Math: Todd Arbogast (Austin), Guillaume Bal (Columbia), Leonid Berlyand (Penn State), Liliana Borcea (Rice), Alexandre Chorin (Berkeley), Clint Dawson (Texas), John Dennis (Rice), Yalchin Efendiev (Texas A&M), Don Estep (Colorado State), Colin Fox (U. Auckland), James Glimm (SUNY Stony Brook), Max Gunzburger (Florida State), Jan Hesthaven (Brown), Tom Hou (Caltech), Chris Jones (UNC), George Karniadakis (Brown), Andrew Majda (NYU), Richard Moore (NJIT), J. Tinsley Oden (Austin), Houman Owhadi (Caltech), George Papanicolaou (Stanford), Linda Petzold (UCSB), Liang Peng (Georgia Tech), Bruce Pitman (Buffalo), Rosemary Renaut (NSF), Juan Restrepo (U. of Arizona), Boris Rozovsky (Brown), Christoph Schwab (ETH Zurich), Elaine Spiller (Marquette), Bill Symes (Rice), Mary Wheeler (Texas), Dongbin Xiu (Purdue), Donxiao “Don” Zhang (USC).

Computational Probability: Amarjit Budhiraja (UNC), Paul Glasserman (Columbia), Dennis Talay (INRIA), Harold Kushner (Brown), Eric van den Eiden (NYU), Pierre DelMoral (INRIA), Eckhard Platen (University of Technology, Sydney), Paul Dupuis (Brown), Rene Carmona (Princeton), J.P. Fouque (UCSB), Andrew Stuart (Warwick), Bernt Oksendal (Uni. of Oslo), Jonathan Weare (NYU)

Engineering, Science, Computer Science: Hany Abdel-Khalik (NC State), Jeff Anderson (NOAA), Maarten Arnst (USC), Wolfgang Bangerth (Texas A&M), Tim Barth (NASA), Illenia Battiato (Max Planck Inst.), Jose Blanchet (Columbia), Yanzhao Cao (Auburn), Wei Chen (Northwestern), Ruchi Choudhari (Cambridge), Peter Cummings (Vanderbilt), George Deodatis (Columbia), Alireza Doostan (Colorado), Debbie Dupuis (HEC Montreal), Howard Elman (Maryland), Charbel Farhat (Stanford), Fariba Fahroo (AFOSR), Jacob Fish (RPI), Roger Ghanem (USC), Omar Ghattas (Austin), Habib Najm (Sandia), Mandy Hering (Colorado Mines), Gianluca Iaccarino (Stanford), Chris Johnson (Utah), Yannis Kevrekidis (Princeton), Peter Kitanidis (Stanford), Steve Koutsourelakis (Cornell), Fredrik Larsson (Chalmers), Alan Laub (UCLA), Pierre Lermusiaux (MIT), Wing Liu (Northwestern), Youssef Marzouk (MIT), Marc Mignolet (Arizona), Bernd Moeller (Dresden), David Moens (Leuven), Dianne O’Leary (Maryland), Karen Pao (NNSA), Abani Patra (Buffalo), Jaume Peraire (MIT), Serge Prudhomme (Austin), Adrian Sandu (Virginia Tech), Marcus Sarkis (WPI), Christine Shoemaker (Cornell), David Stargel (AFOSR), Laura Swiler (Sandia), Alexander Tartakovsky (USC), Daniel Tartakovsky (UCSD), Hai Wang (USC), Karen Willcox (MIT), Nichola Zabaras (Cornell), T.A. Zang (NASA).

Statistics: M.J. Bayarri (Valencia), Andrew Booker (Boeing), George Casella (U. Florida), Peter Challoner (Southampton U.), Hugh Chipman (Acadia U.), Dan Cooley (Colorado State), Dan Cornford (Aston U.), Dennis Cox (Rice), Gary Foley (EPA), Michael Goldstein (Durham), Serge Guillas (University City of London), Hary Iyer (Colorado State), Cari Kaufman (Berkeley), Maggie Han (Auburn), Dave Higdon (LANL), Lulu Kang (Illinois Inst. of Tech), Marc Kennedy (Sheffield U.), Herbie Lee (UCSC), Devon Lin (Queens U.), Max Morris (Iowa State), William Notz (Ohio State U.), Jeremy Oakley (Sheffield U.), Tony O’Hagan (Sheffield Univ.), Angela Patterson (General Electric Research), Peter Qian (U. Wisconsin), Pritam Ranjan (Acadia U.), Shane Reese (Brigham Young U.), Jonathan Rougier (Univ. Durham), Fabrizio Ruggeri (CNR-IMATI), Jerry Sacks, Tom Santner (Ohio State), Leonard Smith (London School of Economics), David Steinberg (Tel Aviv Univ.), Curtis Storlie (Colorado), Matt Taddy (Chicago), Claudia Tebaldi (NCAR), William Welch (U. British Columbia), Darren Wilkinson (U. Newcastle), Henry Wynn (London School of Economics), Kenny Ye (SUNY Stony Brook), Jeff Wu (Georgia Tech).

The above Methodology program will be considered in parallel with three applied programs, which we now describe.

2.2 Climate program

The area of climate modeling is a quintessential field of application for UQ. In fact, a sizable part of the research done in the quantification of uncertainty in computer models has been driven by the pressing needs of climate modelers. Climate models are computer codes based on physical principles that simulate the complex interactions between the many parts of the Earth system such as atmosphere and oceans. A significant issue of interest is, for instance, the problem of regional climate, as the global models cannot on their own give enough information at the regional/local scales. Various strategies (downscaling and upscaling) can be considered whereby various combinations of global and regional models are combined to gain information on model uncertainty. Other issues include: development of reliable atmospheric and ocean models and interface between the two, regional and local risk assessment from models, paleoclimate models and how to best deal with models that have huge uncertainties and biases. Four specific themes, described below, have been identified.

Organizers

Amy Braverman (JPL, California Institute of Technology, and UCLA), Xabier Garaizar (LLNL), Dave Higdon (LANL), Gardar Johanneson (LLNL), Donald Lucas (LLNL)

2.2.1 Climate theme: Observations

Observations are key to uncertainty quantification in climate research because they provide a corroborating source of information about physical processes being modeled. However, observations have uncertainties and this poses a set of methodological and practical issues for comparing them to model simulations: (i) quantifying observational uncertainty when the observations are themselves inferences based on other quantities, (ii) change of support between model resolution and the resolution of remote sensing or in-situ data, (iii) rectifying or accounting for spatial and temporal inconsistencies, (iv) coping with dependence between observations used in model construction and observations used for UQ, and (v) leveraging massive volumes of distributed data.

2.2.2 Climate theme: Climate models

Climate models remain our best tool for understanding past, present and future climate change. However, state-of-the-art climate models still contain many sources of uncertainty, including uncertainties from physical processes that are poorly known or are not resolved at the temporal and spatial scales represented in climate models. These uncertainties may cloud the analysis and interpretation of climate simulations. This theme focuses on the applications of UQ to characterize uncertainties in climate model simulations.

2.2.3 Climate theme: Assimilation/calibration/forward UQ

With computational models that simulate climate 10s to 100s of years in the future comes the need to quantify the uncertainties of the predictions they produce. Uncertainties in these large-scale computational models can stem from a variety of sources including numerical approximations, unknown initial conditions, unknown model parameter settings, missing physics and other inadequacies in the model. Some of these uncertainties can be reduced by constraining the model to be consistent with physical observations. This theme focuses on approaches for uncertainty propagation, data assimilation and model calibration that help estimate and constrain uncertainties in model-based predictions. The use of large-scale models makes such approaches challenging due to their computational burden, their complexity, and their inadequacies. The incorporation of physical data leads to challenges as well: data are recorded at different scales, and the volume of data can be quite large while their spatial and temporal coverage can be quite small.

2.2.4 Climate theme: Multiscale inference

Uncertainty in climate model projections is often represented by an ensemble of plausible simulations, which can either be a collection of simulations from multiple models or from a single climate model. Drawing inference from such ensembles can be challenging, for example when drawing conclusions about changes in trends and extreme events. This theme concerns statistical analysis of climate model simulations, in particular methods to draw inference from an ensemble of simulations and across different spatial and temporal scales.

2.2.5 Prospective participants

Dorian Abbott (U. of Chicago), Rick Arichbald (ORNL), Dan Cooley (Colorado State), Noel Cressie (Ohio State), Marc Genton (Texas A&M), Gabi Hegerl (U. of Edinburgh), Charles Jackson (U. of Texas), Eugenia Kalnay (U. of Maryland), Richard Katz (NCAR), Cari Kaufman (UC Berkeley), Shane Keating (NYU), Pete Kramer (RPI), Mikyoung Jun (Texas A&M), Ruby Leung (PNNL), Chunsheng Ma (Wichita State), Doug McNeall (Met Office Hadley Centre), Matt Menne (NOAA), Chris Paciorek (UC Berkeley), Robert Pincus (U. of Colorado), Juan Restrepo (U. of Arizona), Bruno Sanso (UC Santa Cruz), Christine Shoemaker (Cornell), Leonard Smith (London School of Economics and Political Science), Chris Wikle (U. of Missouri), Pen Zeng (Auburn)

2.3 Engineering/renewable energy program

Analyzing and predicting the behavior of complex systems with multiple length and time scales has become a central activity across the broad spectrum of engineering fields. For such systems, limited information can be obtained from observation and experiment. Consequently, simulation-based science has become an essential tool across engineering as well. Since important and costly engineering decisions, e.g. product and system design, depend critically on simulation, uncertainty quantification has become increasingly vital to engineering advances.

The program will consider uncertainty quantification issues arising in a number of engineering fields, e.g., materials, circuits, aeronautics, fusion and fission reactors, nano/MEMS-scale devices. Given the increasing importance of sustainability to the national interest, we will place particular emphasis on engineering of green and renewable energy sources along with related environmental engineering issues. Some specific applications to be considered include smart grid models, solar power, fuel cells and batteries, green fuel sources, risk models (including environmental risk and financial risk), economic impact models and eco-toxicology (carbon sequestration models for instance). Five specific themes are described below; each of these themes will be explored through a workshop with the goal of identifying methodological issues and sustaining the activities of a corresponding working group.

Organizers

Don Estep (Colorado State), Roger Ghanem (University of Southern California), Miriam Heller (MHITech Systems).

2.3.1 Engineering theme: Renewable Energy

This part of the program will aim at improving our understanding of the uncertainties that affect renewable energies along their demand-supply chain. This includes generation, transmission, demand, smart grid technologies, and related pricing and financial instruments.

Organizers: Habib Najm (Sandia) and Youssef Marzouk (MIT)

Prospective participants: Joseph F. DeCarolis (NCSU), Mike Eldred (SNL), Roger Ghanem (USC), Marija Ilic (Carnegie Mellon U.), Omar Knio (Johns Hopkins), Josh Linn (RFF), Shuai Lu (PNNL), Dawn Manley (SNL), Steven Mitchell (BP), Hai Wang (USC), Jianhui Wang (ANL), Jean-Paul Watson (SNL), David Woodruff (UC Davis), Victor Zavala (ANL)

2.3.2 Engineering theme: Nuclear Energy

This theme will bring together stakeholders in nuclear assurance.

Organizers: Jim Stewart (Sandia) and Hany Abdel-Khalik (NCSU)

Prospective participants: Hany Abdel-Khalik (NCSU), Brian Adams (Sandia), Marv Adams (Texas A&M), Mihai Anitescu (ANL), Dan Cacuci (NCSU), Nam Dinh (INL), Mike Eldred (Sandia), Roger Ghanem (USC), Urmila Ghia (U. of Cincinnati), Guang Lin (PNNL), Ryan McClarren (Texas A&M), Michael Pernice (INL), Simon Phillpot (U. of Florida), Eric Phipps (Sandia), John Red-Horse (Sandia), Ralph Smith (NCSU), Kiran Solanki (Mississippi State U.), Pavel Solin (U. of Nevada), Laura Swiler (Sandia), Yixing Sun (Westinghouse), Angel Urbina (Sandia), Brian Williams (LANL), Nicholas Zabras (Cornell).

2.3.3 Engineering theme: Sustainability

The objective is to bring together people who are tasked with the practice of sustainability with those at the frontier of sustainability science. This includes sustainable technologies (i.e. with controllable carbon footprint and transparent reuse cycle), complex interacting systems, and urban planning.

Organizers: Don Estep (Colorado State), Roger Ghanem (USC), Miriam Heller (MHITech Systems)

Prospective participants: Francis Alexander (LANL), Jeff Arnold (US Army), Godfried Augenbroe (GaTech), Ana Barros (Duke), Burcin Beceric-Gerber (USC), George Deodatis (Columbia), Alireza Doostan (U. Colorado, Boulder), Leonardo Duenas-Osorio (Rice), Joseph Fiksel (Ohio State U.), David Gerber (USC), Robert Glass (Sandia/NISAC), Christian Hellmich (TU Vienna, Austria), Sinan Keten (Northwestern), Yannis Kevrekidis (Princeton), Peter Louie (SoCal Metropolitan Water District), John McGrath (NSF), Anna Michalak (U-Mich), Anna Nagurney (U. Mass, Amherst), Habib Najm (Sandia), Priscilla Nelson (NJIT), Rosemary Renaut (Arizona State U.), George Saad (American U. of Beirut, Lebanon), Alexander Tartakovsky (USC), Paul Torrens (Arizona State U), Franz Ulm (MIT), Peter Williams (IBM)

2.3.4 Engineering theme: Engineered Systems

This part of the program will bring together individuals who are close to operational constraints, to identify common challenges and opportunities for advancing the relevance of the UQ knowledge base.

Organizers: Roger Ghanem (USC)

Prospective participants: Wei Chen (Northwestern), Mike Eldred (Sandia), Albert Gilg (Siemens AG), Jerry Hefner (NIA), Chris Hoyle (Oregon), Gianluca Iaccarino (Stanford), Sean Kenny (NASA), Laura Laurati (Boeing), Pierre Lermusiaux (MIT), Marc Mignolet (ASU), Robert Moser (UT Austin), Abani Patra (SUNY, Buffalo), Lee Peterson (NASA), Mahadevan Sankaran (Vanderbilt), Matthew Sexton (Boeing), Pol Spanos (Rice), Karen Wilcox (MIT)

2.3.5 Engineering theme: Materials

Organizers: Wei Chen (Northwestern), Wing Kam Liu (Northwestern)

Prospective participants: Janet Allen (Oklahoma), Guillaume Bal (Columbia), Craig Burkhart (Goodyear), Wei Chen (Northwestern), Tim Germann (LANL), Roger Ghanem (USC), Somnath Ghosh (Ohio State), Lori Graham (Johns Hopkins), Steve Greene (Northwestern), Gary Harlow (Lehigh), Iwona Jasiuk (UIUC), Mike Kassner (ONR), Wayne King (Lawrence Livermore), Wing Liu (Northwestern), David McDowell (Georgia Tech), Bill Mullens (ONR), Greg Olson (Northwestern), Masoud Rais-Rohani (Mississippi State), Tony Rollett (Carnegie Mellon), David Shahan (Univ of Texas-Austin), Jeff Simmons (Wright-Patterson AFB), Yan Wang (Georgia Tech), Dongbin Xiu (Purdue), Nicholas Zabaras (Cornell)

2.4 Geoscience applications program

Many problems in geoscience and environmental engineering are described by computationally expensive models. Computational time is arguably the main obstacle to doing rigorous statistical analysis of uncertainty. Further, multiple types of uncertainty need to be incorporated, including data error, model error, parameter error, and randomness in model input (static and dynamic). These types of issues call for new algorithms.

Organizers

Omar Ghattas (Univ. of Texas-Austin), Christine Shoemaker (Cornell University), Daniel Tartakovsky (University of California - San Diego)

Specific examples of problems requiring careful uncertainty quantification in this field include

• determination of spatial distribution of geologic materials in the subsurface based on sound wave and/or radar (many spatial points, low accuracy),

• determination of location of oil reservoirs or underground water based on exploratory drilling (few spatial points, high accuracy),


• forecast of contaminant transport.

Aside from hydrology driven applications, we will also consider the modeling of hurricanes, volcanos and tsunamis. Independent teams of researchers working on each of these three specific themes have been identified, contacted and will actively participate in the program.

The opening workshop of that program is organized by Souheil Ezzedine (Lawrence Livermore National Laboratory), Mary C. Hill (USGS), Wolfgang Nowak (University of Stuttgart, Germany), Gowri Srinivasan (Los Alamos NL), Daniel Tartakovsky (University of California, San Diego).

2.4.1 Prospective participants

Vladimir Alvarado (U. of Wyoming), Vo Anh (Queensland University of Technology, Australia), Nicolas Christou (UCLA), Ian Cook (TIBCO Software Inc), Vladimir Cvetkovic (KTH Royal Inst. of Technology), Kossi Edoh (NC A&T), Samson Emeakpor (Jackson State U.), Michael Fienen (US Geological Survey), Genetha Gray (Sandia), Snehalata Huzurbazar (U. of Wyoming), George Karniadakis (Brown), Tim Kelley (NCSU), Peter Kitanidis (Stanford), Bo Li (Purdue), Bani Mallick (Texas A&M), Lucy Marshall (U. of Montana), Arnst Maarten (Universite de Liege, Belgium), Sean McKenna (Sandia), Sergey Oladyshkin (U. of Stuttgart), Bruce Pitman (U. at Buffalo), Juan Restrepo (U. of Arizona), Fabrizio Ruggeri (CNR IMATI, Italy), Leonard Smith (London School of Economics & Political Science), Margaret Short (U. of Alaska), Robert Wolpert (Duke), Yimin Xiao (Michigan State), Peng Zeng (Auburn), Xiang Zhou (Brown).

3 Activities and timing

• A planning workshop was held at SAMSI on April 6, 2010. The goal of that meeting was twofold. First, we gathered about 30 experts who helped us design the program as it now stands. Second, the meeting was the initiation point of both a new SIAM activity group on uncertainty quantification, see http://www.siam.org/activity/uq/, and a sister interest group within the ASA. The SIAM activity group is now fully operational. Its three main officers, Don Estep (Colorado State), Max Gunzburger (Florida State) and Habib Najm (Sandia), have major organizational responsibilities with the SAMSI UQ program. The efforts within the ASA are led by Dave Higdon (LANL), who is also heavily involved in the preparation of the program.

• Linked to the above SIAM activity group, there will be a SIAM conference on Uncertainty Quantification, April 2-4, 2012, in Raleigh, NC, see http://www.siam.org/meetings/uq12/. The conference is organized in cooperation with SAMSI. The location was chosen to maximize interactions with the SAMSI UQ program participants. Gremaud is on the organizing committee.

• In collaboration with Sandia (Jim Stewart), we have organized a UQ Summer School, see http://www.samsi.info/workshop/samsisandia-summer-school-uncertainty-quantification. The summer school will take place in Albuquerque, NM, next to Sandia, June 20-24, 2011. The four main presenters are Dan Cooley (Colorado State), Doug Nychka (NCAR), Adrian Sandu (Virginia Tech) and Dongbin Xiu (Purdue). They will respectively address the issues of statistical analysis of rare events; data assimilation and applications in climate modeling; variational data assimilation and sensitivity analysis; and polynomial chaos for differential equations. These main presentations will be complemented by talks from UQ practitioners, namely Mihai Anitescu (ANL), Mike Eldred (Sandia), Jon Helton (Sandia), Dave Higdon (LANL), Gardar Johannesson (LLNL), Laura Swiler (Sandia), Charles Tong (LLNL).

• The SAMSI UQ program will officially open in late August-September with four opening workshops:

– Climate opening workshop, August 29-31, 2011, Pleasanton, CA.

– Opening Workshop and tutorials: Methodology of Uncertainty Quantification, September 7-10, 2011, Radisson/SAMSI, RTP, NC.

– Engineering/energy opening workshop, September 19-21, 2011, Radisson/SAMSI, RTP, NC.

– Geosciences opening workshop, September 21-23, 2011, Radisson/SAMSI, RTP, NC.

The Climate program will start in Pleasanton, CA, next to Lawrence Livermore National Laboratory to facilitate interactions with the climate research groups there. Satellite activities will be held at LLNL the day after the workshop. During the workshop proper, presentations on the LLNL activities related to climate will be given. The formation of SAMSI working groups located at the labs (LLNL and/or LANL) is expected.

• Specialized workshops will be held during the year. For the Engineering program for instance, we will organize individual mid-program workshops on (i) renewable energy (Habib Najm (Sandia) and Youssef Marzouk (MIT) organizers), (ii) nuclear energy (Jim Stewart (Sandia) and Hany Abdel-Khalik (NCSU) organizers), (iii) sustainability (Don Estep (Colorado State), Roger Ghanem (USC) and Miriam Heller (MHITech Systems) organizers), (iv) engineered systems (Roger Ghanem (USC) organizer) and (v) materials (Wei Chen (Northwestern) and Wing Kam Liu (Northwestern) organizers).

• We are also in the final stages of preparation for a workshop on SmartGrid (George Michailidis (Michigan) organizer), at SAMSI, Oct. 3-5, 2011. This workshop will be in cooperation with the Future Renewable Electric Energy Delivery and Management (FREEDM) Systems center at NCSU. The program will cover some broad themes such as large scale dynamics, monitoring problems, and design aspects of the SmartGrid. Prospective speakers include Russell Bent (LANL), William Booth (US Energy Information Administration), Alex Huang (FREEDM Center and NCSU), Marija Ilic (CMU), Sean Meyn (UIUC), Bryan Richardson (Sandia), Anna Scaglione (U. Washington), James Thorp (Virginia Tech), Pravin Varaiya (UC Berkeley).

• Two graduate courses will be proposed. One in Fall 2011 will focus on methodology; the other in Spring 2012 will focus on applications. Dongbin Xiu (Purdue) will be the lead instructor for both courses.


• Working groups will pursue research throughout the year.

4 Relationships

4.1 Affiliates

Many of our university affiliates have researchers strongly involved with UQ issues. Our corporate affiliates have interest in UQ, and have been contacted.

4.2 Partner universities

There is serious interest in this area in virtually every one of the SAMSI affiliated partner university departments. Eleven faculty members will be given release from a course to participate in the program, along with 12 graduate students. Many others are expected to also participate.

4.3 National laboratories and governmental agencies

Several of the national labs have been significantly involved so far.

• We have organized an already successful (highly oversubscribed) UQ Summer School jointly with Sandia, see above. Other potential participants include Jon Helton, Scott Mitchell, Martin Pilch, Laura Painton Swiler, Tim Trocano, James Stewart, Mike Eldred, Eric Phipps, Brian Adams, John Red-Horse, Brian Carnes, Angel Urbina, Rich Field, Paul Constantine, Keith Dalbey, Bert Debusschere, Habib Najm, Helgi Adalsteinsson, and Cosmin Safta.

• The opening workshop of the SAMSI UQ program on Climate will be held in Pleasanton, CA, next to LLNL. The organization of the workshop and indeed the program is done in close collaboration with LLNL researchers (Don Lucas, Gardar Johannesson). Other potential LLNL participants include Xabier Garaizar, Carol Woodward, Charles Tong, Jeff Hittinger, Jeff Banks, Bill Henshaw, Bjorn Sjogreen, Gardar Johannesson, Bill Hanley, Steve Lee, Anders Petersson, Panayot Vassilevski, Stephen Guzik, Kambiz Salari, Ping Wang, Dan Laney, Timo Bremer, Peter Lindstrom, and Fady Najjar.

• LANL will also be heavily involved in the program; David Higdon (Head of Statistical Sciences) is one of the main organizers of the UQ program. Other potential participants include Mark Anderson, Scott Doebling, Michael Hamada, Christine Anderson Cook, Ken Hanson, Francois Hemez, Nick Hengartner, Randy Michelsen, Lisa Moore, Charlie Nakhleh, Amanda Rutherford, Jim Smith, Chuck Farrar, Ryan Maupin, Jason Pepin, Derek Armstrong, David Sigetti, Diane Vaughan, Jim Cooley, Clint Scovel, Don Hush, Frank Alexander, Dave Moulton, Pieter Swart, Humberto Godinez, Brian Williams, and Jim Gattiker.

• Smaller groups at Pacific Northwest National Laboratory, Idaho National Laboratory, Argonne National Laboratory, and Oak Ridge National Laboratory have been contacted.


• NCAR has significant interests in the area, in the climate arena especially. Contacts include Joe Tribbia, Chris Snyder, Jeff Anderson, Grant Branstator, Amik St Cyr, Thomas Hopson, Luca della Monache, Doug Nychka, Matt Pocernich, Eric Gilleland, Mandy Hering, Rick Katz, Claudia Tebaldi and Steve Sain.

• Triangle-based organizations such as EPA and NIEHS have numerous interests in UQ, and their involvement in the program is being sought.

4.4 Other NSF institutes

Several of the NSF math/stat institutes will be having programs and themes related to at least some of the issues outlined above at one point or another. There are also recent or planned workshops and events that directly relate to UQ issues, such as the two IMA workshops described at http://www.ima.umn.edu/2010-2011/W10.18-22.10/ and http://www.ima.umn.edu/2010-2011/W6.6-10.2011/. The overall focus of the IMA program is on modeling in general and in that sense is quite close to the 2006-07 SAMSI program on the Development, Assessment and Utilization of Complex Computer Models. Our upcoming UQ program is much more focused on UQ proper and will offer a unique link between methodology and the three applied fields of climate, engineering and geosciences.


Appendices

A. Final Report: 2008-09 Program on Sequential Monte Carlo Methods

B. Final Report: 2008-09 Program on Algebraic Methods in Systems Biology and Statistics

C. Final Report: Summer 2008 Program on Meta-analysis: Synthesis and Appraisal of Multiple Sources of Empirical Evidence

D. Final Report: Summer 2009 Program on Psychometrics

E. Workshop Participant Lists

F. Workshop Programs and Abstracts

G. Workshop Evaluations

SEQUENTIAL MONTE CARLO METHODS

Final Report

Program Leaders: Arnaud Doucet and Simon Godsill

1 Program and its Objectives:

The aim of this 12-month SAMSI program was to develop new approaches to scientific/statistical computing using innovative sequential Monte Carlo (SMC) methods. The program addressed fundamental challenges in developing effective sequential and adaptive simulation methods for computations underlying inference and decision analysis. The research blended conceptual innovation in new and emerging methods with evaluation in substantial applied contexts drawn from areas such as control, communications and robotics engineering, financial and macro-economics, among others. Researchers from statistics, computer science, information engineering and applied mathematics were involved, and the program promoted the opportunity for both methodological and theoretical research. The interdisciplinary aspects of the program were substantial, as was the attractiveness for students and postdocs.

2 Background

Monte Carlo (MC) methods are central to modern numerical modelling and computation in complex systems. Markov chain Monte Carlo (MCMC) methods provide enormous scope for realistic statistical modelling and have attracted much attention from disciplinary scientists as well as research statisticians. Many scientific problems are not, however, naturally posed in a form accessible to evaluation via MCMC, and many are inaccessible to such methods in any practical sense. For example, for real-time, fast data processing problems that inherently involve sequential analysis, MCMC methods are often not obviously appropriate at all due to their inherent "batch" nature. The recent emergence of sequential MC concepts and techniques has led to a swift uptake of basic forms of sequential methods across several areas, including communications engineering and signal processing, robotics, computer vision and financial time series. This adoption by practitioners reflects the need for new methods and the early successes and attractiveness of SMC methods. In such methods, probability distributions of interest are approximated by large clouds of random samples that evolve as data is processed using a combination of sequential importance sampling and resampling ideas. Variants of particle filtering, sequential importance sampling, sequential and adaptive Metropolis MC and stochastic search, and others have emerged and are becoming popular for solving variants of "filtering" problems; i.e. sequentially revising sequences of probability distributions for complex state-space models. Useful introductory material and examples of SMC methods can be found at the SMC preprint site. Many problems and existing simulation methods can be formulated for analysis via SMC: sequential and batch Bayesian inference, computation of p-values, inference in contingency tables, rare event probabilities, optimization, counting the number of objects with a certain property for combinatorial structures, computation of eigenvalues and eigenmeasures of positive operators, PDEs admitting a Feynman-Kac representation and so on. This research area is poised to explode, as witnessed by this major growth in adoption of the methods.
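To make the idea of an evolving cloud of weighted samples concrete, the following is a minimal sketch of a basic bootstrap particle filter (propagate, weight, resample). The one-dimensional random-walk model, the parameter values, the prior, and the function name `bootstrap_particle_filter` are illustrative assumptions and are not taken from the program's own work.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles=500,
                              process_sd=1.0, obs_sd=1.0, seed=0):
    """Bootstrap particle filter for a 1-D Gaussian random-walk model.

    Assumed (hypothetical) model, chosen only for illustration:
        x_t = x_{t-1} + N(0, process_sd^2)
        y_t = x_t     + N(0, obs_sd^2)
    Returns the filtered posterior-mean estimate of x_t at each time step.
    """
    rng = random.Random(seed)
    # Initial cloud of particles drawn from a broad (assumed) prior.
    particles = [rng.gauss(0.0, 5.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # 1. Propagate each particle through the state dynamics.
        particles = [x + rng.gauss(0.0, process_sd) for x in particles]
        # 2. Weight by the Gaussian observation likelihood (log scale,
        #    shifted by the maximum for numerical stability).
        log_w = [-0.5 * ((y - x) / obs_sd) ** 2 for x in particles]
        max_log_w = max(log_w)
        weights = [math.exp(lw - max_log_w) for lw in log_w]
        total = sum(weights)
        weights = [w / total for w in weights]
        # 3. Estimate the filtered mean from the weighted cloud.
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        # 4. Multinomial resampling: duplicate likely particles and
        #    drop unlikely ones, giving an equally weighted cloud.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


if __name__ == "__main__":
    # Simulate a short synthetic data set from the same assumed model.
    sim = random.Random(1)
    truth, x = [], 0.0
    for _ in range(50):
        x += sim.gauss(0.0, 1.0)
        truth.append(x)
    ys = [x + sim.gauss(0.0, 1.0) for x in truth]
    est = bootstrap_particle_filter(ys)
    rmse = math.sqrt(sum((e - t) ** 2 for e, t in zip(est, truth)) / len(truth))
    print(f"RMSE of filtered mean: {rmse:.3f}")
```

Because each observation is processed once and then discarded, the cost per time step is fixed, which is what makes this kind of scheme suited to the real-time problems mentioned above. Resampling at every step is the simplest choice; practical filters often resample only when the effective sample size drops.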

The SAMSI SMC program aims were to:

• Address methodological and theoretical problems of SMC methods, including synthesis of concepts underlying variants of SMC that have proven apparently successful across multiple fields, and the development of methodological and theoretical advances.

• Develop the methodological research, with broad opportunities for test-bed examples, methods evaluation and refinement of generic approaches, in the contexts of a number of important applied problems (e.g. data assimilation, inference for large state spaces, finance, tracking, continuous time models).

The program was an opportunity for exchange between communities, helping, we hope, to shape the future of stochastic computation and sequential methods, involving statisticians, computer scientists and engineers as core participants as well as others working collaboratively in a range of applied fields.

3 Core Group

A core group of researchers have been based at SAMSI, complemented by external participants in the various working groups, which held weekly meetings via Webex connections to SAMSI.

3.1 Local faculty

• Mark Huber (Duke)

• Mike West (Duke)

• Nilay Argon (UNC)

3.2 Senior researchers (at SAMSI for significant periods of time in Fall, 2008):

• Susie Bayarri (University Valencia)

• Jaya Bishwal (University North Carolina Charlotte)

• Carlos Carvalho (University of Chicago)

• Arnaud Doucet (University British Columbia)

• Pena Edsel (University South Carolina)


• Liu Fei (University Missouri)

• Marco Ferrante (University Pavia)

• Nathan Green (DSTL)

• Hedibert Lopes (University of Chicago)

• Raquel Prado (University Santa Cruz)

• Sylvain Rubenthaler (University Nice)

• Yoshida Ryo (Institute Statistical Mathematics)

• Jochen Voss (University Warwick)

3.3 Researchers (Spring and Summer, 2009)

• Daniel Clark (Heriot-Watt University)

• Mark Coates (McGill University)

• Paul Fearnhead (University of Lancaster)

• Andrew Thomas (University St Andrews)

• James Lynch (University South Carolina)

• Ernest Fokoue (Kettering University US)

3.4 Postdoctoral fellows

• Ioanna Manolopoulou

• Sourish Das

3.5 Postdoctoral associates

• Armagan Artin (Duke)

• Christian Macaro (Duke)

• Julien Cornebise (University Paris VI)

• Gentry White (NCSU)

• Elizabeth Shamseldi (Duke)

• Bin Liu (Duke)


3.6 Graduate students

• Melanie Bain (Duke)

• Luke Bornn (University British Columbia)

• Deidra Coleman (NCSU)

• Ana Corberan (University of Valencia)

• Thomas Flury (Oxford University)

• Roman Holenstein (University British Columbia)

• Chunlin Ji (Duke)

• Olasunkanmi Obanubi (Imperial College)

• Gareth Peters (University New South Wales)

• Francesca Petralia (Duke)

• Sarah Schott (Duke)

• Minghui Shi (Duke)

• Baqun Zhang (NCSU)

4 Program Organization

4.1 Opening workshop

The Opening Workshop was held during September 7-10, 2008 at SAMSI, organized by Arnaud Doucet (British Columbia), Simon Godsill (Cambridge) and Mike West (Duke University). This highly successful event engaged significant parts of the statistical, engineering and mathematics community, as well as others in econometrics and sciences, and included themed sessions from all of the main program topics (working groups).

Tutorial talks were given by four world leaders in the various areas of SMC: Pierre Del Moral (Bordeaux), Paul Fearnhead (Lancaster), Hedibert Lopes (Chicago) and Jun Liu (Harvard). These were pitched at various levels, allowing useful participation by attendees starting in the area as much as those already expert in one or more topics. Themed conference sessions were arranged to have a good balance between senior invited talks, new researcher talks and panel discussion. Most sessions stimulated very active discussion.

At a break-out session on the final afternoon, Working Group leaders were allocated and a broad declaration of interest was obtained from all workshop participants for their subsequent participation in the program.


4.2 Undergraduate workshop

The SAMSI Two-Day Undergraduate Workshop was held from October 31 - November 1, 2008. There were nine technical talks given by Jaya Bishwal, Jochen Voss, Gentry White, Nathan Green, Christian Macaro, Sourish Das, Julien Cornebise, Ioanna Manolopoulou and Francesca Petralia, covering many aspects of SMC from basic methodology to applications in finance and defence. There was also an interactive R session.

4.3 Fall SAMSI course on sequential Monte Carlo

This course provided an introduction to sequential Monte Carlo methodology, theory and applications. It was attended by approximately 40 people. Topics covered included: introduction to SMC, advanced SMC methods, SMC methods for parameter estimation in general state-space models, and SMC methods as an alternative to MCMC methods. The main instructor was Arnaud Doucet and two invited instructors gave some lectures: Christophe Andrieu (Bristol) and Alexander Chorin (Berkeley).

4.4 Mid-term workshop

A mid-term workshop was organised on 19-20 Feb 2009 at SAMSI. This had participants from most of the working groups, including the leaders of the Continuous Time (Fearnhead - Lancaster), Tracking (Godsill - Cambridge), Big Data (West - Duke), Parameter Learning (Lopes - Chicago) and Model Assessment (Carvalho - Chicago) working groups. These leaders gave overviews of progress in the different working groups and other participants gave research updates on SAMSI related work. Three of the 15 talks were delivered successfully by Webex from remote locations. A particular focus of talks and discussion was the Tracking working group, which assembled many of its participants at the workshop.

4.5 Adaptive Design Workshop

An Adaptive Design, Computer Modeling and SMC workshop organized by Jim Berger and Susie Bayarri was held from April 15-17, 2009; see http://www.samsi.info/workshop/smc-adaptive-design-sequential-monte-carlo-and-computer-modeling-workshop-april-15-17-2009 for details. There were 13 speakers.

4.6 Transition workshop

The transition workshop was held at SAMSI from 9-10 Nov. 2009; see http://legacy.samsi.info/workshops/2008smc-transition200911.shtml for details. There were 20 talks and all the working groups were represented.

4.7 Working groups

The working groups met weekly throughout the program. The topics were

• Tracking and Large-Scale Dynamical Systems

• Theory

• Population Monte Carlo

• Continuous Time

• Parameter Learning

• Model Assessment

• Big Data

4.7.1 Tracking and Large-scale dynamical systems

The working group leaders were Simon Godsill (whole year) and Nathan Green (Fall 08).

Regular participants included: Mark Briers (QinetiQ, UK), Daniel Clark (Heriot-Watt University UK), Avishi Carmi (Cambridge University UK), Julien Cornebise (SAMSI post-doc), Ernest Fokoue (Kettering University US), Simon Godsill (Cambridge University - program leader), Nathan Green (DSTL UK, long-term Fall 08 visitor), Chunlin Ji (Duke University - SAMSI Grad. student), Sze Kim Pang (Cambridge University UK), Gareth Peters (Univ. New South Wales, SAMSI Fall 08 visitor), Viktor Rozgic (University of Southern California), Francois Septier (Cambridge University, UK), Joshua Vogelstein (Johns Hopkins University), Gentry White (N.C. State University - SAMSI Post-doc), Namrata Vaswani (Iowa State University)

Working group - organisation

This working group has focussed on methodology for problems in high-dimensional tracking, with applications in computer vision, tracking, meteorology, biological imaging, etc. Standard particle filters do not perform satisfactorily in this scenario and hence we are pushing the methodology further by development of novel approaches. These include elements of Markov chain Monte Carlo-based filters, genetic algorithm approaches and SMC samplers. The goals of the working group are to produce papers on various topics involving multiple participants from the group, leading to future collaborative projects across a number of disciplines. The theme was organised into sub-groups addressing the following areas:

Subgroup 1: Multiple target tracking (lead Francois Septier (Cambridge)):

Participants include Simon Godsill, Francois Septier, Chunlin Ji, Mark Briers, Viktor Rozgic, Ernest Fokoue, Daniel Clark.

The aim of this sub-group was to research new methodologies for high-dimensional and structured tracking problems using SMC. We have generated standard datasets, test scenarios and data simulation code for multiple target tracking with random birth and death of objects and various sensor characteristics based on point process models or pixellated image data. A number of methodologies have been tested on this scenario, including novel MCMC-based particle filters, resample-move filters, SMC samplers, variational Bayes, etc. New smoothing methods for random finite set models have also been developed by Dan Clark. In addition we have done work in detection and tracking of dynamic group objects, using a virtual-leader formulation and adaptations of our previous SDE-based models, and with dynamic graph structures, tracked over time with SMC. The main interest here is to track formations of objects which have a common pattern of motion. The movement of the whole group is of interest rather than tracking each object separately. Groups of targets can be considered as formations of entities whose number varies over time because targets can enter a scene or disappear at random times. The groups can split, merge, be relatively near to each other or move largely independently of each other. However, it is typical for group formations to maintain some patterns of movement. Monte Carlo approaches combined with evolving random graphs for group object state and structure estimation were developed and published by the Cambridge and Lancaster groups.

The various participants have benefitted from the collaborations, as can be seen from the large number of high quality publications arising from this working group.

1. S. K. Pang, S. Godsill, J. Li, F. Septier, and S. Hill. "Sequential Inference for Dynamically Evolving Groups of Objects", a chapter in "Bayesian Time Series Models", edited by D. Barber, A.T. Cemgil and S. Chiappa, Cambridge University Press, 2010, ISBN: 9780521196765.

2. F. Septier, S. K. Pang, A. Carmi, and S. Godsill. "On MCMC-Based Particle Methods for Bayesian Filtering: Application to Multitarget Tracking" in IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP 2009), Aruba, Dutch Antilles, Dec. 2009.

3. F. Septier, A. Carmi, and S. Godsill. "Tracking of Multiple Contaminant Clouds" in Proc. Int. Conf. on Information Fusion (FUSION 2009), Seattle, USA, Jul. 2009. Winner of the third best regular paper award.

4. A. Carmi, F. Septier, and S. Godsill. "The Gaussian Mixture MCMC Particle Algorithm for Dynamic Cluster Tracking" in Proc. Int. Conf. on Information Fusion (FUSION 2009), Seattle, USA, Jul. 2009.

5. F. Septier, A. Carmi, S. K. Pang, and S. Godsill. "Multiple Object Tracking Using Evolutionary and Hybrid MCMC-Based Particle Algorithms" in 15th IFAC Symposium on System Identification (SYSID 2009), Saint-Malo, France, Jul. 2009. Invited paper.

6. F. Septier, S. K. Pang, S. Godsill, and A. Carmi. "Tracking of Coordinated Groups using Marginalised MCMC-based Particle Algorithm" in IEEE Aerospace Conference, Big Sky, Montana, Mar. 2009.

7. A. Carmi, S. Godsill, and F. Septier. "Evolutionary MCMC Particle Filtering for Target Cluster Tracking" in IEEE 13th DSP Workshop and the 5th SPE Workshop, Marco Island, Florida, Jan. 2009.

8. Variational Mean Field Approach to Efficient Multitarget Tracking. E. Fokoue. JSM Proceedings 2009.

9. Sequential Monte Carlo Smoothing with Random Finite Set Observations. D. Clark and M. Briers. JSM Proceedings 2009.

10. First-Moment Multi-Object Forward-Backward Smoothing. Daniel E. Clark. International Conference on Information Fusion, 2010.

11. Extended object filtering using spatial independent cluster processes. A. Swain, D. Clark. International Conference on Information Fusion, 2010.


12. Improved SMC implementation of the PHD filter. B. Ristic, D. Clark, B. N. Vo. International Conference on Information Fusion, 2010.

13. Performance evaluation of multi-target tracking using the OSPA metric. B. Ristic, B. N. Vo, D. Clark. International Conference on Information Fusion, 2010.

14. First-moment filters for spatial independent cluster processes. A. Swain and D. E. Clark. SPIE Defense, Security and Sensing Symposium, 2010.

15. Forward-Backward Sequential Monte Carlo Smoothing for Joint Target Detection and Tracking, D. Clark, B.-T. Vo and B.-N. Vo, IEEE Proc. 12th Annual Conf. Information Fusion, Seattle, Washington, 2009.

16. A. Gning, L. Mihaylova, S. Maskell, S. K. Pang, and S. Godsill, Group object structure and state estimation with evolving networks and Monte Carlo methods, IEEE Transactions on Signal Processing, 2010.

17. A. Gning, L. Mihaylova, S. Maskell, S. K. Pang, and S. Godsill, Ground target group structure and state estimation with particle filtering, in Proc. of International Conf. on Information Fusion, Germany, Cologne, 2008.

18. A. Gning, L. Mihaylova, S. Maskell, S. K. Pang, and S. Godsill, Evolving networks for group object motion estimation, in Proc. of IET Seminar on Target Tracking and Data Fusion: Algorithms and Applications, Birmingham, UK, 2008, pp. 99-106.

19. H. Bhaskar, L. Mihaylova, and S. Maskell, Population-based particle filters, in Proc. from the Institution of Engineering and Technology (IET) Seminar on Target Tracking and Data Fusion: Algorithms and Applications, Birmingham, UK, 2008, pp. 31-38.

20. H. Bhaskar, L. Mihaylova, S. Maskell, and S. Godsill, Evolving population Markov Chain Monte Carlo particle filtering, Journal paper (in preparation).

21. H. Bhaskar, L. Mihaylova, and A. Achim, Video foreground detection based on symmetric alpha-stable mixture models, IEEE Transactions on Systems and Circuits for Video Technology, 2010, in press.

22. D. Angelova and L. Mihaylova, Contour segmentation in 2d ultrasound medical images with particle filtering, Machine Vision and Applications Journal, 2010, in press.

23. L. Mihaylova, D. Angelova, D. Bull, and N. Canagarajah, Localisation of mobile nodes in wireless sensor networks with time correlated measurement noises, IEEE Transactions on Mobile Computing, 2010, in press.

24. L. Mihaylova and D. Angelova, Noise parameters estimation with Gibbs sampling for localisation of mobile nodes in wireless networks, in Proc. of the 13th International Conference on Information Fusion. ISIF, Edinburgh, UK, 2010.

25. A. Gning, L. Mihaylova, and F. Abdallah, Mixture of uniform probability density functions for nonlinear state estimation using interval analysis, in Proc. of 13th International Conference on Information Fusion. ISIF, Edinburgh, UK, 2010.


26. L. Mihaylova, A. Gning, V. Doychinov, and R. Boel, Parallelised gaussian mixture filtering for vehicular traffic flow estimation, in Informatik 2009, Lecture Notes in Informatics (LNI) - Proceedings Series of the Gesellschaft fur Informatik (GI), Eds. S. Fischer, E. Maehle, R. Reischuk, Vol. P-154. Germany, Luebeck, 2009, pp. 2321-2333.

27. J.A. Pocock, S.L. Dance and A.S. Lawless (2010) State Estimation using the Particle Filter with Mode Tracking, submitted to Computers and Fluids.

28. M. Hong, M. F. Bugallo, and P. M. Djuric, "Joint Model Selection and Parameter Estimation by Population Monte Carlo Simulation," to appear in Journal of Selected Topics in Signal Processing, 2009.

29. B. Shen, M. F. Bugallo, and P. M. Djuric, "Multiple marginalized population Monte Carlo," accepted in EUSIPCO 2010.

30. M. F. Bugallo and P. M. Djuric, "Target tracking by symbiotic particle filtering," Proceedings of the 2010 IEEE Aerospace Conference, Big Sky (Montana, USA), March 2010.

31. M. F. Bugallo, M. Hong, and P. M. Djuric, "Marginalized population Monte Carlo," Proceedings of the IEEE 34th International Conference on Acoustics, Speech and Signal Processing (ICASSP'2009), Taipei (Taiwan), April 2009.

32. P. M. Djuric and M. F. Bugallo, "Improved target tracking with particle filtering," Proceedings of the 2009 IEEE Aerospace Conference, Big Sky (Montana, USA), March 2009.

33. M. F. Bugallo and P. M. Djuric, "Complex systems and particle filtering," Proceedings of the Asilomar Conference on Signals, Systems, and Computers, Pacific Grove (California, USA), November 2008.

PhD Student: Viktor Rozgik Viktor Rozgic PhD Student, Advisor: Prof. ShrikanthNarayanan

Signal Analysis and Interpretation Laboratory Viterbi School of Engineering Universityof Southern California

Research and Impact to Thesis

I found information about the SAMSI opening workshop online, and since I had been using Sequential Monte Carlo (SMC) methods in my work before, I decided to attend it. I found the talks very interesting and I got involved in the work of the Tracking working group, which was a good match for the topic of my thesis, "Multimodal fusion for tracking and identification in Smart Environments". Collaboration with the group members, talks I heard in the workshops and weekly group meetings, and references and papers in progress shared over the group's webpage were very helpful for my thesis work. Besides getting a much better perspective on state-of-the-art Sequential Monte Carlo algorithms and understanding the current research directions, I had hands-on experience in implementing and testing SMC algorithms on a synthetic multi-target tracking problem. For this opportunity I am very grateful to Prof. Simon Godsill and Dr. Francois Septier. I have managed to transplant and adapt part of this work to problems of audio-visual tracking, speaker segmentation and identification in meeting scenarios. Proposed work for the final part of my thesis includes work on multitarget tracking algorithms that is not focused only on Meeting Monitoring environments, and I hope to continue collaborating with people I met during the workshop in the following period.

Subgroup 2: Biological Cell tracking (lead Chunlin Ji):

Participants include: Chunlin Ji, Simon Godsill, Daniel Clark.

This group interfaces also with the multiple target tracking sub-group, being concerned with video imaging data involving fluorescently labelled multiple cells, which move around, grow, divide, etc. They have investigated a number of approaches including point process based methods, including PHD filters, and also pixel-based likelihood functions. Datasets have already been provided by Duke researchers. Two papers are in preparation:

• C. Ji & M. West (2009) Bayesian Nonparametric Modelling for Time-varying Spatial Point Processes.

Chunlin Ji (SAMSI RA) is attached to the Tracking (Godsill) working group and participates actively in the Big data group with West on spatial dynamic modelling for biological cell tracking problems. Ji is developing SMC methods in the context of new classes of models. This research has grown out of existing work of Ji & West in static problems, now extended with new dynamic models that will form an additional part of Ji's PhD thesis research, and one initial paper is in draft at the time of this report (see manuscripts section). Ji has led discussions on this work at several Tracking working group and Big data group meetings, gave a talk at the February 2009 mid-program workshop, and will present this work at the 7th Workshop on Bayesian Nonparametrics in Turin, Italy, in June 2009, and at the 2009 Joint Statistical Meetings in Washington DC.

A grant application has gone in from Dan Clark to the UK's BBSRC Tools and Resources panel (New Investigator program) on cell tracking work.

[Bin Liu] (SAMSI RA) was attached to the Tracking (Godsill) working group.

Since Feb. 2009, I have participated in this SMC program at SAMSI as a postdoc (or research scholar). I have three papers related to this program, listed below.

1) Bin Liu, Merlise Clyde, Tom Loredo and Jim Berger. Adaptive Annealed Importance Sampling Method for Calculating Marginal Likelihood with Application to Bayesian Exoplanets Detection. We're revising the manuscript and will submit it soon.

2) Bin Liu, Chunlin Ji, Yangyang Zhang and Chengpeng Hao. Multi-target Tracking in Clutter with Sequential Monte Carlo Methods (first draft prior to coming to SAMSI). Accepted by IET Radar, Sonar & Navigation.

3) Bin Liu, Chunlin Ji, Yangyang Zhang, and Chengpeng Hao. Blending Sensor Scheduling Strategy with Particle Filter to Track a Smart Target, Wireless Sensor Networks.

All the above works are supported by Profs. Merlise Clyde and Jim Berger's Astro fund, namely, National Science Foundation of the USA under Grant No. 0507481.

Subgroup 3: Covert chemical release (lead Nathan Green):

Participants include: Nathan Green, Francois Septier, Avishy Carmi, Simon Godsill, Mark Briers, Gareth Peters.

This topic involves plume tracking and source term estimation, exploring contour tracking, cloud tracking, ABC methods, SMC samplers and other novel techniques. The subgroup has produced models and simulation code for the source term estimation problem, in which the task is to estimate the location of a covert chemical release through sequential monitoring of the pattern of the resulting chemical plume. DSTL have agreed in principle to provide LIDAR data for this problem, and this is still being negotiated at the time of this report. A number of advances have been made in the area.

For the pure cloud tracking problem (without source term estimation), we have studied the problem of sequential inference about complex evolving cloud structures from LIDAR data, presently all simulated from models. A dynamic Gaussian mixture approximation with unknown number of components is used for the cloud intensity. Some very successful results were obtained from very ambiguous thresholded data, which have impressed specialists at QinetiQ UK Ltd. and DSTL UK Ltd. A paper is submitted to the 2009 Fusion conference, and a further paper is in preparation for the SYSID conference:

• Tracking of Multiple Contaminant Clouds. Francois Septier, Avishy Carmi, Simon Godsill. Fusion 2009 (submitted).

• Multiple Object Tracking Using Evolutionary and Hybrid MCMC-Based Particle Algorithms. F. Septier, A. Carmi, S. K. Pang and S. J. Godsill. SYSID 2009.

In addition to this, work on sequential source term estimation has been undertaken using a new trans-dimensional ABC algorithm that is able for the first time to detect multiple unknown covert releases. A survey paper has been submitted already and a paper on the STEM application is also under preparation:

• S.A. Sisson, G. W. Peters, Y. Fan, and M. Briers, Likelihood-free samplers, Journal Submission, Dec 2008.

• G. W. Peters, M. Briers and ... Trans-dimensional ABC for source term estimation, In preparation.

This work has also led to an invite to the ABC workshop in Paris, June 2009. The work has attracted serious attention from the UK's DSTL defence organisation and is very likely to lead to new grant funding in the near future.

A final sub-topic in this area concerns emulation-based methods for approximation of complex source term simulation problems. A paper is in preparation:

• Emulation Based Priors for Source Term Estimation. Gentry White and Nathan Green, in preparation.

This looks into using an emulation-based approach to constructing priors for use in a sequential Monte Carlo model for source term estimation. This emulation-based approach allows for the construction of priors based on prior information from both computer models and field data. These priors offer advantages over existing priors in that they avoid degeneracy in the SMC simulation. This work draws on work from the previous SAMSI program on Development, Assessment and Utilization of Complex Computer Models, including work from the Engineering Methodology working group and the paper "Mechanism-Based Emulation of Dynamic Simulation Models: Concept and Application in Hydrology" (Reichert et al. 2009), currently under submission.

Subgroup 4: Neuron tracking (lead Joshua Vogelstein):

This topic involves tracking of multiple neuronal activity measured in living brains and involves learning of sparse connectivity matrices in continuous-time spiking environments. Topics include continuous-time spike modelling, inference for multiple (sparsely connected) neurons, parameter estimation, and image models.

The work progressed very pleasingly and the following papers were written:

• Spike Inference from Calcium Imaging Using Sequential Monte Carlo Methods, Joshua T. Vogelstein, Brendon O. Watson, Adam M. Packer, Rafael Yuste, Bruno Jedynak and Liam Paninski, Biophysical Journal, Volume 97, Issue 2, 636-655, 22 July 2009.

• A Bayesian Approach for Inferring Neuronal Connectivity from Calcium Fluorescent Imaging Data, Yuriy Mishchenko, Joshua T. Vogelstein, and Liam Paninski, to appear in the Annals of Applied Statistics.

Other works are in progress, in particular modifying the original SMC work to utilize Rao-Blackwellization. They also plan on improving the parameter estimation in the work, potentially by replacing the EM step with a fully Bayesian sampling step.

Regular participants include: Mark Briers (QinetiQ, UK), Daniel Clark (Heriot-Watt University UK), Avishy Carmi (Cambridge University UK), Julien Cornebise (SAMSI post-doc), Ernest Fokoue (Kettering University US), Simon Godsill (Cambridge University - program leader), Nathan Green (DSTL UK, long-term Fall 08 visitor), Chunlin Ji (Duke University - SAMSI Grad. student), Sze Kim Pang (Cambridge University UK), Gareth Peters (Univ. New South Wales, SAMSI Fall 08 visitor), Viktor Rozgic (University of Southern California), Francois Septier (Cambridge University, UK), Joshua Vogelstein (Johns Hopkins University), Gentry White (N.C. State University - SAMSI Post-doc), Namrata Vaswani (Iowa State University).

4.7.2 Theory

The Theory working group was led by Mark Huber (Duke). The goal was to develop and analyze algorithms arising in SMC and MCMC applications. The plan of attack was to examine several techniques from both fields, and attempt to answer questions such as: 1) when can a method from one field be used in the other, and 2) is it possible to prove something about the running time of these methods as algorithms? This second question typically reduces to questions about rate of convergence, or the variance of estimators.

Participants include Petar Djuric (Stony Brook), Jan Hannig (UNC), Jim Lynch (U. South Carolina), Jonathan Mattingly (Duke), Edsel Pena (U. South Carolina), Gareth Peters (SAMSI), Giovanni Petris (Arkansas), Clyde Schoolfield (Florida), Sarah Schott (Duke), Namrata Vaswani (Iowa State), Anand Vidyashankar (Cornell).

The theory working group has been exploring relationships between Markov chain approximations and SMC methods, with an eye towards provably good methodologies. Unfortunately, most of the existing work relies on having rapidly mixing Markov chains, at which point the use of SMC is not necessary. However, proper use of Monte Carlo samples remains a difficult issue. Therefore, we have started concentrating on a very general methodology called the "Product Estimator" for moving from samples to approximate integration. This method is very versatile and does not rely on having bounded variance of the random variables used in the Monte Carlo algorithm. Current analysis, however, relies on using a medians-of-averages approach. An approach using pure averages should converge more quickly, but this makes the analysis more difficult, requiring study of products of binomial random variables. The eventual goal is a tighter bound on the tails of these distributions, moving from what is now a constant of 16 to a value of 2.
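The medians-of-averages idea mentioned above can be illustrated with a minimal sketch. This is a generic median-of-means estimator, not the working group's Product Estimator; the function name `median_of_means`, the number of groups, and the Pareto test data are all illustrative assumptions.

```python
import random
import statistics


def median_of_means(samples, n_groups):
    """Split samples into n_groups equal blocks, average each block,
    and return the median of the block averages.

    Hypothetical helper for illustration only. Trailing samples that do
    not fill a complete block are ignored.
    """
    if n_groups < 1 or n_groups > len(samples):
        raise ValueError("n_groups must be between 1 and len(samples)")
    block = len(samples) // n_groups
    means = [
        statistics.fmean(samples[i * block:(i + 1) * block])
        for i in range(n_groups)
    ]
    return statistics.median(means)


if __name__ == "__main__":
    rng = random.Random(0)
    # Heavy-tailed draws: Pareto with shape 2.1, whose true mean is 2.1 / 1.1.
    draws = [rng.paretovariate(2.1) for _ in range(9000)]
    print(median_of_means(draws, n_groups=9))
```

Taking the median of block averages limits the influence of a few extreme draws, which is why analyses of this kind are often easier than those for a single pure average.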

Graduate student: Sarah Schott.

Since our working group lacked a postdoc, Sarah organized the meetings and maintained our web page. On the research side, she worked on the product estimator problem described above, beginning with simulation studies and currently working to extend large deviations inequalities for binomials from sums to products.

Impact on research

Initially Mark Huber introduced the product estimator as a side algorithm, and a participant in the working group asked about the tightness of the constant. This raised an interesting point, and as Sarah and Huber studied the problem further, it proved a far deeper question than at first realized. In addition to this research avenue, the participants have learned much about SMC methodology over the course of the program, and still hope to utilize some of these methods in improving perfect simulation algorithms (the focus of my research program).

4.7.3 Population Monte Carlo

The population Monte Carlo working group was led by Arnaud Doucet and Julien Cornebise.

Following up the discussions at the kick-off workshop in September 2008, the Population Monte Carlo working group was created to demonstrate the potential of Sequential Monte Carlo (SMC) methods for general stochastic computation problems. Although most of the current work on SMC addresses on-line inference problems, the objective of this group is to focus on the development of SMC and its variants to address problems where Markov chain Monte Carlo (MCMC) methods are traditionally used. Standard MCMC methods are typically inefficient for multimodal target distributions, and the objective of this group is to develop powerful particle alternatives.

We have worked on five specific topics plus a side-collaboration on a connected matter.

Subgroup 1: Adaptive SMC samplers and Population Monte Carlo algorithms

SMC samplers and Population Monte Carlo form a general methodology which can be used as an alternative to MCMC methods. However, it requires specifying a cooling schedule and some proposal distributions. We have developed several methods to automate the tuning of these key parameters. We proposed a new method which allows us to compute on-the-fly a relevant cooling schedule. The resulting algorithms have been used to solve Approximate Bayesian Computation problems (in conjunction with Subgroup 3), and to perform inference in stochastic volatility models. We have developed methodologies to design automatically the parameters of the proposal distributions, both for SMC Samplers and for Population Monte Carlo variants. We also proposed algorithms that use Rao-Blackwellisation of the key steps of PMC-related widespread algorithms to achieve variance reduction of the final estimates.
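As a rough illustration of what computing a cooling schedule on the fly can look like, the sketch below picks the next inverse temperature by bisection so that the effective sample size (ESS) of the incremental weights stays at a target fraction of the particle count. This is one common heuristic and is an assumption here, not necessarily the method the subgroup proposed. The names `ess`, `next_temperature` and `target_frac` are hypothetical, the particles are assumed equally weighted before the step, and the tempered target is assumed proportional to prior times likelihood raised to the power beta.

```python
import math


def ess(log_w):
    """Effective sample size of a list of unnormalised log-weights."""
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    s = sum(w)
    return s * s / sum(x * x for x in w)


def next_temperature(log_lik, beta, target_frac=0.5, tol=1e-8):
    """Largest beta' in (beta, 1] whose incremental weights
    exp((beta' - beta) * log_lik[i]) keep ESS >= target_frac * N.

    Assumes the ESS decreases as the step beta' - beta grows, so that
    bisection finds the crossing point.
    """
    n = len(log_lik)
    target = target_frac * n

    def ess_at(b):
        return ess([(b - beta) * ll for ll in log_lik])

    # If jumping straight to the final target is acceptable, do so.
    if ess_at(1.0) >= target:
        return 1.0
    lo, hi = beta, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ess_at(mid) >= target:
            lo = mid
        else:
            hi = mid
    return lo
```

In a full sampler this function would be called once per iteration, followed by reweighting, resampling and a move step, until the returned value reaches 1.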

1. (Subgroup 1 and 4) Adaptivity for ABC algorithms: the ABC-PMC scheme. Beaumont, M., Robert, C.P., Marin, J.-M. and Cornuet, J.M., Biometrika 96(4), 983-990 (2009)

2. On variance stabilisation by double Rao-Blackwellisation. Iacobucci, A., Marin, J.-M., and Robert, C.P., Computational Statistics and Data Analysis 54, 698-710. (2009)

3. Adaptive Sequential Monte Carlo algorithms, J. Cornebise, Ph.D. thesis, University Pierre et Marie Curie Paris 6 (2009)

4. Chain Ladder Method: Bayesian Bootstrap versus Classical Bootstrap. Gareth Peters, Mario Wuthrich, Pavel Shevchenko - Insurance: Mathematics and Economics, to appear (2010)

5. (Subgroup 1 and 4) An adaptive SMC method for approximate Bayesian computation. Pierre Del Moral, Arnaud Doucet and Ajay Jasra, submitted January 2009.

6. Inference in Levy-driven stochastic volatility models. Ajay Jasra, Dave Stephens and Arnaud Doucet, Scandinavian Journal of Statistics, 2010.

7. A vanilla Rao-Blackwellisation of Metropolis-Hastings algorithms. Douc, R. and Robert, C.P., Annals of Statistics, 2010.

8. Adaptive Multiple Importance Sampling, Cornuet, J.M., Marin, J.-M., Mira, A. and Robert, C.P., available as arXiv:0907.1254, submitted (2009)

9. An adaptive sequential Monte Carlo sampler. Paul Fearnhead and Benjamin M Taylor, submitted May 2010.

10. (Subgroup 1 and 4) Auxiliary SMC samplers with applications to Partial Rejection Control and Approximate Bayesian Computation. J. Cornebise, G. Peters, O. Ratmann, in preparation.

11. (Subgroup 1 and 3) Adaptation strategies for Particle Markov Chain Monte Carlo. J. Cornebise, G. Peters, in preparation.

Subgroup 2: SMC samplers for Normalizing Constant Calculations

It is possible to use SMC to compute normalizing constants of high-dimensional distributions. In physics this strategy is known as Jarzynski's equality. Recently an alternative method known as nested sampling has appeared in the literature. This method enjoys several advantages compared to standard techniques. However, it remains inefficient when applied to multimodal distributions. We are currently studying an adaptive SMC version of nested sampling. Our preliminary results indicate that this new method significantly outperforms the original nested sampling algorithm in complex scenarios. We are currently investigating the theoretical properties of the resulting estimate.

1. Particle nested sampling, Arnaud Doucet and Christian P. Robert, in preparation.

Subgroup 3: Particle Markov chain Monte Carlo

Particle MCMC is a new class of methods which allows us to use SMC proposals within MCMC algorithms (Andrieu, Doucet & Holenstein, 2010). There are several open questions to address, such as selecting the optimal trade-off between the number of MCMC iterations and the number of particles, or how to select adaptively the number of particles as a function of the current parameter value so as to ensure that the variance of the marginal likelihood estimate is below a given threshold. We are currently studying theoretically the performance of these algorithms so as to identify the optimal trade-off; our study relies on new sharp convergence results for SMC estimates of normalizing constants. We have also proposed some extensions of Particle MCMC methods which allow us to solve optimization problems. These extensions rely on new combinatorial identities for SMC schemes.

1. Particle Markov chain Monte Carlo methods. C. Andrieu, A. Doucet and R. Holenstein, Journal of the Royal Statistical Society B (with discussion) (2010).

2. Comment on "Particle Markov Chain Monte Carlo". Julien Cornebise and Gareth Peters (SAMSI technical Report), merged version of two comments sent to JRSSB (2010)

3. Ecological non-linear state space model selection via adaptive particle Markov chain Monte Carlo (AdPMCMC). Gareth Peters, Keith Hayes, Geoff Hossack (2010)

4. Channel Tracking for Relay Networks via Adaptive Particle MCMC. Ido Nevat, Gareth Peters, Arnaud Doucet and Jinhong Yuan (2010)

5. (Subgroup 1 and 3) J. Cornebise, G. Peters, Adaptation strategies for Particle Markov Chain Monte Carlo, in preparation.

Subgroup 4: Likelihood-free inference and Approximate Bayesian Computation (ABC)

Likelihood-free inference is a rising challenge in Statistics, for models where the likelihood of the observations is either intractable or very expensive to compute. Approximate Bayesian Computation (ABC) algorithms replace the likelihood of the observations by a distance, on a space of summary statistics, between Monte-Carlo-based simulated observations and the data. Our subgroup developed the use of SMC Samplers as the cornerstone of the MC part of ABC algorithms in lieu of crude MC or of MCMC, using the cooling schedule of SMC samplers as a series of progressive thresholds on the distance of the summary statistics, thus fighting the depletion of the sample and dramatically increasing the computational efficiency. We also brought those ABC algorithms to numerous models and applied problems where classical likelihood-based methods cannot be used, especially in financial engineering, telecommunication relay systems, and model choice for spatially correlated data.
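The idea of using a decreasing sequence of distance thresholds can be sketched with a toy ABC population Monte Carlo loop in the spirit of the ABC-PMC scheme cited in this section. It is a simplified illustration under stated assumptions, not the subgroup's implementation: the scalar parameter, the fixed Gaussian kernel width `kernel_sd`, the absolute-difference distance, and the names `abc_pmc` and `normal_pdf` are all hypothetical choices.

```python
import math
import random


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def abc_pmc(observed_stat, simulate_stat, prior_sample, prior_pdf,
            thresholds, n_particles, kernel_sd, rng):
    """Toy ABC population Monte Carlo with decreasing thresholds."""
    # Stage 0: plain rejection ABC from the prior at the loosest threshold.
    particles, weights = [], []
    while len(particles) < n_particles:
        theta = prior_sample(rng)
        if abs(simulate_stat(theta, rng) - observed_stat) <= thresholds[0]:
            particles.append(theta)
            weights.append(1.0 / n_particles)

    for eps in thresholds[1:]:
        new_particles, new_weights = [], []
        while len(new_particles) < n_particles:
            # Resample a particle and move it with a Gaussian kernel.
            base = rng.choices(particles, weights=weights)[0]
            theta = rng.gauss(base, kernel_sd)
            if prior_pdf(theta) == 0.0:
                continue
            # Keep the proposal only if its simulated summary is close enough.
            if abs(simulate_stat(theta, rng) - observed_stat) > eps:
                continue
            # Importance weight: prior density over mixture proposal density.
            mix = sum(w * normal_pdf(theta, p, kernel_sd)
                      for w, p in zip(weights, particles))
            new_particles.append(theta)
            new_weights.append(prior_pdf(theta) / mix)
        total = sum(new_weights)
        particles = new_particles
        weights = [w / total for w in new_weights]
    return particles, weights
```

Each stage starts from the previous population rather than from the prior, so the acceptance rate at a tight threshold stays far higher than it would under plain rejection sampling.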

1. Design Efficiency for "likelihood free" Sequential Monte Carlo samplers. Gareth Peters, Scott Sisson, Yanan Fan (2008)

2. (Subgroup 1 and 4) Beaumont, M., Robert, C.P., Marin, J.-M. and Cornuet, J.M. Adaptivity for ABC algorithms: the ABC-PMC scheme. Biometrika 96(4), 983-990.

3. On Sequential Monte Carlo, Partial Rejection Control and Approximate Bayesian Computation. Gareth Peters, Yanan Fan, Scott Sisson (2009)

4. Likelihood-Free Bayesian Inference for Alpha-Stable Models. Gareth Peters, Scott Sisson, Yanan Fan (2009)

5. Likelihood-free methods for model choice in Gibbs random fields. Grelaud, A., Marin, J.-M., Robert, C.P., Rodolphe, F. and Tally, F., Bayesian Analysis, 3(2), 427-442 (2009)

6. Bayesian Symbol Detection for Relay Systems via Likelihood-free Inference. Gareth Peters, Ido Nevat, Scott Sisson, Yanan Fan and Jinhong Yuan. IEEE Transactions on Signal Processing, to appear (2009)

7. Likelihood-free Samplers. Scott Sisson, Gareth Peters, Yanan Fan, Mark Briers, to appear (2010)

8. (Subgroup 1 and 4) An adaptive SMC method for approximate Bayesian computation. Pierre Del Moral, Arnaud Doucet and Ajay Jasra, submitted January 2009.

9. Filtering with Approximate Bayesian Computation. J. Cornebise, G. Peters, in preparation

10. (Subgroup 1 and 4) Auxiliary SMC samplers with applications to Partial Rejection Control and Approximate Bayesian Computation. J. Cornebise, G. Peters, O. Ratmann, in preparation

Subgroup 5: Applications of PMC to cosmology

Some of the participants of the SAMSI program were involved in an ongoing partnership with the Institut d'Astrophysique de Paris, which gave rise to innovative applications of Population Monte Carlo to cosmology-related problems. They exploited the easy parallelization and higher computational efficiency of PMC to carry out Bayesian inference of the parameters for data consisting of CMB anisotropies, supernovae of type Ia, and weak cosmological lensing, and compared the results with those obtained using state-of-the-art Markov chain Monte Carlo (MCMC), reducing the computation time from days to hours using PMC on a cluster of processors.

In a second work, PMC was used to compare cosmological models in the context of dark energy and primordial perturbations: cosmology has spawned a multitude of different models, which the Bayesian paradigm compares directly through the posterior probabilities in favour of each of the given models. From a practical viewpoint, especially for high-dimensional parameter spaces, the calculation of the evidence is very challenging. While fast approximations exist, such as the Bayesian Information Criterion or variational Bayes, they can fail dramatically for posterior distributions which are not well approximated by a multivariate Gaussian. They used the PMC method to estimate the Bayesian evidence and to assess the accuracy and reliability of this estimate, using the same set of sampled values for parameter estimation and for calculating the Bayesian evidence. Thus with the PMC method, model selection comes at the same computational cost as parameter estimation.
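The point that one weighted sample serves both parameter estimation and evidence calculation can be shown with a minimal importance sampling sketch. It uses a single fixed Gaussian proposal rather than an adapted PMC proposal, and a toy conjugate model (prior N(0, 1), one observation y with likelihood N(theta, 1)) chosen only because its evidence, N(y; 0, 2), is known in closed form. The function names and proposal settings are assumptions for illustration.

```python
import math
import random


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def evidence_and_posterior_mean(y, n_draws, rng, prop_mean=0.5, prop_sd=1.0):
    """Estimate the evidence and the posterior mean from one weighted sample."""
    draws = [rng.gauss(prop_mean, prop_sd) for _ in range(n_draws)]
    # Unnormalised importance weights: prior * likelihood / proposal.
    weights = [
        normal_pdf(t, 0.0, 1.0) * normal_pdf(y, t, 1.0)
        / normal_pdf(t, prop_mean, prop_sd)
        for t in draws
    ]
    # The average weight estimates the evidence (marginal likelihood).
    evidence = sum(weights) / n_draws
    # The self-normalised weighted average estimates the posterior mean.
    post_mean = sum(w * t for w, t in zip(weights, draws)) / sum(weights)
    return evidence, post_mean


if __name__ == "__main__":
    z_hat, m_hat = evidence_and_posterior_mean(1.0, 20000, random.Random(0))
    print(z_hat, normal_pdf(1.0, 0.0, math.sqrt(2.0)))  # estimate vs exact
    print(m_hat)  # exact posterior mean is y / 2 = 0.5
```

No extra likelihood evaluations are needed for the evidence: the weights computed for the posterior summaries are simply averaged.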

1. Estimation of cosmological parameters using adaptive importance sampling. Wraith, D., Kilbinger, M., Benabed, K., Cappé, O., Cardoso, J.-F., Fort, G., Prunet, S., Robert, C.P. Physical Review D, 80, 023502 (2009)

2. Kilbinger, M., Wraith, D., Robert, C.P., Benabed, K., Cappé, O., Cardoso, J.-F., Fort, G., Prunet, S., Bouchet, F. Bayesian model comparison in cosmology with population Monte Carlo. Available as arXiv:0912.1614, submitted (2009)

Subgroup 6: Stability of Mixture Kalman Filters

A collaboration developed during the stay at SAMSI of two members of the PMC working group on a neighboring topic. This stay at SAMSI allowed Sylvain Rubenthaler to work with Nicolas Chopin (CREST-ENSAE Paris) on a problem he had been considering for a long time: the stability of the so-called "mixture Kalman filters". They wrote an article on the stability of particle algorithms with a Feynman-Kac representation such that the potential function may be expressed as a recursive function which depends on the complete state trajectory. This includes the mixture Kalman filter. They studied the asymptotic stability of such particle algorithms as time goes to infinity. Sylvain Rubenthaler also started an ongoing collaboration with Amarjit Budhiraja (UNC Chapel Hill) on reflected diffusion, more precisely: the numerical approximation of the stationary law of a reflected diffusion by means.

1. Stability of Feynman-Kac formulae with path-dependent potentials. Chopin, Nicolas and Del Moral, Pierre and Rubenthaler, Sylvain, to appear in Stochastic Processes and their Applications (2010)

4.7.4 Particle Learning

The particle learning working group was led by Hedibert Lopes.

Key points

• PL as a flexible framework for sequential Bayesian computation, comparable and complementary to MCMC schemes;

• Empirical evidence of the superiority (in terms of MC error) of resample-sample filters (APF) over sample-resample filters (SIS);

• Hybrid filters that combine the auxiliary particle filter (APF), Storvik's filter and PL for state and parameter learning.

Scientific papers

Accepted papers

1. Particle learning and smoothing. Statistical Science (to appear). By Carlos Carvalho, Michael Johannes, Hedibert Lopes and Nicholas Polson.

2. Particle learning for sequential Bayesian computation (with discussion). Bayesian Statistics 9 (to appear). By Hedibert Lopes, Carlos Carvalho, Michael Johannes and Nicholas Polson.

3. Particle filters and Bayesian inference in financial econometrics. Journal of Forecasting (to appear). By Hedibert Lopes and Ruey Tsay.

4. Bayesian inference for stochastic volatility modeling. In Bocker, K., editor, Rethinking Risk Measurement, Management and Reporting: Bayesian Analysis and Expert Elicitation (to appear). By Hedibert Lopes and Nicholas Polson.

5. Bayesian computation in finance. In Chen, M.-H., Dey, D., Muller, P., Sun, D. and Ye, K., editors, Frontiers of Statistical Decision Making and Bayesian Analysis: In Honor of James O. Berger (to appear). By Satadru Hore, Michael Johannes, Hedibert Lopes, Robert McCulloch and Nicholas Polson.

6. Extracting SP500 and NASDAQ volatility: The credit crisis of 2007-2008. In O'Hagan, T. and West, M., editors, Handbook of Applied Bayesian Analysis. By Hedibert Lopes and Nicholas Polson.

Submitted papers

7. Tracking flu epidemics using Google trends and particle learning. By Vanja Dukic, Hedibert Lopes and Nicholas Polson.

8. Sequential parameter learning and filtering in structured AR models. By Raquel Prado and Hedibert Lopes.

9. Particle learning for general mixtures. By Carlos Carvalho, Hedibert Lopes, Nicholas Polson and Matt Taddy.

10. Bayesian statistics with a smile: a resampling-sampling perspective. By Nicholas Polson, Hedibert Lopes and Carlos Carvalho.

11. Sequential parameter estimation in stochastic volatility models. By Maria Ríos and Hedibert Lopes.

Work in progress

12. Particle learning for generalized dynamic conditionally linear models. By Carlos Carvalho, Hedibert Lopes and Nicholas Polson. (75% finished).

13. Particle learning for fat-tailed distributions. By Hedibert Lopes and Nicholas Polson. (75% finished).

14. Sequential Monte Carlo estimation of DSGE models. By Hao Chen, Francesca Petralia and Hedibert Lopes. (75% finished).

15. Learning in a regime-switching macro-finance Nelson-Siegel model. By Bruno Lund and Hedibert Lopes. (75% finished).

16. Learning expected inflation from nominal and real yields. By Satadru Hore, Hedibert Lopes and Juha Seppala. (50% finished).

17. Sequential Monte Carlo methods for long memory stochastic volatility models. By Christian Macaro and Hedibert Lopes. (50% finished).

Books

Two books published by Working Group members during the 2008-2010 period contain chapters with detailed explanation, literature review, examples and comparisons of sequential Monte Carlo filters.

Raquel Prado and Mike West (2010) Time Series: Modelling, Computation and Inference. Boca Raton: Chapman & Hall/CRC Press.

Giovanni Petris, Sonia Petrone and Patrizia Campagnoli (2009) Dynamic Linear Models with R. New York: Springer.

PhD Students

Francesca Petralia and Hao Chen

Francesca (Department of Statistical Sciences, Duke University), Hao (Fuqua School of Business, Duke University) and I are finishing the paper Sequential Monte Carlo estimation of dynamic stochastic general equilibrium models. The paper's novelty is the proposal of a full SMC scheme to estimate non-linear DSGE models in an on-line fashion, which allows simultaneous filtering of states and parameters. We apply our filter on a neoclassical growth model, where structural parameters are sequentially estimated compared to currently used MCMC schemes. In addition, we illustrate sequential model checking/assessment via SMC by comparing the Smets and Wouters DSGE model for the Euro area with several Bayesian vector autoregressive (BVAR) models. Hao presented the paper during the 2009 JSM meetings in Washington, D.C. and during the 2010 Seminar on Bayesian Inference in Econometrics and Statistics (SBIES) in Austin, Texas.

Bruno Lund

December 2009 PhD Thesis on Term structure models with non-affine dynamics and macro-variables, Graduate School of Economics, Getulio Vargas Foundation, Rio de Janeiro. I was Bruno's co-advisor along with Caio Almeida. Bruno spent the 2008-2009 academic year working under my supervision as a visiting PhD student at Chicago Booth. We have just finished the paper Learning in a regime-switching macro-finance Nelson-Siegel model, which studies the US post-WW2 joint behavior of macro-variables and the yield curve and extends previous work by incorporating the possibility of regime changes in order to track NBER cycles. The paper contributes to the sequential Monte Carlo estimation literature by providing an interesting context for the combination of our particle learning (PL) algorithm with Liu and West's (2001) filter.

Maria Paula Rios

Maria is working under my supervision in the Statistics and Econometrics group at the University of Chicago Booth School of Business. We have just finished the paper Sequential parameter estimation in stochastic volatility models, which combines Liu and West's (2001) and Storvik's (2001) particle filters for the class of Markov switching stochastic volatility (MSSV) models (Carvalho and Lopes, 2007). We show that Carvalho and Lopes' (2007) particle filter degenerates or has larger Monte Carlo error than our extension. Our filter takes advantage of recursive sufficient statistics that are sequentially tracked and whose behavior resembles that of a latent state with conditionally deterministic updates. The performance of our filter is also assessed by comparing its sequential estimation of SP500 volatilities to the VIX index, which is the CBOE volatility index.

Post-doctoral researchers

Christian Macaro (DSS, Duke University) and I are working on the paper Sequential Monte Carlo Methods for Long Memory Stochastic Volatility Models, which proposes to implement an alternative representation based on the aggregation of first order autoregressive models. The resulting framework is particularly suitable for the implementation of SMC methods, since only the marginal posterior distribution of the particles is required. Comparisons with alternative truncation representations are also provided. An application emphasizes the benefits of this proposal when dealing with real time estimation of volatility of financial high frequency data.

4.7.5 Model Assessment and Adaptive Design

The working group leader was Carlos Carvalho.

Summary Goals and Outcomes

Following up the discussions in the kick-off workshop (September 2008), the "Model Assessment and Adaptive Design" (MAAD) working group was formed with the intent to enhance, explore and demonstrate the potential of particle-based methods to address issues related to model uncertainty and sequential design/decision making. The group focuses on applications (listed below) where either the computation of model probabilities or the exploration of model spaces represents an enormous challenge that requires effective computation strategies. The central goal of our efforts is to make use of "state of the art" SMC techniques in trying to tackle these issues.

Since its formation, the group has met weekly at SAMSI for discussions of relevant issues and to report on progress made by many of the participants.

Specific Goals and Areas of Focus

At the current stage the group has identified four main areas of focus, as described below:

1. Particle Model Selection

Our goal is to develop a general class of particle methods to accommodate uncertainty in variable selection in high-dimensional settings. There is a rich Bayesian literature on variable selection and stochastic search methods for linear regression models, but very little work has been done for nonparametric models that allow the conditional distribution of a response to change flexibly with predictors. Our initial plan was to develop an efficient particle stochastic search (PSS) approach for high-dimensional variable selection in linear regression, while simultaneously developing a Particle Learning algorithm for posterior computation in probit stick-breaking processes (PSBPs). PSBPs are a recently proposed nonparametric Bayes modeling framework, which allows conditional distributions to change flexibly with predictors. Due to the conjugacy of the PSBP after data augmentation, it should be possible to adapt the Particle Learning algorithm to include a PSS component. This will allow selection of variables having any impact on the conditional distribution of a response, while also accommodating responses having arbitrary scales (continuous, categorical, count, etc).

An additional topic that the group will focus on is the development of efficient particle methods for calculating Bayes factors for comparing non-nested models. The idea is to initially devote a similar number of particles to each model, but then, through resampling as the algorithm progresses, devote increasing numbers of particles to the better models in the list. This will allow accurate posterior computation and estimation of marginal likelihoods for good models, while not wasting computational effort on poor models.

We have made substantial progress in the above areas. Here are specifics of each project:

• “Particle stochastic search for high-dimensional variable selection” (Shi and Dunson) - we have continued to make progress in refining our particle stochastic search (PSS) algorithm and have compared the algorithm in a variety of settings to shotgun stochastic search (SSS). We also have results comparing to SSVS for simulated examples and a real data application taken from the Hans et al. SSS paper. The paper with the above title is in the final preparation stage and will be submitted within a few weeks. We will then move our focus to variable/model selection in nonparametric Bayes regression models, adapting PSS to allow for the inclusion of parameters/latent variables common to the different models in the particles. This will allow variable selection in PSBP mixture models and other interesting cases. We plan to apply this in dynamic mixture model settings as well.

• “Bayesian distribution regression via augmented particle learning” (Dunson and Das) - we have continued to make progress in developing and implementing an efficient sequential Monte Carlo algorithm for posterior computation and marginal likelihood estimation in a broad class of mixture models that allow the mixing weights to vary with time, space and predictors. This class of mixture models is referred to as probit stick-breaking mixtures and has the appealing property of facilitating efficient computation through a data augmentation strategy. In particular, for many useful special cases, one can obtain the marginal likelihood in closed form, integrating out all of the parameters but conditioning on latent normal variables. Our proposed “augmented particle learning” (APL) algorithm proceeds by sequentially adding subjects in parallel to each of a large number of particles, sampling from the conditional posterior distributions of the latent variables as subjects are added and resampling appropriately. The method avoids the need for sequential importance sampling for updating of particles, instead relying on direct sampling, with marginalization used to improve efficiency. We have primarily written code for count regression models, which allow the conditional distribution of a count response to change flexibly with a predictor, and have already obtained good results for a mixture of Poissons case with no predictors. A manuscript with the initial results will be submitted within a few weeks. An abstract follows:

To limit assumptions in modeling of conditional response distributions, hierarchical mixtures-of-experts models allow the mixing weights in a regression model to vary flexibly with predictors. Nonparametric Bayes methods can be used to incorporate infinitely many components, allowing effective model dimension to increase with sample size. However, MCMC algorithms for posterior computation often encounter mixing problems due to multimodality of the posterior. Focusing on a broad class of probit stick-breaking process priors for conditional response distributions indexed by time, space or predictors, we propose an efficient augmented particle filter for posterior computation and approximation of marginal likelihoods. The algorithm sequentially updates random length latent normal vectors within each particle as subjects are added, avoiding truncation of the infinite collection of random measures. Through marginalization after data augmentation, the approach bypasses the need to update parameters, dramatically improving efficiency while avoiding degeneracies. The method can be applied broadly for continuous, count or categorical response variables. The methods are illustrated using simulated examples and an epidemiologic application.
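The marginalization idea in the abstract can be illustrated with a much simpler model than the one in the manuscript. The sketch below is not the APL algorithm: it assumes a Chinese restaurant process prior on allocations in place of the probit stick-breaking process, a mixture of Poissons with no predictors, and Gamma(a, b) priors on the component rates. The function names and the defaults for `alpha`, `a` and `b` are illustrative assumptions. Because the rates are integrated out, each particle stores only per-component counts and sums, and the product of average predictive weights gives a marginal likelihood estimate.

```python
import math
import random


def log_nb_predictive(y, shape, rate):
    """Log predictive of a Poisson count y when the rate is Gamma(shape, rate)."""
    return (math.lgamma(shape + y) - math.lgamma(shape) - math.lgamma(y + 1)
            + shape * math.log(rate / (rate + 1.0))
            - y * math.log(rate + 1.0))


def component_weights(y, particle, alpha, a, b):
    """Predictive weight of y for each existing component, plus a new one."""
    counts, sums = particle
    n = sum(counts)
    weights = [c / (alpha + n) * math.exp(log_nb_predictive(y, a + s, b + c))
               for c, s in zip(counts, sums)]
    weights.append(alpha / (alpha + n) * math.exp(log_nb_predictive(y, a, b)))
    return weights


def particle_learning_poisson(data, n_particles=200, alpha=1.0, a=1.0, b=0.5, seed=0):
    """Resample-then-propagate filter; returns particles and log marginal likelihood."""
    rng = random.Random(seed)
    particles = [([], []) for _ in range(n_particles)]  # (counts, sums) per particle
    log_marginal = 0.0
    for y in data:
        comp = [component_weights(y, p, alpha, a, b) for p in particles]
        pred = [sum(w) for w in comp]  # predictive density of y under each particle
        log_marginal += math.log(sum(pred) / n_particles)
        # Resample particles in proportion to how well they predict y.
        idx = rng.choices(range(n_particles), weights=pred, k=n_particles)
        new_particles = []
        for i in idx:
            counts, sums = list(particles[i][0]), list(particles[i][1])
            # Sample the allocation of y directly from its conditional posterior.
            k = rng.choices(range(len(comp[i])), weights=comp[i], k=1)[0]
            if k == len(counts):  # open a new component
                counts.append(0)
                sums.append(0)
            counts[k] += 1
            sums[k] += y
            new_particles.append((counts, sums))
        particles = new_particles
    return particles, log_marginal
```

Calling `particle_learning_poisson([1, 0, 2, 1, 20, 22, 19])` returns the final particle set together with the estimated log marginal likelihood, which could be compared across competing prior settings.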

Primary subgroup participants: David Dunson (Duke Univ.), Minghui Shi (PhD student, Duke Univ.), Sourish Das (SAMSI and Duke Univ.) and Artin Armagan (Duke Univ.).
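The particle-allocation idea for comparing models described earlier in this subsection can be sketched on a toy problem. The example below is an assumption-laden illustration, not the group's method: it compares two hypothetical Gaussian models with unit observation variance (model 0 fixes the mean at 0; model 1 places a N(0, 10) prior on the mean), starts with an equal share of particles on each, and lets resampling shift particles towards the model with better one-step-ahead predictions.

```python
import math
import random


def normal_logpdf(y, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def predictive_logpdf(model, n, total, y):
    """One-step-ahead predictive for y with unit observation variance.

    model 0: mean fixed at 0.  model 1: mean ~ N(0, 10), updated conjugately
    from the n earlier observations whose sum is `total`.
    """
    if model == 0:
        return normal_logpdf(y, 0.0, 1.0)
    post_var = 1.0 / (1.0 / 10.0 + n)
    post_mean = post_var * total
    return normal_logpdf(y, post_mean, post_var + 1.0)


def allocate_particles(data, n_particles=1000, seed=0):
    """Return the share of particles assigned to model 1 after each observation."""
    rng = random.Random(seed)
    models = [i % 2 for i in range(n_particles)]  # equal initial allocation
    history, total = [], 0.0
    for n, y in enumerate(data):
        logp = [predictive_logpdf(m, n, total, y) for m in (0, 1)]
        weights = [math.exp(logp[m]) for m in models]
        # Resampling moves particles towards the better-predicting model.
        models = rng.choices(models, weights=weights, k=n_particles)
        total += y
        history.append(sum(models) / n_particles)
    return history
```

One limitation of this bare version is that a model whose particles die out can never return; a practical scheme would need some safeguard against that.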


2. Adaptive Design

For expensive data, such as those arising from computer models, astronomy, destructive experiments, etc., careful designs that consider how many data points will be obtained, where and when, are mandatory. Design problems for such expensive experiments almost unavoidably have to be sequential and adaptive, so as to make the best use of very scarce and expensive information. Sequential decision problems (of which sequential designs are particular cases) involve “look ahead” computations for all possible future observations, which might be computationally challenging for complex models. We intend to explore SMC methods to help with these computations.

An initial paper is under way as detailed below:

• “Adaptive sampling for Bayesian variable selection” (Fei Liu, Fan Li and Dunson) - the problem is to sequentially select subjects based on their predictor values, with the response value obtained for the selected subjects and the objective being optimal performance in model selection. We have the details of the methods worked out and Fei Liu has implemented a couple of simple examples where she demonstrates substantial advantages relative to selecting the subjects in a random order. We have discussed strategies for proving improvements theoretically under the assumption that the number of subjects in the pool to draw from is large, so that we can avoid finite population sampling complications. Fan has found an interesting data example to motivate the approach and the paper should be completed in a month or two depending on Fei’s time. Fei and David have discussed moving on to an “active transfer learning” problem in which there are multiple related regression models and one wants to borrow information in selection of models across the related models.

Primary subgroup participants: Susie Bayarri (Univ. of Valencia and SAMSI), Jim Berger (SAMSI and Duke Univ.), Merlise Clyde (Duke Univ.), Tom Loredo (Cornell Univ.), Ana Corberan (PhD student, Univ. of Valencia), Fei Liu (Univ. of Missouri) and Fan Li (Duke Univ.).

3. Sequential Model Monitoring

In this subgroup we focus on problems of sequential model reassessment and model space exploration as new observations become available. The examples we have been developing so far involve sequential posterior inference about graphical structures underlying the covariance matrix of innovations in dynamic linear models (DLMs). These models have been applied in large scale sequential portfolio allocation, where the graphs provide a regularization tool for the covariance matrix of assets. The development of sequential model selection procedures that address uncertainty about graphs while allowing for on-line updates is an open research area and one of key importance in further applications of DLMs in real forecasting problems. In our first attempt to solve this problem, we have been using particle systems as discrete approximations for the posterior distribution of models. Hao Wang has been coding some of the ideas discussed and the initial results are promising. We have made significant progress in this area and a first draft of a paper by Hao Wang, Craig Reeson and Carlos Carvalho is ready and should be submitted before the summer. The title and abstract follow:


• “Sequential Learning in Dynamic Graphical Models” – We propose a natural generalization of the dynamic matrix-variate graphical model (Carvalho and West 2007) to time varying graphs. The generalization uses the multi-process modelling idea to introduce sequential graphical model selection procedures that address uncertainty about graphs while allowing for efficient on-line updates. To develop an efficient Bayesian approach for sequentially searching high-dimensional graphical models, we describe a feature-inclusion particle stochastic search algorithm, or FIPSS. The FIPSS algorithm allows parallel exploration of the search space using estimates of edge inclusion probabilities. The model is illustrated using financial time series for predictive portfolio analysis.

Primary subgroup participants: Carlos M. Carvalho (Univ. of Chicago and SAMSI), Hao Wang (PhD student, Duke Univ.) and Craig Reeson (Undergraduate student, Duke Univ.).

4. Dynamic Control

The main objective of this subgroup is to study problems that have a dynamic (sequential) decision making component as well as some uncertainty about the system parameters that would require Bayesian updating. We are particularly interested in problems that arise in health care settings where a decision maker (a doctor/nurse, emergency response officer, hospital management, etc.) will have to make decisions regarding the treatment options of patients or the allocation of scarce resources to a group of patients. These decisions are dynamic in nature as the conditions of patients change with time. Such problems have been studied commonly in the Operations Research literature. However, almost all of the earlier studies assume that the decision maker has complete information about the states of patients and the system parameters. There are several situations where such an assumption of perfect information may not be realistic. For example, for rarely observed diseases or disasters that involve nuclear agents, there does not exist sufficient data to estimate parameters that are needed in solving the dynamic control problem. Our objective is to study such dynamic control problems where the decision maker will learn about the disease or the emergency event under consideration as the decisions are made sequentially.

As an initial step, we will consider the following problem. Consider a system with several patients in need of care from a single resource (a doctor or an operating room). The patients are affected by the same disease or the same traumatic event but they could be in different stages of criticality. The stage that a patient is in may affect the cost of keeping that patient waiting, the service requirement of that patient, and also the success probability of the operation. The decision maker cannot observe the true states of patients but can observe certain signals that the patients send (for example, heart rate, blood pressure, etc.). Based on these signals, the decision maker decides which patient should be taken into service next with the objective of maximizing the total expected utility. As we mentioned earlier, the decision maker does not know exactly how the signals and the true states of patients relate and how the patients' conditions degrade. When each patient is taken into service, we can observe the true condition of the patient and, based on the information collected, we can update the unknown parameters related to the disease/condition. Then, with this updated information, we make the next decision to serve another patient.


Nilay Argon and Melanie Bain are currently using SMC methods in solving a dynamic control problem that arises in the aftermath of mass-casualty incidents. To be more specific, we consider a mass-casualty event (such as a plane crash or a terrorist bombing) that resulted in several casualties in need of care. Due to the massive number of casualties, the medical resources are overwhelmed and decision makers need to prioritize patients for service. Depending on their injuries, the patients could be in different stages of health. The stage that a patient is in may affect his/her probability of survival and also service requirement. The decision maker cannot observe the true states of patients but can observe certain signals that the patients send (for example, pulse, breathing rate, etc.). Based on these signals, the decision maker dynamically decides which patient should be taken into service, with the objective of maximizing the total expected number of survivors. We initially assume that the decision maker knows how the signals and the true states of patients relate. We also assume that the patients' conditions degrade according to a discrete time Markov chain with a known transition probability matrix.
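Under the two assumptions just stated (a known signal model and a known transition matrix), the belief about a single patient's health stage can be updated with a standard hidden Markov model filtering step. The sketch below is illustrative only; the three stages, the matrix `P` and the signal likelihoods are made-up numbers, not values from the project.

```python
def belief_update(belief, transition, signal_likelihood):
    """One filtering step: predict with the Markov chain, then condition on a signal.

    belief[i]            -- current probability that the patient is in stage i
    transition[i][j]     -- probability of moving from stage i to stage j in one period
    signal_likelihood[j] -- probability of the observed signal given stage j
    """
    n = len(belief)
    predicted = [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]
    unnormalized = [predicted[j] * signal_likelihood[j] for j in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observed signal has zero probability under the model")
    return [u / total for u in unnormalized]


# Hypothetical three-stage example: 0 = stable, 1 = serious, 2 = critical.
P = [[0.8, 0.2, 0.0],
     [0.0, 0.7, 0.3],
     [0.0, 0.0, 1.0]]
# Hypothetical probability of observing a "weak pulse" signal in each stage.
weak_pulse = [0.1, 0.4, 0.8]

prior = [0.6, 0.3, 0.1]
posterior = belief_update(prior, P, weak_pulse)
```

In this example the weak-pulse signal shifts probability mass towards the critical stage, which is the kind of information a prioritization rule would act on.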

We first formulated the above problem as a partially observable Markov decision process (POMDP). The POMDP we obtained could have a very large belief state, depending on the number of patients involved and also the number of health stages that we define. Hence, we will need to use an approximate method to solve this problem. We have thus far considered two approaches from the literature. One is by Thrun (2000), where particle filtering is used to reduce the size of the belief space, and the other is by Luo, Fu, and Marcus (2008), which is based on projecting the high-dimensional belief space to a low-dimensional family of parametrized distributions. We are currently implementing Thrun's approach.

Primary subgroup participants: Nilay Tanik Argon (UNC), Abel Rodriguez (Univ. of California, SC), Melanie Bain (PhD student, UNC) and Kai Wang (PhD student, Duke Univ.).
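To give a feel for why particle methods help with a large belief state, the sketch below represents the joint belief over all patients' stages by a set of particles instead of enumerating every combination of stages. It is written in the spirit of particle filtering and is not an implementation of Thrun (2000); the function names and data layout are assumptions made for illustration.

```python
import random


def particle_belief_step(particles, transition, signal_likelihoods, rng):
    """Propagate, weight and resample particles representing the joint belief.

    particles             -- list of tuples, one health stage per patient
    transition[i][j]      -- one-period stage transition probabilities
    signal_likelihoods[p] -- for patient p, likelihood of the observed signal per stage
    """
    stages = range(len(transition))
    moved, weights = [], []
    for particle in particles:
        # Move every patient one step along the Markov chain.
        new = tuple(rng.choices(stages, weights=transition[s], k=1)[0] for s in particle)
        w = 1.0
        for p, s in enumerate(new):
            w *= signal_likelihoods[p][s]
        moved.append(new)
        weights.append(w)
    if sum(weights) == 0.0:
        raise ValueError("all particles are inconsistent with the observed signals")
    return rng.choices(moved, weights=weights, k=len(particles))


def stage_probabilities(particles, patient, n_stages):
    """Estimated marginal belief over stages for one patient."""
    counts = [0] * n_stages
    for particle in particles:
        counts[particle[patient]] += 1
    return [c / len(particles) for c in counts]
```

With m patients and K stages the exact belief has K^m entries, whereas the particle set has a fixed size chosen by the user, at the price of Monte Carlo error.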

Papers

• Dynamic Financial Index Models: Modeling Conditional Dependencies via Graphs. By Hao Wang, Craig Reeson and Carlos M. Carvalho. http://ftp.stat.duke.edu/WorkingP21.pdf (submitted to Bayesian Analysis)

• Dynamic Stock Selection: A Structured Factor Model Framework. By C. Carvalho et al., Invited paper, 2010 Valencia Meeting on Bayesian Statistics.

4.7.6 Continuous Time

The working group leaders are Jashya Bishwal, Paul Fearnhead and Jochen Voss. During the initial formative phase, the working group in continuous time models, parameter estimation and finance started a review of the relevant literature. Several articles were presented during the group meetings. Topics covered include the following:

- filtering discrete observations of a continuous time signal
- exact algorithms
- filtering and parameter estimation using a windowed SMC method (in discrete time)
- change point problems for continuous time processes
- filtering for a CIR process given Poisson observations


The working group focussed on four areas. Firstly, SMC for hierarchical branching process models, with application to qPCR analysis. Secondly, methods for survival data, where the underlying hazard depends on an unobserved stochastic process. The application for this work is the modelling, analysis and prediction of mortgage default rates. Thirdly, the group looked at inference for diffusion models via “least-action”: defining and calculating a best path for the unobserved diffusion, and constructing Laplace approximations to the transition density of the diffusion. Applications here include models in systems biology. Finally, the group looked at new inference methods for α-stable Lévy processes.

The research has led to a grant proposal to the UK’s EPSRC, which will build on the least action filtering and α-stable models by Rogers, Godsill and West. This latter area led to several grant proposals and a funded project from the Fraunhofer Institute in Germany (contact Ralf Korn), in which a PhD student, Tatjana Lemke, funded by Fraunhofer, is based in Cambridge with Simon Godsill and is working on Monte Carlo inference for α-stable Lévy processes. There is a further grant on Laplace approximations, with a focus on their use for diffusion models, being prepared by Fearnhead, Tawn and Sherlock.

• “Subprime Mortgage Default”, James B. Kau, Donald C. Keenan, Constantine Lyubimov, V. Carlos Slawson (submitted 2010)

• Anand Vidyashankar (In prep).

• Simulation and Inference for Stochastic Kinetic Models via limiting Gaussian Processes. Paul Fearnhead, Vasileios Giagos and Chris Sherlock (In prep).

• Inference for autoregulatory genetic networks using diffusion approximations. Vasileios Giagos, PhD Thesis, Lancaster University (In prep; submission by September 2010)

• Enhanced Poisson Sum Representation for Alpha-Stable Processes. Tatjana Lemke and Simon Godsill, accepted for Proc. IEEE ICASSP 2011.

4.7.7 Big Data & Distributed Computing

Summary Emerging from discussions at the Sept. 2008 opening workshop, the Big Data & Distributed Computing (BD&DC) working group was defined by research challenges and themes cutting across numerous applications of stochastic computation and sequential methods: scaling of models and analysis methods for increasingly large data sets and in problems with increasingly large spaces of underlying parameters and latent variables. Exploiting multi-core, cluster and parallel hardware promotes a need for basic research and innovations in the development of computational algorithms and also of model specification and structuring. The opportunity to make progress in these areas, linked to specific motivating applications, led to the formation of this focused working group. BD&DC subgroups were involved in specific research projects under the general goals, with a series of interconnections with some of the other working groups involving cross-cutting projects.

Goals and Areas of Focus Exploration, evaluation, development and application of effective computational methods for model fitting and model assessment in problems involving large data sets and high-dimensional latent process parameters (the latter are examples of “big missing data”): sequential Monte Carlo methods, stochastic model search, sequential importance sampling and also annealing/optimization methods. Specific research subgroups are as follows.

BD&DC.1 Distributed computing for SMC (Manolopoulou, Mukherjee, West, Yoshida).

Sequential Monte Carlo methods for model learning, estimation and comparison using distributed computing on clusters. This is relevant to several of the specific modelling, methodology and application areas. Activities include study of theory and methods of implementing a variety of SMC methods on clusters, using some of the specific model contexts of interest to this working group for development. Strategies for parallelised and cluster-based computation were explored for problems involving large data sets and high-dimensional latent processes, i.e., large missing data sets, the latter focused on state-space modelling of long time series. This research interacted with researchers in the Tracking working group.
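One common way to parallelise an SMC weighting step is to split the particles into chunks, let each worker compute log-weights and a local normalizer for its chunk, and combine the local normalizers centrally. The sketch below illustrates that pattern only; it is not the group's implementation. Threads stand in for cluster nodes, the Gaussian observation model is an assumption, and all function names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def chunk_log_weights(chunk, y, sigma):
    """Worker task: Gaussian log-likelihood of observation y for each particle state."""
    const = -0.5 * math.log(2.0 * math.pi * sigma ** 2)
    logw = [const - 0.5 * ((y - x) / sigma) ** 2 for x in chunk]
    return logw, log_sum_exp(logw)


def distributed_weights(particles, y, sigma=1.0, n_workers=4):
    """Split a non-empty particle list across workers, then combine local normalizers."""
    size = math.ceil(len(particles) / n_workers)
    chunks = [particles[i:i + size] for i in range(0, len(particles), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(lambda c: chunk_log_weights(c, y, sigma), chunks))
    # Only one number per worker is needed to form the global normalizer.
    log_total = log_sum_exp([local for _, local in results])
    weights = [math.exp(lw - log_total) for logw, _ in results for lw in logw]
    log_lik_increment = log_total - math.log(len(particles))
    return weights, log_lik_increment
```

The design point is that workers exchange a single scalar each to normalize the weights; resampling across workers, which requires moving particles, is the harder part and is not shown here.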

BD&DC.2 SMC in dynamic non-linear models in systems biology (Ji, Mukherjee, West).

SMC and distributed computing in mechanistic, nonlinear state-space models, with motivating applications in dynamic stochastic models arising in studies of cellular communication in biological networks in systems biology. This area involves model development and use of customised sequential Monte Carlo methods, and so interacted substantially with some of the other program working groups. Specific characteristics of motivating problems were (a) state-space models with many uncertain parameters that are observed over time, and for which sequential learning is either desirable or necessary; (b) very high-dimensional latent processes. In systems biology problems, models are developed mathematically on very fine, discrete time scales, but actual data is observed at much cruder time scales, so that the fine time scale states become missing data in very high dimensions.

This research area also intersected with studies involving stochastic computation for model fitting in complex computer model emulation, related to the program activities at the intersection of research in computation and computer modelling design and development. This aspect of BD&DC research was represented in talks at the SAMSI workshop on Adaptive Design, Sequential Monte Carlo and Computer Modeling in April 2009 as well as in a SAMSI Topic Contributed Session at the 2009 Joint Statistical Meetings in Washington DC.

BD&DC.3 SMC for inference on “rare events” with very large data sets (Manolopoulou, West).

Sequential analysis and decision-guided sample selection and learning about rare events in mixture modelling with very large data sets. Motivating applications come from problems of inference on characteristics of rare sub-populations of biological cells in studies using flow cytometry technology in immunology, vaccine design and other areas. In such studies, a single experiment can easily generate hundreds of millions of observations in, typically, 10-20 dimensions, representing marker proteins on the surface of cells. Random sampling to fit models, such as mixture models for classification and discrimination of sub-populations, is standard, though model fitting becomes challenged by sample size and so sequential methods are inherently interesting. Moreover, a specific focus on generating maximal information on rare sub-populations leads to statistical design and biased sampling strategies that are inherently sequential and for which simulation-based methods need to be developed. The interest in and role for distributed computation is evident.

BD&DC.4 SMC in nonparametric analyses (Das, Dunson, Li, Liu).

Sequential methods in nonparametric statistical regression and density estimation models, with motivating applications in problems in epidemiology and public health, and in studies of huge data sets in e-commerce and internet traffic research, among others. This study introduced new classes of SMC algorithms for posterior computation and marginal likelihood estimation in a flexible class of mixture models, which allow mixture weights to vary with time, space and predictors. The proposed augmented particle learning (APL) algorithm has had excellent performance in simulation experiments for a variety of data types, avoiding degeneracies common to SMC algorithms through use of marginalization after data augmentation. The algorithm has major advantages over MCMC algorithms in avoiding mixing problems that plague MCMC for mixture models, while also allowing marginal likelihood estimation, which allows testing of competing nonparametric models and parametric vs nonparametric models.

Application areas are numerous. Part of this research generated components of a successful NIH proposal for the further development of the APL algorithm in applications in statistical genetics and gene-environment interaction studies. This involves models that allow quantitative traits to vary flexibly with high-dimensional single nucleotide polymorphisms and environmental factors.

BD&DC.5 Adaptive sequential sampling for Bayesian variable selection (Li, Liu, Dunson).

Intersections of interests with the working group on sequential Monte Carlo in model assessment focussed on adaptive strategies for variable selection problems and efficient Sequential Monte Carlo methods for the evaluation of designs in massive data sets. The problem is to sequentially select subjects based on their predictor values, with the response value obtained for the selected subjects and the objective being optimal performance in model selection. The ideas were developed and methods detailed; implementations in a couple of simple examples demonstrate substantial advantages relative to selecting the subjects in a random order. An interesting study now underway applies this to problems with censored data.

This aspect of BD&DC research was represented at the SAMSI workshop on Adaptive Design, Sequential Monte Carlo and Computer Modeling in April 2009, as well as in a SAMSI Topic Contributed Session at the 2009 Joint Statistical Meetings in Washington DC.

BD&DC.6 Sequential model search in very large model spaces (Dunson, Shi, West, Yoshida).

Stochastic and related deterministic/annealing based methods of search over very large, discrete model spaces such as arising in sparse multivariate factor models with many response variables, and in regression model uncertainty (linear and nonlinear) with many candidate covariates. Advances in computational methods included innovations in annealed entropy methods and Bayesian shotgun stochastic search methods. Some specific motivating applications came from genomics and public health contexts. One key development involved including model indices within the particles of SMC methods, leading to an efficient algorithm for massive dimensional variable selection (Dunson and Shi); another involved a synthesis of entropy annealing based global optimization with stochastic search for very large-scale sparse statistical models (Yoshida and West).

BD&DC.7 Novel modeling for tracking (Ji, Manolopoulou, West).

Intersections of interests with the working group on sequential Monte Carlo in compute-intensive tracking problems generated novel model and computational methods development for tracking problems with prototype applications in monitoring and tracking many cells in systems biology experimental data. Data arising from motivating applications include studies in computational immunology driven by experiments in vaccine design, where the motion of multiple different cell types is monitored by measured fluorescent intensities of cell surface marker proteins. Research here involved novel Bayesian dynamic, non-parametric models for inhomogeneous spatial intensity functions and sequential Monte Carlo methods development for model fitting.

BD&DC.8 SMC in very large inverse problems (Mukherjee, West).

Emerging from BD&DC discussions in spring 2009, a new project involves a subgroup of this working group and concerns approaches to computer model-data synthesis and inversion in atmospheric chemistry (CO) monitoring and data synthesis. With a focus on short time-scale inference for improved understanding of the impact of earth surface fires (tropical forest fires, savannah fires, etc.) on variations in atmospheric CO, a core challenge is integration of massive amounts of high-resolution data from new satellites launched in 2009 with predictions from deterministic biophysical simulation models. Sequential Bayesian methods development was initiated as a spin-off from this BD&DC discussion, and has led to a new, free-standing collaboration with atmospheric chemists as well as defining a core thesis topic for Mukherjee. Two papers are in draft (May 2010). This new collaboration with disciplinary computer modelers is an excellent example of a really big, cluster compute-intense BD&DC problem.

Participants This working group involves local faculty participants, SAMSI postdoctoral fellows, SAMSI visitors, SAMSI and non-SAMSI graduate students, and represents various areas of statistics and computational science. Several junior and female researchers are involved, including some who were quite new to the general area of Sequential Monte Carlo, as well as the specific areas of this working group, prior to the program. See Table 1 for the list of primary and active participants, as well as additional participants who either had some engagement in formative discussions, or who had occasional interactions in BD&DC meetings, or who were short-term SAMSI visitors.

Student Involvement


Name (gender)             Position             Affiliation             Dept/Discipline

(A)
Carlos Carvalho (m)       Assistant Professor  Chicago                 Statistics & Econ.
Sourish Das (m)           Postdoc              SAMSI                   Statistics
David Dunson (m)          Professor            Duke                    Statistical Science
Chunlin Ji (m)            Graduate RA          Duke & SAMSI            Statistical Science
Fan Li (f)                Assistant Professor  Duke                    Statistical Science
Fei Liu (f)               Assistant Professor  Missouri                Statistics
Ioanna Manolopoulou (f)   Postdoc              SAMSI                   Statistics
Chiranjit Mukherjee (m)   PhD student          Duke                    Statistical Science
Minghui Shi (f)           Graduate RA          Duke & SAMSI            Statistical Science
Ryo Yoshida (m)           Assistant Professor  ISM Tokyo               Statistics
Hao Wang (m)              Graduate student     Duke                    Statistical Science
‡Mike West (m)            Professor            Duke                    Statistical Science

(B)
Ernest Fokoue (m)         Assistant Professor  Kettering               Mathematics
Amadou Gning (m)          Postdoc              Lancaster               Communication Systems
Steve Koutsourelakis (m)  Assistant Professor  Cornell                 Engineering
Lyudmila Mihaylova (f)    Lecturer             Lancaster               Communication Systems
Mario Morales (m)         Consultant           Emetricz                Engineering/Statistics
Deb Roy (m)               Assistant Professor  Penn State              Statistics
Andrew Thomas             Lecturer             St. Andrews University  Ecology & Statistics
Joshua Vogelstein (m)     PhD student          Johns Hopkins           Neuroscience

Table 1: (A) Primary and local participants (‡Working Group leader); (B) Additional researchers (initial participants, collaborators and/or short-term visitors in the BD&DC group)

Chunlin Ji (SAMSI RA) was primarily attached to the Tracking (Godsill) working group but also participated actively in the BD&DC group with West on spatial dynamic modelling for biological cell tracking problems and nonlinear modeling. Ji developed SMC methods in the context of new classes of models. This research grew out of existing work of Ji & West in static problems, now extended with new dynamic models that formed additional parts of Ji’s PhD thesis research, and led to expanded follow-on research in the area with SMC postdoc Ioanna Manolopoulou, mentioned below. Ji led discussions on this work at several BD&DC meetings, gave a talk at the February 2009 mid-program workshop, and presented this work at the 7th Workshop on Bayesian Nonparametrics in Turin, Italy, in June 2009 and at the 2009 Joint Statistical Meetings in Washington DC. Ji defended his PhD in December 2009 and moved to a postdoctoral position with Jun Liu in Statistics at Harvard.

Chiranjit Mukherjee (Duke graduate student) actively participated in the BD&DC group (though was not officially supported by the program). Mukherjee developed studies of SMC methods for model fitting and comparison in nonlinear dynamical models arising from systems biology (and other applications). These studies involve very long time series for which most of the underlying states are unobserved, and his work has explored, evaluated and developed novel approaches to SMC using distributed computation. In March 2009, Mukherjee presented and passed his PhD preliminary exam based on this work. He has one paper published from this work, led several discussions on the topic at the BD&DC meetings, presented a poster at the February 2009 mid-program workshop, and presented this work in a SAMSI Topic Contributed Session at the 2009 Joint Statistical Meetings in Washington DC and at several Duke workshops. Mukherjee then became involved in the new, emerging project initiated in a BD&DC meeting on large-scale data integration for inverse inference and data synthesis with computer models in atmospheric chemistry. He will present this work at the ISBA World Meeting in Spain, June 2010. These two topics from his BD&DC interactions form the basis of his ongoing thesis research; he will defend in May 2011.

Francesca Petralia (SAMSI RA) was attached to the Particle Learning (Lopes) working group but also participated actively in the BD&DC group. Petralia presented a talk on her work with SMC in econometric models, carried out in the Particle Learning working group, in a SAMSI Topic Contributed Session at the 2009 Joint Statistical Meetings in Washington DC. Petralia is currently finalizing a paper, with Lopes, on this SMC project.

Minghui Shi (50% SAMSI RA) worked on sequential model search methodology for large, discrete model spaces, typified by “large p” regression model uncertainty. With Dunson, Shi developed novel extensions of shotgun stochastic search that incorporate new ideas from SMC. Shi passed her PhD preliminary exam on this topic in April 2009, and the topic underlies her emerging thesis area. Shi led discussions on this work at BD&DC meetings, presented a poster at the February 2009 mid-program workshop, presented this work in a SAMSI Topic Contributed Session at the 2009 Joint Statistical Meetings in Washington DC, and will present extensions at the ISBA World Meeting in Spain, June 2010.

Hao Wang (Duke graduate student) actively participated in the BD&DC group as well as other working groups (though was not officially supported by the program). Wang worked, in part, on SMC methods for dynamic graphical models with Carvalho and West, led discussions on the topic at the BD&DC meetings, presented a poster at the February 2009 mid-program workshop, and presented this work at the SAMSI workshop on Adaptive Design, Sequential Monte Carlo and Computer Modeling in April 2009 as well as in a SAMSI Topic Contributed Session at the 2009 Joint Statistical Meetings in Washington DC. With Carvalho, Wang wrote a paper on SMC in dynamic graphical models (currently under review) and the work formed a component of his PhD thesis. Wang successfully defended his PhD in spring 2009, and will move to a tenure-track faculty position, Assistant Professor of Statistics, University of South Carolina, in summer 2010.

Other Activities

• BD&DC short-term visitor Andrew Thomas (St. Andrews University) has, as a result of discussions and interactions in the BD&DC group, initiated development of new software for SMC based on the OpenBUGS software, and this is expected to be developed and set up as Open Source software.

Manuscripts

Manuscripts linked to research in BD&DC and acknowledging SAMSI/NSF support:

1. D.B. Dunson & S. Das (2009) Bayesian distribution regression via augmented particle learning. In preparation.

2. C. Ji & M. West (2009) Spatial dynamic mixture modelling for unobserved point processes and tracking problems. Under revision.

3. F. Liu, D.B. Dunson & F. Zou (2010) High-dimensional variable selection in meta analysis for censored data, Biometrics, to appear.

4. F. Liu, F. Li & D.B. Dunson (2010) Adaptive design for variable selection in normal linear models, in preparation.

5. F. Liu & M. West (2009) A dynamic modelling strategy for Bayesian computer model emulation, Bayesian Analysis, 4(2), 393-412.

6. I. Manolopolou, C. Chan & M. West (2009) Selection sampling from large data sets for targeted inference in mixture modeling, Bayesian Analysis, to appear.

7. I. Manolopolou, X. Wang, C. Ji, H.E. Lynch, S. Stewart, G.D. Sempowski, S.M. Alam, M. West and T.B. Kepler (2009) Statistical analysis of immunofluorescent histology, submitted for publication.

8. C. Mukherjee & M. West (2009) Sequential Monte Carlo in model comparison: Example in cellular dynamics in systems biology, JSM Proceedings, Section on Bayesian Statistical Science. Alexandria, VA: American Statistical Association, 1274-1287.

9. M. Shi & D. Dunson (2009) Bayesian variable selection via particle stochastic search, submitted for publication.

10. H. Wang, C. Reeson & C.M. Carvalho (2009) Dynamic financial index models: Modeling conditional dependencies via graphs, invited revision resubmitted.

11. R. Yoshida & M. West (2009) Bayesian learning in sparse graphical factor models via annealed entropy, Journal of Machine Learning Research, to appear.

5 Conclusions and expected outcomes

To conclude, the program has been successful. It has generated much activity across a wide range of topics. We had well-structured Working Groups, holding weekly meetings by Webex. The results of these collaborative research efforts have started appearing in leading statistical journals. The collaborations enabled by the program will no doubt lead to major grant applications in the area of SMC and its scientific applications.

SAMSI 2008-09 Program on Algebraic Methods in Systems Biology and Statistics

Final

Reinhard Laubenbacher
Chair, Program Leaders

On behalf of the Working Groups Leaders

May 2010

1 Program Overview

In recent years, methods from algebra, algebraic geometry, and discrete mathematics have found new and unexpected applications in systems biology as well as in statistics, leading to the emerging new fields of “algebraic biology” and “algebraic statistics.” Furthermore, there are emerging applications of algebraic statistics to problems in biology. This year-long program provided a focus for the further development and maturation of these two areas of research as well as their interconnections. The unifying theme is provided by the common mathematical tool set as well as the increasingly close interaction between biology and statistics. The program allowed researchers working in algebra, algebraic geometry, discrete mathematics, and mathematical logic to interact with statisticians and biologists and make fundamental advances in the development and application of algebraic methods to systems biology and statistics. The essential involvement of biologists and statisticians in the program provided the applied focus and a sounding board for theoretical research.

1.1 Research Foci

Systems Biology: The development of revolutionary new technologies for high-throughput data generation in molecular biology in the last decades has made it possible for the first time to obtain a system-level view of the molecular networks that govern cellular and organismal function. Whole genome sequencing is now commonplace, gene transcription can be observed at the system level, and large-scale protein and metabolite measurements are maturing into a quantitative methodology. The field of systems biology has evolved to take advantage of this new type of data for the construction of large-scale mathematical models. System-level approaches to biochemical network analysis and modeling promise to have a major impact on biomedicine, in particular drug discovery.

Statistics: It has long been recognized that the geometry of the parameter spaces of statistical models determines in fundamental ways the behavior of procedures for statistical inference. This connection has in particular been the object of study in the field of information geometry, where differential geometric techniques are applied to obtain an improved understanding of inference procedures in smooth models. Many statistical models, however, have parameter spaces that are not smooth but have singularities. Typical examples include hidden variable models such as the phylogenetic tree models and the hidden Markov models that are ubiquitous in the analysis of biological data. Algebraic geometry provides the necessary mathematical tools to study non-smooth models and is likely to be an influential ingredient in a general statistical theory for non-smooth models.

Algebraic methods

Algebraic biology is emerging as a new approach to modeling and analysis of biological systems using tools from algebra, algebraic geometry, discrete mathematics, and mathematical logic. Application areas cover a wide range of molecular biology, from the analysis of DNA and protein sequence data to the study of secondary RNA structures, assembly of viruses, modeling of cellular biochemical networks, and algebraic model checking for metabolic networks, to name a few.

Algebraic statistics is a new field, less than a decade old, whose precise scope is still emerging. The term itself was coined by Giovanni Pistone, Eva Riccomagno and Henry Wynn. Their book explains how polynomial algebra arises in problems from experimental design and discrete probability, and it demonstrates how computational algebra techniques can be applied to statistics. The first of these applications have focused on categorical data and include the study of Markov bases and conditional inference, disclosure limitation, and parametric inference, to name a few.

The central idea underlying algebraic statistics is that the parameter spaces of many statistical models are (semi-)algebraic sets. The geometry of such possibly non-smooth sets can be studied using tools from algebraic geometry. Many problems in computational biology can be described within this framework. This is where algebraic statistics joins algebraic biology as a new methodology for solving problems in systems biology.
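As a small illustration of a statistical model that is an algebraic set, consider the independence model for two binary random variables: a 2x2 table of joint probabilities belongs to the model exactly when the polynomial p00*p11 - p01*p10 vanishes. The sketch below is not taken from the program materials; the function names and the numerical values are chosen here purely for illustration.

```python
from fractions import Fraction

def independence_table(a, b):
    """Joint distribution of two independent binary variables,
    where a = P(X = 0) and b = P(Y = 0)."""
    return [[a * b, a * (1 - b)],
            [(1 - a) * b, (1 - a) * (1 - b)]]

def determinant(p):
    """The polynomial p00*p11 - p01*p10 evaluated at a 2x2 table."""
    return p[0][0] * p[1][1] - p[0][1] * p[1][0]

# Exact rational arithmetic avoids floating-point round-off.
table = independence_table(Fraction(1, 3), Fraction(2, 5))
det_value = determinant(table)
total = sum(sum(row) for row in table)

# A perfectly correlated table lies outside the independence model.
dependent = [[Fraction(1, 2), Fraction(0)],
             [Fraction(0), Fraction(1, 2)]]
dependent_det = determinant(dependent)
```

The polynomial vanishes on every table produced by `independence_table` and is nonzero on the perfectly correlated table, which is the sense in which the model is cut out by a polynomial equation inside the probability simplex.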

The unifying theme of the program was the development and use of a particular set of tools from algebra, algebraic geometry, and discrete mathematics to solve problems in statistics and biology.

1.2 Organization and Program Leadership

Organizing Committee: Peter Beerli (School of Computational Sciences and Department of Biological Sciences, Florida State University), Andreas Dress (Director, CAS-MPG Partner Institute for Computational Biology, Shanghai), Mathias Drton (Department of Statistics, University of Chicago), Ina Hoeschele (Department of Statistics, Virginia Tech, and Virginia Bioinformatics Institute), Christine Heitsch (School of Mathematics, Georgia Tech), Serkan Hosten (Department of Mathematics, San Francisco State University), Reinhard Laubenbacher, Committee Chair (Department of Mathematics, Virginia Tech, and Virginia Bioinformatics Institute), Bud Mishra (Departments of Computer Science, Mathematics, and Cell Biology, Courant Institute, NYU), Don Richards (Department of Statistics, Pennsylvania State University), Seth Sullivant (Department of Mathematics, NCSU), Brett Tyler (Department of Plant Pathology and Weed Science, Virginia Tech, and Virginia Bioinformatics Institute), Ruriko Yoshida (Department of Statistics, University of Kentucky).

1.3 Major Participants

Long-Term Visitors: Edward Allen (Wake Forest University), Elizabeth Allman (University of Alaska), James Degnan (University of Canterbury), Alicia Dickenstein (University of Buenos Aires), Luis Garcia-Puente (Sam Houston State University), Jeremy Gunawardena (Harvard Medical School), Chris Hillar (MSRI), Serkan Hosten (San Francisco State University), Reinhard Laubenbacher (VA Tech), Catherine Matias (Universite d’Evry), Uwe Nagel (University of Kentucky), Edwin O’Shea (UNAM, Mexico), Sonja Petrovic (University of Illinois, Chicago), Giovanni Pistone (Politecnico di Torino), John Rhodes (University of Alaska), Eva Riccomagno (University of Genoa), Anne Shiu (UC Berkeley)

Postdoctoral Fellows: Megan Owen (Cornell University), Ahmad Saeid Yasamin (Indiana University)

Graduate Students: Wenjie Chen (UNC), Julia Chifman (University of Kentucky), Deidra Coleman (NCSU), Thomas Friedrich (FU Berlin), Benjamin Wells (NCSU), Jason Yellick (NCSU)

Faculty Releases: Ian Dinwoodie (Duke), Scott Provan (UNC), Eric Stone (NCSU), Seth Sullivant (NCSU), Jung-Ying Tzeng (NCSU)

2 Description of Activities

2.1 Workshops

The SAMSI program on Algebraic Methods in Systems Biology and Statistics has been bolstered by a number of workshops and special sessions held throughout the year both at SAMSI and nearby locations.

2.1.1 Opening Workshop

The Kickoff Workshop and Tutorial was held September 14–17, 2008. The principal goal of the workshop was to engage a broadly representative segment of the mathematical, statistical, and life sciences communities to determine research directions to be pursued by working groups during the program. Four working groups were formed, which eventually merged into three.

The workshop covered a very broad range of topics in the interactions of algebraic methods with systems biology and statistics. After introductory tutorials on Sunday, more focused talks were given, directed towards the problem areas to be highlighted throughout the year. Highlighted topics included: algebraic statistical models, combinatorics of biological molecules, automata theory and finite dynamical systems, phylogenetics, causal models, and random graph models.

The workshop also contained a number of discussion sessions with the goal of identifying research areas that would be highlighted throughout the yearlong program. After these intended working group target areas were identified, the program members formed breakout sessions to begin discussing the topics that the working groups would take up throughout the year. The particular areas of the working groups are described below.

The tutorial speakers were: Bernd Sturmfels, Reinhard Laubenbacher, and Elizabeth Allman. The keynote speakers were: Mathias Drton, Jeremy Gunawardena, Christine Heitsch, Bud Mishra, Abdul Jarrah, Chris Schardl, Michael Savageau, Gheorghe Craciun, Sumio Watanabe, Brandilyn Stigler, Meera Sitharam, Lior Pachter, Brett Tyler, Niko Beerenwinkel, Eva Riccomagno, and Steve Fienberg.

2.1.2 Discrete Models in Systems Biology Workshop

The discrete models workshop was held December 3-5, 2008 and was organized by Elena Dimitrova (Clemson University), Ilya Shmulevich (Institute for Systems Biology), and Brandilyn Stigler (Southern Methodist University). The workshop focused on the use of discrete models in systems biology. Discrete modeling approaches have been applied to a wide variety of biological contexts, including gene regulatory networks, epidemiology, and ecosystem dynamics. Examples of topics of interest in the workshop were:

1. discrete dynamical systems: multi-state models such as Boolean networks, logical models, and finite dynamical systems; random networks; and analytic tools including statistical-mechanical approaches

2. Bayesian networks, including dynamic Bayesian networks and graphical models

3. static networks: interaction networks and graph-theoretic approaches

4. simulation: finite-state machines, agent-based networks, process algebras

A chief goal was to stimulate the organization of working groups focused on addressing key challenges in discrete modeling in computational systems biology, particularly the establishment of unifying themes and principles.

As part of our commitment to foster a synergistic community, we organized three question-and-discussion sessions to encourage interactions among workshop participants and two poster sessions to showcase the work of junior researchers, including postdocs and graduate students.

2.1.3 Algebraic Statistical Models Workshop

Many classical statistical models, in particular Gaussian models from multivariate statistics and models for discrete random variables, exhibit algebraic structure in their parameter spaces. This workshop focused on both algebraic and statistical aspects of such algebraic statistical models. It was intended to complement other mid-program workshops, which focused more on particular application areas.

The workshop was held at SAMSI, January 15–17, 2009 and featured talks by working group participants, as well as outside experts, whose opinions helped to provide new research directions in the program. The organizers were Mathias Drton, Eva Riccomagno, and Seth Sullivant. Focus topics of the workshop included Markov bases, graphical models, algebraic tools for maximum likelihood estimation, identifiability problems, and cumulant methods.

The workshop included talks by Steffen Lauritzen, Thomas Richardson, Donald Richards, Ruriko Yoshida, Elizabeth Allman, Sonja Petrovic, Akimichi Takemura, Serkan Hosten, Hugo Maruri, and Jason Morton, and a poster session.
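To make the notion of a Markov basis concrete, the sketch below walks over all 2x2 contingency tables that share the same row and column sums. For 2x2 tables under the independence model, the single move that adds 1 on the diagonal and subtracts 1 off the diagonal (together with its negative) connects all such tables. The starting table, step count, and function names are hypothetical choices made for illustration and are not drawn from the workshop talks.

```python
import random

# Basic move for 2x2 tables: it changes no row sum and no column sum.
MOVE = ((1, -1), (-1, 1))

def margins(table):
    """Return (row sums, column sums) of a table given as nested tuples."""
    rows = tuple(sum(r) for r in table)
    cols = tuple(sum(c) for c in zip(*table))
    return rows, cols

def apply_move(table, sign):
    """Add sign * MOVE to the table; keep the old table if a cell goes negative."""
    new = tuple(
        tuple(table[i][j] + sign * MOVE[i][j] for j in range(2))
        for i in range(2)
    )
    if min(min(r) for r in new) < 0:
        return table
    return new

def random_walk(table, steps, seed=0):
    """Random walk over tables with the same margins as the start table."""
    rng = random.Random(seed)
    visited = {table}
    for _ in range(steps):
        table = apply_move(table, rng.choice((1, -1)))
        visited.add(table)
    return visited

start = ((3, 1), (2, 4))
visited = random_walk(start, steps=200)
```

Every table in `visited` has the margins of `start`, so a walk of this kind can serve as the proposal step of a Monte Carlo sampler for conditional inference. Larger tables and other models need larger sets of moves, which is where computational algebra enters.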

2.1.4 Miniworkshop on Systems Biology

A subgroup of the systems biology working group was formed to focus on software development for parameter estimation for discrete models. As part of the subgroup activities, a miniworkshop was conducted at SAMSI February 24-26, 2009. Participants included E. Dimitrova, L. Garcia, F. Hinkelmann, A. Jarrah, R. Laubenbacher, B. Stigler, and P. Vera-Licona. The workshop was focused on the design of the overall architecture of a software package for parameter estimation and simulation for Boolean network models. A significant part of the time was devoted to actual code development.

2.1.5 Evolutionary Biology and Phylogenetics Workshop

Recently there has been a marked synergy between modern biology and higher mathematics. A number of important connections have been established between computational biology and the emerging field of “algebraic statistics,” which combines combinatorics, computational algebra, polyhedral geometry and statistical modeling. The primary objective of this workshop was to bring together new and established researchers in mathematics, biology, and statistics in order to discuss the crossover between algebraic statistics, molecular evolution, and phylogenetics.

As part of our commitment to foster a synergistic community, we organized several discussion sessions to encourage workshop participants to begin new collaborations, discuss new research directions, and make new connections. For example, we discussed phylogenetic invariants on group-based models, such as the Jukes-Cantor model, and their applications to tree reconstruction, as well as phylogenetic mixture models. There were several discussions about coalescent theory, such as the coalescent approach to approximating the distribution of gene trees. We also discussed inferences about the impact of phenotype on genotype from the ancestral lineage, and reviewed what is known and unknown about phylogenetic reconstruction.

We invited four researchers to give one-hour keynote addresses and six researchers to give contributed (invited) talks of 45 minutes (including 5 minutes for questions at the end) at the workshop. Invited speakers were: Jeff Thorne, Cecile Ane, Junhyong Kim, Tandy Warnow, Seth Sullivant, Eric Stone, Fumei Lam, Laura Kubatko, Jeremy Sumner, Sonja Petrovic, and Jesus Fernandez-Sanchez.

2.1.6 AMS Special Session: Mathematics of Biochemical Reaction Networks

The special session “Mathematics of Biochemical Reaction Networks” was held during the Southeastern Section meeting of the AMS at North Carolina State University during the weekend of April 4-5. The organizers were Gheorghe Craciun, Manoj Gopalkrishnan, and Anne Shiu. The idea of organizing this session in proximity to SAMSI, and close to the evolutionary biology workshop, was formed during the SAMSI opening workshop.

Our intent was to bring together individuals who study biochemical reaction networks, in order to share ideas that range from those building upon the classical Feinberg-Horn-Jackson deficiency theory to more recent algebraic techniques that highlight the rich algebraic structure inherent in these networks. For example, much work has focused on predicting dynamics and resolving questions of stability simply from the topological structure of the underlying reaction network. Another topic covered was the class of “monotone” systems. In particular, this session featured reports on collaborations that grew out of activities during the 2008-9 academic year at SAMSI in North Carolina and the Mathematical Biosciences Institute in Ohio.

Special 45-minute talks were given by Martin Feinberg, Jeremy Gunawardena, and Eduardo Sontag. Several talks in particular had a strong algebraic aspect. For example, Greg Rempala talked about an algebraic statistical model for inferring biochemical reaction networks. Ezra Miller discussed results on binomial primary decomposition and its connection to boundary steady states of a chemical reaction system. Luis Garcia connected Birch’s Theorem and chemical reaction network theory to Bezier patches and their generalizations. Alicia Dickenstein shared results that compare detailed balancing to complex balancing in terms of their associated algebraic varieties. In addition, there were speakers from various backgrounds ranging from theoretical computer science to control theory to probability, and talks whose titles include the words “homotopy” and “number theory.” This session led to fruitful discussions and further collaborations.

2.1.7 Transition Workshop

The transition workshop was designed to synthesize the year’s activities and provide a blueprint to go forward with research in this field. Talks covered a range of topics related to the working group topics. Talks by Reinhard Laubenbacher, Seth Sullivant, and Ruriko Yoshida indicated how we want to move forward beyond the program year. Some talks highlighted successes from the working groups, and some talks were chosen to increase the range of topics present in the program.

In addition to the talks, there was ample time for participants to discuss future plans for research, and for the research area. Each day included multiple “Second Chances” sessions, which gave participants a chance to further engage the speakers and to explore relations between talks and future directions. One outgrowth of these discussions is that Laubenbacher, Sullivant, and Yoshida will co-organize a workshop on Algebraic Methods during the MBI 2011-12 program year on “Stochastics in Biological Systems”.

Invited speakers were: Reinhard Laubenbacher, Heike Siebert, Luis Garcia-Puente, Katherine St. John, Megan Owen, Marcy Uyenoyama, Henry Wynn, Ruriko Yoshida, Peter Huggins, Gilles Gnacadja, Anne Shiu, Olgica Milenkovic, Elena Dimitrova, Paul Kidwell, and Seth Sullivant.

2.1.8 Joint Statistical Meetings Mini-symposium on Algebraic Methods in Systems Biology and Statistics

Ian Dinwoodie organized a mini-symposium at the Joint Statistical Meetings in Washington D.C. in August 2009 that highlighted some of the results that have come out of the SAMSI program. Invited speakers were: Saeid Yasamin, Simon Lunagomez, Seth Sullivant, and Reinhard Laubenbacher.

2.2 Working Groups

At the end of the opening workshop in September, the afternoon was devoted to the formation of working groups for the year. Based on participant interest and program themes, the topics that emerged were:

1. Systems biology: The relationship between the structure and dynamics of biological networks;

2. Algebraic statistics and experimental design;

3. Evolutionary biology and phylogenetics.

There was significant overlap between topics and membership of the different working groups. For instance, experimental design is a very important topic in systems biology as well.

2.2.1 Relationship between structure and dynamics of biological networks

The working group leaders were R. Laubenbacher (VT) and Brandilyn Stigler (SMU).

One of the dominant themes at the opening workshop was the relationship between the structure of biological networks and the kind of dynamics this structure supports. Questions about this relationship are appropriate for a variety of biological networks, ranging from molecular pathways to social networks that support the spread of epidemics. For the working group, the focus was entirely on biochemical networks, encompassing two different modeling frameworks: polynomial dynamical systems over finite fields, in particular Boolean networks, and systems of polynomial differential equations. The primary structure of a network is given by a directed graph that indicates the dependence of the network variables on each other. In both modeling frameworks, one of the goals is to infer constraints on the dynamics of the network from properties of this graph. In the other direction, the goal is to infer the structure of the graph from a partial specification of the network dynamics, e.g., through a collection of time course experiments.
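The relationship between a network's dependency graph and its dynamics can be illustrated with a toy Boolean network written as a polynomial dynamical system over the field with two elements. The three update rules below are hypothetical and chosen only for illustration; they do not come from the working group's models or software. The sketch recovers the directed dependency graph from the rules and searches the state space for fixed points.

```python
from itertools import product

# Hypothetical update rules, each a polynomial over F_2 (arithmetic mod 2).
def f1(x):
    return (x[1] * x[2]) % 2                 # x2 AND x3

def f2(x):
    return (x[0] + 1) % 2                    # NOT x1

def f3(x):
    return (x[0] + x[1] + x[0] * x[1]) % 2   # x1 OR x2

RULES = (f1, f2, f3)

def step(state):
    """Synchronous update: apply every rule to the current state."""
    return tuple(rule(state) for rule in RULES)

def wiring_diagram(rules, n):
    """Directed edges (j, i), 0-indexed: rule i depends on variable j,
    i.e. flipping variable j changes the rule's output for some state."""
    edges = set()
    for i, rule in enumerate(rules):
        for j in range(n):
            for s in product((0, 1), repeat=n):
                flipped = s[:j] + (1 - s[j],) + s[j + 1:]
                if rule(s) != rule(flipped):
                    edges.add((j, i))
                    break
    return edges

def fixed_points(n):
    """States that the synchronous update maps to themselves."""
    return [s for s in product((0, 1), repeat=n) if step(s) == s]

edges = wiring_diagram(RULES, 3)
steady_states = fixed_points(3)
```

Exhaustive enumeration is feasible only for small networks, since the state space has 2^n elements. The inverse problem described above, inferring the graph from partial time course data, is considerably harder than reading it off known rules as done here.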

Working Group Activities.

Since the background of the working group members varies considerably, a primary focus of the group activities in the fall and part of the spring was on presentations and discussions aimed at establishing a common background in systems biology and the different approaches to modeling and simulation. The group is now at a stage where first results are being presented, beginning with the work of the subgroup on software development, described below.

A major problem with the construction of large-scale algebraic models is that there are no sophisticated tools available, comparable to ODE tools. Most importantly, the tool of fitting ODE model parameters to available data is key in continuous model construction, but completely absent for algebraic models. Also, tools like bifurcation analysis, sensitivity analysis, and stability analysis are all unavailable to the algebraic modeler. However, several or all of these tools exist in the polynomial dynamical systems framework. The goal of this subgroup of the working group was to collect available software and integrate it into a coherent package, which was released in April 2009.

Active participants.

Carsten Conradi (MPI Magdeburg), Alicia Dickenstein (University of Buenos Aires), Elena Dimitrova (Clemson University), Ian Dinwoodie (Duke University), Lee Falin (Virginia Tech), Thomas Friedrich (TU Berlin), Gilles Gnacadja (Amgen), Richard Haney (Cellular Statistics), Franziska Hinkelmann (Virginia Tech), Serkan Hosten (San Francisco State University), Abdul Salam Jarrah (Virginia Tech), Reinhard Laubenbacher (Virginia Tech), Tong Lee (Virginia Tech), Shaowei Lin (Univ. of California-Berkeley), Megan Owen (SAMSI), Mercedes Soledad Perez Millan (University of Buenos Aires), Anne Shiu (Univ. of California-Berkeley), Heike Siebert (FU Berlin), Brandy Stigler (Southern Methodist University), Seth Sullivant (NCSU), Jung-Ying Tzeng (N.C. State University), Alan Veliz-Cuba (Virginia Tech), Benjamin Wells (N.C. State University), Henry Wynn (London School of Economics), Richard Yamada (University of Michigan), Shantia Yarahmadian (Indiana University), Saeid Yasamin (SAMSI).

2.2.2 Algebraic Statistics and Experimental Design

The Algebraic Statistics and Experimental Design (ASED) working group in the 2008-2009 SAMSI program Algebraic Methods in Systems Biology and Statistics had approximately 30 members, including many remote participants in England, Italy, Japan, and throughout the U.S. The group was formed during the opening workshop in September, when all members were present and decided upon research themes and meeting times. The working group leaders were Serkan Hosten of San Francisco State University and Ian Dinwoodie of Duke University.

The ASED group worked on a range of statistical applications that use computational tools of commutative algebra. These applications include sampling and Monte Carlo methods for discrete data (tables and sequences), experiments (data gathering) and data analysis for reverse engineering of biological networks, disclosure limitation, and foundations of phylogenetic trees in evolutionary and population biology. Each application area emphasizes certain algebraic tools and each has roots in particular research groups with a wide international base. The different mathematical tools and research groups were well-represented among the group members. The working group benefited from the active and steady participation of individuals with much depth and experience. In particular, Henry Wynn and Giovanni Pistone shepherded the experimental design part of ASED, and worked on collaborations with the Systems Biology working group, where design issues are important for finding wiring diagrams and network connections. Phylogenetic trees were supported by John Rhodes, Elizabeth Allman, and Seth Sullivant, who completed foundational work on identifiable tree models in the course of the year, and shared their work from the Evolutionary Biology working group. Ongoing work in high-dimensional tables (sampling and disclosure limitation) was well-represented as well, with presentations by researchers and practitioners Akimichi Takemura, Larry Cox, Edwin O’Shea, Adrian Dobra, and others. Some connections were also made with the Sequential Importance Sampling program through research on sequential Monte Carlo methods for statistical inference on Boolean dynamics in biological networks.

The ASED working group met formally on Mondays at noon, in addition to informal collaborations. About half the participants logged in from remote locations using the Webex networking application. The list of talks and speakers is at www.samsi.info under the ASED working group link, together with supporting materials and documents.

The complete list of ASED working group members is below:

Elizabeth Allman (University of Alaska-Fairbanks), Deidra Coleman (N.C. State University), Lawrence H. Cox (CDC), Elena Dimitrova (Clemson University), Ian Dinwoodie (Duke University), Luis David Garcia-Puente (Sam Houston State University), Hisayuki Hara (Tokyo), Serkan Hosten (San Francisco State University), Thomas Kahle (Leipzig), Imre Risi Kondor (University College London), Reinhard Laubenbacher (Virginia Bioinformatics Institute), Tong Lee (Virginia Tech), Hugo Maruri-Aguilar (London School of Economics), Catherine Matias (CNRS), Uwe Nagel (University of Kentucky), Edwin O’Shea (Avanzados del IPN), Vittorio Perduca, Mercedes Soledad Perez Millan (Universidad de Buenos Aires), Sonja Petrovic (University of Illinois at Chicago), Giovanni Pistone (Torino), Eva Riccomagno (Genoa), Seth Sullivant (NCSU), Akimichi Takemura (Tokyo), Caroline Uhler (Univ. of California-Berkeley), Alan Veliz-Cuba (Virginia Tech), Benjamin Wells (N.C. State University), Henry Wynn (London School of Economics), Richard Yamada (University of Michigan), Ahmad S. Yasamin (SAMSI), Jason Yellick (N.C. State University), Ryo Yoshida (Japan), Yi Ming Zou (University of Wisconsin-Milwaukee), Or Zuk (M.I.T.), Piotr Zwiernik (University of Warwick).

2.2.3 Evolutionary Biology and Phylogenetics

As part of SAMSI’s 2008-09 program on Algebraic Methods in Systems Biology and Statistics, a working group in ‘Evolutionary Biology’ was formed during the opening workshop in September. Broadly speaking, members of this group were interested in finding, understanding, and solving problems arising in evolutionary biology that might require sophisticated mathematical and statistical techniques that have yet to be developed.

The group was led by Seth Sullivant, Elizabeth Allman, and John Rhodes. During the opening workshop, interested participants indicated that primary areas of common interest included phylogenetics, coalescent theory, population genetics, and comparative genomics.

Working Group Activities.

It was immediately clear that group members had widely diverse backgrounds in statistics, mathematics, and biology, and that participants needed a ‘common language’ and ‘common background knowledge’ in order to collaborate. During the first semester and spilling over into the beginning of the second semester, the working group met weekly. Each week a particular group member with expertise in one of the areas of common interest gave a talk at an introductory level to familiarize other group members with the area, discuss his/her research, and suggest possible problems where algebraic techniques might yield results. Typically, after an hour or more of introduction by the speaker, group members discussed the problems in a question-and-answer period.

The main topics included: the structure of tree space for phylogenetic trees (2 sessions), mixture models in phylogenetics and invariants (3 sessions), the coalescent model (4 sessions), geometry of cophylogeny (1 session), and comparative genomics (1 session).

9/23/08 Megan Owen on the space of phylogenetic trees and the geodesic distance: Space of Phylogenetic Trees
9/30/08 John Rhodes on phylogenetic invariants
10/07/08 Seth Sullivant: Some algebraic ideas for phylogenetic mixtures
10/14/08 Peter Beerli (Part 1 of introduction to coalescent theory series): Population genetic calculations that do not fit on the back of an envelope
10/21/08 Laura Salter Kubatko (Part 2 of introduction to coalescent theory series)
10/28/08 Peter Beerli (Part 3 of the introduction to coalescent theory series): Finding good trees - Simplifying Coalescent trees
11/04/08 Serkan Hosten: Extended UPGMA and phylogenetic tree reconstruction
11/11/08 Rudy Yoshida: Open Problems in Geometry of Cophylogeny
11/18/08 Julia Chifman: Group-based models
01/19/09 James Degnan: Gene tree distributions and coalescent histories
01/26/09 Or Zuk: Annotating the Human Genome Using Comparative Genomics

For a particularly successful ending to the fall semester, the evolutionary biology meeting consisted of a session in which individuals suggested open problems for the group to work on.

After an organizational meeting in mid-January, the working group decided to focus primarily on reading papers to acquire a deeper understanding of tree space, gene-tree/species-tree problems (coalescent theory), models of speciation, and ancestral recombination graphs. Several talks were also scheduled during the semester while researchers were visiting at SAMSI for collaborations and workshops.

Weekly working group meetings ran differently in the spring term. The idea was to have all group members read the papers for the week, with one person assigned to lead a discussion. It was assumed that no one was an expert in the area, so that the group could learn by reading up on a topic together. This worked reasonably well, but the best discussions took place when there were more group members physically present at SAMSI.

Group Membership.

The official number of group members was quite high, around 35, though the number of active participants was closer to 20. The number of participants on a weekly basis (“the faithful”) was typically about eight to ten. The meetings on the coalescent model were particularly well-attended.

The names of the active participants have been included below:

Elizabeth Allman (University of Alaska-Fairbanks), Elisaveta Arnaudova (University of Kentucky), Peter Beerli (Florida State University), Julia Chifman (University of Kentucky), Luis Garcia-Puente (Sam Houston State University), Serkan Hosten (San Francisco State University), Laura Kubatko (Ohio State University), Jinze Liu (University of Kentucky), Catherine Matias (CNRS, Laboratoire Statistique et Genome), Uwe Nagel (University of Kentucky), Megan Owen (SAMSI), Sonja Petrovic (University of Illinois at Chicago), Scott Provan (University of North Carolina), John Rhodes (University of Alaska-Fairbanks), Chris Schardl (University of Kentucky), Seth Sullivant (N.C. State University), Amelia Taylor (Colorado College), Jason Yellick (N.C. State University), Ruriko Yoshida (University of Kentucky), Or Zuk (M.I.T. Broad Institute), Piotr Zwiernik (University of Warwick).

Several new collaborations were formed among evolutionary biology group members, and the working group format gave the opportunity for extended research interaction to both these new and pre-existing collaborations.


2.3 University Courses

Title: Algebraic Methods in Systems Biology and Statistics

Instructors: Seth Sullivant (NCSU) and Reinhard Laubenbacher (VA Tech)

Course Day and Time: Tuesday 4:30-7:00

Course Description: This course provided an introduction to the algebraic techniques that have emerged as useful tools in biology and statistics. This course was intended to bridge the gap between abstract algebra and the application areas covered in the year-long program.

After providing an introduction to polynomial rings, ideals, and Gröbner bases, we will survey a range of applications of these ideas. Possible topics include: Polynomial dynamical systems over finite fields and applications, graphical and hierarchical models, Markov bases for contingency table analysis, phylogenetic models and the space of trees, applications of tropical geometry in MAP estimation.


Final Report

Meta-Analysis: Synthesis and Appraisal of Multiple Sources of Empirical Evidence

With the increasing concern in science and medicine for issues such as more complete use

of all sources of evidence and reproducibility for single statistical studies, multiple studies and

meta-analysis are becoming central to scientific advancement. The Statistical and Applied

Mathematical Sciences Institute (SAMSI) addressed this topic through a research program

from June 2-13, 2008. It brought together leading statisticians and scientists with interests

in meta-analysis, to assess the existing methodology, and develop needed new methodology,

and explore pedagogy for bringing the methodology to the broader scientific community.

1 Scientific Overview

1.1 General Background on Meta-Analysis

Seldom is there only a single empirical research study or source of evidence relevant to a question of scientific interest. However, both experimental and observational studies have traditionally been analyzed in isolation, without regard for previous similar or other closely related studies. A new research area has arisen to address the location, appraisal, reconstruction, quantification, contrast and possible combination of similar sources of evidence. Variously called meta-analysis, systematic reviewing, research synthesis or evidence synthesis, this new field is gaining popularity in diverse fields including medicine, psychology, epidemiology, education, genetics, ecology and criminology.

Statistical methods for combining results across independent studies have long existed, but require renewed consideration, development and wider dissemination by inclusion in the mainstream statistics curriculum. The possibility that the due consideration of all relevant evidence should be accepted as standard practice in statistical analyses deserves investigation. The combination of results from similar studies is often known simply as ‘meta-analysis’. Common examples are combining results of randomized controlled trials of the same intervention in evidence-based medicine; of correlation coefficients for a pair of constructs measured similarly across studies in social science; or of odds ratios measuring association between an exposure and an outcome in epidemiology. More complex syntheses of multiple sources of evidence have developed recently, including combined analyses of clinical trials of different interventions, and combined analysis of data from multiple microarray experiments (sometimes called cross study analysis). For straightforward meta-analyses, general least-squares methods may be used, but for complex meta-analyses, the technical statistical approach is not so obvious. Often likelihood and Bayesian approaches provide very different perspectives; and in practice the possible benefits of more complex approaches may be hard to discern as many meta-analyses are compromised by limited or biased availability of data from studies as well as by varying methodological limitations of the studies themselves.
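As a minimal sketch of the straightforward case, assuming a single common effect and known within-study standard errors, the weighted least-squares (inverse-variance) pooled estimate can be computed as follows. The function name and the three study values are hypothetical and serve only as illustration; they are not data from the program.

```python
import math

def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance (weighted least-squares) pooled estimate and its standard error."""
    # Each study is weighted by the reciprocal of its sampling variance.
    weights = [1.0 / se**2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)

# Hypothetical log odds ratios and standard errors from three studies.
log_odds_ratios = [0.10, 0.30, 0.20]
standard_errors = [0.10, 0.20, 0.10]
pooled, pooled_se = pool_fixed_effect(log_odds_ratios, standard_errors)
print(f"pooled log OR = {pooled:.4f} (SE {pooled_se:.4f})")
```

The pooled standard error is smaller than that of any single study, which is the usual motivation for combining; the sketch says nothing about heterogeneity or bias, which are the harder issues raised in the text.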

The presence of multiple sources of evidence has long been a recognized challenge in the development and appraisal of statistical methods - from Laplace and Gauss to Fisher and Lindley. In the 1980s Richard Peto argued that a combined analysis would be more important than the individual analyses, a view taken still further by Greenland and O’Rourke, who have suggested that individual study publications should not attempt to draw conclusions at all, but should instead only describe and report results, so that a later meta-analysis can more appropriately assess the study’s evidence fully informed by other study designs and results. Will combined analyses actually replace individual analyses (or at least decrease their impact)? If so, it is time to reexamine the perennial problems of statistical inference in this context.

The concept of multiple sources of evidence itself needs to be generalized and applied more broadly and creatively through many areas of statistical research. Multiple sources should not just be taken as separate studies, or even the possible simple regrouping of subsets of observations within studies, but as the bringing to bear of seemingly distinct information sources on a given question and even the “creation” of multiple sources, as in Bayesian Additive Regression Trees (BART), where differing regression trees are purposefully grown to be later advantageously combined. In some fields, terms like data fusion and data integration are being used for this more general sense of utilizing multiple sources of evidence. A single strategy of no pooling, complete pooling or partial pooling of separate studies perhaps needs to give way to adaptive strategies where the degree of pooling is individually chosen for each and possibly every parameter in the joint probability models used to represent all the relevant sources of evidence.
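To make the three pooling strategies concrete, the following sketch contrasts them for a single effect parameter. It assumes a normal random-effects model with a method-of-moments (DerSimonian-Laird) estimate of the between-study variance, which is one common choice and not necessarily what program participants used; the function names and data are hypothetical. No pooling keeps each study's own estimate, complete pooling uses one common mean, and partial pooling shrinks each study toward that mean by a study-specific factor.

```python
def dersimonian_laird_tau2(estimates, std_errors):
    """Method-of-moments estimate of the between-study variance (needs >= 2 studies)."""
    weights = [1.0 / se**2 for se in std_errors]
    total = sum(weights)
    mean = sum(w * y for w, y in zip(weights, estimates)) / total
    q = sum(w * (y - mean) ** 2 for w, y in zip(weights, estimates))
    scale = total - sum(w**2 for w in weights) / total
    return max(0.0, (q - (len(estimates) - 1)) / scale)

def partial_pooling(estimates, std_errors):
    """Return (overall mean, tau^2, shrunken study effects) under a normal model."""
    tau2 = dersimonian_laird_tau2(estimates, std_errors)
    weights = [1.0 / (se**2 + tau2) for se in std_errors]
    mean = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    # A shrinkage factor near 1 means almost no pooling; 0 means complete pooling.
    shrunk = [mean + tau2 / (tau2 + se**2) * (y - mean)
              for y, se in zip(estimates, std_errors)]
    return mean, tau2, shrunk

effects = [0.0, 1.0]      # hypothetical study estimates (no pooling)
ses = [0.1, 0.1]          # hypothetical standard errors
mean, tau2, shrunk = partial_pooling(effects, ses)
print(mean, tau2, shrunk)
```

The adaptive strategies described in the text go further than this sketch by letting the degree of pooling differ across parameters of a joint model.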

1.2 Specific motivation for the focus of this program

This program comprised two weeks of research, mixing tutorials, research presentations and working group activities on the subject. The goals of this program were three-fold: 1) to bring the area to the attention of statistical researchers, whose expertise is critical to substantiate and clarify the necessary statistical theory and methodology; 2) to nurture the necessary interdisciplinary collaboration and communication between statistical researchers and statisticians who currently work or plan to work with basic and applied science researchers; and 3) to provide an entry point into the field to interested students and faculty, and to allow researchers already specialized in the domain to exchange recent results and information.

2 Program Structure

2.1 Leaders

The program was initiated by Keith O’Rourke. The tutorials in the first week were organized

jointly by Vanja Dukic, Ken Rice and Keith O’Rourke. Ingram Olkin opened the program

with the lead tutorial followed by Keith O’Rourke, Ken Rice, Vanja Dukic and Julian Higgins.

The data analysis sessions were given by Keith O’Rourke and Ken Rice. The working group

leaders chosen for the second week of workshops were Dalene Stangl, Ken Rice, Vanja Dukic,

Julian Higgins, Keith O’Rourke and David Dunson.

2.2 Program Attendance

The program attracted 44 participants in the first week, which featured tutorials and data

analysis workshops (see below). A total of 31 participants either continued from the first

week or joined the program in the second week; the second week involved working groups

and a final summary session (see below). All in all, 55 participants attended some portion

of the program.

2.3 Tutorials and Opening Workshop

The introductory overview was given by Ingram Olkin, Stanford University. It provided participants with an elementary but thorough introduction to the challenges and opportunities of dealing with multiple studies in the context of biomedical and social science research. Many real application examples were covered to illustrate basic and advanced methods and to highlight the numerous scientific issues and challenges that inevitably arise.

An introductory overview on the likelihood basis for multiple data sources was then given by Keith O’Rourke, Duke University. This provided participants with an elementary introduction to working directly with likelihoods to contrast and combine data-based information. This was done both for individual observations - where individual observation likelihoods were contrasted and combined to obtain the usual study estimates - and for studies, where study-level likelihoods were contrasted and combined. A general meta-analysis approach was then presented in terms of the contrast and combination of likelihoods. Various problems that arise with likelihoods in meta-analysis were then discussed. These problems are largely due to the fact that although the likelihood concentrates for common parameters (of interest), it expands in dimension for arbitrary (nuisance) parameters, and unfortunately there usually are many of these arbitrary (nuisance) parameters.
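The combination of study-level likelihoods can be sketched in a few lines. The example below assumes each study contributes a normal log-likelihood for one common parameter with a known standard error, and it maximizes the summed log-likelihood over a simple grid; the function names, grid and study values are hypothetical illustrations rather than the tutorial's own material (which used R).

```python
def study_loglik(mu, estimate, std_error):
    """Normal log-likelihood for one study, up to an additive constant."""
    return -0.5 * ((estimate - mu) / std_error) ** 2

def combined_loglik(mu, estimates, std_errors):
    """Independent studies combine by adding their log-likelihoods."""
    return sum(study_loglik(mu, y, se) for y, se in zip(estimates, std_errors))

estimates = [0.10, 0.30, 0.20]     # hypothetical study estimates
std_errors = [0.10, 0.20, 0.10]    # hypothetical standard errors

# Crude grid search over the common parameter in steps of 0.001.
grid = [i / 1000 for i in range(-1000, 1001)]
mle = max(grid, key=lambda mu: combined_loglik(mu, estimates, std_errors))
print(f"combined maximum-likelihood estimate ~ {mle:.3f}")
```

With only a common parameter the combined likelihood concentrates as studies are added; the difficulties described above appear once every study also carries its own nuisance parameters.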

The advanced tutorials started the next day with Keith O’Rourke more thoroughly reviewing the likelihood approach and underlining issues of sparseness with the historical Neyman-Scott examples. Vanja Dukic then covered the integrated likelihood approach as well as some preliminaries for a Bayesian approach. Ken Rice then covered the conditional likelihood approach from both a classical and a Bayesian perspective, and provided material on exchangeability and other more general aspects of meta-analysis. Following this, Vanja Dukic and Ken Rice more fully covered Bayesian approaches to meta-analysis. The tutorial sessions ended with Julian Higgins giving a thorough overview of current (practical) challenges in undertaking meta-analyses in clinical research.
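For readers unfamiliar with the Neyman-Scott example, the following simulation is a rough sketch of the sparseness issue, under the textbook assumption of many pairs of normal observations that share a variance but each have their own mean. The sample size, seed and variable names are arbitrary choices made here for illustration. The joint maximum-likelihood estimate of the common variance settles near half the true value, however many pairs are added.

```python
import random

def neyman_scott_mle(pairs):
    """Joint MLE of the common variance when every pair has its own mean."""
    total = 0.0
    for x1, x2 in pairs:
        pair_mean = (x1 + x2) / 2.0          # MLE of this pair's nuisance mean
        total += (x1 - pair_mean) ** 2 + (x2 - pair_mean) ** 2
    return total / (2 * len(pairs))

rng = random.Random(0)                        # fixed seed for repeatability
true_sigma = 1.0
pairs = []
for _ in range(20000):
    mu = rng.uniform(-5.0, 5.0)               # a new nuisance parameter per pair
    pairs.append((rng.gauss(mu, true_sigma), rng.gauss(mu, true_sigma)))

sigma2_mle = neyman_scott_mle(pairs)          # close to 0.5, not 1.0
sigma2_corrected = 2.0 * sigma2_mle           # degrees-of-freedom correction
print(sigma2_mle, sigma2_corrected)
```

Conditional and integrated likelihoods, covered in the same tutorials, are two standard ways of removing the nuisance means before estimating the common parameter.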

Interspersed with these tutorials, Keith O’Rourke gave a “Data Analysis Session” on likelihood calculations in R and Ken Rice gave one on implementing Bayesian meta-analyses in WinBUGS.

3 Working Groups

In the second week, working groups were formed based on the participants’ research interests. Six working groups were formed, comprising 31 participants, with group sizes ranging from 7 to 17. The Working Groups were

1. Decision theory

2. The role of priors for bias and random effects

3. Bias modeling and information from observational studies

4. ROC and survival analysis

5. Networking, multiple treatments and multivariate

6. Genetics

Here is a summary of their activities.

3.1 Decision analysis group

During the week this group explored two questions:

1. In reporting estimates of treatment effect and heterogeneity, is there a loss function for which usual estimates reported are optimal?

2. In non-inferiority trials, how does one choose the delta by which a new treatment is considered “good-enough” relative to the standard treatment and placebo? [This question was motivated by an FDA inquiry to the program.]

The group studied what is currently done in non-inferiority trials and discussed the difference between random and fixed effects. It raised and discussed a concern about only looking at inter-study variability in average treatment effect, rather than also being concerned with within-study treatment effect variance, in choosing between drugs. It also discussed how “ideally” one would like to address the problems versus how one can take what is currently done and make an improvement that has a chance of being implemented. At the end there seemed to be a consensus that it was necessary to be clear about what the “real” question was and for exactly what “population”, so that a full and complete modeling of the decision and its relevant consequences could be undertaken.

As a result of the discussions, some members of the group have written a technical report

that has been submitted for publication but is still under review. The reference of the paper

is: E. Moreno, F. J. Giron, F.J. Vazquez-Polo and M.A. Negrin (2008). Optimal decisions

in cost-benefit analysis. Tech. Report. Dpt. Statistics, University of Granada.

3.2 Role of priors for bias and random effects group

This group focused on priors for random effects and bias - two areas in meta-analysis where

there is usually a small amount of sample information and hence the choice of priors can be

critical.

The discussions around random effects necessarily started with the choice of the parametric distribution of random effects – meant to represent the physical variation in effects from study to study – and then moved to priors for the parameters in these distributions and to non-parametric approaches to random effects. In this group, roughly as in the Decision Analysis group, it was found necessary to be clear about what the “real” question was, what the random effect distribution was meant to represent and exactly what parameters were of inferential interest. A quick review of some current choices for priors for random effects seen in the meta-analysis literature was also undertaken.

The discussion of priors for biases, such as may vary with varying assessed study quality, largely revolved around the possibilities of obtaining empirically motivated priors from the empirical literature. A possibly relevant data set of clinical research studies with various methods of appraising their quality was acquired, and a method for investigating quality effects was identified in a paper by Greenland and O’Rourke. This will likely become a student project in the near future.


Currently, the group leader, Ken Rice, is working on a paper with Keith Abrams entitled Estimating population-averaged contrasts under exchangeability; the role and influence of random-effects distributions.

3.3 Bias modeling group

The bias modeling group took up issues and methods for the contrast and combination of biased and confounded sources of information. There ended up being a focus on two main topics - 1) propensity score issues and methods in multiple observational studies and 2) investigations of bias modeling using both RCTs and non-RCTs together.

The propensity score focus ended up involving two projects: Project A, where individual-level data were available from multiple observational studies, and Project B, where only study-level data were available. Project A was led by Elizabeth Stuart of Johns Hopkins University and Project B was led by Robert Platt of McGill University. The motivating question for A was how propensity scores should be estimated in this setting, and the motivating question for B was whether, and if so how, propensity score-based subclass estimates from the multiple studies should be combined to get an overall estimate of the effect. Elizabeth Stuart and Robert Platt have since been collaborating on these projects and anticipate involving students in the future.

The investigations of bias modeling using both RCTs and non-RCTs were led by Dan Jackson of MRC Cambridge and involved the adaptation/extension of methods developed by Steyerberg; the motivating question concerned explicating the necessary assumptions and critically assessing their appropriateness. Dan Jackson is continuing to work on the adaptation of the Steyerberg method and its extensions and on related work with the Fibrinogen Studies Collaboration [published in Statistics in Medicine (2009) – Systematically missing confounders in individual participant data meta-analysis of observational cohort studies]; he has found the discussions at SAMSI useful in thinking further about applying Steyerberg-type methods.

Also of note, Elizabeth Stuart – partly as a result of the SAMSI meeting – is planning

to organize a 2010 JSM Invited Session on methods for assessing generalizability.

3.4 ROC and survival analysis group

This working group addressed the issues of synthesizing evidence from independent studies

about diagnostic test accuracy or survival times – both of which entail individual study

and pooled curves or distributions. Both parametric and non-parametric approaches were

of interest.


They currently have one paper in preparation, with Jean-Francois Plante of University of Toronto, Vanja Dukic of University of Chicago, David Dunson of Duke University and possibly Dalene Stangl of Duke University, on Bayesian non-parametric meta-analysis of ROC curves. The abstract is as follows: Most standard meta-analytic methods combine information on a single parameter, such as treatment effect. For meta-analysis of diagnostic test accuracy, measures of both sensitivity and specificity from different trials are of meta-analytic interest, summarized as a bivariate measure of accuracy, or possibly as a receiver operating characteristic (ROC) curve. Motivated by an analysis of serum progesterone tests for diagnosing non-viable pregnancy, we develop simple fixed-effects and random-effects summary ROC curve estimators, based on a flexible density estimation technique. We compare the performance of the new estimator to the simpler bivariate normal summary ROC estimator.

3.5 Network meta-analysis group

Network meta-analysis refers to the situation in which studies brought together for synthesis

have compared different subsets from a finite collection of treatments. By exploiting ‘chains’

of evidence, such as making inference on treatment A vs treatment B by contrasting studies

of A vs C with studies of B vs C, a network of interrelationships among the studies is created.

These meta-analyses are often, and perhaps more appropriately, called multiple treatments

meta-analyses (MTM), or mixed treatment comparisons (MTC) meta-analyses.
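The simplest ‘chain’ of evidence can be written down directly. The sketch below assumes the A vs C and B vs C summaries come from independent sets of studies and are measured on an additive scale (for example, log odds ratios), so that the indirect A vs B effect is their difference and the variances add. The function name and numbers are hypothetical illustrations.

```python
import math

def indirect_comparison(d_ac, se_ac, d_bc, se_bc):
    """Indirect A-vs-B effect from A-vs-C and B-vs-C summaries (assumed independent)."""
    d_ab = d_ac - d_bc                          # effects on an additive scale
    se_ab = math.sqrt(se_ac**2 + se_bc**2)      # variances add under independence
    return d_ab, se_ab

# Hypothetical pooled log odds ratios versus the common comparator C.
d_ab, se_ab = indirect_comparison(d_ac=-0.5, se_ac=0.3, d_bc=-0.2, se_bc=0.4)
print(f"indirect A vs B: {d_ab:.2f} (SE {se_ab:.2f})")
```

The indirect estimate is less precise than either of its inputs, which is one reason direct and indirect evidence are kept distinct when both are available.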

The working group tackled a variety of problems associated with network meta-analysis.

Particular progress was made on methods for illustrating the network graphically. If every

study makes a pair-wise comparison – i.e. includes exactly two treatments – then simple

graphs with nodes for treatments and lines for comparisons are sufficient to represent the

dataset. However, if some studies include three or more treatments, as is typically the case,

then such representations do not adequately illustrate the important difference between

within-study (direct) comparisons and across-study (indirect) comparisons. In this case,

comparisons that come from the data are not independent. The group proposed a diagram

in which the distinction is made by using separate lines or shapes for different study designs.

Since the workshop, some progress has been made in using graph theory to examine ‘loops’

of evidence in the network.

The importance of separating direct from indirect evidence lies largely in investigating whether the network of evidence is coherent. Incoherence is defined informally as a mismatch between direct and indirect sources of evidence, or between two different indirect sources of evidence, on any particular comparison. It is a special kind of heterogeneity between studies that focuses on between-design differences rather than between-study differences. Two statistical methods for tackling incoherence have been proposed, by T. Lumley (Network meta-analysis for indirect treatment comparisons, Stat Med 2002; 21: 2313-2324) and Lu and Ades (Assessing evidence inconsistency in mixed treatment comparisons, JASA 2006; 101: 447-459). The former adds a random effect across all studied pair-wise comparisons and tests whether the variance of this random effect is zero. The Lu and Ades approach adds a random effect across each independent evidence cycle, and tests whether the variance of this random effect is zero. There are fewer independent evidence cycles than there are comparisons. However, counting the number of independent evidence cycles is not trivial when there are multi-arm studies. The group discussed other approaches, such as fitting a model that assumes coherence and comparing deviances with a ‘free’ model that makes no assumptions about chains of evidence. Three of the workgroup members (Dan Jackson, Jessica Barrett, Julian Higgins; working with Ian White) have a paper in preparation about some of these ideas.
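When every study has exactly two arms, the number of independent evidence cycles can be sketched as the cycle rank of the comparison graph: distinct comparisons minus treatments plus connected components. The code below is an illustration under that restriction only; it deliberately does not handle multi-arm studies, which is the non-trivial case noted above, and the function name and example networks are hypothetical.

```python
def independent_cycles(comparisons):
    """Cycle rank (edges - nodes + components) of a two-arm comparison network."""
    # Treat A-vs-B and B-vs-A as the same comparison and drop duplicates.
    edges = {frozenset(pair) for pair in comparisons}
    nodes = set().union(*edges) if edges else set()
    parent = {t: t for t in nodes}

    def find(t):
        # Union-find root lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for edge in edges:
        a, b = tuple(edge)
        parent[find(a)] = find(b)
    components = len({find(t) for t in nodes})
    return len(edges) - len(nodes) + components

# Hypothetical network: a closed A-B-C triangle plus a C-D spur.
network = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
print(independent_cycles(network))
```

A star-shaped network in which every treatment is compared only with a common control has no cycles, so there is nothing against which to check coherence.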

The working group also discussed technical issues about making inferences in network

meta-analyses. Restricted maximum likelihood is often used; Lumley uses the function lme

in R with a slightly unusual construction for random-effects variances. Inference is less

straightforward with multi-arm trials or logistic models. We explored profile likelihood,

and inverting the observed information matrix. Plans were made to investigate the use of

conditional likelihood, integrated likelihood, and inverting expected information matrix.

3.6 Meta genetics group

This working group addressed issues of multiple sources of evidence for genetics, focusing on Gene Expression Meta Analysis, Meta Analysis for Genetic Association Studies and Accounting for Dependence in High-Dimensional Predictors. Since this summer’s program, the meta-genetics working group has been quite productive. The active core of this group consists of David Dunson at Duke University, Fei Zou at UNC Biostatistics and Fei Liu at the University of Missouri Columbia. They have submitted the following paper to Biometrics: Liu, F., Dunson, D.B. and Zou, F. (2008). High-dimensional variable selection in meta analysis for censored data. Biometrics, submitted. In addition, they have another paper under way: Liu, F., Dunson, D.B. and Zou, F. (2009). Annotated relevance vector machine with application to polymorphism selection. In preparation.

The following summary from the group highlights some of the work undertaken to date, which represents the most exciting research happening in the program and provides a nice example of both a generalized concept of multiple sources of evidence and the replacement of a single strategy of no pooling, complete pooling or partial pooling of studies with an adaptive strategy where the degree of pooling is individually chosen for different coefficients.

In large scale genetic epidemiology studies that collect massive numbers of single nucleotide polymorphisms (SNPs) or gene expression measurements, it is extremely challenging to identify genes that are predictive of disease phenotypes given the modest sample size of most studies relative to the number of genes. Due to concern about false positive rates, it is crucial to replicate findings about disease genes in multiple studies. Standard approaches use multistage testing, in which one tests whether genes identified in initial studies are significant in follow-up studies. This strategy is shown to have major disadvantages in terms of power and type I error rates compared with an innovative approach developed in the SAMSI meta-genetics working group based on simultaneous selection through a multi-task relevance vector machine (MT-RVM) procedure. This approach, which is related to methods used in signal processing, borrows information across studies in the degree of shrinkage of gene-specific coefficients towards zero. The method is scalable to large numbers of genes, can accommodate censored data commonly collected in disease recurrence studies, and clearly outperforms common competitors, such as Lasso. In addition, the meta-genetics group is currently pursuing a new procedure that allows information on gene function annotation to be incorporated, while automatically learning how predictive each annotation source is. The annotated relevance vector machine (aRVM) procedure should be very widely useful in machine learning and other applications beyond genetics, as it allows an adaptive targeted search for important predictors, enabling an effective reduction in dimensionality and a mechanism for borrowing information across disparate studies.

4 Post Program Activities

1. At the Eastern North American Region 2009 meeting of the International Biometric Society most of the working group leaders and some of the participants presented their research. In particular, a session “Advances in Meta-Analysis” was organized, based on the program, with presentations by Eloise Kaizar, Robert Platt, Vanja Dukic, and Dalene Stangl.

2. Professional Courses:

• Keith O’Rourke gave a two-day course on meta-analysis for Statisticians and Students at the University of Alberta in July 2008.

• Keith O’Rourke gave an Advanced Meta-analysis Short Course at the University of Alberta.


Final Report

2009 Summer SAMSI Program on Psychometric Modeling and Statistical Inference

July 7-17, 2009

1 Scientific Overview and Program Objectives

Much of current psychometric research involves the development of novel statistical methodology to model educational and psychological processes, and a wide variety of new psychometric models have appeared over the last quarter century. Such models include (but are not limited to) extensions of item response theory (IRT) models, structural equation models (SEMs), cognitive diagnosis models, and generalized linear latent and mixed models (GLLAMM). The development of several of these models has been spearheaded by quantitative psychologists, a group of researchers who find their academic homes primarily in psychology and education departments. During the same period, very similar models and methodologies were developed, often independently, by academic statisticians residing in mathematics and statistics departments. The lack of interaction between these two groups has resulted in a substantial duplication of effort and, more importantly, a delay in the development of methodology crucial to both fields. The goal of this program was to bring researchers from both areas together to explore possible avenues for mutual collaboration.

2 Program Leadership

The program was organized by Valen Johnson and Richard Swartz (The University of Texas M.D. Anderson Cancer Center), with assistance from Jimmy de la Torre (Rutgers University), Charles Cleeland (The University of Texas M.D. Anderson Cancer Center) and David Banks (Duke University).


3 Program Attendance

The program attracted 71 participants, of which 36 were students or new researchers.

4 Program Scope and Timing

The program was conducted during the two-week period between July 7 and July 17, 2009 (Tuesday-Friday, July 7–10, and Monday-Friday, July 13–17). The formal program included the following activities.

5 Program Activities

Week 1: July 7-10

The Psychometric Program kicked off on Tuesday morning, July 7, 2009. Tutorials and invited speakers are listed below.

Tuesday, July 7

9:00-12:00 Introduction to Item Response Theory. Yanyan Sheng

2:00-3:00 IRT PRO demonstration. David Thissen

Wednesday, July 8

9:00-12:00 A Nonlinear Mixed Models Approach to IRT. Mark Wilson, Frank Rijmen.

2:00-4:00 Topics in Response Time Analysis. Mario Peruggia and Trish Van Zandt

Thursday, July 9

9:00-12:00 An Introduction to Cognitive Diagnostic Models. Mathias von Davier.

2:00-3:00 CDM Pitfalls and Recommendations. Sandip Sinaray.


Friday, July 10

9:00-12:00 An Introduction to Rater Models. Matthew Johnson.

2:00-4:00 Process de-association and signal detection. Dongchu Sun and Jun Lu.

Week 2: July 13-17

Peer Review Working Group

The Peer Review WG met during the second week of the program. Talks presented during this week are listed below.

Monday, July 13

10:00-11:00 An Overview of Journal of the American Statistical Association Article Reviews. David Banks.

1:30-2:30 An Overview of NIH R01 Peer-Review Scores. Valen E. Johnson.

Tuesday, July 14

10:00-11:00 A Bayesian Approach to Ranking and Rater Evaluation: An Application to Grant Reviews. Jing Cao.

1:30-2:30 A Bayesian Hierarchical Model for Multi-rater Data with Fine Scales. Song Zhang.

Friday, July 17

9:00-12:00 Discuss draft of white paper.

12:00 Adjourn.

Patient Reported Outcome Working Group (PRO WG)

The PRO WG met during the second week of the program and the following talks were presented.

Monday, July 13


10:00-11:00 Practical Issues in the use of Patient Reported Outcomes. Charlie Cleeland.

11:00-12:00 Issues in longitudinal analysis of Patient Reported Outcomes. Bryce Reeves.

2:00-4:00 Group discussion of week’s agenda. Charlie Cleeland and Bryce Reeves.

Tuesday-Friday

Group collaboration on consensus agenda.

Friday, July 17

9:00-12:00 Discussion of the draft of white paper.

1:00-4:00 Finalize white paper.

4:00 Adjourn.

Applications and Challenges of Cognitive Diagnostic Models (CDM WG)

The CDM WG also met during the second week of the program. The following seminars were presented.

Monday, July 13

10:00-12:00 Discussion of cognitive diagnostic models and identification of agenda by working group members. Jimmy de la Torre

Tuesday-Friday

Group collaboration on consensus agenda.

Friday, July 17

9:00-12:00 Discussion of the draft of white paper.

1:00-4:00 Finalize white paper.

4:00 Adjourn.


6 Program Outcomes

The goal of this program was to stimulate collaborations between researchers in the psychometric and statistical communities. The desired outcome for the program was a concrete list of specific research directions that would facilitate future methodological development in related psychometric/statistical models, and the education of program participants with regard to the current state of research in each program area. White papers summarizing research in the areas of cognitive diagnostic models and patient reported outcomes were drafted, and a review paper on peer review for articles submitted to the Journal of the American Statistical Association was presented. These documents are included as attachments to this report.

7 Specific Information Requested

2 Government and Industrial Participation: None

3 Publications: Three white papers were produced during and after the workshop. These reports are included as attachments to this report. All three reports are in the process of being submitted for publication in refereed journals or have been submitted.

4 Diversity: The workshop was attended by a diverse group of participants, including individuals from psychometrics, psychology, statistics, and symptom research departments. Yanyan Sheng, a junior female researcher from Southern Illinois University, gave the first tutorial; Trish Van Zandt from Ohio State University gave another tutorial. Richard Swartz, a junior instructor from M.D. Anderson Cancer Center, served as co-organizer.

5 No external support was obtained.


An Analysis of the JASA Editorial Process

David Banks, Duke University

Jing Cao, Southern Methodist University

Sourish Das, Duke University and SAMSI

Valen Johnson, University of Texas, M.D. Anderson Cancer Center

Jun Lu, American University

Michael McGill, Virginia Tech

Paul Speckman, University of Missouri-Columbia

Yu Yue, Baruch College, CUNY

Song Zhang, University of Texas, Southwestern Medical Center

Abstract

It is important for research journals to minimize bias and delay. This study, au-

thorized by the American Statistical Association Committee on Publications, analyzes

data on review times and decisions for the Journal of the American Statistical Asso-

ciation. The results show moderate effects that are attributable to both editors and

associate editors; furthermore, there is interpretable structure in the time to first deci-

sion, and evidence of association between the duration of the review and the ultimate

decision. Finally, an effort to examine institutional bias (i.e., whether the prestige of

the contact author’s institutional affiliation is related to acceptance probability) finds

a positive relationship, but this could easily be explained by confounding factors.

Keywords: Ordinal probit model, publication bias, refereeing.

1 Introduction

Authors and editors have longstanding concerns about the fairness and efficiency of editorial

processes. Further, the statistical profession has a duty to assess the consistency of its review

process, to ensure that good work is quickly reviewed and published. This paper provides an

exploratory analysis of data from the Journal of the American Statistical Association (JASA),

in order to study how well its systems work.

The data for this research were gathered from the on-line Allentrack manuscript handling

system, which was used to process JASA submissions beginning in 2005. We captured

information on decisions made by the six editors whose terms began after the institution of

Allentrack, and we analyzed data from all three JASA sections (Applications & Case Stud-

ies, Theory & Methods, and Reviews). Section 2 describes the data, Section 3 describes the

analysis, and Section 4 summarizes the results.

Our analyses focus upon bias and speed. In terms of bias, one motivation for this work

is the persistent debate on JASA’s use of double-blind refereeing. In discussions within the

Committee on Publications of the American Statistical Association (ASA), some members

maintain that the review process could be improved if referees knew the identities of the

authors—and there are sound (Bayesian) reasons for thinking this is true. But other members

point to historical evidence, among other journals, that single-blinding invites bias. A third

opinion holds that even if double-blinding increases noise and delay, it is still valuable for the

ASA to send a clear signal that gender, nationality, and institution are irrelevant in evaluating

the quality of work. And some iconoclasts argue for no-blind refereeing, in which the authors

are known to the referees and the referees are known to the authors. They claim such a

no-blind system would promote higher-quality reviews and greater courtesy, and would allow referees to

be publicly recognized for good work.

As background on the bias discussion, Snodgrass (2006) reviews several kinds of bias in

single-blind systems (i.e., reviews in which the identity of the author is not concealed from

the referees). One early study (Goldberg, 1968) gave readers two versions of the same article,

except that one listed the author as John McKay and the other as Joan McKay. John got

significantly better ratings. Similarly, Blank (1991) found that for The American Economic

Review, borderline papers by men were more likely to be accepted than borderline papers

by women. A different kind of bias was shown by Gordon (1980); reviewers from second-tier

schools tended to favor papers by authors at second-tier schools. And Peters and Ceci (1982)

resubmitted 12 papers from top schools to journals that had previously published them, but

with the affiliation of the author changed to a second-tier school. Three were detected, eight

were rejected for “serious methodological flaws,” and one was accepted.

In response to these documented biases, many journals adopted double-blind refereeing,

so that neither the author nor the referees know each other’s identities (but, of necessity,

the editor and associate editor must know both). JASA was among these, prompted in part

by reports from the IMS New Researchers Committee (Altman et al., 1991) and Cox et al.

(1993). One goal in this study is to decide whether there is evidence that the ASA policy

has been successful.

The second issue investigated in this study was the time to publication. There is a

widespread perception that editorial decisions are made more rapidly in computer science

and bioinformatics than in statistics. Additionally, there can be long lags between final

acceptance and the appearance of an article. To the extent this is true, then our journals

are disadvantaging statisticians. And even if the perception is false, it still does damage to

the field—our best new researchers, facing the time pressure of tenure, will avoid statistics

journals and seek to publish in strong domain science journals. Of course, senior researchers

also want their work to appear quickly and to have influence.

The editors of JASA have a duty to minimize the possibility of bias and delay, while at

the same time ensuring that accurate decisions are made. To that purpose, they obtained

permission from the ASA Committee on Publications to harvest anonymized data from Al-

lentrack, JASA’s on-line manuscript handling system. Those data were provided to the 2009

Psychometrics Program sponsored by the Statistical and Applied Mathematical Sciences

Institute. A working group in that program analyzed those data, looking for problems with the

publication process.

2 The Review Process and the Data

The JASA review system typically works as follows, although there are many exceptions.

First, an author submits the paper on-line through the Allentrack manuscript handling

system. The author provides two copies of the paper, one of which is anonymized, and

indicates which of several submission categories is pertinent (usually the category is one of the

three sections of JASA: Applications & Case Studies, Theory & Methods, and Reviews, but

other categories include corrections, letters, discussions, and the ASA presidential address).

At this time the author also makes several commitments: that the paper is original work,

that it is not under consideration by another journal, and that the co-authors will be apprised

of the outcome.

Once the paper is submitted, the editorial administrator does a quality control check,

confirming the anonymization and that all of the figures and tables are readable. That usually

takes a day or so. Then the paper is given (electronically) to the appropriate editor.

The editor reviews the paper. In some cases there is a quick decision to reject. In

others, the paper is sent to an associate editor. Sometimes it takes several days or weeks

to secure an associate editor. The associate editor is usually asked to complete the review

process expeditiously so that a first decision can be sent to the authors within two months

of submission. The associate editor may make a recommendation without further review, or

may decide to contact referees (usually two or three) and ask them to assess the paper.

It can take days or weeks to secure referees. Many decline to review, which means the

search process must be restarted. (And it is entirely professional to decline to review—a

person may be too busy, or conflicted, or may not feel competent to handle the manuscript.)

Usually, the referees are asked to respond within six weeks, but sometimes it is necessary to

negotiate extensions or to ask for a faster turnaround.

The Allentrack system automatically sends the referee a reminder a week before the

review is due. It sends another on the day it is due, and (if necessary) several more over

the following weeks. Unfortunately, Allentrack does not automatically notify the associate

editor when a review runs late—this must be tracked actively rather than passively, and so it

is possible to overlook slow referees. Similarly, Allentrack does not notify the editor when an

associate editor is being slow—the editor must regularly review the roster of papers, looking

for problems.

Once all the referee reports have been submitted to Allentrack, the associate editor is

notified and asked to write a report. That report is based on their own reading of the

manuscript, as informed by the opinions of the referees. The associate editor submits the

report, along with confidential comments and a recommendation, to the editor through

Allentrack.

When the editor receives the reports from the referees and the associate editor, the editor

reviews the manuscript (usually this is a very careful reading, especially if the reports disagree

or if they raise a substantive issue). Then the editor writes a decision letter and sends it

to the contact author electronically, again through Allentrack. Before the letter is sent, the

editorial administrator checks that there has been no accidental breach of the anonymity of

the referees or the associate editor; this check can take a day or two to complete.

Depending upon the editor’s decision, the authors may revise and resubmit the manuscript

for further consideration. The second review process usually returns the manuscript to the

original associate editor and often the original referees, but this varies according to circum-

stance. It is possible for the revision process to iterate several times before termination, at

which time the editor renders a final decision.

In the course of managing the back-and-forth in the review process, Allentrack automat-

ically records data on when the manuscript changes hands, when specific review tasks are

completed, and who has responsibility for those tasks. The ASA Committee on Publications

allowed David Banks, the 2007-2009 JASA Coordinating Editor, to compile and anonymize

the data on all JASA submissions to editors whose terms started after April 1, 2006 (i.e.,

those editors who handled all of their manuscripts within the Allentrack system, which came

on-line in 2005). The ending date for the data collection was March 18, 2009; this date

allowed time for cleaning and anonymization before the start of the SAMSI Psychometrics

Program. These data were given to a SAMSI working group for analysis, and have also been

posted on www.amstat.org/publications/tas.cfm for public use.

The data consist of the following fields:

Manuscript Type. This indicates whether the manuscript was submitted to the Applications

& Case Studies section, the Theory & Methods section, the Reviews section, or

(much more rarely) was an invited discussion, rejoinder, comment, letter, or correction

note. The classifications in the latter categories are inconsistent, and depend on how

the authors labeled their contribution at the time of submission; for example, some

discussions are classified as regular submissions to one of the three JASA sections.

Total Number of Days. This is the total time from the initial submission to the final

decision. It includes the time that the authors spent doing revisions. Some entries

are blank, corresponding to manuscripts for which the final decision has not yet been

made.

Editor. This is an anonymized code for the editor. There are six editors represented in the

data. In nearly all cases an editor handles manuscripts for just one of the three JASA

sections, but, very rarely, an editor may handle a manuscript for a different section

(e.g., when the usual editor has a conflict of interest).

Final Decision. This is the final decision on the manuscript. The possible values are “ac-

cept”, “reject”, and “withdrawn without prejudice”. The last designation is probably

used by different editors in different ways, but it often indicates a manuscript that

the authors wish to withdraw, or which has been submitted to the incorrect section

of JASA, or which needs substantial rework before a review process can be begun.

In some cases it may be a politic rejection. In rare instances, the First Decision is

“reject” and the Final Decision is “accept”—this can happen when, for example, an

author challenges the initial decision and the editor is persuaded to reverse it.

AE. This is an anonymized code for the associate editor. There are 154 associate editors

in the data set. The associate editors are nested within editors; each editor has his or

her own set of associate editors, and this set often changes a little during their term.

Sometimes an editor acts as an associate editor. For many manuscripts this entry is

blank, because the editor made a decision without involving an associate editor.

First Decision. The possible values for the first decision are “reject”, “acceptable with

major revision”, “acceptable with moderate revision”, “acceptable with minor revi-

sion”, “accept in present form”, and “withdrawn without prejudice”. Essentially all

research papers are rejected or require major revision after the first round of review.

The papers that are accepted or accepted with minor revision at the first decision are

typically letters, corrections, and the annual ASA presidential address. Other possible

exceptions are cases in which the paper was initially reviewed before the Allentrack

system was adopted, but the revision was submitted after, or other cases in which,

for reasons of error or convenience, a revision was handled as if it were a new

manuscript. Many of the entries are blank, corresponding to papers that were still

under first review as of March 18, 2009.

Days to First AE Request. This variable is the time between the submission of the

manuscript and the time at which the editor decides to seek an associate editor, if one

is wanted. It includes the time needed for the quality control check by the editorial

administrator, and the editor’s time to make an initial assessment.

Days to AE Secured. This is the time from submission to the agreement by an associate

editor to handle the manuscript. Usually the associate editor accepts quickly, but

sometimes an associate editor is too busy or conflicted, and in that case there can be

additional delay.

Days to First Referee Assigned. This is the time from submission to the request for

the first referee, if one is selected. It provides a measure of how quickly the associate

editor is working. Of course, the first referee may decline, requiring the associate editor

to seek several referees, which lengthens the assignment process.

Days to First Referee Secured. This is the time from submission until the first referee

agrees to handle the manuscript.

Days to Final Referee Secured. This is the time from submission until the final referee

agrees to handle the manuscript. Usually there are at least two referees.

Days to Final Referee Comments Secured. This is the time until the last referee re-

port is received. In rare cases, this appears to be inconsistent with other measures,

such as the time of the editor’s decision. Such inconsistency could arise if, for example,

a referee was very slow and the associate editor decided to proceed without their input,

only later to receive a report from the referee.

Days to AE Recommendation. This is the time at which the associate editor sends their

recommendation to the editor.

Days to First Decision. This is the time required for the editor to submit the first de-

cision.

Days to First Decision Sent to Author. After the editor submits the first decision,

the editorial administrator reviews it to ensure there has been no accidental compromise

of anonymity. This variable indicates the time required to complete that check. Usually

this takes only a day or less.

Institution Rank. This is a binary variable. A “1” indicates that the contact author is at

one of the universities whose graduate program in statistics is ranked among the top

twenty.
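
As an illustration only, the fields described above could be held in a small record type such as the following Python sketch. The class name, attribute names, and example values are hypothetical and are not taken from the posted dataset; only a subset of the fields is shown, and blank entries are represented by `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ManuscriptRecord:
    """One row of the review data; all names here are hypothetical."""
    manuscript_type: str                  # e.g. "Theory & Methods"
    editor: str                           # anonymized editor code, e.g. "E1"
    institution_rank: int                 # 1 if contact author is at a top-twenty program, else 0
    ae: Optional[str] = None              # blank when the editor decided without an AE
    first_decision: Optional[str] = None  # blank while still under first review
    final_decision: Optional[str] = None  # blank until a final decision is made
    days_to_first_decision: Optional[int] = None
    total_days: Optional[int] = None

    def editor_only(self) -> bool:
        """True when no associate editor was involved."""
        return self.ae is None

    def under_first_review(self) -> bool:
        """True when no first decision has been recorded yet."""
        return self.first_decision is None


# Made-up example values, for illustration only.
pending = ManuscriptRecord("Theory & Methods", "E1", 1)
decided = ManuscriptRecord("Applications & Case Studies", "E3", 0,
                           ae="AE17", first_decision="reject",
                           days_to_first_decision=12)
print(pending.under_first_review(), decided.editor_only())
```

Keeping blank entries as `None` rather than zero preserves the distinction between a decision that took zero days and a decision that has not yet been made.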

Allentrack records additional information, but these are the fields that seemed most pertinent

to our concerns, and the ones contained in the posted dataset. Some of these fields are not

used in the analyses detailed in this paper, even though the SAMSI working group did

exploratory analysis on all of them.

Unfortunately, it was not practical to include gender information. In many cases it was

difficult to determine the gender of the contact author from the name alone. Also,

most papers have multiple authors, and it was unclear how to code these. The same issue

arose when classifying Institution Rank; in that case we decided to use the institution of the

contact author (but sometimes the contact author is not the first author).

3 Statistical Analysis

We study the effects due to editors and associate editors (AEs), and the effect of the rank of

the contact author’s institution, relating these to both review time and the editorial decision

that is made. We also provide general summary statistics about the process.

Overall, 27.2% of the manuscripts were sent to the Applications & Case Studies section

(A&CS), 70.9% were sent to Theory & Methods (T&M), and 2.0% were sent to the Reviews

section (Rev). Figure 1 shows the numbers that went to each of the six editors in the

database, broken out by their first decision categories. We have excluded the “accept”

category from this plot, because it is essentially unused as a first decision—the rare exceptions

include some discussions, rejoinders, letters, corrections, the ASA presidential address, and

manuscripts whose initial submission occurred before the Allentrack system was started, but

whose revision came in after (and thus the first decision in Allentrack was actually a second

or third decision in terms of the editorial process).

As Fig. 1 shows, Editors E2 and E4 handled relatively few manuscripts. These two were

the editors of the JASA Reviews section. Since there was little data, and since the editorial

decision-making for a review paper may be different from that used in a research paper,

editors E2 and E4 were omitted from the mathematical modeling described later.

Figure 2 shows the days to first decision, separated into cases in which the first decision

was made solely by an editor, and cases in which an AE was involved. The data are pooled

for all six editors, including E2 and E4. The most notable feature is the bimodality, which is

clearly attributable to the involvement of an AE. We believe (and this is supported by Fig.

3) that decisions to reject are made relatively quickly, either by an editor or by an AE who

does not feel that the paper should be sent to referees. But when an AE chooses to seek

referee input, then the process takes longer, with the first decision typically occurring about

90 days (3 months) after submission.

Figure 3 displays the time to decision, broken out in two ways. The left panel distinguishes

the four kinds of decisions: reject, acceptable with major, moderate, or minor revisions—but

it excludes the rare accept decision. The right panel shows the time to first decision according

to the JASA section that handled the paper. The left panel corroborates the impression that

rejections happen relatively quickly; the right panel shows large apparent differences in the

distributions of review times among the sections.

As a final summary, Fig. 4 shows a loess estimate of the acceptance rate as a function

of review time, pooled across all six editors. In this plot, “acceptance” means that the

first decision was not a rejection (i.e., either a straight acceptance, or a request for major,

moderate, or minor revision). The interpretation is that rejections happen quickly, and that

after the first month, the probability of some kind of acceptance increases steadily, with a

notable trend-change at about 90 days. This accords with the common editorial practice

of allowing referees two months for review, after which reminders are sent; these reminders

often prompt quick responses.

The following subsections fit three models that provide a more detailed understanding of

editorial and AE differences, as well as possible bias associated with institutional rank. The

analyses are based on the data for the 1971 manuscripts that were handled by editors E1,

E3, E5 and E6 (now indexed by j = 1, 2, 3, 4, respectively).

3.1 Editorial Rejection

Model 1 fits a probit model to the probability of editorial rejection without AE review. The

four editors are treated as random effects, and the institutional rank (top-tier or not) is

treated as a fixed effect.

We used the binary variable yi = 1/0 (i = 1, . . . , n1; n1 = 799) to indicate whether the ith

manuscript was directly rejected by an editor or not. We defined the indicator variable Eij

to be 1 if the ith manuscript is handled by the jth editor, and 0 otherwise. The institution

rank for the author of manuscript i was denoted by Ii. With the introduction of the latent

variable xi, Model 1 can be written as

\[
x_i = \beta + \sum_{j=1}^{J} \eta_j E_{ij} + \gamma I_i + e_i, \qquad e_i \sim N(0, 1), \tag{1}
\]

where the decision corresponds to

\[
y_i = I(x_i > 0).
\]

Here β is the intercept, ηj denotes the random effect of the jth editor, γ is the fixed institution

effect, and ei is assumed to follow a standard normal distribution to anchor the scale of

measurement for the latent variables {xi} (cf. Johnson and Albert, 1999). A locally uniform

prior is assigned for β and γ. We assume a two-stage prior for the random effects ηj,

\[
\eta_j \mid \sigma_\eta^2 \sim N(0, \sigma_\eta^2), \qquad \sigma_\eta^2 \sim C^{+}(0, 1), \tag{2}
\]

where $C^{+}(0, 1)$ denotes the half-Cauchy distribution with density $2/[\pi(1 + x^2)]$ for $x > 0$.

To obtain the marginal posterior distribution on parameters of interest, we implemented a

MCMC algorithm coded in R. We ran multiple chains with different starting points to monitor

the convergence of the MCMC updates. In addition, quantile-quantile plots (cf. Johnson,

2007) were constructed to assess model fit. After a burn-in period of 20,000 updates, 30,000

subsequent updates were used for model inference. The same MCMC procedures were also

performed for the two models discussed in the following subsections.
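
To make the structure of Model 1 concrete, the following Python sketch simulates data forward from the latent-variable formulation: a half-Cauchy draw for the editor-effect variance, normal editor effects, and a thresholded latent score. It is not the MCMC algorithm used in the analysis (which was coded in R and is not reproduced here); the function names, the argument values, and the uniform assignment of manuscripts to editors and ranks are assumptions made purely for illustration.

```python
import math
import random


def draw_half_cauchy(rng: random.Random) -> float:
    """Inverse-CDF draw from C+(0, 1), using F(x) = (2/pi) * atan(x) for x > 0."""
    return math.tan(math.pi * rng.random() / 2.0)


def simulate_model1(n, n_editors, beta, gamma, rng):
    """Simulate (editor, rank, y) triples from the latent-variable probit model."""
    sigma2_eta = draw_half_cauchy(rng)                  # variance of the editor effects
    eta = [rng.gauss(0.0, math.sqrt(sigma2_eta)) for _ in range(n_editors)]
    rows = []
    for _ in range(n):
        j = rng.randrange(n_editors)                    # editor handling manuscript i (assumed uniform)
        rank = rng.randrange(2)                         # I_i: 1 = top-tier, 0 = otherwise (assumed uniform)
        x = beta + eta[j] + gamma * rank + rng.gauss(0.0, 1.0)  # latent x_i
        rows.append((j, rank, 1 if x > 0 else 0))       # y_i = I(x_i > 0)
    return eta, rows


# Hypothetical inputs: beta = 0.0 and gamma = -0.5 are illustrative, not estimates.
eta, rows = simulate_model1(n=500, n_editors=4, beta=0.0, gamma=-0.5,
                            rng=random.Random(2009))
print(sum(y for _, _, y in rows) / len(rows))           # simulated share with y_i = 1
```

Because the error term has unit variance, the editor effects and the institution effect are all expressed on the scale of the latent quality score, which is what allows the later comparison of the editor standard deviation with 1.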

Table 1 provides a summary of the posterior mean and standard deviation of the parame-

ters in Model 1. The posterior mean of the variance of the random effect for editor was 0.19,

while the posterior mean of the standard deviation was 0.35. The posterior interquartile

range for ση was 0.17 to 0.44. These figures suggest that the (root mean square) editor effect

on the assessment of manuscript quality was approximately 35% of the standard deviation

of true manuscript quality, which in this model is (by definition) equal to 1. We regard

the magnitude of the editor effect to be well within reasonable bounds of human judgment,

although it is by no means negligible. The estimate of the institution effect, γ, is negative,

suggesting that manuscripts by authors from top-tier institutions have a better chance of

being sent to an AE for full review. Of course, the institution effect may be confounded with

a number of different factors, ranging from the possibility that top-tier institutions attract

more capable researchers who produce high-quality papers, to the possibility that editors

have an implicit bias in favor of authors from top-tier institutions. Without additional in-

formation, the cause of the institution effect estimated in Model 1 cannot be unambiguously

identified.
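
The size of these effects is easier to judge on the probability scale. The sketch below converts the reported posterior means (γ = -0.62 and an editor-effect standard deviation of 0.35) into probabilities of editorial rejection under Model 1. The intercept β is not reported in this paper, so the value used here is a hypothetical placeholder, and the resulting probabilities illustrate only the mechanics, not the actual rejection rates.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_editorial_reject(beta: float, eta_j: float, gamma: float, top_tier: int) -> float:
    """P(y_i = 1) = Phi(beta + eta_j + gamma * I_i) under Model 1."""
    return std_normal_cdf(beta + eta_j + gamma * top_tier)


BETA_ASSUMED = 0.0   # hypothetical intercept; not reported in the source
GAMMA = -0.62        # posterior mean from Table 1
SD_ETA = 0.35        # posterior mean of the editor-effect standard deviation

for label, eta_j in [("typical editor", 0.0), ("editor one SD stricter", SD_ETA)]:
    p0 = p_editorial_reject(BETA_ASSUMED, eta_j, GAMMA, top_tier=0)
    p1 = p_editorial_reject(BETA_ASSUMED, eta_j, GAMMA, top_tier=1)
    print(f"{label}: not top-tier {p0:.3f}, top-tier {p1:.3f}")
```

Changing `BETA_ASSUMED` shifts all four probabilities together; the gap between the two institution categories depends on where the baseline sits on the normal curve.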

Table 1: Summary statistics for Model 1

             σ²_η    γ
post. mean   0.19    -0.62
post. std.   0.43    0.10

3.2 Decision Conditional on AE Review

Model 2 considers only manuscripts that were sent for AE review (which often, but by no

means always, entails referee input). We fit an ordinal probit model in which the three ranked

responses are the decisions to reject, to require major revision, and to require moderate or

minor revision. (The last two decisions were combined into a single category because there

is no clear guidance to distinguish them and AEs may have used them interchangeably.)

The explanatory variables are AE (a random effect), institutional rank (a fixed effect),

and review time. Review time is expressed in the form of a linear spline with a knot at 90

days, allowing nonlinearity and respecting the information seen in Fig. 2.

For the 1172 manuscripts that were handled by AEs (with or without input from referees),

we let yi = 1, 2, 3 (i = 1, . . . , n2; n2 = 1172) code the first decision on manuscript i. A

value of 1 indicates that the manuscript was “rejected”, a 2 that it was “acceptable with

major revision”, and a 3 that it was “acceptable with moderate or minor revision”. We

defined an indicator variable Aik to be 1 if the ith manuscript was handled by the kth AE

(k = 1, . . . , K; K = 152), and 0 otherwise. Variable Ti denotes the number of days to the

first decision. We used an ordinal probit model with latent variable xi,

\[
x_i = \beta + \sum_{k=1}^{K} \alpha_k A_{ik} + \gamma I_i + \theta_1 T_i + \theta_2 (T_i - 90)_{+} + e_i, \qquad e_i \sim N(0, 1), \tag{3}
\]

where

\[
y_i = l \iff c_{l-1} < x_i \le c_l
\]

to model the effects of AE, institution, and decision times in determining the decision reached

for a manuscript.

The quantities cl (l = 0, · · · , 3) denote ordinal category cutoffs, where c0 = −∞, c1 = 0,

and c3 = +∞. We assume c2 to be uniform on the real line subject to the constraint

c1 < c2 < c3. As in Model 1, β is the intercept and γ is the fixed institution effect. We use

αk to denote the random effect of the kth AE. In addition, (Ti−90)+ = max(0, Ti−90), and

θ1 and θ2 are the regression slopes for the linear spline of Ti. The spline knot at 90 days was

based on an exploratory analysis of the data. A locally uniform prior was assigned for the

intercept and the fixed effects, β, γ, θ1, and θ2. The normal-Cauchy prior (2) was again

assumed for the random effects αk, i.e., $\alpha_k \mid \sigma_\alpha^2 \sim N(0, \sigma_\alpha^2)$ and $\sigma_\alpha^2 \sim C^{+}(0, 1)$.

Table 2 summarizes the MCMC estimates of the parameters in Model 2. Interestingly, the

estimated variance of the random AE effects, $\sigma_\alpha^2$, is much smaller than the estimated variance

of the editor effects, $\sigma_\eta^2$, from Model 1. A possible explanation for this fact is that editors

have done a good job in the early screening stage, and manuscripts handed down to AEs are

more homogeneous in quality. In addition, AE decisions usually combine their own reading

of a manuscript with input from one or more referees. This explanation is corroborated by

the insignificant institution effect, which indicates that manuscripts handled by AEs were of

similar quality despite differences in institutional rank.

We also note that the estimate of θ1 is positive, which suggests that the first decision

tends to be more positive as the time to that decision (before 90 days) becomes longer. An

explanation of this trend is that manuscripts of relatively lower quality are easier to evaluate,

and manuscripts of relatively higher quality or of borderline quality may receive conflicting

referee reports and thus require longer decision times. Alternatively, editors may more closely

scrutinize manuscripts that are potentially publishable. The estimates of θ2 and θ1 + θ2 are

negative and positive, respectively, which suggests that the association between decision and

time-to-decision decreases with time. One interpretation is that when a decision runs longer

than the JASA target timeline, the delay has less to do with the quality of the manuscript

and more to do with process failure.

Table 2: Summary statistics for Model 2

             σ²_α     γ        θ1        θ2        θ1 + θ2
post. mean   0.083    0.090    0.014     -0.013    0.0013
post. std.   0.038    0.10     0.00093   0.0015    0.0008
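
The linear spline and the ordinal cutoffs in Model 2 can be illustrated with a short Python sketch. It uses the posterior means of γ, θ1, and θ2 from Table 2, but the intercept β and the cutoff c2 are not reported in this paper, so the values below are hypothetical placeholders, and the AE effect is set to zero. The printed probabilities therefore show only how the model responds to review time, not estimated decision rates.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def spline_term(days: float, theta1: float, theta2: float, knot: float = 90.0) -> float:
    """theta1 * T + theta2 * (T - knot)_+, the review-time part of Model 2."""
    return theta1 * days + theta2 * max(0.0, days - knot)


def category_probs(mean: float, c2: float):
    """P(y = 1), P(y = 2), P(y = 3) with cutoffs c0 = -inf, c1 = 0, c2, c3 = +inf."""
    p_reject = std_normal_cdf(0.0 - mean)              # x_i <= c1 = 0
    p_major = std_normal_cdf(c2 - mean) - p_reject     # c1 < x_i <= c2
    p_minor = 1.0 - std_normal_cdf(c2 - mean)          # x_i > c2
    return p_reject, p_major, p_minor


THETA1, THETA2, GAMMA = 0.014, -0.013, 0.090   # posterior means from Table 2
BETA_ASSUMED, C2_ASSUMED = -1.5, 1.5           # hypothetical; not reported in the source

for days in (30, 90, 150):
    # alpha_k = 0 (average AE) and I_i = 0 (contact author not at a top-twenty program)
    mean = BETA_ASSUMED + GAMMA * 0 + spline_term(days, THETA1, THETA2)
    print(days, [round(p, 3) for p in category_probs(mean, C2_ASSUMED)])
```

Before the knot the latent mean rises by θ1 per day; after it the slope is θ1 + θ2, which is why the contribution of review time nearly flattens beyond 90 days.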

3.3 First Decision

Model 3 examines the first decision as a function of editor and institutional rank, using all of

the manuscripts handled by E1, E3, E5 and E6. The ordinal probit response has four levels:

reject, then major, moderate, or minor revision. (It is believed that editors may use the last

two categories more consistently than do AEs, and so we abandon the previous aggregation

of the last two groups). As before, editors are random effects and institutional rank is a

fixed effect. We note that in this model, the editor effect conflates editor effects with AE

effects, unlike the first model. However, it offers a more faithful picture of what an author

might expect when first submitting a paper, since at that time it is not clear whether the

paper will be assigned to AEs.

We define yi = 1, 2, 3 (i = 1, . . . , n3; n3 = 1971) as in Model 2, but now for all of the

manuscripts handled by the four A&CS and T&M editors. The other variables, xi, Eij, and

Ii are defined as in Model 1. Our model for the three-level ordinal response yi is

\[
x_i = \beta + \sum_{j=1}^{J} \eta_j E_{ij} + \gamma I_i + e_i, \qquad e_i \sim N(0, 1), \tag{4}
\]

where

\[
y_i = l \iff c_{l-1} < x_i \le c_l.
\]

The category cutoffs cl (l = 0, · · · , 3) are specified in the same way as in Model 2. The

parameters β, ηj and γ are the intercept, the random editor effect, and the fixed institution

effect, respectively, and their priors are the same as those used in Model 1.

Table 3 provides a summary of the estimates of the parameters in Model 3. For this

model, the posterior mean of the variance of the random effect for editors was 0.14, while

the posterior mean of the standard deviation was 0.29. The posterior interquartile range

for ση was 0.13 to 0.36. These values suggest that the (root mean square) editor effect on

the assessment of manuscript quality was only about 30% of the standard deviation of true

manuscript quality, which is equal to 1. In our opinion, the magnitude of the editor effect

for these manuscripts was moderate, although not ignorable. The estimate of the institution

effect, γ, is positive in this model, suggesting that manuscripts by authors from top-tier

institutions have a better chance of receiving a favorable first decision. Once again, the

institution effect might be confounded by the same factors mentioned in the discussion of

Model 1.

Table 3: Summary statistics for Model 3

             σ²_η    γ
post. mean   0.14    0.27
post. std.   0.36    0.087

4 Recommendations for the Future

Our analysis of the JASA review process suggests that the editorial decisions are subject to

non-negligible random error. Although most individuals familiar with the editorial process

believe that the best papers are almost always accepted and the worst are almost always

rejected, there may be considerable variability in decisions reached for papers between these

extremes; there are certainly cases where rejected manuscripts might have been accepted if

reviewed by different editors or AEs.

Our analyses found that manuscripts whose contact authors were from institutions with

statistics departments that were ranked in the top 20 fared slightly better in the review

process. It is doubtful that all or much of this is due to bias; those institutions aggressively

recruit excellent researchers. Some research (Gordon, 1980) suggests that in the past institu-

tional bias has worked in the opposite direction, tending to favor papers from less prestigious

schools.

There is also the possibility of systematic biases in the review of articles that would not

be detectable with the data from Allentrack; for example, it is likely that some decisions

were influenced by reports from referees who had broken the blinding and identified the

manuscript authors (with Internet search engines, this is relatively easy to do). More subtle

biases could occur as a consequence of editors assigning manuscripts to AEs whom they

know to be particularly lenient or stringent. Also, from these data it was not easy to assess

the important possibility of gender bias; however, since recent issues of JASA suggest that

female researchers are relatively well-represented, this may not be as pressing an issue as it

was 20 years ago.

Of more concern than editorial bias and error may be the lengthy period of time required

for accepted papers to appear in print. One goal of publishing research is to quickly spark

new theories and force change in existing practice. Consider, however, the current flow

of academic publication in statistics. First, an investigator develops an idea and typically

spends months or years researching, testing, and preparing the findings for publication. The

resulting manuscript is then submitted for publication to a peer-review journal, where it

wends its way through the editor, the AE, and the referees. After some period of time, most

papers are either returned to their authors for revision or returned with the advice

to submit to a “lesser” journal. The author then spends more time revising the paper and

obtaining additional results. Then the manuscript is resubmitted, and perhaps subjected to

additional criticism and further revision. At the end of this process, the manuscript may be

accepted. But even then publication is not an immediate result; there is another lag of six

to twelve months before the paper appears in print. As a consequence, time to publication

after first review by a journal may easily be two or three years. Thus, cutting edge and

groundbreaking ideas are generally several years stale by the time they are shared with the

general research community.

In our age of instantaneous communication, a better system for disseminating research is

clearly required. The authors of this paper are divided on the best method for reform, and

opinions range from simply establishing a preprint webpage for manuscripts (such as arXiv)

and reducing the time that referees are allowed, to the development of novel review systems.

As one example of a fresh approach, one could implement a tiered review system. At the

lowest level, anyone could upload a paper or report to a manuscript depository. Uploaded

files would be visible to the statistical community; interested readers could comment upon

them (either anonymously or not) and rate them. Manuscripts that received sufficiently

positive reviews would be elevated for more formal peer review and placed at increasingly

higher levels in the depository. After some time, those papers that had been rigorously and

favorably reviewed would be selected by the editor for “publication” in the hierarchy’s most

prestigious “journal.” Besides providing immediate dissemination of ideas, such a system

would foster the discussion of ideas within the community and reduce errors and biases

associated with the editorial process by making the process more open. The statistical

theory that underlies modern recommender systems provides a theoretical foundation for

both rating papers and for assessing reviewers.
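
The elevation step in such a tiered system can be illustrated with a minimal sketch. It assumes, purely for illustration, that readers rate manuscripts on a 1 to 5 scale and that a manuscript's score is its mean rating shrunk toward a prior mean, a simple device used in many rating systems. The function names, prior settings, and thresholds below are hypothetical and are not part of any proposal made in this paper.

```python
def shrunken_mean(ratings, prior_mean=3.0, prior_weight=5.0):
    """Mean rating pulled toward prior_mean.

    prior_weight acts like a number of pseudo-ratings at prior_mean, so a
    manuscript with few ratings stays close to the prior (hypothetical values).
    """
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + len(ratings))


def tier_for(ratings, thresholds=(3.5, 4.0)):
    """Map ratings to a tier (hypothetical scheme).

    0 = open depository, 1 = formal peer review, 2 = editor's shortlist.
    The tier is the number of thresholds the shrunken score reaches.
    """
    score = shrunken_mean(ratings)
    return sum(score >= t for t in thresholds)


if __name__ == "__main__":
    for ratings in ([5], [5, 5, 5], [5, 5, 5, 5, 5]):
        print(ratings, round(shrunken_mean(ratings), 3), tier_for(ratings))
```

Under these assumed settings, a manuscript with a single top rating stays in the open depository, whereas one with several consistently high ratings moves up, so that a handful of early votes cannot decide a paper's fate.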

The statistics profession deserves excellence from its publications. That obligation implies

that journals should regularly review their process data for evidence of bias or unreasonable

delay. This paper attempts to do that; to further support that goal, the data are publicly

available at www.amstat.org/publications/tas.cfm, so interested readers may undertake

further analysis.

More generally, the ASA has a duty to regularly question the adequacy of the service it

provides. Our present editorial processes were essentially invented by Henry Oldenburg, the

first secretary of the Royal Society, in response to the expense of paper and the limitations

of his expertise. New technology has changed that equation, and we encourage the ASA to

revisit those decisions.

Acknowledgments

This research was supported in part by NSF grant SES-0720229, NIH grant R01-MH071418,

and a University of Missouri Research Board grant. The authors thank the Statistical and Applied

Mathematical Sciences Institute for organizing the psychometrics program that made

this work possible.

REFERENCES

Altman, N., Banks, D., Chen, P., Duffy, D., Hardwick, J., Leger, C., Owen, A., and Stukel,

T. (1991). “Meeting the Needs of New Statistical Researchers,” Statistical Science, 6,

163-174. Rejoinder in Statistical Science, 7, 265-266.

Blank, R. M. (1991). “The Effects of Double-Blind Versus Single-Blind Reviewing: Experimental

Evidence from The American Economic Review,” American Economic Review,

81, 1041–1067.

Cox, D. R., Gleser, L., Perlman, M., Reid, N., and Roeder, K. (1993). “Report of the Ad

Hoc Committee of Double-Blind Refereeing,” Statistical Science, 8, 310–317.

Goldberg, P. (1968). “Are Some Women Prejudiced Against Women?” Transaction, 5,

28–30.

Gordon, M. D. (1980). “The Role of Referees in Scientific Communication,” in The Psychology

of Written Communication, J. Hartley (ed.), London, England: Kogan Page, pp.

263–275.

Johnson, V. E. (2007). “Bayesian Model Assessment Using Pivotal Quantities.” Bayesian

Analysis, 2, 1–15.

Johnson, V. E. and Albert, J.A. (1999). Ordinal Data Modeling. New York: Springer.

Peters, D. P., and Ceci, S. J. (1982). “Peer Review Practices of Psychological Journals:

The Fate of Published Articles, Submitted Again,” Behavioral and Brain Sciences, 5,

187–195.

Snodgrass, R. T. (2006). “Single- Versus Double-Blind Reviewing: An Analysis of the

Literature,” ACM SIGMOD Record, 35, 8–21.

Figure 1: Decisions by editor: Bar chart showing submissions of each of the six editors

together with disposition: reject, acceptable with major revisions, acceptable with moderate

revisions, or acceptable with minor revisions.

Figure 2: Days to first decision: Top panel displays a histogram of the days to first decision

for all submissions. The vertical axis plots density per day, the raw fraction of decisions on

each day. (Two outliers are omitted for clarity.) The lower left panel displays a frequency

histogram of days to first decision for manuscripts handled solely by the Editor. The lower

right panel displays a frequency histogram of the days to first decision for manuscripts

handled by an Associate Editor.

Figure 3: Days to first decision: The left panel shows days to first decision by manuscript

decision: reject, acceptable with major revision (label not shown), acceptable with moderate

revision, acceptable with minor revision (label not shown). The right panel shows days to

first decision broken out by the JASA section to which the article was submitted.

Figure 4: The loess estimate of acceptance rate on first decision versus the number of days to

first decision. The acceptance includes manuscripts acceptable with major/moderate/minor

revision. The dashed lines indicate the confidence interval with 2 standard errors.

Current Research and Practices for Cognitive Diagnosis Models   1  

Running head: CURRENT RESEARCH AND PRACTICES FOR COGNITIVE DIAGNOSIS MODELS

Current Research and Practices for Cognitive Diagnosis Models:

A White Paper from Cognitive Diagnosis Models Working Group at the

2009 SAMSI Summer Program on Psychometrics

Prepared by:

André A. Rupp University of Maryland

Jimmy de la Torre Rutgers, The State University of New Jersey

With Contributions from:

Alicia Alonzo Michigan State University

Ying Cheng Notre Dame University

Jere Confrey North Carolina State University

Benjamin Cooke Duke University

Ying Cui University of Alberta

Matthew Finkelman Tufts University

Neil Heffernan Worcester Polytechnic Institute

Robert Henson University of North Carolina at Greensboro

Kristen Huff College Board

Eunice Jang University of Toronto

Tzur Karelitz Education Development Center

Roy Levy Arizona State University

Nathalie Loye Université de Montreal

James Madden Louisiana State University

Rebecca Nugent Carnegie Mellon University

Curtis Tatsuoka Case Western Reserve University

Abstract

In this white paper we describe future directions for research and practice for cognitive diagnosis models

(CDMs) based on the work presented at the Cognitive Diagnosis Models Working Group (CDMWG),

which was assembled as part of the Summer Program in Psychometrics held at the Statistical and

Applied Mathematical Sciences Institute (SAMSI) in Raleigh, NC, from July 13-16, 2009. The

CDMWG served three aligned purposes. One, it served to take a representative survey of the current

state-of-the-art by having the invited participants describe their current line of work along with future

directions of their own work. Two, it served to develop a consensus among the participants about what

they view as the most promising directions for future research in the area of diagnostic assessments and

CDMs. Three, it served to establish collaborative relationships among the participants that are the

foundation for pursuing some of these promising research avenues. In alignment with these purposes, we

synthesize the key ideas and future research directions that were presented and discussed at the

CDMWG in this white paper.

White Paper from the Cognitive Diagnosis Models Working Group at SAMSI 2009:

Current Research and Practices for Cognitive Diagnosis Models

In this white paper we describe future directions for research and practice for cognitive diagnosis

models (CDMs) based on the work presented at the Cognitive Diagnosis Models Working Group

(CDMWG), which was assembled as part of the Summer Program in Psychometrics held at the

Statistical and Applied Mathematical Sciences Institute (SAMSI) in Raleigh, NC, from July 13-16,

2009.

SAMSI is a partnership of Duke University, North Carolina State University, the University of North

Carolina at Chapel Hill, and the National Institute of Statistical Sciences, in collaboration with the

William R. Kenan, Jr. Institute for Engineering, Technology and Science.

SAMSI is part of the Mathematical Sciences Institutes program of the Division of Mathematical

Sciences at the National Science Foundation. It is a national institute whose vision is to forge a new

synthesis of the statistical sciences and the applied mathematical sciences with disciplinary science to

confront the very challenging and most important data- and model-driven scientific challenges. SAMSI

achieves profound impact on both research and people by bringing together researchers who would not

otherwise interact, and focusing the people, intellectual power and resources necessary for simultaneous

advances in the statistical sciences and applied mathematical sciences that lead to ultimate resolution of

the scientific challenges.  

The CDMWG was supported in part through grant 0934106 in the REESE program of the National

Science Foundation. It was organized and hosted jointly by Jimmy de la Torre, Assistant Professor in the

Graduate School of Education at Rutgers, The State University of New Jersey, and André A. Rupp,

Assistant Professor in the Department of Measurement, Statistics, and Evaluation (EDMS) at the

University of Maryland. The CDMWG brought together a diverse array of emerging and established

scholars (i.e., university professors, graduate students, and measurement specialists in private

industry) in the area of CDMs. All participants had in common that they were involved in some of the

core areas of research surrounding the design, implementation, and analysis of diagnostic assessments

that are scaled with CDMs; for a list of bios, please see Appendix A.

The CDMWG served three aligned purposes. One, it served to take a representative survey of the

current state-of-the-art by having the invited participants describe their current line of work along with

future directions of their own work. Two, it served to develop a consensus among the invited

participants about what they view as the most promising directions for future research in the area of

diagnostic assessments and CDMs. Three, it served to establish collaborative relationships among the

participants that are the foundation for pursuing some of these promising research avenues. In alignment

with these purposes, we synthesize in what follows the main ideas, research directions, and empirical findings

that were presented at the CDMWG.

It is not our intention in this white paper to comprehensively review the current state-of-the-art of

diagnostic assessments and CDMs because several excellent review papers or reference volumes already

exist for that purpose. As generally accessible resources we specifically recommend the book entitled

Cognitive diagnostic assessment for education: Theory and applications by Jacqueline Leighton and

Mark Gierl (2007), which covers a range of topics similar in scope to the CDMWG. We also

recommend the book Diagnostic measurement: Theory, methods, and applications by André A. Rupp,

Jonathan Templin, and Robert Henson (in press), which contains a comprehensive and coherent didactic

introduction to CDMs. The Winter 2007 special issue of the Journal of Educational Measurement is also

a good resource for investigating some models and methodologies for cognitive diagnosis. For readers

interested in delving into the more technical literature directly, we recommend the review articles by Fu

and Li (2007), Roussos, DiBello and Stout (2007), or Rupp and Templin (2008) as well as the articles on

unified estimation frameworks for CDMs by de la Torre (2008, 2009a, 2009c), Henson, Templin, and

Willse (2009), Templin and Rupp (in press), and von Davier (2005).

Terminology

Before we begin it seems appropriate to define the basic terminology that is needed for this white

paper. As is typically the case with terminology, the choice of words and phrases is partially grounded in

disciplinary traditions, partially grounded in communicative objectives for a particular piece of writing,

and partially grounded in subtle nuanced differentiations of meaning. In the end, though, it remains a

choice that is made somewhat idiosyncratically by authors.

In this white paper, we will refer to the people who provide the observable response data as

respondents, the stimuli to which they respond as items or tasks, the instruments that collect the items or

tasks as diagnostic assessments, and the latent (i.e., unobservable) characteristics of the respondents as

attributes. As we have already done above, we will refer to the statistical or psychometric models of

primary interest as cognitive diagnosis models (e.g., de la Torre, 2008; Nichols, Chipman, & Brennan,

1995; Leighton & Gierl, 2007) even though recent publications also refer to them as diagnostic

classification models (e.g., Rupp and Templin, 2008; Rupp, Templin, & Henson, 2010).

Statistically speaking, CDMs contain multiple latent variables (i.e., not directly observable

variables), which are discrete (i.e., dichotomous or polytomous) and yield statistically-driven

classifications of respondents into distinct latent classes. Hence, CDMs are also known as restricted

latent class models, because the models can be characterized as general latent class models that contain

a predetermined number of latent classes with parameter restrictions across the latent classes. The

parameter restrictions are driven by the Q-matrix, which is a table that shows which attributes are

measured by each item – if entries are dichotomous – or, alternatively, the degree to which items

measure attributes – if entries are polytomous or real-numbered (e.g., K. Tatsuoka, 1983, 2009; von

Davier, 2005). Therefore, the resulting multivariate classification of the respondents into a latent class

yields a vector of attribute-mastery states for each respondent, which is also known as an attribute

profile in the literature. With this terminology at hand we are now ready to discuss the key processes,

discussions, and outcomes of the CDMWG.
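
To make this terminology concrete, the following minimal sketch builds a small dichotomous Q-matrix, enumerates the attribute profiles, and computes response probabilities under a DINA-type rule: a respondent who has mastered all attributes required by an item answers correctly unless he or she slips, and otherwise can only guess. The choice of a DINA-type rule, the particular Q-matrix, and the slip and guess values are assumptions made for illustration only.

```python
from itertools import product

# Hypothetical Q-matrix: rows are items, columns are attributes.
# An entry of 1 means the item measures that attribute.
Q = [
    [1, 0, 0],  # item 1 measures attribute 1 only
    [0, 1, 0],  # item 2 measures attribute 2 only
    [1, 1, 0],  # item 3 measures attributes 1 and 2
    [0, 1, 1],  # item 4 measures attributes 2 and 3
]
NUM_ATTRIBUTES = 3


def attribute_profiles(num_attributes):
    """Enumerate all 2^K dichotomous attribute profiles (the latent classes)."""
    return [tuple(p) for p in product((0, 1), repeat=num_attributes)]


def ideal_response(profile, q_row):
    """Return 1 if the profile includes every attribute the item requires, else 0."""
    return int(all(a >= q for a, q in zip(profile, q_row)))


def dina_prob_correct(profile, q_row, slip, guess):
    """Probability of a correct response under the assumed DINA-type rule."""
    return 1.0 - slip if ideal_response(profile, q_row) else guess


if __name__ == "__main__":
    # Slip and guess values are illustrative, not estimates from any data set.
    for profile in attribute_profiles(NUM_ATTRIBUTES):
        probs = [dina_prob_correct(profile, row, slip=0.1, guess=0.2) for row in Q]
        print(profile, probs)
```

The Q-matrix is what imposes the parameter restrictions here: two latent classes that agree on the attributes an item requires receive the same response probability for that item.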

Current Challenges for Research and Practice for Diagnostic Assessments and CDMs

During the first three days of the week, the CDMWG was organized into thematic blocks that

represented the key stages of the design, implementation, and data analysis for diagnostic assessments in

various disciplines, specifically educational and psychological measurement. (The presenters for the

different blocks are given in Appendix B.) The following list summarizes these key stages:

1. The selection of a diagnostic assessment framework and clarification of the targeted decision-

making and reporting purposes of the assessment

2. The specification of a developmental theory for the measured attributes

3. The development of tasks and the specification of their relationship to the measured attributes via

a table-of-specification for the assessment and the Q-matrix

4. The selection of statistical models for calibrating the tasks and scaling the respondents, which

include parametric, nonparametric, and semi-parametric approaches

5. The estimation of these models using modern programming environments

6. The assessment of model-data fit at different levels

7. The subsequent refinement of the diagnostic assessment and/or the statistical models to improve

model-data fit at different levels

8. The provision of feedback to various groups of stakeholders using diagnostic score reports along

with associated decision-making about the targets of the assessment

9. The collection of evidence for a comprehensive validation of all resulting inferences and

decisions that are drawn from the diagnostic assessment

10. Potentially, the development of statistically optimal systems for computerized adaptive

assessment contexts and automated test assembly

These stages are common to the development of most standardized large-scale assessments, independent

of whether these provide information about single or multiple latent characteristics of the respondents

and independent of whether models from classical test theory (e.g., Thissen & Wainer, 2001, chapters 1-

2), item response theory (e.g., de Ayala, 2009; Embretson & Reise, 2000), factor analysis (e.g.,

McDonald, 1999; Thissen & Wainer, 2001, chapters 3-6) or CDMs are used (see Rupp, 2009a). They

are certainly not as clear-cut and linear in practice as the above list might suggest, because most assessment

systems require numerous iterations of refinement and revision.

During the fourth day of the SAMSI meeting, we asked the participants of the CDMWG to

brainstorm a variety of issues that they believed researchers and applied specialists are currently

struggling with. Their key ideas can be organized into the following seven themes:

1. Integrating qualitative and quantitative evidence for analyses with CDMs

2. Creating meaningful and feasible informational richness with CDMs

3. Supporting cultural changes for teaching, learning, and diagnostic assessment with CDMs

4. Articulating common standards for diagnostic assessments and CDMs

5. Finding a balance between model sophistication and model parsimony in CDMs

6. Capitalizing on the information contained in model-data fit statistics for CDMs

7. Fostering professional development for diagnostic assessments and CDMs

We describe the key ideas articulated by the participants on these themes in the subsequent sections but

note that they are tightly interconnected and not as distinct as the list above might suggest.

Integrating Qualitative and Quantitative Evidence for Analyses with CDMs

Participants noted that it is often rather challenging to align evidence from qualitative and

quantitative data-collection for diagnostic assessments and subsequent analyses of these data with

CDMs. For example, think-aloud studies and concurrent interviews are frequently cited as valuable

sources of information about cognitive and meta-cognitive processes (Leighton, 2004; Leighton &

Gierl, 2007). Such processes can include, for example, the response processes of respondents to the

diagnostic assessment or the decision-making processes of experts when they develop Q-matrices (Loye,

2008; Loye, Caron, Burney-Vincent, & Gagnon, 2009). Yet, this information is typically only used

procedurally as a means of quality-control for the development process of diagnostic assessments (i.e.,

for data triangulation) but is not further integrated into explanatory modeling approaches (de Boeck &

Wilson, 2004). Speaking more generally, it is paramount that explanatory information about task or

respondent characteristics is incorporated into CDMs to enrich the evidentiary narrative. Within unified

estimation frameworks for these models (e.g., de la Torre, 2008, 2009a, 2009c; Henson, 2009a, 2009b;

Henson, Templin, & Willse, 2009; C. Tatsuoka, 2002, 2009; von Davier, 2005), this is nowadays

possible with relative ease.

Challenges surrounding the integration of qualitative and quantitative information concern both the

quantification of qualitatively collected narratives and the narrative translation of qualitative and

quantitative data patterns. They also stem from the different methodological repertoires of researchers

working predominantly with either type of data and the lack of acceptance of qualitative data sources by

quantitatively oriented researchers. Reframing the informational value of qualitative data for a

predominantly quantitative process of scaling response data with CDMs also has implications for the

role of assessment design experts. For example, it is typically unclear what constitutes relevant

“expertise” for such experts when they are selected into panels and how their understanding of response

processes and theoretical frameworks can be most effectively used to refine the design of diagnostic

assessments. In other words, it would seem desirable to view a broader range of experts trained in

qualitative, quantitative, and mixed methods as complementary partners in the construction of a

diagnostic assessment rather than merely as functional participants who have to fulfill basic assessment

development tasks (see Rupp, 2009a).

This certainly implies a higher degree of collaboration between learning scientists, content experts,

and assessment specialists (Alonzo, 2009; Huff, 2009). Specifically, teachers with suitable levels of

experience in teaching, learning, and assessment are indispensable for the development of diagnostic

assessment systems. Teachers are particularly important because their “diagnostic” needs on a day-to-

day basis, when formative interpretations of low-stakes assessments are most frequently made, may

differ strikingly from their “diagnostic” needs as perceived by external experts. The views of experts

with background and experience in large-scale assessments are typically colored by such experiences,

and understanding the needs of stakeholders like teachers or clinicians may require concerted efforts to

re-educate these experts with respect to these uncommon situations. Thus, it is important to interview teachers

about their needs repeatedly, at the outset, during the development, and during different implementation

phases of diagnostic assessments. This could also lead to the development of a taxonomy of inferences

and associated decisions that need to be supported via CDMs for different stakeholder groups. For

example, in the area of evidence-centered design (e.g., Mislevy, Steinberg, Almond, & Lucas, 2006;

Huff, 2009; Rupp, 2009b) such work is translated into the development of design patterns for tasks,

which link observable task characteristics to specific evidentiary narratives via appropriate building

blocks of statistical models (e.g., Mislevy et al., 2009).

Creating Meaningful and Feasible Informational Richness with CDMs

Participants noted that there exists a certain loss of descriptive richness about learning and reasoning

processes in a domain when those fluid processes are operationalized and translated into static Q-matrices

for a diagnostic assessment. An important factor that influences the degree of informational loss in this

translation is the grain size of the attributes and the developmental theory within which the diagnostic

assessment design is embedded. It is clearly challenging to manage the trade-off between in-depth fine-

grained diagnostic analyses and more cursory coarse-grained proficiency profiles despite the fact that

CDMs for different attribute gradations exist (Karelitz, 2009). Furthermore, it appears that some tasks in

some domains lend themselves more easily to designs of diagnostic assessments because the space of

possible solution paths is more clearly delineable and the constituent components of these paths are

more easily enumerable and operationalizable.

For example, as demonstrated in the area of expertise research, solution paths for tasks that require

lower levels of expertise are much more linear whereas solution paths for tasks that require high levels

of expertise are much more nonlinear and mediated by metacognitive strategies. As a case in point, the

work by Confrey and Maloney (2009) demonstrates the complex manner in which concepts related to

rational number reasoning are acquired by children between elementary and middle school. Attempting

to represent this complexity with a particular diagnostic assessment or CDM always bears the risk of

significantly oversimplifying and, thus, misrepresenting the actual reasoning processes of the children.

As Rupp (2009c) discusses, diagnostic assessments are typically designed to provide information

about a particular domain that is described as “rich” in some way, but obtaining deeper information

about a few processes requires sacrificing the breadth that large-scale standardized assessments afford.

Put differently, developing rich narratives in a more restricted domain can also be seen as impoverishing

the narratives about a broader domain, which is why it is important to design, implement, and analyze

diagnostic assessments only when the specificity of the decisions they support is really needed by

stakeholders. In certain cases, feedback for individuals and groups may be desirable. When this is the

case, it is statistically easier to aggregate up reliable individual diagnostic information to create reliable

group-level information, rather than attempt to disaggregate reliable group-level information, which may

be unreliable for individuals.
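
The aggregation direction can be sketched in a few lines. The example assumes that each respondent has already been classified into a dichotomous attribute profile and simply reports, for each attribute, the proportion of the group classified as masters. The function name and the data are hypothetical.

```python
def group_mastery_rates(profiles):
    """Proportion of respondents classified as masters of each attribute.

    profiles is a list of equal-length tuples of 0/1 mastery indicators,
    one tuple per respondent (hypothetical input format).
    """
    if not profiles:
        raise ValueError("need at least one attribute profile")
    n = len(profiles)
    num_attributes = len(profiles[0])
    return [sum(p[k] for p in profiles) / n for k in range(num_attributes)]


if __name__ == "__main__":
    # Four hypothetical respondents, three attributes.
    classroom = [(1, 0, 1), (1, 1, 0), (0, 0, 1), (1, 0, 1)]
    print(group_mastery_rates(classroom))
```

The reverse step has no comparably simple counterpart: a group-level mastery rate does not determine which individual respondents are the masters.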

Careful task design can ensure that there is a solid alignment between the observable evidence

available through scored responses and the condensation of this evidence through CDMs. Research and

experience in various projects have taught us that this alignment becomes increasingly challenging as the

complexity of the learning environment increases. However, it is precisely in these contexts that the

potential richness of evidentiary narratives is greatest, which is what CDMs are designed to statistically

support. Importantly, this requires diagnostic assessment designers to utilize task formats that do not

distort the desired richness of the observable evidence by unnecessarily constraining respondents’

actions in these environments. While they certainly cannot be unconstrained, a rich history of

performance assessment has taught us useful lessons about how to achieve a balance between allowing

for and suitably constraining respondent behavior to support evidentiary narratives about complex skill

sets.

In short, there is a dire need for research in different domains that explicates the degree to which

response processes to different types of tasks can be decomposed in such a way that potentially rich

narratives for diagnostic assessments can be meaningfully constructed. One of the future challenges for

the field is to investigate the degree to which CDMs can be used to analyze complex response data from

more open-ended task environments with more ill-defined tasks. This will certainly prove challenging

because the successful application of related latent-variable models like multidimensional IRT models,

multidimensional FA models, and multidimensional Bayes nets requires a great deal of care. We will

discuss a selection of successful diagnostic assessment systems where the trade-off between fine-grained

and coarse-grained informational feedback has been managed well in the last section of this paper.

Supporting Cultural Changes for Teaching, Learning, and Diagnostic Assessment with CDMs

Large-scale standards-based assessment and comparative educational surveys have led to important

sea changes in the teaching, assessment, and accountability cultures of many countries around the world.

It is not the focus of this white paper to debate the relative benefits and merits of such systems, but

participants highlighted that the successful implementation of diagnostic assessment systems requires an

adjustment or change of the culture of teaching, learning, and assessment. Well-integrated diagnostic

assessment systems are frequently viewed as providing fine-grained feedback in low-stakes contexts

(Roussos, DiBello & Stout, 2007; Leighton & Gierl, 2007) and powerful worked examples exist. These

include, for example, the Cisco Networking Academy (West et al., 2009) and the Cognitive Tutors at

Carnegie Mellon University (Heffernan, 2009), which both require an emphasis on supporting student

learning with assessment being integral yet incidental to the process. Such trends are also reflected in the

work on games- and simulation-based assessment, which includes diagnostic stealth assessments (Shute

et al., 2009).

As part of such successful diagnostic assessment systems, different diagnostic score reports that

provide actionable evidence for different stakeholders are required, and developing them often takes several design

iterations. Recent research on such reports (Jang, 2005, 2009; Huff, 2009) has demonstrated the utility

of presenting the same information in various ways, especially using modern high-resolution

visualization tools. Successfully implementing diagnostic assessment systems requires much more than

putting all the right tools in place. It requires a commitment of school districts to continual

professional development of teachers that fosters the development of a shared mindset, value

framework, and methodological repertoire for diagnosing multiple attributes of learners.

Articulating Common Standards for Diagnostic Assessments and CDMs

The disciplines of educational and psychological measurement have developed a rich set of

standards, most notably the Standards for Educational and Psychological Testing (American

Psychological Association, American Educational Research Association, National Council of

Measurement in Education, 1999) and the Code of Fair Testing Practices in Education (Joint

Committee on Testing Practices, 2004). These standards, in alignment with shared expertise captured in

technical reports, provide guidance for principled assessment design, implementation, and analysis. On

the one hand, participants agreed that it is necessary for practitioners to adopt and adhere to existing

standards and practices that are equally applicable to diagnostic assessments. On the other hand, it is

also necessary to augment existing standards and practices to suit the inferential validity needs of

diagnostic assessments.

For example, there are no specific standards that focus on the development of Q-matrices or the

cross-validation of respondent classifications and few descriptions of best practices that provide

guidance for practitioners in the form of do’s and don’ts. Even though one can infer such

information from some documents or conversations with colleagues in the field, far too much of the

relevant expertise currently rests with a chosen few rather than being shared and

available to everyone. At the same time, participants agreed that it is important not to engage in a

process of “reinventing the wheels” of sound assessment practices that already exist; rather, it is about

giving those practices a “tune-up” that serves the unique needs of developers and users of diagnostic

assessments. Such “updates” can also be seen in related fields to which CDMs can be applied.

For example, the field of clinical psychology is currently in the process of creating a fifth

version of, and an associated research agenda for, its key diagnostic manual, the Diagnostic and Statistical

Manual of Mental Disorders (DSM-V). Key contributions to this process focus on the dimensional

assessment of psychiatric diagnosis (Helzer et al., 2008), which aligns well with the multidimensional

nature of CDMs. The utility of these models for clinical psychology has already been illustrated by

Templin and Henson (2006) and Jaeger, Tatsuoka, and Berns (2003), and a fertile area for

cross-disciplinary research and practice exists there. If CDMs could be successfully applied to diagnostic

assessments in this area, the professional standards from clinical psychology research and practice could

certainly help sharpen and shape the thinking and acting of professionals in educational assessment.

Finding a Balance between Model Sophistication and Parsimony in CDMs

Several participants were trained quite rigorously in various statistical and psychometric models.

Yet, in alignment with established statistical practices in other disciplines, there was a strong consensus

that the key function of a suitable statistical model is to support the evidentiary narrative for the purpose

of decision-making in the context in which the diagnostic assessment system is embedded. That is,

tailored parsimony of statistical models such as CDMs for a particular purpose is the key to any

successful application of these models. If statistical models become the driving force during certain

phases of diagnostic assessment design, they may unnecessarily constrain the structure of the

assessment. Similarly, several successful intelligent tutoring environments, which can provide useful

diagnostic information for formative purposes – albeit in typically highly constrained task domains – do

not rely on complex parametric statistical models at all, yet they can be well accepted and perceived as highly

useful by practitioners.

This argument is not meant to suggest that there is no need to refine the development of CDMs.

Quite to the contrary, such technical developments are necessary to ensure that the methodological

toolbox contains reliable tools that measurement specialists can judiciously select from when needed.

Moreover, the structure of more complex statistical models can provide important insight into core

modeling principles that apply to simpler statistical models as well. For example, a Bayesian estimation

framework has the advantage of allowing for the inclusion of prior information about task properties and

respondent profiles into the CDM. This can be beneficial in preoperational stages of a diagnostic

assessment, when sample sizes are small but reasonably reliable information about plausible ranges of

parameter values is available, because well-specified prior information effectively acts as additional data.
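As a small illustration of how prior information can act like additional data, the sketch below places a conjugate Beta prior on a single item parameter (for instance, a guessing probability). The Beta-binomial setup, the function names, and all numbers are assumptions chosen for illustration; they are not taken from the workshop or from any particular CDM software.

```python
from fractions import Fraction


def beta_posterior(prior_a, prior_b, successes, trials):
    """Return the Beta posterior parameters after observing binomial data.

    A Beta(prior_a, prior_b) prior behaves like prior_a + prior_b
    pseudo-observations, prior_a of which were "successes".
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    return prior_a + successes, prior_b + (trials - successes)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution, kept exact with Fraction."""
    return Fraction(a, a + b)


# Hypothetical pilot data: 3 of 10 non-masters answer the item correctly.
successes, trials = 3, 10

# Hypothetical prior: guessing probability near 0.2, worth 10 pseudo-responses.
prior_a, prior_b = 2, 8

post_a, post_b = beta_posterior(prior_a, prior_b, successes, trials)

sample_estimate = Fraction(successes, trials)  # data only
posterior_mean = beta_mean(post_a, post_b)     # prior and data combined
# Share of the total information that comes from the prior.
prior_weight = Fraction(prior_a + prior_b, prior_a + prior_b + trials)
```

With these made-up numbers the posterior mean (0.25) lies halfway between the prior mean (0.2) and the sample proportion (0.3), because the prior carries as many pseudo-responses as the pilot sample has real ones. As operational sample sizes grow, the prior's share of the information shrinks accordingly.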

In recent years there has also been a noticeable reorientation in the literature on educational

measurement that has led to a rediscovery and subsequent refinement of traditional classification

algorithms in lieu of complex CDMs. Work on such algorithms has gained more prominent visibility

through conferences like the International Conference on Educational Data Mining, which will be held

for only the third time in 2010. Such a shift is healthy for the disciplines of educational and

psychological assessment, because it allows researchers and practitioners to “take a step back”. It allows

them to refocus their attention on solving classification problems in the most effective and efficient way

possible, rather than imposing unnecessarily complex classification mechanisms onto systems or contexts

that are not designed to support them. However, successful systemic effects for the discipline also

require that researchers interested in diagnostic assessment and CDMs embrace more fully current and

past research in related fields such as general multivariate statistics and educational data mining.

Capitalizing on Information Contained in Model-data Misfit Statistics for CDMs

Related to model complexity is the issue that model-data fit information is frequently either not

investigated comprehensively, or, if it is investigated in more detail, is not used to its full potential

despite some useful available tools (e.g., de la Torre, 2009b; Cui, 2009; Levy, 2009b). As reported by

West et al. (2009), information about model-data fit at various levels (i.e., absolute model fit, relative

model fit, item fit, and person fit) within a Bayesian estimation framework (Almond, Williamson,

Mislevy, & Yan, in press; Levy, 2009a) has been used successfully to refine the design of the Cisco

Networking Academy assessment system. Rather than just helping researchers identify a best-fitting

statistical model at one point in time, the information was used to iteratively refine hypotheses about

student cognition and meta-cognitive strategies. This, in turn, had implications for the refinement of the

design of the diagnostic assessment system, including specifically the feedback that was provided to the

respondents and the decisions that were made about them.

Such worked examples suggest a much richer benefit of such analyses than is often suggested from a

purely statistical perspective. In addition, they help to illustrate the decision-making processes that

assessment specialists employ to improve their assessment design and model calibration. A systematic

description of such processes in the professional literature would help to understand the underlying

value systems of assessment specialists for different types of diagnostic assessment uses. There are other

signs that fit assessment is slowly becoming more comprehensively debated in public forums. For

example, the 15th International Objective Measurement Workshop, held just before the AERA and

NCME conferences in 2010 in Denver, has as its theme “using model fit to evaluate hypotheses about

learning”.

Fostering Professional Education for Diagnostic Assessment and CDMs

As much as participants viewed the professional development of teachers as a critical key to the

future success of diagnostic assessment, the training of measurement specialists in diagnostic assessment

methods was identified as equally important. Such training involves the inclusion of topics relevant to

diagnostic assessment design, implementation, and analysis in educational and psychological

measurement courses as well as the development of novel courses dedicated to CDMs specifically.

Naturally, this requires the creation of textbooks and reference books that focus specifically on such

kinds of assessment systems; the books by Rupp, Templin, and Henson (2010), K. Tatsuoka (2009), and

Almond, Williamson, Mislevy, and Yan (in press) are important steps in this regard. Conferences in

educational and psychological measurement have seen CDMs and diagnostic assessment design become

a core part of their training session repertoire. Several prominent scholars in the field, some of whom

attended the CDMWG, have become actively involved in the training of practitioners and professionals

alike and so one can be reasonably hopeful about the dissemination of didactic information and

associated training for the years to come. Until more successful case studies or worked exemplars are

available that show some form of an “added value” for certain diagnostic assessment systems in general

and CDM applications in particular, the “bottom-up” demand for such training will be driven mostly by

large-scale assessment agendas. This may not be advantageous overall, because there is typically a

strong gap between the desire for fine-grained diagnostic information and the understanding of the

complexity of the assessment systems that are required to collect and aggregate such information

reliably.

Three Integrative Strands for Future Research on Diagnostic Assessments and CDMs

During the fourth day of the CDMWG, we also asked participants to identify the three most critical

issues that researchers and practitioners in the fields of diagnostic assessment and CDMs should address

in the future to increase the breadth, depth, and methodological rigor of their work. These are the issues

that they identified:

1. Developing an integrated web portal that serves as a common information warehouse and virtual

meeting/exchange point for researchers and practitioners interested in diagnostic assessments and

CDMs

2. Developing an integrated software environment that is user-friendly, expandable, and statistically

powerful

3. Developing an integrated diagnostic assessment system that is optimized for providing fine-

grained diagnostic feedback based on multivariate attribute profiles from CDMs

As these three issues underscore, participants saw a clear need for a concerted effort to create

evidentiary and practical coherence for the field of diagnostic assessment and CDMs.

Developing an Integrated Web Portal

To foster an interdisciplinary exchange and growth in the fields of diagnostic assessment and CDMs,

participants suggested the development of an integrated web portal. For the portal to have relevance,

they argued that it should be the definitive comprehensive resource for these topics, rather than just a

relatively unstructured hodgepodge of discussions, examples, and resources. Thus, it should include at

least the following pieces of information/sections/facilities:

1. A section with an introduction to the design, implementation, and data analysis of diagnostic

assessments

2. A section with an introduction to various statistical diagnostic modeling approaches including

CDMs

3. A section with an overview of different skill domains that have been successfully modeled with

CDMs and the associated Q-matrices for these applications

4. A repository with sample data files or complete data files from diagnostic assessments that have

been successfully scaled using multiple dimensions

5. A list of available software programs for estimating CDMs and associated measurement

models/algorithms for similar purposes and contact information for acquiring them

6. Worked examples of successful diagnostic assessment systems with a public commenting facility

to create critical metatext

7. A comprehensive annotated bibliography for the different content areas represented on the

portal, including unpublished research reports or related documents

8. A section with frequently asked questions (FAQs)

9. A section with guidelines for best practices/standards/do’s and don’ts

10. A listserv for interested members of the community that the web portal addresses

11. Hyperlinks to different members of the community that the web portal addresses

12. A Rich Site Summary (RSS) feed

Given the scope of the web portal as sketched here, participants deemed it necessary to acquire

external funding to hire graduate students or professionals trained in data mining and/or human-computer

interaction to design and maintain the portal. Moreover, it was viewed as important to

differentiate between the need for logistical management of the portal, which such students

could provide, and the need for intellectual management of the portal, which professionals in the field

should provide. They suggested that the portal could perhaps be linked to professional organizations

such as the Special Interest Group for Cognition and Assessment at AERA. The intellectual

management is critical, because decisions need to be made about what information constitutes “core”

information and how the family resemblance of new information is judged so that sufficient space on the

portal can be allocated in the right location.

Developing an Integrated Software Environment

To foster the cross-disciplinary application of CDMs and related classification methods for

calibrating data from diagnostic assessments, participants at the CDMWG identified the need for a

common software suite. The price of such an environment would need to be competitive even though a

commercial version would generally be preferable over a freeware version given that sales would allow

for the hiring of personnel to improve the software. Alternatively, of course, one could acquire grant

money for such a development but it may be challenging, if not impossible, to acquire money for the

maintenance of the software as well as technical support. The first version of the software suite should

be targeted toward the most frequent potential users to allow for maximum market penetration while

later versions can include more advanced features that will make it attractive to secondary users as well.

Similar software suites are currently in development for related measurement modeling frameworks.

For example, a new software suite called IRTPRO (Cai, du Toit, & Thissen, forthcoming, personal

communication) will be released in 2010 to serve similar purposes for IRT analyses. Similarly, the

REMP program at the University of Massachusetts at Amherst has devoted resources, predominantly via

graduate student funding, to the development of several freeware IRT programs.

Since the acceptability and use of estimation algorithms is, in part, driven by how accessible these

algorithms are for users, expanding such efforts to the area of CDMs seems indispensable. Some efforts

are already underway in this regard. For example, Matthias von Davier will release a more flexible

software program for his family of general diagnostic models in 2010 (von Davier, personal

communication). In a similar vein, Rupp, Templin, and Henson (2010) have described how to estimate a

variety of DCMs within Mplus but the approach is not user-friendly enough at this point to be of

practical appeal for anyone but specialists interested in CDMs.

Thus, the participants identified the following desirable features of such a software suite. It should

include…

1. calibration, model-fit, and reporting engines for a variety of CDMs within a broad model

family

2. a variety of acceptable data-model fit statistics at the relative, absolute, item, and person

level

3. a variety of useful descriptive statistics for multidimensional assessments such as

discrimination indices for items and attributes, difficulty indices for items and attributes,

reliability statistics for scale scores, and standard error estimates for these scores

4. a variety of visual representations for the attribute profile structures, both at the level of

the profile and marginally for each attribute

5. a flexible importing interface for key model elements (e.g., data file, Q-matrix, starting

values, scoring key) with capabilities to read in files from a variety of foreign file formats

6. point-and-click and graphical interfaces that are cross-generative such that one can create

Q-matrices from specifications, specifications from Q-matrices, visual representations

from Q-matrices, etc.

7. example data sets based on simulated and real data with accurate labeling of variables

and comprehensive documentation similar to international large-scale assessments such

as PISA, TIMSS, or NAEP

8. simulation capabilities with enough flexibility that various statistics can be computed on

the simulated data

9. flexible control of the output structure and format

10. a detailed, comprehensive, and clearly written user manual that includes decision

flowcharts and FAQ sections

11. a comprehensive help section
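To make features such as Q-matrix import and simulation capabilities more concrete, the following minimal sketch simulates item responses under the DINA model, one of the CDMs named in this paper. It assumes the standard DINA formulation, in which a respondent who has mastered every attribute an item requires answers correctly with probability 1 minus the slip parameter, and otherwise with probability equal to the guessing parameter. The Q-matrix, parameter values, and function names are hypothetical and do not describe any of the programs discussed here.

```python
import random


def ideal_response(alpha, q_row):
    """Return 1 if the respondent masters every attribute the item requires."""
    return int(all(a >= q for a, q in zip(alpha, q_row)))


def dina_probability(alpha, q_row, slip, guess):
    """Probability of a correct response under the DINA model."""
    return 1.0 - slip if ideal_response(alpha, q_row) else guess


def simulate_responses(profiles, q_matrix, slips, guesses, seed=0):
    """Simulate a 0/1 response matrix (respondents x items)."""
    rng = random.Random(seed)
    data = []
    for alpha in profiles:
        row = []
        for q_row, s, g in zip(q_matrix, slips, guesses):
            p = dina_probability(alpha, q_row, s, g)
            row.append(int(rng.random() < p))
        data.append(row)
    return data


# Hypothetical Q-matrix: 3 items (rows) by 2 attributes (columns).
q_matrix = [
    [1, 0],  # item 1 requires attribute 1 only
    [0, 1],  # item 2 requires attribute 2 only
    [1, 1],  # item 3 requires both attributes
]
slips = [0.1, 0.1, 0.2]    # hypothetical slip parameters
guesses = [0.2, 0.2, 0.1]  # hypothetical guessing parameters

# Hypothetical attribute profiles for four respondents.
profiles = [[0, 0], [1, 0], [0, 1], [1, 1]]

responses = simulate_responses(profiles, q_matrix, slips, guesses, seed=42)
```

Keeping the Q-matrix as a plain item-by-attribute table is a deliberate choice in this sketch: the same table could be read from a file, edited by content experts, or generated from specifications, which is the kind of interchange the feature list asks for.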

Given the current state of the market at the time of this writing, such a software suite would compete

with the following current special-purpose software programs/estimation routines for CDMs:

1. Arpeggio software by Lou DiBello et al. (Windows, DOS; commercial program)

2. Mplus code by Jon Templin (Mplus; freeware plus required commercial program)

3. MDLTM software by Matthias von Davier (Windows, DOS; free research license)

4. BUGLIB software by Curtis Tatsuoka (Linux; paid research license)

5. G-DINA code by Jimmy de la Torre (Ox; freeware)

6. LCDM software by Bob Henson (Windows, DOS; freeware)

7. AHM software by Hunka et al. (Mathematica; freeware)

8. DINA, DINO code by Alexander Robitzsch (R; freeware)

Moreover, the software suite needs to include a sufficient number of specialized features to compete

meaningfully with the following general-purpose software programs that can be used to estimate some

CDMs: WinBUGS, R/S-PLUS, Mplus, Latent Gold, LEM, MLSSA, GLLAMM/Stata, SAS, Fortran,

C/C++, a variety of programs for Bayesian inference networks, Mathematica, and Ox.

Incentives for the development of such software programs can also be given via awards, even though

they are rather minor. Nevertheless, the Bradley Hanson Award for Contributions to Educational

Measurement, for example, which is annually awarded by NCME, recognizes research efforts toward

the development of user-friendly software programs in the spirit of Bradley Hanson’s work.

Developing an Integrated Diagnostic Assessment System

The third issue that was identified by participants was the development of an integrated diagnostic

assessment system that takes a long-term perspective on diagnostic assessment and fluidly combines

formative feedback, summative evaluation, and targeted remediation. Participants noted that such a

system does not require the successful application of CDMs as several existing systems have

demonstrated. For example, the Cisco Networking Academy uses methods from CTT, IRT, and Bayes

nets (Levy, 2009a; West et al., 2009), the Cognitive Tutors system uses data-mining techniques grounded

in modern cluster analysis techniques (Heffernan, 2009; see Nugent, 2009), the IMMEX system at the

University of California at Los Angeles uses IRT and neural network analysis coupled with hidden

Markov models (Soller & Stevens, 2007), and a new computerized adaptive test system that is currently

pilot-tested in China uses IRT in combination with a higher-order CDM (Chang, 2010; for similar ideas

see Cheng, 2009, and Finkelman, 2009).

The key to any successful diagnostic assessment system is not to lose sight of the targeted inferences

and decisions that are made by the associated stakeholder groups. Moreover, experience in the above

projects has shown that the development of a well-functioning fine-tuned diagnostic assessment system

typically requires numerous iterations that stretch over multiple years and cost millions of dollars in

person time and physical resources. In addition, the most successful of these systems are designed to

monitor the development of respondents over time by mapping their learning progressions and learning

trajectories in multidimensional spaces. Rather than providing a merely efficient measurement solution

to a large-scale assessment that is given at a single point in time such as the Graduate Record

Examination or the Test of English as a Foreign Language, the best of these systems are designed to

support and foster learning within effectively structured learning environments. The resulting

assessments and feedback mechanisms are what Valerie Shute would call “stealth assessments” because

they are fluidly and unobtrusively integrated into a larger aligned repertoire of learning experiences

(Shute et al., 2009). Of course, CDMs can also be used for assessments that provide high-stakes

summative interpretations but the rhetoric surrounding their theory and use in the educational

measurement literature suggests that researchers would like to see them more frequently applied in the

learning contexts just described.

Closing Comments

Many of the participants felt that we are currently at somewhat of a sea change in the field of

diagnostic assessment in general and CDMs in particular. While the demand for fine-grained diagnostic

information is constantly growing in various areas of educational and psychological assessment, the role

that CDMs can play in this area is not as clear. Theoretically, these models seem to respond well to a

call for such diagnostic information but applications that show their “added value” over simpler methods

or alternative complex latent-variable methods are essentially non-existent. Still, as methodological

research on CDMs is ongoing and fueled by the desire of a new cadre of researchers in the field, many

opportunities that lie at the intersection of work in different disciplines can be harvested.

The participants in the CDMWG were overall highly excited and hopeful to work with each other

and others in the field to push the methodological and application boundaries in fields that are touched

by diagnostic assessment and CDMs. Although they reflect the ideas of a selected few, we believe that

the key ideas laid out in this white paper represent some of the current thinking in these fields more

broadly. The interactions at the CDMWG and the current developments in the field encourage us to

believe that a more comprehensive, integrated, and coherent picture surrounding diagnostic assessments

and CDMs will develop in the next decade.

References

References with a (*) are presentations given during the CDMWG meeting at SAMSI 2009.

Almond, R. G., Williamson, D. M., Mislevy, R. J., & Yan, D. (in press). Bayes nets in educational

assessment. New York: Springer.

*Alonzo, A. (2009, July). Learning progressions: A cognitive diagnosis model? Paper presented at the

CDMWG at SAMSI Summer Program on Psychometrics, Raleigh, NC.

American Educational Research Association, American Psychological Association, National Council on

Measurement in Education (1999). The standards for educational and psychological testing.

Washington, DC: AERA Publications.

Cai, L., du Toit, M., & Thissen, D. (forthcoming). IRTPRO [software program].

Chang, H.-H. (2010, February). Advanced in computer-based internet testing. Presented at the MSMS

seminar in the EDMS department at the University of Maryland, College Park, MD.

*Cheng, Y. (2009, July). Computerized adaptive testing for cognitive diagnosis. Paper presented at the

CDMWG at SAMSI Summer Program on Psychometrics, Raleigh, NC.

Code of Fair Testing Practices in Education. (2004). Washington, DC: Joint Committee on Testing

Practices.

*Confrey, J., & Maloney, A. (2009, July). Developing learning trajectories and diagnostic assessments

based on student reasoning on rational number over time. Paper presented at the CDMWG at

SAMSI Summer Program on Psychometrics, Raleigh, NC.

*Cui, Y. (2009, July). Evaluating person fit for cognitive diagnostic assessment with the hierarchy

consistency index. Paper presented at the CDMWG at SAMSI Summer Program on Psychometrics,

Raleigh, NC.

de Ayala, R. J. (2009). The theory and practice of item response theory. New York: Guilford Press.

de Boeck, P., & Wilson, M. (2004). Explanatory item response theory models: A generalized linear and

nonlinear approach. New York: Springer.

de la Torre, J. (2008, July). The generalized DINA model. Paper presented at the annual International

Meeting of the Psychometric Society (IMPS), Durham, NH.

*de la Torre, J. (2009a, July). Estimation and DINA-based models. Paper presented at the CDMWG at

SAMSI Summer Program on Psychometrics, Raleigh, NC.

*de la Torre, J. (2009b, July). Model evaluation and comparison. Paper presented at the CDMWG at

SAMSI Summer Program on Psychometrics, Raleigh, NC.

*de la Torre, J. (2009c, July). The generalized DINA model framework. Paper presented at the CDMWG

at SAMSI Summer Program on Psychometrics, Raleigh, NC.

Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Erlbaum.

*Finkelman, M. (2009, July). Computerized adaptive testing to predict an observable outcome. Paper

presented at the CDMWG at SAMSI Summer Program on Psychometrics, Raleigh, NC.

*Heffernan, N. (2009, July). TITLE TBA. Paper presented at the CDMWG at SAMSI Summer Program

on Psychometrics, Raleigh, NC.

Helzer, J. E., Kraemer, H. C., Krueger, R. F., Wittchen, H.-U., Sirovatka, P. J., & Regier, D. A. (Eds.).

(2008). Dimensional approaches in diagnostic classification: Refining the research agenda for

DSM-V. Arlington, VA: American Psychiatric Association.

*Henson, R. (2009a, July). The LCDM. Paper presented at the CDMWG at SAMSI Summer Program on

Psychometrics, Raleigh, NC.

*Henson, R. (2009b, July). Some challenges in estimation, programming, and implementation: What

should we work on? Paper presented at the CDMWG at SAMSI Summer Program on Psychometrics,

Raleigh, NC.

Henson, R., Templin, J., & Willse, J. (2009). Defining a family of cognitive diagnosis models.

Psychometrika, 74, 191-210.

*Huff, K. (2009, July). Educational diagnostic assessment design: Interdisciplinary pathways to

progress. Paper presented at the CDMWG at SAMSI Summer Program on Psychometrics, Raleigh,

NC.

Jang, E. E. (2005). A validity narrative: Effects of reading skills diagnosis on teaching and learning in

the context of NG TOEFL. Unpublished doctoral dissertation, University of Illinois at Urbana-

Champaign, Urbana-Champaign, IL.

*Jang, E. (2009, July). Score reporting and subsequent actions. Paper presented at the CDMWG at

SAMSI Summer Program on Psychometrics, Raleigh, NC.

Jaeger, J., Tatsuoka, C., & Berns, S. (2003). Innovative methods for extracting valid cognitive deficit

profiles from NP test data in schizophrenia. Schizophrenia Research, 60, 140-140.

*Karelitz, T. (2009, July). Between dichotomy and continuity: Ordered categories in diagnostic models.

Paper presented at the CDMWG at SAMSI Summer Program on Psychometrics, Raleigh, NC.

Leighton, J. P. (2004). Avoiding misconception, misuse, and missed opportunities: The collection of

verbal reports in educational achievement testing. Educational Measurement: Issues and Practice,

23(4), 6-15.

Leighton, J., & Gierl, M. (2007). Cognitive diagnostic assessment for education: Theory and

applications. Cambridge, UK: Cambridge University Press.

*Levy, R. (2009a, July). Markov-chain Monte Carlo (MCMC) estimation for diagnostic classification

models. Paper presented at the CDMWG at SAMSI Summer Program on Psychometrics, Raleigh,

NC.

*Levy, R. (2009b, July). Posterior predictive model checking for diagnostic classification models. Paper

presented at the CDMWG at SAMSI Summer Program on Psychometrics, Raleigh, NC.

Loye, N. (2008). Élaborer la matrice Q des modèles cognitifs dans diverses conditions et définir leur

impact sur sa validité et sa fidélité [Developing Q-matrices for cognitive models under a variety of

conditions and evaluating their impact on the validity and fidelity of assessments]. Unpublished

doctoral dissertation, University of Ottawa, Ottawa, ON.

*Loye, N., Caron, F., Burney-Vincent, C., & Gagnon, M. (2009, July). Validity of diagnostic results

arising from a marriage between didactics and measurement. Paper presented at the CDMWG at

SAMSI Summer Program on Psychometrics, Raleigh, NC.

McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Erlbaum.

Mislevy, R. J., Riconscente, M. M., & Rutstein, D. W. (2009). Design patterns for assessing model-based

reasoning. Manuscript submitted for publication.

Mislevy, R. J., Steinberg, L. S., Almond, R. G., & Lukas, J. F. (2006). Concepts, terminology, and basic

models of evidence-centered design. In D. M. Williamson, I. I. Bejar, & R. J. Mislevy (Eds.),

Automated scoring of complex tasks in computer-based testing (pp. 15-48). Mahwah, NJ: Erlbaum.

Nichols, P. D., Chipman, S. F., & Brennan, R. L. (1995). Cognitively diagnostic assessment. Mahwah,

NJ: Erlbaum.

*Nugent, R. (2009, July). Clustering in cognitive diagnosis: Some recent work and open questions.

Paper presented at the CDMWG at SAMSI Summer Program on Psychometrics, Raleigh, NC.

*Rupp, A. A. (2009a). An overview of the space of parametric CDMs. Paper presented at the CDMWG

at SAMSI Summer Program on Psychometrics, Raleigh, NC.

*Rupp, A. A. (2009b). Frameworks for the validation of interpretations from diagnostic assessments.

Paper presented at the CDMWG at SAMSI Summer Program on Psychometrics, Raleigh, NC.

*Rupp, A. A. (2009c). Lessons learned from diagnostic task construction in various assessment

contexts. Paper presented at the CDMWG at SAMSI Summer Program on Psychometrics, Raleigh,

NC.

Rupp, A. A., & Templin, J. (2008). Unique characteristics of cognitive diagnosis models: A

comprehensive review of the current state-of-the-art. Measurement: Interdisciplinary Research &

Perspectives, 6, 219-262.

Rupp, A. A., Templin, J., & Henson, R. J. (2010). Diagnostic measurement: Theory, methods, and

applications. New York: Guilford Press.

Shute, V. J., Masduki, I., Donmez, O., Dennen, V. P., Kim, Y.-J., Jeong, A. C., & Wang, C.-Y. (2009).

Assessing 21st century knowledge and skills in game environments. Manuscript submitted for

publication.

Soller, A., & Stevens, R. (2007). Applications of stochastic analyses for collaborative learning and

cognitive assessment (IDA Document D-3421). Arlington, VA: Institute for Defense Analyses.

Tatsuoka, C. (2002). Data-analytic methods for latent partially ordered classification models. Journal of

the Royal Statistical Society (Series C, Applied Statistics), 51, 337-350.

*Tatsuoka, C. (2009, July). Partially ordered models of cognition. Paper presented at the CDMWG at

SAMSI Summer Program on Psychometrics, Raleigh, NC.

Tatsuoka, K. K. (1983). Rule-space: An approach for dealing with misconceptions based on item

response theory. Journal of Educational Measurement, 20, 345-354.

Tatsuoka, K. K. (2009). Cognitive assessment: An introduction to the rule-space method. Florence, KY:

Routledge.

Templin, J. L., & Henson, R. A. (2006). Measurement of psychological disorders using cognitive

diagnosis models. Psychological Methods, 11, 287-305.

Templin, J., & Rupp, A. A. (in press). On the origins of a statistical species: The evolution of diagnostic

classification models within the psychometric taxonomy. In G. Macready & G. Hancock (Eds.), A

Festschrift in Honor of C. Mitchell Dayton. Publisher to be determined.

Thissen, D., & Wainer, H. (Eds.). (2001). Test scoring. Mahwah, NJ: Erlbaum.

von Davier, M. (2005). A general diagnostic model applied to language testing data (RR-05-16).

Princeton, NJ: Educational Testing Service.

West, P., Rutstein, D. W., Mislevy, R. J., Liu, J., Levy, R., DiCerbo, K. E., Crawford, A., Choi, Y., &

Behrens, J. T. (2009, June). A Bayes net approach to modeling learning progressions and task

performances. Presented at the Learning Progressions in Science (LeaPS) conference, Iowa City, IA.

Appendix A: Presenter Biographies

(Alphabetical Listing by Last Name)

Alicia Alonzo. Alicia is an assistant professor at the University of Iowa, where she specializes in formative classroom assessment for science, specifically the modeling of learning progressions for multiple skill sets. She was one of the co-organizers of the recent Learning Progressions in Science (LeaPS) conference at the University of Iowa, which brought together researchers from such diverse disciplines as cognitive science, learning sciences, science teaching, and educational measurement to discuss the state of the art in conceptualizing, operationalizing, and measuring learning progressions in the field.

Ying Cheng. Ying is currently an assistant professor in the Department of Psychology at the University of Notre Dame. Her research focuses on the theoretical development and applications of item response theory (IRT), including computerized adaptive testing (CAT), test equity across different ethnicity/gender groups, and classification accuracy and consistency with licensure/certification exams, such as state graduation exams. Recently she has been working on cognitive diagnostic models and their applications to CAT. For her past and proposed work in this area, Ying recently received the prestigious Bradley Hanson Award for Contributions to Educational Measurement from NCME.

Jere Confrey. Jere is the Joseph D. Moore Distinguished Professor for mathematics education at the Friday Institute for Educational Innovation here in Raleigh. She is the director of the NSF-funded Generating Increased Mathematics and Science Opportunities (GISMO) project, which is a research lab whose work focuses on the design of innovative approaches to learning and teaching mathematics and science. One of her areas of specialization in this regard is rational number reasoning, for which the GISMO project has released a database with relevant peer-reviewed research. Similarly, her DELTA: Diagnostic E-Learning Trajectories Approach project focuses on the design, implementation, and analysis of diagnostic assessments in this area for K-8.

Ying Cui. Ying is currently an assistant professor in the Department of Educational Psychology at the University of Alberta. She received her Ph.D. in measurement, evaluation, and cognition from the University of Alberta. Her M.S. and B.S. are in statistics. Her research interests include educational measurement and evaluation, cognitive diagnostic assessment, research design and applied statistical methods, and statistical education. She is particularly interested in evaluating the construct validity of achievement tests, investigating cognitive factors underlying student test performance, and using cognitive models to interpret student test performance. She is currently pursuing ways of applying cognitive diagnostic assessments to classroom settings and evaluating their effectiveness in promoting student learning.

Jimmy de la Torre (co-organizer). Jimmy received his Ph.D. in quantitative psychology from the University of Illinois at Urbana-Champaign. He is currently an assistant professor of educational psychology at Rutgers University. His primary research interests are in the field of psychological and educational testing and measurement, with a specific emphasis on IRT and latent variable modeling. As a researcher, his main objective is to develop and apply state-of-the-art psychometric models and methodologies to address theoretically challenging issues that have important practical consequences. His goals as a researcher also include the distillation and dissemination of the findings and implications of his work to an audience with diverse backgrounds and needs. His work on cognitive diagnosis modeling has been published in some of the top journals in the field. In addition, he has received funding support from institutions such as the NSF, Institute of Education Sciences, National Academy of Education, and Spencer Foundation to pursue this line of work.

Matthew Finkelman. Matthew received his Ph.D. in statistics from Stanford University in 2004. His dissertation, “Statistical Issues in Computerized Adaptive Testing,” explored the use of modern sequential analysis techniques to improve the efficiency of diagnosis in pass/fail tests. After graduating, he worked as a psychometrician for CTB/McGraw-Hill and Measured Progress, where he conducted research in computerized adaptive testing, cognitive diagnosis modeling, and multidimensional IRT. In 2007, he switched his focus from educational assessment to biostatistics. He served as a research fellow at the Harvard School of Public Health from August 2007 to October 2008, where he helped develop a new mixture IRT model for psychiatric diagnosis. He is now an assistant professor at the Tufts University School of Dental Medicine. His current research interests include the application of computerized adaptive testing to medical diagnosis, the automated selection of test items for cognitive diagnosis, and the development of new statistical methods to analyze dental and other medical data.

Neil Heffernan. Neil received his Ph.D. in computer science in 2001 from Carnegie Mellon University. He is currently an associate professor at the Worcester Polytechnic Institute, where his research interests center around cognitive modelling and intelligent tutoring systems, specifically web-based systems. He has received numerous NSF-funded grants centered around learning in mathematics and science, including a prestigious CAREER grant for early scholars. His most recent grant aims to test a computer instructional math program that will help Worcester middle school students prepare for the MCAS math exams by giving them fine-grained diagnostic feedback. The program is developed in collaboration with Carnegie Mellon University and Carnegie Learning Inc. and will be implemented within an experimental design setting to study its effectiveness for improving student learning.

Robert Henson. Robert received his Ph.D. in quantitative psychology from the University of Illinois. He is now an assistant professor of Educational Research Methodology at the University of North Carolina at Greensboro. The primary focus of his research is latent variable modeling with an emphasis on cognitive diagnosis models. He is one of the foremost authorities in cognitive diagnosis modelling and has pursued a variety of research avenues related to model estimation, model fit assessment, and model application. He is also interested in the potential use of CDMs in identifying differential item functioning, and applications of CDMs outside the educational context (e.g., diagnosis of psychological disorders). He has developed a log-linear estimation approach for cognitive diagnosis models as part of an NSF grant, and is co-authoring a book on these models. His other research interests include multivariate statistics, hierarchical linear models, structural equation models, and mathematical statistics.

Kristen Huff. Kristen Huff is Senior Director of Advanced Placement Research and Assessment Design at the College Board. She received both her Master of Education (University of North Carolina Greensboro) and doctorate (University of Massachusetts Amherst) in educational measurement. Before joining the College Board, Ms. Huff worked at Educational Testing Service on the development of TOEFL iBT using evidence-centered design. Her research areas of interest are assessment design, diagnostic score reporting methodology, and item difficulty modeling. She is one of the key voices in the area of diagnostic assessment design and modeling in our field and always a critical disseminator and generator of knowledge. She is also the director of the Northeastern Educational Research Association (NERA), whose annual meeting is from October 21-23 in Rocky Hill, Connecticut.

Eunice Jang. Eunice is an Assistant Professor in the Department of Curriculum, Teaching and Learning at the Ontario Institute for Studies in Education (OISE) at the University of Toronto. She is interested in validity issues in cognitive diagnostic assessment, fairness and equity issues in testing, achievement gaps among students from diverse backgrounds, and literacy assessment practice. She is also keen on epistemological issues surrounding human inquiry approaches in social sciences and education. Eunice is the recipient of the 2006 Jacqueline Ross TOEFL Dissertation Award, the Education Alumni Association Outstanding Student Medal, the Tatsuoka Measurement Award, the 2003 IELTS Best MA Dissertation Award, and the Katharine Aston Award for Outstanding Thesis. In recent years, she examined literacy and numeracy skill profiles for Grade 9 and 10 students based on their performance on Canadian assessments. She also collaborates with Dr. Jim Cummins on the validity project for “Steps toward English Proficiency” (STEP) with over 50 ESL teachers across Ontario. STEP is intended to help teachers assess and track all English language learners’ literacy development in Ontario schools.

Tzur Karelitz. Tzur received his Ph.D. in 2004 at the University of Illinois at Urbana-Champaign with a thesis that focused specifically on modeling skills polytomously in a multidimensional modeling framework, for which he has argued repeatedly in the literature. During his time at the PACE center at Tufts University, he also conducted applied research on wisdom, creativity, and practical and analytical thinking. He collaborated on evaluating and improving the university’s admission process and developing assessments of student outcomes, specifically, assessments of wise leadership qualities, all important so-called 21st century skills. Tzur is currently a senior research associate at the Center for Science Education in the Education Development Center in Newton, MA.

Roy Levy. Roy received a Ph.D. in measurement, statistics and evaluation from the University of Maryland in 2006. He is currently an assistant professor of measurement, statistics, and methodological studies at Arizona State University. His work primarily concerns the use of statistical and psychometric models in assessment, education, and the social sciences more broadly. His interests include methodological investigations and applications in psychometrics and statistical modeling, focusing on models and procedures associated with IRT, structural equation modeling, cognitive diagnosis models, Bayesian networks, and Bayesian approaches to inference. Representative projects that he is currently involved in include dimensionality assessment for multidimensional IRT, Bayesian networks for skills diagnosis and learning progressions, model comparison strategies in mean and covariance structure models, and advances in Markov chain Monte Carlo estimation of psychometric models.

Nathalie Loye. Nathalie is currently an assistant professor at the Université de Montréal in Montréal, Canada. She received her Ph.D. from the University of Ottawa, where she developed a keen interest in how experts use various empirical and theoretical sources of information to construct Q-matrices for diagnostic assessment. She currently continues this work within the Canadian measurement community to foster the development of an international community of researchers in this area.

Rebecca Nugent. Rebecca is currently a visiting assistant professor in the Department of Statistics at Carnegie Mellon University. She received her Ph.D. in statistics from the University of Washington, and her master's in statistics from Stanford University. She has a Bachelor's degree in mathematics, statistics, and Spanish from Rice University. Her research is primarily focused on non-parametric clustering and visualization of the (possibly) hierarchical cluster structure in data. She is also interested in assessing stability of cluster solutions by assigning measures of significance to estimated clusters. More recently, she has focused on educational data mining and, more specifically, the development of methodology (with a clustering framework) as alternatives to cognitive diagnosis models. Her recent work includes conditional subspace clustering with skill mastery information, a generalized single linkage method for estimating the cluster tree of a density, and skill set profile clustering based on student capability vectors computed from online tutoring data.

André A. Rupp (co-organizer). André is an assistant professor of measurement, evaluation, and statistics at the University of Maryland. He received his Ph.D. in measurement, evaluation, and research methodology from the University of British Columbia. Until recently, he worked at the Institute for Educational Progress in Berlin, Germany. His current research interests center around cognitively-grounded assessment approaches and associated statistical models. He is interested in investigating the theoretical potential and practical limitations of diagnostic classification models, researching the robustness of such models to model misspecification, and developing principled diagnostic assessment approaches for modeling learning progressions of complex skill sets using innovative digital technologies such as games- and simulation-based learning environments.

Curtis Tatsuoka. Curtis received his Ph.D. in statistics from Cornell University. He is currently an associate professor of neurology and director of biostatistics at the Neurological Institute and Neurological Outcomes Center of Case Western Reserve University. He has been working in the area of cognitive diagnosis since his dissertation, in particular developing theories and methods for adaptive testing. He currently works on using lattices, a subclass of partially ordered sets, as cognitively diagnostic models in neuropsychological assessment. Applications of his work include determining differential cognitive deficits in schizophrenics, detecting early cognitive-based markers for Alzheimer’s disease, and measuring cognitive change in clinical drug trials.

Appendix B: Schedule of the Thematic Sessions

Day        Time            Session                                                                   Presenters

Monday     7:30 – 9:00     Breakfast
           9:00 – 10:00    Introduction                                                              de la Torre, Rupp
           10:00 – 11:30   Diagnostic Assessment Approaches                                          Finkelman, Huff, Tatsuoka
           11:30 – 12:30   Joint Session with Other Working Groups                                   David Banks
           12:30 – 2:00    Lunch
           2:00 – 3:30     Developmental Theories for Diagnostic Assessment                          Alonzo, Confrey, Karelitz
           3:30 – 5:00     Task & Q-matrix Construction                                              Cui, Loye, Rupp

Tuesday    7:30 – 8:30     Breakfast
           8:30 – 10:00    Fully Parametric Models for Classification                                de la Torre, Henson, Rupp
           10:00 – 11:30   Non-parametric and Semi-parametric Models for Classification              Cui, Nugent, Tatsuoka
           11:30 – 12:30   Joint Session with Other Working Groups                                   Valen Johnson
           12:30 – 2:00    Lunch
           2:00 – 3:30     Challenges in Estimation, Programming, and Implementation                 de la Torre, Henson, Levy
           3:30 – 5:00     Model Fit Assessment & Refinement                                         de la Torre, Levy, Cui

Wednesday  7:30 – 8:30     Breakfast
           8:30 – 10:30    Optimal Test Design and Computerized Adaptive Testing                     Cheng, Finkelman, Tatsuoka
           10:30 – 12:00   Score Reporting & Subsequent Action                                       Heffernan, Huff, Jang
           12:00 – 1:30    Lunch
           1:30 – 3:00     Validation                                                                Rupp, Heffernan
           3:00 – 5:00     Future Challenges for Diagnostic Assessment & Cognitive Diagnosis Models  Moderated Panel Session

The King’s Foot of Patient Reported Outcomes: Current Practices

and New Developments for the Measurement of Change.

Richard J. Swartz*, Ph.D. The University of Texas M. D. Anderson Cancer Center

Li Cai, Ph.D., University of California, Los Angeles

Diane Fairclough, Dr.P.H., University of Colorado

Lori McLeod, Ph.D., RTI Health Solutions

Tito Mendoza, Ph.D., The University of Texas M. D. Anderson Cancer Center

Bruce Rapkin, Ph.D., Albert Einstein College of Medicine

Carolyn Schwartz, Sc.D., Deltaquest Foundation, Inc and Tufts University Medical School

And the SAMSI Psychometric Program Longitudinal Assessment of Patient Reported Outcomes

Working Group1

* All listed authors contributed equally. The first listed author coordinated the final document, and the others are listed alphabetically.

1. Other working group members: Bryce Reeve, Ph.D., Charles Cleeland, Ph.D., Betsy Feldman, Ph.D., Carmen Rivera-Medina, Ph.D., Cheryl Hill, Ph.D., Ethan Basche, M.D., Herle McGowan, Ph.D., Ken Bollen, Ph.D., Knashawn Morales, Sc.D., Lauren Nelson, Ph.D., Mark Price, M.A., M.Ed., Rochelle Tractenberg, Ph.D., MPH, Quiling Shi, Ph.D., Theresa Gilligan, M.S., Thomas Atkinson, Ph.D., Valerie Williams, Ph.D., Xiaojing Wang, Jun Wang,

Patient- or person-reported outcome (PRO) data is increasingly collected and recognized

as being valuable in clinical research, regulatory review, and disease management. Barriers to

longitudinal collection of PRO data which have historically been raised no longer seem

insurmountable:

- There is substantial regulatory support to collect PRO data capturing patient subjective experiences directly from patients.

- Clinical investigators, NIH institutes, and industry sponsors increasingly recognize the importance of gathering PRO data to supplement other information in clinical research.

- Consensus has developed around standards for development processes, measurement properties, and implementation strategies for PRO measures, and these standards have recently been encoded in documents such as the FDA’s “Guidance for Industry: Patient-Reported Outcome Measures Use in Medical Product Development to Support Labeling Claims.”74

- A critical mass of publications demonstrates methods to gather patient-reported information for routine medical management, and the predictive significance of patient-reported information.

- An increasing number of well-developed instruments is available.

- Low-cost technologies are available for data collection; patients and staff are widely familiar with these data collection strategies, and respondent burden can be minimized through electronic data capture strategies like skip patterns and computerized adaptive testing.

Recently PRO data has expanded from the commonly used measures of health-related

quality of life and health status measures to also include symptom-specific measures and multi-

symptom questionnaires. The formation of the PRO Consortium within the Critical Path

Institute has encouraged collaborations between academics, sponsors, and the FDA to develop

robust, publicly available symptom-specific instruments for use towards labeling claims. Single

symptom items and item banks (such as those being developed through the PROMIS [Patient-Reported

Outcomes Measurement Information System] initiative), and multi-item questionnaires (such as the MDASI

[M.D. Anderson Symptom Inventory]) have advanced the ability to understand the patient

experience around specific symptoms and symptom clusters with greater precision. And

sponsors increasingly consider broadening the use of PROs beyond efficacy evaluation to assess

safety and adverse events. In response to this new interest, the National Cancer Institute’s PRO-

CTCAE (Patient-Reported Outcomes version of the Common Terminology Criteria for Adverse

Events) initiative has emerged.

Any use of PROs in the study of illness and treatment obviously requires the

measurement of change. Understanding the patient experience at multiple time points and how

that experience may change provides investigators and clinicians with a more comprehensive

picture of the impact of disease and treatment. In the routine care setting, real-time collection of

PROs over time can assist with supportive care and treatment decisions, and flag concerning

trends. Systematic collection of PROs across patients in this context may be used to support

comparative effectiveness research (CER), which is of increasing interest for informing health

policy decisions. In the clinical trial setting, multiple repeat PRO measures as secondary

endpoints can increase confidence that the impact of an intervention is being appropriately

quantified.

Far less obvious are the complex issues that arise in conceptualizing and operationalizing

“change” in PRO measures and related constructs. PROs are an important aspect of the patient

experience, and further research is necessary to improve our measurement and understanding of

such outcomes and how they change. This paper outlines issues associated with

operationalizing change in PRO measures, summarizes methods to address these issues,

illuminates gaps in the current state of science, and identifies new developments rich in potential

to fill those gaps. Section 1 outlines the major issues in conceptualizing and operationalizing

change in PRO scores. Once operationalized, change can be modeled. Current methods to

model change in PRO scores over time are discussed in section 2. Recognizing that to

understand change requires identifying what constitutes a meaningful change, methods to

identify meaningful change are described in section 3. Section 4 identifies some of the major

impediments to detecting true change, and section 5 identifies new developments that show

potential for dealing with the challenges, as well as important gaps that are fertile ground for

future researchers.

1. CONCEPTUALIZING MEASUREMENT OF TRUE CHANGE IN PRO SCORES

The paradigm that dominates the concept of change in psychometric research derives

from classical test theory. In this perspective, observed measures of change (based on

subtraction of pretest from posttest scores on a given repeated measure) can be decomposed into

components: change in an attribute’s latent true score plus differences due to random error of

measurement that occur at each observation. Cronbach and colleagues20 recognized that this

error could distort true change, and so devised an approach to regress individuals’ observed

change scores toward the grand mean of sample change, in accord with the unreliability of the

measure. Regression approaches have also been used to adjust for measurement ceiling and

floor effects that necessarily attenuate change (i.e., initial scores close to a measure’s maximum

value simply cannot increase as much as other scores). These classical corrections to change

scores are problematic because they are highly dependent upon the characteristics of a given

sample. An individual may give exactly identical responses at each of the repeated assessment

times, yet paradoxically be assigned a non-zero change score because others in the sample have

changed.
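
The sample dependence of such corrections can be illustrated with a small sketch. It assumes a simplified adjustment in which each observed change score is pulled toward the sample mean change in proportion to the unreliability of the change score; the function name, the reliability value, and the data are hypothetical, and this is not necessarily the exact procedure proposed by Cronbach and colleagues.

```python
def regressed_change(observed_changes, reliability):
    """Shrink each observed change score toward the sample mean change.

    Assumed (simplified) adjustment, for illustration only:
        adjusted_i = mean + reliability * (observed_i - mean)
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    mean_change = sum(observed_changes) / len(observed_changes)
    return [mean_change + reliability * (d - mean_change)
            for d in observed_changes]


# Hypothetical data: the first person gave identical responses at both
# assessments (observed change 0); everyone else in the sample improved.
changes = [0.0, 4.0, 6.0, 10.0]
adjusted = regressed_change(changes, reliability=0.5)
print(adjusted)  # [2.5, 4.5, 5.5, 7.5]
```

Under these assumptions, the person whose responses did not change at all is assigned an adjusted change of 2.5 solely because the rest of the sample changed. In a sample where nobody changed, the same person would receive an adjusted change of zero.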

Item response theory (IRT) offers a conceptual model and methodology that helps to

overcome some of the shortcomings of classical test theory. In IRT, the estimation of an

individual’s latent true score is based on the pattern of their responses to items with well-defined

parameters of difficulty and discrimination. These item parameters may be derived separately

and applied independent of the characteristics of any given sample. IRT item parameters map

items onto the same unidimensional latent trait continuum as the person scores, and this

facilitates instrument construction, making it easier to identify and deal with ceiling and floor

problems. If we assume that item characteristics are invariant and universally applicable, then

estimates of change are not subject to sample-specific effects.
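
A minimal sketch of this idea follows. It assumes a two-parameter logistic (2PL) model for dichotomous items and a simple grid search for the maximum likelihood trait estimate; the item parameters, response patterns, and function names are hypothetical and chosen only for illustration. The same fixed item parameters are used at both assessment times, so the estimated change depends only on the individual's own response patterns.

```python
import math


def p_2pl(theta, a, b):
    """Probability of endorsing an item under a 2PL logistic model.

    a is the item discrimination and b the item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta, responses, items):
    """Log-likelihood of a 0/1 response pattern at a given trait level."""
    total = 0.0
    for x, (a, b) in zip(responses, items):
        p = p_2pl(theta, a, b)
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total


def estimate_theta(responses, items, grid=None):
    """Grid-search maximum likelihood estimate of the latent trait."""
    if grid is None:
        grid = [i / 100.0 for i in range(-400, 401)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))


# Hypothetical, previously calibrated (discrimination, difficulty) pairs.
items = [(1.2, -1.0), (1.0, 0.0), (1.5, 1.0)]

theta_t1 = estimate_theta([1, 0, 0], items)  # responses at time 1
theta_t2 = estimate_theta([1, 1, 0], items)  # responses at time 2
print(theta_t2 - theta_t1)  # estimated change on the latent trait scale
```

Grid search is used here only to keep the sketch dependency-free; in practice a Newton-type or Bayesian estimator would typically be used, and all-0 or all-1 response patterns need special handling because their maximum likelihood estimate is unbounded.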

Unfortunately, IRT’s strength in measuring change also contributes to its major

weakness. To be included in a measure, item response probabilities must fit a specified model

(usually a logistic function), as a function of the given latent true score of interest. Standard IRT

models have an underlying assumption that the understanding and meaning of a given attribute

must be universally applicable to all people in all situations; that is, the underlying model

describes how the PRO being measured drives the item responses in the same way for all people.

So far, many of the mainstream IRT models do not allow for large effects of individual

differences other than influences from the person’s PRO or trait level. Item responses that do not

fit the probabilistic model and its assumptions simply do not follow the theory and typically are

discarded because their response mechanism cannot be accounted for. Such an approach is

potentially problematic for PROs.

In order to understand the special problems of measuring change in patient-reported

outcomes, it is important to distinguish among several broad classes of measurement. Both

classical test theory and IRT were initially developed to deal with measures of performance,

particularly in educational and intelligence testing. In the health domain, tests of mobility, grip

strength, or completion of neurocognitive tasks are all performance measures. All of these

measures require the individual to demonstrate ability, and all have an objective “right” answer.

Although performance measures may involve observational, verbal, or written assessment of

individuals, they are NOT properly understood as PROs any more than measurements of viral

load or blood pressure would be PROs.

A second category of measurement involves individual perceptions. In this domain,

patients are asked to report on the occurrence, frequency, or magnitude of some event of interest.

Perceptual measures such as the number of missed doses, the number of days spent in bed or the

frequency of unprotected sexual intercourse may be considered as legitimate PROs. The most

important aspect of these perceptual measures is that they necessarily refer to an overt activity

that could in principle be observed, if not for practical or ethical constraints. Perceptual

measures ask individuals to report on some objective behavior as accurately as possible from

their personal recollection of events and experiences. To be sure, factors extrinsic to the attribute

of interest such as accuracy of recall, social desirability or cueing effects can enter into the error

of measurement, but in each case there is an externally verifiable “right” answer.

Many PROs represent a third broad category, evaluative measures. Evaluative measures

include ratings by patients of how good or bad they feel with respect to a given set of experiences.

Questions asking individuals to judge their overall health status, quality of care, symptom

distress, and performance difficulty are all evaluative in nature. There are no external

validations for the responses to evaluative measures; an individual may visibly experience

shortness of breath when he climbs one flight of stairs, but if he reports little difficulty, then the

researcher or clinician must accept that answer. Again, extrinsic factors such as desirability or

cueing may affect evaluation, but even more important for measurement purposes are intrinsic

factors related to the process individuals use to appraise the PRO being evaluated. An individual

may say that he had little difficulty going up the stairs compared to last month or compared to

what he was told to expect at cardiac rehab sessions or compared to what he expected after

reading about his condition on the Internet. Each of these standards of comparison is valid, and

an individual’s response might be apparent if we were to elicit the standard they were using. If

measurement operations were to impose a standard of comparison (e.g., compared to a 25-year-old

individual in good health), we might get the answer that we expect, but it might bear no

relationship to the individual’s subjective experience. Note that some overtly perceptual

measures have evaluative components. For example, how often an individual received emotional

support from neighbors depends on the individual’s interpretation of “emotional support.” In all

these situations where the individual has their own subjective appraisal, difficulty in the

measurement of change is compounded, because it cannot be assumed that an evaluative item’s

meaning or its relationship to underlying latent variables is constant over time. Identifying true

change in a PRO score involves not only assessing a change, but also attributing the change to an

appropriate mechanism. It is of use to know if the actual condition (pain experienced by the

patient) has changed, or if the patient’s appraisal of the condition changed. Both will affect the

experience for the patient, but can have very different implications.

Responsiveness

Responsiveness has emerged in PRO and quality of life (QOL) measurement literature as a

PRO instrument’s sensitivity to detect a true difference or a true change (in a longitudinal setting).

Simply stated, a PRO instrument is responsive if it shows change when there is true or known

change. However, demonstrating evidence of an instrument’s responsiveness is difficult.

Often evidence of responsiveness requires identifying a responder using a measure other than

the PRO scale, such as a related clinical measure. Then, using this external measure to classify

patients into responders and non-responders, several statistics such as Cohen’s effect size or Guyatt’s

responsiveness statistics are available to provide effect size estimates of how responsive the PRO

scale is to changes in the external measure.33 Although no consensus standard for a responsiveness

statistic exists, a recent review and comparison of the available responsiveness statistics identified

one index, Cohen’s effect size, as most appropriate. Cohen’s effect size was selected because it

anchors observed change against variability at baseline, it is less vulnerable to extreme values and is

more readily interpretable.51 Good overviews of responsiveness can be found in papers by Norman

et al.51 and Fayers and Hayes.26
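
As an illustration, the two statistics named above can be sketched as follows. The formulas are common textbook formulations and are stated here as assumptions: Cohen's effect size is taken as mean change divided by the standard deviation of baseline scores, and Guyatt's responsiveness statistic as mean change among externally classified responders divided by the standard deviation of change among stable patients. The function names and all data are hypothetical.

```python
from statistics import mean, stdev


def cohens_effect_size(baseline, follow_up):
    """Mean change divided by the standard deviation of baseline scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def guyatt_responsiveness(responder_changes, stable_changes):
    """Mean change among responders divided by the standard deviation
    of change among patients classified as stable (assumed formulation)."""
    return mean(responder_changes) / stdev(stable_changes)


# Hypothetical PRO scores for patients classified as responders
# by an external clinical measure.
baseline = [10, 12, 14, 16, 18]
follow_up = [13, 15, 17, 19, 21]

# Hypothetical change scores for responders and for stable patients.
responder_changes = [2, 3, 4, 3, 3]
stable_changes = [-1, 0, 1, 0, 0]

es = cohens_effect_size(baseline, follow_up)
grs = guyatt_responsiveness(responder_changes, stable_changes)
print(round(es, 3), round(grs, 3))
```

The two statistics differ only in the denominator: the first scales change by between-person variability at baseline, while the second scales it by the variability of change in patients who are not expected to change.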

It is important to note that responsiveness is a contextualized attribute of an instrument5

rather than an unvarying characteristic—it is specific to and will vary as a function of who is being

analyzed (individuals or groups), which scores are being contrasted (cross-sectional versus

longitudinal), and what type of change is being quantified (observed change vs. important change).

As such, it is a characteristic of the performance of the instrument in a given setting with a given

population.

2. MODELING CHANGE

Once instruments are available that validly measure a change, we can model that change

over time. Only after first selecting a sensible statistical model can researchers maximize the

precision of estimates, enhance the power to detect effects, and improve the overall validity of

results drawn from data analysis. Traditional statistical approaches in longitudinal studies have

relied on the use of some form of the general linear model (GLM). Hedeker and Gibbons35 present

accessible summaries of these GLM-based methods, and Muller and Stewart47 offer a

systematic and updated treatment of the underlying GLM theory.

In general, GLM-based methods have a number of deficiencies that preclude them from

being routinely applied to analyze longitudinal data. Basically, GLM-based methods make

simplifying assumptions about the longitudinal data that rarely are met in practice. In addition, at

the core of GLM-based methods is a comparison of mean differences; thus GLM-based methods

do not provide information about individual differences in the longitudinal trajectories.

Although GLM-based methods have application, newer methods more suited for

longitudinal analyses are now available and widely used. There are two general model

formulations that developed somewhat independently within two different research areas. The

first modeling framework is multi-level modeling (MLM). Such models are also referred to as

hierarchical linear models in educational and behavioral sciences,55 and mixed-effects models or

mixed models in biometrics and medical statistics.21 The second modeling framework is

structural equation modeling (SEqM), particularly the so-called latent curve models for repeated

measures data.11

(In the literature, structural equation modeling is typically abbreviated as SEM;

however, in this manuscript SEM will abbreviate standard error of measurement.) SEqM

represents the culmination of econometric/sociometric simultaneous equations (path) analysis

and the psychometric factor analytic measurement models.10

In both MLM and SEqM, we specify a model directly on the repeated observations for

each individual. MLM and SEqM also offer more flexibility than GLM-based methods to more

accurately model the functional forms of the change over time. Specifically, both MLM and

SEqM allow for individual differences in the initial status and rate of change, which are

represented as (co)variance components. Both time-varying and time-invariant covariates can be

included in the models to help researchers better understand the causes and patterns of change

for individuals as well as groups. Through the use of full-information maximum likelihood,

MLM and SEqM require only the relatively weak missing at random (MAR) assumption for

missing data43

(discussed in more detail in a section on missing data). Thus these models can

handle unbalanced designs with ease. Importantly, MLM or SEqM analyses provide estimates of

individual characteristics of growth and how these individual characteristics relate to the

covariates.
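
To make this structure concrete, a linear growth model of the kind described above can be sketched as follows. The notation is illustrative and is not taken from the cited sources: y_ij is the PRO score of individual i at occasion j, t_ij is the time of that assessment, and x_i is a single time-invariant covariate.

```latex
\begin{aligned}
y_{ij} &= (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\, t_{ij} + \gamma\, x_i + \varepsilon_{ij}, \\
(b_{0i}, b_{1i})^\top &\sim N(\mathbf{0}, \boldsymbol{\Sigma}), \qquad
\varepsilon_{ij} \sim N(0, \sigma^2).
\end{aligned}
```

In this sketch, \beta_0 and \beta_1 describe the average initial status and rate of change, b_{0i} and b_{1i} are the individual deviations from those averages, and \boldsymbol{\Sigma} holds the (co)variance components. In a latent curve formulation, b_{0i} and b_{1i} would appear as latent intercept and slope factors.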

For a large class of models, MLM and SEqM lead to equivalent model formulations.

MacCallum, Kim, Malarkey, and Kiecolt-Glaser44

discuss these similarities in detail. The

essence of the similarities is that the random effects in MLM are specified as latent variables in

SEqM. If the two frameworks produce equivalent models, it can be shown that the two

frameworks lead to the same parameter estimates.4 MLM is more advantageous when subjects

are clustered (e.g., subjects nested within clinics) because accounting for and modeling

additional levels of nesting in MLM is straightforward. SEqM is more flexible when some of the

covariates are latent constructs that are measured by fallible observed indicators because SEqM

easily accounts for measurement error.

3. ASSESSING CHANGE: IDENTIFYING MEANINGFUL CHANGE

Identifying true change in PROs over time is only one important facet of analyzing

change. A second important aspect is identifying when a change in a PRO score is meaningful.

Because PRO scales are typically without meaningful units, it is difficult to make clinical and

regulatory decisions based on PRO assessments. In an effort to provide guidance to decision-

makers, researchers have proposed various methods for estimating minimal important

differences (MIDs) or minimal clinically important differences (MCIDs) for PRO scales. An

MID is defined as the “smallest difference in a score in the domain of interest which patients

perceive as beneficial and which would mandate, in the absence of troublesome side effects and

excessive cost, a change in the patient’s management.”38

Basically an MID is the smallest PRO

score difference that can be judged as meaningful.

The FDA draft guidance for industry, Patient-Reported Outcome Measures: Use in

Medical Product Development to Support Labeling Claims,74

extends the responsiveness property

to include some definition or benchmark for change on the PRO scale which characterizes the

meaningfulness of an individual patient’s response to active treatment rather than a group’s response

to treatment. This indicates the importance of identifying meaningful change in order to interpret

and use PRO scores in clinical trials. Patients achieving the benchmark amount of response are

considered treatment “responders,” whereas patients not achieving that amount of response are

considered “non-responders.” The draft guidance provides examples of definitions, such as a 2-point

change on an 8-point scale or a prespecified percent change from baseline. These prespecified values

are defined using external measures and should be at least as large as an MID because a minimal

change, although important to the patient and dictating some change in treatment strategy, may not be

sufficient to classify a patient as a responder.

At this time, there is no consensus on the best methodology for defining the values for an

MID. Consequently, common practice is to calculate multiple estimates of MIDs and compare

both the values and the results when each estimate is applied to judge the meaningfulness of a

reported change.57,67

The FDA’s draft guidance also refers to the practice of using multiple

methods to compare MID estimates and describes this approach as “generally helpful” when an

MID is applied to clinical study results. The FDA guidance further commented on the need for

optimal approaches and standards for MID definitions to be set for the interpretation of clinical

study results, presumably with the goal of providing clear recommendations in a final version of

the guidance.

After the MID or a range of MIDs has been defined, comparisons at the patient level or at

the group level may be made using the MID(s) as a discriminating guideline. For example, the

mean difference between treatment groups may be compared to the MID to supplement

statistical findings and provide information regarding the clinical meaningfulness of the

difference. At the patient level, the percentage of patients achieving change of at least one MID

for each domain can be compared across treatment groups as a criterion for the amount of

improvement. Specifically, percentages of patients achieving at least one MID of change versus

those not achieving an MID often will be compared overall and for each treatment group across

the entire treatment period. Additionally, the percentage in each group can be compared

statistically as a test of greater improvement for one treatment over another (e.g., comparing the

treatment versus placebo). There are three common categories of MID estimation techniques:

1. Anchor-based methods: compare changes in PRO scores over time with patient-

or clinician-reported global ratings of overall change in disease severity on a

balanced Likert-type scale. For more detail see the study by Juniper et al.39 The

name of the class of method derives from the PRO results being “anchored” or

defined by differences in ratings of change (either self-reported or clinician-

reported ratings). More recent adaptations include methods to optimize

classification of the “achieved” vs. “not achieved” minimum change on the

anchor using ROC curves.

2. Distribution-based methods rely on the distribution of the empirical data from an

administration of the PRO measure. Most commonly the distribution-based MIDs

are some function of the standard deviation of the baseline scores (this includes

methods based on the standard error of measurement, or SEM). For more details

see Norman et al.49 and Wyrwich et al.77

3. Statistical Rules of Thumb MID estimation methods are, as the name implies,

based on statistical rules of thumb. For example, across numerous studies a 0.5-

unit change or greater per response on a 7-point graded-response question has

been applied to define an MID.32,50

This method is straightforward in application

and reasonably consistent across quality of life domains for various conditions

such as heart failure, chronic respiratory disease, and asthma.39
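
The distribution-based estimates and the patient-level responder comparison described above can be sketched in Python as follows. The half-standard-deviation multiplier is an assumed, commonly used choice rather than a value prescribed in this manuscript; the SEM is computed with the usual formula SD * sqrt(1 - reliability); and the reliability value, function names, and scores are all hypothetical.

```python
from math import sqrt
from statistics import stdev


def half_sd_mid(baseline):
    """Distribution-based MID: half of the baseline SD (assumed multiplier)."""
    return 0.5 * stdev(baseline)


def sem_mid(baseline, reliability):
    """Distribution-based MID: one standard error of measurement."""
    return stdev(baseline) * sqrt(1.0 - reliability)


def responder_rate(changes, mid):
    """Proportion of patients whose change is at least one MID."""
    return sum(1 for change in changes if change >= mid) / len(changes)


# Hypothetical baseline scores and change scores by treatment arm.
baseline = [40, 45, 50, 55, 60]
mid_half = half_sd_mid(baseline)
mid_sem = sem_mid(baseline, reliability=0.84)

rate_treatment = responder_rate([6, 5, 2, 5, 7], mid_half)
rate_placebo = responder_rate([1, 4, 0, 3, 5], mid_half)
print(mid_half, mid_sem, rate_treatment, rate_placebo)
```

Computing the responder rates under each candidate MID in turn is one simple way to follow the common practice of comparing results across multiple MID estimates.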

4. MAJOR IMPEDIMENTS TO DETECTING CHANGE

We have identified what constitutes a true change in a PRO score, as well as how to

model this change and how to interpret the meaningfulness of the change. Next, we delve into

some of the impediments we encounter when measuring change in practice. There are a number

of threats to the measurement of true change. We focus on three particular threats relevant to

longitudinal PRO data: 1) response shift, 2) instruments with varying sensitivity across the trait

of interest, and 3) non-ignorable dropout during a trial. The estimates of change are more

sensitive to these factors when the objective is to estimate the magnitude of change within a

specific population than when the objective is to estimate the differences in change between

groups.

Response shift

As mentioned previously, individuals employ subjectivity to appraise their quality of life

and other PRO variables; in fact no measurement of quality of life or related evaluative

constructs is possible without subjective appraisal. It is perfectly understandable that an

individual’s criteria for these PROs can change during (and even be changed by) a course of

illness and treatment. In the area of quality of life research, Schwartz and Sprangers64,68

refer to

these changes as “response shift” phenomena. Many studies over the past decade have

unequivocally demonstrated the ubiquity of response shifts in all manner of PRO measures.2

Sprangers and Schwartz identified three types of response shift: recalibration, a change in the

person’s internal standards used to make comparisons; reprioritization, a change in the

respondent’s values; and reconceptualization, a change in the respondent’s view or definition of

the construct.68

Upon first inspection the subjective aspects of a “response shift” may seem to contribute

to measurement “error” in that many current PRO measurement instruments do not account for

the subjective aspects that influence the score. However, labeling the effects of subjectivity as

error is a misnomer. In practice, the error term represents deviations in the score that are not

explained by the measure. In the case of a misspecified model, where systematic variability

exists but is unaccounted for, the variance of the error term absorbs that variability. Perhaps a

paradigm shift is required: In this example of subjective appraisal factors, the model is more

succinctly considered to be misspecified.

It is clear that in order to have more precise and meaningful measures of PROs, we must

know when and how to account for the subjectivity in our measurement of the PROs. This

differs from asking an individual to simply recount an activity or event, because we are asking

about their own judgment. In order to more fully address the nature of change in evaluative

measures, Schwartz and Rapkin introduced the notion of the “contingent true score.” They argue

that any evaluative measure must be interpreted contingent on a cognitive-affective process of

appraisal that underlies an individual’s response.54

They formulate that the true score depends

upon four parameters of appraisal: the individual’s frame of reference or interpretation of an

item, a sampling of experiences, the standards of comparison, and the algorithm for combining

or reconciling discrepant experiences. Rapkin and Schwartz discussed how the three types of

response shift, reconceptualization, reprioritization, and recalibration, could be understood in

terms of changes in appraisal parameters that in turn account for discrepancies in the observed

versus expected changes in a patient’s response to evaluative measures.54

Subsequent research

using cognitive interviewing by Bloem and colleagues9 and ethnographic methods by Wyrwich

and colleagues76

have helped to validate this model by showing that these four domains of

appraisal substantially account for individuals’ introspective statements about their quality of

life.

It follows that any measurement of change in evaluative measures (and in certain

perceptual measures) would be contingent upon change in the underlying parameters of the

appraisal. To explain we revisit the example from section 1 of an individual rating his ability to

climb stairs without experiencing shortness of breath. If we re-evaluate that individual six months

later, he may climb the stairs in 30 seconds with little shortness of breath, but rates his difficulty

as high. This seemingly inconsistent measurement (indicating more difficulty when physically

exhibiting more ease with the situation) occurs because of a change in his subjective appraisal.

At the six-month follow up, perhaps the patient researched his condition and expected his

recovery to be faster, or the post-surgical pain he was previously experiencing is now gone, or

he has been working so hard in rehabilitation. All of these underlying reasons for his response

shift are perfectly understandable and valid; therefore the patient’s appraisal is part of his

experience of QOL. Expected changes

may be based on clinical judgment, a comparison with performance measures, or statistical

norms in a given sample. Accounting for appraisal parameters will essentially reduce the

measurement error (as estimated by the residuals).

It follows from the contingent true score model that response shifts in PROs induce bias

or measurement error only because in reality they are an un-specified aspect of the classical

model. In fact, evaluative outcomes necessarily require individuals to judge their experiences in

order to make their ratings. How individuals judge their experiences will largely determine their

response to items. Because appraisal processes are highly subject to change over time, we

contend that a measurement of “true change” in evaluative measures requires a direct

measurement of the appraisal. Put differently, change in many PROs of interest cannot be

adequately understood in terms of movement along a single latent variable, representing better

versus worse experiences. Rather, change in several of these PROs can be adequately

understood only in the context of the appraisal.

Sensitivity Changes Across the PRO Continuum

It is challenging to develop items to measure across the entire range of many PROs, and

instruments tend to be adequately sensitive for only subsets of the range of the construct rather

than for the entire range. Even the newly developed PROMIS instruments – the item banks of

which underwent extensive qualitative and quantitative development – tend to be more sensitive

and reliable at certain values for the construct.29,58,71

For example, the Pain Impact bank from the

PROMIS instruments has very little reliability at the lower end of the range. These qualitative

issues affect the measurement of PROs, and therefore affect the ability to longitudinally measure

PROs; instruments that are sensitive to a particular change at a particular level in the PRO may

not be sensitive to changes at other levels in the PRO.

Missing Data

Any researcher involved with longitudinal studies using PRO measures knows the

problems that arise in the analysis of change due to missing data. Quite often the data are missing

because the subject’s quality of life has dramatically declined, or the symptoms have become so

severe that they stopped reporting their data. Therefore, the missingness of the data depends on

the outcome being measured, and such missingness cannot be ignored. Statistically speaking

these are non-ignorable missing data, or missing not at random (MNAR). These data are

problematic because selection of the appropriate analytic models depends on untestable

assumptions.25,37,46

Most classical statistical models depend on the missing at random (MAR) assumption.

When the data are MAR, the probability of missingness depends only on the observed data

(Y_obs) and the covariates (X). Appropriate analytic models in this case include well-known and

widely available methods such as maximum likelihood estimation using mixed effects models.

When missing data are non-ignorable, the probability of missingness depends also on the

unobserved values. In such an instance, the use of analytic methods such as maximum

likelihood estimation may generate biased estimates.
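
The following Python simulation is a deliberately simplified sketch of why non-ignorable missingness matters. It uses the simplest possible estimator, the mean of the observed change scores, rather than a mixed effects model, and the dropout rule (worsening patients report only 30% of the time), the sample size, and the seed are all assumptions made for illustration.

```python
import random
from statistics import mean


def simulate_dropout(n=10000, seed=0):
    """Return (mean of all true changes, mean of the changes actually reported)."""
    rng = random.Random(seed)
    true_change = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # MNAR mechanism: whether a value is reported depends on the value itself.
    # Patients who worsened (negative change) report only 30% of the time.
    observed = [c for c in true_change if c >= 0 or rng.random() < 0.3]
    return mean(true_change), mean(observed)


true_mean, observed_mean = simulate_dropout()
print(true_mean, observed_mean)
```

In this sketch the observed mean overstates the true mean change because the missing values are systematically the poor outcomes, and nothing in the observed data alone reveals that mechanism.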

As most statistical methods assume that missing data are MAR, the typical approach for

non-ignorable missing data is to collect or determine ancillary data that capture the relationship

between missingness and the unobserved data. This ancillary information is generally other

measures of change or response (Y_anc) and not baseline covariates (X). Ancillary information

may include other measures of disease progression or response to treatment or concurrent

assessments by caregivers that continue when the subject cannot provide the assessment. The

assumption is that when the ancillary data are included in the analysis, the missing data are MAR

(that is, with the ancillary data, the missing data can be ignored).24,43 Methods that have been

considered for non-ignorable missing data are multiple imputation using ancillary data, pattern

mixture models, and shared parameter (joint) models.

A classical method for handling missing data is multiple imputation, which assumes that

the missing data are MAR (independent of the value of the missing scores given observed scores

(Y_obs) and covariates (X)). However, if the information used in the imputation procedure is

limited to the information that would be used in the ultimate analysis (i.e. no ancillary data), the

results of multiple imputation will be the same as if one estimated the longitudinal change using

standard longitudinal methods (such as mixed effects models). The use of multiple imputation

will result in improved estimates of change only if ancillary information (Y_anc) is incorporated

into the imputation, and the data are then deemed MAR conditional on the observed data,

covariates and the ancillary information that has been incorporated.24

Pattern mixture models assume that there are subgroups of patients defined by an

indicator for a pattern of missing data, and the data are MAR within the subgroup given the

model for change over time.37,46

The overall results are based on weighted averages of the

parameter estimates within these subgroups. The ancillary data enters into the process by

defining the subgroups. Although this method initially appears to be easy to implement, the

typical analysis requires that either patterns are pooled until there is sufficient information to

estimate all the parameters within the subgroup or assumptions are made to

extrapolate/interpolate across missing assessments.
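
The weighted-average step of a pattern mixture analysis can be sketched as follows. This shows only the final pooling of pattern-specific estimates, weighted here by the number of patients in each pattern (an assumed weighting); fitting the model within each pattern is not shown, and the pattern sizes and slope estimates are hypothetical.

```python
def pattern_mixture_estimate(patterns):
    """Pool pattern-specific estimates, weighting by pattern size.

    patterns: list of (number_of_patients, estimate) pairs, one per
    missing-data pattern.
    """
    total = sum(n for n, _ in patterns)
    return sum(n * estimate for n, estimate in patterns) / total


# Hypothetical slopes (PRO points per month) by dropout pattern:
# completers, dropout after the midpoint, early dropout.
patterns = [(60, -1.0), (30, -2.5), (10, -4.0)]
overall = pattern_mixture_estimate(patterns)
print(overall)
```

The pooled slope is steeper than the completers' slope alone, which is the kind of correction the method is intended to provide when dropout is related to decline.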

Joint (shared parameter) models simultaneously estimate the longitudinal trajectory of the

outcome of interest (Y_obs) and the ancillary information (Y_anc), usually linking the models

through the random effects.60,61 The assumption is that, conditional on the observed data (Y_obs

and Y_anc), the data are MAR.

5. NEW DEVELOPMENTS

New developments on the cutting edge of patient-reported outcome measurements can

address some of the previously mentioned pitfalls. New modeling techniques can enhance the

PRO measures and address other issues related to change, such as identifying an MID or MCID.

Assessing appraisal variables and accounting for their effects can increase the sensitivity to

detecting true change. Furthermore, taking advantage of technological advances to administer

PRO measures can overcome some missing data issues, and possibly facilitate standardization of

some of the appraisal variables.

The Quality of Life Appraisal Battery

A complete and adequate understanding of changes in PRO scores requires direct

assessment of changes of appraisal to account for response shifts. Toward this end, Rapkin and

Schwartz introduced a Quality of Life Appraisal Battery that can elicit multiple measures of each

of the four appraisal parameters.54

The Quality of Life Appraisal Battery includes separate

sections to assess each of the four components identified by Rapkin and Schwartz: persons’

frame of reference for considering quality of life, how persons sample experiences within that

frame, how persons evaluate experiences, and how persons summarize and combine evaluations

to describe HRQOL.54

Li and Rapkin42

have demonstrated that this Quality of Life Appraisal

Battery can successfully account for changes in quality of life measures that are due to changes

in appraisal. Therefore, a rich area of future research involves how to incorporate appraisal

information into PRO measurement. For purposes of this report, it will be useful to briefly

summarize the four sections of the battery: Frame of Reference, Sampling Experiences,

Standards of Comparisons, and the individual’s combinatory algorithm. Detailed information on

the scoring and psychometric properties of the Quality of Life Appraisal Battery is available

from the authors.

Frame of Reference: Rapkin introduced a measure to probe patients about their life

goals.53

The responses to this instrument facilitated construction of a component of the battery to

measure a person’s frame of reference for PRO assessment. Substantively, it captures the

individual’s status in broad areas of concern that are common to most people. It also

captures more particular concerns that nonetheless have an important influence on an

individual’s appraisal of a PRO.

Sampling Experiences: Rapkin and Schwartz asked participants to reflect on what they

were thinking about during a PRO measure that they had just completed. Five dimensions were

identified: Thinking about Implications of Illness and Impact on Others; Reacting to Recent

Flare Ups; Trying to Focus on Positive Experiences; Managing Communications in the

Interview; and Thinking of Things that were Raised by the Interview.

Standards of Comparison: The battery assessed individuals’ standard of comparison by

asking them to indicate the kinds of situations or people they were thinking about in rating

quality of life questions. Three components were identified: Comparing Oneself to Other

People; Social Norms and Expectations; and One’s Ideal.

Combinatory Algorithm: In order to determine how people formulated their responses

to HRQOL questions, patients were asked to consider a series of 16 items, presented as paired

contrasts. For example, we asked people to indicate whether they were thinking about “how well

you’ve been doing, how hard it has been, both or neither?” when they responded to HRQOL

questions. Principal components analysis of these items yielded seven summary dimensions,

including a focus on Negative Experiences and Feelings; Positive Feelings and Successes;

Change versus Stability; Unfinished versus Finished concerns; Others’ Concerns versus Your

Own; Recent versus Long Standing Concerns; and Independence versus Dependence on Others.

Multidimensional IRT

The advantages of item response theory (IRT) over classical test theory (CTT) in health

outcomes measurement have been demonstrated by a recent wave of applications.22

However, it

is evident that if response shift occurs, or if appraisal information is to somehow be included in

QOL or PRO measures, the construct is no longer unidimensional (that is, having one latent trait).

As mentioned earlier, in unidimensional IRT, a single latent trait is presumed to be the

underlying construct that the observed items are measuring. Multidimensional IRT relaxes this

restriction so that the observed items are influenced by multiple latent variables.23

In

longitudinal settings, multidimensional IRT models are particularly useful to explore the stability

of the measurement structure over time; specifically multidimensional IRT models can detect

response shift occurrences.

For example, consider a PRO construct such as depression that is being assessed at two

time points (perhaps before and after treatment) for a group of respondents. For a variety of the

reasons mentioned above, the relationship between the items and the latent construct may change

over time. From a psychometric perspective, this is a classical longitudinal measurement

invariance issue. To model and test invariance, we require an IRT equivalent of the longitudinal

factor analysis model.72

In this model, the unidimensional constructs at the two time points are

represented as two correlated factors, each measured by the time-specific item responses. The

fact that the same items are used twice leads to residual dependence that can be modeled by

including additional latent variables that are defined by item doublets across time.

Figure 1: Path diagram for multidimensional longitudinal IRT model

Figure 1 uses path diagramming conventions to show the general structure of this model

for three items and two occasions. The observed items are represented as rectangles and the

latent variables are circles. A longitudinal multidimensional IRT model was considered, for

example, by Hill (2006) for dichotomous responses, and by te Marvelde, Glas, & van Damme

(2006), with some restrictions, for ordinal responses.36,70

These multidimensional IRT models

can be used to test a variety of longitudinal measurement invariance hypotheses by manipulating

the constraints on the item parameters. Under appropriate circumstances, this model allows the

researcher to tease apart the observed change into two components: change due to response shift

and true change in the level and variability of the latent construct.
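
One way to write such a model for dichotomous items is a two-parameter logistic form with an occasion-specific factor and an item-specific factor. This is offered only as a sketch in assumed notation, not as the parameterization used in the cited papers.

```latex
\begin{aligned}
\operatorname{logit} P(y_{ijt} = 1) &= a_{jt}\,\theta_{it} + s_{j}\,u_{ij} + c_{jt},
\qquad t = 1, 2, \\
(\theta_{i1}, \theta_{i2})^\top &\sim N(\boldsymbol{\mu}, \boldsymbol{\Phi}),
\qquad u_{ij} \sim N(0, 1) \ \text{independently}.
\end{aligned}
```

Here \theta_{it} is the construct for person i at occasion t, u_{ij} is the latent variable defined by the doublet of item j across the two occasions, and a_{jt}, s_{j}, and c_{jt} are item slopes and intercepts, typically with the first-occasion mean and variance fixed for identification. Under this sketch, measurement invariance corresponds to a_{j1} = a_{j2} and c_{j1} = c_{j2} for every item; freeing those constraints for some items allows shifts in item parameters to be separated from change in \boldsymbol{\mu} and \boldsymbol{\Phi}.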

However, in the above scenario, response shift is still seen as a type of nuisance, and not

as an intrinsic part of measurement for PROs. A strong possibility for future research is to

incorporate appraisal information into multidimensional IRT models to control for and limit the

effects of response shift. This could enhance longitudinal measurement of PROs and facilitate

the understanding of true change. Other possible modeling techniques to incorporate appraisal

information and adjust for any effects from the appraisal variables may exist as well.

Incorporating the appraisal information as part of a PRO assessment is a challenging but

important task.

While the basic principles of multidimensional IRT are straightforward, their application

has been hampered by challenging computational issues in parameter estimation. For instance,

in the model depicted above, the number of latent variables is equal to the number of time-points

(two) plus the number of items (three). Fitting a 5-dimensional IRT model to data requires a

non-trivial amount of computation. In particular, obtaining maximum likelihood estimates of the

model parameters requires the numerical approximation of high-dimensional integrals, which

leads to an exponentially increasing computational burden. Recent computational advances in

adaptive quadrature,59

Markov chain Monte Carlo methods,6 and stochastic approximation13,14

are poised to resolve the computational challenges and make multidimensional IRT practical for

health outcomes research.
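
The exponential growth in computation can be illustrated with a short Python sketch that counts the evaluation points of a fixed (non-adaptive) product quadrature grid. The choice of 21 points per dimension is an assumption made only for illustration.

```python
def quadrature_grid_size(points_per_dimension, n_dimensions):
    """Number of evaluation points in a full product quadrature grid."""
    return points_per_dimension ** n_dimensions


# Grid sizes for 1 through 5 latent dimensions with 21 points each.
sizes = {d: quadrature_grid_size(21, d) for d in range(1, 6)}
for dimensions, size in sizes.items():
    print(dimensions, size)
```

With these assumed settings, the five-dimensional model above would require over four million evaluation points per response pattern in each iteration, compared with 21 for a unidimensional model.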

Emerging Methods for Defining MIDs

New methods have been developed to define MID or MCID values. Emerging methods

include approaches incorporating clinical or patient-based judgment. For example, one approach

requires that a panel of healthcare professionals determine a “consensus” value for MID based on

a series of appraisals using their clinical experience and standard-setting techniques.34

This

approach can also be applied using patient panelists as the experts on, for example, the amount of

change needed to impact daily activities or health-related quality of life. (Patient queries of this

nature can be included during the cognitive interview stage of instrument development.) Another

approach combines clinical judgment and ROC curves to identify the unit of change on the PRO

that best predicts clinical judgment of the minimal important change.73

Finally, a more recent

idea, still in the early stages of development, uses item response theory to compute change in the

PRO measure on the standardized latent construct scale and builds upon the anchor-based

methods by identifying the minimal value in terms of the minimally identified change on the

latent construct or theta.45
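
The ROC-based idea mentioned above can be sketched in Python as a search for the change-score cutoff that best separates patients judged to have improved from those judged not to have improved. Maximizing Youden's J (sensitivity + specificity - 1) is an assumed criterion chosen for illustration, not necessarily the one used in the cited work, and the data are hypothetical.

```python
def best_cutoff(changes, improved):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    changes: PRO change scores; improved: 1 if the clinical judgment was
    'minimally important improvement', else 0. A patient is classified as
    improved when the change score is at least the cutoff.
    """
    positives = sum(improved)
    negatives = len(improved) - positives
    best = None
    for cut in sorted(set(changes)):
        true_pos = sum(1 for c, y in zip(changes, improved) if y and c >= cut)
        true_neg = sum(1 for c, y in zip(changes, improved) if not y and c < cut)
        sensitivity = true_pos / positives
        specificity = true_neg / negatives
        j = sensitivity + specificity - 1
        if best is None or j > best[0]:
            best = (j, cut, sensitivity, specificity)
    return best[1:]


# Hypothetical change scores and clinician judgments for eight patients.
changes = [0, 1, 2, 3, 4, 5, 6, 7]
improved = [0, 0, 1, 0, 1, 1, 1, 1]
cutoff, sens, spec = best_cutoff(changes, improved)
print(cutoff, sens, spec)
```

The selected cutoff would then serve as one candidate MID to be compared against estimates from the other methods.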

The Role of Gadgets

Interactive Voice Response: In addition to methodological developments that can

enhance measuring and detecting change in PRO scores, new technological developments can

also facilitate the measurement of change in PROs. Collecting PRO data such as symptom data

from patients is an ongoing challenge for both clinicians and researchers. A major problem is

tracking PROs in general and symptoms in particular between clinic visits when patients are at

home and may be reluctant to report that symptoms are getting worse. Given that memory of

pain and other symptoms is often poor, the real time assessment of symptoms can provide

accurate data regarding symptom patterns and changes over time.

Developments in computer and communications technology offer new opportunities for

the assessment of PROs, allowing great flexibility in the collection of data regardless of time or

venue. However, not all patients are comfortable using hand-held computers or other small

automated devices. In addition, patients have to remember to carry and use the devices every day

and to bring or transmit the symptom data to their health care providers.

The development of telephone interactive voice response (IVR) technology provides an

exciting option for two-way communication with the provider that is acceptable to most patients.

Telephone systems have been widely used in outpatient health care settings for communicating

with patients, identifying symptoms that need medical attention, and following patients after

treatment.41

However, traditional telephone communication requires considerable staff time and

is not feasible for assessing PROs on a regular basis. Using IVR technology that combines touch

tone telephones with computers and the internet is an effective way to follow patients who have

PROs such as pain that need to be monitored closely while away from the clinic or hospital. A

patient can respond to spoken instructions by using the keypad of a touch tone phone. For

example, a patient might be asked to rate his/her pain at its worst in the past day from 0 (no pain)

to 10 (pain as bad as can be imagined). Information obtained in this way can be used to update a

patient file on an Internet or Intranet site. In addition, the system can be configured to alert

physicians and other health care providers as necessary.

IVR systems have been shown to be successful tools for assessment and for clinical

interventions. Studies show success assessing depression40

and also facilitating the treatment of

other chronic health problems.48,52

The computer versions of scales provide data that are

equivalent to interview scales or self-administered scales.1 Patients have been able to accurately

and reliably use an IVR system designed to assess the special needs of cancer patients receiving

chemotherapy.65

In summary, an IVR system provides reliable and valid data. Accurate and regular

symptom assessment, with data provided to the patients’ physicians, facilitates symptom

management and as a consequence results in the patients’ accepting its use and being motivated

to respond. When coupled with the flexibility of the system for data collection, the IVR system

should also minimize missing data, especially in longitudinal studies.

Computerized Testing: The widespread use of computers in clinics, and the

development of portable computing systems have introduced computerized testing for PROs.

These developments allow us to administer PRO instruments on a local computer or over the

Internet, and have also allowed us to develop and employ adaptive versions of PRO

instruments, known as computerized adaptive testing instruments (CATs).3,7,8,16,18,19,27,28,56,62

CATs reduce the respondent burden of questionnaires and maintain the precision of the

PRO score because they adaptively select items targeted for the individual’s current experience.

CATs work because they are associated with large item banks that cover a range of the PRO of

interest. Modern item response theory models are used to characterize the items in the bank and

mathematical algorithms dictate how to select items from the bank based on these models. For

example, a person experiencing no depressive symptoms might be administered an item such as

"I feel sad" but then would likely not be administered a more extreme item such as "I feel so sad

I have thought of harming myself." Furthermore, the patient interface is flexible; for example, a

CAT can be administered using an IVR system. By reducing respondent burden, CATs have the

potential to increase adherence and reduce missing data.
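
The adaptive selection described above can be sketched as follows, assuming a two-parameter logistic (2PL) model with yes/no items, maximum-information item selection, and an expected a posteriori (EAP) score under a standard normal prior. The item parameters, the item "I feel hopeless", and all function names are hypothetical; operational PRO item banks typically use polytomous items and calibrated parameters.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical item bank: item text -> (discrimination a, severity b).
ITEM_BANK: Dict[str, Tuple[float, float]] = {
    "I feel sad": (1.8, -0.5),
    "I feel hopeless": (2.0, 1.0),
    "I feel so sad I have thought of harming myself": (2.2, 2.5),
}


def prob_endorse(theta: float, a: float, b: float) -> float:
    """2PL probability that a person at level theta endorses the item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at theta."""
    p = prob_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


def select_next_item(theta: float, administered: Set[str]) -> str:
    """Pick the unused item that is most informative at the current estimate."""
    candidates = [name for name in ITEM_BANK if name not in administered]
    return max(candidates, key=lambda name: item_information(theta, *ITEM_BANK[name]))


def eap_estimate(responses: List[Tuple[str, int]]) -> float:
    """EAP estimate of theta on a grid, with a standard normal prior."""
    grid = [g / 10.0 for g in range(-40, 41)]
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)  # unnormalized prior density
        for name, y in responses:
            p = prob_endorse(theta, *ITEM_BANK[name])
            w *= p if y == 1 else 1.0 - p
        weights.append(w)
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)
```

With these illustrative parameters, a respondent with a low provisional score is first given "I feel sad"; after a "no" response the estimate drops further, and the most extreme item is the least informative candidate, so it is not selected next.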

Creative Item Types: Computer-based administration of questionnaires and tests

supports the use of creative multimedia formats such as video and electronic graphics and games.

Although still relatively untapped for the assessment of PROs, such creative formats are in use as

teaching aids, and are being introduced into computer-based educational assessments.3,17

An

early study involving health states found that subjects better understood the conditions being

described when multimedia formats were used rather than traditional paper-and-pencil formats.30

As a result of that study, a computer-based program to elicit preferences for a particular PRO

(activities in daily living) has been developed. The program uses videos, pictures and other

multimedia presentations via computer to familiarize respondents with health states that may be

unfamiliar to them. That program has found some success, but it does not measure

PROs directly.31,66

Using multimedia formats in an attempt to anchor PRO assessment in a

similar way may have similar results, reducing some response shift and other phenomena that

interfere with longitudinal measurement of change for PROs.

6. FUTURE DIRECTIONS

Identifying true change in PROs is challenging because measuring PROs is intrinsically

difficult. The process of self-report is complex and there are many factors such as the subjective

appraisal variables mentioned above that can influence self-report scores.54,63,69

Furthermore,

some of the PRO constructs, such as quality of life, are also highly complex and, to some extent,

subjective, depending on an individual’s values and experiences.

Although substantial advancement has been made to create reliable, valid and responsive

instruments for measuring PROs, determining how to best measure PROs is a process that is still

under development. This is natural; all measures develop. Historically, even well-defined

physical measures such as length had their developmental periods. For example, what is now

the standard "foot" in the US and English measurement systems was once a variable and uncertain

measure. In ancient times, measures of length were based on body parts;12

the traditional belief

being that a "foot" of length derived from humans measuring distance by the equivalent number

of their own feet that would cover the length in question. Of course, individual feet vary in

length, and traditional accounts relate that, as a first attempt to standardize the measure of length

in medieval times, the length of the King’s foot was declared the standard "foot." However, this

was still problematic because the measure changed with every new king. Finally, a unified,

non-changing standard was applied and we have the foot defined as it is today.

The historical accuracy of the details of this process is questionable; nevertheless there

are documented cases in which the definition of a measure changed as it progressed to what we

understand it to be today. As another example, an early version of a "mile" measured 5000 feet

rather than the current standard of 5280 feet.

The same process of working toward a unified standard is underway among researchers

who assess PROs. Several measures for PROs exist that are widely used, valid and reliable. For

example, measures used to assess quality of life (QOL) include the Functional Assessment of

Chronic Illness Therapy (FACIT) suite of measures (http://www.facit.org) and general health outcome

measures such as the short-form 36 health survey questionnaire (SF-36).75

Furthermore, new

measures are still being developed, such as the Patient-Reported Outcomes Measurement

Information System (PROMIS) tools (http://www.nihpromis.org). In many cases there is

substantial overlap between the measures (i.e., more than one measure targets a specific PRO

such as depression). However, with the recent efforts of the PROMIS initiative and heavy

reliance on modern IRT methods, researchers are attempting large-scale standardization across

the different PRO measurement scales. "Crosswalks" and other methods of transforming one

instrument’s scale onto the PROMIS measurement scales are currently being developed.15

Most PRO measures do not yet account for the appraisal factors that continue to affect

how people report their experience. Developing measures to incorporate appraisal parameters

can greatly reduce the obfuscating effect of a response shift and concomitantly increase

measurement precision. However, despite the conceptual advantages offered by the appraisal

model, incorporating appraisal data presents problems. Thus far, approaches to assess appraisal

involve qualitative data that require considerable coding to use in a quantitative analysis. Indeed,

Bloem suggests that appraisal parameters are best assessed for each individual item. Although

reliable, coding schemes have yielded dozens of variables which complicate the analysis. A

more parsimonious system to describe appraisal parameters or the identification of overarching

response states would be most valuable.

Further, it is as yet unclear how to reconcile these different approaches to the

measurement of change in PROs. Appraisal parameters necessarily come into play when

individuals respond to items. Item evaluation and selection according to item response theory

assumptions must then account for and reflect appraisal processes. It may be that current IRT

item selection in some way constrains appraisal; for example, perhaps only the items that invoke

a consistent frame of reference across people demonstrate the necessary properties for inclusion.

If so, it is necessary to ask what frame of reference we are unintentionally imposing by excluding

conceptually-relevant items with unacceptable item performance curves or high differential item

functioning across diverse groups.

A second possibility is that the evaluation of items may be relatively independent of

appraisal. For example, at a given time of measurement, individuals may respond to items such

as "I feel healthy," "I am physically fit," and "I am able to perform all of my activities without

difficulty" using a common (enough) set of appraisal criteria that these items will almost always

correlate highly with one another. That would likely be sufficient for such items to hang

together in IRT analysis to jointly estimate latent traits and item characteristics. However, the

fact that these items tend to co-vary because responses reflect similar underlying appraisal

parameters may not mean that the appraisal is ignorable at all times for these items. Different

individuals may still appraise these items differently, and an individual’s appraisal may still

change from time to time. The score on the latent variable may provide an accurate estimate of

the individual’s level on a PRO variable, but it provides no information concerning the different

cognitive pathways the individual followed at each assessment to make his or her ratings.

In order to capture true change in PROs, it will be necessary to carry out research that

brings together concepts from IRT and the contingent true score model. For example, as

previously mentioned, it may be that appraisal parameters can be more adequately captured in

models that permit multidimensional latent variables, such as with multidimensional IRT

models. Removing ambiguous or volatile items identified through IRT may help to reduce

response shifts. Furthermore, using IRT and related computer adaptive testing approaches to

assess appraisal itself may be useful in better quantifying and minimizing the burden associated

with the assessment of appraisal parameters. Incorporating appraisal information into a purely

psychometric treatment of items in IRT might guide decisions about item selection and about the

underlying dimensionality of the latent traits.

Another possible approach might be to adapt PRO measures to include new items or new

collection techniques that can anchor some of the important appraisal variables. Perhaps in

specific settings such as during a chemotherapy treatment regimen, there are ways to develop

items that fix the appraisal variables at specific levels such that they remain consistent for

patients across the treatment period.

In addition to developing methods and tools to control and account for response shift,

additional work in interpreting change values is required. Despite notable developments in our

understanding of minimal important change, MID/MCID, and responsiveness, there are many

issues that require further investigation. First, in current practice, units that have been defined at

the patient level are used to make both patient-level and group-level judgments. However,

smaller group-level changes or smaller differences in means between groups may equate to

similar treatment impacts as indicated by patient-level classifications based on an MID. Using

Fisher’s exact test to compare the classifications of patients into groups who "achieved MID"

versus those who "did not achieve MID" by treatment is not equivalent to testing whether the

means of the treatment groups are statistically different and the amount of difference is at least

one MID unit.
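
The distinction drawn above can be illustrated with a small sketch that runs both analyses on the same change scores. The data, the MID of 5 points, and the function names are hypothetical; the mean comparison is shown only as a point estimate set against the MID, with the accompanying significance test omitted for brevity.

```python
import math
from statistics import mean
from typing import List, Tuple


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = math.comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x responders in the first row.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def classify(changes: List[float], mid: float) -> Tuple[int, int]:
    """Count patients who achieved the MID and those who did not."""
    achieved = sum(1 for c in changes if c >= mid)
    return achieved, len(changes) - achieved


def compare(treated: List[float], control: List[float], mid: float):
    """Return (responder p-value, difference in mean change, difference >= MID)."""
    a, b = classify(treated, mid)
    c, d = classify(control, mid)
    diff = mean(treated) - mean(control)
    return fisher_exact_two_sided(a, b, c, d), diff, diff >= mid


# Hypothetical change scores: every treated patient improves by 6 points
# and every control patient by 4, with an assumed MID of 5.
p_value, diff, meets_mid = compare([6.0] * 8, [4.0] * 8, mid=5.0)
```

In this constructed example the responder analysis separates the groups sharply (all treated patients are classified as achieving the MID and no controls are), yet the difference in mean change is 2 points, well below one MID. The two analyses therefore answer different questions.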

Second, most anchor-based approaches use only a portion of the available data (e.g., data

for patients who respond "slightly better" or "slightly better/slightly worse" on an anchor). Data

from this sometimes very small subset are used to estimate the MID, while the remaining data

are ignored or simply checked to ensure that the pattern of change based on response to the

anchor matches the expected change on the PRO scale. Future research could identify ways to

incorporate all the data from anchor-based methods to define MIDs or ways of interpreting

change.

Third, MIDs or CMIDs are applied uniformly along the entire score range. This may not

be reasonable for all settings. Under certain circumstances it may be reasonable that MIDs vary

according to disease severity or some other characteristic—minimal important changes may

depend on where a patient begins on the concept measured (similar to an ANCOVA-type

argument). In addition, the measurement precision of the instrument should also be taken into

account when defining an MID. For example, an MID should not be identified and then applied

at the patient level if it is less than the standard error of measurement.
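
The check described above can be sketched as follows, assuming the classical test theory definition of the standard error of measurement, SEM = SD × sqrt(1 − reliability). The function names and the numeric values in the usage note are hypothetical.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical SEM: the score SD times sqrt(1 - reliability)."""
    if sd < 0 or not 0.0 <= reliability <= 1.0:
        raise ValueError("sd must be non-negative and reliability must lie in [0, 1]")
    return sd * math.sqrt(1.0 - reliability)


def mid_usable_at_patient_level(mid: float, sd: float, reliability: float) -> bool:
    """True when the candidate MID is at least as large as the SEM."""
    return mid >= standard_error_of_measurement(sd, reliability)
```

For an instrument with a score SD of 10 and a reliability of 0.91, the SEM is about 3 points, so a candidate MID of 2 points would fail this check while an MID of 4 points would pass.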

Finally, very little work has been done related to the contingent true score model and the

effect adopting this model has on developing MID estimates. How to define or interpret MID

under the contingent true score model, how this can influence a response shift, and the effect a

response shift might or might not have on an MID developed using the contingent true score

model are still open areas of research.

Much progress has been made in measuring and analyzing PROs longitudinally, and new

cutting-edge methods are being developed; nevertheless, there is plenty of room for

improvement. Interestingly, the problems of non-ignorable missing data and response shift have

some underlying similarities. Both are situations in which ancillary data improve the likelihood

that the estimation of change is closer to the true change. The challenge in both areas is to

identify the ancillary variables through research and to prospectively incorporate them into

clinical investigations. Rich areas of future research involve continued investigation of appraisal

variables, new methods to identify MID or CMID values, ways to determine the responsiveness

of PRO measures, methods incorporating ancillary data to improve estimates in the presence of

non-ignorable missing data, and ways to reduce missing data.

References

1. Agel, J., Rockwood, T., Mundt, J. C., Greist, J. H., & Swiontkowski, M. (2001). Comparison of interactive voice response and written self-administered patient surveys for clinical research. Orthopedics, 24(12), 1155-1157.

2. Ahmed, S., & Schwartz, C. E. (2009 in press). Quality of life assessments and response shift. In A. Steptoe (Ed.), Handbook of Behavioral Medicine: Methods and Applications. New York, NY: Springer Science.

3. Basu, A., Cheng, I., Prasad, M., & Rao, G. (2007). Multimedia Adaptive Computer based Testing: An Overview. Paper presented at the IEEE International Conference on Multimedia and Expo (ICME).

4. Bauer, D. J. (2003). Estimating multilevel linear models as structural equation models. Journal of Educational and Behavioral Statistics, 28, 135-167.

5. Beaton, D. E., Bombardier, C., Katz, J. N., & Wright, J. G. (2001). A taxonomy for responsiveness. Journal of Clinical Epidemiology, 54(12), 1204-1217.

6. Béguin, A. A., & Glas, C. A. W. (2001). MCMC estimation and some model-fit analysis of multidimensional IRT models. Psychometrika, 66, 541-561.

7. Bergstrom, B. A. (1996). Computerized adaptive testing for the national certification examination. AANA Journal, 64(2), 119-124.

8. Bjorner, J. B., Chang, C. H., Thissen, D., & Reeve, B. B. (2007). Developing tailored instruments: item banking and computerized adaptive assessment. Quality of Life Research, 16 Suppl 1, 95-108.

9. Bloem, E. F., van Zuuren, F. J., Koeneman, M. A., Rapkin, B. D., Visser, M. R. M., Koning, C. C. E., et al. (2008). Clarifying quality of life assessment: do theoretical models capture the underlying cognitive processes? Quality of life research : an international journal of quality of life aspects of treatment, care and rehabilitation, 17(8), 1093-1102.

10. Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.

11. Bollen, K. A., & Curran, P. J. (2006). Latent curve models: A structural equation approach. Hoboken, NJ: John Wiley & Sons.

12. Brief History of Measurement Systems with a Chart of the Modernized Metric System (1974). NBS Special Publication 304A.

13. Cai, L. (in press a). High-dimensional exploratory item factor analysis by a Metropolis-Hastings Robbins-Monro algorithm. Psychometrika.

14. Cai, L. (in press b). Metropolis-Hastings Robbins-Monro algorithm for confirmatory item factor analysis. Journal of Educational and Behavioral Statistics.

15. Cella, D., Lai, J. S., Garcia, S. F., Reeve, B. B., Weinfurt, K. P., George, J., et al. (2008). The patient reported outcomes measurement information system-Cancer (PROMIS-Ca): Cancer-specific application of a generic fatigue measure. Journal of Clinical Oncology, 26(15S), 6537.

16. Chang, C.-H. (2007). Patient-reported outcomes measurement and management with innovative methodologies and technologies. Quality of Life Research, 16, 157-166.

17. Cheng, I., & Bischof, W. F. (2007). Multimedia Item Type Design for Assessing Human Cognitive Skills. Paper presented at the IEEE International Conference on Multimedia and Expo (ICME).

18. Chien, T. W., Wu, H. M., Wang, W. C., Castillo, R. V., & Chou, W. (2009). Reduction in patient burdens with graphical computerized adaptive testing on the ADL scale: Tool development and simulation. Health and Quality of Life Outcomes, 7, 1-6. doi:10.1186/1477-7525-7-39

19. Choi, S. W., & Swartz, R. J. (2009). Comparison of CAT item selection criteria for polytomous items. Applied Psychological Measurement, 33(6), 419-440.

20. Cronbach, L. J., & Furby, L. (1970). How we should measure "change" - or should we? Psychological Bulletin, 74(1), 68-80.

21. Demidenko, E. (2004). Mixed models: Theory and applications. Hoboken, NJ: John Wiley & Sons.

22. Edelen, M. O., & Reeve, B. B. (2007). Applying item response theory (IRT) modeling to questionnaire development, evaluation, and refinement. Quality of Life Research, 16 Suppl 1, 5-18.

23. Edwards, M. C., & Edelen, M. O. (2009). Special topics in item response theory. In R. E. Millsap & A. Maydeu-Olivares (Eds.), Handbook of quantitative methods in psychology (pp. 178-198). New York, NY: Sage.

24. Fairclough, D. L. (2010). Design and analysis of quality of life studies in clinical trials : interdisciplinary statistics (2nd ed.). Boca Raton, FL: Chapman & Hall/CRC Press.

25. Fairclough, D. L., Thijs, H., Huang, I. C., Finnern, H. W., & Wu, A. W. (2008). Handling missing quality of life data in HIV clinical trials: what is practical? Quality of life research : an international journal of quality of life aspects of treatment, care and rehabilitation, 17(1), 61-73.

26. Fayers, P. M., & Hays, R. D. (2005). Assessing quality of life in clinical trials : methods and practice (2nd ed.). Oxford: Oxford University Press.

27. Fields, F. A. (1992). Computerized adaptive testing for NCLEX-PN. J Pract Nurs, 42(2), 8-10.

28. Fliege, H., Becker, J., Walter, O. B., Bjorner, J. B., Klapp, B. F., & Rose, M. (2005). Development of a computer-adaptive test for depression (D-CAT). Quality of Life Research, 14(10), 2277-2291.

29. Garcia, S. F., Cella, D., Clauser, S. B., Flynn, K. E., Lad, T., Lai, J. S., et al. (2007). Standardizing patient-reported outcomes assessment in cancer clinical trials: a patient-reported outcomes measurement information system initiative. [Erratum appears in J Clin Oncol. 2008 Feb 20;26(6):1018. Note: Lad, Thomas [added]]. Journal of Clinical Oncology, 25(32), 5106-5112.

30. Goldstein, M. K., Clarke, A. E., Michelson, D., Garber, A. M., Bergen, M. R., & Lenert, L. A. (1994). Developing and Testing a Multimedia Presentation of a Health-State Description. Medical Decision Making, 14(4), 336-344.

31. Goldstein, M. K., Miller, D. E., Davies, S., & Garber, A. M. (2002). Quality of Life Assessment Software for Computer-Inexperienced Older Adults: Multimedia Utility Elicitation for Activities of Daily Living. Paper presented at the American Medical Informatics Association Annual Symposia.

32. Guyatt, G. H., Juniper, E. F., Walter, S. D., Griffith, L. E., & Goldstein, R. S. (1998). Interpreting treatment effects in randomised trials. British Medical Journal, 316, 690-693.

33. Guyatt, G. H., Walter, S. D., & Norman, G. R. (1987). Measuring change over time: Assessing the usefulness of evaluative instruments. Journal of Chronic Diseases, 40, 171-178.

34. Harding, G., Leidy, N. K., Meddis, D., Kleinman, L., Wagner, S., & O'Brien, C. (2009). Interpreting clinical trial results of patient-perceived onset of effect in asthma: methods and results of a Delphi panel. Current Medical Research and Opinion, 25(6), 1563-1571.

35. Hedeker, D. R., & Gibbons, R. D. (2006). Longitudinal data analysis. Hoboken, NJ: John Wiley & Sons.

36. Hill, C. D. (2006). Two models for longitudinal item response data. Unpublished doctoral dissertation, University of North Carolina - Chapel Hill.

37. Hogan, J. W., Roy, J., & Korkontzelou, C. (2004). Handling drop-out in longitudinal studies. Statistics in Medicine, 23(9), 1455-1497.

38. Jaeschke, R., Singer, J. D., & Guyatt, G. H. (1989). Measurement of health status: Ascertaining the minimal clinically important difference. Controlled Clinical Trials, 10, 407-415.

39. Juniper, E. F., Guyatt, G. H., Willan, A., & Griffith, L. E. (1994). Determining a minimal important change in a disease-specific quality of life questionnaire. Journal of Clinical Epidemiology, 47(1), 81-87.

40. Kobak, K. A., Greist, J. H., Jefferson, J. W., Mundt, J. C., & Katzelnick, D. J. (1999). Computerized assessment of depression and anxiety over the telephone using interactive voice response. MD Computing, 16(3), 64-68.

41. Korcz, I. R., & Moreland, S. (1998). Telephone prescreening enhancing a model for proactive healthcare practice. Cancer Practice, 6(5), 270-275.

42. Li, Y., & Rapkin, B. (2009). Classification and regression tree uncovered hierarchy of psychosocial determinants underlying quality-of-life response shift in HIV/AIDS. Journal of clinical epidemiology, 62(11), 1138-1147.

43. Little, R. J., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). New York, NY: Wiley.

44. MacCallum, R. C., Kim, C., Malarkey, W. B., & Kiecolt-Glaser, J. K. (1997). Studying multivariate change using multilevel models and latent curve models. Multivariate Behavioral Research, 32, 215-253.

45. McLeod, L. D., Nelson, L., & Lewis, C. Personal communication, July 1, 2009.

46. Michiels, B., Molenberghs, G., Bijnens, L., Vangeneugden, T., & Thijs, H. (2002). Selection models and pattern-mixture models to analyse longitudinal quality of life data subject to drop-out. Statistics in Medicine, 21(8), 1023-1041.

47. Muller, K. E., & Stewart, P. W. (2006). Linear model theory: Univariate, multivariate, and mixed models. Hoboken, NJ: John Wiley & Sons.

48. Mundt, J. C., Kaplan, D. A., & Greist, J. H. (2001). Meeting the need for public education about dementia. Alzheimer Disease & Associated Disorders, 15(1), 26-30.

49. Norman, G. R., Sloan, J., & Wyrwich, K. W. (2003). Interpretation of changes in health-related quality-of-life. Medical Care, 4, 582-592.

50. Norman, G. R., Sridhar, F. G., Guyatt, G. H., & Walter, S. D. (2001). Relation of distribution- and anchor-based approaches in interpretation of changes in health related quality of life. Medical Care, 29, 1039-1047.

51. Norman, G. R., Wyrwich, K. W., & Patrick, D. L. (2007). The mathematical relationship among different forms of responsiveness coefficients. Quality of life research: an international journal of quality of life aspects of treatment, care and rehabilitation, 16(5), 815-822.

52. Piette, J. D. (2000). Interactive voice response systems in the diagnosis and management of chronic disease. American Journal of Managed Care, 6(7), 817-827.

53. Rapkin, B. D. (2000). Personal goals and response shifts: Understanding the impact of illness and events on the quality of life of people living with AIDS. In C. E. Schwartz & M. A. G. Sprangers (Eds.), Adaptation to changing health: Response shift in quality-of-life research (1st ed.). Washington, DC: American Psychological Association.

54. Rapkin, B. D., & Schwartz, C. E. (2004). Toward a theoretical model of quality-of-life appraisal: Implications of findings from studies of response shift. Health and Quality of Life Outcomes, 2, 14.

55. Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage.

56. Reeve, B. B., Burke, L. B., Chiang, Y. P., Clauser, S. B., Colpe, L. J., Elias, J. W., et al. (2007). Enhancing measurement in health outcomes research supported by Agencies within the US Department of Health and Human Services. Quality of Life Research, 16 Suppl 1, 175-186.

57. Revicki, D., Hays, R. D., Cella, D., & Sloan, J. (2008). Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes. Journal of Clinical Epidemiology, 61(2), 102-109.

58. Rose, M., Bjorner, J. B., Becker, J., Fries, J. F., & Ware, J. E. (2008). Evaluation of a preliminary physical function item bank supported the expected advantages of the Patient-Reported Outcomes Measurement Information System (PROMIS). J Clin Epidemiol, 61(1), 17-33.

59. Schilling, S., & Bock, R. D. (2005). High-dimensional maximum marginal likelihood item factor analysis by adaptive quadrature. Psychometrika, 70, 533-555.

60. Schluchter, M. D. (1992). Methods for the analysis of informatively censored longitudinal data. Statistics in Medicine, 11(14-15), 1861-1870.

61. Schluchter, M. D., Greene, T., & Beck, G. J. (2001). Analysis of change in the presence of informative censoring: application to a longitudinal clinical trial of progressive renal disease. Statistics in Medicine, 20(7), 989-1007.

62. Schmitz, N., Hartkamp, N., Brinschwitz, C., & Michalek, S. (1999). Computerized administration of the Symptom Checklist (SCL-90-R) and the Inventory of Interpersonal Problems (IIP-C) in psychosomatic outpatients. Psychiatry Research, 87(2-3), 217-221.

63. Schwartz, C. E., & Rapkin, B. D. (2004). Reconsidering the psychometrics of quality of life assessment in light of response shift and appraisal. Health and Quality of Life Outcomes, 2, 16.

64. Schwartz, C. E., & Sprangers, M. A. (1999). Methodological approaches for assessing response shift in longitudinal health-related quality-of-life research. Social science & medicine (1982), 48(11), 1531-1548.

65. Siegel, K., Mesagno, F. P., Chen, J. Y., Klein, L., Bowles, M. E., McKenna, M., et al. (1988). Computerized telephone assessment of the "concrete" needs of chemotherapy outpatients: a feasibility study. Journal of Clinical Oncology, 6(11), 1760-1767.

66. Sims, T. L., Garber, A. M., Miller, D. E., Mahlow, P. T., Bravata, D. M., & Goldstein, M. K. (2005). Multimedia Quality of Life Assessment: Advances with FLAIR. Paper presented at the American Medical Informatics Association Annual Symposia.

67. Sloan, J., Cella, D., & Hays, R. D. (2005). Clinical significance of patient-reported questionnaire data: another step towards consensus. Journal of Clinical Epidemiology, 58, 1217-1219.

68. Sprangers, M. A., & Schwartz, C. E. (1999). Integrating response shift into health-related quality of life research: a theoretical model. Social science & medicine (1982), 48(11), 1507-1515.

69. Stone, A. A., Turkkan, J. S., Bachrach, C. A., Jobe, J. B., Kurtzman, H. S., & Cain, V. S. (Eds.). (2000). The Science of Self-Report: Implications for Research and Practice. Mahwah, NJ: Lawrence Erlbaum.

70. te Marvelde, J., Glas, C. A. W., Landeghem, G., & van Damme, J. (2006). Application of multidimensional item response theory models to longitudinal data. Educational and Psychological Measurement, 66(1), 5-34.

71. Thissen, D., Reeve, B. B., Bjorner, J. B., & Chang, C. H. (2007). Methodological issues for building item banks and computerized adaptive scales. Qual Life Res, 16 Suppl 1, 109-119.

72. Tisak, J., & Meredith, W. (1989). Exploratory longitudinal factor analysis in multiple populations. Psychometrika, 54, 261-281.

73. Turner, D., H.J., S., Griffith, L. E., Beaton, D. E., M., G. A., Critch, J. N., et al. (2009). Using the entire cohort in the receiver operating characteristic analysis maximizes precision of the minimal important difference. Journal of Clinical Epidemiology, 62(4), 374-379.

74. US Department of Health and Human Services. Food and Drug Administration (2006). Guidance for industry patient-reported outcome measures: use in medical product development to support labeling claims draft guidance. Retrieved February 12, 2009, from http://www.fda.gov/cder/guidance/5460dft.pdf.

75. Ware, J. E., Snow, K. K., Kosinski, M., & Gandek, B. (1997). SF-36 Health Survey: Manual and Interpretation Guide. Boston, MA: The Health Institute, New England Medical Center.

76. Wyrwich, K. W., & Tardino, V. M. (2006). Understanding global transition assessments. Quality of life research : an international journal of quality of life aspects of treatment, care and rehabilitation, 15(6), 995-1004.

77. Wyrwich, K. W., Tierney, W. M., & Wolinsky, F. D. (1999). Further evidence supporting an SEM-based criterion for identifying meaningful intra-individual changes in health-related quality of life. Journal of Clinical Epidemiology, 52(9), 861-873.

APPENDIX E – Workshop Participants Lists

For most of the SAMSI workshops, the participants will be summarized in three tables

below. The first table is a summary of all participants by gender, status, field of

work/study, affiliation, and location. The second table lists only the participants who

received support. The third table lists all workshop participants. The minority status of

each participant is available, but we do not include the information here because of

privacy issues; the summaries in Section H: Diversity Efforts were compiled from this

data.

The key to the Status entry is as follows:

NRG – New Researcher or Graduate Student

FP – Faculty or Professional

S – Students (Education & Outreach)

A – Faculty (Education & Outreach)

2008-09 PROGRAM EVENTS AFTER AUGUST 1, 2009

Sequential Monte Carlo Methods

Participants  Male  Female  Unspecified  Faculty/Professional  New Researcher/Student  Stat  Math  Other

Supported       11       1            0                     6                       6     9     0      3

Unsupported     19       7            0                    10                      16    24     1      1

SAMSI            3       0            0                     1                       2     2     0      1

Last Name First Name Gender Affiliation Major/Department Status

Airoldi Edoardo M Harvard University Statistics NRG

Bhadra Anindya M University of Michigan Statistics NRG

Carvalho Carlos M University of Chicago Statistics NRG

Doucet Arnaud M University of British Columbia Stats & CS FP

Lin Xiaodong M Rutgers University Statistics FP

Lopes Hedibert M University of Chicago Other FP

Lynch Jim M University of South Carolina Statistics FP

Lyubimov Konstantin M University of Georgia Socl FP

Peters Gareth M University of New South Wales Stats & Math NRG

Petris Giovanni M University of Arkansas Statistics FP

Septier Francois M University of Lille Engg NRG

Wang Xiaopei F University of Cincinnati Statistics NRG

Sequential Monte Carlo Methods Transition Workshop

Workshop Participants

November 9-10, 2009

Last Name First Name Gender Affiliation Major/Department Status

Airoldi Edoardo M Harvard University Statistics NRG

Berger Jim M SAMSI Statistics FP

Bhadra Anindya M University of Michigan Statistics NRG

Carvalho Carlos M University of Chicago Statistics NRG

Chen Hao M Duke University Statistics NRG

Clyde Merlise F Duke University Statistics FP

Cornebise Julien M SAMSI Comp Sci NRG

Das Sourish M SAMSI Statistics NRG

Doucet Arnaud M University of British Columbia Stats & CS FP

Dunson David M Duke University Statistics FP

Feng Xingdong M NISS Math NRG

Gerstoft Peter M University of California, San Diego Phys FP

Godsill Simon M Cambridge University Statistics FP

Hannig Jan M University of North Carolina Statistics FP

Lenarcic Alan M Harvard University Statistics NRG

Lin Xiaodong M Rutgers University Statistics FP

Liu Bin M Duke University Statistics NRG

Lopes Hedibert M University of Chicago Other FP

Lynch Jim M University of South Carolina Statistics FP

Lyubimov Konstantin M University of Georgia Socl FP

Macaro Christian M Duke University Statistics NRG

Manolopoulou Ioanna F Duke University Statistics NRG

Mukherjee Chiranjit M Duke University Statistics NRG

Muthukumarana Saman M Simon Fraser University Statistics NRG

Pena Edsel M University of South Carolina Statistics FP

Peters Gareth M University of New South Wales Stats & Math NRG

Petralia Francesca F Duke University Statistics NRG

Petris Giovanni M University of Arkansas Statistics FP

Ratmann Oliver M Duke University Bio NRG

Rocha Guilherme M Indiana University Statistics NRG

Schott Sarah F Duke University Statistics NRG

Septier Francois M University of Lille Engg NRG

Shi Minghui F Duke University Statistics NRG

Tendick Patrick M Avaya Labs Research Statistics FP

Vidyashankar Anand M Cornell University Statistics FP

Wang Xia F NISS Statistics NRG

Wang Xiaopei F University of Cincinnati Statistics NRG

West Mike M Duke University Statistics FP

Wolpert Robert M Duke University Statistics FP

Wu Wensong F University of South Carolina Statistics NRG

Zou Jian M NISS Statistics NRG

2009-10 PROGRAM EVENTS THROUGH JULY 2010

Stochastic Dynamics

Stochastic Dynamics Opening Workshop

Participant Summary

August 30 – September 2, 2009

Participants Male Female Unspecified Faculty/Professional New Researcher/Student Stat Math Other Number of Institutions Represented Number of States Represented

Supported 27 7 0 17 17 6 27 1 26 13

Unsupported 25 12 0 12 25 10 25 1 12 6

SAMSI 0 0 0 0 0 0 0 0 0 0

Last Name First Name Gender Affiliation Major/Department Status

Alber Mark M Notre Dame University Math FP

Amirdjanova Anna F University of Michigan Statistics FP

Anderson David M University of Wisconsin Math NRG

Baxendale Peter M University of Southern California Math FP

Cai David M New York University Math FP

Capobianco Enrico M Polaris Science and Tech Park Statistics FP

Carreon Fernando M University of Michigan Math NRG

Cerrai Sandra F University of Maryland Math FP

Denker Manfred M Pennsylvania State University Math FP

Elgart Vlad M Virginia Tech Statistics NRG

Forger Danny M University of Michigan Math NRG

Gentz Barbara F University of Bielefeld Math FP

Gordillo Luis M University of Puerto Rico Math NRG

Hooker Giles M Cornell University Statistics NRG

Karimi Mona F Texas A&M University Engg NRG

Katsoulakis Markos M Umass/Amherst Math NRG

Kim Eunjung F University of Notre Dame Math NRG

Kim Kunwoo M University of Illinois Math NRG

Koralov Leonid M University of Maryland Math FP

Kotelenez Peter M Case Western University Math FP

Kurtz Thomas G. M University of Wisconsin Math FP

Kuske Rachel F University of British Columbia Math FP

Lord Gabriel M Heriot Watt University Math FP

Lythe Grant M University of Leeds Math FP

Mubayi Anuj M University of Texas, Arlington Math NRG

Owhadi Houman M Cal Tech Math NRG

Popovic Lea F Concordia University Math NRG

Seadler Bradley M Case Western Reserve University Math NRG

Stuart Andrew M Warwick University Math FP

Timofeyev Ilya M University of Houston Math FP

Wilkinson Darren M Newcastle Statistics FP

Zhang Zhigang M University of Houston Math NRG

Zhou Xiang M Princeton University Math NRG

Zhu Bin M University of Michigan Statistics NRG

Last Name First Name Gender Affiliation Major/Department

Alber Mark M Notre Dame University Math

Amirdjanova Anna F University of Michigan Statistics

Anderson David M University of Wisconsin Math

Bakhtin Yuri M Georgia Tech Math

Balachandran Prakash M Duke University Math

Banks H. Thomas M North Carolina State University Statistics

Bessaih Hakima F University of Wyoming Math

Bhamidi S Shankar M University of North Carolina Statistics

Baxendale Peter M University of Southern California Math

Cai David M New York University Math

Capobianco Enrico M Polaris Science and Tech Park Statistics

Carreon Fernando M University of Michigan Math

Cerrai Sandra F University of Maryland Math

Chen Jing F Duke University Math

Deng Shaozhong M University of North Carolina, Charlotte Math

Denker Manfred M Pennsylvania State University Math

DeVille Lee M University of Illinois Math

Doucette Cheri F Virginia Commonwealth University Math

Duan Jinqiao M Illinois Institute of Technology Math

Dukic Vanja F University of Chicago Statistics

Elgart Vlad M Virginia Tech Statistics

Forger Danny M University of Michigan Math

Gentz Barbara F University of Bielefeld Math

Gordillo Luis M University of Puerto Rico Math

Heller Martin M North Carolina State University Statistics

Herrera-Valdez Marco M Arizona State University Life

Hoefer Mark M North Carolina State University Math

Hooker Giles M Cornell University Statistics

Hu Dan M New York University Math

Ji Chunlin M Duke University Statistics

Joshi Badal M Duke University Math

Kan Xingye F Illinois Institute of Technology Math

Karimi Mona F Texas A&M University Engg

Katsoulakis Markos M Umass/Amherst Math

Kim Eunjung F University of Notre Dame Math

Kim Kunwoo M University of Illinois Math

King Sarah F North Carolina State University Math

Koralov Leonid M University of Maryland Math

Kotelenez Peter M Case Western University Math

Kurtz Thomas G. M University of Wisconsin Math

Kuske Rachel F University of British Columbia Math

Layton Anita F Duke University Math

Lord Gabriel M Heriot Watt University Math

Lythe Grant M University of Leeds Math

Maggioni Mauro M Duke University Math

Mubayi Anuj M University of Texas, Arlington Math

Mucha Peter M University of North Carolina Math

Ng Hon Keung Tony M Southern Methodist University Statistics

Nolen James M Duke University Math

Owhadi Houman M Cal Tech Math

Pahlajani Chetan M University of California, Santa Barbara Math

Popovic Lea F Concordia University Math

Reinhold Dominik M University of North Carolina Statistics

Schott Sarah F Duke University Statistics

Seadler Bradley M Case Western Reserve University Math

Sikorski Kajetan F Rensselaer Polytechnic Institute Math

Stuart Andrew M Warwick University Math

Thomas Rachel F Duke University Math

Timofeyev Ilya M University of Houston Math

Towne Warren M Rensselaer Polytechnic Institute Math

Vanden-Eijnden Eric M Courant Institute, New York University Math

Wilkinson Darren M Newcastle Statistics

Xu Meng M University of Wyoming Math

Xu Xiaohua M Virginia Tech Bioinfo

Yang Hongxia F Duke University Statistics

Yang Yushu F University of Maryland Math

Zhang Zhigang M University of Houston Math

Zhou Douglas M New York University Math

Zhou Xiang M Princeton University Math

Zhu Bin M University of Michigan Statistics

Zhu Hongtu M University of North Carolina Biostats

Self Organization Workshop Participant Summary October 26-28, 2009

Participants Male Female Unspecified Faculty/Professional New Researcher/Student Stat Math Other Number of Institutions Represented Number of States Represented

Supported 11 8 0 14 5 1 14 4 16 9

Unsupported 15 8 0 14 9 1 2 2 10 3

SAMSI 6 2 0 0 8 4 4 0 1 1

Self Organization Workshop Workshop Participant Support

October 26-28, 2009

Last Name First Name Gender Affiliation Major/Department Status

Alber Mark M Notre Dame University Math FP

Balazs Anna F University of Pittsburgh Chemical and Petroleum Engineering FP

Bearon Rachel F University of Liverpool Mathematical Sciences FP

Berlyand Leonid M Pennsylvania State University Mathematics FP

Bertozzi Andrea F University of California, Los Angeles Mathematics FP

Canic Suncica F University of Houston Mathematics FP

Gollub Jerry M Haverford College Statistics FP

Gorb Yulia F Texas A&M University Math NRG

Graham Michael M University of Wisconsin-Madison Chemical and Biological Engineering FP

Gyrya Vitaliy M Pennsylvania State University Department of Mathematics NRG

Hilhorst Danielle F University of Paris Sud Math FP

Huang Yanghong M University of California, Los Angeles Mathematics NRG

Laurent Thomas M University of California, Los Angeles Math NRG

Mancini Simona F University of Orleans Math FP

Mao Yi F University of Tennessee National Institute for Mathematical and Biological Synthesis FP

Pedley Tim M University of Cambridge DAMTP FP

Perthame Benoit M University P. et M. Curie Laboratoire J.-L. Lions FP

Shelley Michael M New York University Math FP

Yamada Richard M University of Michigan Mathematics NRG

Self Organization Workshop Workshop Participants October 26-28, 2009

Last Name First Name Gender Affiliation Major/Department Status

Alber Mark M Notre Dame University Math FP

Aronson Igor M Argonne National Laboratory Eng & Math FP

Athreya Avanti F SAMSI/Duke University Mathematics NRG

Baker Ruth F University of Oxford Mathematical Institute NRG

Balazs Anna F University of Pittsburgh Chemical and Petroleum Engineering FP

Bearon Rachel F University of Liverpool Mathematical Sciences FP

Berlyand Leonid M Pennsylvania State University Mathematics FP

Bertozzi Andrea F University of California, Los Angeles Mathematics FP

Budhiraja Amarjit M University of North Carolina Stat-OR FP

Canic Suncica F University of Houston Mathematics FP

Chertock Alina F North Carolina State University Mathematics FP

Decoene Astrid F University Paris Sud - 11 Département de Mathématiques d'Orsay NRG

Deng Shaozhong M University of North Carolina, Charlotte Math NRG

Diaz Oliver M SAMSI Math NRG

Gollub Jerry M Haverford College Statistics FP

González Carlos Manuel Mora M Universidad de Concepción Math NRG

Gorb Yulia F Texas A&M University Math NRG

Graham Michael M University of Wisconsin-Madison Chemical and Biological Engineering FP

Greenwood Priscilla F Arizona State Univ, SAMSI Mathematics and Statistics FP

Gremaud Pierre M SAMSI / North Carolina State University Math FP

Gyrya Vitaliy M Pennsylvania State University Department of Mathematics NRG

Haider Mansoor M North Carolina State University Mathematics FP

Haines Brian M Pennsylvania State University Mathematics NRG

Hilhorst Danielle F University of Paris Sud Math FP

Horn Mary Ann F National Science Foundation Math FP

Huang Yanghong M University of California, Los Angeles Mathematics NRG

Karpeev Dmitry M Argonne National Laboratory Mathematics and Computer Science FP

Kolba Tiffany F Duke University Mathematics NRG

Kramer Peter M Rensselaer Polytechnic Institute Mathematical Sciences FP

Laurent Thomas M University of California, Los Angeles Math NRG

Lubkin Sharon F North Carolina State University Math FP

Liu Jian-Guo M Duke University Physics and Mathematics FP

Mancini Simona F University of Orleans Math FP

Mao Yi F University of Tennessee National Institute for Mathematical and Biological Synthesis FP

McSweeney John M SAMSI Statistics NRG

Minion Michael M SAMSI Math FP

Mucha Peter M University of North Carolina Mathematics FP

Nolen James M Duke University Mathematics NRG

Pedley Tim M University of Cambridge DAMTP FP

Perthame Benoit M University P. et M. Curie Laboratoire J.-L. Lions FP

Prost Jacques M ESPCI ParisTech/ Institut Curie General Direction FP

Rogers Bruce M SAMSI Math NRG

Shelley Michael M New York University Math FP

Sokolov Andrey M Princeton University Ecology and Evolutionary Biology NRG

Srinivasan Ravi M SAMSI Statistics NRG

Sun Yi M SAMSI Statistics NRG

Thomas Rachel F Duke University Mathematics NRG

Wang Xueying F SAMSI Statistics NRG

Yamada Richard M University of Michigan Mathematics NRG

Zhang Lei M University of Bonn Mathematics NRG

Theory and Qualitative Behavior of Stochastic Dynamics Workshop Participant Summary February 8-10, 2010

Participants Male Female Unspecified Faculty/Professional New Researcher/Student Stat Math Other Number of Institutions Represented Number of States Represented

Supported 13 3 0 4 12 2 14 0 12 9

Unsupported 16 6 0 14 8 2 19 0 25 3

SAMSI 6 1 0 0 7 1 6 0 1 1

Theory and Qualitative Behavior of Stochastic Dynamics Workshop

Workshop Participants Support February 8-10, 2010

Last Name First Name Gender Affiliation Major/Department Status

Bakhtin Yuri M Georgia Tech Math NRG

Baxendale Peter M University of Southern California Math FP

Bou-Rabee Nawaf M New York University Math NRG

Elgart Vlad M Virginia Tech Statistics NRG

Gentz Barbara F University of Bielefeld Math FP

Kashan Nowrouzi Fariba F Kentucky State University Statistics NRG

Kim Kunwoo M University of Illinois, Urbana Champaign Math NRG

Li Xiao M University of Illinois, Urbana Champaign Math NRG

Lu Jianfeng M New York University Math NRG

Molchanov Stas M University of North Carolina, Charlotte Math & Stats FP

Mubayi Anuj M University of Texas Math NRG

Scheutzow Michael M Technische Universitaet Berlin Math FP

Sowers Rich M University of Illinois, Urbana Champaign Math NRG

Sun Xu M Illinois Institute of Technology Math NRG

Yin Mei F University of Arizona Math NRG

Zheng Zhi M University of Illinois Math NRG

Theory and Qualitative Behavior of Stochastic Dynamics Workshop Workshop Participants February 8-10, 2010

Last Name First Name Gender Affiliation Major/Department Status

Athreya Avanti F SAMSI / Duke University Math NRG

Bakhtin Yuri M Georgia Tech Math NRG

Balachandran Prakash M Duke University Mathematics NRG

Baxendale Peter M University of Southern California Math FP

Bogachev Leonid M Leeds University Math FP

Bou-Rabee Nawaf M New York University Math NRG

Budhiraja Amarjit M University of North Carolina Statistics FP

Cerrai Sandra F University of Maryland Math FP

Deshpande Amogh M North Carolina State University Math NRG

Diaz Espinosa Oliver M SAMSI Math NRG

Dolgopyat Dmitry M University of Maryland Math FP

Elgart Vlad M Virginia Tech Statistics NRG

Gentz Barbara F University of Bielefeld Math FP

Greenwood Priscilla F Arizona State University, SAMSI Math FP

Hamzi Boumediene M Duke University Mathematics FP

Kang Min F North Carolina State University Math FP

Kashan Nowrouzi Fariba F Kentucky State University Statistics NRG

Kim Kunwoo M University of Illinois, Urbana Champaign Math NRG

Kolba Tiffany F Duke University Math NRG

Li Xiao M University of Illinois, Urbana Champaign Math NRG

Liu Jian-Guo M Duke University Physics and Mathematics FP

Liu Xin F University of North Carolina, Chapel Hill Statistics NRG

Lu Jianfeng M New York University Math NRG

Mattingly Jonathan M Duke University Math FP

McKinley Scott M SAMSI Math NRG

McSweeney John M SAMSI Statistics NRG

Molchanov Stas M University of North Carolina, Charlotte Math & Stats FP

Mora Carlos M Universidad de Concepción / SAMSI Math NRG

Mubayi Anuj M University of Texas, Arlington Math NRG

Nolen Jim M Duke University Math NRG

Novikov Alexei M Pennsylvania State University Math FP

Nualart David M University of Kansas Math FP

Pavliotis Greg M Imperial College, London Math FP

Pistone Giovanni M Politecnico di Torino Math FP

Rey-Bellet Luc M University of Massachusetts, Amherst Math FP

Rios-Doria Daniel M SAMSI / University of Arizona Math NRG

Rogers Bruce M SAMSI Math NRG

Scheutzow Michael M Technische Universitaet Berlin Math FP

Sowers Rich M University of Illinois, Urbana Champaign Math NRG

Sun Xu M Illinois Institute of Technology Math NRG

Sun Yi M SAMSI / North Carolina State Univ Math NRG

Wang Xuan M University of North Carolina Statistics & Operations Research NRG

Watkins Andrea F Duke University Math NRG

Yin Mei F University of Arizona Math NRG

Zheng Zhi M University of Illinois Math NRG

Molecular Workshop Participant Summary

April 15-17, 2010

Participants Male Female Unspecified Faculty/Professional New Researcher/Student Stat Math Other Number of Institutions Represented Number of States Represented

Supported 8 6 0 8 6 2 11 1 13 9

Unsupported 15 4 0 8 11 8 11 0 11 3

SAMSI 4 2 0 0 6 2 4 0 1 1

Molecular Workshop

Workshop Participants Support April 15-17, 2010

Last Name First Name Gender Affiliation Major/Department Status

Barreiro Andrea F University of Washington Math NRG

Betterton Meredith F University of Colorado, Boulder Physics FP

Ditlevsen Susanne F University of Copenhagen Statistics NRG

Fuller Pamela F Rensselaer Polytechnic Institute Math NRG

Hunter David M Penn State University Stats FP

Kinderlehrer David M Carnegie Mellon University Math FP

Mubayi Anuj M University of Texas, Arlington Math NRG

Nykamp Duane M University of Minnesota Math FP

Popovic Lea F Concordia University Math & Stats FP

Sacerdote Laura F University of Torino Math FP

Timofeyev Ilya M University of Houston Math FP

Volz Erik M University of Michigan Math NRG

Wang Hongyun M UC, Santa Cruz Math FP

Yamada Richard M University of Michigan Math NRG

Molecular Workshop Workshop Participants

April 15-17, 2010

Last Name First Name Gender Affiliation Major/Department Status

Arunajadai Srikesh M Columbia University Biostats NRG

Athreya Avanti F SAMSI/Duke University Math NRG

Barreiro Andrea F University of Washington Math NRG

Betterton Meredith F University of Colorado, Boulder Physics FP

Diaz Espinosa Oliver M SAMSI Math NRG

Ditlevsen Susanne F University of Copenhagen Statistics NRG

Falk Raymond M OptInference LLC Statistics FP

Fricks John M Penn State University Math FP

Fuller Pamela F Rensselaer Polytechnic Institute Math NRG

Greenwood Priscilla F Arizona State University, SAMSI Math FP

Hunter David M Penn State University Stats FP

Joshi Badal M Duke University Math NRG

Kinderlehrer David M Carnegie Mellon University Math FP

Kolba Tiffany F Duke University Math NRG

Kramer Peter M Rensselaer Polytechnic Institute Math FP

Lin Kevin M University of Arizona Math FP

Liu Xin F University of North Carolina Statistics NRG

Mattingly Jonathan M Duke University Math FP

McKinley Scott M Duke University Math NRG

McSweeney John M SAMSI Statistics NRG

Mora Gonzalez Carlos Manuel M SAMSI - Universidad de Concepcion Math NRG

Mubayi Anuj M University of Texas, Arlington Math NRG

Ng Hon Keung Tony M Southern Methodist University Statistics FP

Nykamp Duane M University of Minnesota Math FP

Popovic Lea F Concordia University Math & Stats FP

Reinhold Dominik M University of North Carolina Statistics NRG

Rogers Bruce M SAMSI Math NRG

Sacerdote Laura F University of Torino Math FP

Schmidler Scott M Duke University Statistics NRG

Sikorski Kajetan M Rensselaer Polytechnic Institute Math NRG

Smith Charlie M North Carolina State University Statistics FP

Sun Yi M SAMSI & NCSU Math NRG

Timofeyev Ilya M University of Houston Math FP

Traud Amanda F UNC Math NRG

Volz Erik M University of Michigan Math NRG

Wang Hongyun M University of California, Santa Cruz Math FP

Wang Xueying F SAMSI Statistics NRG

Yamada Richard M University of Michigan Math NRG

Zhang Kurt M University of North Dakota Statistics NRG

Space-time Analysis for Environmental Mapping, Epidemiology and Climate Change

Participants Male Female Unspecified Faculty/Professional New Researcher/Student Stat Math Other Number of Institutions Represented

Supported 21 19 0 21 19 37 0 3 29

Unsupported 59 33 1 54 39 84 8 1 30

SAMSI 7 3 0 0 10 10 0 0 1

Last Name First Name Gender Affiliation Major/Department Status

Anderes Ethan M University of California Statistics NRG

Bandyopadhyay Soutir M Texas A&M University Statistics NRG

Barber Jarrett M University of Wyoming Statistics FP

Behrens Clifford M Telcordia Technologies, Inc. Socl NRG

Bell Michelle F Yale University Statistics NRG

Beltran Francisco M M University of California, Santa Cruz Statistics NRG

Berhane Kiros M University of Southern California Statistics FP

Carlin Brad M University of Minnesota Statistics FP

Cooley Dan M Colorado State University Statistics FP

Cressie Noel M Ohio State University Statistics FP

Dean Charmaine F Simon Fraser University Statistics FP

Guindani Michele M University of New Mexico Statistics FP

He Zhuoqiong F University of Missouri Statistics FP

Herbei Radu M Ohio State University Statistics NRG

Hurtado Rua Sandra F University of Connecticut Statistics NRG

Jun Mikyoung F Texas A & M University Statistics FP

Kang Emily F Ohio State University Statistics NRG

Katzfuss Matthias M Ohio State University Statistics NRG

Keller Cherie F University of Florida Life FP

Konomi Bledar M Texas A&M University Statistics NRG

Lee Jaeyong M Duke University Statistics FP

Li Bo F Purdue University Statistics NRG

Liang Ye M University of Missouri Statistics NRG

Linkletter Crystal F Brown University Statistics NRG

Liu Ran F University of Connecticut Statistics NRG

Liu Desheng M Ohio State University Statistics NRG

López Quílez Antonio M Universitat de Valencia Statistics FP

Michalak Anna F University of Michigan Phys FP

Reyes Perla F University of Wisconsin-Madison Statistics NRG

Richardson Sylvia F Imperial College London Statistics FP

Rue Havard M Norwegian University of Science and Technology Statistics FP

Sang Huiyan F Texas A&M University Statistics NRG

Sanso Bruno M University of California, Santa Cruz Statistics FP

Schmidt Alexandra F Universidade Federal do Rio de Janeiro Statistics FP

Sheppard Lianne F University of Washington Statistics FP

Stein Michael M University of Chicago Statistics FP

Tuglus Catherine F University of California, Berkeley Statistics NRG

van Lieshout Marie-Colette F CWI & Eindhoven University of Technology Statistics FP

Wikle Chris M University of Missouri Statistics FP

Zhang Jing F Miami University Statistics NRG

Last Name First Name Gender Affiliation Major/Department

Anderes Ethan M University of California, Berkeley Statistics

Assuncao Renato M Universidade Federal de Minas Gerais Statistics

Aune Erlend M Norwegian University of Science & Technology Statistics

Bandyopadhyay Soutir M Texas A&M University Statistics

Banerjee Sudipto M University of Minnesota Statistics

Barber Jarrett M University of Wyoming Statistics

Bayarri Susie F University of Valencia Statistics

Behrens Clifford M Telcordia Technologies, Inc. Socl

Bell Michelle F Yale University Statistics

Beltran Francisco M M University of California, Santa Cruz Statistics

Berger James M SAMSI Statistics

Berhane Kiros M University of Southern California Statistics

Berliner Mark M Ohio State University Statistics

Berrett Candace F Ohio State University Statistics

Berrocal Veronica F SAMSI Statistics

Bhat K Sham M Pennsylvania State University Statistics

Boehm Laura F North Carolina State University Statistics

Braverman Amy F NASA Statistics

Calder Kate F Ohio State University Statistics

Carlin Brad M University of Minnesota Statistics

Chakraborty Avishek M Duke University Statistics

Chang Howard M Johns Hopkins University Statistics

Chen Lisha F Yale University Statistics

Cheng Guang M Purdue University Statistics

Conesa David M Universitat De Valencia Statistics

Cooley Dan M Colorado State University Statistics

Cornebise Julien M SAMSI Statistics

Cressie Noel M Ohio State University Statistics

Crooks James M US Environmental Protection Agency Statistics

Das Sourish M SAMSI and Duke University Statistics

Dean Charmaine F Simon Fraser University Statistics

Dey Dipak M University of Connecticut Statistics

Dominici Francesca F Johns Hopkins Biostats

Dunson David M Duke University Statistics

Eidsvik Jo M Norwegian University of Science & Technology Math

Erhardt Robert M University of North Carolina Statistics

Feng Xingdong M NISS Math

Ferreira Marco M University of Missouri, Columbia Statistics

Finley Andrew M Michigan State University Phys

Forte Deltell Anabel F Universitat de Valencia Statistics

Fox Emily F Duke University Statistics

Fuentes Montserrat F North Carolina State University Statistics

Gelfand Alan M Duke University Statistics

Ghosh Souprano M Duke University Statistics

Ghosh Sujit M North Carolina State University Statistics

Gieseke Morgan F North Carolina State University Statistics

Gomez-Rubio Virgilio M Universidad de Castilla-La Mancha Math

Greenwood Priscilla (Cindy) F SAMSI and Arizona State University Math

Guindani Michele M University of New Mexico Statistics

Haran Murali M Pennsylvania State University Statistics

Harlim John M North Carolina State University Math

He Zhuoqiong F University of Missouri Statistics

Held Leo M University of Zurich Statistics

Herbei Radu M Ohio State University Statistics

Herring Amy F University of North Carolina Biostats

Higdon Dave M LANL Statistics

Holan Scott M University of Missouri Statistics

Holland David M US EPA Statistics

Hooten Mevin M Utah State University Statistics

Hua Zhaowei F University of North Carolina Biostats

Huerta Gabriel M University of New Mexico Statistics

Hurtado Rua Sandra F University of Connecticut Statistics

Illian Janine F University of St. Andrews Statistics

Jackson Monica F American University Statistics

Jeon Soyoung F University of North Carolina Statistics

Johannesson Gardar M Lawrence Livermore National Laboratory Statistics

Jun Mikyoung F Texas A & M University Statistics

Kafadar Karen F Indiana University Statistics

Kang Emily F Ohio State University Statistics

Katzfuss Matthias M Ohio State University Statistics

Kaufman Cari F University of California, Berkeley Statistics

Keller Cherie F University of Florida Life

Konomi Bledar M Texas A&M University Statistics

Kramer Peter M Rensselaer Polytechnic Institute Math

Lawson Andrew M Medical University of South Carolina Biostats

Lee Jaeyong M Duke University Statistics

Levine Michael M Purdue University Statistics

Li Bo F Purdue University Statistics

Li Linyuan M University of New Hampshire Statistics

Liang Ye M University of Missouri Statistics

Linder Ernst M University of New Hampshire Statistics

Linkletter Crystal F Brown University Statistics

Liu Ran F University of Connecticut Statistics

Liu Yajun F University of Missouri Statistics

Liu Desheng M Ohio State University Statistics

Lopes Danilo M Duke University Statistics

López Quílez Antonio M Universitat de Valencia Statistics

Mannshardt-Shamseldin Elizabeth F SAMSI/Duke University Statistics

Martinez-Beneito Miguel A. M Centro Superior de Investigación en Salud Pública Statistics

Matteson David M Cornell University Statistics

Michalak Anna F University of Michigan Phys

Micheas Athanasios M University of Missouri Statistics

Moller Jesper M Aalborg University Math

Nail Amy F U.S. Environmental Protection Agency Statistics

Nychka Doug M National Center for Atmospheric Research Statistics

Paciorek Chris M University of California, Berkeley Statistics

Page Garritt M Duke University Statistics

Peng Roger M Johns Hopkins University Biostats

Pilz Juergen M University of Klagenfurt Statistics

Pomann Gina-Maria F North Carolina State University Statistics

Puggioni Gavino M University of North Carolina Statistics

Rajaratnam Balakanapathy M Stanford University Math

Rappold Ana F Environmental Protection Agency Statistics

Reich Brian M North Carolina State University Statistics

Reyes Perla F University of Wisconsin-Madison Statistics

Richardson Sylvia F Imperial College London Statistics

Rue Havard M Norwegian University of Science and Technology Statistics

Sang Huiyan F Texas A&M University Statistics

Sanso Bruno M University of California, Santa Cruz Statistics

Schmaltz Chester M University of Missouri Statistics

Schmidt Alexandra F Universidade Federal do Rio de Janeiro Statistics

Scott Marian F University of Glasgow Statistics

Shaby Benjamin M SAMSI Statistics

Sheppard Lianne F University of Washington Statistics

Smith Leonard M London School of Economics Statistics

Smith Ron M Centre for Ecology and Hydrology Statistics

Smith Richard M University of North Carolina Statistics

Stein Michael M University of Chicago Statistics

Steinsland Ingelin F Norwegian University of Science and Technology Statistics

Sun Dongchu M University of Missouri Statistics

Swall Jenise F U.S. Environmental Protection Agency Statistics

Tchumtchoua Sylvie F University of Connecticut Statistics

Tebaldi Claudia F UBC-Vancouver and Climate Central Statistics

Tingley Martin M SAMSI Statistics

Tuglus Catherine F University of California, Berkeley Statistics

van Lieshout Marie-Colette F CWI & Eindhoven University of Technology Statistics

Waller Lance M Emory University Statistics

Wang Xia F NISS Statistics

Wang Xueying F SAMSI Statistics

Wikle Chris M University of Missouri Statistics

Wolpert Robert M Duke University Statistics

Xia Jessie F NISS Statistics

Xu Chang M University of Missouri - Columbia Statistics

Yang Hongxia F Duke University Statistics

Yasamin Saed M SAMSI Statistics

Young Linda F University of Florida Statistics

Yuan Chengwei University of New Hampshire Statistics

Zhang Jun M SAMSI Statistics

Zhang Jing F Miami University Statistics

Zhao Yingqi F UNC Biostats

Zhu Zhengyuan M Iowa State University Statistics

Zidek James M University of British Columbia Statistics

Zou Jian (Frank) M NISS Statistics

GEOMED Workshop Participant Summary

November 14-16, 2009

Participants Male Female Unspecified Faculty/Professional New Researcher/Student Stat Math Other Number of Institutions Represented Number of States Represented

Supported 7 9 0 7 9 16 0 0 12 5

Unsupported 1 1 0 1 1 2 0 0 2 1

SAMSI 1 1 0 0 2 2 0 0

GEOMED Workshop Workshop Participant Support

November 14-16, 2009

Last Name First Name Gender Affiliation Major/Department Status

Bayarri Susie Female University of Valencia Stat FP

Berrocal Veronica Female SAMSI Stat NRG

Calder Kate Female Ohio State Stat NRG

Chakraborty Avishek Male Duke University Stat NRG

Chang Howard Male SAMSI Biostats NRG

Forte Anabel Female University of Valencia Stat NRG

Gomez-Rubio Virgilio Male Universidad de Castilla-La Mancha Stat FP

Haran Murali Male Penn State U Stat FP

Lopes Danilo Male Duke U Stat NRG

Martinez-Beneito Miguel Male Centro Superior de Investigación en Salud Pública Stat FP

Nicolis Orietta Female U Bergamo Stat NRG

Yang Hongxia Female Duke U Stat NRG

Berrett Candace Female Ohio State Stat NRG

Liu Yajun Female U Missouri Stat NRG

Nail Amy Female US EPA Stat FP

Sun Dongchu Male U Missouri Stat FP

Reich Brian Male NCSU Biostats FP

GEOMED Workshop Workshop Participants November 14-16, 2009

Last Name First Name Gender Affiliation Major/Department Status

Banerjee Sudipto Male University of Minnesota Biostats FP

Bayarri Susie Female University of Valencia Stat FP

Berrocal Veronica Female SAMSI Stat NRG

Blangiardo Marta Female Imperial College Biostats NRG

Calder Kate Female Ohio State Stat NRG

Chakraborty Avishek Male Duke University Stat NRG

Chang Howard Male SAMSI Biostats NRG

Forte Anabel Female University of Valencia Stat NRG

Gieseke Morgan Female NCSU Stat NRG

Gomez-Rubio Virgilio Male Universidad de Castilla-La Mancha Stat FP

Haran Murali Male Penn State U Stat FP

Lopes Danilo Male Duke U Stat NRG

Martinez-Beneito Miguel Male Centro Superior de Investigación en Salud Pública Stat FP

Nicolis Orietta Female U of Bergamo Stat FP

Yang Hongxia Female Duke U Stat NRG

Berrett Candace Female Ohio State Stat NRG

Liu Yajun Female U Missouri Stat NRG

Nail Amy Female US EPA Stat NRG

Sun Dongchu Male U Missouri Stat FP

Reich Brian Male NCSU Stat FP

Climate Change Workshop Participant Summary February 17-19, 2010

Participants Male Female Unspecified Faculty/Professional New Researcher/Student Stat Math Other Number of Institutions Represented Number of States Represented

Supported 17 5 0 13 9 11 5 6 16 9

Unsupported 27 12 0 22 17 27 7 5 23 6

SAMSI 9 6 0 5 10 12 2 1 1 1

Climate Change Workshop Workshop Participant Support

February 17-19, 2010

Last Name First Name Gender Affiliation Major/Department Status

Ammann Caspar M NCAR Geoscience FP

Beltran Francisco M University of California, Santa Cruz Stats NRG

Bhat K Sham M Pennsylvania State University Stats NRG

Buchman Susan F Carnegie Mellon University Stats NRG

Drignei Dorin M Oakland University Stats FP

Forest Chris M Pennsylvania State University Meteorology FP

Haran Murali M Pennsylvania State University Stats FP

Hu Juegang M Stanford University Stats NRG

Jun Mikyoung F Texas A&M University Stats FP

Katzfuss Matthias M Ohio State University Stats NRG

Li Bo F Purdue University Stats NRG

Mearns Linda F NCAR Phys FP

Pistone Giovanni M Politecnico di Torino Math FP

Rajaratnam Bala M Stanford University Math NRG

Rowlands Dan M Oxford University Physical Sci NRG

Sain Stephan M National Center for Atmospheric Research Math & Stats FP

Schneider Tapio M California Institute of Technology Geosciences FP

Stainforth David M Oxford University Physical Sci FP

Timofeyev Ilya M University of Houston Math FP

Tolwinski-Ward Suz F University of Arizona Math NRG

Wehner Michael M Lawrence Berkeley National Laboratory Stats FP

Zhang Zepu M University of Alaska Stats FP

Climate Change Workshop

Workshop Participants February 17-19, 2010

Last Name First Name Gender Affiliation Major/Department Status

Ammann Caspar M NCAR Geoscience FP

Baines Paul M Harvard University Stats NRG

Beltran Francisco M M University of California, Santa Cruz App Math & Stats NRG

Berrocal Veronica F SAMSI Stats NRG

Bhat K Sham M Pennsylvania State University Stats NRG

Bramer Lisa F Iowa State University Stats NRG

Branstator Grant M NCAR Geosciences FP

Buchman Susan F Carnegie Mellon University Stats NRG

Buckley Lauren F University of North Carolina Biology NRG

Chang Howard M SAMSI Math NRG

Clark James M Duke University Environ FP

Conesa David M U Valencia Stats FP

Craigmile Peter M Ohio State University Stats FP

Cressie Noel M Ohio State University Stats FP

Davison Anthony M EPFL Math FP

Drignei Dorin M Oakland University Stats FP

Dupuis Debbie F HEC Montreal Mgmt Sci FP

Eidsvik Jo M SAMSI Math FP

Erhardt Robert M University of North Carolina, Chapel Hill Stats NRG

Ferreira Marco M University of Missouri Stats FP

Forest Chris M Pennsylvania State University Meteorology FP

Genton Marc M Texas A&M University Stats FP

Goodman Dougal M The Foundation for Science and Technology Stats FP

Greenwood Priscilla F Arizona State University, SAMSI Math FP

Hall Diana F University of North Carolina Stats NRG

Haran Murali M Pennsylvania State University Stats FP

Holt Ashley F National Geospatial-Intelligence Agency Other NRG

Hu Juegang M Stanford University Stats NRG

Jeon Soyoung F University of North Carolina Stats NRG

Jones Christopher M University of North Carolina Math FP

Jun Mikyoung F Texas A&M University Stats FP

Kang Emily F SAMSI Stats NRG

Kaper Hans M Georgetown University Math FP

Katzfuss Matthias M Ohio State University Stats NRG

Kaufman Cari F University of California, Berkeley Stats NRG

Kramer Peter M Rensselaer Polytechnic Institute Math FP

Kuensch Hans M ETHZ Stats FP

Levins Mihails M SAMSI Stats FP

Li Bo F Purdue University Stats NRG

Li Xuan F University of North Carolina Stats NRG

Li Linyuan M SAMSI Stats FP

Linder Ernst M University of New Hampshire Stats FP

Liu Yajun F SAMSI / University of Missouri Stats NRG

Mannshardt-Shamseldin Elizabeth F SAMSI Stats NRG

Martinelli Gabrielle F SAMSI Stats NRG

Martins Assuncao Renato M SAMSI Stats FP

Mearns Linda F NCAR Phys FP

Minion Michael M University of North Carolina Math FP

Modlin Danny M North Carolina State University Stats NRG

Ng Hon Keung Tony M Southern Methodist University Stats NRG

Paul Rajib M Western Michigan University Stats NRG

Pistone Giovanni M Politecnico di Torino Math FP

Rajaratnam Bala M Stanford University Math NRG

Rios Daniel M SAMSI Stats NRG

Roberts Steven M Australian National University Stats FP

Rowlands Dan M Oxford University Physical Sci NRG

Sain Stephan M National Center for Atmospheric Research Math & Stats FP

Salazar Esther F SAMSI Stats NRG

Sanso Bruno M University of California, Santa Cruz Stats FP

Schneider Tapio M California Institute of Technology Geosciences FP

Schwartzman Armin M Harvard School of Public Health Stats NRG

Shaby Ben M SAMSI Stats NRG

Smith Richard M University of North Carolina Stats FP

Stainforth David M Oxford University Physical Sci FP

Steinsland Ingelin F SAMSI Stats FP

Timofeyev Ilya M University of Houston Math FP

Tingley Martin M SAMSI Geosciences NRG

Tolwinski-Ward Suz F University of Arizona Math NRG

Wang Xia F NISS Stats NRG

Wang Jianqiang F National Institute of Statistical Sciences Stats NRG

Wehner Michael M Lawrence Berkeley National Laboratory Stats FP

Wolpert Robert M Duke University Stats FP

Zhang Zepu M University of Alaska Stats FP

Zhang Jun M SAMSI Stats NRG

Zidek Jim M University of British Columbia Stats FP

Zou Jian (Frank) M NISS Stats NRG

Environmental Workshop

Participant Summary April 7-9, 2010

Participants Male Female Unspecified Faculty/Professional New Researcher/Student Stat Math Other Number of Institutions Represented Number of States Represented

Supported 3 4 0 2 5 6 0 1 7 5

Unsupported 22 15 1 21 17 33 2 3 20 4

SAMSI 5 5 0 1 9 9 1 0 1 1

Environmental Workshop Participant Support

April 7-9, 2010

Last Name First Name Gender Affiliation Major/Department Status

Genton Marc M Texas A&M University Stats FP

Hammerling Dorit F University of Michigan Engg NRG

Li Bo F Purdue University Stats NRG

Menezes Raquel F Minho University Stats FP

Prata Gomes Dora F New University of Lisbon Stats NRG

Stark Glenn M University of New Mexico Stats NRG

Wu Ping-Shi M Lehigh University Stats NRG

Environmental Workshop Participants

April 7-9, 2010

Last Name First Name Gender Affiliation Major/Department Status

Antunes Marilia F University of Lisbon Math FP

Bermudez Patricia F University of Lisbon Stats FP

Berrocal Veronica F SAMSI Stats NRG

Braun John M University of Western Ontario Stats FP

Chakraborty Avishek M Duke University Stats NRG

Chang Howard M SAMSI Biostats NRG

Crooks James M U. S. Environmental Protection Agency Stats NRG

Das Sourish M SAMSI, Duke University Stats NRG

de Carvalho Miguel M Banco de Portugal Stats FP

Delmelle Eric M University of North Carolina, Charlotte Geoscience FP

Eidsvik Jo M SAMSI Math FP

Ferguson Claire F University of Glasgow Stats NRG

Finley Andrew M Michigan State University Phys NRG

Franco Villoria Maria F University of Glasgow Stats NRG

Fuentes Montse F North Carolina State University Stats FP

Genton Marc M Texas A&M University Stats FP

Green Peter M University of Bristol / SAMSI Stats FP

Hammerling Dorit F University of Michigan Engg NRG

Held Leonhard M University of Zurich Biostats FP

Holland David M U.S. Environmental Protection Agency Stats FP

Jeon Soyoung F University of North Carolina Stats NRG

Kalendra Eric M North Carolina State University Stats NRG

Kang Emily F SAMSI Stats NRG

Khan Diba F NCHS Stats NRG

Lahiri Soumen M Texas A & M University Stats FP

Li Bo F Purdue University Stats NRG

Liu Yajun F SAMSI / University of Missouri Stats NRG

Lopes Danilo M Duke University Stats NRG

Lopiano Kenneth M University of Florida Stats NRG

Mannshardt-Shamseldin Elizabeth F SAMSI / Duke Stats NRG

Maze Alena F Georgetown University Stats NRG

McDonald Claire F Centre for Ecology & Hydrology (CEH) Biosci NRG

Menezes Raquel F Minho University Stats FP

Ng Hon Keung Tony M Southern Methodist University Stats NRG

Peng Roger M Johns Hopkins University Stats FP

Prata Gomes Dora F New University of Lisbon Stats NRG

Rappold Ana F U.S. Environmental Protection Agency Stats FP

Reich Brian M North Carolina State University Stats FP

Rue Havard M Norwegian University of Science and Technology Stats FP

Salazar Esther F SAMSI Stats NRG

Scott Marian F University of Glasgow Stats FP

Shaby Ben M SAMSI Stats NRG

Silva Giovani M University of Lisbon Math FP

Smith Richard M University of North Carolina Stats FP

Stark Glenn M University of New Mexico Stats NRG

Steinsland Ingelin F NTNU / SAMSI Stats FP

Turkman Kamil Feridun M University of Lisbon Stats FP

Turkman Antonia Amaral F University of Lisbon Stats FP

Wang Xia F NISS Stats NRG

Wang Jianyu Duke University Stats NRG

Wu Ping-Shi M Lehigh University Stats NRG

Yajima Masanao M University of California, Los Angeles Stats NRG

Young Linda F University of Florida Stats FP

Zhang Jun M SAMSI Stats NRG

Zou Jian M NISS Stats NRG

Education and Outreach Program

Undergraduate Two-Day Workshop

Participant Summary

October 30-31, 2009

Participants Male Female Unspecified Faculty Student Stat/Math Majors Other/Unspecified Number of Institutions Represented Number of States Represented

Supported 16 16 0 0 32 29 3 23 18

Unsupported 7 7 0 10 4 13 1 6 2

SAMSI 4 1 0 5 0 5 0 1 1

Last Name First Name Gender Affiliation Major/Department Status

Archambault Heather F U of North Carolina, Wilmington Mathematics S

Bayo Haddijatou F Kentucky State University civil/environmental engineering S

Belanus Danica F University of North Dakota Mathematics S

Brooks Billy M University of North Carolina, Asheville Statistics/Environmental Studies S

Cheng Howard M University of Arizona Mathematics S

Coles Adrian M University of North Carolina, Wilmington Math S

Costain Ethan M Colorado State University Biology S

Cubas-Casana Jorge M New Jersey Institute of Technology Mathematics of Finance and Actuarial Science S

Culiuc Amalia F Mount Holyoke College Mathematics S

Decowski Jonathan M University of New Hampshire Statistics S

Deis Jennifer F Florida State University Statistics S

Donoven Casey M Montana State University, Bozeman Mathematics S

Drendel Jesse M Colorado State University Mathematics S

Emanuel George M University of Colorado, Boulder Applied Math and Biology S

Gansel Allison F University of Dayton Biology S

Hom Matthew M University of Arizona Mathematics, Economics S

Johnson Taylor F University of Arizona mathematics S

Keller Joshua M Emory University Applied Mathematics S

Lau Shing Yan M Carnegie Mellon University Economics & Statistics S

Lee Elaine F Carnegie Mellon University Mathematical Sciences S

Li Vicky F University of Colorado, Boulder applied math S

McNamara Amelia F Macalester College Mathematics S

Megretskaia Dina F Carnegie Mellon University Economics and Mathematical Sciences S

Nolan Connor M Iowa State University Mathematics and Biology S

Parks Martin M University of Florida Mathematics and Statistics S

Patterson Sally F Montana State University, Bozeman Applied Mathematics S

Puelz David M Wesleyan University Math and Physics S

Rooker Kelly F Bridgewater College Double majoring in Mathematics and Biology S

Schmidt Roger M Florida State University Statistics/Philosophy S

Wamunyu Esther F Meredith College Mathematics and civil engineering S

Woo Kaitlin F Georgetown University Mathematics S

Zhou Yan F University of Illinois, Urbana-Champaign Math S

Last Name First Name Gender Affiliation Major/Department

Archambault Heather F UNC Wilmington Mathematics

Bayarri Susie F U Valencia Stats

Bayo Haddijatou F Kentucky State University civil/environmental engineering

Belanus Danica F University of North Dakota Mathematics

Berrocal Veronica F SAMSI Statistics

Boehm Laura F NCSU Statistics

Brooks Billy M University of North Carolina, Asheville Statistics/Environmental Studies

Butter Eric M University of North Carolina Biostatistics

Chakraborty Avishek M Duke Statistics

Chang Howard M SAMSI Statistics

Cheng Howard M University of Arizona Mathematics

Coles Adrian M University of North Carolina, Wilmington Math

Costain Ethan M Colorado State University Biology

Cubas-Casana Jorge M New Jersey Institute of Technology Mathematics of Finance and Actuarial Science

Culiuc Amalia F Mount Holyoke College Mathematics

Decowski Jonathan M University of New Hampshire Statistics

Deis Jennifer F Florida State University Statistics

Donoven Casey M Montana State University, Bozeman Mathematics

Drendel Jesse M Colorado State U Mathematics

Emanuel George M U of Colorado, Boulder Applied Math and Biology

Gansel Allison F University of Dayton Biology

Gieseke Morgan F NCSU Statistics

Gremaud Pierre M NCSU and SAMSI Math

Haran Murali M Penn State Statistics

Hom Matthew M University of Arizona Mathematics, Economics

Hua Zhaowei F UNC Biostats

Jeon Soyoung F UNC Statistics

Johnson Taylor F University of Arizona mathematics

Keller Joshua M Emory University Applied Mathematics

Larose Chantal F U CT Statistics

Lau Shing Yan M Carnegie Mellon University Economics & Statistics

Lee Elaine F Carnegie Mellon University Mathematical Sciences

Li Vicky F University of Colorado, Boulder applied math

Lopes Danilo M Duke Statistics

McNamara Amelia F Macalester College Mathematics

Megretskaia Dina F Carnegie Mellon University Economics and Mathematical Sciences

Nolan Connor M Iowa State University Mathematics and Biology

Parks Martin M University of Florida Mathematics and Statistics

Patterson Sally F Montana State University, Bozeman Applied Mathematics

Puelz David M Wesleyan University Math and Physics

Robel Alexander M Duke University Earth/Ocean Sciences, Physics, Math (minor)

Rodriguez Terry M University of North Carolina, Chapel Hill Mathematical Decisions Sciences

Rooker Kelly F Bridgewater College Double majoring in Mathematics and Biology

Schmidt Roger M Florida State University Stats/Philosophy

Shaby Ben M SAMSI Statistics

Tingley Martin M SAMSI Statistics

Wamunyu Esther F Meredith College Mathematics and civil engineering

Woo Kaitlin F Georgetown University Mathematics

Yang Hongxia F Duke Stats

Zhang Jun M SAMSI Statistics

Zhou Yan F University of Illinois, Urbana-Champaign Math

Undergraduate Two-Day Workshop Participant Summary

February 26-27, 2010

Participants Male Female Unspecified Faculty Student Stat/Math Majors Other/Unspecified Number of Institutions Represented Number of States Represented

Supported 13 12 0 0 25 24 1 18 13

Unsupported 2 2 0 4 0 4 0 3 2

SAMSI 4 2 0 6 0 6 0 1 1

Last Name First Name Gender Affiliation Major/Department Status

Abdul Hamidah F University of Michigan, Ann Arbor Actuarial Mathematics, Statistics S

Baer Kerstin F Bryn Mawr College Mathematics S

Bartsch Jennifer F College of New Jersey Mathematics S

Blythe Rachel F Harriet L. Wilkes Honors College of Florida Atlantic Univ. Interdisciplinary Biology and Mathematics S

Bostwick Michael M University of Connecticut Statistics S

Cochran Jeff M Georgetown University Mathematics S

Colbert Cory M Virginia Commonwealth University Mathematics S

Feng Tao M University of Illinois, Urbana Champaign Applied Mathematics S

Huynh My M Arizona State University Mathematics S

Kamta Eric M Marymount University Mathematics S

Kent Meghan F Meredith College Mathematics S

Liang Xuan F University of Michigan, Ann Arbor Honors Math/Honors Economics S

Meisner Michele F College of New Jersey Mathematics Statistics S

Ngo Thao F Arizona State University Chemical Engineering S

Regh Todd M Southern Oregon University Mathematics S

Shepherd William M Florida State University Pure Mathematics and Statistics S

Skowron Robert M University of Illinois, Urbana Champaign Mathematics S

Strong Homer M Reed College Mathematics S

Surber Wesley M East Tennessee State University Mathematics S

Wang Jiayin F Purdue University Actuarial Science & Applied Statistics S

Welter Amanda F University of Wisconsin, La Crosse Mathematics (Statistics Emphasis) and Music S

Withey Catherine F Purdue University Mathematical Statistics S

Yeung Melissa F University of Chicago Mathematics S

Yu Kicho M Purdue University Statistics and Actuarial Science S

Zeng Yi M University of Illinois, Urbana-Champaign Mathematics S

Last Name First Name Gender Affiliation

Abdul Hamidah F University of Michigan, Ann Arbor

Athreya Avanti F SAMSI

Baer Kerstin F Bryn Mawr College

Bartsch Jennifer F College of New Jersey

Blythe Rachel F Harriet L. Wilkes Honors College of Florida Atlantic Univ.

Bostwick Michael M University of Connecticut

Cochran Jeff M Georgetown University

Cohen Sean M SAMSI

Colbert Cory M Virginia Commonwealth University

Diaz Oliver M SAMSI

Feng Tao M University of Illinois, Urbana Champaign

Greenwood Cindy F Arizona State

Gremaud Pierre M NCSU / SAMSI

Huynh My M Arizona State University

Kamta Eric M Marymount University

Kent Meghan F Meredith College

Kolba Tiffany F Duke

Liang Xuan F University of Michigan, Ann Arbor

McSweeney John M SAMSI

Meisner Michele F College of New Jersey

Ngo Thao F Arizona State University

Regh Todd M Southern Oregon University

Rogers Bruce M SAMSI

Schmidler Scott M Duke

Shepherd William M Florida State University

Skowron Robert M University of Illinois, Urbana Champaign

Strong Homer M Reed College

Surber Wesley M East Tennessee State University

Wang Jiayin F Purdue University

Wang Xueying F SAMSI

Welter Amanda F University of Wisconsin, La Crosse

Withey Catherine F Purdue University

Yeung Melissa F University of Chicago

Yu Kicho M Purdue University

Zeng Yi M University of Illinois, Urbana-Champaign

Last Name First Name Gender Affiliation

Abuzzahab Omar M University of Pennsylvania

Al Matar Najeeb M Florida Institute of Technology

Al-sharadqah Ali M University of Alabama - Birmingham

Andoniaina Rarivoarimanana University of Cincinnati

Athreya Avanti F Duke University

Auffinger Antonio M New York University

Baek Changryong M UNC - Chapel Hill

Balachandran Prakash M Duke University

Bessonov Mariya F Cornell University

Bhamidi Shankar M UNC - Chapel Hill

Bourque Matthew M University of Illinois - Chicago

Budhiraja Amarjit M UNC - Chapel Hill

Burr Meredith F Tufts University

Canepa Cristina F Carnegie Mellon University

Chandra Ajay M University of Virginia

Chaudhuri Ritwik M UNC - Chapel Hill

Chavez Esteban M Duke University

Chen Guangliang M Duke University

Chen Jiang F UNC - Chapel Hill

Chen Li F Oregon State University

Cisewski Jessi F UNC - Chapel Hill

Corwin Ivan M New York University

Crosskey Miles M Duke University

Ding Shanshan F University of Pennsylvania

Drenning Shawn M University of Chicago

Duarte Mauricio M University of Washington

Eden Richard M Purdue University

Eldredge Nathaniel M Cornell University

Ernst Philip M University of Pennsylvania

Esunge Julius M University of Mary Washington

Fang Ming M University of Minnesota

Feng Yaqin UNC - Charlotte

Ghosh Subhankar M University of Southern California

Gillespie Jr. Perry M UNC - Charlotte

Graber Jameson M University of Virginia

Guo Jing F University of Virginia

Guo Xiaoqin M University of Minnesota

Hamzi Boumediene M Duke University

Hannig Jan M UNC - Chapel Hill

Hening Alexandru M University of California - Berkeley

Hoffmeyer Allen M Georgia Institute of Technology

Holmes-Cerfon Miranda F New York University

Hu Yilei University of Paris VI and X

Huang Qizhuo University of Alabama - Birmingham

Huynh Huy M Georgia Institute of Technology

Ichiba Tomoyuki M University of California - Santa Barbara

Jackson Aaron M Duke University

Jafri Syeda Rabab University of Nottingham

Jairo Fuquene M University of Puerto Rico

Jay Parker M University of Illinois - Chicago

Jiang Duo F University of Chicago

Jiang Yunjiang M Stanford University

Jinwoo Hwang Purdue University

Kai Zhang M Duke University

Kaligotla Sivaditya M University of Southern California

Kang Hye-Won University of Minnesota

Karabash Dmytro M New York University

Kemajou Elisabeth F Southern Illinois University

Keutel Martin M University of Virginia

Kobayashi Kei Tufts University

Kolba Tiffany F Duke University

Kolba Mark M Signal Innovations Group

Kumar Ashish M SUNY - Buffalo

Lai Tri Indiana University

Leadbetter Ross M UNC - Chapel Hill

Lee Seonjoo F UNC - Chapel Hill

Lei Pedro M University of Kansas

Li Junchi M Duke University

Li Zhongyang Brown University

Lierl Janna F Cornell University

Lin Hao M University of Wisconsin

Little Anna F Duke University

Liu Xin F UNC - Chapel Hill

Liu Yufei F Brown University

Lou Shuwen F University of Washington

Lu Fei F University of Kansas

Lu Lu F University of Connecticut

Luo Shishi F Duke University

Luo Zhixiang F UNC - Chapel Hill

Ma Hui University of Alabama - Birmingham

Ma Jinyong M Georgia Institute of Technology

Ma Wei University of Virginia

Maggioni Mauro M Duke University

Mattingly Jonathan M Duke University

McGoff Kevin M University of Maryland

McKinley Scott M Duke University

McSweeney John M Concordia University

Mehrdad Behzad M New York University

Miller Jason M Stanford University

Minion Michael M UNC - Chapel Hill

Ngufor Nancy F University of Hasselt

Nguyen Tuan M Rutgers University

Nobel Andrew M UNC - Chapel Hill

Nolen James M Duke University

Pankaj Kumar M IGIDR

Pemantle Robin M University of Pennsylvania

Peng Yisha F Stanford University

Perkins William M New York University

Pipiras Vladas M UNC - Chapel Hill

Rastegar Reza M Iowa State University

Reinhold Dominik M UNC - Chapel Hill

Revzin Ella F University of Illinois - Chicago

Restrepo Ricardo M Georgia Institute of Technology

Rogers Bruce M Duke University

Ryu Hwa Yeon M Duke University

Ryvkina Jelena F Tufts University

Schott Sarah F Duke University

Setayeshgar Leila F Brown University

Shao Yuan M University of Chicago

Song Rui F University of Illinois - Urbana-Champaign

Srinivasan Ravi M University of Texas

Su Wei M University of Chicago

Su Zhe M University of Pittsburgh

Tang Xiaoxiao F University of Virginia

Thomas Rachel F Duke University

Tone Cristina F Indiana University

Vandenberg-Rodes Alexander M University of California - Los Angeles

Varadhan S. R. Srinivasa M New York University

Varkey Paul M University of Illinois - Chicago

Vedantha Sowrirajan Lakshmi Institute of Actuaries of India

Verella J. Tipan M University of Virginia

Viquez Juan M Purdue University

Wang Ruodu M Georgia Institute of Technology

Wang Xuan M UNC - Chapel Hill

Wang Yizao M University of Michigan

Wayman Eric M University of California - Berkeley

Webster Justin M University of Virginia

Windle Jesse M University of Texas

Xu Hangjun M Duke University

Xu Meng M University of Wyoming

Yang Xiangfeng Tulane University

Yang Yushu University of Maryland

Zhang Guanlin M University of Kansas

Zhang Xiaojing Kansas State University

Participants Male Female Unspecified Faculty Student Stat/Math Majors Other/Unspecified

Supported 10 14 0 0 24 21 3

Unsupported 7 9 0 13 3 16 0

SAMSI 8 5 0 13 0 13 0

Last Name First Name Gender Affiliation Major/Department Status

Agattas Bailey F Aquinas College Mathematics S

Agrawal Mohit M Princeton University Mathematics S

Bayo Haddijatou F Kentucky State Mathematics S

Bender Kristina F University of North Carolina at Asheville Pure Mathematics and Literature S

Erwin Samantha F Murray State University Mathematics S

Faircloth Vanessa F Norfolk State University Applied Mathematics S

Geyer Kelly F Virginia Polytechnic Institute and State University Statistics, Mathematics S

Horn Corinne F Duke University Electrical and Computer Engineering, Mathematics S

Huckaba Aron M Murray State University Chemistry and Mathematics with a minor in Biology S

Huynh My M Arizona State University Mathematics S

Jones Asya F Spelman College Mathematics S

Kim Junghi F University of Connecticut Mathematics/Statistics S

Kissler Stephen M University of Colorado, Boulder Applied Mathematics S

Klopfenstein Toni F University of Colorado at Boulder Applied Mathematics S

Lalla Chris M University of Illinois, Urbana-Champaign Statistics S

Lirette Seth M Mississippi State University Mathematics S

McCowan Nigel M Norfolk State University Applied Mathematics S

Morrison Leigh F Georgetown University Mathematics S

Neavin Drew M Colorado State University Biological Sciences S

Roland Byron M East Tennessee State University Computer Science S

Shaw Candace F Spelman College Mathematics S

Tomasek Bradley M University of Illinois, Urbana-Champaign Statistics S

Wang Jiayin (Vivian) F Purdue University, West Lafayette Actuarial Science & Mathematical Statistics S

Yuan Lo-Hua F University of Michigan, Ann Arbor Life Science Informatics, B.S. S

Last Name First Name Gender Affiliation Major/Department

Agattas Bailey F Aquinas College Mathematics

Agrawal Mohit M Princeton University Mathematics

Athreya Avanti F SAMSI Mathematics

Bayo Haddijatou F Kentucky State Mathematics

Bender Kristina F University of North Carolina at Asheville Pure Mathematics and Literature

Berrocal Veronica F SAMSI Statistics

Boehm Laura F NCSU Stats

Chakraborty Avishek M Duke U Stats

Chang Howard M SAMSI Stats

Cohen Sean M NCSU Mathematics

Cole-Manning Cammey F Meredith College Mathematics

Das Sourish M SAMSI Stats

Despandhe Amogh M NCSU Mathematics

Diaz Oliver M SAMSI Mathematics

Emanuel George M University of Colorado Applied Math and Molecular Biology

Erwin Samantha F Murray State University Mathematics

Faircloth Vanessa F Norfolk State University Applied Mathematics

Fovargue Dan M UNC Mathematics

Geyer Kelly F Virginia Polytechnic Institute and State University Statistics, Mathematics

Gieske Morgan F NCSU Stats

Gremaud Pierre M SAMSI/NCSU Mathematics

Horn Corinne F Duke University Electrical and Computer Engineering, Mathematics

Hua Zhaowei F UNC Biostats

Huckaba Aron M Murray State University Chemistry and Mathematics with a minor in Biology

Huynh My M Arizona State University Mathematics

Jeon Soyoung F UNC Stats

Jones Asya F Spelman College Mathematics

Kang Emily F SAMSI Stats

Kim Junghi F University of Connecticut Mathematics/Statistics

Kissler Stephen M University of Colorado, Boulder Applied Mathematics

Klopfenstein Toni F University of Colorado at Boulder Applied Mathematics

Kolba Tiffany F Duke U Mathematics

Kuznetsova Anna F Duke University Mathematics AB

Lalla Chris M University of Illinois, Urbana-Champaign Statistics

Lirette Seth M Mississippi State University Mathematics

Liu Xin F UNC Stats

Lopes Danilo M Duke U Stats

McCowan Nigel M Norfolk State University Applied Mathematics

Morrison Leigh F Georgetown University Mathematics

Neavin Drew M Colorado State University Biological Sciences

Ning Yuxiao (Shawn) M New York University Mathematics

Roland Byron M East Tennessee State University Computer Science

Rogers Bruce M SAMSI Mathematics

Salazar Esther F SAMSI Stats

Shaby Ben M SAMSI Stats

Shaw Candace F Spelman College Mathematics

Tingley Martin M SAMSI Stats

Tomasek Bradley M University of Illinois, Urbana-Champaign Statistics

Wang Jiayin (Vivian) F Purdue University, West Lafayette Actuarial Science & Mathematical Statistics

Wang Xueying F SAMSI Stats

Yang Hongxia F Duke U Stats

Yuan Lo-Hua F University of Michigan, Ann Arbor Life Science Informatics, B.S.

Zhang Jun M SAMSI Stats

Participants Male Female Unspecified Faculty Student Stat/Math Majors Other/Unspecified

Supported 26 11 0 0 37 30 7

Unsupported 11 1 0 12 0 8 4

SAMSI 0 2 0 2 0 2 0

Last Name First Name Gender Affiliation Major/Department Status

Brown Aaron Male Tufts University MATH S

Byrne Erin Female University of Colorado Boulder MATH S

Choi Heejun Male Purdue University MATH S

Ding Lili Female University of Cincinnati STAT S

Gabrys Robertas Male Utah State University STAT S

Griep Chad Male University of Rhode Island MATH S

Heaton Matthew Male Duke University STAT S

Jayaram Magathi Female Utah State University ENGG S

Katzfuss Matthias Male Ohio State University ENGG S

Kim Noory Male Towson University STAT S

Kirshtein Jenya Female University of Denver STAT S

Kumar Nitesh Male University of California Merced MATH S

Laungrungrong Busaba Female Arizona State University ENGG S

Li Yi Male Duke University STAT S

Lomuscio Michael Male Western Carolina University MATH S

Morales Mario Male Hunter College MATH S

Pearson Dale Male Texas Tech University MATH S

Pedings Kathryn Female College of Charleston MATH S

Porter Jacob Male University of California Davis MATH S

Proctor William Male North Carolina State University MATH S

Raghuram Karthik Male University of California Santa Barbara ENGG S

Ramachandar Shahla Female University of Texas STAT S

Richards Gregory Male Kent State University STAT S

Samarakoon Nishantha Male Kansas State University STAT S

Shafahi Maryam Female University of California Riverside ENGG S

Shen Chongyi Male University of Iowa ENGG S

Skorczewski Tyler Male University of California Davis STAT S

Soodhalter Kirk Male Temple University MATH S

Sun Jie (Rena) Female University of Michigan MATH S

Vu Duy Male Penn State University STAT S

Wang Min Male Northern Illinois University STAT S

Wiltshire Jelani Male FSU MATH S

Yang Hongxia Female Duke University STAT S

Zhang Jingyan Female Penn State University MATH S

Zhong Peng Male University of Tennessee MATH S

Zhou Kun Male Penn State University MATH S

Zou James Male Harvard ENG S

Last Name First Name Gender Affiliation Major/Department

Bouton Chad Male Battelle Inst. ENGG

Brooks Harold Male National Severe Storm Lab PHYSICS

Brown Aaron Male Tufts University MATH

Byrne Erin Female University of Colorado Boulder MATH

Cole-Manning Cammey Female Meredith College MATH

Choi Heejun Male Purdue University MATH

Ding Lili Female University of Cincinnati STAT

Gabrys Robertas Male Utah State University STAT

Gate Brian Male

Gilleland Eric Male NCAR STAT

Griep Chad Male University of Rhode Island MATH

Heaton Matthew Male Duke University STAT

Hopping Albert Male Progress Energy PHYSICS

Ito Kazi Male NCSU MATH

Jayaram Magathi Female Utah State University ENGG

Kang Emily Female SAMSI STAT

Katzfuss Matthias Male Ohio State University ENGG

Kim Noory Male Towson University STAT

Kirshtein Jenya Female University of Denver STAT

Kumar Nitesh Male University of California Merced MATH

Laungrungrong Busaba Female Arizona State University ENGG

Li Yi Male Duke University STAT

Lomuscio Michael Male Western Carolina University MATH

Mannshardt-Shamseldin Elizabeth Female SAMSI STAT

Massad Jordan Male Sandia MATH

Morales Mario Male Hunter College MATH

Peach John Male Lincoln Laboratory MATH

Pearson Dale Male Texas Tech University MATH

Pedings Kathryn Female College of Charleston MATH

Porter Jacob Male University of California Davis MATH

Proctor William Male North Carolina State University MATH

Raghuram Karthik Male University of California Santa Barbara ENGG

Ramachandar Shahla Female University of Texas STAT

Richards Gregory Male Kent State University STAT

Samarakoon Nishantha Male Kansas State University STAT

Scroggs Jeff Male NCSU MATH

Shafahi Maryam Female University of California Riverside ENGG

Shen Chongyi Male University of Iowa ENGG

Skorczewski Tyler Male University of California Davis STAT

Smith Ralph Male NCSU MATH

Smith Richard Male UNC STAT

Soodhalter Kirk Male Temple University MATH

Sun Jie (Rena) Female University of Michigan MATH

Vu Duy Male Penn State University STAT

Wang Min Male Northern Illinois University STAT

Wiltshire Jelani Male FSU MATH

Yang Hongxia Female Duke University STAT

Zhang Jingyan Female Penn State University MATH

Zhong Peng Male University of Tennessee MATH

Zhou Kun Male Penn State University MATH

Zou James Male Harvard ENG

2010 SUMMER PROGRAM

Semiparametric Bayesian Inference: Applications in Pharmacokinetics and Pharmacodynamics (PKPD)

PKPD Program

Participant Summary

July 12 – July 23, 2010

Participants Male Female Unspecified Faculty/Professional New Researcher/Student Stat Math Other Number of Institutions Represented Number of States Represented

Supported 25 9 1 18 17 29 2 4 28 15

Unsupported 18 8 1 10 17 26 0 1 12 5

SAMSI 0 0 0 0 0 0 0 0

Last Name First Name Gender Affiliation Major/Department Status

Bayarri M.J. Female University of Valencia STAT FP

Boonstra Philip Male University of Michigan Department of Biostatistics STAT NRG

Choi Leena Female Vanderbilt University STAT FP

Cornebise Julien Male U British Columbia STAT NRG

Dahl David Male Texas A&M University STAT FP

Ding Lili Female Cincinnati Children's Hospital Medical Center STAT NRG

Dutta Ritabrata Male Purdue University STAT NRG

Escobar Michael Male University of Toronto STAT FP

Fronczyk Kassie Female University of California STAT NRG

Ghosh Kaushik Male University of Nevada - Las Vegas STAT FP

Guindani Michele Male University of New Mexico STAT FP

Johnson Wes Male University of CA, Irvine STAT FP

Kottas Athanasios Male U of California, Santa Cruz STAT FP

Lewandowski Andrew Male Purdue University STAT NRG

Li Lang Male Indiana U LIFE FP

MacEachern Steven Male Ohio State STAT FP

Mueller Peter Male MD Anderson Cancer Center STAT FP

Oluyede Broderick Male

Georgia Southern University STAT FP

Pararai Mavis Female Indiana U of Penn. STAT NRG

Quispe Vargas Walter Male University of Puerto Rico STAT NRG

Rodriguez Abel Male University of California STAT NRG

Rosner Gary Male Johns Hopkins University STAT FP

Roura Jaime Male University of Puerto Rico, Rio Piedras STAT NRG

Tadesse Mahlet Female Georgetown University STAT FP

Telesca Donatello Male UCLA STAT NRG

Thomas Duncan Male University of So. CA LIFE FP

Tomasetti Cristian Male University of Maryland MATH NRG

Torres David Male University of Puerto Rico MATH NRG

Vicini Paolo Male Pfizer Global Research and Development LIFE FP

Wahed Abdus Male University of Pittsburgh STAT FP

Wang Yanpin Female U Florida STAT NRG

Wu Meihua Male University of Michigan STAT NRG

Xu Xinyi Female Ohio State University STAT FP

Yao Ping Northern Ill U LIFE NRG

Zou Yuanshu Female University of Cincinnati STAT NRG

Last Name First Name Gender Affiliation Major/Department

Armagan Artin Male Duke University STAT

Banks David Male Duke University STAT

Barbare Ralph Male Gad Consulting Services STAT

Bayarri M.J. Female University of Valencia STAT

Bhattacharya Anirban Male Duke University STAT

Bonate Peter Male GlaxoSmithKline OTHR

Boonstra Philip Male University of Michigan Department of Biostatistics STAT

Burgette Lane Male Duke University STAT

Chen Shuguang Male GlaxoSmithKline STAT

Choi Leena Female Vanderbilt University STAT

Cornebise Julien Male U British Columbia STAT

Cui Kai Male Duke University STAT

Dahl David Male Texas A&M University STAT

Davidian Marie Female North Carolina State University STAT

De Iorio Maria Female Imperial College STAT

Ding Lili Female Cincinnati Children's Hospital Medical Center STAT

Dunson David Male Duke University STAT

Dutta Ritabrata Male Purdue University STAT

Escobar Michael Male University of Toronto STAT

Fronczyk Kassie Female University of California STAT

Gan Kevin Male GlaxoSmithKline STAT

Ghosh Jayanta Male Purdue University STAT

Ghosh Joyee Female UNC Chapel Hill STAT

Ghosh Kaushik Male University of Nevada - Las Vegas STAT

Guha Subharup Male University of Missouri STAT

Guindani Michele Male University of New Mexico STAT

Johnson Brendan Male GlaxoSmithKline LIFE

Johnson Wes Male University of CA, Irvine STAT

Korman Alexander Male Duke University STAT

Kottas Athanasios Male University of California, Santa Cruz STAT

Kundu Suprateek Male UNC STAT

Lee Ju Hee Female Ohio State University STAT

Lewandowski Andrew Male Purdue University STAT

Li Lang Male Indiana U LIFE

MacEachern Steven Male Ohio State STAT

Mueller Peter Male MD Anderson Cancer Center STAT

Oluyede Broderick Male Georgia Southern University STAT

Omolo Bernard Male UNC Chapel Hill STAT

Pang Herbert Male Duke University STAT

Pararai Mavis Female Indiana U of Penn. STAT

Pati Debdeep Male Duke University STAT

Quispe Vargas Walter Male University of Puerto Rico STAT

Rizwan Ahsan Male UNC LIFE

Rodriguez Abel Male University of California STAT

Rosner Gary Male Johns Hopkins University STAT

Roura Jaime Male University of Puerto Rico, Rio Piedras STAT

Tadesse Mahlet Female Georgetown University STAT

Telesca Donatello Male UCLA STAT

Thomas Duncan Male University of So CA LIFE

Tokdar Surya Male Duke University STAT

Tomasetti Cristian Male University of Maryland MATH

Torres David Male University of Puerto Rico MATH

Vicini Paolo Male Pfizer Global Research and Development LIFE

Wahed Abdus Male University of Pittsburgh STAT

Wang Yanpin Female UNIVERSITY OF FLORIDA STAT

Wolpert Robert Decline Duke University STAT

Wu Meihua Male University of Michigan STAT

Xiaojing Wang Female Duke University STAT

Xing Chuanhua Female Duke University STAT

Xu Xinyi Female Ohio State University STAT

Yang Mingan Saint Louis University STAT

Yao Ping Northern Ill LIFE

Yu Li Female Medimmune STAT

Zhu Hongxiao Female University of Texas MD Anderson Cancer Center STAT

Zou Yuanshu Female University of Cincinnati STAT

2011-12 PROGRAM EVENTS THROUGH APRIL 2010

Uncertainty Quantification Workshop Participant Summary

April 6, 2010

Participants  Male  Female  Unspecified  Faculty/Professional  New Researcher/Student  Stat  Math  Other  Number of Institutions Represented  Number of States Represented
Supported     22    2       0            22                    2                       6     10    8      21                                  12
Unsupported   23    5       0            27                    1                       16    9     3      19                                  8
SAMSI         1     1       0            2                     0                       2     0     0      1                                   1

Uncertainty Quantification Workshop Participant Support

April 6, 2010

Last Name First Name Gender Affiliation Major/Department Status

Archibald Rick M Oak Ridge Nat'l Lab Comp Sci / Math FP

Bal Guillaume M Columbia University Applied Math FP

Berlyand Leonid M Pennsylvania State University Applied Math FP

Braverman Amy F Jet Propulsion Lab, Caltech Stat FP

Efendiev Yalchin M Texas A&M University Applied Math FP

Estep Don M Colorado State University Math FP

Ghanem Roger M University of Southern California Engineering FP

Hesthaven Jan M Brown University Applied Math FP

Higdon David M LANL Statistics FP

Iaccarino Gianluca M Stanford University Engineering/Science NRG

Johannesson Gardar M LLNL Math and CS FP

Lermusiaux Pierre M MIT Engineering/Science NRG

Marzouk Youssef M MIT Engineering/Science FP

Nychka Doug M NCAR Stats FP

Owhadi Houman M Cal Tech Applied Math FP

Patra Abani M University at Buffalo Engineering/Science FP

Pitman Bruce M University at Buffalo Applied Math FP

Red-Horse John M SANDIA Probabilist? FP

Santner Tom M Ohio State University Stats FP

Shoemaker Christine F Cornell University Engineering/Science FP

Spiegelman Cliff M Texas A&M University Stats FP

Tartakovsky Daniel M University of California, San Diego Engineering/Science FP

Wu Jeff M Georgia Tech Stats FP

Xiu Dongbin M Purdue University Applied Math FP

Uncertainty Quantification Workshop Participants

April 6, 2010

Last Name First Name Gender Affiliation Major/Department Status

Anitescu Mihai M ANL Math and CS FP

Archibald Rick M Oak Ridge Nat'l Lab Comp Sci / Math FP

Bal Guillaume M Columbia University Applied Math FP

Bayarri Susie F University of Valencia Stats FP

Berger Jim M SAMSI Stats FP

Berlyand Leonid M Pennsylvania State University Applied Math FP

Bingham Derek M Simon Fraser University Stats FP

Braverman Amy F Jet Propulsion Lab, Caltech Stat FP

Casella George M University of Florida Stats FP

Crowley Jim M SIAM Math FP

Deodatis George M Columbia University Engineering/Science FP

Efendiev Yalchin M Texas A&M University Applied Math FP

Estep Don M Colorado State University Math FP

Fuentes Montse F North Carolina State University Stats FP

Ghanem Roger M University of Southern California Engineering FP

Gremaud Pierre M North Carolina State University/SAMSI Math FP

Hannig Jan M University of North Carolina Stats FP

Hesthaven Jan M Brown University Applied Math FP

Higdon David M LANL Statistics FP

Iaccarino Gianluca M Stanford University Engineering/Science NRG

Johannesson Gardar M LLNL Math and CS FP

Kass Rob M CMU Stats FP

Lee Herbie M UCSC Math/Stats FP

Lermusiaux Pierre M MIT Engineering/Science NRG

Liu Fei F University of Missouri Stats NRG

Marzouk Youssef M MIT Engineering/Science FP

McKeague Ian M Columbia Biostats FP

Miller Robert M Oregon State Engineering/Science FP

Morris Max M Iowa State Stats FP

Nolen Jim M Duke University Math FP

Nychka Doug M NCAR Stats FP

Owhadi Houman M Cal Tech Applied Math FP

Pantula Sastry M North Carolina State University Stats FP

Papanicolaou George M Stanford University Applied Math FP

Patra Abani M University at Buffalo Engineering/Science FP

Petzold Linda F UCSB Applied Math FP

Pitman Bruce M University at Buffalo Applied Math FP

Raftery Adrian M U Washington Stats FP

Red-Horse John M SANDIA Probabilist? FP

Santner Tom M Ohio State University Stats FP

Sedransk Nell F SAMSI Stats FP

Shoemaker Christine F Cornell University Engineering/Science FP

Smith Ralph M North Carolina State University Math FP

Smith Richard M University of North Carolina/SAMSI Math FP

Spiegelman Cliff M Texas A&M University Stats FP

Stein Michael M U Chicago Stats FP

Tartakovsky Daniel M University of California, San Diego Engineering/Science FP

Tebaldi Claudia F UBC Stats FP

Wasserstein Ron M AMSTAT Stats FP

Welch William M UBC Stats FP

Wolpert Robert M Duke University Stats FP

Wu Jeff M Georgia Tech Stats FP

Xiu Dongbin M Purdue University Applied Math FP

Zhang Thomas M NASA FP


APPENDIX F – Workshop Programs and Abstracts

1. Psychometrics Summer Program

Schedule

Tuesday, July 7, 2009

(Radisson, RTP)

8:15-8:45 Registration and Continental Breakfast

8:45-9:00 Welcome

9:00-12:00 Yanyan Sheng, Southern Illinois University

“Bayesian Analysis of Item Response Theory Models”

10:00-10:30 Break

12:00-2:00 Lunch

2:00-3:00 David Thissen, University of North Carolina

“IRTPRO Demonstration”

3:00-3:15 Break

3:15-4:15 Richard Swartz, University of Texas, MD Anderson Cancer Center

“Bayesian and Classical Computerized Adaptive Testing Item Selection

Algorithms”

Wednesday, July 8, 2009

(Radisson, RTP)

8:30-9:00 Registration and Continental Breakfast

9:00-12:00 Sun-Joo Cho, University of California, Berkeley

Frank Rijmen, Educational Testing Service

Mark Wilson, University of California, Berkeley

“A Nonlinear Mixed Models Approach to IRT”

10:00-10:30 Break

12:00-2:00 Lunch

2:00-4:00 Mario Peruggia, Ohio State University

“Hierarchical Bayes Models for Response Time Data”

Trish Van Zandt, Ohio State University

“An Overview of Response Time Models in Psychology”


3:00-3:15 Break

4:00-5:00 Poster Advertisements (2 minute ads each)

5:00–7:00 Poster Session and Reception

SAMSI will provide poster presentation boards and tape. The board dimensions

are 4 ft. wide by 3 ft. high. They are tri-fold with each side being 1 ft. wide and

the center 2 ft. wide. Please make sure your poster fits the board. The boards can

accommodate up to 16 pages of paper measuring 8.5 inches by 11 inches.

Thursday, July 9, 2009

(Radisson, RTP)

8:30-9:00 Continental Breakfast

9:00-12:00 Mathias von Davier, Educational Testing Service

“Notes on Models for Cognitive Diagnosis”

10:00-10:30 Break

12:00-1:30 Lunch

1:30-2:30 Sandip Sinharay, Educational Testing Service

“A Critical Evaluation of Diagnostic Score Reporting: Some Theory and

Applications”

2:30-2:45 Break

2:45-4:00 Dongchu Sun, University of Missouri

“Bayesian Hierarchical Models for Recognition-Memory Experiments”

Jun Lu, American University

“A Bayesian Approach for Assessing Human Memory Using Process-

Dissociation Procedure”

Friday, July 10, 2009

(Radisson, RTP)

8:30-9:00 Continental Breakfast

9:00-12:00 Matthew Johnson, Columbia University

“An Introduction to Rater Models”


10:00-10:30 Break

12:00-2:00 Lunch

2:00-4:00 Paul Speckman, Dongchu Sun, and Jeff Rouder, University of Missouri

“Item-Response Models for Measuring Thresholds in Performance”

3:00-3:15 Break

Monday, July 13 – Friday, July 17

(SAMSI)

9:00-12:00 Working Group Meetings:

Peer Review Working Group (Room 104)

Cognitive Diagnostic Models Working Group (Room 150)

Longitudinal Assessment of PRO Working Group (Room 203)

12:00-1:00 Lunch

1:00-5:00 Working Group Meetings:

Peer Review Working Group (Room 104)

Cognitive Diagnostic Models Working Group (Room 150)

Longitudinal Assessment of PRO Working Group (Room 203)


Program on Psychometrics: Peer Review Working Group

The PRWG will meet spontaneously during the second week of the program. Currently scheduled talks

are listed below. Contributed talks will be added to the schedule during the course of the Psychometric

Program. A white paper will be prepared by the working group and summarized at a joint meeting at

the end of the week. Non-meeting time will be devoted to group collaboration on topics related to peer

review.

SCHEDULE Monday July 13, 2009 – Friday July 17, 2009

Monday July 13, 2009

11:30-12:30 David Banks, Duke University

“Judgement in JASA”

12:30-2:00 Lunch

Tuesday July 14, 2009

11:30-12:30 Valen E. Johnson, University of Texas, MD Anderson Cancer Center

“An Overview of NIH R01 Peer Review Scores”

12:30-2:00 Lunch

Wednesday July 15, 2009

11:00-12:00 Jing Cao, Southern Methodist University

“A Bayesian Approach to Ranking and Rater Evaluation: An Application to

Grant Reviews”

12:00-1:30 Lunch

Thursday July 16, 2009

11:00-12:00 Song Zhang, University of Texas, Southwestern Medical Center

“A Bayesian Hierarchical Model for Multi-rater Data with Fine Scales”

12:00-1:30 Lunch

Friday July 17, 2009

11:00-12:00 Discuss Draft of White Paper

12:00-1:00 Lunch

1:00 Adjourn


Cognitive Diagnostic Models Working Group (CDMWG)

The CDMWG will meet during the second week of the program. The program will be more

structured during the beginning of the week, and more open during the end of the week. Talks

currently scheduled during this week are listed below. Additional talks will be added to the schedule

during the course of the Psychometric Program. A white paper will be prepared by the working group

and summarized at a joint meeting at the end of the week.

SCHEDULE Monday July 13, 2009 – Friday July 17, 2009

Monday July 13, 2009

9:00-10:00 Introduction

10:00-11:30 Matthew Finkelman, Tufts University

Kristin Huff, College Board

Curtis Tatsuoka, Case Western University

“Diagnostic Assessment Approaches”

11:30-12:30 David Banks, Duke University

12:30-2:00 Lunch

2:00-3:30 Tzur Karelitz, Education Development Center

Jere Confrey, North Carolina State University

Alicia Alonzo, University of Iowa

“Developmental Theories for Diagnostic Assessment”

3:30-5:00 Andre Rupp, University of Maryland

Ying Cui, University of Alberta

Nathalie Loye, University of Montreal

“Task & Q-matrix Construction”

Tuesday July 14, 2009

8:30-10:00 Andre Rupp, University of Maryland

Jimmy de la Torre, Rutgers University

Robert Henson, University of North Carolina, Greensboro

“Fully Parametric Models for Classification”

10:00-11:30 Curtis Tatsuoka, Case Western University

Ying Cui, University of Alberta

Rebecca Nugent, Carnegie Mellon University

“Non-parametric and Semi-parametric Models for Classification”

11:30-12:30 Valen Johnson, University of Texas, MD Anderson Cancer Center

12:30-2:00 Lunch


2:00-3:30 Roy Levy, Arizona State University

Robert Henson, University of North Carolina, Greensboro

Jimmy de la Torre, Rutgers University

“Challenges in Estimation, Programming, and Implementation”

3:30-5:00 Ying Cui, University of Alberta

Roy Levy, Arizona State University

Jimmy de la Torre, Rutgers University

“Model Fit Assessment & Refinement”

Wednesday July 15, 2009

8:30-10:30 Curtis Tatsuoka, Case Western University

Ying Cheng, University of Notre Dame

Matthew Finkelman, Tufts University

“Optimal Test Design and Computerized Adaptive Testing”

10:30-12:00 Eunice Jang, University of Toronto

Neil Heffernan, Worcester Polytechnic Institute

Kristin Huff, College Board

“Score Reporting & Subsequent Action”

12:00-1:30 Lunch

1:30-3:00 Andre Rupp, University of Maryland

Tiffany Barnes, University of North Carolina, Charlotte

Neil Heffernan, Worcester Polytechnic Institute

“Validation”

3:00-5:00 Moderated Panel Session

“Future Challenges for Diagnostic Assessment”

The morning of the fourth day will be devoted to identifying the cutting-edge methods in cognitive

diagnosis modeling and three to five important gaps that must be filled to advance the field.

The cutting-edge methods and research gaps will constitute the research agenda of the working group.

The topics in this research agenda will be distributed across three 90-minute time blocks that will

run from Thursday afternoon through the first part of Friday morning. Participants will be asked to

select a research topic in each time block. Each 90-minute block will be used to discuss potential research

projects on specific topics, sign up participants who can collaborate on these projects,

and outline strategies and time frames for completing them. In the latter part of Friday

morning (10:30-12:30), participants will summarize the discussion of potential projects for

all members of the working group. The working group will adjourn after lunch.


Longitudinal Assessment of PRO Working Group (LAPROWG)

The LAPROWG will meet during the second week of the program. The program will be more

structured during the beginning of the week, and with more spontaneous meetings during the end

of the week. Talks currently scheduled during this week are listed below. Additional talks may

be added to the schedule during the course of the Psychometric Program. A white paper will be

prepared by the working group and summarized at a joint meeting at the end of the week.

SCHEDULE Monday July 13, 2009 – Friday July 17, 2009

Monday July 13, 2009

9:00-9:30 Richard Swartz, University of Texas M. D. Anderson Cancer Center

Introductions, Purpose of Working Group

9:30-10:20 Carolyn Schwartz, DeltaQuest

“Importance of Responsiveness to Change and Mediators to the Measurement of

Change in Health Outcomes”

10:30-Noon Ken Bollen, University of North Carolina

“Longitudinal Measurement of Patient Reported Outcomes: Latent Curve Models

Using Structural Equation Models”

1:00-3:15 Ethan Basch, Memorial Sloan Kettering Cancer Center

Charles Cleeland, University of Texas M. D. Anderson Cancer Center

“Practical Needs in Trial Design and Detecting True Change Over Time -

Clinicians’ Perspective”

Diane Fairclough, University of Colorado, Denver

“Practical Needs in Trial Design and Detecting True Change Over Time - Patient

Perspective of the Patient Experience”

3:30-5:00 Carolyn Schwartz, DeltaQuest

“Methods to Detect Response Shift and Responsiveness to Change”

Tuesday, July 14

9:00-10:30 Bruce Rapkin, Albert Einstein College of Medicine

“Cognitive Factors in the Quality of Life Rating Response Scales and How to

Include/model This Information”

10:45-Noon Diane Fairclough, University of Colorado, Denver

“Impact of Missing Data When Evaluating Change Over Time”

1:00-2:30 Li Cai, University of California, Los Angeles

“Multidimensional IRT and Potential Applications to Assessing PROs Over Time

and Detecting Response Shift”


3:00-5:15 Jeff Sloan, Mayo Clinic

“Precision, Validity and Sensitivity vs. Response Burden in PRO Endpoints –

Facilitating Detection of True Change: Using Single-item vs. Multi-item Scales

to Monitor Change”

Richard Swartz, The University of Texas M. D. Anderson Cancer Center

“Precision, Validity and Sensitivity vs. Response Burden in PRO Endpoints –

Facilitating Detection of True Change: Considering the Precision vs. Burden

Tradeoff within CAT”

Wednesday, July 15

9:00-10:30 Jeff Sloan, Mayo Clinic

“Interpreting Minimally Important Differences While Accounting for

Measurement Variability/response Shift”

10:45-Noon Brainstorming Next Steps / Outline White Papers

1:00-3:00 Outline White Paper / Discuss Datasets

Thursday, July 16

12:00-1:00 Lunch

3:00-5:00 Updates

Friday, July 17

9:00-Noon Revise Outline for White Papers/ Delegate Duties / Develop Timeline to

Complete White Paper.

12:00-1:00 Lunch

The white paper to be produced in the LAPROWG has two main goals:

1) to discuss the state of the art and recommend policy and procedures for analyzing

longitudinal PRO data

2) to identify areas needing methodological improvement when considering

longitudinal PRO data

SPEAKER ABSTRACTS

David Banks

Duke University



The editorial process is noisy, which affects the quality of the decision to reject or to accept a manuscript.

This talk describes some of the factors that may influence the process, and introduces a data set for analysis

that may cast light on unconscious prejudices in the management of the _Journal of the American Statistical

Association_. Additionally, the talk indicates some alternative models for publication that have the potential

to diminish the noise or bias in the review of research papers.

Jing Cao

Southern Methodist University


“A Bayesian Approach to Ranking and Rater Evaluation: Application to Grant Reviews”

We develop a Bayesian hierarchical model for the analysis of ordinal data from multirater ranking studies.

The model for a rater's score includes four latent factors: one is a latent item trait determining the true order of

items and the other three are the rater's performance characteristics, including bias, discrimination, and

measurement error in the ratings. The proposed approach aims at three goals. First, a Bayesian estimator is

introduced to estimate the ranks of items. It shows a substantial improvement over the widely used score-

average ranking estimator by utilizing the information on rater-specific scoring patterns. Second, rater

performance can be compared based on rater bias, discrimination, and measurement error. Third, a

simulation-based decision-theoretic approach is described to determine the number of raters to employ. A

simulation study and an analysis based on a grant review dataset are presented.
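The rater characteristics named in this abstract (bias, discrimination, and measurement error) can be illustrated with a small simulation. The sketch below is a simplified continuous-score stand-in, not the authors' ordinal Bayesian model: the linear score equation, the function names, and all parameter values are assumptions made for illustration. It generates scores from hypothetical raters and computes the score-average ranking that the abstract uses as its baseline.

```python
import random

def simulate_scores(theta, raters, rng):
    """Return scores[j][i]: rater j's score for item i.

    Assumed form (illustrative only): bias + discrimination * trait + noise.
    """
    return [[r["bias"] + r["disc"] * t + rng.gauss(0.0, r["sd"]) for t in theta]
            for r in raters]

def score_average_ranking(scores):
    """Rank items (best first) by their mean score across raters."""
    n_items = len(scores[0])
    means = [sum(row[i] for row in scores) / len(scores) for i in range(n_items)]
    return sorted(range(n_items), key=lambda i: means[i], reverse=True)

# Hypothetical latent item traits and rater characteristics.
theta_true = [0.2, 1.5, -0.7, 0.9]
raters = [
    {"bias": 0.5, "disc": 1.0, "sd": 0.3},   # lenient, fairly precise rater
    {"bias": -0.5, "disc": 0.4, "sd": 1.0},  # harsh, noisy, weakly discriminating rater
    {"bias": 0.0, "disc": 1.5, "sd": 0.2},   # sharp, precise rater
]

rng = random.Random(2009)  # fixed seed so the run is repeatable
scores = simulate_scores(theta_true, raters, rng)
print("score-average ranking:", score_average_ranking(scores))
```

Because the plain average weights every rater equally, a noisy or weakly discriminating rater can distort the ranking, which is the motivation the abstract gives for modeling rater-specific scoring patterns.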

Ying Cheng University of Notre Dame


“Cognitive Diagnostic Computerized Adaptive Testing”

Invited Abstract: Computerized adaptive testing has seen tremendous growth in the last three decades. At the

same time, cognitive diagnosis has taken off, especially since the No Child Left Behind Act (2001) mandated

that diagnostic feedback be provided to students. Researchers have been trying to bring together the two

wonderful inventions in educational and psychological measurement. In this talk, we will summarize recent

findings and discuss future directions of cognitive diagnostic CAT research.

Jere Confrey North Carolina State University


“Developing Learning Trajectories and Diagnostic Assessments Based Student Reasoning on Rational

Number over Time”

The DELTA project (Diagnostic E-Learning Trajectories Approach) is designed to synthesize the

research on rational number and identify underlying learning trajectories. Additional interviews are

conducted to fully develop the constructs in the learning trajectories. From these, we are building items

to create diagnostic measures in mathematics and conducting pilot studies to define outcome spaces and

larger field tests to validate the items. The presentation will show how we have used the literature to

show the interconnections among concepts via a concept map and then used it to develop and refine

individual learning trajectories by refining the map and conducting additional research. An example

using equipartitioning will be provided.

Matthew Finkelman

Tufts University



“A Zero- and K-Inflated Mixture Model for Health Questionnaire Data”

In psychiatric assessment, Item Response Theory (IRT) is a popular tool to formalize the relation between the

severity of a disorder and associated responses to questionnaire items. Practitioners of IRT sometimes make

the assumption of normally distributed severities within a population; while convenient, this assumption is

often violated when measuring psychiatric disorders. Specifically, there may be a sizable group of

respondents whose answers place them at an extreme of the latent trait spectrum. In this article, a zero- and K-

inflated mixture model is developed to account for the presence of such respondents. The model is fitted using

an expectation-maximization (E-M) algorithm to estimate the percentage of the population at each end of the

continuum, concurrently analyzing the remaining “graded component” via IRT. A method to perform factor

analysis for only the graded component is introduced. In assessments of oppositional defiant disorder and

conduct disorder, the zero- and K-inflated model exhibited significantly better fit than the standard IRT

model.

Matthew Johnson Columbia University


“An Introduction to Rater Models”

In large-scale educational assessments, the use of open-ended items has become increasingly popular over the

past several years. Many argue that open-ended, or constructed response, items are required in order to tap

higher-level reasoning on assessments. Furthermore, certain academic abilities, such as writing, can only

be assessed with constructed response items. The use of constructed

response items comes at a cost, however. Unlike multiple choice items, constructed-response items must be

graded by human raters. When multiple raters are utilized, the question of fairness comes to

the forefront. In an ideal world all raters would give the same score to the same student. However, when

partial credit is involved this is rarely the case. Raters often differ in terms of their severity and

in terms of the precision of their scores. In this tutorial I will discuss statistical models for

detecting/estimating rater severity and precision, and discuss methods for accounting for differences across

raters. Three approaches that will be discussed in detail are the FACETS model, the generalizability theory

approach, and the hierarchical rater model. Time permitting, I will demonstrate the use of WinBUGS for

estimating the rater models.

Valen Johnson University of Texas MD Anderson Cancer Center


“Statistical Analysis of the National Institutes of Health Peer Review System”

A statistical model is proposed for the analysis of peer-review ratings of R01 grant applications submitted to

the National Institutes of Health. Innovations of this model include parameters that reflect differences in

reviewer scoring patterns, a mechanism to account for the transfer of information from an application's

preliminary ratings and group discussion to final ratings provided by all panel members, and posterior

estimates of the uncertainty associated with proposal ratings. Application of this model to recent R01 rating

data suggests that statistical adjustments to panel rating data would lead to a 25% change in the pool of

funded proposals. Viewed more broadly, the methodology proposed in this article provides a general

framework for the analysis of data collected interactively from expert panels through the use of the Delphi

method and related procedures.

Nathalie Loye University of Montreal



“Validity of Diagnostic Resulting from Marriage Between Didactics and Measurement”

With cognitive diagnostic models (CDMs), the diagnosis of the cognitive processes used to answer test items is

based on a Q-matrix. The DINA model includes two item parameters that can be used to examine the quality of the

Q-matrix, and thus the validity of the diagnosis, as proposed by de la Torre (2008). This study presents some

results on the validity of the diagnosis when the Q-matrix is based on a classification of items according to an

ontology of mathematics and according to competences, as established by specialists in teaching. The data come

from the mathematics test given in June 2008 at the École Polytechnique de Montreal.
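For readers unfamiliar with the DINA model mentioned above, a minimal sketch of its item response rule follows. The two item parameters are conventionally called slip and guess; the Q-matrix, parameter values, and function names here are hypothetical and are not taken from the study described in the abstract.

```python
from itertools import product

def dina_prob(alpha, q_row, slip, guess):
    """P(correct response) for one examinee on one item under the DINA model.

    alpha : tuple of 0/1 skill-mastery indicators for the examinee
    q_row : the item's row of the Q-matrix (1 = skill required)
    """
    # eta is True only if the examinee has mastered every required skill.
    eta = all(a >= q for a, q in zip(alpha, q_row))
    return 1.0 - slip if eta else guess

def profile_table(q_matrix, slips, guesses):
    """Map every skill profile to its vector of success probabilities."""
    n_skills = len(q_matrix[0])
    return {
        alpha: [dina_prob(alpha, q, s, g)
                for q, s, g in zip(q_matrix, slips, guesses)]
        for alpha in product((0, 1), repeat=n_skills)
    }

# Hypothetical example: 3 items measuring 2 skills.
Q = [[1, 0], [0, 1], [1, 1]]
slips = [0.10, 0.20, 0.15]
guesses = [0.20, 0.25, 0.10]

for profile, probs in profile_table(Q, slips, guesses).items():
    print(profile, probs)
```

Large estimated slip or guess values for an item are one signal that its Q-matrix row may be misspecified, which is the kind of check the abstract attributes to de la Torre (2008).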

Jun Lu American University


“A Bayesian Approach for Assessing Human Memory Using Process-Dissociation Procedure”

Modern theories of human memory postulate separate systems for conscious recollection and unconscious,

automatic recall. Psychologists have proposed experimental setups, such as the Process Dissociation

Procedure (PDP), to separate the contributions from these systems to overall memory. To analyze the PDP

data, however, researchers often aggregate outcomes across participants, items, or both. This can lead to

biased estimates due to the correlated random effects. To achieve better statistical inference, we propose a

Bayesian hierarchical model for PDP experiments. The model employs latent variables to model the

conscious and unconscious components of human memory. It features two probit models and priors with

possibly correlated random effects in situations where these correlations have theoretical interest. Three types

of priors are examined, and their efficacies are compared in a simulation study. The Bayesian approach is

also illustrated with data from a memory experiment and can be applied to signal detection problems.

Rebecca Nugent Carnegie Mellon University


“Clustering in Cognitive Diagnosis: Some Recent Work and Open Questions”

One primary objective in educational research is the development of methodology to characterize

students' ability on a set of skills. Ability could be defined as a binary response (either you have a skill or

you don't) or a continuous response on a spectrum of mastery (none to partial to complete). Traditionally,

estimation approaches have used latent class cognitive diagnosis models to fit (parametrically formulated)

item response functions. These models can quickly grow in complexity and estimation difficulty as the

number of examinees, items, and skills increases. More recently, cluster analysis has been suggested as an

alternative to identify groups of students with different latent ability. Preliminary work has shown this

potentially computationally advantageous approach can give comparable results with fewer constraints.

In this talk, we will review some parametric and nonparametric clustering approaches (and their

strengths/weaknesses) and summarize recent work using clustering in cognitive diagnosis. The

remainder of the talk will be devoted to discussing some open research problems including choosing a

method, labeling the clusters, visualization of high dimensional results, variable/skill selection (dimension

reduction; clustering in a reduced subspace), and, time permitting, clustering student learning trajectories.

Mario Peruggia

Ohio State University


“Hierarchical Bayes Models for Response Time Data”


Human response time data are widely used in cognitive psychology to evaluate theories of mental processing.

Typically, the data constitute the times taken by a subject to react to a succession of stimuli under varying

experimental conditions. A careful analysis of these data must distinguish and account for both local

dependencies and overall trends. Local dependencies might be directly related to the mental process

generating the response times. For example, a subject might perceive that her last few responses were slow

and try to compensate by speeding up the next few, or vice versa. Local dependencies might also be due

to sudden distractions or attention-gathering events. Overall trends might be due to learning, i.e., the fact that,

over time, a subject becomes better acquainted with the task he must perform, leading to shorter response

times. Or they might be due to fatigue, i.e., the fact that a subject tires of performing the same repetitive task,

leading to longer response times. Distinguishing between local dependencies and overall trends is difficult.

We consider a hierarchical Bayes approach that lets us handle the various modeling choices concerning local

dependencies and overall trends concurrently and lets us incorporate, in a unified fashion, the effects of

experimental covariates and of extreme observations. We use data from several experiments to illustrate and

compare the performance of the two modeling strategies. This talk is based on joint work with Peter

Craigmile and Trisha Van Zandt.

Jeff Rouder University of Missouri


“Item-Response Models for Measuring Thresholds in Performance”

Human abilities in perceptual domains have historically been conceptualized with reference to a threshold.

Observers are consciously aware of a stimulus if and only if it is above a threshold intensity. One of the

critical questions in modern psychology is whether there is a subliminal effect of stimuli; that is, do stimuli

below threshold nonetheless affect subsequent behavior. Current statistical approaches to measuring

thresholds, however, define them as corresponding to an arbitrary level of performance, say .75 or .707, rather

than as the minimum amount of intensity corresponding to above-baseline performance. To fill this

measurement lacuna and assess the existence of subliminal perception, my colleagues and I developed a set of

item-response models for measuring thresholds. Traditional psychometric links, such as the probit, logit, and

t, are incompatible with thresholds as there are no true scores corresponding to baseline performance. We

introduce a truncated probit link for modeling thresholds and develop a two-parameter IRT model based on

this link. The model is Bayesian and analysis is performed with MCMC sampling. Based on model analysis,

my colleagues and I fail to find evidence for subliminal perception. The measurement of threshold is broadly

applicable in many domains outside of subliminal perception. Examples include measuring the maximum

length of time a driver can drive without fatigue effects, or measuring the maximum amount of ambient noise

an air-traffic controller can tolerate without raising the chance for a mistake.

Yanyan Sheng

Southern Illinois University

“Bayesian Analysis of Item Response Theory Models”

Item response theory (IRT) provides a collection of models that describe how items and persons interact to

yield probabilistic correct/incorrect responses. The influence of items and persons on the responses is

modeled by distinct sets of parameters, and simultaneous estimation of these parameters calls for complex

statistical procedures. The unidimensional model where one latent trait is assumed will be reviewed from a

theoretical viewpoint with an emphasis on the Bayesian estimation techniques of the model parameters. As

Markov chain Monte Carlo simulation techniques offer promise for parameter estimation with complex

models, multidimensional IRT models appropriate in situations where items measure multiple traits will be

introduced in the fully Bayesian framework.
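For orientation, one common unidimensional form is the two-parameter logistic (2PL) model. The sketch below is illustrative only and is not taken from the talk; the names `theta`, `a`, and `b` are assumed labels for the latent trait, item discrimination, and item difficulty, and the logistic link is an assumed choice.

```python
import math

def prob_correct(theta: float, a: float, b: float) -> float:
    """2PL item response function: P(correct | theta) = logistic(a * (theta - b))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, items, responses):
    """Log-likelihood of 0/1 responses for one person, given (a, b) pairs per item."""
    total = 0.0
    for (a, b), y in zip(items, responses):
        p = prob_correct(theta, a, b)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total
```

A Bayesian treatment would place priors on `theta`, `a`, and `b` and sample from the posterior built on this likelihood; that step is omitted here.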

Sandip Sinharay Educational Testing Service

“A Critical Evaluation of Diagnostic Score Reporting: Some Theory and Applications”

Diagnostic scores are of increasing interest due to their potential remedial and instructional benefit. Naturally,

the number of testing programs that report diagnostic scores is on the rise, as is the number of research

works on cognitive diagnostic models that can be used to report diagnostic scores. This presentation will raise

the question whether the existing educational tests are ready to report high-quality diagnostic scores. A brief

review will be provided of a recent method (Haberman, 2008; Sinharay, Haberman, & Puhan, 2007) that can

be used to determine whether there is enough justification behind reporting the subscores (that are the

simplest form of diagnostic scores and are reported by several testing programs) for a test. The presentation

will also discuss the recent method of Haberman, Sinharay, and Puhan (2009) that extends the method of

Haberman (2008) to aggregate-level subscores, and the method of Haberman and Sinharay (2009) that

extends the method of Haberman (2008) to diagnostic scores based on multidimensional item response theory

(MIRT) models. Using results from several operational and simulated data sets, it will be demonstrated that

diagnostic scores are likely to provide any benefits for only a handful of existing educational tests. Some

recommendations are made for those interested in reporting diagnostic scores.

Dongchu Sun University of Missouri

“Bayesian Hierarchical Models for Recognition-Memory Experiments”

Bayesian hierarchical probit models are developed for analyzing the data from the recognition-memory

experiment in Psychology. Both informative priors and noninformative priors are reviewed.

For the informative priors, two latent variable structure priors are proposed. One is a generalization of the

share-component prior in Lu et al. (2007). The other is based on a structure from factor analysis. The latent

variable structure priors model the correlation across participant effects and item effects flexibly by not

requiring the predetermination of the correlation sign.

Meanwhile, three options for objective Bayesian analysis are studied. The first option is to assign the constant

prior for all of the effect parameters. The second option is to use shrinkage priors by assigning bivariate

normal priors on participant effect parameters and item effect parameters at the first level, and objective priors

such as a constant prior on the covariance matrices.

The third option is to apply the constant prior to the two mean effects and hierarchical priors to the participant

and item effects with a class of noninformative priors on the variance components. The propriety of the posteriors

is examined under these options.

Simulation studies and real data analyses are conducted for comparing proposed and some existing methods.

Richard Swartz University of Texas, MD Anderson Cancer Center

“Bayesian and Classical Computerized Adaptive Testing Item Selection Algorithms”

Computerized Adaptive Testing (CAT) is one of the important applications of item response theory.

CAT algorithms adaptively select and sequentially administer items to estimate a person's latent trait.

CAT has been applied in educational and vocational tests and more recently patient-reported outcomes

questionnaires. It has been shown that applications of CAT obtain better measurement precision and test

efficiency than the standard paper-and-pencil format. A variety of item selection procedures have been

proposed for CAT and are in use. This presentation will discuss some of the classical and Bayesian

methods for item selection and trait estimation. A recent study comparing several of the different

selection methods will then be briefly discussed. In the study it was found that many of the item selection

procedures performed very similarly.
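One classical selection rule of the kind referred to above is maximum Fisher information: administer the unused item that is most informative at the current trait estimate. The sketch below assumes a 2PL item bank; the bank values and function names are hypothetical, and the talk may have covered different rules.

```python
import math

def prob_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * p * (1 - p)."""
    p = prob_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def select_next_item(theta_hat, item_bank, administered):
    """Index of the not-yet-administered item with maximum information at theta_hat."""
    candidates = [i for i in range(len(item_bank)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta_hat, *item_bank[i]))

# Hypothetical item bank: (discrimination a, difficulty b) pairs.
bank = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.5), (2.0, 0.2)]
```

Bayesian rules typically replace the point estimate `theta_hat` with an average over the posterior of the trait; that variant is not shown.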

Curtis Tatsuoka

Case Western Reserve University

“Partially Ordered Classification Models: a General Framework for Cognitive Diagnosis and Adaptive

Testing”

Partially ordered sets (posets) are natural models for cognitive functioning. A statistical framework will

be described for the problem when there exists a "true" state among a partially ordered collection of

states, and observations from sequentially selected experiments (test items, measures) are used to identify

it. An advantage of adaptive testing is that testing burdens can be greatly reduced. Results on

asymptotically optimal sequential selection of experiments will be described in the context of Bayesian

classification problems when the parameter space is a finite lattice. Experiment selection rules and the

attainment of optimal rates of convergence will be discussed.

Finally, an analysis of neuropsychological testing data will be presented. This approach is used to detect

differential cognitive profiles in neurological diseases, and to link classification results to clinical course

variables.

David Thissen University of North Carolina

“IRTPRO Demonstration”

The IRTPRO (Item Response Theory for Patient-Reported Outcomes) component of the MEDPRO Project is

an entirely new application for item calibration and test scoring using IRT. A Fall 2009 release of this software

is anticipated; this presentation briefly describes its features, user interface, and output. IRTPRO provides

maximum likelihood calibration of items fitted with the 1PL, 2PL, 3PL, Graded, Generalized Partial Credit,

and Nominal IRT models in any combination, using one of three estimation algorithms: 1) Bock-Aitkin EM,

2) adaptive quadrature, or 3) Metropolis-Hastings Robbins-Monro (MHRM). Unidimensional or

multidimensional IRT models may be used; among multidimensional models, the implementation performs

full-information estimation for exploratory and confirmatory models, including the special-case treatment

appropriate for bi-factor models. Analysis of differential item functioning (DIF) is also provided, using the

Wald test, with accurate item parameter error variance-covariance matrices computed using the Supplemented

EM (SEM) algorithm. Several goodness of fit and diagnostic statistics are reported. Standard maximum a

posteriori (MAP) and expected a posteriori (EAP) estimates of the latent variable(s) for item response patterns

may be computed, as well as (weighted) summed-score to scale score translation tables.

Trisha Van Zandt Ohio State University

“An Overview of Response Time Models in Psychology”

Models of cognition focus on the speed and accuracy of simple choices. For example, a memory task may

ask a subject to discriminate between old (studied) and new (not studied) objects. The choice between old

and new is modeled as a process of information accumulation over time. When the evidence for one of the

choice options reaches a threshold or criterion level, a response indicating that option is made. These models

are implemented as stochastic processes such as Poisson and Wiener processes.

This talk is a tutorial that provides an overview of the different kinds of stochastic processes used in the

modeling of simple choice. I will present some historical background and motivation for how different tasks

have been modeled. My talk provides

the background for Mario Peruggia's presentation, which presents a hierarchical treatment of response time

data motivated by these types of processes.
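The accumulation-to-threshold idea described above can be sketched as a simulated Wiener process with two absorbing boundaries. This is a minimal illustration under assumed conventions (symmetric boundaries at plus and minus `threshold`, Euler time stepping, hypothetical parameter names), not a model taken from the tutorial.

```python
import random

def simulate_wiener_choice(drift, threshold, start=0.0, dt=0.001,
                           noise=1.0, rng=None, max_time=10.0):
    """Euler simulation of evidence x with dx = drift*dt + noise*dW.

    Returns (choice, time): "upper" if x reaches +threshold first,
    "lower" if x reaches -threshold first, None if neither by max_time.
    """
    rng = rng or random.Random()
    x, t = start, 0.0
    sd = noise * dt ** 0.5  # standard deviation of one Brownian increment
    while t < max_time:
        x += drift * dt + rng.gauss(0.0, sd)
        t += dt
        if x >= threshold:
            return "upper", t
        if x <= -threshold:
            return "lower", t
    return None, t
```

Repeated calls yield a joint sample of choices and response times; larger drift gives faster and more accurate responses, while a higher threshold trades speed for accuracy.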

Matthias von Davier ETS

mvondavier@ets

“Notes on Models for Cognitive Diagnosis”

Models for cognitive diagnosis - as they are called in their current incarnation - have been around for some

time now. Multidimensional latent structure models, constrained latent class models, or multiple classification

latent class models are other terms that may be more descriptive with respect to the target of inference

(von Davier & Yamamoto, 2004, von Davier, 2005). Cognitive diagnosis describes one type of application

where a model with a multidimensional latent variable is utilized to assess respondents with respect to multiple

cognitive skills. The idea of extracting more from test data than a single ability parameter or person score

has also been around for some time now, but is not without pitfalls in cases where the test instrument was not

designed for this purpose (Haberman & von Davier, 2006, von Davier, 2009). This presentation will give an

overview of the field from the perspective of what can be gained from these models, and discuss some recent

findings on models for cognitive diagnosis and their relationship to other models. The presentation will

provide information about exemplary applications of models for cognitive diagnosis, and on the estimation of

these models.

Mark Wilson University of California, Berkeley

“A Nonlinear Mixed Models Approach to IRT”

The central message of the workshop is that it is beneficial to see IRT models as extensions of generalized

linear regression models that seek to model facets of the measurement situation: These facets are most

typically persons and items, but the set may be extended to incorporate other facets such as raters, and may

also be re-labelled to suit particular applications. While the link function and the random component of the

regression model remain the same, the most interesting part of the extension concerns the structural part of the

model: (1) the kind of predictive function (linear or nonlinear, e.g. bilinear), (2) the effects (weights) of the

predictors (fixed effects or random effects).

Starting from some well-known IRT models, other and less well-known models will be framed in this

approach, based on a volume published by Springer: “Explanatory Item Response Models: A generalized

linear and nonlinear mixed approach” (De Boeck & Wilson, 2004). We will illustrate how the models can be

estimated with the SAS procedure NLMIXED.

The workshop will consist of two parts. In the first part, the explanatory item response framework will be

presented, and it will be explained how the framework fits within the family of generalized linear and

nonlinear mixed models. Specific attention will be devoted to the distinction between descriptive and

explanatory item response models, and the distinction between fixed and random effects. It will be shown

how well-known item response models fit within this framework. In addition, the framework naturally leads

to new item response models, such as models with both random item and random person

effects.

In the second part, an in-depth account will be given of multidimensional item response models, and models

for polytomous data. Again, both families of models can be conceptualized as generalized linear and

nonlinear mixed models, and doing so naturally leads to model extensions that may be of interest to the

applied researcher. In this part, some attention will be devoted to model estimation as well.

Throughout, the models are illustrated with datasets on anger and verbal aggression. We will also emphasize

random item concepts and models.

Song Zhang University of Texas Southwestern Medical Center

“A Bayesian Hierarchical Model for Multi-rater Data with Fine Scales”

During a peer review process, reviewers are often instructed to assign scores on a fine scale. For example, the

National Institutes of Health adopts a scoring scale of 1.0 to 5.0 with an incremental unit of 0.1. Despite the

instruction, raters often assign scores with dual scales, i.e., a rater might assign scores in the increment of 0.1

(1.1, 1.2, 1.3, ...) in a certain range while assigning scores in the increment of 0.5 in another (4, 4.5, ...).

Furthermore, some half-integer scores (such as 1.5, 2.0, ...) are assigned more often than the others. Most of the

current ranking methods are based on ordinal models which only utilize the ordinal information of the scores.

The numerical difference between two scores is considered irrelevant as long as the order remains unchanged.

We propose a Bayesian multinomial logistic model where the numerical (besides ordinal) information of the

scores is utilized in inference. The model includes parameters of rater bias, discrimination and measurement

error. Particularly, the measurement error is allowed to have different magnitudes over the scoring range

which naturally accounts for dual scales and different frequencies of the scores. A simulation study and a real

data example are presented.

2. Stochastic Dynamics

Schedule

Sunday, August 30, 2009

Radisson Hotel RTP

Overview Tutorials

8:00-8:55 Registration and Continental Breakfast

8:55-9:00 Welcome

9:00-10:30 Tutorial Lecture

Peter Baxendale, University of Southern California

“Equilibrium, Bifurcation and Averaging for a Nonlinear Oscillator Perturbed by White

Noise”

10:30-11:00 Coffee Break

11:00-12:30 Tutorial Lecture

Mark Alber, Notre Dame

“Stochastic Modeling and Applications to Biology and the Medical Sciences”

12:30-1:45 Lunch

1:45-3:15 Tutorial Lecture

Grant Lythe, University of Leeds

“Numerical Methods for Stochastic Dynamics”

3:15-3:45 Coffee Break

3:45-5:15 Tutorial Lecture

Andrew Stuart, Warwick University

“Limit Theorems for MCMC”

Monday, August 31, 2009

Radisson Hotel RTP

8:00-8:45 Registration and Continental Breakfast

8:45-9:00 Welcome

9:00-12:15 Theme 1: Qualitative Behavior of Stochastic Dynamical Systems and Stochastic

Modeling

9:00-9:45 Barbara Gentz, University of Bielefeld

“Metastable Lifetimes in Coupled Random Dynamical Systems”

9:45-10:30 Rachel Kuske, University of British Columbia

“Noise Driven Order in Biological Models”

10:30-11:00 Coffee Break

11:00-11:30 Lee DeVille, University of Illinois

“Synchrony-breaking and Rare Events in Stochastic Neuronal

Networks”

11:30-12:15 Panel Discussion:

Cindy Greenwood (Moderator), Arizona State University

Peter Baxendale, University of Southern California

Leonid Koralov, University of Maryland

12:15-1:30 Lunch

1:30-4:45 Theme 2: Stochastic Dynamics Across Many Scales

1:30-2:15 Peter Kotelenez, Case Western Reserve University

“From Microscopic to Mesoscopic and Macroscopic Equations”

2:15-3:00 Sandra Cerrai, University of Maryland

"Fast Transport Asymptotics for SPDE's with Noise Acting on the Boundary"

3:00-3:30 Coffee Break

3:30-4:00 Houman Owhadi, California Institute of Technology

“Non-intrusive and Structure Preserving Multiscale Integration of Stiff

SDEs and Stochastic Hamiltonian Systems with Hidden Slow Dynamics

via Flow Averaging”

4:00-4:45 Panel Discussion:

Markos Katsoulakis, (Moderator) University of Massachusetts

Hakima Bessaih, University of Wyoming

Anna Amirdjanova, University of Michigan

4:45-5:15 Five Minute Madness (6 talks)

5:15-5:45 Poster Advertisement Session (2 minute ads each)

6:00-8:00 Poster Session and Reception

SAMSI will provide poster presentation boards and tape. The board dimensions are 4 ft. wide by 3 ft.

high. They are tri-fold with each side being 1 ft. wide and the center 2 ft. wide. Please make sure your

poster fits the board. The boards can accommodate up to 16 pages of paper measuring 8.5 inches by 11

inches.

Tuesday, September 1, 2009

Radisson Hotel RTP

8:00-9:00 Registration and Continental Breakfast

9:00-12:15 Theme 3: Challenges in Numerical Methods for Stochastic Systems

9:00-9:45 Eric Vanden-Eijnden, Courant Institute, New York University

“Some Issues in the Long-time Simulation of Complex Systems”

9:45-10:30 Gabriel Lord, Heriot Watt University

“Numerics for SPDEs”

10:30-11:00 Coffee Break

11:00-11:30 David Anderson, University of Wisconsin

“Error Analysis of Approximation Methods for Stochastically Modeled

Chemical Reaction Systems”

11:30-12:15 Panel Discussion:

Thomas Kurtz, (Moderator) University of Wisconsin

Grant Lythe, University of Leeds

Kevin Lin, University of Arizona

12:15-1:30 Lunch

1:30-4:45 Theme 4: Estimation and Data Assimilations in Stochastic Dynamics

1:30-2:15 Darren Wilkinson, Newcastle University

“Bayesian Inference for Markov Process Models, with Applications to

Systems Biology”

2:15-3:00 Christof Schuette, Free University, Berlin

3:00-3:30 Coffee Break

3:30-4:00 John Harlim, North Carolina State University

“Stochastic Parameterization Extended Kalman Filter for Turbulent

Signal with Model Errors”

4:00-4:45 Panel Discussion:

Andrew Stuart (Moderator), Warwick University

John Fricks, Pennsylvania State University

Scott Schmidler, Duke University

4:45-5:30 5-Minute Madness

Wednesday, September 2, 2009

Radisson Hotel RTP

8:00-9:00 Registration and Continental Breakfast

9:00-12:15 Theme 5: Dynamics of Biological Networks

9:00-9:45 Lea Popovic, Concordia University

"Stochastic Modeling of Cellular Reaction Networks"

9:45-10:30 David Cai, New York University

“Mathematical Analysis of Neuronal Network Dynamics”

10:30-11:00 Coffee Break

11:00-11:30 Katie Newhall, Rensselaer Polytechnic Institute

“Synchrony in Stochastic Pulse-coupled Neuronal Network

Models”

11:30-12:15 Panel Discussion:

Mike Reed (Moderator), Duke University

Danny Forger, University of Michigan

Tim Elston, University of North Carolina

12:15-1:30 Lunch

1:30-2:30 Working Group Formation and Initial Meeting

2:30-3:30 Working Group Reports and Scheduling for Thursday and Friday

Thursday and Friday: Individual working group meetings at SAMSI.

SPEAKER ABSTRACTS

David Anderson University of Wisconsin

“Error Analysis of Approximation Methods for Stochastically Modeled Chemical Reaction

Systems”

We perform an error analysis for numerical approximation methods of continuous time Markov

chain models commonly found in the chemistry and biochemistry literature. The motivation for

the analysis is to be able to compare the accuracy of approximation methods and, specifically,

standard tau-leaping and a midpoint method. We perform our analysis under a scaling in which

the size of the time discretization is inversely proportional to some (bounded) power of the norm

of the state of the system. This is unlike previous error analyses that assumed the size of the time

discretization goes to zero independently from the abundances of the constituent molecules. Our

scaling is a natural choice because of the existence of statistically exact algorithms that can be

employed in the case of extremely small time-steps and which make approximate methods

unnecessary in such a scaling regime. Under the present scaling we show that the midpoint

method achieves a higher order of accuracy, in both a weak and a strong sense, than standard

tau-leaping; this result is in contrast to previous analyses. We present examples that demonstrate

our findings.

References: [1] David F. Anderson, Arnab Ganguly, Thomas G. Kurtz, Error Analysis of

Approximation Methods for Stochastically Modeled Chemical Reaction Systems, in preparation.
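The difference between standard tau-leaping and the midpoint method can be sketched on a toy system. The following is a minimal illustration, not the authors' code: the single decay reaction S -> 0 with propensity c*x, the function names, the Knuth-style Poisson sampler, and the clamp at zero are all assumptions chosen for brevity.

```python
import math
import random

def poisson(rng, mean):
    """Sample a Poisson variate by Knuth's multiplication method (adequate for small means)."""
    if mean <= 0.0:
        return 0
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def tau_leap_decay(x0, c, tau, n_steps, rng, midpoint=False):
    """Approximate the decay reaction S -> 0 (propensity c * x) with fixed step tau."""
    x = x0
    for _ in range(n_steps):
        # Standard tau-leaping evaluates the propensity at the current state.
        # The midpoint variant evaluates it at a deterministic half-step state.
        state = x - 0.5 * tau * c * x if midpoint else x
        fired = poisson(rng, c * max(state, 0.0) * tau)
        x = max(x - fired, 0)  # assumed safeguard: never let the count go negative
    return x
```

Both variants draw one Poisson number of firings per step; only the state at which the propensity is evaluated differs.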

Peter Baxendale University of Southern California

“Equilibrium, Bifurcation and Averaging for a Nonlinear Oscillator Perturbed by White Noise”

This talk will demonstrate how a number of techniques of stochastic dynamics can be applied to

a particular two-dimensional system. Topics will include: Lyapunov exponents for linear

stochastic dynamical systems; the interplay between nonlinear systems and their linearisation;

applications of Dynkin's formula (equivalently local martingale techniques) to issues of non-

explosion, recurrence and stationary measures; behavior of stationary measures near a stochastic

bifurcation point; and stochastic averaging.

David Cai New York University

“Mathematical Analysis of Neuronal Network Dynamics”

From the perspective of nonlinear dynamical systems, nonequilibrium statistical physics, and

scientific modeling, we will briefly describe some recent developments of mathematical methods

used in analysis of the dynamics of neuronal networks arising from the brain. We will outline

some mathematical difficulties in characterization of neuronal network dynamics for the

understanding of information processing in the brain.

Sandra Cerrai University of Maryland

"Fast Transport Asymptotics for SPDE's with Noise Acting on the Boundary"

We consider a class of stochastic reaction-diffusion equations having also a stochastic

perturbation on the boundary and we show that when the diffusion rate is much larger than the

rate of reaction it is possible to replace the SPDE's by a suitable one-dimensional stochastic

differential equation. This replacement is possible under the assumption of spectral gap for the

diffusion and is a result of averaging in the fast spatial transport. We also study the

fluctuations around the averaged motion.

Lee DeVille University of Illinois

“Synchrony-breaking and Rare Events in Stochastic Neuronal Networks”

We consider a stochastic model for a network of discretized pulse-coupled oscillators containing

randomness both in input and in network architecture. We analyze the scalings which arise in

certain limits and various finite-size effects as perturbations of these limits. Most notably, for

certain parameters this network supports both synchronous and asynchronous modes of behavior

and will switch stochastically between these modes due to rare events. We also relate the

analysis of this network to classical results in random graph theory --- in particular, those

involving the size of the giant component in the Erdős-Rényi random graph. Finally, we study

the effect of network topology on certain mean observables. This work is joint with Charles

Peskin and Joel Spencer.

Barbara Gentz University of Bielefeld

“Metastable Lifetimes in Coupled Random Dynamical Systems”

For a system of coupled Brownian particles in a potential landscape, we will discuss the

dependence of metastable transition times on coupling strength and noise intensity. For weak

coupling, the system behaves like a stochastic particle system, while for strong coupling the

system synchronizes and transitions between metastable states occur almost simultaneously for

all particles. The transition between these extremal regimes involves a sequence of symmetry-

breaking bifurcations.

Away from bifurcation points, the small-noise asymptotics of the transition times is given by the

Eyring-Kramers formula. However, this formula breaks down in the presence of degenerate

(non-quadratic) saddles or wells as present in our system. We will show how results by Bovier,

Eckhoff, Gayrard and Klein, providing the first rigorous proof of the Eyring-Kramers formula,

can be extended to such degenerate landscapes.

This is joint work with Nils Berglund (Orléans) and Bastien Fernandez (CPT-CNRS, Marseille).
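For reference, the nondegenerate Eyring-Kramers formula mentioned above is commonly written as follows. The notation is assumed here and is not taken from the talk: the dynamics are the gradient system $dx_t = -\nabla V(x_t)\,dt + \sqrt{2\varepsilon}\,dW_t$, $x^\star$ is a local minimum of $V$, $z^\star$ is the relevant saddle, and $\lambda_1(z^\star)$ is the single negative eigenvalue of the Hessian at the saddle.

```latex
\mathbb{E}[\tau] \simeq
\frac{2\pi}{|\lambda_1(z^\star)|}
\sqrt{\frac{|\det \nabla^2 V(z^\star)|}{\det \nabla^2 V(x^\star)}}\;
\exp\!\left(\frac{V(z^\star) - V(x^\star)}{\varepsilon}\right),
\qquad \varepsilon \to 0 .
```

The prefactor depends on the Hessian determinants at the well and at the saddle, so it is no longer meaningful when either Hessian is degenerate, which is the situation the abstract addresses.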

John Harlim North Carolina State University

“Stochastic Parameterization Extended Kalman Filter for Turbulent Signal with Model Errors”

An important emerging scientific issue in many practical problems ranging from climate and

weather prediction to biological science involves the real time filtering and prediction through

partial observations of noisy turbulent signals for complex dynamical systems with many degrees

of freedom. Model errors are often unavoidable due to our inability to justify the physical

processes and/or to even numerically resolve some physical processes with current supercomputers

even if they are well understood. In this talk, I will present a novel reduced filtering

strategy, the “Stochastic Parameterization Extended Kalman Filter” (SPEKF) algorithm that

systematically corrects both multiplicative and additive bias in an imperfect model, by blending

ideas from classical stability analysis for PDE's, Kalman filter, and stochastic models from

turbulence theory.

Peter Kotelenez Case Western Reserve University

“From Microscopic to Mesoscopic and Macroscopic Equations”

In the first part of the talk we sketch the derivation of a system of stochastic ordinary differential

equations (SODEs) from systems of microscopic deterministic equations in space dimension

$d\geq 2$ via a mesoscopic scaling limit. This scaling limit preserves a spatial correlation

length, $\varepsilon$, between Brownian motions. In a continuum limit (mass of a particle tends

to $0$, the number of particles to $\infty$) the empirical distribution of the system of SODEs

defines a class of quasi-linear stochastic partial differential equations (SPDEs). As the

correlation length $\varepsilon$ tends to $0$ the solutions of the SPDEs tend to the solutions of

quasi-linear (deterministic) PDEs, provided the initial conditions converge appropriately.

In the second part of the talk we will analyze in detail the qualitative behavior of correlated

Brownian motions. We show that at close distance correlated Brownian motions show attractive

behavior which is consistent with the depletion phenomena in colloids.

At the end we will focus on a possible numerical scheme for the time evolution of a finite

number of solute molecules, based on the mesoscopic limit theorem.

Rachel Kuske University of British Columbia

“Noise Driven Order in Biological Models”

Noise-stabilized transients in biological models: This talk covers a few biological models where

the noise-sensitivity is due to the intrinsic multiple time scales, sometimes hidden in the

dynamics. We focus on oscillations sustained by stochastic effects in otherwise quiescent

systems, and show how ideas from unrelated applications can be transferred to these new areas.

The fact that these oscillations have a strongly deterministic behavior, even though they are

purely noise-induced, obscures their stochastic sensitivity. The types of phenomena include

amplification through coherence resonance, as well as synchronization and mixed-mode

oscillations. We show how different stochastic multiple scales analyses illustrate the

mechanisms responsible for the phenomena and the conditions for their appearance.

Gabriel Lord

Heriot Watt University

“Numerics for SPDEs”

This talk will introduce some numerical methods for approximating stochastic PDEs and review

results on using an exponential type integrator and a multiscale type approach. We will examine

more recent results on the path-wise convergence analysis. We will motivate the talk with

practical examples drawn from areas such as computational neuroscience and porous media flow

and give an indication of open areas of interest. (draft)

Will discuss joint work with P. Kloeden, A. Neunkirch and T. Shardlow; S. Geiger and A.

Tambue as well as E. Coutts and I. Bello.

Katie Newhall

Rensselaer Polytechnic Institute

“Synchrony in Stochastic Pulse-coupled Neuronal Network Models”

Many pulse-coupled dynamical systems possess synchronous attracting states. Even

stochastically driven model networks of Integrate and Fire neurons demonstrate synchrony over

a large range of parameters. We study the interplay between fluctuations which de-synchronize

and synaptic coupling which synchronizes the network by calculating the probability to see

repeated total firing events, during which all the neurons in the network fire at once. The

mean time between total firing events characterizes the perfectly synchronous state, and is

computed from a first-passage time problem in terms of a Fokker-Planck equation for a single

neuron.

Houman Owhadi California Institute of Technology

“Non-intrusive and Structure Preserving Multiscale Integration of Stiff SDEs and Stochastic

Hamiltonian Systems with Hidden Slow Dynamics via Flow Averaging”

We introduce a new class of integrators for stiff ODEs as well as SDEs. One subclass

of systems that we treat consists of ODEs and SDEs that are sums of two terms, one of which has large

coefficients. These integrators are (i) Multiscale: they are based on flow averaging and so

do not resolve the fast variables but rather employ step-sizes determined by slow variables. (ii)

Basis: the method is based on averaging the flow of the given dynamical system (which

may have hidden slow and fast processes) instead of averaging the instantaneous drift of

assumed separated slow and fast processes. This bypasses the need for identifying explicitly (or

numerically) the slow or fast variables. (iii) Non-intrusive: A pre-existing numerical scheme

resolving the microscopic time scale can be used as a black box and turned into one of the

integrators in this paper by simply turning the large coefficients on over a microscopic timescale

and off during a mesoscopic timescale. (iv) Convergent over two scales: strongly over slow

processes and in the sense of measures over fast ones. We introduce the related notion of two

scale flow convergence and analyze the convergence of these integrators under the induced

topology. (v) Structure preserving: For stiff Hamiltonian systems (possibly on manifolds),

they are symplectic, time-reversible, and symmetric (under the group action leaving the

Hamiltonian invariant) in all variables. They are explicit and apply to arbitrary stiff potentials

(that need not be quadratic). Their application to the Fermi-Pasta-Ulam problems shows

accuracy and stability over 4 orders of magnitude of time scales. For stiff Langevin equations,

they are symmetric (under a group action), time-reversible and Boltzmann-Gibbs reversible,

quasi-symplectic on all variables and conformally symplectic with isotropic friction. This is

joint work with Molei Tao and Jerry Marsden.

Lea Popovic Concordia University

"Stochastic Modeling of Cellular Reaction Networks"

A cellular network can be modeled mathematically using principles of chemical kinetics. These

are particularly useful for studying the dynamic aspects of cells such as gene transcription,

translation, regulation, and protein-protein interactions. At the cellular level, chemical dynamics

are often dominated by the action of regulatory molecules present at levels of only a few copies

per cell. This implies that an adequate description of cellular dynamics requires analysis of

stochastic models on multiple scales. One can exploit the multiscale nature to reduce the

analytical and computational complexity of such models using approximate asymptotic models.

In general, approximations will be "hybrid" in the sense that some components will be discrete,

some diffusive, and some absolutely continuous. We give fluid and diffusion approximation

results for a subset of chemical species when the rest of the chemical species follow faster

ergodic dynamics.
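To make the underlying stochastic model concrete, the sketch below simulates a single species with production and degradation using Gillespie's direct method, a standard exact simulation algorithm for chemical kinetics at low copy numbers. It is not the approximation scheme described in the abstract; the reaction system, the rate names `k_prod` and `k_deg`, and the parameter values are assumptions chosen for illustration.

```python
import random


def gillespie_birth_death(k_prod, k_deg, x0, t_end, rng):
    """Simulate production (0 -> X, rate k_prod) and degradation
    (X -> 0, rate k_deg * x) with Gillespie's direct method."""
    t, x = 0.0, x0
    times, counts = [t], [x]
    while True:
        a_prod = k_prod          # propensity of production
        a_deg = k_deg * x        # propensity of degradation
        a_total = a_prod + a_deg
        if a_total == 0.0:
            break                # no reaction can fire
        t += rng.expovariate(a_total)   # waiting time to next reaction
        if t > t_end:
            break
        # Pick which reaction fires, proportionally to its propensity.
        if rng.random() * a_total < a_prod:
            x += 1
        else:
            x -= 1
        times.append(t)
        counts.append(x)
    return times, counts


rng = random.Random(0)
times, counts = gillespie_birth_death(k_prod=2.0, k_deg=0.5, x0=0,
                                      t_end=50.0, rng=rng)
```

With only a few molecules present, each reaction event changes the state by a visible fraction, which is why a deterministic rate equation alone is not an adequate description at this scale.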

Andrew Stuart Warwick University

“Limit Theorems for MCMC”

I will describe various results concerning the limiting behaviour of MCMC methods in high

dimensions. I will study Random Walk and Langevin based Metropolis-Hastings methods, as

well as Hybrid Monte Carlo. In all cases the key idea is to choose proposal scalings which ensure

an acceptance probability of order one, uniformly in dimension. Limit theorems may then be

proved and used to extract information about the computational complexity of MCMC methods.

The limiting process will be an S(P)DE, or a piecewise deterministic Markov process.

I will start by working with product measures and then demonstrate how this analysis may be

generalized to study the sampling of non-product measures. In particular I will highlight the

importance in applications of measures which are absolutely continuous with respect to a product

measure on a Hilbert space and show how the foregoing theory applies in this situation.

Applications will be indicated including data assimilation in oceanography, atmospheric

sciences and the geosciences, as well as problems from molecular dynamics and signal

processing.
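The role of proposal scaling can be illustrated with a small experiment. The sketch below runs Random Walk Metropolis on a standard Gaussian product target and scales the proposal standard deviation as `ell / sqrt(dim)`, the scaling known from the optimal-scaling literature for product targets. The target, the value `ell = 2.38`, and the function name are assumptions made for illustration, not details taken from the talk.

```python
import math
import random


def rwm_acceptance_rate(dim, ell, n_steps, rng):
    """Random-walk Metropolis on a standard Gaussian product target in
    `dim` dimensions with proposal std ell / sqrt(dim); returns the
    fraction of accepted proposals."""
    step = ell / math.sqrt(dim)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]   # start in stationarity
    log_pi = -0.5 * sum(v * v for v in x)
    accepted = 0
    for _ in range(n_steps):
        y = [v + step * rng.gauss(0.0, 1.0) for v in x]
        log_pi_y = -0.5 * sum(v * v for v in y)
        # Metropolis accept/reject on the log scale (symmetric proposal).
        if math.log(1.0 - rng.random()) < log_pi_y - log_pi:
            x, log_pi = y, log_pi_y
            accepted += 1
    return accepted / n_steps


rng = random.Random(1)
rates = {d: rwm_acceptance_rate(d, ell=2.38, n_steps=2000, rng=rng)
         for d in (10, 100)}
```

Under this scaling the empirical acceptance rate should stay bounded away from zero as `dim` grows, whereas a fixed proposal standard deviation would drive it toward zero.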

Eric Vanden-Eijnden

Courant Institute

“Some Issues in the Long-time Simulation of Complex Systems”

The dynamics of many systems of interest in physics, chemistry and biology span many spatio-

temporal scales, making their analysis by numerical simulations difficult. Indeed, while we are

bound to simulate the dynamics at the smallest and fastest scales, we are often interested in the

system's behavior in the largest and slowest scales. These are typically not accessible to brute

force simulations, both because of the computational cost involved and because these time

scales are way beyond the time at which we lose pathwise accuracy with standard numerical

integrators. These difficulties force us to change perspective and look at the systems from a

probabilistic viewpoint. Indeed, besides being unreliable, long time numerical trajectories are

also irreproducible due to sensitivity to initial conditions and/or small perturbations, and they are

typically uninformative per se. By identifying the right statistical descriptors for the systems that

characterize their behavior, such as their invariant measure, the time-correlation functions of

suitable observables, or several statistical objects describing rare reactive events, we can build algorithms

that are tailor-made to calculate these descriptors accurately and efficiently. This strategy will be

elucidated in this talk and illustrated via examples.

Darren Wilkinson Newcastle University

“Bayesian Inference for Markov Process Models, with Applications to Systems Biology”

This talk will provide an overview of the problem of conducting Bayesian inference for the fixed

parameters of nonlinear multivariate Markov processes observed partially, discretely, and

possibly with error. A sequential strategy based on either SIR or MCMC-based filtering will be

presented. For models based on stochastic differential equations, a method exploiting non-linear

diffusion bridges will be discussed, and a "global" MCMC algorithm that does not degenerate as

the degree of data augmentation increases will be elucidated. The relationship of these

techniques to methods of approximate Bayesian computation will be highlighted, and

applications to problems in systems biology will be explored.
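As a rough illustration of SIR-based filtering, the sketch below implements a bootstrap particle filter for a toy linear Gaussian state-space model observed discretely with error. The model, the parameter names `phi`, `q` and `r`, and the multinomial resampling step are assumptions for illustration; the parameters are treated as known here, so the sketch covers only the filtering step and not inference for the fixed parameters.

```python
import math
import random


def sir_filter(ys, phi, q, r, n_particles, rng):
    """Bootstrap (SIR) filter for x_t = phi*x_{t-1} + N(0, q^2),
    y_t = x_t + N(0, r^2). Returns the filtered means and an estimate
    of the log-likelihood of ys."""
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means, loglik = [], 0.0
    norm = r * math.sqrt(2.0 * math.pi)
    for y in ys:
        # Propagate each particle through the state equation.
        particles = [phi * p + rng.gauss(0.0, q) for p in particles]
        # Weight by the Gaussian observation density.
        w = [math.exp(-0.5 * ((y - p) / r) ** 2) / norm for p in particles]
        total = sum(w)
        loglik += math.log(total / n_particles)
        means.append(sum(wi * p for wi, p in zip(w, particles)) / total)
        # Multinomial resampling proportional to the weights.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return means, loglik


# Simulate a short synthetic data set from the same toy model.
rng = random.Random(2)
phi, q, r = 0.9, 0.5, 0.5
x, ys = 0.0, []
for _ in range(30):
    x = phi * x + rng.gauss(0.0, q)
    ys.append(x + rng.gauss(0.0, r))
means, loglik = sir_filter(ys, phi, q, r, n_particles=500, rng=rng)
```

The returned log-likelihood estimate is the quantity a parameter-inference scheme would typically build on; how the talk combines it with MCMC is not specified in the abstract.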

#3. Tutorials and Workshop on Space-time Analysis for Environmental Mapping,

Epidemiology and Climate Change

Schedule Sunday, September 13

Radisson Hotel RTP

Overview Tutorials

8:00-8:55 Registration and Continental Breakfast

8:55- 9:00 Welcome

9:00-10:30 Tutorial Lecture 1: Alan Gelfand, Duke University

“Hierarchical Modeling for the Analysis of Spatial Data”

10:30-11:00 Coffee Break

11:00-12:30 Tutorial Lecture 2: Jim Zidek, University of British Columbia

“Designing Monitoring Networks”

12:30 –1:45 Lunch

1:45- 3:15 Tutorial Lecture 3: Michael Stein, University of Chicago

“Asymptotics for Spatial and Spatial-temporal Data”

3:15-3:45 Coffee Break

3:45-5:15 Tutorial Lecture 4: Sylvia Richardson, Imperial College London

“Introduction to Spatial Epidemiology”

Monday, September 14

Radisson Hotel RTP

8:15-9:00 Registration and Continental Breakfast

9:00-9:15 Welcome

9:15-12:00 Session on Climate Change

Chair: Michael Stein, University of Chicago

Claudia Tebaldi, Climate Central/UBC

“Ensemble of Climate Models”

Lenny Smith, London School of Economics

“The Science of Impacts and the Impacts of Science: Bayesian Climate Forecasts and

Adaptation in the United Kingdom”

Coffee Break (30 minutes)

Discussion Session

Amy Braverman, California Institute of Technology

Montse Fuentes, North Carolina State University

Richard Smith, University of North Carolina

12:00-1:15 Lunch

1:15-3:30 Session on Random Fields and Applications

Chair: Dongchu Sun, University of Missouri

Chris Wikle, University of Missouri

“Don't Forget the Process!: Using Scientific Process Knowledge to Motivate Spatio-

Temporal Models”

Håvard Rue, Norwegian University of Science and Technology

“Spatial Modeling and Inference using SPDEs”

Discussion Session

Noel Cressie, Ohio State University

Leo Held, University of Zurich

Alexandra Schmidt, Universidade Federal do Rio de Janeiro

3:30-4:00 Coffee Break

4:00-5:00 New Researcher Session

Chair: Alan Gelfand, Duke University

Cari Kaufman, University of California, Berkeley

“Calibration and Prediction Problems in Catchment Scale Hydrology”

Mikyoung Jun, Texas A&M University

“Nonstationary Cross-covariance Models for Multivariate Processes on a Globe”

Anna Michalak, University of Michigan

“Geostatistical Inverse Modeling for Characterizing the Global Carbon Cycle”

5:00-5:40 Poster Advertisement Session (2 minute ads each)

6:30–8:30 Poster Session and Reception

SAMSI will provide poster presentation boards and tape. The board dimensions are 4 ft.

wide by 3 ft. high. They are tri-fold with each side being 1 ft. wide and the center 2 ft.

wide. Please make sure your poster fits the board. The boards can accommodate up to

16 pages of paper measuring 8.5 inches by 11 inches.

Tuesday, September 15

Radisson Hotel RTP

8:15-9:00 Registration and Continental Breakfast

9:00-12:00 Session on Spatial Epidemiology

Chair: Richard Smith, University of North Carolina

Michelle Bell, Yale University

“Critical Issues of Exposure Assessment for Human Health Studies of Air Pollution”

Lianne Sheppard, University of Washington

“Linking Exposures to Health: Exposure Effects and Measurement Error in Cohort Studies”

Coffee Break

Discussion Session

Kiros Berhane, University of Southern California

Francesca Dominici, Harvard University

Dongchu Sun, University of Missouri

12:00-1:15 Lunch

1:15-3:30 Session on Spatial and Time-space Point Processes

Chair: Noel Cressie, Ohio State University

Jesper Møller, University of Aalborg

"Spatial and Space-time Point Processes: Theory"

Lance Waller, Emory University

"Spatial and Space-time Point Processes: Applications"

Discussion Session

Marie-Colette van Lieshout, CWI & Eindhoven University of Technology

Charmaine Dean, Simon Fraser University

Janine Illian, University of St. Andrews

3:30 -4:00 Coffee Break

4:00-5:40 New Researcher Session

Chair: Jim Zidek, University of British Columbia

Ethan Anderes, University of California, Davis

“Estimating Nonstationarity using Local Likelihoods”

Dan Cooley, Colorado State University

“Modeling Precipitation Extremes from a Regional Climate Model Simulations”

Crystal Linkletter, Brown University

“Latent Socio-Spatial Process Model for Social Networks”

Chris Paciorek, University of California, Berkeley

“The Importance of Scale for Spatial-confounding Bias and Precision of Spatial

Regression Estimators”

Huiyan Sang, Texas A&M University

"Bayesian Modeling of Spatial Extreme Values"

Wednesday, September 16

Radisson Hotel RTP

8:15-9:00 Registration and Continental Breakfast

9:00-12:00 Session on Interface of Deterministic Modeling and Space-time Statistics

Chair: Montse Fuentes, North Carolina State University

Mark Berliner, Ohio State University

“Combining Models and Observations: Bayesian Approaches”

Doug Nychka, NCAR

“Interpreting Geophysical Numerical Models using Spatial Statistics”

Coffee Break

Discussion Session

Susie Bayarri, University of Valencia

Dave Higdon, Los Alamos National Laboratory

Gardar Johannesson, Lawrence Livermore National Laboratory

12:00-1:15 Lunch

1:15–2:45 Working Group Formation and Initial Meeting

2:45 Break

2:45-3:30 Working Group Reports and Scheduling for Thursday and Friday

Thursday, September 17, and Friday, September 18

SAMSI: Individual working group meetings

SPEAKER ABSTRACTS

Ethan Anderes University of California, Davis

“Estimating Nonstationarity using Local Likelihoods”

This talk will present work on using local stationary approximations to estimate the covariance

structure in nonstationary random fields. We will begin the talk by discussing general approaches

to modeling nonstationarity and current estimation techniques for some particular examples. For

the remainder of the talk we will discuss our version of local likelihood estimation and some

practical issues that arise.

Mark Berliner Ohio State University

“Combining Models and Observations: Bayesian Approaches”

Advances in computational and observational assets offer both new opportunities and new

challenges in analyzing and predicting the behavior of complex systems. The challenge of

combining information is my main topic. Relevant information sources include observations,

theory, computer model output, past experience, etc. Modern datasets and computer model

output may be extremely large, so extracting their crucial information contents can be difficult.

Further, the enterprise is difficult due to uncertainties present in virtually all information sources,

mandating the need for uncertainty management. I suggest Bayesian hierarchical modeling as a

framework for approaching these issues. The presentation is organized around two classes of

approaches and focuses on examples arising in climate science, ocean forecasting, and glacial

dynamics.

Dan Cooley Colorado State University

“Modeling Precipitation Extremes from a Regional Climate Model Simulations”

Regional climate models (RCMs) are tools which allow scientists to begin to understand how

different forcings may affect climate. In this presentation we summarize two projects, each

of which builds a spatial hierarchical model for RCM precipitation data. In the first project, we

compare extreme precipitation from a control and future simulation of a single RCM. In the

second, we compare extreme precipitation from six different RCMs all being driven by the same

reanalysis data.

Mikyoung Jun Texas A&M University

“Nonstationary Cross-covariance Models for Multivariate Processes on a Globe”

In geophysical and environmental problems, it is common to have multiple variables of interest

measured in the same location and time. As a consequence, there is a growing interest in

developing models for multivariate spatial processes, in particular, the cross covariance models.

We present a class of parametric covariance models for multivariate processes on a globe. The

covariance models are flexible in capturing nonstationarity in the data yet computationally

feasible, requiring only a moderate number of parameters. We apply our covariance model to surface

temperature and precipitation data from an NCAR climate model to estimate the cross

correlations between the surface temperature and precipitation. We compare the fitted results

from our model to those from the Matérn cross-covariance function by Gneiting, Kleiber, and

Schlather (2009) and various versions of the linear model of coregionalization. We also give a

comparison between our fitted cross correlations and the estimated cross correlation in

Trenberth and Shea (2005).

Cari Kaufman

University of California, Berkeley

“Calibration and Prediction Problems in Catchment Scale Hydrology”

Statistical models in environmental applications are making increasing use of the scientific

knowledge represented in systems of differential equations describing the evolution of

environmental processes. I will discuss the use of such a model in predicting changes in the

spatial-temporal distribution of soil moisture for a particular catchment under possible climate

change scenarios. One challenge has been to bridge the gap between observations, which are

spatially sparse and temporally dense, and the desired resolution of predictions, which is

spatially dense but aggregated in time. I will discuss how we can achieve this using a

hierarchical model in which the temporal evolution is driven by a catchment-level model, while

observations are modeled according to a stochastic redistribution of total water.

Crystal Linkletter

Brown University

“Latent Socio-Spatial Process Model for Social Networks”

With concerns of bioterrorism, the advent of new epidemics that spread with person-to-person

contact, such as SARS, and the rapid growth of on-line social networking websites, there is

currently great interest in building statistical models that emulate social networks. Stochastic

network models can provide insight into social interactions and increase understanding of

dynamic processes that evolve through society. A major challenge in developing any stochastic

social network model is the fact that social connections tend to exhibit unique inherent

dependencies. For example, they tend to show a lot of clustering and transitive behavior,

heuristically described as “a friend of a friend is a friend.” It might be reasonable to expect that

covariate similarities, or “closeness” in social space, should somehow be related to the

probability of connection for some social network data. The relationship between covariates and

relations is likely to be complex, however, and may in fact be different in different regions of the

covariate space. Here, we present a new socio-spatial process model that smooths the

relationship between covariates and connections in a sample network using relatively few

parameters, so the probabilities of connection for a population can be inferred and likely social

network structures generated. Having a predictive social network model is an important step

toward the exploration of disease transmission models that depend on an underlying social

network.

Anna Michalak

University of Michigan

“Geostatistical Inverse Modeling for Characterizing the Global Carbon Cycle”

Increasing atmospheric concentrations of carbon dioxide are responsible for the largest

component of the rise in radiative forcing since the start of the industrial era. Currently,

approximately 8 Gt of carbon are emitted globally on an annual basis. Only approximately half

of this carbon remains in the atmosphere, however, with the other half being absorbed by global

oceans, plants, and soils, known collectively as carbon “sinks.” The prediction of future climate

change rests on our ability to predict the future behavior of these carbon sinks, with uncertainties

in model predictions caused by poor understanding of carbon sink dynamics being as large as

uncertainties associated with future anthropogenic emissions. This uncertainty can only be

reduced by understanding the spatiotemporal distribution, and processes controlling the

magnitude, of current natural and anthropogenic carbon fluxes. The use of the global network of

atmospheric carbon dioxide monitoring sites to quantify key aspects of the carbon balance at the

earth's surface constitutes an inverse problem. Traditionally, these inverse problems have been

addressed in a classical Bayesian framework, with a priori information provided by inventories

of activities related to carbon flux (e.g. fossil fuel emissions) or process-based models that aim to

predict the natural carbon balance in terrestrial ecosystems. Given the sparseness of the

atmospheric monitoring network, and the strong spatiotemporal variability in carbon fluxes, this

inverse problem is highly underdetermined and ill-posed. As a result, poor a priori assumptions

can bias a posteriori estimates, and inverse modeling results have been found to be very sensitive

to a priori assumptions. This presentation will discuss an alternative approach to the solution of

this inverse problem, based on a framework that can leverage multiple spatiotemporal data

sources (including in situ and remote sensing data) while quantifying and accounting for the

covariance structure of the unsampled carbon flux signal. The approach builds on geostatistical

inverse modeling tools developed in the geology and hydrology fields, coupled with formal

model selection tools for identifying key drivers of carbon flux variability. Examples from

recent work at the global and continental scale will be used to illustrate the approach, and

highlight opportunities for future developments at the interface of statistical and earth system

science.

Jesper Møller

Aalborg University

“Spatial and Space-time Point Processes: Theory”

We discuss the theory of modelling and making statistical inference for spatial and space-time

point processes. We start by defining spatial point processes, including marked point processes,

and consider their moment properties and various other useful summaries. Next we study

Poisson, Cox and Markov point process models. We continue with quick and simple

simulation-free procedures as well as more comprehensive methods based on a maximum likelihood or

Bayesian approach combined with Markov chain Monte Carlo (MCMC) techniques. Then we

see how all this extends to space-time point processes. Finally, space-time point processes

specified by a conditional intensity are discussed.
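The simplest of the models named above, the homogeneous Poisson point process, can be simulated directly from its definition: draw a Poisson number of points, then place them independently and uniformly. The sketch below does this on a rectangle; the function names, the intensity value and the use of Knuth's multiplication method for the Poisson count are choices made for illustration.

```python
import math
import random


def poisson_variate(mean, rng):
    """Draw a Poisson count by multiplying uniforms (Knuth's method);
    adequate for moderate means."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_poisson_process(intensity, width, height, rng):
    """Homogeneous Poisson point process on [0, width] x [0, height]:
    a Poisson number of points, placed independently and uniformly."""
    n = poisson_variate(intensity * width * height, rng)
    return [(rng.uniform(0.0, width), rng.uniform(0.0, height))
            for _ in range(n)]


rng = random.Random(3)
points = simulate_poisson_process(intensity=50.0, width=1.0, height=1.0,
                                  rng=rng)
```

Cox and Markov point processes add random intensities and interaction between points respectively, and generally need more elaborate simulation schemes than this.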

Douglas Nychka

National Center for Atmospheric Research

“Interpreting Geophysical Numerical Models using Spatial Statistics”

Geophysical models are large numerical computer codes based on physics that can simulate the

motions, thermodynamics and electromagnetic dynamics of the Earth's atmosphere and ocean.

State-of-the-art numerical models typically have high dimensional inputs, produce

numerous spatio-temporal fields as output and require statistics simply to

summarize model results. This talk will discuss applications to regional climate and to the upper

atmosphere. The North American Regional Climate Change and Assessment Program

(NARCCAP) is the most extensive series of simulations undertaken to characterize the

uncertainty in regional climate due to different models and due to different global

conditions. Here the spatial data are averaged meteorological fields from the model output or

fields of derived statistics and the challenge is to make comparisons among models or to

irregular observations. These problems are illustrated by a functional ANOVA decomposition

among different models and also with an investigation of simulated climate extremes.

The Lyon-Fedder-Mobarry (LFM) global magnetospheric model has the potential for

forecasting the effect of magnetic storms on the Earth's upper atmosphere. LFM is still being

actively developed and can benefit from the deliberate exploration of model parameter space

using statistical methodology for computer experiments. The spatial problem is to estimate the

response of the model to untried parameter combinations in order to find values that improve fit.

As an example, results are given for matching the model parameters to an observed magnetic

storm.

In either of these cases there are common issues of handling large data sets and exploiting

regularly gridded locations to make the statistical computations feasible. Also, these problems

motivate the need for better descriptions of the statistical uncertainty in an estimated field.

Chris Paciorek

University of California, Berkeley

“The Importance of Scale for Spatial-confounding Bias and Precision of Spatial Regression

Estimators”

Spatially correlated residuals are common in regression modeling. I consider the effects of

residual spatial structure on the bias and precision of regression coefficients, developing a simple

framework in which to understand the key issues and derive informative analytic results. When

the spatial residual is induced by an unmeasured confounder, regression models with spatial

random effects and closely-related models such as kriging and penalized splines are biased, even

when the residual variance components are known. Analytic and simulation results show how the

bias depends on the spatial scales of the covariate and the residual: bias is reduced only when

there is variation in the covariate at a scale smaller than the scale of the unmeasured

confounding. I also discuss how the scales of the residual and the covariate affect efficiency and

uncertainty estimation when the residuals are independent of the covariate, with implications for

the use of proxy information such as obtained from deterministic models and satellite retrievals.

In an application on the association between black carbon particulate matter air pollution and

birth weight, controlling for large-scale spatial variation appears to reduce bias from unmeasured

confounders, while increasing uncertainty in the estimated pollution effect.

Sylvia Richardson Centre for Biostatistics, Imperial College London

“Introduction to Spatial Epidemiology”

Spatial analyses abound in the epidemiological literature. Geographical variations of chronic

diseases have long been recognised as being able to suggest important aetiological clues.

Recently, the availability of geographically indexed health and population data at a small scale,

with advances in GIS, have opened the way for serious exploration of small area health statistics.

In this tutorial, we will review the main types of studies used in spatial epidemiology, in

particular disease mapping, geographical correlation studies and point source studies. We will

discuss levels of spatial modelling for disease outcomes (point data or aggregated data) and

introduce models for small area disease mapping. We will briefly review issues associated with

cluster and hot spot detection. Extension of disease mapping to space time models and models

for multiple diseases will also be presented. Finally, ecological studies and issues related to

ecological bias will be reviewed, with extension to semi-ecologic designs that combine

information at different levels (individual and group). Throughout, the models will be illustrated

by numerous spatial epidemiology case studies.

Håvard Rue

Norwegian University of Science and Technology

“Spatial Modeling and Inference using SPDEs”

In this talk, I will discuss the use of stochastic partial differential equations (SPDEs) for spatial

modeling and inference. The results are highly surprising and powerful: if we treat the SPDEs

"correctly", we obtain very sparse and explicit Markov representations of the Gaussian field even

on irregular triangulated manifolds. These representations can then be used in a structured

additive regression framework for which Bayesian inference about the marginals can be done

both faster and more accurately using integrated nested Laplace approximations (INLA) instead of

using MCMC.

Huiyan Sang Texas A&M University

"Bayesian Modeling of Spatial Extreme Values"

We propose a hierarchical modeling approach for explaining a collection of point-referenced

extreme values. In particular, annual maxima over space and time are assumed to follow

Generalized Extreme Value (GEV) distributions, with the GEV parameters specified in the latent

stage to reflect underlying spatio-temporal structure. The novelty here is that we relax the

conditional independence assumption in the first stage of the hierarchical model which has been

adopted in previous work. We offer a spatial process model for extreme values where the

behavior of the surface is driven by the spatial dependence which is unexplained under the latent

spatio-temporal specification for the GEV parameters.
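For reference, the GEV distribution function in its usual location-scale-shape parameterization can be written as a short function. The abstract does not state which parameterization is used, so the form below (location `mu`, scale `sigma`, shape `xi`, with the Gumbel case at `xi = 0`) is an assumption based on the standard definition.

```python
import math


def gev_cdf(y, mu, sigma, xi):
    """CDF of the Generalized Extreme Value distribution with location mu,
    scale sigma > 0 and shape xi."""
    z = (y - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-z))   # Gumbel limit
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the lower end point (xi > 0)
        # or above the upper end point (xi < 0).
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-t ** (-1.0 / xi))
```

In a hierarchical model of the kind described, `mu`, `sigma` and `xi` would vary over space and time through the latent stage, and this function would be evaluated at each site's own parameter values.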

Lianne Sheppard University of Washington

“Linking Exposures to Health: Exposure Effects and Measurement Error in Cohort Studies”

An important application of disease models in environmental epidemiology is to provide

estimates (with appropriate uncertainty) of the effects of exposures on health outcomes. The

biggest challenges to accuracy and precision of these effects are the presence of confounders and

exposure measurement error. Epidemiological study design also influences the nature and

quality of inferences possible about health effects. In this talk I focus on the cohort study design

and discuss my perspective on important research topics. I present a basic model framework for

assessing longitudinal change for the effect of environmental exposure and discuss different

parameterizations based on scientific understanding. I review the concept of partitioning

exposure information between and within areas and discuss its scientific value for environmental

epidemiology. I present a two-stage approach to exposure measurement error and discuss the

rationale for not jointly modeling exposure and disease. I present our group's emerging research

understanding of the importance of exposure network design, the underlying exposure

distribution, the exposure estimation approach, and the spatial dependence structure in the

disease model in the context of health effects estimation.

Lianne Sheppard, with Adam A. Szpiro, Sun-Young Kim, Paul D. Sampson and the MESA Air

team

Michael Stein University of Chicago

“Asymptotics for Spatial and Spatial-temporal Data”

An overview of different ways of taking limits with spatial and spatial-temporal data and why it

matters.

Claudia Tebaldi University of British Columbia

“Ensemble of Climate Models”

With international efforts like the IPCC assessment as a strong motivation, modeling centers

around the world coordinate their experiments choosing a common set of forcing scenarios, and

contribute their respective output to publicly accessible archives, which host - among other

experiment results - large datasets of projections of future climate for various scenarios. Those

multi-model ensembles sample initial condition, as well as structural uncertainties in the model

design, and their availability has spurred a variety of

approaches to quantifying uncertainty in future climate change in a probabilistic way, while the

official reports like IPCC's rely mainly on equal-weighted averages as best guess results, and

models' ranges as a measure of the uncertainty. I am going to discuss the motivation for using

multi-model ensembles and the main challenges in interpreting them. The motivation resides in

the need to characterize model uncertainties at the structural level. The challenges are

encapsulated in the idiosyncratic nature of this sample of climate models, which has been called

an ensemble of opportunity. It is small, it is neither random nor systematic, and it is a sample

of best guesses which does not fully explore the range of possible

model behaviors. In addition, model skill in simulating present day climate conditions is shown

to relate only weakly to the magnitude of predicted change. It is thus unclear by how much our

confidence in future projections should increase based on improvements in simulating present

day conditions, a reduction of intermodel spread or a larger number of models.

Lance Waller Emory University

“Spatial and Space-time Point Processes: Applications”

Spatial and spatio-temporal point process models provide a framework for describing,

evaluating, and predicting patterns in observed data. Drawing from the set of theoretical tools

presented in the preceding presentation, we explore their application and performance within a

variety of application areas including the spatial epidemiology of and screening for neglected

tropical diseases, the evaluation of environmental impacts on endangered species, the impact of

disease on the growth and patterning of epidermal nerve fibers, and the modelling of spatial-

temporal patterns in locations of forest fires. Each application provides a unique focus of

inference, and we illustrate the framing of the questions of interest, the selection of appropriate

analytic tools, and the interpretation and effective presentation of results.

Christopher Wikle

University of Missouri

“Don't Forget the Process!: Using Scientific Process Knowledge to Motivate Spatio-Temporal

Models”

Historically, spatio-temporal random processes have been motivated by science-based dynamical

processes. Indeed, classes of dependence structures have been derived from such processes. The

recognition that spatio-temporal processes are typically dynamic in nature, and are associated

with error, has led to the common use of a state-space modeling framework, as developed in the

engineering and statistical time-series literatures. However, it has often been the case that these

methods have been applied without the explicit recognition that there is a spatio-temporal

process involved. That is, these models are fit using unrealistic assumptions (i.e., dynamic does

not just mean time-varying!). Furthermore, in many spatio-temporal settings, the modeling of

the process is complicated by high-dimensional state vectors and unreasonably large numbers of

parameters. The point of this talk is that process based knowledge can be used to help motivate

efficient spatio-temporal dynamical parameterizations (and/or state-dimension reduction) for

linear and nonlinear processes. In addition, the talk will emphasize that the hierarchical modeling

framework allows one to build complexity in these formulations by modeling these parameters

(often using random processes as well) explicitly. The talk will include motivating examples

from the atmospheric, ocean, and ecological sciences.

#4: Self-Organization and Multi-Scale Mathematical Modeling of Active Biological Systems

October 26 – 28, 2009

Schedule

Monday, October 26, 2009

(SAMSI)

8:30 – 9:00 Registration, Continental Breakfast

9:00 – 9:15 Welcome

9:15 – 10:00 Mike Shelley, Courant Institute, New York University

“Modeling the Large-scale Hydrodynamics of Living Suspensions”

10:00 – 10:45 Simona Mancini, University of Orleans

“A Fokker-Planck Model for Two Interacting Populations of Neurons”

10:45 – 11:15 Coffee Break

11:15 – 12:00 Suncica Canic, University of Houston

“Some Multi-scale Issues in Blood Flow”

12:00 – 1:15 Lunch

1:15 - 2:00 Danielle Hilhorst, Université de Paris-Sud 11

“On the Fast Reaction Limit of a Competition-diffusion System”

2:00 – 2:45 Anna Balazs, University of Pittsburgh

“Self-sustained Motion of a Train of Haptotactic Microcapsules”

2:45 – 3:15 Coffee Break

3:15 – 4:00 Jacques Prost, ESPCI ParisTech/ Institut Curie

“Simple Examples of Tissue Dynamics”

4:00 – 4:15 Poster Advertisement Session (2 minute ads each)

4:15 - 5:15 Informal Discussions

5:15 –7:15 Poster Session and Reception

SAMSI will provide poster presentation boards and tape. The board

dimensions are 4 ft. wide by 3 ft. high. They are tri-fold with each side

being 1 ft. wide and the center 2 ft. wide. Please make sure your poster

fits the board. The boards can accommodate up to 16 pages of paper

measuring 8.5 inches by 11 inches.

Tuesday, October 27, 2009

(SAMSI)

8:30 - 9:00 Registration, Continental Breakfast

9:00 - 9:45 Benoit Perthame, University P. et M. Curie

“Analysis of PDE Models for Molecular Motors”

9:45 - 10:30 Jerry Gollub, Haverford College

“Swimming Dynamics of Algae Cells and the Fluid Motion They

Produce”

10:30 – 11:00 Coffee Break

11:00 - 11:45 Andrea Bertozzi, University of California, Los Angeles

“Swarming by Nature and by Design”

11:45-12:30 Yuliya Gorb, University of Houston

“Impact of the Residual Stress on the Vibrational Frequency Spectrum of

a Nonlinear Elastic Body”

12:30 - 1:45 Lunch

1:45- 2:30 Tim Pedley, University of Cambridge

“Collective Behavior of Swimming Micro-organisms”

2:30- 3:15 Rachel Bearon, University of Liverpool

“Individual to Population Models of Swimming Micro-organisms in Fluid

Flow”

3:15- 3:45 Coffee Break

3:45 – 4:30 Mark Alber, University of Notre Dame

“Connection between Discrete Stochastic and Continuous Models with

Applications to Biology”

4:30 - 5:30 Discussions

Wednesday, October 28, 2009

(SAMSI)

8:30 – 9:00 Registration, Continental Breakfast

9:00 - 9:45 Michael Graham, University of Wisconsin-Madison

“Transport and Collective Dynamics in Suspensions of Swimming Particles”

9:45 - 10:15 Andrey Sokolov, Princeton University

“Swimming Bacteria at Work: Reduction of Effective Viscosity and

Bacterial Machines”

10:15 - 10:45 Coffee Break

10:45 – 11:15 Lei Zhang, University of Bonn

“An Atomistic to Continuum Model for Self-assembling Biopolymer

Aggregates”

11:15 - 12:00 Dmitry Karpeev, Argonne National Laboratory

“Effective Viscosity of Suspensions of Swimming Bacteria”

12:00-12:15 Concluding Remarks

12:15 – 1:15 Lunch

1:15 Adjourn

SPEAKER TITLES/ABSTRACTS

Mark Alber

University of Notre Dame

[email protected]

“Connection between Discrete Stochastic and Continuous Models with Applications to Biology”

Swarming, a collective motion of many thousands of cells, produces colonies that rapidly spread over

surfaces. A detailed cell- and behavior-based computational model of M. xanthus swarming [1] will

be used in this talk to show that reversals of gliding direction are essential for swarming and that the

reversal period predicted to maximize the swarming rate is the same as the period observed in

experiments. This suggests that the circuit regulating reversals evolved to its current sensitivity under

selection for growth achieved by swarming. Also, an orientation correlation function will be used to

show that microscopic social interactions help to form the ordered collective motion observed in

swarms [2]. Then a continuous limit will be described of a two-dimensional stochastic discrete model

describing cells moving in a medium and reacting to each other through direct contact, cell-cell

adhesion, and long-range chemotaxis [3]. Contrary to the classical Keller-Segel model, solutions of the

obtained nonlinear diffusion equation do not collapse in finite time and can be used even when

the relative volume occupied by cells is rather large. Very good agreement was demonstrated between

Monte Carlo simulations and numerical solutions of the obtained macroscopic nonlinear diffusion

equation. A combination of microscopic and macroscopic models was used to simulate the growth of

structures similar to early vascular networks.
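
For reference, the classical Keller-Segel model mentioned in this abstract is commonly written in the following form. The symbols are assumptions chosen for illustration: ρ is the cell density, c the chemoattractant concentration, χ the chemotactic sensitivity, and D_ρ, D_c, α, β positive constants. The nonlinear diffusion equation derived in [3] is not reproduced here.

```latex
\begin{align*}
\partial_t \rho &= D_\rho \Delta \rho - \chi \nabla \cdot (\rho \nabla c), \\
\partial_t c    &= D_c \Delta c + \alpha \rho - \beta c.
\end{align*}
```

In two dimensions, solutions of this classical system can blow up in finite time when the total cell mass is large enough, which is the collapse the abstract contrasts against.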

1. Wu, Y., Jiang, Y., Kaiser, D., and M. Alber [2007], Social Interactions in Myxobacterial

Swarming, PLoS Computational Biology 3(12), e253.

2. Wu, Y., Jiang, Y., Kaiser, D., and M. Alber [2009], Periodic reversal of direction allows

Myxobacteria to swarm, Proc. Natl. Acad. Sci. USA 106(4), 1222-1227 (featured in the Nature

News, January 20th, 2009, doi:10.1038/news.2009.43).

3. Lushnikov, P.P., Chen, N., and M.S. Alber [2008], Macroscopic dynamics of biological cells

interacting via chemotaxis and direct contact, Phys. Rev. E. 78, 061904.

Anna Balazs

University of Pittsburgh

[email protected]

“Self-sustained Motion of a Train of Haptotactic Microcapsules”

Using theory and simulation, we design a “train” of N microcapsules that undergoes self-sustained,

directed motion along an adhesive surface in solution. The motion is initiated by the release of

nanoparticles from a single “signaling” capsule at one end of the train. The released nanoparticles

can bind to the underlying surface and thereby induce an adhesion gradient on the substrate. Through

the combined effects of the self-imposed adhesion gradient and hydrodynamic interactions, the N

microcapsules autonomously move in a single file toward the region of greatest adhesion. At late

times, this train reaches a steady-state velocity $U$, which decreases with train length as $N^{-1/2}$. We

calculate the maximum length for which the train maintains this cooperative, autonomous motion.

Rachel Bearon

University of Liverpool

[email protected]

“Individual to Population Models of Swimming Micro-organisms in Fluid Flow”

Swimming micro-organisms in fluid environments are ubiquitous and diverse. Such micro-organisms

frequently undergo movement patterns which can be modeled as random walks. The presence of

external cues such as light, chemical gradients, or gravity can modify individual-level behaviour

resulting in the random walk being biased towards a preferred direction. In still fluid, at the

population level, this can be mathematically described as a diffusion process with drift. In many natural

environments, swimming occurs in fluid environments undergoing motion. The fluid flow will

interact with the swimming behaviour in several ways and alter the spatial distribution of a

population, leading to such phenomena as bioconvection and gyrotactic focussing. Here I present the

derivation of a population-level advection-diffusion model for gyrotactic algae which is based on

extensive experimental observations of individual swimming cells in still fluid. I will then present

theoretical predictions and experimental data describing the behaviour of gyrotactic micro-organisms

in 3D flow. Finally I will discuss the effect of flow on the population-level spatial distribution.
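
The link between a biased individual-level random walk and population-level diffusion with drift can be illustrated with a minimal simulation. This is a one-dimensional sketch in still fluid under assumed parameters (step probability, speed, time step); it is not the gyrotactic model derived in the talk, and all function names are hypothetical.

```python
import numpy as np


def simulate_biased_walk(n_cells=2000, n_steps=500, dt=0.1, speed=1.0,
                         bias=0.2, seed=0):
    """Return final positions of cells performing a 1D biased random walk.

    At each step a cell moves up with probability (1 + bias) / 2 and down
    otherwise, covering a distance speed * dt. All values are illustrative.
    """
    rng = np.random.default_rng(seed)
    p_up = 0.5 * (1.0 + bias)
    steps = np.where(rng.random((n_steps, n_cells)) < p_up, 1.0, -1.0)
    return speed * dt * steps.sum(axis=0)


def drift_and_diffusivity(positions, total_time):
    """Estimate population-level drift and diffusivity from final positions."""
    drift = positions.mean() / total_time
    diffusivity = positions.var() / (2.0 * total_time)
    return drift, diffusivity


positions = simulate_biased_walk()
drift, diffusivity = drift_and_diffusivity(positions, total_time=500 * 0.1)
print(f"estimated drift: {drift:.3f}, estimated diffusivity: {diffusivity:.3f}")
```

For this walk the expected drift is bias * speed and the expected diffusivity is speed**2 * dt * (1 - bias**2) / 2, so the estimates can be checked against those values.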

Andrea Bertozzi

University of California, Los Angeles

[email protected]

“Swarming by Nature and by Design”

The cohesive movement of a biological population is a commonly observed natural phenomenon.

With the advent of platforms of unmanned vehicles, such phenomena have attracted a renewed interest

from the engineering community. This talk will cover a survey of the speaker's research and related

work in this area ranging from aggregation models in nonlinear PDE to control algorithms and

robotic testbed experiments. We conclude with a discussion of some interesting problems for the

applied mathematics community.

Suncica Canic

University of Houston

[email protected]

“Some Multi-scale Issues in Blood Flow”

The speaker will talk about some multi-scale issues related to the derivation of the effective, 3D

axially symmetric equations of Biot type describing blood flow in compliant arteries with thick and

thin walls. The resulting equations form a nonlinear, hyperbolic-parabolic moving boundary problem

in two spatial and one time variables. The delicate choice of a time scale that captures

both the fast waves in the arterial walls and the relatively slow bulk velocity of blood will be

discussed. The resulting effective equations uncover many interesting physical features of the problem

otherwise hidden in the full problem coupling the incompressible Navier-Stokes equations modeling

the fluid flow, with the structure equations (3D elasticity or viscoelasticity) modeling the arterial wall

behavior.

Jerry Gollub

Haverford College

[email protected]

“Swimming Dynamics of Algae Cells and the Fluid Motion They Produce”

Many cells have two flagella that behave like coupled oscillators: they can be either synchronous or

asynchronous. When synchronized (due apparently to stresses transmitted through the fluid), the cell

swims relatively straight. When asynchronous, the cell undergoes a sharp turn, which can affect

nutrient gain or help to avoid predators. The cell is apparently able to switch between these two states

[1]. The origin of cell diffusion has thus been determined for algae cell swimmers for the first time.

We also consider the mixing of the background fluid produced by a cloud of swimming cells [2].

Tracers in the fluid undergo an enhancement in their Brownian motion that gives a surprising self-

similar probability distribution of displacements consisting of a Gaussian core and also exponential

tails resulting from the nearby passage of swimmers. The effects of oscillatory flagellar beating are

emphasized [3].

[1] Marco Polin, Idan Tuval, Knut Drescher, J.P. Gollub, and Raymond E. Goldstein,

"Chlamydomonas swims with two "gears" in a eukaryotic version of run-and-tumble locomotion",

Science 24 July 2009. Accompanied by a Perspective by Roman Stocker and William M. Durham:

"Eukaryotic flagellar synchronization: tumbling for stealth?"

[2] Kyriacos C. Leptos, Jeffrey S. Guasto, J.P. Gollub, Adriana I. Pesci, and Raymond E. Goldstein,

"Dynamics of enhanced tracer diffusion in suspensions of swimming microorganisms", submitted to

Physical Review Letters.

[3] Work supported by the BBSRC (REG), a Leverhulme Trust Visiting Professorship (JPG), and

NSF Grant DMR0803153 (JPG).

Yuliya Gorb

University of Houston

[email protected]

“Impact of the Residual Stress on the Vibrational Frequency Spectrum of a Nonlinear Elastic Body”

Our work is on the modeling and analysis of the high frequency vibrational response of nonlinear

elastic bodies subjected to large deformations and residual stresses. The problem derives from

modeling the use of an ultrasound catheter to interrogate in-vivo atherosclerotic plaques in large

arteries. Given the theoretical and practical difficulties of determining the residual stress in healthy

arteries, it is even more daunting to model and measure it in arteries with a significant atherosclerotic

plaque burden. The difficulty in determining the residual stress is a central motivation for an aspect

of the study that addresses the question of how much detail of the residual stress is necessary in order

to distinguish key tissue types via ultrasound techniques.

Michael Graham

University of Wisconsin-Madison

[email protected]

“Transport and Collective Dynamics in Suspensions of Swimming Particles”

A suspension of swimming organisms is an example of an “active” complex fluid. At the

environmental scale, it has been suggested that swimming organisms such as krill can alter mixing in

the oceans. At the laboratory scale, experiments with suspensions of swimming cells have revealed

characteristic swirls and jets much larger than a single cell, as well as increased diffusivity of

tracer particles. This enhanced diffusivity may have important consequences for how cells reach

nutrients, as it indicates that the very act of swimming toward nutrients alters their distribution. The

enhanced diffusivity has also been proposed as a scheme to improve transport in microfluidic devices

and might be exploited in microfluidic cell culture of motile organisms or cells.

The feedback between the motion of swimming particles and the fluid flow generated by that motion

is thus very important, but is as yet poorly understood. In this presentation we describe theory and

simulations of hydrodynamically interacting microorganisms that shed some light on the

observations. In the dilute limit, simple arguments reveal the dependence of swimmer and tracer

velocities and diffusivities on concentration. As concentration increases, we show that cases exist in

which the swimming motion generates dramatically enhanced transport in the fluid. This transport is

coupled to the existence of long-range correlations of the fluid motion. Furthermore, the mode of

swimming matters in a qualitative way: microorganisms pushed from behind by their flagella are

predicted to exhibit enhanced transport and long-range correlations, while those pulled from the front

are not. A physical argument supported by a mean field theory sheds light on the origin of these

effects. These results imply that different types of swimmers have very different effects on the

transport of nutrients or chemoattractants in their environment.

Danielle Hilhorst

Université de Paris-Sud 11

[email protected]

“On the Fast Reaction Limit of a Competition-diffusion System”

We consider a competition-diffusion system for two competing species; the density of the first species

satisfies a parabolic equation whereas the second one either satisfies a parabolic equation or an

ordinary differential equation. We show that the two species spatially segregate as the interspecific

competition rate becomes large; the limit problem turns out to be a free boundary problem, which

may have the form of a Stefan problem. We first present a convergence result as well as an error

estimate. We then focus on the singular limit of the interspecific reaction term, which involves a

measure located on the free boundary. This is joint work with S. Martin, M. Mimura and H.

Ninomiya.

Dmitry Karpeev

Argonne National Laboratory

[email protected]

“Effective Viscosity of Suspensions of Swimming Bacteria”

We present a model for dilute suspensions of swimming bacteria in a Stokesian fluid and use it to

calculate the effective viscosity of the suspension. A bacterium rotates its flagella to propel itself

forward or to reorient itself randomly in a process called tumbling. Due to a bacterium's asymmetric

shape, interactions with a prescribed generic background flow, such as a purely straining or a planar

shear flow, cause the bacterium to preferentially align in certain directions. In these aligned

configurations, self-propulsion produces a reduction in the effective viscosity. For sufficiently weak

background flows, the effect of self-propulsion on the effective viscosity dominates all other

contributions, leading to an effective viscosity of the suspension that is lower than the viscosity of the

ambient fluid and, potentially, even a locally negative effective viscosity. This is in agreement with

recent experiments on suspensions of Bacillus subtilis. We present a number of asymptotic results for

the dilute limit, in which the effective viscosity can be explicitly quantified in terms of the geometry

of the bacteria, the parameters of self-propulsion and the characteristics of the background flow.

This is joint work with Leonid Berlyand, Brian Haines, Vitaliy Gyrya (Penn State) and Igor Aronson

(Argonne/U.Chicago).

Simona Mancini

University of Orleans

[email protected]

“A Fokker-Planck Model for Two Interacting Populations of Neurons”

We consider a kinetic equation (Fokker-Planck equation) describing the evolution in time of the

distribution function of the firing rates for two interacting populations of neurons. This equation is

deduced from a Wilson-Cowan type model (see [1]), consisting of a system of stochastic

differential equations describing the evolution in time of the firing rates of neurons of various

interacting populations. This kind of model is applied, for example, in the study of the neuronal

response to various visual stimuli.

The specificity of the Fokker-Planck equation is that the flux term doesn't derive from a potential.

We shall prove, applying the Krein-Rutman theorem, the existence and uniqueness of a positive

solution to the stationary problem. Moreover, we show, by standard analysis, the existence and

uniqueness of the solution for the time-dependent problem. Finally, the positivity of this solution and

its convergence towards the stationary solution will be justified by applying general relative entropy

methods.

Concerning numerical simulations, and their comparisons with known results, our results are in

agreement with those found in [1] by means of moment analysis on the stochastic system. Moreover,

we observe slow-fast behaviour in the evolution of the distribution function, and we will then

briefly investigate the reduction of the stochastic system to a single stochastic differential equation,

yielding a one-dimensional Fokker-Planck model.

[1] G. Deco, D. Marti, Biological Cybernetics, (2007).

Tim Pedley

University of Cambridge

[email protected]

“Collective Behaviour of Swimming Micro-organisms”

Bioconvection patterns are observed in shallow suspensions of randomly, but on average upwardly,

swimming micro-organisms which are a little denser than water. The basic mechanism is that an

overturning instability develops when the upper regions of fluid become denser than the lower

regions. The reason for the upswimming depends on the species of micro-organism: certain algae are

bottom-heavy, and therefore experience a gravitational torque when they are not vertical; certain

oxytactic bacteria swim up oxygen gradients that they generate by consuming oxygen. Rational

continuum models can be formulated and analysed in each of these cases, as long as the cell volume

fraction is low enough for cell-cell interactions to be neglected. An important aspect of the theory is

the stress distribution associated with the cells' active swimming motions, and the effect depends on

whether they are 'pullers' (like algae) or 'pushers' (like bacteria).

Another sort of pattern-formation ("whorls and jets") is observed in very concentrated cultures of

swimming bacteria where cell-cell interactions are crucial. Here we examine the deterministic

swimming of model organisms which interact hydrodynamically but do not exhibit intrinsic

randomness except in their initial positions and orientations. A micro-organism is modelled as a

squirming, inertia-free sphere with prescribed tangential surface velocity. Pairwise interactions have

been computed using the boundary element method, and the results stored in a database. The

movement of a number of identical squirmers is computed by the Stokesian Dynamics method, with

the help of the database of interactions. It is found that the spreading in three dimensions is correctly

described as a diffusive process after a sufficiently long time. Scaling arguments are used to estimate

this time-scale and the diffusivities. These depend strongly on volume fraction and mode of

squirming. However, in two dimensions the squirmers show a definite tendency to aggregation.

Benoit Perthame

University P. et M. Curie

[email protected]

“Analysis of PDE Models of Molecular Motors”

The question of how motion is produced within cells (and thus in muscles and living

organisms) is fundamental in biology. In the early 1990s, experimental devices made it possible to

observe the processes involved in this mechanism and to build a biophysical understanding of it. It is

one of the domains where new mathematical models in biology are now available after the input of

several groups of physicists.

The purpose of this talk is to develop a mathematical theory along PDE formulations proposed and

analyzed by D. Kinderlehrer and his co-authors. The motors we consider here are:

-) flashing ratchets with asymmetric potentials,

-) conformation changes in molecules moving along a filament with an 'asymmetric'

potential.

We give different statements, in some asymptotic regimes or not, proving that the density

concentrates on one end of the filament determined by the asymmetry of the potential. The question

is whether 'asymmetry' just means 'non-symmetric'.

Jacques Prost

ESPCI ParisTech/ Institut Curie

[email protected]

“Simple Examples of Tissue Dynamics”

I will discuss tissue growth dynamics in simple geometries, introducing the notion of homeostatic

pressure. I will first show that when two tissues compete for space, in the absence of chemical

signaling, the one with the larger homeostatic pressure always wins. I will subsequently show

that in order for a micro-tumor to grow it must exceed a critical radius and calculate the probability

for a tumor to exceed that radius. Boundary effects and orders of magnitude will also be discussed.

M. Basan, T. Risler, J.F. Joanny, X. Sastre-Garau, J. Prost, Homeostatic competition drives tumor

growth and metastasis nucleation, arXiv preprint arXiv:0902.1730, accepted in HFSP Journal (2009).

Michael Shelley

Courant Institute, New York University

[email protected]

“Modeling the Large-scale Hydrodynamics of Living Suspensions”

Locomotion of an organism through a fluid is one of the most fascinating instances of fluid-structure

interaction. Just as fascinating are the large-scale flows that result when there are not one but many

locomoting bodies moving and interacting with each other through the fluid. Aside from their

intrinsic importance to biology, such systems offer new approaches to micro-scale mixing and

transport, as well as extraction of mechanical work, from active fluids. I will overview recent

mathematical and simulational work in describing "active suspensions" here composed of many self-

locomoting particles moving through a Stokesian fluid. This work ranges from using large-scale

particle-based simulations of hydrodynamically interacting swimmers, to the development and

application of continuum kinetic theories. Our analyses and simulations of these models demonstrate

the existence of large-scale mixing and transport, dependent upon micro-mechanical swimming

mechanisms. I will also discuss recent work on how such dynamics is modified by chemo-sensing

and taxis.

Andrey Sokolov

Princeton University

[email protected]

“Swimming Bacteria at Work: Reduction of Effective Viscosity and Bacterial Machines”

Measurements of the shear viscosity in suspensions of swimming Bacillus subtilis in free standing

liquid films have revealed that the viscosity can decrease by up to a factor of seven compared to the

viscosity of the same liquid without bacteria or with non-motile bacteria. The reduction in viscosity

is observed in two complementary experiments: one studying the decay of a large vortex induced by a

moving probe and another measuring the viscous torque on a rotating magnetic particle immersed in

the film. In another series of experiments we showed that, in the regime of collective swimming, bacteria

can power submillimeter gears and primitive systems of gears with asymmetric teeth. The speed of

rotation depends on and is controlled by the amount of oxygen available to the bacteria.

Lei Zhang

University of Bonn

[email protected]

“An Atomistic to Continuum Model for Self-assembling Biopolymer Aggregates”

Self-assembling aggregates are formed by single monomers such as peptides. They have a hierarchy

of secondary structures and can be used to model the development of pathological situations such as

Alzheimer's, prions and Huntington's diseases. In this talk, we present a multiscale approach to

model the structural properties of biopolymer aggregates. We developed a theoretical framework

which relates the microscale molecular properties of the monomers to the mesoscopic mechanical

and geometrical properties of the biopolymer aggregates. In particular, one can link the elastic

bending modulus and spontaneous curvature of the self-assembled tapes with the side chain

interactions and the backbone twists of the peptide monomers. The curved nature of the self-

assembled tapes has important biological meaning when they are used as the building block of

higher-order structures. We provide a set of numerical tests to justify our continuum elastic model for

tapes. The results suggest that it should be applicable for higher-order self-assembled structures.

#5: Education and Outreach Program Two-Day Workshop October 30 – 31, 2009

Schedule:

Friday, October 30, 2009

8:15 Shuttle from Radisson to SAMSI (Group #1)

8:35 Shuttle from Radisson to SAMSI (Group #2)

8:55 Shuttle from Radisson to SAMSI (Group #3)

9:15-9:30 Welcome and Introductions, Pierre Gremaud (NCSU / SAMSI)

9:30-10:20 “Statistical Models for Environmental Science and Climate Change Research”

Murali Haran (Penn State)

10:20-10:50 R Demonstration, Benjamin Shaby (SAMSI)

10:50-11:10 Break

11:10-12:00 "Downscaling Outputs from Computer Models”

Veronica Berrocal (SAMSI)

12:00-1:00 Lunch

1:00-2:00 R Lab Related to Berrocal's Application

2:00-2:50 "Paleoclimate! Reconstructing Past Temperatures using Natural

Thermometers", Martin Tingley (SAMSI)

2:50-3:20 Break

3:20-3:50 MATLAB Demonstration, Jun Zhang (SAMSI)

3:50-4:30 MATLAB Lab Related to Tingley's Application

4:30-5:00 Graduate School and Career Options, Pierre Gremaud (NCSU / SAMSI)

5:00 Shuttle to Radisson (Group #1)

5:20 Shuttle to Radisson (Group #2)

5:40 Shuttle to Radisson (Group #3)

6:00 Dinner @ Radisson

Saturday, October 31, 2009

8:00 Shuttle from Radisson to SAMSI (Group #1)

8:20 Shuttle from Radisson to SAMSI (Group #2)

8:40 Shuttle from Radisson to SAMSI (Group #3)

9:00-9:50 "Where to go When the Volcano Blows"

Susie Bayarri (U. Valencia, Spain)

9:50-10:40 "Multi-site Time Series Analysis of Air Pollution and

Mortality", Howard Chang (SAMSI)

10:40-11:00 Break

11:00-12:00 R Lab Related to Chang's Application

Noon Adjournment and Departure

#6: Program on Sequential Monte Carlo Methods Transition Workshop November 9 – 10, 2009

Schedule:

Monday, November 9, 2009

8.30-8:50 Registration and Continental Breakfast

8.50-9:00 Welcome

9.00-9.30 Chiranjit Mukherjee, Duke University

“Sequential Monte Carlo in Model Comparison: Example in Cellular Dynamics

in Systems Biology”

9.30-10.00 Ioanna Manolopoulou, Duke University

“Adaptive Sequential Re-Sampling from Very Large Datasets in Mixture

Modeling”

10.00-10.30 Sarah Schott, Duke University

“Speeding Up the Product Estimator Using Random Temperatures”

10.30-11:00 Break

11:00-11:30 Simon Godsill, Cambridge University

“Particles, Points and groups - Bayesian Tracking Methods”

11:30-12:00 Francois Septier, TELECOM Lille 1

“Multi-Target Tracking using MCMC-Based Particle Algorithm”

12:00-12:30 Chunlin Ji, Duke University

“Bayesian Nonparametric Modelling for Dynamic Spatial Point Processes”

12.30-1:45 Lunch

1:45 – 2:15 David Dunson, Duke University and Sourish Das, Duke and SAMSI

“Gaussian Process Latent Variable Models: Theoretical Properties and Efficient

Computation”

2:15 – 2:45 Minghui Shi, Duke University

“Bayesian Variable Selection via Particle Stochastic Search”

2:45-3:15 Carlos Carvalho, Chicago Business School

"Model Assessment and Sequential Design"

3:15-3.45 Break

3.45 - 4.15 Hedibert Lopes, Chicago Business School

“Particle Learning for Sequential Bayesian Computation”

4:15 – 5:00 Second Chances and Discussion

Tuesday, November 10, 2009

8.30-9:00 Registration and Continental Breakfast

9.00-9.30 Julien Cornebise, SAMSI

“Auxiliary SMC Samplers and Applications to ABC”

9:30 – 10:00 Gareth Peters, University of New South Wales

“Bayesian Alpha Stable Models via SMC Samplers PRC-ABC”

10:00-10.30 Jan Hannig, University of North Carolina Chapel Hill

"An Account of Approaches for SMC in Generalized Fiducial Inference"

10.30-11.00 Break

11.00-11.30 Arnaud Doucet , Institute of Statistical Mathematics / University British Columbia

“Forward Smoothing using Sequential Monte Carlo”

11.30-12:00 Jim Lynch, University of South Carolina

“Some Issues Regarding the Reliability of Load-sharing Systems”

12:00-12:30 Anand Vidyashankar , Cornell University

“State Space Models for Branching Processes”

12.30-1:45 Lunch

1:45-2:15 Giovanni Petris, University of Arkansas

“SMC for Transdimensional Bayesian Inference”

2:15-2:45 Konstantin Lyubimov, University of Georgia

“Sequential Monte Carlo Analysis of a Dynamic Model of Mortgage Termination”

2:45-3.15 Edsel Pena, University of South Carolina

“Bayes Multiple Decision Functions”

3.15-3.45 Break

3:45 – 4:15 Bin Liu, Duke University and SAMSI

“Adaptive T-mixture Importance Sampling Method”

4:15 – 5:00 Second Chances and Closing Discussion

SPEAKER ABSTRACTS

Carlos Carvalho University of Chicago

[email protected]

"Model Assessment and Sequential Design"

I will provide an update on the activities of the MAAD working group. I will follow with a

presentation of the working paper "Dynamic Financial Index Models" (joint work with Hao

Wang and Craig Reeson).

Julien Cornebise SAMSI

[email protected]

“Auxiliary SMC Samplers and Applications to ABC”

SMC Samplers, as introduced by Del Moral et al. (2006, JRSSB), are a cornerstone extending

SMC methodology to arbitrary sequences of distributions, not necessarily linked by recurrence

relations, in contrast to the originally motivating problem of filtering state space models.

In this talk, we present an auxiliary variable approach to those new algorithms, introducing

adjustment weights in the resampling step, in a similar way to those brought to SMC filters by

Pitt and Shephard (1999, JASA). Based upon this, we provide new insights and interpretation of

the most recent Approximate Bayesian Computation (ABC)-SMC likelihood-free algorithms.

Arnaud Doucet University of British Columbia

[email protected]

“Forward Smoothing using Sequential Monte Carlo”

We propose a new SMC algorithm to compute the expectation of additive functionals

recursively. Whereas the standard path-based SMC estimator has an asymptotic variance

increasing quadratically with time even under favourable mixing assumptions, the variance of

the proposed SMC estimator only increases linearly with time. This allows us to derive several

new recursive parameter estimation procedures which do not suffer from the particle path

degeneracy problem.

This is joint work with Pierre Del Moral (INRIA & University Bordeaux

I) and Sumeetpal S. Singh (Cambridge University).
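
The particle path degeneracy mentioned in this abstract can be illustrated with a basic bootstrap particle filter that records resampling ancestry. This is only a sketch of the standard path-based setting, not the forward smoothing algorithm proposed in the talk; the linear Gaussian model, its parameters, and all function names are assumptions made for illustration.

```python
import numpy as np


def bootstrap_filter(y, n_particles=200, phi=0.9, seed=0):
    """Bootstrap particle filter for x_t = phi * x_{t-1} + N(0, 1), y_t = x_t + N(0, 1).

    Returns the filtering means and the resampling ancestor indices per step.
    """
    rng = np.random.default_rng(seed)
    n_steps = len(y)
    x = rng.normal(0.0, 1.0, size=n_particles)        # initial particles
    ancestors = np.empty((n_steps, n_particles), dtype=int)
    means = np.empty(n_steps)
    for t in range(n_steps):
        if t > 0:
            x = phi * x + rng.normal(0.0, 1.0, size=n_particles)  # propagate
        logw = -0.5 * (y[t] - x) ** 2                  # Gaussian log-likelihood
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means[t] = np.sum(w * x)                       # weighted filtering mean
        idx = rng.choice(n_particles, size=n_particles, p=w)  # multinomial resampling
        ancestors[t] = idx
        x = x[idx]
    return means, ancestors


def distinct_initial_ancestors(ancestors):
    """Count how many time-0 particles survive in the final particle paths."""
    n_steps, n_particles = ancestors.shape
    lineage = np.arange(n_particles)
    for t in range(n_steps - 1, -1, -1):               # trace lineages backwards
        lineage = ancestors[t][lineage]
    return len(np.unique(lineage))


# Simulate data from the assumed model and run the filter.
rng = np.random.default_rng(1)
n_steps = 100
x_true = np.zeros(n_steps)
x_true[0] = rng.normal()
for t in range(1, n_steps):
    x_true[t] = 0.9 * x_true[t - 1] + rng.normal()
y = x_true + rng.normal(size=n_steps)

means, ancestors = bootstrap_filter(y)
n_roots = distinct_initial_ancestors(ancestors)
print(f"distinct time-0 ancestors among {ancestors.shape[1]} final paths: {n_roots}")
```

Repeated resampling leaves few distinct early ancestors among the surviving paths, so path-based estimates of additive functionals rest on very few distinct early states.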

David Dunson Duke University

[email protected]

“Gaussian Process Latent Variable Models and SMC Computation”

Many interesting models for nonparametric density estimation, dimensionality reduction, flexible

regression, functional data analysis, and spatio-temporal data can be characterized using latent

Gaussian processes. A broad class of such models is described, along with an SMC approach related to

particle learning for posterior computation.

Simon Godsill University of Cambridge

[email protected]

“Particles, Points and groups - Bayesian Tracking Methods”

Authors: Simon Godsill, Francois Septier, Avishy Carmi and Sze Kim Pang.

In this talk I will describe continuous time dynamical models for groups of interacting objects,

such as vehicles, cells, etc. moving in a spatial scene. The models allow for changes in

groupings over time and interactions are captured in a behavioural fashion - objects react to the

behaviour of their neighbours by including a multivariate interaction term in the drift component

of a diffusion process. Inference is carried out

on-line using MCMC adaptations of standard sequential updating.

Jan Hannig

North Carolina State University

[email protected]

We explore the use of Sequential Monte Carlo for generating samples from a generalized fiducial

distribution. We report on both failures and successes of various approaches we undertook.

Chunlin Ji Duke University

[email protected]

“Bayesian Nonparametric Modelling for Dynamic Spatial Point Processes”

We discuss flexible Bayesian nonparametric modelling for inhomogeneous spatio-temporal

processes. This involves nonparametric spatial process mixture models of intensity functions, in

which time variation is introduced via dynamic models for underlying parameters. These models

characterize smooth dynamics in time in what may be quite complicated spatial patterns of

spatial inhomogeneity in intensity functions. The framework is based on a time-varying Dirichlet

process partition scheme, and physically attractive time propagation models for parameters of

nonparametric mixture models for intensities. Bayesian inference and model fitting

involve novel particle filtering ideas and methods. Illustrative simulation examples in extended

target tracking, and substantive data analysis in applications in cell fluorescent microscopic

imaging tracking demonstrate analysis with these models.

Bin Liu Duke University

[email protected]

“Adaptive T-mixture Importance Sampling Method”

We propose an adaptive importance sampling (IS) method aiming to sample from a static target

distribution, which can be multimodal and also be defined in a high-dimensional parameter

space. This method mainly consists of a mixture training procedure that dynamically adapts a

mixture of Student's t distributions to approximate the target, as measured by an entropy

criterion. In particular, both the mixture parameters and the components allocation are updated in

a sequential manner. Taking the refined mixture as importance density, we can obtain accurate IS

estimates of the quantities we desire. A multivariate t-mixture approximation to the target comes

as a natural by-product, from which we can estimate normalizing constants. Relying on

this property, we are using the method for the adaptive detection of extra-solar planets.

Results from both a simulation study and an analysis of real astronomical data are presented.
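
As a rough illustration of how a heavy-tailed proposal yields a normalizing-constant estimate, the sketch below uses plain importance sampling with a single Student's t proposal. It is not the adaptive t-mixture method of the talk: the one-dimensional Gaussian target, the fixed 3 degrees of freedom, and all function names are assumptions made for this example.

```python
import math
import random

NU = 3.0  # degrees of freedom of the proposal (illustrative choice)

def unnormalized_target(x):
    """Unnormalized standard normal density; its true constant is sqrt(2*pi)."""
    return math.exp(-0.5 * x * x)

def t_pdf(x, nu=NU):
    """Density of the Student's t distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1.0) / 2.0 * math.log1p(x * x / nu))

def t_sample(rng, nu=NU):
    """Draw from Student's t as a normal divided by sqrt(chi-square / nu)."""
    chi2 = rng.gammavariate(nu / 2.0, 2.0)  # chi-square(nu) = Gamma(nu/2, scale 2)
    return rng.gauss(0.0, 1.0) / math.sqrt(chi2 / nu)

def estimate_normalizing_constant(n_samples, seed=0):
    """Average the importance weights target(x) / proposal(x)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = t_sample(rng)
        total += unnormalized_target(x) / t_pdf(x)
    return total / n_samples

z_hat = estimate_normalizing_constant(20000)
z_true = math.sqrt(2.0 * math.pi)
print(f"estimate {z_hat:.4f}, exact {z_true:.4f}")
```

Because the t proposal has heavier tails than the Gaussian target, the importance weights stay bounded, which is one reason t mixtures are a common choice of importance density.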

Hedibert Lopes

University of Chicago

[email protected]

“Particle Learning for Sequential Bayesian Computation”

Authors: Hedibert Lopes, Carlos Carvalho, Michael Johannes and Nicholas Polson

In this paper we develop a simulation-based approach to sequential Bayesian computation.

Particle learning provides a simple yet powerful framework for efficient sequential posterior

sampling strategies. The key aspect of PL is its ability to deal effectively with the problem of

sequential inference for fixed parameters defining general state space models. We illustrate our

methodology in a range of models and applications. We start by working in a wide class of

conditionally Gaussian state space models and generalized dynamic linear models. We further

demonstrate the generality of PL by deriving sequential algorithms for nonparametric Bayesian

and general mixture models. In addition, we extensively study issues related to convergence

diagnostics and provide simulation-based evidence of the efficiency of our particle learning

methodology. We emphasize that sequential updating offers posterior and predictive inferences

together with marginal likelihoods for model assessment as a direct by-product of our analysis so

that, in many cases, particle learning provides a natural alternative to MCMC based strategies.

Jim Lynch University of South Carolina

[email protected]

“Some Issues Regarding the Reliability of Load-sharing Systems”

These include natural censoring that occurs in such systems as well as the development of

reliability functions for systems of dependent components where the dependencies are based on

load-sharing considerations.

Konstantin Lyubimov

University of Georgia

[email protected]

“Sequential Monte Carlo Analysis of a Dynamic Model of Mortgage Termination”

This paper deals with the estimation of the parameters of the latent intensity of mortgage

default and prepayment. Mortgage termination is modeled via a doubly stochastic Poisson process,

the intensity of which depends on observable loan characteristics, dynamic macroeconomic factors

such as interest rates and house prices, and a latent factor capturing duration dependence. The

latter is assumed to be of diffusion type. Parameters of the latent processes (one for

prepayment and one for default) are estimated using an auxiliary particle filter applied to the payment

histories of a large sample of U.S. residential mortgages. The authors calibrate the model to market

prices and test different specifications of the latent intensity process.

Ioanna Manolopoulou Duke University

[email protected]

“Adaptive Sequential Re-Sampling from Very Large Datasets in Mixture Modeling”

One of the challenges of Markov Chain Monte Carlo in large data sets is the need to scan

through the whole data at each iteration of the sampler, which can be computationally

prohibitive. Here we consider the case where we focus inferences on observations in a (not

necessarily known) region in sample space. The motivating application arises in flow cytometry,

where interest lies in identifying specific rare cell subtypes and characterizing them according to

their corresponding markers. We develop a Sequential Monte Carlo approach whereby a targeted

sample from the region of interest is sequentially augmented, while the estimates for the

parameters evolve. We implement our procedure on an assessment of the immune response to

cytomegalovirus (CMV)-peptide stimulation using 4-color flow cytometry.

Chiranjit Mukherjee

Duke University

[email protected]

“Sequential Monte Carlo in Model Comparison: Example in Cellular Dynamics in Systems

Biology”

Sequential Monte Carlo analysis of time series provides a direct approach to evaluating

approximate model marginal likelihoods for model comparison. We exemplify this in studies of

dynamic bacterial communication in systems biology, where a long sequence of state vectors

follow a complicated nonlinear dynamic model with several defining biochemical parameters.

MCMC methods do not mix well in these contexts, and do not lead easily to reliable estimates of

model marginal likelihood. We develop an auxiliary particle filtering algorithm that

simultaneously updates latent states and fixed parameters. Our algorithm takes advantage of

distributed computing to carry forward a huge number of particles to ensure accuracy of the

estimates. Marginal likelihood computation is developed and illustrated in evaluation of

relevance of selected model components.
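
To make concrete how an SMC pass over a time series yields an approximate marginal likelihood, here is a minimal sketch. It uses a plain bootstrap particle filter with a known parameter, not the auxiliary filter with parameter learning described in the talk. The linear Gaussian model, the value of `PHI`, and all function names are assumptions made for this example; the model was chosen because a Kalman filter gives the exact answer for comparison.

```python
import math
import random

def normal_logpdf(y, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)

def simulate(n_steps, phi, rng):
    """Simulate y_t = x_t + noise with x_t = phi * x_{t-1} + noise and x_0 = 0."""
    x, ys = 0.0, []
    for _ in range(n_steps):
        x = phi * x + rng.gauss(0.0, 1.0)
        ys.append(x + rng.gauss(0.0, 1.0))
    return ys

def bootstrap_pf_loglik(ys, phi, n_particles, rng):
    """Estimate log p(y_1..y_T) with a bootstrap particle filter."""
    particles = [0.0] * n_particles
    loglik = 0.0
    for y in ys:
        # Propagate through the state equation, then weight by the likelihood.
        particles = [phi * x + rng.gauss(0.0, 1.0) for x in particles]
        logw = [normal_logpdf(y, x, 1.0) for x in particles]
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]
        # The average unnormalized weight estimates p(y_t | y_1..y_{t-1}).
        loglik += m + math.log(sum(w) / n_particles)
        # Multinomial resampling.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return loglik

def kalman_loglik(ys, phi):
    """Exact log-likelihood for the same model, used as a reference."""
    mean, var, loglik = 0.0, 0.0, 0.0
    for y in ys:
        mean, var = phi * mean, phi * phi * var + 1.0  # predict
        s = var + 1.0                                   # predictive variance of y
        loglik += normal_logpdf(y, mean, s)
        gain = var / s
        mean, var = mean + gain * (y - mean), (1.0 - gain) * var  # update
    return loglik

rng = random.Random(42)
PHI = 0.9
ys = simulate(50, PHI, rng)
pf_ll = bootstrap_pf_loglik(ys, PHI, 2000, rng)
kf_ll = kalman_loglik(ys, PHI)
print(f"particle filter {pf_ll:.3f}, Kalman filter {kf_ll:.3f}")
```

Running the filter under two candidate models and differencing the log-likelihood estimates gives an approximate log Bayes factor, which is the kind of model comparison the abstract refers to.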

Edsel Pena University of South Carolina

[email protected]

The proliferation of high-throughput technology, epitomized by the microarray, has made high-

dimensional decision-making of central and critical importance in many scientific disciplines.

Particular examples of this multiple decision problem are that of simultaneously testing a large

number of pairs of hypotheses, as well as that of multiple classifications or more generally

multiple predictions. When assessing the consequences of actions in such multiple decision

problems, there is usually a need to utilize different measures of errors. In this talk we focus on

the false discovery (classification) and the missed discovery (classification) rates. We provide a

Bayesian solution to the problem

and pinpoint the two major computational aspects that arise: that of posterior computation, where

SMC methods are of relevance, and that of choosing the specific action among the 2^M possible

actions, where M is the dimension of the multiple decision problem. We provide an efficient

algorithm for solving the latter computational problem.

Gareth Peters

University of New South Wales

[email protected]

“Bayesian Alpha Stable Models via SMC Samplers PRC-ABC”

This talk focuses on a recent paper in which I develop a novel Bayesian modelling framework

based on likelihood-free inference for both univariate and multivariate alpha-stable distributions.

The multivariate setting is of particular interest as it involves formulating a Bayesian model for

the spectral mass and then utilizing likelihood-free methodology to work efficiently with such a

model.

In addition the new likelihood-free methodology for the alpha-stable setting involves

development of the SMC Samplers PRC-ABC algorithm for this context. The methodology is

then assessed and compared to alternative approaches.

Giovanni Petris University of Arkansas

[email protected]

“SMC for Transdimensional Bayesian Inference”

We present a new approach that simplifies the application of SMC to transdimensional Bayesian

inference. The proposed method is based on a simple geometric argument that has already been

applied successfully in transdimensional MCMC settings. Applications to variable selection and

to mixture densities with an unknown number of components will be used to illustrate the

method.

Sarah Schott Duke University

[email protected]

“Speeding Up the Product Estimator Using Random Temperatures”

Many problems consist of finding the ratio between the measure of B and the measure of B'

where B' ⊂ B. Typically, one of these measures is relatively easy to compute, and in finding the

ratio, we can approximate the other. In standard Monte Carlo methods, this ratio is approximated

by drawing samples from B, and counting the number that lie in subset B'. But such methods

require the number of samples to be of the same order as the ratio itself, which is problematic

when B' is exponentially small in the size of B. To overcome this difficulty, the product

estimator introduces a sequence of nested subsets between B and B', indexed by a parameter β.

In the product estimator, the number of subsets is fixed. In the new approach presented here, the

sequence of subsets is adaptive. The resulting algorithm is nearly optimal and much easier to

analyze. Moreover, the new

method allows for the simultaneous estimation of the size of the set for all values of β.
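
The fixed-schedule product estimator described above can be sketched as follows. This is the classical version with a predetermined number of nested sets, not the adaptive random-temperature scheme of the talk. The nested Euclidean balls, the geometric radius schedule, and all names are assumptions made for this example; balls were chosen so the exact ratio is known.

```python
import math
import random

def sample_ball(rng, dim, radius):
    """Draw one point uniformly from the dim-dimensional ball of the given radius."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in g))
    r = radius * rng.random() ** (1.0 / dim)
    return [r * x / norm for x in g]

def product_estimator(dim, radii, n_per_stage, seed=0):
    """Estimate vol(ball(radii[-1])) / vol(ball(radii[0])) as a product of stage ratios."""
    rng = random.Random(seed)
    estimate = 1.0
    for r_outer, r_inner in zip(radii[:-1], radii[1:]):
        hits = 0
        for _ in range(n_per_stage):
            p = sample_ball(rng, dim, r_outer)
            if math.sqrt(sum(x * x for x in p)) <= r_inner:
                hits += 1
        # Each stage ratio is moderate, so it is cheap to estimate accurately.
        estimate *= hits / n_per_stage
    return estimate

dim = 20
n_stages = 20
# Geometric schedule of radii from 1 down to 0.5; each stage ratio is about 1/2.
radii = [0.5 ** (k / n_stages) for k in range(n_stages + 1)]
est = product_estimator(dim, radii, n_per_stage=2000)
true_ratio = 0.5 ** dim
print(f"estimate {est:.3e}, exact {true_ratio:.3e}")
```

With 40,000 samples in total the product recovers a ratio near one in a million, whereas direct sampling from the outer ball would be expected to produce no hits at all with the same budget.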

Francois Septier TELECOM Lille 1

[email protected]

“Multi-Target Tracking using MCMC-Based Particle Algorithm”

Detection and tracking of multiple targets are essential components of modern sensor systems.

The purpose of multiple target tracking algorithms is to determine the number of targets and their

respective kinematic parameters from sequences of noisy observations. The difficulty of this

problem has increased as sensor systems in the modern battlefield are required to detect and

track targets at very low probability of detection and in hostile environments with heavy clutter.

In this talk, I will present an overview of Markov chain Monte Carlo (MCMC) methods for

sequential simulation from posterior distributions, which represent alternatives to SMC methods.

Then, an implementation of this MCMC-Based particle algorithm to perform the sequential

inference for multitarget tracking will be described and compared with other particle methods by

using a novel evaluation metric.

Minghui Shi Duke University

[email protected]

“Bayesian Variable Selection via Particle Stochastic Search”

It has become routine in many application areas to collect high-dimensional sets of candidate

predictors, and there is a need for new methods for searching the massive dimensional model

space for promising subsets. Although a number of sparse estimation methods have been

proposed, such as the Lasso and elastic net, such methods do not allow for uncertainty in variable

selection. An alternative is to use Markov chain Monte Carlo (MCMC) to conduct a stochastic

search of the model space, but such methods face major computational challenges as the number

of candidate predictors increases. This article proposes a new computational approach based on

sequential Monte Carlo, which we refer to as particle stochastic search (PSS). We illustrate PSS

through applications to linear regression and probit models.

Anand Vidyashankar

George Mason University

[email protected]

“State Space Models for Branching Processes”

Branching processes and their variants are extensively used to model data arising in several

investigations in contemporary science: PCR, clinical trials, ecology, and finance, among

others. A common theme, in several of these applications, is that the state space of the branching

process is unobservable while some information concerning the relationship between the state

space and the observable space can be obtained from various scientific sources. In this talk, I will

develop the notion of state space models for branching processes and describe a Bayesian

approach to inference that is facilitated by sequential Monte Carlo methods. Simulation results

and illustration with real data will also be provided.

#7: GEOMED: Spatial Epidemiology 2009 Workshop November 14 – 16, 2009

Charleston, SC

Note to self: Schedule needs to be converted to Word

#8: Program on Stochastic Dynamics Theory and Qualitative Behavior of Stochastic

Dynamics Workshop February 8 – 10, 2010

Schedule:

Monday, February 8, 2010

(SAMSI)

8:15-8:45 Registration and Continental Breakfast

8:45-9:00 Welcome

9:00 - 9:45 Luc Rey-Bellet, University of Massachusetts

“Billiards, Large Deviations, and Non-equilibrium Statistical Mechanics”

9:45- 10:30 Michael Scheutzow, Technische Universitaet Berlin

“Attractors and Expansion for Random Dynamical Systems on a Euclidean Space”

10:30 - 10:45 Break

10:45- 11:30 Peter Baxendale, University of Southern California

“Sustained Oscillations and Density Dependent Markov Chains”

11:30 - 12:00 Discussion of Morning talks

12:00 -1:00 Lunch

1:00 - 1:45 Stanislav Molchanov, University of North Carolina, Charlotte

“Stochastic Dynamics of Some Critical Ecosystems”

1:45 - 2:30 Alexei Novikov, Pennsylvania State University

“A Stochastic Particle System for the Burgers' Equation”

2:30 - 3:30 Break/ informal discussions

3:30- 4:00 Avanti Athreya, SAMSI / Duke University

“Metastability in a Nearly-Hamiltonian System”

4:00 - 4:30 Discussion of Afternoon talks

4:30 - 5:30 Short presentations / 5 min madness/ open problem discussion

5:30 -7:30 Poster Session and Reception

SAMSI will provide poster presentation boards and tape. The board dimensions are 4 ft.

wide by 3 ft. high. They are tri-fold with each side being 1 ft. wide and the center 2 ft.

wide. Please make sure your poster fits the board. The boards can accommodate up to

16 pages of paper measuring 8.5 inches by 11 inches.

Tuesday, February 9, 2010

(SAMSI)

8:30-9:00 Registration and Continental Breakfast

9:00 - 9:45 David Nualart, University of Kansas

“Feynman-Kac Formula for the Heat Equation Driven by Fractional Noise”

9:45- 10:30 Dmitry Dolgopyat, University of Maryland

“Quenched Limit Theorem for 1D Random Walk in Random Environment”

10:30 - 10:45 Break

10:45- 11:30 Grigorios Pavliotis, Imperial College, London

“Asymptotic Analysis for the Generalized Langevin Equation”

11:30 - 12:00 Discussion of Morning talks

12:00 -1:00 Lunch

1:00 - 1:45 Richard Sowers, University of Illinois

“A Stochastic Moving Boundary Value Problem”

1:45 - 2:30 Amarjit Budhiraja, University of North Carolina

“Some Large Deviation Problems for Infinite Dimensional Stochastic

Dynamical Systems”

2:30 - 3:30 Break/ informal discussions

3:30- 4:00 Scott McKinley, SAMSI / Duke University

“Anomalous Diffusion of Distinguished Particles in Bead-spring Networks”

4:00 - 4:30 Nawaf Bou-Rabee, New York University

“Nonasymptotic Behavior of MALA”

4:30 - 5:00 Discussion of Afternoon talks

Wednesday, February 10, 2010

(SAMSI)

8:30-9:00 Registration and Continental Breakfast

9:00 - 9:45 Barbara Gentz, University of Bielefeld

“Metastability for the Ginzburg-Landau Equation with Space-time White Noise”

9:45- 10:30 Yuri Bakhtin, Georgia Tech

“Diffusion Models of Sequential Decision Making”

10:30 - 10:45 Break

10:45- 11:30 TBD

11:30 - 12:30 Discussion of Morning talks

12:30 - 1:30 Lunch

1:30 Adjournment

SPEAKER TITLES/ABSTRACTS

Avanti Athreya SAMSI / Duke University

[email protected]

“Metastability in a Nearly-Hamiltonian System”

In physics, metastability often refers to the existence of long-lived but ultimately

unstable states of a physical system, such as a supersaturated solution. Mathematically,

metastability can be understood in the context of large deviations for dynamical systems with

random perturbations. We summarize some of the background (i.e. the Freidlin-Wentzell

theory) and describe how metastability can be interpreted

for a nearly-Hamiltonian system with one degree of freedom.

Yuri Bakhtin Georgia Tech

[email protected]

“Diffusion Models of Sequential Decision Making”

I will consider a class of mathematical models of decision making. These models are based on

dynamics in the neighborhood of unstable equilibria and involve random perturbations due to

small noise. I will report results on the vanishing noise limit for these systems, providing precise

predictions about the statistics of decision making times and sequences of unstable equilibria

visited by the process. Mathematically, the results are based on the analysis of random Poincare

maps in the neighborhood of each equilibrium point. I will also discuss some experimental data.

Peter Baxendale University of Southern California

[email protected]

“Sustained Oscillations and Density Dependent Markov Chains”

A number of recent papers have considered the phenomenon of sustained oscillations in density

dependent Markov models of processes in areas such as interacting populations, epidemics, and

chemical kinetics. This talk will consist of three parts. One part will describe a recent stochastic

averaging result which explains the existence of sustained oscillations for a class of 2-

dimensional diffusion processes. A second part will survey the transition from the original

density dependent Markov chain to a diffusion approximation. (For mathematicians this is work

of Kurtz from the 1970s; for physicists,

chemists and biologists this is the Van Kampen expansion). The last part will bring these ideas

together to provide both qualitative and quantitative results on sustained oscillations for density

dependent Markov chains.

Nawaf Bou-Rabee New York University

[email protected]

“Nonasymptotic Behavior of MALA”

MALA, a Monte-Carlo method for sampling from a high-dimensional probability distribution,

can now be used to solve SDEs. I will briefly review this recent result and show why MALA is

suited for computations of quantities like equilibrium correlation functions of molecular systems.

The remainder (and majority) of my talk answers a question of Martin Hairer on the rate of

convergence of MALA to equilibrium, i.e., how long does it take for MALA to be oblivious

about where it started?
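
For readers unfamiliar with MALA (the Metropolis-adjusted Langevin algorithm), a minimal sketch follows. The one-dimensional standard normal target, the step size `h = 1.0`, the starting point, and all function names are assumptions made for this example and are not taken from the talk.

```python
import math
import random

def log_target(x):
    """Log-density of the target up to a constant (standard normal here)."""
    return -0.5 * x * x

def grad_log_target(x):
    return -x

def log_q(x_to, x_from, h):
    """Log-density (up to a constant) of the Langevin proposal x_from -> x_to."""
    mean = x_from + 0.5 * h * grad_log_target(x_from)
    return -((x_to - mean) ** 2) / (2.0 * h)

def mala(n_steps, h, x0=3.0, seed=1):
    rng = random.Random(seed)
    x, chain, accepted = x0, [], 0
    for _ in range(n_steps):
        # One Euler-Maruyama step of the Langevin diffusion serves as the proposal.
        prop = x + 0.5 * h * grad_log_target(x) + math.sqrt(h) * rng.gauss(0.0, 1.0)
        # The Metropolis-Hastings correction removes the discretization bias.
        log_alpha = (log_target(prop) + log_q(x, prop, h)
                     - log_target(x) - log_q(prop, x, h))
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x = prop
            accepted += 1
        chain.append(x)
    return chain, accepted / n_steps

chain, accept_rate = mala(20000, 1.0)
samples = chain[1000:]  # discard an initial transient from the far-off start
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"mean {mean:.3f}, variance {var:.3f}, acceptance rate {accept_rate:.2f}")
```

The chain is started away from the mode on purpose: how quickly it forgets that starting point is the question the abstract raises.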

Sandra Cerrai University of Maryland

[email protected]

“On the Approximation of Quasipotentials”

By using arguments of $\Gamma$-convergence, we show that the quasi-potential $V_\delta$,

corresponding to a class of reaction-diffusion equations in space dimension $d>1$ perturbed by

a Gaussian noise $\partial w^\delta/\partial t$ which is white in time and colored in space,

converges to the quasi-potential $V$, corresponding to space-time white noise, when the noise

has a small correlation radius $\delta$, so that it converges to the white noise $\partial w/\partial

t$, as $\delta\downarrow 0$.

We show how these results apply to the asymptotic estimate of the mean of the exit time of the

solution of the stochastic problem from a domain of attraction for the unperturbed system.

Dmitry Dolgopyat

University of Maryland

[email protected]

“Quenched Limit Theorem for 1D Random Walk in Random Environment”

1D random walk in random environment spends most of the time at the traps. We consider the

transient case and show that with the proper definition of traps the times spent at different traps

are independent exponential random variables (with random means). This is joint work with

Ilya Goldsheid.

Barbara Gentz

University of Bielefeld

[email protected]

“Metastability for the Ginzburg-Landau Equation with Space-time White Noise”

We consider a Ginzburg-Landau partial differential equation in a bounded interval, perturbed by

weak space-time white noise. As the length of the interval increases, a transition between

activation regimes occurs. Consequently, the classical Kramers formula for the expected

transition time between metastable states breaks down. We will show how recent results by

Bovier, Eckhoff, Gayrard and Klein, providing the first rigorous proof of the Eyring-Kramers

formula, can be extended to this degenerate and infinite-dimensional setting in order to derive

the subexponential asymptotics of the transition time at the critical interval length.

This is joint work with Florent Barret (Palaiseau) and Nils Berglund (Orléans).

Scott McKinley SAMSI / Duke University

[email protected]

“Anomalous Diffusion of Distinguished Particles in Bead-spring Networks”

We consider the dynamics of a distinguished particle in a network of thermally fluctuating beads

that interact with each other through linear springs. Such processes can be expressed as the sum

of a Brownian motion with a large number of Ornstein-Uhlenbeck processes. We introduce a

single parameter which can be tuned to produce a wide variety of sub-diffusive behavior -- a

long-term mean-squared displacement satisfying $\mathbb{E}[x^2(t)] \sim t^\nu$ where $\nu \leq 1$ -- for

the generic sum-of-OU structure and demonstrate the relationship between this parameter and the

geometric structure of the bead-spring connection network in which the distinguished particle

resides. This development provides a basis to prove a conjecture from the physics community

that the Rouse exponent $\nu = 1/2$ is universal among a wide class of models.

Stanislav Molchanov

University of North Carolina, Charlotte

[email protected]

“Stochastic Dynamics of Some Critical Ecosystems”

Co-authors: Professor Leonid Bogachev (University of Leeds, UK), Professor Gregory Derfel

(Ben-Gurion University, Israel), Yakin Fang (UNCC), Joseph Whitmayer (UNCC)

The talk will present several results on the limiting behavior of the reaction-diffusion equations

in $\mathbb{Z}^d$, $d \geq 1$, i.e. branching processes with diffusion (or more complicated underlying random

motion). The most interesting is the case of the critical populations where mortality and birth

rates are equal and one can expect the existence of a limiting distribution, stationary in time and space,

for the particle field $n(t,x)$, $x \in \mathbb{Z}^d$.

The non-linear equation for the generating functions (of the Fisher-Kolmogorov-Petrovskii-

Piskunov type, FKPP) provides the infinite system of the equations for the correlation functions.

The analysis of these equations provides the desirable limit theorem

on $\mathbb{Z}^d$, $d \geq 3$, and (under special assumptions) on $\mathbb{Z}^2$. In the 2D case we can also include the

immigration into our model. The limiting point field demonstrates many features of the real

biosystems (“patches” in the space distribution, deviations from the Poissonian statistics etc.).

Alexei Novikov

Pennsylvania State University

[email protected]

“A Stochastic Particle System for the Burgers' Equation”

We study the dissipation mechanism of a stochastic particle system for the Burgers' equation.

Recently, P. Constantin and G. Iyer formulated and studied the velocity field of the viscous Burgers'

equation as the expected value of a certain stochastic process based on noisy particle

trajectories. However, a Monte-Carlo version of the above does not have a viscous dissipation

mechanism. In particular, this Monte-Carlo system develops shocks in finite time. We consider a

resetting procedure that regularizes this Monte-Carlo system by adding more `Markovianity' to

it. This is joint work with G. Iyer.

David Nualart University of Kansas

[email protected]

“Feynman-Kac Formula for the Heat Equation Driven by Fractional Noise”

We will present a Feynman-Kac formula for the stochastic heat equation driven by a

multiplicative fractional Brownian sheet. We will discuss the smoothness of the density of the

solution, and the Hölder regularity in space and time variables. This is joint work with

Yaozhong Hu and Jian Song.

Grigorios Pavliotis Imperial College, London

[email protected]

“Asymptotic Analysis for the Generalized Langevin Equation”

I will present some recent results on the qualitative properties of solutions to the generalized

(non-Markovian) Langevin equation (GLE). After writing the GLE as a Markov process in an

extended phase space, I will present results on a) estimates on the (exponential) rate of

convergence to equilibrium, b) bounds on derivatives of the associated Markov semigroup, c) a

homogenization theorem, and d) convergence to the (Markovian) Langevin equation. Some

applications of our results will also be discussed.

Luc Rey-Bellet University of Massachusetts, Amherst

[email protected]

“Billiards, Large Deviations, and Non-equilibrium Statistical Mechanics”

We present results (joint with L.-S. Young (NYU)) on large deviation estimates for chaotic

billiards and non-uniformly hyperbolic dynamical systems. We also discuss applications of

these results to the study of non-equilibrium steady states in statistical

mechanics.

Michael Scheutzow

Technische Universitaet Berlin

[email protected]

“Attractors and Expansion for Random Dynamical Systems on a Euclidean Space”

We provide sufficient conditions on the coefficients of a stochastic differential equation on a

Euclidean space in order that the associated random dynamical system (rds) admits a random

attractor. Roughly speaking, the drift vector field has to point towards the origin in order for an

attractor to exist. For the proof, we use chaining techniques in order to control the growth of

small balls under the rds. We also provide an example of an rds generated by a stochastic

differential equation which does not possess an attractor even though its Markov semigroup

admits a (unique) invariant probability measure.

This is joint work with Georgi Dimitroff (Kaiserslautern).

Richard Sowers University of Illinois

[email protected]

“A Stochastic Moving Boundary Value Problem”

We consider a stochastic perturbation of a two-phase moving boundary problem proposed by

Ludford and Stewart and studied by Caffarelli and Vazquez. There have been a number of fairly

recent works on the effects of perturbations of Stefan-like problems by Wiener fields. Our

contribution is the effect of a multiplicative term modifying the noise. This ensures that there is

only a single boundary. We understand the existence and uniqueness of the SPDE, and its

boundary regularity. This work is joint with Kunwoo Kim and Carl Mueller.

#9: Program on Space-Time Analysis for Environmental Mapping, Epidemiology and

Climate Change February 17 – 19, 2010

Schedule:

Wednesday, February 17

Radisson Hotel RTP

8:15-8:50 Registration and Continental Breakfast

8:50 -9:00 Welcome

9:00 – 12:00 Session 1: Paleoclimate Analysis

Chair, Murali Haran, Pennsylvania State University

Tapio Schneider, California Institute of Technology

“Reconstructing Past Climates from Proxies: Statistical Challenges”

Caspar Ammann, National Center for Atmospheric Research

Break (30 min)

Working Group Session

Paleoclimate: Bala Rajaratnam, Stanford University

Discussion Session

Bo Li, Purdue University

12:00-1:00 Lunch

1:00 – 4:30 Session 2: Extremes

Chair: Peter Craigmile, Ohio State University

Anthony Davison, EPFL

“Geostatistics of Extremes”

Debbie Dupuis, HEC Montréal

“On Modelling (Seasonally Adjusted) Extreme Daily Average Temperatures”

Mike Wehner, Lawrence Berkeley National Laboratory

"Application of Generalized Extreme Value Theory to Coupled General

Circulation Models"

Break (30 min.)

Working Group Session

Spatial Extremes: Richard Smith, University of North Carolina

Discussion Session

Richard Smith, University of North Carolina

4:30-5:30 Tutorial: “S-T Model for Oceanic and Climatological Variables”

Bruno Sanso, University of California, Santa Cruz

5:30-6:00 Poster Advertisements (2 minute ads each)

6:00-8:00 Poster Session and Reception

SAMSI will provide poster presentation boards and tape. The board

dimensions are 4 ft. wide by 3 ft. high. They are tri-fold with each side

being 1 ft. wide and the center 2 ft. wide. Please make sure your poster fits

the board. The boards can accommodate up to 16 pages of paper

measuring 8.5 inches by 11 inches.

Thursday, February 18

Radisson Hotel RTP

8:15-9:00 Registration and Continental Breakfast

9:00 – 12:30 Session 3: Climate Models

Chair: Hans Künsch, ETH Zurich

Ilya Timofeyev, University of Houston

“Parametric Estimation of Effective Stochastic Models from Discrete Data”

Grant Branstator, National Center for Atmospheric Research

“Climate Response and the Fluctuation-Dissipation Theorem”

David Stainforth, London School of Economics

“Challenges in the Extraction of Decision Relevant Information from

Multi-Decadal Ensembles of GCMs”

Break (30 min)

Working Group Session

Stochastic vs Deterministic Models: Peter Kramer, RPI

Discussion Session

Marc Genton, Texas A&M University

12:30-1:45 Lunch

1:45 -5:15 Session 4: Regional Climate Modeling and Downscaling

Chair: Marco Ferreira, University of Missouri

Linda Mearns, National Center for Atmospheric Research

“The North American Regional Climate Change Assessment Program

(NARCCAP): An Overview”

Cari Kaufman, University of California, Berkeley

“Functional ANOVA Models for Regional Climate Model Experiments”

Ernst Linder, University of New Hampshire

“Analyzing Regional Climate Model Outputs Using an Extended Model

for Large Spatio-Temporal Lattices”

Break (30 min.)

Working Group Session

Spatio-temporal Computation: Noel Cressie, Ohio State University

Geostats: Veronica Berrocal, Duke University & SAMSI

Discussion Session

Steve Sain, UCAR

Friday, February 19

Radisson Hotel RTP

8:15-9:00 Registration and Continental Breakfast

9:00 – 12:30 Session 5: Uncertainty Quantification

Chair: Paul Baines, Harvard University

Chris Forest, Pennsylvania State University

“Statistical Calibration of Climate System Properties”

Hans Künsch, ETH Zurich

“Biases and Interannual Variability of Climate Projections”

Break (30 min.)

Dan Rowlands, Oxford University

“An Objective Bayesian Approach to Climate Forecasting”

Discussion Leader

Mark Berliner, Ohio State University

12:30 – 1:45 Lunch

SPEAKER TITLES/ABSTRACTS

Grant Branstator NCAR

[email protected]

“Climate Response and the Fluctuation-Dissipation Theorem”

One of the most common uses of climate models is for estimation of how the climate system will

react to changes in externally imposed system forcing. The physical processes involved are

complex and involve multiple scales so only comprehensive models give accurate estimates.

Such models are difficult to use for systematic studies or for answering optimization questions.

In this presentation we discuss how the fluctuation-dissipation theorem can be used to generate

response operators that give solutions that are very similar to the response that a comprehensive

model would give for

any imposed (weak) forcing. And because of the simple form of the operator it can be used for

systematic investigations and for optimization problems. Furthermore, these response operators

are calculated using only lag-correlation statistics of the undisturbed system; neither knowledge

of the governing equations nor observations of its response to external forcing is required. Not

only can the response of mean state variables be estimated in this way but also estimates of

functionals of state, including variances and eddy fluxes of bandpass fields.

After verifying the accuracy of FDT based response operators we use them to address optimum

excitation questions. For example, we find heating distributions that most efficiently change a)

regional mean surface temperature, b) the amplitude of circulation features like the North

Atlantic Oscillation, and c) stormtrack characteristics including precipitation and eddy variances.

We also use FDT operators to systematically study how moving heat sources in the tropics affect

midlatitude weather statistics, a subject of interest for extending the traditional limit of

predictability.
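
The idea of building a response operator from lag-correlation statistics alone can be illustrated with a minimal sketch. It assumes a scalar, discrete-time analogue of the quasi-Gaussian form of the theorem, in which the operator is the sum of lag autocorrelations of the unforced series. The AR(1) process, the function names, and the lag cutoff are illustrative assumptions and are not taken from the talk, which concerns multivariate operators derived from a comprehensive model.

```python
import random

def fdt_response_operator(series, max_lag):
    """Scalar discrete-time FDT estimate: rho(0) + rho(1) + ... + rho(max_lag),
    computed only from the unforced (undisturbed) series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev) / n
    total = 0.0
    for k in range(max_lag + 1):
        ck = sum(dev[i] * dev[i + k] for i in range(n - k)) / (n - k)
        total += ck / c0
    return total

def simulate_ar1(phi, n, seed=0):
    """Hypothetical stand-in for an undisturbed record: x[t+1] = phi*x[t] + noise."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

phi = 0.5
operator = fdt_response_operator(simulate_ar1(phi, 100_000), max_lag=10)
# Steady-state response of x[t+1] = phi*x[t] + f to a unit constant forcing f.
exact = 1.0 / (1.0 - phi)
print(operator, exact)
```

In this toy setting the estimate should land close to the exact response even though the code never sees the governing equation or a forced run, which is the property the abstract emphasizes.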

Anthony Davison EPFL

“Geostatistics of Extremes”

Statistical methods for the analysis of independent extremes are now well-developed and widely

used. However, methods for the analysis of extremes of spatial and spatio-temporal data are

much less well-developed. One approach that has been proposed is based on the application of

Bayesian computational tools to extremal models, but this seems unsatisfactory because the

usual formulation involves independence assumptions that contradict obvious properties of

spatial extremes. It seems more natural to attempt to fit max-stable processes, which extend the

classical extreme-value distributions in a natural way, but progress on this approach has been

blocked until recently because of a lack of flexible models for spatial max-stable processes and

because of difficulties in fitting the available models. In this talk I shall describe several families

of models and outline how they may be fitted, with applications to a dataset on temperature

extremes in the Alps. The work is joint with Mehdi Gholamrezaee.

Debbie Dupuis HEC Montreal

“On Modelling (Seasonally Adjusted) Extreme Daily Average Temperatures”

Various extreme value analysis techniques have been used to estimate the potential impact of

climate change on extreme phenomena. Some may be criticized for their inefficient use of

historical data, and our subsequent inability to validate estimates using adequate testing. In this

talk, we propose a model that can be validated using an out-of-sample analysis. We model the

time dynamics of daily average temperatures as a regression between daily detrended and

deseasonalized temperatures, including a seasonal error component to represent the observed

seasonal volatility. We estimate extreme daily average temperatures, normalized by the level of

volatility, using a generalized Pareto distribution after having reduced the extremal dependence

by declustering. The model is fitted to 50 years of daily observations recorded in New York City

and out-of-sample testing shows that the model performs well.
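
The last two modelling steps, declustering followed by a generalized Pareto fit to the excesses, can be sketched as follows. This is a minimal illustration under stated assumptions: it uses runs declustering and a method-of-moments fit as simple stand-ins, since the abstract does not say which declustering rule or estimator was used, and the toy series, threshold, and function names are hypothetical.

```python
import statistics

def decluster(series, threshold, run_length):
    """Runs declustering: a cluster of exceedances ends after `run_length`
    consecutive values at or below the threshold; keep each cluster maximum."""
    peaks, current, gap = [], None, 0
    for x in series:
        if x > threshold:
            current = x if current is None else max(current, x)
            gap = 0
        elif current is not None:
            gap += 1
            if gap >= run_length:
                peaks.append(current)
                current, gap = None, 0
    if current is not None:
        peaks.append(current)
    return peaks

def fit_gpd_moments(excesses):
    """Method-of-moments GPD fit (meaningful only for shape < 1/2).
    Uses mean = scale/(1-shape) and var = scale^2 / ((1-shape)^2 (1-2*shape))."""
    m = statistics.mean(excesses)
    v = statistics.variance(excesses)
    ratio = m * m / v
    shape = 0.5 * (1.0 - ratio)
    scale = 0.5 * m * (1.0 + ratio)
    return shape, scale

# Toy volatility-normalized residuals (made-up numbers, not from the talk).
temps = [0.1, 2.4, 3.1, 0.3, -0.2, 2.9, 0.0, -0.5, 0.4, 3.8, 2.2, 0.1, 0.0]
threshold = 2.0
peaks = decluster(temps, threshold, run_length=2)
excesses = [p - threshold for p in peaks]
shape, scale = fit_gpd_moments(excesses)
print(peaks, shape, scale)
```

Keeping only cluster maxima reduces the serial dependence among exceedances, which is the reason the abstract gives for declustering before the fit.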

Chris Forest

Pennsylvania State University

“Statistical Calibration of Climate System Properties”

The behavior of modern climate system simulators is controlled by numerous parameters. By

matching model outputs with observed data we can perform inference on such parameters. This

is a calibration problem that usually requires the ability to evaluate the computer code at any

given configuration of the parameters. As the climate system simulator attempts to describe very

complex physical phenomena, the task of running the model is very computationally demanding.

Thus, a statistical model is required to approximate the model output. In this work, we use output

from the Massachusetts Institute of Technology two-dimensional climate model (climate system

component of the MIT IGSM2.2), historical records and output from a three-dimensional climate

model, to obtain estimates of the climate sensitivity, the effective thermal diffusivity in the deep

ocean and the net aerosol forcing that control the MIT IGSM2.2. We use a Bayesian approach

that allows scientifically based information on the climate parameters to be used in

the calibration process. The model tackles the problem of dealing with multivariate computer

model output and incorporates all estimation uncertainties into the posterior distributions of the

climate parameters. Additionally we obtain estimates of the correlation structure of the unforced

variability of temperature change patterns. These results are critical for understanding

uncertainty in future climate change and provide an independent check that the information that

is contained in recent climate change is robust to statistical treatment. These results include

uncertainties in the estimation of the multivariate covariance matrices.

Cari Kaufman University of California, Berkeley

“Functional ANOVA Models for Regional Climate Model Experiments”

Functional analysis of variance (ANOVA) models partition a functional response according to

the main effects and interactions of various factors. Motivated by the question of how to compare

the sources of variability in climate models run under various conditions, we develop a general

framework for functional ANOVA modeling from a Bayesian viewpoint, assigning Gaussian

process prior distributions to each batch of functional effects. We discuss computationally

efficient strategies for posterior sampling using Markov Chain Monte Carlo algorithms, and we

emphasize useful graphical summaries based on the posterior distribution of model-based

analogues of the traditional

ANOVA decompositions of variance. We present a case study using these methods to analyze

data from the Prudence Project, a climate model inter-comparison study providing ensembles of

climate projections over Europe.

Hans Künsch

ETH Zurich

“Biases and Interannual Variability of Climate Projections”

We study not only averages, but also interannual variability of multi-model climate projections.

We introduce two different assumptions -- called constant bias and constant

relation, respectively -- for extrapolating the substantial additive and multiplicative biases

present during the control period to the scenario period. There are also strong indications that the

biases in the scenario period are different from the extrapolations from the control period.

Including such changes in the statistical models leads to an identifiability

problem that we solve in a frequentist analysis by a zero sum side condition and in a Bayesian

analysis by informative priors. We illustrate the methods by analysing outputs from regional

climate models from the ENSEMBLES and the PRUDENCE project.

Ernst Linder University of New Hampshire

“Analyzing Regional Climate Model Outputs Using an Extended Model for Large Spatio-Temporal Lattices”

Keywords: Gaussian random fields ; spatio-temporal analysis ; large data ; approximate

inference ; regional climate model outputs.

We propose an extension of the usual CAR model for spatial lattice data that contains a

smoothness parameter similar to the Matern class. This model is defined via the spectral

decomposition of the precision matrix of the data. For regular rectangular lattices, assuming a

circulant structure, the Fast Fourier transform (FFT) can be used, which provides a

computationally feasible method for very large data. In addition we suggest additional dimension

reduction techniques. We examine the application of this method to spatio-temporal regional

climate model outputs of the NARCCAP program. We consider spatially varying trend models

as well as functional ANOVA type models. We also discuss connections of our methodology to

other current approaches to analyzing large spatio-temporal data such as dimension reduction

methods and approximations, triangulation and finite element approximation such as in INLA,

and others.

Linda Mearns

National Center for Atmospheric Research

“The North American Regional Climate Change Assessment Program (NARCCAP):

An Overview”

NARCCAP is an international program that is systematically investigating the uncertainties in

regional scale projections of future climate and producing high resolution climate change

scenarios using multiple regional climate models (RCMs) by nesting the RCMs within

atmosphere ocean general circulation models (AOGCMs) forced with the A2 SRES emissions

scenario, over a domain covering the conterminous US, northern Mexico, and most of Canada.

The program also includes an evaluation component through nesting the participating RCMs

within NCEP reanalyses. The spatial resolution of the RCM simulations is 50 km. This program

includes RCMs that participated in the European PRUDENCE program (HadRM3 and RegCM),

the Canadian regional climate model (CRCM) as well as the NCEP regional spectral model

(RSM), the NCAR/PSU MM5, and NCAR WRF. AOGCMs being used include the Hadley

Centre HadCM3, NCAR CCSM, the Canadian CGCM3 and the GFDL model. The resulting

climate model runs form the basis for multiple high resolution climate scenarios that can be used

in climate change impacts assessments over North America. High resolution (50 km) global

time-slice experiments based on the GFDL atmospheric model and the NCAR atmospheric

model (CAM3) have also been produced and will be compared with the simulations of the

regional models. There also will be opportunities for double nesting over key regions through

which additional modelers in the regional modeling community will be able to participate in

NARCCAP. Additional key science issues are being investigated such as the importance of

compatible physics in the nested and nesting models. Measures of uncertainty across the

multiple runs are being developed by geophysical statisticians. In this overview talk, evaluation

of the regional model performance and results of the RCM simulations using boundary

conditions from the current and future runs of the AOGCMs will be presented.

Dan Rowlands University of Oxford

“An Objective Bayesian Approach to Climate Forecasting”

Bayesian Approaches have often been used in making Climate Forecasts for the coming

century. However, most are based on untestable prior assumptions on climate model parameters,

which cannot be verified in the real world. Hence, we suggest that applying ideas from Objective

Bayesian Statistics, where Objective simply means that the prior is chosen using a rule rather

than subjective judgement, may help. We illustrate these ideas using a simple Energy Balance

Model of the climate, and discuss the practicalities in extending this to larger and more complex

models.

Tapio Schneider California Institute of Technology

“Reconstructing Past Climates from Proxies: Statistical Challenges”

The problem of reconstructing past climates from proxies can be viewed as a missing-data

problem. The problem is to estimate missing values of climate variables in the past from

available proxy data and statistical relations between climate variables and proxies estimable

from periods of overlap between the two. Because proxy data are sparse, the estimation problems

are usually rank-deficient or at least ill-conditioned, so additional (spatial and/or temporal)

information needs to be used to regularize the estimation problems. I will review statistical

methods currently used in these estimation problems and will discuss ways of improving them.

David Stainforth London School of Economics

“Challenges in the Extraction of Decision Relevant Information from Multi-Decadal Ensembles

of GCMs”

Debates over how to effectively stimulate greenhouse gas emission reductions rage on. At the

same time increasing effort is being invested in climate change adaptation. Nations and

industries are focusing on the issue and much of the $100B annual fund which has been proposed

as support from the developed world to the developing world from 2020 onwards, is likely to be

focused on adaptation efforts. This is leading to an increasing demand for climate predictions;

predictions not just of global mean temperature but of a wide range of variables at regional and

local scales. Complex climate models are the principal tools used to provide this information.

Uncertainty analysis is of course critical in such endeavours and consequently an increasing

number of climate ensembles have been generated. These have explored different aspects of

uncertainty and sources of error. They range in size from O(10-100) in activities such as the

CMIP III multi-model ensemble, to O(10,000-100,000) in the climateprediction.net project. We

now have far more data regarding GCM model error than we have ever had before but the key

questions of model interpretation remain; supplemented by ones relating to ensemble

interpretation. How do we relate model output and real world behaviour in extrapolatory

problems in complex non-linear systems? How do we combine model output with physical

understanding to provide robust guidance for societal decisions? How do we relate model

diversity and real world behaviour?

Much effort is currently being expended on the production of regional, even local, scale

probability distributions. Here I will discuss the fundamental challenges in the production of

model based probability forecasts on multi-decadal timescales and at spatial scales which could

be potentially relevant for practical decisions. These include (i) the extrapolatory nature of the

problem combined with (ii) model deficiencies and how they can be characterised, while

focusing on (iii) issues related to the lack of independence in multi-model ensembles and (iv) the

difficulties in creating relevant model metrics given the non-linear nature of the system. As part

of (iii), I will discuss the ad hoc shape of model parameter space; an issue which has significant

implications for the role of emulators in these problems.

Ilya Timofeyev University of Houston

“Parametric Estimation of Effective Stochastic Models from Discrete Data”

It is often desirable to derive an effective stochastic model for the physical process from

observational and/or numerical data. Various techniques exist for performing estimation of drift

and diffusion in stochastic differential equations from discrete datasets. In this talk we discuss

the question of sub-sampling of the data when it is desirable to approximate statistical features of

a smooth chaotic trajectory by a stochastic differential equation.

In this case estimation of stochastic differential equations would yield incorrect results if the

dataset is too dense in time. Therefore, the dataset has to be sub-sampled (i.e. rarefied).
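
The effect of sampling density can be seen in a minimal sketch. It assumes the standard increment-based estimators for a linear-drift model dX = -gamma X dt + sigma dW, with a `stride` parameter that rarefies the data. A sine wave serves as a trivially smooth stand-in for a chaotic trajectory. The model form, function name, and numbers are illustrative assumptions, not the estimators discussed in the talk.

```python
import math

def estimate_drift_diffusion(xs, dt, stride=1):
    """Estimate gamma and sigma^2 in dX = -gamma*X dt + sigma dW from a
    trajectory sampled every dt, keeping only every `stride`-th point."""
    sub = xs[::stride]
    step = dt * stride
    incs = [b - a for a, b in zip(sub, sub[1:])]
    # Least-squares regression of increments on the state gives the drift.
    num = sum(x * dx for x, dx in zip(sub, incs))
    den = sum(x * x for x in sub[:-1])
    gamma = -num / (step * den)
    # Quadratic variation per unit time gives the diffusion coefficient.
    sigma2 = sum(dx * dx for dx in incs) / (step * len(incs))
    return gamma, sigma2

dt = 0.001
smooth = [math.sin(k * dt) for k in range(100_001)]
_, dense = estimate_drift_diffusion(smooth, dt, stride=1)
_, sparse = estimate_drift_diffusion(smooth, dt, stride=100)
print(dense, sparse)
```

For a smooth path the quadratic variation per unit time shrinks in proportion to the sampling step, so the densely sampled estimate of the diffusion collapses toward zero while the rarefied one does not. This is one way to see why the sub-sampling interval matters.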

Michael Wehner

Lawrence Berkeley National Laboratory

"Application of Generalized Extreme Value Theory to Coupled General Circulation Models"

I will begin the presentation by highlighting some examples of Generalized Extreme Value

theory applications used in climate change assessments. This will be followed by a discussion of

the sources of uncertainty in estimating return values of daily averaged temperature and

precipitation. These include model fidelity, parent distribution sample size, robustness of the

fitted statistical model and multi-model variance. Some issues to lead us into the discussion

session will be mentioned at the end of the talk.
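
For readers unfamiliar with return values, the following sketch shows how a T-block return level follows from fitted GEV parameters by inverting the GEV distribution function. The parameter values are made up for illustration and the function name is hypothetical. The talk's own estimation procedure is not described in the abstract.

```python
import math

def gev_return_level(mu, sigma, xi, period):
    """Level exceeded on average once every `period` blocks (period > 1)
    under a GEV with location mu, scale sigma and shape xi."""
    y = -math.log(1.0 - 1.0 / period)  # minus log of the non-exceedance probability
    if xi == 0.0:
        return mu - sigma * math.log(y)  # Gumbel limit
    # mu + sigma * (y**(-xi) - 1) / xi, written with expm1 for small xi
    return mu + sigma * math.expm1(-xi * math.log(y)) / xi

# Illustrative parameters for annual maxima of daily temperature (assumed).
level_20 = gev_return_level(mu=35.0, sigma=1.5, xi=-0.2, period=20)
level_100 = gev_return_level(mu=35.0, sigma=1.5, xi=-0.2, period=100)
print(level_20, level_100)
```

Uncertainty in the fitted shape parameter has a growing effect on the result as the return period lengthens, which is one reason sample size and robustness of the fit appear among the sources of uncertainty listed above.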

#10: Education and Outreach Program Two-Day Undergraduate Workshop February 26-27, 2010

Schedule:

Friday

8:20 Shuttle from Radisson to SAMSI (Group #1)

8:40 Shuttle from Radisson to SAMSI (Group #2)

9:00 Shuttle from Radisson to SAMSI (Group #3)

9:15-9:30 Pierre Gremaud (NCSU/SAMSI) Welcome and Introduction

9:30-10:20 Scott Schmidler (Duke University, Statistics)

"Markov Chains, Networks, and Molecules"

10:20-10:50 Xueying Wang (SAMSI) R demo

10:50-11:10 Break

11:10-12:00 John McSweeney (SAMSI)

“Epidemics on Graphs”

12:00-1:00 Lunch

1:00-2:00 R lab related to McSweeney's application

2:00-2:50 Avanti Athreya (SAMSI)

“Random Walks and Brownian Motion”

2:50-3:20 Break

3:20-4:10 Oliver Diaz (SAMSI)

“Small Random Perturbation of Chaotic Maps of the Interval

and the Central Limit Theorem”

4:10-4:40 Xueying Wang (SAMSI) MATLAB demo

4:40-5:00 Pierre Gremaud (NCSU/SAMSI) Career options

5:00 Shuttle to Radisson (Group #1)

5:20 Shuttle to Radisson (Group #2)

5:40 Shuttle to Radisson (Group #3)

6:00 Dinner @ Radisson

Saturday

8:00 Shuttle from Radisson to SAMSI (Group #1)

8:20 Shuttle from Radisson to SAMSI (Group #2)

8:40 Shuttle from Radisson to SAMSI (Group #3)

9:00-9:50 Cindy Greenwood (Arizona State University)

“Why Stochastic Dynamics?”

9:50-10:40 Bruce Rogers (SAMSI)

“Random Graphs and Linear Algebra”

10:40-11:00 Break

11:00-12:00 MATLAB lab related to Rogers' application

Noon Adjournment and Departure

12:00 Shuttle from SAMSI to RDU (Group #1)

12:30 Shuttle from SAMSI to RDU (Group #2)

1:00 Shuttle from SAMSI to RDU (Group #3)

#11: Program on Space-time Analysis for Environmental Mapping, Epidemiology and

Climate Change – Fundamentals of Spatial Modeling Working Group: One-day Workshop

on Objective Bayesian for Spatial and Temporal Models March 20 – 21, 2010

San Antonio, TX

The main theme during the second half of the year for the Fundamental Working Group of the

2009-2010 SAMSI Spatial Program is to explore appropriate prior distributions for parameters of

spatial and temporal models. In the absence of sufficiently quantifiable subjective knowledge

concerning the unknown parameters, it is typical to resort to noninformative or objective priors;

these priors allow most of the benefits of Bayesian analysis to be achieved, while avoiding the

difficulty of obtaining a subjective prior. Formal objective priors were introduced for AR(1)

models by Berger and Yang (1994), and for geo-statistical models by Berger, De Oliveira and

Sanso (2001, JASA) and Paulo (2005, Annals of Statistics). However, the current results are

mainly for situations in which there is no nugget effect and no measurement errors; in practice,

both are ubiquitous.

The purpose of this one-day workshop in San Antonio is to discuss the state of the art concerning

objective priors for inference with AR models and spatial processes, and to discuss the way

forward in terms of dealing with nugget effects and measurement errors. There will be three

sessions: Auto-regressive Models, Matching Priors and Fiducial Distributions, and Objective

Priors for Spatial Models and Space-Time Models. This will set the stage for further work by the

Working Group on these problems.

Schedule:

Saturday, March 20, 2010

Double Tree Hotel

1:50-2:00 Welcome

2:00-4:00 Session 1: Auto-regressive Models

Chair: Judith Rousseau (Universite Paris Dauphine)

2:00-2:40 Shawn Ni, University of Missouri

Objective Bayesian Analysis for AR(p) Models

2:40-3:20 Brunero Liseo, University of Missouri

Objective Priors for AR(p) Models

3:20-3:30 Maria Maddalena Barbieri Universita Roma, Discussion

3:30-3:40 Floor discussion

3:40-4:10 Coffee Break

4:10-6:00 Session 2: Matching Priors and Fiducial Distributions

Chair: Susie Bayarri (Universitat de Valencia)

4:10-4:50 Stefano Cabras (University of Cagliari) and Maria Eugenia Castellanos (Rey Carlos

University, Spain)

A Matching Prior for the Shape Parameter of the Skew-normal Distribution

4:50-5:00 Gauri Datta (University of Georgia), Discussion

5:00-5:40 Jan Hannig (University of North Carolina-Chapel Hill)

On Generalized Fiducial Inference

5:40-5:50 Jose M. Bernardo (Universitat de Valencia), Discussion

5:50-6:00 Floor discussion

Sunday, March 21, 2010

9:00-Noon Session 3: Objective Priors for Spatial Models and Space-Time Models

Chair: Christian Robert (Universite Paris-Dauphine)

9:00-9:45 Victor De Oliveira (University of Texas at San Antonio):

Bayesian Default Analysis for Gaussian Random Fields

9:45-10:30 Cuirong Ren (South Dakota State University)

Objective Bayesian Analysis for a Spatial Model with Nugget Effects

10:30-11:00 Coffee Break

11:00-11:15 Rui Paulo (Technical University of Lisbon), Discussion

11:15-11:30 Bruno Sanso (University of California at Santa Cruz), Discussion

11:30-12:00 Floor discussions.

#12: Uncertainty Quantification Electronic Townhall Meeting April 6, 2010

Background

Mathematical models intended for computational simulation of complex real-world processes are

a crucial ingredient in virtually every field of science, engineering, medicine, and business. Two

related but independent phenomena have led to the near-ubiquity of models: the remarkable

growth in computing power and the matching gains in algorithmic speed and accuracy. Together,

these factors have vastly increased the applicability and reliability of simulation---not only by

drastically reducing simulation time, thus permitting solution of larger and larger problems, but

also by allowing simulation of previously intractable problems.

Utilization of such computer models of processes requires addressing UQ. Issues studied in this

fast expanding field include uncertainties in the initial conditions for the models, models with

unknown parameters and/or stochastic features, combination of model outputs and noisy data

(data assimilation), inaccuracies in the models, and limited availability of model outputs due to

simulation costs.

The intellectual content of computational modeling comes from a variety of disciplines,

including mathematics, statistics, operations research, and computer science, and the application

areas are remarkably diverse. Despite this diversity of methodology and application, there are a

variety of common challenges in developing, evaluating and using complex computer models of

processes, which directly relate to the mission of SAMSI.

During the academic year 2011-2012, SAMSI will host a unique year-long program on both the

methodological and practical aspects of Uncertainty Quantification. Numerous interactions

between the SAMSI program and the new activity group(s) are expected.

The following participants will be present at SAMSI on April 6 to lead the above discussions:

Mihai Anitescu (ANL), Guillaume Bal (Columbia), Susie Bayarri (University of Valencia), Jim

Berger (SAMSI), Leonid Berlyand (Penn State), Amy Braverman (Jet Propulsion Laboratory,

California Institute of Technology, and UCLA), Dennis Cox (Rice), Jim Crowley (SIAM

Executive Director), George Deodatis (Columbia), Yalchin Efendiev (Texas A&M), Don Estep

(Colorado State), Montse Fuentes (N.C. State), Xavier Garaizar (LLNL), Roger Ghanem (USC),

Pierre Gremaud (N.C. State/SAMSI), Max Gunzburger (Florida State), Jan Hannig (UNC-CH),

Jan Hesthaven (Brown), David Higdon (LANL), Gianluca Iaccarino (Stanford), Pierre

Lermusiaux (MIT), Youssef Marzouk (MIT), Jim Nolen (Duke), Doug Nychka (NCAR),

Houman Owhadi (Cal Tech), Sastry Pantula (N.C. State, ASA President), Abani Patra (U.

Buffalo), Bruce Pitman (U. Buffalo), John Red-Horse (Sandia), Tom Santner (Ohio State), Nell

Sedransk (SAMSI), Christine Shoemaker (Cornell), Ralph Smith (N.C. State), Richard Smith

(UNC-CH/SAMSI), Daniel Tartakovsky (UCSD), Ron Wasserstein (ASA Executive Director),

Robert Wolpert (Duke), Jeff Wu (Georgia Tech), Dongbin Xiu (Purdue)

Schedule:

April 6, 2010

SAMSI

8:00-8:45 Shuttles from Radisson Hotel to SAMSI

(Provided by Charlene's Safe Ride)

8:30-9:00 Registration and Continental Breakfast

9:00-9:20 Welcome, presentation of SAMSI and structure of UQ program

9:20-9:40 Methodology: mathematical issues

9:40-10:00 Methodology: statistical issues

10:00-10:20 Methodology: engineering issues

10:20-10:40 General methodology discussion

10:40-11:10 Break

11:10-11:30 Applications: climate change

11:30-11:50 Applications: sustainability/energy

11:50-12:10 Applications: geosciences

12:10-12:30 General application discussion

12:30-12:45 Discussion of collaboration with the labs

12:45-1:00 List of action items

1:00-2:00 Lunch

Town Hall meeting

2:00-3:00 Discussion of new SIAM activity group

3:00-4:00 Discussion of new ASA interest group

4:00-4:45 Shuttles from SAMSI to RDU Airport

Both discussions will roughly follow the schedule:

Description of the focus-area of the proposed group

Identification of the target community that the group will serve

Description of how the focus-area overlaps or complements that of existing groups

Description of the activities proposed by the group

Description of how the group will enhance the goals of the society at large (SIAM and/or ASA)

Description of how the group will assess its success at the close of its first charter period

Discussion of Rules of Procedure

Suggestion of names of officers for the group

Discussion of initial petitioners (20 for SIAM, 25 for ASA)

#13: Program on Space-Time Analysis for Environmental Mapping, Epidemiology and

Climate Change Statistical Aspects of Environmental Risk Workshop April 7 -9, 2010

Schedule:

Wednesday, April 7

Radisson Hotel RTP

8:15-8:50 Registration and Continental Breakfast

8:50 -9:00 Welcome: Jim Berger, SAMSI

Feridun Turkman, SPRUCE

9:00 – 12:45 Session 1: Wildfires

Chair: Richard Smith, University of North Carolina / SAMSI

John Braun, University of Western Ontario (9:00-9:45)

“Using Fire Management Tools to Assess Wildfire Risk”

Feridun Turkman, University of Lisbon, Portugal (9:45-10:30)

“Wildfires: Some Statistical Issues”

Break (30 min)

Jose Gomez-Dans, University College, London (11:00-11:45)

Giovani Silva, University of Lisbon, Portugal (11:45-12:15)

“Modelling and Analysis of Forest Fire Data in Portugal”

Discussion (12:15-12:45)

12:45-2:00 Lunch

2:00 – 2:45 Session 2: Ecological Modeling

Chair: Marian Scott, University of Glasgow

Claire Ferguson, University of Glasgow

“Lake Ecosystems: Quality, Risk and Change”

2:45-3:15 Break (30 min.)

3:15-4:30 Poster Advertisements (2 minute ads each)

4:30-7:00 Marian Scott, University of Glasgow

Graduate course presentation:

“What Can the Past Climate Reconstructions Inform us About Future

Climates?”

7:00-9:00 Poster Session and Reception

SAMSI will provide poster presentation boards and tape. The board

dimensions are 4 ft. wide by 3 ft. high. They are tri-fold with each side

being 1 ft. wide and the center 2 ft. wide. Please make sure your poster fits

the board. The boards can accommodate up to 16 pages of paper

measuring 8.5 inches by 11 inches.

Thursday, April 8

Radisson Hotel RTP

8:15-9:00 Registration and Continental Breakfast

9:00 – 12:30 Session 3: Preferential Sampling and Point Process Models

Chair: Leonhard Held, University of Zurich

Raquel Menezes, Minho University (9:00-9:45)

“Geostatistics, Preferential Sampling and Environmental Monitoring”

Håvard Rue, Norwegian U. of Science and Technology (9:45-10:30)

“Log Gaussian Cox Process”

Break (30 min)

Working Group Session

Brian Reich, North Carolina State University (11:00-11:30)

Avishek Chakraborty, Duke University (11:30-12:00)

“Point Pattern Modeling for Degraded Presence-Only Data over Large

Regions”

Jo Eidsvik, NTNU, Norway (12:00-12:30)

“Combining Predictive Process Models and Laplace Approximations for

Fast Bayesian Analysis of Spatial Data”

Discussion

1:00-2:30 Lunch

2:30 -5:15 Session 4: Computation and Visualization

Chair: Jo Eidsvik, NTNU, Norway

Andrew Finley, Michigan State University (2:30-3:15)

“Modeling and Mapping Non-Stationary Multivariate Processes for Large

Spatial Datasets”

Emily Kang, SAMSI (3:15-4:00)

"Spatio-Temporal Random Effects Models for Very Large Datasets"

Break (30 min.)

Working Group Session

Frank Zou, NISS (4:30-5:00)

"Spatio-Temporal Modeling: Computation, Visualization, and Dimension

Reduction”

Discussion Session (5:00-5:15)

Friday, April 9

Radisson Hotel RTP

8:15-9:00 Registration and Continental Breakfast

9:00 – 1:00 Session 5: Health Effects

Chair: Montse Fuentes, North Carolina State University

Roger Peng, Johns Hopkins University (9:00-9:45)

“Spatial Misalignment in Studies of the Acute Effects of Ambient Air

Pollution”

Bo Li, Purdue University (9:45-10:30)

“The Impact of Heat Waves on Morbidity in Milwaukee, Wisconsin”

Break (30 min.)

Working Group Session

Eric Kalendra, North Carolina State University (11:00-11:30)

“Spatial-temporal Association Between Daily Mortality and Exposure to

Particulate Matter”

Veronica J. Berrocal, SAMSI (11:30-12:00)

“On the Use of a PM2.5 Exposure Simulator to Explain Birthweight”

Howard Chang, SAMSI (12:00-12:30)

"Impact of Climate Change on Ambient Ozone Level and

Mortality in Southeastern United States”

Ana Rappold, USEPA (12:30-1:00)

“Respiratory and Cardiovascular Effects from the 2008 Wildfire Near NC

Pocosin National Wildlife Range”

1:00 – 2:00 Lunch

SPEAKER TITLES/ABSTRACTS

Veronica J. Berrocal SAMSI

Co Authors: Alan E. Gelfand, David M. Holland, Janet Burke, Marie Lynn Miranda

“On the Use of a PM2.5 Exposure Simulator to Explain Birthweight”

In relating pollution to birth outcomes, maternal exposure has usually been described using

monitoring data. Such characterization provides a misrepresentation of exposure as (i) it does not

take into account the spatial misalignment between an individual's residence and monitoring

sites, and (ii) it ignores the fact that individuals spend most of their time indoors and typically in

more than one location. In this paper, we break with previous studies by using a stochastic

simulator to describe personal exposure (to particulate matter) and then relate simulated

exposures at the individual level to the health outcome (birthweight) rather than aggregating to a

selected spatial unit.

We propose a hierarchical model that, at the first stage, specifies a linear relationship between

birthweight and personal exposure, adjusting for individual risk factors and introduces random

spatial effects for the census tract of maternal residence. At the second

stage, we specify the distribution of each individual's personal exposure using the empirical

distribution yielded by the stochastic simulator as well as a model for the spatial random effects.

We have applied our framework to analyze birthweight data from 14 counties in North Carolina

in years 2001 and 2002. We investigate whether there are certain aspects and time windows of

exposure that are more detrimental to birthweight by building different exposure metrics which

we incorporate, one by one, in our hierarchical model. To assess the difference in relating

ambient exposure (i.e. exposure to ambient concentration) to birthweight versus personal

exposure to birthweight, we compare estimates of the effect of air pollution obtained from

hierarchical models that linearly relate ambient exposure and birthweight to those obtained from

our modeling framework.

John Braun

University of Western Ontario

“Using Fire Management Tools to Assess Wildfire Risk”

Ignition risk and burn risk due to wildfire have been studied in a region of Ontario, Canada. Our

goal is to establish a methodology which could be used across the boreal forest region of Canada.

A generalized additive model was employed to obtain ignition risk probabilities and a burn

probability map using only historic ignition and fire area data. The Burn-P3 implementation of

the Prometheus fire growth simulation model was also used with the goal of obtaining a more

accurate burn probability map, in the absence of data on the actual regions burned by particular

fires. This approach takes into account fuel types and weather to construct realistic fire shapes.

The map of the vegetation in the region was verified and corrected, based on a field study.

Burn-P3 simulations were run under the settings recommended in the software documentation, and

were found to be fairly robust to errors in the fuel map, but simulated fire sizes appear to be

substantially larger than what is indicated in the historic record. By adjusting the input

parameters in ways that might reflect suppression effects, we conclude with a model which gives

more appropriate fire sizes. Based on the resulting burn probability map, we conclude that the

risk of fire in the study area is much lower than what Burn-P3 predicts under the recommended

settings.

Avishek Chakraborty Duke University

[email protected]

“Hierarchical Modeling of Presence-only Data over Large Regions”

Explaining the distribution of a species using local environmental features is a long standing

ecological problem. Often, available data is collected as a set of presence locations only, thus

precluding the possibility of a desired presence-absence analysis. We propose that it is natural

to view presence-only data as a point pattern over a region and to use local environmental

features to explain the intensity driving this point pattern. We use a hierarchical model to treat

the presence data as a realization of a spatial point process, whose intensity is governed by the

set of environmental covariates. Spatial dependence in the intensity levels is modeled with

random effects involving a zero mean Gaussian process. We augment the model to capture

highly variable and typically sparse sampling effort as well as land transformation, both of which

degrade the point pattern. The Cape Floristic Region (CFR) in South Africa provides an

extensive class of such species data. The potential (i.e., nondegraded) presence surfaces over the

entire area are of interest from a conservation and policy perspective. The region is divided into

~37,000 grid cells. To work with a Gaussian process over a very large number of cells we use

predictive spatial process approximation. Bias correction by adding a heteroscedastic error

component has also been implemented. We illustrate with modeling for six different species.

Also, comparison is made with the now popular Maxent approach though the latter is limited

with regard to inference. The resultant patterns are important on their own but also enable a

comparative view, for example, to investigate whether a pair of species are potentially competing

in the same area. An additional feature of our modeling is the opportunity to infer about

biodiversity through species richness, i.e., the number of distinct species in an areal unit. Such

investigation immediately follows within our modeling framework.

Howard Chang SAMSI / Duke University

[email protected]

“Impact of Climate Change on Ambient Ozone Level and Mortality in Southeastern United

States”

Co-Authors: Jingwen Zhou and Montserrat Fuentes

There is a growing interest in quantifying the health impacts of climate change. This paper

examines the risks of future ozone level on non-accidental mortality across 17 urban

communities in Southeastern United States. We present a modeling framework that integrates

data from climate model outputs, historical meteorology and ozone observations, and a health

surveillance database. We first modeled present-day relationships between observed maximum

daily 8-hour average ozone concentrations and meteorology measured during the year 2000.

Future ozone concentrations for the period 2040 to 2050 were then predicted using calibrated

climate model output data from the North American Regional Climate Change Assessment

Program. Daily community-level mortality counts for the period 1987 to 2000 were obtained

from the National Mortality, Morbidity and Air Pollution Study. Controlling for temperature,

dew-point temperature, and seasonality, relative risks associated with short-term exposure to

ambient ozone during the summer months were estimated using a multi-site time series design.

We estimated an increase of 0.61 ppb (95% PI: 0.51-0.71) in average ozone concentration for

2040 to 2050 compared to 2000. This corresponds to 67 (95% PI: 28-107) premature deaths in

the study communities attributable to the increase in future ozone level.

Jo Eidsvik

NTNU, Norway

[email protected]

“Combining Predictive Process Models and Laplace Approximations for

Fast Bayesian Analysis of Spatial Data”

The common hierarchical geostatistical model has statistical model parameters at the bottom

level, a Gaussian random field at the intermediate level, and noisy / indirect data at the top level.

One challenge for fast analysis in such models is the big-n problem driven by the O(n^3) cost of

matrix inversion. A second challenge is fast inference and prediction because there are no closed

form solutions.

We illustrate how predictive process modeling is one solution to the first challenge. This

dimension reduction technique projects the n-dimensional data onto m << n knot sites. We next

present Laplace approximations and numerical routines as a fast answer to the second

challenge. The combination of the two approaches means that modeling and inference can be

done in seconds, for large datasets. This makes higher level inference (model choice, design or

analysing preferential sampling) more available. Examples of acidification of lakes in Norway

and forest data in Michigan are presented.
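
For readers unfamiliar with the predictive process, the dimension reduction can be sketched as follows. The notation ($\tilde{w}$, $\mathbf{w}^{*}$, $C^{*}$, $U$, $\tau^{2}$) is assumed here for illustration and follows the standard formulation in the literature, not anything stated in the abstract.

```latex
% Knots s_1^*, ..., s_m^* with m << n;  w^* = (w(s_1^*), ..., w(s_m^*))^T ~ N(0, C^*)
\tilde{w}(s) = \mathbf{c}(s)^{\top} (C^{*})^{-1} \mathbf{w}^{*},
\qquad \mathbf{c}(s) = \big( C(s, s_j^{*}) \big)_{j=1}^{m}

% With U the n x m matrix whose rows are c(s_i)^T and nugget variance tau^2,
% the n x n data covariance is inverted through an m x m system (Woodbury identity):
\big( \tau^{2} I_n + U (C^{*})^{-1} U^{\top} \big)^{-1}
= \tau^{-2} I_n - \tau^{-2}\, U \big( \tau^{2} C^{*} + U^{\top} U \big)^{-1} U^{\top}
```

Under these assumptions the dominant cost is of order n m^2 rather than n^3.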

Claire Ferguson University of Glasgow

[email protected]

“Lake Ecosystems: Quality, Risk and Change”

Establishing historical and current water quality in lake ecosystems is important in terms of

assessing risks to both the environment and human health. Eutrophication (as a result of

increased algae) impacts on plant and animal life within the ecosystem, and cyanobacterial

blooms (blue-green algae), to which freshwater environments are susceptible, frequently produce

potent toxins which can have adverse health effects.

Nonparametric statistical approaches will be presented to model the complex interrelated nature

of such systems. Initially, water quality will be investigated for individual lakes, with the

environmental drivers that favour cyanobacterial abundance then investigated for lake

ecosystems more generally. Relationships within the ecosystem can change throughout the year

and are potentially impacted by climate change. Therefore, varying-coefficient models and

smooth principal components analysis will then be used to model changes throughout time in

relationships and covariance structures, respectively.

The modelling will be illustrated using data from Loch Leven (Scotland), for which there exists

an excellent long-term monthly data series from the late 1960s, and summer phytoplankton data

from 134 lakes across the UK.

Andrew Finley

Michigan State University

[email protected]

“Modeling and Mapping Non-Stationary Multivariate Processes for Large Spatial Datasets”

Here we extend earlier work on hierarchical multivariate spatial models to accommodate non-

stationarity in the correlations among the outcomes as well as capturing the underlying spatial

associations. In particular, we address the computationally challenging setting where the

observations arise from a large number of locations. Direct application of such multivariate

models to large spatial datasets is often computationally infeasible because of cubic order matrix

algorithms involved in estimation. The situation is even worse in Markov chain Monte Carlo

(MCMC) contexts where such computations are performed for several thousand iterations. Here,

we discuss approaches that help overcome these hurdles without sacrificing richness in

modeling. Specifically, we work with linear models of coregionalization but employ a

dimension-reducing spatial process to model the association between the outcomes. We

illustrate the proposed methods using synthetic and real datasets. Here, interest lies in visual and

statistical inference on the spatially-varying relationship among the outcome variables' residual

spatial processes.

Eric Kalendra

North Carolina State University

[email protected]

“Spatial-temporal Association Between Daily Mortality and Exposure to Particulate Matter”

Fine particulate matter (PM2.5) is a mixture of pollutants that has been linked to serious health

problems, including premature mortality. Since the chemical composition of PM2.5 varies across

space and time, the association between PM2.5 and mortality might also be expected to vary

with space and season. This study uses a unique spatial data architecture consisting of geocoded

North Carolina mortality data for 2001-2002, combined with US Census 2000 data. We study

the association between mortality and air pollution exposure using different metrics (monitoring

data and air quality numerical models) to characterize the pollution exposure. We develop and

implement a novel statistical multi-stage Bayesian framework that provides a very broad,

flexible approach to studying the spatiotemporal associations between mortality and population

exposure to daily PM2.5 mass, while accounting for different sources of uncertainty. Most of the

pollution-mortality risk assessment has been done using aggregated mortality and pollution data

(e.g., at the county level), and that can lead to significant ecological bias and error in the

estimated risk. In this work, we study and estimate this bias and error. We also provide a

mathematical adjustment for analysis that uses aggregated data to reduce the error in the risk

assessment due to ecological bias. We present results for the State of North Carolina.

Emily Kang SAMSI

[email protected]

"Spatio-Temporal Random Effects Models for Very Large Datasets"

Datasets from remote-sensing platforms and sensor networks are often spatial, temporal, and

very large. Processing massive amounts of data to provide current estimates of the (hidden) state

from current and past data is challenging, and a large number of spatial locations observed

through time can quickly lead to an overwhelmingly high-dimensional statistical model. We will

demonstrate how a Spatio-Temporal Random Effects (STRE) component of a statistical model

reduces the problem to one of fixed dimension with a very fast statistical solution, a

methodology we call Fixed Rank Filtering (FRF). This is compared in a simulation experiment

to successive, spatial-only predictions based on an analogous Spatial Random Effects (SRE)

model, and the value of incorporating temporal dependence is quantified. The resulting fixed-

rank filter is scalable, in that it can handle very large datasets, and the FRF predictions provide

estimates of missing values and filter out unwanted noise. We illustrate the methods with a

remote-sensing dataset of aerosol optical depth (AOD), from the Multi-angle Imaging

SpectroRadiometer (MISR) instrument on the Terra satellite, yielding an optimal statistical

predictor of AOD on the log scale along with its prediction standard error.

Bo Li

Purdue University

[email protected]

“The Impact of Heat Waves on Morbidity in Milwaukee, Wisconsin”

BACKGROUND: Given the predictions of increased intensity and frequency of heat waves, it is

important to study the effect of high temperatures on human mortality and morbidity. Many

studies focus on heat wave-related mortality; however, heat-related morbidity is often

overlooked.

OBJECTIVES: The goals of this study were to examine how hot weather has affected morbidity

in Milwaukee, Wisconsin and how it is predicted to affect morbidity in the future based on

climate change projections.

METHODS: We collected meteorological, air pollution, and hospital admissions data in

Milwaukee, Wisconsin for the years 1989-2005. We estimated temperature threshold values for

different causes of hospital admissions and then quantified the associated percent increase of

admissions per degree above the threshold. We also estimated the future impact of higher

temperatures on admissions for the years 2059-2075.

RESULTS: Five causes of admission (endocrine, genitourinary, renal, accidents, and self-harm)

and three age groups (15-64, 75-84, >85 years) were significant factors as related to high

temperature. The estimations for future scenarios indicate a larger number of days above the

temperature threshold and an excess number of admissions.

CONCLUSIONS: High temperatures had a significant effect on several causes of hospital

admissions and age groups in Milwaukee, Wisconsin for the 1989-2005 time period. Our

predictions of increased number of heat-related hospital admissions are consistent with other

climate projections and have implications for public health interventions.

Raquel Menezes

Minho University – Azurém, Portugal

[email protected]

“Geostatistics, Preferential Sampling and Environmental Monitoring”

Geostatistics involves the fitting of spatially continuous models to spatially discrete data.

Preferential sampling arises when the process which determines the data-locations and the

process being modelled are stochastically dependent. Conventional geostatistical methods

assume, if only implicitly, that sampling is non-preferential. In environmental monitoring

networks, preferential sampling is likely to happen as there is a natural inclination to place more

monitors in areas classified as high risk for pollution. In this talk, we describe a model-based

approach, which considers log-Gaussian Cox processes to model the stochastic dependence of

sampling locations on the spatial variable under study (Diggle, Menezes and Su, 2009). We

demonstrate through simulated examples that ignoring preferential sampling can lead to

seriously misleading inferences. We then present an application to biomonitoring data from the

eastern Iberian Peninsula, where moss has been used since the 1990s as a biomonitor for air

pollution to measure heavy metal concentrations, with intensified sampling in large urban or

industrial areas.

Roger Peng Johns Hopkins University

[email protected]

“Spatial Misalignment in Studies of the Acute Effects of Ambient Air Pollution”

Time series analyses of ambient air pollution often involve comparing a pollutant measured at a

point in space with an aggregate measure of health. Spatial misalignment of the exposure and

response variables can induce error, the magnitude of which depends on the spatial variation of

the exposure of interest. In air pollution epidemiology, there is an increasing focus on estimating

the health effects of the chemical components of particulate matter. One issue that is raised by

this new focus is the spatial misalignment error introduced by the lack of spatial homogeneity in

many of the particulate matter components. Current approaches to time series modeling do not

take into account the spatial properties of components and therefore could result in biased

estimation of health risks. We present a spatial-temporal statistical model for quantifying spatial

misalignment error and show how adjusted health risk estimates can be obtained using a plug-in

approach and a two-stage Bayesian model. We apply our methods to a database containing

information on hospital admissions, air pollution, and weather for 20 large urban counties in the

United States.

Ana Rappold Environmental Protection Agency

[email protected]

“Respiratory and Cardiovascular Effects from the 2008 Wildfire Near NC Pocosin National

Wildlife Refuge”

The association of mortality and morbidity with exposure to urban air pollution is well

established; however, the health effects associated with exposure to wildfire emissions are less

understood. Over the past several decades, there has been an increase in the frequency and

duration of wildfires in the United States. Wildfires are major emitters of aerosol and air

pollution, and pose an important concern to human and ecological welfare. There are substantial

public health concerns that climate change related warming trends will exacerbate these risks.

On June 1st, 2008, a lightning strike initiated a deep, smoldering fire in peat soil in the eastern plains

of North Carolina. Prolonged drought and unusually low humidity allowed for rapid progression

of the fire, which produced air pollution concentrations tenfold in excess of normal levels. For the

duration of the wildfire, smoke mostly affected the sparsely populated neighboring counties.

However, during a single three-day window of predominantly westerly winds, the smoke plume

moved inland, exposing a large portion of the state.

Health studies on wildfires are difficult to assess due to their episodic and brief nature. In this

study, we examined emergency department (ED) visits from the state's comprehensive

surveillance program to establish the total burden on cardio-respiratory health posed by the

wildfire. We considered daily cardiovascular and respiratory related visits and applied standard

time series analysis to estimate the cumulative risk of exposure. Satellite data products were used

to track the location of the smoke plume and approximate county-level exposure to the smoke plume.

The goal of the analysis was to use health and exposure information data readily available to the

county, state and federal authorities for timely communication to the public.

We found that for all respiratory outcomes, ED visits increased, including those for acute

conditions of bronchitis and pneumonia (cRR = 1.59, 95% CI = (1.07, 2.34)), as well as those

with chronic respiratory conditions of asthma (1.65, (1.25-2.17)) and chronic obstructive

pulmonary disease (1.73, (1.05-2.83)). We also observed an increase in hospital ED visits for

heart failure, as well as a trend towards a positive association with the combination of acute

coronary syndrome. No association was found between smoke exposure and cardiac arrhythmias.

Furthermore, none of the outcomes showed change in referent counties. In conclusion, the

findings show that even moderate exposure to wildfire smoke has a significant impact on

human welfare, particularly in socio-economically depressed communities.

Håvard Rue

Norwegian University of Science and Technology

[email protected]

“Log Gaussian Cox Process”

In this talk I will discuss various issues about Log Gaussian Cox processes, which I think have

much larger potential for applications than their current status suggests.

Giovani Silva University of Lisbon

[email protected]

“Modelling and Analysis of Forest Fire Data in Portugal”

In the last decade forest fires became a serious problem in Portugal due to different issues such

as climatic characteristics and the nature of Portuguese forests. In order to analyze forest fire data, we

use generalized linear models for modelling the proportion of burned forest area. Our goal is to

identify fire risk factors that influence the proportion of burned area and what may make a forest

type susceptible or resistant to fire. We then analyze forest fire data in Portugal during

1990-1994 using a Bayesian approach.

Keywords: Fire management, Burned area proportion, Forest fire data, Generalized linear model.

K.F. Turkman / M.A. Amaral Turkman

CEAUL- FCUL, University of Lisbon

[email protected]

“Wildfires: Some Statistical Issues”

Wildfires cause extensive loss of property and life and inflict heavy damage on ecosystems.

Policy responses of local and global fire management depend on the proper understanding of the

temporal fire extent as well as on its spatial variation across any given study area. In many fire

regimes, a small number of very large fires are responsible for the vast majority of the area

burned. Therefore, the study of spatial and temporal patterns of incidence of large fires is an

important issue. In this talk, we look at some of the statistical issues that we encountered while

modeling some data sets on wildfires. In particular, modeling extremes of wildfires is quite

challenging, raising some interesting questions.

Frank Zou

NISS

[email protected]

“Spatio-Temporal Modeling: Computation, Visualization and Dimension Reduction”

Co-Authors: Scott Holan and Noel Cressie

Computation, visualization and dimension reduction are critical to proper modeling of high-

dimensional spatio-temporal data that are frequently encountered in substantive scientific

applications. This presentation details the year-long activities of the Computation, Visualization,

and Dimension Reduction in Spatio-Temporal Modeling working group (WG) which is part of

the current SAMSI Program on Space-time Analysis for Environmental Mapping, Epidemiology

and Climate Change. Specifically, we highlight the contributions of WG members that have

added to the success of the group and provide a brief description of research presentations that

have aided in the development of four cohesive sub-working groups (SG) whose purpose is to

investigate open research problems. Finally, we describe the four SGs and the research problems

they are exploring.

#14: Program on Stochastic Dynamics Workshop on Molecular Motors, Neuron Models,

and Epidemics on Networks April 15 -17, 2010

Schedule:

Thursday April 15, 2010

(SAMSI)

8:15-8:45 Charlene's Safe Ride Shuttle to SAMSI

8:15-8:45 Registration and Continental Breakfast

8:45-9:00 Welcome

9:00 - 9:55 Susanne Ditlevsen, University of Copenhagen

“The Morris Lecar Neuron Model Gives Rise to the Ornstein-Uhlenbeck Leaky Integrate-and-Fire Model”

9:55- 10:50 David Kinderlehrer, Carnegie Mellon University

“Aspects of Modeling Transport in Small Systems with a Look at Motor

Proteins”

10:50 - 11:05 Break

11:05- Noon Motors Group

Noon -1:00 Lunch

1:00 - 1:55 Neurons Group

1:55 - 2:50 Networks Group

2:50 - 3:05 Break

3:05- 4:45 Working Sessions

(Rooms 150, 201, 259)

4:45 - 5:30 Working Session Reports and Poster Advertisements

5:30 -7:30 Poster Session and Reception

SAMSI will provide poster presentation boards and tape. The board dimensions

are 4 ft. wide by 3 ft. high. They are tri-fold with each side being 1 ft. wide and

the center 2 ft. wide. Please make sure your poster fits the board. The boards

can accommodate up to 16 pages of paper measuring 8.5 inches by 11 inches.

7:00-7:45 Charlene's Safe Ride Shuttle from SAMSI to Radisson

Friday April 16, 2010

(SAMSI)

8:15-8:45 Charlene's Safe Ride Shuttle to SAMSI

8:30-9:00 Registration and Continental Breakfast

9:00 - 9:55 Laura Sacerdote, University of Torino

“Copulae in Neural Networks Modeling”

9:55- 10:50 Erik Volz, University of Michigan

“Mathematical Modeling of Sexually Transmitted Infections in Dynamic

Contact Networks”

10:50 - 11:05 Break

11:05- Noon Andrea Barreiro, University of Washington

“Dynamics and Impact of Spike-time Correlations”

Noon -1:00 Lunch

1:00 - 1:55 Hongyun Wang, University of California, Santa Cruz

“Potential Profile of Molecular Motors”

1:55 - 4:45 Working Sessions

(Rooms 150, 201, 259)

2:30 Break

4:45 - 5:30 Working Session Reports

5:30 – 6:00 Charlene's Safe Ride Shuttle from SAMSI to Radisson

SPEAKER TITLES/ABSTRACTS

Andrea Barreiro University of Washington

[email protected]

“Dynamics and Impact of Spike-time Correlations”

Correlations among neural spike times can be found widely in the brain, and open the possibility

for cooperative coding of sensory inputs across neural populations. The structure of the

correlations that emerge across cells, however, can depend strongly on spike generation

mechanisms in the individual neurons. In joint work, we investigate the interplay among these

mechanisms, network architecture, and the coding of simple classes of inputs.

Meredith Betterton University of Colorado, Boulder

[email protected]

“Molecular Motors That Change the Length of Their Tracks”

Motor proteins in biological systems typically move with directional bias along a one-

dimensional track. Some types of motors are able to change the length of the track on which they

move, by promoting lengthening or shortening of the track. I will discuss some biological

examples of such motors (helicase proteins and kinesin-8 family proteins) and present

mathematical modeling work to describe the coupled problem of motion and track length change.

In some cases crowding and collective effects due to motion of multiple motors are important.

Susanne Ditlevsen University of Copenhagen

[email protected]

“The Morris Lecar Neuron Model Gives Rise to the Ornstein-Uhlenbeck Leaky Integrate-and-Fire Model”

We analyse the stochastic-dynamical process produced by the Morris Lecar neuron model, where

the randomness arises from channel noise. Using multi-scale analysis, we show that in a

neighborhood of the stable point, representing quiescence of the neuron, this two-dimensional

stochastic process can be approximated by an Ornstein-Uhlenbeck modulation of a constant

circular motion. The firing of the Morris Lecar neuron corresponds to the Ornstein-Uhlenbeck

process crossing a boundary. This result motivates the large amount of attention paid to the

Ornstein-Uhlenbeck leaky integrate-and-fire model. A more detailed picture emerges from

simulation studies.
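
The boundary-crossing picture above can be illustrated with a minimal simulation sketch of an Ornstein-Uhlenbeck leaky integrate-and-fire neuron. The function name `simulate_ou_lif`, the Euler-Maruyama discretization, and all parameter values are assumptions chosen for illustration; none of them comes from the talk.

```python
import math
import random


def simulate_ou_lif(mu=1.2, tau=1.0, sigma=0.5, threshold=1.0,
                    reset=0.0, dt=1e-3, t_max=50.0, seed=0):
    """Return spike times of an Ornstein-Uhlenbeck leaky integrate-and-fire neuron.

    Between spikes: dV = (mu - V / tau) dt + sigma dW  (Euler-Maruyama steps).
    A spike is recorded when V reaches `threshold`; V then restarts at `reset`.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    n_steps = int(round(t_max / dt))
    v = reset
    spike_times = []
    for k in range(1, n_steps + 1):
        # Drift pulls V toward mu * tau; the Gaussian increment adds the noise.
        v += (mu - v / tau) * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        if v >= threshold:
            spike_times.append(k * dt)
            v = reset
    return spike_times


spike_times = simulate_ou_lif()
intervals = [b - a for a, b in zip(spike_times, spike_times[1:])]
if intervals:
    print("spikes:", len(spike_times),
          "mean interspike interval:", sum(intervals) / len(intervals))
```

The interspike intervals returned here are first-passage times of the Ornstein-Uhlenbeck process to the threshold, up to discretization error.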

David Hunter Penn State University

[email protected]

“Bayesian Inference for Contact Networks Given Epidemic Data”

We estimate the parameters of a simple random network and a stochastic epidemic on that

network using data consisting of recovery times of infected hosts. The SEIR epidemic model we

fit has exponentially distributed transmission times with gamma distributed latent (exposed) and

infective periods on a network where every tie exists with the same probability, independent of

other ties. We employ a Bayesian framework and MCMC integration to make estimates of the

joint posterior distribution of the model parameters. We demonstrate some of the important

aspects of our approach by studying a measles outbreak in Hagelloch, Germany in 1861

consisting of 188 affected individuals. We provide an R package to carry out these analyses,

which is available publicly on the

Comprehensive R Archive Network (CRAN).
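
The data-generating model described in this abstract can be sketched as a forward simulation. This is not the authors' R package: the function name `simulate_seir_on_er_graph`, the event-queue implementation, and every parameter value are hypothetical, and the sketch only simulates the epidemic, it does not perform the Bayesian inference.

```python
import heapq
import random


def simulate_seir_on_er_graph(n=50, p=0.1, beta=0.5,
                              latent=(4.0, 1.0), infective=(3.0, 1.0), seed=1):
    """Simulate an SEIR epidemic on a random graph where each tie exists with probability p.

    beta is the rate of the exponential transmission time along a tie;
    `latent` and `infective` are (shape, scale) pairs for gamma-distributed periods.
    Returns dictionaries of exposure and recovery times keyed by node.
    """
    rng = random.Random(seed)
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:  # every tie is independent of the others
                neighbors[i].append(j)
                neighbors[j].append(i)

    exposure_time, recovery_time = {}, {}
    queue = [(0.0, 0)]  # node 0 is the initial case, exposed at time 0
    while queue:
        t_exposed, i = heapq.heappop(queue)
        if i in exposure_time:
            continue  # already exposed by an earlier contact
        exposure_time[i] = t_exposed
        t_infectious = t_exposed + rng.gammavariate(*latent)
        duration = rng.gammavariate(*infective)
        recovery_time[i] = t_infectious + duration
        for j in neighbors[i]:
            if j in exposure_time:
                continue
            delay = rng.expovariate(beta)
            if delay < duration:  # transmission happens before recovery
                heapq.heappush(queue, (t_infectious + delay, j))
    return exposure_time, recovery_time


exposure, recovery = simulate_seir_on_er_graph()
print("final outbreak size:", len(recovery))
```

In the setting of the abstract only the recovery times would be observed; the network and the exposure times are the latent quantities to be inferred.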

David Kinderlehrer Carnegie Mellon University

[email protected]

“Aspects of Modeling Transport in Small Systems with a Look at Motor Proteins”

We are all aware that motion in small systems presents many challenges: it is difficult to move

and maintain a course compromised by immersion in a highly fluctuating bath. We discuss some

possibilities for molecular motors, for example, kinesin-1. Our approach is to formulate a

dissipation principle connected to the Monge-Kantorovich mass transfer problem. We show how

this leads to a system of evolution equations. We then discuss how various elements of the

system must be related in order that transport actually occur. Does detailed balance play a role?

Finally, what opportunities do these ideas offer? We examine some 'hybrid variational problems'

and discuss unresolved issues, time permitting.

Duane Nykamp

University of Minnesota

[email protected]

“Toward a Second Order Description of Neuronal Networks”

The complexity of the activity of large numbers of neurons and their interconnectivity creates a

challenge for understanding computations within the brain. The high dimensionality of the

activity patterns and connectivity patterns is a formidable obstacle to an analysis of the

relationships between the connectivity and the behavior of the

network. If one is interested in understanding aspects of a particular behavior of a network, a

fundamental problem is determining which details (or dimensions) of the network are important.

I present an approach to developing low-dimensional network models that capture basic features

of neuronal networks. The approach is based on a set of second-order statistics that represent

deviations from random networks and capture interactions among edges. Ensembles of neuronal

networks parametrized by those statistics are a simplified description of aspects of complex

networks. These parametrized classes of networks may capture in just a few dimensions key

changes in network structure as well as influences of the network structure on the dynamics of

interacting neurons.

Laura Sacerdote University of Torino

[email protected]

“Copulae in Neural Networks Modeling”

Mathematical models for neuron activity help our understanding of neural code. Different types

of models may be used depending upon the level of detail necessary for the studied problem.

Leaky integrate-and-fire models owe their popularity to their relatively simple mathematical

structure; furthermore, despite their simplicity, they can fit experimental data. Their use has

improved our comprehension of the mechanisms involved in signal transmission. Typical

examples are the studies on the role of noise in the neuronal code or the properties of

responses to periodic stimuli. These models have also been used to apply information theory to

the study of input output relationships.

The possibility of recording simultaneously from two or more neurons has revealed

specific patterns in observed raster displays. These patterns result from

dependencies between the spike times of different neurons. Any attempt to model

networks requires switching from one-dimensional models to multivariate ones. Various attempts

in this direction have been proposed in the literature, making use of multiparticle models or

object oriented networks. The main shortcoming of these approaches is their inability to relate single

units and network features.

Here we propose to use the notion of a copula to obtain multivariate models. Copulae are the

mathematical objects that join marginal distributions into a multivariate one. Their use

has recently been proposed for the statistical analysis of spike counts in a network. Here we

focus on their use for modeling purposes. We limit ourselves to simple models involving a few

neurons but the proposed idea could then be applied to larger networks.
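
As a generic illustration of how a copula joins marginal distributions, the sketch below couples two exponential marginals through a Gaussian copula. The choice of a Gaussian copula, the exponential marginals, and the names `sample_gaussian_copula_exponentials` and `pearson_correlation` are assumptions made for this example; the abstract does not say which copula family or marginals the talk uses.

```python
import math
import random
from statistics import NormalDist


def sample_gaussian_copula_exponentials(n, rho, rate1, rate2, seed=0):
    """Draw n pairs with exponential marginals coupled by a Gaussian copula."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    pairs = []
    for _ in range(n):
        # Correlated standard normals with correlation rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Map to uniforms (the copula scale); cap to keep log() finite.
        u1 = min(std_normal.cdf(z1), 1.0 - 1e-12)
        u2 = min(std_normal.cdf(z2), 1.0 - 1e-12)
        # Inverse exponential CDF gives the desired marginals.
        pairs.append((-math.log(1.0 - u1) / rate1, -math.log(1.0 - u2) / rate2))
    return pairs


def pearson_correlation(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


dependent = sample_gaussian_copula_exponentials(5000, 0.8, 2.0, 1.0)
print("sample correlation:", pearson_correlation(dependent))
```

The marginals stay exponential whatever value of `rho` is chosen; only the dependence between the two coordinates changes, which is the separation that makes copulae attractive for relating single units to network features.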

Erik Volz University of Michigan

[email protected]

“Mathematical Modeling of Sexually Transmitted Infections in Dynamic

Contact Networks”

Mathematical epidemiology has progressed in the direction of combining increasingly realistic

models of disease progression with increasingly complex host population structure. Contact

networks have become a popular model of population structure for hosts embedded in long-term

relationships. Concurrent advances in mathematical theory have made these models more

tractable, which decreases the need to compromise between realism and mathematical analysis. I

review several special cases of epidemics spreading through complex populations which reduce

to simple solutions based on ordinary differential equations. This reveals a link between

traditional mass action models of epidemics in which contacts are instantaneous and

uncorrelated, and networks which have dynamically rearranging ties. I then investigate several

applications. A simple

modification to our equations allows us to model sero-sorting of HIV positive individuals, which

is the tendency to rearrange relationships to those with similar HIV status. These models are

tested using data from a high-risk cohort from Atlanta, Georgia.
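
For reference, the traditional mass-action baseline mentioned in this abstract can be sketched as a small ODE integration. This is the classical SIR model, not the dynamic-network equations of the talk; the function name `simulate_mass_action_sir`, the Runge-Kutta scheme, and the parameter values are assumptions for illustration.

```python
def simulate_mass_action_sir(beta=0.3, gamma=0.1, i0=1e-3, dt=0.1, t_max=300.0):
    """Integrate dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I with classical RK4.

    S, I and R are population fractions, with R = 1 - S - I.
    Returns a list of (t, S, I, R) tuples.
    """
    def deriv(s, i):
        new_infections = beta * s * i
        return -new_infections, new_infections - gamma * i

    s, i = 1.0 - i0, i0
    path = [(0.0, s, i, 0.0)]
    for k in range(1, int(round(t_max / dt)) + 1):
        k1 = deriv(s, i)
        k2 = deriv(s + 0.5 * dt * k1[0], i + 0.5 * dt * k1[1])
        k3 = deriv(s + 0.5 * dt * k2[0], i + 0.5 * dt * k2[1])
        k4 = deriv(s + dt * k3[0], i + dt * k3[1])
        s += dt / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0])
        i += dt / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1])
        path.append((k * dt, s, i, 1.0 - s - i))
    return path


t_end, s_end, i_end, r_end = simulate_mass_action_sir()[-1]
print("final epidemic size (fraction recovered):", r_end)
```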

Hongyun Wang University of California, Santa Cruz

[email protected]

“Potential Profile of Molecular Motors”

In molecular motors, the mechanical motion is coupled to the chemical reaction. In the

mathematical framework of stochastic differential equations, a motor is subject to the Brownian

diffusion and is driven by the motor potential that changes with the chemical state. For a motor

with N chemical states, there are N motor potentials, one for each chemical state. The effect of

the chemical reaction is to switch the motor among this set of N potentials. It is desirable to

reconstruct these motor potentials from single molecule experimental data. However, it is still

very difficult to measure both the position and the chemical state in single molecule experiments.

As a result, it is not realistic to reconstruct potentials for individual chemical states. Here we use

an effective potential, called motor potential profile, to characterize the motor operation. The

main advantage of using the motor potential profile is that it can be reconstructed from time series of

measured position. We will also discuss the Stokes efficiency of a molecular motor and the

use of the motor potential profile to decompose the Stokes efficiency into mechanical efficiency

and chemical efficiency.
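
The stochastic framework described in this abstract can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the speaker's model: it uses N = 2 chemical states, hypothetical cosine potentials (in units of kT), a hypothetical constant switching rate, and a plain Euler-Maruyama step. The function name `simulate_motor` and all numerical values are invented for illustration.

```python
import math
import random

def simulate_motor(n_steps, dt, diffusion, switch_rate, seed=0):
    """Euler-Maruyama path of an overdamped particle in a switching potential.

    Assumed (hypothetical) potentials, in units of kT:
        phi_s(x) = amp * cos(2*pi*(x - shift_s)),  s = 0 or 1.
    The chemical state flips with probability switch_rate * dt per step.
    """
    rng = random.Random(seed)
    amp = 2.0             # potential amplitude (assumption)
    shifts = (0.0, 0.5)   # offset of each state's potential (assumption)
    x, state = 0.0, 0
    path = [x]
    for _ in range(n_steps):
        # Force is minus the gradient of the current state's potential.
        force = amp * 2 * math.pi * math.sin(2 * math.pi * (x - shifts[state]))
        noise = math.sqrt(2 * diffusion * dt) * rng.gauss(0.0, 1.0)
        x += diffusion * force * dt + noise
        # Chemical reaction: switch to the other potential.
        if rng.random() < switch_rate * dt:
            state = 1 - state
        path.append(x)
    return path

positions = simulate_motor(n_steps=1000, dt=1e-3, diffusion=1.0, switch_rate=5.0)
```

Only the position series is returned, which mirrors the experimental situation described above in which the chemical state is not observed.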

#15: 4th Annual Graduate Student Probability Conference April 30 – May 2, 2010

Duke University

Schedule:

#16: 2010 Summer Program on Semiparametric Bayesian Inference: Applications in

Pharmacokinetics and Pharmacodynamics. July 12-23, 2010

Schedule:

Monday, July 12, 2010

Radisson Hotel RTP

Tutorials

8:00-8:45 Registration and Continental Breakfast

8:45-9:00 Welcome & Overview

9:00-10:45 Tutorial 1

Wes Johnson, U. of California, Irvine and David Dunson, Duke University

“Bayesian Nonparametrics: An Overview”

10:45-11:15 Break

11:15-12:30 Tutorial 1, continued

12:30-1:30 Lunch

1:30-3:00 Tutorial 2

David D’Argenio, University of Southern California and Paolo Vicini, Pfizer

“Pharmacokinetics (PK) and Pharmacodynamics (PD)”

3:00-3:30 Break

3:30-5:00 Tutorial 2, continued

Tuesday, July 13, 2010

Radisson Hotel RTP

Opening Workshop

8:00-9:00 Registration and Continental Breakfast

Prior Elicitation and Construction:

9:00-9:45 Wes Johnson, University of California, Irvine

“On the Value of Incorporating Scientific Input in Modeling and Data Analysis,

and How to Do it Without Pain”

9:45-10:30 Peter Thall, MDACC

“Prior Elicitation in Bayesian Clinical Trial Design”

10:30-11:00 Break

11:00-11:45 Siva Sivaganesan, University of Cincinnati

“Sample Size: How Many Subjects”

11:45-Noon Discussion

Noon-1:00 Lunch

1:00-2:30 Lang Li, Indiana University and Paolo Vicini, Pfizer

"Identifiability and Prior Regularization: A PK-PD Modeling View"

2:30 – 3:00 Break

3:00 – 3:45 Robert Leary, Pharsight

“Computational Challenges of Nonlinear Mixed Effects Analysis for PK/PD

Modeling”

3:45-4:30 Sujit Ghosh, North Carolina State University

“Application of Nonlinear Mixed Models Involving ODEs”

4:30-5:00 Discussion

Wednesday, July 14, 2010

Radisson Hotel RTP

8:00-9:00 Registration and Continental Breakfast

Joint PK and PD:

9:00-9:45 Wojciech Krzyzanski, State University of New York

“PK/PD Modeling of Therapeutic Effects of Erythropoietin”

9:45-10:30 Michele Guindani, University of New Mexico

“NP Bayes Functional Regression for a PK/PD Semi-Mechanistic Model”

10:30-11:00 Break

11:00-11:45 Thanasis Kottas, University of California, Santa Cruz

“A Bayesian Nonparametric Modeling Approach for Developmental Toxicology Data”

11:45-Noon Discussion

Noon-1:00 Lunch

1:00-2:30 David Conti, U. of Southern California; Duncan Thomas, U. of Southern California

“Bayesian Hierarchical and PD/PK Modeling for Complex Pathways in Molecular

Epidemiology”

2:30-3:00 Mike Reed, Duke University

“Why Metabolism is a Challenge for Statistical Analysis: the Case of

Substrate Inhibition”

3:00-3:30 Break

3:30- 4:15 Peter Mueller, MD Anderson

“Nonparametric Bayesian Models for Population PK/PD.”

4:15-5:00 Donatello Telesca, University of California, Los Angeles

“Modeling PK/PGx Data”

5:00-5:30 Discussion

Thursday, July 15, 2010

Radisson Hotel RTP

8:00-9:00 Registration and Continental Breakfast

DDP, Mixture Models and Related Models:

9:00–9:45 Subharup Guha, University of Missouri

“Posterior Simulation in Countable Mixture Models for Large Data Sets”

9:45 – 10:30 Steve MacEachern, Ohio State University

“Nonparametric Bayesian Models for Related Subpopulations”

10:30-11:00 Break

11:00-11:45 Maria de Iorio, Imperial College

“Nonparametric Approaches for Heterogeneous Longitudinal Data”

11:45-12:30 Tatiana Tatarinova, University of Glamorgan

“Analysis of Biological Time Series Data in Data-Rich Situations”

12:30-1:30 Lunch

1:30-2:10 Roger Jelliffe, University of Southern California

"Tools for Individualizing Drug Dosage Regimens for Patients"

2:10-2:50 William Gillespie, Metrum Inst.

“Examples of Decision-Focused Modeling and Simulation in Clinical Drug

Development and Some Methodology R&D Opportunities They Suggest”

2:50-3:30 Gary Rosner, Johns Hopkins University

“Optimal Design with Bayesian Nonparametric Priors”

3:30-4:00 Break

4:00-5:00 David D’Argenio, University of Southern California; William Gillespie, Metrum Inst.;

Roger Jelliffe, University of Southern California; Bob Leary, Pharsight

“Implementation and Available Software”

5:00-5:30 Poster Advertisement Session: 2 minute ads by each poster presenter

6:00–8:00 Poster Session and Reception

SAMSI will provide poster presentation boards and tape. The board dimensions

are 4 ft. wide by 3 ft. high. They are tri-fold with each side being 1 ft. wide and

the center 2 ft. wide. Please make sure your poster fits the board. The boards can

accommodate up to 16 pages of paper measuring 8.5 inches by 11 inches.

8:00- Informal Debate: “Parametric PD Models vs. Nonparametric PD Models”

Friday, July 16, 2010

Radisson Hotel RTP

8:00-9:00 Registration and Continental Breakfast

Nonparametric Bayes and Clustering for Random Effects Distributions:

9:00-9:45 David Dahl, Texas A&M University

“Strategies for Incorporating Prior Information in Random Partition Models with

Examples”

9:45-10:30 Mahlet Tadesse, Georgetown University

“Stochastic Partitioning and Model-based Clustering with Variable Selection”

10:30–11:00 Break

11:00 –11:45 Julien Cornebise, University of British Columbia

"Inference for Mixed Effects Stochastic Differential PKPD Models

with Particle MCMC"

11:45-12:15 Gary Rosner, Johns Hopkins University, and Peter Mueller, University of Texas

Summary & Discussion

12:15-1:30 Lunch

1:30–2:30 Formation of Working Groups

2:30-3:00 Break

3:00-4:00 Initial discussions

Monday, July 19 – Thursday, July 22, 2010

SAMSI, RTP,

Rooms 104, 150, 201, 203, 259

9:00 – 5:00 Working Groups - TBA

Possible Themes:

1. Identifiability and Prior Regularization

2. Joint PK and PD Modeling

3. PK/PD Simulation for Design and Dose Individualization

4. Pharmacogenomics and Networks

5. Nonparametric Bayes and Clustering for PK and PD Population Analysis

12:00-1:30 Lunch

Friday, July 23, 2010

SAMSI, RTP,

Room 150

9:30-12:00 Presentations by Working Groups and Discussion about Future Projects

12:00-1:30 Lunch

1:30 Adjourn

SPEAKER TITLES/ABSTRACTS

David Conti Duncan Thomas

University of Southern California University of Southern California

[email protected] [email protected]

“Bayesian Hierarchical and PD/PK Modeling for Complex Pathways in Molecular

Epidemiology”

Molecular epidemiology is concerned with the study of the joint role of genetic and

environmental influences in disease causation. Despite the current enthusiasm for “agnostic”

(hypothesis-free) genome-wide association approaches, there is still a place for biologically-

based analyses driven by our current understanding of specific pathways implicated in disease.

This talk will focus on statistical approaches to incorporating biological knowledge into the

analysis of gene-environment interactions in disease etiology. Two broad frameworks will be

introduced, one based on hierarchical modeling, the other based on physiologically-based

pharmacokinetic (PBPK) modeling (Thomas et al. 2010). In the former, a standard regression

model (such as multiple logistic for a binary disease trait) forms the first level for the study

subject's data, with regression coefficients for main effects of various genes, environmental

factors, and their interactions; these coefficients are in turn regressed on external “meta-data”

about these risk factors, such as indicators for specific pathways involving the various factors,

functional data, or expert knowledge organized in formal ontologies. In the PBPK approach, a

mechanistic model for the disease process is constructed, typically involving one or more latent

variables representing intermediate metabolite concentrations or other unobserved biological

processes, each related to each other and to the input variables through a system of ordinary or

stochastic differential equations. Both approaches will be illustrated on various epidemiologic

studies, such as folate metabolism in colorectal cancer, nicotine dependence, and DNA repair

following ionizing radiation exposure in second breast cancers. Either approach typically

requires one to deal with the problem of model uncertainty, such as the selection of variables to

include in a hierarchical model. We will discuss approaches such as stochastic search variable

selection and shrinkage using lasso or ridge priors, as well as novel methods based on Markov

chain Monte Carlo searching over the space of all possible pathway models. We will conclude

with some thoughts on the prospects for extending these approaches to genome-wide scale.

Reference: Thomas DC, Conti DV, Baurley J, Nijhout F, Reed M, Ulrich CM. Use of pathway

information in molecular epidemiology. Hum Genomics, 2010;4:21-42.

David Dahl Texas A&M University

[email protected]

"Strategies for Incorporating Prior Information in Random Partition Models with Examples"

Many Bayesian semiparametric and nonparametric models are based on random partitioning of

items (e.g., experimental units) into clusters, where items in the same cluster share a common

value for a model parameter. Often there is good prior information about the clustering of items,

yet traditional random partition models cannot incorporate this information. In this talk, I will

review several recently-proposed strategies for incorporating prior clustering information in

random partition models. After surveying methods for which the prior clustering information

comes in the form of

item-specific covariates, I will discuss methods that instead utilize pairwise distances between

items to inform the prior clustering distribution. Applying these methods to a model for protein

structure prediction, it is shown that the methods incorporating distance

information substantially improve predictive accuracy.

David Dunson

Duke University

[email protected]

“Bayesian Nonparametrics: An Overview”

Tutorial Part II: In the second part of this tutorial we provide a brief review and motivation for

the use of nonparametric Bayes methods in biostatistical applications. The focus is on methods

utilizing random probability measures, with the emphasis on a few approaches that seem

particularly useful in addressing the considerable challenges faced in modern biostatistical

research. In addition, the emphasis will be entirely on practical applications-motivated

considerations, with the goal of encouraging participants to consider related approaches for their

own data.

The discussion includes methods for functional data analysis using DP-based methods,

approaches for local shrinkage and clustering, methods for hierarchical borrowing of information

across studies, centers or exchangeable groups of data, flexible modeling of

conditional distributions using priors for collections of random probability measures that evolve

with predictors, time and spatial location, some recent applications in bioinformatics, and

methods for nonparametric Bayes hypothesis testing.

Maria de Iorio Imperial College

[email protected]

"Nonparametric Approaches for Heterogeneous Longitudinal Data"

We discuss a method for combining different but related longitudinal studies to improve

predictive precision. The motivation is to borrow strength across clinical studies in which the

same measurements are collected at different frequencies. Key features of the data are

heterogeneous populations and an unbalanced design across three studies of interest. The first

two studies are phase I studies with very detailed observations on a relatively small number of

patients. The third study is a large phase III study with over 1500 enrolled patients, but with

relatively few measurements on each patient. Patients receive different doses of several drugs in

the studies, with the phase III study containing significantly less toxic treatments. Thus, the main

challenges for the analysis are to accommodate heterogeneous population distributions and to

formalize borrowing strength across the studies and across the various treatment levels.

We describe a hierarchical extension over suitable semiparametric longitudinal data models to

achieve the inferential goal. A nonparametric random-effects model accommodates the

heterogeneity of the population of patients. A hierarchical extension allows borrowing strength

across different studies and different levels of treatment by introducing dependence across these

nonparametric random-effects distributions. Dependence is introduced by building an analysis of

variance (ANOVA) like structure over the random-effects distributions for different studies and

treatment combinations. Model structure and parameter interpretation are similar to standard

ANOVA models. Instead of the unknown normal means as in standard ANOVA models,

however, the basic objects of inference are random distributions, namely the unknown

population distributions under each study. The analysis is based on a mixture of Dirichlet

processes model as the underlying semiparametric model.

Sujit Ghosh North Carolina State University

[email protected]

“Application of Nonlinear Mixed Models Involving ODEs in Biomedical Sciences”

In the field of biomedical applications, data usually consists of repeated measurements on

individuals observed under varying experimental conditions. For example, in pharmacokinetics,

several blood samples are taken on participating individuals over a period of time, following

administration of a drug. The within subject relationships between the measured response and the

varying experimental conditions are often nonlinear, represented by a system of nonlinear

ordinary differential equations (ODEs) and involve unknown subject-specific parameters of

interest. These models are useful because they offer a flexible framework where parameters for

both individuals and population can be estimated by combining information across all subjects.

More often such a system of ODEs does not have any analytical closed form solution, making

parameter estimation for these models quite challenging and computationally very demanding. A

method based on Euler's approximation is proposed to obtain an approximate likelihood that is

analytically tractable, thus making parameter estimation computationally less demanding

than other competing methods. These methods will be illustrated using a real data set and

simulation studies will be presented to compare the performances of these new methods to other

established methods in the literature.
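
To make the idea of an Euler-based approximate likelihood concrete, here is a minimal sketch. It is not the method proposed in the talk: it assumes a hypothetical one-compartment elimination model dC/dt = -k C, additive Gaussian measurement error with known standard deviation, and a plain forward Euler scheme. The names `euler_solve` and `approx_log_likelihood` are illustrative.

```python
import math

def euler_solve(rhs, y0, t_end, n_steps):
    """Approximate y(t_end) for dy/dt = rhs(t, y), y(0) = y0, by forward Euler."""
    h = t_end / n_steps
    t, y = 0.0, y0
    for _ in range(n_steps):
        y = y + h * rhs(t, y)
        t += h
    return y

def approx_log_likelihood(k_elim, conc0, times, observed, sigma, n_steps=2000):
    """Gaussian log-likelihood with the ODE solution replaced by its Euler approximation."""
    def rhs(t, c):
        return -k_elim * c  # hypothetical one-compartment elimination

    total = 0.0
    for t_obs, y_obs in zip(times, observed):
        pred = euler_solve(rhs, conc0, t_obs, n_steps)
        resid = y_obs - pred
        total += -0.5 * math.log(2 * math.pi * sigma ** 2) - resid ** 2 / (2 * sigma ** 2)
    return total

# Illustrative data only: concentrations at 1, 2 and 4 hours.
sample_times = [1.0, 2.0, 4.0]
sample_obs = [6.1, 3.7, 1.4]
ll = approx_log_likelihood(0.5, 10.0, sample_times, sample_obs, sigma=0.5)
```

In a mixed-effects setting, `k_elim` would vary by subject and the per-subject log-likelihoods would be combined with a population distribution; that layer is omitted here.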

William Gillespie

Metrum Institute

[email protected]

“Examples of Decision-Focused Modeling and Simulation in Clinical Drug Development

and Some Methodology R&D Opportunities They Suggest”

Modeling and simulation are increasingly used to support decision-making in clinical drug

development. Simulations are used to infer the probable range of outcomes for different

treatment regimens, patient populations and clinical trials. Simulated results are then used to

optimize decisions regarding patient treatment, clinical trial design and analysis, and

development strategy, e.g., whether to continue a development program. The modeling and

simulation approaches used for such applications vary depending on a variety of factors

including the intended purpose, amount and type of prior information, and, of course, the skills

and philosophical preferences of the modeler(s).

Two examples will be presented that use very different modeling strategies for different

purposes. Example 1 involved the use of a largely empirical Bayesian hierarchical model for

efficacy- and safety-related responses as a function of drug exposure and time constructed via

model-based meta-analysis of clinical and preclinical data. Simulations based on the resulting

model were used to optimize the design of a Phase II clinical trial. Design features explored

included adaptive stopping, adaptive pruning, and use of informative prior distributions in the

trial analyses. In example 2 a multi-scale mechanistic model for calcium homeostasis and bone

remodeling was developed. The model was used to explore the impacts of chronic renal failure,

parathyroid disorders and discontinuation of denosumab treatment on biomarkers and bone

mineral density. It was also used to investigate potential causal mechanisms for long-term bone

remodeling effects of estrogen in menopausal and postmenopausal women. The current model is

deterministic. An effort is underway to implement the model in a Bayesian framework that

incorporates uncertainty in parameter values and inter-individual variation in selected

parameters. This will permit probabilistic inference and model updating via MCMC analysis of

new data as it becomes available.

Example 1 suggests the following opportunities for method development:

o Modeling relationships between biomarkers and therapeutic outcomes including

approaches for integrating models for multiple biomarkers.

o Use of decision analysis to optimize trial design and for analysis of adaptive trials.

o Model-based meta-analysis, particularly the combined analysis of summary and

individual data.

Example 2 suggests the potential for predictive inferences about clinical response to a drug

candidate even before that candidate has been administered to humans. However few such

models have been developed and fewer still provide useful probabilistic inference. This suggests

the following areas for R&D:

o Development of multi-scale mechanistic models for disease progression and drug effects

relevant to other therapeutic areas.

o Bayesian implementation of such models.

o Development of Bayesian modeling methods and software better suited to such complex

mechanistic models.

William Gillespie

Metrum Institute

[email protected]

“Computational Tools for Bayesian PKPD Modeling”

Markov chain Monte Carlo (MCMC) simulation methods for Bayesian modeling have been

recently added to some existing pharmacometric tools, e.g., NONMEM 7 and S-Adapt.

Advantages of these programs are the inclusion of built-in PKPD model libraries and model-

specification languages, and MCMC and less computationally-demanding algorithms within a

single platform. On the other hand those tools are limited to models with two levels of random

variation (plus the prior distributions), normally-distributed inter-individual variation and a

normal-Wishart prior distribution.

More general-purpose software tools implementing MCMC for Bayesian modeling include the

BUGS language programs (WinBUGS, OpenBUGS and JAGS), the SAS MCMC procedure and

Microsoft Infer.Net. Some Bayesian modeling packages are also available for R and MatLab

though most are limited to specific classes of models. The BUGS language tools have the

advantage of being very flexible w.r.t. stochastic model structure, e.g., no limit on levels of

variability, choice of many built-in distributions, and one can easily combine sub-models with

very different stochastic structures. Their main disadvantages are the lack of specialized support

for pharmacometrics applications, less flexible specification of deterministic model components,

and the absence of less computationally-demanding algorithms within the same platform.

BUGSModelLibrary is a prototype PKPD model library for use with WinBUGS 1.4.3. The

current version includes built-in 1 and 2 compartment models with or without first order

absorption, and the ability to specify more general compartmental models in terms of first order

ODE's. The models and data format are based on NONMEM conventions. Several test cases

were analyzed using both the NONMEM 7 BAYES method and WinBUGS with

BUGSModelLibrary. For the classes of models studied:

o MCMC simulations using NONMEM 7 and WinBUGS produced results with

comparable accuracy.

o NONMEM 7 more often produced less auto-correlated MCMC samples.

o WinBUGS required much less computation time to produce comparable MCMC results

for a mixture model, and about half the computation time for 4 test cases.

o WinBUGS required more time to produce less precise results for 4 other test cases.

o For approximately 1/2 of the scenarios NONMEM 7 outperformed WinBUGS w.r.t. computation time

adjusted for autocorrelation.

Those results led to the following recommendations:

o NONMEM 7 is a recommended platform for Bayesian modeling when suitable models

can be implemented within the limits imposed by NONMEM.

o WinBUGS is a recommended platform when greater flexibility is required w.r.t.

stochastic aspects of models.

o Based on the limited testing presented here, WinBUGS appears to perform better with

mixture models and models with inter-occasion variability and is the preferred platform

for those cases.

Subharup Guha University of Missouri

[email protected]

"Posterior Simulation in Countable Mixture Models for Large Datasets"

Mixture models, or convex combinations of a countable number of probability distributions,

offer an elegant framework for inference when the population of interest can be subdivided into

latent clusters having random characteristics that are heterogeneous between, but homogeneous

within, the clusters. Traditionally, the different kinds of mixture models have been motivated and

analyzed from very different perspectives, and their common characteristics have not been fully

appreciated. The inferential techniques developed for these models usually necessitate heavy

computational burdens that make them difficult, if not impossible, to apply to the massive data

sets increasingly encountered in real world studies.

This talk introduces a flexible class of models called generalized Polya urn (GPU) models. Many

common mixture models, such as finite mixtures, hidden Markov models, and Dirichlet

processes are obtained as special cases of GPU models. Other important special cases include

finite dimensional Dirichlet priors, infinite hidden Markov models, analysis of densities models,

nested Chinese restaurant processes, hierarchical DP models, nonparametric density models,

spatial Dirichlet processes, weighted mixtures of DP priors, and nested Dirichlet processes.

An investigation of the theoretical properties of GPU models offers new insight into asymptotics

that form the basis of cost-effective MCMC strategies for large datasets. These MCMC

techniques have the advantage that they provide inferences from the exact posterior of interest

and are applicable to different mixture models. The versatility and impressive gains of the

methodology are demonstrated by simulation studies and by a semiparametric Bayesian analysis

of high-resolution comparative genomic hybridization data on lung cancer.

Michele Guindani University of New Mexico

[email protected]

“NP Bayes Functional Regression for a PK/PD Semi-Mechanistic Model”

In clinical pharmacology, a major goal is to study the quantitative prediction of drug effects.

Complex pharmacokinetic-pharmacodynamic (PK/PD) models with feedback and transition

effects have been recently developed, for example to estimate the time course of

myelosuppression. These models typically involve the presence of covariate dependent

parameters. We propose a coherent NP Bayes probabilistic framework for the analysis of these

models. Our goal is to use the information from the individuals' time courses and covariates to

provide a clustering of patients according to their PK/PD profiles. This information could then be

used to predict the PD time course for a patient on the basis of the observed PK profile, as well

as deal with missing data. In the talk I will try to outline the features as well as the challenges

connected with such implementation. This is a joint project with Peter Mueller, Gary L. Rosner

and Lena Friberg.

Roger Jelliffe University of Southern California

[email protected]

"Tools for Individualizing Drug Dosage Regimens for Patients"

The clinical tools we currently advocate are:

1. Set specific Target Goals (not windows) - Hit them most Precisely. Simply setting a

window is not enough, except for knowing a general range of serum concentrations

within which most patients, but certainly not all, do well. It does not consider each

patient's need for the drug and the risk of toxicity it appears realistic to accept in order to

obtain the hoped-for benefits of the drug. It also does not consider the patient's

sensitivity to the drug. Using the same clinical judgment, one can set a specific target

value for a serum concentration, for example, and then attempt to hit it as precisely as

possible. This target may well be outside the so-called therapeutic range, not only below

it, but also above it.

2. Describe Assay Errors by the reciprocal of the variance with which each measurement

was made. Both for population modeling and for clinical care, the laboratory can do a

much better job if it gives up using CV% as a measure of error (they are the only group in

the world who do this!) and uses instead the reciprocal of the assay variance for each

measurement. This is as easy to do as CV%. One can fit a polynomial to the data and

store it in the software. Then one has a reliable measure of credibility with which to fit

each measurement. In addition, there is no need to censor low data any more. One can

drive the HIV PCR down to zero, and document it, and not just report <50 copies, for

example. This greatly increases the usefulness of the lab to the community.

3. Determine the remaining Environmental Noise. Having determined the assay error, one

can determine the remaining environmental error either as an additive or a multiplicative

term. Then one can KNOW how much error is due to the assay and how much to the

clinical environment in which the study or the patient's care has taken place. This is

important information, which is usually not obtained.

4. Evaluate CHANGING Renal Function. Most methods to estimate creatinine clearance

or GFR are based only on a single sample. We much favor a longstanding method based

on 2 serum creatinine samples. It estimates the creatinine clearance that, in an unstable

patient, makes serum creatinine rise or fall from the first to the second sample

over a stated time, in a patient of stated age, gender, height and weight.

5. Nonparametric (NP) Population PK/PD Models. We greatly favor the use of

nonparametric population models, as they make no assumptions at all about the often

genetically polymorphic shape of the parameter distributions. What you see is what

you've got. It also leads directly to...

6. "Multiple Model" (MM) Design of Doses. Here the multiple support points of the NP

model parameter distributions, when given a candidate dosage regimen, provide many

predictions, each weighted by the probability of that support point. It then is easy to

compute the weighted squared error of failure of those predictions to hit the target, and

then to find the specific regimen which minimizes that error.

7. MM, MAP, Hybrid, and IMM Bayesian analysis and individual models. These four

Bayesian methods provide tools to obtain useful Bayesian posterior joint densities for

patients, even when the patient's parameter densities may lie quite a distance outside the

stated ranges of the population model parameters, and also in very unstable patients

whose parameter distributions may be changing significantly during the period of data

analysis.
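
The "Multiple Model" dosage design in item 6 can be sketched in a few lines. The example below is illustrative only and is not the speaker's software: it assumes a hypothetical one-compartment intravenous bolus model, a made-up set of support points (volume, elimination rate, probability), and a simple search over candidate doses. All names and values are assumptions.

```python
import math

def predicted_conc(dose, volume, k_elim, t):
    """Concentration at time t after an IV bolus (hypothetical one-compartment model)."""
    return dose / volume * math.exp(-k_elim * t)

def weighted_sq_error(dose, support_points, target, t):
    """Probability-weighted squared miss of the target across all support points."""
    return sum(
        prob * (predicted_conc(dose, volume, k_elim, t) - target) ** 2
        for volume, k_elim, prob in support_points
    )

def best_dose(support_points, target, t, candidate_doses):
    """Candidate dose with the smallest weighted squared error."""
    return min(
        candidate_doses,
        key=lambda d: weighted_sq_error(d, support_points, target, t),
    )

# Hypothetical NP population model: (volume in L, k in 1/h, probability).
support = [(20.0, 0.10, 0.5), (30.0, 0.20, 0.3), (25.0, 0.05, 0.2)]
doses = [10 * i for i in range(1, 101)]  # candidate doses in mg
chosen = best_dose(support, target=5.0, t=6.0, candidate_doses=doses)
```

Each support point contributes one prediction, weighted by its probability, so the chosen regimen reflects the whole discrete parameter distribution rather than a single mean parameter vector.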

Wes Johnson

University of California, Irvine

[email protected]

“Bayesian Nonparametrics: An Overview”

Tutorial Part I: Nonparametric statistics is a misnomer. The goal of nonparametrics is

to provide flexible modeling that does not overly rely on assumptions. In order to accomplish

that goal, it is often necessary to expand a parametric family of models to include many more

models. A parametric family is indexed by a finite parameter vector. A nonparametric family is

indexed by an infinite dimensional parameter. Of course in the real world, we can't quite get to

infinity, so we truncate to obtain a family that is manageable. But in this finite approximation to

an infinite dimensional index, we are left with what are called richly parametric models, that still

allow for much flexibility in modeling. The goal of this talk is to present a few methods of

handling the complexity. We discuss the Dirichlet Process Mixture prior, which is used to model

data as theoretically infinite random mixtures of parametric distributions like the normal. We

also discuss Mixtures of Polya Tree priors that allow the user to embed a parametric family, like

the normal, in a broader class of distributions. We also briefly discuss a different kind of

nonparametrics that models a regression function in a flexible way. All methods are

implemented using Markov chain Monte Carlo methods and thus require some finesse in their

implementation. However, they do not require large sample theory for either their development

or their implementation. Moreover, once the basic machinery is set in place, virtually any

inference is achievable virtually by request through a few lines of code rather than by additional

mathematics.
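
The truncation idea mentioned above can be illustrated with the stick-breaking construction of a Dirichlet process. This is a minimal sketch, not material from the tutorial: the truncation level, the concentration parameter `alpha`, the N(0, 3^2) base measure and the unit-variance normal kernel are all assumptions chosen for illustration.

```python
import random

def truncated_stick_breaking(alpha, n_components, rng):
    """Mixture weights of a Dirichlet process truncated to n_components atoms."""
    weights = []
    remaining = 1.0
    for _ in range(n_components - 1):
        frac = rng.betavariate(1.0, alpha)  # break off a Beta(1, alpha) fraction
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)  # last atom absorbs the leftover mass
    return weights

def sample_dp_normal_mixture(alpha, n_components, n_draws, seed=0):
    """Draw data from one realisation of a truncated DP mixture of normals."""
    rng = random.Random(seed)
    weights = truncated_stick_breaking(alpha, n_components, rng)
    # Component means drawn from an assumed N(0, 3^2) base measure.
    means = [rng.gauss(0.0, 3.0) for _ in range(n_components)]
    draws = []
    for _ in range(n_draws):
        k = rng.choices(range(n_components), weights=weights)[0]
        draws.append(rng.gauss(means[k], 1.0))
    return weights, draws

weights, data = sample_dp_normal_mixture(alpha=1.0, n_components=25, n_draws=200)
```

The finite weight vector is the "richly parametric" approximation to the infinite-dimensional index: raising `n_components` moves the truncated model closer to the full process.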

Wes Johnson

University of California, Irvine

[email protected]

“On the Value of Incorporating Scientific Input in Modeling and Data Analysis, and How To Do

It Without Pain”

We discuss the value of incorporating scientific information that is converted into a prior

distribution for a Bayesian analysis of data. Most of the talk will focus on examples from

parametric modeling. For example, imagine a simple binomial example where n individuals

have been tested for HIV using an ELISA test. The probability of a test

positive outcome is the prevalence of HIV in the population sampled times the sensitivity of the

ELISA test plus the prevalence of non-HIV-infected individuals times one minus the

specificity of the test. There are three parameters and there is only one data point, the proportion

of test positives in the sample. Additional scientific input

is required to make further progress in this non-identifiable problem. Moreover, many might

place uniform priors on some of these inputs in a misguided effort to be non-informative. But it

would be impossible to believe that the accuracies of the ELISA test, or indeed the prevalence,

were equally likely to be 0 or 1. I will conclude the talk with a

discussion of how to extend informative prior information into non and semiparametric models.
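
The binomial example in this abstract can be written out directly. The snippet below is a small sketch with made-up numbers, not values from the talk. It shows two different (prevalence, sensitivity, specificity) triples that give the same probability of a positive test, which is why the observed proportion alone cannot identify the three parameters.

```python
def prob_test_positive(prevalence, sensitivity, specificity):
    """P(test positive) = prevalence * Se + (1 - prevalence) * (1 - Sp)."""
    return prevalence * sensitivity + (1.0 - prevalence) * (1.0 - specificity)

# Two hypothetical parameter triples (illustrative values only).
p_a = prob_test_positive(prevalence=0.10, sensitivity=0.90, specificity=0.90)
p_b = prob_test_positive(prevalence=0.20, sensitivity=0.50, specificity=0.90)

# Both triples imply the same positive-test probability, so the likelihood
# of the observed count is identical under either one.
same_probability = abs(p_a - p_b) < 1e-12
```

Informative priors on sensitivity and specificity, for example from validation studies of the test, are what break this tie in a Bayesian analysis.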

Thanasis Kottas

University of California, Santa Cruz

[email protected]

“A Bayesian Nonparametric Modeling Approach for Developmental Toxicology Data”

We present a Bayesian nonparametric mixture modeling framework for replicated count

responses in dose-response settings. We explore the methodology for modeling and risk

assessment in developmental toxicity studies, where the primary objective is to determine the

relationship between the level of exposure to a toxic chemical and the probability of a

physiological or biochemical response, or death. Data from these experiments typically involve

features that cannot be captured by standard parametric approaches. To provide flexibility in the

functional form of both the response distribution and the probability of positive response, the

proposed mixture model is built from a dependent Dirichlet process prior, with the dependence

of the mixing distributions governed by the dose level. The methodology is tested with a

simulation study, which involves comparison with semiparametric Bayesian approaches to

highlight the practical utility of the dependent Dirichlet process nonparametric mixture model.

Further illustration will be provided through the analysis of data from two developmental

toxicity studies.

Joint work with Kassandra Fronczyk, Department of Applied Mathematics and Statistics,

University of California, Santa Cruz

Wojciech Krzyzanski

University at Buffalo


“PK/PD Modeling of the Therapeutic Effects of Erythropoietin”

Pharmacokinetics describes a relationship between drug dose and drug concentrations at various

body sites such as blood and urine. Pharmacodynamics aims at establishing a relationship

between drug concentrations and pharmacological effects and links these responses to a clinical

outcome. PK/PD modeling relates drug exposure to effect by means of mathematical models.

Hematopoietic cell populations that include white blood cells, red blood cells, and platelets

constitute important therapeutic targets and are used as measures of pharmacodynamic response.

Erythropoietin is a major hormone that regulates production of RBCs. Recombinant human

erythropoietin (rHuEPO) is a therapeutic agent for treatment of anemia.


This talk will briefly introduce a basic concept of mechanistic PK/PD modeling. The

physiology of the EPO-regulated erythropoietic system will be presented. A PK/PD model addressing

receptor-mediated disposition of rHuEPO in rats will be described. Lifespan-based indirect

response models will be exemplified by a PD model of rHuEPO effect on reticulocyte, RBC, and

hemoglobin levels. The talk will conclude with discussion of numerical challenges associated

with the stiffness of the PK/PD model, difficulties of implementation of delay differential

equations in PK/PD software, and estimability of parameters in high-dimensional PK/PD systems.
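As a small illustration of the dose-concentration relationship that pharmacokinetics describes, the sketch below uses the simplest textbook case: a one-compartment model with an intravenous bolus dose and first-order elimination. This is not the receptor-mediated rHuEPO model from the talk. The function names and the parameter values (dose, volume, clearance) are hypothetical.

```python
import math

def concentration(t, dose, volume, clearance):
    """Plasma concentration at time t after an IV bolus (one-compartment model)."""
    k_el = clearance / volume  # first-order elimination rate constant
    return (dose / volume) * math.exp(-k_el * t)

def half_life(volume, clearance):
    """Elimination half-life, ln(2) / k_el."""
    return math.log(2.0) * volume / clearance

# Hypothetical values chosen only for illustration (e.g. mg, L, L/h).
dose, volume, clearance = 100.0, 10.0, 2.0

c0 = concentration(0.0, dose, volume, clearance)
c_half = concentration(half_life(volume, clearance), dose, volume, clearance)

# After one half-life the concentration is half of its initial value.
print(c0, c_half)
```

Mechanistic PK/PD models of the kind discussed in the abstract replace this closed-form expression with systems of (possibly stiff or delayed) differential equations, which is where the numerical challenges mentioned above arise.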

Robert Leary

Pharsight Corp.


“Computational Challenges of Nonlinear Mixed Effects Analysis for PK/PD Modeling”

NLME analysis of PK/PD models poses a variety of computational challenges in terms of

algorithms, models, data handling, software, and hardware. For example, general purpose

NLME software such as the nlme package in R and S+, the NLMIXED procedure in

SAS, and the BUGS Bayesian inference package are often not well suited for PK/PD analyses

due to lack of domain-specific features for handling the complex dosing, data structures, and

models often encountered. In this talk I will give an overview of the current state-of-the-art in

computational approaches to NLME analysis for PK/PD models as practiced within the

commercial and academic pharmacometric communities. Both parametric and nonparametric

methods will be discussed in terms of fundamental algorithmic approaches, specific software

implementations, and current limitations.

Lang Li

Indiana University


“PK Parameter Prior Distribution Construction from the Literature”

In this talk, we describe a text-mining approach to extract PK parameter numerical data

from the literature automatically. The extracted data will then be transformed into compartment

models. Both linear and non-linear mixed effect models are used for these analyses.

Steve MacEachern

Ohio State University


“Nonparametric Bayesian Models for Related Subpopulations”


One of the fundamental questions that we face when attempting to analyze data is how to model

a collection of related subpopulations. Through time, many approaches have been developed. In

this talk, I will describe a natural progression from very simple models to

more complex models. The progression takes us from parametric models to nonparametric

Bayesian models, with a focus on dependent Dirichlet processes. Attention will be drawn to

differences between parametric and nonparametric approaches, both for modeling and for

inference.

Peter Mueller

MD Anderson


“Nonparametric Bayesian Models for Population PK/PD”

We will review some nonparametric Bayesian models and approaches that are useful to model

random effects distributions in population PK/PD. We will discuss in particular alternative

approaches to incorporating regression on baseline covariates, focusing on

approaches based on random partition models.

Bhramar Mukherjee

University of Michigan


“Bayesian Analysis of Two-phase Studies of Gene-environment Interaction with an Application

to Drug-gene Interaction”

In this talk we consider Bayesian semiparametric analysis of two-phase studies of gene-

environment interaction. Phase I data is available on case-control status, basic demographic

covariates and medication use (environmental factor, say). Phase II genotype data is collected by

stratified sampling on case-control status and enriched for use of medication. The genes are

selected from the metabolic pathway of the drug. We propose a Bayesian approach

based on a retrospective likelihood that data-adaptively uses the gene-environment

independence assumption and can accommodate multiple genetic factors and their interactions

through a variable selection procedure. The retrospective likelihood requires flexible modeling of

the joint distribution of multiple covariates where we adopt Bayesian nonparametric techniques.

The methods are illustrated by analyzing interaction of statins with polymorphisms in the

cholesterol synthesis and lipid metabolism pathway in a population-based case-control study of

colorectal cancer.

Michael Reed

Duke University


“Why Metabolism is a Challenge for Statistical Analysis: the Case of

Substrate Inhibition”


Cell metabolism is a complicated network of biochemical reactions that

includes many feedback and feedforward interactions that are crucial for

regulation and adaptation to the cell’s external environment. In addition, the

special properties of enzymes give rise to unusual nonlinear kinetics. It is a serious

and important challenge to construct statistical procedures that can infer such

regulatory mechanisms and unusual kinetics from network input-output maps. A

simple example, substrate inhibition, will be discussed.
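To make the "unusual nonlinear kinetics" concrete, here is a minimal sketch of one commonly used rate law for substrate inhibition, v = Vmax * S / (Km + S + S^2 / Ki). The abstract does not state which form the talk uses, so this choice, the function name, and the parameter values are assumptions made for illustration. The key feature is that the reaction velocity rises with substrate concentration, peaks at S = sqrt(Km * Ki), and then falls.

```python
import math

def substrate_inhibition_rate(s, vmax, km, ki):
    """Reaction velocity under substrate inhibition: Vmax*S / (Km + S + S^2/Ki)."""
    return vmax * s / (km + s + s * s / ki)

# Hypothetical kinetic constants, chosen only for illustration.
vmax, km, ki = 1.0, 1.0, 4.0

# Setting dv/dS = 0 gives the substrate level with maximal velocity.
s_peak = math.sqrt(km * ki)

v_low = substrate_inhibition_rate(0.5 * s_peak, vmax, km, ki)
v_peak = substrate_inhibition_rate(s_peak, vmax, km, ki)
v_high = substrate_inhibition_rate(4.0 * s_peak, vmax, km, ki)

# Velocity is non-monotone in S: lower on both sides of the peak.
print(s_peak, v_low, v_peak, v_high)
```

Because two different substrate levels can give the same velocity, an input-output map built on such a reaction is not monotone, which is one reason standard statistical procedures can struggle to recover the underlying mechanism.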

Gary Rosner

Johns Hopkins University


“Optimal Design with Bayesian Nonparametric Priors”

In this talk, I review an application that concerned finding an individual’s optimal dose of an

anticancer drug. Since there were multiple studies, we used a hierarchical model based on linked

Dirichlet processes. We also incorporated patient-specific covariates to help determine the

optimal dose. I will review several ways to incorporate covariates and point out how these

methods differ. This is joint work with Alejandro Cruz-Marcelo and Peter Mueller.

Co-Presenters: William Gillespie, Metrum Inst.; Roger Jelliffe, University of Southern California

Siva Sivaganesan

University of Cincinnati


“Sample Size: How Many Subjects”

Sampling design is an important aspect of a PK study. It includes specification of the number of

samples per subject, sampling times, and the number of subjects (the sample size), among others. In

this talk we give an overview of this topic, focusing specifically on the sample size and the

role of the Bayesian approach in this context.

Mahlet Tadesse

Georgetown University


“Stochastic Partitioning and Model-based Clustering with Variable Selection”

In the analysis of high-dimensional data there is often interest in uncovering cluster structure and

identifying discriminating variables. For example, the goal may be to use gene expression data

to discover new disease subtypes and to determine genes with different expression levels

between classes. Another research question that is receiving increased attention is the problem

of relating genomic data sets from various sources. For instance, the goal may be to uncover

cluster structures in the data sets and identify groups of associated markers across data sets. I

will present some Bayesian methods we have proposed to address these problems in a unified

manner.


Tatiana Tatarinova, University of Glamorgan

Alan Schumitzky, University of Southern California

“Analysis of Biological Time Series Data in Data-Rich Situations”

In this talk we present two clustering methods: 1) The Kullback-Leibler Markov Chain Monte

Carlo (KLMCMC) method, and 2) The Multiple Collapse Clustering (MCC) method for the

treatment of biological time series data in a data-rich situation. Assuming a continuous nature of

most biological processes, we suggest computing parameters of nonlinear, piecewise continuous

functions approximating each cluster. Our approach is based on a combination of methods

including Gibbs sampling, Random Permutation Sampling, birth-death MCMC, and Kullback-

Leibler distance.

Donatello Telesca

University of California, Los Angeles


“Modeling PK/PGx Data”

We describe a general framework for the exploration of the relationships between

pharmacokinetic pathways and polymorphisms in genes associated with the metabolism of a

compound of interest. We integrate a population pharmacokinetics model with a simple sampling

model of genetic mutation via a conditional dependence prior. Significant interactions are

selected allowing the pharmacokinetic parameters to depend on gene sets of variable dimension.

We discuss posterior inference and prediction based on MCMC simulation. Our model finds

motivation in the study of a chemotherapeutic agent used in the treatment of various solid

tumors.

Peter Thall

MD Anderson Cancer Center


“Prior Elicitation in Bayesian Clinical Trial Design”

The impact of one’s prior on adaptive decisions in a Bayesian clinical trial design can be

profound. For example, the prior may have a substantive effect on treatment assignments

involving life-and-death outcomes in trials of treatments for rapidly fatal diseases. This issue is

at the heart of what may be considered the strongest frequentist criticism of Bayesian methods.

Despite this, little attention has been paid to the problem of eliciting and calibrating priors in

such settings. In this talk, I will discuss how priors were established in a variety of different

Bayesian clinical trials in which I have been involved. As time permits, topics will include

Dirichlet priors for discrete 2x2 (response, toxicity) outcomes, the notion of prior effective

sample size, logistic regression and generalized logistic regression models for toxicity as a

function of either the dose of a single agent or the doses of two agents in phase I trials, using

penalized least squares to establish a prior for bivariate (efficacy, toxicity) probabilities as a

function of dose, eliciting a hyperprior for a hierarchical model in a phase II activity trial, and

eliciting priors for event-free survival time and toxicity as functions of patient covariates in a

pediatric brain tumor trial.
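The notion of prior effective sample size mentioned above has a simple form in the conjugate Beta-binomial case, sketched below. This is the conventional heuristic in which a Beta(a, b) prior carries a + b "prior observations"; it is not necessarily the definition used in the talk, and the function names and numbers are hypothetical.

```python
def beta_prior_ess(a, b):
    """Effective sample size of a Beta(a, b) prior under the usual heuristic."""
    return a + b

def beta_posterior_mean(a, b, responses, n):
    """Posterior mean of a response probability after observing responses out of n."""
    return (a + responses) / (a + b + n)

# Hypothetical prior and data, chosen only for illustration.
a, b = 3.0, 7.0          # prior mean 0.3
responses, n = 6, 10     # observed response rate 0.6

ess = beta_prior_ess(a, b)
prior_mean = a / (a + b)
post_mean = beta_posterior_mean(a, b, responses, n)

# The posterior mean is a weighted average of prior mean and observed rate,
# with the prior weighted by ess / (ess + n).
weight_on_prior = ess / (ess + n)
print(ess, prior_mean, post_mean, weight_on_prior)
```

With an effective sample size equal to the number of patients observed, the prior and the data contribute equally, which illustrates how strongly a prior can drive early adaptive decisions in a small trial.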


Paolo Vicini

Pfizer


“Identifiability and Prior Regularization: A PK-PD Modeling View”

Biomedical modeling practitioners have always been aware of the need to

characterize their models’ identifiability properties. This is because in

many instances the models’ structure and parameters are indirectly estimated

from measurements gathered in one or more accessible compartments (e.g. blood

plasma). The remainder (inaccessible portion, e.g. peripheral organs) of the

model needs to be reconstructed or inferred from such estimation procedures.

This presentation will address various concepts relating to identifiability,

including clarification of the diverse terms used in the literature.

Procedures to avoid common pitfalls when choosing appropriate

parameterizations for PK-PD models will be described. These ideas will be

also developed in the context of sparse data population analysis with mixed

effects models. Examples from the published literature will be used to

reinforce these concepts.


Appendix G – Workshop Evaluations

G.1 Overview of Workshop Evaluation

At each workshop, the participants are asked to complete the SAMSI Workshop

Evaluation, which asks the participants to rate the Workshop in terms of scientific quality, staff

helpfulness, meeting facilities, lodging, and local transportation and then asks a series of

questions. A sample evaluation is included for review.

The following results are summaries of the evaluations completed after a total of eleven

Workshops during the year broken into two categories: Program Workshops and Education and

Outreach (E&O) Workshops. The Program Workshops include the nine workshops held during

the 2009-2010 fiscal year. The E&O Workshops include the two undergraduate workshops.

The evaluation form is as follows:

SAMSI Evaluation

Your feedback on this workshop is requested by SAMSI’s funding agencies, who view it as

important for assessing and improving our performance. Your feedback is also gratefully

appreciated by SAMSI’s directors, because it will enable us to immediately improve SAMSI

activities. Please fill out this form and hand it to a SAMSI Staff Member, or return it by mail.

1. Personal Information: We are required by our funding agencies to obtain information –

in a standard format – about all participants in SAMSI activities. If you have not already done so,

please go to http://legacy.samsi.info/PartInfo/200910/participantinformationform09101.html to

provide this information. Note that if you have participated in a SAMSI activity since last July 1

and completed this webform, you need not do so again, unless your personal information has

changed.

2. General Ratings:                Poor   Fair   Good   Very Good   Excellent

   a. Scientific Quality             1      2      3        4           5
   b. Staff Helpfulness              1      2      3        4           5
   c. Meeting Room/AV Facilities     1      2      3        4           5
   d. Lodging                        1      2      3        4           5
   e. Local Transportation           1      2      3        4           5

2a. What were the positive aspects of the organization and running of this workshop?

2b. What parts of the organization and running need improvement?

3. Please comment on the Scientific Quality:


4. Additional comments on any other aspects of the workshop

5. An important goal of SAMSI is to create synergies between disciplines. How well

did this workshop further this goal?

6. How did you learn of this workshop?

7. Please suggest ideas / contacts for future SAMSI activities

G.2 Evaluation of Scientific Content

Greater than 90% of the respondents rated the scientific content as Very Good or

Excellent for the Program Workshops. The outcomes for the E&O Workshops were similar.

G.2.1 Program Workshops (9 events)

2009 Program on Psychometrics

Summer School on Spatial Statistics

Stochastic Dynamics Opening Workshop & Tutorials

Space-time Analysis for Environmental Mapping, Epidemiology & Climate Change Opening Workshop & Tutorials

Stochastic Dynamics: Self-Organization & Multi-Scale Mathematical Modeling of Active Biological Systems

Sequential Monte Carlo Methods Transition Workshop

Climate Change Workshop

Statistical Aspects of Environmental Risk

Molecular Motors, Neuron Models & Epidemics on Networks

G.2.2 Education and Outreach Workshops

UG: Two Day Undergraduate Workshop October 2009

UG: Two Day Undergraduate Workshop February 2010


[Figure: Summary of Science at all Workshops, 2009-2010. Percentage of responses rated excellent, very good, good, fair, or poor.]

[Figure: Evaluations of Staff at all Workshops. Percentage of responses rated excellent, very good, good, fair, or poor.]


[Figure: Evaluations of Facilities at all Workshops. Percentage of responses rated excellent, very good, good, fair, or poor.]

[Figure: Evaluations of Lodgings at all Workshops. Percentage of responses rated excellent, very good, good, fair, or poor.]


[Figure: Evaluations of Transportation at all Workshops. Percentage of responses rated excellent, very good, good, fair, or poor.]