Using Reliable Network Sampling to Render Contact Networks Unavailable to Pathogens


Rick Grannis, Ph.D., UCLA Department of Sociology

1. Introduction

Population-level pathogen dynamics and epidemic spread depend as fundamentally on the contact network’s configuration as they do on the pathogen’s transmissibility or on the health characteristics of individual population members. Because most health care responders do not know how the host population’s contact network is configured, however, they assume it to be random or some form of lattice. If, instead, it proves to have a highly skewed distribution, as is often the case, then immunizing those with the greatest network-wide, multi-step exposure and retransmission probabilities can quickly protect the entire population. A potential host population for a pathogenic contagion has “herd immunity” if we can protect most members by immunizing only a small fraction of them. Ignoring such a protocol when it is readily available can waste time and effort and endanger countless individuals.

Accurately modeling the contact network’s structural configuration, however, would seem to require complete data about the network, data which may be no easier to gather than it would be to inoculate the entire network. We propose to overcome this limitation by adapting technology we are developing: a combination of innovative network sampling techniques (respondent-driven sampling, RDS) fully integrated into handheld devices, which allows lay users to collect network data rapidly and easily. RDS is a probability-based methodology, well grounded in sampling theory, which avoids the estimation biases and confounds typically associated with chain-referral samples by not attempting to estimate population parameters directly from the sample but by analyzing the population specifically as a network.

By integrating RDS into a user-friendly handheld application, we achieve a means to dramatically reduce the enormous non-response biases that plague most surveys: the devices allow surveying to proceed more rapidly, make face-to-face surveying easier, and help in many other ways. As we collect data using the handheld devices, the system will extrapolate from the sampled data to theoretically model the contact network. We will generate dynamic, interactive visualization tools to map disease patterns over time, to facilitate user understanding of disease progress, to visually predict future disease progression, and to allow real-time analytic responses. Based on our emerging understanding of the host population’s network geometry, we will identify key individuals and locations for intervention to maximally curtail the outbreak and will estimate how various interventions will differentially impact this networked population and thus the projected spread of the pathogen.

2. The Success of Pathogenic Contagions Depends on the Configuration of the Host Population’s Contact Network

The effective spread of contagious diseases across individual-to-individual contact networks is determined by three factors: the personal health characteristics of individual members of the host population, which render them more or less susceptible or immune to the disease; the mode of transmission (e.g., diffuse, such as aerosol, versus direct, such as sexual transmission); and the social structure of the host population. The social structure of the host population is a network consisting of individuals and their interconnections, with each interconnection representing a probability of one individual exposing another to the disease (as a function of the mode of transmission). Thus, an individual’s probability both of being exposed to a disease and of subsequently exposing others (i.e., the probability of that individual transmitting the disease through the population) is determined by the patterning of their interactions, which serves as the substrate for disease transmission.

The underlying social structure across which the pathogen spreads affects more than the susceptibility of individuals to infection, since each individual’s susceptibility is a function of both their personal health characteristics and their personal network of exposure-capable interactions. It also strongly affects the susceptibility of an entire population to pathogen invasion (Pastor-Satorras and Vespignani 2001) and epidemic spread (Newman 2002; Barthelemy et al. 2005; Meyers et al. 2005). Population-level pathogen dynamics depend as fundamentally on the contact network’s configuration as they do on the pathogen’s transmissibility or on the health characteristics of individual population members (Newman 2005; McCallum et al. 2001). The geometry of the network of epidemiologically active contacts renders some populations more susceptible, or more immune, than others.

3. The Problem: Assuming a Contact Network Configuration in the Absence of Data

Figure 1-A. A network composed of individuals connected by relatively randomly distributed interactions.

Figure 1-B. A network composed of individuals connected by interactions distributed as a lattice.

This problem is well known, but because most health care responders do not have access to information about the configuration of the host population’s contact network, they must assume a hypothetical network configuration to which they will respond. Typically, they treat the network (at least implicitly) as if it were either random or some form of lattice (Figure 1). Given no other information, these seem like reasonable assumptions. Random distribution is often a default for any probabilistic scenario, and a lattice would be a reasonable approximation if the interactions were highly constrained in geographic space. If these assumptions are correct, that is, if the population has a random (perhaps Poisson) or lattice configuration, then information about the large-scale network would offer little value, and vaccination protocols should focus their primary attention on immunizing the most individuals or should be based on individual health considerations.
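To see why the random and lattice assumptions make large-scale structure seem unimportant, the sketch below generates three hypothetical networks of about 1,000 people with roughly four contacts each: a random graph, a square lattice, and, for contrast, a skewed network grown by preferential attachment. The preferential-attachment model (Barabási-Albert style) is a common stand-in for skewed networks chosen for this illustration, not a model taken from this proposal, and all sizes and parameters are assumptions.

```python
import random

def random_graph_degrees(n, p, rng):
    """Degrees in a random graph G(n, p): every pair of people is linked
    independently with probability p, giving a roughly Poisson degree distribution."""
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                degree[i] += 1
                degree[j] += 1
    return degree

def lattice_degrees(side):
    """Degrees in a side x side grid where each person touches up to 4 neighbours."""
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return [sum(0 <= r + dr < side and 0 <= c + dc < side for dr, dc in steps)
            for r in range(side) for c in range(side)]

def preferential_attachment_degrees(n, m, rng):
    """Degrees in a network grown one person at a time; each newcomer links to m
    existing people chosen with probability proportional to their current degree."""
    degree = [0] * n
    endpoints = []  # every node appears here once per incident edge
    for i in range(m + 1):  # start from a small complete graph on m + 1 people
        for j in range(i + 1, m + 1):
            degree[i] += 1
            degree[j] += 1
            endpoints += [i, j]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))  # degree-proportional choice
        for t in targets:
            degree[new] += 1
            degree[t] += 1
            endpoints += [new, t]
    return degree

rng = random.Random(42)
results = {
    "random": random_graph_degrees(1000, 0.004, rng),
    "lattice": lattice_degrees(32),
    "skewed": preferential_attachment_degrees(1000, 2, rng),
}
for name, deg in results.items():
    print(f"{name:8s} mean degree {sum(deg) / len(deg):.2f}   max degree {max(deg)}")
```

All three networks average about four contacts per person, but only the skewed one contains people with far more contacts than average. In the random and lattice networks no one stands out, which is why, under those assumptions, knowing the large-scale structure adds little for targeting.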

If, however, these assumptions prove incorrect, that is, if the networks of potentially contagious contacts are distributed across the population in some other configuration, then efficient vaccination protocols may need to look beyond immunizing the greatest number of individuals. For example, if the host population’s contact network exhibits a highly skewed distribution, then immunizing those with the greatest network-wide, multi-step exposure and retransmission probabilities can quickly protect the entire population (Albert et al. 2000; Barthelemy et al. 2005).

In Figure 2, immunizing those forming the linear backbone of the network would isolate the epidemic. Note that this is not identical to immunizing those with the greatest individual exposure and retransmission probabilities, although the two may sometimes coincide. In Figure 2-A, the blue node has six contact interactions but can directly expose, or be exposed by, only those six. In contrast, the red node has only two contact interactions but serves as a gateway for the pathogen to spread among more than a hundred nodes.
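The difference between local and network-wide importance can be checked on a small example. The sketch below builds a hypothetical toy network loosely inspired by Figure 2-A; it is not the figure's data. Two clusters of 20 people, each modeled as a ring, are joined only through a "red" person with two contacts. A "blue" person has six contacts, five of whom have no other ties. As in the text, immunization is modeled by deleting a node, and the remaining connected components show who could still infect whom.

```python
from collections import deque

def add_edge(adj, u, v):
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def component_sizes(adj, removed=frozenset()):
    """Sizes of connected components once the `removed` (immunized) nodes are deleted."""
    seen = set(removed)
    sizes = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first search over one component
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        sizes.append(size)
    return sorted(sizes, reverse=True)

# Hypothetical toy network, loosely inspired by Figure 2-A (not the figure's data)
adj = {}
for i in range(20):
    add_edge(adj, i, (i + 1) % 20)            # cluster A: a ring of 20 people
    add_edge(adj, 20 + i, 20 + (i + 1) % 20)  # cluster B: another ring of 20
add_edge(adj, "red", 0)    # the "red" gateway touches only two people,
add_edge(adj, "red", 20)   # one in each cluster
add_edge(adj, "blue", 10)  # the "blue" hub touches one person in cluster A
for k in range(5):         # plus five people with no other contacts
    add_edge(adj, "blue", f"leaf{k}")

for node in ("blue", "red"):
    print(node, "degree", len(adj[node]), "-> components", component_sizes(adj, {node}))
```

This prints `blue degree 6 -> components [41, 1, 1, 1, 1, 1]` and `red degree 2 -> components [26, 20]`. Removing the best-connected person frees only their five otherwise isolated contacts, while removing the two-contact gateway splits the network in half.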

4. Forecasting “Herd Immunity” to Save Time, Effort, and Lives

Figure 2-A. A network composed of individuals connected by a highly skewed distribution of contact interactions.

Figure 2-B. A network composed of individuals connected by a highly skewed distribution of contact interactions, with eight central nodes (5%) immunized.

A potential host population for a pathogenic contagion has “herd immunity” if we can protect most population members by immunizing only a fraction of them (Anderson and May 1990, 1991). In Figure 2-B, eight individuals (about five percent of the host population) were immunized. Immunized individuals are effectively removed from the contact network, thereby breaking possible chains of transmission. Because they were a critical five percent, the host population’s contact network has effectively disappeared. Removing eight individuals transformed a single network into 55 disconnected sub-networks: the largest contains only 11 members, the average size is less than three, and 33 are isolated individuals. From an epidemiological perspective, this means that a typical person exposed to the pathogen can now, on average, threaten only about two others rather than the entire population. By immunizing only eight individuals, targeted immunization has transformed one network into many dramatically less susceptible ones.
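The fragmentation statistics quoted above (number of sub-networks, size of the largest, average size, number of isolated individuals) are simple to compute once immunized people are deleted from the contact network. Below is a minimal sketch on a hypothetical toy network, not the Figure 2 data. It has a backbone of eight people in a chain. Each backbone member also has three contacts with no other ties and one contact who belongs to a separate pair.

```python
from collections import deque

def fragmentation(adj, immunized):
    """Summarize the contact network left after deleting the immunized people."""
    seen = set(immunized)
    sizes = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first search over one sub-network
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        sizes.append(size)
    return {
        "sub_networks": len(sizes),
        "largest": max(sizes),
        "mean_size": sum(sizes) / len(sizes),
        "isolates": sizes.count(1),
    }

def link(adj, u, v):
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

# Hypothetical toy network (not the Figure 2 data): an 8-person backbone chain
adj = {}
backbone = [f"b{i}" for i in range(8)]
for left, right in zip(backbone, backbone[1:]):
    link(adj, left, right)
for person in backbone:
    for k in range(3):
        link(adj, person, f"{person}-leaf{k}")  # contacts with no other ties
    link(adj, person, f"{person}-pair0")        # a contact who knows one other person
    link(adj, f"{person}-pair0", f"{person}-pair1")

print(len(adj), "people")
print("before:", fragmentation(adj, set()))
print("after: ", fragmentation(adj, set(backbone)))
```

In this 48-person toy, immunizing the 8 backbone members leaves 32 sub-networks. The largest has 2 people, the average size is 1.25, and 24 people are isolated.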

In the original host network, 63% of individuals would, if immunized, have protected no one but themselves (i.e., immunizing them would have removed only themselves from the contact network). Consequently, if we had immunized eight individuals at random, there would be a two-thirds chance (66% probability) that no more than three of them would protect anyone other than themselves, and a 97% probability that no more than five would. Furthermore, the odds are about even (46% probability) that we would not immunize a single one of the individuals forming the “backbone” of the contact network. This means that there is only about a 50-50 chance that randomly vaccinating eight individuals would break the network into even two parts, much less 55 pieces, and a very high probability that few people other than the eight vaccinated would be protected.
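The random-immunization figures above can be reproduced approximately with a binomial model. This sketch assumes that each of the eight randomly chosen individuals independently has a 37% chance (100% minus 63%) of being someone whose immunization protects others. The real population is finite (roughly 160 people, if eight is about five percent), so an exact calculation would use the hypergeometric distribution and the network's actual counts, which are not reported here.

```python
from math import comb

def prob_at_most(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

p_protective = 1 - 0.63  # chance a random pick protects someone besides themselves
n_immunized = 8

at_most_3 = prob_at_most(3, n_immunized, p_protective)
at_most_5 = prob_at_most(5, n_immunized, p_protective)
print(f"P(no more than 3 protective picks) = {at_most_3:.3f}")  # about 0.663
print(f"P(no more than 5 protective picks) = {at_most_5:.3f}")  # about 0.966
```

These values, about 0.66 and 0.97, are consistent with the 66% and 97% quoted in the text.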

Vaccination programs that target individuals randomly or on the basis of individual health considerations, but that do not attempt to achieve “herd immunity,” especially when it is easily achievable, waste time and effort and endanger countless individuals. Before undertaking any immunization protocol, it is essential to properly understand the structure of the social network through which the epidemic may spread, including its higher-order structural properties and not merely its degree distribution, as this has profoundly important implications for the spread of the pathogen. Ideally, immunization protocols would be as much a function of the structure of the contact network as they are of the mode of transmission.
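Here is a minimal illustration of why the degree distribution alone is not enough. The two hypothetical six-person networks below have identical degree sequences, with everyone having exactly two contacts. Yet one is a single ring and the other is two separate triangles. An outbreak starting anywhere can reach everyone in the first network but at most three people in the second.

```python
from collections import deque

def ring(people):
    """Contact network in which each person touches the next, wrapping around."""
    adj = {p: set() for p in people}
    for a, b in zip(people, people[1:] + people[:1]):
        adj[a].add(b)
        adj[b].add(a)
    return adj

def largest_component(adj):
    """Size of the largest group of people connected by chains of contact."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nbr in adj[node] - seen:
                seen.add(nbr)
                queue.append(nbr)
        best = max(best, size)
    return best

one_ring = ring([0, 1, 2, 3, 4, 5])
two_triangles = {**ring([0, 1, 2]), **ring([3, 4, 5])}

for name, adj in (("one ring", one_ring), ("two triangles", two_triangles)):
    degrees = sorted(len(contacts) for contacts in adj.values())
    print(f"{name}: degrees {degrees}, largest possible outbreak {largest_component(adj)}")
```

Both lines report the degree sequence `[2, 2, 2, 2, 2, 2]`, but the largest possible outbreak is 6 in the ring and 3 in the pair of triangles.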

5. The Solution: Rapidly Gathering Data about the Contact Network

It is well known that correctly modeling the contact network is critically important, since a contagion’s success depends as much or more on the host population’s contact network as on the pathogen’s mode and effectiveness of individual-to-individual transmission. In practice, however, this knowledge is rarely useful, because accurately modeling the contact network’s structural configuration would seem to require complete data about the network, data which may be no easier to gather than it would be to inoculate the entire network.

We overcome this limitation by adapting technology we are developing: a combination of innovative network sampling techniques, specifically respondent-driven sampling, fully integrated into handheld devices that allow lay users to collect data rapidly and easily.

6. Respondent-Driven Sampling (RDS) as an Efficient Means of Tracing a Contact Network

Respondent-driven sampling (RDS), like all network sampling methodologies, collects waves of data by tracing the social network of the underlying population, thereby mapping the very contact network along which the pathogenic contagion must spread. In RDS, an initial sample is generated. Members of this first wave nominate those they are connected to in the social network, and a sample of these nominees becomes the second wave. The second wave in turn nominates those its members are connected to, a sample of which becomes the third wave, and so on. The sample moves wave by wave along the social network, tracing it.
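The wave-by-wave tracing described above can be sketched as a simulation on a hypothetical contact network. Each respondent may recruit at most three contacts, mirroring the limited number of recruitment coupons RDS studies typically issue. That cap, the network, and the function name `rds_waves` are illustrative assumptions, not details of the proposed system. Each person can be recruited only once, and every recruitment traces one tie of the network.

```python
import random

def rds_waves(adj, seeds, coupons=3, max_waves=6, rng=None):
    """Trace a contact network wave by wave in the manner of respondent-driven sampling.

    Each respondent in the current wave recruits up to `coupons` of their
    not-yet-sampled contacts, chosen at random; those recruits form the next wave.
    Returns the waves (wave 0 = seeds) and the recruiter -> recruit ties traced.
    """
    rng = rng or random.Random()
    sampled = set(seeds)
    waves, ties = [list(seeds)], []
    for _ in range(max_waves):
        next_wave = []
        for recruiter in waves[-1]:
            candidates = sorted(v for v in adj[recruiter] if v not in sampled)
            for recruit in rng.sample(candidates, min(coupons, len(candidates))):
                sampled.add(recruit)
                next_wave.append(recruit)
                ties.append((recruiter, recruit))
        if not next_wave:  # the chain died out before max_waves
            break
        waves.append(next_wave)
    return waves, ties

# Hypothetical contact network: 200 people, each linked to a few random others
rng = random.Random(7)
adj = {v: set() for v in range(200)}
for v in range(200):
    for u in rng.sample(range(200), 2):
        if u != v:
            adj[v].add(u)
            adj[u].add(v)

waves, ties = rds_waves(adj, seeds=[0, 1], rng=rng)
print("wave sizes:", [len(w) for w in waves])
print("ties traced:", len(ties))
```

The traced `ties` are exactly the relational data the text describes: each one is an observed edge of the contact network, recorded along with who recruited whom.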

RDS is not a mere convenience sample; it is probability-based and well grounded in sampling theory. RDS avoids the estimation biases and confounds typically associated with chain-referral or network samples by not attempting to estimate population parameters directly from the sample but by analyzing the population specifically as a network. In this sense, RDS samples not individuals but the relational ties interconnecting them, the same ties along which the disease will attempt to spread. The parameters of the population social network are estimated from the statistics of the sampled social network, and only then are the parameters of the general population estimated from those of the population social network. Because RDS recruitment can be modeled as a regular Markov process, as the sample expands wave by wave its composition becomes independent of its initial starting points and approaches the population equilibrium at a geometric rate (usually within only 4 to 6 recruitment waves). Because it typically yields much higher response rates, RDS is often preferable to random sampling, except for clearly documented, highly cooperative populations.
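The convergence claim can be illustrated with a two-group recruitment model. The sketch treats the group membership of successive recruits as a Markov chain with a hypothetical transition matrix `T`, where `T[i][j]` is the probability that a member of group i recruits a member of group j. Whatever the seeds' composition, the expected share of group A in each wave approaches the chain's equilibrium, which is 0.4 here. The remaining gap shrinks by a constant factor each wave: 0.5 here, the matrix's second eigenvalue. This is the geometric rate referred to above. Stronger homophily, meaning larger diagonal entries, would slow convergence.

```python
def wave_shares(transition, start, waves):
    """Expected group shares in waves 0..waves of a Markov recruitment chain."""
    shares = list(start)
    history = [shares]
    k = len(shares)
    for _ in range(waves):
        shares = [sum(shares[i] * transition[i][j] for i in range(k)) for j in range(k)]
        history.append(shares)
    return history

# Hypothetical recruitment matrix for groups A and B (rows: recruiter, cols: recruit)
T = [[0.7, 0.3],
     [0.2, 0.8]]

for label, seeds in (("seeds all A", [1.0, 0.0]), ("seeds all B", [0.0, 1.0])):
    history = wave_shares(T, seeds, waves=6)
    print(label, [round(h[0], 3) for h in history])
```

Starting from all-A seeds, the share of A runs 1.0, 0.7, 0.55, and so on. Starting from all-B seeds, it runs 0.0, 0.2, 0.3, and so on. After six waves, both sequences are within about 0.01 of 0.4.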

Figure 3 (from Salganik and Heckathorn, 2004)

RDS compared with traditional sampling and estimation methods. By not attempting to estimate directly to the population from the sample, RDS avoids many problems of estimation from a chain-referral sample.
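The two-step logic (network parameters first, population parameters second) can be made concrete with a two-group example in the spirit of Salganik and Heckathorn (2004). Assuming ties are reciprocal, the number of A-to-B ties must equal the number of B-to-A ties, so N_A * D_A * S_AB = N_B * D_B * S_BA. Here S_AB is the share of A members' recruits who belong to B, and D_A is the average network size of group A. Solving for the share of A gives the estimate computed below. The recruitment counts and network sizes are invented for illustration. The published estimator also corrects its estimates of the average network sizes, because sampling favors well-connected people; this sketch omits that correction and takes `mean_degree` as given.

```python
def rds_share_of_a(recruitments, mean_degree):
    """Two-group RDS estimate of the population share of group "A".

    recruitments: (recruiter_group, recruit_group) pairs observed in the sample
    mean_degree:  {"A": D_A, "B": D_B}, average network size of each group
    Step 1 estimates network parameters (the cross-group recruitment shares);
    step 2 turns them into a population share via tie reciprocity:
        N_A * D_A * S_AB = N_B * D_B * S_BA
    """
    by_a = [r for s, r in recruitments if s == "A"]
    by_b = [r for s, r in recruitments if s == "B"]
    s_ab = by_a.count("B") / len(by_a)  # share of A's recruits who are B
    s_ba = by_b.count("A") / len(by_b)  # share of B's recruits who are A
    d_a, d_b = mean_degree["A"], mean_degree["B"]
    return (s_ba * d_b) / (s_ab * d_a + s_ba * d_b)

# Hypothetical sample: 100 recruitments; group A members report twice as many contacts
recruitments = ([("A", "A")] * 30 + [("A", "B")] * 20 +
                [("B", "B")] * 40 + [("B", "A")] * 10)
naive = sum(r == "A" for _, r in recruitments) / len(recruitments)
estimate = rds_share_of_a(recruitments, {"A": 10, "B": 5})
print(f"share of A among recruits: {naive:.2f}; RDS estimate: {estimate:.2f}")
```

This prints `share of A among recruits: 0.40; RDS estimate: 0.20`. Group A members have twice as many contacts and are therefore easier to reach, so the naive sample share overstates their prevalence.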

7. The Proven Advantages of Using RDS in Handheld Devices

By integrating RDS into a user-friendly handheld application, we have developed a means to dramatically reduce the enormous non-response biases that plague most surveys. This integration has allowed us to survey rapidly (e.g., questions can be asked and answers recorded in a fraction of the time required by traditional surveys). The devices also make face-to-face interviews easier because they are portable and do not require handling intimidating stacks of interview material. The handheld application makes data management much less burdensome and time-consuming: subjects give their responses, the surveyor enters them, and the data are recorded directly into the electronic database during the interview, leaving fewer opportunities for transcription errors. Furthermore, in multilingual environments, it is certainly preferable to have field researchers familiar with respondents’ language. When this is not possible (as is often the case), a translated questionnaire is available as long as the respondent can use the handheld device. Thus, entire populations that might previously have been excluded by linguistic barriers are effectively re-included in the sampled population. Additionally, this methodology has proven to have many other advantages, including being simple for research assistants to learn and lending the research state-of-the-art credibility.

8. Forecasting the Dynamic Impact of Interventions to Render the Contact Network Unavailable to Pathogens

We will adapt our previous implementation of RDS on a handheld device to gather information relevant to individual-level pathogen transmission. As we collect data using the handheld devices, the system will not only guide further collection efforts but will also extrapolate from the sampled data to theoretically model the network.

The underlying contact network is the substrate that guides and constrains pathogen dynamics (McCallum et al. 2001; Meyers et al. 2005), and its topology constantly changes through both natural vaccination by the disease and artificial vaccination by health care workers (Anderson and May 1990; Newman 2005; Pourbohloul et al. 2005). We will therefore develop dynamic, interactive visualization tools to map disease patterns over time, to facilitate user understanding of disease progress, to visually predict future disease progression, and to allow a real-time analytic response. Thus, the user will be able to dynamically view the past, present, and predicted distribution of the disease.
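The way an outbreak itself reshapes the network can be sketched with a simple discrete-time SIR simulation on a contact network. The sketch assumes a per-contact transmission probability (`transmissibility`) and a one-step infectious period. People who recover are treated as naturally immunized, and people in `immunized` as artificially vaccinated. Neither group can be infected again, so the network available to the pathogen shrinks as the epidemic proceeds. The returned curve of (susceptible, infectious, recovered) counts per step is the kind of time series a visualization tool could plot. The network and parameter values are hypothetical.

```python
import random

def simulate_sir(adj, seed, transmissibility, immunized=frozenset(), rng=None):
    """Discrete-time SIR outbreak on a contact network.

    At each step every infectious person infects each susceptible contact
    independently with probability `transmissibility`, then recovers.
    Returns one (susceptible, infectious, recovered) tuple per step.
    """
    rng = rng or random.Random()
    immune = set(immunized)
    infectious = {seed} - immune
    susceptible = set(adj) - immune - infectious
    recovered = set()
    curve = [(len(susceptible), len(infectious), len(recovered))]
    while infectious:
        newly_infected = set()
        for person in sorted(infectious):
            for contact in sorted(adj[person]):
                if contact in susceptible and rng.random() < transmissibility:
                    newly_infected.add(contact)
        susceptible -= newly_infected
        recovered |= infectious  # recovery acts as natural immunization
        infectious = newly_infected
        curve.append((len(susceptible), len(infectious), len(recovered)))
    return curve

# Hypothetical network: two rings of 20 people joined only through person 40
adj = {v: set() for v in range(41)}

def link(u, v):
    adj[u].add(v)
    adj[v].add(u)

for i in range(20):
    link(i, (i + 1) % 20)
    link(20 + i, 20 + (i + 1) % 20)
link(40, 0)
link(40, 20)

rng = random.Random(1)
final_sizes = {}
for label, vaccinated in (("no intervention", set()), ("gateway immunized", {40})):
    final_sizes[label] = [simulate_sir(adj, 10, 0.9, vaccinated, rng)[-1][2]
                          for _ in range(500)]
    sizes = final_sizes[label]
    print(f"{label}: mean outbreak size {sum(sizes) / len(sizes):.1f}, largest {max(sizes)}")
```

With these made-up numbers, immunizing the single two-contact gateway caps every outbreak seeded in the first ring at 20 people. Without intervention, outbreaks regularly spill into the second ring.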

Finally, based on our emerging understanding of the host population’s network geometry, we will identify key individuals and locations for intervention to maximally curtail disease outbreak and will estimate how various interventions will differentially impact this networked population and thus the projected spread of the pathogen.
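One common way to operationalize "network-wide, multi-step exposure and retransmission" is shortest-path betweenness: how many shortest transmission routes between pairs of people pass through a given person. The proposal does not specify which measure it will use, so betweenness is an assumption here. The sketch implements Brandes' algorithm for unweighted networks and compares it with simple degree (number of contacts). The test network is a hypothetical toy: two 20-person rings joined through a connector, plus a six-contact hub whose other contacts have no further ties.

```python
from collections import deque

def betweenness(adj):
    """Shortest-path betweenness for an unweighted, undirected network
    (Brandes' algorithm; each unordered pair is counted twice)."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)  # dependency of s on each node
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

def largest_component(adj, removed):
    """Size of the largest connected group once `removed` people are deleted."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nbr in adj[node] - seen:
                seen.add(nbr)
                queue.append(nbr)
        best = max(best, size)
    return best

# Hypothetical toy network: rings 0-19 and 20-39 joined via person 40;
# person 41 is a hub linked to ring member 10 and to people 42-46 (no other ties)
adj = {v: set() for v in range(47)}

def link(u, v):
    adj[u].add(v)
    adj[v].add(u)

for i in range(20):
    link(i, (i + 1) % 20)
    link(20 + i, 20 + (i + 1) % 20)
link(40, 0)
link(40, 20)
for leaf in range(42, 47):
    link(41, leaf)
link(41, 10)

by_degree = max(adj, key=lambda v: len(adj[v]))
bc = betweenness(adj)
by_betweenness = max(adj, key=bc.get)
for label, person in (("most contacts", by_degree), ("highest betweenness", by_betweenness)):
    print(f"{label}: person {person}, largest outbreak after immunizing them: "
          f"{largest_component(adj, {person})}")
```

In this toy, the person with the most contacts (the hub) is not the one whose immunization matters most. Removing the hub leaves a 41-person network intact. Removing the top-betweenness person, who sits on the bridge between the two rings, splits the network roughly in half.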

REFERENCES

Albert, R., Jeong, H. and Barabasi, A. L. 2000. “Error and attack tolerance of complex networks.” Nature 406: 378–382.

Anderson, R. M. and May, R. M. 1990. “Immunization and herd-immunity.” Lancet 335: 641–645.

Anderson, R. M. and May, R. M. 1991. Infectious diseases of humans: dynamics and control. Oxford, U.K: Oxford University Press.

Barthelemy, M., Barrat, A., Pastor-Satorras, R. and Vespignani, A. 2005. “Dynamical patterns of epidemic outbreaks in complex heterogeneous networks.” Journal of Theoretical Biology 235: 275–288.

McCallum, H., Barlow, N. and Hone, J. 2001. “How should pathogen transmission be modelled?” Trends in Ecology and Evolution 16: 295–300.

Meyers, L. A., Pourbohloul, B., Newman, M. E. J., Skowronski, D. M. and Brunham, R. C. 2005. “Network theory and SARS: predicting outbreak diversity.” Journal of Theoretical Biology 232: 71–81.

Newman, M. E. J. 2002. “Spread of epidemic disease on networks.” Physical Review E 66: 016128.

Newman, M. E. J. 2005. “Threshold effects for two pathogens spreading on a network.” Physical Review Letters 95: 108701.

Pastor-Satorras, R. and Vespignani, A. 2001. “Epidemic spreading in scale-free networks.” Physical Review Letters 86: 3200–3203.

Pourbohloul, B., Meyers, L. A., Skowronski, D. M., Krajden, M., Patrick, D. M. and Brunham, R. C. 2005. “Modeling control strategies of respiratory pathogens.” Emerging Infectious Diseases 11: 1249–1256.

Salganik, M. J. and Heckathorn, D. D. 2004. “Sampling and estimation in hidden populations using respondent-driven sampling.” Sociological Methodology 34: 193–240.