This is a piece I did for the Statistics Views website back in June 2014. It’s a story about an environmental data science study on air quality and human health in Taipei, Taiwan. I hold a professional engineering license in environmental engineering. Through Data-Mania, I also offer environmental data science consulting and journalism services. Please contact me at Lillian@LillianPierson.com for more information.
When the Asian “Airpocalypse” Hits: How Biostatistics and Spatio-Temporal Modeling Can Be Used to Protect Human Health in Taipei
Smog, smoke, and dust are natural by-products of industrialization. If you’ve visited cities with high levels of smoke and dust in the air, you’ve likely experienced a tickle in your throat, a scratch in your nasal passages, or a burn in your eyes. “It’s just a little harmless dust or smoke,” we tell ourselves. “It’s just a minor annoyance. No big deal!” Or is it?
In fact, it is a big deal. Particulate matter (PM) is the stuff emitted by traffic on roadways, by controlled agricultural and forest burns, and by industrial operations. The U.S. Environmental Protection Agency warns that high PM levels in the air are responsible for “premature death in people with heart or lung disease, nonfatal heart attacks, irregular heartbeat, aggravated asthma, decreased lung function, and increased respiratory symptoms, such as irritation of the airways, coughing or difficulty breathing.” It seems that, like many other things we tend to dismiss as “no big deal,” there is cause for concern. Making matters worse, air quality in many major cities around the world continues to deteriorate.
Taipei, Taiwan is one such city. The Taiwanese government has been struggling with air quality issues for years, but has only recently admitted that PM concentrations have reached “unhealthy” levels in several parts of the country, including Taipei. The particles that comprise PM are classified as either “inhalable coarse particles” (2.5 µm < diameter < 10 µm) or “fine particles” (diameter ≤ 2.5 µm). Both types of PM can become deeply lodged in your lungs, but the smaller fine particles can also pass through the lung membranes and enter your bloodstream. Although the deleterious health effects of PM are well investigated, solid and consistent findings with respect to fine particles have yet to be established.
Dr. Lung-Chang Chien of the University of Texas’ Biostatistics Division has been using Bayesian spatio-temporal modeling to uncover the nonlinear concentration-response relationship between fine-particle concentrations and children’s respiratory clinic visits in Taipei, Taiwan. Through this approach he has established a clear nonlinear relationship between children’s health responses and different levels of fine particles in the air, showing that children’s respiratory systems are especially sensitive to both relatively low and relatively high concentrations of fine particles.
In an exclusive interview for Statistics Views, Dr. Chien gives an overview of his research, of how it can be used to improve human health conditions in Taiwan, and of his favorite statistical method for analyzing space-time data related to particulate matter air pollution.
1. Can you elaborate on the statistical methods and techniques that were most useful to you in establishing your findings? Did you try any alternative methods before deciding which method to use? Why did you choose to use Bayesian statistics to solve this problem?
I mainly used the structured additive regression (STAR) model to analyze the space-time data. It is a Bayesian modeling approach that uses either fully Bayesian inference with Markov chain Monte Carlo simulation or empirical Bayesian inference with the restricted maximum likelihood technique to capture the uncertainty of parameters under certain priors. The approach can be regarded as an extension of generalized additive models, with advanced functionality for handling linear or nonlinear relationships between predictors and outcome variables. This study needed to consider linear predictors, nonlinear smoothing functions, temporal autoregression, and spatial autocorrelation, all of which can be handled simultaneously by the STAR model.
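To make those components concrete, here is a generic sketch of a structured additive predictor in the STAR framework. The notation is illustrative, drawn from the general STAR literature rather than from the exact specification used in the study: for observation i with linear covariates x_i, smooth-term inputs z_ij, time index t_i, and district s_i,

```latex
\eta_i = \mathbf{x}_i^{\top}\boldsymbol{\beta}
       + \sum_{j=1}^{p} f_j(z_{ij})
       + f_{\mathrm{time}}(t_i)
       + f_{\mathrm{spat}}(s_i)
```

Here the f_j are nonlinear smoothing functions (typically penalized splines), f_time captures temporal structure such as autoregression, and f_spat is a spatial effect, usually a Markov random field defined over the district map; all components are estimated jointly under the fully Bayesian or empirical Bayesian machinery described above.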
Similar modeling approaches, such as the Besag model and the Knorr-Held model, could have been applied in this study, but the STAR model is more convenient to program and more efficient to compute because it has a dedicated software tool, BayesX, that can execute model-fitting in only a few lines of code. Since 2011, the STAR model can also be fitted in R using the R2BayesX package, which makes computation faster and facilitates higher-quality mapping through R’s graphics devices. Compared to other models, which require complex programming and time-consuming computation, I prefer the STAR model, and I strongly recommend it to other environmental health researchers.
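To give a sense of what “a few lines of code” looks like, here is a minimal, hypothetical sketch of a STAR-type fit with R2BayesX. The data set, variable names, and boundary map below are placeholders invented for illustration, not the study’s actual data or model specification, and argument details may vary across package versions.

```r
library(R2BayesX)

# Hypothetical data frame 'clinic_visits': daily district-level counts of
# children's respiratory clinic visits, PM2.5 readings, temperature, a day
# index, a weekday factor, and a district identifier. 'taipei_bnd' is a
# boundary-map object describing district adjacency.
model <- bayesx(
  visits ~ weekday +                             # linear/categorical predictor
    sx(pm25, bs = "ps") +                        # nonlinear concentration-response curve (P-spline)
    sx(temperature, bs = "ps") +                 # nonlinear confounder adjustment
    sx(day, bs = "ps") +                         # smooth temporal trend
    sx(district, bs = "mrf", map = taipei_bnd),  # Markov random field spatial effect
  family = "poisson",                            # counts of clinic visits
  method = "MCMC",                               # fully Bayesian inference ("REML" for empirical Bayes)
  data   = clinic_visits
)

summary(model)  # posterior summaries of parametric and smooth terms
```

A single call like this covers the linear, nonlinear, temporal, and spatial components described above.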
2. What key information did you harvest from your chosen statistical methods, and how do you hope that public health and environmental authorities in Taipei will use that information to improve their community?
In fact, this study grew out of a series of previous studies published by me and my co-author, Dr. Hwa-Lung Yu. We used the STAR model to investigate the geographic impact of Asian dust storms on children’s respiratory clinic visits in Taipei, Taiwan. We were happy to reach our goal of publishing three papers in high-impact-factor journals (Environmental Health Perspectives, PLoS One, and Environment International). Those studies reveal geographic disparities in children’s respiratory disease between rural and urban areas of Taipei, and an interesting spatial pattern of elevated risk one week after an Asian dust storm event.
Those studies were built on a high-quality data set that we obtained from Taiwan’s National Health Insurance program, which provides daily clinic-visit records for each district of Taipei. When we noticed that epidemiologic research on fine particulate matter had become prevalent in recent years, we started gathering PM2.5 data (data on particulate matter with a diameter of 2.5 microns or less) from air quality monitoring stations. We believed that our previous successes in Asian dust storm and human health research could be replicated in this study. That is the main reason we undertook a spatiotemporal analysis of this topic.
We hope that the findings of the study will raise awareness of children’s respiratory health issues: not only awareness of the health risks associated with extreme PM2.5 concentrations, but also awareness that even low concentrations of PM2.5 are detrimental enough to adversely affect children’s respiratory health.
Previous studies have confirmed that PM2.5 can penetrate deeper into the human respiratory system than PM10, and that children, whose respiratory organs are still developing, may suffer greater damage than adults. In our study, we found that grade school children had a higher likelihood of being adversely affected by exposure to PM2.5 than preschool children did. The government in Taiwan still has no school warning system for PM2.5 concentrations, because the level of PM2.5 that affects children’s health has not yet been clearly determined.
This study presents an initial threshold for PM2.5 concentrations. This threshold may serve as a reference for policy-makers as they design a preventative criterion for PM2.5 concentrations. Moreover, the spatial distribution estimated by this research can be developed into a GIS-based surveillance system to identify areas where children are likely to be more vulnerable to respiratory diseases. Public health and environmental authorities in Taipei can use our findings to implement interventions and preventive measures that protect children’s respiratory health, and to distribute medical resources to the most appropriate areas of the city.
3. What about spatial statistics? How did you first get interested in this type of statistical modeling, and what types of statistical methods have you found most useful in solving problems related to spatial and environmental variation?
When I was doing dissertation research at the University of North Carolina at Chapel Hill, I was involved in investigating the methodology of multi-city time-series studies of air pollution and human health, which were usually accomplished with a Bayesian two-stage hierarchical modeling approach. I found that this modeling approach is not easily applied to other types of geographic information. I was curious about how location-based spatial influence could be accounted for, because no spatial function is used in the two-stage approach. Thus, I started looking for alternative methods to satisfy my curiosity and address my concerns.
In 2008, I happened upon a website introducing the STAR model, which I had never heard of before. After spending several days studying methodological and practical publications related to the STAR model, I recognized that this modeling approach is better suited to non-coordinate geographic data, such as boundary data. The STAR model is built on the framework of a generalized additive (mixed) model, but it also adopts a spatial function, mainly Markov random fields, to deal with geographic data. I found that the spatial function of the STAR model can provide stronger scientific evidence by revealing the statistical significance attributable to each area’s location. Moreover, its map-based visualizations produce a spatial pattern map and a significance map that help us understand the spatial variation and significant patterns of health outcomes. In this way, the STAR model not only became the main part of my dissertation, but also became the mainstream of my subsequent research career.
4. What applications or languages did you use for your statistical and spatial modeling?
BayesX is the main software tool for analyzing space-time data with the STAR model. The software consists of a C++ computing kernel with a Java graphical user interface, and it contains built-in graphics facilities. BayesX has its own command language, unlike any other, but the syntax is simplified to make programming easy, so each model fit can be executed in just a few lines of code. BayesX is not well optimized, however, and crashes can occur, especially when importing large datasets or fitting overly complex model frameworks. Its graphics facilities are also inconvenient because they only support the .eps format.
Fortunately, in 2011 the STAR model research team released the R2BayesX package, which calls BayesX from within R via the usual model-specification language. The package translates R code into BayesX code, sends it to BayesX for model fitting, and stores all numerical results in R objects. Users can therefore easily generate and edit high-quality smoothing-function plots and maps in R. R2BayesX also improves the efficiency of BayesX, reducing the time required to import large datasets and fit models, and reducing the likelihood of crashes in the BayesX program.
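Continuing the hypothetical example sketched earlier (and noting that argument names can differ across R2BayesX versions), the estimated smooth terms and the district-level spatial effect can be inspected directly from the fitted R object:

```r
# Posterior summaries of all parametric and smooth terms
summary(model)

# The estimated nonlinear concentration-response curve for PM2.5
plot(model, term = "sx(pm25)")

# The estimated spatial (district-level) effect drawn as a map
plot(model, term = "sx(district)", map = taipei_bnd)
```

This kind of post-fit plotting is what replaces BayesX’s .eps-only graphics with R’s standard graphics devices.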
Sources
Hwa-Lung Yu and Lung-Chang Chien (2014). Short-term population-based and spatiotemporal nonlinear concentration-response associations between fine particulate matter and children’s respiratory clinic visits. Geophysical Research Abstracts, Vol. 16, EGU2014-3695, EGU General Assembly, Vienna, 2014.