Inequality in PM2.5 Exposure and Health burden attributable in China

Table of Contents

Data

We analyzed PM_2.5 air quality data, mortality rates, and population data for 372 key cities across 34 provinces in China, covering the period from 2000 to 2019. All datasets, including population, PM_2.5 concentrations, and GDP per capita, were processed in raster format with a spatial resolution of 1 km. The China annual PM_2.5 concentration dataset was obtained from the National Tibetan Plateau Science Data Center (TPDC)³⁰, which was developed using artificial intelligence techniques to account for the spatiotemporal heterogeneity of air pollution. This dataset, with a spatial resolution of 0.01° (WGS84), has been validated against ground-based measurements to ensure reliability. Population data were sourced from the WorldPop Research Programmer³¹, which integrates census information and ancillary data such as nightlights, and has been rigorously validated for accuracy across latitudes. The WorldPop Global population dataset was used to quantify the number of residents in each 1 km × 1 km grid cell. Mortality data for Hong Kong, Macao, and Taiwan were obtained from the respective local Center for Health Protection or Census Bureau. The Detailed datasets are presented in Supplementary Table 1.

In this study, we utilized existing PM_2.5 and population datasets to estimate PM_2.5-attributable mortality using established epidemiological models. Subsequently, we applied decomposition analysis to assess the contribution of multiple factors to mortality disparities. The geographic distribution of PM_2.5 and PM_2.5-related mortality was then evaluated using the Gini index to quantify spatial inequality across regions and cities. This integrated approach provides a comprehensive framework for investigating the spatial patterns and drivers of air pollution-related health impacts in China.

Population-weighted PM_2.5 concentrations

Following the World Health Organization (WHO) guidelines, this study utilized population-weighted PM_2.5 concentrations to conduct a comprehensive analysis across multiple administrative levels, including national, provincial, municipal, and county^32,33. By assigning greater weights to regions with higher population densities, this approach more accurately reflects the exposure levels of PM_2.5 experienced by the populace. This measure combines population distribution in space with PM_2.5 concentration differences across geographical regions, providing a more accurate depiction of real-world exposure. The calculation of the population-adjusted PM_2.5 concentration is formulated as follows:

$$\,P{M}_{2.5}=\frac{{\sum }_{i=1}^{n}\left({P}_{i}\times {C}_{i}\right)}{{\sum }_{i=1}^{n}{P}_{i}}$$

(1)

Where ${P}_{i}$ and ${C}_{i}$ are the population and PM_2.5 concentration in grid pixels $i$. n is the total number of grid pixels in the analytical area.

Disease burden assessment

In this study, an upgraded Global Exposure Mortality Model (GEMM) was employed to assess the health impact linked to long-term PM_2.5 exposure. This refined GEMM incorporated mortality data for five diseases. The model’s computation is based on the following formula^17,30,34:

$${M}_{i,j,k}=Po{p}_{i}\times P{S}_{j}\times {B}_{j,k}\times \frac{(R{R}_{i,j,k}-1)}{R{R}_{i,j,k}}$$

(2)

$$R{R}_{i,j,k}=\left\{\begin{array}{l}exp\left\{\frac{{\theta }_{j,k}\log \left(\frac{{C}_{i}-{C}_{0}}{{\alpha }_{j,k}}+1\right)}{1+\exp \left(-\frac{{C}_{i}-{C}_{0}-{\mu }_{j,k}}{{v}_{j,k}}\right)}\right\},\,\,\,\,\,\,\,if\,{C}_{0} < {C}_{i}\\ 1,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,if\,{C}_{0}\ge {C}_{i}\end{array}\right.$$

(3)

${\mathrm{RR}}_{i,j,k}$ denotes the relative risk; ${C}_{i}$ is the value of PM_2.5 concentration (μg m⁻³) in the grid cell; ${C}_{0}$ denotes the theoretical minimum risk concentration (2.4 μg m⁻³); $\theta$, $\alpha$, $\mu$, and $v$ are the fitting parameters of the PM_2.5 exposure–response function^35,36,37. Considering the uncertainty of the relative risk ${\mathrm{RR}}_{i,j,k}$ in the model, 95% confidence intervals for ${\mathrm{RR}}_{i,j,k}$ were calculated using standard errors in the GEMM (global exposure mortality model), where SE (${\theta }_{j,k}$) in Eq. denotes the standard deviation of ${\theta }_{j,k}$.

In the equation, as the PM_2.5 exposure level increases, the RR value shows an upward trend. The quantitative correlation between these factors is discernible through the exposure-response function. In this study, the exposure-response function was adapted from GBD’s published Bayesian, regularization, and trim (MR-BRT) model. This method provides a comprehensive evaluation of PM_2.5-related health risks, consistent with the advanced statistical approaches used by GBD. The deaths attributable to PM_2.5 pollution (DAPP) was calculated primarily using the MR-BRT model median as the main result.

Building on the established link between PM_2.5 exposure and risk, we can derive additional metrics to assess its health impact. Particularly, we are able to compute additional metrics such as the mortality rate (deaths per 10⁵ population) and the mortality fraction resulting from PM_2.5 exposure:

$$\mathop{\sum }\limits_{i,j=1}^{N}M{R}_{i,j}=\frac{{\sum }_{i,j=1}^{N}{M}_{i,j}}{{\sum }_{i=1}^{N}{P}_{i}}\times {10}^{5}$$

(4)

$$\mathop{\sum }\limits_{k,j=1}^{N}M{F}_{k,j}=\frac{{\sum }_{k,j=1}^{N}{M}_{k,j}}{{\sum }_{k=1}^{N}{\mathrm{AM}}_{k}}\times {10}^{2}$$

(5)

where, ${\mathrm{MR}}_{i,j}$ represents the mortality rate of disease $j$ in grid $i$, and ${M}_{i,j}$ is the mortality attributable to PM_2.5 exposure for disease $j$ in grid $i$; and ${P}_{i}$ denotes the exposed population in grid $i$. ${\mathrm{MF}}_{k,j}$ represents the mortality fraction of disease $j$ in province $k$; ${\mathrm{AM}}_{k}$ represents the all-cause mortality in province $k$.

Decomposition of health burden drivers

To systematically quantify the contribution of key drivers to changes in PM_2.5-related death, we adopted a decomposition approach inspired by the Global Burden of Disease (GBD) study³⁸. This methodology allows for the attribution of changes in PM_2.5-attributable deaths to four major factors: population size, age structure, baseline mortality, and PM_2.5 concentration. By sequentially adjusting each factor while holding others constant, we can isolate the specific impact of each driver on mortality trends.

Given that the order in which input variables are adjusted can influence the estimated contributions, we performed sensitivity analyses by calculating all possible permutations of the four factors. The final contribution values for each driver were derived by averaging across all 24 possible input sequences, thereby minimizing sequence-dependent bias and enhancing the robustness of our estimates.

The contribution of each driver to the change in PM_2.5-related death was calculated using the following key equations:

$${C}_{Pop}=\left({M}_{Pop}-{M}_{t0}\right)\div{M}_{t0}\times 100 \%$$

(6)

$${C}_{PS}=\left({M}_{PS}-{M}_{Pop}\right)\div{M}_{t0}\times 100 \%$$

(7)

$${C}_{B}=\left({M}_{B}-{M}_{PS}\right)\div{M}_{t0}\times 100 \%$$

(8)

$${C}_{AP}=\left({M}_{AP}-{M}_{B}\right)\div{M}_{t0}\times 100 \%$$

(9)

Where ${C}_{\mathrm{Pop}}$, ${C}_{\mathrm{PS}}$, ${C}_{B}$, and ${C}_{\mathrm{AP}}$ represent the percentage contribution of changes in population size, age structure, baseline mortality, and PM_2.5 concentration, respectively, to the total change in PM_2.5-related death. ${M}_{t0}$ and ${M}_{\mathrm{AP}}$ denote the estimated number of PM_2.5-related deaths at the baseline and after all factors have been updated.

This decomposition framework provides a transparent and quantitative basis for understanding the relative importance of demographic, epidemiological, and environmental factors in shaping PM_2.5-related health outcomes.

Measuring and decomposing inequalities of PM_2.5 and PM_2.5-health-burdens

The Gini coefficient^39,40 is a well-established metric for quantifying inequality and is widely used in fields such as economic growth⁴¹, environmental justice⁴², income distribution⁴³, and resource allocation⁴⁴. In this study, we employ the Gini coefficient to assess disparities in both PM_2.5 exposure and PM_2.5-attributable mortality.

To clarify, we calculate the Gini coefficient in two distinct ways to capture different aspects of inequality. The first approach is person-based: all individuals are ranked by their PM_2.5 exposure (or mortality risk), and the Gini coefficient quantifies how unequally PM_2.5 exposure (or mortality) is distributed across the population. The formula is as follows:

$$Gini=\frac{{\sum }_{i=1}^{n}{\sum }_{j=1}^{n}|{E}_{i}-{E}_{j}|}{2{n}^{2}\bar{E}}$$

(10)

where ${E}_{i}$ and ${E}_{j}$ represent the PM_2.5 exposure (or attributable mortality) for individuals $i$ and $j$, $n$ is the total population size, and $\bar{E}$ is the mean PM_2.5 exposure (or mortality) across all individuals.

The second approach is grid cell-based: all 1-km grid cells are ranked by their average PM_2.5 concentration (or mortality rate), and the Gini coefficient reflects the spatial inequality in PM_2.5 or mortality across geographic units. The formula is:

$$Gini=\frac{{\sum }_{i=1}^{n}{\sum }_{j=1}^{n}|{C}_{i}-{C}_{j}|}{2{n}^{2}\bar{C}}$$

(11)

Where ${C}_{i}$ and ${C}_{j}$ are the PM_2.5 concentration(mortality) of grid cell $i$ and $j$. $n$ is the total number of grid cells in the analytical unit, and $\bar{C}$ is the mean PM_2.5 concentration of all grid cells.

The person-based Gini index incorporates population weighting within each grid cell, thus reflecting disparities in exposure or health burden at the individual level. In contrast, the grid cell-based Gini index measures only the spatial heterogeneity of PM_2.5 or mortality across geographic units, without considering population distribution within each cell. Using both metrics allows us to disentangle the effects of population distribution from pure spatial inequality.

To further dissect the sources of inequality, we apply the Dagum decomposition method⁴⁵. This technique breaks down the overall Gini coefficient into three components: within-group inequality (${G}_{\text{within}}$), between-group inequality (${G}_{\text{between}}$), and an overlapping component (${G}_{\text{overlap}}$) that accounts for distributional overlap between groups. The decomposition is expressed as:

$$Gini=\mathop{\sum }\limits_{k}{v}_{k}^{2}{\lambda }_{k}{G}^{k}+\frac{1}{2}\mathop{\sum }\limits_{k}\mathop{\sum }\limits_{h}{v}_{k}{v}_{h}\left|{\lambda }_{k}-{\lambda }_{h}\right|+R={G}_{within}+{G}_{between}+{G}_{overlap}$$

(12)

Where ${v}_{k}$ is the proportion of group $k$‘s population relative to the total population, ${\lambda }_{k}$ is the ratio of group, and ${G}^{k}$ is the Gini coefficient of group $k$. ${G}_{\mathrm{within}}$ is measured as the weighted average of Gini coefficients within each subgroup. ${G}_{\mathrm{between}}$ reflects the impact of between different subgroups on the overall Gini coefficient. ${G}_{\mathrm{overlap}}$ accounts for the residual effects due to the overlap or interaction of income distributions between subgroups. This decomposition provides a deeper understanding of disparities that exist both between different population groups and within individual groups⁴⁶.

Specifically, this approach allows for the decomposition of overall inequality into within-group, between-group, and overlapping components at multiple spatial scales. In our study, the decomposition can be applied at various administrative levels, such as provinces, cities, or counties, depending on the research focus. This flexibility enables us to systematically analyze and compare spatial inequality in PM_2.5-related health burdens across different geographic units.

Mann–Kendall trend test

The Mann–Kendall (MK) trend test is a widely used nonparametric method for detecting monotonic trends in time series data. Unlike parametric approaches, the MK test does not require assumptions about the underlying data distribution, making it robust to small sample sizes and the presence of outliers. The test statistic S is calculated by evaluating all pairwise comparisons among data points in the series, where each comparison contributes positively, negatively, or neutrally depending on the direction of change.

To assess the significance of the observed trend, the standardized test statistic Z is computed, which under the null hypothesis of no trend follows a standard normal distribution. The p-value derived from Z is used to determine statistical significance, with P < 0.05 indicating a significant trend. In this study, a variance correction factor was applied to account for potential autocorrelation in the time series, thereby improving the reliability and robustness of the MK test results.

To evaluate the robustness of our mortality estimates, we conducted several sensitivity analyses (Supplementary Fig. 1). Specifically, we utilized alternative PM_2.5 concentration datasets to recalculate PM_2.5-attributable mortalities and compared the results across different data sources. For each dataset, we performed a spatial comparison of PM_2.5 concentrations at identical locations to assess consistency and potential biases. Furthermore, principal component analysis (PCA) was applied to the PM_2.5 data to investigate the similarity and usability of our primary dataset in relation to existing PM_2.5 datasets. The results of these analyses demonstrate high concordance between different sources and confirm the reliability of our data.

link

Inequality in PM2.5 Exposure and Health burden attributable in China

Data