Inequality in PM2.5 Exposure and Health burden attributable in China
Data
We analyzed PM2.5 air quality data, mortality rates, and population data for 372 key cities across 34 provinces in China, covering the period from 2000 to 2019. All datasets, including population, PM2.5 concentrations, and GDP per capita, were processed in raster format with a spatial resolution of 1 km. The China annual PM2.5 concentration dataset was obtained from the National Tibetan Plateau Science Data Center (TPDC)30, which was developed using artificial intelligence techniques to account for the spatiotemporal heterogeneity of air pollution. This dataset, with a spatial resolution of 0.01° (WGS84), has been validated against ground-based measurements to ensure reliability. Population data were sourced from the WorldPop Research Programmer31, which integrates census information and ancillary data such as nightlights, and has been rigorously validated for accuracy across latitudes. The WorldPop Global population dataset was used to quantify the number of residents in each 1 km × 1 km grid cell. Mortality data for Hong Kong, Macao, and Taiwan were obtained from the respective local Center for Health Protection or Census Bureau. The Detailed datasets are presented in Supplementary Table 1.
In this study, we utilized existing PM2.5 and population datasets to estimate PM2.5-attributable mortality using established epidemiological models. Subsequently, we applied decomposition analysis to assess the contribution of multiple factors to mortality disparities. The geographic distribution of PM2.5 and PM2.5-related mortality was then evaluated using the Gini index to quantify spatial inequality across regions and cities. This integrated approach provides a comprehensive framework for investigating the spatial patterns and drivers of air pollution-related health impacts in China.
Population-weighted PM2.5 concentrations
Following the World Health Organization (WHO) guidelines, this study utilized population-weighted PM2.5 concentrations to conduct a comprehensive analysis across multiple administrative levels, including national, provincial, municipal, and county32,33. By assigning greater weights to regions with higher population densities, this approach more accurately reflects the exposure levels of PM2.5 experienced by the populace. This measure combines population distribution in space with PM2.5 concentration differences across geographical regions, providing a more accurate depiction of real-world exposure. The calculation of the population-adjusted PM2.5 concentration is formulated as follows:
$$\,P{M}_{2.5}=\frac{{\sum }_{i=1}^{n}\left({P}_{i}\times {C}_{i}\right)}{{\sum }_{i=1}^{n}{P}_{i}}$$
(1)
Where \({P}_{i}\) and \({C}_{i}\) are the population and PM2.5 concentration in grid pixels \(i\). n is the total number of grid pixels in the analytical area.
Disease burden assessment
In this study, an upgraded Global Exposure Mortality Model (GEMM) was employed to assess the health impact linked to long-term PM2.5 exposure. This refined GEMM incorporated mortality data for five diseases. The model’s computation is based on the following formula17,30,34:
$${M}_{i,j,k}=Po{p}_{i}\times P{S}_{j}\times {B}_{j,k}\times \frac{(R{R}_{i,j,k}-1)}{R{R}_{i,j,k}}$$
(2)
$$R{R}_{i,j,k}=\left\{\begin{array}{l}exp\left\{\frac{{\theta }_{j,k}\log \left(\frac{{C}_{i}-{C}_{0}}{{\alpha }_{j,k}}+1\right)}{1+\exp \left(-\frac{{C}_{i}-{C}_{0}-{\mu }_{j,k}}{{v}_{j,k}}\right)}\right\},\,\,\,\,\,\,\,if\,{C}_{0} < {C}_{i}\\ 1,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,if\,{C}_{0}\ge {C}_{i}\end{array}\right.$$
(3)
\({\mathrm{RR}}_{i,j,k}\) denotes the relative risk; \({C}_{i}\) is the value of PM2.5 concentration (μg m−3) in the grid cell; \({C}_{0}\) denotes the theoretical minimum risk concentration (2.4 μg m−3); \(\theta\), \(\alpha\), \(\mu\), and \(v\) are the fitting parameters of the PM2.5 exposure–response function35,36,37. Considering the uncertainty of the relative risk \({\mathrm{RR}}_{i,j,k}\) in the model, 95% confidence intervals for \({\mathrm{RR}}_{i,j,k}\) were calculated using standard errors in the GEMM (global exposure mortality model), where SE (\({\theta }_{j,k}\)) in Eq. denotes the standard deviation of \({\theta }_{j,k}\).
In the equation, as the PM2.5 exposure level increases, the RR value shows an upward trend. The quantitative correlation between these factors is discernible through the exposure-response function. In this study, the exposure-response function was adapted from GBD’s published Bayesian, regularization, and trim (MR-BRT) model. This method provides a comprehensive evaluation of PM2.5-related health risks, consistent with the advanced statistical approaches used by GBD. The deaths attributable to PM2.5 pollution (DAPP) was calculated primarily using the MR-BRT model median as the main result.
Building on the established link between PM2.5 exposure and risk, we can derive additional metrics to assess its health impact. Particularly, we are able to compute additional metrics such as the mortality rate (deaths per 105 population) and the mortality fraction resulting from PM2.5 exposure:
$$\mathop{\sum }\limits_{i,j=1}^{N}M{R}_{i,j}=\frac{{\sum }_{i,j=1}^{N}{M}_{i,j}}{{\sum }_{i=1}^{N}{P}_{i}}\times {10}^{5}$$
(4)
$$\mathop{\sum }\limits_{k,j=1}^{N}M{F}_{k,j}=\frac{{\sum }_{k,j=1}^{N}{M}_{k,j}}{{\sum }_{k=1}^{N}{\mathrm{AM}}_{k}}\times {10}^{2}$$
(5)
where, \({\mathrm{MR}}_{i,j}\) represents the mortality rate of disease \(j\) in grid \(i\), and \({M}_{i,j}\) is the mortality attributable to PM2.5 exposure for disease \(j\) in grid \(i\); and \({P}_{i}\) denotes the exposed population in grid \(i\). \({\mathrm{MF}}_{k,j}\) represents the mortality fraction of disease \(j\) in province \(k\); \({\mathrm{AM}}_{k}\) represents the all-cause mortality in province \(k\).
Decomposition of health burden drivers
To systematically quantify the contribution of key drivers to changes in PM2.5-related death, we adopted a decomposition approach inspired by the Global Burden of Disease (GBD) study38. This methodology allows for the attribution of changes in PM2.5-attributable deaths to four major factors: population size, age structure, baseline mortality, and PM2.5 concentration. By sequentially adjusting each factor while holding others constant, we can isolate the specific impact of each driver on mortality trends.
Given that the order in which input variables are adjusted can influence the estimated contributions, we performed sensitivity analyses by calculating all possible permutations of the four factors. The final contribution values for each driver were derived by averaging across all 24 possible input sequences, thereby minimizing sequence-dependent bias and enhancing the robustness of our estimates.
The contribution of each driver to the change in PM2.5-related death was calculated using the following key equations:
$${C}_{Pop}=\left({M}_{Pop}-{M}_{t0}\right)\div{M}_{t0}\times 100 \%$$
(6)
$${C}_{PS}=\left({M}_{PS}-{M}_{Pop}\right)\div{M}_{t0}\times 100 \%$$
(7)
$${C}_{B}=\left({M}_{B}-{M}_{PS}\right)\div{M}_{t0}\times 100 \%$$
(8)
$${C}_{AP}=\left({M}_{AP}-{M}_{B}\right)\div{M}_{t0}\times 100 \%$$
(9)
Where \({C}_{\mathrm{Pop}}\), \({C}_{\mathrm{PS}}\), \({C}_{B}\), and \({C}_{\mathrm{AP}}\) represent the percentage contribution of changes in population size, age structure, baseline mortality, and PM2.5 concentration, respectively, to the total change in PM2.5-related death. \({M}_{t0}\) and \({M}_{\mathrm{AP}}\) denote the estimated number of PM2.5-related deaths at the baseline and after all factors have been updated.
This decomposition framework provides a transparent and quantitative basis for understanding the relative importance of demographic, epidemiological, and environmental factors in shaping PM2.5-related health outcomes.
Measuring and decomposing inequalities of PM2.5 and PM2.5-health-burdens
The Gini coefficient39,40 is a well-established metric for quantifying inequality and is widely used in fields such as economic growth41, environmental justice42, income distribution43, and resource allocation44. In this study, we employ the Gini coefficient to assess disparities in both PM2.5 exposure and PM2.5-attributable mortality.
To clarify, we calculate the Gini coefficient in two distinct ways to capture different aspects of inequality. The first approach is person-based: all individuals are ranked by their PM2.5 exposure (or mortality risk), and the Gini coefficient quantifies how unequally PM2.5 exposure (or mortality) is distributed across the population. The formula is as follows:
$$Gini=\frac{{\sum }_{i=1}^{n}{\sum }_{j=1}^{n}|{E}_{i}-{E}_{j}|}{2{n}^{2}\bar{E}}$$
(10)
where \({E}_{i}\) and \({E}_{j}\) represent the PM2.5 exposure (or attributable mortality) for individuals \(i\) and \(j\), \(n\) is the total population size, and \(\bar{E}\) is the mean PM2.5 exposure (or mortality) across all individuals.
The second approach is grid cell-based: all 1-km grid cells are ranked by their average PM2.5 concentration (or mortality rate), and the Gini coefficient reflects the spatial inequality in PM2.5 or mortality across geographic units. The formula is:
$$Gini=\frac{{\sum }_{i=1}^{n}{\sum }_{j=1}^{n}|{C}_{i}-{C}_{j}|}{2{n}^{2}\bar{C}}$$
(11)
Where \({C}_{i}\) and \({C}_{j}\) are the PM2.5 concentration(mortality) of grid cell \(i\) and \(j\). \(n\) is the total number of grid cells in the analytical unit, and \(\bar{C}\) is the mean PM2.5 concentration of all grid cells.
The person-based Gini index incorporates population weighting within each grid cell, thus reflecting disparities in exposure or health burden at the individual level. In contrast, the grid cell-based Gini index measures only the spatial heterogeneity of PM2.5 or mortality across geographic units, without considering population distribution within each cell. Using both metrics allows us to disentangle the effects of population distribution from pure spatial inequality.
To further dissect the sources of inequality, we apply the Dagum decomposition method45. This technique breaks down the overall Gini coefficient into three components: within-group inequality (\({G}_{\text{within}}\)), between-group inequality (\({G}_{\text{between}}\)), and an overlapping component (\({G}_{\text{overlap}}\)) that accounts for distributional overlap between groups. The decomposition is expressed as:
$$Gini=\mathop{\sum }\limits_{k}{v}_{k}^{2}{\lambda }_{k}{G}^{k}+\frac{1}{2}\mathop{\sum }\limits_{k}\mathop{\sum }\limits_{h}{v}_{k}{v}_{h}\left|{\lambda }_{k}-{\lambda }_{h}\right|+R={G}_{within}+{G}_{between}+{G}_{overlap}$$
(12)
Where \({v}_{k}\) is the proportion of group \(k\)‘s population relative to the total population, \({\lambda }_{k}\) is the ratio of group, and \({G}^{k}\) is the Gini coefficient of group \(k\). \({G}_{\mathrm{within}}\) is measured as the weighted average of Gini coefficients within each subgroup. \({G}_{\mathrm{between}}\) reflects the impact of between different subgroups on the overall Gini coefficient. \({G}_{\mathrm{overlap}}\) accounts for the residual effects due to the overlap or interaction of income distributions between subgroups. This decomposition provides a deeper understanding of disparities that exist both between different population groups and within individual groups46.
Specifically, this approach allows for the decomposition of overall inequality into within-group, between-group, and overlapping components at multiple spatial scales. In our study, the decomposition can be applied at various administrative levels, such as provinces, cities, or counties, depending on the research focus. This flexibility enables us to systematically analyze and compare spatial inequality in PM2.5-related health burdens across different geographic units.
Mann–Kendall trend test
The Mann–Kendall (MK) trend test is a widely used nonparametric method for detecting monotonic trends in time series data. Unlike parametric approaches, the MK test does not require assumptions about the underlying data distribution, making it robust to small sample sizes and the presence of outliers. The test statistic S is calculated by evaluating all pairwise comparisons among data points in the series, where each comparison contributes positively, negatively, or neutrally depending on the direction of change.
To assess the significance of the observed trend, the standardized test statistic Z is computed, which under the null hypothesis of no trend follows a standard normal distribution. The p-value derived from Z is used to determine statistical significance, with P < 0.05 indicating a significant trend. In this study, a variance correction factor was applied to account for potential autocorrelation in the time series, thereby improving the reliability and robustness of the MK test results.
To evaluate the robustness of our mortality estimates, we conducted several sensitivity analyses (Supplementary Fig. 1). Specifically, we utilized alternative PM2.5 concentration datasets to recalculate PM2.5-attributable mortalities and compared the results across different data sources. For each dataset, we performed a spatial comparison of PM2.5 concentrations at identical locations to assess consistency and potential biases. Furthermore, principal component analysis (PCA) was applied to the PM2.5 data to investigate the similarity and usability of our primary dataset in relation to existing PM2.5 datasets. The results of these analyses demonstrate high concordance between different sources and confirm the reliability of our data.
link
