Keywords:
Landsat imagery; AOD retrieval; Transformers; Google Earth Engine (GEE); Explainable Artificial Intelligence (XAI)
Introduction
Atmospheric aerosols are suspended particulate matter consisting of solid or liquid particles with diameters ranging from 10 nm to 100 μm suspended in the atmosphere, chiefly in the lower atmosphere (Kaufman et al., 1997). These aerosols originate from natural processes such as volcanic eruptions, wildfires, and dust storms, and anthropogenic activities, including the combustion of fuels, vehicular emissions, and industrial and agricultural processes. Aerosols can alter the Earth-atmosphere radiative balance by absorbing and scattering solar radiation, directly impacting ecosystems and climate (Ramanathan et al., 2001; Kaufman et al., 2002;). They also indirectly influence climate change through interactions with clouds and participation in both physical responses and chemical reactions (Andreae et al., 2004; Rosenfeld et al., 2008, 2014; Z. Li et al., 2011, 2016;). Furthermore, in the form of fine and ultra-fine particulate matter like PM2.5 and PM1, aerosols significantly deteriorate air quality and pose substantial threats to public health (Fuller et al., 2022; Fuzzi et al., 2015; Kim et al., 2015; Z. Li et al., 2017). Therefore, accurate determination and assessment of aerosol quantity and characteristics is of utmost significance in monitoring atmospheric environmental pollution and understanding climate change dynamics.
Aerosol optical depth (AOD) is defined as the integral of a medium’s extinction coefficient in the vertical direction and serves as a critical optical parameter for characterizing aerosol content and assessing atmospheric turbidity (Kaufman et al., 2002). Satellite remote sensing offers an efficient means to acquire AOD distributions across extended timeframes and broad spatial extents. At present, a multitude of mature aerosol retrieval algorithms have been established, e.g., Dark Target (Levy et al., 2013), Deep Blue (Hsu et al., 2013), and Multi-Angle Implementation of Atmospheric Correction (Lyapustin et al., 2018), using atmospheric radiative transfer (ART) models. These algorithms have been successfully and operationally applied to medium- and low-spatial-resolution sun-synchronous satellites, such as the MODIS and VIIRS, as well as high-temporal-resolution geostationary satellites, including Himawari and the GOES-R Series (S. Li et al., 2019; She et al., 2019; W. Wang et al., 2022), providing daily or hourly AOD products covering the entire Earth with considerable accuracy. These datasets are invaluable for real-time air quality monitoring, tracking, forecasting, and a variety of related studies, particularly in the retrieval of particulate matter (Ma et al., 2022; J. Wei et al., 2021; Zhang et al., 2021).
However, for high-spatial-resolution satellites like the Landsat series and Sentinel-2, there are considerably fewer studies dedicated to aerosol retrieval. Satellite-derived AOD serves as the primary and most crucial input for atmospheric correction, directly impacting the accuracy of quantitative remote sensing information extraction (Doxani et al., 2023; Z. B. Li et al., 2019; Vermote et al., 2016). Traditionally, physically based methods informed by an ART model have been employed to retrieve AODs at a 30-m resolution from Landsat imagery, with a particular emphasis on urban areas (Yang et al., 2022). Luo et al. (2015) developed a simple ImAero-Landsat algorithm without using look-up tables and retrieved AODs from Landsat 7 imagery, which was then applied to improve PM10 estimates in Beijing, China. Tian et al. (2018) retrieved AOD from Landsat 8 imagery in Beijing City by estimating surface reflectance with a bidirectional reflectance distribution function (BRDF) using the RossThick-LiSparse model. Lin et al. (2021a) improved AOD retrievals from Landsat 8 images by developing three schemes for estimating land surface reflectance and by considering four aerosol types in Beijing and Wuhan, China. They then further refined a fusion retrieval algorithm specifically designed for urban areas using Landsat 8 and Sentinel-2 imagery (Lin et al., 2021b). Kumar and Mehta (2023) introduced a simple iterative approach to enhance AOD retrievals from Landsat 8 and Sentinel 2 data by using visible and near-infrared spectral signals across diverse land surface types. Our group has also made notable contributions to facilitating AOD retrievals from Landsat 4–8 images through the construction of priori land surface reflectance databases and the determination of aerosol types using time series analyses of historical ground measurements in Beijing, China (L. Sun et al., 2016; J. Wei et al., 2017).
Traditional physical methods for coupled unknown variables associated with the Atmosphere-Earth system typically depend on empirical assumptions. However, the varying sensitivity of satellite signals to diverse underlying surfaces, coupled with the high spatiotemporal variability of aerosols and their compositions, introduces inherent uncertainties that have a substantial impact on the accuracy of aerosol retrievals (Z. Li et al., 2009; X. Wei et al., 2020). The growing capacity for robust information mining and the capability to tackle intricate, non-linear problems have led to the increased applications of artificial intelligence (AI) techniques in satellite aerosol retrievals across a range of satellite platforms (Jia et al., 2022; Su et al., 2020; Tao et al., 2023; Yeom et al., 2022). However, its application to Landsat imagery is still in its early experimental stages, and only a handful of studies have been conducted thus far. For example, Liang et al. (2022) applied a range of tree-based machine learning models to retrieve AOD from Landsat 7 and 8 scenes covering the Beijing area, utilizing prior knowledge from multiple sources as inputs. She et al. (2022) demonstrated the effectiveness of a deep neural network (DNN) model to retrieve AOD from Landsat 8 data over land, showing strong correlations with ground-based measurements.
The Landsat series of satellites have been providing long-term, high-spatial-resolution, and high-quality global-scale imagery, continuously improving and updating their capabilities since the 1970s with over 50 years of continuity (Crawford et al., 2023; Wulder et al., 2022). These advantages have made Landsat an indispensable tool in a variety of applications, such as natural resources mapping, land cover change analysis, and ecological environmental monitoring (Masek et al., 2020; Wulder et al., 2019). To achieve greater reliability and accuracy in these applications, it is crucial to acquire accurate surface reflectance data by removing atmospheric effects through atmospheric correction, underscoring the urgent need for AOD products. However, given the relatively long revisit periods (~16 days) with the associated challenges in real-time air quality monitoring, a significant gap persists in the availability of widely accepted global aerosol retrieval algorithms and products specifically designed for Landsat data.
With these challenges in mind, this study introduces a state-of-the-art powerful deep-learning model called Transformers, developed to tackle the intricate nonlinear problems associated with Atmosphere-Earth decoupling in aerosol retrievals for Landsat imagery. To further enhance the model’s performance, multi-dimensional information, including spatial variability, temporal sequences, and the topographical impact of air pollution, are incorporated into the model, resulting in the novel GeoChronoTransformers (GCT) model. Furthermore, we employ an ART model to determine the key input features influencing aerosol retrievals from a physical perspective and utilize the eXplainable Artificial Intelligence (XAI) technique to dissect the individual contributions of various features to aerosol retrievals. Lastly, we automate the complete process of data preprocessing and accomplish global-scale aerosol retrievals from Landsat imagery on the Google Earth Engine (GEE) platform. We assess the model’s AOD retrieval performance using a variety of independent validation methods and compare our results with those from widely used machine and deep-learning models. We also verify the model’s robustness by testing it in four representative regions selected from around the world.
  1. Materials and methods
  2. Data sources
  3. Landsat 8/9 imagery
The Landsat Operational Land Imager (OLI) is a multispectral instrument onboard the eighth satellite of the Landsat Data Continuity Mission, which was launched successfully on 11 February 2013. Landsat 8 (L8) OLI provides satellite data covering the spectrum from visible to shortwave infrared channels (Roy et al., 2014). L8 OLI introduces new bands for coastal aerosol (deep blue) and cirrus cloud detection bands in comparison to its predecessor, the Landsat 7 Enhanced Thematic Mapper Plus, which ceased operations in May 2021. It also boasts improvements in imaging width (185 km × 180 km) and daily image captures (725 scenes per day), enriching the multispectral radiometric dataset and expanding the potential for capturing cloud-free scenes globally. To minimize data gaps within the Landsat record, Landsat 9 (L9) was launched on 27 September 2021, carrying the OLI-2 sensor. It closely aligns with L8 OLI in terms of spectral bands, spatiotemporal coverage, radiometry, and geometric precision, further enhancing measurement accuracy and stability while offering extended imaging capabilities (Masek et al., 2020). L9 operates in orbit alongside L8, with an 8-day orbital offset, resulting in an updated data revisit time of every 8 days and acquiring over 1400 scenes daily. To facilitate user access and utilization, the GEE remote sensing cloud platform provides the USGS Landsat 8/9 Collection 2 Tier 1 TOA Reflectance dataset (Crawford et al., 2023). This dataset comprises 11 discrete spectral bands, 4 observation angles, and 2 quality assessment bands (see details in Table S1). Here, we have collected all available images from L8 (~19,462 cloud-screened scenes from 2013 to 2022) and L9 (~1,293 cloud-screened scenes from 2021 to 2022) worldwide since their respective launch times, matching with ground monitoring stations over land.

Ground-based measurements

The AErosol RObotic NETwork (AERONET) is a global ground-based aerosol observation network capable of automated data collection (Holben et al., 1998). It comprises over 1600 monitoring stations since 1992, each equipped with a CIMEL Electronique CE318 multi-band sun photometer. AERONET provides multi-channel AOD observations every 15 minutes, with an average uncertainty of ± 2%. These observations undergo different quality controls and are categorized into three quality levels: Level 1.0 (unscreened), Level 1.5 (cloud screened), and Level 2.0 (cloud screened and quality assured) (Giles et al., 2019). AERONET AOD data has been extensively employed as ground truth for the development and validation of aerosol retrieval studies (Hsu et al., 2013; Levy et al., 2013; Lyapustin et al., 2018; J. Wei et al., 2019a; X. Wei et al., 2020).
Given the limited number of AERONET stations in China, the Sun-sky Radiometer Observation Network (SONET) has been incorporated to enrich the observation samples. SONET is closely aligned to AERONET in terms of observation instruments, spectral bands, and data quality level categorization. It provides AOD observations at 16 long-term ground stations across China, with an average uncertainty of approximately 0.25–0.5% (Z. Q. Li et al., 2018). These observations have also been widely used in studies related to the development of satellite aerosol algorithms and uncertainty analysis in China (He et al., 2021; J. Wei et al., 2019b). In this study, we have collected AERONET and SONET Level 2.0 AOD ground observations at a total of 470 land sites from 2013 to 2022 (Figure S1). However, AERONET and SONET do not provide direct AOD observations at 550 nm, so they are interpolated using the Ångström algorithm according to our previous studies (J. Wei et al. 2019a, 2020b).
ART-GCT-GEE aerosol retrieval cloud framework
In this study, we propose a novel global aerosol retrieval cloud framework for Landsat imagery, consisting of three main steps: feature selection (input variables) for aerosol retrieval based on the atmospheric radiative transfer (ART) model, atmospheric-surface decoupling using the GeoChronoTransformers (GCT) model, and the automated global aerosol retrieval facilitated by the Google Earth Engine (GEE) platform, referred to as the ART-GCT-GEE model.

Atmospheric Radiative Transfer (ART) model

Based on the ART model, the top-of-the-atmosphere (TOA) reflectance (ρ* ) at wavelength \(\lambda\) received by satellite sensors can be approximately represented as a function of atmospheric path (aerosol) reflectance (ρAer ), Rayleigh scattering reflectance (ρRay ), and surface reflectance (ρ ) when assuming a uniform Lambertian surface:
\(\rho^{*}\left(\lambda,\theta_{s},\theta_{v},\varphi\right)\approx\rho_{\text{Aer}}\left({\lambda,\theta}_{s},\theta_{v},\varphi,\tau\right)+\rho_{\text{Ray}}\left({\lambda,\theta}_{s},\theta_{v},\varphi\right)+\frac{\rho}{1-\rho*S_{\lambda}}T\left(\theta_{s}\right)T\left(\theta_{v}\right),\)(1)
where \(\tau\) is the aerosol optical depth, S is the hemispheric albedo, and \(T\left(\theta_{s}\right)\) and \(T(\theta_{v})\) are the atmospheric downwelling and upwelling transmittances, respectively. Angles \(\theta_{s}\), \(\theta_{v}\), and \(\varphi\) represent the solar zenith angle (SZA), viewing zenith angle (VZA), and relative azimuth angle, respectively.
The complete process of atmospheric-surface decoupling, i.e., separation between atmospheric and surface contributions to TOA reflectance, forms the foundational principle of satellite aerosol retrieval. This is calculated at wavelength \(\lambda\), and based on simulations using the 6S model under specific observational conditions (\(\theta_{s}\) = 30,\(\theta_{v}\) = 30, and \(\varphi\) = 100), we found that under low surface reflectance conditions, TOA reflectance gradually increases as AOD increases (Figure S2). This effect is particularly pronounced in the visible channels, showing strong positive non-linear correlations. However, as the wavelength increases, the relationship between AOD and TOA reflectance gradually weakens. In addition, as surface reflectance increases, the impact of AOD on TOA reflectance gradually diminishes, and in longer wavelengths (bands 6–7), an inverse relationship is observed. This indicates that aerosols significantly and diversely impact TOA reflectance across different wavelengths. As a result, seven discrete spectral channels (bands 1–7) are selected as the primary input features for the deep learning (DL) model. Notably, the incorporation of multi-band information into the DL model can also enhance its ability to diagnose various aerosol types over land (J. Wei et al., 2018).
Furthermore, we have included geographical location (latitude, longitude) and monthly time series data for spatial points to improve the AOD modelling by accounting for the spatiotemporal variations in air pollution. To better capture monthly variations, particularly the seasonal cycles of AOD, we represented the temporal information using three helix-shape trigonometric month sequence vectors [T1, T2, T3] (Equations 2–4) (Sun et al., 2022; J. Wei et al., 2023). Leveraging the strong data mining capabilities of DL, this approach can utilize implicit aerosol information among multiple channels and aid in resolving complex atmospheric-surface decoupling processes to improve the capability for aerosol retrieval.
\(T_{1}=\frac{\text{Month}}{12}\) (2)
\(T_{2}=cos(2\pi\frac{\text{Month}}{12})\) (3)
\(T_{3}=\sin(2\pi\frac{\text{Month}}{12})\) (4)
In addition, the ART model is influenced by the observational geometry, as satellite signals can vary depending on different observation angles. Therefore, four angles, i.e., SZA, the solar azimuth angle (SAA), VZA, and the viewing azimuth angle (VAA), are also included as supplementary features. We also incorporated the scattering angle (\(\Theta\)) and the ground elevation (z) into our DL model because they are crucial inputs for calculating the primary parameters of the Rayleigh scattering phase function (PRay(Θ)) and Rayleigh optical depth (τRay(z)) in Rayleigh reflectance using empirical formulas (Equations 5–6) (J. Wei et al., 2018). More importantly, elevation is a critical factor characterizing surface terrain changes, particularly in rugged terrain and at high altitudes where aerosol retrievals face great challenges (Y. Wang et al., 2019; Wei et al., 2020b). Last, we employed the shortwave infrared NDVI (NDVISWIR, Equation 7), which is minimally affected by aerosols, to discern land use cover and changes.
\(P_{\text{Ray}}(\Theta)\approx\frac{3}{4}(1+\operatorname{}\Theta)\)(5)
\(\tau_{\text{Ray}}\left(\lambda,z=Z\right)\approx 0.00877{[\lambda(z=0)exp(Z/34)]}^{-4.05}\exp\left(\frac{-Z}{8.5}\right)\)(6)
\(\text{NDVI}_{\text{SWIR}}=\frac{{\rho^{*}}_{1.24}-{\rho^{*}}_{2.12}}{{\rho^{*}}_{1.24}+{\rho^{*}}_{2.12}}\)(7)
In summary, leveraging the ART model, this study has selected a total of 19 predictive factors related to aerosol remote sensing retrievals for use as input features in the DL model. These factors encompass information related to various aspects, including spectral radiance (Landsat 8/9 OLI Bands 1–7 TOA reflectances), observational geometry (i.e., SAA, SZA, VAA, VZA, and \(\Theta\)), a digital elevation model, NDVISWIR, as well as spatial (latitude, longitude) and temporal (three monthly vectors) attributes.

GeoChronoTransformers (GCT) model

Conventional remote sensing approaches suffer from large uncertainties in establishing relationships between retrieved and input quantities by resorting to assumptions such as Lambertian surface reflection and fixed aerosol types, as well as the use of fixed thresholds of vegetated index or land cover classification products in distinguishing underlying surfaces (Hsu et al., 2013; Levy et al., 2013; Lyapustin et al., 2018). They could introduce uncertainties into the aerosol retrieval process due to the nonlinear relationships and the challenges posed by variations across different times and locations. However, artificial intelligence, particularly DL, offers a promising solution to ameliorate and overcome these nonlinear challenges. More importantly, contemporary XAI technologies enable us to analyze DL models from a physical perspective (see details in Section 3.1).
Therefore, this study introduces an advanced DL model called Transformers (Vaswani et al., 2017) to minimize these effects and enhance the capability to handle nonlinear problems. Transformers, developed by Google researchers in 2017, has been recognized as the fourth major category of DL models, following multilayer perceptron (MLP), the Convolutional Neural Network (CNN), and the recurrent neural network (RNN). It was originally designed for machine translation tasks and has demonstrated outstanding performance in various domains such as image classification and text summarization (Liu & Lapata, 2019; Dong et al., 2022). However, it has seldom been applied to the field of geoscience, especially in atmospheric science. Unlike other DL models, Transformers relies solely on attention mechanisms and introduces the use of multi-head self-attention to simulate the multi-channel output of CNNs. This mechanism combines different feature representations to obtain richer and more accurate information, effectively enhancing the model’s information representation and generalization capabilities. Additionally, Transformers introduces the concepts of residual connections and encoder-decoder structures, resulting in significant breakthroughs in aspects such as operational efficiency, model flexibility, and scalability (see the model structure in Figure S3).
To enhance the applicability of the model for aerosol retrieval tasks, this study introduces improvements and incorporates spatiotemporal information into the traditional Transformer structure, resulting in a novel model named GeoChronoTransformers (GCT). Specifically, we input spatiotemporal features of AOD and auxiliary variables, including longitude, latitude, and monthly time series information (Equations 2–4), into the encoder section of the Transformers. During this process, we replace the original positional encoding module, typically employed in machine translation tasks, with fully connected layers. Subsequently, we stack multiple Transformer encoders and directly yield output through fully connected layers, which not only enables us to thoroughly extract hidden aerosol information from multidimensional features but also effectively mitigates the overfitting issue by complex models in this specific task. Last, during model training, we adopt a trial-and-error approach to fine-tune core parameters and utilize an early stopping strategy to pinpoint the retrieval model that performs best, which can also enhance both the model’s training efficiency and convergence speed.

Google Earth Engine (GEE) cloud platform

The GEE is a cloud-computing platform designed specifically to enhance the parallel processing of petabyte-scale satellite remote sensing imagery and other geographical spatial data, leveraging the powerful computational infrastructure (Gorelick et al., 2017; Tamiminia et al., 2020). Moreover, the platform offers application programming interfaces (APIs) and integrated development environments in both JavaScript and Python, allowing users to engage in customized development according to their requirements. Users can leverage the cloud computing capabilities of GEE’s backend data center to perform a variety of computations and analyses, saving a huge amount of time required for local data downloading and processing.
In this study, all data preprocessing tasks are conducted via the secondary development of the GEE Python API. First, Landsat 8/9 Collection 2 Tier 1 TOA reflectance data on the GEE platform is filtered based on the central coordinates of AERONET and SONET sites, retaining only valid observations that coincide with the satellite overpass time within a ±30-minute window. All valid observations are averaged to establish ground-truth values for each site. To further enhance the quality of data samples, only Landsat images with a low cloud-cover fraction (< 20%) are collected. Subsequently, we utilize the Landsat quality assessment (QA) band to mask out all unsuitable retrieval pixels, including those affected by clouds, cloud shadows, and ice/snow. We then proceed with the extraction (e.g., spectral reflectance and angle information), computation (e.g., NDVISWIR), and pixel-to-pixel value matching of selected input features to complete the generation of training samples. To establish a robust global aerosol retrieval model, we collected all satellite images from the launch of Landsat 8 and 9 that matched ground-based monitors, encompassing a total of 20,755 cloud-screened images (Table S2). All data samples are input into the GCT model to train the final model, and the constructed model is then applied to perform aerosol retrieval on the GEE platform, ultimately generating the final output map. Last, a 5 × 5 window median filter is applied to minimize spatial noise and enhance the quality of the retrieval, following our previous study (J. Wei et al., 2017). Figure 1 illustrates the flowchart of our proposed ART-GCT-GEE model for global aerosol retrievals from Landsat imagery.