ViSIR: Vision Transformer Single Image Reconstruction Method for Earth System Models

IEEE Pulse
Ehsan Zeraatkar, Salah A. Faroughi, and Jelena Tešić

Earth system models (ESMs) are computationally intensive simulations that integrate various Earth components (atmosphere, ocean, land, ice, and biosphere) to study the interactions that affect climate and environmental conditions [1]. ESMs vary in scale and resolution. Global-scale models often yield low-resolution (LR) outputs when applied to smaller localities, and local agencies may lack the resources to generate high-resolution (HR) ESM outputs themselves [2].

In computer vision, super-resolution (SR) aims to enhance LR images into HR images. For ESMs, this involves building deep neural networks that up-sample LR model outputs into HR outputs, addressing the need for localized HR data. To address this need, a novel algorithm, ViSIR, combines vision transformers (ViTs) [3], which capture long-range dependencies, with sinusoidal representation networks (SIRENs) [4], which capture high-frequency details. By replacing the MLP layers in the ViT with SIREN, the hybrid model mitigates spectral bias, the tendency of neural networks to fit low-frequency components before high-frequency ones, which is a known issue in SR tasks [5]. Detailed information is provided in the paper currently under review [5]. Figure 1 illustrates the proposed hybrid ViT-based algorithm, which leverages SIREN’s ability to mitigate spectral bias.
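The substitution described above can be sketched as a single pre-norm transformer encoder block whose feed-forward MLP is replaced by a SIREN, i.e. a linear map followed by sin(ω0·(Wx + b)) and a linear output layer. This is a minimal NumPy illustration under stated assumptions, not the ViSIR implementation from [5]: the single-head attention, the names `SirenViTBlock` and `siren_init`, the layer widths, and the zero biases are hypothetical choices made here; ω0 = 30 and the uniform weight initialization follow the SIREN paper [4].

```python
import numpy as np

# Assumption: omega_0 = 30, the frequency scale used in the SIREN paper [4].
OMEGA_0 = 30.0


def siren_init(rng, fan_in, fan_out, is_first):
    """Uniform weight init from [4]; zero biases are a simplifying assumption."""
    bound = 1.0 / fan_in if is_first else np.sqrt(6.0 / fan_in) / OMEGA_0
    weights = rng.uniform(-bound, bound, size=(fan_in, fan_out))
    return weights, np.zeros(fan_out)


def layer_norm(x, eps=1e-5):
    """Normalize each token to zero mean and unit variance (no learned scale)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class SirenViTBlock:
    """Hypothetical pre-norm transformer block whose MLP is replaced by a SIREN."""

    def __init__(self, dim, hidden, seed=0):
        rng = np.random.default_rng(seed)
        # Single-head attention projections (a simplification of multi-head ViT).
        self.wq, self.wk, self.wv, self.wo = (
            rng.normal(0.0, 1.0 / np.sqrt(dim), size=(dim, dim)) for _ in range(4)
        )
        # SIREN feed-forward: one sine layer, then a linear output layer.
        self.w1, self.b1 = siren_init(rng, dim, hidden, is_first=True)
        self.w2, self.b2 = siren_init(rng, hidden, dim, is_first=False)

    def attention(self, x):
        q, k, v = x @ self.wq, x @ self.wk, x @ self.wv
        scores = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (tokens, tokens)
        return (scores @ v) @ self.wo

    def siren_mlp(self, x):
        hidden = np.sin(OMEGA_0 * (x @ self.w1 + self.b1))  # periodic activation
        return hidden @ self.w2 + self.b2

    def __call__(self, tokens):
        x = tokens + self.attention(layer_norm(tokens))  # residual around attention
        return x + self.siren_mlp(layer_norm(x))  # residual around SIREN MLP


tokens = np.random.default_rng(1).normal(size=(16, 64))  # e.g. 16 patch tokens
block = SirenViTBlock(dim=64, hidden=128)
print(block(tokens).shape)  # (16, 64)
```

The block preserves the token shape, so it can be stacked like a standard ViT encoder layer; only the activation in the feed-forward path changes from a smooth nonlinearity to a sine.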

Figure 1. ViSIR flow: The input image is divided into patches, and the patches undergo a patch embedding and position encoding process before being processed by the transformer, followed by SIREN. (Image courtesy of Ehsan Zeraatkar.)
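The first two stages in Figure 1, splitting the image into patches and then applying patch embedding with position encoding, can be sketched in NumPy as follows. The patch size, embedding width, and function names (`patchify`, `embed_patches`) are hypothetical, and the random matrices stand in for weights that would be learned during training; ViSIR’s actual configuration is described in [5]. The additive position embedding mirrors the approach of ViT [3].

```python
import numpy as np


def patchify(image, patch):
    """Split an (H, W, C) array into non-overlapping, flattened patches."""
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("image height and width must be multiples of patch")
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    grid = grid.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch, patch, C)
    return grid.reshape(-1, patch * patch * c)  # row-major patch order


def embed_patches(image, patch, dim, seed=0):
    """Linear patch embedding plus an additive position embedding."""
    rng = np.random.default_rng(seed)
    patches = patchify(image, patch)  # (num_patches, patch * patch * C)
    # Random matrices stand in for weights that would be learned in training.
    w_embed = rng.normal(0.0, 0.02, size=(patches.shape[1], dim))
    pos_embed = rng.normal(0.0, 0.02, size=(patches.shape[0], dim))
    return patches @ w_embed + pos_embed


field = np.zeros((32, 32, 1))  # toy single-channel field, e.g. one variable
print(embed_patches(field, patch=8, dim=64).shape)  # (16, 64)
```

A 32 × 32 field with 8 × 8 patches yields 4 × 4 = 16 tokens, each a 64-dimensional vector ready for the transformer stage.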

The proposed ViSIR algorithm is evaluated on images derived from Energy Exascale Earth System Model (E3SM) simulation outputs [6], as shown in Figure 2. E3SM is an advanced, open-access tool for simulating Earth’s climate, including key biogeochemical and cryospheric processes. The original simulation data, mapped to a 0.25° × 0.25° grid using bilinear interpolation, are further down-sampled to a 1° × 1° grid using bicubic interpolation.

Figure 2. Schematic procedure of data pre-processing for the SR technique. Panels (a), (b), and (c) show surface temperature, shortwave heat flux, and longwave heat flux, respectively, for the first month of year one obtained from the global fine-resolution configuration of E3SM [7]. (Image courtesy of Ehsan Zeraatkar.)
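The 0.25° to 1° step is a factor-of-4 reduction along each axis: a global 0.25° grid has 720 × 1440 cells, and a 1° grid has 180 × 360. The sketch below shows separable bicubic (cubic-convolution) down-sampling in NumPy. It illustrates the operation under assumptions and is not the authors’ pipeline: the Keys kernel parameter a = -0.5, the half-pixel sample alignment, edge clamping, and the absence of an anti-aliasing pre-filter are choices made here, and common image libraries differ on each of them. The function names are hypothetical.

```python
import numpy as np


def cubic_weight(t, a=-0.5):
    """Keys cubic-convolution kernel; a = -0.5 is one common choice."""
    t = np.abs(t)
    near = (a + 2) * t**3 - (a + 3) * t**2 + 1
    far = a * t**3 - 5 * a * t**2 + 8 * a * t - 4 * a
    return np.where(t <= 1, near, np.where(t < 2, far, 0.0))


def resample_axis(field, out_len, axis):
    """Cubic resampling along one axis with half-pixel alignment and edge clamping."""
    in_len = field.shape[axis]
    src = (np.arange(out_len) + 0.5) * (in_len / out_len) - 0.5  # source coords
    base = np.floor(src).astype(int)
    shape = [1] * field.ndim
    shape[axis] = out_len  # broadcast weights along the resampled axis
    out = 0.0
    for offset in (-1, 0, 1, 2):  # four neighbouring source samples
        idx = np.clip(base + offset, 0, in_len - 1)
        weight = cubic_weight(src - (base + offset)).reshape(shape)
        out = out + np.take(field, idx, axis=axis) * weight
    return out


def bicubic_downsample(field, factor):
    """Down-sample a 2-D (lat, lon) field by an integer factor, one axis at a time."""
    rows, cols = field.shape
    coarse_rows = resample_axis(field, rows // factor, axis=0)
    return resample_axis(coarse_rows, cols // factor, axis=1)


# Synthetic stand-in for a 0.25-degree global field (720 x 1440), not E3SM data.
fine = np.random.default_rng(0).normal(288.0, 5.0, size=(720, 1440))
coarse = bicubic_downsample(fine, factor=4)
print(coarse.shape)  # (180, 360)
```

Because the four kernel weights always sum to one, a constant field passes through unchanged; the coarse field then serves as the LR input and the original 0.25° field as the HR target.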

ViSIR outperforms three state-of-the-art algorithms in terms of mean squared error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity index measure (SSIM). Table 1 summarizes the comparison results.
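For reference, the three metrics can be written out as below. This NumPy sketch is not the evaluation code used in [5]: the `data_range` argument (the assumed value span of the normalized fields, e.g. 1.0 for data scaled to [0, 1]) is an assumption, and `global_ssim` computes SSIM over a single window covering the whole field, whereas the standard SSIM averages the same expression over local, typically Gaussian-weighted, windows. The constants C1 = (0.01L)² and C2 = (0.03L)² follow the usual SSIM definition, with L the data range.

```python
import numpy as np


def mse(ref, rec):
    """Mean squared error between reference and reconstructed fields."""
    return float(np.mean((ref - rec) ** 2))


def psnr(ref, rec, data_range):
    """PSNR in dB: 10 * log10(data_range**2 / MSE); infinite for a perfect match."""
    err = mse(ref, rec)
    return float("inf") if err == 0 else float(10.0 * np.log10(data_range**2 / err))


def global_ssim(ref, rec, data_range):
    """Single-window SSIM over the whole field (a simplification of windowed SSIM)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = ref.mean(), rec.mean()
    var_x, var_y = ref.var(), rec.var()
    cov = np.mean((ref - mu_x) * (rec - mu_y))
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)  # luminance and structure terms
    den = (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2)
    return float(num / den)


ref = np.zeros((4, 4))
rec = np.ones((4, 4))
print(mse(ref, rec), psnr(ref, rec, data_range=1.0))  # 1.0 0.0
```

Lower MSE, higher PSNR, and SSIM closer to 1 all indicate a reconstruction closer to the original field.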

Table 1. (Max et al.) values of MSE (%), PSNR (dB), and SSIM between the original (IO) and reconstructed (IR) images for three variables (surface temperature, shortwave heat flux, and longwave heat flux) and four different models.

Compared with the other three approaches, ViSIR’s performance demonstrates its potential for advancing SR of ESM outputs.

References

  1. N. G. Heavens, D. S. Ward, and M. M. Natalie, “Studying and projecting climate change with earth system models,” Nature Educ. Knowl., vol. 4, no. 5, p. 4, 2013.
  2. V. Eyring et al., “Overview of the coupled model intercomparison project phase 6 (CMIP6) experimental design and organization,” Geosci. Model Develop., vol. 9, no. 5, pp. 1937–1958, May 2016.
  3. A. Dosovitskiy et al., “An image is worth 16×16 words: Transformers for image recognition at scale,” 2020, arXiv:2010.11929.
  4. V. Sitzmann et al., “Implicit neural representations with periodic activation functions,” in Proc. 34th Int. Conf. Neural Inf. Process. Syst., 2020, pp. 7462–7473.
  5. E. Zeraatkar, S. Faroughi, and J. Tešić, “ViSIR: Vision transformer single image reconstruction method for earth system models,” Earth Sci. Inform., under review.
  6. Energy Exascale Earth System Model (E3SM). [Computer Software], E3SM Project, Berkeley, CA, USA, Mar. 2024.
  7. N. M. Pawar et al., “ESM data downscaling: A comparison of super resolution deep learning models,” Earth Sci. Inform., vol. 17, pp. 3511–3528, Jun. 2024.