Machine Learning Advances for Satellite Data Interpolation

30-09-2024 | By Liam Critchley

Key Things to Know:

  • Machine learning significantly enhances satellite data interpolation, providing more accurate climate predictions and reducing computational limitations.
  • GPSat, an open-source library, efficiently processes satellite altimetry data, offering a 504x increase in speed without sacrificing data accuracy.
  • Altimeter data is vital for tracking changes in sea ice thickness, playing a crucial role in understanding climate variability and forecasting sea ice evolution.
  • High-resolution interpolation methods like GPSat can support improved maritime navigation, numerical weather prediction, and more detailed climate models.

Earth observation satellites have enabled scientists to monitor the entirety of the Earth’s surface. These satellites have drastically improved our understanding of the weather and climate processes that happen around the world, as well as the rate and scale at which these processes are being affected by climate change.

Satellite-mounted altimeters play a key role in recording changes in the elevation of both ocean and sea ice surfaces around the world. The TOPEX/Poseidon altimeter, for example, provided data that allowed major breakthroughs in tracking global sea-level rise by measuring sea-surface height with much higher precision than before.

Similarly, polar-monitoring altimeters, such as CryoSat-2, have provided a consistent record of changes in sea ice thickness over the last decade. This record shows how sea ice thickness plays a major role in the changing climate system: it controls atmosphere-ocean heat exchanges, affects sea ice teleconnections, and influences the timing of ice-algal blooms.

Despite the advances and discoveries made to date, altimetry data tend to be sparse, which makes it harder for scientists to make key discoveries. As in many areas of modern science where data points are sparse, scientists are turning to machine learning algorithms to make better predictions from the data in a timelier manner.

What are Altimeters and Altimetry Data? 

Altimeters are measurement instruments that record surface elevation using either a microwave pulse (radar altimeter) or a laser beam (laser altimeter). Each measurement is based on the return time of the emitted pulse, and the satellites collect the data along narrow tracks as they orbit the Earth.
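The return-time principle can be sketched in a few lines of Python. This is a minimal illustration, not a real processing chain: it assumes the pulse travels at the vacuum speed of light and ignores atmospheric and instrument corrections. The function names and the example altitude and travel time are hypothetical.

```python
# Speed of light in vacuum (m/s). Real processing also corrects for
# atmospheric delays, which this sketch ignores.
SPEED_OF_LIGHT = 299_792_458.0


def range_from_return_time(two_way_time_s: float) -> float:
    """Distance from the altimeter to the surface, given the pulse's two-way travel time."""
    # The pulse travels down and back, so the one-way distance is half the path.
    return SPEED_OF_LIGHT * two_way_time_s / 2.0


def surface_elevation(satellite_altitude_m: float, two_way_time_s: float) -> float:
    """Surface height relative to the same reference surface as the satellite altitude."""
    return satellite_altitude_m - range_from_return_time(two_way_time_s)


# Hypothetical values, chosen only for illustration.
example_altitude_m = 717_000.0
example_time_s = 0.004783
print(surface_elevation(example_altitude_m, example_time_s))
```

Because the range comes directly from the travel time, a timing error of one nanosecond corresponds to roughly 15 cm of range, which is why precise timing matters so much for altimetry.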

Altimeters have a horizontal resolution controlled by the altimeter footprint that enables them to resolve small features of the sea ice and/or ocean. A single pulse is used to perform a measurement, and the footprint of the pulse can vary in size. Larger footprints enable altimeters to scan the Earth more quickly, but at a lower horizontal resolution, and vice versa for smaller footprints. Depending on the region of interest and the required resolution, observations can take anywhere from days to weeks.

For example, among the altimeters currently in use, ICESat-2 is a laser altimeter with three pairs of beams and an along-track footprint size of around 17 m. CryoSat-2, on the other hand, is a radar altimeter with an along-track footprint size of around 300 m. However, both altimeters require around 30 days to survey the sea ice cover at both poles.

The Limitations of Altimeter Time Variability in Climate Monitoring

The time required to build up full coverage limits our understanding of the processes that drive ocean and sea ice variability. It also means that applications that would benefit from high spatiotemporal resolution observations, such as numerical weather prediction, climate models, and Arctic maritime navigation, are not currently feasible with altimeters.

For these applications, having access to high spatiotemporal sea ice thickness observations could provide far more useful information than more traditional methods. For example, the thickness of sea ice drives the summer sea ice melt rate, so having access to an accurate thickness will enable more accurate predictions of sea ice evolution in the summer. Additionally, accurate sea ice thickness conditions are beneficial for shipping forecasts, so that icebreaker ships can traverse the ice with a much greater level of knowledge about what to expect.

Overcoming the Sparsity of Data 

Satellite altimetry methods suffer from sparse data because not all locations are observed during a given analysis period. Various statistical interpolation approaches, such as optimal interpolation, objective analysis, kriging, and Gaussian process (GP) regression, have been used to fill in the gaps at unobserved locations.
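To make the gap-filling idea concrete, the sketch below implements textbook one-dimensional GP regression using only the Python standard library. It is an illustration under stated assumptions, not the method from any particular study: the squared-exponential kernel, the fixed hyperparameters, and the sine-wave "observations" are all hypothetical choices, and a real workflow would learn the hyperparameters from the data.

```python
import math


def rbf(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two 1D locations."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L z = b for lower-triangular L."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def backward_solve(L, z):
    """Solve L^T x = z for lower-triangular L."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(x_obs, y_obs, x_new, noise_var=1e-4, length_scale=1.0):
    """Posterior mean and variance of a zero-mean GP at each location in x_new."""
    n = len(x_obs)
    # Covariance of the observations, with noise variance added on the diagonal.
    K = [[rbf(x_obs[i], x_obs[j], length_scale) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)  # this factorisation is the cubic-cost step
    alpha = backward_solve(L, forward_solve(L, y_obs))
    means, variances = [], []
    for xs in x_new:
        k_star = [rbf(xs, xo, length_scale) for xo in x_obs]
        means.append(sum(k * a for k, a in zip(k_star, alpha)))
        v = forward_solve(L, k_star)
        variances.append(rbf(xs, xs, length_scale) - sum(vi * vi for vi in v))
    return means, variances


# Hypothetical sparse "track": observations with a gap around x = 2.
x_obs = [0.0, 1.0, 3.0, 4.0]
y_obs = [math.sin(x) for x in x_obs]
means, variances = gp_predict(x_obs, y_obs, [1.0, 2.0, 10.0])
print(means, variances)
```

The predicted variance is what distinguishes this family of methods from simple averaging: it stays small near observations, grows inside the gap, and returns to the prior variance far from any data.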

One of the key challenges with this sparsity is that it leads to inaccuracies in understanding sea ice thickness, which plays a significant role in climate modelling and prediction. As highlighted by Gregory et al. (2024), traditional methods often miss finer-scale variations in sea ice thickness, which can affect interpretations of how sea ice responds to atmospheric and oceanic processes. This data gap can hinder accurate predictions of ice melt rates, making it difficult for researchers to create reliable climate models.

The Challenges of Scaling Satellite Altimetry Data for Climate Modelling

However, none of these approaches scales well to large data sets (over a few thousand data points) because the computation scales cubically with the size of the data set. The main solutions are to use high-performance computing (HPC) or sub-sampling to generate predictions in a reasonable time frame. Not all research institutions have access to HPC, and computational restrictions prevent novel data products from becoming operational and open source. With this in mind, there’s a need to develop toolkits that remove the reliance on parallelised HPC or sub-sampling for producing scientific data sets.
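A quick back-of-the-envelope calculation shows why cubic scaling becomes a bottleneck. The sketch below is a rough illustration only: it assumes the cost is proportional to the cube of the number of points (constant factors dropped) and that the full covariance matrix is stored as 8-byte floating-point values. The function names are hypothetical.

```python
def relative_cost(n_points: int, baseline_points: int) -> float:
    """How many times more work an exact GP needs for n_points than for baseline_points.

    Assumes cost proportional to n^3, ignoring constant factors.
    """
    return (n_points / baseline_points) ** 3


def covariance_memory_gb(n_points: int, bytes_per_value: int = 8) -> float:
    """Memory needed to hold a dense n x n covariance matrix, in gigabytes."""
    return n_points ** 2 * bytes_per_value / 1e9


# Doubling the data set multiplies the work by eight.
print(relative_cost(2_000, 1_000))
# Keeping only one point in ten makes the fit about a thousand times cheaper,
# which is why sub-sampling is tempting despite the information it discards.
print(relative_cost(100, 1_000))
# A dense covariance matrix for 100,000 points already needs about 80 GB.
print(covariance_memory_gb(100_000))
```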

To overcome these limitations, the GPSat library was designed with machine learning capabilities that are inherently data-agnostic. This means it can be tailored to any field of interest, ranging from atmospheric weather station data to oceanic altimetry. Gregory et al. (2024) demonstrated that using GPSat on Arctic sea ice freeboard data significantly reduced computational times while maintaining high accuracy. Such advancements enable broader research participation and enhance the potential for generating comprehensive climate models.

Gregory et al. (2024) identified that many existing interpolation models are limited by their reliance on high-performance computing resources, which makes them inaccessible to many institutions. Their study introduced the open-source Python library, GPSat, which utilises Gaussian process models to efficiently handle large satellite altimetry data sets without requiring extensive computational power. This approach addresses the need for scalable and inclusive data analysis, allowing smaller research teams to access advanced interpolation techniques and contribute valuable insights to climate research.

Turning to Machine Learning for the Interpolation of Satellite Altimetry Data 

Machine learning algorithms have advanced over the years and have become a useful tool for spotting data patterns and predicting data trends, even in incomplete data sets. Over the last decade, machine learning has increasingly been applied to the development of scalable inverse methods. While these methods have not yet been used for Earth observation applications, GP interpolation methods can be implemented with machine learning libraries such as GPflow and GPyTorch. These libraries offer a much greater degree of flexibility for constructing different GP models and provide graphics processing unit (GPU) functionalities for speeding up linear algebra computations, as well as batch processing functionalities that improve memory handling.

The Role of Machine Learning in Enhancing Satellite Altimetry Data Analysis

Researchers have now developed a new open-source programming library for the interpolation of non-stationary satellite altimetry data, known as GPSat. This library is built around GPflow and has been created for the specific purpose of performing efficient spatial (1D or 2D) and spatiotemporal (3D) interpolation of satellite altimetry data. 

The researchers used GPSat to generate complete maps of 50 km-gridded Arctic sea ice radar freeboard, jointly interpolating CryoSat-2, Sentinel-3A, and Sentinel-3B sea ice radar freeboard data for the 2018/19 Arctic winter season. They found that it provided a 504x increase in computational speed without significant degradation of the freeboard estimates. In altimetry, the radar freeboard is the height of a sea ice floe relative to the underlying ocean surface.
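The freeboard definition and the idea of a 50 km grid can be illustrated with a short sketch. This is not GPSat code or the study's processing chain: the helper names, the projected x/y coordinates in metres, and the sample values are all hypothetical, and simple cell averaging is assumed here purely to show how along-track points map onto a grid.

```python
from collections import defaultdict


def radar_freeboard(ice_elevation_m: float, sea_surface_height_m: float) -> float:
    """Height of the ice floe surface relative to the local ocean surface."""
    return ice_elevation_m - sea_surface_height_m


def bin_to_grid(observations, cell_size_m: float = 50_000.0):
    """Average point observations (x_m, y_m, value) into square grid cells.

    Cells that no track passes through are simply absent from the result;
    those are the gaps an interpolation method has to fill.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for x, y, value in observations:
        cell = (int(x // cell_size_m), int(y // cell_size_m))
        sums[cell][0] += value
        sums[cell][1] += 1
    return {cell: total / count for cell, (total, count) in sums.items()}


# Hypothetical along-track freeboard samples in projected coordinates (metres).
samples = [
    (10_000.0, 10_000.0, radar_freeboard(12.30, 12.10)),
    (20_000.0, 30_000.0, radar_freeboard(12.55, 12.15)),
    (60_000.0, 10_000.0, radar_freeboard(12.20, 12.10)),
]
gridded = bin_to_grid(samples)
print(gridded)
```

With only three samples, two of the 50 km cells receive a value and every other cell stays empty, which mirrors on a small scale why a complete map needs interpolation.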

The researchers demonstrated the scalability of the GPSat library by showing examples of freeboard interpolation at 5 km resolution, as well as along-track sea-level anomalies at the resolution of the altimeter footprint. The interpolated data also showed a strong correlation with airborne data.
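One simple way to quantify agreement between interpolated and airborne values is the Pearson correlation coefficient. The sketch below is a generic illustration and not the validation procedure used in the study; the function name and the sample values are hypothetical.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical co-located freeboard values in metres (not real data).
interpolated = [0.10, 0.15, 0.22, 0.30, 0.41]
airborne = [0.12, 0.14, 0.25, 0.29, 0.45]
print(pearson_correlation(interpolated, airborne))
```

A value close to 1 indicates that the two data sets rise and fall together, although it says nothing about a constant offset between them, so bias is usually checked separately.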

This high-resolution interpolation capability is crucial for monitoring dynamic sea ice changes, particularly in areas like the Fram Strait, where small-scale variations in sea ice thickness can have significant implications for understanding oceanic circulation patterns. The GPSat model's ability to capture these variations ensures that even subtle changes in sea ice are recorded accurately, which is vital for improving the precision of climate predictions and sea ice forecasts.

The Impact of GPSat on Climate Predictions and Sea Ice Forecasting

It’s thought that GPSat could be used to overcome the computational bottlenecks in many altimetry-based interpolation methods, improving our understanding of ocean and sea ice variability over short spatiotemporal scales and of key climate processes in general. Additionally, because GPSat is data agnostic, it could be adapted for different application fields, such as atmospheric weather station data.

The researchers have stated that future work will aim to include the entire CryoSat-2 record from 2010 to the present in order to examine the downstream impact on sea ice thickness trends. Doing so will require more sensitivity analyses and additional airborne and ground-based validation tests. The researchers also stated that they’re going to look into Sparse Variational Gaussian Process (SVGP) models to increase the computational speed even further, as well as a parallelised implementation that uses multiple GPUs during model training.

Reference: 

Gregory W. et al., Scalable interpolation of satellite altimetry data with probabilistic machine learning, Nature Communications 15, 7453 (2024).

By Liam Critchley

Liam Critchley is a science writer who specialises in how chemistry, materials science and nanotechnology interplay with advanced electronic systems. Liam works with media sites, companies, and trade associations around the world and has produced over 900 articles to date, covering a wide range of content types and scientific areas. Beyond his writing, Liam's subject matter knowledge and expertise in the nanotechnology space has meant that he has sat on a number of different advisory boards over the years – with current appointments being on the Matter Inc. and Nanotechnology World Association advisory boards. Liam was also a longstanding member of the advisory board for the National Graphene Association before it folded during the pandemic.