Surface Water Extraction from Remote Sensing Images: A Mini Review

Yiming Xiao; Zihan Jiang; Han Zhai

Mini Review

Yiming Xiao, Zihan Jiang and Han Zhai*

School of Geography and Information Engineering, China University of Geosciences, Wuhan, China

*Corresponding author: Han Zhai

Received Date: May 23, 2025; Published Date: May 28, 2025

Abstract

The precise extraction of surface water bodies is valuable for ecological monitoring and disaster early warning. The proliferation of remote sensing (RS) sensors and the rapid development of artificial intelligence (AI) have greatly advanced water body extraction and produced numerous effective methods. This paper briefly reviews the development of the field and focuses on current progress, analyzing the performance gains that stem from different strategies, including weakly supervised and self-supervised learning, multimodal fusion, and combination with big models or foundational vision models. Finally, it outlines several directions for future research.

Keywords: Surface water extraction; Remote sensing image; Artificial intelligence

Abbreviations: RS: Remote sensing; HRI: High-resolution image; AI: Artificial intelligence; NDWI: Normalized difference water index; SVM: Support vector machine; RF: Random forest; DNN: Deep neural network; CNN: Convolutional neural network; ViT: Vision transformer; SOTA: State-of-the-art.

Introduction

As a vital component of the Earth’s resources, surface water plays an indispensable role in maintaining ecological balance [1]. Intensifying global climate change and human activities have profound impacts on surface water and cause it to change frequently. Accurate water body extraction is therefore of significant value.

The large volume of available RS data, especially HRIs, provides an unprecedented opportunity for water body extraction. However, extracting water from RS images remains challenging. Surface water takes diverse forms, including natural water bodies such as lakes, ponds, rivers and wetlands, and artificial water bodies such as reservoirs, canals and ditches, so its appearance varies widely. In addition, the spectra of water bodies vary considerably with changes in water composition and with shadow or algae cover, while spectrally similar objects and complex backgrounds cause strong interference. Together, these factors make water body detection very difficult.

To date, numerous methods have been developed for water body extraction from RS images. According to their working mechanism, they fall into three main categories: index-threshold methods, shallow machine learning-based methods and deep learning-based methods [2-4].

Index-threshold methods construct water indices by fully exploiting the distinctive reflectance characteristics of water bodies: specific bands are combined arithmetically, and an extraction result is generated by applying a customized threshold to the index. Examples include NDWI, modified NDWI, background enhanced NDWI, the multi-band water index and deep-blue NDWI [2]. Owing to their simplicity and efficiency, these methods are widely used in real applications. However, because water indices adapt poorly to varying conditions and customized thresholds generalize poorly, their accuracy is limited.
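
To make the index-threshold idea concrete, the following minimal sketch computes the classic NDWI, (Green - NIR) / (Green + NIR), on a toy tile and applies a fixed threshold. The reflectance values, the helper names and the threshold of 0 are illustrative assumptions; in practice the threshold is tuned per scene, which is exactly the generalization problem noted above.

```python
def ndwi(green, nir, eps=1e-9):
    """Per-pixel NDWI = (Green - NIR) / (Green + NIR); eps avoids division by zero."""
    return [
        [(g - n) / (g + n + eps) for g, n in zip(g_row, n_row)]
        for g_row, n_row in zip(green, nir)
    ]


def threshold_mask(index, threshold=0.0):
    """Label a pixel as water (1) when its index value exceeds the threshold."""
    return [[1 if v > threshold else 0 for v in row] for row in index]


# Toy 2x3 reflectance tiles (made-up values). The left two columns mimic water,
# which reflects more green than near-infrared light; the right column mimics land.
green = [[0.10, 0.12, 0.08],
         [0.11, 0.09, 0.07]]
nir = [[0.03, 0.04, 0.30],
       [0.02, 0.05, 0.35]]

water_mask = threshold_mask(ndwi(green, nir), threshold=0.0)
print(water_mask)  # [[1, 1, 0], [1, 1, 0]]
```

Other indices in this family follow the same normalized-difference pattern with different band pairs, so only the inputs to `ndwi` would change.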

Shallow machine learning-based methods detect water bodies by designing various spectral and spatial features and feeding them to a classifier, such as SVM or RF, that separates water pixels from background pixels [3]. These methods effectively improve water detection accuracy. However, because hand-crafted features have limited discriminability and robustness, their performance is limited in complicated scenarios.

Deep learning-based methods are the recent focus of research and have achieved notable results owing to the strong feature extraction capability of DNNs. By hierarchically extracting robust high-level semantic features of water bodies in a data-driven manner, these models significantly improve detection performance and clearly outperform traditional methods [4]. Among them, the encoder-decoder framework has become the common backbone owing to its efficient data processing and good adaptability to large scenes. Thanks to their excellent ability to mine local context, CNN-based methods have been broadly applied to water body extraction and deliver impressive performance. Building on them, various attention mechanisms and multi-scale strategies have been introduced to enlarge the receptive fields of CNNs, with good results. However, because of their inherent locality, CNN-based methods cannot directly model global dependencies, which limits their discriminability. By comparison, the transformer compensates for this weakness by learning global representations in a sequence-to-sequence manner and shows great potential for water body extraction. With the popularity of ViT, many transformer-based methods have been developed. However, a transformer alone does not yield good results because it is weak at describing local detail. To overcome these obstacles, a growing number of hybrid CNN-transformer models have been proposed to integrate the advantages of both and produce more accurate detections; these models achieve SOTA performance and have greatly advanced water extraction.
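
The contrast between the locality of convolution and the global reach of self-attention can be shown on a toy 1D sequence. The sketch below is only an illustration under simplifying assumptions (scalar tokens, identity query/key/value projections, a hand-picked averaging kernel); it is not a model from the cited works. Changing a distant token leaves the convolution output at position 0 untouched but alters the attention output there.

```python
import math


def conv1d_same(x, kernel):
    """Zero-padded 1D cross-correlation: each output sees only a local window."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(x):
                acc += w * x[j]
        out.append(acc)
    return out


def self_attention(x):
    """Scalar-token self-attention with identity projections (q = k = v = x)."""
    out = []
    for q in x:
        scores = [q * k for k in x]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(sum(w / total * v for w, v in zip(weights, x)))
    return out


kernel = [1 / 3, 1 / 3, 1 / 3]       # local averaging window of size 3
seq_a = [1.0, 0.5, 0.2, 0.1, 0.3]
seq_b = [1.0, 0.5, 0.2, 0.1, 2.0]    # only the last (distant) token differs

conv_unchanged = conv1d_same(seq_a, kernel)[0] == conv1d_same(seq_b, kernel)[0]
attn_shift = abs(self_attention(seq_a)[0] - self_attention(seq_b)[0])
print(conv_unchanged, attn_shift > 0)
```

Hybrid designs aim to combine both behaviors: local windows for fine boundaries and global weighting for scene-level context.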

Despite these advances, deep learning-based methods still have limitations, such as heavy reliance on large numbers of training samples, the limited information available in a single data source, and limited adaptability to widely varying scenes. These limitations restrict the wide application of the above methods.

Current Progress and Future Lines

Recently, building on these advanced networks, various improved models have been developed from different angles, including weakly supervised and self-supervised learning, multimodal fusion, and combination with big models or foundational vision models.

Weakly supervised and self-supervised learning

Recent studies have shown that introducing pixel-level or image-level weak supervision can significantly improve segmentation performance while drastically reducing annotation costs. For example, by fully leveraging neighborhood aggregation and weak supervision based on pixel-level annotations, the challenge posed by limited labeled data was well addressed [5]. Self-supervised learning has emerged as a powerful technique for water extraction, especially where labeled samples are scarce [6]. Combining it with prior knowledge mining has produced many competitive models. In addition, some one-shot learning-based methods have been developed to handle the small-sample problem, with good results [7].
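
One generic way to train from sparse pixel-level annotations is a partial loss that is evaluated only where a label exists. The sketch below shows a partial binary cross-entropy; it is a common weak-supervision building block offered as an illustration, not the specific method of [5], and the `IGNORE` marker and function name are hypothetical.

```python
import math

IGNORE = -1  # hypothetical marker for pixels that carry no annotation


def partial_bce(probs, labels, eps=1e-7):
    """Binary cross-entropy averaged over annotated pixels only."""
    total, count = 0.0, 0
    for p_row, y_row in zip(probs, labels):
        for p, y in zip(p_row, y_row):
            if y == IGNORE:
                continue  # unlabeled pixels contribute nothing to the loss
            p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count if count else 0.0


# Predicted water probabilities and a sparse annotation: one water pixel (1),
# one background pixel (0), and two unlabeled pixels.
probs = [[0.9, 0.6],
         [0.2, 0.5]]
labels = [[1, IGNORE],
          [0, IGNORE]]

loss = partial_bce(probs, labels)
print(round(loss, 4))
```

Because unlabeled pixels are skipped, a handful of clicks or scribbles per image is enough to produce a training signal.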

Multimodal fusion

By leveraging the complementary information in SAR, optical and LiDAR RS images, many advances have been made in surface water extraction. Specifically, optical data provide rich spectral and spatial information, LiDAR data provide precise geometric information, and SAR data enable all-weather monitoring. By fusing multimodal data through improved attention networks, such as the Siamese affinity network and the cross-modal transformer, various water bodies can be detected more accurately, with interference from background objects such as shadows and vegetation effectively suppressed [8]. In addition, spatiotemporal attention models not only exploit multi-source information within a single phase but also capture dynamic changes of water bodies across phases, which significantly enhances responsiveness to seasonal water fluctuations and flood events.
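
The complementarity of optical and SAR data can be illustrated with a much simpler, decision-level rule than the attention-based feature fusion cited above: trust the optical index where the scene is clear and fall back to SAR backscatter under cloud, since calm water typically appears dark in SAR imagery. The function name, the cloud mask input and both thresholds (0 for NDWI, -18 dB for backscatter) are assumptions chosen for illustration only; real thresholds depend on sensor, polarization and scene.

```python
def fuse_water_masks(ndwi, sar_db, cloud, ndwi_thr=0.0, sar_thr=-18.0):
    """Per-pixel decision fusion of an optical index and SAR backscatter.

    Cloudy pixels use the SAR rule (low backscatter means water);
    clear pixels use the optical rule (high NDWI means water).
    """
    fused = []
    for n_row, s_row, c_row in zip(ndwi, sar_db, cloud):
        fused.append([
            int(s < sar_thr) if c else int(n > ndwi_thr)
            for n, s, c in zip(n_row, s_row, c_row)
        ])
    return fused


# Toy 2x3 inputs (made-up values); cloud == 1 marks pixels hidden in the optical image.
ndwi = [[0.4, -0.2, 0.0],
        [0.3, 0.1, -0.5]]
sar_db = [[-22.0, -9.0, -21.0],
          [-20.0, -8.0, -10.0]]
cloud = [[0, 0, 1],
         [0, 1, 0]]

fused_mask = fuse_water_masks(ndwi, sar_db, cloud)
print(fused_mask)  # [[1, 0, 1], [1, 0, 0]]
```

Learned fusion networks replace this hard switch with weights estimated from data, but the motivation is the same: each modality covers the other's blind spots.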

Combination with big models or vision models

Big models have attracted increasing attention and shown substantial advantages over traditional models in various fields. Introducing prior guidance derived from big models has brought notable improvements in water extraction. For instance, by fine-tuning the Segment Anything Model with minimal boundary prompts, such as points or boxes, to induce favorable priors, various water bodies in complex scenarios were precisely detected [9]. In addition, recent work indicates that large-scale pre-trained foundational vision models, adapted through fine-tuning or prompt-based learning, generalize well across domains, offering new ideas for generalized water extraction when labels are scarce. The ChatEarthNet dataset pairs images with text, providing a valuable resource for training vision-language foundational models. These models offer language-driven flexibility in prompt-based retrieval and segmentation, paving the way for dynamic monitoring of water bodies in a new “description + localization” manner. Cross-domain adaptation models, by introducing contrastive learning and domain adversarial training, enable pre-trained foundational models to achieve high-performance segmentation in new scenarios with few or no labeled samples.
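
Point and box prompts of the kind mentioned above have to come from somewhere. One plausible source, sketched here as an assumption rather than the procedure of [9], is a coarse water mask (for example from a water index): its bounding box and the water pixel nearest its centroid can serve as prompts. The helper name is hypothetical, no promptable model is called, and the (x, y) point order and corner-format box are assumed conventions that should be checked against whichever model is used.

```python
def prompts_from_mask(mask):
    """Derive (box, point) prompts from a binary mask, or None if it is empty.

    box is (x_min, y_min, x_max, y_max); point is the (x, y) coordinate of the
    foreground pixel closest to the mask centroid, so it always lies on the mask.
    """
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    box = (min(xs), min(ys), max(xs), max(ys))
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
    point = min(coords, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)
    return box, point


# Toy coarse mask (made-up): a small water body in a 4x4 tile.
coarse_mask = [[0, 0, 0, 0],
               [0, 1, 1, 0],
               [0, 1, 1, 1],
               [0, 0, 0, 0]]

box, point = prompts_from_mask(coarse_mask)
print(box, point)  # (1, 1, 3, 2) (2, 2)
```

Snapping the point to an actual foreground pixel matters for thin or curved water bodies such as rivers, where the raw centroid can fall on land.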

Overall, the above progress has greatly promoted the development of water body extraction from RS images. However, there is still room for future research, such as efficient lightweight models, unsupervised models and remote sensing big models.

Efficient lightweight models

Although current networks have achieved notable performance in water body extraction, their applicability is still limited by their high complexity, especially on mobile platforms with severely limited computing power, such as unmanned aerial vehicles. Efficient lightweight models suited to edge computing will therefore be more attractive in real applications.
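
As a rough sense of scale for what "lightweight" can mean, the arithmetic below compares the weight count of a standard convolution layer with that of a depthwise separable one. Depthwise separable convolution is named here only as one widely used lightweight design, not as a technique taken from this review; biases are ignored and the layer sizes are arbitrary.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    return k * k * c_in + c_in * c_out


c_in, c_out, k = 64, 128, 3  # arbitrary example layer
standard = standard_conv_params(c_in, c_out, k)
separable = depthwise_separable_params(c_in, c_out, k)
print(standard, separable, round(standard / separable, 1))  # 73728 8768 8.4
```

For this example layer the separable variant needs roughly one eighth of the weights, the kind of saving that matters on UAV-class hardware.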

Unsupervised models

Unsupervised models have attracted growing attention because they require no labeled samples and can run automatically. How to design sound unsupervised models for water body extraction is therefore another future research line.
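
A very small example of label-free extraction is to let the data pick the water threshold: cluster the index values of a scene into two groups and split at the midpoint of the cluster centers. The sketch below uses a two-cluster, one-dimensional k-means as a deliberately simple stand-in for the unsupervised models envisaged above; the function name and the sample values are assumptions, and a non-empty input is assumed.

```python
def two_means_threshold(values, iters=50):
    """Split 1D values into two clusters; return the midpoint of their centers."""
    lo, hi = min(values), max(values)  # initialize centers at the extremes
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        low = [v for v in values if v <= mid]
        high = [v for v in values if v > mid]
        if not low or not high:
            break  # all values fall in one cluster; nothing left to split
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # assignments are stable, so the centers have converged
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2.0


# Made-up NDWI samples: four land-like and three water-like pixels.
ndwi_values = [-0.42, -0.35, -0.30, -0.38, 0.45, 0.52, 0.40]

auto_threshold = two_means_threshold(ndwi_values)
auto_mask = [1 if v > auto_threshold else 0 for v in ndwi_values]
print(round(auto_threshold, 3), auto_mask)
```

No threshold is supplied by hand, which removes the per-scene tuning step, although the result still inherits the weaknesses of the underlying index.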

Remote sensing big models

Most existing methods are designed for specific regions and tasks and adapt poorly both to widely varying scenarios and to complex real-world requirements. How to design multi-task remote sensing big models with strong generalization ability is therefore an important research line.

Conclusion

Surface water body extraction has become one of the core tasks in remote sensing applications and plays an important role in ecological protection, water resource management and disaster warning. In this paper, we first review the development of water extraction, covering index-threshold methods, shallow machine learning-based methods and deep learning-based methods. We then analyze current progress from different perspectives, including weakly supervised and self-supervised learning, multimodal fusion, and combination with big models or foundational vision models. Finally, based on the remaining limitations, we outline several possible future research lines. Together, these contributions help systematize the theory and technology of water body extraction and promote the development of this field.

Acknowledgment

This work was supported in part by the National Natural Science Foundation of China under grants 42271386 and 42001313, and in part by the National Science and Technology Basic Resource Investigation Project under grant 2019FY202503.

Conflicts of Interest

The authors declare no conflict of interest.

References

Article Details

Volume 3 - Issue 1, 2025

Open Access

Citation

Yiming Xiao, Zihan Jiang and Han Zhai*. Surface Water Extraction from Remote Sensing Images: A Mini Review. Adv in Hydro & Meteorol. 3(1): 2025. AHM.MS.ID.000551.

Keywords

Artificial intelligence (AI), Global climate change, Surface water, Earth's resource, Water body extraction, Surface water extraction, Remote sensing images, Lakes, Ponds, Rivers, Wetlands