Overview of Infrared Small Target Detection

Mini-Review

Jiawei Cao²,³, Ximing Xie²,³, Huanhuan Ran² and Yian Liu¹,²*

¹University of Electronic Science and Technology of China, Chengdu, China

²Chongqing Institute of Microelectronics Industry Technology, UESTC, Chongqing, China

³School of Software Engineering, Chongqing University of Posts and Telecommunications, Chongqing, China

Corresponding Author

Received Date: March 31, 2025; Published Date: April 14, 2025

Abstract

Infrared small target detection (IRSTD) has long been an active research field, and numerous effective algorithms have been developed. However, many of these algorithms do not adequately consider the shape features of small targets. In this paper, we classify the algorithms into two categories according to whether they contain an independent shape feature extraction module. We compare the two categories using experimental data from the same datasets for consistency. Our findings indicate that algorithms with a shape feature extraction module improve both target detection and shape restoration in IRSTD.

Introduction

In recent years, infrared imaging has been used extensively across various fields because of its robust anti-interference capabilities and its ability to differentiate between genuine and false targets. IRSTD methods can be broadly divided into traditional methods and deep learning-based methods. Traditional methods include algorithms such as PSTNN [1], MPCM [2], and top-hat filtering [3]. While these algorithms are computationally efficient, they often exhibit low detection accuracy and frequently require extensive hyperparameter tuning. Deep learning-based methods include ACMnet [4], which employs asymmetric contextual modulation; MDvsFA [5], which introduces the miss detection (MD) and false alarm (FA) metrics; and ISNet [6], the first to incorporate target shape into IRSTD. Additionally, TCI-Former [7] extracts features of infrared small targets by simulating the heat conduction process. Although existing deep learning-based IRSTD methods have demonstrated promising results, most of them overlook the shape features of weak, small infrared targets. Consequently, they struggle to reconstruct the shapes of these targets accurately, even though shape features are crucial for recognizing targets and for distinguishing between categories of small targets. Given the significance of shape features, this paper proposes a new classification of algorithms: those with an independent shape feature extraction module are categorized as algorithms that prioritize shape features, while the others are classified as algorithms that do not explicitly emphasize shape features.

Algorithms and Comparison

In this section, we analyze several algorithms from both categories to evaluate whether shape features play a facilitating role in IRSTD. Two algorithms with a shape feature extraction module are discussed: ISNet and ILNet. Two algorithms without such a module are also analyzed: DNANet and IRPruneDet.

With Shape Feature Extraction Module

ISNet [6] is the first to introduce shape features into IRSTD. Based on the U-Net structure, it incorporates an edge block inspired by the Taylor finite difference (TFD) and a two-orientation attention aggregation (TOAA) block. The TFD edge block draws upon the second-order TFD equation, using residual learning and gated convolution to aggregate edge features at different levels, thereby enhancing the contrast between the target and the background and extracting fine edge details. The TOAA block applies attention mechanisms along the row and column directions to combine low-level detail features with high-level semantic features, emphasizing target shape features while suppressing noise interference. However, ISNet has limited adaptability in extreme conditions. For targets that are extremely small or highly irregular in shape, or in extremely harsh environmental conditions, it may struggle to detect the target and reconstruct its shape.
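To illustrate why a second-order finite difference responds strongly around a small bright target, the following minimal sketch applies a fixed second-order central difference along rows and columns of a toy intensity grid. It is not ISNet's learned TFD edge block, which additionally uses residual learning and gated convolution; the function name `second_order_edge` and the toy frame are assumptions made for illustration.

```python
def second_order_edge(img):
    """Absolute sum of second-order central differences along x and y.

    Illustrative stand-in only (a fixed discrete Laplacian), not the
    learned edge block described in the text. Border pixels stay at 0.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # f(x+1) - 2 f(x) + f(x-1) approximates the second derivative
            d2x = img[y][x + 1] - 2 * img[y][x] + img[y][x - 1]
            d2y = img[y + 1][x] - 2 * img[y][x] + img[y - 1][x]
            out[y][x] = abs(d2x + d2y)
    return out


# Toy 5x5 frame: flat background of 10 with one bright pixel of 50.
frame = [[10] * 5 for _ in range(5)]
frame[2][2] = 50
edges = second_order_edge(frame)
```

On this toy frame the response is largest at the bright pixel, smaller at its four direct neighbours, and zero on the flat background, which is the contrast-enhancing behaviour the paragraph above describes.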

ILNet [8] proposes treating infrared small targets as salient regions without semantic information, focusing on low-level features. It introduces the interactive polarized orthogonal fusion (IPOF) module, which enables bidirectional feature fusion between the encoder and decoder, integrating important shallow low-level features into deeper layers. Additionally, it designs the dynamic one-dimensional aggregation (DODA) layer, which adaptively aggregates channel and spatial information, and the representative block (RB), which dynamically assigns weights to shallow and deep features, enhancing the ability of deep networks to recover small targets. Although ILNet performs well in terms of detection accuracy and target shape reconstruction, its detection performance may degrade for extremely small targets or when the target intensity is similar to that of the background clutter.
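The idea of dynamically weighting shallow and deep features can be sketched as a softmax-weighted blend of two feature maps. This is only a hedged illustration of the general mechanism, not ILNet's representative block: the function `fuse` and its scalar scores are hypothetical, and in a real network the scores would be produced by learned layers rather than passed in by hand.

```python
import math


def fuse(shallow, deep, score_shallow, score_deep):
    """Blend two equally sized 2-D feature maps with softmax weights.

    The two scores are assumed inputs standing in for learned quantities.
    Returns the fused map and the (shallow, deep) weights.
    """
    m = max(score_shallow, score_deep)          # subtract max for stability
    es = math.exp(score_shallow - m)
    ed = math.exp(score_deep - m)
    ws, wd = es / (es + ed), ed / (es + ed)     # weights sum to 1
    fused = [[ws * s + wd * d for s, d in zip(row_s, row_d)]
             for row_s, row_d in zip(shallow, deep)]
    return fused, (ws, wd)


fused, weights = fuse([[2.0, 4.0]], [[4.0, 8.0]], 0.0, 0.0)
```

With equal scores the two maps are averaged; raising `score_shallow` shifts the blend toward the shallow, detail-rich features.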

Without Shape Feature Extraction Module

DNANet [9] introduces a dense nested attention network architecture that includes the dense nested interactive module (DNIM) and the cascaded channel and spatial attention module (CSAM) for IRSTD. DNIM facilitates interaction of features at different scales by designing multiple nodes between the encoder and decoder sub-networks, thereby maintaining small target representations in deep networks. CSAM enhances multilevel features adaptively using channel and spatial attention mechanisms, further improving feature fusion. Despite DNANet’s excellent performance in detection accuracy and robustness, it has high computational complexity and is prone to false alarms and erroneous detections when dealing with random noise.
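A cascade of channel attention followed by spatial attention can be sketched as below. This is a simplified, parameter-free illustration under stated assumptions: the weights come directly from pooled averages passed through a sigmoid, whereas an actual attention module such as CSAM would use learned layers. The function names are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def channel_attention(feat):
    """Scale each channel (C x H x W nested lists) by the sigmoid of its mean."""
    out = []
    for ch in feat:
        n = sum(len(row) for row in ch)
        weight = sigmoid(sum(sum(row) for row in ch) / n)
        out.append([[weight * v for v in row] for row in ch])
    return out


def spatial_attention(feat):
    """Scale each pixel by the sigmoid of its mean across channels."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    mask = [[sigmoid(sum(feat[k][y][x] for k in range(c)) / c)
             for x in range(w)] for y in range(h)]
    return [[[feat[k][y][x] * mask[y][x] for x in range(w)]
             for y in range(h)] for k in range(c)]


def cascaded_attention(feat):
    # Channel attention first, then spatial attention on its output.
    return spatial_attention(channel_attention(feat))


# Two channels, one row, two pixels: background 0 and a bright response 4.
features = [[[0.0, 4.0]], [[0.0, 4.0]]]
attended = cascaded_attention(features)
```

The bright location is attenuated far less than a weak response would be, which is the sense in which attention emphasizes likely target positions.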

IRPruneDet [10] is the first to introduce the concept of network pruning into IRSTD tasks, proposing a soft channel pruning method based on wavelet structure regularization. The wavelet channel pruning (WCP) strategy represents the weight matrix in the wavelet domain and evaluates channel importance based on the L1 norm of the wavelet decomposition coefficients, promoting structural sparsity. Additionally, the soft channel reconstruction (SCR) method is introduced, which dynamically retains important channel parameters during pruning, preventing the premature loss of critical information. Although IRPruneDet excels in model compression and detection accuracy, its pruning strategy, which relies on wavelet transforms and channel importance evaluation, may have limitations when handling infrared images with complex textures and diverse target features. Furthermore, the model’s performance is sensitive to the initial pruning rate and training hyperparameters, requiring careful tuning to achieve optimal results.
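The channel-importance step can be sketched as follows, assuming a single-level Haar transform of each channel's flattened weights and an L1 norm over the resulting coefficients. This is an illustration of the scoring idea only, not IRPruneDet's implementation: the choice of the Haar wavelet, the helper names, and the toy weights are assumptions, and the "soft" aspect is represented merely by zeroing channels in a copy rather than removing them.

```python
import math


def haar_1d(values):
    """Single-level Haar transform of an even-length list (approx + detail)."""
    s = math.sqrt(2.0)
    approx = [(values[i] + values[i + 1]) / s for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / s for i in range(0, len(values), 2)]
    return approx + detail


def channel_importance(weights):
    """L1 norm of the Haar coefficients of each channel's weights."""
    return [sum(abs(c) for c in haar_1d(ch)) for ch in weights]


def soft_prune(weights, prune_ratio):
    """Zero the least important channels in a copy; keep their slots.

    Keeping the slots (instead of deleting channels) is an assumption meant
    to mimic soft pruning, where zeroed channels can still be updated later.
    """
    scores = channel_importance(weights)
    k = int(len(weights) * prune_ratio)
    dropped = sorted(range(len(weights)), key=lambda i: scores[i])[:k]
    pruned = [[0.0] * len(ch) if i in dropped else list(ch)
              for i, ch in enumerate(weights)]
    return pruned, sorted(dropped)


# Hypothetical flattened weights for three channels.
weights = [[0.9, -0.8, 0.7, -0.6],
           [0.01, 0.02, -0.01, 0.0],
           [0.5, 0.5, 0.5, 0.5]]
pruned, dropped = soft_prune(weights, prune_ratio=0.34)
```

Here the second channel has the smallest coefficient norm and is the one zeroed; the sensitivity to `prune_ratio` noted above shows up directly in how many channels `k` selects.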

Comparison and Analysis

Table 1: Comparison with state-of-the-art methods on NUAA-SIRST and IRSTD-1k in mIoU (%), nIoU (%), Pd (%) and Fa (10⁻⁶).

Subsequently, the experimental data of the four algorithms are compared on two datasets: NUAA-SIRST and IRSTD-1k. The experimental data are presented in Table 1.

Table 1 shows that the algorithms that focus on shape features, ISNet and ILNet, perform better overall. ILNet achieves a Pd of 100% on the NUAA-SIRST dataset while maintaining the lowest Fa, 1.33×10⁻⁶. These methods explicitly model target shape features and also lead in the mIoU and nIoU metrics. Notably, on the IRSTD-1k dataset, ILNet's nIoU exceeds that of DNANet by 2.69 percentage points, demonstrating the significant role of shape features in improving target localization accuracy. In contrast, methods without a shape feature extraction module, such as DNANet and IRPruneDet, perform reasonably well in certain scenarios but generally suffer from higher false alarm rates. For instance, DNANet has an Fa of 17.52×10⁻⁶ on the IRSTD-1k dataset and shows significant performance fluctuations, indicating that these methods are more susceptible to background interference and to the characteristics of the dataset. Overall, incorporating shape features can effectively improve the accuracy and robustness of IRSTD, with particular advantages in complex backgrounds, and yields better target shape reconstruction.
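For readers unfamiliar with the metrics, the sketch below computes pixel-level mIoU, nIoU and Fa on binary masks using definitions that are common in the IRSTD literature; the paper itself does not spell them out, so these definitions are an assumption. mIoU here is the dataset-level intersection over union, nIoU is the mean of per-image IoU, and Fa is simplified to false-positive pixels divided by all pixels. Pd is omitted because it requires matching connected target regions. All names and toy masks are illustrative.

```python
def confusion(pred, gt):
    """Return (intersection, union, false-positive pixels, total pixels)."""
    inter = union = fp = total = 0
    for row_p, row_g in zip(pred, gt):
        for p, g in zip(row_p, row_g):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
            fp += 1 if (p and not g) else 0
            total += 1
    return inter, union, fp, total


def evaluate(preds, gts):
    tot_i = tot_u = tot_fp = tot_px = 0
    per_image = []
    for pred, gt in zip(preds, gts):
        i, u, fp, n = confusion(pred, gt)
        tot_i, tot_u, tot_fp, tot_px = tot_i + i, tot_u + u, tot_fp + fp, tot_px + n
        per_image.append(i / u if u else 1.0)
    return {
        "mIoU": tot_i / tot_u if tot_u else 1.0,   # accumulated over the dataset
        "nIoU": sum(per_image) / len(per_image),   # mean of per-image IoU
        "Fa": tot_fp / tot_px,                     # simplified pixel-level rate
    }


gts = [[[0, 1, 1, 0], [0, 0, 0, 0]],
       [[1, 0, 0, 0], [0, 0, 0, 0]]]
preds = [[[0, 1, 0, 0], [0, 0, 0, 1]],
         [[1, 0, 0, 0], [0, 0, 0, 0]]]
metrics = evaluate(preds, gts)
```

Because Fa is a ratio of pixel counts, it is typically tiny on real images and is therefore reported in units of 10⁻⁶; a table entry of 1.33 under that unit corresponds to a raw value of 1.33e-6.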

Conclusion

In this paper, we propose a new classification of IRSTD algorithms based on whether there is an independent shape feature extraction module. We analyze typical algorithms from both categories and compare their experimental data. The conclusion is that prioritizing shape features significantly enhances both target detection and the reconstruction of the target’s shape in IRSTD.

Acknowledgement

None.

Conflict of Interest

No conflict of interest.

References

Article Details

Citation

Jiawei Cao, Ximing Xie, Huanhuan Ran and Yian Liu*. Overview of Infrared Small Target Detection. On Journ of Robotics & Autom. 3(4): 2025. OJRAT.MS.ID.000570.

Keywords

Algorithms; Taylor Finite Difference (TFD); Two-Orientation Attention Aggregation (TOAA); Interactive Polarized Orthogonal Fusion (IPOF); Dynamic One-Dimensional Aggregation (DODA); Cascaded Channel and Spatial Attention Module (CSAM); Wavelet Channel Pruning (WCP)