Open Access Mini-Review

Overview of Infrared Small Target Detection

Jiawei Cao2,3, Ximing Xie2,3, Huanhuan Ran2 and Yian Liu1,2*

1University of Electronic Science and Technology of China, Chengdu, China

2Chongqing Institute of Microelectronics Industry Technology, UESTC, Chongqing, China

3School of Software Engineering, Chongqing University of Posts and Telecommunications, Chongqing, China

Corresponding Author

Received Date: March 31, 2025; Published Date: April 14, 2025

Abstract

Infrared small target detection (IRSTD) has long been a prominent research field, and numerous effective algorithms have been developed. However, many of these algorithms do not adequately consider the shape features of small targets. In this paper, we classify the algorithms into two categories according to whether they contain an independent shape feature extraction module. We compare the two categories using experimental data from the same datasets for consistency. Our findings indicate that algorithms with a shape feature extraction module improve both target detection and shape restoration in IRSTD.

Introduction

In recent years, infrared imaging has been widely used across many fields because of its robust anti-interference capability and its ability to distinguish genuine targets from false ones. Traditional IRSTD methods include PSTNN [3], MPCM [2], and top-hat filtering [1]. While these traditional algorithms are computationally efficient, they often exhibit low detection accuracy and frequently require extensive hyperparameter tuning. Deep learning-based methods, by contrast, include ACMNet [4], which employs asymmetric contextual modulation; MDvsFA [5], which introduces the miss detection (MD) and false alarm (FA) measures; and ISNet [6], the first to incorporate target shape into IRSTD. Additionally, TCI-Former [7] extracts features of infrared small targets by simulating the heat conduction process. Although existing deep learning-based IRSTD methods have demonstrated promising results, most of them overlook the shape features of weak infrared small targets. Consequently, they struggle to reconstruct target shapes accurately, even though shape features are crucial for recognizing targets and for distinguishing between categories of small targets. Given the significance of shape features, this paper proposes a new classification of algorithms: those with an independent shape feature extraction module are categorized as algorithms that prioritize shape features, while the others are classified as algorithms that do not explicitly emphasize shape features.

Algorithms and Comparison

In this section, we will analyze several algorithms from both categories to evaluate whether shape features play a facilitating role in IRSTD. Two algorithms with the shape feature extraction module are discussed in this paper: ISNet and ILNet. Additionally, two algorithms without the shape feature module are also analyzed: DNANet and IRPruneDet.

With Shape Feature Extraction Module

ISNet [6] is the first to introduce shape features into IRSTD. Based on the U-Net structure, it incorporates an edge block inspired by the Taylor finite difference (TFD) and a two-orientation attention aggregation (TOAA) block. The TFD edge block draws upon the second-order TFD equation, using residual learning and gated convolution to aggregate edge features at different levels, thereby enhancing the contrast between target and background and extracting fine edge details. The TOAA block applies attention along the row and column directions to combine low-level detail features with high-level semantic features, emphasizing target shape features while suppressing noise interference. However, ISNet has limited adaptability in extreme conditions: for targets that are extremely small or highly irregular in shape, or in extremely harsh environments, it may struggle to detect the target and reconstruct its shape.
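To make the second-order finite difference idea concrete, the following is a minimal sketch, not the TFD edge block of ISNet itself. It assumes a fixed 5-point stencil built from central second differences, whereas the actual block uses learned and gated convolutions. The function name `second_order_edge_response` and the toy image are hypothetical and serve only as illustration.

```python
def second_order_edge_response(image):
    """Return |d2f/dr2 + d2f/dc2| computed with central second differences.

    `image` is a list of equal-length rows of floats. Border pixels are
    left at 0.0 because they lack a full neighbourhood.
    Assumption: a fixed stencil stands in for the learned edge block.
    """
    rows, cols = len(image), len(image[0])
    response = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # f(x - 1) - 2 f(x) + f(x + 1) along each axis
            d2_row = image[r - 1][c] - 2.0 * image[r][c] + image[r + 1][c]
            d2_col = image[r][c - 1] - 2.0 * image[r][c] + image[r][c + 1]
            response[r][c] = abs(d2_row + d2_col)
    return response


# Hypothetical toy input: flat 5x5 background with one bright centre pixel.
toy_image = [[0.0] * 5 for _ in range(5)]
toy_image[2][2] = 1.0
edge_map = second_order_edge_response(toy_image)
```

On this toy input the response is largest at the bright pixel and non-zero at its four direct neighbours, which is the contrast-enhancing behaviour that makes second-order differences attractive for point-like targets.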

ILNet [8] proposes treating infrared small targets as salient regions without semantic information, focusing on low-level features. It introduces the interactive polarized orthogonal fusion (IPOF) module, which enables bidirectional feature fusion between the encoder and decoder, integrating important shallow low-level features into deeper layers. Additionally, it designs the dynamic one-dimensional aggregation (DODA) layer, which adaptively aggregates channel and spatial information, and the representative block (RB), which dynamically assigns weights to shallow and deep features, improving the recovery of small targets in deep networks. Although ILNet performs well in detection accuracy and target shape reconstruction, its detection performance may degrade for extremely small targets or when the target intensity is similar to that of background clutter.

Without Shape Feature Extraction Module

DNANet [9] introduces a dense nested attention network architecture that includes the dense nested interactive module (DNIM) and the cascaded channel and spatial attention module (CSAM) for IRSTD. DNIM facilitates interaction of features at different scales by designing multiple nodes between the encoder and decoder sub-networks, thereby maintaining small target representations in deep networks. CSAM enhances multilevel features adaptively using channel and spatial attention mechanisms, further improving feature fusion. Despite DNANet’s excellent performance in detection accuracy and robustness, it has high computational complexity and is prone to false alarms and erroneous detections when dealing with random noise.
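The cascade of channel attention followed by spatial attention can be illustrated with a parameter-free sketch. This is not the CSAM of DNANet: it assumes that the learned layers are replaced by plain average pooling followed by a sigmoid gate, and the function name `channel_then_spatial_attention` and the toy feature map are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_then_spatial_attention(features):
    """Apply a channel gate, then a spatial gate, to features[c][r][k].

    Assumption: each gate is sigmoid(average), standing in for the
    learned attention layers of a real network.
    """
    # Channel attention: one gate per channel from its global average.
    channel_gates = []
    for channel in features:
        values = [v for row in channel for v in row]
        channel_gates.append(sigmoid(sum(values) / len(values)))
    gated = [[[v * g for v in row] for row in channel]
             for channel, g in zip(features, channel_gates)]

    # Spatial attention: one gate per pixel from the mean across channels.
    n_channels = len(gated)
    rows, cols = len(gated[0]), len(gated[0][0])
    spatial_gates = [
        [sigmoid(sum(gated[c][r][k] for c in range(n_channels)) / n_channels)
         for k in range(cols)]
        for r in range(rows)
    ]
    return [[[gated[c][r][k] * spatial_gates[r][k] for k in range(cols)]
             for r in range(rows)]
            for c in range(n_channels)]


# Hypothetical toy feature map: 2 channels, 1 row, 2 columns.
toy_features = [[[0.0, 2.0]], [[0.0, 0.0]]]
attended = channel_then_spatial_attention(toy_features)
```

The output keeps the input shape; each value is scaled first by its channel gate and then by the gate of its pixel location, so both gates lie in (0, 1) and can only attenuate activations.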

IRPruneDet [10] is the first to introduce the concept of network pruning into IRSTD tasks, proposing a soft channel pruning method based on wavelet structure regularization. The wavelet channel pruning (WCP) strategy represents the weight matrix in the wavelet domain and evaluates channel importance based on the L1 norm of the wavelet decomposition coefficients, promoting structural sparsity. Additionally, the soft channel reconstruction (SCR) method is introduced, which dynamically retains important channel parameters during pruning, preventing the premature loss of critical information. Although IRPruneDet excels in model compression and detection accuracy, its pruning strategy, which relies on wavelet transforms and channel importance evaluation, may have limitations when handling infrared images with complex textures and diverse target features. Furthermore, the model’s performance is sensitive to the initial pruning rate and training hyperparameters, requiring careful tuning to achieve optimal results.
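The idea of scoring channels by the L1 norm of their wavelet coefficients can be sketched as follows. This is an illustration under stated assumptions, not the WCP or SCR procedure of IRPruneDet: it assumes a single-level orthonormal 1-D Haar transform applied to each channel's flattened weights and a simple ranking instead of soft pruning. The names `haar_step`, `channel_importance`, `rank_channels`, and the toy weights are hypothetical.

```python
import math


def haar_step(values):
    """One level of the orthonormal 1-D Haar transform (even length input)."""
    scale = math.sqrt(2.0)
    approx = [(values[i] + values[i + 1]) / scale
              for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / scale
              for i in range(0, len(values), 2)]
    return approx + detail


def channel_importance(channel_weights):
    """L1 norm of the Haar coefficients of one channel's flattened weights."""
    return sum(abs(coeff) for coeff in haar_step(channel_weights))


def rank_channels(weights_by_channel):
    """Return (scores, channel indices sorted from most to least important)."""
    scores = [channel_importance(w) for w in weights_by_channel]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return scores, order


# Hypothetical toy weights: three channels with four weights each.
toy_weights = [
    [0.5, 0.5, -0.5, -0.5],
    [0.0, 0.0, 0.0, 0.0],
    [0.1, -0.1, 0.1, -0.1],
]
scores, order = rank_channels(toy_weights)
```

In this toy case the all-zero channel receives a score of zero and is ranked last, so it would be the first candidate for pruning under any keep ratio.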

Comparison and Analysis

Table 1: Comparison with state-of-the-art (SOTA) methods on NUAA-SIRST and IRSTD-1k in mIoU (%), nIoU (%), Pd (%), and Fa (10⁻⁶).


Subsequently, the experimental data of the four algorithms are compared on two datasets: NUAA-SIRST and IRSTD-1k. The results are presented in Table 1.

From Table 1, it can be observed that the algorithms that focus on shape features, ISNet and ILNet, perform better overall. ILNet achieves a Pd of 100% on the NUAA-SIRST dataset while maintaining the lowest Fa, at 1.33×10⁻⁶. These methods explicitly model target shape features and also lead in mIoU and nIoU. Notably, on the IRSTD-1k dataset, ILNet’s nIoU exceeds DNANet’s by 2.69 percentage points, which points to the role of shape features in improving target localization accuracy. In contrast, methods without a shape feature extraction module, such as DNANet and IRPruneDet, perform reasonably well in certain scenarios but generally suffer from higher false alarm rates. For instance, DNANet has an Fa of 17.52×10⁻⁶ on the IRSTD-1k dataset and shows significant performance fluctuations, indicating that these methods are more susceptible to background interference and to the characteristics of the dataset. Overall, the comparison suggests that incorporating shape features can improve the accuracy and robustness of IRSTD, with the clearest advantages in complex backgrounds and in target shape reconstruction.
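For readers unfamiliar with the metrics in Table 1, the sketch below shows how pixel-level scores of this kind are commonly computed from binary masks. It rests on assumptions not stated in this paper: mIoU is taken as the IoU accumulated over all images, nIoU as the mean of per-image IoU, and Fa as false-positive pixels divided by all pixels. Pd, which is counted per target, is omitted. The helper names and toy masks are hypothetical.

```python
def pixel_counts(pred, truth):
    """Count intersection, union, and false-positive pixels of two binary masks."""
    inter = union = false_pos = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
            false_pos += 1 if (p and not t) else 0
    return inter, union, false_pos


def evaluate(pred_masks, truth_masks):
    """Return accumulated IoU, per-image mean IoU, and false alarm rate.

    Assumption: an image with an empty union counts as a perfect match.
    """
    total_inter = total_union = total_fp = total_pixels = 0
    per_image_iou = []
    for pred, truth in zip(pred_masks, truth_masks):
        inter, union, false_pos = pixel_counts(pred, truth)
        total_inter += inter
        total_union += union
        total_fp += false_pos
        total_pixels += len(pred) * len(pred[0])
        per_image_iou.append(inter / union if union else 1.0)
    return {
        "mIoU": total_inter / total_union if total_union else 1.0,
        "nIoU": sum(per_image_iou) / len(per_image_iou),
        "Fa": total_fp / total_pixels,
    }


# Hypothetical toy data: two 2x2 images.
toy_pred = [[[1, 1], [0, 0]], [[0, 0], [0, 1]]]
toy_truth = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
metrics = evaluate(toy_pred, toy_truth)
```

Because nIoU averages per image, a single badly segmented small target lowers it more than it lowers the accumulated IoU, which is why the two scores can rank methods differently.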

Conclusion

In this paper, we propose a new classification of IRSTD algorithms based on whether there is an independent shape feature extraction module. We analyze typical algorithms from both categories and compare their experimental data. The conclusion is that prioritizing shape features significantly enhances both target detection and the reconstruction of the target’s shape in IRSTD.

Acknowledgement

None.

Conflict of Interest

No conflict of interest.

References

    1. Bai X, Zhou F (2010) Analysis of new top-hat transformation and the application for infrared dim small target detection [J]. Pattern Recognition 43(6): 2145-2156.
    2. Wei Y, You X, Li H (2016) Multiscale patch-based contrast measure for small infrared target detection [J]. Pattern Recognition 58: 216-226.
    3. Zhang L, Peng Z (2019) Infrared small target detection based on partial sum of the tensor nuclear norm [J]. Remote Sensing 11(4): 382.
    4. Dai Y, Wu Y, Zhou F (2021) Asymmetric contextual modulation for infrared small target detection [C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 950-959.
    5. Wang H, Zhou L, Wang L (2019) Miss detection vs. false alarm: Adversarial learning for small object segmentation in infrared images [C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 8509-8518.
    6. Zhang M, Zhang R, Yang Y (2022) ISNet: Shape matters for infrared small target detection [C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 877-886.
    7. Chen T, Tan Z, Chu Q (2024) TCI-Former: Thermal conduction-inspired transformer for infrared small target detection [C]//Proceedings of the AAAI Conference on Artificial Intelligence 38(2): 1201-1209.
    8. Li H, Yang J, Wang R (2025) ILNet: Low-level matters for salient infrared small target detection [J]. IEEE Transactions on Aerospace and Electronic Systems.
    9. Li B, Xiao C, Wang L (2022) Dense nested attention network for infrared small target detection [J]. IEEE Transactions on Image Processing.
    10. Zhang M, Yang H, Guo J (2024) IRPruneDet: Efficient infrared small target detection via wavelet structure-regularized soft channel pruning [C]//Proceedings of the AAAI Conference on Artificial Intelligence 38(7): 7224-7232.