Abstract
Driven by the growing demand for autonomous driving and surveillance, infrared and visible image fusion (IVIF) has attracted significant attention from both industry and the research community. Existing learning-based IVIF methods design various architectures to extract features, yet these hand-crafted architectures cannot adequately represent the characteristic features of different modalities, resulting in undesirable artifacts in their fused results. To alleviate this issue, we propose a Neural Architecture Search (NAS)-based deep learning network for the IVIF task, which automatically discovers modality-oriented feature representations. Our network consists of two modality-oriented encoders, a self-visual saliency weight (SvSW) module, and a unified decoder. The two modality-oriented encoders automatically learn distinct intrinsic feature representations from the infrared and visible images. These intermediate features are then merged by the SvSW module, and the fused image is finally reconstructed by the unified decoder. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches by a large margin, especially in preserving distinct targets and abundant details.
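The abstract describes the pipeline only at a high level; the PyTorch sketch below illustrates the overall data flow under stated assumptions. The encoder bodies (here fixed conv stacks standing in for the NAS-discovered cells), the saliency computation inside the SvSW module, and all layer sizes are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of the described encoder/SvSW/decoder pipeline.
# All architectural details here are assumptions; the real encoders
# are NAS-discovered cells, not the fixed conv stacks used below.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Encoder(nn.Module):
    """Stand-in for a modality-oriented encoder (NAS cell assumed)."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)


class SvSW(nn.Module):
    """Assumed self-visual saliency weighting: merge features using
    softmax-normalized per-pixel saliency maps from feature activations."""
    def forward(self, feat_ir, feat_vis):
        sal_ir = feat_ir.abs().mean(dim=1, keepdim=True)
        sal_vis = feat_vis.abs().mean(dim=1, keepdim=True)
        weights = F.softmax(torch.cat([sal_ir, sal_vis], dim=1), dim=1)
        return weights[:, :1] * feat_ir + weights[:, 1:] * feat_vis


class Decoder(nn.Module):
    """Unified decoder reconstructing the fused image from merged features."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.body(x)


class FusionNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc_ir, self.enc_vis = Encoder(), Encoder()
        self.svsw, self.dec = SvSW(), Decoder()

    def forward(self, ir, vis):
        fused_feat = self.svsw(self.enc_ir(ir), self.enc_vis(vis))
        return self.dec(fused_feat)


if __name__ == "__main__":
    net = FusionNet()
    ir = torch.rand(1, 1, 128, 128)    # infrared input
    vis = torch.rand(1, 1, 128, 128)   # visible (grayscale) input
    print(net(ir, vis).shape)          # -> torch.Size([1, 1, 128, 128])
```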