Driven by high demand in autonomous driving and surveillance, infrared and visible image fusion (IVIF) has attracted significant attention from both industry and the research community. Existing learning-based IVIF methods rely on various hand-crafted architectures to extract features, but such architectures cannot adequately represent the characteristic features of the different modalities, resulting in undesirable artifacts in the fused results. To alleviate this issue, we propose a Neural Architecture Search (NAS)-based deep network for the IVIF task that automatically discovers modality-oriented feature representations. Our network comprises two modality-oriented encoders, a self-visual saliency weight (SvSW) module, and a unified decoder. The two encoders automatically learn the distinct intrinsic feature representations of the infrared and visible modalities; the resulting intermediate features are merged by the SvSW module, and the fused image is recovered by the unified decoder. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches by a large margin, particularly in producing distinct targets and abundant details.
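
As a rough illustration of the encoder-merge-decoder pipeline described above (not the authors' released code), the following PyTorch sketch wires two plain convolutional encoders, a stand-in for the SvSW merge, and a shared decoder. The NAS-discovered cells, the local-saliency softmax weighting, and all layer widths are assumptions made for illustration only.

```python
# Minimal sketch of the pipeline in the abstract. NAS-searched cells are
# replaced by plain conv blocks, and SvSW is approximated by a per-pixel
# softmax over local activation energy; both are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_block(in_ch, out_ch):
    # Stand-in for a NAS-discovered cell: 3x3 conv + LeakyReLU.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.LeakyReLU(0.2, inplace=True),
    )


class Encoder(nn.Module):
    """One modality-oriented encoder (infrared or visible)."""
    def __init__(self, channels=32):
        super().__init__()
        self.body = nn.Sequential(
            conv_block(1, channels),
            conv_block(channels, channels),
        )

    def forward(self, x):
        return self.body(x)


def svsw_merge(feat_ir, feat_vis, window=7):
    # Assumed saliency weighting: local average of feature magnitude per
    # modality drives a per-pixel softmax weight between the two branches.
    sal_ir = F.avg_pool2d(feat_ir.abs().mean(1, keepdim=True),
                          window, stride=1, padding=window // 2)
    sal_vis = F.avg_pool2d(feat_vis.abs().mean(1, keepdim=True),
                           window, stride=1, padding=window // 2)
    w = torch.softmax(torch.cat([sal_ir, sal_vis], dim=1), dim=1)
    return w[:, 0:1] * feat_ir + w[:, 1:2] * feat_vis


class FusionNet(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        self.enc_ir = Encoder(channels)   # infrared-oriented encoder
        self.enc_vis = Encoder(channels)  # visible-oriented encoder
        self.decoder = nn.Sequential(     # unified decoder
            conv_block(channels, channels),
            nn.Conv2d(channels, 1, kernel_size=3, padding=1),
            nn.Tanh(),
        )

    def forward(self, ir, vis):
        fused = svsw_merge(self.enc_ir(ir), self.enc_vis(vis))
        return self.decoder(fused)


if __name__ == "__main__":
    net = FusionNet()
    ir = torch.rand(1, 1, 128, 128)   # single-channel infrared input
    vis = torch.rand(1, 1, 128, 128)  # single-channel (e.g. luminance) visible input
    print(net(ir, vis).shape)         # torch.Size([1, 1, 128, 128])
```

The two encoders keep separate weights so each can specialize to its modality, while the decoder is shared so both feature streams are reconstructed in a common image space; the softmax merge here is only one plausible reading of a saliency-weighted fusion.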