Learning Knowledge Enhanced Text-image Feature Selection Network for Chest X-ray Image Classification
Graphical Abstract
Abstract
Discriminative feature representation of chest X-ray images is crucial for predicting diseases. Currently, large-scale language models predominantly rely on a linear classifier for disease prediction, which ignores the semantic correlations between different diseases and can lead to the omission of discriminative visual details. To this end, this work proposes a novel knowledge-enhanced text-image feature selection network (KT-FSN), comprising three main components: a multi-relationship image encoder module, a knowledge-enhanced text encoder module, and a text-image label prediction module. Specifically, the multi-relationship image encoder (MRIE) module captures the visual relationships among images and incorporates a multi-relation graph to fuse relevant image information, thereby enhancing the image features. Then, we develop a novel knowledge-enhanced text encoder (KETE) module based on a large-scale language model to learn disease label word embeddings guided by medical domain expertise; it further employs a graph convolutional network (GCN) to capture the co-occurrence and interdependence of different disease labels. Finally, we propose a novel text-image label prediction (TILP) module based on a Transformer decoder, which adaptively selects discriminative image spatial features under the guidance of the disease label word embeddings, ultimately leading to accurate prediction of chest diseases from chest X-ray images. Extensive experimental results on the publicly available ChestX-ray14 and CheXpert datasets validate the effectiveness and superiority of the proposed KT-FSN model. The source code will be available at https://github.com/GXY-20000/KT-FSN.
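The core idea of the TILP module, using disease label embeddings as queries that cross-attend over image spatial features, can be illustrated with a minimal single-head attention sketch in NumPy. All names, shapes, and the inner-product scoring below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def label_guided_attention(label_emb, img_feats):
    """Single cross-attention step: label embeddings (queries) attend
    over image spatial features (keys/values), so each disease label
    selects the spatial positions most relevant to it."""
    d = label_emb.shape[-1]
    scores = label_emb @ img_feats.T / np.sqrt(d)  # (L, N) attention logits
    attn = softmax(scores, axis=-1)                # rows sum to 1 over positions
    return attn @ img_feats                        # (L, d) label-specific features

rng = np.random.default_rng(0)
num_labels, num_patches, dim = 14, 49, 32  # e.g. 14 ChestX-ray14 labels, 7x7 map
labels = rng.standard_normal((num_labels, dim))
patches = rng.standard_normal((num_patches, dim))

feats = label_guided_attention(labels, patches)   # (14, 32)
logits = (feats * labels).sum(axis=-1)            # one score per disease label
```

A real Transformer decoder would stack several such layers with learned query/key/value projections, feed-forward sublayers, and self-attention among the label queries; the sketch only shows why attention weights act as a per-label spatial feature selector.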