Improving Multi-task GNNs for Molecular Property Prediction via Missing Label Imputation
Abstract
Predicting molecular properties is a fundamental task in drug discovery, and graph neural networks (GNNs) have recently gained prominence in this area. Because a molecule tends to have multiple correlated properties, GNNs with strong multi-task learning ability are needed. However, because human annotation is expensive and time-consuming, collecting complete labels for every task is difficult. As a result, most existing benchmarks contain many missing labels in their training data, and the performance of GNNs suffers from the lack of supervision. To overcome this obstacle, we propose to improve multi-task molecular property prediction through missing label imputation. Specifically, we first introduce a bipartite graph to model molecule-task co-occurrence relationships, which transforms the imputation of missing labels into the prediction of missing edges on this graph. To predict the missing edges, we devise a graph neural network that learns the complex molecule-task co-occurrence relationships. We then select reliable pseudo labels according to the uncertainty of the predictions. Boosted by sufficient and reliable supervision, our approach achieves state-of-the-art performance on a variety of real-world datasets.
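
The two data-handling steps described above, casting the label matrix as a molecule-task bipartite graph and filtering imputed labels by uncertainty, can be illustrated with a minimal sketch. It assumes binary labels with NaN marking missing entries, and it uses binary entropy with a fixed threshold as the uncertainty measure. The function names, the `max_entropy` threshold, and the example probabilities are hypothetical choices for illustration, not the method's actual components, and the edge-predicting GNN itself is not shown.

```python
import numpy as np

def build_bipartite_edges(labels):
    """Split a (molecules x tasks) label matrix into observed and missing edges.

    Observed entries become labeled edges (molecule, task, label);
    NaN entries become candidate edges (molecule, task) to be imputed.
    """
    observed_mask = ~np.isnan(labels)
    mol_idx, task_idx = np.nonzero(observed_mask)
    observed = [(int(m), int(t), int(labels[m, t]))
                for m, t in zip(mol_idx, task_idx)]
    miss_m, miss_t = np.nonzero(~observed_mask)
    missing = [(int(m), int(t)) for m, t in zip(miss_m, miss_t)]
    return observed, missing

def select_pseudo_labels(missing, probs, max_entropy=0.3):
    """Keep imputed labels whose binary entropy (in bits) is below max_entropy.

    Entropy thresholding is an assumed stand-in for the uncertainty
    criterion; `probs` holds one predicted probability per missing edge.
    """
    p = np.clip(np.asarray(probs, dtype=float), 1e-12, 1 - 1e-12)
    entropy = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    keep = entropy < max_entropy
    return [(m, t, int(pi >= 0.5))
            for (m, t), pi, k in zip(missing, p, keep) if k]

# Toy label matrix: 2 molecules x 3 tasks, NaN = missing label.
labels = np.array([[1.0, np.nan, 0.0],
                   [np.nan, 1.0, np.nan]])
observed, missing = build_bipartite_edges(labels)

# Hypothetical edge-predictor outputs, one per missing edge.
probs = [0.98, 0.55, 0.03]
pseudo = select_pseudo_labels(missing, probs)
```

In this toy case the ambiguous prediction (0.55) is discarded, while the two confident predictions are kept as pseudo labels and could be added to the training set alongside the observed labels.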