Learning Efficient Linear Graph Transformer via Graph-Attention Distillation
Abstract
In recent years, graph transformers have proven to be effective architectures for a variety of graph-based learning tasks. However, their scalability to large-scale data is limited because their attention mechanism incurs quadratic computational complexity, in contrast to Graph Convolutional Network (GCN) models. To overcome this issue, we propose to learn an efficient linear graph transformer via a graph-attention distillation model, which yields a faster and lighter graph transformer framework for graph data learning tasks. The core of the proposed distillation model is a kernel decomposition that rebuilds the graph transformer architecture, reducing the quadratic complexity of attention to linear complexity. Furthermore, to transfer the rich learning capacity of the regular graph transformer in the teacher branch to its linear student counterpart, we devise a novel graph-attention knowledge distillation strategy that enhances the capability of the student network. Empirical evaluations on six commonly used benchmark datasets validate the superiority of our model, which consistently outperforms existing methods in both effectiveness and efficiency.
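The abstract does not specify the exact kernel decomposition or distillation loss, so the following PyTorch sketch only illustrates, under our own assumptions, the two ingredients it describes: a kernelized attention layer whose cost is linear in the number of nodes, and an attention-matching distillation term between a quadratic teacher and a linear student. The names kernel_feature_map, LinearGraphAttention, and attention_distillation_loss, as well as the elu(x)+1 feature map, are illustrative choices rather than the paper's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F


def kernel_feature_map(x):
    # Non-negative feature map phi(.) used to approximate softmax attention;
    # elu(x) + 1 is one common choice (the paper's decomposition may differ).
    return F.elu(x) + 1


class LinearGraphAttention(nn.Module):
    """Kernelized self-attention with linear complexity in the number of nodes N.

    Instead of forming the N x N matrix softmax(Q K^T / sqrt(d)) V, attention is
    rewritten as phi(Q) (phi(K)^T V) / (phi(Q) (phi(K)^T 1)), so the small
    (d x d) product phi(K)^T V is computed first, giving O(N d^2) cost.
    """

    def __init__(self, dim, heads=4):
        super().__init__()
        assert dim % heads == 0
        self.heads = heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (N, dim) node features
        N, dim = x.shape
        d = dim // self.heads
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = kernel_feature_map(q.view(N, self.heads, d))
        k = kernel_feature_map(k.view(N, self.heads, d))
        v = v.view(N, self.heads, d)

        # (heads, d, d) key-value summary; no N x N attention matrix is built.
        kv = torch.einsum("nhd,nhe->hde", k, v)
        # Row normalizer: phi(q_i) . sum_n phi(k_n)
        z = 1.0 / (torch.einsum("nhd,hd->nh", q, k.sum(dim=0)) + 1e-6)
        out = torch.einsum("nhd,hde,nh->nhe", q, kv, z)
        return self.out(out.reshape(N, dim))


def attention_distillation_loss(teacher_attn, student_q, student_k):
    """Hypothetical graph-attention distillation term: align the student's
    implicit (kernelized) attention pattern with the teacher's full map.

    teacher_attn: (N, N) softmax attention from the quadratic teacher branch.
    student_q, student_k: (N, d) query/key features from the linear student.
    """
    phi_q = kernel_feature_map(student_q)
    phi_k = kernel_feature_map(student_k)
    student_attn = phi_q @ phi_k.t()
    student_attn = student_attn / student_attn.sum(dim=-1, keepdim=True).clamp_min(1e-6)
    return F.kl_div(student_attn.clamp_min(1e-8).log(), teacher_attn,
                    reduction="batchmean")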