Interpretability of Neural Networks Based on Game-theoretic Interactions

Huilin Zhou; Jie Ren; Huiqi Deng; Xu Cheng; Jinpeng Zhang; Quanshi Zhang

doi:10.1007/s11633-023-1419-7

Huilin Zhou, Jie Ren, Huiqi Deng, Xu Cheng, Jinpeng Zhang, Quanshi Zhang. Interpretability of Neural Networks Based on Game-theoretic Interactions[J]. Machine Intelligence Research, 2024, 21(4): 718-739. DOI: 10.1007/s11633-023-1419-7

Abstract

This paper introduces a system of game-theoretic interactions that connects the explanation of the knowledge encoded in a deep neural network (DNN) with the explanation of the DNN's representation power. Within this system, we define two game-theoretic interaction indexes: the multi-order interaction and the multivariate interaction. We use these indexes to explain the feature representations encoded in a DNN from four aspects: 1) quantifying the knowledge concepts encoded by a DNN; 2) exploring how a DNN encodes visual concepts and extracting the prototypical concepts it encodes; 3) learning optimal baseline values for the Shapley value and providing a unified perspective for comparing fourteen attribution methods; 4) theoretically explaining the representation bottleneck of DNNs. Furthermore, we prove the relationship between the interactions encoded in a DNN and its representation power (e.g., generalization power, adversarial transferability, and adversarial robustness). In this way, game-theoretic interactions bridge the gap between “the explanation of knowledge concepts encoded in a DNN” and “the explanation of the representation capacity of a DNN”, yielding a unified explanation.
