Published Online

Display Method:
A Survey on Recent Advances and Challenges in Reinforcement Learning Methods for Task-oriented Dialogue Policy Learning
Wai-Chung Kwan, Hong-Ru Wang, Hui-Min Wang, Kam-Fai Wong
doi: 10.1007/s11633-022-1347-y
Dialogue policy learning (DPL) is a key component in a task-oriented dialogue (TOD) system. Its goal is to decide the next action of the dialogue system, given the dialogue state at each turn based on a learned dialogue policy. Reinforcement learning (RL) is widely used to optimize this dialogue policy. In the learning process, the user is regarded as the environment and the system as the agent. In this paper, we present an overview of the recent advances and challenges in dialogue policy from the perspective of RL. More specifically, we identify the problems and summarize corresponding solutions for RL-based dialogue policy learning. In addition, we provide a comprehensive survey of applying RL to DPL by categorizing recent methods into five basic elements in RL. We believe this survey can shed light on future research in DPL.
AI in Human-computer Gaming: Techniques, Challenges and Opportunities
Qi-Yue Yin, Jun Yang, Kai-Qi Huang, Mei-Jing Zhao, Wan-Cheng Ni, Bin Liang, Yan Huang, Shu Wu, Liang Wang
doi: 10.1007/s11633-022-1384-6
With the breakthrough of AlphaGo, human-computer gaming AI has ushered in a big explosion, attracting more and more researchers all over the world. As a recognized standard for testing artificial intelligence, various human-computer gaming AI systems (AIs) have been developed, such as Libratus, OpenAI Five, and AlphaStar, which beat professional human players. The rapid development of human-computer gaming AIs indicates a big step for decision-making intelligence, and it seems that current techniques can handle very complex human-computer games. So, one natural question arises: What are the possible challenges of current techniques in human-computer gaming and what are the future trends? To answer the above question, in this paper, we survey recent successful game AIs, covering board game AIs, card game AIs, first-person shooting game AIs, and real-time strategy game AIs. Through this survey, we 1) compare the main difficulties among different kinds of games and the corresponding techniques utilized for achieving professional human-level AIs; 2) summarize the mainstream frameworks and techniques that can be properly relied on for developing AIs for complex human-computer games; 3) raise the challenges or drawbacks of current techniques in the successful AIs; and 4) try to point out future trends in human-computer gaming AIs. Finally, we hope that this brief review can provide an introduction for beginners and inspire insight for researchers in the field of AI in human-computer gaming.
Research Article
Mitigating Spurious Correlations for Self-supervised Recommendation
Xin-Yu Lin, Yi-Yan Xu, Wen-Jie Wang, Yang Zhang, Fu-Li Feng
doi: 10.1007/s11633-022-1374-8
Recent years have witnessed the great success of self-supervised learning (SSL) in recommendation systems. However, SSL recommender models are likely to suffer from spurious correlations, leading to poor generalization. To mitigate spurious correlations, existing work usually pursues ID-based SSL recommendation or utilizes feature engineering to identify spurious features. Nevertheless, ID-based SSL approaches sacrifice the positive impact of invariant features, while feature engineering methods require high-cost human labeling. To address the problems, we aim to automatically mitigate the effect of spurious correlations. This objective requires to 1) automatically mask spurious features without supervision, and 2) block the negative effect transmission from spurious features to other features during SSL. To handle the two challenges, we propose an invariant feature learning framework, which first divides user-item interactions into multiple environments with distribution shifts and then learns a feature mask mechanism to capture invariant features across environments. Based on the mask mechanism, we can remove the spurious features for robust predictions and block the negative effect transmission via mask-guided feature augmentation. Extensive experiments on two datasets demonstrate the effectiveness of the proposed framework in mitigating spurious correlations and improving the generalization abilities of SSL models.
Dynamic Movement Primitives Based Robot Skills Learning
Ling-Huan Kong, Wei He, Wen-Shi Chen, Hui Zhang, Yao-Nan Wang
doi: 10.1007/s11633-022-1346-z
In this article, a robot skills learning framework is developed, which considers both motion modeling and execution. In order to enable the robot to learn skills from demonstrations, a learning method called dynamic movement primitives (DMPs) is introduced to model motion. A staged teaching strategy is integrated into DMPs frameworks to enhance the generality such that the complicated tasks can be also performed for multi-joint manipulators. The DMP connection method is used to make an accurate and smooth transition in position and velocity space to connect complex motion sequences. In addition, motions are categorized into different goals and durations. It is worth mentioning that an adaptive neural networks (NNs) control method is proposed to achieve highly accurate trajectory tracking and to ensure the performance of action execution, which is beneficial to the improvement of reliability of the skills learning system. The experiment test on the Baxter robot verifies the effectiveness of the proposed method.
DynamicRetriever: A Pre-trained Model-based IR System Without an Explicit Index
Yu-Jia Zhou, Jing Yao, Zhi-Cheng Dou, Ledell Wu, Ji-Rong Wen
doi: 10.1007/s11633-022-1373-9
Web search provides a promising way for people to obtain information and has been extensively studied. With the surge of deep learning and large-scale pre-training techniques, various neural information retrieval models are proposed, and they have demonstrated the power for improving search (especially, the ranking) quality. All these existing search methods follow a common paradigm, i.e., index-retrieve-rerank, where they first build an index of all documents based on document terms (i.e., sparse inverted index) or representation vectors (i.e., dense vector index), then retrieve and rerank retrieved documents based on the similarity between the query and documents via ranking models. In this paper, we explore a new paradigm of information retrieval without an explicit index but only with a pre-trained model. Instead, all of the knowledge of the documents is encoded into model parameters, which can be regarded as a differentiable indexer and optimized in an end-to-end manner. Specifically, we propose a pre-trained model-based information retrieval (IR) system called DynamicRetriever, which directly returns document identifiers for a given query. Under such a framework, we implement two variants to explore how to train the model from scratch and how to combine the advantages of dense retrieval models. Compared with existing search methods, the model-based IR system parameterizes the traditional static index with a pre-training model, which converts the document semantic mapping into a dynamic and updatable process. Extensive experiments conducted on the public search benchmark Microsoft machine reading comprehension (MS MARCO) verify the effectiveness and potential of our proposed new paradigm for information retrieval.
Vision Enhanced Generative Pre-trained Language Model for Multimodal Sentence Summarization
Liqiang Jing, Yiren Li, Junhao Xu, Yongcan Yu, Pei Shen, Xuemeng Song
doi: 10.1007/s11633-022-1372-x
Multimodal sentence summarization (MMSS) is a new yet challenging task that aims to generate a concise summary of a long sentence and its corresponding image. Although existing methods have gained promising success in MMSS, they overlook the powerful generation ability of generative pre-trained language models (GPLMs), which have shown to be effective in many text generation tasks. To fill this research gap, we propose to using GPLMs to promote the performance of MMSS. Notably, adopting GPLMs to solve MMSS inevitably faces two challenges: 1) What fusion strategy should we use to inject visual information into GPLMs properly? 2) How to keep the GPLM′s generation ability intact to the utmost extent when the visual feature is injected into the GPLM. To address these two challenges, we propose a vision enhanced generative pre-trained language model for MMSS, dubbed as Vision-GPLM. In Vision-GPLM, we obtain features of visual and textual modalities with two separate encoders and utilize a text decoder to produce a summary. In particular, we utilize multi-head attention to fuse the features extracted from visual and textual modalities to inject the visual feature into the GPLM. Meanwhile, we train Vision-GPLM in two stages: the vision-oriented pre-training stage and fine-tuning stage. In the vision-oriented pre-training stage, we particularly train the visual encoder by the masked language model task while the other components are frozen, aiming to obtain homogeneous representations of text and image. In the fine-tuning stage, we train all the components of Vision-GPLM by the MMSS task. Extensive experiments on a public MMSS dataset verify the superiority of our model over existing baselines.
Dual-domain and Multiscale Fusion Deep Neural Network for PPG Biometric Recognition
Chun-Ying Liu, Gong-Ping Yang, Yu-Wen Huang, Fu-Xian Huang
doi: 10.1007/s11633-022-1366-8
Photoplethysmography (PPG) biometrics have received considerable attention. Although deep learning has achieved good performance for PPG biometrics, several challenges remain open: 1) How to effectively extract the feature fusion representation from time and frequency PPG signals. 2) How to effectively capture a series of PPG signal transition information. 3) How to extract time-varying information from one-dimensional time-frequency sequential data. To address these challenges, we propose a dual-domain and multiscale fusion deep neural network (DMFDNN) for PPG biometric recognition. The DMFDNN is mainly composed of a two-branch deep learning framework for PPG biometrics, which can learn the time-varying and multiscale discriminative features from the time and frequency domains. Meanwhile, we design a multiscale extraction module to capture transition information, which consists of multiple convolution layers with different receptive fields for capturing multiscale transition information. In addition, the dual-domain attention module is proposed to strengthen the domain of greater contributions from time-domain and frequency-domain data for PPG biometrics. Experiments on the four datasets demonstrate that DMFDNN outperforms the state-of-the-art methods for PPG biometrics.
FedFV: A Personalized Federated Learning Framework for Finger Vein Authentication
Feng-Zhao Lian, Jun-Duan Huang, Ji-Xin Liu, Guang Chen, Jun-Hong Zhao, Wen-Xiong Kang
doi: 10.1007/s11633-022-1341-4
Most finger vein authentication systems suffer from the problem of small sample size. However, the data augmentation can alleviate this problem to a certain extent but did not fundamentally solve the problem of category diversity. So the researchers resort to pre-training or multi-source data joint training methods, but these methods will lead to the problem of user privacy leakage. In view of the above issues, this paper proposes a federated learning-based finger vein authentication framework (FedFV) to solve the problem of small sample size and category diversity while protecting user privacy. Through training under FedFV, each client can share the knowledge learned from its user′s finger vein data with the federated client without causing template leaks. In addition, we further propose an efficient personalized federated aggregation algorithm, named federated weighted proportion reduction (FedWPR), to tackle the problem of non-independent identically distribution caused by client diversity, thus achieving the best performance for each client. To thoroughly evaluate the effectiveness of FedFV, comprehensive experiments are conducted on nine publicly available finger vein datasets. Experimental results show that FedFV can improve the performance of the finger vein authentication system without directly using other client data. To the best of our knowledge, FedFV is the first personalized federated finger vein authentication framework, which has some reference value for subsequent biometric privacy protection research.
ECG Biometrics via Enhanced Correlation and Semantic-rich Embedding
Kui-Kui Wang, Gong-Ping Yang, Lu Yang, Yu-Wen Huang, Yi-Long Yin
doi: 10.1007/s11633-022-1345-0
Electrocardiogram (ECG) biometric recognition has gained considerable attention, and various methods have been proposed to facilitate its development. However, one limitation is that the diversity of ECG signals affects the recognition performance. To address this issue, in this paper, we propose a novel ECG biometrics framework based on enhanced correlation and semantic-rich embedding. Firstly, we construct an enhanced correlation between the base feature and latent representation by using only one projection. Secondly, to fully exploit the semantic information, we take both the label and pairwise similarity into consideration to reduce the influence of ECG sample diversity. Furthermore, to solve the objective function, we propose an effective and efficient algorithm for optimization. Finally, extensive experiments are conducted on two benchmark datasets, and the experimental results show the effectiveness of our framework.
Effective and Robust Detection of Adversarial Examples via Benford-Fourier Coefficients
Cheng-Cheng Ma, Bao-Yuan Wu, Yan-Bo Fan, Yong Zhang, Zhi-Feng Li
doi: 10.1007/s11633-022-1328-1
Adversarial example has been well known as a serious threat to deep neural networks (DNNs). In this work, we study the detection of adversarial examples based on the assumption that the output and internal responses of one DNN model for both adversarial and benign examples follow the generalized Gaussian distribution (GGD) but with different parameters (i.e., shape factor, mean, and variance). GGD is a general distribution family that covers many popular distributions (e.g., Laplacian, Gaussian, or uniform). Therefore, it is more likely to approximate the intrinsic distributions of internal responses than any specific distribution. Besides, since the shape factor is more robust to different databases rather than the other two parameters, we propose to construct discriminative features via the shape factor for adversarial detection, employing the magnitude of Benford-Fourier (MBF) coefficients, which can be easily estimated using responses. Finally, a support vector machine is trained as an adversarial detector leveraging the MBF features. Extensive experiments in terms of image classification demonstrate that the proposed detector is much more effective and robust in detecting adversarial examples of different crafting methods and sources compared to state-of-the-art adversarial detection methods.