Citation: Yang Liu, Haoqin Sun, Wenbo Guan, Yuqi Xia, Zhen Zhao. Speech Emotion Recognition Using Cascaded Attention Network with Joint Loss for Discrimination of Confusions. Machine Intelligence Research, vol. 20, no. 4, pp. 595–604, 2023. https://doi.org/10.1007/s11633-022-1356-x
[1] J. H. Tao, J. Huang, Y. Li, Z. Lian, M. Y. Niu. Correction to: Semi-supervised ladder networks for speech emotion recognition. International Journal of Automation and Computing, vol. 18, no. 4, Article number 680, 2021. DOI: 10.1007/s11633-019-1215-6.
[2] E. M. Schmidt, Y. E. Kim. Learning emotion-based acoustic features with deep belief networks. In Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, USA, pp. 65–68, 2011. DOI: 10.1109/ASPAA.2011.6082328.
[3] K. Han, D. Yu, I. Tashev. Speech emotion recognition using deep neural network and extreme learning machine. In Proceedings of the 15th Annual Conference of the International Speech Communication Association, Singapore, pp. 223–227, 2014.
[4] Q. Mao, M. Dong, Z. W. Huang, Y. Z. Zhan. Learning salient features for speech emotion recognition using convolutional neural networks. IEEE Transactions on Multimedia, vol. 16, no. 8, pp. 2203–2213, 2014. DOI: 10.1109/TMM.2014.2360798.
[5] M. Y. Chen, X. J. He, J. Yang, H. Zhang. 3D convolutional recurrent neural networks with attention model for speech emotion recognition. IEEE Signal Processing Letters, vol. 25, no. 10, pp. 1440–1444, 2018. DOI: 10.1109/LSP.2018.2860246.
[6] Y. Liu, H. Q. Sun, W. B. Guan, Y. Q. Xia, Z. Zhao. Discriminative feature representation based on cascaded attention network with adversarial joint loss for speech emotion recognition. In Proceedings of the 23rd Annual Conference of the International Speech Communication Association, Incheon, Republic of Korea, pp. 4750–4754, 2022.
[7] S. Mirsamadi, E. Barsoum, C. Zhang. Automatic speech emotion recognition using recurrent neural networks with local attention. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, New Orleans, USA, pp. 2227–2231, 2017.
[8] Q. P. Chen, G. M. Huang. A novel dual attention-based BLSTM with hybrid features in speech emotion recognition. Engineering Applications of Artificial Intelligence, vol. 102, Article number 104277, 2021.
[9] Y. Liu, H. Q. Sun, W. B. Guan, Y. Q. Xia, Z. Zhao. Multi-modal speech emotion recognition using self-attention mechanism and multi-scale fusion framework. Speech Communication, vol. 139, pp. 1–9, 2022. DOI: 10.1016/j.specom.2022.02.006.
[10] M. K. Xu, F. Zhang, S. U. Khan. Improve accuracy of speech emotion recognition with attention head fusion. In Proceedings of the 10th Annual Computing and Communication Workshop and Conference, IEEE, Las Vegas, USA, pp. 1058–1064, 2020. DOI: 10.1109/CCWC47524.2020.9031207.
[11] C. Busso, M. Bulut, C. C. Lee, A. Kazemzadeh, E. Mower, S. Kim, J. N. Chang, S. Lee, S. S. Narayanan. IEMOCAP: Interactive emotional dyadic motion capture database. Language Resources and Evaluation, vol. 42, no. 4, pp. 335–359, 2008. DOI: 10.1007/s10579-008-9076-6.
[12] S. Sahu, R. Gupta, G. Sivaraman, W. AbdAlmageed, C. Y. Espy-Wilson. Adversarial auto-encoders for speech based emotion recognition. In Proceedings of the 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, pp. 1243–1247, 2017.
[13] D. Y. Dai, Z. Y. Wu, R. N. Li, X. X. Wu, J. Jia, H. Meng. Learning discriminative features from spectrograms using center loss for speech emotion recognition. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, Brighton, UK, pp. 7405–7409, 2019. DOI: 10.1109/ICASSP.2019.8683765.
[14] Y. Gao, J. X. Liu, L. B. Wang, J. W. Dang. Metric learning based feature representation with gated fusion model for speech emotion recognition. In Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, pp. 4503–4507, 2021.
[15] L. Tarantino, P. N. Garner, A. Lazaridis. Self-attention for speech emotion recognition. In Proceedings of the 20th Annual Conference of the International Speech Communication Association, Graz, Austria, pp. 2578–2582, 2019.
[16] J. W. Liu, H. X. Wang. A speech emotion recognition framework for better discrimination of confusions. In Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, pp. 4483–4487, 2021.
[17] A. Satt, S. Rozenberg, R. Hoory. Efficient emotion recognition from speech using deep learning on spectrograms. In Proceedings of the 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, pp. 1089–1093, 2017.
[18] P. C. Li, Y. Song, I. V. McLoughlin, W. Guo, L. R. Dai. An attention pooling based representation learning method for speech emotion recognition. In Proceedings of the 19th Annual Conference of the International Speech Communication Association, Hyderabad, India, pp. 3087–3091, 2018.