Research Article

Feature Aggregation with Two-Layer Ensemble Framework for Multilingual Speech Emotion Recognition

Table 1

Classification of SER research by neural network, feature representation, and multimodal input.

Neural network | Feature representation      | Multimodal
---------------+-----------------------------+-----------------------
CNN            | Spectrogram                 | Speech + text
RNN            | Numeric value               | Speech + video
CNN + RNN      | Spectrogram + numeric value | Speech + text + video