Research Article
Feature Aggregation with Two-Layer Ensemble Framework for Multilingual Speech Emotion Recognition
Table 1
Classification of SER research: neural networks, feature representation, and multimodal.
| Neural network | Feature representation | Multimodal |
| CNN | Spectrogram | Speech + text |
| RNN | Numeric value | Speech + video |
| CNN + RNN | Spectrogram + numeric value | Speech + text + video |