Research Article
Using Morphological Data in Language Modeling for Serbian Large Vocabulary Speech Recognition
Table 4
Example of a feature vector in RNNLM experiments for the word sam (Eng. am).
| Index | Feature type | Feature | Value | Remarks |
| --- | --- | --- | --- | --- |
| 0 | Constant | Constant | 0.01 | For math reasons |
| 1–5 | Special | Special word feat. | 0 | For bos/eos/unk/brk/silence |
| 6 | Unigram | Unigram prob. | 0.00788 | Scaled unigram log-prob. |
| 7 | Length | Word length | 0.00186 | Scaled word length |
| 8–36 | Word | 1-hot vect. elem. | 0 | — |
| 37 | Word | 1-hot vect. elem. | 0.21 | Scale based on unigram prob. |
| 38–97644 | Word | 1-hot vect. elem. | 0 | — |
| 97645–97657 | Final | Lett. n-gram prob. | 0 | — |
| 97658 | Final | 3-gram -am$ pr. | 0.12 | Scaled letter 3-gram prob. |
| 97659–97758 | Final | Lett. n-gram prob. | 0 | — |
| 97759 | Final | 2-gram -m$ pr. | 0.047 | Scaled letter 2-gram prob. |
| 97760–97869 | Final | Lett. n-gram prob. | 0 | — |
| 97870–98050 | Initial | Lett. n-gram prob. | 0 | — |
| 98051 | Initial | 2-gram ^s- pr. | 0.03 | Scaled letter 2-gram prob. |
| 98052 | Initial | 3-gram ^sa- pr. | 0.069 | Scaled letter 3-gram prob. |
| 98053–98144 | Initial | Lett. n-gram prob. | 0 | — |
| 98145–98300 | Match | Lett. n-gram prob. | 0 | — |
| 98301 | Match | 2-gram -am- pr. | 0.064 | Scaled letter 2-gram prob. |
| 98302–100451 | Match | Lett. n-gram prob. | 0 | — |
| 100452 | Match | 2-gram -sa- pr. | 0.057 | Scaled letter 2-gram prob. |
| 100453–100459 | Match | Lett. n-gram prob. | 0 | — |
| 100460 | Match | 3-gram -sam- pr. | 0.11 | Scaled letter 3-gram prob. |
| 100461–101306 | Match | Lett. n-gram prob. | 0 | — |
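Because almost every element of the vector in Table 4 is zero, it can be stored sparsely. The following is a minimal sketch, not the authors' implementation: it stores only the non-zero entries listed in the table and maps an index back to its feature type using the index ranges given there. The names `sam_features`, `to_dense`, and `feature_type` are hypothetical, and the total length of 101307 is an assumption inferred from the last index in the table (101306).

```python
# Non-zero entries of the Table 4 feature vector for the word "sam".
# Indices and values are copied from the table; all other elements are 0.
VECTOR_LENGTH = 101307  # assumption: indices run from 0 to 101306

sam_features = {
    0: 0.01,        # constant
    6: 0.00788,     # scaled unigram log-prob.
    7: 0.00186,     # scaled word length
    37: 0.21,       # word one-hot element, scaled by unigram prob.
    97658: 0.12,    # final 3-gram -am$
    97759: 0.047,   # final 2-gram -m$
    98051: 0.03,    # initial 2-gram ^s-
    98052: 0.069,   # initial 3-gram ^sa-
    98301: 0.064,   # match 2-gram -am-
    100452: 0.057,  # match 2-gram -sa-
    100460: 0.11,   # match 3-gram -sam-
}

# Inclusive upper bound of each feature-type block, taken from the table.
TYPE_BOUNDS = [
    (0, "Constant"),
    (5, "Special"),
    (6, "Unigram"),
    (7, "Length"),
    (97644, "Word"),
    (97869, "Final"),
    (98144, "Initial"),
    (101306, "Match"),
]


def feature_type(index):
    """Return the feature type whose index block contains `index`."""
    if index < 0:
        raise ValueError("index must be non-negative")
    for upper, name in TYPE_BOUNDS:
        if index <= upper:
            return name
    raise ValueError("index is outside the feature vector")


def to_dense(sparse, length):
    """Expand a {index: value} mapping into a dense list of floats."""
    dense = [0.0] * length
    for index, value in sparse.items():
        dense[index] = value
    return dense


if __name__ == "__main__":
    for index, value in sorted(sam_features.items()):
        print(index, feature_type(index), value)
```

Keeping the block boundaries in a single list makes it straightforward to check that each non-zero element falls into the feature type the table assigns to it.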