Research Article

Using Morphological Data in Language Modeling for Serbian Large Vocabulary Speech Recognition

Table 4

Example of a feature vector used in the RNNLM experiments for the word sam (Eng. am).

Index | Feature type | Feature | Value | Remarks
----- | ------------ | ------- | ----- | -------
0 | Constant | Constant | 0.01 | For math reasons
1–5 | Special | Special word feat. | 0 | For bos/eos/unk/brk/silence
6 | Unigram | Unigram prob. | 0.00788 | Scaled unigram log-prob.
7 | Length | Word length | 0.00186 | Scaled word length
8–36 | Word | 1-hot vect. elem. | 0 |
37 | Word | 1-hot vect. elem. | 0.21 | Scale based on unigram prob.
38–97644 | Word | 1-hot vect. elem. | 0 |
97645–97657 | Final | Lett. n-gram prob. | 0 |
97658 | Final | 3-gram -am$ pr. | 0.12 | Scaled letter 3-gram prob.
97659–97758 | Final | Lett. n-gram prob. | 0 |
97759 | Final | 2-gram -m$ pr. | 0.047 | Scaled letter 2-gram prob.
97760–97869 | Final | Lett. n-gram prob. | 0 |
97870–98050 | Initial | Lett. n-gram prob. | 0 |
98051 | Initial | 2-gram ^s- pr. | 0.03 | Scaled letter 2-gram prob.
98052 | Initial | 3-gram ^sa- pr. | 0.069 | Scaled letter 3-gram prob.
98053–98144 | Initial | Lett. n-gram prob. | 0 |
98145–98300 | Match | Lett. n-gram prob. | 0 |
98301 | Match | 2-gram -am- pr. | 0.064 | Scaled letter 2-gram prob.
98302–100451 | Match | Lett. n-gram prob. | 0 |
100452 | Match | 2-gram -sa- pr. | 0.057 | Scaled letter 2-gram prob.
100453–100459 | Match | Lett. n-gram prob. | 0 |
100460 | Match | 3-gram -sam- pr. | 0.11 | Scaled letter 3-gram prob.
100461–101306 | Match | Lett. n-gram prob. | 0 |
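The index layout of Table 4 can be illustrated with a short sketch that assembles the same sparse vector for sam. It is a minimal illustration under stated assumptions, not the authors' implementation: the block boundaries are read off the table; the letter n-gram extraction rule (orders 2 and 3, boundary markers ^ and $ counted towards n, "match" n-grams taken from anywhere in the word) is inferred from this single example; the names LAYOUT, letter_ngrams, global_index and build_feature_vector, as well as the per-block inventory positions, are hypothetical; and the scaled values are passed in precomputed because the scaling functions are not specified in the table.

```python
from typing import Dict, List, Tuple

# Index ranges (inclusive) of each feature block, read off Table 4.
LAYOUT: Dict[str, Tuple[int, int]] = {
    "constant": (0, 0),
    "special": (1, 5),          # bos/eos/unk/brk/silence (not set for ordinary words)
    "unigram": (6, 6),
    "length": (7, 7),
    "word": (8, 97644),         # 1-hot word identity
    "final": (97645, 97869),    # word-final letter n-grams
    "initial": (97870, 98144),  # word-initial letter n-grams
    "match": (98145, 101306),   # letter n-grams found anywhere in the word
}
DIM = LAYOUT["match"][1] + 1  # total number of features


def global_index(block: str, local: int) -> int:
    """Map a position inside a block to its index in the full feature vector."""
    first, last = LAYOUT[block]
    if not 0 <= local <= last - first:
        raise IndexError(f"position {local} is outside block '{block}'")
    return first + local


def letter_ngrams(word: str, orders: Tuple[int, ...] = (2, 3)) -> Dict[str, List[str]]:
    """Initial, final and word-internal letter n-grams.

    Assumption inferred from the 'sam' example: ^ and $ mark the word
    boundaries and count towards n, so '^sa' and 'am$' are 3-grams.
    """
    return {
        "initial": ["^" + word[: n - 1] for n in orders if n - 1 <= len(word)],
        "final": [word[len(word) - (n - 1):] + "$" for n in orders if n - 1 <= len(word)],
        "match": [word[i:i + n] for n in orders for i in range(len(word) - n + 1)],
    }


def build_feature_vector(
    word: str,
    word_pos: int,
    scaled: Dict[str, float],
    ngram_feats: Dict[str, Dict[str, Tuple[int, float]]],
    constant: float = 0.01,
) -> Dict[int, float]:
    """Return a sparse vector {index: value}; every index not present is 0.

    scaled: precomputed 'unigram', 'length' and 'word' values.
    ngram_feats: hypothetical inventory, block -> n-gram -> (position in block,
    scaled probability); n-grams missing from it contribute no feature.
    """
    vec = {
        global_index("constant", 0): constant,
        global_index("unigram", 0): scaled["unigram"],
        global_index("length", 0): scaled["length"],
        global_index("word", word_pos): scaled["word"],
    }
    for block, grams in letter_ngrams(word).items():
        for g in grams:
            if g in ngram_feats.get(block, {}):
                pos, value = ngram_feats[block][g]
                vec[global_index(block, pos)] = value
    return vec


# Reproduce the non-zero entries of Table 4 for "sam"; positions and values
# are taken from the table, the scaling that produced them is not modelled.
sam_ngrams = {
    "final": {"am$": (13, 0.12), "m$": (114, 0.047)},
    "initial": {"^s": (181, 0.03), "^sa": (182, 0.069)},
    "match": {"am": (156, 0.064), "sa": (2307, 0.057), "sam": (2315, 0.11)},
}
vec = build_feature_vector(
    "sam", 29, {"unigram": 0.00788, "length": 0.00186, "word": 0.21}, sam_ngrams
)
for index in sorted(vec):
    print(index, vec[index])
```

Storing only the non-zero entries keeps the vector for sam at 11 entries instead of 101,307; a dense list of length DIM can be built from the dictionary if a toolkit expects one.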