8 – HMMs in Speech Recognition

You learned the basics of hidden Markov models in an earlier lesson. To recap, HMMs are useful for detecting patterns through time. This is exactly what we are trying to do with an acoustic model. HMMs can solve the challenge we identified earlier: time variability. For instance, recall my earlier example of "speech" spoken quickly versus "speech" drawn out: the same word, but spoken at different speeds. We could train an HMM with labeled time series sequences to create an individual HMM for each particular sound unit. The units could be phonemes, syllables, words, or even groups of words.

Training and recognition are fairly straightforward if our training and test data are isolated units. We have many examples of each word, we train on them, and we get a model for each word. Then recognition of a single word comes down to scoring the likelihood of the new observation under each model and picking the best-scoring one.

It gets more complicated when our training data consists of continuous phrases or sentences, which we'll refer to as utterances. How can the series of phonemes or words be separated in training? In this example, we have the word brick, connected continuously in nine different utterance combinations. To train from continuous utterances, HMMs can be tied together as pairs. We define these connectors as HMMs as well. In this case, we would train her brick, my brick, a brick, brick house, brick walkway, and brick wall by tying the connecting states together.

This will increase dimensionality. Not only will we need an HMM for each word, we will also need one for each possible word connection, which could be a lot if there are a lot of words. The same principle applies if we use phonemes. But for large vocabularies, the dimensionality increase isn't as profound as with words. With a set of 40 phonemes, we need 40 × 40 = 1600 HMMs to account for the transitions. Still a manageable number. Once trained, the HMMs can be used to score new utterances through chains of probable paths.
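To make the isolated-word case concrete, here is a minimal sketch of scoring an observation against one HMM per word and picking the most likely model. It is an illustration under loud assumptions, not the method from the lesson: the acoustic features are pretended to be already quantized into two discrete symbols (0 and 1), the two word models and all their probabilities are made up by hand rather than trained, and the names `forward_log_likelihood`, `recognize`, and `connector_count` are hypothetical. The scoring uses the standard forward algorithm with per-step scaling.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) via the scaled forward algorithm.

    obs   : list of discrete symbol indices
    start : start[s] is the probability of starting in state s
    trans : trans[p][s] is the probability of moving from state p to s
    emit  : emit[s][k] is the probability of state s emitting symbol k
    """
    n_states = len(start)
    log_lik = 0.0
    alpha = []
    for t, symbol in enumerate(obs):
        if t == 0:
            # Initialisation: start probability times emission.
            alpha = [start[s] * emit[s][symbol] for s in range(n_states)]
        else:
            # Induction: sum over predecessor states, then emit.
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states))
                * emit[s][symbol]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # observation impossible under this model
        # Rescale to avoid underflow; the log of the scales sums to log P.
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


def recognize(obs, models):
    """Return the name of the model with the highest log-likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))


def connector_count(n_units):
    """Number of ordered unit pairs, e.g. 40 phonemes give 1600."""
    return n_units * n_units


# Hand-made (NOT trained) two-state, left-to-right toy models.
# Each entry is (start, trans, emit).
LEFT_TO_RIGHT = [[0.6, 0.4], [0.0, 1.0]]
MODELS = {
    # "brick": mostly symbol 0 first, then mostly symbol 1
    "brick": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.9, 0.1], [0.2, 0.8]]),
    # "house": mostly symbol 1 first, then mostly symbol 0
    "house": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.9], [0.8, 0.2]]),
}

if __name__ == "__main__":
    fast = [0, 0, 1, 1]
    slow = [0, 0, 0, 0, 1, 1, 1, 1]  # same pattern at half the speed
    print(recognize(fast, MODELS), recognize(slow, MODELS))
    print(connector_count(40))
```

Because the self-transitions let a state absorb any number of repeated frames, the fast and slow versions of the same pattern both score best under the same toy model, which is the time-variability property described above. In practice the models would be trained from labeled examples and would emit continuous feature vectors rather than two symbols.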
