So far, we have tools for addressing noise and speech variability through our feature extraction. We have HMMs that can convert those features into phonemes and handle the sequencing problems for our full acoustic model. We haven't yet solved the problem of language ambiguity, though. The ASR system can't tell from the acoustic model alone which combinations of words are most reasonable. That requires knowledge of the language itself. We either need to provide that knowledge to the model or give it a mechanism to learn this contextual information on its own. We'll talk about possible solutions to these problems next.
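To make the ambiguity concrete, here is a minimal sketch (not from the text) of a toy bigram language model scoring two acoustically similar word sequences. The counts, vocabulary size, and candidate phrases are made up purely for illustration; the point is only that ranking word combinations requires knowledge learned from text, not from audio.

```python
# A toy bigram language model with add-one smoothing.
# All counts below are hypothetical, as if estimated from a text corpus.
import math
from collections import defaultdict

bigram_counts = {
    ("<s>", "recognize"): 40, ("recognize", "speech"): 30,
    ("<s>", "wreck"): 5, ("wreck", "a"): 4,
    ("a", "nice"): 200, ("nice", "beach"): 25,
}

# Unigram counts for the history words, derived from the bigram table.
unigram_counts = defaultdict(int)
for (w1, _), count in bigram_counts.items():
    unigram_counts[w1] += count

def log_prob(words, smoothing=1.0, vocab_size=10_000):
    """Add-one smoothed bigram log-probability of a word sequence."""
    score = 0.0
    for w1, w2 in zip(["<s>"] + words, words):
        c_bigram = bigram_counts.get((w1, w2), 0)
        c_history = unigram_counts.get(w1, 0)
        score += math.log((c_bigram + smoothing) /
                          (c_history + smoothing * vocab_size))
    return score

# Two candidates the acoustic model alone can barely tell apart.
for candidate in (["recognize", "speech"], ["wreck", "a", "nice", "beach"]):
    print(" ".join(candidate), round(log_prob(candidate), 2))
```

A real system would combine scores like these with the acoustic model's scores during decoding, but even this sketch shows the idea: the language side supplies the "which word sequence is plausible" knowledge that the acoustic side cannot.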