Presentation On Speech Recognition
Proposed by: Aditya Sharma, Computer Science and Engineering, Final Year, Roll No: 1005210005
Problem Statement
Given a speech sample uttered by a user, the system detects voice activity, extracts the significant voice segment, and converts it into a text message according to the language specification and model.
This text message can then be used to send commands to the system or as input to an expert system.
Basic Challenges
Robustness: graceful degradation, not catastrophic failure.
Portability: independence of computing platform.
Adaptability: to changing conditions (different mic, background noise, new speaker, new task domain, even a new language).
Language Modelling: is there a role for linguistics in improving the language models?
Confidence Measures: better methods to evaluate the absolute correctness of hypotheses.
Out-of-Vocabulary (OOV) Words: systems must have some method of detecting OOV words and dealing with them in a sensible way.
Spontaneous Speech: disfluencies (filled pauses, false starts, hesitations, ungrammatical constructions, etc.) remain a problem.
Prosody: stress, intonation, and rhythm convey important information for word recognition and the user's intentions (e.g., sarcasm, anger).
Accent, dialect and mixed language: non-native speech is a huge problem, especially where code-switching is commonplace.
[Figure: timeline by year, 1967 to 2011. Labels: Language Model, Word Lexicon, Isolated Words, Dialog Systems, Robust Systems]
Software Modules
Preprocessing: voice activity detection, input noise cancelling, pre-emphasis
Feature Extraction
Post-processing
Observations for HMM-based classification, weight function, normalization
O = {O1, O2, O3, ..., On}
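The preprocessing steps can be illustrated with a minimal sketch. The slides do not specify the algorithms used, so this assumes a simple energy-threshold voice activity detector and a first-order pre-emphasis filter; the function names, the frame length of 160 samples, the threshold of 0.01, and the coefficient 0.97 are illustrative choices, not values from the presentation.

```python
import math

def pre_emphasis(samples, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1]; the first sample passes through."""
    if not samples:
        return []
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out

def frame_energies(samples, frame_len):
    """Mean squared amplitude of consecutive non-overlapping frames."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def detect_voice(samples, frame_len=160, threshold=0.01):
    """Return one boolean per frame: True where the energy exceeds the threshold."""
    return [e > threshold for e in frame_energies(samples, frame_len)]

# Synthetic input (assumed 8 kHz sampling): silence, a 200 Hz tone, silence.
RATE = 8000
silence = [0.0] * 480
tone = [0.5 * math.sin(2 * math.pi * 200 * n / RATE) for n in range(480)]
signal = silence + tone + silence

flags = detect_voice(signal)        # which frames contain "voice"
emphasized = pre_emphasis(signal)   # high-frequency boost before feature extraction
print(flags)
```

A real system would likely use overlapping frames and an adaptive threshold; the fixed threshold here only keeps the example short.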
By computing the probability of the observation sequence under each HMM, we can decide which HMM most probably generated the sequence.
There are several ideas about what to model:
Isolated word recognition (an HMM for each word)
Monophone acoustic model (an HMM for each phone)
Triphone acoustic model (an HMM for each three-phone sequence)
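Choosing the most probable HMM for an observation sequence can be sketched with the forward algorithm. This is a toy example under stated assumptions: discrete observation symbols (0 and 1) instead of acoustic feature vectors, two hypothetical word models named "yes" and "no", and hand-picked probabilities. None of these values come from the presentation.

```python
def forward_likelihood(obs, start, trans, emit):
    """Return P(obs | HMM) using the forward algorithm for a discrete-output HMM."""
    n_states = len(start)
    # Initialisation: probability of starting in each state and emitting obs[0].
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    # Induction: propagate through the transitions, then emit the next symbol.
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][o]
            for s in range(n_states)
        ]
    # Termination: sum over all final states.
    return sum(alpha)

def classify(obs, models):
    """Score obs against every model and return the best name with all scores."""
    scores = {name: forward_likelihood(obs, *params) for name, params in models.items()}
    return max(scores, key=scores.get), scores

# Two hypothetical 2-state left-to-right word models: (start, transitions, emissions).
left_to_right = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "yes": ([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.1, 0.9]]),
    "no":  ([1.0, 0.0], left_to_right, [[0.1, 0.9], [0.9, 0.1]]),
}

best, scores = classify([0, 0, 1, 1], models)
print(best, scores)
```

For long sequences the probabilities underflow, so practical recognisers usually work with log probabilities or scaled forward variables.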
HMM of a Triphone
Language model
HMM Limitations
Data intensive
Computationally intensive
3 states per triphone
3 Gaussian mixtures for each state
262 trillion trigrams
2-20 phonemes per word in a 64k vocabulary
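The figures above can be checked with a short calculation. This sketch assumes "64k" means 64,000 words and that the trigram count is the number of possible ordered word triples; the Gaussian count per triphone is simply states times mixture components. The constant names are illustrative.

```python
VOCAB_SIZE = 64_000           # assumption: "64k vocabulary" read as 64,000 words
STATES_PER_TRIPHONE = 3
MIXTURES_PER_STATE = 3

# Every ordered triple of vocabulary words is a possible trigram.
trigram_count = VOCAB_SIZE ** 3

# Each state carries its own mixture, so one triphone HMM holds this many Gaussians.
gaussians_per_triphone = STATES_PER_TRIPHONE * MIXTURES_PER_STATE

print(f"{trigram_count:,} possible trigrams")
print(f"{gaussians_per_triphone} Gaussians per triphone HMM")
```

The first figure, roughly 262 trillion, shows why trigram language models cannot be estimated by counting alone and depend on large corpora and smoothing.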