Speech Recognition Seminar

SPEECH RECOGNITION

Seminar By: Suraj Vitthal Gaikwad
Guided By: Prof. S. R. Lahane
07-Feb-2013

Outline

 Introduction
 Speech Recognition Process
 Types Of Speech Recognition Systems
 Algorithms
 Applications
 Advantages & Disadvantages
 Future Scope
 Conclusion

SPEECH RECOGNITION 07-Feb-13


Introduction

 Speech recognition is the process by which a
computer (or any other type of machine) identifies
spoken words.
 Basically, it means talking to your computer and
having it correctly understand what you are saying.
 It is an alternative to traditional methods of
interacting with a computer.

Speech Recognition Process

 Signal Processing
 Convert the audio wave into a sequence of feature vectors
 Speech Recognition
 Decode the sequence of feature vectors into a sequence of words
 Semantic Interpretation
 Determine the meaning of the recognized words
 Dialog Management
 Correct the errors and help get the task done
 Response Generation
 What words to use so as to maximize user understanding
 Speech Synthesis (Text to Speech)
 Generate synthetic speech from a ‘marked-up’ word string
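
As a rough illustration of the first stage above (signal processing), the following Python sketch cuts a waveform into overlapping frames and computes a small feature vector per frame. The function names, the 16 kHz sample rate, the 25 ms frame / 10 ms hop, and the two features (log energy and zero-crossing rate) are assumptions chosen for brevity and are not taken from the slides; practical recognizers typically use richer features such as MFCCs.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping frames of frame_len samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def frame_features(frame):
    """Return [log energy, zero-crossing rate] for one frame (illustrative features)."""
    energy = sum(s * s for s in frame) / len(frame)
    log_energy = math.log(energy + 1e-10)  # small offset avoids log(0) on silence
    crossings = sum(1 for x, y in zip(frame, frame[1:]) if (x >= 0) != (y >= 0))
    return [log_energy, crossings / (len(frame) - 1)]

def extract_features(samples, frame_len=400, hop=160):
    """Convert an audio wave into a sequence of feature vectors, one per frame."""
    return [frame_features(f) for f in frame_signal(samples, frame_len, hop)]

# Synthetic input: one second of a 440 Hz tone sampled at an assumed 16 kHz.
samples = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
features = extract_features(samples)
print(len(features), "frames,", len(features[0]), "features per frame")
```

The resulting list of vectors is what the recognition stage would decode into words.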

[Figure: Typical Speech Recognition Process]


Types of Speech Recognition

 Isolated Words
 Single utterance at a time
 Connected Words
 Separate utterances together with a minimal pause
between them
 Continuous Speech
 Rehearsed speech or dictation
 Spontaneous Speech
 Natural speech
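
The difference between isolated and connected words comes down to how utterances are separated by pauses. A minimal sketch, assuming per-frame energy values are already available: the function below splits a recording into utterances wherever the energy stays below a threshold for a minimum number of frames. The function name, the threshold, and the pause length are hypothetical and would need tuning on real audio.

```python
def split_on_pauses(energies, threshold, min_pause):
    """Return (start, end) frame ranges of utterances, end exclusive.

    A pause is a run of at least min_pause consecutive frames whose
    energy is below threshold. Shorter dips stay inside the utterance.
    """
    segments = []
    start = None      # index of the first frame of the current utterance
    silent_run = 0    # number of consecutive low-energy frames seen so far
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_pause:
                segments.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        # Recording ended mid-utterance: drop any trailing low-energy frames.
        segments.append((start, len(energies) - silent_run))
    return segments

# Two utterances separated by a three-frame pause; the single low frame
# inside the second utterance is too short to count as a pause.
energies = [0, 0, 5, 6, 5, 0, 0, 0, 7, 8, 0, 9, 0, 0]
print(split_on_pauses(energies, threshold=1, min_pause=2))
```

A small min_pause models connected words; continuous and spontaneous speech generally cannot be segmented this way because words run together without reliable pauses.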

Algorithms

 Dynamic Time Warping
 An algorithm for measuring similarity between two
sequences which may vary in time or speed.
 Hidden Markov Models
 Neural Networks
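
Dynamic time warping can be sketched in a few lines of Python. This is a minimal version under simplifying assumptions: the sequences are lists of single numbers rather than feature vectors, the local cost is the absolute difference, and no warping-window constraint is applied. The function name is illustrative.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a[i-1] repeats against b[j-1]
                                     cost[i][j - 1],      # b[j-1] repeats against a[i-1]
                                     cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]

# The same pattern spoken "slower" (one value held longer) still matches exactly.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
print(dtw_distance([0, 0], [1, 1]))           # 2.0
```

In an isolated-word recognizer, the input would be compared against a stored template for each vocabulary word and the word with the smallest distance chosen.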

Hidden Markov Model

 In an HMM, the state is not directly visible, but the
output, which depends on the state, is visible.
 Each state has a probability distribution over the
possible output tokens. Therefore, the sequence of
tokens generated by an HMM gives some
information about the sequence of states.

x — states
y — possible observations
a — state transition probabilities
b — output probabilities
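
To show how an observed token sequence gives information about the hidden states, here is a minimal Viterbi decoding sketch in Python using the x, y, a, b notation above. The two-state weather model and all of its probabilities are a common textbook toy example, not part of the slides; the function name and the start_p parameter (initial state probabilities) are likewise assumptions.

```python
def viterbi(y, x, start_p, a, b):
    """Return the most likely state sequence for observations y, and its probability.

    x: list of states, a[r][s]: transition probability r -> s,
    b[s][o]: probability that state s emits observation o.
    """
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * b[s][y[0]], None) for s in x}]
    for obs in y[1:]:
        prev = best[-1]
        column = {}
        for s in x:
            prob, source = max((prev[r][0] * a[r][s], r) for r in x)
            column[s] = (prob * b[s][obs], source)
        best.append(column)
    # Backtrack from the most probable final state.
    last = max(x, key=lambda s: best[-1][s][0])
    prob = best[-1][last][0]
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, prob

# Hypothetical toy model: hidden weather states, observed activities.
x = ["Rainy", "Sunny"]
start_p = {"Rainy": 0.6, "Sunny": 0.4}
a = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
     "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
b = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
     "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

path, prob = viterbi(["walk", "shop", "clean"], x, start_p, a, b)
print(path)  # ['Sunny', 'Rainy', 'Rainy']
```

In speech recognition the hidden states would typically correspond to parts of phonemes or words and the observations to acoustic feature vectors, with probabilities kept in the log domain to avoid underflow on long sequences.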

[Figure: HMM Example]


Neural Network

 A neural network consists of an interconnected group
of artificial neurons, and it processes information using
a connectionist approach to computation.
 An NN is typically defined by three types of
parameters:
 The interconnection pattern between different layers of
neurons
 The learning process for updating the weights of the
interconnections
 The activation function that converts a neuron's weighted
input to its output activation.
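
The three ingredients listed above can be made concrete with a very small Python sketch. It assumes a plain feed-forward pattern (a list of layers, each a list of neurons), a sigmoid activation function, and a simple delta-rule update that trains a single neuron; all names and settings are illustrative, and real speech recognizers use far larger networks trained with backpropagation.

```python
import math

def sigmoid(z):
    """Activation function: squashes a weighted input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(layers, inputs):
    """Interconnection pattern: each layer feeds the next.

    layers is a list of layers; each layer is a list of (weights, bias) neurons.
    """
    activations = inputs
    for layer in layers:
        activations = [
            sigmoid(sum(w * v for w, v in zip(weights, activations)) + bias)
            for weights, bias in layer
        ]
    return activations

def train_single_neuron(samples, epochs=2000, rate=0.5):
    """Learning process: delta-rule updates for a network of one sigmoid neuron."""
    weights, bias = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            output = forward([[(weights, bias)]], inputs)[0]
            error = target - output
            weights = [w + rate * error * v for w, v in zip(weights, inputs)]
            bias += rate * error
    return [[(weights, bias)]]

# Toy task (an assumption for illustration): learn the logical OR function.
or_samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
network = train_single_neuron(or_samples)
print([round(forward(network, inputs)[0]) for inputs, _ in or_samples])
```

The same forward function works unchanged for deeper networks, since adding a layer only extends the list that describes the interconnection pattern.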

Speech Recognition Software

 Open source
 Julius
 Macintosh
 Dragon Dictate
 Mobile Devices / Smartphones
 Google Now
 Siri
 Micromax AISHA (Artificial Intelligence Speech Handset Assistant)
 S Voice
 Iris (Intelligent Rival Imitator of Siri)
 Windows
 Dragon NaturallySpeaking
 Windows Speech Recognition

Applications

 Games and Edutainment
 Data Entry
 Document Editing
 Speaker Identification/Verification
 Automation at Call Centers
 Medical/Disabilities
 Fighter Aircraft

Advantages

 Increases Productivity
 Can help with menial computer tasks
 Can help people with disabilities
 Cost Effective
 Diminishes Spelling Mistakes

Disadvantages

 Inaccuracy & Slowness
 Vocal Strain
 Adaptability
 Out-of-Vocabulary (OOV) Words
 Spontaneous Speech
 Accent, Dialect and Mixed Language, etc.

Future Scope

 Achieving efficient speaker-independent word
recognition.
 Speech recognition systems (SRS) may gain the ability
to distinguish nuances of speech and meanings of words.
 Standalone speech recognition systems.
 Wearable speech recognition systems.
 Voice interaction with all devices.

Conclusion

 Within five years, speech recognition technology
will become so pervasive in our daily lives that
service environments lacking this technology will be
considered inferior.
 Speech recognition will revolutionize the way
people interact with smart devices and will
ultimately differentiate upcoming technologies.

ANY QUESTIONS?
