Speech Recognition Seminar

SPEECH RECOGNITION
Seminar By: Suraj Vitthal Gaikwad

Guided By: Prof. S. R. Lahane
07-Feb-2013
Outline
• Introduction
• Speech Recognition Process
• Types Of Speech Recognition Systems
• Algorithms
• Applications
• Advantages & Disadvantages
• Future Scope
• Conclusion
SPEECH RECOGNITION 07-Feb-13

Introduction
• Speech recognition is the process by which a computer (or any other type of machine) identifies spoken words.
• Put simply, it means talking to your computer and having it correctly understand what you are saying.
• It is an alternative to traditional methods of interacting with a computer.


Speech Recognition Process
• Signal Processing
  - Convert the audio wave into a sequence of feature vectors
• Speech Recognition
  - Decode the sequence of feature vectors into a sequence of words
• Semantic Interpretation
  - Determine the meaning of the recognized words
• Dialog Management
  - Correct errors and help get the task done
• Response Generation
  - Choose the words that maximize user understanding
• Speech Synthesis (Text to Speech)
  - Generate synthetic speech from a ‘marked-up’ word string
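
The signal processing step above can be illustrated with a minimal sketch. It assumes 16 kHz audio cut into 25 ms frames with a 10 ms hop, and it uses log energy and zero-crossing rate as simple stand-ins for the richer features (such as MFCCs) that real recognizers typically use. The function name `frame_features` and all parameter values are hypothetical choices for illustration, not part of the seminar.

```python
import math

def frame_features(samples, frame_len=400, hop=160):
    """Cut samples into overlapping frames and return one
    [log_energy, zero_crossing_rate] feature vector per frame."""
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # Average power of the frame; the small constant avoids log(0).
        energy = sum(s * s for s in frame) / frame_len
        log_energy = math.log(energy + 1e-10)
        # Fraction of adjacent sample pairs whose sign differs.
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        zcr = crossings / (frame_len - 1)
        features.append([log_energy, zcr])
    return features

# Assumed test signal: one second of a 440 Hz tone sampled at 16 kHz.
rate = 16000
tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
feats = frame_features(tone)
print(len(feats), feats[0])
```

Each inner list is one feature vector; the sequence `feats` is what the recognition step would then decode into words.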

[Figure: Typical Speech Recognition Process]

Types of Speech Recognition
• Isolated Words
  - A single utterance at a time
• Connected Words
  - Separate utterances run together with a minimal pause between them
• Continuous Speech
  - Rehearsed speech or dictation
• Spontaneous Speech
  - Natural speech

Algorithms
• Dynamic Time Warping
  - An algorithm for measuring similarity between two sequences that may vary in time or speed
• Hidden Markov Models
• Neural Networks
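
Dynamic time warping, the first algorithm listed above, can be sketched in a few lines. This is a minimal textbook version, assuming one-dimensional sequences and absolute difference as the local cost; a recognizer would compare sequences of feature vectors instead. The name `dtw_distance` and the sample sequences are hypothetical.

```python
def dtw_distance(a, b):
    """Return the dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the best of: step in a only, step in b only, step in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

slow = [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]   # the same shape, spoken slowly
fast = [0, 1, 2, 1, 0]                  # the same shape, spoken quickly
other = [2, 2, 0, 0, 2]                 # a different shape

print(dtw_distance(slow, fast))   # the two tempos align with zero cost
print(dtw_distance(fast, other))  # the different shape has a positive cost
```

The point of the example is that `slow` and `fast` differ only in speed, so the warped alignment matches them perfectly, while a plain element-by-element comparison could not even be applied to sequences of different lengths.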

Hidden Markov Model
• In an HMM, the state is not directly visible, but the output, which depends on the state, is visible.
• Each state has a probability distribution over the possible output tokens. Therefore, the sequence of tokens generated by an HMM gives some information about the sequence of states.
x: states
y: possible observations
a: state transition probabilities
b: output probabilities
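
One way to recover the most likely state sequence from the observed tokens is the Viterbi algorithm. The sketch below reuses the notation above (`x`, `y`, `a`, `b`) and adds an assumed initial distribution `start`; the weather states and all probability values are a hypothetical toy model, not data from the seminar.

```python
def viterbi(y, x, start, a, b):
    """Return (most likely state path, its probability) for observations y."""
    # prob[t][s]: probability of the best path that ends in state s at time t.
    prob = [{s: start[s] * b[s][y[0]] for s in x}]
    back = [{}]
    for t in range(1, len(y)):
        prob.append({})
        back.append({})
        for s in x:
            # Best predecessor state for s at time t.
            prev = max(x, key=lambda r: prob[t - 1][r] * a[r][s])
            prob[t][s] = prob[t - 1][prev] * a[prev][s] * b[s][y[t]]
            back[t][s] = prev
    # Trace back from the most probable final state.
    last = max(x, key=lambda s: prob[-1][s])
    path = [last]
    for t in range(len(y) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, prob[-1][last]

# Hypothetical toy model: hidden weather, visible activity.
x = ["Rainy", "Sunny"]
start = {"Rainy": 0.6, "Sunny": 0.4}
a = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
     "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
b = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
     "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

path, p = viterbi(["walk", "shop", "clean"], x, start, a, b)
print(path, p)
```

In speech recognition the hidden states would correspond to units such as phonemes and the observations to feature vectors, but the decoding idea is the same.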

[Figure: HMM Example]

Neural Network
• A neural network consists of an interconnected group of artificial neurons, and it processes information using a connectionist approach to computation.
• A neural network is typically defined by three types of parameters:
  - The interconnection pattern between the different layers of neurons
  - The learning process for updating the weights of the interconnections
  - The activation function that converts a neuron's weighted input to its output activation
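
The three ingredients above can be seen in the smallest possible case: a single neuron with two inputs. In this sketch the interconnection pattern is just two weighted inputs plus a bias, the activation function is assumed to be a sigmoid, and the learning process is assumed to be per-sample gradient descent on the log loss. The AND-gate data, learning rate, and epoch count are hypothetical choices for illustration.

```python
import math

def sigmoid(z):
    """Activation function: squashes the weighted input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, inputs):
    """Weighted sum of the inputs passed through the activation function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

def train(samples, epochs=2000, rate=1.0):
    """Learning process: nudge each weight against the prediction error."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            # For a sigmoid output with log loss, the gradient with
            # respect to the weighted input is (prediction - target).
            error = predict(weights, bias, inputs) - target
            weights = [w - rate * error * x for w, x in zip(weights, inputs)]
            bias -= rate * error
    return weights, bias

and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train(and_gate)
outputs = [round(predict(weights, bias, x)) for x, _ in and_gate]
print(outputs)
```

Networks used for speech stack many such neurons into layers, but each one still combines weighted inputs, applies an activation function, and has its weights adjusted by a learning rule.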

Speech Recognition Software
• Open source
  - Julius
• Macintosh
  - Dragon Dictate
• Mobile Devices/Smartphones
  - Google Now
  - Siri
  - Micromax AISHA (Artificial Intelligence Speech Handset Assistant)
  - S Voice
  - Iris (Intelligent Rival Imitator of Siri)
• Windows
  - Dragon NaturallySpeaking
  - Windows Speech Recognition

Applications
• Games and Edutainment
• Data Entry
• Document Editing
• Speaker Identification/Verification
• Automation at Call Centers
• Medical/Disabilities
• Fighter Aircraft

Advantages
• Increases Productivity
• Can help with menial computer tasks
• Can help people with disabilities
• Cost Effective
• Diminishes Spelling Mistakes

Disadvantages
• Inaccuracy & Slowness
• Vocal Strain
• Adaptability
• Out-of-Vocabulary (OOV) Words
• Accent, Dialect and Mixed Language
• Spontaneous Speech, etc.

Future Scope
• Achieving efficient speaker-independent word recognition.
• Speech recognition systems (SRS) may gain the ability to distinguish nuances of speech and meanings of words.
• Standalone speech recognition systems.
• Wearable speech recognition systems.
• Talking with all devices.

Conclusion
• Within five years, speech recognition technology will become so pervasive in our daily lives that service environments lacking this technology will be considered inferior.
• Speech recognition will revolutionize the way people interact with smart devices and will, ultimately, differentiate upcoming technologies.

ANY QUESTIONS…??
