Sign Language To Text Convertor - Synopsis
BACHELOR OF TECHNOLOGY
Submitted by:
JUL-DEC-2022
Abstract
People rely on communication to interact with one another. Specially abled people, i.e., those with a speech or hearing disorder (“mute” and “deaf” respectively), always depend on some form of visual communication. Sign language is well established among them and they use it to express themselves. However, people without hearing or speech disabilities often face difficulties and cannot communicate with specially abled people due to a lack of sign language education. To achieve two-way communication between specially abled people and the general public, there is a need to build a system that can interpret gestures into text and speech. Vision-based hand gesture recognition is an important part of human-computer interaction, and gesture recognition technology can help us build a framework that interprets sign language gestures into text and speech. Hand gestures, which can represent a notion through unique shapes and finger positions, therefore have considerable scope for human-machine interaction. The major steps involved in designing the system are tracking, segmentation, gesture acquisition, feature extraction, gesture recognition and conversion into text. This project proposes a deep learning based model that detects and recognizes words from a person’s gestures, using LSTM networks (feedback-based learning models) to recognize signs from sign language gestures.
INTRODUCTION
People with a hearing impairment are often deprived of general communication, as they find it difficult to interact with others through gestures, since only a very few gestures are recognized by most people. Likewise, the general public finds it difficult to understand the sign language used by most specially abled people. To make communication between specially abled people and the general public effective, there is a need to develop a system through which both can understand each other.
The persistent problem with present sign language recognition systems is that they rely on static gesture recognition, which is slow and inconvenient for speech-impaired people to use. Existing systems use a single gesture for a single character, which takes a long time and makes the system inefficient. In addition, most of the available solutions are either very expensive or business oriented, and therefore out of reach of the general public.
The aim of this project is to develop a virtual talking system that uses a camera sensor for people who need it; this is achieved through image processing of human hand gesture input. It mainly helps people who are deaf-mute.
• Implementing a single gesture for a word and a single gesture for a phrase, so that signing feels more natural.
• The problem statement revolves around the idea of a camera-based sign language recognition system, so that the product will be cost efficient.
• The objective of this project is to design a solution that is intuitive, simple and user friendly.
• Communication is not difficult for the general public; it should be the same for speech-impaired individuals.
SOLUTION DOMAIN
We know that traditional gesture recognition uses image classification techniques such as CNNs. A convolutional neural network, or CNN, is a deep learning neural network designed for processing structured arrays of data such as images. CNNs are widely used in computer vision and have become the state of the art for many visual applications such as image classification, and most present sign language recognition systems use them. However, a CNN can fail to recognize objects when images are blurry or conditions are otherwise unsuitable, and it likewise struggles to recognize signs when those conditions are not met. For accurate sign language recognition we need a system that focuses on the gestures and their motion rather than on the surroundings. To improve on this, we use the MediaPipe Holistic pipeline. MediaPipe Holistic integrates separate models for the pose, face and hand components, each of which is optimized for its particular domain. Because of their different specializations, the input to one component is not well suited to the others, so MediaPipe Holistic uses a multi-stage pipeline that treats the different regions at a region-appropriate image resolution. It focuses on the gestures and their motion and produces time-series data.
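As a rough illustration, the holistic pipeline can be driven from Python roughly as follows (a minimal sketch, assuming the mediapipe and opencv-python packages and MediaPipe's mp.solutions.holistic interface; the variable names are only illustrative):

import cv2
import mediapipe as mp

mp_holistic = mp.solutions.holistic

cap = cv2.VideoCapture(0)  # open the default webcam
with mp_holistic.Holistic(min_detection_confidence=0.5,
                          min_tracking_confidence=0.5) as holistic:
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        # MediaPipe expects RGB input, while OpenCV delivers BGR frames
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        results = holistic.process(rgb)
        # Each frame yields separate landmark sets:
        # results.pose_landmarks, results.face_landmarks,
        # results.left_hand_landmarks, results.right_hand_landmarks
        cv2.imshow('Sign Language Feed', frame)
        if cv2.waitKey(10) & 0xFF == ord('q'):  # press 'q' to stop
            break
cap.release()
cv2.destroyAllWindows()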
Now that we have generated time-series data, we need a model that can be trained on such data, and an LSTM does that for us.
LSTM: Long Short-Term Memory is a kind of recurrent neural network (RNN) that can retain information for a long period of time. It is used for processing, predicting and classifying time-series data. Unlike a CNN, an LSTM is generally trained on a sequence of data, i.e., a sequence of frames, which makes it well suited to recognizing time-series data.
We start by collecting key points from MediaPipe Holistic, i.e., the landmarks on our hands, body and face, gather a bunch of such frames, and save the data in the form of NumPy arrays. We can vary the number of sequences according to our needs.
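A minimal sketch of how these keypoints might be flattened and stored (the landmark counts follow MediaPipe Holistic's standard outputs of 33 pose, 468 face and 21 landmarks per hand; the file path and the 30-frame sequence length are only illustrative choices):

import numpy as np

def extract_keypoints(results):
    # Flatten each landmark set into one 1-D vector; substitute zeros when a part is not detected
    pose = (np.array([[lm.x, lm.y, lm.z, lm.visibility] for lm in results.pose_landmarks.landmark]).flatten()
            if results.pose_landmarks else np.zeros(33 * 4))
    face = (np.array([[lm.x, lm.y, lm.z] for lm in results.face_landmarks.landmark]).flatten()
            if results.face_landmarks else np.zeros(468 * 3))
    left = (np.array([[lm.x, lm.y, lm.z] for lm in results.left_hand_landmarks.landmark]).flatten()
            if results.left_hand_landmarks else np.zeros(21 * 3))
    right = (np.array([[lm.x, lm.y, lm.z] for lm in results.right_hand_landmarks.landmark]).flatten()
             if results.right_hand_landmarks else np.zeros(21 * 3))
    return np.concatenate([pose, face, left, right])  # 1662 values per frame

# One gesture sample is a fixed-length sequence of frames, e.g. 30 keypoint vectors:
# keypoints = extract_keypoints(results)
# np.save('data/hello/0/12.npy', keypoints)  # illustrative path: action/sequence_no/frame_no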
● We then build an LSTM model and train it on our stored data, which lets us detect an action from a sequence of frames.
● The number of epochs for training is chosen by us; increasing the number of epochs generally improves accuracy, but the time taken to train the model also increases and the model may start to overfit the gesture data.
● Once training is done, we can use this model for real-time hand gesture detection and simultaneously convert the gesture to text using OpenCV, as outlined in the sketch after this list.
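A minimal sketch of such a model and the real-time loop, assuming TensorFlow/Keras, 30-frame sequences of 1662-value keypoint vectors, and a small illustrative action vocabulary (the layer sizes and the names X_train, y_train, sequence and extract_keypoints are assumptions, not fixed choices of this project):

import cv2
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

actions = np.array(['hello', 'thanks', 'iloveyou'])  # example gesture vocabulary

# Stacked LSTM classifier over sequences of 30 frames x 1662 keypoint values
model = Sequential([
    LSTM(64, return_sequences=True, activation='relu', input_shape=(30, 1662)),
    LSTM(128, return_sequences=True, activation='relu'),
    LSTM(64, return_sequences=False, activation='relu'),
    Dense(64, activation='relu'),
    Dense(32, activation='relu'),
    Dense(actions.shape[0], activation='softmax'),
])
model.compile(optimizer='Adam', loss='categorical_crossentropy',
              metrics=['categorical_accuracy'])
# model.fit(X_train, y_train, epochs=200)  # X_train: (samples, 30, 1662), y_train: one-hot labels

# Real-time use: keep a sliding window of the last 30 keypoint vectors and
# overlay the predicted word on the OpenCV frame:
# sequence.append(extract_keypoints(results))
# if len(sequence) >= 30:
#     probs = model.predict(np.expand_dims(sequence[-30:], axis=0))[0]
#     word = actions[np.argmax(probs)]
#     cv2.putText(frame, word, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)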
SYSTEM DOMAIN
Hardware Requirements
● CPU: Intel Core i5 (10th gen) / AMD Ryzen 3600X or higher
Software Requirements
1. OpenCV: OpenCV (Open Source Computer Vision) is an open-source library of programming functions used for real-time computer vision. It is mainly used for image processing, video capture and analysis, with features such as face and object recognition. It is written in C++, which is also its primary interface. We will be using the video module, which covers video-analysis concepts such as motion estimation, background subtraction and object tracking.
2. Tkinter/Kivy/PyQt: GUI libraries for Python that provide a fast and easy way to create GUI applications.
3. NumPy: NumPy is a Python library used for working with arrays. It also has functions for working in the domains of linear algebra, Fourier transforms and matrices.
4. Matplotlib: Matplotlib is a low-level graph plotting library in Python that serves as a visualization utility.
5. Scikit-learn: Scikit-learn (sklearn) is one of the most useful and robust libraries for machine learning in Python. It provides a selection of efficient tools for machine learning and statistical modeling, including classification, regression, clustering and dimensionality reduction, via a consistent interface. The library is largely written in Python and is built upon NumPy, SciPy and Matplotlib.
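For instance, scikit-learn's train_test_split can be used to split the saved keypoint sequences into training and testing sets, and its metrics can evaluate the trained model (a minimal sketch with illustrative stand-in data; in the project, X and y would come from the saved keypoint arrays and their labels):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix

# Illustrative stand-ins for the real dataset
X = np.random.rand(90, 30, 1662)             # 90 sequences of 30 frames x 1662 keypoint values
y = np.eye(3)[np.random.randint(0, 3, 90)]   # one-hot labels for 3 example actions

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.05)

# After training the LSTM model described in the Solution Domain:
# y_pred = np.argmax(model.predict(X_test), axis=1)
# y_true = np.argmax(y_test, axis=1)
# print(accuracy_score(y_true, y_pred))
# print(confusion_matrix(y_true, y_pred))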
EXPECTED OUTCOME