
Major Project Synopsis

on

Sign Language to Text Convertor


In partial fulfillment of requirements for the degree

of
BACHELOR OF TECHNOLOGY

IN

COMPUTER SCIENCE & ENGINEERING

Submitted by:

AMAN BIND [19100BTCSEMA05472]


AAYUSH INGOLE [19100BTCSEMA05469]
GLADWIN KURIAN [19100BTCSEMA05484]
YASH GOSWAMI [19100BTCSEMA05507]

Under the guidance of


PROF. BHARTI AHUJA

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

SHRI VAISHNAV INSTITUTE OF INFORMATION TECHNOLOGY

SHRI VAISHNAV VIDYAPEETH VISHWAVIDYALAYA, INDORE

JUL-DEC-2022
Abstract

People depend on communication to interact with one another. Specially abled people, i.e. those with a speech or hearing disorder ("mute" and "deaf" respectively), always depend on some form of visual communication. Sign language is well received among them and they use it to express themselves. However, people without such disabilities often face difficulties and cannot communicate with specially abled people due to a lack of sign language education. To achieve two-way communication between specially abled people and the general public, there is a need to build a system that can interpret gestures into text and speech. Vision-based hand gesture recognition is an important part of human-computer interaction, and such technology can help us build a framework that interprets sign language gestures into text and speech. Hand gestures, which can represent a notion through distinct shapes and finger positions, offer considerable scope for human-machine interaction. The major steps involved in designing the system are tracking, segmentation, gesture acquisition, feature extraction, gesture recognition and conversion into text. This project proposes a deep learning-based model that detects and recognizes words from a person's gestures, using an LSTM (a feedback-based recurrent model) to recognize signs from sign language gestures.
INTRODUCTION

A crucial application of gesture recognition is sign language detection. Current technologies for gesture recognition can be divided into two types: sensor-based and vision-based. In sensor-based methods, a data glove or motion sensors are incorporated, from which gesture data can be extracted. Even minute details of a gesture can be captured by the data-capturing glove, which ultimately enhances the performance of the system. However, this method requires wearing a glove with embedded sensors, which makes it a bulky device to carry; it affects the signer's usual signing ability and reduces user comfort. Vision-based methods rely on image processing instead. This approach provides a comfortable experience to the user: the image is captured with a camera and no extra devices are needed. The method deals with attributes of the image, such as colour and texture, that are needed to interpret the gesture. Although the vision-based approach is straightforward, it presents many challenges, such as complex and cluttered backgrounds, variations in illumination, and the need to track other postures along with the hand.

Sign language provides a way for speech-impaired and hearing-impaired people to communicate with other people. Instead of a voice, sign language uses gestures to communicate. It is a standardized way of communication in which every word and letter is assigned a distinct gesture. It would be a win-win situation for both specially abled people and the general public if a system were developed that could convert sign language into text. Technology is advancing day after day, but no significant improvements have been made for the betterment of specially abled communities. About nine million people in the world are deaf and mute. Communication between specially abled people and the general public has always been challenging; sign language helps them communicate, but not everyone understands it, and this is where our system comes into the picture.
PROBLEM DOMAIN

People with a hearing impairment are usually deprived of general communication, as they find it difficult at times to interact with people through their gestures, since only a very few of those gestures are recognized by most people. The general public, in turn, finds it difficult to understand the sign language used by most specially abled people. To make communication between the specially abled and the general public effective, there is a need to develop a system in which both can understand each other.

The persistent problem in present sign language recognition is that the implemented systems rely on static gesture recognition, which is slow and not handy for speech-impaired people to use. Existing systems use a single gesture for a single character, which takes considerable time and makes the system inefficient.

Also, most of the available solutions are either very expensive or business oriented, and not accessible to the general public.

The aim of this project is to develop a virtual talking system using a camera sensor for people who need it; this is achieved through image processing of hand gesture input. It mainly helps people who are deaf-mute.
• Implement a single gesture for a word and a single gesture for a phrase, so that communication feels more natural.
• The problem statement revolves around a camera-based sign language recognition system, so the product will be cost efficient.
• The objective of this project is to design a solution that is intuitive, simple and user friendly.
• Communication is not difficult for the rest of society; it should be the same for speech-impaired individuals.
SOLUTION DOMAIN

Traditional gesture recognition uses image classification techniques such as CNNs. A convolutional neural network (CNN) is a deep learning neural network designed for processing structured arrays of data such as images. CNNs are widely used in computer vision and have become the state of the art for many visual applications such as image classification, and most existing sign language recognition systems use them. However, a CNN struggles to recognize objects when images are blurry or conditions are otherwise not ideal, and it can fail to recognize signs when those ideal conditions are not met. For accurate sign language recognition we need a system that focuses on the gestures and their motion rather than the surroundings. To improve on this, we use the MediaPipe Holistic pipeline. MediaPipe Holistic integrates separate models for the pose, face and hand components, each optimized for its particular domain. Because of their different specializations, the input to one component is not well suited for the others, so MediaPipe Holistic uses a multi-stage pipeline that treats each region at a region-appropriate image resolution. It focuses on the gestures and their motion and produces time-series data; a sketch of its use is shown below.
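As a rough illustration (not the project's exact code), the following sketch shows how MediaPipe Holistic can be run on webcam frames with OpenCV; the confidence thresholds and window title are illustrative choices, and the `results` object exposes the pose, face and hand landmark groups that are used later for keypoint extraction.

```python
import cv2
import mediapipe as mp

mp_holistic = mp.solutions.holistic        # holistic landmark model (pose + face + hands)
mp_drawing = mp.solutions.drawing_utils    # helpers to render landmarks on the frame

cap = cv2.VideoCapture(0)                  # default webcam
with mp_holistic.Holistic(min_detection_confidence=0.5,
                          min_tracking_confidence=0.5) as holistic:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)   # MediaPipe expects RGB input
        results = holistic.process(rgb)                # pose/face/left-hand/right-hand landmarks

        # Draw the detected hand landmarks back onto the BGR frame for visual feedback.
        mp_drawing.draw_landmarks(frame, results.left_hand_landmarks,
                                  mp_holistic.HAND_CONNECTIONS)
        mp_drawing.draw_landmarks(frame, results.right_hand_landmarks,
                                  mp_holistic.HAND_CONNECTIONS)

        cv2.imshow("Holistic landmarks", frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):          # press 'q' to quit
            break
cap.release()
cv2.destroyAllWindows()
```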

Now that we have generated time-series data, we need a model that can be trained on such data, and an LSTM does that for us.
LSTM: Long Short-Term Memory is a kind of recurrent neural network (RNN) that can retain information over long periods of time. It is used for processing, predicting and classifying time-series data. Unlike a CNN, an LSTM is trained on sequences of data, i.e. sequences of frames, and it is well suited to recognizing patterns in such time-series data.

We start by collecting keypoints from MediaPipe Holistic, i.e. the landmarks on our hands, body and face, and save the data in the form of NumPy arrays; the number of sequences collected per gesture can be varied as needed. A sketch of this step is given below.
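A minimal sketch of the keypoint-flattening step, assuming the `results` object produced by MediaPipe Holistic above; the landmark counts (33 pose, 468 face, 21 per hand) are those reported by MediaPipe Holistic, and the file path is only a hypothetical example of how one frame of one recorded sequence might be stored.

```python
import numpy as np

def extract_keypoints(results):
    """Flatten the Holistic landmarks of one frame into a single vector.
    Missing components are zero-padded so every frame has the same length (1662 values)."""
    pose = (np.array([[lm.x, lm.y, lm.z, lm.visibility]
                      for lm in results.pose_landmarks.landmark]).flatten()
            if results.pose_landmarks else np.zeros(33 * 4))
    face = (np.array([[lm.x, lm.y, lm.z]
                      for lm in results.face_landmarks.landmark]).flatten()
            if results.face_landmarks else np.zeros(468 * 3))
    left = (np.array([[lm.x, lm.y, lm.z]
                      for lm in results.left_hand_landmarks.landmark]).flatten()
            if results.left_hand_landmarks else np.zeros(21 * 3))
    right = (np.array([[lm.x, lm.y, lm.z]
                       for lm in results.right_hand_landmarks.landmark]).flatten()
             if results.right_hand_landmarks else np.zeros(21 * 3))
    return np.concatenate([pose, face, left, right])

# Hypothetical storage layout: one .npy file per frame of each recorded sequence,
# e.g. data/<gesture>/<sequence_no>/<frame_no>.npy
# np.save('data/hello/0/12.npy', extract_keypoints(results))
```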
● We then build an LSTM model and train it on the stored data, which lets us detect an action from a sequence of frames (see the sketch after this list).
● The number of training epochs is chosen by us: more epochs can increase accuracy, but training takes longer and the model may start to overfit the gesture data.
● Once training is done, we can use this model for real-time hand gesture detection and simultaneously convert the recognized gesture to text using OpenCV.
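A minimal sketch, assuming 30-frame sequences of the 1662-value keypoint vectors above and a small set of example gesture labels; the labels, layer sizes and epoch count are illustrative assumptions rather than the project's final configuration.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

actions = np.array(['hello', 'thanks', 'iloveyou'])   # hypothetical gesture labels
SEQ_LEN, FEATURES = 30, 1662                          # frames per sample, keypoints per frame

# X: array of shape (num_samples, SEQ_LEN, FEATURES) built from the saved keypoint sequences
# y: one-hot encoded gesture label for each sample

model = Sequential([
    LSTM(64, return_sequences=True, activation='relu', input_shape=(SEQ_LEN, FEATURES)),
    LSTM(128, return_sequences=True, activation='relu'),
    LSTM(64, return_sequences=False, activation='relu'),
    Dense(64, activation='relu'),
    Dense(32, activation='relu'),
    Dense(actions.shape[0], activation='softmax'),    # one probability per gesture
])
model.compile(optimizer='Adam', loss='categorical_crossentropy',
              metrics=['categorical_accuracy'])
# model.fit(X, y, epochs=200)   # epoch count is tuned; too many epochs risks overfitting

# Real-time use: keep a rolling window of the latest SEQ_LEN frames' keypoints,
# predict on it, and overlay the most probable gesture as text on the OpenCV frame.
# probs = model.predict(np.expand_dims(window, axis=0))[0]
# predicted_word = actions[np.argmax(probs)]
```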
SYSTEM DOMAIN

Hardware Requirements
● CPU: Intel Core i5 (10th Gen)/AMD Ryzen 5 3600X or higher

● GPU: Nvidia GeForce GTX 1650/AMD Radeon RX 5300M or higher

● RAM: 8GB or higher

● Storage: 256 GB or higher (SSD preferred)

Software Requirements
1. OpenCV: OpenCV (Open Source Computer Vision) is an open-source library of
programming functions used for real-time computer vision. It is mainly used for
image processing, video capture and analysis, with features such as face and object
recognition. It is written in C++, which is its primary interface, although Python
bindings are also available. We will be using the video module, which covers video
analysis concepts such as motion estimation, background subtraction and object tracking.

2. Tkinter/Kivy/PyQt: GUI libraries for Python that provide a fast and easy way to
create GUI applications.

3. TensorFlow: TensorFlow is an open-source end-to-end platform for creating
machine learning applications. It is a symbolic math library that uses dataflow
and differentiable programming to perform various tasks focused on the training
and inference of deep neural networks. It allows developers to create machine
learning applications using various tools, libraries and community resources, and
it is widely used in machine learning.

4. MediaPipe: MediaPipe is a framework for building machine learning pipelines
for processing time-series data such as video and audio. For example, we feed a
stream of images (of hands, in our case) as input and get back the same images
with hand landmarks rendered on them.

5. Jupyter Notebook: Jupyter Notebook is a web application for creating and
sharing computational documents. It offers a simple, streamlined,
document-centric experience.

6. NumPy: NumPy is a Python library used for working with arrays. It also has
functions for working in the domains of linear algebra, Fourier transforms and
matrices.

7. Matplotlib: Matplotlib is a low-level graph plotting library in Python that serves
as a visualization utility.
8. Scikit-learn: Scikit-learn (sklearn) is a widely used and robust library for
machine learning in Python. It provides a selection of efficient tools for machine
learning and statistical modeling, including classification, regression, clustering
and dimensionality reduction, via a consistent Python interface. The library is
largely written in Python and is built upon NumPy, SciPy and Matplotlib.

9. IDE: VS Code/PyCharm/Sublime, etc.

10. OS: Windows 8 or higher / macOS 10.15 (Catalina) or higher / Ubuntu-based OS


APPLICATION DOMAIN

Some real-world applications of this project are:

● It can be used to provide live captions for online meetings.
● It can be used to detect mistakes in sign language gestures.
● It can be used for learning and practicing sign languages.
● Text generated by this application can be converted to speech for better
communication.
● Hand gestures can be used to control and automate other devices.

EXPECTED OUTCOME

● A user-friendly sign language recognition system that is easy to use.
● An overall better-performing sign recognition system than those currently in use.
● A sign language recognition system capable of recognizing word- and
sentence-level gestures.
● Complicated gestures involving motion, which existing static-gesture systems
cannot identify, are expected to be recognized.
● The LSTM-based implementation should not fail in the dark or when the
background is very similar to skin color.
● The system may fail to work for very fast (accelerated) motions.
