


International Journal for Modern Trends in Science and Technology, 8(01): 32-37, 2022
Copyright © 2022 International Journal for Modern Trends in Science and Technology
ISSN: 2455-3778 online
DOI: https://doi.org/10.46501/IJMTST0801006
Available online at: http://www.ijmtst.com/vol8issue01.html

Real Time Sign Language Detection

Aman Pathak | Avinash Kumar | Priyam | Priyanshu Gupta | Gunjan Chugh

Department of Information Technology, Dr. Akhilesh Das Gupta Institute of Technology and Management, New Delhi, India.

To Cite this Article


Aman Pathak, Avinash Kumar, Priyam, Priyanshu Gupta and Gunjan Chugh. Real Time Sign Language Detection.
International Journal for Modern Trends in Science and Technology 2022, 8(01), pp. 32-37.
https://doi.org/10.46501/IJMTST0801006

Article Info
Received: 24 November 2021; Accepted: 21 December 2021; Published: 31 December 2021

ABSTRACT
A real-time sign language detector is a significant step forward in improving communication between the deaf and the general population. We are pleased to present the creation and implementation of a sign language recognition model based on a Convolutional Neural Network (CNN). We utilized a pre-trained SSD MobileNet V2 architecture, trained on our own dataset, in order to apply transfer learning to the task. We developed a robust model that consistently classifies sign language in the majority of cases. Additionally, this strategy will be extremely beneficial to sign language learners as a way of practising sign language. Various human-computer interface methodologies for posture recognition were explored and assessed during the project. A series of image processing techniques with human movement classification was identified as the best approach. The system is able to recognize selected sign language signs with an accuracy of 70-80%, without a controlled background and in low light.

KEYWORDS: CNN, Pre-trained SSD MobileNet V2, Sign Language

1. INTRODUCTION

Sign language is largely used by the disabled, and there are few others who understand it, such as relatives, activists, and teachers at Sekolah Luar Biasa (SLB). Natural gestures and formal cues are the two types of sign language [1]. The natural cue is a manual (hand-based) expression agreed upon by its users (conventional), recognised to be limited to a particular group (esoteric), and a substitute for words used by a deaf person (as opposed to body language). A formal gesture is a cue that is established deliberately and has the same language structure as the community's spoken language [2].
More than 360 million people worldwide suffer from hearing and speech impairments [3]. Sign language detection is a project implementation for designing a model in which a web camera is used for capturing images of hand gestures, which is done with OpenCV. After capturing the images, the images must be labelled, and then the pre-trained SSD MobileNet V2 model is used for sign recognition. Thus, an effective path of communication can be developed between deaf and hearing audiences. Three steps must be completed in real time to solve our problem:
1. Obtaining footage of the user signing (input).
2. Classifying each frame in the video to a sign.
3. Reconstructing and displaying the most likely sign from the classification scores (output).



This topic poses a big difficulty in terms of computer vision because of a variety of factors, including:

 Environmental disturbance (e.g., lighting sensitivity, background, and camera position)
 Occlusion (e.g., some fingers, or an entire hand, can be out of the field of view)
 Sign boundary detection (when one sign ends and the next begins)

This model uses a pipeline that takes input through a web camera from a user who is signing a gesture and then, by extracting the individual frames of the video, generates a sign language probability for each gesture.

2. RELATED WORK

With the continuous development of information technology, the ways of interaction between computers and humans have also evolved. There has been a lot of work done in this field to help deaf and able-bodied people communicate more effectively. Because sign language is a collection of gestures and postures, any effort to recognise sign language falls under the purview of human-computer interaction [4].
Sign language detection can be categorised in two parts. The first category is the data-glove approach, in which the user wears a glove with electromechanical devices attached to digitise hand and finger motion into processable data. The disadvantage of this method is that the user must always wear extra gear, and the results are less accurate. In contrast, the second category, computer-vision-based approaches, requires only a camera, allowing for natural interaction between humans and computers without any additional devices.
Apart from the various developments in the ASL field, Indian researchers have started putting work into ISL, for example image key-point detection using SIFT, followed by comparing the key points of a new image to the key points of standard images for each alphabet in a database to classify the new image with the label of the closest match [5]. Similarly, various work has been put into recognising edges efficiently; one idea [6] was to use a combination of the colour data with bilateral filtering on the depth images to rectify edges.
With the advancement of deep learning and neural networks, people are also applying them to improve detection systems. In reference [7], ASL is recognised using a variety of feature extraction and machine learning techniques, including the histogram technique, the Hough transform, Otsu's segmentation algorithm, and a neural network.
Image processing is concerned with computer processing of images, which includes collecting, processing, analysing, and understanding the results obtained. Computer vision necessitates a combination of low-level image processing to improve image quality (e.g., removing noise and increasing contrast) and higher-level pattern recognition and image understanding to recognise features in the image.

3. REVIEW OF HAND GESTURE AND SIGN LANGUAGE RECOGNITION TECHNIQUES:

Sign language recognition uses methods such as identifying hand motion trajectories for distinct signs, and segmenting hands from the background to forecast signs and string them into sentences that are both semantically correct and meaningful. Furthermore, motion modelling, motion analysis, pattern identification, and machine learning are all issues in gesture recognition. SLR models use either handcrafted parameters or parameters that are not manually set. The model's ability to do the categorisation is influenced by the background and environment, such as the illumination in the room and the pace of the motions. Due to changes in viewpoint, the same gesture can look distinct in 2D space.
There are several approaches for recognising gestures, including sensor-based and vision-based systems. In the sensor-based approach, sensor-equipped devices capture numerous parameters such as the trajectory, location, and velocity of the hand. On the other hand, vision-based approaches are those in which images or video footage of the hand gestures are used [8]. The steps followed for achieving sign language recognition are:

 The camera used in the sign language recognition system: the proposed sign language recognition system is based on frames captured by a web camera


on a laptop or PC. Image processing is done using the OpenCV Python library.
 Capturing images: multiple images of the different sign language symbols were taken from various angles and in varying light conditions in order to achieve better accuracy through a large dataset.
 Segmentation: once the capturing part is done, a particular region containing the sign language symbol to be predicted is selected from the entire image. Bounding boxes are drawn around the sign to be detected; these boxes should be tight around the region of interest. Specific names were given to the hand gestures, which were then labelled; the LabelImg tool was used for the labelling.
 Selection of images for training and testing.
 Creating TF Records: record files were created from the multiple training and testing images.
 Classification: machine learning approaches can be classified as supervised or unsupervised. Supervised machine learning is a technique for teaching a system to detect patterns in incoming data so that it can predict future data. It uses a collection of known training data and applies it to labelled training data to infer a function [8].

4. DESIGN AND IMPLEMENTATION:

 Dataset: for this project, a user-defined dataset is used. It is a collection of over 2000 images, around 400 for each of its classes. The dataset contains a total of 5 symbols, i.e., Hello, Yes, No, I Love You, and Thank You, which is quite useful when dealing with the real-time application.

Fig. 1 Sign Symbols

5. ALGORITHM USED:

 Convolutional Neural Network:
A Convolutional Neural Network (ConvNet/CNN) is a deep learning system that can take an input picture and assign importance (learnable weights and biases) to various aspects/objects in the image, as well as differentiate between them. The amount of pre-processing required by a ConvNet is much lower than that required by other classification techniques: ConvNets can learn their filters/characteristics with adequate training, whereas simpler techniques need hand-engineered filters [9].
ConvNets are multilayer artificial neural networks designed to handle 2D or 3D data as input. Every layer in the network is made up of several planes that may be 2D or 3D, and each plane is made up of numerous independent neurons, where neurons in nearby layers are linked but neurons in the same layer are not [9].
A ConvNet can capture the spatial and temporal aspects of an image by applying appropriate filters. Furthermore, reducing the number of parameters involved and reusing weights lets the architecture fit the image collection better. ConvNet's major goal is to make image processing easier by extracting relevant characteristics from images while preserving the crucial information that is a must for making accurate predictions. This is highly useful for developing an architecture that is not only capable of collecting and learning characteristics but also capable of handling massive volumes of data [9].

 Overall architecture:
CNNs are made up of three different sorts of layers: convolutional layers, pooling layers, and fully-connected layers. A CNN architecture is generated when these layers are stacked. Fig. 2 depicts a simple CNN architecture for MNIST classification [10].
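The three layer types can be illustrated with a short model definition. The following Keras sketch is an illustrative MNIST-style classifier (not the SSD MobileNet V2 detector this paper actually uses) stacking convolutional, pooling, and fully-connected layers:

```python
from tensorflow.keras import layers, models

# A minimal CNN stacking the three layer types described above:
# convolutional layers, pooling layers, and fully-connected layers.
model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),          # MNIST-sized grayscale input
    layers.Conv2D(32, 3, activation="relu"),  # convolutional layer
    layers.MaxPooling2D(2),                   # pooling layer
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),     # fully-connected layer
    layers.Dense(10, activation="softmax"),   # 10 digit classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Stacking a few such blocks is exactly the "layered" construction the text describes; deeper detectors like SSD MobileNet V2 follow the same principle with many more layers.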



 Transfer learning:
Transfer learning describes a process in which a model that has been trained on one problem is applied in some way to a second, related problem. It is a deep learning technique in which a neural network model is first trained on a problem similar to the one being addressed and then applied to the problem at hand: using one or more layers from the learnt model, a new model is trained on the problem of interest [14].

 SSD MobileNet V2:
The MobileNet SSD model is a single-shot multibox detection (SSD) network that performs object detection by predicting bounding box coordinates and class probabilities directly from the image pixels. In contrast to standard residual models, its architecture is built on the notion of an inverted residual structure, in which the residual block's input and output are narrow bottleneck layers. In addition, nonlinearities in intermediate layers are reduced, and lightweight depthwise convolution is applied. The TensorFlow Object Detection API includes this model [15].

Fig. 2 Simple CNN architecture

6. TOOLS USED:

 TensorFlow: an open-source artificial intelligence package that builds models using data flow graphs. It enables developers to build large-scale neural networks with many layers. TensorFlow is mostly used for classification, perception, comprehension, discovery, prediction, and creation [11].
 Object Detection API: an open-source TensorFlow API to locate objects in an image and identify them.
 OpenCV: an open-source, highly optimised Python library targeted at tackling computer vision issues. It is primarily focused on real-time applications and provides computational efficiency for managing massive volumes of data [12]. It processes photos and videos to recognise objects, people, and even human handwriting.
 LabelImg: a graphical image annotation tool that labels the bounding boxes of objects in pictures [13].

Fig. 3 Label Image [13]

7. MODEL ANALYSIS AND RESULT:

The model was trained using the technique of transfer learning, with the pre-trained SSD MobileNet V2 model as the starting point.

 Result:

Table-1 Accuracy analysis

Images used to train   True Result   False Result   Accuracy (%)
50                     23            27             46
100                    52            48             52
200                    145           55             72.5
500                    432           68             86.4

Table-2 Sign Recognition

Gesture Name   Accuracy (%)
Yes            88.7
No             88.6
Thank You      84.1
Hello          91.0
I Love You     82.4
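Table-1's accuracy column is simply the share of correctly detected test images. A quick sanity check in Python, with the row values taken from the table above:

```python
def accuracy(true_results: int, false_results: int) -> float:
    """Accuracy (%) as reported in Table-1: correct detections
    divided by the total number of test results."""
    return 100.0 * true_results / (true_results + false_results)

# Rows of Table-1: (images used to train, true results, false results)
rows = [(50, 23, 27), (100, 52, 48), (200, 145, 55), (500, 432, 68)]
for n, t, f in rows:
    print(n, accuracy(t, f))  # prints 46.0, 52.0, 72.5, 86.4 for the four rows
```

The check confirms the reported trend: accuracy climbs from 46% to 86.4% as the training set grows from 50 to 500 images.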

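For completeness, inference with a fine-tuned detector exported via the TensorFlow Object Detection API can be sketched as below. This assumes the trained model has been exported as a SavedModel; the path and the five-gesture label map are hypothetical stand-ins, while the output keys (`detection_classes`, `detection_scores`) are those produced by the API's exported models:

```python
import numpy as np
import tensorflow as tf

# Hypothetical label map for the five gestures in this paper's dataset.
LABELS = {1: "Hello", 2: "Yes", 3: "No", 4: "I Love You", 5: "Thank You"}


def load_detector(saved_model_dir: str):
    """Load a detector exported with the TensorFlow Object Detection API
    (e.g. a fine-tuned SSD MobileNet V2) from its SavedModel directory."""
    return tf.saved_model.load(saved_model_dir)


def detect_top_sign(detect_fn, frame: np.ndarray):
    """Run one frame through the detector and return the best (label, score).
    The Object Detection API expects a batched uint8 image tensor and
    returns detections sorted by descending score."""
    inp = tf.convert_to_tensor(frame[np.newaxis, ...], dtype=tf.uint8)
    out = detect_fn(inp)
    cls = int(out["detection_classes"][0][0])
    score = float(out["detection_scores"][0][0])
    return LABELS.get(cls, "unknown"), score
```

In use, `load_detector("exported_model/saved_model")` would be called once at startup and `detect_top_sign` once per captured frame, matching the per-frame pipeline described in the introduction.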


Graph-1 Accuracy/Images

Fig-4 Real time Sign Detection

8. APPLICATION AND FUTURE SCOPE:

 Application:
o The dataset can easily be extended and customised according to the needs of the user, and can prove to be an important step towards reducing the communication gap for deaf and mute people.
o Using the sign detection model, meetings held at a global level can become easier for disabled people to follow, so that the value of their hard work is recognised.
o The model can be used by any person with a basic knowledge of technology, and is thus available to everyone.
o This model can be implemented at the elementary school level so that children at a very young age can get to know sign language.

 Future scope:
o The implementation of our model for other sign languages such as Indian Sign Language or American Sign Language.
o Further training of the neural network to recognise symbols more efficiently.
o Enhancement of the model to recognise expressions.

9. CONCLUSION:

The main purpose of the sign language detection system is to provide a feasible way of communication between hearing people and deaf or mute people by using hand gestures. The proposed system can be accessed using a webcam or any built-in camera that detects the signs and processes them for recognition.
From the results of the model, we can conclude that the proposed system can give accurate results under controlled light and intensity. Furthermore, custom gestures can easily be added, and more images taken at different angles and in more frames will improve the accuracy of the model. Thus, the model can easily be extended on a large scale by increasing the dataset.
The model has some limitations: environmental factors such as low light intensity and an uncontrolled background decrease the accuracy of the detection. Therefore, we will work next to overcome these flaws and also increase the dataset for more accurate results.

REFERENCES

[1] Martin D S 2003 Cognition, Education, and Deafness: Directions for Research and Instruction (Washington: Gallaudet University Press)
[2] McInnes J M and Treffry J A 1993 Deaf-blind Infants and Children: A Developmental Guide (Toronto: University of Toronto Press)
[3] http://www.who.int/mediacentre/factsheets/fs300/en/
[4] Harshith C, Karthik R Shastry, Manoj Ravindran, M.V.V.N.S. Srikanth, Naveen Lakshmikhanth, "Survey on various gesture recognition techniques for interfacing machines based on ambient intelligence", International Journal of Computer Science & Engineering Survey (IJCSES), Vol. 1, No. 2, November 2010
[5] Sakshi Goyal, Ishita Sharma, S. S., "Sign language recognition system for deaf and dumb people", International Journal of Engineering Research & Technology, 2(4), April 2013
[6] Chen L, Lin H, Li S (2012) "Depth image enhancement for Kinect using region growing and bilateral filter", in Proceedings of the 21st International Conference on Pattern Recognition (ICPR 2012), IEEE, pp 3070-3073
[7] Vaishali S. Kulkarni et al., "Appearance Based Recognition of American Sign Language Using Gesture Segmentation", International Journal on Computer Science and Engineering (IJCSE), 2010
[8] Cheok, M. J., Omar, Z., & Jaward, M. H. (2019). A review of hand gesture and sign language recognition techniques. International Journal of Machine Learning and Cybernetics, 10(1), 131-153
[9] Al-Saffar, A. A. M., Tao, H., & Talab, M. A. (2017, October). Review of deep convolution neural network in image classification. In 2017 International Conference on Radar,


Antenna, Microwave, Electronics, and Telecommunications
(ICRAMET) (pp. 26-31). IEEE.
[10] Keiron O'Shea, An Introduction to Convolutional Neural Networks (November 2015). ResearchGate
[11] https://www.exastax.com/deep-learning/top-five-use-cases-of-tensorflow/
[12] https://en.m.wikipedia.org/wiki/OpenCV
[13] https://github.com/tzutalin/labelImg
[14] Deep Learning, 2016, p. 538. https://www.amazon.com/Deep-Learning-Adaptive-Computation-
[15] https://machinethink.net/blog/mobilenet-v2/ by Matthijs Hollemans (22 April 2018)

