
Volume 6, Issue 6, June – 2021 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Si-Lang Translator with Image Processing


Ms. Livya George, Asst. Professor, Dept. of Computer Science and Engineering, SCET, Thrissur
Nimisha Elangikkal, Nina Joseph, Ronica Ross, Sharon Joy, UG Students, Dept. of Computer Science and Engineering, SCET, Thrissur

Abstract:- People having hearing and speaking disabilities have problems communicating with other people, which creates a gap between them. To avoid this problem, they use special gestures to express their thoughts. These gestures have different meanings and are collectively defined as "Sign Language". Sign language is very important for deaf and mute people because it is their primary means of communication, both with normal people and among themselves. It is most commonly used by people with speaking and hearing disorders to communicate. In this application, we present a simple structure for sign language recognition. Our system implements an application that detects predefined signs through hand gestures. For the detection of gestures, only a basic level of hardware, such as a camera and its interfacing, is needed. Our system is a comprehensive, user-friendly application built on Convolutional Neural Networks. The hand gestures are recognized in three main steps. First, the dataset is created by capturing images, and these images are preprocessed by resizing, masking, and converting RGB into grayscale images. Secondly, after creating the dataset, we train the system using the Convolutional Neural Network, and using the trained classifier model the given sign is recognized. Thus, the recognized sign is displayed. We expect that the overall method of this application may attract technophiles with an extensive introduction to the sector of automated gesture and sign language recognition, and may help future work in these fields.

Keywords:- Convolutional Neural Network; Preprocessing; Sign Language; ReLU.

I. INTRODUCTION

Around 20-45% of people all over the world suffer from hearing and speaking disabilities, and about eight per 20,000 of the global population become deaf and dumb before learning any language. This leads them to adopt their own country's sign language as their foremost means of communication. According to recent data of the World Federation of the Deaf, there are over 70 million sign language users in the world, with over 300 sign languages across the globe. Contrary to common belief, not all people who communicate with the help of sign languages are able to read and understand normal text as well as non-disabled people, which is due to differences in sentence syntax and grammar.

Sign Language is a means of visual communication that uses expressions, hand gestures, and body movements. Sign language is significant for people who suffer from difficulty with hearing or speech. Sign Language Recognition refers to the transformation of gestures into the words or alphabets of the normally spoken language of the signer's own locality. Thus, the transformation of sign language into words or alphabets can help to overcome the gap existing between impaired people and the rest of the people around the world.

A. PROBLEM DOMAIN
Existing systems deal with many problems. ASL alphabet recognition is a challenging task due to the difficulties in hand segmentation and the appearance variations among signers. Color-based systems also suffer from many challenges, such as complex backgrounds, hand segmentation, and large inter-class and intra-class variations. All of these mechanisms have a practical limitation, because costly extra hardware is necessary for acquiring the data for sign recognition.

The existing dynamic sign language recognition methods still have some drawbacks: difficulties in recognizing complex hand gestures, low recognition accuracy for most dynamic sign language recognition, and potential problems in training on larger video sequence data. Static sign language recognition struggles with the complexity and large variations of the vocabulary set in hand actions, so it may misinterpret some significant variations between signers. Dynamic sign language recognition also has challenges in dealing with the complexity of the sign activities of finger motion against a large-scale body background. Another difficulty is extracting the most discriminating features from images or videos. Besides, how to choose an appropriate classifier is also a critical factor for producing accurate recognition results.

Over a period of time, a variety of products capable of converting signs to text have appeared in the marketplace. For example, a wearable glove is able to translate Indian Sign Language, but such a device would need to recognize both the static and dynamic gestures in ISL. While standard neural networks can classify static gestures, they cannot be used for classifying dynamic gestures. In dynamic gestures, the reading at each time point depends on the previous readings, resulting in sequential data; since standard neural networks require that each reading be independent of the others, they cannot be used for the classification of sequential data. Understanding the exact context of the symbolic expressions of deaf and dumb people is a challenging job in real life unless it is properly specified.

B. PROPOSED SYSTEM
Communication is a big part of everyone's day-to-day life. People with the inability to speak use different modes to communicate with others; one such widely used method of communication is sign language. Developing sign language translation applications for deaf people can be very beneficial, as they will be able to communicate easily even with those who don't understand sign language. Our project aims at taking a basic step toward bridging the communication gap between normal people and dumb people through an easy sign language recognizer.

Sign language translation has always been an engrossing area in Machine Learning. Sign languages have multiple articulators, like the hands, the shoulders, or even parts of the face. Due to these facts, none of the sign language translation techniques can reach an accuracy of 100%, but the need for them makes this field a highly researched area at all times. Driven by the success of deep learning technology, such methods have proven to have higher recognition accuracy than traditional methods. From many surveys, it has been concluded that deep learning can be a better solution for many of the shortcomings of sign language detection. Considering several other factors, like convenience and cost-effectiveness, methods based on CNNs serve better for sign language interpretation.

Sign language translation based on CNN methods is basically about training the model with a sign language dataset and thus creating a classifier model that detects the signs. Different countries have various sign gestures. Sign gestures can be either hand gestures alone or a combination of facial expressions and hand gestures. It is difficult to segment the face and hand in cases where the background is noisy, and differentiating facial expressions is another big task. Hand gestures can be performed with either one hand or two hands; in some situations, dual-hand systems can create confusion and affect predictions. Signs are of two types: isolated sign language and continuous sign language. Hand tracking is a challenging task in the continuous model. Considering all this, in this project we use an isolated one-hand gesture recognition technique, and a new self-made gesture dataset is created for the proposed system. In this project, the recognition of hand gestures is done with a Convolutional Neural Network.

Using a CNN is well-liked due to three important factors:
• The features are learned directly by the CNN.
• CNNs enable building on pre-existing networks.
• CNNs produce highly accurate recognition results.

This approach gets images from a good-quality camera, and pre-processing steps are performed on the images. These images are given to a 2D convolutional network for feature extraction. Based on the extracted features, the Conv2D network is trained. Thus, we get a trained classifier model that distinguishes different sign gestures. Now the system can be used for detecting various signs shown by signers, and the respective word is displayed on the screen for each sign gesture. Detailed explanations of the system are given in the coming sections of this paper: Section II shows the overall block diagram of the proposed system and Section III deals with the detailed working of the system.

II. SYSTEM ARCHITECTURE

Fig. 1. Block diagram

In the training phase, sign images are captured and pre-processed. The output is given as the input for CNN training, and thus a classification model for sign recognition is generated. In the recognition phase, the user shows signs in front of the camera, from which the images are captured, pre-processed, and given to the trained classifier model. Thus, predictions are made and the result is displayed to the user on the monitor.

III. WORKING EXPLANATION

The overall process can be classified into three different phases:
A. Dataset creation phase
B. Training phase
C. Sign recognition phase

A. Dataset creation phase
Image capturing is an integral part of the image processing system. For this, we are using a good-quality webcam. First, we need to import all the modules needed for accessing the webcam and capturing gestures. We give a label for each sign and then the sign capturing starts. The captured images are given to a pre-processing step, which consists of resizing, masking, and color conversion. Separate image folders are created for training and testing. For each sign, we captured 400 images, of which the first 350 images are stored in the training set and the remaining 50 in the test set.
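A minimal sketch of how this capture-and-preprocessing step could be written with OpenCV is shown below. The folder layout (dataset/training_set and dataset/test_set), the HSV thresholds used for the mask, and the file naming are illustrative assumptions rather than the exact values used in the project; the 400-image capture and the 350/50 split follow the description above.

```python
# Sketch of the dataset creation phase: capture webcam frames for one sign label,
# preprocess them (resize, mask, grayscale) and split them 350/50 into
# training and test folders. Paths and mask thresholds are assumptions.
import os
import cv2

IMG_SIZE = (64, 64)      # resolution fed to the CNN in this paper
IMAGES_PER_SIGN = 400    # 350 for training, 50 for testing
TRAIN_SPLIT = 350

def preprocess(frame):
    """Resize, mask, and convert a captured frame as described in the paper."""
    resized = cv2.resize(frame, IMG_SIZE)
    # Illustrative skin mask in HSV space; the thresholds are assumed.
    hsv = cv2.cvtColor(resized, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (0, 30, 60), (30, 150, 255))
    masked = cv2.bitwise_and(resized, resized, mask=mask)
    # RGB-to-grayscale conversion mentioned in the abstract.
    return cv2.cvtColor(masked, cv2.COLOR_BGR2GRAY)

def capture_sign(label, camera_index=0):
    train_dir = os.path.join("dataset", "training_set", label)
    test_dir = os.path.join("dataset", "test_set", label)
    os.makedirs(train_dir, exist_ok=True)
    os.makedirs(test_dir, exist_ok=True)

    cap = cv2.VideoCapture(camera_index)
    count = 0
    while count < IMAGES_PER_SIGN:
        ok, frame = cap.read()
        if not ok:
            continue
        image = preprocess(frame)
        folder = train_dir if count < TRAIN_SPLIT else test_dir
        cv2.imwrite(os.path.join(folder, f"{label}_{count}.png"), image)
        count += 1
        cv2.imshow("capture", image)
        if cv2.waitKey(1) & 0xFF == ord("q"):  # allow early exit
            break
    cap.release()
    cv2.destroyAllWindows()

if __name__ == "__main__":
    capture_sign("food")  # e.g. the "food" sign of Fig. 2
```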


Fig. 2. Data created for the sign "food"

Fig. 3. Recognition of the sign "food"

B. Training phase
After the dataset creation, we need to train a convolutional neural network model on the dataset. For this, we import the Keras libraries and packages. The procedure for building a convolutional neural network (CNN) involves four major steps:
Step 1: Convolution
Step 2: Pooling
Step 3: Flattening
Step 4: Full connection

• Convolution: This is the first layer, used to extract several features from the input images. We use a sequential neural network model that contains a sequence of layers. Since we are working with sign images, which are 2D arrays, we use a 2D convolution with 32 filters of size 3×3 on color images of 64×64 resolution. The activation function used here is the rectified linear unit (ReLU).
• Pooling: This is mainly done to downsize the images. We use max pooling with a window size of 2×2.
• Flattening: We need to convert the pooled images into a continuous vector form, so we use a flattening function. The images in two-dimensional array form are converted into one-dimensional vector form.
• Full connection: The result of the flattening step is given as the input to the fully connected layers. These layers are placed before the output layer, and the classification process takes place at this stage. There is a chance of overfitting on the training dataset: overfitting takes place when a model works so accurately on the training data that the performance of the system is reduced when it is tested with new data, so a dropout of 0.5 is used here. A SoftMax activation function is used in this case, as it involves multi-class classification.

The training dataset is given to the CNN model for training over 25 epochs with 800 steps per epoch, and the test dataset is then used for validation. Now the classifier model is ready to recognize different sign gestures.
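As a concrete illustration, a Keras sketch of this four-step network and its training run is given below. The Conv2D layer (32 filters of 3×3) on 64×64 input, ReLU activation, 2×2 max pooling, flattening, 0.5 dropout, softmax output, 25 epochs, and 800 steps per epoch follow the description above; the folder paths, the 128-unit dense layer, the Adam optimizer, and the class count of 10 are illustrative assumptions not stated in the paper.

```python
# Sketch of the training phase: convolution -> pooling -> flattening -> full connection.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.preprocessing.image import ImageDataGenerator

NUM_CLASSES = 10  # assumed number of sign labels in the self-made dataset

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(64, 64, 3)),  # Step 1: Convolution
    MaxPooling2D(pool_size=(2, 2)),                                  # Step 2: Pooling
    Flatten(),                                                       # Step 3: Flattening
    Dense(128, activation="relu"),        # Step 4: Full connection (width assumed)
    Dropout(0.5),                         # dropout of 0.5 against overfitting
    Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Stream the images from the folders created in the dataset creation phase.
datagen = ImageDataGenerator(rescale=1.0 / 255)
train_data = datagen.flow_from_directory(
    "dataset/training_set", target_size=(64, 64), class_mode="categorical")
test_data = datagen.flow_from_directory(
    "dataset/test_set", target_size=(64, 64), class_mode="categorical")

# 25 epochs with 800 steps per epoch, validated on the test set, as described above.
model.fit(train_data, steps_per_epoch=800, epochs=25, validation_data=test_data)
model.save("sign_classifier.h5")
```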
C. Sign recognition phase
Now the trained classifier model is loaded and images are captured live. These images are preprocessed and saved, and then given for prediction, where each image is converted to an array for comparison. The sign with the highest match is recognized, and the respective sign label is then returned.
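A minimal sketch of this recognition loop is shown below, assuming the sign_classifier.h5 model and the preprocessing choices of the earlier sketches and an illustrative label list; it captures live frames, preprocesses them, predicts the sign with the highest match, and overlays the label on the monitor as in Fig. 3.

```python
# Sketch of the sign recognition phase: load the trained classifier, capture live
# frames, preprocess them the same way as the dataset, and display the best match.
import cv2
import numpy as np
from tensorflow.keras.models import load_model

LABELS = ["food", "hello", "thanks"]  # illustrative subset of the self-made dataset
model = load_model("sign_classifier.h5")

def preprocess(frame):
    """Same resize/mask/grayscale steps assumed in the dataset creation sketch."""
    resized = cv2.resize(frame, (64, 64))
    hsv = cv2.cvtColor(resized, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (0, 30, 60), (30, 150, 255))
    gray = cv2.cvtColor(cv2.bitwise_and(resized, resized, mask=mask), cv2.COLOR_BGR2GRAY)
    return cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)  # back to 3 channels for the CNN

def predict_sign(frame):
    """Convert one preprocessed frame to an array and return the best-matching label."""
    array = np.expand_dims(preprocess(frame).astype("float32") / 255.0, axis=0)
    scores = model.predict(array, verbose=0)[0]
    return LABELS[int(np.argmax(scores))]

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    label = predict_sign(frame)
    cv2.putText(frame, label, (10, 40), cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 255, 0), 2)
    cv2.imshow("Si-Lang Translator", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```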

IV. HARDWARE REQUIREMENTS

• Processor: Intel Core i5
• RAM: 8 GB
• Hard Disk: 500 GB

V. SOFTWARE REQUIREMENTS

• Python
Python is a general-purpose programming language that may be used for a variety of tasks, including back-end development, software development, and writing system scripts. Python 3.9 was used in this project.

• OpenCV
OpenCV (Open Source Computer Vision Library) is an open-source computer vision and machine learning software library that provides a common infrastructure for computer vision.

• TensorFlow
TensorFlow is a free, open-source software library that is used in machine learning. It is a math library based on dataflow and differentiable programming.

• Keras
Keras is an open-source software library that gives a Python interface for the TensorFlow library.
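As a quick sanity check of this software stack, the short snippet below simply imports the libraries and prints their versions; only Python 3.9 is stated in the paper, so the remaining versions are whatever happens to be installed.

```python
# Verify that the software requirements listed above are available.
import sys
import cv2
import tensorflow as tf
from tensorflow import keras

print("Python     :", sys.version.split()[0])  # 3.9 was used in this project
print("OpenCV     :", cv2.__version__)
print("TensorFlow :", tf.__version__)
print("Keras      :", keras.__version__)
```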
VI. CONCLUSION

People with speaking and hearing disabilities face much indignation and discouragement that limit their ability to do day-to-day tasks. It has been shown that deaf and dumb people, especially impaired children, have a greater chance of behavioral and emotional disorders because of the way others discriminate against them. This causes people with such disabilities to become introverted and to resist social connectivity and face-to-face communication. Being unable to communicate with family and friends can affect their self-esteem, and this leads to the isolation of deaf and dumb people from society. Due to this, they lack social interaction, and communication skill is also a huge barrier for the deaf and dumb. With this application, we have tried to conquer some of the major problems faced by disabled persons in terms of talking, namely that people on the other side are not able to understand what these persons are trying to say or what message they want to convey.

Our application will help those who want to learn and communicate in sign languages. With it, a person can quickly pick up various gestures and their meanings as per the predefined signs, and can quickly learn which gesture to use for each sign. A user need not be literate: if they know the action of the gesture, they can quickly form it and the appropriate assigned character will be shown on the screen. These predefined gestures are recognized by training the system with a CNN. Firstly, a dataset is created and the images of the gestures are captured.

These captured images are preprocessed. About 400 images are taken for each gesture, of which 350 images are for the training set and the remaining 50 are for the test set. After forming the dataset, training for recognizing the gestures is done through the Convolutional Neural Network. In the CNN, the process runs through mainly four layers: 1. convolution, 2. pooling, 3. flattening, and 4. full connection. After training on the gestures, the trained classifier model is loaded. Images captured live undergo resizing and masking and are then given for prediction, where they are converted to an array. Thus, the sign with the highest match is recognized and, finally, the output is displayed. As a result, the suggested algorithm is robust. Experiments have proven that the procedure works and that it produces the desired result, as well as a proper alert.

ACKNOWLEDGMENT

We want to offer our heartfelt gratitude and thanks to everyone who helped us make this initiative a huge success. We thank almighty God for all the benefits he has granted us. Prof. Livya George, our guide, deserves huge thanks. We also want to thank our Principal, Dr. Nixon Kuruvila, and our Department Head, Dr. M Rajeswari, for their unwavering support throughout the project.
