
Volume 7, Issue 5, May – 2022 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Hand Sign: An Incentive-Based on Object Recognition and Detection
Vikas and Rahul Mandal
Computer Science and Engineering,
Lovely Professional University, Punjab

Abstract:- The use of physical controllers such as the mouse and keyboard for HCI impedes natural interaction, as it places a solid boundary between the user and the computer. Hence, different strategies such as speech, joint movement, and hand-sign techniques have been developed to make interaction more natural and appealing. Over the last few years, hand gesture recognition has been viewed as an easy and natural procedure for human-machine communication. It is a method of communicating with computers using static and dynamic movements and helps us recognize messages through them. Numerous applications have been developed and upgraded for hand sign recognition, ranging from cell phones to cutting-edge robotics and from gaming to clinical science. In most commercial and research applications, recognition of hand signs has been performed either with sensor-based wired gloves or with vision-based methods where skin tones, chemicals, or markers are applied to the hand. However, it is desirable to have hand sign recognition techniques that work on a natural, bare hand. Today, data from various researchers is available to experiment with hand sign recognition. We have used TensorFlow, OpenCV, and Jupyter Notebook to develop a Sign Recognition System in which we trained our model on various sign languages and alphabets. We used an object detection technique to build this system, where our webcam captures the input data and trains the system in a virtual environment. Accuracy depends on speed: the higher the speed, the lower the accuracy, and vice versa. To advance real-time applications using different hand signs, we chose a vision-based hand gesture recognition system that depends on various shape features.

Keywords:- Human-Computer Interaction, Data Gloves, Optical Markers, Image-Based Technologies, Vision-Based Recognition System, OpenCV, Jupyter Notebook, Tensorflow.

I. INTRODUCTION

Gesture-based communication is a method of communication that uses visual cues such as facial expressions, hand movement, and body motion to convey meaning. It is a non-verbal mode of communication. This method is a boon for deaf and mute people, as it helps them convey their message without difficulty to anyone who does not know sign language. Gesture-based communication is incredibly useful for individuals who have trouble hearing or speaking. Communication through sign language refers to the conversion of hand motions into words or letters of spoken languages. In this way, the conversion of sign language into words by a robust algorithm or model can help overcome barriers between individuals with hearing or speaking disabilities and the rest of the world.

A vision-based hand sign recognition system draws on computer vision and machine learning. As signing is one of the easiest ways for humans to interact, many researchers are working in this area with the aim of making Human-Computer Interaction (HCI) simpler and more affordable. Thus, the essential objective of sign recognition research is to build frameworks that can recognize this human mode of communication and use it, for instance, to pass on data. For that, vision-based hand sign interfaces require fast and extremely robust hand detection and gesture recognition in real time. Hand signs are a powerful human communication method with many possible applications, and in this context we have gesture-based language recognition, the specialized method for deaf and mute individuals. [1]

One of the primary objectives of hand sign recognition is to build frameworks that can recognize explicit signs and use them to pass data or to control a device. Apple Inc. has developed such a gesture control system in its ecosystem, where users can swipe images from one Apple product to another. There are essentially two kinds of approaches to hand gesture recognition: vision-based approaches, and electronic gloves that send data in the form of electrical signals but are expensive and architecturally complex. Therefore, a vision-based framework is used, as it is easy to access and manage. However, it can suffer from accuracy issues: since it uses light to capture the image, fluctuating light intensity can change the result. Due to its broad domain of access, it can be used not only by disabled people but also in entertainment such as gaming and animation, defense, traffic, the clinical domain, and much more. However, communication via hand signs is not standardized and can cause misinterpretation.

II. MOTIVATION

According to the World Health Organization, one out of every four individuals, amounting to 2.5 billion people across the globe, will suffer from mild-to-profound hearing loss by 2050. The criteria WHO defines for disabling hearing loss are >40 dB in adults and >30 dB in children. According to the report, hearing loss is caused by exposure to excessive noise, chronic ear infections, genetics, and aging.

IJISRT22MAY162 www.ijisrt.com 241


At present, 1.1 billion people aged between 12 and 35 years are at risk of hearing loss due to noise exposure. Sadly, only 17% of those people use any hearing aid. This is because a good hearing aid is a premium product, and also because of a lack of awareness among people. Hearing aid prices can range from 50 thousand to 3 lakh rupees, which is beyond the means of most of the population in India. Therefore, it is necessary to build a robust and affordable device that can reach the millions of people who need it. This can be achieved via ML and object detection techniques, which is why we took the initiative to work on it. [2]

III. LITERATURE REVIEW

Understanding the existing systems plays a crucial part in any research paper. It provides deeper knowledge and forms the foundation for optimizing existing systems. Therefore, we have conducted a literature review on hand gestures and their existing algorithms from various sources.
 TensorFlow is an AI framework developed by Google that operates at large scale and in heterogeneous environments. It uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It can map the nodes of a dataflow graph across many machines in a cluster, and within a machine across numerous computational devices, including multicore CPUs, general-purpose GPUs, and custom-designed ASICs known as Tensor Processing Units (TPUs). This architecture gives flexibility to the application developer: whereas in previous "parameter server" designs the management of shared state is built into the framework, TensorFlow empowers developers to experiment with novel optimizations and training algorithms. TensorFlow supports a variety of applications, with a focus on training and inference on deep neural networks. [3]
 The aim of image processing is to help the computer understand the content of an image. OpenCV is a library of programming functions primarily used for image processing. Its interface is written in C++ to keep processing fast. It provides a de facto standard API for computer vision applications and can tackle real-time problems using image processing. [4]
 Because of object detection's close relationship with video analysis and image understanding, it has drawn many researchers in recent years. Conventional object detection techniques are based on handcrafted features and shallow trainable structures. Their performance easily deteriorates when building complex ensembles that combine low-level image features with high-level context from object detectors and scene classifiers. With the rapid development of deep learning, more powerful tools, able to learn semantic, high-level, deeper features, have been introduced to address the problems in traditional architectures. These models differ in network design, training strategy, optimization function, and so on. In this paper, the authors give a review of deep learning-based object detection frameworks. The review centers on typical generic object detection models along with some modifications and useful tricks to further improve detection performance. As distinct detection tasks exhibit different characteristics, they also briefly survey several specific tasks, including salient object detection, face detection, and pedestrian detection. Experimental analyses are provided to compare different techniques and draw some meaningful conclusions. Finally, several promising directions and tasks are given to serve as guidelines for future work in both object detection and relevant neural network-based learning systems. [5]
 In the journal paper "Appearance Based Recognition of American Sign Language using Gesture Segmentation", Kulkarni recognized static postures of American Sign Language with the help of a neural network algorithm. For the 26 alphabetic characters, she used 8 samples each, 5 for training and 3 for testing, in MATLAB, which resulted in 92.78% accuracy. She used the histogram technique and the Hough algorithm to extract features from HSV color images with a uniform background. [6]
 In the journal paper "A New Approach to Hand Tracking and Gesture Recognition by a New Feature Type and HMM", Pham et al. introduced a system for Vietnamese Sign Language with a vocabulary of 29 gestures. The system consists of 3 modules: real-time hand tracking, gesture training, and pseudo-2-D hidden Markov models. They used the Tower method and skin color to track the hand region. [7]

IV. IMPLEMENTATION OF THE PROPOSED WORK

A. Image Workflow and Data Examination:
This paper covers automatic recognition of fingerspelling in Indian Sign Language. Here, the sign acts as input to the framework, and several steps are performed on the input sign image. First, a segmentation phase is performed based on skin tone to detect the sign region. The detected area is then converted into a binary image. Afterward, the Euclidean distance transform is applied to the binary image. Row and column projections are computed on the distance-transformed image, and Hu's moments are used alongside them. [8]

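The segmentation, distance-transform, and projection steps summarized above [8] can be sketched in a few lines. This is an illustrative sketch only: the two-pass chamfer recursion below is a city-block approximation of the true Euclidean distance transform, and the tiny binary mask is a hypothetical stand-in for a real segmented hand image.

```python
def distance_transform(binary):
    """Two-pass chamfer approximation of the Euclidean distance transform:
    each foreground pixel gets its (city-block) distance to the background."""
    rows, cols = len(binary), len(binary[0])
    inf = rows + cols
    dist = [[inf if binary[r][c] else 0 for c in range(cols)] for r in range(rows)]
    for r in range(rows):                      # forward pass: top/left neighbours
        for c in range(cols):
            if r > 0:
                dist[r][c] = min(dist[r][c], dist[r - 1][c] + 1)
            if c > 0:
                dist[r][c] = min(dist[r][c], dist[r][c - 1] + 1)
    for r in range(rows - 1, -1, -1):          # backward pass: bottom/right neighbours
        for c in range(cols - 1, -1, -1):
            if r < rows - 1:
                dist[r][c] = min(dist[r][c], dist[r + 1][c] + 1)
            if c < cols - 1:
                dist[r][c] = min(dist[r][c], dist[r][c + 1] + 1)
    return dist

# Hypothetical binary mask after skin-tone segmentation (1 = hand pixel).
mask = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]

dist = distance_transform(mask)
row_proj = [sum(row) for row in dist]          # row projection
col_proj = [sum(col) for col in zip(*dist)]    # column projection
```

The row and column projections (together with Hu's moments, omitted here) then serve as the shape features fed to the recognizer.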
IJISRT22MAY162 www.ijisrt.com 242


Volume 7, Issue 5, May – 2022 International Journal of Innovative Science and Research Technology
ISSN No:-2456-2165

Fig. 1: English alphabet in Indian Sign Language, approved by the Indian Sign Language Research and Training Center (ISLRTC)
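Classes such as the alphabet signs in Fig. 1 are tied to integer ids through a label map before training. Below is a minimal sketch of rendering one in the pbtxt format used by the TensorFlow Object Detection API; the four class names are hypothetical examples, not our full training set.

```python
def make_label_map(class_names):
    """Render a TensorFlow Object Detection label map (pbtxt).
    Ids start at 1 because id 0 is reserved for the background class."""
    items = []
    for idx, name in enumerate(class_names, start=1):
        items.append("item {\n  id: %d\n  name: '%s'\n}\n" % (idx, name))
    return "\n".join(items)

# Hypothetical subset of sign classes; in practice this string is saved
# as label_map.pbtxt and referenced from the pipeline config.
label_map = make_label_map(["hello", "thanks", "yes", "no"])
```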

B. Build an Input Pipeline
The TensorFlow API enables us to build complex input pipelines from simple, reusable pieces. In our case, we used the pipeline of an image model (SSD MobileNet V2 FPNLite 320x320) [9] to collect the data from files in the operating system, apply a technique that adds 'noise' to the dataset, in this case to give each image individual record confidentiality (data perturbation), and merge the selected images into the TFRecord format, a binary file format for storing training data. TFRecord connects our image files and the annotation file, which we created with the help of a label map in our environment, and helps train our model with more efficient storage, fast input/output, and self-contained files. The pipeline for an image model involves a Feature Pyramid Network (FPN) [10], a feature extractor that takes a single-scale image of arbitrary size as input and outputs proportionally sized feature maps (in our case at 320x320 resolution) at multiple levels, in a fully convolutional fashion. The image model compresses the 640x480 webcam image to 320x320 in the preprocessing stage with the help of an image resizer, performs the detection, and reverts back in the post-processing stage. This process is the backbone of the convolution architecture from MobileNet V2 [9][11].

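The preprocessing resize described above (640x480 webcam frames down to the model's 320x320 input) can be sketched with nearest-neighbour sampling. This is an assumption-laden stand-in: the real pipeline uses the model's own image resizer, and the single-channel "frame" below is synthetic.

```python
def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D list-of-lists image."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]

# Synthetic single-channel 640x480 frame (row-major: 480 rows of 640 pixels).
frame = [[(r + c) % 256 for c in range(640)] for r in range(480)]
small = resize_nearest(frame, 320, 320)
```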
Fig. 2: MobileNet V2 Building Block
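A small arithmetic sketch of why the depthwise-separable convolutions inside the MobileNet V2 building block (Fig. 2) are cheap: one k x k depthwise filter per channel plus a 1x1 pointwise channel mix costs far fewer multiplications than a standard convolution. The kernel size, channel counts, and feature-map size below are illustrative values, not figures from the paper.

```python
def standard_conv_mults(k, c_in, c_out, h, w):
    """Multiplications for a standard k x k convolution over an h x w map."""
    return k * k * c_in * c_out * h * w

def depthwise_separable_mults(k, c_in, c_out, h, w):
    """Depthwise (one k x k filter per input channel) + pointwise 1x1 mix."""
    return k * k * c_in * h * w + c_in * c_out * h * w

std = standard_conv_mults(3, 32, 64, 20, 20)
sep = depthwise_separable_mults(3, 32, 64, 20, 20)
savings = std / sep   # roughly k*k-fold for large channel counts
```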

C. Building the Custom Model from the Pretrained Model:
Given the basic pipeline with the SSD MobileNet V2 FPNLite 320x320 model [12], we have to configure our paths and files for fine-tuning, for example the label map and the training, testing, and evaluation inputs, which we created with the help of TFRecords and our subject (sign language) training and testing datasets.

D. Training and Testing of the Model:
Code snippet:

python Tensorflow\models\research\object_detection\model_main_tf2.py --model_dir=Tensorflow\workspace\models\my_ssd_mobnet --pipeline_config_path=Tensorflow\workspace\models\my_ssd_mobnet\pipeline.config --num_train_steps=2000

With the pipeline file (pipeline.config) configured for our dataset, the command above initiates the model_main_tf2.py script from the TensorFlow Object Detection library with our pipeline.config file, which we created for the SSD MobileNet V2 FPNLite 320x320 architecture, and specifies the number of training steps: one batch of training data is used per step, and the process is repeated 2000 times.

E. Model Improvement:
After the initial steps, we perform tuning to improve the model by:
 Adding more images of the low-performing classes to the training set.
 Training for longer by increasing the number of training steps.
 Changing the architecture; in this case we have used SSD MobileNet V2 FPNLite 320x320, which can be changed according to our needs in speed (ms) and accuracy (mAP).

V. EXPERIMENT DESIGN AND RESULT

We use the SSD MobileNet V2 FPNLite 320x320 model. This model is pre-trained on the COCO 2017 dataset (COCO is a large-scale object detection, segmentation, and captioning dataset).

This model can be used in various environments, such as cloud platforms (Google Colab), various operating systems, Raspberry Pi, web technologies like ReactJS with the help of TensorFlow.js, and mobile devices, thanks to its Feature Pyramid Network Lite pack.

We also compared SSDLite (Single Shot Detection) with FPNLite (Feature Pyramid Network) and found FPNLite much better for small object detection. This is due to better feature extraction, which selects the right set of decisive features in order to avoid ambiguity, so the feature extractor discriminates patterns more easily. In the case of sign language, it is necessary to choose features across different angles, rotations, and inversions, and to decompose the image into components with the help of depthwise convolution for filtering. For sign language, it uses basic shape boundary information while ignoring other details, and thanks to data augmentation it gives us a higher accuracy rate.

VI. CONCLUSION AND FUTURE WORK

We have presented a clean and simple virtual platform to build the sign detection model and implement it on the web for use in its environment. Our method shows significant improvement through our model improvement steps, using technologies such as the TensorFlow Object Detection module and a COCO-trained model in an Anaconda Jupyter notebook with the help of the OpenCV computer vision library.

We have come a long way in sign recognition and detection, but it is necessary to build a more robust and affordable product in the sign detection segment so that it can be commercially available to everyone who needs it. Its domain is vast, and integrating it with various devices requires a lot of work. We can work with cloud computing so that sign detection can be used on low-end devices, which would otherwise need a lot of processing time. It is also important to standardize sign language so that most people around the globe can utilize it; however, standardizing is not easy, as different countries use different languages, which can cause a lot of misinterpretation. We can integrate it with various IoT devices to make it portable and help the disabled people who need it most.

REFERENCES

[1.] S. R. ReddyGari, R. Rumana and R. Prema, "A Review Paper on Sign Language Recognition for The Deaf and Dumb," International Journal of Engineering Research & Technology, vol. 10, no. 10, 2021.
[2.] "DownToEarth," 02 03 2021. [Online]. Available: https://www.downtoearth.org.in/news/health/every-4th-person-to-suffer-hearing-loss-by-2050-who-75718.
[3.] M. Abadi, P. Barham, J. Chen, P. Warden, G. Irving, X. Zheng and Y. Yu, "TensorFlow: A system for large-scale machine learning," 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), USENIX Association, pp. 265-283, 2016.
[4.] N. K. Mahamkali and V. Ayyasamy, "OpenCV for Computer Vision Applications," in Proceedings of National Conference on Big Data and Cloud Computing, Trichy, 2015.
[5.] Z. Q. Zhao, P. Zheng, S. T. Xu and X. Wu, "Object Detection with Deep Learning: A Review," IEEE Transactions on Neural Networks and Learning Systems, pp. 1-21, 2019.
[6.] V. S. Kulkarni and S. D. Lokhande, "Appearance Based Recognition of American Sign Language using Gesture Segmentation," International Journal on Computer Science and Engineering (IJCSE), vol. 2(3), pp. 560-565, 2010.
[7.] P. T. Bao, N. T. Binh and T. Khoa, "A New Approach to Hand Tracking and Gesture Recognition by a New Feature Type and HMM," Fuzzy Systems and Knowledge Discovery, Sixth International Conference, vol. 4, pp. 3,6,14-16, 2009.

[8.] P. Jadav and Y. I. Rokade, "Indian Sign Language
Recognition System," International Journal of
Engineering and Technology, pp. 189-196, 2017.
[9.] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov and L.
C. Chen, "MobileNetV2: Inverted Residuals and
Linear Bottlenecks," Computer Vision Foundation.
[10.] T. Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan and S. Belongie, "Feature Pyramid Networks for Object Detection," Facebook AI Research (FAIR), Cornell University and Cornell Tech, 2017.
[11.] "Tensorflow," [Online]. Available: https://www.tensorflow.org/guide/data.
[12.] G. Ghiasi, T.-Y. Lin, R. Pang and Q. V. Le, "NAS-FPN: Learning Scalable Feature Pyramid Architecture," 2019.
