Hand Gestures Classification and Image Processing Using Convolutional Neural Network Algorithm


Volume 8, Issue 4, April – 2023 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Dr. Sk. Mahboob Basha¹, H.C. Srivalli², B. Jahnavi³, C.V. Basanth⁴

¹ Professor, NRI Institute of Technology, A.P., India-521212
² UG Scholar, Dept. of IT, NRI Institute of Technology, A.P.-521212
³ UG Scholar, Dept. of IT, NRI Institute of Technology, A.P.-521212
⁴ UG Scholar, Dept. of IT, NRI Institute of Technology, A.P.-521212

Abstract:- The deaf community communicates primarily through the use of sign language. In general, sign language is much more figuratively formable for communication, which helps to advance and broaden the conversation. ASL is regarded as the universal sign language, although there are numerous variations and other sign systems used in various parts of the world. Sign language assigns relatively few principal ideas and appearances. The main goal of this effort is to create a sign language system that will benefit the deaf community and speed up the process of communication. The project's main objective is to build a classifier-based software model for sign language recognition. The strategy is to identify the gestures and use classifiers to assess their attributes: principal component analysis is used for gesture recognition, and a classifier is used to assess the gesture features. The hand gesture has been used as a form of communication since the beginning of time. Recognition of hand gestures makes human-computer interaction (HCI) more versatile and convenient. Because of this, accurate character identification is crucial for a tranquil and error-free HCI. According to a literature review, the majority of hand gesture recognition (HGR) systems now in use have taken only a few straightforward discriminating motions into account for recognition performance. This study uses robust modelling of static signs in the context of sign language recognition by using convolutional neural networks (CNNs) based on deep learning. In this study, CNN is used for HGR, taking into account both the ASL alphabet and numbers simultaneously. The CNNs utilised for HGR are emphasized, along with their benefits and drawbacks. Modified AlexNet and modified VGG16 models for classification form the foundation of the CNN architecture. After feature extraction, a multiclass support vector machine (SVM) classifier is built on the modified pre-trained VGG16 and AlexNet architectures. To achieve the highest recognition performance, the results are assessed using features from various layers. Both leave-one-subject-out and random 70-30 cross-validation were used to test the accuracy of the HGR schemes. This work also emphasises how easily each character can be recognised and how similar the motions are to one another. To show how affordable this work is, the experiments are run on a basic CPU machine as opposed to cutting-edge GPU hardware. The proposed system outperformed several cutting-edge techniques with a recognition accuracy of 99.82%.

Keywords:- Sign Language, ASL (American Sign Language), Deaf Community, Gestures, Human-Computer Interaction, Hand Gesture Recognition, CNN (Convolutional Neural Network), SVM (Support Vector Machine)

I. INTRODUCTION

The standard computer input devices developed in the realm of technology have not altered much, because these gadgets work well. Today, as computers become more prevalent in our daily lives, it is becoming simpler to introduce new hardware and software. We can interact with computers only through keyboards, light pens and, these days, keypads. Although they are very popular, these gadgets have speed limitations. Vision-based interfaces are becoming more common among users as technology advances, giving computers the ability to see. As a result, this evolution will prompt the creation of new device interfaces, which will enable these devices to process commands that cannot be entered through their present input mechanisms. Man-machine interaction, often known as human-computer interaction (HCI), is the term used to describe the relationship or interaction between humans and machines. When creating the HCI model, it is important to keep in mind the two key traits of functionality and usability.

Usability of the system is its ability to perform the specific activities that the user carries out with accuracy, whereas functionality of the system is the set of services supplied to the user by the system. Gestures, including sign language, are also used for inter-human communication. Recognition of hand signals has recently attracted a lot of attention. Applications of hand gesture detection include operating machines and replacing the mouse in video games. The sets of motions that make up sign language are its most important structures. Every gesture has a distinct meaning in sign language.

In essence, sign language serves as a means of communication between hearing and deaf people. Thanks to the sensors in data gloves, sign language has become more prevalent as a result of technological innovation. Recognizing the representations of human hand motions is the major goal of sign language recognition. Early on, recognition was challenging, but as technology has advanced, it has become simpler and more accurate. Communication is a necessary ability for community members to express themselves and live in harmony.

While members of the community can communicate via verbal and auditory languages, individuals who do not have these abilities can express themselves through sign language, which is a visual language. Sign language is a communication strategy in which persons with hearing challenges communicate using body motions such as hands, arms, and gestures. Because most people in our culture are unfamiliar with the sign language used by people with hearing problems to communicate and express themselves, it is clear that these individuals struggle to express themselves in their everyday lives.

The scientific community has long recognised this need and has been working to create sign language devices to assist hearing-impaired persons in communicating. Although the development of such technologies can be challenging owing to the existence of various sign languages and a lack of funding, large annotated datasets, as well as recent breakthroughs in AI and machine learning, have all significantly contributed to the automation and improvement of such technologies.

The objective of sign language recognition (SLR) is to create sophisticated machine learning algorithms that reliably categorise human articulations into individual signs or continuous phrases. Currently, the absence of large annotated datasets restricts the accuracy and generalisation capabilities of SLR algorithms, as does the difficulty of recognising sign boundaries in continuous SLR scenarios.

II. TECHNOLOGIES USED

Fig 1 Technologies used

 Python
Python is an object-oriented, dynamically semantic, high-level, interpreted programming language. Its high-level built-in data structures, coupled with dynamic typing and dynamic binding, make it particularly appealing for Rapid Application Development, as well as for use as a scripting or glue language to bring existing components together. Python's straightforward syntax promotes readability, which lowers the expense of software maintenance. Python's support for modules and packages promotes the modularity of programmes and the reuse of code. Both the comprehensive standard library and the Python interpreter are freely distributable and accessible in source or binary form for all popular systems (see Fig 2).

Fig 2 Features of Python Language
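As a brief illustration of the traits described above (dynamic typing, high-level data structures, and module reuse), consider this minimal sketch; the function name is our own, chosen for illustration only:

```python
# Minimal sketch of Python's dynamic typing and standard-library reuse.
from collections import Counter

def most_common_label(labels):
    """Return the most frequent item in a sequence of any hashable type."""
    return Counter(labels).most_common(1)[0][0]

# The same function works unchanged for strings or integers.
print(most_common_label(["A", "B", "A"]))  # -> A
print(most_common_label([3, 3, 7]))        # -> 3
```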

 Machine Learning
Machine learning, a developing technique, allows computers to learn automatically from historical data. It is the process of creating mathematical models from historical data or information and generating predictions using a variety of techniques. It is utilised for many different tasks, including image identification, speech recognition, email filtering, Facebook auto-tagging, recommender systems, and more.

Machine learning approaches include supervised, unsupervised, and reinforcement learning, spanning regression and classification models, clustering techniques, hidden Markov models, and other sequential models (see Fig 3).

Fig 3 Features of Machine Learning
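As an illustrative sketch of the supervised classification approach mentioned above (not this paper's own pipeline), a multiclass SVM can be trained on flattened image vectors with scikit-learn; the digits dataset here is a stand-in for gesture images, and the 70-30 split mirrors the cross-validation scheme described in the abstract:

```python
# Supervised-learning sketch: multiclass SVM on flattened image vectors.
# The digits dataset is a stand-in for sign-gesture images.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)            # 8x8 images, flattened to 64 features
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)      # random 70-30 split
clf = SVC(kernel="rbf")                        # multiclass SVM classifier
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```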


 CNN
Convolutional Neural Networks (CNNs) are deep neural networks used to analyze data with a grid-like topology, such as photographs that may be represented as a 2-D array of pixels. Convolution, non-linearity (ReLU), pooling, and classification (the fully connected layer) are the four fundamental processes that make up a CNN model.

Convolution: Convolution is a technique used to take features out of a picture. By learning picture attributes from tiny squares of input data, it retains the spatial connection between pixels. ReLU generally comes after it (see Fig 4).

ReLU: This operation zeroes out all negative pixel values in the feature map on an element-by-element basis. Its objective is to make the convolution network non-linear (see Fig 4).

Pooling: Downsampling, commonly known as pooling, lowers the dimensionality of each feature map while preserving crucial information (see Fig 4).

Fully connected layer: A multi-layer perceptron whose output layer employs the softmax function. Its goal is to classify the input image into distinct classes using training data and characteristics from previous layers. A CNN model is built by combining these layers, and a fully connected layer makes up the final layer.

Fig 4 Working of CNN model
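The four operations above map directly onto standard deep-learning layers. The following minimal Keras sketch is one possible realisation, not the paper's exact architecture; the input shape and the 25 output units assume the 28 x 28 grayscale Sign Language MNIST images and label indices described later:

```python
# Minimal CNN sketch: Convolution -> ReLU -> Pooling -> Fully connected.
# Input shape and class count are assumptions matching Sign Language MNIST.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, (3, 3), activation="relu"),  # convolution + ReLU
    layers.MaxPooling2D((2, 2)),                   # pooling (downsampling)
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),          # fully connected layer
    layers.Dense(25, activation="softmax"),        # one unit per Kaggle label
                                                   # index 0-24 (9/J is unused)
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```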
III. SOFTWARE REQUIREMENT SPECIFICATION

 Functional Specifications:

 This feature will translate the recognized gesture into its textual meaning and display the translated text to the user.
 The system will recognize the appropriate movement of the hands and search its database to match the movement with pre-defined gestures. After matching, the system will add the meaning of the sign to the opened file.

 Normal Flow of Events (a minimal loop sketch follows this list):

 User selects the communication mode from the main menu
 User opens a file
 User performs the movement
 Gesture is recognized and has a match
 The text is added to the file and displayed to the user, and the word or sentence is also spoken aloud
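A minimal sketch of this event flow, assuming a webcam input, a trained Keras `model` like the one sketched earlier, and the pyttsx3 text-to-speech library (our choice for illustration; the paper does not name one):

```python
# Flow sketch: capture frame -> classify -> append text to file -> speak.
# Assumes a trained Keras `model` with 25 outputs indexed A-Y (9/J unused).
import cv2
import numpy as np
import pyttsx3

LABELS = list("ABCDEFGHIJKLMNOPQRSTUVWXY")  # index 9 (J) is never predicted
engine = pyttsx3.init()
cap = cv2.VideoCapture(0)

ret, frame = cap.read()
if ret:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    resized = cv2.resize(gray, (28, 28)) / 255.0
    probs = model.predict(resized.reshape(1, 28, 28, 1))
    letter = LABELS[int(np.argmax(probs))]
    with open("output.txt", "a") as f:   # add the meaning to the opened file
        f.write(letter)
    engine.say(letter)                   # speak the recognised sign
    engine.runAndWait()
cap.release()
```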
 Sentence Level Translation:
Sign languages are distinct languages with their own linguistic frameworks, just like spoken languages. A system that detects individual signs one at a time can translate only when SEE (Signed Exact English) is available. A sign that appears later in the statement, such as when the signed sentence turns out to be a question, may radically alter the translation and the words employed. Additionally, making the signer stop and wait for the outcome after each sign could lead to a bad user experience.

 Tracking the Face:
All non-manual signals make tracking the face important when signing. The face is particularly significant because it offers a variety of sign sites and because facial expressions have grammatical connotations that must be taken into account. A true sign language translation can never come from a project that ignores the face.
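One way to locate the face region for such non-manual analysis is a Haar cascade; this is a sketch using OpenCV's bundled classifier, not a method the paper prescribes:

```python
# Face-localisation sketch with OpenCV's bundled Haar cascade.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
gray = cv2.cvtColor(cv2.imread("frame.jpg"), cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    print("face at", x, y, w, h)  # region to analyse for non-manual signals
```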

 Engagement of the Deaf Community:
No good product can be made without the input of the ultimate end customers, and this is especially true in this particular sector. This is what truly differentiates dreamers from serious competitors. The team will quickly understand the two reasons mentioned above, among many others, if there are deaf participants in the project. The inclusion of deaf team members, in our opinion, is the key to overall success.

 Non Functional Specifications:

 Only one person should use the Leap Motion controller to perform ISL. A dual-core 2.66 GHz or faster processor, either 32-bit (x86) or 64-bit (x64), should be able to operate our system, and it should not require more than 2 GB of RAM.
 For the time being, we'll control the input stream using a Leap Motion controller and write C code in Visual Studio. Real-time continuous gesture recognition techniques will be the foundation of the software architecture.
 Performance and scalability: How fast does the system return results? How much will this performance change with higher workloads?
 Portability and compatibility: Which hardware, operating systems, and browsers, along with their versions, does the software run on? Does it conflict with other applications and processes within these environments?
 Reliability, maintainability, availability: How often does the system experience critical failures? How much time does it take to fix an issue when it arises? And how does user availability time compare to downtime?
 Security: How well are the system and its data protected against attacks?
 Localization: Is the system compatible with local specifics?
 Usability: How easy is it for a customer to use the system?

IV. EXISTING SYSTEM

A module for the existing system's mute-user feature was created utilizing flex sensors attached to the user's hand. The flex sensor on this module responds to each finger bend separately. That value is used by the controller to start speaking: the APR kit has a separate voice recorded for each flex sensor, and it plays that voice for each indication. Additionally, in another system now in use, precision is achieved only by working on a select few alphabets rather than words or complete phrases.

 Limitations of Existing System:

 The existing system is restricted to only 10 voice announcements, which may reduce product capacity.
 A major problem of the existing system is that the mute person must always carry the hardware.
 The user cannot do any other work with flex sensors on the fingers, and the sensors must be kept straight.
 The controller may think that the user is giving a command, which may result in unwanted output and a shorter hardware lifetime.

V. PROPOSED SYSTEM

In the proposed system, a deaf or mute person sends a gesture or sign picture to the system. The system evaluates the sign input using a MATLAB image processing technique before classifying the input to the recognized identity. The machine then begins the speech output when the input picture matches the supplied datasets. A written version of the output will also be shown. The concept of converting sign language into speech and text has a functional prototype. This work develops an application that will aid society in enhancing communication between deaf and mute people using an image processing technique. We made use of the free, publicly available American Sign Language (ASL) dataset from MNIST, which is available on Kaggle. This dataset contains 7172 test photos and 27455 training images, each of which is a square of 28 pixels by 28 pixels. These images represent the static letters of the English alphabet from A to Y (J and Z have no class designations because those signs involve motion). The Kaggle dataset's training data is given in CSV format and has 27455 rows and 785 columns. The first column of the dataset contains the class label for the image, while the other 784 columns comprise the 28 × 28 pixels. The test set of data follows the same paradigm.
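A minimal sketch of loading this CSV layout with pandas; the file names assume the Kaggle Sign Language MNIST release:

```python
# Load the Sign Language MNIST CSVs: first column = class label,
# remaining 784 columns = the 28x28 pixel values.
import numpy as np
import pandas as pd

train = pd.read_csv("sign_mnist_train.csv")  # 27455 rows x 785 columns
test = pd.read_csv("sign_mnist_test.csv")    # 7172 rows x 785 columns

y_train = train.iloc[:, 0].to_numpy()                         # class labels
X_train = train.iloc[:, 1:].to_numpy().reshape(-1, 28, 28, 1) / 255.0
y_test = test.iloc[:, 0].to_numpy()
X_test = test.iloc[:, 1:].to_numpy().reshape(-1, 28, 28, 1) / 255.0

print(X_train.shape, y_train.shape)  # (27455, 28, 28, 1) (27455,)
```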
 Advantages of Proposed System:

 When compared with the existing system, the user can give more signs.
 The module provides two-way communication, which helps normal and disabled people interact easily.
 Easy to interface.
 Flexible to work with.

VI. SYSTEM ARCHITECTURE

Fig 5 System Architecture

 Future Scope:
In order to expand the model's recognition of alphabetical features while maintaining high accuracy, we intend to use more alphabets in our datasets. We would also like to improve the system by incorporating speech recognition in order to help blind people. More than 70 million deaf people worldwide use sign language to communicate. By using sign language, they can learn job skills, access resources, and participate in their communities.
VII. CONCLUSION

Many advances have been made in the fields of artificial intelligence, machine learning, and computer vision. They have significantly improved how we apply these techniques in our daily lives and how we see the world around us. Numerous studies have been conducted on the recognition of sign gestures using different algorithms such as ANN, LSTM, and 3D CNN; however, the majority of them demand more processing capability. On the contrary, our study achieves an accuracy of over 90% while requiring little computing power. To obtain features (binary pixels) and enhance the system, we proposed in our research to normalize and dynamically resize our photos to 64 pixels. CNN is used to categorize the ten alphabetical American signs.
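A minimal sketch of the preprocessing step described above, assuming OpenCV, a 64 x 64 resize target, and Otsu thresholding as one way to obtain binary pixels (the paper does not name a thresholding method):

```python
# Preprocessing sketch: resize an image to 64x64, then binarize and
# normalize it. Otsu thresholding is an assumption; the paper only
# says "binary pixels" without specifying a method.
import cv2

def preprocess(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img = cv2.resize(img, (64, 64))              # dynamic resize to 64x64
    _, binary = cv2.threshold(img, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary / 255.0                        # normalized binary pixels

features = preprocess("sign.jpg")
print(features.shape)  # (64, 64)
```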

BIOGRAPHIES

Dr. Sk. Mahboob Basha is presently working as Professor in the Department of Information Technology at NRI Institute of Technology, Vijayawada. He received his M.Tech degree from Jawaharlal Nehru Technological University, Kakinada (JNTUK) and his Ph.D in Computer Science and Engineering from Acharya Nagarjuna University (ANU). He has published over 10 research papers in international conferences and journals and has more than 20 years of teaching experience.

C.V. Basanth is currently studying B.Tech with specialization in Information Technology at NRI Institute of Technology. He did a mini project on Sign Language Classification, earned two NPTEL certificates, and served as an intern at Black Bucks.

H.C. Srivalli is currently studying B.Tech with specialization in Information Technology at NRI Institute of Technology. She did a mini project on Sign Language Classification, completed two NPTEL certifications, and finished an internship at Black Bucks.

B. Jahnavi is currently studying B.Tech with specialization in Information Technology at NRI Institute of Technology. She did a mini project on Sign Language Classification, finished internships at Verzeo and Black Bucks, and earned one NPTEL certification.
