Paper Title | Model Used | Key Features | Dataset | Accuracy | Key Applications
Gesture-to-Text: A Real-Time Indian Sign Language Translator | PoseNet + LSTM | Real-time, lightweight, pose estimation, sequential data processing | 300 ISL words, 4000 videos | 98% | Real-time ISL translation, virtual meetings, online communication
Efficient Deep Learning Models Based on Tension Techniques | YOLOv5x + Attention Mechanisms (CBAM, SE) | Attention mechanisms, real-time edge platform support | MU HandImages ASL, OkkhorNama: BdSL | 98.9% (ASL), 97.6% (BdSL) | Real-time sign language recognition of alphabetic and numeric gestures
Continuous Word-Level Sign Language Recognition Using an Expert System | YOLOv4 + SVM | Two-model approach (YOLOv4 for detection, SVM for classification), real-time, expert system | Custom dataset, 676 images, 80 ISL signs | 98.8% (YOLOv4), 98.62% (SVM) | Education, communication tools for the hearing impaired
Real-Time Sign Language Detection Using CNN | CNN | Real-time detection, low-cost hardware requirement, scalable system | Sign language gestures dataset (unspecified) | High accuracy | Inclusive communication, everyday communication aid
Sign Language Recognition Using Fusion of Image and Hand Landmarks | Multi-Headed CNN | Fusion of image and hand landmarks, end-to-end learning, multi-headed architecture | Sign language gestures dataset | Higher than traditional models | Comprehensive communication tools integrating multiple data inputs
Improved 3D-ResNet Sign Language Recognition Algorithm | 3D-ResNet + Enhanced Hand Features | Enhanced hand features, 3D convolutional layers, residual learning | Various sign languages | Outperforms existing methods | Real-time video-based sign language translation
Real-Time Sign Language Detection | MobileNet V2 + Transfer Learning | CNN architecture, transfer learning, real-time processing, custom dataset | Custom dataset | 70-80% | Assistive communication, sign language detection in diverse environments
Deepsign: Sign Language Detection and Recognition Using Deep Learning | LSTM + GRU + InceptionResNetV2 | LSTM-GRU combination for video frame analysis, feature extraction using InceptionResNetV2 | IISL2020, 1100 video samples per sign | ~97% | ISL gesture recognition, potential expansion to continuous sign language recognition
This research focuses on developing a technical solution for recognizing sign language. By leveraging machine learning, the study aims to build a system capable of identifying hand gestures in American Sign Language. The dataset for this project includes three-dimensional images of alphabetical sign gestures. The MediaPipe framework is applied to detect landmarks within these images [29].

For a real-time sign language detection system developed specifically for American Sign Language, a dataset of 2,600 samples was generated, covering the vowels and consonants of American Sign Language. The dataset consists of alphabet signs in American Sign Language, as shown in Fig. 1. For each alphabet, 100 images are captured and stored in designated folders. During data acquisition, images are captured via a webcam using Python and OpenCV, which provides essential functions for real-time computer vision. OpenCV not only facilitates machine perception for commercial applications but also serves as a unified infrastructure for computer vision tasks: it includes over 2,500 optimized algorithms that enable tasks such as face detection, object recognition, camera tracking, 3D modeling, human action classification, and more [30]. The results indicate that this method is effective in recognizing alphabets and gestures in American Sign Language, suggesting its potential for identifying signs and gestures in various other languages as well.

Fig. 1. Sample Dataset

Within this article, we introduce a novel approach for detecting sign language by dividing the system into four main steps: Data Collection, Data Preprocessing, Model Training, and Real-Time Prediction. We use Python for creating, training, and utilizing a gesture recognition system based on computer vision, machine learning, and hand landmark detection via the MediaPipe library. Each of these components is explained in detail in its respective section below.

1. Data Collection

A framework is provided for capturing gesture images using a webcam, allowing users to create datasets for gesture recognition. The program starts by asking the user to input the number of gestures they want to capture and the name of each gesture. For each gesture, it creates a corresponding directory inside a main folder, typically named DATA_DIR, which defaults to ./data. If the directories for the gestures do not already exist, they are created automatically. The webcam is accessed through the cv2.VideoCapture function, and the user is instructed to press a specific key (usually 'A') to initiate the capture process. Once activated, the program captures multiple frames of the gesture, saving each image inside the appropriate folder for that gesture. This setup allows for easy and organized collection of gesture data, with the captured samples stored in separate directories for each class of gesture.
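The listing below is a minimal sketch of such a capture script, assuming OpenCV (cv2) and the default webcam at index 0. The DATA_DIR folder and the 100 samples per gesture follow the description above, while the prompt texts, the lowercase trigger key, and the window handling are illustrative choices rather than the authors' exact code.

```python
import os
import cv2

# Folder layout assumed from the text: one sub-directory per gesture under ./data
DATA_DIR = './data'
os.makedirs(DATA_DIR, exist_ok=True)

num_gestures = int(input('Number of gestures to capture: '))
samples_per_gesture = 100  # the paper captures 100 images per alphabet

cap = cv2.VideoCapture(0)  # open the default webcam

for _ in range(num_gestures):
    name = input('Gesture name: ')
    gesture_dir = os.path.join(DATA_DIR, name)
    os.makedirs(gesture_dir, exist_ok=True)

    # Wait until the user presses the trigger key to start capturing this gesture
    while True:
        ret, frame = cap.read()
        if not ret:
            continue
        cv2.putText(frame, "Press 'A' to start capturing", (20, 40),
                    cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
        cv2.imshow('capture', frame)
        if cv2.waitKey(25) & 0xFF == ord('a'):
            break

    # Capture and save the requested number of frames for this gesture class
    for i in range(samples_per_gesture):
        ret, frame = cap.read()
        if not ret:
            continue
        cv2.imshow('capture', frame)
        cv2.waitKey(25)
        cv2.imwrite(os.path.join(gesture_dir, f'{i}.jpg'), frame)

cap.release()
cv2.destroyAllWindows()
```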
2. Data Preprocessing

Gesture image data is processed by detecting and extracting hand landmarks with MediaPipe, preparing the data for machine learning. The step begins by using MediaPipe to detect hand landmarks in each of the captured images. Once the landmarks are recognized, the code extracts the x and y coordinates of the landmarks for each detected hand. To standardize the data, it normalizes these coordinates by subtracting the minimum x and y values, creating consistent feature vectors. Each image is then labeled according to its corresponding gesture, with the label represented as the index of the gesture. After processing all the images, the extracted features and labels are stored in a pickle file, typically named data.pickle, for later training of machine learning models. This process ensures that the data is well structured and ready for tasks based solely on gesture recognition.
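A compact sketch of this preprocessing step is shown below. It assumes the folder layout from the data collection step and MediaPipe's solutions.hands API; each of the 21 landmarks per detected hand contributes a normalized (x, y) pair to the feature vector, and the class label is the index of the gesture folder.

```python
import os
import pickle

import cv2
import mediapipe as mp

DATA_DIR = './data'

# Static-image mode: each saved frame is processed independently
mp_hands = mp.solutions.hands
hands = mp_hands.Hands(static_image_mode=True, min_detection_confidence=0.3)

data, labels = [], []
for label_idx, gesture in enumerate(sorted(os.listdir(DATA_DIR))):
    gesture_dir = os.path.join(DATA_DIR, gesture)
    if not os.path.isdir(gesture_dir):
        continue
    for img_name in os.listdir(gesture_dir):
        img = cv2.imread(os.path.join(gesture_dir, img_name))
        if img is None:
            continue
        results = hands.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
        if not results.multi_hand_landmarks:
            continue  # skip frames where no hand was found
        for hand_landmarks in results.multi_hand_landmarks:
            xs = [lm.x for lm in hand_landmarks.landmark]
            ys = [lm.y for lm in hand_landmarks.landmark]
            # Normalize by subtracting the minimum x and y, as described above
            features = []
            for lm in hand_landmarks.landmark:
                features.append(lm.x - min(xs))
                features.append(lm.y - min(ys))
            data.append(features)
            labels.append(label_idx)  # label = index of the gesture class

# Persist features and labels for the training step
with open('data.pickle', 'wb') as f:
    pickle.dump({'data': data, 'labels': labels}, f)
```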
3. Model Training

A machine learning model is trained using the preprocessed hand landmark data from the pickle file (data.pickle). The step begins by loading the saved data, which includes the extracted hand landmark features and the corresponding gesture labels. train_test_split is then used to divide the data into training and test sets so that the model can be evaluated on unseen data. A RandomForestClassifier is trained on the training set and learns to identify the gestures from the features. The system's efficacy is then assessed on the test set, and the accuracy of the results is computed with accuracy_score as the proportion of correctly classified samples. Once the model is trained, both the model and the gesture names are saved into a new pickle file (model.p) for future use in gesture recognition tasks. This process efficiently trains, evaluates, and stores the model for real-world applications.
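Under these assumptions, the training step could be sketched as below. The 80/20 split, the default Random Forest hyperparameters, and the key name gesture_names used to store the class names alongside the model are illustrative, since the text does not specify them.

```python
import os
import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

DATA_DIR = './data'

# Load the features and labels produced by the preprocessing step
with open('data.pickle', 'rb') as f:
    dataset = pickle.load(f)

X = np.asarray(dataset['data'])
y = np.asarray(dataset['labels'])

# Hold out a test set so the model is evaluated on unseen samples
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, stratify=y)

model = RandomForestClassifier()
model.fit(X_train, y_train)

# Proportion of correctly classified test samples
acc = accuracy_score(y_test, model.predict(X_test))
print(f'Accuracy: {acc * 100:.2f}%')

# Rebuild the label-index -> gesture-name mapping used during preprocessing
gesture_names = {i: name for i, name in enumerate(sorted(os.listdir(DATA_DIR)))}

# Save the trained model and the gesture names for the real-time stage
with open('model.p', 'wb') as f:
    pickle.dump({'model': model, 'gesture_names': gesture_names}, f)
```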
4. Real-Time Gesture Prediction

This step enables real-time gesture recognition using the trained machine learning model. It begins by loading the previously trained Random Forest model and the corresponding gesture names from the pickle file (model.p). The code records live video frames from the webcam using cv2.VideoCapture and processes each frame with MediaPipe to identify hand landmarks. For every detected hand, the code extracts the relevant features (the hand landmark coordinates) and passes them to the trained model to predict the gesture. Once a gesture is recognized, the gesture name is converted into speech using the pyttsx3 library, providing audio feedback. Additionally, the predicted gesture is displayed on the video frame, along with a bounding box drawn around the detected hand using OpenCV. This setup allows for intuitive, real-time gesture recognition with both visual and audio outputs.

The pyttsx3 library is used to convert predicted gesture names into speech, providing auditory feedback, and Pickle is utilized to save and load both the data and the trained model for future use. Together, these components enable a robust, real-time gesture recognition system that processes video input, predicts gestures using machine learning, and offers both visual and audio outputs.
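A minimal sketch of the real-time loop is given below. The key used to quit ('q'), the gesture_names dictionary key, and the exact drawing style of the bounding box are assumptions carried over from the earlier sketches, not the authors' exact code.

```python
import pickle

import cv2
import mediapipe as mp
import pyttsx3

# Load the trained classifier and the label-index -> gesture-name mapping
with open('model.p', 'rb') as f:
    saved = pickle.load(f)
model = saved['model']
gesture_names = saved.get('gesture_names', {})

engine = pyttsx3.init()  # text-to-speech engine for audio feedback
mp_hands = mp.solutions.hands
hands = mp_hands.Hands(static_image_mode=False, min_detection_confidence=0.3)

cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    h, w = frame.shape[:2]
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            xs = [lm.x for lm in hand_landmarks.landmark]
            ys = [lm.y for lm in hand_landmarks.landmark]
            # Same min-subtraction normalization as in preprocessing
            features = []
            for lm in hand_landmarks.landmark:
                features.append(lm.x - min(xs))
                features.append(lm.y - min(ys))
            pred = model.predict([features])[0]
            name = gesture_names.get(pred, str(pred))
            # Bounding box around the detected hand plus the predicted label
            x1, y1 = int(min(xs) * w), int(min(ys) * h)
            x2, y2 = int(max(xs) * w), int(max(ys) * h)
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 0, 0), 2)
            cv2.putText(frame, name, (x1, y1 - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 0, 0), 2)
            engine.say(name)       # audio feedback via pyttsx3
            engine.runAndWait()
    cv2.imshow('Sign language detection', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```

Note that engine.runAndWait() blocks the video loop while speaking, so a practical implementation might only announce a gesture when the prediction changes between frames.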
Fig. 2. Methodology

V. RESULT AND DISCUSSION

For sign language detection, a variety of machine learning models have been used, and their performance has been assessed using measures such as accuracy on the dataset. With an accuracy of 99.81%, our model outperformed the other models tested, including CNN (Sign Language MNIST), Multi-headed CNN, Faster R-CNN, and SVM, as shown in Table 2. MediaPipe's pre-trained, high-quality hand landmark detection pipeline allows precise identification and tracking of hand movements, which is critical for gesture and sign language recognition and accounts for its success. Additionally, MediaPipe is optimized for performance, making it both computationally efficient and adaptable to various platforms, further supporting robust, real-time gesture recognition in diverse settings. Each tested model's accuracy is shown in the table below.

Table 2. Accuracy of various ML algorithms

Model | Dataset | Accuracy
MediaPipe Model | ASL Dataset | 99.81%
CNN (Sign Language MNIST) | Sign Language MNIST | 95%
Multi-headed CNN | Fingerspelling A dataset | 98.98%
Faster R-CNN | Custom dataset | 82.8%
SVM | Custom dataset | 97%