Itp 407
CONFERENCE CONTROL SYSTEM BASED ON GESTURE RECOGNITION
BY
LEKE OLUWADARA DEBORAH
2019/8155
NOVEMBER, 2022.
DEDICATION
This report is dedicated first to Almighty God, who gave me strength throughout the period of compiling this report, and secondly to my family for their utmost support.
CERTIFICATION
This is to certify that this report on “Conference Control System based on Gesture Recognition”
was carried out by LEKE OLUWADARA DEBORAH, with matriculation number 2019/8155.
ACKNOWLEDGEMENT
I would like to thank the people who have helped me the most throughout my project. I am highly grateful to my supervisor, Dr. Sotanwa, for her nonstop support of the project.
A special thank you goes to my colleagues who helped me complete the project; they exchanged their ideas and made it possible to complete my project with accurate information.
I wish to thank my parents for their support, understanding, patience and endurance.
Lastly, I want to thank my friends, who encouraged me, and finally, God, who made everything possible.
TABLE OF CONTENTS
DEDICATION
CERTIFICATION
ACKNOWLEDGEMENT
CHAPTER ONE
INTRODUCTION
Conferencing is a communication process of relevance to the public and private sectors alike. It helps in decision making and group collaboration.
Conferencing has evolved from physical meeting locations to technology-supported conferences over many years. Some key types of conferencing technologies are web, video and audio conferencing. Existing platforms, however, rely on conventional human–computer interaction (HCI) and have proved to be unproductive and time consuming, leaving participants unsatisfied.
This has caused growing interest in the development of new approaches and technologies for bridging the human–computer barrier, so that interactions with computers become as natural as interactions between humans and the amount of time spent on conferencing platforms is reduced.
Gesture recognition is one such approach. Recognizing gestures is a complex task but, generally, body language is dominated by hand movements, and gestures performed with the hands are a natural form of input.
1.2 Statement of problem
In a conference with a large number of participants, it is not easy to get permission or time to speak, ask a question, or draw attention to yourself. Likewise, drawing attention away from yourself is not fast enough, especially in cases where there is a need to quickly mute the microphone.
A more visual presentation of information and technicalities would save time and improve efficiency.
The major objective of this work is to develop a conference system based on gesture recognition. The specific objectives are:
To examine whether an automated conferencing system improves efficiency and effectiveness.
To design a system that brings about improvements in conferences and group collaboration.
To implement a system that tackles the issues of recognition in virtual conferences.
The methods adopted for the collection and gathering of data for this project were research and review of existing literature and systems.
What is the nature of conferencing, and what are the components of various conference control systems?
Does an automated conferencing system improve efficiency and effectiveness as it was designed to do?
Will the system bring about improvements in conferences and group collaboration?
Will the implemented system tackle the issues of recognition in virtual conferences?
This study is important because conferencing is necessary wherever human beings seek to solve a problem and make decisions on how to go about it. The use of gesture recognition in conferencing would ensure that the decision-making process of individuals in a conference is faster and more effective.
In this research, we propose a new architecture that a user, government agency or organization can apply to help improve security across the entire conferencing process.
To be able to implement a video streaming server.
Financial Constraints: The researcher had limited funds and could not visit all the areas to get responses from respondents, but was still able to get good information concerning the research topic.
Time Constraints: The researcher was involved in other departmental activities, such as seminars and attendance of lectures, which limited the time for the research, but the researcher was able to meet the time assigned for the completion of the research work.
CHAPTER TWO
LITERATURE REVIEW
2.1 Gesture
Body gestures vary; they include eye movements, variation in the pitch of vocal sounds and many more, but hand movements are generally the predominant body language. Hand gestures can articulate better than words for tasks such as representing a number or expressing a feeling.
Gestures are related to gesticulation, language-like gestures, pantomimes, emblems, and sign language. Sign languages are characterized by a specific set of vocabulary and grammar. Emblems are informal gestural expressions whose meaning depends on convention.
II. Application Scenarios
Virtual environments: gestures are used to locate and interact with virtual objects, and pointing gestures in particular are demonstrated in this setting. Tele-operation and virtual assembly are good examples of applications.
Sign language: sign languages are communicative gestures. Since sign languages are highly structural, they are very suitable for experimenting with vision algorithms and are a good way to evaluate them.
2.2 Gesture Recognition
Gesture recognition is a technology in which gestures made by users are used to convey information or control devices. It works by a camera reading the movements of the human body and communicating the data to a computer that uses the gestures as input to control devices or applications.
It can also be defined as a user interface that recognizes and captures human gestures and motions. It is used to help the physically impaired interact with computers, for example by interpreting sign language. It also changes the way users interact with computers by eliminating input devices such as joysticks, mice and keyboards, allowing the body to signal freely to the computer through gestures such as finger pointing. Gesture recognition technology can also be used to read facial and speech expressions (i.e., lip reading) and eye movements. (Sonam P and Ubale, 2015)
Sensor-based gesture recognition (SGR): SGR methods use various sensors, including electrocardiograph sensors and radar sensors, and they have few limitations.
2.3 Conferencing
As mentioned earlier, there are various types of computer-mediated conferencing. They include teleconferencing and video conferencing, among others.
Teleconferencing is the medium through which various people meet despite their physical locations; it makes use of electronic telecommunications to enable users to meet. (Egido, 1988)
For this research work, the main focus is video conferencing. Videoconferencing is often described as a communication mode that bridges the gap between telephone calls and face-to-face meetings. It has been commercially available for over two decades and originated over thirty years ago. It was used mainly for corporate meetings such as annual general meetings.
Earlier this year, the popular video-meeting giant Zoom was said to be adding a set of new features, including a gesture recognition feature to enable raised-hand and thumbs-up reactions. It is a great effort towards the implementation of gesture recognition in video conferencing, but there is still more to be done.
Video conferencing, like every other kind of technology, has its flaws, and one of its problems is the language barrier. Everybody in a video conference has to be able to communicate in the same language for communication to be effective. Also, a person who communicates in sign language may not be understood by other people in the video conference unless they understand sign language. Gesture recognition can be used to create a gesture-to-text system for sign language, which would effectively solve the sign language barrier problem.
Also, video conferencing has been marketed as a direct replacement for face-to-face meetings, which raises users' expectations, but it is not. Unlike face-to-face meetings, where minimal preparation is needed, video conferencing requires proper preparation for things to run smoothly and quickly during the meeting. Also, unlike face-to-face meetings, where expressions can be read easily, video conferences do not work that way. Implementing gesture recognition in video conferencing can cut down on preparation time, since gesture recognition allows users to give expressions during the meeting itself.
Navigating the GUI of a video conferencing platform can be demanding while still trying to concentrate on the meeting itself. Implementing gesture recognition as a user interface would make things easier. For example, muting one's audio with the keyboard in time when there is surrounding noise is slower than muting it with a gesture. (Feenberg, 1989)
Research has shown that gesture recognition can be carried out using histograms. Freeman and Roth used orientation histograms as a pattern recognition technique because the algorithm is simple, fast and relatively robust to changes in lighting. (Freeman and Roth, 1994)
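As a concrete illustration of the histogram approach, the sketch below computes a normalized histogram of gradient orientations with NumPy. The bin count, magnitude weighting and distance measure are illustrative choices, not the exact parameters of Freeman and Roth's system.

```python
import numpy as np

def orientation_histogram(gray, bins=18):
    """Histogram of local gradient orientations for a grayscale image.

    Illustrative parameters: 18 orientation bins over [0, pi),
    weighted by gradient magnitude and normalized to sum to 1.
    """
    gray = np.asarray(gray, dtype=np.float64)
    gy, gx = np.gradient(gray)                  # derivatives along rows, columns
    mag = np.hypot(gx, gy)                      # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), np.pi)     # orientation folded into [0, pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0.0, np.pi), weights=mag)
    total = hist.sum()
    return hist / total if total > 0 else hist  # normalize so images compare

def histogram_distance(h1, h2):
    """Euclidean distance between two orientation histograms."""
    return float(np.linalg.norm(h1 - h2))
```

Comparing a live histogram against stored histograms of known gestures (nearest neighbour by `histogram_distance`) is the essence of this recognition scheme.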
A vision-based system that can interpret a user’s gestures in real time to manipulate
windows and objects within a graphical user interface was developed using a hand
segmentation procedure that first extracts binary hand blob(s) from each frame of the
acquired image sequence. Fourier descriptors were used to represent the shape of the hand blobs and were fed into radial-basis function (RBF) networks for pose classification.
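The Fourier-descriptor idea can be sketched as follows. This is a generic formulation for a closed contour given as (x, y) points, not necessarily the exact descriptors used in that system:

```python
import numpy as np

def fourier_descriptors(contour, n_desc=10):
    """Fourier descriptors for a closed contour, an (N, 2) array of (x, y).

    Treating boundary points as complex numbers x + iy, the FFT magnitudes
    (DC term dropped, normalized by the first harmonic) give a shape
    signature invariant to translation, scale, rotation and start point.
    """
    contour = np.asarray(contour, dtype=np.float64)
    z = contour[:, 0] + 1j * contour[:, 1]
    coeffs = np.fft.fft(z)
    coeffs[0] = 0.0                 # drop DC term -> translation invariance
    mags = np.abs(coeffs)           # drop phase -> rotation/start-point invariance
    if mags[1] > 0:
        mags = mags / mags[1]       # normalize by first harmonic -> scale invariance
    return mags[1:n_desc + 1]
```

The resulting fixed-length vector is exactly the kind of feature an RBF network can classify into hand poses.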
Gesture recognition performance using hidden Markov models (HMMs) and recurrent neural networks (RNNs) was investigated. Test results showed that the continuous HMM yielded the best performance, with a gesture recognition rate of 90.2%. Experiments with combining the continuous HMMs and RNNs revealed that a linear combination of the two classifiers improved the classification result to 91.9%. The gesture recognition system was deployed in a prototype user interface application, and users who tested it found the gestures intuitive and the application easy to use. Real-time processing rates of up to 22 frames per second were achieved. (Wah Ng and Ranganath, 2002)
In 2014, AllSee was developed. It was a hand gesture recognition technology that could work across a range of devices. Earlier systems consumed significant power and computational resources; AllSee consumed three to four orders of magnitude lower power and can enable always-on gesture recognition for smartphones and tablets. It extracts gesture information from existing wireless signals (e.g., TV transmissions) in the surroundings but does not incur the power and computational overheads of prior wireless approaches. It was tested over a set of gestures. (Kellogg et al., 2014)
Singular Value Decomposition (SVD) is an approach used for extracting the salient features of an image and is used for data dimension reduction and training purposes. Principal Component Analysis (PCA) is a linear transformation method used in statistical techniques to reduce the dimensionality of data.
The SVD-PCA system uses trained hand dataset images: hand detection is first done through a skin detection technique. After that, various morphological operations are performed on the image to improve its quality so that it clearly shows the skin pixels. Features are then extracted from the training image dataset using the SVD-PCA approach and used to train the network. (Sharma and Sharma, 2019)
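The SVD-based dimensionality reduction behind such a system can be sketched generically with NumPy. The matrix sizes here are illustrative, standing in for flattened hand images, and this is not the exact pipeline of Sharma and Sharma:

```python
import numpy as np

def pca_features(X, n_components=2):
    """Project rows of X onto the top principal components via SVD.

    X is an (n_samples, n_features) matrix, e.g. flattened binary hand
    images. Returns (scores, components, mean) so that new samples can
    be projected with the same basis.
    """
    X = np.asarray(X, dtype=np.float64)
    mean = X.mean(axis=0)
    Xc = X - mean                                   # centre the data
    # Economy SVD: rows of Vt are the principal directions.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    return Xc @ components.T, components, mean

def project(x, components, mean):
    """Project a new (centred) sample with the learned components."""
    return (np.asarray(x, dtype=np.float64) - mean) @ components.T
```

The low-dimensional scores, rather than the raw pixels, are what get fed to the classifier network, which is the point of the SVD-PCA step.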
Sensor-based gesture recognition has gone through a lot of development, and one such development is an IMU sensor-based HGR algorithm for human-machine interfaces (Kim et al., 2019). The proposed HGR algorithm uses the learning method of RCE neural networks and could achieve a recognition accuracy of 98.6%, which is 13.2%, 10.6%, and 4% higher than that of RCE neural networks, MLPs, and DTW-based methods, respectively.
CHAPTER THREE
METHODOLOGY
For this study, the TensorFlow Object Detection API and Python will be used to create a real-time gesture detection system that uses a webcam and can detect different sign language poses. This method was chosen because its algorithms are easier to use and the code is less complex.
We collect images using the webcam by making different sign language poses and letting the webcam capture them. These are the images the model is going to train on. For this project, we work with five expressions: hello, thank you, yes, no and I love you. The figures below show the corresponding code and sample images.
Fig 6. Python code for collecting the images
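A collection script along the lines of Fig 6 might look like the sketch below. It requires OpenCV and a webcam when actually run; the folder layout, label names, image counts and delays are illustrative choices, not necessarily those used in the project.

```python
import os
import time
import uuid

# Illustrative label names for the five expressions used in this project.
LABELS = ["hello", "thank_you", "yes", "no", "i_love_you"]

def image_path(out_dir, label):
    """Unique file name for one captured image of a given gesture."""
    return os.path.join(out_dir, label, f"{label}.{uuid.uuid4()}.jpg")

def collect_images(out_dir="collected_images", images_per_label=15, delay=2.0):
    """Capture webcam frames for each gesture label and save them to disk."""
    import cv2  # imported here so the path helper works without OpenCV
    cap = cv2.VideoCapture(0)        # open the default webcam
    try:
        for label in LABELS:
            os.makedirs(os.path.join(out_dir, label), exist_ok=True)
            print(f"Collecting images for {label}: strike the pose")
            time.sleep(3)            # time to get into position
            for _ in range(images_per_label):
                ok, frame = cap.read()
                if not ok:
                    continue
                cv2.imwrite(image_path(out_dir, label), frame)
                time.sleep(delay)    # pause between shots to vary the pose
    finally:
        cap.release()
```

Calling `collect_images()` produces one folder of JPEGs per gesture, ready for the labelling step described next.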
The images collected are passed on to the labelImg package, and detection boxes are drawn around the sign language poses. labelImg is an open-source package that allows you to label images for object detection easily. Here, a labelling box is dragged over the hand gesture and labelled with the corresponding word it represents. This is done for each image generated.
Fig 7. Labelling the ‘hello’ image using the labelImg package
3.3 Training and testing the data set for sign language
The labelled images are then split into training and testing partitions. This allows the model to train on one set of data and be tested on another. Transfer learning is used with the TensorFlow Object Detection API to train an object detector. Here, an SSD MobileNet model is used as the base model.
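The train/test partitioning step can be sketched as follows. The 80/20 ratio and fixed seed are illustrative choices, not necessarily the split used in the project:

```python
import random

def split_dataset(items, test_fraction=0.2, seed=42):
    """Split labelled image file names into (train, test) partitions.

    Shuffles with a fixed seed so the split is reproducible, then holds
    out the requested fraction (at least one item) for testing.
    """
    items = list(items)
    rng = random.Random(seed)
    rng.shuffle(items)
    n_test = max(1, int(len(items) * test_fraction))
    return items[n_test:], items[:n_test]
```

Each partition (images plus their labelImg annotations) then feeds the Object Detection API's training and evaluation steps respectively.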
Fig 8. SSD MobileNet model used for training
Python and OpenCV are used to detect gestures in real time. The labelled gestures are made in front of the webcam, and the system identifies each gesture and writes its label on the screen.
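The real-time loop can be sketched as below. `detect_fn` is a hypothetical stand-in for the trained detector (it should map a frame to per-label scores), and the 0.8 confidence threshold is an illustrative choice:

```python
def best_label(scores, labels, threshold=0.8):
    """Return the highest-scoring label, or None if below the threshold."""
    i = max(range(len(scores)), key=lambda k: scores[k])
    return labels[i] if scores[i] >= threshold else None

def run_demo(detect_fn, labels):
    """Read webcam frames, run the detector, and draw the label on screen.

    ``detect_fn`` is a placeholder for the trained TensorFlow model.
    Call run_demo(...) to start; press q in the window to quit.
    """
    import cv2  # imported here so best_label() is usable without OpenCV
    cap = cv2.VideoCapture(0)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            label = best_label(detect_fn(frame), labels)
            if label:
                cv2.putText(frame, label, (30, 40),
                            cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
            cv2.imshow("gesture detection", frame)
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
    finally:
        cap.release()
        cv2.destroyAllWindows()
```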
CHAPTER FOUR
DISCUSSION AND RESULT
The following are the real-time results of the hand gestures when they are made in front of the webcam.
Performance evaluation was carried out using the accuracies documented while the gesture detection ran. Five people tested the five gestures, and the average accuracies were calculated.
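The averaging step amounts to the small computation below. The scores in the example are placeholders for illustration, not the project's actual measurements:

```python
from statistics import mean

def mean_accuracies(results):
    """Average per-gesture accuracy over all testers.

    ``results`` maps each gesture name to the list of accuracies
    recorded for the testers (as fractions between 0 and 1).
    """
    return {gesture: mean(scores) for gesture, scores in results.items()}

# Illustrative placeholder scores (one value per tester), NOT project data.
example = {"hello": [0.95, 0.92, 0.97, 0.99, 0.94]}
```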
From the table above, we can see the mean accuracies fall between 85% and 99%, which shows that the model performs well.
CHAPTER FIVE
5.1 Summary
5.2 Recommendation
From this project, the use of the TensorFlow Object Detection API and Python to create real-time gesture detection has shown that it can be implemented in video conferencing. Because it uses the webcam, it makes use of already available resources, and the vocabulary can be expanded to the whole vocabulary of American Sign Language and to additional gesture commands.
5.3 Conclusion
The accuracy of the model fell between 85% and 99%, which means the predictions are largely reliable, though this is an implementation that still requires more research. Such research can be carried out using other gesture recognition methods, with an even bigger sign language vocabulary, and with gesture commands involving factors not included in this project.
REFERENCES
Feenberg, A. (1989). The Written World: On the theory and practice of computer conferencing. Western Behavioral Sciences Institute, La Jolla, California, United States, and San Diego State University.
Kellogg, B. et al. (2014). Bringing Gesture Recognition to All Devices.
Kim et al. (2019). IMU Sensor-Based Hand Gesture Recognition for Human-Machine Interfaces. School of Electronics and Information Engineering, Korea Aerospace University, Goyang-si 10540; Department of Information and Communication Engineering, Sejong University, Seoul 143-747, Korea.
Ng, C. W. and Ranganath, S. (2002). Real-time gesture recognition system and application. Image and Vision Computing 20 (2002) 993–1007. Department of Electrical and Computer Engineering, National University of Singapore, Singapore.