
CONFERENCE CONTROL SYSTEM BASED ON GESTURE

RECOGNITION

BY

LEKE, OLUWADARA DEBORAH

2019/8155

COLLEGE OF NATURAL AND APPLIED SCIENCE

COMPUTER SCIENCE AND INFORMATION TECHNOLOGY

BELLS UNIVERSITY OF TECHNOLOGY, OTA, OGUN STATE

NOVEMBER, 2022.

DEDICATION

This report is dedicated first to Almighty God, who gave me strength throughout the period of compiling this report, and secondly to my family for their utmost support.

CERTIFICATION

This is to certify that this report on “Conference Control System based on Gesture Recognition”

was carried out by LEKE OLUWADARA DEBORAH, with matriculation number: 2019/8155

from the Department of Computer Science and Information Technology.

ACKNOWLEDGEMENT

I would like to thank the people who have helped me the most throughout my project. I am highly

grateful to my supervisor, Dr. Sotanwa, for her nonstop support of the project and her ever-

readiness to help at all stages.

A special thank you goes to my colleagues, who exchanged ideas with me and helped me complete the project with accurate information.

I wish to thank my parents for their support, understanding, patience and endurance.

Lastly, I want to thank my friends, who encouraged me, and finally, God, who made everything

possible for me until the end.

TABLE OF CONTENTS

DEDICATION
CERTIFICATION
ACKNOWLEDGEMENT

CHAPTER ONE
INTRODUCTION

1.1 Background of study

Conferencing is important in any organization. Conferences support many key business processes of relevance to the public and private sectors alike. They aid decision making and are essential to many organizational tasks that cannot be carried out by individuals (Nunamaker et al., 1997).

Conferencing has evolved over many years from physical gatherings to technology-supported meetings. Key conferencing technologies include web conferencing, video conferencing, teleconferencing, synchronous conferencing, data conferencing, and conference calls.

These conferencing technologies do not support natural human–computer interaction (HCI) and have proved to be unproductive and time consuming, often leaving participants unsatisfied.

This has caused growing interest in the development of new approaches and technologies for bridging the human–computer barrier, where interactions with computers become as natural as interactions between humans and the amount of time spent on conferencing platforms is reduced significantly. (Lind et al., 2006)

Gesture recognition is one such new approach. Recognizing gestures is a complex task, but hand movements are generally the predominant form of body language, and gestures performed with the hands give better expression to one's words.

1.2 Statement of problem

In a conference with a large number of participants, it is not easy to get permission or time to speak, ask something, or draw attention to yourself. Likewise, drawing attention away from yourself is not fast enough, especially when there is a need to quickly mute the sound because of loud, distracting noises in the room.

It is important to build connections with other participants in a conference and have a

collaborative experience with them.

A more visual presentation of information and technicalities would save time and improve

understanding within a group of people during conferencing.

1.3 Aim and objectives

The major aim of this work is to develop a conference system based on gesture recognition and establish natural interaction within its interface.

The specific objectives of this study are:

 To examine the nature of conferencing and the components of various conference control systems

 To examine whether automated conferencing systems improve efficiency and effectiveness as they were designed to do

 To design a system that brings about improvements in conferences and group collaboration

 To implement a system that tackles the issues of recognition in virtual conferences.

1.4 Research Methodology

The method adopted for the collection of data for this project was based on research questions, which include:

 What is the nature of conferencing and what are the components of various conference control systems?

 Do automated conferencing systems improve efficiency and effectiveness as they were designed to do?

 Will the system bring about improvements in conferences and group collaboration?

 Will the implemented system tackle the issues of recognition in virtual conferences?

1.5 Justification of the Study

This study is important because conferencing is necessary wherever human beings seek to solve a problem and decide how to go about it. The use of gesture recognition in conferencing would ensure that the decision-making process of individuals in a conference is effective and productive by:

 Ensuring more control over large-scale events.

 Enhancing collaboration between attendees of conferences.

 Allowing a natural interaction with the GUI of the conferencing application.

 Reducing the amount of time spent on conferencing.

1.6 Scope of the study

In this research, we propose a new architecture that a user, government agency, or organization can apply to improve the entire conferencing solution. In the proposed solution architecture, Python was used as the programming language.

The scope of this work includes the following:

 A dynamic network system that can communicate in real time

 The implementation of a video streaming server

 An exploration of the power of Python in data handling

1.7 Limitation of the study

Financial Constraints: The researcher had limited funds and could not visit all the areas to get responses from respondents, but was able to get good information concerning the research topic.

Time Constraints: The researcher was involved in other departmental activities such as seminars and lectures, which limited the time available for the research, but the researcher was able to meet the time assigned for the completion of the research work.

CHAPTER TWO

LITERATURE REVIEW

2.1 Gesture

Body gestures vary; they include eye movements, variation in the pitch of vocal sounds, and more, but hand movements are generally the predominant form of body language. Hand gestures articulate words better, for example when representing a number or expressing a feeling (Sharma and Sharma, 2019).

Gestures are related to gesticulation, language-like gestures, pantomimes, emblems, and sign

language. Sign languages are characterized by a specific set of vocabulary and grammar.

Emblems are informal gestural expressions in which the meaning depends on convention,

culture and lexicon. (Wu and Huang, 1999)

2.1.1 Categories of gestures

Gestures are categorized according to:

I. Orientation of the hands

Here we have two types of gestures:

 Static Gesture: A static gesture is a specific pose represented by a single

image.

 Dynamic Gesture: A dynamic gesture is a moving gesture represented by a

sequence of images. (Freeman and Roth, 1994)

II. Application Scenarios

According to different application scenarios, hand gestures can be classified into

several categories such as:

 Controlling gestures: Controlling gestures are the focus of current research in vision-based interfaces. Pointing gestures, a type of controlling gesture, can be used to locate virtual objects and are demonstrated in display-control applications. Navigating gestures are another type of controlling gesture that use the orientation of the hands to capture 3D directional input for navigating within virtual environments.

 Manipulative gestures: Manipulative gestures serve as a natural way to interact with virtual objects. Tele-operation and virtual assembly are good examples of applications.

 Communicative gestures: Sign language is an important case of communicative gestures. Since sign languages are highly structured, they are very suitable for testing vision algorithms and are a good way to help the disabled interact with computers.

(Wu and Huang, 1999)

2.2 Gesture Recognition

Gesture recognition is a branch of computer vision and human–computer interaction by

which gestures made by users are used to convey information or control devices. It

works by a camera reading the movements of the human body and communicating the

data to a computer that uses the gestures as input to control devices or applications.

(Sonam P and Ubale, 2015)

It can also be defined as a user interface that recognizes and captures human gestures and motions. It is used to help the physically impaired interact with computers, for example by interpreting sign language. It also changes the way users interact with computers by eliminating input devices such as joysticks, mice, and keyboards, allowing the body to give signals to the computer freely through gestures such as finger pointing. Gesture recognition technology can also be used to read facial expressions, speech expressions (i.e., lip reading), and eye movements. (Sonam P and Ubale, 2015)

2.2.1 Categories of Gesture Recognition

Hand Gesture Recognition can be categorized into:

 Vision-based gesture recognition (VGR): VGR recognizes gestures using

camera images, and various technologies have been proposed. A disadvantage

of VGR is that its accuracy degrades in light-sensitive application scenarios

because camera images are affected by lighting conditions.

 Sensor-based gesture recognition (SGR): SGR methods use various sensors that are not affected by lighting conditions, such as inertial measurement unit (IMU) sensors, electromyography (EMG) sensors, brain-wave sensors, electrocardiograph sensors, and radar sensors, and they have few limitations. (Kim et al., 2019)

2.3 Conferencing

According to Wiktionary, conferencing is the act of consulting together as a group to discuss and exchange views on issues about a topic of interest.

As mentioned earlier, there are various types of computer-mediated conferencing. These include web conferencing, video conferencing, teleconferencing, synchronous conferencing, data conferencing, and conference calls.

Teleconferencing is the medium through which people meet regardless of their physical locations. It makes use of electronic telecommunications to enable users to meet. (Egido, 1988)

For this research work, the main focus is video conferencing. Videoconferencing is often described as a communication mode that bridges the gap between telephone calls and face-to-face meetings. It has been commercially available for over two decades and originated over thirty years ago. It was used mainly for corporate meetings such as annual stockholders' meetings. (Egido, 1988)

2.4 Gesture Recognition in Video Conferencing

Early in 2022, the popular video-meeting giant Zoom was reported to be adding a set of new features, including a gesture recognition feature to enable raised-hand and thumbs-up reactions. It is a great step towards the implementation of gesture recognition in video conferencing, but gesture recognition still needs wider adoption in video conferencing. (Tung, 2022)

Video conferencing, like every other kind of technology, has its flaws, and one of them is the language barrier. Everybody in a video conference has to be able to communicate in the same language for communication to be effective. In particular, a person who communicates in sign language may not be understood by other participants unless they also understand sign language. Gesture recognition can be used to create a gesture-to-text system for sign language, which would effectively remove this barrier. (Murakami and Taguchi, 1991)

Also, video conferencing has been marketed as a direct replacement for face-to-face meetings, which raises users' expectations, but it is not. Unlike face-to-face meetings, where minimal preparation is needed, video conferencing requires proper preparation for things to run smoothly and quickly during the meeting. Also, unlike face-to-face meetings, where expressions can be read easily, video conferences do not work that way. Implementing gesture recognition in video conferencing can cut down on preparation time, as it allows users to give expressions during meetings. (Egido, 1988)

Navigating the GUI of a video conferencing platform can be demanding while still trying to concentrate on the meeting itself. Implementing gesture recognition as a user interface would make things easier. For example, muting one's audio with the keyboard in time when there is surrounding noise is slower than muting it with a gesture. (Feenberg, 1989)

2.5 Review of Related Work

2.5.1 Hand Gesture Recognition through Orientation Histograms

Research has shown that gesture recognition can be carried out using histograms. Freeman and Roth used orientation histograms as a pattern-recognition technique because the algorithm is simple and fast, and relatively robust to changes in lighting. (Freeman and Roth, 1994)
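As an illustration of the idea (a minimal sketch under simple assumptions, not Freeman and Roth's exact pipeline), an orientation histogram can be computed by histogramming local gradient directions over a grayscale hand image; because edge orientation changes far less than pixel intensity when illumination changes, the resulting feature is relatively insensitive to lighting:

    import cv2
    import numpy as np

    def orientation_histogram(gray, bins=36):
        # Gradient direction at each pixel of a grayscale hand image.
        gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
        gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
        mag, ang = cv2.cartToPolar(gx, gy, angleInDegrees=True)
        # Keep only orientations where the edge response is strong.
        strong = mag > mag.mean()
        hist, _ = np.histogram(ang[strong], bins=bins, range=(0.0, 360.0))
        return hist / max(hist.sum(), 1)   # normalized histogram as the feature vector

Two gestures can then be compared by a simple distance between their histogram vectors.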

2.5.2 Real-time vision based gesture recognition system

A vision-based system that can interpret a user’s gestures in real time to manipulate

windows and objects within a graphical user interface was developed using a hand

segmentation procedure that first extracts binary hand blob(s) from each frame of the

acquired image sequence. Fourier descriptors were used to represent the shape of the hand

blobs, which were fed into radial-basis function (RBF) networks for pose classification.

Gesture recognition performances using hidden Markov models (HMM) and recurrent

neural networks (RNN) were investigated. Test results showed that the continuous HMM

yielded the best performance with gesture recognition rates of 90.2%. Experiments with

combining the continuous HMMs and RNNs revealed that a linear combination of the two

classifiers improved the classification results to 91.9%. The gesture recognition system was

deployed in a prototype user interface application, and users who tested it found the

gestures intuitive and the application easy to use. Real-time processing rates of up to 22

frames per second were obtained. (Wah Ng and Ranganath, 2002)

2.5.3 Bringing Gesture Recognition to All Devices.

In 2014, AllSee was developed. It was a hand gesture technology that could work across all devices, including those with no batteries, at a time when existing gesture-recognition systems consumed significant power and computational resources. It consumed three to four orders of magnitude less power than more advanced systems and can enable always-on gesture recognition for smartphones and tablets. It extracts gesture information from existing wireless signals (e.g., TV transmissions) in the surroundings without incurring the power and computational overheads of prior wireless approaches. It was tested over a set of eight gestures and achieved 97% accuracy. (Kellogg et al., 2014)

2.5.4 SVD-PCA Approach of Neural Networks for Gesture Recognition

Singular Value Decomposition (SVD) is an approach for extracting the salient features of an image, used for data dimension reduction and training purposes. Principal Component Analysis (PCA) is a linear transformation method used in statistical techniques; it is likewise used for data dimension reduction and feature extraction.

The SVD-PCA system uses a trained hand-image dataset; hand detection is first done through a skin-detection technique. After that, various morphological operations are performed on each image to improve its quality so that it clearly shows the skin pixels. Features are then extracted from the trained image dataset using the SVD-PCA approach and used to train the network. (Sharma and Sharma, 2019)

2.5.5 Inertial Measurement Unit (IMU) Sensor-Based Hand Gesture Recognition

Sensor-based gesture recognition has gone through many developments, one of which is an algorithm that employs Dynamic Time Warping (DTW) in a restricted Coulomb energy (RCE) neural network to create a high-performance hand gesture recognition (HGR) algorithm that supports real-time learning.

The proposed HGR algorithm uses the learning method of RCE neural networks and the distance measurement scheme of DTW. It achieved a recognition accuracy of 98.6%, which is 13.2%, 10.6%, and 4% higher than that of RCE neural networks, MLPs, and DTW-based HGR algorithms, respectively. (Kim et al., 2019)
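For reference, the sketch below shows textbook DTW between two one-dimensional gesture signals (the distance measure only, in its usual dynamic-programming formulation; Kim et al.'s RCE-integrated variant differs in detail):

    import numpy as np

    def dtw_distance(a, b):
        # Dynamic-time-warping distance between two 1-D gesture signals.
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(a[i - 1] - b[j - 1])
                # Extend the cheapest of the three admissible alignments.
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]

A new gesture sample is then assigned the class of the stored template with the smallest DTW distance.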

CHAPTER THREE

METHODOLOGY

For this study, the TensorFlow Object Detection API and Python are used to create a real-time gesture detection system that uses a webcam and can detect different sign language poses. This method was chosen because its algorithm is easier to use and the code is less complex.

The processes involved in carrying out the study include:

3.1 Collecting images using Python and OpenCV

We collect images by making different sign language poses and letting the webcam capture them. These are the images the model will be trained on. For this project, we work with five expressions: hello, thank you, yes, no, and I love you. The figures below show the corresponding American Sign Language gesture for each expression.

Fig 1. Hello; Fig 2. No; Fig 3. I Love You; Fig 4. Yes; Fig 5. Thank You

Fig 6. Python code for collecting the images
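The code in Fig 6 is not reproduced in this extraction; the sketch below is a minimal reconstruction of the collection step, assuming OpenCV (cv2) and the default webcam at index 0. The output folder, label names, and image count per gesture are illustrative rather than the project's exact values.

    import os
    import time
    import uuid

    import cv2

    IMAGES_PATH = 'collectedimages'          # assumed output folder
    labels = ['hello', 'thankyou', 'yes', 'no', 'iloveyou']
    number_imgs = 15                         # assumed number of captures per gesture

    for label in labels:
        os.makedirs(os.path.join(IMAGES_PATH, label), exist_ok=True)
        cap = cv2.VideoCapture(0)            # open the default webcam
        print('Collecting images for', label)
        time.sleep(5)                        # time to get the pose ready
        for _ in range(number_imgs):
            ret, frame = cap.read()          # grab one frame
            if not ret:
                break
            imgname = os.path.join(IMAGES_PATH, label,
                                   '{}.{}.jpg'.format(label, uuid.uuid1()))
            cv2.imwrite(imgname, frame)      # save the frame to disk
            cv2.imshow('frame', frame)
            time.sleep(2)                    # pause so the pose can vary slightly
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
        cap.release()
    cv2.destroyAllWindows()

Saving each gesture into its own subfolder keeps the labelling step that follows organized.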

3.2 Labelling images for object detection

The collected images are then passed to the labelImg package, and detection boxes are drawn around the sign language poses. labelImg is an open-source package that makes it easy to label images for object detection. Here, a labelling tool is dragged over the hand gesture, which is labelled with the corresponding word it represents. This is done for each image generated.

Fig 7. Labelling the ‘hello’ image using the labelImg package
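By default, labelImg saves each annotation as a PASCAL VOC XML file alongside the image. As a sketch of what the labelling step produces, the snippet below reads one such file back using Python's standard library; the file path is hypothetical.

    import xml.etree.ElementTree as ET

    # Parse one PASCAL VOC annotation written by labelImg (path is illustrative).
    root = ET.parse('collectedimages/hello/hello.xml').getroot()
    for obj in root.findall('object'):
        name = obj.find('name').text        # the label typed in labelImg, e.g. 'hello'
        box = obj.find('bndbox')            # the detection box drawn over the hand
        coords = [int(float(box.find(tag).text))
                  for tag in ('xmin', 'ymin', 'xmax', 'ymax')]
        print(name, coords)

These label-and-box pairs are exactly what the object detector is trained to reproduce.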

3.3 Training and testing the dataset for sign language

The labelled images are then split into training and testing partitions. This allows the model to train on one set of data and be tested on another. Transfer learning is used with the TensorFlow Object Detection API to train an object detector. Here, an SSD MobileNet model is used; it is a pre-trained model that makes transfer learning faster.

Fig 8. SSD MobileNet model used for training
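Before training, the pre-trained pipeline configuration must be pointed at the five sign language classes. The sketch below shows one way to do this with the TensorFlow Object Detection API's protobuf utilities; the config path and batch size are assumptions rather than the project's exact values.

    from object_detection.protos import pipeline_pb2
    from google.protobuf import text_format

    CONFIG_PATH = 'models/my_ssd_mobnet/pipeline.config'   # assumed location

    # Read the pipeline config shipped with the pre-trained SSD MobileNet model.
    pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
    with open(CONFIG_PATH, 'r') as f:
        text_format.Merge(f.read(), pipeline_config)

    # Adapt it for transfer learning on the five gestures.
    pipeline_config.model.ssd.num_classes = 5   # hello, thank you, yes, no, I love you
    pipeline_config.train_config.batch_size = 4
    pipeline_config.train_config.fine_tune_checkpoint_type = 'detection'

    with open(CONFIG_PATH, 'w') as f:
        f.write(text_format.MessageToString(pipeline_config))

Training is then launched with the API's model_main_tf2.py script, passing this config file and a model directory.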

3.4 Detecting sign language in real time

Python and OpenCV are used to detect gestures in real time. The labelled gestures are made in front of the webcam, and the system identifies and writes out the gesture being made with its percentage accuracy.
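A minimal sketch of this detection loop is shown below, assuming the trained detector has been exported as a TensorFlow SavedModel and that a label map file exists; both paths are illustrative.

    import cv2
    import numpy as np
    import tensorflow as tf
    from object_detection.utils import label_map_util
    from object_detection.utils import visualization_utils as viz_utils

    detect_fn = tf.saved_model.load('exported-model/saved_model')   # assumed export path
    category_index = label_map_util.create_category_index_from_labelmap('label_map.pbtxt')

    cap = cv2.VideoCapture(0)
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        # The exported detector expects a batched uint8 image tensor.
        input_tensor = tf.convert_to_tensor(np.expand_dims(frame, 0), dtype=tf.uint8)
        detections = detect_fn(input_tensor)

        # Draw the recognized gesture and its confidence score onto the frame.
        viz_utils.visualize_boxes_and_labels_on_image_array(
            frame,
            detections['detection_boxes'][0].numpy(),
            detections['detection_classes'][0].numpy().astype(int),
            detections['detection_scores'][0].numpy(),
            category_index,
            use_normalized_coordinates=True,
            min_score_thresh=0.8)

        cv2.imshow('gesture detection', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):    # press q to quit
            break
    cap.release()
    cv2.destroyAllWindows()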

CHAPTER FOUR
DISCUSSION AND RESULTS
The following are the real-time results of the hand gestures made in front of the webcam, with their corresponding accuracies.

Fig 9. Thank You (gesture detection); Fig 10. Yes (gesture detection); Fig 11. I Love You (gesture detection); Fig 12. Hello (gesture detection); Fig 13. No (gesture detection)

Performance evaluation was carried out using the accuracies documented during gesture detection. Five people tested the five gestures, and the average accuracy for each gesture was calculated.

Gesture       User 1   User 2   User 3   User 4   User 5   Average
Hello         82.5%    88.0%    81.0%    83.0%    92.5%    85.4%
Thank You     81.5%    91.5%    91.0%    89.5%    86.5%    88.0%
Yes           94.6%    92.9%    93.7%    91.4%    95.2%    93.6%
No            99.5%    97.0%    99.5%    99.5%    97.5%    98.6%
I Love You    97.5%    94.6%    91.0%    88.0%    81.5%    90.5%

Table 1. Accuracy measurements

From the table above, we can see that the mean accuracies fall between 85% and 99%, which shows that the system is very accurate.
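The per-gesture averages in Table 1 can be reproduced directly (values transcribed from the table):

    import numpy as np

    scores = {
        'Hello':      [82.5, 88.0, 81.0, 83.0, 92.5],
        'Thank You':  [81.5, 91.5, 91.0, 89.5, 86.5],
        'Yes':        [94.6, 92.9, 93.7, 91.4, 95.2],
        'No':         [99.5, 97.0, 99.5, 99.5, 97.5],
        'I Love You': [97.5, 94.6, 91.0, 88.0, 81.5],
    }
    for gesture, vals in scores.items():
        print('{}: {:.1f}%'.format(gesture, np.mean(vals)))   # e.g. Hello: 85.4%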

CHAPTER FIVE

SUMMARY, RECOMMENDATION AND CONCLUSION

5.1 Summary

The overall implementation of gesture recognition in video conferencing was successful.

Responses indicated that it was easy to use and effective.

5.2 Recommendation
From this project, the use of the TensorFlow Object Detection API and Python to create real-time gesture detection has shown that it can be implemented in video conferencing. Because it uses the webcam, it makes use of already available resources, and the vocabulary can be expanded to the whole vocabulary of American Sign Language and other sign language systems.

5.3 Conclusion

The accuracy of the model fell between 85% and 99%, which means the predictions are reliable. However, the use of gesture recognition in video conferencing is an implementation that still requires more research. Such research could use other gesture recognition methods, a larger sign language vocabulary, and gesture commands covering factors not included in this project.

REFERENCES

 Egido: Videoconferencing as a Technology to Support Group Work: A Review of its Failure. Bell Communications Research, Inc., 445 South Street, Morristown, NJ (1988)

 Feenberg: The Written World: On the Theory and Practice of Computer Conferencing. Western Behavioral Sciences Institute, La Jolla, California, and San Diego State University (1989)

 Freeman and Roth: Orientation Histograms for Hand Gesture Recognition. Mitsubishi Electric Research Laboratories, Cambridge Research Center, TR-94-03a (1994)

 Kellogg et al.: Bringing Gesture Recognition to All Devices (2014)

 Kim et al.: IMU Sensor-Based Hand Gesture Recognition for Human-Machine Interfaces. School of Electronics and Information Engineering, Korea Aerospace University, Goyang-si; Department of Information and Communication Engineering, Sejong University, Seoul, Korea (2019)

 Murakami and Taguchi: Gesture Recognition Using Recurrent Neural Networks. Human Interface Laboratory, Fujitsu Laboratories Ltd., Kamikodanaka, Nakahara-ku, Kawasaki, Japan (1991)

 Sharma and Sharma: Gesture Recognition System. Computer Science Engineering, Krishna Engineering College and Inderprastha Engineering College, Ghaziabad, India (2019)

 Sonam P and Ubale: Gesture Recognition: A Review. Electronics Dept., AVCOE, Sangamner. IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) (2015)

 Wah Ng and Ranganath: Real-Time Gesture Recognition System and Application. Image and Vision Computing 20 (2002) 993–1007. Department of Electrical and Computer Engineering, National University of Singapore (2002)

 Wu and Huang: Vision-Based Gesture Recognition: A Review. Beckman Institute, 405 N. Mathews, University of Illinois at Urbana-Champaign (1999)
