
CONFERENCE CONTROL SYSTEM BASED ON GESTURE

RECOGNITION

BY

LEKE, OLUWADARA DEBORAH

2019/8155

COLLEGE OF NATURAL AND APPLIED SCIENCE

COMPUTER SCIENCE AND INFORMATION TECHNOLOGY

BELLS UNIVERSITY OF TECHNOLOGY, OTA, OGUN STATE

NOVEMBER, 2022.

DEDICATION

This report is dedicated first to Almighty God, who gave me strength throughout the period of compiling this report, and secondly to my family for their utmost support.

CERTIFICATION

This is to certify that this report on “Conference Control System based on Gesture Recognition”

was carried out by LEKE OLUWADARA DEBORAH, with matriculation number: 2019/8155

from the Department of Computer Science and Information Technology.

ACKNOWLEDGEMENT

I would like to thank the people who have helped me the most throughout my project. I am highly

grateful to my supervisor, Dr. Sotanwa, for her nonstop support of the project and her ever-

readiness to help at all stages.

A special thank you goes to my colleagues, who exchanged ideas with me and helped me complete the project with accurate information.

I wish to thank my parents for their support, understanding, patience and endurance.

Lastly, I want to thank my friends, who encouraged me, and finally, God, who made everything

possible for me until the end.

TABLE OF CONTENTS

DEDICATION
CERTIFICATION
ACKNOWLEDGEMENT

CHAPTER ONE
INTRODUCTION

1.1 Background of study

Conferencing is important in any organization. Conferences support many key business processes of relevance to the public and private sectors alike. They aid decision making and are essential to many organizational tasks that cannot be carried out by individuals (Nunamaker et al., 1997).

Conferencing has evolved over many years from physical gatherings to technology-supported meetings. Key conferencing technologies include web conferencing, video conferencing, teleconferencing, synchronous conferencing, data conferencing, and conference calls.

These conferencing technologies do not support natural human–computer interaction (HCI) and have proved to be unproductive and time consuming, often leaving participants unsatisfied.

This has caused growing interest in the development of new approaches and technologies for bridging the human–computer barrier, where interactions with computers become as natural as interactions between humans and the amount of time spent on conferencing platforms is reduced significantly. (Lind et al., 2006)

Gesture recognition is one such new approach. Recognizing gestures is a complex task, but hand movements are generally the predominant form of body language, and gestures performed with the hands give better expression to one's words.

1.2 Statement of problem

In a conference with a large number of participants, it is not easy to get permission or time to speak, ask something, or draw attention to yourself. Likewise, drawing attention away from yourself is not fast enough, especially when there is a need to quickly mute the sound because of loud, distracting noises in the room.

It is important to build connections with other participants in a conference and have a

collaborative experience with them.

A more visual presentation of information and technicalities would save time and improve

understanding within a group of people during conferencing.

1.3 Aim and objectives

The major aim of this work is to develop a conference system based on gesture recognition and establish natural interaction within its interface.

The specific objectives of this study are:

 To examine the nature of conferencing and the components of various conference control systems

 To examine whether automated conferencing systems improve efficiency and effectiveness as they were designed to do

 To design a system that brings about improvements in conferences and group collaboration

 To implement a system that tackles the issues of recognition in virtual conferences.

1.4 Research Methodology

The method adopted for the collection of data for this project was based on research questions, which include:

 What is the nature of conferencing and what are the components of various conference control systems?

 Do automated conferencing systems improve efficiency and effectiveness as they were designed to do?

 Will the system bring about improvements in conferences and group collaboration?

 Will the implemented system tackle the issues of recognition in virtual conferences?

1.5 Justification of the Study

This study is important because conferencing is necessary wherever human beings seek to solve a problem and decide how to go about it. The use of gesture recognition in conferencing would ensure that the decision-making process of individuals in a conference is effective and productive by:

 Ensuring more control over large-scale events.

 Enhancing collaboration between attendees of conferences.

 Allowing a natural interaction with the GUI of the conferencing application.

 Reducing the amount of time spent on conferencing.

1.6 Scope of the study

In this research, we propose a new architecture that a user, government agency, or organization can apply to improve the entire conferencing solution. In the proposed solution architecture, Python was used as the programming language.

The scope of this work includes the following:

 A dynamic network system that can communicate in real time

 The implementation of a video streaming server

 An exploration of the power of Python in data handling

1.7 Limitation of the study

Financial Constraints: The researcher had limited funds and could not visit all the areas to get responses from respondents, but was able to get good information concerning the research topic.

Time Constraints: The researcher was involved in other departmental activities such as seminars and lectures, which limited the time available for the research, but the researcher was able to meet the time assigned for the completion of the research work.

CHAPTER TWO

LITERATURE REVIEW

2.1 Gesture

Body gestures vary; they include eye movements, variation in the pitch of vocal sounds, and more, but hand movements are generally the predominant form of body language. Hand gestures articulate words better, for example when representing a number or expressing a feeling (Sharma and Sharma, 2019).

Gestures are related to gesticulation, language-like gestures, pantomimes, emblems, and sign

language. Sign languages are characterized by a specific set of vocabulary and grammar.

Emblems are informal gestural expressions in which the meaning depends on convention,

culture and lexicon. (Wu and Huang, 1999)

2.1.1 Categories of gestures

Gestures are categorized according to:

I. Orientation of the hands

Here we have two types of gestures:

 Static Gesture: A static gesture is a specific pose represented by a single

image.

 Dynamic Gesture: A dynamic gesture is a moving gesture represented by a

sequence of images. (Freeman and Roth, 1994)

II. Application Scenarios

According to different application scenarios, hand gestures can be classified into

several categories such as:

 Controlling gestures: Controlling gestures are the focus of current research in vision-based interfaces. Pointing gestures, a type of controlling gesture, can be used to locate virtual objects and are demonstrated in display-control applications. Navigating gestures are another type of controlling gesture that use the orientation of the hands to capture 3D directional input for navigating within virtual environments.

 Manipulative gestures: Manipulative gestures serve as a natural way to interact with virtual objects. Tele-operation and virtual assembly are good examples of applications.

 Communicative gestures: Sign language is an important case of communicative gestures. Since sign languages are highly structured, they are very suitable for testing vision algorithms and are a good way to help the disabled interact with computers.

(Wu and Huang, 1999)

2.2 Gesture Recognition

Gesture recognition is a branch of computer vision and human–computer interaction by

which gestures made by users are used to convey information or control devices. It

works by a camera reading the movements of the human body and communicating the

data to a computer that uses the gestures as input to control devices or applications.

(Sonam P and Ubale, 2015)

It can also be defined as a user interface that recognizes and captures human gestures and motions. It is used to help the physically impaired interact with computers, for example by interpreting sign language. It also changes the way users interact with computers by eliminating input devices such as joysticks, mice, and keyboards, allowing the body to give signals to the computer freely through gestures such as finger pointing. Gesture recognition technology can also be used to read facial expressions, speech expressions (i.e., lip reading), and eye movements. (Sonam P and Ubale, 2015)

2.2.1 Categories of Gesture Recognition

Hand Gesture Recognition can be categorized into:

 Vision-based gesture recognition (VGR): VGR recognizes gestures using

camera images, and various technologies have been proposed. A disadvantage

of VGR is that its accuracy degrades in light-sensitive application scenarios

because camera images are affected by lighting conditions.

 Sensor-based gesture recognition (SGR): SGR methods use various sensors that are not affected by lighting conditions, such as inertial measurement unit (IMU) sensors, electromyography (EMG) sensors, brain-wave sensors, electrocardiograph sensors, and radar sensors, and they have few limitations. (Kim et al., 2019)

2.3 Conferencing

According to Wiktionary, conferencing is the act of consulting together as a group to discuss and exchange views on issues about a topic of interest.

As mentioned earlier, there are various types of computer-mediated conferencing. These include web conferencing, video conferencing, teleconferencing, synchronous conferencing, data conferencing, and conference calls.

Teleconferencing is the medium through which people meet regardless of their physical locations. It makes use of electronic telecommunications to enable users to meet. (Egido, 1988)

For this research work, the main focus is video conferencing. Videoconferencing is often described as a communication mode that bridges the gap between telephone calls and face-to-face meetings. It has been commercially available for over two decades and originated over thirty years ago. It was used mainly for corporate meetings such as annual stockholders' meetings. (Egido, 1988)

2.4 Gesture Recognition in Video Conferencing

Early in 2022, the popular video-meeting giant Zoom was reported to be adding a set of new features, including a gesture recognition feature to enable raised-hand and thumbs-up reactions. It is a great step towards the implementation of gesture recognition in video conferencing, but gesture recognition still needs wider adoption in video conferencing. (Tung, 2022)

Video conferencing, like every other kind of technology, has its flaws, and one of them is the language barrier. Everybody in a video conference has to be able to communicate in the same language for communication to be effective. In particular, a person who communicates in sign language may not be understood by other participants unless they also understand sign language. Gesture recognition can be used to create a gesture-to-text system for sign language, which would effectively remove this barrier. (Murakami and Taguchi, 1991)

Also, video conferencing has been marketed as a direct replacement for face-to-face meetings, which raises users' expectations, but it is not. Unlike face-to-face meetings, where minimal preparation is needed, video conferencing requires proper preparation for things to run smoothly and quickly during the meeting. Also, unlike face-to-face meetings, where expressions can be read easily, video conferences do not work that way. Implementing gesture recognition in video conferencing can cut down on preparation time, as it allows users to give expressions during meetings. (Egido, 1988)

Navigating the GUI of a video conferencing platform can be demanding while still trying to concentrate on the meeting itself. Implementing gesture recognition as a user interface would make things easier. For example, muting one's audio with the keyboard in time when there is surrounding noise is slower than muting it with a gesture. (Feenberg, 1989)

2.5 Review of Related Work

2.5.1 Hand Gesture Recognition through Orientation Histograms

Research has shown that gesture recognition can be carried out using histograms. Freeman and Roth used orientation histograms as a pattern-recognition technique because the algorithm is simple and fast, and relatively robust to changes in lighting. (Freeman and Roth, 1994)
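As an illustration of the idea (a minimal sketch under simple assumptions, not Freeman and Roth's exact pipeline), an orientation histogram can be computed by histogramming local gradient directions over a grayscale hand image; because edge orientation changes far less than pixel intensity when illumination changes, the resulting feature is relatively insensitive to lighting:

    import cv2
    import numpy as np

    def orientation_histogram(gray, bins=36):
        # Gradient direction at each pixel of a grayscale hand image.
        gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
        gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
        mag, ang = cv2.cartToPolar(gx, gy, angleInDegrees=True)
        # Keep only orientations where the edge response is strong.
        strong = mag > mag.mean()
        hist, _ = np.histogram(ang[strong], bins=bins, range=(0.0, 360.0))
        return hist / max(hist.sum(), 1)   # normalized histogram as the feature vector

Two gestures can then be compared by a simple distance between their histogram vectors.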

2.5.2 Real-time vision based gesture recognition system

A vision-based system that can interpret a user’s gestures in real time to manipulate

windows and objects within a graphical user interface was developed using a hand

segmentation procedure that first extracts binary hand blob(s) from each frame of the

acquired image sequence. Fourier descriptors were used to represent the shape of the hand

blobs, which were fed into radial-basis function (RBF) networks for pose classification.

Gesture recognition performances using hidden Markov models (HMM) and recurrent

neural networks (RNN) were investigated. Test results showed that the continuous HMM

yielded the best performance with gesture recognition rates of 90.2%. Experiments with

combining the continuous HMMs and RNNs revealed that a linear combination of the two

classifiers improved the classification results to 91.9%. The gesture recognition system was

deployed in a prototype user interface application, and users who tested it found the

gestures intuitive and the application easy to use. Real-time processing rates of up to 22

frames per second were obtained. (Wah Ng and Ranganath, 2002)

2.5.3 Bringing Gesture Recognition to All Devices.

In 2014, AllSee was developed. It was a hand gesture technology that could work across all devices, including those with no batteries, at a time when existing gesture-recognition systems consumed significant power and computational resources. It consumed three to four orders of magnitude less power than more advanced systems and can enable always-on gesture recognition for smartphones and tablets. It extracts gesture information from existing wireless signals (e.g., TV transmissions) in the surroundings without incurring the power and computational overheads of prior wireless approaches. It was tested over a set of eight gestures and achieved 97% accuracy. (Kellogg et al., 2014)

2.5.4 SVD-PCA Approach of Neural Networks for Gesture Recognition

Singular Value Decomposition (SVD) is an approach for extracting the salient features of an image, used for data dimension reduction and training purposes. Principal Component Analysis (PCA) is a linear transformation method used in statistical techniques; it is likewise used for data dimension reduction and feature extraction.

The SVD-PCA system uses a trained hand-image dataset; hand detection is first done through a skin-detection technique. After that, various morphological operations are performed on each image to improve its quality so that it clearly shows the skin pixels. Features are then extracted from the trained image dataset using the SVD-PCA approach and used to train the network. (Sharma and Sharma, 2019)

2.5.5 Inertial Measurement Unit (IMU) Sensor-Based Hand Gesture Recognition

Sensor-based gesture recognition has gone through many developments, one of which is an algorithm that employs Dynamic Time Warping (DTW) in a restricted Coulomb energy (RCE) neural network to create a high-performance hand gesture recognition (HGR) algorithm that supports real-time learning.

The proposed HGR algorithm uses the learning method of RCE neural networks and the distance measurement scheme of DTW. It achieved a recognition accuracy of 98.6%, which is 13.2%, 10.6%, and 4% higher than that of RCE neural networks, MLPs, and DTW-based HGR algorithms, respectively. (Kim et al., 2019)
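For reference, the sketch below shows textbook DTW between two one-dimensional gesture signals (the distance measure only, in its usual dynamic-programming formulation; Kim et al.'s RCE-integrated variant differs in detail):

    import numpy as np

    def dtw_distance(a, b):
        # Dynamic-time-warping distance between two 1-D gesture signals.
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(a[i - 1] - b[j - 1])
                # Extend the cheapest of the three admissible alignments.
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]

A new gesture sample is then assigned the class of the stored template with the smallest DTW distance.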

CHAPTER THREE

METHODOLOGY

For this study, the TensorFlow Object Detection API and Python are used to create a real-time gesture detection system that uses a webcam and can detect different sign language poses. This method was chosen because its algorithm is easier to use and the code is less complex.

The processes involved in carrying out the study include:

3.1 Collecting images using Python and OpenCV

We collect images by making different sign language poses and letting the webcam capture them. These are the images the model will be trained on. For this project, we work with five expressions: hello, thank you, yes, no, and I love you. The figures below show the corresponding American Sign Language gesture for each expression.

Fig 1. Hello; Fig 2. No; Fig 3. I Love You; Fig 4. Yes; Fig 5. Thank You

Fig 6. Python code for collecting the images
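The code in Fig 6 is not reproduced in this extraction; the sketch below is a minimal reconstruction of the collection step, assuming OpenCV (cv2) and the default webcam at index 0. The output folder, label names, and image count per gesture are illustrative rather than the project's exact values.

    import os
    import time
    import uuid

    import cv2

    IMAGES_PATH = 'collectedimages'          # assumed output folder
    labels = ['hello', 'thankyou', 'yes', 'no', 'iloveyou']
    number_imgs = 15                         # assumed number of captures per gesture

    for label in labels:
        os.makedirs(os.path.join(IMAGES_PATH, label), exist_ok=True)
        cap = cv2.VideoCapture(0)            # open the default webcam
        print('Collecting images for', label)
        time.sleep(5)                        # time to get the pose ready
        for _ in range(number_imgs):
            ret, frame = cap.read()          # grab one frame
            if not ret:
                break
            imgname = os.path.join(IMAGES_PATH, label,
                                   '{}.{}.jpg'.format(label, uuid.uuid1()))
            cv2.imwrite(imgname, frame)      # save the frame to disk
            cv2.imshow('frame', frame)
            time.sleep(2)                    # pause so the pose can vary slightly
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
        cap.release()
    cv2.destroyAllWindows()

Saving each gesture into its own subfolder keeps the labelling step that follows organized.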

3.2 Labelling images for object detection

The collected images are then passed to the labelImg package, and detection boxes are drawn around the sign language poses. labelImg is an open-source package that makes it easy to label images for object detection. Here, a labelling tool is dragged over the hand gesture, which is labelled with the corresponding word it represents. This is done for each image generated.

Fig 7. Labelling the ‘hello’ image using the labelImg package
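By default, labelImg saves each annotation as a PASCAL VOC XML file alongside the image. As a sketch of what the labelling step produces, the snippet below reads one such file back using Python's standard library; the file path is hypothetical.

    import xml.etree.ElementTree as ET

    # Parse one PASCAL VOC annotation written by labelImg (path is illustrative).
    root = ET.parse('collectedimages/hello/hello.xml').getroot()
    for obj in root.findall('object'):
        name = obj.find('name').text        # the label typed in labelImg, e.g. 'hello'
        box = obj.find('bndbox')            # the detection box drawn over the hand
        coords = [int(float(box.find(tag).text))
                  for tag in ('xmin', 'ymin', 'xmax', 'ymax')]
        print(name, coords)

These label-and-box pairs are exactly what the object detector is trained to reproduce.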

3.3 Training and testing the dataset for sign language

The labelled images are then split into training and testing partitions. This allows the model to train on one set of data and be tested on another. Transfer learning is used with the TensorFlow Object Detection API to train an object detector. Here, an SSD MobileNet model is used; it is a pre-trained model that makes transfer learning faster.

Fig 8. SSD MobileNet model used for training
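Before training, the pre-trained pipeline configuration must be pointed at the five sign language classes. The sketch below shows one way to do this with the TensorFlow Object Detection API's protobuf utilities; the config path and batch size are assumptions rather than the project's exact values.

    from object_detection.protos import pipeline_pb2
    from google.protobuf import text_format

    CONFIG_PATH = 'models/my_ssd_mobnet/pipeline.config'   # assumed location

    # Read the pipeline config shipped with the pre-trained SSD MobileNet model.
    pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
    with open(CONFIG_PATH, 'r') as f:
        text_format.Merge(f.read(), pipeline_config)

    # Adapt it for transfer learning on the five gestures.
    pipeline_config.model.ssd.num_classes = 5   # hello, thank you, yes, no, I love you
    pipeline_config.train_config.batch_size = 4
    pipeline_config.train_config.fine_tune_checkpoint_type = 'detection'

    with open(CONFIG_PATH, 'w') as f:
        f.write(text_format.MessageToString(pipeline_config))

Training is then launched with the API's model_main_tf2.py script, passing this config file and a model directory.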

3.4 Detecting sign language in real time

Python and OpenCV are used to detect gestures in real time. The labelled gestures are made in front of the webcam, and the system identifies and writes out the gesture being made with its percentage accuracy.
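A minimal sketch of this detection loop is shown below, assuming the trained detector has been exported as a TensorFlow SavedModel and that a label map file exists; both paths are illustrative.

    import cv2
    import numpy as np
    import tensorflow as tf
    from object_detection.utils import label_map_util
    from object_detection.utils import visualization_utils as viz_utils

    detect_fn = tf.saved_model.load('exported-model/saved_model')   # assumed export path
    category_index = label_map_util.create_category_index_from_labelmap('label_map.pbtxt')

    cap = cv2.VideoCapture(0)
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        # The exported detector expects a batched uint8 image tensor.
        input_tensor = tf.convert_to_tensor(np.expand_dims(frame, 0), dtype=tf.uint8)
        detections = detect_fn(input_tensor)

        # Draw the recognized gesture and its confidence score onto the frame.
        viz_utils.visualize_boxes_and_labels_on_image_array(
            frame,
            detections['detection_boxes'][0].numpy(),
            detections['detection_classes'][0].numpy().astype(int),
            detections['detection_scores'][0].numpy(),
            category_index,
            use_normalized_coordinates=True,
            min_score_thresh=0.8)

        cv2.imshow('gesture detection', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):    # press q to quit
            break
    cap.release()
    cv2.destroyAllWindows()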

CHAPTER FOUR
DISCUSSION AND RESULTS
The following are the real-time results of the hand gestures made in front of the webcam, with their corresponding accuracies.

Fig 9. Thank You (gesture detection); Fig 10. Yes (gesture detection); Fig 11. I Love You (gesture detection); Fig 12. Hello (gesture detection); Fig 13. No (gesture detection)

Performance evaluation was carried out using the accuracies documented during gesture detection. Five people tested the five gestures, and the average accuracy for each gesture was calculated.

Gesture       User 1   User 2   User 3   User 4   User 5   Average
Hello         82.5%    88.0%    81.0%    83.0%    92.5%    85.4%
Thank You     81.5%    91.5%    91.0%    89.5%    86.5%    88.0%
Yes           94.6%    92.9%    93.7%    91.4%    95.2%    93.6%
No            99.5%    97.0%    99.5%    99.5%    97.5%    98.6%
I Love You    97.5%    94.6%    91.0%    88.0%    81.5%    90.5%

Table 1. Accuracy measurements

From the table above, we can see that the mean accuracies fall between 85% and 99%, which shows that the system is very accurate.
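The per-gesture averages in Table 1 can be reproduced directly (values transcribed from the table):

    import numpy as np

    scores = {
        'Hello':      [82.5, 88.0, 81.0, 83.0, 92.5],
        'Thank You':  [81.5, 91.5, 91.0, 89.5, 86.5],
        'Yes':        [94.6, 92.9, 93.7, 91.4, 95.2],
        'No':         [99.5, 97.0, 99.5, 99.5, 97.5],
        'I Love You': [97.5, 94.6, 91.0, 88.0, 81.5],
    }
    for gesture, vals in scores.items():
        print('{}: {:.1f}%'.format(gesture, np.mean(vals)))   # e.g. Hello: 85.4%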

CHAPTER FIVE

SUMMARY, RECOMMENDATION AND CONCLUSION

5.1 Summary

The overall implementation of gesture recognition in video conferencing was successful.

Responses indicated that it was easy to use and effective.

5.2 Recommendation
From this project, the use of the TensorFlow Object Detection API and Python to create real-time gesture detection has shown that it can be implemented in video conferencing. Because it uses the webcam, it makes use of already available resources, and the vocabulary can be expanded to the whole vocabulary of American Sign Language and other sign language systems.

5.3 Conclusion

The accuracy of the model fell between 85% and 99%, which means the predictions are reliable. However, the use of gesture recognition in video conferencing is an implementation that still requires more research. Such research could use other gesture recognition methods, a larger sign language vocabulary, and gesture commands covering factors not included in this project.

REFERENCES

 Egido: Videoconferencing as a Technology to Support Group Work: A Review of its Failure. Bell Communications Research, Inc., 445 South Street, Morristown, NJ (1988)

 Feenberg: The Written World: On the Theory and Practice of Computer Conferencing. Western Behavioral Sciences Institute, La Jolla, California, and San Diego State University (1989)

 Freeman and Roth: Orientation Histograms for Hand Gesture Recognition. Mitsubishi Electric Research Laboratories, Cambridge Research Center, TR-94-03a (1994)

 Kellogg et al.: Bringing Gesture Recognition to All Devices (2014)

 Kim et al.: IMU Sensor-Based Hand Gesture Recognition for Human-Machine Interfaces. School of Electronics and Information Engineering, Korea Aerospace University, Goyang-si; Department of Information and Communication Engineering, Sejong University, Seoul, Korea (2019)

 Murakami and Taguchi: Gesture Recognition Using Recurrent Neural Networks. Human Interface Laboratory, Fujitsu Laboratories Ltd., Kamikodanaka, Nakahara-ku, Kawasaki, Japan (1991)

 Sharma and Sharma: Gesture Recognition System. Computer Science Engineering, Krishna Engineering College and Inderprastha Engineering College, Ghaziabad, India (2019)

 Sonam P and Ubale: Gesture Recognition: A Review. Electronics Dept., AVCOE, Sangamner. IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) (2015)

 Wah Ng and Ranganath: Real-Time Gesture Recognition System and Application. Image and Vision Computing 20 (2002) 993–1007. Department of Electrical and Computer Engineering, National University of Singapore (2002)

 Wu and Huang: Vision-Based Gesture Recognition: A Review. Beckman Institute, 405 N. Mathews, University of Illinois at Urbana-Champaign (1999)
