
PROJECT REPORT ON

PERIOCULAR BIOMETRICS
UNDER RELAXED CONSTRAINTS

BY

UDIT SHARMA (1409131103)
RISHI TRIPATHI (1409131060)
AMAN KUSHWAHA (1409131009)

Under the Guidance of

Mrs. Ruchi Paliwal


Associate Professor

DEPARTMENT OF ELECTRONICS & COMMUNICATION


ENGINEERING
JSS ACADEMY OF TECHNICAL EDUCATION
C-20/1 SECTOR-62, NOIDA

MAY, 2018

i
Project Report On

PERIOCULAR BIOMETRICS
AND
ANALYSIS UNDER RELAXED CONSTRAINTS

by

UDIT SHARMA (1409131103)
RISHI TRIPATHI (1409131060)
AMAN KUSHWAHA (1409131009)

Under the Guidance of

Mrs. Ruchi Paliwal, Associate Professor

Submitted to the Department of Electronics & Communication Engineering


in partial fulfillment of the requirements
for the degree of
Bachelor of Technology
in
Electronics & Communication Engineering

ii
JSS Academy of Technical Education, Noida
Dr APJ ABDUL KALAM Technical University, Lucknow
May – 2018
DECLARATION

We hereby declare that this submission is our own work and that, to the best of our

knowledge and belief, it contains no material previously published or written by

another person nor material which to a substantial extent has been accepted for the

award of any other degree or diploma of the university or other institute of higher

learning, except where due acknowledgment has been made in the text.

Signature
1. Udit Sharma
1409131103

Signature
2. Rishi Tripathi
1409131060

Signature
3. Aman Kushwaha
1409131009

iii
CERTIFICATE
 

This is to certify that the Project Report entitled “Periocular Recognition and Analysis Under Relaxed Constraints”, which is submitted by Udit Sharma, Rishi Tripathi and Aman Kushwaha in partial fulfillment of the requirements for the award of the degree of B.Tech. in the Department of Electronics and Communication Engineering of Dr APJ ABDUL KALAM Technical University, Lucknow, is a record of the candidates' own work carried out by them under my supervision. The matter embodied in this report is original and has not been submitted for the award of any other degree.

Signature
Mrs Ruchi Paliwal
Associate Professor

Date:

iv
ACKNOWLEDGEMENT

The successful completion of this report would not have been possible without the help and guidance of many people, and we take this opportunity to convey our gratitude to them.

We would like to express our greatest appreciation to Prof. Sampath Kumar, Head of Department, Electronics and Communication Engineering. We cannot thank him enough for his tremendous support and help; his encouragement and guidance kept us motivated, and without them this report would not have materialized.

We feel privileged to express our deep regards and gratitude to our mentor, Mrs. Ruchi Paliwal, without whose active support this project would not have been possible. We also owe sincere regards to the Departmental Evaluation Committee, ECE, who motivated us to achieve the desired goal.

Signature
1. Udit Sharma
1409131103

Signature
2. Rishi Tripathi
1409131060

Signature
3. Aman Kushwaha
1409131009

v
ABSTRACT

Ocular recognition is expected to provide higher flexibility in handling practical applications as opposed to iris recognition, which works only in the ideal open-eye case. However, the accuracy of recent efforts is still far from satisfactory under uncontrolled conditions, such as eye blinking, during which the eyes may take arbitrary poses. To address these issues, the skin texture, eyelids, and additional geometric features are employed. In addition, to achieve higher accuracy, sequential forward floating selection (SFFS) is utilized to select the best feature combinations. Finally, a non-linear SVM is applied for identification. Experimental results demonstrate that the proposed algorithm achieves the best accuracy for both the open-eye and blinking-eye scenarios. As a result, it offers greater flexibility for prospective subjects during recognition as well as higher reliability for security.

vi
TABLE OF CONTENTS
Page No.
DECLARATION….................................................................................................. iii
CERTIFICATE.................................................................................................……. iv
ACKNOWLEDGEMENT........................................................................................ v
ABSTRACT ............................................................................................................. vi
LIST OF FIGURES.................................................................................................. x
LIST OF TABLES.................................................................................................... xi
LIST OF ABBREVIATIONS………………………………………………………xii
LIST OF SYMBOLS................................................................................................ xiii
CHAPTER 1 - INTRODUCTION TO THE PROJECT
1.1 Description……………............................................................. 1
1.2 Flow Chart of Project………………………………………….. 2
1.3 Block Diagram of Project.......................................................... 3
1.4 Problem Statement..................................................................... 4
CHAPTER 2 - Theoretical Background and Literature Review
2.1. Introduction to RFID................................................................ 5
2.1.1 Components of RFID System................................... 5
2.1.2 RFID Tags.................................................................. 6
2.1.3 Frequency Range of RFID......................................... 8
2.1.4 Readers....................................................................... 8
2.1.5 Principle of Working ................................................. 9
2.1.6 Advantages................................................................. 10
2.1.7 Application................................................................. 11
2.2 Introduction to GSM….............................................................. 11
2.2.1 GSM-Architecture...................................................... 12
2.2.2 GSM Network Areas…............................................... 13
2.2.3 Other GSM Specifications........................................... 14
2.2.1 Advantages…….......................................................... 16
2.3 Introduction to Power Supply..................................................... 16
2.3.1 Bridge wave Rectifier…….………………………….. 17
2.3.2 Voltage Regulator……………………………………. 17
2.3.3 Basic Rectifier Operation….…………………………. 19

vii
CHAPTER 3 - Microcontroller and Its Specifications
3.1 Introduction To Embedded System........................................... 21
3.1.1 User Interface............................................................... 21
3.1.2 Simple Systems............................................................ 22
3.2 Introduction To Microcontroller................................................ 22
3.3 Microcontroller Core System.................................................... 22
3.3.1 Pin Diagram 89s52....................................................... 23
3.3.2 Functional Block Diagram............................................ 24
3.3.3 Pin Description..............................................................
25
3.3.4 Program Memory.......................................................... 27
3.3.5 Data Memory................................................................ 27
3.3.6 Watchdog Timer............................................................ 28
3.3.7 Baud Rate Generator..................................................... 28
3.3.8 Programmable Clock Out.............................................. 29
3.3.9 Interrupts........................................................................ 29
3.3.10 Oscillator Characteristics............................................... 30
3.3.11 Idle Mode ...................................................................... 31
3.3.12 Data Polling................................................................... 31
CHAPTER 4 - GSM Modem (SIM300) Specifications
4.1 Introduction to SIM300................................................................. 32
4.2 Product Concept……………......................................................... 32
4.3 Power Supply of SIM300............................................................... 35
4.4 Serial Interfaces............................................................................. 35
CHAPTER 5 – Specification of Other Components Used
5.1 Introduction to LCD…................................................................... 37
5.1.1 Interfacing LCD Module……………………………….. 37
5.1.2 Basic 16X 2 Characters LCD - Black On Green 5V.….. 38
5.1.3 Checking LCD Busy Status…………………………….. 39
5.1.4 Advantages Of Graphic LCD…………………………… 40
5.2 Introduction to LED……………………………………………….. 41
5.3 Introduction to Switch…………………………………………….. 42
CHAPTER 6 – Software Description
6.1 Introduction to KEIL….................................................................. 43

viii
6.2 PRO51 Burner Software……......................................................... 46
FUTURE SCOPE........................................................................................ 47
CONCLUSION........................................................................................... 48
APPENDIX………………………………………………………………... 49
BIBLIOGRAPHY…………………………………………………………. 54

ix
LIST OF FIGURES

Page
Figure 1.1 Block diagram of project…………………………………………………. 03

Figure 2.1 WORKING OF RFID SYSTEM…………………………..…………….. 10

Figure 2.2 GSM Network...…………………………………………...……………... 13

Figure 2.3 Block Diagram of Power Supply …………………………………………16

Figure 2.4 Bridge wave Rectifier…………………………….……………………….17

Figure 2.5 Voltage Regulator IC……………………………………………………...18

Figure 2.6 Circuit Diagram Of Power Supply……………………………................. 19

Figure 2.7(a) Bridge Rectifier……………….……………………………................. 19

Figure 2.7(b) Bridge Rectifier ……………………………………………................ 20

Figure 3.1 Pin Diagram of 89s52…………………………………………………….23

Figure 3.2 FUNCTIONAL BLOCK DIAGRAM OF 89s52………………............... 24

Figure 4.1 VBAT input…………………………………………………................... 35

Figure 4.2 Serial Communication Connection………………………………………..36

Figure 5.1 Graphical LCD…………………………………………………………… 38

Figure 5.2 LCD Connections with Microcontroller…………………………………..40

Figure 5.3 Led………………………………………………………………………...41

Figure 5.4 Switches…………………………………………………..……………….42

Figure 5.5 Internal Circuit Of a Manual Switch………………………………..……. 42

x
LIST OF TABLES

Page
Table 3.1 TCON Status Word…………………………………………………….............. 16

Table 3.2 TCON Bit Representation………………………………………………………. 16

Table 4.1 SIM300 key features …………………………………………………………….33

Table 4.2 Logic levels of serial ports pins Parameter……………………………………… 36

xi
LIST OF ABBREVIATIONS

RFID Radio Frequency Identification

LCD Liquid Crystal Display

RF Radio Frequency

UHF Ultra High Frequency

EPC Electronic Product Code

UPC Universal Product Code

UCC Universal Commercial Code

DTL Diode Transistor Logic

TTL Transistor-Transistor Logic

AVR Advanced Virtual RISC

CPU Central Processing Unit

RAM Random Access Memory

ROM Read Only Memory

EEPROM Electrically Erasable Read Only Memory

CMOS Complementary Metal Oxide Semiconductor

RST Reset

ALE Address Enable Latch

PSEN Program Storage enable

CLK Clock

EXEN External Enable

xii
WDT Watch Dog Timer

UART Universal Asynchronous Receiver/Transmitter

ISP Internet Service Provider

LED Light Emitting Diode

GND Ground

R/W Read/Write

I/P Input

O/P Output

E Enable

xiii
LIST OF SYMBOLS

µ Micro
µf Micro Farad
pf Pico Farad
MHz Mega Hertz
KHz Kilo Hertz
ms Milli Second
A Ampere
V Volt
Deg Degree
Ʈ Time Constant
R Resistor
C Capacitor
L Inductor
Eavg Average Voltage
Vss Supply Voltage
Ven Enable Voltage
Iₒ Peak Current
Vi Input Voltage
Ptot Total power Dissipation
°C Degree Celsius
Tj Junction temperature

xiv
CHAPTER 1

INTRODUCTION

1.1 DESCRIPTION

In recent years, biometrics has become an important part of various applications, including identification, video surveillance, and recognition systems, because it exploits invariant biological characteristics to yield robust features. Among the various biometric traits, some particular regions of the face are commonly considered highly discriminative, such as the eye region, which is exploited in iris and ocular recognition.

The human iris exhibits a complicated textural pattern on its anterior surface. An iris
recognition system exploits the perceived uniqueness of this pattern to distinguish individuals.
The key processing steps of an iris recognition system are: (a) acquiring the iris imagery; (b)
locating and segmenting the iris; (c) encoding the textural patterns as feature templates; and
(d) matching the templates across an existing database for determining identity. A majority of
iris recognition systems require a considerable amount of user participation. The iris
information captured by the sensor is either processed immediately, or stored in a database for
later processing. The biometric cue resident in an iris image depends on at least two factors:
(a) the quality of the image; and (b) the spatial extent of the iris present in the captured image.
Both these factors can be regulated at the image acquisition stage to achieve reliable accuracy.
However, such a regulation is possible only when the iris recognition system is employed in
an overt situation involving cooperative subjects. Acquiring the iris information becomes
extremely challenging in covert operations or in situations involving a non-cooperative
subject. Several challenges such as moving subjects, motion blur, occlusions, improper
illumination, off-angled irises, specular reflection, and poor image resolution adversely affect
the biometric content of the iris data. In such situations, the reliability of the iris data could be
improved by fusing it with information from the surrounding regions of the eye.

1.2 Periocular Biometrics


A fixed region surrounding the iris of an individual is referred to as the periocular region. Depending on the size of the image used, this region usually encompasses the eyelids, eyelashes, eyebrows, and the neighboring skin area. Using the periocular region has the following advantages:
(a) the information regarding the shape of the eye and the texture of the surrounding skin varies across individuals and can be used as a soft biometric trait;
(b) no additional sensors, besides the iris camera, are required to acquire the periocular data.

Periocular skin texture has been used for human identification in various ways. Jain
et al. used it to detect micro-features such as moles, scars, or freckles and used them as soft biometric traits. Others adopt a more general representation of the overall texture to facilitate recognition using popular texture measures such as Discrete Cosine Transformations (DCT) [4], Gradient Orientation Histograms (GOH), or Local Binary Patterns (LBP). Jillela and Ross's method considered the scale-invariant feature transform (SIFT) and local binary patterns (LBP) to extract features on both the face and ocular regions, forming a multimodal biometric system. On the other hand, Woodard et al.'s method extracted both rotation-invariant LBP and color information directly as features for recognition. However, only the periocular region is involved, and the most discriminative eye region is omitted from the description. Park et al.'s method employed SIFT, LBP, and gradient orientation for feature extraction. However, the localization method, which utilizes either the LBP or the gradient orientation, fails when the eyes are not horizontally symmetric and gazing forward. In summary, the common drawback of the aforementioned methods is that they consider only the periocular region without the eye region itself. Based on our observations, the essential eye features can contribute to a significant improvement in distinguishability.

This study focuses on ocular recognition to compensate for the above-mentioned issues and the shortcomings of iris recognition. In this algorithm, the landmark points of the eyes are first detected. To suppress the influence of skin color and lighting conditions, geometric features, which describe the contours of the eyes, are considered. The uniform texture-based LBP (UT-LBP) is applied to extract the texture property of the selected regions. In addition, the probabilities of single- and double-fold eyelids are also derived for description. Finally, the non-linear support vector machine (SVM) is applied for classification. Experimental results suggest that the proposed ocular recognition method achieves excellent performance on three different face databases, which in turn suggests that the proposed method is an attractive candidate for practical biometric applications.

2
1.3 Advantage of Ocular Over Iris Recognition

Compared with the aforementioned biometrics, the benefits of ocular recognition are as follows:

 Iris recognition has a relatively high discriminative capability, yet it cannot handle the "blinking eye" scenario since the iris is occluded by the eyelids; the eyes may also take arbitrary poses during blinking. Ocular recognition is not affected by blinking and thus achieves greater robustness.
 The eye region provides eye-shape information, which is useful for identification.
 Ocular recognition has a greater tolerance in handling a broad range of distances.
 Ocular recognition can still cope with faces that are partially covered in particular regions, such as the nose and mouth.

1.4 ALGORITHM

Figures 1.3 and 1.4 show the training and testing phases of the proposed algorithm. First, Viola-Jones face detection is applied for face localization; an illustration (red rectangle) is shown in Fig. 1.1(a). Notably, this system extracts features solely from grayscale images, and thus color information is not required. Subsequently, the supervised descent method (SDM) is adopted to locate the required 12 landmark points, with notations {𝐿1,…,𝐿6,𝑅1,…,𝑅6} as indicated in Fig. 1.2, for subsequent feature extraction. Specifically, the notations 𝐿 and 𝑅 denote the left and right eye, respectively. To reduce interference during feature extraction, the angle of the vector 𝐿1𝑅1 is normalized to 0°, and the distance 𝑑1 indicated in Fig. 1.2(a) is normalized to the average |𝐿1𝑅1| over all samples in the training set to suppress scaling distortion; here a notation such as 𝐿1𝑅1 denotes the vector connecting the two points, and |𝐿1𝑅1| denotes its length. Next, histogram equalization is utilized to mitigate illumination variation. Based upon the normalized output, two ROI images, i.e., 𝑅𝑂𝐼1 and 𝑅𝑂𝐼2 as shown in Fig. 1.1, are utilized for feature extraction. In addition, the proposed method can extract the features quickly within 𝑅𝑂𝐼2. As a result, we can utilize the eye's landmark points to extract all required features. Here, 𝑅𝑂𝐼1 is of size 𝑑1 × (𝑑1/2). It is localized by the two scaled points (𝑥0,𝑦0) and (𝑥1,𝑦1) as indicated in Fig. 1.1(a). 𝑅𝑂𝐼1 is defined as follows:

(𝑥0,𝑦0) = (𝐿1𝑥,𝐿1𝑦 − 𝑑1/4), (1)

(𝑥1,𝑦1) = (𝑅1𝑥,𝑅1𝑦 + 𝑑1/4), (2)

where the points 𝐿1 = (𝐿1𝑥,𝐿1𝑦) and 𝑅1 = (𝑅1𝑥,𝑅1𝑦) are as shown in Fig. 1.1(b). In addition, 𝑅𝑂𝐼2 is of size (2.4 × 𝑑2) × (1.6 × 𝑑2). Specifically, 𝑅𝑂𝐼2 for the left eye is determined by the points (𝑥2,𝑦2) and (𝑥3,𝑦3), also shown in Fig. 1.1(b), and it is defined below:

(𝑥2,𝑦2) = ( 𝐿𝑝𝑥 − 1.2 × 𝑑2 , 𝐿𝑝𝑦 − 0.8 × 𝑑2), (3)

(𝑥3,𝑦3) = ( 𝐿𝑝𝑥 + 1.2 × 𝑑2 , 𝐿𝑝𝑦 + 0.8 × 𝑑2 ), (4)

where 𝑑2 = |𝐿1𝐿4|, and the point 𝐿𝑝 = (𝐿𝑝𝑥,𝐿𝑝𝑦) = ([𝐿1𝑥+𝐿4𝑥]/2, [𝐿1𝑦+𝐿4𝑦]/2); the values 0.8 and 1.2 are chosen to cover the complete ocular region while excluding irrelevant facial parts (e.g., ears or nose). The 𝑅𝑂𝐼2 for the right eye is defined analogously. Subsequently, a feature vector (𝐹) is extracted to describe the two ROI images. Finally, the non-linear support vector machine (SVM) is applied.
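To make the ROI construction concrete, the following Python sketch computes 𝑅𝑂𝐼1 and the left-eye 𝑅𝑂𝐼2 directly from Eqs. (1)-(4). It is only an illustration under stated assumptions: the landmark arrays, the grayscale image, and the row/column indexing convention are not part of the original implementation.

import numpy as np

def extract_rois(gray, L, R):
    """Crop ROI1 and the left-eye ROI2 from a grayscale face image.

    A minimal sketch of Eqs. (1)-(4). `L` and `R` are assumed to be 6x2
    arrays holding the left- and right-eye landmarks (L1..L6, R1..R6),
    already rotation- and scale-normalized as described above, with L1
    to the left of R1. Images are indexed as gray[row, col] = gray[y, x].
    """
    L1, L4 = L[0], L[3]
    R1 = R[0]

    # ROI1: a d1 x (d1/2) strip spanning both eyes, Eqs. (1)-(2)
    d1 = np.linalg.norm(R1 - L1)
    x0, y0 = L1[0], L1[1] - d1 / 4.0
    x1, y1 = R1[0], R1[1] + d1 / 4.0
    roi1 = gray[int(y0):int(y1), int(x0):int(x1)]

    # ROI2 (left eye): a (2.4*d2) x (1.6*d2) window centred at Lp, Eqs. (3)-(4)
    d2 = np.linalg.norm(L4 - L1)
    Lp = (L1 + L4) / 2.0
    x2, y2 = Lp[0] - 1.2 * d2, Lp[1] - 0.8 * d2
    x3, y3 = Lp[0] + 1.2 * d2, Lp[1] + 0.8 * d2
    roi2_left = gray[int(y2):int(y3), int(x2):int(x3)]

    return roi1, roi2_left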

4
Fig 1.1 (a,b,c) ROI of the proposed algorithm

Fig 1.2 Landmark points of Ocular Region

5
Fig 1.3 Block Diagram for Training Phase of Project

Fig 1.4 Block Diagram for Testing Phase of Project

6
CHAPTER 2

VIOLA JONES FACE DETECTION METHOD

2.1 INTRODUCTION TO VIOLA JONES METHOD

The Viola-Jones method was the first real-time object detection framework, used primarily for face detection. A
face detector has to tell whether an image of arbitrary size contains a human face and if so,
where it is. One natural framework for considering this problem is that of binary
classification, in which a classifier is constructed to minimize the misclassification risk. Since
no objective distribution can describe the actual prior probability for a given image to have a
face, the algorithm must minimize both the false negative and false positive rates in order to
achieve an acceptable performance.

This task requires an accurate numerical description of what sets human faces apart from
other objects. It turns out that these characteristics can be extracted with a remarkable
committee learning algorithm called Adaboost, which relies on a committee of weak
classifiers to form a strong one through a voting mechanism. A classifier is weak if, in
general, it cannot meet a predefined classification target in error terms.

An operational algorithm must also work with a reasonable computational budget. Techniques
such as integral image and attentional cascade make the Viola-Jones algorithm [10] highly
efficient: fed with a real time image sequence generated from a standard webcam, it performs
well on a standard PC.
7
2.2 Algorithm

Below is the Viola-Jones face detection algorithm; we start with the image features used for the classification task.

2.2.1 Feature and Integral Image

The Viola-Jones algorithm uses Haar-like features, that is, a scalar product between the image and some Haar-like templates. More precisely, let I and P denote an image and a pattern, both of the same size N×N (see Figure 2.1). The feature associated with pattern P of image I is defined as the scalar product

⟨I, P⟩ = Σ_{1≤i,j≤N} I(i, j) P(i, j),

where the pattern takes opposite values (±1) on its black and white pixels and 0 on its gray background. To compensate for the effect of different lighting conditions, all the images should be mean- and variance-normalized beforehand. Those images with variance lower than one, having little information of interest in the first place, are left out of consideration.

Figure 2.1: Haar-like features. Here as well as below, the background of a template like (b)
is painted gray to highlight the pattern’s support. Only those pixels marked in black or white
are used when the corresponding feature is calculated.

8
Figure 2.2: Five Haar-like patterns. The size and position of a pattern’s support can vary
provided its black and white rectangles have the same dimension, border each other and keep
their relative positions. Thanks to this constraint, the number of features one can draw from an
image is somewhat manageable: a 24×24 image, for instance, has 43200, 27600, 43200,
27600 and 20736 features of category (a), (b), (c), (d) and (e) respectively, hence 162336
features in all.

In practice, five patterns are considered (see Figure 2.2). The derived features are assumed to hold all the information needed to characterize a face. Since faces are by and large regular by nature, the use of Haar-like patterns seems justified. There is, however, another crucial element which lets this set of features take precedence: the integral image, which allows them to be calculated at a very low computational cost. Instead of summing up all the pixels inside a rectangular window, this technique mirrors the use of cumulative distribution functions. The integral image II of I is defined by

II(n, m) = Σ_{1≤i≤n, 1≤j≤m} I(i, j),

so that

Σ_{N1≤i≤N2, N3≤j≤N4} I(i, j) = II(N2, N4) − II(N1−1, N4) − II(N2, N3−1) + II(N1−1, N3−1)    (1)

holds for all N1 ≤ N2 and N3 ≤ N4, with the convention that II vanishes whenever one of its indices is zero. As a result, computing an image's rectangular local sum requires at most four elementary operations given its integral image. Moreover, obtaining the integral image itself can be done in linear time: setting N1 = N2 and N3 = N4 in (1), we find

II(N2, N4) = I(N2, N4) + II(N2−1, N4) + II(N2, N4−1) − II(N2−1, N4−1).
As a side note, let us mention that once the useful features have been selected by the
boosting algorithm, one needs to scale them up accordingly when dealing with a bigger
window. Smaller windows, however, will not be looked at.
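As an illustration of the integral-image trick, the sketch below builds II with cumulative sums and evaluates a rectangular local sum from at most four table lookups. The function names and zero-based row/column indexing are assumptions made for this example.

import numpy as np

def integral_image(img):
    """II[n, m] = sum of img[:n+1, :m+1] (cumulative sums along both axes)."""
    return img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r1, r2, c1, c2):
    """Sum of img[r1:r2+1, c1:c2+1] recovered from four integral-image values."""
    total = ii[r2, c2]
    if r1 > 0:
        total -= ii[r1 - 1, c2]
    if c1 > 0:
        total -= ii[r2, c1 - 1]
    if r1 > 0 and c1 > 0:
        total += ii[r1 - 1, c1 - 1]
    return total

# A Haar-like feature of type (a), for instance, is then simply
# rect_sum(white rectangle) - rect_sum(black rectangle).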

2.2.2 Feature Selection with Adaboost

A classifier maps an observation to a label valued in a finite set. For face detection, it assumes the form f : ℝ^d → {−1, 1}, where 1 means that there is a face, −1 the contrary, and d is the number of Haar-like features extracted from an image. Given the probabilistic weights wi ∈ ℝ+ assigned to a training set made up of n observation-label pairs (xi, yi), Adaboost aims to iteratively drive down an upper bound of the empirical loss under mild technical conditions. Remarkably, the decision rule constructed by Adaboost remains reasonably simple, so that it is not prone to overfitting, which means that the empirically learned rule often generalizes well. Despite its groundbreaking success, it ought to be said that Adaboost does not learn what a face should look like all by itself: it is humans, rather than the algorithm, who perform the labeling and the first round of feature selection, as described in the previous section. The building block of the Viola-Jones face detector is a decision stump, or a depth-one decision tree, parametrized by a feature f ∈ {1, …, d}, a threshold t ∈ ℝ and a toggle T ∈ {−1, 1}. Given an observation x ∈ ℝ^d, a decision stump h predicts its label using the rule

h(x) = T if πf x > t, and h(x) = −T otherwise,

where πf x is the feature vector's f-th coordinate. Several comments follow:

1. Any additional pattern produced by permuting black and white rectangles in an existing
pattern is superfluous. Because such a feature is merely the opposite of an existing feature,
only a sign change for t and T is needed to have the same classification rule.

2. If the training examples are sorted in ascending order of a given feature f, a linear-time exhaustive search on the threshold and toggle can find the decision stump using this feature that attains the lowest empirical loss on the training set. Imagine a threshold placed somewhere on the real line: if the toggle is set to 1, the resulting rule declares an example x positive if πf x is greater than the threshold and negative otherwise. This allows us to evaluate the rule's empirical error, thereby selecting the toggle that fits the dataset better.

Since the margin and the risk (the expectation of the empirical loss) are closely related, of two decision stumps having the same empirical risk, the one with the larger margin is preferred. Thus, in the absence of duplicates, there are n + 1 possible thresholds, and the one with the smallest empirical loss should be chosen. However, it is possible to have the same feature values from different examples, and extra care must be taken to handle this case properly.

By adjusting individual example weights, Adaboost makes more effort to learn harder examples and adds more decision stumps in the process. Intuitively, in the final voting, a stump ht with lower empirical loss is rewarded with a bigger say when the T-member committee (vote-based classifier) assigns an example according to

fT(x) = sign( Σ_{t=1}^{T} αt ht(x) ),

where the coefficient αt is larger for stumps with lower empirical loss.
For notational simplicity, consider the random couple (X, Y) distributed according to the probability P defined by the initial weights wi(1), 1 ≤ i ≤ n, set when the training starts. As the empirical loss goes to zero with T, so do both the false positive rate P(fT(X) = 1 | Y = −1) and the false negative rate P(fT(X) = −1 | Y = 1), owing to

P(fT(X) ≠ Y) = P(Y = 1) P(fT(X) = −1 | Y = 1) + P(Y = −1) P(fT(X) = 1 | Y = −1).

Thus the detection rate

P(fT(X) = 1|Y = 1) = 1−P(fT(X) = −1|Y = 1),

must tend to 1. Thus the size T of the trained committee depends on the targeted false positive
and false negative rates. In addition, let us mention that, given n− negative and n+ positive
examples in a training pool, it is customary to give a negative (resp. positive) example an
initial weight equal to 0.5/n− (resp. 0.5/n+) so that Adaboost does not favor either category at
the beginning.
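The boosting procedure of this section can be summarized in a short sketch. Here train_stump and predict_stump are hypothetical placeholders for the stump search described above, and the coefficient and weight-update formulas follow the standard discrete AdaBoost convention; none of this is taken from the project's actual code.

import numpy as np

def adaboost(X, y, train_stump, predict_stump, T):
    """Train a T-member committee of decision stumps with AdaBoost.

    X: (n, d) matrix of Haar-feature responses; y: labels in {-1, +1}.
    train_stump(X, y, w) is assumed to return the stump with the lowest
    weighted loss, and predict_stump(stump, X) its {-1, +1} predictions.
    """
    y = np.asarray(y)
    # initial weights: 0.5/n+ for positives, 0.5/n- for negatives (see above)
    w = np.where(y > 0, 0.5 / np.sum(y > 0), 0.5 / np.sum(y < 0))
    committee = []
    for _ in range(T):
        stump = train_stump(X, y, w)
        pred = predict_stump(stump, X)
        eps = np.sum(w[pred != y])                     # weighted error
        alpha = 0.5 * np.log((1.0 - eps) / max(eps, 1e-12))
        committee.append((alpha, stump))
        w *= np.exp(-alpha * y * pred)                 # re-weight: emphasize hard examples
        w /= w.sum()
    return committee

def committee_predict(committee, predict_stump, X):
    """Final vote: sign of the alpha-weighted sum of stump predictions."""
    score = sum(alpha * predict_stump(stump, X) for alpha, stump in committee)
    return np.sign(score)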

2.2.3 Attentional Cascade

In theory, Adaboost can produce a single committee of decision stumps that generalizes
well. However, to achieve that, an enormous negative training set is needed at the outset to
gather all possible negative patterns. In addition, a single committee implies that all the
windows inside an image have to go through the same lengthy decision process. There has to
be another more cost-efficient way.

The prior probability for a face to appear in an image bears little relevance to the
presented classifier construction because it requires both the empirical false negative and false
positive rate to approach zero. However, our own experience tells us that in an image, a rather
limited number of sub-windows deserve more attention than others. This is true even for face-
intensive group photos. Hence the idea of a multi-layer attentional cascade which embodies a
principle akin to that of Shannon coding: the algorithm should deploy more resources to work
on those windows more likely to contain a face while spending as little effort as possible on
the rest.

Each layer in the attentional cascade is expected to meet a training target expressed in
false positive and false negative rates: among n negative examples declared positive by all of
its preceding layers, layer l ought to recognize at least (1−γl)n as negative and meanwhile try
not to sacrifice its performance on the positives: the detection rate should be maintained above
1−βl.

At the end of the day, only the generalization error counts, which unfortunately can only be estimated with validation examples that Adaboost is not allowed to see during the training phase. Hence a conservative choice is made as to how one assesses the error rates: the higher false positive rate obtained from training and validation is used to evaluate how well the algorithm has learned to distinguish faces from non-faces. The false negative rate is assessed in the same way.

Appending a layer to the cascade means that the algorithm has learned to reject a few
new negative patterns previously viewed as difficult, all the while keeping more or less the
same positive training pool. To build the next layer, more negative examples are thus required
to make the training process meaningful. To replace the detected negatives, we run the
cascade on a large set of gray images with no human face and collect their false positive
windows. The same procedure is used for constructing and replenishing the validation set. Since only 24×24-sized examples can be used in the training phase, those bigger false positives are down-sampled and recycled.

As mentioned earlier, to prevent a committee from growing too big, the algorithm stops
refining its associated layer after a layer dependent size limit is breached. In this case, the shift
s is set to the smallest value that satisfies the false negative requirement. A harder learning
case is thus deferred to the next layer. This strategy works because Adaboost’s inability to
meet the training target can often be explained by the fact that a classifier trained on a limited
number of examples might not generalize well on the validation set. However, those hard
negative patterns should ultimately appear and be learned if the training goes on, albeit one bit
at a time.
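A minimal sketch of how a trained attentional cascade is applied to a single detection window is given below. The layer representation, a scoring function plus the shift s mentioned above, is an assumption made purely for illustration.

def cascade_classify(x, layers):
    """Evaluate one detection window x against an attentional cascade.

    `layers` is assumed to be a list of (score_fn, shift) pairs, where
    score_fn(x) returns the committee vote of that layer. A window is
    rejected as soon as any layer votes negative, so easy negatives exit
    early and only face-like windows reach the deeper, larger committees.
    """
    for score_fn, shift in layers:
        if score_fn(x) + shift < 0:    # this layer says "not a face"
            return -1
    return 1                            # accepted by every layer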

14
CHAPTER 3

SUPERVISED DESCENT METHOD

3.1 INTRODUCTION TO SDM

Mathematical optimization has a fundamental impact in solving many problems in


computer vision. This fact is apparent by having a quick look into any major conference in
computer vision, where a significant number of papers use optimization techniques. Many
important problems in computer vision such as structure from motion, image alignment,
optical flow, or camera calibration can be posed as solving a nonlinear optimization problem.
There are a large number of different approaches to solve these continuous nonlinear
optimization problems based on first and second order methods, such as gradient descent [1]
for dimensionality reduction, Gauss-Newton for image alignment or Levenberg-Marquardt for
structure from motion.

Despite its many centuries of history, Newton's method (and its variants) is regarded as a major optimization tool for smooth functions when second derivatives are available. Newton's method makes the assumption that a smooth function f(x) can be well approximated by a quadratic function in a neighborhood of the minimum. If the Hessian is positive definite, the minimum can be found by solving a system of linear equations. Given an initial estimate x0 ∈ ℝ^(p×1), Newton's method creates a sequence of updates

xk+1 = xk − H⁻¹(xk) Jf(xk),    (1)

where H(xk) ∈ ℝ^(p×p) and Jf(xk) ∈ ℝ^(p×1) are the Hessian and Jacobian matrices of f evaluated at xk. Newton-type methods have two main advantages over competitors. First, when the method converges, the convergence rate is quadratic. Second, it is guaranteed to converge provided that the initial estimate is sufficiently close to the minimum.

15
Fig 3.1 a) Manually labeled image with 66 landmarks. Blue outline indicates face detector.

b) Mean landmarks, x0, initialized using the face detector.

3.2 Derivation of SDM

Given an image d ∈ ℝ^(m×1) of m pixels, d(x) ∈ ℝ^(p×1) indexes p landmarks in the image. h is a non-linear feature extraction function (e.g., SIFT), and h(d(x)) ∈ ℝ^(128p×1) in the case of extracting SIFT features. During training, we will assume that the correct landmarks are known, and we will refer to them as x∗ (see Fig. 3.1a). Also, to reproduce the testing scenario, we ran the face detector on the training images to provide an initial configuration of the landmarks (x0), which corresponds to an average shape (see Fig. 3.1b). In this setting, face alignment can be framed as minimizing a function over Δx. Parameterized appearance models (PAMs) solve

min over ca, p of ‖d(f(x, p)) − Ua ca‖₂²,    (2)

whereas the objective considered here, minimized over Δx, is

f(x0 + Δx) = ‖h(d(x0 + Δx)) − φ∗‖₂²,    (3)

where φ∗ = h(d(x∗)) represents the SIFT values at the manually labeled landmarks. In the training images, φ∗ and Δx are known.

Eq. 3 has several fundamental differences from previous work on PAMs in Eq. 2. First, in Eq. 3 we do not learn any model of shape or appearance beforehand from training data; we align the image w.r.t. a template φ∗. For the shape, our model will be a non-parametric one, and we will optimize the landmark locations x ∈ ℝ^(2p×1) directly. Recall that in traditional PAMs, the non-rigid motion is modeled as a linear combination of shape bases learned by
computing PCA on a training set. Our non-parametric shape model is able to generalize better
to untrained situations (e.g., asymmetric facial gestures). Second, we use SIFT features
extracted from patches around the landmarks to achieve a robust representation against
illumination. Observe that the SIFT operator is not differentiable and minimizing Eq. 3 using
first or second order methods requires numerical approximations (e.g., finite differences) of
the Jacobian and the Hessian. However, numerical approximations are very computationally
expensive. The goal of SDM is to learn a series of descent directions and re-scaling factors (performed by the Hessian in the case of Newton's method) such that they produce a sequence of updates (xk+1 = xk + Δxk), starting from x0, that converges to x∗ on the training data. Now, only for derivation purposes, we will assume that h is twice differentiable; this assumption will be dropped later in the section. Similar to Newton's method, we apply a second-order Taylor expansion to Eq. 3:

f(x0 + Δx) ≈ f(x0) + Jf(x0)ᵀ Δx + ½ Δxᵀ H(x0) Δx,    (4)

where Jf(x0) and H(x0) are the Jacobian and Hessian matrices of f evaluated at x0. In the following, we will omit x0 to simplify the notation. Differentiating (4) with respect to Δx and setting it to zero gives the first update for x,

Δx1 = −H⁻¹ Jf = −2 H⁻¹ Jhᵀ (φ0 − φ∗),    (5)

where we made use of the chain rule to show that Jf = 2 Jhᵀ (φ0 − φ∗), with φ0 = h(d(x0)). The first Newton step can be seen as projecting Δφ0 = φ0 − φ∗ onto the row vectors of the matrix R0 = −2 H⁻¹ Jhᵀ. We will refer to R0 as a descent direction. The computation of this descent direction requires the function h to be twice differentiable, or expensive numerical approximations of the Jacobian and Hessian. In the supervised setting, we instead estimate R0 directly from training data by learning a linear regression between Δx∗ = x∗ − x0 and Δφ0. Therefore, our method is not limited to functions that are twice differentiable. Note, however, that during testing (i.e., inference) φ∗ is unknown but fixed during the optimization process. To use the descent direction during testing, we will not use the information of φ∗ for training. Instead, we rewrite Eq. 5 as a generic linear combination of the feature vector φ0 plus a bias term b0 that can be learned during training,

Δx1 = R0 φ0 + b0.    (6)

Using training examples, our SDM will learn the R0, b0 used in the first step of the optimization procedure. In the next section, we will provide details of the learning method. It is unlikely that the algorithm can converge in a single update step unless f is quadratic in x. To deal with non-quadratic functions, the SDM generates a sequence of descent directions. For a particular image, the Newton method generates a sequence of updates along the image-specific gradient directions,

xk = xk−1 − 2 H⁻¹ Jhᵀ (φk−1 − φ∗),    (7)

where φk−1 = h(d(xk−1)) is the feature vector extracted at the previous landmark locations xk−1. In contrast, SDM learns a sequence of generic descent directions {Rk} and bias terms {bk},

xk = xk−1 + Rk−1 φk−1 + bk−1,    (8)

such that the succession of xk converges to x∗ for all images in the training set.

3.3 Learning for SDM

This section illustrates how to learn Rk, bk from training data. Assume that we are given a set of face images {di} and their corresponding hand-labeled landmarks {xi∗}. For each image, starting from an initial estimate of the landmarks xi0, R0 and b0 are obtained by minimizing the expected loss between the predicted and the optimal landmark displacement under many possible initializations. We choose the L2 loss for its simplicity and solve for the R0 and b0 that minimize this expected loss.

Here, Δxi = xi∗ − xi0 and φi0 = h(di(xi0)). We assume that xi0 is sampled from a normal distribution whose parameters capture the variance of a face detector, and we approximate the integration with Monte Carlo sampling, minimizing the sampled objective (Eq. 10) instead. Minimizing Eq. 10 is the well-known linear least-squares problem, which can be solved in closed form. The subsequent Rk, bk can be learned as follows. At each step, a new dataset {Δxi∗, φik} is created by recursively applying the update rule in Eq. 8 with the previously learned Rk−1, bk−1. More explicitly, after Rk−1, bk−1 is learned, we update the current landmark estimate xik using Eq. 8. We then generate a new set of training data by computing the new optimal parameter update Δxik∗ = xi∗ − xik and the new feature vector φik = h(di(xik)). Rk and bk are then learned by a new linear regression on this new training set, minimizing the same L2 loss. The error decreases monotonically as a function of the number of regressors added.
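The following sketch shows how one SDM stage (Rk, bk) could be learned by linear least squares and then applied as in Eq. 8. The ridge term and the matrix shapes are assumptions added for numerical stability and illustration; they are not specified above.

import numpy as np

def learn_descent_stage(delta_x, phi, ridge=1e-3):
    """Learn one SDM stage (R_k, b_k) by regularized linear least squares.

    delta_x: (n, 2p) rows of x* - x_k over all samples/initializations;
    phi: (n, D) rows of the SIFT features h(d(x_k)). The ridge term is an
    assumption for conditioning, not part of the formulation above.
    """
    n = phi.shape[0]
    A = np.hstack([phi, np.ones((n, 1))])              # absorb the bias b_k
    reg = ridge * np.eye(A.shape[1])
    W = np.linalg.solve(A.T @ A + reg, A.T @ delta_x)  # closed-form least squares
    R_k, b_k = W[:-1].T, W[-1]
    return R_k, b_k

def apply_stage(x_k, phi_k, R_k, b_k):
    """One SDM update: x_{k+1} = x_k + R_k phi_k + b_k (Eq. 8)."""
    return x_k + phi_k @ R_k.T + b_k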

19
CHAPTER 4

GEOMETRIC FEATURES

4.1 INTRODUCTION TO GEOMETRIC FEATURES

This chapter deals with geometric features, which can be classified into four sub-categories:

 Curvature Of Lower Eye

 Slant Angle

 Histogram Of Oriented Gradient(HOG)

 Ratio Of Distance From End Points Of Eye

It is noteworthy that these features are not affected by lighting and shadow. Five types of eyes can be distinguished with the help of geometric features, as shown in Fig. 4.1.
Fig 4.1 Types Of Eyes

4.2 Curvature Of Eye

Since the lower eyelids tend to remain stable during eye blinking, their curvature can be treated as a reliable feature in our feature set. For this, the curvatures of the lower eyelids are regarded as the features 𝐺𝜃 = {𝜃𝐿,𝜃𝑅} for recognition. To derive the curvature of the left eye, the 𝐿7 point labeled in Fig. 4.2(a) is additionally located to describe the entire lower eyelid. This 𝐿7 point is placed at the midpoint of 𝐿5 and 𝐿6, since the line 𝐿5𝐿6 passes quite close to the ideal point 𝐿7𝑖𝑑𝑒𝑎𝑙 according to observations on the Multi-PIE face database [15]. A circle is fitted through the points 𝐿1, 𝐿4, and 𝐿7 so that they lie on its circumference, and 𝐿𝑐 denotes the center of this circle; 𝜃𝐿 is then the angle between the two vectors 𝐿𝑐𝐿1 and 𝐿𝑐𝐿4, as illustrated in Fig. 4.2(b).
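The curvature feature can be computed as sketched below: the circumcenter of the triangle (𝐿1, 𝐿4, 𝐿7) is found in closed form and the angle subtended by 𝐿1 and 𝐿4 at that center is returned. The 2-D point convention and the use of radians are assumptions for illustration only.

import numpy as np

def eyelid_curvature(L1, L4, L5, L6):
    """Curvature angle theta for one eye (a sketch of Section 4.2).

    L7 is taken as the midpoint of L5 and L6; the angle at the circumcenter
    Lc of the triangle (L1, L4, L7) between Lc->L1 and Lc->L4 is returned.
    """
    a, b = np.asarray(L1, float), np.asarray(L4, float)
    c = (np.asarray(L5, float) + np.asarray(L6, float)) / 2.0   # L7

    # circumcenter of the triangle (a, b, c)
    d = 2.0 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    ux = ((a @ a) * (b[1] - c[1]) + (b @ b) * (c[1] - a[1]) + (c @ c) * (a[1] - b[1])) / d
    uy = ((a @ a) * (c[0] - b[0]) + (b @ b) * (a[0] - c[0]) + (c @ c) * (b[0] - a[0])) / d
    Lc = np.array([ux, uy])

    # angle between the vectors Lc->L1 and Lc->L4
    v1, v2 = a - Lc, b - Lc
    cos_t = np.clip(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)), -1.0, 1.0)
    return np.arccos(cos_t)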

Fig 4.2(a)Location Of L7 Ideal And L7

21
Fig 4.2(b)Curvature Of Lower Eye Lid

4.3 Slant Angle Of Eye

The positions of the medial and lateral canthi are capable of distinguishing different types of eyes, as shown in Fig. 4.1(c)-(e); hence the relative positions are considered in forming the feature 𝐺𝜑 = {𝜑𝐿,𝜑𝑅}. Specifically, the feature of the left eye, 𝜑𝐿, is the angle between 𝐿1𝑅1 and 𝐿1𝐿4. To derive the relative position of the left eye, a constant reference line 𝐿1𝑅1 is required, and the angle 𝜑𝐿 between the two vectors 𝐿1𝑅1 and 𝐿1𝐿4 is then derived. When 𝐺𝜑 is positive, negative, or zero, it corresponds to the down-turned, slanted, or ideal eye types, respectively.

4.4 Ratio Of Distance From End Points Of Eye

The ratio between the two distances |𝐿1𝑅1| and |𝐿4𝑅4| is capable of distinguishing between close-set and wide-set eyes, as shown in Fig. 4.1(a)-(b). Hence, as illustrated in Fig. 4.4, this ratio is used as the feature 𝐺𝑅:

𝐺𝑅 = |𝐿1𝑅1| / |𝐿4𝑅4|.
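A short sketch combining the slant angle of Section 4.3 with the ratio 𝐺𝑅 defined above follows. Which landmark acts as the inner versus outer eye corner is an assumption made here for illustration.

import numpy as np

def slant_and_ratio(L1, L4, R1, R4):
    """Slant angle phi_L and distance ratio G_R (Sections 4.3-4.4).

    L1/L4 are assumed to be the left-eye outer/inner corners and R1/R4
    the corresponding right-eye corners, given as 2-D points.
    """
    L1, L4, R1, R4 = (np.asarray(p, float) for p in (L1, L4, R1, R4))

    # signed angle between the reference vector L1->R1 and L1->L4
    ref, v = R1 - L1, L4 - L1
    cross = ref[0] * v[1] - ref[1] * v[0]
    phi_L = np.arctan2(cross, ref @ v)

    # ratio of the outer-corner distance to the inner-corner distance
    g_r = np.linalg.norm(R1 - L1) / np.linalg.norm(R4 - L4)
    return phi_L, g_r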

22
Fig 4.4 Definition for GR

4.5 Histogram Of Oriented Gradient

The shapes of the medial canthus are discriminative for distinguishing different identities, and they are formulated as the histograms 𝐺𝐻 = {𝐻𝐿,𝐻𝑅} for the left and right eyes. Fundamentally, the idea of the histogram of oriented gradients (HOG) is employed to describe these shapes. Suppose 𝑅𝑂𝐼3 denotes a window centered at position 𝐿4, as illustrated in Fig. 4.4; a window of 15×15 pixels is considered here. Subsequently, 𝑅𝑂𝐼3 is divided into 𝛼 × 𝛼 cells, and a histogram with 𝛽 bins is calculated for each cell. Finally, the bins in each block are normalized using the L2 norm. Thus, 𝐻𝐿 can be defined as:

𝐻𝐿 = {ℎ𝐿(𝑧)(𝑖,𝑗) | 𝑧 = [1, (𝛼 − (𝜁 − 1)) × (𝛼 − (𝜁 − 1))], 𝑖 = [1, 𝜁 × 𝜁], 𝑗 = [1, 𝛽]}, where 𝜁 ≤ 𝛼,

where ℎ𝐿(𝑧)(𝑖,𝑗) denotes the j-th bin in the i-th cell of the z-th block of the left eye. Here, 𝛼 × 𝛼 and 𝜁 × 𝜁 denote the numbers of cells in the window and in each block, respectively, and 𝛽 denotes the number of bins for each cell. The influences of the parameters 𝛼, 𝛽, and 𝜁

CHAPTER 5

HISTOGRAM OF ORIENTED GRADIENTS (HOG)

5.1 Introduction to HOG

HOG stands for Histogram of Oriented Gradients, a type of feature descriptor. The intent of a feature descriptor is to generalize the object in such a way that the same object produces a feature descriptor that is as close as possible to the same value when viewed under different conditions, which makes the classification task easier. The intensity of an image contains discriminative information as well as noise and, in most cases, is the only source that can be used for object recognition. However, what really matters is not the absolute value but the relative value, which reflects the structural information or texture variation of an object.

Various feature extraction and selection methods have been widely used. Besides holistic methods such as PCA and LDA, local descriptors have been studied more recently. An ideal descriptor for local facial regions should have large inter-class variance and small intra-class variance, which means that the descriptor should be robust with respect to varying illumination, slight deformations, image quality degradation, and so on. Information theory has been used to develop a criterion to evaluate the potential classification power of different features.

The use of orientation histograms has many precursors. Freeman and Roth used orientation histograms for hand-gesture recognition, and Dalal and Triggs presented a pedestrian detection algorithm with excellent detection results using a dense grid of HOG descriptors. The HOG also provides the underlying image-patch descriptor for matching scale-invariant keypoints when combined with local spatial histogramming and normalization in Lowe's scale-invariant feature transform (SIFT) approach to wide-baseline image matching.

25
5.2 Basic theory

The basic idea of HOG features is that the local object appearance and shape can often be
characterized rather well by the distribution of the local intensity gradients or edge directions,
even without precise knowledge of the corresponding gradient or edge positions. The
orientation analysis is robust to lighting changes since the histogramming gives translational
invariance. The HOG feature summarizes the distribution of measurements within the image
regions and is particularly useful for recognition of textured objects with deformable shapes.
The method is also simple and fast so the histogram can be calculated quickly.

As used in SIFT or the EBGM method, the original HOG feature is generated for each key point of an image. The neighboring area around each key point is divided into several uniformly spaced cells, and for each cell a local 1-D histogram of gradient directions or edge orientations is accumulated over all the pixels of the cell. The histogram entries of all cells around one key point form the feature of that key point, and the combined histogram features of all key points form the image representation. The whole process is shown in Fig. 5.1.

26
Fig 5.1 Image window divided into small spatial regions (“cells”). Local 1-D histograms of
gradient directions or edge orientations are accumulated and concatenated to form the final
histogram feature.

5.3 Orientation representation

The next step is the fundamental nonlinearity of the descriptor. Each pixel calculates a
weighted vote for an edge orientation histogram channel based on the orientation of the
gradient element centred on it, and the votes are accumulated into orientation bins over
local spatial regions that we call cells. Cells can be either rectangular or radial (log-polar
sectors). The orientation bins are evenly spaced over 0° – 180° (“unsigned” gradient) or
0°–360° (“signed” gradient). To reduce aliasing, votes are interpolated bilinearly between
the neighbouring bin centres in both orientation and position. The vote is a function of the
gradient magnitude at the pixel, either the magnitude itself, its square, its square root, or a
clipped form of the magnitude representing soft presence/absence of an edge at the pixel.

Orientation can be represented as a single angle or as a double angle. A single-angle representation treats a given edge and its contrast-reversed counterpart as having opposite orientations, whereas a double-angle representation maps them to the same orientation. The single-angle representation may therefore allow more patterns to be distinguished, and it is the representation used in this work. Tests show that the single-angle representation performs much better than the double-angle representation. Note that this differs from the classic Gabor feature, which uses a single angle representation instead of the double angle. If an image window I of a key point is uniformly divided into N cells, the image window can be represented as

where Ct is the set of all pixels belonging to the t-th cell. For any pixel p(x, y) of the image window I, the gradient magnitude (contrast) and the gradient direction are computed from the horizontal and vertical intensity differences at that pixel. If the orientation is divided into H bins, so that the histogram vector length for each cell is H, the histogram vector of each cell is obtained by accumulating the votes of its pixels into the corresponding orientation bins and normalizing by |Ct|, the size of the set Ct.

5.4 Normalization
Gradient strengths vary over a wide range owing to local variations in illumination and foreground-background contrast, so effective local contrast normalization turns out to be essential for good performance. We evaluated a number of different normalization schemes. Most of them are based on grouping cells into larger spatial blocks and contrast-normalizing each block separately. The final descriptor is then the vector of all components of the normalized cell responses from all of the blocks in the detection window. In fact, the blocks are typically overlapped so that each scalar cell response contributes several components to the final descriptor vector, each normalized with respect to a different block. This may seem redundant, but good normalization is critical, and including overlap significantly improves the performance. Fig. 5.2 shows that performance increases by 4% at 10⁻⁴ FPPW as the overlap is increased from none (stride 16) to 16-fold area / 4-fold linear coverage (stride 4).

For better invariance to illumination and noise, a normalization step is usually applied after calculating the histogram vectors. Four different normalization schemes have been proposed: L2-norm, L2-Hys, L1-sqrt, and L1-norm. This analysis used the L2-norm scheme due to its better performance:

v ← v / √(‖v‖₂² + ε²),

where ε is a small positive value used for regularization when an empty cell is taken into account.
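The per-cell voting and L2 normalization described in this chapter can be sketched as follows for a small grayscale patch. The cell size, bin count, and the simple hard-assignment voting (no bilinear interpolation between bins) are simplifying assumptions for illustration.

import numpy as np

def hog_descriptor(patch, cell=5, bins=9, eps=1e-6):
    """Per-cell orientation histograms followed by L2 normalization.

    `patch` is assumed to be a small grayscale window (e.g. the 15x15 ROI3)
    whose side length is a multiple of `cell`; "unsigned" gradients over
    0-180 degrees are used.
    """
    patch = patch.astype(np.float64)
    gy, gx = np.gradient(patch)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0        # unsigned orientation

    n = patch.shape[0] // cell
    hists = np.zeros((n, n, bins))
    for i in range(n):
        for j in range(n):
            sl = (slice(i * cell, (i + 1) * cell), slice(j * cell, (j + 1) * cell))
            idx = np.minimum((ang[sl] / (180.0 / bins)).astype(int), bins - 1)
            np.add.at(hists[i, j], idx.ravel(), mag[sl].ravel())   # magnitude-weighted votes

    # L2 normalization of the concatenated descriptor (single-block case)
    v = hists.ravel()
    return v / np.sqrt(v @ v + eps ** 2)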

28
Fig 5.2. Comparing Performance by varying Orientation Bins

5.5 Fast computation

Liu et al. introduced methods for fast computation of histogram bin weights for pixels whose gradient orientations do not lie at the orientation bin centers. As shown in Fig. 5.3, the gradient magnitude g is added to the nearest n-th and (n+1)-th bin centers as gn and gn+1, respectively. The i-th bin bti of the histogram vector vt of the t-th cell can then be obtained by accumulating all the gradient magnitudes assigned to the i-th orientation center over the t-th cell.

29
Fig 5.3 Projection of gradient magnitude to the nearest orientation bin center by the
parallelogram law

CHAPTER 6

LOCAL BINARY PATTERN (LBP)

6.1 INTRODUCTION TO LBP

During the last few years, Local Binary Patterns (LBP) has aroused increasing interest
in image processing and computer vision. As a non-parametric method, LBP summarizes
local structures of images efficiently by comparing each pixel with its neighboring pixels. The
most important properties of LBP are its tolerance regarding monotonic illumination changes
and its computational simplicity. LBP was originally proposed for texture analysis, and has
proved a simple yet powerful approach to describe local structures. It has been extensively
exploited in many applications, for instance, face image analysis, image and video retrieval,
environment modeling, visual inspection, motion analysis, biomedical and aerial image
analysis, remote sensing, and so forth.

6.2 ALGORITHM

6.2.1 GRAY-SCALE AND ROTATION-INVARIANT LOCAL BINARY PATTERNS

We start the derivation of our gray-scale and rotation-invariant texture operator by defining texture T in a local neighborhood of a monochrome texture image as the joint distribution of the gray levels of P (P > 1) image pixels:

T = t(gc, g0, …, gP−1),

where gray value gc corresponds to the gray value of the center pixel of the local neighborhood and gp (p = 0, …, P−1) correspond to the gray values of P equally spaced pixels on a circle of radius R (R > 0) that form a circularly symmetric neighbor set. If the coordinates of gc are (0, 0), then the coordinates of gp are given by (−R sin(2πp/P), R cos(2πp/P)).

6.2.2 Achieving Gray-Scale Invariance

As the first step toward gray-scale invariance, we subtract, without losing information, the gray value of the center pixel (gc) from the gray values of the circularly symmetric neighborhood, giving

T = t(gc, g0 − gc, …, gP−1 − gc).

This is a highly discriminative texture operator. It records the occurrences of various patterns in the neighborhood of each pixel in a P-dimensional histogram. For constant regions, the differences are zero in all directions. On a slowly sloped edge, the operator records the highest difference in the gradient direction and zero values along the edge and, for a spot, the differences are high in all directions. Signed differences gp − gc are not affected by changes in mean luminance; hence, the joint difference distribution is invariant against gray-scale shifts. We achieve invariance with respect to the scaling of the gray scale by considering just the signs of the differences instead of their exact values:

T ≈ t(s(g0 − gc), …, s(gP−1 − gc)),

where

s(x) = 1 if x ≥ 0, and s(x) = 0 otherwise.

By assigning a binomial factor 2^p to each sign s(gp − gc), we transform this into a unique LBP_{P,R} number that characterizes the spatial structure of the local image texture:

LBP_{P,R} = Σ_{p=0}^{P−1} s(gp − gc) 2^p.

The name “Local Binary Pattern” reflects the functionality of the operator, i.e., a local neighborhood is thresholded at the gray value of the center pixel into a binary pattern. The LBP_{P,R} operator is by definition invariant against any monotonic transformation of the gray scale, i.e., as long as the order of the gray values in the image stays the same, the output of the LBP_{P,R} operator remains constant. If we set (P = 8, R = 1), we obtain LBP_{8,1}, which is similar to the original LBP operator. The two differences between LBP_{8,1} and LBP are: 1) the pixels in the neighbor set are indexed so that they form a circular chain, and 2) the gray values of the diagonal pixels are determined by interpolation. Both modifications are necessary to obtain the circularly symmetric neighbor set, which allows for deriving a rotation-invariant version of LBP_{P,R}.

6.2.3 Achieving Rotation Invariance

The LBP_{P,R} operator produces 2^P different output values, corresponding to the 2^P different binary patterns that can be formed by the P pixels in the neighbor set. When the image is rotated, the gray values gp will correspondingly move along the perimeter of the circle around g0. Since g0 is always assigned to be the gray value of element (0, R), to the right of gc, rotating a particular binary pattern naturally results in a different LBP_{P,R} value. This does not apply to patterns comprising only 0s (or 1s), which remain constant at all rotation angles. To remove the effect of rotation, i.e., to assign a unique identifier to each rotation-invariant local binary pattern, we define

LBP^{ri}_{P,R} = min{ ROR(LBP_{P,R}, i) | i = 0, 1, …, P−1 },

where ROR(x, i) performs a circular bit-wise right shift on the P-bit number x, i times. In terms of image pixels, it simply corresponds to rotating the neighbor set clockwise so many times that a maximal number of the most significant bits, starting from gP−1, is 0.
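A direct transcription of the LBP_{P,R} and LBP^{ri}_{P,R} definitions above is sketched below. Rounding the neighbor coordinates instead of bilinear interpolation is a simplification made for brevity.

import numpy as np

def lbp_value(img, r, c, P=8, R=1):
    """LBP_{P,R} code of the pixel at (r, c) and its rotation-invariant form.

    The pixel is assumed to lie at least R pixels away from the image border.
    """
    gc = img[r, c]
    code = 0
    for p in range(P):
        # neighbor offset (-R sin(2*pi*p/P), R cos(2*pi*p/P)) around the center
        dr = -R * np.sin(2.0 * np.pi * p / P)
        dc = R * np.cos(2.0 * np.pi * p / P)
        gp = img[int(round(r + dr)), int(round(c + dc))]
        code |= (1 if gp >= gc else 0) << p              # s(gp - gc) * 2^p

    # rotation invariance: minimum over all circular bit rotations (ROR)
    mask = (1 << P) - 1
    ri = min(((code >> i) | (code << (P - i))) & mask for i in range(P))
    return code, ri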

LBP^{ri}_{P,R} quantifies the occurrence statistics of individual rotation-invariant patterns corresponding to certain microfeatures in the image; hence, the patterns can be considered feature detectors. Fig. 6.1 illustrates the 36 unique rotation-invariant local binary patterns that can occur in the case of P = 8, i.e., LBP^{ri}_{8,R} can have 36 different values. For example, pattern #0 detects bright spots, #8 dark spots and flat areas, and #4 edges.

Fig 6.1 The 36 unique rotation-invariant binary patterns that can occur in the circularly symmetric neighbor set of LBP^{ri}_{8,R}. Black and white circles correspond to bit values of 0 and 1 in the 8-bit output of the operator.

6.2.4 Rotation-Invariant Variance Measures of the Contrast of Local Image Texture

The LBP^{riu2}_{P,R} operator is a gray-scale invariant measure, i.e., its output is not affected by any monotonic transformation of the gray scale. It is an excellent measure of the spatial pattern, but, by definition, it discards contrast. If gray-scale invariance is not required and we want to incorporate the contrast of local image texture as well, we can measure it with a rotation-invariant measure of local variance:

VAR_{P,R} = (1/P) Σ_{p=0}^{P−1} (gp − μ)²,   where μ = (1/P) Σ_{p=0}^{P−1} gp.

VAR_{P,R} is by definition invariant against shifts in gray scale. Since LBP^{riu2}_{P,R} and VAR_{P,R} are complementary, their joint distribution LBP^{riu2}_{P,R}/VAR_{P,R} is expected to be a very powerful rotation-invariant measure of local image texture. Note that, even though we restrict ourselves in this study to joint distributions of LBP^{riu2}_{P,R} and VAR_{P,R} operators that have the same (P, R) values, nothing would prevent us from using joint distributions of operators computed at different neighborhoods.

6.2.5 Nonparametric Classification Principle

In the classification phase, we evaluate the dissimilarity of sample and model histograms as a test of goodness-of-fit, which is measured with a nonparametric statistical test. By using a nonparametric test, we avoid making any possibly erroneous assumptions about the feature distributions. There are many well-known goodness-of-fit statistics, such as the chi-square statistic and the G (log-likelihood ratio) statistic. In this study, a test sample S was assigned to the class of the model M that maximized the log-likelihood statistic

L(S, M) = Σ_{b=1}^{B} Sb log Mb,    (12)

where B is the number of bins and Sb and Mb correspond to the sample and model probabilities at bin b, respectively. Equation (12) is a straightforward simplification of the G (log-likelihood ratio) statistic

G(S, M) = 2 Σ_{b=1}^{B} Sb log(Sb / Mb) = 2 [ Σ_b Sb log Sb − Σ_b Sb log Mb ],

where the first term of the right-hand expression can be ignored as a constant for a given S.

L is a nonparametric pseudometric that measures the likelihood that sample S comes from the alternative texture classes, based on exact probabilities of feature values of preclassified texture models M. In the case of the joint distribution LBP^{riu2}_{P,R}/VAR_{P,R}, L was extended in a straightforward manner to scan through the two-dimensional histograms. Sample and model distributions were obtained by scanning the texture samples and prototypes with the chosen operator and dividing the distributions of operator outputs into histograms having a fixed number of B bins. Since LBP^{riu2}_{P,R} has a fixed set of discrete output values (0 to P + 1), no quantization is required, and the operator outputs are directly accumulated into a histogram of P + 2 bins. Each bin effectively provides an estimate of the probability of encountering the corresponding pattern in the texture sample or prototype. Spatial dependencies between adjacent neighborhoods are inherently incorporated in the histogram because only a small subset of patterns can reside next to a given pattern. The variance measure VAR_{P,R} has a continuous-valued output; hence, quantization of its feature space is needed. This was done by adding together the feature distributions of every model image into a total distribution, which was divided into B bins having an equal number of entries. Hence, the cut values of the bins of the histograms corresponded to the (100/B) percentiles of the combined data. Deriving the cut values from the total distribution and allocating every bin the same amount of the combined data guarantees that the highest resolution of quantization is used where the number of entries is largest, and vice versa. The number of bins used in the quantization of the feature space is of some importance, as histograms with too few bins fail to provide enough discriminative information about the distributions. On the other hand, since the distributions have a finite number of entries, too many bins may lead to sparse and unstable histograms. As a rule of thumb, the statistics literature often proposes that an average of 10 entries per bin is sufficient. In the experiments, we set the value of B so that this condition is satisfied.
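
The equal-entry quantization of the VAR feature space described above might be sketched as follows, assuming NumPy; the pooled values in the example are synthetic placeholders:

import numpy as np

def equal_entry_cuts(pooled_var_values, B):
    """Bin edges such that every bin of the pooled training distribution
    receives the same number of entries (the 100/B percentile cuts)."""
    qs = np.linspace(0.0, 100.0, B + 1)
    return np.percentile(pooled_var_values, qs)

def var_histogram(sample_var_values, cuts):
    """Histogram a sample's VAR outputs using the pre-computed cut values."""
    hist, _ = np.histogram(sample_var_values, bins=cuts)
    return hist

# Example: pool the VAR outputs of all model images, derive the cuts once, reuse them.
pooled = np.concatenate([np.random.gamma(2.0, 10.0, 5000) for _ in range(4)])
cuts = equal_entry_cuts(pooled, B=16)
print(var_histogram(np.random.gamma(2.0, 10.0, 2000), cuts))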

5.3 Application

This feature is used to suppress redundancy in the description of the conventional uniform LBP (ULBP) within the region ROI2. The feature is formed as a histogram K = {K_L, K_R} for the left and right eyes, in which K_L = {k_L(i, j) | i = [1, γ], j = [1, δ]}, and k_L(i, j) denotes the j-th bin in the i-th cell of the left eye. Here, γ denotes the number of cells in ROI2, and δ denotes the number of bins in each cell. Notably, δ = 30 is enforced by the use of WT-LBP, as justified later. To extract the texture feature, a predefined elliptical mask centered at the region ROI2 is required to eliminate the influence of both the iris texture and the surrounding sclera area, and the masked ROI is termed ROI2′. Also, this ROI is normalized to 75×50 to address the potential geometrical influence on the later processes. Figure 5.1(a) shows the major difference between the ULBP and the proposed WT-LBP. Since the

Fig. 5.2. Two LBP complementary cases.
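
The ROI2′ preparation described above (elliptical masking of the iris and surrounding sclera, followed by normalization to 75×50) might be sketched with OpenCV as below; the ellipse axes are illustrative placeholders, not the report's predefined mask:

import cv2
import numpy as np

def prepare_roi2(roi2_gray, width=75, height=50):
    """Normalize ROI2 (grayscale uint8) and suppress the iris/sclera area.

    The ellipse parameters below are placeholders for the predefined mask
    used in the report.
    """
    roi = cv2.resize(roi2_gray, (width, height), interpolation=cv2.INTER_LINEAR)
    mask = np.full((height, width), 255, dtype=np.uint8)
    # Zero out an ellipse around the ROI center (iris plus surrounding sclera).
    cv2.ellipse(mask, (width // 2, height // 2), (18, 14), 0, 0, 360, 0, -1)
    return cv2.bitwise_and(roi, roi, mask=mask), mask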

Fig. 5.3 Rotation Invariant LBP

CHAPTER 6

SINGLE FOLD AND DOUBLE FOLD EYELIDS

6.1 INTRODUCTION

The probabilities of these two types of eyelid, viz., single- and double-fold eyelids for the left and right eyes, denoted by P = {P_L, P_R}, are employed as a feature in the proposed recognition system. To locate the ROI labeled as ROI4 in Fig. 12, the points L2 and L3 are both considered to construct a square of size |L2L3| × |L2L3|. Subsequently, it is normalized to d3 × d3 pixels, and the result is termed ROI4′ for the feature calculation later. In this work, d3 = 12 is used. Figure 13 shows some supervised samples of the extracted eyelids at the normalized size. As can be seen, there is a noisy horizontal line across the double-fold eyelid samples in Fig. 13(b) (marked with a green rectangle), in contrast to those shown in Fig. 13(a). To highlight this feature for classification, an initial refinement method that shares the concept of non-minimum suppression (NMS) is proposed to yield a robust feature. Specifically, each column of ROI4′ is processed independently, and the i-th binary image (B^{(i)}), which indicates the potential location of a double-fold eyelid, is defined below:

The corresponding results are shown in the second row of Fig. 13. Initially, the location of interest (as highlighted by the green rectangles in Fig. 13(b)) is classified. Subsequently, the least mean square (LMS) filter [19] is utilized to explore the distinguishable features between these two classes. However, the standard LMS is an iterative algorithm with high computational complexity; thus, its closed form is derived for better efficiency.

Specifically, the LMS filter W is usually employed for classification, and it is optimized with the cost function defined below over the gallery set:

J(W) = Σ_{i=1}^{S} ( Σ_{(x,y)∈ROI4′} W(x,y) · B^{(i)}(x,y) − Y_i )^2,

where S denotes the number of samples and Y_i is the eyelid-type label of the i-th sample. B^{(i)}(x,y) is reformulated as the vector {B^{(i)}_1, B^{(i)}_2, ..., B^{(i)}_{d3×d3}}, and W(x,y) is reformulated as {W_1, W_2, ..., W_{d3×d3}}. To derive the optimum W, the closed form is obtained by setting ∂J(W)/∂W_j = 0, which yields W = (B·B^T)^{−1} · B · Y, in which B·B^T is invertible (non-singular) because it aggregates the outputs of the images of the entire dataset, which avoids the possibility of det(B·B^T) = 0. The obtained filter W, as illustrated in Fig. 14(a), is utilized for eyelid classification. Note that it shows the probability of the positions of the double-fold eyelids in the spatial domain.
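
A minimal sketch of the closed-form training step, assuming NumPy and a binary label vector Y (1 for double-fold, 0 for single-fold, which is an encoding assumption of this sketch); a pseudo-inverse is used for numerical safety even though B·B^T is argued above to be non-singular:

import numpy as np

def train_lms_filter(binary_rois, labels):
    """Closed-form LMS filter W = (B B^T)^(-1) B Y.

    binary_rois : list of S binary images, each d3 x d3 (here 12 x 12)
    labels      : length-S vector, e.g. 1 for double-fold, 0 for single-fold
                  (label encoding is an assumption of this sketch)
    """
    B = np.stack([roi.reshape(-1) for roi in binary_rois], axis=1).astype(float)  # (d3*d3, S)
    Y = np.asarray(labels, dtype=float)                                           # (S,)
    W = np.linalg.pinv(B @ B.T) @ B @ Y                                           # (d3*d3,)
    return W.reshape(binary_rois[0].shape)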

The pre-trained filter W and a binary test image, computed by the same procedure as in the training process, are then used to determine the eyelid type as

ρ = Σ_{(x,y)∈ROI4′} W(x,y) · B^{(i)}(x,y), (13)

where ρ denotes the probability score for the double-fold eyelid. Figure 15(a) shows the relationship between the score ρ (X-axis) and the count (Y-axis) of the single- and double-fold eyelids.
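
Continuing the sketch above, the test-time score of equation (13) and the decision against the threshold λ optimized later in the report (λ = 5.5) could look as follows:

def eyelid_score(W, binary_roi):
    """Equation (13): rho = sum over ROI4' of W(x, y) * B(x, y)."""
    return float((W * binary_roi).sum())

def classify_eyelid(W, binary_roi, lam=5.5):
    """Label the eye as double-fold when the score exceeds the threshold lambda."""
    return "double-fold" if eyelid_score(W, binary_roi) > lam else "single-fold"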

6.2 NMS (Non-Minimum Suppression)

6.3 LMS FILTER

CHAPTER 7

SIMULATION RESULTS

7.1 INTRODUCTION

Two perspectives are considered in this evaluation: the performance in 1) the plain open-eye and 2) the blinking-eye scenarios. In the following experiments, four datasets are considered: the "lights" subset of CMU PIE [20], the Yale face dataset [21], the Multiple Biometrics Grand Challenge (MBGC) dataset [22], and the Face Recognition Grand Challenge (FRGC) v2.0 dataset [24]. Specifically, the "lights" subset contains 1,496 face samples of size 640×486 under various lighting conditions (flashers and one non-ambient flasher with room lights), where some sample images are shown in Fig. 16. In addition, the frontal faces of 147 subjects are captured in the normal open-eye condition. On the other hand, the Yale face dataset contains face samples of size 320×243 with blinking and closed eyes, where selected sample images are shown in Fig. 17. This dataset includes 165 samples of 15 subjects captured in the frontal face position. Note that the above two datasets have the same number of samples per class. The MBGC dataset contains 88 subjects and 3,482 images of various sizes, ranging from 1600×1200 to 3872×2592, captured with different backgrounds, illuminance, and distances between the camera and the person. However, some samples are captured from a far distance or out of focus. Thus, a pre-filtering process, as released in [23], is applied to that dataset for the subsequent experiments. The downsized dataset contains 2,447 images from 88 subjects, with a different number of samples for each subject. Last but not least, the FRGC v2.0 dataset contains 345 subjects, with a different number of samples per class and 12,682 frontal face images of size 2272×1704. Some representative images are shown in Fig. 19. This dataset focuses more on lighting changes in controlled and uncontrolled environments. Specifically, the controlled images are taken in a studio setting with two different lighting conditions, while the uncontrolled images are taken under varying illumination conditions, e.g., in hallways and atriums. Furthermore, the images in these two conditions involve two facial expressions, i.e., either smiling or neutral. In our simulation, 5-fold cross validation is applied to each of the four datasets for performance comparison.
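
The 5-fold protocol might be sketched as follows, assuming scikit-learn; the fit_and_score callable is a placeholder for the full recognition pipeline, and the stratified split is an implementation choice of this sketch:

import numpy as np
from sklearn.model_selection import StratifiedKFold

def five_fold_accuracy(features, labels, fit_and_score):
    """Run 5-fold cross validation and return the mean accuracy.

    fit_and_score(train_idx, test_idx) -> accuracy is a placeholder for the
    proposed pipeline (feature extraction + matching).
    """
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    accs = [fit_and_score(tr, te) for tr, te in skf.split(features, labels)]
    return float(np.mean(accs))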

The recognition accuracies discussed in this section are highly dependent on the detected landmark points, which were introduced in Fig. 1(b). This implies that a failure in detecting the landmark points leads to a failure in recognition. To this end, Table III shows the successful detection probabilities of the landmark points on these datasets. As can be seen, reliable results are achieved by using IntraFace [25], which ensures that the values shown in the following experiments are consistent with the actual performance of the proposed method.

7.2 PARAMETER OPTIMIZATION

The proposed algorithm has five parameters to be optimized: 1) α × α: the number of cells in ROI1 for the HOG; 2) β: the number of bins of each cell for the HOG; 3) ζ × ζ: the number of cells in each block for the HOG; 4) γ: the number of cells on ROI2′ for the proposed WT-LBP; and 5) λ: the threshold for single- or double-fold eyelids. The accuracy (ACC) is adopted as the metric for the evaluation, and it is computed as follows:

ACC = (tp + tn) / (tp + tn + fp + fn), (14)

where tp, fp, tn, and fn denote the numbers of true positives, false positives, true negatives, and false negatives, respectively. Notably, only the training sets are utilized for parameter optimization.
The parameters are separated into two parts, {α, β, ζ, γ} and {λ}, for individual optimization. For the last parameter, λ, since it is specifically used in the classification of the type of eyelid, its accuracy always contributes positively to the entire system. The search is performed over the set of values {1.5, 2.5, ..., 15.5}, and it is found that λ = 5.5 achieves the best performance, with an accuracy of 73.2%, as shown.
To achieve the optimum configuration of the four remaining parameters, a brute-force approach is adopted. The performance of each parameter is illustrated in Fig. 20, where one parameter is adjusted while the remaining three parameters are fixed. Among these, the performance of β, as shown in Fig. 20(b), reveals multiple optima; thus, the average of these optimum values, i.e., 10, is considered the best value for β. In Fig. 20(d), it is observed that the performance of γ at 30 and 50 is superior to the others. Specifically, the parameter γ consists of 2 × n parts: ROI2′ is divided into two horizontal parts and n vertical parts, where the normalized width of ROI2′ (i.e., 75) should be evenly divisible by n. When the number of cells is greater than 50, the performance drops, because the number of pixels in each cell becomes too low to distinguish the identity effectively. Consequently, 30 is considered the best value for γ. As a result, the optimized parameters are as follows: α = 3, β = 10, ζ = 1, and γ = 30.
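
A hedged sketch of this brute-force search, assuming an evaluate callable that runs the pipeline on the training folds and returns the ACC of equation (14); the candidate grids below are illustrative assumptions, except that the reported optimum is α = 3, β = 10, ζ = 1, γ = 30:

import itertools

def acc(tp, tn, fp, fn):
    """Equation (14): ACC = (tp + tn) / (tp + tn + fp + fn)."""
    return (tp + tn) / (tp + tn + fp + fn)

def brute_force_search(evaluate, alphas=(2, 3, 4), betas=(6, 8, 10, 12),
                       zetas=(1, 2), gammas=(10, 30, 50)):
    """Exhaustively evaluate every (alpha, beta, zeta, gamma) combination.

    evaluate(alpha, beta, zeta, gamma) -> ACC on the training folds is a
    placeholder for the full pipeline (it would internally use acc()).
    """
    best = max(itertools.product(alphas, betas, zetas, gammas),
               key=lambda cfg: evaluate(*cfg))
    return best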

7.3 RESULTS

7.4 PERFORMANCE COMPARISON

CONCLUSION

APPENDIX

BIBLIOGRAPHY
