
QUANTUM NEURAL NETWORKS VERSUS
CONVENTIONAL FEEDFORWARD NEURAL
NETWORKS: AN EXPERIMENTAL STUDY

Ralf Kretzschmar*, Reto Büeler†, Nicolaos B. Karayiannis‡,
and Fritz Eggimann†

*† Signal and Information Processing Laboratory
Swiss Federal Institute of Technology (ETH)
Sternwartstr. 7, 8092 Zurich, Switzerland.
‡ Department of Electrical and Computer Engineering
University of Houston
Houston, Texas 77204-4793, USA.
* Email: [email protected]

Abstract. This study investigates the capacity of quantum neural
networks (QNNs) to function as fuzzy classifiers. For this purpose,
QNNs are compared with multilayer feedforward neural networks
(FFNNs). The experiments are performed on two-dimensional
speech data and investigate a variety of issues involved in the
training of QNNs. This experimental study verifies that QNNs are
capable of representing and quantifying the uncertainty inherent in
the training data. It is also shown that simple post-processing of
the QNN outputs makes QNNs an attractive alternative to
conventional FFNNs for pattern classification applications.

INTRODUCTION

Feedforward neural networks (FFNNs) have been a natural choice as trainable
pattern classifiers because of their function approximation capability and
generalization ability [1]. The function approximation capability allows them
to form arbitrary nonlinear discriminant surfaces while the generalization
ability allows them to respond consistently to data they were not trained
with. One of the major disadvantages of FFNNs is their inability to correctly
estimate class membership of data belonging to regions of the feature space
where there is overlapping between the classes. The reason for this is that
FFNNs use sharp decision boundaries to partition the feature space. As
a result, the outputs of trained FFNNs cannot generally be interpreted as
membership values. This motivated the development of neuro-fuzzy systems
by merging neural modeling with fuzzy-theoretic concepts [5]-[7].

The limitations of conventional FFNNs motivated the development of
inherently fuzzy feedforward neural networks, known as quantum neural net-
works (QNNs) [6], [7]. Conventional FFNNs and QNNs satisfy the require-
ments outlined in [3] for universal function approximators. In addition to
their function approximation capabilities, QNNs have also been shown to be
capable of representing and quantifying the uncertainty inherent in the train-
ing data. More specifically, QNNs can identify overlapping between classes
because of their capacity to approximate any arbitrary membership profile
to any degree of accuracy.
QNNs have recently been used, together with FFNNs, to classify and
remove bird-contaminated data recorded by a 1290-MHz wind profiler [2].
QNNs outperformed FFNNs in this application, but the high dimensionality
of the input space did not allow a visual perception of the results. This paper
presents an experimental comparison of conventional FFNNs and QNNs on
a pattern classification problem involving two-dimensional (2-D) vowel data.

QUANTUM NEURAL NETWORKS

QNNs are feedforward neural networks capable of classifying uncertain data
[6], [7]. The main difference between conventional FFNNs and QNNs is the
form of the nonlinear activation functions of their hidden units. Instead of
the ordinary sigmoid functions employed by conventional FFNNs, the hid-
den units of QNNs contain multilevel activation functions. Each multilevel
function is formed as the sum of sigmoid functions shifted by the quantum
intervals. The quantum intervals add an additional degree of freedom that
can be exploited during the learning process to capture and quantify the
structure of the input space. This is accomplished by minimizing the average
class-conditional variance at the outputs of the hidden units.
Consider a conventional FFNN consisting of $n_i$ inputs, one layer of $n_h$
hidden units, and $n_o$ output units. Let $w_{ij}$ be the synaptic weight connecting
the $i$th output unit to the $j$th hidden unit and let $v_{j\ell}$ be the synaptic weight
connecting the $j$th hidden unit to the $\ell$th input. Suppose the data set $X$
contains the feature vectors $\mathbf{x}_k = [x_{1,k} \; x_{2,k} \; \ldots \; x_{n_i,k}]^T$, $1 \le k \le M$. Then
the response of the $i$th output unit to $\mathbf{x}_k$ is $y_{i,k} = f(\sum_{j=0}^{n_h} w_{ij} \hat{h}_{j,k})$, where
$\hat{h}_{0,k} = 1, \forall k$, $f(\cdot)$ can be a linear or sigmoid function, and $\hat{h}_{j,k} = g_0(h_{j,k})$ is
the response of the $j$th hidden unit, which is computed as the output of the
sigmoid activation function $g_0(\cdot)$ to $h_{j,k} = \sum_{\ell=0}^{n_i} v_{j\ell} x_{\ell,k}$, with $x_{0,k} = 1, \forall k$.
A typical example of a sigmoid function is the logistic function
$g_0(x) = 1/(1 + \exp(-x))$, which was used in this work.
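To make the preceding notation concrete, the following Python sketch (an editorial illustration; the names `logistic`, `ffnn_forward`, `V`, and `W` are ours and do not appear in the paper) computes the response of such an FFNN to a batch of feature vectors.

```python
import numpy as np

def logistic(s):
    """Logistic sigmoid g0(s) = 1 / (1 + exp(-s))."""
    return 1.0 / (1.0 + np.exp(-s))

def ffnn_forward(X, V, W, output_fn=logistic):
    """Forward pass of a single-hidden-layer FFNN.

    X : (M, n_i) feature vectors.
    V : (n_h, n_i + 1) hidden-layer weights; column 0 holds the weight
        attached to the constant input x_0 = 1 (the bias).
    W : (n_o, n_h + 1) output-layer weights; column 0 holds the bias.
    Returns the (M, n_o) network outputs y_{i,k}.
    """
    M = X.shape[0]
    X1 = np.hstack([np.ones((M, 1)), X])      # prepend x_0 = 1
    H = logistic(X1 @ V.T)                    # hidden responses g0(h_{j,k})
    H1 = np.hstack([np.ones((M, 1)), H])      # prepend h_0 = 1
    return output_fn(H1 @ W.T)                # outputs y_{i,k}
```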
QNNs are feedforward neural networks that employ multilevel activation
functions in their hidden units. Suppose a QNN, denoted here as QNN($n_s$),
contains multilevel hidden units with $n_s$ discrete quantum levels. The activa-
tion function of such units can be written as the sum of $n_s$ sigmoid functions,
each shifted by $\theta_j^r$, i.e.,

$$\hat{h}_{j,k} = g(h_{j,k}) = \frac{1}{n_s} \sum_{r=1}^{n_s} g_0\big(\beta_h (h_{j,k} - \theta_j^r)\big), \qquad (1)$$

where $\beta_h$ is a slope factor and the $\{\theta_j^r\}$ define the jump positions of the activation
function. The step widths of the multilevel activation function, called the
quantum intervals, are determined by the jump positions $\{\theta_j^r\}$. If $n_s = 1$ and
$\theta_j^1 = 0$, then the multilevel activation function $g(\cdot)$ defined in (1) reduces to
the sigmoid function $g_0(\cdot)$. In such a case, the response $\hat{h}_{j,k} = g(h_{j,k})$ of the
$j$th hidden unit to the feature vector $\mathbf{x}_k$ reduces to $\hat{h}_{j,k} = g_0(h_{j,k})$ and the
QNN model under consideration reduces to a conventional FFNN.
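For illustration, the hidden-layer nonlinearity of the previous sketch can be replaced by the multilevel activation of (1). The sketch below is our own reconstruction; the normalization by $n_s$ and the per-unit jump positions follow the QNN formulation of [6] and may differ in detail from the authors' implementation.

```python
import numpy as np

def logistic(s):
    return 1.0 / (1.0 + np.exp(-s))

def quantum_activation(H, theta, beta_h):
    """Multilevel activation as reconstructed in eq. (1).

    H      : (M, n_h) pre-activations h_{j,k}.
    theta  : (n_h, n_s) jump positions theta_j^r, one set per hidden unit.
    beta_h : slope factor of the superimposed sigmoids.
    Returns the (M, n_h) hidden responses, each a staircase with n_s levels.
    """
    n_s = theta.shape[1]
    # shift each pre-activation by every jump position, then average the sigmoids
    shifted = H[:, :, None] - theta[None, :, :]      # (M, n_h, n_s)
    return logistic(beta_h * shifted).sum(axis=2) / n_s
```

With `n_s = 1` and a zero jump position this reduces to the ordinary logistic hidden unit, recovering a conventional FFNN, as stated above.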
The synaptic weights of conventional FFNNs and QNNs can be updated
by using gradient descent to minimize, sequentially for $k = 1, 2, \ldots, M$, the
error function $E_k = \sum_{i=1}^{n_o} (y_{i,k} - \hat{y}_{i,k})^2$, where $\hat{y}_{i,k}$ denotes the desired
response of the $i$th output unit to $\mathbf{x}_k$. The quantum intervals can be estimated
by minimizing the class-conditional variances at the outputs of the hidden units.
More specifically, the jump positions $\{\theta_j^r\}$ can be updated by minimizing the
average class-conditional variance at the outputs of the hidden units, defined as

$$G = \sum_{m=1}^{n_c} \sum_{j=1}^{n_h} \sum_{\forall \mathbf{x}_k \in C_m} \big( \langle \hat{h}_{j,C_m} \rangle - \hat{h}_{j,k} \big)^2, \qquad (2)$$

where $n_c$ is the number of classes, $\langle \hat{h}_{j,C_m} \rangle = (1/|C_m|) \sum_{\forall \mathbf{x}_k \in C_m} \hat{h}_{j,k}$, and
$|C_m|$ denotes the cardinality of the $m$th class $C_m$.
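The quantity in (2) can be evaluated directly from the hidden-unit responses, as in the following sketch. This is only an illustration of the objective under our reconstruction of (2); it is not the authors' gradient-descent update for the jump positions.

```python
import numpy as np

def class_conditional_variance(H_hat, labels):
    """Average class-conditional variance G as reconstructed in eq. (2).

    H_hat  : (M, n_h) hidden-unit responses to the training vectors.
    labels : (M,) integer class labels.
    """
    G = 0.0
    for c in np.unique(labels):
        Hc = H_hat[labels == c]           # responses of the samples in class C_m
        mean_c = Hc.mean(axis=0)          # <h_{j,C_m}>, one mean per hidden unit
        G += ((Hc - mean_c) ** 2).sum()   # summed squared deviations
    return G
```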

EXPERIMENTAL RESULTS

The experiments described in this paper compared FFNNs and QNNs on
a vowel data set [4]. This data set has been extensively used to compare
different pattern classification approaches since it contains well-separated and
highly overlapping classes on a 2-D input feature space. The vowel data
consist of 671 samples of 10 different vowels (i.e., 10 classes) described by
their first two formants. The 671 samples were divided into a training set of
338 samples and a testing set of 333 samples. Figure 1 shows the training
and the testing sets formed from a normalized version of the vowel data.
All neural networks were trained in the experiments for 10000 cycles on
the training set. The learning rate for the weights and biases was 0.05 while
the learning rate for the quantum intervals was 0.005. The slope factor associ-
ated with the multilevel activation functions of the QNNs was set equal to the
number of quantum levels to enable a clear separation of the superimposed
sigmoids [2]. For a given number of hidden units and quantum levels, the
free parameters of the QNNs were initialized by ten distinct sets of numbers
in the interval [-1, 1] produced by a random number generator.
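For concreteness, the training setup described above can be summarized as in the following sketch; the structure and names (`init_qnn`, the `hyper` dictionary) are ours, and the network sizes shown are those used in the experiments that follow.

```python
import numpy as np

def init_qnn(n_i, n_h, n_o, n_s, seed):
    """One random initialization of a QNN(n_s), as described in the text."""
    rng = np.random.default_rng(seed)
    params = {
        "V": rng.uniform(-1, 1, size=(n_h, n_i + 1)),   # hidden weights and biases
        "W": rng.uniform(-1, 1, size=(n_o, n_h + 1)),   # output weights and biases
        "theta": rng.uniform(-1, 1, size=(n_h, n_s)),   # jump positions
    }
    hyper = {
        "cycles": 10000,       # training cycles over the training set
        "lr_weights": 0.05,    # learning rate for weights and biases
        "lr_theta": 0.005,     # learning rate for the quantum intervals
        "beta_h": n_s,         # slope factor = number of quantum levels [2]
    }
    return params, hyper

# ten distinct random initializations, as in the experiments
inits = [init_qnn(n_i=2, n_h=6, n_o=10, n_s=3, seed=s) for s in range(10)]
```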
The first set of experiments identified the size of the FFNN that results
in the best classification of the data. It was found that the lowest error on


Figure 1: The normalized vowel data: (a) the training set and (b) the testing set.
The two inputs x1 and x2 represent the normalized values of the first two formants
of the vowels. Each symbol denotes a different vowel.

the testing set was produced by an FFNN containing six hidden units. This
FFNN was utilized in the experiments that followed. These experiments also
calculated the classification errors $E_c$ and the values of the average class-
conditional variance G produced on the testing set by the FFNN and QNNs
with six hidden units containing multilevel activation functions with 2, 3,
4, 5, and 10 quantum levels. The initializations that resulted in the lowest
values of $E_c$ and G are summarized in Tables 1 and 2, respectively. The
percentage of classification errors $E_c$ was computed based on a winner-takes-
all strategy, that is, the input vector was assigned to the class represented by
the output unit with the largest response. When evaluated in terms of the
percentage of classification errors $E_c$, the FFNN and QNNs tested in these
experiments produced comparable results. On average, QNNs led to lower
values of the average class-conditional variance G than FFNNs. This is not
surprising, since the quantum intervals of the QNNs were updated during
their learning by minimizing the average class-conditional variance.
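The winner-takes-all assignment and the resulting error percentage $E_c$ reported below can be computed as in this brief sketch (illustrative only; the function name is ours).

```python
import numpy as np

def winner_takes_all_error(Y, labels):
    """Percentage of classification errors E_c under a winner-takes-all rule.

    Y      : (M, n_o) network outputs on the testing set.
    labels : (M,) true class indices.
    """
    predicted = Y.argmax(axis=1)               # class of the largest output
    return 100.0 * np.mean(predicted != labels)
```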

Table 1: PERCENTAGE OF CLASSIFICATION ERRORS ($E_c$) AND AVERAGE CLASS-
CONDITIONAL VARIANCE VALUES (G) PRODUCED BY AN FFNN AND VARIOUS
QNNs THAT RESULTED IN THE LOWEST NUMBER OF CLASSIFICATION ERRORS.

Neural Network   FFNN   QNN(2)   QNN(3)   QNN(4)   QNN(5)   QNN(10)
$E_c$ [%]        19.2   19.5     21.0     19.2     19.5     19.5
G                28.1   22.5     14.8     26.1     20.0     20.4

Table 2: PERCENTAGE OF CLASSIFICATION ERRORS ($E_c$) AND AVERAGE CLASS-
CONDITIONAL VARIANCE VALUES (G) PRODUCED BY AN FFNN AND VARIOUS
QNNs THAT RESULTED IN THE LOWEST VALUES OF THE AVERAGE CLASS-
CONDITIONAL VARIANCE.

Neural Network   FFNN   QNN(2)   QNN(3)   QNN(4)   QNN(5)   QNN(10)
$E_c$ [%]        21.9   21.9     21.0     22.5     21.9     22.8
G                18.2   17.1     14.1     18.1     16.1     13.7


Figure 2: Contour plots of the output values of one specific output unit (i.e., one
specific class) produced by the (a) FFNN, (b) QNN(2), (c) QNN(3), (d) QNN(4),
(e) QNN(5), and (f) QNN(10).

The second set of experiments evaluated the ability of FFNNs and QNNs
to estimate class membership from the data. This investigation employed
the neural networks that produced the lowest percentage of classification
errors $E_c$ in the previous set of experiments (see Table 1). Figures 2(a)-2(f)
show the contour plots produced by the neural networks listed in Table 1
for class 1. All samples of class 1 that are represented by circles in Figure 1
are also shown as circles in Figure 2. The exact position of each sample of
class 1 is marked by a dot. No specific symbols are used in Figure 2 for the
samples of all other classes. Instead, their exact positions are marked by dots.
The contour lines (i.e., lines of constant values) of the network output values
that represent class 1 were obtained by presenting to the neural networks

input vectors produced by a regular 100 x 100 grid that covered the region
[-1,1] x [-0.5,2.5] of the input space. Dense contour lines produced dark
areas in Figure 2 and mark regions of the input space corresponding to a
steep slope in the output. Less dense contour lines produced light areas
in Figure 2 and mark regions of the input space corresponding to almost
constant output values. Figure 2 also shows the output values corresponding
to some selected contour lines. The highest output values were found near
the center of class 1 (i.e., near the center of the figures) and towards the top
right. The contour lines produced by the FFNN define a smooth surface.
In contrast, the contour lines produced by the QNNs quantize the input
space into regions corresponding to almost constant output values. These
regions represent certain levels of class membership or certain degrees of
class overlapping. This can be verified by observing the distribution of the
circled and non-circled data points in Figures 2(b)-2(f). The number of flat
regions produced by the QNNs increased as the number of quantum levels
increased. For a small number of quantum levels, QNNs produced a rough
membership profile representing the uncertainty in the training data, such
as that shown in Figure 2(b). For a large number of quantum levels, QNNs
seem to assign an individual level of uncertainty to small groups of training
vectors or even to isolated training vectors. An example of such a behavior
is shown in Figure 2(f). Referring to Figure 2, the most reliable membership
profiles were produced by the QNNs with 3, 4, and 5 quantum levels.
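The contour plots of Figure 2 can be reproduced, in principle, by evaluating a trained network on the grid described above; the following matplotlib sketch illustrates the procedure, assuming a forward function such as the hypothetical `ffnn_forward` given earlier.

```python
import numpy as np
import matplotlib.pyplot as plt

def contour_of_output_unit(forward_fn, unit, xlim=(-1, 1), ylim=(-0.5, 2.5), n=100):
    """Contour lines of one output unit over a regular n x n grid."""
    x1 = np.linspace(*xlim, n)
    x2 = np.linspace(*ylim, n)
    X1, X2 = np.meshgrid(x1, x2)
    grid = np.column_stack([X1.ravel(), X2.ravel()])   # (n*n, 2) input vectors
    Y = forward_fn(grid)[:, unit].reshape(n, n)        # responses of that unit
    cs = plt.contour(X1, X2, Y, levels=20)
    plt.clabel(cs, inline=True, fontsize=7)            # label selected contour lines
    plt.xlabel("Input x1")
    plt.ylabel("Input x2")
    plt.show()
```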
The third set of experiments investigated how the response of trained
QNNs is affected by the minimization of the average class-conditional vari-
ance during the learning process. Figures 3(a) and 3(b) show the contour
lines produced by two QNNs with 3 quantum levels trained using two differ-
ent random initializations of their free parameters. These two initializations
produced different values for the average class-conditional variance corre-
sponding to class 1 and led to different membership profiles, as indicated by
comparing Figures 3(a) and 3(b). It is also clear that the minimization of
the average class-conditional variance has a significant impact on the

Figure 3: Comparison of two QNNs with 3 quantum levels whose training resulted
in the following values of the average class-conditional variance for class 1: (a) 1.4,
and (b) 3.9.


Figure 4: Decision boundaries produced by the FFNN on (a) the training set and
(b) the testing set, the QNN(4) on (c) the training set and (d) the testing set, and
the QNN(5) on (e) the training set and (f) the testing set.

representation of the uncertainty inherent in the training data, which is reflected
by the membership profiles produced by the trained QNNs. Comparison of
Figures 3(a) and 3(b) indicates that the best membership profile was pro-
duced by the QNN whose training led to the lowest value of the average
class-conditional variance for class 1.
The fourth set of experiments evaluated the decision boundaries produced
by trained FFNNs and QNNs when class label assignment relied on a winner-
takes-all strategy. Figures 4(a)-4(f) show the decision boundaries produced
by the FFNN and the two QNNs with 4 and 5 quantum levels that achieved
the lowest percentage of classification errors $E_c$ (see Table 1). The decision
boundaries are shown in Figures 4(a)-4(f) together with the training and


Figure 5: Regions of overlapping classes (shaded) and decision boundaries produced
by the FFNN based on (a) Rule 1 and (b) Rule 2, the QNN(4) based on (c) Rule 1
and (d) Rule 2, and the QNN(5) based on (e) Rule 1 and (f) Rule 2.

testing sets. The neural networks tested in these experiments produced dif-
ferent decision boundaries despite the fact that they all produced almost the
same number of classification errors. According to Figures 4(a) and 4(b), the
FFNN produced relatively smooth and fairly adequate decision boundaries
for this data set. Figures 4(c)-4(f) indicate that the QNNs produced slightly
eccentric decision boundaries for certain regions of the input space. This ex-
perimental outcome reveals that a winner-takes-all strategy is not capable of
interpreting the rich and complex structure of the QNN output. As a result,
this strategy discards information that could improve class label assignment.
This reveals the need for the development of alternative strategies for class

label assignment that could potentially improve the reliability of decision
making.
The results of the previous experiments motivated the development of
simple rules for identifying regions of uncertainty in the input space based
on the outputs of trained neural networks. These rules relied on a statistical
analysis of the most dominant network outputs corresponding to correctly
and incorrectly classified training samples. In these experiments, the input
vectors produced by a regular 100 x 100 grid were assigned to an uncertainty
region if
Rule 1: the highest output of the neural network was lower than the
average of the highest outputs of correctly classified samples and the
highest outputs of incorrectly classified samples,
Rule 2: the absolute value of the difference of the two highest outputs
was lower than the average of the two highest outputs (see the sketch below).
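A literal reading of the two rules is sketched below; the interpretation of the averages in Rule 1 and the names used are ours, not the authors' implementation.

```python
import numpy as np

def uncertainty_masks(Y_grid, Y_train, train_labels):
    """Flag grid points as uncertain according to Rule 1 and Rule 2.

    Y_grid       : (N, n_o) network outputs on the 100 x 100 grid points.
    Y_train      : (M, n_o) network outputs on the training set.
    train_labels : (M,) true class indices of the training samples.
    """
    correct = Y_train.argmax(axis=1) == train_labels
    top_correct = Y_train[correct].max(axis=1)    # highest outputs, correct samples
    top_wrong = Y_train[~correct].max(axis=1)     # highest outputs, misclassified samples

    # Rule 1: highest grid output below the average of the two training statistics
    t1 = 0.5 * (top_correct.mean() + top_wrong.mean())
    rule1 = Y_grid.max(axis=1) < t1

    # Rule 2: gap between the two highest outputs smaller than their average
    sorted_out = np.sort(Y_grid, axis=1)
    first, second = sorted_out[:, -1], sorted_out[:, -2]
    rule2 = np.abs(first - second) < 0.5 * (first + second)
    return rule1, rule2
```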
Figures 5(a)-5(f) show the decision boundaries and the regions of uncertainty
produced by the two rules for the FFNN and the two QNNs with 4 and 5
quantum levels that achieved the lowest percentage of classification errors $E_c$
(see Table 1). According to the regions of uncertainty shown in Figure 5,
Rule 2 appears to be more reliable than Rule 1. The distribution of the
training data in the input space also indicates that the regions of uncertainty
produced by Rule 2 can be interpreted as the regions of the input space
where class label assignment according to a winner-takes-all strategy may
not be valid. Comparison of Figures 5(a)-5(f) indicates that the QNN with
5 quantum levels provided a more reliable basis for identifying regions of
uncertainty than the QNN with 4 quantum levels. This can be attributed to
the fact that the average class-conditional variance computed on the trained
QNN with 5 quantum levels was lower than that computed on the QNN with
4 quantum levels (see Table 1). According to Figures 5(e) and 5(f), Rule 2
produced uncertainty regions where no training data were available. Note
also that the QNN with 5 quantum levels produced very narrow uncertainty
regions where the classes are clearly separated and wider uncertainty regions
where there is significant overlapping between different classes.

CONCLUSIONS

The study outlined in this paper compared conventional FFNNs and QNNs
on a pattern classification problem involving a set of 2-D vowel data. This
data set is particularly challenging for any pattern classifier due to exten-
sive overlapping between the samples belonging to different classes. FFNNs
and QNNs produced comparable classification rates when class label assign-
ment was based on a winner-takes-all strategy. However, there were some
remarkable differences between the responses of the output units of trained
FFNNs and QNNs. Unlike FFNNs, which produced smooth bell-like surfaces



for each class, QNNs quantized certain regions of the input space by creating
staircase-like surfaces for each class. Comparison of these surfaces with the
structure of the input space verified that QNNs are capable of representing
the uncertainty inherent in the training data in the sense that their outputs
can be interpreted as an approximation of the class membership profile asso-
ciated with the training set. It was experimentally verified that the capacity
of QNNs to quantify uncertainty depends rather strongly on the number of
quantum levels and the minimization of the average class-conditional vari-
ance that determines the lengths of the quantum intervals during learning.
The experiments indicated that the approximation of a given membership
profile degrades if the number of quantum levels increases above a certain
threshold or if the algorithm used to minimize the average class-conditional
variance is trapped in local minima.
It was experimentally verified that a winner-takes-all strategy is not nec-
essarily the best approach to assigning class labels to input vectors based on
the outputs of trained QNNs. Since this strategy is not capable of interpreting
the complex structure of the QNN output, it discards valuable information
regarding uncertainty that could be utilized to improve the reliability of de-
cision making. It was also shown that the outputs of trained QNNs can be
interpreted by some simple rules to produce regions of uncertainty in the
input space. The experiments indicated that such a post-processing of the
QNN outputs makes QNNs an attractive alternative to FFNNs for practical
pattern recognition applications.

REFERENCES

[1] C. M. Bishop, Neural Networks for Pattern Recognition, Oxford: Ox-
ford University Press, 1995.
[2] R. Kretzschmar, "Quantum neurofuzzy bird removal algorithm (NEURO-
BRA) for 1290-MHz wind profiler data," Master's Thesis, Institute for At-
mospheric Science (LAPETH), ETH Zurich, Zurich, Switzerland, 1998.
[3] M. Leshno, V. Y. Lin, A. Pinkus and S. Schocken, "Multilayer feedforward
networks with a nonpolynomial activation function can approximate any func-
tion," Neural Networks, vol. 6, no. 6, pp. 861-867, 1993.
[4] K. Ng and R. P. Lippmann, "Practical characteristics of neural network and
conventional pattern classifiers," in R. P. Lippmann, et al. (Eds.), Advances
in Neural Information Processing Systems 3, San Mateo, CA: Morgan
Kaufmann, pp. 970-976, 1991.
[5] S. K. Pal and S. Mitra, Neuro-Fuzzy Pattern Recognition, New York:
Wiley, 1999.
[6] G. Purushothaman and N. B. Karayiannis, "Quantum neural networks
(QNNs): Inherently fuzzy feedforward neural networks," IEEE Transac-
tions on Neural Networks, vol. 8, no. 3, pp. 679-693, 1997.
[7] G. Purushothaman and N. B. Karayiannis, "Feed-forward neural architectures
for membership estimation and fuzzy classification," International Journal
of Smart Engineering System Design, vol. 1, pp. 163-185, 1998.
