Introduction To Rule-Based Fuzzy Logic Systems: by Jerry M. Mendel
Rule-Based
Fuzzy Logic Systems
by Jerry M. Mendel
[Figure: A rule-based fuzzy logic system. Crisp inputs x enter a Fuzzifier, which produces fuzzy input sets; Rules and Fuzzy Inference produce fuzzy output sets; an Output Processor produces crisp outputs y = f(x).]
CONTENTS
A Self-Study Course (Introduction)
Lesson 1
Lesson 2
Fuzzy Sets, Part 1
Lesson 3
Fuzzy Sets, Part 2
Lesson 4
Fuzzy Sets, Part 3
Lesson 5
Fuzzy Logic
Lesson 6
Case Studies
Lesson 7
Lesson 8
Lesson 9
Introduction to Rule-Based
Fuzzy Logic Systems
A Self-Study Course
This course was designed around Chapters 1, 2, 4-6, 13 and 14 of Uncertain Rule-Based Fuzzy Logic
Systems: Introduction and New Directions by Jerry M. Mendel, Prentice-Hall, 2001. The goal of this self-study course is to provide training in the field of rule-based fuzzy logic systems.
In this course, which is the first of two self-study courses, the participant will focus on rule-based fuzzy
logic systems when no uncertainties are present. This is analogous to first studying deterministic systems
before studying random systems. In the follow-on self-study course New Directions in Rule-Based Fuzzy
Logic Systems: Handling Uncertainties, the participant will learn about expanded and richer kinds of rule-based fuzzy logic systems, ones that can directly model uncertainties and minimize their effects. The
present course (or equivalent knowledge) is a prerequisite to the follow-on course.
Prerequisites
This course is directed at participants who have had no formal training in fuzzy logic and want to learn
about rule-based fuzzy logic systems. It assumes a college undergraduate degree, preferably in electrical
engineering or computer science.
Course Objectives
After completing this course, you should be able to:
Describe many differences between fuzzy sets and crisp sets, and fuzzy logic and crisp logic
Describe numerous applications for rule-based fuzzy logic systems (FLSs)
Demonstrate how a fuzzy set is described by a membership function
Compute set theoretic operations for fuzzy sets using membership functions
Demonstrate compositions of fuzzy relations and compute their membership functions
Describe and use Zadeh's Extension Principle
Explain the transition from crisp logic to fuzzy logic
Demonstrate membership functions for rules
Explain how rules are fired and implement the firing of rules
Describe and demonstrate how a FLS can be used to forecast a time-series
Describe and demonstrate how a FLS can be used as a fuzzy logic advisor for making social or
engineering judgments
Describe the architectures of three type-1 FLSs
Compute the input-output relationships for these three FLSs
Demonstrate and implement a variety of design methods for optimizing the design parameters of
these three FLSs
Describe the nature of and the order of all computations needed to design and implement these three
FLSs
Explain what software is available to implement and design these three FLSs
Course Components
This course includes:
A study guide including learning objectives, reading assignments, and practice problems (with
solutions)
A final exam and its solution.
The textbook is not included.
Acknowledgements
I would like to take this opportunity to thank Qilian Liang for his careful review of the Study Guide and
Li-Xin Wang for contributing the write-up in Lesson 13 about fuzzy logic control.
Reading Assignment
I. What is Fuzzy Logic (FL)?
We answer this question by contrasting FL with logic.
According to the Encyclopedia Britannica, "Logic is the study of propositions and their use in argumentation." According to Webster's Dictionary of the English Language, logic is "the science of formal reasoning, using principles of valid inference," and logic is "the science whose chief end is to ascertain the principles on which all valid reasoning depends, and which may be applied to test the legitimacy of every conclusion that is drawn from premises." Although multi-valued logic exists, we are most familiar with two-valued (dual-valued) logic in which a proposition is either true or false. This kind of logic is also referred to as crisp logic.
Traditional (sometimes called Western) logic was first systematized by Aristotle thousands of
years ago, in ancient Athens. There are two fundamental laws of classical logic:
Law of the Excluded Middle: A set and its complement must comprise the universe of
discourse.
Law of Contradiction: An element can either be in its set or its complement; it cannot
simultaneously be in both.
These two laws sound similar, but the Law of Contradiction forbids something being
simultaneously true and not true, whereas the Law of the Excluded Middle forbids anything
other than something being true or not true. Shakespeare's Hamlet exemplified the Law of
Contradiction when he said "To be or not to be, that is the question."
Fuzzy logic (FL) is a type of logic that includes more than just true or false values. It is the logic that deals with situations where you can't give a clear yes/no (true/false) answer. In FL, propositions are represented with degrees of truthfulness or falsehood, i.e., FL uses a continuous range of truth values in the interval [0, 1] rather than just true or false values. In FL, both of the two fundamental laws of classical logic can be broken, i.e., it is possible for an element to simultaneously be in its set and its complement, but to different degrees, the sums of which add up to unity. This will be made very clear in Lesson 3. So, Zadeh's Hamlet might have said "To be somewhat and not to be somewhat, that is the conundrum." FL includes classical dual-valued logic as a special case.
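The way both laws can fail, while the membership degrees still sum to one, can be sketched in a few lines of Python (the fuzzy set "tall" and its 0.7 membership grade are invented for illustration):

```python
# A sketch of fuzzy membership degrees; the fuzzy set "tall" and the
# membership grade 0.7 are invented for illustration.

def complement(mu: float) -> float:
    """Standard fuzzy complement: mu_notA(x) = 1 - mu_A(x)."""
    return 1.0 - mu

mu_tall = 0.7                    # degree to which some person is "tall"
mu_not_tall = complement(mu_tall)

# Both memberships are non-zero (the Law of Contradiction is broken),
# yet the two degrees still add up to unity.
assert mu_tall > 0 and mu_not_tall > 0
assert mu_tall + mu_not_tall == 1.0
```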
Read the IEEE Spectrum (June 1995, pp. 32-35) profile of Zadeh that is a supplement to this
lesson and appears at the end of the Study Guide.
Lotfi Zadeh is the founding father of FL. His first seminal paper on fuzzy sets appeared in 1965,
although he began to formulate ideas about them at least four years earlier. Fuzzy sets met with
great resistance in the West, perhaps because of the negative connotations associated with the
word "fuzzy." Let's face it, "fuzzy" does not conjure up visions of scientific or mathematical rigor. For decades after 1965, some (albeit a relatively small number of) people, along with Zadeh, developed the rigorous mathematical foundations of fuzzy sets and fuzzy logic.
Interestingly enough, Chinese and Japanese researchers devoted a large effort to fuzzy sets and
fuzzy logic. A popular hypothesis for this is that "fuzzy" fits in quite nicely with Eastern philosophies and religions (e.g., the complementarity of Yin and Yang). But, until the early
1970s fuzzy logic was a theory looking for an application. Then, a major breakthrough occurred
in 1975 when Mamdani and Assilian showed how to use rule-based FL to control a non-linear
dynamical system. It was relatively easy to do this, and it was a fast way to design a control
system. Although the design did not lend itself to the well-accepted, important, critical and
rigorous examinations called for by control theory, it did demonstrate an important real
application for FL. Other applications of rule-based FL began to appear, two very notable ones in
Japan: control of the Sendai city's subway system, and control of a water treatment system. Commercial products began to appear, e.g., a fuzzy shower, fuzzy washing machine, and fuzzy rice-cooker, and, in Japan, the word "fuzzy" took on the connotation of intelligent and in 1990 received an award. Western industries took notice (there was big money to be made) and the decade of the '90s rolled in, during which FL achieved a high degree of acceptability (there still is an on-going debate between subjective probabilists and fuzzy theorists about whether FL is the same as or is different from subjective probability). The IEEE established the IEEE Transactions on Fuzzy Systems and the IEEE Conference on Fuzzy Systems (FUZZ); there are
many other journals devoted to fuzzy systems (e.g., Fuzzy Sets and Systems); and, there are many
workshops and conferences devoted either exclusively to or that include sessions on fuzzy
technologies. In 1995, the IEEE awarded Zadeh its highest honor, its Medal of Honor, which is
comparable to the Nobel Prize. Fuzzy logic is now widely used in many industries and fields to
solve practical problems, and is still a subject of intense research by academics all over the
world. Although many applications have been found for FL, it is its application to rule-based
systems that has most significantly demonstrated its importance as a powerful design
methodology. Such rule-based fuzzy logic systems (FLSs) are what this course is all about.
If you are interested in a less impressionistic history of FL, then see, for example, the books by McNeill and Freiberger (1992), Wang (1997), or Kosko (1993a). One of the best histories of FL appears in the recent textbook by Yen and Langari (1999, pp. 3-18).
IV. Four Components That Make Up a Rule-Based Fuzzy Logic System (FLS)
Read pages 3-8 of the textbook.
FL has led to a new architecture for problem solving. This architecture processes its inputs non-linearly and is built upon a class of logical propositions: rules. Rules can be extracted from
experts and can then be quantified using the mathematics of FL that you will learn in this course;
doing this leads to the architecture of a FLS. Or, we can a priori assume the architecture of a
FLS, using the mathematics of FL, and tune the parameters of the FLS to solve a problem. The
latter approach is in the spirit of using a neural network (NN) to solve a problem, where the
architecture of the NN is assumed ahead of time and its parameters are tuned to solve a problem.
The former approach is truly unique to FL. The two approaches can be combined, allowing an
architecture to be developed that can be based on a combination of linguistic and numerical
information. Both approaches have an important role to play in problem solving and are
described in this course.
VII. Coverage
After this introductory first lesson, there are a series of four lessons that will provide you with
the basic tools that are needed in order to mathematically describe a rule-based FLS. Three
lessons (Lessons 2-4) are about fuzzy sets and relations and one lesson (Lesson 5) is about
fuzzy logic. Lesson 6 then describes two applications that are treated in the rest of this course as
case studies, namely forecasting of time-series and knowledge mining using surveys. We then
turn to three specific architectures for FLSs. Four lessons (Lessons 7-10) cover many aspects
of the very widely used singleton type-1 FLS (also known as a Mamdani FLS), ranging from
analysis to design to applications. Lesson 11 then covers all aspects of a non-singleton type-1
FLS, also ranging from analysis to design to applications. The non-singleton FLS lets us model
the inputs to the FLS as fuzzy numbers, whereas the singleton FLS does not, and, because a non-singleton FLS is very similar to a singleton FLS, we spend only one lesson on it. Finally, Lesson
12 covers many aspects of a type-1 TSK FLS, again ranging from analysis to design to
applications. The TSK FLS is very popular in control systems applications of FL and is also
becoming popular in signal processing applications of a FLS. Lesson 13 lets you explore some
applications of a type-1 FLS, namely: rule-based pattern classification, equalization of time-varying non-linear digital communication channels, and fuzzy logic control. Its main purpose is
to let you see how one or more of the FLSs already studied can be used to solve some real-world
problems. Lesson 14 focuses on computation, both for implementing a FLS during its operation
and during the design of the FLS. It enumerates all computations for singleton and non-singleton
type-1 Mamdani FLSs and a singleton type-1 TSK FLS, and, overviews on-line software that is
available for these computations. Finally, Lesson 15 focuses on the shortcomings of type-1 FLSs
and how they can be overcome.
Key Points
Fuzzy logic is a type of logic that includes more than just true or false values; it uses a
continuous range of truth values in the interval [0, 1].
Fuzzy logic lets us combine linguistic knowledge and numerical data in a systematic way.
Lotfi Zadeh is the founder of fuzzy logic.
A rule-based fuzzy logic system is comprised of four elements: rules, fuzzifier, inference
engine and output processor.
A FLS is a new architecture for problem solving, one that processes its inputs nonlinearly
and is built upon IF-THEN rules.
FL and FLSs have been applied in many different fields and industries.
Today, fuzzy and neural are being combined into fuzzy neural networks and neural fuzzy
systems.
Questions
1. Consider an engineering project that you are working on or have recently worked on. What are
some IF-THEN rules for that project?
2. What are the antecedents and consequent(s) for the just-stated rules?
3. Why do you think that fuzzy logic as a discipline has encountered so much resistance?
Note: No solutions are provided for these questions because each participant will have their own
answers to them. Deeper answers to Question 3 than are given in Section III above can be found in the references mentioned at the end of that section.
Reading Assignment
Read pages 1925 of the textbook.
Key Points
A crisp set can be defined using a membership function (MF) that only has two values, 0 or
1.
A fuzzy set is a generalization of a crisp set to MFs that have values in the closed interval [0,
1].
A crisp set is a special case of a fuzzy set.
Linguistic variables are variables whose values are not numbers but words or sentences in a
natural or artificial language.
Membership functions are associated with terms (linguistic variables) which appear in the antecedents or consequents of rules, or in phrases.
Popular shapes for MFs are triangles, Gaussian, trapezoidal, piece-wise linear, and bell-shaped.
There is no unique MF for a term; even when its shape is agreed upon, there are parameters
for the shape that can be chosen in different ways. The freedom to make such choices
provides fuzzy logic systems with design degrees of freedom.
The terms "support of a fuzzy set," "fuzzy singleton," and "normal fuzzy set" let us communicate about fuzzy sets.
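The popular MF shapes listed above are easy to write down; here is a sketch of three of them in Python (all parameter values in any call are arbitrary choices, part of the design degrees of freedom noted above):

```python
import math

# Sketches of three popular MF shapes; parameters are design choices.

def triangular(x, a, b, c):
    """Triangle with feet at a and c and peak (membership 1) at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def trapezoidal(x, a, b, c, d):
    """Trapezoid with feet at a and d and shoulders (membership 1) on [b, c]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

def gaussian(x, m, sigma):
    """Gaussian MF centered at m with spread sigma (never reaches 0)."""
    return math.exp(-0.5 * ((x - m) / sigma) ** 2)
```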
Practice Problems
Complete Exercise 1-2 (all six parts).
Demonstrate how the basic crisp set theoretic operations of union, intersection and
complement can be computed using membership functions.
Explain basic set-theoretic properties for crisp sets (e.g., associativity, DeMorgan's Laws,
Law of Excluded Middle, Law of Contradiction).
Describe the generalizations of the set theoretic operations of union, intersection and
complement to fuzzy sets, and how they can be computed using membership functions.
Explain what t-norms and t-conorms are.
Explain basic set-theoretic properties for fuzzy sets (e.g., associativity, DeMorgan's Laws,
Law of Excluded Middle, Law of Contradiction).
Demonstrate crisp relations and compositions on the same product space.
Demonstrate fuzzy relations and compositions on the same product space and explain how
these differ from their crisp counterparts.
Explain the concept of a hedge and list some hedges and their MFs.
Reading Assignment
Read pages 26-36, 517-520, and 42-44 (in this order) of the textbook.
Key Points
The basic crisp set theoretic operations of union, intersection and complement can be
computed using crisp membership functions whose values are either 0 or 1. The maximum
and minimum functions can be used for union and intersection, respectively.
Operations on crisp sets satisfy many properties including associativity, DeMorgan's Laws,
Law of Excluded Middle, Law of Contradiction, and these properties can be proved using
Venn diagrams or membership functions.
The basic fuzzy set theoretic operations of union, intersection and complement can be
computed using fuzzy membership functions whose values are in the closed interval [0, 1].
The maximum and minimum functions can be used for fuzzy union and fuzzy intersection,
respectively, but they are not the only operations that can be used.
T-norms are operators that can be used for fuzzy intersection.
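Using the max/min choices for t-conorm and t-norm, the fuzzy set operations above can be computed pointwise; a small Python sketch (the universe and membership grades are invented):

```python
# Pointwise fuzzy union, intersection and complement using the max/min
# choice of t-conorm/t-norm. Universe and membership grades are invented.

universe = [1, 2, 3, 4, 5]
mu_A = {1: 0.0, 2: 0.3, 3: 1.0, 4: 0.3, 5: 0.0}
mu_B = {1: 0.2, 2: 0.6, 3: 0.6, 4: 0.2, 5: 0.0}

union        = {x: max(mu_A[x], mu_B[x]) for x in universe}   # t-conorm: max
intersection = {x: min(mu_A[x], mu_B[x]) for x in universe}   # t-norm: min
complement_A = {x: 1.0 - mu_A[x] for x in universe}

# The Law of Contradiction fails: A and its complement overlap at x = 2.
assert min(mu_A[2], complement_A[2]) > 0.0
```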
Practice Problems
Complete Exercises 1-9, 1-11 and 1-19a.
Reading Assignment
Read pages 36-42 of the textbook.
The "sup" in the sup-star composition is short for supremum. If S is a set of real numbers bounded from above, then there is a smallest real number y such that x ≤ y for all x ∈ S. The number y is called the least upper bound, or supremum, of S and is denoted sup_{x∈S}(x). We use the maximum for the supremum.
The sup-star composition, which is given in Equation (1-45), is the most important formula for a
rule-based FLS; but, it is not proven in the text. Because of its importance, we provide a proof of
it here. Since an understanding of the proof is not essential to the use of the sup-star composition,
you may consider the proof as optional reading.
First, we define the composition of two fuzzy relations.
We have already learned that an element belongs to a fuzzy set if it has a non-zero membership in that set. In this respect, the composition of two fuzzy relations means:
If R(U,V) and S(V,W) (R and S, for short) are two type-1 fuzzy relations on U × V and V × W, respectively, then the composition of these two relations, denoted R(U,V) ∘ S(V,W) ≡ R∘S(U,W), is defined as a subset R∘S(U,W) of U × W such that (u,w) ∈ R∘S if and only if the membership for the pair (u,w), u ∈ U and w ∈ W, is non-zero [i.e., μ_R∘S(u,w) > 0] for at least one v ∈ V such that μ_R(u,v) > 0 and μ_S(v,w) > 0.
We shall show that this condition is equivalent to the sup-star composition

    μ_R∘S(u,w) = sup_{v∈V} [μ_R(u,v) ⋆ μ_S(v,w)]

An Aside: In the proof given next, we use the following method. Let A be the statement "μ_R∘S(u,w) > 0," and B be the statement "there exists at least one v ∈ V such that μ_R(u,v) > 0 and μ_S(v,w) > 0." We prove that A iff B by first proving that not-B implies not-A (equivalent to proving that A implies B, i.e., necessity of B) and then proving that not-A implies not-B (equivalent to proving that B implies A, i.e., sufficiency of B).
Proof of (1-45): Necessity: If there exists no v ∈ V such that μ_R(u,v) > 0 and μ_S(v,w) > 0, then this means that for every v ∈ V, either μ_R(u,v) or μ_S(v,w) is equal to zero (or both are zero), which in turn implies that μ_R(u,v) ⋆ μ_S(v,w) = 0 for every v ∈ V, i.e., the supremum of μ_R(u,v) ⋆ μ_S(v,w) over v ∈ V is zero. Hence, μ_R∘S(u,w) = 0, as it should be.
Sufficiency: If the sup-star composition is zero, then it must be true that μ_R(u,v) ⋆ μ_S(v,w) = 0 for every v ∈ V, which means that for every v ∈ V, either μ_R(u,v) or μ_S(v,w) (or both) is zero. This means that there is no v ∈ V such that μ_R(u,v) > 0 and μ_S(v,w) > 0.
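On finite universes the supremum is an ordinary maximum, so the sup-star composition can be computed directly from relational matrices; a sketch (the matrix entries are invented, and both the max-min and max-product choices of star are shown):

```python
# Sup-star composition of two fuzzy relations given as relational matrices.
# On finite universes the supremum is a maximum, so max-min and max-product
# compositions are a few lines each. The matrix entries are invented.

def sup_star(R, S, tnorm):
    """R is |U| x |V| and S is |V| x |W|; returns the |U| x |W| composition."""
    return [[max(tnorm(R[u][v], S[v][w]) for v in range(len(S)))
             for w in range(len(S[0]))]
            for u in range(len(R))]

R = [[0.3, 0.8],
     [1.0, 0.4]]
S = [[0.5, 0.9],
     [0.2, 1.0]]

max_min  = sup_star(R, S, min)                  # star = minimum t-norm
max_prod = sup_star(R, S, lambda a, b: a * b)   # star = product t-norm
```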
Key Points
The composition of two crisp relations on different product spaces that share a common set
can be computed in different ways, including relational matrices and sagittal diagrams; but,
using formulas to do this, such as the max-min or max-product compositions (or their shortcuts), is very efficient because they can be easily implemented on a digital computer.
The composition of two fuzzy relations on different product spaces that share a common set is
performed using the sup-star composition, where star denotes a t-norm operator.
The most important application of the sup-star composition in a rule-based FLS is when one
of the relations is a fuzzy set.
The Extension Principle (EP) lets us extend mathematical relationships between non-fuzzy
variables to fuzzy variables.
When using the EP, we must be careful to distinguish between one-to-one and one-to-many
mappings, and, single- and multiple-variable mappings, so as to use the proper version of it
in each case.
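The one-to-many case of the EP can be sketched as follows: when several x map to the same y, the output membership takes the supremum over the preimage (the discrete universe and membership grades below are invented):

```python
# Extension Principle for a one-to-many mapping f(x) = x**2: several x
# map to the same y, so mu_B(y) is the supremum of mu_A over the preimage.
# The discrete universe and membership grades are invented.

mu_A = {-2: 0.4, -1: 1.0, 0: 0.6, 1: 0.8, 2: 0.2}

def extend(mu_A, f):
    mu_B = {}
    for x, mu in mu_A.items():
        y = f(x)
        mu_B[y] = max(mu, mu_B.get(y, 0.0))  # keep the supremum over preimages
    return mu_B

mu_B = extend(mu_A, lambda x: x * x)
# e.g. both x = -1 and x = 1 map to y = 1, so mu_B[1] = max(1.0, 0.8)
```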
Practice Problems
Complete Exercises 1-16 and 1-20 (b).
Explain that rules are a form of propositions, and describe what propositions are.
Demonstrate the role of truth tables in crisp logic.
Explain the major elements of crisp logic and demonstrate the truth table for five operations
that are frequently applied to propositions.
Explain the concept of a tautology and demonstrate how to use it to determine MFs for crisp
implications (rules).
Describe the firing of crisp rules using Modus Ponens and Modus Tollens.
Explain the transition from crisp logic to fuzzy logic.
Describe Generalized Modus Ponens and demonstrate how to implement it using a sup-star
composition formula.
Create insightful pictorial diagrams that show the steps of the Generalized Modus Ponens sup-star composition.
Explain what engineering implications are and why they are needed.
Reading Assignment
Read pages 48-59 of the textbook.
Because of the importance of the sup-star composition (1-74), we now illustrate its computation
when there is some uncertainty about the measurement of input variable x, in which case the
measurement can be modeled as a fuzzy number. These results are used in Lesson 11.
Let the measured value of x be denoted x̄. In our two examples below we create a fuzzy number A* centered about x̄ by using the following Gaussian membership function for A*:

    μ_A*(x) = exp[-(1/2)((x - x̄)/σ_A*)²]

and we assume that the antecedent fuzzy set A also has a Gaussian membership function:

    μ_A(x) = exp[-(1/2)((x - m_A)/σ_A)²]
Example 5-1: Calculation of the sup-star composition for Gaussian MFs and Product t-norm

In this example, we assume product implication and product t-norm, i.e., μ_{A→B}(x, y) = μ_A(x)μ_B(y).

(a) First, we show that the sup-star composition in (1-74) can be expressed as

    μ_B*(y) = sup_{x∈X} [μ_A*(x) ⋆ μ_{A→B}(x, y)]
            = sup_{x∈X} [μ_A*(x) μ_A(x) μ_B(y)]
            = (sup_{x∈X} [μ_A*(x) μ_A(x)]) μ_B(y)

because μ_B(y) does not depend on x.

(b) Next, we show that the supremum is attained at

    x_max = (σ_A² x̄ + σ_A*² m_A)/(σ_A² + σ_A*²)

Derivation: Define f(x) = μ_A*(x) μ_A(x), and substitute the exponential MFs stated above into it, to see that

    f(x) = exp{-(1/2)[((x - x̄)/σ_A*)² + ((x - m_A)/σ_A)²]} ≡ exp{-(1/2)g(x)}

Maximizing f(x) is equivalent to minimizing g(x); hence, setting dg(x)/dx = 0 at x = x_max,

    (x_max - x̄)/σ_A*² + (x_max - m_A)/σ_A² = 0
    ⇒ σ_A²(x_max - x̄) + σ_A*²(x_max - m_A) = 0
    ⇒ x_max = (σ_A² x̄ + σ_A*² m_A)/(σ_A² + σ_A*²)    QED

(c) Finally, we evaluate f(x_max). Observe that

    x_max - x̄ = σ_A*²(m_A - x̄)/(σ_A² + σ_A*²)  and  x_max - m_A = σ_A²(x̄ - m_A)/(σ_A² + σ_A*²)

so that

    g(x_max) = [σ_A*²(m_A - x̄)² + σ_A²(x̄ - m_A)²]/(σ_A² + σ_A*²)² = (x̄ - m_A)²/(σ_A² + σ_A*²)

Hence,

    f(x_max) = exp[-(1/2)(x̄ - m_A)²/(σ_A² + σ_A*²)]
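The closed-form x_max and peak value of Example 5-1 can be checked numerically against a brute-force supremum; a sketch (the parameter values are arbitrary test choices):

```python
import math

# Numerical check of Example 5-1: for Gaussian MFs and product t-norm, the
# supremum of mu_A*(x)*mu_A(x) is at x_max and equals f(x_max) as derived
# above. All parameter values are arbitrary test choices.

m_A, sigma_A = 2.0, 1.0        # antecedent MF A
x_bar, sigma_As = 4.0, 0.5     # fuzzy number A* about the measurement x_bar

def g(x, m, s):
    return math.exp(-0.5 * ((x - m) / s) ** 2)

def f(x):
    return g(x, x_bar, sigma_As) * g(x, m_A, sigma_A)

x_max = (sigma_A**2 * x_bar + sigma_As**2 * m_A) / (sigma_A**2 + sigma_As**2)
f_max = math.exp(-0.5 * (x_bar - m_A)**2 / (sigma_A**2 + sigma_As**2))

# A brute-force supremum over a fine grid agrees with the closed forms.
brute = max(f(i / 1000.0) for i in range(-2000, 8001))
assert abs(f(x_max) - f_max) < 1e-9
assert abs(brute - f_max) < 1e-6
```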
Example 5-2: Calculation of the sup-star composition for Gaussian MFs and Minimum t-norm

In this example, we assume minimum implication and minimum t-norm, i.e., μ_{A→B}(x, y) = min[μ_A(x), μ_B(y)].

(a) First, we show that the sup-star composition in (1-74) can be expressed as

    μ_B*(y) = sup_{x∈X} min[μ_A*(x), μ_{A→B}(x, y)]
            = sup_{x∈X} min[μ_A*(x), min(μ_A(x), μ_B(y))]
            = sup_{x∈X} min[min(μ_A*(x), μ_A(x)), μ_B(y)]
            = min[sup_{x∈X} min(μ_A*(x), μ_A(x)), μ_B(y)]

where the last line follows because μ_B(y) does not depend on x.

(b) Next, we show that the supremum of min[μ_A*(x), μ_A(x)] is attained at

    x_max = (σ_A x̄ + σ_A* m_A)/(σ_A* + σ_A)

Derivation: [A figure here plots μ_A*(x), μ_A(x) and min[μ_A*(x), μ_A(x)]; it is clear from the figure that x_max occurs at the intersection of the two Gaussian membership functions.] We must take into account the fact that, at the point where the two exponential functions cross each other, one is increasing and the other is decreasing; hence, x_max is the solution to

    (x_max - x̄)/σ_A* = -(x_max - m_A)/σ_A
    ⇒ σ_A(x_max - x̄) + σ_A*(x_max - m_A) = 0
    ⇒ x_max = (σ_A x̄ + σ_A* m_A)/(σ_A* + σ_A)

(c) Finally, because the two MFs are equal at x_max,

    sup_{x∈X} min[μ_A*(x), μ_A(x)] = μ_A*(x_max) = μ_A(x_max)

and, since

    x_max - m_A = σ_A(x̄ - m_A)/(σ_A* + σ_A)

we obtain

    μ_A(x_max) = exp[-(1/2)((x̄ - m_A)/(σ_A* + σ_A))²]
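A similar numerical check works for Example 5-2's crossing-point formula for the minimum t-norm (parameter values are again arbitrary test choices):

```python
import math

# Numerical check of Example 5-2: for Gaussian MFs and minimum t-norm,
# sup_x min[mu_A*(x), mu_A(x)] occurs where the two Gaussians cross.
# All parameter values are arbitrary test choices.

m_A, sigma_A = 2.0, 1.0
x_bar, sigma_As = 4.0, 0.5

def g(x, m, s):
    return math.exp(-0.5 * ((x - m) / s) ** 2)

x_max = (sigma_A * x_bar + sigma_As * m_A) / (sigma_As + sigma_A)
peak = math.exp(-0.5 * ((x_bar - m_A) / (sigma_As + sigma_A)) ** 2)

# The two MFs are equal at the crossing point.
assert abs(g(x_max, x_bar, sigma_As) - g(x_max, m_A, sigma_A)) < 1e-12

# A brute-force supremum of the min over a fine grid agrees with the formula
# to grid resolution (the min has a kink, not a flat top, at x_max).
brute = max(min(g(x, x_bar, sigma_As), g(x, m_A, sigma_A))
            for x in (i / 1000.0 for i in range(-2000, 8001)))
assert abs(brute - peak) < 1e-3
```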
Key Points
For our work in rule-based FLSs, the following tautologies for an implication are most important because they let us establish MFs for the implication: (p → q) ⇔ ¬[p ∧ (¬q)] and (p → q) ⇔ (¬p) ∨ q.
Logic, set theory and Boolean algebra are mathematically equivalent; any statement that is true in one system becomes a true statement in the others simply by making some changes in notation.
Crisp rules are fired using inference mechanisms known as Modus Ponens and Modus Tollens; only Modus Ponens plays a role in a FLS.
The transition from crisp logic to FL is done by replacing crisp logic's MFs by fuzzy MFs, and Modus Ponens by Generalized Modus Ponens.
Generalized Modus Ponens is a fuzzy composition where the first fuzzy relation is a fuzzy
set.
The MF of a fired rule is given by the sup-star composition.
Singleton fuzzification simplifies the computation of the sup-star composition by
eliminating the need to perform the supremum operation.
When all MFs are Gaussian then it is possible to compute the sup-star composition
analytically for both product and minimum t-norms.
Pictorial descriptions of the sup-star composition provide insight into its operations, and
demonstrate a problem with using fuzzy versions of classical crisp implications, namely a
bias in the MF of a fired rule.
Mamdani implications (product and minimum) overcome the problem of a bias in the MF of a fired rule; but their MFs are a departure from those of classical crisp implications.
Practice Problems
Complete Exercises 1-23 (c) and 1-26.
Reading Assignment
Read pages 110-118 of the textbook.
Although we focus on the Mackey-Glass chaotic time-series in this course and in Chapters 5 and
6 of the textbook, it is by no means the only chaotic time series that has been used to demonstrate
the forecasting capabilities of a FLS, e.g., the Duffing equation is considered by Mendel and
Mouzouris in their 1997 paper.
Table 4-1 needs some additional explanation in relation to this course. Although it refers to six kinds of forecasters, in this course we will only cover three kinds: singleton type-1, non-singleton type-1, and TSK. The Mackey-Glass equation may be chaotic, but it is deterministic,
i.e., even though it is very sensitive to its initial conditions (a property of a chaotic system), once
they have been chosen, then each time we run a simulation of that equation we obtain exactly the
same results. A singleton type-1 forecaster is useful when no uncertainties are present, i.e., there
is no measurement noise so that the measurements that activate the forecaster are perfect, and,
training and testing data are noise-free. A non-singleton type-1 forecaster tries to handle the
situation when the data is corrupted by measurement noise, both during the design and operation
of the forecaster. It does so by modeling the measurements as type-1 fuzzy numbers.
Unfortunately, this leaves a lot to be desired; but, we cannot do better within the framework of a type-1 FLS. To do better we must use a type-2 FLS, as described in the next course, New Directions in Rule-Based Fuzzy Logic Systems: Handling Uncertainties. Finally, we use a totally different time series (a stream of compressed video) to illustrate the forecasting capabilities of a TSK forecaster. That series is random but has no measurement noise associated with it either
during its design or operation.
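The Mackey-Glass series used in the forecasting case study can be approximated by a simple Euler discretization, and slicing the data into (past samples → next sample) pairs is a one-liner; a sketch (the step size, delay, initial history, and window length are illustrative choices, not the textbook's):

```python
# A sketch of Mackey-Glass data generation (Euler discretization with unit
# step) and of windowing it into (past samples -> next sample) training
# pairs. Step size, delay, initial history and window length are
# illustrative choices, not the textbook's.

def mackey_glass(n, tau=30, x0=1.2):
    """dx/dt = 0.2*x(t - tau)/(1 + x(t - tau)**10) - 0.1*x(t), Euler step 1."""
    x = [x0] * (tau + 1)                       # constant initial history
    for _ in range(n):
        x_tau = x[-(tau + 1)]
        x.append(x[-1] + 0.2 * x_tau / (1.0 + x_tau**10) - 0.1 * x[-1])
    return x[tau + 1:]

def windows(series, p=4):
    """Each pair: (p past samples, the next sample to be forecast)."""
    return [(series[i:i + p], series[i + p]) for i in range(len(series) - p)]

series = mackey_glass(1000)
pairs = windows(series, p=4)
```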
Read pages 119-126 of the textbook.
Sometimes a FLA is comprised of FL sub-advisors. Here we describe three architectures for such
a FLA, assuming for illustrative purposes that there are three sub-advisors. The extension of
these results to more than three sub-advisors is straightforward.
There are many different ways to combine/use three FL sub-advisors. First, however, we explain
why one would construct sub-advisors. To ask people questions that use more than two
antecedents is very difficult, because people usually cannot correlate more than two things at a
time. So, if more than two indicators are present for a social or engineering judgment, we can
rank order them in importance (if this ordering is known ahead of timeor it may have to be
established) and then use one or two of the indicators at a time to create the sub-advisors, after
which results from the sub-advisors are combined to give the overall output of the FLA.
[Figure 1: Parallel architecture. The indicator subsets x1, x2 and x3 each feed their own sub-advisor (FLA 1, FLA 2, FLA 3); the sub-advisor outputs y1, y2 and y3 feed a Combiner whose output y is the decision.]
2. Parallel Architecture: Aggregate Decision Maker
In the figure for this architecture (Figure 2) we have again partitioned the indicators in x into
three subsets, each of which is the input to its own FLA. Again, I assume that the dimension of each of these subsets is one or two (if there are more than 6 indicators, then more sub-advisors
will be needed). Simple one- or two-antecedent questions can be created in order to construct the
sub-advisors.
[Figure 2: Parallel architecture with aggregate decision making. Each indicator subset (x1, x2, x3) feeds both a Consensus FLA and an Individual's FLA (pairs 1-3); each pair's outputs are compared to yield Action/Decision #1, #2 and #3, which feed a (dashed) Combiner that produces the overall Action/Decision.]
The architecture of the overall FLA is different than the architecture shown in Fig. 4-3. Now
actions or decisions are made at the output of each sub-advisor and it is those actions or
decisions that are passed on to the Combiner. The Combiner could use a majority-rules strategy,
or some other strategy.
I have shown the block for the Combiner dashed because instead of combining actions and
decisions it may be important to examine the actions/decisions at the output of each sub-advisor.
For social judgments, an individual could be sensitized at the sub-advisor level with the hope that
in so doing he or she would become sensitized at the aggregate level.
3. Hierarchical Architecture
In the figure for this architecture (Figure 3) we have again partitioned the indicators in x into
three subsets. FLA 1 has antecedents that depend on the indicators in x1. The output of that sub-advisor, y1(x1), acts as one of the indicators of FLA 2. The output of that sub-advisor, y2(x2, y1(x1)), then acts as one of the indicators of FLA 3. The output of FLA 3 is considered to be the overall output of the FLA, namely

    y(x) = y3[x3, y2] = y3[x3, y2(x2, y1(x1))]
The output of each sub-advisor can be the same social judgment, but conditioned on different
antecedents. The questions for FLA 1 are the standard ones. Those for FLAs 2 and 3 are not. For
example a question for FLA 2 would have to be structured like:
IF judgment y made on the basis of indicators x1 is ____
and indicator x21 is _______ and indicator x22 is _______
THEN judgment y is __________
[Figure 3: Hierarchical architecture. x1 feeds FLA 1; its output y1, together with x2, feeds FLA 2; the output y2, together with x3, feeds FLA 3, whose output y is the decision.]
We immediately see a potential problem for this architecture, namely if a sub-advisor indicator
vector has two elements, then the questions associated with that sub-advisor will have three
antecedents. Such three-antecedent questions are very difficult for people to answer. So, the
overall indicator vector must be partitioned more finely so that each sub-advisor has at most two antecedents. This can lead to an architecture that has a lot of sub-advisors.
For engineering judgments, when rules are extracted from data, it is possible to use the
hierarchical architecture without having to worry about the dimension of the antecedents, since
questions will not be asked of people.
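The hierarchical wiring y(x) = y3[x3, y2(x2, y1(x1))] can be sketched directly; here each sub-advisor is stubbed as a plain linear function purely to show the composition (a real sub-advisor would be a fuzzy logic system, and all stub formulas below are invented):

```python
# A minimal sketch of the hierarchical FLA wiring. Each sub-advisor is
# stubbed as a plain linear function purely to show the composition; in a
# real FLA each would be a fuzzy logic system. Stub formulas are invented.

def fla1(x1):
    return 0.5 * x1                 # y1(x1)

def fla2(x2, y1):
    return 0.5 * (x2 + y1)          # y2(x2, y1); y1 is one antecedent

def fla3(x3, y2):
    return 0.5 * (x3 + y2)          # y3(x3, y2)

def advisor(x1, x2, x3):
    """Overall output y(x) = y3[x3, y2(x2, y1(x1))]."""
    return fla3(x3, fla2(x2, fla1(x1)))
```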
Key Points
To design a FLS forecaster, data is partitioned into training and testing subsets. The number
of elements in each subset depends on the size of the window of data points that is used to
forecast the next data point.
The training data are used in a FLS forecaster to establish its rules.
One way to extract rules from numerical training data is: Let the data establish the fuzzy sets
that appear in the antecedents and consequents of the rules.
Another way to extract rules from numerical training data is: Pre-specify fuzzy sets for the
antecedents and consequents and then associate the data with those fuzzy sets.
A third way to extract rules from numerical training data is: Establish the architecture of a
FLS and use the data to optimize its parameters.
Chaotic behavior can be described as bounded fluctuations of the output of a non-linear system with a high degree of sensitivity to initial conditions.
The Mackey-Glass equation is a non-linear delay differential equation that is known to
exhibit chaos when its delay parameter is greater than 17.
Knowledge mining, as used in this course, means extracting information in the form of
IF-THEN rules from people.
Judgment means an assessment of the level of a variable of interest.
A six-step methodology for knowledge mining involves: identifying the behavior of interest;
determining the indicators of the behavior of interest; establishing scales for each indicator
and the behavior of interest; establishing names and interval information for each of the
indicators' fuzzy sets and the behavior of interest's fuzzy sets; establishing rules; and surveying
people (experts) to provide a consequent for each rule.
Rules that are extracted from people about a judgment can be modeled using a FLS called a
fuzzy logic advisor (FLA).
FLAs can be used in different ways for social judgments or engineering judgments; e.g., they
can be used to sensitize people about social judgments.
FLAs can be comprised of sub-advisors that can be organized in a variety of architectures;
this is useful so that people can be asked questions with at most one or two antecedents.
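The chaotic Mackey-Glass behavior summarized in these key points can be seen directly in simulation. Here is a minimal sketch (not from the textbook) that Euler-integrates the Mackey-Glass delay differential equation dx/dt = a x(t−τ)/(1 + x(t−τ)^n) − b x(t), using the commonly quoted parameter values a = 0.2, b = 0.1, n = 10; with the delay τ > 17 the output stays bounded yet two runs that differ only slightly in their initial condition drift apart, which is exactly the sensitivity to initial conditions described above. The step size, horizon, and perturbation size are illustrative choices.

```python
# Sketch: Euler simulation of the Mackey-Glass equation (a=0.2, b=0.1, n=10).
# tau > 17 produces chaos; two nearly identical initial conditions diverge.

def mackey_glass(x0, tau=30.0, dt=0.1, steps=20000, a=0.2, b=0.1, n=10):
    """Euler integration of dx/dt = a*x(t-tau)/(1+x(t-tau)**n) - b*x(t)."""
    delay = int(round(tau / dt))           # the delay expressed in time steps
    x = [x0] * (delay + 1)                 # constant history on [-tau, 0]
    for _ in range(steps):
        x_tau = x[-delay - 1]              # the delayed sample x(t - tau)
        x.append(x[-1] + dt * (a * x_tau / (1.0 + x_tau ** n) - b * x[-1]))
    return x

xa = mackey_glass(1.2)
xb = mackey_glass(1.2 + 1e-4)              # tiny perturbation of x(0)
gap = max(abs(u - v) for u, v in zip(xa, xb))
print(max(xa), min(xa), gap)               # bounded trajectories that drift apart
```

The forecasting experiments in the textbook use samples of exactly this kind of series as training and testing data.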
Practice Problem
Participate in the survey given in Table 4-2 (see, also, the discussion given on p. 77) by: (1)
providing your start and end points for the five range labels, and (2) re-computing the mean and
standard deviation values for the start and end points for each label using those shown in the
table (obtained from 47 students) and your new values.
Reading Assignment
Read pages 131–142 of the textbook.
Key Points
A singleton type-1 FLS consists of rules, fuzzifier, inference mechanism and defuzzifier.
A multiple-antecedent multiple-consequent rule can always be considered as a group of
multi-input single-output rules.
Many non-obvious rules can be cast into the form of a standard IF-THEN rule, so that a rule-based FLS is quite broad in its applicability.
The MF of a fired rule, μ_B(y), is given by μ_B(y) = sup_{x∈X} [μ_A(x) ★ μ_{A→G}(x, y)], ∀y ∈ Y.
Fired rules can be combined in different ways; there is no one best way to do this.
A singleton fuzzifier has a MF that is non-zero at only one point, x_i = x_i′.
For singleton fuzzification the supremum operation in the sup-star composition is very easy
to evaluate, because the MF of the input is non-zero only at one point, x_i = x_i′.
For singleton fuzzification, the MF of a fired rule, μ_{B^l}(y), is given by
μ_{B^l}(y) = μ_{G^l}(y) ★ μ_{F_1^l}(x_1′) ★ ⋯ ★ μ_{F_p^l}(x_p′), ∀y ∈ Y.
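The singleton-fuzzification key points above can be made concrete in a few lines of code. This is an illustrative sketch in my own notation (not code from the textbook): the sup over x collapses to an evaluation at x′, so the fired-rule output set is just the consequent MF combined, via the t-norm, with the firing level μ_{F_1^l}(x_1′) ★ ⋯ ★ μ_{F_p^l}(x_p′). Gaussian MFs and all parameter values are assumptions made for the example; replace the product by min for the minimum t-norm.

```python
# Sketch: firing level and fired-rule MF for singleton fuzzification,
# product t-norm, Gaussian antecedent/consequent MFs (illustrative values).
import math

def gauss(x, m, sigma):
    """Gaussian membership function centered at m with spread sigma."""
    return math.exp(-0.5 * ((x - m) / sigma) ** 2)

antecedents = [(0.0, 1.0), (2.0, 0.5)]     # (m, sigma) for F_1^l, F_2^l
x_prime = [0.5, 1.8]                       # crisp (singleton) input x'

# Firing level: t-norm (here, product) of antecedent MFs evaluated at x'.
firing = 1.0
for (m, s), xi in zip(antecedents, x_prime):
    firing *= gauss(xi, m, s)

def mu_Bl(y, m_G=1.0, s_G=0.4):
    """Fired-rule output set: consequent MF scaled by the firing level."""
    return firing * gauss(y, m_G, s_G)

print(firing, mu_Bl(1.0))
```

Note that at the consequent center y = 1.0 the fired-rule MF equals the firing level itself, since the consequent MF is 1 there.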
Practice Problem
Example 5-1 is one of the most important ones given in Chapter 5, because it provides a
geometric interpretation for the operations that occur within the inference engine. In this
exercise, I want you to provide the figures that are comparable to the ones given in Figures 5-4–5-6, but using triangular MFs. Do this for both the minimum and product t-norms.
Describe five popular methods for defuzzification: centroid, center-of-sums, height, modified
height, and center-of-sets.
Explain why there is no one singleton type-1 FLS, and demonstrate that the many choices
that need to be made to specify or design such a FLS lead to a rich variety of FLSs.
Demonstrate the input–output formula for a singleton type-1 FLS as a new kind of basis
function expansion: a fuzzy basis function (FBF) expansion.
Demonstrate that each rule, whether it derives from expert linguistic knowledge or is
extracted from numerical data, can be associated with one FBF.
Explain what a universal approximation theorem is, and describe a singleton type-1 FLS as a
universal approximator.
Demonstrate what is meant by rule explosion.
Reading Assignment
Read pages 142–148 of the textbook.
Section 5.5.2: Here we derive (5-16) by beginning with the additive combiner depicted in Figure
5-3, assuming product implication and product t-norm, and formally determining the centroid of

μ_B(y) = Σ_{l=1}^{M} w^l μ_{B^l}(y).
Derivation: From the last line of (5-10) we know that, for product implication and product
t-norm, the MF for the additive combiner can be expressed as:

μ_B(y) = Σ_{l=1}^{M} w^l μ_{B^l}(y) = Σ_{l=1}^{M} w^l μ_{G^l}(y) Π_{i=1}^{p} μ_{F_i^l}(x_i) = Σ_{l=1}^{M} w^l f^l μ_{G^l}(y)

where f^l = Π_{i=1}^{p} μ_{F_i^l}(x_i). Consequently,

Centroid of μ_B(y) = ∫_Y y μ_B(y) dy / ∫_Y μ_B(y) dy
  = Σ_{l=1}^{M} w^l f^l ∫_Y y μ_{G^l}(y) dy / Σ_{l=1}^{M} w^l f^l ∫_Y μ_{G^l}(y) dy

Let c_{G^l} and a_{G^l} denote the centroid and area, respectively, of μ_{G^l}(y), i.e.

c_{G^l} = ∫_Y y μ_{G^l}(y) dy / ∫_Y μ_{G^l}(y) dy   and   a_{G^l} = ∫_Y μ_{G^l}(y) dy

so that ∫_Y y μ_{G^l}(y) dy = c_{G^l} a_{G^l}. Hence:

Centroid of μ_B(y) = Σ_{l=1}^{M} w^l f^l c_{G^l} a_{G^l} / Σ_{l=1}^{M} w^l f^l a_{G^l}

which is (5-16), Kosko's standard additive model (SAM). When minimum implication is used instead,
μ_{B^l}(y) = min[μ_{G^l}(y), f^l], where f^l is defined above. Clearly, the previous derivation of
the Centroid of μ_B(y) depends on the separability of μ_{G^l}(y) and f^l in the equation for
μ_{B^l}(y), something that cannot be guaranteed when μ_{B^l}(y) = min[μ_{G^l}(y), f^l]; hence,
Kosko's SAM is of very limited value.
Note that the center-of-sums defuzzifier is still applicable in this case, because (5-14) is in terms
of the centroid and area of output fuzzy sets and not consequent fuzzy sets. These quantities can
be computed numerically from knowledge of μ_{B^l}(y), as calculated from (5-10).
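The numerical computation just mentioned is easy to sketch. The snippet below (my own illustrative example, not from the textbook) approximates the centroid and area of a sampled output set μ_{B^l}(y) by Riemann sums; the output set shown is a triangle clipped at a firing level of 0.5, the shape that minimum implication produces. Grid limits and MF parameters are assumptions for the example.

```python
# Sketch: numerical centroid and area of an output set mu_Bl(y), as needed
# by the center-of-sums defuzzifier when no closed form exists.
def centroid_and_area(mu, y_lo, y_hi, n=2001):
    """Riemann-sum estimates of int y*mu(y)dy / int mu(y)dy and int mu(y)dy."""
    dy = (y_hi - y_lo) / (n - 1)
    ys = [y_lo + i * dy for i in range(n)]
    num = sum(y * mu(y) for y in ys) * dy
    den = sum(mu(y) for y in ys) * dy
    return num / den, den

def tri(y, a=0.0, b=1.0, c=2.0):
    """Triangular consequent MF on [a, c] with apex at b."""
    if a < y <= b:
        return (y - a) / (b - a)
    if b < y < c:
        return (c - y) / (c - b)
    return 0.0

# Minimum implication clips the triangle at the firing level 0.5.
c, area = centroid_and_area(lambda y: min(tri(y), 0.5), -1.0, 3.0)
print(c, area)   # symmetric clipped triangle: centroid 1.0, area 0.75
```

For this symmetric set the numerical centroid lands at the apex location, and the area matches the trapezoid formula 0.5 · (2 + 1)/2 = 0.75, which is a handy sanity check for the routine.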
Read pages 149–157.
Key Points
Defuzzification produces a crisp output from the fuzzy sets that appear at the output of a
FLS's inference block.
There are many kinds of defuzzifiers.
The defuzzifiers that are based on some sort of center of gravity computation are: centroid,
center-of-sums, height, modified height, and center-of-sets.
Many choices need to be made in order to specify or design a type-1 FLS; they provide the
designer with many design degrees of freedom.
A FLS can be interpreted as a fuzzy basis function (FBF) expansion, which places a FLS into
the more global perspective of function approximation.
FBFs are not radial basis functions and they are not orthogonal basis functions.
Every rule in a FLS, whether it comes from linguistic knowledge or is extracted from data,
can be associated with a FBF.
A FLS is a universal approximator, i.e., it can uniformly approximate any real continuous
non-linear function to an arbitrary degree of accuracy.
Universal approximation is an existence theorem that helps to explain why a FLS is so
successful in engineering applications, but it does not tell us how to specify a FLS.
Rule explosion refers to rapid growth in the maximum number of rules that may be required
in a FLS; e.g., if there are p input variables, each of which is divided into r overlapping
regions, then a complete FLS must contain r^p rules.
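A quick numeric illustration of this key point: with each of p inputs partitioned into r = 5 overlapping regions, a complete rule base has r^p rules, which grows exponentially in the number of inputs (the values of p and r below are arbitrary choices for the example).

```python
# Sketch: rule explosion — a complete rule base has r**p rules.
r = 5
counts = {p: r ** p for p in (2, 4, 6)}
print(counts)   # {2: 25, 4: 625, 6: 15625}
```

Going from two to six inputs multiplies the maximum rule count by a factor of 625, which is why high-dimensional FLS designs rely on incomplete rule bases or hierarchical architectures.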
Practice Problems
Complete Exercises 5-4 and 5-6.
Reading Assignment
Read pages 157–166 of the textbook. Omit Sections 5.9.4 and 5.9.5.
The following material supplements Section 5.9.3.
I. Interpretation of a Type-1 FLS as a Three-Layered Architecture
A singleton (or non-singleton) type-1 FLS can be viewed as a three-layered architecture. This
was first discovered by my former Ph.D. student Li-Xin Wang, around 1990, as part of his Ph.D.
research. This architecture suggests the possibility of back-propagating errors from the output
of the FLS to earlier layers, in analogy with back-propagation in a feed-forward neural network
(NN) (see discussions about this on the top of p. 166 in the textbook). It is important to note,
though, that the three-layer architecture for the FLS is merely a re-interpretation of the FLS and
is not a physical architecture or implementation. This is different from the layered architecture of
a NN, where that architecture is usually viewed as a physical implementation of the network.
Starting with (5-24) and (5-25), we re-express y(x) as follows:

y(x) = f_s(x) = h / g

where

h = Σ_{l=1}^{M} y̅^l w^l   and   g = Σ_{l=1}^{M} w^l

in which

w^l = Π_{i=1}^{p} μ_{F_i^l}(x_i),   l = 1, ..., M

These equations lead to the following three-layered architecture for this FLS:

[Figure 1: the three-layered architecture. Layer 1 computes the firing levels w^1, ..., w^M from the input x = col(x_1, ..., x_p) via the antecedent MFs μ_{F_i^l}(x_i); Layer 2 computes the sums h and g; Layer 3 computes the output y = f_s(x) = h/g.]
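The three-layered computation just described can be transcribed almost line for line into code. The sketch below assumes Gaussian antecedent MFs and illustrative rule parameters (neither comes from the textbook): Layer 1 forms the firing levels w^l, Layer 2 forms h = Σ y̅^l w^l and g = Σ w^l, and Layer 3 outputs f_s(x) = h/g.

```python
# Sketch: the three-layered interpretation of a singleton type-1 FLS,
# with Gaussian antecedent MFs and product t-norm (illustrative parameters).
import math

def fls_output(x, rules):
    """rules: list of (means, sigmas, ybar), one (m, sigma) pair per input."""
    # Layer 1: firing level w^l of each rule (product t-norm over antecedents).
    w = []
    for means, sigmas, _ in rules:
        wl = 1.0
        for xi, m, s in zip(x, means, sigmas):
            wl *= math.exp(-0.5 * ((xi - m) / s) ** 2)
        w.append(wl)
    # Layer 2: the two sums h and g.
    h = sum(wl * ybar for wl, (_, _, ybar) in zip(w, rules))
    g = sum(w)
    # Layer 3: normalization, i.e. the output f_s(x) = h / g.
    return h / g

rules = [([0.0, 0.0], [1.0, 1.0], -1.0),
         ([2.0, 2.0], [1.0, 1.0], +3.0)]
print(fls_output([0.0, 0.0], rules))   # dominated by rule 1, so near -1
print(fls_output([2.0, 2.0], rules))   # dominated by rule 2, so near +3
```

Because of the Layer 3 normalization, the output always lies between the smallest and largest consequent centers y̅^l, whichever rule dominates.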
II. A Very Short Primer on Optimizing a Function Using an Algorithm That Makes Use of
First Derivative Information
There are many ways to optimize (i.e., minimize or maximize) a function. Here I will briefly
describe a very popular way that uses not only the value of the function being optimized but also
its first derivative. Methods that use this information are called steepest descent algorithms.
In order to keep the initial discussion as simple as possible, I shall assume that the function being
minimized depends only on a single parameter, θ. That function (called an objective function) is
denoted J(θ), and an example of it is depicted in Figure 2. Observe that there are various kinds
of extrema that can occur: relative maxima, relative minima, global maximum, global
minimum, and even inflection points. When our goal is to minimize J(θ), then we want to
determine the value of θ labeled in Figure 2 as θ*. One of the great challenges to doing this is
not to get trapped at a local extremum, e.g., at θ_1* or θ_2*. The importance of a good starting value
for θ cannot be over-stated. If, for example, our initial choice is at θ_0, then it is very likely that
an optimization algorithm that is based on derivative information will cause θ to lock on
(converge) to θ_1* or θ_2*. On the other hand, if the initial choice is at θ_0′, then it is very likely that
an optimization algorithm that is based on derivative information will cause θ to converge to the
global minimum at θ*.
One approach to trying to achieve the global minimum is to randomly choose θ_0, solve for the
associated minimum of J(θ), say J(θ̂), and to repeat this procedure for a collection of such θ_0
values. One then chooses θ* as that value of θ associated with the smallest value of J(θ̂). In
many practical optimization problems, it may not be essential to compute the overall global
minimum of J(θ). A value of θ that leads to a small enough value of J(θ) may suffice.
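The multi-start strategy just described is easy to demonstrate on a toy one-dimensional problem. In the sketch below (my own example; the objective, step size, and iteration counts are all illustrative), a first-derivative steepest descent iteration is run from several random starting points θ_0, and the θ with the smallest J is kept; the quartic objective has two minima, so single starts can get trapped in the shallower one.

```python
# Sketch: multi-start steepest descent on a 1-D objective with two minima.
import random

def J(t):
    return (t * t - 1.0) ** 2 + 0.3 * t      # two minima; the global one near t = -1

def dJ(t):
    return 4.0 * t * (t * t - 1.0) + 0.3     # analytical first derivative

def descend(t, alpha=0.01, iters=2000):
    """Plain steepest descent: t <- t - alpha * dJ(t)."""
    for _ in range(iters):
        t = t - alpha * dJ(t)
    return t

random.seed(0)
starts = [random.uniform(-2.0, 2.0) for _ in range(10)]
candidates = [descend(t0) for t0 in starts]
best = min(candidates, key=J)                # keep the smallest-J candidate
print(best, J(best))                         # global minimum lies near t = -1
```

Starts landing in the right-hand basin converge to the shallower local minimum near t = +1; the final min-over-candidates step is what recovers the global one.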
[Figure 2: the objective function J(θ) versus θ, showing relative extrema θ_1* and θ_2*, the global minimum J(θ*) at θ = θ*, and candidate starting points θ_0 and θ_0′.]

θ* = arg min_θ J(θ)   (1)
Note that in the textbook I refer to all of the training data as "the data," which is then partitioned into
a training data subset and a testing data subset. The idea is to use the training data subset to
minimize J(θ) the best you can, but to then evaluate how well you do this by using the testing
data subset. There will be a trade-off between over-fitting using the training data subset and
generalization using the testing data subset. Usually, over-fitting leads to poor generalization
performance.
My goal in the next few paragraphs is to give you a fairly high-level explanation of the
construction of a steepest descent algorithm for minimizing objective function J(θ), where in the
discussions below θ now is a vector of design parameters. In order to emphasize the role of the
data during the optimization process, as used by the optimization algorithm, I shall denote J(θ)
as J = J(D, θ). D_TRAIN is used by the steepest descent algorithm because that algorithm is based
on minimizing J_TRAIN = J(D_TRAIN, θ). D_TEST is used to evaluate the overall optimization results by
computing J_TEST = J(D_TEST, θ) and establishing an overall stopping rule, of the form:
|J(D_TEST, θ_{i+1}) − J(D_TEST, θ_i)| ≤ ε   (2)
where ε is pre-specified. This is only one example of a stopping rule, but it is one that is
frequently used in practice. Another practical stopping rule is to choose a pre-specified
maximum number of iterations, and to stop the iterative minimization when that number is
reached. This stopping rule is not as effective as the first one because J_TEST = J(D_TEST, θ) could
still be changing by a large amount after the pre-specified number of iterations has been reached.
The general structure of a steepest descent algorithm is:

θ_{i+1} = θ_i − α g_{i+1}(D_TRAIN, θ_i),   i = 0, 1, ...   (3)

g_{i+1}(D_TRAIN, θ_i) = [derivatives of J(D_TRAIN, θ) with respect to the elements of θ]|_{θ = θ_i},   i = 0, 1, ...   (4)
The vertical-bar notation means that after we determine the derivatives of J(D_TRAIN, θ)
analytically, some or all of them will still be explicit functions of the unknown θ, and those θ
values are then replaced by the best values we have for them, namely θ_i.
In our tuning procedure we use a squared-error function [see (5-47) in the textbook], i.e.

J(D_TRAIN, θ) = e(D_TRAIN, θ)   (5)

where

e(D_TRAIN, θ) = (1/2) [y(D_TRAIN, θ) − y^(j)(D_TRAIN)]²   (6)

and

y(D_TRAIN, θ) = f_s(D_TRAIN, θ)   (7)
In (7), f_s(D_TRAIN, θ) is the output of a singleton type-1 FLS. Its exact structure depends on the
many choices that have to be made by the designer of a FLS. One example of f_s(D_TRAIN, θ) is
given in (5-46) of the textbook.
It is easy to compute the derivatives of J(D_TRAIN, θ), which are needed in (4), using (5)–(7), i.e.

∂J(D_TRAIN, θ)/∂θ = ∂e(D_TRAIN, θ)/∂θ
  = [y(D_TRAIN, θ) − y^(j)(D_TRAIN)] ∂y(D_TRAIN, θ)/∂θ
  = [y(D_TRAIN, θ) − y^(j)(D_TRAIN)] ∂f_s(D_TRAIN, θ)/∂θ   (8)
In order to proceed further, the specific FLS choices mentioned above must be made. Those
choices will let us determine analytical formulas for ∂f_s(D_TRAIN, θ)/∂θ. We complete these
calculations for a specific set of choices below in Section III.
This completes the high-level overview on optimizing a function using a steepest descent
algorithm. Lots of good software already exists for doing this (e.g., The MathWorks
Optimization Toolbox), software that has been written by experts who have included lots of the
bells and whistles that let a steepest descent algorithm work well. We return to software for
doing this in Lesson 14.
III. Completing the Steepest Descent Calculations for a Specific FLS

For a single training pair (x^(i), y^(i)), the squared-error objective function is

J(θ) = (1/2) [f_s(x^(i)) − y^(i)]²

so that, for any design parameter θ,

∂J(θ)/∂θ = [f_s(x^(i)) − y^(i)] ∂f_s(x^(i))/∂θ

where, for Gaussian antecedent MFs and height defuzzification,

f_s(x^(i)) = Σ_{l=1}^{M} y̅^l φ_l(x^(i))

and

φ_l(x^(i)) = Π_{k=1}^{p} exp[−(x_k^(i) − m_{F_k^l})² / 2σ²_{F_k^l}] / Σ_{l=1}^{M} Π_{k=1}^{p} exp[−(x_k^(i) − m_{F_k^l})² / 2σ²_{F_k^l}]   (9)

(a) θ = y̅^l: In this case,

∂f_s(x^(i))/∂y̅^l = φ_l(x^(i))

so that

y̅^l(i+1) = y̅^l(i) − α [f_s(x^(i)) − y^(i)] φ_l(x^(i))

which is (5-49).

(b) θ = m_{F_k^l}: In this case, it is helpful to use the layered architecture interpretation for (5-24) and
(5-25), i.e. f_s = h/g, where

h = Σ_{l=1}^{M} y̅^l w^l,   g = Σ_{l=1}^{M} w^l

and

w^l = Π_{k=1}^{p} exp[−(x_k^(i) − m_{F_k^l})² / 2σ²_{F_k^l}]

By the chain rule,

∂f_s/∂m_{F_k^l} = (∂f_s/∂w^l)(∂w^l/∂m_{F_k^l})

where

∂f_s/∂w^l = (g y̅^l − h)/g² = (y̅^l − f_s)/g

and

∂w^l/∂m_{F_k^l} = w^l (x_k^(i) − m_{F_k^l}) / σ²_{F_k^l}

Hence, since φ_l(x^(i)) = w^l/g,

∂f_s/∂m_{F_k^l} = [y̅^l − f_s(x^(i))] φ_l(x^(i)) (x_k^(i) − m_{F_k^l}) / σ²_{F_k^l}

Substituting this last equation into the general derivative formula, we reach the steepest descent algorithm
for updating m_{F_k^l} that is given in (5-48):

m_{F_k^l}(i+1) = m_{F_k^l}(i) − α [f_s(x^(i)) − y^(i)] [y̅^l(i) − f_s(x^(i))] φ_l(x^(i)) (x_k^(i) − m_{F_k^l}(i)) / σ²_{F_k^l}(i)

(c) θ = σ_{F_k^l}: The derivation of (5-50) is just like the derivation of (5-48). The key steps are
summarized in the layered architectural equations given above for f_s, h, g and w^l. We then
compute

∂f_s/∂σ_{F_k^l} = (∂f_s/∂w^l)(∂w^l/∂σ_{F_k^l})

where ∂f_s/∂w^l has been computed above, so we only need the new computation of ∂w^l/∂σ_{F_k^l}.
Because this last computation is just like the one we just carried out for ∂w^l/∂m_{F_k^l}, we leave its
completion to you.
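The three update equations just derived, (5-49) for the y̅^l, (5-48) for the antecedent means, and its analogue (5-50) for the antecedent spreads, can be collected into a short runnable sketch. Everything below is illustrative (the training pairs, initial parameters, step size α, and epoch count are my own choices, not the textbook's); the point is only that the updates implement the chain-rule derivatives above and drive the training error down.

```python
# Sketch: steepest descent tuning of a singleton type-1 FLS with Gaussian
# antecedent MFs and height defuzzification, per (5-48)-(5-50).
import math

M, p, alpha = 2, 1, 0.1
m = [[0.0], [1.0]]          # m_{F_k^l}
sig = [[0.7], [0.7]]        # sigma_{F_k^l}
ybar = [0.0, 0.0]           # ybar^l
data = [([0.1], 0.2), ([0.9], 0.8), ([0.5], 0.5)]   # (x^(i), y^(i)) pairs

def forward(x):
    """Compute f_s(x) and the FBFs phi_l(x) of (9)."""
    w = [math.prod(math.exp(-0.5 * ((x[k] - m[l][k]) / sig[l][k]) ** 2)
                   for k in range(p)) for l in range(M)]
    g = sum(w)
    phi = [wl / g for wl in w]
    return sum(ybar[l] * phi[l] for l in range(M)), phi

def sse():
    return sum(0.5 * (forward(x)[0] - y) ** 2 for x, y in data)

J0 = sse()
for _ in range(200):                      # epochs of steepest descent
    for x, y in data:
        fs, phi = forward(x)
        err = fs - y
        for l in range(M):
            for k in range(p):
                common = err * (ybar[l] - fs) * phi[l]
                # (5-48): mean update, and its sigma analogue (5-50):
                m[l][k] += -alpha * common * (x[k] - m[l][k]) / sig[l][k] ** 2
                sig[l][k] += -alpha * common * (x[k] - m[l][k]) ** 2 / sig[l][k] ** 3
            ybar[l] += -alpha * err * phi[l]          # (5-49)
J1 = sse()
print(J0, J1)    # the training error decreases after tuning
```

With only three training pairs and six free parameters, this tiny example fits easily; in a real design, the constraint between the number of training pairs, rules, and antecedents noted in the Key Points below must be respected.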
Key Points
Each training datum can be interpreted as an IF-THEN rule of the form IF x_1 is F_1^l and
⋯ and x_p is F_p^l, THEN y is G^l, where the F_i^l are fuzzy sets described by Gaussian (other shapes
can be used) MFs. A particular design method establishes how the MF parameters are
specified.
It is good design practice to have fewer FLS design parameters than training pairs; hence, a
constraint always exists among the number of training samples, number of rules, and number
of antecedents.
Three high-level designs can be associated with a singleton type-1 FLS, ranging from one in
which the data establishes the rules and no tuning is used, to two others in which the training
data is used to tune some or all of the antecedent and consequent MF parameters.
The layered architecture for a type-1 FLS suggests that errors will be back-propagated
during a steepest descent parameter tuning procedure, just as they are during the steepest-descent design of a feed-forward neural network.
The two one-pass design methods let the data establish either the parameters of the MFs or
the entire rule. Their major drawback is that they lead to a FLS that has too many rules.
When all of the antecedent parameters are pre-specified, the method of least-squares can be
used to design the consequent parameters; doing this leads to a linear system of equations
that has to be solved for the consequent parameters. Knowing how to choose the antecedent
parameters ahead of time is a major drawback to using this design method.
When none of the antecedent or consequent parameters are pre-specified, they can all be
tuned using the method of steepest descent.
Calculating the derivative of the objective function, which is required to derive a steepest
descent algorithm, requires a careful use of the chain rule; this can be expedited by making
use of the three-layer architectural interpretation of a type-1 FLS.
Practice Problem
Complete Exercise 5-10.
Reading Assignment
Read pages 169–183 of the textbook.
Key Points
Practice Problems
Complete Exercises 5-14 and 5-15.
Explain why the architecture of a non-singleton type-1 FLS is the same as for a singleton
type-1 FLS.
Describe what is meant by non-singleton fuzzification.
Demonstrate the calculation of the sup-star composition for the case of non-singleton fuzzification and explain why it is more difficult than in the singleton case.
Explain how a non-singleton FLS can be interpreted as a prefiltering operation on the
measurements followed by the inference mechanism.
Demonstrate pictorial descriptions of the firing of rules and the combining of multiple-fired
rules.
Explain that what is new for a non-singleton type-1 FLS is the need for the designer to
choose MFs for the input measurements, something that wasn't necessary for a singleton
type-1 FLS.
Demonstrate the input–output formula for a non-singleton type-1 FLS as a fuzzy basis
function (FBF) expansion and explain the differences between this FBF expansion and the
one for singleton type-1 FLSs.
Explain how training data can be interpreted as a collection of IF-THEN rules and describe
what the difference is between these IF-THEN rules and the ones for a singleton type-1 FLS.
Enumerate how many design parameters there can be in a specific design and describe the
relation of that number to the number of possible rules in the non-singleton type-1 FLS, and
how these numbers compare with those for a singleton type-1 FLS.
Describe four high-level designs that can be associated with a non-singleton type-1 FLSs.
Describe two high-level approaches to the tuning of a non-singleton FLS.
Demonstrate that the design methods learned for singleton type-1 FLSs are easily modified
for non-singleton type-1 FLSs.
Demonstrate how one-pass and back-propagation design methods can be applied to
forecasting the Mackey-Glass time-series.
Explain that if we only have access to noisy measurements, then a non-singleton type-1 FLS outperforms a singleton type-1 FLS, but that there is room for
further improvements.
Reading Assignment
Read pages 186–192 of the textbook.
Before you read Example 6-2, review Examples 5-1 and 5-2 in Lesson 5.
Read pages 193–209 of the textbook. Omit Sections 6.6.4 and 6.6.5.
Key Points
A non-singleton type-1 FLS consists of a fuzzifier, inference mechanism and defuzzifier; its
rules are the same as those for a singleton type-1 FLS; it differs from a singleton type-1 FLS
in the nature of the fuzzifier.
A non-singleton fuzzifier treats each input as a fuzzy number, i.e. it assigns a MF to each
input that has a value equal to one at the measured value of the input and decreases to zero as
the input variable gets farther away from the measured input value.
As in a singleton type-1 FLS, the MF of a fired rule, μ_B(y), is given by μ_B(y) =
sup_{x∈X} [μ_A(x) ★ μ_{A→G}(x, y)], ∀y ∈ Y; but, for a non-singleton type-1 FLS, the sup operation
does not disappear, because μ_A(x) has non-zero values over a range of values for each x_i.
Except for some simple, but important choices for the MFs (e.g., Gaussian MFs) it is not
possible to evaluate this sup-star composition analytically.
A non-singleton FLS first pre-filters its input x, transforming it into x^l_max. Doing this
accounts for the effects of the input measurement uncertainty, and is a direct result of the
sup-star composition.
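For Gaussian MFs this pre-filtering can be written down explicitly: with a product t-norm, the sup in the sup-star composition over μ_X(x) μ_F(x) is attained at a point that blends the measurement x′ with the antecedent mean, x_max = (σ_X² m_F + σ_F² x′)/(σ_X² + σ_F²), which is the standard closed form for the all-Gaussian case. The sketch below (all parameter values are illustrative) verifies this maximizer against a brute-force numerical search.

```python
# Sketch: the sup-star maximizer for Gaussian input and antecedent MFs
# (product t-norm), checked against brute-force search on a grid.
import math

def gauss(x, m, s):
    return math.exp(-0.5 * ((x - m) / s) ** 2)

x_meas, s_X = 1.0, 0.3        # measured input x' and its uncertainty spread
m_F, s_F = 2.0, 0.5           # antecedent Gaussian MF

# Closed-form maximizer of gauss(x, x_meas, s_X) * gauss(x, m_F, s_F).
x_max = (s_X ** 2 * m_F + s_F ** 2 * x_meas) / (s_X ** 2 + s_F ** 2)

# Numerical check of the supremum.
grid = [x / 1000.0 for x in range(-2000, 5000)]
x_num = max(grid, key=lambda x: gauss(x, x_meas, s_X) * gauss(x, m_F, s_F))
print(x_max, x_num)           # the two maximizers agree
```

Note that x_max lies strictly between the measurement and the antecedent mean: the more uncertain the measurement (larger σ_X), the further the effective input is pulled toward m_F.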
Only the pictorial description for the input and antecedent operations of a non-singleton type-1 FLS differs from the one for a singleton FLS. The other pictorial descriptions remain the
same.
The only difference between a type-1 non-singleton and a singleton FLS is the numerical
value of the firing level; for the former, this value includes the effects of input uncertainties,
whereas for the latter it does not.
The same choices must be made to specify or design a non-singleton type-1 FLS as had to be
made for a singleton type-1 FLS. In addition, the designer must specify the MFs for the input
measurements, which provides new design degrees of freedom to the non-singleton FLS.
A non-singleton FLS can also be interpreted as a FBF expansion. Input uncertainty may
activate more of these FBFs, which means that decisions are more distributed in the non-singleton case than in the singleton case.
Training data establish exactly the same sort of rules as they did in the singleton FLS case,
and a particular design method establishes how the MF parameters are specified, including
those for the input MFs.
The constraint that exists among the number of training samples, number of rules, and
number of antecedents is slightly different for a non-singleton type-1 FLS than it is for a
singleton type-1 FLS, because of the addition of the input MF parameters.
Four high-level designs can be associated with a non-singleton type-1 FLS, ranging from one
in which the data establishes the rules and no tuning is used, to three in which the training
data is used to tune some or all of the antecedent, consequent, and input measurement MF
parameters.
One approach to designing a non-singleton type-1 FLS, the partially dependent approach,
is to first design the best possible singleton FLS, freeze the common parameters, and only
optimize the parameters that are new to the non-singleton type-1 FLS. A second
approach, the totally independent approach, is to design the best possible non-singleton
type-1 FLS regardless of any pre-existing singleton FLS design.
The one-pass and least-squares design methods developed for a singleton type-1 FLS are
essentially the same for a non-singleton type-1 FLS.
The steepest-descent algorithms are different for a non-singleton type-1 FLS because of the
pre-filtering operation performed by the sup-star composition.
A non-singleton type-1 FLS forecaster is less sensitive to noisy measurements than a
singleton type-1 FLS forecaster, but the improvement is modest.
When the training data are noisy there is no way to account for this in the antecedent and
consequent MFs of a type-1 FLS. This represents a limitation of a type-1 FLS.
Practice Problems
Exercise 111
Example 6-1 (just as Example 5-1) is one of the most important ones given in Chapter 6, because
it provides a geometric interpretation for the operations that occur within the inference engine. In
this exercise, I want you to provide the figures that are comparable to the ones given in Figures
6-3, 5-5 and 5-6, but using triangular MFs. Do this for both the minimum and product t-norms.
Complete Exercises 6-5 and 6-7.
Reading Assignment
Read pages 421–428 of the textbook. Omit Section 13.3.
Section 13.4 explains how to design both type-1 and type-2 TSK and Mamdani FLSs for the
problem of forecasting compressed video traffic. Because the textbook interweaves material
about both type-1 and type-2 designs, here we will filter out all of the type-2 design materials
(leaving them for the follow-on course New Directions in Rule-Based Fuzzy Logic Systems:
Handling Uncertainties), i.e. we will guide you through Section 13.4.
Start by reading Section 13.4.1, including Example 13-5, pp. 442–444, but omit the two
paragraphs directly after Example 13-5. Read the last paragraph of Section 13.4.1.
Next, we have extracted materials from Sections 13.4.2–13.5 that focus on the type-1 designs.
Section 13.4.2 Forecasting I frame sizes: General Information
In the rest of this section we focus on the problem of forecasting I frame sizes (i.e., the number
of bits/frame) for a specific video product, namely Jurassic Park. All of our methodologies for
doing this apply as well to forecasting P and B frame sizes and can also be applied to other video
products.
Here we examine two designs of FLS forecasters based on the logarithm of the first 1000 I frame
sizes of Jurassic Park, s(1), s(2), ..., s(1000) (see Figure 13-1). Those designs are a type-1 TSK
FLS and a singleton type-1 Mamdani FLS. We used the first 504 data [s(1), s(2), ..., s(504)] for
tuning the parameters of these forecasters, and the remaining 496 data [s(505), s(506), ...,
s(1000)] for testing after tuning.
Type-1 TSK FLS: The rules of this FLS forecaster are (i = 1, ..., M)

R^i: IF s(k−3) is F_1^i and s(k−2) is F_2^i and s(k−1) is F_3^i and s(k) is F_4^i,
THEN s^i(k+1) = c_0^i + c_1^i s(k−3) + c_2^i s(k−2) + c_3^i s(k−1) + c_4^i s(k)   (13-53)

Singleton type-1 Mamdani FLS: The rules of this FLS forecaster have the same antecedents as in
(13-53), but each consequent is a fuzzy set, i.e. THEN s(k+1) is G^i   (13-57)
We used height defuzzification. As we did for the type-1 TSK FLS, we initially chose the F_j^i to be
the same for all i (rules) and j (antecedents), and used a Gaussian membership function for them,
one whose initial mean and standard deviation were chosen from the first 500 I frames, as
described earlier, as m = 4.7274 and σ = 0.0954. According to Table 13-1, the number of
design parameters for this singleton type-1 Mamdani FLS is (2p + 1)M = 9M.
13.4.3 Forecasting I frame sizes: Using the same number of rules
In this first approach to designing the two FLS forecasters, we fixed the number of rules at five
in both of them; i.e., M = 5. Doing this means that the type-1 TSK FLS is described by 65 design
parameters and the singleton type-1 Mamdani FLS is described by 45 design parameters.
Steepest descent algorithms (as described in Section 5.9.3 for the Mamdani FLS and in Section
13.2.4 for the TSK FLS) were used to tune all of these parameters. In these algorithms, we used
step sizes of α = 0.001 and α = 0.01 for the TSK and Mamdani FLSs, respectively.
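The parameter counts quoted above are worth checking. Assuming Gaussian antecedent MFs (2 parameters each), a p-antecedent TSK rule carries 2p antecedent parameters plus p + 1 consequent coefficients, while a Mamdani rule with height defuzzification carries 2p antecedent parameters plus one consequent center:

```python
# Sketch: design-parameter counts for the two five-rule forecasters (p = 4).
p, M = 4, 5
tsk = M * (2 * p + (p + 1))      # 5 * 13 = 65 parameters
mamdani = M * (2 * p + 1)        # 5 * 9  = 45 parameters
print(tsk, mamdani)
```

This bookkeeping also explains the choice made in Section 13.4.4 below: with 9 parameters per Mamdani rule, seven rules give 63 parameters, which approximately equals the 65 of the five-rule TSK FLS.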
We have already explained how we chose initial values for the membership function parameters.
All of the remaining parameters were initialized randomly, as follows:
Consequent parameters, c ij (i = 1,...,5; j = 0,1,...,4) , of the TSK FLS were each chosen
randomly in [0, 0.2] with uniform distribution.
Consequent parameters, y i (i = 1,...,5), of the Mamdani FLS were chosen randomly in
[0, 5] with uniform distribution.
Because we chose the initial values of the consequent parameters randomly, we ran 50 Monte-Carlo realizations for each of the two designs.¹⁰ For each realization, each of the two FLSs was
tuned for 10 epochs on the 504 training data. All designs were then evaluated on the remaining
496 testing data using the following RMSE:
RMSE = sqrt{ (1/496) Σ_{k=504}^{999} [s(k+1) − f_FLS(s^(k))]² }   (13-59)

where s^(k) = [s(k−3), s(k−2), s(k−1), s(k)]^T. The average value and standard deviations of these
RMSEs are plotted in Figure 13-2 for each of the 10 epochs. Observe, from Figure 13-2(a), that
(pay attention only to the curves for the two type-1 designs):
1. After 10 epochs of tuning, the average RMSE of the 2 FLS forecasters is:
2. In terms of average RMSE and standard deviation of the RMSE, the type-1
TSK FLS outperforms the singleton type-1 Mamdani FLS for epochs 2–10.
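The testing RMSE in (13-59) transcribes directly into code. In the sketch below the indexing (k = 504, ..., 999, a four-sample window) follows (13-59) exactly, but the actual Jurassic Park data and the tuned forecaster are not reproduced here, so a synthetic series and a naive stand-in forecaster are used for illustration.

```python
# Sketch: the testing RMSE of (13-59), with a stand-in forecaster.
import math

def rmse(s, f_fls):
    """(13-59): RMSE over the 496 testing points k = 504, ..., 999."""
    total = sum((s[k + 1] - f_fls([s[k - 3], s[k - 2], s[k - 1], s[k]])) ** 2
                for k in range(504, 1000))
    return math.sqrt(total / 496.0)

# Synthetic log-frame-size series and a naive "persistence" forecaster.
s = [4.7 + 0.1 * math.sin(0.3 * k) for k in range(1001)]
naive = lambda window: window[-1]        # predict s(k+1) = s(k)
print(rmse(s, naive))
```

Any tuned FLS forecaster can be dropped in for `naive` (it just needs to map a four-sample window to a prediction), so the same routine serves for comparing the TSK and Mamdani designs.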
13.4.4 Forecasting I frame sizes: Using the same number of design parameters
Because a five-rule TSK FLS always has more parameters (design degrees of freedom) to tune
than does a comparable five-rule Mamdani FLS, we modified the previous approach to designing
the two FLSs. We did this by fixing the rules used by the TSK FLS at five and by then choosing
the number of rules used by the Mamdani FLS so that its total number of design parameters
approximately equals the number for the TSK FLS. Doing this led us to use seven rules for the
Mamdani FLS. The designs of the resulting two FLSs proceeded exactly as described in the
preceding section. All designs were again evaluated using the RMSE in (13-59). The average
value and standard deviations of these RMSEs are plotted in Figure 13-3 for each of the 10
epochs (again, only pay attention to the curves for the two type-1 designs). Observe that:
The results are similar to the ones depicted in Figure 13-2; so, at least for this
example, equalizing the numbers of design parameters in the Mamdani and
TSK FLSs does not seem to be so important.
13.4.5/13.5 Conclusion
It is not our intention in this example to recommend one FLS architecture over another. Some
people prefer a TSK FLS over a Mamdani FLS or vice-versa. We leave that choice to the
designer who, as always, must be guided by a specific application. When both kinds of FLSs are
applicable, as in the case of forecasting a random-signal and perfect-measurement time-series,
the designer can carry out a comparative performance analysis between the two architectures, as
we have just done.
10
In Chapters 5 and 6 Monte-Carlo simulations were run to average out the effects of additive measurement noise. Here they are run to average
out the effects of random initial consequent parameter values.
Key Points
TSK is short for Takagi, Sugeno and Kang, the originators of the TSK FLS.
To date, only a singleton type-1 TSK FLS has been described in the literature.
The most widely used type-1 TSK FLS uses first-order rules, i.e., rules whose antecedents
are type-1 fuzzy sets, and whose consequent is a linear combination of the measured
antecedents. The fact that its consequent is a function and not a fuzzy set is the biggest
difference between a TSK FLS and a Mamdani FLS.
The output formula for a type-1 TSK FLS is obtained by combining its rules in a prescribed
way; it does not derive from the sup-star composition, as does the output of a type-1
Mamdani FLS. This is another big difference between a TSK FLS and a Mamdani FLS.
Normalized and unnormalized type-1 TSK FLSs have been defined.
When the consequent function in a TSK rule is a constant, then the normalized type-1 TSK
FLS is exactly the same as a type-1 Mamdani FLS that uses either center-of-sums, height,
modified height, or center-of-sets defuzzification.
TSK FLSs are also universal approximators.
Just as in a type-1 Mamdani FLS, a constraint always exists among the number of training
samples, number of rules and number of antecedents in a type-1 TSK FLS. Because the
consequent of a TSK rule contains more design parameters than does the consequent of a
Mamdani rule, a TSK FLS that uses the same number of rules as a Mamdani FLS always has
more design degrees of freedom than a Mamdani FLS.
Two high-level designs can be associated with a singleton TSK FLS. In one design, the
shapes and parameters of all the antecedent MFs are fixed ahead of time and the training data
is used to tune only the consequent parameters. In the other design, the training data is used
to tune all of the MF and consequent parameters.
When all of the antecedent parameters are pre-specified, the method of least-squares can be
used to design the consequent parameters; doing this leads to a linear system of equations
that has to be solved for the consequent parameters. Knowing how to choose the antecedent
parameters ahead of time is a major drawback to using this method.
When none of the antecedent or consequent parameters are pre-specified, they can all be
tuned using the method of steepest descent.
It is possible to interweave the steepest-descent and least squares design methods to obtain a
more powerful iterative design method. (This can also be done for the design of a Mamdani
FLS.)
Forecasting compressed video means predicting a future value of either an I, P or B frame,
directly in the compressed video domain, using a window of previously measured I, P, or B
frame values.
Forecasting of compressed video can be accomplished using either singleton type-1 TSK or
Mamdani FLSs. Somewhat better performance is achieved for the TSK forecaster.
Some people prefer a TSK FLS over a Mamdani FLS or vice-versa. The final choice is left to
the designer who, as always, must be guided by a specific application.
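The least-squares consequent design mentioned in the key points above is worth sketching: with the antecedent MFs fixed, f_s(x) = Σ_l y̅^l φ_l(x) is linear in the consequents y̅^l, so they solve a linear system. The example below (my own illustrative two-rule, one-input case, with hand-written 2×2 normal equations; all numbers are assumptions) fits approximately linear data.

```python
# Sketch: least-squares design of the consequents ybar^l with the
# antecedent MFs (and hence the FBFs phi_l) fixed in advance.
import math

def phi(x, means=(0.0, 1.0), s=0.5):
    """The two FBFs for fixed Gaussian antecedents."""
    w = [math.exp(-0.5 * ((x - m) / s) ** 2) for m in means]
    g = sum(w)
    return [wl / g for wl in w]

data = [(0.0, 0.1), (0.25, 0.3), (0.5, 0.5), (0.75, 0.7), (1.0, 0.9)]

# Normal equations A*ybar = b, A[i][j] = sum phi_i*phi_j, b[i] = sum phi_i*y.
A = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]
for x, y in data:
    ph = phi(x)
    for i in range(2):
        b[i] += ph[i] * y
        for j in range(2):
            A[i][j] += ph[i] * ph[j]

det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
ybar = [(b[0] * A[1][1] - b[1] * A[0][1]) / det,
        (b[1] * A[0][0] - b[0] * A[1][0]) / det]

fit = lambda x: sum(yb * p for yb, p in zip(ybar, phi(x)))
print(ybar, max(abs(fit(x) - y) for x, y in data))
```

For more rules and antecedents the same idea scales by solving the larger linear system with a standard least-squares routine; the drawback noted above remains that the antecedent parameters must be well chosen ahead of time.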
Practice Problem
Complete Exercise 13-1. [This exercise is very similar to the calculations that are included in
Lesson 9 of this Study Guide. So, you may be wondering why I am asking you to once again
carry out derivative calculations. My answer to this rhetorical question is: these calculations
require your bringing together all of the equations that are needed to implement a type-1 TSK
FLS, and this is a good thing to do.]
Reading Assignment
Read below about one or more of the following three applications:
1. Rule-based pattern classification of video traffic
2. Equalization of time-varying non-linear digital communication channels
3. Fuzzy logic control
I. Rule-Based Classification of Video Traffic
For this self-study course, we focus on the use of type-1 FLSs as rule-based classifiers.
Consequently, we have modified Section 14.4 of the textbook as follows:
1. Read the first three paragraphs in Section 14.4 on pp. 458-459.
2. Paragraph 4 of Section 14.4, on p. 459, is modified to:
Given a collection of MPEG-1 compressed movies and sports program videos, we shall use a
subset of them to create (i.e., design and test) a rule-based classifier (RBC) in the framework of
FL. We shall develop two type-1 classifiers and compare them to see which provides the best
performance. Our overall approach is to:
1. Choose appropriate features that act as the antecedents in a RBC
2. Establish rules using the features
3. Optimize the rule design-parameters using a tuning procedure
4. Evaluate the performance of the optimized RBC using testing
The first two steps of this procedure are relatively straightforward. The third step requires that
we establish the computational formulas for the FL-based classifiers, in much the same way that
we established such formulas for the Mamdani FLSs of Chapters 5 and 6 and the TSK FLS in
Chapter 13. We do this next. The fourth step requires that we also baseline our FL
classifiers. We do this using the accepted standard of a Bayesian classifier, one whose structure
we also explain next.
R^l: IF x_1 is F_1^l and x_2 is F_2^l and x_3 is F_3^l, THEN y^l = c^l,  l = 1, ..., M    (14-1)

where c^l = +1 when rule l's video product is a movie and c^l = -1 when it is a sports program.
Observe that these rules are a special case of a Mamdani FLS rule, one in which the consequent
is a singleton. Such a rule can also be interpreted as a TSK rule.
We use a very small number of rules, namely one per video product, e.g. if our training set
contains four movies and four sports programs, we use just eight rules.
6. Omit Section 14.4.4.
7. Section 14.4.5 (Design parameters in a FL RBC) is modified to:
In our simulations below we shall design two FL RBCs, a singleton type-1 FL RBC and a
non-singleton type-1 FL RBC. The design results will establish which classifier provides the better
performance.
Each antecedent membership function has two design parameters, its mean and standard
deviation; hence, there are six design parameters per rule. For the non-singleton type-1 FL RBC
there is also one additional design parameter for each measurement: the standard deviation of
its Gaussian MF.
Optimum values for all design parameters are determined during a tuning process; but, before
such a process can be programmed, we must first establish computational formulas for the FL
RBCs.
8. Read Section 14.4.6.
9. Omit Section 14.4.7.
From these results, we see that the non-singleton type-1 FL RBC provides the best performance,
and has 35.8% fewer false alarms than does the Bayesian classifier. Additional simulation
studies that use 20 video products (10 movies and 10 sports programs) have been performed and
support these conclusions.
In summary, we have demonstrated that it is indeed possible to perform high-level classification
of movies and sports programs working directly with compressed data. Even better performance
is possible using type-2 FL RBCs, as will be demonstrated in the follow-on course New
Directions in Rule-Based Fuzzy Logic Systems: Handling Uncertainties (see, also, the discussion
of results in the textbook's Section 14.4.10 on pp. 468-469).
II. Equalization of Time-Invariant Non-linear Digital Communication Channels
For this self-study course, we focus on the use of type-1 FLSs as equalizers, i.e., fuzzy adaptive
filters (FAFs), for time-invariant non-linear digital communication channels. Consequently, we
have modified Section 14.5 of the textbook as follows:
1. Read pp. 469-470, through Figure 14-2.
2. Read Section 14.5.1.
3. Omit Section 14.5.2.
4. Section 14.5.3 (Designing the FAFs) is modified to:
Here we illustrate the design of a singleton type-1 FAF for the non-linear time-invariant channel
in (14-36). The FAF has eight rules, one per channel state, and the rules have the following
structure (l = 1, ..., 8):

R^l: IF r(k) is F_1^l and r(k-1) is F_2^l, THEN y^l = w^l    (14-44)

Decision rule: use (14-10), with y_RBC,1(x) replaced by y_FAF,1(x).
In Karnik et al. (1999) and Liang and Mendel (2000d), the mean-value parameters of all
membership functions were estimated using a clustering procedure [Chen et al. (1993a)] that was
applied to some training data, because such a procedure is computationally simple. We used this
same procedure. An alternative to doing this is to use a tuning procedure.
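The clustering idea can be illustrated generically. The sketch below is not the procedure of Chen et al. (1993a); it is a plain k-means stand-in, with illustrative names, showing how cluster centers of the training data can supply the mean values of the antecedent MFs:

```python
import numpy as np

def kmeans_centers(data, k, iters=50, seed=0):
    """Plain k-means: each final center supplies the mean value of one
    rule's antecedent MFs (one rule per cluster / channel state)."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each training vector to its nearest center.
        d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its members.
        for j in range(k):
            members = data[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers
```

Such a procedure is computationally simple, which is exactly why it is attractive as an alternative to gradient-based tuning of the mean values.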
5. Section 14.5.4 (Simulations and conclusions) is modified to:
Here we compare a singleton type-1 FAF and a K-nearest neighbor classifier (NNC) [Savazzi et
al. (1998)] for equalization of the time-invariant non-linear channel in (14-36). In our
simulations, we chose the number of taps of the equalizer, p, equal to the number of taps of the
channel, n + 1, where n = 1; i.e., p = n + 1 = 2. The number of rules equaled the number of
clusters; i.e., 2^(p+n) = 8. We used a sequence s(k) of length 1000 for our experiments. The first
121 symbols were used for training (i.e., clustering) and the remaining 879 were used for testing.
The training sequence established the parameters of the antecedent membership functions, as
described in Section 14.5.3. After training, the parameters of the type-1 FAF were fixed and then
testing was performed.
The results below do not appear in the textbook (the ones in Figures 14-4 and 14-5 are for a
time-varying channel). They were created especially for the Study Guide by Dr. Qilian Liang.
We ran simulations for nine different SNR values, ranging from SNR = 10dB to SNR = 18dB (at
equal increments of 1dB), and we set d = 0. We performed 100 Monte-Carlo simulations for
each value of SNR, where each realization used a different sample of the additive Gaussian noise (AGN). The mean values and
standard deviations of the bit error rate (BER) for the 100 Monte-Carlo realizations are plotted in
Figures 1 and 2 below, respectively. Observe, from these figures that:
1. In terms of the mean values of BER, the type-1 FAF performs better than the NNC (see
Figure 1).
2. In terms of the standard deviation of BER, the type-1 FAF is more robust than the NNC
(see Figure 2).
These observations suggest that a type-1 FAF, as just designed, looks very promising as a
transversal equalizer for time-invariant non-linear channels.
Figure 1: Average BER of type-1 FAF and nearest neighbor classifier (NNC) versus SNR.
Figure 2: STD of BER of type-1 FAF and nearest neighbor classifier (NNC) versus SNR.
When uncertainties, such as additive measurement noise or time-varying channel coefficients,
are present, then type-2 FAFs outperform their type-1 counterparts, because they are able to
model such uncertainties and minimize their effects. This will be demonstrated in the follow-on
course New Directions in Rule-Based Fuzzy Logic Systems: Handling Uncertainties.
[Figure: A taxonomy of fuzzy control. Fuzzy control divides into non-adaptive fuzzy control
(plant model known) and adaptive fuzzy control (plant model unknown).
Non-adaptive fuzzy control:
- Linear plant model: basic property analysis, optimal fuzzy control, etc.
- Non-linear plant model: basic property analysis, optimal fuzzy control, etc.
- Fuzzy plant model: robust fuzzy control and LMI, stability analysis, etc.
Adaptive fuzzy control:
- Direct scheme: learning the fuzzy controller parameters on-line, incorporating human control
knowledge, stability and convergence analysis, etc.
- Indirect scheme: learning model parameters on-line, using plant knowledge, stability and
convergence analysis, etc.]
that rule. The fuzzy control rules are designed so as to push the system's states to the so-called
sliding surface. See Chapter 19 of Wang (1997) or Driankov et al. (1996) for detailed
discussions about fuzzy sliding-mode control.
III.B.3 Fuzzy plant model plus fuzzy controller
Using a fuzzy model of the plant as well as a fuzzy controller is very popular in recent fuzzy
control studies. The plant is modeled using r TSK fuzzy logic rules of the form:
Plant R^i: IF z_1(t) is F_i1 and ... and z_g(t) is F_ig, THEN ẋ(t) = A_i x(t) + B_i u(t), y_i(t) = C_i x(t)    (1)
where i = 1,...,r . The fuzzy controller is also modeled by r TSK fuzzy rules of the form:
Controller R^i: IF z_1(t) is F_i1 and ... and z_g(t) is F_ig, THEN u(t) = K_i x(t)    (2)
where i = 1,...,r . The main advantage of this approach is that, although the plant model and the
controller are non-linear, the control law can be designed locally (i.e., for each i) using linear
control design principles. Specifically, from (1) and (2), we see that for each local region
described by "IF z_1(t) is F_i1 and ... and z_g(t) is F_ig," the plant model is linear and the controller is also
linear. Studies have shown that if all of the local linear controllers are stable, then under certain
conditions the global control system is also stable. For detailed discussions about this, see
[Tanaka, K., T. Ikeda, and H. O. Wang, "Fuzzy Regulators and Fuzzy Observers: Relaxed
Stability Conditions and LMI-Based Designs," IEEE Trans. on Fuzzy Systems, vol. 6, pp. 250-256, 1998.]
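The local-to-global blending behind (1) and (2) can be sketched as follows. This is an illustrative sketch with my own names, assuming the rule activations supplied by member(z) are nonnegative and are normalized inside the function:

```python
import numpy as np

def tsk_plant_derivative(x, u, z, A, B, member):
    """ẋ(t) as the normalized blend of the r local linear models in (1).
    member(z) returns the (r,) nonnegative antecedent firing levels."""
    h = member(z)
    h = h / h.sum()                      # normalized rule activations
    return sum(h[i] * (A[i] @ x + B[i] * u) for i in range(len(A)))
```

When only one rule fires, the global dynamics reduce to that rule's local linear model, which is exactly the property that lets the control law be designed locally with linear methods.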
IV. Adaptive Fuzzy Control
In this section we describe two kinds of adaptive fuzzy controlindirect and direct. In indirect
adaptive fuzzy control the fuzzy controller comprises a number of fuzzy systems constructed
(initially) from knowledge about the plant, whereas in direct adaptive fuzzy control, the fuzzy
controller is a single fuzzy system constructed (initially) from knowledge about the control. It is
even possible to combine indirect and direct fuzzy controllers.
IV.A Indirect Adaptive Fuzzy Control
In indirect adaptive fuzzy control, plant non-linearities are unknown and fuzzy systems are used
to model them. The parameters of the fuzzy systems are tuned on-line in such a way that the
overall output of the fuzzy system model follows the output of the plant. The controller is
designed according to the fuzzy system model, which is considered to be the true model of the
plant. Since the fuzzy system model is changing on-line, the controller is time-varying and
adaptive. More specifically, consider the plant with the structure:
ẋ(t) = f(x(t)) + g(x(t)) u(t)    (3)

y(t) = x_1(t)    (4)
where f and g are unknown non-linear functions. The fuzzy system model for the plant is

ẋ(t) = f̂(x(t) | θ_f(t)) + ĝ(x(t) | θ_g(t)) u(t)    (5)

where f̂ and ĝ are fuzzy systems, and θ_f(t) and θ_g(t) are parameters of the respective fuzzy
system. These parameters change on-line (which is why they are shown as functions of time) so
as to make f̂ and ĝ approximate f and g, respectively. The adaptation laws for θ_f(t) and θ_g(t)
have the general forms:

θ̇_f(t) = h_f(θ_f(t), x(t))    (6)

θ̇_g(t) = h_g(θ_g(t), x(t))    (7)
The controller, u(t), is designed as if (5) is the true model of the plant in (3). For example, the
following controller cancels the non-linearities and then uses a linear control law to make the
plant output x(t) follow a desired trajectory, x_d(t), of a first-order dynamical system:

u(t) = [1 / ĝ(x(t) | θ_g(t))] [-f̂(x(t) | θ_f(t)) + ẋ_d(t) + k(x_d(t) - x(t))]    (8)

where k > 0 is a design gain.

IV.B Direct Adaptive Fuzzy Control

In direct adaptive fuzzy control, the controller itself is a single fuzzy system:

u(t) = u_D(x(t) | θ(t))    (9)

where u_D is a standard FLS whose parameters, θ(t), are updated on-line in a similar manner to
(6) and (7), so as to force y(t) to follow a desired trajectory, x_d(t). See Wang (1997, 1994) for the
details.
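The certainty-equivalence idea behind (8) can be sketched as follows. This is an illustrative sketch, not Wang's exact design: the gain k is an assumed design parameter, and the on-line adaptation of θ_f(t) and θ_g(t) is omitted (f_hat and g_hat are treated as frozen estimates):

```python
def ce_control(x, x_d, xdot_d, f_hat, g_hat, k=3.0):
    """Certainty-equivalence control in the spirit of (8): cancel the
    estimated non-linearity f_hat, then apply linear tracking feedback."""
    return (-f_hat(x) + xdot_d + k * (x_d - x)) / g_hat(x)
```

If f_hat and g_hat happened to be exact, the tracking error e(t) = x_d(t) - x(t) would obey ė = -k e, so x(t) converges to x_d(t).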
V. Conclusions
Fuzzy control is an active research field and many new results have appeared in recent years. A
good reference that puts many approaches to fuzzy control into a single book is [Farinwata, S.
S., Filev, D. and R. Langari, Fuzzy Control: Synthesis and Analysis, John Wiley & Sons, Ltd.,
New York, 2000].
Key Points
I. Rule-Based Classification of Video Traffic
Direct classification of compressed video traffic can save time and money.
The three features that are used in RBCs are the logarithms of the bits per I, P, and B frame.
I frames have more bits/frame than P frames, which have more bits/frame than B frames.
Rules for a RBC of compressed video traffic use the three selected features as their
antecedents and have one consequent (+1 for a movie or -1 for a sports program).
Each video product leads to one rule.
The computational formulas for type-1 FL RBCs follow directly from computational
formulas for singleton or non-singleton unnormalized Mamdani type-1 FLSs, in which the
MF of the consequent is either 1 (for a movie or sports program) or 0 (for anything else).
Each rule has a small number of design parameters that can be tuned using a training set of
video traffic and the steepest descent tuning procedures described in Chapters 5 and 6.
The performance of the FL RBCs is base-lined against a Bayesian classifier.
False-alarm rate (FAR) is used as the measure of performance for all classifiers.
The FL classifiers outperformed the Bayesian classifier, and the non-singleton type-1 FL
RBC gave the lowest FAR.
In adaptive fuzzy control, the controller's parameters are updated during the real-time
operation of the overall system.
Review Questions
I. Rule-Based Classification of Video Traffic
1. Circle all of the possible design parameters for a non-singleton type-1 FL RBC, when
Gaussian MFs are used:
a. Mean of each antecedent MF
b. Mean of the consequent MF
c. Standard deviation of each antecedent MF
d. Standard deviation of the consequent MF
e. Mean of each measurement MF
f. Standard deviation of each measurement MF
g. Kurtosis of each measurement MF
2. The output of the FLS in a RB FLC:
a. must be normalized
b. does not have to be normalized
c. must come from a Mamdani architecture
3. For a FL RBC, the sum of the normalized firing levels, Σ_{l=1}^{M} φ_l(x):
a. = 0
b. = 1
c. > 0 always
d. < 0 always
4. A 10-rule singleton type-1 FL RBC that uses Gaussian MFs has how many design parameters?
a. 50
b. 60
c. 70
5. Suppose that a FL RBC gives the following results for 500 testing elements: 240 movies are
correctly classified, 245 sports programs are correctly classified, 10 movies are mis-classified as
sports programs, and 5 sports programs are mis-classified as movies. How many false alarms are
there?
a. 5
b. 10
c. 15
III. Fuzzy Logic Control
1. How many kinds of fuzzy logic controllers are there?
a. one
b. many
c. six
2. A controller's parameters are determined during the design phase and do not change during the
on-line implementation phase in what kind of control?
a. non-linear
b. non-adaptive fuzzy control
c. adaptive fuzzy control
d. sliding-mode control
3. The motivation to study the control of a linear plant using a fuzzy controller is:
a. Improved performance can usually be obtained by controlling a linear plant with a
non-linear controller, and a FL controller is non-linear
b. Systems that use a fuzzy logic controller are guaranteed to be stable and robust
c. They are very simple to design
4. Sliding-mode control can be applied in the presence of non-linear plant-model uncertainties
and plant-parameter disturbances, provided that the uncertainties and disturbances are:
a. uncorrelated
b. unknown
c. known
d. stationary
5. The main advantage to using a fuzzy model of the plant as well as a fuzzy controller is:
a. The plant model and controller are linear
b. Although the plant model and controller are non-linear, the control law can be designed
locally, and for each local region the plant model is linear and the controller is also linear
c. Although the plant model and controller are non-linear, the control law can be designed
locally, and for each local region the plant model and the controller are non-linear
Lesson 14: COMPUTATION
Learning Objectives
This lesson focuses on computation, both for implementing a type-1 FLS during its operation
and for the design of the FLS. The purposes of this lesson are to enumerate all computations for
singleton and non-singleton type-1 Mamdani FLSs and for a singleton type-1 TSK FLS, and to
overview on-line software that is available for these computations. This lesson will let you see
the forest for the trees (so to speak). After completing this lesson you will be able to:
Describe the nature of and the order of all computations needed to implement the three type-1
FLSs studied in this course.
Describe the nature of and the order of all computations needed to design the three type-1
FLSs studied in this course.
Explain what software is available to implement and design the three type-1 FLSs studied in
this course.
Reading Assignment
All of the reading material for this lesson is in this Study Guide.
I. Implementation of Type-1 Mamdani FLSs
In this section we collect all of the equations that are needed to implement singleton and
non-singleton type-1 Mamdani FLSs. These equations require the designer to make many choices
(see Figure 5-9) and will change if the choices are different from the ones we make.
I.A Singleton type-1 Mamdani FLS
General equation for the inference engine [see (5-10)]:

μ_B^l(y) = μ_G^l(y) ⋆ μ_F_1^l(x_1) ⋆ ... ⋆ μ_F_p^l(x_p),  y ∈ Y    (1)

Input-output equation for the FLS: This requires specific choices to be made, e.g., max-product
composition and product implication [which together mean we use the product t-norm in
(1)], and height defuzzification, so that [see (5-24) and (5-25)]:

y(x) = f_s(x) = Σ_{l=1}^{M} ȳ^l φ_l(x)    (2)

φ_l(x) = [Π_{i=1}^{p} μ_F_i^l(x_i)] / [Σ_{l=1}^{M} Π_{i=1}^{p} μ_F_i^l(x_i)],  l = 1, ..., M    (3)
Final implementation of input-output equation for the FLS: This requires choices to be
made about the MFs, e.g., Gaussian antecedent MFs [see (5-33)]:

μ_F_i^l(x_i) = exp[-(1/2)((x_i - m_F_i^l) / σ_F_i^l)^2]    (4)
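Equations (2)-(4) can be collected into a short Python sketch (illustrative names; the product t-norm and height defuzzification are assumed, as above):

```python
import numpy as np

def f_s(x, means, sigmas, ybar):
    """Singleton type-1 Mamdani FLS: Gaussian antecedent MFs (4),
    product t-norm, height defuzzification (2)-(3).
    x: (p,); means, sigmas: (M, p); ybar: (M,) consequent centers."""
    mu = np.exp(-0.5 * ((x - means) / sigmas) ** 2)   # (M, p) memberships
    fire = mu.prod(axis=1)                            # (M,) firing levels
    return float(fire @ ybar / fire.sum())            # height defuzzifier
```

When the input sits at one rule's antecedent centers and far from all others, the output is essentially that rule's consequent center.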
I.B Non-singleton type-1 Mamdani FLS

General equations for the inference engine:

μ_Q_k^l(x^l_k,max) = sup_{x_k ∈ X_k} μ_Q_k^l(x_k)    (5)

where

μ_Q_k^l(x_k) = μ_X_k(x_k) μ_F_k^l(x_k)    (6)

μ_B^l(y) = μ_G^l(y) T_{k=1}^{p} μ_Q_k^l(x^l_k,max)    (7)
Input-output equation for the FLS: This requires specific choices to be made, e.g., max-product
composition and product implication [which together mean we use the product t-norm in
(7)], and height defuzzification, so that [see (6-17) and (6-18)]:

y(x) = f_ns(x) = Σ_{l=1}^{M} ȳ^l φ_l(x)    (8)

φ_l(x) = [Π_{k=1}^{p} μ_Q_k^l(x^l_k,max)] / [Σ_{l=1}^{M} Π_{k=1}^{p} μ_Q_k^l(x^l_k,max)]    (9)
Final implementation of input-output equation for the FLS: This requires choices to be
made about the MFs, e.g., Gaussian antecedent MFs and Gaussian input MFs [see (6-24) and (6-25)]:

μ_F_k^l(x_k) = exp[-(1/2)((x_k - m_F_k^l) / σ_F_k^l)^2]    (10)

μ_X_k(x_k) = exp[-(1/2)((x_k - m_X_k) / σ_X_k)^2],  k = 1, ..., p    (11)

x^l_k,max = (σ_X_k^2 m_F_k^l + σ_F_k^l^2 m_X_k) / (σ_X_k^2 + σ_F_k^l^2)    (12)

μ_Q_k^l(x^l_k,max) = exp[-(1/2)(m_X_k - m_F_k^l)^2 / (σ_X_k^2 + σ_F_k^l^2)]    (13)
Equations (8), (9) and (13) implement a non-singleton type-1 Mamdani FLS.
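Equations (8), (9) and (13) similarly collapse into a few lines. The sketch below uses illustrative names and, for brevity, one common input standard deviation sigma_x for all p inputs, with the input MFs centered at the measurements:

```python
import numpy as np

def f_ns(x, means, sigmas, sigma_x, ybar):
    """Non-singleton type-1 Mamdani FLS via (8), (9) and (13):
    Gaussian input MFs centered at the measurements x, std sigma_x."""
    # mu_{Q_k^l}(x^l_{k,max}) from (13), with m_{X_k} = x_k
    q = np.exp(-0.5 * (x - means) ** 2 / (sigma_x ** 2 + sigmas ** 2))
    fire = q.prod(axis=1)                  # (M,) firing levels
    return float(fire @ ybar / fire.sum())
```

Setting sigma_x = 0 recovers the singleton FLS of Section I.A, as it should.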
II. Implementation of Type-1 TSK FLSs

II.A First-order normalized type-1 TSK FLS

General equations: Using the product t-norm,

y_TSK,1(x) = [Σ_{i=1}^{M} f^i(x) y^i(x)] / [Σ_{i=1}^{M} f^i(x)]    (14)

f^i(x) = T_{k=1}^{p} μ_F_k^i(x_k)    (15)

where y^i(x) = c_0^i + c_1^i x_1 + ... + c_p^i x_p is the i-th rule's first-order consequent.
Final implementation of input-output equation for the normalized TSK FLS: This
requires choices to be made about the MFs, e.g., Gaussian antecedent MFs [see (13-6)]:

μ_F_k^i(x_k) = exp[-(1/2)((x_k - m_F_k^i) / σ_F_k^i)^2]    (16)
II.B First-order unnormalized type-1 TSK FLS

General equations [see (13-2)-(13-4)]: Using the product t-norm,

y_TSK,1(x) = Σ_{i=1}^{M} f^i(x) y^i(x)    (17)

f^i(x) = T_{k=1}^{p} μ_F_k^i(x_k)    (18)
Final implementation of input-output equation for the unnormalized TSK FLS: This
requires choices to be made about the MFs, e.g., Gaussian antecedent MFs [see (13-6)]:

μ_F_k^i(x_k) = exp[-(1/2)((x_k - m_F_k^i) / σ_F_k^i)^2]    (19)
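Equations (17)-(19) can be sketched as follows (illustrative names; first-order consequents assumed):

```python
import numpy as np

def y_tsk_unnorm(x, means, sigmas, c):
    """Unnormalized first-order type-1 TSK FLS, (17)-(19).
    c: (M, p+1) with row i = (c_i0, c_i1, ..., c_ip)."""
    f = np.exp(-0.5 * ((x - means) / sigmas) ** 2).prod(axis=1)  # (18)-(19)
    y_i = c[:, 0] + c[:, 1:] @ x        # first-order rule consequents
    return float(f @ y_i)               # (17): no normalization
```

Note the only difference from the normalized TSK FLS of Section II.A is the missing division by Σ_i f^i(x).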
III. Designs of Mamdani FLSs Using a Back Propagation (Steepest Descent) Design
Procedure
In this section we collect all of the equations that are needed to design singleton and
non-singleton type-1 Mamdani FLSs using the back-propagation (steepest descent) method. These
equations require the designer to make many choices (see Figure 5-9) and will change if the
choices are different from the ones we make.
III.A Singleton type-1 Mamdani FLS
The key design equations are described in Section 5.9.3 [see (5-48)-(5-50)]:
m_F_k^l(i+1) = m_F_k^l(i) - α [f_s(x^(i)) - y^(i)] [ȳ^l(i) - f_s(x^(i))] φ_l(x^(i)) (x_k^(i) - m_F_k^l(i)) / σ_F_k^l^2(i)    (20)

σ_F_k^l(i+1) = σ_F_k^l(i) - α [f_s(x^(i)) - y^(i)] [ȳ^l(i) - f_s(x^(i))] φ_l(x^(i)) (x_k^(i) - m_F_k^l(i))^2 / σ_F_k^l^3(i)    (21)

ȳ^l(i+1) = ȳ^l(i) - α [f_s(x^(i)) - y^(i)] φ_l(x^(i))    (22)

where φ_l(x^(i)) and f_s(x^(i)) are computed using (2)-(4), in which ȳ^l = ȳ^l(i), m_F_k^l = m_F_k^l(i) and
σ_F_k^l = σ_F_k^l(i).
III.B Non-singleton type-1 Mamdani FLS
The key design equations are described in Section 6.6.3 [see (6-30)-(6-33)]:
m_F_k^l(i+1) = m_F_k^l(i) - α [f_ns(x^(i)) - y^(i)] [ȳ^l(i) - f_ns(x^(i))] φ_l(x^(i)) (x_k^(i) - m_F_k^l(i)) / (σ_X^2(i) + σ_F_k^l^2(i))    (23)

ȳ^l(i+1) = ȳ^l(i) - α [f_ns(x^(i)) - y^(i)] φ_l(x^(i))    (24)

σ_F_k^l(i+1) = σ_F_k^l(i) - α [f_ns(x^(i)) - y^(i)] [ȳ^l(i) - f_ns(x^(i))] φ_l(x^(i)) σ_F_k^l(i) (x_k^(i) - m_F_k^l(i))^2 / (σ_X^2(i) + σ_F_k^l^2(i))^2    (25)

σ_X(i+1) = σ_X(i) - α [f_ns(x^(i)) - y^(i)] [ȳ^l(i) - f_ns(x^(i))] φ_l(x^(i)) σ_X(i) (x_k^(i) - m_F_k^l(i))^2 / (σ_X^2(i) + σ_F_k^l^2(i))^2    (26)

where φ_l(x^(i)) and f_ns(x^(i)) are computed using (8), (9) and (13), in which ȳ^l = ȳ^l(i),
m_F_k^l = m_F_k^l(i), σ_F_k^l = σ_F_k^l(i) and σ_X = σ_X(i).
IV. Designs of TSK FLSs Using a Back Propagation (Steepest Descent) Design Procedure
In this section we collect all of the equations that are needed to design singleton normalized and
unnormalized type-1 TSK FLSs using the back-propagation (steepest descent) method. These
equations also require the designer to make many choices and will change if the choices are
different from the ones we make.
IV.A First-order normalized type-1 TSK FLSs
The key design equations have been worked out by you in Lesson 12, Exercise 13-1 [see
(5) and (15) in the solution to Exercise 13-1]:
c_j^i(n+1) = c_j^i(n) - α [y_TSK,1(x^(t)) - y^(t)] (w^i(n) / g(n)) x_j^(t),  j = 0, 1, ..., p  (with x_0^(t) = 1)    (27)

m_F_k^i(n+1) = m_F_k^i(n) - α [y_TSK,1(x^(t)) - y^(t)] (w^i(n) / g(n)) [y^i(x^(t)) - y_TSK,1(x^(t))] (x_k^(t) - m_F_k^i(n)) / σ_F_k^i^2(n)    (28)

where

w^i(n) = Π_{k=1}^{p} exp[-(1/2)((x_k^(t) - m_F_k^i(n)) / σ_F_k^i(n))^2]    (29)

g(n) = Σ_{i=1}^{M} w^i(n)    (30)

and a similar steepest-descent equation holds for σ_F_k^i(n+1).
V.C Normalized TSK FLS
tsk_type1.m: Compute the output(s) of a type-1 TSK FLS (type-1 antecedents and type-0
consequent) when the antecedent membership functions are Gaussian.
train_tsk_type1.m: Tune the parameters of a type-1 TSK FLS (type-1 antecedents and
type-0 consequent) when the antecedent membership functions are Gaussian, using some
input-output training data.
V.D Unnormalized TSK FLS
Although no M-files are available for unnormalized TSK FLSs, they can easily be
constructed using the structure of the M-files that are available for a normalized TSK FLS.
Key Points
The equations needed to implement singleton type-1 and non-singleton type-1 Mamdani
FLSs and singleton normalized and unnormalized type-1 TSK FLSs have been collected in
one place.
The equations needed to design [using the back-propagation (steepest descent) design
method] singleton type-1 and non-singleton type-1 Mamdani FLSs and singleton normalized
and unnormalized type-1 TSK FLSs have been collected in one place. These equations also
make use of the ones for implementing their respective type-1 FLSs.
On-line (free) software for the implementation and design of FLSs (singleton type-1 and
non-singleton type-1 Mamdani FLSs and normalized type-1 TSK FLSs) is available on the
Internet at: http://sipi.usc.edu/~mendel/software.
Practice Problems
Exercise SG 14-1
Suppose that the t-norm used for implementation of a singleton type-1 Mamdani FLS is the
minimum. How do the implementation equations change for that FLS?
Exercise SG 14-2
Suppose that the t-norm used for implementation of a non-singleton type-1 Mamdani FLS is the
minimum. How do the implementation equations change for that FLS?
Exercise SG 14-3
Suppose that the t-norm used for implementation of a singleton normalized type-1 TSK FLS is
the minimum. How do the implementation equations change for that FLS?
Reading Assignment
Read pages 66-78 of the textbook. Then read the following new material.
I. Uncertainties in Our Applications
Here I explain where uncertainties can occur in the applications that have been included as part
of this course.
I.A Forecasting of time series (Lesson 6)
Since rules are extracted from numerical data, if the data are corrupted by additive noise then the
rule antecedents and the rule consequent are uncertain. Uncertainty also affects the tuning of the
FLS parameters because noisy measurements are used. Finally, if only noisy measurements are
available to activate the FLS, then uncertainty also affects the inputs to the FLS. In this
application, all four sources of uncertainty that are listed in the first paragraph on p. 68 of the
textbook can be present.
I.B Knowledge mining using surveys (Lesson 6)
In Chapter 2, we saw that words can mean different things to different people, so rule
antecedents are uncertain because they use words. Surveys collected from a group of experts lead
to a histogram for the consequent of each rule; hence, there is uncertainty about a rule's
consequent. There is no tuning of a FLA, so this kind of uncertainty is not present in a FLA.
Activating a FLA can be done using words. In this case, there is the usual uncertainty about
words associated with this activation. In this application, only three kinds of uncertainty are
present because data is not used to tune the FLA.
I.C Rule-based classification of video traffic (Lesson 13)
Example 13-5 in the textbook demonstrates that the logarithms of I, P, or B frame sizes are more
appropriately modeled as Gaussians, each of whose mean is a constant but whose standard
deviation varies. This suggests that we should use a Gaussian MF with a fixed mean and an
uncertain standard deviation to model each frame of the compressed video. Hence, rule
antecedents are uncertain. Rule consequents in a RBC are certain because they correspond to a
class (e.g., +1 for movies or -1 for sports programs). If the parameters of a FL RBC are tuned using a
training sample, then the just-described uncertainties also affect the tuning. Measurements that
activate the FL RBC will also be uncertain because they are logarithms of I, P, or B frame sizes
as computed over a window of measurements. In this application, only three sources of
uncertainty will be present, because the uncertainty about words does not affect a rule's
consequent.
I.D Equalization of Time-Invariant Non-linear Digital Communication Channels
(Lesson 13)
Because equalization using a rule-based FLS (a FAF) is equivalent to rule-based classification,
there can be three sources of uncertainty present, namely: uncertainty about a rule's antecedents
(but not about a rule's consequent), uncertainty about the data used to tune the parameters of the
FAF, and uncertainty about the measurements used to activate the FAF. When measurements are
very accurate, then all of these uncertainties disappear. However, if the communication system is
in a time-varying environment (e.g., as in mobile communications), then channel coefficients
will be time-varying, and rule antecedents become uncertain (e.g., see Example 14-2).
I.E Fuzzy Logic Control
Because of the vast scope of FL control, and our very brief coverage of FL control in Lesson 13,
we can only provide a very cursory discussion here about where uncertainties can occur in FL
control. To be as specific as possible, we focus first on non-adaptive fuzzy control in which a
fuzzy model is used for both the plant as well as the controller [see Equations (1) and (2) in
Lesson 13]. Uncertainty can occur in rule antecedents, and may also be present in each rules
consequent if only noisy measurements are available, or if the control cannot be implemented
perfectly. If control parameters are tuned during an off-line design phase using noisy data, then
that kind of uncertainty is also present. It would seem, therefore, that all four sources of
uncertainty could be present in this kind of non-adaptive fuzzy controller problem.
Next, focus on indirect adaptive fuzzy control, as described by Equations (3)(7) in Lesson 13.
The fuzzy system models for f and g will use IF-THEN rules. If they are Mamdani rules, then
uncertainties may be present in both antecedent and consequent words. If they use TSK rules,
then uncertainties may be present just in the antecedent words. Uncertain antecedents or
consequents can be used to model the lack of knowledge about the true non-linearities, f and g.
Observe, in the adaptation laws (6) and (7), that fuzzy system parameters are updated using
measurements, so if only noisy measurements are available to do this, then data uncertainties will
also be present. It would seem, therefore, that all four sources of uncertainty can also be present
in this kind of adaptive fuzzy controller problem.
Now locate triangles so that their base end-points can be anywhere in the intervals along the
x-axis associated with the blurred average end-points. Doing this leads to a continuum of triangular
MFs sitting on the x-axis, e.g., picture a whole bunch of triangles all having the same apex point
but different base points, as in Figure 1.
For purposes of this discussion, suppose there are exactly N such triangles. Then at each value of
x, there can be up to N MF values, MF_1(x), MF_2(x), ..., MF_N(x). Let's assign a weight to each of
the possible MF values, say w_x1, w_x2, ..., w_xN (see the insert on Figure 1). We can think of these
weights as the possibilities associated with each triangle at this value of x. The resulting type-2
MF can be expressed mathematically as

{(x, {(MF_i(x), w_xi) | i = 1, ..., N}) | x ∈ X}

Another way to write this is:

{(x, MF(x, w)) | x ∈ X and w ∈ J_x}

MF(x, w) is a type-2 MF. It is three-dimensional because MF(x, w) depends on two variables, x
and w.
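The construction above can be made concrete with a small sketch (illustrative names and numbers): discretize the continuum to N embedded triangles sharing an apex, and, at a fixed x, pair each primary membership value MF_i(x) with its weight w_xi:

```python
def tri(x, l, apex, r):
    """Triangular MF with base [l, r] and unit height at apex."""
    if x <= l or x >= r:
        return 0.0
    if x <= apex:
        return (x - l) / (apex - l)
    return (r - x) / (r - apex)

def secondary_mf(x, bases, weights, apex=0.5):
    """At a fixed x, a type-2 MF pairs each primary value MF_i(x)
    with its weight (possibility) w_xi."""
    return [(tri(x, l, apex, r), w) for (l, r), w in zip(bases, weights)]
```

Sweeping x over X and collecting these pairs traces out the three-dimensional type-2 MF described above.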
[Figure 1: Triangular MFs when base end-points (l and r) have uncertainty intervals associated
with them. The insert shows the weights w_x1, ..., w_xN assigned to the MF values
MF_1(x), ..., MF_N(x) at one value of x.]
A type-1 FLS only uses type-1 fuzzy sets whereas a type-2 FLS uses at least one type-2 fuzzy
set. The diagram for a type-2 FLS is the same as for a type-1 FLS (see Figure 1-1 in the
textbook). The inference engine of a type-1 FLS maps type-1 input fuzzy sets into type-1 output
fuzzy sets, whereas the inference engine of a type-2 FLS maps type-2 and/or type-1 fuzzy sets
into type-2 fuzzy sets. The output processor for a type-1 FLS transforms a type-1 fuzzy set into a
number (i.e. a type-0 fuzzy set), and is the familiar defuzzifier. The output processor for a type-2
FLS has two components to it: (1) a type-reducer that transforms a type-2 fuzzy set into a type-1
fuzzy set (a two-dimensional type-reduced set), followed by (2) a defuzzifier that transforms the
resulting type-1 fuzzy set into a number. A type-reduced set is like a confidence interval. The
more uncertainty that is present, then the larger is the type-reduced set, and vice-versa.
Type-2 FLSs have been developed that satisfy the following fundamental design requirement:
when all sources of uncertainty disappear, a type-2 FLS must reduce to a comparable type-1
FLS. This design requirement is analogous to what happens to a probability density function
when random uncertainties disappear. In that case, the variance of the pdf goes to zero, and a
probability analysis reduces to a deterministic analysis. So, just as the capability for a
deterministic analysis is embedded within a probability analysis, the capability for a type-1 FLS
is embedded within a type-2 FLS.
Type-2 FLSs are described by type-2 membership functions (MFs) that are characterized by
more parameters than are MFs for type-1 FLSs. During the designs of type-1 and type-2 FLSs,
MF parameters are optimized using some training data. Because type-2 FLSs are characterized
by more design parameters than are type-1 FLSs (i.e., they have more design degrees of
freedom), type-2 FLSs have the potential to outperform type-1 FLSs.
Of course, one way to introduce more design degrees of freedom into a type-1 FLS is to add
more rules to it. Unfortunately, additional rules do not let a type-1 FLS account for uncertainties,
because uncertainties cannot be modeled by type-1 fuzzy sets. And, in all fairness, the additional
rules should also be provided to the type-2 FLS, especially if we require that a type-2 FLS must
reduce to a type-1 FLS when all sources of uncertainty disappear.
Some specific situations where we have found that type-2 FLSs outperform type-1 FLSs are: (1)
Measurement noise is non-stationary, but the nature of the non-stationarity cannot be expressed
ahead of time mathematically (e.g. variable SNR measurements); (2) A data-generating
mechanism is time-varying, but the nature of the time-variations cannot be expressed ahead of
time mathematically (e.g. equalization of non-linear and time-varying digital communication
channels); (3) Features are described by statistical attributes that are non-stationary, but the
nature of the non-stationarity cannot be expressed ahead of time mathematically (e.g. rule-based
classification of video traffic); and (4) Knowledge is mined from experts using IF-THEN
questionnaires (e.g. connection admission control for ATM networks).
Type-2 fuzzy sets and FLSs are covered in the textbook that came with this course and are the
subject of the follow-on (to this course) IEEE Self-Study Course New Directions in Rule-Based
Fuzzy Logic Systems: Handling Uncertainties.
Key Points

Solutions

(a) Real numbers close to 10:

μ_close to 10(x) = exp[−(x − 10)²/2σ²],  x ∈ R

(b) Real numbers approximately equal to 6:

μ_approximately equal to 6(x) = exp[−(x − 6)²/0.005],  x ∈ R

(c) μ(x) = a(x) for x < 200 (x ∈ I), and μ(x) = 1 for x ≥ 200 (x ∈ I).

There is no unique choice for a(x) and the choice of 200 is also arbitrary.
(d) Complex numbers near the origin

Let x = a + jb, so that |x| = sqrt(a² + b²). We can interpret complex numbers near the origin as those numbers for which |x| is very small. In this case |x| must be a positive real number. By multiplying a MF like any of the three given on p. 188 of the textbook by a unit step function, we can obtain the desired MF, e.g.,

μ_close to origin(x) = exp[−|x|²/2σ²] u₋₁(|x|)

where u₋₁ denotes the unit step function and σ is chosen to be small.
For LIGHT we can use

μ_LIGHT(w) = 2e^(−aw) / (1 + e^(−aw)),  w ≥ 0,

that is related to the sigmoidal function (1 − e^(−aw))/(1 + e^(−aw)) (a shifted version of which is widely used in neural networks). A more general s-curve that can be used for μ_LIGHT(w) is given in Cox (1994, pp. 51-53).

[Figure: μ_LIGHT(w) versus weight w]

For HEAVY we can use

μ_HEAVY(w) = (1 − e^(−aw)) / (1 + e^(−aw)),  w ≥ 0.

[Figure: μ_HEAVY(w) versus weight w]
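The two s-shaped MFs above are exact complements of each other, which can be checked numerically. The following is a minimal Python sketch; the function names and the choice a = 1 are illustrative, not from the textbook:

```python
import math

def mu_heavy(w, a=1.0):
    """S-curve MF for HEAVY: 0 at w = 0, rising toward 1 as w grows."""
    return (1 - math.exp(-a * w)) / (1 + math.exp(-a * w))

def mu_light(w, a=1.0):
    """MF for LIGHT: 1 at w = 0, decaying toward 0 as w grows."""
    return 2 * math.exp(-a * w) / (1 + math.exp(-a * w))

# The two MFs are complementary: mu_light(w) + mu_heavy(w) = 1 for all w >= 0.
for w in [0.0, 0.5, 1.0, 2.0, 10.0]:
    assert abs(mu_light(w) + mu_heavy(w) - 1.0) < 1e-12
```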
[Figure panels (a)-(e): MF plots of μ_{A∗B∗C}(x), μ_{(A∗B)∗C}(x), and μ_{A∗(B∗C)}(x), where ∗ denotes the set operation (union or intersection) whose associativity is being illustrated.]
Exercise 1-11

We restate (1-31) for the union, which uses the maximum:

μ_{c∪s}(u_i, v_j) = max[μ_c(u_i, v_j), μ_s(u_i, v_j)]

It follows that:

μ_{c∪s}(1,1) = max(0.9, 0) = 0.9
μ_{c∪s}(1,2) = max(0.4, 0.6) = 0.6
μ_{c∪s}(1,3) = max(0.1, 1) = 1
μ_{c∪s}(2,1) = max(0.1, 0) = 0.1
μ_{c∪s}(2,2) = max(0.4, 0) = 0.4
μ_{c∪s}(2,3) = max(0.9, 0.3) = 0.9

For the intersection, which uses the minimum:

μ_{c∩s}(u_i, v_j) = min[μ_c(u_i, v_j), μ_s(u_i, v_j)]

It follows that:

μ_{c∩s}(1,1) = min(0.9, 0) = 0
μ_{c∩s}(1,2) = min(0.4, 0.6) = 0.4
μ_{c∩s}(1,3) = min(0.1, 1) = 0.1
μ_{c∩s}(2,1) = min(0.1, 0) = 0
μ_{c∩s}(2,2) = min(0.4, 0) = 0
μ_{c∩s}(2,3) = min(0.9, 0.3) = 0.3
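The union and intersection of sampled fuzzy relations are just elementwise max and min over the membership values. A minimal Python sketch using the values from this exercise:

```python
# Membership values of the two relations, indexed by (u_i, v_j),
# taken from Exercise 1-11.
mu_c = {(1, 1): 0.9, (1, 2): 0.4, (1, 3): 0.1,
        (2, 1): 0.1, (2, 2): 0.4, (2, 3): 0.9}
mu_s = {(1, 1): 0.0, (1, 2): 0.6, (1, 3): 1.0,
        (2, 1): 0.0, (2, 2): 0.0, (2, 3): 0.3}

# Union: pointwise maximum; intersection: pointwise minimum.
union = {k: max(mu_c[k], mu_s[k]) for k in mu_c}
intersection = {k: min(mu_c[k], mu_s[k]) for k in mu_c}
```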
Exercise 1-19a

Let very likely ≡ VL. Then, according to the concept of concentration, μ_VL(x) = (μ_L(x))². We use μ_L(x) from (1-56) to compute μ_VL(x), i.e.,

μ_VL(x) = ⋯ + 0.09/0.4 + 0.04/0.3 + ⋯

Observe the higher concentration of μ_VL(x) MF values for high values of probability (x), which seems sensible for the term very likely.
Given (rows u1, u2; columns v1, v2, v3)

μ_c(u,v) = [ 0.9  0.4  0.1
             0.1  0.4  0.9 ]

and (rows v1, v2, v3; columns w1, w2)

μ_mb(v,w) = [ 0    0
              0.6  0
              1    0.7 ]

In this Exercise, ⋆ ≡ product and ⊕ ≡ maximum. The four elements of μ_{c∘mb}(u,w) are computed using the max-product composition shortcuts that are described on p. 42, as:

μ_{c∘mb}(u1,w1) = max(0.9×0, 0.4×0.6, 0.1×1) = max(0, 0.24, 0.1) = 0.24
μ_{c∘mb}(u1,w2) = max(0.9×0, 0.4×0, 0.1×0.7) = max(0, 0, 0.07) = 0.07
μ_{c∘mb}(u2,w1) = max(0.1×0, 0.4×0.6, 0.9×1) = max(0, 0.24, 0.9) = 0.9
μ_{c∘mb}(u2,w2) = max(0.1×0, 0.4×0, 0.9×0.7) = max(0, 0, 0.63) = 0.63

so that

μ_{c∘mb}(u,w) = [ 0.24  0.07
                  0.9   0.63 ]   when ⋆ ≡ product and ⊕ ≡ maximum

μ_{c∘mb}(u,w) = [ 0.4  0.1
                  0.9  0.7 ]   when ⋆ ≡ minimum and ⊕ ≡ maximum

Observe that the two MF matrices are very similar, with the biggest difference between the two occurring in the 1-1 element.
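The max-product and max-min compositions of this exercise can be reproduced with a few lines of code. This is an illustrative sketch; the helper name `compose` is ours, not the textbook's:

```python
def compose(A, B, tnorm):
    """Sup-star composition of relation matrices A (|U| x |V|) and
    B (|V| x |W|): mu(u, w) = max over v of tnorm(A[u][v], B[v][w])."""
    return [[max(tnorm(A[i][k], B[k][j]) for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

mu_c  = [[0.9, 0.4, 0.1],
         [0.1, 0.4, 0.9]]
mu_mb = [[0.0, 0.0],
         [0.6, 0.0],
         [1.0, 0.7]]

max_prod = compose(mu_c, mu_mb, lambda a, b: a * b)  # product t-norm
max_min  = compose(mu_c, mu_mb, min)                 # minimum t-norm
```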
The values of μ_B(y) are given in the following table. We used the Extension Principle, i.e., μ_B(y) = max over x in f⁻¹(y) of μ_A(x), with y = |x|:

 x    μ_A(x)   y = |x|   μ_B(y)
−5    0.2      5         max{0.2, 0.1} = 0.2
−4    0.4      4         max{0.4, 0.5} = 0.5
−3    0.4      3         max{0.4, 0.8} = 0.8
−2    0.5      2         max{0.5, 1} = 1
−1    0.5      1         max{0.5, 0.9} = 0.9
 0    0.6      0         max{0.6} = 0.6
 1    0.9      1         max{0.5, 0.9} = 0.9
 2    1        2         max{0.5, 1} = 1
 3    0.8      3         max{0.4, 0.8} = 0.8
 4    0.5      4         max{0.4, 0.5} = 0.5
 5    0.1      5         max{0.2, 0.1} = 0.2
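The table can be generated mechanically: for each y, take the maximum of μ_A(x) over all x that map to y. A minimal sketch:

```python
# Membership values of A over the integers -5..5, as in the exercise.
mu_A = {-5: 0.2, -4: 0.4, -3: 0.4, -2: 0.5, -1: 0.5,
         0: 0.6,  1: 0.9,  2: 1.0,  3: 0.8,  4: 0.5, 5: 0.1}

f = abs  # the crisp mapping y = |x|

# Extension Principle for a many-to-one mapping:
# mu_B(y) = max of mu_A(x) over all x with f(x) = y.
mu_B = {}
for x, m in mu_A.items():
    y = f(x)
    mu_B[y] = max(mu_B.get(y, 0.0), m)
```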
The truth table for ((p∧q) → r) ↔ ((p→r) ∨ (q→r)) is:

p  q  r  |  p∧q  |  (p∧q)→r  |  p→r  |  q→r  |  (p→r)∨(q→r)
T  T  T  |   T   |     T     |   T   |   T   |      T
T  T  F  |   T   |     F     |   F   |   F   |      F
T  F  T  |   F   |     T     |   T   |   T   |      T
T  F  F  |   F   |     T     |   F   |   T   |      T
F  T  T  |   F   |     T     |   T   |   T   |      T
F  T  F  |   F   |     T     |   T   |   F   |      T
F  F  T  |   F   |     T     |   T   |   T   |      T
F  F  F  |   F   |     T     |   T   |   T   |      T

Because the columns for (p∧q)→r and (p→r)∨(q→r) agree in every row, ((p∧q) → r) ↔ ((p→r) ∨ (q→r)) is a tautology.
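The tautology can also be confirmed exhaustively by machine; a short sketch using Python's itertools:

```python
from itertools import product

def implies(p, r):
    """Material implication: p -> r is false only when p is true and r is false."""
    return (not p) or r

# Check that ((p and q) -> r) <-> ((p -> r) or (q -> r)) holds in all 8 rows.
tautology = all(
    implies(p and q, r) == (implies(p, r) or implies(q, r))
    for p, q, r in product([True, False], repeat=3)
)
```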
Using (1-76),

μ_{A→B}(x, y) = 1 − μ_A(x)[1 − μ_B(y)]   (1-76)

The three figures below provide our construction of μ_B*(y). Observe that the result in part (c) is identical to the result in part (c) of Figure 1-10; hence, conclusions drawn at the end of Example 1-19 apply here as well.

[Figure panels (a)-(c): plots of μ_B(y), μ_A(x)[1 − μ_B(y)], and μ_B*(y) versus y.]
x̄(i) = (1/i) Σ_{j=1}^{i} x(j)

A recursive formula for the sample mean lets us fold a new measurement, x(i+1), into this formula to compute x̄(i+1). It is obtained as follows:

x̄(i+1) = [1/(i+1)] Σ_{j=1}^{i+1} x(j) = [1/(i+1)] [ Σ_{j=1}^{i} x(j) + x(i+1) ]

so that

x̄(i+1) = [i/(i+1)] x̄(i) + [1/(i+1)] x(i+1)

and, in particular,

x̄(48) = (47/48) x̄(47) + (1/48) x(48)

Next, consider the standard deviation. We update the standard deviation by first updating the variance, σ²(i), and then taking its positive square root. Recall that the sample variance is given as

σ²(i) = (1/i) Σ_{j=1}^{i} [x(j) − x̄(j)]²

so that

σ²(i+1) = [i/(i+1)] σ²(i) + [1/(i+1)] [x(i+1) − x̄(i+1)]²

and, in particular,

σ²(48) = (47/48) σ²(47) + (1/48) [x(48) − x̄(48)]²
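The recursions for the sample mean and variance are easy to check against the batch definitions. The sketch below folds measurements in one at a time; note that, as in the exercise, the variance uses the running mean x̄(j) at each step, not the final mean:

```python
def recursive_stats(xs):
    """Running mean and variance via the recursions
    xbar(i+1) = i/(i+1) * xbar(i) + 1/(i+1) * x(i+1)
    var(i+1)  = i/(i+1) * var(i)  + 1/(i+1) * (x(i+1) - xbar(i+1))**2"""
    xbar, var = 0.0, 0.0
    for i, x in enumerate(xs):  # i = number of samples already folded in
        xbar = i / (i + 1) * xbar + x / (i + 1)
        var = i / (i + 1) * var + (x - xbar) ** 2 / (i + 1)
    return xbar, var
```

Because each step multiplies the old accumulator by i/(i+1) before adding the new term, unrolling the loop reproduces the sums (1/i) Σ x(j) and (1/i) Σ [x(j) − x̄(j)]² exactly.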
[Figure 5-4: Pictorial description of input and antecedent operations for a type-1 FLS that uses triangular MFs. (a) Singleton fuzzification with minimum t-norm, and (b) singleton fuzzification with product t-norm.]
[Figure 5-5: Pictorial description of consequent operations for a type-1 FLS when consequent fuzzy set MFs are triangles. (a) Fired output sets with minimum t-norm, and (b) fired output sets with product t-norm.]
[Figure 5-6: Pictorial description of (a) combined output sets for the two fired output sets depicted in Figure 5-5(a), and (b) combined output sets for the two fired output sets depicted in Figure 5-5(b). Observe that the maximum of the MFs for the two fired sets coincides with the MF for the first fired output set.]
Note that it is purely by coincidence that the second fired rule makes no contribution to the
combined output sets for the two fired output sets. Your solution to this exercise might have led
to a maximum operation in which the two fired output sets contributed to the final output set.
Exercise 5-6:

When triangles are used for the interior MFs and piecewise-linear functions are used for the two shoulder (exterior) MFs, the design parameters are:

1. Shoulder MFs: break point and slope of leg (or location of base point): 2 parameters/MF.
2. Interior triangles: center location and length of base [assume that the triangle is symmetrical; for a non-symmetrical triangle a third parameter is needed, e.g., the slopes of both legs, or the left-end and right-end base points]: 2 parameters/MF.

Assume L fuzzy sets for each antecedent and consequent. Total antecedent/consequent MF design parameters: 2L.
J(θ) = (1/N) Σ_{i=1}^{N} e^{(i)}

where

e^{(i)} = (1/2) [f_s(x^{(i)}) − y^{(i)}]²   (5-47)

Hence,

grad J(θ) = ∂J(θ)/∂θ = (1/N) Σ_{i=1}^{N} ∂e^{(i)}/∂θ
          = (2/2N) Σ_{i=1}^{N} [f_s(x^{(i)}) − y^{(i)}] ∂f_s(x^{(i)})/∂θ
          = (1/N) Σ_{i=1}^{N} [f_s(x^{(i)}) − y^{(i)}] ∂f_s(x^{(i)})/∂θ

In this last equation, note that the summation also acts on ∂f_s(x^{(i)})/∂θ. Calculations of ∂f_s(x^{(i)})/∂θ are exactly the same as in Exercise 5-9. See its solution given in this Study Guide. For example, using ∂f_s(x^{(i)})/∂ȳ^l = φ_l(x^{(i)}),

ȳ^l(i+1) = ȳ^l(i) − α ∂J(ȳ^l)/∂ȳ^l = ȳ^l(i) − α (1/N) Σ_{i=1}^{N} [f_s(x^{(i)}) − y^{(i)}] φ_l(x^{(i)})
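A batch steepest-descent step of this kind can be sketched as follows, assuming Gaussian antecedents and product t-norm. All names (`fbf`, `update_ybar`) and the toy parameters are illustrative, not the textbook's notation:

```python
import math

def fbf(x, m, s):
    """Fuzzy basis functions phi_l(x) for Gaussian antecedents:
    w_l = prod_k exp(-0.5 * ((x_k - m_lk) / s_lk)**2), normalized over rules."""
    w = [math.prod(math.exp(-0.5 * ((xk - mk) / sk) ** 2)
                   for xk, mk, sk in zip(x, m[l], s[l]))
         for l in range(len(m))]
    total = sum(w)
    return [wl / total for wl in w]

def update_ybar(ybar, data, m, s, alpha):
    """One batch steepest-descent step on the consequent centroids:
    ybar_l <- ybar_l - alpha * (1/N) * sum_i [f(x_i) - y_i] * phi_l(x_i)."""
    N = len(data)
    grad = [0.0] * len(ybar)
    for x, y in data:
        phi = fbf(x, m, s)
        f = sum(yl * pl for yl, pl in zip(ybar, phi))
        for l, pl in enumerate(phi):
            grad[l] += (f - y) * pl / N
    return [yl - alpha * g for yl, g in zip(ybar, grad)]
```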
y_c1(2, 4) = (1/1.8262) [2.099 + 2.3546 + 0.5751 + 0.5115] = 5.5402/1.8262 = 3.0337
Exercise 5-15:

Single-antecedent rules: Using Figure 5-13, project upwards from the horizontal axis and observe that there can be either one, two or three intersections with MFs. Hence, a single-antecedent FLS that uses these MFs can fire one, two or three rules.

Two-antecedent rules: If each antecedent can intersect one, two or three MFs, then we take all possible combinations of products of (1, 2, 3) and (1, 2, 3) to obtain (1, 2, 3, 4, 6, 9). Hence, we conclude that a two-antecedent FLS that uses these MFs can fire one, two, three, four, six or nine rules.
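The counting argument for two-antecedent rules is a one-liner:

```python
# Each antecedent of a two-antecedent rule base can intersect 1, 2, or 3 MFs,
# so the number of fired rules is a product of two values from {1, 2, 3}.
fired_counts = sorted({i * j for i in (1, 2, 3) for j in (1, 2, 3)})
```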
[Figure 1: Pictorial description of input and antecedent operations for a non-singleton type-1 FLS that uses triangular MFs. (a) Non-singleton fuzzification with minimum t-norm, and (b) non-singleton fuzzification with product t-norm.]

Figures comparable to Figures 5-5 and 5-6 have already been created by you in Lesson 7, and they do not change. What does change are the numerical values of the firing levels.
Exercise 6-5:

Regardless of whether θ = m_{F_k^l}, ȳ^l, or σ_{F_k^l},

∂J(θ)/∂θ = ∂/∂θ { (1/2) [f_ns(x^{(i)}) − y^{(i)}]² } = [f_ns(x^{(i)}) − y^{(i)}] ∂f_ns(x^{(i)})/∂θ   (1)

where

f_ns(x^{(i)}) = Σ_{l=1}^{M} ȳ^l φ_l(x^{(i)})   (2)

and

φ_l(x^{(i)}) = Π_{k=1}^{p} exp[ −(1/2) (x_k^{(i)} − m_{F_k^l})² / (σ_X² + σ_{F_k^l}²) ]
               / Σ_{l=1}^{M} Π_{k=1}^{p} exp[ −(1/2) (x_k^{(i)} − m_{F_k^l})² / (σ_X² + σ_{F_k^l}²) ]   (3)

Comparing the first two equations in Section III of Lesson 9 (in the Study Guide) with Equations (1) and (2) above, we see that they are identical. Comparing Equation (10) in Section III of Lesson 9 with Equation (3) above, we see that they are identical when

σ²_{F_k^l}(5-9)  ↔  σ_X² + σ²_{F_k^l}(6-5)   (4)

Hence, we do not need to repeat the derivation of the steepest descent algorithms for m_{F_k^l}, ȳ^l, and σ_{F_k^l}.

We did not derive a steepest descent algorithm for σ_X in Chapter 5, because σ_X = 0 in that chapter. In (1)-(3), σ_X only appears in (3). Comparing (3) above with Equation (10) in Lesson 9 of this Study Guide, we see (as mentioned in the textbook) that, in Chapter 6, σ_X² + σ²_{F_k^l} plays the role of Chapter 5's σ²_{F_k^l}. We leave it to the reader to obtain the steepest descent algorithm for σ_{F_k^l} in (6-32) and for σ_X in (6-33). Set σ_X = 0 in this chapter's steepest-descent algorithms to show that they reduce to their singleton counterparts in (5-48)-(5-50).
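The key observation of this exercise, that the input spread enters only through the combined variance σ_X² + σ_{F_k^l}², can be illustrated with a small sketch (the function name and parameters are ours, not the textbook's):

```python
import math

def firing(x, m, s_f, s_x=0.0):
    """Per-rule Gaussian firing strength for a non-singleton type-1 FLS.
    The input spread s_x simply adds to each antecedent variance,
    s_f**2 -> s_x**2 + s_f**2; with s_x = 0 this is the singleton case."""
    return math.prod(
        math.exp(-0.5 * (xk - mk) ** 2 / (s_x ** 2 + sk ** 2))
        for xk, mk, sk in zip(x, m, s_f))
```

With s_x = 0 the expression reduces term-by-term to the Chapter 5 (singleton) firing strength, which is exactly the reduction the exercise asks for.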
Exercise 6-7:
1. Fix the shapes and parameters of all the antecedent, consequent and input measurement
membership functions ahead of time. The data establishes the rules and the standard deviation of
the measurements, and no tuning is used.
Either of our two one-pass methods can be used.
2. Fix the shapes and parameters of the antecedent and input measurement membership
functions ahead of time. Use the training data to tune the consequent parameters.
Use the least-squares method to do this.
3. Fix the shapes and parameters of all the antecedent and consequent membership functions
ahead of time. Fix the shape but not the parameter(s) of the input measurement membership
function(s) ahead of time. Use the training data to tune the parameter(s) of the input
measurement membership function(s).
Use a back-propagation (steepest descent) method to do this.
4. Fix the shapes of all the antecedent, consequent and input measurement membership functions
ahead of time. Use the training data to tune the antecedent, consequent and input measurement
parameters.
Use a back-propagation (steepest descent) method to do this.
Regardless of whether θ = c_j^i, m_{F_k^i}, or σ_{F_k^i},

∂J(θ)/∂θ = ∂/∂θ { (1/2) [y_TSK,1(x^{(t)}) − y^{(t)}]² } = [y_TSK,1(x^{(t)}) − y^{(t)}] ∂y_TSK,1(x^{(t)})/∂θ   (1)

where

y_TSK,1(x^{(t)}) = Σ_{i=1}^{M} Σ_{j=0}^{p} c_j^i g_ij(x^{(t)})   (2)

and

g_ij(x^{(t)}) = x_j^{(t)} Π_{k=1}^{p} exp[ −(1/2) ((x_k^{(t)} − m_{F_k^i}) / σ_{F_k^i})² ]
                / Σ_{i=1}^{M} Π_{k=1}^{p} exp[ −(1/2) ((x_k^{(t)} − m_{F_k^i}) / σ_{F_k^i})² ]   (3)

(1) θ = c_j^i: In this case,

∂y_TSK,1(x^{(t)})/∂c_j^i = g_ij(x^{(t)})   (4)

Hence,

c_j^i(n+1) = c_j^i(n) − α ∂J(c_j^i)/∂c_j^i = c_j^i(n) − α [y_TSK,1(x^{(t)}) − y^{(t)}] g_ij(x^{(t)})   (5)

where j = 0, 1, …, p and i = 1, …, M.

(2) θ = m_{F_k^i}: Write

y_TSK,1 = h/g   (6)

where

h = Σ_{i=1}^{M} Σ_{j=0}^{p} c_j^i w^i x_j^{(t)}   (7)

g = Σ_{i=1}^{M} w^i   (8)

and

w^i = Π_{k=1}^{p} exp[ −(1/2) ((x_k^{(t)} − m_{F_k^i}) / σ_{F_k^i})² ]   (9)

Then

∂y_TSK,1/∂m_{F_k^i} = (∂y_TSK,1/∂w^i)(∂w^i/∂m_{F_k^i})   (10)

where

∂y_TSK,1/∂w^i = [g ∂h/∂w^i − h ∂g/∂w^i] / g² = [ Σ_{j=0}^{p} x_j^{(t)} c_j^i − y_TSK,1 ] / g   (11)

and

∂w^i/∂m_{F_k^i} = ∂/∂m_{F_k^i} { Π_{k=1}^{p} exp[ −(1/2) ((x_k^{(t)} − m_{F_k^i}) / σ_{F_k^i})² ] }   (12)

so that

∂w^i/∂m_{F_k^i} = w^i (x_k^{(t)} − m_{F_k^i}) / σ²_{F_k^i}   (13)

Combining (10)-(13),

∂y_TSK,1/∂m_{F_k^i} = [ Σ_{j=0}^{p} x_j^{(t)} c_j^i − y_TSK,1 ] (x_k^{(t)} − m_{F_k^i}) w^i / (g σ²_{F_k^i})   (14)

Hence,

m_{F_k^i}(n+1) = m_{F_k^i}(n) − α [y_TSK,1(x^{(t)}) − y^{(t)}] ∂y_TSK,1/∂m_{F_k^i}   (15)

i.e.,

m_{F_k^i}(n+1) = m_{F_k^i}(n) − α [y_TSK,1(x^{(t)}) − y^{(t)}] [ Σ_{j=0}^{p} x_j^{(t)} c_j^i(n) − y_TSK,1(x^{(t)}) ] (x_k^{(t)} − m_{F_k^i}(n)) w^i(n) / (g(n) σ²_{F_k^i}(n))   (16)

(3) θ = σ_{F_k^i}: Because this computation is just like the one for ∂w^i/∂m_{F_k^i}, we leave its details to the reader.
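A sketch of the normalized first-order TSK forward pass and the consequent-coefficient update of Equation (5). The function names and toy dimensions are illustrative, not the textbook's notation:

```python
import math

def tsk_forward(x, m, s, c):
    """Normalized first-order TSK output:
    y = sum_i w_i * (c_i0 + sum_j c_ij * x_j) / sum_i w_i,
    with Gaussian rule firings w_i."""
    w = [math.prod(math.exp(-0.5 * ((xk - mk) / sk) ** 2)
                   for xk, mk, sk in zip(x, m[i], s[i]))
         for i in range(len(m))]
    g = sum(w)
    xe = [1.0] + list(x)  # x_0 = 1 handles the constant term c_i0
    h = sum(w[i] * sum(cij * xj for cij, xj in zip(c[i], xe))
            for i in range(len(m)))
    return h / g, w, g

def update_c(x, y_target, m, s, c, alpha):
    """One steepest-descent step on the consequent coefficients:
    c_ij <- c_ij - alpha * (y - y_target) * g_ij, with g_ij = x_j * w_i / g."""
    y, w, g = tsk_forward(x, m, s, c)
    xe = [1.0] + list(x)
    return [[cij - alpha * (y - y_target) * xe[j] * w[i] / g
             for j, cij in enumerate(ci)]
            for i, ci in enumerate(c)]
```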
Under minimum t-norm, the firing-level equations become

f^l(x) = min_{i=1,2,…,p} [ μ_{X_i}(x_i) ⋆ μ_{F_i^l}(x_i) ],  l = 1, …, M,

which, for singleton fuzzification, reduces to

f^l(x) = min_{i=1,2,…,p} μ_{F_i^l}(x_i),  l = 1, …, M.

Exercise SG 14-2:

Equation (9) changes to [see (6-19)]

x^l_{k,max} = ( σ_{F_k^l} m_{X_k} + σ_{X_k} m_{F_k^l} ) / ( σ_{X_k} + σ_{F_k^l} ),  l = 1, …, M

[see Lesson 5, Example 5-2, Part (c), in which we make the appropriate substitutions for x_max, x′, m_A, σ_{A*}, and σ_A], so that

μ_{Q_k^l}(x^l_{k,max}) = exp[ −(1/2) (m_{X_k} − m_{F_k^l})² / (σ_{X_k} + σ_{F_k^l})² ]

Equation (8) and the new equations for f^l(x) and μ_{Q_k^l}(x^l_{k,max}) implement a non-singleton type-1 Mamdani FLS under minimum t-norm.

Exercise SG 14-3:

Only Equation (15) changes, to

f^i(x) = min_{k=1,2,…,p} μ_{F_k^i}(x_k)

This new equation for f^i(x), and Equation (16), implement a singleton, normalized type-1 TSK FLS under minimum t-norm.
LOTFI A. ZADEH

Child of privilege

Tekla S. Perry, Senior Editor
Perhaps the confidence Zadeh had in his judgment despite some tough opposition, and his willingness to stand apart from the crowd, originated in a childhood of privilege. He was born in 1921 in Azerbaijan, then part of the Soviet Union, and moved to Iran at age 10. His parents (his father a businessman and newspaper correspondent, his mother a doctor) were comfortably well off. As a child, Zadeh was surrounded by governesses and tutors ...

IEEE Spectrum, June 1995

Lotfi A. Zadeh
Date of birth: Feb. 4, 1921
Birthplace: Baku, Azerbaijan
Height: 178 cm
Weight: 63.5 kg
Family: wife, Fay; children, Stella and Norman
Education: BSEE, University of Teheran, 1942; MSEE, Massachusetts Institute of Technology, 1946; Ph.D., Columbia University, 1949
First job: design and analysis of defense systems, International Electronics Corp., New York City, summer of 1944
Patents: one U.S. patent, two Iranian patents
Favorite books: "I made a conscious decision to stop reading fiction at age 15, when I was a voracious reader. I now read scientific books and other nonfiction only."
Periodicals read daily: four newspapers (The New York Times, San Francisco Chronicle, San Francisco Examiner, The Wall Street Journal or San Jose Mercury News), Business Week, The Economist
Favorite kind of music: classical and electronic
Favorite composers: Sergey Prokofiev, Dmitry Shostakovich
Computer: a Hewlett-Packard workstation, which is used "only to print my ..."; he dictates his answers to ...
Favorite TV show: "MacNeil/Lehrer NewsHour"
Least favorite food: any kind of shellfish
Favorite restaurant: Three Cs Cafe, an inexpensive creperie in Berkeley, Calif.
Favorite expression: "No matter what you are told, take it as a compliment."
Favorite city: Berkeley, Calif.
Leisure activities: portrait photography (has photographed U.S. Presidents Richard Nixon and Harry Truman, as well as other notables), high-fidelity audio, garage sales
Car: Nissan Quest Minivan
Languages spoken: English, Russian, Iranian, French
Airline mileage: two million miles in past 10 years on American and United airlines alone, uncounted mileage on other airlines
Key organizational memberships: the IEEE, Association for Computing Machinery, International Fuzzy Systems Association, American Association for Artificial Intelligence
Top awards: the IEEE Medal of Honor (1995) and the Japan Honda Prize (1989)
Berkeley beckons

As Zadeh was pretty much entrenched at Columbia, he surprised his colleagues when he packed up in 1959 and moved to the University of California at Berkeley.

"I had not been looking for another position," Zadeh said, "so the offer from Berkeley was unexpected." It came from electrical engineering department chairman John Whinnery, who called him at home over the weekend and offered him a position. "If my line had been busy, I believe I would still be at Columbia," Zadeh told Spectrum.

Whinnery recalls it slightly differently. He had heard from a colleague that Zadeh had been toying with the idea of leaving Columbia. Minutes later, Whinnery picked up the phone and called him, arranged to meet him in New York City for dinner, and soon afterward hired him. Berkeley was then growing rapidly, and Whinnery was on the lookout for young scholars who were considered brilliant in their fields. Zadeh fit the bill.

For Zadeh, moving to Berkeley was a simple decision to make: "I was happy at Columbia, but the job was too soft. It was a comfortable, undemanding environment; I was not challenged internally. I realized that at Berkeley my life would not be anywhere near as comfortable, but I felt that it would be good for me to be challenged."

Zadeh has never regretted the decision. To this day he remains at Berkeley, although by now as professor emeritus.

At Berkeley, Zadeh initially continued his work in linear, nonlinear, and finite-state systems analysis. But before long he became convinced that digital systems would grow in importance. Appointed as chairman of the electrical engineering department, he decided to act on that conviction, and immediately set about strengthening the role of computer science in the department's curriculum. He also lobbied the electrical engineering community ...
Fuzzy is born

While he was focusing on systems analysis, in the early 1960s, Zadeh began to feel that traditional systems analysis techniques were too precise for real-world problems. In a paper written in 1961, he mentioned that a new technique was needed, a "fuzzy" kind of mathematics. At the time, though, he had no clear idea how this would work.

That idea came in July 1964. Zadeh was in New York City visiting his parents, ...
9. Linguistic variables are variables whose values are:
a. numbers
b. algebraic relations
c. words or sentences in a natural or artificial language
10. A normal Gaussian MF has how many design degrees of freedom?
a. one
b. two
c. three
11. Membership functions for fuzzy sets:
a. are unique
b. can be of shapes that are chosen by the designer
c. are always chosen as triangles
12. A MF for integers close to 10 is:
a. μ_close to 10(x) = 0.3/7 + 0.6/8 + 1/9 + 1/10 + 1/11 + 0.6/12 + 0.3/13
b. μ_close to 10(x) = x/10 for x < 10, and 1 for x ≥ 10
c. μ_close to 10(x) = exp[−(x − 10)²/2σ²], x ∈ R
[Figures 15-a, 15-b, 15-c and 15-d: each panel plots candidate MFs μ_B(x) and μ_A(x) for questions 13-15.]
16. Given two fuzzy relations on the same product space, R(U, V) and S(U, V). Which of the following is the correct expression for the intersection of R(U, V) and S(U, V)?
a. μ_{R∩S}(x, y) = μ_R(x, y) ⋆ μ_S(x, y), ∀x ∈ U and y ∈ V
b. μ_{R∩S}(x, y) = max[μ_R(x, y), μ_S(x, y)], ∀x ∈ U and y ∈ V
c. μ_{R∩S}(x, y) = 1 − μ_R(x, y) ⋆ μ_S(x, y), ∀x ∈ U and y ∈ V
17. The intersection and union of two fuzzy relations on the same product space are called _________ of the fuzzy
relations.
a. Cartesian products
b. compositions
c. membership functions
18. A linguistic hedge is an:
a. even bet
b. operator that acts on a fuzzy set's MF, converting it into a crisp MF
c. operation that modifies the meaning of a term
19. The MF for very very unlikely is:
a. 1 − μ⁴_LIKELY(x)
b. [1 − μ_LIKELY(x)]²
c. [1 − μ_LIKELY(x)]⁴
22. Given μ_a(u, v) and

μ_b(v, w) = [ 0    0.2
              0.8  1
              0.3  0.4 ]

μ_{a∘b}(u2, w1) is:
a. μ_{a∘b}(u2, w1) = 0
b. μ_{a∘b}(u2, w1) = 0.3
c. μ_{a∘b}(u2, w1) = 0.5
23. The Extension Principle lets us:
a. compute the sup-star composition for continuous-valued fuzzy sets
b. extend mathematical relations between one-dimensional fuzzy variables to multi-dimensional fuzzy
variables
c. extend mathematical relationships between non-fuzzy variables to fuzzy variables
24. Given A = small = A (x) , in which one of the situations below would you use the Extension Principle?
a. Find the MF of C = not small = C (x)
b. Find the MF of B = very small = B (x)
c. Find the MF of D = (very small)³
25. Which of the Extension Principles stated below is the correct one to use for a one-to-many multi-variable mapping?
a. μ_B(y) = max_{x ∈ f⁻¹(y)} μ_A(x), y ∈ V
b. μ_B(y) = sup_{(x1, x2) ∈ f⁻¹(y)} min{μ_A(x1), μ_A(x2)}, and μ_B(y) = 0 if f⁻¹(y) = ∅
c. B = f(A) = f( Σ_{x∈U} μ_A(x)/x ) = μ_A(x1)/y1 + μ_A(x2)/y2 + ⋯ + μ_A(xN)/yN = B(y)
32. When the antecedent MF is Gaussian, μ_A(x) = exp[−(1/2)((x − m_A)/σ_A)²], the input is modeled as a Gaussian fuzzy number, μ_X(x) = exp[−(1/2)((x − x′)/σ_X)²], and product implication and t-norm are used, then:
a. sup_{x∈X}[μ_X(x) μ_A(x)] occurs at x = x_max = (σ_A² m_A + σ_X² x′)/(σ_A² + σ_X²)
b. sup_{x∈X}[μ_X(x) μ_A(x)] occurs at x = x_max = (m_A + x′) σ_A²/(σ_A² + σ_X²)
c. sup_{x∈X}[μ_X(x) μ_A(x)] occurs at x = x_max = (σ_X² m_A + σ_A² x′)/(σ_X² + σ_A²)
b. sum of all antecedents and consequents multiplied by the number of rules
c. number of rules
51. FBFs are:
a. orthogonal
b. coupled
c. uncoupled
52. The fact that a FLS is a universal approximator:
a. helps to explain why a FLS is so successful in engineering applications
b. tells us exactly how to design a FLS
c. means that a FLS can uniformly approximate any real discontinuous non-linear function to arbitrary degree of
accuracy
53. Rule explosion:
a. refers to rules that are too hot to handle
b. is no problem in a FLS
c. refers to rapid growth in the maximum number of rules that may be required in a FLS.
c. can be computed in closed form for all MFs
68. An informative way to interpret a non-singleton type-1 FLS is as a:
a. pre-filter of the input x that transforms x into x lmax (l = 1, , M), after which the remaining operations of the
FLS are the same as those of a singleton FLS
b. pre-filter of the input x that transforms x into x lmax (l = 1, , M), after which the remaining operations of the
FLS are different from those of a singleton FLS
c. an inference mechanism followed by post-filtering
69. The firing level for a non-singleton type-1 FLS is:
a. the same as the firing level of a singleton type-1 FLS
b. smaller than the firing level of a singleton type-1 FLS
c. different than the firing level of a singleton type-1 FLS
70. The new design degrees of freedom for a non-singleton type-1 FLS, as compared to those in a singleton type-1
FLS, are associated with parameters in the:
a. consequent MF
b. antecedent MF
c. input MF
71. The rules of a non-singleton type-1 FLS are __________ the rules of a singleton type-1 FLS:
a. different from
b. the same as
c. fewer than
72. In general, the totally independent design approach should provide ____________ performance than the
partially independent design approach:
a. better
b. the same
c. worse
73. Which design method can be used to optimize all the parameters of a non-singleton type-1 FLS?
a. one-pass
b. least-squares
c. back-propagation
74. Although a non-singleton type-1 FLS can model uncertain measurements, this is usually insufficient to achieve
significantly improved performance over a singleton type-1 FLS because the:
a. rules of the two kinds of FLSs are the same
b. uncertainty contained in noisy training data is accounted for by the antecedent MFs of a type-1 FLS
c. uncertainty contained in noisy training data cannot be accounted for by the antecedent and consequent MFs of
a type-1 FLS
b. the consequent function of a type-1 TSK FLS is a function and not a fuzzy set
c. the consequent function of a type-1 TSK FLS is a fuzzy set and not a function
d. the output formula for a type-1 TSK FLS is obtained using a sup-star composition
e. the output formula for a type-1 TSK FLS is obtained by combining its rules in a prescribed way and does not
derive from the sup-star composition
77. Under which condition is the normalized type-1 TSK FLS exactly the same as a type-1 Mamdani FLS?
a. the consequent function in a TSK rule is a linear function of the antecedent variables
b. the consequent function in a TSK rule is a quadratic function of the antecedent variables
c. the consequent function in a TSK rule is a constant
78. A TSK FLS that uses the same number of rules as a Mamdani FLS always has _______ design degrees of
freedom than a Mamdani FLS:
a. fewer
b. the same number
c. more
79. When all of the antecedent parameters of a type-1 TSK FLS are pre-specified, which design method can be used
to design the consequent parameters?
a. least-squares
b. one-pass
c. back-propagation
80. A TSK FLS can outperform a Mamdani FLS because it:
a. has more design degrees of freedom when both use the same number of rules
b. is a universal approximator
c. does not require the use of the sup-star composition
81. Within the general class of time-series forecasting problems, the problem of forecasting compressed video falls
into which category?
a. deterministic-signal and noisy-measurement case
b. random-signal and noisy-measurement case
c. random-signal and perfect-measurement case
83a. The overall approach to designing a RBC involves four steps performed in a correct order, taken from the
following candidate steps:
1.a Establish rules using the features
1.b Evaluate the performance of the optimized RBC using testing
1.c Choose appropriate features that act as the antecedents in a RBC
1.d Optimize the rule design-parameters using a tuning procedure
1.e Cluster the features in feature space
Which of the following is the correctly ordered four steps?
a. 1.a, 1.b, 1.c, 1.d
b. 1.a, 1.e, 1.c, 1.b
c. 1.e, 1.d, 1.c, 1.b
d. 1.c, 1.a, 1.d, 1.b
e. 1.c, 1.e, 1.a, 1.d
84a. The rules for a RBC of compressed video traffic have:
a. Two antecedents and one consequent
b. Three antecedents and two consequents
c. Three antecedents and one consequent
85a. The consequent of a RB FLC for classification of video traffic is:
a. a fuzzy set
b. a linear function of its antecedent variables
c. 1
86a. Suppose that a FL RBC gives the following results for 600 testing elements: 370 movies are correctly
classified, 210 sports programs are correctly classified, 12 movies are mis-classified as sports programs, and 8
sports programs are mis-classified as movies. How many false alarms are there?
a. 12
b. 8
c. 20
85b. A channel of order 6 that is equalized by a transversal equalizer of order 4 has how many states?
a. 2⁴
b. 2²⁴
c. 2¹⁰
86b. For a binary input sequence, equalization is equivalent to:
a. The output of an unnormalized TSK FLS
b. The centroid defuzzified output of a Mamdani FLS
c. Two category classification
86c. In indirect adaptive fuzzy control, fuzzy systems are used to model what kinds of plant non-linearities?
a. known
b. unknown
c. discontinuous
d. continuous
26. A, D
27. A
28. A
29. C
30. A
31. B
32. C
51. B
52. A
53. C
54. B
55. C
56. B
57. C
8. A
9. C
10. B
11. B
12. A
13. B
14. A, B, D, F
15. B
16. A
17. B
18. C
19. C
20. C
21. A, B, E
22. B
23. C
24. C
25. B
33. B
34. A, C
35. C
36. B
37. C
38. B
39. A
40. A, C, D, E
41. C
42. B
43. B
44. A
45. B
46. C
47. A
48. B
49. B
50. C
58. A
59. C
60. A
61. B
62. B
63. A
64. C
65. A
66. C
67. B
68. A
69. C
70. C
71. B
72. A
73. C
74. C
75. B
76. B, E
77. C
78. C
79. A
80. A
81. C
82a. B
82b. A
83a. D
83b. B
84a. C
84b. B
85a. C
85b. C
86a. C
86b. C
87. D
88. B
89. A
90. C
91. C
92. B
93. C
94. B
95. A
96. C
97. A, D, E, G
98. C
99. A
100. B
82c. A
83c. C
84c. A, D, F
85c. C
86c. A