Course 395: Machine Learning – Lecture 7-8: Instance Based Learning (M. Pantic)


Course 395: Machine Learning – Lectures

• Lecture 1-2: Concept Learning (M. Pantic)

• Lecture 3-4: Decision Trees & CBC Intro (M. Pantic)

• Lecture 5-6: Artificial Neural Networks (S. Zafeiriou)


• Lecture 7-8: Instance Based Learning (M. Pantic)

• Lecture 9-10: Genetic Algorithms (M. Pantic)

• Lecture 11-12: Evaluating Hypotheses (THs)

• Lecture 13-14: Bayesian Learning (S. Zafeiriou)

• Lecture 15-16: Dynamic Bayesian Networks (S. Zafeiriou)

• Lecture 17-18: Inductive Logic Programming (S. Muggleton)


Maja Pantic Machine Learning (course 395)
Instance Based Learning – Lecture Overview

• Lazy learning

• K-Nearest Neighbour learning

• Locally weighted regression

• Case-based reasoning (CBR)

• Advantages and disadvantages of lazy learning

• (Example: CBR-based system for facial expression interpretation)

Maja Pantic Machine Learning (course 395)


Eager vs. Lazy Learning

• Eager learning methods construct a general, explicit description of the target function
based on the provided training examples.

(≡ one-fits-all ≡ input independent)

Maja Pantic Machine Learning (course 395)


Eager vs. Lazy Learning

• Eager learning methods construct a general, explicit description of the target function
based on the provided training examples.

• Lazy learning methods simply store the data; generalizing beyond these data is
postponed until an explicit request is made.

[Diagram: problem space → solution space]
1. Search the memory for similar instances
2. Retrieve the related solutions
3. Adapt the solutions to the current instance
4. Assign the value of the target function estimated for the current instance

Maja Pantic Machine Learning (course 395)




Eager vs. Lazy Learning

• Eager learning methods construct a general, explicit description of the target function
based on the provided training examples.

• Lazy learning methods simply store the data; generalizing beyond these data is
postponed until an explicit request is made.

• Lazy learning methods can construct a different approximation to the target function
for each encountered query instance.

• Eager learning methods use the same approximation to the target function, which
must be learned based on training examples and before input queries are observed.

• Lazy learning is very suitable for complex and incomplete problem domains, where a
complex target function can be represented by a collection of less complex local
approximations.

Maja Pantic Machine Learning (course 395)


k-Nearest Neighbour Learning

The main idea behind k-NN learning is so-called majority voting.

Maja Pantic Machine Learning (course 395)




k-Nearest Neighbour Learning

• Given the target function V: X → C and a set of n already observed instances (xi, cj),
where xi ∈ X, i = [1..n], cj ∈ C, j = [1..m], V(xi) = cj, the k-NN algorithm will decide the
class of the new query instance xq based on its k nearest neighbours (previously
observed instances) xr, r = [1..k], in the following way:
V(xq) ← cl ∈ C ↔ (∀ j ≠ l) ∑r E(cl, V(xr)) > ∑r E(cj, V(xr)) where
E(a, b) = 1 if a = b and E(a, b) = 0 if a ≠ b
(see the code sketch below)

• The nearest neighbours of a query instance xq are usually defined in terms of standard
Euclidean distance:

de (xi, xq) = √{∑g (ag(xi) – ag(xq))²}

where the instances xi, xq ∈ X are described with a set of g = [1..p] arguments ag

Maja Pantic Machine Learning (course 395)
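A minimal Python sketch of the majority-voting k-NN classifier defined above; the function names (euclidean, knn_classify) and the toy data are illustrative assumptions, not part of the lecture:

from collections import Counter
from math import sqrt

def euclidean(a, b):
    # d(x_i, x_q) = sqrt( sum_g (a_g(x_i) - a_g(x_q))^2 )
    return sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_classify(query, examples, k=3):
    """Majority vote among the k training examples nearest to `query`.
    `examples` is a list of (attribute_vector, class_label) pairs."""
    neighbours = sorted(examples, key=lambda xc: euclidean(xc[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# toy usage: two classes in a 2-D attribute space
examples = [((1.0, 1.0), 'A'), ((1.2, 0.8), 'A'), ((4.0, 4.2), 'B'), ((3.8, 4.0), 'B')]
print(knn_classify((1.1, 1.0), examples, k=3))   # -> 'A'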


k-Nearest Neighbour Learning
• Distance between two instances xi, xq ∈ X, described with a set of g = [1..p]
arguments ag, can be calculated as:

– City-block (Manhattan) distance (L1-norm):
d1 (xi, xq) = ∑g |ag(xi) – ag(xq)|

– Euclidean distance (L2-norm):
d2 (xi, xq) = √{∑g (ag(xi) – ag(xq))²}

– Chebyshev distance (L∞-norm):
d∞ (xi, xq) = maxg |ag(xi) – ag(xq)|

(see the code sketch below)

Maja Pantic Machine Learning (course 395)
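The three distance measures written out as small Python functions (a sketch; the function names are mine):

def manhattan(a, b):   # L1-norm: sum of absolute coordinate differences
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

def euclidean(a, b):   # L2-norm: square root of the sum of squared differences
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

def chebyshev(a, b):   # L∞-norm: largest single coordinate difference
    return max(abs(ai - bi) for ai, bi in zip(a, b))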


k-Nearest Neighbour Learning
• For k = 1, the decision surface is a set of polygons (Voronoi diagram),
completely defined by previously observed instances (training examples).

Maja Pantic Machine Learning (course 395)


k-Nearest Neighbour Learning

• The nearest neighbours (previously observed instances) xr, r = [1..k], of a query
instance xq are defined based on a distance d(xr, xq) such as the Euclidean distance.

• A refinement of the k-NN algorithm: assign a weight wr to each neighbour xr of the
query instance xq based on the distance d(xr, xq), such that d(xr, xq)↓ ↔ wr↑.

• Distance-weighted k-NN algorithm: Given the target function V: X → C and a set of
n already observed instances (xi, cj), where xi ∈ X, i = [1..n], cj ∈ C, j = [1..m], V(xi)
= cj, the distance-weighted k-NN algorithm will decide the class of the query instance xq
based on its k nearest neighbours xr, r = [1..k], in the following way:
V(xq) ← cl ∈ C ↔ (∀ j ≠ l) ∑r wr · E(cl, V(xr)) > ∑r wr · E(cj, V(xr)) where
E(a, b) = 1 if a = b, E(a, b) = 0 if a ≠ b, and
wr = 1 / d(xr, xq)²
(Any other measure favouring the votes of nearby neighbours will do, e.g. a Gaussian
kernel; see the code sketch below.)

Maja Pantic Machine Learning (course 395)
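A sketch of the distance-weighted variant, assuming instances are (attribute_vector, class_label) pairs and using wr = 1 / d(xr, xq)², with an exact match short-circuited to avoid division by zero:

from collections import defaultdict

def euclidean(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

def weighted_knn_classify(query, examples, k=5, distance=euclidean):
    # examples: list of (attribute_vector, class_label) pairs
    neighbours = sorted(examples, key=lambda xc: distance(xc[0], query))[:k]
    votes = defaultdict(float)
    for attrs, label in neighbours:
        d = distance(attrs, query)
        if d == 0.0:                  # query coincides with a stored instance
            return label
        votes[label] += 1.0 / d ** 2  # w_r = 1 / d(x_r, x_q)^2
    return max(votes, key=votes.get)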


k-Nearest Neighbour Learning: Remarks

• With the distance-weighted k-NN algorithm, the value of k is of minor importance, as
distant examples will have very small weights and will not greatly affect the value of
V(xq).

• If k = n, where n is the total number of previously observed instances, we call the
algorithm a global method. Otherwise, if k < n, the algorithm is called a local method.

• Advantage – The distance-weighted k-NN algorithm is robust to noisy training data: it
calculates V(xq) based on the weighted V(xr) values of all k nearest neighbours xr,
effectively smoothing out the impact of isolated noisy training data.

• Disadvantage – All k-NN algorithms calculate the distance between instances based
on all attributes → if there are many irrelevant attributes, instances that belong
together may still be distant from one another.

• Remedy – Weight each attribute differently when calculating the distance between two
instances (see the sketch below).

Maja Pantic Machine Learning (course 395)
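The remedy above could be implemented by scaling each attribute inside the distance; a minimal sketch, assuming the per-attribute weights are supplied (e.g. chosen by cross-validation or domain knowledge):

def weighted_euclidean(a, b, attr_weights):
    # attributes with weight 0 are effectively ignored in the distance
    return sum(w * (ai - bi) ** 2 for w, ai, bi in zip(attr_weights, a, b)) ** 0.5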


Locally Weighted Regression

• Locally weighted regression is the most general form of k-NN learning.
It constructs an explicit approximation to the target function V that fits the training
examples in the local neighbourhood of the query instance xq.

• Local – V is approximated based only on the data (neighbours) near xq.
Weighted – the contribution of each datum is weighted by its distance from xq.
Regression – refers to the problem of approximating a real-valued target function.

• Locally weighted regression:
target function: V: X → C (here real-valued),
target function approximation near xq: V’(xq) = w0 + w1 a1(xq) + … + wn an(xq),
where xq ∈ X is described with a set of j = [1..n] arguments aj
training examples: set of k nearest neighbours xr, r = [1..k], of the query instance xq,
learning problem: learn the optimal weights w given the set of training examples
learning algorithm (distance-weighted gradient descent training rule):
∆wj = η · {∑r K(d(xr, xq)) · (V(xr) – V’(xr)) · aj(xr)}
where K(d(xr, xq)) is a kernel function of the distance that determines the weight of xr
(see the sketch below)
Maja Pantic Machine Learning (course 395)
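A small sketch of locally weighted regression following the slide's update rule, applied incrementally per neighbour rather than as a batch sum; the Gaussian kernel, learning rate and epoch count are illustrative assumptions:

import math

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def gaussian_kernel(d, width=1.0):
    # K(d): closer neighbours get exponentially larger weight
    return math.exp(-(d ** 2) / (2 * width ** 2))

def locally_weighted_fit(query, neighbours, eta=0.01, epochs=200):
    """Fit V'(x) = w0 + w1*a1(x) + ... + wn*an(x) on the neighbours of `query`
    and return the prediction V'(query).
    `neighbours` is a list of (attribute_vector, target_value) pairs."""
    n = len(query)
    w = [0.0] * (n + 1)  # w[0] is the bias term w0

    def predict(x):
        return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))

    for _ in range(epochs):
        for x, v in neighbours:
            k_d = gaussian_kernel(euclidean(x, query))  # K(d(x_r, x_q))
            err = v - predict(x)                        # V(x_r) - V'(x_r)
            w[0] += eta * k_d * err                     # bias update, a_0(x) = 1
            for j in range(n):
                w[j + 1] += eta * k_d * err * x[j]      # delta-w_j rule from the slide
    return predict(query)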


Case Based Reasoning (CBR) – Schank’s Theory

• The work of Roger Schank, inspired by findings in cognitive sciences on human


reasoning and memory organization, is held to be the origin of CBR.

• Human knowledge about the world is organized in memory packets holding similar
concepts and/or episodes that one experienced.

• If a memory packet contains a situation when a problem was successfully solved and
the person experiences a similar situation, the previous experience is recollected and
the same steps are followed to reach a solution.

• Rather than following a general set of rules, reapplying previously successful solution
schemes in a new but similar context solves the newly encountered problems.

(following a general set of rules ≡ general approximation to the target function;
reapplying previous solution schemes ≡ local approximation to the target function)

Maja Pantic Machine Learning (course 395)


Case Based Reasoning (CBR) – Schank’s Theory


• Lazy learning is much closer to the human reasoning model than eager learning is.

Maja Pantic Machine Learning (course 395)


Case Based Reasoning (CBR) – Schank’s Theory

Schank’s memory-based reasoning model (based on similarity of cases):

– The memory of experiences is derived from enumeration of the observed cases,
which are stored further in memory organization packets.

– If problems occur to which no specific case can match exactly, reason from more
general similarities to come up with solutions (exact match → 1-NN; otherwise → k-NN,
with a distance measure defining similarity).
Note: the retrieval is almost never full breadth (exhaustive).

– The basis of the memory-based model is automatic (online) learning: the memory of
experiences is augmented by each novel experience (case), i.e., the process of
learning never ceases.
(This is the opposite of offline learning, typical for eager learning methods, where the
process of learning ceases when the training is completed.)

Maja Pantic Machine Learning (course 395)


Case Based Reasoning (CBR)

• Schank’s memory-based reasoning model is the underlying reasoning model of CBR.

• CBR is reasoning by remembering: previously solved cases are used to suggest
solutions for novel but similar problems.

[Diagram: problem space → solution space]
1. Search the memory for similar instances
2. Retrieve the related solutions (1- / k-NN)
3. Adapt the solutions to the current instance
4. Store the new case in the memory of experiences

Maja Pantic Machine Learning (course 395)




Case Based Reasoning (CBR) – Working Cycle

CBR working cycle:


1. RETRIEVE the most similar case(s).
2. REUSE the case(s) to suggest the
solution for the current case.
3. REVISE the suggested solution.
4. RETAIN the case by storing it in the
memory of experiences.

[Diagram: the four steps cycling around the case base; see the sketch below]

Maja Pantic Machine Learning (course 395)
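A bare-bones sketch of the RETRIEVE–REUSE–REVISE–RETAIN cycle as a Python class; the class and method names are mine, and cases are assumed to be (problem, solution) pairs retrieved with 1-NN:

class CaseBase:
    def __init__(self, distance):
        self.cases = []            # memory of experiences: (problem, solution) pairs
        self.distance = distance   # similarity is defined via a distance measure

    def retrieve(self, problem):                          # 1. RETRIEVE
        return min(self.cases, key=lambda c: self.distance(c[0], problem))

    def retain(self, problem, solution):                  # 4. RETAIN
        self.cases.append((problem, solution))

    def solve(self, problem, adapt=lambda sol, prob: sol, revise=lambda sol: sol):
        _, solution = self.retrieve(problem)
        solution = adapt(solution, problem)                # 2. REUSE (with adaptation)
        solution = revise(solution)                        # 3. REVISE
        self.retain(problem, solution)                     # 4. RETAIN
        return solution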


Case Based Reasoning (CBR) – System Design

• How should the cases be represented?

• How should the case base be organised?

• How should the indexing (assigning indexes to cases to facilitate their retrieval) be
defined?

• Which retrieval algorithm is to be used?

• Which (case base) adaptation algorithm is to be used?

Maja Pantic Machine Learning (course 395)


Case Based Reasoning (CBR) – Cases

• Cases contain knowledge about previous experiences (solved problems).


• A case is typically composed of the problem description and the problem solution.
• The classic guideline ‘the more information it stores, the more useful the case is’
should be applied cautiously.
• Problem description should contain enough data for an accurate and efficient case
retrieval. Useful info: retrieval statistics.
• Problem solution can be either atomic (e.g., an action) or compound (e.g., a sequence
of actions).
• Cases can be either monolithic (e.g., observation → action) or compound (e.g., a set
of observations → a sequence of actions; Note: parts can be processed separately).
• Cases can be represented in various ways: feature vectors, semantic nets, objects,
frames, rules...
Cases should be such that an accurate and efficient retrieval is facilitated.

Maja Pantic Machine Learning (course 395)


Case Based Reasoning (CBR) – Organisation

• Flat Case Base Organisation


The simplest case base organisation without any specific structure.
Case retrieval is based on case-by-case search.

• Clustered Case Base Organisation


Cases are stored in clusters of similar cases (as originally proposed by Schank).
Case retrieval includes finding the appropriate cluster(s) and searching through
it for similar cases.
Case addition / deletion algorithms are more complex than with flat organisation.

• Hierarchical Case Base Organisation


Cases that share features are grouped together.
A semantic network containing interlinked features and categories is used.
Cases are associated with categories.
Case retrieval is feature based. It is fast and accurate.
Reorganisation of the case base may be very complex and difficult.

Maja Pantic Machine Learning (course 395)


Case Based Reasoning (CBR) – Indexing

• Case indexing: assigning indexes to cases to facilitate efficient and accurate retrieval
of cases from the case base.

• Indexes are defined in terms of features / attributes of cases.

• Indexes should be:
– predictive of the case relevance (i.e., based on the most informative features)
– recognisable – it should be clear why they are used
– abstract enough to allow for widening of the case base
– discriminative enough to facilitate efficient and accurate case retrieval

(There is a trade-off between the generality and specificity of the hypotheses
(set of features) to be used for indexing.)

Maja Pantic Machine Learning (course 395)


Case Based Reasoning (CBR) – Retrieval

• The retrieval algorithm should retrieve the case(s) most similar to the currently
presented problem / situation.

• 1-NN (k-NN) search
A case-by-case search. Search is accurate but highly time consuming.

• 1-NN (k-NN) search through preselected cases
Uses the indexing structure of the case base to preselect the cases.
Then applies 1-NN or k-NN search. Faster than a simple case-by-case search,
and preferred as it results in faster retrieval and more accurate solutions.
It can happen that the best match is not among the preselected cases.

• 1-NN (k-NN) search through (preselected and) ranked cases
Uses the retrieval statistics to rank the cases.
Then applies the 1-NN or k-NN search (through preselected cases). Search is
faster than in the above-mentioned cases but not necessarily more accurate.

• A good retrieval algorithm strikes the best compromise between accuracy and
efficiency (see the sketch below).

Maja Pantic Machine Learning (course 395)
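A sketch of the second retrieval strategy (preselection through the indexing structure, then 1-NN among the preselected cases); the data layout, a feature-to-case index over (feature_set, solution) pairs, is an assumption for illustration:

def retrieve_with_index(query_features, index, cases, distance):
    """`index` maps a feature value to the indices of cases containing it;
    `cases` is a list of (feature_set, solution) pairs."""
    preselected = set()
    for f in query_features:
        preselected.update(index.get(f, ()))       # preselect via the indexing structure
    candidates = preselected if preselected else range(len(cases))  # fall back to full search
    best = min(candidates, key=lambda i: distance(cases[i][0], query_features))
    return cases[best]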


Case Based Reasoning (CBR) – Adaptation
• Adaptation algorithm adapts the solutions associated with the retrieved cases to the
currently presented problem / situation.

• Structural Adaptation
Applies a set of adaptation rules directly to the retrieved solutions.
Adaptation rules can include, e.g., modifying certain attributes through
interpolating between relevant attributes of the retrieved cases (see the sketch below).

• Derivational Adaptation
Uses the algorithms / rules that have been used to generate the original solution.
Can be used only for problem domains that are completely transparent.
Not used very often.

• Manual (User-driven) Adaptation
If no exact match is found, asks the user for feedback and adapts the solutions
accordingly. Faulty adaptations are thereby avoided. Used very often.

Maja Pantic Machine Learning (course 395)
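As an illustration of a structural adaptation rule, the sketch below interpolates a numeric solution attribute across the retrieved cases, weighting each case by its similarity to the query; the rule itself is hypothetical:

def interpolate_attribute(query, retrieved, attr, distance):
    """Weighted average of the numeric solution attribute `attr` over the
    retrieved cases; `retrieved` is a list of (problem, solution_dict) pairs."""
    weights = [1.0 / (1e-9 + distance(problem, query)) for problem, _ in retrieved]
    values = [solution[attr] for _, solution in retrieved]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)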


Lazy Learning – Advantages

• Incremental (online) learning: The problem-solving ability is increased with each
newly presented case.

• Suitability for complex and incomplete problem domains: A complex target function
can be described as a collection of less complex local approximations and unknown
classes can be learned.

• Suitability for simultaneous application to multiple problems: Examples are simply
stored and can be used for multiple problem-solving purposes.

• Ease of maintenance: A lazy learner adapts automatically to changes in the problem
domain.

Maja Pantic Machine Learning (course 395)


Lazy Learning – Disadvantages

• Handling very large problem domains: This implies high memory / storage
requirements and time-consuming search for similar examples.

• Handling highly dynamic problem domains: In CBR, this involves continuous
reorganisation of the case base, which may introduce errors in the case base.
Overall, the set of previously encountered examples may become outdated if a sudden
large shift in the problem domain occurs.

• Handling overly noisy data: Such data may result in storing the same problem numerous
times because of the differences in cases due to noise. In turn, this implies high
memory / storage requirements and a time-consuming search for similar examples.

• Achieving fully automatic operation: Fully automatic operation of a lazy learner can be
expected only for complete problem domains. Otherwise, user feedback is needed for
situations for which the learner has no solution.

Maja Pantic Machine Learning (course 395)


Instance Based Learning – Exam Questions

• Tom Mitchell’s book – chapter 8

• Relevant exercises from chapter 8: 8.3

• Case-Based Reasoning Syllabus

• To prepare assignment 3 of the CBC read:


Pantic & Rothkrantz (2004):
“CBR for user-profiled recognition of emotions from face images”

Maja Pantic Machine Learning (course 395)




Automatic Facial Expression Analysis

[Images of prototypic facial expressions: Anger, Surprise, Sadness, Disgust, Fear, Happiness]

Maja Pantic Machine Learning (course 395)




User-profiled Facial Expression Interpretation

“Could you please display an angry expression?”

“How would you interpret this? Happy? Angry? Teasing?”

Maja Pantic Machine Learning (course 395)


User-profiled Facial Expression Interpretation

cheeks raised (AU6)


smile (AU12)
lips parted (AU25)

Happy

AU6+AU12+AU25 = Happy
Maja Pantic Machine Learning (course 395)
Case Base Initialisation

AUs – Case explanation
1 – raised inner eyebrow
2 – raised outer eyebrow
1+2 – from “surprise”
4 – furrowed eyebrows
5 – raised upper eyelid
7 – raised lower eyelid
1+4+5+7 – from “fear”
…
1+4 – from “sadness”
…
9 – wrinkled nose
9+17 – from “disgust”
…
6+12 – from “happiness”
…

Maja Pantic Machine Learning (course 395)


Case Base Initialisation

AUs – User’s interpretation
1 – disappointed
2 – angry
1+2 – surprised
4 – angry
5 – please don’t
7 – angry
1+4+5+7 – please don’t
…
1+4 – disappointed
…
9 – slimy (yak!)
9+17 – slimy (yak!)
…
6+13 – happy
…

Maja Pantic Machine Learning (course 395)


Case Base Organisation

AUs – User’s interpretation
1 – disappointed
2 – angry
1+2 – surprised
4 – angry
5 – please don’t
7 – angry
1+4+5+7 – please don’t
…
1+4 – disappointed
…
9 – slimy (yak!)
9+17 – slimy (yak!)
…
6+13 – happy
…

Clusters:
label ‹angry›
cases ‹(2,4,0); (4,0); (7,0); … ; (24,0); (24,17,0)›
index ‹4, 7, 24, …›

Maja Pantic Machine Learning (course 395)


Retrieval

Clusters:
label ‹angry›
cases ‹(2,4,0); (4,0); (7,0); … ; (24,0); (24,17,0)›
index ‹4, 7, 24, …›
(see the sketch below)

Maja Pantic Machine Learning (course 395)
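A sketch of how such a clustered case base could be queried: the AU index preselects clusters, and the stored AU combination sharing the most AUs with the observation supplies the interpretation. The data structures loosely mirror the ‹label / cases / index› clusters on the slide (the trailing 0 of the slide notation is omitted); the helper names are mine:

clusters = {
    'angry': {'cases': [(2, 4), (4,), (7,), (24,), (24, 17)],   # stored AU combinations
              'index': {4, 7, 24}},
    # ... one entry per interpretation label
}

def interpret(observed_aus, clusters):
    observed = set(observed_aus)
    best_label, best_overlap = None, -1
    for label, cluster in clusters.items():
        if not (observed & cluster['index']):        # preselect clusters via the AU index
            continue
        for case in cluster['cases']:
            overlap = len(observed & set(case))      # simple similarity: shared AUs
            if overlap > best_overlap:
                best_label, best_overlap = label, overlap
    return best_label

print(interpret([4, 7], clusters))   # -> 'angry'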


Adaptation

User-profiled AU interpretation
[Diagram: problem space → solution space]
1. Search the Case Base for similar cases, retrieve them, and interpret the input set of
AUs using the interpretation labels suggested by the retrieved cases.
2. If the user is satisfied with the output, store the new case in the Case Base. Otherwise,
adapt the Case Base (i.e., store the new interpretation that the user associates with
the input facial expression).

Pantic and Rothkrantz, Proc. IEEE ICME’04


Maja Pantic Machine Learning (course 395)
