Unsupervised Learning Networks
Biological background:
Neurons are wired topographically: nearby neurons connect to nearby neurons. In the visual cortex, neurons are organized in functional columns.
How do these neurons organise themselves? They receive no 'supervision'.
Ocular dominance columns: one region responds to input from one eye
Orientation columns: one region responds to one orientation
UNSUPERVISED LEARNING
No help from the outside:
• Training samples contain only input patterns
• No desired output is given (teacher-less)
Learning by doing:
• Learn to form classes/clusters of sample patterns according to similarities among them
• Patterns in a cluster have similar features
• No prior knowledge of which features are important for classification, or of how many classes there are
Used to pick out structure in the input:
• Clustering
• Dimensionality reduction / compression
Example: Kohonen's learning law (see the update rule below).
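Kohonen's learning law updates only the winning unit, moving its weight vector toward the current input:
Δw_winner = η (x − w_winner), where η is the learning rate.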
A FEW UNSUPERVISED LEARNING NETWORKS
• NN models to be covered
– Competitive networks and competitive learning
• Winner-takes-all (WTA)
• Maxnet
• Hamming net
– Counterpropagation nets
– Adaptive Resonance Theory (ART)
– Self-organizing map (SOM)
• Applications
– Clustering
– Vector quantization
– Feature extraction
– Dimensionality reduction
– Optimization
COMPETITIVE LEARNING
Output units compete, so that eventually only one neuron (the one with the most input) is active in response to each input pattern.
A competitive network, when presented repeatedly with patterns from the same selection of inputs, will tend to stabilize so that its neurons become representatives of clusters of similar inputs.
Measures of similarity or closeness
(opposite: dissimilarity or distance)
• Suppose x is an input vector and wᵢ is the weight vector of the ith neuron.
• One measure of distance is the Euclidean distance:
‖x − wᵢ‖ = √( Σⱼ (xⱼ − wᵢⱼ)² ) = √( (x − wᵢ)⋅(x − wᵢ)ᵀ )
(a vector inner product)
• Another measure of distance, used when the values are integers, is the "Manhattan", "city-block", or "taxicab" distance:
‖x − wᵢ‖₁ = Σⱼ | xⱼ − wᵢⱼ |
• Another measure, used when the values are 2-valued, is the Hamming distance: the number of components in which xⱼ and wᵢⱼ differ,
Σⱼ [ xⱼ ≠ wᵢⱼ ]
• Example, for two 4-component vectors that differ by 2 in their second and fourth components (computed in the sketch below):
– Discrete metric = 1 (the vectors are not identical)
– Manhattan distance = 0 + 2 + 0 + 2 = 4
– Hamming distance = 0 + 1 + 0 + 1 = 2
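A minimal sketch of these measures in Python (the vectors below are illustrative, chosen to reproduce the distances above):

import numpy as np

x = np.array([1, 3, 5, 7])   # hypothetical input vector
w = np.array([1, 5, 5, 9])   # hypothetical weight vector, differing by 2 in two components

euclidean = np.sqrt(np.sum((x - w) ** 2))   # sqrt(0 + 4 + 0 + 4) ≈ 2.83
manhattan = np.sum(np.abs(x - w))           # 0 + 2 + 0 + 2 = 4
hamming   = np.sum(x != w)                  # number of components that differ: 2
discrete  = int(not np.array_equal(x, w))   # 1 if the vectors differ at all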
Competitive learning
– Finite resources: outputs 'compete' to see which will win, via inhibitory connections between them
– The aim is to automatically discover statistically salient features of the pattern vectors in the training data set: feature detectors
– Can find clusters in the training-data pattern space, which can be used to classify new patterns
– Basic structure: an input layer fully connected to a layer of competing output units (see the sketch below)
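A minimal winner-takes-all competitive learning sketch (the two-cluster data, unit count, learning rate, and epoch count are illustrative assumptions, not from the original slides):

import numpy as np

rng = np.random.default_rng(0)
# Two loose clusters around (3, 3) and (-3, -3) as toy training data
X = rng.normal(size=(200, 2)) + rng.choice([-3.0, 3.0], size=(200, 1))
W = rng.normal(size=(2, 2))    # one weight (prototype) vector per output unit
eta = 0.05                     # learning rate

for epoch in range(20):
    for x in X:
        winner = np.argmin(np.linalg.norm(W - x, axis=1))  # competition: closest unit wins
        W[winner] += eta * (x - W[winner])                 # Kohonen update toward the input

# Each row of W ends up near one cluster centre; new patterns are
# classified by which prototype they are closest to.
print(np.round(W, 2))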
Fixed-weight Competitive Nets – MAXNET
– Notes:
• Competition: an iterative process that continues until the net stabilizes (at most one node with positive activation)
• 0 < ε < 1/m, where m is the number of competitors
• ε too small: takes too long to converge
• ε too big: may suppress the entire network (no winner)
• Example: θ = 1, ε = 1/5 = 0.2 (see the sketch below)
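A minimal Maxnet iteration sketch for this example, assuming the standard fixed weights (self-excitation θ on the diagonal, mutual inhibition −ε everywhere else) and illustrative initial activations:

import numpy as np

def maxnet(a, eps=0.2, theta=1.0):
    """Iterate until at most one unit keeps a positive activation."""
    a = np.asarray(a, dtype=float)
    while np.sum(a > 0) > 1:
        total = a.sum()
        a = theta * a - eps * (total - a)  # each unit inhibits all the others
        a = np.maximum(a, 0.0)             # activation f(x) = max(x, 0)
    return a

print(maxnet([0.2, 0.4, 0.6, 0.8, 1.0]))   # eps = 0.2 as in the example; only the largest survives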
MEXICAN HAT FUNCTION OF LATERAL CONNECTION
Weights:
wᵢⱼ = c₁ if distance(i, j) < k   (c₁ > 0)
wᵢⱼ = c₂ if distance(i, j) = k   (0 < c₂ < c₁)
wᵢⱼ = c₃ if distance(i, j) > k   (c₃ ≤ 0)
Activation (ramp function):
f(x) = 0     if x < 0
f(x) = x     if 0 ≤ x ≤ max
f(x) = max   if x > max
MEXICAN HAT NETWORK
The lateral connections are used to create a competition between neurons. The neuron with the largest activation level among all neurons in the output layer becomes the winner. This neuron is the only neuron that produces an output signal. The activity of all other neurons is suppressed in the competition.
Mexican Hat Algorithm – Example for a simple net with seven units (sketched below)
• Equilibrium:
– negative input = positive input for all nodes
– the winner has the highest activation;
– its cooperative neighbors also have positive activations;
– its competitive neighbors have negative (or zero) activations.
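A minimal sketch of this seven-unit example (the weight values c₁ = 0.6, c₂ = 0.4, c₃ = −0.2, the radius k = 1, the ramp ceiling, and the initial external signal are illustrative assumptions):

import numpy as np

n, k = 7, 1
c1, c2, c3 = 0.6, 0.4, -0.2    # illustrative values with c1 > c2 > 0 >= c3
t_max = 2.0                    # ceiling ("max") of the ramp activation

# Lateral weights: c1 within distance < k, c2 at distance k, c3 beyond
W = np.empty((n, n))
for i in range(n):
    for j in range(n):
        d = abs(i - j)
        W[i, j] = c1 if d < k else (c2 if d == k else c3)

x = np.array([0.0, 0.5, 0.8, 1.0, 0.8, 0.5, 0.0])  # initial external signal
for t in range(3):
    x = np.clip(W @ x, 0.0, t_max)   # ramp: f(x) = min(max(x, 0), max)
    print(np.round(x, 3))            # activity sharpens around the winning unit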
HAMMING NETWORK
The Hamming network selects the stored class that is at a minimum Hamming distance (H) from the noisy vector presented at the input.
For bipolar vectors x and y of length n (components ±1):
xᵀ⋅y = Σᵢ xᵢ⋅yᵢ = a − d
where: a is the number of bits in agreement in x and y
d is the number of bits differing in x and y
d = n − a is the Hamming distance
xᵀ⋅y = a − (n − a) = 2a − n
a = 0.5(xᵀ⋅y + n)
−d = a − n = 0.5(xᵀ⋅y + n) − n = 0.5(xᵀ⋅y) − 0.5n
so the (negative) distance between x and y can be determined from xᵀ⋅y and n.
ARCHITECTURE OF HAMMING NET
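The standard architecture has two layers: a lower feedforward layer whose fixed weights store the exemplar vectors (weights 0.5⋅exemplar, bias n/2), so that each output equals a = 0.5(xᵀ⋅y + n), the number of agreeing bits; and an upper Maxnet layer that suppresses all units except the best match. A minimal sketch under that standard construction (the exemplars and the noisy input are illustrative):

import numpy as np

exemplars = np.array([[ 1, -1,  1, -1,  1, -1],    # hypothetical stored class 0
                      [ 1,  1,  1, -1, -1, -1]])   # hypothetical stored class 1
n = exemplars.shape[1]

def similarity_scores(x):
    """Lower layer: a = 0.5*(x^T.y + n) = number of bits agreeing with each exemplar."""
    return 0.5 * (exemplars @ x + n)

x_noisy = np.array([1, -1, 1, 1, 1, -1])   # class-0 exemplar with one bit flipped
scores = similarity_scores(x_noisy)        # -> [5. 3.], agreements per class
winner = np.argmax(scores)                 # the Maxnet layer converges to this unit
print(scores, "-> class", winner)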