Unsupervised Learning


UNSUPERVISED LEARNING NETWORKS
Biological background:
Neurons are wired topographically: nearby neurons connect to nearby neurons. In the visual cortex, neurons are organized in functional columns. How do these neurons organize themselves? They get no 'supervision'.
• Ocular dominance columns: one region responds to input from one eye
• Orientation columns: one region responds to one orientation
UNSUPERVISED LEARNING
 No help from the outside:
• Training samples contain only input patterns
• No desired output is given (teacher-less)
 Learning by doing:
• Learn to form classes/clusters of sample patterns according to similarities among them
• Patterns in a cluster have similar features
• No prior knowledge of which features are important for classification, or of how many classes there are
 Used to pick out structure in the input:
• Clustering
• Reduction of dimensionality → compression
 Example: Kohonen's learning law.

A FEW UNSUPERVISED LEARNING NETWORKS
• NN models to be covered
– Competitive networks and competitive learning
• Winner-takes-all (WTA)
• Maxnet
• Hamming net
– Counterpropagation nets
– Adaptive Resonance Theory
– Self-organizing map (SOM)

• Applications
– Clustering
– Vector quantization
– Feature extraction
– Dimensionality reduction
– Optimization

COMPETITIVE LEARNING
 Output units compete, so that eventually only one neuron (the one with the most input) is active in response to each input pattern.

 The total weight from the input layer to each output neuron is limited. If some connections are strengthened, others must be weakened.

 A consequence is that the winner is the output neuron whose weights best match the activation pattern.
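
This can be made concrete with a small sketch (not from the slides: the weights, input, and learning rate below are invented for illustration). The winner is the neuron whose weight vector lies closest to the input, and a Kohonen-style competitive update then moves only that winner toward the input:

```python
import numpy as np

# Hypothetical example: 3 output neurons with 4-dimensional weight vectors.
W = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.5, 0.5, 0.5, 0.5],
              [0.0, 0.0, 1.0, 1.0]])
x = np.array([0.4, 0.6, 0.5, 0.5])   # input pattern

# Winner = output neuron whose weights best match the input
# (smallest Euclidean distance to x).
winner = np.argmin(np.linalg.norm(W - x, axis=1))

# Competitive (Kohonen-style) update: only the winner learns,
# moving a little closer to the input it just won.
eta = 0.1                            # learning rate (assumed value)
W[winner] += eta * (x - W[winner])
print(winner, W[winner])             # neuron 1 wins and moves toward x
```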

COMPETITIVE LEARNING
• When presented repeatedly with patterns drawn from the same selection of inputs, the network tends to stabilize so that its neurons become representatives of clusters of similar inputs.
• Each neuron tends to become similar to the inputs in its cluster (like a chameleon, perhaps).
Measures of similarity or closeness
(opposite: dissimilarity or distance)
• Suppose x is an input vector and w_i the weight vector of the ith neuron.
• One measure of distance is the Euclidean distance:
  || x − w_i || = sqrt( Σ_j (x_j − w_ij)^2 ) = sqrt( (x − w_i)(x − w_i)^T )
  (a vector inner product)

• The discrete metric: d(x, w_i) = 0 if x = w_i, 1 otherwise

• Another measure of distance, used when the values are integers, is the "Manhattan", "city-block", or "taxi-cab" distance:
  || x − w_i ||_1 = Σ_j | x_j − w_ij |

• Another measure of distance, used when the values are 2-valued, is the "Hamming distance":
  H(x, w_i) = Σ_j d(x_j, w_ij), where d(x_j, w_ij) = 0 when the values are equal, 1 otherwise
Example for Different Metrics

• Suppose x = [1 1 −1 1], w = [1 −1 −1 −1]

• Euclidean distance = sqrt(0^2 + 2^2 + 0^2 + 2^2) = sqrt(8) ≈ 2.83

• Discrete metric = 1

• Manhattan distance = 0 + 2 + 0 + 2 = 4

• Hamming distance = 0 + 1 + 0 + 1 = 2
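
These four values can be checked with a short sketch (NumPy is used only for convenience):

```python
import numpy as np

x = np.array([1, 1, -1, 1])
w = np.array([1, -1, -1, -1])

euclidean = np.sqrt(np.sum((x - w) ** 2))     # sqrt(0 + 4 + 0 + 4) = 2.83...
discrete  = 0 if np.array_equal(x, w) else 1  # vectors differ, so 1
manhattan = np.sum(np.abs(x - w))             # 0 + 2 + 0 + 2 = 4
hamming   = np.sum(x != w)                    # number of differing components = 2
print(euclidean, discrete, manhattan, hamming)
```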

Competitive learning
-- Finite resources: outputs 'compete' to see which will win, via inhibitory connections between them
-- Aim is to automatically discover statistically salient features of the pattern vectors in the training data set: feature detectors
-- Can find clusters in the training-data pattern space, which can be used to classify new patterns
-- Basic structure:

[Figure: input layer fully connected to output layer, with lateral connections among the outputs]

• The input layer is fully connected to the output layer
• Input-to-output connections are feedforward
• The output layer compares the activations of its units, following presentation of a pattern vector x, via (sometimes virtual) inhibitory lateral connections
• The winner is selected based on the largest activation: winner-takes-all (WTA)
• Output units have linear or binary activation functions
• Very different from the previous (supervised) learning, where we paid attention to the input-output relationship; here we look at the pattern of connections (weights)
NN Based on Competition
• Competition is important for NN
– Competition between neurons has been observed in biological nervous systems
– Competition is important in solving many problems
• To classify an input pattern into one of m classes:
– Ideal case: one class node has output 1, all others 0
– Often, more than one class node has non-zero output
– If these class nodes compete with each other, eventually only one will win and all others will lose (winner-takes-all). The winner represents the computed classification of the input

[Figure: input nodes x_1 … x_n fully connected to classification nodes C_1 … C_m]
2-Way Competition
Why not make the winner exactly like the input?
• There may be many more distinct input patterns than neurons.
• By "averaging" its behavior, a neuron can put a large number of distinct but similar inputs into the same category.

[Figure: categorization of inputs by two neurons]


• Ways to realize competition in NN
– Lateral inhibition (Maxnet, Mexican hat): the output of each node feeds to the others through inhibitory connections (with negative weights, w_ij, w_ji < 0)
– Resource competition: the output of node k is distributed to nodes i and j in proportion to w_ik and w_jk, as well as to x_i and x_j; each node has self decay (w_ii < 0, w_jj < 0); biologically sound
Fixed-weight Competitive Nets - MAX NET
 Max Net is a fixed-weight competitive net.
 Max Net serves as a subnet for picking the node whose input is largest. All the nodes in this subnet are fully interconnected, and there are symmetrical weights on all these weighted interconnections.
 The weights between the neurons are inhibitory and fixed.
 The architecture of this net is as shown below:
[Figure: Max Net architecture]
Fixed-weight Competitive Nets - MAX NET
– Notes:
• Competition: iterative process until the net stabilizes (at most one
node with positive activation)
• 0 < ε < 1 / m , where m is the # of competitors
• ε too small: takes too long to converge
• ε too big: may suppress the entire network (no winner)

• Example
θ = 1, ε = 1/5 = 0.2

x(0) = (0.5  0.9    1      0.9    0.9  )  initial input
x(1) = (0    0.24   0.36   0.24   0.24 )
x(2) = (0    0.072  0.216  0.072  0.072)
x(3) = (0    0      0.1728 0      0    )
x(4) = (0    0      0.1728 0      0    ) = x(3)
stabilized
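
The iteration above can be reproduced with a minimal sketch. It assumes the standard Max Net update x_j ← f(x_j − ε Σ_{k≠j} x_k), with self-weight θ = 1 and activations clipped at zero:

```python
import numpy as np

def maxnet(x, eps):
    """Iterate the Max Net update until the activations stop changing."""
    x = np.array(x, dtype=float)
    while True:
        # Each node keeps its own activation (self-weight 1) and is
        # inhibited by eps times the sum of the other activations;
        # negative values are clipped to zero.
        x_new = np.maximum(0.0, x - eps * (x.sum() - x))
        if np.allclose(x_new, x):
            return x_new
        x = x_new

print(maxnet([0.5, 0.9, 1.0, 0.9, 0.9], eps=0.2))
# -> [0. 0. 0.1728 0. 0.]: the third unit wins, as in the example
```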
MEXICAN HAT NETWORK
 Kohonen developed the Mexican hat network, which is a more generalized contrast-enhancement network compared to the earlier Max Net.
 Here, in addition to the connections within a particular layer of the neural net, the neurons also receive other external signals. This interconnection pattern is repeated for several other neurons in the layer.
 The architecture of the network is as shown below:
• Close neighbors: cooperative (mutually excitatory, w > 0)
• Farther-away neighbors: competitive (mutually inhibitory, w < 0)
• Too-far-away neighbors: irrelevant (w = 0)
Need a definition of distance (neighborhood):
• One-dimensional: ordering by index (1, 2, …, n)
• Two-dimensional: lattice

MEXICAN HAT FUNCTION OF LATERAL CONNECTION

Weights:
         c1 if distance(i, j) < k   (c1 > 0)
w_ij  =  c2 if distance(i, j) = k   (0 < c2 < c1)
         c3 if distance(i, j) > k   (c3 ≤ 0)

Activation (ramp function):
          0      if x < 0
f(x)  =   x      if 0 ≤ x ≤ x_max
          x_max  if x > x_max
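
The weight rule and ramp activation translate directly into code. This is only a sketch: the values of k, c1, c2, c3, and x_max below are assumed for illustration, not taken from the slides:

```python
def weight(i, j, k=2, c1=0.6, c2=0.3, c3=-0.4):
    """Lateral weight between units i and j (1-D distance = index difference)."""
    d = abs(i - j)
    if d < k:
        return c1            # close neighbors: excitatory (c1 > 0)
    elif d == k:
        return c2            # boundary neighbors: weaker (0 < c2 < c1)
    else:
        return c3            # far neighbors: inhibitory (c3 <= 0)

def ramp(x, x_max=2.0):
    """Ramp activation: 0 below zero, linear up to x_max, then saturates."""
    return 0.0 if x < 0 else (x if x <= x_max else x_max)
```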

MEXICAN HAT NETWORK
 The lateral connections are used to create a competition between neurons. The neuron with the largest activation level among all neurons in the output layer becomes the winner. This neuron is the only neuron that produces an output signal. The activity of all other neurons is suppressed in the competition.

 The lateral feedback connections produce excitatory or inhibitory effects, depending on the distance from the winning neuron. This is achieved by the use of a Mexican hat function which describes the synaptic weights between neurons in the output layer.

MEXICAN HAT NETWORK
[Figure: Mexican hat network architecture]
Mexican Hat Algorithm – Example for a simple net with seven units

• Equilibrium:
– negative input = positive input for all nodes
– the winner has the highest activation;
– its cooperative neighbors also have positive activations;
– its competitive neighbors have negative (or zero) activations.
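
Since the worked figure is not reproduced here, the following sketch simulates a seven-unit net using the cooperative/competitive/irrelevant neighborhood scheme described earlier; the radii, weight values, and external signal are assumed for illustration:

```python
import numpy as np

def mexican_hat_step(x, c1=0.6, c2=-0.4, r1=1, r2=2, x_max=2.0):
    """One update of a 1-D Mexican hat layer: excitatory weights c1 within
    radius r1, inhibitory weights c2 out to radius r2, zero beyond,
    followed by the ramp activation."""
    n = len(x)
    new = np.zeros(n)
    for i in range(n):
        net = 0.0
        for j in range(n):
            d = abs(i - j)
            if d <= r1:
                net += c1 * x[j]     # close neighbors cooperate
            elif d <= r2:
                net += c2 * x[j]     # farther neighbors compete
        new[i] = min(max(net, 0.0), x_max)   # ramp: clip to [0, x_max]
    return new

x = np.array([0.0, 0.5, 0.8, 1.0, 0.8, 0.5, 0.0])  # signal peaks at center
for t in range(2):
    x = mexican_hat_step(x)
    print(x)
# first step -> [0, 0.38, 1.06, 1.16, 1.06, 0.38, 0]: the center unit is
# enhanced and the flanks are suppressed (contrast enhancement)
```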
HAMMING NETWORK
 The Hamming network selects the stored class that is at a minimum Hamming distance (H) from the noisy vector presented at the input.

 The Hamming distance between two vectors is the number of components in which the vectors differ.

 The Hamming network consists of two layers:

• The first layer computes, in the feedforward path, the difference between the total number of components and the Hamming distance between the input vector x and the stored pattern vectors.

• The second layer of the Hamming network is composed of a Max Net (used as a subnet), or a winner-takes-all network, which is a recurrent network.
HAMMING NETWORK
• Hamming distance of two vectors x and y of dimension n:
– Number of bits in disagreement.
– In bipolar representation:

x^T · y = Σ_i x_i · y_i = a − d
where: a is the number of bits in agreement in x and y
       d is the number of bits differing in x and y
d = n − a  (Hamming distance)
x^T · y = a − (n − a) = 2a − n
a = 0.5 (x^T · y + n)
−d = a − n = 0.5 (x^T · y + n) − n = 0.5 (x^T · y) − 0.5 n

The (negative) distance between x and y can therefore be determined from x^T · y and n.
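
A minimal sketch of this two-layer idea follows. The first layer scores each stored bipolar pattern by its number of agreeing bits, a = 0.5 (x^T · y + n), i.e. n minus the Hamming distance; a winner-takes-all step (a plain argmax here, standing in for the Max Net) then selects the closest class. The stored patterns and input are invented for illustration:

```python
import numpy as np

# Stored bipolar exemplar patterns, one per class (illustrative).
stored = np.array([[ 1,  1,  1, -1],
                   [-1, -1, -1,  1],
                   [ 1, -1,  1,  1]])
x = np.array([1, 1, -1, -1])   # noisy input vector
n = x.size

# First layer: a = 0.5 * (x . y + n) = number of agreeing bits,
# i.e. n minus the Hamming distance to each stored pattern.
agreements = 0.5 * (stored @ x + n)

# Second layer: winner-takes-all picks the stored class with the
# most agreements (smallest Hamming distance).
winner = np.argmax(agreements)
print(agreements, winner)      # -> [3. 1. 1.], class 0 wins
```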

ARCHITECTURE OF HAMMING NET
[Figure: architecture of the Hamming net]
