Unsupervised Learning Networks
Biological background:
Neurons are wired topographically: nearby neurons connect to nearby neurons. In the visual cortex, neurons are organized in functional columns.
How do these neurons organise themselves? They receive no 'supervision'.
Ocular dominance columns: one region responds to input from one eye
Orientation columns: one region responds to one orientation
UNSUPERVISED LEARNING
No help from the outside:
• Training samples contain only input patterns
• No desired output is given (teacher-less)
Learning by doing:
• Learn to form classes/clusters of sample patterns according to similarities among them
• Patterns in a cluster have similar features
• No prior knowledge of which features are important for classification, or of how many classes there are
Used to pick out structure in the input:
• Clustering
• Dimensionality reduction / compression
Example: Kohonen's learning law (see the update rule below).
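Kohonen's learning law updates only the winning unit, moving its weight vector toward the current input:
Δw_winner = η (x − w_winner), where η is the learning rate.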
A FEW UNSUPERVISED LEARNING NETWORKS
• NN models to be covered
– Competitive networks and competitive learning
• Winner-takes-all (WTA)
• Maxnet
• Hamming net
– Counterpropagation nets
– Adaptive Resonance Theory (ART)
– Self-organizing map (SOM)
• Applications
– Clustering
– Vector quantization
– Feature extraction
– Dimensionality reduction
– Optimization
COMPETITIVE LEARNING
Output units compete, so that eventually only one neuron (the one with the most input) is active in response to each input pattern.
A competitive network, when presented repeatedly with patterns from the same selection of inputs, will tend to stabilize so that its neurons become representatives of clusters of similar inputs.
Measures of similarity or closeness
(opposite: dissimilarity or distance)
• Suppose x is an input vector and wᵢ is the weight vector of the ith neuron.
• One measure of distance is the Euclidean distance:
‖x − wᵢ‖ = √( Σⱼ (xⱼ − wᵢⱼ)² ) = √( (x − wᵢ)⋅(x − wᵢ)ᵀ )
(a vector inner product)
• Another measure of distance, used when the values are integers, is the "Manhattan", "city-block", or "taxicab" distance:
‖x − wᵢ‖₁ = Σⱼ | xⱼ − wᵢⱼ |
• Another measure, used when the values are 2-valued, is the Hamming distance: the number of components in which xⱼ and wᵢⱼ differ,
Σⱼ [ xⱼ ≠ wᵢⱼ ]
• Example, for two 4-component vectors that differ by 2 in their second and fourth components (computed in the sketch below):
– Discrete metric = 1 (the vectors are not identical)
– Manhattan distance = 0 + 2 + 0 + 2 = 4
– Hamming distance = 0 + 1 + 0 + 1 = 2
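A minimal sketch of these measures in Python (the vectors below are illustrative, chosen to reproduce the distances above):

import numpy as np

x = np.array([1, 3, 5, 7])   # hypothetical input vector
w = np.array([1, 5, 5, 9])   # hypothetical weight vector, differing by 2 in two components

euclidean = np.sqrt(np.sum((x - w) ** 2))   # sqrt(0 + 4 + 0 + 4) ≈ 2.83
manhattan = np.sum(np.abs(x - w))           # 0 + 2 + 0 + 2 = 4
hamming   = np.sum(x != w)                  # number of components that differ: 2
discrete  = int(not np.array_equal(x, w))   # 1 if the vectors differ at all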
Competitive learning
– Finite resources: outputs 'compete' to see which will win, via inhibitory connections between them
– The aim is to automatically discover statistically salient features of the pattern vectors in the training data set: feature detectors
– Can find clusters in the training-data pattern space, which can be used to classify new patterns
– Basic structure: an input layer fully connected to a layer of competing output units (see the sketch below)
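A minimal winner-takes-all competitive learning sketch (the two-cluster data, unit count, learning rate, and epoch count are illustrative assumptions, not from the original slides):

import numpy as np

rng = np.random.default_rng(0)
# Two loose clusters around (3, 3) and (-3, -3) as toy training data
X = rng.normal(size=(200, 2)) + rng.choice([-3.0, 3.0], size=(200, 1))
W = rng.normal(size=(2, 2))    # one weight (prototype) vector per output unit
eta = 0.05                     # learning rate

for epoch in range(20):
    for x in X:
        winner = np.argmin(np.linalg.norm(W - x, axis=1))  # competition: closest unit wins
        W[winner] += eta * (x - W[winner])                 # Kohonen update toward the input

# Each row of W ends up near one cluster centre; new patterns are
# classified by which prototype they are closest to.
print(np.round(W, 2))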
Fixed-weight Competitive Nets – MAXNET
– Notes:
• Competition: an iterative process that continues until the net stabilizes (at most one node with positive activation)
• 0 < ε < 1/m, where m is the number of competitors
• ε too small: takes too long to converge
• ε too big: may suppress the entire network (no winner)
• Example: θ = 1, ε = 1/5 = 0.2 (see the sketch below)
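A minimal Maxnet iteration sketch for this example, assuming the standard fixed weights (self-excitation θ on the diagonal, mutual inhibition −ε everywhere else) and illustrative initial activations:

import numpy as np

def maxnet(a, eps=0.2, theta=1.0):
    """Iterate until at most one unit keeps a positive activation."""
    a = np.asarray(a, dtype=float)
    while np.sum(a > 0) > 1:
        total = a.sum()
        a = theta * a - eps * (total - a)  # each unit inhibits all the others
        a = np.maximum(a, 0.0)             # activation f(x) = max(x, 0)
    return a

print(maxnet([0.2, 0.4, 0.6, 0.8, 1.0]))   # eps = 0.2 as in the example; only the largest survives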
MEXICAN HAT FUNCTION OF LATERAL CONNECTION
Weights:
wᵢⱼ = c₁ if distance(i, j) < k   (c₁ > 0)
wᵢⱼ = c₂ if distance(i, j) = k   (0 < c₂ < c₁)
wᵢⱼ = c₃ if distance(i, j) > k   (c₃ ≤ 0)
Activation (ramp function):
f(x) = 0     if x < 0
f(x) = x     if 0 ≤ x ≤ max
f(x) = max   if x > max
MEXICAN HAT NETWORK
The lateral connections are used to create a competition between neurons. The neuron with the largest activation level among all neurons in the output layer becomes the winner. This neuron is the only neuron that produces an output signal. The activity of all other neurons is suppressed in the competition.
Mexican Hat Algorithm – Example for a simple net with seven units (sketched below)
• Equilibrium:
– negative input = positive input for all nodes
– the winner has the highest activation;
– its cooperative neighbors also have positive activations;
– its competitive neighbors have negative (or zero) activations.
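A minimal sketch of this seven-unit example (the weight values c₁ = 0.6, c₂ = 0.4, c₃ = −0.2, the radius k = 1, the ramp ceiling, and the initial external signal are illustrative assumptions):

import numpy as np

n, k = 7, 1
c1, c2, c3 = 0.6, 0.4, -0.2    # illustrative values with c1 > c2 > 0 >= c3
t_max = 2.0                    # ceiling ("max") of the ramp activation

# Lateral weights: c1 within distance < k, c2 at distance k, c3 beyond
W = np.empty((n, n))
for i in range(n):
    for j in range(n):
        d = abs(i - j)
        W[i, j] = c1 if d < k else (c2 if d == k else c3)

x = np.array([0.0, 0.5, 0.8, 1.0, 0.8, 0.5, 0.0])  # initial external signal
for t in range(3):
    x = np.clip(W @ x, 0.0, t_max)   # ramp: f(x) = min(max(x, 0), max)
    print(np.round(x, 3))            # activity sharpens around the winning unit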
HAMMING NETWORK
The Hamming network selects the stored class that is at a minimum Hamming distance (H) from the noisy vector presented at the input.
For bipolar vectors x and y of length n (components ±1):
xᵀ⋅y = Σᵢ xᵢ⋅yᵢ = a − d
where: a is the number of bits in agreement in x and y
d is the number of bits differing in x and y
d = n − a is the Hamming distance
xᵀ⋅y = a − (n − a) = 2a − n
a = 0.5(xᵀ⋅y + n)
−d = a − n = 0.5(xᵀ⋅y + n) − n = 0.5(xᵀ⋅y) − 0.5n
so the (negative) distance between x and y can be determined from xᵀ⋅y and n.
ARCHITECTURE OF HAMMING NET
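The standard architecture has two layers: a lower feedforward layer whose fixed weights store the exemplar vectors (weights 0.5⋅exemplar, bias n/2), so that each output equals a = 0.5(xᵀ⋅y + n), the number of agreeing bits; and an upper Maxnet layer that suppresses all units except the best match. A minimal sketch under that standard construction (the exemplars and the noisy input are illustrative):

import numpy as np

exemplars = np.array([[ 1, -1,  1, -1,  1, -1],    # hypothetical stored class 0
                      [ 1,  1,  1, -1, -1, -1]])   # hypothetical stored class 1
n = exemplars.shape[1]

def similarity_scores(x):
    """Lower layer: a = 0.5*(x^T.y + n) = number of bits agreeing with each exemplar."""
    return 0.5 * (exemplars @ x + n)

x_noisy = np.array([1, -1, 1, 1, 1, -1])   # class-0 exemplar with one bit flipped
scores = similarity_scores(x_noisy)        # -> [5. 3.], agreements per class
winner = np.argmax(scores)                 # the Maxnet layer converges to this unit
print(scores, "-> class", winner)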