Artificial Neural Networks

Lecture 4

Content

- Learning Rules for Single-Layer Perceptron Networks
  - Perceptron Learning Rule
  - Adaline Learning Rule
  - Delta (δ) Learning Rule
- Multilayer Perceptron
- Back Propagation Learning Algorithm

Delta rule - Review (single neuron)

A single neuron with inputs x_1, ..., x_m, weights w_{i1}, ..., w_{im}, and a sigmoid activation:

    a(net_i) = \frac{1}{1 + e^{-net_i}}, \qquad y_i^{(k)} = a\big(\mathbf{w}_i^T \mathbf{x}^{(k)}\big)

Minimize

    E(\mathbf{w}) = \frac{1}{2} \sum_{k=1}^{p} \big(d^{(k)} - y^{(k)}\big)^2
                  = \frac{1}{2} \sum_{k=1}^{p} \big(d^{(k)} - a(\mathbf{w}^T \mathbf{x}^{(k)})\big)^2

    \nabla_{\mathbf{w}} E(\mathbf{w}) = \left[ \frac{\partial E(\mathbf{w})}{\partial w_1}, \frac{\partial E(\mathbf{w})}{\partial w_2}, \ldots, \frac{\partial E(\mathbf{w})}{\partial w_m} \right]^T

    \frac{\partial E(\mathbf{w})}{\partial w_j} = -\sum_{k=1}^{p} \big(d^{(k)} - y^{(k)}\big)\, x_j^{(k)}\, y^{(k)}\big(1 - y^{(k)}\big)
                                                = -\sum_{k=1}^{p} \delta^{(k)}\, x_j^{(k)}\, y^{(k)}\big(1 - y^{(k)}\big)

    \Delta \mathbf{w} = -\eta\, \nabla_{\mathbf{w}} E(\mathbf{w}), \qquad
    \Delta w_j = \eta \sum_{k=1}^{p} \delta^{(k)}\, x_j^{(k)}\, y^{(k)}\big(1 - y^{(k)}\big)

where \delta^{(k)} = d^{(k)} - y^{(k)}.
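A minimal NumPy sketch of this batch delta-rule update for a single sigmoid neuron; the function names and the toy data are illustrative, not part of the lecture.

    import numpy as np

    def sigmoid(net):
        return 1.0 / (1.0 + np.exp(-net))

    def delta_rule_step(w, X, d, eta=0.5):
        """One batch update: X is (p, m) inputs, d is (p,) targets, w is (m,) weights."""
        y = sigmoid(X @ w)                        # y^(k) = a(w^T x^(k)) for all k
        delta = d - y                             # delta^(k) = d^(k) - y^(k)
        grad_E = -(X.T @ (delta * y * (1 - y)))   # dE/dw_j = -sum_k delta^(k) x_j^(k) y^(k)(1-y^(k))
        return w - eta * grad_E                   # w <- w - eta * grad E(w)

    # Tiny usage example on random data
    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, 3))
    d = np.array([0.0, 1.0, 1.0, 0.0])
    w = rng.normal(size=3)
    for _ in range(100):
        w = delta_rule_step(w, X, d)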

Learning a Multilayer Network - Supervised Learning

Training set:

    T = \{ (\mathbf{x}^{(1)}, \mathbf{d}^{(1)}), (\mathbf{x}^{(2)}, \mathbf{d}^{(2)}), \ldots, (\mathbf{x}^{(p)}, \mathbf{d}^{(p)}) \}

[Figure: feed-forward network with an input layer (x_1, ..., x_s), a hidden layer, and an output layer producing o_1, ..., o_n, compared against the desired outputs d_1, ..., d_n.]

Supervised Learning

Training set:

    T = \{ (\mathbf{x}^{(1)}, \mathbf{d}^{(1)}), (\mathbf{x}^{(2)}, \mathbf{d}^{(2)}), \ldots, (\mathbf{x}^{(p)}, \mathbf{d}^{(p)}) \}

Sum of squared errors for pattern l:

    E^{(l)} = \frac{1}{2} \sum_{j=1}^{n} \big(d_j^{(l)} - o_j^{(l)}\big)^2        (n: number of outputs)

Goal: minimize the total error

    E = \sum_{l=1}^{p} E^{(l)}        (p: number of patterns)
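A small sketch of how the per-pattern and total errors above could be computed; the array shapes (p patterns by n outputs) are an assumption.

    import numpy as np

    def pattern_sse(d_l, o_l):
        """E^(l): half the sum of squared errors for one pattern."""
        return 0.5 * np.sum((d_l - o_l) ** 2)

    def total_error(d, o):
        """E = sum over the p patterns of E^(l); d and o are (p, n) arrays."""
        return sum(pattern_sse(d_l, o_l) for d_l, o_l in zip(d, o))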

Back Propagation Learning Algorithm

    E^{(l)} = \frac{1}{2} \sum_{j=1}^{n} \big(d_j^{(l)} - o_j^{(l)}\big)^2, \qquad E = \sum_{l=1}^{p} E^{(l)}

- Learning on Output Neurons
- Learning on Hidden Neurons

Learning on Output Neurons

    E^{(l)} = \frac{1}{2} \sum_{j=1}^{n} \big(d_j^{(l)} - o_j^{(l)}\big)^2, \qquad
    E = \sum_{l=1}^{p} E^{(l)}, \qquad
    o_j^{(l)} = a\big(net_j^{(l)}\big), \qquad
    net_j^{(l)} = \sum_i w_{ji}\, o_i^{(l)}

How to train the weights w_{ji} connecting hidden neuron i to output neuron j?

    \frac{\partial E}{\partial w_{ji}} = \sum_{l=1}^{p} \frac{\partial E^{(l)}}{\partial w_{ji}}, \qquad
    \frac{\partial E^{(l)}}{\partial w_{ji}} = \frac{\partial E^{(l)}}{\partial net_j^{(l)}} \, \frac{\partial net_j^{(l)}}{\partial w_{ji}}

First factor (the error signal of output neuron j):

    \frac{\partial E^{(l)}}{\partial net_j^{(l)}} = \frac{\partial E^{(l)}}{\partial o_j^{(l)}} \, \frac{\partial o_j^{(l)}}{\partial net_j^{(l)}}
    = -\big(d_j^{(l)} - o_j^{(l)}\big) \, \frac{\partial o_j^{(l)}}{\partial net_j^{(l)}}

where \partial o_j^{(l)} / \partial net_j^{(l)} depends on the activation function. Using the sigmoid, \partial o_j^{(l)} / \partial net_j^{(l)} = o_j^{(l)}\big(1 - o_j^{(l)}\big), so define

    \delta_j^{(l)} = -\frac{\partial E^{(l)}}{\partial net_j^{(l)}} = \big(d_j^{(l)} - o_j^{(l)}\big)\, o_j^{(l)}\big(1 - o_j^{(l)}\big)

Second factor:

    \frac{\partial net_j^{(l)}}{\partial w_{ji}} = o_i^{(l)}

Therefore

    \frac{\partial E^{(l)}}{\partial w_{ji}} = -\delta_j^{(l)}\, o_i^{(l)}
    = -\big(d_j^{(l)} - o_j^{(l)}\big)\, o_j^{(l)}\big(1 - o_j^{(l)}\big)\, o_i^{(l)}

and the weights connecting to the output neurons are updated with

    \Delta w_{ji} = \eta \sum_{l=1}^{p} \delta_j^{(l)}\, o_i^{(l)}
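As a sketch of the output-layer rule just derived, the update below accumulates δ_j^(l) o_i^(l) over all p patterns with NumPy; the matrix shapes and names are assumptions, not part of the slides.

    import numpy as np

    def output_layer_update(W_out, o_hidden, o_out, d, eta=0.1):
        """W_out is (n, h): weight w_ji from hidden unit i to output unit j.
        o_hidden: (p, h), o_out: (p, n), d: (p, n)."""
        delta_j = (d - o_out) * o_out * (1 - o_out)   # delta_j^(l) = (d_j - o_j) o_j (1 - o_j)
        dW = eta * delta_j.T @ o_hidden               # Delta w_ji = eta * sum_l delta_j^(l) o_i^(l)
        return W_out + dW, delta_j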

Learning on Hidden Neurons

For a weight w_{ik} connecting neuron k (in the layer below) to hidden neuron i:

    \frac{\partial E}{\partial w_{ik}} = \sum_{l=1}^{p} \frac{\partial E^{(l)}}{\partial w_{ik}}, \qquad
    \frac{\partial E^{(l)}}{\partial w_{ik}} = \frac{\partial E^{(l)}}{\partial net_i^{(l)}} \, \frac{\partial net_i^{(l)}}{\partial w_{ik}}, \qquad
    \frac{\partial net_i^{(l)}}{\partial w_{ik}} = o_k^{(l)}

The error signal of hidden neuron i:

    \frac{\partial E^{(l)}}{\partial net_i^{(l)}} = \frac{\partial E^{(l)}}{\partial o_i^{(l)}} \, \frac{\partial o_i^{(l)}}{\partial net_i^{(l)}}, \qquad
    \frac{\partial o_i^{(l)}}{\partial net_i^{(l)}} = o_i^{(l)}\big(1 - o_i^{(l)}\big)

Because o_i^{(l)} feeds every output neuron j through net_j^{(l)} = \sum_i w_{ji}\, o_i^{(l)}:

    \frac{\partial E^{(l)}}{\partial o_i^{(l)}} = \sum_j \frac{\partial E^{(l)}}{\partial net_j^{(l)}} \, \frac{\partial net_j^{(l)}}{\partial o_i^{(l)}}
    = \sum_j \big(-\delta_j^{(l)}\big)\, w_{ji}

Hence

    \delta_i^{(l)} = -\frac{\partial E^{(l)}}{\partial net_i^{(l)}} = o_i^{(l)}\big(1 - o_i^{(l)}\big) \sum_j w_{ji}\, \delta_j^{(l)}

and the weights connecting to the hidden neurons are updated with

    \frac{\partial E}{\partial w_{ik}} = -\sum_{l=1}^{p} \delta_i^{(l)}\, o_k^{(l)}, \qquad
    \Delta w_{ik} = \eta \sum_{l=1}^{p} \delta_i^{(l)}\, o_k^{(l)}
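A matching sketch for the hidden layer: the output-layer deltas are propagated back through w_ji and combined with the sigmoid derivative. Shapes and names are assumptions, consistent with the output-layer sketch above.

    import numpy as np

    def hidden_layer_update(W_hid, W_out, x, o_hidden, delta_j, eta=0.1):
        """W_hid is (h, s): weight w_ik from input k to hidden unit i.
        W_out is (n, h), x: (p, s), o_hidden: (p, h), delta_j: (p, n)."""
        delta_i = o_hidden * (1 - o_hidden) * (delta_j @ W_out)   # delta_i = o_i(1-o_i) sum_j w_ji delta_j
        dW = eta * delta_i.T @ x                                  # Delta w_ik = eta * sum_l delta_i^(l) o_k^(l)
        return W_hid + dW, delta_i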

Back Propagation

[Figure: the same feed-forward network; δ_j is computed at output neuron j and propagated back to hidden neuron i through the weights w_{ji}.]

Output layer:

    \delta_j^{(l)} = -\frac{\partial E^{(l)}}{\partial net_j^{(l)}} = \big(d_j^{(l)} - o_j^{(l)}\big)\, o_j^{(l)}\big(1 - o_j^{(l)}\big), \qquad
    \Delta w_{ji} = \eta \sum_{l=1}^{p} \delta_j^{(l)}\, o_i^{(l)}

Hidden layer:

    \delta_i^{(l)} = -\frac{\partial E^{(l)}}{\partial net_i^{(l)}} = o_i^{(l)}\big(1 - o_i^{(l)}\big) \sum_j w_{ji}\, \delta_j^{(l)}, \qquad
    \Delta w_{ik} = \eta \sum_{l=1}^{p} \delta_i^{(l)}\, o_k^{(l)}

Back-propagation training algorithm

Step 1: Initialisation
Set all the weights and threshold levels of the network to random numbers. In MATLAB this can be done with the command randn: X = randn(n, m).

Back-propagation training algorithm

Step 2: Activation
(a) Calculate the actual outputs of the neurons in the hidden layer:

    O_i(p) = \text{sigmoid}\!\left( \sum_{k=1}^{s} x_k(p)\, w_{ik}(p) - \theta_i \right)

where s is the number of inputs to neuron i in the hidden layer.

(b) Calculate the actual outputs of the neurons in the output layer:

    O_j(p) = \text{sigmoid}\!\left( \sum_{i=1}^{m} O_i(p)\, w_{ji}(p) - \theta_j \right)

where m is the number of inputs to neuron j in the output layer.
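A sketch of Step 2 for a single pattern, with each threshold θ subtracted from the weighted sum; the variable names are assumptions.

    import numpy as np

    def sigmoid(net):
        return 1.0 / (1.0 + np.exp(-net))

    def forward(x, W_hid, theta_hid, W_out, theta_out):
        """x: (s,) inputs; W_hid: (h, s); theta_hid: (h,); W_out: (n, h); theta_out: (n,)."""
        o_hidden = sigmoid(W_hid @ x - theta_hid)       # O_i(p) = sigmoid(sum_k x_k w_ik - theta_i)
        o_out = sigmoid(W_out @ o_hidden - theta_out)   # O_j(p) = sigmoid(sum_i O_i w_ji - theta_j)
        return o_hidden, o_out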

Back-propagation training algorithm

Step 3: Weight training
Update the weights in the back-propagation network, propagating backward the errors associated with the output neurons.

(a) Calculate the error gradient for the neurons in the output layer:

    \delta_j(p) = o_j(p)\,\big[1 - o_j(p)\big]\, e_j(p), \qquad e_j(p) = d_j(p) - o_j(p)

Calculate the weight corrections:

    \Delta w_{ji}(p) = \eta \, o_i(p) \, \delta_j(p)

Update the weights at the output neurons:

    w_{ji}(p+1) = w_{ji}(p) + \Delta w_{ji}(p)

(b) Calculate the error gradient for the neurons in the hidden layer:

    \delta_i(p) = o_i(p)\,\big[1 - o_i(p)\big] \sum_{j=1}^{n} \delta_j(p)\, w_{ji}(p)

Calculate the weight corrections:

    \Delta w_{ik}(p) = \eta \, x_k(p) \, \delta_i(p)

Update the weights at the hidden neurons:

    w_{ik}(p+1) = w_{ik}(p) + \Delta w_{ik}(p)

Back-propagation training algorithm

Step 4: Iteration
Increase iteration p by one, go back to Step 2 and repeat the process until the selected error criterion is satisfied.

Back-propagation training algorithm

[Flowchart:
  start → initialize all weights & biases → enter pattern {X(p), d_j} →
  calculate O_j, O_i → calculate δ_j, δ_i → update weights w_ji, w_ik →
  calculate E(p) → last pattern? (no: next pattern / yes: calculate total E) →
  E < ε? (no: repeat over all patterns / yes: stop)]
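A compact sketch that puts Steps 1-4 together as in the flowchart, using the per-pattern updates of Step 3. The XOR data, the error tolerance and the variable names are illustrative choices, not prescribed by the slides.

    import numpy as np

    def sigmoid(net):
        return 1.0 / (1.0 + np.exp(-net))

    def train(X, D, n_hidden=2, eta=0.1, eps=0.001, max_epochs=10000, seed=0):
        rng = np.random.default_rng(seed)
        s, n = X.shape[1], D.shape[1]
        # Step 1: initialise weights and thresholds randomly
        W_hid, theta_hid = rng.standard_normal((n_hidden, s)), rng.standard_normal(n_hidden)
        W_out, theta_out = rng.standard_normal((n, n_hidden)), rng.standard_normal(n)
        for epoch in range(max_epochs):
            E = 0.0
            for x, d in zip(X, D):
                # Step 2: activation (forward pass)
                o_hid = sigmoid(W_hid @ x - theta_hid)
                o_out = sigmoid(W_out @ o_hid - theta_out)
                e = d - o_out
                E += 0.5 * np.sum(e ** 2)
                # Step 3: weight training (error gradients and corrections)
                delta_j = o_out * (1 - o_out) * e
                delta_i = o_hid * (1 - o_hid) * (W_out.T @ delta_j)
                W_out += eta * np.outer(delta_j, o_hid)
                theta_out += eta * (-1.0) * delta_j
                W_hid += eta * np.outer(delta_i, x)
                theta_hid += eta * (-1.0) * delta_i
            # Step 4: iterate until the error criterion is satisfied
            if E < eps:
                break
        return W_hid, theta_hid, W_out, theta_out, E

    # Usage on the XOR patterns of the example that follows
    X = np.array([[1, 1], [0, 1], [1, 0], [0, 0]], dtype=float)
    D = np.array([[0], [1], [1], [0]], dtype=float)
    params = train(X, D)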

Example: XOR

Suppose that a network is required to perform the logical operation Exclusive-OR. Recall that a single-layer perceptron could not do this operation. Now we will apply the three-layer net.

[Figure: inputs x_1 (neuron 1) and x_2 (neuron 2) feed hidden neurons 3 and 4 through weights w_13, w_23, w_14, w_24; the hidden neurons feed output neuron 5 through w_35 and w_45, giving the output y_5. Each hidden and output neuron also receives a fixed input of -1 weighted by its threshold θ.]

Example: XOR

The effect of the threshold applied to a neuron in the hidden or output layer is represented by its weight, θ, connected to a fixed input equal to -1.

The initial weights and threshold levels are set randomly as follows:
w13 = 0.5, w14 = 0.9, w23 = 0.4, w24 = 1.0, w35 = -1.2, w45 = 1.1,
θ3 = 0.8, θ4 = -0.1 and θ5 = 0.3.

Example: XOR

We consider a training set where inputs x1 and x2 are equal to 1 and the desired output y_{d,5} is 0. The actual outputs of neurons 3 and 4 in the hidden layer are calculated as

    y_3 = \text{sigmoid}(x_1 w_{13} + x_2 w_{23} - \theta_3) = 1 / \big(1 + e^{-(1 \cdot 0.5 + 1 \cdot 0.4 - 1 \cdot 0.8)}\big) = 0.5250
    y_4 = \text{sigmoid}(x_1 w_{14} + x_2 w_{24} - \theta_4) = 1 / \big(1 + e^{-(1 \cdot 0.9 + 1 \cdot 1.0 + 1 \cdot 0.1)}\big) = 0.8808

Now the actual output of neuron 5 in the output layer is determined as

    y_5 = \text{sigmoid}(y_3 w_{35} + y_4 w_{45} - \theta_5) = 1 / \big(1 + e^{-(-0.5250 \cdot 1.2 + 0.8808 \cdot 1.1 - 1 \cdot 0.3)}\big) = 0.5097

Thus, the following error is obtained:

    e = y_{d,5} - y_5 = 0 - 0.5097 = -0.5097

Example: XOR

The next step is weight training. To update the weights and threshold levels in our network, we propagate the error, e, from the output layer backward to the input layer.

First, we calculate the error gradient for neuron 5 in the output layer:

    \delta_5 = y_5 (1 - y_5)\, e = 0.5097 \cdot (1 - 0.5097) \cdot (-0.5097) = -0.1274

Then we determine the weight corrections, assuming that the learning rate parameter, η, is equal to 0.1:

    \Delta w_{35} = \eta \, y_3 \, \delta_5 = 0.1 \cdot 0.5250 \cdot (-0.1274) = -0.0067
    \Delta w_{45} = \eta \, y_4 \, \delta_5 = 0.1 \cdot 0.8808 \cdot (-0.1274) = -0.0112
    \Delta \theta_5 = \eta \cdot (-1) \cdot \delta_5 = 0.1 \cdot (-1) \cdot (-0.1274) = 0.0127

Example: XOR

Next we calculate the error gradients for neurons 3 and 4 in the hidden layer:

    \delta_3 = y_3 (1 - y_3) \cdot \delta_5 \cdot w_{35} = 0.5250 \cdot (1 - 0.5250) \cdot (-0.1274) \cdot (-1.2) = 0.0381
    \delta_4 = y_4 (1 - y_4) \cdot \delta_5 \cdot w_{45} = 0.8808 \cdot (1 - 0.8808) \cdot (-0.1274) \cdot 1.1 = -0.0147

We then determine the weight corrections:

    \Delta w_{13} = \eta \, x_1 \, \delta_3 = 0.1 \cdot 1 \cdot 0.0381 = 0.0038
    \Delta w_{23} = \eta \, x_2 \, \delta_3 = 0.1 \cdot 1 \cdot 0.0381 = 0.0038
    \Delta \theta_3 = \eta \cdot (-1) \cdot \delta_3 = 0.1 \cdot (-1) \cdot 0.0381 = -0.0038
    \Delta w_{14} = \eta \, x_1 \, \delta_4 = 0.1 \cdot 1 \cdot (-0.0147) = -0.0015
    \Delta w_{24} = \eta \, x_2 \, \delta_4 = 0.1 \cdot 1 \cdot (-0.0147) = -0.0015
    \Delta \theta_4 = \eta \cdot (-1) \cdot \delta_4 = 0.1 \cdot (-1) \cdot (-0.0147) = 0.0015

Example: XOR

At last, we update all weights and thresholds:

    w_{13} = w_{13} + \Delta w_{13} = 0.5 + 0.0038 = 0.5038
    w_{14} = w_{14} + \Delta w_{14} = 0.9 - 0.0015 = 0.8985
    w_{23} = w_{23} + \Delta w_{23} = 0.4 + 0.0038 = 0.4038
    w_{24} = w_{24} + \Delta w_{24} = 1.0 - 0.0015 = 0.9985
    w_{35} = w_{35} + \Delta w_{35} = -1.2 - 0.0067 = -1.2067
    w_{45} = w_{45} + \Delta w_{45} = 1.1 - 0.0112 = 1.0888
    \theta_3 = \theta_3 + \Delta \theta_3 = 0.8 - 0.0038 = 0.7962
    \theta_4 = \theta_4 + \Delta \theta_4 = -0.1 + 0.0015 = -0.0985
    \theta_5 = \theta_5 + \Delta \theta_5 = 0.3 + 0.0127 = 0.3127
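A short numerical check of the worked example above: it reproduces y3, y4, y5, the error e, the gradients and the updated weights and thresholds for the first training pattern (x1 = x2 = 1, desired output 0). The variable names are illustrative.

    import numpy as np

    sigmoid = lambda net: 1.0 / (1.0 + np.exp(-net))

    w13, w14, w23, w24, w35, w45 = 0.5, 0.9, 0.4, 1.0, -1.2, 1.1
    t3, t4, t5 = 0.8, -0.1, 0.3
    x1, x2, yd5, eta = 1.0, 1.0, 0.0, 0.1

    # Forward pass
    y3 = sigmoid(x1 * w13 + x2 * w23 - t3)   # 0.5250
    y4 = sigmoid(x1 * w14 + x2 * w24 - t4)   # 0.8808
    y5 = sigmoid(y3 * w35 + y4 * w45 - t5)   # 0.5097
    e = yd5 - y5                             # -0.5097

    # Error gradients
    d5 = y5 * (1 - y5) * e                   # -0.1274
    d3 = y3 * (1 - y3) * d5 * w35            #  0.0381
    d4 = y4 * (1 - y4) * d5 * w45            # -0.0147

    # Weight and threshold updates
    w13 += eta * x1 * d3; w23 += eta * x2 * d3; t3 += eta * (-1) * d3   # 0.5038, 0.4038, 0.7962
    w14 += eta * x1 * d4; w24 += eta * x2 * d4; t4 += eta * (-1) * d4   # 0.8985, 0.9985, -0.0985
    w35 += eta * y3 * d5; w45 += eta * y4 * d5; t5 += eta * (-1) * d5   # -1.2067, 1.0888, 0.3127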

Example: XOR

The training process is repeated until the sum of squared errors is less than 0.001.

[Figure: "Sum-Squared Network Error for 224 Epochs" - sum-squared error on a log scale (10^1 down to 10^-4) versus epoch (0 to 200+).]

Example: XOR

Final results of three-layer network learning:

    Inputs (x1, x2)   Desired output yd   Actual output y5   Error e    Sum of squared errors
    1, 1              0                   0.0155             -0.0155    0.0010
    0, 1              1                   0.9849              0.0151
    1, 0              1                   0.9849              0.0151
    0, 0              0                   0.0175             -0.0175

Network represented by McCulloch-Pitts model for solving the Exclusive-OR operation

[Figure: a three-layer network of hard-limit neurons. Inputs x_1 and x_2 connect to hidden neurons 3 and 4 with weights +1.0; neuron 3 has threshold +1.5 and neuron 4 has threshold +0.5 (fixed input -1). Hidden neuron 3 connects to output neuron 5 with weight -2.0, neuron 4 connects with weight +1.0, and neuron 5 has threshold +0.5, giving the output y_5.]

How to overcome Backpropagation drawbacks

Accelerated learning in multilayer neural networks:

- Change the learning rate: when the performance surface is very flat, a large learning rate can be used, while a high-curvature region requires a small learning rate.
- Add a momentum term.
- Use other numerical optimization techniques: the conjugate gradient algorithm, or the Levenberg-Marquardt algorithm (a variation of Newton's method).

Accelerated learning

Convergence might be improved if we could smooth out the oscillations in the trajectory. We can do this with a low-pass filter on the weight changes:

    \Delta w_{jk}(p) = \beta \, \Delta w_{jk}(p-1) + \eta \, y_j(p) \, \delta_k(p)

where β is a positive number (0 ≤ β < 1) called the momentum constant. Typically, the momentum constant is set to 0.95.
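A sketch of one momentum update as defined above; the names and shapes are assumptions.

    import numpy as np

    def momentum_update(W, dW_prev, y_j, delta_k, eta=0.1, beta=0.95):
        """W[k, j] is the weight from neuron j to neuron k (shape: len(delta_k) x len(y_j));
        dW_prev is the weight change from the previous iteration."""
        dW = beta * dW_prev + eta * np.outer(delta_k, y_j)   # dw_jk(p) = beta*dw_jk(p-1) + eta*y_j(p)*delta_k(p)
        return W + dW, dW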

Learning with momentum for operation Exclusive-OR

[Figure: sum-squared network error versus epoch - without the momentum term the network trains for 224 epochs; with the momentum term it trains for 126 epochs. A lower panel shows the corresponding learning-rate trace over the epochs.]

Learning with adaptive learning rate

Adapting the learning rate requires some changes in the back-propagation algorithm:

- If the sum of squared errors at the current epoch exceeds the previous value by more than a predefined ratio (typically 1.04), the learning rate parameter is decreased (typically by multiplying by 0.7) and new weights and thresholds are calculated.
- If the error is less than the previous one, the learning rate is increased (typically by multiplying by 1.05).
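A sketch of the learning-rate heuristic described above. The ratio 1.04 and the factors 0.7 and 1.05 come from the slide; the names are assumptions, and the recalculation of weights after a rejected step is not shown here.

    def adapt_learning_rate(eta, E_current, E_previous, ratio=1.04, dec=0.7, inc=1.05):
        if E_current > ratio * E_previous:
            return eta * dec   # error grew by more than the predefined ratio: shrink the rate
        elif E_current < E_previous:
            return eta * inc   # error decreased: grow the rate
        return eta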

Learning with adaptive learning rate

[Figure: "Training for 103 Epochs" - sum-squared error versus epoch on a log scale, and the learning rate versus epoch varying between roughly 0 and 1.]

Learning with momentum and adaptive learning rate

[Figure: "Training for 85 Epochs" - sum-squared error versus epoch on a log scale, and the learning rate versus epoch varying between roughly 0 and 2.5.]

Learning Factors

- Initial Weights
- Learning Constant (η)
- Cost Functions
- Update Rules
- Training Data and Generalization
- Number of Layers
- Number of Hidden Nodes
