
Artificial Intelligence and Learning Algorithms

Presented By Brian M. Frezza 12/1/05

Game Plan

What's a Learning Algorithm? Why should I care?
  Biological parallels
Real-world examples
Getting our hands dirty with the algorithms:
  Bayesian Networks
  Hidden Markov Models
  Genetic Algorithms
  Neural Networks
Artificial Neural Networks vs. Neuron Biology
  Fraser's Rules
Frontiers in AI
Hard Math

What's a Learning Algorithm?

An algorithm that predicts data's future behavior based on its past performance.
The programmer can be ignorant of the data's trends.
Not rationally designed!

Training data vs. test data

Why do I care?

Use in informatics:
  Predict trends in fuzzy data
    Subtle patterns, complex patterns, noisy data
  Network inference
  Classification inference

Analogies to Chemical Biology

Evolution
Immunological response
Neurology

Fundamental Theories of Intelligence

That's heavy, dude.

Street Smarts

CMU's Navlab-5 ("No Hands Across America")
  1995 neural-network-driven car
  Pittsburgh to San Diego: 2,797 miles (98.2%)
  A single hidden-layer backpropagation network!

Subcellular location through fluorescence

A neural network classifier capable of recognizing the patterns of all major subcellular structures in fluorescence microscope images of HeLa cells.
M. V. Boland and R. F. Murphy, Bioinformatics (2001) 17(12), 1213-1223.

Other applications:
  Protein secondary structure prediction
  Intron/exon prediction
  Protein/gene network inference
  Speech recognition
  Face recognition

The Algorithms

Bayesian Networks
Hidden Markov Models
Genetic Algorithms
Neural Networks

Bayesian Networks: Basics

Requires models of how the data behaves
  A set of hypotheses: {H}
Keeps track of how likely each model is to be accurate as data becomes available
  P(H)
Predicts as a weighted average over the hypotheses
  P(E) = sum over H of P(H) * P(E|H)

Bayesian Network Example

What color hair will Paul Schaffer's kids have if he marries a redhead?
Hypotheses:
  Ha(rr): rr x rr -> 100% Redhead
  Hb(Rr): rr x Rr -> 50% Redhead, 50% Not
  Hc(RR): rr x RR -> 100% Not
Initially clueless, so P(Ha) = P(Hb) = P(Hc) = 1/3

Bayesian Network: Trace

Hypotheses: Ha: 100% Redhead; Hb: 50% Redhead, 50% Not; Hc: 100% Not
History: Redhead 0, Not 0
Likelihoods: P(Ha) = 1/3, P(Hb) = 1/3, P(Hc) = 1/3

Prediction: Will their next kid be a redhead?
P(red) = P(red|Ha)*P(Ha) + P(red|Hb)*P(Hb) + P(red|Hc)*P(Hc)
       = (1)*(1/3) + (1/2)*(1/3) + (0)*(1/3) = 1/2

Bayesian Network: Trace

Hypotheses: Ha: 100% Redhead; Hb: 50% Redhead, 50% Not; Hc: 100% Not
History: Redhead 1, Not 0
Likelihoods: P(Ha) = 1/2, P(Hb) = 1/2, P(Hc) = 0

Prediction: Will their next kid be a redhead?
P(red) = P(red|Ha)*P(Ha) + P(red|Hb)*P(Hb) + P(red|Hc)*P(Hc)
       = (1)*(1/2) + (1/2)*(1/2) + (0)*(0) = 3/4

Bayesian Network: Trace

Hypotheses: Ha: 100% Redhead; Hb: 50% Redhead, 50% Not; Hc: 100% Not
History: Redhead 2, Not 0
Likelihoods: P(Ha) = 3/4, P(Hb) = 1/4, P(Hc) = 0

Prediction: Will their next kid be a redhead?
P(red) = P(red|Ha)*P(Ha) + P(red|Hb)*P(Hb) + P(red|Hc)*P(Hc)
       = (1)*(3/4) + (1/2)*(1/4) + (0)*(0) = 7/8

Bayesian Network: Trace

Hypotheses: Ha: 100% Redhead; Hb: 50% Redhead, 50% Not; Hc: 100% Not
History: Redhead 3, Not 0
Likelihoods: P(Ha) = 7/8, P(Hb) = 1/8, P(Hc) = 0

Prediction: Will their next kid be a redhead?
P(red) = P(red|Ha)*P(Ha) + P(red|Hb)*P(Hb) + P(red|Hc)*P(Hc)
       = (1)*(7/8) + (1/2)*(1/8) + (0)*(0) = 15/16
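
To make the update-and-predict loop concrete, here is a minimal Python sketch. It assumes standard Bayes-rule reweighting of the hypotheses by the likelihood of each new observation (with that rule the intermediate fractions evolve slightly differently from the hand trace above), and it predicts with the same weighted average P(red) = sum over H of P(H) * P(red|H):

```python
# Minimal sketch of sequential Bayesian updating for the redhead example.
# Each hypothesis H gives P(redhead child | H); weights P(H) start uniform.
hypotheses = {"Ha (rr x rr)": 1.0, "Hb (rr x Rr)": 0.5, "Hc (rr x RR)": 0.0}
weights = {name: 1.0 / len(hypotheses) for name in hypotheses}

def predict(weights):
    """Weighted-average prediction: P(red) = sum over H of P(H) * P(red|H)."""
    return sum(weights[h] * p_red for h, p_red in hypotheses.items())

def update(weights, saw_redhead):
    """Reweight every hypothesis by the likelihood of the new observation."""
    reweighted = {}
    for h, p_red in hypotheses.items():
        likelihood = p_red if saw_redhead else 1.0 - p_red
        reweighted[h] = weights[h] * likelihood
    total = sum(reweighted.values())
    return {h: w / total for h, w in reweighted.items()}

print(predict(weights))              # 0.5 before any children are observed
for _ in range(3):                   # observe three redheaded kids in a row
    weights = update(weights, saw_redhead=True)
    print(weights, predict(weights)) # prediction climbs toward 1
```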

Bayesian Networks: Notes

Never rejects a hypothesis unless it is directly disproved
Learns based on rational models of behavior
  The models can be extracted!
The programmer needs to form the hypotheses beforehand.

The Algorithms

Bayesian Networks
Hidden Markov Models
Genetic Algorithms
Neural Networks

Hidden Markov Models (HMMs)

A discrete learning algorithm
  The programmer must be able to categorize predictions
HMMs also assume a model of the world working behind the data
  The models are also extractable
Common uses:
  Speech recognition
  Secondary structure prediction
  Intron/exon prediction
  Categorization of data

Hidden Markov Models: Take a Step Back

1st order Markov models:
  A set of states: Q
  Transition probabilities: Pr{transition}
  The sum of all transition probabilities out of a state = 1

[Diagram: four states Q1-Q4 linked by transition probabilities P1-P4 and their complements (e.g. 1-P3, 1-P1-P2).]

1st order Markov Model Setup

Pick an initial state: Q1
Pick transition probabilities:
  P1 = 0.6, P2 = 0.2, P3 = 0.9, P4 = 0.4
For each time step:
  Pick a random number between 0.0 and 1.0 to choose the transition

[Diagram: the same four-state chain, annotated with these probabilities.]

1st order Markov Model Trace

Current state: Q1
Transition probabilities: P1 = 0.6, P2 = 0.2, P3 = 0.9, P4 = 0.4
Time step = 1
Random number: 0.22341
Next state: 0.22341 < P1, so take the P1 edge -> Q2

1st order Markov Model Trace

Current state: Q2
Transition probabilities: P1 = 0.6, P2 = 0.2, P3 = 0.9, P4 = 0.4
Time step = 2
Random number: 0.64357
Next state: no choice, P = 1 -> Q3

1st order Markov Model Trace

Current state: Q3
Transition probabilities: P1 = 0.6, P2 = 0.2, P3 = 0.9, P4 = 0.4
Time step = 3
Random number: 0.97412
Next state: 0.97412 > P3 (0.9), so take the 1-P3 edge -> Q4

1st order Markov Model Trace

Current state: Q4
Transition probabilities: P1 = 0.6, P2 = 0.2, P3 = 0.9, P4 = 0.4
Time step = 4
I'm going to stop here. Markov chain so far: Q1, Q2, Q3, Q4
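
A minimal Python sketch of this simulation loop follows. The probabilities P1-P4 are the slide's; the edge targets in the transition table are illustrative assumptions, since the diagram's exact topology is not fully recoverable from the text:

```python
import random

# Minimal sketch of simulating a 1st-order Markov chain by drawing one
# uniform random number per time step. Edge targets are illustrative.
transitions = {
    "Q1": [("Q2", 0.6), ("Q3", 0.2), ("Q4", 0.2)],   # P1, P2, 1-P1-P2
    "Q2": [("Q3", 1.0)],                              # no choice, P = 1
    "Q3": [("Q1", 0.9), ("Q4", 0.1)],                 # P3, 1-P3
    "Q4": [("Q2", 0.4), ("Q1", 0.6)],                 # P4, 1-P4
}

def step(state):
    r = random.random()            # random number in [0.0, 1.0)
    cumulative = 0.0
    for target, p in transitions[state]:
        cumulative += p
        if r < cumulative:
            return target
    return transitions[state][-1][0]

chain = ["Q1"]
for _ in range(3):
    chain.append(step(chain[-1]))
print(chain)                       # e.g. ['Q1', 'Q2', 'Q3', 'Q4']
```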

What else can Markov do?

Higher-order models
  kth order
Metropolis-Hastings
  Determining thermodynamic equilibrium
Continuous Markov models
  The time step varies according to a continuous distribution

Hidden Markov Models


Discrete model learning

Hidden Markov Models (HMMs)

A Markov model drives the world, but it is hidden from direct observation and its state must be inferred from a set of observables.
Voice recognition
  Observables: sound waves; Hidden states: words
Intron/exon prediction
  Observables: nucleotide sequence; Hidden states: exon, intron, non-coding
Secondary structure prediction for proteins
  Observables: amino acid sequence; Hidden states: alpha helix, beta sheet, unstructured

Hidden Markov Models: Example

Secondary structure prediction
  Observable states: the amino acids (His, Asp, Arg, Phe, Ser, Ala, Tyr, Cys, Thr, Ser, Ile, Gln, Trp, Glu, Pro, Lys, Gly, Leu, Met, Asn, Val)
  Hidden states: Alpha Helix, Beta Sheet, Unstructured

Hidden Markov Models: Smaller Example (Exon/Intron Mapping)

Observable states: the nucleotides A, T, G, C, emitted with state-specific probabilities P(A|Ex), P(T|Ex), ..., P(C|It)
Hidden states: Exon, Intergenic, Intron, linked by transition probabilities P(Ex|Ex), P(In|Ex), P(Ex|Ig), P(It|It), ...

[Diagram: the three hidden states connected by their transition probabilities, each emitting A/T/G/C with the emission probabilities above.]

Hidden Markov Models: Smaller Example (Exon/Intron Mapping)

Hidden state transition probabilities (rows = from, columns = to):
        Ex     Ig     It
  Ex    0.7    0.1    0.2
  Ig    0.49   0.5    0.01
  It    0.18   0.02   0.8

Observable state (emission) probabilities (rows = hidden state):
        A      T      G      C
  Ex    0.33   0.42   0.11   0.14
  Ig    0.25   0.25   0.25   0.25
  It    0.14   0.16   0.5    0.2

Starting distribution:
  Ex 0.1    Ig 0.89    It 0.01

Hidden Markov Model

How to predict outcomes from an HMM:
  Brute force: try every possible Markov chain
    Which chain has the greatest probability of generating the observed data?
  Viterbi algorithm
    A dynamic programming approach

Viterbi Algorithm: Trace

Example sequence: ATAATGGCGAGATG
(Hidden state transition, emission, and starting probabilities as on the previous slide.)

Initialization on the first observable, A:
  Exon       = P(A|Ex) * Start(Ex) = 0.33 * 0.1  = 3.3*10^-2
  Intergenic = P(A|Ig) * Start(Ig) = 0.25 * 0.89 = 2.2*10^-1
  Intron     = P(A|It) * Start(It) = 0.14 * 0.01 = 1.4*10^-3

Viterbi Algorithm: Trace

Example sequence: ATAATGGCGAGATG (parameters as before)

Step 2, observable T:
  Exon       = max( P(Ex|Ex)*Pn-1(Ex), P(Ex|Ig)*Pn-1(Ig), P(Ex|It)*Pn-1(It) ) * P(T|Ex) = 4.6*10^-2
  Intergenic = max( P(Ig|Ex)*Pn-1(Ex), P(Ig|Ig)*Pn-1(Ig), P(Ig|It)*Pn-1(It) ) * P(T|Ig) = 2.8*10^-2
  Intron     = max( P(It|Ex)*Pn-1(Ex), P(It|Ig)*Pn-1(Ig), P(It|It)*Pn-1(It) ) * P(T|It) = 1.1*10^-3

Running table:
              A          T
  Exon        3.3*10^-2  4.6*10^-2
  Intergenic  2.2*10^-1  2.8*10^-2
  Intron      1.4*10^-3  1.1*10^-3

Viterbi Algorithm: Trace

Example sequence: ATAATGGCGAGATG (parameters as before)

Step 3, observable A:
  Exon       = max( P(Ex|Ex)*Pn-1(Ex), P(Ex|Ig)*Pn-1(Ig), P(Ex|It)*Pn-1(It) ) * P(A|Ex) = 1.1*10^-2
  Intergenic = max( P(Ig|Ex)*Pn-1(Ex), P(Ig|Ig)*Pn-1(Ig), P(Ig|It)*Pn-1(It) ) * P(A|Ig) = 3.5*10^-3
  Intron     = max( P(It|Ex)*Pn-1(Ex), P(It|Ig)*Pn-1(Ig), P(It|It)*Pn-1(It) ) * P(A|It) = 1.3*10^-3

Running table:
              A          T          A
  Exon        3.3*10^-2  4.6*10^-2  1.1*10^-2
  Intergenic  2.2*10^-1  2.8*10^-2  3.5*10^-3
  Intron      1.4*10^-3  1.1*10^-3  1.3*10^-3

Viterbi Algorithm: Trace

Example sequence: ATAATGGCGAGATG (parameters as before)

Step 4, observable A:
  Exon       = max( P(Ex|Ex)*Pn-1(Ex), P(Ex|Ig)*Pn-1(Ig), P(Ex|It)*Pn-1(It) ) * P(A|Ex) = 2.4*10^-3
  Intergenic = max( P(Ig|Ex)*Pn-1(Ex), P(Ig|Ig)*Pn-1(Ig), P(Ig|It)*Pn-1(It) ) * P(A|Ig) = 4.3*10^-4
  Intron     = max( P(It|Ex)*Pn-1(Ex), P(It|Ig)*Pn-1(Ig), P(It|It)*Pn-1(It) ) * P(A|It) = 2.9*10^-4

Viterbi Algorithm: Trace

Example sequence: ATAATGGCGAGATG (parameters as before)

Step 5, observable T:
  Exon       = max( P(Ex|Ex)*Pn-1(Ex), P(Ex|Ig)*Pn-1(Ig), P(Ex|It)*Pn-1(It) ) * P(T|Ex) = 7.2*10^-4
  Intergenic = max( P(Ig|Ex)*Pn-1(Ex), P(Ig|Ig)*Pn-1(Ig), P(Ig|It)*Pn-1(It) ) * P(T|Ig) = 6.1*10^-5
  Intron     = max( P(It|Ex)*Pn-1(Ex), P(It|Ig)*Pn-1(Ig), P(It|It)*Pn-1(It) ) * P(T|It) = 7.8*10^-5

Viterbi Algorithm: Trace

Example sequence: ATAATGGCGAGATG (parameters as before)

Step 6, observable G:
  Exon       = max( P(Ex|Ex)*Pn-1(Ex), P(Ex|Ig)*Pn-1(Ig), P(Ex|It)*Pn-1(It) ) * P(G|Ex) = 5.5*10^-5
  Intergenic = max( P(Ig|Ex)*Pn-1(Ex), P(Ig|Ig)*Pn-1(Ig), P(Ig|It)*Pn-1(It) ) * P(G|Ig) = 1.8*10^-5
  Intron     = max( P(It|Ex)*Pn-1(Ex), P(It|Ig)*Pn-1(Ig), P(It|It)*Pn-1(It) ) * P(G|It) = 7.2*10^-5

Viterbi Algorithm: Trace

Example sequence: ATAATGGCGAGATG (parameters as before)

Step 7, observable G:
  Exon       = max( P(Ex|Ex)*Pn-1(Ex), P(Ex|Ig)*Pn-1(Ig), P(Ex|It)*Pn-1(It) ) * P(G|Ex) = 4.3*10^-6
  Intergenic = max( P(Ig|Ex)*Pn-1(Ex), P(Ig|Ig)*Pn-1(Ig), P(Ig|It)*Pn-1(It) ) * P(G|Ig) = 2.2*10^-6
  Intron     = max( P(It|Ex)*Pn-1(Ex), P(It|Ig)*Pn-1(Ig), P(It|It)*Pn-1(It) ) * P(G|It) = 2.9*10^-5

Viterbi Algorithm: Trace

Example sequence: ATAATGGCGAGATG (parameters as before)

Carrying the recursion through the rest of the sequence gives the completed table of best-path scores (columns follow the sequence A T A A T G G C G A G A T G):
  Exon:       3.3*10^-2, 4.6*10^-2, 1.1*10^-2, 2.4*10^-3, 7.2*10^-4, 5.5*10^-5, 4.3*10^-6, 7.2*10^-7, 9.1*10^-8, 1.1*10^-7, 8.4*10^-9, 4.9*10^-9, 1.4*10^-9, 1.1*10^-10
  Intergenic: 2.2*10^-1, 2.8*10^-2, 3.5*10^-3, 4.3*10^-4, 6.1*10^-5, 1.8*10^-5, 2.2*10^-6, 2.8*10^-7, 3.5*10^-8, 9.1*10^-9, 2.7*10^-9, 4.1*10^-10, 1.2*10^-10, 3.6*10^-11
  Intron:     1.4*10^-3, 1.1*10^-3, 1.3*10^-3, 2.9*10^-4, 7.8*10^-5, 7.2*10^-5, 2.9*10^-5, 4.6*10^-6, 1.8*10^-6, 2.0*10^-7, 8.2*10^-8, 9.2*10^-9, 1.2*10^-9, 4.7*10^-10

At each position:
  Exon       = max( P(Ex|Ex)*Pn-1(Ex), P(Ex|Ig)*Pn-1(Ig), P(Ex|It)*Pn-1(It) ) * P(obs|Ex)
  Intergenic = max( P(Ig|Ex)*Pn-1(Ex), P(Ig|Ig)*Pn-1(Ig), P(Ig|It)*Pn-1(It) ) * P(obs|Ig)
  Intron     = max( P(It|Ex)*Pn-1(Ex), P(It|Ig)*Pn-1(Ig), P(It|It)*Pn-1(It) ) * P(obs|It)
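
For reference, here is a minimal Python sketch of the Viterbi recursion using the transition, emission, and starting probabilities above; its per-position scores reproduce the table (up to rounding), and the traceback returns the most probable hidden-state path:

```python
# Minimal sketch of the Viterbi recursion, using the transition, emission, and
# starting probabilities from the slides above.
states = ["Ex", "Ig", "It"]
start = {"Ex": 0.1, "Ig": 0.89, "It": 0.01}
trans = {  # trans[from_state][to_state]
    "Ex": {"Ex": 0.7,  "Ig": 0.1,  "It": 0.2},
    "Ig": {"Ex": 0.49, "Ig": 0.5,  "It": 0.01},
    "It": {"Ex": 0.18, "Ig": 0.02, "It": 0.8},
}
emit = {  # emit[state][nucleotide]
    "Ex": {"A": 0.33, "T": 0.42, "G": 0.11, "C": 0.14},
    "Ig": {"A": 0.25, "T": 0.25, "G": 0.25, "C": 0.25},
    "It": {"A": 0.14, "T": 0.16, "G": 0.5,  "C": 0.2},
}

def viterbi(sequence):
    # score[s] = probability of the best path that ends in state s here
    score = {s: start[s] * emit[s][sequence[0]] for s in states}
    backpointers = []
    for obs in sequence[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: prev[p] * trans[p][s])
            score[s] = prev[best_prev] * trans[best_prev][s] * emit[s][obs]
            pointers[s] = best_prev
        backpointers.append(pointers)
    # Trace back from the best final state to recover the best path
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return list(reversed(path)), score

path, final_scores = viterbi("ATAATGGCGAGATG")
print(path)           # most probable hidden-state labeling of the sequence
print(final_scores)   # best-path scores for each final state (cf. the table above)
```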

Hidden Markov Models

How to train an HMM:
  The forward-backward algorithm
  Ugly probability theory math:
    Start with an initial guess of the parameters
    Refine the parameters by attempting to reduce the errors they produce when fitted to the data
    Each state's weight is the normalized product of the forward probability of arriving at that state given the observables so far and the backward probability of generating the remaining observables from that state
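
A minimal sketch of the forward-backward pass that this training procedure is built around is below; it computes, for each position, the normalized product of the forward and backward probabilities (the posterior weight of each hidden state). The full Baum-Welch parameter re-estimation step is omitted:

```python
# Minimal forward-backward sketch on the exon/intron HMM parameters above.
states = ["Ex", "Ig", "It"]
start = {"Ex": 0.1, "Ig": 0.89, "It": 0.01}
trans = {"Ex": {"Ex": 0.7, "Ig": 0.1, "It": 0.2},
         "Ig": {"Ex": 0.49, "Ig": 0.5, "It": 0.01},
         "It": {"Ex": 0.18, "Ig": 0.02, "It": 0.8}}
emit = {"Ex": {"A": 0.33, "T": 0.42, "G": 0.11, "C": 0.14},
        "Ig": {"A": 0.25, "T": 0.25, "G": 0.25, "C": 0.25},
        "It": {"A": 0.14, "T": 0.16, "G": 0.5, "C": 0.2}}

def forward_backward(sequence):
    # Forward pass: probability of the observables so far, ending in state s
    fwd = [{s: start[s] * emit[s][sequence[0]] for s in states}]
    for obs in sequence[1:]:
        prev = fwd[-1]
        fwd.append({s: emit[s][obs] * sum(prev[p] * trans[p][s] for p in states)
                    for s in states})
    # Backward pass: probability of the remaining observables, starting from s
    bwd = [{s: 1.0 for s in states}]
    for obs in reversed(sequence[1:]):
        nxt = bwd[0]
        bwd.insert(0, {s: sum(trans[s][t] * emit[t][obs] * nxt[t] for t in states)
                       for s in states})
    # Posterior: normalized product of forward and backward probabilities
    posteriors = []
    for f, b in zip(fwd, bwd):
        total = sum(f[s] * b[s] for s in states)
        posteriors.append({s: f[s] * b[s] / total for s in states})
    return posteriors

for position, posterior in enumerate(forward_backward("ATAATGGCGAGATG")):
    print(position, posterior)
```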

The Algorithms

Bayesian Networks
Hidden Markov Models
Genetic Algorithms
Neural Networks

Genetic Algorithms

Individuals are strings of bits that represent candidate solutions:
  Functions, structures, images, code
Based on Darwinian evolution:
  Individuals mate, mutate, and are selected by a fitness function

Genetic Algorithms

Encoding rules
  Gray-code bit encoding
    Bit distance proportional to value distance
Selection rules
  Digital vs. analog threshold
  Linear amplification vs. weighted amplification
Mating rules
  Mutation parameters
  Recombination parameters

Genetic Algorithms

When are they useful?
  When movements in sequence space are funnel-shaped with respect to the fitness function
  Systems where evolution actually applies!
Examples:
  Medicinal chemistry, protein folding, amino acid substitutions, membrane trafficking modeling, ecological simulations, linear programming, traveling salesman
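
A minimal sketch of the mate/mutate/select loop is below. The bit-counting fitness function, population size, and mutation rate are toy choices for illustration, not anything from the slides:

```python
import random

# Minimal sketch of a genetic algorithm: individuals are bit strings that
# mate (recombine), mutate, and are selected by a fitness function.
GENOME_LEN, POP_SIZE, GENERATIONS, MUTATION_RATE = 20, 30, 50, 0.02

def fitness(individual):
    return sum(individual)                      # toy objective: maximize the number of 1-bits

def mate(mom, dad):
    cut = random.randrange(1, GENOME_LEN)       # single-point recombination
    child = mom[:cut] + dad[cut:]
    # Point mutations: flip each bit with probability MUTATION_RATE
    return [bit ^ 1 if random.random() < MUTATION_RATE else bit for bit in child]

population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
              for _ in range(POP_SIZE)]

for generation in range(GENERATIONS):
    # Selection: keep the fitter half of the population as parents
    parents = sorted(population, key=fitness, reverse=True)[:POP_SIZE // 2]
    # Mating: refill the population from randomly paired parents
    population = parents + [mate(random.choice(parents), random.choice(parents))
                            for _ in range(POP_SIZE - len(parents))]

best = max(population, key=fitness)
print(fitness(best), best)
```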

The Algorithms

Bayesian Networks
Hidden Markov Models
Genetic Algorithms
Neural Networks

Neural Networks

1943: McCulloch and Pitts propose a model of how neurons process information.
The field immediately splits:
  Studying brains (neurology)
  Studying artificial intelligence (neural networks)

Neural Networks: A Neuron, Node, or Unit

[Diagram: a unit receives inputs weighted by Wa,c and Wb,c, sums them, subtracts the bias weight W0,c to give z = sum(w * input) - W0,c, passes z through an activation function, and sends its output forward along weight Wc,n.]

Neural Networks: Activation Functions

Threshold function: the output jumps from 0 to +1 at the threshold
Sigmoid (logistic) function: the output rises smoothly from 0 to +1
The zero point is set by the bias
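
A minimal sketch of the two activation functions, with the bias setting the zero point (the parameterization here is illustrative):

```python
import math

# Minimal sketch of the two activation functions; the bias shifts the zero
# point where the unit "turns on".
def threshold(z, bias=0.0):
    return 1.0 if z - bias > 0 else 0.0

def sigmoid(z, bias=0.0):
    return 1.0 / (1.0 + math.exp(-(z - bias)))

for z in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(z, threshold(z), round(sigmoid(z), 3))
```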

Threshold Functions can make Logic Gates with Neurons!

Logical AND
  Weights: Wa,c = 1, Wb,c = 1; bias W0,c = 1.5
  Rule: if sum(w * input) - W0,c > 0 then FIRE, else don't

  Truth table (A AND B):
         A=1  A=0
  B=1     1    0
  B=0     0    0

And Gate: Trace

Inputs: a = Off (0), b = Off (0)
sum(w * input) - W0,c = 0 + 0 - 1.5 = -1.5
-1.5 < 0, so the unit does not fire: output = Off

And Gate: Trace

Inputs: a = On (1), b = Off (0)
sum(w * input) - W0,c = 1 + 0 - 1.5 = -0.5
-0.5 < 0, so the unit does not fire: output = Off

And Gate: Trace

Inputs: a = Off (0), b = On (1)
sum(w * input) - W0,c = 0 + 1 - 1.5 = -0.5
-0.5 < 0, so the unit does not fire: output = Off

And Gate: Trace

Inputs: a = On (1), b = On (1)
sum(w * input) - W0,c = 1 + 1 - 1.5 = 0.5
0.5 > 0, so the unit fires: output = On

Threshold Functions can make Logic Gates with Neurons!

Logical OR
  Weights: Wa,c = 1, Wb,c = 1; bias W0,c = 0.5
  Rule: if sum(w * input) - W0,c > 0 then FIRE, else don't

  Truth table (A OR B):
         A=1  A=0
  B=1     1    1
  B=0     1    0

Or Gate: Trace

Inputs: a = Off (0), b = Off (0)
sum(w * input) - W0,c = 0 + 0 - 0.5 = -0.5
-0.5 < 0, so the unit does not fire: output = Off

Or Gate: Trace

Inputs: a = On (1), b = Off (0)
sum(w * input) - W0,c = 1 + 0 - 0.5 = 0.5
0.5 > 0, so the unit fires: output = On

Or Gate: Trace

Inputs: a = Off (0), b = On (1)
sum(w * input) - W0,c = 0 + 1 - 0.5 = 0.5
0.5 > 0, so the unit fires: output = On

Or Gate: Trace

Inputs: a = On (1), b = On (1)
sum(w * input) - W0,c = 1 + 1 - 0.5 = 1.5
1.5 > 0, so the unit fires: output = On

Threshold Functions can make Logic Gates with Neurons!

Logical NOT
  Weight: Wa,c = -1; bias W0,c = -0.5
  Rule: if sum(w * input) - W0,c > 0 then FIRE, else don't

  Truth table (NOT A):
  A=1 -> 0
  A=0 -> 1

Not Gate: Trace

Input: a = Off (0)
sum(w * input) - W0,c = (-1)(0) - (-0.5) = 0.5
0.5 > 0, so the unit fires: output = On

Not Gate: Trace

Input: a = On (1)
sum(w * input) - W0,c = (-1)(1) - (-0.5) = -0.5
-0.5 < 0, so the unit does not fire: output = Off
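
Putting the three gates together, here is a minimal sketch of a threshold unit with the slide's weights and biases; the `unit` helper is just an illustrative convenience:

```python
# Minimal sketch of the slide's threshold units: a unit fires (outputs 1) when
# the weighted sum of its inputs minus the bias weight W0 is greater than zero.
def unit(weights, bias):
    return lambda *inputs: 1 if sum(w * x for w, x in zip(weights, inputs)) - bias > 0 else 0

AND = unit([1, 1], 1.5)     # Wa,c = Wb,c = 1, W0,c = 1.5
OR  = unit([1, 1], 0.5)     # Wa,c = Wb,c = 1, W0,c = 0.5
NOT = unit([-1], -0.5)      # Wa,c = -1,       W0,c = -0.5

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))
    print("NOT", a, "=", NOT(a))
```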

Feed-Forward vs. Recurrent Networks

Feed-forward
  No cyclic connections
  A function of its current inputs only
  No internal state other than the weights of the connections
  "Out of time"
Recurrent
  Cyclic connections
  Dynamic behavior: stable, oscillatory, or chaotic
  Response depends on the current state
  "In time"
  Short-term memory!

Feed-Forward Networks

Knowledge is represented by the weights on the edges
  Model-less!
Learning consists of adjusting the weights
Customary arrangements:
  One Boolean output for each value
  Arranged in layers:
    Layer 1 = inputs
    Layers 2 to (n-1) = hidden
    Layer n = outputs
  Perceptron: a 2-layer feed-forward network

[Diagram: input layer, hidden layer, output layer.]

Perceptron Learning

Gradient descent is used to reduce the error.
Essentially:
  New weight = old weight + adjustment
  Adjustment = (learning rate) * error * input * d(activation function)
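
A minimal sketch of this update rule, assuming a sigmoid activation so the derivative term is well defined, trained here on the OR truth table as a toy example:

```python
import math, random

# Minimal sketch of the perceptron weight-update rule from the slide:
#   new_weight = old_weight + learning_rate * error * input * d(activation)
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=5000, learning_rate=0.5):
    n_inputs = len(examples[0][0])
    weights = [random.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]  # last entry = bias
    for _ in range(epochs):
        for inputs, target in examples:
            z = sum(w * x for w, x in zip(weights, inputs)) - weights[-1]
            out = sigmoid(z)
            error = target - out
            grad = out * (1.0 - out)                       # d(sigmoid)/dz
            for i, x in enumerate(inputs):
                weights[i] += learning_rate * error * x * grad
            weights[-1] += learning_rate * error * (-1.0) * grad   # bias "input" is -1
    return weights

# Toy example: learn the OR function from its truth table.
or_examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = train(or_examples)
print([round(sigmoid(sum(wi * x for wi, x in zip(w, inp)) - w[-1])) for inp, _ in or_examples])
```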

Hidden Network Learning

Back-propagation
Essentially:
  Start with gradient descent from the output
  Assign blame to the inputting neurons in proportion to their weights
  Adjust the weights at the previous level using gradient descent based on that blame
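
A minimal back-propagation sketch for a single hidden layer, illustrating the blame assignment; the network size, learning rate, and XOR test case are illustrative choices, not from the slides:

```python
import math, random

# Minimal back-propagation sketch for one hidden layer: gradient descent at
# the output unit, then "blame" passed back to each hidden unit in proportion
# to its outgoing weight.
random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

N_IN, N_HID, LR = 2, 3, 0.5
w_hid = [[random.uniform(-1, 1) for _ in range(N_IN + 1)] for _ in range(N_HID)]  # last weight = bias
w_out = [random.uniform(-1, 1) for _ in range(N_HID + 1)]                          # last weight = bias

def forward(x):
    hidden = [sigmoid(sum(w[i] * xi for i, xi in enumerate(x)) + w[-1]) for w in w_hid]
    output = sigmoid(sum(w_out[j] * h for j, h in enumerate(hidden)) + w_out[-1])
    return hidden, output

examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]   # XOR
for _ in range(20000):
    for x, target in examples:
        hidden, out = forward(x)
        delta_out = (target - out) * out * (1 - out)          # gradient descent at the output
        delta_hid = [delta_out * w_out[j] * h * (1 - h)       # blame proportional to outgoing weight
                     for j, h in enumerate(hidden)]
        for j, h in enumerate(hidden):
            w_out[j] += LR * delta_out * h
        w_out[-1] += LR * delta_out
        for j in range(N_HID):
            for i, xi in enumerate(x):
                w_hid[j][i] += LR * delta_hid[j] * xi
            w_hid[j][-1] += LR * delta_hid[j]

print([round(forward(x)[1], 2) for x, _ in examples])         # typically approaches 0, 1, 1, 0
```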

They don't get it either: Issues that aren't well understood

Learning rate
Depth of the network (number of layers)
Size of the hidden layers
  Overfitting; cross-validation
Minimum connectivity
  Optimal Brain Damage algorithm
No extractable model!

How Are Neural Nets Different From My Brain?

1. Neural nets are feed-forward
   Brains can be recurrent, with feedback loops
2. Neural nets do not distinguish between + and - connections
   In brains, excitatory and inhibitory neurons have different properties
   Inhibitory neurons are short-distance (Fraser's Rules)
3. Neural nets exist "out of time"
   Our brains clearly do exist in time
   We have very little idea how our brains are learning
4. Neural nets learn VERY differently
   In theory one can, of course, implement biologically realistic neural networks, but this is a mammoth task. All kinds of details have to be gotten right, or you end up with a network that completely decays to unconnectedness, or one that ramps up its connections until it basically has a seizure.

Frontiers in AI

Applications of current algorithms
New algorithms for determining parameters from training data
  Forward-backward, backpropagation
Better classification of the mysteries of neural networks
Pathology modeling in neural networks
Evolutionary modeling
