Artificial Intelligence and Learning Algorithms: Presented by Brian M. Frezza 12/1/05
Game Plan
What's a Learning Algorithm? Why should I care?
Biological parallels
Real-World Examples
Getting our hands dirty with the algorithms
Bayesian Networks, Hidden Markov Models, Genetic Algorithms, Neural Networks
Frontiers in AI
Hard Math
Why do I care?
Use In Informatics
Predict trends in fuzzy data
Subtle patterns in data, complex patterns in data, noisy data
Street Smarts
CMU's Navlab-5 ("No Hands Across America")
1995: neural-network-driven car, Pittsburgh to San Diego: 2,797 miles (98.2% driven autonomously). Single-hidden-layer backpropagation network!
Protein secondary structure prediction, intron/exon prediction, protein/gene network inference, speech recognition, face recognition
The Algorithms
Bayesian Networks
Keep track of the likelihood of each model being accurate as data becomes available
Bayesian Network: Trace
Hypotheses: Ha: 100% Redhead; Hb: 50% Redhead, 50% Not; Hc: 100% Not
Initially clueless, so P(Ha) = P(Hb) = P(Hc) = 1/3

History (Redhead / Not)    P(Ha)    P(Hb)    P(Hc)
0 / 0                      1/3      1/3      1/3
1 / 0                      1/2      1/2      0
2 / 0                      3/4      1/4      0
3 / 0                      7/8      1/8      0
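A minimal sketch of this update loop in Python (hypothetical code, not from the talk): each observation reweights every hypothesis by its likelihood of producing that observation, and the weights are then renormalized. Note that a strict renormalization at every step gives slightly different intermediate numbers than the simplified trace above.

```python
# Sequential Bayesian updating over three competing hypotheses.
# Each hypothesis states the probability that a sampled person is a redhead.
likelihood_redhead = {"Ha": 1.0, "Hb": 0.5, "Hc": 0.0}

# Initially clueless: equal priors.
posterior = {"Ha": 1/3, "Hb": 1/3, "Hc": 1/3}

def update(posterior, saw_redhead):
    """Reweight each hypothesis by its likelihood of the observation, then renormalize."""
    new = {}
    for h, p in posterior.items():
        like = likelihood_redhead[h] if saw_redhead else 1.0 - likelihood_redhead[h]
        new[h] = p * like
    total = sum(new.values())
    return {h: p / total for h, p in new.items()}

# Observe three redheads in a row and watch P(Ha) climb.
for _ in range(3):
    posterior = update(posterior, saw_redhead=True)
    print(posterior)
```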
The Algorithms
Hidden Markov Models (HMMs)
HMMs also assume a model of the world working behind the data
Models are also extractable
Common uses: speech recognition, secondary structure prediction, intron/exon prediction, categorization of data
Markov Model: Trace
States Q1-Q4; transition probabilities from the diagram: P1 = 0.6, P2 = 0.2, P3 = 0.9, P4 = 0.4
At each time step, draw a random number and use it to choose the next transition:
Time Step = 1 (from Q1): random number 0.22341; 0.22341 < P1, so take P1 -> next state Q2
Time Step = 2 (from Q2): random number 0.64357; no choice, P = 1 -> next state Q3
Time Step = 3 (from Q3): random number 0.97412; 0.97412 > 0.9, so take 1 - P3 -> next state Q4
Time Step = 4 (from Q4): choose between P4 and 1 - P4
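A minimal sketch of this kind of random-walk trace in Python. The slide's diagram fixes P1 = 0.6, P2 = 0.2, P3 = 0.9, P4 = 0.4 but not every edge, so the transition table below is partly an illustrative guess.

```python
import random

# Hypothetical transition table for a four-state Markov model (Q1..Q4).
# Each row maps a state to its outgoing transition probabilities, which sum to 1.
transitions = {
    "Q1": {"Q2": 0.6, "Q3": 0.2, "Q1": 0.2},   # P1 = 0.6, P2 = 0.2, 1-P1-P2 = 0.2
    "Q2": {"Q3": 1.0},                          # no choice, P = 1
    "Q3": {"Q3": 0.9, "Q4": 0.1},               # P3 = 0.9, 1-P3 = 0.1
    "Q4": {"Q1": 0.4, "Q4": 0.6},               # P4 = 0.4, 1-P4 = 0.6 (guessed topology)
}

def step(state):
    """Draw one random number and walk the cumulative transition probabilities."""
    r = random.random()
    cumulative = 0.0
    for next_state, p in transitions[state].items():
        cumulative += p
        if r < cumulative:
            return next_state
    return next_state  # guard against floating-point round-off

state = "Q1"
for t in range(1, 5):
    state = step(state)
    print(f"Time step {t}: state = {state}")
```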
Metropolis-Hastings
Determining thermodynamic equilibrium
Intron/Exon prediction
Observable: nucleotide sequence
Hidden states: Exon, Intron, Non-coding
[HMM diagram for secondary structure prediction: hidden states Alpha Helix, Beta Sheet, and Unstructured behind the observable states.]
[HMM diagram for intron/exon prediction: observable states are the nucleotides, emitted from the hidden states with probabilities such as P(G|Ex), P(G|It), P(C|Ex), P(C|It), P(T|It); hidden states Exon (Ex), Intron (It), and Intergenic (Ig) are connected by transition probabilities such as P(Ex|Ex), P(Ex|Ig), P(Ex|It), P(It|Ig), P(Ig|Ig), P(It|It), arranged in a from-state by to-state table. Starting distribution: Ex 0.1, Ig 0.89, It 0.01.]
Viterbi algorithm
Dynamic programming approach
Viterbi: Trace (intron/exon HMM)
Starting distribution: Ex 0.1, Ig 0.89, It 0.01
Emission probabilities (partial, per hidden state Ex / Ig / It): P(G|·) = 0.11 / 0.25 / 0.5; P(C|·) = 0.14 / 0.25 / 0.2
First observed base (A): fill the first column directly from the starting distribution:
Exon = P(A|Ex) * Start(Ex) = 3.3*10^-2
Intergenic = P(A|Ig) * Start(Ig) = 2.2*10^-1
Intron = P(A|It) * Start(It) = 0.14 * 0.01 = 1.4*10^-3
Each later base: keep, for each hidden state, only the most probable way of arriving there:
Exon = Max( P(Ex|Ex)*Pn-1(Ex), P(Ex|Ig)*Pn-1(Ig), P(Ex|It)*Pn-1(It) ) * P(base|Ex)
Intergenic = Max( P(Ig|Ex)*Pn-1(Ex), P(Ig|Ig)*Pn-1(Ig), P(Ig|It)*Pn-1(It) ) * P(base|Ig)
Intron = Max( P(It|Ex)*Pn-1(Ex), P(It|Ig)*Pn-1(Ig), P(It|It)*Pn-1(It) ) * P(base|It)
The next two positions, worked the same way, give Ex / Ig / It = 4.6*10^-2 / 2.8*10^-2 / 1.1*10^-3 and 1.1*10^-2 / 3.5*10^-3 / 1.3*10^-3.
Filling the table column by column across the observed sequence gives, for example:
Ex: 2.4*10^-3, 7.2*10^-4, 5.5*10^-5, 4.3*10^-6, 7.2*10^-7, 9.1*10^-8, 1.1*10^-7, 8.4*10^-9, 4.9*10^-9, 1.4*10^-9, 1.1*10^-10
Ig: 4.3*10^-4, 6.1*10^-5, 1.8*10^-5, 2.2*10^-6, 2.8*10^-7, 3.5*10^-8, 9.1*10^-9, 2.7*10^-9, 4.1*10^-10, 1.2*10^-10, 3.6*10^-11
It: 2.9*10^-4, 7.8*10^-5, 7.2*10^-5, 2.9*10^-5, 4.6*10^-6, 1.8*10^-6, 2.0*10^-7, 8.2*10^-8, 9.2*10^-9, 1.2*10^-9, 4.7*10^-10
The most probable hidden state in the final column, traced back through the Max choices, gives the most likely exon/intron/intergenic annotation of the sequence.
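A minimal Viterbi sketch in Python over the same three hidden states. The starting distribution matches the slides; the transition table, most of the emission table, and the example sequence are illustrative placeholders.

```python
# Hidden states and starting distribution from the slides.
states = ["Ex", "Ig", "It"]
start = {"Ex": 0.1, "Ig": 0.89, "It": 0.01}

# Hypothetical transition and emission tables (placeholders for illustration).
trans = {
    "Ex": {"Ex": 0.7, "Ig": 0.2, "It": 0.1},
    "Ig": {"Ex": 0.1, "Ig": 0.8, "It": 0.1},
    "It": {"Ex": 0.2, "Ig": 0.1, "It": 0.7},
}
emit = {
    "Ex": {"A": 0.33, "C": 0.14, "G": 0.11, "T": 0.42},
    "Ig": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "It": {"A": 0.14, "C": 0.20, "G": 0.50, "T": 0.16},
}

def viterbi(sequence):
    """Fill the dynamic-programming table and trace back the best hidden-state path."""
    # table[n][s] = probability of the best path ending in state s at position n
    table = [{s: start[s] * emit[s][sequence[0]] for s in states}]
    backpointer = [{}]
    for base in sequence[1:]:
        column, pointers = {}, {}
        for s in states:
            prev, p = max(((r, table[-1][r] * trans[r][s]) for r in states),
                          key=lambda pair: pair[1])
            column[s] = p * emit[s][base]
            pointers[s] = prev
        table.append(column)
        backpointer.append(pointers)
    # Trace back from the most probable final state.
    best = max(states, key=lambda s: table[-1][s])
    path = [best]
    for pointers in reversed(backpointer[1:]):
        path.append(pointers[path[-1]])
    return list(reversed(path)), table[-1][best]

path, p = viterbi("ATGGCGAGAT")   # hypothetical observed sequence
print(path, p)
```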
Training HMM parameters (forward-backward)
Starts with an initial guess of the parameters, then refines them by attempting to reduce the errors the model makes when fitted to the data.
Each state's contribution is the normalized product of the forward probability of arriving at that state given the observables and the backward probability of generating the remaining observables from it; the parameters are re-estimated from these quantities.
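A minimal sketch of that forward/backward product in Python for a toy two-state HMM (all numbers illustrative): gamma[i] is the normalized probability of each hidden state at position i given the whole observed sequence, the quantity the re-estimation step is built from.

```python
# Forward-backward pass for a toy two-state HMM: the posterior for each hidden
# state is the normalized product of the forward probability of reaching that
# state and the backward probability of generating the remaining observables.
states = ["Ex", "Ig"]
start = {"Ex": 0.1, "Ig": 0.9}
trans = {"Ex": {"Ex": 0.8, "Ig": 0.2}, "Ig": {"Ex": 0.1, "Ig": 0.9}}
emit = {"Ex": {"G": 0.4, "C": 0.4, "A": 0.1, "T": 0.1},
        "Ig": {"G": 0.25, "C": 0.25, "A": 0.25, "T": 0.25}}

def posteriors(seq):
    # Forward: probability of the observables up to position i, ending in state s.
    fwd = [{s: start[s] * emit[s][seq[0]] for s in states}]
    for base in seq[1:]:
        fwd.append({s: sum(fwd[-1][r] * trans[r][s] for r in states) * emit[s][base]
                    for s in states})
    # Backward: probability of the observables after position i, starting from state s.
    bwd = [{s: 1.0 for s in states}]
    for base in reversed(seq[1:]):
        bwd.insert(0, {s: sum(trans[s][r] * emit[r][base] * bwd[0][r] for r in states)
                       for s in states})
    # Normalized product at each position (the quantity described above).
    gammas = []
    for f, b in zip(fwd, bwd):
        total = sum(f[s] * b[s] for s in states)
        gammas.append({s: f[s] * b[s] / total for s in states})
    return gammas

for i, g in enumerate(posteriors("GCAT")):
    print(i, g)
```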
The Algorithms
Genetic Algorithms
Individuals are strings of bits that represent candidate solutions
Functions, structures, images, code
Genetic Algorithms
Encoding Rules
Gray bit encoding
Bit distance proportional to value distance
Selection Rules
Digital / analog threshold
Linear amplification vs. weighted amplification
Mating Rules
Mutation parameters
Recombination parameters
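A minimal genetic-algorithm sketch in Python tying these rules together. The fitness function (count the ones in the bit string), the threshold selection rule, and the mutation/recombination rates are illustrative choices, not from the talk.

```python
import random

BITS, POP, GENERATIONS = 16, 20, 30
MUTATION_RATE = 0.02          # per-bit mutation parameter
RECOMBINATION_RATE = 0.7      # probability a pair recombines at a random point

def fitness(individual):
    """Toy fitness: count of ones in the candidate solution."""
    return sum(individual)

def mate(a, b):
    """Single-point recombination followed by per-bit mutation."""
    if random.random() < RECOMBINATION_RATE:
        point = random.randrange(1, BITS)
        child = a[:point] + b[point:]
    else:
        child = a[:]
    return [bit ^ 1 if random.random() < MUTATION_RATE else bit for bit in child]

population = [[random.randint(0, 1) for _ in range(BITS)] for _ in range(POP)]
for generation in range(GENERATIONS):
    # Selection rule: keep the fitter half of the population (a simple threshold).
    population.sort(key=fitness, reverse=True)
    parents = population[:POP // 2]
    # Mating rules: fill the next generation from randomly paired parents.
    population = parents + [mate(random.choice(parents), random.choice(parents))
                            for _ in range(POP - len(parents))]

best = max(population, key=fitness)
print("best individual:", best, "fitness:", fitness(best))
```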
Genetic Algorithms
When are they useful?
When movements in sequence space are funnel-shaped with respect to the fitness function
Systems where evolution actually applies!
Examples
Medicinal chemistry, protein folding, amino acid substitutions, membrane trafficking modeling, ecological simulations, linear programming, traveling salesman
The Algorithms
Neural Networks
1943: McCulloch and Pitts model of how neurons process information
Field immediately splits
Studying brains (Neurology)
[Neuron diagram: inputs reach node c through weights such as Wb,c, a bias weight W0,c feeds in, and the output continues forward through Wc,n.]
Logical AND
Weights: Wa,c = 1, Wb,c = 1; bias W0,c = 1.5
Rule: If ( Σ(w·input) − W0,c > 0 ) Then FIRE, Else Don't
Truth table (a, b -> z): 1,1 -> 1; 1,0 -> 0; 0,1 -> 0; 0,0 -> 0
AND Gate: Trace
a = Off, b = Off: 0 − 1.5 = −1.5 < 0 -> output Off
a = On, b = Off: 1 − 1.5 = −0.5 < 0 -> output Off
a = Off, b = On: 1 − 1.5 = −0.5 < 0 -> output Off
a = On, b = On: 2 − 1.5 = 0.5 > 0 -> output On (FIRE)
Logical OR
Weights: Wa,c = 1, Wb,c = 1; bias W0,c = 0.5
Rule: If ( Σ(w·input) − W0,c > 0 ) Then FIRE, Else Don't
Truth table (a, b -> z): 1,1 -> 1; 1,0 -> 1; 0,1 -> 1; 0,0 -> 0
OR Gate: Trace
a = Off, b = Off: 0 − 0.5 = −0.5 < 0 -> output Off
a = On, b = Off: 1 − 0.5 = 0.5 > 0 -> output On
a = Off, b = On: 1 − 0.5 = 0.5 > 0 -> output On
a = On, b = On: 2 − 0.5 = 1.5 > 0 -> output On
Logical NOT
Weights: Wa,c = −1; bias W0,c = −0.5
Rule: If ( Σ(w·input) − W0,c > 0 ) Then FIRE, Else Don't
Truth table (a -> z): 1 -> 0; 0 -> 1
NOT Gate: Trace
a = Off: 0 − (−0.5) = 0.5 > 0 -> output On
a = On: −1 − (−0.5) = −0.5 < 0 -> output Off
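A minimal sketch of these threshold units in Python, using the weights and bias values from the slides above:

```python
def fire(inputs, weights, bias):
    """McCulloch-Pitts style unit: FIRE if the weighted sum minus the bias weight exceeds 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total - bias > 0 else 0

AND = lambda a, b: fire([a, b], [1, 1], 1.5)    # W0,c = 1.5
OR  = lambda a, b: fire([a, b], [1, 1], 0.5)    # W0,c = 0.5
NOT = lambda a:    fire([a],    [-1],  -0.5)    # W0,c = -0.5

for a in (0, 1):
    for b in (0, 1):
        print(f"a={a} b={b}  AND={AND(a, b)}  OR={OR(a, b)}")
    print(f"a={a}  NOT={NOT(a)}")
```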
Recurrent Networks
Cyclic connections -> dynamic behavior
Stable, oscillatory, or chaotic
Feed-Forward Networks
Knowledge is represented by the weights on the edges
Modeless!
Layers: input, hidden layer(s), output
Perceptron Learning
Gradient descent is used to reduce the error
Essentially:
New Weight = Old Weight + Adjustment
Adjustment = α × error × input × d(activation function)
α = learning rate
Backpropagation, essentially:
Start with gradient descent at the output layer
Assign blame to the inputting neurons in proportion to their weights
Adjust the weights at the previous layer using gradient descent, based on that blame
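A minimal single-hidden-layer backpropagation sketch in Python/NumPy: gradient descent at the output layer, blame passed back through the output weights, and the same update rule applied one layer down. The XOR task, layer sizes, learning rate, and epoch count are illustrative choices, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # inputs
T = np.array([[0], [1], [1], [0]], dtype=float)               # XOR targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(a):
    """Fold the bias in as an extra constant input of 1."""
    return np.hstack([a, np.ones((a.shape[0], 1))])

W_hidden = rng.normal(scale=1.0, size=(3, 4))   # 2 inputs + bias -> 4 hidden units
W_output = rng.normal(scale=1.0, size=(5, 1))   # 4 hidden + bias -> 1 output
alpha = 1.0                                     # learning rate

for epoch in range(10000):
    # Forward pass.
    hidden = sigmoid(add_bias(X) @ W_hidden)
    output = sigmoid(add_bias(hidden) @ W_output)

    # Gradient descent at the output: error x d(activation).
    delta_out = (T - output) * output * (1 - output)

    # Assign blame to the hidden units in proportion to their weights,
    # then scale by the hidden units' own activation derivative.
    delta_hidden = (delta_out @ W_output[:-1].T) * hidden * (1 - hidden)

    # New weight = old weight + (learning rate x delta x input).
    W_output += alpha * add_bias(hidden).T @ delta_out
    W_hidden += alpha * add_bias(X).T @ delta_hidden

# Network outputs after training (typically close to 0, 1, 1, 0).
print(np.round(sigmoid(add_bias(sigmoid(add_bias(X) @ W_hidden)) @ W_output), 2))
```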
Minimum connectivity
Optimal Brain Damage Algorithm
No extractable model!
Frontiers in AI
Applications of current algorithms
New algorithms for determining parameters from training data: Backward-Forward, Backpropagation
Better classification of the mysteries of neural networks
Pathology modeling in neural networks
Evolutionary modeling