CSE 473: Artificial Intelligence: Bayes' Nets


CSE 473: Artificial Intelligence

Bayes Nets

Daniel Weld
[Most slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]

Hidden Markov Models


Two random variables at each time step:
  Hidden state, Xi
  Observation, Ei
Conditional independences
Dynamics don't change

[Diagram: Markov chain X1 → X2 → X3 → X4 → ... → XN, with an observation Et emitted from each hidden state Xt (E1, E2, E3, E4, ..., EN)]

E.g., P(X2 | X1) = P(X18 | X17)

Example

An HMM is defined by:

  Initial distribution: P(X1)
  Transitions: P(Xt | Xt-1)
  Emissions: P(Et | Xt)

HMM Computations
Given
  parameters (initial distribution, transitions, emissions)
  evidence E1:n = e1:n
Inference problems include:
  Filtering: find P(Xt | e1:t) for all t
  Smoothing: find P(Xt | e1:n) for all t
  Most probable explanation: find x*1:n = argmax over x1:n of P(x1:n | e1:n)
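A minimal filtering sketch of these ideas, assuming the HMM is given as plain Python dictionaries (the names `filter_hmm`, `init`, `trans`, `emit` and the toy weather numbers are illustrative, not from the slides):

```python
# Forward-algorithm filtering sketch: compute P(X_t | e_1:t) for each t.
def filter_hmm(init, trans, emit, evidence):
    """init[x] = P(X1=x); trans[x][x2] = P(X_{t+1}=x2 | X_t=x);
    emit[x][e] = P(E_t=e | X_t=x); evidence = [e1, e2, ...]."""
    beliefs = []
    # Base case: condition the prior on e1, then normalize.
    b = {x: init[x] * emit[x][evidence[0]] for x in init}
    z = sum(b.values())
    b = {x: p / z for x, p in b.items()}
    beliefs.append(b)
    for e in evidence[1:]:
        # Passage of time: sum out the previous state.
        prior = {x2: sum(b[x] * trans[x][x2] for x in b) for x2 in init}
        # Observation: weight by the emission likelihood, then normalize.
        post = {x2: prior[x2] * emit[x2][e] for x2 in prior}
        z = sum(post.values())
        b = {x2: p / z for x2, p in post.items()}
        beliefs.append(b)
    return beliefs

# Toy example (made-up numbers): two hidden weather states, umbrella evidence.
init = {"rain": 0.5, "sun": 0.5}
trans = {"rain": {"rain": 0.7, "sun": 0.3}, "sun": {"rain": 0.3, "sun": 0.7}}
emit = {"rain": {"umb": 0.9, "no_umb": 0.1}, "sun": {"umb": 0.2, "no_umb": 0.8}}
print(filter_hmm(init, trans, emit, ["umb", "umb", "no_umb"]))
```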

Base Case Inference (In Forward Algorithm)

Observation: [diagram: hidden state X1 with observed evidence E1]
Passage of Time: [diagram: X1 → X2]

Particle Filtering: Representation

Represent P(X) with a list of N particles (samples); generally, N << |X|
E.g., P(ghost@(3,3)) = 5/10 = 0.5

[Figure: the distribution P(x) and its particle approximation]
Particles: (3,3) (2,3) (3,3) (3,2) (3,3) (3,2) (1,2) (3,3) (3,3) (2,3)

Particle Filtering: Summary

Particles: track samples of states rather than an explicit distribution

Particles (initial):        (3,3) (2,3) (3,3) (3,2) (3,3) (3,2) (1,2) (3,3) (3,3) (2,3)
Elapse time → Particles:    (3,2) (2,3) (3,2) (3,1) (3,3) (3,2) (1,3) (2,3) (3,2) (2,2)
Weight → Particles:         (3,2) w=.9, (2,3) w=.2, (3,2) w=.9, (3,1) w=.4, (3,3) w=.4, (3,2) w=.9, (1,3) w=.1, (2,3) w=.2, (3,2) w=.9, (2,2) w=.4
Resample → (New) Particles: (3,2) (2,2) (3,2) (2,3) (3,3) (3,2) (1,3) (2,3) (3,2) (3,2)
[Demos: ghostbusters particle filtering (L15D3,4,5)]
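A minimal sketch of the elapse/weight/resample loop above, assuming the transition and emission models are passed in as functions (`sample_transition` and `emission_prob` are illustrative names, not from the slides):

```python
import random

def particle_filter_step(particles, evidence, sample_transition, emission_prob):
    """One update: elapse time, weight by evidence, then resample N particles."""
    # Elapse time: move each particle by sampling from P(X' | x).
    moved = [sample_transition(x) for x in particles]
    # Weight: each particle's weight is the likelihood of the evidence given it.
    weights = [emission_prob(evidence, x) for x in moved]
    total = sum(weights)
    if total == 0:
        # All particles are inconsistent with the evidence; in practice one
        # would reinitialize them (e.g., uniformly). Here we just keep them.
        return moved
    # Resample: draw N new particles in proportion to their weights.
    return random.choices(moved, weights=weights, k=len(moved))
```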

Which Algorithm?
Particle filter, uniform initial beliefs, 25 particles

Which Algorithm?
Particle filter, uniform initial beliefs, 300 particles

Which Algorithm?
Exact filter, uniform initial beliefs

Complexity of the Forward Algorithm?

We are given evidence at each time step and want to know B_t(X) = P(X_t | e_1:t)
  If we only need P(x | e) at the end, we only normalize there

We use the single (time-passage + observation) update:
  P(x_t | e_1:t) ∝ P(e_t | x_t) Σ over x_{t-1} of P(x_t | x_{t-1}) P(x_{t-1} | e_1:t-1)

Complexity? O(|X|^2) time & O(|X|) space

But |X| is exponential in the number of state variables

Why Does |X| Grow?

1 ghost: k (e.g., 9) possible positions in the maze
2 ghosts: k^2 combinations
N ghosts: k^N combinations


Joint Distribution for Snapshot of World

It gets big...
[Figure: a large joint probability table over variables P, Q, R, ... with entries such as 0.1, 0.05, 0.2, 0.07, 0.03]

The Sword of Conditional Independence!

Slay the Basilisk!   "I am a BIG joint distribution!"

Means: P(x, y | z) = P(x | z) P(y | z)   for all x, y, z
Or, equivalently: P(x | y, z) = P(x | z)   for all x, y, z

HMM Conditional Independence

HMMs have two important independence properties:
  Markov hidden process: future depends on past via the present

[Diagram: X1 → X2 → X3 → X4, with evidence E1, E2, E3, E4; is the future independent of the past given the present?]

HMM Conditional Independence

HMMs have two important independence properties:
  Markov hidden process: future depends on past via the present
  Current observation independent of all else given current state

[Diagram: X1 → X2 → X3 → X4, with evidence E1, E2, E3, E4]

Conditional Independence in Snapshot

Can we do something here?
  Factor X into a product of (conditionally) independent random variables?
  Maybe also factor E

[Diagram: a single time slice with state X3 and evidence E3]

Yes! With Bayes Nets

Dynamic Bayes Nets

Dynamic Bayes Nets (DBNs)


We want to track multiple variables over time, using multiple sources of evidence
Idea: Repeat a fixed Bayes net structure at each time step
Variables from time t can condition on those from t-1

[Diagram: three time slices t=1, 2, 3; each slice has ghost variables Ga_t, Gb_t with evidence Ea_t, Eb_t, and each ghost variable at time t depends on the ghost variables at time t-1]

Dynamic Bayes nets are a generalization of HMMs


[Demo: pacman sonar ghost DBN model (L15D6)]


DBN Particle Filters

A particle is a complete sample for a time step
Initialize: Generate prior samples for the t=1 Bayes net
  Example particle: G1a = (3,3), G1b = (5,3)
Elapse time: Sample a successor for each particle
  Example successor: G2a = (2,3), G2b = (6,3)
Observe: Weight each entire sample by the likelihood of the evidence conditioned on the sample
  Likelihood: P(E1a | G1a) * P(E1b | G1b)
Resample: Select prior samples (tuples of values) in proportion to their likelihood
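A sketch of the same steps for a DBN particle, where each particle is a complete assignment to the state variables of one time slice (here a tuple of two ghost positions; the helper names `sample_ghost_step` and `emission_prob` are illustrative assumptions):

```python
import random

def dbn_pf_step(particles, evidence_a, evidence_b,
                sample_ghost_step, emission_prob):
    """particles: list of (Ga, Gb) tuples, one complete sample per time slice."""
    # Elapse time: sample a successor position for every variable in the slice.
    moved = [(sample_ghost_step(ga), sample_ghost_step(gb)) for ga, gb in particles]
    # Observe: weight the ENTIRE sample by the product of evidence likelihoods,
    # e.g. P(Ea | Ga) * P(Eb | Gb).
    weights = [emission_prob(evidence_a, ga) * emission_prob(evidence_b, gb)
               for ga, gb in moved]
    # Resample whole tuples in proportion to their weights.
    return random.choices(moved, weights=weights, k=len(moved))
```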

Probabilistic Models
Models describe how (a portion of) the world works

Models are always simplifications
  May not account for every variable
  May not account for all interactions between variables
  "All models are wrong; but some are useful."
    George E. P. Box

What do we do with probabilistic models?

We (or our agents) need to reason about unknown variables, given evidence
  Example: explanation (diagnostic reasoning)
  Example: prediction (causal reasoning)
  Example: value of information


Independence

Independence
Two variables are independent if:
  ∀x, y: P(x, y) = P(x) P(y)
  This says that their joint distribution factors into a product of two simpler distributions
Another form:
  ∀x, y: P(x | y) = P(x)
We write: X ⫫ Y

Independence is a simplifying modeling assumption

Empirical joint distributions: at best close to independent
What could we assume for {Weather, Traffic, Cavity, Toothache}?


Example: Independence?

P(T):
  hot   0.5
  cold  0.5

P(T, W):
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3

P(W):
  sun   0.6
  rain  0.4

P(T) P(W):
  hot   sun   0.3
  hot   rain  0.2
  cold  sun   0.3
  cold  rain  0.2
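A quick check of the tables above (values copied from the slide): independence would require P(T, W) = P(T) P(W) for every entry, which fails here, e.g. P(hot, sun) = 0.4 but P(hot) P(sun) = 0.5 * 0.6 = 0.3.

```python
# Check whether the joint P(T, W) above equals the product P(T) * P(W).
P_T = {"hot": 0.5, "cold": 0.5}
P_W = {"sun": 0.6, "rain": 0.4}
P_TW = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
        ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

independent = all(abs(P_TW[(t, w)] - P_T[t] * P_W[w]) < 1e-9
                  for t in P_T for w in P_W)
print(independent)  # False: 0.4 != 0.3, so T and W are not independent
```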

Example: Independence
N fair, independent coin flips:

P(X1): H 0.5, T 0.5    P(X2): H 0.5, T 0.5    ...    P(Xn): H 0.5, T 0.5


Conditional Independence
P(Toothache, Cavity, Catch)
If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache:
  P(+catch | +toothache, +cavity) = P(+catch | +cavity)

The same independence holds if I don't have a cavity:
  P(+catch | +toothache, -cavity) = P(+catch | -cavity)

Catch is conditionally independent of Toothache given Cavity:
  P(Catch | Toothache, Cavity) = P(Catch | Cavity)

Equivalent statements:
  P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
  P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
  One can be derived from the other easily (see the short derivation below)
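One direction of that derivation written out as a sketch, using only the chain rule and the first equivalent statement (abbreviating Toothache = T, Catch = C, Cavity = Cav):

```latex
\begin{align*}
P(T, C \mid \mathrm{Cav})
  &= P(T \mid C, \mathrm{Cav})\, P(C \mid \mathrm{Cav}) && \text{(chain rule)} \\
  &= P(T \mid \mathrm{Cav})\, P(C \mid \mathrm{Cav})    && \text{(conditional independence)}
\end{align*}
```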

Conditional Independence
Unconditional (absolute) independence is very rare (why?)
Conditional independence is our most basic and robust form of knowledge about uncertain environments.
X is conditionally independent of Y given Z if and only if:
  ∀x, y, z: P(x, y | z) = P(x | z) P(y | z)
or, equivalently, if and only if:
  ∀x, y, z: P(x | y, z) = P(x | z)

Conditional Independence
What about this domain:
  Traffic
  Umbrella
  Raining

Conditional Independence and the Chain Rule

Chain rule: P(X1, X2, ..., Xn) = P(X1) P(X2 | X1) P(X3 | X1, X2) ...
Trivial decomposition:
  P(Rain, Traffic, Umbrella) = P(Rain) P(Traffic | Rain) P(Umbrella | Rain, Traffic)
With assumption of conditional independence (Umbrella ⫫ Traffic | Rain):
  P(Rain, Traffic, Umbrella) = P(Rain) P(Traffic | Rain) P(Umbrella | Rain)

Bayes nets / graphical models help us express conditional independence assumptions

Ghostbusters Chain Rule

Each sensor depends only on where the ghost is
  P(T, B, G) = P(G) P(T | G) P(B | G)
That means the two sensors are conditionally independent, given the ghost position
  T: Top square is red
  B: Bottom square is red
  G: Ghost is in the top

Givens:
  P(+g) = 0.5
  P(+t | +g) = 0.8
  P(+t | -g) = 0.4
  P(+b | +g) = 0.4
  P(+b | -g) = 0.8

P(T, B, G):
  +t  +b  +g  0.16
  +t  +b  -g  0.16
  +t  -b  +g  0.24
  +t  -b  -g  0.04
  -t  +b  +g  0.04
  -t  +b  -g  0.24
  -t  -b  +g  0.06
  -t  -b  -g  0.06

Number of Parameters?
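A small sketch that rebuilds the joint table above from the givens via the factorization P(T,B,G) = P(G) P(T|G) P(B|G) (the variable names are illustrative; the '-' entries are just 1 minus the given '+' entries):

```python
# Rebuild P(T, B, G) from the local conditionals on the slide.
P_g = {"+g": 0.5, "-g": 0.5}
P_t_given_g = {"+g": {"+t": 0.8, "-t": 0.2}, "-g": {"+t": 0.4, "-t": 0.6}}
P_b_given_g = {"+g": {"+b": 0.4, "-b": 0.6}, "-g": {"+b": 0.8, "-b": 0.2}}

for t in ("+t", "-t"):
    for b in ("+b", "-b"):
        for g in ("+g", "-g"):
            p = P_g[g] * P_t_given_g[g][t] * P_b_given_g[g][b]
            print(t, b, g, round(p, 2))   # matches the table, e.g. +t +b +g 0.16
```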

Bayes Nets: Big Picture


Bayes Nets: Big Picture

Two problems with using full joint distribution tables as our probabilistic models:
  Unless there are only a few variables, the joint is WAY too big to represent explicitly
  Hard to learn (estimate) anything empirically about more than a few variables at a time

Bayes nets: a technique for describing complex joint distributions (models) using simple, local distributions (conditional probabilities)
  More properly called graphical models
  We describe how variables locally interact
  Local interactions chain together to give global, indirect interactions
  For about 10 min, we'll be vague about how these interactions are specified

Example Bayes Net: Insurance


Example Bayes Net: Car

Graphical Model Notation

Nodes: variables (with domains)
  Can be assigned (observed) or unassigned (unobserved)

Arcs: interactions
  Similar to CSP constraints
  Indicate direct influence between variables
  Formally: encode conditional independence (more later)

For now: imagine that arrows mean direct causation (in general, they don't!)

Example: Coin Flips

N independent coin flips

[Diagram: nodes X1, X2, ..., Xn with no arcs]

No interactions between variables: absolute independence

Example: Traffic
Variables:
  R: It rains
  T: There is traffic

Model 1: independence
Model 2: rain causes traffic

Why is an agent using model 2 better?


Example: Traffic II
Let's build a causal graphical model!
Variables:
  T: Traffic
  R: It rains
  L: Low pressure
  D: Roof drips
  B: Ballgame
  C: Cavity

[Diagram: nodes L, R, B, D, T]

Example: Alarm Network

Variables:
  B: Burglary
  A: Alarm goes off
  M: Mary calls
  J: John calls
  E: Earthquake!

[Diagram: nodes B, E, A]


Bayes Net Semantics

Bayes Net Semantics

A set of nodes, one per variable X
A directed, acyclic graph

A conditional distribution for each node, P(X | A1, ..., An), where A1, ..., An are X's parents
  A collection of distributions over X, one for each combination of parents' values
  CPT: conditional probability table
  Description of a noisy "causal" process

A Bayes net = Topology (graph) + Local Conditional Probabilities
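One way to make "topology + local conditional probabilities" concrete is to store, for each node, its parent list and a CPT keyed by parent values. This is only a sketch (not the course's data structure), shown here with the rain/traffic numbers that appear later in these slides:

```python
# A Bayes net as: parents[X] = ordered list of parents,
# cpt[X][(parent values...)] = {value: probability}.
parents = {"R": [], "T": ["R"]}
cpt = {
    "R": {(): {"+r": 0.25, "-r": 0.75}},
    "T": {("+r",): {"+t": 0.75, "-t": 0.25},
          ("-r",): {"+t": 0.50, "-t": 0.50}},
}

def joint_probability(assignment, parents, cpt):
    """P(x1,...,xn) = product over nodes of P(x_i | parents(X_i))."""
    p = 1.0
    for var, value in assignment.items():
        parent_values = tuple(assignment[par] for par in parents[var])
        p *= cpt[var][parent_values][value]
    return p

print(joint_probability({"R": "+r", "T": "+t"}, parents, cpt))  # 0.1875 = 3/16
```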


Probabilities in BNs
Bayes nets implicitly encode joint distributions
  As a product of local conditional distributions
  To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together:
    P(x1, x2, ..., xn) = ∏i P(xi | parents(Xi))

Example: (a full assignment for the alarm network is worked out later in these slides)

Probabilities in BNs
Why are we guaranteed that setting
  P(x1, x2, ..., xn) = ∏i P(xi | parents(Xi))
results in a proper joint distribution?

Chain rule (valid for all distributions): P(x1, x2, ..., xn) = ∏i P(xi | x1, ..., xi-1)
Assume conditional independences: P(xi | x1, ..., xi-1) = P(xi | parents(Xi))
Consequence: P(x1, x2, ..., xn) = ∏i P(xi | parents(Xi))

Not every BN can represent every joint distribution
  The topology enforces certain conditional independencies

Example: Coin Flips

[Diagram: nodes X1, X2, ..., Xn with no arcs]

P(X1): H 0.5, T 0.5    P(X2): H 0.5, T 0.5    ...    P(Xn): H 0.5, T 0.5

Only distributions whose variables are absolutely independent can be represented by a Bayes net with no arcs.

Example: Traffic

[Diagram: R → T]

P(R):
  +r  1/4
  -r  3/4

P(T | R):
  +r: +t 3/4, -t 1/4
  -r: +t 1/2, -t 1/2


Example: Alarm Network

[Diagram: Burglary (B) and Earthquake (E) are parents of Alarm (A); Alarm is the parent of John calls (J) and Mary calls (M)]

P(B):
  +b  0.001
  -b  0.999

P(E):
  +e  0.002
  -e  0.998

P(A | B, E):
  +b  +e  +a  0.95
  +b  +e  -a  0.05
  +b  -e  +a  0.94
  +b  -e  -a  0.06
  -b  +e  +a  0.29
  -b  +e  -a  0.71
  -b  -e  +a  0.001
  -b  -e  -a  0.999

P(J | A):
  +a  +j  0.9
  +a  -j  0.1
  -a  +j  0.05
  -a  -j  0.95

P(M | A):
  +a  +m  0.7
  +a  -m  0.3
  -a  +m  0.01
  -a  -m  0.99
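For concreteness, the probability the net above assigns to one full assignment, multiplying the relevant CPT entries (the particular assignment is chosen only as an illustration):

```latex
P(+b,-e,+a,+j,+m)
  = P(+b)\,P(-e)\,P(+a \mid +b,-e)\,P(+j \mid +a)\,P(+m \mid +a)
  = 0.001 \times 0.998 \times 0.94 \times 0.9 \times 0.7 \approx 5.9 \times 10^{-4}
```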

Example: Traffic
Causal direction

[Diagram: R → T]

P(R):
  +r  1/4
  -r  3/4

P(T | R):
  +r: +t 3/4, -t 1/4
  -r: +t 1/2, -t 1/2

Joint P(R, T):
  +r  +t  3/16
  +r  -t  1/16
  -r  +t  6/16
  -r  -t  6/16
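Each joint entry above is just the product of the two local tables, e.g.:

```latex
P(+r,+t) = P(+r)\,P(+t \mid +r) = \tfrac{1}{4}\cdot\tfrac{3}{4} = \tfrac{3}{16},
\qquad
P(-r,+t) = P(-r)\,P(+t \mid -r) = \tfrac{3}{4}\cdot\tfrac{1}{2} = \tfrac{6}{16}.
```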


Example: Reverse Traffic

Reverse causality?

[Diagram: T → R]

P(T):
  +t  9/16
  -t  7/16

P(R | T):
  +t: +r 1/3, -r 2/3
  -t: +r 1/7, -r 6/7

Joint P(T, R):
  +r  +t  3/16
  +r  -t  1/16
  -r  +t  6/16
  -r  -t  6/16
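The reversed net encodes exactly the same joint, e.g.:

```latex
P(+t,+r) = P(+t)\,P(+r \mid +t) = \tfrac{9}{16}\cdot\tfrac{1}{3} = \tfrac{3}{16},
\qquad
P(-t,+r) = P(-t)\,P(+r \mid -t) = \tfrac{7}{16}\cdot\tfrac{1}{7} = \tfrac{1}{16}.
```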

Causality?
When Bayes nets reflect the true causal patterns:
  Often simpler (nodes have fewer parents)
  Often easier to think about
  Often easier to elicit from experts

BNs need not actually be causal
  Sometimes no causal net exists over the domain (especially if variables are missing)
  E.g., consider the variables Traffic and Drips
  End up with arrows that reflect correlation, not causation

What do the arrows really mean?
  Topology may happen to encode causal structure
  Topology really encodes conditional independence


Bayes Nets
So far: how a Bayes net encodes a joint distribution
Next: how to answer queries about that distribution
Today:
  First assembled BNs using an intuitive notion of conditional independence as causality
  Then saw that key property is conditional independence
Main goal: answer queries about conditional independence and influence
After that: how to answer numerical queries (inference)

