CSE 473: Artificial Intelligence: Bayes' Nets


CSE 473: Artificial Intelligence

Bayes Nets

Daniel Weld
[Most slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]

Hidden Markov Models


Two random variables at each time step:
  Hidden state, Xi
  Observation, Ei
Conditional independences
Dynamics don't change

[Diagram: Markov chain X1 → X2 → X3 → X4 → ... → XN, with an observation Et emitted from each hidden state Xt (E1, E2, E3, E4, ..., EN)]

E.g., P(X2 | X1) = P(X18 | X17)

Example

An HMM is defined by:

  Initial distribution: P(X1)
  Transitions: P(Xt | Xt-1)
  Emissions: P(Et | Xt)

HMM Computations
Given
  parameters (initial distribution, transitions, emissions)
  evidence E1:n = e1:n
Inference problems include:
  Filtering: find P(Xt | e1:t) for all t
  Smoothing: find P(Xt | e1:n) for all t
  Most probable explanation: find x*1:n = argmax over x1:n of P(x1:n | e1:n)
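A minimal filtering sketch of these ideas, assuming the HMM is given as plain Python dictionaries (the names `filter_hmm`, `init`, `trans`, `emit` and the toy weather numbers are illustrative, not from the slides):

```python
# Forward-algorithm filtering sketch: compute P(X_t | e_1:t) for each t.
def filter_hmm(init, trans, emit, evidence):
    """init[x] = P(X1=x); trans[x][x2] = P(X_{t+1}=x2 | X_t=x);
    emit[x][e] = P(E_t=e | X_t=x); evidence = [e1, e2, ...]."""
    beliefs = []
    # Base case: condition the prior on e1, then normalize.
    b = {x: init[x] * emit[x][evidence[0]] for x in init}
    z = sum(b.values())
    b = {x: p / z for x, p in b.items()}
    beliefs.append(b)
    for e in evidence[1:]:
        # Passage of time: sum out the previous state.
        prior = {x2: sum(b[x] * trans[x][x2] for x in b) for x2 in init}
        # Observation: weight by the emission likelihood, then normalize.
        post = {x2: prior[x2] * emit[x2][e] for x2 in prior}
        z = sum(post.values())
        b = {x2: p / z for x2, p in post.items()}
        beliefs.append(b)
    return beliefs

# Toy example (made-up numbers): two hidden weather states, umbrella evidence.
init = {"rain": 0.5, "sun": 0.5}
trans = {"rain": {"rain": 0.7, "sun": 0.3}, "sun": {"rain": 0.3, "sun": 0.7}}
emit = {"rain": {"umb": 0.9, "no_umb": 0.1}, "sun": {"umb": 0.2, "no_umb": 0.8}}
print(filter_hmm(init, trans, emit, ["umb", "umb", "no_umb"]))
```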

Base Case Inference (In Forward Algorithm)

Observation: [diagram: hidden state X1 with observed evidence E1]
Passage of Time: [diagram: X1 → X2]

Particle Filtering: Representation

Represent P(X) with a list of N particles (samples); generally, N << |X|
E.g., P(ghost@(3,3)) = 5/10 = 0.5

[Figure: the distribution P(x) and its particle approximation]
Particles: (3,3) (2,3) (3,3) (3,2) (3,3) (3,2) (1,2) (3,3) (3,3) (2,3)

Particle Filtering: Summary

Particles: track samples of states rather than an explicit distribution

Particles (initial):        (3,3) (2,3) (3,3) (3,2) (3,3) (3,2) (1,2) (3,3) (3,3) (2,3)
Elapse time → Particles:    (3,2) (2,3) (3,2) (3,1) (3,3) (3,2) (1,3) (2,3) (3,2) (2,2)
Weight → Particles:         (3,2) w=.9, (2,3) w=.2, (3,2) w=.9, (3,1) w=.4, (3,3) w=.4, (3,2) w=.9, (1,3) w=.1, (2,3) w=.2, (3,2) w=.9, (2,2) w=.4
Resample → (New) Particles: (3,2) (2,2) (3,2) (2,3) (3,3) (3,2) (1,3) (2,3) (3,2) (3,2)
[Demos: ghostbusters particle filtering (L15D3,4,5)]
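A minimal sketch of the elapse/weight/resample loop above, assuming the transition and emission models are passed in as functions (`sample_transition` and `emission_prob` are illustrative names, not from the slides):

```python
import random

def particle_filter_step(particles, evidence, sample_transition, emission_prob):
    """One update: elapse time, weight by evidence, then resample N particles."""
    # Elapse time: move each particle by sampling from P(X' | x).
    moved = [sample_transition(x) for x in particles]
    # Weight: each particle's weight is the likelihood of the evidence given it.
    weights = [emission_prob(evidence, x) for x in moved]
    total = sum(weights)
    if total == 0:
        # All particles are inconsistent with the evidence; in practice one
        # would reinitialize them (e.g., uniformly). Here we just keep them.
        return moved
    # Resample: draw N new particles in proportion to their weights.
    return random.choices(moved, weights=weights, k=len(moved))
```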

Which Algorithm?
Particle filter, uniform initial beliefs, 25 particles

Which Algorithm?
Particle filter, uniform initial beliefs, 300 particles

Which Algorithm?
Exact filter, uniform initial beliefs

Complexity of the Forward Algorithm?

We are given evidence at each time step and want to know B_t(X) = P(X_t | e_1:t)
  If we only need P(x | e) at the end, we only normalize there

We use the single (time-passage + observation) update:
  P(x_t | e_1:t) ∝ P(e_t | x_t) Σ over x_{t-1} of P(x_t | x_{t-1}) P(x_{t-1} | e_1:t-1)

Complexity? O(|X|^2) time & O(|X|) space

But |X| is exponential in the number of state variables

Why Does |X| Grow?

1 ghost: k (e.g., 9) possible positions in the maze
2 ghosts: k^2 combinations
N ghosts: k^N combinations


Joint Distribution for Snapshot of World

It gets big...
[Figure: a large joint probability table over variables P, Q, R, ... with entries such as 0.1, 0.05, 0.2, 0.07, 0.03]

The Sword of Conditional Independence!

Slay the Basilisk!   "I am a BIG joint distribution!"

Means: P(x, y | z) = P(x | z) P(y | z)   for all x, y, z
Or, equivalently: P(x | y, z) = P(x | z)   for all x, y, z

HMM Conditional Independence

HMMs have two important independence properties:
  Markov hidden process: future depends on past via the present

[Diagram: X1 → X2 → X3 → X4, with evidence E1, E2, E3, E4; is the future independent of the past given the present?]

HMM Conditional Independence

HMMs have two important independence properties:
  Markov hidden process: future depends on past via the present
  Current observation independent of all else given current state

[Diagram: X1 → X2 → X3 → X4, with evidence E1, E2, E3, E4]

Conditional Independence in Snapshot

Can we do something here?
  Factor X into a product of (conditionally) independent random variables?
  Maybe also factor E

[Diagram: a single time slice with state X3 and evidence E3]

Yes! With Bayes Nets

Dynamic Bayes Nets

Dynamic Bayes Nets (DBNs)


We want to track multiple variables over time, using multiple sources of evidence
Idea: Repeat a fixed Bayes net structure at each time step
Variables from time t can condition on those from t-1

[Diagram: three time slices t=1, 2, 3; each slice has ghost variables Ga_t, Gb_t with evidence Ea_t, Eb_t, and each ghost variable at time t depends on the ghost variables at time t-1]

Dynamic Bayes nets are a generalization of HMMs


[Demo: pacman sonar ghost DBN model (L15D6)]


DBN Particle Filters

A particle is a complete sample for a time step
Initialize: Generate prior samples for the t=1 Bayes net
  Example particle: G1a = (3,3), G1b = (5,3)
Elapse time: Sample a successor for each particle
  Example successor: G2a = (2,3), G2b = (6,3)
Observe: Weight each entire sample by the likelihood of the evidence conditioned on the sample
  Likelihood: P(E1a | G1a) * P(E1b | G1b)
Resample: Select prior samples (tuples of values) in proportion to their likelihood
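A sketch of the same steps for a DBN particle, where each particle is a complete assignment to the state variables of one time slice (here a tuple of two ghost positions; the helper names `sample_ghost_step` and `emission_prob` are illustrative assumptions):

```python
import random

def dbn_pf_step(particles, evidence_a, evidence_b,
                sample_ghost_step, emission_prob):
    """particles: list of (Ga, Gb) tuples, one complete sample per time slice."""
    # Elapse time: sample a successor position for every variable in the slice.
    moved = [(sample_ghost_step(ga), sample_ghost_step(gb)) for ga, gb in particles]
    # Observe: weight the ENTIRE sample by the product of evidence likelihoods,
    # e.g. P(Ea | Ga) * P(Eb | Gb).
    weights = [emission_prob(evidence_a, ga) * emission_prob(evidence_b, gb)
               for ga, gb in moved]
    # Resample whole tuples in proportion to their weights.
    return random.choices(moved, weights=weights, k=len(moved))
```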

Probabilistic Models
Models describe how (a portion of) the world works

Models are always simplifications
  May not account for every variable
  May not account for all interactions between variables
  "All models are wrong; but some are useful."
    George E. P. Box

What do we do with probabilistic models?

We (or our agents) need to reason about unknown variables, given evidence
  Example: explanation (diagnostic reasoning)
  Example: prediction (causal reasoning)
  Example: value of information


Independence

Independence
Two variables are independent if:
  ∀x, y: P(x, y) = P(x) P(y)
  This says that their joint distribution factors into a product of two simpler distributions
Another form:
  ∀x, y: P(x | y) = P(x)
We write: X ⫫ Y

Independence is a simplifying modeling assumption

Empirical joint distributions: at best close to independent
What could we assume for {Weather, Traffic, Cavity, Toothache}?


Example: Independence?

P(T):
  hot   0.5
  cold  0.5

P(T, W):
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3

P(W):
  sun   0.6
  rain  0.4

P(T) P(W):
  hot   sun   0.3
  hot   rain  0.2
  cold  sun   0.3
  cold  rain  0.2
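A quick check of the tables above (values copied from the slide): independence would require P(T, W) = P(T) P(W) for every entry, which fails here, e.g. P(hot, sun) = 0.4 but P(hot) P(sun) = 0.5 * 0.6 = 0.3.

```python
# Check whether the joint P(T, W) above equals the product P(T) * P(W).
P_T = {"hot": 0.5, "cold": 0.5}
P_W = {"sun": 0.6, "rain": 0.4}
P_TW = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
        ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

independent = all(abs(P_TW[(t, w)] - P_T[t] * P_W[w]) < 1e-9
                  for t in P_T for w in P_W)
print(independent)  # False: 0.4 != 0.3, so T and W are not independent
```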

Example: Independence
N fair, independent coin flips:

P(X1): H 0.5, T 0.5    P(X2): H 0.5, T 0.5    ...    P(Xn): H 0.5, T 0.5


Conditional Independence
P(Toothache, Cavity, Catch)
If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache:
  P(+catch | +toothache, +cavity) = P(+catch | +cavity)

The same independence holds if I don't have a cavity:
  P(+catch | +toothache, -cavity) = P(+catch | -cavity)

Catch is conditionally independent of Toothache given Cavity:
  P(Catch | Toothache, Cavity) = P(Catch | Cavity)

Equivalent statements:
  P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
  P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
  One can be derived from the other easily (see the short derivation below)
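One direction of that derivation written out as a sketch, using only the chain rule and the first equivalent statement (abbreviating Toothache = T, Catch = C, Cavity = Cav):

```latex
\begin{align*}
P(T, C \mid \mathrm{Cav})
  &= P(T \mid C, \mathrm{Cav})\, P(C \mid \mathrm{Cav}) && \text{(chain rule)} \\
  &= P(T \mid \mathrm{Cav})\, P(C \mid \mathrm{Cav})    && \text{(conditional independence)}
\end{align*}
```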

Conditional Independence
Unconditional (absolute) independence is very rare (why?)
Conditional independence is our most basic and robust form of knowledge about uncertain environments.
X is conditionally independent of Y given Z if and only if:
  ∀x, y, z: P(x, y | z) = P(x | z) P(y | z)
or, equivalently, if and only if:
  ∀x, y, z: P(x | y, z) = P(x | z)

Conditional Independence
What about this domain:
  Traffic
  Umbrella
  Raining

Conditional Independence and the Chain Rule

Chain rule: P(X1, X2, ..., Xn) = P(X1) P(X2 | X1) P(X3 | X1, X2) ...
Trivial decomposition:
  P(Rain, Traffic, Umbrella) = P(Rain) P(Traffic | Rain) P(Umbrella | Rain, Traffic)
With assumption of conditional independence (Umbrella ⫫ Traffic | Rain):
  P(Rain, Traffic, Umbrella) = P(Rain) P(Traffic | Rain) P(Umbrella | Rain)

Bayes nets / graphical models help us express conditional independence assumptions

Ghostbusters Chain Rule

Each sensor depends only on where the ghost is
  P(T, B, G) = P(G) P(T | G) P(B | G)
That means the two sensors are conditionally independent, given the ghost position
  T: Top square is red
  B: Bottom square is red
  G: Ghost is in the top

Givens:
  P(+g) = 0.5
  P(+t | +g) = 0.8
  P(+t | -g) = 0.4
  P(+b | +g) = 0.4
  P(+b | -g) = 0.8

P(T, B, G):
  +t  +b  +g  0.16
  +t  +b  -g  0.16
  +t  -b  +g  0.24
  +t  -b  -g  0.04
  -t  +b  +g  0.04
  -t  +b  -g  0.24
  -t  -b  +g  0.06
  -t  -b  -g  0.06

Number of Parameters?
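A small sketch that rebuilds the joint table above from the givens via the factorization P(T,B,G) = P(G) P(T|G) P(B|G) (the variable names are illustrative; the '-' entries are just 1 minus the given '+' entries):

```python
# Rebuild P(T, B, G) from the local conditionals on the slide.
P_g = {"+g": 0.5, "-g": 0.5}
P_t_given_g = {"+g": {"+t": 0.8, "-t": 0.2}, "-g": {"+t": 0.4, "-t": 0.6}}
P_b_given_g = {"+g": {"+b": 0.4, "-b": 0.6}, "-g": {"+b": 0.8, "-b": 0.2}}

for t in ("+t", "-t"):
    for b in ("+b", "-b"):
        for g in ("+g", "-g"):
            p = P_g[g] * P_t_given_g[g][t] * P_b_given_g[g][b]
            print(t, b, g, round(p, 2))   # matches the table, e.g. +t +b +g 0.16
```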

Bayes Nets: Big Picture


Bayes Nets: Big Picture

Two problems with using full joint distribution tables as our probabilistic models:
  Unless there are only a few variables, the joint is WAY too big to represent explicitly
  Hard to learn (estimate) anything empirically about more than a few variables at a time

Bayes nets: a technique for describing complex joint distributions (models) using simple, local distributions (conditional probabilities)
  More properly called graphical models
  We describe how variables locally interact
  Local interactions chain together to give global, indirect interactions
  For about 10 min, we'll be vague about how these interactions are specified

Example Bayes Net: Insurance


Example Bayes Net: Car

Graphical Model Notation

Nodes: variables (with domains)
  Can be assigned (observed) or unassigned (unobserved)

Arcs: interactions
  Similar to CSP constraints
  Indicate direct influence between variables
  Formally: encode conditional independence (more later)

For now: imagine that arrows mean direct causation (in general, they don't!)

Example: Coin Flips

N independent coin flips

[Diagram: nodes X1, X2, ..., Xn with no arcs]

No interactions between variables: absolute independence

Example: Traffic
Variables:
  R: It rains
  T: There is traffic

Model 1: independence
Model 2: rain causes traffic

Why is an agent using model 2 better?


Example: Traffic II
Let's build a causal graphical model!
Variables:
  T: Traffic
  R: It rains
  L: Low pressure
  D: Roof drips
  B: Ballgame
  C: Cavity

[Diagram: nodes L, R, B, D, T]

Example: Alarm Network

Variables:
  B: Burglary
  A: Alarm goes off
  M: Mary calls
  J: John calls
  E: Earthquake!

[Diagram: nodes B, E, A]


Bayes Net Semantics

Bayes Net Semantics

A set of nodes, one per variable X
A directed, acyclic graph

A conditional distribution for each node, P(X | A1, ..., An), where A1, ..., An are X's parents
  A collection of distributions over X, one for each combination of parents' values
  CPT: conditional probability table
  Description of a noisy "causal" process

A Bayes net = Topology (graph) + Local Conditional Probabilities
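One way to make "topology + local conditional probabilities" concrete is to store, for each node, its parent list and a CPT keyed by parent values. This is only a sketch (not the course's data structure), shown here with the rain/traffic numbers that appear later in these slides:

```python
# A Bayes net as: parents[X] = ordered list of parents,
# cpt[X][(parent values...)] = {value: probability}.
parents = {"R": [], "T": ["R"]}
cpt = {
    "R": {(): {"+r": 0.25, "-r": 0.75}},
    "T": {("+r",): {"+t": 0.75, "-t": 0.25},
          ("-r",): {"+t": 0.50, "-t": 0.50}},
}

def joint_probability(assignment, parents, cpt):
    """P(x1,...,xn) = product over nodes of P(x_i | parents(X_i))."""
    p = 1.0
    for var, value in assignment.items():
        parent_values = tuple(assignment[par] for par in parents[var])
        p *= cpt[var][parent_values][value]
    return p

print(joint_probability({"R": "+r", "T": "+t"}, parents, cpt))  # 0.1875 = 3/16
```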


Probabilities in BNs
Bayes nets implicitly encode joint distributions
  As a product of local conditional distributions
  To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together:
    P(x1, x2, ..., xn) = ∏i P(xi | parents(Xi))

Example: (a full assignment for the alarm network is worked out later in these slides)

Probabilities in BNs
Why are we guaranteed that setting
  P(x1, x2, ..., xn) = ∏i P(xi | parents(Xi))
results in a proper joint distribution?

Chain rule (valid for all distributions): P(x1, x2, ..., xn) = ∏i P(xi | x1, ..., xi-1)
Assume conditional independences: P(xi | x1, ..., xi-1) = P(xi | parents(Xi))
Consequence: P(x1, x2, ..., xn) = ∏i P(xi | parents(Xi))

Not every BN can represent every joint distribution
  The topology enforces certain conditional independencies

Example: Coin Flips

[Diagram: nodes X1, X2, ..., Xn with no arcs]

P(X1): H 0.5, T 0.5    P(X2): H 0.5, T 0.5    ...    P(Xn): H 0.5, T 0.5

Only distributions whose variables are absolutely independent can be represented by a Bayes net with no arcs.

Example: Traffic

[Diagram: R → T]

P(R):
  +r  1/4
  -r  3/4

P(T | R):
  +r: +t 3/4, -t 1/4
  -r: +t 1/2, -t 1/2


Example: Alarm Network

[Diagram: Burglary (B) and Earthquake (E) are parents of Alarm (A); Alarm is the parent of John calls (J) and Mary calls (M)]

P(B):
  +b  0.001
  -b  0.999

P(E):
  +e  0.002
  -e  0.998

P(A | B, E):
  +b  +e  +a  0.95
  +b  +e  -a  0.05
  +b  -e  +a  0.94
  +b  -e  -a  0.06
  -b  +e  +a  0.29
  -b  +e  -a  0.71
  -b  -e  +a  0.001
  -b  -e  -a  0.999

P(J | A):
  +a  +j  0.9
  +a  -j  0.1
  -a  +j  0.05
  -a  -j  0.95

P(M | A):
  +a  +m  0.7
  +a  -m  0.3
  -a  +m  0.01
  -a  -m  0.99
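For concreteness, the probability the net above assigns to one full assignment, multiplying the relevant CPT entries (the particular assignment is chosen only as an illustration):

```latex
P(+b,-e,+a,+j,+m)
  = P(+b)\,P(-e)\,P(+a \mid +b,-e)\,P(+j \mid +a)\,P(+m \mid +a)
  = 0.001 \times 0.998 \times 0.94 \times 0.9 \times 0.7 \approx 5.9 \times 10^{-4}
```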

Example: Traffic
Causal direction

[Diagram: R → T]

P(R):
  +r  1/4
  -r  3/4

P(T | R):
  +r: +t 3/4, -t 1/4
  -r: +t 1/2, -t 1/2

Joint P(R, T):
  +r  +t  3/16
  +r  -t  1/16
  -r  +t  6/16
  -r  -t  6/16
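Each joint entry above is just the product of the two local tables, e.g.:

```latex
P(+r,+t) = P(+r)\,P(+t \mid +r) = \tfrac{1}{4}\cdot\tfrac{3}{4} = \tfrac{3}{16},
\qquad
P(-r,+t) = P(-r)\,P(+t \mid -r) = \tfrac{3}{4}\cdot\tfrac{1}{2} = \tfrac{6}{16}.
```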


Example: Reverse Traffic

Reverse causality?

[Diagram: T → R]

P(T):
  +t  9/16
  -t  7/16

P(R | T):
  +t: +r 1/3, -r 2/3
  -t: +r 1/7, -r 6/7

Joint P(T, R):
  +r  +t  3/16
  +r  -t  1/16
  -r  +t  6/16
  -r  -t  6/16
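The reversed net encodes exactly the same joint, e.g.:

```latex
P(+t,+r) = P(+t)\,P(+r \mid +t) = \tfrac{9}{16}\cdot\tfrac{1}{3} = \tfrac{3}{16},
\qquad
P(-t,+r) = P(-t)\,P(+r \mid -t) = \tfrac{7}{16}\cdot\tfrac{1}{7} = \tfrac{1}{16}.
```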

Causality?
When Bayes nets reflect the true causal patterns:
  Often simpler (nodes have fewer parents)
  Often easier to think about
  Often easier to elicit from experts

BNs need not actually be causal
  Sometimes no causal net exists over the domain (especially if variables are missing)
  E.g., consider the variables Traffic and Drips
  End up with arrows that reflect correlation, not causation

What do the arrows really mean?
  Topology may happen to encode causal structure
  Topology really encodes conditional independence


Bayes Nets
So far: how a Bayes net encodes a joint distribution
Next: how to answer queries about that distribution
Today:
  First assembled BNs using an intuitive notion of conditional independence as causality
  Then saw that key property is conditional independence
Main goal: answer queries about conditional independence and influence
After that: how to answer numerical queries (inference)

