Profile HMM
Acknowledgements:
M.Sc. students Beatrice Miron, Oana Ratoi, Diana Popovici
1.
PLAN
1. From profiles to Profile HMMs
2. Setting the parameters of a profile HMM;
the optimal (MAP) model construction
3. Basic algorithms for profile HMMs
4. Profile HMM training from unaligned sequences:
Getting the model and the multiple alignment simultaneously
5. Profile HMM variants for non-global alignments
6. Weighting the training sequences
2.
Example

A multiple alignment of five toy DNA sequences. Columns marked X are treated as match columns, columns marked . as insertions:

    bat     A  G  -  -  -  C
    rat     A  -  A  G  -  C
    cat     A  G  -  A  A  -
    gnat    -  -  A  A  A  C
    goat    A  G  -  -  -  C
            X  X  .  .  .  X

Observed frequencies (out of 5 sequences) in the three match columns:

           1     2     3
    A     4/5    0     0
    C      0     0    4/5
    G      0    3/5    0
    T      0     0     0
    -     1/5   2/5   1/5
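The fractions in the table can be recomputed directly from the alignment. The short Python snippet below is not part of the slides; the data layout is an assumption made for the illustration. It prints the non-zero frequencies of each match column:

from fractions import Fraction

alignment = {
    "bat":  "AG---C",
    "rat":  "A-AG-C",
    "cat":  "AG-AA-",
    "gnat": "--AAAC",
    "goat": "AG---C",
}
match_columns = [0, 1, 5]          # the columns marked with X (0-based)

for state, col in enumerate(match_columns, start=1):
    symbols = [seq[col] for seq in alignment.values()]
    freqs = {s: Fraction(symbols.count(s), len(symbols)) for s in "ACGT-"}
    print(state, {s: str(f) for s, f in freqs.items() if f})
# -> 1 {'A': '4/5', '-': '1/5'}
# -> 2 {'G': '3/5', '-': '2/5'}
# -> 3 {'C': '4/5', '-': '1/5'}

Note that these are raw column frequencies, gaps included; the emission probabilities of the model are estimated from the residue counts alone, as discussed in the following slides.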
3.
Building up a solution
At first sight, not taking into account gaps:
[Figure: the ungapped model, a chain of match states from Begin through M_1, ..., M_j, ... to End.]
4.
[Figure: two Begin ... M_j ... End diagrams, extending the basic match-state chain.]
5.
[Figure: the model extended with insert states I_j alongside the match states (Begin, M_j, I_j, End).]
6.
Does it work?

The example alignment again, with the marked (X) columns assigned to match states 1, 2, 3:

    bat     A  G  -  -  -  C
    rat     A  -  A  G  -  C
    cat     A  G  -  A  A  -
    gnat    -  -  A  A  A  C
    goat    A  G  -  -  -  C
            X  X  .  .  .  X
            1  2           3

[Figure: the resulting profile HMM, Begin ... End, with the dotted columns handled by the insert state between match states 2 and 3.]
7.
[Figure: recalled from pairwise alignment: an HMM with states X and Y emitting residues with probabilities q_x and q_y, between Begin and End.]
8.
However, remember...
An example of the state assignments for global alignment using the affine gap model:
[Figure: an aligned sequence pair with its column-by-column state labels, including runs of I_x and I_y states.]
9.
Consequence
It shouldn't be difficult to rewrite the basic HMM algorithms for profile HMMs!
10.
e_{M_j}(a) = c_{ja} / Σ_{a'} c_{ja'}

where c_{ja} is the observed count of residue a in column j of the multiple alignment.
11.
[Figure: estimating the parameters of the profile HMM from the example alignment: Begin, the match states M_1, M_2, M_3 (with their insert and delete states), and End, annotated with the match emissions, the insert emissions, and the nine kinds of state transitions: M->M, M->D, M->I, I->M, I->D, I->I, D->M, D->D, D->I.]

With pseudocounts, the match emission probabilities become

    e_{M_j}(a) = (c_{ja} + A q_a) / (Σ_{a'} c_{ja'} + A)

where q_a is a background distribution over residues and A is the total weight given to the pseudocounts. For other solutions (e.g. Dirichlet mixtures, substitution matrix mixtures, estimation based on an ancestor), see Durbin et al.,
1998, Section 6.5.
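To make the pseudocount formula concrete, here is a small Python sketch (not from the slides) that recomputes the match-state emission probabilities for the toy alignment, using the simplest choice A q_a = 1 for every residue a (add-one smoothing); the data layout and the function name match_emissions are assumptions made for this illustration.

ALPHABET = "ACGT"
alignment = {                      # the same toy alignment as before
    "bat":  "AG---C",
    "rat":  "A-AG-C",
    "cat":  "AG-AA-",
    "gnat": "--AAAC",
    "goat": "AG---C",
}
match_columns = [0, 1, 5]          # the columns marked X (0-based)

def match_emissions(alignment, match_columns):
    """e_Mj(a) = (c_ja + 1) / (sum_a' c_ja' + |alphabet|): add-one pseudocounts."""
    result = []
    for col in match_columns:
        counts = {a: 1 for a in ALPHABET}          # the pseudocounts A*q_a = 1
        for seq in alignment.values():
            if seq[col] != "-":                    # gaps contribute no emission
                counts[seq[col]] += 1
        total = sum(counts.values())
        result.append({a: c / total for a, c in counts.items()})
    return result

for j, e in enumerate(match_emissions(alignment, match_columns), start=1):
    print(f"M{j}", {a: round(p, 3) for a, p in e.items()})
# -> M1 {'A': 0.625, 'C': 0.125, 'G': 0.125, 'T': 0.125}, and so on

With A = 0 this reduces to the maximum-likelihood estimate of the previous slide, which for this toy alignment is degenerate: each match column would give probability 1 to its single observed residue and 0 to all others.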
12.
13.
14.
15.
M_j — the log probability contribution of the match-state symbol emissions in column j;
L_{i,j} — the log probability contribution of the insert-state symbol emissions for columns i+1, ..., j-1 (for j - i > 1).

Traceback:
from j = σ_{L+1}, while j > 0:
    mark column j as a match column; set j = σ_j.

Complexity:
O(L) in memory and O(L^2) in time for an alignment of L columns...
with some care in implementation!

Note: λ is a penalty used to favour models with fewer match states. In Bayesian terms, λ is the log of the prior probability of marking each column. It implies a simple but adequate exponentially decreasing prior distribution over model lengths.
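These quantities enter the recursion of the MAP model construction of Durbin et al. (1998), S_j = max_{0 <= i < j} [ S_i + T_{i,j} + M_j + L_{i+1,j-1} + λ ], where T_{i,j} is the log probability contribution of the state transitions between marked columns i and j, and σ_j records the maximising i. The Python sketch below implements that dynamic program under the assumption that M, T and L have already been precomputed as arrays; the array layout and the function name are illustrative choices, not part of the lecture.

import math

def map_match_columns(M, T, I, lam, L):
    """Choose which of the L alignment columns become match states.
    M[j] (1..L): log prob contribution of match emissions in column j;
    T[i][j]: log prob contribution of the state transitions between marked
             columns i and j (0 and L+1 are dummy Begin/End columns);
    I[i][j]: log prob contribution of insert emissions in columns i+1..j-1;
    lam: the per-column penalty (log prior of marking a column)."""
    S = [-math.inf] * (L + 2)      # S[j]: best log prob given column j is marked
    sigma = [0] * (L + 2)          # traceback pointers
    S[0] = 0.0
    M = [0.0] + list(M) + [0.0]    # pad so that M[0] = M[L+1] = 0
    for j in range(1, L + 2):
        best_i, best = 0, -math.inf
        for i in range(j):
            cand = S[i] + T[i][j] + I[i][j]
            if cand > best:
                best_i, best = i, cand
        S[j] = best + M[j] + lam
        sigma[j] = best_i
    marked = []                     # traceback, as described on the slide
    j = sigma[L + 1]
    while j > 0:
        marked.append(j)
        j = sigma[j]
    return sorted(marked)
# O(L^2) time, as stated above; the O(L) memory bound requires computing T and I
# incrementally instead of storing them as full matrices.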
16.
17.
Recursion:

v_{M_j}(i) = e_{M_j}(x_i) max { v_{M_{j-1}}(i-1) a_{M_{j-1} M_j},
                                v_{I_{j-1}}(i-1) a_{I_{j-1} M_j},
                                v_{D_{j-1}}(i-1) a_{D_{j-1} M_j} }

v_{I_j}(i) = e_{I_j}(x_i) max { v_{M_j}(i-1) a_{M_j I_j},
                                v_{I_j}(i-1) a_{I_j I_j},
                                v_{D_j}(i-1) a_{D_j I_j} }

and similarly, with no emission factor, for the (silent) delete states:

v_{D_j}(i) = max { v_{M_{j-1}}(i) a_{M_{j-1} D_j},
                   v_{I_{j-1}}(i) a_{I_{j-1} D_j},
                   v_{D_{j-1}}(i) a_{D_{j-1} D_j} }

Termination:
the final score is v_{M_{L+1}}(n), calculated using the top recursion relation.
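Below is a minimal Python sketch of this Viterbi recursion (an illustration, not code from the lecture), working directly in probability space and treating Begin as M_0 and End as M_{L+1}; the dictionary-based encoding of the emission and transition parameters is an assumption made here. In practice the computation is carried out with log scores (cf. the next slide) to avoid numerical underflow.

def viterbi_profile(x, L, eM, eI, a):
    """Probability of the best Begin -> End path for sequence x through a
    profile HMM with L match states.
    eM[j][c]: match emissions for 1 <= j <= L (index 0 unused);
    eI[j][c]: insert emissions for 0 <= j <= L (I_j sits after M_j);
    a[((s, j), (t, k))]: transition probabilities, e.g. a[(('M', 1), ('M', 2))]."""
    n = len(x)

    def tr(s, j, t, k):
        # transition probability; 0 for transitions absent from the model
        return a.get(((s, j), (t, k)), 0.0)

    vM = [[0.0] * (n + 1) for _ in range(L + 1)]   # vM[j][i]; M_0 plays Begin
    vI = [[0.0] * (n + 1) for _ in range(L + 1)]
    vD = [[0.0] * (n + 1) for _ in range(L + 1)]   # D_0 unused
    vM[0][0] = 1.0
    for i in range(n + 1):
        c = x[i - 1] if i > 0 else None            # x_i in the slide's notation
        for j in range(L + 1):
            if j > 0:
                # silent delete state: stays at the same i as its predecessors
                vD[j][i] = max(vM[j-1][i] * tr('M', j-1, 'D', j),
                               vI[j-1][i] * tr('I', j-1, 'D', j),
                               vD[j-1][i] * tr('D', j-1, 'D', j))
            if i > 0:
                if j > 0:
                    vM[j][i] = eM[j][c] * max(vM[j-1][i-1] * tr('M', j-1, 'M', j),
                                              vI[j-1][i-1] * tr('I', j-1, 'M', j),
                                              vD[j-1][i-1] * tr('D', j-1, 'M', j))
                vI[j][i] = eI[j][c] * max(vM[j][i-1] * tr('M', j, 'I', j),
                                          vI[j][i-1] * tr('I', j, 'I', j),
                                          vD[j][i-1] * tr('D', j, 'I', j))
    # termination: step into End (= M_{L+1}), which emits nothing
    return max(vM[L][n] * tr('M', L, 'M', L+1),
               vI[L][n] * tr('I', L, 'M', L+1),
               vD[L][n] * tr('D', L, 'M', L+1))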
18.
Recursion: the same relations as on the previous slide, rewritten with log scores V instead of probabilities v.

Termination:
the final score is V_{M_{L+1}}(n), calculated using the top recursion relation.
19.
20.
Termination (of the forward algorithm):
f_{M_{L+1}}(n+1) = f_{M_L}(n) a_{M_L M_{L+1}} + f_{I_L}(n) a_{I_L M_{L+1}} + f_{D_L}(n) a_{D_L M_{L+1}}
21.
Recursion (of the backward algorithm):

b_{M_j}(i) = b_{M_{j+1}}(i+1) a_{M_j M_{j+1}} e_{M_{j+1}}(x_{i+1}) + b_{I_j}(i+1) a_{M_j I_j} e_{I_j}(x_{i+1}) + b_{D_{j+1}}(i) a_{M_j D_{j+1}}

b_{I_j}(i) = b_{M_{j+1}}(i+1) a_{I_j M_{j+1}} e_{M_{j+1}}(x_{i+1}) + b_{I_j}(i+1) a_{I_j I_j} e_{I_j}(x_{i+1}) + b_{D_{j+1}}(i) a_{I_j D_{j+1}}

b_{D_j}(i) = b_{M_{j+1}}(i+1) a_{D_j M_{j+1}} e_{M_{j+1}}(x_{i+1}) + b_{I_j}(i+1) a_{D_j I_j} e_{I_j}(x_{i+1}) + b_{D_{j+1}}(i) a_{D_j D_{j+1}}
22.
23.
24.
25.
26.
Model surgery
After training a model, we can analyse the alignment it produces:

- From the counts estimated by the forward-backward procedure we can see how much a certain transition is used by the training sequences.
- The usage of a match state is the sum of the counts for all letters emitted in that state.
- If a certain match state is used by fewer than half of the given sequences, the corresponding module (triplet of match, insert, delete states) should be deleted.
- Similarly, if more than half (or some other predefined fraction) of the sequences use the transitions into a certain insert state, it should be expanded into some number of new modules (usually the average number of insertions).
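A toy sketch of this surgery step (illustrative only; the input format, names and thresholds are assumptions, not the lecture's procedure verbatim): given the expected usage counts collected by forward-backward training, it returns a list of proposed architecture edits.

def model_surgery(match_usage, insert_usage, avg_insert_len, n_seqs, frac=0.5):
    """match_usage[j-1]: expected number of sequences using match state j;
    insert_usage[j]: expected number of sequences entering insert state j;
    avg_insert_len[j]: average number of residues such a sequence emits there."""
    plan = []
    for j, used in enumerate(match_usage, start=1):
        if used < frac * n_seqs:               # under-used match state
            plan.append(("delete module", j))
    for j, used in enumerate(insert_usage):
        if used > frac * n_seqs:               # over-used insert state
            plan.append(("add modules after match", j, round(avg_insert_len[j])))
    return plan                                 # apply the edits, then re-train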
27.
Local multiple alignment

[Figure: the profile HMM variant for local multiple alignment (Begin ... End).]
28.
29.
For the rare case when the first residue might be missing:

[Figure: the corresponding variant of the model (Begin ... End).]
30.
[Figure: a further variant of the model (Begin ... End).]
31.
[Figure: a phylogenetic tree over four sequences, with leaf edges t_1 = 2, t_2 = 2, t_3 = 5, t_4 = 8 and internal edges t_5 = 3 (above node 5, which joins leaves 1 and 2) and t_6 = 3 (above node 6, which joins node 5 and leaf 3); node 7, the root, joins node 6 and leaf 4.]
32.
[Figure: the same tree viewed as an electrical network: the edge lengths act as resistances, V_5, V_6, V_7 are the voltages at the internal nodes, and the currents I_1, ..., I_4 through the leaf edges are taken as the sequence weights.]

Equations:
V_5 = 2 I_1 = 2 I_2
V_6 = 2 I_1 + 3 (I_1 + I_2) = 5 I_3
V_7 = 8 I_4 = 5 I_3 + 3 (I_1 + I_2 + I_3)

Result:
I_1 : I_2 : I_3 : I_4 = 20 : 20 : 32 : 47
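These ratios can be checked by treating the tree as an electrical network, as the voltages and currents above suggest: a unit current enters at the root, and at each internal node it divides between the branches in inverse proportion to their total resistance down to the leaves. The sketch below reproduces I_1 : I_2 : I_3 : I_4 = 20 : 20 : 32 : 47; the tree encoding and function names are assumptions made for this illustration.

def resistance(tree):
    """Effective resistance from this node down to the leaves."""
    if isinstance(tree, str):                    # a leaf: nothing below it
        return 0.0
    return 1.0 / sum(1.0 / (t + resistance(sub)) for t, sub in tree)

def leaf_currents(tree, current=1.0, weights=None):
    """Split the incoming current over the subtrees; leaves collect their share."""
    if weights is None:
        weights = {}
    if isinstance(tree, str):
        weights[tree] = current
        return weights
    conductances = [1.0 / (t + resistance(sub)) for t, sub in tree]
    total = sum(conductances)
    for (t, sub), g in zip(tree, conductances):
        leaf_currents(sub, current * g / total, weights)
    return weights

# the example tree as lists of (edge length, subtree) pairs
node5 = [(2, "1"), (2, "2")]
node6 = [(3, node5), (5, "3")]
root7 = [(3, node6), (8, "4")]
print(leaf_currents(root7))   # currents proportional to 20 : 20 : 32 : 47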
33.
Then, initialising each leaf weight with the length of its own edge and moving up the tree, at each internal node n the edge length t_n is shared among the leaves below n in proportion to their current weights:

    Δw_i = t_n w_i / Σ_{leaves k below n} w_k

[Figure: the same tree as before, with t_1 = 2, t_2 = 2, t_3 = 5, t_4 = 8, t_5 = 3, t_6 = 3.]

So, at node 5:
w_1 = w_2 = 2 + 3/2 = 3.5

At node 6:
w_1 = w_2 = 3.5 + 3 * 3.5/12,   w_3 = 5 + 3 * 5/12

Result:
w_1 : w_2 : w_3 : w_4 = 35 : 35 : 50 : 64
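The same numbers can be reproduced with a short bottom-up pass over the tree; the encoding below (each node is a pair of its upward edge length and either a leaf name or a list of children) is an assumption made for the illustration.

def accumulate(node, weights):
    """Post-order pass; returns the leaves below node and updates their weights."""
    t, below = node
    if isinstance(below, str):            # a leaf: weight starts as its edge length
        weights[below] = float(t)
        return {below}
    leaves = set()
    for child in below:
        leaves |= accumulate(child, weights)
    total = sum(weights[k] for k in leaves)
    for k in leaves:                       # share this node's edge t_n among them
        weights[k] += t * weights[k] / total
    return leaves

weights = {}
node5 = (3, [(2, "1"), (2, "2")])
node6 = (3, [node5, (5, "3")])
root7 = (0, [node6, (8, "4")])            # the root has no edge above it
accumulate(root7, weights)
print(weights)   # {'1': 4.375, '2': 4.375, '3': 6.25, '4': 8.0}
# i.e. w_1 : w_2 : w_3 : w_4 = 35 : 35 : 50 : 64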
34.
35.
Examples
(including the use of weights in the computation of the profile HMM parameters):