Assumptions of the HMM:

  • 1. Homogeneous Markov assumption: P(i_t|i_{t-1}, o_{t-1}, i_{t-2}, o_{t-2}, ..., i_1, o_1) = P(i_t|i_{t-1}), t = 1, 2, ..., T
  • 2. Observation independence assumption: P(o_t|i_1, o_1, ..., i_{t-1}, o_{t-1}, i_t, i_{t+1}, o_{t+1}, ..., i_T, o_T) = P(o_t|i_t), t = 1, 2, ..., T

Notation

  • Hidden state sequence I = (i_1, i_2, ..., i_T)
  • State set Q = {q_1, q_2, ..., q_N}
  • Observation sequence O = (o_1, o_2, ..., o_T)
  • Observation set V = {v_1, v_2, ..., v_M}
  • π: initial state distribution, π = (π_1, π_2, ..., π_N), with π_1 + π_2 + ... + π_N = 1
  • A: state transition matrix, A = (a_{ij})_{N×N}, a_{ij} = P(i_{t+1} = q_j | i_t = q_i)
  • B: emission matrix, B = (b_j(k))_{N×M}, b_j(k) = P(o_t = v_k | i_t = q_j)
  • λ = (π, A, B): the full parameter set of the model
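As a running illustration, λ = (π, A, B) can be stored as plain arrays. This is a sketch (not the HMM package used later); the numeric values are the ones used in the first R example below, and the assertions check the stochasticity constraints just listed.

```python
# lambda = (pi, A, B) for N = 2 states {A, B} and M = 2 symbols {L, R}
pi = [0.5, 0.5]        # initial state distribution
A = [[0.8, 0.2],       # a_ij = P(i_{t+1} = q_j | i_t = q_i)
     [0.2, 0.8]]
B = [[0.6, 0.4],       # b_j(k) = P(o_t = v_k | i_t = q_j)
     [0.4, 0.6]]

# pi, each row of A, and each row of B must sum to 1
assert abs(sum(pi) - 1) < 1e-12
assert all(abs(sum(row) - 1) < 1e-12 for row in A + B)
```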

1. The evaluation problem

Given an observation sequence O = o_1 o_2 ... o_T and a model λ = (π, A, B), compute P(O|λ): what is the probability of the observation sequence under the model?

P(O|λ) = ∑_I P(O, I|λ) = ∑_I P(I|λ) · P(O|I, λ)

= ∑_{i_1,i_2,...,i_T} P(i_1, i_2, ..., i_T|λ) · P(o_1, o_2, ..., o_T|i_1, i_2, ..., i_T, λ)

= ∑_{i_1,...,i_T} P(i_1|λ) · P(i_2|i_1, λ) · P(i_3|i_1, i_2, λ) ··· P(i_T|i_1, ..., i_{T-1}, λ) · P(o_1|i_1, λ) · P(o_2|i_2, λ) ··· P(o_T|i_T, λ)

= ∑_{i_1,...,i_T} π_{i_1} · a_{i_1 i_2} · a_{i_2 i_3} ··· a_{i_{T-1} i_T} · b_{i_1}(o_1) · b_{i_2}(o_2) ··· b_{i_T}(o_T)
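The sum above runs over all N^T state paths, so evaluating it directly costs O(T·N^T); that is exactly why the forward algorithm below is needed. For tiny models the direct sum is still computable, and it is a useful correctness check. A minimal Python sketch (not part of the HMM package; it uses the same parameter values as the first R example below):

```python
from itertools import product

# Model parameters (values from the R example below): states A=0, B=1; symbols L=0, R=1
pi = [0.5, 0.5]
A = [[0.8, 0.2], [0.2, 0.8]]
B = [[0.6, 0.4], [0.4, 0.6]]
obs = [0, 0, 1, 1]  # the observation sequence L, L, R, R

# Direct evaluation: sum the joint probability over all N^T state paths
N, T = len(pi), len(obs)
p = 0.0
for path in product(range(N), repeat=T):
    w = pi[path[0]] * B[path[0]][obs[0]]          # pi_{i_1} * b_{i_1}(o_1)
    for t in range(1, T):
        w *= A[path[t-1]][path[t]] * B[path[t]][obs[t]]  # a_{i_{t-1} i_t} * b_{i_t}(o_t)
    p += w
print(p)  # ≈ 0.061696, matching the column sums of the forward matrix below
```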

The forward algorithm

Define:

α_t(i) = P(o_1, o_2, ..., o_t, i_t = q_i | λ), t = 1, 2, ..., T

α_1(i) = P(o_1, i_1 = q_i | λ) = P(i_1 = q_i | λ) · P(o_1 | i_1 = q_i, λ) = π_i · b_i(o_1)

Given α_t(i) for some t ∈ {1, 2, ..., T-1}, then

α_{t+1}(i) = P(o_1, o_2, ..., o_t, o_{t+1}, i_{t+1} = q_i | λ)

= ∑_j P(o_1, o_2, ..., o_t, i_t = q_j | λ) · a_{ji} · b_i(o_{t+1})

= [∑_j α_t(j) · a_{ji}] · b_i(o_{t+1})

Finally, P(O|λ) = ∑_i α_T(i).
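The recursion can be sketched in plain Python (a minimal illustration, not the HMM package; variable names like `alpha` are mine, and the parameters are those of the R example that follows, so the resulting matrix should match its output):

```python
# Model parameters (from the R example below): states A=0, B=1; symbols L=0, R=1
pi = [0.5, 0.5]
A = [[0.8, 0.2], [0.2, 0.8]]
B = [[0.6, 0.4], [0.4, 0.6]]
obs = [0, 0, 1, 1]  # L, L, R, R

N, T = len(pi), len(obs)
# alpha[t-1][i] holds alpha_t(i) = P(o_1, ..., o_t, i_t = q_i | lambda)
alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]        # initialisation
for t in range(1, T):
    alpha.append([sum(alpha[t-1][j] * A[j][i] for j in range(N)) * B[i][obs[t]]
                  for i in range(N)])                      # recursion
prob = sum(alpha[T-1])                                     # termination: P(O|lambda)
print(alpha)  # columns match the R forward matrix: 0.3, 0.168, 0.0608, 0.024448, ...
print(prob)
```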

A small example

In [8]:
# Initialise HMM
library(HMM)
hmm = initHMM(c("A","B"), c("L","R"), transProbs=matrix(c(.8,.2,.2,.8),2),
emissionProbs=matrix(c(.6,.4,.4,.6),2))
print(hmm)
# Sequence of observations
observations = c("L","L","R","R")
# Calculate forward probabilities
logForwardProbabilities = forward(hmm,observations)
print(exp(logForwardProbabilities))
$States
[1] "A" "B"

$Symbols
[1] "L" "R"

$startProbs
  A   B 
0.5 0.5 

$transProbs
    to
from   A   B
   A 0.8 0.2
   B 0.2 0.8

$emissionProbs
      symbols
states   L   R
     A 0.6 0.4
     B 0.4 0.6

      index
states   1     2      3        4
     A 0.3 0.168 0.0608 0.024448
     B 0.2 0.088 0.0624 0.037248
In [9]:
# Alternatively, once hmm and observations are defined,
# call forward() directly to get the same result
exp(forward(hmm,observations))#forward
#exp(backward(hmm,observations))#backward
A matrix: 2 × 4 of type dbl
      1     2      3        4
A   0.3 0.168 0.0608 0.024448
B   0.2 0.088 0.0624 0.037248

2. The prediction (decoding) problem

Given an observation sequence O = o_1 o_2 ... o_T and a model λ = (π, A, B), find the hidden state sequence I = (i_1, i_2, ..., i_T) with the highest probability.

The Viterbi algorithm

I^* = argmax_I P(I|O, λ)

= argmax_I P(I|O, λ) · P(O|λ)   (P(O|λ) does not depend on I)

= argmax_I P(I, O|λ) = argmax_{i_1,...,i_T} P(i_1, i_2, ..., i_T, o_1, o_2, ..., o_T | λ)

where i_t ∈ {q_1, q_2, ..., q_N} (abbreviated {1, 2, ..., N}), t = 1, 2, ..., T.

Define δ_t(i) as the maximum probability over all state paths from time 1 to time t that end in state i at time t:

δ_t(i) = max_{i_1,...,i_{t-1}} P(i_1, i_2, ..., i_{t-1}, i_t = i, o_1, o_2, ..., o_t | λ), i = 1, 2, ..., N

In particular, δ_T(i) = max_{i_1,...,i_{T-1}} P(i_1, ..., i_{T-1}, i_T = i, o_1, ..., o_T | λ),

so max_I P(I, O|λ) = max_{1≤i≤N} δ_T(i).

How do we compute δ_T(i) by dynamic programming, i.e. the Viterbi algorithm?

δ_{T-1}(i) → δ_T(i): δ_T(i) = max_j δ_{T-1}(j) · a_{ji} · b_i(o_T)

More generally, for t = 1, 2, ..., T-1,

δ_{t+1}(i) = max_{1≤j≤N} δ_t(j) · a_{ji} · b_i(o_{t+1}), i = 1, 2, ..., N

with initialisation δ_1(i) = P(i_1 = i, o_1|λ) = π_i · b_i(o_1), i = 1, 2, ..., N

δ_1(i) → δ_2(i) → ... → δ_T(i) → max_{1≤i≤N} δ_T(i)

The optimal path is then recovered by backtracking:

i_T^* = argmax_{1≤i≤N} δ_T(i)

i_{T-1}^* = argmax_{1≤j≤N} δ_{T-1}(j) · a_{j, i_T^*} · b_{i_T^*}(o_T)

= argmax_{1≤j≤N} δ_{T-1}(j) · a_{j, i_T^*}   (the b term does not depend on j)

......

i_t^* = argmax_{1≤j≤N} δ_t(j) · a_{j, i_{t+1}^*}

The optimal path is (i_1^*, i_2^*, ..., i_{T-1}^*, i_T^*).
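The recursion plus backtracking can be sketched in plain Python (a minimal illustration, not the HMM package; it records back-pointers `psi` during the forward pass instead of re-running the argmax as above, which gives the same path; the parameters are those of the R example that follows):

```python
# Model parameters (from the R example below): states A=0, B=1; symbols L=0, R=1
pi = [0.5, 0.5]
A = [[0.6, 0.4], [0.4, 0.6]]
B = [[0.6, 0.4], [0.4, 0.6]]
obs = [0, 0, 1, 1]  # L, L, R, R
states = ["A", "B"]

N, T = len(pi), len(obs)
delta = [[pi[i] * B[i][obs[0]] for i in range(N)]]   # delta_1(i) = pi_i * b_i(o_1)
psi = [[0] * N]                                      # back-pointers (unused at t = 1)
for t in range(1, T):
    row, back = [], []
    for i in range(N):
        # argmax over predecessor states j of delta_t(j) * a_ji
        j_best = max(range(N), key=lambda j: delta[t-1][j] * A[j][i])
        row.append(delta[t-1][j_best] * A[j_best][i] * B[i][obs[t]])
        back.append(j_best)
    delta.append(row)
    psi.append(back)

# Backtrack from the best final state
i_t = max(range(N), key=lambda i: delta[T-1][i])
path = [i_t]
for t in range(T - 1, 0, -1):
    i_t = psi[t][i_t]
    path.append(i_t)
path.reverse()
print([states[i] for i in path])  # ['A', 'A', 'B', 'B'], matching the R output
```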

A small example

In [10]:
# Initialise HMM
hmm = initHMM(c("A","B"), c("L","R"), transProbs=matrix(c(.6,.4,.4,.6),2),
emissionProbs=matrix(c(.6,.4,.4,.6),2))
print(hmm)
# Sequence of observations
observations = c("L","L","R","R")
# Calculate the Viterbi path (assigned to a new name so the
# viterbi() function is not shadowed)
viterbiPath = viterbi(hmm,observations)
print(viterbiPath)
$States
[1] "A" "B"

$Symbols
[1] "L" "R"

$startProbs
  A   B 
0.5 0.5 

$transProbs
    to
from   A   B
   A 0.6 0.4
   B 0.4 0.6

$emissionProbs
      symbols
states   L   R
     A 0.6 0.4
     B 0.4 0.6

[1] "A" "A" "B" "B"
In [11]:
# Calling viterbi() directly gives the same result
viterbi(hmm,observations)
  1. 'A'
  2. 'A'
  3. 'B'
  4. 'B'

3. The learning problem

Given an observation sequence O = o_1 o_2 ... o_T, estimate λ = (π, A, B).

EM algorithm; forward-backward (Baum-Welch) algorithm

λ_MLE = argmax_λ P(O|λ) = argmax_λ ∑_I P(O, I|λ) = argmax_λ ∑_I P(I|λ) · P(O|I, λ)

EM: maximum likelihood estimation with latent variables

data: O = (o_1, o_2, ..., o_T)

latent variable: I = (i_1, i_2, ..., i_T)

complete data: (O, I) = (o_1, o_2, ..., o_T, i_1, i_2, ..., i_T)

parameters: λ = (π, A, B)

E-step: Q(λ, λ^(t)) = E_{I|O, λ^(t)} [log P(O, I|λ)]

M-step: π^(t+1) = argmax_π Q(λ, λ^(t))

For the initial distribution this yields, with β_t(i) = P(o_{t+1}, ..., o_T | i_t = q_i, λ) the backward variable,

π_i^* = α_1(i) β_1(i) / ∑_j α_1(j) β_1(j), i = 1, 2, ..., N
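The backward recursion mirrors the forward one (β_T(i) = 1; β_t(i) = ∑_j a_{ij} b_j(o_{t+1}) β_{t+1}(j)), and together they give the π update above. A minimal Python sketch (not the HMM package's Baum-Welch, only the β computation and the π re-estimation step; parameters are those of the first R example). A useful sanity check used below: ∑_i α_t(i) β_t(i) = P(O|λ) for every t.

```python
# Model parameters (from the first R example): states A=0, B=1; symbols L=0, R=1
pi = [0.5, 0.5]
A = [[0.8, 0.2], [0.2, 0.8]]
B = [[0.6, 0.4], [0.4, 0.6]]
obs = [0, 0, 1, 1]  # L, L, R, R
N, T = len(pi), len(obs)

# Forward variables alpha_t(i)
alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
for t in range(1, T):
    alpha.append([sum(alpha[t-1][j] * A[j][i] for j in range(N)) * B[i][obs[t]]
                  for i in range(N)])

# Backward variables beta_t(i) = P(o_{t+1}, ..., o_T | i_t = q_i, lambda)
beta = [[1.0] * N]                       # beta_T(i) = 1
for t in range(T - 2, -1, -1):           # fill rows from t = T-1 down to t = 1
    beta.insert(0, [sum(A[i][j] * B[j][obs[t+1]] * beta[0][j] for j in range(N))
                    for i in range(N)])

p_obs = sum(alpha[T-1])                  # P(O|lambda)

# pi update: pi_i^* = alpha_1(i) beta_1(i) / sum_j alpha_1(j) beta_1(j)
num = [alpha[0][i] * beta[0][i] for i in range(N)]
pi_new = [x / sum(num) for x in num]
print(pi_new)
```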

A small example

In [13]:
# Initialise HMM
hmm = initHMM(c("A","B"),c("L","R"),
transProbs=matrix(c(.9,.1,.1,.9),2),
emissionProbs=matrix(c(.5,.51,.5,.49),2))
print(hmm)
# Sequence of observation
a = sample(c(rep("L",100),rep("R",300)))
b = sample(c(rep("L",300),rep("R",100)))
observation = c(a,b)
# Baum-Welch
bw = baumWelch(hmm,observation,10)
print(bw$hmm)
$States
[1] "A" "B"

$Symbols
[1] "L" "R"

$startProbs
  A   B 
0.5 0.5 

$transProbs
    to
from   A   B
   A 0.9 0.1
   B 0.1 0.9

$emissionProbs
      symbols
states    L    R
     A 0.50 0.50
     B 0.51 0.49

$States
[1] "A" "B"

$Symbols
[1] "L" "R"

$startProbs
  A   B 
0.5 0.5 

$transProbs
    to
from            A           B
   A 9.975520e-01 0.002447994
   B 1.568199e-07 0.999999843

$emissionProbs
      symbols
states         L         R
     A 0.2511406 0.7488594
     B 0.7596840 0.2403160
