0% found this document useful (0 votes)

40 views31 pages

Hidden Markov Model - Implemented From Scratch - by Oleg Żero - Towards Data Science

This summarizes the key steps in implementing a Hidden Markov Model from scratch in Python code. It defines ProbabilityVector and ProbabilityMatrix classes to represent the fundamental concepts of states, observations, and probabilities in an HMM. Methods are included to initialize, compare, perform mathematical operations on these objects while enforcing properties like values summing to 1. This implementation will be used in a later example to demonstrate an HMM's performance.

Uploaded by

droolingpanda

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

40 views31 pages

Hidden Markov Model - Implemented From Scratch - by Oleg Żero - Towards Data Science

Uploaded by

droolingpanda

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 31

Get started Open in app

Follow 596K Followers

You have 2 free member-only stories left this month. Sign up for Medium and get an extra one

Hidden Markov Model — Implemented from

scratch
Oleg Żero Mar 27, 2020 · 23 min read
Get started Open in app

A short pause in the summer heat. Portugal, 2019.

I want to expand this work into a series of µ-tutorial videos. If you’re interested, please
subscribe to my newsletter to stay in touch.

Introduction
The Internet is full of good articles that explain the theory behind the Hidden Markov
Model (HMM) well (e.g. 1, 2, 3 and 4). However, many of these works contain a fair
amount of rather advanced mathematical equations. While equations are necessary if
one wants to explain the theory, we decided to take it to the next level and create a
gentle step by step practical implementation to complement the good work of others.
In this short series of two articles, we will focus on translating all of the complicated
Get started Open in app
mathematics into code. Our starting point is the document written by Mark Stamp. We
will use this paper to define our code (this article) and then use a somewhat peculiar
example of “Morning Insanity” to demonstrate its performance in practice.

Notation
Before we begin, let’s revisit the notation we will be using. By the way, don’t worry if
some of that is unclear to you. We will hold your hand.

T - length of the observation sequence.

N - number of latent (hidden) states.

M - number of observables.

Q = {q₀, q₁, …} - hidden states.

V = {0, 1, …, M — 1} - set of possible observations.

A - state transition matrix.

B - emission probability matrix.

π- initial state probability distribution.

O - observation sequence.

X = (x₀, x₁, …), x_t ∈ Q - hidden state sequence.

Having that set defined, we can calculate the probability of any state and observation
using the matrices:

A = {a_ij} — begin an transition matrix.

B = {b_j(k)} — being an emission matrix.

The probabilities associated with transition and observation (emission) are:

Get started Open in app

The model is therefore defined as a collection:

Fundamental definitions
Since HMM is based on probability vectors and matrices, let’s first define objects that
will represent the fundamental concepts. To be useful, the objects must reflect on
certain properties. For example, all elements of a probability vector must be numbers 0
≤ x ≤ 1 and they must sum up to 1. Therefore, let’s design the objects the way they will
inherently safeguard the mathematical properties.

import numpy as np

import pandas as pd

class ProbabilityVector:

def init(self, probabilities: dict):

states = probabilities.keys()

probs = probabilities.values()

assert len(states) == len(probs),

"The probabilities must match the states."

assert len(states) == len(set(states)),

"The states must be unique."

assert abs(sum(probs) - 1.0) < 1e-12,

"Probabilities must sum up to 1."

assert len(list(filter(lambda x: 0 <= x <= 1, probs))) ==

len(probs), \

"Probabilities must be numbers from [0, 1] interval."

self.states = sorted(probabilities)

self.values = np.array(list(map(lambda x:

probabilities[x], self.states))).reshape(1, -1)

@classmethod

def initialize(cls, states: list):

size = len(states)

rand = np.random.rand(size) / (size**2) + 1 / size

rand /= rand.sum(axis=0)

return cls(dict(zip(states, rand)))

@classmethod

Get started Open in app

def from_numpy(cls, array: np.ndarray, state: list):

return cls(dict(zip(states, list(array))))

@property

def dict(self):

return {k:v for k, v in zip(self.states,

list(self.values.flatten()))}

@property

def df(self):

return pd.DataFrame(self.values, columns=self.states, index=

['probability'])

def __repr__(self):

return "P({}) = {}.".format(self.states, self.values)

def eq(self, other):

if not isinstance(other, ProbabilityVector):

raise NotImplementedError

if (self.states == other.states) and (self.values ==

other.values).all():

return True

return False

def getitem(self, state: str) -> float:

if state not in self.states:

raise ValueError("Requesting unknown probability state

from vector.")

index = self.states.index(state)

return float(self.values[0, index])

def mul(self, other) -> np.ndarray:

if isinstance(other, ProbabilityVector):

return self.values * other.values

elif isinstance(other, (int, float)):

return self.values * other

else:

NotImplementedError

def rmul(self, other) -> np.ndarray:

return self.__mul__(other)

def matmul(self, other) -> np.ndarray:

if isinstance(other, ProbabilityMatrix):

return self.values @ other.values

def truediv(self, number) -> np.ndarray:

if not isinstance(number, (int, float)):

raise NotImplementedError

x = self.values

return x / number if number != 0 else x / (number + 1e-12)

def argmax(self):

Get started index

Open = self.values.argmax()

in app
return self.states[index]

The most natural way to initialize this object is to use a dictionary as it associates values
with unique keys. Dictionaries, unfortunately, do not provide any assertion mechanisms
that put any constraints on the values. Consequently, we build our custom
ProbabilityVector object to ensure that our values behave correctly. Most importantly, we
enforce the following:

The number of values must equal the number of the keys (names of our states).
Although this is not a problem when initializing the object from a dictionary, we will
use other ways later.

All names of the states must be unique (the same arguments apply).

The probabilities must sum up to 1 (up to a certain tolerance).

All probabilities must be 0 ≤ p ≤ 1.

Having ensured that, we also provide two alternative ways to instantiate

ProbabilityVector objects (decorated with @classmethod ).

1. We instantiate the objects randomly — it will be useful when training.

2. We use ready-made numpy arrays and use values therein, and only providing the
names for the states.

For convenience and debugging, we provide two additional methods for requesting the
values. Decorated with, they return the content of the PV object as a dictionary or a
pandas dataframe.

The PV objects need to satisfy the following mathematical operations (for the purpose of
constructing of HMM):

1. comparison ( eq ) - to know if any two PV's are equal,

2. element-wise multiplication of two PV’s or multiplication with a scalar ( mul and

__rmul__ ).
3. dot product ( __matmul__ ) - to perform vector-matrix multiplication
Get started Open in app
4. division by number ( __truediv__ ),

5. argmax to find for which state the probability is the highest.

6. getitem to enable selecting value by the key.

Note that when e.g. multiplying a PV with a scalar, the returned structure is a resulting
numpy array, not another PV. This is because multiplying by anything other than 1
would violate the integrity of the PV itself.

Internally, the values are stored as a numpy array of size (1 × N).

Example

a1 = ProbabilityVector({'rain': 0.7, 'sun': 0.3})

a2 = ProbabilityVector({'sun': 0.1, 'rain': 0.9})

print(a1.df)

print(a2.df)

print("Comparison:", a1 == a2)

print("Element-wise multiplication:", a1 * a2)

print("Argmax:", a1.argmax())

print("Getitem:", a1['rain'])

# OUTPUT

>>> rain sun

probability 0.7 0.3

rain sun

probability 0.9 0.1

>>> Comparison: False

>>> Element-wise multiplication: [[0.63 0.03]]

>>> Argmax: rain

>>> Getitem: 0.7

Probability Matrix
Another object is a Probability Matrix , which is a core part of the HMM definition.
Get started Open in app
Formally, the A and B matrices must be row-stochastic, meaning that the values of every
row must sum up to 1. We can, therefore, define our PM by stacking several PV's, which
we have constructed in a way to guarantee this constraint.

class ProbabilityMatrix:

def init(self, prob_vec_dict: dict):

assert len(prob_vec_dict) > 1, \

"The numebr of input probability vector must be greater

than one."

assert len(set([str(x.states) for x in

prob_vec_dict.values()])) == 1, \

"All internal states of all the vectors must be

indentical."

assert len(prob_vec_dict.keys()) ==
len(set(prob_vec_dict.keys())), \

"All observables must be unique."

self.states = sorted(prob_vec_dict)

self.observables = prob_vec_dict[self.states[0]].states

self.values = np.stack([prob_vec_dict[x].values \

for x in self.states]).squeeze()

@classmethod

def initialize(cls, states: list, observables: list):

size = len(states)

rand = np.random.rand(size, len(observables)) \

/ (size**2) + 1 / size

rand /= rand.sum(axis=1).reshape(-1, 1)

aggr = [dict(zip(observables, rand[i, :])) for i in

range(len(states))]

pvec = [ProbabilityVector(x) for x in aggr]

return cls(dict(zip(states, pvec)))

@classmethod

def from_numpy(cls, array:

np.ndarray,

states: list,

observables: list):

p_vecs = [ProbabilityVector(dict(zip(observables, x))) \

for x in array]

return cls(dict(zip(states, p_vecs)))

@property

def dict(self):

return self.df.to_dict()

@property

def df(self):

return pd.DataFrame(self.values,

Get started Open incolumns=self.observables,

app index=self.states)

def __repr__(self):

return "PM {} states: {} -> obs: {}.".format(

self.values.shape, self.states, self.observables)

def getitem(self, observable: str) -> np.ndarray:

if observable not in self.observables:

raise ValueError("Requesting unknown probability

observable from the matrix.")

index = self.observables.index(observable)

return self.values[:, index].reshape(-1, 1)

Here, the way we instantiate PM’s is by supplying a dictionary of PV’s to the constructor
of the class. By doing this, we not only ensure that every row of PM is stochastic, but also
supply the names for every observable.

Our PM can, therefore, give an array of coefficients for any observable. Mathematically,
the PM is a matrix:

The other methods are implemented in similar way to PV.

Example

a1 = ProbabilityVector({'rain': 0.7, 'sun': 0.3})

a2 = ProbabilityVector({'rain': 0.6, 'sun': 0.4})

A = ProbabilityMatrix({'hot': a1, 'cold': a2})

print(A)

print(A.df)

>>> PM (2, 2) states: ['cold', 'hot'] -> obs: ['rain', 'sun'].

>>> rain sun

cold 0.6 0.4

hot 0.7 0.3

b1 = ProbabilityVector({'0S': 0.1, '1M': 0.4, '2L': 0.5})

b2started
Get = ProbabilityVector({'0S':
Open in app 0.7, '1M': 0.2, '2L': 0.1})
B = ProbabilityMatrix({'0H': b1, '1C': b2})

print(B)

print(B.df)

>>> PM (2, 3) states: ['0H', '1C'] -> obs: ['0S', '1M', '2L'].

>>> 0S 1M 2L

0H 0.1 0.4 0.5

1C 0.7 0.2 0.1

P = ProbabilityMatrix.initialize(list('abcd'), list('xyz'))

print('Dot product:', a1 @ A)

print('Initialization:', P)

print(P.df)

>>> Dot product: [[0.63 0.37]]

>>> Initialization: PM (4, 3)

states: ['a', 'b', 'c', 'd'] -> obs: ['x', 'y', 'z'].
>>> x y z

a 0.323803 0.327106 0.349091

b 0.318166 0.326356 0.355478

c 0.311833 0.347983 0.340185

d 0.337223 0.316850 0.345927

Implementing Hidden Markov Chain

Before we proceed with calculating the score, let’s use our PV and PM definitions to
implement the Hidden Markov Chain.

Again, we will do so as a class, calling it HiddenMarkovChain . It will collate at A, B and π.

Later on, we will implement more methods that are applicable to this class.

Computing score
Computing the score means to find what is the probability of a particular chain of
observations O given our (known) model λ = (A, B, π). In other words, we are interested
in finding p(O|λ).

We can find p(O|λ) by marginalizing all possible chains of the hidden variables X, where
X = {x₀, x₁, …}:
Since p(O|X, λ) = ∏ b(O) (the product of all probabilities related to the observables) and
Get started Open in app
p(X|λ)=π ∏ a (the product of all probabilities of transitioning from x at t to x at t + 1, the
probability we are looking for (the score) is:

This is a naive way of computing of the score, since we need to calculate the probability
for every possible chain X. Either way, let’s implement it in python:

from itertools import product

from functools import reduce

class HiddenMarkovChain:

def init(self, T, E, pi):

self.T = T # transmission matrix A

self.E = E # emission matrix B

self.pi = pi

self.states = pi.states

self.observables = E.observables

def __repr__(self):

return "HML states: {} -> observables: {}.".format(

len(self.states), len(self.observables))

@classmethod

def initialize(cls, states: list, observables: list):

T = ProbabilityMatrix.initialize(states, states)

E = ProbabilityMatrix.initialize(states, observables)

pi = ProbabilityVector.initialize(states)

return cls(T, E, pi)

def _create_all_chains(self, chain_length):

return list(product((self.states,) chain_length))

def score(self, observations: list) -> float:

def mul(x, y): return x * y

score = 0

all_chains = self._create_all_chains(len(observations))

for idx, chain in enumerate(all_chains):

expanded_chain = list(zip(chain, [self.T.states[0]] +

list(chain)))

expanded_obser = list(zip(observations, chain))

p_observations = list(map(lambda x: self.E.df.loc[x[1],

x[0]], expanded_obser))

Get started p_hidden_state

Open in app = list(map(lambda x: self.T.df.loc[x[1],
x[0]], expanded_chain))

p_hidden_state[0] = self.pi[chain[0]]

score += reduce(mul, p_observations) * reduce(mul,

p_hidden_state)

return score

Example

a1 = ProbabilityVector({'1H': 0.7, '2C': 0.3})

a2 = ProbabilityVector({'1H': 0.4, '2C': 0.6})

b1 = ProbabilityVector({'1S': 0.1, '2M': 0.4, '3L': 0.5})

b2 = ProbabilityVector({'1S': 0.7, '2M': 0.2, '3L': 0.1})

A = ProbabilityMatrix({'1H': a1, '2C': a2})

B = ProbabilityMatrix({'1H': b1, '2C': b2})

pi = ProbabilityVector({'1H': 0.6, '2C': 0.4})

hmc = HiddenMarkovChain(A, B, pi)

observations = ['1S', '2M', '3L', '2M', '1S']

print("Score for {} is {:f}.".format(observations,

hmc.score(observations)))

>>> Score for ['1S', '2M', '3L', '2M', '1S'] is 0.003482.

If our implementation is correct, then all score values for all possible observation chains,
for a given model should add up to one. Namely:

all_possible_observations = {'1S', '2M', '3L'}

chain_length = 3 # any int > 0

all_observation_chains = list(product(*(all_possible_observations,)
* chain_length))

all_possible_scores = list(map(lambda obs: hmc.score(obs),

all_observation_chains))

print("All possible scores added:

{}.".format(sum(all_possible_scores)))

>>>
Get All possible
started scores added: 1.0.
Open in app

Indeed.

Score with forward-pass

Computing the score the way we did above is kind of naive. In order to find the number
for a particular observation chain O, we have to compute the score for all possible latent
variable sequences X. That requires 2TN^T multiplications, which even for small
numbers takes time.

Another way to do it is to calculate partial observations of a sequence up to time t.

For and i ∈ {0, 1, …, N-1} and t ∈ {0, 1, …, T-1} :

Consequently,

and

Then
Get started Open in app

Note that α_t is a vector of length N. The sum of the product α a can, in fact, be written
as a dot product. Therefore:

where by the star, we denote an element-wise multiplication.

With this implementation, we reduce the number of multiplication to N²T and can take
advantage of vectorization.

class HiddenMarkovChain_FP(HiddenMarkovChain):

def _alphas(self, observations: list) -> np.ndarray:

alphas = np.zeros((len(observations), len(self.states)))

alphas[0, :] = self.pi.values * self.E[observations[0]].T

for t in range(1, len(observations)):

alphas[t, :] = (alphas[t - 1, :].reshape(1, -1)

@ self.T.values) *
self.E[observations[t]].T

return alphas

def score(self, observations: list) -> float:

alphas = self._alphas(observations)

return float(alphas[-1].sum())

Example

hmc_fp = HiddenMarkovChain_FP(A, B, pi)

observations = ['1S', '2M', '3L', '2M', '1S']

print("Score for {} is {:f}.".format(observations,

hmc_fp.score(observations)))

>>> All possible scores added: 1.0.

…yup.
Simulation and convergence
Get started Open in app
Let’s test one more thing. Basically, let’s take our λ = (A, B, π) and use it to generate a
sequence of random observables, starting from some initial state probability π.

If the desired length T is “large enough”, we would expect that the system to converge on
a sequence that, on average, gives the same number of events as we would expect from A
and B matrices directly. In other words, the transition and the emission matrices
“decide”, with a certain probability, what the next state will be and what observation we
will get, for every step, respectively. Therefore, what may initially look like random
events, on average should reflect the coefficients of the matrices themselves. Let’s check
that as well.

class HiddenMarkovChain_Simulation(HiddenMarkovChain):

def run(self, length: int) -> (list, list):

assert length >= 0, "The chain needs to be a non-negative

number."

s_history = [0] * (length + 1)

o_history = [0] * (length + 1)

prb = self.pi.values

obs = prb @ self.E.values

s_history[0] = np.random.choice(self.states,
p=prb.flatten())

o_history[0] = np.random.choice(self.observables,
p=obs.flatten())

for t in range(1, length + 1):

prb = prb @ self.T.values

obs = prb @ self.E.values

s_history[t] = np.random.choice(self.states,
p=prb.flatten())

o_history[t] = np.random.choice(self.observables,
p=obs.flatten())

return o_history, s_history

Example

hmc_s = HiddenMarkovChain_Simulation(A, B, pi)

observation_hist, states_hist = hmc_s.run(100) # length = 100

stats = pd.DataFrame({

'observations': observation_hist,

'states': states_hist}).applymap(lambda x: int(x[0])).plot()

Get started Open in app

Figure 1. An example of a Markov process. The states and the observable sequences are shown.

Latent states
The state matrix A is given by the following coefficients:

Consequently, the probability of “being” in the state “1H” at t+1, regardless of the
previous state, is equal to:
Get started Open in app

If we assume that the prior probabilities of being at some state at are totally random,
then p(1H) = 1 and p(2C) = 0.9, which after renormalizing give 0.55 and 0.45,
respectively.

If we count the number of occurrences of each state and divide it by the number of
elements in our sequence, we would get closer and closer to these number as the length
of the sequence grows.

Example

hmc_s = HiddenMarkovChain_Simulation(A, B, pi)

stats = {}

for length in np.logspace(1, 5, 40).astype(int):

observation_hist, states_hist = hmc_s.run(length)

stats[length] = pd.DataFrame({

'observations': observation_hist,

'states': states_hist}).applymap(lambda x: int(x[0]))

S = np.array(list(map(lambda x:

x['states'].value_counts().to_numpy() / len(x),
stats.values())))

plt.semilogx(np.logspace(1, 5, 40).astype(int), S)

plt.xlabel('Chain length T')

plt.ylabel('Probability')

plt.title('Converging probabilities.')

plt.legend(['1H', '2C'])

plt.show()
Get started Open in app

Figure 2. Convergence of the probabilities against the length of the chain.

Let’s take our HiddenMarkovChain class to the next level and supplement it with more
methods. The methods will help us to discover the most probable sequence of hidden
variables behind the observation sequence.

Expanding the class

We have defined α to be the probability of partial observation of the sequence up to time
.

Now, let’s define the “opposite” probability. Namely, the probability of observing the
sequence from T - 1down to t.

For t= 0, 1, …, T-1 and i=0, 1, …, N-1, we define:

c`1As before, we can β(i) calculate recursively:

Then for t ≠ T-1:

which in vectorized form, will be:
Get started Open in app

Finally, we also define a new quantity γ to indicate the state q_i at time t, for which the
probability (calculated forwards and backwards) is the maximum:

Consequently, for any step t = 0, 1, …, T-1, the state of the maximum likelihood can be
found using:

class HiddenMarkovChain_Uncover(HiddenMarkovChain_Simulation):

def _alphas(self, observations: list) -> np.ndarray:

alphas = np.zeros((len(observations), len(self.states)))

alphas[0, :] = self.pi.values * self.E[observations[0]].T

for t in range(1, len(observations)):

alphas[t, :] = (alphas[t - 1, :].reshape(1, -1) @

self.T.values) \

* self.E[observations[t]].T

return alphas

def _betas(self, observations: list) -> np.ndarray:

betas = np.zeros((len(observations), len(self.states)))

betas[-1, :] = 1

for t in range(len(observations) - 2, -1, -1):

betas[t, :] = (self.T.values @ (self.E[observations[t +

1]] \

* betas[t + 1, :].reshape(-1,
1))).reshape(1, -1)

return betas

def uncover(self, observations: list) -> list:

alphas = self._alphas(observations)

betas = self._betas(observations)

maxargs = (alphas * betas).argmax(axis=1)

Get started return

Open inlist(map(lambda
app x: self.states[x], maxargs))

Validation
To validate, let’s generate some observable sequence O. For that, we can use our model’s
.run method. Then, we will use the .uncover method to find the most likely latent
variable sequence.

Example

np.random.seed(42)

a1 = ProbabilityVector({'1H': 0.7, '2C': 0.3})

a2 = ProbabilityVector({'1H': 0.4, '2C': 0.6})

b1 = ProbabilityVector({'1S': 0.1, '2M': 0.4, '3L': 0.5})

b2 = ProbabilityVector({'1S': 0.7, '2M': 0.2, '3L': 0.1})

A = ProbabilityMatrix({'1H': a1, '2C': a2})

B = ProbabilityMatrix({'1H': b1, '2C': b2})

pi = ProbabilityVector({'1H': 0.6, '2C': 0.4})

hmc = HiddenMarkovChain_Uncover(A, B, pi)

observed_sequence, latent_sequence = hmc.run(5)

uncovered_sequence = hmc.uncover(observed_sequence)

| | 0 | 1 | 2 | 3 | 4 | 5 |

|:------------------:|:----|:----|:----|:----|:----|:----|

| observed sequence | 3L | 3M | 1S | 3L | 3L | 3L |

| latent sequence | 1H | 2C | 1H | 1H | 2C | 1H |

| uncovered sequence | 1H | 1H | 2C | 1H | 1H | 1H |

As we can see, the most likely latent state chain (according to the algorithm) is not the
same as the one that actually caused the observations. This is to be expected. After all,
each observation sequence can only be manifested with certain probability, dependent
on the latent sequence.

The code below, evaluates the likelihood of different latent sequences resulting in our
observation sequence.

all_possible_states = {'1H', '2C'}

chain_length = 6 # any int > 0

all_states_chains = list(product(*(all_possible_states,) *
chain_length))

Get started Open in app

df = pd.DataFrame(all_states_chains)

dfp = pd.DataFrame()

for i in range(chain_length):

dfp['p' + str(i)] = df.apply(lambda x:

hmc.E.df.loc[x[i], observed_sequence[i]], axis=1)

scores = dfp.sum(axis=1).sort_values(ascending=False)

df = df.iloc[scores.index]

df['score'] = scores

df.head(10).reset_index()

| index | 0 | 1 | 2 | 3 | 4 | 5 | score |

|:--------:|:----|:----|:----|:----|:----|:----|--------:|

| 8 | 1H | 1H | 2C | 1H | 1H | 1H | 3.1 |

| 24 | 1H | 2C | 2C | 1H | 1H | 1H | 2.9 |

| 40 | 2C | 1H | 2C | 1H | 1H | 1H | 2.7 |

| 12 | 1H | 1H | 2C | 2C | 1H | 1H | 2.7 |

| 10 | 1H | 1H | 2C | 1H | 2C | 1H | 2.7 |

| 9 | 1H | 1H | 2C | 1H | 1H | 2C | 2.7 |

| 25 | 1H | 2C | 2C | 1H | 1H | 2C | 2.5 |

| 0 | 1H | 1H | 1H | 1H | 1H | 1H | 2.5 |

| 26 | 1H | 2C | 2C | 1H | 2C | 1H | 2.5 |

| 28 | 1H | 2C | 2C | 2C | 1H | 1H | 2.5 |

The result above shows the sorted table of the latent sequences, given the observation
sequence. The actual latent sequence (the one that caused the observations) places itself
on the 35th position (we counted index from zero).

dfc = df.copy().reset_index()

for i in range(chain_length):

dfc = dfc[dfc[i] == latent_sequence[i]]

dfc

| index | 0 | 1 | 2 | 3 | 4 | 5 | score |
|:-------:|:----|:----|:----|:----|:----|:----|--------:|
| 18 | 1H | 2C | 1H | 1H | 2C | 1H | 1.9 |

Training the model

The time has come to show the training procedure. Formally, we are interested in
finding λ = (A, B, π) such that given a desired observation sequence O, our model λ
would give the best fit.
Expanding the class
Get started Open in app
Here, our starting point will be the HiddenMarkovModel_Uncover that we have defined
earlier. We will add new methods to train it.

Knowing our latent states Q and possible observation states O, we automatically know
the sizes of the matrices A and B, hence N and M. However, we need to determine a and
b and π.

For t = 0, 1, …, T-2 and i, j =0, 1, …, N -1, we define “di-gammas”:

γ(i, j) is the probability of transitioning for q at t to t + 1. Writing it in terms of α, β, A, B

we have:

Now, thinking in terms of implementation, we want to avoid looping over i, j and t at the
same time, as it’s gonna be deadly slow. Fortunately, we can vectorize the equation:

Having the equation for γ(i, j), we can calculate

To find λ = (A, B, π), we do

For i = 0, 1, …, N-1:
Get started Open in app

For i, j = 0, 1, …, N-1:

For j = 0, 1, …, N-1 and k = 0, 1, …, M-1:

class HiddenMarkovLayer(HiddenMarkovChain_Uncover):

def _digammas(self, observations: list) -> np.ndarray:

L, N = len(observations), len(self.states)

digammas = np.zeros((L - 1, N, N))

alphas = self._alphas(observations)

betas = self._betas(observations)

score = self.score(observations)

for t in range(L - 1):

P1 = (alphas[t, :].reshape(-1, 1) * self.T.values)

P2 = self.E[observations[t + 1]].T * betas[t +

1].reshape(1, -1)

digammas[t, :, :] = P1 * P2 / score

return digammas
Having the “layer” supplemented with the ._difammas method, we should be able to
Get started Open in app
perform all the necessary calculations. However, it makes sense to delegate the
"management" of the layer to another class. In fact, the model training can be
summarized as follows:

1. Initialize A, B and π.

2. Calculate γ(i, j).

3. Update the model’s A, B and π.

4. We repeat the 2. and 3. until the score p(O|λ) no longer increases.

class HiddenMarkovModel:

def init(self, hml: HiddenMarkovLayer):

self.layer = hml

self._score_init = 0

self.score_history = []

@classmethod

def initialize(cls, states: list, observables: list):

layer = HiddenMarkovLayer.initialize(states, observables)

return cls(layer)

def update(self, observations: list) -> float:

alpha = self.layer._alphas(observations)

beta = self.layer._betas(observations)

digamma = self.layer._digammas(observations)

score = alpha[-1].sum()

gamma = alpha * beta / score

L = len(alpha)

obs_idx = [self.layer.observables.index(x) \

for x in observations]

capture = np.zeros((L, len(self.layer.states),

len(self.layer.observables)))

for t in range(L):

capture[t, :, obs_idx[t]] = 1.0

pi = gamma[0]

T = digamma.sum(axis=0) / gamma[:-1].sum(axis=0).reshape(-1,
1)

E = (capture * gamma[:, :, np.newaxis]).sum(axis=0) /

gamma.sum(axis=0).reshape(-1, 1)

self.layer.pi = ProbabilityVector.from_numpy(pi,
self.layer.states)

self.layer.T = ProbabilityMatrix.from_numpy(T,
self.layer.states, self.layer.states)

Get started self.layer.E

Open in app = ProbabilityMatrix.from_numpy(E,
self.layer.states, self.layer.observables)

return score

def train(self, observations: list, epochs: int, tol=None):

self._score_init = 0

self.score_history = (epochs + 1) * [0]

early_stopping = isinstance(tol, (int, float))

for epoch in range(1, epochs + 1):

score = self.update(observations)

print("Training... epoch = {} out of {}, score =

{}.".format(epoch, epochs, score))

if early_stopping and abs(self._score_init - score) /

score < tol:

print("Early stopping.")

break

self._score_init = score

self.score_history[epoch] = score

Example

np.random.seed(42)

observations = ['3L', '2M', '1S', '3L', '3L', '3L']

states = ['1H', '2C']

observables = ['1S', '2M', '3L']

hml = HiddenMarkovLayer.initialize(states, observables)

hmm = HiddenMarkovModel(hml)

hmm.train(observations, 25)
Get started Open in app

Figure 3. Example of the score funciton during training.

Verification
Let’s look at the generated sequences. The “demanded” sequence is:

| | 0 | 1 | 2 | 3 | 4 | 5 |

|---:|:----|:----|:----|:----|:----|:----|

| 0 | 3L | 2M | 1S | 3L | 3L | 3L |

RUNS = 100000

T = 5

chains = RUNS * [0]

for i in range(len(chains)):

chain = hmm.layer.run(T)[0]

chains[i] = '-'.join(chain)

The table below summarizes simulated runs based on 100000 attempts (see above),
with the frequency of occurrence and number of matching observations.

The bottom line is that if we have truly trained the model, we should see a strong
tendency for it to generate us sequences that resemble the one we require. Let’s see if it
happens.

df = pd.DataFrame(pd.Series(chains).value_counts(), columns=
['counts']).reset_index().rename(columns={'index': 'chain'})

df = pd.merge(df, df['chain'].str.split('-', expand=True),

left_index=True, right_index=True)

s = []

for i in range(T + 1):

s.append(df.apply(lambda x: x[i] == observations[i], axis=1))

df['matched'] = pd.concat(s, axis=1).sum(axis=1)

df['counts'] = df['counts'] / RUNS * 100

dfstarted
Get = df.drop(columns=['chain'])

Open in app
df.head(30)

---

|---:|---------:|:----|:----|:----|:----|:----|:----|----------:|

| 0 | 8.907 | 3L | 3L | 3L | 3L | 3L | 3L | 4 |

| 1 | 4.422 | 3L | 2M | 3L | 3L | 3L | 3L | 5 |

| 2 | 4.286 | 1S | 3L | 3L | 3L | 3L | 3L | 3 |

| 3 | 4.284 | 3L | 3L | 3L | 3L | 3L | 2M | 3 |

| 4 | 4.278 | 3L | 3L | 3L | 2M | 3L | 3L | 3 |

| 5 | 4.227 | 3L | 3L | 1S | 3L | 3L | 3L | 5 |

| 6 | 4.179 | 3L | 3L | 3L | 3L | 1S | 3L | 3 |

| 7 | 2.179 | 3L | 2M | 3L | 2M | 3L | 3L | 4 |

| 8 | 2.173 | 3L | 2M | 3L | 3L | 1S | 3L | 4 |

| 9 | 2.165 | 1S | 3L | 1S | 3L | 3L | 3L | 4 |

| 10 | 2.147 | 3L | 2M | 3L | 3L | 3L | 2M | 4 |

| 11 | 2.136 | 3L | 3L | 3L | 2M | 3L | 2M | 2 |

| 12 | 2.121 | 3L | 2M | 1S | 3L | 3L | 3L | 6 |

| 13 | 2.111 | 1S | 3L | 3L | 2M | 3L | 3L | 2 |

| 14 | 2.1 | 1S | 2M | 3L | 3L | 3L | 3L | 4 |

| 15 | 2.075 | 3L | 3L | 3L | 2M | 1S | 3L | 2 |

| 16 | 2.05 | 1S | 3L | 3L | 3L | 3L | 2M | 2 |

| 17 | 2.04 | 3L | 3L | 1S | 3L | 3L | 2M | 4 |

| 18 | 2.038 | 3L | 3L | 1S | 2M | 3L | 3L | 4 |

| 19 | 2.022 | 3L | 3L | 1S | 3L | 1S | 3L | 4 |

| 20 | 2.008 | 1S | 3L | 3L | 3L | 1S | 3L | 2 |

| 21 | 1.955 | 3L | 3L | 3L | 3L | 1S | 2M | 2 |

| 22 | 1.079 | 1S | 2M | 3L | 2M | 3L | 3L | 3 |

| 23 | 1.077 | 1S | 2M | 3L | 3L | 3L | 2M | 3 |

| 24 | 1.075 | 3L | 2M | 1S | 2M | 3L | 3L | 5 |

| 25 | 1.064 | 1S | 2M | 1S | 3L | 3L | 3L | 5 |

| 26 | 1.052 | 1S | 2M | 3L | 3L | 1S | 3L | 3 |

| 27 | 1.048 | 3L | 2M | 3L | 2M | 1S | 3L | 3 |

| 28 | 1.032 | 1S | 3L | 1S | 2M | 3L | 3L | 3 |

| 29 | 1.024 | 1S | 3L | 1S | 3L | 1S | 3L | 3 |

And here are the sequences that we don’t want the model to create.

| | counts | 0 | 1 | 2 | 3 | 4 | 5 | matched |

|----:|---------:|:----|:----|:----|:----|:----|:----|----------:|

| 266 | 0.001 | 1S | 1S | 3L | 3L | 2M | 2M | 1 |

| 267 | 0.001 | 1S | 2M | 2M | 3L | 2M | 2M | 2 |

| 268 | 0.001 | 3L | 1S | 1S | 3L | 1S | 1S | 3 |

| 269 | 0.001 | 3L | 3L | 3L | 1S | 2M | 2M | 1 |

| 270 | 0.001 | 3L | 1S | 3L | 1S | 1S | 3L | 2 |

| 271 | 0.001 | 1S | 3L | 2M | 1S | 1S | 3L | 1 |

| 272 | 0.001 | 3L | 2M | 2M | 3L | 3L | 1S | 4 |

| 273 | 0.001 | 1S | 3L | 3L | 1S | 1S | 1S | 0 |

| 274 | 0.001 | 3L | 1S | 2M | 2M | 1S | 2M | 1 |

| 275 | 0.001 | 3L | 3L | 2M | 1S | 3L | 2M | 2 |
Get started Open in app
As we can see, there is a tendency for our model to generate sequences that resemble the
one we require, although the exact one (the one that matches 6/6) places itself already
at the 10th position! On the other hand, according to the table, the top 10 sequences are
still the ones that are somewhat similar to the one we request.

To ultimately verify the quality of our model, let’s plot the outcomes together with the
frequency of occurrence and compare it against a freshly initialized model, which is
supposed to give us completely random sequences — just to compare.

hml_rand = HiddenMarkovLayer.initialize(states, observables)

hmm_rand = HiddenMarkovModel(hml_rand)

RUNS = 100000

T = 5

chains_rand = RUNS * [0]

for i in range(len(chains_rand)):

chain_rand = hmm_rand.layer.run(T)[0]

chains_rand[i] = '-'.join(chain_rand)

df2 = pd.DataFrame(pd.Series(chains_rand).value_counts(), columns=

['counts']).reset_index().rename(columns={'index': 'chain'})

df2 = pd.merge(df2, df2['chain'].str.split('-', expand=True),

left_index=True, right_index=True)

s = []

for i in range(T + 1):

s.append(df2.apply(lambda x: x[i] == observations[i], axis=1))

df2['matched'] = pd.concat(s, axis=1).sum(axis=1)

df2['counts'] = df2['counts'] / RUNS * 100

df2 = df2.drop(columns=['chain'])

fig, ax = plt.subplots(1, 1, figsize=(14, 6))

ax.plot(df['matched'], 'g:')

ax.plot(df2['matched'], 'k:')

ax.set_xlabel('Ordered index')

ax.set_ylabel('Matching observations')

ax.set_title('Verification on a 6-observation chain.')

ax2 = ax.twinx()

ax2.plot(df['counts'], 'r', lw=3)

ax2.plot(df2['counts'], 'k', lw=3)

ax2.set_ylabel('Frequency of occurrence [%]')

ax.legend(['trained',
Get started Open in app 'initialized'])

ax2.legend(['trained', 'initialized'])

plt.grid()

plt.show()

Figure 4. Result after training of the model. The dotted lines represent the matched sequences. The lines
represent the frequency of occurrence for a particular sequence: trained model (red) and freshly initialized
(black). The initialized results in almost perfect uniform distribution of sequences, while the trained model
gives a strong preference towards the observable sequence.

It seems we have successfully implemented the training procedure. If we look at the

curves, the initialized-only model generates observation sequences with almost equal
probability. It’s completely random. However, the trained model gives sequences that
are highly similar to the one we desire with much higher frequency. Despite the genuine
sequence gets created in only 2% of total runs, the other similar sequences get generated
approximately as often.

Conclusion
In this article, we have presented a step-by-step implementation of the Hidden Markov
Model. We have created the code by adapting the first principles approach. More
specifically, we have shown how the probabilistic concepts that are expressed through
equations can be implemented as objects and methods. Finally, we demonstrated the
usage of the model with finding the score, uncovering of the latent variable chain and
Get started Open in app
applied the training procedure.

PS. I apologise for the poor rendering of the equations here. Basically, I needed to do it
all manually. However, please feel free to read this article on my home blog. There, I
took care of it ;)

There will be more…

I am planning to bring the articles to next level and offer short screencast video µ-
tutorials.

If you want to be updated concerning the videos and future articles, subscribe to my
newsletter. You can also let me know of your expectations by filling out the form. See
you soon!

Sign up for The Variable

By Towards Data Science

Every Thursday, the Variable delivers the very best of Towards Data Science: from hands-on tutorials
and cutting-edge research to original features you don't want to miss. Take a look.

Get this newsletter

Data Science Machine Learning Python Probability Algorithms

About Write Help Legal

Get the Medium app

Get started Open in app

Quadratic Formula PROOF
100% (1)
Quadratic Formula PROOF
1 page
Conference Latex Template ECCE
No ratings yet
Conference Latex Template ECCE
6 pages
SigmaXL Version 8 Workbook
No ratings yet
SigmaXL Version 8 Workbook
541 pages
Pennachi - Theory of Asset Pricing
100% (1)
Pennachi - Theory of Asset Pricing
570 pages
Roller Deflection
No ratings yet
Roller Deflection
18 pages
Pivot HH LL Imp
No ratings yet
Pivot HH LL Imp
3 pages
Origin of The South African Measurement System
No ratings yet
Origin of The South African Measurement System
3 pages
Polynomial Series
No ratings yet
Polynomial Series
3 pages
UNIT5 Comparison Tree
No ratings yet
UNIT5 Comparison Tree
52 pages
Proshake Tutorial
No ratings yet
Proshake Tutorial
10 pages
Load Length N LB MM in
No ratings yet
Load Length N LB MM in
3 pages
Class Test On Linear Motion.
No ratings yet
Class Test On Linear Motion.
7 pages
Volumen Finito
No ratings yet
Volumen Finito
11 pages
Intermediate AMC Questions
0% (1)
Intermediate AMC Questions
6 pages
Delivery Feet Data Using K Mean Clustering With Applied SPSS
No ratings yet
Delivery Feet Data Using K Mean Clustering With Applied SPSS
2 pages
Mathematics Grade 4
No ratings yet
Mathematics Grade 4
4 pages
A Comparative Review of 3D Container Loading Algorithms
No ratings yet
A Comparative Review of 3D Container Loading Algorithms
34 pages
CTFT DTFT DFT
No ratings yet
CTFT DTFT DFT
6 pages
Inter 1b Syllabus
No ratings yet
Inter 1b Syllabus
3 pages
AnalysisandDesignofaSmallTwo BarCreepTestSpecimen
No ratings yet
AnalysisandDesignofaSmallTwo BarCreepTestSpecimen
14 pages
Introduction To Simio
No ratings yet
Introduction To Simio
10 pages
Module 5.3 Lateral Loads On Building Frames (Portal and Cantilever Method)
No ratings yet
Module 5.3 Lateral Loads On Building Frames (Portal and Cantilever Method)
11 pages
Eighth Semester B.Tech Degree Examination, May 2019: C H1005 Pages: 2
No ratings yet
Eighth Semester B.Tech Degree Examination, May 2019: C H1005 Pages: 2
2 pages
Chapter 3 FM I
No ratings yet
Chapter 3 FM I
16 pages
Capgemini Frequent Questions+Previous Year
100% (1)
Capgemini Frequent Questions+Previous Year
57 pages
Alligation and Mixture - Lecture
No ratings yet
Alligation and Mixture - Lecture
33 pages
RCC54 Circular Column Charting
No ratings yet
RCC54 Circular Column Charting
13 pages
Sawali Synthesis Paper
No ratings yet
Sawali Synthesis Paper
9 pages
Roark's Formulas For Excel - Superposition Wizard: Universal Technical Systems Inc
No ratings yet
Roark's Formulas For Excel - Superposition Wizard: Universal Technical Systems Inc
6 pages
Electromagnetic Field of A Moving Point Charge
No ratings yet
Electromagnetic Field of A Moving Point Charge
7 pages
The Subtle Art of Not Giving a F*ck: A Counterintuitive Approach to Living a Good Life
From Everand
The Subtle Art of Not Giving a F*ck: A Counterintuitive Approach to Living a Good Life
Mark Manson
4/5 (6458)
A Man Called Ove: A Novel
From Everand
A Man Called Ove: A Novel
Fredrik Backman
4.5/5 (5181)
Principles: Life and Work
From Everand
Principles: Life and Work
Ray Dalio
4/5 (643)
Never Split the Difference: Negotiating As If Your Life Depended On It
From Everand
Never Split the Difference: Negotiating As If Your Life Depended On It
Chris Voss
4.5/5 (1005)
The Little Book of Hygge: Danish Secrets to Happy Living
From Everand
The Little Book of Hygge: Danish Secrets to Happy Living
Meik Wiking
3.5/5 (464)
The Glass Castle: A Memoir
From Everand
The Glass Castle: A Memoir
Jeannette Walls
4.5/5 (1856)
The Sympathizer: A Novel (Pulitzer Prize for Fiction)
From Everand
The Sympathizer: A Novel (Pulitzer Prize for Fiction)
Viet Thanh Nguyen
4.5/5 (141)
Grit: The Power of Passion and Perseverance
From Everand
Grit: The Power of Passion and Perseverance
Angela Duckworth
4/5 (650)
The Gifts of Imperfection: Let Go of Who You Think You're Supposed to Be and Embrace Who You Are
From Everand
The Gifts of Imperfection: Let Go of Who You Think You're Supposed to Be and Embrace Who You Are
Brené Brown
4/5 (1175)
Elon Musk: Tesla, SpaceX, and the Quest for a Fantastic Future
From Everand
Elon Musk: Tesla, SpaceX, and the Quest for a Fantastic Future
Ashlee Vance
4.5/5 (582)
The Hard Thing About Hard Things: Building a Business When There Are No Easy Answers
From Everand
The Hard Thing About Hard Things: Building a Business When There Are No Easy Answers
Ben Horowitz
4.5/5 (361)
The Woman in Cabin 10
From Everand
The Woman in Cabin 10
Ruth Ware
3.5/5 (2814)
The Emperor of All Maladies: A Biography of Cancer
From Everand
The Emperor of All Maladies: A Biography of Cancer
Siddhartha Mukherjee
4.5/5 (298)
The Perks of Being a Wallflower
From Everand
The Perks of Being a Wallflower
Stephen Chbosky
4.5/5 (4103)
Steve Jobs
From Everand
Steve Jobs
Walter Isaacson
4.5/5 (1139)
The Outsider: A Novel
From Everand
The Outsider: A Novel
Stephen King
4/5 (2885)
Angela's Ashes: A Memoir
From Everand
Angela's Ashes: A Memoir
Frank McCourt
4.5/5 (943)
Yes Please
From Everand
Yes Please
Amy Poehler
4/5 (2016)
Shoe Dog: A Memoir by the Creator of Nike
From Everand
Shoe Dog: A Memoir by the Creator of Nike
Phil Knight
4.5/5 (629)
Rise of ISIS: A Threat We Can't Ignore
From Everand
Rise of ISIS: A Threat We Can't Ignore
Jay Sekulow
3.5/5 (144)
The Light Between Oceans: A Novel
From Everand
The Light Between Oceans: A Novel
M.L. Stedman
4.5/5 (815)
The World Is Flat 3.0: A Brief History of the Twenty-first Century
From Everand
The World Is Flat 3.0: A Brief History of the Twenty-first Century
Thomas L. Friedman
3.5/5 (2289)
Fear: Trump in the White House
From Everand
Fear: Trump in the White House
Bob Woodward
3.5/5 (836)
Hidden Figures: The American Dream and the Untold Story of the Black Women Mathematicians Who Helped Win the Space Race
From Everand
Hidden Figures: The American Dream and the Untold Story of the Black Women Mathematicians Who Helped Win the Space Race
Margot Lee Shetterly
4/5 (1022)
The Constant Gardener: A Novel
From Everand
The Constant Gardener: A Novel
John le Carré
4/5 (278)
Sing, Unburied, Sing: A Novel
From Everand
Sing, Unburied, Sing: A Novel
Jesmyn Ward
4/5 (1267)
Her Body and Other Parties: Stories
From Everand
Her Body and Other Parties: Stories
Carmen Maria Machado
4/5 (903)
Bad Feminist: Essays
From Everand
Bad Feminist: Essays
Roxane Gay
4/5 (1090)
Team of Rivals: The Political Genius of Abraham Lincoln
From Everand
Team of Rivals: The Political Genius of Abraham Lincoln
Doris Kearns Goodwin
4.5/5 (244)
Wolf Hall: A Novel
From Everand
Wolf Hall: A Novel
Hilary Mantel
4/5 (4135)
The Unwinding: An Inner History of the New America
From Everand
The Unwinding: An Inner History of the New America
George Packer
4/5 (45)
A Tree Grows in Brooklyn
From Everand
A Tree Grows in Brooklyn
Betty Smith
4.5/5 (2033)
John Adams
From Everand
John Adams
David McCullough
4.5/5 (2546)
The Yellow House: A Memoir (2019 National Book Award Winner)
From Everand
The Yellow House: A Memoir (2019 National Book Award Winner)
Sarah M. Broom
4/5 (100)
Little Women
From Everand
Little Women
Louisa May Alcott
4.5/5 (2369)
A Heartbreaking Work Of Staggering Genius: A Memoir Based on a True Story
From Everand
A Heartbreaking Work Of Staggering Genius: A Memoir Based on a True Story
Dave Eggers
3.5/5 (233)
The Art of Racing in the Rain: A Novel
From Everand
The Art of Racing in the Rain: A Novel
Garth Stein
4/5 (4372)
On Fire: The (Burning) Case for a Green New Deal
From Everand
On Fire: The (Burning) Case for a Green New Deal
Naomi Klein
4/5 (78)
Manhattan Beach: A Novel
From Everand
Manhattan Beach: A Novel
Jennifer Egan
3.5/5 (919)
Brooklyn: A Novel
From Everand
Brooklyn: A Novel
Colm Toibin
3.5/5 (2141)
Devil in the Grove: Thurgood Marshall, the Groveland Boys, and the Dawn of a New America
From Everand
Devil in the Grove: Thurgood Marshall, the Groveland Boys, and the Dawn of a New America
Gilbert King
4.5/5 (280)

Hidden Markov Model - Implemented From Scratch - by Oleg Żero - Towards Data Science

Uploaded by

Hidden Markov Model - Implemented From Scratch - by Oleg Żero - Towards Data Science

Uploaded by

Get started Open in app

Follow 596K Followers

Hidden Markov Model — Implemented from

A short pause in the summer heat. Portugal, 2019.

T - length of the observation sequence.

N - number of latent (hidden) states.

Q = {q₀, q₁, …} - hidden states.

V = {0, 1, …, M — 1} - set of possible observations.

A - state transition matrix.

B - emission probability matrix.

π- initial state probability distribution.

X = (x₀, x₁, …), x_t ∈ Q - hidden state sequence.

A = {a_ij} — begin an transition matrix.

B = {b_j(k)} — being an emission matrix.

The probabilities associated with transition and observation (emission) are:

The model is therefore defined as a collection:

def __init__(self, probabilities: dict):

assert len(states) == len(probs),

"The probabilities must match the states."

assert len(states) == len(set(states)),

"The states must be unique."

assert abs(sum(probs) - 1.0) < 1e-12,

"Probabilities must sum up to 1."

assert len(list(filter(lambda x: 0 <= x <= 1, probs))) ==

"Probabilities must be numbers from [0, 1] interval."

probabilities[x], self.states))).reshape(1, -1)

def initialize(cls, states: list):

rand = np.random.rand(size) / (size**2) + 1 / size

return cls(dict(zip(states, rand)))

Get started Open in app

return cls(dict(zip(states, list(array))))

return {k:v for k, v in zip(self.states,

return pd.DataFrame(self.values, columns=self.states, index=

return "P({}) = {}.".format(self.states, self.values)

def __eq__(self, other):

if not isinstance(other, ProbabilityVector):

if (self.states == other.states) and (self.values ==

def __getitem__(self, state: str) -> float:

if state not in self.states:

raise ValueError("Requesting unknown probability state

return float(self.values[0, index])

def __mul__(self, other) -> np.ndarray:

return self.values * other.values

elif isinstance(other, (int, float)):

return self.values * other

def __rmul__(self, other) -> np.ndarray:

def __matmul__(self, other) -> np.ndarray:

return self.values @ other.values

def __truediv__(self, number) -> np.ndarray:

if not isinstance(number, (int, float)):

return x / number if number != 0 else x / (number + 1e-12)

Get started index

The probabilities must sum up to 1 (up to a certain tolerance).

All probabilities must be 0 ≤ p ≤ 1.

Having ensured that, we also provide two alternative ways to instantiate

1. We instantiate the objects randomly — it will be useful when training.

1. comparison ( __eq__ ) - to know if any two PV's are equal,

2. element-wise multiplication of two PV’s or multiplication with a scalar ( __mul__ and

5. argmax to find for which state the probability is the highest.

6. __getitem__ to enable selecting value by the key.

Internally, the values are stored as a numpy array of size (1 × N).

a1 = ProbabilityVector({'rain': 0.7, 'sun': 0.3})

a2 = ProbabilityVector({'sun': 0.1, 'rain': 0.9})

print("Element-wise multiplication:", a1 * a2)

>>> rain sun

probability 0.7 0.3

probability 0.9 0.1

>>> Comparison: False

>>> Element-wise multiplication: [[0.63 0.03]]

>>> Argmax: rain

>>> Getitem: 0.7

def __init__(self, prob_vec_dict: dict):

assert len(prob_vec_dict) > 1, \

"The numebr of input probability vector must be greater

assert len(set([str(x.states) for x in

"All internal states of all the vectors must be

def init(self, probabilities: dict):

def eq(self, other):

def getitem(self, state: str) -> float:

def mul(self, other) -> np.ndarray:

def rmul(self, other) -> np.ndarray:

def matmul(self, other) -> np.ndarray:

def truediv(self, number) -> np.ndarray:

1. comparison ( eq ) - to know if any two PV's are equal,

2. element-wise multiplication of two PV’s or multiplication with a scalar ( mul and

6. getitem to enable selecting value by the key.

def init(self, prob_vec_dict: dict):

def getitem(self, observable: str) -> np.ndarray:

def init(self, T, E, pi):

return list(product((self.states,) chain_length))