
Mixture and Hidden Markov Models with R



This series of inexpensive and focused books on R is aimed at practitioners.
Books can discuss the use of R in a particular subject area (e.g., epidemiology,
econometrics, psychometrics) or as it relates to statistical topics (e.g., missing data,
longitudinal data). In most cases, books combine LaTeX and R so that the code for
figures and tables can be put on a website. Authors should assume a background as
supplied by Dalgaard’s Introductory Statistics with R or other introductory books
so that each book does not repeat basic material.
How to Submit Your Proposal
Book proposals and manuscripts should be submitted by email to one of the publishing
editors in your region; for the list of statistics editors by location, please see
https://www.springer.com/gp/statistics/contact-us. All submissions should include a
completed Book Proposal Form.
For general and technical questions regarding the series and the submission
process, please contact Laura Briskman ([email protected]) or Veronika
Rosteck ([email protected]).
Ingmar Visser • Maarten Speekenbrink

Mixture and Hidden Markov Models with R

Ingmar Visser
University of Amsterdam
Amsterdam, The Netherlands

Maarten Speekenbrink
University College London
London, UK

Use R!

ISSN 2197-5736  ISSN 2197-5744 (electronic)
ISBN 978-3-031-01438-3  ISBN 978-3-031-01440-6 (eBook)
https://doi.org/10.1007/978-3-031-01440-6

© Springer Science+Business Media, LLC, part of Springer Nature 2022


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To Lieke and Pien
I.V.

To Gabriel and Hunter


M.S.
Preface

This book aims to provide a self-contained practical introduction to mixture models
and hidden Markov models. The reason for introducing both in one book is that
there are very close links between these models. This allows us to introduce
important concepts, such as maximum likelihood estimation and the Expectation-
Maximization algorithm, in the relatively simpler context of mixture models.
Approaching hidden Markov models from a thorough understanding of mixture
models involves, we hope, a relatively small conceptual leap. We aimed to provide
a reasonable balance between statistical theory and practice. The objective is to
provide enough mathematical details—but no more!—to allow our target audience
to understand key results that are necessary to apply these models.
Our target audience is those with a more applied background, in particular
researchers and graduate and advanced undergraduate students in the social and
behavioral sciences: researchers, or future researchers, who see the potential for
applying these models to explain heterogeneity in their data, but who currently
lack the tools to fulfill this potential. Those looking for a more purely
mathematical treatment of mixture and hidden Markov models we gladly refer to
the books by Cappé et al. (2005) and Frühwirth-Schnatter (2006).
To familiarize readers with the possibilities of mixture and hidden Markov
models, a large part of this book consists of practical examples of applying
these models, many of which are taken from our own research in developmental and
experimental psychology. Much of our work on these models was driven by the
research questions that arose during the study of experimental or developmental
(time series) data. Over the years, we have also accumulated examples from other
fields, such as climate change and economics. In the examples, we provide enough
background knowledge of these different domains to understand the rationale of
the analyses. At the same time, we abstract away from many details and
focus on the generalizability of the presented models to research questions in other
domains.
The example analyses in this book rely on the R programming language
and software environment (R Core Team 2020) and in particular the depmixS4
package (Visser and Speekenbrink 2010). Nowadays, the choice of R hardly needs


justification, having become the lingua franca of statistics and data science. R is
open source, freely available, and has an active user community such that anyone
interested can add and contribute packages implementing new analytical methods.
As all required tools are freely available, readers should be able to replicate the
example analyses on their own computers, as well as adapt the analyses for their
own purposes. To aid in this process, all the code for running the examples in the
book is provided online at https://depmix.github.io/hmmr/. Moreover, the datasets
and special purpose functions written for this book are available as an R package
called hmmr. Section 1.2 provides pointers for getting started with R and provides
all the basics that are needed to then understand and apply subsequent analyses and
examples.
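
To give a flavor of what such an analysis looks like in practice, the following is a minimal sketch of fitting a two-component Gaussian mixture with depmixS4; the simulated data and component means are illustrative assumptions, not an example from the book:

R> library(depmixS4)
R> set.seed(1)
R> # simulate 200 observations from two Gaussian components with means 0 and 3
R> y <- c(rnorm(100, mean = 0), rnorm(100, mean = 3))
R> # specify a two-component Gaussian mixture and fit it by EM
R> mod <- mix(y ~ 1, data = data.frame(y = y), nstates = 2)
R> fmod <- fit(mod)
R> summary(fmod)  # estimated prior probabilities, means, and standard deviations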

Chapter Outlines and Reading Guide

Chapter 1 provides a brief introduction to R and describes the basic features of
the datasets used throughout this book to illustrate the use of mixture and hidden
Markov models. Chapters 2 and 4 are mostly theoretical in nature, providing a
statistical treatment of mixture and latent class models (Chap. 2), and the extension
of those models into hidden Markov models (Chap. 4). Chapter 3 provides a number
of worked examples of applications of mixture and latent class models to analyze
both univariate and multivariate data. Similarly, Chaps. 5 and 6 provide detailed
example analyses which apply hidden Markov models to univariate (Chap. 5) and
multivariate (Chap. 6) time series data. Finally, Chap. 7 discusses some extensions
of the basic hidden Markov model, as well as alternative estimation techniques,
including a brief introduction to Bayesian estimation of these models.
In Chaps. 2 and 4, the first two sections are devoted to conceptually describing
and defining mixture and hidden Markov models, respectively. These sections
lay the foundations for understanding how these models work and how they can
be usefully applied. These sections should be read by everyone. Rushed readers
wanting to get started right away with applying the models may skip the remainder
of those chapters, where we delve deeper into parameter estimation and inference.
The examples in Chaps. 3, 5, and 6 are standalone sections that treat data with
particular characteristics and describe the models that can be used to answer the
research questions of interest. Where warranted, these application sections also refer
back to the relevant sections in Chaps. 2 and 4, which offer more technical detail on
topics that arise. Readers who skipped most of Chaps. 2 and 4 can then read the
relevant parts of these chapters when the need arises.

Acknowledgments

Writing and producing a book is rarely done in isolation, and this one is no
exception. Many people have contributed by asking tough research questions,
providing data, and giving LaTeX and Sweave() advice. Below is a list of people
who we know have contributed in important ways. This list is likely incomplete,
so please let us know if you ought to be on it so we can include you in future
editions. We would like to thank Achim Zeileis for getting us started with the
combination of LaTeX, Sweave(), and make files used to produce this book. We would
like to thank Chen Haibo for providing the S&P-500 data example, Brenda Jansen
for sharing her balance scale data, Gilles Dutilh and Han van der Maas for sharing
their speed-accuracy data, the Water Corporation of Western Australia for providing
the Perth dams data on their website, Bianca van Bers for sharing the dimensional
change card sorting task data, Han van der Maas for sharing the conservation of
liquid data, Maartje Raijmakers for sharing the discrimination data, and Emmanouil
Konstantinidis for sharing the Iowa gambling task data. Finally, we would like to
thank John Kimmel, Marc Strauss, and Laura Briskman for inviting us to write this
book and for organizing things at Springer.
This book has taken us a while to complete. On a more personal level,
many people have been by our sides during that period. Maarten would like to thank
Gabriel and Hunter for being wondering and wonderful human beings, and Ria and
Jan for their love and support. Ingmar would like to thank Jaro for her love and
support.

Amsterdam, The Netherlands                               Ingmar Visser
London, UK                                               Maarten Speekenbrink
January 2022
Settings, Appearance, and Notation

In producing the examples in this book, R is mainly run at its default settings. A
few modifications were made to render the output more easily readable; these were
invoked by the following chunk of R-code:

R> options(prompt = "R> ", continue = "+ ", width = 60,
+ digits = 4, show.signif.stars = FALSE,
+ useFancyQuotes = FALSE)

This replaces the standard R prompt > by R>. For compactness, digits = 4
reduces the number of digits shown when printing numbers from the default of 7.
Note that this does not reduce the precision with which these numbers are internally
processed and stored.
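
For example (a minimal illustration of this point, not output reproduced from the book):

R> x <- 1/3
R> x
[1] 0.3333
R> sprintf("%.10f", x)
[1] "0.3333333333"

The printed value is rounded to four significant digits, while the stored value retains full double precision.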
We use set.seed(x) whenever we generate data or fit models such that the
exact values of data and fitted parameters may be replicated. When fitting models,
this is necessary, because random starting values are generated (see Sect. 2.3.6 for
more details).
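
For example (a minimal illustration, not code from the book), resetting the seed before repeating a computation reproduces the random draws exactly:

R> set.seed(10)
R> x1 <- rnorm(2)
R> set.seed(10)
R> x2 <- rnorm(2)
R> identical(x1, x2)
[1] TRUE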
We use a typewriter font for all code; additionally, function names are
followed by parentheses, as in plot(), and class names (a concept that is explained
in Chap. 1) are displayed as in “depmix.” Furthermore, boldface is used for package
names, as in hmmr.
The following symbols are used throughout the book:
A Transition matrix
π Initial state probability vector
S Stochastic state variable
s Realization of the state variable
Y Stochastic (possibly multivariate) observation variable
y Realization of the observation variable
z Covariate, possibly multivariate
θ Total model parameter vector; θ = (θ_pr, θ_tr, θ_obs)
θ_pr Subvector of the parameter vector with parameters of the prior model
θ_tr Subvector of the parameter vector with parameters of the transition model


θ_obs Subvector of the parameter vector with parameters of the observation model
T Total number of time points
N Number of states of a model
f Probability density function
P Probability distribution
H Hessian matrix
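
To show how these symbols combine (a standard formulation consistent with the definitions above, not an equation reproduced from this front matter), the likelihood of a hidden Markov model for a single time series y_1, ..., y_T sums over all possible state sequences:

\[
f(y_{1:T} \mid \theta) = \sum_{s_{1:T}} \pi_{s_1}\, f(y_1 \mid s_1, \theta_{\mathrm{obs}}) \prod_{t=2}^{T} A_{s_{t-1}, s_t}\, f(y_t \mid s_t, \theta_{\mathrm{obs}})
\]

A mixture model arises as the special case in which the states at different time points are independent, so that the transition matrix A plays no role.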
Contents

1 Introduction and Preliminaries. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1


1.1 What Are Mixture and Hidden Markov Models?. . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Getting Started with R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.1 Help! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.2 Loading Packages and Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.3 Object Types and Manipulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.4 Visualizing Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.2.5 Summarizing Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.2.6 Linear and Generalized Linear Models. . . . . . . . . . . . . . . . . . . . . . . 17
1.2.7 Multinomial Logistic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.2.8 Time-Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.3 Datasets Used in the Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.3.1 Speed-Accuracy Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.3.2 S&P 500 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.3.3 Perth Dams Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.3.4 Discrimination Learning Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.3.5 Balance Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
1.3.6 Repeated Measures on the Balance Scale Task . . . . . . . . . . . . . . 33
1.3.7 Dimensional Change Card Sorting Task Data . . . . . . . . . . . . . . . 34
1.3.8 Weather Prediction Task Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
1.3.9 Conservation of Liquid Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
1.3.10 Iowa Gambling Task Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

2 Mixture and Latent Class Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45


2.1 Introduction and Motivating Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.2 Definitions and Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.2.1 Mixture Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.2.2 Example: Generating Data from a Mixture Distribution . . . . 48
2.2.3 Parameters of the Mixture Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49


2.2.4 Mixture Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49


2.2.5 Posterior Probabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.3 Parameter Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.3.1 Maximum Likelihood Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.3.2 Numerical Optimization of the Likelihood. . . . . . . . . . . . . . . . . . . 54
2.3.3 Expectation Maximization (EM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.3.4 Optimizing Parameters Subject to Constraints . . . . . . . . . . . . . . . 66
2.3.5 EM or Numerical Optimization? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
2.3.6 Starting Values for Parameters in Mixture Models . . . . . . . . . . 70
2.4 Parameter Inference: Likelihood Ratio Tests . . . . . . . . . . . . . . . . . . . . . . . . . . 72
2.4.1 Example: Equality Constraint on Standard Deviations . . . . . . 73
2.5 Parameter Inference: Standard Errors and Confidence Intervals. . . . . . 74
2.5.1 Finite Difference Approximation of the Hessian . . . . . . . . . . . . 76
2.5.2 Parametric Bootstrap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
2.5.3 Correcting the Hessian for Linear Constraints . . . . . . . . . . . . . . . 79
2.6 Model Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
2.6.1 Likelihood-Ratio Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
2.6.2 Information Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
2.6.3 Example: Model Selection for the Speed1 RT Data . . . . . . . . . 89
2.7 Covariates on the Prior Probabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
2.8 Identifiability of Mixture Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
2.9 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
3 Mixture and Latent Class Models: Applications . . . . . . . . . . . . . . . . . . . . . . . . . 95
3.1 Gaussian Mixture for the S&P500 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
3.2 Gaussian Mixture Model for Conservation Data . . . . . . . . . . . . . . . . . . . . . . 99
3.3 Bivariate Gaussian Mixture Model for Conservation Data . . . . . . . . . . . 100
3.4 Latent Class Model for Balance Scale Data . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
3.4.1 Model Selection and Checking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
3.4.2 Testing Item Homogeneity Using Parameter Constraints . . . 110
3.5 Binomial Mixture Model for Balance Scale Data . . . . . . . . . . . . . . . . . . . . . 112
3.5.1 Binomial Logistic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
3.5.2 Mixture Models. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
3.5.3 Model Selection and Model Checking . . . . . . . . . . . . . . . . . . . . . . . . . 116
3.6 Model Selection with the Bootstrap Likelihood Ratio . . . . . . . . . . . . . . . . 119
3.7 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
4 Hidden Markov Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
4.1 Preliminaries: Markov Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
4.1.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
4.1.2 Properties of Markov Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
4.2 Introducing the Hidden Markov Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
4.2.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
4.2.2 Relation Between Hidden Markov and Mixture Model . . . . . 136
4.2.3 Example: Bernoulli Hidden Markov Model . . . . . . . . . . . . . . . . . 137
4.2.4 Likelihood and Inference Problems . . . . . . . . . . . . . . . . . . . . . . . . . . 139

4.3 Filtering, Likelihood, Smoothing and Prediction . . . . . . . . . . . . . . . . . . . . . . 140


4.3.1 Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
4.3.2 Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
4.3.3 Smoothing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
4.3.4 Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
4.3.5 The Likelihood Revisited . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
4.3.6 Multiple Timeseries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
4.3.7 Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
4.4 Parameter Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
4.4.1 Numerical Optimization of the Likelihood. . . . . . . . . . . . . . . . . . . 152
4.4.2 Expectation Maximization (EM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
4.5 Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
4.5.1 Local Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
4.5.2 Global Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
4.6 Parameter Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
4.6.1 Standard Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
4.7 Covariates on Initial and Transition Probabilities . . . . . . . . . . . . . . . . . . . . . 163
4.8 Missing Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
4.8.1 Missing Data in Hidden Markov Models . . . . . . . . . . . . . . . . . . . . 166
4.8.2 Missing at Random . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
4.8.3 State-Dependent Missingness. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
5 Univariate Hidden Markov Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
5.1 Gaussian Hidden Markov Model for Financial Time Series . . . . . . . . . . 173
5.2 Bernoulli HMM for the DCCS Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
5.3 Accounting for Autocorrelation Between Response Times . . . . . . . . . . . 182
5.3.1 Response Times . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
5.3.2 Models for Response Times . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
5.3.3 Model Assessment and Selection of RT Models . . . . . . . . . . . . . 186
5.4 Change Point HMM for Climate Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
5.5 Generalized Linear Hidden Markov Models for Multiple
Cue Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
6 Multivariate Hidden Markov Models. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
6.1 Latent Transition Model for Balance Scale Data . . . . . . . . . . . . . . . . . . . . . . 201
6.1.1 Learning and Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
6.2 Switching Between Speed and Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
6.2.1 Modeling Hysteresis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
6.2.2 Testing Conditional Independence and Further
Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
6.3 Dependency Between Binomial and Multinomial
Responses: The IGT Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223

7 Extensions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
7.1 Higher-Order Markov Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
7.1.1 Reformulating a Higher-Order HMM as a
First-Order HMM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
7.1.2 Example: A Two-State Second-Order HMM for
Discrimination Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
7.2 Models with a Distributed State Representation . . . . . . . . . . . . . . . . . . . . . . . 237
7.3 Dealing with Practical Issues in Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
7.3.1 Unbounded Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
7.4 The Classification Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
7.4.1 Mixture Models. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
7.4.2 Hidden Markov Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
7.5 Bayesian Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
7.5.1 Sampling States and Model Parameters . . . . . . . . . . . . . . . . . . . . . . 249
7.5.2 Sampling Model Parameters by Marginalizing
Over Hidden States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
Epilogue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
