Learning Probabilistic Graphical Models in R
Ebook, 425 pages, 2 hours

About this ebook

About This Book
  • Predict and use a probabilistic graphical model (PGM) as an expert system
  • Comprehend how your computer can learn Bayesian modeling to solve real-world problems
  • Know how to prepare data and feed the models by using the appropriate algorithms from the appropriate R package
Who This Book Is For

This book is for anyone who has to deal with lots of data and draw conclusions from it, especially when the data is noisy or uncertain. Data scientists, machine learning enthusiasts, engineers, and those who are curious about the latest advances in machine learning will find PGM interesting.

Language: English
Release date: Apr 29, 2016
ISBN: 9781784397418

    Book preview

    Learning Probabilistic Graphical Models in R - David Bellot

    Table of Contents

    Learning Probabilistic Graphical Models in R

    Credits

    About the Author

    About the Reviewers

    www.PacktPub.com

    eBooks, discount offers, and more

    Why subscribe?

    Preface

    What this book covers

    What you need for this book

    Who this book is for

    Conventions

    Reader feedback

    Customer support

    Downloading the example code

    Errata

    Piracy

    Questions

    1. Probabilistic Reasoning

    Machine learning

    Representing uncertainty with probabilities

    Beliefs and uncertainty as probabilities

    Conditional probability

    Probability calculus and random variables

    Sample space, events, and probability

    Random variables and probability calculus

    Joint probability distributions

    Bayes' rule

    Interpreting the Bayes' formula

    A first example of Bayes' rule

    A first example of Bayes' rule in R

    Probabilistic graphical models

    Probabilistic models

    Graphs and conditional independence

    Factorizing a distribution

    Directed models

    Undirected models

    Examples and applications

    Summary

    2. Exact Inference

    Building graphical models

    Types of random variable

    Building graphs

    Probabilistic expert system

    Basic structures in probabilistic graphical models

    Variable elimination

    Sum-product and belief updates

    The junction tree algorithm

    Examples of probabilistic graphical models

    The sprinkler example

    The medical expert system

    Models with more than two layers

    Tree structure

    Summary

    3. Learning Parameters

    Introduction

    Learning by inference

    Maximum likelihood

    How are empirical and model distribution related?

    The ML algorithm and its implementation in R

    Application

    Learning with hidden variables – the EM algorithm

    Latent variables

    Principles of the EM algorithm

    Derivation of the EM algorithm

    Applying EM to graphical models

    Summary

    4. Bayesian Modeling – Basic Models

    The Naive Bayes model

    Representation

    Learning the Naive Bayes model

    Bayesian Naive Bayes

    Beta-Binomial

    The prior distribution

    The posterior distribution with the conjugacy property

    Which values should we choose for the Beta parameters?

    The Gaussian mixture model

    Definition

    Summary

    5. Approximate Inference

    Sampling from a distribution

    Basic sampling algorithms

    Standard distributions

    Rejection sampling

    An implementation in R

    Importance sampling

    An implementation in R

    Markov Chain Monte-Carlo

    General idea of the method

    The Metropolis-Hastings algorithm

    MCMC for probabilistic graphical models in R

    Installing Stan and RStan

    A simple example in RStan

    Summary

    6. Bayesian Modeling – Linear Models

    Linear regression

    Estimating the parameters

    Bayesian linear models

    Over-fitting a model

    Graphical model of a linear model

    Posterior distribution

    Implementation in R

    A stable implementation

    More packages in R

    Summary

    7. Probabilistic Mixture Models

    Mixture models

    EM for mixture models

    Mixture of Bernoulli

    Mixture of experts

    Latent Dirichlet Allocation

    The LDA model

    Variational inference

    Examples

    Summary

    A. Appendix

    References

    Books on the Bayesian theory

    Books on machine learning

    Papers

    Index

    Learning Probabilistic Graphical Models in R

    Copyright © 2016 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    First published: April 2016

    Production reference: 1270416

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham B3 2PB, UK.

    ISBN 978-1-78439-205-5

    www.packtpub.com

    Credits

    Author

    David Bellot

    Reviewers

    Mzabalazo Z. Ngwenya

    Prabhanjan Tattar

    Acquisition Editor

    Divya Poojari

    Content Development Editor

    Trusha Shriyan

    Technical Editor

    Vivek Arora

    Copy Editor

    Stephen Copestake

    Project Coordinator

    Kinjal Bari

    Proofreader

    Safis Editing

    Indexer

    Mariammal Chettiyar

    Graphics

    Abhinash Sahu

    Production Coordinator

    Nilesh Mohite

    Cover Work

    Nilesh Mohite

    About the Author

    David Bellot is a PhD graduate in computer science from INRIA, France, with a focus on Bayesian machine learning. He was a postdoctoral fellow at the University of California, Berkeley, and worked for companies such as Intel, Orange, and Barclays Bank. He currently works in the financial industry, where he develops financial market prediction algorithms using machine learning. He is also a contributor to open source projects such as the Boost C++ library.

    About the Reviewers

    Mzabalazo Z. Ngwenya holds a postgraduate degree in mathematical statistics from the University of Cape Town. He has worked extensively in the field of statistical consulting and has considerable experience working with R. His interests center primarily on statistical computing. He has previously reviewed the following Packt Publishing titles: Learning RStudio for R Statistical Computing, Mark P.J. van der Loo and Edwin de Jonge; R Statistical Application Development by Example Beginner's Guide, Prabhanjan Narayanachar Tattar; Machine Learning with R, Brett Lantz; R Graph Essentials, David Alexander Lillis; R Object-oriented Programming, Kelly Black; Mastering Scientific Computing with R, Paul Gerrard and Radia Johnson; and Mastering Data Analysis with R, Gergely Daróczi.

    Prabhanjan Tattar is currently working as a senior data scientist at Fractal Analytics, Inc. He has 8 years of experience as a statistical analyst. His main research interests are survival analysis and statistical inference. He has published several research papers in peer-reviewed journals and authored two books on R: R Statistical Application Development by Example, Packt Publishing; and A Course in Statistics with R, Wiley. He also maintains the R packages gpk, RSADBE, and ACSWR.

    www.PacktPub.com

    eBooks, discount offers, and more

    Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us for more details.

    At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

    https://www2.packtpub.com/books/subscription/packtlib

    Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

    Why subscribe?

    Fully searchable across every book published by Packt

    Copy and paste, print, and bookmark content

    On demand and accessible via a web browser

    Preface

    Probabilistic graphical models are among the most advanced techniques in machine learning for representing data and models in the real world with probabilities. In many instances, they use the Bayesian paradigm to describe algorithms that can draw conclusions from noisy and uncertain real-world data.

    The book covers topics such as inference (automated reasoning) and learning (automatically building models from raw data). It explains how all the algorithms work step by step and presents readily usable solutions in R with many examples. After covering the basic principles of probabilities and the Bayes formula, it presents probabilistic graphical models (PGMs) and several types of inference and learning algorithms. The reader will go from the design to the automatic fitting of the model.

    Then, the book focuses on useful models that have proven track records in solving many data science problems, such as Bayesian classifiers, mixture models, and Bayesian linear regression, as well as simpler models that are used as basic components to build more complex models.

    What this book covers

    Chapter 1, Probabilistic Reasoning, covers topics from the basic concepts of probabilities to PGMs as a generic framework to do tractable, efficient, and easy modeling with probabilistic models, through the presentation of the Bayes formula.

    Chapter 2, Exact Inference, shows you how to build PGMs by combining simple graphs and perform queries on the model using an exact inference algorithm called the junction tree algorithm.

    Chapter 3, Learning Parameters, covers fitting and learning PGMs from data sets with the maximum likelihood approach.

    Chapter 4, Bayesian Modeling – Basic Models, covers simple and powerful Bayesian models that can be used as building blocks for more advanced models and shows you how to fit and query them with adapted algorithms.

    Chapter 5, Approximate Inference, covers the second way to perform inference in PGMs, using sampling algorithms, and presents the main sampling algorithms such as MCMC.

    Chapter 6, Bayesian Modeling – Linear Models, shows you a more Bayesian view of the standard linear regression algorithm and a solution to the problem of over-fitting.

    Chapter 7, Probabilistic Mixture Models, goes over more advanced probabilistic models in which the data comes from a mixture of several simple models.

    Appendix, References, lists all the books and articles that were used to write this book.

    What you need for this book

    All the examples in this book can be used with R version 3 or above on any platform and operating system supporting R.

    Who this book is for

    This book is for anyone who has to deal with lots of data and draw conclusions from it, especially when the data is noisy or uncertain. Data scientists, machine learning enthusiasts, engineers, and those who are curious about the latest advances in machine learning will find PGM interesting.

    Conventions

    In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

    Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: We can also mention the arm package, which provides Bayesian versions of glm() and polr() and implements hierarchical models.

    Any command-line input or output is written as follows:

    pred_sigma <- sqrt(sigma^2 + apply((T%*%posterior_sigma)*T, MARGIN=1, FUN=sum))
    upper_bound <- T%*%posterior_beta + qnorm(0.95)*pred_sigma
    lower_bound <- T%*%posterior_beta - qnorm(0.95)*pred_sigma

    Note

    Warnings or important notes appear in a box like this.

    Tip

    Tips and tricks appear like this.

    Reader feedback

    Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

    To send us general feedback, simply e-mail <[email protected]>, and mention the book's title in the subject of your message.

    If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

    Customer support

    Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

    Downloading the example code

    You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

    You can download the code files by following these steps:

    1. Log in or register to our website using your e-mail address and password.

    2. Hover the mouse pointer on the SUPPORT tab at the top.

    3. Click on Code Downloads & Errata.

    4. Enter the name of the book in the Search box.

    5. Select the book for which you're looking to download the code files.

    6. From the drop-down menu, choose where you purchased this book.

    7. Click on Code Download.

    You can also download the code files by clicking on the Code Files button on the book's webpage at the Packt Publishing website. This page can be accessed by entering the book's name in the Search box. Please note that you need to be logged in to your Packt account.

    Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

    WinRAR / 7-Zip for Windows

    Zipeg / iZip / UnRarX for Mac

    7-Zip / PeaZip for Linux

    Errata

    Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books (maybe a mistake in the text or the code), we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

    To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

    Piracy

    Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

    Please contact us at <[email protected]> with a link to the suspected pirated material.

    We appreciate your help in protecting our authors and our ability to bring you valuable content.

    Questions

    If you have a problem with any aspect of this book, you can contact us at <[email protected]>, and we will do our best to address the problem.

    Chapter 1. Probabilistic Reasoning

    Among all the predictions that were made about the 21st century, maybe the most unexpected one was that we would collect such a formidable amount of data about everything, every day, and everywhere in the world. Recent years have seen an incredible explosion of data collection about our world, our lives, and technology; this is the main driver of what we can certainly call a revolution. We live in the Age of Information. But collecting data is nothing if we don't exploit it and try to extract knowledge out of it.

    At the beginning of the 20th century, with the birth of statistics, the world was all about collecting data and compiling statistics. At that time, the only reliable tools were pencil and paper and, of course, the eyes and ears of the observers. Scientific observation was still in its infancy, despite the prodigious development of the 19th century.

    More than a hundred years later, we have computers, electronic sensors, and massive data storage, and we are able to store huge amounts of data continuously, not only about our physical world but also about our lives, mainly through the use of social networks, the Internet, and mobile phones. Moreover, the density of our storage technology has increased so much that we can, nowadays, store months if not years of data in a very small volume that can fit in the palm of our hand.

    But storing data is not acquiring knowledge. Storing data is just keeping it somewhere for future use. At the same time as our storage capacity dramatically evolved, the capacity of modern computers increased too, at a pace that is sometimes hard to believe. When I was a doctoral student, I remember how proud I was when I received, in the laboratory, that brand-new, shiny, all-powerful PC for carrying out my research work. Today, my old smartphone, which fits in my pocket, is more than 20 times faster.

    Therefore, in this book, you will learn one of the most advanced techniques to transform data into knowledge: machine learning. This technology is now used in every aspect of modern life, from search engines to stock market predictions, and from speech recognition to autonomous vehicles. Moreover, it is used in many fields where one would not suspect it at all, from quality assurance in product chains to optimizing the placement of antennas for mobile phone networks.

    Machine learning is the marriage between computer science and probability and statistics. A central theme in machine learning is the problem of inference, that is, how to produce knowledge or predictions using an algorithm fed with data and examples. This brings us to the two fundamental aspects of machine learning: the design of algorithms that can extract patterns and high-level knowledge from vast amounts of data, and the design of algorithms that can use this knowledge. In scientific terms, these are learning and inference.

    Pierre-Simon Laplace (1749-1827), a French mathematician and one of the greatest scientists of all time, was presumably among the first to understand an important aspect of data collection: data is unreliable, uncertain and, as we say today, noisy. He was also the first to develop the use of probabilities to deal with such aspects of uncertainty and
