Learning Probabilistic Graphical Models in R
By David Bellot
About this ebook
- Predict with and use probabilistic graphical models (PGMs) as an expert system
- Comprehend how your computer can learn Bayesian modeling to solve real-world problems
- Know how to prepare data and feed the models by using the appropriate algorithms from the appropriate R package
This book is for anyone who has to deal with lots of data and draw conclusions from it, especially when the data is noisy or uncertain. Data scientists, machine learning enthusiasts, engineers, and those who are curious about the latest advances in machine learning will find PGM interesting.
Table of Contents
Learning Probabilistic Graphical Models in R
Credits
About the Author
About the Reviewers
www.PacktPub.com
eBooks, discount offers, and more
Why subscribe?
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Probabilistic Reasoning
Machine learning
Representing uncertainty with probabilities
Beliefs and uncertainty as probabilities
Conditional probability
Probability calculus and random variables
Sample space, events, and probability
Random variables and probability calculus
Joint probability distributions
Bayes' rule
Interpreting the Bayes' formula
A first example of Bayes' rule
A first example of Bayes' rule in R
Probabilistic graphical models
Probabilistic models
Graphs and conditional independence
Factorizing a distribution
Directed models
Undirected models
Examples and applications
Summary
2. Exact Inference
Building graphical models
Types of random variable
Building graphs
Probabilistic expert system
Basic structures in probabilistic graphical models
Variable elimination
Sum-product and belief updates
The junction tree algorithm
Examples of probabilistic graphical models
The sprinkler example
The medical expert system
Models with more than two layers
Tree structure
Summary
3. Learning Parameters
Introduction
Learning by inference
Maximum likelihood
How are empirical and model distribution related?
The ML algorithm and its implementation in R
Application
Learning with hidden variables – the EM algorithm
Latent variables
Principles of the EM algorithm
Derivation of the EM algorithm
Applying EM to graphical models
Summary
4. Bayesian Modeling – Basic Models
The Naive Bayes model
Representation
Learning the Naive Bayes model
Bayesian Naive Bayes
Beta-Binomial
The prior distribution
The posterior distribution with the conjugacy property
Which values should we choose for the Beta parameters?
The Gaussian mixture model
Definition
Summary
5. Approximate Inference
Sampling from a distribution
Basic sampling algorithms
Standard distributions
Rejection sampling
An implementation in R
Importance sampling
An implementation in R
Markov Chain Monte-Carlo
General idea of the method
The Metropolis-Hastings algorithm
MCMC for probabilistic graphical models in R
Installing Stan and RStan
A simple example in RStan
Summary
6. Bayesian Modeling – Linear Models
Linear regression
Estimating the parameters
Bayesian linear models
Over-fitting a model
Graphical model of a linear model
Posterior distribution
Implementation in R
A stable implementation
More packages in R
Summary
7. Probabilistic Mixture Models
Mixture models
EM for mixture models
Mixture of Bernoulli
Mixture of experts
Latent Dirichlet Allocation
The LDA model
Variational inference
Examples
Summary
A. Appendix
References
Books on the Bayesian theory
Books on machine learning
Papers
Index
Learning Probabilistic Graphical Models in R
Copyright © 2016 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, nor its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: April 2016
Production reference: 1270416
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78439-205-5
www.packtpub.com
Credits
Author
David Bellot
Reviewers
Mzabalazo Z. Ngwenya
Prabhanjan Tattar
Acquisition Editor
Divya Poojari
Content Development Editor
Trusha Shriyan
Technical Editor
Vivek Arora
Copy Editor
Stephen Copestake
Project Coordinator
Kinjal Bari
Proofreader
Safis Editing
Indexer
Mariammal Chettiyar
Graphics
Abhinash Sahu
Production Coordinator
Nilesh Mohite
Cover Work
Nilesh Mohite
About the Author
David Bellot is a PhD graduate in computer science from INRIA, France, with a focus on Bayesian machine learning. He was a postdoctoral fellow at the University of California, Berkeley, and worked for companies such as Intel, Orange, and Barclays Bank. He currently works in the financial industry, where he develops financial market prediction algorithms using machine learning. He is also a contributor to open source projects such as the Boost C++ library.
About the Reviewers
Mzabalazo Z. Ngwenya holds a postgraduate degree in mathematical statistics from the University of Cape Town. He has worked extensively in the field of statistical consulting and has considerable experience working with R. His interests center primarily on statistical computing. He has previously reviewed the following Packt Publishing titles: Learning RStudio for R Statistical Computing, Mark P.J. van der Loo and Edwin de Jonge; R Statistical Application Development by Example Beginner's Guide, Prabhanjan Narayanachar Tattar; Machine Learning with R, Brett Lantz; R Graph Essentials, David Alexandra Lillis; R Object-oriented Programming, Kelly Black; Mastering Scientific Computing with R, Paul Gerrard and Radia Johnson; and Mastering Data Analysis with R, Gergely Darócz.
Prabhanjan Tattar is currently working as a senior data scientist at Fractal Analytics, Inc. He has 8 years of experience as a statistical analyst. Survival analysis and statistical inference are his main areas of research/interest. He has published several research papers in peer-reviewed journals and authored two books on R: R Statistical Application Development by Example, Packt Publishing; and A Course in Statistics with R, Wiley. The R packages gpk, RSADBE, and ACSWR are also maintained by him.
www.PacktPub.com
eBooks, discount offers, and more
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.
Why subscribe?
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Preface
Probabilistic graphical models are among the most advanced machine learning techniques for representing real-world data and models with probabilities. In many instances, they use the Bayesian paradigm to describe algorithms that can draw conclusions from noisy and uncertain real-world data.
The book covers topics such as inference (automated reasoning) and learning, which is the automatic building of models from raw data. It explains how all the algorithms work step by step and presents readily usable solutions in R with many examples. After covering the basic principles of probability and the Bayes formula, it presents probabilistic graphical models (PGMs) and several types of inference and learning algorithms. The reader will go from the design of a model to its automatic fitting.
Then, the book focuses on useful models with proven track records in solving many data science problems, such as Bayesian classifiers, mixture models, and Bayesian linear regression, as well as simpler models that are used as basic components to build more complex ones.
What this book covers
Chapter 1, Probabilistic Reasoning, covers topics ranging from the basic concepts of probability to PGMs as a generic framework for tractable, efficient, and easy probabilistic modeling, via a presentation of the Bayes formula.
Chapter 2, Exact Inference, shows you how to build PGMs by combining simple graphs and perform queries on the model using an exact inference algorithm called the junction tree algorithm.
Chapter 3, Learning Parameters, includes fitting and learning the PGM models from data sets with the Maximum Likelihood approach.
Chapter 4, Bayesian Modeling – Basic Models, covers simple and powerful Bayesian models that can be used as building blocks for more advanced models and shows you how to fit and query them with adapted algorithms.
Chapter 5, Approximate Inference, covers a second way to perform inference in PGMs, using sampling algorithms, and presents the main sampling algorithms, such as MCMC.
Chapter 6, Bayesian Modeling – Linear Models, shows you a more Bayesian view of the standard linear regression algorithm and a solution to the problem of over-fitting.
Chapter 7, Probabilistic Mixture Models, goes over more advanced probabilistic models in which the data comes from a mixture of several simple models.
Appendix, References, lists all the books and articles that were used to write this book.
What you need for this book
All the examples in this book can be used with R version 3 or above on any platform and operating system supporting R.
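A quick way to confirm that your installation meets this requirement is shown below; this is a sketch, not from the book, and the only package name assumed is rstan (the CRAN package behind RStan, which Chapter 5 installs):

```r
# Verify that the running R version satisfies the book's requirement (R >= 3.0).
stopifnot(getRversion() >= "3.0.0")

# Chapter 5 additionally uses RStan, which can be installed from CRAN:
# install.packages("rstan")
cat("R", as.character(getRversion()), "is recent enough for the examples.\n")
```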
Who this book is for
This book is for anyone who has to deal with lots of data and draw conclusions from it, especially when the data is noisy or uncertain. Data scientists, machine learning enthusiasts, engineers, and those who are curious about the latest advances in machine learning will find PGM interesting.
Conventions
In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "We can also mention the arm package, which provides Bayesian versions of glm() and polr() and implements hierarchical models."
Any command-line input or output is written as follows:
pred_sigma <- sqrt(sigma^2 + apply((T %*% posterior_sigma) * T, MARGIN = 1, FUN = sum))
upper_bound <- T %*% posterior_beta + qnorm(0.95) * pred_sigma
lower_bound <- T %*% posterior_beta - qnorm(0.95) * pred_sigma
Note
Warnings or important notes appear in a box like this.
Tip
Tips and tricks appear like this.
Reader feedback
Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.
To send us general feedback, simply e-mail <[email protected]>, and mention the book's title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Customer support
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
Downloading the example code
You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
You can download the code files by following these steps:
Log in or register to our website using your e-mail address and password.
Hover the mouse pointer on the SUPPORT tab at the top.
Click on Code Downloads & Errata.
Enter the name of the book in the Search box.
Select the book for which you're looking to download the code files.
Choose from the drop-down menu where you purchased this book from.
Click on Code Download.
You can also download the code files by clicking on the Code Files button on the book's webpage at the Packt Publishing website. This page can be accessed by entering the book's name in the Search box. Please note that you need to be logged in to your Packt account.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux
Errata
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.
To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy
Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at <[email protected]> with a link to the suspected pirated material.
We appreciate your help in protecting our authors and our ability to bring you valuable content.
Questions
If you have a problem with any aspect of this book, you can contact us at <[email protected]>, and we will do our best to address the problem.
Chapter 1. Probabilistic Reasoning
Among all the predictions that were made about the 21st century, maybe the most unexpected one was that we would collect such a formidable amount of data about everything, every day, and everywhere in the world. Recent years have seen an incredible explosion of data collection about our world, our lives, and technology; this is the main driver of what we can certainly call a revolution. We live in the Age of Information. But collecting data is nothing if we don't exploit it and try to extract knowledge out of it.
At the beginning of the 20th century, with the birth of statistics, the world was all about collecting data and making statistics. In that time, the only reliable tools were pencils and paper and of course, the eyes and ears of the observers. Scientific observation was still in its infancy, despite the prodigious development of the 19th century.
More than a hundred years later, we have computers, we have electronic sensors, we have massive data storage, and we are able to store huge amounts of data continuously, not only about our physical world but also about our lives, mainly through the use of social networks, the Internet, and mobile phones. Moreover, the density of our storage technology has increased so much that we can, nowadays, store months if not years of data into a very small volume that can fit in the palm of our hand.
But storing data is not acquiring knowledge. Storing data is just keeping it somewhere for future use. At the same time as our storage capacity dramatically evolved, the capacity of modern computers increased too, at a pace that is sometimes hard to believe. When I was a doctoral student, I remember how proud I was when, in the laboratory, I received that brand-new, shiny, all-powerful PC to carry out my research work. Today, my old smartphone, which fits in my pocket, is more than 20 times faster.
Therefore, in this book, you will learn one of the most advanced techniques to transform data into knowledge: machine learning. This technology is now used in every aspect of modern life, from search engines to stock market predictions, from speech recognition to autonomous vehicles. Moreover, it is used in many fields where one would not suspect it at all, from quality assurance in product chains to optimizing the placement of antennas for mobile phone networks.
Machine learning is the marriage of computer science with probability and statistics. A central theme in machine learning is the problem of inference: how to produce knowledge or predictions using an algorithm fed with data and examples. This brings us to the two fundamental aspects of machine learning: the design of algorithms that can extract patterns and high-level knowledge from vast amounts of data, and the design of algorithms that can use this knowledge; or, in scientific terms, learning and inference.
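As a small foretaste of the inference machinery developed in this chapter (which later presents "A first example of Bayes' rule in R"), here is a minimal sketch of Bayes' rule in R; the diagnostic-test numbers are illustrative assumptions, not taken from the book:

```r
# Bayes' rule: P(H | E) = P(E | H) * P(H) / P(E)
# Hypothetical diagnostic test (all numbers are assumed for illustration):
prior       <- 0.01   # P(disease): assumed prevalence in the population
sensitivity <- 0.99   # P(positive test | disease)
false_pos   <- 0.05   # P(positive test | no disease)

# Total probability of a positive test, P(E), by the law of total probability:
evidence <- sensitivity * prior + false_pos * (1 - prior)

# Posterior probability of disease given a positive test, P(H | E):
posterior <- sensitivity * prior / evidence
print(posterior)   # about 0.167: even a positive test is far from conclusive
```

The striking gap between the test's 99% sensitivity and the roughly 17% posterior is exactly the kind of reasoning under uncertainty that PGMs automate at scale.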
Pierre-Simon Laplace (1749-1827) a French mathematician and one of the greatest scientists of all time, was presumably among the first to understand an important aspect of data collection: data is unreliable, uncertain and, as we say today, noisy. He was also the first to develop the use of probabilities to deal with such aspects of uncertainty and