15 Math Concepts Every Data Scientist Should Know: Understand and learn how to apply the math behind data science algorithms
By David Hoyle
15 Math Concepts Every Data Scientist Should Know
Copyright © 2024 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Group Product Manager: Niranjan Naikwadi
Publishing Product Manager: Yasir Ali Khan
Content Development Editor: Joseph Sunil
Technical Editor: Seemanjay Ameriya
Copy Editor: Safis Editing
Project Coordinator: Urvi Sharma
Proofreader: Safis Editing
Indexer: Hemangini Bari
Production Designer: Joshua Misquitta
Marketing Coordinator: Vinishka Kalra
First published: July 2024
Production reference: 2221024
Published by Packt Publishing Ltd.
Grosvenor House
11 St Paul’s Square
Birmingham
B3 1RB, UK
ISBN 978-1-83763-418-7
www.packtpub.com
To my wife Clare for her unwavering love, support, and inspiration throughout our life together.
– David Hoyle
Contributors
About the author
David Hoyle has over 30 years’ experience in machine learning, statistics, and mathematical modeling. He gained a BSc in mathematics and physics and a PhD in theoretical physics, both from the University of Bristol, UK. He then embarked on an academic career that included research at the University of Cambridge and leading his own research groups as an Associate Professor at the University of Exeter and the University of Manchester in the UK. For the last 13 years, he has worked in the commercial sector, including for Lloyds Banking Group, one of the UK’s largest retail banks, and as joint Head of Data Science for AutoTrader UK. He now works for the global customer data science company dunnhumby, building statistical and machine learning models for the world’s largest retailers, including Tesco UK and Walmart. He lives and works in Manchester, UK.
This has been a long endeavor. I would like to thank my wife and children for their encouragement, and the team at Packt for their patience and support throughout the process.
About the reviewer
Emmanuel Nyatefe is a data analyst with over 5 years of experience in data analytics, AI, and ML. He holds a Master of Science in Business Analytics from the W. P. Carey School of Business at Arizona State University and a Bachelor of Science in Business Information Technology from Kwame Nkrumah University of Science and Technology. He has led various AI and ML projects, including developing models for detecting crop diseases and applying Generative AI to innovate business solutions and optimize operations. His expertise in data engineering, modeling, and visualization, alongside his proficiency in LLMs and advanced analytics, highlights his significant contributions to data science.
Table of Contents
Preface
Part 1: Essential Concepts
1
Recap of Mathematical Notation and Terminology
Technical requirements
Number systems
Notation for numbers and fields
Complex numbers
What we learned
Linear algebra
Vectors
Matrices
What we learned
Sums, products, and logarithms
Sums and the 𝚺 notation
Products and the 𝚷 notation
Logarithms
What we learned
Differential and integral calculus
Differentiation
Finding maxima and minima
Integration
What we learned
Analysis
Limits
Order notation
Taylor series expansions
What we learned
Combinatorics
Binomial coefficients
What we learned
Summary
Notes and further reading
2
Random Variables and Probability Distributions
Technical requirements
All data is random
A little example
Systematic variation can be learned – random variation can’t
Random variation is not just measurement error
What are the consequences of data being random?
What we learned
Random variables and probability distributions
A new concept – random variables
Summarizing probability distributions
Continuous distributions
Transforming and combining random variables
Named distributions
What we learned
Sampling from distributions
How datasets relate to random variables and probability distributions
How big is the population from which a dataset is sampled?
How to sample
Generating your own random numbers code example
Sampling from numpy distributions code example
What we learned
Understanding statistical estimators
Consistency, bias, and efficiency
The empirical distribution function
What we learned
The Central Limit Theorem
Sums of random variables
CLT code example
CLT example with discrete variables
Computational estimation of a PDF from data
KDE code example
What we learned
Summary
Exercises
3
Matrices and Linear Algebra
Technical requirements
Inner and outer products of vectors
Inner product of two vectors
Outer product of two vectors
What we learned
Matrices as transformations
Matrix multiplication
The identity matrix
The inverse matrix
More examples of matrices as transformations
Matrix transformation code example
What we learned
Matrix decompositions
Eigen-decompositions
Eigenvector and eigenvalues
Eigen-decomposition of a square matrix
Eigen-decomposition code example
Singular value decomposition
The SVD of a complex matrix
What we learned
Matrix properties
Trace
Determinant
What we learned
Matrix factorization and dimensionality reduction
Dimensionality reduction
Principal component analysis
Non-negative matrix factorization
What we learned
Summary
Exercises
Notes and further reading
4
Loss Functions and Optimization
Technical requirements
Loss functions – what are they?
Risk functions
There are many loss functions
Different loss functions = different end results
Loss functions for anything
A loss function by any other name
What we learned
Least Squares
The squared-loss function
OLS regression
OLS, outliers, and robust regression
What we learned
Linear models
Practical issues
The model residuals
OLS regression code example
What we learned
Gradient descent
Locating the minimum of a simple risk function
Gradient descent code example
Gradient descent is a general technique
Beyond simple gradient descent
What we learned
Summary
Exercises
5
Probabilistic Modeling
Technical requirements
Likelihood
A simple probabilistic model
Log likelihood
Maximum likelihood estimation
What we have learned
Bayes’ theorem
Conditional probability and Bayes’ theorem
Priors
The posterior
What we have learned
Bayesian modeling
Bayesian model averaging
MAP estimation
As N → ∞
Least squares as an approximation to Bayesian modeling
What we have learned
Bayesian modeling in practice
Analytic approximation of the posterior
Computational sampling
MCMC code example
Probabilistic programming languages
What we have learned
Summary
Exercises
Part 2: Intermediate Concepts
6
Time Series and Forecasting
Technical requirements
What is time series data?
What does auto-correlation mean for modeling time series data?
The auto-correlation function (ACF)
The partial auto-correlation function (PACF)
Other data science implications of time series data
What we have learned
ARIMA models
Integrated
Auto-regression
Moving average
Combining the AR(p), I(d), and MA(q) into an ARIMA model
Variants of ARIMA modeling
What we have learned
ARIMA modeling in practice
Unit root testing
Interpreting ACF and PACF plots
auto.arima
What we have learned
Machine learning approaches to time series analysis
Routine application of machine learning to time series analysis
Deep learning approaches to time series analysis
AutoML approaches to time series analysis
What we have learned
Summary
Exercises
Notes and further reading
7
Hypothesis Testing
Technical requirements
What is a hypothesis test?
Example
The general form of a hypothesis test
The p-value
The effect of increasing sample size
The effect of decreasing noise
One-tailed and two-tailed tests
Using samples variances in the test statistic – the t-test
Computationally intensive methods for p-value estimation
Parametric versus non-parametric hypothesis tests
What we learned
Confidence intervals
What does a confidence interval really represent?
Confidence intervals for any parameter
A confidence interval code example
What we learned
Type I and Type II errors, and power
What we learned
Summary
Exercises
Notes and further reading
8
Model Complexity
Technical requirements
Generalization, overfitting, and the role of model complexity
Overfitting
Why overfitting is bad
Overfitting increases the variability of predictions
Underfitting is also a problem
Measuring prediction error
What we learned
The bias-variance trade-off
Proof of the bias-variance trade-off formula
Double descent – a modern twist on the generalization error diagram
What we learned
Model complexity measures for model selection
Selecting between classes of models
Akaike Information Criterion
Bayesian Information Criterion
What we learned
Summary
Notes and further reading
9
Function Decomposition
Technical requirements
Why do we want to decompose a function?
What is a decomposition of a function?
Example 1 – decomposing a one-dimensional function into symmetric and anti-symmetric parts
Example 2 – decomposing a time series into its seasonal and non-seasonal components
What we’ve learned
Expanding a function in terms of basis functions
What we’ve learned
Fourier series
What we’ve learned
Fourier transforms
The multi-dimensional Fourier transform
What we’ve learned
The discrete Fourier transform
DFT code example
Uses of the DFT
What is the difference between the DFT, Fourier series, and the Fourier transform?
What we’ve learned
Summary
Exercises
10
Network Analysis
Technical requirements
Graphs and network data
Network data is about relationships
Example 1 – substituting goods in a supermarket
Example 2 – international trade
What is a graph?
What we’ve learned
Basic characteristics of graphs
Undirected and directed edges
The adjacency matrix
In-degree and out-degree
Centrality
What we’ve learned
Different types of graphs
Fully connected graphs
Disconnected graphs
Directed acyclic graphs
Small-world networks
Scale-free networks
What we’ve learned
Community detection and decomposing graphs
What is a community?
How to do community detection
Community detection algorithms
Community detection code example
What we’ve learned
Summary
Exercises
Notes and further reading
Part 3: Selected Advanced Concepts
11
Dynamical Systems
Technical requirements
What is a dynamical system and what is an evolution equation?
Time can be discrete or continuous
Time does not have to mean chronological time
Evolution equations
What we learned
First-order discrete Markov processes
Variations of first-order Markov processes
A Markov process is a probabilistic model
The transition probability matrix
Properties of the transition probability matrix
Epidemic modeling with a first-order discrete Markov process
The transition probability matrix is a network
Using the transition matrix to generate state trajectories
Evolution of the state probability distribution
Stationary distributions and limiting distributions
First-order discrete Markov processes are memoryless
Likelihood of the state sequence
What we learned
Higher-order discrete Markov processes
Second-order discrete Markov processes
Evolution of the state probability distribution in higher-order models
A higher-order discrete Markov process is a first-order discrete Markov process in disguise
Higher-order discrete Markov processes are still memoryless
What we learned
Hidden Markov Models
Emission probabilities
Making inferences with an HMM
What we learned
Summary
Exercises
Notes and further reading
12
Kernel Methods
Technical requirements
The role of inner products in common learning algorithms
Sometimes we need new features in our inner products
What we learned
The kernel trick
What is a kernel?
Commonly used kernels
Kernel functions for other mathematical objects
Combining kernels
Positive semi-definite kernels
Mercer’s theorem and the kernel trick
Kernelized algorithms
What we learned
An example of a kernelized learning algorithm
kFDA code example
What we learned
Summary
Exercises
13
Information Theory
Technical requirements
What is information and why is it useful?
The concept of information
The mathematical definition of information
Information theory applies to continuous distributions as well
Why we measure information on a logarithmic scale
Why is quantifying information useful?
What we’ve learned
Entropy as expected information
Entropy
What we’ve learned
Mutual information
Conditional entropy
Mutual information for continuous variables
Mutual information as a measure of correlation
Mutual information code example
What we’ve learned
The Kullback-Leibler divergence
Relative entropy
KL-divergence for continuous variables
Using the KL-divergence for approximation
Variational inference
What we’ve learned
Summary
Exercises
Notes and further reading
14
Non-Parametric Bayesian Methods
Technical requirements
What are non-parametric Bayesian methods?
We still have parameters
The different types of non-parametric Bayesian methods
The pros and cons of non-parametric Bayesian methods
What we learned
Gaussian processes
The kernel function
Fitting GPR models
Prediction using GPR models
GPR code example
What we learned
Dirichlet processes
How do DPs differ from GPs?
The DP notation
Sampling a function from a DP
Generating a sample of data from a DP
Bayesian non-parametric inference using a DP
What we learned
Summary
Exercises
15
Random Matrices
Technical requirements
What is a random matrix?
What we learned
Using random matrices to represent interactions in large-scale systems
What we learned
Universal behavior of large random matrices
The Wigner semicircle law
What does RMT study?
Universal is universal
The classical Gaussian matrix ensembles
What we learned
Random matrices and high-dimensional covariance matrices
The Marčenko-Pastur distribution is a bulk distribution
Universality in the singular values of X
The Marčenko-Pastur distribution and neural networks
What we learned
Summary
Exercises
Notes and further reading
Index
Other Books You May Enjoy
Preface
This is not a book about a specific technology or programming language. This is a book about mathematics. And mathematics is a language. It is the language of science, and so it is the language of data science as well. We can say beautiful things with that language. Just as a piece of great literature is more than a large collection of individual letters, a mathematical equation is more than just a collection of symbols. An equation conveys a way of thinking about a data science problem. It conveys a concept or an idea. If you want to fully exploit the power of those ideas and adapt them to your own data science work, you need to move beyond just recognizing the symbols in an equation and move towards understanding what that equation is really telling you.
Many people are not confident in reading and interpreting mathematical equations and mathematical ideas. And yet, as with great literature, once someone guides us through the nuances and subtexts, their beauty is revealed and becomes obvious. That is what this book aims to do.
This book will not make you an expert in every area of mathematics. Instead, it will give you enough skills and confidence to read and navigate mathematical equations and ideas on your own. We do that by walking you through the core concepts that underpin many data science algorithms – the 15 math concepts of the book’s title. We also do that by walking through those concepts slowly and in detail. I am not a fan of mathematics books that consist solely of theorems, lemmas, and proofs. Instead, this book is unapologetically long-form math. When we introduce an equation, we will explain what the equation tells us, what its implications and ramifications are, and how it connects to other parts of math. We also illustrate those concepts with code examples in Python.
At the end of the book, you will be equipped to look at the math equations of any data science algorithm and confidently unpack what that algorithm is trying to do.
Who this book is for
This book is for data scientists and machine learning engineers who have been using data science and machine learning techniques, software, and Python packages such as scikit-learn, but without necessarily fully understanding the mathematics behind the algorithms. This could include the following types of people:
Data scientists who have a college/undergraduate degree in a numerate subject and so have a basic understanding of mathematics, but they want to learn more, particularly those bits of mathematics that will be helpful in their roles as data scientists.
Data scientists who have a good understanding of some of the mathematics behind bits of data science but want to discover some new math concepts that will be useful to them in their data science work.
Data scientists who have business or data science problems they need to solve, but existing software does not provide appropriate algorithms. They want to construct their own algorithms but lack the mathematical guidance on how to apply mathematics to the new data science problems.
What this book covers
Chapter 1, Recap of Mathematical Notation and Terminology, provides a summary of the main mathematical notation you will encounter in this book and that we expect you to already be familiar with.
Chapter 2, Random Variables and Probability Distributions, introduces the idea that all data contains some degree of randomness, and that random variables and their associated probability distributions are the natural way to describe that randomness. The chapter teaches you how to sample from a probability distribution, understand statistical estimators, and about the Central Limit Theorem.
Chapter 3, Matrices and Linear Algebra, introduces vectors and matrices as the basic mathematical structures we use to represent and transform data. It then shows how matrices can be broken down into simple-to-understand parts using techniques such as eigen-decomposition and singular value decomposition. The chapter finishes with explanations of how these decomposition methods are applied to principal component analysis (PCA) and non-negative matrix factorization (NMF).
Chapter 4, Loss Functions and Optimization, starts by introducing loss functions, risk functions, and empirical risk functions. The concept of minimizing an empirical risk function to estimate the parameters of a model is explained, before introducing Ordinary Least Squares estimation of linear models. Finally, gradient descent is illustrated as a general technique for minimizing risk functions.
Chapter 5, Probabilistic Modeling, introduces the concept of building predictive models that explicitly account for the random component within data. The chapter starts by introducing likelihood and maximum likelihood estimation, before introducing Bayes’ theorem and Bayesian inference. The chapter finishes with an illustration of Markov Chain Monte Carlo and importance sampling from the posterior distribution of a model’s parameters.
Chapter 6, Time Series and Forecasting, introduces time series data and the concept of auto-correlation as the main characteristic that distinguishes time series data from other types of data. It then describes the classical ARIMA approach to modeling time series data. Finally, it ends with a summary of concepts behind modern machine learning approaches to time series analysis.
Chapter 7, Hypothesis Testing, introduces what a hypothesis test is and why hypothesis tests are important in data science. The general form of a hypothesis test is outlined before the concepts of statistical significance and p-values are explained in depth. Next, confidence intervals and their interpretation are introduced. The chapter ends with an explanation of Type-I and Type-II errors, and power calculations.
Chapter 8, Model Complexity, introduces the concept of how we describe and quantify model complexity and discusses its impact on the predictive accuracy of a model. The classical bias-variance trade-off view of model complexity is introduced, along with the phenomenon of double descent. The chapter finishes with an explanation of model complexity measures for model selection.
Chapter 9, Function Decomposition, introduces the idea of decomposing or building up a function from a set of simpler basis functions. A general approach is explained first before the chapter moves on to introducing Fourier Series, Fourier Transforms, and the Discrete Fourier Transform.
Chapter 10, Network Analysis, introduces networks, network data, and the concept that a network is a graph. The node-edge description of a graph, along with its adjacency matrix representation, is explained. Next, the chapter describes different types of common graphs and their properties. Finally, the decomposition of a graph into sub-graphs or communities is explained, and various community detection algorithms are illustrated.
Chapter 11, Dynamical Systems, introduces what a dynamical system is and explains how its dynamics are controlled by an evolution equation. The chapter then focuses on discrete Markov processes as these are the most common dynamical systems used by data scientists. First-order discrete Markov processes are explained in depth, before higher-order Markov processes are introduced. The chapter finishes with an explanation of Hidden Markov Models and a discussion of how they can be used in commercial data science applications.
Chapter 12, Kernel Methods, starts by introducing inner-product-based learning algorithms, then moves on to explaining kernels and the kernel trick. The chapter ends with an illustration of a kernelized learning algorithm. Throughout the chapter, we emphasize how the kernel trick allows us to implicitly and efficiently construct new features and thereby uncover any non-linear structure present in a dataset.
Chapter 13, Information Theory, introduces the concept of information and how it is measured mathematically. The main information theory concepts of entropy, conditional entropy, mutual information, and relative entropy are then explained, before practical uses of the Kullback-Leibler divergence are illustrated.
Chapter 14, Non-Parametric Bayesian Methods, introduces the idea of using a Bayesian prior over functions when building probabilistic models. The idea is illustrated through Gaussian Processes and Gaussian Process Regression. The chapter then introduces Dirichlet Processes and how they can be used as priors for probability distributions.
Chapter 15, Random Matrices, introduces what a random matrix is and why random matrices are ubiquitous in science and data science. The universal properties of large random matrices are illustrated along with the classical Gaussian random matrix ensembles. The chapter finishes with a discussion of where large random matrices occur in statistical and machine learning models.
To get the most out of this book
To get the most out of this book, we assume you have at least some familiarity with high-school mathematics, such as complex numbers, basic calculus, and elementary uses of vectors and matrices. To get the most out of the code examples in the book, you should have some experience of coding in Python. You will also need access to a computer or server with a full Python installation and/or where you have privileges to run and install Python and any additional packages required.
The code examples given in each chapter, and the answers to the exercises at the end of each chapter, are available in the book’s GitHub repository as Jupyter notebooks. To run the notebooks, you will need a Jupyter installation.
If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.
Download the example code files
You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/15-Math-Concepts-Every-Data-Scientist-Should-Know. If there’s an update to the code, it will be updated in the GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Conventions used
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: The following code example can be found in the Code_Examples_Chap5.ipynb notebook in the GitHub repository.
A block of code is set as follows:
map_estimate = minimize(neg_log_posterior,
x0,
method='BFGS',
options={'disp': True})
# Convert from logit(p) to p
p_optimal = np.exp(map_estimate['x'][0])/ (
1.0 + np.exp(map_estimate['x'][0]))
print(MAP estimate of success probability =
, p_optimal)
Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: The name ARIMA stands for Auto-Regressive Integrated Moving Average models.
Tips or important notes
Appear like this.
Get in touch
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Share your thoughts
Once you’ve read 15 Math Concepts Every Data Scientist Should Know, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.
Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.
Download a free PDF copy of this book
Thanks for purchasing this book!
Do you like to read on the go but are unable to carry your print books everywhere?
Is your eBook purchase not compatible with the device of your choice?
Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.
Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.
The perks don’t stop there. You can get exclusive access to discounts, newsletters, and great free content in your inbox daily.
Follow these simple steps to get the benefits:
1. Scan the QR code or visit the link below
https://packt.link/free-ebook/9781837634187
2. Submit your proof of purchase
3. That’s it! We’ll send your free PDF and other benefits to your email directly
Part 1: Essential Concepts
In this part, we will introduce the math concepts that you will encounter again and again as a data scientist, and that it is vital to understand well. After a recap of basic math notation, we look at the concepts related to how data is produced, then move on to concepts related to how data is transformed, finally building up to our end goal of how data is modeled. These concepts are essential because you will use and combine them constantly in your work. By the end of Part 1, you will be comfortable with the math concepts that underpin almost all data science models and algorithms.
This section contains the following chapters:
Chapter 1, Recap of Mathematical Notation and Terminology
Chapter 2, Random Variables and Probability Distributions
Chapter 3, Matrices and Linear Algebra
Chapter 4, Loss Functions and Optimization
Chapter 5, Probabilistic Modeling
1
Recap of Mathematical Notation and Terminology
Our tour of math concepts will start properly in Chapter 2. Before we begin that tour, we’ll start by recapping some mathematical notation and terminology. Mathematics is a language, and mathematical symbols and notation are its alphabet. Therefore, we must be comfortable with and understand the basics of this alphabet.
In this chapter, we will recap the most common core notation and terminology that we are likely to use repeatedly throughout the book. We have grouped the recap into six main math areas or topics. Those topics are as follows:
Number systems: In this section, we introduce notation for real and complex numbers
Linear algebra: In this section, we introduce notation for describing vectors and matrices
Sums, products, and logarithms: In this section, we introduce notation for succinctly representing sums and products, and we introduce rules for logarithms
Differential and integral calculus: In this section, we introduce basic notation for differentiation and integration
Analysis: In this section, we introduce notation for describing limits, and order notation
Combinatorics: In this section, we introduce notation for binomial coefficients
Some of this notation you may already be familiar with. For example, complex numbers, matrices, logarithms, and basic differential calculus you will have seen either in high school or in the first year of an undergraduate degree in a numerate subject. Other topics, such as order notation, you may have encountered as part of a university degree course on mathematical analysis or algorithm complexity, or they may be new to you. For the most part, though, you will have seen the notation recapped in this chapter before. If you are already familiar and comfortable with the symbols and notation recapped here, you can skip this chapter and easily come back later to just those sections that contain notation that is new to you.
We should emphasize that this chapter is a recap. It is brief. It is not meant to be an exhaustive and comprehensive review. We focus on presenting a few main facts, but also on trying to give a feel for why the notation may be useful and how it is likely to be used.
Finally, we will encounter new notation, terminology, and symbols as we progress through the book when we are discussing specific topics. We will introduce this new notation and terminology as and when we need it.
Technical requirements
As this chapter solely recaps some of the mathematical notation we will use in later chapters, there are no code examples given and hence no technical requirements for this particular chapter.
For later chapters, you will be able to find code examples at the GitHub repository: https://github.com/PacktPublishing/15-Math-Concepts-Every-Data-Scientist-Should-Know
Number systems
In this section, we introduce notation for describing sets of numbers. We will focus on the real numbers and the complex numbers.
Notation for numbers and fields
As this is a book about data science, we will be dealing with numbers. So, it will be worthwhile recapping the notation we use to refer to the most common sets of numbers.
Most of the numbers we will deal with in this book will be real numbers, such as 4.6, 1, or -2.3. We can think of them as "living" on the real number line shown in Figure 1.1. The real number line is a one-dimensional continuous structure. There are an infinite number of real numbers. We denote the set of all real numbers by the symbol ℝ.
Figure 1.1: The real number line
Obviously, there will be situations where we want to restrict our datasets to, say, just integer-valued numbers. This would be the case if we were analyzing count data, such as the number of items of a particular product on an e-commerce site sold on a particular day. The integer numbers, …, -2, -1, 0, 1, 2, …, are a subset of the real numbers, and we denote them by the symbol ℤ. Despite them being a subset of the real numbers, there are still an infinite number of integers.
For the e-commerce count data that we mentioned earlier, the integer value would always be positive. If we restrict ourselves to strictly positive integers, 1, 2, 3, …, and so on, then we have the natural or counting numbers. These we denote by the symbol ℕ.
As well as real numbers, we will occasionally deal with complex numbers. As the name suggests, complex numbers have more structure to them than real numbers. The complex numbers don’t live on the real number line and so are not a subset of the real numbers, but instead, they have a two-dimensional structure, which we’ll explain in a moment. We denote the set of complex numbers by the symbol ℂ.
Sometimes, there are very specific occasions when we may want to refer to other subsets of the real numbers. Other common symbols you may encounter are ℚ for the rational numbers and ℝ⁺ for the positive real numbers, along with set-builder notation such as {x ∈ ℝ | x ≥ 0} for the set of non-negative real numbers.
Numbers such as 4.6 are specific instances of a real number. When we are talking about algorithms or code, we will want to talk about variables, in which case we use a symbol such as x. To say that the variable x is a real number, we use the membership symbol ∈ and write x ∈ ℝ. We read this as "x is a member of the set of real numbers," or more succinctly, "x is real."
Likewise, if we wanted to say that a variable z is a complex number, we would write z ∈ ℂ, while n ∈ ℤ says that the variable n is an integer.
When we have several variables that all have similar properties or that may be related in some way – for example, they represent different features of a data point in a training set – then we use subscripts to denote the different variables. For example, we would use x_1, x_2, …, x_d to denote the d feature values of a data point.
Complex numbers
If the real numbers live on the one-dimensional structure that is the real number line, this raises the question of whether we can have numbers that live in a two-dimensional space. Complex numbers are such numbers. A complex number, z, is built from two real numbers, x and y, and is written in the form

$$z = x + iy$$

Eq. 1
The symbol i denotes the imaginary unit, which is defined by the property i² = −1. The number x is called the real part of z, and the number y is called the imaginary part of z. Since z is specified by the pair of real numbers (x, y), we can represent it as a point in a two-dimensional plane, called the complex plane, shown in Figure 1.2.
Figure 1.2: The complex number plane
The position of z in the complex plane is given by its Cartesian coordinates (x, y). We use the notation Re(z) for the real part of z and Im(z) for the imaginary part of z, so that

$$\mathrm{Re}(z) = x, \qquad \mathrm{Im}(z) = y$$

Eq. 2

Consequently, we have used Re(z) and Im(z) to label the horizontal and vertical axes of the complex plane in Figure 1.2.
A number that has Im(z) = 0 lies on the horizontal axis and is just a real number, while a number that has Re(z) = 0 lies on the vertical axis and is called purely imaginary.
Just as with other 2D planes, we can represent a point in the complex plane not just with Cartesian coordinates (x, y), but also with polar coordinates (r, θ), where r is the distance of the point from the origin and θ is the angle measured from the positive real axis. In these polar coordinates, we can write

$$z = r\left(\cos\theta + i\sin\theta\right)$$

Eq. 3
The symbol |z| denotes the modulus of the complex number z. It is the distance r of z from the origin of the complex plane, and so, by Pythagoras’ theorem, it can be calculated from the real and imaginary parts of z as

$$|z| = \sqrt{x^2 + y^2}$$

Eq. 4
The angle θ is called the argument of z. It satisfies

$$\tan\theta = \frac{y}{x}$$

Eq. 5
This means we can also write a complex number in exponential form, using Euler’s formula e^{iθ} = cos θ + i sin θ, as

$$z = re^{i\theta}$$

Eq. 6
This last form for writing a complex number will be useful when we introduce Fourier transforms, which are used to represent functions as a sum of sine and cosine waves. In fact, this is our main reason for introducing complex numbers.
One important concept relating to the complex number z is that of its complex conjugate. The complex conjugate of z, which we denote z*, is obtained by flipping the sign of the imaginary part of z. So, if z = x + iy, then its complex conjugate is

$$z^* = x - iy$$

Eq. 7

Geometrically, taking the complex conjugate corresponds to reflecting z in the real axis of the complex plane, as shown in Figure 1.3. One useful property is that multiplying z by its own conjugate gives the squared modulus of z, since zz* = (x + iy)(x − iy) = x² + y² = |z|².
Figure 1.3: The complex conjugate
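As a quick numerical check of these definitions, we can use Python’s built-in complex type together with the standard cmath module. This is a minimal illustrative sketch, not code from the book’s repository, using z = 3 + 4i:

import cmath

z = 3.0 + 4.0j               # z = x + iy, with x = 3 and y = 4
print(z.real, z.imag)        # Re(z) = 3.0 and Im(z) = 4.0
print(abs(z))                # modulus |z| = sqrt(3**2 + 4**2) = 5.0
print(cmath.phase(z))        # argument theta, in radians
print(z.conjugate())         # complex conjugate z* = (3-4j)
print(z * z.conjugate())     # z times z* = |z|**2 = (25+0j)
r, theta = cmath.polar(z)    # polar form z = r*e^(i*theta)
print(cmath.rect(r, theta))  # rebuilding z from r and theta recovers (3+4j)

Note that cmath.rect(r, theta) returns r cos θ + i r sin θ, which is exactly the polar form of Eq. 3.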
The integers, real numbers, and complex numbers represent the overwhelming majority of the numbers we will meet throughout this book, so this is a good place to end our recap of number systems.
Let’s summarize what we learned.
What we learned
In this section, we have learned the following:
The notation ℝ for the set of real numbers

The notation ℤ for the set of integers

The notations ℕ and ℂ for the natural numbers and the complex numbers

The notation x ∈ ℝ for saying that the variable x is real, and set-builder notation such as {x ∈ ℝ | x ≥ 0}

The notation x_1, x_2, …, x_d for denoting several related variables using subscripts
How complex numbers have a real and an imaginary part
How a complex number z = x + iy can be represented as a point in the two-dimensional complex plane, and written in polar form as z = re^{iθ}
How to calculate the complex conjugate z* = x − iy of a complex number z = x + iy
In the next section, having learned how to describe both real and complex numbers, we move on to how to describe collections of numbers (vectors) and how to describe mathematical objects (matrices) that transform those vectors.
Linear algebra
In this section, we introduce notation to describe vectors and matrices, which are key mathematical objects that we will encounter again and again throughout this book.
Vectors
In many circumstances, we will want to represent a set of numbers together. For example, the numbers 7.3 and 1.2 might represent the values of two features that correspond to a data point in a training set. We often group these numbers together in brackets and write them as (7.3, 1.2) or [7.3, 1.2]. Because of the similarity to the way we write spatial coordinates, we tend to call a collection of numbers that are held together a vector. A vector can be two-dimensional, as in the example just given, or d-dimensional, meaning it contains d components, and so might look like (x_1, x_2, …, x_d).
We can write a vector in two ways. We can write it as a row vector, going across the page, such as the following vector:
$$\left(x_1, x_2, \ldots, x_d\right) = \text{a } d\text{-dimensional row vector}$$

Eq. 8
Alternatively, we can write it as a column vector going down the page, such as the following vector:
$$\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_d \end{pmatrix} = \text{a } d\text{-dimensional column vector}$$

Eq. 9
We can convert between a row vector and a column vector (and vice versa) using the transpose operator, denoted by a superscript ⊤ symbol. For example:

$$\left(x_1, x_2, \ldots, x_d\right)^\top = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_d \end{pmatrix}$$

Eq. 10
And vice-versa in the following example:
$$\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_d \end{pmatrix}^\top = \left(x_1, x_2, \ldots, x_d\right)$$

Eq. 11
Symbolically, we often write a vector using a boldface font – for example, 𝐱 = (x_1, x_2, …, x_d) – so that we can refer to the whole vector compactly by the single symbol 𝐱.
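For readers who like to see notation mirrored in code, NumPy represents row and column vectors as two-dimensional arrays of shape 1 x d and d x 1, with the .T attribute playing the role of the transpose operator. The following is a small illustrative sketch rather than an example from the book’s repository:

import numpy as np

x_row = np.array([[7.3, 1.2]])   # a 1 x 2 row vector
x_col = x_row.T                  # the transpose is a 2 x 1 column vector
print(x_row.shape, x_col.shape)  # (1, 2) (2, 1)
print(x_col.T)                   # transposing twice recovers the row vector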
Matrices
Usually, we will want to transform a vector more than just transposing it. Linear transformations of vectors can be done with matrices. We will cover such transformations in Chapter 3, but for now, we will just show how we write a matrix. A matrix is a two-dimensional array. For example, the following array is a matrix:
$$\underline{\underline{M}} = \begin{pmatrix} 7 & 3 & 2 & 5 \\ 1 & -2 & -1 & 6 \\ 1 & -9 & 14 & 0 \end{pmatrix}$$

Eq. 12
We have used a double underline to denote the matrix M, which distinguishes a matrix from a vector or a scalar.
Because a matrix is a two-dimensional structure, we use two numbers to describe its size: the number of rows and the number of columns. If a matrix has R rows and C columns, we describe it as an R x C matrix. The matrix M in Eq. 12 has 3 rows and 4 columns, and so it is a 3 x 4 matrix.
We pick out individual parts of a matrix by referring to a matrix element. The symbol M_ij denotes the matrix element that sits in the i-th row and j-th column of the matrix M. For example, for the matrix in Eq. 12, the element M_23, in the second row and third column, is −1.
The matrix elements in the previous example are all integers. This need not be the case. A matrix element could be any real number. It can also be a complex number. If all the matrix elements are real, we say it is a real matrix, while if any of the matrix elements are complex, then we say the matrix is complex.
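As an illustrative sketch (assuming NumPy, which the book’s later code examples use), we can store the matrix of Eq. 12 as a two-dimensional array and read off its size and individual elements. Remember that NumPy indices start at 0, so the mathematical element M_23 corresponds to M[1, 2]:

import numpy as np

M = np.array([[7,  3,  2, 5],
              [1, -2, -1, 6],
              [1, -9, 14, 0]])  # the 3 x 4 matrix of Eq. 12
print(M.shape)                  # (3, 4): R = 3 rows, C = 4 columns
print(M[1, 2])                  # the element in row 2, column 3, which is -1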
That short recap on notation for vectors and matrices is enough for now. We will meet vectors and matrices again in Chapter 3, but for now, let’s summarize what we have learned about them.
What we learned
In this section, we have learned about the following:
How to represent a vector as a collection of multiple components (numbers)
Row vectors and column vectors and how they are related to each other via the transpose operator
How a matrix is a two-dimensional collection of components (numbers) and how the notation M_ij refers to the matrix element in the i-th row and j-th column of the matrix M
In the next section, now that we have learned about various notations for individual numbers and collections of them, we move on to notation for performing operations on them. We start with the simplest operations – adding numbers together, multiplying numbers together, and taking logarithms.
Sums, products, and logarithms
In this section, we introduce notation for doing the most basic operations we can do with numbers, namely adding them together or multiplying them together. We’ll then introduce notation for working with logarithms.
Sums and the 𝚺 notation
When we want to add several numbers together, we can use the summation, or Σ, notation. For example, suppose we have five numbers, x_1, x_2, x_3, x_4, x_5, and we want to add them all together. We can write this succinctly as

$$\sum_{i=1}^{5} x_i$$

Eq. 13
This notation is shorthand for writing
x_1 + x_2 + x_3 + x_4 + x_5. This essentially defines what the Σ notation means:

$$\sum_{i=1}^{5} x_i = x_1 + x_2 + x_3 + x_4 + x_5$$

Eq. 14
In the left-hand side (LHS) of Eq. 14, the integer indexing variable, i, starts at the lower limit of 1, written underneath the Σ symbol, and runs up to the upper limit of 5, written above the Σ symbol. For each value of i in that range, we take the corresponding number x_i and add it to the running total.
You may wonder whether the shorthand notation on the LHS of Eq. 14 is of any use. After all, the right-hand side (RHS) isn’t very long. However, when we want to represent the adding up of lots of numbers – say, a million numbers, x_1 up to x_1000000 – then the Σ notation really comes into its own. Rather than writing out a million terms, we simply write

$$\sum_{i=1}^{1000000} x_i$$

Eq. 15
Sometimes, we will use the Σ notation with a general, unspecified number of terms, N:

$$\sum_{i=1}^{i=N} x_i$$

Eq. 16
This means "add together the N numbers, http://schemas.openxmlformats.org/officeDocument/2006/math
>http://schemas.openxmlformats.org/officeDocument/2006/math
>http://schemas.openxmlformats.org/officeDocument/2006/math
>
Sometimes, you may see variants of the expression in the previous equation. Sometimes, a person may omit the upper value of the summation index, or even drop the summation limits entirely, and just write

$$\sum_{i} x_i$$

Eq. 17
This usually means "add up all values of $x_i$ over the whole range of possible values of the indexing variable $i$", with that range being understood from the context.
Note also that when writing sums using the Σ notation, the indexing variable is a dummy variable; which letter we choose for it makes no difference to the value of the sum. So, for example, we have the following:

$$\sum_{i=1}^{N} x_i = \sum_{j=1}^{N} x_j$$
Eq. 18
The LHS of Eq. 18 is the sum written using the indexing variable $i$, while the RHS uses $j$. Both are shorthand for exactly the same expression, namely the following:

$$x_1 + x_2 + \cdots + x_N$$
Eq. 19
Finally, it is worth pointing out that we can also use the Σ notation when the quantity being summed is a function of the indexing variable itself. For example, the sum of the squares of the first 100 positive integers is written as follows:

$$\sum_{i=1}^{100} i^2$$
Eq. 20
This is obviously shorthand notation for $1^2 + 2^2 + 3^2 + \cdots + 99^2 + 100^2$.
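In Python, this kind of sum over a function of the index is a one-liner. Here is a minimal sketch of Eq. 20:

# A minimal sketch of Eq. 20: the sum of the squares of 1 to 100.
total = sum(i**2 for i in range(1, 101))
print(total)  # 338350, matching the closed form n(n+1)(2n+1)/6 with n = 100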
Products and the Π notation
Having introduced the Σ notation for sums, the corresponding shorthand notation for multiplying lots of numbers together uses the capital Greek letter Π. For example, the product of the $N$ numbers $x_1, x_2, \ldots, x_N$ is written as follows:

$$\prod_{i=1}^{N} x_i$$
Eq. 21
As with the Σ notation, the indexing variable $i$ starts at the value shown beneath the Π symbol and runs up to, and including, the value shown above it. The expression in Eq. 21 is therefore shorthand for the following:

$$\prod_{i=1}^{N} x_i = x_1 \times x_2 \times \cdots \times x_N$$
Eq. 22
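The Π notation translates into code just as naturally as the Σ notation. Here is a minimal Python sketch of Eq. 22, with four hypothetical values:

# A minimal sketch of Eq. 22, with hypothetical values for x_1, ..., x_4.
import math

x = [1.5, 2.0, 0.5, 4.0]
product = math.prod(x)                # the Pi notation: multiply all the x_i
explicit = x[0] * x[1] * x[2] * x[3]  # every factor written out
assert product == explicit            # both equal 6.0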
Logarithms
Logarithms are extremely useful for describing how quickly a quantity or function grows. In particular, the logarithm tells us the exponent that describes the rate of growth of a quantity or function. Let’s make that more explicit. The logarithm, to base $a$, of a number $y$ is the exponent $x$ to which we must raise $a$ in order to get $y$. That is, if $y = a^x$, then we have the following:

$$\log_a(y) = x$$
Eq. 23
The symbol $\log_a$ denotes the logarithm to base $a$. Any positive number other than 1 can serve as the base, but the most common choices are $a = 10$, $a = 2$, and $a = e$, where $e \approx 2.71828$ is Euler's number. The logarithm to base $e$ is called the natural logarithm and is usually written as $\ln(y)$ rather than $\log_e(y)$.
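As a quick numerical illustration, here is a minimal Python sketch of Eq. 23, using base $a = 2$ and a hypothetical exponent:

# A minimal sketch of Eq. 23: the logarithm recovers the exponent.
import math

a, x = 2.0, 10.0
y = a ** x               # y = a raised to the power x, i.e. 1024.0
print(math.log(y, a))    # 10.0, up to floating-point rounding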
We can see from Eq. 23 that the logarithm does in fact tell us the exponent (in base $a$) to which $a$ must be raised to give the value $y$. Another important property of the logarithm is that it is a monotonic function. The word monotonic means "of one tone" or "of one direction", and so it means either only going up (monotonically increasing) or only going down (monotonically decreasing). This is shown in Figure 1.4, which shows the natural logarithm function $\ln(x)$; it is monotonically increasing for all $x > 0$.
Figure 1.4: Graph of the natural logarithm function
An important consequence of the monotonically increasing nature of the logarithm function is that if we have a function $f(x)$ that attains its maximum value at the point $x = \hat{x}$, then $\ln f(x)$ also attains its maximum at $x = \hat{x}$. Taking the logarithm does not move the location of the maximum, so we can write the following:

$$\underset{x}{\operatorname{argmax}} f(x) = \underset{x}{\operatorname{argmax}} \ln f(x)$$
Eq. 24
We will refer to this again in a moment.
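We can check this behaviour numerically. Here is a minimal sketch, assuming NumPy is available, using a positive function whose peak is at a known location:

# A minimal sketch of Eq. 24: taking the log does not move the maximum.
import numpy as np

x = np.linspace(-3.0, 3.0, 601)
f = np.exp(-(x - 1.0) ** 2)      # a positive function with its peak at x = 1

i_f = np.argmax(f)               # index of the maximum of f(x)
i_log = np.argmax(np.log(f))     # index of the maximum of ln(f(x))
assert i_f == i_log
print(x[i_f])                    # 1.0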
There are well-known rules for taking logarithms of reciprocals, products, and ratios. These are (for any base):
$$\log_a\left(\frac{1}{y}\right) = -\log_a(y)$$

Eq. 25
And the following:
$$\log_a(xy) = \log_a(x) + \log_a(y)$$

Eq. 26
Combining these two rules, we get the rule for taking the log of a ratio:
$$\log_a\left(\frac{x}{y}\right) = \log_a(x) + \log_a\left(\frac{1}{y}\right) = \log_a(x) - \log_a(y)$$

Eq. 27
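These rules are easy to verify numerically. Here is a minimal Python sketch, for one hypothetical choice of $x$ and $y$:

# A minimal sketch: checking Eqs. 25-27 numerically for one choice of x and y.
import math

x, y = 12.0, 3.0
assert math.isclose(math.log(1 / y), -math.log(y))               # Eq. 25
assert math.isclose(math.log(x * y), math.log(x) + math.log(y))  # Eq. 26
assert math.isclose(math.log(x / y), math.log(x) - math.log(y))  # Eq. 27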
The rule for taking the log of a product is particularly useful when we have a product formed from many numbers. Using the Π notation, we can write it compactly as follows:

$$\log_a\left(\prod_{i=1}^{N} x_i\right) = \sum_{i=1}^{N} \log_a(x_i)$$
Eq. 28
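Eq. 28 also matters for purely numerical reasons: a product of many small numbers can underflow to zero in floating-point arithmetic, whereas the corresponding sum of logs remains well behaved. Here is a minimal Python sketch, with hypothetical values:

# A minimal sketch of Eq. 28: sum the logs instead of multiplying the numbers.
import math

probs = [1e-5] * 100      # 100 small numbers; their true product is 1e-500
log_prod = sum(math.log(p) for p in probs)
print(log_prod)           # approximately -1151.29, which is ln(1e-500)
print(math.prod(probs))   # 0.0, because the direct product underflows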
This, in conjunction with the fact that taking the log is a monotonic transformation, will be very useful to us when we start to use the concept of maximum likelihood to build probabilistic models in Chapter 5.
We will make lots of use of sums, products, and logarithms throughout this book, but we have all the notation we need to work with them, so let’s summarize what we have learned about that notation.
What we learned
In this section, we have learned about the following:
The Σ notation for adding lots of numbers together
The Π notation for multiplying lots of numbers together
How we can also use the Σ and Π notations when the quantity being summed or multiplied is a function of the indexing variable