15 Math Concepts Every Data Scientist Should Know: Understand and learn how to apply the math behind data science algorithms
Ebook, 1,285 pages, 9 hours

Language: English
Publisher: Packt Publishing
Release date: Aug 16, 2024
ISBN: 9781837631940


    15 Math Concepts Every Data Scientist Should Know - David Hoyle

    15 Math Concepts Every Data Scientist Should Know

    Copyright © 2024 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    Group Product Manager: Niranjan Naikwadi

    Publishing Product Manager: Yasir Ali Khan

    Content Development Editor: Joseph Sunil

    Technical Editor: Seemanjay Ameriya

    Copy Editor: Safis Editing

    Project Coordinator: Urvi Sharma

    Proofreader: Safis Editing

    Indexer: Hemangini Bari

    Production Designer: Joshua Misquitta

    Marketing Coordinator: Vinishka Kalra

    First published: July 2024

    Production reference: 2221024

    Published by Packt Publishing Ltd.

    Grosvenor House

    11 St Paul’s Square

    Birmingham

    B3 1RB, UK

    ISBN 978-1-83763-418-7

    www.packtpub.com

    To my wife Clare for her unwavering love, support, and inspiration throughout our life together.

    – David Hoyle

    Contributors

    About the author

    David Hoyle has over 30 years’ experience in machine learning, statistics, and mathematical modeling. He gained a BSc degree in mathematics and physics and a PhD in theoretical physics, both from the University of Bristol, UK. He then embarked on an academic career that included research at the University of Cambridge and leading his own research groups as an Associate Professor at the University of Exeter and the University of Manchester in the UK. For the last 13 years, he has worked in the commercial sector, including for Lloyds Banking Group – one of the UK’s largest retail banks – and as joint Head of Data Science for AutoTrader UK. He now works for the global customer data science company dunnhumby, building statistical and machine learning models for the world’s largest retailers, including Tesco UK and Walmart. He lives and works in Manchester, UK.

    This has been a long endeavor. I would like to thank my wife and children for their encouragement, and the team at Packt for their patience and support throughout the process.

    About the reviewer

    Emmanuel Nyatefe is a data analyst with over 5 years of experience in data analytics, AI, and ML. He holds a Master of Science in Business Analytics from the W. P. Carey School of Business at Arizona State University and a Bachelor of Science in Business Information Technology from Kwame Nkrumah University of Science and Technology. He has led various AI and ML projects, including developing models for detecting crop diseases and applying generative AI to innovate business solutions and optimize operations. His expertise in data engineering, modeling, and visualization, alongside his proficiency in LLMs and advanced analytics, highlights his significant contributions to data science.

    Table of Contents

    Preface

    Part 1: Essential Concepts

    1

    Recap of Mathematical Notation and Terminology

    Technical requirements

    Number systems

    Notation for numbers and fields

    Complex numbers

    What we learned

    Linear algebra

    Vectors

    Matrices

    What we learned

    Sums, products, and logarithms

    Sums and the 𝚺 notation

    Products and the Π notation

    Logarithms

    What we learned

    Differential and integral calculus

    Differentiation

    Finding maxima and minima

    Integration

    What we learned

    Analysis

    Limits

    Order notation

    Taylor series expansions

    What we learned

    Combinatorics

    Binomial coefficients

    What we learned

    Summary

    Notes and further reading

    2

    Random Variables and Probability Distributions

    Technical requirements

    All data is random

    A little example

    Systematic variation can be learned – random variation can’t

    Random variation is not just measurement error

    What are the consequences of data being random?

    What we learned

    Random variables and probability distributions

    A new concept – random variables

    Summarizing probability distributions

    Continuous distributions

    Transforming and combining random variables

    Named distributions

    What we learned

    Sampling from distributions

    How datasets relate to random variables and probability distributions

    How big is the population from which a dataset is sampled?

    How to sample

    Generating your own random numbers code example

    Sampling from numpy distributions code example

    What we learned

    Understanding statistical estimators

    Consistency, bias, and efficiency

    The empirical distribution function

    What we learned

    The Central Limit Theorem

    Sums of random variables

    CLT code example

    CLT example with discrete variables

    Computational estimation of a PDF from data

    KDE code example

    What we learned

    Summary

    Exercises

    3

    Matrices and Linear Algebra

    Technical requirements

    Inner and outer products of vectors

    Inner product of two vectors

    Outer product of two vectors

    What we learned

    Matrices as transformations

    Matrix multiplication

    The identity matrix

    The inverse matrix

    More examples of matrices as transformations

    Matrix transformation code example

    What we learned

    Matrix decompositions

    Eigen-decompositions

    Eigenvectors and eigenvalues

    Eigen-decomposition of a square matrix

    Eigen-decomposition code example

    Singular value decomposition

    The SVD of a complex matrix

    What we learned

    Matrix properties

    Trace

    Determinant

    What we learned

    Matrix factorization and dimensionality reduction

    Dimensionality reduction

    Principal component analysis

    Non-negative matrix factorization

    What we learned

    Summary

    Exercises

    Notes and further reading

    4

    Loss Functions and Optimization

    Technical requirements

    Loss functions – what are they?

    Risk functions

    There are many loss functions

    Different loss functions = different end results

    Loss functions for anything

    A loss function by any other name

    What we learned

    Least Squares

    The squared-loss function

    OLS regression

    OLS, outliers, and robust regression

    What we learned

    Linear models

    Practical issues

    The model residuals

    OLS regression code example

    What we learned

    Gradient descent

    Locating the minimum of a simple risk function

    Gradient descent code example

    Gradient descent is a general technique

    Beyond simple gradient descent

    What we learned

    Summary

    Exercises

    5

    Probabilistic Modeling

    Technical requirements

    Likelihood

    A simple probabilistic model

    Log likelihood

    Maximum likelihood estimation

    What we have learned

    Bayes’ theorem

    Conditional probability and Bayes’ theorem

    Priors

    The posterior

    What we have learned

    Bayesian modeling

    Bayesian model averaging

    MAP estimation

    As N becomes large the prior becomes irrelevant

    Least squares as an approximation to Bayesian modeling

    What we have learned

    Bayesian modeling in practice

    Analytic approximation of the posterior

    Computational sampling

    MCMC code example

    Probabilistic programming languages

    What we have learned

    Summary

    Exercises

    Part 2: Intermediate Concepts

    6

    Time Series and Forecasting

    Technical requirements

    What is time series data?

    What does auto-correlation mean for modeling time series data?

    The auto-correlation function (ACF)

    The partial auto-correlation function (PACF)

    Other data science implications of time series data

    What we have learned

    ARIMA models

    Integrated

    Auto-regression

    Moving average

    Combining the AR(p), I(d), and MA(q) into an ARIMA model

    Variants of ARIMA modeling

    What we have learned

    ARIMA modeling in practice

    Unit root testing

    Interpreting ACF and PACF plots

    auto.arima

    What we have learned

    Machine learning approaches to time series analysis

    Routine application of machine learning to time series analysis

    Deep learning approaches to time series analysis

    AutoML approaches to time series analysis

    What we have learned

    Summary

    Exercises

    Notes and further reading

    7

    Hypothesis Testing

    Technical requirements

    What is a hypothesis test?

    Example

    The general form of a hypothesis test

    The p-value

    The effect of increasing sample size

    The effect of decreasing noise

    One-tailed and two-tailed tests

    Using sample variances in the test statistic – the t-test

    Computationally intensive methods for p-value estimation

    Parametric versus non-parametric hypothesis tests

    What we learned

    Confidence intervals

    What does a confidence interval really represent?

    Confidence intervals for any parameter

    A confidence interval code example

    What we learned

    Type I and Type II errors, and power

    What we learned

    Summary

    Exercises

    Notes and further reading

    8

    Model Complexity

    Technical requirements

    Generalization, overfitting, and the role of model complexity

    Overfitting

    Why overfitting is bad

    Overfitting increases the variability of predictions

    Underfitting is also a problem

    Measuring prediction error

    What we learned

    The bias-variance trade-off

    Proof of the bias-variance trade-off formula

    Double descent – a modern twist on the generalization error diagram

    What we learned

    Model complexity measures for model selection

    Selecting between classes of models

    Akaike Information Criterion

    Bayesian Information Criterion

    What we learned

    Summary

    Notes and further reading

    9

    Function Decomposition

    Technical requirements

    Why do we want to decompose a function?

    What is a decomposition of a function?

    Example 1 – decomposing a one-dimensional function into symmetric and anti-symmetric parts

    Example 2 – decomposing a time series into its seasonal and non-seasonal components

    What we’ve learned

    Expanding a function in terms of basis functions

    What we’ve learned

    Fourier series

    What we’ve learned

    Fourier transforms

    The multi-dimensional Fourier transform

    What we’ve learned

    The discrete Fourier transform

    DFT code example

    Uses of the DFT

    What is the difference between the DFT, Fourier series, and the Fourier transform?

    What we’ve learned

    Summary

    Exercises

    10

    Network Analysis

    Technical requirements

    Graphs and network data

    Network data is about relationships

    Example 1 – substituting goods in a supermarket

    Example 2 – international trade

    What is a graph?

    What we’ve learned

    Basic characteristics of graphs

    Undirected and directed edges

    The adjacency matrix

    In-degree and out-degree

    Centrality

    What we’ve learned

    Different types of graphs

    Fully connected graphs

    Disconnected graphs

    Directed acyclic graphs

    Small-world networks

    Scale-free networks

    What we’ve learned

    Community detection and decomposing graphs

    What is a community?

    How to do community detection

    Community detection algorithms

    Community detection code example

    What we’ve learned

    Summary

    Exercises

    Notes and further reading

    Part 3: Selected Advanced Concepts

    11

    Dynamical Systems

    Technical requirements

    What is a dynamical system and what is an evolution equation?

    Time can be discrete or continuous

    Time does not have to mean chronological time

    Evolution equations

    What we learned

    First-order discrete Markov processes

    Variations of first-order Markov processes

    A Markov process is a probabilistic model

    The transition probability matrix

    Properties of the transition probability matrix

    Epidemic modeling with a first-order discrete Markov process

    The transition probability matrix is a network

    Using the transition matrix to generate state trajectories

    Evolution of the state probability distribution

    Stationary distributions and limiting distributions

    First-order discrete Markov processes are memoryless

    Likelihood of the state sequence

    What we learned

    Higher-order discrete Markov processes

    Second-order discrete Markov processes

    Evolution of the state probability distribution in higher-order models

    A higher-order discrete Markov process is a first-order discrete Markov process in disguise

    Higher-order discrete Markov processes are still memoryless

    What we learned

    Hidden Markov Models

    Emission probabilities

    Making inferences with an HMM

    What we learned

    Summary

    Exercises

    Notes and further reading

    12

    Kernel Methods

    Technical requirements

    The role of inner products in common learning algorithms

    Sometimes we need new features in our inner products

    What we learned

    The kernel trick

    What is a kernel?

    Commonly used kernels

    Kernel functions for other mathematical objects

    Combining kernels

    Positive semi-definite kernels

    Mercer’s theorem and the kernel trick

    Kernelized algorithms

    What we learned

    An example of a kernelized learning algorithm

    kFDA code example

    What we learned

    Summary

    Exercises

    13

    Information Theory

    Technical requirements

    What is information and why is it useful?

    The concept of information

    The mathematical definition of information

    Information theory applies to continuous distributions as well

    Why we measure information on a logarithmic scale

    Why is quantifying information useful?

    What we’ve learned

    Entropy as expected information

    Entropy

    What we’ve learned

    Mutual information

    Conditional entropy

    Mutual information for continuous variables

    Mutual information as a measure of correlation

    Mutual information code example

    What we’ve learned

    The Kullback-Leibler divergence

    Relative entropy

    KL-divergence for continuous variables

    Using the KL-divergence for approximation

    Variational inference

    What we’ve learned

    Summary

    Exercises

    Notes and further reading

    14

    Non-Parametric Bayesian Methods

    Technical requirements

    What are non-parametric Bayesian methods?

    We still have parameters

    The different types of non-parametric Bayesian methods

    The pros and cons of non-parametric Bayesian methods

    What we learned

    Gaussian processes

    The kernel function

    Fitting GPR models

    Prediction using GPR models

    GPR code example

    What we learned

    Dirichlet processes

    How do DPs differ from GPs?

    The DP notation

    Sampling a function from a DP

    Generating a sample of data from a DP

    Bayesian non-parametric inference using a DP

    What we learned

    Summary

    Exercises

    15

    Random Matrices

    Technical requirements

    What is a random matrix?

    What we learned

    Using random matrices to represent interactions in large-scale systems

    What we learned

    Universal behavior of large random matrices

    The Wigner semicircle law

    What does RMT study?

    Universal is universal

    The classical Gaussian matrix ensembles

    What we learned

    Random matrices and high-dimensional covariance matrices

    The Marčenko-Pastur distribution is a bulk distribution

    Universality in the singular values of X

    The Marčenko-Pastur distribution and neural networks

    What we learned

    Summary

    Exercises

    Notes and further reading

    Index

    Other Books You May Enjoy

    Preface

    This is not a book about a specific technology or programming language. This is a book about mathematics. And mathematics is a language. It is the language of science, and so it is the language of data science as well. We can say beautiful things with that language. Just as a piece of great literature is more than a large collection of individual letters, a mathematical equation is more than just a collection of symbols. An equation conveys a way of thinking about a data science problem. It conveys a concept or an idea. If you want to fully exploit the power of those ideas and adapt them to your own data science work, you need to move beyond just recognizing the symbols in an equation and move towards understanding what that equation is really telling you.

    Many people are not confident in reading and interpreting mathematical equations and mathematical ideas. And yet, as with great literature, once someone guides us through the nuances and subtexts, their beauty is revealed and becomes obvious. That is what this book aims to do.

    This book will not make you an expert in every area of mathematics. Instead, it will give you enough skills and confidence to read and navigate mathematical equations and ideas on your own. We do that by walking you through the core concepts that underpin many data science algorithms – the 15 math concepts of the book’s title. We also do that by walking through those concepts slowly and in detail. I am not a fan of mathematics books that consist solely of theorems, lemmas, and proofs. Instead, this book is unapologetically long-form math. When we introduce an equation, we will explain what the equation tells us, what its implications and ramifications are, and how it connects to other parts of math. We also illustrate those concepts with code examples in Python.

    At the end of the book, you will be equipped to look at the math equations of any data science algorithm and confidently unpack what that algorithm is trying to do.

    Who this book is for

    This book is for data scientists and machine learning engineers who have been using data science and machine learning techniques, software, and Python packages such as scikit-learn, but without necessarily fully understanding the mathematics behind the algorithms. This could include the following types of people:

    Data scientists who have a college/undergraduate degree in a numerate subject, and so have a basic understanding of mathematics, but want to learn more – particularly those bits of mathematics that will be helpful in their roles as data scientists.

    Data scientists who have a good understanding of some of the mathematics behind bits of data science but want to discover some new math concepts that will be useful to them in their data science work.

    Data scientists who have business or data science problems they need to solve, but existing software does not provide appropriate algorithms. They want to construct their own algorithms but lack the mathematical guidance on how to apply mathematics to the new data science problems.

    What this book covers

    Chapter 1, Recap of Mathematical Notation and Terminology, provides a summary of the main mathematical notation you will encounter in this book and that we expect you to already be familiar with.

    Chapter 2, Random Variables and Probability Distributions, introduces the idea that all data contains some degree of randomness, and that random variables and their associated probability distributions are the natural way to describe that randomness. The chapter teaches you how to sample from a probability distribution, introduces statistical estimators, and explains the Central Limit Theorem.
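
    As a taste of what the chapter covers, here is a minimal NumPy sketch of the Central Limit Theorem; the distribution and the sample sizes are illustrative choices, not examples taken from the book:

```python
import numpy as np

rng = np.random.default_rng(42)

# 10,000 repetitions, each taking the mean of 50 uniform(0, 1) draws.
sample_means = rng.uniform(0.0, 1.0, size=(10_000, 50)).mean(axis=1)

# The CLT says these means cluster around the population mean 0.5,
# with standard deviation close to sqrt(1/12) / sqrt(50) ≈ 0.0408.
print(sample_means.mean(), sample_means.std())
```

    Plotting a histogram of sample_means would show the familiar bell curve, even though the underlying draws are uniform.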

    Chapter 3, Matrices and Linear Algebra, introduces vectors and matrices as the basic mathematical structures we use to represent and transform data. It then shows how matrices can be broken down into simple-to-understand parts using techniques such as eigen-decomposition and singular value decomposition. The chapter finishes with explanations of how these decomposition methods are applied to principal component analysis (PCA) and non-negative matrix factorization (NMF).
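
    As an illustrative sketch of the eigen-decomposition idea (the matrix below is an arbitrary example, not one from the book), a symmetric matrix can be rebuilt exactly from its eigenvalues and eigenvectors:

```python
import numpy as np

# A small symmetric matrix and its eigen-decomposition A = V diag(w) V^T.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
w, V = np.linalg.eigh(A)

# Reconstruct A from its eigenvalues (w) and eigenvectors (columns of V).
A_reconstructed = V @ np.diag(w) @ V.T
print(np.allclose(A, A_reconstructed))  # True
```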

    Chapter 4, Loss Functions and Optimization, starts by introducing loss functions, risk functions, and empirical risk functions. The concept of minimizing an empirical risk function to estimate the parameters of a model is explained, before introducing Ordinary Least Squares estimation of linear models. Finally, gradient descent is illustrated as a general technique for minimizing risk functions.
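
    To give a flavor of gradient descent on a squared-loss risk function, here is a minimal sketch; the data, learning rate, and iteration count are illustrative assumptions, not values from the book:

```python
import numpy as np

# Noise-free data from y = 2x + 1; gradient descent on the mean squared error.
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0

a, b = 0.0, 0.0          # initial slope and intercept
lr = 0.5                 # learning rate
for _ in range(2000):
    resid = (a * x + b) - y
    a -= lr * 2.0 * np.mean(resid * x)   # d(MSE)/da
    b -= lr * 2.0 * np.mean(resid)       # d(MSE)/db

print(a, b)  # converges towards the true values 2.0 and 1.0
```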

    Chapter 5, Probabilistic Modeling, introduces the concept of building predictive models that explicitly account for the random component within data. The chapter starts by introducing likelihood and maximum likelihood estimation, before introducing Bayes’ theorem and Bayesian inference. The chapter finishes with an illustration of Markov Chain Monte Carlo and importance sampling from the posterior distribution of a model’s parameters.
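
    The Bayesian machinery the chapter describes can be previewed with the classic coin-flip example, where the Beta prior is conjugate to the binomial likelihood; the counts below are made up for illustration:

```python
# Coin-flip model: 7 heads in 10 flips, with a uniform Beta(1, 1) prior on
# the success probability p. Bayes' theorem then gives a closed-form
# Beta(1 + heads, 1 + tails) posterior.
heads, tails = 7, 3
alpha_post, beta_post = 1 + heads, 1 + tails

posterior_mean = alpha_post / (alpha_post + beta_post)  # 8 / 12 ≈ 0.667
mle = heads / (heads + tails)                           # maximum likelihood: 0.7

print(posterior_mean, mle)
```

    Note how the prior pulls the posterior mean slightly towards 0.5 relative to the maximum likelihood estimate.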

    Chapter 6, Time Series and Forecasting, introduces time series data and the concept of auto-correlation as the main characteristic that distinguishes time series data from other types of data. It then describes the classical ARIMA approach to modeling time series data. Finally, it ends with a summary of concepts behind modern machine learning approaches to time series analysis.
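
    As a hint of the auto-regressive idea, the sketch below simulates an AR(1) process and recovers its coefficient from the lag-1 auto-correlation; the coefficient value and series length are illustrative assumptions:

```python
import numpy as np

# Simulate an AR(1) process x_t = phi * x_{t-1} + eps_t, then estimate phi
# from the lag-1 auto-correlation of the simulated series.
rng = np.random.default_rng(0)
phi = 0.7
x = np.zeros(5000)
for t in range(1, len(x)):
    x[t] = phi * x[t - 1] + rng.normal()

phi_hat = np.corrcoef(x[:-1], x[1:])[0, 1]
print(phi_hat)  # close to 0.7
```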

    Chapter 7, Hypothesis Testing, introduces what a hypothesis test is and why hypothesis tests are important in data science. The general form of a hypothesis test is outlined before the concepts of statistical significance and p-values are explained in depth. Next, confidence intervals and their interpretation are introduced. The chapter ends with an explanation of Type I and Type II errors, and power calculations.
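
    As a small illustration of the test-statistic machinery, a one-sample z-test can be computed directly; the sample and the assumed known population standard deviation below are invented for illustration, not taken from the book:

```python
import math

# One-sample z-test sketch: is the sample mean consistent with mu0 = 0,
# given an assumed known population standard deviation sigma = 1?
sample = [0.5, 1.2, -0.3, 0.8, 1.0, 0.2, 0.9, 0.4]
n = len(sample)
mean = sum(sample) / n

z = (mean - 0.0) / (1.0 / math.sqrt(n))     # test statistic
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-tailed p-value

print(z, p_value)
```

    With a p-value close to 0.1, this illustrative sample would not reject the null hypothesis at the conventional 5% significance level.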

    Chapter 8, Model Complexity, introduces the concept of how we describe and quantify model complexity and discusses its impact on the predictive accuracy of a model. The classical bias-variance trade-off view of model complexity is introduced, along with the phenomenon of double descent. The chapter finishes with an explanation of model complexity measures for model selection.

    Chapter 9, Function Decomposition, introduces the idea of decomposing or building up a function from a set of simpler basis functions. A general approach is explained first before the chapter moves on to introducing Fourier series, Fourier transforms, and the discrete Fourier transform.
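
    The discrete Fourier transform idea can be previewed in a couple of lines of NumPy; the signal below is an illustrative pure cosine, not an example from the book:

```python
import numpy as np

# DFT of a pure cosine: the spectral energy concentrates at the
# signal's frequency.
n = 64
t = np.arange(n)
signal = np.cos(2 * np.pi * 5 * t / n)   # exactly 5 cycles over the window

spectrum = np.abs(np.fft.fft(signal))
peak = int(np.argmax(spectrum[: n // 2]))  # search the positive frequencies

print(peak)  # 5
```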

    Chapter 10, Network Analysis, introduces networks, network data, and the concept that a network is a graph. The node-edge description of a graph, along with its adjacency matrix representation, is explained. Next, the chapter describes different types of common graphs and their properties. Finally, the decomposition of a graph into sub-graphs or communities is explained, and various community detection algorithms are illustrated.
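
    The adjacency-matrix representation discussed in the chapter can be sketched as follows; the graph itself is a made-up four-node example:

```python
import numpy as np

# A small undirected graph on 4 nodes, encoded as an adjacency matrix:
# edges (0,1), (0,2), (1,2), (2,3).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])

degrees = A.sum(axis=1)   # degree of each node
n_edges = A.sum() // 2    # each undirected edge is counted twice

print(degrees.tolist(), n_edges)  # [2, 2, 3, 1] and 4
```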

    Chapter 11, Dynamical Systems, introduces what a dynamical system is and explains how its dynamics are controlled by an evolution equation. The chapter then focuses on discrete Markov processes as these are the most common dynamical systems used by data scientists. First-order discrete Markov processes are explained in depth, before higher-order Markov processes are introduced. The chapter finishes with an explanation of Hidden Markov Models and a discussion of how they can be used in commercial data science applications.
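
    To preview the first-order Markov process machinery, here is a minimal two-state sketch; the transition probabilities are arbitrary illustrative values:

```python
import numpy as np

# Two-state first-order Markov chain with transition matrix P, where
# P[i, j] = probability of moving from state i to state j.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# Evolve an initial state distribution forward in time: pi_{t+1} = pi_t P.
pi = np.array([1.0, 0.0])
for _ in range(100):
    pi = pi @ P

print(pi)  # converges to the stationary distribution [5/6, 1/6]
```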

    Chapter 12, Kernel Methods, starts by introducing inner-product-based learning algorithms, then moves on to explaining kernels and the kernel trick. The chapter ends with an illustration of a kernelized learning algorithm. Throughout the chapter, we emphasize how the kernel trick allows us to implicitly and efficiently construct new features and thereby uncover any non-linear structure present in a dataset.
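
    The kernel trick can be previewed with the degree-2 polynomial kernel in two dimensions, where the implicit feature map is small enough to write out explicitly; the vectors below are arbitrary:

```python
import numpy as np

# For the polynomial kernel k(x, z) = (x . z)^2 in 2D, the kernel value
# equals an ordinary inner product in the explicit feature space
# phi(x) = (x1^2, sqrt(2) x1 x2, x2^2) - without ever computing phi.
def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

kernel_value = (x @ z) ** 2
explicit_value = phi(x) @ phi(z)

print(kernel_value, explicit_value)  # both 16.0
```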

    Chapter 13, Information Theory, introduces the concept of information and how it is measured mathematically. The main information theory concepts of entropy, conditional entropy, mutual information, and relative entropy are then explained, before practical uses of the Kullback-Leibler divergence are illustrated.
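
    Entropy and the Kullback-Leibler divergence for discrete distributions can be sketched directly from their definitions; the two distributions below are illustrative examples:

```python
import math

def entropy(p):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q), in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

fair = [0.25, 0.25, 0.25, 0.25]
biased = [0.5, 0.25, 0.125, 0.125]

print(entropy(fair))                 # 2.0 bits
print(kl_divergence(biased, fair))   # 0.25 bits
```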

    Chapter 14, Non-Parametric Bayesian Methods, introduces the idea of using a Bayesian prior over functions when building probabilistic models. The idea is illustrated through Gaussian processes and Gaussian process regression. The chapter then introduces Dirichlet processes and how they can be used as priors for probability distributions.

    Chapter 15, Random Matrices, introduces what a random matrix is and why they are ubiquitous in science and data science. The universal properties of large random matrices are illustrated along with the classical Gaussian random matrix ensembles. The chapter finishes with a discussion of where large random matrices occur in statistical and machine learning models.

    To get the most out of this book

    To get the most out of this book, we assume you have at least some familiarity with high-school mathematics, such as complex numbers, basic calculus, and elementary uses of vectors and matrices. To get the most out of the code examples in the book, you should have some experience of coding in Python. You will also need access to a computer or server with a full Python installation and/or where you have privileges to run and install Python and any additional packages required.

    The code examples given in each chapter, and the answers to the exercises at the end of each chapter, are available in the book’s GitHub repository as Jupyter notebooks. To run the notebooks, you will need a Jupyter installation.

    If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

    Download the example code files

    You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/15-Math-Concepts-Every-Data-Scientist-Should-Know. If there’s an update to the code, it will be updated in the GitHub repository.

    We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

    Conventions used

    There are a number of text conventions used throughout this book.

    Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: The following code example can be found in the Code_Examples_Chap5.ipynb notebook in the GitHub repository.

    A block of code is set as follows:

    map_estimate = minimize(neg_log_posterior,
                            x0,
                            method='BFGS',
                            options={'disp': True})
    # Convert from logit(p) to p
    p_optimal = np.exp(map_estimate['x'][0]) / (
        1.0 + np.exp(map_estimate['x'][0]))
    print('MAP estimate of success probability = ', p_optimal)

    Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: The name ARIMA stands for Auto-Regressive Integrated Moving Average models.

    Tips or important notes

    Appear like this.

    Get in touch

    Feedback from our readers is always welcome.

    General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.

    Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

    Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

    If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

    Share your thoughts

    Once you’ve read 15 Math Concepts Every Data Scientist Should Know, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

    Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

    Download a free PDF copy of this book

    Thanks for purchasing this book!

    Do you like to read on the go but are unable to carry your print books everywhere?

    Is your eBook purchase not compatible with the device of your choice?

    Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.

    Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.

    The perks don’t stop there; you can get exclusive access to discounts, newsletters, and great free content in your inbox daily.

    Follow these simple steps to get the benefits:

    1. Scan the QR code or visit the link below

    https://packt.link/free-ebook/9781837634187

    2. Submit your proof of purchase

    3. That’s it! We’ll send your free PDF and other benefits to your email directly

    Part 1: Essential Concepts

    In this part, we will introduce the math concepts that you will encounter again and again as a data scientist. Gaining a good understanding of these concepts is vital. After a recap of basic math notation, we look at the concepts related to how data is produced, then move on to concepts related to how to transform data, finally building up to our end goal of how to model data. These concepts are essential because you will use and combine them constantly in your work. By the end of Part 1, you will be comfortable with the math concepts that underpin almost all data science models and algorithms.

    This section contains the following chapters:

    Chapter 1, Recap of Mathematical Notation and Terminology

    Chapter 2, Random Variables and Probability Distributions

    Chapter 3, Matrices and Linear Algebra

    Chapter 4, Loss Functions and Optimization

    Chapter 5, Probabilistic Modeling

    1

    Recap of Mathematical Notation and Terminology

    Our tour of math concepts will start properly in Chapter 2. Before we begin that tour, we’ll start by recapping some mathematical notation and terminology. Mathematics is a language, and mathematical symbols and notation are its alphabet. Therefore, we must be comfortable with and understand the basics of this alphabet.

    In this chapter, we will recap the most common core notation and terminology that we are likely to use repeatedly throughout the book. We have grouped the recap into six main math areas or topics. Those topics are as follows:

    Number systems: In this section, we introduce notation for real and complex numbers

    Linear algebra: In this section, we introduce notation for describing vectors and matrices

    Sums, products, and logarithms: In this section, we introduce notation for succinctly representing sums and products, and we introduce rules for logarithms

    Differential and integral calculus: In this section, we introduce basic notation for differentiation and integration

    Analysis: In this section, we introduce notation for describing limits, and order notation

    Combinatorics: In this section, we introduce notation for binomial coefficients

    Some of this notation you may already be familiar with. For example, you will have seen complex numbers, matrices, logarithms, and basic differential calculus either in high school or in the first year of an undergraduate degree in a numerate subject. Other topics, such as order notation, you may have encountered as part of a university degree course on mathematical analysis or algorithm complexity, or they may be new to you. For the most part, you will have seen the notation we recap in this chapter before. If you are already familiar and comfortable with the symbols and notation recapped here, you can skip this chapter and come back later to read those sections that contain notation that is new to you.

    We should emphasize that this chapter is a recap. It is brief. It is not meant to be an exhaustive and comprehensive review. We focus on presenting a few main facts, but also on trying to give a feel for why the notation may be useful and how it is likely to be used.

    Finally, we will encounter new notation, terminology, and symbols as we progress through the book when we are discussing specific topics. We will introduce this new notation and terminology as and when we need it.

    Technical requirements

    As this chapter solely recaps some of the mathematical notation we will use in later chapters, there are no code examples given and hence no technical requirements for this particular chapter.

    For later chapters, you will be able to find code examples at the GitHub repository: https://github.com/PacktPublishing/15-Math-Concepts-Every-Data-Scientist-Should-Know

    Number systems

    In this section, we introduce notation for describing sets of numbers. We will focus on the real numbers and the complex numbers.

    Notation for numbers and fields

    As this is a book about data science, we will be dealing with numbers. So, it will be worthwhile recapping the notation we use to refer to the most common sets of numbers.

    Most of the numbers we will deal with in this book will be real numbers, such as 4.6, 1, or -2.3. We can think of them as living on the real number line shown in Figure 1.1. The real number line is a one-dimensional continuous structure. There are an infinite number of real numbers. We denote the set of all real numbers by the symbol ℝ.

    Figure 1.1: The real number line

    Obviously, there will be situations where we want to restrict our datasets to, say, just integer-valued numbers. This would be the case if we were analyzing count data, such as the number of items of a particular product on an e-commerce site sold on a particular day. The integer numbers, …, -2, -1, 0, 1, 2, …, are a subset of the real numbers, and we denote them by the symbol ℤ. Despite them being a subset of the real numbers, there are still an infinite number of integers.

    For the e-commerce count data that we mentioned earlier, the integer value would always be positive. If we restrict ourselves to strictly positive integers, 1, 2, 3, …, and so on, then we have the natural or counting numbers. These we denote by the symbol ℤ⁺, clearly meaning positive integers. The fact that these strictly positive integers are the natural numbers means we also denote them using the symbol ℕ.

    As well as real numbers, we will occasionally deal with complex numbers. As the name suggests, complex numbers have more structure to them than real numbers. The complex numbers don’t live on the real number line and so are not a subset of the real numbers; instead, they have a two-dimensional structure, which we’ll explain in a moment. We denote the set of complex numbers by the symbol ℂ.

    Sometimes, there are very specific occasions when we may want to refer to other subsets of the real numbers. Other common symbols you may encounter are ℚ, for the set of rational numbers, and ℤ₂, for the two-element set {0, 1}. The latter you may encounter when we talk about modeling binary discrete target variables or working with binary features.

    Numbers such as 4.6 are specific instances of a real number. When we are talking about algorithms or code, we will want to talk about variables, in which case we use a symbol such as x to represent a number, which could take on a range of different values depending on what we do with it. But what could that range be? When we are documenting an algorithm, we may want to tell the reader that x will always be a real number. We do that by writing x ∈ ℝ, which is mathematical language for “x is in the set of real numbers”, or more succinctly, “x is real.”

    Likewise, if we wanted to say x was always a positive integer, then we would write x ∈ ℤ⁺. Or, if we wanted to say x was a complex number, we would write x ∈ ℂ.

    When we have several variables that all have similar properties or that may be related in some way – for example, they represent different features of a data point in a training set – then we use subscripts to denote the different variables. For example, we would use x_1, x_2, x_3 to represent three features of a dataset. Just as with the single variable x, if we want to say that those three features will always contain real numbers, then we would write x_1, x_2, x_3 ∈ ℝ.

    Complex numbers

    If the real numbers live on the one-dimensional structure that is the real number line, this raises the question of whether we can have numbers that live in a two-dimensional space. Complex numbers are such numbers. A complex number, z, has two components or parts. These are a real part, x, and an imaginary part, y, with both x and y being real numbers. The real and imaginary parts are combined, and we write the complex number z as follows:

    z = x + iy

    Eq. 1

    The symbol i has a special meaning. It is in fact the square root of −1, so that i² = −1. We can think of the pair of numbers, (x, y), as picking out a point in a 2D plane. That plane is the complex plane, sometimes also called the Argand plane. Figure 1.2 shows the point z in the complex plane:

    Figure 1.2: The complex number plane

    The position of z along the x-axis is given by the real part of z, while the position of z along the y-axis is given by the imaginary part of z. We use Re z to denote the real part of z, and Im z to denote the imaginary part of z, so that we have the following:

    Re z = x ,   Im z = y

    Eq. 2

    Consequently, we have used Re z and Im z to label the axes of the complex plane in Figure 1.2.

    A complex number that has y = 0 sits entirely on the x-axis and is a purely real number. Likewise, a complex number that has x = 0 sits entirely on the y-axis and is a purely imaginary number.

    Just as with other 2D planes, we can represent a point in the complex plane not just with Cartesian coordinates (x, y) but with polar coordinates as well. This is also illustrated in Figure 1.2. A quick bit of high-school trigonometry gives us the following:

    z = |z| × (cos θ + i sin θ)

    Eq. 3

    The symbol |z| denotes the modulus of z and is the same as the distance of the point z from the origin in Figure 1.2. Looking at Figure 1.2 and using Pythagoras’ theorem, we can calculate |z| using the following:

    |z|² = x² + y² = (Re z)² + (Im z)²

    Eq. 4

    The angle θ is conventionally measured in a counterclockwise direction and in radians, so that a point on the positive y-axis would have θ = π/2 (remember that 2π radians = 360°). Euler’s formula is as follows:

    e^(iθ) = cos θ + i sin θ

    Eq. 5

    This means we can also write z in the following form:

    z = |z| e^(iθ)

    Eq. 6

    This last form for writing a complex number will be useful when we introduce Fourier transforms, which are used to represent functions as a sum of sine and cosine waves. In fact, this is our main reason for introducing complex numbers.

    One important concept relating to the complex number z is that of its complex conjugate. The complex conjugate of z we will denote by z̄. Sometimes, the symbol z* is used instead. The complex conjugate z̄ is related to z by flipping the sign of the imaginary part of z. So, if z = x + iy, then z̄ = x − iy. In Figure 1.3, this is shown by simply reflecting z in the x-axis. A useful relation that follows is the following:

    z z̄ = x² + y² = |z|²

    Eq. 7

    Figure 1.3: The complex conjugate

    The integers, real numbers, and complex numbers represent the overwhelming majority of the numbers we will meet throughout this book, so this is a good place to end our recap of number systems.
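    Before we summarize, a quick, purely illustrative sketch (our own, not an example from the book) shows how these relations can be checked numerically. Python’s built-in complex type and the standard cmath module support exactly the operations above; the particular value of z chosen here is arbitrary:

    ```python
    # Numerical check of the complex-number relations above, using
    # Python's built-in complex type and the standard cmath module.
    import cmath

    z = 3.0 + 4.0j              # z = x + iy, with Re z = 3 and Im z = 4
    modulus = abs(z)            # |z| = sqrt(x**2 + y**2) = 5.0 (Eq. 4)
    theta = cmath.phase(z)      # the angle (phase) theta, in radians
    z_bar = z.conjugate()       # the complex conjugate: 3 - 4i

    # Eq. 6: z = |z| e^(i*theta)
    assert abs(z - modulus * cmath.exp(1j * theta)) < 1e-12

    # Eq. 7: z * conjugate(z) = |z|**2
    assert abs(z * z_bar - modulus ** 2) < 1e-12
    ```

    Both assertions pass silently, confirming that the polar form and the conjugate relation agree with the Cartesian components.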

    Let’s summarize what we learned.

    What we learned

    In this section, we have learned the following:

    The notation ℝ, for describing the real numbers

    The notation ℤ, for describing the integer numbers

    The notations ℤ⁺ and ℕ, for describing the strictly positive integers, also known as the natural numbers

    The notation ℤ₂, for describing the binary set {0, 1}

    The notation ℂ, for describing the complex numbers

    How complex numbers have a real and an imaginary part

    How a complex number z can also be described in terms of a modulus, |z|, and a phase, θ

    How to calculate the complex conjugate z̄ of a complex number z

    In the next section, having learned how to describe both real and complex numbers, we move on to how to describe collections of numbers (vectors) and how to describe mathematical objects (matrices) that transform those vectors.

    Linear algebra

    In this section, we introduce notation to describe vectors and matrices, which are key mathematical objects that we will encounter again and again throughout this book.

    Vectors

    In many circumstances, we will want to represent a set of numbers together. For example, the numbers 7.3 and 1.2 might represent the values of two features that correspond to a data point in a training set. We often group these numbers together in brackets and write them as (7.3, 1.2) or [7.3, 1.2]. Because of the similarity to the way we write spatial coordinates, we tend to call a collection of numbers that are held together a vector. A vector can be two-dimensional, as in the example just given, or d-dimensional, meaning it contains d components, and so might look like (x_1, x_2, …, x_d).

    We can write a vector in two ways. We can write it as a row vector, going across the page, such as the following vector:

    (x_1, x_2, …, x_d) = a d-dimensional row vector

    Eq. 8

    Alternatively, we can write it as a column vector going down the page, such as the following vector:

    ⎛ x_1 ⎞
    ⎜ x_2 ⎟
    ⎜  ⋮  ⎟  = a d-dimensional column vector
    ⎝ x_d ⎠

    Eq. 9

    We can convert between a row vector and a column vector (and vice versa) using the transpose operator, denoted by a ⊤ superscript. So, the transpose of a row vector is a column vector. See the following example:

    (x_1, x_2, …, x_d)^⊤ =
    ⎛ x_1 ⎞
    ⎜ x_2 ⎟
    ⎜  ⋮  ⎟
    ⎝ x_d ⎠

    Eq. 10

    And vice versa in the following example:

    ⎛ x_1 ⎞^⊤
    ⎜ x_2 ⎟
    ⎜  ⋮  ⎟    = (x_1, x_2, …, x_d)
    ⎝ x_d ⎠

    Eq. 11
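    As an illustrative aside (our own sketch, not an example from the book), NumPy mirrors this distinction by representing a row vector as a 2D array of shape (1, d) and a column vector as shape (d, 1), with the .T attribute playing the role of the ⊤ operator:

    ```python
    # Row and column vectors in NumPy; .T plays the role of the transpose operator.
    import numpy as np

    row = np.array([[7.3, 1.2, 0.5]])   # shape (1, 3): a 3-dimensional row vector
    col = row.T                          # shape (3, 1): the corresponding column vector

    assert row.shape == (1, 3)
    assert col.shape == (3, 1)
    # Transposing twice recovers the original row vector (Eq. 10 followed by Eq. 11)
    assert np.array_equal(col.T, row)
    ```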

    Symbolically, we often write a vector using a boldface font – for example, a bold y would mean a vector. Sometimes, we use an underline to denote a vector, so you may also see y̲. Throughout this book, I will use an underline to denote a vector. This will make it clear when I am talking about a vector.

    Matrices

Usually, we will want to transform a vector by more than just transposing it. Linear transformations of vectors can be done with matrices. We will cover such transformations in Chapter 3, but for now, we will just show how we write a matrix. A matrix is a two-dimensional array. For example, the following array is a matrix:

    <math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><mrow><mrow><munder><munder><mi>M</mi><mo stretchy="true">_</mo></munder><mo stretchy="true">_</mo></munder><mspace width="0.25em" /><mo>=</mo><mspace width="0.25em" /><mfenced open="(" close=")"><mtable columnspacing="0.8000em 0.8000em 0.8000em" columnwidth="auto auto auto auto" columnalign="center center center center" rowspacing="1.0000ex 1.0000ex" rowalign="baseline baseline baseline"><mtr><mtd><mn>7</mn></mtd><mtd><mn>3</mn></mtd><mtd><mn>2</mn></mtd><mtd><mn>5</mn></mtd></mtr><mtr><mtd><mn>1</mn></mtd><mtd><mrow><mo>−</mo><mn>2</mn></mrow></mtd><mtd><mrow><mo>−</mo><mn>1</mn></mrow></mtd><mtd><mn>6</mn></mtd></mtr><mtr><mtd><mn>1</mn></mtd><mtd><mrow><mo>−</mo><mn>9</mn></mrow></mtd><mtd><mn>14</mn></mtd><mtd><mn>0</mn></mtd></mtr></mtable></mfenced></mrow></mrow></math>

    Eq. 12

We have used a double underline to denote the matrix M̲̲. Note that a matrix has a double underline because it is a two-dimensional structure, while we use a single underline for a vector, which is a one-dimensional structure.

Because a matrix is a two-dimensional structure, we use two numbers to describe its size: the number of rows and the number of columns. If a matrix has R rows and C columns, we describe it as an R × C matrix. The matrix M̲̲ in Eq. 12 is a 3 × 4 matrix.

We pick out individual parts of a matrix by referring to a matrix element. The symbol Mᵢⱼ (or M̲̲ᵢⱼ) refers to the number in the position of the ith row and jth column. So, for the matrix in Eq. 12, M₂₄ = 6.

    The matrix elements in the previous example are all integers. This need not be the case. A matrix element could be any real number. It can also be a complex number. If all the matrix elements are real, we say it is a real matrix, while if any of the matrix elements are complex, then we say the matrix is complex.
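As a sketch of this indexing, here is the matrix from Eq. 12 as a NumPy array. One caveat worth a comment: the mathematical notation indexes from 1, while NumPy indexes from 0:

```python
import numpy as np

# The 3 x 4 matrix from Eq. 12
M = np.array([[7,  3,  2, 5],
              [1, -2, -1, 6],
              [1, -9, 14, 0]])

print(M.shape)  # (3, 4)

# Mathematical indexing is 1-based while NumPy's is 0-based,
# so the element M_24 = 6 is M[1, 3] in code
print(M[1, 3])  # 6
```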

That short recap on notation for vectors and matrices is enough for now. We will meet vectors and matrices again in Chapter 3, but for now, let's summarize what we have learned about them.

    What we learned

    In this section, we have learned about the following:

    How to represent a vector as a collection of multiple components (numbers)

    Row vectors and column vectors and how they are related to each other via the transpose operator

How a matrix is a two-dimensional collection of components (numbers) and how the notation Mᵢⱼ is used to pick out individual components or matrix elements

In the next section, now that we have learned notation for individual numbers and collections of them, we move on to notation for performing operations on them. We start with the simplest operations – adding numbers together, multiplying numbers together, and taking logarithms.

    Sums, products, and logarithms

    In this section, we introduce notation for doing the most basic operations we can do with numbers, namely adding them together or multiplying them together. We’ll then introduce notation for working with logarithms.

    Sums and the 𝚺 notation

When we want to add several numbers together, we can use the summation, or Σ, notation. For example, if we want to represent the addition of the numbers x₁, x₂, x₃, x₄, x₅, we use the Σ notation to write this as follows:

    <math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><mrow><mrow><munderover><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mrow><mi>i</mi><mo>=</mo><mn>5</mn></mrow></munderover><msub><mi>x</mi><mi>i</mi></msub></mrow></mrow></math>

    Eq. 13

This notation is shorthand for writing x₁ + x₂ + x₃ + x₄ + x₅. This essentially defines what the Σ notation represents – that is, the following:

    <math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><mrow><mrow><mrow><munderover><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mrow><mi>i</mi><mo>=</mo><mn>5</mn></mrow></munderover><msub><mi>x</mi><mi>i</mi></msub></mrow><mo>=</mo><msub><mi>x</mi><mn>1</mn></msub><mo>+</mo><msub><mi>x</mi><mn>2</mn></msub><mo>+</mo><msub><mi>x</mi><mn>3</mn></msub><mo>+</mo><msub><mi>x</mi><mn>4</mn></msub><mo>+</mo><msub><mi>x</mi><mn>5</mn></msub></mrow></mrow></math>

    Eq. 14

In the left-hand side (LHS) of Eq. 14, the integer indexing variable, i, takes the values between 1 (indicated beneath the Σ symbol) and 5 (indicated above the Σ symbol), and we interpret the LHS as "take all the numbers xᵢ for the values of i indicated by the Σ symbol and add them together."
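In code, the Σ notation maps directly onto a sum over an indexed collection. A minimal sketch, with illustrative values for x₁ through x₅:

```python
# Illustrative values for x_1, ..., x_5
x = [2.0, 4.0, 6.0, 8.0, 10.0]

# sum_{i=1}^{i=5} x_i
total = sum(x)
print(total)  # 30.0
```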

You may wonder whether the shorthand notation on the LHS of Eq. 14 is of any use. After all, the right-hand side (RHS) isn't very long. However, when we want to represent the adding up of lots of numbers, the Σ notation really comes into its own. For example, if we want to add up the numbers x₁, x₂, up to x₁₀₀, then we use the Σ notation to write this compactly, as follows:

<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" display="block"><mml:mrow><mml:munderover><mml:mo stretchy="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>100</mml:mn></mml:mrow></mml:munderover><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:math>

    Eq. 15

Sometimes, we will use the Σ notation to add together a set of numbers where the size of the set (the number of numbers being added together) is variable. For example, see the following notation:

<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" display="block"><mml:mrow><mml:munderover><mml:mo stretchy="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mi>N</mml:mi></mml:mrow></mml:munderover><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:math>

    Eq. 16

This means "add together the N numbers, x₁, x₂, …, x_N." Clearly, we would get a different result for different choices of N. This means the expression given in Eq. 16 is a function of N.

Sometimes, you may see variants of the expression in the previous equation. A writer may omit the upper value of i, or both the lower and upper values, in the Σ notation because it is taken as understood what the values should naturally be. For example, you may see the following:

    <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" display="block"><mml:mrow><mml:munder><mml:mo stretchy="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:munder><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:math>

    Eq. 17

This usually means "add up all values of xᵢ in the problem we are analyzing." Similarly, the expressions ∑_{i=1}^{i=N} xᵢ and ∑_{i=1}^{N} xᵢ mean the same thing.

Finally, note that when writing sums using the Σ notation, we haven't said where the values of xᵢ come from. We could in fact use the Σ notation to add up the values we get after we have applied a function f to the values xᵢ. In this case, we would write the following:

    <math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><mrow><mrow><mrow><munderover><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mrow><mi>i</mi><mo>=</mo><mi>N</mi></mrow></munderover><mrow><mi>f</mi><mfenced open="(" close=")"><msub><mi>x</mi><mi>i</mi></msub></mfenced></mrow></mrow><mo>=</mo><mi>f</mi><mfenced open="(" close=")"><msub><mi>x</mi><mn>1</mn></msub></mfenced><mo>+</mo><mi>f</mi><mfenced open="(" close=")"><msub><mi>x</mi><mn>2</mn></msub></mfenced><mo>+</mo><mo>⋯</mo><mo>+</mo><mi>f</mi><mfenced open="(" close=")"><msub><mi>x</mi><mi>N</mi></msub></mfenced></mrow></mrow></math>

    Eq. 18

The LHS of Eq. 18 is the Σ notation way of writing the RHS. The example in Eq. 19 makes this clearer. If we set N = 5 so we had five numbers, x₁, x₂, x₃, x₄, x₅, and we want to apply the sine function to these five numbers and add them up, then we would write the following:

    <math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><mrow><mrow><munderover><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mrow><mi>i</mi><mo>=</mo><mn>5</mn></mrow></munderover><mrow><mi>sin</mi><mfenced open="(" close=")"><msub><mi>x</mi><mi>i</mi></msub></mfenced></mrow></mrow></mrow></math>

    Eq. 19

Finally, it is worth pointing out that we can also use the Σ notation to add numbers that are simple functions of the index variable i. For example, using the Σ notation, we can write the sum of the first 100 squares as follows:

    <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" display="block"><mml:mrow><mml:munderover><mml:mo stretchy="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>100</mml:mn></mml:mrow></mml:munderover><mml:mrow><mml:msup><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:mrow></mml:math>

    Eq. 20

This is obviously shorthand notation for 1² + 2² + 3² + … + 99² + 100².
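We can compute this sum directly in Python. The closed-form identity n(n+1)(2n+1)/6 used as a cross-check below is a standard result, not something from the text:

```python
# The sum of the first 100 squares, written directly from the Sigma notation
total = sum(i ** 2 for i in range(1, 101))
print(total)  # 338350

# Cross-check against the standard closed-form identity n(n+1)(2n+1)/6
n = 100
print(total == n * (n + 1) * (2 * n + 1) // 6)  # True
```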

Products and the Π notation

Having introduced the Σ notation and explained it at length, we can now introduce the complementary idea of a concise, shorthand notation for multiplying lots of numbers together. We do this with the Π, or product, notation. If we want to multiply x₁, x₂, x₃, x₄, x₅ together, we can write this as follows:

    <math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><mrow><mrow><munderover><mo>∏</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mrow><mi>i</mi><mo>=</mo><mn>5</mn></mrow></munderover><msub><mi>x</mi><mi>i</mi></msub></mrow></mrow></math>

    Eq. 21

As with the Σ notation, we can use the Π notation more generally. For example, we can write ∏_{i=1}^{i=N} xᵢ as shorthand for x₁ × x₂ × … × x_N. Again, we can use the product notation as shorthand for multiplying function values together, as follows:

    <math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><mrow><mrow><mrow><munderover><mo>∏</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mrow><mi>i</mi><mo>=</mo><mi>N</mi></mrow></munderover><mrow><mi>f</mi><mfenced open="(" close=")"><msub><mi>x</mi><mi>i</mi></msub></mfenced></mrow></mrow><mo>=</mo><mi>f</mi><mfenced open="(" close=")"><msub><mi>x</mi><mn>1</mn></msub></mfenced><mo>×</mo><mi>f</mi><mfenced open="(" close=")"><msub><mi>x</mi><mn>2</mn></msub></mfenced><mo>×</mo><mo>⋯</mo><mo>×</mo><mi>f</mi><mfenced open="(" close=")"><msub><mi>x</mi><mi>N</mi></msub></mfenced></mrow></mrow></math>

    Eq. 22
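The Π notation likewise maps onto a product over a collection. A minimal sketch using Python's standard library, with illustrative values:

```python
import math

# Illustrative values for x_1, ..., x_5
x = [1.0, 2.0, 3.0, 4.0, 5.0]

# prod_{i=1}^{i=5} x_i
product = math.prod(x)
print(product)  # 120.0
```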

    Logarithms

Logarithms are extremely useful for describing how quickly a quantity or function grows. In particular, the logarithm tells us the exponent that describes the rate of growth of a quantity or function. Let's make that more explicit. The logarithm to base a of the number aˣ is x. Mathematically, we write this as follows:

    <math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><mrow><mrow><msub><mtext>log</mtext><mi>a</mi></msub><mfenced open="(" close=")"><msup><mi>a</mi><mi>x</mi></msup></mfenced><mo>=</mo><mi>x</mi><mspace width="0.25em" /><mspace width="0.25em" /><mspace width="0.25em" /><mspace width="0.25em" /></mrow></mrow></math>

    Eq. 23

The symbol log_a is shorthand for taking the logarithm to base a. This shorthand is so common that, even in the text, I will use the word log when I mean logarithm. It is also not uncommon to omit the brackets in the previous equation and write log_a aˣ = x. The most common bases we use for taking logarithms are base e, base 10, and base 2. Of these, base e is so commonly used that we use a different symbol, ln, when taking the log. So, in effect, this means ln = log_e. This symbol means the natural logarithm, or natural log, to denote the fact that taking the log to base e is the most natural or common thing to do. Because taking the natural log is so common, most mathematicians don't really consider taking the log to any other base, and so by default, we use the symbol log to mean ln. Watch out for this. If you see the symbol log without a base specified, then either the base is not important – for example, the proof of the mathematical statement does not depend upon the base – or base e is implicitly meant. This is also the case in most computer programming languages: applying the operator log will return the natural logarithm. For example, in Python, the NumPy function numpy.log(y) returns the natural logarithm of y.
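A quick check of this convention in NumPy (the specific numbers are illustrative):

```python
import numpy as np

# numpy.log is the natural logarithm (base e), so log(e^3) recovers the exponent
print(np.log(np.exp(3.0)))  # approximately 3.0

# Other common bases have their own dedicated functions
print(np.log10(1000.0))     # approximately 3.0
print(np.log2(8.0))         # approximately 3.0
```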

We can see from Eq. 23 that the logarithm does in fact tell us the exponent (in base a) of the number we are taking the log of. So, if log_a b = c, then b = aᶜ. Because taking the log effectively gives us an exponent value, the logarithm of a number is typically much smaller than the number itself. More importantly, it also means that the logarithm function is monotonic, so that log x increases as x increases. The word monotonic means of one tone or of one direction, and so it means either only going up (monotonically increasing) or only going down (monotonically decreasing). This is shown in Figure 1.4, which shows the natural logarithm function ln x, from which we can see the value of ln x increasing as x gets bigger:

    Figure 1.4: Graph of the natural logarithm function


An important consequence of the monotonically increasing nature of the logarithm function is that if we have a function f(x) and we want to find the value of x where f(x) has its highest (or maximum) value, then that maximizing value of x, let's call it x*, is also the point where log f(x) has its maximum value. In mathematical notation, we can write this fact as follows:

    <math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><mrow><mrow><mtext>If</mtext><mspace width="0.25em" /><mi>f</mi><mfenced open="(" close=")"><mi>x</mi></mfenced><mo>≤</mo><mi>f</mi><mfenced open="(" close=")"><msup><mi>x</mi><mi mathvariant="normal">*</mi></msup></mfenced><mspace width="0.25em" /><mtext>when</mtext><mspace width="0.25em" /><mi>x</mi><mo>≠</mo><msup><mi>x</mi><mi mathvariant="normal">*</mi></msup><mspace width="0.25em" /><mtext>then</mtext><mspace width="0.25em" /><mtext>log</mtext><mi>f</mi><mfenced open="(" close=")"><mi>x</mi></mfenced><mo>≤</mo><mtext>log</mtext><mi>f</mi><mfenced open="(" close=")"><msup><mi>x</mi><mi mathvariant="normal">*</mi></msup></mfenced><mspace width="0.25em" /><mtext>when</mtext><mspace width="0.25em" /><mi>x</mi><mo>≠</mo><msup><mi>x</mi><mi mathvariant="normal">*</mi></msup></mrow></mrow></math>

    Eq. 24

    We will refer to this again in a moment.
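A small numerical sketch of this consequence, using an illustrative peaked function (not one from the text): the grid point that maximizes f is also the grid point that maximizes log f.

```python
import numpy as np

x = np.linspace(-3.0, 3.0, 601)   # a grid of x values
f = np.exp(-(x - 1.0) ** 2)       # an illustrative positive function peaked at x = 1

# Because log is monotonically increasing, f and log(f) peak at the same x*
i_f = np.argmax(f)
i_log_f = np.argmax(np.log(f))
print(i_f == i_log_f)  # True
print(x[i_f])          # approximately 1.0
```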

    There are well-known rules for taking logarithms of reciprocals, products, and ratios. These are (for any base):

    <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" display="block"><mml:msub><mml:mrow><mml:mtext>log</mml:mtext></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow></mml:msub><mml:mfenced separators="|"><mml:mrow><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:mfrac></mml:mrow></mml:mfenced><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mtext>log</mml:mtext></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow></mml:msub><mml:mfenced separators="|"><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:mfenced></mml:math>

    Eq. 25

    And the following:

    <math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><mrow><mrow><msub><mtext>log</mtext><mi>a</mi></msub><mfenced open="(" close=")"><mrow><mi>x</mi><mi>y</mi></mrow></mfenced><mo>=</mo><msub><mtext>log</mtext><mi>a</mi></msub><mfenced open="(" close=")"><mi>x</mi></mfenced><mo>+</mo><msub><mtext>log</mtext><mi>a</mi></msub><mfenced open="(" close=")"><mi>y</mi></mfenced></mrow></mrow></math>

    Eq. 26

    Combining these two rules, we get the rule for taking the log of a ratio:

    <math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><mrow><mrow><msub><mtext>log</mtext><mi>a</mi></msub><mfenced open="(" close=")"><mfrac><mi>x</mi><mi>y</mi></mfrac></mfenced><mo>=</mo><msub><mtext>log</mtext><mi>a</mi></msub><mfenced open="(" close=")"><mi>x</mi></mfenced><mo>+</mo><msub><mtext>log</mtext><mi>a</mi></msub><mfenced open="(" close=")"><mfrac><mn>1</mn><mi>y</mi></mfrac></mfenced><mo>=</mo><msub><mtext>log</mtext><mi>a</mi></msub><mfenced open="(" close=")"><mi>x</mi></mfenced><mo>−</mo><msub><mtext>log</mtext><mi>a</mi></msub><mfenced open="(" close=")"><mi>y</mi></mfenced></mrow></mrow></math>

    Eq. 27

The rule for taking the log of a product is particularly useful when we have a product formed from many numbers. Using the Σ and Π notations we introduced earlier, we can write the following:

<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" display="block"><mml:msub><mml:mrow><mml:mtext>log</mml:mtext></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow></mml:msub><mml:mfenced separators="|"><mml:mrow><mml:munderover><mml:mo stretchy="false">∏</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mi>N</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfenced><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mtext>log</mml:mtext></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow></mml:msub><mml:mfenced separators="|"><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>…</mml:mo><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfenced><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mtext>log</mml:mtext></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow></mml:msub><mml:mfenced separators="|"><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mfenced><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mtext>log</mml:mtext></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow></mml:msub><mml:mfenced separators="|"><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mfenced><mml:mo>+</mml:mo><mml:mo>…</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mtext>log</mml:mtext></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow></mml:msub><mml:mfenced separators="|"><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfenced><mml:mo>=</mml:mo><mml:mrow><mml:munderover><mml:mo stretchy="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mi>N</mml:mi></mml:mrow></mml:munderover><mml:mrow><mml:msub><mml:mrow><mml:mtext>log</mml:mtext></mml:mrow><mml:mrow><mml:mi>a</mml:mi></mml:mrow></mml:msub><mml:mfenced separators="|"><mml:mrow><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfenced></mml:mrow></mml:mrow></mml:math>

    Eq. 28

This, in conjunction with the fact that taking the log is a monotonic transformation, will be very useful to us when we start to use the concept of maximum likelihood to build probabilistic models in Chapter 5.
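The identity in Eq. 28 also underlies a standard numerical trick: a product of many small numbers can underflow to zero in floating point, while the equivalent sum of logs remains finite and usable. A sketch with randomly generated illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.uniform(0.01, 0.1, size=1000)  # 1,000 small positive numbers

# The direct product underflows to zero in floating point...
print(np.prod(p))                # 0.0

# ...while the equivalent sum of logs stays finite and usable
log_product = np.sum(np.log(p))
print(np.isfinite(log_product))  # True
```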

We will make extensive use of sums, products, and logarithms throughout this book, and we now have all the notation we need to work with them, so let's summarize what we have learned about that notation.

    What we learned

    In this section, we have learned about the following:

    The Σ notation for adding lots of numbers together

    The Π notation for multiplying lots of numbers together

How we can also use the Σ and Π notations when a function, f(x), is applied to each term before summing or multiplying
