
Maths Roadmap For Machine Learning

This document provides an overview of key linear algebra concepts used in machine learning and deep learning. It covers scalars, vectors, matrices, and tensors, as well as related topics like norms, independence, vector spaces, matrix factorization methods, and more. These foundational concepts are crucial for computations throughout machine learning and deep learning systems, from data representation and transformation to training deep neural networks.

Uploaded by

amriteshwork

Module Topic

Scalars What are scalars

Vectors What are Vectors


Row Vector and Column Vector
Distance from Origin
Euclidean Distance between 2 vectors
Scalar Vector Addition/Subtraction(Shifting)
Scalar Vector Multiplication/Division(Scaling)
Vector Vector Addition/Subtraction

Dot Product of 2 vectors


Angle between 2 vectors

Unit Vectors
Projection of a Vector
Basis Vectors

Equation of a Line in n-D

Vector Norms[L]

Linear Independence

Vector Spaces

Matrix What are Matrices?


Types of Matrices
Orthogonal Matrices
Symmetric Matrices
Diagonal Matrices
Matrix Equality
Scalar Operations on Matrices
Matrix Addition and Subtraction
Matrix Multiplication
Transpose of a Matrix
Determinant
Minor and Cofactor
Adjoint of a Matrix
Inverse of a Matrix
Rank of a Matrix

Column Space and Null Space[L]

Change of Basis [L]

Solving a System of linear equations

Linear Transformations
3d Linear Transformations
Matrix Multiplication as Composition

Linear Transformation of Non-square Matrix

Dot Product
Cross Product [L]

Tensors What are Tensors


Importance of Tensors in Deep Learning
Tensor Operations
Data Representation using Tensors

Eigen Values and Vectors Eigen Vectors and Eigen Values


Eigen Faces [L]
Principal Component Analysis [L]

Matrix Factorization LU Decomposition[L]


QR Decomposition[L]
Eigen Decomposition[L]
Singular Value Decomposition[L]
Non-Negative Matrix Factorization[L]

Advanced Topics Moore-Penrose Pseudoinverse[L]


Quadratic Forms[L]
Positive Definite Matrices[L]
Hadamard Product[L]

Tools and Libraries Numpy


Scipy[L]
Usage in Machine Learning

A scalar is a single numeric quantity, fundamental in machine learning for computations, and deep learning for things
like learning rates and loss values.

These are arrays of numbers that can represent multiple forms of data. In machine learning, vectors can represent data
points, while in deep learning, they can represent features, weights, and biases.

These are different forms of representing vectors. In both machine learning and deep learning, these representations
matter because they affect computations like matrix multiplication, critical in areas like neural network operations.

This is the magnitude of the vector from the origin of the vector space. It's important in machine learning for operations
like normalization, while in deep learning, it can help understand the magnitude of weights or feature vectors.

in many machine learning algorithms, including clustering and nearest neighbor search, and also used in deep learning
loss functions like Mean Squared Error.

These operations can shift vectors, useful in machine learning for data normalization and centering. In deep learning,
they are employed for operations like bias correction.

Scalar and vector multiplication/division can be used for data scaling in machine learning. In deep learning, it's used to
control the learning rate in optimization algorithms.

These are fundamental operations used to combine or compare vectors, used across machine learning and deep learning
for computations on data and weights.
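
The vector operations described above can be sketched with NumPy. This is a minimal illustration, not part of the original roadmap; the vectors and variable names are made up for the example.

```python
import numpy as np

# Two example vectors (illustrative values only).
a = np.array([3.0, 4.0])
b = np.array([6.0, 8.0])

distance_from_origin = np.linalg.norm(a)      # L2 norm: magnitude of a
euclidean_distance = np.linalg.norm(a - b)    # distance between a and b
shifted = a - a.mean()                        # scalar subtraction centers the entries
scaled = a / distance_from_origin             # scalar division rescales a to unit length
summed = a + np.array([1.0, -1.0])            # element-wise vector addition
```

Shifting moves every component by the same amount, while scaling changes the length of the vector but not its direction.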

computations in more advanced algorithms. In deep learning, it's crucial in operations like calculating the weighted
sums in a neural network layer.

in applications like recommender systems, and also in deep learning when examining the relationships between
high-dimensional vectors.
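
As a rough sketch of the dot product and the angle between two vectors, the NumPy snippet below computes a neuron-style weighted sum and a cosine-based angle. All values and names here are hypothetical, chosen only for illustration.

```python
import numpy as np

# Weighted sum of inputs, as in a single neuron (illustrative numbers).
weights = np.array([0.5, -1.0])
inputs = np.array([2.0, 1.0])
bias = 0.1
weighted_sum = np.dot(weights, inputs) + bias

# Angle between two vectors from the cosine formula.
u = np.array([1.0, 0.0])
v = np.array([1.0, 1.0])
cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
# Clip guards against tiny floating-point overshoot outside [-1, 1].
angle_deg = np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))
```

The same cosine value is what similarity-based systems typically compare, since it ignores vector length and looks only at direction.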

Unit vectors are important in machine learning for normalization and simplifying computations. They're equally
significant in deep learning, particularly when it comes to generating directionally consistent weight updates.

The projection of a vector can be used for dimensionality reduction in machine learning and can be useful in deep
learning for visualizing high-dimensional data or understanding features.

algorithms like PCA and SVD. In deep learning, basis vectors can be useful for interpreting the internal
representations that a network learns.
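
A short NumPy sketch of unit vectors and projection follows. The vectors are arbitrary examples, and the standard basis direction is an assumption made for simplicity.

```python
import numpy as np

v = np.array([3.0, 4.0])
onto = np.array([1.0, 0.0])   # direction to project onto (a standard basis vector)

# Unit vector: same direction as v, length 1.
unit_v = v / np.linalg.norm(v)

# Projection of v onto the line spanned by `onto`.
projection = (np.dot(v, onto) / np.dot(onto, onto)) * onto
```

Projecting onto a basis vector simply picks out the corresponding coordinate, which is why projections onto a few chosen directions act as a form of dimensionality reduction.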

regression, and also crucial in deep learning where hyperplanes (an n-D extension of a line) are used to separate classes
in high-dimensional space.

deep learning, they're used in measuring the size of weights, which can control the complexity of the model, and in
normalization techniques such as batch and layer normalization.

lead to issues like inflated variance and unstable estimates of parameters.


PCA assumes that the principal components are linearly independent.

In deep learning, each layer of a neural network can be seen as transforming one vector space (the layer's input) into
another vector space (the layer's output).

A matrix is a two-dimensional array of numbers. In machine learning and deep learning, matrices are often used to
represent sets of features, model parameters, or transformations of various data.

Different types of matrices (identity, zero, sparse, etc.) are used in various ways, such as the identity matrix in linear
algebra operations, or sparse matrices for handling large, high-dimensional data sets efficiently.

they're often used in PCA and SVD, which are dimension reduction techniques. In deep learning, orthogonal matrices
are often used to initialize weights in a way that helps prevent vanishing or exploding gradients.

These are matrices that are equal to their transpose. They're used in various algorithms because of their desirable
properties, like always having real eigenvalues. Covariance matrices in statistics are an example of symmetric matrices.

Diagonal matrices are used for scaling operations. In machine learning, they often appear in quadratic forms, while in
deep learning, the diagonal matrix structure is used in constructing learning rate schedules for stochastic optimization.

Matrices are equal if they're of the same size and their corresponding elements are equal. This is fundamental to many
machine learning and deep learning algorithms, for example, when checking convergence of algorithms.

Scalar operations are used to adjust all elements of a matrix by a fixed value. This is used in machine learning and deep
learning for data scaling, weight updates, and more.

These operations are used to combine or compare datasets or model parameters, among other things.

This operation is central to many algorithms in both machine learning and deep learning, like linear regression or
forward propagation in neural networks.

Transposing a matrix is important for operations like computing the dot product between two vectors, or performing
certain types of matrix multiplication.

distributions. In deep learning, the determinant is often used in advanced topics like volume-preserving transformations
in flow-based models.

These concepts are used in computing the inverse of a matrix or its determinant. While not directly used in many
machine learning algorithms, they're fundamental to the underlying linear algebra.

The adjoint of a matrix is the transpose of the cofactor matrix. It's used in calculating the inverse of a matrix, which is
crucial in solving systems of linear equations, often found in machine learning algorithms.

deep learning, pseudo-inverse matrices are used in techniques like Moore-Penrose inversion, which can be used to
calculate weights in certain network architectures.

machine learning for determining the solvability of linear systems (like in linear regression), and in deep learning, it's
used to investigate the properties of weight matrices.
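
To make the determinant, rank, inverse, and linear-system topics concrete, here is a small NumPy sketch. The matrix and right-hand side are invented for the example.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

det_A = np.linalg.det(A)            # nonzero determinant means A is invertible
rank_A = np.linalg.matrix_rank(A)   # full rank (2) means a unique solution exists
A_inv = np.linalg.inv(A)            # explicit inverse, shown for illustration
x = np.linalg.solve(A, b)           # solves A x = b without forming the inverse
```

In practice `np.linalg.solve` is usually preferred over multiplying by an explicit inverse, since it tends to be faster and numerically more stable.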

represents the solutions to the homogeneous equation Ax=0. They are important for understanding the solvability of a
system of equations, which can arise in algorithms like linear regression.

different coordinate systems. This is often used in dimensionality reduction techniques like PCA, or when visualizing
high-dimensional feature spaces.

of linear equations. In deep learning, backpropagation can be seen as a process of solving a system of equations to find
the best parameters.

a fundamental operation in many machine learning and deep learning algorithms, from simple regression to complex
neural networks.

These transformations preserve points, lines, and planes. They're often used in machine learning for visualization and
geometric interpretations of data.

matrix, created by multiplying the matrices representing the individual transformations. This is used extensively in
deep learning where each layer of a neural network can be seen as a matrix transformation of the input.
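
Matrix multiplication as composition can be checked numerically. The sketch below assumes two simple 2D transformations (a 90-degree rotation and a stretch along x), picked only for illustration.

```python
import numpy as np

rotate_90 = np.array([[0.0, -1.0],
                      [1.0,  0.0]])   # rotates a vector 90 degrees counterclockwise
stretch_x = np.array([[2.0, 0.0],
                      [0.0, 1.0]])    # doubles the x-coordinate

x = np.array([1.0, 0.0])

# Apply the transformations one after the other: rotate first, then stretch.
step_by_step = stretch_x @ (rotate_90 @ x)

# A single composed matrix gives the same result.
composed = stretch_x @ rotate_90
all_at_once = composed @ x
```

Note the order: the matrix applied first sits on the right of the product.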

match the number of data points. Their transformations can be used for dimensionality reduction or feature
construction.

Dot product is a way of multiplying vectors that results in a scalar. It's used in machine learning to compute similarity
measures and in deep learning, for instance, to calculate the weighted sum of inputs in a neural network layer.

machine learning, it's used less often due to its restriction to three dimensions, but it might appear in specific
applications that involve 3D data.

learning, they are used to represent and manipulate data of various dimensionalities, such as 1D for time series, 2D for
images, or 3D for videos.

Operations such as tensor addition, multiplication, and reshaping are common in deep learning algorithms for
manipulating data and weights.

In machine learning and deep learning, tensors are used to represent multidimensional data. For instance, an image can
be represented as a 3D tensor with dimensions for height, width, and color channels.
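
As an illustration of tensors as data containers, the sketch below uses NumPy arrays as stand-ins for tensors. The 32x32 image size and the height-width-channel layout are assumptions for the example; deep learning frameworks may order the axes differently.

```python
import numpy as np

# A single RGB image as a 3D tensor: (height, width, channels).
image = np.zeros((32, 32, 3))

# Stacking images adds a batch axis, giving a 4D tensor.
batch = np.stack([image, image])

# Reshaping flattens each image into one long feature vector.
flat = batch.reshape(batch.shape[0], -1)

# Element-wise tensor addition keeps the shape unchanged.
brighter = batch + 0.5
```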

These concepts are used in machine learning for dimensionality reduction (PCA), understanding linear transformations,
and more. In deep learning, they're used to understand the behavior of optimization algorithms.

This is a specific application of eigenvectors used for facial recognition. The 'eigenfaces' represent the directions in
which the images of faces show the most variation.

data, and more. While not used as often in deep learning, it's sometimes used for visualizing learned embeddings or
activations.
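
One common way to connect eigenvectors to PCA is through the eigendecomposition of the covariance matrix. The following is a minimal sketch on synthetic data (the random data and mixing matrix are assumptions), not a production PCA implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated toy data: 200 samples, 2 features.
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])

Xc = X - X.mean(axis=0)                  # center each feature
cov = np.cov(Xc, rowvar=False)           # 2x2 covariance matrix (symmetric)

eigvals, eigvecs = np.linalg.eigh(cov)   # eigh suits symmetric matrices; ascending order
order = np.argsort(eigvals)[::-1]        # reorder so the largest eigenvalue comes first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project onto the first principal component (direction of greatest variance).
projected = Xc @ eigvecs[:, :1]
```

The variance of the projected data equals the largest eigenvalue, which is why eigenvalues are read as "variance explained" in PCA.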

LU decomposition is a method of solving linear equations, which can arise in machine learning models like linear
regression. While not often used directly in deep learning, it's a fundamental linear algebra operation.

QR decomposition can be used in machine learning for solving linear regression problems or for numerical stability in
certain algorithms. In deep learning, it's often used in some optimization methods.

This is used in machine learning to solve problems that involve understanding the underlying structure of data, like
PCA. In deep learning, eigen decomposition can be used to analyze the weights of a model.

SVD is a method used in machine learning for dimensionality reduction, latent semantic analysis, and more. In deep
learning, SVD can be used for model compression or initialization.

extraction in datasets where the data and the features are non-negative. In deep learning, NMF is less common, but
might be used in some specific data preprocessing or analysis tasks.
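
A brief NumPy sketch of SVD and a rank-1 approximation is given below. The matrix is an arbitrary example; compression of real model weights would follow the same idea on much larger matrices.

```python
import numpy as np

A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

# Thin SVD: A = U @ diag(s) @ Vt, with singular values in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
reconstructed = U @ np.diag(s) @ Vt

# Keep only the largest singular value for a rank-1 approximation.
rank1 = s[0] * np.outer(U[:, 0], Vt[0])
approx_error = np.linalg.norm(A - rank1)   # Frobenius norm of what was dropped
```

Dropping small singular values is the basic mechanism behind SVD-based dimensionality reduction and compression.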

useful in machine learning algorithms such as linear regression. In deep learning, it can be used in calculating the
weights of certain network architectures.

Quadratic forms appear in many machine learning algorithms such as support vector machines and Gaussian processes.
In deep learning, they are often found in the formulation of loss functions and regularization terms.

equations, which is used in many machine learning algorithms. In deep learning, positive definite matrices appear in the
analysis of optimization methods, ensuring certain desirable properties like convergence.

for instance, in computing certain types of features. In deep learning, it's used in operations such as gating in recurrent
neural networks (RNNs).
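
Two of the advanced topics above are easy to try in NumPy: the Moore-Penrose pseudoinverse for least-squares regression and the Hadamard product for gating. The data, gate values, and variable names below are hypothetical.

```python
import numpy as np

# Toy regression data: y = 1 + 2*x exactly, with a bias column of ones.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
weights = np.linalg.pinv(X) @ y        # least-squares solution via the pseudoinverse

# Hadamard (element-wise) product, as used when a gate scales a hidden state.
gate = np.array([0.0, 0.5, 1.0])
hidden = np.array([2.0, 4.0, 6.0])
gated = gate * hidden
```

A gate value of 0 blocks a component entirely, 1 passes it through unchanged, and values in between scale it.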

Numpy is a fundamental library for numerical computation in Python and is used extensively in both machine learning
and deep learning for operations on arrays and matrices.

optimization, statistical testing, and some specific models like hierarchical clustering. In deep learning, Scipy might be
used for tasks like image processing or signal processing.
Important

Very important

[L] Later
Module Topic

Descriptive Statistics What is Stats/Types of Stats


Population Vs Sample

Types of Data

Measures of Central Tendency


- Mean
- Median
- Mode
- Weighted Mean [L]
- Trimmed Mean [L]

Measure of Dispersion
- Range
- Variance
- Standard Deviation
- Coefficient of Variation

Quantiles and Percentiles


5 number summary and BoxPlot

Skewness
Kurtosis [L]

Plotting Graphs
- Univariate Analysis
- Bivariate Analysis
- Multivariate Analysis

Correlation Covariance
Covariance Matrix
Pearson Correlation Coefficient
Spearman Correlation Coefficient [L]
Correlation and Causation

Probability Distribution Random Variables


What are Probability Distributions
Why are Probability Distributions important
Probability Distribution Functions and its types

Probability Mass Function (PMF)


CDF of PMF

Probability Density Function(PDF)


CDF of PDF
Density Estimation [L]
Parametric Density Estimation [L]
Non-Parametric Density Estimation [L]
Kernel Density Estimation(KDE) [L]

How to use PDF/PMF and CDF in Analysis

2D Density Plots

Types of Probability Distributions Normal Distribution


- Properties of Normal Distribution
- CDF of Normal Distribution
- Standard Normal Variate

Uniform Distribution

Bernoulli Distribution

Binomial Distribution

Multinomial Distribution

Log Normal Distribution

Pareto Distribution [L]

Chi-square Distribution

Student's T Distribution

Poisson Distribution [L]


Beta Distribution [L]

Gamma Distribution [L]

Transformations

Confidence Intervals Point Estimates


Confidence Intervals
Confidence Interval(Sigma Known)
Confidence Interval(Sigma Unknown)
Interpreting Confidence Interval
Margin of Error and factors affecting it

Central Limit Theorem Sampling Distribution


What is CLT
Standard Error

Hypothesis Tests What is Hypothesis Testing?


Null and Alternate Hypothesis
Steps involved in a Hypothesis Test
Performing Z-test
Rejection Region Approach
Type 1 Vs Type 2 Errors
One Sided vs 2 sided tests
Statistical Power
P-value
How to interpret P-values

Types of Hypothesis Tests Z-test

T-test
- Single Sample T-test
- Independent 2 sample t-test
- Paired 2 sample t-test

Chi-square Test
Chi-square Goodness of Fit Test
Chi-square Test of Independence

ANOVA
One Way Anova
Two Way Anova
F-test

Levene Test [L]

Shapiro Wilk Test [L]

K-S Test [L]

Fisher's Test [L]

Miscellaneous Topics Chebyshev's Inequality [L]


QQ Plot
Sampling
Resampling Techniques
Bootstrapping [L]
Standardization
Normalization
Statistical Moments [L]
Bayesian Statistics
A/B Testing
Law of Large Numbers
Usage in Machine Learning

data. Training a model typically happens on a sample of the total data (the training set), which is assumed to be representative of the
population. This concept is used to perform inferential statistics and to estimate the model's performance on unseen data.

Understanding the type of data you're working with helps in selecting the appropriate preprocessing techniques, feature engineering
methods, and machine learning models.

These measures provide the central value of a data distribution. They are used to understand the 'typical' value in a dataset and are used
in various areas of machine learning including exploratory data analysis, outlier detection, and data imputation.

These measures provide insights into the spread or variability of the data distribution. They help in understanding the consistency in the
data and are also used in exploratory data analysis, outlier detection, feature normalization, etc.
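
The measures of central tendency and dispersion listed above can be computed with Python's standard `statistics` module. The dataset is a small made-up example, and population (not sample) variance is used here as an assumption.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

data_range = max(data) - min(data)
variance = statistics.pvariance(data)   # population variance
std_dev = statistics.pstdev(data)       # population standard deviation
coeff_of_variation = std_dev / mean     # dispersion relative to the mean
```

For a sample rather than a full population, `statistics.variance` and `statistics.stdev` apply the n - 1 correction instead.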

These help in understanding the distribution of data and are used in descriptive statistics, outlier detection, and setting up thresholds for
decision-making.
These are used in the exploratory data analysis phase to understand the data distribution and identify outliers. Boxplots graphically
depict the minimum, first quartile, median, third quartile, and maximum of a dataset.
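
A five-number summary and the usual boxplot outlier rule can be sketched as follows. The data are invented, and the 1.5 x IQR fence is a common convention rather than something stated in the roadmap.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Quartiles using the "inclusive" method (treats data as the full population).
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
five_number_summary = (min(data), q1, q2, q3, max(data))

# Interquartile range and the conventional 1.5 * IQR outlier fences.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

def is_outlier(value):
    """Return True if value falls outside the boxplot fences."""
    return value < lower_fence or value > upper_fence
```

A point such as 20 would be flagged by `is_outlier`, while every value in `data` falls inside the fences.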

These are used to understand the asymmetry and tailedness of the data distribution, respectively. They're particularly useful in
exploratory data analysis, informing data transformations needed to meet the assumptions of some machine learning algorithms.

Graphical analysis is crucial in the exploratory phase of machine learning. It helps in understanding the distributions of individual
variables (univariate), relationships between two variables (bivariate), or complex interactions among multiple variables (multivariate).

gives the covariance between each pair of features in a dataset. These concepts are used in many machine learning algorithms, such as
Principal Component Analysis (PCA) for dimensionality reduction, or Gaussian Mixture Models for clustering.

This statistic measures the linear relationship between two datasets. It's used in feature selection, where highly correlated input features
can be identified and reduced, to improve the performance and interpretability of the model.
This measures the monotonic relationship between two datasets. It's useful when the data doesn't meet the assumptions of Pearson's
correlation (linearity, normality). It can be used in the same contexts as Pearson's correlation coefficient.
Correlation measures association between variables, while causation indicates a cause-effect relationship. In machine learning, it's
important to remember that correlation doesn't imply causation, and algorithms based purely on correlation might fail to generalize well.
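
Covariance and the Pearson correlation coefficient can be written out directly from their definitions. This is a plain-Python sketch on made-up numbers; the helper names are hypothetical.

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

def sample_covariance(a, b):
    """Sample covariance with the n - 1 denominator."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)

def pearson_r(a, b):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return sample_covariance(a, b) / math.sqrt(
        sample_covariance(a, a) * sample_covariance(b, b)
    )

cov_xy = sample_covariance(x, y)
r_xy = pearson_r(x, y)
```

Unlike covariance, the correlation coefficient is unit-free and always lies between -1 and 1, which makes it easier to compare across feature pairs.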

Random variables and their distributions form the mathematical basis of probabilistic machine learning algorithms. They help us
understand the data's inherent randomness and variability, and guide the choice and behavior of algorithms.
These concepts are critical in understanding and manipulating discrete random variables, often used in algorithms like Naive Bayes,
Hidden Markov Models, etc.

These are used for continuous random variables. For instance, in the Gaussian Mixture Model, each cluster is modeled as a Gaussian
distribution with its PDF.

learning for tasks such as anomaly detection. Kernel Density Estimation (KDE), a non-parametric way to estimate the PDF of a random
variable, is particularly useful when no suitable parametric form of the data is known.

These concepts are used for data analysis and visualization, to understand and communicate the distribution and trends in the data. In
machine learning, these analyses can inform the choice of model, preprocessing steps, and potential feature engineering.

They can reveal patterns and associations in the data that can guide subsequent modeling steps. For instance, they could help identify
clusters for a clustering algorithm in unsupervised learning.

algorithm that uses these as a base, such as neural networks. Also, many statistical methods require the assumption of normally
distributed errors.

This distribution is used in random forest algorithms for randomly selecting candidate features at each split, and also in initialization
of weights in neural networks. It is also used in methods like random search, where you need to randomly sample parameters.

Used in algorithms that model binary outcomes, such as the Bernoulli Naive Bayes classifier and logistic regression.

Used in modelling the number of successes in a fixed number of Bernoulli trials, often applied in classification problems.

Text Classification, topic modelling, deep learning and word embeddings

Useful in various contexts, such as when dealing with variables that are the multiplicative product of other variables, or when working
with data that exhibit skewness.

Often used in the realm of anomaly detection or for studying phenomena in the domain of social, quality control, and economic
sciences.

Chi-square tests use this distribution extensively to test relationships between categorical variables. The chi-square statistic is also used
in the context of feature selection.

Plays a crucial role in formulating the confidence interval when the sample size is small and/or when the population standard deviation
is unknown.

Used for modeling the number of times an event might occur within a set time or space. It's often used in queuing theory and for time-
series prediction models.
This is a versatile distribution often used in Bayesian methods, and is also the conjugate prior for the Bernoulli, binomial, negative
binomial and geometric distributions.

The Gamma distribution is used in a variety of fields, including queuing models, climatology, and financial services. It's the conjugate
prior for the rate of the Poisson and exponential distributions, and for the precision of a normal distribution with known mean.

These are used to make the data conform to the assumptions of a machine learning algorithm, enhance the performance of the
algorithm, or help visualize the data. Common examples are the logarithmic, square root, and z-score standardization transformations.

the true population parameter to lie, with a given level of confidence. They are used to understand the reliability of point estimates and
are often used to report the results of models.
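
For the sigma-known case, a confidence interval for the mean can be sketched with the standard normal quantile. The sample mean, sigma, and sample size below are invented for the example.

```python
from statistics import NormalDist

sample_mean = 50.0
sigma = 10.0        # population standard deviation, assumed known
n = 100
confidence = 0.95

# Two-sided critical value from the standard normal distribution.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

margin_of_error = z * sigma / n ** 0.5
interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)
```

The margin of error shrinks as the sample size grows and widens as the confidence level or sigma increases. When sigma is unknown, the t distribution replaces the normal quantile.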

size gets larger, regardless of the shape of the population distribution. This is the foundation for many machine learning methods and is
often used in hypothesis testing and in creating confidence intervals.

This is used to understand the variability in a point estimate. In machine learning, it's often used in constructing confidence intervals for
model parameters and in hypothesis testing.
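
The Central Limit Theorem and the standard error can be explored with a small simulation. This sketch assumes an exponential population with mean 1 and standard deviation 1; the sample size and number of repetitions are arbitrary.

```python
import random
import statistics

random.seed(0)   # fixed seed so the simulation is repeatable

n = 50           # size of each sample
repeats = 2000   # number of samples drawn

# Means of many samples from a skewed (exponential) population.
sample_means = [
    statistics.mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(repeats)
]

observed_se = statistics.stdev(sample_means)   # spread of the sample means
theoretical_se = 1.0 / n ** 0.5                # sigma / sqrt(n), with sigma = 1
```

Even though the population is skewed, the sample means cluster around 1 and their spread is close to sigma divided by the square root of n.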

related to specific models. For instance, a t-test might be used to determine if the means of two sets of results (like two algorithms) are
significantly different.
These are fundamental components of all hypothesis tests. The null hypothesis typically represents a theory that has been put forward,
either because it is believed to be true or because it is to be used as a basis for argument, but has not been proved.

These are all components of hypothesis testing, and they're used to make decisions about whether the observed effect in our sample is
real or happened due to chance. These concepts are used in feature selection, model validation, and comparisons between models.

This is the ability of a hypothesis test to detect an effect, if the effect actually exists. In machine learning, power analysis can be used to
estimate the minimum number of observations required to detect an effect.
your test occurred at random. If the p-value is small (typically ≤ 0.05), it is usually taken as evidence against the null hypothesis. In machine
learning, p-values are often used in feature selection where the null hypothesis is that the feature has no effect on the target variable.

when the data is normally distributed and the population variance is known. It's often used in A/B testing to decide whether two groups'
mean outcomes are different.
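
A one-sample, two-sided Z-test with its p-value can be sketched as below. The hypothesized mean, sigma, and sample values are made up, and the 0.05 threshold is the conventional choice rather than a fixed rule.

```python
from statistics import NormalDist

mu0 = 100.0          # mean under the null hypothesis
sigma = 15.0         # population standard deviation, assumed known
n = 36
sample_mean = 105.0

# Test statistic: how many standard errors the sample mean is from mu0.
z = (sample_mean - mu0) / (sigma / n ** 0.5)

# Two-sided p-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject_null = p_value <= 0.05
```

The rejection region approach gives the same decision: reject when the absolute value of z exceeds the critical value for the chosen significance level.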

T-tests are used when the data is normally distributed but the population variance is unknown.
compares the mean of a single sample to a known population mean.
compares the means of two independent samples.
compares the means of the same group at two different times (say, before and after a treatment). In machine learning, t-tests are often
used in experiments designed to compare the performance of two different algorithms on the same problem.

The Chi-square test is used when dealing with categorical variables. It helps to establish if there's a statistically significant relationship
between categorical variables.
determines whether sample data match an expected population distribution.
checks the relationship between two categorical variables.

factors by comparing the response variable means at the different factor levels. The null hypothesis states that all population means are
equal while the alternative hypothesis states that at least one is different.
It’s used to test for differences among at least three groups, as they relate to one factor or variable.
It’s used to compare the mean differences between groups that have been split on two independent variables.

This test assesses the equality of variances for a variable calculated for two or more groups. It's often used in feature selection where the
null hypothesis is that the variances are equal.

This test is used to check the normality of a distribution. Many machine learning algorithms assume normal distribution, making this
test quite useful.

The K-S test is a non-parametric test that compares a sample with a reference probability distribution, or two samples with each other. It's
used in goodness-of-fit tests.

Fisher's test is used to determine if there are nonrandom associations between two categorical variables.

for understanding the range within which most data points lie and can be applied for outlier detection. Chebyshev's inequality is also
used in the analysis and proof of convergence of some machine learning algorithms.
assumption of normality in data. Normality of residuals is an assumption in certain statistical and machine learning models, so this can
help in diagnostic analysis of these models.
It's widely used in machine learning, especially in the context of large datasets, where it may be computationally infeasible to use the
entire population. Techniques such as train-test split, k-fold cross-validation, and stratified sampling all involve sampling principles.
Cross Validation
the population. In machine learning, it's used in ensemble methods like Bagging and Random Forests to generate diverse models by
creating different datasets.
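
Bootstrapping can be sketched as resampling with replacement from the observed data. The dataset, the number of resamples, and the choice of the mean as the statistic are all assumptions for this illustration.

```python
import random
import statistics

random.seed(0)   # fixed seed so the resampling is repeatable

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

boot_means = []
for _ in range(2000):
    # Draw a resample of the same size, with replacement.
    resample = random.choices(data, k=len(data))
    boot_means.append(statistics.mean(resample))

# The spread of the bootstrap means estimates the standard error of the mean.
bootstrap_se = statistics.stdev(boot_means)
```

Bagging applies the same resampling idea to training sets, fitting one model per resample.
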
common scale without distorting differences in the ranges of values or losing information. Many machine learning algorithms perform
better with standardized input features.
Similar to standardization, normalization is a scaling technique that modifies the values of numeric columns in the dataset to a common
scale, without distorting differences in the ranges of values or losing information. It's also known as Min-Max scaling.
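
Standardization (z-scores) and Min-Max normalization differ only in how they shift and scale. A minimal plain-Python sketch on made-up values:

```python
import statistics

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Standardization: subtract the mean, divide by the standard deviation.
mean = statistics.mean(values)
std = statistics.pstdev(values)
standardized = [(v - mean) / std for v in values]

# Min-Max normalization: rescale to the range [0, 1].
low, high = min(values), max(values)
normalized = [(v - low) / (high - low) for v in values]
```

Standardized values have mean 0 and standard deviation 1, while normalized values are bounded between 0 and 1.
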
machine learning to describe, understand, and compare variable distributions. In particular, skewness and kurtosis can be used in feature
engineering to create new features or to select features.
Hyperparameter Tuning
Yellow Important
Red Extremely Important
[L] Later
