Machine Learning Notes


MACHINE LEARNING 19/5/25

STATISTICS

TYPES -> DESCRIPTIVE (describes the data) & INFERENTIAL (applies operations on sample data to predict outcomes for the whole population)

DESCRIPTIVE 4 WAYS -> MEASURES OF DISPERSION (standard deviation, variance), MEASURES OF POSITION (percentiles split data into 100 parts, deciles into 10, quartiles into 4; quantiles split data into equal parts at any chosen cut points), CENTRAL TENDENCY (mean, median, mode), FREQUENCY

INFERENCE -> a sample is a subset of the population; inferential statistics takes a sample dataset to predict the outcome for the whole population.

Data types-> Quantitative , Qualitative

Quantitative ->
Discrete is countable; Continuous is measurable (e.g. product price)

Qualitative -> category

Sampling techniques -> Probabilistic and Non-probabilistic

Probabilistic -> simple random, stratified, systematic, cluster. Every member's chance of being selected is the same.
Non-probabilistic -> chances of being selected are not equal. Convenience, snowball, consecutive, quota

Another way to classify data -> structured (e.g. CSV), semi-structured, unstructured.

Sampling bias -> arises when the sampling method makes some members of the population more likely to be selected than others, so the sample does not represent the population.

Skewness & kurtosis -> a distribution can be normal, right-skewed, or left-skewed.

right (positive) skew: mode < median < mean; left (negative) skew: mean < median < mode

SciPy -> Scientific Python

A distribution may be highly skewed, moderately skewed, or approximately symmetric (normal).

Kurtosis measures how sharp/heavy-tailed the peak is: leptokurtic (sharp), mesokurtic (normal), platykurtic (flat)

empirical rule (68-95-99.7) -> for a normal distribution, about 68% of the data falls within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3.
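The empirical rule can be checked quickly with NumPy on simulated data (the distribution parameters below are arbitrary, just for illustration):

```python
# Empirical (68-95-99.7) rule check on simulated normal data.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=10, size=100_000)

mean, std = data.mean(), data.std()
for k, expected in [(1, 0.68), (2, 0.95), (3, 0.997)]:
    # fraction of points within k standard deviations of the mean
    within = np.mean(np.abs(data - mean) <= k * std)
    print(f"within {k} sd: {within:.3f} (rule says ~{expected})")
```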

21/05/2025

Variance, Standard Deviation (mostly preferred), covariance (captures direction only), correlation (relation between two variables, captures both direction and strength). A correlation of 0 between x and y means no linear relationship; +1 means highly positively correlated; -1 means highly negatively correlated.
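The covariance-vs-correlation distinction can be seen with NumPy (the small arrays below are made up for illustration):

```python
# Covariance captures direction only; correlation adds strength on a [-1, 1] scale.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2 * x + 1                              # perfectly linear in x
z = np.array([5.0, 3.0, 4.0, 1.0, 2.0])   # roughly moves opposite to x

cov_xy = np.cov(x, y)[0, 1]        # positive, but on an unbounded scale
r_xy = np.corrcoef(x, y)[0, 1]     # exactly +1: perfect positive correlation
r_xz = np.corrcoef(x, z)[0, 1]     # negative: strong inverse relationship

print(cov_xy, r_xy, r_xz)  # 5.0 1.0 -0.8
```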

right (positive) skew: mean > median > mode

CENTRAL LIMIT THEOREM (CLT) => the distribution of sample means approaches a normal distribution as the sample size grows, whatever the shape of the population distribution.

Law of Large Numbers -> as the number of samples increases, the sample mean converges to the population mean.
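A minimal NumPy demonstration of both ideas, using a deliberately skewed (exponential) population with mean 2 (parameters chosen only for illustration):

```python
# CLT: means of samples from a skewed population look normal;
# LLN: the average of the sample means converges to the population mean.
import numpy as np

rng = np.random.default_rng(42)
# 2000 samples of size 100 from an exponential population (true mean = 2.0)
samples = rng.exponential(scale=2.0, size=(2_000, 100))
sample_means = samples.mean(axis=1)

print(sample_means.mean())  # close to the population mean, ~2.0
print(sample_means.std())   # close to sigma/sqrt(n) = 2/10 = 0.2
```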

Impute (fill null values) using mean, median, or mode.

The median is not sensitive to outliers, so when outliers are present, fill missing values with the median of the dataset.
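Median imputation in a few lines of NumPy (the price array and its outlier are invented for illustration):

```python
# Median imputation: fill nulls with the median, which is robust to outliers.
import numpy as np

prices = np.array([10.0, 12.0, np.nan, 11.0, 9.0, np.nan, 500.0])  # 500 is an outlier

median_price = np.nanmedian(prices)   # 11.0 - barely moved by the outlier
mean_price = np.nanmean(prices)       # 108.4 - dragged up by the outlier
filled = np.where(np.isnan(prices), median_price, prices)

print(median_price, mean_price)
print(filled)
```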

Probability Distribution -> Discrete, Continuous.

A uniform distribution implies that all outcomes within a specific range have an equal probability of occurring, while a normal distribution (also known as the Gaussian distribution) describes a probability distribution where most data points cluster around the mean, tapering off symmetrically toward the extremes.
Distribution types:
Discrete: Binomial, Bernoulli
Continuous: Normal, Uniform

Transformation -> if data is skewed we can't reliably apply many ML models to it; transform skewed data toward normal (skewed -> normal), e.g. POWER TRANSFORMATION.

FIVE NUMBER SUMMARY -> Min, Q1, Median, Q3, Max
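The five-number summary is just percentiles; a quick NumPy version (the data values are an arbitrary example):

```python
# Five-number summary via percentiles (the basis of a box plot).
import numpy as np

data = np.array([7, 15, 36, 39, 40, 41])

summary = {
    "min": data.min(),
    "Q1": np.percentile(data, 25),
    "median": np.median(data),
    "Q3": np.percentile(data, 75),
    "max": data.max(),
}
print(summary)
```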

H/W = Hypothesis testing , Traditional Programming vs Machine learning.

Types of ML -> supervised, unsupervised, reinforcement.

By training style -> batch learning, online learning.

Problems we face in ML -> missing values, bias, imbalance, choosing the right algorithm, getting labelled data

ML lifecycle -> business understanding, data collection, model selection, training, evaluation, deployment.

Data drift(tomorrow H/W)

When to use ML vs DL -> if only limited data points are available then ML is preferred, else DL is used.

Feature Engineering -> Better/Accurate Performance.

Techniques: imputation, encoding, scaling, normalization of data, binning (grouping values into bins or buckets)

In feature engineering -> feature construction, feature transformation, feature selection (important), feature extraction.

Feature scaling -> normalizes or standardizes numerical features to a specific range or distribution.
Standardization: transforms features to have a mean of 0 and a standard deviation of 1, AKA z-score.
Normalization: min-max, max-abs, mean normalization, robust scaling
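Both formulas written out by hand with NumPy (scikit-learn's StandardScaler and MinMaxScaler apply the same arithmetic; the sample array is invented):

```python
# Standardization (z-score) vs min-max normalization.
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

z = (x - x.mean()) / x.std()                  # mean 0, std 1
minmax = (x - x.min()) / (x.max() - x.min())  # rescaled to [0, 1]

print(z)
print(minmax)  # [0.   0.25 0.5  0.75 1.  ]
```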

Data -> Numerical; Categorical -> nominal, ordinal

ENCODING -> converts textual/categorical data to numerical data, since ML can't process text; e.g. as a matrix of indicator columns. For n category values, consider only n-1 columns in the matrix (the dropped value is implied when all columns are 0).
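A minimal pure-Python sketch of n-1 dummy encoding (the color values are made up; libraries like pandas `get_dummies(drop_first=True)` do the same thing):

```python
# One-hot (dummy) encoding with n-1 columns: for n categories keep n-1
# indicator columns; the dropped category is implied when all are 0.
colors = ["red", "green", "blue", "green", "red"]

categories = sorted(set(colors))   # ['blue', 'green', 'red']
kept = categories[1:]              # drop the first -> n-1 columns

encoded = [[1 if c == k else 0 for k in kept] for c in colors]
print(kept)     # ['green', 'red']
print(encoded)  # 'blue' rows come out as [0, 0]
```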

scikit-learn (imported as sklearn).

How outliers come in -> by mistake at the time of collecting the data.

Why outliers? -> statistical variation, bias

Techniques to detect outliers -> IQR (interquartile range): iqr = q3 - q1, lowerbound = q1 - 1.5*iqr, upperbound = q3 + 1.5*iqr; any value less than the lower bound or greater than the upper bound is an outlier. Also: Z-scores, sorting, graphing, scatter plot (visual way).

How outliers are treated -> imputation, removal, transformation, capping, binning.

Curse of Dimensionality -> as the number of features increases, it becomes difficult for ML to find patterns.

Problems when many features are present -> overfitting, time complexity, performance decrease (inaccurate predictions)

Fix for having many features -> dimensionality-reduction techniques (PCA for linear data, t-SNE for non-linear data).

Feature Selection -> filter (e.g. VIF), wrapper, embedded (e.g. random forest)
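The PCA idea can be sketched with plain NumPy on synthetic, nearly one-dimensional 2-D data (sklearn.decomposition.PCA wraps the same linear algebra; all numbers below are illustrative):

```python
# PCA sketch: project centered data onto the top eigenvectors of its covariance matrix.
import numpy as np

rng = np.random.default_rng(1)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + 0.05 * rng.normal(size=200)])  # ~1-D data in 2-D

Xc = X - X.mean(axis=0)                  # center the features
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc.T))
order = np.argsort(eigvals)[::-1]        # sort components by variance, descending
explained = eigvals[order] / eigvals.sum()

X1 = Xc @ eigvecs[:, order[:1]]          # project onto the first component
print(explained)                         # first component carries almost all variance
print(X1.shape)                          # (200, 1)
```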

23/5/25

LINEAR REGRESSION types: Simple (one independent column & one dependent column), Multiple (several independent columns & one dependent column), Polynomial (polynomial terms of the independent columns)

Assumptions of Linear Regression:

Linearity => if one variable changes, the other changes proportionally; scatter plots can be used to check this
Normality => the residuals follow a normal distribution; quantile (Q-Q) plots can check this
Independence => errors are independent of each other; ACF plots, ARIMA (Autoregressive Integrated Moving Average) for time-series data
Homoscedasticity (same variance) => the variance of the error terms (residuals) should be consistent across all levels of the independent variables

To remove non-linearity -> transformations (power transformations, mathematical transformations such as logarithmic)
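A log transform is the simplest example of this idea (power transforms like Box-Cox / Yeo-Johnson, e.g. sklearn's PowerTransformer, generalize it). The lognormal data and the plain-NumPy skewness helper below are illustrative:

```python
# A log transform pulls in the long right tail of skewed data.
import numpy as np

rng = np.random.default_rng(7)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)  # strongly right-skewed

transformed = np.log(skewed)  # log of lognormal data is (approximately) normal

def skewness(a):
    # population skewness: third standardized moment
    d = a - a.mean()
    return (d ** 3).mean() / a.std() ** 3

print(skewness(skewed))       # large and positive
print(skewness(transformed))  # near zero
```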

How to solve linear regression?

1) Closed-form solution -> OLS (Ordinary Least Squares) uses a mathematical formula (libraries used are statsmodels and scikit-learn)
2) Non-closed-form solution -> Gradient Descent uses iterative approximation

(apply statsmodels, and multiple & polynomial regression)

Simple Linear Regression Model working: y = mx + c

a) calculate x-bar and y-bar (the means of x and y)
b) calculate m and c: m is the slope, m = sum((x - xbar)*(y - ybar)) / sum((x - xbar)^2); c is the intercept, c = ybar - m*xbar
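The closed-form steps above in NumPy, on a small invented dataset that roughly follows y = 2x + 1:

```python
# Closed-form simple linear regression: m = Sxy/Sxx, c = ybar - m*xbar.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

x_bar, y_bar = x.mean(), y.mean()                               # step a
m = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)  # step b: slope
c = y_bar - m * x_bar                                           # step b: intercept

print(m, c)  # 1.97 1.09 - close to the true slope 2 and intercept 1
```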

Gradient Descent -> start with a guess, calculate the error, calculate the gradient (go through videos), update the values of m and b, repeat until the cost function is minimized.
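The loop above as a minimal sketch (the data, learning rate, and iteration count are arbitrary choices for illustration):

```python
# Gradient descent for y = m*x + b with an MSE cost: guess, gradient, update, repeat.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2 * x + 1                  # true line: m=2, b=1

m, b = 0.0, 0.0                # start with a guess
lr = 0.02                      # learning rate

for _ in range(5_000):
    error = (m * x + b) - y    # calculate the error
    grad_m = 2 * np.mean(error * x)   # d(MSE)/dm
    grad_b = 2 * np.mean(error)       # d(MSE)/db
    m -= lr * grad_m           # update m and b, then repeat
    b -= lr * grad_b

print(m, b)  # converges toward m=2, b=1
```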

Gradient descent has three main types: batch, stochastic, and mini-batch.

Evaluation Metrics for Regression: use "performance" instead of "accuracy" (accuracy applies only to classification problems).
Mean Absolute Error (MAE),
Mean Squared Error (MSE) - gives the result in squared units,
Root Mean Squared Error (RMSE) - root of MSE,
R-squared (Coefficient of Determination),
Mean Absolute Percentage Error (MAPE),
Adjusted R-squared - penalizes adding features that don't improve the model
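The first four metrics computed by hand (sklearn.metrics offers the same quantities as ready-made functions; the true/predicted values are invented):

```python
# MAE, MSE, RMSE, and R-squared for regression, written out by hand.
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.0])

mae = np.mean(np.abs(y_true - y_pred))          # average absolute error
mse = np.mean((y_true - y_pred) ** 2)           # squared units
rmse = np.sqrt(mse)                             # back to original units
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot                        # fraction of variance explained

print(mae, mse, rmse, r2)  # 0.25 0.125 ~0.354 0.975
```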
