Machine Learning Notes


MACHINE LEARNING 19/5/25

STATISTICS

TYPES -> DESCRIPTIVE (describes the data) & INFERENTIAL (applies operations on sample data to predict outcomes for the whole population)

DESCRIPTIVE 4 WAYS -> MEASURES OF DISPERSION (standard deviation, variance), MEASURES OF POSITION (percentiles split data into 100 parts, deciles into 10, quartiles into 4; quantiles split data into equal parts at any chosen cut points), CENTRAL TENDENCY (mean, median, mode), FREQUENCY

INFERENCE -> a sample is a subset of the population; inferential statistics takes a sample dataset to predict the outcome for the whole population.

Data types-> Quantitative , Qualitative

Quantitative ->
Discrete is countable; Continuous is measurable (e.g. product price)

Qualitative -> category

Sampling techniques -> Probabilistic and Non-probabilistic

Probabilistic -> simple random, stratified, systematic, cluster. Every member's chance of being selected is the same.
Non-probabilistic -> chances of being selected are not equal. Convenience, snowball, consecutive, quota

Another way to classify data -> structured (e.g. CSV), semi-structured, unstructured.

Sampling bias -> arises when the sampling method makes some members of the population more likely to be selected than others, so the sample does not represent the population.

Skewness & kurtosis -> a distribution can be normal, right-skewed, or left-skewed.

right (positive) skew: mode < median < mean; left (negative) skew: mean < median < mode

SciPy -> Scientific Python

A distribution may be highly skewed, moderately skewed, or approximately symmetric (normal).

Kurtosis measures how sharp/heavy-tailed the peak is: leptokurtic (sharp), mesokurtic (normal), platykurtic (flat)

empirical rule (68-95-99.7) -> for a normal distribution, about 68% of the data falls within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3.
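The empirical rule can be checked quickly with NumPy on simulated data (the distribution parameters below are arbitrary, just for illustration):

```python
# Empirical (68-95-99.7) rule check on simulated normal data.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=10, size=100_000)

mean, std = data.mean(), data.std()
for k, expected in [(1, 0.68), (2, 0.95), (3, 0.997)]:
    # fraction of points within k standard deviations of the mean
    within = np.mean(np.abs(data - mean) <= k * std)
    print(f"within {k} sd: {within:.3f} (rule says ~{expected})")
```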

21/05/2025

Variance, Standard Deviation (mostly preferred), covariance (captures direction only), correlation (relation between two variables, captures both direction and strength). A correlation of 0 between x and y means no linear relationship; +1 means highly positively correlated; -1 means highly negatively correlated.
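The covariance-vs-correlation distinction can be seen with NumPy (the small arrays below are made up for illustration):

```python
# Covariance captures direction only; correlation adds strength on a [-1, 1] scale.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2 * x + 1                              # perfectly linear in x
z = np.array([5.0, 3.0, 4.0, 1.0, 2.0])   # roughly moves opposite to x

cov_xy = np.cov(x, y)[0, 1]        # positive, but on an unbounded scale
r_xy = np.corrcoef(x, y)[0, 1]     # exactly +1: perfect positive correlation
r_xz = np.corrcoef(x, z)[0, 1]     # negative: strong inverse relationship

print(cov_xy, r_xy, r_xz)  # 5.0 1.0 -0.8
```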

right (positive) skew: mean > median > mode

CENTRAL LIMIT THEOREM (CLT) => the distribution of sample means approaches a normal distribution as the sample size grows, whatever the shape of the population distribution.

Law of Large Numbers -> as the number of samples increases, the sample mean converges to the population mean.
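A minimal NumPy demonstration of both ideas, using a deliberately skewed (exponential) population with mean 2 (parameters chosen only for illustration):

```python
# CLT: means of samples from a skewed population look normal;
# LLN: the average of the sample means converges to the population mean.
import numpy as np

rng = np.random.default_rng(42)
# 2000 samples of size 100 from an exponential population (true mean = 2.0)
samples = rng.exponential(scale=2.0, size=(2_000, 100))
sample_means = samples.mean(axis=1)

print(sample_means.mean())  # close to the population mean, ~2.0
print(sample_means.std())   # close to sigma/sqrt(n) = 2/10 = 0.2
```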

Impute (fill null values) using mean, median, or mode.

The median is not sensitive to outliers, so when outliers are present, fill missing values with the median of the dataset.
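Median imputation in a few lines of NumPy (the price array and its outlier are invented for illustration):

```python
# Median imputation: fill nulls with the median, which is robust to outliers.
import numpy as np

prices = np.array([10.0, 12.0, np.nan, 11.0, 9.0, np.nan, 500.0])  # 500 is an outlier

median_price = np.nanmedian(prices)   # 11.0 - barely moved by the outlier
mean_price = np.nanmean(prices)       # 108.4 - dragged up by the outlier
filled = np.where(np.isnan(prices), median_price, prices)

print(median_price, mean_price)
print(filled)
```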

Probability Distribution -> Discrete, Continuous.

A uniform distribution implies that all outcomes within a specific range have an equal probability of occurring, while a normal distribution (also known as the Gaussian distribution) describes a probability distribution where most data points cluster around the mean, tapering off symmetrically toward the extremes.
Distribution types:
Discrete: Binomial, Bernoulli
Continuous: Normal, Uniform

Transformation -> if data is skewed we can't reliably apply many ML models to it; transform skewed data toward normal (skewed -> normal), e.g. POWER TRANSFORMATION.

FIVE NUMBER SUMMARY -> Min, Q1, Median, Q3, Max
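The five-number summary is just percentiles; a quick NumPy version (the data values are an arbitrary example):

```python
# Five-number summary via percentiles (the basis of a box plot).
import numpy as np

data = np.array([7, 15, 36, 39, 40, 41])

summary = {
    "min": data.min(),
    "Q1": np.percentile(data, 25),
    "median": np.median(data),
    "Q3": np.percentile(data, 75),
    "max": data.max(),
}
print(summary)
```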

H/W = Hypothesis testing , Traditional Programming vs Machine learning.

Types of ML -> supervised, unsupervised, reinforcement.

By training style -> batch learning, online learning.

Problems we face in ML -> missing values, bias, imbalance, choosing the right algorithm, getting labelled data

ML lifecycle -> business understanding, data collection, model selection, training, evaluation, deployment.

Data drift(tomorrow H/W)

When to use ML vs DL -> if only limited data points are available then ML is preferred, else DL is used.

Feature Engineering -> Better/Accurate Performance.

Techniques: imputation, encoding, scaling, normalization of data, binning (grouping values into bins or buckets)

In feature engineering -> feature construction, feature transformation, feature selection (important), feature extraction.

Feature scaling -> normalizes or standardizes numerical features to a specific range or distribution.
Standardization: transforms features to have a mean of 0 and a standard deviation of 1, AKA z-score.
Normalization: min-max, max-abs, mean normalization, robust scaling
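Both formulas written out by hand with NumPy (scikit-learn's StandardScaler and MinMaxScaler apply the same arithmetic; the sample array is invented):

```python
# Standardization (z-score) vs min-max normalization.
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

z = (x - x.mean()) / x.std()                  # mean 0, std 1
minmax = (x - x.min()) / (x.max() - x.min())  # rescaled to [0, 1]

print(z)
print(minmax)  # [0.   0.25 0.5  0.75 1.  ]
```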

Data -> Numerical; Categorical -> nominal, ordinal

ENCODING -> converts textual/categorical data to numerical data, since ML can't process text; e.g. as a matrix of indicator columns. For n category values, consider only n-1 columns in the matrix (the dropped value is implied when all columns are 0).
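A minimal pure-Python sketch of n-1 dummy encoding (the color values are made up; libraries like pandas `get_dummies(drop_first=True)` do the same thing):

```python
# One-hot (dummy) encoding with n-1 columns: for n categories keep n-1
# indicator columns; the dropped category is implied when all are 0.
colors = ["red", "green", "blue", "green", "red"]

categories = sorted(set(colors))   # ['blue', 'green', 'red']
kept = categories[1:]              # drop the first -> n-1 columns

encoded = [[1 if c == k else 0 for k in kept] for c in colors]
print(kept)     # ['green', 'red']
print(encoded)  # 'blue' rows come out as [0, 0]
```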

scikit-learn (imported as sklearn).

How outliers come in -> by mistake at the time of collecting the data.

Why outliers? -> statistical variation, bias

Techniques to detect outliers -> IQR (interquartile range): iqr = q3 - q1, lowerbound = q1 - 1.5*iqr, upperbound = q3 + 1.5*iqr; any value less than the lower bound or greater than the upper bound is an outlier. Also: Z-scores, sorting, graphing, scatter plot (visual way).

How outliers are treated -> imputation, removal, transformation, capping, binning.

Curse of Dimensionality -> as the number of features increases, it becomes difficult for ML to find patterns.

Problems when many features are present -> overfitting, time complexity, performance decrease (inaccurate predictions)

Fix for having many features -> dimensionality-reduction techniques (PCA for linear data, t-SNE for non-linear data).

Feature Selection -> filter (e.g. VIF), wrapper, embedded (e.g. random forest)
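The PCA idea can be sketched with plain NumPy on synthetic, nearly one-dimensional 2-D data (sklearn.decomposition.PCA wraps the same linear algebra; all numbers below are illustrative):

```python
# PCA sketch: project centered data onto the top eigenvectors of its covariance matrix.
import numpy as np

rng = np.random.default_rng(1)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + 0.05 * rng.normal(size=200)])  # ~1-D data in 2-D

Xc = X - X.mean(axis=0)                  # center the features
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc.T))
order = np.argsort(eigvals)[::-1]        # sort components by variance, descending
explained = eigvals[order] / eigvals.sum()

X1 = Xc @ eigvecs[:, order[:1]]          # project onto the first component
print(explained)                         # first component carries almost all variance
print(X1.shape)                          # (200, 1)
```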

23/5/25

LINEAR REGRESSION types: Simple (one independent column & one dependent column), Multiple (several independent columns & one dependent column), Polynomial (polynomial terms of the independent columns)

Assumptions of Linear Regression:

Linearity => if one variable changes, the other changes proportionally; scatter plots can be used to check this
Normality => the residuals follow a normal distribution; quantile (Q-Q) plots can check this
Independence => errors are independent of each other; ACF plots, ARIMA (Autoregressive Integrated Moving Average) for time-series data
Homoscedasticity (same variance) => the variance of the error terms (residuals) should be consistent across all levels of the independent variables

To remove non-linearity -> transformations (power transformations, mathematical transformations such as logarithmic)
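A log transform is the simplest example of this idea (power transforms like Box-Cox / Yeo-Johnson, e.g. sklearn's PowerTransformer, generalize it). The lognormal data and the plain-NumPy skewness helper below are illustrative:

```python
# A log transform pulls in the long right tail of skewed data.
import numpy as np

rng = np.random.default_rng(7)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)  # strongly right-skewed

transformed = np.log(skewed)  # log of lognormal data is (approximately) normal

def skewness(a):
    # population skewness: third standardized moment
    d = a - a.mean()
    return (d ** 3).mean() / a.std() ** 3

print(skewness(skewed))       # large and positive
print(skewness(transformed))  # near zero
```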

How to solve linear regression?

1) Closed-form solution -> OLS (Ordinary Least Squares) uses a mathematical formula (libraries used are statsmodels and scikit-learn)
2) Non-closed-form solution -> Gradient Descent uses iterative approximation

(apply statsmodels, and multiple & polynomial regression)

Simple Linear Regression Model working: y = mx + c

a) calculate x-bar and y-bar (the means of x and y)
b) calculate m and c: m is the slope, m = sum((x - xbar)*(y - ybar)) / sum((x - xbar)^2); c is the intercept, c = ybar - m*xbar
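The closed-form steps above in NumPy, on a small invented dataset that roughly follows y = 2x + 1:

```python
# Closed-form simple linear regression: m = Sxy/Sxx, c = ybar - m*xbar.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

x_bar, y_bar = x.mean(), y.mean()                               # step a
m = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)  # step b: slope
c = y_bar - m * x_bar                                           # step b: intercept

print(m, c)  # 1.97 1.09 - close to the true slope 2 and intercept 1
```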

Gradient Descent -> start with a guess, calculate the error, calculate the gradient (go through videos), update the values of m and b, repeat until the cost function is minimized.
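The loop above as a minimal sketch (the data, learning rate, and iteration count are arbitrary choices for illustration):

```python
# Gradient descent for y = m*x + b with an MSE cost: guess, gradient, update, repeat.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2 * x + 1                  # true line: m=2, b=1

m, b = 0.0, 0.0                # start with a guess
lr = 0.02                      # learning rate

for _ in range(5_000):
    error = (m * x + b) - y    # calculate the error
    grad_m = 2 * np.mean(error * x)   # d(MSE)/dm
    grad_b = 2 * np.mean(error)       # d(MSE)/db
    m -= lr * grad_m           # update m and b, then repeat
    b -= lr * grad_b

print(m, b)  # converges toward m=2, b=1
```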

Gradient descent has three main types: batch, stochastic, and mini-batch.

Evaluation Metrics for Regression: use "performance" instead of "accuracy" (accuracy applies only to classification problems).
Mean Absolute Error (MAE),
Mean Squared Error (MSE) - gives the result in squared units,
Root Mean Squared Error (RMSE) - root of MSE,
R-squared (Coefficient of Determination),
Mean Absolute Percentage Error (MAPE),
Adjusted R-squared - penalizes adding features that don't improve the model
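The first four metrics computed by hand (sklearn.metrics offers the same quantities as ready-made functions; the true/predicted values are invented):

```python
# MAE, MSE, RMSE, and R-squared for regression, written out by hand.
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.0])

mae = np.mean(np.abs(y_true - y_pred))          # average absolute error
mse = np.mean((y_true - y_pred) ** 2)           # squared units
rmse = np.sqrt(mse)                             # back to original units
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot                        # fraction of variance explained

print(mae, mse, rmse, r2)  # 0.25 0.125 ~0.354 0.975
```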
