Day 4
Sreejith R
https://www.labellerr.com/blog/supervised-vs-unsupervised-learning-whats-the-difference/
ML : Categorization
1) Parametric Algorithms:
2) Non-Parametric Algorithms:
Multiple Linear Regression
Used to model the relationship between two or more independent variables and a
single dependent variable.
Validation Rules
- There should be a logical relationship between the variables.
- Adjusted R-squared value > 0.7 (70% model fit).
- p-value of each individual variable < 0.05.
- Overall p-value of the model < 0.05.
- Residuals from the model should be normally distributed.
- Check for multicollinearity between the independent variables. (Eg: if the program
coordinator influences the judge, keep only one of them: remove the coordinator or
remove the judge.) (Eg: deciding whether to keep a model with 80% fit using 3
variables or one with 85% fit using 5 variables.)
- Evaluate the size of the residuals (root mean square (RMS) value).
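Two of the checks above (adjusted R-squared and the RMS value of the residuals) are simple enough to compute by hand. The following is a minimal sketch in plain Python. The function names and the sample numbers are made up for illustration and do not come from the lab data set.

```python
import math

def r_squared(y_true, y_pred):
    """Share of the variance in y explained by the model: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(y_true, y_pred, n_predictors):
    """R-squared penalised for the number of independent variables used."""
    n = len(y_true)
    r2 = r_squared(y_true, y_pred)
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)

def rmse(y_true, y_pred):
    """Root mean square of the residuals, in the same units as y."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

# Hypothetical actual vs predicted values from a model with 2 predictors.
y_true = [10.0, 12.0, 15.0, 19.0, 24.0, 30.0]
y_pred = [11.0, 12.0, 14.0, 20.0, 23.0, 30.0]

print("R-squared:         ", round(r_squared(y_true, y_pred), 4))
print("Adjusted R-squared:", round(adjusted_r_squared(y_true, y_pred, 2), 4))
print("RMSE:              ", round(rmse(y_true, y_pred), 4))
```

Because adjusted R-squared falls when an added variable does not improve the fit enough, it is a natural number to compare when choosing between a smaller and a larger set of variables.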
LAB
Eg: using the data set
Loan-amt -csv
Get Ref Books -> Read data -> Data preprocessing (missing values) -> Split data into
train & test -> Model fit -> Test & evaluate
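The workflow above can be sketched end to end as follows. This is only an illustration under stated assumptions: it uses NumPy, and it generates a synthetic stand-in for the loan data, because the columns of the lab CSV are not listed in these notes. The column names `income` and `tenure`, the coefficients, and the 80/20 split are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1) Read data: synthetic stand-in for the loan CSV (hypothetical columns).
n = 200
income = rng.normal(50, 10, n)
tenure = rng.normal(5, 2, n)
loan_amt = 3.0 * income + 4.0 * tenure + 20.0 + rng.normal(0, 5, n)
X = np.column_stack([income, tenure])

# 2) Preprocessing: blank out a few values, then fill them with the column mean.
#    (Stricter practice computes the fill values from the training rows only.)
X[rng.choice(n, 10, replace=False), 0] = np.nan
col_means = np.nanmean(X, axis=0)
X = np.where(np.isnan(X), col_means, X)

# 3) Split the rows into train (80%) and test (20%).
idx = rng.permutation(n)
cut = int(0.8 * n)
train, test = idx[:cut], idx[cut:]

def add_intercept(A):
    """Prepend a column of ones so the model can learn an intercept."""
    return np.column_stack([np.ones(len(A)), A])

# 4) Model fit: ordinary least squares on the training rows.
coef, *_ = np.linalg.lstsq(add_intercept(X[train]), loan_amt[train], rcond=None)

# 5) Test & evaluate on the held-out rows.
pred = add_intercept(X[test]) @ coef
residuals = loan_amt[test] - pred
rmse_test = float(np.sqrt(np.mean(residuals ** 2)))
ss_tot = float(np.sum((loan_amt[test] - loan_amt[test].mean()) ** 2))
r2_test = 1.0 - float(np.sum(residuals ** 2)) / ss_tot

print("coefficients (intercept, income, tenure):", np.round(coef, 2))
print("test RMSE:", round(rmse_test, 2), " test R-squared:", round(r2_test, 3))
```

With the real data set, step 1 would read the CSV instead, and the rest of the steps would keep the same order.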
--------------------------------------------------------------------------
Logistic Regression
- The prediction is a probability, mapped to a binary outcome (1/0 or true/false)
- Probability of occurrence
- Sigmoid equation: p = 1 / (1 + e^(-z)), where z is the linear combination of the inputs
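A small sketch of how the sigmoid turns a linear score into a probability and then into a 1/0 prediction. The weights, the bias, and the 0.5 threshold are illustrative assumptions, not values from the lab.

```python
import math

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(features, weights, bias):
    """Probability of the positive class for one row of features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def predict_class(features, weights, bias, threshold=0.5):
    """Turn the probability into a binary outcome (1 or 0)."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0

# Hypothetical fitted weights for two features.
weights, bias = [0.8, -0.4], -1.0
row = [3.0, 1.0]
print("probability:", round(predict_proba(row, weights, bias), 3))
print("class:", predict_class(row, weights, bias))
```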
Validation Rules
- Confusion matrix metrics
-- Accuracy
-- Precision (Eg: Supreme Court: no innocent person should be punished, even if 100
criminals escape)
-- Recall (Eg: Covid cases: a false alarm is okay, but no Covid case should be
missed)
-- F1-Score
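The four metrics above all come from the same four counts in the confusion matrix. Here is a minimal plain-Python sketch. The labels are made up, and 1 is assumed to be the positive class.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives, true negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Precision: of everything flagged positive, how much really was positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything really positive, how much was caught.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical actual vs predicted labels.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In the court example precision is the number to protect (few false positives), while in the Covid example recall is (few false negatives). F1 balances the two.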
Categorical variable
One hot encoding
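One-hot encoding replaces a categorical column with one 0/1 column per category. A minimal sketch follows. The category values are made up, and the `drop_first` option is an illustrative addition: dropping one column is a common way to avoid perfectly correlated columns in a regression model.

```python
def one_hot_encode(values, drop_first=False):
    """Return (column_names, rows) with one 0/1 column per category."""
    categories = sorted(set(values))
    if drop_first:
        # The dropped category is implied when all remaining columns are 0.
        categories = categories[1:]
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

# Hypothetical categorical column.
employment = ["salaried", "self-employed", "salaried", "student"]

columns, encoded = one_hot_encode(employment)
print(columns)
for row in encoded:
    print(row)
```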
--------------------------------------------------------------------------
Classification
Decision Tree (Non-Parametric Method)
- Supervised ML algorithm
- Rule-based method
- Splitting node
--- Information Gain (Entropy)
--- Gini Index
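Both splitting criteria score how mixed the class labels in a node are. A split is good when the child nodes are purer than the parent. The following is a minimal plain-Python sketch, with made-up labels and a made-up candidate split.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits: 0 for a pure node, 1 for a 50/50 two-class node."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def gini(labels):
    """Gini index: 0 for a pure node, 0.5 for a 50/50 two-class node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical node with 5 "yes" and 5 "no", and one candidate split.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

print("parent entropy:  ", round(entropy(parent), 4))
print("parent gini:     ", round(gini(parent), 4))
print("information gain:", round(information_gain(parent, [left, right]), 4))
```

A tree builder would evaluate this score for every candidate split at a node and pick the split with the highest information gain (or the lowest weighted Gini index).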
--------------------------------------------------------------------------
ORANGE