Day 4

Uploaded by

CYRIAC JOSE MBA

Predictive Analytics using Python and Orange by Dr. Sreejith R

Data Exploration & Analysis

Simplest form - reports, charts: what has happened

Advanced form - mathematical models for forecasting and for predicting behavioural
patterns and trends.
Options for the future: what will happen, what should we do

AI -> ML -> Deep Learning


Supervised, unsupervised and reinforcement learning
Supervised learning - Eg: telling Tom & Jerry pics apart after training on labelled pics
Unsupervised learning - Eg: grouping unlabelled humans & animals pics into clusters
Reinforcement learning - Eg: dynamic pricing in ecommerce, Deep Q
(huge scope for paper publications)

https://www.labellerr.com/blog/supervised-vs-unsupervised-learning-whats-the-difference/

ML: Categorization
1) Parametric algorithms: make strong assumptions about the form of the relationship
(Eg: linear regression, logistic regression, naive Bayes, linear SVM)
2) Non-parametric algorithms: do not make strong assumptions (Eg: decision tree,
k-nearest neighbours, random forest, support vector machine with a non-linear kernel)

Multi-Linear Regression
Used to model the relationship between multiple independent variables and a
dependent variable.
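
The relationship described above is usually written in the standard textbook form below. The symbols are assumptions of this sketch and not notation from the session: y is the dependent variable, x_1 ... x_k are the independent variables, the betas are the coefficients to be estimated, and epsilon is the error term.

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon
```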

Validation Rules
- Logical relationship between the variables
- Adjusted R-squared value > 0.7 (70% model fit)
- Individual p-value of each variable < 0.05
- Model's overall p-value < 0.05
- Residuals from the model should be normally distributed.
- Check for multicollinearity between independent variables. (Eg: program
coordinator influencing the judge; remove either the coordinator or the judge) (Eg:
choosing between 80% fit with 3 variables and 85% fit with 5 variables)
- Evaluate the size of the residuals (root mean square (RMS) value).
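
Three of the checks above (adjusted R-squared, the RMS value of the residuals, and a simple multicollinearity screen) can be sketched in plain Python. This is a minimal illustration, not the lab code: the function names and the sample numbers are hypothetical, and in practice a statistics library would report these values alongside the p-values.

```python
import math

def rmse(actual, predicted):
    """Root mean square of the residuals (actual - predicted)."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def adjusted_r_squared(actual, predicted, n_predictors):
    """R-squared penalised for the number of independent variables."""
    n = len(actual)
    mean_actual = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)

def pearson_r(x, y):
    """Correlation between two independent variables.

    A value close to +1 or -1 suggests the two variables carry
    overlapping information (possible multicollinearity).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values, for illustration only.
actual = [10.0, 12.0, 15.0, 18.0, 20.0, 24.0]
predicted = [10.5, 11.5, 15.5, 17.0, 21.0, 23.5]

print("RMSE:", rmse(actual, predicted))
print("Adjusted R-squared:", adjusted_r_squared(actual, predicted, n_predictors=2))
```

Comparing the adjusted value for a 3-variable model against a 5-variable model is one way to judge whether the extra variables are worth keeping.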

------------Python used for---------

a) Web Development - Django, Flask, Pyramid
b) Data Analysis - NumPy, pandas, sklearn
c) Data Visualization - Matplotlib, seaborn
d) Machine Learning - scikit-learn, NLTK and TensorFlow
e) Computer Vision - OpenCV
f) Web Scraping - BeautifulSoup, Scrapy, urllib, Python Selenium Eg:
Glassdoor (CSV, XML, XLS, SQL)
g) Browser Automation - Selenium with Python
h) Writing Scripts - emails, automating repetitive stuff.
i) ML Projects - embedded systems - Raspberry Pi
j) GUI - Tkinter, PyQt etc
k) Game Development - Pygame

LAB
Eg: using data set

Loan-amt - CSV

Get Ref Books -> Read Data -> Data preprocessing (missing values) -> Split data into
test & train -> Model fit -> Test & Evaluate
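
The read, preprocess, split, fit and evaluate steps above can be sketched end to end with only the standard library. Everything specific here is an assumption made for illustration: the inline CSV text stands in for the lab file, the column names `income` and `loan_amt` are hypothetical, and a one-variable least-squares line replaces whatever model the lab actually fitted.

```python
import csv
import io
import math
import random

# Hypothetical stand-in for the lab CSV; the column names are assumptions.
RAW_CSV = """income,loan_amt
30,90
40,
50,150
60,180
70,210
80,240
90,270
100,300
110,330
120,360
"""

def read_rows(text):
    """Read data: parse CSV text into (income, loan_amt); blanks become None."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        income = float(rec["income"]) if rec["income"] else None
        loan = float(rec["loan_amt"]) if rec["loan_amt"] else None
        rows.append((income, loan))
    return rows

def fill_missing(rows):
    """Preprocessing: replace each missing value with its column mean."""
    cols = list(zip(*rows))
    means = [
        sum(v for v in col if v is not None) / sum(v is not None for v in col)
        for col in cols
    ]
    return [tuple(m if v is None else v for v, m in zip(row, means)) for row in rows]

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Split: shuffle reproducibly, then hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_line(pairs):
    """Model fit: least-squares slope and intercept for y = slope * x + intercept."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, my - slope * mx

def rmse(pairs, slope, intercept):
    """Evaluate: root mean square error of the predictions on the given pairs."""
    errors = [(y - (slope * x + intercept)) ** 2 for x, y in pairs]
    return math.sqrt(sum(errors) / len(pairs))

rows = fill_missing(read_rows(RAW_CSV))
train, test = train_test_split(rows)
slope, intercept = fit_line(train)
print(f"loan_amt ~ {slope:.2f} * income + {intercept:.2f}")
print("test RMSE:", rmse(test, slope, intercept))
```

The model is fitted on the training rows only, so the test RMSE reflects rows the model has not seen.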

--------------------------------------------------------------------------
Logistic Regression
- prediction is a probability (binary outcome - 1/0 or true/false)
- probability of occurrence
- sigmoid equation
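
The sigmoid equation is sigma(z) = 1 / (1 + e^(-z)), which maps any real number to a value between 0 and 1. The sketch below shows how it turns a linear score into a probability and then into a binary outcome. The single-feature model, the `weight` and `bias` parameters, and the 0.5 threshold are assumptions made for illustration.

```python
import math

def sigmoid(z):
    """Map any real number z to a probability between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, this equivalent form avoids overflow in exp().
    e = math.exp(z)
    return e / (1.0 + e)

def predict_probability(x, weight, bias):
    """Probability of the outcome being 1 for a single feature x."""
    return sigmoid(weight * x + bias)

def predict_class(x, weight, bias, threshold=0.5):
    """Binary outcome: 1 if the probability reaches the threshold, else 0."""
    return 1 if predict_probability(x, weight, bias) >= threshold else 0

# Hypothetical coefficients, for illustration only.
print(predict_probability(2.0, weight=1.5, bias=-1.0))
print(predict_class(2.0, weight=1.5, bias=-1.0))
```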

Validation Rules
- Confusion matrix metrics
-- Accuracy
-- Precision (Eg: Supreme Court - no innocent should be punished even if 100
criminals escape)
-- Recall (Eg: Covid case - a false alarm is okay, but actual Covid cases should not be
missed)
-- F1-Score
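
The four metrics above all come from the confusion matrix counts (true/false positives and negatives). A minimal sketch in plain Python, assuming labels are coded 1 for the positive class and 0 for the negative class; the function names and the sample labels are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Return (tp, fp, fn, tn) for binary labels coded as 1 and 0."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn

def classification_metrics(actual, predicted):
    """Accuracy, precision, recall and F1 from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(actual, predicted)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Precision: of everything flagged positive, how much really was positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything really positive, how much was caught.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical labels, for illustration only.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(confusion_counts(actual, predicted))
print(classification_metrics(actual, predicted))
```

In the court example a false positive (punishing an innocent) is the costly error, so precision matters most; in the Covid example a false negative (a missed case) is the costly error, so recall matters most.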

Categorical variable
One hot encoding
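
One-hot encoding replaces a categorical column with one 0/1 column per category, so that a regression model can use it as numeric input. A minimal sketch in plain Python; the function name and the category values are hypothetical, and data libraries offer ready-made equivalents.

```python
def one_hot_encode(values):
    """Return (categories, rows): each row has a 1 in its category's column."""
    categories = sorted(set(values))
    rows = [[1 if value == cat else 0 for cat in categories] for value in values]
    return categories, rows

# Hypothetical categorical column, for illustration only.
property_area = ["urban", "rural", "semiurban", "rural"]
categories, encoded = one_hot_encode(property_area)
print(categories)
for row in encoded:
    print(row)
```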

--------------------------------------------------------------------------
Classification
Decision Tree (Non Parametric Method)
-supervised ML algorithm
-Rule based method
-Splitting Node
---Information Gain(Entropy)
---Gini Index
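
Both splitting criteria measure how mixed the class labels in a node are; a split is better when the child nodes are purer than the parent. A minimal sketch in plain Python, assuming each node is given as a list of class labels. The function names and the sample labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits: 0 for a pure node, 1 for a 50/50 binary node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index: 0 for a pure node, 0.5 for a 50/50 binary node."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical labels at one node and after one candidate split.
parent = ["yes", "yes", "yes", "no", "no", "no"]
left, right = ["yes", "yes", "yes", "no"], ["no", "no"]
print("entropy:", entropy(parent))
print("gini:", gini(parent))
print("information gain:", information_gain(parent, [left, right]))
```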

--------------------------------------------------------------------------
ORANGE
