ML Python

The document provides an overview of machine learning (ML) concepts, techniques, and Python libraries, emphasizing the distinction between supervised and unsupervised learning. Key ML techniques include regression, classification, clustering, and anomaly detection, while popular Python libraries for ML include NumPy, SciPy, and Scikit-learn. It also covers model evaluation methods and specific algorithms like K-Nearest Neighbors, Decision Trees, and Support Vector Machines.

Uploaded by

randa.maazouza

Machine Learning with Python

ML: the science that gives computers the ability to learn without being explicitly programmed.

ML techniques:

• Regression/Estimation: predicting continuous values
• Classification: predicting a class or category
• Clustering: finding the structure of data; summarization
• Associations: finding frequently co-occurring items or events (e.g., products that are always bought together by the same customer)
• Anomaly detection: discovering unusual cases
• Sequence mining: predicting the next events
• Dimension reduction: reducing the size of the data

Python libraries for machine learning:

• NumPy: working with N-dimensional arrays
• SciPy: signal processing, optimization, statistics
• Matplotlib: 2D and 3D plotting
• pandas: importing, manipulating, and analyzing data
• scikit-learn: algorithms and tools for ML

ML pipeline:

Every step of this process is available in scikit-learn.

Supervised learning:

Teach the model by training it on a labeled dataset.

Labeled dataset -> classes

Supervised techniques: classification and regression

Unsupervised learning:

The model trains on the dataset and draws conclusions from unlabeled data.

Unsupervised techniques: dimension reduction, clustering, density estimation, market basket analysis

Regression

Used to predict a continuous value.

Types of variables in regression:

Independent (x: the explanatory variables, the causes of y)

Dependent (y: the state, target, or final goal to study)

Types of regression, according to the number of x variables:

• Simple regression (linear or non-linear), e.g., predict CO2 emissions using engine size

• Multiple regression (linear or non-linear), e.g., predict CO2 emissions using engine size and cylinders

Regression algorithms: Poisson, linear, neural network, decision forest, boosted decision tree, k-nearest neighbors

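Simple Linear Regression

X can be continuous or categorical

Y is always continuous

Y = b + ax

a and b are the parameters to adjust (the coefficients of the fit line): a is the gradient (slope) and b is the intercept

MSE (mean squared error) measures the error of the fit; a and b are chosen to minimize it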
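As a minimal sketch of how a and b can be obtained, the closed-form least-squares estimates are computed below with NumPy. The toy engine-size and CO2 values are invented for illustration and are not taken from the course dataset.

```python
import numpy as np

# Hypothetical toy data (not from the course dataset): y is exactly 1 + 2x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

# Closed-form least-squares estimates for the line y = b + a*x.
a = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - a * x.mean()

# Mean squared error of the fitted line on the same points.
mse = np.mean((y - (b + a * x)) ** 2)
print(a, b, mse)
```

Because the toy points lie exactly on a line, the MSE here is essentially zero; on real data it would be positive.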

Model Evaluation in Regression Models

After building a model, we should evaluate it.

Accuracy of a model: how much we can trust the model (its performance).

Evaluation approaches:

Train and test on the same dataset:

Compare the actual values with the predicted values to measure the accuracy.

Training accuracy: % of correct predictions on data the model was trained on. A high value may indicate overfitting.

Out-of-sample accuracy: % of correct predictions on data that the model has not been trained on.

Train/test split:

The test set is held out from training, so this method gives a more realistic estimate of out-of-sample accuracy.

K-fold cross-validation:

Split the data into K folds; each fold serves once as the test set while the other folds are used for training, and the K results are averaged.
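One possible way to run K-fold cross-validation is with scikit-learn's KFold and cross_val_score, sketched below. The data is synthetic and the coefficients 125 and 39 are arbitrary assumptions chosen only to produce plausible-looking numbers.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data (an assumption for illustration): one feature, noisy linear target.
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 6.0, size=(100, 1))
y = 125.0 + 39.0 * X[:, 0] + rng.normal(0.0, 10.0, size=100)

# 5 folds: each fold is used once as the test set.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
neg_mse = cross_val_score(
    LinearRegression(), X, y, cv=kfold, scoring="neg_mean_squared_error"
)

# scikit-learn reports the negated MSE, so flip the sign back.
mse_per_fold = -neg_mse
print(mse_per_fold, mse_per_fold.mean())
```

Averaging the per-fold errors gives a less split-dependent estimate than a single train/test split.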

Evaluation Metrics in Regression Models

Error: the difference between a data point and the trend line generated by the algorithm.

Metrics: MSE, RMSE (interpretable because it is in the same units as y), RAE, RSE
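The metrics above can be sketched in a few lines of NumPy. The helper name regression_metrics and the sample values are hypothetical; RAE and RSE are taken here as the absolute and squared errors relative to a model that always predicts the mean of y, and MAE is included for comparison.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return common regression error metrics as a dict."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred          # model errors
    baseline = y_true - y_true.mean()    # errors of a predict-the-mean model
    mse = np.mean(residuals ** 2)
    return {
        "MAE": np.mean(np.abs(residuals)),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "RAE": np.sum(np.abs(residuals)) / np.sum(np.abs(baseline)),
        "RSE": np.sum(residuals ** 2) / np.sum(baseline ** 2),
    }

# Hypothetical actual and predicted values.
metrics = regression_metrics([3, 5, 7, 9], [2.5, 5.5, 7.0, 9.0])
print(metrics)
```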

Split the DataFrame cdf into two sets: train (about 80% of the data) to train the model and test (about 20% of the data) to evaluate its performance:

import numpy as np
from sklearn import linear_model

# cdf is assumed to be a pandas DataFrame with ENGINESIZE and CO2EMISSIONS columns
msk = np.random.rand(len(cdf)) < 0.8  # random boolean mask, True for about 80% of rows
train = cdf[msk]   # training set
test = cdf[~msk]   # test set

regr = linear_model.LinearRegression()  # create a linear regression model

# Prepare the training data
train_x = np.asanyarray(train[['ENGINESIZE']])
train_y = np.asanyarray(train[['CO2EMISSIONS']])

regr.fit(train_x, train_y)  # train the model

# Print the coefficient a and the intercept b
print('Coefficient: ', regr.coef_[0][0])   # a
print('Intercept: ', regr.intercept_[0])   # b

# Test the model
test_x = np.asanyarray(test[['ENGINESIZE']])
test_y = np.asanyarray(test[['CO2EMISSIONS']])
test_y_ = regr.predict(test_x)  # predicted CO2 emissions for the test set

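Multiple Linear Regression:

Y = b + a*x1 + c*x2 + …

Y = zX (dot product of z and X)

z = [b, a, c, …]: the weight vector of the regression (the parameters to optimize)

X = [1, x1, x2, …]: the feature vector; with several features the fitted line is called a hyperplane

The objective is to minimize the MSE; to do that, we must find the best z.

How to find z:

Linear algebra (ordinary least squares): exact, but costly in computation time on large datasets

An optimization algorithm (e.g., gradient descent): start with random values for the coefficients, compute the error, and adjust the coefficients iteratively to find the best values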
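Both ways of finding z can be sketched with NumPy. The data, the "true" weights, the learning rate, and the iteration count below are all assumptions picked for illustration; the target is noise-free so that both approaches should recover the same z.

```python
import numpy as np

# Synthetic, noise-free data (assumption): y = 4 + 2*x1 - 1*x2.
rng = np.random.default_rng(0)
features = rng.uniform(0.0, 5.0, size=(200, 2))
X = np.column_stack([np.ones(len(features)), features])  # leading 1 for the intercept
true_z = np.array([4.0, 2.0, -1.0])
y = X @ true_z

# Approach 1: linear algebra (least-squares solve).
z_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Approach 2: gradient descent on the MSE, starting from random coefficients.
z_gd = rng.normal(size=3)
learning_rate = 0.01
for _ in range(20000):
    gradient = 2.0 / len(y) * X.T @ (X @ z_gd - y)  # derivative of MSE w.r.t. z
    z_gd -= learning_rate * gradient

print(z_ols, z_gd)
```

The least-squares solve is direct; gradient descent needs a suitable learning rate and enough iterations, but scales better to very large datasets.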

Chapter 3: Classification

A supervised learning approach

Learns the relationship between the items and their class labels

The target attribute in classification is a categorical variable with discrete values

Classification algorithms: decision trees, k-nearest neighbors, logistic regression, neural networks

K-Nearest Neighbors (KNN)

• Find the closest cases and assign the same class label to our case
• The distance between two points is a measure of their dissimilarity; it can be calculated with the Euclidean distance

How does it work?

• Pick a value for k
• Calculate the distance of the unknown case from all other cases in the dataset
• Select the k observations nearest to the unknown case
• Predict the result using the most popular response value among the k nearest neighbors

How to calculate the distance of the unknown case from all other cases in the dataset?

• Euclidean distance: dis(p, q) = sqrt(Σ (p_i - q_i)^2)
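The KNN steps above can be sketched in plain Python. The function names and the six sample points are hypothetical and only meant to show the mechanics.

```python
import math
from collections import Counter

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_predict(train_points, train_labels, query, k):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    # Distance from the unknown case to every known case, sorted ascending.
    distances = sorted(
        (euclidean(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Keep the k nearest observations and take the most common label.
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical 2-D dataset with two classes.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(points, labels, (2.0, 2.0), k=3))
```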

What is the best value of k for KNN?

k is the number of nearest neighbors to examine

If k is very low: the model captures noise, giving bad predictions -> overfitting

If k is very high: the model becomes overly generalized

To find the best k:

• Reserve a part of the data for testing the accuracy of the model
• Start with k = 1 and calculate the accuracy of prediction using the test set
• Repeat the process, increasing k, and keep the k that gives the best accuracy
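The search for k described above might look like the following with scikit-learn. The Iris dataset, the 80/20 split, and the range 1 to 10 are assumptions made for the sake of a runnable example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Reserve part of the data for testing (dataset choice is an assumption).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=4
)

# Start at k = 1 and record the test accuracy for each k.
accuracy_by_k = {}
for k in range(1, 11):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    accuracy_by_k[k] = model.score(X_test, y_test)

# Keep the k with the highest test accuracy.
best_k = max(accuracy_by_k, key=accuracy_by_k.get)
print(best_k, accuracy_by_k[best_k])
```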

KNN can also be used to predict continuous values (the average of the nearest neighbors' values is used as the predicted value of the new case)

Evaluation Metrics in Classification:

These metrics describe the performance of a model.

Compare the values in the test set with the values predicted by the model.

Jaccard index

(y: actual labels, y': predicted labels)

The Jaccard index is the size of the intersection divided by the size of the union of the two label sets: J(y, y') = |y ∩ y'| / |y ∪ y'|
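A small sketch of this definition in plain Python, reading the intersection as the positions where actual and predicted labels agree and the union as |y| + |y'| minus that count. The function name and labels are hypothetical. Note that scikit-learn's jaccard_score is computed per class, so its value can differ from this reading.

```python
def jaccard_index(y_true, y_pred):
    """Jaccard index of two label sequences: matches / (|y| + |y'| - matches)."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / (len(y_true) + len(y_pred) - matches)

# Hypothetical labels: 8 of the 10 predictions agree with the actual labels.
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [1, 1, 0, 0, 0, 1, 1, 1, 1, 1]
print(jaccard_index(y_true, y_pred))  # 8 / (10 + 10 - 8)
```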

F1-score (from the confusion matrix)

Another way to check accuracy

Computed on the test set

Rows show the actual (true) labels

Columns show the values predicted by the model
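As an illustration, the confusion-matrix counts and the F1-score for a binary problem can be computed in plain Python. The notes do not spell out the formula; the sketch uses the standard definition, F1 = 2 * precision * recall / (precision + recall). The function names and labels are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical test-set labels and predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(confusion_counts(y_true, y_pred), f1_score(y_true, y_pred))
```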
Log loss

Measures the performance of a classifier whose predicted output is a probability value in [0, 1]

Per-sample equation: y*log(y') + (1-y)*log(1-y'), with y the true value and y' the predicted probability. This equation measures how far the predicted value is from the actual label.

Log loss over the dataset: LogLoss = -(1/n) Σ (y*log(y') + (1-y)*log(1-y'))
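The log loss formula can be written directly in Python. The clipping constant eps is an added assumption to avoid log(0); it is not part of the formula in the notes.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Average negative log-likelihood for binary labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # keep p strictly inside (0, 1)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# Hypothetical predictions: confident-and-correct vs. undecided.
confident = log_loss([1, 0, 1], [0.9, 0.1, 0.8])
undecided = log_loss([1, 0, 1], [0.5, 0.5, 0.5])
print(confident, undecided)
```

A lower log loss means the predicted probabilities are closer to the actual labels.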

Decision Trees

Built by testing an attribute and branching the cases based on the result of the test

Each internal node corresponds to a test, each branch corresponds to a result of the test, and each leaf node assigns a class

How to build a decision tree:

1. Choose an attribute from the dataset
2. Calculate the significance of the attribute in splitting the data (to see whether it is an effective attribute or not)
3. Split the data based on the value of the best attribute
4. Go to step 1 (repeat for each branch)

Note: a node is pure if 100% of its cases belong to the same category

Entropy: the amount of randomness or uncertainty in the data

In decision trees, we look for trees with the lowest entropy in their nodes

Entropy = 0 is the best (the node is pure)

Ex. 1: we calculate the entropy for cholesterol:

6 drug B if cholesterol is normal

=> E = 0.811

Ex. 2: sex

Between cholesterol and sex, which is the best to choose? => the attribute whose tree has the higher information gain after splitting

Information gain (IG)

Information gain is the increase in the level of certainty after splitting

Lower entropy => higher information gain

IG = (entropy before split) - (weighted entropy after split)

Example for sex: IG = 0.940 - [(7/14)*0.985 + (7/14)*0.592] ≈ 0.15
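The entropy and information gain calculations can be checked with a short script. The notes do not spell out the entropy formula; the sketch uses the standard definition, E = -Σ p_i * log2(p_i). The class counts below are assumptions: they are not given in the notes, but were chosen because they reproduce the quoted entropies 0.940, 0.985, and 0.592.

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a node given its class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, children_counts):
    """Entropy before the split minus the weighted entropy after the split."""
    total = sum(parent_counts)
    weighted = sum(
        sum(child) / total * entropy(child) for child in children_counts
    )
    return entropy(parent_counts) - weighted

# Assumed (drug A, drug B) counts, consistent with the entropies in the notes.
parent = [5, 9]               # entropy ~0.940
sex_split = [[3, 4], [1, 6]]  # entropies ~0.985 and ~0.592, 7 cases each
print(entropy(parent), information_gain(parent, sex_split))
```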

Chapter 4

Support Vector Machine (SVM)

• Used for classification
• A supervised algorithm
• Classifies cases by finding a separator
• The data may be separable by a curve rather than a line

1. Map the data to a high-dimensional feature space
2. Find a separator

Transforming data:

• Kernelling: mapping the data into a higher-dimensional space
• The mathematical function used for this is called a kernel; it can be of different types (linear, polynomial, RBF, sigmoid), and these are included in ML libraries
• To know which function performs best with our dataset, we test and compare

How to find the separator:

SVM is based on the idea of finding a hyperplane that divides the dataset into classes

with the largest possible margin

The hyperplane is learned from the training data

SVM applications: image recognition, sentiment analysis, spam detection, …
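One way to "test and compare" kernels is sketched below with scikit-learn's SVC. The two-moons dataset, its noise level, and the split are assumptions chosen because the classes are not separable by a straight line.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic, non-linearly separable data (assumption for illustration).
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Train one SVM per kernel and record its accuracy on the test set.
scores = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    classifier = SVC(kernel=kernel).fit(X_train, y_train)
    scores[kernel] = classifier.score(X_test, y_test)

print(scores)
```

Which kernel wins depends on the dataset, so the comparison should be repeated on the data at hand.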

Evaluation:

• Binary or multi-class classification: F1-score, Jaccard index, …
• Regression: RMSE, MAE
• Clustering: Adjusted Rand Index

Logistic Regression

• Used for classification
• A statistical and ML technique for classifying records
• Logistic regression differs from linear regression: with linear regression we predict continuous values, whereas with logistic regression we predict classes
• The independent variables (X) should be continuous; if they are categorical, we should transform them into continuous (numeric) values

When to use logistic regression:

• When the target field in the data is categorical, in particular binary
• If you need the probability of the result
• When you need a linear decision boundary
• If you need to understand the impact of the features

Logistic Regression Training

The main objective of training is to change the parameters to find the best estimate of the labels
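A minimal sketch of such training, assuming the usual sigmoid model and plain gradient descent on the log loss (the notes do not name the optimizer, so this choice is an assumption). The data, learning rate, and iteration count are hypothetical; the feature is centered only to make gradient descent converge faster.

```python
import numpy as np

def sigmoid(t):
    """Map any real value to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))

# Hypothetical one-feature binary dataset.
x = np.array([0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X = np.column_stack([np.ones_like(x), x - x.mean()])  # intercept + centered feature

# Gradient descent: repeatedly adjust the parameters to reduce the log loss.
theta = np.zeros(2)
for _ in range(5000):
    p = sigmoid(X @ theta)
    theta -= 0.1 * X.T @ (p - y) / len(y)

probabilities = sigmoid(X @ theta)             # P(class = 1) for each record
predictions = (probabilities >= 0.5).astype(int)
print(theta, predictions)
```

The model outputs a probability for each record; thresholding at 0.5 turns it into a class.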

Clustering

Groups cases that have similar characteristics

Unsupervised

Clustering algorithms: k-means, k-median

k-means

Similarity is measured by calculating distance

1- Initialize k: choose the number of clusters and randomly place k centroids
2- Calculate the distance from each point to each centroid
3- Assign each point to the closest centroid
4- Compute the new centroids (the mean of the points in each cluster)
5- Repeat until the centroids no longer change
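The five steps above can be sketched with NumPy. The function name k_means, the choice of initial centroids from the data points, and the sample points are assumptions for illustration; k-means can end in different clusterings depending on the random start.

```python
import numpy as np

def k_means(points, k, n_iter=100, seed=0):
    """Basic k-means: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: distance from every point to every centroid.
        distances = np.linalg.norm(
            points[:, None, :] - centroids[None, :, :], axis=2
        )
        # Step 3: assign each point to its closest centroid.
        labels = distances.argmin(axis=1)
        # Step 4: new centroid = mean of the points assigned to it.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 5: stop when the centroids no longer change.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical 2-D points.
points = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
                   [8.0, 8.0], [8.2, 7.9], [7.9, 8.3]])
centroids, labels = k_means(points, k=2)
print(centroids, labels)
```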
