Lab 1: Machine Learning with Python
Joaquin Vanschoren, Pieter Gijsbers, Bilge Celik, Prabhant Singh
%matplotlib inline
import numpy as np
import pandas as pd
Overview
Why Python?
Intro to scikit-learn
Exercises
Why Python?
Many data-heavy applications are now developed in Python
Highly readable, low complexity, fast prototyping
Easy to offload number crunching to underlying C/Fortran/… (see the sketch after this list)
Easy to install and import many rich libraries
numpy: efficient data structures
scipy: fast numerical recipes
matplotlib: high-quality graphs
scikit-learn: machine learning algorithms
tensorflow: neural networks
…
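As a small illustration of that number-crunching offload, here is a minimal sketch summing a million squares with a vectorized numpy call instead of an interpreted Python loop (array size chosen arbitrarily):
import numpy as np

x = np.arange(1_000_000, dtype=np.float64)

# Interpreted Python loop: every element is handled by the interpreter
total_loop = sum(xi * xi for xi in x)

# Vectorized numpy call: the same computation runs in compiled code
total_vec = np.dot(x, x)

assert np.isclose(total_loop, total_vec)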
See the tutorials (in the course GitHub)
Many good tutorials online
Jake VanderPlas’ book and notebooks
J.R. Johansson’s notebooks
DataCamp
…
scikit-learn
One of the most prominent Python libraries for machine learning:
Contains many state-of-the-art machine learning algorithms
Builds on numpy (fast), implements advanced techniques
Wide range of evaluation measures and techniques
Offers comprehensive documentation about each algorithm
Widely used, and a wealth of tutorials and code snippets are available
Works well with numpy, scipy, pandas, matplotlib,…
Algorithms
See the Reference
Supervised learning:
Linear models (Ridge, Lasso, Elastic Net, …)
Support Vector Machines
Tree-based methods (Classification/Regression Trees, Random Forests,…)
Nearest neighbors
Neural networks
Gaussian Processes
Feature selection
Unsupervised learning:
Clustering (KMeans, …)
Matrix Decomposition (PCA, …)
Manifold Learning (Embeddings)
Density estimation
Outlier detection
Model selection and evaluation:
Cross-validation
Grid-search
Lots of metrics
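As a taste of the model selection tools just listed, here is a minimal grid search sketch (the hyperparameter grid is chosen purely for illustration):
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
# Cross-validated search over a small grid of n_neighbors values
grid = GridSearchCV(KNeighborsClassifier(),
                    param_grid={'n_neighbors': [1, 3, 5, 7]}, cv=5)
grid.fit(X, y)
print("Best: {} with score {:.2f}".format(grid.best_params_, grid.best_score_))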
Data import
Multiple options:
A few toy datasets are included in sklearn.datasets
Import 1000s of datasets via sklearn.datasets.fetch_openml
You can import data files (CSV) with pandas or numpy
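For the last option, a minimal pandas sketch (the file name and the 'class' column are hypothetical):
import pandas as pd

# Hypothetical CSV file with feature columns and a 'class' target column
df = pd.read_csv("my_data.csv")
X = df.drop(columns=["class"]).values  # feature matrix as ndarray
y = df["class"].values                 # target vector as ndarray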
from sklearn.datasets import load_iris, fetch_openml
iris_data = load_iris()
dating_data = fetch_openml("SpeedDating", version=1)
These return a Bunch object (similar to a dict)
print("Keys of iris_dataset: {}".format(iris_data.keys()))
print(iris_data['DESCR'][:193] + "\n...")
Keys of iris_dataset: dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', '
.. _iris_dataset:
Iris plants dataset
--------------------
**Data Set Characteristics:**
:Number of Instances: 150 (50 in each of three classes)
:Number of Attributes: 4 numeric, pre
...
Target (class) names and feature names are lists of strings
Data and target values are always numeric (ndarrays)
print("Targets: {}".format(iris_data['target_names']))
print("Features: {}".format(iris_data['feature_names']))
print("Shape of data: {}".format(iris_data['data'].shape))
print("First 5 rows:\n{}".format(iris_data['data'][:5]))
print("Targets:\n{}".format(iris_data['target']))
Targets: ['setosa' 'versicolor' 'virginica']
Features: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Shape of data: (150, 4)
First 5 rows:
[[5.1 3.5 1.4 0.2]
[4.9 3. 1.4 0.2]
[4.7 3.2 1.3 0.2]
[4.6 3.1 1.5 0.2]
[5. 3.6 1.4 0.2]]
Targets:
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2]
Building models
All scikit-learn estimators follow the same interface:
class SupervisedEstimator(...):
    def __init__(self, hyperparam, ...):
        ...
    def fit(self, X, y):    # Fit/model the training data
        ...                 # given data X and targets y
        return self
    def predict(self, X):   # Make predictions
        ...                 # on unseen data X
        return y_pred
    def score(self, X, y):  # Predict and compare to true
        ...                 # labels y
        return score
Training and testing data
To evaluate our classifier, we need to test it on unseen data.
train_test_split: splits the data randomly into 75% training and 25% test data.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
iris_data['data'], iris_data['target'],
random_state=0)
print("X_train shape: {}".format(X_train.shape))
print("y_train shape: {}".format(y_train.shape))
print("X_test shape: {}".format(X_test.shape))
print("y_test shape: {}".format(y_test.shape))
X_train shape: (112, 4)
y_train shape: (112,)
X_test shape: (38, 4)
y_test shape: (38,)
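Because every estimator implements the interface shown above, one model can be swapped for another without changing the surrounding code. A minimal sketch with a decision tree (max_depth chosen arbitrarily):
from sklearn.tree import DecisionTreeClassifier

# Same construct / fit / score pattern as any other estimator
tree = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)
print("Score: {:.2f}".format(tree.score(X_test, y_test)))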
We can also choose other ways to split the data. For instance, the following creates a training set of 10% of the data and a test set of 5%, which is useful when dealing with very large datasets. stratify defines the target feature to stratify on (ensuring that the class distributions are kept the same).
X, y = iris_data['data'], iris_data['target']
Xs_train, Xs_test, ys_train, ys_test = train_test_split(X, y, stratify=y, train_size=0.1, test_size=0.05)
print("Xs_train shape: {}".format(Xs_train.shape))
print("Xs_test shape: {}".format(Xs_test.shape))
Xs_train shape: (15, 4)
Xs_test shape: (8, 4)
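A quick way to see the effect of stratify is to count the classes in the resulting split:
# With stratify=y, each class keeps its share of the sample:
# the full data has 50 examples per class, so the 10% sample has 5 per class
print("Class counts in y: {}".format(np.bincount(y)))
print("Class counts in ys_train: {}".format(np.bincount(ys_train)))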
Looking at your data (with pandas)
from pandas.plotting import scatter_matrix
# Build a DataFrame with training examples and feature names
iris_df = pd.DataFrame(X_train, columns=iris_data.feature_names)
# Scatter matrix from the dataframe, color by class
sm = scatter_matrix(iris_df, c=y_train, figsize=(8, 8), marker='o',
                    hist_kwds={'bins': 20}, s=60, alpha=.8)
Fitting a model
The first model we’ll build is a k-Nearest Neighbor classifier.
kNN is included in sklearn.neighbors, so let’s build our first model
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
KNeighborsClassifier(n_neighbors=1)
Making predictions
Let’s create a new example and ask the kNN model to classify it
X_new = np.array([[5, 2.9, 1, 0.2]])
prediction = knn.predict(X_new)
print("Prediction: {}".format(prediction))
print("Predicted target name: {}".format(
iris_data['target_names'][prediction]))
Prediction: [0]
Predicted target name: ['setosa']
Evaluating the model
Feeding all test examples to the model yields all predictions
y_pred = knn.predict(X_test)
print("Test set predictions:\n {}".format(y_pred))
Test set predictions:
[2 1 0 2 0 2 0 1 1 1 2 1 1 1 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0 2 1 0 2 2 1 0
2]
The score function computes the fraction of correct predictions (the accuracy)
print("Score: {:.2f}".format(knn.score(X_test, y_test)))
Score: 0.97
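Since the score is just the fraction of matching predictions, we can reproduce it by hand:
# Compare predictions to the true labels and average the matches
print("Manual score: {:.2f}".format(np.mean(y_pred == y_test)))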
Instead of a single train-test split, we can use cross_validate to run a cross-validation. It will return the test scores, as well as the fit and score times, for every fold. By default, scikit-learn does a 5-fold cross-validation, hence returning 5 test scores.
!pip install -U joblib
from sklearn.model_selection import cross_validate
xval = cross_validate(knn, X, y, return_train_score=True, n_jobs=-1)
xval
{'fit_time': array([0.0004108 , 0.00043321, 0.00047421, 0.00054502, 0.00044918]),
'score_time': array([0.00080895, 0.00081778, 0.00089979, 0.00099206, 0.00093198]),
'test_score': array([0.96666667, 0.96666667, 0.93333333, 0.93333333, 1. ]),
'train_score': array([1., 1., 1., 1., 1.])}
The mean should give a better performance estimate
np.mean(xval['test_score'])
0.96
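It is also informative to report the spread over the folds, for instance as mean ± standard deviation:
print("Cross-validated accuracy: {:.2f} +/- {:.2f}".format(
    np.mean(xval['test_score']), np.std(xval['test_score'])))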
Introspecting the model
Most models allow you to retrieve the learned model parameters; for linear models, they are stored in coef_
from sklearn.linear_model import LinearRegression
lr = LinearRegression().fit(X_train, y_train)
lr.coef_
array([-0.15330146, -0.02540761, 0.26698013, 0.57386186])
Skip to main content
Matching these with the names of the features, we can see which features are primarily used by
the model
d = zip(iris_data.feature_names,lr.coef_)
set(d)
{('petal length (cm)', 0.2669801292888399),
('petal width (cm)', 0.5738618608875331),
('sepal length (cm)', -0.15330145645467938),
('sepal width (cm)', -0.025407610745503684)}
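A pandas Series keeps this pairing and makes it easy to sort the features by the magnitude of their coefficient (a small convenience sketch):
# Pair coefficients with feature names and sort by absolute size
coefs = pd.Series(lr.coef_, index=iris_data.feature_names)
print(coefs.reindex(coefs.abs().sort_values(ascending=False).index))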
Please see the course notebooks for more examples on how to analyse models.