Regression Scikit Learn

regression

Uploaded by

amarinder765
Copyright
© © All Rights Reserved
scikit-learn

▪ introduction
▪ installation/distribution
▪ essential/auxiliary libraries
▪ usage

introduction---
scikit-learn (also known as sklearn) is a free software machine learning library for the Python programming language.
▪ free
▪ open-source
▪ constantly being developed and improved
▪ an active user community
▪ state-of-the-art machine learning algorithms
▪ provides nice documentation
▪ widely used in industry and academia
▪ a wealth of tutorials and code snippets are available online
▪ works well with many scientific Python tools

dependencies---
scikit-learn relies heavily on NumPy and SciPy for its functions; moreover, it can be used more effectively with other auxiliary packages.
▪ for scientific computation
  ▪ NumPy
  ▪ SciPy
▪ for plotting
  ▪ matplotlib
▪ for interactive development
  ▪ IPython
  ▪ Jupyter Notebook

installation---
▪ can be independently installed
▪ (recommended) can be installed via a number of Python distributions ⇛
  1. Anaconda (free, recommended)
  2. Enthought Canopy (not free)
  3. Python(x, y) (free)
if you install any of these Python distributions, scikit-learn comes packaged with it.

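Whichever installation route is used, it can help to confirm which of the packages are importable. The following is a minimal sketch, not part of the original slides; the helper name `installed_versions` is hypothetical, and the package list is an assumption based on the libraries named in this deck.

```python
import importlib


def installed_versions(names=("numpy", "scipy", "matplotlib", "pandas", "sklearn")):
    """Return {package name: version string, or None if the import fails}."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Package is not available in this environment.
            versions[name] = None
        else:
            # Most scientific packages expose __version__; fall back if absent.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


for name, version in installed_versions().items():
    print(f"{name:12s} {version or 'not installed'}")
```

Note that scikit-learn is imported under the name `sklearn`, which is why that name appears in the list.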
anaconda---
a Python distribution for large-scale data processing, predictive analytics, and scientific computing ⇛
comes with:
▪ NumPy,
▪ SciPy,
▪ matplotlib,
▪ pandas,
▪ IPython,
▪ Jupyter Notebook,
▪ scikit-learn
available on:
▪ Mac OS
▪ Windows

libraries---
libraries that are essentially required by, or increase the effectiveness of, scikit-learn
⇛ NumPy
⇛ SciPy
⇛ Jupyter Notebook
⇛ matplotlib
⇛ Pandas

Jupyter Notebook
• provides an interactive environment
• runs code in the browser
• great tool for exploratory data analysis
• widely used by data scientists
• supports many programming languages

NumPy
• fundamental package for scientific computing
• provides functionality for:
  • multidimensional arrays
  • high-level mathematical functions, e.g.,
    • linear algebra operations
    • Fourier transform
  • pseudorandom number generators

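The NumPy features listed above can be tried in a few lines. This is an illustrative sketch only; the array values and the seed are arbitrary choices, not taken from the slides.

```python
import numpy as np

# Multidimensional array: 2 rows x 3 columns.
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
print(a.shape)

# Linear algebra: solve the 2x2 system A x = b.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)
print(x)

# Fourier transform of a constant signal: only the zero-frequency bin is nonzero.
spectrum = np.fft.fft(np.ones(4))
print(spectrum)

# Pseudorandom numbers from a seeded generator, so the run is reproducible.
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=5)
print(samples)
```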
NumPy, SciPy
:: strengths ::

SciPy
• a collection of functions for scientific computing
• provides, among other functionality:
  • advanced linear algebra routines,
  • mathematical function optimization,
  • signal processing,
  • special mathematical functions,
  • statistical distributions.
• scikit-learn draws from SciPy’s collection of functions for implementing its algorithms.

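A short sketch of several of the SciPy areas listed above (optimization, linear algebra, special functions, statistical distributions). The functions and matrices chosen here are arbitrary illustrations, not examples from the original slides.

```python
import numpy as np
from scipy import linalg, optimize, special, stats

# Mathematical function optimization: minimize (x - 2)^2 + 1, whose minimum is at x = 2.
result = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2 + 1.0)
print(result.x)

# Advanced linear algebra: LU factorization, so that A equals P @ L @ U.
A = np.array([[4.0, 3.0],
              [6.0, 3.0]])
P, L, U = linalg.lu(A)

# Special mathematical functions: gamma(5) equals 4! = 24.
gamma_5 = special.gamma(5.0)
print(gamma_5)

# Statistical distributions: the standard normal CDF at 0 is one half.
half = stats.norm.cdf(0.0)
print(half)
```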
matplotlib
• primary scientific plotting library in Python
• provides functions for making publication-quality visualizations:
  • line charts,
  • histograms,
  • scatter plots,
  • and so on.

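The three chart types named above can be drawn side by side. This sketch uses synthetic data and selects the non-interactive "Agg" backend so that it also runs without a display; in a Jupyter Notebook that backend line would normally be dropped. Both choices are assumptions made for illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen; omit this inside a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 100)
rng = np.random.default_rng(0)
noise = rng.normal(size=100)

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

axes[0].plot(x, np.sin(x))
axes[0].set_title("line chart")

axes[1].hist(noise, bins=15)
axes[1].set_title("histogram")

axes[2].scatter(x, np.sin(x) + 0.2 * noise, s=10)
axes[2].set_title("scatter plot")

fig.tight_layout()

# Write the figure to an in-memory PNG instead of a file on disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```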
pandas
• Python library for data wrangling and analysis
• built around a data structure called the DataFrame
• a DataFrame is a table
• has methods for manipulating this table, e.g.,
  • allows SQL-like queries and joins on such tables

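A minimal sketch of the SQL-like operations mentioned above. The two small tables, their column names, and their values are hypothetical toy data invented for this illustration.

```python
import pandas as pd

# Hypothetical toy tables; the numbers are made up for illustration.
houses = pd.DataFrame({
    "town_id": [1, 2, 3, 4],
    "rooms": [6.6, 6.4, 7.2, 5.6],
    "price": [24.0, 21.6, 34.7, 16.5],
})
towns = pd.DataFrame({
    "town_id": [1, 2, 3, 4],
    "tax": [296, 242, 242, 311],
})

# SQL-like filter: SELECT * FROM houses WHERE rooms > 6.5
large = houses.query("rooms > 6.5")

# SQL-like join: houses INNER JOIN towns ON town_id
merged = houses.merge(towns, on="town_id", how="inner")

# SQL-like aggregation: mean price grouped by tax value
mean_price_by_tax = merged.groupby("tax")["price"].mean()

print(large)
print(merged)
print(mean_price_by_tax)
```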
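Fitting the Linear Regression Model
▪ training set: $\tau = \{(x^{(i)}, y^{(i)})\}_{i=1}^{m}$, $x^{(i)} \in \mathbb{R}^{n}$, $y^{(i)} \in \mathbb{R}$
▪ model: $\hat{y}^{(i)} = w_0 + w_1 x_1^{(i)} + w_2 x_2^{(i)} + \cdots + w_n x_n^{(i)}$
▪ model parameters: $w_0, w_1, w_2, \ldots, w_n$
▪ intercept: $w_0$
▪ coefficients: $w_1, w_2, \ldots, w_n$

▪ dataset: the Boston data
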
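To make the roles of the intercept $w_0$ and the coefficients $w_1, \ldots, w_n$ concrete, the sketch below recovers them from synthetic, noise-free data by ordinary least squares in NumPy. The data, the "true" weights, and the use of `np.linalg.lstsq` are assumptions chosen for illustration; the slides do not state how the fit is computed.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 3                          # m training examples, n features

X = rng.normal(size=(m, n))           # row i holds x^(i)
true_w0 = 1.5                         # intercept used to generate the data
true_w = np.array([2.0, -1.0, 0.5])   # coefficients used to generate the data
y = true_w0 + X @ true_w              # noise-free targets y^(i)

# Prepend a column of ones so the intercept is estimated with the coefficients.
X_aug = np.column_stack([np.ones(m), X])
w, *_ = np.linalg.lstsq(X_aug, y, rcond=None)

intercept, coefficients = w[0], w[1:]
y_hat = intercept + X @ coefficients  # model predictions for every example

print("intercept:", intercept)
print("coefficients:", coefficients)
```

Because the targets were generated without noise, the estimated parameters match the generating ones up to floating-point error.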
the Boston data
• The Boston house-price data of Harrison, D. and Rubinfeld, D. L., 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol. 5, 81-102, 1978.
• 'Regression diagnostics: Identifying Influential Data and Sources of Collinearity' … what influences housing prices in Boston

CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT MEDV
0.00632 18 2.31 0 0.538 6.575 65.2 4.09 1 296 15.3 396.9 4.98 24
0.02731 0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 396.9 9.14 21.6
0.02729 0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83 4.03 34.7
0.03237 0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63 2.94 33.4
0.06905 0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 396.9 5.33 36.2
0.02985 0 2.18 0 0.458 6.43 58.7 6.0622 3 222 18.7 394.12 5.21 28.7
0.08829 12.5 7.87 0 0.524 6.012 66.6 5.5605 5 311 15.2 395.6 12.43 22.9
0.14455 12.5 7.87 0 0.524 6.172 96.1 5.9505 5 311 15.2 396.9 19.15 27.1
0.21124 12.5 7.87 0 0.524 5.631 100 6.0821 5 311 15.2 386.63 29.93 16.5

the Boston housing example
columns: $x_1^{(i)}, x_2^{(i)}, \ldots, x_{13}^{(i)}$ (CRIM through LSTAT) and $\hat{y}^{(i)}$ (MEDV)

CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT MEDV
0.00632 18 2.31 0 0.538 6.575 65.2 4.09 1 296 15.3 396.9 4.98 24
0.02731 0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 396.9 9.14 21.6
0.02729 0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83 4.03 34.7
0.03237 0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63 2.94 33.4
0.06905 0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 396.9 5.33 36.2
0.02985 0 2.18 0 0.458 6.43 58.7 6.0622 3 222 18.7 394.12 5.21 28.7
0.08829 12.5 7.87 0 0.524 6.012 66.6 5.5605 5 311 15.2 395.6 12.43 22.9
0.14455 12.5 7.87 0 0.524 6.172 96.1 5.9505 5 311 15.2 396.9 19.15 27.1
0.21124 12.5 7.87 0 0.524 5.631 100 6.0821 5 311 15.2 386.63 29.93 16.5

size = 506 × (13 + 1)

$\hat{y}^{(i)} = w_0 + w_1 x_1^{(i)} + w_2 x_2^{(i)} + \cdots + w_{13} x_{13}^{(i)}$
$f: \mathbb{R}^{13} \to \mathbb{R}$

exploring the data

Steps ---
▪ import the dataset loader
▪ create the loader object
▪ explore/understand the data
  ▪ shape of the data
  ▪ description (DESCR)
  ▪ feature names/values
  ▪ target names/values
  ▪ file path
  ▪ etc.

[screenshot: contents of the loader object, annotated as feature values, target values, names of the features, information about the data, file path]

[screenshot: shape of the data, annotated with #rows/#training examples and #columns/#features; 506 rows, 1 target]
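The exploration steps can be sketched as follows. One caveat: the Boston loader (`load_boston`) was removed from scikit-learn in version 1.2, so this sketch substitutes the bundled `load_diabetes` regression dataset. That substitution is an assumption made so the code runs on current versions; the shapes it prints are therefore those of the diabetes data, not the 506 × 13 Boston data.

```python
from sklearn.datasets import load_diabetes  # stand-in for the removed load_boston

# Create the loader object: a dictionary-like Bunch.
dataset = load_diabetes()

# Which entries are present depends on the loader and the scikit-learn version.
print(dataset.keys())

# Shape of the data: (#rows / training examples, #columns / features).
print(dataset.data.shape)
print(dataset.target.shape)

# Names of the features and the first few feature and target values.
print(dataset.feature_names)
print(dataset.data[:3])
print(dataset.target[:3])

# Information about the data: print only the start of the description.
print(dataset.DESCR[:300])
```

With a scikit-learn version that still ships the Boston data, the same attribute names (`data`, `target`, `feature_names`, `DESCR`) apply to the object returned by `load_boston()`.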


training the algorithm

training ---
▪ split the data into training (75%) and test (25%) sets
▪ import the model
▪ fit the model to the data
▪ test the model
▪ predict
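The training steps above can be sketched with scikit-learn as follows. Because `load_boston` was removed from scikit-learn in version 1.2, the bundled `load_diabetes` data is used as a stand-in; that substitution and the `random_state` value are assumptions for illustration, not part of the original slides.

```python
from sklearn.datasets import load_diabetes  # stand-in for the removed load_boston
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)

# Split the data into training (75%) and test (25%) sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Import the model (above) and fit it to the training data.
model = LinearRegression()
model.fit(X_train, y_train)

# The fitted parameters: intercept w_0 and coefficients w_1 ... w_n.
print("intercept:", model.intercept_)
print("coefficients:", model.coef_)

# Test the model: score() returns the R^2 coefficient of determination.
print("R^2 on training set:", model.score(X_train, y_train))
print("R^2 on test set:", model.score(X_test, y_test))

# Predict targets for the first five test examples.
predictions = model.predict(X_test[:5])
print(predictions)
```

Comparing the training and test scores gives a first indication of how well the fitted model generalizes to unseen data.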
the iris data

Iris Flower--
Data about 150 iris flowers to be classified into 3 varieties: setosa, versicolor, virginica

Sepal length  Sepal width  Petal length  Petal width  species
5.1           3.3          1.7           0.5          setosa
4.9           3.0          1.4           0.2          versicolor
5.4           3.6          1.4           0.2          setosa
6.0           2.7          5.1           1.5          virginica

size: 150 × (4 + 1)

training the algorithm

Steps---
▪ load the data
▪ explore the data
▪ split into training and validation subsets
▪ import the optimizer
▪ fit to the data (derive the model)
▪ check accuracy of the model on the data
▪ predict with the model derived
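The seven steps above can be sketched for the iris data as follows. The slides call the estimator "the optimizer"; this sketch assumes it is `LogisticRegression`, since that is the class the reference link at the end of the deck points to. The split ratio, `random_state`, `stratify`, and `max_iter` settings are further assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Step 1: load the data.
iris = load_iris()
X, y = iris.data, iris.target

# Step 2: explore the data.
print(X.shape)               # 150 flowers, 4 measurements each
print(iris.feature_names)
print(iris.target_names)

# Step 3: split into training and validation subsets.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Step 4: import the estimator (done above) and create it.
model = LogisticRegression(max_iter=1000)

# Step 5: fit to the data (derive the model).
model.fit(X_train, y_train)

# Step 6: check accuracy of the model on the data.
print("training accuracy:", model.score(X_train, y_train))
print("validation accuracy:", model.score(X_val, y_val))

# Step 7: predict with the model derived, here for one flower's measurements.
sample = [[5.1, 3.3, 1.7, 0.5]]
predicted_species = iris.target_names[model.predict(sample)]
print(predicted_species)
```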
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression

end