SCC403 – Data Mining

Week 2: Introduction to the basic Python libraries

Aim of the session:

• NumPy
• Matplotlib
• SciPy

1 Introduction

In computer programming, we can use libraries to help us develop our applications. Fundamentally, a library is like a “toolbox” that makes common tasks easier. It consists of code (subroutines) developed by others, which we can use in our own applications by calling specific functions.
To use a library, we use the following syntax: import [library name] as [short name]. Here [library name] is the library that you want to load, and [short name] is how you will refer to it in your program. It is also possible to write from [library name] import [package] when we only want to use a specific package from a library. We are going to see some examples later. In this lab we look at three very common libraries for scientific computing in Python: NumPy, Matplotlib and SciPy, though there are others that we will refer to later in the course.
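As a minimal sketch of the two import forms described above (the two printed calls are only illustrative choices, not part of the lab text):

```python
# "import [library name] as [short name]": load NumPy under the short name np
import numpy as np

# "from [library name] import [package]": load only the stats package of SciPy
from scipy import stats

# Functions are then reached through the names chosen above
print(np.sqrt(16.0))       # square root computed by NumPy
print(stats.norm.mean())   # mean of the standard Normal distribution
```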

2 NumPy

NumPy is the main library for creating and manipulating arrays and matrices in Python, including linear algebra and statistics capabilities. It is therefore a fundamental library for scientific computing and data science. NumPy arrays are similar to the data structures used in other scientific computing frameworks, such as Matlab. In fact, if you are a Matlab user, you can see the main similarities and differences at https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html.
As mentioned previously, we first need to import the library. We do that by typing:

    import numpy as np

Now we can use np to refer to functions implemented in the NumPy library.

2.1 Basic Operations with arrays in NumPy

The main object in NumPy is the array, which can be seen as a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. Note that indexing starts from 0, as in the C language. In NumPy, dimensions are called axes. For example, the array holding the coordinates of a point in 3D space, [3, 4, 2], has one axis. That axis has 3 elements in it, so we say it has a length of 3. Arrays can also have more axes; for example, the following array has 2 axes:

    [[1., 0., 0.],
     [0., 1., 2.]]

The first axis has a length of 2, the second axis has a length of 3.
NumPy’s array class is called ndarray.
You can check the dimensions of an ndarray with its shape attribute, ndarray.shape. For a matrix with n rows and m columns, the shape is (n, m).
Arithmetic operators apply to arrays element-wise.
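The points about axes, shape and element-wise arithmetic can be tried out with a short sketch. The vectors x and y are made-up values chosen for illustration, so that Exercise 1 below is left for you to solve:

```python
import numpy as np

# The 2-axis example from the text: 2 rows (first axis), 3 columns (second axis)
m = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 2.0]])
print(m.ndim)    # number of axes
print(m.shape)   # length of each axis

# Illustrative vectors (not the ones from Exercise 1)
x = np.array([1.0, 2.0, 3.0])
y = np.array([10.0, 20.0, 30.0])

print(x + y)     # element-wise sum
print(2.0 * x)   # every element multiplied by the scalar
print(x * y)     # element-wise product, not a dot product
```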
Arrays can be initialised with zeros or with sequences of numbers. For example,

    np.zeros((2, 1))

will produce

    array([[0.],
           [0.]])

To create sequences of numbers, NumPy provides the arange function, which is analogous to the Python built-in range but returns an array. For example,

    c = np.arange(3, 21, 2)

will produce

    array([ 3,  5,  7,  9, 11, 13, 15, 17, 19])

We defined a sequence that starts at 3, has a step of 2 and stops before 21. Note that the last value (21) is not included.
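To see the excluded stop value in practice, the following sketch compares arange with the built-in range and shows one way to include 21 (choosing 22 as the stop value is just one possible option):

```python
import numpy as np

z = np.zeros((2, 1))            # 2 rows, 1 column, filled with 0.0
c = np.arange(3, 21, 2)         # start 3, stop 21 (excluded), step 2

print(z.shape)
print(c)
print(list(range(3, 21, 2)))    # the built-in range yields the same values

# To include 21, the stop value must lie beyond it
c_with_21 = np.arange(3, 22, 2)
print(c_with_21)
```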
Exercise 1:
For the scalar λ = 0.1 and the two vectors a = [3, 45, 7, 2] and b = [2, 54, 13, 15], define the vectors as ndarrays and then calculate:

• the sum of a and b;
• the product of λ and a;
• the element-wise product of a and b.

2.2 Indexing arrays

One-dimensional arrays can be indexed, sliced and iterated over, much like lists and other Python sequences.
Let us take the sequence c that we defined earlier and index some of its elements (remember that, as in the C language, indexing starts at 0). We get:

    c[0]    # 3
    c[2]    # 7
    c[8]    # 19

Some examples of slicing the same array follow:

    c[1:4]
    array([5, 7, 9])

    c[7:3:-1]
    array([17, 15, 13, 11])

    c[1:7:2] = 100
    c
    array([  3, 100,   7, 100,  11, 100,  15,  17,  19])
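The indexing and slicing examples can be collected into one runnable sketch. The copy named modified is an addition for illustration, so that the original c stays unchanged; the loop shows the iteration mentioned at the start of this section:

```python
import numpy as np

c = np.arange(3, 21, 2)

# Indexing single elements (positions start at 0)
first, third, last = c[0], c[2], c[8]

# Slicing: start:stop:step, with the stop position excluded
forward = c[1:4]        # positions 1, 2, 3
backward = c[7:3:-1]    # positions 7, 6, 5, 4

# Assigning to a slice changes the selected positions (1, 3 and 5 here)
modified = c.copy()
modified[1:7:2] = 100

# Iterating over the elements, as with a Python list
total = 0
for value in c:
    total += value

print(first, third, last)
print(forward, backward)
print(modified)
print(total)
```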

2.3 Linear Algebra

Let us consider the following array:

    d = np.array([[1.0, 2.0], [3.0, 4.0]])

Its transpose can be found as follows:

    d.transpose()

The result is:

    array([[1., 3.],
           [2., 4.]])

The inverse of a square matrix is easy to compute with NumPy:

    np.linalg.inv(d)

The result is:

    array([[-2. ,  1. ],
           [ 1.5, -0.5]])

You can find more details in the official NumPy tutorial, available at https://docs.scipy.org/doc/numpy/user/quickstart.html and, in particular, about linear algebra in linalg.py in the NumPy folder.
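As a cross-check of the inverse above, the sketch below rebuilds it by hand with the standard formula for a 2 × 2 matrix, [[a, b], [c, d]]⁻¹ = 1/(ad − bc) · [[d, −b], [−c, a]]. The variable names are chosen for illustration only:

```python
import numpy as np

d = np.array([[1.0, 2.0], [3.0, 4.0]])

# Determinant ad - bc; the inverse only exists when it is non-zero
det = np.linalg.det(d)

# Hand-built inverse from the 2 x 2 formula
by_hand = (1.0 / det) * np.array([[d[1, 1], -d[0, 1]],
                                  [-d[1, 0], d[0, 0]]])

# Inverse computed by NumPy
by_numpy = np.linalg.inv(d)

print(det)
print(by_hand)
print(np.allclose(by_hand, by_numpy))   # equal up to rounding error
print(d.T)                              # d.T is shorthand for d.transpose()
```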

3 Matplotlib

Matplotlib is a library for plotting graphs in Python. It has an object-oriented interface and a simpler interface called PyPlot, which is similar to Matlab. We will use the PyPlot interface in our labs.
Each pyplot function makes some change to a figure: e.g., creates a figure, creates a plotting area in a figure, plots some lines in a plotting area, decorates the plot with labels, etc.
Let us start with a very simple example. When plot is given a single list, Matplotlib treats it as the y values and uses the indices 0, 1, 2, 3 as the x values:

    import matplotlib.pyplot as plt

    plt.plot([1, 4, 3, 2])
    plt.ylabel('some numbers')
    plt.show()

Formatting the style of the plot

For every x, y pair of arguments, there is an optional third argument, the format string, which indicates the color and line type of the plot. The letters and symbols of the format string are the same as in MATLAB. The default format string is 'b-', which is a solid blue line.
The example below illustrates plotting several lines with different format styles in one function call using arrays.

    import numpy as np
    import matplotlib.pyplot as plt

    # evenly sampled time at 200 ms intervals
    t = np.arange(0., 5., 0.2)

    # red dashes, blue squares and green triangles
    plt.plot(t, t, 'r--', t, t**2, 'bs', t, t**3, 'g^')
    plt.show()
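The same idea can be written with one plot call per line, which leaves room for labels and a legend. This is only a sketch: the "Agg" backend and the output file name format_demo.png are assumptions made so that the script also runs on a machine without a display; in an interactive session you would call plt.show() instead.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: draw to a file instead of opening a window
import matplotlib.pyplot as plt
import numpy as np

t = np.arange(0.0, 5.0, 0.2)

# Format string = color letter followed by a marker or line style
plt.plot(t, t**2, "ro", label="t squared")   # red circles
plt.plot(t, 5 * t, "g--", label="5 t")       # green dashed line

plt.xlabel("time (s)")
plt.ylabel("value")
plt.legend()

# Hypothetical output file name, chosen for this sketch
plt.savefig("format_demo.png")
```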

You can create multiple figures by using multiple figure calls with an increasing figure number. Each figure can contain as many axes and subplots as necessary:

    import matplotlib.pyplot as plt

    plt.figure(1)       # the first figure
    plt.subplot(211)    # the first subplot in the first figure
    plt.plot([1, 2, 3])
    plt.subplot(212)    # the second subplot in the first figure
    plt.plot([6, 5, 4])
    plt.show()
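The argument 211 is shorthand for "2 rows, 1 column, subplot number 1". The sketch below extends the example with a second figure and then returns to the first one. The "Agg" backend is an assumption made so that the script runs without a display; in an interactive session, plt.show() would display both figures.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: no display is available
import matplotlib.pyplot as plt

plt.figure(1)               # the first figure
plt.subplot(2, 1, 1)        # same as subplot(211)
plt.plot([1, 2, 3])
plt.subplot(2, 1, 2)        # same as subplot(212)
plt.plot([6, 5, 4])

plt.figure(2)               # a second figure
plt.plot([4, 5, 6])         # plotting creates a single set of axes by default

plt.figure(1)               # figure 1 becomes the current figure again
plt.subplot(2, 1, 1)        # its first subplot becomes the current axes
plt.title("Easy as 1, 2, 3")

print(plt.get_fignums())            # numbers of the figures that exist
print(len(plt.figure(1).axes))      # number of axes in figure 1
print(len(plt.figure(2).axes))      # number of axes in figure 2
```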

You can find more details in the official PyPlot tutorial.
Exercise 2:
Please follow the tutorial at https://matplotlib.org/tutorials/introductory/pyplot.html, and try the examples on your own computer.

4 SciPy

SciPy is a collection of mathematical algorithms and convenience functions built on the NumPy extension of Python. It adds significant power to the interactive Python session by providing the user with high-level commands and classes for manipulating and visualizing data. In particular, it includes the following packages:
• cluster: Clustering algorithms
• constants: Physical and mathematical constants
• fftpack: Fast Fourier Transform routines
• integrate: Integration and ordinary differential equation solvers
• interpolate: Interpolation and smoothing splines
• io: Input and Output

• linalg: Linear algebra
• ndimage: N-dimensional image processing
• odr: Orthogonal distance regression
• optimize: Optimization and root-finding routines
• signal: Signal processing
• sparse: Sparse matrices and associated routines
• spatial: Spatial data structures and algorithms
• special: Special functions
• stats: Statistical distributions and functions
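To give a feel for how these packages are used, here is a small sketch that touches three of them. The integrand, the function f and the interval are illustrative choices, not part of the lab text:

```python
import numpy as np
from scipy import constants, integrate, optimize

# integrate: area under sin(x) between 0 and pi (the exact value is 2)
area, error_estimate = integrate.quad(np.sin, 0.0, np.pi)

# optimize: root of x**2 - 2 in the interval [0, 2] (the exact value is sqrt(2))
def f(x):
    return x**2 - 2.0

root = optimize.brentq(f, 0.0, 2.0)

# constants: speed of light in vacuum, in metres per second
print(area, error_estimate)
print(root)
print(constants.c)
```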

4.1 Statistics

The stats package contains various probability distributions and statistical functions. For instance, let’s see what we can do with the Normal distribution. First we load the statistics package:

    from scipy import stats

We can now easily draw samples from the Normal distribution using the random variates method, rvs:

    a = stats.norm.rvs(size=3)

The size parameter allows you to specify how many random numbers you want to generate. Try it on your computer, and you will see that a different array is returned each time. You can also specify the mean and the standard deviation of the Normal distribution by using the loc and scale parameters, respectively. For example, check the output of the following two commands:

    a = stats.norm.rvs(loc=6, scale=5, size=5)
    b = stats.norm.rvs(loc=4, scale=0.5, size=5)
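One way to convince yourself that loc and scale really are the mean and standard deviation is to draw a large sample and compute both. In this sketch the sample size of 10000 and the fixed seed (random_state=0, which makes the run repeatable) are assumptions added for illustration:

```python
from scipy import stats

# Large sample from a Normal distribution with mean 6 and standard deviation 5
sample = stats.norm.rvs(loc=6, scale=5, size=10000, random_state=0)

sample_mean = sample.mean()
sample_std = sample.std()

print(sample_mean)   # should be close to loc
print(sample_std)    # should be close to scale
```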
Besides simulating distributions, several statistical functions are available. For instance, we can run a t-test to compare the means of two samples, which is a very useful approach for analysing experimental data. Let’s see the example below:

    rvs1 = stats.norm.rvs(loc=5, scale=10, size=500)
    rvs2 = stats.norm.rvs(loc=5, scale=10, size=500)
    Result = stats.ttest_ind(rvs1, rvs2)
    print(Result)

Here we generate two samples from the same distribution (rvs1, rvs2) and then perform a t-test in the third line. The output of the t-test will vary each time you run it, given that rvs1 and rvs2 are re-generated, but you will see something like:

Ttest_indResult(statistic=-0.5489036175088705, pvalue=0.5831943748663959

As you can see, the p-value was around 0.58. Since 0.58 > 0.01, we cannot reject the hypothesis that the two samples have the same mean, which is consistent with both samples coming from the same distribution.
Now if you try:

    rvs3 = stats.norm.rvs(loc=8, scale=10, size=500)
    Result_1_3 = stats.ttest_ind(rvs1, rvs3)
    print(Result_1_3)

The output will be similar to:

Ttest_indResult(statistic=-4.533414290175026, pvalue=6.507128186389019e-

As we can see, this time we compared rvs1 against a sample from a different distribution (rvs3), since rvs3 was generated with a different mean. The t-test returned a very low p-value (6.50 × 10^-6), correctly indicating that the underlying distributions that generated the samples have different means.
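The decision rule used above (compare the p-value with a threshold such as 0.01) can be sketched in code. Two things here are assumptions for the sake of a repeatable illustration: the fixed seeds passed as random_state, and a larger sample size (5000 instead of 500), which makes the difference in means easier to detect.

```python
from scipy import stats

# Two samples with the same mean, one sample with a shifted mean
same_a = stats.norm.rvs(loc=5, scale=10, size=5000, random_state=0)
same_b = stats.norm.rvs(loc=5, scale=10, size=5000, random_state=1)
shifted = stats.norm.rvs(loc=8, scale=10, size=5000, random_state=2)

result_same = stats.ttest_ind(same_a, same_b)
result_shifted = stats.ttest_ind(same_a, shifted)

# The result object exposes the test statistic and the p-value as attributes
alpha = 0.01
for name, result in [("same mean", result_same), ("shifted mean", result_shifted)]:
    decision = "reject" if result.pvalue < alpha else "cannot reject"
    print(name, result.statistic, result.pvalue, decision)
```

A p-value above the threshold does not prove that the means are equal; it only means that the data give no strong evidence of a difference.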

4.2 Linear Algebra

SciPy provides several functions for linear algebra in the linalg package. It has more linear algebra functions than NumPy, and they may run faster, depending on how NumPy was installed. Hence, we will show here some examples of linear algebra using SciPy.
For instance, let’s see how to find the inverse of a matrix. As you may recall, the inverse of a matrix A is a matrix B such that AB = I, where I is the identity matrix. For 3 × 3 matrices, that is,

$$I = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

In SciPy, we can obtain the inverse of a matrix A by using linalg.inv(A). Additionally, we can multiply a matrix A by a matrix B using A.dot(B). Hence, the following example calculates the inverse and checks the result:

    import numpy as np
    from scipy import linalg

    A = np.array([[1, 3, 5], [2, 5, 1], [2, 3, 8]])
    B = linalg.inv(A)
    print(B)
    print(A.dot(B))

 
You will find the matrix

$$B = \begin{pmatrix} -1.48 & 0.36 & 0.88 \\ 0.56 & 0.08 & -0.36 \\ 0.16 & -0.12 & 0.04 \end{pmatrix},$$

which, when multiplied by A, leads to:

$$AB = \begin{pmatrix} 1.00000000 & -1.11022302 \times 10^{-16} & -5.55111512 \times 10^{-17} \\ 3.05311332 \times 10^{-16} & 1.00000000 & 1.87350135 \times 10^{-16} \\ 2.22044605 \times 10^{-16} & -1.11022302 \times 10^{-16} & 1.00000000 \end{pmatrix}.$$

As we can see, this is very close to the “ideal” identity matrix.
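Rather than inspecting the tiny off-diagonal values by eye, the comparison with the identity matrix can be done in code. This sketch uses np.allclose, which tests equality up to a small tolerance, and np.eye, which builds an identity matrix; neither appears in the lab text, so treat their use here as one possible approach:

```python
import numpy as np
from scipy import linalg

A = np.array([[1, 3, 5], [2, 5, 1], [2, 3, 8]])
B = linalg.inv(A)

identity = np.eye(3)    # the 3 x 3 identity matrix

# Exact equality usually fails because of rounding; allclose tolerates it
print(np.allclose(A.dot(B), identity))
print(np.allclose(B.dot(A), identity))   # the inverse works from both sides

# Rounding gives a readable view of the product
print(np.round(A.dot(B), 10))
```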
Another common linear algebra operation is finding the eigenvalues and eigenvectors of a matrix A. For instance, we will need these operations when implementing PCA in the next lab session. Fundamentally, the eigenvalues and eigenvectors are the scalars λ and corresponding nonzero vectors v such that Av = λv. These can be found in SciPy using the function linalg.eig(A), which returns the eigenvalues followed by the eigenvectors; the eigenvectors are the columns of the returned matrix. For instance, consider the following example:

    import numpy as np
    from scipy import linalg

    A = np.array([[1, 2], [3, 4]])
    la, v = linalg.eig(A)
    l1, l2 = la

    print(l1, l2)      # eigenvalues
    print(v[:, 0])     # first eigenvector
    print(v[:, 1])     # second eigenvector
    print(A.dot(v[:, 0]) - l1 * v[:, 0])  # check the computation
    print(A.dot(v[:, 1]) - l2 * v[:, 1])


By running this example, you will find that the matrix $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$ has the eigenvalues -0.3722 and 5.3722. The respective eigenvectors are [−0.8245, 0.5657] and [−0.41597356, −0.90937671]. We check the computation by evaluating Av − λv, which should return a vector of zeros. Indeed, our code outputs:

    [0.00000000e+00+0.j 5.55111512e-17+0.j]
    [-4.4408921e-16+0.j 0.0000000e+00+0.j]

which, as we can see, is very close to 0 for all values.
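The check Av − λv ≈ 0 can also be done for all eigenpairs at once. The sketch below relies on NumPy broadcasting (v * la scales column k of v by la[k]) and on np.allclose for the comparison; this is one possible way to write the check, not the method prescribed by the lab:

```python
import numpy as np
from scipy import linalg

A = np.array([[1, 2], [3, 4]])
la, v = linalg.eig(A)

# A @ v applies A to every column (eigenvector) of v;
# v * la multiplies column k by the eigenvalue la[k]
print(np.allclose(A @ v, v * la))

# Each returned eigenvector has unit length
print(np.linalg.norm(v, axis=0))

# eig returns complex numbers; the imaginary parts are zero for this matrix
print(la.real)
```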
Exercise 3:
Try additional examples in the SciPy tutorials (https://docs.scipy.org/doc/scipy/reference/). While doing so, plot graphs using the PyPlot library.
