Lab Description File

The document outlines the first week of a Data Mining course, focusing on the installation of Python and an introduction to linear algebra concepts such as vectors, matrices, and distance metrics. It provides instructions for setting up Python IDEs, including Anaconda and online compilers, and emphasizes the importance of understanding linear algebra for machine learning. Additionally, it includes exercises for implementing basic vector and matrix operations in Python.

Uploaded by

Ren Keting

SCC403 – Data Mining

Week 1: Introduction

Aim of the session:

• Installations and Introduction to Python

• Introduction to Linear Algebra using Python, including:

• Vectors and Matrices

• Common Vector and Matrix Operations

• Distance Metrics

1 Introduction
Welcome to the first practical session of the Data Mining course. In this session we are going to
review some of the fundamental mathematical concepts that we are going to use in this module,
and do some programming exercises with these concepts. That will solidify the base that you are
going to need for the lab sessions, for the coursework assignments and for the future.
We are going to use Python, version 3. Python is a general-purpose programming language, and
it is widely used in data science, scientific computing and machine learning. Python is completely
free and open source. Additionally, it is very popular for three main reasons: (i) it is easy to learn,
and is often recommended as a “first” programming language; (ii) it allows easy and quick development
of applications; (iii) it has a great variety of useful, open libraries.
If you have never used Python before, then please take a look at a Python tutorial before
proceeding, so that you can follow the Data Mining labs smoothly. For example, the official
Python tutorial at https://docs.python.org/3/tutorial/index.html is a good starting point.
The TAs are going to demonstrate how to start Python, so in this lab you can focus on Chapters
3, 4 and 5. That is, from “An Informal Introduction to Python” up to and including “Data
Structures”. However, if you already have experience with Python, you can start the lab from the
next section.

2 Online IDEs for Python

First things first. Before we start, we need to set up our Integrated Development Environment
(IDE).
There are several options.
Perhaps the easiest is to use one of the simplest online IDEs, Repl.it. A more sophisticated option is
to use the OnlineGDB compiler.

• https://repl.it/
Click on “new repl”, no need to sign up

• https://www.onlinegdb.com/online_python_compiler

However, perhaps the best option is to install Spyder (short for Scientific PYthon
Development EnviRonment) from Anaconda Navigator (which is free to install). Anaconda is also
available on AppsAnywhere. It will be installed in the labs on campus, but if you would like to
install it at home, follow these steps:

• Download and install Anaconda from https://www.anaconda.com/

• Open Anaconda Navigator and, under Environments, find Python, mark a specific version and
choose Python 3.7.7 (the default version may be 3.8; please see the next item for the sequence
needed to downgrade it to 3.7.7)

• Again under Environments in Anaconda Navigator, check that PyTorch is installed and check
its version. If it is version 1.6, downgrade it to version 1.4. After that you will also be able to
downgrade Python from the default version 3.8 to version 3.7.7, which
we need (for Labs 8, 9 and 10 you will need PyTorch v1.4)

• Install Torchvision (which you will need for Lab 9) from the Anaconda command prompt with the
command:
    pip install torchvision
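
Once the installation is finished, it can be useful to confirm which Python version the IDE is actually running. The following is a minimal sketch that uses only the standard library; the variable names (version_text, is_python3) are illustrative and not part of the lab material.

```python
import sys

# Major, minor and micro parts of the running interpreter's version.
major, minor, micro = sys.version_info[:3]
version_text = f"{major}.{minor}.{micro}"
print("Running Python", version_text)

# The labs assume Python 3 (the handout asks for 3.7.7 specifically).
is_python3 = (major == 3)
if not is_python3:
    print("Warning: these labs expect Python 3.")
```

If the printed version is not the one you selected in Anaconda Navigator, check that Spyder was started from the environment you configured.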

3 Linear Algebra Revision
Data is usually organised in tables, where each row represents an item, and each column a feature
of the item. That is, each item is normally characterised by a list of values. For example, a fruit
could be characterised by its width, height, weight, colour, type, price, etc. In this lab we will
assume that all features are numbers. Later in class we will see how to handle different kinds of
features.
Therefore, mathematically we can see a data-set as a matrix, and each item as one point (or a
vector) in a multi-dimensional space. This mathematical point of view is the fundamental basis of
many of the machine learning techniques. Hence, it is important to understand the basic concepts
of linear algebra.

3.1 Vectors, Spaces, and Matrices

We represent a list of values as a vector, which is usually written in bold face. For example,
$\mathbf{v} = (1, 5, 9, 10.5)$ is a vector that holds 4 values (features). Another common notation is to
write an arrow on top of the variable, e.g. $\vec{v} = (1, 5, 9, 10.5)$. Usually we will use vectors over
the real numbers ($\mathbb{R}$), which means that a feature can be any real number (instead of being, for
example, only an integer).
The dimension of the space is the number of features that we have in our data-set. Hence, in
the example above we have a vector with four dimensions, and we say that the vector is in
$\mathbb{R}^4$. In general, if we have $n$ features, we have vectors in $\mathbb{R}^n$. It is also common to see a
data item with $n$ features as a point in $\mathbb{R}^n$.
Only one-, two- and three-dimensional spaces can be directly visualised. We can still do
calculations and reason in more than 3 dimensions; they just cannot be visualised easily.
When we have multiple items, we represent them as a table, where each row is an item and
each column is a feature. We represent that as a matrix, such as the one below:

$$V = \begin{pmatrix} 0.2 & 3 & 4.5 \\ 2.5 & 4.1 & 3.7 \\ 0.7 & 1.5 & 2.5 \\ 2.75 & 3.5 & 2.47 \end{pmatrix}$$

This matrix represents four items, each containing three features. If we take one row of the
matrix, we have one vector/point, for example: $V_0 = (0.2, 3, 4.5)$.

3.1.1 Some useful Python functions

Before we go into details of how to represent vectors in Python we will look closer at some specific
Python functions.

• range
The range function returns a range object, an iterable that yields a sequence of evenly spaced
integers. For example,
    In : list(range(9))

results in:
    Out: [0, 1, 2, 3, 4, 5, 6, 7, 8]

You can also try

    In : list(range(0, 10, 2))

resulting in:
    Out: [0, 2, 4, 6, 8]

or going backwards:
    In : list(range(12, 0, -2))

resulting in:
    Out: [12, 10, 8, 6, 4, 2]

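• zip
The zip function “pairs” up the elements of a number of lists, tuples or other sequences,
yielding tuples. In Python 3 the result is an iterator, so we wrap it in list() to obtain a list of
tuples. For example,
    In : a = [1, 2, 3, 4]
    In : b = [10, 20, 30, 40]
    In : zipped_result = zip(a, b)
    In : list(zipped_result)

resulting in:
    Out: [(1, 10), (2, 20), (3, 30), (4, 40)]

Note that a and b are lists (square brackets). If they were written as sets (curly braces), the
order of the elements would not be guaranteed and the pairs could come out mismatched.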

• append() method
append() is a list method that adds an element to the end of a list. For example,
    In : a = [1, 2, 3, 4]
    In : a.append(999)
    In : list(a)

resulting in:
    Out: [1, 2, 3, 4, 999]

Note that instead of

    In : list(a)

one may also use

    In : print(a)
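
The three functions above are often used together when working with vectors stored as lists. The following is a small illustrative sketch (the variable names are our own, not part of the lab) that builds the element-wise products of two lists, first by pairing elements with zip and then by looping over positions with range.

```python
a = [1, 2, 3, 4]
b = [10, 20, 30, 40]

# Build a list of element-wise products using zip and append.
products = []
for x, y in zip(a, b):
    products.append(x * y)

# The same result, using range to loop over positions instead.
products_by_index = []
for i in range(len(a)):
    products_by_index.append(a[i] * b[i])

print(products)            # [10, 40, 90, 160]
print(products_by_index)   # [10, 40, 90, 160]
```

Both loops give the same list; the zip version avoids explicit indexing, which tends to be less error-prone.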

3.1.2 Representation in Python

There are two ways to represent vectors and matrices in Python: as lists or as NumPy arrays. In
this lab we are going to first use the list representation, so that you can practise how the data
manipulation operations can be implemented. In the next lab we will introduce NumPy arrays,
which facilitate vector and matrix manipulation.
Representing a vector as a list is straightforward. For instance, our example vector $\vec{v} = (1, 5, 9, 10.5)$
would be represented as:
    v = [1, 5, 9, 10.5]
For matrices, we can use a list of lists. That is, we see the matrix as a list of rows, where each
row is a list of column values. Therefore, our example matrix
$$V = \begin{pmatrix} 0.2 & 3 & 4.5 \\ 2.5 & 4.1 & 3.7 \\ 0.7 & 1.5 & 2.5 \\ 2.75 & 3.5 & 2.47 \end{pmatrix}$$
would be represented as:
    V = [[0.2, 3, 4.5], [2.5, 4.1, 3.7], [0.7, 1.5, 2.5], [2.75, 3.5, 2.47]]
Therefore, to access the first element of a vector v, you would use v[0] in Python. To get the
second row of a matrix V, you would use V[1]. And to access the third element of the second row,
you would use V[1][2].
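
As a short illustration of this indexing, the sketch below uses the example vector and matrix from this section. The names n_items, n_features and first_column are our own, chosen for illustration; note that a column is not stored directly in a list of lists, so it has to be collected from the rows.

```python
v = [1, 5, 9, 10.5]
V = [[0.2, 3, 4.5],
     [2.5, 4.1, 3.7],
     [0.7, 1.5, 2.5],
     [2.75, 3.5, 2.47]]

first_element = v[0]        # first element of the vector
second_row = V[1]           # second row of the matrix (itself a list)
third_of_second = V[1][2]   # third element of the second row

n_items = len(V)            # number of rows (items)
n_features = len(V[0])      # number of columns (features)

# Collect one value from each row to obtain the first column.
first_column = [row[0] for row in V]

print(first_element, second_row, third_of_second)
print(n_items, n_features, first_column)
```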

3.2 Vector Operations
There are several useful operations that we can perform with vectors and matrices, for example:

• We can sum two vectors $\mathbf{a}$ and $\mathbf{b}$. Let $\mathbf{a} = (a_1, a_2, \ldots, a_n)$ and $\mathbf{b} = (b_1, b_2, \ldots, b_n)$. The sum is
$\mathbf{a} + \mathbf{b} = (a_1 + b_1, a_2 + b_2, \ldots, a_n + b_n)$.

• We can multiply a vector $\mathbf{a}$ by a scalar (i.e., a number) $\lambda$. Let $\mathbf{a} = (a_1, a_2, \ldots, a_n)$ and $\lambda$ be
a scalar. The multiplication is $\lambda \mathbf{a} = (\lambda a_1, \lambda a_2, \ldots, \lambda a_n)$.

• We can compute the dot product between two vectors. Let $\mathbf{a} = (a_1, a_2, \ldots, a_n)$ and
$\mathbf{b} = (b_1, b_2, \ldots, b_n)$. The dot product is $\mathbf{a} \cdot \mathbf{b} = a_1 b_1 + a_2 b_2 + \ldots + a_n b_n$.

Exercise 1:
Write the following three methods in Python:

• Sum(a, b): Receives two vectors a and b represented as lists, and returns their sum.

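• Mult(a, lambda): Receives a vector a and a scalar λ and returns λa. Note that lambda is a reserved word in Python, so the parameter needs a different name in your code (e.g. lam).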

• DotProduct(a, b): Receives two vectors a and b and returns the dot product a· b.
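
After attempting the exercise, you may want to compare your answer with the sketch below. It is one possible approach, not the only one: the function names (vec_sum, scalar_mult, dot_product) and the parameter name lam are our own choices, and the length check is an added assumption about how mismatched inputs should be handled.

```python
def vec_sum(a, b):
    """Element-wise sum of two equal-length vectors given as lists."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [x + y for x, y in zip(a, b)]


def scalar_mult(a, lam):
    """Multiply every element of vector a by the scalar lam."""
    return [lam * x for x in a]


def dot_product(a, b):
    """Sum of the pairwise products of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


a = [1, 2, 3]
b = [4, 5, 6]
print(vec_sum(a, b))       # [5, 7, 9]
print(scalar_mult(a, 2))   # [2, 4, 6]
print(dot_product(a, b))   # 32
```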

3.3 Matrix Operations

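Similarly, we can apply several operations on matrices. For instance, we can calculate a matrix
$C = AB$ by multiplying matrix $A$ with matrix $B$. Let $c_{i,j}$ be the element of the new matrix $C$ in row
$i$ and column $j$ (and correspondingly $a_{i,j}$ and $b_{i,j}$ for matrices $A$ and $B$). Each element $c_{i,j}$ of the
new matrix $C$ is given by the equation
$$c_{i,j} = \sum_{k=1}^{n} a_{i,k} b_{k,j},$$
where $n$ is the number of columns of $A$ (which must equal the number of rows of $B$).
That is, the element in row $i$ and column $j$ ($c_{i,j}$) of the new matrix $C$ is obtained by going
through row $i$ of the matrix $A$ and column $j$ of the matrix $B$. We then multiply each pair of
numbers, and sum up all the results. For example:

$$\begin{pmatrix} 2 & 1 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 4 & 6 \\ 5 & 7 \end{pmatrix} = \begin{pmatrix} 2 \cdot 4 + 1 \cdot 5 & 2 \cdot 6 + 1 \cdot 7 \\ 3 \cdot 4 + 4 \cdot 5 & 3 \cdot 6 + 4 \cdot 7 \end{pmatrix} = \begin{pmatrix} 13 & 19 \\ 32 & 46 \end{pmatrix}$$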
Exercise 2:
Write in Python a method mult(A, B), which receives two n × n matrices A and B, and returns
a new matrix C = AB. Assume that the matrices are represented as lists of lists.
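
Once you have tried the exercise, the following sketch shows one way the summation formula above could be written with lists of lists. The name mat_mult is our own; the sketch also accepts non-square matrices with compatible shapes, which goes slightly beyond what the exercise asks for.

```python
def mat_mult(A, B):
    """Return C = AB for matrices stored as lists of rows."""
    n_rows, n_inner, n_cols = len(A), len(B), len(B[0])
    if any(len(row) != n_inner for row in A):
        raise ValueError("columns of A must match rows of B")
    C = []
    for i in range(n_rows):
        new_row = []
        for j in range(n_cols):
            # c_ij is the sum over k of a_ik * b_kj.
            new_row.append(sum(A[i][k] * B[k][j] for k in range(n_inner)))
        C.append(new_row)
    return C


A = [[2, 1], [3, 4]]
B = [[4, 6], [5, 7]]
print(mat_mult(A, B))   # [[13, 19], [32, 46]]
```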

3.4 Transpose and Inverse

Another common operation on vectors and matrices is the transpose. We indicate the
transpose of a matrix $A$ by $A^T$, where each row of $A$ becomes a column of $A^T$. That is, $a_{i,j} = a^T_{j,i}$.
For example, if
$$A = \begin{pmatrix} 2 & 1 & 3 \\ 4 & 6 & 9 \\ 7 & 8 & 2 \end{pmatrix}, \quad \text{then} \quad A^T = \begin{pmatrix} 2 & 4 & 7 \\ 1 & 6 & 8 \\ 3 & 9 & 2 \end{pmatrix}.$$
The same applies to vectors. That is, if $\mathbf{a} = (3, 5, 8)$, then
$$\mathbf{a}^T = \begin{pmatrix} 3 \\ 5 \\ 8 \end{pmatrix}.$$
Exercise 3:

• What is the meaning of $\mathbf{a}\mathbf{b}^T$?

• Write in Python a method transpose(A), which returns the transpose of a vector or a matrix
A.
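
For comparison with your own answer, here is one possible sketch of the transpose for the list representation. It relies on an assumed convention that is not stated in the lab: a flat list is treated as a row vector, and its transpose (a column vector) is stored as a list of one-element rows.

```python
def transpose(A):
    """Transpose a matrix (list of rows) or a flat row vector."""
    # Assumed convention: a flat list is a row vector, and its
    # transpose is a column stored as a list of one-element rows.
    if not isinstance(A[0], list):
        return [[x] for x in A]
    # zip(*A) groups the i-th elements of every row, i.e. the columns.
    return [list(column) for column in zip(*A)]


A = [[2, 1, 3], [4, 6, 9], [7, 8, 2]]
print(transpose(A))           # [[2, 4, 7], [1, 6, 8], [3, 9, 2]]
print(transpose([3, 5, 8]))   # [[3], [5], [8]]
```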

The inverse of a matrix is a different concept from the transpose. However, before introducing
the inverse, we have to define the identity matrix $I$. $I$ is a square matrix (i.e., of size $n \times n$)
whose diagonal elements are all 1s, and all other elements are 0s. For instance, the matrix
$$I = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
is a $3 \times 3$ identity matrix.
We can now introduce the inverse. We indicate the inverse of a matrix $A$ by $A^{-1}$, and it is
defined as the matrix $A^{-1}$ such that $AA^{-1} = I$. That is, multiplying a matrix by its
inverse gives the identity matrix.
For example, given the matrix $A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$, its inverse is $A^{-1} = \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix}$. It can be
verified that
$$\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
Exercise 4:
Write in Python a method isInverse(A,B), which returns True if B is the inverse of A; or False
otherwise. Again, assume that the matrices are represented as lists of lists.
Hint: You are just being asked to verify if B is the inverse, not to actually calculate the inverse.
You can make use of the multiplication method mult(A, B) from Exercise 2.
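
The hint can be sketched as follows, for comparison once you have tried the exercise. To keep the block self-contained it includes its own small multiplication helper (mat_mult, a hypothetical name) rather than reusing your solution to Exercise 2. Comparing against the identity with a small tolerance (tol) is our own assumption, added because products of floating-point numbers are rarely exactly 0 or 1.

```python
import math


def mat_mult(A, B):
    """Return C = AB for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def is_inverse(A, B, tol=1e-9):
    """Return True if AB is (numerically) the identity matrix."""
    product = mat_mult(A, B)
    n = len(A)
    for i in range(n):
        for j in range(n):
            expected = 1.0 if i == j else 0.0
            if not math.isclose(product[i][j], expected, abs_tol=tol):
                return False
    return True


A = [[2, 1], [1, 1]]
B = [[1, -1], [-1, 2]]
print(is_inverse(A, B))   # True
print(is_inverse(A, A))   # False
```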

3.5 Eigenvalues and Eigenvectors

The eigenvalues and eigenvectors of matrices are also very important concepts, which are used in
several data mining methods. Given a square matrix $A$, its eigenvectors $\mathbf{v}$ (non-zero vectors) and eigenvalues $\lambda$ are such
that $A\mathbf{v} = \lambda \mathbf{v}$. Each eigenvector $\mathbf{v}$ has a corresponding eigenvalue $\lambda$, which is the one that makes
the previous equation hold true. How to calculate eigenvalues and eigenvectors is beyond the
scope of this lab, but in the next labs you will find that there are libraries available for doing this
calculation.
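
Although computing eigenvalues is out of scope here, checking the defining equation for a given pair is straightforward. The sketch below is illustrative only: the helper names (mat_vec, is_eigenpair), the tolerance and the example matrix are our own and do not come from the lab.

```python
def mat_vec(A, v):
    """Multiply matrix A (list of rows) by vector v (flat list)."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in A]


def is_eigenpair(A, v, lam, tol=1e-9):
    """Check whether Av equals lam * v, element by element."""
    Av = mat_vec(A, v)
    return all(abs(x - lam * y) <= tol for x, y in zip(Av, v))


A = [[2, 1], [1, 2]]
print(is_eigenpair(A, [1, 1], 3))    # True:  A(1, 1) = (3, 3)
print(is_eigenpair(A, [1, -1], 1))   # True:  A(1, -1) = (1, -1)
print(is_eigenpair(A, [1, 0], 2))    # False: A(1, 0) = (2, 1)
```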

3.6 Distance Metrics

Given two points in space, it is very useful to calculate the distance between them. Recall
that we represent an item in our dataset as a point. The distance allows us to calculate
how (dis-)similar two items are.
There are many distance metrics. The most common one is the Euclidean distance, which is
defined by the following equation:
$$d(\mathbf{a}, \mathbf{b}) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2},$$
where $n$ is the dimension of the vectors. For example, if the points have only two dimensions, then
the equation would be: $d(\mathbf{a}, \mathbf{b}) = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2}$.
Exercise 5:

• Given two points a and b, create a method dist(a, b), which returns their Euclidean distance.

• Given a matrix A, create a method lowDist(A), which returns the pair of rows in A that has
the lowest Euclidean distance.

Again, for these exercises please assume that vectors are represented as Python lists and
matrices as lists of lists.
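
For comparison after attempting the exercise, here is one possible sketch. The name low_dist and the choice to return both the row indices and the distance are our own assumptions; the exercise only asks for the pair. The search simply compares every pair of distinct rows.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def low_dist(A):
    """Return ((i, j), d): indices of the closest pair of rows and their distance."""
    best_pair, best_d = None, math.inf
    for i in range(len(A)):
        for j in range(i + 1, len(A)):   # each unordered pair once
            d = dist(A[i], A[j])
            if d < best_d:
                best_pair, best_d = (i, j), d
    return best_pair, best_d


print(dist([0, 0], [3, 4]))                   # 5.0
print(low_dist([[0, 0], [3, 4], [0, 1]]))     # ((0, 2), 1.0)
```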

Another common metric is the cosine similarity. When calculating the cosine similarity we
now see each item as a vector (instead of a point). We then check the angle ($\theta$) between two
vectors (items) $\mathbf{a}$ and $\mathbf{b}$. The lower $\theta$, the more $\mathbf{a}$ and $\mathbf{b}$ can be considered “similar”. However,
instead of directly using $\theta$, we use $\cos(\theta)$. Hence, a cosine similarity of 1 means that the two
vectors point in the same direction (the items are considered equivalent), while a cosine similarity
of −1 indicates that they are complete opposites.
The cosine similarity can be directly calculated using the dot product, as:

$$\cos(\theta) = \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{a}\| \, \|\mathbf{b}\|}$$

We use $\|\mathbf{a}\|$ to indicate the norm of a vector, which is the same as $\sqrt{\mathbf{a} \cdot \mathbf{a}} = \sqrt{\sum_{i=1}^{n} a_i^2}$, where
$n$ is the dimension of the vector. Therefore, the above equation leads to:

$$\cos(\theta) = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2} \, \sqrt{\sum_{i=1}^{n} b_i^2}}.$$

Exercise 6:
Given two vectors a and b, create a Python method cosSimilarity(a,b), which returns their
cosine similarity.
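
Once you have tried the exercise, the sketch below shows one way the formula could be translated into Python. The name cos_similarity is our own, and raising an error for a zero vector is an added assumption, since the formula is undefined when either norm is zero.

```python
import math


def cos_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumed behaviour: the similarity is undefined for a zero vector.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cos_similarity([1, 0], [1, 0]))    # same direction
print(cos_similarity([1, 0], [0, 1]))    # perpendicular
print(cos_similarity([1, 0], [-1, 0]))   # opposite direction
```

Because of floating-point rounding, results for parallel vectors such as (1, 2) and (2, 4) may differ very slightly from 1, so compare with a tolerance rather than with ==.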
