Unit3 Notes
1. Introduction to Matrices
Linear algebra, the mathematics of matrices, is often called the mathematics of data. It is arguably a pillar of Artificial Intelligence, so it is advised as a prerequisite before getting started with the study of Artificial Intelligence.
When we arrange a set of numbers in m horizontal lines (called rows) and n vertical lines (called columns), the arrangement is called an m x n (m by n) matrix.
Consider the matrix
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$$
The top row is row 1. The leftmost column is column 1. This matrix is a 3x3 matrix because it
has three rows and three columns. In describing matrices, the format is:
rows X columns
Each number that makes up a matrix is called an element of the matrix. The elements in a matrix
have specific locations.
The upper left corner of the matrix is [row 1 x column 1]. In the above matrix the element at
row 1 column 1 is the value 1. The element at [row 2 x column 3] is the value 6.
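In Python, a matrix can be sketched as a list of rows. One thing to keep in mind: Python counts positions from 0, whereas these notes count rows and columns from 1, so the hypothetical helper `element` below subtracts 1 from each position before looking it up.

```python
# The 3 x 3 matrix A from the notes, stored as a list of rows
A = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

def element(matrix, row, col):
    """Return the element at [row x column], counting rows and columns from 1."""
    return matrix[row - 1][col - 1]

rows, cols = len(A), len(A[0])
print(f"{rows} x {cols} matrix")   # 3 x 3 matrix
print(element(A, 1, 1))            # 1 (row 1, column 1)
print(element(A, 2, 3))            # 6 (row 2, column 3)
```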
Quick Question
Question 1: What is the location of value 8?
Question 2: What is the value at location row 3 x column 2?
Activity 1
Mohan purchased 3 Math books, 2 Physics books and 3 Chemistry books. Sohan purchased 8
Math books, 7 Physics books and 4 Chemistry books.
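This information can be written as a 2 x 3 matrix, with one row per student (Mohan, Sohan) and one column per subject (Math, Physics, Chemistry):
$$\begin{bmatrix} 3 & 2 & 3 \\ 8 & 7 & 4 \end{bmatrix}$$
1. Row Matrix: A matrix with only one row.
$$A = \begin{bmatrix} 1 & 3 & -5 \end{bmatrix}$$
2. Column Matrix: A matrix with only one column.
$$A = \begin{bmatrix} 1 \\ 3 \\ -5 \end{bmatrix}$$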
3. Square Matrix: A matrix in which the number of rows equals the number of columns.
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$$
4. Diagonal Matrix: A square matrix in which every element outside the leading diagonal is zero.
$$A = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{bmatrix}$$
5. Scalar Matrix: A diagonal matrix in which all the diagonal elements are equal and all other elements are zero.
$$A = \begin{bmatrix} 5 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 5 \end{bmatrix}$$
If all the diagonal elements are unity (1) and all the non-diagonal elements are zero, the matrix is called a unit (or identity) matrix.
$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
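The definitions of the matrix types above can be turned into simple checks. The sketch below assumes matrices are stored as lists of rows; the function names are illustrative, not from any library.

```python
def is_square(m):
    """True if the number of rows equals the number of columns."""
    return len(m) > 0 and all(len(row) == len(m) for row in m)

def is_diagonal(m):
    """True if m is square and every element off the leading diagonal is zero."""
    n = len(m)
    return is_square(m) and all(m[i][j] == 0 for i in range(n) for j in range(n) if i != j)

def is_scalar(m):
    """True if m is diagonal and all its diagonal elements are equal."""
    return is_diagonal(m) and len({m[i][i] for i in range(len(m))}) == 1

def is_unit(m):
    """True if m is a scalar matrix whose diagonal elements are all 1."""
    return is_scalar(m) and m[0][0] == 1

D = [[2, 0, 0], [0, 3, 0], [0, 0, 4]]
S = [[5, 0, 0], [0, 5, 0], [0, 0, 5]]
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(is_diagonal(D), is_scalar(D))   # True False
print(is_scalar(S), is_unit(S))       # True False
print(is_unit(I))                     # True
```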
1.2. Matrix Operations
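Transpose
The transpose of a matrix A, written A^T, is obtained by writing the rows of A as columns. For the 3 x 2 matrix A below, the transpose is a 2 x 3 matrix:
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}, \qquad A^T = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix}$$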
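In Python, `zip(*A)` groups the i-th element of every row, which is exactly column i of A, so it gives a compact way to sketch the transpose. The example uses A = [[1, 2], [3, 4], [5, 6]], the 3 x 2 matrix whose transpose is the 2 x 3 matrix shown above.

```python
def transpose(m):
    """Turn the rows of m into columns: element (i, j) moves to (j, i)."""
    return [list(column) for column in zip(*m)]

A = [[1, 2], [3, 4], [5, 6]]   # 3 x 2
print(transpose(A))            # [[1, 3, 5], [2, 4, 6]]  (2 x 3)
```

Transposing twice gives back the original matrix, which is a quick sanity check.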
Inverse
For matrices, there is no such thing as division. You can add, subtract or multiply but you can’t
divide them. There is a related concept, which is called "inversion".
Matrix inversion is the process of finding another matrix that, when multiplied with the given matrix, results in an identity matrix. Given a square matrix A of order n, find a matrix B such that
$$AB = BA = I_n$$
where I_n is the n x n identity (unit) matrix. B is called the inverse of A and is written A^{-1}.
Calculating the inverse of a matrix by hand is somewhat involved, so let us use an inverse matrix calculator.
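For a 2 x 2 matrix [[a, b], [c, d]] the inverse has a closed form, (1 / (ad − bc)) [[d, −b], [−c, a]], which exists only when ad − bc ≠ 0 (this quantity is the determinant, covered in the next section). The sketch below is limited to the 2 x 2 case, uses `fractions.Fraction` to keep exact values, and checks the result on the matrix from Example 1 of the determinant section; the function names are illustrative.

```python
from fractions import Fraction

def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix [[a, b], [c, d]] using exact fractions."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("determinant is 0, so the matrix has no inverse")
    k = Fraction(1, det)
    return [[k * d, -k * b], [-k * c, k * a]]

def matmul(x, y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

A = [[2, 4], [3, 8]]
B = inverse_2x2(A)
print([[str(v) for v in row] for row in B])   # [['2', '-1'], ['-3/4', '1/2']]
print(matmul(A, B) == [[1, 0], [0, 1]])       # True: AB is the identity matrix
```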
2. Determinant
Every square matrix has an associated number known as its determinant.
If A = [a_ij] is a square matrix of order n, then the determinant of A is denoted by det A or |A|.
To find the value of a determinant, we can expand it along any row or column.
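Explanation: let us take a 2 x 2 matrix
$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$
The determinant is:
$$|A| = ad - bc$$
Example 1
$$A = \begin{bmatrix} 2 & 4 \\ 3 & 8 \end{bmatrix}$$
$$|A| = 2 \times 8 - 4 \times 3 = 16 - 12 = 4$$
Example 2
$$A = \begin{bmatrix} 6 & 1 & 1 \\ 4 & -2 & 5 \\ 2 & 8 & 7 \end{bmatrix}$$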
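Example 2 can be finished by expanding along the first row: |A| = 6((−2)(7) − (5)(8)) − 1((4)(7) − (5)(2)) + 1((4)(8) − (−2)(2)) = 6(−54) − 18 + 36 = −306. The recursive sketch below performs the same cofactor expansion for any small square matrix; it is slow for large matrices, where elimination-based methods are normally used instead.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    if n == 2:
        (a, b), (c, d) = m
        return a * d - b * c
    total = 0
    for j in range(n):
        # Minor: delete row 0 and column j
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        sign = -1 if j % 2 else 1   # cofactor signs alternate +, -, +, ...
        total += sign * m[0][j] * det(minor)
    return total

print(det([[2, 4], [3, 8]]))                     # 4     (Example 1)
print(det([[6, 1, 1], [4, -2, 5], [2, 8, 7]]))   # -306  (Example 2)
```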
1. Vector Addition
Vectors of equal length can be added to create a new vector:
$$x = y + z$$
The new vector has the same length as the other two, and each element is the sum of the elements at the same index:
$$x = (y_1 + z_1,\ y_2 + z_2,\ y_3 + z_3)$$
2. Vector Subtraction
A vector can be subtracted from another vector of equal length to create a new, third vector:
$$x = y - z$$
As with addition, the new vector has the same length as the parent vectors and each element
of the new vector is calculated as the subtraction of the elements at the same indices.
$$x = (y_1 - z_1,\ y_2 - z_2,\ y_3 - z_3)$$
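Element-wise addition and subtraction can be sketched in Python as follows. The length check mirrors the rule that only vectors of equal length can be combined; the sample vectors y and z are made up for illustration.

```python
def add(y, z):
    """Element-wise sum of two vectors of equal length."""
    if len(y) != len(z):
        raise ValueError("vectors must have the same length")
    return [a + b for a, b in zip(y, z)]

def subtract(y, z):
    """Element-wise difference y - z of two vectors of equal length."""
    if len(y) != len(z):
        raise ValueError("vectors must have the same length")
    return [a - b for a, b in zip(y, z)]

y = [1, 2, 3]
z = [4, 5, 6]
print(add(y, z))        # [5, 7, 9]
print(subtract(y, z))   # [-3, -3, -3]
```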
3. Vector Multiplication
When we multiply scalars, there is only one type of operation: multiply a scalar by a scalar and obtain a scalar result,
$$a \times b = c$$
Vectors are different: there are two kinds of multiplication, one in which the result of the product is a scalar and another in which the result is a vector. (There is also a third kind, which gives a tensor result, but it is out of scope for now.)
To begin, let’s represent vectors as column vectors. We’ll define the vectors A and B as the
column vectors
$$A = \begin{bmatrix} A_x \\ A_y \\ A_z \end{bmatrix}, \qquad B = \begin{bmatrix} B_x \\ B_y \\ B_z \end{bmatrix}$$
Physical quantities are of two types:
Scalar: has only magnitude, no direction.
Vector: has both magnitude and direction.
The first type of vector multiplication is called the dot product, written A · B. The dot product multiplies one vector by another and gives a scalar result.
[Where do we use it in AI? In machine learning, this operation combines input values with their weights to form a weighted sum. Please refer to "weight" in Unit 2: Deep Learning.]
Let i, j and k be the unit vectors along the directions of the x-axis, y-axis and z-axis respectively.
Vector Dot Product
If there are two vectors
$$a = a_1 i + a_2 j + a_3 k, \qquad b = b_1 i + b_2 j + b_3 k$$
their dot product is
$$a \cdot b = a_1 b_1 + a_2 b_2 + a_3 b_3$$
Example 1
Calculate the dot product of a= (1,2,3) and b= (4, −5,6).
Using the formula for the dot product of three-dimensional vectors,
$$a \cdot b = a_1 b_1 + a_2 b_2 + a_3 b_3,$$
we calculate the dot product to be
$$a \cdot b = 1(4) + 2(-5) + 3(6) = 4 - 10 + 18 = 12.$$
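The dot product formula translates directly into code: multiply matching components and add the results. A minimal sketch, checked against Example 1 (the same function works for two-dimensional vectors such as those in Practice Sum 1):

```python
def dot(a, b):
    """Dot product: multiply matching components and add the products."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))

print(dot([1, 2, 3], [4, -5, 6]))   # 12, as in Example 1
```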
Practice Sum -1: Calculate the dot product of c = (−4, −9) and d = (−1,2).
1.4. Matrix and Matrix Arithmetic
Matrices are a foundational element of linear algebra. In machine learning, matrices are used to hold and process the input data variables when training a model.
1.4.1. Addition of matrices
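If A and B are two matrices of order m x n (that is, each has m rows and n columns), then their sum A + B is a matrix of order m x n obtained by adding the corresponding elements of A and B.
$$A = \begin{bmatrix} 12 & 1 \\ 3 & -5 \end{bmatrix}, \qquad B = \begin{bmatrix} 8 & 9 \\ -1 & 4 \end{bmatrix}$$
$$A + B = \begin{bmatrix} 12 + 8 & 1 + 9 \\ 3 + (-1) & -5 + 4 \end{bmatrix} = \begin{bmatrix} 20 & 10 \\ 2 & -1 \end{bmatrix}$$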
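A minimal sketch of matrix addition, assuming matrices are stored as lists of rows and using A and B from the example above:

```python
def mat_add(a, b):
    """Sum of two m x n matrices: add corresponding elements."""
    if len(a) != len(b) or any(len(r) != len(s) for r, s in zip(a, b)):
        raise ValueError("matrices must have the same order")
    return [[x + y for x, y in zip(r, s)] for r, s in zip(a, b)]

A = [[12, 1], [3, -5]]
B = [[8, 9], [-1, 4]]
print(mat_add(A, B))   # [[20, 10], [2, -1]]
```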
1.4.2. Multiplication of a matrix by a scalar
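Let A = [a_ij] be an m x n matrix and K be any number, called a scalar. The matrix obtained by multiplying every element of A by the scalar K is denoted by KA, so KA = [K a_ij].
If
$$A = \begin{bmatrix} 12 & 1 \\ 3 & -5 \end{bmatrix} \text{ and } K = 2,$$
then
$$KA = \begin{bmatrix} 24 & 2 \\ 6 & -10 \end{bmatrix}$$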
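Scalar multiplication only needs one pass over the elements. A minimal sketch using the same A and K = 2:

```python
def scalar_mult(k, a):
    """Multiply every element of matrix a by the scalar k."""
    return [[k * x for x in row] for row in a]

A = [[12, 1], [3, -5]]
print(scalar_mult(2, A))   # [[24, 2], [6, -10]]
```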
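Two matrices can be multiplied only when the number of columns of the first equals the number of rows of the second. Consider
$$A = \begin{bmatrix} 2 & -3 & 4 \\ 3 & 6 & -1 \end{bmatrix} \text{ and } B = \begin{bmatrix} 2 & 5 \\ -1 & 0 \\ 4 & -2 \end{bmatrix}$$
A is a 2 x 3 matrix and B is a 3 x 2 matrix. A has 3 columns and B has 3 rows, so they meet the condition for matrix multiplication, and the product AB is a 2 x 2 matrix.
Each element of AB is obtained by multiplying the elements of a row of A by the corresponding elements of a column of B and adding the products:
(first row of A) x (first column of B)
(first row of A) x (second column of B)
(second row of A) x (first column of B)
(second row of A) x (second column of B)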
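For example
$$AB = \begin{bmatrix} (2 \times 2) + (-3 \times -1) + (4 \times 4) & (2 \times 5) + (-3 \times 0) + (4 \times -2) \\ (3 \times 2) + (6 \times -1) + (-1 \times 4) & (3 \times 5) + (6 \times 0) + (-1 \times -2) \end{bmatrix}$$
$$= \begin{bmatrix} 4 + 3 + 16 & 10 + 0 - 8 \\ 6 - 6 - 4 & 15 + 0 + 2 \end{bmatrix} = \begin{bmatrix} 23 & 2 \\ -4 & 17 \end{bmatrix}$$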
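The row-by-column rule can be sketched as a nested comprehension. The check at the top enforces the condition that the number of columns of A equals the number of rows of B; the function name `mat_mul` is illustrative.

```python
def mat_mul(a, b):
    """Product AB: element (i, j) is row i of a times column j of b."""
    if len(a[0]) != len(b):
        raise ValueError("number of columns of A must equal number of rows of B")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

A = [[2, -3, 4], [3, 6, -1]]     # 2 x 3
B = [[2, 5], [-1, 0], [4, -2]]   # 3 x 2
print(mat_mul(A, B))             # [[23, 2], [-4, 17]]
```

Note that BA is also defined here (3 x 2 times 2 x 3), but it is a 3 x 3 matrix, so matrix multiplication is not commutative.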
We use databases (such as Oracle, MS SQL Server and MySQL) to store digital data. A database is made up of several components, of which the table is the most important: the database stores its data in tables. Without tables, a DBMS would not be of much significance.
For example, consider a student database and its two tables:
Look at the records in the 'Activities Table' on their own: does this information convey any meaning? Not really. But if you combine the information from the two tables, 'Students Table' and 'Activities Table', you get meaningful information.
For example, student John Smith participated in swimming, so he must have paid $17.
The data in the tables of a database are of limited value unless data from different tables are combined and manipulated to generate useful information. This is where relational algebra comes in.
Relational algebra is a set of algebraic operators and rules that manipulate relational tables to yield the desired information. Relational algebra takes relations (tables) as its operands and returns a relation (table) as its result. It consists of eight operators:
SELECT, PROJECT, JOIN, INTERSECT, UNION, DIFFERENCE, PRODUCT and DIVIDE.
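Because each operator takes tables and returns a table, the operators can be chained. The sketch below models a table as a list of Python dictionaries and implements just SELECT and PROJECT; the rows are hypothetical, loosely modelled on the activities example, and are not the actual tables from the figure.

```python
# Hypothetical Activities table: one dict per row (tuple)
activities = [
    {"student_id": 1, "activity": "Swimming", "cost": 17},
    {"student_id": 2, "activity": "Tennis", "cost": 36},
    {"student_id": 3, "activity": "Swimming", "cost": 17},
]

def select(relation, predicate):
    """SELECT: keep only the rows (tuples) that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """PROJECT: keep only the named columns (attributes), dropping duplicate rows."""
    result = []
    for row in relation:
        picked = {a: row[a] for a in attributes}
        if picked not in result:
            result.append(picked)
    return result

cheap = select(activities, lambda row: row["cost"] < 20)
print([row["student_id"] for row in cheap])   # [1, 3]
print(project(activities, ["activity"]))      # [{'activity': 'Swimming'}, {'activity': 'Tennis'}]

# The operators compose because each one returns a table
print(project(select(activities, lambda row: row["cost"] < 20), ["activity"]))   # [{'activity': 'Swimming'}]
```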
iv) Set Difference
The difference of sets A and B, denoted A − B, is the set of elements that are in A but not in B, i.e. all elements of A except those that are also in B. For A = {2, 3, 4} and B = {3, 4, 5}:
A − B = {2}
v) Cartesian Product
Recall the axes used when plotting a graph (x-axis, y-axis). For example, (2, 3) means that the value on the x-axis is 2 and the value on the y-axis is 3, which is not the same as (3, 2).
The order of representation is fixed: the x-coordinate comes first and then the y-coordinate (an ordered pair). The Cartesian product combines elements, say x and y, in this ordered way.
If A and B are two non-empty sets, then the Cartesian product of A and B, denoted A × B, is the set of all ordered pairs (a, b) such that a ∈ A and b ∈ B. For A = {2, 3, 4} and B = {3, 4, 5}:
A × B = {(2, 3), (2, 4), (2, 5), (3, 3), (3, 4), (3, 5), (4, 3), (4, 4), (4, 5)}
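Python's built-in sets support set difference with the `-` operator, and `itertools.product` generates ordered pairs. The sketch assumes A = {2, 3, 4} and B = {3, 4, 5}, the sets implied by the A × B listing above.

```python
from itertools import product

A = {2, 3, 4}
B = {3, 4, 5}

print(A - B)   # {2}: elements of A that are not in B

# Cartesian product: every ordered pair (a, b) with a in A and b in B.
# Sets have no fixed order, so the pairs are sorted for a readable listing.
pairs = sorted(product(A, B))
print(pairs)
print(len(pairs))   # 9 = len(A) * len(B)
```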
2.3. Data Tables Join (SQL Joins)
You may have understood by now that relational databases are based almost entirely on set theory. If you have ever worked with a database or written SQL queries against one, you are probably familiar with the idea of finding records in database tables. Finding records in database tables is nothing but some form of set operation.
The diagram below summarizes the possible table join operations for your quick reference:
1. INNER JOIN
Selects records that have matching values in both tables.
In an inner join, only those tuples that satisfy the matching criteria are included, while the rest are excluded.
2. LEFT (OUTER) JOIN
Selects all records from the first (left) table, along with the matching records from the second (right) table.
In a left outer join, all tuples in the left relation are kept. If no matching tuple is found in the right relation, the attributes of the right relation in the join result are filled with null values.
3. RIGHT (OUTER) JOIN
Selects all records from the second (right) table, along with the matching records from the first (left) table.
In a right outer join, all tuples in the right relation are kept. If no matching tuple is found in the left relation, the attributes of the left relation in the join result are filled with null values.
4. FULL (OUTER) JOIN
Selects all records from both tables, matching left and right records where possible.
In a full outer join, all tuples from both relations are included in the result, irrespective of the matching condition; where there is no match, the missing attributes are filled with null values.
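The four joins differ only in what happens to rows without a match. The sketch below implements them with one hypothetical `join` helper over lists of dictionaries, using None in place of SQL's NULL; the table contents are made up for illustration, and a real database would do this in SQL (INNER JOIN, LEFT JOIN and so on). For simplicity it assumes the two tables share no column names other than the join key.

```python
# Hypothetical tables: students and the activities they signed up for
students = [
    {"id": 1, "name": "John Smith"},
    {"id": 2, "name": "Jane Bloggs"},
    {"id": 3, "name": "Mary Jones"},
]
activities = [
    {"id": 1, "activity": "Swimming"},
    {"id": 3, "activity": "Tennis"},
    {"id": 4, "activity": "Squash"},
]

def join(left, right, key, keep_left=False, keep_right=False):
    """Join two tables (lists of dicts) on `key`; None stands in for SQL NULL.

    neither flag -> INNER, keep_left -> LEFT, keep_right -> RIGHT, both -> FULL.
    """
    left_cols = {c for row in left for c in row} - {key}
    right_cols = {c for row in right for c in row} - {key}
    result = []
    matched_right = set()              # positions of right rows that found a partner
    for l_row in left:
        found = False
        for i, r_row in enumerate(right):
            if r_row[key] == l_row[key]:
                result.append({**l_row, **r_row})
                matched_right.add(i)
                found = True
        if not found and keep_left:    # unmatched left row: pad the right columns
            result.append({**l_row, **dict.fromkeys(right_cols)})
    if keep_right:                     # unmatched right rows: pad the left columns
        for i, r_row in enumerate(right):
            if i not in matched_right:
                result.append({**dict.fromkeys(left_cols), **r_row})
    return result

inner_rows = join(students, activities, "id")
left_rows = join(students, activities, "id", keep_left=True)
right_rows = join(students, activities, "id", keep_right=True)
full_rows = join(students, activities, "id", keep_left=True, keep_right=True)
print(len(inner_rows), len(left_rows), len(right_rows), len(full_rows))   # 2 3 3 4
```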
3.1.1. Mean
In statistics, the mean (more technically, the arithmetic mean or sample mean) can be estimated from a sample of examples drawn from the domain. It is the quotient obtained by dividing the total of the values of a variable by the number of observations.
If a data set has n values x1, x2, x3, …, xn, the sample mean is
$$M = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$
To calculate the mean of grouped data:
$$M = \frac{\sum f x}{n}$$
where M = mean
∑ = sum total
f = frequency of each class
x = score (for grouped data, the mid-value of each class)
n = total number of cases (= ∑f)
Example 1
The set S = {5, 10, 15, 20, 30}
Mean of set S = (5 + 10 + 15 + 20 + 30) / 5 = 80 / 5 = 16
Example 2
Calculate the mean of the following grouped data:
Class    Frequency
2-4      3
4-6      4
6-8      2
8-10     1
Solution
Class    Frequency (f)    Mid-value (x)    f·x
2-4      3                3                9
4-6      4                5                20
6-8      2                7                14
8-10     1                9                9
         n = 10                            ∑f·x = 52
$$M = \frac{\sum f x}{n} = \frac{52}{10} = 5.2$$
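Both formulas for the mean can be sketched in a few lines. For grouped data, the sketch assumes each class is given as (lower limit, upper limit, frequency) and uses the class mid-value as x, as in Example 2.

```python
def mean(values):
    """Arithmetic mean: total of the values divided by how many there are."""
    return sum(values) / len(values)

def grouped_mean(classes):
    """Mean of grouped data, M = sum(f * x) / n, with x = mid-value of each class."""
    n = sum(f for _, _, f in classes)
    total = sum(f * (low + high) / 2 for low, high, f in classes)
    return total / n

print(mean([5, 10, 15, 20, 30]))   # 16.0 (Example 1)

example2 = [(2, 4, 3), (4, 6, 4), (6, 8, 2), (8, 10, 1)]
print(grouped_mean(example2))      # 5.2 (Example 2)
```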
When to use the mean?
1. The mean is more stable than the median and mode, so it is used when the measure of central tendency with the greatest stability is wanted.
2. When you want to include all the scores of a distribution.
3. When you want a result that is least affected by sampling fluctuations.
When not to use the mean
1. The mean has one main disadvantage: it is particularly susceptible to the influence of
outliers. These are values that are unusual compared to the rest of the data set by
being especially small or large in numerical value.
For example, consider the wages of staff at a factory below:
Staff           1     2     3     4     5     6     7     8     9     10
Salary (INR)    15k   18k   16k   14k   15k   15k   12k   17k   90k   95k
Mean = Total salary / Number of staff
= 307k / 10
= 30.7k
The mean salary for these ten staff is INR 30.7k. However, inspecting the raw data suggests
that this mean value might not be the best way to accurately reflect the typical salary of a
worker, as most workers have salaries in the INR 12k to INR 18k range. The mean is being
skewed by the two large salaries.
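The effect of the two large salaries can be checked directly. As an illustration only, the sketch below also computes the mean after leaving out the salaries of 90k and above; whether such values should be removed is a judgement call about the data.

```python
salaries_k = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]   # salaries in INR thousands

mean_all = sum(salaries_k) / len(salaries_k)

# Illustrative cut-off: leave out the two unusually large salaries
typical = [s for s in salaries_k if s < 90]
mean_typical = sum(typical) / len(typical)

print(round(mean_all, 2))       # 30.7
print(round(mean_typical, 2))   # 15.25
```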
3.1.2. Median
The median is another measure of central tendency. It is the positional value that divides the data into two equal parts: one part with all values greater than the median and the other with all values smaller than it.
The following series shows the marks in mathematics of students learning AI:
17 32 35 15 21 41 32 11 10 20 27 28 30
Arranged in ascending order: 10 11 15 17 20 21 27 28 30 32 32 35 41. With 13 values, the median is the 7th value, 27.
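Example: Calculate the median of the following grouped data.
Class interval    Number of workers (f)    Cumulative frequency (cf)
0-10              22                       22
10-20             38                       60
20-30             46                       106
30-40             35                       141
40-50             20                       161
N = 161, so N/2 = 80.5. The first cumulative frequency that reaches 80.5 is 106, so the median class is 20-30. Here the lower limit l = 20, the class frequency f = 46, the class width h = 10, and the cumulative frequency of the preceding class cf = 60.
$$\text{Median} = l + \frac{N/2 - cf}{f} \times h = 20 + \frac{80.5 - 60}{46} \times 10 = 20 + \frac{20.5}{46} \times 10 = 20 + 4.46 = 24.46$$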
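Both median calculations can be sketched in Python. For grouped data, the sketch assumes classes are given as (lower limit, upper limit, frequency) in ascending order and applies Median = l + ((N/2 − cf) / f) × h.

```python
def median(values):
    """Middle value of the sorted data (mean of the two middle values if n is even)."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def grouped_median(classes):
    """Median = l + ((N/2 - cf) / f) * h for classes given as (low, high, frequency)."""
    N = sum(f for _, _, f in classes)
    cf = 0                             # cumulative frequency before the current class
    for low, high, f in classes:
        if cf + f >= N / 2:            # first class whose cumulative frequency reaches N/2
            return low + (N / 2 - cf) / f * (high - low)
        cf += f

marks = [17, 32, 35, 15, 21, 41, 32, 11, 10, 20, 27, 28, 30]
print(median(marks))                        # 27

workers = [(0, 10, 22), (10, 20, 38), (20, 30, 46), (30, 40, 35), (40, 50, 20)]
print(round(grouped_median(workers), 2))    # 24.46
```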
3.1.3. Mode
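Mode is another important measure of central tendency of a statistical series. It is the value that occurs most frequently in the data series. On a bar chart or histogram it is represented by the highest bar, so you can sometimes think of the mode as the most popular option. An example of a mode is presented below:
For grouped data, the mode is calculated as
$$\text{Mode} = l + h \times \frac{f_1 - f_0}{2f_1 - f_0 - f_2}$$
where l = lower limit of the modal class, h = class width, f1 = frequency of the modal class, f0 = frequency corresponding to the pre-modal class, and f2 = frequency corresponding to the post-modal class.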
Example 2: Calculate the mode for the following data:
Class interval    10-20    20-30    30-40    40-50    50-60
Frequency         3        10       15       10       2
Answer: As the frequency for class 30-40 is the maximum, this class is the modal class. Classes 20-30 and 40-50 are the pre-modal and post-modal classes respectively. The mode is:
$$\text{Mode} = 30 + 10 \times \frac{15 - 10}{2 \times 15 - 10 - 10} = 30 + 5 = 35$$
There are two methods for calculating the mode in a discrete frequency series:
(i) Inspection method: pick the value with the highest frequency, as in the above example.
(ii) Grouping method: more than one value may have the highest frequency in the series. In such cases, the grouping method of calculation is used.
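The grouped-data formula and the inspection method can be sketched as follows. Two assumptions are worth stating: if the modal class is the first or last class, the sketch treats the missing neighbouring frequency as 0, and the formula fails (division by zero) when 2f1 − f0 − f2 = 0, one situation where the grouping method is needed instead. The ungrouped values in the inspection example are made up.

```python
from collections import Counter

def grouped_mode(classes):
    """Mode = l + h * (f1 - f0) / (2*f1 - f0 - f2) for classes given as (low, high, frequency)."""
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))                      # modal class = highest frequency
    low, high, f1 = classes[i]
    f0 = freqs[i - 1] if i > 0 else 0                # pre-modal class frequency (assumed 0 if none)
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0   # post-modal class frequency (assumed 0 if none)
    return low + (high - low) * (f1 - f0) / (2 * f1 - f0 - f2)

data = [(10, 20, 3), (20, 30, 10), (30, 40, 15), (40, 50, 10), (50, 60, 2)]
print(grouped_mode(data))   # 35.0

# Inspection method for ungrouped values: list every value with the top frequency
counts = Counter([3, 5, 5, 7, 8, 8, 8])
top = max(counts.values())
modes = [value for value, count in counts.items() if count == top]
print(modes)   # [8]
```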
Mean: The mean is a good measure of central tendency when a data set contains values that are relatively evenly spread, with no exceptionally high or low values.
Median: The median is a good measure of the central value when the data include exceptionally high or low values. The median is the most suitable measure of average for data classified on an ordinal scale.
Mode: The mode is used when you need to find the peak of a distribution, and there may be many peaks. For example, it is important to print more copies of the most popular books, because printing different books in equal numbers would cause a shortage of some books and an oversupply of others.
3.2. Variance and Standard Deviation
Measures of central tendency (mean, median and mode) provide the central value of a data set. Measures of dispersion (such as ranges, quartiles and percentiles) provide information on how the data are spread around the centre.
In this section we will look at two more measures of dispersion: variance and standard deviation.
Let us understand these two using a diagram:
Let us measure the height (at the shoulder) of 5 dogs (in millimetres)
As you can see, their heights are: 600mm, 470mm, 170mm, 430mm and 300mm.
Let us calculate their mean,
Mean = (600 + 470 + 170 + 430 + 300) / 5
= 1970 / 5
= 394 mm
Now let us plot the heights again with the mean height marked (the green line).
Now, let us find the deviation of each dog's height from the mean height.
Calculate each difference from the mean height, square it, and then find the average of the squares. This average is the value of the variance.
$$\text{Variance} = \frac{206^2 + 76^2 + (-224)^2 + 36^2 + (-94)^2}{5} = \frac{108520}{5} = 21704$$
And standard deviation is the square root of the variance.
$$\text{Standard deviation} = \sqrt{21704} \approx 147.32 \text{ mm}$$
The example above should give you a clear idea of variance and standard deviation.
To summarize, variance is the average of the squared differences between each number and the mean.
To calculate the variance, first calculate the deviation of each data point from the mean, square each result, and then take the average of the squares.
Say there is a data set: 2, 4, 4, 4, 5, 5, 7, 9
Calculate the variance:
Find the mean first:
$$\text{Mean} = \frac{2 + 4 + 4 + 4 + 5 + 5 + 7 + 9}{8} = \frac{40}{8} = 5$$
Then the sum of squares of the differences between each number and the mean is
$$(2-5)^2 + (4-5)^2 + (4-5)^2 + (4-5)^2 + (5-5)^2 + (5-5)^2 + (7-5)^2 + (9-5)^2$$
$$= 9 + 1 + 1 + 1 + 0 + 0 + 4 + 16 = 32$$
So the variance is 32 / 8 = 4, and the standard deviation is √4 = 2.
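The procedure (find the mean, square each deviation, average the squares, take the square root) can be sketched as follows. This is the population variance, which divides by the number of values as the notes do; the sample variance used in some texts divides by n − 1 instead. Python's `statistics.pvariance` and `statistics.pstdev` compute the same population measures.

```python
import math

def variance(values):
    """Population variance: average of the squared differences from the mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)

def std_dev(values):
    """Standard deviation: square root of the variance."""
    return math.sqrt(variance(values))

dogs_mm = [600, 470, 170, 430, 300]
print(variance(dogs_mm), round(std_dev(dogs_mm), 2))   # 21704.0 147.32

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data), std_dev(data))                   # 4.0 2.0
```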
This module introduces the purpose, importance and various methods of representing data using graphs. Statistics is the science of data, so in statistics and Artificial Intelligence we deal with large volumes of data. As the volume of data grows, an efficient and convenient technique for representing it is needed. The human brain finds large and complex quantities easier to deal with when they are presented visually, and that is how the need arises for the graphical representation of data.
The important topics that we are going to cover in this module are:
3.1. Why do we need to represent data graphically?
3.2. What is a Graph?
3.3. Types of Graphs
3.1 Why do we need to represent data graphically?
There are various reasons for representing data on graphs; a few of them are outlined below:
The purpose of a graph is to present data that are huge in volume or too complicated to describe in text or tables.
Graphs not only represent the data but also reveal relations between variables and show the trends in data sets.
Graphical representation helps us in analysing the data.
3.2. What is a Graph?
A graph is a chart or diagram through which data are represented in the form of lines or curves drawn through plotted coordinate points; it shows the relation between variable quantities.
There are some algebraic and coordinate geometry principles that apply when drawing graphs of any kind.
Graphs have two axes: the vertical one is called the Y-axis and the horizontal one is called the X-axis. The X and Y axes are perpendicular to each other, and the point where they intersect is called the origin (0). On the X-axis, distances to the right of the origin have positive values (see fig. 7.1) and distances to the left of the origin have negative values. On the Y-axis, distances above the origin have positive values and distances below the origin have negative values.
3.3. Types of Graphs
3.3.1 Bar Graphs
As per Wikipedia, "A bar chart or bar graph is a chart or graph that presents categorical data with rectangular bars with heights or lengths proportional to the values that they represent."
It is a really good way to show the relative sizes of different variables.
There are many characteristics of bar graphs that make them useful. Some of these are that:
They make comparisons between different variables very easy to see.
They clearly show trends in data, meaning that they show how one variable is affected
as the other rises or falls.
Given one variable, the value of the other can be easily determined.
Example 1
The percentage of total income spent under various heads by a family is given below.
Different heads      Food   Clothing   Health   Education   House Rent   Miscellaneous
% of total income    40%    10%        10%      15%         20%          5%
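Without a plotting library, the idea behind a bar graph (bar length proportional to the value) can be sketched with text. The mapping of heads to percentages follows the table in Example 1; drawing one `#` per 2 percentage points is an arbitrary scale chosen for illustration.

```python
# Percentage of total income spent under each head (Example 1)
spending = {
    "Food": 40,
    "Clothing": 10,
    "Health": 10,
    "Education": 15,
    "House Rent": 20,
    "Miscellaneous": 5,
}

width = max(len(head) for head in spending)   # pad labels so the bars line up
lines = []
for head, pct in spending.items():
    bar = "#" * (pct // 2)                    # one '#' per 2 percentage points, rounded down
    lines.append(f"{head:<{width}} | {bar} {pct}%")

print("\n".join(lines))
```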
3.3.2 Histogram
A histogram is drawn on a natural scale, with the frequencies of the different classes of values represented by vertical rectangles drawn adjacent to each other.
The mode, a measure of central tendency, can be easily determined with the help of this graph.
A histogram is easy to draw and simple to understand, but it has one limitation: we cannot plot more than one data distribution on the same axes as a histogram.
Example 1
Below is the waiting time of customers at the cash counter of a bank branch during peak hours. You are required to create a histogram based on the data below.
3.3.3. Scatter Plot
A scatter plot is a way of representing data on a graph that is similar to a line graph. A line graph uses a line on X-Y axes, while a scatter plot uses dots to represent individual pieces of data. In statistics, these plots are useful to see whether two variables are related to each other. For example, a scatter chart can suggest a linear relationship (i.e. a straight line).
There is no line; instead, dots represent the values of the variables on the graph.
The scatter plot is one of the most frequently used data plotting techniques in machine learning.
When should we use a scatter plot?
It is used to observe the relationship between two numeric variables. The dots on the plot denote not only the values of the variables but also patterns, when the data are taken as a whole.
A scatter plot is a useful tool for studying correlation. Relationships between variables can be described in many ways: positive or negative, strong or weak, linear or nonlinear.
4. Introduction to Dimensionality of Data
4.1. Data Dimensionality
Dimensionality in statistics refers to how many attributes a dataset has. Below is a sample students dataset with four attributes (columns), so this student dataset is 4-dimensional.
Students Dataset
So if you have a dataset with n observations (rows) and m columns (features), your data is m-dimensional.
Each dimension of a dataset can change without forcing a change in another dimension. For example, we can change the age of a student without changing their class or address.
The combination of these three colours (each a number from 0 to 255) ultimately decides the colour, hence we say that colour space is three-dimensional: there are three "directions" in which a colour can vary.
[Figure: graph divided into Quadrants I to IV]
Please look at the above diagram of the graph and try to reason out why
i) (6, 4) is in the first quadrant
ii) (-6, 4) is in the second quadrant
iii) (-6, -4) is in the third quadrant
iv) (6, -4) is in the fourth quadrant
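The reasoning comes down to the signs of the two coordinates. A minimal sketch (the function name is illustrative):

```python
def quadrant(x, y):
    """Return the quadrant (I to IV) of point (x, y), or None if it lies on an axis."""
    if x > 0 and y > 0:
        return "I"
    if x < 0 and y > 0:
        return "II"
    if x < 0 and y < 0:
        return "III"
    if x > 0 and y < 0:
        return "IV"
    return None   # on the x-axis, the y-axis, or at the origin

for point in [(6, 4), (-6, 4), (-6, -4), (6, -4)]:
    print(point, quadrant(*point))
```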
4.3. Multi-Dimensional Data and Graph
Use Case 1
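Let us assume a data set of 1 dimension:
Students Dataset
Age
16
15
14
Use Case 2
Let us take 2-dimensional data:
Students Dataset
Age    Maths Marks
16     91
15     85
14     93
To locate a point on a flat surface such as a graph, we need to know:
left-right, and
up-down
So any position needs two numbers.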
Use Case 3
Let us take 3-Dimensional data
Students Dataset
Age Maths Marks Science Marks
16 91 92
15 85 90
14 93 72
How do we locate a spot in the real world (such as the tip of your nose)? We need to know:
left-right,
up-down, and
forward-backward
that is three numbers, or 3 dimensions!
What kind of situations are these? College admission (one variable) depends on another variable, i.e. the 12th score. The number of sales (one variable) depends on another variable, i.e. the product price.
In all these situations there are two variables: one is the input variable (12th score, product price, etc.) and the other is the outcome (college admission, sales, farming, etc.).
We know that these two variables, the input and the outcome, are related, but the equation of the relation is unknown.
Example 1
The general form of a linear equation is:
Ax + By = C
Suppose the cab fare is revised so that the new fare = fixed amount (x) + twice the number of km travelled.
For a journey of y km, cab fare = x + 2y
When the fare is plotted against the distance travelled, the graph is a straight line with slope 2: the fare rises by 2 for every extra km.
Therefore, if the data points change, the slope of the linear equation also changes.
Note that a linear equation changes its path only when the relationship between its variables changes.
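The cab fare rule can be sketched as a function of distance. The notes leave the fixed amount as x, so the value 50 below is a hypothetical choice; whatever it is, the fare rises by the same amount (2) for every extra km, which is what makes the graph a straight line.

```python
FIXED_AMOUNT = 50   # hypothetical value for the fixed amount x
RATE_PER_KM = 2     # "twice the number of km travelled"

def cab_fare(km):
    """New fare = fixed amount + 2 * (km travelled)."""
    return FIXED_AMOUNT + RATE_PER_KM * km

fares = [cab_fare(km) for km in range(6)]   # 0 to 5 km
print(fares)   # [50, 52, 54, 56, 58, 60]

# The fare rises by the same amount for every extra km
steps = {later - earlier for earlier, later in zip(fares, fares[1:])}
print(steps)   # {2}
```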
Example 2
When we collect data, sometimes there are values that are "far away" from the main group of data. How does such a "far away" value (called an outlier) impact the equation, and what do we do with it?
Below is the Delhi daily temperature data, recorded for a week:
Temperature recorded (degrees C): 1st week of June
Day      1st   2nd   3rd   4th   5th   6th   7th
Temp.    42    44    47    30    40    43    46
On 4th June it rained in Delhi, and therefore the temperature dipped.
Mean (with outlier) = 292 / 7 ≈ 41.71
Now, let us take the outlier out and calculate the mean again:
Mean (without outlier) = 262 / 6 ≈ 43.67
Even though the data range is very small, we notice a visible difference in the mean. When we remove outliers we are changing the data, so it is no longer "pure". We should not get rid of outliers without a good reason, and when we do, we should explain what we are doing and why.
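The effect of the outlier on the mean can be checked with a few lines of Python, using the temperatures from the table:

```python
temps = [42, 44, 47, 30, 40, 43, 46]   # 1st to 7th June, degrees C

with_outlier = sum(temps) / len(temps)

without = [t for t in temps if t != 30]   # drop the rainy-day reading of 4th June
without_outlier = sum(without) / len(without)

print(round(with_outlier, 2))      # 41.71
print(round(without_outlier, 2))   # 43.67
```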
Example 1:
Let us collect data on how many hours of sunshine there were and how many ice creams were sold at a shop from Monday to Friday:
x (hours of sunshine)    y (ice creams sold)
2                        4
3                        5
5                        7
7                        10
9                        15
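Step 1: For each (x, y) point, calculate x² and xy.
Step 2: Sum all x, y, x² and xy, which gives Σx, Σy, Σx² and Σxy (N = number of points = 5).
x    y     x²    xy
2    4     4     8
3    5     9     15
5    7     25    35
7    10    49    70
9    15    81    135
Σx = 26    Σy = 41    Σx² = 168    Σxy = 263
Step 3: Calculate the slope m:
$$m = \frac{N \sum xy - \sum x \sum y}{N \sum x^2 - (\sum x)^2} = \frac{5 \times 263 - 26 \times 41}{5 \times 168 - 26^2} = \frac{1315 - 1066}{840 - 676} = \frac{249}{164} = 1.5183\ldots$$
Step 4: Calculate the intercept c:
$$c = \frac{\sum y - m \sum x}{N} = \frac{41 - 1.5183\ldots \times 26}{5} = 0.3049\ldots$$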
Step 5: Assemble the equation of a line:
y = mx + c
y = 1.518 x + 0.305
Here are the (x,y) points and the line y = 1.518x + 0.305 on a graph:
If the weather forecast says "we expect 8 hours of sun tomorrow", you can use the above equation to estimate how many ice creams you will sell:
y = 1.518 × 8 + 0.305 = 12.45 ice creams
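The least squares calculation can be sketched as a function of the sums Σx, Σy, Σx² and Σxy. Running it on the sunshine data reproduces the slope and intercept above and the forecast for 8 hours of sun; the function name is illustrative.

```python
def least_squares(xs, ys):
    """Fit y = m*x + c by least squares from the sums of x, y, x^2 and xy."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    m = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    c = (sy - m * sx) / n
    return m, c

sunshine = [2, 3, 5, 7, 9]
ice_creams = [4, 5, 7, 10, 15]
m, c = least_squares(sunshine, ice_creams)
print(round(m, 4), round(c, 4))   # 1.5183 0.3049
print(round(m * 8 + c, 2))        # 12.45 (forecast of 8 hours of sun)
```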