Engineering Mathematics II

Mr. NIYIGABA Emmanuel, Assistant Lecturer (IPRC MUSANZE)
Engineering Mathematics II (MAT122)

Electrical, Level one, Semester 2
March 20, 2020
Contents
1 Matrices and determinant
1.1 Types of matrices
1.1.1 Some special types of square matrices
1.1.2 Operation with matrices
1.1.3 Equality
1.1.4 Addition
1.1.5 Subtraction
1.1.6 Scalar Multiplication
1.1.7 Matrix Multiplication
1.1.8 Transpose of a matrix
1.1.9 Inverse of a Matrix
1.1.10 Properties of transpose and inverse of matrix
1.2 Determinant of a matrix
1.2.1 Determinant of order two
1.2.2 Determinant of order three
1.2.3 Properties of determinant
1.2.4 Calculation of the Inverse of matrix
1.2.5 Minor and cofactors
1.2.6 Inverse of a square matrix
1.3 System of linear equations
1.4 Diagonalization
1.5 Eigen-system and Characteristic Polynomial of a Square Matrix
1.6 Exercises
2 Fourier Series
2.1 Theory
2.2 Exercises
2.3 Some useful integrals
2.4 Useful trigonometric results
2.5 Fourier series with period L ≠ 2π
2.6 Full worked Solutions
3 Laplace Transform
3.1 Important Formulae
3.2 Properties of Laplace Transform
4 Application of Laplace transform to electricity
4.1 Introduction
4.2 Some formulae
4.3 Illustrations
4.3.1 Example One
4.3.2 Example Two
5 Appendices
5.1 Exercises
5.2 Some questions
5.3 Solutions
6 Differential Equation
6.1 Some definitions
6.2 Formulation of Differential equation
6.3 Solution of a Differential Equation
6.3.1 First order DE by Separation of Variables Method
6.3.2 Homogeneous Differential Equations
6.3.3 Linear Differential Equations
6.3.4 Bernoulli's Equation
6.3.5 Exact Differential Equation
6.4 Linear Second order Differential Equation with Constant Coefficients
6.4.1 Solution of homogeneous second order DE with Constant Coefficients
6.4.2 Solution of non-homogeneous second order DE with Constant Coefficients
7 Introduction to PROBABILITY and STATISTICS
7.1 Introduction
7.2 Classical Definition of Probability
7.3 Laws of Probability
7.4 Multiplication Laws of Probability
7.5 Conditional probability
7.6 Bayes's Theorem
7.7 Binomial Probability
8 Probability distribution
8.1 Introduction
8.2 Discrete Random Variables
8.3 Probability Density Function
8.4 Cumulative Distribution Function
8.5 Expectation and Variance
8.5.1 Expected Value of a Function of X
8.5.2 Variance
8.6 The Binomial Distribution
8.6.1 Expectation and Variance
8.6.2 The Poisson's Distribution
8.6.3 Binomial Approximation
8.6.4 Derivation of the Poisson's distribution
8.6.5 Some mathematics
9 Introduction to Statistics
9.1 Arrangement of data
9.2 Types of Variable
9.2.1 Dependent and Independent Variables
9.3 Experimental and Non-Experimental Research
9.3.1 Categorical Variables
9.3.2 Continuous variables
10 Descriptive Statistics
10.1 Measures of Central Tendency
10.1.1 Introduction
10.1.2 Mean (Arithmetic)
10.1.3 Median
10.1.4 Mode
10.2 Measures of Dispersion
10.2.1 Average Deviation
10.2.2 Variance
10.2.3 Standard Deviation
10.3 Five number summary and box plot
10.3.1 Five number summary
10.3.2 Box plot
10.3.3 Procedure for constructing a box plot
10.3.4 Information Obtained from a Box plot
10.3.5 Range and interquartile range
10.4 Graphical Representation of the Statistical Data
10.5 FREQUENCY DISTRIBUTIONS
10.5.1 GROUPED DATA: Tabular presentation of data
10.5.2 Some terminologies
10.5.3 Relative Frequency Distributions
10.5.4 General Rules for Organizing Data into Groups
10.6 Normal distribution curve
10.7 Exercises
1 Matrices and determinant

Definition: A matrix is a rectangular array of elements of the field over which it is defined.
The elements of a matrix are called entries. A matrix is made up of rows and columns, and each
entry is identified by the row and the column in which it is located.
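Examples:
\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 4 & 6 \\ 7 & 8 & 9 \end{pmatrix}, \tag{1} \]
\[ B = \begin{pmatrix} 1 & 2 & 3 \end{pmatrix}, \tag{2} \]
\[ C = \begin{pmatrix} 1 \\ 4 \\ 7 \end{pmatrix}, \tag{3} \]
\[ D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 9 \end{pmatrix}, \tag{4} \]
\[ E = \begin{pmatrix} 1 & 2 & 3 & 0 \\ 4 & 4 & 6 & 10 \\ 7 & 8 & 9 & 23 \end{pmatrix}, \tag{5} \]
\[ UT = \begin{pmatrix} 1 & 0 & 9 \\ 0 & 4 & 7 \\ 0 & 0 & 9 \end{pmatrix}, \tag{6} \]
\[ LT = \begin{pmatrix} 1 & 0 & 0 \\ 78 & 4 & 0 \\ 9 & 6 & 9 \end{pmatrix} \quad \text{and} \tag{7} \]
\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 4 & 6 \\ 7 & 8 & 9 \\ 4 & 4 & 6 \end{pmatrix} \tag{8} \]
are some examples of matrices. In general, we may represent a matrix as follows:
\[ A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{pmatrix}. \tag{9} \]
The element $a_{ij}$ is located in the $i$th row and the $j$th column; for example, $a_{32}$ is the entry
in the 3rd row and the 2nd column. The order of a matrix is determined by its number of rows and
its number of columns: the matrix in Equation (9) has order $m \times n$.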
1.1 Types of matrices

There are two main categories of matrices:
• Square matrices: the number of rows is the same as the number of columns, for
example the matrix in Equation (1).
• Rectangular matrices: the number of rows is different from the number of columns,
for example the matrices in Equations (5) and (8).
We may also distinguish row matrices and column matrices: a matrix made up of a single row
is called a row matrix and a matrix made up of a single column is called a column matrix.
The matrices given by Equations (2) and (3) are examples of a row matrix and a column matrix
respectively.
1.1.1 Some special types of square matrices

There are different types of square matrices; let us describe some of them here:
• Diagonal matrix: all entries above and below the leading diagonal are zeros. The
matrix given by Equation (4) is an example of a diagonal matrix.
• Lower triangular matrix: all entries above the leading diagonal are zeros. The
matrix given by Equation (7) is an example of a lower triangular matrix.
• Upper triangular matrix: all entries below the leading diagonal are zeros. The
matrix given by Equation (6) is an example of an upper triangular matrix.
• Identity matrix: all entries are zeros except the entries on the leading diagonal, which
are all one (1). Example:
\[ I = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \]
1.1.2 Operation with matrices

1.1.3 Equality
Two matrices are equal if and only if they have the same order and their corresponding
elements are the same. Example: Find the values of x, y and z if the matrices A and B are equal.
\[ A = \begin{pmatrix} x-1 & 0 & 0 & 9 \\ 0 & 1 & z & 0 \\ 15 & 0 & 1 & 0 \\ 0 & 52 & 0 & y \end{pmatrix} \quad \text{and} \tag{10} \]
\[ B = \begin{pmatrix} 5 & 0 & 0 & 9 \\ 0 & 1 & 8 & 0 \\ 15 & 0 & 1 & 0 \\ 0 & 52 & 0 & 20 \end{pmatrix}. \tag{11} \]
Solution: Since A and B have the same order $4 \times 4$, we equate corresponding entries:
$x - 1 = 5$, so $x = 6$; $z = 8$ and $y = 20$.
1.1.4 Addition
The orders of the matrices must be the same. Add corresponding elements together. Note that
matrix addition is commutative and associative, i.e.
A + B = B + A (commutativity) and
(A + B) + C = A + (B + C) (associativity).
1.1.5 Subtraction
The orders of the matrices must be the same; then subtract corresponding elements. Note that
matrix subtraction is neither commutative nor associative (just like subtraction of real
numbers).
1.1.6 Scalar Multiplication

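A scalar is a number, not a matrix. The matrix can be of any order. Multiply every element of the
matrix by the scalar. Note that scalar multiplication is commutative and associative.
Examples: Let A and B be two matrices; calculate the following:
1. A + B,
2. A − B and
3. A + 3B,
where
\[ A = \begin{pmatrix} 2 & 3 & 4 \\ 0 & 2 & 6 \\ 6 & 3 & 8 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 1 & 3 & 1 \\ 0 & 2 & 6 \\ 6 & 3 & 1 \end{pmatrix}. \]
Solution:
\[ A + B = \begin{pmatrix} 3 & 6 & 5 \\ 0 & 4 & 12 \\ 12 & 6 & 9 \end{pmatrix}, \]
\[ A - B = \begin{pmatrix} 1 & 0 & 3 \\ 0 & 0 & 0 \\ 0 & 0 & 7 \end{pmatrix}, \]
\[ A + 3B = \begin{pmatrix} 5 & 12 & 7 \\ 0 & 8 & 24 \\ 24 & 12 & 11 \end{pmatrix}. \tag{12} \]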
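The three results above can be reproduced with a few lines of Python. This is only a minimal sketch: storing a matrix as a list of rows and the helper names `mat_add`, `scalar_mul` and `mat_sub` are choices made for this illustration, not part of the course material.

```python
# Matrices are stored as lists of rows (a representation chosen for this sketch).
A = [[2, 3, 4], [0, 2, 6], [6, 3, 8]]
B = [[1, 3, 1], [0, 2, 6], [6, 3, 1]]


def mat_add(X, Y):
    """Entry-wise sum; both matrices must have the same order."""
    if len(X) != len(Y) or len(X[0]) != len(Y[0]):
        raise ValueError("matrices must have the same order")
    return [[x + y for x, y in zip(row_x, row_y)] for row_x, row_y in zip(X, Y)]


def scalar_mul(c, X):
    """Multiply every entry of X by the scalar c."""
    return [[c * x for x in row] for row in X]


def mat_sub(X, Y):
    """X - Y, computed as X + (-1)Y."""
    return mat_add(X, scalar_mul(-1, Y))


sum_ab = mat_add(A, B)
diff_ab = mat_sub(A, B)
a_plus_3b = mat_add(A, scalar_mul(3, B))

print(sum_ab)
print(diff_ab)
print(a_plus_3b)
```

Writing subtraction as addition of (−1)B keeps the order check in one place.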
Properties of matrix operations (A, B and C are matrices of compatible orders, c and d are
scalars, O is the zero matrix and I is the identity matrix):
1. Commutativity of Addition: A + B = B + A,
2. Associativity of Addition: A + (B + C) = (A + B) + C,
3. Associativity of Scalar Multiplication: (cd)A = c(dA),
4. Scalar Identity: 1A = A(1) = A,
5. Distributive: c(A + B) = cA + cB,
6. Distributive: (c + d)A = cA + dA,
7. Additive Identity: A + O = O + A = A,
8. Associativity of Multiplication: A(BC) = (AB)C,
9. Left Distributive: A(B + C) = AB + AC,
10. Right Distributive: (A + B)C = AC + BC,
11. Scalar Associativity and Commutativity: c(AB) = (cA)B = A(cB) = (AB)c,
12. Multiplicative Identity: IA = AI = A.
1.1.7 Matrix Multiplication

The number of columns in the first matrix must be equal to the number of rows in the second
matrix. That is, the inner dimensions must be the same.
\[ A_{m \times n} \times B_{n \times p} = C_{m \times p}. \]
The order of the product is the number of rows in the first matrix by the number of columns
in the second matrix. That is, the dimensions of the product are the outer dimensions. Since
the number of columns in the first matrix is equal to the number of rows in the second matrix,
you can pair up entries. Each element in row i from the first matrix is paired up with an
element in column j from the second matrix. The element in row i, column j, of the product
is formed by multiplying these paired elements and summing them. Each element in the prod-
uct is the sum of the products of the elements from row i of the first matrix and column j of
the second matrix. There will be n products which are summed for each element in the product.
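Examples: Let A and B be two matrices; calculate the following:

1. AB,
2. $A^2$,
where
\[ A = \begin{pmatrix} 2 & 3 & 4 \\ 0 & 2 & 6 \\ 6 & 3 & 8 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 1 & 3 & 1 \\ 0 & 2 & 6 \\ 6 & 3 & 1 \end{pmatrix}. \]
Solution:
1.
\[ AB = \begin{pmatrix} 2(1) + 3(0) + 4(6) & 2(3) + 3(2) + 4(3) & 2(1) + 3(6) + 4(1) \\ 0(1) + 2(0) + 6(6) & 0(3) + 2(2) + 6(3) & 0(1) + 2(6) + 6(1) \\ 6(1) + 3(0) + 8(6) & 6(3) + 3(2) + 8(3) & 6(1) + 3(6) + 8(1) \end{pmatrix} = \begin{pmatrix} 26 & 24 & 24 \\ 36 & 22 & 18 \\ 54 & 48 & 32 \end{pmatrix} \]
2.
\[ A^2 = \begin{pmatrix} 2 & 3 & 4 \\ 0 & 2 & 6 \\ 6 & 3 & 8 \end{pmatrix} \begin{pmatrix} 2 & 3 & 4 \\ 0 & 2 & 6 \\ 6 & 3 & 8 \end{pmatrix} = \begin{pmatrix} 28 & 24 & 58 \\ 36 & 22 & 60 \\ 60 & 48 & 106 \end{pmatrix} \]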
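The row-by-column rule can be sketched in Python as follows. The function name `mat_mul` and the list-of-rows representation are assumptions of this illustration; the matrices are the ones from the example.

```python
def mat_mul(X, Y):
    """Product of an m x n matrix X and an n x p matrix Y (lists of rows)."""
    if len(X[0]) != len(Y):
        raise ValueError("columns of the first matrix must equal rows of the second")
    n = len(Y)
    # Entry (i, j) is the sum of the n products X[i][k] * Y[k][j].
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(len(Y[0]))]
            for i in range(len(X))]


A = [[2, 3, 4], [0, 2, 6], [6, 3, 8]]
B = [[1, 3, 1], [0, 2, 6], [6, 3, 1]]

AB = mat_mul(A, B)
A_squared = mat_mul(A, A)
BA = mat_mul(B, A)

print(AB)
print(A_squared)
print(AB == BA)  # matrix multiplication is not commutative in general
```

For these two matrices AB and BA differ, which is why the left and right distributive laws are stated separately.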
1.1.8 Transpose of a matrix

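The transpose of a matrix is the matrix formed by turning all the rows of the given matrix into
columns and vice versa. The transpose of a matrix A is written $A^T$. For example, the transpose of
the matrix
\[ A = \begin{pmatrix} 1 & 3 & 1 \\ 0 & 2 & 6 \\ 6 & 3 & 1 \end{pmatrix} \quad \text{is} \quad A^T = \begin{pmatrix} 1 & 0 & 6 \\ 3 & 2 & 3 \\ 1 & 6 & 1 \end{pmatrix}. \]
The transpose of
\[ B = \begin{pmatrix} 2 & 0 & 6 \\ 3 & 2 & 3 \\ 4 & 6 & 8 \end{pmatrix} \quad \text{is} \quad B^T = \begin{pmatrix} 2 & 3 & 4 \\ 0 & 2 & 6 \\ 6 & 3 & 8 \end{pmatrix}. \]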
1.1.9 Inverse of a Matrix

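For a square matrix A the inverse is written $A^{-1}$. When A is multiplied by $A^{-1}$ the result is
the identity matrix I. Non-square matrices do not have inverses. Note: Not all square matrices
have inverses. A square matrix which has an inverse is called invertible or non-singular, and a
square matrix without an inverse is called non-invertible or singular. For a matrix A, its inverse
is the matrix $A^{-1}$ such that $AA^{-1} = A^{-1}A = I$.
For example,
\[ \begin{pmatrix} 1 & 3 & 1 \\ 0 & 2 & 6 \\ 6 & 3 & 1 \end{pmatrix} \times \begin{pmatrix} -\frac{1}{5} & 0 & \frac{1}{5} \\ \frac{9}{20} & -\frac{1}{16} & -\frac{3}{40} \\ -\frac{3}{20} & \frac{3}{16} & \frac{1}{40} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \]
and
\[ \begin{pmatrix} -\frac{1}{5} & 0 & \frac{1}{5} \\ \frac{9}{20} & -\frac{1}{16} & -\frac{3}{40} \\ -\frac{3}{20} & \frac{3}{16} & \frac{1}{40} \end{pmatrix} \times \begin{pmatrix} 1 & 3 & 1 \\ 0 & 2 & 6 \\ 6 & 3 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]
Thus the matrix
\[ \begin{pmatrix} 1 & 3 & 1 \\ 0 & 2 & 6 \\ 6 & 3 & 1 \end{pmatrix} \]
is an invertible matrix.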
1.1.10 Properties of transpose and inverse of matrix.

• $(A + B)^T = A^T + B^T$,
• $(AB)^T = B^T A^T$,
• $(AB)^{-1} = B^{-1} A^{-1}$,
• If $A^T = A$, then A is called a symmetric matrix,
• If $A^T = -A$, then A is called a skew-symmetric matrix,
• If $A^T A = I$, then A is called an orthogonal matrix (the real counterpart of a unitary matrix).
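The transpose rules can be checked on the matrices A and B of Section 1.1.8. The sketch below is illustrative only; the helper names `transpose`, `mat_add`, `mat_mul` and `is_symmetric` are assumptions of this example.

```python
def transpose(M):
    """Turn the rows of M into columns."""
    return [list(col) for col in zip(*M)]


def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def is_symmetric(M):
    return transpose(M) == M


A = [[1, 3, 1], [0, 2, 6], [6, 3, 1]]
B = [[2, 0, 6], [3, 2, 3], [4, 6, 8]]

# (AB)^T = B^T A^T: note the reversed order on the right-hand side.
lhs = transpose(mat_mul(A, B))
rhs = mat_mul(transpose(B), transpose(A))
print(lhs == rhs)

# A + A^T is symmetric for any square matrix A.
S = mat_add(A, transpose(A))
print(is_symmetric(S))
```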
1.2 Determinant of a matrix

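The determinant of a square matrix A is a special scalar assigned to the matrix, denoted by
det(A) or |A|.
1.2.1 Determinant of order two

\[ \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{21}a_{12}. \]
Example: Evaluate the following determinant.
\[ \begin{vmatrix} 10 & 10 \\ 2 & -6 \end{vmatrix} \]
Solution:
\[ \begin{vmatrix} 10 & 10 \\ 2 & -6 \end{vmatrix} = 10(-6) - 2(10) = -80. \]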
1.2.2 Determinant of order three

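\[ \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{32}a_{21} - (a_{31}a_{22}a_{13} + a_{21}a_{12}a_{33} + a_{11}a_{23}a_{32}). \]
Example: Evaluate the following determinant.
\[ \begin{vmatrix} 10 & 10 & -10 \\ 2 & -6 & 12 \\ -4 & -12 & 6 \end{vmatrix} \]
Solution: Expanding along the first row,
\[ \begin{vmatrix} 10 & 10 & -10 \\ 2 & -6 & 12 \\ -4 & -12 & 6 \end{vmatrix} = 10 \begin{vmatrix} -6 & 12 \\ -12 & 6 \end{vmatrix} - 10 \begin{vmatrix} 2 & 12 \\ -4 & 6 \end{vmatrix} - 10 \begin{vmatrix} 2 & -6 \\ -4 & -12 \end{vmatrix} = 10(108) - 10(60) - 10(-48) = 960. \]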
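The expansion along the first row used in the solution generalises to any order. The following is a minimal recursive sketch; the names `submatrix` and `det` are hypothetical helpers, and this method is only practical for small matrices.

```python
def submatrix(M, i, j):
    """Matrix obtained from M by deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]


def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 0:
        return 1  # convention for the empty matrix, used to end the recursion
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det(submatrix(M, 0, j)) for j in range(len(M)))


order_two = det([[10, 10], [2, -6]])
order_three = det([[10, 10, -10], [2, -6, 12], [-4, -12, 6]])
print(order_two, order_three)
```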
1.2.3 Properties of determinant

1. If a matrix has a row or a column of zeros, then its determinant is zero.
2. If a matrix has two identical rows or two identical columns, then its determinant is zero.
3. If a matrix is triangular, then its determinant is the product of the entries on the leading
diagonal.
4. If two rows or two columns of a matrix are interchanged, then its determinant changes
sign.
5. If a row or a column of a matrix is multiplied by a constant, then the new determinant is
that constant times the original determinant.
6. If a multiple of a row or column of a matrix is added to another row or column, then
its determinant is not changed.
1.2.4 Calculation of the Inverse of matrix

We saw that the inverse of a given matrix A is the matrix $A^{-1}$ such that $AA^{-1} = A^{-1}A = I$.
Here we show a technique for finding this matrix $A^{-1}$.
1.2.5 Minor and cofactors

Consider a square matrix A. Let $M_{ij}$ be the determinant of the matrix obtained by deleting the
$i$th row and the $j$th column of A; $M_{ij}$ is called the minor of the entry $a_{ij}$. The term
$(-1)^{i+j} M_{ij}$ is called the cofactor of the entry $a_{ij}$. The matrix made up of the minors in
their corresponding positions is called the minor matrix, and the matrix made up of the cofactors in
their corresponding positions is called the cofactor matrix, Cof(A). The transpose of the
cofactor matrix is called the adjoint matrix of A, i.e. $\mathrm{Adj}(A) = [\mathrm{Cof}(A)]^T$.
1.2.6 Inverse of a square matrix

If the determinant of a square matrix A is different from zero, then we calculate the inverse of
A as follows:
\[ A^{-1} = \frac{1}{\det(A)} \mathrm{Adj}(A). \tag{13} \]
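Formula (13) can be turned into a short program. The sketch below applies it to the 3 × 3 matrix used in Section 1.1.9; the helper names are assumptions of this illustration, and exact fractions from Python's standard `fractions` module are used so that the entries can be compared with the hand calculation.

```python
from fractions import Fraction


def submatrix(M, i, j):
    """Delete row i and column j (0-based) from M."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]


def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 0:
        return 1
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det(submatrix(M, 0, j)) for j in range(len(M)))


def cofactor_matrix(M):
    n = len(M)
    return [[(-1) ** (i + j) * det(submatrix(M, i, j)) for j in range(n)] for i in range(n)]


def adjoint(M):
    """Transpose of the cofactor matrix."""
    return [list(col) for col in zip(*cofactor_matrix(M))]


def inverse(M):
    d = det(M)
    if d == 0:
        raise ValueError("the matrix is singular and has no inverse")
    return [[Fraction(entry, d) for entry in row] for row in adjoint(M)]


A = [[1, 3, 1], [0, 2, 6], [6, 3, 1]]
A_inv = inverse(A)
for row in A_inv:
    print([str(entry) for entry in row])
```

For this matrix det(A) = 80, and the computed inverse agrees with the one quoted in Section 1.1.9.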
1.3 System of linear equations

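A system of m linear equations in n unknowns is written as follows:
\[ \begin{aligned} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n &= d_1, \\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + \cdots + a_{2n}x_n &= d_2, \\ &\;\;\vdots \\ a_{m1}x_1 + a_{m2}x_2 + a_{m3}x_3 + \cdots + a_{mn}x_n &= d_m. \end{aligned} \]
The system above can be written as a product of matrices as follows:
\[ \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} d_1 \\ d_2 \\ \vdots \\ d_m \end{pmatrix}. \tag{14} \]
The solution is the vector
\[ X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \tag{15} \]
which, when the number of equations equals the number of unknowns (m = n), can be given by
\[ X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{pmatrix}^{-1} \begin{pmatrix} d_1 \\ d_2 \\ \vdots \\ d_m \end{pmatrix} \tag{16} \]
if the coefficient matrix
\[ \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{pmatrix} \tag{17} \]
is invertible.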
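Example: Consider the following circuit diagram and find the current through each junction.
Solution:
The above circuit can be modelled as follows by using Kirchhoff's laws:
\[ \begin{aligned} 13I - 13I_1 - 11I_2 &= 54, \\ -2I + 16I_1 - 32I_2 &= 0, \\ -11I + 14I_1 + 46I_2 &= 0. \end{aligned} \]
This can be written as follows:
\[ \begin{pmatrix} 13 & -13 & -11 \\ -2 & 16 & -32 \\ -11 & 14 & 46 \end{pmatrix} \begin{pmatrix} I \\ I_1 \\ I_2 \end{pmatrix} = \begin{pmatrix} 54 \\ 0 \\ 0 \end{pmatrix}. \]
Thus
\[ \begin{pmatrix} I \\ I_1 \\ I_2 \end{pmatrix} = \begin{pmatrix} 13 & -13 & -11 \\ -2 & 16 & -32 \\ -11 & 14 & 46 \end{pmatrix}^{-1} \begin{pmatrix} 54 \\ 0 \\ 0 \end{pmatrix}, \qquad \begin{pmatrix} I \\ I_1 \\ I_2 \end{pmatrix} = \begin{pmatrix} 8 \\ 3 \\ 1 \end{pmatrix}. \]
Then it remains to calculate the current through each junction. Please try to do it!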
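The values I = 8, I1 = 3 and I2 = 1 can be confirmed numerically. The sketch below does not form the inverse matrix explicitly; it uses Gauss-Jordan elimination instead, which gives the same solution whenever the coefficient matrix is invertible. The function name `solve` is a hypothetical helper, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def solve(A, d):
    """Solve the square system A x = d by Gauss-Jordan elimination."""
    n = len(A)
    # Augmented matrix [A | d] with exact rational entries.
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, d)]
    for col in range(n):
        # Look for a row at or below the diagonal with a non-zero pivot.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular coefficient matrix: no unique solution")
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row so that the pivot becomes 1.
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        # Clear the pivot column in every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


coefficients = [[13, -13, -11], [-2, 16, -32], [-11, 14, 46]]
right_hand_side = [54, 0, 0]
currents = solve(coefficients, right_hand_side)
print([str(c) for c in currents])
```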
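Remark
If the coefficient matrix is square and its determinant is non-zero,
\[ \begin{vmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{vmatrix} \neq 0, \]
then we have a unique solution, and the system is said to be consistent. If this determinant is
zero, there are two cases to discuss: either there are infinitely many solutions or there is no
solution. This has been discussed in class.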
1.4 Diagonalization
Diagonalization is the process of transforming a square matrix A into a similar diagonal matrix
D. We say that two matrices A and D are similar if there exists an invertible matrix B such that
$D = B^{-1}AB$. To find this matrix B, we need the concepts of eigenvalues and eigenvectors of a
given square matrix.
1.5 Eigen-system and Characteristic Polynomial of a Square Matrix

Let A be a square matrix defined over the set of complex numbers. The relation $AX = \lambda X$, where
X is a column vector and $\lambda$ is a scalar, gives rise to the concepts of eigenvalues and eigenvectors.
Definition 1
If X is a non-zero vector, then such a scalar $\lambda$ in the relation above is called an eigenvalue of
A corresponding to X.
Definition 2
Such a non-zero vector X in the relation above is called an eigenvector of A corresponding to
the eigenvalue $\lambda$.
Definition 3
The collection of the eigenvalues together with their eigenvectors is called the eigen-system of A.
Definition 4
The determinant $p(\lambda) = \det(A - \lambda I)$ is called the characteristic polynomial of the matrix A.
Here I is the identity matrix with the same dimension as the matrix A.
Remark
1. The degree of the characteristic polynomial is exactly the order of the matrix A.
2. All eigenvalues of the matrix A are roots of the characteristic polynomial.
3. The geometric multiplicity of an eigenvalue $\lambda$ is the dimension of the eigenspace $E_\lambda$
corresponding to that eigenvalue.
4. The algebraic multiplicity of an eigenvalue is the number of times it is repeated as a root
of the characteristic polynomial.
5. Diagonalisation can be used to calculate powers of a matrix: if $D = P^{-1}AP$, then $A^n = P D^n P^{-1}$.
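Example
Consider the following matrix
\[ A = \begin{pmatrix} 1 & 0 & -1 \\ 1 & 2 & 1 \\ 2 & 2 & 3 \end{pmatrix}. \]
• Calculate the determinant of matrix A.
• Find all eigenvalues and eigenvectors of the matrix A.
• Find a matrix P which transforms the matrix A into a diagonal form.
• Hence find $A^4$, $A^2$ and $A^{-1}$.
Solution
• Calculate the determinant of matrix A.
\[ \begin{vmatrix} 1 & 0 & -1 \\ 1 & 2 & 1 \\ 2 & 2 & 3 \end{vmatrix} = 1(6 - 2) - 1(2 - 4) = 6. \]
• Find all eigenvalues and eigenvectors of the matrix A. The characteristic polynomial is,
up to sign, $(x - 3)(x - 2)(x - 1)$. The eigenvalues are the roots of the characteristic
polynomial, i.e. $(x - 3)(x - 2)(x - 1) = 0$ implies that $x = 3$, $x = 2$ and $x = 1$ are the
eigenvalues of the matrix A.
The corresponding eigenvectors are collected as the columns of the following matrix:
\[ \begin{pmatrix} 1 & 1 & 1 \\ -1 & -\frac{1}{2} & -1 \\ -2 & -1 & 0 \end{pmatrix}. \]
• Find a matrix P which transforms the matrix A into a diagonal form. The matrix
\[ P = \begin{pmatrix} 1 & 1 & 1 \\ -1 & -\frac{1}{2} & -1 \\ -2 & -1 & 0 \end{pmatrix} \quad \text{can transform A into} \quad D = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix} \]
such that $D = P^{-1}AP$.
• Hence find $A^4$, $A^2$ and $A^{-1}$. We have
\[ P^{-1} = \begin{pmatrix} -1 & -1 & -\frac{1}{2} \\ 2 & 2 & 0 \\ 0 & -1 & \frac{1}{2} \end{pmatrix}, \]
thus
\[ A^4 = P D^4 P^{-1} = \begin{pmatrix} -49 & -50 & -40 \\ 65 & 66 & 40 \\ 130 & 130 & 81 \end{pmatrix}, \]
\[ A^2 = P D^2 P^{-1} = \begin{pmatrix} -1 & -2 & -4 \\ 5 & 6 & 4 \\ 10 & 10 & 9 \end{pmatrix}, \]
\[ A^{-1} = P D^{-1} P^{-1} = \begin{pmatrix} \frac{2}{3} & -\frac{1}{3} & \frac{1}{3} \\ -\frac{1}{6} & \frac{5}{6} & -\frac{1}{3} \\ -\frac{1}{3} & -\frac{1}{3} & \frac{1}{3} \end{pmatrix}. \]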
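The diagonalisation in this example can be checked with exact arithmetic. The sketch below takes P, its inverse and D as given in the solution and recomputes the powers of A; the helper names `mat_mul`, `diag_power` and `power_of_A` are assumptions of this illustration.

```python
from fractions import Fraction as F


def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def diag_power(D, k):
    """k-th power of a diagonal matrix: raise each diagonal entry to the power k."""
    n = len(D)
    return [[D[i][i] ** k if i == j else F(0) for j in range(n)] for i in range(n)]


A = [[1, 0, -1], [1, 2, 1], [2, 2, 3]]
P = [[1, 1, 1], [-1, F(-1, 2), -1], [-2, -1, 0]]
P_inv = [[-1, -1, F(-1, 2)], [2, 2, 0], [0, -1, F(1, 2)]]
D = [[F(3), F(0), F(0)], [F(0), F(2), F(0)], [F(0), F(0), F(1)]]


def power_of_A(k):
    """A^k computed as P D^k P^{-1}; k may be negative because D is invertible."""
    return mat_mul(mat_mul(P, diag_power(D, k)), P_inv)


A4 = power_of_A(4)
A2 = power_of_A(2)
A_inv = power_of_A(-1)

print(mat_mul(mat_mul(P_inv, A), P) == D)  # checks D = P^{-1} A P
print([[str(x) for x in row] for row in A_inv])
```

Only the three diagonal entries of D are raised to a power, which is the practical benefit of diagonalising before computing high powers.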
1.6 Exercises
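1. Expand the following determinants:
(a) \[ \begin{vmatrix} 2 & -3 & 4 \\ 5 & 1 & -6 \\ -7 & 8 & -9 \end{vmatrix} \]
(b) \[ \begin{vmatrix} 5 & 0 & 7 \\ 8 & -6 & -4 \\ 2 & 3 & 9 \end{vmatrix} \]
2. Show the following equalities:
(a) \[ \begin{vmatrix} b - c & c - a & a - b \\ c - a & a - b & b - c \\ a - b & b - c & c - a \end{vmatrix} = 0 \]
(b) \[ \begin{vmatrix} x + y & x & x \\ 5x + 4y & 4x & 2x \\ 10x + 8y & 8x & 3x \end{vmatrix} = x^3 \]
(c) \[ \begin{vmatrix} x & y & z \\ x^2 & y^2 & z^2 \\ x^3 & y^3 & z^3 \end{vmatrix} = xyz(x - y)(y - z)(z - x) \]
(d) \[ \begin{vmatrix} a + x & y & z \\ x & a + y & z \\ x & y & a + z \end{vmatrix} = a^2(a + x + y + z) \]
3. Determine the values of a and b for which the system of linear equations below
\[ \begin{pmatrix} 3 & -2 & 1 \\ 5 & -8 & 9 \\ 2 & 1 & a \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} b \\ 3 \\ -1 \end{pmatrix} \]
(a) has a unique solution,
(b) has no solution,
(c) has infinitely many solutions.
4. Matrices A and B are such that:
\[ 3A - 2B = \begin{pmatrix} 2 & 1 \\ -2 & -1 \end{pmatrix}, \qquad -4A + B = \begin{pmatrix} -1 & 2 \\ -4 & 3 \end{pmatrix}. \]
Find A and B.
5. Given
\[ 3 \begin{pmatrix} x & y \\ z & w \end{pmatrix} = \begin{pmatrix} x & 6 \\ -1 & 2w \end{pmatrix} + \begin{pmatrix} 4 & x + y \\ z + w & 3 \end{pmatrix}, \]
find x, y, z and w.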
2 Fourier Series
2.1 Theory
2.2 Exercises
2.3 Some useful integrals

2.4 Useful trigonometric results

2.5 Fourier series with period L ≠ 2π

2.6 Full worked Solutions

3 Laplace Transform
Let $f(t)$ be a function defined for all positive values of t. Then
\[ F(s) = \int_0^\infty e^{-st} f(t)\,dt, \tag{18} \]
provided the integral exists, is called the Laplace transform of $f(t)$. It is denoted by
\[ \mathcal{L}[f(t)] = F(s) = \int_0^\infty e^{-st} f(t)\,dt. \]
3.1 Important Formulae
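\[ \mathcal{L}[1] = \frac{1}{s} \tag{19} \]
\[ \mathcal{L}[e^{at}] = \frac{1}{s - a} \tag{20} \]
\[ \mathcal{L}[\sin(at)] = \frac{a}{s^2 + a^2} \tag{21} \]
\[ \mathcal{L}[\cos(at)] = \frac{s}{s^2 + a^2} \tag{22} \]
\[ \mathcal{L}[\sinh(at)] = \frac{a}{s^2 - a^2} \tag{23} \]
\[ \mathcal{L}[\cosh(at)] = \frac{s}{s^2 - a^2} \tag{24} \]
\[ \mathcal{L}[t^n] = \frac{n!}{s^{n+1}} \tag{25} \]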
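The table can be spot-checked by evaluating the defining integral (18) numerically. The sketch below is only an illustration: it truncates the infinite integral at an assumed upper limit and uses composite Simpson's rule; the function name `laplace_numeric`, the cut-off 40 and the sample values s = 2, a = 3 are choices made for this example.

```python
import math


def laplace_numeric(f, s, upper=40.0, steps=40000):
    """Approximate the integral of e^{-st} f(t) over [0, upper] by Simpson's rule.

    `upper` replaces the infinite limit and `steps` must be even; both are
    assumptions that are adequate when e^{-st} f(t) decays quickly.
    """
    h = upper / steps
    total = f(0.0) + math.exp(-s * upper) * f(upper)
    for k in range(1, steps):
        t = k * h
        weight = 4 if k % 2 else 2
        total += weight * math.exp(-s * t) * f(t)
    return total * h / 3


s, a = 2.0, 3.0
sine_numeric = laplace_numeric(lambda t: math.sin(a * t), s)
sine_formula = a / (s ** 2 + a ** 2)          # formula (21)
cosine_numeric = laplace_numeric(lambda t: math.cos(a * t), s)
cosine_formula = s / (s ** 2 + a ** 2)        # formula (22)
power_numeric = laplace_numeric(lambda t: t ** 2, s)
power_formula = math.factorial(2) / s ** 3    # formula (25) with n = 2

print(sine_numeric, sine_formula)
print(cosine_numeric, cosine_formula)
print(power_numeric, power_formula)
```

The numerical and closed-form values should agree to many decimal places for these inputs; functions that grow faster than $e^{st}$ would need a larger s for the integral to exist at all.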
3.2 Properties of Laplace Transform

1. $\mathcal{L}[f(t) \pm g(t)] = \mathcal{L}[f(t)] \pm \mathcal{L}[g(t)]$,
2. First shifting Theorem:
If $\mathcal{L}[f(t)] = F(s)$, then $\mathcal{L}[e^{at} f(t)] = F(s - a)$,
3. Laplace transform of derivative:
$\mathcal{L}[f'(t)] = sF(s) - f(0)$, where $\mathcal{L}[f(t)] = F(s)$,
4. Laplace transform of integral:
$\mathcal{L}\left[\int_0^t f(\tau)\,d\tau\right] = \frac{F(s)}{s}$, where $\mathcal{L}[f(t)] = F(s)$,
5. Dividing by t Theorem:
If $\mathcal{L}[f(t)] = F(s)$, then $\mathcal{L}\left[\frac{1}{t} f(t)\right] = \int_s^\infty F(u)\,du$,
6. Laplace transform of unit step function:
$\mathcal{L}[U(t - a)] = \frac{1}{s} e^{-as}$,
7. Second shifting Theorem:
If $\mathcal{L}[f(t)] = F(s)$, then $\mathcal{L}[U(t - a) f(t - a)] = e^{-as} F(s)$,
8. Theorem: $\mathcal{L}[U(t - a) f(t)] = e^{-as} \mathcal{L}[f(t + a)]$.

4 Application of Laplace transform to electricity

4.1 Introduction
Laplace transforms and inverse Laplace transforms are very important in solving ODEs
and PDEs. Most of the time, electrical circuits are analysed and described by using ordinary
differential equations. Solving systems of ODEs sometimes requires long calculations, which
is time consuming, and it may even be hard depending on the nature of the differential
equations. In contrast, the Laplace transform simplifies the calculation: it transforms
the ODE into an algebraic equation, and the solution of the ODE is then obtained by the
inverse Laplace transform.
4.2 Some formulae

1. $V_R = RI$
2. $V_C = \frac{1}{C} q$
3. $V_L = L \frac{dI}{dt}$
4. $I = \frac{dq}{dt}$
4.3 Illustrations
4.3.1 Example One
Suppose the current and the charge on the capacitor in the circuit of Figure 1 are zero at time
zero. Find the output voltage response to an input voltage modelled by $\delta(t)$ volts.
Figure 1: Example One
Solution
The output voltage is $E_{out}(t) = q(t)/C$, so we will determine $q(t)$. We also have $E_{in}(t) = \delta(t)$.
By Kirchhoff's voltage law, we have the following:
\[ LI' + RI + \frac{1}{C} q = E_{in}(t), \quad \text{i.e.} \tag{26} \]
\[ LI' + RI + \frac{1}{C} q = \delta(t). \tag{27} \]
Since $I = dq/dt$, this can be written as follows:
\[ L\frac{d^2 q}{dt^2} + R\frac{dq}{dt} + \frac{1}{C} q = \delta(t). \tag{28} \]
Assume that $q(0) = q'(0) = 0$. With the component values $L = 1$, $R = 10$ and $1/C = 100$
(the values implied by the coefficients below), applying the Laplace transform to both sides gives
the following:
\[ s^2 q(s) + 10s\,q(s) + 100\,q(s) = 1, \tag{29} \]
which gives
\[ q(s) = \frac{1}{s^2 + 10s + 100} = \frac{1}{(s + 5)^2 + 75}. \tag{30} \]
Thus $q(t)$ is given by the inverse Laplace transform, i.e.
\[ q(t) = \frac{1}{5\sqrt{3}} e^{-5t} \sin(5\sqrt{3}\,t). \tag{31} \]
Thus the output voltage is $E_{out}(t) = q(t)/C = 100\,q(t) = \frac{20}{\sqrt{3}} e^{-5t} \sin(5\sqrt{3}\,t)$.
Figure 2: Example one simulation
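The charge found in this example can be sanity-checked by substituting it back into the differential equation. The sketch below assumes the coefficients 1, 10 and 100 that appear in the transformed equation (so C = 0.01 is an inferred value, not one read from the figure) and uses hand-derived derivatives of q(t). For t > 0 the impulse has already acted, so the residual of q'' + 10q' + 100q should vanish, while the impulse itself shows up as the jump q'(0+) = 1.

```python
import math

OMEGA = 5 * math.sqrt(3)  # sqrt(75), the oscillation frequency in the solution
C = 0.01                  # capacitance inferred from 1/C = 100 (assumption)


def q(t):
    """Charge q(t) = e^{-5t} sin(OMEGA t) / OMEGA."""
    return math.exp(-5 * t) * math.sin(OMEGA * t) / OMEGA


def dq(t):
    """First derivative of q, obtained by the product rule."""
    return math.exp(-5 * t) * (math.cos(OMEGA * t) - 5 * math.sin(OMEGA * t) / OMEGA)


def d2q(t):
    """Second derivative of q."""
    return math.exp(-5 * t) * (-10 * math.cos(OMEGA * t)
                               + (25 / OMEGA - OMEGA) * math.sin(OMEGA * t))


def residual(t):
    """Left-hand side q'' + 10 q' + 100 q, which should be 0 for t > 0."""
    return d2q(t) + 10 * dq(t) + 100 * q(t)


def e_out(t):
    """Output voltage q(t) / C."""
    return q(t) / C


for t in (0.1, 0.5, 1.0):
    print(t, residual(t), e_out(t))
print(q(0.0), dq(0.0))  # initial charge and the current just after the impulse
```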
4.3.2 Example Two

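Suppose the switch is closed at time zero (see Figure 3). Assume that both loop currents and the
charges on the capacitors are initially zero. Apply Kirchhoff's laws to each loop and solve
for the current in each loop.
Figure 3: Example Two

Solution
\[ 40I + 120(q - q_1) = 10, \tag{32} \]
\[ 60I_1 + 120q_1 - 120(q - q_1) = 0. \tag{33} \]
We know that $q(t) = \int_0^t I(\tau)\,d\tau + q(0)$ with $q(0) = 0$. Thus (32) and (33) become:
\[ 40I + 120\left[\int_0^t (I(\tau) - I_1(\tau))\,d\tau\right] = 10, \]
\[ 60I_1 + 240\int_0^t I_1(\tau)\,d\tau - 120\left[\int_0^t I(\tau)\,d\tau\right] = 0. \]
Upon simplification we get the following:
\[ 4I + 12\int_0^t I(\tau)\,d\tau - 12\int_0^t I_1(\tau)\,d\tau = 1, \tag{34} \]
\[ I_1 + 4\int_0^t I_1(\tau)\,d\tau - 2\int_0^t I(\tau)\,d\tau = 0. \tag{35} \]
Applying the Laplace transform to both sides of equations (34) and (35), we obtain the following:
\[ 4I(s) + \frac{12}{s} I(s) - \frac{12}{s} I_1(s) = \frac{1}{s}, \tag{36} \]
\[ I_1(s) + \frac{4}{s} I_1(s) - \frac{2}{s} I(s) = 0. \tag{37} \]
Simplification gives us the following:
\[ (s + 3)I(s) - 3I_1(s) = \frac{1}{4}, \tag{38} \]
\[ -2I(s) + (s + 4)I_1(s) = 0. \tag{39} \]
Upon solving for $I$ and $I_1$, we get the following:
\[ I(s) = \frac{s + 4}{2} I_1(s) \quad \text{and} \tag{40} \]
\[ I_1(s) = \frac{1}{2(s^2 + 7s + 6)}, \tag{41} \]
which gives
\[ I(s) = \frac{s + 4}{4(s^2 + 7s + 6)} \quad \text{and} \quad I_1(s) = \frac{1}{2(s^2 + 7s + 6)}. \tag{42} \]
Since $s^2 + 7s + 6 = (s + 1)(s + 6)$, using simple partial fractions we get the following:
\[ I_1(s) = \frac{1}{10}\,\frac{1}{s + 1} - \frac{1}{10}\,\frac{1}{s + 6} \quad \text{and} \quad I(s) = \frac{3}{20}\,\frac{1}{s + 1} + \frac{1}{10}\,\frac{1}{s + 6}. \tag{43} \]
Using the inverse Laplace transform we get the following:
\[ I_1(t) = \frac{1}{10} e^{-t} - \frac{1}{10} e^{-6t} \quad \text{and} \tag{44} \]
\[ I(t) = \frac{3}{20} e^{-t} + \frac{1}{10} e^{-6t}. \tag{45} \]
Integrating the currents from 0 to t gives the charges:
\[ q_1 = -\frac{1}{10} e^{-t} + \frac{1}{60} e^{-6t} + \frac{1}{12} \quad \text{and} \tag{46} \]
\[ q = -\frac{3}{20} e^{-t} - \frac{1}{60} e^{-6t} + \frac{1}{6}. \]
With
\[ I_2 = I - I_1 \quad \text{and} \quad q_2 = q - q_1 \tag{47} \]
we obtain
\[ I_2(t) = \frac{1}{20} e^{-t} + \frac{1}{5} e^{-6t} \quad \text{and} \tag{48} \]
\[ q_2(t) = -\frac{1}{20} e^{-t} - \frac{1}{30} e^{-6t} + \frac{1}{12}. \tag{49} \]
Figure 4: Example Two simulation
Figure 5: Example Two simulation: Charges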
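As a check on the algebra, the closed-form currents and charges can be substituted back into the two loop equations, 40I + 120(q − q1) = 10 and 60I1 + 120q1 − 120(q − q1) = 0. The sketch below does this at a few sample times; the function names are chosen for this illustration only.

```python
import math


def current(t):
    """Loop current I(t)."""
    return 3 / 20 * math.exp(-t) + 1 / 10 * math.exp(-6 * t)


def current_1(t):
    """Loop current I1(t)."""
    return 1 / 10 * math.exp(-t) - 1 / 10 * math.exp(-6 * t)


def charge(t):
    """q(t), the integral of I from 0 to t."""
    return -3 / 20 * math.exp(-t) - 1 / 60 * math.exp(-6 * t) + 1 / 6


def charge_1(t):
    """q1(t), the integral of I1 from 0 to t."""
    return -1 / 10 * math.exp(-t) + 1 / 60 * math.exp(-6 * t) + 1 / 12


def loop_residuals(t):
    """Residuals of the two loop equations; both should be (numerically) zero."""
    first = 40 * current(t) + 120 * (charge(t) - charge_1(t)) - 10
    second = 60 * current_1(t) + 120 * charge_1(t) - 120 * (charge(t) - charge_1(t))
    return first, second


for t in (0.0, 0.25, 1.0, 3.0):
    print(t, loop_residuals(t))
```

At t = 0 the charges are zero and I(0) = 1/4, which matches the first loop equation 40I = 10 at the instant the switch closes.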

5 Appendices
5.1 Exercises
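1. Calculate the determinant of the following matrices.
(a)
\[ \begin{pmatrix} 2 & 2 & 5 & 7 \\ 0 & 5 & 7 & 0 \\ 0 & 0 & 2 & 5 \\ 0 & 0 & 0 & 10 \end{pmatrix}, \quad \text{(2 marks)} \tag{50} \]
(b)
\[ \begin{pmatrix} 2 & 2 & 5 & 7 \\ 0 & 5 & 7 & 0 \\ 7 & 0 & 2 & 5 \\ 14 & 0 & 4 & 10 \end{pmatrix} \tag{51} \]
2. Matrices A and B are such that:
\[ 3A - 2B = \begin{pmatrix} 2 & 1 \\ -2 & -1 \end{pmatrix}, \qquad -4A + B = \begin{pmatrix} -1 & 2 \\ -4 & 3 \end{pmatrix}. \]
Find A and B.
3. Verify the following equality:
\[ \begin{vmatrix} x & y & z \\ x^2 & y^2 & z^2 \\ x^3 & y^3 & z^3 \end{vmatrix} = xyz(x - y)(y - z)(z - x) \]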
5.2 Some questions

Q1 (a) Solve the following differential equations:
i. $\frac{dy}{dx} = \frac{\cos(x)}{\cos(y)}$.
ii. $y'' + 5y' + 6y = 0$.
iii. $y'' + y' + y = 2e^{2t}$.
(b) Find the Fourier series representation of $f(t) = t^2$ on the interval $-\pi < t < \pi$.
Q2 Find the Laplace transform of the following functions:
(a) $f(t) = \cos(2t) + \sin(3t) + e^{5t}$.
(b) $f(t) = \cosh(2t) + \sinh(3t) + te^{5t}$.
(c) $f(t) = t^2 + \sin(t - 5)u(t - 5) + \delta(t - 3)$.
Note: $u(t - 5)$ is the unit step function and $\delta(t - 3)$ is the delta function.
Q3 Find a matrix P which transforms the matrix
\[ A = \begin{pmatrix} 1 & 0 & -1 \\ 1 & 2 & 1 \\ 2 & 2 & 3 \end{pmatrix} \]
into a diagonal matrix.
Q5 (a) Find matrices A and B that satisfy the following equations:
\[ 3A - 2B = \begin{pmatrix} 2 & 1 \\ -2 & -1 \end{pmatrix} \quad \text{and} \quad -4A + B = \begin{pmatrix} -1 & 2 \\ -4 & 3 \end{pmatrix}. \]
(b) Verify the following equality:
\[ \begin{vmatrix} x & y & z \\ x^2 & y^2 & z^2 \\ x^3 & y^3 & z^3 \end{vmatrix} = xyz(x - y)(y - z)(z - x). \]
Q6 (a) Calculate the inverse Laplace transform of the following functions of s:
i. $F(s) = \frac{s + 4}{s^2 + 4} + 5$.
ii. $F(s) = \frac{s}{s^2 + 4} + \frac{3}{s^2 + 9}$.
iii. $F(s) = \frac{e^{-5s}}{s^2 + 1} + e^{-3s} + \frac{2}{s^3}$.
(b) Find the differential equation whose solution is $y = A\cos(2x) + B\sin(2x)$, where A
and B are arbitrary constants.
5.3 Solutions
Q1 (a) Solve the following differential equations:
i. $\frac{dy}{dx} = \frac{\cos(x)}{\cos(y)}$.
Separating the variables: $\cos(y)\,dy = \cos(x)\,dx$, and integrating both sides gives
$\sin(y) = \sin(x) + C$.
ii. $y'' + 5y' + 6y = 0$.
The characteristic equation is given by $m^2 + 5m + 6 = 0$; this implies that
$m = -2$ or $m = -3$. (1 mark)
Thus the solution is the following: $y = Ae^{-2x} + Be^{-3x}$.
iii. $y'' + y' + y = 2e^{2t}$.
The characteristic equation is given by $m^2 + m + 1 = 0$; this implies that
$m = -\frac{1}{2} + i\frac{\sqrt{3}}{2}$ or $m = -\frac{1}{2} - i\frac{\sqrt{3}}{2}$.
Thus the homogeneous solution is as follows:
\[ y_h = e^{-\frac{1}{2}t}\left[A\cos\left(\frac{\sqrt{3}}{2}t\right) + B\sin\left(\frac{\sqrt{3}}{2}t\right)\right]. \]
The particular solution is of the form $y_p = Ce^{2t}$. Substituting gives $(4 + 2 + 1)C = 7C = 2$,
thus $C = \frac{2}{7}$ and $y_p = \frac{2}{7}e^{2t}$. Therefore the general solution is as follows:
\[ y = y_h + y_p = e^{-\frac{1}{2}t}\left[A\cos\left(\frac{\sqrt{3}}{2}t\right) + B\sin\left(\frac{\sqrt{3}}{2}t\right)\right] + \frac{2}{7}e^{2t}. \]
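(b) Find the Fourier series representation of $f(t) = t^2$ on the interval $-\pi < t < \pi$.
This function is even, thus $b_n = 0$ and we only need $a_0$ and $a_n$. Using the symmetry of the
integrand about $t = 0$:
\[ a_0 = \frac{1}{\pi}\int_{-\pi}^{\pi} t^2\,dt = \frac{2}{\pi}\int_0^{\pi} t^2\,dt = \frac{2\pi^2}{3}, \]
\[ a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} t^2\cos(nt)\,dt = \frac{2}{\pi}\int_0^{\pi} t^2\cos(nt)\,dt = \frac{4}{n^2}(-1)^n. \]
So, with $f(t) = \frac{a_0}{2} + \sum_{n=1}^{\infty} a_n\cos(nt)$,
\[ f(t) = \frac{\pi^2}{3} + \sum_{n=1}^{\infty} \frac{4}{n^2}(-1)^n\cos(nt). \]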
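A Fourier series for $t^2$ can be tested numerically by summing a finite number of terms. The sketch below sums $\pi^2/3 + \sum 4(-1)^n\cos(nt)/n^2$ for n = 1 to N and compares the result with $t^2$ at a few points of $[-\pi, \pi]$; the function name `fourier_t_squared` and the choice N = 2000 are assumptions of this illustration.

```python
import math


def fourier_t_squared(t, terms=2000):
    """Partial sum of the cosine series pi^2/3 + sum 4(-1)^n cos(nt)/n^2."""
    total = math.pi ** 2 / 3
    for n in range(1, terms + 1):
        total += 4 * (-1) ** n / n ** 2 * math.cos(n * t)
    return total


for t in (0.0, 1.0, -2.0, math.pi):
    print(t, fourier_t_squared(t), t ** 2)
```

Because the coefficients decay like $1/n^2$, the discarded tail is at most about 4/N, so a few thousand terms already give agreement to roughly two or three decimal places.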
Q2 Find the Laplace transform of the following functions:
(a) $f(t) = \cos(2t) + \sin(3t) + e^{5t}$.
\[ F(s) = \frac{s}{s^2 + 4} + \frac{3}{s^2 + 9} + \frac{1}{s - 5} \]
(b) $f(t) = \cosh(2t) + \sinh(3t) + te^{5t}$.
\[ F(s) = \frac{s}{s^2 - 4} + \frac{3}{s^2 - 9} + \frac{1}{(s - 5)^2} \]
(c) $f(t) = t^2 + \sin(t - 5)u(t - 5) + \delta(t - 3)$.
\[ F(s) = \frac{2}{s^3} + \frac{e^{-5s}}{s^2 + 1} + e^{-3s} \]
Note: $u(t - 5)$ is the unit step function and $\delta(t - 3)$ is the delta function.

 
Q3 Find a matrix $P$ which transforms the matrix
\[ A = \begin{pmatrix} 1 & 0 & -1 \\ 1 & 2 & 1 \\ 2 & 2 & 3 \end{pmatrix} \]
into a diagonal matrix.
We first find the eigenvalues of the matrix $A$: we get $\lambda_1 = 3$, $\lambda_2 = 2$ and $\lambda_3 = 1$.
For each eigenvalue we calculate a corresponding eigenvector, and then we write the matrix $P$ whose columns are the eigenvectors obtained:
\[ P = \begin{pmatrix} 1 & 1 & 1 \\ -1 & -\frac{1}{2} & -1 \\ -2 & -1 & 0 \end{pmatrix}. \]
Thus
\[ P^{-1}AP = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]
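A quick way to check a diagonalization without inverting $P$ is to verify that $AP = PD$ and that $\det P \neq 0$. The sketch below does this with exact fractions; the helper names `matmul` and `det3` are illustrative and not part of the notes.

```python
from fractions import Fraction as F

A = [[1, 0, -1], [1, 2, 1], [2, 2, 3]]
P = [[F(1), F(1), F(1)],
     [F(-1), F(-1, 2), F(-1)],
     [F(-2), F(-1), F(0)]]
D = [[3, 0, 0], [0, 2, 0], [0, 0, 1]]

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def det3(M):
    """Determinant of a 3x3 matrix by expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

AP = matmul(A, P)
PD = matmul(P, D)
print(AP == PD, det3(P))
```

If `AP == PD` holds and the determinant is non-zero, then $P^{-1}AP = D$.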
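Q5 (a) Find matrices $A$ and $B$ that satisfy the following equations:
\[ 3A - 2B = \begin{pmatrix} 2 & 1 \\ -2 & -1 \end{pmatrix} \quad\text{and}\quad -4A + B = \begin{pmatrix} -1 & 2 \\ -4 & 3 \end{pmatrix}. \]
\[ A = \begin{pmatrix} 0 & -1 \\ 2 & -1 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} -1 & -2 \\ 4 & -1 \end{pmatrix}. \]
(b) Verify the following equality:
\[ \begin{vmatrix} x & y & z \\ x^2 & y^2 & z^2 \\ x^3 & y^3 & z^3 \end{vmatrix} = xyz(x-y)(y-z)(z-x). \]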
Q6 (a) Calculate the inverse Laplace transform of the following functions of s:

i. $F(s) = \dfrac{s+4}{s^2+4} + 5$.
$f(t) = \cos(2t) + 2\sin(2t) + 5\delta(t)$.
ii. $F(s) = \dfrac{s}{s^2+4} + \dfrac{3}{s^2+9}$.
$f(t) = \cos(2t) + \sin(3t)$.
iii. $F(s) = \dfrac{e^{-5s}}{s^2+1} + e^{-3s} + \dfrac{2}{s^3}$.
$f(t) = t^2 + \sin(t-5)\,u(t-5) + \delta(t-3)$.
(b) Find the differential equation whose solution is y = A cos(2x) + B sin(2x), where A
and B are arbitrary constants.
$y = A\cos(2x) + B\sin(2x)$
$y' = 2B\cos(2x) - 2A\sin(2x)$
$y'' = -4A\cos(2x) - 4B\sin(2x)$
Thus $y'' = -4y$, i.e. $y'' + 4y = 0$, is the required ODE.
6 Differential Equation
6.1 Some definitions
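Definition 1: An equation which involves differential coefficients is called a differential equation.
Examples
1. $\dfrac{dy}{dx} = \dfrac{1+x^2}{1-y^2}$,
2. $\dfrac{d^2y}{dx^2} = 2\dfrac{dy}{dx} - 8y$,
3. $\alpha\dfrac{d^2y}{dx^2} = \left[1 + \left(\dfrac{dy}{dx}\right)^2\right]^{3/2}$,
4. $x\dfrac{\partial u}{\partial x} + y\dfrac{\partial u}{\partial y} = nu$.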
We distinguish two types of differential equations: ordinary differential equations (ODEs) and partial differential equations (PDEs).
Definition 2: An ODE is a differential equation involving derivatives with respect to a single independent variable.
Definition 3: A PDE is a differential equation involving partial derivatives with respect to more than one independent variable.

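Definition 4: The order of an ODE is the order of the highest derivative present in the equation. The degree of an ODE is the power of the highest derivative after radical signs and fractional powers have been removed.
Examples
Consider the following differential equations:
1. $L\dfrac{d^2q}{dt^2} + R\dfrac{dq}{dt} + \dfrac{1}{C}q = E\sin\omega t$
2. $\cos t\,\dfrac{d^2q}{dt^2} + \sin t\left(\dfrac{dq}{dt}\right)^2 + \dfrac{1}{C}q = E\sin\omega t$
3. $\alpha\dfrac{d^2y}{dx^2} = \left[1 + \left(\dfrac{dy}{dx}\right)^2\right]^{3/2}$
The order of all the above equations is 2. The degree of equations 1 and 2 is 1. The degree of equation 3 is 2, since squaring both sides to remove the fractional power gives $\alpha^2(y'')^2 = [1 + (y')^2]^3$.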
6.2 Formulation of Differential equation

DEs can be formed by eliminating the arbitrary constants from an ordinary equation after differentiating it as many times as there are constants. DEs can also be formed by modelling physical, chemical, financial and other phenomena.
Examples: Write down the differential equation for the following cases:
1. $y = Ax + A^2$
2. $y = A\cos x + B\sin x$
3. $y^2 = Ax^2 + Bx + C$
Solution:
1. $y = Ax + A^2$
Differentiate once to get $y' = A$. Substituting $A = y'$ into the given equation eliminates the constant: $y = xy' + (y')^2$.
2. $y = A\cos x + B\sin x$
Differentiate once to get $y' = -A\sin x + B\cos x$, then differentiate a second time to get $y'' = -(A\cos x + B\sin x)$. Replacing $A\cos x + B\sin x$ by $y$ we get $y'' = -y$.
3. $y^2 = Ax^2 + Bx + C$
Differentiating three times we get the following:
$2yy' = 2Ax + B$
$2y'y' + 2yy'' = 2A$
$2y''y' + 2y'y'' + 2y'y'' + 2yy''' = 0$
$6y''y' + 2yy''' = 0$
$3y''y' + yy''' = 0$
Exercises
1. Write the order and degree of the following differential equations:
(a) $y'' + a^2x = 0$
(b) $\left[1 + (y')^2\right]^{3/2} = y''$
(c) $x^2(y'')^3 + y(y')^2 + y^4 = 0$
2. Give an example of each of the following types of differential equations:
(a) A linear differential equation of second order and first degree.
(b) A non-linear differential equation of second degree and second order.
(c) A second order and third degree differential equation.
3. Obtain the DE of which $y^2 = 4a(x + a)$ is a solution.
4. Obtain the DE of which $By^2 + Ax^2 = 1$ is a solution.
5. Find the DE corresponding to:
(a) $y = ae^{3x} + be^{x}$
(b) $y = e^{x}(A\cos x + B\sin x)$
(c) $y = a\cos(x + 3)$
6.3 Solution of a Differential Equation

Take for example $y = A\cos x + B\sin x$. Eliminating $A$ and $B$, we get the differential equation $y'' + y = 0$. Thus $y = A\cos x + B\sin x$ is called a solution of the differential equation $y'' + y = 0$. It is clear that $y = A\cos x + B\sin x$ contains two arbitrary constants. Thus the number of arbitrary constants in the solution is equal to the order of the differential equation. An equation containing the dependent variable $y$ and the independent variable $x$, free from derivatives, which satisfies the differential equation is called the solution (or primitive) of the differential equation.
6.3.1 First order DE by Separation of Variables Method

If a DE can be written in the form
\[ f(y)\,dy = \varphi(x)\,dx, \]
we say that the variables are separable. We get the solution by integrating both sides.
Working Rule
• Separate variables as $f(y)\,dy = \varphi(x)\,dx$.
• Integrate both sides as $\int f(y)\,dy = \int \varphi(x)\,dx$.
• Add an arbitrary constant $C$ on the RHS.
Examples:
1. Solve
\[ \frac{dy}{dx} = \frac{x(2\ln x + 1)}{\sin y + y\cos y} \]
• Separate variables as $(\sin y + y\cos y)\,dy = x(2\ln x + 1)\,dx$.
• Integrate both sides as $\int(\sin y + y\cos y)\,dy = \int x(2\ln x + 1)\,dx \Rightarrow$
\[ -\cos y + y\sin y + \cos y = 2\left[\frac{x^2}{2}\ln x - \int\frac{x}{2}\,dx\right] + \frac{x^2}{2} \]
\[ y\sin y = x^2\ln x \]
• Add an arbitrary constant $C$ on the RHS.
\[ y\sin y = x^2\ln x + C \]
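An implicit solution $F(x, y) = C$ can be checked by comparing the slope $-F_x/F_y$ of its level curves with the right-hand side of the differential equation. The sketch below does this for $F(x, y) = y\sin y - x^2\ln x$ using central differences; the function names, step size and sample points are illustrative assumptions.

```python
import math

def F(x, y):
    """Implicit solution written as F(x, y) = C."""
    return y * math.sin(y) - x ** 2 * math.log(x)

def implicit_slope(x, y, h=1e-6):
    """dy/dx along a level curve of F, estimated by central differences."""
    Fx = (F(x + h, y) - F(x - h, y)) / (2 * h)
    Fy = (F(x, y + h) - F(x, y - h)) / (2 * h)
    return -Fx / Fy

def rhs(x, y):
    """Right-hand side of dy/dx = x(2 ln x + 1) / (sin y + y cos y)."""
    return x * (2 * math.log(x) + 1) / (math.sin(y) + y * math.cos(y))

for point in [(1.5, 1.0), (2.0, 0.5), (3.0, 1.2)]:
    print(point, implicit_slope(*point), rhs(*point))
```

At each sample point the two slopes should agree up to the finite-difference error.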
2. Solve $x^4y' + x^3y = \sec(xy)$.
Write the equation as $x^3(xy' + y) = \sec(xy)$. Let $u = xy$; then $\dfrac{du}{dx} = xy' + y$. This gives $x^3u' = \sec u$.
• Separate variables as $\cos u\,du = x^{-3}\,dx$.
• Integrate both sides as $\int\cos u\,du = \int x^{-3}\,dx \Rightarrow$
\[ \sin u = \frac{x^{-2}}{-2} \Rightarrow \sin(xy) = \frac{x^{-2}}{-2} \]
• Add an arbitrary constant $C$ on the RHS.
\[ \sin(xy) = -\frac{x^{-2}}{2} + C \]
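3. Solve
\[ (2x^2 + 3y^2 - 7)\,x\,dx = (3x^2 + 2y^2 - 8)\,y\,dy. \]
\[ \frac{x\,dx}{y\,dy} = \frac{3x^2 + 2y^2 - 8}{2x^2 + 3y^2 - 7} \]
Adding and subtracting numerators and denominators gives
\[ \frac{x\,dx + y\,dy}{x\,dx - y\,dy} = \frac{5x^2 + 5y^2 - 15}{x^2 - y^2 - 1} \]
\[ \frac{x\,dx + y\,dy}{x^2 + y^2 - 3} = 5\,\frac{x\,dx - y\,dy}{x^2 - y^2 - 1} \]
Let
\[ u = x^2 + y^2 - 3 \quad\text{and}\quad v = x^2 - y^2 - 1, \]
\[ du = 2(x\,dx + y\,dy) \quad\text{and}\quad dv = 2(x\,dx - y\,dy). \]
\[ \frac{du}{u} = 5\,\frac{dv}{v} \]
\[ \ln|u| = 5\ln|v| + \ln C \]
\[ \ln|x^2 + y^2 - 3| = \ln\left(C\,|x^2 - y^2 - 1|^5\right) \]
\[ |x^2 + y^2 - 3| = C\,|x^2 - y^2 - 1|^5 \]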
6.3.2 Homogeneous Differential Equations

A differential equation of the form $y' = \dfrac{f(x,y)}{g(x,y)}$ is called a homogeneous equation if each term of $f(x,y)$ and $g(x,y)$ is of the same degree. For example
\[ \frac{dy}{dx} = \frac{3xy + y^2}{3x^2 + xy}. \]
In such a case we put $y = v(x)\,x$, so that $y' = v + xv'$. The reduced equation involves $v$ and $x$ only.
The new differential equation can be solved by the separation of variables method.
Working Rule
• Put $y = vx \Rightarrow y' = v + xv'$
• Separate variables
• Integrate both sides
• Put $v = y/x$, add $C$ and then simplify.
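Example 1: Solve
\[ (2xy + x^2)y' = 3y^2 + 2xy \]
\[ y' = \frac{3y^2 + 2xy}{2xy + x^2}. \]
This is a homogeneous DE.
• Put $y = vx \Rightarrow y' = v + xv'$
\[ xv' + v = \frac{3v^2x^2 + 2x^2v}{2x^2v + x^2} = \frac{3v^2 + 2v}{2v + 1}. \]
\[ x\frac{dv}{dx} = \frac{v^2 + v}{2v + 1} \]
• Separate variables
\[ \frac{dx}{x} = \frac{2v + 1}{v^2 + v}\,dv \]
• Integrate both sides
\[ \int\frac{dx}{x} = \int\frac{2v + 1}{v^2 + v}\,dv \]
\[ \ln|x| = \ln|v^2 + v| \]
• Put $v = y/x$, add $C$ and then simplify.
\[ \ln|Cx| = \ln\left|\left(\frac{y}{x}\right)^2 + \frac{y}{x}\right| \]
Hence $\left(\dfrac{y}{x}\right)^2 + \dfrac{y}{x} = Cx$, that is, $y^2 + xy = Cx^3$.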
Example 2:
\[ y' = \frac{3xy + y^2}{3x^2 + xy} \]
Let $y = vx$; then $y' = v + xv'$.
\[ v + xv' = \frac{3x^2v + v^2x^2}{3x^2 + x^2v} \]
\[ v + xv' = \frac{3v + v^2}{v + 3} \]
\[ v + xv' = v \]
\[ xv' = 0 \]
Hence $v = c$, that is,
\[ y = cx \]
Exercises Solve the following differential equations:
1. $\dfrac{dy}{dx} = \dfrac{y}{x} + x\sin(y/x)$
2. $(y^2 - xy)\,dx + x^2\,dy = 0$
3. $(x^2 - y^2)\,dx + 2xy\,dy = 0$
4. $x(y - x)\dfrac{dy}{dx} = y(y + x)$
5. $x(x - y)\,dy + y^2\,dx = 0$
6. $\dfrac{dy}{dx} + \dfrac{x - 2y}{2x - y} = 0$
7. $\dfrac{dy}{dx} = \dfrac{3xy + y^2}{3x^2}$
6.3.3 Linear Differential Equations

A differential equation of the form $\dfrac{dy}{dx} + Py = Q(x)$ is called a linear differential equation, where $P$ and $Q$ are functions of $x$ (but not of $y$) or constants. In such a case, multiply both sides of $\dfrac{dy}{dx} + Py = Q(x)$ by $I(x) = e^{\int P\,dx}$ to get
\[ e^{\int P\,dx}\frac{dy}{dx} + e^{\int P\,dx}Py = Q(x)e^{\int P\,dx}, \]
\[ \frac{d\big(yI(x)\big)}{dx} = Q(x)I(x), \]
\[ yI(x) = \int Q(x)I(x)\,dx + C, \]
\[ y = \frac{1}{I(x)}\int Q(x)I(x)\,dx + \frac{C}{I(x)}. \]
The function $I(x) = e^{\int P\,dx}$ is called the integrating factor.
Working Rule
• Step 1: Convert the given equation to the standard form of a linear differential equation, i.e. $\dfrac{dy}{dx} + Py = Q(x)$.
• Step 2: Find the integrating factor $I(x) = e^{\int P\,dx}$.
• Step 3: The solution is $y = \dfrac{1}{I(x)}\displaystyle\int Q(x)I(x)\,dx + \dfrac{C}{I(x)}$.
Example 1: Solve $(x + 1)\dfrac{dy}{dx} = y + e^x(x + 1)^2$
• Step 1: Convert the given equation to the standard form of a linear differential equation, i.e.
\[ \frac{dy}{dx} = \frac{1}{x+1}y + e^x(x + 1) \Rightarrow \frac{dy}{dx} - \frac{1}{x+1}y = e^x(x + 1). \]
• Step 2: Find the integrating factor $I(x) = e^{\int\frac{-1}{x+1}\,dx} = \dfrac{1}{x+1}$.
• Step 3: The solution is
\[ y = (x + 1)\int\frac{1}{x+1}e^x(x + 1)\,dx + C(x + 1) \Rightarrow y = (x + 1)e^x + C(x + 1). \]
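The integrating factor formula can also be evaluated numerically for an initial value problem $y(x_0) = y_0$, which gives an independent check of Example 1. The sketch below is only an illustration: it uses Simpson's rule for both integrals, and the names `simpson` and `solve_linear` as well as the initial condition $y(0) = 1$ (which corresponds to $C = 0$) are assumptions made here.

```python
import math

def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    if a == b:
        return 0.0
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def solve_linear(P, Q, x0, y0, x):
    """Value y(x) for y' + P y = Q with y(x0) = y0, via the integrating factor."""
    I = lambda s: math.exp(simpson(P, x0, s))   # normalised so that I(x0) = 1
    return (y0 + simpson(lambda s: Q(s) * I(s), x0, x)) / I(x)

# Example 1 in standard form: y' - y/(x+1) = e^x (x+1), with y(0) = 1.
P = lambda x: -1.0 / (x + 1.0)
Q = lambda x: math.exp(x) * (x + 1.0)
approx = solve_linear(P, Q, 0.0, 1.0, 1.0)
exact = (1.0 + 1.0) * math.exp(1.0)             # (x+1)e^x at x = 1
print(approx, exact)
```

The numerical value should match the closed form $y = (x+1)e^x$ at $x = 1$ to many decimal places.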
Example 2: Solve $(x^3 - x)\dfrac{dy}{dx} - (3x^2 - 1)y = x^5 - 2x^3 + x$
sol: $y = (x^3 - x)\ln x + C(x^3 - x)$
Example 3: Solve $\sin(x)\dfrac{dy}{dx} + 2y = \tan(x/2)$
sol: $y\tan^2(x/2) = \frac{1}{3}\tan^3(x/2) + C$
6.3.4 Bernoulli’s Equation

An equation of the form $\dfrac{dy}{dx} + Py = Q(x)y^n$, where $P$ and $Q$ are functions of $x$ or constants and $n \neq 0, 1$, can be reduced to a linear differential equation as follows:
• Divide both sides by $y^n$ to obtain
\[ y^{-n}\frac{dy}{dx} + Py^{1-n} = Q(x) \]
• Put
\[ z = y^{1-n} \Rightarrow \frac{dz}{dx} = (1 - n)\,y^{-n}\frac{dy}{dx}, \]
\[ y^{-n}\frac{dy}{dx} = \frac{1}{1 - n}\frac{dz}{dx}. \]
Using this information we get the following:
\[ \frac{dz}{dx} + (1 - n)Pz = (1 - n)Q \]
If we let $q = (1 - n)Q$ and $p = (1 - n)P$, then we obtain the following differential equation:
\[ \frac{dz}{dx} + pz = q \tag{52} \]
We can solve equation (52) by using an integrating factor since it is linear.
• Write $y = z^{1/(1-n)}$
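Example 1 Solve $x^2\,dy + y(x + y)\,dx = 0$ by using Bernoulli's method.
Solution: The equation can be written as $\dfrac{dy}{dx} + \dfrac{1}{x}y = -\dfrac{1}{x^2}y^2$, so $n = 2$, $P = \dfrac{1}{x}$ and $Q = -\dfrac{1}{x^2}$. With $z = y^{1-n} = y^{-1}$, the solution can be written as
\[ z = \frac{1}{I(x)}\int qI(x)\,dx + \frac{C}{I(x)} \quad\text{with}\quad I(x) = e^{\int(1-n)\frac{1}{x}\,dx} = e^{-\int\frac{1}{x}\,dx} = \frac{1}{x}, \]
\[ q = (1 - n)Q = -\left(-\frac{1}{x^2}\right) = \frac{1}{x^2}. \]
Hence
\[ z = x\int x^{-3}\,dx + Cx \Rightarrow z = x\left(-\frac{x^{-2}}{2}\right) + Cx = \frac{2Cx^2 - 1}{2x}. \]
Therefore
\[ y = z^{1/(1-n)} = z^{-1} = \frac{2x}{2Cx^2 - 1}. \]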
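A closed-form answer such as $y = \dfrac{2x}{2Cx^2 - 1}$ can be sanity-checked by substituting it back into $x^2y' + y(x + y) = 0$. The sketch below estimates $y'$ with a central difference; the names `y` and `residual`, the step size and the sample values of $C$ are illustrative assumptions.

```python
def y(x, C):
    """Candidate solution of x^2 dy + y(x + y) dx = 0."""
    return 2 * x / (2 * C * x ** 2 - 1)

def residual(x, C, h=1e-6):
    """x^2 y' + y (x + y), which should vanish for a true solution."""
    dy = (y(x + h, C) - y(x - h, C)) / (2 * h)
    return x ** 2 * dy + y(x, C) * (x + y(x, C))

for C in (1.0, -2.0, 3.5):
    print(C, residual(2.0, C))
```

The printed residuals should be zero up to the finite-difference error.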
Example 2 If possible, use Bernoulli's method to solve the following differential equation:
\[ x\frac{dy}{dx} + y\ln y = xye^x \]
This form is special because of the term $y\ln y$. But it can be written as follows:
\[ \frac{dy}{dx} + \frac{1}{x}y\ln y = ye^x \tag{53} \]
Let $w = \ln y \Rightarrow \dfrac{dw}{dx} = \dfrac{1}{y}\dfrac{dy}{dx}$. Putting this in equation (53) we get the following:
\[ y\frac{dw}{dx} + \frac{1}{x}yw = e^xy \Rightarrow \frac{dw}{dx} + \frac{1}{x}w = e^x. \]
This equation is not in Bernoulli's form but it is linear, with integrating factor $I(x) = x$. Thus its solution is given by
\[ w = \frac{(x - 1)e^x + C}{x} \Rightarrow y = e^{\frac{(x-1)e^x + C}{x}}. \]
Example 3 Using Bernoulli's method solve the following differential equation:
\[ r\sin\theta - \cos\theta\,\frac{dr}{d\theta} = r^2 \]
Solution:
\[ r = \frac{1}{\sin\theta + C\cos\theta} \]
Example 4 Solve the following differential equation:
\[ \frac{d\theta}{dr} - \frac{\tan\theta}{1 + r} = (1 + r)e^r\sec\theta \]
Solution:
\[ \sin\theta = (1 + r)(e^r + C) \]
6.3.5 Exact Differential Equation

An exact differential equation is one formed by direct differentiation of its primitive (solution) without any other process. It has the following form:
\[ M(x, y)\,dx + N(x, y)\,dy = 0. \]
The form above is said to be an exact differential equation if the following holds:
\[ \frac{\partial M}{\partial y} = \frac{\partial N}{\partial x}, \]
where $\dfrac{\partial M}{\partial y}$ denotes the partial derivative of $M$ with respect to $y$ keeping $x$ constant and $\dfrac{\partial N}{\partial x}$ is the partial derivative of $N$ with respect to $x$ keeping $y$ constant.
Working rule
• Step 1: Integrate $M$ with respect to $x$ keeping $y$ constant.
• Step 2: Integrate the terms of $N$ which do not contain $x$ with respect to $y$.
• Step 3: Result of step 1 + result of step 2 = constant.
Example: Solve the following exact differential equation:
\[ (5x^4 + 3x^2y^2 - 2xy^3)\,dx + (2x^3y - 3x^2y^2 - 5y^4)\,dy = 0 \]
\[ M = 5x^4 + 3x^2y^2 - 2xy^3 \]
\[ N = 2x^3y - 3x^2y^2 - 5y^4 \]
\[ \frac{\partial M}{\partial y} = 6x^2y - 6xy^2 \]
\[ \frac{\partial N}{\partial x} = 6x^2y - 6xy^2 \]
Hence
\[ \frac{\partial M}{\partial y} = \frac{\partial N}{\partial x}. \]
Thus it is an exact differential equation.
\[ \int M\,dx + \int(\text{terms of } N \text{ not containing } x)\,dy = c \]
\[ x^5 + x^3y^2 - x^2y^3 - y^5 = c \]
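Both the exactness condition and the final primitive of this example can be checked numerically: the primitive $\varphi(x, y) = x^5 + x^3y^2 - x^2y^3 - y^5$ should satisfy $\varphi_x = M$ and $\varphi_y = N$. The following sketch uses central differences; the helper names `d_dx` and `d_dy` and the test point are illustrative assumptions.

```python
def M(x, y):
    return 5 * x**4 + 3 * x**2 * y**2 - 2 * x * y**3

def N(x, y):
    return 2 * x**3 * y - 3 * x**2 * y**2 - 5 * y**4

def phi(x, y):
    """Primitive found in the example: phi(x, y) = c."""
    return x**5 + x**3 * y**2 - x**2 * y**3 - y**5

def d_dx(f, x, y, h=1e-5):
    """Partial derivative with respect to x (y held constant)."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def d_dy(f, x, y, h=1e-5):
    """Partial derivative with respect to y (x held constant)."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

x0, y0 = 1.3, 0.7
is_exact = abs(d_dy(M, x0, y0) - d_dx(N, x0, y0)) < 1e-6
primitive_ok = (abs(d_dx(phi, x0, y0) - M(x0, y0)) < 1e-6
                and abs(d_dy(phi, x0, y0) - N(x0, y0)) < 1e-6)
print(is_exact, primitive_ok)
```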
6.4 Linear Second order Differential Equation with Constant Coefficients

The general form of the linear differential equation of second order is
\[ ay'' + by' + cy = R(x), \]
where $a$, $b$ and $c$ are constants with $a \neq 0$, and $R(x)$ is a function of $x$.

If $R(x) = 0$, then $ay'' + by' + cy = 0$ is said to be homogeneous and its solution is called the homogeneous solution.
6.4.1 Solution of homogeneous second order DE with Constant Coefficients

We let $y = e^{mx}$. Putting this in $ay'' + by' + cy = 0$, we get $e^{mx}(am^2 + bm + c) = 0$.
Since $e^{mx} \neq 0$ we have
\[ am^2 + bm + c = 0. \]
Thus $m$ is given by the roots of the above equation, which is called the characteristic equation. We distinguish three cases:
1. If $\Delta = b^2 - 4ac > 0$, then the solution is given by $y = Ae^{m_1x} + Be^{m_2x}$, where $A$ and $B$ are arbitrary constants and $m_1$, $m_2$ are the roots of the characteristic equation.
2. If $\Delta = b^2 - 4ac = 0$, then the solution is given by $y = (A + Bx)e^{mx}$, where $A$ and $B$ are arbitrary constants and $m$ is the double root of the characteristic equation.
3. If $\Delta = b^2 - 4ac < 0$, then the solution is given by $y = (A\cos\beta x + B\sin\beta x)e^{\alpha x}$, where $A$ and $B$ are arbitrary constants, $\alpha = \dfrac{-b}{2a}$ and $\beta = \dfrac{\sqrt{-\Delta}}{2a}$, so that $\alpha \pm i\beta$ are the roots of the characteristic equation.
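The three cases can be summarised in a short routine that classifies the characteristic equation by the sign of $\Delta$. This is a minimal sketch; the function name `characteristic_roots` and the returned labels are illustrative choices, not notation from the notes.

```python
import math

def characteristic_roots(a, b, c):
    """Classify a m^2 + b m + c = 0 and return the numbers needed for the solution.

    Returns ("distinct", m1, m2), ("double", m, m) or ("complex", alpha, beta).
    """
    disc = b * b - 4 * a * c
    if disc > 0:
        root = math.sqrt(disc)
        return ("distinct", (-b + root) / (2 * a), (-b - root) / (2 * a))
    if disc == 0:
        m = -b / (2 * a)
        return ("double", m, m)
    alpha = -b / (2 * a)
    beta = math.sqrt(-disc) / (2 * a)
    return ("complex", alpha, beta)

print(characteristic_roots(1, 5, 6))   # y = A e^{-2x} + B e^{-3x}
print(characteristic_roots(1, 2, 1))   # y = (A + Bx) e^{-x}
print(characteristic_roots(1, 1, 1))   # y = e^{-x/2}(A cos(bx) + B sin(bx))
```

The exact comparison `disc == 0` is reliable for integer coefficients; with floating-point coefficients a small tolerance would be a safer choice.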
6.4.2 Solution of non-homogeneous second order DE with Constant Coefficients

Variation of parameters method
Let $y_1$ and $y_2$ be two linearly independent solutions of the homogeneous equation $ay'' + by' + cy = 0$.
We are interested in finding a particular solution of $ay'' + by' + cy = R(x)$.
We look for it in the form $y_p = u(x)y_1 + v(x)y_2$, where $u(x)$ and $v(x)$ are functions we need to find. To find these functions we proceed as follows:
\[ y_p' = u'(x)y_1 + v'(x)y_2 + u(x)y_1' + v(x)y_2' \]
We impose the condition
\[ u'(x)y_1 + v'(x)y_2 = 0 \tag{54} \]
then
\[ y_p' = u(x)y_1' + v(x)y_2' \]
\[ y_p'' = u'(x)y_1' + v'(x)y_2' + u(x)y_1'' + v(x)y_2'' \]
Putting $y_p$, $y_p'$ and $y_p''$ in $ay'' + by' + cy = R(x)$ we get the following:
\[ a\big(u'(x)y_1' + v'(x)y_2' + u(x)y_1'' + v(x)y_2''\big) + b\big(u(x)y_1' + v(x)y_2'\big) + c\big(u(x)y_1 + v(x)y_2\big) = R(x) \]
Since $ay_1'' + by_1' + cy_1 = 0$ and $ay_2'' + by_2' + cy_2 = 0$, the terms containing $u(x)$ and $v(x)$ cancel, leaving
\[ au'(x)y_1' + av'(x)y_2' = R(x) \]
\[ u'(x)y_1' + v'(x)y_2' = \frac{1}{a}R(x) \tag{55} \]
Putting (54) and (55) together we get the following system:
\[ u'(x)y_1 + v'(x)y_2 = 0 \tag{56} \]
\[ u'(x)y_1' + v'(x)y_2' = \frac{1}{a}R(x) \tag{57} \]
Let $w(y_1, y_2) = y_1y_2' - y_1'y_2$; then $u$ and $v$ are given by the following:
\[ u(x) = -\int\frac{y_2R(x)}{a\,w(y_1, y_2)}\,dx \tag{58} \]
\[ v(x) = \int\frac{y_1R(x)}{a\,w(y_1, y_2)}\,dx \tag{59} \]
Method of undetermined coefficients
To be specific, we shall consider only the following cases:
• $R(x) = A_0e^{rx}$
• $R(x) = A_0\cos wx$ or $R(x) = A_0\sin wx$
• $R(x) = P_n(x)$ where $P_n(x) = a_0 + a_1x + a_2x^2 + a_3x^3 + \dots + a_nx^n$ and $a_n \neq 0$.
• A mixture of the above cases shall also be discussed.
• The superposition principle will also be discussed.
Case 1: $R(x) = A_0e^{rx}$.
1. If $r$ is not a root of the characteristic equation $am^2 + bm + c = 0$, we set the particular solution to be $y_p = Ae^{rx}$ where $A = \dfrac{A_0}{ar^2 + br + c}$.
2. If $r$ is a single root of the characteristic equation $am^2 + bm + c = 0$, we set the particular solution to be $y_p = Axe^{rx}$ where $A = \dfrac{A_0}{2ar + b}$.
3. If $r$ is a double root of the characteristic equation $am^2 + bm + c = 0$, we set the particular solution to be $y_p = Ax^2e^{rx}$ where $A = \dfrac{A_0}{2a}$.
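The three sub-cases for an exponential right-hand side $A_0e^{rx}$ follow from substituting the trial solution into $ay'' + by' + cy = A_0e^{rx}$, and can be sketched as a small routine. The function name `exponential_particular` is an illustrative assumption, and the sketch expects integer inputs so that the arithmetic stays exact.

```python
from fractions import Fraction

def exponential_particular(a, b, c, r, A0):
    """Return (A, k) such that y_p = A * x**k * exp(r*x) solves
    a y'' + b y' + c y = A0 exp(r x), for integer a, b, c, r, A0."""
    char = a * r * r + b * r + c          # characteristic polynomial at r
    if char != 0:
        return Fraction(A0, char), 0      # r is not a root
    slope = 2 * a * r + b                 # derivative of the polynomial at r
    if slope != 0:
        return Fraction(A0, slope), 1     # r is a single root
    return Fraction(A0, 2 * a), 2         # r is a double root

print(exponential_particular(1, 1, 1, 2, 2))    # y'' + y' + y = 2 e^{2x}
print(exponential_particular(1, -3, 2, 1, 1))   # y'' - 3y' + 2y = e^{x}
print(exponential_particular(1, -2, 1, 1, 3))   # y'' - 2y' + y = 3 e^{x}
```

For instance, the first call reproduces $y_p = \frac{2}{7}e^{2x}$ for $y'' + y' + y = 2e^{2x}$.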
Case 2: $R(x) = A_0\cos wx$.
1. If $iw$ is not a root of the characteristic equation $am^2 + bm + c = 0$, we set the particular solution to be $y_p = A\cos wx + B\sin wx$, where $A$ and $B$ are constants to be found.
2. If $iw$ is a pure imaginary root of the characteristic equation $am^2 + bm + c = 0$, we set the particular solution to be $y_p = x(A\cos wx + B\sin wx)$, where $A$ and $B$ are constants to be found.
Case 3: $R(x) = P_n(x)$.
1. If the term $cy$ is present, we set the particular solution to be
\[ y_p = A_0 + A_1x + A_2x^2 + A_3x^3 + \dots + A_nx^n \]
where the $A_i$ are constants to be found.
2. If the term $cy$ is not present, we set the particular solution to be
\[ y_p = x(A_0 + A_1x + A_2x^2 + A_3x^3 + \dots + A_nx^n) \]
where the $A_i$ are constants to be found.

7 Introduction to PROBABILITY and STATISTICS

7.1 Introduction
Definition: Probability is a concept which numerically measures the degree of uncertainty, and therefore of certainty, of the occurrence of events.
Some terminologies
1. Random experiment: An experiment whose results may differ even when it is performed under the same conditions is called a random experiment. For example, tossing a coin or throwing a die is a random experiment.
2. Trial and event: Performing a random experiment is called a trial and its outcome is called an event. For instance, tossing a coin is a trial and turning up a head or a tail is an event.
3. Equally likely events: Two events are said to be equally likely if neither of them can be expected in preference to the other. For example, if we draw a card from a well shuffled pack, we may get any card; the 52 different cases are equally likely.
4. Compound events: When two or more events occur in combination with each other, their simultaneous occurrence is termed a compound event. For instance, when a die is thrown, getting 5 or 6 is a compound event.
5. Exhaustive events: The set of all possible outcomes of a single performance of a random experiment is the exhaustive event or sample space. Each outcome is called a sample point. For example, when tossing a coin once the sample space is $S = \{H, T\}$.
6. Independent events: Two events are independent when the actual happening of one of them does not influence in any way the chance (probability) of the happening of the other. For example, the event of getting a head on the first coin and the event of getting a tail on the second coin in a simultaneous throw of two coins are independent.
7. Mutually exclusive events: Two events are said to be mutually exclusive if the occurrence of one excludes the occurrence of the other. For instance, on tossing a coin we get either a head or a tail but not both.
8. Favourable events: The events that ensure the required happening are said to be favourable events. For instance, in throwing a die to obtain an even number, 2, 4 and 6 are the favourable ways.
9. Conditional probability: The probability of an event $A$ happening, given that an event $B$ has already happened, is called the conditional probability of $A$ given $B$. It is usually denoted by $\Pr(A/B)$.
10. Complementary events: Events are said to be complementary if they are mutually exclusive and they exhaust the entire sample space.
7.2 Classical Definition of Probability

If an event $A$ can happen in $m$ ways and fail in $n$ ways, all these ways being equally likely to occur, then the probability of $A$ happening is
\[ \Pr(A) = \frac{m}{m+n} \]
and that of failing is
\[ \Pr(\bar{A}) = \frac{n}{m+n}. \]
Thus
\[ \Pr(A) + \Pr(\bar{A}) = \frac{m}{m+n} + \frac{n}{m+n} = 1. \]
If there are $N$ equally likely, mutually exclusive and exhaustive outcomes of an experiment and $m$ of these are favourable, then the probability of the favourable event $A$ is defined by
\[ \Pr(A) = \frac{m}{N} = \frac{\text{Number of favourable ways}}{\text{Total number of equally likely ways}}. \]
Examples
1. Consider a random experiment of throwing an ordinary six faced die. Find the probability
of throwing
(a) 5,
(b) an even number.
2. Consider a random experiment of throwing two ordinary six faced dice. Find the proba-
bility of throwing a sum of 9.
3. In class of 12 students, 5 are boys and rest are girls. Find the probability that a student
selected randomly will be a girl.
Solutions
1. (a) There are six possible ways in which the die can fall and there is only one way of throwing a five. Thus the corresponding probability is
\[ \frac{\text{Number of favourable ways}}{\text{Total number of equally likely ways}} = \frac{1}{6}. \]
(b) There are six possible ways in which the die can fall and there are three possible ways of throwing an even number, i.e. 2, 4 and 6. Thus the corresponding probability is
\[ \frac{\text{Number of favourable ways}}{\text{Total number of equally likely ways}} = \frac{3}{6} = \frac{1}{2}. \]
2. There are 36 possible outcomes for the two dice, and 4 of them give a sum of 9, namely (3, 6), (4, 5), (5, 4) and (6, 3). See the following table. Thus the corresponding probability is
\[ \frac{\text{Number of favourable ways}}{\text{Total number of equally likely ways}} = \frac{4}{36} = \frac{1}{9}. \]
3. The total number of students is 12, and the number of girls is $12 - 5 = 7$. There are 12 possible ways in which a student can be chosen and 7 ways in which a girl can be chosen. Thus the corresponding probability is
\[ \frac{\text{Number of favourable ways}}{\text{Total number of equally likely ways}} = \frac{7}{12}. \]
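The classical definition translates directly into counting over an enumerated sample space. The sketch below reproduces the dice results above with exact fractions; the helper name `probability` is an illustrative assumption.

```python
from fractions import Fraction
from itertools import product

def probability(event, sample_space):
    """Favourable outcomes divided by all equally likely outcomes."""
    favourable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favourable, len(sample_space))

die = list(range(1, 7))                  # one six-faced die
two_dice = list(product(die, repeat=2))  # 36 ordered pairs

p_five = probability(lambda d: d == 5, die)
p_even = probability(lambda d: d % 2 == 0, die)
p_sum9 = probability(lambda pair: sum(pair) == 9, two_dice)
print(p_five, p_even, p_sum9)
```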
7.3 Laws of Probability

For any events $A$ and $B$ of a sample space $S$ the following hold:
1. $\Pr(A) \geq 0$ and $\Pr(A) \leq 1$, i.e. $0 \leq \Pr(A) \leq 1$.
2. If $A$ and $B$ are mutually exclusive then $\Pr(A \cup B) = \Pr(A) + \Pr(B)$, and if $A$ and $B$ are not mutually exclusive then $\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$.
3. $\Pr(S) = 1$
4. $\Pr(\bar{A}) = 1 - \Pr(A)$
Examples
1. An integer is chosen at random from the set $S = \{x : x \in \mathbb{Z}^+,\ x < 14\}$. Let $A$ be the event of choosing a multiple of three and let $B$ be the event of choosing an even number. Find the probability of $A \cup B$, $A \cap B$ and $A - B$.
Solution:
$S = \{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13\}$.
$A = \{3, 6, 9, 12\}$ and $B = \{2, 4, 6, 8, 10, 12\}$.
$A \cup B = \{2, 3, 4, 6, 8, 9, 10, 12\}$, $A \cap B = \{6, 12\}$ and $A - B = \{3, 9\}$.
Thus
\[ \Pr(A \cup B) = \frac{8}{13}, \quad \Pr(A \cap B) = \frac{2}{13}, \quad \Pr(A - B) = \frac{2}{13}. \]
2. A coin is weighted so that head is three times as likely to appear as tail. Find $\Pr(T)$ and $\Pr(H)$.
Solution: $\Pr(T) + \Pr(H) = 1$ but $3\Pr(T) = \Pr(H)$, therefore $\Pr(T) + 3\Pr(T) = 1$, hence $\Pr(T) = \frac{1}{4}$ and $\Pr(H) = 3\Pr(T) = \frac{3}{4}$.
3. If A and B are events with P r(A) = 1/8 and P r(B) = 1/12 and P r(A ∩ B) = 14 . Find
P r(A ∪ B).
1
Solution : P r(A ∪ B) = P r(A) + P r(B) − P r(A ∩ B) = 18 + 12 − 14 = 13
24
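The set computations of Example 1 (integers below 14, multiples of three and even numbers) can be reproduced with Python sets, which also lets the addition law be checked directly. The helper name `pr` is an illustrative assumption.

```python
from fractions import Fraction

S = set(range(1, 14))                  # positive integers below 14
A = {x for x in S if x % 3 == 0}       # multiples of three
B = {x for x in S if x % 2 == 0}       # even numbers

def pr(event):
    """Probability of an event when all elements of S are equally likely."""
    return Fraction(len(event), len(S))

print(pr(A | B), pr(A & B), pr(A - B), pr(B - A))
# Addition law for events that are not mutually exclusive.
print(pr(A | B) == pr(A) + pr(B) - pr(A & B))
```

Note that $A - B = \{3, 9\}$ while $B - A = \{2, 4, 8, 10\}$, so the two differences have different probabilities.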
7.4 Multiplication Laws of Probability

If $A$ and $B$ are independent events of a sample space $S$ then $\Pr(A \cap B) = \Pr(A) \times \Pr(B)$.
Proof:
Suppose that $A$ and $B$ are two independent events. Let $A$ happen in $M$ ways and fail in $N$ ways; then $\Pr(A) = \dfrac{M}{M+N}$. Let $B$ happen in $m$ ways and fail in $n$ ways; then $\Pr(B) = \dfrac{m}{m+n}$. We have to show that
\[ \Pr(A \cap B) = \frac{M}{M+N}\cdot\frac{m}{m+n}. \]
Consider the following table of combined outcomes:
Combined outcome            Number of ways
A happens, B happens        Mm
A happens, B fails          Mn
A fails, B happens          Nm
A fails, B fails            Nn
Total                       Mm + Mn + Nm + Nn = (M + N)(m + n)
We have
\[ \Pr(A \cap B) = \frac{Mm}{Mm + Mn + Nm + Nn} = \frac{M}{M+N}\cdot\frac{m}{m+n}. \]
Note that if $A$ and $B$ are independent events, then $\bar{A}$ and $\bar{B}$ are independent events too,
i.e. $\Pr(A \cap B) = \Pr(A)\Pr(B) \Rightarrow \Pr(\bar{A} \cap \bar{B}) = \Pr(\bar{A})\Pr(\bar{B})$.
The proof is simple. We know that
\[ \Pr(\bar{A} \cap \bar{B}) = \Pr(\overline{A \cup B}) = 1 - \Pr(A \cup B) = 1 - \Pr(A) - \Pr(B) + \Pr(A \cap B). \]
But $A$ and $B$ are independent, i.e. $\Pr(A \cap B) = \Pr(A)\Pr(B)$, and this implies that
\[ \Pr(\bar{A} \cap \bar{B}) = 1 - \Pr(A) - \Pr(B) + \Pr(A)\Pr(B) = (1 - \Pr(A))(1 - \Pr(B)) = \Pr(\bar{A})\Pr(\bar{B}). \]
EXAMPLE: An article manufactured by a company consists of two parts A and B. In the process of manufacture of part A, 9 out of 100 are likely to be defective; similarly, in the process of manufacture of part B, 5 out of 100 are likely to be defective. Calculate the probability that the assembled article will not be defective, assuming that the events of finding part A non-defective and part B non-defective are independent.
Solution
The probability that part A is defective is 9/100.
The probability that part B is defective is 5/100.
The probability that part A is not defective is $1 - 9/100 = 91/100$.
The probability that part B is not defective is $1 - 5/100 = 95/100$.
The probability that the assembled article will not be defective is $(91/100) \times (95/100) = 0.8645$, since those events are independent.
Example:
Consider a random experiment of throwing 2 dice at the same time and recording the results in the form $(d_1, d_2)$. Let $A$ be the event that a six appears in the first place, and let $B$ be the event that a six appears in the second place.
1. Calculate the probability of the event $A$.
2. Calculate the probability of the event $B$.
3. Calculate the probability of the compound event $A \cap B$.
4. Calculate the probability of the compound event $A \cup B$.
5. Determine whether the events $A$ and $B$ are independent or not.
The cardinality of the sample space is 36.
$A = \{(6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)\}$,
$B = \{(1, 6), (2, 6), (3, 6), (4, 6), (5, 6), (6, 6)\}$,
$A \cap B = \{(6, 6)\}$,
$A \cup B = \{(6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6), (1, 6), (2, 6), (3, 6), (4, 6), (5, 6)\}$.
1. $\Pr(A) = \frac{6}{36} = \frac{1}{6}$,
2. $\Pr(B) = \frac{6}{36} = \frac{1}{6}$,
3. $\Pr(A \cap B) = \frac{1}{36}$,
4. $\Pr(A \cup B) = \frac{11}{36}$,
5. $\Pr(A)\Pr(B) = \frac{1}{36} = \Pr(A \cap B)$.
Hence A and B are independent events.
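The multiplication law can be tested by brute force on the 36 outcomes of the two dice. The following is a small sketch; the names `space`, `pr` and `independent` are illustrative.

```python
from fractions import Fraction
from itertools import product

space = list(product(range(1, 7), repeat=2))   # all pairs (d1, d2)
A = {o for o in space if o[0] == 6}            # six in the first place
B = {o for o in space if o[1] == 6}            # six in the second place

def pr(event):
    return Fraction(len(event), len(space))

# A and B are independent exactly when Pr(A and B) = Pr(A) * Pr(B).
independent = pr(A & B) == pr(A) * pr(B)
print(pr(A), pr(B), pr(A & B), pr(A | B), independent)
```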
Example:
A factory runs two machines A and B. Machine A operates 80% of the time while Machine B
operates 60% of the time. And at least one machine operates for 92% of the time. Do these
machines operate independently?
Solution :
P r(A) = 0.8
P r(B) = 0.6
P r(A ∪ B) = 0.92
P r(A ∪ B) = P r(A) + P r(B) − P r(A ∩ B)
0.92 = 0.8 + 0.6 − P r(A ∩ B)
P r(A ∩ B) = 0.48
$\Pr(A)\Pr(B) = 0.8 \times 0.6 = 0.48 = \Pr(A \cap B)$. Hence they operate independently.
7.5 Conditional probability

Let $A$ and $B$ be two events of a sample space $S$ and let $\Pr(B) \neq 0$. Then the conditional probability of the event $A$ given that $B$ has happened is denoted by $\Pr(A/B)$ and is defined by
\[ \Pr(A/B) = \frac{\Pr(A \cap B)}{\Pr(B)}. \]
Therefore if $A$ and $B$ are independent, then $\Pr(A/B) = \Pr(A)$.

EXAMPLE:
A coin is tossed twice in succession. Let A be the event that the first toss is head and Let B
be the event that the second toss is head. Find the following probabilities:
1. P r(A)
2. P r(B)
3. P r(A ∪ B)
4. P r(B/A)
5. P r(A/B)
Solution
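1. $\Pr(A) = \frac{1}{2}$
2. $\Pr(B) = \frac{1}{2}$
3. $\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B) = \frac{1}{2} + \frac{1}{2} - \frac{1}{4} = \frac{3}{4}$, since $\Pr(A \cap B) = \frac{1}{4}$ (both tosses are heads)
4. $\Pr(B/A) = \dfrac{\Pr(A \cap B)}{\Pr(A)} = \dfrac{1/4}{1/2} = \dfrac{1}{2}$
5. $\Pr(A/B) = \dfrac{\Pr(A \cap B)}{\Pr(B)} = \dfrac{1/4}{1/2} = \dfrac{1}{2}$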
7.6 Bayes’s Theorem

If $B_1, B_2, B_3, \dots, B_n$ are mutually exclusive and exhaustive events of a random experiment with $\Pr(B_i) \neq 0$ for $i = 1, 2, \dots, n$, then for any event $A$ of the sample space $S$ of the experiment with $\Pr(A) > 0$ we have
\[ \Pr(B_r/A) = \frac{\Pr(B_r)\Pr(A/B_r)}{\sum_{i=1}^{n}\Pr(B_i)\Pr(A/B_i)}. \]
EXAMPLE
Machines A and B produce 60% and 40% respectively of the total output of a factory. Of the parts produced by machine A, 3% are defective, and of the parts produced by machine B, 5% are defective. A part is selected at random from a day's production and found to be defective. What is the probability that it comes from machine A?
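Applying Bayes's formula to this example amounts to one weighted ratio. The sketch below evaluates it with exact fractions; the function name `bayes` and the list layout are illustrative assumptions rather than notation from the notes.

```python
from fractions import Fraction

def bayes(priors, likelihoods, r):
    """Pr(B_r / A) from the priors Pr(B_i) and the likelihoods Pr(A / B_i)."""
    total = sum(p * l for p, l in zip(priors, likelihoods))
    return priors[r] * likelihoods[r] / total

priors = [Fraction(60, 100), Fraction(40, 100)]      # share of output: A, B
likelihoods = [Fraction(3, 100), Fraction(5, 100)]   # defect rate: A, B

p_from_A = bayes(priors, likelihoods, 0)
print(p_from_A, float(p_from_A))
```

With these figures the defective part comes from machine A with probability $\frac{0.6 \times 0.03}{0.6 \times 0.03 + 0.4 \times 0.05} = \frac{9}{19} \approx 0.474$.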
7.7 Binomial Probability

Binomial Experiment A binomial experiment is an experiment which satisfies these four
conditions
1. A fixed number of trials
2. Each trial is independent of the others
3. There are only two outcomes
4. The probability of each outcome remains constant from trial to trial.

These can be summarized as: an experiment with a fixed number of independent trials, each of which can only have two possible outcomes. The fact that each trial is independent actually means that the probabilities remain constant.
Examples of binomial experiments
1. Tossing a coin 20 times to see how many tails occur.
2. Asking 200 people if they watch ABC news.
3. Rolling a die to see if a 5 appears.

Examples which aren’t binomial experiments
1. Rolling a die until a 6 appears (not a fixed number of trials)
2. Asking 20 people how old they are (not two outcomes)
3. Drawing 5 cards from a deck for a poker hand (done without replacement, so not inde-
pendent)
Example:
What is the probability of rolling exactly two sixes in 6 rolls of a die?
SOLUTION
There are five things you need to do to work a binomial story problem.
1. Define Success first. Success must be for a single trial. Success = "Rolling a 6 on a single die"
2. Define the probability of success (p): p = 1/6
3. Find the probability of failure: q = 5/6
4. Define the number of trials: n = 6
5. Define the number of successes out of those trials: x = 2

Any time a six appears, it is a success (denoted S) and any time something else appears, it is a failure (denoted F). The ways you can get exactly 2 successes in 6 trials are given below. The probability of each is written to the right of the way it could occur. Because the trials are independent, the probability of the event (all six dice) is the product of the probabilities of the individual outcomes (each die).
Notice that each of the 15 probabilities is exactly the same, namely $(1/6)^2 \times (5/6)^4$. Also, note that 1/6 is the probability of success and you needed 2 successes. 5/6 is the probability of failure, and if 2 of the 6 trials were successes, then 4 of the 6 must be failures. Note that 2 is the value of $r$ and 4 is the value of $n - r$. Further note that there are fifteen ways this can occur. This is the number of ways 2 successes can occur in 6 trials without repetition and with order not being important, i.e. a combination of 6 things taken 2 at a time. Thus the required probability is $15 \times (1/6)^2 \times (5/6)^4 \approx 0.2009$.
The probability of getting exactly $r$ successes in $n$ trials, with the probability of success on a single trial being $p$, is:
\[ P(X = r) = \binom{n}{r} \times p^r \times q^{n-r} \]
with
\[ \binom{n}{r} = \frac{n!}{r!(n - r)!} \quad\text{and}\quad p + q = 1. \]
Example 1:
A coin is tossed 10 times. What is the probability that exactly 6 heads will occur?
SOLUTION:
Success = "A head is flipped on a single coin", $p = 0.5$, $q = 0.5$, $n = 10$ and $r = 6$.
$P(X = 6) = \binom{10}{6} \times 0.5^6 \times 0.5^4 = 210 \times 0.015625 \times 0.0625 = 0.205078125$.
Example2:
Suppose a biased coin comes up heads with probability 0.3 when tossed. What is the proba-
bility of achieving 0, 1,..., 6 heads after six tosses?
Solution:
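The requested probabilities follow from the binomial formula with $n = 6$ and $p = 0.3$. A short sketch that tabulates them is given below; the function name `binomial_pmf` is an illustrative assumption, and `math.comb` requires Python 3.8 or later.

```python
from math import comb

def binomial_pmf(n, r, p):
    """Probability of exactly r successes in n independent trials."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

# Biased coin: probability of heads 0.3, six tosses.
table = [binomial_pmf(6, r, 0.3) for r in range(7)]
for r, prob in enumerate(table):
    print(r, round(prob, 6))
```

Rounded to six decimals the probabilities of 0, 1, ..., 6 heads are 0.117649, 0.302526, 0.324135, 0.18522, 0.059535, 0.010206 and 0.000729, and they add up to 1.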
8 Probability distribution
8.1 Introduction
To define probability distributions for the simplest cases, one needs to distinguish between dis-
crete and continuous random variables. In the discrete case, one can easily assign a probability
to each possible value: for example, when throwing a fair die, each of the six values 1 to 6 has
the probability 1/6. In contrast, when a random variable takes values from a continuum then,
typically, probabilities can be non-zero only if they refer to intervals: in quality control one
might demand that the probability of a ”500 g” package containing between 490 g and 510 g
should be no less than 98%.
8.2 Discrete Random Variables

A probability distribution is a table of values that shows the probabilities of various outcomes of an experiment. For example, if a coin is tossed three times, the number of heads obtained can be 0, 1, 2 or 3. The probabilities of each of these possibilities can be tabulated as shown:
Number of heads    0      1      2      3
Probability        1/8    3/8    3/8    1/8
A discrete variable is a variable which can only take a countable number of values. In this example, the number of heads can only take 4 values (0, 1, 2, 3) and so the variable is discrete. The variable is said to be random if the sum of the probabilities is one.
8.3 Probability Density Function

The probability density function (p.d.f.) of a random variable $X$ (or probability mass function) is a function which allocates probabilities. Put simply, it is a function which tells you the probability of certain events occurring. The usual notation is $P(X = x) =$ something. The random variable (r.v.) $X$ is the quantity that we are considering. So in the above example, $X$ represents the number of heads that we throw, and $P(X = 0)$ means "the probability that no heads are thrown". Here, $P(X = 0) = 1/8$ (the probability that we throw no heads is 1/8). In the above example, we could therefore write:
\[ P(X = 0) = \frac{1}{8}, \quad P(X = 1) = \frac{3}{8}, \quad P(X = 2) = \frac{3}{8}, \quad P(X = 3) = \frac{1}{8}. \]
Quite often, the probability density function will be given to you in terms of $x$. In the above example, $P(X = x) = \binom{3}{x}\big/2^3 = \dfrac{3!}{x!(3 - x)!\,8}$ (see permutations and combinations for the meaning of $\binom{3}{x}$).
Example:
A die is thrown repeatedly until a 6 is obtained. Find the probability density function for the number of times we throw the die.
Solution:
Let $X$ be the random variable representing the number of times we throw the die.
$P(X = 1) = 1/6$ (if we only throw the die once, we get a 6 on our first throw; its probability is 1/6). $P(X = 2) = \frac{5}{6}\cdot\frac{1}{6}$ (if we throw the die twice to get a 6, we must throw something that isn't a 6 with our first throw, the probability of which is 5/6, and we must throw a 6 on our second throw, the probability of which is 1/6), etc. In general $P(X = x) = \left(\frac{5}{6}\right)^{x-1}\left(\frac{1}{6}\right)$.
8.4 Cumulative Distribution Function

The cumulative distribution function (c.d.f.) of a discrete random variable $X$ is the function $F(t)$ which tells you the probability that $X$ is less than or equal to $t$. So if $X$ has p.d.f. $P(X = x)$, we have $F(t) = P(X \leq t) = \sum_{x \leq t} P(X = x)$. In other words, for each value that $X$ can be which is less than or equal to $t$, work out the probability that $X$ is that value and add up all such results.
Example
In the above example where the die is thrown repeatedly, let us work out $P(X \leq t)$ for some values of $t$. $P(X \leq 1)$ is the probability that the number of throws until we get a 6 is less than or equal to 1.
\[ P(X \leq 1) = P(X = 0) + P(X = 1) = 0 + \frac{1}{6} = \frac{1}{6}, \]
\[ P(X \leq 2) = P(X = 0) + P(X = 1) + P(X = 2) = 0 + \frac{1}{6} + \frac{5}{36} = \frac{11}{36}. \]
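The p.d.f. $P(X = x) = (5/6)^{x-1}(1/6)$ and its cumulative sums can be tabulated with exact fractions. In the sketch below the names `pmf` and `cdf` are illustrative; the closed form $1 - (5/6)^t$ used for comparison is the standard sum of the geometric series and is not derived in the notes.

```python
from fractions import Fraction

P_SIX = Fraction(1, 6)

def pmf(x):
    """P(X = x): the first 6 appears on throw number x."""
    return (1 - P_SIX) ** (x - 1) * P_SIX if x >= 1 else Fraction(0)

def cdf(t):
    """F(t) = P(X <= t), obtained by adding up the p.d.f."""
    return sum((pmf(x) for x in range(1, t + 1)), Fraction(0))

for t in range(1, 5):
    # The last column is the closed form 1 - (5/6)^t.
    print(t, pmf(t), cdf(t), 1 - Fraction(5, 6) ** t)
```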
8.5 Expectation and Variance

The expected value (or mean) of a discrete random variable X, is a weighted average of the
possible values that X can take, each value being weighted according to its probability of
occurring. The expected value of X is usually written as E(X) or µ. It is calculated as follows:
\[ E(X) = \sum_{x} x\,P(X = x) \]
So the expected value is the sum of: [(each of the possible outcomes) × (the
probability of the outcome occurring)].
In more concrete terms, the expectation is what you would expect the outcome of an experiment
to be on average.
Example
What is the expected value when we roll a fair die?
Solution
There are six possible outcomes: 1, 2, 3, 4, 5, 6. Each of these has a probability of 1/6 of
occurring. Let X represent the outcome of the experiment.
Therefore
$$P(X = 1) = P(X = 2) = P(X = 3) = P(X = 4) = P(X = 5) = P(X = 6) = \frac{1}{6}.$$
For instance, $P(X = 6) = \frac{1}{6}$ means that the probability that the outcome of the experiment is
6 is 1/6. Thus the expected value is calculated as follows:
$$E(X) = 1 \times \frac{1}{6} + 2 \times \frac{1}{6} + 3 \times \frac{1}{6} + 4 \times \frac{1}{6} + 5 \times \frac{1}{6} + 6 \times \frac{1}{6}$$
$$E(X) = \frac{1}{6}\,[1 + 2 + 3 + 4 + 5 + 6] = \frac{7}{2}.$$
So the expectation is 3.5. If you think about it, 3.5 is halfway between the possible values the
die can take and so this is what you should have expected.
8.5.1 Expected Value of a Function of X

To find E[f(X)], where f(X) is a function of X, use the following formula:
$$E[f(X)] = \sum_{x} f(x)\,P(X = x)$$
Example
For the above experiment (with the die), calculate $E(X^2)$. Using our notation above, $f(x) = x^2$, so
$$E(X^2) = 1 \times \frac{1}{6} + 4 \times \frac{1}{6} + 9 \times \frac{1}{6} + 16 \times \frac{1}{6} + 25 \times \frac{1}{6} + 36 \times \frac{1}{6}$$
$$E(X^2) = \frac{1}{6}\,[1 + 4 + 9 + 16 + 25 + 36] = \frac{91}{6}.$$
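As a check on the two die calculations, here is a small Python sketch of the formula for E[f(X)]. The names die and expectation are illustrative assumptions, not notation from these notes.

```python
from fractions import Fraction

# Fair die: each face 1..6 has probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation(dist, f=lambda x: x):
    """E[f(X)] = sum over x of f(x) * P(X = x)."""
    return sum(f(x) * p for x, p in dist.items())

print(expectation(die))                    # 7/2
print(expectation(die, lambda x: x ** 2))  # 91/6
```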
Some useful properties of Expected values
The expected value of a constant is just the constant, so for example E(1) = 1.
Multiplying a random variable by a constant multiplies the expected value by that constant,
so E[2X] = 2E[X]. Consider the cases where a and b are constants, then
E[aX + b] = aE[X] + b,
E[X + Y ] = E(X) + E(Y ),
E[X − Y ] = E[X] − E[Y ].
8.5.2 Variance
The variance of a random variable tells us something about the spread of the possible values
of the variable. For a discrete random variable X, the variance of X is written as Var(X).
Variance is calculated as follows:
$$\mathrm{Var}(X) = E[(X - \mu)^2].$$
This can also be written as follows:
$$\mathrm{Var}(X) = E[X^2] - \mu^2 = E[X^2] - (E[X])^2.$$
The standard deviation of X is the square root of Var(X). Note that the variance does not behave
in the same way as expectation when we multiply and add constants to random variables. In
fact, for constants a and b,
$$\mathrm{Var}[aX + b] = a^2\,\mathrm{Var}(X),$$
and, when X and Y are independent,
$$\mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y],$$
$$\mathrm{Var}[X - Y] = \mathrm{Var}[X] + \mathrm{Var}[Y].$$
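To illustrate these rules, the sketch below computes the variance of a fair die and of 2X + 3. The constants a = 2 and b = 3 and the helper names are chosen only for illustration.

```python
from fractions import Fraction

die = {x: Fraction(1, 6) for x in range(1, 7)}

def mean(dist):
    """E[X] for a distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var(X) = E[X^2] - (E[X])^2."""
    return sum(x * x * p for x, p in dist.items()) - mean(dist) ** 2

# Distribution of 2X + 3 (illustrative constants a = 2, b = 3).
shifted = {2 * x + 3: p for x, p in die.items()}

print(variance(die))      # 35/12
print(variance(shifted))  # 35/3, which is 2**2 times Var(X)
```

Adding b = 3 moves every value by the same amount and leaves the spread unchanged, which is why only the factor a appears in the result.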
8.6 The Binomial Distribution

If a discrete random variable X has the following probability density function (p.d.f.), it is said
to have a binomial distribution:
$$P(X = x) = \binom{n}{x}\,p^x q^{n-x}, \quad x = 0, 1, \dots, n,$$
where q = 1 − p, p is the probability of a success, and q the probability of a failure.
$\binom{n}{x}$ is the number of ways of choosing x objects from a collection of n objects (see
permutations and combinations). If a random variable X has a binomial distribution, we write
X ∼ B(n, p) (∼ means "has distribution"). n and p are known as the parameters of the
distribution (n can be any integer greater than 0 and p can be any number between 0 and 1). All
random variables with a binomial distribution have the above p.d.f., but may have different
parameters (different values for n and p).
Example
A coin is thrown 10 times. Find the probability density function for X, where X is the
random variable representing the number of heads obtained. The probability of throwing a head
is 0.5 and the probability of throwing a tail is 0.5. Therefore, the probability of throwing 8
tails is $0.5^8$. If we throw 2 heads and 8 tails, we could have thrown them HTTTTTHTTT, or
TTHTHTTTTT, or in a number of other ways. In fact, the total number of ways of throwing
2 heads and 8 tails is $\binom{10}{2}$ (see the permutations and combinations section). Hence the
probability of throwing 2 heads and 8 tails is $\binom{10}{2}(0.5)^2 (0.5)^8$. As you can see, this has a binomial
distribution, where n = 10, p = 0.5. You can see, therefore, that the p.d.f. is going to be:
$$P(X = x) = \binom{10}{x}\,(0.5)^{10-x}\,(0.5)^{x}.$$
From this, we can work out the probability of throwing, for example, 3 heads (put x = 3).
8.6.1 Expectation and Variance

If X ∼ B(n, p), then the expectation and variance are given by E(X) = np and Var(X) = npq
respectively.
Example
In the above example, what is the expected number of heads thrown?
E(X) = np
Now in the above example,
p = probability of throwing a head = 0.5,
n = number of throws = 10.
Hence expected number of heads = 5. This is what you would expect: if you throw a coin 10
times you would expect 5 heads and 5 tails on average.
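The coin example can be checked with a few lines of Python. This is only a sketch; binomial_pmf is an illustrative name, and math.comb (Python 3.8 or later) supplies the number of combinations.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.5
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

print(binomial_pmf(3, n, p))                    # 0.1171875 (probability of 3 heads)
print(sum(probs))                               # 1.0 (the probabilities add up to 1)
print(sum(x * q for x, q in enumerate(probs)))  # 5.0 (agrees with E(X) = np)
```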
8.6.2 The Poisson Distribution

A discrete random variable X with a probability density function (p.d.f.) of the form
$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad x = 0, 1, 2, \dots$$
is said to have a Poisson distribution with parameter λ. We write X ∼ Po(λ).
Expectation and Variance
If X ∼ Po(λ), then:
E(X) = λ,
Var(X) = λ.
Random Events
1. The Poisson distribution is useful because many random events follow it.
2. If a random event occurs independently and at a constant average rate, with a mean
number of occurrences λ in a given time period, then the number of occurrences within
that time period will follow a Poisson distribution.
3. For example, the occurrence of earthquakes could be considered to be a random event. If
there are on average 5 major earthquakes each year, then the number of earthquakes in
any given year will have a Poisson distribution with parameter 5.
Example
There are 50 misprints in a book which has 250 pages. Find the probability that page 100 has
no misprints.
The average number of misprints on a page is 50/250 = 0.2. Therefore, if we let X be the random
variable denoting the number of misprints on a page, X will follow a Poisson distribution with
parameter λ = 0.2.
$$P(X = 0) = \frac{e^{-0.2}\,(0.2)^{0}}{0!} = e^{-0.2} \approx 0.819$$
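The misprint calculation can be reproduced as follows. This is a minimal sketch; poisson_pmf is an illustrative helper name.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam ** x / factorial(x)

lam = 50 / 250  # average number of misprints per page
print(round(poisson_pmf(0, lam), 3))  # 0.819
```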
8.6.3 Binomial Approximation

The Poisson distribution can be used as an approximation to the binomial distribution.
A Binomial distribution with parameters n and p can be approximated by a Poisson distribution
with parameter np.
In this section we show you how the probability model of a binomial model can be translated
across to a Poisson process. It teaches us a few things:
1. The memoryless property of a binomial process carries across to a Poisson process;
2. The Poisson process is often a good approximation to the binomial process (when n is
large and p is small); and therefore
3. The various distributions of the Poisson process are often good approximations to their
corresponding binomial process distributions.
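To see how close the approximation can be, the sketch below compares B(n, p) with Po(np) term by term. The values n = 100 and p = 0.02 are assumptions chosen only to illustrate a large n with a small p.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam ** x / factorial(x)

n, p = 100, 0.02  # illustrative values: n large, p small
for x in range(6):
    b = binomial_pmf(x, n, p)
    q = poisson_pmf(x, n * p)
    print(f"x={x}  binomial={b:.4f}  poisson={q:.4f}  diff={abs(b - q):.4f}")
```

Repeating the comparison with a larger p (for example p = 0.5) shows the two distributions drifting apart, which is why the approximation is normally reserved for rare events.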
8.6.4 Derivation of the Poisson distribution

We’ll start with an example application. Imagine that I am about to drink some water from a
large vat, and that randomly distributed in that vat are bacteria. The larger the quantity of
water I drink, the more risk I take of consuming bacteria, and the larger the expected number
of bacteria I would have consumed. We could have a go at modeling this as a binomial process.
A trial could be a small amount of water, say 1 ml. A success would be that there is at
least one bacterium in that ml of water. If the concentration in the vat is much less than 1
bacterium/ml, then the probability of having a second bacterium in a contaminated ml is small.
Then the number of trials n is the number of ml of water I drink, and the probability of success
is roughly the concentration/ml. I could make the binomial model increasingly more accurate
by having smaller units of water in a trial: 0.1 ml, 0.01 ml, etc. The problem is that for this
model to work, I must always test whether the probability of success for a trial is sufficiently
low that I have no real chance of a second bacterium in the one unit (that would be bad because
the second bacterium wouldn't be accounted for in the 0/1 regime of a binomial process). What
we need to do is make the number of trials approach infinity, so that the probability of
success approaches zero, but keep the same expected level of risk. This is exactly what Siméon
Denis Poisson did.
8.6.5 Some mathematics

Consider a binomial process where the number of trials tends to infinity, and the probability of
success at the same time tends to zero, with the constraint that the mean of the binomial
distribution np remains finite. The probability mass function of the binomial distribution
is:
$$P(x) = \frac{n!}{x!\,(n - x)!}\,p^{x}\,(1 - p)^{n-x}$$
So, in the example above, x would be the number of bacteria I consume in n units of water,
and p is the probability that a random unit of water contains a bacterium. We’ll replace p with
the Poisson intensity λ = bacteria/ml, and the number of trials n with the amount of water
consumed t ml.
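$$\lambda = np$$
Putting $p = \lambda/n$ into the above equation gives:
$$P(x) = \frac{n!}{x!\,(n - x)!}\left(\frac{\lambda}{n}\right)^{x}\left(1 - \frac{\lambda}{n}\right)^{n-x}$$
$$P(x) = \frac{n(n - 1)(n - 2)(n - 3)\cdots(n - x + 1)}{x!\,n^{x}}\,\lambda^{x}\left(1 - \frac{\lambda}{n}\right)^{n-x}$$
For n big and p small, we have
$$\frac{n(n - 1)(n - 2)(n - 3)\cdots(n - x + 1)}{n^{x}} \longrightarrow 1, \qquad \left(1 - \frac{\lambda}{n}\right)^{n-x} \longrightarrow e^{-\lambda},$$
which simplifies to:
$$P(x) = \frac{\lambda^{x}}{x!}\,e^{-\lambda}.$$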
This is the probability mass function for the Poisson distribution.
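The two limits used in the derivation can be observed numerically. In the sketch below, λ = 2 and x = 3 are illustrative values, and falling_ratio is a hypothetical helper name for the first quotient.

```python
from math import exp

lam, x = 2.0, 3  # illustrative values

def falling_ratio(n, x):
    """n(n-1)...(n-x+1) / n^x, computed as a product of x factors."""
    r = 1.0
    for k in range(x):
        r *= (n - k) / n
    return r

# As n grows, the first column tends to 1 and the second to exp(-lam).
for n in (10, 100, 10_000):
    print(n, falling_ratio(n, x), (1 - lam / n) ** (n - x), exp(-lam))
```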
9 Introduction to Statistics
Statistics is the science that deals with the collection, analysis and interpretation of numer-
ical information. This science is divided into two areas: descriptive statistics and inferential
statistics. The theoretical base of the science of statistics is a field within mathematics called
mathematical statistics. Here, statistics is presented as an abstract, tightly integrated structure
of axioms, theorems, and rigorous proofs, involving many other areas of mathematics such as
calculus, probability theory, and higher algebra. To make this theoretical structure available to
the non-mathematician, an interpretative discipline has been developed called general statis-
tics in which the presentation is greatly simplified and often non-mathematical. From this
simplified version, each specialized field (e.g., agriculture, anthropology, biology, economics,
and engineering) takes material that is appropriate for its own numerical data.
Definition 1: Statistics is concerned with the collection, ordering and analysis of data.
Definition 2: Statistical data consist of a set of recorded observations or values of some
variables.
Definition 3: A data variable is any quantity that can have a number of values. It can be
discrete or continuous.
A statistical exercise normally consists of four stages:
1. collection of data by counting, measuring or recording,
2. arrangement of data, i.e. ordering and presenting the data in a convenient form,
3. analysis of the collected data,
4. interpretation of the results and formulation of related conclusions.
Example: The ages of students in the department of Food Processing and Agriculture were
recorded:
23 25 27 21 24 24 23 26 22 25 23 25 25 26 25 25 22 27 23 23 25 26 25 23 25 24 25 27 23 26 22
24 22 25 25 22 24 24 26 24 26 24 25 23 25 23 24 24 22 25 22 26 24. We can order these values,
analyse them and then draw a useful conclusion.
9.1 Arrangement of data

We can appreciate the set of numbers above better if we arrange them in ascending
order. But after ordering them we still have two lines of numbers, and some values occur more
than once. Therefore we can form a table showing how many times each value occurs:
Ages (A_i)    Number of times (f_i)
21            1
22            7
23            9
24            11
25            15
26            7
27            3
Total         53
Here f_i is the frequency of the i-th observation A_i. The table above summarizes the
information in our data and is easier to work with because it is a simplified version of the
original data.
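A frequency table like the one above can be built by counting. The sketch below uses Python's collections.Counter and assumes that the second recorded age reads 25, so that the raw list agrees with the table (which has no entry for 28).

```python
from collections import Counter

# Raw ages; the second value is taken as 25 (an assumption made so that
# the list matches the frequency table).
ages = [
    23, 25, 27, 21, 24, 24, 23, 26, 22, 25, 23, 25, 25, 26, 25, 25, 22, 27,
    23, 23, 25, 26, 25, 23, 25, 24, 25, 27, 23, 26, 22, 24, 22, 25, 25, 22,
    24, 24, 26, 24, 26, 24, 25, 23, 25, 23, 24, 24, 22, 25, 22, 26, 24,
]

freq = Counter(ages)  # maps each age to the number of times it occurs
for age in sorted(freq):
    print(age, freq[age])
print("Total", sum(freq.values()))
```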
9.2 Types of Variable

All experiments examine some kind of variable(s). A variable is not only something that we
measure, but also something that we can manipulate and something we can control for. To
understand the characteristics of variables and how we use them in research, this guide is divided
into three main sections. First, we illustrate the role of dependent and independent variables.
Second, we discuss the difference between experimental and non-experimental research. Finally,
we explain how variables can be characterised as either categorical or continuous.
9.2.1 Dependent and Independent Variables

An independent variable, sometimes called an experimental or predictor variable, is a variable
that is being manipulated in an experiment in order to observe the effect on a dependent vari-
able, sometimes called an outcome variable.
Imagine that a tutor asks 100 students to complete a maths test. The tutor wants to know
why some students perform better than others. Whilst the tutor does not know the answer to
this, she thinks that it might be because of two reasons: (1) some students spend more time
revising for their test; and (2) some students are naturally more intelligent than others. As
such, the tutor decides to investigate the effect of revision time and intelligence on the test
performance of the 100 students. The dependent and independent variables for the study are:
Dependent Variable: Test Mark (measured from 0 to 100)
Independent Variables: Revision time (measured in hours) and Intelligence (measured using IQ
score)
The dependent variable is simply that, a variable that is dependent on an independent vari-
able(s). For example, in our case the test mark that a student achieves is dependent on revision
time and intelligence. Whilst revision time and intelligence (the independent variables) may (or
may not) cause a change in the test mark (the dependent variable), the reverse is implausible;
in other words, whilst the number of hours a student spends revising and the higher a student’s
IQ score may (or may not) change the test mark that a student achieves, a change in a student’s
test mark has no bearing on whether a student revises more or is more intelligent (this simply
doesn’t make sense).
Therefore, the aim of the tutor’s investigation is to examine whether these independent vari-
ables - revision time and IQ - result in a change in the dependent variable, the students’ test
scores. However, it is also worth noting that whilst this is the main aim of the experiment, the
tutor may also be interested to know if the independent variables - revision time and IQ - are
also connected in some way.
In the section on experimental and non-experimental research that follows, we find out a little
more about the nature of independent and dependent variables.
9.3 Experimental and Non-Experimental Research

• Experimental research: In experimental research, the aim is to manipulate an inde-
pendent variable(s) and then examine the effect that this change has on a dependent
variable(s). Since it is possible to manipulate the independent variable(s), experimental
research has the advantage of enabling a researcher to identify a cause and effect between
variables. For example, take our example of 100 students completing a maths exam where
the dependent variable was the exam mark (measured from 0 to 100), and the indepen-
dent variables were revision time (measured in hours) and intelligence (measured using
IQ score). Here, it would be possible to use an experimental design and manipulate the
revision time of the students. The tutor could divide the students into two groups, each
made up of 50 students. In ”group one”, the tutor could ask the students not to do
any revision. Alternately, ”group two” could be asked to do 20 hours of revision in the
two weeks prior to the test. The tutor could then compare the marks that the students
achieved.
• Non-experimental research: In non-experimental research, the researcher does not
manipulate the independent variable(s). This is not to say that it is impossible to do so,
but it will either be impractical or unethical to do so. For example, a researcher may be
interested in the effect of illegal, recreational drug use (the independent variable(s)) on
certain types of behaviour (the dependent variable(s)). However, whilst possible, it would
be unethical to ask individuals to take illegal drugs in order to study what effect this had
on certain behaviours. As such, a researcher could ask both drug and non-drug users to
complete a questionnaire that had been constructed to indicate the extent to which they
exhibited certain behaviours. Whilst it is not possible to identify the cause and effect
between the variables, we can still examine the association or relationship between them.
In addition to understanding the difference between dependent and independent variables,
and experimental and non-experimental research, it is also important to understand the
different characteristics amongst variables. This is discussed next.
9.3.1 Categorical Variables

Categorical variables are also known as discrete or qualitative variables. Categorical variables
can be further categorized as either nominal, ordinal or dichotomous.
• Nominal variables are variables that have two or more categories, but which do not have
an intrinsic order. For example, a real estate agent could classify their types of property
into distinct categories such as houses, condos, co-ops or bungalows. So ”type of property”
is a nominal variable with 4 categories called houses, condos, co-ops and bungalows. Of
note, the different categories of a nominal variable can also be referred to as groups or
levels of the nominal variable. Another example of a nominal variable would be classifying
where people live in Rwanda by province. In this case the nominal variable has 5 levels.
• Dichotomous variables are nominal variables which have only two categories or levels.
For example, if we were looking at gender, we would most probably categorize somebody
as either ”male” or ”female”. This is an example of a dichotomous variable (and also a
nominal variable). Another example might be if we asked a person if they owned a mobile
phone. Here, we may categorise mobile phone ownership as either ”Yes” or ”No”. In the
real estate agent example, if type of property had been classified as either residential or
commercial then ”type of property” would be a dichotomous variable.
• Ordinal variables are variables that have two or more categories just like nominal variables
only the categories can also be ordered or ranked. So if you asked someone if they liked
the policies of the Democratic Party and they could answer either ”Not very much”,
”They are OK” or ”Yes, a lot” then you have an ordinal variable. Why? Because you
have 3 categories, namely ”Not very much”, ”They are OK” and ”Yes, a lot” and you can
rank them from the most positive (Yes, a lot), to the middle response (They are OK), to
the least positive (Not very much). However, whilst we can rank the levels, we cannot
place a ”value” to them; we cannot say that ”They are OK” is twice as positive as ”Not
very much” for example.
9.3.2 Continuous variables

Continuous variables are also known as quantitative variables. Continuous variables can be
further categorized as either interval or ratio variables.
• Interval variables are variables whose central characteristic is that they can be
measured along a continuum and they have a numerical value (for example, temperature
measured in degrees Celsius or Fahrenheit). So the difference between 20 °C and 30 °C is the
same as that between 30 °C and 40 °C. However, temperature measured in degrees Celsius or
Fahrenheit is NOT a ratio variable.
• Ratio variables are interval variables, but with the added condition that 0 (zero) of the
measurement indicates that there is none of that variable. So, temperature measured in
degrees Celsius or Fahrenheit is not a ratio variable because 0 °C does not mean there is
no temperature. However, temperature measured in kelvins is a ratio variable as 0 K
(often called absolute zero) indicates that there is no temperature whatsoever. Other
examples of ratio variables include height, mass, distance and many more. The name
"ratio" reflects the fact that you can use the ratio of measurements. So, for example, a
distance of 10 metres is twice the distance of 5 metres.
10 Descriptive Statistics
In descriptive statistics, techniques are provided for processing raw numerical data into usable
forms. These techniques include methods for collecting, organizing, summarizing, describing,
and presenting numerical information. If entire groups (populations) were always available for
study, then descriptive statistics would be all that is required.
10.1 Measures of Central Tendency

10.1.1 Introduction
A measure of central tendency is a single value that attempts to describe a set of data by
identifying the central position within that set of data. As such, measures of central tendency
are sometimes called measures of central location. They are also classed as summary statistics.
The mean (often called the average) is most likely the measure of central tendency that you
are most familiar with, but there are others, such as the median and the mode.
The mean, median and mode are all valid measures of central tendency, but under different
conditions, some measures of central tendency become more appropriate to use than others. In
the following sections, we will look at the mean, mode and median, and learn how to calculate
them and under what conditions they are most appropriate to be used.
10.1.2 Mean (Arithmetic)

The mean (or average) is the most popular and well known measure of central tendency. It can
be used with both discrete and continuous data, although its use is most often with continuous
data. The mean is equal to the sum of all the values in the data set divided by the number of
values in the data set. So, if we have n values in a data set and they have values $x_1, x_2, \dots, x_n$,
the sample mean, usually denoted by $\bar{X}$ (pronounced "x bar"), is:
$$\bar{X} = \frac{x_1 + x_2 + \dots + x_n}{n}.$$
This formula is usually written in a slightly different manner using the Greek capital letter Σ,
pronounced "sigma", which means "sum of...":
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}\sum_{i=1}^{p} f_i x_i,$$
where, in the last form, the data take p distinct values $x_i$ and $f_i$ is the frequency of $x_i$.
You may have noticed that the above formula refers to the sample mean. So, why have we
called it a sample mean? This is because, in statistics, samples and populations have very
different meanings and these differences are very important, even if, in the case of the mean,
they are calculated in the same way. To acknowledge that we are calculating the population
mean and not the sample mean, we use the Greek lower case letter "mu", denoted as µ:
$$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$$
The mean is essentially a model of your data set: a single value chosen to represent all the
values. You will notice, however, that the mean is not often one of the actual values that you
have observed in your data set. However, one of its important properties is that it minimises
error in the prediction of any one value in your data set. That is, it is the value that produces
the lowest total squared error from all other values in the data set.
An important property of the mean is that it includes every value in your data set as part of
the calculation. In addition, the mean is the only measure of central tendency where the sum
of the deviations of each value from the mean is always zero, i.e. $\sum_{i=1}^{n} (x_i - \bar{X}) = 0$.
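As an illustration of the frequency form of the mean and of the zero-sum property, the sketch below uses the frequency table of student ages from Section 9.1. The variable names are illustrative.

```python
from fractions import Fraction

# Frequency table of ages from Section 9.1: {value x_i: frequency f_i}.
freq = {21: 1, 22: 7, 23: 9, 24: 11, 25: 15, 26: 7, 27: 3}

n = sum(freq.values())                                   # number of observations
mean = Fraction(sum(f * x for x, f in freq.items()), n)  # (1/n) * sum of f_i * x_i
print(n, float(mean))

# Sum of the deviations from the mean, weighted by frequency.
deviation_sum = sum(f * (x - mean) for x, f in freq.items())
print(deviation_sum)  # 0
```

Exact fractions are used so that the deviations cancel exactly rather than up to rounding error.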
When not to use the mean?
The mean has one main disadvantage: it is particularly susceptible to the influence of outliers.
These are values that are unusual compared to the rest of the data set by being especially small
or large in numerical value. For example, consider the wages of staff at a factory below:
Staff    1    2    3    4    5    6    7    8    9    10
Salary   15k  18k  16k  14k  15k  15k  12k  17k  90k  95k
The mean salary for these ten staff is 30.7k. However, inspecting the raw data suggests that
this mean value might not be the best way to accurately reflect the typical salary of a worker,
as most workers have salaries in the 12k to 18k range. The mean is being skewed by the two
large salaries. Therefore, in this situation, we would like to have a better measure of central
tendency. As we will find out later, taking the median would be a better measure of central
tendency in this situation.
Another time when we usually prefer the median over the mean (or mode) is when our data
is skewed (i.e., the frequency distribution for our data is skewed). If we consider the normal
distribution - as this is the most frequently assessed in statistics - when the data is perfectly
normal, the mean, median and mode are identical. Moreover, they all represent the most
typical value in the data set. However, as the data becomes skewed the mean loses its ability
to provide the best central location for the data because the skewed data is dragging it away
from the typical value. However, the median best retains this position and is not as strongly
influenced by the skewed values. This is explained in more detail in the skewed distribution
section later.
10.1.3 Median
The median is the middle score for a set of data that has been arranged in order of magnitude.
The median is less affected by outliers and skewed data. In order to calculate the median,
suppose we have the data below:
65 55 89 56 35 14 56 55 87 45 92
We first need to rearrange that data into order of magnitude (smallest first):
14 35 45 55 55 56 56 65 87 89 92
Our median mark is the middle mark, in this case the sixth value, 56. It is the middle
mark because there are 5 scores before it and 5 scores after it. This works fine when you have
an odd number of scores, but what happens when you have an even number of scores? What if
you had only 10 scores? Well, you simply have to take the middle two scores and average the
result. So, if we look at the example below:
65 55 89 56 35 14 56 55 87 45
We again rearrange that data into order of magnitude (smallest first):
14 35 45 55 55 56 56 65 87 89
Only now we have to take the 5th and 6th scores in our data set (55 and 56) and average them
to get a median of 55.5.
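The procedure for odd and even numbers of scores can be written as a short Python function. This is a sketch; the function name median is illustrative, and the salary list repeats the factory wages from the section on the mean to show how the median resists outliers.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

marks = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
print(median(marks))       # 56
print(median(marks[:-1]))  # 55.5 (the ten-score example)

salaries = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]     # in thousands
print(sum(salaries) / len(salaries), median(salaries))  # 30.7 15.5
```

For the salaries, the median of 15.5k is much closer to what a typical worker earns than the mean of 30.7k.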
10.1.4 Mode
The mode is the most frequent score in our data set. On a bar chart or histogram it is
represented by the highest bar. You can, therefore, sometimes consider the mode as being the
most popular option. An example of a mode is presented below:
Normally, the mode is used for categorical data where we wish to know which is the most
common category, as illustrated below:
We can see above that the most common form of transport, in this particular data set, is the
bus. However, one of the problems with the mode is that it is not unique, so it leaves us with
problems when we have two or more values that share the highest frequency, such as below:
We are now stuck as to which mode best describes the central tendency of the data. This
is particularly problematic when we have continuous data because we are more likely not to
have any one value that is more frequent than the others. For example, consider measuring 30
people's weights (to the nearest 0.1 kg). How likely is it that we will find two or more people
with exactly the same weight (e.g., 67.4 kg)? The answer is that it is very unlikely: many
people might be close, but with such a small sample (30 people) and a large range of possible
weights, you are unlikely to find two people with exactly the same weight to the nearest
0.1 kg. This is why the mode is very rarely used with continuous data.
Another problem with the mode is that it will not provide us with a very good measure of
central tendency when the most common mark is far away from the rest of the data in the data
set, as depicted in the diagram below:
In the above diagram the mode has a value of 2. We can clearly see, however, that the mode is
not representative of the data, which is mostly concentrated around the 20 to 30 value range.
To use the mode to describe the central tendency of this data set would be misleading.
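The mode, and the problem of ties, can be illustrated in Python. In the sketch below, freq is the table of student ages from Section 9.1, while the list scores is hypothetical data invented to show a tie; statistics.multimode requires Python 3.8 or later.

```python
from statistics import multimode

# Frequency table of ages from Section 9.1.
freq = {21: 1, 22: 7, 23: 9, 24: 11, 25: 15, 26: 7, 27: 3}
mode = max(freq, key=freq.get)  # the value with the highest frequency
print(mode)  # 25

# Hypothetical data in which two values share the highest frequency.
scores = [2, 3, 3, 5, 7, 7, 8]
print(multimode(scores))  # [3, 7]
```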
10.2 Measures of Dispersion

Dispersion refers to the variability or spread in the data. In statistics, dispersion (also called
variability, scatter, or spread) is the extent to which a distribution is stretched or squeezed.
Common examples of measures of statistical dispersion are:
• Range,
• interquartile range,
• the average deviation,
• the variance and
• the standard deviation.
10.2.1 Average Deviation

The average deviation (AD), also called the mean absolute deviation (MAD), is given by
$$AD = \frac{1}{n}\sum_{i=1}^{n} |x_i - \mu| \ \text{for populations and} \ AD = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}| \ \text{for samples.}$$
It can also be written as follows:
$$AD = \frac{1}{n}\sum_{i=1}^{p} f_i\,|x_i - \mu| \ \text{for populations and} \ AD = \frac{1}{n}\sum_{i=1}^{p} f_i\,|x_i - \bar{x}| \ \text{for samples.}$$
Note: An example will be given in class.
10.2.2 Variance
The population variance $\sigma^2$ (the Greek letter sigma, squared) and the sample variance $s^2$ are
given by
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2 \ \text{for populations and} \ s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 \ \text{for samples.}$$
It can also be written as follows:
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{p} f_i\,(x_i - \mu)^2 \ \text{for populations and} \ s^2 = \frac{1}{n-1}\sum_{i=1}^{p} f_i\,(x_i - \bar{x})^2 \ \text{for samples.}$$
Note: An example will be given in class.
10.2.3 Standard Deviation

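The population standard deviation σ (the Greek letter sigma) and the sample standard
deviation s are given by
$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2} \ \text{for populations and} \ s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2} \ \text{for samples.}$$
It can also be written as follows:
$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{p} f_i\,(x_i - \mu)^2} \ \text{for populations and} \ s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{p} f_i\,(x_i - \bar{x})^2} \ \text{for samples.}$$
Note: An example will be given in class.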
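Pending the in-class examples, the frequency forms of the three measures can be tried on the table of student ages from Section 9.1. This is only an illustrative sketch; the variable names are not from the notes.

```python
from math import sqrt

# Frequency table of ages from Section 9.1: {value x_i: frequency f_i}.
freq = {21: 1, 22: 7, 23: 9, 24: 11, 25: 15, 26: 7, 27: 3}

n = sum(freq.values())
mean = sum(f * x for x, f in freq.items()) / n

# Sum of squared deviations, weighted by frequency.
squares = sum(f * (x - mean) ** 2 for x, f in freq.items())

ad = sum(f * abs(x - mean) for x, f in freq.items()) / n  # average deviation
pop_var = squares / n                                     # population variance
sample_var = squares / (n - 1)                            # sample variance

print(round(ad, 3), round(pop_var, 3), round(sample_var, 3))
print(round(sqrt(pop_var), 3), round(sqrt(sample_var), 3))  # standard deviations
```

The only difference between the population and sample versions is the divisor, n versus n − 1.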
10.3 Five number summary and box plot

10.3.1 Five number summary
1. The lowest value of the data set (i.e., minimum)
2. The highest value of the data set (i.e., maximum)
3. The first quartile Q1 is the number or observation which leaves 25% of the data on the left
hand side and 75% on the right side.
4. The third quartile Q3 is the number or observation which leaves 75% of the data on the left
hand side and 25% on the right side.
5. The median (or second quartile Q2) is the number or observation which leaves 50% of the
data on the left hand side and 50% on the right side.
These five values are called the five-number summary of the data set.
10.3.2 Box plot

A box plot is a graph of a data set obtained by drawing a horizontal line from the minimum
data value to Q1 , drawing a horizontal line from Q3 to the maximum data value, and drawing
a box whose vertical sides pass through Q1 and Q3 with a vertical line inside the box passing
through the median or Q2 .
10.3.3 Procedure for constructing a box plot

1. Find the five-number summary for the data values, that is, the maximum and minimum
data values, Q1 and Q3 , and the median.
2. Draw a horizontal axis with a scale such that it includes the maximum and minimum data
values.
3. Draw a box whose vertical sides go through Q1 and Q3, and draw a vertical line through the
median.
4. Draw a line from the minimum data value to the left side of the box and a line from the
maximum data value to the right side of the box.
10.3.4 Information Obtained from a Box plot

1. (a) If the median is near the center of the box, the distribution is approximately sym-
metric.
(b) If the median falls to the left of the center of the box, the distribution is positively
skewed.
(c) If the median falls to the right of the center, the distribution is negatively skewed.
2. (a) If the lines are about the same length, the distribution is approximately symmetric.
(b) If the right line is longer than the left line, the distribution is positively skewed.
(c) If the left line is longer than the right line, the distribution is negatively skewed.
Exercise: A dietitian is interested in comparing the sodium content of real cheese with the
sodium content of a cheese substitute. The data for two random samples are shown. Compare
the distributions, using box plots.
10.3.5 Range and interquartile range

The range is the difference between the highest observation and the lowest observation:
R = max − min
The interquartile range is the difference between the quartile 3 and quartile 1:
IR = Q3 − Q1
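The five-number summary, range and interquartile range can be computed as sketched below for the eleven marks used in the median section. Several conventions for quartiles exist; this sketch assumes the "median of each half" convention (the median itself is left out of both halves when n is odd), so other software may give slightly different quartiles.

```python
def median(s):
    """Median of an already sorted list."""
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    s = sorted(values)
    n = len(s)
    lower = s[: n // 2]        # values below the middle position
    upper = s[(n + 1) // 2 :]  # values above the middle position
    return s[0], median(lower), median(s), median(upper), s[-1]

marks = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
lo, q1, med, q3, hi = five_number_summary(marks)
print(lo, q1, med, q3, hi)  # 14 45 56 87 92
print(hi - lo, q3 - q1)     # range R = 78, interquartile range IR = 42
```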
10.4 Graphical Representation of the Statistical Data
Note: We shall talk about histogram later.

10.5 Frequency Distributions

It is often useful to organize or arrange a body of data into a frequency distribution. This
breaks up the data into groups or classes and shows the number of observations in each class.
The number of classes is usually between 5 and 15.
10.5.1 Grouped Data: Tabular presentation of data

If a sample (or even a population) or the range of values of data is large, it is difficult to observe
the various characteristics or to compute statistics such as mean or standard deviation. For
this reason it is useful to organize or group the raw data. This means that it is often helpful
to consider these values arranged in regular groups or classes.
Example: The numbers of overtime hours per week worked by employees at a factory are as
follows:
45 31 46 25 57 39 42 55 20 37 40 59 11 38 34 22 62 33 48 43 57 37 43 51 29 41 35
66 45 32 44 47 42 46 54 65 17 35 53 27 38 22 33 39 45 32 43 41 57 45.
The range of these data is R = 66 − 11 = 55, which is large compared with the individual
values, so it is helpful to group the data into classes. Let us arrange them
into six classes of 10 each.
We have the following table:
Overtime hours Frequencies

10-19 2
20-29 6
30-39 14
40-49 17
50-59 8
60-69 3
Total 50
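The grouping in this table can be checked with a short Python sketch. The variable names are illustrative only; the data are the 50 overtime values listed above, and the classes are the six classes 10-19 to 60-69.

```python
# Overtime hours from the example above (50 observations).
hours = [45, 31, 46, 25, 57, 39, 42, 55, 20, 37, 40, 59, 11, 38, 34, 22, 62,
         33, 48, 43, 57, 37, 43, 51, 29, 41, 35, 66, 45, 32, 44, 47, 42, 46,
         54, 65, 17, 35, 53, 27, 38, 22, 33, 39, 45, 32, 43, 41, 57, 45]

# Six classes with lower limits 10, 20, ..., 60 and upper limits 19, ..., 69.
classes = [(low, low + 9) for low in range(10, 70, 10)]

# Count how many observations fall between the limits of each class.
frequencies = [sum(low <= h <= high for h in hours) for low, high in classes]

for (low, high), f in zip(classes, frequencies):
    print(f"{low}-{high}: {f}")
print("Total:", sum(frequencies))
```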
10.5.2 Some terminologies:

• Class: In the above table, 20-29 is called a class.
• Lower and upper limits: The numbers 20 and 29 are the lower limit and the upper limit
of this class, respectively.
• Class boundaries: The class boundaries are 0.5 below the lower limit and 0.5 above the upper
limit. For example, for the class 20-29, the lower bound is 20 − 0.5 = 19.5 and the upper
bound is 29 + 0.5 = 29.5. Note: If the data are whole numbers we use 0.5; if they are given
to one decimal place we use 0.05, and so on.
• Class interval C: It is the difference between the upper and lower class boundaries. For
example, for the same class C = 29.5 − 19.5 = 10.
• Class width W: It is the difference between the upper and lower class limits. For
example, for the same class W = 29 − 20 = 9.
• Central value (or mid-value): It is the average of the upper and lower boundaries.
For example, for the same class $c_i = \frac{29.5 + 19.5}{2} = 24.5$.
• Frequency: The number of observations (counting repetitions) that fall in a given
class. For example, for the class 20-29 the frequency is 6.
• Relative frequency: It is the frequency expressed as a percentage of the total frequency.
For example, for the class 20-29 the relative frequency is $\frac{6 \times 100}{50} = 12\%$.
• A relative frequency distribution is obtained by dividing the number of observations
in each class by the total number of observations in the data as a whole. The sum of the
relative frequencies equals 1 (or 100% when they are expressed as percentages).
• A histogram is a bar graph of a frequency distribution, where classes are measured
along the horizontal axis and frequencies along the vertical axis.
• A frequency polygon is a line graph of a frequency distribution resulting from joining
the frequency of each class plotted at the class midpoint.
• A cumulative frequency distribution shows, for each class, the total number of
observations in all classes up to and including that class. When plotted, this gives a
distribution curve, or ogive.
Now from the above data, we can have the following table:
Overtime hours Frequencies Central values Lower bounds Upper bounds
10-19 2 14.5 9.5 19.5
20-29 6 24.5 19.5 29.5
30-39 14 34.5 29.5 39.5
40-49 17 44.5 39.5 49.5
50-59 8 54.5 49.5 59.5
60-69 3 64.5 59.5 69.5
Total 50
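The columns of this table, together with the relative and cumulative frequencies defined in the terminology list, can be recomputed from the class limits and frequencies alone. The following Python sketch assumes whole-number data, so the boundaries lie 0.5 beyond the limits; the variable names are illustrative.

```python
from itertools import accumulate

# Class limits and frequencies from the overtime table.
limits = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 69)]
frequencies = [2, 6, 14, 17, 8, 3]
total = sum(frequencies)

# Whole-number data: boundaries lie 0.5 below and 0.5 above the limits.
boundaries = [(low - 0.5, high + 0.5) for low, high in limits]
# Central value = average of the lower and upper boundaries.
central_values = [(lb + ub) / 2 for lb, ub in boundaries]
# Relative frequency in percent, and running total of the frequencies.
relative = [100 * f / total for f in frequencies]
cumulative = list(accumulate(frequencies))

for row in zip(limits, frequencies, central_values, boundaries, relative, cumulative):
    print(row)
```

The last cumulative frequency always equals the total number of observations, which is a quick check on the table.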
Example
As an illustration, suppose that a sample consists of the heights of 100 male students at MP
University. We arrange the data into classes or categories and determine the number of
individuals belonging to each class, called the class frequency. The resulting arrangement, as in
the table below, is called a frequency distribution or frequency table.
The first class or category, for example, consists of heights from 60 to 62 inches, indicated by
60 − 62, which is called a class. Here 60 is the lower class limit and 62 is the upper class limit.
Since 5 students have heights belonging to this class, the corresponding class frequency is
5. Since a height that is recorded as 60 inches is actually between 59.5 and 60.5 inches, while
one recorded as 62 inches is actually between 61.5 and 62.5 inches, we could just as well have
recorded the class as 59.5 − 62.5. The next class would then be 62.5 − 65.5, etc. In the class
59.5 − 62.5, the numbers 59.5 and 62.5 are called the lower class boundary and the upper
class boundary respectively. The class width, denoted by w and usually the same for all
classes, is the difference between the upper and lower class limits. In this case
w = 62 − 60 = 2. The class interval, denoted by c and also usually the same for all classes,
is the difference between the upper and lower class boundaries. In this case
c = 62.5 − 59.5 = 3. The midpoint of the class, which can be taken as representative of the
class, is called the class mark. Here the class mark corresponding to the class 60 − 62 is 61.
A graph for the frequency distribution can be supplied by a histogram, as shown in the figure
below, or by a polygon graph (often called a frequency polygon) connecting the midpoints of
the tops in the histogram. It is of interest that the shape of the graph seems to indicate that
the sample is drawn from a population of heights that is normally distributed.
10.5.3 Relative Frequency Distributions

If, in the table above, we recorded the relative frequency or percentage rather than
the number of students in each class, the result would be a relative or percentage frequency
distribution. For example, the relative or percentage frequency corresponding to the class 63 − 65
is 18/100 or 18%. The corresponding histogram is similar to that in the above figure, except
that the vertical axis shows relative frequency instead of frequency. The sum of the rectangular
areas is then 1, or 100%. We can consider a relative frequency distribution as a probability
distribution in which probabilities are replaced by relative frequencies. Since relative
frequencies can be thought of as empirical probabilities, relative frequency distributions are
known as empirical probability distributions.
Example
The following data give the grades on a quiz for a class of 40 students:
7 10 3 4 5 4 5 6 6 5 6 7 2 5 7 8 8 4 9 3 7 6 8 6 6 7 2 7 7 4 4 9 3 8 7
10 9 2 9 5
a) Arrange these grades (raw data set) into an array from the lowest grade to the highest
grade.
b) Construct a table showing class intervals and class midpoints and the absolute, relative,
and cumulative frequencies for each grade.
c) Present the data in the form of a histogram, relative-frequency histogram, frequency
polygon, and ogive.
Solutions
a) Arranged grades
b) Note that since we are dealing here with discrete data (i.e., data expressed in whole
numbers), we used the actual grades as the class midpoints. We get the following table:
c) Histogram, relative-frequency histogram, frequency polygon, and ogive.
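Parts a) and b) can be reproduced with a few lines of Python. This is only a sketch: the grades are entered as 40 separate values between 2 and 10 (this reading of the list is an assumption), and each grade is used as its own class midpoint, as stated in b).

```python
from collections import Counter
from itertools import accumulate

# Quiz grades, entered as 40 separate values (assumed reading of the list).
grades = [7, 10, 3, 4, 5, 4, 5, 6, 6, 5, 6, 7, 2, 5, 7, 8, 8, 4, 9, 3,
          7, 6, 8, 6, 6, 7, 2, 7, 7, 4, 4, 9, 3, 8, 7, 10, 9, 2, 9, 5]

array = sorted(grades)                 # part a): grades from lowest to highest
counts = Counter(grades)
values = sorted(counts)                # each grade is its own class midpoint
absolute = [counts[v] for v in values]
relative = [f / len(grades) for f in absolute]
cumulative = list(accumulate(absolute))

print(array)
print("grade  freq  rel.freq  cum.freq")
for v, f, r, c in zip(values, absolute, relative, cumulative):
    print(f"{v:>5} {f:>5} {r:>9.3f} {c:>9}")
```

The absolute frequencies give the heights of the histogram bars, the relative frequencies those of the relative-frequency histogram, and the cumulative frequencies the points of the ogive.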

10.5.4 General Rules for Organizing Data into Groups

1. Determine the largest and smallest numbers in the raw data and thus find the range.
2. Divide the range into a convenient number of class intervals having the same size. If
this is not feasible, use class intervals of different sizes. Two formulas are often used to
determine the number of classes K from the number of observations n. These are
Sturges' rule and Yule's rule. They are given respectively by
$K = 1 + 3.3 \log_{10}(n)$ (Sturges' rule),
$K = 2.5 \sqrt[4]{n}$ (Yule's rule).
3. Calculate the size of each class using the formula $c = \frac{R + 1}{K}$, where R is the range.
4. Determine the number of observations falling into each class interval; that is, find the
class frequencies and the class midpoints.
Note: Example to be given in class!!
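As a hedged illustration of steps 2 and 3, the sketch below evaluates both rules for the overtime data of Section 10.5.1 (n = 50, R = 55). It takes log as the base-10 logarithm and Yule's rule as 2.5 times the fourth root of n. Rounding K to the nearest whole number is an assumption made here, not something the rules prescribe, and the function names are illustrative.

```python
import math

def sturges_classes(n: int) -> float:
    """Number of classes by Sturges' rule, K = 1 + 3.3 log10(n)."""
    return 1 + 3.3 * math.log10(n)

def yule_classes(n: int) -> float:
    """Number of classes by Yule's rule, K = 2.5 * n^(1/4)."""
    return 2.5 * n ** 0.25

def class_size(data_range: float, k: int) -> float:
    """Class size c = (R + 1) / K."""
    return (data_range + 1) / k

n, data_range = 50, 55            # overtime example: 50 values, R = 66 - 11
k = round(sturges_classes(n))     # rounding K to a whole number is an assumption
print("Sturges:", sturges_classes(n))
print("Yule:", yule_classes(n))
print("K =", k, "class size c =", class_size(data_range, k))
```

Both rules suggest between 6 and 7 classes for these data, which is close to the six classes of 10 used in the example.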
10.6 Normal distribution curve

The normal distribution is a continuous probability distribution and the most commonly used
distribution in statistical analysis. The normal curve is bell-shaped and symmetrical about its
mean. It extends indefinitely in both directions, but most of the area (probability) is clustered
around the mean; 68.26% of the area (probability) under the normal curve is included within
one standard deviation of the mean (i.e., within µ ± σ), 95.44% within µ ± 2σ, and 99.74%
within µ ± 3σ.
The standard normal distribution is a normal distribution with a mean of 0 and a standard
deviation of 1 (i.e., µ = 0 and σ = 1). Any normal distribution can be converted into a standard
normal distribution by letting µ = 0 and expressing deviations from µ in standard deviation
units (z scale). To find probabilities (areas) for problems involving the normal distribution, we
first convert the X value into its corresponding z value, as follows:
$z = \frac{X - \mu}{\sigma}$
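The conversion to z values and the areas quoted above can be checked numerically. The sketch below computes the standard normal cumulative area with the error function from Python's `math` module instead of a printed table; the function names and the sample values µ = 25, σ = 0.5, X = 24.5 are hypothetical. The computed areas (about 68.27%, 95.45% and 99.73%) differ from the quoted figures in the last digit because the quoted figures come from tables rounded to four decimals.

```python
import math

def z_score(x: float, mu: float, sigma: float) -> float:
    """Convert an X value into its z value: z = (X - mu) / sigma."""
    return (x - mu) / sigma

def standard_normal_cdf(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def area_within(k: float) -> float:
    """Area within k standard deviations of the mean."""
    return standard_normal_cdf(k) - standard_normal_cdf(-k)

for k in (1, 2, 3):
    print(f"within mu ± {k} sigma: {100 * area_within(k):.2f}%")

# Hypothetical values for illustration: mu = 25, sigma = 0.5, X = 24.5.
z = z_score(24.5, 25.0, 0.5)
print("z =", z, "P(X < 24.5) =", standard_normal_cdf(z))
```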
10.7 Exercises
1. Give short notes on Categorical variables and provide examples
2. With help of examples describe the difference between descriptive statistics and inferential
statistics.
3. Give 4 steps a statistical exercise is based on.
4. Explain what is a box plot and give its role in testing the symmetry of a distribution.
5. In which cases is the mean not a good measure of central tendency?
6. Answer true or false.
(a) The range of a statistical data is the difference between the median and mode.
(b) The mode of data is the observation with lowest frequency.
(c) The median of statistical data is a number which divides the data into two equal
parts.
7. Doughnuts are distributed in boxes depending on the capacity of the boxes. The numbers of
doughnuts are as follows:
27 38 22 33 39 45 32 43 41 57 45 40 59 11 38 34 22 62 33 48 43 57 37 34 51
29 41 45 31 46 25 57 39 42 55 20 37 35 66 45 32 44 47 42 46 54 65 17 35 53 .
Use these data to calculate:
(b) the median of the data,
(c) the mode of the data,
(d) quartile three,
(e) the mean of the data,
(f) the variance and standard deviation of the data.
8. The sampling of the ages of students from a school gave the following information:
$\sum_{i=1}^{n} x_i^2 = 204802$, $\sum_{i=1}^{n} x_i = 3278$, where $n = \sum_i f_i = 53$.
(a) Calculate the mean age, variance and standard deviation of the age.
(b) What can you say about the obtained results in (a)?
9. The time taken by employees to complete an operation was recorded on 80 occasions:

Time (min) 10.0 10.5 11.0 11.5 12.0 12.5 13.0
Frequency (fi ) 4 8 14 22 19 10 3
(a) Determine the mean, mode, median, variance, standard deviation, quartile one and
quartile three.
(b) Draw the bar-chart, pie-chart and box plot.
10. The lengths of 20 seeds are measured and the results, in centimetres to two significant
figures, are as follows: 7.3 7.1 6.6 7.0 7.8 7.3 7.5 6.2 6.9 6.7 6.5 6.8 7.2 7.4 6.5 6.9 7.2
7.6 7.0 6.8
(a) Compile step by step a table showing frequency distribution and relative frequency
distribution for regular classes of 0.2 cm from 6.2 cm to 7.9 cm.
(b) Compile step by step a table showing lower and upper limits, central values, lower
and upper bounds and their corresponding frequencies of all classes, for regular classes
of 0.2 cm from 6.2 cm to 7.9 cm.
(c) Draw the corresponding histogram, frequency polygon curve and the frequency curve.
(d) Find the mean, variance, and the standard deviation of the grouped data.
(e) Find the modal class and the mode of the grouped data.
(f) Find the mode, mean, median and variance of the ungrouped data.
(g) Compare the obtained value above to the one from grouped data.
(h) In one paragraph draw the conclusion about the length of the seeds.
11. The lengths in millimetres of 40 spindles were measured with the following results: 20.90
20.57 20.86 20.74 20.82 20.63 20.53 20.89 20.75 20.65 20.71 21.03 20.72 20.41
20.94 20.75 20.79 20.65 21.08 20.89 20.50 20.88 20.97 20.78 20.64 20.92 21.07
21.16 20.80 20.77 20.82 20.72 20.60 20.90 20.86 20.68 20.75 20.88 20.56 20.94
(a) Do the same as in the question one from 20.40 to 21.20 and consider the regular
intervals of 0.01.
(b) Is the frequency curve obtained a normal distribution curve? Explain your answer.
12. A company wants to investigate the precision of a new employee who is employed to
make doughnuts. A well-made doughnut has a perimeter of 20 cm. The perimeters
of a sample of 34 doughnuts are measured and the following results in cm are obtained.
19.63 19.82 19.96 19.75 19.86 19.82 16.61 19.97 20.07 19.89 20.16 19.56 20.05
19.96 19.68 19.87 19.90 19.66 19.77 19.99 20.00 20.11 20.01 19.84 19.73 19.93
20.03 19.86 19.81 19.77 19.78 19.75 19.87 19.72
(a) Arrange the data into 7 equal classes of width 0.09 cm for the range 19.50 cm to
20.19 cm and determine the frequency distribution.
(b) Find the mean, variance, and the standard deviation of the grouped data.
(c) Is this employee accurate?
(d) Find the class having the highest frequency.
(e) Find the lower class boundary of the 3rd class.
(f) Find the upper class boundary of the 7th class.
(g) Find the central value of the 5th class.
13. Specialist edible oil manufacturers and many food companies convert raw materials into
a wide range of food products. The main products of edible oil manufacturers are described
in the following table by the quantities imported and their corresponding
prices in 1995.
Oil or Fat                     Tonnes    Price (USD millions)
Soya Bean Oil                  21,000    24.6
Palm Oil and its Fractions     11,800    13.1
Sunflower Oil                   8,050     8.9
Canola/Rapeseed Oil             4,900     6.0
Coconut Oil                     3,950     4.6
Other (corn, maize oil, ...)    1,200     2.0
Total                          50,950    59.2
Investigate this table and answer the following questions:
(a) Draw a bar-chart (Oil or fat versus Tonnes)

(b) Draw a pie-chart (Oil or fat versus Tonnes)
(c) Identify the type of oil which is highly needed
(d) Identify the type of oil which is very expensive
(e) What is the percentage of soya bean oil?
(f) Find the average price and explain the meaning of the value obtained.
(g) Find the average quantity of tonnes imported and explain the meaning of the value
obtained.
(h) Verify if the values obtained in f and g are good measures of central tendency to
represent the overall information on the corresponding quantities.
(i) Find the percentage of tonnes of each type of oil or fat imported.
14. A machine is set to produce bolts of a nominal diameter 25 mm. Measurements of the
diameter of 60 bolts gave the following frequency distribution:
Diameter (mm) 23.3-23.7 23.8-24.2 24.3-24.7 24.8-25.2 25.3-25.7 25.8-26.2 26.3-26.7
Frequency 2 4 10 17 16 8 3
(a) Calculate the mean, average deviation and standard deviation of the diameter.
(b) For a full run of 3000 bolts, calculate
i. The limits between which all the diameters are likely to lie,
ii. The approximate number of bolts with diameters less than 24.5 mm
15. The numbers of overtime hours per week worked by employees at a factory are as follows:
45 31 46 25 57 39 42 55 20 37 40 59 11 38 34 22 62 33 48 43 57 37 34 51 29 41 35 66 45
32 44 47 42 46 54 65 17 35 53 27 38 22 33 39 45 32 43 41 57 45.
(a) Find range of the data.
(b) Arrange these data into classes of size 10 each, starting from 10 as the lower limit of
the first class. Make a table including limits of classes, boundaries of classes, central
values of classes and the frequencies corresponding to each class.
(c) Construct the histogram for these data.
(d) Approximate the mean and standard deviation for these grouped data.
(e) Approximate the mode and median for these grouped data.
16. The following data are the heights (in cm) of students of the mathematics department in a
certain university.
66 56 59 57 62 60 69 49 61 55 60 69 62 63 53 63 59 50 58 60 72 64 65 65 64
70 64 68 66 65 63 69 70 65 69 65 66 50 56 50 70 55 61 55 70 67 50 62 62 75
56 59 58 .
(a) Find range of the data.
(b) Find the number of possible classes of size 5 each that can be made from these data,
by arranging them starting from 49.
(c) Make a table including limits of classes, boundaries of classes, central values of classes
and the frequencies corresponding to each class.
(d) Construct the histogram for these data.
(e) Approximate the mode and median for these grouped data.
