Data Science
By:
Dharna Ahuja
What is Data Science?
• Data Science is about data gathering, analysis and
decision-making.
• Summarizing data
• Visualization
print(x) shows that NumPy makes the data type of all the elements the same, so this
will give the output ['2' '3' 'n' '5'].
arr = np.array(42)
print(arr)
1-D Arrays
• An array that has 0-D arrays as its elements is called a
uni-dimensional or 1-D array.
• These are the most common and basic arrays.
• import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
2-D Arrays
• An array that has 1-D arrays as its elements is called a 2-D
array.
• These are often used to represent matrices or 2nd-order
tensors.
• import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
3-D arrays
• An array that has 2-D arrays (matrices) as its elements is
called a 3-D array.
• These are often used to represent a 3rd-order tensor.
• import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr)
Check Number of Dimensions?
• NumPy arrays provide the ndim attribute, which returns an
integer that tells us how many dimensions the array has.
• import numpy as np
a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3],
[4, 5, 6]]])
print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)
NumPy Array Indexing
• Access Array Elements
• import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr[0])
• Get the third and fourth elements from the following array and add them.
• import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr[2] + arr[3])
Generate arrays using linspace()
• numpy.linspace() returns equally spaced numbers within the
given range, based on the number of samples.
• Syntax: numpy.linspace(start, stop, num, dtype, retstep)
• numpy.arange() returns equally spaced values within the given range, based on the step size.
• Syntax: numpy.arange(start, stop, step)
• D = np.arange(start=1, stop=10, step=2)
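The difference between the two functions can be sketched as follows. This is a minimal illustration: the names samples and step and the value num=5 are chosen here for the example and do not come from the slides.

```python
import numpy as np

# linspace fixes the number of samples; the stop value is included by default.
samples, step = np.linspace(start=1, stop=10, num=5, retstep=True)
print(samples)   # values 1, 3.25, 5.5, 7.75, 10
print(step)      # spacing between consecutive samples: 2.25

# arange fixes the step size instead; the stop value is excluded.
D = np.arange(start=1, stop=10, step=2)
print(D)         # values 1, 3, 5, 7, 9
```

Use linspace when the number of points matters and arange when the spacing matters.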
Generate arrays using ones()
• numpy.ones() returns an array of the given shape and type filled
with ones.
• Syntax: numpy.ones(shape, dtype)
• numpy.zeros() returns an array of the given shape and type filled with zeros.
• Syntax: numpy.zeros(shape, dtype)
• np.zeros((3, 4))
Generate arrays using random.rand()
• numpy.random.rand() returns an array of the given shape filled
with random values drawn uniformly from [0, 1).
• Syntax: numpy.random.rand(d0, d1, ..., dn)
• np.random.rand(5)
Generate arrays using random.rand()
• Generate an array of random values with 5
rows and 2 columns
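One way to do this, as a minimal sketch (the variable name rand_arr is chosen here for illustration):

```python
import numpy as np

# rand() takes each dimension as a separate argument, not as a shape tuple.
rand_arr = np.random.rand(5, 2)

print(rand_arr.shape)   # (5, 2)
print(rand_arr)         # values differ on every run
```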
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(2, 3, 2)
print(newarr)
Can We Reshape Into any Shape?
• Yes, as long as both shapes contain the same number of
elements.
• We can reshape an 8-element 1-D array into a 2-D array with 2
rows of 4 elements, but we cannot reshape it into a 2-D array with 3
rows of 3 elements, as that would require 3x3 = 9 elements.
• import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
newarr = arr.reshape(3, 3)  # raises ValueError: 8 elements cannot fill 3x3
print(newarr)
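The rule above can be checked with a short sketch. The names ok, failed and inferred are illustrative, and the use of -1 to let NumPy infer a dimension is an addition not shown on this slide.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

ok = arr.reshape(2, 4)        # 2 * 4 == 8, so this works
print(ok.shape)

try:
    arr.reshape(3, 3)         # 3 * 3 == 9, but there are only 8 elements
    failed = False
except ValueError as err:
    failed = True
    print("reshape failed:", err)

# -1 asks NumPy to work out one dimension from the element count.
inferred = arr.reshape(4, -1)
print(inferred.shape)
```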
Numpy addition
• numpy.add() performs elementwise
addition between two arrays.
• numpy.add(array_1, array_2)
• Create two arrays a and b with the same shape.
• a=np.array([[1,2,3],[4,5,6],[7,8,9]])
• b=np.arange(start=11,stop=20).reshape(3,3)
• np.add(a,b)
Numpy multiplication
• numpy.multiply() performs elementwise
multiplication between two arrays.
• numpy.multiply(array_1, array_2)
Other Numpy functions
• numpy.subtract() performs elementwise
subtraction between two arrays.
• numpy.divide() returns an elementwise
division of the inputs.
• numpy.remainder() returns the elementwise
remainder of division.
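The elementwise functions above can be tried side by side. This is a small sketch assuming two 3x3 arrays; the result names (total, product and so on) are chosen here for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.arange(start=11, stop=20).reshape(3, 3)

total = np.add(a, b)             # same as a + b
product = np.multiply(a, b)      # same as a * b (elementwise, not a matrix product)
difference = np.subtract(b, a)   # same as b - a
quotient = np.divide(b, a)       # true division, the result holds floats
rest = np.remainder(b, a)        # same as b % a

print(total)
print(rest)
```

Each function requires shapes that are equal or compatible under broadcasting; otherwise NumPy raises a ValueError.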
Accessing components of an array
• Components of an array can be accessed using
their index numbers.
• a=[[1 2 3]
[4 5 6]
[7 8 9]]
• import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]],
[[7, 8, 9], [10, 11, 12]]])
print(arr[0, 1, 2])
Example Explained
• The first number represents the first dimension, which contains
two arrays:
[[1, 2, 3], [4, 5, 6]]
and:
[[7, 8, 9], [10, 11, 12]]
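• Since we selected 0, we are left with the first array:
[[1, 2, 3], [4, 5, 6]]
• The second number represents the second dimension, which also contains two arrays:
[1, 2, 3]
and:
[4, 5, 6]
• Since we selected 1, we are left with the second array:
[4, 5, 6]
• The third number represents the third dimension, which contains three values: 4, 5 and 6.
• Since we selected 2, we end up with the third value: 6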
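The same lookup can be done one dimension at a time, which may make the explanation easier to follow. The intermediate names first, second and value are illustrative only.

```python
import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]],
                [[7, 8, 9], [10, 11, 12]]])

first = arr[0]       # first dimension: the first 2-D block
second = first[1]    # second dimension: the second row of that block
value = second[2]    # third dimension: the third value of that row

print(value)
print(value == arr[0, 1, 2])   # both routes reach the same element
```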
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5])
Slicing 2-D Arrays
• From the second element, slice elements from
index 1 to index 4 (not included):
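• import numpy as np
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(arr[1, 1:4])
• From both elements, return index 2:
print(arr[0:2, 2])
• From both elements, slice index 1 to index 4 (not included). This will
return a 2-D array:
• import numpy as np
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(arr[0:2, 1:4])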
NumPy Data Types
• NumPy has some extra data types, and refers to data types with
one character, like i for integers, u for unsigned integers etc.
• Below is a list of the data types in NumPy and the characters used
to represent them.
• i - integer
• b - boolean
• u - unsigned integer
• f - float
• c - complex float
• m - timedelta
• M - datetime
• O - object
• S - string
• U - unicode string
• V - fixed chunk of memory for other type (void)
• The NumPy array object has a property called dtype that returns the data
type of the array:
• import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr.dtype)
Creating Arrays With a Defined
Data Type
• We use the array() function to create arrays. This function can
take an optional argument, dtype, that allows us to define the
expected data type of the array elements:
• import numpy as np
arr = np.array([1, 2, 3, 4], dtype='S')
print(arr)
print(arr.dtype)
• For i, u, f, S and U we can define the size as well.
• Create an array with a 4-byte integer data type:
• import numpy as np
arr = np.array([1, 2, 3, 4], dtype='i4')
print(arr)
print(arr.dtype)
Converting Data Type on Existing
Arrays
• The best way to change the data type of an existing array,
is to make a copy of the array with the astype() function.
• The astype() function creates a copy of the array, and
allows you to specify the data type as a parameter.
• The data type can be specified using a string, like 'f' for
float, 'i' for integer etc.
• Change the data type from float to integer by using 'i' as the
parameter value:
• import numpy as np
arr = np.array([1.1, 2.1, 3.1])
newarr = arr.astype('i')
print(newarr)
print(newarr.dtype)
NumPy Array Copy vs View
• The main difference between a copy and a view of an array
is that the copy is a new array, and the view is just a view
of the original array.
• The copy owns the data and any changes made to the copy
will not affect original array, and any changes made to the
original array will not affect the copy.
• The view does not own the data and any changes made to the
view will affect the original array, and any changes made to the
original array will affect the view.
• COPY:
• Make a copy, change the original array, and display both
arrays:
• import numpy as np
arr = np.array([1, 2, 3, 4, 5])
x = arr.copy()
arr[0] = 42
print(arr)
print(x)
• The copy SHOULD NOT be affected by the changes made to the original array.
Make Changes in the VIEW
• Make a view, change the view, and
display both arrays:
• import numpy as np
arr = np.array([1, 2, 3, 4, 5])
x = arr.view()
x[0] = 31
print(arr)
print(x)
• The original array SHOULD be affected
by the changes made to the view.
VIEW
• Make a view, change the original array,
and display both arrays:
• import numpy as np
arr = np.array([1, 2, 3, 4, 5])
x = arr.view()
arr[0] = 42
print(arr)
print(x)
• The view SHOULD be affected by the
changes made to the original array.
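A compact way to compare the two behaviours is sketched below. The names c and v are illustrative, and the base attribute check is an addition to the slides: base is None when an array owns its data.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
c = arr.copy()    # owns its own data
v = arr.view()    # shares data with arr

arr[0] = 42       # change the original array

print(c[0])       # the copy still holds the old value
print(v[0])       # the view sees the new value

# base is None for an array that owns its data; for a view it
# refers to the array the data is borrowed from.
print(c.base is None)
print(v.base is arr)
```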
Flattening the arrays
• Flattening an array means converting a multidimensional array
into a 1-D array.
• We can use reshape(-1) to do this.
• Convert the array into a 1-D array:
• import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
newarr = arr.reshape(-1)
print(newarr)
NumPy Array Iterating
• Iterating means going through elements one by one.
• As we deal with multi-dimensional arrays in NumPy, we can
do this using a basic for loop.
• If we iterate on a 1-D array it will go through each element
one by one.
• import numpy as np
arr = np.array([1, 2, 3])
for x in arr:
    print(x)
Iterating 2-D Arrays
• In a 2-D array it will go through all
the rows.
• import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
for x in arr:
    print(x)
• To return the actual values, the scalars,
we have to iterate the arrays in each
dimension.
• Iterate on each scalar element of the
2-D array:
• import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
for x in arr:
    for y in x:
        print(y)
Iterating 3-D Arrays
• import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
for x in arr:
    for y in x:
        for z in y:
            print(z)
Iterating Arrays Using nditer()
• Iterating on Each Scalar Element
• With basic for loops, iterating through
each scalar of an n-dimensional array requires
n nested for loops, which can be difficult to
write for arrays with very high
dimensionality. nditer() does this with a single loop.
• import numpy as np
arr = np.array([[[1, 2], [3, 4]],
[[5, 6], [7, 8]]])
for x in np.nditer(arr):
    print(x)
Iterating With Different Step Size
• We can use filtering (slicing with a step) followed by
iteration.
• import numpy as np
print(arr)
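Iterating with a step size, as described above, might look like the following sketch. The 2x4 array and the name visited are assumptions made for this example.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# Slice with a step of 2 along the columns, then iterate over what is left.
visited = [int(x) for x in np.nditer(arr[:, ::2])]
print(visited)   # every second element of each row
```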
• Join two 2-D arrays along rows
(axis=1):
• import numpy as np
arr = np.concatenate((np.array([[1, 2], [3, 4]]), np.array([[5, 6], [7, 8]])), axis=1)
print(arr)
Splitting NumPy Arrays
• Splitting is the reverse operation of joining.
• Joining merges multiple arrays into one and splitting
breaks one array into multiple.
• We use array_split() for splitting arrays. We pass
it the array we want to split and the number of
splits.
• Split the array in 3 parts:
• import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
newarr = np.array_split(arr, 3)
print(newarr)
• Split the array in 4 parts:
• import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
newarr = np.array_split(arr, 4)
print(newarr)
Split Into Arrays
• The return value of the array_split() function is a
list containing each of the splits as an array.
• If you split an array into 3 arrays, you can access them
from the result just like any list element:
• import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
newarr = np.array_split(arr, 3)
print(newarr[0])
print(newarr[1])
print(newarr[2])
Splitting 2-D Arrays
• Split the 2-D array into three 2-D
arrays.
• import numpy as np
arr = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]])
newarr = np.array_split(arr, 3)
print(newarr)
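When the element count does not divide evenly, array_split() adjusts the sizes of the parts. The sketch below contrasts this with np.split(), which is not covered on the slides and is shown only for comparison; the names parts and split_failed are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# 6 elements do not divide evenly into 4 parts, so the first parts
# receive one extra element each.
parts = np.array_split(arr, 4)
print([p.tolist() for p in parts])

# np.split insists on an equal division and raises otherwise.
try:
    np.split(arr, 4)
    split_failed = False
except ValueError:
    split_failed = True
print(split_failed)
```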
Searching Arrays
• You can search an array for a certain
value, and return the indexes that get a
match.
• To search an array, use the where()
function.
• Find the indexes where the value is 4:
• import numpy as np
arr = np.array([1, 2, 3, 4, 5, 4, 4])
x = np.where(arr == 4)
print(x)
Sorting Arrays
• Sorting means putting elements in an ordered
sequence.
• An ordered sequence is any sequence that has an
order corresponding to its elements, like numeric or
alphabetical, ascending or descending.
• NumPy has a function called
sort() that returns a sorted copy of a specified array.
• import numpy as np
arr = np.array([3, 2, 0, 1])
print(np.sort(arr))
Array dimensions
• Create an array a
• a=np.array([[1,2,3],[4,5,6],[7,8,9]])
• shape -> attribute that returns the dimensions of an array
• array_name.shape
• Extract elements from second and third row of
array a.
• a[1:3]
• Extract elements from first column of array a.
• a[: , 0] -> this means take all the rows
• Extract elements from the first row of array a.
• a[0, : ]
Subset of arrays
• Array a=> [[1 2 3]
[4 5 6]
[7 8 9]]
• Subset a 2X2 array from the original array a
• Consider the first two rows and columns from a
• a_sub=a[:2,:2]
• print(a_sub)
• Suppose you want to change the value 1 to
12 in the array a_sub:
• a_sub=[[1,2]
[4,5]]
• a_sub[0,0]=12
• Because a_sub is a view of a, this also changes a[0,0] to 12.
• a_col=np.append(a,col,axis=1)
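The append() call above adds a column to a. A minimal sketch follows, in which the contents of col are hypothetical, since the slides do not show how col is defined.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Hypothetical column to append: one value per row, shaped (3, 1)
# so that it lines up with the rows of a.
col = np.array([10, 11, 12]).reshape(3, 1)

a_col = np.append(a, col, axis=1)
print(a_col.shape)
print(a_col)
```

With axis=1, the appended values must have the same number of rows as the input array; append() returns a new array and leaves a unchanged.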
Modifying array using insert()
• insert()-> adds values at a given position and
axis in an array.
• numpy.insert(array, obj, values, axis)
• array - input array
• obj - index position before which the values are inserted
• values - array of values to be inserted
• axis - axis along which the values should be
inserted
• Consider array a
• a=[[12,2,3],[4,5,6],[7,8,9]]
• Insert a new row at index
position 1.
• a_ins=np.insert(a,1,[13,15,16],axis=0)
• print(a_ins)
Modifying array using delete()
• delete() -> removes values at a given position
and axis in an array.
• numpy.delete(array, obj, axis)
• array - input array
• obj - index (or indices) of the sub-array to be removed
• axis - axis along which the sub-array should be
removed
• Delete the third row from the existing array a_ins
• a_del=np.delete(a_ins,2,axis=0)
• The corresponding index is 2.
Matrices
• Rectangular arrangement of numbers in rows
and columns.
• Rows run horizontally and columns run
vertically.
a11 a12 a13
a21 a22 a23
a31 a32 a33
The above matrix is 3X3
a11
a21
a31
Equation 1:
4x + 3y = 20
-5x + 9y = 26
To solve the above system of linear equations, we need to find the values of the x and y
variables.
In the matrix solution, the system of linear equations to be solved is represented in the
form of matrix AX = B.
For instance, we can represent Equation 1 in the form of a matrix as follows:
• A = [[ 4 3]
[-5 9]]
• X = [[x]
[y]]
• B = [[20]
[26]]
• To find the value of x and y variables in Equation 1, we need to find the values in
the matrix X. To do so, we can take the dot product of the inverse of matrix A, and
the matrix B as shown below:
• X = inverse(A).B
• Using the inv() and dot() Methods
• m_list = [[4, 3], [-5, 9]]
• A = np.array(m_list)
• inv_A = np.linalg.inv(A)
• The next step is to find the dot product between the inverse of matrix A
and the matrix B. Note that a matrix dot product is only
possible if the inner dimensions of the matrices are
equal, i.e. the number of columns of the left matrix must match the
number of rows of the right matrix.
• To find the dot product with the NumPy library, the dot() function (or the
array's dot() method) is used. The following script finds the dot product between the inverse of
matrix A and the matrix B, which is the solution of Equation 1.
• B = np.array([20, 26])
• X = np.linalg.inv(A).dot(B)
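Putting the steps for Equation 1 together gives the sketch below. The final substitution check with np.allclose() is an addition for illustration and is not part of the slides.

```python
import numpy as np

A = np.array([[4, 3], [-5, 9]])
B = np.array([20, 26])

# Solution via the explicit inverse: X = inverse(A).B
X = np.linalg.inv(A).dot(B)
print(X)   # approximately x = 2, y = 4

# Substitute back: A.dot(X) should reproduce B up to rounding error.
print(np.allclose(A.dot(X), B))
```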
Using the solve() Method
• The NumPy library contains the linalg.solve() function, which can be used
to directly find the solution of a system of linear equations. The example below solves a system of three equations:
• A = np.array([[4, 3, 2], [-2, 2, 3], [3, -5, 2]])
• B = np.array([25, -10, -4])
• X2 = np.linalg.solve(A,B)
Finding Determinant
A = np.array([[6, 1, 1],
[4, -2, 5],
[2, 8, 7]])
print("\nDeterminant of A:", np.linalg.det(A))
Numpy Matrix
• Numpy matrices are strictly 2-dimensional, while numpy arrays (ndarrays)
are N-dimensional.
• The main advantage of numpy matrices is that they provide a convenient
notation for matrix multiplication: if a and b are matrices, then a*b is their
matrix product.
• Syntax :
• a=np.array([(1,2,3),(3,4,5)])
• print(np.sqrt(a))
• print(np.std(a))  # standard deviation
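The difference in how * behaves for arrays and matrices can be sketched as follows. The 2x2 values and result names are assumptions for this example. Note that the NumPy documentation recommends regular arrays with the @ operator over np.matrix for new code.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = a * b        # arrays: * multiplies element by element
matrix_product = a @ b     # arrays: @ is the matrix product

# np.matrix objects: * is the matrix product.
m = np.matrix(a) * np.matrix(b)

print(elementwise)
print(matrix_product)
print(np.array_equal(m, matrix_product))
```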
Vertical & Horizontal Stacking
• If you want to concatenate two arrays and not
just add them, you can do it in two
ways: vertical stacking and horizontal
stacking.
• f = np.array([1,2,3])
• g = np.array([4,5,6])
• print(np.vstack((f,g)))
• print(np.hstack((f,g)))
• Horizontal Append: [1 2 3 4 5 6]
• Vertical Append: [[1 2 3] [4 5 6]]
NumPy Array Iteration
• NumPy provides an iterator object, nditer,
which can be used to iterate over the given
array using the standard Python iterator interface.
• import numpy as np
• a = np.array([[1,2,3,4],[2,4,5,6],[10,20,39,3]])
• print("Printing array:")
• print(a)
• print("Iterating over the array:")
• for x in np.nditer(a):
•     print(x,end=' ')
Array Sorting
• print(np.sort(a,1)) -> sorts along axis 1 (within each row)
• print(np.sort(a,0)) -> sorts along axis 0 (within each column)
numpy.mean()
• The sum of the elements along an axis,
divided by the number of elements, is known
as the arithmetic mean. The numpy.mean()
function is used to compute the arithmetic
mean along the specified axis.
• numpy.mean(a, axis=None, dtype=None, out=None, keepdims=<no value>)
• import numpy as np
• a = np.array([[1, 2], [3, 4]])
• b=np.mean(a)
• b
• x = np.array([[5, 6], [7, 34]])
• y=np.mean(x)
• y
• import numpy as np
• a = np.array([[2, 4], [3, 5]])
• b=np.mean(a,axis=0)
• c=np.mean(a,axis=1)
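The effect of the axis argument can be seen by printing the results. This sketch reuses the last array from above; the names overall, per_column and per_row are chosen here for illustration.

```python
import numpy as np

a = np.array([[2, 4], [3, 5]])

overall = np.mean(a)              # mean of all four values: 3.5
per_column = np.mean(a, axis=0)   # collapses the rows: one mean per column
per_row = np.mean(a, axis=1)      # collapses the columns: one mean per row

print(overall)
print(per_column)   # means 2.5 and 4.5
print(per_row)      # means 3.0 and 4.0
```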