Advance Python Program Unit II
Features of NumPy
Import NumPy
Once NumPy is installed, import it in your applications by adding the import keyword:
import numpy
import numpy as np
Method – 1:
To find out the version of NumPy you are using, just import the module (if not already imported)
and then print out “numpy.__version__“.
Example:
import numpy as np
print(np.__version__)
Method – 2:
Using pip show to check version of Numpy
Syntax: pip show <package_name>
Example:
pip show numpy
Arrays in NumPy
• NumPy’s main object is the homogeneous multidimensional array.
• Arrays are a collection of the same type of elements/values that can have one or more
dimensions.
• An array of one dimension is called a Vector, while having two dimensions is called a Matrix.
• In NumPy, dimensions are called axes. The number of axes is rank.
• NumPy arrays are called ndarray or N-dimensional arrays.
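As a small illustration of the points above, the following sketch builds arrays with one and two axes and inspects them. The variable names are illustrative only, and the mixed int/float array is included just to show that all elements end up sharing one type.

```python
import numpy as np

vector = np.array([1, 2, 3])           # one axis: a vector
matrix = np.array([[1, 2], [3, 4]])    # two axes: a matrix
mixed = np.array([1, 2.5, 3])          # ints and a float are stored with one common dtype

print(type(vector).__name__)           # ndarray
print(vector.ndim, matrix.ndim)        # number of axes of each array
print(mixed.dtype)                     # a floating-point dtype, because elements must be homogeneous
```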
Creating a NumPy Array
To create an ndarray, we can pass a list, tuple or any array-like object into the array() method,
and it will be converted into an ndarray.
Dimensions in Arrays
A dimension in arrays is one level of array depth (nested arrays)
0-D Arrays
0-D arrays, or Scalars, are the elements in an array. Each value in an array is a 0-D array.
Example
import numpy as np
arr = np.array(10)
print(arr)
Output:
10
1-D Arrays
• An array that has 0-D arrays as its elements is called uni-dimensional or 1-D array.
• These are the most common and basic arrays.
Example - 1: Creating 1-D array using list and array()
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
Output:
[1 2 3 4 5]
Example - 2: Creating 1-D array using tuple and array()
import numpy as np
arr = np.array((1, 2, 3, 4, 5))
print(arr)
Output:
[1 2 3 4 5]
Example – 3: Creating 1-D array using list and asarray()
import numpy as np
values = [10,20,30,40]  # avoid naming a variable "list", which hides the built-in type
arr = np.asarray(values)
print(arr)
Output:
[10 20 30 40]
Example – 4: Creating 1-D array using tuple and asarray()
import numpy as np
values = (10,20,30,40)
arr = np.asarray(values)
print(arr)
Output:
[10 20 30 40]
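The examples above give the same result with array() and asarray(). One practical difference appears when the input is already an ndarray: by default array() makes a copy, while asarray() returns the input itself. A minimal sketch, with illustrative variable names:

```python
import numpy as np

source = np.array([10, 20, 30, 40])

copied = np.array(source)     # np.array copies its input by default
same = np.asarray(source)     # np.asarray returns the input when it is already a suitable ndarray

source[0] = 99
print(copied[0])              # 10, because copied owns its own data
print(same[0])                # 99, because same is the same object as source
print(same is source)         # True
```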
Example – 5: Creating 1-D array using loop and ndarray()
import numpy as np
n = int(input("Enter Size :"))
arr = np.ndarray(shape=(n,),dtype=int)
for i in range(n):
    arr[i] = int(input())
print("Array elements:\n",arr)
Empty array:
Example - 1:
import numpy as np
empty_array = np.empty(0)
print("Empty Array:\n", empty_array)
Output:
Empty Array:
[]
Example - 2:
import numpy as np
empty_array = np.empty(5)
print("Empty Array:\n", empty_array)
Output (np.empty does not initialize its values, so they will differ from run to run):
Empty Array:
[6.23042070e-307 4.67296746e-307 1.69121096e-306 9.40672775e-312
3.56175136e-317]
Array of zeros
Example - 1:
import numpy as np
zeros_array = np.zeros(5)
print("zeros_array\n",zeros_array)
Output:
zeros_array
[0. 0. 0. 0. 0.]
Example - 2:
import numpy as np
zeros_array = np.zeros(5,dtype=int)
print("zeros_array\n",zeros_array)
Output:
zeros_array
[0 0 0 0 0]
Array of ones
Example – 1:
import numpy as np
ones_array = np.ones(5)
print("ones_array\n",ones_array)
Output:
ones_array
[1. 1. 1. 1. 1.]
Example – 2:
import numpy as np
ones_array = np.ones(5,dtype=int)
print("ones_array\n",ones_array)
Output:
ones_array
[1 1 1 1 1]
Array with a constant value
Example:
import numpy as np
constant_array = np.full(5,7)
print("constant_array\n",constant_array)
Output:
constant_array
[7 7 7 7 7]
NumPy also has functions for generating arrays with random values, useful for simulations and
testing.
• Random Float Array : np.random.rand() function generates an array of random values between
0 and 1.
Example:
import numpy as np
arr = np.random.rand(5)
print(arr)
Output: five random floats in the range [0, 1); the values differ on each run.
• Random Integers : If we need random integers, we can use np.random.randint() to create arrays
with integer values in a specified range.
Example:
import numpy as np
arr = np.random.randint(1,10,size=6)
print(arr)
Output:
[6 3 7 6 2 6]
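Because random output changes on every run, it is often convenient to fix a seed while testing. The sketch below uses np.random.default_rng, NumPy's newer Generator interface (available from NumPy 1.17), as an alternative to the np.random.rand and np.random.randint calls shown above. The seed value 42 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

floats_a = rng_a.random(5)               # 5 floats in [0, 1)
floats_b = rng_b.random(5)
ints = rng_a.integers(1, 10, size=6)     # 6 integers in [1, 10)

print(np.array_equal(floats_a, floats_b))
print(ints)
```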
2-D Arrays
• An array that has 1-D arrays as its elements is called a 2-D array.
• These are often used to represent matrices.
Example - 1: Creating 2-D array using list and array()
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
Output:
[[1 2 3]
[4 5 6]]
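Example: Creating 2-D array using loop and ndarray()
import numpy as np
r = int(input("Enter rowSize : "))
c = int(input("Enter colSize : "))
arr = np.ndarray(shape=(r,c),dtype=int)
print("Enter", r*c, "elements :")
for i in range(r):
    for j in range(c):
        arr[i][j] = int(input())
print("Array elements:\n",arr)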
Output:
Enter rowSize : 2
Enter colSize : 2
Enter 4 elements :
1
2
3
4
Array elements:
[[1 2]
[3 4]]
Empty array:
Example - 1:
import numpy as np
empty_array = np.empty([0,0])
print("Empty Array:\n", empty_array)
Output:
Empty Array:
[]
Example - 2:
import numpy as np
empty_array = np.empty([2,2])
print("Empty Array:\n", empty_array)
Output (np.empty does not initialize its values, so they may differ on each run):
Empty Array:
[[0. 0.]
[0. 0.]]
Array of zeros
Example - 1:
import numpy as np
zeros_array = np.zeros([2,2])
print("zeros_array\n",zeros_array)
Output:
zeros_array
[[0. 0.]
[0. 0.]]
Example - 2:
import numpy as np
zeros_array = np.zeros([2,2],dtype=int)
print("zeros_array\n",zeros_array)
Output:
zeros_array
[[0 0]
[0 0]]
Array of ones
Example – 1:
import numpy as np
ones_array = np.ones([2,3])
print("ones_array\n",ones_array)
Output:
ones_array
[[1. 1. 1.]
[1. 1. 1.]]
Example – 2:
import numpy as np
ones_array = np.ones([2,3],dtype=int)
print("ones_array\n",ones_array)
Output:
ones_array
[[1 1 1]
[1 1 1]]
Array with a constant value
Example:
import numpy as np
constant_array = np.full([3,3],7)
print("constant_array\n",constant_array)
Output:
constant_array
[[7 7 7]
[7 7 7]
[7 7 7]]
Random numbers in ndarray
NumPy also has functions for generating arrays with random values, useful for simulations and
testing.
• Random Float Array : np.random.rand() function generates an array of random values between
0 and 1.
Example:
import numpy as np
arr_rand = np.random.rand(2, 3)
print(arr_rand)
Output:
[[0.01466307 0.9677552 0.25104995]
[0.66652019 0.27030812 0.91041412]]
• Random Integers : If we need random integers, we can use np.random.randint() to create arrays
with integer values in a specified range.
Example:
import numpy as np
arr = np.random.randint(1, 10, size=(2, 3))  # 2 x 3 array of integers from 1 to 9
print(arr)
• Identity Matrix : np.eye() function creates an identity matrix, a square matrix with ones on the
diagonal and zeros elsewhere.
Example:
import numpy as np
identity_matrix = np.eye(3)
print(identity_matrix)
Output:
[[1. 0. 0.]
[0. 1. 0.]
[0. 0. 1.]]
• Diagonal Matrix : Use np.diag() to create a diagonal matrix, where the provided array elements
form the diagonal.
Example:
import numpy as np
diagonal_matrix = np.diag([1, 2, 3])
print(diagonal_matrix)
Output:
[[1 0 0]
[0 2 0]
[0 0 3]]
3-D arrays
An array that has 2-D arrays (matrices) as its elements is called 3-D array.
Example - 1: Creating 3-D array using list and array()
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(arr)
Output:
[[[1 2 3]
[4 5 6]]
[[7 8 9]
[10 11 12]]]
print("Array elements:\n",arr)
Output:
Enter No of matrix : 2
Enter rowSize : 2
Enter colSize : 2
Enter 8 elements :
1
2
3
4
5
6
7
8
Array elements:
[[[1 2]
[3 4]]
[[5 6]
[7 8]]]
Empty array:
Example - 1:
import numpy as np
empty_array = np.empty([0,0,0])
print("Empty Array:\n", empty_array)
Output:
Empty Array:
[]
Example - 2:
import numpy as np
empty_array = np.empty([2,2,2])
print("Empty Array:\n", empty_array)
Output (np.empty does not initialize its values, so they will differ from run to run):
Empty Array:
[[[0.1724733 0.21789787]
[0.02467395 0.06510989]]
[[0.13597087 0.80837594]
[0.59047233 0.55820254]]]
Array of zeros
Example - 1:
import numpy as np
zeros_array = np.zeros([2,2,2])
print("zeros_array\n",zeros_array)
Output:
zeros_array
[[[0. 0.]
[0. 0.]]
[[0. 0.]
[0. 0.]]]
Example - 2:
import numpy as np
zeros_array = np.zeros([2,2,2],dtype=int)
print("zeros_array\n",zeros_array)
Output:
zeros_array
[[[0 0]
[0 0]]
[[0 0]
[0 0]]]
Array of ones
Example – 1:
import numpy as np
ones_array = np.ones([2,2,3])
print("ones_array\n",ones_array)
Output:
ones_array
[[[1. 1. 1.]
[1. 1. 1.]]
[[1. 1. 1.]
[1. 1. 1.]]]
Example – 2:
import numpy as np
ones_array = np.ones([2,2,3],dtype=int)
print("ones_array\n",ones_array)
Output:
ones_array
[[[1 1 1]
[1 1 1]]
[[1 1 1]
[1 1 1]]]
Array with a constant value
Example:
import numpy as np
constant_array = np.full([2,2,2],7)
print("constant_array\n",constant_array)
Output:
constant_array
[[[7 7]
[7 7]]
[[7 7]
[7 7]]]
Example:
import numpy as np
arr_rand = np.random.rand(2,2,2)
print(arr_rand)
Output:
[[[0.33797995 0.76896482]
[0.27211375 0.85260367]]
[[0.0864397 0.49337954]
[0.45241024 0.14987361]]]
• Random Integers : If we need random integers, we can use np.random.randint() to create arrays
with integer values in a specified range.
Example:
import numpy as np
[[9 8]
[4 9]]]
Experiment -2
The Shape and Reshaping of NumPy Array
Shape of an Array
The shape of an array is the number of elements in each dimension. The shape of a NumPy array is a tuple
of integers. Each integer in the tuple represents the size of the array along a particular dimension or axis. For
example, an array with shape (3, 4) has 3 rows and 4 columns.
• For a 2D array, the shape is a tuple with two elements: number of rows, number of columns.
• For a 3D array, the shape is a tuple with three elements: depth, number of rows, number of
columns.
NumPy arrays have an attribute called shape that returns a tuple with each index having the number of
corresponding elements.
Shape of an Array using 1D
In NumPy, the shape attribute also works with 1D arrays. A 1D array is essentially a list of elements. Here’s
an example to illustrate:
Example 1:
import numpy as np
# Creating a 1D array
array = np.array([1, 2, 3, 4, 5])
# Getting the shape of the 1D array
print(array.shape)
Output:
(5,)
Shape of an Array using 2D
Example 2:
import numpy as np
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(arr.shape)
Output:
(2, 4)
Shape of an Array using 3D
In NumPy, working with 3D arrays is quite common, especially in fields like image processing and scientific
computing. Just like with 2D arrays, you can use the shape attribute to get the dimensions of a 3D array.
Example 3:
import numpy as np
array = np.array([[[1, 2, 3], [4, 5, 6]],
[[7, 8, 9], [10, 11, 12]]])
print(array.shape)
Output:
(2, 2, 3)
Here, the array has 2 blocks, each containing 2 rows and 3 columns. Essentially, it is a collection of 2D arrays
stacked together.
Accessing Array Shape
You can access the shape of a NumPy array using the shape attribute. This attribute returns a tuple of
integers, each representing the size of the array along a particular dimension.
1D Array
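The listing for this example is not shown. A minimal sketch that would produce the output below, assuming a five-element array (the values themselves are illustrative), is:

```python
import numpy as np

# Assumed input: any five-element 1D array gives the shape, ndim and size reported below
array_1d = np.array([1, 2, 3, 4, 5])

print("Shape of the array:", array_1d.shape)
print("Number of dimensions:", array_1d.ndim)
print("Total number of elements:", array_1d.size)
```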
Output:
Shape of the array: (5,)
Number of dimensions: 1
Total number of elements: 5
2D Array
Example 5:
import numpy as np
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
Example 6:
import numpy as np
array3 = np.array([[[1, 2, 3], [4, 5, 6]],
[[7, 8, 9], [10, 11, 12]]])
Output:
Reshaped to 1D array:
[1 2 3 4 5 6]
Reshaped to 3D array:
[[[1]
[2]
[3]]
[[4]
[5]
[6]]]
Example 6:
import numpy as np
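# Creating a 3D array
array_3d = np.array([[[1, 2, 3], [4, 5, 6]],
[[7, 8, 9], [10, 11, 12]]])
# Reshaping 3D array to 1D array
array_1d = array_3d.reshape(12)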
print("Reshaped to 1D array:")
print(array_1d)
# Reshaping 3D array to 2D array
array_2d = array_3d.reshape(4, 3)
print("\nReshaped to 2D array:")
print(array_2d)
Output:
Reshaped to 1D array:
[ 1 2 3 4 5 6 7 8 9 10 11 12]
Reshaped to 2D array:
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]
Example 7:
import numpy as np
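# Create a 1D array with the values 1 to 12 and reshape it to 3 rows and 4 columns
array_2d = np.arange(1, 13).reshape(3, 4)
print(array_2d)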
Output:
[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]]
This way, you can reshape arrays in various ways, depending on your needs.
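One rule holds for every reshape: the new shape must contain the same total number of elements as the original. The sketch below illustrates this with an assumed 2 x 3 array. It also uses -1 for one dimension, which asks NumPy to work out that size itself. Variable names are illustrative.

```python
import numpy as np

array_2d = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3), 6 elements

flat = array_2d.reshape(6)          # 1D array with 6 elements
cube = array_2d.reshape(2, 3, 1)    # 3D array, still 6 elements
auto = array_2d.reshape(3, -1)      # -1 lets NumPy infer the missing size (here 2)

print(flat.shape, cube.shape, auto.shape)

# A shape whose sizes do not multiply to 6 is rejected
try:
    array_2d.reshape(4, 2)
except ValueError as err:
    print("Cannot reshape:", err)
```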
Flattening a NumPy array means converting a multi-dimensional array into a one-dimensional array. You can
do this easily using the flatten() method or the ravel() method.
Example 1:
import numpy as np
# Create a 2D array
array = np.array([[1, 2, 3], [4, 5, 6]])
# Flatten the array
flattened_array = array.flatten()
print(flattened_array)
Output:
[1 2 3 4 5 6]
Example 2:
import numpy as np
# Create a 2D array
array = np.array([[1, 2, 3], [4, 5, 6]])
# Ravel the array
raveled_array = array.ravel()
print(raveled_array)
Output:
[1 2 3 4 5 6]
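The two methods print the same result, but they differ in how they treat memory: flatten() always returns a copy, while ravel() returns a view of the original data whenever that is possible. A small sketch of the difference, with illustrative variable names:

```python
import numpy as np

array = np.array([[1, 2, 3], [4, 5, 6]])

flat_copy = array.flatten()   # always a new, independent copy
flat_view = array.ravel()     # a view of the same data for a contiguous array like this one

flat_copy[0] = 100            # does not affect the original
flat_view[1] = 200            # writes through to the original

print(array)
```

After running this, array[0, 0] is still 1 but array[0, 1] has become 200, so ravel() is the cheaper choice only when that sharing is acceptable.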
Flattening NumPy arrays can be very useful when dealing with different dimensions. Here are examples of
flattening 1D, 2D, and 3D arrays:
1D Array:
Flattening a 1D array is quite straightforward as it is already a single dimension.
Example 3:
import numpy as np
# Create a 1D array
array_1d = np.array([1, 2, 3, 4, 5])
# Flatten the array (though it's already flat)
flattened_array_1d = array_1d.flatten()
print(flattened_array_1d)
Output:
[1 2 3 4 5]
2D Array:
Flattening a 2D array converts it into a 1D array.
Example 4:
import numpy as np
# Create a 2D array
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
# Flatten the array
flattened_array_2d = array_2d.flatten()
print(flattened_array_2d)
Output:
[1 2 3 4 5 6]
3D Array:
Flattening a 3D array also converts it into a 1D array.
Example 5:
import numpy as np
# Create a 3D array
array_3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
# Flatten the array
flattened_array_3d = array_3d.flatten()
print(flattened_array_3d)
Output:
[1 2 3 4 5 6 7 8]
Transposing a NumPy Array
Transposing a NumPy array essentially means swapping its rows and columns. This can be done easily using
the transpose() method or the .T attribute.
1D Array:
Transposing a 1D array in NumPy doesn't change the array since it has only one dimension. The transpose
function and .T attribute are designed for multi-dimensional arrays.
Example 1:
import numpy as np
# Create a 1D array
array_1d = np.array([1, 2, 3, 4, 5])
# Transpose the array
transposed_array_1d = array_1d.T
print(transposed_array_1d)
Output:
[1 2 3 4 5]
The array remains the same. Transposing is meaningful for 2D arrays and higher, where rows and columns
can be interchanged. For a 1D array, there's no change in structure.
2D Array:
Example 2:
import numpy as np
# Create a 2D array
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
# Transpose the array using transpose() method
transposed_array_2d = array_2d.transpose()
print(transposed_array_2d)
# Transpose the array using .T attribute
transposed_array_2d_T = array_2d.T
print(transposed_array_2d_T)
Output:
[[1 4]
[2 5]
[3 6]]
[[1 4]
[2 5]
[3 6]]
3D Array:
When transposing a 3D array, you can specify the order of the axes.
Example 3:
import numpy as np
# Create a 3D array
array_3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
# Transpose the array specifying the axes order
transposed_array_3d = array_3d.transpose(1, 0, 2)
print(transposed_array_3d)
Output:
[[[1 2]
[5 6]]
[[3 4]
[7 8]]]
In this example, we swapped the first and second axes.
Transposing is a versatile operation that can be very useful in various data manipulation tasks.
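A quick way to check what a transpose did is to compare shapes. The following sketch uses assumed array sizes and illustrative names. It shows that .T swaps the two axes of a 2D array and, for a 3D array, reverses the order of all axes unless an explicit order is given to transpose().

```python
import numpy as np

matrix = np.arange(6).reshape(2, 3)      # shape (2, 3)
cube = np.arange(24).reshape(2, 3, 4)    # shape (2, 3, 4)

print(matrix.T.shape)                    # rows and columns swapped
print(cube.T.shape)                      # with no arguments, the axis order is reversed
print(cube.transpose(1, 0, 2).shape)     # only the first two axes swapped

# Element [i, j] of the original equals element [j, i] of the transpose
print(matrix[0, 2] == matrix.T[2, 0])
```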
Expanding and Squeezing a NumPy Array
Expanding and squeezing a NumPy array are common operations in NumPy for adding or removing
dimensions.
Expanding Dimensions
You can use the np.expand_dims function or the reshape method to add a new axis to an array.
1. Using np.expand_dims:
Example 1:
import numpy as np
# Create a 1D array
array_1d = np.array([1, 2, 3, 4, 5])
# Expand dimensions
expanded_array = np.expand_dims(array_1d, axis=0)
print(expanded_array)
print(expanded_array.shape)
Output:
[[1 2 3 4 5]]
(1, 5)
Example 2:
import numpy as np
# creating an input array
a = np.array([1, 2, 3, 4])
# getting the dimension of a
print(a.shape)
# expanding the axis of a
b = np.expand_dims(a, axis=1)
# getting the dimension of the new array
print(b.shape)
Output:
(4,)
(4, 1)
Example 3:
import numpy as np
# Original array
arr = np.array([1, 2, 3])
# Expanding dimensions
expanded_arr = np.expand_dims(arr, axis=0)
print("Original array:", arr)
print("Expanded array:", expanded_arr)
print("Shape of expanded array:", expanded_arr.shape)
# Using None indexing
expanded_arr_none = arr[None, :]
print("Expanded array using None indexing:", expanded_arr_none)
print("Shape of expanded array using None indexing:", expanded_arr_none.shape)
Output:
Original array: [1 2 3]
Expanded array: [[1 2 3]]
Shape of expanded array: (1, 3)
Expanded array using None indexing: [[1 2 3]]
Shape of expanded array using None indexing: (1, 3)
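The reshape method mentioned earlier can add an axis as well. A minimal sketch, assuming the same three-element array, shows reshape next to np.newaxis, which is simply another name for None in an index:

```python
import numpy as np

arr = np.array([1, 2, 3])

row = arr.reshape(1, -1)           # new leading axis: shape (1, 3)
column = arr.reshape(-1, 1)        # new trailing axis: shape (3, 1)
column_alt = arr[:, np.newaxis]    # same result as column, written as an index

print(row.shape, column.shape, column_alt.shape)
```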
Example 3:
import numpy as np
# Creating an array using the array() method
arr = np.array([[5, 10, 15], [20, 25, 30]])
# Display the array
print("Our Array...",arr)
# Display the shape of array
print("Array Shape...",arr.shape)
# Check the Dimensions
print("Dimensions of our Array...",arr.ndim)
# Get the Datatype
print("Datatype of our Array object...",arr.dtype)
# Get the number of elements in an array
print("Size of array...",arr.size)
# To expand the shape of an array, use the numpy.expand_dims() method
# Insert a new axis that will appear at the axis position in the expanded array shape.
res = np.expand_dims(arr, axis=(0, 1))
# Display the expanded array
print("Resultant expanded array....", res)
# Display the shape of the expanded array
print("Shape of the expanded array...",res.shape)
# Check the Dimensions
print("Dimensions of our Array...",res.ndim)
Output:
Our Array...
[[ 5 10 15]
[20 25 30]]
Array Shape...
(2, 3)
Dimensions of our Array...
2
Datatype of our Array object...
int64
Size of array...
6
Resultant expanded array....
[[[[ 5 10 15]
[20 25 30]]]]
Shape of the expanded array...
(1, 1, 2, 3)
Dimensions of our Array...
4
The below example shows how expand_dims() function is used to add new dimensions to arrays by
adjusting their shapes and dimensions as needed
Example 4:
import numpy as np
# Create a 2D array
x = np.array([[1, 2], [3, 4]])
print('Array x:')
print(x)
print('\n')
# Add a new axis at position 0
y = np.expand_dims(x, axis=0)
print('Array y with a new axis added at position 0:')
print(y)
print('\n')
# Print the shapes of x and y
print('The shape of x and y arrays:')
print(x.shape, y.shape)
print('\n')
# Add a new axis at position 1
y = np.expand_dims(x, axis=1)
print('Array y after inserting axis at position 1:')
print(y)
print('\n')
# Print the number of dimensions (ndim) for x and y
print('x.ndim and y.ndim:')
print(x.ndim, y.ndim)
print('\n')
# Print the shapes of x and y
print('x.shape and y.shape:')
print(x.shape, y.shape)
Output:
Array x:
[[1 2]
[3 4]]
Array y with a new axis added at position 0:
[[[1 2]
[3 4]]]
The shape of x and y arrays:
(2, 2) (1, 2, 2)
Array y after inserting axis at position 1:
[[[1 2]]
[[3 4]]]
x.ndim and y.ndim:
2 3
x.shape and y.shape:
(2, 2) (2, 1, 2)
Squeezing Dimensions
Squeezing a NumPy array involves removing single-dimensional entries from the shape of the array.
Squeezing a 2-dimensional NumPy array involves removing dimensions of size 1.
Example 1:
import numpy as np
# Creating a 2D array with shape (3, 1)
arr = np.array([[1], [2], [3]])
print("Original array shape:", arr.shape)
# Using np.squeeze() to remove single-dimensional entries
squeezed_arr = np.squeeze(arr)
print("Squeezed array shape:", squeezed_arr.shape)
print("Squeezed array:", squeezed_arr)
Output:
Original array shape: (3, 1)
Squeezed array shape: (3,)
Squeezed array: [1 2 3]
If you have an array with shape (1, 3, 1), squeezing it would result in an array with shape (3,).
• The first dimension has a size of 1. This means there is one "row" in the array.
• The second dimension has a size of 3. This indicates that there are three "columns" in the array.
• The third dimension has a size of 1. This signifies that each "element" in the three columns is itself a
single-element array.
Example 2:
import numpy as np
# Creating a 3D array with shape (1, 3, 1)
arr = np.array([[[1], [2], [3]]])
print("Original array shape:", arr.shape)
# Using np.squeeze() to remove single-dimensional entries
squeezed_arr = np.squeeze(arr)
print("Squeezed array shape:", squeezed_arr.shape)
print("Squeezed array:", squeezed_arr)
Output:
Original array shape: (1, 3, 1)
Squeezed array shape: (3,)
Squeezed array: [1 2 3]
You can use the np.squeeze function to remove single-dimensional entries from the shape of an array.
Example 3:
import numpy as np
# Create a 3D array with single-dimensional entries
array_3d = np.array([[[1, 2, 3]], [[4, 5, 6]]])
print(array_3d.shape)
# (2, 1, 3)
# Squeeze the array to remove dimensions of size 1
squeezed_array = np.squeeze(array_3d)
print(squeezed_array)
print(squeezed_array.shape)
Output:
[[1 2 3]
[4 5 6]]
(2, 3)
The np.squeeze function removes all dimensions of size 1 from the array shape, effectively reducing the
dimensionality. These operations can be very useful for reshaping data to fit your needs.
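np.squeeze also accepts an axis argument when only some of the size-1 dimensions should be removed. The sketch below uses an assumed array of shape (1, 3, 1) filled with zeros purely for illustration.

```python
import numpy as np

arr = np.zeros((1, 3, 1))

print(np.squeeze(arr).shape)           # all size-1 axes removed
print(np.squeeze(arr, axis=0).shape)   # only the first axis removed
print(np.squeeze(arr, axis=2).shape)   # only the last axis removed

# Asking to squeeze an axis whose size is not 1 raises an error
try:
    np.squeeze(arr, axis=1)
except ValueError as err:
    print("Cannot squeeze axis 1:", err)
```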
When you squeeze a 1-dimensional NumPy array, it will remain unchanged because there are no single-
dimensional entries to remove.
Example 4:
import numpy as np
# Creating a 1D array
arr = np.array([1, 2, 3])
print("Original array shape:", arr.shape)
# Using np.squeeze() to remove single-dimensional entries
squeezed_arr = np.squeeze(arr)
print("Squeezed array shape:", squeezed_arr.shape)
print("Squeezed array:", squeezed_arr)
Output:
Original array shape: (3,)
Squeezed array shape: (3,)
Squeezed array: [1 2 3]
Sorting
Sorting arrays in NumPy is straightforward with the np.sort() function, which returns a sorted copy and
leaves the original array unchanged. The examples below show how to sort arrays:
1.Sorting a 1D array:
Example 1:
import numpy as np
# Creating a 1D array
arr = np.array([5, 2, 9, 1, 5, 6])
# Sorting the array
sorted_arr = np.sort(arr)
print("Original array:", arr)
print("Sorted array:", sorted_arr)
Output:
Original array: [5 2 9 1 5 6]
Sorted array: [1 2 5 5 6 9]
2.Sorting a 2D array along an axis:
Example 2:
import numpy as np
# Creating a 2D array
arr_2d = np.array([[5, 2, 9], [1, 5, 6]])
# Sorting along the first axis (columns)
sorted_arr_2d_axis0 = np.sort(arr_2d, axis=0)
# Sorting along the second axis (rows)
sorted_arr_2d_axis1 = np.sort(arr_2d, axis=1)
print("Original 2D array:\n", arr_2d)
print("Sorted 2D array along axis 0 (columns):\n", sorted_arr_2d_axis0)
print("Sorted 2D array along axis 1 (rows):\n", sorted_arr_2d_axis1)
Output:
Original 2D array:
[[5 2 9]
[1 5 6]]
Sorted 2D array along axis 0 (columns):
[[1 2 6]
[5 5 9]]
Sorted 2D array along axis 1 (rows):
[[2 5 9]
[1 5 6]]
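The listing that produced the next output block is not shown. The sketch below is one way to obtain it, under the assumption that a 3 x 3 array was sorted in place and then indexed with the result of np.argsort. The input values are illustrative, and the final np.take_along_axis line is an addition that goes beyond that output.

```python
import numpy as np

# Assumed input: the original listing is missing, so these values are illustrative
arr = np.array([[9, 2, 5], [6, 1, 5], [8, 4, 7]])

# Sort each row in place (ndarray.sort works along the last axis by default)
arr.sort()
print("In-place sorted array:\n", arr)

# argsort returns, for each row, the indices that would sort that row
indices = np.argsort(arr)
print("Indices of the sorted array:\n", indices)

# Indexing with a 2D integer array selects whole rows, which gives a 3D result
print("Array sorted using indices:\n", arr[indices])

# To apply the per-row indices element by element, np.take_along_axis can be used
print(np.take_along_axis(arr, indices, axis=1))
```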
Output:
In-place sorted array:
[[2 5 9]
[1 5 6]
[4 7 8]]
Indices of the sorted array:
[[0 1 2]
[0 1 2]
[0 1 2]]
Array sorted using indices:
[[[2 5 9]
[1 5 6]
[4 7 8]]
[[2 5 9]
[1 5 6]
[4 7 8]]
[[2 5 9]
[1 5 6]
[4 7 8]]]
Indexing and Slicing of NumPy Array
Indexing 1D array
Indexing refers to accessing elements of an array using their indices. NumPy arrays are zero-indexed, meaning
the first element is at index 0.
Example 1:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print("Element at index 0:", arr[0])
print("Element at index 3:", arr[3])
print("Element at index -1:", arr[-1])
Output:
Element at index 0: 1
Element at index 3: 4
Element at index -1: 5
Indexing 2D array
With 2D arrays, you can access elements using a row and column index.
import numpy as np
# Creating a 2D array
arr = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
# Accessing elements using row and column indices
element_0_1 = arr[0, 1] # Element at row 0, column 1
element_2_2 = arr[2, 2] # Element at row 2, column 2
print("Element at row 0, column 1:", element_0_1)
print("Element at row 2, column 2:", element_2_2)
Output:
Element at row 0, column 1: 2
Element at row 2, column 2: 9
Basic Slicing
Slicing in 1D NumPy arrays is a powerful way to access and manipulate portions of an array. The basic
syntax for slicing is array[start:stop:step], where start is the index to begin the slice, stop is the index to end
the slice (exclusive), and step is the stride between each index.
Example 1:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5])
Output:
[2 3 4 5]
Example 2:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[4:])
Output:
[5 6 7]
Example 3:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[:4])
output:
[1 2 3 4]
Example 4:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[-3:-1])
Output:
[5 6]
Example 5:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5:2])
Output:
[2 4]
Example 6:
import numpy as np
# Creating a 1D array
arr = np.array([10, 20, 30, 40, 50])
# Slicing from index 1 to 4 (exclusive)
slice_1 = arr[1:4]
print("Slice from index 1 to 4:", slice_1)
# Slicing from start to index 3 (exclusive)
slice_2 = arr[:3]
print("Slice from start to index 3:", slice_2)
# Slicing from index 2 to end
slice_3 = arr[2:]
print("Slice from index 2 to end:", slice_3)
Output:
Slice from index 1 to 4: [20 30 40]
Slice from start to index 3: [10 20 30]
Slice from index 2 to end: [30 40 50]
Assignment:
Create a 1D numpy array with elements from 0 through 9
i. Slice elements from index 2 to 5
ii. Slice elements from the beginning to index 5
iii. Slice elements from index 5 to the end
iv. Slice elements with a step of 2
v. Slice elements from index 1 to 8 with a step of 3
import numpy as np
# Creating a NumPy array
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# Slicing elements from index 2 to 5
slice_1 = arr[2:6]
print(slice_1) # Output: [2 3 4 5]
# Slicing elements from the beginning to index 5
slice_2 = arr[:6]
print(slice_2) # Output: [0 1 2 3 4 5]
# Slicing elements from index 5 to the end
slice_3 = arr[5:]
print(slice_3) # Output: [5 6 7 8 9]
# Slicing elements with a step of 2
slice_4 = arr[::2]
print(slice_4) # Output: [0 2 4 6 8]
# Slicing elements from index 1 to 8 with a step of 3
slice_5 = arr[1:9:3]
print(slice_5) # Output: [1 4 7]
Modify Array Elements Using Slicing
With slicing, we can also modify array elements using:
• start parameter
• stop parameter
• start and stop parameter
• start, stop, and step parameter
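Each of the four cases listed above can be sketched as an assignment to a slice. The arrays and replacement values here are illustrative. The last lines also show that a slice is a view, so changing it changes the original array.

```python
import numpy as np

numbers = np.array([2, 4, 6, 8, 10, 12])

numbers[4:] = 99         # start only: from index 4 to the end
numbers[:2] = 0          # stop only: the first two elements
numbers[2:4] = [7, 9]    # start and stop: indices 2 and 3
print(numbers)           # [ 0  0  7  9 99 99]

digits = np.arange(10)
digits[1:8:3] = -1       # start, stop and step: indices 1, 4 and 7
print(digits)

# A slice is a view of the original data, not a copy
window = digits[:3]
window[0] = 50
print(digits[0])         # 50
```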
In NumPy, we can also reverse array elements using the negative slicing. For example,
import numpy as np
# create a numpy array
numbers = np.array([2, 4, 6, 8, 10, 12])
# generate reversed array
reversed_numbers = numbers[::-1]
print(reversed_numbers)
Output:
[12 10 8 6 4 2]
Here, the slice numbers[::-1] selects all the elements of the array with a step size of -1, which reverses the
order of the elements.
Slicing a 2D array using NumPy is quite similar to slicing lists in Python.
From the second row, slice elements from index 1 to index 4 (not included):
Example 1:
import numpy as np
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(arr[1, 1:4])
Output:
[7 8 9]
From both rows, return the element at index 2:
Example 2:
import numpy as np
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(arr[0:2, 2])
Output:
[3 8]
From both elements, slice index 1 to index 4 (not included), this will return a 2-D array:
Example 3:
import numpy as np
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(arr[0:2, 1:4])
output:
[[2 3 4]
[7 8 9]]
Example 4:
import numpy as np
# create a 2D array
array1 = np.array([[1, 3, 5, 7],
[9, 11, 13, 15]])
print(array1[:2, :2])
Output
[[ 1 3]
[ 9 11]]
The first :2 selects the first 2 rows, which here is every row of array1.
The second :2 selects the first 2 columns of those rows, giving [1 3] and [9 11].
Example 5:
import numpy as np
array = np.array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12]])
# Extract a single element
element = array[1, 2] # Output: 7
# Extract a specific row:
row = array[1, :] # Output: [5, 6, 7, 8]
# Extract a specific column
column = array[:, 2] # Output: [3, 7, 11]
#Extract a subarray:
subarray = array[0:2, 1:3] # Output: [[2, 3], [6, 7]]
Example 6:
import numpy as np
# create a 2D array
array1 = np.array([[1, 3, 5, 7],
[9, 11, 13, 15],
[2, 4, 6, 8]])
# slice the array to get the first two rows and columns
subarray1 = array1[:2, :2]
# slice the array to get the last two rows and columns
subarray2 = array1[1:3, 2:4]
# print the subarrays
print("First Two Rows and Columns: \n",subarray1)
print("Last two Rows and Columns: \n",subarray2)
Output
First Two Rows and Columns:
[[ 1 3]
[ 9 11]]
Last two Rows and Columns:
[[13 15]
[ 6 8]]
• array1[:2, :2] - slices array1 that starts at the first row and first column (default values), and ends at the
second row and second column (exclusive)
• array1[1:3, 2:4] - slices array1 that starts at the second row and third column (index 1 and 2), and ends
at the third row and fourth column (index 2 and 3)
Assignment:
Write a Python Program using Numpy array with the following:
i. Create an 2d numpy array with elements 1 to 16
ii. Select a specific range of rows say 1:3
iii. Select a specific range of columns
iv. Skipping the element using a step
v. Reversing the array
import numpy as np
arr = np.array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12],
[13, 14, 15, 16]])
rows_2_to_3 = arr[1:3] # [[5, 6, 7, 8], [9, 10, 11, 12]]
cols_2_to_4 = arr[:, 1:4] # [[2, 3, 4], [6, 7, 8], [10, 11, 12], [14, 15, 16]]
every_other_row = arr[::2] # [[1, 2, 3, 4], [9, 10, 11, 12]]
every_other_col = arr[:, ::2] # [[1, 3], [5, 7], [9, 11], [13, 15]]
reverse_rows = arr[::-1] # [[13, 14, 15, 16], [9, 10, 11, 12], [5, 6, 7, 8], [1, 2, 3, 4]]
reverse_cols = arr[:, ::-1] # [[4, 3, 2, 1], [8, 7, 6, 5], [12, 11, 10, 9], [16, 15, 14, 13]]
Slicing a 3D Array
Slicing a 3D numpy array follows similar principles to slicing a 2D array, but you add an additional dimension
to the slicing. Here's a basic example to get you started:
Let's say we have the following 3D numpy array:
import numpy as np
arr = np.array([[[1, 2, 3],
[4, 5, 6],
[7, 8, 9]],
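Since the listing above is cut off, here is a self-contained sketch of 3D slicing. It assumes an array with 2 blocks of 3 rows and 3 columns holding the values 1 to 18, which may differ from the array the original example intended. Variable names are illustrative.

```python
import numpy as np

# Assumed array: 2 blocks, each with 3 rows and 3 columns (values 1 to 18)
arr3d = np.arange(1, 19).reshape(2, 3, 3)

first_block = arr3d[0]            # the whole first block, shape (3, 3)
row_of_each = arr3d[:, 1, :]      # second row of every block, shape (2, 3)
corner = arr3d[:, :2, :2]         # top-left 2 x 2 part of every block, shape (2, 2, 2)
single = arr3d[1, 2, 0]           # block 1, row 2, column 0

print(first_block.shape, row_of_each.shape, corner.shape)
print(row_of_each)
print(single)                     # 16
```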
Stacking ndarrays
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr3 = np.array([7, 8, 9])
# Stack arrays along a new axis
stacked_arr = np.stack((arr1, arr2, arr3), axis=0)
print("Stacked Array along a new axis (Axis 0):")
print(stacked_arr)
Following is the output obtained −
Stacked Array along a new axis (Axis 0):
[[1 2 3]
[4 5 6]
[7 8 9]]
Example: Changing the Axis
The "axis" parameter in numpy.stack() function determines where the new axis is inserted. By changing the
value of axis, you can control how the arrays are stacked −
import numpy as np
# arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr3 = np.array([7, 8, 9])
# Stack arrays along axis 1
stacked_arr = np.stack((arr1, arr2, arr3), axis=1)
print("Stacked Array along Axis 1:")
print(stacked_arr)
This will produce the following result −
Stacked Array along Axis 1:
[[1 4 7]
[2 5 8]
[3 6 9]]
Example: Stacking Multi-dimensional Arrays
The numpy.stack() function can also be used to stack multi-dimensional arrays. The function adds a new axis
to the higher-dimensional arrays and stacks them accordingly.
In here, we are stacking two 2D arrays −
import numpy as np
# 2D arrays
arr1 = np.array([[1, 2],
[3, 4]])
arr2 = np.array([[5, 6],
[7, 8]])
# Stack arrays along a new axis
stacked_arr = np.stack((arr1, arr2), axis=0)
print("Stacked 2D Arrays along a new axis (Axis 0):")
print(stacked_arr)
Output:
Stacked 2D Arrays along a new axis (Axis 0):
[[[1 2]
[3 4]]
[[5 6]
[7 8]]]
[[3 7]
[4 8]]]
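The last fragment above ends the way a stack of these two arrays along the last axis would. The code for it is not shown, so the following sketch simply stacks the same two 2D arrays along each possible axis for comparison. Treat the mapping to the original example as an assumption.

```python
import numpy as np

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])

# The new axis can be inserted at position 0, 1 or 2
for axis in (0, 1, 2):
    stacked = np.stack((arr1, arr2), axis=axis)
    print("axis =", axis, "shape =", stacked.shape)
    print(stacked)
```

All three results have shape (2, 2, 2); what changes is which elements end up next to each other.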
Concatenating ndarrays
Concatenating ndarrays refers to the process of joining multiple NumPy arrays along a specified axis. In
simpler terms, it's like sticking arrays together end-to-end. This can be done along different dimensions (axes)
of the arrays.
Here's a more detailed breakdown:
• Axis 0: Concatenation along rows (vertically). Think of stacking arrays on top of each other.
• Axis 1: Concatenation along columns (horizontally). Think of placing arrays side-by-side.
Here's a visual example:
1.Concatenating along the first axis (rows):
import numpy as np
# Creating two ndarrays
array1 = np.array([[1, 2], [3, 4]])
array2 = np.array([[5, 6]])
# Concatenating along the first axis (rows)
result = np.concatenate((array1, array2), axis=0)
print(result)
2.Concatenating along the second axis (columns):
import numpy as np
# Creating two ndarrays
array1 = np.array([[1, 2], [3, 4]])
array2 = np.array([[5, 6], [7, 8]])
# Concatenating along the second axis (columns)
result = np.concatenate((array1, array2), axis=1)
print(result)
3.Concatenating multiple ndarrays:
import numpy as np
# Creating multiple ndarrays
array1 = np.array([1, 2])
array2 = np.array([3, 4])
array3 = np.array([5, 6])
# Concatenating multiple ndarrays
result = np.concatenate((array1, array2, array3))
print(result)
You can use the axis parameter to control the axis along which the arrays will be joined. axis=0 means along
the rows, and axis=1 means along the columns.
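Concatenation only works when the arrays agree in every dimension except the one being joined. The sketch below reuses the shapes from the first example to show a case that fails and one way to make it fit. The use of .T here is an illustrative choice, not part of the original examples.

```python
import numpy as np

array1 = np.array([[1, 2], [3, 4]])   # shape (2, 2)
array2 = np.array([[5, 6]])           # shape (1, 2)

rows = np.concatenate((array1, array2), axis=0)
print(rows.shape)                     # row counts add up: (3, 2)

# Along axis 1 the row counts (2 and 1) would have to match, so this fails
try:
    np.concatenate((array1, array2), axis=1)
except ValueError as err:
    print("Cannot concatenate:", err)

# Transposing array2 to shape (2, 1) makes it fit as an extra column
cols = np.concatenate((array1, array2.T), axis=1)
print(cols)
```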
Broadcasting in NumPy
Broadcasting in NumPy is a powerful mechanism that allows you to perform operations on arrays of different
shapes in a way that would otherwise require you to manually expand their dimensions. It works by
"broadcasting" the smaller array across the larger array so that they have compatible shapes.
Here’s a simple example to illustrate the concept:
Suppose you have the following arrays:
Example 1:
import numpy as np
array1 = np.array([1, 2, 3])
array2 = np.array([[1], [2], [3]])
The shapes of array1 and array2 are (3,) and (3, 1) respectively.
When you perform an operation like addition, NumPy broadcasts both arrays to the common shape (3, 3):
result = array1 + array2
print(result)
Output:
[[2 3 4]
 [3 4 5]
 [4 5 6]]
Here's a step-by-step breakdown of broadcasting rules:
1. Align Shapes: Starting with the trailing dimensions, NumPy compares the dimensions of each array.
If the dimensions are equal, or one of them is 1, they are compatible.
2. Stretch to Match: If a dimension in one array is 1 while the corresponding dimension in the other
array is greater than 1, the array with the dimension of 1 is stretched to match the other array’s
dimension.
3. Apply Operation: Once the shapes are compatible, NumPy applies the operation element-wise.
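The rules above can be checked without doing any arithmetic. The sketch below assumes NumPy 1.20 or later, where np.broadcast_shapes is available. It reports the shape two arrays would broadcast to and shows an incompatible pair.

```python
import numpy as np

# Shapes are compared from the trailing dimension backwards
print(np.broadcast_shapes((3,), (3, 1)))   # (3, 3)
print(np.broadcast_shapes((2, 3), (3,)))   # (2, 3)

# broadcast_to shows the "stretched" view that NumPy uses internally
row = np.array([1, 2, 3])
stretched = np.broadcast_to(row, (2, 3))
print(stretched)

# Trailing dimensions 3 and 2 are neither equal nor 1, so this fails
compatible = True
try:
    np.broadcast_shapes((2, 3), (2,))
except ValueError:
    compatible = False
print("Compatible:", compatible)
```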
Example of broadcasting in practice:
• Scalar and Array:
import numpy as np
array = np.array([1, 2, 3])
scalar = 2
result = array + scalar # Broadcasting scalar to match the shape of the array
print(result)
# Output: [3 4 5]
• Two Arrays:
import numpy as np
array1 = np.array([[1, 2, 3], [4, 5, 6]])
array2 = np.array([1, 2, 3])
result = array1 + array2 # Broadcasting array2 to match the shape of array1
print(result)
# Output: [[2 4 6]
# [5 7 9]]
Broadcasting enables concise and efficient code, reducing the need for explicit loops and making operations
on arrays of differing shapes easier and more intuitive.
What is Pandas?
Pandas is a popular open-source library in Python that's essential for data manipulation and analysis. It
provides powerful tools for handling structured data such as tables, spreadsheets, or databases. The name
Pandas is derived from "panel data", an econometrics term for multidimensional data sets. In 2008,
developer Wes McKinney started developing pandas when he needed a high-performance, flexible tool for
data analysis.
Here are a few highlights of Pandas:
• Data Structures: It primarily uses two data structures—Series (1D) and DataFrame (2D)—to
organize and manipulate data.
• Data Cleaning: Pandas makes it easy to clean and preprocess messy datasets, including handling
missing or duplicate values.
• Data Operations: You can perform operations like filtering, merging, grouping, and aggregating
data with ease.
• File Handling: It supports importing/exporting data to various formats like CSV, Excel, SQL
databases, and more.
Applications of Pandas
• Data Cleaning
• Data Exploration
• Data Preparation
• Data Analysis
• Data Visualisation
• Time Series Analysis
• Data Aggregation and Grouping
• Data Input/Output
• Machine Learning
• Web Scraping
• Financial Analysis
• Text Data Analysis
• Experimental Data Analysis
Installation of Pandas
If you already have Python and pip installed on your system, installing Pandas is straightforward:
pip install pandas
Pandas also comes pre-installed with Anaconda. Either way, you can then import it in your Python environment:
import pandas as pd
Pandas Series
A Pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float,
Python objects, etc.). The axis labels are collectively called indexes.
Creating a Series
In practice, a Pandas Series is often created by loading data from existing storage (which can be a SQL
database, a CSV file, or an Excel file).
A Pandas Series can also be created directly from lists, dictionaries, NumPy arrays, scalar values, etc.
import pandas as pd
import numpy as np
data = np.array(['r', 'a', 'a', 'k', 'i'])
ser = pd.Series(data)
print(ser)
Additional Exercises
1. Create a simple Pandas Series from a list.
2. Return the first value of the series.
Create Labels
With the index argument, you can name your own labels.
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a, index = ["x", "y", "z"])
print(myvar)
import pandas as pd
ser=pd.Series(range(1,20,3), index=[x for x in 'abcdefg'])
print(ser)
Creating a Series from a Dictionary
A dictionary in Python stores data as key-value pairs. When we convert Dictionary into a Pandas Series the
keys become index labels and the values become the data. This method is useful for labeled data preserving
structure and enabling quick access.
import pandas as pd
data_dict = {'Pandas': 10, 'and': 20, 'NumPy': 30}
ser = pd.Series(data_dict)
print(ser)
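As a small sketch of the "quick access" mentioned above, the same dictionary-based Series can be read by label or by position (the variable names by_label and by_position are only illustrative):

```python
import pandas as pd

ser = pd.Series({'Pandas': 10, 'and': 20, 'NumPy': 30})

# Access by index label (the dictionary key)
by_label = ser['NumPy']
print(by_label)       # 30

# Access by integer position with iloc
by_position = ser.iloc[0]
print(by_position)    # 10

# The dictionary keys became the index labels
labels = list(ser.index)
print(labels)         # ['Pandas', 'and', 'NumPy']
```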
A Series is a one-dimensional labeled array that can hold any data type. It can store integers, strings,
floating-point numbers, etc. Each value in a Series is associated with a label (index), which can be an integer
or a string.
Name Steve
Age 35
Gender Male
Rating 3.5
import pandas as pd
data = ['Steve', 35, 'Male', 3.5]  # mixed types are stored with dtype object
series = pd.Series(data, index=['Name', 'Age', 'Gender', 'Rating'])
print(series)
import pandas as pd
# Load the CSV file and select the 'Duration' column (a column is already a Series)
df = pd.read_csv('C:/Users/Raakesh Kumar/OneDrive/Desktop/csv/data.csv')
ser = df['Duration']
# Show the first 10 values
data = ser.head(10)
print(data)
# Slice by position: the values at positions 3, 4 and 5 of those first 10
print(data.iloc[3:6])
Example 2:
import pandas as pd
data = pd.read_csv("C:/Users/Raakesh Kumar/OneDrive/Desktop/csv/nlist.csv")
# Drop rows that contain missing values
data.dropna(inplace = True)
# Type of the column before conversion (a pandas Series)
dtype_before = type(data["Salary"])
# Convert the Series to a plain Python list
salary_list = data["Salary"].tolist()
dtype_after = type(salary_list)
print("Data type before converting = {}\nData type after converting = {}"
.format(dtype_before, dtype_after))
print(salary_list)
DataFrame
A DataFrame is a two-dimensional labeled data structure with columns that can hold different data types. It
is similar to a table in a database or a spreadsheet. Consider the following data representing the performance
rating of a sales team.
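The original data table is not reproduced here, so the sketch below builds a small DataFrame from a dictionary of lists with made-up values. The column names Name, Age and Rating and all entries are assumptions chosen only so that the operations that follow (which use an Age column) have something to run on.

```python
import pandas as pd

# Hypothetical sample data: each key becomes a column, each list its values
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 28],
    'Rating': [4.2, 3.8, 4.5, 3.9],
}
df = pd.DataFrame(data)

print(df)          # the table, with a default integer index 0..3
print(df.shape)    # (4, 3): four rows, three columns
print(df.dtypes)   # each column keeps its own data type
```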
3. Filtering Data
# Filter rows where Age is greater than 28
filtered_df = df[df['Age'] > 28]
print(filtered_df)
5. Basic Statistics
# Calculate mean age
mean_age = df['Age'].mean()
print("Mean Age:", mean_age)
6. Reading/Writing Data
# Reading from a CSV file
df = pd.read_csv('C:/Users/Raakesh Kumar/OneDrive/Desktop/csv/nlist.csv')
# Writing to a CSV file
df.to_csv('C:/Users/Raakesh Kumar/OneDrive/Desktop/csv/output.csv', index=False)
7. Updating Data
# Update a specific value
df.at[0, 'Age'] = 26
print(df)
8. Dropping Rows or Columns
# Drop a column
df = df.drop('Age', axis=1)
# Drop a row
df = df.drop(1) # Removes the row with index 1
print(df)
9. Sorting Data
# Sort by the 'Number' column in descending order
sorted_df = df.sort_values(by='Number', ascending=False)
print(sorted_df)
14. concat() in DataFrames
The pd.concat() function in Pandas is used to concatenate or combine multiple DataFrames (or Series) along a
particular axis, either rows (axis=0) or columns (axis=1). It provides flexibility to combine data even
when the indices or columns don't align.
Basic Syntax
pd.concat(objs, axis=0, join='outer', ignore_index=False)
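A minimal sketch of how these parameters behave, using two small made-up DataFrames (df1 and df2 are illustrative names) whose columns only partly overlap:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'C': [7, 8]})

# Stack rows; join='outer' keeps all columns and fills the gaps with NaN
stacked = pd.concat([df1, df2], axis=0, join='outer', ignore_index=True)
print(stacked)

# join='inner' keeps only the columns present in every DataFrame
inner = pd.concat([df1, df2], axis=0, join='inner', ignore_index=True)
print(inner)

# axis=1 places the DataFrames side by side, aligned on the row index
side_by_side = pd.concat([df1, df2], axis=1)
print(side_by_side)
```

With ignore_index=True the result gets a fresh index 0..n-1 instead of repeating the original row labels.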
EXPERIMENT -7
In Pandas, you can fill NaN values with a string using the fillna() method. Here's a quick example:
import pandas as pd
import numpy as np
# Create a DataFrame with NaN values
data = {'Name': ['Alice', 'Bob', np.nan, 'David'],
'Age': [25, np.nan, 30, np.nan]}
df = pd.DataFrame(data)
# Fill NaN values with a string
df_filled = df.fillna('Unknown')
print(df_filled)
You can use the fillna() method with specific columns or the entire DataFrame. If you want to replace NaN
values in a particular column, you can do something like this:
import pandas as pd
import numpy as np
# Create a DataFrame with NaN values
data = {'Name': ['Alice', 'Bob', np.nan, 'David'],
'Age': [25, np.nan, 30, np.nan]}
df = pd.DataFrame(data)
df['Name'] = df['Name'].fillna('No Name')
print(df)
import pandas as pd
import numpy as np
# Create a DataFrame with NaN values
data = {'Name': ['Alice', 'Bob', np.nan, 'David'],
'Age': [25, np.nan, 30, np.nan]}
df = pd.DataFrame(data)
df.loc[df['Name'].isna(), 'Name'] = 'Condition-Based Name'
print(df)
Sorting DataFrames based on column values can be done using the sort_values() method. Here are some examples to
guide you:
Example 1: Sorting in Ascending Order
import pandas as pd
# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 28]}
df = pd.DataFrame(data)
# Sort by Age in ascending order
df_sorted = df.sort_values(by='Age')
print(df_sorted)
import pandas as pd
# Create a DataFrame
data = {'Category': ['A', 'B', 'A', 'B', 'A'],
'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)
# Group by 'Category' and calculate the sum of 'Values'
grouped = df.groupby('Category')['Values'].sum()
print(grouped)
Here, the data is grouped by the Category column, and the sum of Values within each group is calculated.
Text files
CSV files
Excel files
JSON files
Pandas offers convenient methods to read various file formats into a DataFrame. Below are examples for
each format:
1. Text Files
You can read text files using read_table() or read_csv() (if the text file is structured like a CSV):
import pandas as pd
# Read a text file
df_text = pd.read_table('C:/Users/Raakesh Kumar/OneDrive/Desktop/csv/nlist.txt')
print(df_text)
If your text file has delimiters, specify them using the sep parameter:
df_text = pd.read_csv('example.txt', sep='\t') # For tab-separated values
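To try the sep parameter without a file on disk, the sketch below reads tab-separated text from an in-memory buffer. The sample text is made up. read_table() uses a tab as its default separator, so both calls should give the same DataFrame.

```python
import io
import pandas as pd

# Hypothetical tab-separated content standing in for a .txt file
text = "Name\tAge\nAlice\t25\nBob\t30\n"

# read_csv needs the separator spelled out
df_tab = pd.read_csv(io.StringIO(text), sep='\t')

# read_table assumes tab-separated input by default
df_table = pd.read_table(io.StringIO(text))

print(df_tab)
print(df_tab.equals(df_table))   # True
```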
2. CSV Files
CSV files are easily handled with read_csv():
import pandas as pd
df_csv = pd.read_csv('C:/Users/Raakesh Kumar/OneDrive/Desktop/csv/nlist.csv')
print(df_csv)
3. Excel Files
Excel files can be read using read_excel():
import pandas as pd
df_excel = pd.read_excel('C:/Users/Raakesh Kumar/OneDrive/Desktop/csv/nlist.xlsx', sheet_name='nlist')
print(df_excel)
If the Excel file contains multiple sheets, specify the sheet name using the sheet_name parameter, or pass
sheet_name=None to load all sheets as a dictionary of DataFrames.
4. JSON Files
JSON files can be loaded using read_json():
import pandas as pd
df = pd.json_normalize(pd.read_json('C:/Users/Raakesh Kumar/OneDrive/Desktop/csv/colors.json')['colors'])
print(df)
Option - 2
import pandas as pd
# Read the JSON file
df = pd.read_json('C:/Users/Raakesh Kumar/OneDrive/Desktop/csv/colors.json', orient='records')
# Display the DataFrame
print(df)
Option -3
import pandas as pd
Experiment – 9
1. Pickle Files
Pickle files are binary files used to serialize and deserialize Python objects.
You can use the pickle module or Pandas' built-in methods for Pickle files.
Example: Using Pandas to Read and Write Pickle Files
import pandas as pd
# Save a DataFrame as a Pickle file
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df.to_pickle('C:/Users/Raakesh Kumar/OneDrive/Desktop/csv/example.pkl')
# Load the Pickle file
df_loaded = pd.read_pickle('C:/Users/Raakesh Kumar/OneDrive/Desktop/csv/example.pkl')
print(df_loaded)
2. Image Files Using PIL
To read image files, you can use the Python Imaging Library (PIL), which is
now maintained under the Pillow package.
Example: Reading and Displaying an Image
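The code for this example is missing from the notes, so what follows is only a sketch and assumes Pillow is installed (pip install pillow). To stay self-contained it first writes a small red test image to a temporary folder. In practice you would pass the path of your own image file to Image.open().

```python
import os
import tempfile
from PIL import Image

# Create a small sample image so the example does not depend on an existing file
path = os.path.join(tempfile.mkdtemp(), "sample.png")
Image.new("RGB", (64, 32), color=(255, 0, 0)).save(path)

# Open the image file and inspect its basic properties
with Image.open(path) as img:
    img.load()            # read the pixel data while the file is open
    size = img.size       # (width, height) in pixels
    mode = img.mode       # colour mode, e.g. "RGB"
    fmt = img.format      # file format detected by Pillow
    # img.show()          # uncomment to display it in the default image viewer

print(size, mode, fmt)
```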
Experiment -10
3. Scrapy:
• A powerful Python framework designed for large-scale scraping.
• It handles crawling, data extraction, and pipelines to store data
efficiently.
• Example: Building a scraper to collect job listings across multiple sites.
4. Puppeteer (JavaScript):
• A Node.js library for controlling headless Chrome browsers.
• Perfect for scraping content that requires JavaScript execution.
• Example: Gathering live sports scores from dynamic websites.
5. Octoparse:
• A no-code web scraping tool with a user-friendly interface.
• Great for non-programmers who want to extract data visually.
• Example: Scraping e-commerce product information.
6. Apify:
• A platform offering pre-built scrapers (actors) and tools for custom
scraping.
• You can deploy and run scrapers in the cloud.
• Example: Monitoring competitor prices online.
7. ParseHub:
• A visual scraping tool that works well with dynamic websites.
• Example: Collecting weather data from a regional website.
8. Requests (Python):
• Often paired with BeautifulSoup, it allows you to send HTTP requests to
fetch web pages.
• Example: Accessing the HTML content of a webpage for parsing.
Each tool has its strengths depending on your use case.
1. Simple Web Scraper (Python with BeautifulSoup):
You want to extract the titles of articles from a blog. Here's a Python
example using the BeautifulSoup library:
import requests
from bs4 import BeautifulSoup
url = "https://exampleblog.com"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
# Print the text of every <h2 class="entry-title"> element
for title in soup.find_all("h2", class_="entry-title"):
    print(title.text)
Output:
What is a Blog? A Simple Guide to Understanding Blogs
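The same find_all() call can be tried offline by parsing an HTML string instead of a downloaded page. The HTML snippet below is made up for illustration, and the sketch assumes the beautifulsoup4 package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical page content standing in for response.text
html = """
<html><body>
  <h2 class="entry-title">First post</h2>
  <h2 class="entry-title">Second post</h2>
  <h2 class="other">Not a title</h2>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Keep only <h2> elements whose class is "entry-title"
titles = [h2.get_text(strip=True)
          for h2 in soup.find_all("h2", class_="entry-title")]
print(titles)   # ['First post', 'Second post']
```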
2. Scraping E-commerce Data:
Suppose you want to get prices of items from an e-commerce site to
compare them. Using Python with Selenium (for dynamic pages):
Step 1: Install Selenium
Open your terminal or command prompt and run the following command
to install Selenium:
pip install selenium
Step 2: Check Installation
Once installed, verify it by opening a Python shell and running:
import selenium
print(selenium.__version__)
This should display the version number of Selenium.
Step 3: Install WebDriver
Selenium requires a WebDriver to interact with the browser. For example,
if you're using Chrome:
To install a WebDriver on Windows, follow these steps:
1. Identify Your Browser
Determine which browser you want to automate (e.g., Chrome, Edge,
Firefox).
2. Download the WebDriver
For Chrome: Download ChromeDriver from here. Ensure the version
matches your Chrome browser version.
https://www.selenium.dev/downloads/
It's important to note that web scraping must be done ethically and within
the legal boundaries. Many websites have terms of service that specify how
their data can be used, and ignoring these could lead to consequences.
import requests
from bs4 import BeautifulSoup
url = "https://www.naukri.com/bkpmg-health-solution-overview-4582789?tab=jobs&functionAreaIdGid=25&searchId=17442659254677956&src=orgCompanyListing/"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
# The class names below depend on the page's HTML and may need adjusting
for job in soup.find_all("div", class_="job-listing"):
    title = job.find("h2", class_="headJobs").text
    company = job.find("span", class_="company-page").text
    location = job.find("span", class_="location").text
    print(f"Title: {title}, Company: {company}, Location: {location}")
EXPERIMENT - 11
Perform following preprocessing techniques on loan prediction dataset
• Feature Scaling
• Feature Standardization
• Label Encoding
• One Hot Encoding
Feature scaling
Feature scaling is a technique for bringing the independent features present in
the data onto a comparable scale. It is performed during data pre-processing to
handle features whose values vary widely in magnitude. Without feature scaling, a
machine learning algorithm tends to treat numerically larger values as more
important and smaller values as less important, regardless of their units. For
example, it would treat 10 m and 10 cm as the same value, because it sees only
the number 10. Here we will learn about different techniques which are used to
perform feature scaling.
Why is Feature Scaling Important?
• Improves Model Performance: Features with larger values can
disproportionately impact model training. Scaling brings all features to
comparable ranges.
• Speeds Up Optimization: Gradient descent converges faster when features
are scaled.
• Handles Units Differences: Features measured in different units (e.g.,
income in dollars vs. age in years) need to be standardized to avoid bias.
Types of Feature Scaling:
1. Standardization: Scales features to have a mean of 0 and a standard
deviation of 1.
2. Normalization: Scales features to a range, like [0, 1], or makes the feature
vector length 1 (L2 norm).
3. Min-Max Scaling: Normalizes features to a fixed range, typically [0, 1].
4. Robust Scaling: Uses the median and interquartile range, making it less
sensitive to outliers.
5. MaxAbs Scaling: Divides each feature by its maximum absolute value.
import pandas as pd
df = pd.read_csv('C:/Users/Raakesh Kumar/OneDrive/Desktop/csv/sampleFile.csv')
print(df.head())
Now let’s apply the first method, absolute maximum scaling. First, we find the
maximum absolute value of each column.
# Max abs scaling: maximum absolute value of each column
max_vals = df.abs().max()
print(max_vals)
Now we divide each entry by the maximum absolute value of its column, so that
every scaled value lies between -1 and 1.
print(df / max_vals)
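As a cross-check, the sketch below applies MaxAbs scaling as defined in the list of scaling types (each feature divided by its maximum absolute value) to a small made-up DataFrame and compares the result with scikit-learn's MaxAbsScaler. The name df_demo and its values are assumptions for illustration only.

```python
import pandas as pd
from sklearn.preprocessing import MaxAbsScaler

# Hypothetical numeric data, including a negative value
df_demo = pd.DataFrame({'x': [-4.0, 2.0, 1.0],
                        'y': [10.0, 50.0, 100.0]})

# Manual version: divide each column by its largest absolute value
manual = df_demo / df_demo.abs().max()
print(manual)

# scikit-learn version of the same transformation
scaled = pd.DataFrame(MaxAbsScaler().fit_transform(df_demo),
                      columns=df_demo.columns)
print(scaled)
```

Both results lie in the range -1 to 1, and the sign of each value is preserved.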
2. Min-Max Scaling
This method of scaling requires the following two steps:
1. First, we find the minimum and the maximum value of the column.
2. Then we subtract the minimum value from each entry and divide the
result by the difference between the maximum and the minimum value.
$$X_{scaled} = \frac{X_i - X_{min}}{X_{max} - X_{min}}$$
Because this method uses the maximum and the minimum value, it is also sensitive
to outliers, but after the two steps above the data will lie in the range
0 to 1.
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(df)
scaled_df = pd.DataFrame(scaled_data,
columns=df.columns)
z = scaled_df.head()
print(z)
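The two steps can also be written out by hand and compared with MinMaxScaler. This is a minimal sketch on made-up data (df_demo is an illustrative name, not the dataset loaded above).

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical numeric data
df_demo = pd.DataFrame({'a': [10.0, 20.0, 40.0],
                        'b': [1.0, 3.0, 5.0]})

# Step 1: column-wise minimum and maximum
col_min = df_demo.min()
col_max = df_demo.max()

# Step 2: subtract the minimum, divide by (maximum - minimum)
manual = (df_demo - col_min) / (col_max - col_min)
print(manual)

# The scikit-learn scaler should produce the same numbers
sk = pd.DataFrame(MinMaxScaler().fit_transform(df_demo),
                  columns=df_demo.columns)
print(sk)
```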
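3. Normalization
This method is more or less the same as the previous method, but instead of
the minimum value we subtract the mean value of the column from each entry,
and then divide the result by the difference between the maximum and the
minimum value.
$$X_{scaled} = \frac{X_i - X_{mean}}{X_{max} - X_{min}}$$
Note that scikit-learn's Normalizer, used below, performs a different kind of
normalization: it rescales each row (sample) to unit norm, using the L2 norm by
default.
from sklearn.preprocessing import Normalizer
scaler = Normalizer()
scaled_data = scaler.fit_transform(df)
scaled_df = pd.DataFrame(scaled_data,
columns=df.columns)
print(scaled_df.head())
4. Standardization
Standardization subtracts the mean from each entry and divides the result by the
standard deviation σ, so that each feature has a mean of 0 and a standard
deviation of 1.
$$X_{scaled} = \frac{X_i - X_{mean}}{\sigma}$$
Example 2: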
from sklearn.preprocessing import StandardScaler
import pandas as pd
# Example dataset
data = {
'Feature1': [10, 20, 30, 40, 50],
'Feature2': [100, 200, 300, 400, 500],
'Feature3': [1000, 2000, 3000, 4000, 5000]
}
df = pd.DataFrame(data)
# Initialize the scaler
scaler = StandardScaler()
# Standardize features
standardized_features = scaler.fit_transform(df)
# Convert the numpy array back to a DataFrame for better readability
standardized_df = pd.DataFrame(standardized_features,
columns=df.columns)
print("Standardized Data:")
print(standardized_df)
5. Robust Scaling
In this method of scaling, we use two main statistical measures of the data.
• Median
• Inter-Quartile Range
After calculating these two values we are supposed to subtract the median from
each entry and then divide the result by the interquartile range.
$$X_{scaled} = \frac{X_i - X_{median}}{IQR}$$
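A short sketch of robust scaling on made-up data with one large outlier. It computes the median and interquartile range by hand and compares the result with scikit-learn's RobustScaler, assuming its default settings (centering on the median, scaling by the 25th to 75th percentile range).

```python
import pandas as pd
from sklearn.preprocessing import RobustScaler

# Hypothetical data: the value 100 is an outlier
df_demo = pd.DataFrame({'v': [1.0, 2.0, 3.0, 4.0, 100.0]})

# Median and interquartile range (Q3 - Q1) of the column
median = df_demo.median()
iqr = df_demo.quantile(0.75) - df_demo.quantile(0.25)

# Subtract the median, then divide by the IQR
manual = (df_demo - median) / iqr
print(manual)

# scikit-learn's RobustScaler with default settings
sk = RobustScaler().fit_transform(df_demo)
print(sk)
```

The outlier stays far from the rest, but it barely affects how the other four values are scaled, because the median and IQR ignore the extremes.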
Label Encoding
Label encoding is a technique used to convert categorical variables into
numerical format, which is essential for many machine learning models. Here's a
Python example using scikit-learn:
Example:
from sklearn.preprocessing import LabelEncoder
import pandas as pd
# Example dataset
data = {
'Category': ['Apple', 'Banana', 'Apple', 'Orange', 'Banana']
}
df = pd.DataFrame(data)
# Initialize the label encoder
label_encoder = LabelEncoder()
# Perform label encoding
df['Category_encoded'] = label_encoder.fit_transform(df['Category'])
print("Original Data:")
print(df)
print("\nMapping of Encoded Labels:")
# classes_ is sorted, and each class's position is its encoded value
for encoded_value, category in enumerate(label_encoder.classes_):
    print(f"{category} -> {encoded_value}")
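This code will transform the categorical values (Apple, Banana, Orange) into
numerical labels (0, 1, 2). Note that LabelEncoder assigns integers in sorted
(alphabetical) order of the class names, not by any meaningful ranking, so a model
may read an order into the codes that does not exist. For categories without an
inherent ranking, one-hot encoding is often the safer choice.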
1. Encoding Multiple Categorical Columns:
from sklearn.preprocessing import LabelEncoder
import pandas as pd
# Example dataset with multiple categorical columns
data = {
'Color': ['Red', 'Blue', 'Green', 'Red'],
'Size': ['Small', 'Large', 'Medium', 'Small']
}
df = pd.DataFrame(data)
# Initialize the label encoder
label_encoder = LabelEncoder()
# Apply label encoding for each column
# The encoder is refit for every column, so classes_ only reflects the last one
for column in list(df.columns):
    df[column + '_encoded'] = label_encoder.fit_transform(df[column])
print("Original Data:")
print(df)
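One Hot Encoding
One-hot encoding replaces a categorical column with one binary column per category. The notes do not include the code for this step, so the following is only a sketch that uses pandas' get_dummies() on a made-up Animal column. The DataFrame name df_animals and its values are assumptions.

```python
import pandas as pd

# Hypothetical categorical data
df_animals = pd.DataFrame({'Animal': ['Cat', 'Dog', 'Bird', 'Cat']})

# Create one 0/1 column per unique value, prefixed with the column name
one_hot = pd.get_dummies(df_animals, columns=['Animal'], dtype=int)
print(one_hot)
```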
This code creates binary columns for each unique animal, like Animal_Cat,
Animal_Dog, and Animal_Bird.