
Advance Python Program Unit II

This document provides an introduction to NumPy, a Python library for scientific computing that supports large, multi-dimensional arrays and matrices, along with high-level mathematical functions. It covers features, installation, array creation, and various types of arrays including 0-D, 1-D, 2-D, and 3-D arrays, as well as special functions for generating arrays like np.zeros(), np.ones(), and random number generation. Additionally, it explains how to create identity and diagonal matrices, making it a comprehensive guide for using NumPy in scientific computing.

Uploaded by

Abhi Bunny

Unit – II

Scientific Computing with Numpy and Pandas


Applications of Python-NumPy & Pandas
NumPy Introduction
• NumPy is an open-source Python library (package) that provides support for large,
multi-dimensional arrays and matrices. It also has a collection of high-level mathematical
functions to operate on these arrays.
• NumPy was created in 2005 by Travis Oliphant.

Features of NumPy

• Provides powerful, multidimensional array objects (ndarrays).


• Supports advanced slicing and indexing techniques.
• Reduces the need for explicit loops with efficient array operations.
• Easily integrates with other libraries like pandas, matplotlib, and SciPy.
• Useful linear algebra, Fourier transform, and random number capabilities

Install Python NumPy

Install it using this command:

pip install numpy

Import NumPy

Once NumPy is installed, import it in your application with an import statement, either under its full name or under the conventional alias np:

import numpy

import numpy as np

Note: np - alias name

Now the NumPy package can be referred to as np instead of numpy.

To check NumPy version

Method – 1:

To find out the version of NumPy you are using, import the module (if not already imported)
and print numpy.__version__ (np.__version__ when using the alias).

Example:
import numpy as np

print("My numpy version is: ", np.__version__)

Output: My numpy version is: 1.26.4

Method – 2:
Using pip show to check the version of NumPy
Syntax: pip show <package_name>
Example:

pip show numpy


Output: (package metadata for the installed NumPy, including its Name and Version fields)

Arrays in NumPy
• NumPy’s main object is the homogeneous multidimensional array.
• Arrays are a collection of the same type of elements/values that can have one or more
dimensions.
• A one-dimensional array is called a vector, and a two-dimensional array is called a matrix.
• In NumPy, dimensions are called axes. The number of axes is called the rank and is available as the ndim attribute.
• NumPy arrays are called ndarray or N-dimensional arrays.
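The homogeneity described above can be seen in a minimal sketch (the variable names are illustrative, not from the original notes): when the input mixes integers and floats, NumPy stores every element using one common type.

```python
import numpy as np

ints = np.array([1, 2, 3])       # all integers
mixed = np.array([1, 2.5, 3])    # integers and a float together

# Every element of an ndarray shares a single dtype.
print(ints.dtype)    # an integer type (exact width depends on the platform)
print(mixed.dtype)   # float64: the integers were converted to floats
print(mixed)
```
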
Creating a NumPy Array
To create an ndarray, we can pass a list, tuple or any array-like object to the np.array() function,
and it will be converted into an ndarray.
Dimensions in Arrays
A dimension in arrays is one level of array depth (nested arrays)
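As a quick illustration of "one level of nesting per dimension", the sketch below builds arrays of increasing depth and prints the ndim and shape attributes of each. The variable names are only illustrative.

```python
import numpy as np

scalar = np.array(10)                                    # no nesting
vector = np.array([1, 2, 3])                             # one level
matrix = np.array([[1, 2, 3], [4, 5, 6]])                # two levels
cube = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])    # three levels

# ndim counts the levels of nesting; shape gives the size of each level.
for arr in (scalar, vector, matrix, cube):
    print(arr.ndim, arr.shape)
```
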

0-D Arrays

0-D arrays, or Scalars, are the elements in an array. Each value in an array is a 0-D array.
Example
import numpy as np
arr = np.array(10)
print(arr)

Output:
10
1-D Arrays
• An array that has 0-D arrays as its elements is called uni-dimensional or 1-D array.
• These are the most common and basic arrays.
Example - 1: Creating 1-D array using list and array()
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)

Output:
[1 2 3 4 5]
Example - 2: Creating 1-D array using tuple and array()
import numpy as np
arr = np.array((1, 2, 3, 4, 5))
print(arr)

Output:
[1 2 3 4 5]
Example – 3: Creating 1-D array using list and asarray()
import numpy as np

values = [10,20,30,40]  # named "values" so the built-in list is not shadowed
arr = np.asarray(values)
print(arr)
Output:
[10 20 30 40]

Example – 4: Creating 1-D array using tuple and asarray()

import numpy as np

values = (10,20,30,40)  # named "values" so the built-in tuple is not shadowed
arr = np.asarray(values)
print(arr)
Output:
[10 20 30 40]
Example – 5: Creating 1-D array using loop and ndarray()
import numpy as np
n = int(input("Enter Size :"))
arr = np.ndarray(shape=(n),dtype=int)

print("Enter %d elements :" %n)


for i in range(n):
    arr[i] = int(input())
print("Array elements :",arr)
Output:
Enter Size :5
Enter 5 elements :
1
2
3
4
5
Array elements : [1 2 3 4 5]

Initialize a Python NumPy Array Using Special Functions


NumPy provides several built-in functions to generate arrays with specific properties.
• np.empty(): Creates an array without initializing its values. It’s like a blank page,
ready to be filled with data later. Until you assign to it, it contains whatever arbitrary
values happen to be in memory.
• np.zeros(): Creates an array filled with zeros.
• np.ones(): Creates an array filled with ones.
• np.full(): Creates an array filled with a specified value.
• np.arange(): Creates an array of evenly spaced values from start up to, but not including, stop, using a given step.
• np.linspace(): Creates an array of a given number of evenly spaced values over an interval, including the stop value by default.

Empty array:

Example - 1:
import numpy as np

empty_array = np.empty(0)
print("Empty Array:\n", empty_array)
Output:

Empty Array:
[]

Example - 2:

import numpy as np

empty_array = np.empty(5)
print("Empty Array:\n", empty_array)
Output:
Empty Array:
[6.23042070e-307 4.67296746e-307 1.69121096e-306 9.40672775e-312
3.56175136e-317]

Array of zeros

Example - 1:

import numpy as np

zeros_array = np.zeros(5)
print("zeros_array\n",zeros_array)
Output:
zeros_array
[0. 0. 0. 0. 0.]

Example - 2:

import numpy as np

zeros_array = np.zeros(5,dtype=int)
print("zeros_array\n",zeros_array)
Output:
zeros_array
[0 0 0 0 0]

Array of ones

Example – 1:

import numpy as np

ones_array = np.ones(5)
print("ones_array\n",ones_array)
Output:
ones_array
[1. 1. 1. 1. 1.]

Example – 2:

import numpy as np

ones_array = np.ones(5,dtype=int)
print("ones_array\n",ones_array)
Output:
ones_array
[1 1 1 1 1]

An array of your choice

Example:

import numpy as np

constant_array = np.full(5,7)
print("constant_array\n",constant_array)
Output:
constant_array
[7 7 7 7 7]

Evenly spaced ndarray / Creating Arrays with a Range of Values


Example - 1:

import numpy as np

range_array = np.arange(0, 10, 2) # start, stop, step


print("Range Array:\n",range_array)
Output:
Range Array:
[0 2 4 6 8]

Example - 2:

import numpy as np

range_array = np.arange(1, 10, 1) # start, stop, step


print("Range Array:\n",range_array)
Output:
Range Array:
[1 2 3 4 5 6 7 8 9]

Example - 3:

import numpy as np

range_array = np.arange(1, 10, 2) # start, stop, step


print("Range Array:\n",range_array)
Output:
Range Array:
[1 3 5 7 9]

Example - 4:

import numpy as np

linspace_array = np.linspace(1, 10, 10) # start, stop, num


print("Linspace Array:\n",linspace_array)
Output:
Linspace Array:
[ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]

Example - 5:

import numpy as np

linspace_array = np.linspace(1, 10, 10,dtype=int) # start, stop, num


print("Linspace Array:\n",linspace_array)
Output:
Linspace Array:
[ 1 2 3 4 5 6 7 8 9 10]
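The examples above differ in how the end of the range is treated. The sketch below (values chosen only for illustration) puts np.arange() and np.linspace() side by side; the endpoint parameter of np.linspace() is part of NumPy but is not used elsewhere in these notes.

```python
import numpy as np

# arange: the third argument is the step; the stop value (1) is excluded.
a = np.arange(0, 1, 0.25)                  # 0, 0.25, 0.5, 0.75

# linspace: the third argument is the number of values; stop is included.
b = np.linspace(0, 1, 5)                   # 0, 0.25, 0.5, 0.75, 1

# linspace can also exclude the stop value.
c = np.linspace(0, 1, 4, endpoint=False)   # 0, 0.25, 0.5, 0.75

print(a)
print(b)
print(c)
```
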

Random numbers in ndarray

NumPy also has functions for generating arrays with random values, useful for simulations and
testing.

• Random Float Array : np.random.rand() function generates an array of random values drawn
uniformly from the interval [0, 1), so 0 can occur but 1 cannot.

Example:
import numpy as np

arr = np.random.rand(5)
print(arr)
Output:

[0.83366232 0.78974204 0.29426745 0.02493498 0.9331421 ]

• Random Integers : If we need random integers, we can use np.random.randint() to create arrays
with integer values from low (inclusive) to high (exclusive). For example, np.random.randint(1, 10, size=6) returns values from 1 to 9.

Example:

import numpy as np

arr = np.random.randint(1,10,size=6)
print(arr)
Output:

[6 3 7 6 2 6]
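Because the values are random, the outputs shown above will differ on every run. When repeatable results are needed (for example in tests), a seed can be fixed. The sketch below uses NumPy's newer Generator interface, np.random.default_rng(), rather than the np.random.rand() and np.random.randint() functions used in these notes; the seed value 42 is arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence.
rng1 = np.random.default_rng(seed=42)
rng2 = np.random.default_rng(seed=42)

floats1 = rng1.random(5)                 # floats in [0, 1)
floats2 = rng2.random(5)

ints1 = rng1.integers(1, 10, size=6)     # integers from 1 to 9 (10 excluded)
ints2 = rng2.integers(1, 10, size=6)

print(np.array_equal(floats1, floats2))  # True
print(np.array_equal(ints1, ints2))      # True
```
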

2-D Arrays

• An array that has 1-D arrays as its elements is called a 2-D array.
• These are often used to represent matrices.
Example - 1: Creating 2-D array using list and array()
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)

Output:
[[1 2 3]
[4 5 6]]

Example – 2: Creating 2-D array using list and asarray()


import numpy as np

arr = np.asarray([[1, 2, 3], [4, 5, 6]])


print(arr)

Output:
[[1 2 3]
[4 5 6]]

Example – 3: Creating 2-D array using loop and ndarray()


import numpy as np

r = int(input("Enter rowSize : "))


c = int(input("Enter colSize : "))

arr = np.ndarray(shape=(r,c),dtype=int)

print("Enter %d elements :" %(r*c))

for i in range(r):
    for j in range(c):
        arr[i][j] = int(input())

print("Array elements:\n",arr)

Output:
Enter rowSize : 2
Enter colSize : 2
Enter 4 elements :
1
2
3
4
Array elements:
[[1 2]
[3 4]]
Empty array:
Example - 1:
import numpy as np

empty_array = np.empty([0,0])
print("Empty Array:\n", empty_array)
Output:

Empty Array:
[]

Example - 2:
import numpy as np
empty_array = np.empty([2,2])
print("Empty Array:\n", empty_array)
Output:
Empty Array:
[[0. 0.]
[0. 0.]]
Array of zeros
Example - 1:
import numpy as np

zeros_array = np.zeros([2,2])
print("zeros_array\n",zeros_array)
Output:
zeros_array
[[0. 0.]
[0. 0.]]
Example - 2:
import numpy as np

zeros_array = np.zeros([2,2],dtype=int)
print("zeros_array\n",zeros_array)
Output:
zeros_array
[[0 0]
[0 0]]
Array of ones

Example – 1:

import numpy as np

ones_array = np.ones([2,3])
print("ones_array\n",ones_array)
Output:
ones_array
[[1. 1. 1.]
[1. 1. 1.]]
Example – 2:

import numpy as np

ones_array = np.ones([2,3],dtype=int)
print("ones_array\n",ones_array)
Output:
ones_array
[[1 1 1]
[1 1 1]]

An array of your choice

Example:

import numpy as np

constant_array = np.full([3,3],7)
print("constant_array\n",constant_array)
Output:
constant_array
[[7 7 7]
[7 7 7]
[7 7 7]]
Random numbers in ndarray
NumPy also has functions for generating arrays with random values, useful for simulations and
testing.
• Random Float Array : np.random.rand() function generates an array of random values drawn
uniformly from the interval [0, 1).

Example:
import numpy as np

arr_rand = np.random.rand(2, 3)
print(arr_rand)
Output:
[[0.01466307 0.9677552 0.25104995]
[0.66652019 0.27030812 0.91041412]]

• Random Integers : If we need random integers, we can use np.random.randint() to create arrays
with integer values from low (inclusive) to high (exclusive).

Example:
import numpy as np

arr_int = np.random.randint(1, 10, size=(3, 3))


print(arr_int)
Output:
[[1 4 2]
[8 3 2]
[7 4 4]]

Identity and Diagonal Matrices in NumPy


NumPy also provides functions for creating identity matrices and diagonal matrices, which are often
used in linear algebra.

• Identity Matrix : np.eye() function creates an identity matrix, a square matrix with ones on the
diagonal and zeros elsewhere.

Example:
import numpy as np

identity_matrix = np.eye(3)
print(identity_matrix)
Output:
[[1. 0. 0.]
[0. 1. 0.]
[0. 0. 1.]]
• Diagonal Matrix : Use np.diag() to create a diagonal matrix, where the provided array elements
form the diagonal.

Example:
import numpy as np

diag_matrix = np.diag([1, 2, 3])


print(diag_matrix)

Output:
[[1 0 0]
[0 2 0]
[0 0 3]]
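Both functions accept a few more arguments than the examples above use. The following sketch (array contents are illustrative) shows the k offset and the rectangular form of np.eye(), and shows that np.diag() extracts the diagonal when it is given a 2-D array instead of a 1-D one.

```python
import numpy as np

# k=1 places the ones on the diagonal just above the main one.
upper = np.eye(3, k=1, dtype=int)

# Two size arguments give a rectangular matrix (2 rows, 3 columns).
rect = np.eye(2, 3, dtype=int)

# Given a 2-D array, np.diag() returns its main diagonal as a 1-D array.
m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
d = np.diag(m)

print(upper)
print(rect)
print(d)
```
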

3-D arrays
An array that has 2-D arrays (matrices) as its elements is called 3-D array.
Example - 1: Creating 3-D array using list and array()
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(arr)

Output:
[[[1 2 3]
[4 5 6]]
[[7 8 9]
[10 11 12]]]

Example - 2: Creating 3-D array using loop and ndarray()


import numpy as np
n = int(input("Enter No of matrix : "))
r = int(input("Enter rowSize : "))
c = int(input("Enter colSize : "))
arr = np.ndarray(shape=(n,r,c),dtype=int)
print("Enter %d elements :" %(n*r*c))
for i in range(n):
    for j in range(r):
        for k in range(c):
            arr[i][j][k] = int(input())

print("Array elements:\n",arr)

Output:

Enter No of matrix : 2
Enter rowSize : 2
Enter colSize : 2
Enter 8 elements :
1
2
3
4
5
6
7
8
Array elements:
[[[1 2]
[3 4]]

[[5 6]
[7 8]]]

Empty array:

Example - 1:
import numpy as np

empty_array = np.empty([0,0,0])
print("Empty Array:\n", empty_array)
Output:

Empty Array:
[]

Example - 2:
import numpy as np
empty_array = np.empty([2,2,2])
print("Empty Array:\n", empty_array)
Output:
Empty Array:
[[[0.1724733 0.21789787]
[0.02467395 0.06510989]]

[[0.13597087 0.80837594]
[0.59047233 0.55820254]]]
Array of zeros
Example - 1:
import numpy as np

zeros_array = np.zeros([2,2,2])
print("zeros_array\n",zeros_array)
Output:
zeros_array
[[[0. 0.]
[0. 0.]]

[[0. 0.]
[0. 0.]]]

Example - 2:
import numpy as np

zeros_array = np.zeros([2,2,2],dtype=int)
print("zeros_array\n",zeros_array)
Output:
zeros_array
[[[0 0]
[0 0]]

[[0 0]
[0 0]]]

Array of ones

Example – 1:

import numpy as np

ones_array = np.ones([2,2,3])
print("ones_array\n",ones_array)
Output:
ones_array
[[[1. 1. 1.]
[1. 1. 1.]]

[[1. 1. 1.]
[1. 1. 1.]]]

Example – 2:

import numpy as np

ones_array = np.ones([2,2,3],dtype=int)
print("ones_array\n",ones_array)
Output:
ones_array
[[[1 1 1]
[1 1 1]]

[[1 1 1]
[1 1 1]]]

An array of your choice

Example:

import numpy as np

constant_array = np.full([2,2,2],7)
print("constant_array\n",constant_array)
Output:
constant_array
[[[7 7]
[7 7]]

[[7 7]
[7 7]]]

Random numbers in ndarray


NumPy also has functions for generating arrays with random values, useful for simulations and
testing.
• Random Float Array : np.random.rand() function generates an array of random values drawn
uniformly from the interval [0, 1).

Example:
import numpy as np
arr_rand = np.random.rand(2,2,2)
print(arr_rand)
Output:
[[[0.33797995 0.76896482]
[0.27211375 0.85260367]]

[[0.0864397 0.49337954]
[0.45241024 0.14987361]]]

• Random Integers : If we need random integers, we can use np.random.randint() to create arrays
with integer values from low (inclusive) to high (exclusive).

Example:

import numpy as np

arr_int = np.random.randint(1, 10, size=(2,2,2))


print(arr_int)
Output:
[[[1 2]
[2 2]]

[[9 8]
[4 9]]]

Experiment -2
The Shape and Reshaping of NumPy Array

Shape of an Array

The shape of an array is the number of elements in each dimension. The shape of a NumPy array is a tuple
of integers. Each integer in the tuple represents the size of the array along a particular dimension or axis. For
example, an array with shape (3, 4) has 3 rows and 4 columns.
• For a 2D array, the shape is a tuple with two elements: number of rows, number of columns.
• For a 3D array, the shape is a tuple with three elements: depth, number of rows, number of
columns.

Get the Shape of an Array

NumPy arrays have an attribute called shape that returns a tuple whose entries give the number of
elements along each dimension.
Shape of an Array using 1D
In NumPy, the shape attribute also works with 1D arrays. A 1D array is essentially a list of elements. Here’s
an example to illustrate:
Example 1:
import numpy as np
# Creating a 1D array
array = np.array([1, 2, 3, 4, 5])
# Getting the shape of the 1D array
print(array.shape)
Output:
(5,)
Shape of an Array using 2D
Example 2:
import numpy as np
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(arr.shape)
Output:
(2, 4)
Shape of an Array using 3D
In NumPy, working with 3D arrays is quite common, especially in fields like image processing and scientific
computing. Just like with 2D arrays, you can use the shape attribute to get the dimensions of a 3D array.
Example 3:
import numpy as np
array = np.array([[[1, 2, 3], [4, 5, 6]],
[[7, 8, 9], [10, 11, 12]]])
print(array.shape)
Output:
(2, 2, 3)
The array has 2 blocks, each containing 2 rows and 3 columns. Essentially, it’s a collection of 2D arrays
stacked together.
Accessing Array Shape

You can access the shape of a NumPy array using the shape attribute. This attribute returns a tuple of
integers, each representing the size of the array along a particular dimension.

1D Array

A 1D array is essentially a simple list of elements.


Example 4:
import numpy as np
array_1d = np.array([1, 2, 3, 4, 5])

print("Shape of the array:", array_1d.shape)

print("Number of dimensions:", array_1d.ndim)

print("Total number of elements:", array_1d.size)

Output:
Shape of the array: (5,)
Number of dimensions: 1
Total number of elements: 5

2D Array

A 2D array can be thought of as a matrix with rows and columns.

Example 5:
import numpy as np
array_2d = np.array([[1, 2, 3], [4, 5, 6]])

print("Shape of the array:", array_2d.shape)

print("Number of dimensions:", array_2d.ndim)

print("Total number of elements:", array_2d.size)


Output:
Shape of the array: (2, 3)
Number of dimensions: 2
Total number of elements: 6

3D Array

A 3D array can be thought of as a stack of 2D matrices.

Example 6:

import numpy as np
array3 = np.array([[[1, 2, 3], [4, 5, 6]],
[[7, 8, 9], [10, 11, 12]]])

print("Shape of the array:", array3.shape)


print("Number of dimensions:", array3.ndim)

print("Total number of elements:", array3.size)


Output:
Shape of the array: (2, 2, 3)
Number of dimensions: 3
Total number of elements: 12

Reshaping the NumPy Array


Reshaping arrays in NumPy is a powerful feature that allows you to change the dimensions of an array without
modifying its data. You can use the reshape function to transform 1D arrays into 2D or 3D, and vice versa.
Reshaping a 1D array to a 2D array:
Converting a single-dimensional array into a multi-dimensional one is a common task in NumPy. You can use
the reshape function to accomplish this.
Example 1:
import numpy as np
# Create a 1D array
array_1d = np.array([1, 2, 3, 4, 5, 6])
# Reshape to a 2D array with 2 rows and 3 columns
array_2d = array_1d.reshape(2, 3)
print(array_2d)
Output:
[[1 2 3]
[4 5 6]]

Reshaping a 1D array to a 3D array:


Example 2:
import numpy as np
# Create a 1D array
array_1d = np.array([1, 2, 3, 4, 5, 6])
# Reshape to a 3D array with 2 blocks, each containing 1 row and 3 columns
array_3d = array_1d.reshape(2, 1, 3)
print(array_3d)
Output:
[[[1 2 3]]
[[4 5 6]]]

Reshaping 2d into 3d NumPy array


Reshaping a 2D array into a 3D array in NumPy is achievable using the reshape function. As with any
reshaping operation, the total number of elements must remain the same before and after reshaping.
Example 3:
import numpy as np
# Create a 2D array
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
# Reshape to a 3D array (for example, with 2 blocks, each containing 1 row and 3 columns)
array_3d = array_2d.reshape(2, 1, 3)
print(array_3d)
Output:
[[[1 2 3]]
[[4 5 6]]]
In this case, we reshaped the 2D array into a 3D array with 2 blocks, each containing 1 row and 3 columns.
You can adjust the dimensions as needed, as long as the total number of elements is consistent.
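To make the "same number of elements" rule concrete, here is a small sketch showing one reshape that succeeds and one that NumPy rejects with a ValueError. The target shapes are chosen only for illustration.

```python
import numpy as np

array_2d = np.array([[1, 2, 3], [4, 5, 6]])    # 2 * 3 = 6 elements

# 3 * 2 = 6, so this reshape is allowed.
ok = array_2d.reshape(3, 2)
print(ok.shape)

# 4 * 2 = 8 does not match 6, so NumPy raises ValueError.
try:
    array_2d.reshape(4, 2)
    failed = False
except ValueError as exc:
    failed = True
    print("reshape failed:", exc)
```
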

Reshaping a 1D Array to 2D and 3D


Example 4:
import numpy as np
array_1d = np.array([1, 2, 3, 4, 5, 6])
array_2d = array_1d.reshape(2, 3)
print("Reshaped to 2D array:")
print(array_2d)
array_3d = array_1d.reshape(2, 1, 3)
print("\nReshaped to 3D array:")
print(array_3d)
Output:
Reshaped to 2D array:
[[1 2 3]
[4 5 6]]
Reshaped to 3D array:
[[[1 2 3]]
[[4 5 6]]]
Reshaping a 2D Array to 1D and 3D
Example 5:
import numpy as np
# Creating a 2D array
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
# Reshaping 2D array to 1D array
array_1d = array_2d.reshape(6)
print("Reshaped to 1D array:")
print(array_1d)
# Reshaping 2D array to 3D array
array_3d = array_2d.reshape(2, 3, 1)
print("\nReshaped to 3D array:")
print(array_3d)

Output:
Reshaped to 1D array:
[1 2 3 4 5 6]

Reshaped to 3D array:
[[[1]
  [2]
  [3]]

 [[4]
  [5]
  [6]]]

Reshaping a 3D Array to 1D and 2D

Example 6:
import numpy as np
# Creating a 3D array
array_3d = np.array([[[1, 2, 3], [4, 5, 6]],
                     [[7, 8, 9], [10, 11, 12]]])
# Reshaping 3D array to 1D array
array_1d = array_3d.reshape(12)
print("Reshaped to 1D array:")
print(array_1d)
# Reshaping 3D array to 2D array
array_2d = array_3d.reshape(4, 3)
print("\nReshaped to 2D array:")
print(array_2d)
Output:
Reshaped to 1D array:
[ 1  2  3  4  5  6  7  8  9 10 11 12]

Reshaped to 2D array:
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]

Using reshape with -1:

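Passing -1 for one dimension tells reshape to calculate that dimension's size automatically from the total number of elements and the other dimensions. Only one dimension may be given as -1.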

Example 7:
import numpy as np
# Create a 1D array
array_1d = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
# Reshape to a 2D array with 3 rows and an automatically calculated number of columns
array_2d = array_1d.reshape(3, -1)
print(array_2d)
Output:
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

This way, you can reshape arrays in various ways, depending on your needs.
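The -1 placeholder can stand in for any single dimension, not just the last one. A short sketch with an illustrative 12-element array:

```python
import numpy as np

arr = np.arange(1, 13)            # 12 elements: 1 .. 12

a = arr.reshape(-1, 6)            # rows inferred: 12 / 6 = 2
b = arr.reshape(2, -1, 3)         # middle axis inferred: 12 / (2 * 3) = 2
c = b.reshape(-1)                 # a single -1 gives a 1D array back

print(a.shape, b.shape, c.shape)

# Using -1 for more than one dimension is ambiguous and raises ValueError.
try:
    arr.reshape(-1, -1)
    ambiguous_rejected = False
except ValueError:
    ambiguous_rejected = True
print(ambiguous_rejected)
```
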

Flattening a NumPy array

Flattening a NumPy array means converting a multi-dimensional array into a one-dimensional array. You can
do this easily using the flatten() method or the ravel() method.
Example 1:
import numpy as np
# Create a 2D array
array = np.array([[1, 2, 3], [4, 5, 6]])
# Flatten the array
flattened_array = array.flatten()
print(flattened_array)
Output:
[1 2 3 4 5 6]

You can also use ravel(), which returns a view of the original data when possible rather than a copy:

Example 2:

import numpy as np
# Create a 2D array
array = np.array([[1, 2, 3], [4, 5, 6]])
# Ravel the array
raveled_array = array.ravel()
print(raveled_array)
Output:
[1 2 3 4 5 6]
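Although flatten() and ravel() print the same result here, they differ in whether the result shares memory with the original array. The sketch below assumes an ordinary contiguous array like the one in the examples above; np.shares_memory() is used only to make the difference visible.

```python
import numpy as np

array = np.array([[1, 2, 3], [4, 5, 6]])

flat_copy = array.flatten()   # always an independent copy
flat_view = array.ravel()     # a view of the same data for a contiguous array

flat_copy[0] = 100            # does not touch the original
flat_view[1] = 200            # changes the original as well

print(array)
print(np.shares_memory(array, flat_copy))   # False
print(np.shares_memory(array, flat_view))   # True
```

Use flatten() when the result must be safe to modify, and ravel() when avoiding a copy matters.
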
Flattening NumPy arrays can be very useful when dealing with different dimensions. Here are examples of
flattening 1D, 2D, and 3D arrays:
1D Array:
Flattening a 1D array is quite straightforward as it is already a single dimension.
Example 3:
import numpy as np
# Create a 1D array
array_1d = np.array([1, 2, 3, 4, 5])
# Flatten the array (though it's already flat)
flattened_array_1d = array_1d.flatten()
print(flattened_array_1d)
Output:
[1 2 3 4 5]
2D Array:
Flattening a 2D array converts it into a 1D array.
Example 4:
import numpy as np
# Create a 2D array
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
# Flatten the array
flattened_array_2d = array_2d.flatten()
print(flattened_array_2d)
Output:
[1 2 3 4 5 6]
3D Array:
Flattening a 3D array also converts it into a 1D array.
Example 5:
import numpy as np
# Create a 3D array
array_3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
# Flatten the array
flattened_array_3d = array_3d.flatten()
print(flattened_array_3d)
Output:
[1 2 3 4 5 6 7 8]
Transposing a NumPy Array

Transposing a NumPy array means reordering its axes; for a 2D array this swaps its rows and columns. This can be done easily using
the transpose() method or the .T attribute.

1D Array:

Transposing a 1D array in NumPy doesn't change the array since it has only one dimension. The transpose
function and .T attribute are designed for multi-dimensional arrays.

Example 1:
import numpy as np
# Create a 1D array
array_1d = np.array([1, 2, 3, 4, 5])
# Attempt to transpose the array
transposed_array_1d = array_1d.T
print(transposed_array_1d)
Output:
[1 2 3 4 5]

The array remains the same. Transposing is meaningful for 2D arrays and higher, where rows and columns
can be interchanged. For a 1D array, there's no change in structure.

2D Array:

Example 2:

import numpy as np
# Create a 2D array
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
# Transpose the array using transpose() method
transposed_array_2d = array_2d.transpose()
print(transposed_array_2d)
# Transpose the array using .T attribute
transposed_array_2d_T = array_2d.T
print(transposed_array_2d_T)
Output:
[[1 4]
[2 5]
[3 6]]
[[1 4]
[2 5]
[3 6]]
3D Array:
When transposing a 3D array, you can specify the order of the axes.
Example 3:
import numpy as np
# Create a 3D array
array_3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
# Transpose the array specifying the axes order
transposed_array_3d = array_3d.transpose(1, 0, 2)
print(transposed_array_3d)
Output:
[[[1 2]
[5 6]]
[[3 4]
[7 8]]]
In this example, we swapped the first and second axes.
Transposing is a versatile operation that can be very useful in various data manipulation tasks.
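The effect of a transpose is often easiest to follow by looking at shapes. In the sketch below, the array contents and the shape (2, 3, 4) are chosen only for illustration; np.swapaxes() is a related NumPy function not covered in these notes.

```python
import numpy as np

arr = np.arange(24).reshape(2, 3, 4)

reversed_axes = arr.T                    # .T reverses the order of all axes
swapped_first = arr.transpose(1, 0, 2)   # swap only the first two axes
swapped_ends = np.swapaxes(arr, 0, 2)    # swap the first and last axes

print(reversed_axes.shape)   # (4, 3, 2)
print(swapped_first.shape)   # (3, 2, 4)
print(swapped_ends.shape)    # (4, 3, 2)

# The element at [i, j, k] moves to [k, j, i] in the fully reversed array.
print(arr[1, 2, 3] == reversed_axes[3, 2, 1])   # True
```
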
Expanding and Squeezing a NumPy Array
Expanding and squeezing a NumPy array are common operations in NumPy for adding or removing
dimensions.
Expanding Dimensions
You can use the np.expand_dims function or the reshape method to add a new axis to an array.
1. Using np.expand_dims:
Example 1:
import numpy as np
# Create a 1D array
array_1d = np.array([1, 2, 3, 4, 5])
# Expand dimensions
expanded_array = np.expand_dims(array_1d, axis=0)
print(expanded_array)
print(expanded_array.shape)
Output:
[[1 2 3 4 5]]
(1, 5)

Example 2:
import numpy as np
# creating an input array
a = np.array([1, 2, 3, 4])
# getting the dimension of a
print(a.shape)
# expanding the axis of a
b = np.expand_dims(a, axis=1)
# getting the dimension of the new array
print(b.shape)
Output:
(4,)
(4, 1)
Example 3:
import numpy as np
# Original array
arr = np.array([1, 2, 3])
# Expanding dimensions
expanded_arr = np.expand_dims(arr, axis=0)
print("Original array:", arr)
print("Expanded array:", expanded_arr)
print("Shape of expanded array:", expanded_arr.shape)
# Using None indexing
expanded_arr_none = arr[None, :]
print("Expanded array using None indexing:", expanded_arr_none)
print("Shape of expanded array using None indexing:", expanded_arr_none.shape)
Output:
Original array: [1 2 3]
Expanded array: [[1 2 3]]
Shape of expanded array: (1, 3)
Expanded array using None indexing: [[1 2 3]]
Shape of expanded array using None indexing: (1, 3)

Example 4:
import numpy as np
# Creating an array using the array() method
arr = np.array([[5, 10, 15], [20, 25, 30]])
# Display the array
print("Our Array...",arr)
# Display the shape of array
print("Array Shape...",arr.shape)
# Check the Dimensions
print("Dimensions of our Array...",arr.ndim)
# Get the Datatype
print("Datatype of our Array object...",arr.dtype)
# Get the number of elements in an array
print("Size of array...",arr.size)
# To expand the shape of an array, use the numpy.expand_dims() method
# Insert a new axis that will appear at the axis position in the expanded array shape.
res = np.expand_dims(arr, axis=(0, 1))
# Display the expanded array
print("Resultant expanded array....", res)
# Display the shape of the expanded array
print("Shape of the expanded array...",res.shape)
# Check the Dimensions
print("Dimensions of our Array...",res.ndim)
Output:
Our Array... [[ 5 10 15]
 [20 25 30]]
Array Shape... (2, 3)
Dimensions of our Array... 2
Datatype of our Array object... int64
Size of array... 6
Resultant expanded array.... [[[[ 5 10 15]
   [20 25 30]]]]
Shape of the expanded array... (1, 1, 2, 3)
Dimensions of our Array... 4
The example below shows how the expand_dims() function adds new dimensions to an array at a
chosen axis position, changing its shape and number of dimensions.
Example 5:
import numpy as np
# Create a 2D array
x = np.array([[1, 2], [3, 4]])
print('Array x:')
print(x)
print('\n')
# Add a new axis at position 0
y = np.expand_dims(x, axis=0)
print('Array y with a new axis added at position 0:')
print(y)
print('\n')
# Print the shapes of x and y
print('The shape of x and y arrays:')
print(x.shape, y.shape)
print('\n')
# Add a new axis at position 1
y = np.expand_dims(x, axis=1)
print('Array y after inserting axis at position 1:')
print(y)
print('\n')
# Print the number of dimensions (ndim) for x and y
print('x.ndim and y.ndim:')
print(x.ndim, y.ndim)
print('\n')
# Print the shapes of x and y
print('x.shape and y.shape:')
print(x.shape, y.shape)

Output:
Array x:
[[1 2]
[3 4]]
Array y with a new axis added at position 0:
[[[1 2]
[3 4]]]
The shape of x and y arrays:
(2, 2) (1, 2, 2)
Array y after inserting axis at position 1:
[[[1 2]]
[[3 4]]]
x.ndim and y.ndim:
2 3
x.shape and y.shape:
(2, 2) (2, 1, 2)
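The examples above all use np.expand_dims(). The same result can be obtained with the reshape method mentioned at the start of this section, or with np.newaxis indexing (np.newaxis is an alias for None, which appeared in an earlier example). A minimal sketch with illustrative variable names:

```python
import numpy as np

a = np.array([1, 2, 3, 4])

via_expand = np.expand_dims(a, axis=1)   # shape (4, 1)
via_reshape = a.reshape(4, 1)            # same shape, written out explicitly
via_infer = a.reshape(-1, 1)             # let NumPy work out the 4
via_newaxis = a[:, np.newaxis]           # indexing form

print(via_expand.shape, via_reshape.shape, via_infer.shape, via_newaxis.shape)
print(np.array_equal(via_expand, via_reshape))   # True
```
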

Squeezing Dimensions
Squeezing a NumPy array involves removing single-dimensional entries from the shape of the array.
Squeezing a 2-dimensional NumPy array involves removing dimensions of size 1.
Example 1:
import numpy as np
# Creating a 2D array with shape (3, 1)
arr = np.array([[1], [2], [3]])
print("Original array shape:", arr.shape)
# Using np.squeeze() to remove single-dimensional entries
squeezed_arr = np.squeeze(arr)
print("Squeezed array shape:", squeezed_arr.shape)
print("Squeezed array:", squeezed_arr)
Output:
Original array shape: (3, 1)
Squeezed array shape: (3,)
Squeezed array: [1 2 3]

If you have an array with shape (1, 3, 1), squeezing it would result in an array with shape (3,).
• The first dimension has a size of 1. This means there is one "row" in the array.
• The second dimension has a size of 3. This indicates that there are three "columns" in the array.
• The third dimension has a size of 1. This signifies that each "element" in the three columns is itself a
single-element array.

Example 2:
import numpy as np
# Creating a 3D array with shape (1, 3, 1)
arr = np.array([[[1], [2], [3]]])
print("Original array shape:", arr.shape)
# Using np.squeeze() to remove single-dimensional entries
squeezed_arr = np.squeeze(arr)
print("Squeezed array shape:", squeezed_arr.shape)
print("Squeezed array:", squeezed_arr)
Output:
Original array shape: (1, 3, 1)
Squeezed array shape: (3,)
Squeezed array: [1 2 3]

You can use the np.squeeze function to remove single-dimensional entries from the shape of an array.
Example 3:
import numpy as np
# Create a 3D array with single-dimensional entries
array_3d = np.array([[[1, 2, 3]], [[4, 5, 6]]])
print(array_3d.shape)
# (2, 1, 3)
# Squeeze the array to remove dimensions of size 1
squeezed_array = np.squeeze(array_3d)
print(squeezed_array)
print(squeezed_array.shape)
Output:
(2, 1, 3)
[[1 2 3]
[4 5 6]]
(2, 3)
The np.squeeze function removes all dimensions of size 1 from the array shape, effectively reducing the
dimensionality. These operations can be very useful for reshaping data to fit your needs.

When you squeeze a 1-dimensional NumPy array, it will remain unchanged because there are no single-
dimensional entries to remove.
Example 4:
import numpy as np
# Creating a 1D array
arr = np.array([1, 2, 3])
print("Original array shape:", arr.shape)
# Using np.squeeze() to remove single-dimensional entries
squeezed_arr = np.squeeze(arr)
print("Squeezed array shape:", squeezed_arr.shape)
print("Squeezed array:", squeezed_arr)
Output:
Original array shape: (3,)
Squeezed array shape: (3,)
Squeezed array: [1 2 3]
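By default np.squeeze() removes every axis of size 1. Its optional axis argument restricts the operation to a chosen axis, as this sketch with an illustrative (1, 3, 1) array shows. Asking it to squeeze an axis whose size is not 1 raises a ValueError.

```python
import numpy as np

arr = np.zeros((1, 3, 1))

first_removed = np.squeeze(arr, axis=0)   # shape (3, 1)
last_removed = np.squeeze(arr, axis=2)    # shape (1, 3)
print(first_removed.shape, last_removed.shape)

# Axis 1 has size 3, so it cannot be squeezed.
try:
    np.squeeze(arr, axis=1)
    raised = False
except ValueError as exc:
    raised = True
    print("squeeze failed:", exc)
```
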

Sorting
Sorting arrays in NumPy is straightforward with the np.sort() function, which returns a
sorted copy and leaves the original array unchanged.
1. Sorting a 1D array:
Example 1:
import numpy as np
# Creating a 1D array
arr = np.array([5, 2, 9, 1, 5, 6])
# Sorting the array
sorted_arr = np.sort(arr)
print("Original array:", arr)
print("Sorted array:", sorted_arr)
Output:
Original array: [5 2 9 1 5 6]
Sorted array: [1 2 5 5 6 9]
2. Sorting a 2D array along an axis:
Example 2:
import numpy as np
# Creating a 2D array
arr_2d = np.array([[5, 2, 9], [1, 5, 6]])
# Sorting along the first axis (columns)
sorted_arr_2d_axis0 = np.sort(arr_2d, axis=0)
# Sorting along the second axis (rows)
sorted_arr_2d_axis1 = np.sort(arr_2d, axis=1)
print("Original 2D array:\n", arr_2d)
print("Sorted 2D array along axis 0 (columns):\n", sorted_arr_2d_axis0)
print("Sorted 2D array along axis 1 (rows):\n", sorted_arr_2d_axis1)
Output:
Original 2D array:
[[5 2 9]
[1 5 6]]
Sorted 2D array along axis 0 (columns):
[[1 2 6]
[5 5 9]]
Sorted 2D array along axis 1 (rows):
[[2 5 9]
[1 5 6]]

In-place sorting with the ndarray.sort() method, which modifies the original array (np.sort() returns a sorted copy instead):


Example 3:
import numpy as np
# Creating a 2D array
arr_2d = np.array([[5, 2, 9], [1, 5, 6]])
arr_2d.sort()
print("In-place sorted array:\n", arr_2d)
Output:
In-place sorted array:
[[2 5 9]
[1 5 6]]
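A small sketch contrasting the np.sort() function with the sort() method of an array. The reversed slice used for descending order is a common idiom rather than something taken from these notes.

```python
import numpy as np

arr = np.array([5, 2, 9, 1])

copy_sorted = np.sort(arr)        # new sorted array
unchanged = arr.tolist()          # the original order is still intact here

result = arr.sort()               # sorts arr itself and returns None

descending = np.sort(arr)[::-1]   # reverse a sorted copy for descending order

print(copy_sorted, unchanged)
print(result, arr)
print(descending)
```
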
Argsort: If you need the indices that would sort an array, you can use
np.argsort():
Example 4:
import numpy as np
# Creating a 2D array
arr = np.array([[5, 2, 9], [1, 5, 6], [4, 8, 7]])
# Indices that would sort each row
indices = np.argsort(arr, axis=1)
# Reorder each row using its own indices
sorted_by_indices = np.take_along_axis(arr, indices, axis=1)
print("Original array:\n", arr)
print("Indices that sort each row:\n", indices)
print("Array sorted using indices:\n", sorted_by_indices)

Output:
Original array:
[[5 2 9]
[1 5 6]
[4 8 7]]
Indices that sort each row:
[[1 0 2]
[0 1 2]
[0 2 1]]
Array sorted using indices:
[[2 5 9]
[1 5 6]
[4 7 8]]
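A typical use of np.argsort() is to reorder one array according to the values of another. The sketch below assumes two hypothetical parallel arrays, names and scores, which are not part of the examples above.

```python
import numpy as np

# Hypothetical data: two parallel 1D arrays
names = np.array(["Asha", "Ravi", "Meena", "Kiran"])
scores = np.array([72, 95, 60, 88])

# Positions that would sort scores in ascending order
order = np.argsort(scores)
print("Sort order:", order)

# Applying the same order to both arrays keeps them aligned
sorted_scores = scores[order]
sorted_names = names[order]
print("Scores sorted:", sorted_scores)
print("Names by score:", sorted_names)
```

Indexing a 1D array with the result of argsort() gives the same values as np.sort(), with the advantage that the ordering can be reused on related arrays.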
Indexing and Slicing of NumPy Array
Indexing 1D array
Indexing refers to accessing elements of an array using their indices. NumPy arrays are zero-indexed, meaning
the first element is at index 0.
Example 1:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print("Element at index 0:", arr[0])
print("Element at index 3:", arr[3])
print("Element at index -1:", arr[-1])
Output:
Element at index 0: 1
Element at index 3: 4
Element at index -1: 5

Indexing 2D array
With 2D arrays, you can access elements using a row and column index.
import numpy as np
# Creating a 2D array
arr = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
# Accessing elements using row and column indices
element_0_1 = arr[0, 1] # Element at row 0, column 1
element_2_2 = arr[2, 2] # Element at row 2, column 2
print("Element at row 0, column 1:", element_0_1)
print("Element at row 2, column 2:", element_2_2)

Output:
Element at row 0, column 1: 2
Element at row 2, column 2: 9

Indexing Multi-Dimensional Arrays


Indexing in multi-dimensional arrays is quite similar to 1D and 2D arrays but with more dimensions to
navigate. Here are some examples to help you get the hang of it:
3D Array Example
import numpy as np
# Creating a 3D array
arr = np.array([[[1, 2, 3], [4, 5, 6]],
[[7, 8, 9], [10, 11, 12]]])
# Accessing elements using indices
element_0_1_2 = arr[0, 1, 2] # Element at layer 0, row 1, column 2
element_1_0_1 = arr[1, 0, 1] # Element at layer 1, row 0, column 1
print("Element at layer 0, row 1, column 2:", element_0_1_2)
print("Element at layer 1, row 0, column 1:", element_1_0_1)
Output:
Element at layer 0, row 1, column 2: 6
Element at layer 1, row 0, column 1: 8

Basic Slicing
Slicing in 1D NumPy arrays is a powerful way to access and manipulate portions of an array. The basic
syntax for slicing is array[start:stop:step], where start is the index to begin the slice, stop is the index to end
the slice (exclusive), and step is the stride between each index.
Example 1:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5])
Output:
[2 3 4 5]
Example 2:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[4:])
Output:
[5 6 7]
Example 3:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[:4])
Output:
[1 2 3 4]
Example 4:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[-3:-1])
Output:
[5 6]
Example 5:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5:2])
Output:
[2 4]

Example 6:
import numpy as np
# Creating a 1D array
arr = np.array([10, 20, 30, 40, 50])
# Slicing from index 1 to 4 (exclusive)
slice_1 = arr[1:4]
print("Slice from index 1 to 4:", slice_1)
# Slicing from start to index 3 (exclusive)
slice_2 = arr[:3]
print("Slice from start to index 3:", slice_2)
# Slicing from index 2 to end
slice_3 = arr[2:]
print("Slice from index 2 to end:", slice_3)
Output:
Slice from index 1 to 4: [20 30 40]
Slice from start to index 3: [10 20 30]
Slice from index 2 to end: [30 40 50]
Assignment:
Create a 1D numpy array with elements from 0 through 9
i. Slice elements from index 2 to 5
ii. Slice elements from the beginning to index 5
iii. Slice elements from index 5 to the end
iv. Slice elements with a step of 2
v. Slice elements from index 1 to 8 with a step of 3

import numpy as np
# Creating a NumPy array
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# Slicing elements from index 2 to 5
slice_1 = arr[2:6]
print(slice_1) # Output: [2 3 4 5]
# Slicing elements from the beginning to index 5
slice_2 = arr[:6]
print(slice_2) # Output: [0 1 2 3 4 5]
# Slicing elements from index 5 to the end
slice_3 = arr[5:]
print(slice_3) # Output: [5 6 7 8 9]
# Slicing elements with a step of 2
slice_4 = arr[::2]
print(slice_4) # Output: [0 2 4 6 8]
# Slicing elements from index 1 to 8 with a step of 3
slice_5 = arr[1:9:3]
print(slice_5) # Output: [1 4 7]
Modify Array Elements Using Slicing
With slicing, we can also modify array elements using:
• start parameter
• stop parameter
• start and stop parameter
• start, stop, and step parameter

1. Using start Parameter


import numpy as np
# create a numpy array
numbers = np.array([2, 4, 6, 8, 10, 12])
# modify elements from index 3 onwards
numbers[3:] = 20
print(numbers)
# Output:
[ 2 4 6 20 20 20]
Here, numbers[3:] = 20 replaces all the elements from index 3 onwards with new value 20.

2. Using stop Parameter


import numpy as np
# create a numpy array
numbers = np.array([2, 4, 6, 8, 10, 12])
# modify the first 3 elements
numbers[:3] = 40
print(numbers)
Output:
[40 40 40 8 10 12]
Here, numbers[:3] = 40 replaces the first 3 elements with the new value 40.
3. Using start and stop parameter
import numpy as np
# create a numpy array
numbers = np.array([2, 4, 6, 8, 10, 12])
# modify elements from index 2 to index 4 (index 5 is excluded)
numbers[2:5] = 22
print(numbers)
Output:
[2 4 22 22 22 12]
Here, numbers[2:5] = 22 selects elements from index 2 to index 4 and replaces them with new value 22.
4. Using start, stop, and step parameter
import numpy as np
# create a numpy array
numbers = np.array([2, 4, 6, 8, 10, 12])
# modify every second element from indices 1 to 5
numbers[1:5:2] = 16
print(numbers)
Output:
[ 2 16 6 16 10 12]
Here, numbers[1:5:2] = 16 modifies every second element from index 1 up to, but not including, index 5 (that is, indices 1 and 3) with the new value 16.
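One detail worth knowing when modifying arrays through slices: a basic slice is a view onto the original array's data rather than a copy, so assigning into a slice held in another variable also changes the original. A minimal sketch (variable names are illustrative) is given below; calling .copy() produces an independent array.

```python
import numpy as np

numbers = np.array([2, 4, 6, 8, 10, 12])

# A basic slice is a view that shares memory with numbers
view = numbers[1:4]
view[:] = 0
print(numbers)  # the original array has changed as well
shares = np.shares_memory(numbers, view)
print("Shares memory:", shares)

# .copy() creates an independent array
other = np.array([2, 4, 6, 8, 10, 12])
independent = other[1:4].copy()
independent[:] = 0
print(other)  # the original array is unchanged
```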
NumPy Array Negative Slicing
We can also use negative indices to perform negative slicing in NumPy arrays. During negative slicing,
elements are accessed from the end of the array.
import numpy as np
# create a numpy array
numbers = np.array([2, 4, 6, 8, 10, 12])
# slice the last 3 elements of the array
# using the start parameter
print(numbers[-3:]) # [8 10 12]
# slice elements from 5th-to-last to 3rd-to-last element
# using the start and stop parameters
print(numbers[-5:-2]) # [4 6 8]
# slice every other element of the array from the end
# using the start, stop, and step parameters
print(numbers[-1::-2]) # [12 8 4]
Output:
[ 8 10 12]
[4 6 8]
[12 8 4]
• numbers[-3:] - slices the last 3 elements of numbers
• numbers[-5:-2] - slices the elements of numbers from the 5th-last to the 2nd-last (excluded)
• numbers[-1::-2] - slices every other element of numbers, starting from the end (step size -2)

Reverse NumPy Array Using Negative Slicing

In NumPy, we can also reverse array elements using the negative slicing. For example,
import numpy as np
# create a numpy array
numbers = np.array([2, 4, 6, 8, 10, 12])
# generate reversed array
reversed_numbers = numbers[::-1]
print(reversed_numbers)
Output:
[12 10 8 6 4 2]
Here, the slice numbers[::-1] selects all the elements of the array with a step size of -1, which reverses the
order of the elements.
Slicing a 2D array works like slicing a 1D array, with one slice per axis separated by a comma: array[row_slice, column_slice].
From the second row, slice elements from index 1 to index 4 (not included):
Example 1:
import numpy as np
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(arr[1, 1:4])
Output:
[7 8 9]
From both rows, return the element at index 2:
Example 2:
import numpy as np
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(arr[0:2, 2])
Output:
[3 8]
From both rows, slice index 1 to index 4 (not included). This returns a 2-D array:
Example 3:
import numpy as np
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(arr[0:2, 1:4])
Output:
[[2 3 4]
[7 8 9]]
Example 4:
# create a 2D array
array1 = np.array([[1, 3, 5, 7],
[9, 11, 13, 15]])
print(array1[:2, :2])
Output
[[ 1 3]
[ 9 11]]
The first :2 selects the first 2 rows, i.e., all of array1.
The second :2 selects the first 2 columns of those rows, giving [1 3] and [9 11].

Example 5:
import numpy as np
array = np.array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12]])
# Extract a single element
element = array[1, 2] # Output: 7
# Extract a specific row:
row = array[1, :] # Output: [5, 6, 7, 8]
# Extract a specific column
column = array[:, 2] # Output: [3, 7, 11]
#Extract a subarray:
subarray = array[0:2, 1:3] # Output: [[2, 3], [6, 7]]
Example 6:
import numpy as np
# create a 2D array
array1 = np.array([[1, 3, 5, 7],
[9, 11, 13, 15],
[2, 4, 6, 8]])
# slice the array to get the first two rows and columns
subarray1 = array1[:2, :2]
# slice the array to get the last two rows and columns
subarray2 = array1[1:3, 2:4]
# print the subarrays
print("First Two Rows and Columns: \n",subarray1)
print("Last two Rows and Columns: \n",subarray2)
Output
First Two Rows and Columns:
[[ 1 3]
[ 9 11]]
Last two Rows and Columns:
[[13 15]
[ 6 8]]
• array1[:2, :2] - slices array1 starting at the first row and first column (default values) and stopping before
index 2, so it covers the first two rows and the first two columns
• array1[1:3, 2:4] - slices array1 starting at the second row and third column (index 1 and 2) and ending
at the third row and fourth column (index 2 and 3, both included)
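When slicing 2D arrays it helps to keep track of the shape of the result. As a small illustration (reusing array1 from Example 6; the other variable names are illustrative), an integer index removes a dimension, whereas a slice of length one keeps it:

```python
import numpy as np

array1 = np.array([[1, 3, 5, 7],
                   [9, 11, 13, 15],
                   [2, 4, 6, 8]])

# An integer index drops that dimension: the result is 1D
row_1d = array1[1, :]
print(row_1d, row_1d.shape)

# A slice of length one keeps the dimension: the result is 2D
row_2d = array1[1:2, :]
print(row_2d, row_2d.shape)

# The same applies to columns
col_1d = array1[:, 2]
col_2d = array1[:, 2:3]
print(col_1d.shape, col_2d.shape)
```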
Assignment:
Write a Python Program using Numpy array with the following:
i. Create a 2D NumPy array with elements 1 to 16
ii. Select a specific range of rows say 1:3
iii. Select a specific range of columns
iv. Skipping the element using a step
v. Reversing the array

import numpy as np
arr = np.array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12],
[13, 14, 15, 16]])
rows_2_to_3 = arr[1:3] # [[5, 6, 7, 8], [9, 10, 11, 12]]
cols_2_to_4 = arr[:, 1:4] # [[2, 3, 4], [6, 7, 8], [10, 11, 12], [14, 15, 16]]
every_other_row = arr[::2] # [[1, 2, 3, 4], [9, 10, 11, 12]]
every_other_col = arr[:, ::2] # [[1, 3], [5, 7], [9, 11], [13, 15]]
reverse_rows = arr[::-1] # [[13, 14, 15, 16], [9, 10, 11, 12], [5, 6, 7, 8], [1, 2, 3, 4]]
reverse_cols = arr[:, ::-1] # [[4, 3, 2, 1], [8, 7, 6, 5], [12, 11, 10, 9], [16, 15, 14, 13]]
Slicing a 3D Array
Slicing a 3D numpy array follows similar principles to slicing a 2D array, but you add an additional dimension
to the slicing. Here's a basic example to get you started:
Let's say we have the following 3D numpy array:
import numpy as np
arr = np.array([[[1, 2, 3],
[4, 5, 6],
[7, 8, 9]],
[[10, 11, 12],
[13, 14, 15],
[16, 17, 18]],
[[19, 20, 21],
[22, 23, 24],
[25, 26, 27]]])
1. Selecting a specific 2D matrix:
matrix_0 = arr[0] # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
2. Selecting a specific row across all matrices:
row_0_across_all_matrices = arr[:, 0, :] # [[1, 2, 3], [10, 11, 12], [19, 20, 21]]
3. Selecting a specific column across all matrices:
col_0_across_all_matrices = arr[:, :, 0] # [[1, 4, 7], [10, 13, 16], [19, 22, 25]]
4. Selecting a submatrix from each matrix:
submatrix = arr[:, 1:3, 1:3] # [[[5, 6], [8, 9]], [[14, 15], [17, 18]], [[23, 24], [26, 27]]]
5. Slicing with steps:
sliced_with_steps = arr[::2, ::2, ::2] # [[[1, 3], [7, 9]], [[19, 21], [25, 27]]]
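To see how each kind of index affects dimensionality, the shapes of the slices can be printed. This is a sketch only: it rebuilds the same 3 x 3 x 3 values with np.arange() and reshape() instead of typing the nested lists.

```python
import numpy as np

# Same values as the 3D array above, built with arange and reshape
arr = np.arange(1, 28).reshape(3, 3, 3)

shape_matrix = arr[0].shape                 # one 2D matrix
shape_rows = arr[:, 0, :].shape             # row 0 of every matrix
shape_blocks = arr[:, 1:3, 1:3].shape       # a 2x2 block from every matrix
shape_steps = arr[::2, ::2, ::2].shape      # every other element on each axis
print(shape_matrix, shape_rows, shape_blocks, shape_steps)

# Mixing integer indices and a slice: row 1 of matrix 2
row = arr[2, 1, :]
print(row)
```

Each integer index removes one dimension from the result, while each slice keeps its dimension.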
Stacking NumPy Array
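Stacking arrays in NumPy refers to combining multiple arrays into a single array. The numpy.stack() function
joins arrays along a new dimension, creating a higher-dimensional array. This is different from concatenation,
which combines arrays along an existing axis without adding new dimensions. Helper functions such as vstack()
and hstack() behave like concatenation along a particular axis.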
NumPy provides several functions to achieve stacking. They are as follows −
• Using numpy.stack() Function
• Using numpy.vstack() Function
• Using numpy.hstack() Function
• Using numpy.dstack() Function
• Using numpy.column_stack() Function
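A quick way to compare these functions is to apply each of them to the same pair of 1D arrays and look at the resulting shapes. The sketch below uses two illustrative arrays, a and b, and includes np.concatenate() for contrast.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

shapes = {
    "stack": np.stack((a, b)).shape,                # new first axis
    "vstack": np.vstack((a, b)).shape,              # arrays become rows
    "hstack": np.hstack((a, b)).shape,              # joined end to end
    "dstack": np.dstack((a, b)).shape,              # joined along a third axis
    "column_stack": np.column_stack((a, b)).shape,  # arrays become columns
    "concatenate": np.concatenate((a, b)).shape,    # existing axis, no new dimension
}

for name, shape in shapes.items():
    print(name, shape)
```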

Stacking Arrays Using stack() Function


We can use the stack() function in NumPy to stack a sequence of arrays along a new
axis, creating a new dimension in the result.
Following is the syntax of the stack() function in NumPy −
np.stack(arrays, axis=0)
Where,
• arrays − A sequence of arrays to be stacked.
• axis − The axis along which to stack the arrays. The default is 0, which adds a new first axis.
Example: Stacking 1D Arrays
In the below example, we are stacking three 1D arrays along a new axis (axis 0) using the numpy.stack()
function, resulting in a 2D array −
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr3 = np.array([7, 8, 9])
# Stack arrays along a new axis
stacked_arr = np.stack((arr1, arr2, arr3), axis=0)
print("Stacked Array along a new axis (Axis 0):")
print(stacked_arr)
Following is the output obtained −
Stacked Array along a new axis (Axis 0):
[[1 2 3]
[4 5 6]
[7 8 9]]
Example: Changing the Axis
The "axis" parameter in numpy.stack() function determines where the new axis is inserted. By changing the
value of axis, you can control how the arrays are stacked −
import numpy as np
# arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr3 = np.array([7, 8, 9])
# Stack arrays along axis 1
stacked_arr = np.stack((arr1, arr2, arr3), axis=1)
print("Stacked Array along Axis 1:")
print(stacked_arr)
This will produce the following result −
Stacked Array along Axis 1:
[[1 4 7]
[2 5 8]
[3 6 9]]
Example: Stacking Multi-dimensional Arrays
The numpy.stack() function can also be used to stack multi-dimensional arrays. The function adds a new axis
to the higher-dimensional arrays and stacks them accordingly.
Here, we are stacking two 2D arrays −
import numpy as np
# 2D arrays
arr1 = np.array([[1, 2],
[3, 4]])
arr2 = np.array([[5, 6],
[7, 8]])
# Stack arrays along a new axis
stacked_arr = np.stack((arr1, arr2), axis=0)
print("Stacked 2D Arrays along a new axis (Axis 0):")
print(stacked_arr)

Following is the output of the above code −


Stacked 2D Arrays along a new axis (Axis 0):
[[[1 2]
[3 4]]

[[5 6]
[7 8]]]

Stacking Arrays Using column_stack() Function


The numpy.column_stack() function in NumPy is used to stack 1D arrays as columns into a 2D array or to
stack 2D arrays column-wise. This function provides a way to combine arrays along the second axis (axis=1),
effectively increasing the number of columns in the resulting array.
Following is the syntax −
np.column_stack(tup)
Where, tup is a tuple of arrays to be stacked. The arrays can be either 1D or 2D, but they must have the same
number of rows.
Example: Stacking 1D arrays as columns
In the example below, we are stacking two 1D arrays as columns into a 2D array using the NumPy
column_stack() function −
import numpy as np
# 1D arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
# Column-stack 1D arrays
stacked_arr_1d = np.column_stack((arr1, arr2))
print("Stacked 1D arrays as 2D array:")
print(stacked_arr_1d)
We get the output as shown below −
Stacked 1D arrays as 2D array:
[[1 4]
[2 5]
[3 6]]
Example: Stacking 2D arrays column-wise
Here, we are stacking two 2D arrays column-wise using the NumPy column_stack() function −
import numpy as np
# 2D arrays
arr3 = np.array([[1, 2],
[3, 4]])
arr4 = np.array([[5, 6],
[7, 8]])
# Column-stack 2D arrays
stacked_arr_2d = np.column_stack((arr3, arr4))
print("Stacked 2D arrays column-wise:")
print(stacked_arr_2d)
Following is the output obtained −
Stacked 2D arrays column-wise:
[[1 2 5 6]
[3 4 7 8]]
Vertical Stacking
We can also stack arrays vertically (row-wise) using the vstack() function in NumPy. It is equivalent to using
numpy.concatenate() function with "axis=0", where arrays are concatenated along the first axis.
This results in an array with an increased number of rows, combining multiple arrays row-wise. Following is
the syntax −
numpy.vstack(tup)
Where, tup is a tuple of arrays to be stacked vertically. All arrays must have the same number of columns.
Example
In the example below, we are stacking two arrays vertically using the NumPy vstack() function −
import numpy as np
# arrays
arr1 = np.array([[1, 2, 3],
[4, 5, 6]])
arr2 = np.array([[7, 8, 9],
[10, 11, 12]])
# Stack arrays vertically
stacked_arr = np.vstack((arr1, arr2))
print("Vertically Stacked Array:")
print(stacked_arr)
The output obtained is as shown below −
Vertically Stacked Array:
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]
Horizontal Stacking
We can stack arrays horizontally (column-wise) using the hstack() function in NumPy. It is equivalent to using
numpy.concatenate() function with "axis=1", where arrays are concatenated along the second axis for 2D
arrays.
This results in an array with an increased number of columns, combining multiple arrays column-wise.
Following is the syntax −
numpy.hstack(tup)
Where, tup is a tuple of arrays to be stacked horizontally. All arrays must have the same number of rows.
Example
In the example below, we are stacking two arrays horizontally using the NumPy hstack() function −
import numpy as np
arr1 = np.array([[1, 2],
[3, 4]])
arr2 = np.array([[5, 6],
[7, 8]])
# Stack arrays horizontally
stacked_arr = np.hstack((arr1, arr2))
print("Horizontally Stacked Array:")
print(stacked_arr)
After executing the above code, we get the following output −
Horizontally Stacked Array:
[[1 2 5 6]
[3 4 7 8]]
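The stated equivalence between vstack()/hstack() and numpy.concatenate() can be verified for 2D arrays with np.array_equal(). The sketch below also shows, as an extra illustration with two small 1D arrays a and b, how the two functions treat 1D inputs.

```python
import numpy as np

arr1 = np.array([[1, 2],
                 [3, 4]])
arr2 = np.array([[5, 6],
                 [7, 8]])

# For 2D inputs, vstack matches concatenate along axis 0
same_v = np.array_equal(np.vstack((arr1, arr2)),
                        np.concatenate((arr1, arr2), axis=0))
# and hstack matches concatenate along axis 1
same_h = np.array_equal(np.hstack((arr1, arr2)),
                        np.concatenate((arr1, arr2), axis=1))
print(same_v, same_h)

# For 1D inputs, hstack joins end to end, vstack makes each array a row
a = np.array([1, 2])
b = np.array([3, 4])
h1 = np.hstack((a, b))
v1 = np.vstack((a, b))
print(h1.shape, v1.shape)
```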
Depth Stacking
The numpy.dstack() function is used to stack arrays along the third dimension, also known as the depth
dimension. This combines arrays depth-wise, effectively creating a new dimension in the resulting array.
It is particularly useful when you want to combine multiple 2D arrays into a single 3D array. Following is the
syntax −
np.dstack(tup)
Where, tup is a tuple of arrays to be stacked along the third dimension. All arrays must have the same shape
in the first two dimensions.
Example
In this example, we are stacking two arrays along the third dimension using the NumPy dstack() function −
import numpy as np
arr1 = np.array([[1, 2],
[3, 4]])
arr2 = np.array([[5, 6],
[7, 8]])
# Stack arrays along the third dimension
stacked_arr = np.dstack((arr1, arr2))
print("Depth-wise Stacked Array:")
print(stacked_arr)
The result produced is as follows −
Depth-wise Stacked Array:
[[[1 5]
[2 6]]

[[3 7]
[4 8]]]
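To make the depth dimension easier to see, the original arrays can be recovered by indexing the last axis of the result. The following sketch reuses arr1 and arr2 from the example and, for comparison, checks that np.stack() with axis=2 gives the same result for 2D inputs.

```python
import numpy as np

arr1 = np.array([[1, 2],
                 [3, 4]])
arr2 = np.array([[5, 6],
                 [7, 8]])

stacked_arr = np.dstack((arr1, arr2))
print(stacked_arr.shape)

# Each original array sits at one index of the last (depth) axis
first_layer = stacked_arr[:, :, 0]
second_layer = stacked_arr[:, :, 1]
print(first_layer)
print(second_layer)

# For 2D inputs, dstack gives the same result as stack along axis 2
same_as_stack = np.array_equal(stacked_arr, np.stack((arr1, arr2), axis=2))
print(same_as_stack)
```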
Concatenating ndarrays
Concatenating ndarrays refers to the process of joining multiple NumPy arrays along a specified axis. In
simpler terms, it's like sticking arrays together end-to-end. This can be done along different dimensions (axes)
of the arrays.
Here's a more detailed breakdown:
• Axis 0: Concatenation along rows (vertically). Think of stacking arrays on top of each other.
• Axis 1: Concatenation along columns (horizontally). Think of placing arrays side-by-side.
Here are some examples:
1.Concatenating along the first axis (rows):
import numpy as np
# Creating two ndarrays
array1 = np.array([[1, 2], [3, 4]])
array2 = np.array([[5, 6]])
# Concatenating along the first axis (rows)
result = np.concatenate((array1, array2), axis=0)
print(result)
2.Concatenating along the second axis (columns):
import numpy as np
# Creating two ndarrays
array1 = np.array([[1, 2], [3, 4]])
array2 = np.array([[5, 6], [7, 8]])
# Concatenating along the second axis (columns)
result = np.concatenate((array1, array2), axis=1)
print(result)
3.Concatenating multiple ndarrays:
import numpy as np
# Creating multiple ndarrays
array1 = np.array([1, 2])
array2 = np.array([3, 4])
array3 = np.array([5, 6])
# Concatenating multiple ndarrays
result = np.concatenate((array1, array2, array3))
print(result)
You can use the axis parameter to control the axis along which the arrays will be joined. axis=0 means along
the rows, and axis=1 means along the columns.
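Concatenation only works when the arrays agree in every dimension except the one being joined. As a sketch using the arrays from the first example, joining a (2, 2) array with a (1, 2) array succeeds along axis 0 but raises a ValueError along axis 1:

```python
import numpy as np

array1 = np.array([[1, 2], [3, 4]])  # shape (2, 2)
array2 = np.array([[5, 6]])          # shape (1, 2)

# Along axis 0 only the number of columns has to match
rows_joined = np.concatenate((array1, array2), axis=0)
print(rows_joined.shape)

# Along axis 1 the number of rows has to match, and here it does not
failed = False
try:
    np.concatenate((array1, array2), axis=1)
except ValueError as err:
    failed = True
    print("Cannot concatenate along axis 1:", err)
```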
Broadcasting in NumPy
Broadcasting in NumPy is a powerful mechanism that allows you to perform operations on arrays of different
shapes in a way that would otherwise require you to manually expand their dimensions. It works by
"broadcasting" the smaller array across the larger array so that they have compatible shapes.
Here’s a simple example to illustrate the concept:
Suppose you have the following arrays:
Example 1:
import numpy as np
array1 = np.array([1, 2, 3])
array2 = np.array([[1], [2], [3]])
The shapes of array1 and array2 are (3,) and (3, 1) respectively.
When you perform an operation like addition, NumPy broadcasts array1 over array2:
result = array1 + array2
print(result)
Output:
[[2 3 4]
[3 4 5]
[4 5 6]]
Here's a step-by-step breakdown of broadcasting rules:
1. Align Shapes: Starting with the trailing dimensions, NumPy compares the dimensions of each array.
If the dimensions are equal, or one of them is 1, they are compatible.
2. Stretch to Match: If a dimension in one array is 1 while the corresponding dimension in the other
array is greater than 1, the array with the dimension of 1 is stretched to match the other array’s
dimension.
3. Apply Operation: Once the shapes are compatible, NumPy applies the operation element-wise.
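The rules above can be written out as a small function that works on shape tuples. This is only an illustrative sketch: broadcast_shape is a hypothetical helper, not a NumPy function, and its result is compared against the shape NumPy actually produces.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the broadcasting rules to two shapes."""
    # Pad the shorter shape with leading 1s so both have the same length
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for dim_a, dim_b in zip(a, b):
        if dim_a == dim_b or dim_a == 1 or dim_b == 1:
            # A dimension of size 1 is stretched to match the other one
            result.append(dim_b if dim_a == 1 else dim_a)
        else:
            raise ValueError(f"incompatible dimensions {dim_a} and {dim_b}")
    return tuple(result)

print(broadcast_shape((3,), (3, 1)))
print(broadcast_shape((2, 3), (3,)))

# Compare with the shape NumPy produces for the same inputs
numpy_shape = (np.zeros((3,)) + np.zeros((3, 1))).shape
print(numpy_shape)
```

Shapes such as (3, 2) and (2, 1) fail the check, because the leading dimensions 3 and 2 differ and neither is 1.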
Example of broadcasting in practice:
• Scalar and Array:
import numpy as np
array = np.array([1, 2, 3])
scalar = 2
result = array + scalar # Broadcasting scalar to match the shape of the array
print(result)
# Output: [3 4 5]

• Two Arrays:
import numpy as np
array1 = np.array([[1, 2, 3], [4, 5, 6]])
array2 = np.array([1, 2, 3])
result = array1 + array2 # Broadcasting array2 to match the shape of array1
print(result)
# Output: [[2 4 6]
# [5 7 9]]

Broadcasting
Broadcasting enables concise and efficient code, reducing the need for explicit loops and making operations
on arrays of differing shapes easier and more intuitive.

1. Broadcasting a scalar across a 2D array:


import numpy as np
array = np.array([[1, 2, 3], [4, 5, 6]])
scalar = 10
result = array * scalar # Broadcasting scalar to match the shape of the array
print(result)
# Output: [[10 20 30]
# [40 50 60]]

2. Broadcasting a 1D array across a 2D array:


import numpy as np
array1 = np.array([[1, 2, 3], [4, 5, 6]])
array2 = np.array([1, 2, 3])
result = array1 + array2 # Broadcasting array2 to match the shape of array1
print(result)
# Output: [[2 4 6]
# [5 7 9]]

3. Broadcasting with different shapes:


import numpy as np
array1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
array2 = np.array([1, 2, 3])
result = array1 * array2 # Broadcasting array2 to match the shape of array1
print(result)
# Output: [[ 1 4 9]
# [ 4 10 18]
# [ 7 16 27]]

4. Broadcasting along different dimensions:


import numpy as np
array1 = np.array([[1, 2], [3, 4], [5, 6]])
array2 = np.array([[10], [20], [30]])
result = array1 + array2 # Broadcasting array2 (shape (3, 1)) across the columns of array1 (shape (3, 2))
print(result)
# Output: [[11 12]
# [23 24]
# [35 36]]

5. Broadcasting with higher-dimensional arrays:


import numpy as np
array1 = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
array2 = np.array([1, 2])
result = array1 + array2 # Broadcasting array2 to match the shape of array1
print(result)
# Output: [[[ 2 4]
# [ 4 6]]
#
# [[ 6 8]
# [ 8 10]]]
These examples should give you a better sense of how broadcasting works with various array shapes and
operations.
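When two shapes are not compatible, NumPy raises a ValueError instead of guessing. A common remedy is to add an axis of length 1 with np.newaxis so that the rules apply. The sketch below uses illustrative data and subtracts each row's mean from that row:

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6]])        # shape (2, 3)
row_means = data.mean(axis=1)       # shape (2,)

# (2, 3) and (2,) are not compatible: the trailing dimensions are 3 and 2
try:
    data - row_means
    compatible = True
except ValueError:
    compatible = False
print("Compatible as-is:", compatible)

# Adding an axis gives shape (2, 1), which is stretched across the columns
centered = data - row_means[:, np.newaxis]
print(centered)
```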
Experiment -6
Applications of Python – Pandas

Perform following operations using pandas


• Creating dataframe
• concat()
• Setting conditions
• Adding a new column

What is Pandas?
Pandas is a popular open-source library in Python that's essential for data manipulation and analysis. It
provides powerful tools for handling structured data such as tables, spreadsheets, or databases. The name
Pandas is derived from "panel data", an econometrics term for multidimensional data sets. In 2008,
developer Wes McKinney started developing pandas when he needed a high-performance, flexible tool for
data analysis.
Here are a few highlights of Pandas:
• Data Structures: It primarily uses two data structures—Series (1D) and DataFrame (2D)—to
organize and manipulate data.
• Data Cleaning: Pandas makes it easy to clean and preprocess messy datasets, including handling
missing or duplicate values.
• Data Operations: You can perform operations like filtering, merging, grouping, and aggregating
data with ease.
• File Handling: It supports importing/exporting data to various formats like CSV, Excel, SQL
databases, and more.

Why Use Pandas?


Pandas allows us to analyze big data and make conclusions based on statistical theories.
Pandas can clean messy data sets, and make them readable and relevant.
Relevant data is very important in data science.

Applications of Pandas
• Data Cleaning
• Data Exploration
• Data Preparation
• Data Analysis
• Data Visualisation
• Time Series Analysis
• Data Aggregation and Grouping
• Data Input/Output
• Machine Learning
• Web Scraping
• Financial Analysis
• Text Data Analysis
• Experimental Data Analysis

Installation of Pandas
If you have Python and PIP already installed on a system, then installation of Pandas is
very easy.

Install it using this command:

C:\Users\Your Name>pip install pandas

Installing Pandas Using Anaconda


Anaconda is a popular distribution for data science that includes Python and many scientific libraries,
including Pandas.

Pandas comes pre-installed with Anaconda, so you can directly import it in your Python environment.

import pandas as pd

Data Structures in Pandas Library


Pandas provides two main data structures for manipulating data. They are:
• Series
• DataFrame

Pandas Series
A Pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float,
Python objects, etc.). The axis labels are collectively called indexes.
Creating a Series
In practice, a Pandas Series is often created by loading data from existing storage (which can be a SQL
database, a CSV file, or an Excel file).
A Pandas Series can also be created from lists, dictionaries, scalar values, etc.

Creating an Empty Pandas Series


An empty Series contains no data and can be useful when we plan to add values later. We can create an
empty Series using the pd.Series() function. The default data type of an empty Series depends on the pandas
version (float64 in older releases, object from pandas 2.0 onward), so it is safest to specify the data type
explicitly using the dtype parameter.
import pandas as pd
ser = pd.Series()
print(ser)
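To make the data type explicit rather than relying on the default, dtype can be passed when creating the empty Series. A minimal sketch (variable names are illustrative):

```python
import pandas as pd

# An empty Series with an explicit floating-point dtype
float_ser = pd.Series(dtype="float64")
print(float_ser.dtype, len(float_ser))

# An empty Series intended to hold integers
int_ser = pd.Series(dtype="int64")
print(int_ser.dtype, len(int_ser))
```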
Creating a Series from a NumPy Array
If we already have data stored in a NumPy array we can easily convert it into a Pandas Series. This is helpful
when working with numerical data.
import pandas as pd
import numpy as np
data = np.array(['r', 'a', 'a', 'k', 'i'])
ser = pd.Series(data)
print(ser)
Creating a Series from a List
We can create a Series by passing a Python list to the pd.Series() function. Pandas automatically assigns an
index to each element starting from 0. This is a simple way to store and manipulate data.

import pandas as pd
data = ['r', 'a', 'a', 'k', 'i']
ser = pd.Series(data)
print(ser)

Additional Exercises
1. Create a simple Pandas Series from a list.
2. Return the first value of the series.

Create Labels
With the index argument, you can name your own labels.
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a, index = ["x", "y", "z"])
print(myvar)

Creating a Series Using range()


The range() function in Python is commonly used to generate sequences of numbers, and it can be easily
converted into a Pandas Series. This is particularly useful for creating a sequence of values in a structured
format without the need to specify each element manually. Below is an example of how range() can be used
to create a Series.
import pandas as pd
ser = pd.Series(range(5, 15))
print(ser)
Creating a Series Using List Comprehension
List comprehension is a concise way to generate sequences and apply transformations in a single line of
code. This method is useful when we need to create structured sequences dynamically. Below is an example
demonstrating how list comprehension is used to create a Series with a custom index.

import pandas as pd
ser=pd.Series(range(1,20,3), index=[x for x in 'abcdefg'])
print(ser)
Creating a Series from a Dictionary
A dictionary in Python stores data as key-value pairs. When we convert Dictionary into a Pandas Series the
keys become index labels and the values become the data. This method is useful for labeled data preserving
structure and enabling quick access.

import pandas as pd
data_dict = {'Pandas': 10, 'and': 20, 'NumPy': 30}
ser = pd.Series(data_dict)
print(ser)

Creating a Series Using NumPy Functions


A Series can also be created from the output of NumPy functions. Some commonly used NumPy functions for
generating sequences include numpy.linspace() for creating evenly spaced numbers over a specified range and
numpy.random.randn() for generating random numbers from a normal distribution. This is particularly
useful when working with scientific computations, statistical modeling or large datasets.
import numpy as np
import pandas as pd
ser = pd.Series(np.linspace(1, 10, 5))
print(ser)

A Series is a one-dimensional labeled array that can hold any data type. It can store integers, strings,
floating-point numbers, etc. Each value in a Series is associated with a label (index), which can be an integer
or a string.

Name    Steve
Age     35
Gender  Male
Rating  3.5

import pandas as pd
data = ['Steve', '35', 'Male', '3.5']
series = pd.Series(data, index=['Name', 'Age', 'Gender', 'Rating'])
print(series)

Accessing element of Series


There are two ways to access the elements of a Series:
• Accessing Element from Series with Position
• Accessing Element Using Label (index)
Accessing Element from Series with Position: To access an element of a Series by position, refer to its index
number. Use the index operator [ ] to access an element in a Series. The index must be an integer. To
access multiple elements from a Series, use the slice operation.
Accessing the first 5 elements of a Series:
import pandas as pd
import numpy as np
data = np.array(['r','a','g','h','u','e','n','g','g','c','o','l','l','e','g','e'])
ser = pd.Series(data)
print(ser[:5])
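The second approach, accessing elements by label, can be sketched as follows. The Series marks and its subject labels are hypothetical values chosen for illustration.

```python
import pandas as pd

# Hypothetical Series with string labels as its index
marks = pd.Series([82, 67, 91], index=["maths", "physics", "chemistry"])

# A single element by its label
by_label = marks["physics"]
print(by_label)

# Several elements by passing a list of labels
several = marks[["maths", "chemistry"]]
print(several)

# A label slice includes both endpoints
label_slice = marks["maths":"physics"]
print(label_slice)
```

Unlike positional slices, a slice written with labels includes the stop label in the result.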

Indexing and Selecting Data in Series


Indexing in pandas simply means selecting particular data from a Series. Indexing could mean selecting all
of the data or only some of it. Indexing is also known as Subset Selection.
Indexing a Series using the indexing operator [ ]:
The indexing operator refers to the square brackets following an object. The .loc and .iloc indexers also
use the indexing operator to make selections. In the example below, a column is first selected from a DataFrame with df[ ].

import pandas as pd
df = pd.read_csv('C:/Users/Raakesh Kumar/OneDrive/Desktop/csv/data.csv')
ser = pd.Series(df['Duration'])
data = ser.head(10)
print(data)

Now we access the element of series using index operator [ ].


import pandas as pd
df = pd.read_csv('C:/Users/Raakesh Kumar/OneDrive/Desktop/csv/data.csv')
ser = pd.Series(df['Duration'])
data = ser.head(10)
print(data[3:6])

Indexing a Series using .iloc[ ] :


The .iloc indexer allows us to retrieve data by position. To do that, we need to specify the positions of
the data that we want. The .iloc indexer is very similar to .loc but uses only integer locations to make its
selections.

import pandas as pd
df = pd.read_csv('C:/Users/Raakesh Kumar/OneDrive/Desktop/csv/data.csv')
ser = pd.Series(df['Duration'])
data = ser.head(10)
print(data.iloc[3:6])
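The difference between position-based and label-based selection is easiest to see when the index labels are not 0, 1, 2, and so on. The sketch below is self-contained and uses hypothetical values rather than the CSV file above.

```python
import pandas as pd

# Hypothetical Series whose labels differ from its positions
ser = pd.Series([60, 45, 30, 50], index=[10, 20, 30, 40])

# .iloc selects by integer position (0-based)
by_position = ser.iloc[1]
# .loc selects by index label
by_label = ser.loc[30]
print(by_position, by_label)

# Position slices exclude the stop position
pos_slice = ser.iloc[1:3]
# Label slices include the stop label
label_slice = ser.loc[20:30]
print(pos_slice)
print(label_slice)
```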

Binary Operation on Series


We can perform binary operations on Series, such as addition and subtraction. To do so, we can use
methods such as .add() and .sub(), which match the two Series by index label before operating.

# importing pandas module


import pandas as pd
# creating a series
data = pd.Series([5, 2, 3,7], index=['a', 'b', 'c', 'd'])
# creating a series
data1 = pd.Series([1, 6, 4, 9], index=['a', 'b', 'd', 'e'])
print(data, "\n\n", data1)
Now we add two series using the .add() function.
import pandas as pd
data = pd.Series([5, 2, 3,7], index=['a', 'b', 'c', 'd'])
data1 = pd.Series([1, 6, 4, 9], index=['a', 'b', 'd', 'e'])
print(data.add(data1, fill_value=0))

Now we subtract two series using the .sub() function.

import pandas as pd
data = pd.Series([5, 2, 3,7], index=['a', 'b', 'c', 'd'])
data1 = pd.Series([1, 6, 4, 9], index=['a', 'b', 'd', 'e'])
print(data.sub(data1, fill_value=0))
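The effect of fill_value is clearest when compared with the plain + operator. In the sketch below (same two Series as above; the result names are illustrative), labels that appear in only one Series produce NaN with +, while .add() and .sub() with fill_value=0 treat the missing entry as 0.

```python
import pandas as pd

data = pd.Series([5, 2, 3, 7], index=['a', 'b', 'c', 'd'])
data1 = pd.Series([1, 6, 4, 9], index=['a', 'b', 'd', 'e'])

# Plain operator: labels 'c' and 'e' exist in only one Series, so they give NaN
plain_sum = data + data1
print(plain_sum)

# With fill_value=0 the missing side is treated as 0
filled_sum = data.add(data1, fill_value=0)
filled_diff = data.sub(data1, fill_value=0)
print(filled_sum)
print(filled_diff)
```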

Conversion Operation on Series


In a conversion operation we change the form of a Series, for example changing its data type or converting it
to a list. Pandas provides methods such as .astype() and .tolist() for this.
import pandas as pd
df = pd.read_csv('C:/Users/Raakesh Kumar/OneDrive/Desktop/csv/nlist.csv')
print(df)
# drop rows with missing values so that the casts below do not fail
df.dropna(inplace=True)
# record the column types before conversion
before = df.dtypes
df["Number"] = df["Number"].astype(str)
df["Salary"] = df["Salary"].astype(int)
after = df.dtypes
print("BEFORE CONVERSION\n", before, "\n")
print("AFTER CONVERSION\n", after, "\n")

Example 2:

import pandas as pd
data = pd.read_csv("C:/Users/Raakesh Kumar/OneDrive/Desktop/csv/nlist.csv")
data.dropna(inplace = True)
dtype_before = type(data["Salary"])
salary_list = data["Salary"].tolist()
dtype_after = type(salary_list)
print("Data type before converting = {}\nData type after converting = {}"
.format(dtype_before, dtype_after))
salary_list

DataFrame
A DataFrame is a two-dimensional labeled data structure with columns that can hold different data types. It
is similar to a table in a database or a spreadsheet. Consider the following data representing the performance
rating of a sales team.

Name    Age   Gender   Rating
Steve   32    Male     3.45
Lia     28    Female   4.6
Vin     45    Male     3.9
Katie   38    Female   2.78

Create a simple Pandas DataFrame:


1. Creating a DataFrame
import pandas as pd
# Creating a DataFrame from a dictionary
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df)
2.Accessing Data
# Access a specific column
print(df['Name'])
# Access a specific row by index
print(df.loc[1]) # Bob's row

3.Filtering Data
# Filter rows where Age is greater than 28
filtered_df = df[df['Age'] > 28]
print(filtered_df)

4. Adding a New Column


# Add a new column
df['Score'] = [88, 92, 79]
print(df)

5.Basic Statistics
# Calculate mean age
mean_age = df['Age'].mean()
print("Mean Age:", mean_age)

6.Reading/Writing Data
# Reading from a CSV file
df = pd.read_csv('C:/Users/Raakesh Kumar/OneDrive/Desktop/csv/nlist.csv')
# Writing to a CSV file
df.to_csv('C:/Users/Raakesh Kumar/OneDrive/Desktop/csv/output.csv', index=False)

7.Updating Data
# Update a specific value
df.at[0, 'Age'] = 26
print(df)
8. Dropping Rows or Columns
# Drop a column
df = df.drop('Age', axis=1)
# Drop a row
df = df.drop(1) # Removes the row with index 1
print(df)

9. Sorting Data
# Sort by the 'Number' column in descending order
sorted_df = df.sort_values(by='Number', ascending=False)
print(sorted_df)

10. Grouping Data


# Group by a column and calculate the mean
grouped_df = df.groupby('Number')['Salary'].mean()
print(grouped_df)

11. Merging DataFrames


# Merge two DataFrames
df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['Alice', 'Bob']})
df2 = pd.DataFrame({'ID': [1, 2], 'Score': [95, 89]})
merged_df = pd.merge(df1, df2, on='ID')
print(merged_df)

12. Handling Missing Data


# Create a DataFrame with missing values
data = {'Name': ['Alice', 'Bob', None], 'Age': [25, None, 35]}
df = pd.DataFrame(data)
# Fill missing values
df['Age'] = df['Age'].fillna(df['Age'].mean())
# Drop rows with missing values
df = df.dropna()
print(df)

13. Pivot Tables


# Create a pivot table
data = {
'Name': ['Alice', 'Bob', 'Alice', 'Bob'],
'Month': ['Jan', 'Jan', 'Feb', 'Feb'],
'Sales': [200, 150, 300, 250]
}
df = pd.DataFrame(data)
pivot = df.pivot_table(values='Sales', index='Name', columns='Month', aggfunc='sum')
print(pivot)

14.Concat() in DataFrames
The concat() function in Pandas is used to concatenate or combine multiple DataFrames (or Series) along a
particular axis, either rows (axis=0) or columns (axis=1). It provides flexibility to combine data even
when the indices or columns don't align.

Basic Syntax
pd.concat(objs, axis=0, join='outer', ignore_index=False)

15. Concatenating by Rows (Default Behavior)


import pandas as pd
# DataFrames to concatenate
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
result = pd.concat([df1, df2])
print(result)

16. Concatenating by Columns


result = pd.concat([df1, df2], axis=1)
print(result)

17. Ignoring the Index


result = pd.concat([df1, df2], ignore_index=True)
print(result)
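
The join parameter from the basic syntax matters when the DataFrames do not share the same columns. The sketch below uses a made-up second frame, df3, with a column 'C' that df1 lacks (df3, outer and inner are illustrative names):

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
# Hypothetical frame that shares only column 'A' with df1
df3 = pd.DataFrame({'A': [5, 6], 'C': [7, 8]})

# join='outer' (the default) keeps every column and fills the gaps with NaN
outer = pd.concat([df1, df3], ignore_index=True)

# join='inner' keeps only the columns present in all inputs
inner = pd.concat([df1, df3], join='inner', ignore_index=True)

print(outer)
print(inner)
```

The outer result has columns A, B and C with NaN where a frame had no value; the inner result has only column A.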

EXPERIMENT -7

Perform following operations using pandas

Filling NaN with string


Sorting based on column values
groupby()

Filling NaN with string

In Pandas, you can fill NaN values with a string using the fillna() method. Here's a quick example:
import pandas as pd
import numpy as np
# Create a DataFrame with NaN values
data = {'Name': ['Alice', 'Bob', np.nan, 'David'],
'Age': [25, np.nan, 30, np.nan]}
df = pd.DataFrame(data)
# Fill NaN values with a string
df_filled = df.fillna('Unknown')
print(df_filled)

You can use the fillna() method on specific columns or on the entire DataFrame. If you want to replace NaN in a
particular column, you can do something like this:
import pandas as pd
import numpy as np
# Create a DataFrame with NaN values
data = {'Name': ['Alice', 'Bob', np.nan, 'David'],
'Age': [25, np.nan, 30, np.nan]}
df = pd.DataFrame(data)
df['Name'] = df['Name'].fillna('No Name')
print(df)

Example 1: Filling with Different Strings for Different Columns


import pandas as pd
import numpy as np
# Create a DataFrame with NaN values
data = {'Name': ['Alice', 'Bob', np.nan, 'David'],
'Age': [25, np.nan, 30, np.nan]}
df = pd.DataFrame(data)
# Fill NaN values with specific strings for each column
df_filled = df.fillna({'Name': 'Unknown', 'Age': 'Not Specified'})
print(df_filled)
Example 2: Filling Using Forward Fill (ffill())
You can fill NaN values with the previous value in the column using forward fill.
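
A minimal sketch of forward fill on the same sample data used in the other examples (the name df_ffilled is introduced here for illustration):

```python
import pandas as pd
import numpy as np

# Create a DataFrame with NaN values
data = {'Name': ['Alice', 'Bob', np.nan, 'David'],
        'Age': [25, np.nan, 30, np.nan]}
df = pd.DataFrame(data)

# Each NaN is replaced by the last valid value above it in the same column
df_ffilled = df.ffill()
print(df_ffilled)
```

A NaN in the first row of a column would stay NaN, because there is no earlier value to carry forward.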

Example 3: Filling with a String and Adding a Flag Column


You can create a flag column to indicate which rows originally had NaN values.
import pandas as pd
import numpy as np
# Create a DataFrame with NaN values
data = {'Name': ['Alice', 'Bob', np.nan, 'David'],
'Age': [25, np.nan, 30, np.nan]}
df = pd.DataFrame(data)
df['Missing_Name_Flag'] = df['Name'].isna().astype(int)
df['Name'] = df['Name'].fillna('No Name')
print(df)
Example 4: Replace NaN in Specific Rows Based on Conditions
If you want to replace NaN values based on certain conditions:

import pandas as pd
import numpy as np
# Create a DataFrame with NaN values
data = {'Name': ['Alice', 'Bob', np.nan, 'David'],
'Age': [25, np.nan, 30, np.nan]}
df = pd.DataFrame(data)
df.loc[df['Name'].isna(), 'Name'] = 'Condition-Based Name'
print(df)

Sorting based on column values
Sorting DataFrames based on column values can be done using the sort_values() method. Here are some
examples to guide you:
Example 1: Sorting in Ascending Order
import pandas as pd
# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 28]}
df = pd.DataFrame(data)
# Sort by Age in ascending order
df_sorted = df.sort_values(by='Age')
print(df_sorted)

Example 2: Sorting in Descending Order

# Sort by Age in descending order


df_sorted_desc = df.sort_values(by='Age', ascending=False)
print(df_sorted_desc)
Example 3: Sorting by Multiple Columns
You can sort based on multiple columns by specifying a list. For example:
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Score': [85, 90, 85, 92],
'Age': [25, 30, 35, 28]}
df = pd.DataFrame(data)
# Sort by Score (descending) and then by Age (ascending)
df_sorted = df.sort_values(by=['Score', 'Age'], ascending=[False, True])
print(df_sorted)

Example 4: Sorting by Index


If you'd like to sort by the DataFrame's index instead of a column:
df_sorted_index = df.sort_index(ascending=False)
print(df_sorted_index)

Example 5: In-Place Sorting


If you want to modify the DataFrame directly:
df.sort_values(by='Age', inplace=True)
print(df)
GroupBy()
The groupby() function in Pandas is a powerful tool that helps you group data and perform operations like
aggregation, transformation, filtering, and more on these groups. Some examples:
Example 1: Grouping and Aggregating

import pandas as pd
# Create a DataFrame
data = {'Category': ['A', 'B', 'A', 'B', 'A'],
'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)
# Group by 'Category' and calculate the sum of 'Values'
grouped = df.groupby('Category')['Values'].sum()
print(grouped)
Here, the data is grouped by the Category column, and the sum of Values within each group is calculated.

Example 2: Grouping and Applying Multiple Aggregations


# Apply multiple aggregation functions (sum, mean)
grouped = df.groupby('Category')['Values'].agg(['sum', 'mean'])
print(grouped)
Example 3: Grouping by Multiple Columns
data = {'Category': ['A', 'A', 'B', 'B', 'C'],
'Subcategory': ['X', 'Y', 'X', 'Y', 'X'],
'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)
# Group by 'Category' and 'Subcategory' and calculate the sum of 'Values'
grouped = df.groupby(['Category', 'Subcategory'])['Values'].sum()
print(grouped)

Example 4: Transforming Data Within Groups


You can use groupby() combined with transform() to apply functions to group values while maintaining the
DataFrame structure.
df['Normalized'] = df.groupby('Category')['Values'].transform(lambda x: x / x.sum())
print(df)

Example 5: Filtering Groups


You can filter out groups based on conditions.
filtered = df.groupby('Category').filter(lambda x: x['Values'].sum() > 50)
print(filtered)
The groupby() function lets you apply customized operations to grouped subsets of your data.
EXPERIMENT -8

Read the following file formats using pandas

Text files
CSV files
Excel files
JSON files

Pandas offers convenient methods to read various file formats into a DataFrame. Below are examples for
each format:
1. Text Files
You can read text files using pd.read_table() or pd.read_csv() (if the text file is structured like a CSV):
import pandas as pd
# Read a text file
df_text = pd.read_table('C:/Users/Raakesh Kumar/OneDrive/Desktop/csv/nlist.txt')
print(df_text)

If your text file has delimiters, specify them using the sep parameter:
df_text = pd.read_csv('example.txt', sep='\t') # For tab-separated values
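
To try the sep parameter without a file on disk, the text can be supplied through io.StringIO. This is a sketch with made-up tab-separated content; the column names Number and Salary are assumed here only to mirror the nlist examples:

```python
import io
import pandas as pd

# Hypothetical tab-separated text standing in for the contents of a .txt file
text = "Number\tSalary\n1\t1000\n2\t1500\n"

# read_csv accepts any file-like object; sep='\t' splits the fields on tabs
df_text = pd.read_csv(io.StringIO(text), sep='\t')
print(df_text)
```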

2. CSV Files
CSV files are easily handled with pd.read_csv():

import pandas as pd
df_csv = pd.read_csv('C:/Users/Raakesh Kumar/OneDrive/Desktop/csv/nlist.csv')
print(df_csv)

3. Excel Files
Excel files can be read using pd.read_excel():
import pandas as pd
df_excel = pd.read_excel('C:/Users/Raakesh Kumar/OneDrive/Desktop/csv/nlist.xlsx', sheet_name='nlist')
print(df_excel)

If the Excel file contains multiple sheets, specify the sheet name using the sheet_name parameter or load all
sheets as a dictionary:

all_sheets = pd.read_excel('example.xlsx', sheet_name=None) # Reads all sheets

4. JSON Files
JSON files can be loaded using pd.read_json():
import pandas as pd
# read the file, then flatten the nested records stored under the 'colors' key
df = pd.json_normalize(pd.read_json('C:/Users/Raakesh Kumar/OneDrive/Desktop/csv/colors.json')['colors'])
print(df)

Option - 2
import pandas as pd
# Read the JSON file
df = pd.read_json('C:/Users/Raakesh Kumar/OneDrive/Desktop/csv/colors.json', orient='records')
# Display the DataFrame
print(df)

Option -3
import pandas as pd

# Load the JSON data


data = {
"colors": [
{
"color": "black",
"category": "hue",
"type": "primary",
"code": {
"rgba": [0, 0, 0, 1],
"hex": "#000"
}
},
{
"color": "white",
"category": "value",
"code": {
"rgba": [255, 255, 255, 1],
"hex": "#FFF"
}
},
{
"color": "red",
"category": "hue",
"type": "primary",
"code": {
"rgba": [255, 0, 0, 1],
"hex": "#F00"
}
},
{
"color": "blue",
"category": "hue",
"type": "primary",
"code": {
"rgba": [0, 0, 255, 1],
"hex": "#00F"
}
},
{
"color": "yellow",
"category": "hue",
"type": "primary",
"code": {
"rgba": [255, 255, 0, 1],
"hex": "#FF0"
}
},
{
"color": "green",
"category": "hue",
"type": "secondary",
"code": {
"rgba": [0, 255, 0, 1],
"hex": "#0F0"
}
}
]
}

# Convert JSON data to DataFrame


df = pd.json_normalize(data['colors'])
print(df)

Experiment – 9

Read the following file formats


Pickle files
Image files using PIL
Multiple files using Glob
Importing data from database

1. Pickle Files
Pickle files are binary files used to serialize and deserialize Python objects.
You can use the pickle module or Pandas' built-in methods for Pickle files.
Example: Using Pandas to Read and Write Pickle Files
import pandas as pd
# Save a DataFrame as a Pickle file
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df.to_pickle('C:/Users/Raakesh Kumar/OneDrive/Desktop/csv/example.pkl')
# Load the Pickle file
df_loaded = pd.read_pickle('C:/Users/Raakesh Kumar/OneDrive/Desktop/csv/example.pkl')
print(df_loaded)
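
The standard-library pickle module mentioned above can serialize the same DataFrame. The sketch below keeps everything in memory with dumps() and loads() so that no file path is needed (payload and restored are illustrative names):

```python
import pickle
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Serialize the DataFrame to a bytes object
payload = pickle.dumps(df)

# Deserialize the bytes back into a DataFrame
restored = pickle.loads(payload)
print(restored)
```

Unpickling can execute arbitrary code, so pickle files should only be loaded from sources you trust.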
2. Image Files Using PIL
To read image files, you can use the Python Imaging Library (PIL), which is
now maintained under the Pillow package.
Example: Reading and Displaying an Image

from PIL import Image  # Python Imaging Library


# Open an image file
image = Image.open('C:/Users/Raakesh Kumar/OneDrive/Pictures/zoom_cars.jpeg')
# Display the image
image.show()
# Convert to grayscale
gray_image = image.convert('L')
gray_image.show()
You can also manipulate images (resize, crop, etc.) using PIL.
You can install Pillow using:
pip install pillow
Pillow is a powerful tool for image processing in Python, widely used in tasks
like computer vision, graphic design, and automation.
3. Multiple Files Using Glob
The glob module is helpful for reading multiple files that match a specific
pattern.
Example: Reading Multiple CSV Files
import pandas as pd
import glob
# Get all CSV files in a directory
file_paths = glob.glob('C:/Users/Raakesh Kumar/OneDrive/Desktop/csv/*.csv')
# Read and concatenate all CSV files into one DataFrame
df = pd.concat([pd.read_csv(file) for file in file_paths], ignore_index=True)
print(df)
You can change the file pattern (*.csv) to match other formats like *.txt , *.json , etc.
4. Importing Data from a Database
You can use libraries like sqlite3 (for SQLite) or SQLAlchemy (for more databases) to interact
with databases.
Example: Using SQLite
import sqlite3
import pandas as pd
# Connect to a database (or create one if it doesn't exist)
conn = sqlite3.connect('example.db')
# Create a sample table and insert data
conn.execute('CREATE TABLE IF NOT EXISTS sample_table (id INTEGER, name TEXT)')
# SQL string literals take single quotes; double quotes denote identifiers
conn.execute("INSERT INTO sample_table VALUES (1, 'Alice'), (2, 'Bob')")
conn.commit()
# Read data from the database into a DataFrame
df = pd.read_sql_query('SELECT * FROM sample_table', conn)
print(df)
# Close the connection
conn.close()
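
When a query depends on a value supplied at run time, it is usually safer to pass that value as a parameter than to build the SQL string by hand. The following sketch uses an in-memory SQLite database so that it leaves no file behind; the table contents and the name df_one are assumptions for illustration:

```python
import sqlite3
import pandas as pd

# ':memory:' creates a temporary database that disappears when closed
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE sample_table (id INTEGER, name TEXT)')

# executemany inserts one row per tuple; ? marks a parameter placeholder
conn.executemany('INSERT INTO sample_table VALUES (?, ?)',
                 [(1, 'Alice'), (2, 'Bob')])
conn.commit()

# The value 2 is bound to the ? placeholder instead of being pasted into the SQL
df_one = pd.read_sql_query('SELECT * FROM sample_table WHERE id = ?',
                           conn, params=(2,))
print(df_one)
conn.close()
```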
For other databases (MySQL, PostgreSQL, etc.), you can use sqlalchemy or
pymysql .

Experiment -10

What is Web Scraping?


Web scraping is the process of automatically extracting data from websites. It
involves writing scripts or using tools to gather specific information from web
pages and organize it in a structured format, like a spreadsheet or database.
For example, if you want to collect pricing data from an e-commerce website or
track news headlines from various outlets, web scraping allows you to do so
efficiently without manually copying and pasting the data.
Here’s how it generally works:
Access the webpage: A scraper sends a request to a website's server to retrieve
the HTML code of the page.
Parse the data: The tool identifies and extracts the desired elements (like text,
images, or links) from the HTML structure.
Store the data: The extracted information is saved in a structured format, like a
CSV file or a database.
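
The parse and store steps can be sketched offline, without contacting any website. The example below works on a made-up HTML string and uses only the Python standard library (html.parser and csv); the HTML content, the TitleParser class and the entry-title class name are assumptions for illustration, and in practice the HTML would come from the access step:

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page content standing in for a downloaded web page
HTML = """<html><body>
<h2 class="entry-title">First post</h2>
<h2 class="entry-title">Second post</h2>
<p>Not a title</p>
</body></html>"""


class TitleParser(HTMLParser):
    """Collects the text of every <h2 class="entry-title"> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == 'h2' and ('class', 'entry-title') in attrs:
            self.in_title = True
            self.titles.append('')

    def handle_data(self, data):
        if self.in_title:
            self.titles[-1] += data

    def handle_endtag(self, tag):
        if tag == 'h2':
            self.in_title = False


# Parse the data: extract the desired elements from the HTML
parser = TitleParser()
parser.feed(HTML)

# Store the data: write the titles as CSV (to memory here, a file in practice)
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['title'])
writer.writerows([t] for t in parser.titles)
print(buffer.getvalue())
```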
It's important to note that web scraping must be done ethically and within the
legal boundaries. Many websites have terms of service that specify how their
data can be used, and ignoring these could lead to consequences.
Some tools and libraries for web scraping:
1. BeautifulSoup (Python):
• A popular Python library for parsing HTML and XML documents.
• It's easy to use and ideal for smaller web scraping tasks.
• Example: Scraping product details or headlines from a webpage.
2. Selenium:
• A tool to automate web browsers and scrape dynamic content.
• It's useful for pages that require interaction, like logging in or clicking
buttons.
• Example: Extracting reviews from a website with a "Load More" button.

3. Scrapy:
• A powerful Python framework designed for large-scale scraping.
• It handles crawling, data extraction, and pipelines to store data
efficiently.
• Example: Building a scraper to collect job listings across multiple sites.

4. Puppeteer (JavaScript):
• A Node.js library for controlling headless Chrome browsers.
• Perfect for scraping content that requires JavaScript execution.
• Example: Gathering live sports scores from dynamic websites.
5. Octoparse:
• A no-code web scraping tool with a user-friendly interface.
• Great for non-programmers who want to extract data visually.
• Example: Scraping e-commerce product information.
6. Apify:
• A platform offering pre-built scrapers (actors) and tools for custom
scraping.
• You can deploy and run scrapers in the cloud.
• Example: Monitoring competitor prices online.

7. ParseHub:
• A visual scraping tool that works well with dynamic websites.
• Example: Collecting weather data from a regional website.

8. Requests (Python):
• Often paired with BeautifulSoup, it allows you to send HTTP requests to
fetch web pages.
• Example: Accessing the HTML content of a webpage for parsing.

Each tool has its strengths depending on your use case.
1. Simple Web Scraper (Python with BeautifulSoup):

You want to extract the titles of articles from a blog. Here's a Python
example using the BeautifulSoup library:
import requests
from bs4 import BeautifulSoup
url = "https://exampleblog.com"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
for title in soup.find_all("h2", class_="entry-title"):
    print(title.text)
Output:
What is a Blog? A Simple Guide to Understanding Blogs
2. Scraping E-commerce Data:
Suppose you want to get prices of items from an e-commerce site to
compare them. Using Python with Selenium (for dynamic pages):
Step 1: Install Selenium
Open your terminal or command prompt and run the following command
to install Selenium:
pip install selenium
Step 2: Check Installation
Once installed, verify it by opening a Python shell and running:
import selenium
print(selenium.__version__)
This should display the version number of Selenium.
Step 3: Install WebDriver
Selenium requires a WebDriver to interact with the browser. To install a
WebDriver on Windows, follow these steps (Chrome is used as the example):
1. Identify Your Browser
Determine which browser you want to automate (e.g., Chrome, Edge,
Firefox).
2. Download the WebDriver
For Chrome: Download ChromeDriver from the page linked below. Ensure the
version matches your Chrome browser version.
https://www.selenium.dev/downloads/

3. Add WebDriver to PATH


Extract the downloaded WebDriver file.
Copy the file path of the WebDriver executable.
Add it to your system's PATH:
Right-click on "This PC" or "My Computer" and select Properties.
Click Advanced system settings > Environment Variables.
Under "System variables," find Path and click Edit.
Add the WebDriver's file path and click OK.
4. Verify Installation
Open a Command Prompt and type:
chromedriver --version
5. Use WebDriver in Your Code
Now you can use the WebDriver in your scripts. For example, with
Selenium:
from selenium import webdriver
driver = webdriver.Chrome() # Or Edge, Firefox, etc.
driver.get("https://example.com")


from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get("https://www.amazon.in")
# "price-tag" is an example class name and may not match the live page
prices = driver.find_elements(By.CLASS_NAME, "price-tag")
for price in prices:
    print(price.text)
driver.quit()  # close the browser when done

from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get("https://www.youtube.com/watch?v=kFHN4XQiue0")
video_titles = driver.find_elements(By.CSS_SELECTOR, "#video-title")
for title in video_titles:
    print(title.text)
driver.quit()  # close the browser when done

Example: Scraping Titles and Links from a Web Page


import requests
from bs4 import BeautifulSoup
# Step 1: Send a GET request to the website
url = 'https://www.raghuenggcollege.com'
response = requests.get(url)
# Step 2: Parse the HTML content using BeautifulSoup
soup = BeautifulSoup(response.text, 'html.parser')
# Step 3: Extract specific elements (e.g., all links and titles)
titles = soup.find_all('h1') # Finds all <h1> elements
links = soup.find_all('a') # Finds all <a> (anchor) elements
# Step 4: Display the extracted data
for title in titles:
    print('Title:', title.text)
for link in links:
    print('Link:', link.get('href'))

import requests
from bs4 import BeautifulSoup
url = "https://www.naukri.com/bkpmg-health-solution-overview-4582789?tab=jobs&functionAreaIdGid=25&searchId=17442659254677956&src=orgCompanyListing/"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
for job in soup.find_all("div", class_="job-listing"):
    title = job.find("h2", class_="headJobs").text
    company = job.find("span", class_="company-page").text
    location = job.find("span", class_="location").text
    print(f"Title: {title}, Company: {company}, Location: {location}")
EXPERIMENT - 11
Perform following preprocessing techniques on loan prediction dataset

• Feature Scaling
• Feature Standardization
• Label Encoding
• One Hot Encoding

Feature scaling
Feature scaling is a technique for bringing the independent features in the
data onto a comparable scale. It is performed during data pre-processing to
handle features whose values vary widely. Without feature scaling, a machine
learning algorithm tends to give more weight to features with larger numeric
values and less to features with smaller ones, regardless of the units of the
values. For example, it sees 10 m and 10 cm as the same number, 10, even though
the two quantities differ by a factor of 100. Here we will learn about the
different techniques used to perform feature scaling.
Why is Feature Scaling Important?
• Improves Model Performance: Features with larger values can
disproportionately impact model training. Scaling brings all features to
comparable ranges.
• Speeds Up Optimization: Gradient descent converges faster when features
are scaled.
• Handles Units Differences: Features measured in different units (e.g.,
income in dollars vs. age in years) need to be standardized to avoid bias.
Types of Feature Scaling:
1. Standardization: Scales features to have a mean of 0 and a standard
deviation of 1.
2. Normalization: Scales features to a range, like [0, 1], or makes the feature
vector length 1 (L2 norm).
3. Min-Max Scaling: Normalizes features to a fixed range, typically [0, 1].
4. Robust Scaling: Uses the median and interquartile range, making it less
sensitive to outliers.
5. MaxAbs Scaling: Divides each feature by its maximum absolute value.

1. Absolute Maximum Scaling


This method of scaling requires two steps:
1. First we select the maximum absolute value out of all the entries of
a particular measure.
2. Then we divide each entry of the column by this maximum value.
$$X_{\text{scaled}} = \frac{X_i}{\max(|X|)}$$
After performing these two steps, each entry of the column lies in the range
-1 to 1.
This method is not used very often, because it is too sensitive to outliers,
and outliers are very common in real-world data.

import pandas as pd
df = pd.read_csv('C:/Users/Raakesh Kumar/OneDrive/Desktop/csv/sampleFile.csv')
print(df.head())
Now let’s apply the first method which is of the absolute maximum scaling. For
this first, we are supposed to evaluate the absolute maximum values of the
columns
# max abs scaling
# largest absolute value in each column (assumes all columns are numeric)
max_vals = df.abs().max()
print(max_vals)
Now we divide each entry by the maximum absolute value of its column.
print(df / max_vals)
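
A small worked example may make the two steps concrete. The sketch below uses a made-up DataFrame (df_demo and its values are assumptions, not the loan dataset) and divides each column by its maximum absolute value, as described in step 2, so that every entry ends up between -1 and 1:

```python
import pandas as pd

# Hypothetical numeric data with both positive and negative values
df_demo = pd.DataFrame({'x': [-4.0, 2.0, 1.0],
                        'y': [10.0, 50.0, -25.0]})

# Step 1: maximum absolute value of each column (x -> 4, y -> 50)
max_abs = df_demo.abs().max()

# Step 2: divide each entry by the maximum absolute value of its column
scaled = df_demo / max_abs
print(scaled)
```

The sign of each entry is preserved, and the entry with the largest magnitude in each column becomes exactly 1 or -1.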

2. Min-Max Scaling
This method of scaling requires the two steps below:
1. First we find the minimum and the maximum value of the
column.
2. Then we subtract the minimum value from each entry and divide the
result by the difference between the maximum and the minimum value.
$$X_{\text{scaled}} = \frac{X_i - X_{\min}}{X_{\max} - X_{\min}}$$
Because it uses the maximum and the minimum value, this method is also
sensitive to outliers, but after the two steps above the data lie in the range
0 to 1.
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(df)
scaled_df = pd.DataFrame(scaled_data,
columns=df.columns)
z = scaled_df.head()
print(z)
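
The same result can be computed directly from the formula, which may help in checking what the scaler does. This sketch uses a made-up DataFrame (df_demo is an illustrative name) and plain pandas arithmetic:

```python
import pandas as pd

# Hypothetical data: two features on very different scales
df_demo = pd.DataFrame({'Feature1': [10, 20, 30, 40, 50],
                        'Feature2': [100, 200, 300, 400, 500]})

# Step 1: minimum and maximum of each column
col_min = df_demo.min()
col_max = df_demo.max()

# Step 2: subtract the minimum and divide by the range
scaled = (df_demo - col_min) / (col_max - col_min)
print(scaled)
```

Both columns end up with the same values, 0, 0.25, 0.5, 0.75 and 1, because they differ only by a constant factor.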

3. Normalization
This method is much like the previous one, but instead of the minimum value
we subtract the mean of the column from each entry, and then divide the result
by the difference between the maximum and the minimum value.
$$X_{\text{scaled}} = \frac{X_i - X_{\text{mean}}}{X_{\max} - X_{\min}}$$

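# Mean normalization as defined by the formula above (assumes numeric columns).
# Note: sklearn's Normalizer is a different operation; it rescales each row
# to unit norm rather than centring each column on its mean.
scaled_df = (df - df.mean()) / (df.max() - df.min())
print(scaled_df.head())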
4. Standardization
This method of scaling is basically based on the central tendencies and variance
of the data.
1. First we calculate the mean and standard deviation of the feature we
would like to standardize.
2. Then we subtract the mean value from each entry and divide the result
by the standard deviation.
This gives data with a mean equal to zero and a standard deviation equal to 1.
It does not change the shape of the distribution, so the result is normally
distributed only if the original data were.

$$X_{\text{scaled}} = \frac{X_i - X_{\text{mean}}}{\sigma}$$
from sklearn.preprocessing import StandardScaler
# StandardScaler subtracts the column mean and divides by the column standard deviation
scaler = StandardScaler()
scaled_data = scaler.fit_transform(df)
scaled_df = pd.DataFrame(scaled_data,
columns=df.columns)
print(scaled_df.head())

Example2:
from sklearn.preprocessing import StandardScaler
import pandas as pd
# Example dataset
data = {
'Feature1': [10, 20, 30, 40, 50],
'Feature2': [100, 200, 300, 400, 500],
'Feature3': [1000, 2000, 3000, 4000, 5000]
}
df = pd.DataFrame(data)
# Initialize the scaler
scaler = StandardScaler()
# Standardize features
standardized_features = scaler.fit_transform(df)
# Convert the numpy array back to a DataFrame for better readability
standardized_df = pd.DataFrame(standardized_features,
columns=df.columns)
print("Standardized Data:")
print(standardized_df)
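
Standardization can also be written out by hand from the formula. The sketch below uses plain pandas on one made-up column (df_demo is an illustrative name). It passes ddof=0 to std() because StandardScaler divides by the population standard deviation, whereas pandas defaults to the sample standard deviation (ddof=1):

```python
import pandas as pd

# Hypothetical single-feature dataset
df_demo = pd.DataFrame({'Feature1': [10, 20, 30, 40, 50]})

# Step 1: mean and (population) standard deviation of the column
mean = df_demo.mean()
std = df_demo.std(ddof=0)

# Step 2: subtract the mean and divide by the standard deviation
standardized = (df_demo - mean) / std
print(standardized)
```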

5. Robust Scaling
In this method of scaling, we use two main statistical measures of the data.
• Median
• Inter-Quartile Range
After calculating these two values we are supposed to subtract the median from
each entry and then divide the result by the interquartile range.
$$X_{\text{scaled}} = \frac{X_i - X_{\text{median}}}{\text{IQR}}$$

from sklearn.preprocessing import RobustScaler


scaler = RobustScaler()
scaled_data = scaler.fit_transform(df)
scaled_df = pd.DataFrame(scaled_data,
columns=df.columns)
print(scaled_df.head())
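
To see why this method copes better with outliers, the formula can be applied by hand to a column that contains one extreme value. This is a sketch on made-up data (df_demo and the income column are assumptions):

```python
import pandas as pd

# Hypothetical column with one obvious outlier (1000)
df_demo = pd.DataFrame({'income': [10, 20, 30, 40, 1000]})

# Median and interquartile range (75th percentile minus 25th percentile)
median = df_demo.median()
iqr = df_demo.quantile(0.75) - df_demo.quantile(0.25)

# Subtract the median and divide by the IQR
scaled = (df_demo - median) / iqr
print(scaled)
```

The median (30) and the IQR (20) are unaffected by the outlier, so the four ordinary values stay within -1 to 0.5 while the outlier remains clearly separated at 48.5.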

Label Encoding
Label encoding is a technique used to convert categorical variables into
numerical format, which is essential for many machine learning models. Here's a
Python example using scikit-learn:

Example:
from sklearn.preprocessing import LabelEncoder
import pandas as pd
# Example dataset
data = {
'Category': ['Apple', 'Banana', 'Apple', 'Orange', 'Banana']
}
df = pd.DataFrame(data)
# Initialize the label encoder
label_encoder = LabelEncoder()
# Perform label encoding
df['Category_encoded'] = label_encoder.fit_transform(df['Category'])
print("Original Data:")
print(df)
print("\nMapping of Encoded Labels:")
for encoded_value, category in enumerate(label_encoder.classes_):
    print(f"{category} -> {encoded_value}")

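This code will transform the categorical values (Apple, Banana, Orange) into
numerical labels (0, 1, 2). Note that LabelEncoder assigns integers in sorted
(alphabetical) order of the category names, not according to any meaningful
ranking. The integers can therefore suggest an order that does not exist, so for
categories without an inherent ranking one-hot encoding is often the safer choice.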
1. Encoding Multiple Categorical Columns:
from sklearn.preprocessing import LabelEncoder
import pandas as pd
# Example dataset with multiple categorical columns
data = {
'Color': ['Red', 'Blue', 'Green', 'Red'],
'Size': ['Small', 'Large', 'Medium', 'Small']
}
df = pd.DataFrame(data)
# Initialize the label encoder
label_encoder = LabelEncoder()
# Apply label encoding for each column
for column in df.columns:
    df[column + '_encoded'] = label_encoder.fit_transform(df[column])

print("Original Data:")
print(df)
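
In the example above, LabelEncoder sorts the class names, so the Size column is encoded as Large = 0, Medium = 1, Small = 2, which does not follow the natural size order. When the order matters, one possible approach (a swapped-in pandas technique, not part of the LabelEncoder example) is an ordered Categorical with an explicitly stated order; the names sizes, order and size_codes are introduced here for illustration:

```python
import pandas as pd

sizes = pd.Series(['Small', 'Large', 'Medium', 'Small'])

# State the intended ranking explicitly, from smallest to largest
order = ['Small', 'Medium', 'Large']

# .codes gives the position of each value within the stated order
size_codes = pd.Categorical(sizes, categories=order, ordered=True).codes
print(list(size_codes))
```

Here Small maps to 0, Medium to 1 and Large to 2, so the integers respect the ranking.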

2. Encoding Data with Missing Values:


import pandas as pd
from sklearn.preprocessing import LabelEncoder
# Example dataset with missing values
data = {
'Fruit': ['Apple', 'Banana', None, 'Orange', 'Apple']
}
df = pd.DataFrame(data)
# Initialize the label encoder
label_encoder = LabelEncoder()
# Replace NaN with a placeholder before encoding
df['Fruit_filled'] = df['Fruit'].fillna('Unknown')
df['Fruit_encoded'] = label_encoder.fit_transform(df['Fruit_filled'])
print("Original Data with Encoding:")
print(df)
3.Retrieving the Mapping for Encoded Labels:
from sklearn.preprocessing import LabelEncoder
import pandas as pd
# Example dataset
data = {
'Animal': ['Cat', 'Dog', 'Fish', 'Cat', 'Dog']
}
df = pd.DataFrame(data)
# Initialize the label encoder
label_encoder = LabelEncoder()
df['Animal_encoded'] = label_encoder.fit_transform(df['Animal'])
print("Mapping of Encoded Labels:")
label_mapping = dict(zip(label_encoder.classes_,
range(len(label_encoder.classes_))))
print(label_mapping)

One Hot Encoding:


One-hot encoding is a method used to convert categorical data into a binary format,
where each category is represented as a unique combination of 0s and 1s. Here's
an example Python code snippet using pandas:
Example:
import pandas as pd
# Example dataset
data = {
'Fruit': ['Apple', 'Banana', 'Orange', 'Banana', 'Apple']
}
df = pd.DataFrame(data)
# Perform one-hot encoding
one_hot_encoded_df = pd.get_dummies(df, columns=['Fruit'])
print("One-Hot Encoded Data:")
print(one_hot_encoded_df)
Output
This code will create new columns for each unique category in the "Fruit" column
(e.g., Fruit_Apple, Fruit_Banana, Fruit_Orange) and fill them with 1s and 0s to
indicate the presence of each category.
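
Two optional parameters of pd.get_dummies() are worth knowing. In recent pandas versions the new columns hold True/False by default, and dtype=int gives the 1s and 0s described above. drop_first=True omits the first category, which can be recovered as "all other columns are 0". The sketch below repeats the Fruit data; full and reduced are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'Fruit': ['Apple', 'Banana', 'Orange', 'Banana', 'Apple']})

# One column per category, stored as integers 1 and 0
full = pd.get_dummies(df, columns=['Fruit'], dtype=int)

# Drop the first category (Apple); a row of all zeros now means Apple
reduced = pd.get_dummies(df, columns=['Fruit'], dtype=int, drop_first=True)

print(full)
print(reduced)
```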

1. Encoding with Non-Numeric Data


import pandas as pd
# Example dataset with non-numeric data
data = {
'Animal': ['Cat', 'Dog', 'Bird', 'Dog', 'Cat']
}
df = pd.DataFrame(data)
# Perform one-hot encoding
one_hot_encoded_df = pd.get_dummies(df, columns=['Animal'])
print("One-Hot Encoded Data:")
print(one_hot_encoded_df)

This code creates binary columns for each unique animal, like Animal_Cat,
Animal_Dog, and Animal_Bird.

2. One-Hot Encoding with Missing Values


import pandas as pd
# Example dataset with missing values
data = {
'Fruit': ['Apple', 'Banana', None, 'Orange', 'Apple']
}
df = pd.DataFrame(data)
# Handle missing values by replacing with a placeholder before encoding
df['Fruit'] = df['Fruit'].fillna('Unknown')
# Perform one-hot encoding
one_hot_encoded_df = pd.get_dummies(df, columns=['Fruit'])
print("One-Hot Encoded Data:")
print(one_hot_encoded_df)
This demonstrates handling NaN values by replacing them with a placeholder
(Unknown) before applying one-hot encoding.

3. One-Hot Encoding for Multiple Columns


import pandas as pd
# Example dataset with multiple categorical columns
data = {
'City': ['London', 'Paris', 'New York'],
'Weather': ['Rainy', 'Sunny', 'Snowy']
}
df = pd.DataFrame(data)
# Perform one-hot encoding for multiple columns
one_hot_encoded_df = pd.get_dummies(df, columns=['City', 'Weather'])
print("One-Hot Encoded Data:")
print(one_hot_encoded_df)
This example creates binary columns for categories across multiple categorical
features (City and Weather).
4. Adding Prefix to One-Hot Encoded Columns
import pandas as pd
# Example dataset
data = {
'Gender': ['Male', 'Female', 'Male', 'Female'],
'Marital Status': ['Single', 'Married', 'Married', 'Single']
}
df = pd.DataFrame(data)
# Add prefix for better clarity in column names
one_hot_encoded_df = pd.get_dummies(df, columns=['Gender', 'Marital Status'],
prefix=['Gender', 'Status'])
print("One-Hot Encoded Data:")
print(one_hot_encoded_df)

Adding prefixes (Gender_, Status_) ensures clarity in naming one-hot encoded
columns when working with larger datasets.
5. Handling Large Numbers of Categories
import pandas as pd
# Example dataset with many categories
data = {
'Country': ['India', 'USA', 'Canada', 'India', 'France']
}
df = pd.DataFrame(data)
# Limit the number of one-hot encoded columns by keeping only the top N
# most frequent categories
top_countries = df['Country'].value_counts().nlargest(3).index
df['Country'] = df['Country'].apply(lambda x: x if x in top_countries else 'Other')
one_hot_encoded_df = pd.get_dummies(df, columns=['Country'])
print("One-Hot Encoded Data:")
print(one_hot_encoded_df)
In this example, categories beyond the top N frequent ones are grouped into
"Other" to avoid high dimensionality.
