
Subject Title : Data Science Using Python SEMESTER - III

Subject Ref. No. : MANC503

Chapter 2
Introduction to NumPy in Detail:

NumPy (Numerical Python) is a powerful Python library that provides support for efficient manipulation of multi-dimensional arrays and matrices of numerical data, along with a wide range of mathematical functions to operate on these arrays. It is a fundamental package for scientific computing and data analysis in Python. NumPy's key feature is the ndarray (n-dimensional array) data structure, which enables fast and vectorized operations on large datasets.

Key Features of NumPy:


Efficient Array Computing: NumPy arrays are more memory-efficient
and faster for numerical computations compared to Python's built-in
lists. NumPy operations are implemented in C and are therefore
significantly faster than equivalent Python loops.

Multi-Dimensional Arrays: NumPy arrays can have any number of dimensions, allowing you to work with multi-dimensional datasets like images, time series, and scientific data.

Universal Functions (ufuncs): NumPy provides a wide range of mathematical functions that operate element-wise on arrays, resulting in concise and efficient code.

Broadcasting: Broadcasting allows NumPy to perform operations on arrays of different shapes, intelligently applying the operation to elements without explicit looping.
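As a brief illustration (a minimal sketch, not from the original notes), broadcasting lets a 1-D array combine with each row of a 2-D array without a loop:

```python
import numpy as np

# A (2, 3) matrix and a (3,) row vector
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
row = np.array([10, 20, 30])

# The row is broadcast across each row of the matrix;
# no explicit Python loop is required.
result = matrix + row
print(result)  # [[11 22 33]
               #  [14 25 36]]
```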

Asst Professor :- Ingle A.R

Indexing and Slicing: NumPy supports powerful indexing and slicing operations to access and manipulate elements within arrays.
Data Types: NumPy arrays have a consistent data type, enabling better
control over memory usage and efficient storage of homogeneous
data.
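For example (an illustrative sketch), the dtype of an array can be inspected, or fixed at creation time:

```python
import numpy as np

# All elements of a NumPy array share one data type.
ints = np.array([1, 2, 3])
floats = np.array([1, 2, 3], dtype=np.float64)  # force 64-bit floats

print(ints.dtype)       # an integer dtype, e.g. int64 (platform dependent)
print(floats.dtype)     # float64
print(floats.itemsize)  # 8 bytes per element
```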
Vectorization: NumPy encourages vectorized operations, where you
perform operations on entire arrays instead of looping over individual
elements. This leads to cleaner and more efficient code.
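A quick sketch of the difference: both forms below compute the same squares, but the vectorized one is a single expression over the whole array.

```python
import numpy as np

arr = np.arange(5)

# Loop version: element by element in Python
squares_loop = np.array([x ** 2 for x in arr])

# Vectorized version: the loop happens inside NumPy's C code
squares_vec = arr ** 2

print(squares_vec)  # [ 0  1  4  9 16]
```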
Integration with Other Libraries: NumPy is the foundation for many
other scientific and data-related libraries in Python, including libraries
like SciPy, pandas, scikit-learn, and more.
Creating NumPy Arrays:

You can create NumPy arrays using various methods:

import numpy as np
# Create an array from a Python list
arr = np.array([1, 2, 3, 4, 5])


# Create a 2D array from a nested list
matrix = np.array([[1, 2, 3], [4, 5, 6]])
# Create an array of zeros or ones
zeros = np.zeros((3, 4))
ones = np.ones((2, 3))
# Create an identity matrix
identity = np.eye(3)
# Create an array with a range of values
range_array = np.arange(0, 10, 2)
# Create an array of evenly spaced values
linspace_array = np.linspace(0, 1, 5)
Array Operations:
NumPy supports various mathematical and logical operations on
arrays:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Element-wise addition
result_add = a + b

# Element-wise multiplication
result_mul = a * b

# Dot product (for 1-D arrays, np.dot computes the inner product: 1*4 + 2*5 + 3*6 = 32)
result_matmul = np.dot(a, b)


Indexing and Slicing:
Accessing elements in a NumPy array using indexing and slicing:
arr = np.array([10, 20, 30, 40, 50])

print(arr[0])    # Access the first element
print(arr[1:4])  # Access elements from index 1 to 3
print(arr[:3])   # Access elements from the beginning to index 2
print(arr[2:])   # Access elements from index 2 to the end
NumPy Array
NumPy arrays, also known as ndarrays (n-dimensional arrays), are the
foundation of the NumPy library. They provide an efficient and
flexible way to store and manipulate large datasets of homogeneous
numerical data in Python. This guide will cover key concepts related
to NumPy arrays, including creation, indexing, slicing, operations,
and attributes.

Importing NumPy:
To use NumPy, you need to import the library:
import numpy as np
Creating NumPy Arrays:
NumPy arrays can be created in several ways:
From a Python List:
arr = np.array([1, 2, 3, 4, 5])
From a Nested List (2D Array):
matrix = np.array([[1, 2, 3], [4, 5, 6]])


Using Built-in Functions:
zeros = np.zeros((3, 4))  # Array of zeros with shape (3, 4)
ones = np.ones((2, 3))    # Array of ones with shape (2, 3)
identity = np.eye(3)      # 3x3 identity matrix

Using Range and Linspace:
range_array = np.arange(0, 10, 2)      # Array with values [0, 2, 4, 6, 8]
linspace_array = np.linspace(0, 1, 5)  # Array with 5 evenly spaced values between 0 and 1

Array Attributes:
NumPy arrays have several useful attributes:
shape = arr.shape  # Shape of the array (rows, columns)
dtype = arr.dtype  # Data type of array elements
ndim = arr.ndim    # Number of dimensions
size = arr.size    # Total number of elements
Array Indexing and Slicing:
NumPy arrays are indexed and sliced similarly to Python lists:
element = arr[0]      # Access the first element
sub_array = arr[1:4]  # Access elements from index 1 to 3


sub_matrix = matrix[:2, 1:] # Access rows 0 to 1, columns 1 to end


Array Operations:
NumPy supports element-wise operations:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

add_result = a + b            # Element-wise addition
mul_result = a * b            # Element-wise multiplication
matmul_result = np.dot(a, b)  # Dot product for 1-D arrays (inner product, 32)
Universal Functions (ufuncs):
NumPy provides numerous ufuncs for efficient element-wise operations:
sqrt_array = np.sqrt(arr)  # Square root of array elements
exp_array = np.exp(arr)    # Exponential of array elements
Broadcasting:
Broadcasting allows performing operations on arrays of different
shapes:
scalar_mul = arr * 2 # Multiply each element by 2

Quick Note on Array Indexing

Array indexing is a fundamental concept in programming that allows you to access individual elements within an array. In the context of NumPy arrays, indexing refers to the process of retrieving specific elements or subsets of elements from an array. Here's a quick overview of array indexing in NumPy:

1. Indexing Basics:
Indexing in NumPy arrays is 0-based, meaning the index of the first element is 0, the second element's index is 1, and so on. You can use square brackets [] to access elements at specific indices.
2. Single Element Indexing:
To access a single element of a 1D array, use the index within the square brackets:
import numpy as np
arr = np.array([10, 20, 30, 40, 50])
element = arr[2]  # Access the element at index 2 (30)
3. Multi-Dimensional Arrays:
For 2D or multi-dimensional arrays, use a comma-separated pair of
indices within the square brackets:
matrix = np.array([[1, 2, 3], [4, 5, 6]])
element = matrix[1, 2] # Access the element at row 1, column 2 (6)
4. Slicing:
Slicing allows you to extract a subset of elements from an array. The
syntax is 'start:end:step', where 'start' is the starting index, 'end' is the
ending index (exclusive), and 'step' specifies the interval between
elements.
arr = np.array([10, 20, 30, 40, 50])
subset = arr[1:4] # Extract elements from index 1 to 3: [20, 30, 40]
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

submatrix = matrix[:2, 1:] # Extract rows 0 to 1, columns 1 to end


5. Boolean Indexing:
Boolean indexing involves using a Boolean condition to extract
elements that satisfy the condition.
arr = np.array([10, 20, 30, 40, 50])
condition = arr > 30
filtered = arr[condition] # Elements greater than 30: [40, 50]
6. Fancy Indexing:
Fancy indexing allows you to access elements at specific indices
using an array of index values.
arr = np.array([10, 20, 30, 40, 50])
indices = np.array([0, 3])
selected = arr[indices] # Elements at indices 0 and 3: [10, 40]
7. Negative Indices:
Negative indices count from the end of the array.
arr = np.array([10, 20, 30, 40, 50])
last_element = arr[-1] # Access the last element (50)
print(arr[-1])
Understanding array indexing is crucial for effectively working with
data in NumPy arrays. It allows you to access, modify, and manipulate
individual elements and subsets of elements within arrays.

1. NumPy Operations:
NumPy operations provide powerful tools for performing
computations on arrays efficiently. Understanding these operations is
crucial for effective data manipulation and analysis.
Arithmetic Operations:

Addition: 'np.add(arr1, arr2)'
Subtraction: 'np.subtract(arr1, arr2)'
Multiplication: 'np.multiply(arr1, arr2)'
Division: 'np.divide(arr1, arr2)'

Element-wise Operations:
Square root: 'np.sqrt(arr)'
Exponential: 'np.exp(arr)'
Logarithm: 'np.log(arr)'
Trigonometric functions: 'np.sin(arr)', 'np.cos(arr)'

Aggregation Functions:
Sum: 'np.sum(arr)'
Mean: 'np.mean(arr)'
Median:' np.median(arr)'
Standard deviation: 'np.std(arr)'
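A short sketch combining the operations listed above:

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([10, 20, 30, 40])

# Arithmetic operations work element-wise
added = np.add(arr1, arr2)       # [11 22 33 44]
divided = np.divide(arr2, arr1)  # [10. 10. 10. 10.]

# Aggregation functions reduce an array to a single value
total = np.sum(arr1)     # 10
average = np.mean(arr1)  # 2.5
spread = np.std(arr1)    # population standard deviation, about 1.118
print(total, average, spread)
```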
Reshaping and Transposing:
Reshaping in NumPy:
Reshaping refers to changing the dimensions (shape) of an array
without changing the data it contains. It's often used when you want to
convert a one-dimensional array into a two-dimensional matrix or
vice versa, or when you need to change the number of rows or
columns in a multidimensional array.
import numpy as np
# Create a one-dimensional array with 12 elements
arr = np.arange(12)
# Reshape it into a 3x4 matrix


reshaped_arr = arr.reshape(3, 4)
print("Original array:")
print(arr)
print("\nReshaped array:")
print(reshaped_arr)
In this example, we created a one-dimensional array with 12 elements
and then reshaped it into a 3x4 matrix using the reshape method. The
resulting array keeps the same data but has a different shape.

Transposing in NumPy:
Transposing involves swapping the rows and columns of a two-
dimensional array. This operation is useful when you want to change
the orientation of your data, for example, when you want to perform
matrix operations like matrix multiplication or when you want to
align data for different calculations.
import numpy as np
# Create a 2x3 matrix
matrix = np.array([[1, 2, 3],
[4, 5, 6]])
# Transpose the matrix (swap rows and columns)
transposed_matrix = matrix.T
print("Original matrix:")
print(matrix)
print("\nTransposed matrix:")
print(transposed_matrix)

Original matrix:
[[1 2 3]
[4 5 6]]
Transposed matrix:
[[1 4]
[2 5]
[3 6]]
In this example, we created a 2x3 matrix and then used the .T attribute
to transpose it. The resulting matrix swaps the rows and columns,
changing its orientation.

NumPy Exercises Solutions with Examples:

Solutions to exercises provide step-by-step guidance and validation of your understanding.

Solution 1: Array Creation and Indexing:

import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
element = arr[1, 2] # Access element at row 1, column 2 (6)

Solution 2: Element-wise Operations:

import numpy as np
arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])
result_add = np.add(arr1, arr2)  # Element-wise addition

Solution 3: Aggregation and Statistics:

import numpy as np
arr = np.array([10, 20, 30, 40, 50])
mean_value = np.mean(arr) # Calculate mean
std_deviation = np.std(arr) # Calculate standard deviation

Solution 4: Reshaping and Transposing:

import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
reshaped = arr.reshape(3, 2) # Reshape to 3 rows, 2 columns
transposed = arr.T # Transpose the matrix


Introduction to Pandas:

Pandas is a widely used open-source Python library for data manipulation and analysis. It provides versatile data structures and functions that simplify working with structured data, making it an essential tool in data science, analytics, and research. Pandas is built on top of NumPy and is particularly useful for handling tabular data, time series, and labeled data.

Features of Pandas:
DataFrame: The DataFrame is a two-dimensional, size-mutable, and
heterogeneous tabular data structure. It is the most commonly used
data structure in Pandas and can be thought of as a spreadsheet or
SQL table. DataFrames can hold data of various types, including
numeric, string, and categorical data.


Series: A Series is a one-dimensional labeled array that can hold data of any type. Series are used as columns in DataFrames and are often used to represent time series data.

Data Alignment: Pandas automatically aligns data when performing operations on Series and DataFrames, ensuring that operations are performed on corresponding elements, even when data is missing or misaligned.
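A brief sketch of alignment in action (illustrative, not from the original notes): two Series with partially overlapping indexes are added, and labels present in only one Series produce NaN rather than an error:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Addition aligns on the index labels, not on position.
result = s1 + s2
print(result)
# a     NaN   ('a' exists only in s1)
# b    12.0   (2 + 10)
# c    23.0   (3 + 20)
# d     NaN   ('d' exists only in s2)
```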

Data Cleaning and Preparation: Pandas provides a wide range of functions for cleaning and preparing data, including methods for handling missing data (NaN values), duplicate data, and data type conversions.

Indexing and Selection: Pandas supports powerful indexing and selection capabilities, allowing you to select data based on labels, positions, and boolean conditions. You can also perform multi-axis indexing and slicing.

Aggregation and Grouping: Pandas allows you to perform aggregation operations like sum, mean, count, and more on data sets. It also supports grouping data based on one or more criteria.

Time Series Data: Pandas has robust support for working with time
series data, including date and time indexing, resampling, and rolling
window calculations.
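For instance (a minimal sketch), a daily Series can be resampled to a coarser frequency:

```python
import pandas as pd

# Six days of values indexed by date
dates = pd.date_range('2023-01-01', periods=6, freq='D')
ts = pd.Series([1, 2, 3, 4, 5, 6], index=dates)

# Resample daily data into 2-day bins and sum each bin
resampled = ts.resample('2D').sum()
print(resampled)  # bins: 1+2=3, 3+4=7, 5+6=11
```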

Input/Output: Pandas provides functions to read data from various file formats, including CSV, Excel, SQL databases, JSON, and more. You can also write DataFrames to these formats.

Visualization: Pandas integrates with popular data visualization libraries like Matplotlib and Seaborn to create plots and charts directly from your data.

Merging and Joining: Pandas supports various methods for combining DataFrames through merging and joining operations, similar to SQL joins.

Data Transformation: You can perform data transformations, such as pivoting, melting, and stacking, to reshape your data for analysis.
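As a quick illustration (a sketch), melt turns wide-format columns into long-format rows:

```python
import pandas as pd

wide = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Math': [90, 80],
    'Science': [85, 95]
})

# Melt: each subject column becomes (variable, value) rows
long = pd.melt(wide, id_vars='Name', var_name='Subject', value_name='Score')
print(long)
```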

Pandas Data Structures:


1. Series:
A Pandas Series is a one-dimensional array-like object with labeled
data and an associated index. It's often used to represent a column of
data in a DataFrame.
import pandas as pd
# Creating a Series
data = pd.Series([10, 20, 30, 40, 50])

2. DataFrame:
A Pandas DataFrame is a two-dimensional table of data with rows and
columns. It's a powerful data structure for storing and manipulating
structured data.

# Creating a DataFrame from a dictionary


data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 28]}
df = pd.DataFrame(data)
print(df)
DataFrames - Part 1: Introduction to DataFrames

1. Introduction:
DataFrames are a core data structure in the Pandas library. They
provide a two-dimensional, labeled data structure that is highly
efficient for data manipulation and analysis.

2. Creating DataFrames:
You can create DataFrames using various methods:

From dictionaries: 'pd.DataFrame({'Column1': data1, 'Column2': data2})'
From lists: 'pd.DataFrame([data1, data2], columns=['Column1', 'Column2'])'
From external data sources (CSV, Excel, databases, etc.): pd.read_csv('data.csv')
Creating DataFrames is a fundamental task when working with data
analysis and manipulation in Python, especially when using libraries
like Pandas. A DataFrame is a two-dimensional, tabular data structure
that is commonly used to store and work with structured data.
import pandas as pd


# Creating a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 22, 35],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago']
}

df = pd.DataFrame(data)

# Printing the DataFrame
print(df)
Importing Pandas: First, you need to import the Pandas library using
import pandas as pd. This is a common convention in data analysis to
use pd as an alias for Pandas.

Creating Data: In this example, we create a Python dictionary called data with three keys ('Name', 'Age', and 'City'). Each key is associated with a list of values. Each list represents a column in the DataFrame.

Creating the DataFrame: We use the pd.DataFrame() constructor to create a DataFrame from the data dictionary. The constructor takes the dictionary as input, and each key in the dictionary becomes a column in the DataFrame. The values in each list become the data in the corresponding column.

Printing the DataFrame: We print the DataFrame df to the console. The DataFrame is displayed in a tabular format, where each row represents an observation (in this case, a person), and each column represents a variable (in this case, 'Name', 'Age', and 'City').
3. Exploring DataFrames:
'df.head(n)': Display the first n rows of the DataFrame.
'df.tail(n)': Display the last n rows of the DataFrame.
'df.info()': Display information about the DataFrame, including data types and non-null counts.
import pandas as pd
# Creating a sample DataFrame
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 22, 35],
'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago']
}

df = pd.DataFrame(data)

# Exploring the DataFrame

# 1. Displaying the first few rows of the DataFrame
print("First 3 rows of the DataFrame:")
print(df.head(3))

# 2. Getting basic information about the DataFrame
print("\nSummary information about the DataFrame:")
df.info()  # info() prints its report directly (it returns None)

# 3. Descriptive statistics of numeric columns
print("\nDescriptive statistics of numeric columns:")
print(df.describe())

# 4. Checking for missing values
print("\nChecking for missing values:")
print(df.isnull())

# 5. Counting unique values in a column
print("\nCount of unique values in the 'City' column:")
print(df['City'].value_counts())

# 6. Selecting specific columns
print("\nSelecting specific columns ('Name' and 'Age'):")
print(df[['Name', 'Age']])

Displaying Rows: We use the head() method to display the first few
rows of the DataFrame. In this case, we show the first 3 rows.

Summary Information: The info() method provides a summary of the DataFrame, including the number of rows and columns, column names, non-null counts, and data types.


Descriptive Statistics: The describe() method provides descriptive statistics for the numeric columns, including count, mean, standard deviation, minimum, 25th percentile, median (50th percentile), 75th percentile, and maximum.

Checking for Missing Values: The isnull() method checks for missing
values in the DataFrame. In this case, there are no missing values, so
all entries are False.

Counting Unique Values: We use value_counts() to count the unique values in the 'City' column. This is useful for understanding the distribution of categorical data.

Selecting Specific Columns: We select specific columns ('Name' and 'Age') by specifying their names within double square brackets.

DataFrames - Part 2: Indexing and Selection

1. Indexing and Selection: Indexing and selection in Pandas DataFrames are essential operations for retrieving specific data from your dataset. This allows you to access and manipulate the data you need for analysis.

import pandas as pd

# Creating a sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 22, 35],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago']
}


df = pd.DataFrame(data)

# 1. Selecting a single column by column name
name_column = df['Name']
print("Name column:")
print(name_column)

# 2. Selecting multiple columns by column names
name_age_columns = df[['Name', 'Age']]
print("\nName and Age columns:")
print(name_age_columns)

# 3. Selecting rows by index (integer location)
second_row = df.iloc[1]
print("\nSecond row by integer location:")
print(second_row)

# 4. Selecting specific rows and columns by integer location
subset = df.iloc[1:3, 0:2]
print("\nSubset of rows and columns by integer location:")
print(subset)

# 5. Selecting rows based on a condition
young_people = df[df['Age'] < 30]
print("\nYoung people (age < 30):")
print(young_people)

# 6. Selecting rows based on multiple conditions
city_condition = (df['City'] == 'New York') | (df['City'] == 'Chicago')
selected_cities = df[city_condition]
print("\nSelected cities (New York or Chicago):")
print(selected_cities)
Selecting a Single Column: To select a single column by its name, use
square brackets and the column name as a string. In this case, we
selected the 'Name' column.

Selecting Multiple Columns: To select multiple columns, pass a list of column names within double square brackets. Here, we selected both the 'Name' and 'Age' columns.

Selecting Rows by Index (Integer Location): You can use the iloc
indexer to select specific rows by their integer location. In this
example, we selected the second row (index 1).

Selecting Specific Rows and Columns: By using iloc with row and
column indices, you can select a subset of rows and columns. Here,
we selected rows 1 and 2 and columns 0 and 1.

Selecting Rows Based on a Condition: You can filter rows based on a condition. In this case, we selected all rows where the 'Age' is less than 30.


Selecting Rows Based on Multiple Conditions: You can use logical operators (| for OR, & for AND) to filter rows based on multiple conditions. Here, we selected rows where the 'City' is either 'New York' or 'Chicago'.

Column selection: 'df['Column']' or 'df.Column'
Multiple columns: 'df[['Column1', 'Column2']]'
Row selection using boolean indexing: 'df[df['Column'] > value]'
Location-based selection: 'df.loc[row_label, column_label]'
Integer-based selection: 'df.iloc[row_index, column_index]'
2. Filtering Data:
Using boolean conditions to filter rows: 'df[df['Column'] > value]'
Combining multiple conditions: 'df[(df['Column1'] > value1) & (df['Column2'] < value2)]'

DataFrames - Part 3: Data Manipulation and Operations

Data manipulation and operations in Pandas DataFrames are essential for transforming and processing data to extract meaningful insights.
1. Adding and Modifying Data:
Adding columns: 'df['NewColumn'] = data'
Modifying values based on conditions: 'df.loc[df['Column'] > value, 'NewColumn'] = new_value'
import pandas as pd
# Creating a sample DataFrame
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David'],

'Age': [25, 30, 22, 35],
'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
# 1. Adding a new column
df['Gender'] = ['Female', 'Male', 'Male', 'Male']
print("DataFrame with a new 'Gender' column:")
print(df)

# 2. Removing a column
df.drop(columns='Gender', inplace=True)
print("\nDataFrame with 'Gender' column removed:")
print(df)
# 3. Filtering rows based on a condition
young_people = df[df['Age'] < 30]
print("\nYoung people (age < 30):")
print(young_people)
# 4. Sorting the DataFrame by a column
sorted_df = df.sort_values(by='Age', ascending=False)
print("\nSorted DataFrame by 'Age' in descending order:")
print(sorted_df)
# 5. Aggregating data (calculating mean age)
mean_age = df['Age'].mean()
print("\nMean age of all people:")


print(mean_age)
# 6. Grouping and aggregating data
city_groups = df.groupby('City')['Age'].mean()
print("\nMean age of people in each city:")
print(city_groups)
# 7. Applying a function to a column
def is_adult(age):
    return age >= 18

df['IsAdult'] = df['Age'].apply(is_adult)
print("\nDataFrame with 'IsAdult' column:")
print(df)
Adding a New Column: We added a new 'Gender' column to the
DataFrame by assigning a list of values to it.

Removing a Column: We removed the 'Gender' column using the drop method with the columns parameter and set inplace=True to modify the DataFrame in place.

Filtering Rows Based on a Condition: We filtered rows where the 'Age' is less than 30 to create a DataFrame of young people.

Sorting the DataFrame: We sorted the DataFrame by the 'Age' column in descending order using sort_values. This helps in arranging data in a specific order.


Aggregating Data: We calculated the mean age of all people using the
mean method on the 'Age' column.

Grouping and Aggregating Data: We grouped the data by the 'City' column and calculated the mean age in each city using groupby and mean.

Applying a Function to a Column: We applied a custom function is_adult to the 'Age' column to create a new 'IsAdult' column based on the age condition.

Missing Data:

Introduction:
Missing data is a common challenge in data analysis. Pandas provides tools to handle and manage missing data effectively.

import pandas as pd
import numpy as np
# Create a sample DataFrame with missing data
data = {
'A': [1, 2, np.nan, 4, 5],
'B': [np.nan, 2, 3, 4, np.nan],
'C': [1, 2, 3, 4, 5]
}
df = pd.DataFrame(data)
# 1. Checking for missing data
print("Checking for missing data:")
print(df.isnull())
# 2. Counting missing values in each column
missing_count = df.isnull().sum()
print("\nCount of missing values in each column:")
print(missing_count)

# 3. Dropping rows or columns with missing data
df_dropped_rows = df.dropna()        # Drop rows with any missing values
df_dropped_cols = df.dropna(axis=1)  # Drop columns with any missing values

# 4. Filling missing data
df_filled = df.fillna(0)  # Fill missing values with 0

# 5. Interpolating missing values
df_interpolated = df.interpolate()  # Interpolate missing values

# 6. Replacing missing values with a specific value
df_replaced = df.replace(np.nan, -1)  # Replace NaN with -1

# 7. Forward-fill missing values
df_forward_filled = df.ffill()

# 8. Backward-fill missing values
df_backward_filled = df.bfill()
# 9. Checking for any missing values left
print("\nChecking for any missing values left:")
print(df.isnull())
Checking for Missing Data: We use the isnull() method to create a
Boolean DataFrame where True indicates missing values (NaN) and
False indicates non-missing values.

Counting Missing Values: We count the missing values in each column using isnull().sum(), which gives the count of True values (True = missing) in each column.

Dropping Rows or Columns with Missing Data: We can remove rows or columns containing missing data using dropna(). The axis parameter specifies whether to drop rows (axis=0) or columns (axis=1). In this example, we create two DataFrames: one with rows containing missing values removed and one with columns containing missing values removed.

Filling Missing Data: We can fill missing values using fillna(). In this
example, we fill missing values with 0.

Interpolating Missing Data: Interpolation is used to estimate missing values based on surrounding data points. We use interpolate() to fill missing values based on linear interpolation.

Replacing Missing Data: We can replace missing values with a specific value using replace(). In this case, we replace NaN with -1.


Forward-Fill and Backward-Fill: Forward-fill (ffill()) replaces missing values with the previous non-missing value in the same column. Backward-fill (bfill()) replaces missing values with the next non-missing value in the same column.

Checking for Any Missing Values Left: After applying various methods to handle missing data, we check if any missing values are still present in the DataFrame.

Handling missing data is crucial because it ensures that your analysis is based on complete and accurate information. Depending on your specific dataset and analysis, you can choose the appropriate method for dealing with missing data, whether it's dropping, filling, interpolating, or replacing the missing values.

Groupby:
Introduction:
Groupby operation allows you to group data based on a column and
perform aggregate functions on the groups.
The groupby operation in Pandas is a powerful tool for splitting,
applying a function, and combining the results on a DataFrame based
on some criteria. It is often used for data aggregation and summary
statistics. Let's explore the groupby operation in Pandas:
import pandas as pd

# Create a sample DataFrame
data = {
'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
'Value': [10, 15, 12, 18, 9, 11],
'Quantity': [100, 150, 120, 180, 90, 110]
}

df = pd.DataFrame(data)

# Grouping by 'Category'
grouped = df.groupby('Category')

# 1. Applying an aggregation function (e.g., mean) to grouped data
mean_values = grouped.mean()
print("Mean values for each category:")
print(mean_values)

# 2. Applying multiple aggregation functions
agg_functions = {
'Value': 'mean',
'Quantity': 'sum'
}
aggregated = grouped.agg(agg_functions)
print("\nAggregated values (mean for 'Value' and sum for 'Quantity'):")
print(aggregated)

# 3. Applying a custom aggregation function
def custom_aggregation(arr):
    return arr.max() - arr.min()

custom_agg = grouped['Value'].agg(custom_aggregation)
print("\nCustom aggregation (max - min) for 'Value' in each category:")
print(custom_agg)

# 4. Iterating through groups
print("\nIterating through groups and displaying them:")
for category, group_data in grouped:
    print(f"Category: {category}")
    print(group_data)

# 5. Selecting a specific group
group_a = grouped.get_group('A')
print("\nData for 'Category' A:")
print(group_a)
Applying an Aggregation Function: We group the DataFrame by the
'Category' column using groupby. Then, we calculate the mean values
for each group using the mean() function. This gives us the average
'Value' and 'Quantity' for each category.

Applying Multiple Aggregation Functions: We define a dictionary of aggregation functions for specific columns and use the agg method to apply them to the grouped data. In this case, we calculate the mean of 'Value' and the sum of 'Quantity' for each category.


Applying a Custom Aggregation Function: We define a custom aggregation function custom_aggregation that computes the range (max - min) of a given array. We apply this function to the 'Value' column in each category and display the results.
Iterating Through Groups: We iterate through the groups created by
groupby and display each group's data. This can be useful for custom
processing or analysis on each group separately.

Selecting a Specific Group: We use get_group to retrieve data for a specific group, in this case, 'Category' A. This allows you to access and manipulate data for a specific category easily.
