Unit 5
NumPy (Numerical Python): Introduction to NumPy, Data types of arrays, Dealing
with ndarrays, Copies and views, Arithmetic operations, Indexing, Slicing, Splitting
arrays, Shape manipulation, Stacking together different data.
Pandas (Data Analysis): DataFrame and Series, DataFrame operations, Data
slicing, Indexing, DataFrame functions, Reading files (CSV, Excel).
NumPy and Pandas
NumPy: Numerical Python
NumPy (short for Numerical Python) is a powerful library for numerical
computations in Python. It offers support for multi-dimensional arrays, matrices, and
high-level mathematical functions to operate on these arrays efficiently.
1. Introduction to NumPy
• NumPy is used for fast mathematical and logical operations on arrays.
• It provides efficient ways to store large data and manipulate it in the form of
ndarrays (N-dimensional arrays).
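• NumPy arrays are faster and more memory-efficient than Python lists for numerical data, because their elements share one fixed type, are stored in a single contiguous block of memory, and are processed by compiled C loops.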
Installation:
pip install numpy
Importing NumPy:
import numpy as np
2. Data Types of Arrays
NumPy supports many data types, such as:
• int: Integer types like int32, int64
• float: Floating-point types like float32, float64
• complex: Complex numbers
• bool: Boolean values
• object: General Python objects
• str: Unicode string
Example: Creating arrays with different data types:
import numpy as np
arr1 = np.array([1, 2, 3], dtype='int32') # Integer array
arr2 = np.array([1.1, 2.2, 3.3], dtype='float64') # Float array
arr3 = np.array([True, False, True], dtype='bool') # Boolean array
print(arr1, arr1.dtype)
print(arr2, arr2.dtype)
print(arr3, arr3.dtype)
Output:
[1 2 3] int32
[1.1 2.2 3.3] float64
[ True False True] bool
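NumPy also infers a dtype when none is given, and astype() converts an array to another dtype. A minimal sketch; the arrays are illustrative only:

```python
import numpy as np

# Mixing ints and floats: NumPy upcasts every element to float64
mixed = np.array([1, 2.5, 3])
print(mixed, mixed.dtype)          # [1.  2.5 3. ] float64

# astype() returns a NEW array; float -> int conversion truncates toward zero
floats = np.array([1.7, 2.2, -3.9])
ints = floats.astype(np.int32)
print(ints, ints.dtype)            # [ 1  2 -3] int32

# Numbers -> booleans: 0 becomes False, any other value becomes True
flags = np.array([0, 1, 5]).astype(bool)
print(flags)                       # [False  True  True]
```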
3. Dealing with ndarrays
An ndarray is the core data structure of NumPy. It is a fast, N-dimensional container
for homogeneous data.
Creating Arrays:
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
print(f'Shape: {arr.shape}, Dimensions: {arr.ndim}')
Output:
[[1 2 3]
[4 5 6]]
Shape: (2, 3), Dimensions: 2
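Besides np.array(), NumPy has helper functions that create ndarrays directly. A short sketch of a few common ones and the attributes every ndarray carries:

```python
import numpy as np

zeros = np.zeros((2, 3))          # 2x3 array filled with 0.0 (float64 by default)
ones = np.ones(4, dtype=int)      # [1 1 1 1]
evens = np.arange(0, 10, 2)       # like range(): start, stop (exclusive), step
points = np.linspace(0, 1, 5)     # 5 evenly spaced values, both ends included

print(evens)                      # [0 2 4 6 8]
print(points)                     # [0.   0.25 0.5  0.75 1.  ]

# Attributes: shape, number of dimensions, total element count, element type
print(zeros.shape, zeros.ndim, zeros.size, zeros.dtype)   # (2, 3) 2 6 float64
```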
4. Copies and Views
• Copy: A new independent array is created.
• View: A new array object that shares the original array's data, so a change made through either one is visible in the other.
Example:
arr = np.array([10, 20, 30])
copy_arr = arr.copy() # Independent copy
view_arr = arr.view() # Just a view of original data
copy_arr[0] = 99
view_arr[1] = 88
print('Original:', arr) # [10 88 30]
print('Copy:', copy_arr) # [99 20 30]
print('View:', view_arr) # [10 88 30]
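A common source of surprise is that basic slicing also returns a view, not a copy. A minimal sketch showing this, plus two ways to check whether memory is shared (the .base attribute and np.shares_memory()):

```python
import numpy as np

arr = np.array([10, 20, 30, 40])

# A basic slice is a VIEW: writing to it changes arr
part = arr[1:3]
part[0] = 0
print(arr)                          # [10  0 30 40]

# .copy() gives an independent slice: arr is not affected
safe = arr[1:3].copy()
safe[0] = 999
print(arr)                          # [10  0 30 40]

# .base is the array a view was made from (None if the array owns its data)
print(part.base is arr)             # True
print(safe.base is None)            # True
print(np.shares_memory(arr, part))  # True
```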
5. Arithmetic Operations on Arrays
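Arithmetic operators (+, -, *, /, **) work element-wise on NumPy arrays: each element is combined with the element at the same position in the other array, or with a scalar.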
Example:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
print(arr1 + arr2) # [5 7 9]
print(arr1 * arr2) # [ 4 10 18]
print(arr1 ** 2) # [1 4 9]
6. Indexing and Slicing
• Indexing: Accessing elements using indices.
• Slicing: Accessing a range of elements.
Example:
arr = np.array([10, 20, 30, 40, 50])
print(arr[1]) # 20 (Indexing)
print(arr[1:4]) # [20 30 40] (Slicing)
print(arr[-1]) # 50 (Negative Indexing)
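Slices also accept a step, 2-D arrays take one index per axis, and a Boolean condition can be used as an index. A short sketch with illustrative values:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])
print(arr[::2])        # [10 30 50]        every second element
print(arr[::-1])       # [50 40 30 20 10]  reversed

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
print(grid[1, 2])      # 6      row 1, column 2
print(grid[:, 0])      # [1 4]  all rows, column 0

# Boolean indexing keeps only the elements where the condition is True
print(arr[arr > 25])   # [30 40 50]
```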
7. Splitting Arrays
You can split a larger array into smaller ones. np.array_split() takes the number of parts (here 3) and returns a list of sub-arrays.
Example:
arr = np.array([1, 2, 3, 4, 5, 6])
split_arr = np.array_split(arr, 3)
print(split_arr) # [array([1, 2]), array([3, 4]), array([5, 6])]
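The difference between np.array_split() and np.split() only shows when the array cannot be divided evenly. A sketch using a 7-element array:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])   # 7 elements: not divisible by 3

# array_split allows unequal parts; the first parts get the extra elements
parts = np.array_split(arr, 3)
print(parts)   # [array([1, 2, 3]), array([4, 5]), array([6, 7])]

# np.split requires equal parts and raises ValueError otherwise
try:
    np.split(arr, 3)
except ValueError as err:
    print("np.split failed:", err)
```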
8. Shape Manipulation
reshape() returns the same elements arranged in a new shape. The new shape must hold the same total number of elements (here 2 * 3 = 6).
Example:
arr = np.array([1, 2, 3, 4, 5, 6])
reshaped = arr.reshape(2, 3)
print(reshaped)
Output:
[[1 2 3]
[4 5 6]]
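Two other shape tools are worth knowing: passing -1 lets reshape() work out one dimension by itself, and .T / flatten() transpose or flatten an array. A minimal sketch:

```python
import numpy as np

arr = np.arange(1, 7)            # [1 2 3 4 5 6]

# -1 means "infer this dimension": 6 elements / 3 columns = 2 rows
table = arr.reshape(-1, 3)
print(table.shape)               # (2, 3)

# .T swaps rows and columns
print(table.T.shape)             # (3, 2)

# flatten() returns a 1-D copy (ravel() returns a view when it can)
print(table.flatten())           # [1 2 3 4 5 6]
```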
9. Stacking Arrays
np.vstack() stacks arrays vertically (each 1-D array becomes a row), while np.hstack() joins them horizontally (end to end for 1-D arrays).
Example:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
vstack = np.vstack((arr1, arr2))
hstack = np.hstack((arr1, arr2))
print('Vertical Stack:\n', vstack)
print('Horizontal Stack:', hstack)
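The stacking example above does not show its result. A sketch of the resulting shapes, also using np.concatenate(), which joins arrays along an existing axis and gives the same result as hstack() for 1-D arrays:

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

v = np.vstack((arr1, arr2))        # 1-D inputs become the rows of a 2-D array
h = np.hstack((arr1, arr2))        # 1-D inputs are joined end to end
c = np.concatenate((arr1, arr2))   # same as hstack for 1-D arrays

print(v)                     # [[1 2 3]
                             #  [4 5 6]]
print(v.shape)               # (2, 3)
print(h)                     # [1 2 3 4 5 6]
print(h.shape)               # (6,)
print(np.array_equal(h, c))  # True
```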
Pandas: Data Analysis Library
Pandas is a Python library used for data manipulation and analysis. It provides two
main data structures:
• Series: One-dimensional labeled array.
• DataFrame: Two-dimensional labeled data structure.
1. Introduction to Pandas
• It allows importing, cleaning, transforming, and analyzing data.
• Pandas is especially useful for working with CSV or Excel files.
Installation:
pip install pandas
Importing Pandas:
import pandas as pd
2. Series and DataFrame
• Series: One-dimensional array with labels.
• DataFrame: Two-dimensional array (table) with rows and columns.
Example:
# Creating a Series
s = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
print(s)
# Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [24, 27, 22]}
df = pd.DataFrame(data)
print(df)
Output:
a 1
b 2
c 3
d 4
dtype: int64
Name Age
0 Alice 24
1 Bob 27
2 Charlie 22
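Once created, a Series can be read by label or by position, and a DataFrame exposes its size and column names as attributes. A short sketch reusing the same data:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
print(s['b'])              # 2  access by label
print(s.iloc[0])           # 1  access by position

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [24, 27, 22]})
print(df.shape)            # (3, 2): 3 rows, 2 columns
print(list(df.columns))    # ['Name', 'Age']
```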
3. DataFrame Operations
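You can inspect a DataFrame with methods such as head() (first n rows, 5 by default), tail() (last n rows), and describe() (count, mean, standard deviation, minimum, quartiles, and maximum of the numeric columns).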
Example:
print(df.head()) # First few rows
print(df.describe()) # Summary statistics
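head() and tail() accept the number of rows to show, and describe() returns a DataFrame, so individual statistics can be picked out with loc[]. A sketch using the Name/Age data from above:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [24, 27, 22]})

print(df.head(2))      # first 2 rows
print(df.tail(1))      # last row

stats = df.describe()  # by default only numeric columns (here: Age) are summarized
print(stats.loc[['count', 'mean', 'min', 'max'], 'Age'])
```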
4. Data Slicing and Indexing
You can select columns by name, rows by label with loc[] or by integer position with iloc[], and filter rows with a Boolean condition.
Example:
print(df['Name']) # Select a column
print(df.iloc[1]) # Select the second row
print(df[df['Age'] > 23]) # Filter rows
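A few more selection patterns: a list of column names, a single cell via loc[row, column], a positional row range, and combined conditions. A sketch with the same data; note that each condition needs its own parentheses when combined with & or |:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [24, 27, 22]})

print(df[['Name', 'Age']])        # several columns: pass a list
print(df.loc[0, 'Name'])          # Alice  (row label 0, column 'Name')
print(df.iloc[0:2])               # rows at positions 0 and 1 (end excluded)

# & means "and", | means "or"
in_range = df[(df['Age'] > 20) & (df['Age'] < 25)]
print(in_range['Name'].tolist())  # ['Alice', 'Charlie']
```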
5. DataFrame Functions
Some useful functions:
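• sort_values(): Sorts the rows by the values in one or more columns.
• drop(): Removes rows or columns; it returns a new DataFrame unless inplace=True is passed.
• fillna(): Replaces missing values (NaN) with a given value.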
Example:
df['Age'] = df['Age'].fillna(0) # Replace NaN with 0
sorted_df = df.sort_values(by='Age')
print(sorted_df)
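The example above has no missing values and does not use drop(). A sketch with a hypothetical NaN in the Age column, showing fillna(), drop() for a column and for a row, and a descending sort:

```python
import numpy as np
import pandas as pd

# Hypothetical data: Bob's age is missing (NaN)
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [24, np.nan, 22],
                   'City': ['NY', 'LA', 'NY']})

df['Age'] = df['Age'].fillna(0)           # Bob's NaN becomes 0.0
no_city = df.drop(columns=['City'])       # new DataFrame without 'City'
no_first = df.drop(index=0)               # new DataFrame without row label 0
by_age = df.sort_values(by='Age', ascending=False)

print(no_city.columns.tolist())   # ['Name', 'Age']
print(no_first.index.tolist())    # [1, 2]
print(by_age['Name'].tolist())    # ['Alice', 'Charlie', 'Bob']
```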
6. Reading Files (CSV, Excel)
Pandas can read data from various formats, including CSV and Excel.
Reading a CSV File:
df = pd.read_csv('data.csv')
print(df.head())
Reading an Excel File (for .xlsx files an Excel engine such as openpyxl must be installed: pip install openpyxl):
df = pd.read_excel('data.xlsx')
print(df.head())
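read_csv() accepts any file-like object, not only a path. The sketch below uses io.StringIO with hypothetical CSV text so it runs without a data.csv file, and shows the usecols parameter, which loads only the listed columns:

```python
import io
import pandas as pd

# Hypothetical CSV content; a path such as 'data.csv' works the same way
csv_text = """Name,Age,City
Alice,24,NY
Bob,27,LA
Charlie,22,NY
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
print(df.dtypes)              # Age is parsed as an integer column

# usecols keeps only the listed columns
ages = pd.read_csv(io.StringIO(csv_text), usecols=['Name', 'Age'])
print(ages.columns.tolist())  # ['Name', 'Age']
```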
Summary
• NumPy: Used for numerical computations with fast operations on arrays.
• Pandas: Used for data manipulation and analysis, especially useful for tabular
data in CSV/Excel formats.
• Key operations include indexing, slicing, reshaping, and reading files.
2 Marks Questions (Simple Conceptual or Definition-based Questions)
1. What is NumPy?
Answer:
NumPy (Numerical Python) is a Python library used for scientific computing. It
provides support for multi-dimensional arrays and mathematical operations on
these arrays, such as linear algebra, statistical operations, and element-wise
operations. It is faster than traditional Python lists due to its optimized C-based
implementation.
2. What is a Pandas DataFrame?
Answer:
A Pandas DataFrame is a 2-dimensional, tabular data structure with labeled axes
(rows and columns). It is similar to a spreadsheet or SQL table and is useful for
working with structured data.
3. Explain 'ndarray' in NumPy.
Answer:
ndarray (N-dimensional array) is the core data structure in NumPy. It can hold
multiple elements of the same data type across various dimensions (1D, 2D, or
more). Operations on these arrays are performed element-wise and efficiently.
4. How do you read a CSV file using Pandas?
Answer:
You can read a CSV file using the read_csv() function from the Pandas library:
import pandas as pd
data = pd.read_csv('filename.csv')
5. What is the difference between a view and a copy in NumPy?
Answer:
• View: A view shares the original array's data; a change made through either the original or the view is visible in the other.
• Copy: A copy creates a new array independent of the original; changes in
one do not affect the other.
5 Marks Questions (Explanation and Short Code Questions)
1. Explain the difference between NumPy arrays and Python lists with an
example.
Answer:
• Python Lists: Can hold elements of different data types, but they are slower
and occupy more memory.
• NumPy Arrays: Store elements of a single data type in one contiguous block of memory. Operations are faster because they run as compiled C loops instead of Python-level loops.
Example:
import numpy as np
# NumPy array
arr = np.array([1, 2, 3])
# Python list
lst = [1, 2, 3]
Arithmetic on a NumPy array is applied element-wise, which is both shorter and faster than looping over a list (with a list, lst + 2 raises a TypeError):
arr + 2 # Output: [3 4 5]
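The same operator can mean different things for a list and an array. A short sketch contrasting the two:

```python
import numpy as np

lst = [1, 2, 3]
arr = np.array([1, 2, 3])

print(lst * 2)               # [1, 2, 3, 1, 2, 3]  list: repetition
print(arr * 2)               # [2 4 6]             array: element-wise

# Adding a number to a list fails; a comprehension (a Python loop) is needed
try:
    lst + 2
except TypeError:
    print("lst + 2 raises TypeError")
print([x + 2 for x in lst])  # [3, 4, 5]
print(arr + 2)               # [3 4 5]

# A list may mix types; an array converts all elements to one dtype
print(np.array([1, 'a']).dtype.kind)   # U  (Unicode string)
```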
2. What are Pandas Series? How do you create one?
Answer:
A Pandas Series is a one-dimensional labeled array capable of holding any data
type. It supports both list-like access by position and dictionary-like access by index label.
Code Example:
import pandas as pd
# Creating a Series from a list
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s)
Output:
a 10
b 20
c 30
dtype: int64
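A Series can also be created from a dictionary, in which case the keys become the index labels. A sketch with hypothetical subject marks:

```python
import pandas as pd

# Hypothetical data: dictionary keys become the index labels
marks = pd.Series({'math': 90, 'physics': 75, 'chemistry': 82})

print(marks['physics'])    # 75  dictionary-like access by label
print(marks.iloc[-1])      # 82  list-like access by position
print(marks[marks > 80])   # only the subjects with marks above 80
```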
3. Write a code to slice a NumPy array.
Answer:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Slice to get the first two rows and first two columns
sliced_arr = arr[:2, :2]
print(sliced_arr)
Output:
[[1 2]
[4 5]]
4. How can you change the shape of a NumPy array?
Answer:
You can change the shape using the reshape() method.
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
reshaped_arr = arr.reshape(2, 3)
print(reshaped_arr)
Output:
[[1 2 3]
[4 5 6]]
5. Explain the use of DataFrame indexing with an example.
Answer: Select a column with df['column'] and a row by its index label with df.loc[label].
import pandas as pd
data = {'Name': ['Alice', 'Bob'], 'Age': [24, 27]}
df = pd.DataFrame(data)
# Indexing a single column
print(df['Name'])
# Indexing a specific row
print(df.loc[1])
Output:
0 Alice
1 Bob
Name: Name, dtype: object
Name Bob
Age 27
Name: 1, dtype: object
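With the default index 0, 1, ... labels and positions coincide, which hides the difference between loc[] and iloc[]. A sketch with hypothetical string row labels makes it visible:

```python
import pandas as pd

# Hypothetical row labels 'r1', 'r2' instead of the default 0, 1
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [24, 27]},
                  index=['r1', 'r2'])

print(df.loc['r2', 'Age'])   # 27  label-based: row 'r2', column 'Age'
print(df.iloc[1, 1])         # 27  position-based: 2nd row, 2nd column
print(df.loc['r1'])          # the whole row 'r1' as a Series
```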
10 Marks Questions (Detailed Questions with Code Examples)
1. Explain arithmetic operations on NumPy arrays with examples.
Answer:
NumPy performs arithmetic operations such as addition, subtraction, multiplication, and division element-wise on arrays of the same shape; a scalar operand is applied to every element.
Example:
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
# Addition
print(arr1 + arr2)
# Multiplication
print(arr1 * arr2)
# Scalar addition
print(arr1 + 10)
Output:
[5 7 9]
[ 4 10 18]
[11 12 13]
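The answer mentions subtraction and division but does not show them. A sketch covering those, plus broadcasting, where a 1-D array is applied to every row of a 2-D array with matching column count:

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

print(arr2 - arr1)    # [3 3 3]
print(arr2 / arr1)    # [4.  2.5 2. ]  true division always gives floats
print(arr2 // arr1)   # [4 2 2]        floor division
print(arr2 % arr1)    # [0 1 0]        remainder

# Broadcasting: shape (2, 3) combined with shape (3,)
matrix = np.array([[10, 20, 30],
                   [40, 50, 60]])
print(matrix + arr1)  # [[11 22 33]
                      #  [41 52 63]]
```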
2. Write a program to split a NumPy array into sub-arrays.
Answer:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
# Splitting into 3 sub-arrays
sub_arrays = np.split(arr, 3)
print(sub_arrays)
Output:
[array([1, 2]), array([3, 4]), array([5, 6])]
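np.split() can also take a list of split positions instead of a number of parts, and an axis argument for 2-D arrays. A minimal sketch:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# Split points 2 and 5 give arr[:2], arr[2:5], arr[5:]
pieces = np.split(arr, [2, 5])
print(pieces)   # [array([1, 2]), array([3, 4, 5]), array([6])]

# A 3x4 array split into two 3x2 halves along the columns (axis=1)
grid = np.arange(12).reshape(3, 4)
left, right = np.split(grid, 2, axis=1)   # same result as np.hsplit(grid, 2)
print(left.shape, right.shape)            # (3, 2) (3, 2)
```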
3. How can you read an Excel file using Pandas? Write a code example.
Answer:
import pandas as pd
# Reading an Excel file
df = pd.read_excel('sample_data.xlsx')
print(df.head())
read_excel() loads the first sheet of the workbook into a DataFrame by default (pass sheet_name= to choose another sheet), and head() prints its first 5 rows.
15 Marks Questions (In-depth Questions Covering Concepts and Code)
1. Explain how you can stack NumPy arrays and manipulate their shapes.
Provide examples.
Answer:
Stacking means combining multiple arrays along a particular axis.
Code Example:
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
# Stacking along rows (axis=0)
stacked_rows = np.vstack((arr1, arr2))
print("Stacked Rows:\n", stacked_rows)
# Stacking along columns (axis=1)
stacked_columns = np.column_stack((arr1, arr2))
print("Stacked Columns:\n", stacked_columns)
Output:
Stacked Rows:
[[1 2 3]
[4 5 6]]
Stacked Columns:
[[1 4]
[2 5]
[3 6]]
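The question also asks about manipulating shapes. A sketch with two 2x2 arrays contrasting np.concatenate() (joins along an existing axis) with np.stack() (creates a new axis), then reshaping the result:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

rows = np.concatenate((a, b), axis=0)   # shape (4, 2), like vstack
cols = np.concatenate((a, b), axis=1)   # shape (2, 4), like hstack

# stack adds a NEW first axis: two (2, 2) arrays become (2, 2, 2)
layered = np.stack((a, b))

print(rows.shape, cols.shape, layered.shape)   # (4, 2) (2, 4) (2, 2, 2)

# Reshape the (2, 4) result into (4, 2); elements are read row by row
print(cols.reshape(4, 2))
```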
2. Write a Pandas program to perform the following operations: (1) Create a
DataFrame, (2) Filter rows based on a condition, (3) Perform a group-by
operation.
Answer:
import pandas as pd
# (1) Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [24, 27, 22, 32],
        'City': ['NY', 'LA', 'NY', 'LA']}
df = pd.DataFrame(data)
# (2) Filtering rows where Age > 25
filtered_df = df[df['Age'] > 25]
print("Filtered Data:\n", filtered_df)
# (3) Grouping by 'City' and calculating the mean age
grouped = df.groupby('City')['Age'].mean()
print("Mean Age by City:\n", grouped)
Output:
Filtered Data:
Name Age City
1 Bob 27 LA
3 David 32 LA
Mean Age by City:
City
LA 29.5
NY 23.0
Name: Age, dtype: float64
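groupby() can compute several statistics at once with agg(), and filtering can be combined with grouping. A sketch on the same data:

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [24, 27, 22, 32],
        'City': ['NY', 'LA', 'NY', 'LA']}
df = pd.DataFrame(data)

# Several statistics per city in one call
summary = df.groupby('City')['Age'].agg(['count', 'mean', 'max'])
print(summary)

# Filter first, then group: mean age of people older than 23, per city
older = df[df['Age'] > 23].groupby('City')['Age'].mean()
print(older)
```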