Lab 3 & 4

The document outlines two lab sessions focused on NumPy and Pandas for data manipulation and analysis. It includes tasks on creating and manipulating arrays, performing descriptive statistics, and handling data in DataFrames, including operations like grouping, sorting, and merging. Additionally, it provides student exercises to reinforce the concepts learned.

Lab 03: NumPy – Matrix Operations & Descriptive Statistics

Duration: 3 Hours
Required Library: NumPy
Environment: Jupyter Notebook / Google Colab

Part A: Introduction to NumPy

What is NumPy?

NumPy is a library for working with arrays. It provides powerful tools for numerical computing
and supports high-performance multidimensional array objects.
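As a quick illustration (a minimal sketch, not one of the lab tasks), every NumPy array exposes attributes such as shape, ndim, and dtype that describe the multidimensional array object:

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

print("Shape:", arr.shape)       # (2, 3)
print("Dimensions:", arr.ndim)   # 2
print("Data type:", arr.dtype)   # e.g. int64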

Task 1: Creating Arrays

Code:

import numpy as np

a = np.array([1, 2, 3])
b = np.array([[1, 2], [3, 4]])

print("1D Array:", a)
print("2D Array:\n", b)

Explanation:

• a is a 1-dimensional array.

• b is a 2-dimensional array (matrix).

Task 2: Basic Array Operations

Code:

x = np.array([[2, 4], [6, 8]])
y = np.array([[1, 1], [1, 1]])

print("Addition:\n", x + y)
print("Element-wise Multiplication:\n", x * y)
print("Matrix Multiplication:\n", np.dot(x, y))

Explanation:

• Element-wise operations (+, *) on arrays.

• Matrix multiplication using np.dot().
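As a side note (not part of the original task), the @ operator is an equivalent and often more readable way to write matrix multiplication for 2D arrays:

import numpy as np

x = np.array([[2, 4], [6, 8]])
y = np.array([[1, 1], [1, 1]])

# x @ y produces the same result as np.dot(x, y) for 2D arrays
print("Matrix Multiplication with @:\n", x @ y)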

Task 3: Descriptive Statistics

Code:

data = np.array([10, 20, 30, 40, 50])

print("Mean:", np.mean(data))
print("Median:", np.median(data))
print("Standard Deviation:", np.std(data))

Explanation:
• np.mean(): Calculates the mean of the data.

• np.median(): Finds the median of the data.

• np.std(): Computes the standard deviation.

Student Exercises

1. Create a 3x3 matrix and find its transpose using np.transpose().

2. Generate an array of 10 random integers between 1 and 100, then compute the mean.

3. Find the max, min, and sum of elements in a 2D array.

4. Compute row-wise and column-wise sums using axis=0 and axis=1.
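As a hint for exercise 4, the following sketch (using an assumed example matrix that is not part of the lab) shows how axis=0 sums down the columns while axis=1 sums across the rows:

import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

print("Column-wise sums (axis=0):", m.sum(axis=0))  # [5 7 9]
print("Row-wise sums (axis=1):", m.sum(axis=1))     # [ 6 15]
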
Task 4: Reshaping Arrays

Code:

# Creating a 1D array
arr = np.arange(12)

# Reshaping it into a 3x4 matrix
reshaped_arr = arr.reshape(3, 4)
print("Reshaped Array:\n", reshaped_arr)

Explanation:

• np.arange() generates a sequence of numbers.

• reshape() changes the shape of the array into the desired dimensions.

Task 5: Indexing & Slicing

Code:

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Extracting the second row
second_row = matrix[1, :]
print("Second Row:", second_row)

# Extracting the second column
second_column = matrix[:, 1]
print("Second Column:", second_column)

Explanation:

• Indexing allows us to extract specific rows and columns from a matrix.

• Slicing allows access to a subset of the array.
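To make the "subset" point concrete, a short additional sketch (reusing the same matrix; not part of the original task) extracts a 2x2 sub-matrix:

import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Rows 0-1 and columns 1-2 form a 2x2 sub-matrix
sub_matrix = matrix[0:2, 1:3]
print("Sub-matrix:\n", sub_matrix)  # contains [[2, 3], [5, 6]]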

Task 6: Broadcasting in NumPy

Code:

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

# Same-shape arrays are combined element-wise
result = a + b
print("Element-wise Result:", result)

# Broadcasting: the scalar 100 is stretched to match a's shape
print("Broadcasting Result:", a + 100)

Explanation:

• Arrays of the same shape are combined element-wise.

• Broadcasting allows arrays of different shapes (here an array and a scalar) to be used together in arithmetic operations by stretching the smaller operand to match.

Task 7: Random Numbers in NumPy

Code:

# Generate 5 random numbers between 0 and 1
random_numbers = np.random.rand(5)
print("Random Numbers:", random_numbers)

# Generate a 3x3 matrix with random integers between 1 and 100
random_integers = np.random.randint(1, 101, (3, 3))
print("Random Integers:\n", random_integers)

Explanation:
• np.random.rand() generates random floating-point numbers.

• np.random.randint() generates random integers in a specified range.

Lab 04: Pandas – Data Handling, Grouping, Aggregation

Duration: 3 Hours
Required Library: Pandas
Dataset Used: Manual or students.csv

Part A: Introduction to Pandas

What is Pandas?

Pandas is a library used for working with structured data. It offers two main structures:
• Series – 1D labeled array.

• DataFrame – 2D labeled tabular data.
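Since the tasks below work almost entirely with DataFrames, a brief sketch (not one of the original tasks) shows what a Series with a custom index looks like:

import pandas as pd

# A Series is a 1D labeled array: values plus an index
marks = pd.Series([85, 90, 78], index=["Ali", "Sara", "Ahmed"])

print(marks)
print("Sara's marks:", marks["Sara"])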

Task 1: Creating DataFrames

Code:

import pandas as pd

data = {
    "Name": ["Ali", "Sara", "Ahmed", "Fatima"],
    "Marks": [85, 90, 78, None],
    "Subject": ["Math", "Science", "Math", "Science"]
}

df = pd.DataFrame(data)

print("DataFrame:\n", df)
print("\nSummary:\n", df.describe())

Explanation:

• A DataFrame is created using a dictionary of lists, where the keys are column names and
the values are column data.

• df.describe() gives summary statistics of numerical columns.

Task 2: Handling Missing Values

Code:

print("Original Data:\n", df)

# Fill missing values with the column mean
df["Marks"] = df["Marks"].fillna(df["Marks"].mean())

print("\nAfter Filling Missing Marks:\n", df)

Explanation:

• fillna() is used to replace missing values (NaN) with the mean of the column.

Task 3: Grouping and Aggregation

Code:

grouped = df.groupby("Subject")["Marks"].mean()

print("Average Marks by Subject:\n", grouped)

Explanation:

• groupby() groups the DataFrame by a specified column.

• Aggregation functions like mean() can be used to summarize data within each group.
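Beyond a single mean(), several aggregation functions can be applied at once with .agg(); the sketch below is an optional extension (not part of the original task) and assumes the same df as above:

# Apply several aggregation functions to each subject group
summary = df.groupby("Subject")["Marks"].agg(["mean", "min", "max", "count"])
print("Marks summary by Subject:\n", summary)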

Task 4: Reading a CSV File


Code:

df_csv = pd.read_csv("students.csv")

print(df_csv.head())

Explanation:
• pd.read_csv() loads data from a CSV file into a DataFrame.

• head() shows the first 5 rows of the DataFrame.

Task 5: Sorting DataFrames

Code:

# Sort the DataFrame by 'Marks'
sorted_df = df.sort_values(by="Marks", ascending=False)
print("Sorted DataFrame by Marks:\n", sorted_df)

Explanation:

• sort_values() is used to sort the DataFrame based on a column (here, "Marks").

• ascending=False ensures sorting in descending order.

Task 6: Filtering DataFrames

Code:

# Filter students who have marks greater than 80
filtered_df = df[df["Marks"] > 80]
print("Filtered DataFrame (Marks > 80):\n", filtered_df)

Explanation:

• Conditional filtering allows you to select rows based on certain criteria (e.g., marks
greater than 80).

Task 7: Adding a New Column


Code:

# Add a new column 'Pass/Fail' based on the Marks
df["Pass/Fail"] = df["Marks"].apply(lambda x: "Pass" if x >= 50 else "Fail")
print("DataFrame with 'Pass/Fail' column:\n", df)

Explanation:

• .apply() applies a function to each element of the column.

• Here, the lambda function returns "Pass" when marks are 50 or above, and "Fail" otherwise.

Task 8: Merging DataFrames

Code:

# Create another DataFrame
df2 = pd.DataFrame({
    "Name": ["Ali", "Sara", "Ahmed", "Fatima"],
    "Age": [18, 19, 20, 21]
})

# Merge df and df2 based on 'Name' column
merged_df = pd.merge(df, df2, on="Name")
print("Merged DataFrame:\n", merged_df)

Explanation:

• pd.merge() combines two DataFrames based on a common column (e.g., "Name").

Task 9: Plotting Data

Code:

import matplotlib.pyplot as plt

# Plot Marks distribution
df['Marks'].plot(kind='bar')
plt.title("Marks Distribution")
plt.xlabel("Students")
plt.ylabel("Marks")
plt.show()

Explanation:

• plot(kind='bar') creates a bar plot of marks.

• The plot shows the distribution of marks for each student.

Task 10: Handling Categorical Data

Code:

# Convert 'Subject' column to categorical type
df['Subject'] = df['Subject'].astype('category')
print("Converted 'Subject' column to categorical data:\n", df.dtypes)

Explanation:

• astype('category') converts the 'Subject' column to a categorical type, which is more memory-efficient for repetitive text data.
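To see that saving in practice, the small sketch below (an optional check, not part of the original task) compares the column's memory footprint before and after conversion; the exact byte counts depend on the data:

# Compare memory usage of the 'Subject' column as object vs. category
as_object = df["Subject"].astype("object").memory_usage(deep=True)
as_category = df["Subject"].astype("category").memory_usage(deep=True)

print("As object:  ", as_object, "bytes")
print("As category:", as_category, "bytes")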

Student Exercises

1. Manually create a DataFrame with 5 students, columns: Name, Age, Marks, Subject.

2. Use dropna() to remove rows with missing data.

3. Use df[df["Marks"] > 80] to filter students with marks above 80.

4. Group students by subject and count how many students each subject has.

5. Export the final DataFrame to CSV using df.to_csv("final_data.csv").

6. Create a DataFrame from a dictionary of your choice, then explore basic statistics using df.describe().

7. Calculate the percentage of missing values in each column using df.isnull().sum() / len(df) * 100.

8. Filter rows where the "Marks" column has values less than 50 and save the result to a new DataFrame.

9. Use groupby() to find the average marks per subject and then sort the subjects by average marks in ascending order.

10. Plot a pie chart of the distribution of marks in the "Pass/Fail" column (see the hint sketch below).
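As a hint for exercise 10, one possible approach (a minimal sketch, assuming df already contains the 'Pass/Fail' column created in Task 7) counts the categories with value_counts() and plots them as a pie chart:

import matplotlib.pyplot as plt

# Count how many students passed or failed, then plot the counts as a pie chart
pass_fail_counts = df["Pass/Fail"].value_counts()
pass_fail_counts.plot(kind="pie", autopct="%1.1f%%")
plt.title("Pass/Fail Distribution")
plt.ylabel("")  # hide the default axis label
plt.show()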
