Lab 3 & 4
Duration: 3 Hours
Required Library: NumPy
Environment: Jupyter Notebook / Google Colab
What is NumPy?
NumPy is a Python library for working with arrays. It provides fast tools for numerical computing, built around a high-performance multidimensional array object.
Code:
import numpy as np
# A 1-dimensional array
a = np.array([1, 2, 3])
# A 2-dimensional array (2 rows, 2 columns)
b = np.array([[1, 2], [3, 4]])
print("1D Array:", a)
print("2D Array:\n", b)
Explanation:
• a is a 1-dimensional array.
• b is a 2-dimensional array (a matrix with rows and columns).
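As a small optional sketch (not part of the lab's required steps), the shape, number of dimensions, and element type of these arrays can be inspected like this:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([[1, 2], [3, 4]])
# Every ndarray carries its shape, dimensionality, and element type
print("Shape of a:", a.shape)        # (3,)
print("Shape of b:", b.shape)        # (2, 2)
print("Dimensions:", a.ndim, b.ndim)
print("Element type of a:", a.dtype)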
Code:
x = np.array([[2, 4], [6, 8]])
# A second array of the same shape (values assumed for the example)
y = np.array([[1, 3], [5, 7]])
print("Addition:\n", x + y)
print("Element-wise Multiplication:\n", x * y)
Explanation:
• x + y adds the two arrays element by element.
• x * y multiplies corresponding elements; this is element-wise multiplication, not matrix multiplication.
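To make the distinction concrete, here is a brief optional sketch contrasting element-wise multiplication with matrix multiplication, using the same assumed values for x and y as above:
import numpy as np
x = np.array([[2, 4], [6, 8]])
y = np.array([[1, 3], [5, 7]])
# Element-wise product: multiplies matching positions
print("x * y:\n", x * y)
# Matrix product: rows of x combined with columns of y
print("x @ y:\n", x @ y)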
Code:
print("Mean:", np.mean(data))
print("Median:", np.median(data))
Explanation:
• np.mean(): calculates the mean (average) of the data.
• np.median(): returns the middle value of the sorted data.
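NumPy provides many similar aggregation functions; a short optional sketch, reusing the assumed data array from above:
import numpy as np
data = np.array([10, 20, 30, 40, 50])
print("Standard deviation:", np.std(data))
print("Minimum:", np.min(data))
print("Maximum:", np.max(data))
print("Sum:", np.sum(data))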
Student Exercises
Code:
# Create a 1D array with values 0 to 11
arr = np.arange(12)
# Reshape it into 3 rows and 4 columns
reshaped_arr = arr.reshape(3, 4)
print("Reshaped Array:\n", reshaped_arr)
Explanation:
• reshape() changes the shape of the array into the desired dimensions; the total number of elements must stay the same.
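As an optional sketch of how reshape() behaves: the new shape must use all of the elements, and -1 asks NumPy to work one dimension out automatically.
import numpy as np
arr = np.arange(12)
print(arr.reshape(2, 6))           # 2 rows x 6 columns (still 12 elements)
print(arr.reshape(4, -1))          # -1 is computed automatically: 4 rows x 3 columns
print(arr.reshape(3, 4).ravel())   # flatten back to 1D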
Code:
# A 3x3 matrix (values assumed for the example)
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
second_row = matrix[1, :]
second_column = matrix[:, 1]
print("Second Row:", second_row)
print("Second Column:", second_column)
Explanation:
• matrix[1, :] selects the second row and matrix[:, 1] selects the second column (indexing starts at 0).
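Indexing also supports sub-matrix slices and boolean conditions; a small optional sketch using the same assumed matrix:
import numpy as np
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# First two rows, last two columns
print(matrix[:2, 1:])
# Boolean indexing keeps only elements matching a condition
print(matrix[matrix > 5])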
Code:
a = np.array([1, 2, 3])
Explanation:
• np.array() converts a Python list into a NumPy array.
Code:
# Generate 5 random numbers between 0 and 1
random_numbers = np.random.rand(5)
print("Random Numbers:", random_numbers)
Explanation:
• np.random.rand(5) returns 5 random floating-point numbers drawn uniformly from the interval [0, 1).
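Related functions from np.random that often come up in lab work, sketched briefly (the output values will differ unless the seed is fixed):
import numpy as np
np.random.seed(42)                  # fixing the seed makes the results reproducible
print(np.random.rand(5))            # 5 floats in [0, 1)
print(np.random.randint(1, 7, 3))   # 3 integers from 1 to 6 (like dice rolls)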
Lab 4
Duration: 3 Hours
Required Library: Pandas
Dataset Used: Manual or students.csv
What is Pandas?
Pandas is a library used for working with structured data. It offers two main structures:
• Series – a 1D labeled array.
• DataFrame – a 2D labeled table with rows and columns.
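Since the lab's first code example builds a DataFrame, here is a minimal optional sketch of a Series as well (index labels and values assumed):
import pandas as pd
# A Series is a 1D array with labels (the index)
marks = pd.Series([85, 92, 78], index=["Ali", "Sara", "Omar"])
print(marks)
print("Sara's marks:", marks["Sara"])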
Code:
import pandas as pd
# Sample student data (names and values assumed for the example)
data = {
    'Name': ['Ali', 'Sara', 'Omar', 'Hina', 'Zain'],
    'Marks': [85, 92, None, 60, 45],
    'Subject': ['Math', 'Physics', 'Math', 'Physics', 'Math']
}
df = pd.DataFrame(data)
print("DataFrame:\n", df)
print("\nSummary:\n", df.describe())
Explanation:
• A DataFrame is created from a dictionary of lists, where the keys are column names and the values are column data.
• df.describe() prints summary statistics (count, mean, standard deviation, minimum, maximum, and quartiles) for the numeric columns.
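A few other common ways to get a first look at a DataFrame, shown as an optional sketch (the data is the assumed sample from above):
import pandas as pd
df = pd.DataFrame({
    'Name': ['Ali', 'Sara', 'Omar'],
    'Marks': [85, 92, 78],
    'Subject': ['Math', 'Physics', 'Math']
})
print(df.head())      # first rows of the table
df.info()             # column names, types, and non-null counts
print(df['Marks'])    # selecting one column returns a Series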
Code:
# Replace missing marks with the column mean
df['Marks'] = df['Marks'].fillna(df['Marks'].mean())
Explanation:
• fillna() is used to replace missing values (NaN) with the mean of the column.
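Before filling missing values it is usually worth counting them first (this also covers exercise 7 below); a minimal sketch with assumed data:
import pandas as pd
df = pd.DataFrame({'Name': ['Ali', 'Sara', 'Omar'],
                   'Marks': [85, None, 78]})
# Missing values per column, and as a percentage of all rows
print(df.isnull().sum())
print(df.isnull().sum() / len(df) * 100)
# Alternative strategy: drop rows that contain any NaN
print(df.dropna())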
Code:
grouped = df.groupby("Subject")["Marks"].mean()
Explanation:
• groupby("Subject") splits the rows into one group per subject value.
• Aggregation functions like mean() can then be used to summarize the data within each group.
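groupby() can also compute several summaries at once, or feed directly into sorting (as exercise 9 asks); an optional sketch with assumed data:
import pandas as pd
df = pd.DataFrame({'Subject': ['Math', 'Math', 'Physics', 'Physics'],
                   'Marks': [85, 60, 92, 45]})
# Several aggregations per subject
print(df.groupby('Subject')['Marks'].agg(['mean', 'max', 'count']))
# Average marks per subject, sorted in ascending order
print(df.groupby('Subject')['Marks'].mean().sort_values())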
Code:
# Load data from a CSV file (students.csv must exist in the working directory)
df_csv = pd.read_csv("students.csv")
print(df_csv.head())
Explanation:
• pd.read_csv() loads data from a CSV file into a DataFrame.
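If students.csv is not provided, one can be generated first and then read back; a sketch with assumed column names and values:
import pandas as pd
# Write a small CSV file to disk (values are made up for practice)
pd.DataFrame({
    'Name': ['Ali', 'Sara', 'Omar'],
    'Age': [20, 21, 19],
    'Marks': [85, 92, 78],
    'Subject': ['Math', 'Physics', 'Math']
}).to_csv('students.csv', index=False)
df_csv = pd.read_csv('students.csv')
print(df_csv.head())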
Code:
# Select students with marks above 80
high_scorers = df[df['Marks'] > 80]
print(high_scorers)
Explanation:
• Conditional filtering allows you to select rows based on certain criteria (e.g., marks greater than 80).
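Conditions can also be combined; a short optional sketch (each condition must be wrapped in parentheses when using & or |):
import pandas as pd
df = pd.DataFrame({'Name': ['Ali', 'Sara', 'Omar'],
                   'Marks': [85, 92, 45],
                   'Subject': ['Math', 'Physics', 'Math']})
# Marks above 80 AND subject equal to Math
print(df[(df['Marks'] > 80) & (df['Subject'] == 'Math')])
# Marks above 80 OR subject equal to Math
print(df[(df['Marks'] > 80) | (df['Subject'] == 'Math')])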
Code:
# Label each student Pass or Fail based on a threshold of 50 marks
df['Pass/Fail'] = df['Marks'].apply(lambda m: 'Pass' if m >= 50 else 'Fail')
print(df[['Name', 'Marks', 'Pass/Fail']])
Explanation:
• Here, the lambda function checks whether marks are above or below 50, and apply() runs it on every value of the 'Marks' column.
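Once the Pass/Fail column exists, value_counts() summarizes it, which is also the starting point for the pie chart in exercise 10; a minimal sketch with assumed data:
import pandas as pd
results = pd.Series(['Pass', 'Pass', 'Fail', 'Pass'], name='Pass/Fail')
# How many students fall into each category
print(results.value_counts())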
Code:
# Create another DataFrame (column names and values assumed for the example)
df2 = pd.DataFrame({
    'Name': ['Ali', 'Sara', 'Omar', 'Hina', 'Zain'],
    'Age': [20, 21, 19, 22, 20]
})
print("Second DataFrame:\n", df2)
Explanation:
• A second DataFrame, df2, is built from a dictionary in the same way as df; it can then be combined with df, for example with pd.concat() or pd.merge().
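The lab does not show what df2 is used for next; one common follow-up is combining two DataFrames, sketched here with assumed data:
import pandas as pd
df = pd.DataFrame({'Name': ['Ali', 'Sara'], 'Marks': [85, 92]})
df2 = pd.DataFrame({'Name': ['Ali', 'Sara'], 'Age': [20, 21]})
# Stack the rows of both tables
print(pd.concat([df, df2], ignore_index=True))
# Join the tables on a shared column
print(pd.merge(df, df2, on='Name'))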
Code:
import matplotlib.pyplot as plt
# Bar chart of each student's marks (chart type chosen for illustration)
plt.bar(df['Name'], df['Marks'])
plt.title("Marks Distribution")
plt.xlabel("Students")
plt.ylabel("Marks")
plt.show()
Explanation:
• matplotlib.pyplot draws charts from DataFrame columns; title(), xlabel(), and ylabel() label the chart, and show() displays it.
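For exercise 10, a pie chart of the Pass/Fail column can be drawn in a similar way; an optional sketch with assumed data:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'Pass/Fail': ['Pass', 'Pass', 'Fail', 'Pass', 'Fail']})
# Count each category and plot the counts as a pie chart
df['Pass/Fail'].value_counts().plot.pie(autopct='%1.0f%%')
plt.title("Pass/Fail Distribution")
plt.ylabel("")   # hide the automatic y-axis label
plt.show()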
Code:
# Convert 'Subject' column to categorical type
df['Subject'] = df['Subject'].astype('category')
Explanation:
• astype('category') stores the Subject column as a categorical type, which uses less memory and speeds up operations when a column contains a small set of repeated values.
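A quick optional sketch of why the categorical type helps: the same column stored as plain strings versus as a category (values assumed):
import pandas as pd
subjects = pd.Series(['Math', 'Physics', 'Math', 'Math', 'Physics'])
print("As object dtype:", subjects.memory_usage(deep=True), "bytes")
print("As category dtype:", subjects.astype('category').memory_usage(deep=True), "bytes")
print("Categories:", subjects.astype('category').cat.categories.tolist())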
Student Exercises
1. Manually create a DataFrame with 5 students, columns: Name, Age, Marks, Subject.
3. Use df[df["Marks"] > 80] to filter students with marks above 80.
4. Group students by subject and count how many students each subject has.
6. Create a DataFrame from a dictionary of your choice, then explore basic statistics using
df.describe().
7. Calculate the percentage of missing values in each column using df.isnull().sum() / len(df) *
100.
8. Filter rows where the "Marks" column has values less than 50 and save the result to a
new DataFrame.
9. Use groupby() to find the average marks per subject and then sort the subjects by average
marks in ascending order.
10. Plot a pie chart of the distribution of marks in the "Pass/Fail" column.