Lab 3 & 4

The document outlines two lab sessions focused on NumPy and Pandas for data manipulation and analysis. It includes tasks on creating and manipulating arrays, performing descriptive statistics, and handling data in DataFrames, including operations like grouping, sorting, and merging. Additionally, it provides student exercises to reinforce the concepts learned.

Lab 03: NumPy – Matrix Operations & Descriptive Statistics

Duration: 3 Hours
Required Library: NumPy
Environment: Jupyter Notebook / Google Colab

Part A: Introduction to NumPy

What is NumPy?

NumPy is a library for working with arrays. It provides powerful tools for numerical computing
and supports high-performance multidimensional array objects.
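As a quick illustration (a minimal sketch, not one of the lab tasks), every NumPy array exposes attributes such as shape, ndim, and dtype that describe the multidimensional array object:

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

print("Shape:", arr.shape)       # (2, 3)
print("Dimensions:", arr.ndim)   # 2
print("Data type:", arr.dtype)   # e.g. int64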

Task 1: Creating Arrays

Code:

import numpy as np

a = np.array([1, 2, 3])
b = np.array([[1, 2], [3, 4]])

print("1D Array:", a)
print("2D Array:\n", b)

Explanation:

• a is a 1-dimensional array.

• b is a 2-dimensional array (matrix).

Task 2: Basic Array Operations

Code:

x = np.array([[2, 4], [6, 8]])
y = np.array([[1, 1], [1, 1]])

print("Addition:\n", x + y)
print("Element-wise Multiplication:\n", x * y)
print("Matrix Multiplication:\n", np.dot(x, y))

Explanation:

• Element-wise operations (+, *) on arrays.

• Matrix multiplication using np.dot().
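As a side note (not part of the original task), the @ operator is an equivalent and often more readable way to write matrix multiplication for 2D arrays:

import numpy as np

x = np.array([[2, 4], [6, 8]])
y = np.array([[1, 1], [1, 1]])

# x @ y produces the same result as np.dot(x, y) for 2D arrays
print("Matrix Multiplication with @:\n", x @ y)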

Task 3: Descriptive Statistics

Code:

data = np.array([10, 20, 30, 40, 50])

print("Mean:", np.mean(data))
print("Median:", np.median(data))
print("Standard Deviation:", np.std(data))

Explanation:
• np.mean(): Calculates the mean of the data.

• np.median(): Finds the median of the data.

• np.std(): Computes the standard deviation.

Student Exercises

1. Create a 3x3 matrix and find its transpose using np.transpose().

2. Generate an array of 10 random integers between 1 and 100, then compute the mean.

3. Find the max, min, and sum of elements in a 2D array.

4. Compute row-wise and column-wise sums using axis=0 and axis=1.
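As a hint for exercise 4, the following sketch (using an assumed example matrix that is not part of the lab) shows how axis=0 sums down the columns while axis=1 sums across the rows:

import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

print("Column-wise sums (axis=0):", m.sum(axis=0))  # [5 7 9]
print("Row-wise sums (axis=1):", m.sum(axis=1))     # [ 6 15]
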
Task 4: Reshaping Arrays

Code:

# Creating a 1D array
arr = np.arange(12)

# Reshaping it into a 3x4 matrix
reshaped_arr = arr.reshape(3, 4)
print("Reshaped Array:\n", reshaped_arr)

Explanation:

• np.arange() generates a sequence of numbers.

• reshape() changes the shape of the array into the desired dimensions.

Task 5: Indexing & Slicing

Code:

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Extracting the second row
second_row = matrix[1, :]
print("Second Row:", second_row)

# Extracting the second column
second_column = matrix[:, 1]
print("Second Column:", second_column)

Explanation:

• Indexing allows us to extract specific rows and columns from a matrix.

• Slicing allows access to a subset of the array.
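To make the "subset" point concrete, a short additional sketch (reusing the same matrix; not part of the original task) extracts a 2x2 sub-matrix:

import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Rows 0-1 and columns 1-2 form a 2x2 sub-matrix
sub_matrix = matrix[0:2, 1:3]
print("Sub-matrix:\n", sub_matrix)  # contains [[2, 3], [5, 6]]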

Task 6: Broadcasting in NumPy

Code:

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

# Same-shape arrays are combined element-wise
result = a + b
print("Element-wise Result:", result)

# Broadcasting: the scalar 100 is stretched to match a's shape
print("Broadcasting Result:", a + 100)

Explanation:

• Arrays of the same shape are combined element-wise.

• Broadcasting allows arrays of different shapes (here an array and a scalar) to be used together in arithmetic operations by stretching the smaller operand to match.

Task 7: Random Numbers in NumPy

Code:

# Generate 5 random numbers between 0 and 1
random_numbers = np.random.rand(5)
print("Random Numbers:", random_numbers)

# Generate a 3x3 matrix with random integers between 1 and 100
random_integers = np.random.randint(1, 101, (3, 3))
print("Random Integers:\n", random_integers)

Explanation:
• np.random.rand() generates random floating-point numbers.

• np.random.randint() generates random integers in a specified range.

Lab 04: Pandas – Data Handling, Grouping, Aggregation

Duration: 3 Hours
Required Library: Pandas
Dataset Used: Manual or students.csv

Part A: Introduction to Pandas

What is Pandas?

Pandas is a library used for working with structured data. It offers two main structures:
• Series – 1D labeled array.

• DataFrame – 2D labeled tabular data.
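Since the tasks below work almost entirely with DataFrames, a brief sketch (not one of the original tasks) shows what a Series with a custom index looks like:

import pandas as pd

# A Series is a 1D labeled array: values plus an index
marks = pd.Series([85, 90, 78], index=["Ali", "Sara", "Ahmed"])

print(marks)
print("Sara's marks:", marks["Sara"])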

Task 1: Creating DataFrames

Code:

import pandas as pd

data = {
    "Name": ["Ali", "Sara", "Ahmed", "Fatima"],
    "Marks": [85, 90, 78, None],
    "Subject": ["Math", "Science", "Math", "Science"]
}

df = pd.DataFrame(data)

print("DataFrame:\n", df)
print("\nSummary:\n", df.describe())

Explanation:

• A DataFrame is created using a dictionary of lists, where the keys are column names and
the values are column data.

• df.describe() gives summary statistics of numerical columns.

Task 2: Handling Missing Values

Code:

print("Original Data:\n", df)

# Fill missing values with the column mean
df["Marks"] = df["Marks"].fillna(df["Marks"].mean())

print("\nAfter Filling Missing Marks:\n", df)

Explanation:

• fillna() is used to replace missing values (NaN) with the mean of the column.

Task 3: Grouping and Aggregation

Code:

grouped = df.groupby("Subject")["Marks"].mean()

print("Average Marks by Subject:\n", grouped)

Explanation:

• groupby() groups the DataFrame by a specified column.

• Aggregation functions like mean() can be used to summarize data within each group.
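Beyond a single mean(), several aggregation functions can be applied at once with .agg(); the sketch below is an optional extension (not part of the original task) and assumes the same df as above:

# Apply several aggregation functions to each subject group
summary = df.groupby("Subject")["Marks"].agg(["mean", "min", "max", "count"])
print("Marks summary by Subject:\n", summary)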

Task 4: Reading a CSV File


Code:

df_csv = pd.read_csv("students.csv")

print(df_csv.head())

Explanation:
• pd.read_csv() loads data from a CSV file into a DataFrame.

• head() shows the first 5 rows of the DataFrame.

Task 5: Sorting DataFrames

Code:

# Sort the DataFrame by 'Marks'
sorted_df = df.sort_values(by="Marks", ascending=False)
print("Sorted DataFrame by Marks:\n", sorted_df)

Explanation:

• sort_values() is used to sort the DataFrame based on a column (here, "Marks").

• ascending=False ensures sorting in descending order.

Task 6: Filtering DataFrames

Code:

# Filter students who have marks greater than 80
filtered_df = df[df["Marks"] > 80]
print("Filtered DataFrame (Marks > 80):\n", filtered_df)

Explanation:

• Conditional filtering allows you to select rows based on certain criteria (e.g., marks
greater than 80).

Task 7: Adding a New Column


Code:

# Add a new column 'Pass/Fail' based on the Marks
df["Pass/Fail"] = df["Marks"].apply(lambda x: "Pass" if x >= 50 else "Fail")
print("DataFrame with 'Pass/Fail' column:\n", df)

Explanation:

• .apply() applies a function to each element of the column.

• Here, the lambda function returns "Pass" when marks are 50 or above, and "Fail" otherwise.

Task 8: Merging DataFrames

Code:

# Create another DataFrame
df2 = pd.DataFrame({
    "Name": ["Ali", "Sara", "Ahmed", "Fatima"],
    "Age": [18, 19, 20, 21]
})

# Merge df and df2 based on 'Name' column
merged_df = pd.merge(df, df2, on="Name")
print("Merged DataFrame:\n", merged_df)

Explanation:

• pd.merge() combines two DataFrames based on a common column (e.g., "Name").

Task 9: Plotting Data

Code:

import matplotlib.pyplot as plt

# Plot Marks distribution
df['Marks'].plot(kind='bar')
plt.title("Marks Distribution")
plt.xlabel("Students")
plt.ylabel("Marks")
plt.show()

Explanation:

• plot(kind='bar') creates a bar plot of marks.

• The plot shows the distribution of marks for each student.

Task 10: Handling Categorical Data

Code:

# Convert 'Subject' column to categorical type
df['Subject'] = df['Subject'].astype('category')
print("Converted 'Subject' column to categorical data:\n", df.dtypes)

Explanation:

• astype('category') converts the 'Subject' column to a categorical type, which is more memory-efficient for repetitive text data.
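To see that saving in practice, the small sketch below (an optional check, not part of the original task) compares the column's memory footprint before and after conversion; the exact byte counts depend on the data:

# Compare memory usage of the 'Subject' column as object vs. category
as_object = df["Subject"].astype("object").memory_usage(deep=True)
as_category = df["Subject"].astype("category").memory_usage(deep=True)

print("As object:  ", as_object, "bytes")
print("As category:", as_category, "bytes")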

Student Exercises

1. Manually create a DataFrame with 5 students, columns: Name, Age, Marks, Subject.

2. Use dropna() to remove rows with missing data.

3. Use df[df["Marks"] > 80] to filter students with marks above 80.

4. Group students by subject and count how many students each subject has.

5. Export the final DataFrame to CSV using df.to_csv("final_data.csv").

6. Create a DataFrame from a dictionary of your choice, then explore basic statistics using df.describe().

7. Calculate the percentage of missing values in each column using df.isnull().sum() / len(df) * 100.

8. Filter rows where the "Marks" column has values less than 50 and save the result to a new DataFrame.

9. Use groupby() to find the average marks per subject and then sort the subjects by average marks in ascending order.

10. Plot a pie chart of the distribution of marks in the "Pass/Fail" column (see the hint sketch below).
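As a hint for exercise 10, one possible approach (a minimal sketch, assuming df already contains the 'Pass/Fail' column created in Task 7) counts the categories with value_counts() and plots them as a pie chart:

import matplotlib.pyplot as plt

# Count how many students passed or failed, then plot the counts as a pie chart
pass_fail_counts = df["Pass/Fail"].value_counts()
pass_fail_counts.plot(kind="pie", autopct="%1.1f%%")
plt.title("Pass/Fail Distribution")
plt.ylabel("")  # hide the default axis label
plt.show()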
