COMSATS University
Islamabad
Muhammad Noman | SP22-BSE-037
Introduction to Data Science
Dr. Javed Iqbal
Assignment 1
October 24, 2024
Part (I):
Create a data frame for student grades and calculate the average grade for each
student. Determine if the student passed (average ≥ 60) or failed.
1. Create a data frame with the following columns: StudentID, Name, Grade1, Grade2, Grade3.
2. Calculate the average grade for each student.
3. Create a new column Status that indicates "Pass" or "Fail" based on the average grade.
Solution:
# Create a data frame with student data
student_data <- data.frame(
  StudentID = c(1, 2, 3, 4, 5),
  Name = c("Ali", "Ahmed", "Ashraf", "Adeel", "Rubab"),
  Grade1 = c(85, 70, 92, 60, 88),
  Grade2 = c(90, 75, 85, 72, 95),
  Grade3 = c(82, 68, 90, 75, 92)
)
# Calculate the average grade for each student
student_data$Average <- rowMeans(
  student_data[, c("Grade1", "Grade2", "Grade3")]
)
# Determine if the student passed or failed
student_data$Status <- ifelse(student_data$Average >= 60, "Pass", "Fail")
print(student_data)
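Every student in the data above averages at least 60, so the "Fail" branch of ifelse() is never exercised. The following is a minimal sketch that runs the same two steps on hypothetical rows (the IDs, names, and grades below are invented purely for this check and are not part of the assignment data): one student below the threshold and one exactly on it.

```r
# Hypothetical rows, invented only to exercise both branches of the rule
check_data <- data.frame(
  StudentID = c(6, 7),
  Name = c("Student A", "Student B"),
  Grade1 = c(55, 60),
  Grade2 = c(48, 60),
  Grade3 = c(62, 60)
)

# Same steps as in the solution: row-wise mean, then the pass/fail rule
check_data$Average <- rowMeans(check_data[, c("Grade1", "Grade2", "Grade3")])
check_data$Status <- ifelse(check_data$Average >= 60, "Pass", "Fail")

print(check_data)
```

Student A averages 55 and is marked "Fail"; Student B averages exactly 60 and is marked "Pass", because the comparison uses >= rather than >.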
Part (II):
Create a data frame for sales data and identify the highest-selling product in each
category.
1. Create a data frame with columns: ProductID, ProductName, Category, Sales.
2. For each category, find the product with the highest sales.
3. Create a new data frame containing the category and the corresponding highest-selling product.
Solution:
# Create a data frame with sales data
sales_data <- data.frame(
  ProductID = c(101, 102, 103, 104, 105, 106),
  ProductName = c("Refrigerator", "T-Shirt", "Washing Machine", "Bed",
                  "Cargo Pants", "Chair"),
  Category = c("Electronics", "Clothing", "Electronics", "Furniture",
               "Clothing", "Furniture"),
  Sales = c(1500, 1200, 2000, 800, 1800, 1000)
)
# Find the highest-selling product in each category
highest_selling <- aggregate(Sales ~ Category, data = sales_data, FUN = max)
# Merge the highest-selling data with the original data to get product names
highest_selling_products <- merge(highest_selling, sales_data, by = "Category")
# Filter for the highest-selling product in each category
# (after the merge, Sales.x is the category maximum and Sales.y is the product's own sales)
highest_selling_products <- highest_selling_products[
  highest_selling_products$Sales.x == highest_selling_products$Sales.y,
]
# Remove unnecessary columns
highest_selling_products <- highest_selling_products[, c("Category", "ProductName")]
print(highest_selling_products)
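As a cross-check on the aggregate() and merge() approach above, the same result can be sketched in base R with split() and which.max(). This is an alternative shown for comparison, not the method used in the solution; the names top_rows and top_by_category are introduced here only for illustration. One difference to keep in mind: which.max() returns only the first maximum, so a tie within a category would yield one product, whereas the merge-and-filter version keeps every tied product.

```r
# Same sales data as in the solution, repeated so the sketch runs on its own
sales_data <- data.frame(
  ProductID = c(101, 102, 103, 104, 105, 106),
  ProductName = c("Refrigerator", "T-Shirt", "Washing Machine", "Bed",
                  "Cargo Pants", "Chair"),
  Category = c("Electronics", "Clothing", "Electronics", "Furniture",
               "Clothing", "Furniture"),
  Sales = c(1500, 1200, 2000, 800, 1800, 1000)
)

# Split the rows by category and keep the row with the largest Sales in each group
top_rows <- lapply(split(sales_data, sales_data$Category), function(group) {
  group[which.max(group$Sales), c("Category", "ProductName", "Sales")]
})

# Stack the per-category rows back into one data frame
top_by_category <- do.call(rbind, top_rows)
rownames(top_by_category) <- NULL

print(top_by_category)
```

With this data the top products are Cargo Pants (Clothing, 1800), Washing Machine (Electronics, 2000), and Chair (Furniture, 1000), which should match the products printed by the solution.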