Data Mining Journal 1 Kashan

Uploaded by

Kashan Riaz

Bahria University,

Karachi Campus

COURSE:
Data Mining
TERM: SPRING 2024, CLASS: BSE- 6(A)

Submitted By:
KASHAN RIAZ 02-131212-075
_______________________________________________
(Name) (Enroll. No.)

Submitted To:

Engr. Hamza/Engr. Misbah

Signed Remarks: Score:_____


INDEX
SNO   DATE      LAB NO   LAB OBJECTIVE                              SIGN
1     17-2-24   1        GUI in Python and data mining libraries

LAB EXPERIMENT NO. 1
LIST OF TASKS
TASK NO OBJECTIVE
1 Library Management System
2 You work for an e-commerce company and have been given a dataset with
information on customer orders over the past year. Load the data into
Pandas and analyze it using methods such as .info() and .describe(). Which
products have the highest and lowest sales? Which customer segments spend
the most?
3 You are a data analyst at a real estate company. You have been given a dataset of housing
sale prices in different regions over the past 5 years. Load the data into Pandas and
preprocess it by handling missing values and formatting columns.
4 You are a data analyst working for an automobile company. You have been provided with
the Vega dataset, which contains details on different vehicle models such as price, engine
size, horsepower, and dimensions.

Requirements:
• Load the Vega dataset into a Pandas DataFrame.
• Using plotting libraries such as Matplotlib, Seaborn, or Altair, create visualizations to
understand relationships between different vehicle features. Some examples:
  • Scatterplot of engine size vs. horsepower
  • Histogram of price distribution
  • Grouping by body style and analyzing statistics

Submitted On:
Date: _17-2-24___
Task No. 01: Library Management System
Solution and output:
import pandas as pd

# In-memory tables for the library's books and members.
books_df = pd.DataFrame(columns=['Title', 'Author', 'Genre', 'Publishing Year'])
members_df = pd.DataFrame(columns=['Name', 'Email', 'Contact Number', 'Membership Status'])

def add_book(title, author, genre, year):
    global books_df
    # DataFrame.append was removed in pandas 2.0; pd.concat is its replacement.
    new_row = pd.DataFrame([{'Title': title, 'Author': author, 'Genre': genre,
                             'Publishing Year': year}])
    books_df = pd.concat([books_df, new_row], ignore_index=True)

def edit_book(title, new_data):
    # Raises IndexError if no book has this title.
    book_index = books_df[books_df['Title'] == title].index[0]
    # Assign by column name so each value lands in the matching column.
    books_df.loc[book_index, list(new_data.keys())] = list(new_data.values())

def delete_book(title):
    global books_df
    books_df = books_df.drop(books_df[books_df['Title'] == title].index)

def add_member(name, email, contact_number, membership_status):
    global members_df
    new_row = pd.DataFrame([{'Name': name, 'Email': email, 'Contact Number': contact_number,
                             'Membership Status': membership_status}])
    members_df = pd.concat([members_df, new_row], ignore_index=True)

def edit_member(name, new_data):
    # Raises IndexError if no member has this name.
    member_index = members_df[members_df['Name'] == name].index[0]
    members_df.loc[member_index, list(new_data.keys())] = list(new_data.values())

def search_book(title):
    # Returns every row whose title matches exactly (possibly none).
    return books_df[books_df['Title'] == title]

def search_member(name):
    # Returns every row whose name matches exactly (possibly none).
    return members_df[members_df['Name'] == name]

print("Initial Data:")
print("Books Data Frame:")
print(books_df)
print("\nMembers Data Frame:")
print(members_df)

add_book("1984", "George Orwell", "Dystopian Fiction", 1949)


add_book("To Kill a Mockingbird", "Harper Lee", "Fiction", 1960)
print("\nAfter Adding Books:")
print(books_df)

edit_book("1984", {'Title': 'Nineteen Eighty-Four', 'Author': 'George Orwell', 'Genre': 'Dystopian Fiction',
'Publishing Year': 1949})
print("\nAfter Editing '1984' Book:")
print(books_df)

delete_book("To Kill a Mockingbird")


print("\nAfter Deleting 'To Kill a Mockingbird' Book:")
print(books_df)

add_member("John Doe", "[email protected]", "1234567890", "Active")


add_member("Jane Smith", "[email protected]", "0987654321", "Active")
print("\nAfter Adding Members:")
print(members_df)

edit_member("John Doe", {'Name': 'John Smith', 'Email': '[email protected]', 'Contact Number': '1112223333',
'Membership Status': 'Active'})
print("\nAfter Editing 'John Doe' Member:")
print(members_df)

searched_book = search_book("Nineteen Eighty-Four")


print("\nSearched Book:")
print(searched_book)

searched_member = search_member("John Smith")


print("\nSearched Member:")
print(searched_member)
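
The edit functions above look up the row with .index[0], which raises IndexError when nothing matches. A minimal sketch of a guarded update is given below. The helper name update_row and its return convention are assumptions made for illustration and are not part of the submitted solution.

```python
import pandas as pd

def update_row(df, key_column, key_value, new_data):
    """Update the first row where df[key_column] == key_value, in place.

    Returns True if a row was updated and False if no row matched.
    (Hypothetical helper, not part of the original lab solution.)
    """
    matches = df.index[df[key_column] == key_value]
    if len(matches) == 0:
        return False  # nothing to edit; avoids the IndexError from .index[0]
    # Write only the supplied columns, matched by name.
    df.loc[matches[0], list(new_data.keys())] = list(new_data.values())
    return True

books = pd.DataFrame(
    [{"Title": "1984", "Author": "George Orwell",
      "Genre": "Dystopian Fiction", "Publishing Year": 1949}]
)
found = update_row(books, "Title", "1984", {"Title": "Nineteen Eighty-Four"})
missing = update_row(books, "Title", "Dune", {"Genre": "Science Fiction"})
print(found, missing)
```

Returning a flag instead of raising lets the caller decide how to report a missing title. Passing only the changed fields also leaves the other columns untouched.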

Task No. 02: Customer Database For e-commerce company
Solution and output:
import pandas as pd

df = pd.read_csv("Train.csv")

print("Dataset Information:")

df.info()  # info() prints its own report and returns None, so it is not wrapped in print()

print("\nSummary Statistics:")

print(df.describe())

product_sales = df.groupby('ID')['Cost_of_the_Product'].sum().sort_values(ascending=False)

print("\nProducts with Highest Sales:")

print(product_sales.head(5))

print("\nProducts with Lowest Sales:")

print(product_sales.tail(5))

customer_segments = df.groupby('Customer_rating')['Cost_of_the_Product'].sum().sort_values(ascending=False)

print("\nCustomer Segments with Highest Spending:")

print(customer_segments.head(5))
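
The two questions in this task (highest and lowest selling products, highest spending segments) both reduce to a group-by followed by a sum. The sketch below shows that pattern on a tiny made-up table. The column names product, segment and amount and all values are assumptions for illustration and do not come from Train.csv.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from the lab dataset.
orders = pd.DataFrame({
    "product": ["pen", "pen", "book", "lamp", "book"],
    "segment": ["student", "office", "student", "home", "office"],
    "amount": [2.0, 3.0, 12.0, 20.0, 15.0],
})

# Total sales per product, then the labels of the largest and smallest totals.
sales_by_product = orders.groupby("product")["amount"].sum()
top_product = sales_by_product.idxmax()
bottom_product = sales_by_product.idxmin()

# Total spend per customer segment, largest first.
spend_by_segment = (
    orders.groupby("segment")["amount"].sum().sort_values(ascending=False)
)

print(top_product, bottom_product)
print(spend_by_segment)
```

Grouping on a column that is unique per row (such as an order ID) returns the individual rows again, so the grouping key should be the product or segment column of the real dataset.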

Task No. 03: Housing Database
Solution and output:
import pandas as pd

df = pd.read_csv("Housing.csv")

print("Dataset Information:")

df.info()  # info() prints its own report and returns None, so it is not wrapped in print()

print("\nMissing Values:")

print(df.isnull().sum())

df['price'] = df['price'].astype(float)

print("\nPreprocessed Dataset:")

print(df.head())
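
The task also asks for missing values to be handled and columns to be formatted. One possible approach is sketched below on a small made-up table. The column names, the values, and the choice of median imputation are assumptions for illustration; the right strategy depends on the actual contents of Housing.csv.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from Housing.csv.
houses = pd.DataFrame({
    "Region ": ["North", "South", None, "East"],
    "Sale Price": ["250,000", "310,000", "275,000", None],
    "Sale Date": ["2021-03-01", "2022-07-15", "2020-11-30", "2023-01-10"],
})

# Format column names: strip whitespace, lower-case, use underscores.
houses.columns = (
    houses.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)
)

# Turn "250,000"-style strings into numbers; unparseable values become NaN.
houses["sale_price"] = pd.to_numeric(
    houses["sale_price"].str.replace(",", "", regex=False), errors="coerce"
)
houses["sale_date"] = pd.to_datetime(houses["sale_date"])

# Fill missing prices with the median; drop rows that have no region.
houses["sale_price"] = houses["sale_price"].fillna(houses["sale_price"].median())
houses = houses.dropna(subset=["region"])

print(houses)
```

Filling with the median keeps the row while limiting the influence of outliers. Dropping is used for the region column because a missing category cannot be estimated from the other rows.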

Task No. 04: Automobile Database
Solution and output:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("Vega.csv")

print("Dataset Information:")
df.info()  # info() prints its own report and returns None, so it is not wrapped in print()

plt.figure(figsize=(10, 6))
sns.scatterplot(data=df, x='displacement', y='horsepower')
plt.title('Engine Displacement vs. Horsepower')
plt.xlabel('Engine Displacement')
plt.ylabel('Horsepower')
plt.grid(True)
plt.show()

plt.figure(figsize=(10, 6))
sns.histplot(data=df, x='mpg', bins=20, kde=True)
plt.title('Fuel Efficiency Distribution')
plt.xlabel('Miles per Gallon (MPG)')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()

grouped_origin = df.groupby('origin').agg({'mpg': 'mean', 'weight': 'mean',
                                           'acceleration': 'mean'})
print("\nGrouped by Origin Statistics:")
print(grouped_origin)

plt.figure(figsize=(10, 6))
sns.boxplot(data=df, x='origin', y='mpg')
plt.title('Fuel Efficiency Distribution by Origin')
plt.xlabel('Origin')
plt.ylabel('Miles per Gallon (MPG)')
plt.grid(True)
plt.show()
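
The task statement lists grouping by body style and the relationship between engine size and horsepower, while the solution above groups by origin. As a rough numeric companion to the plots, the sketch below computes a correlation coefficient and per-group price statistics on a small made-up table. The column names body_style, engine_size, horsepower and price and all values are assumptions for illustration and may not match the columns in Vega.csv.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from Vega.csv.
cars = pd.DataFrame({
    "body_style": ["sedan", "sedan", "hatchback", "hatchback", "wagon"],
    "engine_size": [2.0, 3.0, 1.2, 1.6, 2.4],
    "horsepower": [150, 220, 80, 110, 170],
    "price": [24000, 36000, 14000, 18000, 28000],
})

# Pearson correlation: a single number summarizing the scatterplot's trend.
corr = cars["engine_size"].corr(cars["horsepower"])

# Several price statistics per body style in one call.
by_style = cars.groupby("body_style")["price"].agg(["count", "mean", "min", "max"])

print(round(corr, 3))
print(by_style)
```

A correlation close to 1 indicates that horsepower rises almost linearly with engine size in this toy data; real data would normally show more scatter.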
