Data Mining Journal 1 Kashan
Karachi Campus
COURSE:
Data Mining
TERM: SPRING 2024, CLASS: BSE-6(A)
Submitted By:
Name: KASHAN RIAZ
Enroll. No.: 02-131212-075
Submitted To:
▪ Requirements:
• Load the Vega dataset into a pandas DataFrame.
• Using plotting libraries such as Matplotlib, Seaborn, and Altair, create visualizations to
understand relationships between different vehicle features. Some examples:
• Scatterplot of engine size vs. horsepower
• Histogram of price distribution
• Grouping by body style and analyzing statistics
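The last requirement, grouping by body style and analyzing statistics, can be sketched with pandas named aggregation. The column names (body_style, price, horsepower) and the five rows below are hypothetical placeholders; a real run would apply the same groupby to the loaded dataset's own columns.

```python
import pandas as pd

# Hypothetical sample rows; column names and values are assumptions for illustration.
cars = pd.DataFrame({
    "body_style": ["sedan", "sedan", "hatchback", "wagon", "hatchback"],
    "price": [13950, 17450, 6575, 18920, 7957],
    "horsepower": [102, 115, 69, 110, 68],
})

# One row per body style: how many cars, their average price, and the top horsepower.
stats = cars.groupby("body_style").agg(
    count=("price", "count"),
    mean_price=("price", "mean"),
    max_horsepower=("horsepower", "max"),
)
print(stats)
```

Named aggregation keeps each output column labelled, which is easier to read than a multi-level column index.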
Submitted On:
Date: 17-2-24
Task No. 01: Library Management System
Solution and output:
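import pandas as pd

# Empty tables for the library's books and members
books_df = pd.DataFrame(columns=['Title', 'Author', 'Genre', 'Publishing Year'])
members_df = pd.DataFrame(columns=['Name', 'Email', 'Contact Number', 'Membership Status'])

def delete_book(title):
    # Drop every row whose Title matches; 'global' is needed because books_df is rebound
    global books_df
    books_df = books_df.drop(books_df[books_df['Title'] == title].index)

def search_book(title):
    # Return the matching rows (an empty frame if there are none)
    return books_df[books_df['Title'] == title]

def search_member(name):
    return members_df[members_df['Name'] == name]

def edit_member(name, new_details):
    # Assumed implementation: the original calls edit_member but never defines it.
    # Overwrite the given columns, in place, for every member whose Name matches.
    mask = members_df['Name'] == name
    for column, value in new_details.items():
        members_df.loc[mask, column] = value

print("Initial Data:")
print("Books Data Frame:")
print(books_df)
print("\nMembers Data Frame:")
print(members_df)
edit_member("John Doe", {'Name': 'John Smith', 'Email': '[email protected]', 'Contact Number': '1112223333',
                         'Membership Status': 'Active'})
print("\nAfter Editing 'John Doe' Member:")
print(members_df)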
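The listing never adds any rows, so the two frames stay empty and the delete, search, and edit functions have nothing to act on. A minimal sketch of a hypothetical add_book helper (not part of the original journal), using made-up titles, shows a book being added and then removed with the same delete logic:

```python
import pandas as pd

books_df = pd.DataFrame(columns=["Title", "Author", "Genre", "Publishing Year"])

def add_book(title, author, genre, year):
    """Hypothetical helper: append one book as a new row."""
    global books_df
    new_row = pd.DataFrame([{"Title": title, "Author": author,
                             "Genre": genre, "Publishing Year": year}])
    # Start from the new row when the table is empty; otherwise append and renumber the index.
    books_df = new_row if books_df.empty else pd.concat([books_df, new_row], ignore_index=True)

def delete_book(title):
    global books_df
    books_df = books_df.drop(books_df[books_df["Title"] == title].index)

add_book("Book A", "Author A", "Fiction", 2001)
add_book("Book B", "Author B", "History", 1999)
delete_book("Book A")
print(books_df)
```

An add_member helper would follow the same pattern for members_df.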
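# Load the dataset and inspect its structure
df = pd.read_csv("Train.csv")
print("Dataset Information:")
df.info()  # info() prints its report itself and returns None, so it is not wrapped in print()
print("\nSummary Statistics:")
print(df.describe())
# Total product cost per ID, sorted high to low: five highest and five lowest
product_sales = df.groupby('ID')['Cost_of_the_Product'].sum().sort_values(ascending=False)
print(product_sales.head(5))
print(product_sales.tail(5))
# Total product cost per customer rating, highest total first
customer_segments = df.groupby('Customer_rating')['Cost_of_the_Product'].sum().sort_values(ascending=False)
print(customer_segments.head(5))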
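A summed total per rating mixes two effects: how many orders carry that rating and how much each order costs. A sketch that reports sum, mean, and count side by side, on six hypothetical rows that reuse only the two column names from Train.csv:

```python
import pandas as pd

# Hypothetical rows; only the column names come from the journal's dataset.
orders = pd.DataFrame({
    "Customer_rating": [5, 3, 5, 1, 3, 5],
    "Cost_of_the_Product": [200, 150, 250, 100, 180, 220],
})

# Total, average, and number of orders per rating, largest total first.
summary = (orders.groupby("Customer_rating")["Cost_of_the_Product"]
           .agg(["sum", "mean", "count"])
           .sort_values("sum", ascending=False))
print(summary)
```

If one rating leads on "sum" but not on "mean", its lead comes from order volume rather than from pricier products.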
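# Load the housing data, check for missing values, and make price numeric
df = pd.read_csv("Housing.csv")
print("Dataset Information:")
df.info()
print("\nMissing Values:")
print(df.isnull().sum())
df['price'] = df['price'].astype(float)  # raises ValueError if any entry is not numeric text
print("\nPreprocessed Dataset:")
print(df.head())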
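astype(float) fails on entries such as "12,250,000". A more tolerant sketch, on a hypothetical price column, strips separators, coerces anything unparseable to NaN, and then fills gaps with the median. Median imputation is only one common choice, not something the journal prescribes.

```python
import pandas as pd

# Hypothetical raw values: one has thousands separators, one is missing.
raw = pd.DataFrame({"price": ["13300000", "12,250,000", None, "9870000"]})

# Remove commas, then convert; values that still cannot be parsed become NaN instead of raising.
raw["price"] = pd.to_numeric(raw["price"].str.replace(",", "", regex=False), errors="coerce")

# Replace remaining NaNs with the median of the known prices.
raw["price"] = raw["price"].fillna(raw["price"].median())
print(raw)
```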
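import matplotlib.pyplot as plt
import seaborn as sns

# Load the Vega cars dataset and inspect its columns
df = pd.read_csv("Vega.csv")
print("Dataset Information:")
df.info()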
plt.figure(figsize=(10, 6))
sns.scatterplot(data=df, x='displacement', y='horsepower')
plt.title('Engine Displacement vs. Horsepower')
plt.xlabel('Engine Displacement')
plt.ylabel('Horsepower')
plt.grid(True)
plt.show()
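The scatterplot shows the displacement/horsepower relationship visually; a single Pearson correlation coefficient puts a number on it. A sketch on five hypothetical cars (values invented for illustration):

```python
import pandas as pd

# Hypothetical values standing in for the displacement and horsepower columns.
cars = pd.DataFrame({
    "displacement": [97, 120, 250, 318, 350],
    "horsepower": [46, 79, 100, 150, 165],
})

# Pearson r: values near +1 mean larger engines tend to produce more horsepower.
r = cars["displacement"].corr(cars["horsepower"])
print(round(r, 3))
```

Series.corr skips rows where either value is missing, which matters if some horsepower entries are blank in the real file.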
plt.figure(figsize=(10, 6))
sns.histplot(data=df, x='mpg', bins=20, kde=True)
plt.title('Fuel Efficiency Distribution')
plt.xlabel('Miles per Gallon (MPG)')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()
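Each histogram bar is a count of values inside one equal-width interval. numpy.histogram returns those counts and bin edges directly, which helps when reading the plot. Below is a sketch on ten hypothetical MPG readings with 4 bins instead of the 20 used above:

```python
import numpy as np

# Hypothetical MPG readings.
mpg = np.array([18, 15, 36, 24, 27, 31, 13, 22, 26, 44])

# Split the range [13, 44] into 4 equal-width bins and count readings per bin.
counts, edges = np.histogram(mpg, bins=4)
print(counts)  # readings per bin
print(edges)   # bin boundaries
```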
plt.figure(figsize=(10, 6))
sns.boxplot(data=df, x='origin', y='mpg')
plt.title('Fuel Efficiency Distribution by Origin')
plt.xlabel('Origin')
plt.ylabel('Miles per Gallon (MPG)')
plt.grid(True)
plt.show()
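In a box plot, the box spans the 25th to 75th percentile and the inner line marks the median. groupby(...).describe() reports those same statistics per group, so the plot can be read alongside a table. The origins and MPG values below are hypothetical:

```python
import pandas as pd

# Hypothetical rows standing in for the origin and mpg columns.
cars = pd.DataFrame({
    "origin": ["usa", "usa", "usa", "japan", "japan", "europe", "europe"],
    "mpg": [18, 15, 16, 31, 35, 26, 30],
})

# count, mean, std, min, 25%, 50% (median), 75%, max for each origin.
box_stats = cars.groupby("origin")["mpg"].describe()
print(box_stats)
```

The "50%" column gives each box's median line, and "25%" and "75%" give its lower and upper edges.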