
Lab Manual: Data Mining and Data Warehousing (Corrected)

Page 1: Course Information


Department: Computer Science and Engineering – AI
Course Code: PCC-CSM(202)

S.No  Topic Name          Date of Experiment  Signature  Remarks
01    Assignment 1        10/02/2025
02    Assignment 2        17/02/2025
03    Assignment 3        24/02/2025
04    Star Schema         10/03/2025
05    Snowflake Schema    17/03/2025
06    KNN Imputation      14/04/2025
07    Apriori Algorithm   21/04/2025
08    Data Visualization  21/04/2025
09    Decision Tree       28/04/2025

Page 2–3: Assignment 1 – Database Operations


Aim: Create a SQLite database with student, faculty, and books tables, insert sample records, and display
the data.

Corrected Code

import sqlite3

# Create database and tables
conn = sqlite3.connect("university.db")
cursor = conn.cursor()

# Student table
cursor.execute("""
CREATE TABLE IF NOT EXISTS student (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
age INTEGER NOT NULL,
dept TEXT NOT NULL,
phone TEXT NOT NULL
)""")

# Faculty table
cursor.execute("""
CREATE TABLE IF NOT EXISTS faculty (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
dept TEXT NOT NULL,
phone TEXT NOT NULL
)""")

# Books table
cursor.execute("""
CREATE TABLE IF NOT EXISTS books (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
writer TEXT NOT NULL,
isbn TEXT NOT NULL,
issue_no INTEGER NOT NULL
)""")

# Insert sample data
cursor.executemany(
"INSERT INTO student (name, age, dept, phone) VALUES (?,?,?,?)",
[("Alice", 20, "CSE", "9876543210"), ("Bob", 22, "ECE", "9123456780")]
)
cursor.execute(
"INSERT INTO faculty (name, dept, phone) VALUES (?,?,?)",
("Dr. Smith", "CSE", "0123456789")
)
cursor.execute(
"INSERT INTO books (name, writer, isbn, issue_no) VALUES (?,?,?,?)",
("Python Essentials", "A. Malik", "ISBN123", 101)
)

conn.commit()

# Display data
print("Students:")
cursor.execute("SELECT * FROM student")
for row in cursor.fetchall(): print(row)

print("\nFaculty:")
cursor.execute("SELECT * FROM faculty")
for row in cursor.fetchall(): print(row)

print("
Books:")
cursor.execute("SELECT * FROM books")
for row in cursor.fetchall(): print(row)

conn.close()
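
Note that every statement above binds values through "?" placeholders, so the sqlite3 driver escapes them and SQL injection via string concatenation is avoided. The same style works for reads; a minimal sketch of an extra query (an assumed addition, to be run before conn.close()):

# Parameterized SELECT: the tuple supplies the value for the "?" placeholder
cursor.execute("SELECT name, phone FROM student WHERE dept = ?", ("CSE",))
print(cursor.fetchall())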

Page 4–8: Assignment 2 – Data Warehouse Operations


Aim: Implement roll-up, drill-down, slice, dice, and pivot operations on a sales data warehouse using
SQLite.

Corrected Code

import sqlite3

conn = sqlite3.connect("sales_warehouse.db")
cursor = conn.cursor()

# Create sales table
cursor.execute("""
CREATE TABLE IF NOT EXISTS sales (
sales_id INTEGER PRIMARY KEY,
product_name TEXT NOT NULL,
product_price REAL NOT NULL,
year INTEGER NOT NULL
)""")

def insert_sales_data():
    data = [
        (1, "Laptop", 20000, 2023),
        (2, "Phone", 50000, 2024),
        (3, "Monitor", 8000, 2023)
    ]
    cursor.executemany("INSERT OR IGNORE INTO sales VALUES (?,?,?,?)", data)
    conn.commit()

def query_data():
    # Roll-up: total price aggregated per year
    cursor.execute("SELECT year, SUM(product_price) FROM sales GROUP BY year")
    print("Roll-up:\n", cursor.fetchall())

    # Drill-down: yearly totals broken down to product level
    cursor.execute("""SELECT year, product_name, SUM(product_price)
                      FROM sales GROUP BY year, product_name""")
    print("\nDrill-down:\n", cursor.fetchall())

    # Slice: fix one dimension (year = 2023)
    cursor.execute("SELECT * FROM sales WHERE year = 2023")
    print("\nSlice:\n", cursor.fetchall())

    # Dice: filter on two dimensions at once
    cursor.execute("SELECT * FROM sales WHERE year = 2023 AND product_price > 10000")
    print("\nDice:\n", cursor.fetchall())

    # Pivot: one column per product via conditional aggregation
    cursor.execute("""
    SELECT year,
           SUM(CASE WHEN product_name='Laptop' THEN product_price ELSE 0 END) AS Laptop,
           SUM(CASE WHEN product_name='Phone' THEN product_price ELSE 0 END) AS Phone
    FROM sales GROUP BY year
    """)
    print("\nPivot:\n", cursor.fetchall())

# Execute functions
insert_sales_data()
query_data()
conn.close()
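
The CASE WHEN pivot hard-codes one output column per product, so adding a product means editing the query. As an alternative sketch (assuming pandas is available; this is not part of the original assignment), the same pivot can be built dynamically:

import sqlite3
import pandas as pd

# Read the fact table and pivot it; new products become new columns automatically
conn = sqlite3.connect("sales_warehouse.db")
df = pd.read_sql_query("SELECT * FROM sales", conn)
print(df.pivot_table(index="year", columns="product_name",
                     values="product_price", aggfunc="sum", fill_value=0))
conn.close()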

Page 9–10: Assignment 3 – Star Schema Implementation

Aim: Design a star schema with product_dim, customer_dim, and sales_fact tables, load sample data,
and generate a consolidated sales report.

Corrected Code

import sqlite3

# Connect to database
conn = sqlite3.connect("star_schema.db")
cursor = conn.cursor()

# Create dimension tables
cursor.execute("""
CREATE TABLE IF NOT EXISTS product_dim (
product_id INTEGER PRIMARY KEY,
product_name TEXT NOT NULL
)""")
cursor.execute("""
CREATE TABLE IF NOT EXISTS customer_dim (
customer_id INTEGER PRIMARY KEY,
customer_name TEXT NOT NULL
)""")

# Create fact table
cursor.execute("""
CREATE TABLE IF NOT EXISTS sales_fact (
sale_id INTEGER PRIMARY KEY,
product_id INTEGER,
customer_id INTEGER,
price REAL,
FOREIGN KEY (product_id) REFERENCES product_dim(product_id),
FOREIGN KEY (customer_id) REFERENCES customer_dim(customer_id)
)""")

# Insert sample data
products = [(1, "Laptop"), (2, "Phone"), (3, "Monitor")]
customers = [(1, "Alice"), (2, "Bob"), (3, "Charlie")]
sales = [(101, 1, 1, 999.99), (102, 2, 2, 499.99), (103, 3, 3, 299.99)]

cursor.executemany("INSERT OR IGNORE INTO product_dim VALUES (?,?)", products)


cursor.executemany("INSERT OR IGNORE INTO customer_dim VALUES (?,?)", customers)
cursor.executemany("INSERT OR IGNORE INTO sales_fact VALUES (?,?,?,?)", sales)
conn.commit()

# Generate sales report via join
cursor.execute("""
SELECT s.sale_id, p.product_name, c.customer_name, s.price
FROM sales_fact s
JOIN product_dim p ON s.product_id = p.product_id
JOIN customer_dim c ON s.customer_id = c.customer_id
""")
print("Sales Report:")
for row in cursor.fetchall():
    print(row)

conn.close()
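
The same fact/dimension layout also supports aggregate reports. A minimal sketch, as an assumed extension of the assignment (it reopens the database closed above):

# Total revenue per product: join the fact table to its dimension and aggregate
conn = sqlite3.connect("star_schema.db")
cursor = conn.cursor()
cursor.execute("""
SELECT p.product_name, SUM(s.price) AS total_sales
FROM sales_fact s
JOIN product_dim p ON s.product_id = p.product_id
GROUP BY p.product_name
""")
print("Totals by product:", cursor.fetchall())
conn.close()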

Page 11: Assignment 4 – Snowflake Schema


Aim: Construct a snowflake schema by normalizing the product dimension into a category table and joining sales data with categories.

Corrected Code

import pandas as pd

# Dimension: Categories
product_category = pd.DataFrame({
'category_id': [1, 2],
'category_name': ['Electronics', 'Stationery']
})

# Dimension: Products
products = pd.DataFrame({
'product_id': [101, 102, 103],
'product_name': ['Laptop', 'Mouse', 'Printer'],
'category_id': [1, 1, 2]
})

# Fact: Sales
sales = pd.DataFrame({
'sale_id': [1001, 1002, 1003],
'product_id': [101, 102, 103],
'quantity': [2, 4, 3],
'amount': [2500, 1000, 1500]
})

# Merge to snowflake
merged = sales.merge(products, on='product_id')
final = merged.merge(product_category, on='category_id')

print(final[['sale_id', 'product_name', 'category_name', 'quantity', 'amount']])
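
Because the snowflake design splits the category dimension out of the product dimension, category-level totals need only a group-by on the merged frame. A one-line follow-up using final from above:

# Revenue per category across the snowflake join
print(final.groupby('category_name')['amount'].sum())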

Page 12: Assignment 6 (KNN Imputation and Data Integration)

Aim: Handle missing values with KNN imputation, then integrate two product datasets with a left merge.

Corrected Code

import pandas as pd
import numpy as np
from sklearn.impute import KNNImputer

# Original data with missing values
data = {
'Product': ['Laptop', 'Phone', 'Tablet', 'Monitor', 'TV'],
'Price': [1000, 500, np.nan, 300, 600],
'Rating': [4.5, np.nan, 4.2, 4.0, 4.7]
}
df = pd.DataFrame(data)

# KNN Imputation
imputer = KNNImputer(n_neighbors=2)
df[['Price', 'Rating']] = imputer.fit_transform(df[['Price', 'Rating']])
print("After Imputation:\n", df)

# Data Integration
df_sales = pd.DataFrame({
'Product_ID': [1, 2, 3],
'Product': ['Laptop', 'Phone', 'Tablet'],
'Price': [1000, 500, 700]
})

df_ratings = pd.DataFrame({
'Product_ID': [1, 2, 3, 4],
'Rating': [4.5, 4.2, 4.8, 4.3]
})

merged = pd.merge(df_sales, df_ratings, on='Product_ID', how='left')

print("\nMerged Data:\n", merged)

Output:

After Imputation:
    Product   Price  Rating
0    Laptop  1000.0    4.50
1     Phone   500.0    4.35
2    Tablet   650.0    4.20
3   Monitor   300.0    4.00
4        TV   600.0    4.70

Merged Data:
    Product_ID Product  Price  Rating
0            1  Laptop   1000     4.5
1            2   Phone    500     4.2
2            3  Tablet    700     4.8
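
These values can be reproduced by hand: with its default settings, KNNImputer measures row distances using a NaN-aware Euclidean metric and averages each missing feature over the two nearest rows where that feature is present. A minimal check, assuming scikit-learn's nan_euclidean_distances and rebuilding the pre-imputation matrix:

import numpy as np
from sklearn.metrics.pairwise import nan_euclidean_distances

# Pre-imputation [Price, Rating] rows: Laptop, Phone, Tablet, Monitor, TV
X = np.array([[1000, 4.5], [500, np.nan], [np.nan, 4.2], [300, 4.0], [600, 4.7]])
print(nan_euclidean_distances(X, X)[2])  # Tablet's nearest rows with a Price: Monitor, Laptop
print((300 + 1000) / 2)                  # mean of their prices -> 650.0, the imputed value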

Page 13: Assignment 7 (Apriori Algorithm)

Corrected Code

import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Transaction dataset
data = {
'TID': [1, 2, 3, 4, 5],
'Milk': [1, 1, 1, 0, 1],
'Bread': [1, 1, 1, 0, 1],
'Butter': [0, 1, 1, 1, 0],
'Jam': [0, 0, 1, 0, 1]
}
df = pd.DataFrame(data).set_index('TID').astype(bool)  # apriori expects boolean columns

# Apriori Algorithm
freq_items = apriori(df, min_support=0.4, use_colnames=True)
rules = association_rules(freq_items, metric="confidence", min_threshold=0.6)

print("Association Rules:")
print(rules[['antecedents', 'consequents', 'support', 'confidence']])

Output (abridged):

Association Rules:
  antecedents consequents  support  confidence
0      (Milk)     (Bread)      0.8    1.000000
1     (Bread)      (Milk)      0.8    1.000000
2    (Butter)      (Milk)      0.4    0.666667
3    (Butter)     (Bread)      0.4    0.666667
4       (Jam)     (Bread)      0.4    1.000000
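
Each confidence value follows directly from the supports: confidence(X → Y) = support(X ∧ Y) / support(X), e.g. 0.4 / 0.6 ≈ 0.667 for Butter → Milk. A minimal hand check against the transaction frame above:

# Verify the Butter -> Milk rule by counting transactions directly
both = (df['Butter'] & df['Milk']).mean()   # support of {Butter, Milk}
butter = df['Butter'].mean()                # support of {Butter}
print(both / butter)                        # ~0.6667, matching the rules table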

Pages 14–16: Assignment 8 (Data Visualization)

Corrected Code

import seaborn as sns
import matplotlib.pyplot as plt

iris = sns.load_dataset("iris")

# Scatter Plot
plt.scatter(iris['sepal_length'], iris['petal_length'])
plt.title('Sepal vs Petal Length')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Petal Length (cm)')
plt.show()

# Histogram
plt.hist(iris['sepal_length'], bins=15)
plt.title('Sepal Length Distribution')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Frequency')
plt.show()

# Box Plot
sns.boxplot(x='species', y='sepal_width', data=iris)
plt.title('Sepal Width by Species')
plt.show()

# Heatmap
corr = iris.corr(numeric_only=True)
sns.heatmap(corr, annot=True)
plt.title('Correlation Matrix')
plt.show()

Visualizations render as separate figures.

Pages 17–18: Assignment 9 (Decision Tree)

Corrected Code

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
import matplotlib.pyplot as plt

# Sample dataset
data = {
'Age': [45, 32, 60, 28, 50],
'Weight': [70, 55, 90, 65, 80],
'Smoker': [1, 0, 1, 0, 1],
'Heart_Attack': [1, 0, 1, 0, 1]
}
df = pd.DataFrame(data)

X = df[['Age', 'Weight', 'Smoker']]
y = df['Heart_Attack']

# Train-test split: 40% held out (2 of 5 samples); random_state fixes the split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)
clf = DecisionTreeClassifier(max_depth=2)
clf.fit(X_train, y_train)

# Evaluation
y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

# Tree Visualization
plt.figure(figsize=(12,8))
plot_tree(clf, feature_names=X.columns, class_names=['No','Yes'], filled=True)
plt.show()

Output (example; exact numbers depend on the train/test split):

Accuracy: 1.0

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         1
           1       1.00      1.00      1.00         1

    accuracy                           1.00         2
   macro avg       1.00      1.00      1.00         2
weighted avg       1.00      1.00      1.00         2
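
For a quick textual view of the learned splits without a plot, scikit-learn's export_text helper can be appended to the script (a small optional extension, not part of the original assignment):

from sklearn.tree import export_text

# Print the fitted tree's decision rules as indented text
print(export_text(clf, feature_names=list(X.columns)))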

End of Lab Manual.
