
22DSB3202 DATA WAREHOUSING & MINING (DWM)

LABORATORY WORKBOOK

III B.TECH 2024-25 ODD SEMESTER

KLH AZIZ NAGAR CAMPUS, HYDERABAD

STUDENT NAME :
REGN NO      :
YEAR         : III (2024-25)
SEMESTER     : Odd
SECTION      :
FACULTY      : Dr. Shahin Fatima

TABLE OF CONTENTS

SNO  NAME OF PRACTICAL
1    Basic Statistical Descriptions
2    Dataset Creation
3    Data Pre-processing Techniques
4    Classification Using Decision Trees
5    Classification Using Bayesian Classifiers


Data Warehousing and Mining Lab Manual
Introduction
This lab manual outlines various data warehousing and mining techniques using Python.
Each section includes the algorithm, objectives, and sample code. Ensure that you have
the required libraries installed (pandas, numpy, scikit-learn, mlxtend, matplotlib,
and seaborn).
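
A quick way to confirm that these libraries are available in the current environment is to import them. This is only a sketch; any package that fails to import can be installed with pip (for example, pip install mlxtend).

# Check that the required libraries can be imported
import pandas, numpy, sklearn, mlxtend, matplotlib, seaborn
print('pandas', pandas.__version__)
print('scikit-learn', sklearn.__version__)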

1. Basic Statistical Descriptions


Algorithm

1. Load the dataset into a DataFrame.
2. Use descriptive statistics to summarize the dataset.
3. Compute measures like mean, median, mode, standard deviation, and quantiles.

Code
import pandas as pd

# Load the dataset
data = pd.read_csv('your_dataset.csv')

# Basic statistical descriptions
description = data.describe()
print(description)

OUTPUT
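
The describe() summary reports the count, mean, standard deviation, minimum, quartiles (the 50% row is the median) and maximum of each numeric column, but not the mode named in step 3 of the algorithm. The sketch below computes each measure directly; it assumes the same your_dataset.csv placeholder and that the file contains at least one numeric column.

import pandas as pd

data = pd.read_csv('your_dataset.csv')
numeric = data.select_dtypes(include='number')   # keep only numeric columns

print(numeric.mean())                            # mean of each column
print(numeric.median())                          # median
print(numeric.mode().iloc[0])                    # first mode per column
print(numeric.std())                             # standard deviation
print(numeric.quantile([0.25, 0.5, 0.75]))       # quartiles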


2. Dataset Creation
CATEGORICAL DATA

Algorithm to Create and Save a DataFrame to CSV

1. Import pandas library:
   o Load the pandas library into the Python environment to work with data frames.
2. Define data:
   o Create a dictionary named data where:
     - Keys represent column names: 'Name', 'Age', 'City', 'Occupation'.
     - Values are lists containing the respective column data.
3. Create DataFrame:
   o Use the pandas DataFrame constructor to convert the data dictionary into a DataFrame object named df.
4. Save DataFrame to CSV:
   o Call the to_csv method on the DataFrame df to save it to a CSV file.
   o Specify the file name as 'categorical_dataset.csv'.
   o Set index=False to exclude the index column from the CSV file.

TIME SERIES DATASET

Algorithm to Generate and Save a Time Series Dataset

1. Import Libraries:
   o Import the pandas library as pd to handle data manipulation and DataFrame creation.
   o Import the numpy library as np to generate random numbers.
2. Define Date Range:
   o Create a date range starting from '2024-01-01' with a total of 100 periods (days).
   o Use a daily frequency ('D') to generate the sequence of dates.
3. Generate Random Data:
   o Create a dictionary data where:
     - 'Date' is assigned the date range created in step 2.
     - 'Value' is assigned a numpy array of random values drawn from a standard normal distribution and scaled by 100 (i.e., mean 0, standard deviation 100).
4. Create DataFrame:
   o Convert the dictionary data into a pandas DataFrame object named df.
5. Save DataFrame to CSV:
   o Save the DataFrame df to a CSV file with the path 'C:/Users/lenovo/Documents/time_series_dataset.csv'.
   o Set index=False to exclude the DataFrame index from being written to the CSV file.

Code
CATEGORICAL DATA

import pandas as pd

# Define data
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 30, 28, 35],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
    'Occupation': ['Engineer', 'Doctor', 'Artist', 'Chef']
}

# Create DataFrame
df = pd.DataFrame(data)

# Save to CSV
# df.to_csv('categorical_dataset.csv', index=False)
df.to_csv('C:/Users/lenovo/Documents/categorical_dataset.csv', index=False)
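
To confirm that the file was written as intended, it can be read back and previewed. This is a minimal sketch that assumes the same output path used above.

import pandas as pd

# Read the saved file back and inspect it
df_check = pd.read_csv('C:/Users/lenovo/Documents/categorical_dataset.csv')
print(df_check.head())    # first rows
print(df_check.dtypes)    # column data types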

TIME SERIES DATA

import pandas as pd
import numpy as np

# Define date range and generate data
date_range = pd.date_range(start='2024-01-01', periods=100, freq='D')

data = {
    'Date': date_range,
    'Value': np.random.randn(len(date_range)) * 100
}

# Create DataFrame
df = pd.DataFrame(data)

# Save to CSV
df.to_csv('C:/Users/lenovo/Documents/time_series_dataset.csv', index=False)

OUTPUT
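
Since matplotlib is listed among the required libraries, the generated series can also be inspected with a quick line plot. This is only a sketch, assuming the CSV was written to the path used above.

import pandas as pd
import matplotlib.pyplot as plt

# Read the saved series back, parsing the Date column as datetimes
ts = pd.read_csv('C:/Users/lenovo/Documents/time_series_dataset.csv', parse_dates=['Date'])

# Plot Value over Date
ts.plot(x='Date', y='Value', title='Generated time series')
plt.show()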


3. Data Pre-processing Techniques


Algorithm

1. Load the dataset into a DataFrame.
2. Identify and handle missing values:
   o Fill missing values with the mean or median.
   o Drop rows or columns with excessive missing values.
3. Convert categorical variables to numeric using one-hot encoding.

Code
import pandas as pd

# Load the dataset
data = pd.read_csv('your_dataset.csv')

# Handle missing values: fill numeric columns with their column mean
data.fillna(data.mean(numeric_only=True), inplace=True)

# Encode categorical variables with one-hot encoding
data = pd.get_dummies(data, drop_first=True)
print(data.head())

OUTPUT
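
Step 2 of the algorithm also mentions filling with the median and dropping rows or columns with excessive missing values, which the sample code above does not show. The sketch below illustrates both options; the your_dataset.csv placeholder and the 50% missing-value threshold are assumptions chosen for illustration.

import pandas as pd

data = pd.read_csv('your_dataset.csv')

# Option 1: fill numeric columns with the median instead of the mean
data = data.fillna(data.median(numeric_only=True))

# Option 2: drop columns where more than 50% of values are missing,
# then drop any rows that still contain missing values
threshold = 0.5
data = data.loc[:, data.isna().mean() <= threshold]
data = data.dropna()
print(data.shape)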


4. Classification Using Decision Trees


Algorithm

1. Split the dataset into features and target variables.
2. Split the data into training and testing sets.
3. Create and train a Decision Tree model.
4. Evaluate the model's accuracy on the test set.

Code
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Load the dataset
data = pd.read_csv('your_dataset.csv')

# Split the data into features and target
X = data.drop('target', axis=1)
y = data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# Evaluate the model
accuracy = model.score(X_test, y_test)
print(f'Accuracy: {accuracy}')

OUTPUT
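
Accuracy alone can hide per-class behaviour. The sketch below adds a classification report and a plot of the fitted tree; it assumes the model, X, X_test and y_test from the code above and that matplotlib is installed.

from sklearn.metrics import classification_report
from sklearn.tree import plot_tree
import matplotlib.pyplot as plt

# Per-class precision, recall and F1-score on the test set
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))

# Visualise the top levels of the fitted tree
plt.figure(figsize=(12, 6))
plot_tree(model, feature_names=list(X.columns), filled=True, max_depth=2)
plt.show()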


5. Classification Using Bayesian Classifiers


Algorithm

1. Split the dataset into features and target variables.
2. Split the data into training and testing sets.
3. Create and train a Naive Bayes model.
4. Evaluate the model's accuracy on the test set.

Code
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

# Load the dataset
data = pd.read_csv('your_dataset.csv')

# Split the data into features and target
X = data.drop('target', axis=1)
y = data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model
model = GaussianNB()
model.fit(X_train, y_train)

# Evaluate the model
accuracy = model.score(X_test, y_test)
print(f'Accuracy: {accuracy}')

OUTPUT
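
Accuracy from a single train/test split depends on the random split; k-fold cross-validation gives a steadier estimate. This is a minimal sketch that assumes the X and y defined above and purely numeric features, since GaussianNB expects numeric input.

from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# 5-fold cross-validated accuracy
scores = cross_val_score(GaussianNB(), X, y, cv=5)
print(f'Mean CV accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})')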
