
22DSB3202 DATA WAREHOUSING & MINING (DWM)

LABORATORY WORKBOOK

III B.TECH 2024-25 ODD SEMESTER

KLH AZIZ NAGAR CAMPUS, HYDERABAD

STUDENT NAME :
REGN NO      :
YEAR         : III (2024-25)
SEMESTER     : Odd
SECTION      :
FACULTY      : Dr. Shahin Fatima

TABLE OF CONTENTS

SNO  NAME OF PRACTICAL
1    Basic Statistical Descriptions
2    Dataset Creation
3    Data Pre-processing Techniques
4    Classification Using Decision Trees
5    Classification Using Bayesian Classifiers


Data Warehousing and Mining Lab Manual
Introduction
This lab manual outlines various data warehousing and mining techniques using Python.
Each section includes the algorithm, objectives, and sample code. Ensure that you have
the required libraries installed (pandas, numpy, scikit-learn, mlxtend, matplotlib,
and seaborn).
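
A quick way to confirm that these libraries are available in the current environment is to import them. This is only a sketch; any package that fails to import can be installed with pip (for example, pip install mlxtend).

# Check that the required libraries can be imported
import pandas, numpy, sklearn, mlxtend, matplotlib, seaborn
print('pandas', pandas.__version__)
print('scikit-learn', sklearn.__version__)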

1. Basic Statistical Descriptions


Algorithm

1. Load the dataset into a DataFrame.
2. Use descriptive statistics to summarize the dataset.
3. Compute measures like mean, median, mode, standard deviation, and quantiles.

Code
import pandas as pd

# Load the dataset
data = pd.read_csv('your_dataset.csv')

# Basic statistical descriptions
description = data.describe()
print(description)

OUTPUT
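
The describe() summary reports the count, mean, standard deviation, minimum, quartiles (the 50% row is the median) and maximum of each numeric column, but not the mode named in step 3 of the algorithm. The sketch below computes each measure directly; it assumes the same your_dataset.csv placeholder and that the file contains at least one numeric column.

import pandas as pd

data = pd.read_csv('your_dataset.csv')
numeric = data.select_dtypes(include='number')   # keep only numeric columns

print(numeric.mean())                            # mean of each column
print(numeric.median())                          # median
print(numeric.mode().iloc[0])                    # first mode per column
print(numeric.std())                             # standard deviation
print(numeric.quantile([0.25, 0.5, 0.75]))       # quartiles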


2. Dataset Creation
CATEGORICAL DATA

Algorithm to Create and Save a DataFrame to CSV

1. Import pandas library:
   o Load the pandas library into the Python environment to work with data frames.
2. Define data:
   o Create a dictionary named data where:
     - Keys represent column names: 'Name', 'Age', 'City', 'Occupation'.
     - Values are lists containing the respective column data.
3. Create DataFrame:
   o Use the pandas DataFrame constructor to convert the data dictionary into a DataFrame object named df.
4. Save DataFrame to CSV:
   o Call the to_csv method on the DataFrame df to save it to a CSV file.
   o Specify the file name as 'categorical_dataset.csv'.
   o Set index=False to exclude the index column from the CSV file.

TIME SERIES DATASET

Algorithm to Generate and Save a Time Series Dataset

1. Import Libraries:
   o Import the pandas library as pd to handle data manipulation and DataFrame creation.
   o Import the numpy library as np to generate random numbers.
2. Define Date Range:
   o Create a date range starting from '2024-01-01' with a total of 100 periods (days).
   o Use a daily frequency ('D') to generate the sequence of dates.
3. Generate Random Data:
   o Create a dictionary data where:
     - 'Date' is assigned the date range created in step 2.
     - 'Value' is assigned a numpy array of random values drawn from a standard normal distribution and scaled by 100 (i.e., mean 0, standard deviation 100).
4. Create DataFrame:
   o Convert the dictionary data into a pandas DataFrame object named df.
5. Save DataFrame to CSV:
   o Save the DataFrame df to a CSV file with the path 'C:/Users/lenovo/Documents/time_series_dataset.csv'.
   o Set index=False to exclude the DataFrame index from being written to the CSV file.

Code
CATEGORICAL DATA

import pandas as pd

# Define data
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 30, 28, 35],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
    'Occupation': ['Engineer', 'Doctor', 'Artist', 'Chef']
}

# Create DataFrame
df = pd.DataFrame(data)

# Save to CSV
# df.to_csv('categorical_dataset.csv', index=False)
df.to_csv('C:/Users/lenovo/Documents/categorical_dataset.csv', index=False)
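
To confirm that the file was written as intended, it can be read back and previewed. This is a minimal sketch that assumes the same output path used above.

import pandas as pd

# Read the saved file back and inspect it
df_check = pd.read_csv('C:/Users/lenovo/Documents/categorical_dataset.csv')
print(df_check.head())    # first rows
print(df_check.dtypes)    # column data types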

TIME SERIES DATA

import pandas as pd
import numpy as np

# Define date range and generate data
date_range = pd.date_range(start='2024-01-01', periods=100, freq='D')

data = {
    'Date': date_range,
    'Value': np.random.randn(len(date_range)) * 100
}

# Create DataFrame
df = pd.DataFrame(data)

# Save to CSV
df.to_csv('C:/Users/lenovo/Documents/time_series_dataset.csv', index=False)

OUTPUT
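
Since matplotlib is listed among the required libraries, the generated series can also be inspected with a quick line plot. This is only a sketch, assuming the CSV was written to the path used above.

import pandas as pd
import matplotlib.pyplot as plt

# Read the saved series back, parsing the Date column as datetimes
ts = pd.read_csv('C:/Users/lenovo/Documents/time_series_dataset.csv', parse_dates=['Date'])

# Plot Value over Date
ts.plot(x='Date', y='Value', title='Generated time series')
plt.show()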


3. Data Pre-processing Techniques


Algorithm

1. Load the dataset into a DataFrame.
2. Identify and handle missing values:
   o Fill missing values with the mean or median.
   o Drop rows or columns with excessive missing values.
3. Convert categorical variables to numeric using one-hot encoding.

Code
import pandas as pd

# Load the dataset
data = pd.read_csv('your_dataset.csv')

# Handle missing values: fill numeric columns with their column mean
data.fillna(data.mean(numeric_only=True), inplace=True)

# Encode categorical variables with one-hot encoding
data = pd.get_dummies(data, drop_first=True)
print(data.head())

OUTPUT
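
Step 2 of the algorithm also mentions filling with the median and dropping rows or columns with excessive missing values, which the sample code above does not show. The sketch below illustrates both options; the your_dataset.csv placeholder and the 50% missing-value threshold are assumptions chosen for illustration.

import pandas as pd

data = pd.read_csv('your_dataset.csv')

# Option 1: fill numeric columns with the median instead of the mean
data = data.fillna(data.median(numeric_only=True))

# Option 2: drop columns where more than 50% of values are missing,
# then drop any rows that still contain missing values
threshold = 0.5
data = data.loc[:, data.isna().mean() <= threshold]
data = data.dropna()
print(data.shape)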


4. Classification Using Decision Trees


Algorithm

1. Split the dataset into features and target variables.
2. Split the data into training and testing sets.
3. Create and train a Decision Tree model.
4. Evaluate the model's accuracy on the test set.

Code
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Load the dataset
data = pd.read_csv('your_dataset.csv')

# Split the data into features and target
X = data.drop('target', axis=1)
y = data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# Evaluate the model
accuracy = model.score(X_test, y_test)
print(f'Accuracy: {accuracy}')

OUTPUT
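
Accuracy alone can hide per-class behaviour. The sketch below adds a classification report and a plot of the fitted tree; it assumes the model, X, X_test and y_test from the code above and that matplotlib is installed.

from sklearn.metrics import classification_report
from sklearn.tree import plot_tree
import matplotlib.pyplot as plt

# Per-class precision, recall and F1-score on the test set
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))

# Visualise the top levels of the fitted tree
plt.figure(figsize=(12, 6))
plot_tree(model, feature_names=list(X.columns), filled=True, max_depth=2)
plt.show()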


5. Classification Using Bayesian Classifiers


Algorithm

1. Split the dataset into features and target variables.
2. Split the data into training and testing sets.
3. Create and train a Naive Bayes model.
4. Evaluate the model's accuracy on the test set.

Code
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

# Load the dataset
data = pd.read_csv('your_dataset.csv')

# Split the data into features and target
X = data.drop('target', axis=1)
y = data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model
model = GaussianNB()
model.fit(X_train, y_train)

# Evaluate the model
accuracy = model.score(X_test, y_test)
print(f'Accuracy: {accuracy}')

OUTPUT
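
Accuracy from a single train/test split depends on the random split; k-fold cross-validation gives a steadier estimate. This is a minimal sketch that assumes the X and y defined above and purely numeric features, since GaussianNB expects numeric input.

from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# 5-fold cross-validated accuracy
scores = cross_val_score(GaussianNB(), X, y, cv=5)
print(f'Mean CV accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})')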
