
NAME : P KOUSHIK REDDY

ROLL NO : 12212161

Note : WEKA doesn't work on my laptop, hence I used Jupyter Notebook; it gives similar results.

Ex. 5 Select a dataset which comprises numeric attributes of varying range. Apply different normalization techniques, viz. Min-Max normalization, Z-Score normalization, and Decimal Scaling, on your dataset. Further, discretize the numeric attributes using the Binning and Histogram-analysis methods. Analyze the effect of the different techniques on the dataset in terms of the type of attributes, statistical parameters such as central tendency and dispersion, and the change in aptness of proximity metrics. (Later on, this exercise will be extended in combination with a clustering and/or classification and/or association technique.)
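
For reference, the three normalization rules reduce to one-line formulas. A minimal sketch on a single toy attribute (the values below are arbitrary, chosen only for illustration):

import numpy as np
values = np.array([20.0, 35.0, 50.0, 80.0])
minmax = (values - values.min()) / (values.max() - values.min())  # maps onto [0, 1]
zscore = (values - values.mean()) / values.std()                  # mean 0, std 1
decimal = values / 10**np.ceil(np.log10(np.abs(values).max()))    # here /100 -> 0.20 .. 0.80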

CODE : The comments below mark each step followed:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import KMeans

# Load the Iris dataset (or replace this with your dataset)
from sklearn.datasets import load_iris
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Step 1: Display the first few rows of the dataset
print("Original Data:")
print(df.head())

# Step 2: Apply Min-Max Normalization
scaler_minmax = MinMaxScaler()
df_minmax = pd.DataFrame(scaler_minmax.fit_transform(df), columns=df.columns)
print("\nMin-Max Normalized Data:")
print(df_minmax.head())

# Step 3: Apply Z-Score Normalization
scaler_zscore = StandardScaler()
df_zscore = pd.DataFrame(scaler_zscore.fit_transform(df), columns=df.columns)
print("\nZ-Score Normalized Data:")
print(df_zscore.head())

# Step 4: Apply Decimal Scaling
df_decimal = df.copy()
for column in df_decimal.columns:
    max_val = df_decimal[column].abs().max()
    df_decimal[column] = df_decimal[column] / 10**np.ceil(np.log10(max_val))
print("\nDecimal Scaled Data:")
print(df_decimal.head())
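
# How the divisor is chosen above: it is the smallest power of ten that is at
# least the column's largest absolute value. For sepal length in Iris the
# maximum is 7.9, so ceil(log10(7.9)) = 1 and every value is divided by
# 10**1 = 10, mapping the column into (0, 1]. (Caveat: if a column's maximum
# were an exact power of ten, e.g. 10.0, this formula would scale it to
# exactly 1.0, whereas textbook decimal scaling requires |v'| < 1.)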

# Step 5: Discretize the Numeric Attributes using Binning
df_binned = df.copy()
for column in df.columns:
    df_binned[column + "_binned"] = pd.cut(df_binned[column], bins=3,
                                           labels=["Low", "Medium", "High"])
print("\nData After Binning:")
print(df_binned.head())
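
# pd.cut splits each attribute's range into three equal-width intervals, so
# bin membership depends on the range, not on frequency. An equal-frequency
# alternative (a sketch, if preferred) would be
# pd.qcut(df[column], q=3, labels=["Low", "Medium", "High"]),
# which places roughly 50 of the 150 Iris flowers in each bin per attribute.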

# Step 6: Histogram Analysis of Original and Normalized Data
# (DataFrame.hist creates its own figure, so figsize is passed directly
# rather than calling plt.figure first, which would leave blank figures.)
df.hist(bins=10, figsize=(14, 10), color='skyblue', edgecolor='black', alpha=0.7)
plt.suptitle("Histogram Analysis of Original Data")
plt.show()

df_minmax.hist(bins=10, figsize=(14, 10), color='lightgreen', edgecolor='black', alpha=0.7)
plt.suptitle("Histogram Analysis of Min-Max Normalized Data")
plt.show()

df_zscore.hist(bins=10, figsize=(14, 10), color='salmon', edgecolor='black', alpha=0.7)
plt.suptitle("Histogram Analysis of Z-Score Normalized Data")
plt.show()

df_decimal.hist(bins=10, figsize=(14, 10), color='lightcoral', edgecolor='black', alpha=0.7)
plt.suptitle("Histogram Analysis of Decimal Scaled Data")
plt.show()

# Step 7: Analyze Effect on Central Tendency and Dispersion
print("\nOriginal Data Summary:")
print(df.describe())

print("\nMin-Max Normalized Data Summary:")
print(df_minmax.describe())

print("\nZ-Score Normalized Data Summary:")
print(df_zscore.describe())

print("\nDecimal Scaled Data Summary:")
print(df_decimal.describe())
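
# What the summaries should show: min-max forces every column into [0, 1];
# z-scoring gives each column mean 0 and (population) standard deviation 1;
# decimal scaling divides by a constant, so relative dispersion is unchanged.
# A quick sanity check of those claims:
assert np.allclose(df_minmax.min(), 0) and np.allclose(df_minmax.max(), 1)
assert np.allclose(df_zscore.mean(), 0)
assert np.allclose(df_zscore.std(ddof=0), 1)  # StandardScaler divides by the population std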

# Step 8: Scatter Plot of Original Data (to visually inspect clustering potential)
plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x=df.columns[0], y=df.columns[1], s=100)
plt.title('Scatter Plot of Original Data')
plt.xlabel(df.columns[0])
plt.ylabel(df.columns[1])
plt.show()
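
# The exercise also asks how normalization changes the aptness of proximity
# metrics. A minimal sketch of that comparison, using Euclidean distance via
# scipy.spatial.distance.pdist: on the raw data, the attributes with the
# widest ranges (petal length spans 1.0-6.9 cm) dominate the distances, while
# after min-max normalization all four attributes contribute on the same
# [0, 1] scale.
from scipy.spatial.distance import pdist
dist_raw = pdist(df.values)          # pairwise Euclidean distances, raw scale
dist_norm = pdist(df_minmax.values)  # same pairs after min-max normalization
print("\nMean pairwise distance (raw):", dist_raw.mean())
print("Mean pairwise distance (min-max):", dist_norm.mean())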

# Step 9: Dendrogram for Hierarchical Clustering
linked = linkage(df, method='ward')
plt.figure(figsize=(10, 7))
dendrogram(linked, orientation='top', distance_sort='descending', show_leaf_counts=True)
plt.title('Dendrogram for Hierarchical Clustering')
plt.xlabel('Samples')
plt.ylabel('Euclidean distances')
plt.show()

# Step 10: Elbow Method to Determine Optimal Number of Clusters
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, n_init=10, random_state=42)  # explicit n_init avoids version-dependent warnings
    kmeans.fit(df)
    wcss.append(kmeans.inertia_)

plt.figure(figsize=(8, 6))
plt.plot(range(1, 11), wcss, marker='o', linestyle='--')
plt.title('Elbow Method for Optimal Number of Clusters')
plt.xlabel('Number of Clusters')
plt.ylabel('Within-Cluster Sum of Squares (WCSS)')
plt.show()
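
Once the elbow is read off the plot, the chosen k can be used for a final clustering. A minimal sketch, assuming the elbow lands at k = 3 (Iris does contain three species):

kmeans_final = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans_final.fit_predict(df)
print("Cluster sizes:", np.bincount(labels))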

The output screenshots are pasted below.
