Discretization Problem Statement21

Uploaded by

baneeru11

DISCRETIZATION

Instructions:

Please share your answers inline in the Word document. Submit Python and R code
files wherever applicable.

Please ensure you update all the details:

Name: hari machavarapu

Batch Id: dswdcmb 150622h


Topic: Data Pre-Processing

Problem Statement:
In analytics, everything revolves around data. Good data helps you make useful
predictions that improve your business. Using the original data as-is does not
always give accurate results; sometimes the data must be converted from one form
to another to get better predictions. Explore the various techniques for
transforming data to improve model performance. You can go through
this link:
https://360digitmg.com/mindmap-data-science
1) Convert the continuous data in the iris dataset into discrete classes.
Prepare the dataset by applying preprocessing techniques so that the
resulting data improves model performance.

Sepal.Length  Sepal.Width  Petal.Length  Petal.Width  Species
5.1           3.5          1.4           0.2          setosa
4.9           3            1.4           0.2          setosa
4.7           3.2          1.3           0.2          setosa
4.6           3.1          1.5           0.2          setosa
5             3.6          1.4           0.2          setosa
5.4           3.9          1.7           0.4          setosa
4.6           3.4          1.4           0.3          setosa
5             3.4          1.5           0.2          setosa
4.4           2.9          1.4           0.2          setosa
4.9           3.1          1.5           0.1          setosa

© 2013 - 2021 360DigiTMG. All Rights Reserved.
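
The problem statement asks to explore various discretization techniques. As a minimal sketch (not the submitted solution), the block below applies two common options to the ten Sepal.Length values listed above: equal-width bins with `pd.cut` and equal-frequency bins with `pd.qcut`. The bin counts and the "Low"/"Medium"/"High" labels are illustrative assumptions, not part of the assignment.

```python
import pandas as pd

# Sepal.Length values from the ten sample rows listed above.
sepal_length = pd.Series(
    [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9],
    name="Sepal.Length",
)

# Equal-width binning: three bins of equal width across the observed range.
# The number of bins and the labels are assumptions for illustration.
equal_width = pd.cut(sepal_length, bins=3, labels=["Low", "Medium", "High"])

# Equal-frequency binning: two bins split at the median.
equal_freq = pd.qcut(sepal_length, q=2, labels=["Low", "High"])

# Compare how many rows land in each class under the two schemes.
print(equal_width.value_counts())
print(equal_freq.value_counts())
```

Equal-width bins follow the value range, so class sizes can be uneven; equal-frequency bins follow the ranks, so class sizes are roughly balanced. Which one suits a model better depends on the data and is worth checking on the full dataset.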

CODE-
# Import pandas for loading the data and for binning
import pandas as pd

# Load the iris dataset (path as on the author's machine; adjust to your own)
data = pd.read_csv("C:/Users/hudso/Downloads/DataSets-Data Pre Processing/DataSets/iris.csv")

# Summary statistics and the first rows, to check ranges and column names
print(data.describe())
print(data.head())

# Continuous features to discretize (column names as they appear in this CSV)
features = ["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"]

# Split each feature into "Low"/"High" around its mean.
# include_lowest=True keeps the minimum value in the "Low" bin; pd.cut uses
# left-open intervals by default, which would turn the minimum into NaN.
for col in features:
    bins = [data[col].min(), data[col].mean(), data[col].max()]
    data[col + "_new"] = pd.cut(data[col], bins=bins, labels=["Low", "High"],
                                include_lowest=True)

# Show all 150 rows with the new discrete columns
print(data.head(150))

# Count how many rows fall into each class, per feature
for col in features:
    print(data[col + "_new"].value_counts())
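
One detail worth checking when binning on [min, mean, max] with `pd.cut`: intervals are left-open by default, so any row equal to the minimum falls outside the first bin unless `include_lowest=True` is passed. A minimal sketch of the effect, using the ten Sepal.Length values from the sample table rather than the full CSV:

```python
import pandas as pd

# Sepal.Length values from the ten sample rows in the table.
values = pd.Series([5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9])

# Same bin edges as the mean-split approach: minimum, mean, maximum.
bins = [values.min(), values.mean(), values.max()]

# Default behaviour: the first interval is (min, mean], so min is excluded.
default_cut = pd.cut(values, bins=bins, labels=["Low", "High"])

# With include_lowest=True the first interval also contains the minimum.
inclusive_cut = pd.cut(values, bins=bins, labels=["Low", "High"],
                       include_lowest=True)

# Number of rows left without a class under each call.
print(default_cut.isna().sum())
print(inclusive_cut.isna().sum())
```

On these ten values the default call leaves one row (the 4.4 minimum) unlabelled, while the inclusive call labels every row. Comparing `value_counts()` totals against the row count is a quick way to catch this on the full dataset.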

Hints:
For each assignment, the solution should be submitted in the following format:
1. Work on each feature to create a data dictionary, as shown in the image
below:

2. Hint: Refer to Iris.csv, which is a public dataset.

3. Research and perform all possible steps for obtaining the solution.
4. All code (executable programs) should execute without errors.
5. Code modularization should be followed.
6. Each line of code should have a comment explaining the logic and why you are
using that function.
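
Hints 5 and 6 ask for modular, commented code. One possible way to meet them is to wrap the mean-split binning in a small function. This is a sketch under assumptions: the function name `discretize_by_mean`, the `_new` suffix handling, and the five-row sample frame are illustrative, and the column names follow the solution code rather than the table header.

```python
from typing import Sequence

import pandas as pd


def discretize_by_mean(df: pd.DataFrame, columns: Sequence[str]) -> pd.DataFrame:
    """Return a copy of df with a '<column>_new' Low/High column per feature."""
    # Work on a copy so the caller's DataFrame is left untouched.
    result = df.copy()
    for col in columns:
        # Bin edges: minimum, mean, and maximum of the feature.
        edges = [result[col].min(), result[col].mean(), result[col].max()]
        # include_lowest=True keeps the minimum value inside the "Low" bin.
        result[f"{col}_new"] = pd.cut(
            result[col], bins=edges, labels=["Low", "High"], include_lowest=True
        )
    return result


# Usage on the first five sample rows (two features only, for brevity).
sample = pd.DataFrame(
    {
        "SepalLength": [5.1, 4.9, 4.7, 4.6, 5.0],
        "SepalWidth": [3.5, 3.0, 3.2, 3.1, 3.6],
    }
)
binned = discretize_by_mean(sample, ["SepalLength", "SepalWidth"])
print(binned)
```

Keeping the binning in one function means the same logic can be reused for every feature and tested on a small frame before it is run on the full Iris.csv.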
