
Dummy Variables Problem Statement

The document outlines a task for data preprocessing, specifically converting non-numeric data into numeric format using techniques like One Hot Encoding. It includes instructions for preparing a dataset containing animal categories and requires the submission of code files demonstrating these preprocessing techniques. Additionally, it emphasizes the importance of code modularization and commenting for clarity.

Uploaded by

haneh68937

Dummy Variables

Instructions:
Please share your answers inline in the Word document. Submit code files wherever
applicable.

Please ensure you update all the details:


Name: Naveen M
Batch Id: 11/09/2023-10AM
Topic: Data Pre-Processing

Problem Statement:
Data is one of the most important assets. It is common for data to be stored in
distinct systems, in different formats and forms. Non-numeric data makes it tricky
to develop the mathematical equations behind prediction models. Preprocessing
techniques convert such data to numeric form. To explore the various techniques
for obtaining reliable, uniform, standard data, you can go through this link:
https://360digitmg.com/mindmap-data-science

1) Prepare the dataset by performing the preprocessing techniques, to have all the
features in numeric format.

Index  Animals  Gender  Homly  Types
1      Cat      Male    Yes    A
2      Dog      Male    Yes    B
3      Mouse    Male    Yes    C
4      Mouse    Male    Yes    C
5      Dog      Female  Yes    A
6      Cat      Female  Yes    B
7      Lion     Female  Yes    D
8      Goat     Female  Yes    E
9      Cat      Female  Yes    A
10     Dog      Male    Yes    B

© 360DigiTMG. All Rights Reserved.

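import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Load the animal category data set
df = pd.read_csv('C:/Users/Naveen/Desktop/DataPreprocessing/Animal_category.csv')

# Drop the row identifier; it is not a feature
df.drop('Index', axis=1, inplace=True)

# Create dummy variables; drop_first=True drops the first level of each column
df_new = pd.get_dummies(df, drop_first=True)

# We can also use One Hot Encoding
enc = OneHotEncoder()

# fit_transform returns a sparse matrix, so convert it to a dense array
# before building the DataFrame, and label the columns by feature and level
enc_df = pd.DataFrame(enc.fit_transform(df).toarray(),
                      columns=enc.get_feature_names_out())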
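
The effect of drop_first can be checked without the CSV file. The sketch below is only an illustration, not part of the submitted answer: it assumes the ten rows printed in the table above are the whole data set, types them in by hand, and uses pandas alone. The names full, reduced and all_numeric are made up for this example.

```python
import pandas as pd

# Assumption: these ten rows, copied by hand from the table in the problem
# statement, stand in for Animal_category.csv (the Index column is left out).
df = pd.DataFrame({
    "Animals": ["Cat", "Dog", "Mouse", "Mouse", "Dog",
                "Cat", "Lion", "Goat", "Cat", "Dog"],
    "Gender": ["Male", "Male", "Male", "Male", "Female",
               "Female", "Female", "Female", "Female", "Male"],
    "Homly": ["Yes"] * 10,
    "Types": ["A", "B", "C", "C", "A", "B", "D", "E", "A", "B"],
})

# One 0/1 column per category level; dtype=int avoids True/False output.
full = pd.get_dummies(df, dtype=int)

# drop_first=True removes the first level of every encoded column.
reduced = pd.get_dummies(df, drop_first=True, dtype=int)

print(full.shape)
print(reduced.shape)
print(list(reduced.columns))

# Check the goal of the task: every remaining feature is numeric.
all_numeric = all(pd.api.types.is_numeric_dtype(t) for t in reduced.dtypes)
print(all_numeric)
```

With these ten rows the full encoding should have 13 columns and the reduced one 9. Homly has a single level ("Yes"), so drop_first=True removes it entirely, which is reasonable because a constant column carries no information for a model.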



Hints:
For each assignment, the solution should be submitted in the below format.
1. Work on each feature to create a data dictionary, as shown in the image below:

2. Refer to the animal_category.csv data set.


3. Research and perform all possible steps for obtaining the solution.
4. All code (executable programs) should execute without errors.
5. Code modularization should be followed.
6. Each line of code should have a comment explaining the logic and why you are using that
function.
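
Hints 1, 5 and 6 could be followed along the lines of the sketch below. It is one possible layout under stated assumptions, not the required format: the function names (drop_identifier, build_data_dictionary, encode_categoricals, preprocess) are hypothetical, and the data dictionary columns are a guess, since the image the hint refers to is not available here.

```python
import pandas as pd


def drop_identifier(df: pd.DataFrame, column: str = "Index") -> pd.DataFrame:
    """Return a copy without the row-identifier column, if it is present."""
    # errors="ignore" keeps the function usable when the column is absent.
    return df.drop(columns=[column], errors="ignore")


def build_data_dictionary(df: pd.DataFrame) -> pd.DataFrame:
    """Summarise each feature: name, dtype, distinct count, values seen."""
    rows = []
    for name in df.columns:
        rows.append({
            "feature": name,                           # column name
            "dtype": str(df[name].dtype),              # storage type in pandas
            "n_unique": int(df[name].nunique()),       # number of distinct levels
            "values": df[name].dropna().unique().tolist(),  # levels observed
        })
    return pd.DataFrame(rows)


def encode_categoricals(df: pd.DataFrame) -> pd.DataFrame:
    """Replace object, string, and categorical columns with 0/1 dummies."""
    # Numeric columns pass through get_dummies unchanged.
    return pd.get_dummies(df, drop_first=True, dtype=int)


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Run the steps in order: drop the identifier, then encode."""
    return encode_categoricals(drop_identifier(df))


if __name__ == "__main__":
    # Tiny made-up sample so the script runs without animal_category.csv.
    sample = pd.DataFrame({
        "Index": [1, 2, 3],
        "Animals": ["Cat", "Dog", "Mouse"],
        "Gender": ["Male", "Male", "Female"],
    })
    print(build_data_dictionary(drop_identifier(sample)))
    print(preprocess(sample))
```

Keeping each step in its own small function means the real file can be swapped in by passing pd.read_csv(...) output to preprocess, and each step can be checked on its own.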

