
TP4-ML-features Encoding

Uploaded by

mariagemariagee5
Copyright
© All Rights Reserved

Module: Machine Learning 1
Academic year: 2024/2025

TP4: Feature Encoding (follow-up on the students’ performance dataset)

1 – Display the summary statistics of the mean score.
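As a minimal sketch of step 1, the statistics can be obtained with pandas' describe method. The small DataFrame below is a hypothetical stand-in for the students' performance dataset, and the column names and the definition of the mean score (average of the three test scores) are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for the dataset (assumed column names and values).
df = pd.DataFrame({
    "math score": [72, 69, 90, 47, 76],
    "reading score": [72, 90, 95, 57, 78],
    "writing score": [74, 88, 93, 44, 75],
})

# Assumption: the mean score is the row-wise average of the three test scores.
df["mean score"] = df[["math score", "reading score", "writing score"]].mean(axis=1)

# describe() returns count, mean, std, min, the quartiles and max.
summary = df["mean score"].describe()
print(summary)
```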

2 – Draw a plot showing the distribution of the mean score.
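One possible plot for step 2 is a histogram drawn with matplotlib. The values below are invented for illustration, and the number of bins is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical mean scores (invented values).
mean_score = pd.Series(
    [72.7, 82.3, 92.7, 49.3, 76.3, 77.3, 91.7, 40.7, 65.0, 49.3],
    name="mean score",
)

# Histogram of the mean score: each bar counts the students in one score interval.
fig, ax = plt.subplots()
counts, bin_edges, _ = ax.hist(mean_score, bins=5, edgecolor="black")
ax.set_xlabel("mean score")
ax.set_ylabel("number of students")
ax.set_title("Distribution of the mean score")
# In an interactive session, call plt.show() to display the figure.
```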

3 – Now that we have fully explored the variables in the dataset, we can get the dataset ready for modelling by turning its categorical data into numerical data. This process is known as feature encoding. First, identify the nominal variables in this dataset.
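To help with step 3, the non-numeric columns and their unique values can be listed as sketched below. The DataFrame is a hypothetical excerpt with assumed column names. Deciding which of these columns are nominal (no natural order) and which are ordinal remains a judgment about the meaning of the categories.

```python
import pandas as pd

# Hypothetical excerpt of the dataset (assumed column names and values).
df = pd.DataFrame({
    "gender": ["female", "female", "male", "male"],
    "race/ethnicity": ["group B", "group C", "group A", "group C"],
    "parental level of education": ["bachelor's degree", "some college",
                                    "associate's degree", "high school"],
    "lunch": ["standard", "standard", "free/reduced", "standard"],
    "test preparation course": ["none", "completed", "none", "none"],
    "mean score": [72.7, 82.3, 49.3, 76.3],
})

# Every non-numeric column is a candidate categorical variable.
categorical_columns = df.select_dtypes(exclude=["number"]).columns.tolist()

# Inspect the categories of each column to judge whether they have a natural order.
for column in categorical_columns:
    print(column, "->", list(df[column].unique()))
```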

4 – There are two common ways to encode nominal variables: Scikit-learn's OneHotEncoder and pandas' get_dummies.

Let’s encode the gender column:

4-1 – First, instantiate the encoder:

4-2 – Then, apply OneHotEncoder to the gender column:
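The original snippets for steps 4-1 and 4-2 are not preserved in this copy. A minimal sketch, assuming a column named gender and the variable names ohe and gender_encoded (both hypothetical), could look like this. With default settings, fit_transform returns a SciPy sparse matrix.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical excerpt of the gender column.
df = pd.DataFrame({"gender": ["female", "female", "male", "male", "male", "female"]})

# 4-1: instantiate the encoder with its default settings.
ohe = OneHotEncoder()

# 4-2: fit and transform. The double brackets keep the input two-dimensional,
# which scikit-learn transformers expect.
gender_encoded = ohe.fit_transform(df[["gender"]])

print(type(gender_encoded))      # a SciPy sparse matrix by default
print(gender_encoded.toarray())  # dense view of the same data
```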


What do you notice after applying the encoder?

5 – Compare the outputs of the encoder with the first five rows of the gender column.
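A possible way to make the comparison in step 5 is to print the first five encoded rows next to the first five raw values. The data and variable names below are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical excerpt of the gender column.
df = pd.DataFrame({"gender": ["female", "female", "male", "male", "male", "female"]})

# Encode and convert the sparse result to a dense NumPy array for display.
gender_encoded = OneHotEncoder().fit_transform(df[["gender"]]).toarray()

first_encoded = gender_encoded[:5]   # first five encoded rows
first_raw = df["gender"].head()      # first five original values

print(first_encoded)
print(first_raw)
```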

6 – To check the work of the encoder, type this piece of code:
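The original snippet is not preserved in this copy. One plausible check, sketched here with hypothetical data and variable names, wraps the encoded array in a DataFrame whose columns are the categories learned by the encoder.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical excerpt of the gender column.
df = pd.DataFrame({"gender": ["female", "female", "male", "male", "male", "female"]})

ohe = OneHotEncoder()
encoded = ohe.fit_transform(df[["gender"]]).toarray()

# categories_ holds one array of learned categories per input column;
# here there is a single input column, so we take element 0.
gender_df = pd.DataFrame(encoded, columns=ohe.categories_[0], index=df.index)
print(gender_df.head())
```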

As we can see, OneHotEncoder has created two columns to represent the gender feature in our
dataframe, one for female and one for male.

Female students will receive a value of 1 in the female column and 0 in the male column whereas
male students will receive a value of 0 in the female column and 1 in the male column.

But most importantly, OneHotEncoder has successfully transformed what was originally a
categorical variable into a numerical representation (one binary column per category).

7 – The same task can be done using pandas. Type the following code to test the pandas approach:
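The original snippet is not preserved in this copy. A minimal sketch of the pandas approach, on hypothetical data, is shown below. Note that the dtype of the resulting columns depends on the pandas version (boolean from pandas 2.0, unsigned integers before).

```python
import pandas as pd

# Hypothetical excerpt of the gender column.
df = pd.DataFrame({"gender": ["female", "female", "male", "male", "male", "female"]})

# get_dummies creates one indicator column per category, named after the category.
gender_dummies = pd.get_dummies(df["gender"])
print(gender_dummies.head())
```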

8 – What is the output? How did pandas convert the gender column?

9 – Now, let’s encode ordinal variables:

OrdinalEncoder differs from OneHotEncoder in that it assigns incremental integer values (0, 1, 2, and so on) to the unique values of an ordinal variable, rather than producing columns of 0s and 1s.

This preserves the order of the categories, so a machine learning model can use that ordering to make more accurate predictions.


9-1 – List the unique values in the parental level of education column.

9-2 – Specify the order of the categories to be encoded: create a variable (for example education_categories) containing the different categories in order.
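Steps 9-1 and 9-2 can be sketched as follows. The data is a hypothetical excerpt, and the ordering from lowest to highest education level is an assumption that should be checked against the real unique values.

```python
import pandas as pd

column = "parental level of education"

# Hypothetical excerpt of the column.
df = pd.DataFrame({column: [
    "bachelor's degree", "some college", "master's degree",
    "associate's degree", "high school", "some high school",
]})

# 9-1: unique values, in order of first appearance.
unique_levels = df[column].unique()
print(unique_levels)

# 9-2: assumed ordering, from the lowest to the highest level.
education_categories = [
    "some high school",
    "high school",
    "some college",
    "associate's degree",
    "bachelor's degree",
    "master's degree",
]
```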

9-3 – Instantiate the ordinal encoder:

9-4 – Apply the ordinal encoder to the parental level of education column:
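The original snippets for steps 9-3 and 9-4 are not preserved in this copy. A minimal sketch, assuming the ordered list from step 9-2 and hypothetical variable names, passes that list to the categories parameter so that the integer codes follow the intended order.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

column = "parental level of education"

# Hypothetical excerpt of the column.
df = pd.DataFrame({column: [
    "bachelor's degree", "some college", "master's degree",
    "associate's degree", "high school", "some high school",
]})

# Assumed ordering, from the lowest to the highest level.
education_categories = [
    "some high school", "high school", "some college",
    "associate's degree", "bachelor's degree", "master's degree",
]

# 9-3: categories takes one list per encoded column, hence the outer brackets.
oe = OrdinalEncoder(categories=[education_categories])

# 9-4: each value is replaced by its position in education_categories.
education_encoded = oe.fit_transform(df[[column]])
print(education_encoded)
```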

9-5 – Type this code:
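The original snippet is not preserved in this copy. One plausible inspection, sketched with hypothetical data and names, places each category next to its code to verify that the codes follow the specified order.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

column = "parental level of education"

# Hypothetical excerpt of the column, with one repeated value.
df = pd.DataFrame({column: [
    "bachelor's degree", "some college", "master's degree", "some college",
    "high school", "some high school", "associate's degree",
]})

# Assumed ordering, from the lowest to the highest level.
education_categories = [
    "some high school", "high school", "some college",
    "associate's degree", "bachelor's degree", "master's degree",
]

oe = OrdinalEncoder(categories=[education_categories])
df["education encoded"] = oe.fit_transform(df[[column]]).ravel()

# One row per distinct category, sorted by its integer code.
mapping_table = df.drop_duplicates().sort_values("education encoded")
print(mapping_table)
```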

What do you notice?

10 – Ordinal variables can be encoded using pandas too, thanks to the map method, though this is not very practical when the variable has a large number of unique values.
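A minimal sketch of the map approach is shown below, on hypothetical data and with an assumed ordering. The dictionary has to be written by hand, which is why this becomes tedious when there are many unique values.

```python
import pandas as pd

column = "parental level of education"

# Hypothetical excerpt of the column.
df = pd.DataFrame({column: [
    "bachelor's degree", "some college", "master's degree", "high school",
]})

# Hand-written mapping from category to rank (assumed ordering).
education_mapping = {
    "some high school": 0,
    "high school": 1,
    "some college": 2,
    "associate's degree": 3,
    "bachelor's degree": 4,
    "master's degree": 5,
}

# map replaces each value by its rank; values missing from the dictionary become NaN.
education_encoded = df[column].map(education_mapping)
print(education_encoded)
```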

11 – During the preprocessing phase, the one-hot encoder and the ordinal encoder can be combined into a single-step column transformer, as follows:

11-1 – First, let’s separate the predictor variables and the target variable:
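The original snippet is not preserved in this copy. A minimal sketch, assuming that the target is the mean score column and that all remaining columns are predictors (both assumptions), could be:

```python
import pandas as pd

# Hypothetical excerpt of the dataset (assumed column names and values).
df = pd.DataFrame({
    "gender": ["female", "female", "male", "male"],
    "lunch": ["standard", "standard", "free/reduced", "standard"],
    "parental level of education": ["bachelor's degree", "some college",
                                    "master's degree", "high school"],
    "mean score": [72.7, 82.3, 49.3, 76.3],
})

# Assumption: the target is the mean score; every other column is a predictor.
X = df.drop(columns=["mean score"])
y = df["mean score"]

print(X.shape, y.shape)
```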

11-2 – Build the column transformer:

11-3 – Apply it to the predictor variables:
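The original snippets for steps 11-2 and 11-3 are not preserved in this copy. The sketch below uses scikit-learn's ColumnTransformer on hypothetical data. The split between nominal and ordinal columns, the category ordering, the step names and the choice of sparse_threshold=0 (to force a dense output) are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical predictor variables (assumed column names and values).
X = pd.DataFrame({
    "gender": ["female", "female", "male", "male", "male", "female"],
    "race/ethnicity": ["group B", "group C", "group A", "group C", "group B", "group A"],
    "parental level of education": [
        "bachelor's degree", "some college", "master's degree",
        "associate's degree", "high school", "some high school",
    ],
    "lunch": ["standard", "standard", "free/reduced", "standard", "free/reduced", "standard"],
    "test preparation course": ["none", "completed", "none", "none", "completed", "none"],
})

# Assumed split between nominal and ordinal variables.
nominal_columns = ["gender", "race/ethnicity", "lunch", "test preparation course"]
ordinal_columns = ["parental level of education"]

# Assumed ordering, from the lowest to the highest level.
education_categories = [
    "some high school", "high school", "some college",
    "associate's degree", "bachelor's degree", "master's degree",
]

# 11-2: one-hot encode the nominal columns and ordinal-encode the ordinal one.
column_transformer = ColumnTransformer(
    transformers=[
        ("onehot", OneHotEncoder(), nominal_columns),
        ("ordinal", OrdinalEncoder(categories=[education_categories]), ordinal_columns),
    ],
    sparse_threshold=0,  # always return a dense array
)

# 11-3: output columns follow the order of the transformers list.
X_encoded = column_transformer.fit_transform(X)
print(X_encoded.shape)
```

In this sketch the one-hot columns come first and the ordinal code is the last column of the output.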
