TP4-ML-features Encoding
3 - Now that we have fully explored the variables in the dataset, we can get the dataset
ready for modelling. Let’s turn the categorical data in our dataset into numerical data.
This process is known as feature encoding. First, which variables in this dataset are
nominal?
4 - In fact, there are two different ways to encode categorical variables, one using the Scikit-
5 - Compare the output of the encoder with the first five rows of the gender column.
As we can see, OneHotEncoder has created two columns to represent the gender feature in our
dataframe, one for female and one for male.
Female students will receive a value of 1 in the female column and 0 in the male column whereas
male students will receive a value of 0 in the female column and 1 in the male column.
But most importantly, OneHotEncoder has successfully transformed what was originally a
categorical variable into a numerical variable.
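The comparison described above can be sketched as follows. This is only an illustration: the small dataframe is made up for the example and stands in for the real dataset, and the `gender_` prefix on the output columns is an assumption chosen for readability.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Made-up sample standing in for the real dataset (assumption).
df = pd.DataFrame({"gender": ["female", "male", "male", "female", "male"]})

# OneHotEncoder expects a 2D input, hence the double brackets.
encoder = OneHotEncoder()
encoded = encoder.fit_transform(df[["gender"]]).toarray()

# One output column per category found in the gender column.
gender_columns = [f"gender_{category}" for category in encoder.categories_[0]]
encoded_df = pd.DataFrame(encoded, columns=gender_columns, index=df.index)

# Put the original column next to its encoded version to compare them.
comparison = pd.concat([df["gender"], encoded_df], axis=1)
print(comparison.head())
```

Each row should contain exactly one 1 across the two encoded columns, in the column matching that student's gender.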
7 – The same task can be done using pandas. Type the following code to test the pandas
approach:
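A minimal sketch of a pandas equivalent, assuming the approach meant here is `pd.get_dummies`. The dataframe is a made-up sample, and `prefix` and `dtype=int` are choices made for this illustration only.

```python
import pandas as pd

# Made-up sample standing in for the real dataset (assumption).
df = pd.DataFrame({"gender": ["female", "male", "male", "female", "male"]})

# get_dummies creates one indicator column per unique value.
# dtype=int requests 0/1 integers rather than the library's default dtype.
dummies = pd.get_dummies(df["gender"], prefix="gender", dtype=int)

print(dummies.head())
```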
8 – What is the output? How did pandas convert the gender column?
OrdinalEncoder differs from OneHotEncoder in that it assigns incremental values to unique
This helps machine learning models to recognise an ordinal variable and subsequently use the
9-2- Specify the order of the categories to be encoded, and create a variable educational
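A possible sketch of ordinal encoding with an explicit category order. The column name `education_level`, its values, and their ranking are hypothetical placeholders; substitute the actual ordinal variable and categories from the dataset.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical ordinal column and values, for illustration only.
df = pd.DataFrame({
    "education_level": [
        "high school", "master's degree", "some college", "bachelor's degree",
    ]
})

# Categories listed from lowest to highest; the encoder assigns 0, 1, 2, ...
# in exactly this order instead of sorting the values alphabetically.
education_order = [
    "high school", "some college", "bachelor's degree", "master's degree",
]

encoder = OrdinalEncoder(categories=[education_order])
df["education_encoded"] = encoder.fit_transform(df[["education_level"]]).ravel()

print(df)
```

Without the `categories` argument, the encoder would derive the order from the sorted unique values, which rarely matches the intended ranking.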
10 – Ordinal variables can be encoded using pandas too, thanks to the map method, though
this is not very practical for ordinal variables with a high number of unique values.
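A short sketch of the `map` approach, reusing a hypothetical `education_level` column with made-up values. The mapping dictionary has to be written or generated by hand, which is why this becomes tedious when there are many unique values.

```python
import pandas as pd

# Hypothetical ordinal column and values, for illustration only.
df = pd.DataFrame({
    "education_level": [
        "high school", "master's degree", "some college", "bachelor's degree",
    ]
})

# Categories from lowest to highest (assumed ranking).
education_order = [
    "high school", "some college", "bachelor's degree", "master's degree",
]

# Build {category: rank}; every category must appear here,
# otherwise map() leaves NaN for the unmapped values.
education_map = {level: rank for rank, level in enumerate(education_order)}

df["education_encoded"] = df["education_level"].map(education_map)

print(df)
```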
11 – During the preprocessing phase, the one-hot encoder and the ordinal encoder can both be combined like
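One way such a combination might look, assuming scikit-learn's `ColumnTransformer` is used to route each column to its encoder. The dataframe, the `education_level` column, and its category order are hypothetical; `sparse_threshold=0` is set here only to get a dense array that is easy to inspect.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Made-up sample with one nominal and one ordinal column (assumption).
df = pd.DataFrame({
    "gender": ["female", "male", "male", "female"],
    "education_level": [
        "high school", "master's degree", "some college", "bachelor's degree",
    ],
})

# Assumed ranking of the ordinal categories, lowest to highest.
education_order = [
    "high school", "some college", "bachelor's degree", "master's degree",
]

# Each entry is (name, transformer, columns it applies to).
preprocessor = ColumnTransformer(
    transformers=[
        ("nominal", OneHotEncoder(), ["gender"]),
        ("ordinal", OrdinalEncoder(categories=[education_order]), ["education_level"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

# Output columns: gender (female), gender (male), encoded education level.
features = preprocessor.fit_transform(df)
print(features)
```

Keeping both encoders in a single transformer means the same fitted object can later be applied to new data with `transform`, so training and test sets are encoded consistently.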