AI-ML Using Py
Preprocessing
Part 2
Labeling the Data
• We already know that machine learning algorithms need data in a
specific format. Another important requirement is that the data must
be labelled properly before it is passed as input to a machine
learning algorithm.
• For example, in classification each data sample carries a label.
Those labels may be words, numbers, etc.
• Many machine learning functions in sklearn expect the labels to be
numbers. Hence, if the labels are in another form, they must be
converted to numbers.
• This process of transforming the word labels into numerical form is
called label encoding.
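As a rough illustration of what label encoding does, the sketch below maps each distinct word label to an integer using plain Python. The helper name `encode_labels` is hypothetical, and assigning the codes in alphabetical order is an assumption made for this example.

```python
def encode_labels(labels):
    """Map each distinct label to an integer code (hypothetical helper)."""
    # Assumption: codes are assigned in alphabetical order of the labels.
    mapping = {label: code for code, label in enumerate(sorted(set(labels)))}
    # Replace every label by its integer code, keeping the original order.
    return [mapping[label] for label in labels], mapping

codes, mapping = encode_labels(['red', 'black', 'red', 'green'])
print(mapping)  # {'black': 0, 'green': 1, 'red': 2}
print(codes)    # [2, 0, 2, 1]
```

The same word always receives the same number, which is the property a label encoder has to guarantee.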
Labeling the Data
• Label encoding steps
• Follow these steps for encoding the data labels in Python:
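• Step 1: Importing the useful packages
• We need to import the packages required to encode the labels. It can
be done as follows:
import numpy as np
from sklearn import preprocessing
• Step 2: Defining sample labels
• After importing the packages, we define some sample labels to
encode:
input_labels = ['red','black','red','green','black','yellow','white']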
Labeling the Data
• Step 3: Creating and training the label encoder object
• In this step, we create the label encoder and train (fit) it on the
sample labels. The following Python code does this:
# Creating the label encoder
encoder = preprocessing.LabelEncoder()
encoder.fit(input_labels)
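After fitting, it can help to look at the mapping the encoder has learned. The sketch below is self-contained and assumes scikit-learn is installed. It uses the encoder's `classes_` attribute, which holds the distinct labels in sorted order. The name `label_mapping` is introduced here only for illustration.

```python
from sklearn import preprocessing

input_labels = ['red', 'black', 'red', 'green', 'black', 'yellow', 'white']

# Create the label encoder and fit it on the sample labels.
encoder = preprocessing.LabelEncoder()
encoder.fit(input_labels)

# classes_ lists the distinct labels in sorted order; the position of a
# label in this array is the integer code assigned to it.
label_mapping = {label: code for code, label in enumerate(encoder.classes_)}
for label, code in label_mapping.items():
    print(label, "-->", code)
```

Because the classes are sorted, 'black' is assigned 0 and 'yellow' is assigned 4, regardless of the order in which the labels appear in `input_labels`.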
Labeling the Data
• Step 4: Checking the encoder on a randomly ordered list
• To check that the encoder works, we encode a list of labels given in
random order. The following Python code does this:
# encoding a set of labels
test_labels = ['green','red','black']
encoded_values = encoder.transform(test_labels)
print("\nLabels =", test_labels)
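A quick way to sanity-check an encoder is a round trip: encode some labels, decode them again, and confirm the original words come back. The sketch below assumes scikit-learn is installed and repeats the fitting step so that it runs on its own. The label 'purple' is a made-up example of a value the encoder has never seen.

```python
from sklearn import preprocessing

# Fit the encoder on the same sample labels used in the earlier steps.
encoder = preprocessing.LabelEncoder()
encoder.fit(['red', 'black', 'red', 'green', 'black', 'yellow', 'white'])

# Encode a randomly ordered list, then decode it back to words.
test_labels = ['green', 'red', 'black']
encoded = encoder.transform(test_labels).tolist()
decoded = encoder.inverse_transform(encoded).tolist()
print("Encoded =", encoded)
print("Decoded =", decoded)

# A label that was not present during fitting cannot be encoded.
try:
    encoder.transform(['purple'])
    unseen_rejected = False
except ValueError as err:
    unseen_rejected = True
    print("Unseen label:", err)
```

If the decoded list matches `test_labels`, the encoder is behaving consistently. The `ValueError` branch shows why the encoder should be fitted on every label that may occur later.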