AI-ML Using Py

The document outlines the importance of labeling data for machine learning, specifically focusing on label encoding in Python using the sklearn library. It details the steps for encoding labels, including importing packages, defining sample labels, creating and training a label encoder, and checking performance through encoding and decoding processes. Additionally, it contrasts labeled and unlabeled data, emphasizing the need for human expertise in labeling and the concept of semi-supervised learning to enhance model performance.

Uploaded by

Chloe Tee

AI/ML Using Python

Preprocessing
Part 2
Labeling the Data
• We already know that machine learning algorithms need data in a
certain format. Another important requirement is that the data must
be labeled properly before it is passed as input to those
algorithms.
• For example, in classification each sample carries a label. Those
labels may be words, numbers, etc.
• Many machine learning functions in sklearn work with numeric labels.
Hence, if the labels are in another form, they are converted to
numbers.
• This process of transforming word labels into numerical form is
called label encoding.
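
To make the idea concrete, here is a minimal sketch of label encoding done by hand in plain Python. The helper name build_label_mapping and the sample list are hypothetical, chosen only for illustration. The sketch assumes that each distinct label is numbered in sorted order.

```python
# Hypothetical helper (not part of sklearn): label encoding by hand.
def build_label_mapping(labels):
    """Map each distinct label to an integer, numbering them in sorted order."""
    return {label: index for index, label in enumerate(sorted(set(labels)))}

word_labels = ['red', 'black', 'red', 'green']
mapping = build_label_mapping(word_labels)

# Replace every word label with its number.
numeric_labels = [mapping[label] for label in word_labels]

print(mapping)         # {'black': 0, 'green': 1, 'red': 2}
print(numeric_labels)  # [2, 0, 2, 1]
```

The numbers are only identifiers for the labels. They do not imply that one label is larger or better than another.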
Labeling the Data
• Label encoding steps
• Follow these steps to encode the data labels in Python:
• Step 1: Importing the required packages
• We need to import the packages used to convert the labels into
numerical form. It can be done as follows:

import numpy as np

from sklearn import preprocessing

Labeling the Data
• Step 2: Defining sample labels
• After importing the packages, we need to define some sample labels
so that we can create and train the label encoder. We will now define
the following sample labels:

# Sample input labels

input_labels = ['red','black','red','green','black','yellow','white']
Labeling the Data
• Step 3: Creating and training the label encoder object
• In this step, we create the label encoder and train it on the sample
labels. The following Python code does this:
# Creating the label encoder
encoder = preprocessing.LabelEncoder()

# Training (fitting) the encoder on the sample labels
encoder.fit(input_labels)
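
After fitting, it can help to see which number each label received. The sketch below repeats the setup so that it runs on its own, then reads the fitted encoder's classes_ attribute, which holds the distinct labels in sorted order.

```python
from sklearn import preprocessing

input_labels = ['red', 'black', 'red', 'green', 'black', 'yellow', 'white']
encoder = preprocessing.LabelEncoder()
encoder.fit(input_labels)

# classes_ holds the distinct labels in sorted order; each label's
# position in this array is its encoded number.
for number, label in enumerate(encoder.classes_):
    print(label, "-->", number)
```

For the sample labels this lists black as 0, green as 1, red as 2, white as 3 and yellow as 4.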
Labeling the Data
• Step 4: Checking the encoder by encoding a list of labels in a different order
• This step verifies the encoder by encoding a list of labels whose
order differs from the training list. The following Python code does
this:
# encoding a set of labels
test_labels = ['green','red','black']
encoded_values = encoder.transform(test_labels)
print("\nLabels =", test_labels)

• The labels are printed as follows:

Labels = ['green', 'red', 'black']

Labeling the Data
Step 4 – Cont.
• Now we can print the list of encoded values, i.e. the word labels
converted to numbers, as follows:

print("Encoded values =", list(encoded_values))

• The encoded values are printed as follows:

Encoded values = [1, 2, 0]

Labeling the Data
• Step 5: Checking the encoder by decoding a set of numbers
• This step verifies the encoder by decoding a set of numbers back into
labels. The following Python code does this:

# decoding a set of values

encoded_values = [3,0,4,1]
decoded_list = encoder.inverse_transform(encoded_values)
print("\nEncoded values =", encoded_values)

• The encoded values are printed as follows:

Encoded values = [3, 0, 4, 1]

Labeling the Data
Step 5 – Cont.
• We can then print the decoded labels using the following code:

print("\nDecoded labels =", list(decoded_list))

• The decoded labels are printed as follows:

Decoded labels = ['white', 'black', 'yellow', 'green']
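
The two checks above can be combined into a single round-trip test, sketched below. It also shows what happens with a label the encoder was not trained on. The label 'purple' is a hypothetical example that does not appear in the slides.

```python
from sklearn import preprocessing

input_labels = ['red', 'black', 'red', 'green', 'black', 'yellow', 'white']
encoder = preprocessing.LabelEncoder()
encoder.fit(input_labels)

# Round trip: encoding and then decoding should return the original labels.
test_labels = ['green', 'red', 'black']
round_trip = list(encoder.inverse_transform(encoder.transform(test_labels)))
print("Round trip OK:", round_trip == test_labels)

# A label the encoder never saw during fit cannot be encoded.
# 'purple' is a hypothetical unseen label chosen for this sketch.
try:
    encoder.transform(['purple'])
    unseen_rejected = False
except ValueError:
    unseen_rejected = True
print("Unseen label rejected:", unseen_rejected)
```

In practice this means the encoder should be fitted on every label value that can occur later.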

Labeled vs. Unlabeled Data
• Unlabeled data mainly consists of samples of natural or human-created
objects that can easily be obtained from the world. Examples include audio,
video, photos, news articles, etc.
• Labeled data, on the other hand, is a set of unlabeled data in which
each piece has been augmented with a meaningful tag, label, or class.
For example, a photo can be labeled according to its content, i.e., whether
it is a photo of a boy, a girl, an animal, or anything else. Labeling
data requires human expertise or judgment about each piece of unlabeled data.
• In many scenarios unlabeled data is plentiful and easily obtained, whereas
labeled data often requires a human expert to annotate it.
Semi-supervised learning attempts to combine labeled and unlabeled data
to build better models.
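
As a rough illustration of semi-supervised learning, the sketch below uses sklearn's LabelPropagation, which is one possible technique and is not named in the slides. The toy data is made up for this example: eight one-dimensional points in two well-separated groups, with only one labeled point per group. Unlabeled points are marked with -1, the convention that sklearn's semi-supervised estimators expect.

```python
import numpy as np
from sklearn.semi_supervised import LabelPropagation

# Toy data (hypothetical): two well-separated groups of points.
X = np.array([[0.0], [0.1], [0.2], [0.3],
              [5.0], [5.1], [5.2], [5.3]])

# Only one point per group is labeled; -1 marks an unlabeled point.
y = np.array([0, -1, -1, -1,
              1, -1, -1, -1])

model = LabelPropagation()  # default RBF kernel
model.fit(X, y)

# transduction_ holds the label assigned to every training point,
# including those that started out unlabeled.
predicted = model.transduction_
print(predicted)
```

Here the two known labels are spread to nearby unlabeled points, so the model makes use of data that no human annotated.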
