Day11 Machine Learning

The document provides a comprehensive guide on data preprocessing for machine learning using Python, detailing the importance and steps involved in transforming raw data into a clean dataset. It outlines seven key steps in the preprocessing process, including importing libraries, handling missing data, and encoding categorical data. The document emphasizes the necessity of proper data formatting to achieve better results in machine learning models.

Uploaded by

Rahul Singh

NATIONAL INSTITUTE OF ELECTRONICS AND INFORMATION TECHNOLOGY

Sumit Complex, A-1/9, Vibhuti Khand, Gomti Nagar, Lucknow,

Machine Learning using Python

1 Day 11
Course: Machine Learning using Python
Module: Data Preprocessing for Machine Learning
2 Index
 Data Preprocessing
 Why Data Preprocessing
 Data Preprocessing Process
 Data Preprocessing Steps
 Step 1: Import Libraries
 Step 2: Loading the dataset
 Step 3: Identify Independent and Dependent Variables
 Independent Variables
 Dependent Variable
 Step 4: Taking care of Missing Data in Dataset
 Replacing missing values
 Step 5: Encoding Categorical Data
 Label Encoding
 One Hot Encoding
 Step 6: Feature Scaling
 Step 7: Splitting the Dataset into Training set and Test Set
3 Data Preprocessing
 In any machine learning process, data preprocessing is the step in which data is transformed or
encoded so that the machine can process it easily. After this step, the features of the data can be
readily interpreted by machine learning algorithms.
 Preprocessing refers to the transformations applied to the data before it is fed to the algorithm.
 It is a technique for converting raw data into a clean dataset. Data gathered from different
sources arrives in a raw format that is not suitable for analysis.
4 Why Data Preprocessing
 To achieve better results from a model in a machine learning project, the data must be in a proper
format. Some machine learning models need their input in a specific format.

For example, many Random Forest implementations do not accept null values, so null values must be
handled in the raw dataset before the algorithm is run.
 The dataset should also be formatted so that more than one machine learning or deep learning
algorithm can be run on the same dataset and the best of them chosen.
 In short, the data has to be in a proper format and any missing values must be processed before
applying machine learning algorithms.
5 Data Preprocessing Process
 Formatting the data to make it suitable for ML (structured format).
 Cleaning the data to remove incomplete variables.
 Sampling the data further to reduce running times for algorithms and memory requirements.
 Selecting data objects and attributes for the analysis.
 Creating/changing the attributes.
6 Data Preprocessing Steps
Programmers usually perform data preprocessing in seven steps:
 Step 1: Importing the libraries
 Step 2: Loading the Dataset
 Step 3: Identify independent and dependent features
 Step 4: Handling of Missing Data
 Step 5: Handling of Categorical Data
 Step 6: Feature Scaling
 Step 7: Splitting the dataset into training and testing datasets
7 Step 1: Import Libraries
 The first step is usually to import the libraries that the program needs. A library is
essentially a collection of modules that can be called and used. Libraries define functions that the
programmer can call instead of writing them from scratch.

For example, import the pandas library and assign it the alias pd:

import pandas as pd
8 Step 2: Loading the dataset
 Load the dataset into a pandas DataFrame using the read_csv() function, which reads a
comma-separated values (CSV) file into a DataFrame.

import pandas as pd
dataset = pd.read_csv('Data_for_preprocessing.csv')
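As a minimal sketch of what read_csv() returns, the snippet below reads a small in-memory CSV instead of a file on disk. The rows are made-up sample values, and the column names Country, Age, Salary and Purchased are taken from the later slides; the real Data_for_preprocessing.csv may look different.

```python
import io

import pandas as pd

# Hypothetical sample rows; the real Data_for_preprocessing.csv is not shown in the slides.
csv_text = """Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
"""

# read_csv() accepts a file path or any file-like object.
dataset = pd.read_csv(io.StringIO(csv_text))

print(dataset.shape)          # (number of rows, number of columns)
print(list(dataset.columns))  # column names taken from the header row
print(dataset.head())         # first rows of the DataFrame
```

Checking the shape and the column names right after loading is a quick way to confirm that the file was parsed as expected.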
9 Step 3: Identify Independent and Dependent Variables
 The next step of data preprocessing is to identify the independent and dependent variables in the
dataset.
 Not all features of a dataset are important for a machine learning algorithm.
 Separating the dependent feature from the independent features is very important in machine learning.
10 Independent Variables
 Independent variables (also referred to as features) are the inputs to the process that is being
analyzed.
 Independent features/variables are also known as input features/variables and are usually
represented as X.
11 Continue……..
 For example, in the Data_for_preprocessing dataset, the features Country, Age and Salary are
independent features because they do not depend on the Purchased feature.
 They must be extracted before starting the machine learning process.
 They can be extracted from the dataset as follows:

X = dataset.drop(['Purchased'], axis=1)

This drops the 'Purchased' feature and assigns the remaining features to X. The original dataset is left unchanged.
Here, axis=1 means that 'Purchased' is the name of a column to drop, not a row label.
12 Dependent Variable
 Dependent variables/features are the output of the process.
 Dependent features/variables are also known as output feature/variable and represented as y.
13 Continues...
 The result (whether a user purchased or not) is the dependent variable.
 It must be extracted before starting Machine Learning Process.
 It can be extracted as follows:

y = dataset['Purchased']

Now the 'Purchased' column of the dataset is assigned to y.
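The split into X and y can be sketched on a tiny DataFrame. The rows below are hypothetical and only mirror the column names used in the slides.

```python
import pandas as pd

# Hypothetical rows for illustration only.
dataset = pd.DataFrame({
    "Country": ["France", "Spain", "Germany"],
    "Age": [44, 27, 30],
    "Salary": [72000, 48000, 54000],
    "Purchased": ["No", "Yes", "No"],
})

# Independent features: every column except the target.
# drop(columns=[...]) has the same effect as drop([...], axis=1).
X = dataset.drop(columns=["Purchased"])

# Dependent feature: the target column on its own.
y = dataset["Purchased"]

print(X.columns.tolist())
print(y.tolist())
```

Neither statement modifies dataset itself, so the 'Purchased' column is still available there afterwards.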

14 Step 4: Taking care of Missing Data in Dataset
 In pandas, NumPy and scikit-learn, missing values are represented as NaN.
 pandas skips NaN values in operations such as sum() and count().
 By default, read_csv() treats empty fields and a small set of standard markers (for example NA, NaN
and NULL) as NaN.
 Other placeholders such as a space, . (dot), *, $ or # are not recognized as missing values
automatically.
 Such placeholders are handled by the na_values parameter of read_csv().
15 Continue…
 na_values - lists additional strings that read_csv() should treat as NaN.

For example, in the Data_for_preprocessing.csv file the missing values are represented by '#'.
The '#' can be replaced with NaN as follows:

dataset = pd.read_csv('Data_for_preprocessing.csv', na_values=['#', 'NULL'])

 Here, na_values=['#', 'NULL'] specifies that the values # and NULL are treated as NaN.
 Any symbol can be specified as a missing value in na_values. The right symbol depends on the dataset
being used.
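The effect of na_values can be seen by reading the same text twice, once without and once with the parameter. This is a sketch on made-up rows held in memory, not the actual course file.

```python
import io

import pandas as pd

# Hypothetical rows in which '#' marks a missing value.
csv_text = """Country,Age,Salary
France,44,72000
Spain,#,48000
Germany,30,#
"""

# Without na_values, '#' is ordinary text, so Age and Salary are not numeric.
raw = pd.read_csv(io.StringIO(csv_text))

# With na_values, '#' becomes NaN and both columns are parsed as numbers.
clean = pd.read_csv(io.StringIO(csv_text), na_values=["#"])

print(raw.dtypes)
print(clean.dtypes)
print(clean.isnull().sum())  # one missing value each in Age and Salary
```

An unrecognized placeholder silently turns a whole numeric column into text, so it is worth checking the dtypes after loading.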
16 Checking Missing Values
 isnull(): isnull() function is used to check missing values in a data frame. It returns Boolean values
which are True for NaN values.
 Checking entire data frame

print(dataset.isnull())
 Checking Age column only

print(dataset['Age'].isnull())
 Counting missing values from each column

print(dataset.isnull().sum())
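The three checks above can be tried on a small DataFrame. The values are hypothetical; the last statement, which selects the rows that contain at least one missing value, is an addition that is not part of the slides.

```python
import numpy as np
import pandas as pd

# Hypothetical rows with one missing value in each column.
dataset = pd.DataFrame({
    "Country": ["France", None, "Germany"],
    "Age": [44.0, np.nan, 30.0],
    "Salary": [72000.0, 48000.0, np.nan],
})

# Number of missing values in each column.
missing_per_column = dataset.isnull().sum()
print(missing_per_column)

# Rows that contain at least one missing value.
rows_with_missing = dataset[dataset.isnull().any(axis=1)]
print(rows_with_missing)
```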
17 Replacing missing values
 A Simple Option: Drop Columns or rows with Missing Values

dropna(): dropna() function is used to drop Rows/Columns with NaN values.

 To drop columns with missing values:

X=X.dropna(axis=1)

Now every column that contains a NaN value is dropped from the X DataFrame.
 To drop rows with missing values:

X=X.dropna()

Now, all the rows with NaN values are dropped from the X dataframe.
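A short sketch on made-up data shows what each variant of dropna() removes. Comparing the shapes before and after makes the cost of dropping visible.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; only the Age column has a missing value.
X = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain"],
    "Age": [44.0, np.nan, 30.0, 38.0],
    "Salary": [72000.0, 48000.0, 54000.0, 61000.0],
})

without_columns = X.dropna(axis=1)  # drops the whole Age column
without_rows = X.dropna()           # drops only the row with the missing Age

print(X.shape)                # (4, 3)
print(without_columns.shape)  # (4, 2)
print(without_rows.shape)     # (3, 3)
```

Dropping a column discards every valid value in it, and dropping rows shrinks the dataset, which is why the next slides turn to imputation.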
18 Replacing missing values
 A Better Option: Imputation
 The Imputer class of older scikit-learn releases was removed in version 0.22. Its replacement, the
SimpleImputer class in sklearn.impute, takes a few parameters:
 missing_values: The placeholder for the missing values that have to be imputed. The default is NaN (np.nan).
 strategy : The statistic that replaces the missing values. It can take the values
'mean' (default), 'median', 'most_frequent' and 'constant'.
 SimpleImputer has no axis parameter; it always imputes column by column.
 The 'mean' and 'median' strategies work on numbers only, not strings.
19 Replacing Numerical Values
 For numerical values, the simplest method is to replace the missing values with the column
mean.

import numpy as np
from sklearn.impute import SimpleImputer

# Replace NaN in each column with that column's mean
fill_NaN = SimpleImputer(missing_values=np.nan, strategy='mean')

X[['Age','Salary']] = fill_NaN.fit_transform(X[['Age','Salary']])

print(X)
 Note: The 'Age' and 'Salary' columns contain numerical values, so the missing values in each
column are replaced by that column's mean.
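Mean imputation can be checked by hand on a tiny example. The sketch below uses scikit-learn's SimpleImputer class from sklearn.impute (the replacement for the Imputer class, which was removed in scikit-learn 0.22) on hypothetical values chosen so that the means are easy to verify.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical values: the mean of Age is 30 and the mean of Salary is 40000.
X = pd.DataFrame({
    "Age": [20.0, np.nan, 40.0],
    "Salary": [30000.0, 50000.0, np.nan],
})

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X[["Age", "Salary"]] = imputer.fit_transform(X[["Age", "Salary"]])

print(imputer.statistics_)  # per-column means learned during fit
print(X)                    # NaN in Age is now 30.0, NaN in Salary is now 40000.0
```

The means are computed from the non-missing values only, so the missing entries do not distort them.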
20 Replacing Categorical Values
 For categorical values, count the occurrences of each category and replace the missing
values with the most frequent value.
 Count the frequency of each category

# Imputing missing values of the categorical column 'Country'

# Counting the frequency of each category in the 'Country' column using value_counts()

X['Country'].value_counts()
Output: France     4
        Spain      3
        Germany    1
21 Continue…
 Replace the missing values with the most frequent value

The output suggests that the most frequent value is 'France', so replace the NaN values of the
'Country' column with 'France'.

# Replacing the NaN values with 'France'

X['Country'] = X['Country'].fillna('France')
 Checking missing values again:

X.isnull().sum()
Output: Country    0
        Age        0
        Salary     0
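Instead of reading the most frequent value off the value_counts() output and typing it in by hand, it can be looked up in code. This sketch uses the pandas mode() method on hypothetical data; it is an alternative to the hard-coded value, not the approach shown in the slides.

```python
import numpy as np
import pandas as pd

# Hypothetical column in which 'France' is the most frequent category.
X = pd.DataFrame({"Country": ["France", "Spain", np.nan, "France", "Germany"]})

# mode() ignores NaN and returns the most frequent value(s); take the first one.
most_frequent = X["Country"].mode()[0]

# Assign the result back instead of using inplace=True on a selected column.
X["Country"] = X["Country"].fillna(most_frequent)

print(most_frequent)
print(X["Country"].isnull().sum())
```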
22 Step 5: Encoding Categorical Data
 Most machine learning algorithms require numerical inputs and cannot work with variables in text form.
 Categorical data are variables that contain label values rather than numeric values.
 The number of possible values is often limited to a fixed set.
 Categorical values must therefore be transformed into numeric values before they are passed to a
machine learning algorithm.
23 Encoding Categorical Data
Categorical values can be transformed into numeric values by:
 Label Encoding
 One Hot Encoding
24 Label Encoding
LabelEncoder:
 Encodes target labels with values between 0 and n_classes-1.
 This transformer is intended for encoding target values, i.e. y, and not the input X.
 Example (applied to the 'Country' column of X here only to show the mapping):

from sklearn.preprocessing import LabelEncoder

lb_encode = LabelEncoder()

# Encode labels in column 'Country'.

X['Country'] = lb_encode.fit_transform(X['Country'])

print(X.head())
25 Continue…
 Output:
26 Limitation of Label Encoding
 Label encoding converts the data into machine-readable form by assigning a unique number (starting
from 0) to each class.
 This may introduce a priority issue during training: a label with a higher value may
be treated as having higher priority than a label with a lower value.
 For example, when the 'Country' column is label encoded, France is replaced with 0, Germany
with 1 and Spain with 2.
 A model may then interpret Spain as having higher priority than Germany and France,
although no such ordering exists between these countries.
 This can be overcome with One-Hot Encoding.
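The mapping described above can be inspected through the classes_ attribute of a fitted LabelEncoder. The sketch below uses a hypothetical list of countries; LabelEncoder sorts the classes, which is why France, Germany and Spain receive 0, 1 and 2.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical values for the 'Country' column.
countries = ["France", "Spain", "Germany", "Spain", "France"]

encoder = LabelEncoder()
codes = encoder.fit_transform(countries)

# classes_ holds the sorted labels; the position of a label is its code.
mapping = {label: code for code, label in enumerate(encoder.classes_)}

print(mapping)         # {'France': 0, 'Germany': 1, 'Spain': 2}
print(codes.tolist())  # [0, 2, 1, 2, 0]
```

The codes follow alphabetical order only, so the numeric comparison 2 > 1 > 0 carries no real meaning for the countries.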
27 One Hot Encoding
 The technique of converting categorical values into numerical vectors is known as one hot encoding.
 It splits a column that contains categorical data into several columns, one
for each category present in that column. Each new column contains 1 in the rows that belong to
its category and 0 elsewhere.
 The resulting vector for a row has exactly one element equal to 1 and the rest equal to 0.

For example, in the given dataset the 'Country' column contains categorical data, so it must
be converted into numerical values before starting the machine learning process.
28 Continue…
get_dummies(): A pandas function used to encode categorical values into numerical values.

Syntax: pd.get_dummies(dataframe)

Example: X = pd.get_dummies(X)
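A minimal sketch of one hot encoding with pd.get_dummies() on hypothetical rows is shown below. The columns and dtype arguments are optional additions: columns restricts the encoding to 'Country', and dtype=int produces 0/1 values instead of booleans.

```python
import pandas as pd

# Hypothetical rows for illustration only.
X = pd.DataFrame({
    "Country": ["France", "Spain", "Germany"],
    "Age": [44, 27, 30],
})

# One new column per category; numeric columns such as Age are left as they are.
X_encoded = pd.get_dummies(X, columns=["Country"], dtype=int)

print(X_encoded.columns.tolist())
# ['Age', 'Country_France', 'Country_Germany', 'Country_Spain']
print(X_encoded)
```

Each row has exactly one 1 across the three Country_ columns, so no ordering between the countries is implied.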
29 Step 6: Feature Scaling
 Real-world datasets contain features that vary widely in magnitude, unit and range.
 Differences in the scales across input variables may increase the difficulty of the problem being
modelled. For example, large input values (e.g. a spread of hundreds or thousands of
units) can result in a model that learns large weight values.
 A model with large weight values is often unstable, meaning that it may suffer from poor
performance during learning and sensitivity to input values, resulting in higher generalization error.
 Feature scaling is a data preprocessing step that is applied to the
independent variables or features of the data. Standardization is one common form of it.
 It brings the data within a particular range. Sometimes it also helps to
speed up the calculations in an algorithm.
31 StandardScaler
 StandardScaler performs the task of standardization. A dataset usually contains variables that
differ in scale. For example, an Employee dataset may contain an AGE column with values in the range 20-70
and a SALARY column with values in the range 10000-80000.
 Because these two columns differ in scale, they are standardized to a common scale when
building a machine learning model.
 Scaling is done for numerical values only. Categorical values are not scaled.

Example:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

# Scaling the 'Age' and 'Salary' columns only

X[['Age','Salary']] = scaler.fit_transform(X[['Age','Salary']])
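StandardScaler replaces each value x by (x - mean) / standard deviation, computed per column with the population standard deviation. The sketch below checks this against a manual calculation on hypothetical values.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical values on two very different scales.
X = pd.DataFrame({
    "Age": [20.0, 30.0, 40.0],
    "Salary": [10000.0, 40000.0, 70000.0],
})

scaler = StandardScaler()
scaled = scaler.fit_transform(X[["Age", "Salary"]])

# Manual standardization; ddof=0 selects the population standard deviation.
manual = (X - X.mean()) / X.std(ddof=0)

print(scaled)
print(np.allclose(scaled, manual.to_numpy()))  # True
```

After scaling, both columns have mean 0 and standard deviation 1, so neither dominates because of its units.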
32 MinMaxScaler
 Transform features by scaling each feature to a given range.
 This estimator scales and translates each feature individually such that it is in the given range on
the training set, e.g. between zero and one.

Example:

from sklearn.preprocessing import MinMaxScaler

scalerX = MinMaxScaler(feature_range=(0, 1))

X[['Age','Salary']] = scalerX.fit_transform(X[['Age','Salary']])

X
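With feature_range=(0, 1), MinMaxScaler replaces each value x by (x - min) / (max - min), computed per column. The sketch below checks this against a manual calculation on hypothetical values.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical values on two very different scales.
X = pd.DataFrame({
    "Age": [20.0, 30.0, 40.0],
    "Salary": [10000.0, 40000.0, 70000.0],
})

scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(X[["Age", "Salary"]])

# Manual min-max scaling for comparison.
manual = (X - X.min()) / (X.max() - X.min())

print(scaled)  # each column becomes 0.0, 0.5, 1.0
print(np.allclose(scaled, manual.to_numpy()))  # True
```

The smallest value of each column maps to 0 and the largest to 1, so a single extreme value compresses all the others into a narrow band.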
33 Step 7: Splitting the Dataset into Training set and Test Set
 One important aspect of any machine learning model is determining its accuracy. One way to do this
is to train the model on the given dataset and then predict the response values for the same
dataset. However, a model that has simply memorised its training data scores well on this test
without being able to handle new data.
 A better option is to split the data into two parts: one for training the machine learning model
and one for testing it.
 Train the model on the training set.
 Test the model on the testing set, and evaluate how well the model did.
34 Continues…
 train_test_split: splits the data into two sets: train and test.
 Given X and y, it returns four datasets: X_train, X_test, y_train, y_test.

Parameters:
 test_size: This parameter decides the share of the data that is set aside as the test dataset. It
is given as a fraction. For example, if you pass 0.8, then 80% of the dataset becomes
the test dataset.
 random_state: An integer that acts as the seed for the random number
generator during the split, so that the same split is produced on every run.
35 Continues…
 Example:
 Now X and y are ready. Split the data into two parts, train data and test data, as follows:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

 80% of the data is assigned as training data and the remaining 20% as testing data.
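The 80/20 split can be verified on a small example. The ten rows below are hypothetical, chosen so that the sizes of the resulting sets are easy to check.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Ten hypothetical samples with two features and a binary target.
X = pd.DataFrame({
    "Age": range(20, 30),
    "Salary": range(10000, 20000, 1000),
})
y = pd.Series([0, 1] * 5, name="Purchased")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)

print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
print(y_train.shape, y_test.shape)  # (8,) (2,)

# Features and targets stay aligned: both parts keep the same row labels.
print(X_train.index.equals(y_train.index))  # True
```

Because random_state is fixed, running the snippet again selects the same rows for the test set.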
37

Thank You!
