Machine Learning
Machine Learning is the process of enabling a machine to learn how to make decisions logically from data.
Example: Netflix and Amazon have built machine learning models using tons of data in order to identify profitable opportunities and avoid risk.
The term machine learning was first coined by Arthur Samuel in the year 1959.
The first formal definition of ML was given by Tom M. Mitchell: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E."
Algorithm: A machine learning algorithm is a set of rules and statistical techniques used to learn patterns from data and draw significant information from it. It is the logic behind the ML model.
Model: A model is the main component of ML. A model is trained by using a machine learning algorithm.
The algorithm maps all the decisions that the model is supposed to take on the given input in order to produce the correct output.
Predictor variable: a feature of the data that can be used to predict the output.
Response variable: the output variable that needs to be predicted by using the predictor variables.
Training data: the ML model is built using training data, which helps it identify the key trends and patterns needed to predict the output.
Testing data: after the model is trained, it must be tested to evaluate how accurately it can predict the outcome.
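A minimal sketch of a train/test split, assuming scikit-learn and pandas are available; the DataFrame, feature name and target column here are made-up examples, not from the notes:

```python
# Splitting made-up data into training data and testing data.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "hours_watched": [1, 4, 7, 2, 9, 5, 3, 8],   # predictor variable
    "target":        [0, 1, 1, 0, 1, 1, 0, 1],   # response variable
})

X = df[["hours_watched"]]   # predictor variables (features)
y = df["target"]            # response variable (label)

# 80% of the rows are used for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```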
Supervised learning: we teach or train the machine using data that is well labelled.
Unsupervised learning: we train the machine using unlabelled data, without any guidance.
Reinforcement learning: a part of machine learning where an agent is placed in an environment and learns how to behave in that environment by observing the rewards it gets from its actions.
Unsupervised ML
Unsupervised learning mainly uses clustering techniques.
Before building a model, the data must be prepared: missing values are handled, outliers are treated, and features are scaled.
1. Handling missing values. Methods (a code sketch follows after this list):
1. Mean/Median/Mode imputation: the mean can be used when the variable is numeric and has a normal distribution.
The median can be used to fill missing values when the variable is numeric and skewed.
2. Random sample imputation: a random value from the existing set of values is taken and used to fill the missing value. It is easy to implement, and the variance stays close to that of the original dataset.
3. Capturing NaN values with a new feature: used when data is missing due to some cause; we create a new feature in the dataframe that flags where the value is null for that particular feature. It is easy to implement and makes it easy to identify where values are missing.
4. End-of-distribution imputation: we fill the missing value with an extreme value of the feature.
5. Arbitrary value imputation: we fill the missing value with any chosen value. It is a purely judgement-based decision.
6. Frequent category imputation: when the variable is categorical, the best way to fill missing values is with the most frequent class (the mode).
7. KNN imputation (K-nearest neighbours): we fill the missing value using the values of its nearest neighbours; for example, a value missing at the 6th place can be estimated from the 5th and 7th values.
8. Dropping NaN values: if 60% or more of a particular feature is missing, it is advised to drop that feature.
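A minimal sketch of the imputation methods listed above, assuming pandas and scikit-learn; the DataFrame and the column names age, income and city are made-up examples:

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

df = pd.DataFrame({
    "age":    [25, 30, np.nan, 40, 35, np.nan, 28],
    "income": [30, 41, 38, 61, 52, 45, 33],
    "city":   ["Delhi", "Pune", None, "Delhi", None, "Pune", "Delhi"],
})

# 1. Mean/median imputation for a numeric variable.
df["age_mean"]   = df["age"].fillna(df["age"].mean())
df["age_median"] = df["age"].fillna(df["age"].median())

# 2. Random sample imputation: fill NaNs with random existing values.
sample = df["age"].dropna().sample(df["age"].isna().sum(), random_state=0)
sample.index = df.index[df["age"].isna()]
df["age_random"] = df["age"].fillna(sample)

# 3. Capture NaN with a new indicator feature.
df["age_missing"] = df["age"].isna().astype(int)

# 4. End-of-distribution imputation: fill with an extreme value.
df["age_eod"] = df["age"].fillna(df["age"].mean() + 3 * df["age"].std())

# 5. Arbitrary value imputation (purely judgement based).
df["age_arbitrary"] = df["age"].fillna(-999)

# 6. Frequent-category (mode) imputation for a categorical variable.
df["city_mode"] = df["city"].fillna(df["city"].mode()[0])

# 7. KNN imputation: estimate the missing ages from the nearest rows.
df["age_knn"] = KNNImputer(n_neighbors=2).fit_transform(df[["age", "income"]])[:, 0]

# 8. Drop any feature where 60% or more of the values are missing.
df = df.drop(columns=[c for c in df.columns if df[c].isna().mean() >= 0.6])
print(df)
```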
2. Handling outliers: values that differ significantly from the rest of the data are called outliers. They can be detected using the following (a code sketch follows after the list):
Boxplot
Scatterplot
IQR
Z-score
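Boxplots and scatterplots show outliers visually; the IQR and Z-score rules can also be applied in code. A minimal sketch with made-up numbers, where 95 plays the role of the outlier:

```python
import pandas as pd

values = pd.Series([10, 12, 11, 13, 12, 95, 11, 10, 12, 14])

# IQR method: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1
iqr_outliers = values[(values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)]
print(iqr_outliers)          # the IQR rule flags 95

# Z-score method: flag points far from the mean in standard deviations.
z = (values - values.mean()) / values.std()
print(values[z.abs() > 2])   # with a threshold of 2, 95 is flagged here too
```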
3. Feature scaling: when different features vary over different ranges, it is difficult to use them together in a model.
The process of bringing all features into the same range is called scaling. Common methods (a code sketch follows after this list):
1. Absolute maximum scaling: every value is divided by the maximum absolute value of the feature, which converts all values to the range -1 to +1; it is prone to outliers. Scaled value = x / max(|x|)
2. Min-max scaling: scaled value = (x - x_min) / (x_max - x_min)
3. Normalisation: scaled value = (x - x_mean) / (x_max - x_min)
4. Standardisation: converts each value of a feature to a Z-score; best suited when the data is normally distributed. Z = (x - x_mean) / std
5. Robust scaling: used when the dataset is skewed. Scaled value = (x - x_median) / IQR
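A minimal sketch of these scaling methods, assuming scikit-learn; the salary feature is a made-up example, and normalisation (which scikit-learn does not provide directly) is computed by hand:

```python
import pandas as pd
from sklearn.preprocessing import (
    MaxAbsScaler, MinMaxScaler, RobustScaler, StandardScaler,
)

X = pd.DataFrame({"salary": [20_000, 35_000, 50_000, 80_000, 400_000]})

print(MaxAbsScaler().fit_transform(X))       # 1. absolute max scaling: x / max(|x|)
print(MinMaxScaler().fit_transform(X))       # 2. min-max scaling: (x - x_min) / (x_max - x_min)
print((X - X.mean()) / (X.max() - X.min()))  # 3. normalisation: (x - x_mean) / (x_max - x_min)
print(StandardScaler().fit_transform(X))     # 4. standardisation: (x - x_mean) / std
print(RobustScaler().fit_transform(X))       # 5. robust scaling: (x - x_median) / IQR
```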
A machine learning model may fail to incorporate categorical variables directly, which is why it is important to convert them into simple numeric codes. This process is called encoding. There are two types:
1. Nominal encoding
2. Ordinal encoding
Nominal encoding: used for categories that have no inherent order; it increases the number of features in the dataset.
One-hot encoding: each category becomes its own column, so a feature with 10 categories produces 10 new features.
This can lead to an excessive number of features in the dataset, which decreases model performance: in a higher-dimensional space the model takes more processing time, noise and error increase, and the accuracy ultimately decreases.
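A minimal sketch of one-hot encoding with pandas get_dummies; the state values are taken from the ordinal-encoding example further below:

```python
import pandas as pd

df = pd.DataFrame({"state": ["Maharashtra", "Delhi", "Karnataka", "Gujarat", "TamilNadu"]})

# Each category becomes its own 0/1 column, so a feature with 10
# categories would add 10 new columns.
one_hot = pd.get_dummies(df["state"], prefix="state")
print(one_hot)
```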
Ordinal encoding: here label encoding is used, which encodes each category of a feature as a number in the same column, as in the table below (a code sketch follows after the table).
state        stateonly
Maharashtra  3
Delhi        0
Karnataka    2
Gujarat      1
TamilNadu    4
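One way to reproduce the encoding in the table above, assuming scikit-learn's LabelEncoder, which numbers the categories alphabetically:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"state": ["Maharashtra", "Delhi", "Karnataka", "Gujarat", "TamilNadu"]})
df["stateonly"] = LabelEncoder().fit_transform(df["state"])
print(df)   # Delhi -> 0, Gujarat -> 1, Karnataka -> 2, Maharashtra -> 3, TamilNadu -> 4
```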