Module 3 Data Science Machine Learning
MACHINE LEARNING
“Machine learning is a field of study that
gives computers the ability to learn
without being explicitly programmed.”
—Arthur Samuel
■ Matplotlib is a popular 2D plotting package with
some 3D functionality.
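
As a minimal sketch of what Matplotlib looks like in use (the x and y values below are made up purely for illustration):

import matplotlib.pyplot as plt

# Toy data for illustration only
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

plt.plot(x, y, marker="o")   # simple 2D line plot
plt.xlabel("x")
plt.ylabel("y = x^2")
plt.title("A simple Matplotlib line plot")
plt.show()
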
Split validation
K-fold cross validation
Leave One Out cross validation
Cross validation
Cross validation is a technique used in machine
learning to evaluate the performance of a model on
unseen data. It involves dividing the available data
into multiple folds or subsets, using one of these folds
as a validation set, and training the model on the
remaining folds. This process is repeated multiple
times, each time using a different fold as the
validation set.
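
A rough sketch of this idea using scikit-learn's cross_val_score; the synthetic data and the choice of a logistic regression model are assumptions made only for illustration:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data for illustration
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

model = LogisticRegression(max_iter=1000)

# 5-fold cross validation: each fold takes a turn as the validation set
scores = cross_val_score(model, X, y, cv=5)
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
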
An example of split validation in machine learning
is when a dataset is divided into three sets: training,
validation, and testing (a sketch follows the list below):
Training set: Used to train the model
Validation set: Used to tune the model (e.g., its hyperparameters) during development
Testing set: Used to evaluate the final model on unseen data
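
A minimal sketch of such a three-way split using scikit-learn's train_test_split; the 60/20/20 proportions are an assumption for illustration:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

# First split off the test set (20% of the data)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Then split the remainder into training (60%) and validation (20%)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
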
K-fold cross validation
K-fold cross validation is a powerful technique for
evaluating predictive models in data science. It involves
splitting the dataset into k subsets or folds, where each
fold is used as the validation set in turn while the
remaining k-1 folds are used for training.
In the K-fold method, the dataset is divided into k
subsets, called folds. The model is trained on all but one
fold, and the fold left out is used to evaluate the model
once it is trained. This process is repeated for k
iterations, with a different fold reserved for testing in
each iteration.
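
A rough sketch of the K-fold procedure with scikit-learn's KFold; the choice of k = 5 and the logistic regression model are assumptions for illustration:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=200, random_state=0)
model = LogisticRegression(max_iter=1000)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in kf.split(X):
    # Train on k-1 folds, evaluate on the held-out fold
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))

print("Per-fold accuracy:", scores)
print("Mean accuracy:", np.mean(scores))
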
L.O.O.C.V. (Leave One Out Cross Validation)
In the method of L.O.O.C.V. (Leave One Out Cross Validation),
the model is trained on the entire dataset except for a single
data point, which is held out for evaluation; this is repeated
once for each data point. One prominent benefit of this method
is that all the data points are used for training, so the
resulting estimate has low bias.
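
A minimal LOOCV sketch using scikit-learn's LeaveOneOut; the small synthetic dataset keeps the one-fit-per-point cost cheap and is purely illustrative:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Small dataset: LOOCV fits the model once per data point
X, y = make_classification(n_samples=50, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, y, cv=LeaveOneOut())
print("Number of fits:", len(scores))   # one per data point
print("LOOCV accuracy:", scores.mean())
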
Regularization is a technique
used to reduce errors by fitting
the function appropriately on the
given training set and avoiding
overfitting.
L1 regularization (Lasso regularization)
L2 regularization (Ridge regularization)
The L1 norm calculates the sum of the absolute values of the
vector elements, while the L2 norm calculates the square root
of the sum of the squared elements.
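
As a sketch, both penalties are available in scikit-learn as Lasso (L1) and Ridge (L2); the synthetic regression data and the alpha values below are arbitrary assumptions for illustration:

from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)

# L1 (Lasso): penalizes the sum of absolute coefficient values,
# which tends to drive some coefficients exactly to zero
lasso = Lasso(alpha=1.0).fit(X, y)

# L2 (Ridge): penalizes the sum of squared coefficient values,
# which shrinks coefficients toward zero without eliminating them
ridge = Ridge(alpha=1.0).fit(X, y)

print("Lasso zero coefficients:", (lasso.coef_ == 0).sum())
print("Ridge zero coefficients:", (ridge.coef_ == 0).sum())
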
Also, suppose that the fruits are apple, banana, cherry, and
grape. Suppose one already knows from previous work (or
experience) the shape of each fruit present in the basket, so
it is easy to arrange the same type of fruits in one place.
Here, the previous work is called training data in Data
Mining terminology.