Machine Learning: What Is Data Science

The document provides an overview of machine learning, describing how it uses data to build models to make predictions. It discusses the differences between supervised and unsupervised learning, with supervised learning using input and output data to predict outputs, and unsupervised learning using only input data to discover patterns. The document also gives examples of applications of machine learning like product recommendations, advertising evaluation, and medical research.

Uploaded by

fauzansaadon


23/02/2017

Machine Learning

SKEM4173
Artificial Intelligence

What is Data Science

Managing the process that can transform hypotheses & data into actionable predictions

[Diagram labels: Computer Science, Machine Learning, Statistics]

Typical predictive analytic goals:
• who will win an election
• what products will sell well together
• which loans will default
• which advertisements will be clicked on

Data scientist is responsible for:
• acquiring data
• managing data
• choosing modelling technique
• writing the code
• verifying results

Predictive Models


What is Data Science

Some famous examples
• Amazon’s product recommendation systems
• Google’s advertisement valuation systems
• LinkedIn’s contact recommendation system
• Walmart’s consumer demand projection systems
• Twitter’s trending topics

Data Science Applications

Overview of Machine Learning

• Refers to a vast set of tools for understanding data
• These tools can be classified as supervised or unsupervised
• Broadly speaking,
• supervised machine learning involves building a statistical model for
predicting, or inferring, an output based on one or more inputs.
Problems of this nature occur in fields as diverse as business, medicine,
astrophysics, and public policy
• with unsupervised machine learning, there are inputs but no
supervising output; nevertheless we can learn relationships &
structure from such data

Overview of Machine Learning

Wage Data
Wage data, which contains income survey information for males from the central Atlantic region of the United States

Left: wage as a function of age. On average, wage increases with age until about 60 years of age, at which point it begins to decline

Center: wage as a function of year. There is a slow but steady increase of approximately $10,000 in the average wage between 2003 and 2009

Right: Boxplots displaying wage as a function of education, with 1 indicating the lowest level (no high school diploma) & 5 the highest level (an advanced graduate degree). On average, wage increases with the level of education

[Diagram: Age → Model → Wage]


Overview of Machine Learning

Gene Expression Data
Left: Representation of the NCI60 gene expression data set in a two-dimensional space, $Z_1$ & $Z_2$. Each point corresponds to one of the 64 cell lines. There appear to be 4 groups of cell lines, which we have represented using different colours

Right: Same as left panel except that we have represented each of the 14 different types of cancer using a different coloured symbol. Cell lines corresponding to the same cancer type tend to be nearby in the 2-dimensional space

[Diagram labels: Dataset, Cancer]

What is Machine Learning

• Suppose that we are consultants hired by a client to provide
advice on how to improve sales of a particular product
• The Advertising data set consists of the sales of that product in
200 different markets, along with advertising budgets for the
product in each of those markets for 3 different media: TV,
radio and newspaper


What is Machine Learning?

The Advertising data set. The plot
displays sales, in thousands of units, as
a function of TV, radio & newspaper
budgets, in thousands of dollars, for
200 different markets. In each plot we
show the simple least squares fit of
sales to that variable. In other words,
each blue line represents a simple
model that can be used to predict
sales using TV, radio, & newspaper,
respectively
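
A simple least squares fit of the kind drawn in each panel can be sketched in a few lines of Python. This is only an illustration: the budget and sales figures below are made-up numbers, not the Advertising data set, and `fit_simple_least_squares` is a hypothetical helper name.

```python
def fit_simple_least_squares(x, y):
    """Return (intercept, slope) minimising the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical TV budgets (thousands of dollars) and sales (thousands of units).
tv_budget = [10.0, 50.0, 100.0, 150.0, 200.0]
sales = [6.0, 9.0, 12.5, 15.0, 19.0]

intercept, slope = fit_simple_least_squares(tv_budget, sales)
# Predicted sales for a hypothetical TV budget of 120 (thousand dollars).
predicted_sales = intercept + slope * 120.0
```

The fitted line plays the role of one blue line in the plot: given a budget, it returns a predicted sales figure.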

What is Machine Learning

• It is not possible for our client to directly increase sales of the
product. On the other hand, they can control the advertising
expenditure in each of the 3 media
• Therefore, if we determine that there is an association
between advertising & sales, then we can instruct our client to
adjust advertising budgets, thereby indirectly increasing sales
• In other words, our goal is to develop an accurate model that
can be used to predict sales on the basis of the 3 media
budgets
• In this setting, the advertising budgets are input variables
(predictors) while sales is an output variable (response)
• $X_1$ - TV budget, $X_2$ - radio budget, $X_3$ - newspaper budget


What is Machine Learning

• Generally, suppose we observe a quantitative response $Y$ & $p$ different predictors, $X_1, X_2, \dots, X_p$
• We assume that there is some relationship between $Y$ & $X = (X_1, X_2, \dots, X_p)$, i.e.
$Y = f(X) + \epsilon$
• Here $f$ is some fixed but unknown function of $X_1, \dots, X_p$, & $\epsilon$ is a random error term, which is independent of $X$ & has mean zero
• In this formulation, $f$ represents the systematic information that $X$ provides about $Y$
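
To make the formulation above concrete, here is a minimal simulation sketch. It assumes a made-up linear function `f` and Gaussian noise purely for illustration; in a real problem the function is unknown and the error need not be Gaussian.

```python
import random


def f(x):
    """Hypothetical 'true' function; in practice f is unknown."""
    return 2.0 + 3.0 * x


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 10_000
xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
# The error term is drawn independently of x, with mean zero and std. dev. 1.
eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Each observed response is the systematic part f(x) plus random error.
ys = [f(x) + e for x, e in zip(xs, eps)]

# The average error should be close to zero, but not exactly zero.
mean_error = sum(y - f(x) for x, y in zip(xs, ys)) / n
```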

What is Machine Learning?

The Income data set
Left: The red dots are the observed
values of income (in tens of thousands
of dollars) & years of education for 30
individuals
Right: The blue curve represents the
true underlying relationship between
income & years of education, which is
generally unknown (but is known in
this case because the data were
simulated). The black lines represent
the error associated with each
observation. Note that some errors are
positive (if an observation lies above
the blue curve) & some are negative (if
an observation lies below the curve).
Overall, these errors have
approximately mean zero


What is Machine Learning?

The plot displays income as a function
of years of education & seniority in the
Income data set. The blue surface
represents the true underlying
relationship between income & years
of education & seniority, which is
known since the data are simulated.
The red dots indicate the observed
values of these quantities for 30
individuals

In essence, machine learning refers to a set of approaches for estimating $f$.

Types of Machine Learning Techniques

Most machine learning problems fall into 1 of 2 categories: supervised or unsupervised

Supervised
• For each observation of the predictors $x_i$, $i = 1, \dots, n$, there is an associated response $y_i$
• Wish to fit a model that relates the response to the predictors.
• Aim: to accurately predict the response for future observations (prediction) or to better understand the relationship between the response & the predictors (inference)
• Methods: linear regression & logistic regression

Unsupervised
• For every observation $i = 1, \dots, n$, we observe a vector $x_i$ but no associated response $y_i$
• No response variable to predict
• Referred to as unsupervised because we lack a response variable that can supervise our analysis
• Aim: to understand the relationships between the variables or between the observations
• Method: cluster analysis, or clustering. Goal: to ascertain whether the observations fall into relatively distinct groups
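
As a rough illustration of cluster analysis, the sketch below groups one-dimensional observations using k-means. Note that k-means is one common clustering method chosen here for illustration; the slides do not name a specific algorithm. The observations, starting centers and function names are all hypothetical.

```python
def nearest_center(value, centers):
    """Return the index of the center closest to value."""
    return min(range(len(centers)), key=lambda k: abs(value - centers[k]))


def k_means_1d(values, centers, iterations=10):
    """Basic k-means on 1-D values, starting from the given centers."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            groups[nearest_center(v, centers)].append(v)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


# Hypothetical observations with no response variable attached.
observations = [1.0, 1.2, 0.8, 9.9, 10.1, 10.0]
centers = k_means_1d(observations, centers=[0.0, 5.0])
labels = [nearest_center(v, centers) for v in observations]
```

No response is supplied anywhere; the two groups are discovered from the inputs alone.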


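Types of Machine Learning Techniques

[Diagram: machine learning splits into supervised & unsupervised techniques. Reasons for supervised learning: prediction & inference. Reason for unsupervised learning: inference.]

Supervised learning: x → Model → y
• Regression: y is numeric
• Classification: y is class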

Exercises
• Explain whether each scenario is a classification or regression problem, & indicate
whether we are most interested in inference or prediction.
• We collect a set of data on the top 500 firms in the US. For each firm we record
profit, number of employees, industry & the CEO salary. We are interested in
understanding which factors affect CEO salary.
• We are considering launching a new product & wish to know whether it will be
a success or a failure. We collect data on 20 similar products that were
previously launched. For each product we have recorded whether it was a
success or failure, price charged for the product, marketing budget,
competition price, & ten other variables.
• We are interested in predicting the % change in the US dollar in relation to the
weekly changes in the world stock markets. Hence we collect weekly data for
all of 2012. For each week we record the % change in the dollar, the % change
in the US market, the % change in the British market, & the % change in the
German market.


Exercises
• You will now think of some real-life applications for machine learning.
• Describe three real-life applications in which classification might be useful.
Describe the response, as well as the predictors. Is the goal of each application
inference or prediction? Explain your answer.
• Describe three real-life applications in which regression might be useful.
Describe the response, as well as the predictors. Is the goal of each application
inference or prediction? Explain your answer.
• Describe three real-life applications in which cluster analysis might be useful.

Flow of Creating & Evaluating Models

Model evaluation
• Quantifying the performance of a model
• Must use a measure of model performance that’s appropriate to both the original business goal & the chosen modelling technique

Examples:
• Predicting who would default on loans (classification): accuracy, precision
• Predicting revenue lost to defaulting loans (regression): RMSE
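
For the regression case, RMSE (root mean squared error) can be computed as in the sketch below. The loan-loss figures are hypothetical and only serve to show the arithmetic.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical revenue lost (in dollars) on four loans, and a model's predictions.
actual_loss = [1000.0, 0.0, 2500.0, 400.0]
predicted_loss = [900.0, 100.0, 2300.0, 400.0]

# Errors are 100, -100, 200 and 0, so the mean squared error is 15000.
error = rmse(actual_loss, predicted_loss)
```

RMSE is expressed in the same units as the response (dollars here), which makes it easy to relate back to the business goal.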


Flow of Creating & Evaluating Models

Model validation
• Generation of an assurance that the
model will work in production as it
worked during training
• Biggest cause of model validation
failures – not having enough training
data to represent the variety of what
may later be encountered in
production

Test & Training Splits

• When you’re building a model to make predictions, you need
data to build the model (training set)
• You also need data to test whether the model makes correct
predictions on new data (test or hold-out set)

Dataset → Training set + Test set

• Training set: data that you feed to the model-building algorithm (regression, decision tree, etc.) so that the algorithm can set the correct parameters to best predict the outcome variable
• Test set: data that you feed into the resulting model, to verify that the model’s predictions are accurate
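
A random split of a data set into training and test portions might look like the sketch below. The 80/20 ratio, the seed and the helper name `train_test_split` are assumptions made for illustration; the slides do not prescribe a particular ratio.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (training_set, test_set)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


dataset = list(range(100))  # stand-in for 100 observations
training_set, test_set = train_test_split(dataset, test_fraction=0.2)
```

Shuffling before splitting avoids a test set that only contains, say, the most recent records, and the two sets never share an observation.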


Evaluating Classification Models

• When building a model, the 1st thing to check is if the model
even works on the data it was trained from
• Example of classifying email into spam (email we in no way
want) & non-spam (email we want)
• Summary of classifier performance – confusion matrix (table
that summarizes the classifier’s predictions against the actual
known data categories)
Confusion matrix
                          Predicted condition
                          Negative    Positive
True condition  Negative  TN          FP
                Positive  FN          TP

Evaluating Classification Models

Measures of classifier performance: accuracy, precision, recall

• For a classifier, accuracy is defined as the number of items categorized correctly divided by
total number of items – what fraction of the time the classifier is correct
• Accuracy = (TP + TN) / (TP + FP + TN + FN) = (cM[1,1] + cM[2,2]) / sum(cM) = 92%
• The error of around 8% is unacceptably high for a spam filter!
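
The three measures can be computed directly from the four cells of a confusion matrix, as sketched below. Precision and recall use their standard definitions, which the slides list but do not spell out, and the counts are hypothetical values chosen so that accuracy comes out at about 92%.

```python
def classifier_metrics(tn, fp, fn, tp):
    """Compute accuracy, precision and recall from confusion-matrix counts."""
    total = tn + fp + fn + tp
    return {
        # Fraction of all items that were categorized correctly.
        "accuracy": (tp + tn) / total,
        # Of the items predicted positive (spam), the fraction that really are.
        "precision": tp / (tp + fp),
        # Of the truly positive items (spam), the fraction that were caught.
        "recall": tp / (tp + fn),
    }


# Hypothetical spam-filter counts (positive = spam).
metrics = classifier_metrics(tn=264, fp=14, fn=22, tp=158)
```

With these counts, 422 of 458 emails are classified correctly, which is where an accuracy of roughly 92% comes from.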


Validating Models
• Model evaluation: performance of the model on training data
• Biggest worry: validity of the model – will it show similar
quality on new data in production?
• Model validation: testing of a model on new data (test set)

Validating Models
A common model problem: Overfitting
• An overfit model looks
great on the training
data & performs poorly
on new data
• Memorized the training
data instead of
discovering generalizable
rules or patterns
• Overfit model is bad:
– more complicated
than anything useful
– less accurate in
production
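
The sketch below illustrates memorization under stated assumptions: a made-up underlying rule, 20% label noise, and a "memorizer" that simply copies the label of the closest training point. All names and numbers are hypothetical.

```python
import random


def true_label(x):
    """Hypothetical underlying rule: class 1 when x is at least 0.5."""
    return 1 if x >= 0.5 else 0


def make_data(n, rng, noise=0.2):
    """Draw n points in [0, 1); flip each true label with probability noise."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = true_label(x)
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data


def memorizer_predict(train, x):
    """Overfit model: copy the label of the closest training point."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def accuracy(predict, data):
    """Fraction of (x, y) pairs for which predict(x) equals y."""
    return sum(predict(x) == y for x, y in data) / len(data)


rng = random.Random(1)
train = make_data(50, rng)
test = make_data(1000, rng)

memorizer_train_acc = accuracy(lambda x: memorizer_predict(train, x), train)
memorizer_test_acc = accuracy(lambda x: memorizer_predict(train, x), test)
simple_rule_test_acc = accuracy(true_label, test)
```

The memorizer reproduces every training label, noise included, so its training accuracy is perfect. On new data it usually scores lower than the simple rule, although the exact figures depend on the seed.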


Ensuring Model Quality

• The data used to build a model is not the best data for testing
the model’s performance
• Because this data was seen during model construction, &
model construction is optimizing your performance measure,
you tend to get exaggerated measures of performance on your
training data
• Perform all of your clever work on the training data alone, &
delay measuring your performance with respect to your test
data until as late as possible in your project – testing on held-
out data

Decision Trees
• Decision trees predict responses to data
• To predict a response, follow the decisions in the tree from the
root (beginning) node down to a leaf node. The leaf node
contains the response.


Decision Trees
• This tree predicts classifications based on 2 predictors, x1 & x2
• To predict, start at the top node, represented by a triangle (Δ).
The 1st decision is whether x1<0.5. If so, follow the left branch,
& see that the tree classifies the data as type 0.
• If x1>=0.5, then follow the right branch to the lower-right
triangle node. Here the tree asks if x2<0.5. If so, then follow
the left branch to see that the tree classifies the data as type 0.
If not, then follow the right branch to see that the tree
classifies the data as type 1.
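
The walk through the tree described above can be written as a small function. This is a direct transcription of the two decisions in the text; the function name `predict_type` is hypothetical.

```python
def predict_type(x1, x2):
    """Follow the tree from the root to a leaf and return the predicted type."""
    if x1 < 0.5:
        # Root node, left branch: leaf with type 0.
        return 0
    if x2 < 0.5:
        # Lower-right node, left branch: leaf with type 0.
        return 0
    # Lower-right node, right branch: leaf with type 1.
    return 1


example = predict_type(0.7, 0.8)  # both tests fail, so the right-most leaf is reached
```

Each `if` corresponds to one triangle node, and each `return` to one leaf.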

Questions
1. What is machine learning?
2. Explain machine learning techniques and their categories
3. Why do we need to evaluate our model?
4. Why do we need a portion of our data called test data?
5. Why do we need to validate our model?
