Lect1 Introduction

Introduction

CCCS416
APPLIED MACHINE LEARNING

Prepared and assembled by Shahd Alahdal

We are drowning in information and
starving for knowledge. — John Naisbitt.

Outline:
• Big data
• Machine Learning definitions (What and Why)
• Machine Learning Applications
• Machine Learning approaches:
  • Supervised learning
  • Unsupervised learning
  • Semi-supervised learning
  • Reinforcement learning
• Offline learning vs online learning
• Instance-based vs model-based learning
• Main challenges of Machine Learning

Big Data
• Widespread use of personal computers and wireless communication leads to “big data”.
• We are both producers and consumers of data.
• Data is not random; it has structure, e.g., customer behavior.
• We need to extract that structure from data for
  (a) Understanding the process
  (b) Making predictions for the future
• The large volume of data calls for automated methods of data analysis, which is what machine learning provides.
https://www.domo.com/learn/infographic/data-never-sleeps-8

What is Machine Learning?

• Machine Learning is the science (and art) of programming computers so they can learn from data.

• Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed. (Arthur Samuel, 1959)

• A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. (Tom Mitchell, 1997)

• Machine learning is a set of methods that can automatically detect patterns in data.

• These uncovered patterns are then used to predict future data, or to perform other kinds of decision-making under uncertainty.

• The key premise is learning from data.

• It addresses the problem of analyzing huge bodies of data so that they can be understood.

• It provides techniques to automate the analysis and exploration of large, complex data sets.

• It offers tools, methodologies, and theories for revealing patterns in data, a critical step in knowledge discovery.

Why “Learn”?
• Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce.
• Learning means building general models from data of particular examples.
• Example in retail: from customer transactions to consumer behavior: people who bought “Blink” also bought “Outliers” (www.amazon.com).
• Build a model that is a good and useful approximation to the data.
• Learning is useful when:
  • Human expertise does not exist (navigating on Mars),
  • Humans are unable to explain their expertise (speech recognition).

Machine Learning is great for:
• Problems for which existing solutions require a lot of hand-tuning or long lists of rules: one Machine Learning algorithm can often simplify code and perform better.
• Complex problems for which there is no good solution at all using a traditional approach: the best Machine Learning techniques can find a solution.
• Fluctuating environments: a Machine Learning system can adapt to new data.
• Getting insights about complex problems and large amounts of data.

Why machine learning?

[Figure: the traditional approach vs. the machine learning approach]

Applications

• Machine learning plays a key role in many areas:

• Medicine:
  • Predict whether a patient, hospitalized due to a heart attack, will have a second heart attack. The prediction is to be based on demographic, diet and clinical measurements for that patient.
  • Identify the risk factors for lung cancer, based on clinical and demographic variables.
• Finance:
  • Predict the price of a stock 6 months from now, on the basis of company performance measures and economic data.

• Business:
  • Walmart's data warehouse is mined for advertising.
  • Credit card companies mine transactions to detect fraudulent use of your card based on purchase patterns.
  • Netflix developed a movie recommender system.
• Genomics:
  • Human genome project: collection of DNA sequences, microarray data
• Communication systems:
  • Speech recognition
  • Image analysis

Machine Learning Approaches
• Supervised learning
  • Classification
  • Regression
• Unsupervised learning
• Semi-supervised learning
• Reinforcement learning

Supervised learning: Classification
In supervised learning, the training data you feed to the algorithm includes the
desired solutions, called labels.
Example: spam filtering (class: spam or ham)

https://www.oreilly.com/library/view/hands-on-machine-learning/9781491962282/ch01.html
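
To make the idea of labels concrete, here is a minimal sketch of the spam filtering example. It is not a production filter: the four training messages, the word-counting rule, and the function names train and classify are all assumptions made for illustration. The point is only that each training message comes with its desired answer (spam or ham), and the program uses those answers to classify new messages.

```python
from collections import Counter

# Hypothetical toy training set: (message, label) pairs.
# The labels are the "desired solutions" supplied with the training data.
TRAIN = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting moved to monday", "ham"),
    ("lunch on monday with the team", "ham"),
]

def train(examples):
    """Count how often each word appears under each label."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in examples:
        counts[label].update(text.split())
    return counts

def classify(counts, text):
    """Score a message by how often its words were seen under each label."""
    scores = {label: sum(c[w] for w in text.split()) for label, c in counts.items()}
    return max(scores, key=scores.get)

model = train(TRAIN)
prediction_spam = classify(model, "claim your free prize")
prediction_ham = classify(model, "team meeting on monday")
```

With this toy data, the first message is classified as spam and the second as ham. Real spam filters use far more data and a proper statistical model, but the supervised structure is the same.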

Supervised learning: Regression
Another typical task is to predict a target numeric value, such as the price of a
car, given a set of features (mileage, age, brand, etc.) called predictors.

https://www.oreilly.com/library/view/hands-on-machine-learning/9781491962282/ch01.html
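
As a rough illustration of regression, the sketch below fits a straight line to car prices using a single predictor (mileage). The data values and the helper name fit_line are invented for this example; ordinary least squares is one common choice of method, not the only one.

```python
# Hypothetical data: car mileage (thousands of km) and price (thousands).
mileage = [20.0, 40.0, 60.0, 80.0]
price = [18.0, 16.0, 14.0, 12.0]

def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(mileage, price)

# Predict the numeric target (price) for a car with 50,000 km.
predicted = slope * 50.0 + intercept
```

Because the toy data lie exactly on a line, the fit recovers a slope of about -0.1 and an intercept of about 20, so the predicted price at 50,000 km is about 15.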

Examples of Supervised Learning Algorithms
• k-Nearest Neighbors
• Support Vector Machines (SVMs)
• Decision Trees and Random Forests
• Neural networks
• Linear Regression
• Logistic Regression
• …

Unsupervised Learning
• Does not require a human expert to manually label the data.
• We are just given input data, without any outputs.
• The goal is to discover “interesting structure” in the data; this is sometimes called knowledge discovery.
• Clustering: grouping similar instances
• Example applications:
  • Customer segmentation in CRM
  • Image compression: color quantization
  • Bioinformatics: learning motifs
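
Clustering can be sketched with a bare-bones k-means on one-dimensional data. The spending figures, the starting centroids, and the function name kmeans_1d are assumptions chosen for illustration; note that no labels are given, only the inputs.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain k-means on 1-D data: assign each point to its nearest
    centroid, then move each centroid to the mean of its cluster."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical customer data: yearly spend per customer, with no labels.
spend = [10.0, 12.0, 11.0, 95.0, 100.0, 105.0]
centers, groups = kmeans_1d(spend, centroids=[0.0, 50.0])
```

On this toy data the algorithm separates the customers into a low-spend group centered at 11.0 and a high-spend group centered at 100.0, which is the kind of structure customer segmentation looks for.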

Clustering: example

https://www.researchgate.net/figure/An-example-of-the-document-clustering_fig1_322455242

Unsupervised learning

https://www.edureka.co/blog/machine-learning-tutorial/

Unsupervised machine learning
Some examples of unsupervised learning algorithms:
• K-means
• Fuzzy k-means
• Principal Component Analysis
• Association rule learning
• …

Semi-supervised learning
• Some algorithms can deal with partially labeled training data, usually a lot of unlabeled data and a little bit of labeled data.
• Most semi-supervised learning algorithms are combinations of unsupervised and supervised algorithms.
• Some photo-hosting services, such as Google Photos, are good examples of this. Once you upload all your family photos to the service, it automatically recognizes that the same person A shows up in photos 1, 5, and 11, while another person B shows up in photos 2, 5, and 7.

Semi-supervised machine learning

https://www.ecloudvalley.com/mlintroduction/

Reinforcement Learning
• This is useful for learning how to act or behave when given occasional reward or punishment signals. (For example, consider how a baby learns to walk.)
• The learning system, called an agent in this context, can observe the environment, select and perform actions, and get rewards in return (or penalties in the form of negative rewards).
• It must then learn by itself what the best strategy, called a policy, is to get the most reward over time.
• A policy defines what action the agent should choose when it is in a given situation.
• Examples: gaming and robot navigation.
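
The agent, reward, and policy vocabulary can be illustrated with a small tabular Q-learning sketch. Q-learning is one specific reinforcement learning algorithm, picked here as an assumption; the five-cell corridor environment, the reward of 1 at the right end, and the constants ALPHA and GAMMA are likewise invented for the example.

```python
import random

N_STATES = 5            # corridor cells 0..4; the reward sits in cell 4
GOAL = N_STATES - 1
ACTIONS = (-1, +1)      # step left or step right
ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor (assumed values)

def step(state, action):
    """Environment: move within the corridor; reaching the goal pays 1."""
    nxt = min(max(state + action, 0), GOAL)
    return nxt, (1.0 if nxt == GOAL else 0.0)

rng = random.Random(0)
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for _ in range(500):                      # episodes
    state = 0
    while state != GOAL:
        action = rng.choice(ACTIONS)      # explore at random while learning
        nxt, reward = step(state, action)
        best_next = max(q[(nxt, a)] for a in ACTIONS)
        # Q-learning update: move the estimate toward reward + discounted future value.
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
        state = nxt

# The learned policy: the highest-valued action in each non-goal state.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
```

After training, the policy maps every non-goal cell to the action +1 (step right), which is the strategy that collects the reward soonest. The agent was never told this rule; it was inferred from reward signals alone.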

Reinforcement Learning

https://www.oreilly.com/library/view/hands-on-machine-learning/9781491962282/ch01.html

Offline learning vs Online learning
Offline learning (batch learning)
• The system is incapable of learning incrementally: it must be trained using all the available data.
• If we want a batch learning system to know about new data (such as a new type of spam), we need to train a new version of the system from scratch on the full dataset (not just the new data, but also the old data), then stop the old system and replace it with the new one.
• This solution is simple and often works fine, but training using the full set of data can take a long time.
• Training on the full set of data requires a lot of computing resources (CPU, memory space, disk space, disk I/O, network I/O, etc.).

Online learning
• The system is trained incrementally by feeding it data instances sequentially, either individually or in small groups called mini-batches.
• Each learning step is fast and cheap, so the system can learn about new data on the fly, as it arrives. It is also a good option if you have limited computing resources.
https://www.oreilly.com/library/view/hands-on-machine-learning/9781491962282/ch01.html
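
A minimal sketch of online learning, assuming a linear model y = w0 + w1 * x trained by mini-batch gradient descent. The simulated data stream, the learning rate, and the names sgd_update and stream are hypothetical; the point is that each mini-batch updates the model once and is never needed again.

```python
def sgd_update(weights, batch, learning_rate=0.05):
    """One incremental learning step on a mini-batch of (x, y) pairs
    for the model y = w0 + w1 * x, using the mean squared error gradient."""
    w0, w1 = weights
    grad0 = grad1 = 0.0
    for x, y in batch:
        error = (w0 + w1 * x) - y
        grad0 += error
        grad1 += error * x
    n = len(batch)
    return (w0 - learning_rate * grad0 / n, w1 - learning_rate * grad1 / n)

def stream(n_batches, batch_size=2):
    """Hypothetical data stream: noiseless points on y = 1 + 2x,
    arriving a mini-batch at a time."""
    xs = [0.0, 1.0, 2.0, 3.0]
    for i in range(n_batches):
        chunk = [xs[(i * batch_size + j) % len(xs)] for j in range(batch_size)]
        yield [(x, 1.0 + 2.0 * x) for x in chunk]

weights = (0.0, 0.0)
for batch in stream(n_batches=2000):
    # Each batch is used once and can then be discarded.
    weights = sgd_update(weights, batch)
```

The weights drift toward (1, 2) as batches arrive. Contrast this with batch learning, where adding new data would mean retraining on the full dataset from scratch.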

Instance-based vs model-based learning

• There are two main approaches to generalization: instance-based learning and model-based learning.
• Instance-based learning
  • The system learns the examples by heart, then generalizes to new cases using a similarity measure.
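
A k-nearest-neighbors classifier is a standard instance of this idea. The sketch below assumes Euclidean distance as the (dis)similarity measure and uses made-up two-dimensional points; the function name knn_predict is hypothetical.

```python
import math
from collections import Counter

def knn_predict(memory, query, k=3):
    """Instance-based prediction: majority label among the k stored
    examples closest to the query (Euclidean distance)."""
    by_distance = sorted(memory, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical stored examples: (feature vector, label).
# "Training" is nothing more than keeping these in memory.
memory = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]

label_near_a = knn_predict(memory, (1.1, 1.0))
label_near_b = knn_predict(memory, (4.9, 5.0))
```

There is no separate training step and no fitted parameters: all the work happens at prediction time, when the query is compared against the memorized examples.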

Model-based learning
• Another way to generalize from a set of examples is to build a model of these examples, then use that model to make predictions:
  • Study the data
  • Select a model
  • Train it on the training data
  • Finally, apply the model to make predictions on new cases (this is called inference), hoping that this model will generalize well.
Example: does money make people happy?

Main challenges of Machine Learning
• Data-related challenges
  • Insufficient quantity of training data
  • Non-representative training data
  • Poor-quality data
  • Irrelevant features
• Algorithm-related challenges
  • Overfitting the training data
  • Underfitting the training data

Insufficient quantity of data
• It takes a lot of data for most Machine Learning algorithms to work properly. Even for very simple problems you typically need thousands of examples, and for complex problems such as image or speech recognition you may need millions of examples.

“These results suggest that we may want to reconsider the trade-off between spending time and money on algorithm development versus spending it on corpus development.” (Michele Banko and Eric Brill)

https://www.oreilly.com/library/view/hands-on-machine-learning/9781491962282/ch01.html

Non-representative training data
• In order to generalize well, it is crucial that your training data be representative of
the new cases you want to generalize to.

https://www.oreilly.com/library/view/hands-on-machine-learning/9781491962282/ch01.html

Poor-quality data
• If the training data is full of errors, outliers, and noise (e.g., due to poor-quality measurements), it will make it harder for the system to detect the underlying patterns, so your system is less likely to perform well.
• Examples:
  • If some instances are clearly outliers, it may help to simply discard them or try to fix the errors manually.
  • If some instances are missing a few features (e.g., 5% of your customers did not specify their age), you must decide whether you want to ignore this attribute altogether, ignore these instances, fill in the missing values (e.g., with the median age), or train one model with the feature and one model without it, and so on.
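
One of the options listed above, filling in missing values with the median, can be sketched as follows. The list of ages and the use of None to mark a missing value are assumptions for illustration.

```python
from statistics import median

def impute_median(values):
    """Replace missing entries (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

# Hypothetical customer ages; None marks customers who did not specify their age.
ages = [23, None, 35, 41, None, 29]
filled = impute_median(ages)
```

Here the observed ages are 23, 29, 35, and 41, so the median is 32.0 and both gaps are filled with that value. The median is often preferred over the mean for this purpose because it is less sensitive to outliers.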

Irrelevant features

• The system will only be capable of learning if the training data contains enough relevant features and not too many irrelevant ones.
• Feature engineering:
  • Feature selection: selecting the most useful features to train on among existing features.
  • Feature extraction: combining existing features to produce a more useful one.
  • Creating new features by gathering new data.
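
As one simple (and by no means the only) approach to feature selection, features can be ranked by the strength of their linear correlation with the target. The feature names, data values, and the helper pearson below are all invented for this sketch.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical car data: one plausibly relevant feature and one irrelevant one.
features = {
    "mileage": [20.0, 40.0, 60.0, 80.0, 100.0],
    "owner_shoe_size": [42.0, 38.0, 44.0, 39.0, 42.0],
}
price = [18.0, 16.5, 14.0, 12.5, 10.0]

# Rank features by the absolute value of their correlation with the target.
ranking = sorted(features, key=lambda name: abs(pearson(features[name], price)),
                 reverse=True)
```

Mileage ranks first because it correlates strongly (and negatively) with price, while shoe size is close to uncorrelated. Correlation only captures linear relationships, so this is a rough first filter rather than a complete selection method.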

Overfitting the training data

Overfitting means that the model performs well on the training data, but it does not generalize well.
The possible solutions are:
• Simplify the model by selecting one with fewer parameters (e.g., a linear model rather than a high-degree polynomial model), by reducing the number of attributes in the training data, or by constraining the model.
• Gather more training data.
• Reduce the noise in the training data (e.g., remove outliers).

Constraining a model to make it simpler and reduce the risk of overfitting is called regularization.
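
To show what constraining a model can look like in practice, the sketch below uses ridge (L2) regularization on a one-parameter linear model without an intercept. Ridge is one common form of regularization, chosen here as an assumption; the data and the penalty value are also made up. Minimizing sum((y - w*x)^2) + penalty * w^2 over w gives the closed form w = sum(x*y) / (sum(x*x) + penalty).

```python
def ridge_slope(xs, ys, penalty):
    """Slope w of a no-intercept linear model that minimizes
    sum((y - w*x)^2) + penalty * w^2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)

# Hypothetical noisy data scattered around y = 2x.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [-4.1, -1.8, 2.2, 3.9]

unconstrained = ridge_slope(xs, ys, penalty=0.0)   # plain least squares
regularized = ridge_slope(xs, ys, penalty=10.0)    # slope pulled toward zero
```

A larger penalty shrinks the slope toward zero, giving a simpler, less flexible model. Here the penalty is deliberately large to make the effect visible; in practice its value is a hyperparameter that has to be tuned.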

Underfitting the training data
• It occurs when the model is too simple to learn the underlying structure of the data.
• The main options to fix this problem are:
  • Selecting a more powerful model, with more parameters.
  • Feeding better features to the learning algorithm (feature engineering).
  • Reducing the constraints on the model.

Machine learning process

Summary

• Machine learning as tools, methodologies, and theories for revealing patterns in data.
• Examples and applications of machine learning in different areas including medicine, finance, …
• Machine learning approaches including supervised, unsupervised, semi-supervised and reinforcement learning.
• Offline learning vs online learning.
• Instance-based vs model-based learning.
• Main challenges of Machine Learning including bad data and bad algorithms.

References
• Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurélien Géron (chapter 1)
• Introduction to Machine Learning by Ethem Alpaydin (chapter 1)
• Machine Learning: A Probabilistic Perspective by Kevin P. Murphy (chapter 1)
