Lect1 Introduction

Introduction

CCCS416
APPLIED MACHINE LEARNING

Prepared and assembled by Shahd Alahdal


We are drowning in information and
starving for knowledge. — John Naisbitt.

Outline:
• Big data
• Machine Learning definitions (What and Why)
• Machine Learning Applications
• Machine Learning approaches:
  • Supervised learning
  • Unsupervised learning
  • Semi-supervised learning
  • Reinforcement learning
• Offline learning vs online learning
• Instance-based vs model-based learning
• Main challenges of Machine Learning

Big Data
• Widespread use of personal computers and wireless communication leads to “big data”.
• We are both producers and consumers of data.
• Data is not random; it has structure, e.g., customer behavior.
• We need to extract that structure from data for
  (a) Understanding the process
  (b) Making predictions for the future
• The sheer volume of data calls for automated methods of data analysis, which is what machine learning provides.
https://www.domo.com/learn/infographic/data-never-sleeps-8

What is Machine Learning?

• Machine Learning is the science (and art) of programming computers so they can learn from data.

• Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed. (Arthur Samuel, 1959)

• A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. (Tom Mitchell, 1997)

What is Machine Learning?

• Machine learning is a set of methods that can automatically detect patterns in data.

• These uncovered patterns are then used to predict future data, or to perform other kinds of decision-making under uncertainty.

• The key premise is learning from data!

What is Machine Learning?

• Addresses the problem of analyzing huge bodies of data so that they can be understood.

• Provides techniques to automate the analysis and exploration of large, complex data sets.

• Offers tools, methodologies, and theories for revealing patterns in data: a critical step in knowledge discovery.

Why “Learn”?
• Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce.
• Learning means building general models from data of particular examples.
• Example in retail: from customer transactions to consumer behavior:
  People who bought “Blink” also bought “Outliers” (www.amazon.com)
• Build a model that is a good and useful approximation to the data.
• Some cases where learning is useful:
  • Human expertise does not exist (navigating on Mars)
  • Humans are unable to explain their expertise (speech recognition)

Machine Learning is great for:
• Problems for which existing solutions require a lot of hand-tuning or long lists of rules: one Machine Learning algorithm can often simplify code and perform better.
• Complex problems for which there is no good solution at all using a traditional approach: the best Machine Learning techniques can find a solution.
• Fluctuating environments: a Machine Learning system can adapt to new data.
• Getting insights about complex problems and large amounts of data.

Why machine learning?

[Figure: the traditional approach vs. the machine learning approach]

Applications

• Machine learning plays a key role in many areas:
• Medicine:
  • Predict whether a patient, hospitalized due to a heart attack, will have a second heart attack. The prediction is to be based on demographic, diet and clinical measurements for that patient.
  • Identify the risk factors for lung cancer, based on clinical and demographic variables.
• Finance:
  • Predict the price of a stock 6 months from now, on the basis of company performance measures and economic data.

Applications
• Business
  • Walmart data warehouse mined for advertising
  • Credit card transactions mined to detect fraudulent use of your card, based on purchase patterns
  • Netflix developed a movie recommender system
• Genomics
  • Human Genome Project: collection of DNA sequences, microarray data
• Communication systems
  • Speech recognition
  • Image analysis

Machine Learning Approaches
• Supervised learning
  • Classification
  • Regression
• Unsupervised learning
• Semi-supervised learning
• Reinforcement learning

Supervised learning: Classification
In supervised learning, the training data you feed to the algorithm includes the
desired solutions, called labels.
Example: spam filtering (class: spam or ham)
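To make the idea of labeled training data concrete, here is a minimal sketch of a toy spam filter. The four training messages, the function names `train` and `classify`, and the word-counting rule are all assumptions made for illustration; a real spam filter would use a proper learning algorithm and far more data.

```python
from collections import Counter

# Hypothetical labeled training set: each message comes with its desired label.
TRAIN = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting moved to monday", "ham"),
    ("lunch with the team on monday", "ham"),
]


def train(examples):
    """Count how often each word appears under each label."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts


def classify(counts, text):
    """Predict the label whose training words overlap the message most."""
    words = text.lower().split()
    spam_score = sum(counts["spam"][w] for w in words)
    ham_score = sum(counts["ham"][w] for w in words)
    return "spam" if spam_score > ham_score else "ham"


model = train(TRAIN)
print(classify(model, "free prize inside"))
print(classify(model, "team meeting on monday"))
```

The labels in `TRAIN` are what makes this supervised: the algorithm is told the right answer for every training message.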

https://www.oreilly.com/library/view/hands-on-machine-learning/9781491962282/ch01.html

Supervised learning: Regression
Another typical task is to predict a target numeric value, such as the price of a
car, given a set of features (mileage, age, brand, etc.) called predictors.
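As a rough illustration of regression, the sketch below fits a straight line to predict a car's price from a single predictor (mileage) using ordinary least squares. The data values are made up, and the helper name `fit_line` is an assumption for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for the model y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data: mileage in thousands of km, price in thousands of dollars.
mileage = [10, 50, 90, 130]
price = [19, 15, 11, 7]

intercept, slope = fit_line(mileage, price)

# Predict the numeric target for a new car with 70 thousand km.
predicted = intercept + slope * 70
print(round(predicted, 2))
```

With several predictors (mileage, age, brand, etc.) the same idea applies, but the fit is usually delegated to a library.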

https://www.oreilly.com/library/view/hands-on-machine-learning/9781491962282/ch01.html

Examples of Supervised Learning Algorithms
• k-Nearest Neighbors
• Support Vector Machines (SVMs)
• Decision Trees and Random Forests
• Neural networks
• Linear Regression
• Logistic Regression
• …

Unsupervised Learning
• Does not require a human expert to manually label the data.
• We are just given input data, without any outputs.
• The goal is to discover “interesting structure” in the data (“knowledge discovery”).
• Clustering: grouping similar instances
• Example applications
  • Customer segmentation in CRM
  • Image compression: color quantization
  • Bioinformatics: learning motifs
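Clustering can be sketched with a bare-bones k-means on one-dimensional data. This is only an illustration: the customer spend values are made up, the starting centroids are chosen by hand, and `kmeans_1d` is a hypothetical helper name.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain k-means on numbers.

    Each round assigns every point to its nearest centroid, then moves
    each centroid to the mean of the points assigned to it.
    """
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Made-up monthly spend per customer; note that no labels are given.
spend = [12, 15, 14, 90, 95, 88]
centroids, clusters = kmeans_1d(spend, centroids=[0.0, 100.0])
print(clusters)
```

The algorithm is never told which customers are "low" or "high" spenders; the two groups emerge from the data alone, which is the sense in which customer segmentation is unsupervised.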

Clustering: example

https://www.researchgate.net/figure/An-example-of-the-document-clustering_fig1_322455242

Unsupervised learning

https://www.edureka.co/blog/machine-learning-tutorial/

Unsupervised machine learning
“Some” examples of unsupervised learning algorithms:
• K-means
• Fuzzy k-means
• Principal Component Analysis
• Association rule learning
• …

Semi-supervised learning
• Some algorithms can deal with partially labeled training data, usually a lot of unlabeled data and a little bit of labeled data.
• Most semi-supervised learning algorithms are combinations of unsupervised and supervised algorithms.
• Some photo-hosting services, such as Google Photos, are good examples of this. Once you upload all your family photos to the service, it automatically recognizes that the same person A shows up in photos 1, 5, and 11, while another person B shows up in photos 2, 5, and 7.

Semi-supervised machine learning

https://www.ecloudvalley.com/mlintroduction/

Reinforcement Learning
• Reinforcement learning is useful for learning how to act or behave when given occasional reward or punishment signals. (For example, consider how a baby learns to walk.)
• The learning system, called an agent in this context, can observe the environment, select and perform actions, and get rewards in return (or penalties in the form of negative rewards).
• It must then learn by itself the best strategy, called a policy, to get the most reward over time.
• A policy defines what action the agent should choose when it is in a given situation.
• Examples: gaming and robot navigation.
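The agent, reward, and policy vocabulary can be illustrated with tabular Q-learning, one common reinforcement learning algorithm (the slides do not name a specific one). The five-cell corridor environment, the reward of 1 at the last cell, and the constants below are all assumptions chosen to keep the sketch small.

```python
import random

N_STATES = 5          # corridor cells 0..4; reaching cell 4 ends the episode
ACTIONS = (-1, 1)     # move left or move right
ALPHA, GAMMA = 0.5, 0.9


def step(state, action):
    """Hypothetical environment: returns (next_state, reward)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def train(episodes=500, seed=0):
    """Learn action values from rewards alone, by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action index]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            a = rng.randrange(2)                # explore with random actions
            next_state, reward = step(state, ACTIONS[a])
            target = reward + GAMMA * max(q[next_state])
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q


q = train()

# The policy: for each non-terminal cell, the action with the highest value.
policy = [ACTIONS[row.index(max(row))] for row in q[:-1]]
print(policy)
```

Nobody tells the agent that moving right is correct; it only sees the reward at the end of the corridor, and the learned policy ends up choosing "right" in every cell.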

Reinforcement Learning

https://www.oreilly.com/library/view/hands-on-machine-learning/9781491962282/ch01.html

Offline learning vs Online learning
Offline learning (batch learning)
• The system is incapable of learning incrementally: it must be trained using all the available data.
• If we want a batch learning system to know about new data (such as a new type of spam), we need to train a new version of the system from scratch on the full dataset (not just the new data, but also the old data), then stop the old system and replace it with the new one.
• This solution is simple and often works fine, but training using the full set of data can take a long time.
• Training on the full set of data requires a lot of computing resources (CPU, memory space, disk space, disk I/O, network I/O, etc.).

Offline learning vs Online learning
Online Learning
• The system is trained incrementally by feeding it data instances sequentially, either individually or in small groups called mini-batches.
• Each learning step is fast and cheap, so the system can learn about new data on the fly, as it arrives. It is also a good option if you have limited computing resources.
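A minimal sketch of online learning, assuming a one-parameter model y ≈ weight * x trained by stochastic gradient descent on mini-batches. The generator `data_stream` is a stand-in for data arriving over time, and the learning rate is an arbitrary choice for this toy problem.

```python
def sgd_update(weight, batch, learning_rate=0.05):
    """One incremental learning step on a mini-batch of (x, y) pairs.

    Uses the gradient of the mean of 0.5 * (weight * x - y) ** 2.
    """
    gradient = sum((weight * x - y) * x for x, y in batch) / len(batch)
    return weight - learning_rate * gradient


def data_stream(n_batches):
    """Hypothetical stream: mini-batches of two points drawn from y = 2x."""
    for i in range(n_batches):
        x1, x2 = 1 + i % 3, 1 + (i + 1) % 3
        yield [(x1, 2 * x1), (x2, 2 * x2)]


weight = 0.0
for batch in data_stream(200):
    # Each mini-batch is used once and can then be discarded.
    weight = sgd_update(weight, batch)

print(round(weight, 4))
```

Unlike batch learning, the loop never needs the full dataset in memory, and training can simply continue when new batches arrive.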

https://www.oreilly.com/library/view/hands-on-machine-learning/9781491962282/ch01.html

Instance-based vs model-based learning

• There are two main approaches to generalization: instance-based learning and model-based learning.
• Instance-based learning
  • The system learns the examples by heart, then generalizes to new cases using a similarity measure.
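k-Nearest Neighbors is a typical instance-based method. The sketch below stores the examples as they are and classifies a new case by majority vote among the most similar ones; Euclidean distance is assumed as the similarity measure, and the six labeled points are made up.

```python
import math
from collections import Counter


def knn_predict(examples, query, k=3):
    """Classify `query` by majority vote among the k most similar examples."""
    # Similarity here is Euclidean distance: smaller means more similar.
    by_distance = sorted(examples, key=lambda ex: math.dist(ex[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Made-up stored examples: (features, label). There is no training step.
examples = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]

print(knn_predict(examples, (1.1, 1.0)))
print(knn_predict(examples, (4.9, 5.0)))
```

All the work happens at prediction time, which is why the method is said to learn the examples "by heart".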

Instance-based vs model-based learning

Model-based learning (example: does money make people happy?)

• Another way to generalize from a set of examples is to build a model of these examples, then use that model to make predictions.
§ Study the data
§ Select a model
§ Train it on the training data
§ Finally, apply the model to make predictions on new cases (this is called inference), hoping that this model will generalize well.
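The four steps can be sketched end to end on a toy version of the "does money make people happy?" question. The income and satisfaction numbers are invented for illustration, and comparing a constant model against a linear one on held-out data is just one simple way to carry out the "select a model" step.

```python
def fit_constant(xs, ys):
    """Candidate 1: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """Candidate 2: y = theta0 + theta1 * x, fitted by least squares."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    theta1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
              / sum((x - mean_x) ** 2 for x in xs))
    theta0 = mean_y - theta1 * mean_x
    return lambda x: theta0 + theta1 * x


def mse(model, xs, ys):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# 1. Study the data (made-up: income in $1000s, satisfaction score 0-10).
train_x, train_y = [10, 20, 30, 40], [4.1, 5.0, 5.9, 7.0]
val_x, val_y = [25, 35], [5.5, 6.4]

# 2. Select a model, and 3. train each candidate on the training data.
candidates = {
    "constant": fit_constant(train_x, train_y),
    "linear": fit_linear(train_x, train_y),
}
errors = {name: mse(m, val_x, val_y) for name, m in candidates.items()}
best = min(errors, key=errors.get)

# 4. Inference: apply the chosen model to a new case.
prediction = candidates[best](50)
print(best, round(prediction, 2))
```

Checking the error on data that was not used for training gives a rough idea of whether the chosen model will generalize.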

Main challenges of Machine Learning
• Data-related challenges
  • Insufficient quantity of training data
  • Non-representative training data
  • Poor-quality data
  • Irrelevant features
• Algorithm-related challenges
  • Overfitting the training data
  • Underfitting the training data

Insufficient quantity of data
• It takes a lot of data for most Machine Learning algorithms to work properly. Even for very simple problems you typically need thousands of examples, and for complex problems such as image or speech recognition you may need millions of examples.

“these results suggest that we may want to reconsider the trade-off between spending time and money on algorithm development versus spending it on corpus development.” (Michele Banko and Eric Brill)

https://www.oreilly.com/library/view/hands-on-machine-learning/9781491962282/ch01.html

Non-representative training data
• In order to generalize well, it is crucial that your training data be representative of
the new cases you want to generalize to.

https://www.oreilly.com/library/view/hands-on-machine-learning/9781491962282/ch01.html

Poor quality data
• If the training data is full of errors, outliers, and noise (e.g., due to poor-quality measurements), it will make it harder for the system to detect the underlying patterns, so your system is less likely to perform well.
• Examples
  • If some instances are clearly outliers, it may help to simply discard them or try to fix the errors manually.
  • If some instances are missing a few features (e.g., 5% of your customers did not specify their age), you must decide whether you want to ignore this attribute altogether, ignore these instances, fill in the missing values (e.g., with the median age), or train one model with the feature and one model without it, and so on.
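One of the options listed, filling in missing values with the median, can be sketched as follows. The customer records and the helper name `impute_median` are assumptions for illustration; missing ages are represented as `None`.

```python
from statistics import median


def impute_median(rows, key):
    """Return copies of `rows` with missing values of `key` set to the median."""
    known = [row[key] for row in rows if row[key] is not None]
    fill = median(known)
    return [{**row, key: fill if row[key] is None else row[key]}
            for row in rows]


# Made-up customer records; one customer did not specify an age.
customers = [
    {"id": 1, "age": 25},
    {"id": 2, "age": None},
    {"id": 3, "age": 41},
    {"id": 4, "age": 33},
]

cleaned = impute_median(customers, "age")
print(cleaned[1])
```

The median should be computed on the training data only and then reused for new data, so that the same fill value is applied consistently.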

Irrelevant features

• The system will only be capable of learning if the training data contains enough relevant features and not too many irrelevant ones.
• Feature engineering
  • Feature selection: selecting the most useful features to train on among existing features.
  • Feature extraction: combining existing features to produce a more useful one.
  • Creating new features by gathering new data.
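A rough sketch of feature extraction and feature selection on made-up car data. Ranking features by the absolute Pearson correlation with the target is only one simple selection heuristic, assumed here for illustration; the feature names and values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up car data; "paint_code" is an arbitrary, uninformative feature.
features = {
    "mileage": [10, 50, 90, 130],
    "age_years": [1, 4, 6, 10],
    "paint_code": [3, 1, 4, 2],
}
price = [19, 15, 11, 7]

# Feature extraction: combine two existing features into a new one.
features["mileage_per_year"] = [
    m / a for m, a in zip(features["mileage"], features["age_years"])
]

# Feature selection: keep the two features most correlated with the target.
scores = {name: abs(pearson(vals, price)) for name, vals in features.items()}
selected = sorted(scores, key=scores.get, reverse=True)[:2]
print(selected)
```

Note that an extracted feature is not guaranteed to help: whether it is more useful than the originals has to be checked, as the ranking does here.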

Overfitting the training data

Overfitting means that the model performs well on the training data but does not generalize well.
The possible solutions are:
• Simplify the model by selecting one with fewer parameters (e.g., a linear model rather than a high-degree polynomial model), by reducing the number of attributes in the training data, or by constraining the model.
• Gather more training data.
• Reduce the noise in the training data (e.g., remove outliers).

Constraining a model to make it simpler and reduce the risk of overfitting is called regularization.
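Regularization can be illustrated with a ridge-style penalty on the slope of a straight-line model. This is one common form of regularization, chosen here as an assumption; the data points and the penalty value are made up.

```python
def fit_ridge_line(xs, ys, penalty=0.0):
    """Fit y = intercept + slope * x.

    Minimizes the sum of squared errors plus penalty * slope ** 2.
    With penalty=0 this is ordinary least squares.
    """
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / (sxx + penalty)       # the penalty shrinks the slope
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up noisy data.
xs = [1, 2, 3, 4]
ys = [1.2, 1.9, 3.2, 3.7]

_, plain_slope = fit_ridge_line(xs, ys)
_, ridge_slope = fit_ridge_line(xs, ys, penalty=5.0)
print(round(plain_slope, 3), round(ridge_slope, 3))
```

A larger penalty constrains the model more strongly; too large a value pushes the slope toward zero and risks underfitting instead.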

Underfitting the training data
• It occurs when the model is too simple to learn the underlying structure of the data.
• The main options to fix this problem are:
  • Selecting a more powerful model, with more parameters.
  • Feeding better features to the learning algorithm (feature engineering).
  • Reducing the constraints on the model.

Machine learning process

Summary

• Machine learning as tools, methodologies, and theories for revealing patterns in data.
• Examples and applications of machine learning in different areas including medicine, finance,…
• Machine learning approaches including supervised, unsupervised, semi-supervised and reinforcement learning.
• Offline learning vs online learning.
• Instance-based vs model-based learning.
• Main challenges of Machine Learning including bad data and bad algorithms.

References
§ Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow:
Concepts, Tools, and Techniques to Build Intelligent Systems by Aurélien Géron
(chapter 1)
§ Introduction to machine learning by Ethem Alpaydin (chapter 1)
§ Machine learning: a probabilistic perspective by Kevin P. Murphy (chapter 1)
