Chapter 1
An Application
A credit card company receives thousands of applications for new cards. Each application contains information about an applicant:
age
marital status
annual salary
outstanding debts
credit rating
etc.
Problem: to decide whether an application should be approved, i.e., to classify applications into two categories, approved and not approved.
Machine learning
Like human learning from past experiences.
A computer does not have “experiences”.
A computer system learns from data, which represent some “past experiences” of an application domain.
Our focus is to learn a target function that can be used to predict the values of a discrete class attribute; in our example, approved or not approved.
Some more examples of tasks that are best solved by using a learning algorithm
Recognizing patterns:
Facial identities or facial expressions
Handwritten or spoken words
Medical images
Generating patterns:
Generating images or motion sequences
Recognizing anomalies:
Unusual sequences of credit card transactions
Unusual patterns of sensor readings in a nuclear power plant
or unusual sound in your car engine.
Prediction:
Future stock prices or currency exchange rates
Some web-based examples of machine learning
The web contains a lot of data. Tasks with very big datasets often use machine learning, especially if the data is noisy or non-stationary.
Recommendation systems:
Lots of noisy data
Information retrieval:
Find documents or images with similar content.
Data Visualization:
Display a huge database in a revealing way
Related Fields
[Diagram: Data Mining and Knowledge Discovery at the intersection of Machine Learning, Statistics, Databases, and Visualization]
Statistics, Machine Learning and Data Mining
Statistics:
more theory-based
more focused on testing hypotheses
Machine learning
more heuristic
focused on improving performance of a learning agent
also looks at real-time learning and robotics – areas not part of
data mining
Data Mining and Knowledge Discovery
integrates theory and heuristics
focus on the entire process of knowledge discovery, including
data cleaning, learning, and integration and visualization of
results
Distinctions are fuzzy
Types of learning task
Supervised learning
classification is seen as supervised learning from
examples.
Supervision: The data (observations, measurements, etc.) are labeled with pre-defined classes, as if a “teacher” had given the classes (supervision).
Test data are classified into these classes too.
Unsupervised learning
Class labels of the data are unknown
Given a set of data, the task is to establish the
existence of classes or clusters in the data
Supervised learning process: two steps
Step 1 (learning): learn a model (classifier) from the training data.
Step 2 (testing): test the model on unseen test data to assess its accuracy.
What do we mean by learning?
Given
a data set D,
a task T, and
a performance measure P,
a computer system is said to learn from D to perform the task T if, after learning, the system’s performance on T improves as measured by P.
An Example
Data: Loan application data
Task: Predict whether a loan should be approved
or not.
Performance measure: accuracy.
Fundamental assumption of learning
Assumption: the distribution of training examples is identical to the distribution of test examples (including future unseen examples). In practice this assumption is often violated to some degree, and strong violations result in poor classification accuracy.
Classifier accuracy measures and their evaluation
Evaluating classification/prediction methods
Predictive accuracy
Efficiency
time to construct the model
time to use the model
Robustness
handling noise and missing values
Scalability
efficiency in disk-resident databases
Interpretability
understandability and insight provided by the model
Compactness of the model
size of the tree, or the number of rules.
Evaluation methods of
classifier/Predictor
Holdout set: The available data set D is divided into
two disjoint subsets,
the training set Dtrain (for learning a model)
the test set Dtest (for testing the model)
Important: the training set should not be used in testing, and the test set should not be used in learning.
The unseen test set provides an unbiased estimate of accuracy.
The test set is also called the holdout set
This method is mainly used when the data set D is
large.
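A minimal sketch of the holdout method with scikit-learn; the dataset and the 70/30 split ratio are illustrative assumptions, not part of the original example.

# Holdout: split D into disjoint Dtrain (learning) and Dtest (evaluation).
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)          # illustrative dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)           # Dtest is never used in learning

model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
print("holdout accuracy:", accuracy_score(y_test, model.predict(X_test)))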
Holdout method
The holdout method has two basic drawbacks
In problems where we have a sparse dataset, we may not be able to afford the “luxury” of setting aside a portion of the dataset for testing.
Since it is a single train-and-test experiment, the holdout estimate of the error rate will be misleading if we happen to get an “unfortunate” split.
The limitations of the holdout can be overcome, at the expense of more computation, with a family of resampling methods (cross-validation):
Random subsampling
N-fold cross-validation
Leave-one-out cross-validation
Evaluation methods
1. Random subsampling
The holdout method is repeated K times: the dataset is split K times, and each split randomly selects a (fixed) number of examples without replacement.
For each split, the classifier is retrained from scratch on the training examples, and the error E_i is estimated on the test examples.
The true error estimate is obtained as the average of the separate estimates: E = (1/K) * sum_i E_i.
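A short sketch of random subsampling using scikit-learn's ShuffleSplit; K = 10 splits and the classifier choice are illustrative assumptions.

# Random subsampling: K independent train/test splits, errors averaged.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import ShuffleSplit
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
splitter = ShuffleSplit(n_splits=10, test_size=0.3, random_state=0)

errors = []
for train_idx, test_idx in splitter.split(X):
    model = DecisionTreeClassifier().fit(X[train_idx], y[train_idx])  # retrain from scratch
    errors.append(1.0 - model.score(X[test_idx], y[test_idx]))        # error E_i on this split
print("estimated true error:", np.mean(errors))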
Evaluation methods
2. N-fold cross-validation
The available data is partitioned into n equal-size disjoint subsets.
Each subset in turn is used as the test set, and the remaining n-1 subsets are combined as the training set; this gives n accuracies, and the final estimate is their average.
10-fold and 5-fold cross-validations are commonly used.
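A minimal n-fold cross-validation sketch with scikit-learn's cross_val_score; 10 folds and the classifier are illustrative assumptions.

# 10-fold cross-validation: average accuracy over the 10 held-out folds.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
print("per-fold accuracies:", scores)
print("estimated accuracy:", scores.mean())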
Bootstrap
Select training samples uniformly with replacement: each time a tuple is selected, it is equally likely to be selected again and re-added to the training set.
Suppose we have a dataset of d tuples; it is sampled d times with replacement, resulting in a bootstrap sample.
E.g., the .632 bootstrap:
Each tuple has a probability of 1/d of being selected on any draw, so the probability of not being selected is (1 - 1/d).
Over d draws, the probability of never being selected is (1 - 1/d)^d, which approaches e^-1 ≈ 0.368 when d is large. So about 36.8% of the tuples will not be selected for training and will form the test set; the remaining 63.2% will form the training set.
Repeat this sampling procedure k times.
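A quick numpy sketch of a single bootstrap sample, illustrating the 63.2% / 36.8% figures; the dataset size d is an arbitrary assumption.

# Draw d indices with replacement; tuples never drawn form the test set.
import numpy as np

rng = np.random.default_rng(0)
d = 10_000                                   # illustrative dataset size
sample = rng.integers(0, d, size=d)          # bootstrap sample of d tuples
unique = np.unique(sample)                   # tuples that made it into training
print("fraction in training set:", len(unique) / d)       # ~0.632
print("fraction left for testing:", 1 - len(unique) / d)  # ~0.368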
Evaluation methods
Accuracy is one measure used to evaluate a classifier, but it is not suitable in some applications.
In text mining, we may only be interested in the documents of a particular topic, which are only a small portion of a big document collection.
In classification involving skewed or highly imbalanced data, e.g., network intrusion and financial fraud detection, we are interested only in the minority class.
High accuracy does not mean any intrusion is detected: with, say, 1% intrusions, a classifier that predicts “no intrusion” for everything achieves 99% accuracy.
The class of interest is commonly called the positive class, and the rest the negative classes.
Precision and recall measures
We use a confusion matrix to introduce them:
                     classified positive    classified negative
actual positive             TP                     FN
actual negative             FP                     TN
Precision and recall measures
p = TP / (TP + FP)        r = TP / (TP + FN)
Precision p is the number of correctly classified positive examples divided by the total number of examples that are classified as positive. It indicates the relevancy of the predictions: out of all samples labeled as class A, it gives the fraction that actually belongs to class A.
Recall r is the number of correctly classified positive examples divided by the total number of actual positive examples in the test set.
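A small sketch computing both measures with scikit-learn; the label vectors encode made-up counts (TP = 90, FN = 30, FP = 10, TN = 70), so p = 90/100 = 0.90 and r = 90/120 = 0.75.

# Precision and recall on illustrative predictions.
import numpy as np
from sklearn.metrics import precision_score, recall_score

y_true = np.array([1] * 120 + [0] * 80)                        # 120 actual positives, 80 negatives
y_pred = np.array([1] * 90 + [0] * 30 + [1] * 10 + [0] * 70)   # TP, FN, FP, TN in that order

print("precision:", precision_score(y_true, y_pred))    # 90 / (90 + 10) = 0.90
print("recall:   ", recall_score(y_true, y_pred))       # 90 / (90 + 30) = 0.75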
Receiver operating characteristic curve
It is commonly called the ROC curve.
It is a plot of the true positive rate (TPR) against the false positive rate (FPR).
True positive rate: TPR = TP / (TP + FN). False positive rate: FPR = FP / (FP + TN).
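A minimal ROC sketch with scikit-learn and matplotlib; the model and dataset are illustrative assumptions.

# ROC curve: TPR vs. FPR as the decision threshold varies.
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]   # positive-class probabilities
fpr, tpr, _ = roc_curve(y_test, scores)

plt.plot(fpr, tpr)
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.show()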
Sensitivity and Specificity
Then we have: sensitivity = TPR = TP / (TP + FN), and specificity = TN / (TN + FP), so FPR = 1 - specificity.
F1-value (also called F1-score)
It is hard to compare two classifiers using two measures. The F1-score combines precision and recall into one measure: F1 = 2pr / (p + r), the harmonic mean of precision and recall.
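Continuing the made-up counts from the precision/recall sketch (p = 0.90, r = 0.75), a one-line check of the formula:

# F1 as the harmonic mean of the earlier precision and recall values.
p, r = 0.90, 0.75
print("F1:", 2 * p * r / (p + r))   # ≈ 0.818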
Libraries
• scipy
• numpy
• matplotlib
• pandas
• sklearn
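A toy end-to-end sketch combining several of these libraries; the dataset and model choices are assumptions for illustration.

# pandas DataFrame for data handling, sklearn for learning, matplotlib for display.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

data = load_iris(as_frame=True)        # Bunch with a pandas DataFrame view
print(data.frame.head())               # inspect the data with pandas

scores = cross_val_score(DecisionTreeClassifier(), data.data, data.target, cv=5)
plt.bar(range(1, 6), scores)           # per-fold accuracy
plt.xlabel("fold")
plt.ylabel("accuracy")
plt.show()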
Transfer learning
Transfer learning generally refers to a process where a model trained on one problem is used in some way on a second, related problem.
Transfer learning has the benefit of decreasing
the training time for a neural network model and
can result in lower generalization error.
The weights in re-used layers may be used as the starting point for the training process and adapted in response to the new problem. This usage treats transfer learning as a type of weight initialization scheme. It is useful when the first, related problem has much more labeled data than the problem of interest, and the similarity in the structure of the two problems makes what was learned useful in both contexts.
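A minimal sketch of this weight-initialization view, assuming TensorFlow/Keras and an ImageNet-pretrained MobileNetV2; the input size and new head are illustrative.

# Re-use weights learned on problem 1 (ImageNet) as the start for problem 2.
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3),
    include_top=False,        # drop the original classification head
    weights="imagenet")       # weights from the first, data-rich problem
base.trainable = False        # freeze: re-used layers act as a fixed feature extractor

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # new head for the new problem
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Setting base.trainable = True with a small learning rate would instead
# fine-tune the re-used weights, adapting them to the new problem.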