
Machine Learning

1
An Application
 A credit card company receives thousands of
applications for new cards. Each application
contains information about an applicant,
 age
 marital status
 annual salary
 outstanding debts
 credit rating
 etc.
 Problem: to decide whether an application
should be approved, or equivalently, to classify
applications into two categories: approved and
not approved.

2
Machine learning
 Like human learning from past experiences.
 A computer does not have “experiences”.
 A computer system learns from data, which
represent some “past experiences” of an
application domain.
 Our focus is to learn a target function that can be
used to predict the values of a discrete class
attribute; here, in the example, approved or
not approved.

3
Some more examples of tasks that are
best solved by using a learning algorithm
 Recognizing patterns:
 Facial identities or facial expressions
 Handwritten or spoken words
 Medical images
 Generating patterns:
 Generating images or motion sequences
 Recognizing anomalies:
 Unusual sequences of credit card transactions
 Unusual patterns of sensor readings in a nuclear power plant
or unusual sound in your car engine.
 Prediction:
 Future stock prices or currency exchange rates

4
Some web-based examples of machine
learning
 The web contains a lot of data. Tasks with very big
datasets often use machine learning
 especially if the data is noisy or non-stationary.

 Spam filtering, fraud detection:


 The enemy adapts so we must adapt too.

 Recommendation systems:
 Lots of noisy data

 Information retrieval:
 Find documents or images with similar content.

 Data Visualization:
 Display a huge database in a revealing way

5
Related Fields

[Diagram: Data Mining and Knowledge Discovery shown at the intersection of Machine Learning, Statistics, Databases, and Visualization]

6
Statistics, Machine Learning and
Data Mining
 Statistics:
 more theory-based
 more focused on testing hypotheses
 Machine learning
 more heuristic
 focused on improving performance of a learning agent
 also looks at real-time learning and robotics – areas not part of
data mining
 Data Mining and Knowledge Discovery
 integrates theory and heuristics
 focus on the entire process of knowledge discovery, including
data cleaning, learning, and integration and visualization of
results
 Distinctions are fuzzy

7
Types of learning task
 Supervised learning
 classification is seen as supervised learning from
examples.
 Supervision: The data (observations, measurements, etc.) are
labeled with pre-defined classes. It is as if a “teacher” gives
the classes (supervision).
 Test data are classified into these classes too.

 Unsupervised learning
 Class labels of the data are unknown
 Given a set of data, the task is to establish the
existence of classes or clusters in the data

8
Supervised learning process: two steps

 Learning (training): Learn a model using the training data


 Testing: Test the model using unseen test data to assess the
model accuracy

Accuracy = Number of correct classifications / Total number of test cases
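As a minimal sketch, the accuracy formula can be read off directly in Python (the labels below are made up purely for illustration):

```python
# Accuracy = number of correct classifications / total number of test cases
true_labels      = ["approved", "approved", "not-approved", "approved", "not-approved"]
predicted_labels = ["approved", "not-approved", "not-approved", "approved", "not-approved"]

correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
accuracy = correct / len(true_labels)
print(accuracy)  # 0.8 -> 4 of the 5 test cases were classified correctly
```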

9
What do we mean by learning?

 Given
 a data set D,
 a task T, and
 a performance measure P,
 a computer system is said to learn from D to perform the
task T if after learning the system’s performance on T
improves as measured by P.

 In other words, the learned model helps the system to


perform T better as compared to no learning

10
An Example
 Data: Loan application data
 Task: Predict whether a loan should be approved
or not.
 Performance measure: accuracy.

 No learning: classify all future applications (test data)
to the majority class (i.e., Yes):
Accuracy = 9/15 = 60%.
 We can do better than 60% with learning.
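A small sketch of this no-learning baseline using scikit-learn's DummyClassifier; the 9 Yes / 6 No labels come from the slide, while the feature matrix is a placeholder:

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# 15 loan applications: 9 labeled Yes, 6 labeled No (as in the slide).
y = np.array(["Yes"] * 9 + ["No"] * 6)
X = np.zeros((15, 1))  # placeholder features; the baseline ignores them

baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
print(baseline.score(X, y))  # 0.6 -> the 60% majority-class accuracy
```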

11
Fundamental assumption of learning

Assumption: The distribution of training examples is identical


to the distribution of test examples (including future
unseen examples).

 In practice, this assumption is often violated to a certain degree.
 Strong violations will clearly result in poor classification
accuracy.
 To achieve good accuracy on the test data, training
examples must be sufficiently representative of the test
data.

12
 Classifier accuracy measures and their evaluation.

13
Evaluating Classification/Prediction
methods
 Predictive accuracy
 Efficiency
 time to construct the model
 time to use the model
 Robustness
 handling noise and missing values
 Scalability
 efficiency in disk-resident databases
 Interpretability
 understandability of, and insight provided by, the model
 Compactness of the model
 size of the tree, or the number of rules.
14
Evaluation methods of
classifier/Predictor
 Holdout set: The available data set D is divided into
two disjoint subsets,
 the training set Dtrain (for learning a model)
 the test set Dtest (for testing the model)
 Important: training set should not be used in testing
and the test set should not be used in learning.
 Unseen test set provides an unbiased estimate of accuracy.
 The test set is also called the holdout set
 This method is mainly used when the data set D is
large.
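A minimal sketch of the holdout method with scikit-learn; the Iris dataset, the decision tree, and the 70/30 split are illustrative choices, not part of the slides:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Disjoint Dtrain / Dtest: the test set is never touched during learning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))  # accuracy on unseen data
```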

15
Holdout method
 The holdout method has two basic drawbacks
 In problems where we have a sparse dataset we may not be able
to afford the “luxury” of setting aside a portion of the dataset for
testing
 Since it is a single train-and-test experiment, the holdout estimate
of the error rate will be misleading if we happen to get an
“unfortunate” split
 The limitations of the holdout can be overcome with a family of
re-sampling (cross-validation) methods, at the expense of more computation:
 Random Sub-sampling
 N-Fold Cross-Validation
 Leave-one-out Cross-Validation

16
Evaluation methods

1. Random Subsampling
 The holdout method is repeated K times, performing K data
splits of the dataset
 Each split randomly selects a (fixed) number of examples
without replacement
 For each data split we retrain the classifier from scratch with
the training examples and estimate the error Ei with the test examples
 The true error estimate is obtained as the average of the
separate estimates Ei
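A sketch of random subsampling with scikit-learn's ShuffleSplit; K = 10, the 30% test size, and the classifier are illustrative assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# K = 10 independent random splits, each holding out 30% of the data for testing.
splits = ShuffleSplit(n_splits=10, test_size=0.3, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=splits)

print(scores.mean())  # average of the K separate estimates
```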

17
Evaluation methods
2. N-fold cross-validation:
 The available data is partitioned into n equal-size disjoint
subsets.
 Use each subset as the test set and combine the rest n-1 subsets

as the training set to learn a classifier.


 The procedure is run n times, which gives n accuracies.

 The final estimated accuracy of learning is the average of the n

accuracies.
 10-fold and 5-fold cross-validations are commonly used.

 This method is used when the available data is not large.

 N-Fold Cross validation is similar to Random Subsampling


 The advantage of N-Fold Cross-Validation is that all the examples in
the dataset are eventually used for both training and testing
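A sketch of 10-fold cross-validation in the same style (dataset and classifier are again only placeholders):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Each of the 10 folds serves once as the test set; the other 9 train the model.
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=10)
print(scores.mean())  # final estimated accuracy = average of the 10 accuracies
```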
18
Evaluation methods

3. Leave-one-out Cross Validation :


 Leave-one-out is the degenerate case of N-Fold Cross
Validation, where N is chosen as the total number of
examples.
 For a dataset with N examples, perform N experiments;
for each experiment use N-1 examples for training and
the remaining example for testing.
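Leave-one-out can be sketched the same way; note it runs one experiment per example, so it is only practical for small datasets:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# N experiments: train on N-1 examples, test on the single held-out example.
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=LeaveOneOut())
print(scores.mean())
```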

19
Bootstrap
 Select training samples uniformly with
replacement
 Each time a tuple is selected, it is equally likely to be
selected again and re-added to the training set.
 Suppose we have a dataset of d tuples. It is sampled d
times with replacement, resulting in a bootstrap sample.
Try this out several times.
 E.g., the .632 bootstrap:
 Each tuple has a probability of 1/d of being selected,
and (1 - 1/d) of not being selected.
 If d is large, the probability of never being selected
approaches (1 - 1/d)^d ≈ e^-1 = 0.368, so 36.8% of the tuples
will not be selected for training and the remaining
63.2% will form the training set. Repeat this sampling
procedure k times.
20
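A small numpy sketch of one bootstrap sample that checks the 63.2% / 36.8% split; the dataset size d is an arbitrary illustrative value:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10_000  # illustrative dataset size

# Sample d tuple indices uniformly with replacement -> one bootstrap sample.
sample = rng.integers(0, d, size=d)

in_sample = np.unique(sample).size / d
print(in_sample)      # ~0.632: distinct tuples that form the training set
print(1 - in_sample)  # ~0.368: tuples never selected (usable for testing)
```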
Evaluation methods
 Accuracy is a measure to evaluate the classifier
 Accuracy is not suitable in some applications.
 In text mining, we may only be interested in the
documents of a particular topic, which are only a small
portion of a big document collection.
 In classification involving skewed or highly imbalanced
data, e.g., network intrusion and financial fraud
detections, we are interested only in the minority class.
 High accuracy does not mean any intrusion is detected.
 E.g., 1% intrusion. Achieve 99% accuracy by doing nothing.
 The class of interest is commonly called the positive
class, and the rest are called negative classes.

21
Precision and recall measures
We use a confusion matrix to introduce them
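The standard layout, with the class of interest as the positive class:

                     Classified positive    Classified negative
Actual positive      TP (true positives)    FN (false negatives)
Actual negative      FP (false positives)   TN (true negatives)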

22
Precision and recall measures

p = TP / (TP + FP)        r = TP / (TP + FN)
 Precision p is the number of correctly classified
positive examples divided by the total number of
examples that are classified as positive.
 The term precision indicates the relevancy of the
prediction, as it represents, out of all samples
labeled as class A, what fraction actually belongs
to class A.
23
Precision and recall measures

 Recall r is the number of correctly classified


positive examples divided by the total number of
actual positive examples in the test set.
 Recall indicates the fraction of class A that the
classifier picks up out of all samples that belonged
to class A.
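A sketch of both measures with scikit-learn's metrics; the label vectors are made up so the counts are easy to verify by hand:

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# 1 = positive class of interest, 0 = negative class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

print(confusion_matrix(y_true, y_pred))  # rows = actual, columns = predicted
print(precision_score(y_true, y_pred))   # TP / (TP + FP) = 2 / 3
print(recall_score(y_true, y_pred))      # TP / (TP + FN) = 2 / 4
```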

24
An example

 Consider a confusion matrix that gives
 precision p = 100% and
 recall r = 1%
because we classified only one positive example correctly
and no negative examples wrongly.
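One set of counts consistent with these numbers: TP = 1, FP = 0, FN = 99, which gives p = 1 / (1 + 0) = 100% and r = 1 / (1 + 99) = 1%.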

25
Receiver operating characteristic curve
 It is commonly called the ROC curve.
 It is a plot of the true positive rate (TPR) against the false
positive rate (FPR).
 True positive rate: TPR = TP / (TP + FN)
 False positive rate: FPR = FP / (FP + TN)
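A sketch of the curve with scikit-learn; the true labels and scores below are hypothetical classifier outputs:

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = [0, 0, 1, 1, 0, 1, 0, 1]                   # actual classes
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9]  # predicted scores for the positive class

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(list(zip(fpr, tpr)))             # the (FPR, TPR) points of the ROC curve
print(roc_auc_score(y_true, y_score))  # area under the ROC curve
```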

26
Sensitivity and Specificity

 In statistics, there are two other evaluation measures:


 Sensitivity: Same as TPR
 Specificity: Also called True Negative Rate (TNR)

 Then we have TPR = Sensitivity and FPR = 1 - Specificity.
27
F1-value (also called F1-score)
 It is hard to compare two classifiers using two measures. The F1-score
combines precision and recall into one measure, their harmonic mean:

F1 = 2pr / (p + r)

 The harmonic mean of two numbers tends to be closer to the
smaller of the two.
 For the F1-value to be large, both p and r must be large.
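With the earlier example (p = 100%, r = 1%): F1 = 2 × 1.0 × 0.01 / (1.0 + 0.01) ≈ 0.02, i.e., about 2%, close to the smaller of the two values, as the harmonic mean suggests.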

28
libraries

• scipy
• numpy
• matplotlib
• pandas
• sklearn
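A quick sanity check that this stack is installed (version numbers will vary by environment):

```python
import scipy, numpy, matplotlib, pandas, sklearn

for lib in (scipy, numpy, matplotlib, pandas, sklearn):
    print(lib.__name__, lib.__version__)
```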

29
 Transfer learning generally refers to a process
where a model trained on one problem is used in
some way on a second related problem.

30
 Transfer learning has the benefit of decreasing
the training time for a neural network model and
can result in lower generalization error.
 The weights in re-used layers may be used as the
starting point for the training process and adapted in
response to the new problem. This usage treats transfer
learning as a type of weight initialization scheme. It is
useful when the first, related problem has much more
labeled data than the problem of interest, and the
similarity in the structure of the two problems makes the
learned weights useful in both contexts.
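A hedged Keras sketch of transfer learning as weight initialization; MobileNetV2, the 160×160 input size, and the 5-class head are illustrative assumptions, not something stated in the slides:

```python
import tensorflow as tf

# Re-use weights learned on ImageNet (the data-rich "first problem").
base = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze the re-used layers initially

# New classification head for the related problem of interest (assumed 5 classes).
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(new_task_images, new_task_labels, epochs=5)  # then optionally unfreeze and fine-tune
```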
31
