Glossary of Terms Journal of Machine Learning

2/1/2017 Glossary of Terms Journal of Machine Learning

Machine Learning, 30, 271-274 (1998)

©1998 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.

Glossary of Terms
Special Issue on Applications of Machine Learning and the Knowledge Discovery
Process

Editors: Ron Kohavi ([email protected])

Foster Provost ([email protected])

To help readers understand common terms in machine learning, statistics, and data mining, we provide this
glossary. The definitions are not designed to be completely general, but instead are aimed at
the most common case.

Accuracy (error rate)

The rate of correct (incorrect) predictions made by the model over a data set (cf. coverage). Accuracy is
usually estimated by using an independent test set that was not used at any time during the learning
process. More complex accuracy estimation techniques, such as cross-validation and the bootstrap, are
commonly used, especially with data sets containing a small number of instances.
Association learning
Techniques that find conjunctive implication rules of the form ``X and Y implies A and B'' (associations)
that satisfy given criteria. The conventional association algorithms are sound and complete methods for
finding all associations that satisfy criteria for minimum support (at least a specified fraction of the
instances must satisfy both sides of the rule) and minimum confidence (at least a specified fraction of
instances satisfying the left hand side, or antecedent, must satisfy the right hand side, or consequent).
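
The two criteria can be illustrated with a minimal Python sketch. The toy instances, the item names, and the helper functions `support` and `confidence` are assumptions made for illustration; this is not one of the conventional association algorithms, which search over all candidate rules rather than scoring a single given rule.

```python
def support(instances, itemset):
    """Fraction of instances that contain every item in `itemset`."""
    if not instances:
        return 0.0
    hits = sum(1 for inst in instances if itemset <= inst)
    return hits / len(instances)


def confidence(instances, antecedent, consequent):
    """Fraction of instances satisfying the antecedent that also satisfy the consequent."""
    matching = [inst for inst in instances if antecedent <= inst]
    if not matching:
        return 0.0
    both = sum(1 for inst in matching if consequent <= inst)
    return both / len(matching)


# Hypothetical toy data: each instance is the set of items it satisfies.
instances = [
    frozenset({"X", "Y", "A", "B"}),
    frozenset({"X", "Y", "A"}),
    frozenset({"X", "Y", "A", "B"}),
    frozenset({"X", "B"}),
    frozenset({"Y", "A", "B"}),
]
lhs = frozenset({"X", "Y"})
rhs = frozenset({"A", "B"})

# Support counts instances satisfying both sides of the rule: 2 of 5.
rule_support = support(instances, lhs | rhs)
# Confidence counts, among the 3 instances satisfying the antecedent, the 2 that satisfy the consequent.
rule_confidence = confidence(instances, lhs, rhs)
```

A rule would be reported only if both values meet the user-specified minimums.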
Attribute (field, variable, feature)
A quantity describing an instance. An attribute has a domain defined by the attribute type, which denotes
the values that can be taken by an attribute. The following domain types are common:
Categorical
A finite number of discrete values. The type nominal denotes that there is no ordering between the values,
such as last names and colors. The type ordinal denotes that there is an ordering, such as in an attribute
taking on the values low, medium, or high.
Continuous (quantitative)
Commonly, a subset of the real numbers, where there is a measurable difference between the possible values.
Integers are usually treated as continuous in practical problems.

A feature is the specification of an attribute and its value. For example, color is an attribute. ``Color is blue'' is a
feature of an example. Many transformations to the attribute set leave the feature set unchanged (for example,
regrouping attribute values or transforming multi-valued attributes to binary attributes). Some authors use
feature as a synonym for attribute (e.g., in feature-subset selection).
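
As a small illustration of transforming a multi-valued attribute into binary attributes, the sketch below replaces one categorical attribute with one boolean attribute per value in its domain. The function name `binarize`, the dictionary representation of an instance, and the `attribute=value` naming scheme are assumptions made for this example.

```python
def binarize(instance, attribute, domain):
    """Replace one multi-valued attribute with one binary attribute per domain value."""
    # Copy every attribute except the one being transformed.
    result = {name: value for name, value in instance.items() if name != attribute}
    # Each new binary attribute records whether the original attribute had that value.
    for value in domain:
        result[f"{attribute}={value}"] = instance[attribute] == value
    return result


# Hypothetical instance with a nominal attribute (color) and an ordinal one (size).
instance = {"color": "blue", "size": "low"}
binarized = binarize(instance, "color", ["red", "blue"])
```

The feature ``Color is blue'' is still recoverable after the transformation: it is the binary attribute `color=blue` taking the value true.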

Classifier
A mapping from unlabeled instances to (discrete) classes. Classifiers have a form (e.g., decision tree) plus
an interpretation procedure (including how to handle unknowns, etc.). Some classifiers also provide
probability estimates (scores), which can be thresholded to yield a discrete class decision, thereby taking
into account a utility function.
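
One way to see how thresholding a score can reflect a utility function is the two-class sketch below. It assumes that the score is an estimate of the probability of the positive class, that correct decisions cost nothing, and that the only costs are `cost_fp` (false positive) and `cost_fn` (false negative); these names and the example values are hypothetical. Under those assumptions, predicting positive has the lower expected cost whenever the score is at least cost_fp / (cost_fp + cost_fn).

```python
def cost_threshold(cost_fp, cost_fn):
    """Score threshold at which predicting positive and negative have equal expected cost.

    Assumes correct predictions cost 0 and the score estimates P(positive).
    """
    return cost_fp / (cost_fp + cost_fn)


def classify(score, threshold):
    """Turn a probability estimate into a discrete class decision."""
    return "positive" if score >= threshold else "negative"


# Hypothetical costs: missing a positive is four times as bad as a false alarm.
threshold = cost_threshold(cost_fp=1.0, cost_fn=4.0)
labels = [classify(score, threshold) for score in [0.1, 0.25, 0.6]]
```

With equal costs the threshold is 0.5; making false negatives more expensive lowers it, so more instances are labeled positive.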

http://robotics.stanford.edu/~ronnyk/glossary.html

Confusion matrix
A matrix showing the predicted and actual classifications. A confusion matrix is of size LxL, where L is
the number of different label values. The following confusion matrix is for L=2:

actual \ predicted      negative      positive

Negative                   a             b
Positive                   c             d

The following terms are defined for a two by two confusion matrix:
Accuracy
(a+d)/(a+b+c+d).
True positive rate (Recall, Sensitivity)
d/(c+d).
True negative rate (Specificity)
a/(a+b).
Precision
d/(b+d).
False positive rate
b/(a+b).
False negative rate
c/(c+d).
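
The formulas above can be computed directly from the four cell counts. The sketch below uses the same letters a, b, c, d as the two by two matrix; the function name, the dictionary keys, and the example counts are assumptions for illustration, and the code assumes every denominator is non-zero.

```python
def confusion_metrics(a, b, c, d):
    """Metrics for a 2x2 confusion matrix.

    a: actual negative, predicted negative
    b: actual negative, predicted positive
    c: actual positive, predicted negative
    d: actual positive, predicted positive
    Assumes each row and the positive column contain at least one instance.
    """
    return {
        "accuracy": (a + d) / (a + b + c + d),
        "true_positive_rate": d / (c + d),   # recall, sensitivity
        "true_negative_rate": a / (a + b),   # specificity
        "precision": d / (b + d),
        "false_positive_rate": b / (a + b),
        "false_negative_rate": c / (c + d),
    }


# Hypothetical counts.
metrics = confusion_metrics(a=50, b=10, c=5, d=35)
```

Note that the true positive rate and false negative rate sum to one, as do the true negative rate and false positive rate, since each pair shares a denominator.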
Coverage
The proportion of a data set for which a classifier makes a prediction. If a classifier does not classify all
the instances, it may be important to know its performance on the set of cases for which it is ``confident''
enough to make a prediction.
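
A minimal sketch of reporting coverage alongside accuracy on the covered cases follows. Representing an abstention as `None`, and the function and variable names, are assumptions made for this example.

```python
def coverage_and_accuracy(predictions, actual):
    """Return (coverage, accuracy on covered instances).

    A prediction of None means the classifier declined to classify the instance.
    Accuracy is None when no instance is covered.
    """
    covered = [(p, y) for p, y in zip(predictions, actual) if p is not None]
    coverage = len(covered) / len(actual)
    if not covered:
        return coverage, None
    correct = sum(1 for p, y in covered if p == y)
    return coverage, correct / len(covered)


# Hypothetical predictions; the classifier abstains on two of five instances.
predictions = ["pos", None, "neg", "pos", None]
actual = ["pos", "neg", "neg", "neg", "pos"]
coverage, covered_accuracy = coverage_and_accuracy(predictions, actual)
```

Here the classifier covers three of the five instances and is correct on two of those three.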
Cost (utility/loss/payoff)
A measurement of the cost to the performance task (and/or benefit) of making a prediction y' when the
actual label is y. The use of accuracy to evaluate a model assumes uniform costs of errors and uniform
benefits of correct classifications.
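
The link between cost and accuracy can be sketched with a cost table indexed by (actual label, predicted label). The table layout, the label names, and the cost values below are assumptions for illustration. When every error costs 1 and every correct prediction costs 0, the average cost equals the error rate; a non-uniform table ranks the same predictions differently.

```python
def average_cost(predictions, actual, cost):
    """Average cost per instance, where cost[(y, y_pred)] is the cost of predicting y_pred when the label is y."""
    total = sum(cost[(y, p)] for p, y in zip(predictions, actual))
    return total / len(actual)


# Uniform costs: every error costs 1, every correct prediction costs 0.
uniform = {("neg", "neg"): 0, ("neg", "pos"): 1, ("pos", "neg"): 1, ("pos", "pos"): 0}
# Hypothetical skewed costs: missing a positive costs five times as much as a false alarm.
skewed = {("neg", "neg"): 0, ("neg", "pos"): 1, ("pos", "neg"): 5, ("pos", "pos"): 0}

predictions = ["pos", "neg", "neg", "pos"]
actual = ["pos", "pos", "neg", "neg"]

error_rate = sum(1 for p, y in zip(predictions, actual) if p != y) / len(actual)
uniform_cost = average_cost(predictions, actual, uniform)
skewed_cost = average_cost(predictions, actual, skewed)
```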
Cross-validation
A method for estimating the accuracy (or error) of an inducer by dividing the data into k mutually
exclusive subsets (the ``folds'') of approximately equal size. The inducer is trained and tested k times.
Each time it is trained on the data set minus a fold and tested on that fold. The accuracy estimate is the
average accuracy for the k folds.
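
The procedure can be sketched in a few lines of Python. The inducer here is a deliberately trivial, hypothetical one that always predicts the most common training label; the data, the function names, and the strided fold assignment (no shuffling) are all assumptions made for illustration. The sketch also assumes there are at least k instances, so that no fold is empty.

```python
from collections import Counter


def majority_inducer(train):
    """Hypothetical inducer: the model always predicts the most common training label."""
    most_common_label = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda features: most_common_label


def cross_validate(data, inducer, k):
    """Estimate accuracy by k-fold cross-validation over (features, label) pairs."""
    # Strided slicing gives k mutually exclusive folds of approximately equal size.
    folds = [data[i::k] for i in range(k)]
    accuracies = []
    for i, test_fold in enumerate(folds):
        # Train on the data set minus the held-out fold.
        train = [inst for j, fold in enumerate(folds) if j != i for inst in fold]
        model = inducer(train)
        correct = sum(1 for features, label in test_fold if model(features) == label)
        accuracies.append(correct / len(test_fold))
    # The estimate is the average accuracy over the k folds.
    return sum(accuracies) / k


# Hypothetical data set of nine instances: (feature vector, label).
labels = ["a", "a", "b", "a", "a", "b", "a", "a", "a"]
data = [((i,), label) for i, label in enumerate(labels)]
estimate = cross_validate(data, majority_inducer, k=3)
```

In practice the instances are usually shuffled before being split into folds; that step is omitted here to keep the result deterministic.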
Data cleaning/cleansing
The process of improving the quality of the data by modifying its form or content, for example by
removing or correcting data values that are incorrect. This step usually precedes the machine learning
step, although the knowledge discovery process may indicate that further cleaning is desired and may
suggest ways to improve the quality of the data. For example, learning that the pattern Wife implies
Female has a few exceptions in the census sample at UCI may indicate a quality problem.
Data mining
The term data mining is somewhat overloaded. It sometimes refers to the whole process of knowledge
discovery and sometimes to the specific machine learning phase.
Data set
A schema and a set of instances matching the schema. Generally, no ordering on instances is assumed.
Most machine learning work uses a single fixed-format table.
Dimension
An attribute or several attributes that together describe a property. For example, a geographical dimension
might consist of three attributes: country, state, city. A time dimension might include 5 attributes: year,
month, day, hour, minute.
Error rate
See Accuracy.


Example
See Instance.
Feature
See Attribute.
Feature vector (record, tuple)
A list of features describing an instance.
Field
See Attribute.
i.i.d. sample
A set of independent and identically distributed instances.
Inducer / induction algorithm
An algorithm that takes as input specific instances and produces a model that generalizes beyond these
instances.
Instance (example, case, record)
A single object of the world from which a model will be learned, or on which a model will be used (e.g.,
for prediction). In most machine learning work, instances are described by feature vectors; some work
uses more complex representations (e.g., containing relations between instances or between parts of
instances).
Knowledge discovery
The non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable
patterns in data. This is the definition used in ``Advances in Knowledge Discovery and Data Mining,''
1996, by Fayyad, Piatetsky-Shapiro, and Smyth.
Loss
See Cost.
Machine learning
In Knowledge Discovery, machine learning is most commonly used to mean the application of induction
algorithms, which is one step in the knowledge discovery process. This is similar to the definition of
empirical learning or inductive learning in Readings in Machine Learning by Shavlik and Dietterich. Note
that in their definition, training examples are ``externally supplied,'' whereas here they are assumed to be
supplied by a previous stage of the knowledge discovery process. Machine Learning is the field of
scientific study that concentrates on induction algorithms and on other algorithms that can be said to
``learn.''
Missing value
The value for an attribute is not known or does not exist. There are several possible reasons for a value to
be missing, such as: it was not measured; there was an instrument malfunction; the attribute does not
apply; or the attribute's value cannot be known. Some algorithms have problems dealing with missing
values.
Model
A structure and corresponding interpretation that summarizes or partially summarizes a set of data, for
description or prediction. Most inductive algorithms generate models that can then be used as classifiers,
as regressors, as patterns for human consumption, and/or as input to subsequent stages of the KDD
process.
Model deployment
The use of a learned model. Model deployment usually denotes applying the model to real data.
OLAP (MOLAP, ROLAP)
On-Line Analytical Processing. Usually synonymous with MOLAP (multi-dimensional OLAP). OLAP
engines facilitate the exploration of data along several (predetermined) dimensions. OLAP commonly
uses intermediate data structures to store pre-calculated results on multidimensional data, allowing fast
computations. ROLAP (relational OLAP) refers to performing OLAP using relational databases.
Record
See Feature vector.
Regressor
A mapping from unlabeled instances to a value within a predefined metric space (e.g., a continuous
range).

Resubstitution accuracy (error/loss)

The accuracy (error/loss) made by the model on the training data.
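
To make the distinction from test-set accuracy concrete, the sketch below uses a hypothetical inducer that simply memorizes its training instances. The inducer, the default label, and the toy data are assumptions for illustration only. Such a model scores perfectly when resubstituted on the training data, which says little about its accuracy on an independent test set.

```python
def memorizing_inducer(train, default="neg"):
    """Hypothetical inducer: the model looks up training instances and otherwise predicts a default label."""
    table = {features: label for features, label in train}
    return lambda features: table.get(features, default)


def accuracy(model, data):
    """Rate of correct predictions made by the model over a data set."""
    correct = sum(1 for features, label in data if model(features) == label)
    return correct / len(data)


# Hypothetical training and test sets of (feature vector, label) pairs.
train = [((1,), "pos"), ((2,), "neg"), ((3,), "pos")]
test = [((4,), "pos"), ((5,), "neg")]

model = memorizing_inducer(train)
resubstitution_accuracy = accuracy(model, train)
test_accuracy = accuracy(model, test)
```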
Schema
A description of a data set's attributes and their properties.
Sensitivity
True positive rate (see Confusion matrix).
Specificity
True negative rate (see Confusion matrix).
Supervised learning
Techniques used to learn the relationship between independent attributes and a designated dependent
attribute (the label). Most induction algorithms fall into the supervised learning category.
Tuple
See Feature vector.
Unsupervised learning
Learning techniques that group instances without a pre-specified dependent attribute. Clustering
algorithms are usually unsupervised.
Utility
See Cost.

[email protected]

