Glossary

The document is a glossary of terms related to machine learning and data processing, defining key concepts such as aggregation, automation, and classification. It includes explanations of various terms like false positives, data labeling, and supervised learning, providing insights into how machine learning models operate and are evaluated. This resource serves as a reference for understanding fundamental terminology in the field of machine learning.

Uploaded by

Yueyao Wang

12/10/2019 Glossary

Glossary
Aggregation
When you combine data from many different sources or times in order to lower the possibility of a
single individual being identified.
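
A minimal sketch of aggregation, assuming hypothetical run records. The field names and the minimum group size are illustrative assumptions, not part of the glossary. Only per-city totals are reported, and groups that are too small are suppressed because they could still point to one person.

```python
from collections import defaultdict

def aggregate_by_city(records, min_group_size=3):
    """Return per-city run counts and mean distance, dropping small groups."""
    groups = defaultdict(list)
    for record in records:
        groups[record["city"]].append(record["distance_km"])
    summary = {}
    for city, distances in groups.items():
        # Suppress small groups: a group of one is just an individual.
        if len(distances) >= min_group_size:
            summary[city] = {
                "runs": len(distances),
                "mean_distance_km": sum(distances) / len(distances),
            }
    return summary

# Hypothetical records for illustration only.
records = [
    {"user": "a", "city": "Oslo", "distance_km": 5.0},
    {"user": "b", "city": "Oslo", "distance_km": 7.0},
    {"user": "c", "city": "Oslo", "distance_km": 9.0},
    {"user": "d", "city": "Bergen", "distance_km": 4.0},
]
summary = aggregate_by_city(records)
```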

Augment
When a machine, software, or function extends a person’s abilities or potential while maintaining
their agency.

Automate
When a machine, software, or function performs a task without user involvement.

Binary Classification
When an ML model predicts which of two categories an example falls into, based on a set of
features.
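
As a rough illustration, a binary classifier often reduces to comparing a model score against a threshold. The function name, the labels, and the 0.5 threshold below are assumptions made for this sketch. The score itself would come from a trained model, which is not shown.

```python
def classify_sneaker(score, threshold=0.5):
    """Map a model score in [0, 1] to one of exactly two categories."""
    return "sneaker" if score >= threshold else "not sneaker"

# Hypothetical scores a model might have produced for three images.
labels = [classify_sneaker(score) for score in (0.92, 0.50, 0.13)]
```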

Classification
When a machine learning model identifies an object. In response to an identification question, the
simplest classification is “yes” or “no”. For example, if a model were shown a picture of a cat, it
could classify it as “Cat” or “Not a cat”. More complex classifications sort items into one of
several groups.

Confidence Level, Model Confidence
The confidence level for a model is a statistical measure of how certain a prediction or outcome is.

Context Errors
Situations when the product output doesn’t make sense in the user’s current context. Often, this
output is perceived as irrelevant by the user.

https://pair.withgoogle.com/glossary/ 1/7

Counterfactuals
Rationale for why something is classified as not within the given class. Usually in the form of a
statement of how the world would have to be different for a desirable outcome to occur.

Data Collection and Labeling
How product teams get the data they need and apply meaningful labels to it. For example:
acquiring millions of images of cats and dogs correctly labeled as “cat” or “dog”.

Data Distribution
Shows the frequency of specific values within a dataset. For example, you could find that your data
includes a high number of certain values, and lower numbers of others. Often approximates a “normal”
distribution, or a Gaussian curve, though many datasets do not.
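
One simple way to inspect a distribution is to count how often each value occurs. The sketch below uses Python's standard `collections.Counter` on a made-up list of ratings. The data is purely illustrative.

```python
from collections import Counter

# Hypothetical 1-5 star ratings.
ratings = [5, 4, 5, 3, 5, 4, 1]
distribution = Counter(ratings)

# Print a crude text histogram, one row per distinct value.
for value in sorted(distribution):
    print(value, "#" * distribution[value])
```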

Data Examples
Lines in a dataset or specific pieces of data, such as a photo of a shoe or run route.

Data Features
An individual measurable property or characteristic of an observable entity. Features should be
informative, discriminating, and independent.

Data Labels
Human-added descriptions for a piece of data, or example.

Explicit Data Collection
When you request information from users outright, like in feedback forms.

Explicit Feedback
Information solicited from users from within your app. For example: rating systems, review
requests, forms, or surveys.

False Negatives
When the ML algorithm classifies an object as not belonging to a certain category when it actually
does. For example, a search for sneakers that fails to return several genuine images of sneakers.

False Positives
When the machine learning algorithm classifies an object as belonging to a certain category, but it
is not in that category. For example, the algorithm incorrectly identifies a sneaker as a llama.

Features
Distinct data sources or machine learning calculations that influence a prediction or outcome.

Folk Theories
Invented (and usually false) ideas of how a product works based on existing mental models and
assumptions.

General System Explanations
Descriptions of general system functionality, i.e. how and why it uses inputs to generate outputs.

Heuristic-Based
Based on static if-then functions, or rules based on desired situation-result pairs. If a certain
situation arises, the software produces a specific result, every time.
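
A heuristic-based system can be sketched as a fixed chain of if-then rules. The rule set, function name, and route labels below are hypothetical. The point is only that the same situation always yields the same result, with nothing learned from data.

```python
def suggest_route(weather, distance_km):
    """Return a route suggestion from static situation-result rules."""
    if weather == "rain":
        return "indoor track"
    if distance_km > 15:
        return "flat route"
    return "default route"

rainy_day = suggest_route("rain", 5)
long_run = suggest_route("sun", 20)
short_run = suggest_route("sun", 5)
```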

Implicit Data Collection
When you gather information about users passively, usually through logging behavior.

Implicit Feedback
Information about user behaviors, preferences, and needs that’s gathered from their interactions
within your application or product. Often uses logging — records of what people do within your
app.

Inter-rater Reliability
Also known as inter-rater agreement or concordance. A score of how much consensus there is
between different raters performing the same task.
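
The glossary does not say which score to use. As an assumption for illustration, the sketch below computes two common choices for a pair of raters: raw percent agreement, and Cohen's kappa, which corrects for agreement expected by chance. The labels are made up, and the kappa function assumes the raters do not agree purely by construction (expected agreement below 1).

```python
from collections import Counter

def percent_agreement(labels_a, labels_b):
    """Fraction of items on which both raters chose the same label."""
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

def cohens_kappa(labels_a, labels_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(labels_a)
    observed = percent_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both raters pick the same label at random,
    # given how often each rater used each label.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical labels from two raters on the same four images.
rater_1 = ["cat", "cat", "dog", "dog"]
rater_2 = ["cat", "dog", "dog", "dog"]
agreement = percent_agreement(rater_1, rater_2)
kappa = cohens_kappa(rater_1, rater_2)
```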


Labeling/Labeled
A label is the description that is either given to a piece of data by a human or derived from user
actions. For example, labeling a photo as “sneakers”, or a run route as “hilly”.

ML Model
Mathematical algorithm that learns the statistical relationships among examples to make
predictions in the future.

Machine Learning
Techniques and methods to program computers to execute tasks without super-specific rules. ML
can help machines recognize patterns and adjust to unique situations.

Machine Learning (ML) Systems
Techniques and methods to develop AI, by getting computers to do something without being
programmed with super-specific rules. ML can help machines recognize patterns and adjust to
unique situations.

Mental Model
Users’ internal explanations of how something works. They shape how users interact with a
product or feature and its perceived value.

N-Best, N-Best Classifications, N-Best Lists
Refers to showing the top “n” solutions or suggestions, such as the top 5 matches for an image
search.
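
A minimal sketch of an n-best list, assuming the model has already produced a score per candidate label. The scores and label names are hypothetical.

```python
def n_best(scores, n):
    """Return the n highest-scoring (label, score) pairs, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]

# Hypothetical model scores for one image.
scores = {"sneaker": 0.71, "boot": 0.18, "sandal": 0.08, "llama": 0.03}
top_two = n_best(scores, 2)
```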

Network Effect
When a person starts or stops using a product or service because the majority of their network is
using it or not.

Overfitting
When a model is optimized for predictive power for a training dataset that is narrower than the ML
model’s intended use.


Partial Explanations
Messages that explain one aspect of how the system works. Ideally, this is the most important
aspect to the user.

Precision
The proportion of true positives correctly categorized out of all the true and false positives.
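
In other words, precision = TP / (TP + FP). A short sketch with made-up counts follows. Returning 0.0 when nothing was predicted positive is a convention chosen for this example, not something the glossary specifies.

```python
def precision(true_positives, false_positives):
    """Share of predicted positives that were actually positive."""
    predicted_positive = true_positives + false_positives
    if predicted_positive == 0:
        return 0.0  # assumed convention when the model predicts no positives
    return true_positives / predicted_positive

# Hypothetical counts: 8 correct sneaker predictions, 2 incorrect ones.
sneaker_precision = precision(true_positives=8, false_positives=2)
```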

Predictive Power
A percentage that refers to an ML model’s ability to correctly predict outcomes given a certain
input. A model with a predictive power of 100 gives the correct prediction every time; a predictive
power of 0 is purely random.

Probabilistic
Situations where there are multiple possible outcomes, each having varying degrees of certainty of
its occurrence.

Progressive Disclosures
A practice in UX when more information is revealed in subsequent screens or interactions.

Qualitative Feedback
Non-numeric feedback about how a user feels about a certain experience. Can include measures
of satisfaction, happiness, verbal responses or other qualities.

Quantitative Feedback
Feedback that is numeric or converted to a number. Both implicit and explicit feedback
mechanisms can be quantitative. This feedback can be fed back into your model for tuning.

Raters
The people who label the data used to train machine learning algorithms, specifically supervised
learning models.

Recall
The proportion of true positives correctly categorized out of all the true positives and false
negatives.
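
In other words, recall = TP / (TP + FN). A short sketch with made-up counts follows. Returning 0.0 when there are no actual positives is a convention chosen for this example.

```python
def recall(true_positives, false_negatives):
    """Share of actual positives that the model found."""
    actual_positive = true_positives + false_negatives
    if actual_positive == 0:
        return 0.0  # assumed convention when there are no actual positives
    return true_positives / actual_positive

# Hypothetical counts: 8 sneakers found, 4 sneakers missed.
sneaker_recall = recall(true_positives=8, false_negatives=4)
```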

Redaction
When some pieces of a dataset or profile are removed to lower the possibility of identifying a
single user based on their data profile. You can redact certain features of data to shrink the data
profile, or redact examples for a certain amount of time.
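
Redacting features can be as simple as dropping selected fields before the data is stored or shared. The profile fields below are hypothetical and serve only to illustrate the idea.

```python
def redact(profile, fields_to_remove):
    """Return a copy of the profile without the listed fields."""
    return {key: value for key, value in profile.items() if key not in fields_to_remove}

# Hypothetical user profile.
profile = {"user_id": "u123", "home_address": "12 Example Street", "weekly_km": 42}
redacted = redact(profile, {"home_address"})
```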

Regressions
Also known as linear regression algorithms, these try to find the best-fit line for a plot of data
points on a graph. As new data points appear over time, the algorithm adjusts the line to fit.
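
One common way to find a best-fit line is ordinary least squares, which is an assumption here since the glossary does not name a method. The sketch fits a line to made-up points, then refits after a new point arrives to show the line adjusting.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical points lying exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])

# A new, higher point arrives and the refitted line tilts upward.
new_slope, new_intercept = fit_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 14])
```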

Reward Function
Mathematical equation that your ML algorithm uses to optimize outputs. The function weighs
some results as better than others, and optimizes for certain outcomes.

Second-order Effects
When the aggregate of outcomes or behaviors over time produces additional, unexpected
outcomes.

Specific Output Explanations
Descriptions of how a system arrives at a specific output based on a certain input.

Supervised Learning
When you “teach” your algorithm on training data. Often this is based on examples manually
labeled by humans to show “right” and “wrong” answers.

Test Data
Datasets that you use to test your ML model to make sure its predictions work on data it hasn’t
encountered before.

Training Data
Datasets that you use to teach your ML model which outcomes correspond to which inputs.
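
Training data and test data (see the Test Data entry) are usually carved out of the same labeled dataset. The sketch below shows one simple way to do that, with a shuffled split. The 20% test fraction, the fixed seed, and the toy examples are assumptions for illustration.

```python
import random

def split_examples(examples, test_fraction=0.2, seed=0):
    """Shuffle labeled examples and split them into (training, test) lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    test_size = int(len(shuffled) * test_fraction)
    # The test portion is held back so the model never sees it during training.
    return shuffled[test_size:], shuffled[:test_size]

# Hypothetical (input, label) pairs.
examples = [(i, i % 2) for i in range(10)]
training_data, test_data = split_examples(examples)
```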

Transparency
Providing information about how a product works, including data sources, terms and conditions,
privacy, permissions, and rationale behind system output.

True Negatives
When the machine learning algorithm classifies an object as NOT in a certain category and it is
indeed not in that specific category. For example, it correctly classifies a llama as “not a sneaker”.

True Positives
When the machine learning algorithm classifies an object in a certain category, and the object is in
that category.
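
The four outcomes (true and false positives, true and false negatives) can be tallied by comparing predictions against the actual labels. The sketch below uses the glossary's sneaker and llama examples with made-up data. The function name and the dictionary keys are assumptions.

```python
def confusion_counts(actual, predicted, positive="sneaker"):
    """Count TP, FP, TN and FN for one category of interest."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for actual_label, predicted_label in zip(actual, predicted):
        if predicted_label == positive:
            # Predicted "in the category": correct means TP, otherwise FP.
            counts["TP" if actual_label == positive else "FP"] += 1
        else:
            # Predicted "not in the category": a missed positive is FN, otherwise TN.
            counts["FN" if actual_label == positive else "TN"] += 1
    return counts

# Hypothetical ground truth and model predictions for five images.
actual = ["sneaker", "sneaker", "llama", "llama", "sneaker"]
predicted = ["sneaker", "llama", "sneaker", "llama", "sneaker"]
counts = confusion_counts(actual, predicted)
```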

Tuning
When developers adjust their machine learning algorithm based on feedback or errors to improve
accuracy and performance.

Underfitting
When a model has low predictive power across a more varied dataset.
