LightSpeed Ans
Version 2.3.1
Contents
1. Introduction
1.1. Task Components
1.2. Steps in the Grading Process
1.3. Definitions
1.4. General Guidelines
2. Annotation Process
2.1. Understand the query
2.2. Review the results
• Overview of Result Types
2.3. Validate the result
• Wrong Language
• Content Unavailable
• Inappropriate
2.4. Rate the result
• Satisfaction Principles
• Degrees of Separation
• Think About the Meaning, Not Just Matching Words
• User Effort
• Source Quality
• Grading specific situations and result types
1. Ambiguous Queries (Multiple Interpretations)
2. Locale Sensitivity
3. English Results in Non-English Locales
4. Redirected Pages
5. Apps
6. News
7. Maps
8. Web Video
9. Dictionary, Stocks, Weather, Knowledge/Answers, Sports, and “Learn About” Queries
10. Web Results (also called Suggested Web Sites)
11. Web Images
12. Product Searches
13. Other Query Types
2.5. Review and submit
• Common grading mistakes
3. Additional Examples
3.1 Highly Satisfying
3.2 Satisfying
3.3 Somewhat Satisfying
3.4 Not Satisfying
Version History
Note: There are two platforms using these guidelines, Tag and Try Rating. In the majority of cases, the grading
instructions will be the same. However, there are some situations where the grade will be different depending on what
platform you are working on. These instances are noted in the guidelines when they occur.
A search service may return many different types of results. How are these graded? What is a satisfying search result? In these
guidelines, we talk about what constitutes a search query, the different types of results, and how to grade them. In addition, we
describe some typical grading tasks that use the principles learned in satisfaction grading.
Search engine users are trying to accomplish a task (or achieve a goal) that requires some information or quick access to some
other resource, such as an app.
A user’s information need or search need is defined as the information or resource that the user needs in order to accomplish
their task. The user's query is an attempt to express that need to the search engine. If the search results enable the user to
accomplish their task, we say that the search need is satisfied.
We say that a result is satisfying if it satisfies the search need of a query. Results can be more satisfying or less satisfying
depending on how well or how completely they satisfy the need. The purpose of this task is to improve the search results when a
user issues a query.
The grading interface displays each query together with additional information that provides useful context. As shown in the
figure above, this includes the following components:
Note: The tool interface may look different depending on which platform you are using, but the components of each task will be
the same.
5. Review and submit: Check your work for errors. Once you have reviewed your work, submit it and go to the next task.
• Ensure you have not made one of the common grading mistakes discussed in Section 2.5 before submitting the task.
Named Entity: A person, place, organization, business, product, service, or event whose name would normally be capitalized in English. (This includes fictional entities.)
Examples: Stephen Curry, Yellowstone National Park, Jupiter, Médecins Sans Frontières, Starbucks, Post-It Notes, Skype, Super Bowl LI, Boxer Rebellion, Frodo Baggins.
Knowledge Term: A word or phrase describing a concept or object of study (other than a named entity) that users may wish to learn more about. Knowledge terms may come from any field of study, including science, technology, mathematics, medicine, history, philosophy, literature, art, economics, etc. They are most often noun phrases, but may also be other parts of speech.
Examples: photosynthesis, elephant, ROC curve, linear algebra, cancer, oligarchy, veto, existentialism, metaphor, impressionism, interest rate.
Visually Distinctive Entity: Anything whose concept or identity can be usefully conveyed by a visual image. People and places are visually distinctive entities, but so are certain tools, geometric figures, geological or architectural features, and visual artworks.
Examples: Jacinda Ardern, Taj Mahal, ball-peen hammer, dodecahedron, mesa, flying buttress, "The Thinker" (sculpture by Rodin).
It is important that you research the query in order to fully understand the user intent. The query may be a common word that
you think you know. But the web search may show that the primary meaning is something entirely different. For example:
• Query is "canada goose"; result is the wikipedia page about that kind of bird. If you had not heard of the Canada Goose
clothing brand, you might assume that the bird page is what almost all users would want to see. But by looking at the web
search results, you can tell that this is not the case.
To disable ad blockers in Chrome or Safari, follow the browser's settings instructions. For other browsers, perform a web search for instructions on how to turn off ad blockers.
Web, Web Video, and News results:
• Click the result and review the information present on the page.
• Are the results timely, or are they stale and outdated given the user's query?
• Is the information the user requested present on the page?
• Does the user have to do additional scrolling or click an additional link to arrive at the information they've requested?
• Also note any broken links or warnings, or log-ins or pop-ups that might prevent some but not all users from viewing the content on the page.
Direct results, such as Sports, Maps, Weather, Movies, etc.:
• Only review the information present on the card. Do not click on it.
• Is the answer to the query that addresses the user's intent present on the card?
• If the location information is relevant to the query, is it present on the card?
• Is the date relevant and present?
Once you have reviewed the results and noted the relevant information, go to the next step.
Before you can grade the satisfaction of a result, you’ll be asked to indicate whether there are any problems that would prevent
you from judging it or result in an unsatisfying experience for the user. There are three types of result problems you’ll be asked to
identify: wrong language, content unavailable, and inappropriate.
Wrong Language
A result is in the wrong language if it is neither in English nor in the language of the user’s locale.
However, there are a few exceptions that are NOT considered wrong language results:
1. Result (e.g. amazon.co.jp) is the same country-specific site as requested by the query (“amazon.co.jp”), even if the requested
site is not in your locale.
2. User is visiting another country, query is for a local business or attraction, result is in the language of the visited country (i.e.
where query was submitted), and there is no equivalent result in the user’s own locale language.
3. Query is in a foreign language and result is in locale language, but query is also the name of a popular song, movie, business,
etc. in the current locale (e.g. “viva la vida” query in en-US).
4. The result is in English. English results are never considered Wrong Language.
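To make the decision flow concrete, here is a minimal sketch of the Wrong Language check in Python. It is illustrative only, not part of any grading tool; the exception flags stand in for judgments the grader makes by inspecting the task.

    # Hypothetical sketch of the Wrong Language flag described above.
    def is_wrong_language(result_lang: str, locale_lang: str,
                          same_country_site_as_query: bool = False,
                          travelers_local_result: bool = False) -> bool:
        if result_lang == "en":
            return False  # Exception 4: English is never Wrong Language
        if result_lang == locale_lang:
            return False  # covers Exception 3 (result is in the locale language)
        if same_country_site_as_query:
            return False  # Exception 1: e.g. query "amazon.co.jp" -> amazon.co.jp
        if travelers_local_result:
            return False  # Exception 2: traveling user, no locale-language equivalent
        return True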
When you flag a result as Content Unavailable, you must leave a comment describing the reason for the flag. Flag the result as Content Unavailable in any of these situations:
In some grading scenarios, you will need additional information to assign a grade. This additional information is called
Contextual Information, and it is provided in the tool to help you. There are two types of contextual information in this project:
Query Context and Result Context.
Result context is information provided about the result, such as distance from user location on a Maps result, or the date of a
news article on a News result.
Inappropriate
A result is considered inappropriate if it has any of the following: pornography, adult advertising/services, sex toys, illegal drugs,
hate speech, gambling, spam/phishing, pirated content (including those posing as free video streaming services), or gore/shock.
In general, we want to connect users with useful content for their topic of interest while protecting them from exposure to the
kinds of harmful content summarized below.
• Hateful: the result should not advocate discriminatory content that intentionally attacks someone’s dignity. This can
include references or commentary about religion, race, sexual orientation, gender, national/ethnic origin, or other targeted
groups.
• Violent or harmful: the result should not intentionally incite imminent violent, physically dangerous, or illegal activities,
nor provide information that leads to immediate harm.
• Sexually explicit: the result should not have overtly sexual or pornographic material, defined by Webster's Dictionary as
"explicit descriptions or displays of sexual organs or activities that are principally intended to stimulate erotic rather than
aesthetic or emotional feelings."
• Spam: results that are malicious, deceptive, or manipulative. Examples: pages that contain phishing schemes, install
viruses, or attempt to artificially boost their relevance (e.g., link farming, keyword stuffing, etc.).
• Results that do not contain original and useful content. Examples: pages with content scraped from Wikipedia or
otherwise automatically-created content.
• Illegal: We also manually remove reported results where removal is required by law in the corresponding locale (e.g.,
images of child abuse, content related to sex trafficking, copyright infringement, etc.) and where action is required to keep
people safe (e.g., involuntary posting of sensitive personal information). Sites posing as free movie-streaming services are
also part of this category, as are app sites that promote sideloading (which can lead to unsafe applications being installed
on phones).
Note 2: Content that might otherwise be considered inappropriate is acceptable if it occurs in a medical, educational, fine
art, or journalistic context, and should not be flagged (e.g., Wikipedia).
Inappropriate Examples
User searched for [tinyzone] and the result is https://tinyzonetv.to/ which contains pirated content.
User searched for [sdc.com] and the result is http://sdc.com/, or user searched [olga 24k gold] and the result is
https://www.lelo.com/blog/olga-24k-gold-review/. Both results contain adult advertising and should be flagged,
irrespective of whether the user was searching for this content.
• If working in TAG:
• If result is Wrong Language or Inappropriate, flag result and select Submit to go to the next query.
• If result is Content Unavailable, flag result and leave a comment about why. Submit to go to the next query.
When judging how satisfying each result is, you'll use the following scale:

Highly Satisfying: Almost all users would want to see this result. It's authoritative, accurate, up-to-date, and addresses the most likely search need(s). If the user is asking a specific question, the result gives the correct answer clearly and concisely. Note that some types of results can never be Highly Satisfying. Results for advice or recommendation queries (e.g., "how to lose weight", "chicken parmesan recipe", "best beatles song", "thai restaurant") can never be HS. This is because the result would be an opinion and we don't know if almost all users would agree with the recommendation.
• Query: microsoft. Result: their official website, microsoft.com. Almost all users searching for a company or organization would want to see its official web site.
• Query: instagram. Result: (screenshot not reproduced).

Satisfying:
• Query: how many stomachs does a cow have. Result: (web page screenshot). The page contains the answer, but the user has to do some extra work to find it: clicking on the result, reading and scrolling through it.

Somewhat Satisfying: Some users may find this result useful, but it's probably not what most searchers were looking for. It's often only indirectly related to the search need or assumes an uncommon interpretation of the query.
• Query: qr reader. Result: (screenshot). Probably not what most users were looking for. (If they had wanted the library, they would have mentioned it in the query.)

Not Satisfying:
• Result: https://hotsportsgirls.com/alica-schmidt/
• Result: home page for Harold's Kitchen & Bar in British Columbia, Canada. Despite the similar name, this result is for a restaurant 3000 miles away from the user. (And there is a different Harold's Kitchen near the user.)
• Result is for a previous year's Tour de France, and is not even the stage the user asked for.
If the query is misspelled or incomplete, grade as if the user typed the corrected or completed query. Examples:
• Query is "fac," result is "facebook.com". Grade as if the query was "Facebook."
• Query is "ted cruise," result is a wikipedia page about U.S. senator Ted Cruz. Grade as if the query was "ted cruz."
Satisfaction Principles
There are several factors you should consider when you grade a result.
Degrees of Separation
Results are often associated with concepts in the real world, and different concepts are connected by their relationships.
Each time we pass through one of these relationships, we increase the distance from the original concept.
For example, for the query "Beyoncé":
• The singer's official site: Highly Satisfying.
• A Rolling Stone magazine review of the album Lemonade: Somewhat Satisfying.
• The reviewer Rob Sheffield's Twitter: Not Satisfying.
• A random article from the same issue of Rolling Stone: Not Satisfying.
We can think of these relationships as “degrees of separation” so in this example, the review of the Lemonade album is two
degrees of separation from Beyoncé.
When grading results, each degree of separation from the concept mentioned in the query (that is, each relationship you have to traverse to get to the result) lowers the satisfaction grade by one level. See the list above.
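As an illustration only (not part of the grading tool), the one-level-per-degree rule could be sketched in Python as follows; the grade names match the scale in Section 2.4.

    # Illustrative sketch: lower the grade one level per degree of separation.
    GRADES = ["Not Satisfying", "Somewhat Satisfying", "Satisfying", "Highly Satisfying"]

    def apply_degrees_of_separation(base_grade: str, degrees: int) -> str:
        idx = GRADES.index(base_grade) - degrees
        return GRADES[max(idx, 0)]  # bottoms out at Not Satisfying

    # Query "Beyoncé": a result that would be Highly Satisfying at zero degrees
    # (her official site) drops to Somewhat Satisfying at two degrees (the review).
    assert apply_degrees_of_separation("Highly Satisfying", 2) == "Somewhat Satisfying"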
Example Scenarios
Note that some highly satisfying results may not contain all (or even any) of the query words; what matters is the meaning. For
example:
• The result www.premierleague.com/home is highly satisfying for the query “english premier league soccer” even
though that result doesn’t contain the words “english” or “soccer.”
• The result https://music.apple.com/us/album/25/1544494115 is satisfying for the query "adele's third album," even
though it doesn't contain the word "third."
It's also possible for a result to contain all the query words and not be satisfying.
• The result https://en.wikipedia.org/wiki/My_Girl_Has_Gone (a web page about a song from the 1960s) is not satisfying for
the query “gone girl,” even though the result contains both query words. Gone Girl is the title of a book and movie from
the 2010s, and the song result is clearly not what the user intended.
When the user is looking for specific information, a result that displays this information directly is preferable to a regular web
result. For example:
• If the query is “how old is Obama”, then a Knowledge card that directly displays his age without requiring any user action
is better than a web result that the user needs to click on, wait for it to load, and scroll through to find the desired
information.
Example of card showing Obama’s age. Check that the answer is indeed correct.
Source Quality
Sources of results, including web sites and news providers, can have large differences in quality. If the source of a result is low
quality, you should assign a lower grade than you would have otherwise. Source quality is based on several factors:
1. Writing
• High quality source: has a neutral point of view, or makes point of view clear.
• Low quality source: has "hidden agendas," such as pretending to offer information while actually trying to sell its
services.
2. Motivation
When the source is a web URL, this means that the response is taken from that web page. Some of those pages may be from
authoritative sites whose content is carefully curated by experts, while others may be created by people expressing their
uninformed opinion, or worse, promoting conspiracy theories and other misinformation.
Look at the source page and see if you can determine how much you can trust the information provided. Sometimes checking
the “About Us” page or the wikipedia page of the website (if any), or doing a third-party search can help you better determine
the trustworthiness of the Source.
• High quality source:
◦ This is a well-known, authoritative source created by people who are either experts on the subject, are the inventors/creators/owners of the subject, or use professional standards of research to create the content.
◦ Includes news articles or news subpages dedicated to certain topics, written by professional journalists or experts on the subject. For example, the BBC subpage dedicated to science, written and curated by experts in their respective fields: https://www.bbcearth.com/science
• Low quality source:
◦ The content may be created by a random person who knows nothing about the subject, or by an organization with a particular political or commercial agenda (e.g. pretending to give information when actually trying to sell you a product or service).
◦ The information it provides is misleading or incorrect, often supporting a conspiracy theory, or has no purpose other than to get users to click on links or ads.
4. Use of Citations
• Low quality source: makes medical or scientific claims without citations or evidence.
Source quality is meant to be only one factor in evaluating a query result. For example, the result can still be unsatisfying
even if it comes from a trustworthy site (e.g., it does not answer the user intent). And even if you consider a site to be merely
neutral, the result can still be highly satisfying.
Source quality also depends on the query. Some websites meet all the criteria of being a high quality source, but the site is not
an appropriate source given the query intent. For example, The Onion or Punch satire websites are sources of high quality humor,
but they are not trustworthy sources for news.
Always keep in mind the user intent when considering Source Quality.
While most queries express several different user intents, some queries are also ambiguous in what they refer to (e.g. “apple”
could be a company or a fruit). In this case you should still grade the result, using the following additional guidelines.
If you're not sure whether there is a dominant interpretation, look at the web search results for the query. If most of the highly
ranked results on the first page are for one interpretation, then you should consider that to be the dominant interpretation.
Dominant Interpretation Exists (when one interpretation is much more popular than the others):
Secondary Interpretation: If a result would be relevant (HS/S/SS) for a secondary interpretation, you should grade it as "SS".
Example 2: Query is "american eagle", result is home page of web developer americaneagle.com. Grade as SS (rather than HS), since the dominant interpretation of the query is clothing retailer American Eagle Outfitters.
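Stated as a rule, the cap on secondary interpretations can be sketched as follows; this is illustrative only, and the function and flag names are hypothetical.

    # Illustrative cap for secondary interpretations when a dominant one exists.
    def ambiguous_query_grade(base_grade: str, matches_dominant_interpretation: bool) -> str:
        if matches_dominant_interpretation:
            return base_grade
        # relevant (HS/S/SS) for a secondary interpretation -> capped at SS
        return "Somewhat Satisfying" if base_grade != "Not Satisfying" else base_grade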
2. Locale Sensitivity
Explicitly Locale-Sensitive: The query explicitly specifies that the user is seeking results from a locale that differs from their current location. Results that do not pertain to the locale specified in the query should be automatically graded as "NS".
Example: Query is "amazon france". The user is in the EN-GB locale. The result is https://amazon.co.uk. Grade as NS, since the Amazon page in the UK is not what the user is searching for.
Implicitly Locale-Sensitive: The query does not explicitly ask for results in a particular locale, but the user need is inherently locale-specific (e.g., local law information, country-specific merchant sites, nearby real-world businesses). Any results from a different locale (even if they're in the correct language) should be automatically graded as "NS".
Example: Query is "ticketmaster"; user is located in the US. Result is ticketmaster.co.uk. Grade as NS, since the user did not express any interest in UK events.
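The common rule in both cases is that a locale mismatch overrides everything else. A minimal sketch, assuming the grader has already decided whether the query is locale-sensitive and which locale it requires (names are hypothetical):

    # Hypothetical helper: a locale mismatch on a locale-sensitive query is always NS.
    def locale_sensitive_grade(result_locale: str, required_locale: str,
                               base_grade: str) -> str:
        if result_locale != required_locale:
            return "Not Satisfying"  # automatic NS per the rules above
        return base_grade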
Example Scenarios
English is a widely-understood second language in many countries, and all our international graders are fluent in it. For this
reason, rather than simply marking an English result in a non-English locale as “wrong language,” graders should go ahead and
grade the result, with the following locale-specific considerations. You will need to use your own knowledge of the locale to
decide which guideline to apply.
Scenario: The user's locale is one where many users understand English fluently (e.g., Western Europe) and would possibly be interested in English-language results.
Grade: Grade the result one level lower than you would if it were in the locale language.
⚠ Results that would have been NS should still be graded as NS.
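In other words, the downgrade is a fixed one-level shift with a floor at NS. A small lookup table (illustrative only) captures it:

    # Sketch of the one-level downgrade for English results in locales
    # where users understand English fluently. NS stays NS.
    ENGLISH_RESULT_DOWNGRADE = {
        "Highly Satisfying": "Satisfying",
        "Satisfying": "Somewhat Satisfying",
        "Somewhat Satisfying": "Not Satisfying",
        "Not Satisfying": "Not Satisfying",
    }
    grade = ENGLISH_RESULT_DOWNGRADE["Satisfying"]  # -> "Somewhat Satisfying"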
4. Redirected Pages
If the result displayed URL gets redirected to a different URL, then you should grade the page you’re redirected to as if that were
the result.
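If you want to confirm where a displayed URL actually lands before grading, something like the following (a sketch using the third-party requests package; the URL is just an example) prints the final destination and each intermediate hop:

    # Follow redirects and show the page that should actually be graded.
    import requests

    resp = requests.get("http://example.com/some-result",
                        allow_redirects=True, timeout=10)
    print(resp.url)  # final URL after all redirects -- grade this page
    for hop in resp.history:
        print(hop.status_code, "->", hop.url)  # each redirect along the way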
5. Apps
When a user clicks these results, it takes them to the app store (usually the Apple App Store) or opens the app if it is present on the device.
• Scenario 6 below refers to cases where the query is the name of a well-known app — a service that is best known as an app.
A well-known app is not the same thing as a well-known company!
• Scenario 10 below refers to cases where the query is a business and the result is an app “regularly used to interact with that
business.” Meaning, the app is a common way that customers or clients perform the ordinary tasks they need to do business
with that company.
For example, the query “dell” refers to the name of a computer company. But their app “Dell@Retail 2019” is described as “a
chance for our global retail partners to immerse themselves in the design, performance, and vision driving Dell’s innovation.”
This app is NOT used regularly by Dell’s customers and should NOT be graded HS.
• If the query is the name of a bank, then the app should allow the user to perform mobile banking tasks.
• If the query is the name of a restaurant chain, then the app should allow the user to order food at that restaurant.
• If the query is the name of an airline, then the app should allow the user to make reservations, choose their seat
assignment, and check flight status.
• If the query is the name of a retail chain, then the app should allow the user to browse and purchase items sold by that
chain.
Example Scenarios
News articles usually have the word “News” prepended to them. They are specific web results that link to news websites.
• The relevance grade for a news article depends in part on the amount of time between the date the search was done and
the date of the article.
• Keep in mind validity flags (Inappropriate, Wrong language, and Content Unavailable).
• Just because a news story mentions an entity does not mean it's about that entity. If the entity is not a primary topic of the
story, the article is not about the entity.
◦ Example: Query is "starbucks" and result is a news article about a man who died in a traffic accident. The article
mentions the fact that the man worked at Starbucks, but his death had nothing to do with the company or the fact
that he worked there. This is NOT a news article about Starbucks, so Scenario 14 below does not apply.
• News items may be Highly Satisfying, but only in limited cases. One news organization (even one reporter) may write
several stories about the same event, and one person only likes stories from Fox News while another prefers MSNBC. For
these reasons, we usually can't say that a given news story is one that almost everyone wants to see.
However, if the article is timely, accurate, well written, and highly relevant to the query, and comes from a well-known and
established (in its respective locale and geographical area) news source, the result may be HS.
Historical Event: Time sensitivity does not impact the relevance grade of results for these types of queries. Examples of historical events are the Notre Dame fire, the Harry and Meghan wedding, the Sandy Hook shooting, Pope Benedict's resignation, etc.
Note 4: You might see articles with dates in the future! For these rare occurrences, grade it the same way as a timely
article, as long as the date is not more than 3 months newer than the search date. If the date is more than 3 months
newer, flag the result as Content Unavailable.
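A sketch of the Note 4 rule, approximating "3 months" as 90 days purely for illustration:

    # Illustrative check for future-dated articles (Note 4).
    from datetime import date, timedelta

    def future_date_action(article_date: date, search_date: date) -> str:
        if article_date <= search_date:
            return "grade normally"
        if article_date - search_date <= timedelta(days=90):  # ~3 months
            return "treat as a timely article"
        return "flag as Content Unavailable"

    assert future_date_action(date(2024, 3, 1), date(2024, 1, 5)) == "treat as a timely article"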
The following chart contains examples of these news sites for en locales. Note: this list is not exhaustive. There may be news
sources that are considered high quality but are not represented below.
For all locales: refer to Satisfaction Principles: Source Quality to help you judge whether or not a news source would be
considered a high-quality, trusted source.
Locale: Examples
en_us: wsj.com, cnn.com, foxnews.com, reuters.com, washingtonpost.com, npr.org, bloomberg.com, bbc.com
en_gb: telegraph.co.uk, bbc.co.uk/news, independent.co.uk, theguardian.com, news.sky.com, bbc.com
en_au: businessinsider.com.au, news.com.au, theage.com.au, theguardian.com.au, abc.net.au, 9news.com.au, smh.com.au
en_ca: huffingtonpost.ca, globalnews.ca, thestar.com, ctvnews.ca, cbc.ca/news, theglobeandmail.com, thecanadianpress.com, nationalpost.com
Example Scenarios
Scenario 14 (Named Entity News): Query is a named entity and there is something highly topical about the entity that people are searching for. Grade: Highly Satisfying.
Example: https://www.nbcnews.com/news/us-news/last-batch-unsealed-jeffrey-epstein-documents-released-rcna132936
In the recent news (as of January 5, 2024), Jeffrey Epstein-related documents have been released; hence a high-quality, timely news article about the document release might be HS (and comparable to his wikipedia entry). However, if it's not topical (say, a month old), the news article might be S.
The relevance of Maps results depends in part on the distance from the user.
• You should check to see if the info card has distance displayed. If not, this result must be flagged as Content Unavailable
(and graded as Not Satisfying if working in Try Rating).
• Queries with a map intent often have a distance qualifier, e.g., "nearest", "closest", "near me".
• Such queries often relate to businesses one must physically go to, e.g., gas stations, cinemas.
1. Grade on what is visible: Only use what is in the title and description to grade. Do not grade NS just because clicking the
result takes you nowhere or to the wrong place.
• Note: at times a query will return multiple possible Maps results. In these cases, assign the grade based on the first result
only.
2. "Permanently closed": You might see this phrase in the card for a business. We still surface these results, as knowing
whether a business is permanently closed or temporarily inactive is important. In this case, a "permanently closed" label
lowers the result's rating by 1 if a similar or same business is open and nearby; otherwise there is no penalty, as sketched below.
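    # Illustrative sketch of the "permanently closed" penalty (rule 2).
    GRADES = ["Not Satisfying", "Somewhat Satisfying", "Satisfying", "Highly Satisfying"]

    def closed_label_grade(base_grade: str, similar_business_open_nearby: bool) -> str:
        if not similar_business_open_nearby:
            return base_grade  # otherwise no penalty
        return GRADES[max(GRADES.index(base_grade) - 1, 0)]  # one-level penalty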
• People living in sparsely populated rural areas are generally willing to travel longer distances than people in cities. If the
query “restaurants” is issued in Wilsal, MT (population 237), then a result 39 miles away in Bozeman (population 39,860)
might be S. But if the same query were issued in New York City, a result 36 miles away in Greenwich, CT would be NS.
4. Keep in mind Intent and Distance! For some queries, users are looking for a Maps result. For other queries, they aren't. If a
Maps result is shown for a non-Maps intent query, then grade it as NS. Use the distance to guide you. If a Maps result is very
far away, that’s often a sign that the user was not looking for a map.
• Query is "prime video" and result description is: "prime time video, 2511 springs rd ne, hickory, nc 28601- distance: 529
mi”
• Query is "Lakers" and result description is: "great lakes brewing company, 2516 market ave, cleveland, oh 44113 -
distance: 2,165 miles
Business: Maps result is correct and near the user, but is not the closest one. Grade: Satisfying.
Business: Maps result is correct, and is still accessible to the user but is not close. Grade: Somewhat Satisfying.
8. Web Video
• If a query specifically refers to a particular video (e.g., “lemonade official video,” “stepanov elements of programming
lecture”), the desired result should be graded as Highly Satisfying regardless of its popularity.
• For other results, and for more general queries where many different video results could satisfy the user's need (e.g., “guitar
lesson”), then popularity may factor into your decision; you may want to grade a video with millions of views higher than a
similar one with only a handful.
• When deciding on your grade, think about whether video results are what the user is looking for when typing the query.
• You are not required to watch the entire video to arrive at a rating.
Grade these cards based on what is visible. The grader cannot click on them, but the user is provided self-contained snippets
of information which can often be interacted with to learn more (e.g., the Stocks card opens up to show historic price graphs).
• Dictionary: Is the user seeking a definition or a concept? If the card precisely answers the need, this is Highly Satisfying.
In all cases it must be the correct interpretation for that word.
• Weather: the result’s location should match the location specified in the query (e.g. “weather boston”), or the user’s
location if location is not mentioned in query.
• Answers: grade on what is visible. If the query is an explicit question, see Scenario 20 below.
Note 5: For all these cards, ensure your browser window is expanded. A small browser window causes the cards to resize,
potentially hiding information that would have been shown to the user, and this might affect your rating.
Additionally, you must still do web research to ensure correctness and relevance of information shown in the card.
To grade web results such as Scenarios 21 and 23 below, you must click on the web result and verify whether or not the
requested information is available in order to properly grade the result.
Please click on the thumbnail and grade the destination page (after redirects).
• If the web search results show results for a corrected or autocompleted version of the query, you should grade your result
as if the user typed the corrected or completed query.
Example Scenarios
A group of web images should be graded as a single result. Check to see if all the images have the following properties:
• Image displays correct subject. The image must actually show the subject of the query. For example, if the query is
“dodecahedron,” the image must actually show that geometric figure and not some other one. Missing images (or ones
that do not load) do not have this property.
• Subject clearly shown. All images in the set must clearly show the subject of the query. The subject should not be
blocked, out of focus, too far away, or otherwise difficult to see clearly.
• Subject is focus of image. In cases where the image includes multiple people or objects, it should be clear who or what
is the subject of the query. (For example, if the query is “Joe Biden,” it’s fine to have people in the background of a picture
of President Biden giving a speech, but it’s not fine to have a picture of Presidents Biden and Macron shaking hands.)
• Image shows representative version of subject. For example, if the query is the name of a currently popular actor, the
image should show that person as they look today (or how their character looks in a currently popular movie), not how
they looked many years ago. If the query is the name of a famous person from the past who is no longer alive, the image
should show them as they were best known. For example, if the query is “Richard Nixon,” a picture should show him
during the time he was U.S. president, not 20 years later when he was near the end of his life.
If ALL the images have all of the above properties, grade the result Highly Satisfying. Otherwise, downgrade the results as shown
in the following table.
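Conceptually, the check is a conjunction over every image and every property; if any single check fails, the group cannot be Highly Satisfying. A sketch (illustrative only), with property names standing in for the four bullets above:

    # Illustrative all-images, all-properties check for a web-image group.
    REQUIRED = ("correct_subject", "clearly_shown", "subject_is_focus", "representative")

    def group_can_be_highly_satisfying(images: list[dict]) -> bool:
        return all(img.get(prop, False) for img in images for prop in REQUIRED)

    # One blurry image ("clearly_shown" failing) rules out Highly Satisfying.
    group = [dict.fromkeys(REQUIRED, True),
             {**dict.fromkeys(REQUIRED, True), "clearly_shown": False}]
    assert not group_can_be_highly_satisfying(group)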
Query is David Beckham, result is set shown above. It has all the
desired properties, so you would grade as Highly Satisfying.
If the user is searching for a product and the result is a page where the product can be purchased, but the item is unavailable or
out-of-stock, you may want to lower the grade in certain cases:
• If the query describes something very specific, the user usually wants only that item. Showing the product page for the
item is the best you can do, even if the item is out of stock, so that result should not be penalized. Example queries:
◦ “our missing hearts by celeste ng” [a specific book; user doesn’t want any book]
◦ “iPhone 14 pro max 512gb” [a specific model and configuration of a product]
• If the query describes something general, or where there are reasonable substitutes, the user would probably rather
see an in-stock substitute rather than an out-of-stock exact match. So you should lower the grade of the out-of-stock
result. Example queries:
◦ usb to usb-c adapter [there are many different, equally good ones from different brands]
◦ bounty paper towels 12-pack [the user might be just as happy with two 6-packs of the same brand]
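Put differently, out-of-stock only matters when reasonable substitutes exist. A sketch of the rule, assuming (for illustration only) a one-level penalty:

    # Illustrative out-of-stock adjustment for product searches.
    GRADES = ["Not Satisfying", "Somewhat Satisfying", "Satisfying", "Highly Satisfying"]

    def product_page_grade(base_grade: str, in_stock: bool,
                           query_is_very_specific: bool) -> str:
        if in_stock or query_is_very_specific:
            return base_grade  # an exact-match page is the best we can do
        return GRADES[max(GRADES.index(base_grade) - 1, 0)]  # penalize generic queries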
Example Scenarios
Scenario 36 (Any Query; Off-Topic Result): Result that is not about the query topic. Note that in some cases the URL may appear to be about the query, but clicking through shows that the destination page is not related. Grade: Not Satisfying.
Examples:
a. Query is "samsung tv", result is web page for Samsung washing machine.
b. Query is "obama age", result gives the age of Joe Biden.
c. Query is "Messi goals" (Messi is a soccer player), result is total goals by Barcelona (his team).
d. Query is "target stores", result is about an Ace Hardware store location.
After you have assigned a grade, review your work for errors. Ensure you have not made one of the common grading mistakes
discussed below.
1. Misunderstanding Query Meaning. The query may be a common word that you think you know. But the web search may
show that the primary meaning is something entirely different.
• Example: Query is "canada goose"; result is the wikipedia page about that kind of bird. If you had not heard of the Canada
Goose clothing brand, you might assume that the bird page is what almost all users would want to see. But by looking at
the web search results, you can tell that this is not the case.
2. Misunderstanding Dominant Interpretation. This is a slight variation of the previous error. Based on your personal
experience, you may know that there is more than one interpretation of the query, but you may not realize that one is dominant.
• Example: Query is "jaguar"; result is the home page for the car company. If you believe the animal is the dominant
interpretation, you would downgrade the car company result. But by doing the web search, you can see that the car
company is actually the dominant interpretation, accounting for all but one of the results on the first page of both Google
and Bing results.
3. Falsely Assuming Dominant Interpretation. If you have heard of a result, you may assume that it's the dominant
interpretation. But this is not always true.
• Example: Query is "u of m scholarships," result is a page about scholarships at the University of Michigan. A grader who
knew nothing about the subject might conclude that this is a great result, and rate it Highly Satisfying. But looking at the
⚠ Do not use web search ranking to determine grade! The only purpose of looking at the web search (Google and Bing)
results is to make sure you understand the possible meaning(s) of the query, and which meaning is dominant. You should never
use the ranking on the search result page to decide your grade. In other words, you should never think (for example) "Google
says this is the #1 result, so it must be Highly Satisfying," or "Bing puts this at the bottom of the page, so it must not be that
good." Once you understand the query, only these guidelines and your judgment should determine the grade.
Another class of mistakes can occur when the grader fails to visit the destination page of a web/news result, and in particular, if
they try to grade a web/news result based only on the URL and/or snippet.
1. Missing Error Condition. The URL and/or snippet may make this look like a perfect result ‒ perhaps the home page of a
company. But if you actually clicked on it, you'd discover that the page does not load, or redirects to some entirely unrelated
page.
• Example: Query “valco shopping center,” result is www.valcoshoppingcenter.com. If you click on the result, you’ll be
taken to an advertising page that has nothing to do with the shopping center (which is out of business).
2. Incorrect Page Owner Assumption. The URL may be a perfect match for the name of a company or product you're
familiar with. But if you visited the destination page, you'd see that it's actually for an entirely different company with a
similar name.
• Example: Query "american eagle," result is www.americaneagle.com. Since American Eagle is a well-known clothing
brand, you assume the page is the home page of that company. But it isn't. Clicking on the result would have shown that
it's the home page of a web design company, which is not what most searchers are looking for.
Many grading mistakes happen when the grader doesn't pay attention to the time or place of the query and/or result.
1. Mismatched Location. Graders usually notice when the user is in one location and the result is a Map to a very distant
location. But they frequently miss the case where the result is a web result for a very distant location.
• Example: User is in Virginia (state in Eastern U.S.), query is "harold's kitchen menu." Result is home page for Harold's
Kitchen and Bar. At first glance, this looks like a Highly Satisfying result. It's a restaurant with a matching name, and the
page shows their menu. But a closer look shows that this restaurant is actually in Richmond, British Columbia, Canada ‒
nearly 3000 miles (5000 km) away from the user. It is extremely unlikely that this was the result the user was looking for
(especially since there is a different restaurant named Harold's Kitchen close to the user's location).
2. Mismatched Date. Graders may notice the date of a news story, but forget to notice the date of the search. Or they may
not notice an implicit date in the content of a web result.
• Example: Query dated 2022 is "presidential election results"; result is a page showing the results of the 2016 U.S.
presidential election. The user was almost certainly looking for the most recent presidential election results, not one from
six years earlier.
Some mistakes involve the conceptual distance between the result and what the user was looking for.
1. Too Specific or Too General. Graders sometimes incorrectly give a result a high grade without realizing that it is too
specific or too general.
• Example: Query is "dog," result is wikipedia page about the welsh corgi, a particular breed of dog. This is too specific.
• Example: Query is "new england patriots news," result is home page for a regional sports news network that covers
many different sports teams in New England, not just the New England Patriots. This is too general.
• Example: Query is "us passport information"; result is www.state.gov. This page is too high in the hierarchy of this web
site. It is about everything the U.S. State Department does (diplomatic relations, trade policies, etc.), not just passports.
• Example: Query is "us passport information"; result is a page from the U.S. State Department about what to do if your
passport is lost or stolen. This page is too low in the hierarchy of the site. The user never said anything about their
passport being lost or stolen ‒ in fact, we don't even know if the user already has a passport.
3. Ignoring Degrees of Separation. Graders often ignore the principle of degrees of separation. A result that's associated
with the thing the user is looking for is not the same as the thing the user is looking for.
• Example: Query is "chez panisse," result is Yelp's page of reviews for that restaurant. This is a very useful result, but it is
not Highly Satisfying, because it is one degree of separation from what the user was looking for.
1. Matching Words Instead of Meaning. Graders sometimes forget the principle "Think about meaning, not just matching
words." Just because the query words appear in the result does not mean the result is a good one, and just because the
query words are missing does not mean the result is a bad one.
• Example: Query is "far alone," result is a page containing the inspirational quote "If you want to go quickly, go alone. If
you want to go far, go together." The result contains both query words, but they match only incidentally. It's clear that
this is not what the user was looking for, and in fact the web search results show that "Far Alone" is the name of a song.
2. Ignoring Basic Definitions of Grading Scale. A common mistake is to ignore the basic definitions of each grade and only
look at the specific grading scenarios. The scenarios are meant to illustrate the definitions in different situations, not to
replace them. If you're faced with a grading situation where you don't see a rule that applies, just go back to the definitions:
Is this a result most users would want to see? Etc.
3. Ignoring "Aboutness" in News Stories. When the query is a named entity, news stories about that entity are graded as
satisfying. But just because a news story mentions an entity does not mean it's about that entity. If the entity is not a primary
topic of the story, the article is not about the entity.
• Example: Query is "starbucks" and result is a news article about a man who died in a traffic accident. The article mentions
the fact that the man worked at Starbucks, but his death had nothing to do with the company or the fact that he worked
there. This is NOT a news article about Starbucks, so News: Scenario 2 does not apply.
Example 3: Query "olivia rodrigo". Result: official website for the pop star, oliviarodrigo.com. Grade: HS. Almost all users searching for a celebrity would want to see that person's official web site.
Example 5: Query "jane austen". Result: Wikipedia page about the early-1800s author. Grade: HS. Wikipedia is a highly satisfying result for any named entity.
Example 12: Query "saw" (note: assume a web search shows the dominant interpretation is the movie). Result: the official page for the movie, containing streaming links and descriptions about the movie. Grade: HS.
Example 13: Query "saw" (note: assume a web search shows the dominant interpretation is the movie). Result: a knowledge card for the movie. Grade: HS. A knowledge card for a named entity is Highly Satisfying.
Example 4: Query "tour de france stage 1" (queried on 29 July 2022). Result: NBC video of stage 18 of the 2021 Tour de France. Grade: NS. Result is for a previous year's Tour de France, and is not even the stage the user asked for.
Stop! Overall Preference Rating is a separate task from Search Satisfaction. You will only be working on one of
these tasks at a time.
Do not utilize these guidelines unless you are working on the Overall Preference Rating (Side-By-Side Search
Satisfaction) task. If you are working on the single Search Satisfaction task, do not proceed past this point.
Contents
1. Overall Preference Rating
1.1. OPR Criteria
1.2. When a Side is Missing
1.3. Writing Comments
2. OPR and Comment Examples
2.1. Example 1
2.2. Example 2
2.3. Example 3
2.4. Example 4
2.5. Example 5
2.6. Example 6
2.7. Example 7
2.8. Example 8
2.9. Example 9
2.10. Example 10
2.11. Example 11
2.12. Example 12
Note 1: How much these criteria affect OPR also depends on the position of the result. For example, if the satisfaction
ratings of the results in position 1 differ, that should have a bigger impact on OPR than if the satisfaction ratings of
results in position 4 differ.
• Prefer the side WITH results ONLY when the side with results has at least one result graded Somewhat Satisfying, Satisfying or
Highly Satisfying
• Do not choose "About The Same”.
OR
• Prefer the side WITH results ONLY when the side with results has at least one result graded Satisfying or Highly Satisfying
• Do not choose "About The Same”.
In neither case should you choose "About the Same"; in other words, a side with results can never be rated the same as a side without.
• The comment on the left can be improved by providing reasons why the left is “more suitable”.
• For the comment on the right, the writer states presumed search need and then goes on to describe how the results help meet
that and ultimately why they chose one over the other.
Query: tdecu
Location: Richwood, TX
LEFT:
1. Official TDECU Digital Banking App
2. TDECU Mortgage Simplified App
3. Maps info card with directions to a TDECU branch, 3 miles away
4. Maps info card with directions to a TDECU branch, 4 miles away
RIGHT:
1. Official TDECU Digital Banking App
2. TDECU Mortgage Simplified App
3. TDECU.org official website
4. TDECU.org "About Us" page
5. @TDECU twitter page
Much Better Better Slightly Better About the Same Slightly Better Better Much Better
OPR Explanation: The query refers to a credit union (essentially, a bank) with two branches near the user. We can assume the
user wants to either do a bank transaction, go to the bank, or get information about the bank.
The official app, the official website, and the map results for the nearest locations are all Highly Satisfying. The map results
appear on the left but not the right, while the official website appears on the right but not the left.
The left side addresses three search needs (it satisfies people looking for the main app, the mortgage app, and the map) while
the right addresses four (the main app, the mortgage app, the web page, and the Twitter feed). So the right has a slightly more
diverse result set. However, the user gave no indication that they were interested in the Twitter feed, so this is a very unlikely
intent. Since we don’t know whether more people are interested in the map or the official site, the two sides are About the
Same.
Query: diesel
Location: Cambridge, MA
LEFT RIGHT
Diesel Fuel - Wikipedia (en.wikipedia.org/wiki/Diesel_fuel) Diesel [Maps result], 339 Newbury St., Boston (2 miles)
Much Better Better Slightly Better About the Same Slightly Better Better Much Better
OPR Explanation: The query could refer to a clothing store or a kind of fuel.
• Two out of three results are the same on both sides, so they aren’t that different.
• The left side has a wrong language result, which is Not Satisfying to users.
• The right side ranks the diesel fuel result higher, showing both likely interpretations near the top.
• The right side has more diversity of result types (web pages and maps, instead of only web pages).
• Since there are multiple reasons to prefer the right side, that side should be more than Slightly Better. But since the lists aren't
that different, it’s not Much Better. So we choose Better.
LEFT RIGHT
Apollo Space Program wikipedia article Apollo Space Program wikipedia article
(en.wikipedia.org/wiki/Apollo_program) (en.wikipedia.org/wiki/Apollo_program)
OPR Explanation: The query refers to the space program from the 1960s that first put a human on the moon.
• The first two results are the same on both sides.
• Both result sets have three types of search results.
• The third result on the left is only vaguely related to the Apollo space program. It seems unlikely that someone searching for
“apollo project” would find an obscure artist’s ambient music useful in satisfying their search need.
• The third result on the right is not at all related to the Apollo space program; it has something to do with a project of the
Apollo Theater.
Based on the web results, it’s extremely unlikely that this was the user’s intended interpretation of the query. Since only the last
result is different, and the last result on the left is less bad than the one on the right, we conclude that the left side is Slightly
Better.
LEFT:
1. Academy Awards Best Actor and Best Supporting Actor — Winners (filmsite.org/bestactor2.html)
2. Andy Serkis for Best Actor [YouTube video from 2011]
3. The Best Actors Who Won Oscars for Their First Movie (www.ranker.com/list/actors-who-won-oscars-for-their-first-movie/ranker-film)
RIGHT:
1. Joaquin Phoenix — Academy Award for Best Actor — Winner [Info card]
2. Academy Awards Best Actor and Best Supporting Actor — Winners (filmsite.org/bestactor2.html)
3. Joaquin Phoenix: Best Actor, Motion Picture, Drama: 2020 Golden Globes (YouTube video)
Much Better Better Slightly Better About the Same Slightly Better Better Much Better
OPR Explanation: The query very likely refers to the winner of the Academy Award (aka “Oscar”) in the best actor category.
Since the query was on Feb. 13, 2020, we assume the user wanted the most recent award winner at the time, announced at the
ceremony on February 8, 2020.
• Result #1 on the left (same as #2 on right) contains the answer, but requires visiting the page and scrolling all the way to the
bottom to find it. Result #1 on the right gives us the answer right away, without even having to click on it.
• Result #2 on the left is a YouTube video from a non-authoritative source (a random fan), and it's very outdated (from 2011).
• Result #3 on the left is related to best actor winners, but doesn't actually contain the answer the user is looking for.
• Result #3 on the right tells us about another recent best actor award (the Golden Globes, rather than the Oscars) which had the
same winner, Joaquin Phoenix. Even though we assume the user was looking for the Oscar winner, they might also be
interested in other awards won by the same actor for the same role.
Since all of these observations suggest that the right side is better than the left, you would conclude that the right side is Much
Better than the left.
LEFT RIGHT
OPR Explanation: The query refers to an actor and singer who appeared in the original cast of the musical Hamilton.
• Results L1, R1, and R4 are all Highly Satisfying. All the rest of the results on both sides are Satisfying.
• The set on the right is more diverse, providing more different types of results.
Query: dana
Location: Hampton, VA on 2021-08-17
LEFT:
1. Dana (Indonesian digital wallet) app
2. Home page for Nigerian airline Dana Air
3. Video of 2021 song "Dana Dana" by Now United
RIGHT:
1. Home page for Dana Inc. (www.dana.com), a company that makes drivetrain parts for passenger vehicles
2. Video of Israeli singer Dana International performing the winning song at the 1998 Eurovision contest
3. Wikipedia page for South Korean singer Dana
Much Better Better Slightly Better About the Same Slightly Better Better Much Better
OPR Explanation: The query can refer to many different things or people, and the web search results make it clear that none of
them is a dominant interpretation. Furthermore, these results all seem to be only Somewhat Satisfying, since it isn’t likely that
most users in the United States were searching for (say) an Indonesian app or an Israeli Singer from the 1990s. Therefore the
two sides are About the Same.
LEFT:
1. 1985 movie "Mad Max: Beyond Thunderdome" (which co-starred Tina Turner)
2. 1993 movie "What's Love Got to Do With It," about the life of Tina Turner
3. Web page for 2021 documentary "Tina" on HBO
RIGHT:
1. Web page for 2021 documentary "Tina" on HBO
2. 1993 movie "What's Love Got to Do With It," about the life of Tina Turner
3. 1985 movie "Mad Max: Beyond Thunderdome" (which co-starred Tina Turner)
Much Better Better Slightly Better About the Same Slightly Better Better Much Better
OPR Explanation: Both sides have the same results, but they are ranked differently. Since the search was done in 2021, it’s most
likely that the new 2021 documentary about Tina Turner (“Tina”) is what the user was looking for. Since the only difference is the
ranking, and the right side ranking is clearly better than the left side (moving the best result into position #1), it’s Better.
LEFT:
1. A news article on her winning an Emmy award for her character in the tv series Ted Lasso
2. A website listing the Emmy 2021 winners
RIGHT:
1. The IMDB page for the actor Hannah Waddingham
2. A different news article on her winning an Emmy award for her character in the tv series Ted Lasso
Much Better Better Slightly Better About the Same Slightly Better Better Much Better
OPR Explanation: Both sides have a fresh and relevant news article but the second result on the left doesn't add any additional
value. On the right, we have an excellent ranking, the first result is a professional page about the actor and her experience and
the second a fresh news article.
LEFT RIGHT
Wikipedia entry for the video game Monster Hunter Stories Wikipedia link to Monster Hunter Stories 2: Wings of Ruin
Much Better Better Slightly Better About the Same Slightly Better Better Much Better
OPR Explanation: The user specifically asked for "Monster Hunter Stories 2". The left side has a more general result (it's about
the entire video game series), while the right is about the exact thing the user asked about, so the right is Better. To be Much
Better, the right side would have needed some additional content that added diversity, such as a link to the official page.
LEFT:
1. A Knowledge Card describing the singer/actor including links to her official site and Twitter handle
2. A web video of the lesser-known song "My Man's Gone Now" from 2007
3. A web video of another song "Rainbow High"
RIGHT:
1. A Knowledge Card describing the singer/actor including links to her official site and Twitter handle
2. Official website
3. Twitter handle
Much Better Better Slightly Better About the Same Slightly Better Better Much Better
OPR Explanation: Both sides have the brief Knowledge card describing the person (with links to her official website and twitter
feed). The left side also has web videos for two of her songs, while the right side also has her official website and Twitter
feedResults R2 and R3 are more valuable than L2 and L3, but the lack of any videos makes the right side only Slightly Better.
Query: sunrise
Location: West Melbourne, FL on 2021-09-01
LEFT:
1. Weather Info card for West Melbourne (with sunrise/sunset times)
2. App store link for sunrise/sunset times
3. Knowledge Info card about the topic Sunrise
RIGHT:
1. A website selling the domain name http://www.sunrise.am
2. Weather Info card for West Melbourne (with sunrise/sunset times)
3. Knowledge Info card about the topic Sunrise
Much Better Better Slightly Better About the Same Slightly Better Better Much Better
OPR Explanation: Both have same third result. Both have the same Highly Satisfying info card, but it’s ranked better on the left.
Of the remaining results, the one on the left might be useful, while the one on the right is Not Satisfying. Both of these
differences favor the left side, so it is Better.
LEFT RIGHT
Much Better Better Slightly Better About the Same Slightly Better Better Much Better
OPR Explanation: The user is looking for the news site Huffington Post. Official website, app, and Twitter feed are all Highly
Satisfying. The UK site is Somewhat Satisfying. Left is better due to more satisfying results.
• Added a new example to "Content Unavailable: The browser presents a warning of a privacy or security issue on the page":
"Not Secure" shown in the search bar for the website.
• Updated Section “11. Web Images.”
• Updated image set for David Beckham so it more closely aligns with the grading rules.
• Added a new grading scenario (31).
• Note: subsequent scenario numbers have been updated accordingly (i.e., previous Scenario 35 is now Scenario 36, etc.). No
other change to subsequent scenarios.
• Updated image set for “Additional Examples: Highly Satisfying,” Example 11 to more closely align with guidelines.
• Added example 12 to “Additional Examples: Satisfying.”
• Added guidelines for the different but related Side-by-Side Search Satisfaction task.
• Guidelines reformatted.
• Query and result in the same language removed as exceptions to Wrong Language.
• Clarification on which pop-ups are considered Content Unavailable.
• Instructions added on how to handle ad blockers, CAPTCHAs, and cookie pop-ups.
• Added scenarios for when to grade News result as Highly Satisfying.
• Added examples for Content Unavailable and Web Image groups.
• Examples of “Trusted News Sources” for en locales added to News Section.
• Results for advice or recommendation queries cannot be HS; this had been accidentally removed in 1.6 (the change was only
meant to apply to news).
• Fixed some inconsistencies (the guidelines stated both that news can never be HS and that it can be HS).
• Added examples of Highly Satisfying news responses.
• Fixed the explanation of "Adele's Third Album" (in Think About the Meaning, Not Just Matching Words).
• If at least one image in a web-images group result is not visible, flag the result as Content Unavailable (see the Content
Unavailable section).
• Updated the table of advice in the Web Images section to suggest this.