
Baseline Guidelines

This document provides guidelines for relevance rating in video search on Apple TV, emphasizing the importance of user intent and query types. It categorizes queries into three types: Browse, Navigational, and Similarity, each with specific rating criteria based on relevance, popularity, and recency. Raters are instructed to conduct thorough research and provide reasoning for their ratings to enhance the user experience and improve search algorithms.

Uploaded by

nwabudefrank9
Guidelines for Video Siri (Complex Queries) Training
10/28/24 - New section 3.4.21 Brand Results, updates to sections 1.3 Reasoning (Comments), 3.5 Problem: Other, 3.4.2 'Free' Queries
10/10/24 - Updated tables under 3.4.18 Person and 3.4.15 Time Period Queries sections.

1. Introduction
In this document, we explain relevance rating guidelines for video search on Apple TV. If you are not familiar with the Apple TV app,
please refer to https://www.apple.com/apple-tv-app/ for an overview and basic information about this app.
Please use the sidebar to help you navigate through your reading.

1.1 The importance of your work as a Rater


Each of the judgements you complete will be used to build and improve artificial intelligence systems such as search algorithms and
machine learned rankers that power the user experience for Apple TV users. Your attention to detail, research and language skills as
well as your cultural knowledge of the market are all critical to the success of our projects.

Your judgements should represent those of an Apple TV user who is using the Search feature. Ask yourself if you would be content
with the results returned for a particular search query. Is there a significant relationship between the query and content returned?
Would you be content if you see this content appear as a search result? Stay curious and complete thorough research.

Our ultimate goal is to surprise and delight our customers by improving search quality and enhancing customer satisfaction, and you
play an important role in this.

Please keep in mind that your tasks will be spot-checked for quality, and measured against those of your peers.

1.2 Primary and Secondary Intent


The primary intent of a query is the most likely intent, i.e. the intent of most users who say the given query. Some queries may have
multiple primary intents.

A secondary intent is less likely, or would be a less popular intent compared to a primary one. A secondary intent could include:
• Content relevant to a smaller group of users than for the primary intent. For queries like [shows] and [movies], the
primary intent is usually media content for adults. Content for children would be considered secondary intent, except when the
intent is obviously kids-related (such as cartoons, animated films, etc.).
• Complementary content such as trailers, reviews, cast members, or interviews with the cast on how the movie was
made.
• Lower quality/lesser-known content that is relevant to the query but is not the primary intent, or content that is dated or
less popular.

1.2 Query Types & Intents


This evaluation contains long, complex queries that may contain multiple aspects, such as specifying a genre, time-period and actor
all in one query. These queries will broadly fall into three different categories that will be rated differently.

1.2.1 Browse
The first type is queries with a browsing intent. These queries will point to a larger set of content where the user doesn’t have anything
specific in mind. Some examples:
• [best tv shows set in the future]
• [i want a a classic made for tv type move]
• [show me commedies available in french]
• [whats something good to watch around chinese new years]

1.2.2 Navigational (Video Navigational & Single Results Navigational)
The second type of query has a navigational intent. These queries are looking for a specific piece of content or a small set of
content. Navigational queries are categorized as either Video Navigational or Single Results Navigational. The key difference is that Single
Results Navigational queries refer to one specific movie or show, whereas Video Navigational queries may target a broader range of
content.

Examples:

Video Navigational:
• [james bond movies with pierce brosnan] → Pierce Brosnan was James Bond in 4 films
• [all kardashians shows] → Looking for shows related to the Kardashian family, like The Kardashians and Keeping Up with
the Kardashians
Single Results Navigational:
• [James bond movie quantum something] → Looking for Quantum of Solace
• [most recent best picture academy awards] → Looking for a specific movie
• [that tv+ show where the actor travels the world even though he hates travel] → This describes The Reluctant Traveler
with Eugene Levy

1.2.3 Similarity
The final type of query is when the user is looking for content similar to another piece of content. These queries will generally
reference one or more pieces of content and use phrases like “similar to” or “like”.

Examples:
• [movies like top gun]
• [suspensful show similar to the 100]
• [show me something super dark like sey7en]

1.3 Reasoning (Comments)


For these complex queries it is especially important to provide the reasoning for a result being relevant to the query. Thus, for each
result you must provide your reasoning in the Comments field. Although the Comments field is labeled as "optional," it is mandatory.
• Provide a concise explanation of why the result should be considered relevant, not relevant, or of ambiguous (unknown)
relevance. Your explanation MUST answer each of the following questions:
1. What is the intent of the query?
2. Are there any abbreviations? If so, explain them (e.g. "90s refers to the time between 1990 and 1999").
3. How is the result relevant to the query?
The reasoning should focus primarily on the connection of the result to the query rather than on explaining the specific rating
given.

Examples:
Below are examples of the reasoning for different query and result pairs. The examples are color coded to show how the questions
are answered.

2. Rating Process
2.1 Query Research
To complete query research, tap into your local market knowledge in addition to online sources such as IMDB, YouTube,
Wikipedia, video streaming services, local content evaluators and social media. Consider how a user of Apple TV engages the search
feature to navigate to a specific set of content, or as a means to browse a larger catalogue of titles.

Online Research
• IMDB
o Popularity Ranking
o Rating Counts
o Storyline, Taglines
o Genre
o Release Date
o For topic, decade, etc: (IMDB Sort By feature)
• Box Office Mojo
o Local Box Office Rankings
• Common Sense Media
o Age Rating → for determination of kids content
• Wikipedia
o Information of the movie/show: plot, cast, awards, nominations, crew
• Social Media
o Trending films
o Followers on Twitter, Instagram, TikTok, etc. can help to determine a person’s popularity
• YouTube trailer views
o Make sure to factor in time, consider the monthly average views since its upload
• Local content evaluators
o To identify popularity in the market
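
The YouTube guidance above (factor in time by looking at average monthly views since upload) can be sketched as a small calculation. This is only an illustration, not a prescribed tool: the function name, the 30.44-day average month, and the sample figures are all assumptions.

```python
from datetime import date

def monthly_average_views(total_views: int, uploaded: date, today: date) -> float:
    """Average views per month since upload (assumes a 30.44-day month)."""
    days_online = max((today - uploaded).days, 1)
    # Treat anything online for less than a month as one full month.
    months_online = max(days_online / 30.44, 1.0)
    return total_views / months_online

# Hypothetical numbers: the older trailer has more total views,
# but the newer trailer is gaining views much faster per month.
older = monthly_average_views(12_000_000, date(2020, 1, 1), date(2024, 1, 1))
newer = monthly_average_views(3_000_000, date(2023, 10, 1), date(2024, 1, 1))
print(round(older), round(newer))
```

Comparing the monthly averages rather than the raw totals keeps a long-available trailer from looking more popular than it currently is.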

2.2 Aspects to Consider


For Browse and Navigational queries, consider these aspects:
1. Relevance (2.2.1)
2. Popularity (2.2.2)
3. Recency (2.2.3)
For Similarity queries, focus on these aspects:
1. Target Audience (2.2.4)
2. Factual Aspects (2.2.5)
3. Theme (2.2.6)
Definitions are below.

2.2.1 Relevance
Relevance is, in short, the connection between the query and the result. Below are examples where the show Ted Lasso might surface.

What if the result is only partially relevant?

Results may match only part of a query. In these cases, we weigh each query requirement by its importance and by where it falls on a scale
from Factual to Ambiguous. For fully Factual aspects, anything that doesn’t match would be considered Off-Topic. For more
Ambiguous requirements, we would demote the rating by 1 (e.g., Good → Acceptable) if the result matches other aspects but not that
requirement.

Example: [exciting 2000’s documentary with morgan freeman]


• exciting is a subjective mood that is hard to define. If the result is a 2000’s documentary with Morgan Freeman that you
don’t consider exciting, it is still relevant and fulfills the intent, as we cannot really define what an “exciting documentary” is.
o note: if the query is [exciting dramas with Morgan Freeman], it's more reasonable to evaluate the plot to
determine if the movie appears exciting. If the result is a "boring" drama with Morgan Freeman, it would merit a rating demotion of
1.
• 2000’s is factual, therefore the content needs to be from the years 2000 to 2009 or explicitly about that time period to be
considered fully relevant. Content from before 2000 is off-topic, and content from 2010 onwards still holds some relevance but
receives 1 rating demotion.
• documentary is also factual, but not a hard requirement. An exciting 2000s movie with Morgan Freeman that is based on
a true story (not a documentary), could be partially relevant and receive a 1 rating demotion.
• morgan freeman is a fully factual requirement. Any content without Morgan Freeman would be considered off-topic.

Note: The above guidance is primarily for Browse queries. For Navigational queries, consider the above if the result is a secondary
intent.
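
The Morgan Freeman example above can be sketched as a small function. This is an illustration under stated assumptions rather than an official procedure: the labels "factual", "ambiguous" and "subjective" are hypothetical names for how a rater classifies each unmet requirement, and keeping demoted results at "Acceptable" or above borrows the principle from section 3.4.6 that relevant content is rated at least Acceptable.

```python
# Rating scale from lowest to highest, as used for Browse queries.
SCALE = ["Off-Topic", "Acceptable", "Good", "Excellent"]

def demote(rating: str) -> str:
    """Lower a rating by one level, keeping relevant content at Acceptable or above (assumption)."""
    if rating == "Off-Topic":
        return rating
    return SCALE[max(SCALE.index(rating) - 1, 1)]

def apply_misses(base_rating: str, misses: list[str]) -> str:
    """Adjust a rating for the requirements a result does not meet.

    Each miss is labeled by the rater (hypothetical labels):
      "factual"    -> hard requirement, the result becomes Off-Topic
      "ambiguous"  -> softer requirement, demote by one level
      "subjective" -> cannot really be judged, no change
    """
    rating = base_rating
    for miss in misses:
        if miss == "factual":
            return "Off-Topic"
        if miss == "ambiguous":
            rating = demote(rating)
    return rating

# [exciting 2000's documentary with morgan freeman], result otherwise rated "Good":
print(apply_misses("Good", ["subjective"]))  # not "exciting": unchanged
print(apply_misses("Good", ["ambiguous"]))   # true story, not a documentary: demoted once
print(apply_misses("Good", ["factual"]))     # no Morgan Freeman: Off-Topic
```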

2.2.2 Popularity
Remember that popularity is a sliding scale that depends on the relevance of the content for a given query. For example, a query like
[action comedies that don’t have sci-fi elements] has a much larger pool of relevant content compared to [Norwegian comedy shows
about school children]. Consequently, a Norwegian action comedy show might be considered very popular in the context of the
second query but not as popular in the context of the first, due to the differing sets of relevant results.

2.2.3 Recency
In general, users are more satisfied with newer content, so our ratings should prioritize content that has not been out as long. Similar
to popularity, recency exists on a sliding scale. Broad queries such as [what's something I can watch with surround sound] would have
more relevant results than a narrow query such as [historical dramas where they speak latin]. Therefore, the recency of a result should be
evaluated in relation to the other possible results that can be returned for the query.

For TV shows, recency is determined by the release date of the most recent season.

2.2.4 Target Audience
The target audience for a movie includes considerations of two factors: Genre and Age Rating (Kids (PG) vs Everyone (PG-13, TV-
14) vs Adults (R)). If a content fits within these age ratings and its genre aligns with the interests of that audience, it is generally
considered to target that specific group.

2.2.5 Factual Aspects


Factual aspects are determined by two elements:
• Cast & Crew: Actors, Producers, Studio, etc.
• Setting: Location AND Time Period in which the content is set

2.2.6 Theme
Theme refers to what the content is about: the underlying message, idea, or lesson that a movie conveys beyond its plot and
characters.

3. Rating Scale & Examples


Each of our three query types applies the principles outlined above differently. Refer to the sections below for a
deeper dive and examples of how to rate each type of query.

3.1 Browse
For browse queries there are no “Perfect” results. This is because, by the nature of these queries, the user is not looking for anything specific and is
instead browsing. Therefore, the highest rating available is “Excellent”, and ratings are determined based on relevance, popularity,
and recency.

General Guideline:
• Excellent → The returned content is relevant to the query (not only by title), popular and recent.
• Good → The returned content is relevant to the query, and either popular or recent.
• Acceptable → The returned content is relevant to the query and neither popular nor recent.
• Off-Topic → Regardless of popularity and recency, the returned content is unlikely to be what the user intended to find
with the query or there is no clear relationship between the query and the content
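
The general guideline above reduces to a short decision function. The sketch below only restates the four bullets; the function and parameter names are hypothetical, and judging whether a result counts as relevant, popular or recent remains the rater's research task.

```python
def rate_browse(relevant: bool, popular: bool, recent: bool) -> str:
    """Browse rating from the three aspects; "Perfect" is not available."""
    if not relevant:
        # Popularity and recency cannot rescue an irrelevant result.
        return "Off-Topic"
    aspects_met = int(popular) + int(recent)
    return ["Acceptable", "Good", "Excellent"][aspects_met]

print(rate_browse(relevant=True, popular=True, recent=False))
```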

Examples:

3.2 Navigational (Single Results Navigational & Video Navigational)
Navigational queries are generally looking for a specific piece of content (i.e. a specific movie or show) or a finite set of content such
as a film-trilogy. In these cases we allow for a “Perfect” rating for those specifically intended pieces of content.

General Guideline:
• Perfect → If the result is the navigational content and is certainly the primary intent of the user, rate it “Perfect”, regardless of popularity and
recency. For Franchise-related queries, please follow the Franchise-specific guidelines in section 3.4.10.
• Excellent → If the returned content is a sequel or prequel of the intended content, or the content is part of a movie bundle
with the intended content, rate the result as “Excellent”.
• Good → If the returned content is relevant to the query, and either recent or popular, or can be considered a secondary
intent, rate it as “Good”.
• Acceptable → If the returned content is relevant to the query, but neither recent nor popular or can only poorly satisfy a
secondary intent, it can be rated as “Acceptable”.
• Off-Topic → If the content is not relevant to the query or it is very unlikely that a user would search for the returned content
with the given query, it can be rated off-topic.

Examples:

3.3 Similarity
Instead of looking at popularity and recency, you should focus on how similar the content is to the referenced content. The main goal
of similarity searches and ratings is to capture “if a user liked piece of content x, will they also like piece of content y?”

Evaluate Similarity based on Target Audience, Factual Aspects, and Theme.

For similarity queries, there may be additional aspects highlighted, such as “movies like x” or “comedies similar to y”. In those cases the
result must match the highlighted aspect (here: movie, comedy) to reach at least an “Acceptable” rating.

General Guideline:
• Excellent → Similarity in 3 aspects (Target Audience, Factual Aspect and Theme)
• Good → Similarity in 2 aspects or just a close match in Target Audience
• Acceptable → Similarity in only Factual Aspects or Theme, or the result is the mentioned content itself (see section 3.4.3)
• Off-Topic → No Similarity
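
One way to read the similarity scale above is as a count of matching aspects, with Target Audience weighted more heavily. The sketch below is an interpretation, not an official formula: the names are hypothetical, and treating a result that misses the highlighted aspect (for example a show returned for “movies like x”) as “Off-Topic” is an assumption drawn from the requirement that the highlighted aspect is necessary for “Acceptable”.

```python
def rate_similarity(target_audience: bool, factual: bool, theme: bool,
                    matches_highlighted: bool = True,
                    is_mentioned_content: bool = False) -> str:
    """Similarity rating from the three aspects in sections 2.2.4 to 2.2.6."""
    if is_mentioned_content:
        # The referenced title itself is only "Acceptable" (section 3.4.3).
        return "Acceptable"
    if not matches_highlighted:
        # Assumption: missing the highlighted aspect rules the result out.
        return "Off-Topic"
    matched = int(target_audience) + int(factual) + int(theme)
    if matched == 3:
        return "Excellent"
    if matched == 2 or target_audience:
        return "Good"
    if matched == 1:
        return "Acceptable"
    return "Off-Topic"

print(rate_similarity(target_audience=True, factual=False, theme=False))
```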

Examples:

3.4 Special Cases


There are a variety of special cases to be aware of, where the rating scales are slightly different from what is written above.

Please note that the below guidelines are meant to augment the guidelines above, not override them. For instance, the additional
guidance for a query referencing a person has maximum rating of “Excellent”, however, for a navigational query such as “movie where
tom hanks is stuck on a island” the intended movie of “Cast Away” would still get a “Perfect” rating.

Here is a list of special cases covered in this document:

3.4.1 Result Filters


When handling queries that include tokens like [‘new’, ‘best’, ‘movies’, ‘shows’], consider these filters along with the core intent:
• [new] → emphasize the recency aspect; the popularity aspect is less important
• [best] → emphasize the popularity and awards aspects; the recency aspect is less important
• [movies] → consider only content of that content type (here: movie) as relevant. Shows in this case can be rated as
“Off-Topic”

3.4.2 'Free' Queries


For queries with strings such as “free”, “free for me”, please consider what streaming services in your market are free along with
content that is available explicitly for free. This includes TV shows where the first episode is available for free. Use the below guidance
to rate these queries.
• Short queries that include the string “free” (“free movies”, “free shows”, “free”)
o Excellent → all Apple TV+ originals
o Good → non-Apple TV+ content that is popular and recent
o Acceptable → non-Apple TV+ content that is popular or recent
o Off-Topic → non-Apple TV+ content that is neither popular nor recent
• Longer queries that include the string “free” (“free comedy movies”, “free horror shows”, “scary movies free for me”)
o Check if the content is free and then rate based on other aspects of content
▪ “free denzel 2000s”
▪ check if content is free, from the 2000s and features Denzel Washington
▪ if yes and Denzel Washington stars in the content then rating is excellent
▪ If Denzel is in a lesser role, rate based on recency and popularity
▪ “free comedy movies”
▪ check if content is free, a comedy and a movie
▪ if yes then rate based on recency and popularity
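
For the short “free” queries above, the scale can be written out as follows. This is a minimal sketch with hypothetical names; it covers only the short-query case, since longer queries are first checked for free availability and then rated on their other aspects.

```python
def rate_short_free_query(is_apple_tv_plus_original: bool,
                          popular: bool, recent: bool) -> str:
    """Rating for short queries such as "free", "free movies", "free shows"."""
    if is_apple_tv_plus_original:
        return "Excellent"
    if popular and recent:
        return "Good"
    if popular or recent:
        return "Acceptable"
    return "Off-Topic"

print(rate_short_free_query(False, popular=True, recent=False))
```

Compared with the general Browse scale, content that is not an Apple TV+ original sits one level lower here.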

Note: In addition to using traditional web resources, you can also check whether content is free by following these steps:
• Click the link that is the title of the returned content
• This will bring you to the Apple TV product page of the content in the country of the evaluation
• Here you will see what platforms the content is available through
o For instance, in the US free streaming platforms include (but are not limited to): Tubi TV, Pluto TV, Amazon
FreeVee, Plex
o Other services may have some free content available

3.4.3 Mentioned Content


Users may search for content by referencing another piece of content. In these cases

Typical Structure for Mentioned Content Queries:


• Similarity: “shows like game of thrones”
• Recursion: “movies with the actor of ron weasly”

General Guideline:
• Excellent:
o Other highly relevant / related content
• Acceptable:
o The mentioned content(s)
• Special Cases:
o Franchise
▪ Content under the same branding as the mentioned content is “Acceptable”
▪ Franchise content under different branding is rated according to the franchise / similarity rules

3.4.4 Misspelling/Alternate Spelling


Some words can be spelled in multiple ways or are frequently misspelled. Since search is able to handle misspellings, please evaluate the
intent rather than the exact text.
• [rhe] / [thr] / [yhe] → Common misspellings for the word “the”
• [nruto] → Likely looking for “Naruto”
• [bestmovies] → Probably intended as “best movies”
• [4], [IV], [Four] → treat as synonyms
• [pray] vs [prey] → since Siri relies on voice input, there can be "misspellings" that come from homophones (words that sound alike)

3.4.5 Harmful Queries


Queries or content related to, but not limited to, the following topics may be offensive or sensitive:
• Bias
• Discrimination
• Hatred
• Restricted or Regulated Content
• Self-Harm
• Sexual Material
• Toxicity
• Vulgarity/Obscenities
• Violence
If the result returned contains an offensive or sensitive response to the query, please mark it as “Unacceptable: Off-Topic”.

Note: If the query refers to a content title (movie, TV show), it should not be considered offensive or sensitive. E.g., ‘How to Get Away
with Murder’ is a title for a popular legal thriller TV show and should not be considered as a safety concern.

3.4.6 Kids Content


Kids' content should be demoted by one rating level, unless:
• The query is explicitly about kids' content
• The query is indirectly related to kids' content (e.g., query contains “kids,” “animation,” “for the family,” etc.).
• The content is relevant and rated Acceptable - do not demote Acceptable to Off-Topic since relevant content should
always be rated as at least Acceptable.
Use the “Kids&Family” genre, Common Sense Media, and your own knowledge to determine if content is specifically targeting kids.

Examples:
• [adventure shows to binge this weekend] → “Yakari”: The returned show is in the intended genre and very popular,
however dated and hence could be rated as “Good”. However, since it is primarily kids content, it will be rated “Acceptable”.
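
The demotion rule and its exceptions can be sketched as below. The names are hypothetical and the sketch assumes the rater has already decided whether the result is kids' content and whether the query is kids-related.

```python
# Rating scale from lowest to highest.
SCALE = ["Off-Topic", "Acceptable", "Good", "Excellent", "Perfect"]

def apply_kids_demotion(rating: str, is_kids_content: bool,
                        query_is_kids_related: bool) -> str:
    """Demote kids' content by one level unless an exception applies."""
    if not is_kids_content or query_is_kids_related:
        return rating
    if rating in ("Off-Topic", "Acceptable"):
        # Relevant content is never pushed from Acceptable to Off-Topic.
        return rating
    return SCALE[SCALE.index(rating) - 1]

# The "Yakari" example: "Good" before the kids demotion.
print(apply_kids_demotion("Good", is_kids_content=True, query_is_kids_related=False))
```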

3.4.7 Explicit Content


Rate all pornographic content as “Unacceptable: Off-Topic” unless the query intent is specifically for erotic content or porn.
• [drama movies for me and my friends] → any pornographic content is “Unacceptable: Off-Topic”
• [erotica] / [porn] → erotic content can receive any rating
• [18+] / [movies for adults] → intent here is not specifically for pornographic or erotic content

3.4.8 Classic Content


Ignore recency for classic movies and TV shows that were among the most popular in their release decade or remain highly popular
today.

3.4.9 Pre-Release Content


Sometimes the result is the preview page for content which is not yet available for viewing. The timing of the release can be ignored.
Popularity has to be determined from different sources than for already released content.
• Complete research to determine expected popularity (e.g. YouTube trailers, production budget, famous cast and crew)
• Original content from Apple TV+, Netflix, Hulu, or another popular production company is generally assumed to be popular.

3.4.10 Franchise
Core franchise content consists of the central, primary films (e.g. Harry Potter main films), and non-core content includes supplementary or spin-off films (e.g.
Fantastic Beasts).

First and last item of core franchise → Perfect


• Please make an honest determination of the order in which users would typically watch the collection or franchise of
content and use this to complete the rating. Use the comment to justify your decision.
Other content in core franchise → Excellent
Content in the franchise, but not core
• popular and recent → Excellent
• popular or recent → Good
• neither popular nor recent → Acceptable
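
The franchise rules above can be sketched as a function. The names are hypothetical, and the position within the core franchise is assumed to follow the viewing order the rater has determined.

```python
def rate_franchise(is_core: bool, position: int = 0, core_count: int = 0,
                   popular: bool = False, recent: bool = False) -> str:
    """Franchise rating.

    position is the 1-based place in the typical viewing order of the core
    franchise and core_count is the number of core entries; both are ignored
    for non-core content.
    """
    if is_core:
        # The first and last core entries are "Perfect", the rest "Excellent".
        return "Perfect" if position in (1, core_count) else "Excellent"
    if popular and recent:
        return "Excellent"
    if popular or recent:
        return "Good"
    return "Acceptable"

# A hypothetical core franchise with 8 entries:
print(rate_franchise(True, position=1, core_count=8))
print(rate_franchise(True, position=4, core_count=8))
print(rate_franchise(False, popular=True, recent=False))  # a spin-off
```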

3.4.11 Movie Bundles


Movie bundles are a collection of films grouped together, with titles like “James Bond 10 Film Collection.”
Determine the rating for the content related to the query. Once the rating for the content is determined, demote the rating by one level.

Example:
• [No time to Die] → James Bond 10 Film Collection is “Excellent”: No Time to Die, which is in the collection, is the primary intent (“Perfect”), and the bundle is demoted by one level

3.4.12 Music Results


If the Music result satisfies a potential user intent → Acceptable
If the Music result does not satisfy a potential user intent → Unacceptable: Off-Topic

3.4.13 AppleTV+ Content


Users are using Search on Apple TV which indicates a preference for Apple TV+ content such as Ted Lasso, Coda,
Severance or Tehran. Therefore:
• Popularity is slightly less strict for Apple TV+ content
• If you are in doubt about “Good” or “Excellent” for an Apple TV+ result, select “Excellent”

3.4.14 Sporting / Live Event

Please determine if the query is directly relevant to the sporting event (clear Primary Intent) or if the intent of the query can reasonably
be multiple things, including non-sports events (Partial Intent or Secondary Intent).
The following guidance can help determine the proper rating, however, please rely primarily on your own market knowledge and query
relevance to make your judgement.

Clear Primary Intent


• Perfect → live event when query is for a team or single day event
• Excellent → popular live events when query is for a sport, league or multi-day event. Also video on demand assets for
recent championship events. Also applies to teams in the intended league
• Good → less popular live events related to the query that satisfy a secondary intent. Also movies and TV shows related to
the query that are popular and recent
• Acceptable → unpopular live events and video on demand content for relevant non-championship live events. Also movies
and TV shows related to the query that are not popular and recent

Partial Intent or Secondary Intent


Note that all ambiguous queries that have live events as a potential secondary intent, shall be rated based on the Live Event
guidelines. The “Perfect” rating category is not available in this case.
• [foot] → “football”

Note that the examples below may have already taken place in the past. Please assume that live assets are either ongoing or
upcoming.

3.4.15 Apple Event / WWDC


Primary Intent (e.g., “Apple Event replays,” “WWDC with a new iPhone announced,” “Apple Developer Conference film”):
• Perfect: Events within a year from the current date of rating
o Example: Apple Event 05.07.24, WWDC 2024
• Excellent: Events older than a year but less than 2 years from the current date of rating
o Example: Apple Event 10.30.23, WWDC 2023
• Good: Events older than 2 years but less than 3 years from the current date of rating
o Example: WWDC 2022
• Acceptable: Events older than 3 years
o Example: Apple Event 09.10.19, WWDC 2019
Secondary Intent (e.g., “Apple,” “iPhone,” “new technology,” “web conference,” “WW”):
• Demote the ratings by one level from the primary intent criteria.
Irrelevant Content (e.g. “2022,” “W,” “September 2021”)
• If the Apple Event / WWDC content is relevant to the query but is unlikely to match the intent, rate as “Acceptable”

*date of publishing these examples: 8/1/24
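
The primary-intent thresholds above can be applied to exact dates as sketched below. This is an illustration only: the function names are hypothetical, treating an event exactly one year old as “Excellent” is an assumption about the boundary, and the examples listed in the guideline may reflect a coarser, per-year reading than this literal date arithmetic.

```python
from datetime import date

def completed_years(event: date, rated_on: date) -> int:
    """Number of full years between the event and the rating date."""
    before_anniversary = (rated_on.month, rated_on.day) < (event.month, event.day)
    return rated_on.year - event.year - int(before_anniversary)

def rate_apple_event(event: date, rated_on: date) -> str:
    """Primary-intent rating for Apple Event / WWDC content by age."""
    age = completed_years(event, rated_on)
    if age < 1:
        return "Perfect"
    if age < 2:
        return "Excellent"
    if age < 3:
        return "Good"
    return "Acceptable"

rated_on = date(2024, 8, 1)  # publishing date of the examples
print(rate_apple_event(date(2024, 5, 7), rated_on))   # Apple Event 05.07.24
print(rate_apple_event(date(2019, 9, 10), rated_on))  # Apple Event 09.10.19
```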

3.4.16 Time Period Queries


Some queries specify a time period, like [action movies from the 80’s]. In these cases, recency is not important; instead, the focus
should be on whether the content fits the specified time frame. For such queries, focus on popularity for the final rating.

Note: TV shows can remain relevant across multiple years or time periods based on when their seasons aired. For
example, Friends is relevant for both queries [i want to watch a classic 90’s sitcom] and [2000’s most popular shows]

General Guideline:
• Excellent:
o Content which won major award(s) in the given year/decade
o Ultra Popular (Approximately):
▪ Top 50 most viewed shows/films associated with decade
▪ Top 10 most viewed films associated with year
o Ultra popular show has >=50% of seasons/episodes in decade
• Good:
o Popular shows/movie associated with year/decade
o Show has 3+ seasons (or >50%) in decade
o Popular (Approximately):
▪ Top 100 most viewed in decade
▪ Top 30 most viewed films from year
• Acceptable:
o Content released in year/decade
o Show has 1+ season in decade
• Off-Topic:
o Content not from year/decade
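
For TV shows, the season-based criteria above can be sketched as follows. This is a partial illustration that covers only the season-count bullets; awards and viewing ranks are left out. The names are hypothetical, each season is assigned to the decade of its premiere year (an assumption), and the sample season years are invented.

```python
def season_tier(season_years: list[int], decade_start: int,
                ultra_popular: bool = False) -> str:
    """Tier for a show against a decade query, using season counts only."""
    in_decade = [y for y in season_years if decade_start <= y <= decade_start + 9]
    if not in_decade:
        return "Off-Topic"
    share = len(in_decade) / len(season_years)
    if ultra_popular and share >= 0.5:
        return "Excellent"
    if len(in_decade) >= 3 or share > 0.5:
        return "Good"
    return "Acceptable"  # at least one season in the decade

# Hypothetical ultra-popular show with ten seasons premiering 1994 to 2003:
seasons = list(range(1994, 2004))
print(season_tier(seasons, 1990, ultra_popular=True))  # 6 of 10 seasons
print(season_tier(seasons, 2000, ultra_popular=True))  # 4 of 10 seasons
print(season_tier(seasons, 2010, ultra_popular=True))  # no seasons
```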

3.4.17 Seasonal Results


In some locales, certain results gain additional relevance and popularity at specific times of the year. For example, holiday movies are
very popular in the US during the end of the year. Please consider the seasonal popularity when rating results and adjust the rating
accordingly.

Examples (US results during December):


• [s] → “Spirited”: Always “Excellent”, new and very popular result, “Excellent” is the best result for an “Ambiguous - Intent
Unclear” query
• [c] → “A Charlie Brown Christmas”: If rating the result in July this would be a “Good” result as it is relevant and popular,
but not quite at the level of a classic that would receive an “Excellent” rating. However, if this result surfaces in the November /
December timeframe it should be rated as “Excellent” as it is a holiday classic and an “Excellent” result at that time
• [bo] → “A Boyfriend for Christmas”: If rating in the summer this would be an “Acceptable” result, it is relevant but not
recent or popular. During the Holiday season it would be rated as “Good”, being a Christmas Romantic Comedy, it has additional
popularity during that period

3.4.18 Character
Intent for “Character” queries can be both Movies and TV Shows. Often, the Franchise section (3.4.10) will apply to ratings for
characters. In cases where the main character name and the title of the content match, character shall be the dominating
classification.

General Guideline:
• Perfect:
o The most popular and recent content featuring this character in a major role. (Apply “Franchise Query” Rule).
o If only one piece of content featuring the character has been produced, this content can be rated “Perfect”
• Excellent:
o Sequels/prequels for the show in which the character is best known
o Other high-quality content featuring the character
o Person page for well known actor/actress who plays the character
• Good:
o Content which is in the same franchise but about a different character
o If the character has their own spinoff, the ‘parent’ show where the character first appeared
• Acceptable:
o Show/movie features the character in an insignificant role

3.4.19 Person
Some queries may reference a person, be it an actor, director, producer, musician, etc. Use the guidelines below to modify your rating
for queries with such a reference. This guidance is primarily for Browse queries featuring a person.

General Guideline:
• Excellent:
o Content with the intended person as lead in cast & crew
o Most popular documentary about the person (more than 1 possible if equal in popularity/quality)
o Recent and popular live event with person
o Content where person is a significant guest star. Hosted by reputable content creator.
o Set of most popular content inspired by the person
o Person page
• Good:
o Documentary about the person that is popular, but not most popular or recent
o Popular live event with person
o Popular content inspired by the person
o Content with the intended person as cast & crew that is popular or recent
• Acceptable:
o Unpopular content about/with the person
o Content with the intended person as cast & crew (not as lead)

3.4.20 Awards
Users searching for an award show are generally interested in (1) watching nominated movies/shows before the award event, or (2)
watching movies/shows which won the most recent award event. Recent winners should receive higher ratings, unless the query
specifies a specific edition of the award event.

Definitions:
• If award event upcoming: Nominations have already been announced for the next upcoming award event.
• If no award event upcoming: Nominations for next upcoming award event are not announced yet.

General Guideline:
• Perfect:
o Most recent award event show
o Currently ongoing or upcoming live award event show
• Excellent:
o If award event upcoming: nominees (movies, tv shows or actors) for the upcoming award event. The returned
content itself needs to be nominated
▪ If movie returned, movie needs to be nominated
▪ If person page is returned, person needs to be nominated
o If no award event upcoming: winners of the most recent award event
• Good:
o If award event upcoming: winners of the most recent award event
o If no award event upcoming: winners of previous award events, nominees of most recent award event
o Similar award event shows
• Acceptable:
o Movies/Shows featuring people associated with award show
o Nominees of previous award events
• Exceptions:
o If a year is specified in the query, consider only results relevant to the year
▪ Perfect → content that won an award that year
▪ Excellent → content that was nominated for that year
▪ Off-Topic → content has no association with the award in the given year
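
The winner and nominee rules above can be laid out as a lookup table keyed on whether an award event is upcoming. This is a sketch only: the relation labels are hypothetical, combinations the guideline does not mention return None rather than a guessed rating, and award event shows, similar shows and associated people are left out.

```python
# (award event upcoming?, relation of the result to the award) -> rating
AWARD_RULES = {
    (True, "nominee_upcoming"): "Excellent",
    (False, "winner_most_recent"): "Excellent",
    (True, "winner_most_recent"): "Good",
    (False, "winner_previous"): "Good",
    (False, "nominee_most_recent"): "Good",
    (True, "nominee_previous"): "Acceptable",
    (False, "nominee_previous"): "Acceptable",
}

def rate_award_result(event_upcoming: bool, relation: str):
    """Rating for a winner/nominee result, or None if the guideline is silent."""
    return AWARD_RULES.get((event_upcoming, relation))

def rate_award_result_for_year(won: bool, nominated: bool) -> str:
    """Exception: the query names a specific year."""
    if won:
        return "Perfect"
    if nominated:
        return "Excellent"
    return "Off-Topic"

print(rate_award_result(True, "winner_most_recent"))
print(rate_award_result_for_year(won=False, nominated=True))
```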

3.4.21 Brand Results


At times you will see Brand Results such as the channel page for a streaming service. These results should be rated based on
relevance to the query.
• [dramas on prime video] → The Prime Video brand page would be very relevant and considered an “Excellent” result
• [soccer] → MLS Season Pass page and MLS team pages would be very relevant and “Excellent” results
• [sports] →
o ESPN and MLS Season Pass pages are very relevant and “Excellent” results
o Prime Video and Peacock both sometimes feature live sports but it is not as much of a focus for them and
would be “Good” results

3.5 Problem: Other


Please rate "Problem: Other" in the following scenarios:
1. When there is a problem or technical issue with the task in BaseLine that makes it impossible to judge relevance. This can also be
used when the result is not a TV show / movie (e.g. a podcast).
2. When you come across queries such as [cancel subscription], [settings], or [log out] which are not relevant to any video content.
3. When you come across a preview page which has incorrect or missing data such as wrong Cast and Crew, wrong Release Date, or
the wrong artwork.

Please describe the issue in the comment section.


Copyright © 2025 Apple Inc. All rights reserved. Apple Confidential.
