Baseline Guidelines
• Training
• Mohammed Zaki Mohiuddin
Video Siri (Complex Queries) Training
1. Introduction
In this document, we explain relevance rating guidelines for video search on Apple TV. If you are not familiar with the Apple TV app,
please refer to https://www.apple.com/apple-tva-app/ for an overview and basic information about this app.
Please use the sidebar to help you navigate through your reading.
Your judgements should represent those of an Apple TV user who is using the Search feature. Ask yourself if you would be content
with the results returned for a particular search query. Is there a significant relationship between the query and content returned?
Would you be content if you see this content appear as a search result? Stay curious and complete thorough research.
Our ultimate goal is to surprise and delight our customers by improving search quality and enhancing customer satisfaction, and you
play an important role in this.
Please keep in mind that your tasks will be spot-checked for quality, and measured against those of your peers.
A secondary intent is less likely, or would be a less popular intent compared to a primary one. A secondary intent could include:
• Content relevant to a smaller group of users than for the primary intent. For queries like [shows] and [movies], the
primary intent is usually media content for adults. Content for children would be considered secondary intent, except when the
intent is obviously kids-related (such as cartoons, animated films, etc.).
• Complementary content such as trailers, reviews, cast members, or interviews with the cast on how the movie was
made.
• Lower quality/lesser-known content that is relevant to the query but is not the primary intent, such as content that is dated or
less popular.
1.2.1 Browse
The first type is queries with a browsing intent. These queries will point to a larger set of content where the user doesn’t have anything
specific in mind. Some examples:
• [best tv shows set in the future]
• [i want a a classic made for tv type move]
• [show me commedies available in french]
• [whats something good to watch around chinese new years]
1.2.2 Navigational (Video Navigational & Single Results Navigational)
The second type of query has a navigational intent. These queries are looking for a specific piece of content or a small list of
content. Navigational queries are categorized as either Video Navigational or Single Results Navigational. The key difference is that
Single Results Navigational queries refer to one specific title, whereas Video Navigational queries may target a broader range of
content.
Examples:
Video Navigational:
• [james bond movies with pierce brosnan] → Pierce Brosnan was James Bond in 4 films
• [all kardashians shows] → Looking for shows related to the Kardashian family, like The Kardashians and Keeping Up with
the Kardashians
Single Results Navigational:
• [James bond movie quantum something] → Looking for Quantum of Solace
• [most recent best picture academy awards] → Looking for a specific movie
• [that tv+ show where the actor travels the world even though he hates travel] → This describes The Reluctant Traveler
with Eugene Levy
1.2.3 Similarity
The final type of query is when the user is looking for content similar to another piece of content. These queries will generally
reference one or more pieces of content and use phrases like “similar to” or “like”.
Examples:
• [movies like top gun]
• [suspensful show similar to the 100]
• [show me something super dark like sey7en]
Examples:
Below are examples of the reasoning for different query and result pairs. The examples are color coded to show how the questions
are answered.
2. Rating Process
2.1 Query Research
To complete query research, tap into your local market knowledge in addition to online sources such as IMDB, YouTube,
Wikipedia, video streaming services, local content evaluators and social media. Consider how a user of Apple TV engages the search
feature to navigate to a specific set of content, or as a means to browse a larger catalogue of titles.
Online Research
• IMDB
o Popularity Ranking
o Rating Counts
o Storyline, Taglines
o Genre
o Release Date
o For topic, decade, etc: (IMDB Sort By feature)
• Box Office Mojo
o Local Box Office Rankings
• Common Sense Media
o Age Rating → for determination of kids content
• Wikipedia
o Information of the movie/show: plot, cast, awards, nominations, crew
• Social Media
o Trending films
o Followers on Twitter, Instagram, TikTok, etc. can help to determine a person’s popularity
• YouTube trailer views
o Make sure to factor in time: consider the average monthly views since the trailer was uploaded
• Local content evaluators
o To identify popularity in the market
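The YouTube tip above (average monthly views since upload) can be sketched as a small calculation. This is only an illustration: the function name `average_monthly_views` and the choice to count whole calendar months, with a minimum of one, are assumptions and not part of the guidelines.

```python
from datetime import date


def average_monthly_views(total_views: int, uploaded: date, today: date) -> float:
    """Average views per month since upload (hypothetical helper, not an official metric)."""
    # Count whole calendar months between upload and today.
    months = (today.year - uploaded.year) * 12 + (today.month - uploaded.month)
    # Assumption: treat anything younger than one month as one month to avoid dividing by zero.
    months = max(months, 1)
    return total_views / months


# An older trailer with more total views can still have the lower monthly average.
older = average_monthly_views(1_200_000, date(2023, 1, 15), date(2024, 1, 15))
newer = average_monthly_views(300_000, date(2023, 11, 15), date(2024, 1, 15))
print(older, newer)
```

Comparing monthly averages rather than raw totals keeps an older trailer from looking more popular simply because it has been online longer.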
2.2.1 Relevance
In short, relevance is the connection between the query and the output. Below are examples where the show Ted Lasso might surface.
Results may match only part of a query. In these cases, we weigh each query requirement by its importance and by where it falls on a
scale from Factual to Ambiguous. For fully Factual requirements, any result that doesn’t match is considered Off-Topic. For more
Ambiguous requirements, demote the rating by 1 (e.g., Good → Acceptable) if the result matches the other aspects but not that
requirement.
Note: The above guidance is primarily for Browse queries. For Navigational queries, consider the above if the result is a secondary
intent.
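As a rough sketch of the partial-match rule above, the snippet below treats a requirement as either fully "factual" or "ambiguous". That two-way split, the names `RATINGS` and `apply_unmet_requirement`, and the choice to let an "Acceptable" rating drop to "Off-Topic" when demoted are all simplifying assumptions; the guidelines describe a scale rather than two fixed buckets.

```python
# Rating scale from lowest to highest, as used throughout these guidelines.
RATINGS = ["Off-Topic", "Acceptable", "Good", "Excellent", "Perfect"]


def apply_unmet_requirement(base_rating: str, requirement_kind: str) -> str:
    """Adjust a rating when the result misses one query requirement (hypothetical helper)."""
    if requirement_kind == "factual":
        # A missed Factual requirement makes the result Off-Topic.
        return "Off-Topic"
    if requirement_kind == "ambiguous":
        # A missed Ambiguous requirement demotes the rating by one step.
        index = RATINGS.index(base_rating)
        return RATINGS[max(index - 1, 0)]
    raise ValueError(f"unknown requirement kind: {requirement_kind}")


print(apply_unmet_requirement("Good", "ambiguous"))
print(apply_unmet_requirement("Excellent", "factual"))
```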
2.2.2 Popularity
Remember that popularity is a sliding scale that depends on the relevance of the content for a given query. For example, a query like
[action comedies that don’t have sci-fi elements] has a much larger pool of relevant content compared to [Norwegian comedy shows
about school children]. Consequently, a Norwegian action comedy show might be considered very popular in the context of the
second query but not as popular in the context of the first, due to the differing sets of relevant results.
2.2.3 Recency
In general, users are more satisfied with newer content, so our ratings should prioritize more recently released content. Similar
to popularity, recency exists on a sliding scale. Broad queries such as [what's something I can watch with surround sound] would have
more relevant results compared to the query [historical dramas where they speak latin]. Therefore, the recency of a result should be
evaluated in relation to the other possible results that can be returned for the query.
For TV shows, recency is determined by the release date of the most recent season.
2.2.2 Target Audience
The target audience for a movie is determined by two factors: Genre and Age Rating (Kids (PG) vs Everyone (PG-13, TV-14) vs
Adults (R)). If a piece of content fits within one of these age ratings and its genre aligns with the interests of that audience, it is
generally considered to target that specific group.
2.2.4 Theme
Theme refers to what the content is about: the underlying message, idea, or lesson that a movie conveys beyond its plot and
characters.
3.1 Browse
For browse queries there are no “Perfect” results. This is because, by the nature of these queries, the user is not looking for anything
specific and is instead browsing. Therefore, the highest rating available is “Excellent”, and ratings are determined based on relevance,
popularity, and recency.
General Guideline:
• Excellent → The returned content is relevant to the query (not only by title), popular and recent.
• Good → The returned content is relevant to the query, and either popular or recent.
• Acceptable → The returned content is relevant to the query and neither popular nor recent.
• Off-Topic → Regardless of popularity and recency, the returned content is unlikely to be what the user intended to find
with the query or there is no clear relationship between the query and the content.
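The Browse guideline above can be read as a small decision rule. The sketch below is one way to express it; the function name `rate_browse` and the reduction of relevance, popularity, and recency to yes/no flags are assumptions, since in practice each is judged on a sliding scale relative to the query.

```python
def rate_browse(relevant: bool, popular: bool, recent: bool) -> str:
    """Map the three Browse aspects to a rating (hypothetical helper)."""
    if not relevant:
        # Popularity and recency cannot rescue an irrelevant result.
        return "Off-Topic"
    if popular and recent:
        return "Excellent"
    if popular or recent:
        return "Good"
    return "Acceptable"


print(rate_browse(relevant=True, popular=True, recent=False))
```

Note that "Perfect" never appears as an output, which matches the rule that Browse queries top out at "Excellent".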
Examples:
3.2 Navigational (Single Results Navigational & Video Navigational)
General Guideline:
• Perfect → If the query is navigational and the result is certainly the primary intent of the user, rate it “Perfect”, regardless of
popularity and recency. For Franchise-related queries, please follow the Franchise-specific guidelines (section 3.4.10).
• Excellent → If the returned content is a sequel or prequel of the intended content, or the content is part of a movie bundle
with the intended content, rate the result as “Excellent”.
• Good → If the returned content is relevant to the query, and either recent or popular, or can be considered a secondary
intent, rate it as “Good”.
• Acceptable → If the returned content is relevant to the query, but neither recent nor popular, or can only poorly satisfy a
secondary intent, it can be rated as “Acceptable”.
• Off-Topic → If the content is not relevant to the query or it is very unlikely that a user would search for the returned content
with the given query, it can be rated “Off-Topic”.
Examples:
3.3 Similarity
Instead of looking at popularity and recency, you should focus on how similar the content is to the referenced content. The main goal
of similarity searches and ratings is to capture “if a user liked piece of content x, will they also like piece of content y?”
For similarity queries, there may be additional aspects highlighted such as “movies like x”, “comedies similar to y”. In those cases,
a result must satisfy the highlighted aspect to reach even the minimum “Acceptable” rating.
General Guideline:
• Excellent → Similarity in 3 aspects (Target Audience, Factual Aspects and Theme)
• Good → Similarity in 2 aspects or just a close match in Target Audience
• Acceptable → Similarity in only Factual Aspects or Theme, or if it's a mentioned content (see section 3.4.3)
• Off-Topic → No Similarity
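A minimal sketch of the Similarity guideline follows, assuming each of the three aspects can be reduced to a yes/no match. The function name `rate_similarity` is hypothetical, and the sketch leaves out the mentioned-content case and any aspect highlighted in the query.

```python
def rate_similarity(target_audience: bool, factual_aspects: bool, theme: bool) -> str:
    """Rate a similarity result from its matching aspects (hypothetical helper)."""
    matched = sum([target_audience, factual_aspects, theme])
    if matched == 3:
        return "Excellent"
    # Two matching aspects, or a Target Audience match on its own, is "Good".
    if matched == 2 or target_audience:
        return "Good"
    # Only Factual Aspects or only Theme matched.
    if matched == 1:
        return "Acceptable"
    return "Off-Topic"


print(rate_similarity(target_audience=False, factual_aspects=True, theme=True))
```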
Examples:
Please note that the guidelines below are meant to augment the guidelines above, not override them. For instance, the additional
guidance for a query referencing a person has a maximum rating of “Excellent”; however, for a navigational query such as “movie where
tom hanks is stuck on a island”, the intended movie “Cast Away” would still get a “Perfect” rating.
Note: In addition to using traditional web resources, you can check whether content is free by following these steps:
• Click the link that is the title of the returned content.
• This will bring you to the Apple TV product page of the content in the country of the evaluation.
• Here you will see which platforms the content is available through.
o For instance, in the US, free streaming platforms include (but are not limited to): Tubi TV, Pluto TV, Amazon
FreeVee, Plex
o Other services may have some free content available
General Guideline:
• Excellent:
o Other highly relevant / related content
• Acceptable:
o The mentioned content(s)
• Special Cases:
o Franchise
▪ Content under the same branding as the mentioned content is “Acceptable”
▪ Franchise content under different branding is rated according to the franchise / similarity rules
Note: If the query refers to a content title (movie, TV show), it should not be considered offensive or sensitive. E.g., ‘How to Get Away
with Murder’ is a title for a popular legal thriller TV show and should not be considered as a safety concern.
Examples:
• [adventure shows to binge this weekend] → “Yakari”: The returned show is in the intended genre and very popular but
dated, so it could be rated “Good”. However, since it is primarily kids content, it is rated “Acceptable”.
3.4.10 Franchise
Core franchise content consists of the central, primary films (e.g. the Harry Potter main films), and non-core content includes
supplementary or spin-off films (e.g. Fantastic Beasts).
Example:
• [No time to Die] → James Bond 10 Film collection is Excellent since No time to Die in the collection is the primary intent.
Please determine if the query is directly relevant to the sporting event (clear Primary Intent) or if the intent of the query can reasonably
be multiple things, including non-sports events (Partial Intent or Secondary Intent).
The following guidance can help determine the proper rating, however, please rely primarily on your own market knowledge and query
relevance to make your judgement.
Note that the examples below may have already taken place in the past. Please assume that live assets are either ongoing or
upcoming.
Note: TV shows can remain relevant across multiple years or time periods based on when their seasons aired. For
example, Friends is relevant for both queries [i want to watch a classic 90’s sitcom] and [2000’s most popular shows].
General Guideline:
• Excellent:
o Content which won major award(s) in the given year/decade
o Ultra Popular (Approximately):
▪ Top 50 most viewed shows/films associated with decade
▪ Top 10 most viewed films associated with year
o Ultra popular show has >=50% of seasons/episodes in decade
• Good:
o Popular shows/movie associated with year/decade
o Show has 3+ seasons (or >50%) in decade
o Popular (Approximately):
▪ Top 100 most viewed in decade
▪ Top 30 most viewed films from year
• Acceptable:
o Content released in year/decade
o Show has 1+ season in decade
• Off-Topic:
o Content not from year/decade
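The viewing-rank thresholds above can be sketched as a lookup. This is a partial illustration only: the function name `rate_time_period` and the `THRESHOLDS` table are hypothetical, the cutoffs are the approximate values listed in the guideline, and the season-share rules for TV shows are deliberately left out.

```python
from typing import Optional

# (Excellent cutoff, Good cutoff) by query scope, taken from the approximate
# "most viewed" ranks in the guideline above.
THRESHOLDS = {"decade": (50, 100), "year": (10, 30)}


def rate_time_period(in_period: bool, scope: str,
                     view_rank: Optional[int] = None,
                     won_major_award: bool = False) -> str:
    """Rate content for a year/decade query (hypothetical helper, films only)."""
    if not in_period:
        return "Off-Topic"
    excellent_cutoff, good_cutoff = THRESHOLDS[scope]
    if won_major_award:
        return "Excellent"
    if view_rank is not None and view_rank <= excellent_cutoff:
        return "Excellent"
    if view_rank is not None and view_rank <= good_cutoff:
        return "Good"
    # Released in the period but not notably popular.
    return "Acceptable"


print(rate_time_period(True, "year", view_rank=25))
```

Because the guideline marks these ranks as approximate, a result near a cutoff still calls for market knowledge rather than a strict numeric comparison.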
3.4.18 Character
Intent for “Character” queries can be both Movies and TV Shows. Often, the Franchise section (3.4.10) will apply to ratings for
characters. In cases where the main character name and the title of the content match, character shall be the dominating
classification.
General Guideline:
• Perfect:
o The most popular and recent content featuring this character in a major role. (Apply “Franchise Query” Rule).
o If only one content with the character is produced, this content can be rated “Perfect”
• Excellent:
o Sequels/prequels for the show in which the character is best known
o Other high-quality content featuring the character
o Person page for a well-known actor/actress who plays the character
• Good:
o Content which is in the same franchise but about a different character
o If the character has their own spinoff, the ‘parent’ show where the character first appeared
• Acceptable:
o Show/movie features the character in an insignificant role
3.4.19 Person
Some queries may reference a person, be it an actor, director, producer, musician, etc. Use the guidelines below to modify your rating
for queries with such a reference. This guidance is primarily for Browse queries featuring a person.
General Guideline:
• Excellent:
o Content with the intended person as lead in cast & crew
o Most popular documentary about the person (more than 1 possible if equal in popularity/quality)
o Recent and popular live event with person
o Content where person is a significant guest star. Hosted by reputable content creator.
o Set of most popular content inspired by the person
o Person page
• Good:
o Documentary about the person that is popular, but not most popular or recent
o Popular live event with person
o Popular content inspired by the person
o Content with the intended person as cast & crew that is popular or recent
• Acceptable:
o Unpopular content about/with the person
o Content with the intended person as cast & crew (not as lead)
3.4.20 Awards
Users searching for an award show are generally interested in (1) watching nominated movies/shows before the award event, or (2)
watching movies/shows which won the most recent award event. Recent winners should receive higher ratings, unless the query
specifies a specific edition of the award event.
Definitions:
• If award event upcoming: Nominations have already been announced for the next upcoming award event.
• If no award event upcoming: Nominations for next upcoming award event are not announced yet.
General Guideline:
• Perfect:
o Most recent award event show
o Currently ongoing or upcoming live award event show
• Excellent:
o If award event upcoming: nominees (movies, tv shows or actors) for the upcoming award event. The returned
content itself needs to be nominated
▪ If movie returned, movie needs to be nominated
▪ If person page is returned, person needs to be nominated
o If no award event upcoming: winners of the most recent award event
• Good:
o If award event upcoming: winners of the most recent award event
o If no award event upcoming: winners of previous award events, nominees of most recent award event
o Similar award event shows
• Acceptable:
o Movies/Shows featuring people associated with award show
o Nominees of previous award events
• Exceptions:
o If a year is specified in the query, consider only results relevant to the year
▪ Perfect → content that won an award that year
▪ Excellent → content that was nominated for that year
▪ Off-Topic → content has no association with the award in the given year
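The way the Awards ratings shift once nominations are announced can be sketched with lookup tables. The result-kind labels, the table names, and `rate_award_result` are hypothetical. Combinations the guideline does not mention return `None` rather than a guessed rating, and the year-specific exceptions are not modeled.

```python
from typing import Optional

# Ratings that do not depend on whether nominations have been announced.
ALWAYS = {
    "most_recent_award_show": "Perfect",
    "live_award_show": "Perfect",
    "similar_award_show": "Good",
    "features_associated_people": "Acceptable",
    "previous_nominee": "Acceptable",
}
# "Award event upcoming": nominations for the next event are already announced.
NOMINATIONS_ANNOUNCED = {
    "upcoming_nominee": "Excellent",
    "most_recent_winner": "Good",
}
# "No award event upcoming": nominations are not announced yet.
NOMINATIONS_PENDING = {
    "most_recent_winner": "Excellent",
    "previous_winner": "Good",
    "most_recent_nominee": "Good",
}


def rate_award_result(kind: str, nominations_announced: bool) -> Optional[str]:
    """Look up the rating for an award-query result (hypothetical helper)."""
    if kind in ALWAYS:
        return ALWAYS[kind]
    table = NOMINATIONS_ANNOUNCED if nominations_announced else NOMINATIONS_PENDING
    return table.get(kind)


print(rate_award_result("most_recent_winner", nominations_announced=True))
print(rate_award_result("most_recent_winner", nominations_announced=False))
```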
1. Introduction
1.1 The importance of your work as a Rater
1.2 Primary and Secondary Intent
1.2 Query Types & Intents
1.2.1 Browse
1.2.2 Navigational (Video Navigational & Single Results Navigational)
1.2.3 Similarity
1.3 Reasoning (Comments)
2. Rating Process
2.1 Query Research
2.2 Aspects to Consider
2.2.1 Relevance
2.2.2 Popularity
2.2.3 Recency
2.2.2 Target Audience
2.2.3 Factual Aspects
2.2.4 Theme
3. Rating Scale & Examples
3.1 Browse
3.2 Navigational (Single Results Navigational & Video Navigational)
3.3 Similarity
3.4 Special Cases
3.4.1 Result Filters
3.4.2 'Free' Queries
3.4.3 Mentioned Content
3.4.4 Misspelling/Alternate Spelling
3.4.5 Harmful Queries
3.4.6 Kids Content
3.4.7 Explicit Content
3.4.8 Classic Content
3.4.9 Pre-Release Content
3.4.10 Franchise
3.4.11 Movie Bundles
3.4.12 Music Results
3.4.13 AppleTV+ Content
3.4.14 Sporting / Live Event
3.4.15 Apple Event / WWDC
3.4.16 Time Period Queries
3.4.17 Seasonal Results
3.4.18 Character
3.4.19 Person
3.4.20 Awards
3.4.21 Brand Results
3.5 Problem: Other