Search - 2 Video Complex Queries - BaseLine
6/18/24: New section (3.4.5) on Mentioned Content, which particularly affects Similarity queries. Clarifications on Kids content demotions.
1 Introduction
In this document, we explain the relevance rating guidelines for video search on Apple TV.
If you are not familiar with the Apple TV app, please refer to https://www.apple.com/apple-tv-app/ for an overview and basic information about this app.
Your judgements should represent those of an Apple TV user who is using the Search feature. Ask yourself whether you would be content with the results returned for a particular search query. Is there a significant relationship between the query and the content returned? Would you be content to see this content appear as a search result? Stay curious and research thoroughly.
Our ultimate goal is to surprise and delight our customers by improving search quality and enhancing customer satisfaction, and you play an important role in this.
Please keep in mind that your tasks will be spot-checked for quality, and measured against those of your peers.
1.2.1 Browse
The first type of query has a browsing intent. These queries point to a larger set of content where the user doesn’t have anything specific in mind. Some examples:
[best tv shows set in the future]
[i want a a classic made for tv type move]
[show me commedies available in french]
[whats something good to watch around chinese new years]
1.2.2 Navigational
The second type of query has a navigational intent. These queries look for a specific piece of content or a small list of content. Some examples:
[most recent best picture academy awards] → Looking for a specific movie
[james bond movies with pierce brosnan] → Pierce Brosnan was James Bond in 4 films
[that tv+ show where the actor travels the world even though he hates travel] → This describes The Reluctant Traveler with Eugene Levy
[that bond movie quantum something] → Looking for Quantum of Solace
1.2.3 Similarity
The final type of query is one where the user is looking for content similar to another piece of content. These queries generally reference one or more pieces of content and use phrases like “similar to” or “like”. Some examples:
[movies like top gun]
[suspensful show similar to the 100]
[show me something super dark like sey7en]
1.3 Reasoning
For these complex queries, it is especially important to explain why a result is relevant to the query. For each result, you must provide the following in the “Reasoning” field.
Provide a concise explanation of why the result should be considered relevant, not relevant, or of ambiguous relevance (rated as “Unknown”). Your explanation MUST answer each of the following questions:
What is the intent of the query?
Are there any abbreviations? If so, explain them (e.g. "90s refers to the time between 1990 and 1999").
How is the result relevant to the query?
The Reasoning field should focus primarily on the connection between the result and the query rather than on explaining the specific rating given.
Reasoning Examples
2 Rating Process
2.1 Query Research
To complete query research, tap into your local market knowledge in addition to online sources such as IMDB, YouTube, Wikipedia, video streaming services, local content evaluators and social media. Consider how an Apple TV user engages the Search feature, either to navigate to a specific set of content or to browse a larger catalogue of titles.
https://baseline.apple.com/training/evaluations/1051/guidelines 1/8
19:16 17/7/24 Guidelines for Search - Video Complex Queries — BaseLine
Online Research
IMDB
Popularity Ranking
Rating Counts
Storyline, Taglines
Genre
Release Date
For topic, decade, etc.: IMDB Sort By feature
Box Office Mojo
Local Box Office Rankings
Common Sense Media
Age Rating → for determination of kids content
Wikipedia
Information about the movie/show: plot, cast, awards, nominations, crew
Social Media
Trending films
Followers on Twitter, Instagram, TikTok, etc. can help to determine a person’s popularity
YouTube trailer views
Factor in time: consider the average monthly views since the trailer was uploaded
Local content evaluators
To identify popularity in the market
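The YouTube tip above amounts to normalizing a trailer’s total views by how long it has been online, so an older trailer does not look more popular just because it has had more time to collect views. A minimal Python sketch, assuming a month of 365.25 / 12 days and counting trailers younger than one month as one month old; the helper name and the view counts are hypothetical:

```python
from datetime import date

# Average length of a month in days (assumption: 365.25 / 12).
DAYS_PER_MONTH = 365.25 / 12


def average_monthly_views(total_views: int, uploaded: date, rated_on: date) -> float:
    """Hypothetical helper: average trailer views per month since upload."""
    months_online = (rated_on - uploaded).days / DAYS_PER_MONTH
    # Count very new uploads as one full month so their average is not inflated.
    return total_views / max(months_online, 1.0)


# Hypothetical numbers: an older trailer with more total views vs. a newer one.
older = average_monthly_views(12_000_000, date(2020, 7, 17), date(2024, 7, 17))
newer = average_monthly_views(3_000_000, date(2024, 4, 17), date(2024, 7, 17))
print(f"older: {older:,.0f}/month, newer: {newer:,.0f}/month")
```

Although the older trailer has four times the total views, the newer one averages about four times as many views per month, which is the stronger popularity signal.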
2.2.1.1 Relevance
[the show with the american coach in europe with the mustache] → While indirect, this query is looking for a show about an American coach with a mustache in Europe, which accurately describes the titular character of Ted Lasso
Other relevant results: none, as the plot point is specific to Ted Lasso
[commedy tv shows to binge] → Despite the misspelling, this query is clearly looking for comedy TV shows, and Ted Lasso is one
Other relevant results: The Office, Parks and Rec, South Park
[emmy award winners since 2020] → Ted Lasso and its cast and crew have won multiple Emmys since 2020
Other relevant results: Succession, Last Week Tonight with John Oliver, The Bear
[best apple originals that are funny] → The show is produced by and streamed on Apple TV+ and is a comedy
Other relevant results: Shrinking, Loot, Trying
[I want some shows about soccer clubs] → Ted Lasso is about a soccer team
Other relevant results: Welcome to Wrexham, Boca Juniors Confidential, All or Nothing: Tottenham Hotspur
[what are some of the best shows with jason sudeikis] → The show stars Jason Sudeikis
Other relevant results: Saturday Night Live, 30 Rock, The Last Man on Earth
[I want to watch something where a protege becomes a rival] → This query describes a key plot point of season 3 of Ted Lasso
Other relevant results: Star Wars (Anakin & Obi-Wan), Naruto (Orochimaru & 3rd Hokage)
2.2.1.2 Similarity
[sports dramas like firday night lights] → This query is looking for something similar to Friday Night Lights, a sports drama. Ted Lasso is also a sports drama centered on a charismatic head coach, and the two shows have close age ratings (TV-14 and TV-MA), which makes them very similar and makes Ted Lasso relevant
Other similar results: All American, The Boys in the Boat
[apple tv+ shows similar to shrinking] → Ted Lasso is an Apple TV+ dramatic comedy with a TV-MA age rating, just like Shrinking, which makes it quite similar
Other similar results: Loot, Platonic, Trying
2.2.2 Popularity
Popularity is a key component of ratings. Sources such as IMDB, YouTube trailer views and box office returns help determine how popular a piece of content is.
Remember that popularity is a sliding scale that depends on the relevant content for a query. A query such as [action comedies that don’t have sci-fi elements] has far more relevant content than [norwegian comedy shows about school children]. For example, a Norwegian action comedy show about school children might be considered very popular for the second query but not popular for the first, because the set of relevant results is different.
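One way to picture this sliding scale is to measure a title’s popularity only against the other relevant results for the same query. The Python sketch below does this with a simple rank share; the function, the titles and the counts are hypothetical, and using IMDB rating counts as the popularity signal is an assumption:

```python
def relative_popularity(title: str, popularity: dict[str, int]) -> float:
    """Fraction of the query's relevant titles that `title` is at least as popular as.

    `popularity` maps each relevant title to a popularity signal
    (assumption: e.g. IMDB rating counts).
    """
    target = popularity[title]
    at_or_below = sum(1 for value in popularity.values() if value <= target)
    return at_or_below / len(popularity)


# Hypothetical titles and counts for the two queries in the text.
niche_query = {"norwegian_show": 8_000, "niche_b": 1_200, "niche_c": 500}
broad_query = {
    "norwegian_show": 8_000,
    "hit_a": 900_000,
    "hit_b": 450_000,
    "hit_c": 300_000,
    "hit_d": 120_000,
}

print(relative_popularity("norwegian_show", niche_query))  # 1.0
print(relative_popularity("norwegian_show", broad_query))  # 0.2
```

The same title with the same raw count is the most popular relevant result for the niche query and the least popular one for the broad query.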
Excellent = Events more than 1 year but less than 2 years before the current date of rating
(ex. Apple Event 09.07.22, WWDC 2022)
Good = Events more than 2 years but less than 3 years before the current date of rating
(ex. WWDC 2021)
Acceptable = (ex. Apple Event 09.10.19, WWDC 2019)
If the Apple Event / WWDC content is only the secondary intent, lower each rating above by one level.
Query = “Apple”, “iPhone”, “new technology”, “web conference”, “ww”
If the Apple Event / WWDC content is relevant to the query but unlikely to be the intent, rate as “Acceptable”.
Query = “2022”, “w”, “September 2021”
*date of publishing these examples: 3/7/24
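The recency tiers above are date arithmetic, sketched below in Python. This is a hedged illustration, not an official tool: it assumes the “Acceptable” tier covers events 3 or more years old (the text only gives examples), returns None for events under a year old (not covered here), reads the publishing date 3/7/24 as March 7, 2024 (M/D/Y, like the 6/18/24 changelog date), and uses June 7, 2021 as an approximate WWDC 2021 date. The function name is hypothetical.

```python
from datetime import date
from typing import Optional

# Rating levels from lowest to highest.
SCALE = ["Off-Topic", "Acceptable", "Good", "Excellent", "Perfect"]


def event_rating(event_day: date, rated_on: date, secondary_intent: bool = False) -> Optional[str]:
    """Sketch of the Apple Event / WWDC recency tiers described above."""
    years = (rated_on - event_day).days / 365.25
    if years < 1:
        return None  # events under a year old are not covered by this excerpt
    if years < 2:
        rating = "Excellent"
    elif years < 3:
        rating = "Good"
    else:
        rating = "Acceptable"  # assumption: anything 3+ years old
    if secondary_intent:
        # Secondary intent: lower the rating by one level.
        rating = SCALE[SCALE.index(rating) - 1]
    return rating


rated_on = date(2024, 3, 7)  # assumption: publishing date read as M/D/Y
print(event_rating(date(2022, 9, 7), rated_on))                         # Excellent
print(event_rating(date(2022, 9, 7), rated_on, secondary_intent=True))  # Good
print(event_rating(date(2021, 6, 7), rated_on))                         # Good
print(event_rating(date(2019, 9, 10), rated_on))                        # Acceptable
```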
3.1 Browse
For browse queries there are no “Perfect” results. By the nature of these queries, the user is not looking for anything specific and is instead browsing. As a result, the highest rating is “Excellent”, and ratings are determined by relevance (connection to the query), popularity and recency.
General Guideline:
Excellent → The returned content is relevant to the query (not only by title), popular and recent
Good → The returned content is relevant to the query, and either popular or recent.
Acceptable → The returned content is relevant to the query and neither popular nor recent.
Off-Topic → Regardless of popularity and recency, it is unlikely that the user would use the query to search for the returned content, or there is no relationship between the query and the content.
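The Browse guideline combines three yes/no judgements. As a hedged sketch, it can be written as the Python decision function below; the function name is hypothetical, and it assumes the relevance judgement (including that the match is not by title alone) has already been made and is passed in as a boolean.

```python
def browse_rating(relevant: bool, popular: bool, recent: bool) -> str:
    """Sketch of the Browse general guideline; "Perfect" is never returned."""
    if not relevant:
        # No relationship to the query: popularity and recency do not matter.
        return "Off-Topic"
    if popular and recent:
        return "Excellent"
    if popular or recent:
        return "Good"
    return "Acceptable"


print(browse_rating(relevant=True, popular=True, recent=False))  # Good
```

Checking relevance first mirrors the guideline’s “regardless of popularity and recency” wording for Off-Topic results.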
Browse Scale
Browse Examples
3.2 Navigational
Navigational queries are generally looking for a specific piece of content (i.e. a specific movie or show) or a finite set of content such as a film trilogy. In these cases we allow a “Perfect” rating for the specifically intended pieces of content.
Note: Navigational queries can also return results based on similarity; in that case, refer to the similarity guidelines to rate the result.
[movies with the squirel and acorn] → Looking for the Ice Age movie franchise; anything returned that is not from the franchise is rated on similarity
General Guideline:
Perfect → If the query is navigational and the result is certainly the user’s primary intent, rate it Perfect, regardless of popularity and recency. For franchise-related queries, please follow the franchise-specific guidelines as outlined above.
Excellent → If the returned content is a sequel or prequel to the intended content, or is part of a movie bundle with the intended content, rate it Excellent.
Good → If the returned content is relevant to the query and either recent or popular, or can be considered a secondary intent, rate it Good.
Acceptable → If the returned content is relevant to the query but neither recent nor popular, or only poorly satisfies a secondary intent, it can be rated Acceptable.
Off-Topic → If the content is not relevant to the query or it is very unlikely that a user would search for the returned content with the given query, it can be rated off-topic.
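The Navigational guideline is an ordered list of checks, highest rating first. The Python sketch below illustrates that order under stated assumptions: the field and function names are hypothetical, each field stands for a rater judgement, and the franchise-specific and similarity rules mentioned above are not modelled.

```python
from dataclasses import dataclass


@dataclass
class NavResult:
    """Hypothetical rater judgements about one returned result."""
    primary_intent: bool = False            # certainly the content the user wants
    sequel_prequel_or_bundle: bool = False  # sequel/prequel or bundled with it
    relevant: bool = False
    popular: bool = False
    recent: bool = False
    secondary_intent: bool = False          # satisfies a secondary intent well
    weak_secondary: bool = False            # only poorly satisfies a secondary intent


def navigational_rating(r: NavResult) -> str:
    """Sketch of the Navigational general guideline (franchise rules not modelled)."""
    if r.primary_intent:
        return "Perfect"  # regardless of popularity and recency
    if r.sequel_prequel_or_bundle:
        return "Excellent"
    if (r.relevant and (r.popular or r.recent)) or r.secondary_intent:
        return "Good"
    if r.relevant or r.weak_secondary:
        return "Acceptable"
    return "Off-Topic"


# [that bond movie quantum something]: Quantum of Solace vs. its prequel Casino Royale.
print(navigational_rating(NavResult(primary_intent=True)))                  # Perfect
print(navigational_rating(NavResult(sequel_prequel_or_bundle=True)))        # Excellent
```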
Navigational Scale
Navigational Examples
3.3 Similarity
For similarity queries we evaluate results very differently. Instead of looking at popularity and recency, we focus on how similar the returned content is to the referenced content. The primary intent of similarity searches and ratings is to capture “if a user liked piece of content x, will they also like piece of content y?”
Evaluate Similarity between the intended content and the returned content based on the following aspects:
Target Audience
Genre
Age Rating (Kids (PG) vs Everyone (PG-13, TV-14) vs Adults (R))
Factual Aspects
Cast & Crew: Actors, Producers, Studio, etc.
Setting: Location AND Time Period in which the content takes place
Theme
What is the content about?
Similarity queries may highlight additional aspects, as in “movies like x” or “comedies similar to y”. In those cases, a result must match the highlighted aspect (e.g. being a comedy) to reach even the minimum “Acceptable” rating.
General Guideline:
Excellent → Similarity in Target Audience, Factual Aspect and Theme
Good → Similarity in 2 of the 3 categories (Target Audience, Factual Aspects and Theme), or only a close match in Target Audience
Acceptable → Similarity in only Factual Aspects or only Theme, or only a loose Target Audience match
Off-Topic → No Similarity
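The Similarity guideline counts how many of the three categories match, with a special case for a close Target Audience match. The Python sketch below is one reading of it, not an official scoring tool. Assumptions, all hypothetical: Target Audience is graded as “close”, “loose” or “none”; both close and loose matches count toward the three categories; and a result that misses a highlighted aspect (such as “comedies” in “comedies similar to y”) is Off-Topic.

```python
from typing import Literal

# How closely the target audiences match (assumption: three levels).
Audience = Literal["close", "loose", "none"]


def similarity_rating(audience: Audience, factual: bool, theme: bool,
                      matches_highlighted_aspect: bool = True) -> str:
    """Sketch of the Similarity general guideline."""
    if not matches_highlighted_aspect:
        # The highlighted aspect is required for any positive rating.
        return "Off-Topic"
    # Assumption: a close or loose audience match both count as a matched category.
    matched = sum([audience != "none", factual, theme])
    if matched == 3:
        return "Excellent"
    if matched == 2 or audience == "close":
        return "Good"
    if matched == 1:
        return "Acceptable"
    return "Off-Topic"


print(similarity_rating("close", factual=False, theme=False))  # Good
```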
Similarity Scale
Similarity Examples
Below are the same examples as above, but showing how a similar result might appear for a navigational query.
Please note that the guidelines below are meant to augment the guidelines above, not override them. For instance, the additional guidance for queries referencing a person has a maximum rating of “Excellent”; however, for a navigational query such as “movie where tom hanks is stuck on a island”, the intended movie, “Cast Away”, would still get a “Perfect” rating.
3.4.2 Character
The intent of “Character” queries can be both movies and TV shows. The “Franchise Ratings” (Appendix) often apply to character ratings. Where the main character’s name matches the title of the content, “Character” is the dominant classification.
General Guideline:
Perfect:
The most popular and recent content featuring this character in a major role. (Apply “Franchise Query” Rule).
If only one piece of content featuring the character has been produced, it can be rated Perfect
Excellent:
Sequels/prequels for the show in which the character is best known
Other high-quality content featuring the character
Person page for well known actor/actress who plays the character
Good:
Content which is in the same franchise but about a different character
If the character has their own spinoff, the ‘parent’ show where the character first appeared
Acceptable:
Show/movie features the character in an insignificant role
Character Scale
Character Examples
3.4.3 Person
Some queries may reference a person, be it an actor, director, producer, musician, etc. Use the guidelines below to modify your rating for queries with such a reference. This guidance is primarily for Browse queries featuring a person.
General Guideline:
Excellent:
Content with the intended person as lead in cast & crew
Most popular documentary about the person (more than 1 possible if equal in popularity/quality)
Recent and popular live event with person
Content where the person is a significant guest star, hosted by a reputable content creator
Set of most popular content inspired by the person
Person page
Good:
Documentary about the person that is popular, but not most popular or recent
Popular live event with person
Popular content inspired by the person
Content with the intended person as cast & crew that is popular or recent
Acceptable:
Unpopular content about/with the person
Content with the intended person as cast & crew (not as lead)
Person Scale
Person Examples
3.4.4 Awards
Users searching for an award show are generally interested in (1) watching nominated movies/shows before the award event, or (2) watching movies/shows that won at the most recent award event. Recent winners should receive higher ratings, unless the query specifies a particular edition of the award event.
Definitions:
If award event upcoming: Nominations have already been announced for the next upcoming award event
If no award event upcoming: Nominations for next upcoming award event are not announced yet.
General Guideline:
Perfect:
Most recent award event show
Currently ongoing or upcoming live award event show
Excellent:
If award event upcoming: nominees (movies, TV shows or actors) for the upcoming award event. The returned content itself must be nominated:
if movie returned, movie needs to be nominated
if artist page is returned, artist needs to be nominated
If no award event upcoming: winners of the most recent award event
Good:
If award event upcoming: winners of the most recent award event
If no award event upcoming: winners of previous award events, nominees of most recent award event
Similar award event shows
Acceptable:
Movies/Shows featuring people associated with award show
Nominees of previous award events
Exceptions:
If a year is specified in the query, consider only results relevant to the year
Perfect → content that won an award that year
Excellent → content that was nominated for that year
Off-Topic → content has no association with the award in the given year
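The Awards guideline depends on whether nominations for the next event have been announced. The Python sketch below encodes the lists above as lookup tables. The category labels and function names are hypothetical; combinations the guideline does not list (for example, a previous winner while an event is upcoming) return None, and treating every result that neither won nor was nominated in the queried year as Off-Topic is an assumption.

```python
from typing import Optional

# Hypothetical category labels for how a result relates to the award in the query.
ALWAYS = {
    "most_recent_event_show": "Perfect",
    "live_or_upcoming_event_show": "Perfect",
    "similar_event_show": "Good",
    "features_associated_people": "Acceptable",
    "previous_nominee": "Acceptable",
}
IF_UPCOMING = {  # nominations for the next event have been announced
    "upcoming_nominee": "Excellent",
    "most_recent_winner": "Good",
}
IF_NOT_UPCOMING = {  # nominations for the next event are not announced yet
    "most_recent_winner": "Excellent",
    "previous_winner": "Good",
    "most_recent_nominee": "Good",
}


def award_rating(category: str, event_upcoming: bool) -> Optional[str]:
    """Rating for a query without a year; None if the guideline does not cover the case."""
    table = IF_UPCOMING if event_upcoming else IF_NOT_UPCOMING
    return table.get(category, ALWAYS.get(category))


def award_rating_for_year(won_that_year: bool, nominated_that_year: bool) -> str:
    """Exception for queries naming a year: only that year's edition counts."""
    if won_that_year:
        return "Perfect"
    if nominated_that_year:
        return "Excellent"
    return "Off-Topic"  # assumption: treated as having no association that year


print(award_rating("most_recent_winner", event_upcoming=True))   # Good
print(award_rating("most_recent_winner", event_upcoming=False))  # Excellent
```

The same most recent winner drops from Excellent to Good once the next event’s nominees are announced, because the nominees then become the better match for the user’s likely intent.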
Awards Scale
Awards Examples
3.4.5 Mentioned Content
General Guideline:
Excellent:
Other highly relevant / related content
Acceptable:
The mentioned content(s)
Special Cases:
Franchise
Content under the same branding as the mentioned content is acceptable
Franchise content under different branding is rated according to the franchise / similarity rule