
HRS JUDGING GUIDELINES ADDENDUM - SIDE-BY-SIDE

_______________________________________________________________________
Last Updated: 6/10/2015 by Gayathri Mohan

These guidelines contain confidential and proprietary information belonging to Microsoft Corporation.

The recipient understands and agrees that these materials and the information contained herein may not be used
or disclosed without the prior written consent of Microsoft Corporation.

Table of Contents
1 Overview
2 Accessing the Tool
3 Task Detail
3.1 Determine the Query Intents
3.2 Review the Results
3.2.1 Result Captions
3.2.2 Flagging Notable Results
3.2.3 Comments
3.3 Indicate Which Result Set is Better
4 Addendums for Specific Judging Scenarios
4.1 Location-Specific Queries
4.2 Judging Side-By-Side for Mobile Phone Examples
4.3 Navigational and Sub-Navigational Intent Queries
4.4 Rating SERPs for Category vs. Specific Queries
4.4.1 Category vs. Specific Queries
4.4.2 Examples of Different Types of LPs
4.4.3 Judging Scenarios
4.5 Judging with Relevance Ratings Provided
4.5.1 Judging Considerations
4.5.2 Judging Example
4.6 Content Quality
4.6.1 Overview
4.6.2 Content Authority (CA)
4.6.3 Content Utility (CU)
4.6.4 Content Discoverability (CD)
4.6.5 Page Presentation (PP)
4.6.6 Content Generation Effort (CGE)
4.6.7 Content Quality Examples (Page Level)
4.6.8 Content Quality SBS Examples
4.7 Queries with Recourse and Requery Links
4.7.1 Judgment Guideline
4.8 Freshness
5 Mixed Examples
5.1 Left/Right Much Better
5.2 Left/Right Better
5.3 Left/Right Slightly Better
5.4 About the Same
5.5 Much Better vs. Better
5.6 Better vs. Slightly Better
6 Appendix: setting up your computer
6.1 System Requirements
6.2 Browser plugins and external programs
6.3 Refusing and uninstalling third-party programs
6.4 Creating an Unprivileged account
6.4.1 Windows 7
6.4.2 Vista Ultimate Edition
6.4.3 Vista Home Premium
6.4.4 XP
6.5 Malware and Virus Protection
6.6 Setting the IE language preference
6.7 Browser shortcuts

1 Overview
When users enter a query into a search engine such as Bing or Google, they expect the search engine to understand what they are looking for and to return results that completely satisfy their requirements. Given the huge number of documents on the Web, the ambiguities of human language, and the typically short queries that users enter, this is a hard problem.

These guidelines describe how to accurately determine user intent and assess the relevance and quality of the web documents returned by a search engine. While the task may
seem daunting, there are really only five steps:

1. Consider what the user could have had in mind (their intent) when they typed in the query; allow for misspellings or other ambiguity.
2. Look at the set of web documents returned and determine which intents the results satisfy, and how strongly they satisfy them.
3. Consider the quality of the content of the documents on both sides with respect to each document's authority, usefulness, presentation, etc.
4. Given your inspection of the web results returned on both sides, rate which side is more likely to satisfy users, and by how much.
5. Where necessary, provide comments explaining your rating.

We need your help to understand how satisfied users would be with the results that search engines return for various queries. This document is designed to be used in conjunction with the main Web judging guidelines.

2 Accessing the Tool


These are the steps you should follow to access the SBS judging tool:

1. Please close all browser windows.
2. Open Internet Explorer (minimum version required is IE9).
3. Log in to: UHRS Prod WebEntry and select the Side by Side production HitApp from the list.
4. When using this tool for the first time, click on the “Marketplace” link first.
5. You will see a query, two search engine results pages (SERP) for that query, and some rating options. We’ve highlighted some key features for you, which we’ll
explain in greater detail below:

Key features of the judging page:

 The query box contains the query; always look at it first. In some cases a user location will also be given below it; you can click on the location for a map. You should take the user location into account when assessing the query intent.
 A comment box lets you provide us with feedback on what you've noticed. Clear comments are always helpful, but they are only required when you see [Required].
 The 'Can't Judge' drop-down menu is for unjudgeable examples (see section 3.1). For reporting technical issues, follow your vendor's process for escalation.
 A timer is given as a guide to help you track time spent on each example.
 Flag results that are particularly useful or poor. For some examples this will be required; see section 3.2.2 for details.
 Click on the blue links to review the content for any page. Duplicate links on each side will also be highlighted when you hover over them.

3 Task Detail
The judging process involves the following steps.

3.1 Determine the Query Intents


Before you can decide how happy a user would be with a web document, you must determine what they were thinking (what their intent was) when they typed the query into the
search engine. Users with identical queries may not be looking for exactly the same thing, and some intents may be more popular than others.

Most query intents can be classified into one or more of these three categories:

 Navigational: Find a specific page on the internet.


 Informational: Learn more about a topic.
 Transactional: Complete a transaction or action on the internet.

Likewise, the likelihood or popularity of an intent can be categorized into one of the five levels indicated below. An intent that is most likely means that more than 50% of users have this specific intent in mind; an intent that is very unlikely means that less than 1% of users have that particular intent in mind for the query.

How likely is this intent?

Most Likely: >50%
Very Likely: 26-50%
Likely: 11-25%
Unlikely: 1-10%
Very Unlikely: <1%
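
Read as thresholds, the bands above amount to a simple lookup. Below is a minimal illustrative sketch in Python; the function name and interface are ours, not part of the judging tool:

    def likelihood_level(share):
        """Map an estimated share of users holding an intent (0.0-1.0)
        to the likelihood levels defined above."""
        if share > 0.50:
            return "Most Likely"
        if share > 0.25:
            return "Very Likely"    # 26-50%
        if share > 0.10:
            return "Likely"         # 11-25%
        if share >= 0.01:
            return "Unlikely"       # 1-10%
        return "Very Unlikely"      # <1%

    # An intent held by roughly a third of users:
    print(likelihood_level(0.33))   # Very Likely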

If the query is unjudgeable, you can click on the Can't judge drop-down menu to choose from one of the following options: Unreadable, Adult, or Other.

 If the query is in a foreign language you don't understand or is otherwise unreadable (e.g. junk queries with no discernible intent), please choose "Unreadable."
 If the query and the results are Adult, you can choose Adult and skip the query (however, you may rate the results if you choose).
 If you see anything else that makes it impossible for you to rate the SERPs, click "Other" to proceed to the next query.
 Note that search engines will sometimes return different numbers of results for certain queries; this alone is not a reason to choose the Can't judge option.

3.2 Review the Results


After determining the possible intents for the query and their likelihood, review the results returned by both sides.

Landing Pages (LPs) should be relevant to at least one of the intents of the query and be of high quality; they should have appropriate authority, utility, readability, scope,
freshness and diversity for the query. In addition to satisfying the intent of the query, consider the following factors when assessing which is the better SERP:

 Relevance: When thinking about relevance, consider the intents you’ve identified and evaluate the LPs against those intents. The better SERP will have more LPs at higher
positions satisfying the primary intents of the query.

 Quality: Quality, discussed at length in Addendum 4.6, covers five dimensions: Content Authority (CA), Content Utility (CU), Content Discoverability (CD), Page Presentation (PP) and Content Generation Effort (CGE). When evaluating the quality of LPs on a SERP, consider these five dimensions. Is the source of the content authoritative? Is it at the appropriate level of depth? Does the appearance and organization of the LPs provide a satisfying experience? For example, pages with excessive ads or distracting pop-ups generally provide a less satisfying experience, while pages focused on providing quality content are usually better.

 Scope: Consider whether the results cover the breadth and depth of what the user is looking for. Some queries seek specific information, while others seek more general information. Always evaluate results in the context of the whole SERP.

 Freshness: Consider whether the results are from the appropriate time period for the query in question. See Addendum 4.8.

 Diversity: For queries that have multiple intents, consider whether the SERP provides sufficient coverage of those intents. When judging queries with multiple intents, it is essential to satisfy the most likely intent high on the page. A difficult aspect of judging is knowing when it is preferable to include satisfying results for intents which are not the most likely intents. Once a SERP has fully satisfied the most likely intent (i.e. a user would not need or want any more LPs for that intent), a SERP which also satisfies other likely intents is preferred over one which simply has more documents for the most likely intent.

 Redundancy: Some results may be highly relevant by themselves, but very similar (redundant) to results that are already shown higher on the page. Showing redundant results is a negative aspect of a SERP. Two similar results (A and B) may be satisfying individually, but if the user were to navigate to result A, result B would no longer satisfy them, given they had already seen result A. This occurs when both A and B have almost identical content or serve a very similar purpose. For example, for the query {collector coins} a user is highly likely to be satisfied by both http://www.ebay.com/sch/i.html?_nkw=collectors+coins and http://www.ebay.com/sch/i.html?_nkw=coin+collector, as they display very similar content; but if a user went to the first page, the second page would not be very useful. Not all results which serve the same intent are redundant: http://www.hsn.com/shop/coin-collector-coins-and-collectibles/co-5648 serves the same intent as the eBay results, but a user may like to see results from different sources for this type of query. (A rough programmatic illustration of redundancy appears after this list.)

 User Context: When evaluating the SERPs, consider the provided context, if any. For instance, if the location is provided, consider whether the query depends on the locale. For example, if the query is {restaurants near me} and the user is located in San Diego, CA, the better SERP will reflect that intent. For more information, see Addendum 4.1. Similarly, if the user is on a mobile device, the preferred SERP will be appropriate for the device and form factor. See Addendum 4.2 for information on judging mobile queries.
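
As promised in the Redundancy bullet above, here is a rough programmatic illustration of the eBay example. It is only a sketch, and the helper names are ours: it flags two results as redundancy candidates when they come from the same host and their search keywords match after crude plural stripping. Actual judging always relies on inspecting the pages.

    from urllib.parse import urlparse, parse_qs

    def _stem(word):
        # Very crude plural stripping; enough for this illustration.
        return word[:-1] if word.endswith("s") and len(word) > 3 else word

    def search_terms(url):
        """Return (host, normalized keyword set) for a search-results URL."""
        parts = urlparse(url)
        terms = set()
        for values in parse_qs(parts.query).values():
            for value in values:
                terms.update(_stem(w) for w in value.lower().split())
        return parts.netloc, terms

    def likely_redundant(url_a, url_b):
        """Same host and the same normalized keywords -> likely redundant."""
        host_a, terms_a = search_terms(url_a)
        host_b, terms_b = search_terms(url_b)
        return host_a == host_b and terms_a == terms_b

    a = "http://www.ebay.com/sch/i.html?_nkw=collectors+coins"
    b = "http://www.ebay.com/sch/i.html?_nkw=coin+collector"
    print(likely_redundant(a, b))   # True: both reduce to {"collector", "coin"}

The hsn.com page from the same example comes from a different host, so this heuristic would correctly leave it unflagged.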

You may find that some of these factors conflict with each other. For example, one SERP may have lots of authoritative results, while the other may have better diversity. Again,
consider what the user may have wanted when the query was issued, and then decide which side is better based on the user’s expectations.

About the SERPs as a whole:


 Results should directly address one of the query's intents.
 It is essential that a SERP satisfy the most likely intents high on the page.
 The better SERP will contain documents appropriate to the location and context.
 A SERP which has higher quality documents (in terms of authority, presentation, etc.) is preferred.
 A SERP which satisfies the most likely intent and also other intents of the query is superior to a SERP which only satisfies the most likely intent.
 Results should load without problems, and should not contain malware or, for non-adult queries, adult content.
 Results should be ordered in the most helpful way, with the most useful results at the top. Keep in mind that users want to find the information they are looking for at the top of the SERP, so lower results typically have less impact on the user.
 The SERP should in general not have lots of duplicates or be overly redundant; users will be dissatisfied to see the same results or information presented over and over again.
 Don't compare pages line-by-line across the SERPs. After you have determined what pages are on each SERP, and what kind of content they offer, observe the SERPs as a whole in light of the above.

This list is not exhaustive, so please keep in mind that different aspects will surface as important, depending on the query and the SERPs you will see.

3.2.1 Result Captions


To help users decide what to click on, results are represented by captions comprising a title, URL, and a summary (a.k.a. Snippet):

[Screenshot: a result caption, with its title, URL, and snippet labeled]

Caption quality can vary; poor captions might contain junk, copyright messages, boilerplate text, and so on.

Your primary focus is on result relevance and quality, but all else being equal, users will be unhappy to see results with unclear or misleading captions. Take a look at this example:

The landing page (LP) offered on both sides is the same, but the captions are different. In particular, the snippet on the right states at the beginning that the program has expired,
while on the left this is not clear until the user reads the second sentence. Since the snippet on the right is more immediately helpful but is not complete (the email address has
been left out), we rate Right Slightly Better.

3.2.2 Flagging Notable Results


To annotate a result, click on the result in a non-link area. This will highlight the result and give you the option to choose one of four annotations. Results should only be annotated when they influenced the overall rating.

This functionality is always available and is highly recommended. Some judgments will be randomly selected to require this annotation step after submission.

Annotations and their descriptions:

 The 'check mark' rating should be used when the result perfectly satisfies the query and is the major reason for giving the overall rating.
 The 'thumbs up' rating should be used on results which positively influenced the overall rating.
 The 'thumbs down' rating should be used on results which negatively influenced the overall rating.
 The 'red x' rating should be used on results which severely negatively influenced the overall rating. This rating should only be used on results which are clearly irrelevant to the query.

Note: If a result occurs on both sides of the page, you can annotate one side and the other will automatically be annotated. In general, a document which is good to have on the page, or where you prefer the higher ranking, should be given a green rating; a document which shouldn't be on the page, or where you prefer the lower ranking, should be given a red rating.

Examples:

Result Positions    Preferred Position    Annotation
L2, R7              L2                    Green
L4, R10             R10                   Red
L2, Not Shown       L2                    Green
Not Shown, R4       Not Shown             Red
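
All four rows of the table follow one rule: where you prefer a duplicate's placement, the higher-ranked (smaller-numbered) placement gets green and the unwanted placement gets red; a result you would rather not see at all gets red wherever it appears. A hypothetical helper making that rule explicit (the tool itself only records your clicks; this sketch is ours):

    def duplicate_annotation(pos_a, pos_b, preferred):
        """Annotation colour for a result shown at pos_a on the left SERP
        and pos_b on the right (None = not shown). `preferred` is the
        placement you would rather have, or None if the result should
        not appear at all."""
        if preferred is None:
            return "Red"                    # shown somewhere, but unwanted

        def rank(p):
            return int(p[1:])               # "L2" -> 2, "R10" -> 10

        others = [p for p in (pos_a, pos_b) if p is not None and p != preferred]
        if not others:
            return "Green"                  # only shown where you want it
        # Prefer the higher-ranked (smaller-numbered) placement.
        return "Green" if rank(preferred) < min(rank(p) for p in others) else "Red"

    # The four rows of the table above:
    print(duplicate_annotation("L2", "R7", "L2"))     # Green
    print(duplicate_annotation("L4", "R10", "R10"))   # Red
    print(duplicate_annotation("L2", None, "L2"))     # Green
    print(duplicate_annotation(None, "R4", None))     # Red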

HitApp Example:

In the below example, L1 should be annotated with a ‘thumbs down’ rating, as it misses the “contact” intent of the query. R1 should be annotated with the ‘check mark’ rating as
it perfectly satisfies the user. R2 could be annotated thumbs up if it influenced your decision, or could be left without annotation.

After selections have been made, you can change your selection by clicking on the annotation icon.

3.2.3 Comments
In some cases, you will notice that providing a comment is mandatory. Your comments help us to understand the key factors that led to your decision. For example, you might
prefer one side because it has a much better key result; has more good results; has fewer bad results; covers more possible intents; or something else. When appropriate, use the
terms defined in the glossary section. Please be as specific as possible, and always refer to each result as L1/R2 (instead of Left1 and Right2). Your comments do not need to be
long, as long as you clearly convey the key factor(s) in your decision. Here are a few examples of good vs. bad comments:

Good Comments:
 "All lps are same except R3 and L10. R3 is a good lp giving broad info from authorative source. L10 is a forum answer with no responses to question"
 "L/R 1-3 identical. R4 better because it's a .net framework page, L4 is visual studio"
 "R1 is more recent version than L1. R6 less relevant"

Bad Comments:
 "Overall order"
 "Better results on the right side"

In cases where providing a comment is not required, focus your time on determining the best label for the query. If something makes a particular side stand out, we will
appreciate your input.

3.3 Indicate Which Result Set is Better


Take a step back and examine the SERPs as a whole before choosing your final rating. You should make this determination by carefully evaluating the intents which are satisfied by
each SERP and the degree to which those intents are satisfied. Always keep this question in mind for each query: Which side has the results which will better satisfy more users?
Note that showing no results might be better than showing results that are not useful to the user and will waste their time. Don't automatically penalize the side with no or fewer results; the order and relevance of the results are more important than their overall number.

4 Addendums for Specific Judging Scenarios
4.1 Location-Specific Queries
Location-specific queries are queries where a user’s location is provided and should in certain situations impact the judgment. When the user location is provided, it means that the
user was in that location when they issued that query. There are three types of location-specific queries:

1. No location information

2. City & State (but no map) are provided

3. High Precision Location (HPL) queries: City, State and a map are provided. The exact user location is known (i.e. latitude and longitude in addition to city & state). A map indicates that the query is "high precision location" and that the user was at that exact location when the query was issued. For high precision location queries, the City, State (Map) text will link to the specific latitude/longitude of the user on Bing maps.

There are special considerations when evaluating location-specific queries:

 First you must determine whether some possible intents of the query are dependent on location. Remember that in many cases the query will have more likely intents which are not dependent on location. In some cases there may be no intents at all relating to location, even though a user location is provided.
 If the query is location dependent (e.g. the user is looking for something nearby like a business or some physical location) then you must take the geographic locations of the results into account. The better SERP will have results closer to the user location.
 If only a city and state but no map are provided, you can assume the user to be in that city; any results near that city should be preferred, provided they are applicable and otherwise of comparable relevance and quality.
 If a map is provided, you can assume the user is in that specific location. SERPs with results closer to this location are preferred, if applicable to the query.
 A user who specifies a location in their query is most likely looking for results related to that location (and not the user location or map position provided with the query).
 Proximity of the results to the user location is not the only criterion. If one result is closer but is not likely to satisfy the user's needs well, a result that is further away (within reason) but more relevant is preferred.
 If the location on the LP does not appear to match the query, you can use the map link available to see if they are close to each other. LPs which are geographically close
to the user location can be relevant.
Query/Location and Judging Considerations:

Query: san fran hotels
Location: New York, NY
In this query, the provided location (New York, NY) is not important. The intent of the query is to find hotels in a location defined in the query ("san fran", which is probably San Francisco).

Query: starbucks
Location: Dunnellon, FL
In this query the user location (Dunnellon, FL) should modify your judgment even though it's not explicitly part of the query. Dunnellon, FL is the target location for location-specific intents. For this query, results about specific Starbucks locations in Dunnellon, FL and the surrounding area could satisfy a user from this location. Remember that even though a user location has been provided, the query has non-location-specific intents as well.

Query: how to change lightbulb
Location: Perkasie, PA
In this query, the provided location is not important. There is no local intent for this query. You would expect the results for this query to be the same regardless of the user's location.

Query: Enjoy Golf Course
Location: Candor, NY
In this query, the provided location and the query location are aligned. The golf course mentioned is near the user location (it's actually called 'En-Joie Golf Club' -- the user mistyped the query and it's 33 mins away by car from Candor, NY). If there were other golf courses called Enjoy or similar but in other locations outside of NY, these would be much less likely intents in this user location.

Query: Macy's
Location: Houston, TX (with a map; 29.785904, -95.564856)
This is a high precision location query and there is local intent for this query. This means the user is in that specific location and searching for a Macy's store. SERPs with results for stores close to that location should be preferred.

Query: coffee
Location: Eugene, OR (with a map; 44.045529, -123.081159)
This is a high precision location query and there is local intent for this query. The user is most likely looking for the nearest coffee store to that location. SERPs with stores near that location should be preferred.

 For the en-US market, when the user location is not provided, you should assume the user was located in Redmond, WA unless otherwise indicated by the query.
 For non en-US markets, when the user location is not provided, you should determine which SERP is most relevant to the majority of users in the market.

You may occasionally encounter some queries with an out of market user location provided. This is expected as sometimes users use search engines in out of market locations. If
user location is not important to the query intent, you can ignore it and judge the SERPs as normal. In cases where user location is important, you should try to consider the SERP
results in relation to the user’s out of market location. Use side searches and maps to help you better determine what local intents the user may have.

Checking whether a result satisfies a query’s high precision location intent

If a document specifies a physical location, you may need to check the distance between the provided latitude and longitude and the location given in the document to determine if it is the most satisfying result for the user. You should consider whether there are closer matches to the user's location that would be more satisfying to the most likely intent. You can use Bing Maps (http://www.bing.com/maps) to check the distance between the document location and the user location with the "Directions" functionality. Clicking the (Maps) link in the HitApp will take you to Bing Maps with the user's latitude and longitude prepopulated. Below is an example:

For example, the distance between user location “44.045529, -123.081159” and this document’s location (801 E 13th Ave, Eugene, OR 97401) is 312 ft.
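
If you want to sanity-check such a distance yourself, the haversine (great-circle) formula is a close approximation. In this sketch only the user coordinates come from the example above; the coordinates for 801 E 13th Ave are assumed for illustration:

    from math import radians, sin, cos, asin, sqrt

    def haversine_ft(lat1, lon1, lat2, lon2):
        """Great-circle distance between two lat/lon points, in feet."""
        earth_radius_ft = 20_902_231        # mean Earth radius (~6371 km) in feet
        phi1, phi2 = radians(lat1), radians(lat2)
        dphi = radians(lat2 - lat1)
        dlmb = radians(lon2 - lon1)
        a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
        return 2 * earth_radius_ft * asin(sqrt(a))

    user = (44.045529, -123.081159)          # from the (Maps) link
    doc = (44.044660, -123.081090)           # 801 E 13th Ave (assumed coordinates)
    print(round(haversine_ft(*user, *doc)))  # ~318 ft, close to the 312 ft above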

High precision location query example:

The query is {coffee} and it has some local intent, since the user is probably looking for a coffee shop and would like to find the nearest coffee store. In this example the differences are at L4 & R4. The document location of L4 is "311 Coburg Rd, Eugene, OR"; the distance is 2.1 miles. The document location of R4 is "801 E. 13th Avenue, Eugene, OR"; the distance is 312 ft.

Both L4 and R4 can satisfy the user's high precision location intent, but R4 is closer. Since the two URLs are from the same domain and have almost identical content quality, the right side is slightly better than the left side.

4.2 Judging Side-By-Side for Mobile Phone Examples
In some cases, instead of displaying Side-By-Side in the standard manner, which simulates a desktop browser experience, the task will be displayed in a way which simulates a mobile phone browser, specifically an Apple iPhone device. You should always assume the user is using an Apple iPhone when rating these. For these queries user location will almost always be present. When judging these queries, consider the most likely query intents given that the user issued the query on a mobile device in the specified location.

Judging Considerations:

1. Some results which may be irrelevant for desktop could be relevant in a mobile phone context. For example mobile versions of websites (e.g. m.cnn.com) should be
considered unlikely to satisfy a user on a desktop computer, but may satisfy a user on a mobile phone if they contain relevant content.
a. Note that since many sites will automatically redirect to a mobile experience on a mobile device, it is not a defect to show a non-mobile URL on the mobile SERP,
as long as the landing page is mobile-friendly.
2. Some web pages are less mobile-friendly than others. They hamper readability and often require a lot of zooming and scrolling on mobile devices in order to consume their content. They are also prone to accidental clicks because the layout of links/buttons is not touch-friendly. When comparing two similar results that are likely to satisfy user intent based on general SBS guidelines, the result that is more mobile-friendly is preferable on mobile SBS tasks (a rough programmatic proxy for mobile-friendliness is sketched after this list). For example, in the screenshot below, the page on the left is designed to be viewed on a mobile device, whereas the page on the right is not: it does not fit on the screen and requires the user to scroll horizontally or zoom to view the content.

3. In some cases, even if the page design is mobile-friendly, it might show an error message like “content does not display well on mobile device” or otherwise report some
mobile compatibility issues. For example, the pages shown below work well on desktop browsers, but are incompatible with mobile devices and generally considered to be
poor search results.

4. Some intents will be more or less likely in a mobile phone context. For example, intents to download specific mobile phone apps are more likely on mobile than on desktop. Similarly, intent can shift with the mobile platform: if the user is on an iPhone, an intent to download an iPhone app is much more likely than an intent to download an Android app or Windows Phone app (unless specified in the query). This is just one example of a case where the likelihood of intents on desktop and mobile differ; there are many others. As always, use your best judgment. As mentioned above, you should assume the user is using an Apple iPhone.
a. Do not, however, make assumptions about the user’s data plan. SERPs with video results (e.g. from m.youtube.com) should appear on a good mobile SERP
regardless of the mobile context, as long as they are relevant.
5. Given the limited screen size available on a mobile phone, it is even more important to have the relevant content higher on the page. It is also important to scroll down to
evaluate all of the results on the first page beyond the viewable region.
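
As mentioned in point 2 above, one coarse programmatic proxy for mobile-friendliness is whether a page declares a responsive viewport. This heuristic is not part of the guidelines, and a page can declare a viewport and still render poorly, so the mobile simulation remains the primary evidence:

    import re

    def declares_viewport(html):
        """Rough mobile-friendliness signal: pages designed for mobile
        usually declare a responsive viewport in their <head>."""
        return re.search(r'<meta[^>]+name=["\']viewport["\']', html,
                         re.IGNORECASE) is not None

    responsive = ('<head><meta name="viewport" '
                  'content="width=device-width, initial-scale=1"></head>')
    desktop_only = "<head><title>Fixed-width desktop page</title></head>"
    print(declares_viewport(responsive))    # True
    print(declares_viewport(desktop_only))  # False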

The following screenshots illustrate how a mobile SBS example will appear. Note the SERPs are narrower and closely resemble that of a mobile phone. For mobile SBS tasks, clicking a search result will show an image of the web page instead of directly opening the link. The direct link will still be provided above the image for reference, but the simulated rendering will show you how the content would appear in a mobile browser.

Mobile SBS Task & rendered mobile image of first result

The image is rendered just as it would appear on a mobile device and is not interactive. The image provides a quick way to assess how the content would appear on a mobile
phone. The direct link to the LP is provided as well to let you explore the content further; this link typically opens a desktop rendering of the page. In a few instances the mobile
simulation may be rendered incorrectly or may be missing information. In cases where such issues make it impossible to accurately assess the SERPs, you should scan the QR code
provided at the bottom of the page using a smartphone to see the actual landing page on your mobile device. A QR code is a barcode that can be read using any smartphone with a
camera and a QR code reader app (there are typically several free QR code reader apps in the iTunes, Windows and Android app stores, for example Microsoft Tag). When the QR
code is scanned, it will open the landing page in the browser on your smartphone, so you can view how the content looks on a mobile device. You should not need to use the QR
code often, but it is recommended for examples where providing an accurate rating would otherwise be difficult.

The following are examples where it may be helpful to use the QR code to view the page display on a mobile device, or the direct URL to assess the relevance of the page content.
This is not an exhaustive list; if you suspect there may be an issue with the rendered simulation, it is best to check the QR code or direct link to assess the page.

 Error in rendering content as an image – Sometimes the tool might be unable to generate an image preview of the mobile page due to some unexpected or transient
error. In case a reload/refresh does not fix the problem, use the QR code to open the webpage on a mobile device or use the URL provided to view the desktop version of
the site. You must never base your rating on a rendering error; this is an issue with the tool and not with the LP itself.

 Content is not fully loaded – The landing page might still have been loading when the image was rendered. In such cases, the image may contain progress bars or icons representing the partially loaded state. In many cases you may still be able to assess the content by clicking the direct link above the image. However, if you are unable to assess how mobile-friendly the page is, scanning the QR code will allow you to view it on your device.

 Content is not current – The information on the rendered image may not reflect the current status of the landing page. This can be true especially for classified adverts and
other listing sites, where content changes frequently. The image may still be useful for assessing how mobile friendly the LP is, but you may need to use the direct link or the QR
code to view the current content.

 Pop-up obscures content – Some mobile pages show a pop-up that needs to be dismissed before the actual content can be seen. This might obscure some/all of the landing
page on the rendered image. Viewing such pages on your mobile device using the QR code may help you determine how mobile friendly the content is.

 Content behavior is not simulated – For example, on an iPhone, iTunes LPs would open within the iTunes application. This behavior cannot be replicated in the mobile simulation, and the rendered image may not appear to be mobile-friendly. Such LPs should not be penalized based on this simulation and should be opened using the direct URL provided to view the desktop page. Take care when assessing similar applications where the behavior on a mobile device cannot be simulated.

Note that these are not necessarily issues with the landing pages themselves; rather, they are idiosyncrasies of using rendered images as a means to judge mobile content. Consider the images an optimization that expedites judging time, but when you cannot assess the LP due to a rendering issue, use the provided URL and QR codes to help determine how relevant and satisfying the content is.

Mobile SBS Examples:

 {mustang 1972 for sale}


o R2 and L2 are equally likely to satisfy user intent on a non-mobile device, but R2 is mobile-friendly while L2 is not, as shown in the screenshots below. The page at L2, which is not mobile-friendly, appears lower down on the right-hand side, at R4.

Conclusion: Right Better.

 {youtube music – jimmy buffett}
o The most likely intent is to view Jimmy Buffett music videos on YouTube. Although both R1 and L1 are the official YouTube channel, L1 features an unnecessary sub-
navigational block beneath with irrelevant links for ‘Sports’ and ‘About’ (which is the ‘About’ page for YouTube in general).
o The irrelevant links in the sub-navigational block push other relevant results such as L2 and L3 further down the SERP. As the user is viewing on a mobile device, this would be a poor user experience, as the irrelevant links and captions take up a large part of the screen.

Conclusion: Right Better.

 {Bridal Handkerchief}
o The most likely intent is to find bridal handkerchiefs for sale. All results are the same on both SERPs except for the 6th and 7th, which are switched.
o Both L6 and R6 are relevant results, however we can see from the mobile simulation that R6 is not a mobile friendly result. It displays the desktop LP.
o Although the rendered image for L6 shows a pop-up, viewing the LP with the QR code or direct URL shows that this can be easily closed. The LP is otherwise relevant
and mobile friendly. As this is the only difference between the two SERPs, and it occurs in the mid-level/lower results, the left side would be just slightly more
satisfying overall.

Conclusion: Left Slightly Better

 {Aaron Carter}
o The first 9 results on either SERP are identical. L10 and R10 are for the same LP, but L10 points to the mobile URL for this site.
o However, viewing L10 and R10 in the mobile simulation shows that both present content in a mobile friendly way. Although R10 is not the mobile URL, it redirects to
the mobile version of the site, as shown in the mobile simulation. The two SERPs are therefore considered equally satisfying to a mobile user.

Conclusion: About the Same.

4.3 Navigational and Sub-Navigational Intent Queries
There are certain queries where the most likely intent of the user is strongly satisfied by navigating (going) to a specific website, and where likely intents are to navigate to one of
its subpages. The intents satisfied by these subpages are referred to as sub-navigational intents. Subpages should satisfy the following conditions:

1. A subpage is a webpage that belongs to the same website as the primary navigational URL.
2. A subpage satisfies a more specific navigational intent relevant to the primary navigational intent.

For example:

 For the query {nordstrom bags}, http://shop.nordstrom.com/c/handbags-shop is the primary navigational URL. http://shop.nordstrom.com/c/designer-handbags is a valid subpage because it satisfies a more specific intent of going to the "designer handbags" page on the same website as the primary navigational URL (nordstrom.com). (Designer handbags are a type of bag.)
 For the query {kohls}, http://www.kohls.com is the primary navigational URL. http://www.kohls.com/feature/myaccount.jsp is a valid subpage because it satisfies a more specific navigational intent of going to the "kohls account" page on the same website (kohls.com).
 For the query {kohls}, http://www.kohls.com is the primary navigational URL. http://kohlscareers.com is not a valid subpage (it's from a different domain, kohlscareers.com). Although it satisfies a specific intent relevant to the primary navigational intent of going to the "kohls careers" page, it does not belong to the same website as the primary navigational URL (kohls.com).
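
Condition 2 above is a judgment call, but condition 1 (same website) can be checked mechanically. A minimal sketch, using a two-label approximation of the registered domain (robust logic would consult a public-suffix list; the helper name is ours):

    from urllib.parse import urlparse

    def same_website(url_a, url_b):
        """Crude same-website test: compare registered domains, so
        shop.nordstrom.com matches nordstrom.com while kohlscareers.com
        does not match kohls.com."""
        def registered_domain(url):
            host = urlparse(url).netloc.lower()
            return ".".join(host.split(".")[-2:])   # two-label approximation
        return registered_domain(url_a) == registered_domain(url_b)

    primary = "http://www.kohls.com"
    print(same_website(primary, "http://www.kohls.com/feature/myaccount.jsp"))  # True
    print(same_website(primary, "http://kohlscareers.com"))                     # False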

For these navigational queries, sub-navigational intents are displayed in a block of URLs below the primary navigational URL. We refer to these as the 'Sub-Navigational Block' and the 'Primary Navigational URL' (displayed below).

[Screenshot: a SERP showing the Primary Navigational URL with the Sub-Navigational Block beneath it]

Structure of SERP for sub-navigational queries

For queries with navigational/sub-navigational intent, the SERP can be considered as three sections.

1. Navigational URL
a. This URL should strongly satisfy the primary navigational intent of the query.
2. Sub-Navigational Block
a. These sub-navigational URLs should strongly satisfy the most likely intents which are subpages of the navigational URL.
b. All pages should be subpages of the navigational URL.
3. Non-Navigational Block
a. The non-navigational block is all results not part of the sub-navigational block.
b. The non-navigational block landing pages should satisfy the non-navigational intents for the query.
c. The non-navigational block landing pages should not satisfy the primary navigational intent, or a sub-navigational intent (URLs which satisfy these intents should
be in the Navigational Block).

Judgment Considerations for Navigational/Sub-Navigational Queries

Use the following considerations when judging navigational/sub-navigational queries:

 Subpages of the navigational URL should be shown within the sub-navigational block. Subpages which are shown outside of this block should be judged as if they are
unlikely to satisfy the user.
 The sub-navigational block should only contain subpages of the primary navigational URL. A SERP without a sub-navigational block is preferred over a sub-navigational
block with URLs that are not relevant subpages of the primary navigational URL.
 The non-navigational results should satisfy the intents which are not sub-navigational intents. This adds diversity to the SERP by addressing the other intents that exist
for the query. However, non-navigational results which are poorly satisfying or address very unlikely intents are not desirable. Useful sub-navigational results in the
non-navigational block would still be preferable to poorly satisfying or very unlikely non-navigational results.
 Transactional or informational queries can still have a primary navigational URL associated with the most likely intent. For example, although the query {Taylor Swift}
has transactional and informational intents, www.taylorswift.com is considered the primary navigational URL as it is the official website that strongly satisfies the
most likely intent. It would still be appropriate to display a sub-navigational block for this kind of query as users would find prominent sub-pages from her official site
very useful.
 For queries where there is no primary navigational URL associated with the most likely intent, no sub-navigational block should be shown.

Example for judging the sub-navigational block:

The below examples further illustrate the above concepts and criteria, comparing examples of SERPs based on how much they would satisfy the user. In these examples the key
results are color coded; the best results are green, less satisfying results are orange, and results which are poorly satisfying or relate to very unlikely intents are red.

 {apple login}
o Both sides have the same navigational URL.
o Sub-navigational block: The left side satisfies the most likely intents of the query {apple login}. The right side has Store, Mac and Change Country links, which are not sub-navigational intents for the query {apple login} and are therefore not relevant. These should be considered poor results for the sub-navigational block.

Conclusion: Left Better.

Examples for judging the non-navigation block:

 {aol}
o L2 and L3 strongly satisfy likely non-navigational intents; they are other AOL services that are not subpages of www.aol.com.
o L4 and L5 satisfy likely non-navigational intents; they help find financial and general info about AOL.
o R2-R4 satisfy sub-navigational intents and should be part of the sub-navigational block; R4 is also a duplicate of the first result in the sub-navigational block.
o R5 is a foreign result and satisfies a very unlikely intent. This is a very poor result.

Conclusion: Left Better.

 {aol}
o L2 is a poor-quality page, and L4 weakly satisfies a very unlikely intent: poor relevance.
o L3 and L5 satisfy unlikely non-navigational intents; although AOL Platforms (L3) is an AOL service, it is an unlikely intent for the query {aol}, and L5 has general info about AOL but the landing page is low quality.
o R2-R5 satisfy sub-navigational intents and should be part of the sub-navigational block; however, they are still better than the low-quality results from other domains in the non-navigational block.

Conclusion: Right Better.

Example for judging when sub-navigational block is present only on one side:

 {building supplies}
o The sub-navigational block on the left shows valid sub-navigational URLs for lowes.com. However, this query is not associated with a primary navigational URL, so no sub-navigational block should be shown; the user has not asked to navigate to lowes.com. Additionally, some of the sub-navigational links are very narrow (e.g. Stone Veneer, Lawn Care, etc.).
o The right side omits the sub-nav block, instead providing results for a variety of building supplies for sale online and local resources.

Conclusion: Right Better.

 {macys}
o Right side has relevant subpages of macys.com inside a sub-navigational block.
o Right side also has many results addressing non-navigational intents (social pages, corporate page, and general info) in the non-navigational block.
o Left side does not show a sub-navigational block although {macys} is a navigational query. For this query it is preferable to show a sub-navigational block.

Conclusion: Right Better

4.4 Rating SERPs for Category vs. Specific Queries
Consider that many users search for items in categories rather than for a specific item. Queries fall along a spectrum from category to specific, based on how many items they describe: 20+ items, 10-20 items, 3-10 items, or 1-2 items. Queries describing 20+ items are the most category-like; queries describing 1-2 items are specific.
Specific Landing Page: The landing page has content about a single item.

Aggregate Landing Page: The landing page has content about many items.

1st Party: The landing page is from an official site from the owner of the category. For example, a page from canon.com would be 1st party for the query {canon digital cameras}.

2nd Party: The landing page is not 1st party, but it contains original content (e.g. reviews, information), or allows the user to perform actions directly on the website (e.g. to purchase the item). Examples include amazon.com, ebay.com, and review sites like cnet.com.

3rd Party: The website does not have its own content, and its primary purpose is to aggregate other 1st and 2nd party websites. The large majority of these websites cater to queries with a transactional intent, but also to those with an informational intent. Some specific examples of 3rd party websites are bizrate.com, nextag.com, thefind.com and ask.com.

Here are some of the aspects to consider about the individual landing pages:

 Users generally prefer 1st and 2nd party landing pages over 3rd party providers.
 The more specific a query is, the more likely it is that users will prefer specific LPs.
 The less specific a query is, the more likely it is that users will prefer aggregate LPs.
 Specific pages about famous/popular items within a category query (regardless of category size) tend to be highly valuable to users.

4.4.1 Category vs. Specific Queries
Here are more examples that illustrate the difference between category and specific queries:

Each entry lists the query, the approximate number of items it describes, and its type, followed by an explanation.

russian wars | 20+ | Category
This query is straightforward; the user is likely searching for information about wars in which Russia was involved. It does not describe a specific war, but a category of wars.

Russian wars with France | 3-10 | Category
This query is more specific, as it describes a more specific set of wars. The user is only interested in wars fought between Russia and France. Via research, you as a judge should determine that there was more than one such war, so this is a category query.

Lemon risotto recipes | 20+ | Category
The user is looking for recipes which tell you how to cook lemon risotto. Since there are many different recipes and ways of making lemon risotto, this is a category query.

honda hybrid cars | 3-10 | Category
Since Honda has more than one hybrid car, this is a category query. The user likely wants information/reviews about not just one but many different cars within the category.

40” HD LCD TVs best buy | 10-20 | Category
This query describes a category of TV; the user wants 40 inch high definition LCD TVs. The user has also specified that they would prefer their web results from a specific website (www.bestbuy.com). This is still a category query because the primary intent describes a product category.

Nike shoes | 20+ | Category
This query can be interpreted two ways: either the user wants to navigate to the official website for Nike shoes, or they are describing the product category of Nike shoes. Since one of the likely intents describes a category of products, this is a category query.

Nike free run 3 womens | 3-10 | Category
This query describes a very specific product category. Via research, you should discover that there are many different kinds and colors of Nike Free Run 3.0 shoes for women. Therefore this is a category query.

Nike | n/a | Neither
Nike has many different products. Since the user does not describe any category of product, nor a specific product type, this is neither a specific nor a category query. The most likely intent of this query is to navigate to www.nike.com.

US wars with Canada | 1-2 | Specific
There were very few wars between the United States and Canada, so it is likely that the user is searching for information about a specific war between the United States and Canada (e.g. the War of 1812).

Honda fit | 1-2 | Specific
The Honda Fit is a specific model of Honda car. Although there are three sub-types (Honda Fit, Honda Fit Sport, and Honda Fit Sport with Navigation), it is highly likely that the user considers this a specific item query.

Sony 55” W802A | 1-2 | Specific
This describes a specific model of Sony TV (W802A) and a specific size (55 inches).

Canon T4i | 1-2 | Specific
This is a specific model of Canon camera.

Digital Cameras | 20+ | Category
This is a very generic query, as there are a large number of digital cameras. This is a category query.

Canon cameras | 20+ | Category
There are a large number of cameras made by Canon. This is a category query.

4.4.2 Examples of Different Types of LPs


Note that 1st, 2nd and 3rd party landing pages generally cover commercial entities and queries with a transactional intent.

Query: {russian wars}

Aggregate Landing Page

http://en.wikipedia.org/wiki/Category:Wars_involving_Russia

This Wikipedia page provides a list of, and links to, the many different wars Russia has been involved in.

Specific Landing Page

http://www.historylearningsite.co.uk/russian_civil_war1.htm

This page provides an overview of a specific Russian war, the Russian Civil War.

Query: {Nikon digital cameras}

1st Party Specific Page

http://www.nikonusa.com/en/Nikon-Products/Product/Compact-Digital-
Cameras/26347/COOLPIX-S01.html

This result is the main results page for a specific camera model in the Nikon
Digital Cameras category, and it is on the official webpage for the category.

1st Party Aggregate Page

http://www.nikonusa.com/en/Nikon-Products/Cameras/index.page

This result is the main results page for the category on the official website.

2nd Party Specific Page

http://www.bhphotovideo.com/c/product/918817-REG/nikon_coolpix_aw110_digital_camera.html

This result is a results page for a specific camera model on a website where you can directly purchase the product. It also provides specifications, reviews, and other information about the camera.

2nd Party Specific Page

http://www.amazon.com/Nikon-COOLPIX-Waterproof-Digital-Camera/dp/B005IGVY6K

This result is a results page for a specific camera model on a website where you can directly purchase the product. It also provides specifications, reviews, and other information about the camera.

2nd Party Aggregate Page
http://www.amazon.com/s?ie=UTF8&page=1&rh=n%3A281052%2Cp_4%3ANikon

These webpages show many different camera models that are within the category of the query. All of these sites either show original content or allow you to directly purchase the product from that website.

2nd Party Aggregate Page


http://www.dpreview.com/products/nikon/cameras

These webpages show many different camera models that are within the category of the query. All of these sites either show original content or allow you to directly purchase the product from that website.

3rd Party Specific Page

http://www.bizrate.com/digital-cameras/4856089953.html

This website shows very little content other than pricing information. There does not appear to be much (or any) original content. You cannot directly purchase the camera from this website. Following many of the links on this page sends you to 2nd party sites which do sell the product and have original content.

3rd Party Aggregate Page

http://www.nextag.com/nikon/stores-html

This website only shows pricing information about many different camera models
within the category of the query. On close inspection, you will notice that you
cannot purchase this product on this website. The website appears to have
aggregated content from other 2nd party providers, and clicking links sends you to
those 2nd party providers.

4.4.3 Judging Scenarios


For very specific queries, the best SERP would have:
• A variety of specific (2nd party and 1st party) web pages which are relevant to the specific item.
• 3rd party pages towards the bottom of the SERP, since they are generally less useful.
• Aggregate pages towards the bottom of the SERP, since they are generally less useful. Note: aggregate pages which are very close in scope to the query could be useful, but users will generally still prefer the LPs with the specific item.

For category queries which describe a small number of elements:


• Users would generally be satisfied with both specific and aggregate pages.
• Aggregate pages should closely match the scope of the category.
• Specific pages should be for items within the category, and the most famous/popular items in the category.
• A variety of both 1st party and 2nd party pages would be preferred over exclusively 1st party or exclusively 2nd party pages.
• Again, 3rd party aggregate pages will generally be less useful.

For queries which describe a category with more than a small number of elements:
• Users would be most satisfied with aggregate pages.
• Aggregate pages should closely match the scope of the category.
• If there is a specific page, it should be for a famous or well-known item within the category.
• A variety of both 1st party and 2nd party pages would be preferred over exclusively 1st party or exclusively 2nd party pages.
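
The preferences above can be loosely sketched as a scoring function. This is an illustration only, not an official formula; the numeric scores are hypothetical and only their relative order matters:

def lp_preference(query_type: str, scope: str, party: str) -> int:
    """Rough preference for a landing page on the SERP (higher = rank higher)."""
    if query_type == "specific":
        # Specific 1st/2nd party pages first; aggregates and 3rd party last.
        if scope == "specific" and party in ("1st", "2nd"):
            return 3
        return 1
    if query_type == "small category":
        # Specific and aggregate pages both satisfy; 3rd party aggregates last.
        return 1 if party == "3rd" else 3
    # Large category: aggregates preferred; a specific page should be for a
    # famous item and is somewhat less useful.
    return 3 if scope == "aggregate" else 2

# Example: for a specific query, a 2nd party specific page outranks a
# 3rd party aggregate page.
assert lp_preference("specific", "specific", "2nd") > lp_preference("specific", "aggregate", "3rd")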
4.5 Judging with Relevance Ratings Provided
In some cases, relevance ratings will be shown next to the search results. These ratings came from other judges who evaluated the relevance of individual LPs for the query. When
giving these ratings, the judges observe the specific landing page, and the query it is being evaluated against, but do not observe the caption / title of the result, or the other
potential results it is being evaluated against.

These ratings can be used to assist (but not determine) your judging of which side is preferred.

The HitApp annotations map to the following ratings: Perfect, Excellent, Good, Fair, Bad, Rating Unavailable.

When the individual LP raters are giving ratings, they consider the chart below. After determining the intents that the landing page satisfies for the query, they determine how likely each intent is, and then how well the page satisfies it; given these two factors, the judge can determine the specific label to give.

The rows give how likely the intent is; the columns give how well the LP satisfies it:

How likely?             Strongly             Moderately   Weakly   Poorly   Obscene Content   Content Inaccessible
Most Likely (>50%)      Excellent [Perfect]  Good         Fair     Bad      Detrimental       No Judgment
Very Likely (26-50%)    Excellent            Good         Fair     Bad      Detrimental       No Judgment
Likely (11-25%)         Good                 Fair         Bad      Bad      Detrimental       No Judgment
Unlikely (1-10%)        Fair                 Bad          Bad      Bad      Detrimental       No Judgment
Very Unlikely (<1%)     Bad                  Bad          Bad      Bad      Detrimental       No Judgment
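
Read the chart as a lookup: the row is the likelihood of the intent, the column is how well the LP satisfies it. A minimal sketch of that lookup (the function and data layout are ours, not the judging tool's):

SATISFACTION = ["strongly", "moderately", "weakly", "poorly"]

RATING_MATRIX = {
    "most likely":   ["Excellent [Perfect]", "Good", "Fair", "Bad"],
    "very likely":   ["Excellent", "Good", "Fair", "Bad"],
    "likely":        ["Good", "Fair", "Bad", "Bad"],
    "unlikely":      ["Fair", "Bad", "Bad", "Bad"],
    "very unlikely": ["Bad", "Bad", "Bad", "Bad"],
}

def lp_rating(likelihood: str, satisfaction: str,
              obscene: bool = False, inaccessible: bool = False) -> str:
    """Return the label an individual LP rater would give."""
    if obscene:
        return "Detrimental"   # obscene content is always Detrimental
    if inaccessible:
        return "No Judgment"   # inaccessible content receives no judgment
    return RATING_MATRIX[likelihood][SATISFACTION.index(satisfaction)]

assert lp_rating("likely", "weakly") == "Bad"
assert lp_rating("most likely", "strongly") == "Excellent [Perfect]"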

Example for query {oxide’s polyatomic formula} which shows individual relevance ratings:

4.5.1 Judging Considerations
1. Not all ratings are correct; as with any complex judging task, errors can be made. You should not blindly trust the ratings provided.
2. Judgments on one page come from different judges, so differences in ratings could reflect the slightly different opinions of those judges.
3. Be especially careful about the following concepts:
a. Redundancy is not captured by the individual ratings; highly rated pages which are redundant given other results are still negative for a SERP.
b. Diversity. If results high on the page have already satisfied the most likely or very likely intents, then results which strongly satisfy less likely intents are preferred over results which weakly satisfy intents that are already satisfied.
4. Clicking on individual results is still highly recommended to ensure that you agree with specific ratings. In particular, if the rating is not what you would expect (i.e. higher or lower than you would expect by looking at the result), then you should click the result to make the final decision.
5. Many results may have ratings which are unavailable; it is even more important for you to click these results to ensure you fully evaluate their quality and make an accurate overall determination.
6. Many results may be labeled “Bad”, but remember that there are many different cases which justify a “Bad” rating, and some are worse than others.
7. Many results may have identical ratings (i.e. both labeled “Good”), but this does not mean the results are identically relevant, or that they cannot justify a preference for one side or another.
8. These labels do not align with the above policy on “Sub-Navigational” results; i.e. sub-navigational results which are displayed in the non-navigational block will likely be rated high. You must judge with respect to the above policy on Sub-Navigational Results, and not with respect to the individual ratings those results may have.

4.5.2 Judging Example
In the example below there are two incorrect ratings. L2 is rated “Bad” by the individual rater. This label is incorrect: the page is a recent Time magazine article about Chris Christie running for president, and it should be rated “Good”. R2 is also incorrectly rated: it is a section about Scott Walker written by a journalist named Chris Christie. If the intent of the user was to learn about the journalist Chris Christie, this page would poorly satisfy it, so the rating should be “Bad”. The overall rating for this example should be Left Slightly Better.

As you can see, following the individual ratings without consideration would have resulted in the wrong overall rating. L2’s rating seems suspiciously low, and based on its title R2 seems to have little to do with Chris Christie, so its rating seems suspiciously high. Both of these should be investigated further before giving your overall rating.

4.6 Content Quality
4.6.1 Overview
The degree to which a landing page satisfies a user can be thought of as having two primary components:

• Content Relevance
• Content Quality

Much of the rest of these guidelines concern themselves with evaluating SERPs with respect to Content Relevance. This section addresses Content Quality.

Given that a document is on topic and relevant, Content Quality seeks to evaluate the quality of the document. Specifically when thinking about content quality, we concern
ourselves with the following five dimensions of quality:

• Content Authority (CA): How much can the page and its parent website be trusted on the given topic? Are they authorities on the subject matter?
• Content Discoverability (CD): How easy is it to discover the primary content in the document?
• Content Utility (CU): Does the content discuss the subject matter fully and thoroughly?
• Page Presentation (PP): How hard or how pleasing is the page to view? Is it professionally designed?
• Content Generation Effort (CGE): How much effort/time/expertise/talent is required to generate the landing page?

Some of these dimensions have overlapping aspects.
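
For illustration only, a page's quality assessment can be pictured as a small record with one field per dimension (the field names follow the abbreviations above; nothing like this exists in the judging tool itself):

from dataclasses import dataclass

@dataclass
class ContentQuality:
    ca: int   # Content Authority: trust in the page/site on this topic
    cd: int   # Content Discoverability: ease of finding the primary content
    cu: int   # Content Utility: completeness and depth of the content
    pp: int   # Page Presentation: visual design and layout quality
    cge: int  # Content Generation Effort: effort/expertise behind the content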

The overarching goal of the Content Quality dimensions is to rate more highly the SERP whose landing pages go beyond satisfying the intent of the query by also being authoritative, thorough, well designed, and easy to read and navigate. It seeks to rate more poorly results that provide shallow, hastily assembled content whose primary purpose is serving aggressive advertising, misleading or confusing the user, etc.

It’s important to note that Content Quality is a consideration only when comparing pages that are reasonably on topic and addressing the intent of the query.

The following sections will discuss each of these dimensions in detail with examples and appropriate considerations when evaluating a page.

4.6.2 Content Authority (CA)


When determining whether a page is authoritative there are a few elements you must consider:

1. Site Authority – Certain websites will be more authoritative in certain areas than others. For example, http://www.webmd.com/ and http://www.mayoclinic.org/ are well-known sites that have good authority in medical fields. Here are some tips to evaluate the authority of a site:
a. Look out for sites that exist to generate content written by “freelancers” specifically to attract search engine traffic (often called “Content Farms”). Examples of these include http://www.ehow.com, http://www.buzzle.com, http://www.wisegeek.com and http://www.ezinearticles.com, amongst many others.
b. Check the reputation of a site by searching for it in a search engine, checking the Wikipedia article on the site, etc. For example, the Wikipedia article on Yahoo! Voices identifies it as a paid content site; other elements of that article suggest that it is a content farm with low quality articles.
c. Poke around the website and look at the “About Us” or “Contact Us” pages. Is there contact information? Who runs the site?

d. Is the site focused deeply on a topic, or a general all-purpose aggregator of content? For example, http://stackoverflow.com is focused on technical and programming topics. There’s a clear rating system, the answers are generally professionally written by programmers, and the community is well-moderated. Stackoverflow.com is authoritative for programming and technical topics. By contrast, http://answers.yahoo.com tries to be a general-purpose question and answer site. A user or page may be authoritative on a given topic, but because the site lacks focus, it’s unlikely for a page on that site to be authoritative.
e. Brand Recognition – A site with broad brand recognition in a topic (such as Macy’s for clothing, or John Deere for tractors) will generally have higher authority in its respective area.
i. Be careful: a lot of malicious sites know that users recognize and respect these brands, and try to model their URLs to confuse and mislead users. For example, the official website of Gucci (the fashion brand) is http://www.gucci.com. The following are squatters: http://www.guccibelts-guccibelt.com, http://www.guccibeltsreplicas.com/, etc. Another example: http://www.pof.com is a popular dating website (“Plenty of Fish”), but http://www.plentyfishof.com and http://www-pof.com (note the hyphen!) are low quality sites trying to mislead users (for the purposes of stealing their information or driving ad clicks).

Overall site authority is extremely important: good sites will have good pages.

2. Author(s) identity – Is the author identified by full name (and not alias or online pseudonym)? Is there an author biography on the page or easily accessible from the
page? Usually a line or two about the author and their professional background. Note that sometimes there may be a reviewer identified whose background may lend
authority. It may be wise to use a search engine to confirm the author’s background.

3. Author background – Does the background of the author give him or her credibility for the topic being discussed? For example, consider this WebMD article on Vaccines for Adults. The author is clearly identified, and so is the reviewer. Although the author is a freelance author (“freelance author” can often be a red flag), the reviewer is an MD (medical doctor), which lends the LP the necessary authority.

4. Other Considerations
a. Source Citations – A page may be highly authoritative by virtue of
having extensive and carefully curated citations. Wikipedia articles are usually composed by many people over a long period of time but the site enforces a strict
citation policy (example) and many pages cite upwards of 100 articles.

b. Site Community – In general user generated content is hard to consider authoritative. There are
however sites that will have extensive communities and excellent self-moderation. A few already noted
include stackoverflow.com and Wikipedia.org.
c. Content Originality – Sites and pages that copy large portions of content from other sites should not be
considered authoritative and in general should be severely penalized in content quality considerations.
d. Site trustworthiness – In the case of sites dealing with personal data, taking credit card information, financial sites, etc., look for a padlock in the URL bar of Internet Explorer. Sites identified using a trusted certificate (such as a VeriSign or Symantec SSL certificate) are more trustworthy and preferred. Also, look for the URL to start with https:// indicating an encrypted connection, and keep an eye out for alerts from your browser about expired certificates, etc.
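
For illustration, the certificate checks described in the last point could be automated roughly as follows (a minimal sketch using Python's standard ssl module; it is not part of the judging workflow, where a glance at the browser's padlock suffices):

import socket
import ssl
from datetime import datetime, timezone

def check_certificate(hostname: str, port: int = 443) -> dict:
    """Fetch a site's TLS certificate and report its issuer and expiry."""
    context = ssl.create_default_context()  # verifies the trust chain by default
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            cert = tls.getpeercert()
    # 'notAfter' looks like 'Jun  1 12:00:00 2025 GMT'
    expires = datetime.strptime(cert["notAfter"], "%b %d %H:%M:%S %Y %Z")
    expires = expires.replace(tzinfo=timezone.utc)
    return {
        "issuer": dict(field[0] for field in cert["issuer"]),
        "expires": expires,
        "expired": expires < datetime.now(timezone.utc),
    }

# An untrusted or expired certificate raises ssl.SSLError before we get here.
print(check_certificate("www.gucci.com"))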

Authority might be determined differently based on the page segment. Documents and queries in segments that are important to a user and cover high risk topics (such as
medical, financial, etc…) should be more carefully evaluated with respect to authority and a higher bar should be used. Pages on these topics should be written with professional
expertise in a professional style. On the other hand, gossip, humor, and entertainment sites can be considered with a lower authority bar.

4.6.3 Content Utility (CU)


When evaluating a page’s “Content Utility” consider these elements:

1. Content completeness – sometimes a page that appears relevant is found to be unsatisfying because it is missing key elements. Some examples:
a. A template page that does not have many fields filled out has insufficient content. For example, a yellow pages listing with a missing address, etc.
b. A Question & Answer page that has only a question will not satisfy users.
c. An error page, a “No Results Found” page, or other junky pages that lack useful information.
2. Content Depth & Breadth vs. Shallow/Superficial content – For a query like {how to make eggs} a landing page can be relevant if it says “Boil water, throw in egg”, but that is unlikely to satisfy users. A better page with high utility provides a full recipe or more. Even if a query is asking a straightforward question like {when was Barack Obama born}, a page like his biography on whitehouse.gov that answers the question directly with other immediately relevant information is preferred to a shallow page like this wiki.answers.com page. In general, single-factoid pages like that should not be highly regarded.
3. Main Content Freshness: Fresh enough to serve the purpose – A page which is actively maintained with regular updates will usually be more useful. The importance of freshness changes with the topic. For example, a recipe page that hasn’t been updated for five years may still be very useful. However, a technology forum post from five years ago is normally very outdated.
4. Main Content Functionality – Do elements of the page function correctly? For example, do links, buttons, images and videos work and load properly? Consider the impact this has on the overall purpose of the page and its ability to address the query intent.
5. Main Content Writing Quality – Please consider writing quality with respect to grammar, spelling and word choice. Poor writing with grammatical mistakes or poorly written/organized text should be regarded as being of lower quality.
6. Multimedia content – If the page has supporting images: do they help address the query and user intent? Are they of high quality?

4.6.4 Content Discoverability (CD)
Content discoverability answers the question of how quickly it’s possible to discover the page’s main content or achieve the user’s intent on this page.

1. Page Layout
a. We need to check how supporting content and advertisements are positioned relative to the main content.
i. Sites with good content discoverability will have the primary content up front and center, with easy page navigation for long pages. This Wikipedia article serves as a good example.
ii. Large headers that push down the primary content hinder discoverability.
b. Advertisements on the page cannot be excessive.
i. Ads that precede the content when viewing the page from left to right, or top to bottom, generally greatly hurt content discoverability. Similarly, having more than a few graphical ads (generally more than three in close proximity) negatively impacts a user’s experience by taking their attention away from the primary content.
ii. Ads that are not clearly set off from the main content can confuse users and hurt discoverability. Similarly, ads that are interleaved with the content can negatively impact the user experience.
iii. Hard-to-dismiss pop-up windows, automatically playing video ads, ads that react to mouse-overs, etc., negatively affect content discoverability.

2. Steps to reach main content should be minimized


a. If the landing page is not the obvious final destination (perhaps the user is searching for a download or an article found in a PDF linked from this page), getting to
the final goal should require few clicks and the steps should be obvious.
b. Some pages will present content as a slide show requiring the user to click multiple times to view all the information that could have otherwise been presented
on a single page. If there is relatively little content per slide the page should be penalized with regards to discoverability. However, if the slides are rich in content,
so that presenting them all at once would overwhelm the user or page, this may actually be a good thing.

3. Access Limitations - Limiting a user’s access to information is reasonable if the information is private, proprietary or otherwise sensitive. For example, it’s reasonable that
a social networking site limits access to user profiles to only logged-in users. On the other hand if a page requires sign-ups/registration/memberships for what is otherwise
public or non-proprietary information (e.g. white-pages information, or business listings, user generated answers to user questions) then pages with more permissive
access limitations should be preferred, if they are available.

4.6.5 Page Presentation (PP)
When evaluating the page presentation, consider how appealing the page looks:

1. Do the ads hurt the overall appearance of the page? (Also see the discussion in content discoverability.)
2. Use of colors, fonts, images, etc.
3. Is the page cluttered or properly laid out (in paragraphs, with proper use of whitespace)?
4. Are there display or rendering errors?
5. Is the content lengthy without pagination or within-page navigation?

4.6.6 Content Generation Effort (CGE)

A lot of low quality websites are built by stealing bits of information from other websites (such as Wikipedia or YouTube) or by purchasing databases of contact information,
business listings, etc…

1. How much effort, expertise, talent or research is needed to generate the main content of the page? Here, we are referring to the effort to create the page content, not
the engineering effort to build or design the web page.
2. Is the author providing a simple answer to a narrow topic or providing rich information to a broad topic?

For example, this high quality page from Zillow.com has a lot of proprietary information that other real estate listing websites do not. Some of it is contributed by users, and other parts are computed or designed by Zillow.com (such as the Zestimate). A site with lower content generation effort may simply pull real estate information from a purchased database.

4.6.7 Content Quality Examples (Page Level)
The following are examples of landing pages that illustrate and discuss the five dimensions of quality. Most (but not all) of these are low quality. The following page is low quality because it addresses a medical topic/intent but is written by someone who is not a medical professional or doctor. The site has no authority on medical topics.

The next page is low quality because Yahoo! Answers is not an authoritative site to address medical topics. It also has low utility because the question is unanswered.

In this example the ads are hindering access to the primary content and are disguising themselves as primary content. The utility of this page is relatively low as well, since no actual download link is provided, contrary to what the heading implies.

The presentation in the following example is acceptable and the utility is reasonable as well (it provides information, a phone number, etc.). The two ads on the right are reasonably arranged and don’t hinder access to the main content. The listed keywords in the About Us section are hurting presentation, discoverability and utility.

In the following example the ad above the content (“Is He Cheating On You?”) is hurting discoverability and presentation. The utility of this page is rather low as well.

In this case the ad is popping up and blocking the main content.

Ads, excessive links and broken page elements make it hard to discover the primary content. This is a very low quality page.

4.6.8 Content Quality SBS Examples

• {get started with stock investing}


o R1 is a content farm (ehow.com) whereas L1 is a well-known financial website from the Wall Street Journal. However, L1 is off-topic and not relevant to the query (it’s
about bonds). Although R1 is not an authoritative or robust result, it is more useful to the user than L1.
o This example is meant to illustrate that relevance and topicality must be established before CQ can be considered. Content Quality should only be considered when
pages in question are more-or-less equally satisfying the user intent.

Conclusion: Right Slightly Better.

• {how to send pictures from a ipod?}
o In this example L1 and R1 both address the primary query intent directly. Visiting L1 and clicking around the website, we see that lifehacker is a tech enthusiast site (Authority). The article’s content is front and center (Discoverability), the instructions are step by step and uninterrupted by ads (Utility), and the video greatly enhances the content.
o R1, on the other hand, is from a content farm. The site and author are not authoritative in the tech space. The actual instructions are buried under ads, supporting content and an unnecessary explanation of what an iPod is.
o L1 is clearly better in terms of CQ than R1. The rest of the results are identical between the two sides; they contain a lot of unauthoritative results and are missing results from Apple (the manufacturer of the iPod).

Conclusion: Left Slightly Better.

• {how to send pictures from a ipod?}
o In this variant, the right side doesn’t have the lifehacker.com result at all, making this a Left Better case.

Conclusion: Left Better.

• {Email and Website Hosting}
o At first glance L1, http://emailwebsitehosting.com, seems promising. Beware, however; it only costs a few dollars to purchase a highly descriptive domain name like that. Do not be fooled by the URL, the URL’s domain name or even the Title/Snippet.
o Inspect the content of the page (screenshot below). In this case there is very little useful content besides flashing advertisements, spammy links and a “This area is
currently undergoing development…” message. The links do not yield anything related to web-hosting. L1 doesn’t answer the intent of the query and is clearly low
quality and bordering on spam.

Conclusion: Right Better.

• {us open tennis} (Judged 07/10/2014)
o This is another example of authority and misleading domain names, titles and snippets. L2 is www.us-open-tennis.com – at first it may seem that the website’s domain name (us-open-tennis.com) adds an element of legitimacy. However, a domain name can be purchased for a few dollars, and alone it’s no proof of relevancy, quality or authority. Visiting the website, it’s clear that it’s a front set up to funnel users to tennis betting websites. It is not authoritative on the topic.
o Although R3 is a single news article, this would have been very fresh at the time of judging and would have been useful to many users.

Conclusion: Right Better.

• {how to reset iphone}
o Both sides are on topic and relevant. The left side is documentation from the manufacturer. The pages (L1, L2) are very clean, and the content is easy to discover and directly answers the question with supporting images.

Conclusion: Left Better.

• {python read html}
o The most likely intent of the query is to use the Python programming language to read an html document. Aside from L1 & R1 the SERPs are identical. Both L1 and R1
are on topic.
o Python.org is the official website for the Python programming language (CA). It is also a very clean, focused design (PP & CD). It goes into adequate detail to answer
the problem (CU). This official site is not featured at all on the right. R1, on the other hand, is a content farm and refers to an older version of Python.
o It’s important to note that R1 is arguably easier to read and understand for the average user. However the value of the result set must be judged with respect to the
user that is likely to have issued this query. This user is very likely technical and a python programmer. They are almost certainly capable of reading and understanding
a python.org document.
o It’s also worth noting that L2/R2, L3/R3, L4/R4 are Authoritative and have very high content utility as well. Though they are not official Python sources, they are pages
from communities dedicated to programming and provide excellent answers, links, etc…

Conclusion: Left Better.

• {what helps a kidney infection}
o This is a medical query. Medical, financial, legal and other segments that are highly sensitive need to be treated with special consideration with regards to Authority and other aspects of Quality.
o The emedicinehealth.com (L1) and webmd.com (L2) articles were written or reviewed by medical doctors. Buzzle.com (R2), however, doesn’t identify the author and is a well-known content farm.
o Note: L1/R1 doesn’t have treatment information on the landing page; it is, however, on a subsequent page of the article and is linked to from the navigation.

Conclusion: Left Better.

• {what helps a kidney infection}
o Similar to the above example, except R2 is also a low quality (authority, page presentation, etc…) result.
o As this is a medical query, authority is very important. Having two low quality sites in the top two positions on the right is much less satisfying than the authoritative
medical results on the left.

Conclusion: Left Much Better.

4.7 Queries with Recourse and Requery Links
When users visit a search engine and type a query, search engines might apply certain alterations to the query to bring better results to the user. Spell corrections or synonyms of
the query terms are a couple of good examples of these alterations.

Example:

Original user query: microsft shop

Search Engine Altered Version: microsoft shop OR microsoft store

In this example, the user misspelt “microsoft” and also used the term “shop”. A search engine may correct the misspelling and add a synonym of the word “shop” which is “store”.

These alterations are internal mechanics of the search engine, and a user does not always need to be bothered by these details. However, alterations by the search engine could lead to a material change to the query intent. In these situations, users need to be notified about the changes applied and given a chance to reject the alterations. For this purpose, Bing uses a component called the DRL (Dynamic Recourse Link). The image below shows what a DRL looks like. “Including results for” is for the altered query and “Do you want results only for” is for the original query:

4.7.1 Judgment Guideline


You should evaluate the utility of the DRLs with respect to the following aspects:

• Intent Shifting: The altered query in the DRL has a different intent to that originally entered by the user.
• Spelling Correction: The DRL corrects a misspelling by the user to provide better results.
• SERP Impact: The query alterations lead to a material impact on the SERP.

Essentially, if the text displayed in the DRL is a different intent or spelling to the user’s original query and the SERP is impacted materially (i.e. shows results for the altered query), it is better to show a DRL to the user. It informs the user that a change has been made to the query they originally entered.

If there is no impact on the search results from the DRL (i.e. the results are for the user’s original query), or if the DRL alteration is a very minor spelling or punctuation correction,
no DRL needs to be shown. Examples are given in the following table to show the difference.

User Query: 11408 sturgen bay lane
Altered Query: 11408 sturgeon bay lane
Intent Shift or Spelling Correction? Yes – this is an intent shift, as the altered query is a different street name.
SERP is about: 11408 sturgen bay lane (despite the altered query, the SERP content is about sturgen).
SERP is impacted? No – the SERP results are for the original query; the DRL has no impact.
Show DRL? NO DRL. No DRL is necessary because the SERP results are not impacted.

User Query: bidens trip to paris
Altered Query: biden trip to paris
Intent Shift or Spelling Correction? No – there is no meaningful change in the query; there is only a minor correction.
SERP is about: biden trip to paris (the SERP shows results for Biden).
SERP is impacted? Yes – the SERP includes results for the altered query.
Show DRL? NO DRL. This is considered a minor correction. Even though the SERP is impacted, there is no meaningful change to the query here.

User Query: falcon
Altered Query: falcons
Intent Shift or Spelling Correction? Yes – the query intent is shifted; ‘Falcons’ could refer to the Atlanta Falcons.
SERP is about: falcon.
SERP is impacted? No – the SERP results are for the original query; the DRL has no impact.
Show DRL? NO DRL. No DRL is necessary because the SERP results are not impacted.

User Query: pleather men’s jacket
Altered Query: leather mens jacket
Intent Shift or Spelling Correction? Yes – the altered query is a different intent; “pleather” is a type of artificial leather.
SERP is about: leather mens jacket (the SERP shows results for ‘leather’).
SERP is impacted? Yes – the SERP includes results for the altered query.
Show DRL? Show DRL. As the alteration has changed the SERP to show results for ‘leather’ instead, we should show a DRL. This gives the user a chance to change back to results for ‘pleather’.

User Query: normal body tempatures
Altered Query: normal body temperature
Intent Shift or Spelling Correction? Yes – ‘tempatures’ is significantly misspelt; this is corrected in the altered query.
SERP is about: normal body temperature.
SERP is impacted? Yes – the SERP includes results for the altered query.
Show DRL? Show DRL. The DRL adds value by informing the user their misspelling has been corrected.
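
The rule the table encodes is compact enough to state as a predicate. The sketch below is our own restatement, not Bing's implementation:

def show_drl(intent_shift_or_spelling_fix: bool,
             minor_correction: bool,
             serp_impacted: bool) -> bool:
    """Decide whether a Dynamic Recourse Link should be shown."""
    if not serp_impacted:
        return False   # results are for the original query; a DRL adds nothing
    if minor_correction:
        return False   # trivial spelling/punctuation fixes need no DRL
    return intent_shift_or_spelling_fix

# Rows from the table above:
assert not show_drl(True, False, False)   # {11408 sturgen bay lane}
assert not show_drl(False, True, True)    # {bidens trip to paris}
assert show_drl(True, False, True)        # {pleather men's jacket}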

Examples:

• {pleather men’s jacket}

o Both SERPs interpret the query the same way but only the right side shows a DRL (R0).
o Web results are dominated by results for the term leather instead of the user-typed term pleather.
o Query intent is shifted and the SERP is impacted, so showing the DRL is appropriate.

Conclusion: Right Slightly Better.

• {francheska wood}
o Web results are dominated by URLs with the term francesca. The SERP is impacted as it shows results for the altered query.
o Query intent is shifted and the SERP reflects that. A DRL is therefore appropriate.

Conclusion: Right Slightly Better.

• {bidens trip to paris}
o Both SERPs include results for the term biden.
o Query intent is not shifted but the SERP is impacted (as it shows results for the altered query). Showing a DRL was not necessary, however, as the altered query only corrects a minor misspelling and the DRL adds no value.

Conclusion: Left Slightly Better.

• {coral gable, fl}
o Both SERPs include results for the altered query term gables.
o Query intent is not shifted but the SERP is impacted (as it shows results for the altered query). Showing a DRL was not necessary, however, as the altered query only corrects a minor misspelling.

Conclusion: Right Slightly Better.

• {normal body tempatures}
o The query has a significant misspelling. The DRL on the right shows the correct spelling ‘temperature’ and informs the user of the change.
o The SERP shows results for the altered query. As the query spelling is corrected in the DRL, it is better to show it on the SERP.

Conclusion: Right Slightly Better.

• {11408 sturgen bay lane}
o Despite the DRL on the left, the SERP does not include any results for the term sturgeon. The DRL has no impact on the SERP.
o Query intent is shifted but the SERP is not impacted – meaning showing the DRL was not necessary, and the better SERP is the one without it.

Conclusion: Right Slightly Better.

• {dennison saint marys jr high school}
o Despite the DRL on the left, the SERP does not include results for the adjusted query denison. The DRL has no impact.
o Query intent is shifted but the SERP is not impacted – meaning showing the DRL was not necessary, and the better SERP is the one without it.

Conclusion: Right Slightly Better.

• {sharron archuleta}
o The SERP includes results with the terms Sharon and Shannon.
o Although Sharon impacts more results, Shannon impacts the top result.
o Both of the alterations change the intent and impact the SERP.

Conclusion: About the Same.

4.8 Freshness
Fresh results are those that are recent, newly created and/or most recently updated. You should take the freshness of results into account when the recency of the content would
impact the utility of the SERP to the user. The following examples illustrate judging considerations for freshness.

Queries: {ebola outbreak}, {George Clooney wedding}
Judging Considerations: For queries related to recent news/events, finding recent and up-to-date content is the most likely intent. Showing the latest results relevant to the most likely intent would be preferable for these queries.

Queries: {iphone 6}
Judging Considerations: For some queries the most likely intent shifts over time. For instance, rumors about the iPhone 6 release would have been the most likely intent for {iphone 6} prior to its release. After the phone is released, likely intents include finding information/reviews and/or purchasing the phone.

Queries: {us open}
Judging Considerations: For queries like {us open} that recur periodically, results about the current or nearest instance of the event satisfy the most likely intent. In this particular case, users could be looking for either US Open Tennis or US Open Golf depending on the time of the year. It is important to always judge with respect to the current time for these queries.

Queries: {Britney spears}, {Beyonce}, {Obama}
Judging Considerations: For celebrity queries, finding fresh content satisfies one of the likely intents but is not necessarily the most likely intent.

Queries: {Halloween costumes}, {coach outlet}
Judging Considerations: Finding recent content on the latest costume trends and coupons could satisfy one of the likely intents for these queries.

Queries: {install windows}
Judging Considerations: For this query freshness is important because the likely intent is to install a recent version of Windows. A SERP prioritizing information about Windows Vista would be less preferable to one showing information about a more recent version.

Queries: {yo}
Judging Considerations: The most likely intent is to find the yo app: http://www.justyo.co/.

Queries: {how to get rid of termites}
Judging Considerations: Freshness is not an important consideration for this query since termite extermination techniques are not likely to change over time.

How to identify recency of the results:

This will vary from website to website but you can often find a publication date/time on the page or hints in the body of the content.

Sometimes you can identify the publication date from the URL. For example:
http://www.nytimes.com/2014/10/15/world/africa/ebola-epidemic-who-west-africa.html?_r=0
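
A minimal sketch of pulling such a date out of a URL (the pattern assumes the common /YYYY/MM/DD/ convention; many sites use other formats):

import re
from datetime import date
from typing import Optional

def date_from_url(url: str) -> Optional[date]:
    """Extract a /YYYY/MM/DD/ publication date from a URL, if present."""
    m = re.search(r"/(\d{4})/(\d{1,2})/(\d{1,2})/", url)
    return date(int(m.group(1)), int(m.group(2)), int(m.group(3))) if m else None

url = "http://www.nytimes.com/2014/10/15/world/africa/ebola-epidemic-who-west-africa.html?_r=0"
print(date_from_url(url))  # 2014-10-15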

Freshness examples:

• {isis} (Judged 01/21/2015)


o At the time of judging, the most likely intent would have been for fresh news content relating to Isis. Although R1 is an article from CNN providing an overview of the
group, we can see from the URL that it is dated. The general article in L1 provides a robust overview, is well referenced, and is much more up to date.
o Although L3-L5 are individual news articles, they were very fresh at the time of judging and would have been useful to users looking for fresh news on this topic.
o In contrast, although R3 and R5 provide diversity, they are unlikely intents for this query and should be placed much further down the SERP. On the right, only R4
provides recent news relating to the most likely intent.

Conclusion: Left Better.

• {bill cosby} (Judged 11/25/2014)
o At the time of judging, side searches showed major news relating to allegations of assault concerning Bill Cosby. The most likely user intent at the time would have
been to find the latest news relating to the allegations.
o While R1-R6 would usually be credible and diverse results, at the time of judging, they provided limited information relating to the major news about Bill Cosby. Users
with the most likely intent for fresh news would not have been satisfied with these results (they relate to a less likely intent to find out about Cosby and his career in
general).
o In contrast, L3-L6 all provided fresh and robust news content relating to the most likely intent. This is what most users would have been looking for at the time.

Conclusion: Left Better.

• {Beynce baby born} (Judged 01/08/2012)
o This example was judged shortly after the birth of Beyonce’s baby. L1/R1 and L2/R2 are the same.
o L3 and R3 provide useful information; R3 includes the birthday of the baby in the snippet.
o R4 is also a relevant article while L4 and L5 are articles written prior to the birth of the baby. These would be less satisfying for the user.
o The rating is Right Slightly Better. Both SERPs offer useful results in the top positions, but the mid-results on the left are more dated than those on the right.

Conclusion: Right Slightly Better.

5 Mixed Examples
5.1 Left/Right Much Better
You will find that some SERPs are clearly much better than others. The other SERP may be blank or contain results that are irrelevant and would satisfy almost no users.

• {lowesl}
o The query is misspelt and side searches show that the most likely intent is for the popular retail chain Lowe’s. The results featured on the left all relate to very unlikely
intents that would be useful to no users.
o In contrast, the results on the right are all related to Lowe’s, including the official site in the top position, with relevant links in the sub-navigational block. The results
in the non-navigational block provide diversity with results for other intents, such as the company’s stock price in R4.

Conclusion: Right Much Better.

• {library}
o L1 and L2 are useful for users in Lexington KY, while L3 is useful for users in New York NY.
o R1, R2, and R3 are about the King County Library System, and are therefore useful to users in Redmond, WA.
o Since we assume the user is in Redmond, WA, the results on the left are not relevant, while the results on the right are very relevant.

Conclusion: Right Much Better.

• {www.standardbird.ca}
o The query is not a working URL, but a side search indicates that the user is most likely interested in navigating to standardbredcanada.ca (the website for the official
registry and record keeping body for the Canadian Standardbred industry).
o The left side offers no relevant results; the user would waste a lot of time browsing irrelevant content and be completely unsatisfied with the SERP.
o The right side provides the HP for the most likely intent in position R1, which would satisfy most users with this intent.
o The right side also has relevant results in top positions R2-R4 and lower mid-level positions R6 & R7.

Conclusion: Right Much Better.

5.2 Left/Right Better
In these cases, there will be something that stands out enough to make one side better than another, or there are one or several results on one side that aren’t useful and would
make users dissatisfied with their search experience as a whole on that side.

• {search for children of jjohnfrancismetcalf who lived in nelson co. ky}


o L1/L2 list people in Kentucky named John Francis Metcalf (without the trailing “e”).
o L3 provides links to genealogical resources for Kentucky, but does not address the query.
o L4 is a forum with many emails asking for information on people named Metcalf, but does not address the query.
o R1 is about Ignatius Metcalfe and his son, John Urban Metcalfe, which doesn’t match the query.
o R2 does not load and so would be a dissatisfying experience.
o R3 is the Bing search engine, which isn’t what the user is looking for.
o R4 has some relevant information.

Conclusion: Left Better.

• {www.valleynationalbank.com}
o The most likely intent is to navigate to the specified URL, which is the homepage for Valley National Bank. R1 is an exact match for this URL, providing access to the
range of different products and services offered. In contrast the left SERP does not feature the homepage at all.
o While the subpages provided on the left are useful, users would expect to find the exact match for the queried URL at the top of the SERP, as it is on the right SERP.

Conclusion: Right Better.

• {poltergeist}
o The movie “Poltergeist” is a likely intent, but it is an old if popular film, so it is more likely users will be looking for information on paranormal phenomena.
o The right side captures both of these intents, while offering more results that correspond to the more likely intent.
o The left side focuses more heavily on the movie in the top 1-6 positions, and beyond the Wikipedia article in L2, users must scroll down to L7 before finding more pages related to poltergeists.

Conclusion: Right Better.

• {Hotmail sign up free}
o L1 is a UK-oriented page explaining how to use Hotmail (it includes a small section on how to “Get Hotmail now”); it’s useful, but not the best result to serve up first.
o L2 is focused on signing in to Hotmail, rather than signing up.
o L3 is the Windows Live ID sign-up page customized for Hotmail, which would best satisfy the intent of the query.
o R1 is the Windows Live ID sign-up page that expects the user to already have an email address, although it has a small hint saying “Or get a Windows Live email address”.
o R2 asks the user whether they have an email address; if not, it redirects them to L3.
o R3/L3 are the same.

Conclusion: Right Better.

• {open kitchens}
o Category query (large number of elements). The most likely intent is to find information about open-style kitchens, including images, decorating tips, etc. Less likely intents would be for restaurants/businesses named 'Open Kitchens' or similar.
o The left SERP gives higher ranking to relevant results from popular home decorating sources like Houzz and HGTV, which are aggregate LPs.
o The right SERP includes more narrow, local results (such as R9, a Facebook page for a Falls Church, VA, restaurant called Open Kitchen).

Conclusion: Left Better.

5.3 Left/Right Slightly Better
SERPs that are slightly better will offer something that makes the user experience minimally better than what is offered on the other SERP. Many of the results on both sides will
be the same or similar, but there is still something distinctive about one side that makes the experience more favorable to users.

• {cherry park long beach}


o The most likely intent is to browse or find information about Cherry Park in Long Beach, CA. Since Cherry Park Skate Park (another name for Bixby Park) located in
Long Beach may be of interest to some users, representing the diversity of intents for this query is important.
o L1/R1-L5/R5 are the same.
o L6/L7 feature a result about the skate park.

Conclusion: Left Slightly Better, since the skate park is featured slightly higher on the left.

• {“morrispullmio”}
o The user is specifically looking for Morris Pullmio (they have even enclosed their query in quotes). The left side shows no results for this query, while the right side
has interpreted the query to be {morrispullman}, and has results for Morris Pullman, Dr. Morris of the Pullman company etc.
o In this case, the results are not useful on the right side, so the left side is actually slightly better from the user’s perspective since they won’t waste time wading
through irrelevant results.

Conclusion: Left Slightly Better.

• {paulcampbell fighter}
o Intents for the query include: finding information on the fighter named Paul Campbell or finding information about the actor Paul Campbell, who appeared in the
movie “The Fighter”.
o L3 attributes the film ‘The Fighter’ to the wrong Paul Campbell (there are two actors with this name).
o The right side has results in the top positions featuring photos of the correct actor in the film.
o Otherwise the results on both sides are similar in diversity and relevance.

Conclusion: Right Slightly Better.

• {blueberry muffins}
o The user is looking for blueberry muffin recipes, and both sides offer relevant results.
o However, the right side offers a better diversity of pages from well-known and authoritative cooking websites in the top 1-6 positions, including AllRecipes, FoodNetwork and Epicurious.
o L3 offers a recipe from a content farm (About.com), and L6 offers recipes from a site of unknown quality.

Conclusion: Right Slightly Better.

• {margaret george books}
o Category query (small number of elements). The most likely intent is to find books by Margaret George, an American historical novelist who has published seven books in total.
o L1/R1 is a 1st party LP on the official site of the author, providing detailed information on her books.
o L3/R3 is another very good result, as it features George’s books on a reputable 2nd party website.
o All the results are the same on both sides, except for those in the 5th and 6th positions, which are reversed. L5/R6 provides a greater variety of relevant books, while R5/L6 includes books by other authors. However, as these results are further down the SERP, this is only sufficient for Left Slightly Better.

Conclusion: Left Slightly Better.

5.4 About the Same
Many of the results will be the same or similar, and neither SERP will offer a noticeably better experience to users.

• {funny jokes}
o All results on both sides are the same and in the same ranking order, except that L8/R9 and L9/R8 are switched.
o Neither of the results is better than the other in terms of popularity or authority, and their ranking does not offer anything outstanding above what is already offered
in the first through the seventh positions.

Conclusion: About the Same.

• {skylerkalene twitter}
o Side searches reveal that there is a person named Skyler Kalene, and the user is most likely looking for their Twitter feed.
o In this case, many of the results are different on both sides, but none of them are relevant to the query, making both sides equally unsatisfying search experiences.

Conclusion: About the Same.

• {lyrics}
o All results on both sides are the same and in the same ranking order, except for position L10/R10.
o Neither lyric site in the 10th position is better than the other in terms of popularity or authority; neither result stands out as being more useful or relevant to the user
than the other.

Conclusion: About the Same.

• {ge staybright flickering window candles}
o Specific query. The most likely intent is to shop for GE StayBright flickering window candles.
o L1/R1 and L3/R3 are very useful results, as they feature the product in question on reputable 2nd party sites, Amazon and TrueValue.
o Results in positions 5 and 6 are reversed, but the product has either been sold, or the auction has ended.
o L7/R7 are the same and useless.
o Results in positions 8 and 9 are reversed, but the item is either out of stock, or presented in a list with many irrelevant products.

Conclusion: About the Same.

5.5 Much Better vs. Better
If the results on one SERP are completely irrelevant or very unsatisfying, while the other contains LPs that could strongly satisfy the query intent, the Left/Right Much Better rating
may be applicable. If the difference in satisfaction between the two SERPs is less pronounced, then Left/Right Better may be more appropriate. When deciding between the
Better and Much Better ratings, keep in mind that you should be comparing the two SERPs to one another, rather than to an imagined "ideal" SERP.

• {.sft}
o Based on query construction, the most likely user intent is to find information about the .sft file type/extension.
o The right side contains relevant results in all positions, while the left side shows only results related to much less likely alternate intents, so users would be much more satisfied by the right SERP.

Conclusion: Right Much Better.

• {FIVE DOLLAR SHAKE!?! WHAT'S IN A FIVE DOLLAR SHAKE? I'VE GOTTA TRY A FIVE DOLLAR SHAKE.}
o The most likely intent is to find content related to this paraphrased quote from the film Pulp Fiction.
o The left side contains no results relevant to this intent, while the 1st and 6th results on the right SERP offer the full quote, which would be useful to users.
o This is not Much Better like the previous example, since the right side has only a few relevant LPs and users would still have to scroll through many irrelevant results.

Conclusion: Right Better.

• {duplicating an entire page in pages}
o The most likely intent is to find out how to duplicate an entire page in Apple's Pages software.
o The right side contains no results relevant to this intent, focusing on other programs such as Microsoft Word, Publisher, etc.
o The top results on the left include several LPs with content answering the query question. Though the lower results on the SERP are irrelevant, users could conclude their search with the useful results in the top positions, so the left side is much more satisfying.

Conclusion: Left Much Better.

• {texas fastpitch softball pitcher berryhill}
o The most likely intent is to find information about a fastpitch softball player in Texas with the surname Berryhill. A side search indicates that users are probably seeking Simone M. Berryhill from Westwood High School in Palestine, TX.
o All results on both SERPs are irrelevant aside from R1, which provides a robust profile of this athlete. The right SERP is not Much Better than the left since it also contains 9 irrelevant results, but it's also not just Slightly Better, since the relevant result is quite useful and located in the key top position.

Conclusion: Right Better.

5.6 Better vs. Slightly Better
In cases where the differences between the SERPs are minor, and one SERP is only marginally more satisfying, choosing the rating of Left/Right Slightly Better is more appropriate.
Remember that in most cases, results listed further down are less impactful than those prominently listed at the top of the SERP. Also, a SERP is not necessarily better just because
it has more results on one side; the order and relevance of the results are often more important to user satisfaction.

• {dominos menu}
o The most likely intent is to find a menu for the Domino’s pizza chain, easily accessible from the top result on either side.
o L2 (http://www.dominos-menu.com) is a poor-quality page using spam techniques, while R2 is the menu subpage on Domino's official site. The right SERP would be preferred, but only slightly, since the poor-quality result is just one position lower on the right (R3 vs. L2).
preferred, but only slightly, since the poor-quality result is just one position lower on the right (R3 vs. L2).
o There is minimal difference in the ranked order of pages between the mid and lower results on either SERP.

Conclusion: Right Slightly Better.

• {target}
o The most likely intent is to find content related to the well-known retailer Target. Finding information about the term "target" represents a much less likely, though
still plausible, intent.
o Unlike the previous example, the SERPs are more varied. The right side offers more results in the top positions related to the retailer, including the store locator and
access to the page with weekly ads, while the L side places the dictionary definition in L2 and offers up the Mobile RX page in L4/R10, which is less likely to be
satisfying in a desktop context (of course, if this were a mobile SBS, mobile pages are preferred).
o R6/L8 offers another page from the trusted online site Amazon, offering targets for sale.

Conclusion: Right Better.

• {prevail extra xxl}
o The most likely intent for this query is to shop for Prevail Extra Protective Underwear in size XXL.
o The right SERP has the same results as the left side, except the results are shifted down by one level, with a Nextag page in R1.
o Since the results are very similar, and L1 is not a particularly well-known or high-quality retailer, the left SERP is only marginally better in not showing the Nextag
page.

Conclusion: Left Slightly Better.

• {county connection 5}
o A side search indicates that the most likely intent is to find information about Route 5 on the County Connection in Walnut Creek, CA.
o The top results on both SERPs are the same, but R2 (http://www.cccta.org/schedule/5) is the official schedule/map for the specified route. This crucial result is not included at all on the left, and the left SERP also includes a completely irrelevant result at L4.
o All four of the top-ranked results on the right are useful, so the right SERP would be considered better.

Conclusion: Right Better.

6 Appendix: setting up your computer
6.1 System Requirements
Note: Before making changes to your machine you may want to have a current backup.

To work on the HRS project successfully, you will need:

• A computer less than three years old


• A screen resolution of at least 1024x768 pixels; we recommend higher resolutions to improve your judging experience.
• An adequate amount of RAM for high-volume internet use (we recommend at least 2GB).
• A working sound card and speakers.
• A high-speed broadband internet connection (DSL or Cable, not dial-up). You will often find that using a network cable to connect to your home router is faster and more reliable than using a wireless connection; you can test this by using sites such as http://speedtest.net.
• Microsoft Windows operating system (Vista or 7); we recommend not using XP if possible.
• Microsoft Windows Internet Explorer version 9 or higher.
• Microsoft .NET Framework 3.5 SP1 (.NET Framework 4 is the latest version).
• Current, up-to-date commercial anti-virus software; see Section 6.5 for details.

6.2 Browser plugins and external programs


To view and judge most web content, you will need to have appropriate software installed on your system, primarily:

• Adobe Acrobat Reader
• Adobe Flash Player
• Adobe Shockwave Player
• Apple QuickTime
• Microsoft Silverlight
• Sun Java
• Microsoft Word, PowerPoint, and Excel 2007, or the free viewer programs.
We do not recommend using alternative “clone” programs, as what you see may differ from what users of the official programs will see.

6.3 Refusing and uninstalling third-party programs
Many download programs try to quietly install toolbars or other programs to track your browsing habits; these can interfere with the smooth operation of your computer, and can
critically taint the judging data that you provide to Bing.
When installing downloaded programs, ensure you disallow installation of such unwanted programs, for example the Ask.com toolbar or McAfee Security Scan Plus; you can often do this by choosing the “advanced” or “custom” installation mode and de-selecting the appropriate checkbox before starting the installation.
If any such programs are installed, you can uninstall them from the Windows Control Panel.
You must uninstall all browser toolbars other than the Bing HRS Toolbar before you do any judging. This includes the Bing/MSN bar as well as toolbars from other companies such as Google, Yahoo, AOL, and Ask.com.
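
If you are unsure what is installed, the sketch below lists registered programs whose names contain “toolbar” (an illustrative Python 3 sketch; it reads the standard Uninstall registry key, though some programs register elsewhere, for example under WOW6432Node on 64-bit systems):

import winreg

UNINSTALL_KEY = r"SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall"

with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, UNINSTALL_KEY) as root:
    num_subkeys = winreg.QueryInfoKey(root)[0]
    for i in range(num_subkeys):
        name = winreg.EnumKey(root, i)
        try:
            with winreg.OpenKey(root, name) as subkey:
                display, _ = winreg.QueryValueEx(subkey, "DisplayName")
        except OSError:
            continue  # entry has no DisplayName; skip it
        if "toolbar" in display.lower():
            print("Possible toolbar:", display)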
6.4 Creating an Unprivileged account
We strongly urge you to follow the steps below to create an “unprivileged account” on your machine to use when doing HRS work. This is an account that does not have administrative rights to install any software on the machine, and so carries a much lower risk of infecting the machine. If you do encounter any problems, you can start the troubleshooting process by simply removing the infected account rather than having to wipe your entire machine. (A command-line sketch of the same steps appears at the end of this section.)

For additional information, see these articles for Windows Vista or Windows 7.

6.4.1 Windows 7
1. Click on the start menu (the Windows icon on the bottom left corner of your screen)
2. Select “Control Panel” in the right hand shaded section
3. Select “User Accounts and Family Safety”.
4. Click User Accounts, and then click “Manage another account”.
5. If you're prompted for an administrator password or confirmation, type the password or provide confirmation.
6. Click “Create a new account.”
7. Click Continue if prompted by “Windows needs your permission to continue”.
8. Select the radio button for “Standard user”, and click “Finish”.
9. The next screen then shows the list of currently available users. Click on the user account you just created to create a password. Enter and confirm the new password, and click the “Create Password” button.
10. Log off machine as administrator
11. Log on machine using your new “unprivileged” account and password

6.4.2 Vista Ultimate Edition
1. Click on the start menu (the Windows icon on the bottom left corner of your screen)
2. Right click on Computer, and choose “Manage”.
3. Click on Local Users and Groups, click on the folder Users
4. Right click and select New User.
5. Choose a username, enter and confirm the new password, and click the “Create Password” button.
6. Log off machine as administrator: click on the start menu, then click the bottom right hand arrow, and select “log off”.
7. Log on machine using your new “unprivileged” account and password. Click the link to “Other users” and then enter your computer name\username (from step 5 above).

6.4.3 Vista Home Premium
1. Click on the start menu (the Windows icon on the bottom left corner of your screen)
2. Select “Control Panel” in the right hand shaded section
3. Select “User Accounts and Family Safety”.
4. Select “Add or remove user accounts” on the next screen.
5. Click Continue if prompted by “Windows needs your permission to continue”.
6. On the next screen, choose “Create a new account” (near the bottom of the page).
7. Create a user name (pick whatever you want and can remember) and select the radio button for “Standard user”.
8. Click “Create Account”.
9. The next screen then shows the list of currently available users. Click on the user account you just created to create a password. Enter and confirm the new password, and click the “Create Password” button.
10. Log off machine as administrator
11. Log on machine using your new “unprivileged” account and password

6.4.4 XP
1. Right click on My Computer, click on Manage
2. Click on Local Users and Groups, click on the folder Users
3. Right click and select New User.
4. Choose a username, enter and confirm the new password, and click the “Create Password” button.
5. Log off machine as administrator
6. Log on machine using your new “unprivileged” account and password
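
If you are comfortable with the command line, the same standard account can be created on any of these Windows versions from an administrator prompt; the Python sketch below simply drives the built-in “net user” command (the account name HrsJudge is a hypothetical placeholder, and “net user /add” creates a member of the Users group, i.e. an account without administrative rights):

import getpass
import subprocess

ACCOUNT = "HrsJudge"  # hypothetical placeholder name; pick your own

# Prompt for the password so it is not left in your shell history.
password = getpass.getpass("Password for the new account: ")

# Creates a standard (unprivileged) user; must be run as administrator.
subprocess.check_call(["net", "user", ACCOUNT, password, "/add"])
print("Created standard user %s; log off and log back on as that user." % ACCOUNT)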

6.5 Malware and Virus Protection
You should ensure your malware and virus protection are up-to-date and pay close attention to the actions you perform while online.

The following steps should be taken only when logged in as Administrator:
 Keep your computer up to date with the latest updates for Windows, Office, all your programs, your hardware, and any other devices you may use. Use these links to regularly check for needed updates:
o http://update.microsoft.com/
o http://office.microsoft.com/en-us/downloads/default.aspx?ofcresset=1
 Install a good anti-virus program, set it to run continuously in the background, and have it automatically check for updates every 24 hours. Commercial anti-virus providers have more incentive to provide a quick update for a new exploit.
o A good free option is Microsoft Security Essentials
 Avoid multiple levels of duplicated tools; these can slow down your network throughput and may conflict with each other. Disable all additional security software other
than anti-virus, anti-spyware, and firewall protection; phishing filters, scam filters, and parental controls can slow down the machine and take away valuable bandwidth as
they scan each and every website you visit. If your internet security suite includes a firewall, you may need to disable Windows Firewall or Windows Defender; go to the
Control Panel > Windows Security Center > Firewall to check the settings on your machine.

We recommend that you follow these steps using your Unprivileged Account.
 Set Internet Explorer security level to Medium-High. Go to Tools > Internet Options > Security tab > Internet zone – move the slider bar to Medium-High. It is important to set this first, before the steps below.
 Prompt for downloads from browser. Go to Tools > Internet Options > Security tab > Custom level > scroll down to Downloads – Automatic prompting for file downloads.
 Disable installation of desktop items. Go to Tools > Internet Options > Security tab > Custom level > scroll down to Miscellaneous – Installation of desktop items.
 Prompt before allowing ActiveX controls in the browser. Go to Tools > Internet Options > Security tab > Custom level > scroll down to ActiveX Controls and Plug-ins – Automatic prompting for ActiveX controls.
 Never allow the browser’s home page to be changed.
 Activate a pop-up blocker. You will need to allow pop-ups from the HRS tool homepage as specified in Section 3.
 Wherever possible, close pop-up windows by using the X in the upper right corner rather than a "close" or "no" button.
 The toolbar will clear your Internet Explorer cache, cookies, temporary internet files and history each time it starts. If you want to preserve these for your personal
browsing, use a different account on your computer or use a different browser.
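
To confirm the Internet zone is really set to Medium-High, you can read the setting back from the registry (a sketch based on IE’s documented zone values, where zone 3 is the Internet zone and 0x11500 corresponds to Medium-High; treat the exact numbers as an assumption to verify on your own machine):

import winreg

ZONE_KEY = r"Software\Microsoft\Windows\CurrentVersion\Internet Settings\Zones\3"
LEVELS = {0x10000: "Low", 0x10500: "Medium-Low", 0x11000: "Medium",
          0x11500: "Medium-High", 0x12000: "High"}

with winreg.OpenKey(winreg.HKEY_CURRENT_USER, ZONE_KEY) as key:
    level, _ = winreg.QueryValueEx(key, "CurrentLevel")

print("Internet zone security level:", LEVELS.get(level, hex(level)))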
6.6 Setting the IE language preference
Web sites use the browser’s Language Preference settings to determine which languages to show to the user (if they have versions in multiple languages). You need to make sure
that your IE browser is set to the correct language(s) for your locale; please ask your manager to provide the appropriate language list.

 Go to Tools, Internet Options, click on the Languages button and then choose only the specified language(s) for your locale.

Figure 1: Language Preference
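
You can also verify the resulting language list without reopening the dialog; IE stores it in the registry as the AcceptLanguage value (a sketch assuming the usual per-user key, which IE uses to build the Accept-Language header it sends to websites):

import winreg

INTL_KEY = r"Software\Microsoft\Internet Explorer\International"

with winreg.OpenKey(winreg.HKEY_CURRENT_USER, INTL_KEY) as key:
    langs, _ = winreg.QueryValueEx(key, "AcceptLanguage")

# Prints something like "en-US" or "en-US,fr-FR;q=0.5" depending on your list.
print("IE language preference:", langs)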

6.7 Browser shortcuts
Action                               Keyboard shortcut
Find on page                         Ctrl + F
Navigate to bottom of landing page   Spacebar or down arrow key
Move cursor to URL address line      Alt + D
Internet Explorer refresh            F5 or Ctrl + R
Close tab                            Ctrl + W
New tab                              Ctrl + T
Scroll down/up one page              PgDn / PgUp
Open link in new tab                 Ctrl + Click; or middle-click