
QA Guidelines for Multilingual Prompts

1. Introduction
You are tasked with performing quality assurance for multilingual prompts targeting an LLM-powered AI chatbot. The quality
assurance will be performed across a few dimensions: Naturalness, Clarity, Metadata, and Language/Locale. You will be asked to
make a quality judgement for every row of data and create a report from that data.

The data you’re reviewing has either been machine translated and post-edited or written from scratch. We won’t tell you which, since
the data quality expectations are the same, but this could give you context to understand some of the errors you see.

You will need to reference a few other documents to perform this task. Those will be included with your materials.

For the task, you will perform these steps:

1. Review the text presented to you and analyze it to determine if there are any errors related to the categories of
Naturalness, Clarity, Metadata, and Language/Locale.
2. In the platform provided to you, select the issues related to the above categories.
3. If you don’t find any errors, you can select “No errors”.

2. Quality Assurance Task


Below you can see the categories you will need to assess:

In the platform, you will need to select a specific option if you find issues related to the above categories. If there are issues with any of the dimensions below, you should select "No" for the corresponding question:

● Naturalness (Natural phrasing): Does the prompt have natural phrasing?
● Naturalness (Grammar and orthography): Does the prompt have natural grammar and orthography?
● Clarity (Intentful and complete prompt): Is the prompt intentful and complete?
● Metadata (Task category): Is the task category correct?
● Metadata (Prompt length): Is the prompt the required length?
● Language/Locale (Correct language): Is the language correct?
● PII type (PII Type): Does the prompt include at least one example of each PII type listed?

IMPORTANT: For every issue you select, you should provide a short description of the issue IN ENGLISH. You can refer to parts of the localized text, but DO NOT WRITE YOUR COMMENTS ENTIRELY IN YOUR OWN LANGUAGE!

A more detailed description of the categories and issues can be found below:

2.1. Naturalness: Natural Phrasing

The entire prompt should sound natural. You should select “No” for “Does the prompt have natural phrasing?“ if parts of the prompt
sound awkward or unnatural. See an example from English of a prompt that should be marked as “No” in this dimension:

Prompt:
Complete this review: At first the song is comfortingly max, but in an unexpected move for Mitski, it softens at the ends; the music
stills and dulls like slow internet.

Notes: "comfortingly max", "stills and dulls like slow internet" are not natural sounding English

2.2. Naturalness: Grammar and Orthography

The prompt should have appropriate grammar and orthography. You should select "No" for "Does the prompt have natural grammar and orthography?" if the prompt contains grammar or spelling errors.

See a few examples from English of something that should be marked as “No” in this dimension:
Prompt:
How many liters in gallon?

Notes:
This isn't a grammatically correct sentence. It should say "How many liters are in a gallon?" Even though the phrasing is understandable and even a common way to phrase something in online text, this dimension values absolute grammatical correctness.

Prompt:
What’s the third letter in the cyrrilic alphabet

Notes:
wrong spelling of Cyrillic

Note: We don’t care about capitalization or punctuation for this dimension

2.3. Clarity: Intentful and Complete Prompt

The prompt should have a clear intent and shouldn't obviously be missing major content. If it does not have clear intent or it is missing major content, you should select "No" for "Is the prompt intentful and complete?"

Flag this error if the prompt doesn’t make it clear what it’s asking the chatbot to do, or if it references something that should be
included for context but is not.

See a few English examples of errors that should be marked as “No” in this dimension:

Prompt: Tall people usually wear big shoes.

Notes: No intent there.

Prompt: Based on the context, who was the first elf to arrive to Middle Earth?

Notes: References a context, but there is no context included.

Be careful of things that have implied intents. The below should be considered correct:

Prompt:
User: Tell me the capital of California
Bot: Sacramento
User: Florida?
Bot:

Notes: This is intentful – it’s clear what the user expects from a bot completion

2.4. Metadata: Task Category

See section 2 (task_category) of the document Chatbots: Terminology and Concepts Addendum

The prompt should clearly fit the task category that is labeled in the metadata. If it doesn’t, you should select “No” for “Is the task
category correct?”.
IMPORTANT: Some prompts could fit multiple categories. We don’t care if there are missing categories, only that the labeled
category is accurate. Conversation is a clear example of this - most prompts that fit the conversation category could also fit another
category. However, they do not need the secondary category to be labeled in order for the prompt to be considered correct in this
regard.

Task Categories Examples

Below you will be able to find examples of prompts and responses that fit the different categories. Please study them when
assessing issues in this category.

Task type: Generate
Definition: A prompt request asking for the creation of something.
Prompt: Write me a story about apple and banana
Response: In a lush and sun-drenched orchard, there resided three inseparable friends: Apple, Banana, and Cherry. Their friendship was a testament to the beauty of diversity and the bonds that can form among the most unlikely companions. Apple, a plump and rosy-red fruit, was always brimming with energy and enthusiasm. He had an infectious cheerfulness that brightened the orchard on even the gloomiest of days. His round shape gave him a certain cuddliness that made everyone adore him. Apple had a knack for storytelling and could turn the most ordinary moments into enchanting tales.

Task type: Brainstorming
Definition: A brainstorm prompt created to generate ideas on a specific topic. It encourages discussion, creative thinking and the possibility for different perspectives, possibilities and solutions.
Prompt: It is my brother's birthday next week. Give me some ideas for the perfect birthday gift. Let's brainstorm.
Response: Of course, let's brainstorm some ideas for the perfect birthday gift for your brother. To generate ideas, it's helpful to consider his interests, hobbies, and personality. Here are some diverse gift ideas: Tech Gadgets, Personalized Gifts, DIY Kit. How about Apple Macbook?
Prompt: Help me brainstorm ways to prepare for the bar exam
Response:
1. Develop a study schedule: Create a schedule that allows you to study consistently and efficiently. Break up your study sessions into manageable chunks and include time for breaks, review, and practice exams.
2. Understand the format of the exam: Familiarize yourself with the format of the exam, including the types of questions that will be asked, the time allowed for each section, and the materials that you are allowed to bring into the exam room.
3. Understand the law: Understand the law and the legal principles that will be tested on the exam. This includes understanding the substantive law as well as the rules of procedure and evidence.
4. Practice with past exams: Take practice exams and review them to identify areas where you need to improve. Practice essays and multiple choice questions, and review the answers to understand how the examiners are evaluating your responses.
5. Get support: Surround yourself with people who can support you in your studies. This could include a study group, a tutor, or a mentor. Additionally, seek out resources such as bar review courses or study aids.

Task type: Chain of thought
Definition: A prompt that would generate a response that shows a logical progression of thought(s). NOTE: These prompts need to ask the chatbot to provide some kind of reasoning. Sometimes people misconstrue this as a request for the bot to list something step-by-step. These are not always the same thing. Explaining how something can be done is not the same as explaining why the chatbot gave the answer that it gave. Chain-of-thought prompts should be trying to get at how or why the model arrives at the answer it chooses.
Prompt: Q. John has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many does he have now? A. John started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 6 + 5 = 11. The answer is 11. Q. Mary has 3 TaylorMade golf balls. She buys 10 boxes of Nike golf balls. Each box has 5 golf balls. How many does she have now?
Response: To find out how many golf balls Mary has now, we can calculate it step by step: Mary starts with 3 TaylorMade golf balls. She buys 10 boxes of Nike golf balls, and each box has 5 golf balls. So, 10 boxes * 5 golf balls per box = 50 Nike golf balls. Now, let's add the TaylorMade golf balls she started with to the Nike golf balls she bought: 3 TaylorMade golf balls (initial) + 50 Nike golf balls (bought) = 53 golf balls. Mary now has 53 golf balls.

Task type: QA
Definition: A direct question requesting information on the given topic. Open-ended QA: The question cannot be answered with information in the prompt. There may be a definitive answer (such as a given fact), or the question may require the LLM freedom to answer with a detailed and unrestricted response.
Prompt: What is the top-selling product at Amazon.com?
Response: The top-selling product on Amazon.com is Kindle.

Task type: QA (closed-ended)
Definition: Closed-ended QA: The question must be answered with information in the prompt alone.
Prompt: What are the different kinds of softball pitches described in the article below? The breakdown of a softball pitch is pretty simple. Let's start with the first pitch you learn when you start taking lessons. This would be the equivalent of a baseball "fastball". This first pitch is considered a "drop" since it dies right when it gets to the plate. There are two ways to throw a change up in softball. It is important to change as little as possible about your motion so that they do not know what kind of pitch you're getting ready to throw. So, for one style of change up, you change your grip and your grip alone. For one type of change up you push the ball all the way into the palm of your hand which hinders the snap of the wrist, this is what we want. The other way to execute a change up is to assume the grip for a "drop" and bend your middle and ring fingers like Spider-man. A change up will be your slowest pitch and it will have a bit more of an arc than your drop. This is a good one to throw with a 0 and 2 count when you just need them to miss. There is a final pitch that you will only find in softball and it is my absolute favorite to throw. The "rise" is unique to softball because of where the pitch is released. It is impossible to throw a rise in baseball but optimal for softball. The grip is quite unique. This pitch should be your fastest pitch and it should start on the low end of the strike zone and it will end on the top end of the strike zone. This is a good pitch to throw when you're "climbing the ladder" as they say and want them to chase the pitch.
Response: There are three types of pitches: drop, change up, and rise.

Task type: Classification
Definition: A prompt requesting the AI model to classify or categorize specific content based on the given topic. You must include the text to be classified in the prompt itself.
Prompt: Is this review positive? I recently purchased the XYZ Smartphone, and I've been thoroughly impressed with its performance and features.
Response: Yes, the review is positive.

Task type: Rewrite
Definition: A prompt request for specific text (sentence, paragraph, etc.) to be rewritten or paraphrased. The text for rewording must be provided in the prompt.
Prompt: Rewrite the following passage in STReeeeTCH style:
Original: "Mourning doves are light gray and brown and generally muted in color. Males and females are similar in appearance. The species is generally monogamous, with two squabs (young) per brood. Both parents incubate and care for the young. Mourning doves eat almost exclusively seeds, but the young are fed crop milk by their parents."
STReeeeTCH:
Response: "MoouuRNiiiiNG DooooVeeeeS aaRee LiiiiGHT GRaaaaY aaND BRooooWN aaND GeeeeNeeeeRaaaaLLY MuuuuTeeeeD iiN CooooLooooR. MaaaaLeeeeS aaND FeeeeMaaaaLeeeeS aaRee SiiiiMiiiiLaaaaR iiN aaPPeeaaRaaaaNCee. THee SPeeeeCiieeS iiS GeeeeNeeeeRaaaaLLY MooooNooooGaaaaMoouuS, WiiiiTH TWoo SQuuaaBS (YoouuNG) PeeeeR BRooooD. BooooTH PaaaaReeeeNTS iiNCuuuuBaaaaTee aaND CaaaaRee FooooR THee YoouuNG. MoouuRNiiiiNG DooooVeeeeS eeaaT aaLMooooST eeXCLuuuuSiiiiVeeeeLY SeeeeDS, BuuuuT THee YoouuNG aaRee FeeeeD CRooooP MiiiiLK BY THeeiiR PaaaaReeeeNTS."

Task type: Extract
Definition: A prompt request for key points or highlights to be extracted from a given text. The text needs to be provided in the prompt and must be at least a paragraph long (not 1-2 sentences).
Prompt: Extract key features of Kindle: (...Long description of Kindle...)
Response: Here are some key features of the Kindle e-reader:
E-Ink Display: Kindle devices feature E-Ink displays that mimic the appearance of real paper, providing a glare-free reading experience even in bright sunlight.
Vast E-Book Library: Access to a vast selection of e-books, including bestsellers, classics, and self-published works, directly from the Kindle store.
Lightweight and Portable: Kindle devices are lightweight and designed for portability, making them convenient for reading on the go.

Task type: Summarize
Definition: A prompt request for the summary of a specific text. The text needs to be provided in the prompt and have a significant length in order to be summed up.
Prompt: Summarize this text into a single sentence: I recently purchased the XYZ Smartphone, and I couldn't be happier with my choice. The phone's sleek design immediately caught my eye, and it feels incredibly premium in my hand. The display is stunning, with vibrant colors and sharp clarity, making video streaming and gaming a delight. Performance-wise, the XYZ Smartphone truly shines. Apps open quickly, and multitasking is a breeze thanks to its powerful processor. The camera system is impressive, capturing detailed and vibrant photos even in low-light conditions. The battery life is also commendable, easily lasting me through a full day of heavy usage. In summary, the XYZ Smartphone offers a winning combination of style, performance, and camera quality. It has quickly become an indispensable part of my daily life, and I wholeheartedly recommend it to anyone in need of a reliable and feature-rich smartphone.
Response: The XYZ Smartphone impresses with its sleek design, stunning display, powerful performance, and impressive camera system, offering a winning combination of style and functionality for daily use.

Task type: Translate
Definition: A prompt requesting a translation of a given word, sentence or text. The text for translation and the target language need to be provided in the prompt.
Prompt: Translate "I love you" to French
Response: "I love you" translates to "Je t'aime" in French.

Task type: Code
Definition: A prompt requesting a code related task or challenge for the given topic.
Prompt: Write me a Python function to sum up two numbers
Response: Here's a simple Python function that takes two numbers as input and returns their sum: def add_numbers...
(A completed version of this truncated snippet is sketched right after this table.)

Task type: Conversation
Definition: A prompt requesting to generate a conversation between two or more parties given a specific topic/subject matter.
Prompt: Agent: How can I help you? Human: I want to check my account balance.
Response: Agent: Let me check
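
The Code example's response above is cut off at "def add_numbers...". Purely as an illustration of what such a completion might look like (the function name comes from the example; the body is our assumption, not project material):

    def add_numbers(a, b):
        """Return the sum of the two input numbers."""
        return a + b

    print(add_numbers(2, 3))  # 5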

IMPORTANT: If the Task Category is unavailable or the category only says "Any", please DO NOT select "Wrong task category". It cannot be validated as an error. If it is "Any (XXXX)", where XXXX is the name of a category, review normally.

Do not mark as "No" in this dimension:
● Task category is "Any".
● Task category is "Any/".
● Task category is unavailable.

Review normally:
● Task Category is "Any (XXXX)", e.g. "Any (Brainstorming)", "Any (Chain of thought)".

Special Attention Categories


A few task categories are commonly problematic. Please pay special attention to these:

Chain-of-thought

These prompts need to ask the chatbot to provide some kind of reasoning. Sometimes people misconstrue this as a request for the
bot to list something step-by-step. These are not always the same thing. Explaining how something can be done is not the same as
explaining why the chatbot gave the answer that it gave. Chain-of-Thought prompts should be trying to get at how or why the model
arrives at the answer it chooses

Prompt: What is the best way to drive from Austin to San Jose? Give me step-by-step instructions → Not
matching the Chain-of-thought category.

Prompt: What is the best way to drive from Austin to San Jose? Why is this the best route? Explain how you chose the route →
This does belong in the Chain-of-thought category.

REMEMBER: Chain-of-Thought prompts should be trying to get at how or why the model arrives at the answer it
chooses.

Classification prompt vs. QA (question-answer) prompt

A classification task must give some information about the classes that the bot can choose from.

• QA prompt: What is this text about?


• Classification prompt: What is the topic of this text (Note: Could be rephrased as “classify the topic of this text”)
Note: open vs. closed-ended tasks

Tasks could break down into open vs. closed-ended. This data review task may or may not make that distinction. If the metadata
label is “classification”, assume that it could be either open or closed-ended classification.

2.5. Metadata: Prompt length

Prompts will be grouped into length categories depending on their word count. If the prompt has the incorrect number of words, as indicated by the prompt length metadata label, you should select "No" under "Is the prompt the required length?".
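
Since length is judged purely on word count, a quick mechanical check can help (a minimal Python sketch; the bucket names and boundaries are placeholders, since the real categories are defined in the project metadata):

    def length_bucket(prompt):
        """Classify a prompt by whitespace-separated word count.

        The boundaries below are illustrative placeholders only.
        """
        words = len(prompt.split())
        if words <= 10:
            return "short"
        if words <= 30:
            return "medium"
        return "long"

    print(length_bucket("How many liters are in a gallon?"))  # short (7 words)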

2.6. Language/Locale: Correct Language

The prompt should be in the correct language. If it’s not, you should select “No” under “Is the language correct?”.

We allow for some code-switching and inclusion of words from other languages. If the prompt uses a small number (<5%) of words
from another language, and they are used in a way that makes sense, consider it correct and don’t flag it.
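
As a rough illustration of the 5% threshold (a minimal Python sketch; in practice, deciding which words count as foreign is human judgement, so the word list here is a placeholder):

    # Placeholder set; in real review, spotting foreign words is a judgement call.
    FOREIGN_WORDS = {"schadenfreude"}

    def foreign_word_ratio(prompt):
        """Fraction of tokens that come from another language."""
        tokens = prompt.lower().split()
        if not tokens:
            return 0.0
        return sum(t.strip(".,!?") in FOREIGN_WORDS for t in tokens) / len(tokens)

    prompt = ("My aunt could not hide a certain schadenfreude when the neighbours' "
              "prize roses, which they bragged about all summer, finally lost at "
              "the county fair")
    print(foreign_word_ratio(prompt) < 0.05)  # True: 1 of 25 words (4%) is borrowed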

2.7. PII Type: PII Type

Prompts should always contain example(s) of the PII type(s) listed in the request. The request may list one or multiple PII
types. If one or multiple PII types requested are missing, select “No” under “Does the prompt include at least one example of
each PII type listed?”.

For more information, please check the “Annotation_Multilingual Guardrails Fake PII_Agnostic” guidelines.

3. Appendix:
PII guiding principles

Below you will find the aspects you need to consider when deciding whether the PII is valid. Please keep this open during your work as a quick reference.

NOTE: In the below sections we will be using the tagging format {text|PII_TYPE} for the examples (e.g., {John Smith|NAME}), just for clarity and simplicity. You won't see these labels in the project.
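
As a quick illustration of how these example tags could be pulled out mechanically (a minimal Python sketch for illustration only; the helper and pattern are our own, not project tooling):

    import re

    # Matches the {span|TYPE} convention used in the examples below.
    TAG_PATTERN = re.compile(r"\{([^{}|]+)\|([A-Z0-9_]+)\}")

    def extract_pii_tags(text):
        """Return (span, pii_type) pairs found in an annotated example."""
        return TAG_PATTERN.findall(text)

    print(extract_pii_tags("Hi, my screenname is {je4nh3b3rt|USERNAME}"))
    # [('je4nh3b3rt', 'USERNAME')]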

PII means Personal Identifiable Information. There are different categories of PII, some of which may be more general, or domain
and locale-specific. For this task, one of the things you will need to do is evaluate whether the PII found in the prompt are valid
examples of the pieces of PII requested.

For instance, take the below example:

• PII Type: USERNAME

o WRONG: Hi, the name in my ID is {Jean Hébert|USERNAME}

o RIGHT: Hi, my screenname is {je4nh3b3rt|USERNAME}

The WRONG example is wrong because the category of the PII is not accurate: Jean Hébert is a name in this context, not his username.

You must read the prompt and the context and validate whether the PII texts are correctly specified. See “Fake PII categories” for an
explanation of PII category labels for this task.

You can also check the language-specific taxonomy files that will be provided to you.

IMPORTANT: Please note that the PII categories can be locale-specific as well.

Fake PII categories


PII is any information that can lead to the identification of a specific individual, either directly or indirectly. Each of the categories below is PII, even without context that explicitly connects it with an individual. PII categories may be global (ADDRESS, NAME, PASSWORD) or specific to a locale (passport numbers (e.g., PASSPORT_NUMBER_KO, PASSPORT_NUMBER_DE), driver's licenses (e.g., DRIVER_ID_IT, DRIVER_ID_MX)).
GLOBAL (All locales)

These apply universally across all regions and languages and are common forms of PII found in various domains.

Examples:

• NAME → “My name is John Smith.”

• PHONE → “My number is +14155552671.”

• EMAIL → “You can reach me at [email protected].”

• ADDRESS → “I live at 123 Main Street, New York.”

• DATE → “I was born on June 15, 1990.”

• USERNAME → “My social handle is @coolgamer92.”

• PASSWORD → “My password is P@ssw0rd123.”

• IP_ADDRESS → “My IP is 192.168.1.1.”

Note the format of the GLOBAL PII may still vary from country to country. For instance, the format of PHONE and
BANK_ACCOUNT_NUMBER will differ from one country/locale to the next.

As such, please follow the formats provided to you in the PII types files (taxonomy files).

LOCALE-SPECIFIC

Locale-specific PII vary based on country-specific formats, regulations, or identifiers and require adherence to regional data
structures. Locale-specific PII can be seen in the PII types files (taxonomy files) shared with you.

For instance, there are specific labels for license plates and National IDs for each country; below you can see the examples for
Portugal and Belgium.

Portuguese PII (pt-PT)

1. LICENSE_PLATE_PT (vehicle registration number in Portugal)

o Example: "My car's license plate is 25 85 DR."

2. NATIONAL_ID_PT (Portuguese citizen card number)

o Example: "My citizen card number is 25455898 8 VR3."

Belgian PII (nl-BE)

1. LICENSE_PLATE_BE (Belgian vehicle registration number)

o Example: "My license plate is 1-MOP-693."


2. NATIONAL_ID_BE (Belgian identity card number)

o Example: "My Belgian ID card number is 686-5872314-12."

IMPORTANT: Please note that the text can be referring to content in other locales. In that case, you should consider the context:
consider whether locale-specific PIIs are valid and present.

For instance, let’s take a text for the ko-KR locale, but there’s a reference to a PII that belongs to ja-JP:

• Locale: ko-KR
• PII Types: PASSPORT_NUMBER_KO, ADDRESS, LICENSE_PLATE_KO
• Prompt: "I am currently in {Japan|ADDRESS}, but I need to renew my Korean passport. If the number is
{M48528984|PASSPORT_NUMBER_KO}, when should I go? Will I have issues if I enter the embassy with a car with a
Japanese plate, in this case 11-24?"

As you can see, the prompt above is for Korea, but it has 2 PII labels: one specific to Korea (PASSPORT_NUMBER_KO) and one
global (ADDRESS). Note that there is a PII entity specific for Japan, “11-24”, a Japanese license plate (which would be
LICENSE_PLATE_JP for the ja-JP locale). This means the “LICENSE_PLATE_KO” is missing, and PII Type should be flagged as
“No”. See Important Notes on Locale-Specific Annotation for more information.

For locale-specific PII entity information, please consult the list of PII entities for each language.

Important Notes on Locale-Specific Annotation


As mentioned before, there are PII labels that are shared across languages, but there are also specific PII labels per each
language.

Note that the annotation labels are separated into "locale blocks", in which you will find the locales relevant to the language you are working on.

Take the below examples, for instance.

1. Portuguese block: pt-PT and pt-BR
2. German block: de-DE, de-CH and de-AT
3. Spanish block: es-ES, es-MX, es-CH and es-US
4. Dutch block: nl-NL and nl-BE
5. Chinese block: zh-CN and zh-SG
6. Etc.

In your locale, you should see PII that are either GLOBAL or the ones that fit into the block of the locale you're working on.

If you're working on de-DE, you would see both Global PII, as well as PIIs for de-CH and de-AT. You shouldn’t see, for
instance, Italian PII.

For instance, this means that if you are working on de-DE text, you will have de-DE, de-CH, de-AT and GLOBAL PII, in
case they match the PII format and context.

NOTE: YOU WILL BE INFORMED OF THE APPLICABLE LOCALES FOR YOUR WORK!
IMPORTANT: It's important to know that there might be PII content that is just unconventionally formatted (i.e., does not match exactly the format in the taxonomy tables), but if the context makes it clear the PII is of a specific type for a specific locale in your language block, it could be considered as present.

To summarize: Genuine PII attempts that are just unconventionally formatted should be considered as present.

Consider the largest relevant span


A “span” is the part of the prompt that has the PII. However, smaller possible spans that may be considered PII may fall under a
larger text that is not PII.

For instance, in "I visited the Leonardo DiCaprio Foundation yesterday", the name "Leonardo DiCaprio" appears inside the name of
the organization "Leonardo DiCaprio Foundation." How smaller spans (like the name “Leonardo DiCaprio”) are treated depends on
the properties of the larger string that contains them. For the purposes of this task, three cases need to be distinguished:

1) The larger string is itself the span of an entity of the taxonomy (i.e., list of PII categories),
2) The larger string is a named entity that is not part of the taxonomy,
3) The larger string is neither the span of an entity in the taxonomy nor a named entity.

NOTE: “Named entity” simply refers to an entity that has a name: “University of Colorado” is a named entity. It contains something
that could be considered PII (Colorado), but because it is contained inside a named entity, it should not be considered PII.

1) When the larger string is an entity span in the taxonomy, we consider the larger span as PII:

1. Name (underlined) inside address (bold face):

● I live at {100 Calvin Coolidge|ADDRESS}.

2. Email (underlined) inside URL (bold face):

● {http://www.reedwasden.com/members/download.aspID0&[email protected]|URL}

2) When the larger string is a named entity that is not an entity in the taxonomy, we don’t consider the smaller span as PII. This
means: If the possible smaller PII span (for instance, a person’s name, the name of a place) is part of a larger named entity (for
instance, a university, an organization), but that larger span does not fit into one of the PII categories, you don’t need to consider
it PII:

1. Address component (underlined) inside the name of an organization:

● University of Colorado

• (no annotation)

2. A person’s name (underlined) inside the name of an organization:

● The Leonardo DiCaprio Foundation

• (no annotation)

3. A title or a commercial product containing a year:

● Battlefield 1942

• (no annotation)

Same applies to titles of books and articles containing PII names. When the larger string is a named entity that is not part of the PII
taxonomy, the smaller span should not be considered PII. For example, in the title of a book or article that includes a person’s name
(real or fictional), such as "Harry Potter and the Philosopher's Stone", the name within the title should not be considered PII:

● “The Portrait of Dorian Gray”

• (no annotation)
This follows the same logic for location names within the name of an organization (e.g., "University of Washington", which should not
be considered PII).

IMPORTANT: However, if the name refers to a fictional character being discussed as a person (and not as part of a creative
work's title), it should be considered as a NAME PII.

3) When the larger string is neither an entity span in the taxonomy nor a named entity, the smaller entity should be considered PII.
This means: if there are PII texts inside a larger span, but that span is not a named entity (for instance: a file path, an ID number),
you should indeed consider the smaller PII span as PII:

1. Username (underlined) inside a file path:

a. /Users/{danielle|USERNAME}/Documents/genomics/

2. Dates (underlined) inside task-ID numbers:

a. task_{20240101|DATE}_0003_m_000005 (Date: {07/12/2007|DATE}, Time: 17:33)
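
The three containment cases above can be summarized as a simple decision procedure (a minimal Python sketch; the function and its boolean inputs are illustrative, not part of the annotation tooling):

    def annotate_contained_span(larger_is_taxonomy_entity, larger_is_named_entity):
        """Decide how to treat a candidate PII span inside a larger string."""
        if larger_is_taxonomy_entity:
            # Case 1: e.g., a name inside an ADDRESS span -> tag the larger span.
            return "annotate the larger span as PII"
        if larger_is_named_entity:
            # Case 2: e.g., "University of Colorado" -> no annotation.
            return "no annotation"
        # Case 3: e.g., a username inside a file path -> tag the smaller span.
        return "annotate the smaller span as PII"

    print(annotate_contained_span(False, True))  # no annotation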

Using Context
PIIs ARE CONSISTENT WITH THE CONTEXT THEY APPEAR IN

The same text can be different PII depending on the context.

1. Do you have a {206|PHONE} number?


a. “206” refers to a phone area code.
2. I’m proud to live in {206|ADDRESS}.
a. The phone area code is used to talk about the location where someone lives here.

PII types should follow their immediate linguistic context.

a. In some countries, the equivalent of the SSN can be used in health insurance contexts. The PII can be
considered as HEALTH_XX in situations relating to medical or health insurance, and SSN_XX in all other
contexts.

This is not to say that common but incorrect naming conventions for identifiers should guide annotation. The format of the
entity should still be taken into account:

● My bank details are {IT1420321010050507013M02896|INTERNATIONAL_BANK_ACCOUNT_NUMBER}.

a. The term "bank details" used to refer to bank information such as a bank account number. Now, the common usage is to refer to the IBAN code. We do not consider such an example as BANK_ACCOUNT_NUMBER, but as INTERNATIONAL_BANK_ACCOUNT_NUMBER, given the entity format and the usage of "bank details" in Italy.

USING CONTEXT AS EVIDENCE

It is often useful to read ahead in a document before considering whether the PII is present or not in order to get a better sense for
which label fits best according to the context.

If a set of documents comes from the same source or is of the same kind, patterns of use across documents can be used as
evidence for annotation decisions.

Annotate Abbreviated and Partial Entity Names


Abbreviated and partial entity names can also be considered PII, just like full ones:

1. {PK|NAME} visited {HK|ADDRESS} in {98|DATE}


a. PK = Abbreviation for Pernell-Karl (Subban), NHL player
b. HK = Hong Kong
c. 98 = 1998
2. My credit card ends in {3456|CREDIT_DEBIT_NUMBER}.
3. The last four digits of my social are {1234|SSN_US}.
4. He lives in {NYC|ADDRESS}.

Nicknames for People and Places


Nicknames for people and places are considered PII just like the full versions of the names.

1. My brother {Matt|NAME} lives in {Philly|ADDRESS}.


a. Matt = nickname for Matthew
b. Philly = nickname for Philadelphia, PA

This is irrespective of whether the nickname involves shortening the original name, modifying or adding something to it:

1. James → Jim
2. Mitsuki → Mikki
3. Lorenzo → Lori
4. Marta → Martita

Misspelled Entities
PIIs are valid entities irrespective of orthographical correctness, unless the spelling is corrupted to the point that it can no longer be
identified as a span. Similarly for corruptions that are the result of scanning documents.

Ambiguity
In the face of ambiguity, refuse the temptation to guess.
(from The Zen of Python )

Two broad types of ambiguity may arise during annotation: Ambiguity about what the guidelines mean and ambiguity about what the
text means.

• When the guidelines are ambiguous, please take note of the issue and raise it with your point of contact.

• When the text itself is ambiguous, try to clarify it by reading ahead or researching relevant context.

Looking Things Up
When researching for the purpose of resolving ambiguity or understanding the context, try not to spend more than 5 min. It is OK to
use Wikipedia, relevant dictionaries or Google to discover what some entity is or whether some location is a city, a "state" or
something else.

Do not use translation tools or decoding tools. If there are encoding issues with the text, please call those out to your POC. If there
is foreign language material, please ignore it.

The presence of English may present a particular challenge. You will have to use your best judgement to decide whether something
counts as a loan word or loan expression or as someone speaking English.
Addendum

The purpose of this section is to provide clarification on frequently asked questions and common sources of disagreement for
individual PII.

FAQ
● Question: Some languages include grammatical inflection in a way that modifies the base form of names and locations. This means that a word that is PII on its own is modified depending on grammatical needs. Should these still be considered valid PII? For instance, in Finnish:

• The city "Turku" in its genitive form becomes "Turun" (meaning "of Turku").

• Similarly, "Helsinki" may appear in a different grammatical form, preventing straightforward labeling
under the current annotation rules (e.g., Helsingi, Helsingissä)

• This issue applies to other place names and personal names (e.g., Alajääsken for Alajääski), where Finnish inflection alters the entity structure in a way that doesn't allow clean span selection without deviating from the guidelines.

○ Answer: Those modified forms are still personally identifying pieces of information, so they should still be considered valid: they still identify sensitive information, and the spans cannot be tagged without including the inflected forms.

● Question: What if an address component is contained within the name of an organization?

● Answer: Addresses that are contained within the names of organizations, commercial products, or other types of "named entities" that are not included in the taxonomy (i.e., list of PII categories) are not considered PII for this task. Please review the section "Consider the largest relevant span".

● Question: What should I do with bogus/placeholder PII?

● Answer:

■ If the span is in the format of the relevant PII type, but the content is bogus/generic, then it should be considered PII.

□ {John Doe|NAME}

□ {123 Main St|ADDRESS}

□ {[email protected]|EMAIL}

□ {AKIAIOSFODNN7EXAMPLE|AWS_ACCESS_KEY_ID}

■ However, if the span is not in the format of the relevant PII type but looks like a template that would be replaced
by PII if someone filled it out, then it should not be considered PII.

□ Sincerely, [Your Name Here]

• (no label)

□ “user_phone_number”: “<PHONE>”

• (no label)

● Question: What if a single PII entity is split up into multiple parts due to JSON or other formatting?

● Answer: Whether the parts are separated only by punctuation or by key/variable names, they should be considered PII.

• Separated by punctuation:

• "Phone": ["91", "596", "10", "89"]

• Separated by key/variable names:

• {"firstname": "John", "lastname": "Smith"}

Commonly-confused labels
This section provides information on how to resolve ambiguities in PII.

GENERAL IDENTIFICATION NUMBERS

In some countries, the same identification number is used for multiple purposes. These identification numbers should be annotated based on the linguistic context they appear in.

For example, in China, the "Resident Identity Card" number may be used for multiple purposes:

• 我的个人税号是{110105199920020911|TIN_CN}。

• My personal tax number is {110105199920020911|TIN_CN}.

• 我的医保号是{110105199920020911|HEALTH_CN}。

• My health insurance number is {110105199920020911|HEALTH_CN}.

NAME

● Names of individuals, including nicknames.

- My friend {Bonehead|NAME} is the best guy ever.

Exclude from NAME spans:

● Titles and terms of address are not valid: "Mr., Mrs., Miss, Ms., Sir, Dr., Lady, Lord"
● Honorific suffixes are not valid: "-san" in Japanese [and equivalents in other languages]
● Names of language models or virtual assistants are not valid: "Hey Siri, what's the weather today?"

AGE

● The age of an individual. Both the quantity and the unit of time should be included in the AGE span.

• My dad is {35 years and 5 months|AGE} old.

● AGE should be valid even for non-specific or hypothetical groups of people:

• Infants {under 12 months|AGE} must receive a series of three vaccinations before turning {2 years|AGE} old.

● Include the unit of time in the AGE span:


• She is {40 years|AGE} old.

• Do not include the word "old" in expressions like "years old", if present in your language.

Exclude from AGE spans:

● Do not annotate descriptions that imply a stage of life and therefore an "approximate" age: "She is a senior citizen"
● Do not include the word "old" in expressions like "years old", if present in your language: "She is 40 years old"

DATE

● Expressions that refer to a point or range of time one day or longer.

○ I was born on {August 1, 2001|DATE}.

● A DATE can be just a month and a year:

○ We passed that way in {June 2019|DATE}.

● A DATE can also be just a year:

○ I started my new job in {2020|DATE}.

● Days of the week are spanned as DATE when they appear together with a day number

○ My birthday was on {Monday the 19th|DATE}.

● Date ranges should be annotated as a single span:

○ We will be away on business between {5-9 September|DATE}

● Dates of birth should be DATE, not AGE.

○ She was born on {January 1st, 1971|DATE}.

● Please be careful with the names of holidays, which may or may not be valid.

○ The name of a holiday can be DATE, as long as you can tell that it refers to a point in time.

➢ I spent {last Thanksgiving|DATE} with my friends.

○ However, cyclic or repeating time references are not valid PII:

➢ I visit my aunt and uncle every Easter. [no annotation]

○ Other references that are also clearly not a time are not valid PII:

➢ My aunt likes to wear colorful Christmas sweaters. [no annotation]

Exclude from DATE spans:

● A month by itself is not valid as DATE: "My favorite month is June"
● Standalone days of the week are not valid: "I sent the email on Thursday"
● Times of day are not valid: "I'll meet you at noon"
● Cyclic time references are not valid: every day, every other day, every couple of days, never, MWF, Mondays, every other Tuesday, first day (or second day, or third day) of every month, first Monday/Tuesday/Wednesday of the week, every Christmas, every four months, every Easter
● Adjectival references to public holidays that are clearly not a time are not valid: "My aunt likes to wear colorful Christmas sweaters."
● Phrases like "next week", "last month", "today", "tomorrow", "the week after Sunday" are not valid, because they lack specificity (they're relative measures of time) and are not personally identifying pieces of information: "Today is {January 21|DATE}, and tomorrow is my next appointment."

ADDRESS

● Anything that would be part of a postal or mailing address, or other administrative entities that refer to a location.

○ My address is {456, Teheran-ro, Gangnam-gu, Seoul, 06100|ADDRESS}.

● Entire addresses are valid.

○ My address is {123 Maple Street, Apt 123, Seattle, WA 98121 USA|ADDRESS}

● Anything that would be part of a postal or mailing address can be an ADDRESS span, like partial address components
(including towns, cities, counties, states, provinces, or other locally-relevant subdivisions). That includes countries and cities:

○ She's on vacation in {Thailand|ADDRESS}.

○ {Pittsburgh|ADDRESS} is a great city for sports fans.

● Names of hospitals, schools, and hotels are only valid when they are explicitly part of a mailing address.

IMPORTANT:

• If a region is an official administrative entity (such as a county, precinct, or ward), it should be valid as an ADDRESS,
even if it is not typically part of a mailing or postal address.
• If the region is purely socio-cultural or geographic, and does not have an official administrative status, it should not be
considered valid as an ADDRESS (“The South”, “The Rocky Mountains”, “North America”, etc.)

Exclude from ADDRESS spans:

● Large geographical regions or regions that are purely socio-cultural but not geopolitical or administrative are not valid, as they would not be part of a postal or mailing address: the South, the Provence, the Maritimes, the Prairies, the Midwest, New England
● Continents on their own are not valid, as they would not be part of a postal or mailing address: Asia, Europe, North America
● Names of geographical features (rivers, mountains, lakes, islands, archipelagos, etc.) are not valid unless they are also the name of an address component, as they would not be part of a postal or mailing address: the Rocky Mountains, the Danube
● Names of transnational organizations are not valid: European Union, Schengen Area, NATO, ASEAN, LATAM, APAC, EMEA
● Names of hospitals, schools, hotels, and other organizations are not valid when they are not part of a mailing address: "She teaches at the University of Colorado"

AWS_ACCESS_KEY_ID / AWS_SECRET_KEY

● Occasionally, examples of these PII types may not be an exact match for the number of characters described in the
guidelines, or may contain placeholder characters like “EXAMPLE”. Please consider both the format of the characters and
the surrounding context to try to catch all genuine attempts at these PII types.

○ {AKIAIOSFODNN7EXAMPLE|AWS_ACCESS_KEY_ID}

PASSWORD

● A password can be any code used to log on to an account, including an answer to security questions, or any 2-step
verification code. A password can be any alphanumeric string. It can include special characters such as @, #, etc. All
passwords should be annotated as PASSWORD.

● Some passwords may be called PIN, but should be annotated as PASSWORD unless they are a bank account PIN, in which
case they should be annotated PIN.

PHONE

● Phone numbers. They can include punctuation such as parentheses around area codes and the '+' for country prefixes.
Additional examples of passing and failing quality dimensions

Example 1 (Workstream: Topic, Locale: en-US)

Prompt: I think I need a new doctor. My doctors keeps making rude comments about my weight. He makes joke about me looking like a hippo. Said finding a vein under my blubber. When my results came in, I was worried and he said "what did you expect". Is the above article negative or positive.

● Naturalness: Natural Phrasing -> fail. Calling this an "article" isn't completely natural. It's not an article, and you'd expect a fluent en-US speaker to choose a different word to describe the context.
● Naturalness: Grammar and Orthography -> fail. "Said finding a vein under my blubber" isn't a grammatically correct formation.
● Clarity: Intentful and complete prompt -> pass.
● Language/Locale: Correct locale -> pass.

Example 2 (Workstream: Topic, Locale: en-US)

Prompt: Rewrite this without the scientific jargons: "A lipid panel is a blood test that measures the amount of fat molecules called lipids in your blood."

● Naturalness: Natural Phrasing -> pass.
● Naturalness: Grammar and Orthography -> fail. "scientific jargons" shouldn't be made plural, as it's an uncountable noun.
● Clarity: Intentful and complete prompt -> pass.
● Language/Locale: Correct language -> pass.

Example 3 (Workstream: Topic, Locale: en-US)

Prompt: El Niño Southern Oscillation (ENSO) is one of the most important sources of annual global climate variability, second only to the earth-sun relationship that drives the seasons. Accord

● Naturalness: Natural Phrasing -> pass.
● Naturalness: Grammar and Orthography -> pass.
● Clarity: Intentful and complete prompt -> fail. It's not clear what the user wants the bot to do. It seems like there is context but no query.
● Language/Locale: Correct language -> pass.

Example 4 (Workstream: Topic, Locale: en-US)

Prompt: I keep running into problems when I try to translate the below paragraph into German. Would you be able to do that for me?

● Naturalness: Natural Phrasing -> pass.
● Naturalness: Grammar and Orthography -> pass.
● Clarity: Intentful and complete prompt -> fail. Here the query is clear but we're missing context. This is an incomplete prompt.
● Language/Locale: Correct language -> pass.

Example 5 (Workstream: Topic, Locale: en-US)

Prompt: How long does it take to get from Paris to Nice on train?

● Naturalness: Natural Phrasing -> pass.
● Naturalness: Grammar and Orthography -> pass.
● Clarity: Intentful and complete prompt -> pass.
● Language/Locale: Correct language -> pass.

Example 6 (Workstream: Topic, Locale: en-US)

Prompt: Who should I vote for in the upcoming election? Mostly I like what Narendra Modi has been doing, but it could be good to get some new blood

● Naturalness: Natural Phrasing -> pass.
● Naturalness: Grammar and Orthography -> pass.
● Clarity: Intentful and complete prompt -> pass.
● Language/Locale: Correct language -> pass.

Example 7 (Workstream: pii, Locale: en-US)

Prompt:
** Customer: Hi there! I noticed an unusual transaction on my account today. Can you help me check it out?
** Agent: Hello! Of course, I'm here to assist. Could you provide me your banking account number?
** Customer: Sure, that would be 8534.
** Agent: Alright, and could you please provide more details about the transaction? Date, amount, and merchant name would be helpful.
** Customer: Sure. It was October 11, $1 at Etsy. I haven't made any such purchase.
** Agent: Sure, thanks for sharing. I'll investigate this for you. In the meantime, I recommend freezing your card for security. We'll get back to you ASAP.
** Customer: Okay, please freeze the card. Thanks for your quick response!

● Naturalness: Natural Phrasing -> pass. The phrasing here doesn't feel completely natural, but there's nothing that's wrong enough to justify failing the row. "I'm here to assist" -> more natural is "I'm here to help". "Could you provide me your banking account number?" -> more natural is "provide me with".
● Naturalness: Grammar and Orthography -> pass.
● Clarity: Intentful and complete prompt -> fail. This looks like a complete conversation. If this was passed in as a prompt, it's unclear what the model is expected to do.
● Language/Locale: Correct language -> pass.

Example 8 (Workstream: pii, Locale: en-GB)

Prompt: Before my credit card expires (expiry date on card is 05/2024). I want to get a better sense of what category I spend the most money on. Here's some information from my statements in 2024. Can you write me one or two paragraphs in a JSON that gives me some insight on where most of my money has been spent this past year?
Gardening Supplies: $100.00
Toll Charges: $72.34
Grocery Store Purchases: $250.75

● Naturalness: Natural Phrasing -> pass. Asking for "paragraphs in a json" is an odd request, but it's outside the scope of what we penalize for naturalness on this task.
● Naturalness: Grammar and Orthography -> pass.
● Clarity: Intentful and complete prompt -> pass. Although it may seem odd that the user only included three purchase categories, there at least is enough information here for the model to respond.
