NACLO2022ROUND2
The Sixteenth Annual
North American Computational Linguistics Open Competition
2022
www.nacloweb.org
Invitational Round
March 17, 2022
Rules
1. The contest is four hours long and includes nine problems, labeled J to R.
2. Follow the facilitators’ instructions carefully.
3. If you want clarification on any of the problems, talk to a facilitator. The facilitator will
consult with the jury before answering.
4. You may not discuss the problems with anyone except as described in items 3 & 10.
5. Each problem is worth a specified number of points, with a total of 100 points. In the
Invitational Round, some questions require explanations.
6. All your answers should be written clearly in the Answer Sheets at the end of this
booklet. ONLY THE ANSWER SHEETS WILL BE GRADED.
7. Write your name and registration number on each page of the Answer Sheets.
Here is an example: Jessica Sawyer #850
8. Some problems are more difficult than others, but all can be solved using ordinary
reasoning and some basic analytic skills. You don’t need to know anything about
linguistics or about these languages in order to solve them.
9. Don’t be discouraged if you don’t finish everything! If we have done our job well, very
few people will solve all these problems completely in the time allotted.
10. DO NOT DISCUSS THE PROBLEMS UNTIL THEY HAVE BEEN POSTED ONLINE!
THIS MAY BE A COUPLE OF MONTHS AFTER THE END OF THE CONTEST.
Program Committee:
Adam Hesterberg — Massachusetts Institute of Technology
Aleka Blackwell — Middle Tennessee State University
Ali Sharman — University of Michigan
Cerulean Ozarow — Brown University
Daniel Lovsted — McGill University
Dragomir Radev — Yale University
Ethan Chi — Stanford University
Evan Hochstein — Yale University
Jonathan Huang — Massachusetts Institute of Technology
Lori Levin — Carnegie Mellon University
Patrick Littell — University of British Columbia
Pranav Krishna — Massachusetts Institute of Technology
Ryan Chi — Stanford University
Skyelar Raiti — University of Michigan
Tom McCoy — Johns Hopkins University
Booklet Editor:
Daniel Lovsted — McGill University
US Team Leaders:
Aleka Blackwell — Middle Tennessee State University
Lori Levin — Carnegie Mellon University
NACLO Co-Chairs:
Aleka Blackwell — Middle Tennessee State University
Lori Levin — Carnegie Mellon University
Problem Credits:
(J) Harold Somers
(K) Ryan Chi
(L) Tom McCoy
(M) Gordon Chi
(N) Simi Hellsten
(O) Andrés Pablo Salanova
(P) Ali Sharman
(Q) Simi Hellsten
(R) Ethan Chi and David Mortensen
We are grateful for the support of many institutional and individual donors who make this contest possible.
All material in this booklet © 2022, North American Computational Linguistics Open Competition and the
authors of the individual problems. Please do not copy or distribute without permission.
(J) Sounds Fishy (1/1) [5 Points]
As a child learns to talk, they “acquire” the sound system of their language bit by bit, with some speech
sounds appearing later than others. Unfortunately, some children have difficulties during this process, and
may be referred to a speech therapist. One of the first things a therapist does is try to assess the state of the
child’s sound “system”, and they sometimes do this by administering an “articulation test” in which the child
is asked to name pictures and in this way pronounce a set of words specifically chosen to profile the child’s
sound system.
Here are some examples slightly adapted from a genuine case: Scott, a 4-year-old British boy. Each example
contains the target word (the word being pronounced) and Scott’s pronunciation (see below the tables for
an explanation of the unfamiliar symbols). Note that, in this particular case, we are not interested in the
vowels, which are all pronounced “correctly”.
Target Pronunciation Target Pronunciation Target Pronunciation
church dɜ:x Christmas gixməx plane bein
teeth di:x pencil penduw spoon pu:n
fish pix flower bauwə toothbrush du:xbux
yellow jewou smoke hmouk birthday bɜ:xdei
stamps danx sneeze hni:ɣ loose wu:x
queen gi:n wings wiŋɣ feather peɣə
clouds gauɣ very bewi elephant ewipənt
soldier douɣə sugar dugə bottle boɁu
thumb dum monkey munɁi string diŋg
Pronunciation guide:
Ɂ is pronounced like the middle sound in “uh-oh”; x like the “ch” in “Bach”; ɣ like x but with vocal cords
vibrating; j like the “y” in “yes”; ŋ like the last sound in “sing”.
All transcriptions of vowel sounds are the same as the adult target. ə and ɜ are vowel sounds. The symbol :
indicates a long vowel. Note that the data comes from a British child, so the r is not pronounced in “soldier”,
“birthday”, etc.
J1. How would you predict that Scott would pronounce the following words? (Your answers should include
vowels, but you will not be graded on the specific vowels that you use.)
J2. What do you think Scott is saying here? Give one likely interpretation for each.
(a) danɁ ju bewi mux (b) wox jo: hanɣ an bux jo: di:x
(K) A Tough Word to Swallow (1/1) [15 Points]
Wik-Mungkan (literally: "to swallow one's words") is a Paman language spoken in Queensland, Australia, by
around 1,650 Wik-Mungkan people. On the left below are Wik-Mungkan words and phrases. On the right are
their English translations, in a scrambled order.
1. ma' ek A. awake
2. ma' puk pi'an B. brave
3. ma' puuy C. crab
4. ma' thayan D. crab shell
5. mee' E. English language
6. mee' thayan F. eye
7. mee' weep G. fingernail
8. min H. fresh water
9. ngak I. good
10. ngak mee' J. handcuffs
11. ngak min K. happy
12. ngak way L. heart
13. ngangk M. law
14. ngangk ek N. sad
15. ngangk min O. shoulder blade
16. ngangk thayan P. sound asleep
17. ngangk way Q. spring (water source)
18. puuy R. strong / firm
19. puuy ek S. thumb
20. thayan T. tired
21. weep thayan U. trustworthy (e.g., with belongings)
22. wik kiith V. undrinkable water
23. wik thayan W. water
(L) Stopping for a Spell (1/2)
In this problem, we will study finite-state transducers (FSTs), one type of system that can perform
grapheme-to-phoneme conversion. Below is an example of an FST:
The FST takes in a sequence of letters (in lowercase, before the colons) and outputs a sequence of sounds (in
uppercase, after the colons). The FST starts at the circle labeled “start.” When it reads in some lowercase
letter(s), it follows the arrow marked with the letter(s) and also outputs the phoneme(s) associated with the
letter(s), until the entire input has been used up. For example, given the input “siding,” the system would
produce SAYDIHNG. Ø is a special symbol which means that no output is produced: for the input “side”, the
output is SAYD. We need to represent letters and sounds differently from each other because letters can be
pronounced differently in different words. For example, the letters ed can be pronounced D (as in “timed”)
or UHD (as in “sided”).
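As a rough illustration of the procedure just described, here is a minimal Python sketch of a tiny FST. The state names (start, q1, q2, q3, end) and the arcs are hypothetical, chosen only so that the sketch reproduces the two examples in the text ("siding" and "side"); they are not the FST pictured in this booklet. An empty output string stands for Ø.

```python
# Hypothetical toy FST (NOT the booklet's diagram).
# TRANSITIONS[state] maps an input letter chunk to (output sounds, next state).
TRANSITIONS = {
    "start": {"s": ("S", "q1")},
    "q1": {"i": ("AY", "q2")},
    "q2": {"d": ("D", "q3")},
    "q3": {"e": ("", "end"), "ing": ("IHNG", "end")},  # "" plays the role of Ø
}


def transduce(word, transitions=TRANSITIONS, start="start"):
    """Follow matching arcs from the start state until the input is used up."""
    state, pos, output = start, 0, []
    while pos < len(word):
        for chunk, (sounds, next_state) in transitions.get(state, {}).items():
            if word.startswith(chunk, pos):
                output.append(sounds)
                state, pos = next_state, pos + len(chunk)
                break
        else:
            # No arc matches the remaining input: the machine is stuck.
            raise ValueError(f"stuck in state {state!r} at position {pos}")
    return "".join(output)


print(transduce("siding"))  # SAYDIHNG
print(transduce("side"))    # SAYD
```

This sketch always takes the first matching arc, so it only works when at most one arc can apply at each step.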
L1. What output would the system produce for the following words?
L2. Sometimes, when the system reads in a letter, there are two possible paths that it could follow. In such
cases, it tries one path and then, if it gets stuck, it backtracks (goes back) and tries a different path until it
finds one that works – somewhat like how you might solve a maze.¹ Exactly one of the following three words
could potentially force the system to backtrack – which word is it? Answer on your Answer Sheet.
¹ It is possible to create an FST that gives more than one output for a given input. However, for all cases used in this contest, a
given input will have at most one output.
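The backtracking behaviour described in L2 can be sketched as a depth-first search. The arcs below are hypothetical and are not the booklet's FST; they were picked so that the letters "ed" come out as D in "timed" and UHD in "sided", matching the examples given earlier. In this toy machine, "timed" forces one backtrack: the silent-e arc is tried first and gets stuck on the final "d".

```python
# Hypothetical toy arcs: ARCS[state] is a list of (letters, sounds, next_state).
ARCS = {
    "start": [("t", "T", "vowel"), ("s", "S", "vowel")],
    "vowel": [("i", "AY", "cons")],
    "cons": [("m", "M", "after_m"), ("d", "D", "after_d")],
    "after_m": [("e", "", "done"), ("ed", "D", "done")],
    "after_d": [("e", "", "done"), ("ed", "UHD", "done")],
}


def transduce(word, arcs=ARCS, state="start", pos=0):
    """Return the output of the first path that uses up the input, else None."""
    if pos == len(word):
        return ""
    for letters, sounds, next_state in arcs.get(state, []):
        if word.startswith(letters, pos):
            rest = transduce(word, arcs, next_state, pos + len(letters))
            if rest is not None:
                return sounds + rest
            # rest is None: this arc led to a dead end, so try the next arc.
    return None


print(transduce("time"))   # TAYM
print(transduce("timed"))  # TAYMD  (after backtracking from the silent-e arc)
print(transduce("sided"))  # SAYDUHD
```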
(L) Stopping for a Spell (2/2)
L3. A path is only valid if it ends at a position with a double circle. With this fact in mind, what output would
the system on the previous page produce for the following inputs?
L4. Many English words are spelled very strangely. For example, “colonel” is pronounced KUHRNUHL (like
“kernel”) — there is an R in the pronunciation even though there is no r in the spelling! The FST below is
designed to handle some of these exceptions. Match the arrows ((1)-(6)) with their labels ((A)-(F)) so that the
system gives the correct outputs for the 5 words listed under the FST.
Spelling Pronunciation
colonel KUHRNUHL
he HEA
people PEAPUHL
phase FEYZ
built BIHLT
L5. When using an FST, it is possible to swap what counts as the input vs. the output. In our case, this means
that we can provide a sequence of sounds (the symbols to the right of the colons) and have the system
produce letters (the symbols to the left of the colons). Since the system is converting sounds into spelling,
this process is something like having the system compete in a spelling bee. When you are using the previous
FST (the one that handles “colonel”), you try asking it what sequence of letters would be pronounced
RUHFLEA. You expect its answer to be “roughly”, but instead you get something very different! What
sequence of letters does the system say would be pronounced RUHFLEA?
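To make the idea of swapping input and output concrete, here is a small Python sketch. It uses a hypothetical toy FST (not the "colonel" FST from this problem, so it says nothing about the answer to L5) and also applies the rule from L3 that a path is valid only if it ends in an accepting (double-circle) state. Inverting the machine simply swaps the two halves of every arc label.

```python
# Hypothetical toy arcs: (letters, sounds, next_state). "" plays the role of Ø.
ARCS = {
    "start": [("s", "S", "q1")],
    "q1": [("i", "AY", "q2")],
    "q2": [("d", "D", "q3")],
    "q3": [("e", "", "end"), ("ing", "IHNG", "end")],
}
ACCEPTING = {"end"}  # the double-circle states


def invert(arcs):
    """Swap the input and output side of every arc."""
    return {
        state: [(out, inp, nxt) for inp, out, nxt in arc_list]
        for state, arc_list in arcs.items()
    }


def run(symbols, arcs, accepting, state="start", pos=0):
    """Backtracking search; succeeds only if it ends in an accepting state.

    Arcs with an empty input side always match. This toy machine has no
    cycles of such arcs, so the search is guaranteed to terminate.
    """
    if pos == len(symbols) and state in accepting:
        return ""
    for inp, out, nxt in arcs.get(state, []):
        if symbols.startswith(inp, pos):
            rest = run(symbols, arcs, accepting, nxt, pos + len(inp))
            if rest is not None:
                return out + rest
    return None


print(run("side", ARCS, ACCEPTING))              # SAYD
print(run("sid", ARCS, ACCEPTING))               # None (q3 is not accepting)
print(run("SAYDIHNG", invert(ARCS), ACCEPTING))  # siding
print(run("SAYD", invert(ARCS), ACCEPTING))      # side
```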
(M) A Splitting Disagreement (1/4) [10 Points]
Note: This problem builds upon the previous problem, (L) Stopping for a Spell, so we recommend solving that
one first. There are parts of this problem that you may not be able to solve unless you have first completed (L)
Stopping for a Spell.
Thai, the official language of Thailand and a member of the Kra-Dai family, uses a writing system derived
from the Old Khmer script. Unlike in English, in Thai writing there are no spaces between words. Thus, Thai
word segmentation — the task of breaking a piece of Thai text into words — is a complex problem in
computational linguistics.
Pavan and Arun are both computer scientists who are trying to develop a word segmentation model for Thai.
To evaluate the performance of their model, they use the F1-score, which is calculated using the following
equation:
F1 = 2·TP / (2·TP + FP + FN)
TP represents the number of true positives, FP represents the number of false positives, and FN represents
the number of false negatives. A true positive is a case where the correct answer is a positive label, and the
model returns a positive label; a false positive is a case where the correct answer is a negative label, but the
model returns a positive label; and a false negative is a case where the correct answer is a positive label, but
the model returns a negative label.
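The definitions above can be turned into a short Python sketch. It assumes the standard F1 definition, 2·TP / (2·TP + FP + FN), and treats the label 1 as "positive". The function name f1_score and the toy label lists are illustrative only; they are not data from this problem.

```python
def f1_score(gold, predicted):
    """Compute F1 from two equal-length lists of 0/1 labels (1 = positive)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    denominator = 2 * tp + fp + fn
    # With no positives anywhere, F1 is undefined; this sketch returns 0.0.
    return 2 * tp / denominator if denominator else 0.0


gold = [0, 1, 0, 1]       # toy correct labels
predicted = [0, 1, 1, 0]  # toy model output: TP = 1, FP = 1, FN = 1
print(f1_score(gold, predicted))  # 0.5
```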
In order to segment a piece of Thai text into words, Pavan and Arun’s models take in the Thai text and assign
a label to each character in the text. Specifically, the label should be 1 if the character is the end of a word,
or 0 otherwise. As an example, consider the following sentence:
This sentence has 4 words (counting the question mark as a “word”), which we can separate using vertical
bars:
We can further break the sentence into 17 characters. Some characters contain a dotted circle, indicating
that the character combines with some other character that goes in the place of the dotted circle:
(M) A Splitting Disagreement (2/4)
Based on the word boundaries indicated before, a perfect word segmentation output would be:
[0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,1]
Initially, Pavan and Arun develop a simple baseline approach. Soon, they learn a bit more about the Thai
writing system. Based on this knowledge, they group Thai characters into several groups: They label some as
vowels (listed as Vowel at the end of the problem), some as consonants that can appear at the start of a
syllable (listed as Initial consonant at the end of the problem), and some as consonants that can appear at
the end of a syllable (listed as Final consonant at the end of the problem). Note that some characters appear
in more than one category. Using these categories, Pavan and Arun each develop a new algorithm. All three
of these algorithms are described below:
Baseline algorithm:
• Label the last character of the sentence with a 1.
• Label all other characters with a 0.
Pavan’s algorithm:
• For all characters that appear in the Final consonant list at the end of the problem, label them with a
1.
• Label the last character of the sentence with a 1.
• Label all other characters with a 0.
Arun’s algorithm:
• Assign a label of 1 to all characters that satisfy the following criteria:
• The character is in the Final consonant list
• The character is preceded by a Vowel
• The character is followed by an Initial consonant and then a Vowel
• (In other words, use the label of 1 for a character that fits the FC slot in the following template: V
FC IC V).
• Label the last character of the sentence with a 1.
• Label all other characters with a 0.
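The three labeling procedures above can be sketched in Python as follows. The category sets are passed in as parameters; the Latin letters used in the example are hypothetical placeholders standing in for Thai characters, not the actual category lists from the end of the problem.

```python
def baseline(chars):
    """Label only the last character of the sentence with 1."""
    labels = [0] * len(chars)
    if labels:
        labels[-1] = 1
    return labels


def pavan(chars, final_consonants):
    """Label every Final consonant, plus the last character, with 1."""
    labels = [1 if c in final_consonants else 0 for c in chars]
    if labels:
        labels[-1] = 1
    return labels


def arun(chars, vowels, initial_consonants, final_consonants):
    """Label the FC slot of every V FC IC V window, plus the last character."""
    labels = [0] * len(chars)
    for i in range(1, len(chars) - 2):
        if (
            chars[i] in final_consonants
            and chars[i - 1] in vowels
            and chars[i + 1] in initial_consonants
            and chars[i + 2] in vowels
        ):
            labels[i] = 1
    if labels:
        labels[-1] = 1
    return labels


# Placeholder categories; "k" appears in two of them, as the text allows.
VOWELS = {"a", "e"}
INITIAL = {"k", "t"}
FINAL = {"n", "k"}

sentence = list("kanta")
print(baseline(sentence))                     # [0, 0, 0, 0, 1]
print(pavan(sentence, FINAL))                 # [1, 0, 1, 0, 1]
print(arun(sentence, VOWELS, INITIAL, FINAL)) # [0, 0, 1, 0, 1]
```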
On the next page are 3 Thai sentences along with the F1-score that each algorithm achieves on each
sentence. Unfortunately, Pavan and Arun have lost track of which algorithm is which!
(M) A Splitting Disagreement (3/4)
M1. Match each algorithm name from the previous page (“Baseline algorithm”, “Pavan’s algorithm”, or
“Arun’s algorithm”) to its label in the table (“Alg A” to “Alg C”). Answer on your Answer Sheet.
Thai sentence (with word boundaries Character array F1: F1: F1:
added) Alg A Alg B Alg C
“I speak Thai.”
One challenge for Thai word segmentation is that often there is no single right answer: There can be multiple
valid ways to break a Thai sentence into words. For example, consider the 4 Thai lines below. In the final
example of the table above, example (1) is treated as a single word (meaning “Thai”). However, it is also
valid to treat this as two words, listed in examples (2) and (3). If we use this view, then the final sentence
would be segmented as shown in example (4).
(1) “Thai”
(2) “language”
(3) “Thai”
(4)
M2. What F1-score would Arun’s algorithm get in the last row of the table above if we used (4) as the
intended segmentation, rather than the segmentation shown in the table? If you want, you can leave your
answer as a fraction — e.g., 19.4/51.7 — rather than simplifying it into a decimal.
(M) A Splitting Disagreement (4/4)
Here is a finite-state transducer that implements the Baseline Algorithm:
M3. Draw (a) a finite-state transducer that implements Pavan’s Algorithm and (b) a finite-state transducer
that implements Arun’s Algorithm. (See the previous problem, (L), for a definition of finite-state
transducers.) It may be helpful to use the category labels listed below; in an FST, one of these category labels
can match any character that is a member of that category. For example, in the transducer above, we have
used Any to match any single character. If you wish, you can abbreviate these terms — just make sure to
include a key for any abbreviations.
• Punctuation:
• Tone:
• Vowel:
• Initial consonant:
• Final consonant:
• Any: Can match any character from any of the 5 categories above
• Not [CATEGORY]: Can match any character that does not belong to the category [CATEGORY], where
[CATEGORY] can be replaced with any of the category names above (Punctuation, Tone, Vowel, etc.). For
example, Not Vowel would mean any character that is not one of the vowels.
(N) Pseudorandom Numbers (1/2) [15 Points]
Dinka is a Nilotic dialect cluster with about 1.3 million native speakers, mostly ethnic Dinka people in South Sudan.
There are several main varieties, but this problem focuses on the Agar dialect.
When linguists first studied the language, they believed that the singular and plural forms of Dinka nouns were
completely unpredictable. More recently, however, studying the way that verbs conjugate in Dinka allowed linguists to
find patterns in the singular and plural forms. This has allowed many nouns to be grouped according to common
patterns, although many remain unexplained.
Below are 22 Dinka nouns, in both singular and plural forms, each of which follows one of the common patterns. The
translations have been provided only for interest: they have no bearing on the solution to the problem.
Notes: ɛ is the vowel in “bed”, and ɔ is the vowel in “bought”. Dinka has three vowel lengths: short (e.g., a), medium
(e.g., aa), long (e.g., aaa); as well as three tones, high (e.g., á), low (e.g., à), falling (e.g., â). j and w are semivowels,
pronounced like the first sounds in “yes” and “with” respectively. t̪, d̪ , ɲ, ɟ and ŋ are consonants; how consonants are
pronounced is not relevant for this problem. While it is not strictly necessary for solving the problem, it may be helpful
to know that vowels can be classified by (among other things) height, i.e., how high or low the tongue is in the mouth
during their pronunciation. In this problem, i and u are high, e and o are high-mid, ɛ and ɔ are low-mid, and a is low.
N2. Assuming that the following verbs conform to one of the common patterns, fill in the blanks in the table below.
Answer on your Answer Sheet.
N3. Below are the singular or plural forms of 10 more Dinka nouns. Assuming that they conform to one of the
common patterns, predict the missing forms. If there is more than one possible prediction, give them all. Answer on
your Answer Sheet.
N4. Explain what you have observed about Dinka nouns and verbs from the data in this problem.
(O) Seeing the Future (1/1) [10 Points]
The Chorote Iyo’awujwa’ are a Matacoan people living in the Chaco region of Argentina and Paraguay. A
linguist working with one of the varieties of Iyo’awujwa’ obtains the following data from a native speaker
(“sg.” and “pl.” mean “singular” and “plural”, respectively):
The linguist then starts asking for other tenses. She asks how to say ‘you (sg.) are going to see me’ and gets
the form si’wehnayi’ from her consultant.
She says to herself, “I got this.” She asks her consultant, “Is ‘you (pl.) are going to see him/her/them’
hi’wehnayiweɬ?”
To her surprise, the form she gets is in’wehnayiweɬ. The consultant adds the following explanation: “it can
also mean ‘he/she/they are going to see you (pl.)’; and si’wehnayi’ can also mean a few other things, by the
way: ‘I am going to see you (sg.)’, ‘I am going to see him/her/them’, and ‘he/she/they are going to see me’.”¹
O2. Describe how to form the Iyo’awujwa’ verb meaning “see”. Be sure to reference the present (“see”) and
future (“going to see”) tenses in your answer.
¹ You may assume that in all cases, all the possible translations of a certain Iyo’awujwa’ form are given.
(P) Yumology (1/4) [15 Points]
To understand a piece of text, it can be extremely helpful to have some background knowledge about the
items discussed in the text: What properties do the items have, and how are they related to each other? This
problem deals with the important question of how we can represent such information in a way that a
computer can use.
As part of an initiative to increase their nation’s health, the Yaldish government has decided to list the
mineral potassium (which is abbreviated as K) on their nutrition labels. To ensure proper labeling, the Yaldish
Unified Ministry (YUM) maintains a Food Database of Compositions (FDC), but prior to the recent update in
requirements, they were not tracking potassium. Obtaining this information for each food listed in the
database through lab testing would be time-intensive and costly. The Yaldish have thus hired NacLabs to
develop a method to supplement the YUM FDC with the English-Language (EL) FDC, which has more
complete nutritional information.
The main challenge that NacLabs faces is that the food descriptions in the YUM FDC are written in Yaldish.
Even though they are also translated into English, the descriptions are not exactly the same as the
descriptions of similar foods in the EL FDC (which are described only in English). The following demonstrates
these kinds of differences:
Closest matches in YUM FDC, English translations (left) and EL FDC (right):
Furthermore, not all foods in the YUM FDC are listed in the EL FDC.
Taking these limitations into account, NacLabs has developed an algorithm that automatically fills in
potassium for YUM foods. On the next two pages are the YUM FDC (containing the automatically-estimated
K values), the EL FDC, and a set of food classification charts. Within each FDC, foods are classified based on
four facets plus a fifth “extra facet.” The food classification charts illustrate relationships between some of
the facets. If you are unfamiliar with any of the food terms in the EL FDC, see the glossary on Page 4 of this
problem.
P1. Two foods in the EL FDC are missing part of their description ((a) and (b)). On your Answer Sheet, fill in
the missing information. Word order does not matter as long as the desired meaning is clear.
P2. Three foods in the YUM FDC are missing their “Estimated K mg/100g” values ((c), (d), and (e)). On your
Answer Sheet, fill in the missing values. Note that the “extra facet” is not involved in determining these
values.
(P) Yumology (2/4)
EL FDC
EL ID Description K mg/100 grams Facets Extra facet
E01 Apple, raw, with skin 107 B1245; C0121; E0151; F0003 A2003
E04 Beet greens, raw 762 B1423; C0240; E0151; F0003 A2003
YUM FDC
YUM ID Estimated K mg/100g Facets Extra facet
Y1 201 B1136; C4545; E0133; F0003 A2003
Y2 250 B1530; C0339; E0115; F0003 A2003
Y3 250 B1484; C0339; E0114; F0013 A2003
Y4 107 B1245; C0126; E0133; F0003 A2001
Y5 189.5 B1430; C0120; E0215; F0013 A2002
Y6 170 B1430; C0120; E0310; F0013 A2002
Y7 (c) B1423; C0140; E1152; F0001 A2001
Y8 (d) B1245; C0121; E0151; F0013 A2003
Y9 (e) B2530; C0126; E0215; F0013 A2003
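One way a computer might hold the facet columns shown above is as a mapping from each facet's first letter to its full code. The Python sketch below parses a "Facets" field and lists the letters on which two foods agree. The function names are hypothetical, and this is only a representation aid: it does not implement, or reveal, how NacLabs estimates potassium values.

```python
def parse_facets(facet_field):
    """Turn 'B1245; C0121; E0151; F0003' into {'B': 'B1245', 'C': 'C0121', ...}."""
    facets = {}
    for code in facet_field.split(";"):
        code = code.strip()
        if code:
            facets[code[0]] = code
    return facets


def shared_facets(a, b):
    """Return the facet letters on which two parsed foods have the same code."""
    return {letter for letter, code in a.items() if b.get(letter) == code}


e01 = parse_facets("B1245; C0121; E0151; F0003")  # EL FDC row E01
y8 = parse_facets("B1245; C0121; E0151; F0013")   # YUM FDC row Y8

print(sorted(shared_facets(e01, y8)))  # ['B', 'C', 'E']
```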
(P) Yumology (3/4)
Food classification charts
P3. Briefly describe how the “Estimated K mg/100g” values are determined in the YUM FDC. For this
question, you do not need to describe what any specific facets mean. As noted above, your answer to this
question should not involve the “extra facets.”
P4. Each facet starts with a letter (B, C, E, F, or A). The facets that start with F describe whether the food is
cooked. What type of information does each other letter correspond to?
(P) Yumology (4/4)
P5. For each of the following facets, briefly describe what that facet means:
(a) B1245 (b) B1530 (c) C0240 (d) E0310 (e) F0013 (f) F0001
P6. Name a food ingredient that might have the facet B1438.
P7. For each of the following YUM IDs from the YUM FDC, give a food description that could be associated
with that ID (in the style of the descriptions in the EL FDC). There are many possible answers. For full credit,
make sure that your answers cover all of the facets listed with each YUM ID:
P8. Even though they were not previously using it, NacLabs has decided to now include the “extra facet” in
determining the “Estimated K mg/100g” values in the YUM FDC. Will this decision make the estimated values
more accurate or less accurate? Explain your answer.
Apples are a fruit grown on a tree, available in red, green, and yellow varieties.
Applesauce is a dish made of apples (with their seeds and skin removed) blended until smooth.
Ascorbic acid is a chemical used to help preserve foods.
Bacon is a sliced breakfast food, typically made of pork but also available in meatless varieties made out of
protein extracted from beans, nuts, grains, etc.
Beet greens are the leaves of a beet plant.
Beetroot is the root of a beet plant.
Broiling is a method of cooking in which the heat source comes from above.
Canning is a food preservation process that involves raising the food to a high temperature and then sealing
it in a metal can.
Chuck roast is a type of beef.
Coconut water is a clear liquid found inside coconuts.
Mincing refers to chopping food into very small pieces.
Pasteurization is a process of heating food before packaging it in order to increase its shelf life.
Pan-frying is a method of cooking vegetables and other foods in a pan.
Pineapples are a fruit grown in a shrub.
Potatoes are a root vegetable. They are often served either baked (in which case the whole potato is baked
in an oven or microwave) or mashed (in which case the potato is cooked and then pounded with a utensil
until it is mostly smooth).
Pumpkins are a type of large orange vegetable that grows on a vine.
Puréeing is the process of blending a fruit or vegetable, often with its seeds and skin removed, into a smooth
liquid.
Raisins are dried grapes. They can be dried via heating or by being left out in the air.
(Q) Relatively Speaking (1/1) [15 Points]
Niuean is a Polynesian language spoken by nearly 8,000 people around the world. It is the official language of Niue, a
self-governing island in the Pacific, although most speakers of Niuean live in other countries, such as New Zealand.
Below are some sentences in Niuean. For each one, we have also listed one possible translation into English; some
sentences have additional possible translations that are not shown. Note that ā and ū are long vowels, and that g
represents the ng sound in sing.
Niuean English
2 Kua fai fakatino foki ne tā e ia. There have also been pictures that he drew.
6 To kai he moa ka holoholo e au e ika. The bird that I will wash will eat the fish.
9 Ne kai e ika ne takafaga he tama. The fish that the child caught ate.
10 To holoholo foki he tama e vaka ne tā he kāmuta. The child will also wash the canoe that the carpenter built.
12 Muhu tama foki e faiaoga ka kitia he moa. The teacher also has plenty of children that the bird will see.
13 Fai vaka a Sione ne holoholo e au. Sione has canoes that I washed.
Q1. Translate the following sentences into English. For sentence (c), there are two possible translations; give them
both.
Q3. Describe what you have observed about Niuean grammar from the data in this problem.
(R) I Stop Being Afraid of This Problem (1/4) [10 Points]
One common task in developing technologies for human languages is grapheme-to-phoneme conversion (or
G2P). In G2P, you convert words written in orthography (the practical writing systems that people use
day-to-day) to a standardized phonetic transcription that can be used for recognizing and synthesizing speech,
among other things. In this problem, you will help develop G2P for Carib (Karìnja), a Cariban language
spoken by about 7,400 Carib people in Venezuela, Guyana, Suriname, French Guiana, and Brazil. Below are
some words in Carib with their pronunciations and meanings (a guide to the phonetic symbols is provided on
the next page). Here are some words to get you started:
Word (orthography) Phonetic transcription Meaning
As you can see, some consonants in the phonetic transcription of Carib are different from those of
English. To help you understand how these consonants are pronounced, here is some information on their
location in the mouth and manner of articulation (ɽ is pronounced farther back in the mouth than ɾ).
“Voiced” and “voiceless” indicate that the vocal cords vibrate and don’t vibrate, respectively, during
pronunciation. Note also that j after a consonant represents palatalization, or softer articulation, and ˈ before
a syllable indicates that it is stressed (i.e., pronounced more emphatically, like the first syllable of the
geographical feature “desert”, or the second syllable of the food “dessert”). The boundary between syllables
is marked with a period.
Nasal      Voiced     m n ŋ
Stop       Voiceless  p t k ʔ
Stop       Voiced     b d g
Fricative  Voiceless  s ʃ h
Liquid     Voiced     ɾ, ɽ j w
(R) I Stop Being Afraid of This Problem (3/4)
R1. On your Answer Sheet, fill in the following table to provide rules for the pronunciation of orthographic y.
Note that for each answer you provide in the “Phoneme” column (i.e., answers (c), (e), (g), and (h)), you
should answer with a single phoneme (i.e., a single character used in the phonetic transcriptions). Multiple
correct answers are possible.
if y (a) it is silent —
R2. On your Answer Sheet, fill in the following table to provide rules for the pronunciation of orthographic p,
t, and k. For this question, you should ignore palatalization (none of your answers in the “Phonemes” column
should include the j symbol).
Note that for each entry you fill in the “Phonemes” column (i.e., (c), (e), and (f)), you should provide three
answers of one or more phonemes each, and you should order your answers respectively, separated by
commas (i.e., with the pronunciation of p first, then t, then k). Multiple correct answers are possible.
R4. On your Answer Sheet, explain the G2P rules you developed for Carib based on the data. You do not
need to repeat your rules from R1 and R2.
The North American Computational Linguistics Open Competition
www.nacloweb.org
Answer Sheets
REGISTRATION NUMBER
Name: ____________________________________________
Contest Site: ________________________________________
Site ID: ____________________________________________
City, State: _________________________________________
Grade: ______
Please also make sure to write your registration number and your name on each page of the Answer
Sheets, and turn in all pages of the Answer Sheets even if you have left some blank.
SIGN YOUR NAME BELOW TO CONFIRM THAT YOU WILL NOT DISCUSS THESE PROBLEMS WITH ANYONE
UNTIL THEY HAVE BEEN OFFICIALLY POSTED ON THE NACLO WEBSITE IN APRIL.
Signature: __________________________________________________
YOUR NAME: REGISTRATION #
(g) shrimps
J2. Give one likely interpretation for each of the following things Scott says:
K1. In each box, write the letter of the English word/phrase that corresponds to the Wik-Mungkan word/
phrase of that number.
1. 2. 3. 4. 5. 6. 7. 8.
L2. Exactly one of these words could potentially force the system to backtrack – circle that word:
L4. In each box, write the letter (A-F) whose label corresponds to the arrow of that number:
1. 2. 3. 4. 5. 6.
L5. The sequence of letters that the system says would be pronounced RUHFLEA is:
M1. Write a letter (A, B, or C) in each box to match the algorithms in the table to their names:
Baseline algorithm:
Pavan’s algorithm:
Arun’s algorithm:
N1. Circle the letters of the two forms that follow one of the common patterns:
N4. Explain what you have observed about Dinka nouns and verbs from the data in this problem:
(P) Yumology
P3. Briefly describe how the “Estimated K mg/100g” values are determined in the YUM FDC:
B:
C:
E:
A:
P5. For each of the following facets, briefly describe what that facet means:
(a) B1245
(b) B1530
(c) C0240
(d) E0310
(e) F0013
(f) F0001
P6. Name a food ingredient that might have the facet B1438:
P7. Give a food description that could be associated with each ID:
(a) Y4
(b) Y5
(c) Y6
(d) Y9
P8. Will the decision make the estimated values more accurate or less accurate? Explain your answer:
Q1. Translate the following sentences into English. For sentence (c), there are two possible translations; give
them both:
(b) Sione has only had fish that the teacher will eat.
Q3. Describe what you have observed about Niuean grammar from the data in this problem:
if y (a) it is silent —
(respectively)
(respectively)
(respectively)
R4. Explain the G2P rules you developed for Carib based on the data: