Course Project and Term Paper Logistics
• A group can choose a project from the list below, or propose a project of its own, while meeting the criteria elaborated in class. If you propose your own project, it is subject to review and approval by the faculty. Even if you are proposing your own project idea, please fill in your project preferences from the list as well.
• Each project will have a mentor. It is the project group’s responsibility to reach out to the mentor and to make continuous progress over the course of the semester.
• Each project description indicates a baseline and a baseline+. The baseline is the bare minimum expected of the project; baseline+ lists improvements that you can carry out on top of it. A team is in no way limited to the literature listed in the project description: a good literature survey and its implementation, with due consultation and review by the faculty / mentor, will go a long way in the project evaluation.
1. Project Outline Submission: due a week after project allocations. You will have to give a gist of the project (description, datasets), a literature survey, and a plan for project execution.
2. Interim Submission: a report on project progress, due in the middle of the semester. The date will be communicated along with the project allocations.
3. Final Submission: project report, presentation, and code submissions.
3 Project Domains
• Machine Translation
• Question Answering
• Summarization
• Conversational Systems
4 Projects List
1. Domain: Machine Translation
Description: ACL 2019 shared task - Machine Translation of News Articles
http://www.statmt.org/wmt19/translation-task.html
Dataset: Pick a language pair of your choice from the list at the link above
Baseline: Explore the task and implement a baseline system (see the MVP steps below).
Baseline+: Over and above the base implementation, explore and implement improvements.
3. Domain: Summarization
Description: Scientific Document Summarization shared task
https://github.com/WING-NUS/scisumm-corpus
Dataset: Link above
Baseline: Explore the shared task and implement a baseline system (a simple extractive baseline and its ROUGE evaluation are sketched after this entry).
Baseline+: Over and above the base implementation, explore and implement improvements.
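To make the idea of a summarization baseline concrete, here is a minimal sketch, not the shared-task harness: a lead-3 extractive baseline scored with ROUGE. The rouge-score package, the lead3 helper, and the toy document/reference strings are assumptions for illustration only.

    # Sketch of a lead-3 extractive baseline evaluated with ROUGE.
    # Assumes: pip install rouge-score; toy strings stand in for real
    # document/summary pairs from the corpus.
    from rouge_score import rouge_scorer

    def lead3(document):
        """Return the first three sentences as the summary (naive sentence split)."""
        sentences = [s.strip() for s in document.split(".") if s.strip()]
        return ". ".join(sentences[:3])

    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

    document = ("Sentence one about the paper. Sentence two with the method. "
                "Sentence three with results. Sentence four with details.")
    reference = "The paper describes a method and reports results."

    candidate = lead3(document)
    scores = scorer.score(reference, candidate)   # score(target, prediction)
    for name, s in scores.items():
        print(name, round(s.fmeasure, 3))

Any stronger baseline from the literature can be evaluated with exactly the same ROUGE loop, which makes comparisons straightforward.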
(a) MVP
• For a given bilingual corpus, train a translation system using Moses (any online PBSMT API will do too, in case Moses is hard to set up).
• Translate the test corpus, and run BLEU to score the translation.
• Similarly, using OpenNMT or Tensor2Tensor (any NMT framework), train an attention model, and run it on the test corpus. Again, run BLEU to score the translation.
• Compare both the scores obtained.
• Obtain a monolingual corpus in the target language (the target side of the bilingual corpus will do), and train a language model on it (preferably with KenLM).
• For each sentence of the test corpus, evaluate the perplexity of both the PBSMT and the NMT translation under the LM, and pick the one with the lower perplexity. (Also report what fraction of sentences is taken from each system.)
• Again, run BLEU to score this newly created mixed set of translations.
• Observe improvements in BLEU scores, if any. (A sketch of this scoring-and-selection pipeline follows this list.)
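A minimal sketch of the pipeline above, assuming sacrebleu and the kenlm Python bindings are installed; the file names (pbsmt.out, nmt.out, ref.txt, target.arpa) are placeholders for the system outputs, the references, and a KenLM model trained on target-side monolingual data.

    # Sketch: score PBSMT and NMT outputs with BLEU, then build a mixed output
    # by picking, per sentence, the translation with lower LM perplexity.
    import sacrebleu
    import kenlm

    def read_lines(path):
        with open(path, encoding="utf-8") as f:
            return [line.strip() for line in f]

    pbsmt = read_lines("pbsmt.out")   # Moses / PBSMT translations of the test set
    nmt = read_lines("nmt.out")       # OpenNMT / Tensor2Tensor translations
    refs = read_lines("ref.txt")      # reference translations

    print("PBSMT BLEU:", sacrebleu.corpus_bleu(pbsmt, [refs]).score)
    print("NMT   BLEU:", sacrebleu.corpus_bleu(nmt, [refs]).score)

    lm = kenlm.Model("target.arpa")   # KenLM language model on the target language
    mixed, from_nmt = [], 0
    for p_sent, n_sent in zip(pbsmt, nmt):
        if lm.perplexity(n_sent) < lm.perplexity(p_sent):
            mixed.append(n_sent)
            from_nmt += 1
        else:
            mixed.append(p_sent)

    print("Mixed BLEU:", sacrebleu.corpus_bleu(mixed, [refs]).score)
    print("Chosen from NMT: %.1f%%" % (100.0 * from_nmt / len(mixed)))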
(b) Steps Forward
• For each translated sentence of the test corpus (from both PBSMT and NMT), use the two translations as observation sequences, and train an HMM that predicts the hidden sequence. (Here, the hidden sequence is the final sentence, which captures the best of both translations.)
• Observe improvements in BLEU scores on the set of final sentences, if any.
(a) MVP
• For a given bilingual corpus, train a translation system using Moses (any online PBSMT API will do too, in case Moses is hard to set up).
• Translate the test corpus, and run BLEU to score the translation.
• Use a deep learning framework (PyTorch or Keras, say) to code up an attention model. Train this model on the bilingual corpus, and run it on the test corpus. Again, run BLEU to score the translation.
• Compare both the scores obtained. (A minimal attention layer is sketched after this list.)
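As a starting point for coding the attention model yourself, here is a minimal sketch of a Luong-style (general) attention layer in PyTorch. It only illustrates the attention computation, not a full encoder-decoder NMT system; the class name, shapes, and sizes are illustrative assumptions.

    # Sketch: Luong-style attention over encoder outputs, given the current
    # decoder hidden state. Not a complete NMT model.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Attention(nn.Module):
        def __init__(self, hidden_size):
            super().__init__()
            self.proj = nn.Linear(hidden_size, hidden_size, bias=False)

        def forward(self, decoder_state, encoder_outputs):
            # decoder_state:   (batch, hidden)          current decoder hidden state
            # encoder_outputs: (batch, src_len, hidden) all encoder hidden states
            scores = torch.bmm(self.proj(encoder_outputs),
                               decoder_state.unsqueeze(2)).squeeze(2)  # (batch, src_len)
            weights = F.softmax(scores, dim=1)                         # attention distribution
            context = torch.bmm(weights.unsqueeze(1), encoder_outputs).squeeze(1)
            return context, weights                                    # (batch, hidden), (batch, src_len)

    # Quick shape check with random tensors.
    attn = Attention(hidden_size=256)
    ctx, w = attn(torch.randn(4, 256), torch.randn(4, 10, 256))
    print(ctx.shape, w.shape)  # torch.Size([4, 256]) torch.Size([4, 10])

The context vector is then concatenated with the decoder state before predicting the next target word, which is the standard way such a layer is wired into a seq2seq decoder.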
(b) Steps Forward
• Implement the Neural System Combination for Machine Translation paper
(https://arxiv.org/pdf/1704.06393.pdf)
5 Data Resources
12. Domain: Natural Language Inference
Description: To understand semantic concepts like textual entailment and contradiction. The task is that of comparing two sentences and identifying the relationship between them.
Dataset: SNLI, SICK datasets
Track 1
Baseline: A Structured Self-attentive Sentence Embedding (the self-attention layer is sketched after this entry)
Baseline+: Further modifications of your own, or some suggestions:
- Augment word representations with character-level features
- Check out DCN
Track 2
Baseline: A Decomposable Attention Model for Natural Language Inference
Baseline+: Add intra-sentence attention, as referenced in the paper
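For the Track 1 baseline, here is a minimal PyTorch sketch of the self-attention layer from "A Structured Self-attentive Sentence Embedding" (Lin et al., 2017). It omits the BiLSTM encoder, the classifier, and the penalization term from the paper; the class name, hop count, and sizes are illustrative assumptions.

    # Sketch: multi-hop self-attention that turns (Bi)LSTM outputs H into a
    # fixed-size sentence embedding matrix M = A^T H.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SelfAttentiveEmbedding(nn.Module):
        def __init__(self, hidden_size, d_a=64, n_hops=4):
            super().__init__()
            self.w_s1 = nn.Linear(hidden_size, d_a, bias=False)
            self.w_s2 = nn.Linear(d_a, n_hops, bias=False)

        def forward(self, H):
            # H: (batch, seq_len, hidden) outputs of a (Bi)LSTM encoder
            A = F.softmax(self.w_s2(torch.tanh(self.w_s1(H))), dim=1)  # (batch, seq_len, n_hops)
            M = torch.bmm(A.transpose(1, 2), H)                        # (batch, n_hops, hidden)
            return M, A

    # Quick shape check with dummy encoder outputs.
    layer = SelfAttentiveEmbedding(hidden_size=600)
    M, A = layer(torch.randn(8, 25, 600))
    print(M.shape)  # torch.Size([8, 4, 600])

For SNLI or SICK, one such embedding is computed per sentence and the pair of embeddings is fed to a classifier over entailment / contradiction / neutral.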