CW Sequence Analysis
COURSE 2023-2024
GENERAL INFORMATION
This coursework extends the knowledge you have gained from tutorials and lectures, focusing
on the design, training, and evaluation of Deep Learning (DL) models for
sequence-based tasks. By the end of this coursework, you should be able to demonstrate
proficiency in designing, training, and evaluating different models for a variety of
tasks that use sequences. It draws on concepts covered in the module,
including:
● Basic n-gram language models: bigrams, trigrams, etc., with smoothing methods
like Kneser-Ney smoothing.
● Neural Network Language Models (e.g., skip-grams).
● Recurrent Neural Networks (RNNs) encompassing Language Models and
Classifiers, such as LSTM and GRU.
● Sequence analysis using ConvNets and RNNs for tasks like video classification,
image tagging, and action classification.
● Implementation of dialogue systems (chatbots).
● Exploration of Transformers for applications like text generation and chatbots.
● LLMs (Large Language Models) and RAG (Retrieval-Augmented Generation).
By the end of this coursework, you will have honed your skills in applying DL concepts
to real-world problems. Your chosen problem will serve as a practical case study to
showcase your proficiency in utilizing PyTorch for model development and evaluation.
The ultimate goal is to present a well-crafted solution that reflects your understanding
and application of the learned concepts.
Read the whole document and ask about anything that is unclear as soon as possible. The
sections are as follows:
Each component is marked out of 100%, and you need at least 50% in each component to pass
the module. Otherwise, you will need to resit the component you failed.
You cannot give your oral presentation without having submitted (or passed, if resitting the OP
only) your written report.
Written Report (WR) submission deadline: Sunday, 5th May 2024, 17:00 GMT
a. You must submit your WR in PDF format online using the Submission Area:
Written Report in Moodle.
b. Caution: The system will not accept any submissions after the deadline. That is, if
you press the submit button at 17:00:01, your submission will not be accepted.
Oral Presentation (OP) date: Thursday, 16th May 2024 (the time will be scheduled)
c. Please submit your slides to the Submission Area Oral Presentation (Viva) in
Moodle.
d. Your specific slot will be announced later, and the timetable will be uploaded to
Moodle.
IMPORTANT:
In this coursework, it is imperative that the submitted code and the accompanying report go
hand in hand, forming a cohesive and comprehensive submission. The code should be
structured and documented in a manner that aligns seamlessly with the report's content.
Each section of the report should reference and explain the relevant portions of the code,
providing clarity on the implementation details, methodologies, and key findings. It is essential
that both components, the code and the report, complement each other to convey a complete
and coherent narrative of the undertaken project.
2. About the coursework
In your project, you will leverage the knowledge acquired from the module to tackle a diverse
range of challenges based on sequential data. This involves:
In addition to language and sequence-based tasks, consider expanding the scope of your
project to encompass domains such as trading data and weather, where sequential data plays a
crucial role. Here are examples that illustrate the application of deep learning in these domains:
● Time Series Forecasting for Trading Data:
○ Problem: Predicting stock prices or market trends based on historical trading
data.
○ Potential Solution: Employ recurrent neural networks (RNNs) or long short-term
memory networks (LSTMs) to capture temporal dependencies and patterns in
financial time series data.
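As a rough illustration of the potential solution above, the following is a minimal PyTorch sketch of one-step-ahead forecasting with an LSTM. The names `make_windows` and `LSTMForecaster`, the window length, the hidden size, and the synthetic sine series standing in for trading data are all assumptions made for this example, not part of the brief. A real project would also need a chronological train/validation split and proper evaluation.

```python
import torch
from torch import nn


def make_windows(series: torch.Tensor, window: int):
    """Slice a 1-D series into (input window, next value) training pairs."""
    xs = torch.stack([series[i:i + window] for i in range(len(series) - window)])
    ys = series[window:]
    # An LSTM with batch_first=True expects (batch, seq_len, features).
    return xs.unsqueeze(-1), ys.unsqueeze(-1)


class LSTMForecaster(nn.Module):
    """Hypothetical one-step-ahead forecaster: LSTM encoder plus linear head."""

    def __init__(self, hidden_size: int = 16):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)          # out: (batch, seq_len, hidden_size)
        return self.head(out[:, -1])   # predict from the last time step only


torch.manual_seed(0)
# Synthetic stand-in for a price series (assumption for illustration).
series = torch.sin(torch.linspace(0, 12.0, 200))
x, y = make_windows(series, window=20)

model = LSTMForecaster()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for epoch in range(5):  # a handful of steps, only to show the loop structure
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```

Predicting from the last hidden state keeps the sketch short; swapping `nn.LSTM` for `nn.GRU` would be a natural comparison experiment.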
You will work on exactly one problem, selected from one of the topics on this list. If
you have a different idea in mind, we can discuss it, but always be aware of the time you have to
implement it.
Before starting work on your project, make sure you have a quick chat with the module
leader about your idea and your implementation plan (i.e. design, training, and evaluation).
Your report does not need to include any state-of-the-art functionality; this is left for the
individual project. However, make sure you make your project your own. Use your own ideas.
Over the duration of the course, you will be shown best practices for structuring your code,
using GitHub, and using logging libraries.
Must:
● Only the PyTorch framework must be used for all implementations.
● Only Wandb (Weights & Biases) should be used for logging.
● Code should be committed periodically to GitHub.
Provide an overview of the problem you are addressing, specify the dataset employed, mention
any borrowed source code, and include a brief literature review (which can be integrated into
the Introduction text or given a separate section).
Explain the models used, with architecture figures for clarification. If you are using parts of
an existing architecture, do not forget to reference them and to make clear what your
contribution is. Include all relevant equations, such as loss functions, ensuring they are
presented, explained, correctly labeled/numbered, and appropriately referenced.
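As an illustration of a labeled, numbered, and referenced equation, the following sketch assumes the report is written in LaTeX and that the task is a classification problem trained with cross-entropy loss. The symbols ($N$ samples, $C$ classes, one-hot target $y_{i,c}$, predicted probability $\hat{y}_{i,c}$) and the label name `eq:cross-entropy` are choices made for this example only.

```latex
The model is trained by minimising the cross-entropy loss in
Equation~\ref{eq:cross-entropy}, where $N$ is the number of samples,
$C$ the number of classes, $y_{i,c}$ the one-hot target, and
$\hat{y}_{i,c}$ the predicted probability of class $c$ for sample $i$.

\begin{equation}
  \mathcal{L}_{\mathrm{CE}}
    = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c} \log \hat{y}_{i,c}
  \label{eq:cross-entropy}
\end{equation}
```

Every symbol is defined in the sentence that cites the equation, so the reader never has to search for its meaning.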
Present the results with close attention to detail, using tables (such as
the confusion matrix for classification problems) and plots. You must submit the code that
generates these figures, together with the Wandb logs, for full
transparency.
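For a classification problem, a confusion matrix can be built from the true and predicted labels in a few lines. The sketch below uses only the Python standard library and made-up labels and predictions; the function name `confusion_matrix` and the row/column convention (rows are true classes, columns are predicted classes) are assumptions for this example, so state the convention you use in your own table caption.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Return a nested list: rows = true class, columns = predicted class."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Made-up data, for illustration only.
labels = ["neg", "pos"]
y_true = ["pos", "neg", "pos", "pos", "neg", "neg"]
y_pred = ["pos", "neg", "neg", "pos", "pos", "neg"]

matrix = confusion_matrix(y_true, y_pred, labels)

# Print a small labeled table that can be checked against the report.
print("true\\pred".ljust(10) + "".join(name.ljust(6) for name in labels))
for name, row in zip(labels, matrix):
    print(name.ljust(10) + "".join(str(n).ljust(6) for n in row))
```

A quick sanity check is that the entries of the matrix sum to the number of evaluated samples.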
All figures, tables, and accuracy metrics must follow a precise labeling
and numbering scheme. Each one should be referenced in the text, and the reason
for including it must be clearly explained, so that the results remain sound and
interpretable.
If the results cover multiple experiments, accompany them with
comments that provide a comparative analysis, highlighting both the commonalities and the
differences between trials. This gives a more nuanced understanding of the
outcomes and supports a thorough evaluation of the experiments conducted.
Sum up the framework you've worked on in the earlier sections, tying it back to the Introduction.
Capture the main takeaways concisely and clearly. This part should showcase your
understanding of the topic.
5. Reflections (15 marks): Reflect on the learning outcomes of the coursework, detailing
encountered challenges, deviations from the initial plan, and insights gained. Discuss what you
would have done differently.
For pairs working on the coursework, each student should choose one of the individual
components (Introduction or Conclusions) to work on independently. These components will be
assessed separately, resulting in different final marks. Equal contribution is expected for all
other sections.
Time:
● Plan effectively for timely submission.
● Avoid overly ambitious or overly simplistic projects.
● Be prepared for the possibility of unforeseen challenges during implementation or
evaluation.
Results:
● Aim for meaningful results; surpassing benchmarks is not mandatory.
● Avoid meaningless or impossible outcomes, such as a 1% overall accuracy or an F1 score of 125.
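An F1 score is the harmonic mean of precision and recall, so it always lies in [0, 1] (or 0 to 100 when quoted as a percentage); a value of 125 points to a bug in the evaluation code. The following sketch, with hypothetical helper names `f1_score` and `check_metric` and made-up counts, shows one way to catch such values before they reach the report.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Binary F1 from counts; returns 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def check_metric(name: str, value: float) -> float:
    """Raise early if a reported metric falls outside [0, 1]."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name}={value} is outside [0, 1]; check the evaluation code")
    return value


# Made-up counts, for illustration only.
score = check_metric("f1", f1_score(tp=8, fp=2, fn=4))
print(f"F1 = {score:.3f}")
```

Running a check like this at the end of every evaluation loop costs nothing and catches scaling or aggregation mistakes early.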
Dataset:
● Optimize time by using complete, pre-labeled datasets.
● Consider benchmark datasets or open-source alternatives discussed in the
course.
● Report (PDF): see the Written Report section above for details of the content.
● Code deliverables: see the Implementation and deliverables section above for a
detailed outline of the required files.
● Presentation: this will be online; you must be present during your assigned slot on
the given conference link.
Extenuating Circumstances:
If unforeseen medical or personal circumstances prevent you from submitting your coursework
on time, promptly contact the Programmes Office. Fill out an Extenuating Circumstances form
and provide strong, genuine evidence, such as medical certificates or legal statements, to
support your case.
Plagiarism:
Copying the work of others, whether from another team or a third party, with or without
permission, will result in zero marks, and disciplinary action will be taken. The same
consequences apply if you allow others to copy your work. Refer to the addendum to the
guidelines or consult the module leader for additional clarification.
Feedback:
In the labs we can check your progress and give formative feedback.
Evaluative feedback and marks on your coursework will be released after the presentations.