Fall 2024

CSIS 3400 Natural Language Processing


Project 01

Due date: November 25 23:59

NOTE:

The instructor may ask the project group or individual members questions about their actual work and
contribution after submission. The questions may be very specific (e.g., the lines of code written
by a member and an explanation of that code). This may be done by email or in person after
class without advance notice. Failing to answer the questions satisfactorily will result in a mark
deduction. Please ensure that all group members contribute roughly equally to the project,
especially the coding part, and fully understand the whole project.

Grouping:

You need to form groups of 4 to complete the project in this course. Sign up your project group on the
Wiki page in Blackboard.

One member of your group, the group leader/captain, is responsible for submitting the project.
Marks will be deducted if more than one member of your group submits the project, whether the
submissions are the same or different.

Project Description:

In this project, you will create an FAQBot using Elasticsearch and embeddings.

Project Submission Requirements

Proj02.ipynb (or a zip file if you have any extra files): In this Jupyter notebook, use separate
cells (code/markdown) to clearly indicate and explain every step. Your notebook should include
Markdown text marking each step with the correct heading, Python code, comments/analysis, and
visualizations as stated in the following instructions. Note: You need to create the appropriate Markdown
headings for each section mentioned below. Code should have short comments describing each
statement. Adding a Markdown cell with explanatory text before specific actions is appreciated.
(Note: Restart the kernel and re-run all cells of the notebook before submission. Substantial marks will be
deducted for cell errors.)


Files Required:

In Douglas College, we have a number of FAQs. For example:

• https://www.douglascollege.ca/student-services/student-resources/covid19/international-faqs
• https://www.douglascollege.ca/future-students/apply-douglas/domestic-students/admissions-faq
• https://www.douglascollege.ca/student-services/student-resources/covid19/general-faqs
• https://www.douglascollege.ca/current-students/important-dates-information/grading-faq
• https://www.douglascollege.ca/current-students/enrolment-services/fees-related-information/fees-faq
• https://www.douglascollege.ca/current-students/advising-services/advising-services/advising-services-faq
• https://www.douglascollege.ca/student-services/student-support/counselling/counselling-faq
• https://www.douglascollege.ca/student-services/student-resources/covid19/academic-faqs
• (and more)

Part A. Planning

1. Title, Name and References


Create the very first markdown cell in the notebook and include the full name and student ID of each
member of your group.
Create another markdown cell titled References. Add information about any references you used to
help complete the project.

2. Planning
Create another markdown cell. In this cell, discuss why a search engine is more suitable than chatbot
tools like DialogFlow for creating an FAQBot.

Examine the dataset carefully. Create another markdown cell. In this cell, describe your plan for
creating an FAQBot using Elasticsearch. Note that in an FAQBot, the user inputs a question to look for
the relevant answer.

Part B. Basic Model


In the following parts, you need to write Python code, with appropriate comments or markdown cells
to explain your work. Missing explanations will result in a mark deduction.

1. Library import and data loading


Import all the libraries required for Parts B and C.

2. Search engine building


Referring to the lecture notes and demo, build an FAQBot for Douglas College’s FAQs based on Elasticsearch.
Add explanations in Markdown cells whenever appropriate.
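
The keyword-search step could be sketched roughly as follows. This is only an illustration under stated assumptions, not the method from the lecture demo: it assumes the FAQ pairs are already collected into a list of dictionaries and that `client` is an `elasticsearch.Elasticsearch` instance from the 8.x Python client. The index name `faq`, the field names `question` and `answer`, and the helper function names are hypothetical.

```python
from typing import Any

INDEX_NAME = "faq"  # hypothetical index name, not given in the assignment


def build_match_query(user_question: str) -> dict[str, Any]:
    """Build a full-text query matching the user's question against stored FAQ questions."""
    return {"match": {"question": {"query": user_question}}}


def index_faqs(client: Any, faqs: list[dict[str, str]]) -> None:
    """Store each FAQ pair as one document, using its list position as the document ID."""
    for doc_id, faq in enumerate(faqs):
        client.index(index=INDEX_NAME, id=str(doc_id), document=faq)
    # Refresh so the new documents are visible to the next search.
    client.indices.refresh(index=INDEX_NAME)


def search_faqs(client: Any, user_question: str, k: int = 1) -> list[dict[str, Any]]:
    """Return the top-k FAQ documents together with their relevance scores."""
    response = client.search(
        index=INDEX_NAME, query=build_match_query(user_question), size=k
    )
    return [
        {"score": hit["_score"], **hit["_source"]}
        for hit in response["hits"]["hits"]
    ]
```

With a running cluster, a call such as `search_faqs(client, "How do I pay my fees?", k=3)` would return up to three question/answer pairs ranked by Elasticsearch's default text scoring. Matching only on the `question` field is a design choice of this sketch; matching on the answer text as well is an equally plausible plan.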

3. Testing and evaluation


i. Each member will do the testing and evaluation independently.
ii. Without looking at the dataset, each member independently decides on 5 different questions.
The questions must not be the same as any question in the dataset. Create a Markdown cell and
write down these 5 questions. There will be 20 questions if your group has four members.
iii. Write Python code to issue the questions in 3(ii).
iv. From the results, each member judges the relevance of the returned answers for his/her own
questions.
v. Precision@K is the number of relevant documents among the first K documents returned by the
search engine, divided by K. Calculate the precision@1 of each query and the average precision@1
over all queries. In other words, precision@1 checks whether the first returned document is
relevant or not.
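
The Precision@K definition above can be sketched in a few lines. This is a minimal illustration that assumes each query's relevance judgments are recorded as a list of booleans in ranked order; the function names and the sample judgments are hypothetical.

```python
def precision_at_k(relevance: list[bool], k: int) -> float:
    """Fraction of the first k returned documents that were judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Fewer than k returned documents still divides by k, per the definition.
    return sum(relevance[:k]) / k


def mean_precision_at_k(judgments: list[list[bool]], k: int) -> float:
    """Average of precision@k over all queries."""
    return sum(precision_at_k(r, k) for r in judgments) / len(judgments)


# Hypothetical judgments for three queries (True = relevant), in ranked order.
judgments = [[True, False], [False, True], [True, True]]
print(mean_precision_at_k(judgments, k=1))
```

For K = 1, each per-query value is either 0 or 1, so the average is simply the share of queries whose first returned answer was relevant.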

Part C. Comparison with embedding

1. Discussion
Create a Markdown cell and discuss the problems of keyword search when building an FAQBot using
Elasticsearch.

2. Using embedding
i. Study the tutorial https://www.jpmorgan.com/technology/technology-blog/faq-bot
ii. Redo B(2), but use the pre-trained model en_core_web_lg to find the embeddings of the
questions. Marks will be deducted if you use any other pre-trained model.
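
One way the embedding step might look is sketched below. It is an assumption-laden outline, not necessarily the tutorial's code: it assumes `nlp` is the pipeline returned by `spacy.load("en_core_web_lg")`, that question vectors are stored in an Elasticsearch `dense_vector` field, and that ranking uses a `script_score` query with `cosineSimilarity`. The field name `question_vector` and the helper names are hypothetical.

```python
from typing import Any

VECTOR_DIMS = 300  # en_core_web_lg ships 300-dimensional word vectors


def embed(nlp: Any, text: str) -> list[float]:
    """Return spaCy's document vector, which averages the token vectors of the text."""
    return [float(x) for x in nlp(text).vector]


def build_vector_mapping() -> dict[str, Any]:
    """Index mapping that stores each question's embedding next to the FAQ text."""
    return {
        "properties": {
            "question": {"type": "text"},
            "answer": {"type": "text"},
            "question_vector": {"type": "dense_vector", "dims": VECTOR_DIMS},
        }
    }


def build_cosine_query(query_vector: list[float]) -> dict[str, Any]:
    """Score every document by cosine similarity to the query embedding."""
    return {
        "script_score": {
            "query": {"match_all": {}},
            "script": {
                # Adding 1.0 keeps scores non-negative, which Elasticsearch requires.
                "source": "cosineSimilarity(params.query_vector, 'question_vector') + 1.0",
                "params": {"query_vector": query_vector},
            },
        }
    }
```

The mapping would be supplied when the index is created, each FAQ document would carry `embed(nlp, question)` in its `question_vector` field, and `build_cosine_query(embed(nlp, user_question))` would be passed as the search query in place of the keyword query from Part B.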

3. Testing and evaluation


i. Redo B(3) using the same five questions per member.

4. Conclusion
Create a Markdown cell. In this cell, compare the results in B and C and draw your conclusions.


Member Contribution

In addition to the proposal, each group needs to submit a peer evaluation matrix. Each cell should be
a number between 1 and 4 that reflects how one member rates the contribution of another member.
The evaluation is open to all members of your group (i.e., everyone can see how the others
graded them), so that each member knows how to enhance their contribution to the project.
(Hint: You may refer to this link to see how to create a table in a Jupyter Markdown cell.)

Evaluator / Evaluatee    Member 1    Member 2    Member 3    Member 4
Member 1
Member 2
Member 3
Member 4
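
If it helps, the matrix can also be generated as Markdown table text from Python and pasted into a Markdown cell. The sketch below is only one possible approach; the function name, the member names, and the scores are hypothetical placeholders.

```python
def peer_evaluation_table(members: list[str], scores: list[list[int]]) -> str:
    """Render the matrix as a Markdown table; scores[i][j] is member i's rating of member j."""
    header = "| Evaluator / Evaluatee | " + " | ".join(members) + " |"
    divider = "|---" * (len(members) + 1) + "|"
    rows = [
        "| " + evaluator + " | " + " | ".join(str(s) for s in row) + " |"
        for evaluator, row in zip(members, scores)
    ]
    return "\n".join([header, divider, *rows])


# Placeholder names and scores, for illustration only.
members = ["Member 1", "Member 2"]
print(peer_evaluation_table(members, [[4, 3], [3, 4]]))
```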

Here is the rubric on how to evaluate your team members:


1 Point: No or very little contribution to the project; cannot deliver artifacts or largely misses the
agreed deadlines; shows no or very little passion for development.
2 Points: Little contribution to the project, with no negative effect on the group; sometimes cannot
deliver artifacts or misses the agreed deadlines; mainly follows other members’ ideas and
instructions.
3 Points: Fairly large and positive contribution to the project; can handle most of the assigned
tasks and deliver artifacts on time.
4 Points: Large and positive contribution to the project; can help members tackle problems;
proactive and passionate about development.

D. Project Grading Criteria

The project will be graded on a scale of 20 points.

Criteria                                                            Grading
Project submitted to Blackboard, named properly, with all files
included in their corresponding folders                             1
Part   Detail
A      Planning for the analysis                                    2
B.2    Search engine building                                       4
B.3    Testing and evaluation                                       4
B      Overall description or explanation                           2
C.1    Discussion                                                   1
C.2    Using embedding                                              2
C.3    Testing and evaluation                                       2
C.4    Conclusion                                                   2
Total:                                                              20
