MASTER MCQ’S USING TEXTUAL DATA
A project report submitted in partial fulfillment of the
requirements for the award of the degree of

Bachelor of Technology
in
Computer Science and Engineering

by

M. Deevena (S170747)
V. Anjani Devi (S171018)
O. Devika Tejaswi (S170402)

Under the Supervision of

Ms. J. Vishnu Priyanka, Assistant Professor

Department of Computer Science and Engineering


Rajiv Gandhi University of Knowledge Technologies,
Srikakulam

RAJIV GANDHI UNIVERSITY OF KNOWLEDGE TECHNOLOGIES
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
(A.P. Government Act 18 of 2008)
RGUKT-Srikakulam, Srikakulam Dist – 532410
Tele Fax: 08656 – 235557/235150

CERTIFICATE OF COMPLETION

This is to certify that the work entitled “Master MCQ’s using Textual data” is the
bona fide work of M. Deevena (S170747), V. Anjani Devi (S171018), and O. Devika
Tejaswi (S170402), carried out under the guidance and supervision of Ms. J. Vishnu
Priyanka as the final-year project of the Bachelor of Technology programme in the
Department of Computer Science and Engineering at Rajiv Gandhi University of Knowledge
Technologies (RGUKT), Srikakulam. This work was completed during the academic
session December 2022 – April 2023 under my guidance.

-------------------------
Ms. J. Vishnu Priyanka
Assistant Professor
Department of CSE
RGUKT - Srikakulam

----------------------------
Ms. M. Roopa
Head of the Department
Department of CSE
RGUKT - Srikakulam


CERTIFICATE OF EXAMINATION

This is to certify that we have examined the work entitled “Master MCQ’s using Textual
data”, the bona fide work of M. Deevena (S170747), V. Anjani Devi (S171018), and
O. Devika Tejaswi (S170402), and hereby accord our approval of it as a study carried out
and presented in the manner required for its acceptance in partial fulfilment of the
requirements for the award of the degree of Bachelor of Technology, for which it has been
submitted. This approval does not necessarily endorse or accept every statement made,
opinion expressed, or conclusion drawn, as recorded in this thesis. It only signifies the
acceptance of this thesis for the purpose for which it has been submitted.

-------------------------
Ms. J. Vishnu Priyanka
Assistant Professor
Department of CSE
RGUKT - Srikakulam

--------------------------
Examiner
Assistant Professor
Department of CSE
RGUKT - Srikakulam


DECLARATION

We, M. Deevena (S170747), V. Anjani Devi (S171018), and O. Devika Tejaswi
(S170402), hereby declare that the project report entitled “Master MCQ’s using
Textual data” is our original work and has been completed under the guidance of
Ms. J. Vishnu Priyanka in partial fulfilment of the requirements for the award of the
degree of Bachelor of Technology in Computer Science and Engineering during the
academic session December 2022 – April 2023 at RGUKT-Srikakulam.

We also declare that this project is the result of our own effort and has not been
copied or imitated from any source. Citations from websites are listed in the
references. Furthermore, the results presented in this project report have not been
submitted to any other university or institute for the award of any degree or diploma.

Date: ____________

Place: ____________

M.Deevena(S170747)

V.Anjani Devi(S171018)

O.Devika Tejaswi(S170402)

ACKNOWLEDGEMENT

We express our sincere appreciation and utmost respect to our team guide, Ms.
J. Vishnu Priyanka, for her exceptional guidance, monitoring, and constant motivation
throughout this semester. The time spent with her during the course of this project has
been invaluable, and we will always treasure the knowledge gained in the field of Web
Development and Natural Language Processing (NLP).
We are grateful for the confidence bestowed upon us and for entrusting our
project entitled "Master MCQ's using Textual data" to our team.
We extend our gratitude to Ms. M. Roopa (HOD of CSE) and other faculty
members for serving as a source of inspiration and constant encouragement that assisted
us in successfully completing the project.
Finally, we express our heartfelt appreciation to our parents for their blessings
and to our friends for their assistance and good wishes towards the successful
completion of this project.

M.Deevena(S170747)

V.Anjani Devi(S171018)

O.Devika Tejaswi(S170402)

ABSTRACT

The main aim of our project, 'Master MCQ’s using textual data', is to develop a software
tool that uses NLP techniques to automatically generate multiple-choice questions
from textual data in various formats, including PDFs, DOCs, TXTs, images, PPTs, and
URLs. The tool extracts important keywords and phrases from the input data and uses
them to generate questions with relevant distractors. In addition, the project includes a
voice assistance feature that helps blind individuals take exams more easily: the
system reads the generated questions aloud, accepts the user's chosen option for each
question, and marks the answers accordingly. The project features a web-based interface
that is easy to use and accessible to individuals with different levels of technical skill.
It is a valuable resource for educators, professionals, and individuals with visual
impairments who need to create or take multiple-choice exams, with potential
applications in education, training, and accessibility.

Keywords: NLP techniques, Automatic MCQ generation, Distractor generation

TABLE OF CONTENTS

1. INTRODUCTION
1.1 Problem Definition
1.2 Motivation of the Project
1.3 Limitations of the Project
1.4 Existing System
1.5 Proposed System
1.6 Scope of the Project
1.6.1 User-Friendly Interface
1.6.2 Efficient Time Management
2. LITERATURE SURVEY
2.1 Automatic Generation of Multiple Choice Questions Using Wikipedia
2.2 An automated multiple choice question generation using Natural Language Processing Techniques
2.3 Automated MCQ Generator using Natural Language Processing
3. REQUIREMENT SPECIFICATION
3.1 Functional Requirements
3.2 Non-Functional Requirements
3.3 System Requirements
3.3.1 Hardware Requirements
3.3.2 Software Requirements
4. SYSTEM DESIGN
4.1 Introduction
4.2 Class Diagram
4.3 Use case Diagram
4.4 Sequence Diagram
4.5 Activity Diagram
5. WORKING PROCESS
5.1 Components of Quiz Craft
5.2 Features
5.3 Working procedure
6. RESULTS AND OUTPUT SCREENS
6.1 Registration Page
6.2 Login Page
6.3 Home Page
6.4 Insert Data Page
6.5 Generated Questions Page
6.6 Admin Dashboard Page
6.7 Manual Questions Page
7. TESTING AND VALIDATION
7.1 Introduction
7.2 Types of Testing
7.2.1 Unit Testing
7.2.2 Black Box Testing
7.2.3 White Box Testing
7.2.4 Integration Testing
7.3 Validation
8. CONCLUSION
9. REFERENCES
9.1 Automatic Generation of Multiple Choice Questions Using Wikipedia
9.2 An automated multiple choice question generation using Natural Language Processing Techniques
9.3 Automated MCQ Generator using Natural Language Processing
APPENDIX

1. INTRODUCTION

1.1 Problem Definition


Creating multiple-choice questions (MCQs) traditionally requires a significant amount
of time and effort, particularly for educators and professionals who need to generate a
large number of them. In addition, individuals with visual impairments face difficulties
in accessing and answering MCQs. There is therefore a need for an automated system
that can generate MCQs from textual data and provide accessibility features to help
visually impaired individuals take exams. This project addresses these issues by
developing a software tool that uses NLP techniques to automatically generate MCQs
from textual data in various formats, together with a voice assistance feature that
allows blind individuals to easily access and answer the generated questions.

1.2 Motivation of the Project

The idea to implement the "Master MCQ's using Textual Data" web-based
software tool was based on our personal experiences with the time-consuming process
of manually generating multiple-choice questions (MCQs) and the lack of accessibility
options for individuals with visual impairments. We observed that existing methods for
generating MCQs require significant manual effort, which may not be feasible for
creating a large number of questions. Additionally, individuals with visual impairments
face challenges in accessing and answering MCQs. As technical students, we aimed to
address these issues through a technical solution. After successful analysis and
brainstorming discussions, we proposed our idea, which we believe can provide a partial
solution to these major issues. Our system utilizes NLP techniques to automatically
generate MCQs from various formats of textual data and includes a voice assistance
feature for blind individuals to easily access and answer the generated questions. By
developing this system, we aim to contribute towards a more efficient and accessible
learning environment, and we hope that our software tool can help to improve the
educational experience for all.

1.3 Limitations of the Project

Knowledge and learning have no limits. However, we have restricted ourselves to
developing only the features needed to satisfy the requirements and solve the basic
problems that we faced.
The possible limitations of the project include:

• Accuracy of generated questions: Since the quality of the questions relies heavily
on the input data provided, the accuracy of MCQ generation may vary.
• Compatibility with input data: Although the software tool supports various input
formats, it may not work as expected with input data that has complex
formatting or structure.
• Limited language support: As the NLP models are trained on specific
languages, they may not work as effectively on other languages.
• Voice assistance limitations: The voice assistance feature may not be usable by
individuals with hearing impairments or by those who prefer a different mode of
interaction.
• Hardware limitations: Since the software tool requires a certain level of
hardware to run smoothly, low-end devices may experience performance issues.

1.4 Existing System

Under the current system, creating multiple-choice questions can be a time-consuming
and challenging process, particularly when dealing with large amounts of textual data.
Educators and professionals often rely on manual methods for generating questions,
which can be tedious and error-prone. Identifying suitable distractors is also challenging
and typically requires human involvement. Additionally, individuals with visual
impairments face significant challenges in accessing and interacting with traditional
exam materials. Although tools for generating multiple-choice questions exist, they may
not be comprehensive or user-friendly enough to meet the needs of all users. There is
therefore a need for a more efficient, accessible, and effective system for generating
and taking multiple-choice exams from textual data.

1.5 Proposed System

Our proposed system, "Master MCQ's using textual data," uses NLP techniques to
generate correct and relevant questions from textual data more efficiently than manual
authoring. We use the Text-to-Text Transfer Transformer (T5) model to summarize the
text and then generate MCQs through sentence mapping, which greatly reduces the time
and effort required to write questions manually. For each question, we use WordNet (a
lexical database) or Sense2Vec (a word-vector model) to generate distractors. By
combining T5 with these resources, the system aims to produce questions that are
accurate and relevant to the given lesson material. We developed a web-based
application with a user-friendly interface for effortless access. The application is
intended for educators and for visually impaired individuals who need assistance in
generating or taking multiple-choice tests. Overall, the proposed system aims to
improve the speed, accuracy, and accessibility of generating and taking multiple-choice
exams from textual data.
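The sentence-mapping step that follows summarization can be illustrated with a small standard-library sketch. It assumes that the T5 summary and the extracted keywords are already available as plain strings (the model call itself is not shown); `split_sentences`, `map_sentences`, and `make_stem` are hypothetical helper names, and the regex sentence splitter is a simplification of a real tokenizer.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ".", "!" or "?" followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def map_sentences(keywords: list[str], sentences: list[str]) -> dict[str, list[str]]:
    # Map each keyword to the summary sentences that mention it
    # (whole-word, case-insensitive). Keywords with no match are dropped.
    mapping: dict[str, list[str]] = {}
    for kw in keywords:
        pattern = re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        hits = [s for s in sentences if pattern.search(s)]
        if hits:
            mapping[kw] = hits
    return mapping


def make_stem(keyword: str, sentence: str, blank: str = "_____") -> str:
    # Blank out the first occurrence of the keyword to form the question stem;
    # the keyword itself becomes the correct answer.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return pattern.sub(blank, sentence, count=1)
```

For example, mapping the keyword "chlorophyll" onto the summary "Photosynthesis takes place in the chloroplast. The chloroplast contains chlorophyll." yields the stem "The chloroplast contains _____." with "chlorophyll" as the key.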

1.6 Scope of the Project

Our project has a broad scope and can serve a diverse audience facing challenges
such as limited resources and tight deadlines. Our software tool is designed to
automatically generate multiple-choice questions from various forms of textual data,
including PDFs, DOCs, TXTs, images, PPTs, and URLs. Additionally, our system
includes a voice assistance feature that aids visually impaired individuals in taking and
writing exams. The tool is web-based and provides a user-friendly interface that can be
accessed by individuals with varying levels of technical expertise. With potential
applications in education and training, our tool can benefit educators, trainers, and
anyone in need of generating or taking multiple-choice exams from textual data.
Overall, our project aims to provide an efficient and accessible solution to the challenges
of generating and taking multiple-choice exams from textual data, with the goal of
improving accessibility and enhancing the learning experience.

The proposed project focuses on the following aspects:

1.6.1 User-Friendly Interface

A user-friendly interface is available for users to interact with the system and
create their accounts. Input provided by users through the graphical interface is
stored in the database for future use.
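As a minimal sketch of how user input and generated questions might be persisted, the snippet below uses Python's built-in `sqlite3` module. The report does not name its database engine, so SQLite, the `questions` table, and its columns are assumptions made only for illustration.

```python
import json
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical table holding one generated MCQ per row;
    # the options list is stored as a JSON-encoded string.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS questions (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               username TEXT NOT NULL,
               stem TEXT NOT NULL,
               answer TEXT NOT NULL,
               options TEXT NOT NULL
           )"""
    )


def save_question(conn: sqlite3.Connection, username: str, stem: str,
                  answer: str, options: list[str]) -> int:
    # Parameterised query: user text is never spliced into the SQL string.
    cur = conn.execute(
        "INSERT INTO questions (username, stem, answer, options) VALUES (?, ?, ?, ?)",
        (username, stem, answer, json.dumps(options)),
    )
    conn.commit()
    return cur.lastrowid


def load_questions(conn: sqlite3.Connection, username: str) -> list[tuple[str, str, list[str]]]:
    # Return the user's saved questions in insertion order.
    rows = conn.execute(
        "SELECT stem, answer, options FROM questions WHERE username = ? ORDER BY id",
        (username,),
    ).fetchall()
    return [(stem, answer, json.loads(opts)) for stem, answer, opts in rows]
```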

1.6.2 Efficient Time Management

Efficiency in generating multiple-choice questions is crucial for the system,
which serves a diverse group of users, including educators. To achieve this, the
system concentrates on the MCQ generation feature alone, reducing the server's
workload and enabling prompt generation of questions.

2. LITERATURE SURVEY

The following reference papers have been examined to identify the strengths and
weaknesses of existing systems for automatic MCQ generation. Through our analysis,
we have identified several limitations and potential areas for improvement in these
systems, which have informed the development of our own project. The papers we
reviewed are listed below, each with a brief summary of its findings:

2.1 Automatic Generation of Multiple Choice Questions Using Wikipedia

• In the domain of sports, the authors developed a three-part format for
multiple-choice questions (MCQs): the stem, which forms the question itself; the
target word, which is the correct answer; and the distractors, which are the
incorrect answers. Using existing questions in this domain, the authors aimed to
identify sentences suitable for MCQs.
• The authors employed a combination of parsing techniques and named entity
recognition (NER) systems to identify the correct answer for the MCQ. In
addition, they extracted further attribute values of the correct answer from the
internet and searched Wikipedia for entities sharing similar attribute values.
• To generate distractors, the authors retrieved relevant data from structured
sources, such as the infobox displayed on the right-hand side of a Wikipedia
page or the opening sentence of the article. They then searched Wikipedia for
related candidates from the same category, or selected distractors at random
from the pool of candidates.

Overall, the approach demonstrates the potential for automated generation of MCQs
using Wikipedia. However, randomly choosing distractors may lead to lower-quality
MCQs, and not every topic has a corresponding Wikipedia page, so educators may not
be able to extract MCQs for their specific needs using this method.
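The stem / target word / distractor structure, together with the "same category first, otherwise random" distractor selection described above, can be sketched as follows. The `category_of` dictionary is a hypothetical stand-in for Wikipedia category data (the surveyed work drew candidates from Wikipedia itself), and `MCQ` and `pick_distractors` are illustrative names, not the authors' code.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class MCQ:
    stem: str               # the question text
    key: str                # the target word, i.e. the correct answer
    distractors: list[str]  # the incorrect options

    def options(self, rng: random.Random) -> list[str]:
        # Present the key mixed in among the distractors.
        opts = [self.key, *self.distractors]
        rng.shuffle(opts)
        return opts


def pick_distractors(answer: str, category_of: dict[str, str], k: int = 3,
                     rng: Optional[random.Random] = None) -> list[str]:
    # Prefer candidates from the answer's own category, then fall back
    # to randomly chosen candidates from other categories.
    rng = rng or random.Random()
    category = category_of.get(answer)
    same = [c for c, cat in category_of.items() if cat == category and c != answer]
    others = [c for c, cat in category_of.items() if cat != category and c != answer]
    rng.shuffle(same)
    rng.shuffle(others)
    return (same + others)[:k]
```

When the answer's category has too few members, the fallback fills the remaining slots at random, which is exactly where the lower-quality distractors noted above can come from.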

2.2 An automated multiple choice question generation using Natural
Language Processing Techniques

The authors developed a system that automatically generates multiple choice questions
(MCQs) from lesson materials using Term Frequency-Inverse Document Frequency
(TF-IDF) and N-grams. The system's effectiveness was assessed by comparing keywords
extracted manually by a teacher from five lesson materials with those generated
automatically by the system. The number of MCQs generated for each document was
found to be proportional to the number of extracted keywords. For each question, the
system picked three other keywords at random from the extracted pool to serve as
distractors. A drawback of this method is that these distractors, being arbitrary extracted
keywords, may be unrelated to the correct answer, and the same options may repeat
across questions.
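TF-IDF scoring over unigrams and bigrams, the core of the approach above, can be sketched with the standard library alone. The weighting below is the textbook tf × log(N/df) form, which may differ from the exact variant the authors used; an n-gram that occurs in every document (such as "the") receives weight zero, so the sketch needs more than one document to produce meaningful scores.

```python
import math
import re
from collections import Counter


def ngrams(text: str, n_max: int = 2) -> list[str]:
    # Lowercase word tokens, then all 1..n_max-grams joined by single spaces.
    words = re.findall(r"[a-z]+", text.lower())
    grams: list[str] = []
    for n in range(1, n_max + 1):
        grams += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return grams


def tfidf_keywords(docs: list[str], top_k: int = 5, n_max: int = 2) -> list[list[str]]:
    # Score every n-gram of every document by tf * idf and keep the top_k per document.
    counts = [Counter(ngrams(d, n_max)) for d in docs]
    df: Counter = Counter()
    for c in counts:
        df.update(c.keys())  # document frequency: in how many docs each n-gram occurs
    n_docs = len(docs)
    results = []
    for c in counts:
        total = sum(c.values())
        scores = {g: (tf / total) * math.log(n_docs / df[g]) for g, tf in c.items()}
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda g: (-scores[g], g))
        results.append(ranked[:top_k])
    return results
```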

2.3 Automated MCQ Generator using Natural Language Processing

The authors of this paper proposed an automated system that uses natural language
processing techniques to generate multiple choice questions (MCQs). The system
comprises three primary components: text summarization, keyword extraction, and
distractor generation.
• For text summarization, the authors use the BERT model, a state-of-the-art
method for natural language processing tasks. The system first identifies the
most important sentences in a given text and then generates a summary of the
key points.
• For keyword extraction, the authors use two different methods: the Python
Keyword Extractor (PKE) and the Rapid Automatic Keyword Extraction
(RAKE) library. These methods identify the most relevant words and phrases in
the text, which are then used to generate the MCQs.
• For distractor generation, WordNet, a lexical database for English, is used to
identify related words and concepts that are similar to the correct answer but
not quite right.
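RAKE's scoring, as it is usually described, splits the text into candidate phrases at stopwords and punctuation, scores each word as degree / frequency (a word's degree sums the lengths of the phrases it appears in), and scores a phrase as the sum of its word scores. The sketch below implements that idea directly; it is not the API of PKE or of any RAKE package, and the tiny `STOPWORDS` set is a placeholder for a full stoplist.

```python
import re
from collections import defaultdict

# Placeholder stoplist; real RAKE implementations use a much longer list.
STOPWORDS = {"a", "an", "and", "are", "as", "by", "for", "in", "is",
             "of", "on", "the", "to", "which", "with"}


def rake(text: str, stopwords: set[str] = STOPWORDS, top_k: int = 3) -> list[str]:
    # 1. Split at punctuation, then cut each fragment into phrases at stopwords.
    phrases: list[list[str]] = []
    for fragment in re.split(r"[.,;:!?()]", text.lower()):
        current: list[str] = []
        for word in re.findall(r"[a-z0-9]+", fragment):
            if word in stopwords:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    # 2. Word score = degree(w) / frequency(w).
    freq: dict[str, int] = defaultdict(int)
    degree: dict[str, int] = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    # 3. Phrase score = sum of its word scores; unique phrases, best first.
    scored = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scored, key=lambda p: (-scored[p], p))[:top_k]
```

The degree/frequency ratio favours words that occur inside longer phrases, which is why RAKE tends to rank multi-word phrases such as "automatic question generation" above shorter ones.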

Some limitations were identified during the research. Specifically, the fixed-length input
and output of BERT can restrict its effectiveness for tasks such as summarization, where
input and output lengths can vary significantly. Additionally, WordNet, which was last
updated in 2012, has a limited vocabulary, which results in poor distractors or even the
inability to find suitable distractors at all. In our project, we have considered using PKE
for keyword extraction, as it is capable of identifying important keywords.
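The usual WordNet distractor strategy, and the vocabulary limitation described above, can be illustrated with a small sketch: go up from the answer to its hypernym (parent concept), then take the hypernym's other children (co-hyponyms) as distractors. The `HYPERNYM` dictionary is a hypothetical miniature stand-in so the example runs without downloading WordNet data; with the real WordNet (for example through NLTK's `wordnet` corpus reader) the same walk goes from a synset to its `hypernyms()` and then to their `hyponyms()`.

```python
# Hypothetical miniature lexicon standing in for WordNet: word -> hypernym.
HYPERNYM = {
    "lion": "big cat",
    "tiger": "big cat",
    "leopard": "big cat",
    "jaguar": "big cat",
    "big cat": "feline",
}


def co_hyponym_distractors(answer: str, hypernym: dict[str, str] = HYPERNYM,
                           k: int = 3) -> list[str]:
    # Step up to the answer's parent concept.
    parent = hypernym.get(answer)
    if parent is None:
        # Word missing from the lexicon: no distractors can be produced,
        # which is the limited-vocabulary problem noted above.
        return []
    # Siblings under the same parent make plausible but wrong options.
    siblings = sorted(w for w, p in hypernym.items() if p == parent and w != answer)
    return siblings[:k]
```

A newer term such as "chatbot" that the lexicon does not contain yields an empty list, and a concept with no siblings (here "big cat") does too.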

3. REQUIREMENT SPECIFICATION

Requirements determination involves the analyst acquiring knowledge about the
organization and using it to choose the appropriate technology for a specific
application. The Software Requirements Specification (SRS) provides a comprehensive
description of the behaviour of the system to be developed. It contains a set of use cases
that describe all of the interactions users will have with the software; these use cases are
also referred to as functional requirements. In addition to functional requirements, the
SRS includes non-functional requirements, which are constraints that the design or
implementation must satisfy (such as performance engineering requirements, quality
standards, or design constraints).

3.1 Functional Requirements

In software engineering, a functional requirement defines a function of a software
system or component in terms of its inputs, behavior, and outputs. Functional
requirements may include calculations, technical details, data manipulation and
processing, and other specific functionality that defines what the system is supposed to
accomplish. Use cases capture behavioral requirements by describing the scenarios in
which the system exercises its functional requirements. Non-functional requirements,
also called quality requirements, complement functional requirements by imposing
constraints on the design or implementation, such as performance, security, or
reliability. The system design describes in detail how the system implements its
functional requirements. A requirements analyst may write use cases after gathering and
validating a set of functional requirements, with each use case illustrating a behavioral
scenario that involves one or more of them. Alternatively, the analyst may begin
requirements elicitation by identifying a set of use cases and then derive from them the
functional requirements needed to carry out each use case.
To satisfy the functional requirements of this project, every user must register (sign up)
as one of two user types: Admin or Student. An admin must provide their name,
address, phone number, and email ID, while a student must provide their name, ID
number, address, phone number, and email ID. To log in, users enter their username and
password, after which they can access only the software features they are authorized to
use.

Functional Requirements for this project

• Register: To begin using the system, the user must complete the
registration (sign-up) process, which offers two distinct user types:
    ◦ Admin: The admin must provide details such as name, address, phone
    number, and email ID.
    ◦ Student: The student must provide his/her name, ID number, address,
    phone number, and email ID.
• Login:
    ◦ Input: Enter the username and password.
    ◦ Output: The user can use the features of the software that they are
    authorized to access.
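The register / login / authorization flow above can be sketched with the standard library. This is a minimal sketch, not the project's Flask code: the in-memory `USERS` store, the field names, and the `PERMISSIONS` feature map are hypothetical, and PBKDF2 password hashing is an assumed choice that the report does not specify.

```python
import hashlib
import hmac
import os
from typing import Optional

# Required registration fields per role, following the functional requirements.
REQUIRED_FIELDS = {
    "admin": {"name", "address", "phone", "email"},
    "student": {"name", "id_number", "address", "phone", "email"},
}

# Hypothetical mapping of roles to the features they may use.
PERMISSIONS = {
    "admin": {"generate_questions", "manual_questions", "dashboard"},
    "student": {"take_quiz"},
}

USERS: dict[str, dict] = {}  # in-memory stand-in for the user table


def _hash(password: str, salt: bytes) -> bytes:
    # PBKDF2-HMAC-SHA256, so only a salted hash is ever stored.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def register(username: str, password: str, role: str, details: dict) -> None:
    if role not in REQUIRED_FIELDS:
        raise ValueError(f"unknown role: {role}")
    missing = REQUIRED_FIELDS[role] - set(details)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if username in USERS:
        raise ValueError("username already taken")
    salt = os.urandom(16)
    USERS[username] = {**details, "role": role, "salt": salt, "hash": _hash(password, salt)}


def login(username: str, password: str) -> Optional[str]:
    # Return the user's role on success, or None if the credentials are wrong.
    user = USERS.get(username)
    if user and hmac.compare_digest(user["hash"], _hash(password, user["salt"])):
        return user["role"]
    return None


def can_access(role: Optional[str], feature: str) -> bool:
    # Users only see the features their role is authorized to use.
    return feature in PERMISSIONS.get(role, set())
```

`hmac.compare_digest` is used for the hash comparison so that the check takes the same time whether or not the password is close to correct.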

3.2 Non-Functional Requirements

Non-functional requirements are requirements in systems engineering and requirements
engineering that specify criteria for judging how a system operates, unlike functional
requirements, which define specific behaviors or functions. In other words,
non-functional requirements specify how a system should operate, whereas functional
requirements specify what a system should do. They are also commonly referred to as
"constraints," "quality attributes," "quality goals," "quality of service requirements," and
"non-behavioral requirements." Non-functional requirements fall into two categories:
execution qualities (e.g., security and usability), which can be observed at runtime, and
evolution qualities (e.g., maintainability, testability, extensibility, and scalability), which
are embodied in the static structure of the software system.

• Usability Requirement

The system must be accessible through a website on a phone, PC, or tablet, and users
should be able to navigate the system easily without any special training. The system
must be user-friendly.

• Availability Requirement

The website must be available to users 24 hours a day, 365 days a year, with no
downtime.

• Efficiency Requirement

The Mean Time to Repair (MTTR) should be no more than one hour, and the system
must recover quickly from any failure.
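The MTTR target can be related to the availability requirement through the standard steady-state availability formula, where MTBF is the mean time between failures. The numbers below assume, purely for illustration, one failure every 30 days (MTBF = 720 h) together with the maximum allowed MTTR of one hour; the report itself does not state an MTBF.

```latex
A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}
  = \frac{720\ \mathrm{h}}{720\ \mathrm{h} + 1\ \mathrm{h}}
  \approx 0.9986,
\qquad
\text{downtime per year} \approx 8760\ \mathrm{h} \times (1 - A) \approx 12\ \mathrm{h}
```

Under these assumptions the site would be unavailable for roughly 12 hours a year, which shows that any non-zero repair time implies some downtime; the "no downtime" target is therefore best read as an ideal to approach rather than a guarantee.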

• Accuracy Requirement

The system must provide accurate information, taking concurrency issues into account.
The system should be reliable and provide 100% access reliability.

• Performance Requirement

The system must refresh information as it is updated and respond to user requests within
two seconds. Large processing jobs may take longer, but pages that only display
information must appear on the screen within five seconds.
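One way to check these response-time budgets during testing is to time each request handler. The decorator below is a minimal standard-library sketch; `latency_budget` and the `list_question_titles` handler are hypothetical names, and a real deployment would more likely rely on the web server's or a monitoring tool's timing.

```python
import logging
import time
from functools import wraps

log = logging.getLogger("performance")


def latency_budget(seconds: float):
    """Time each call and flag calls that exceed the given budget."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = func(*args, **kwargs)
            elapsed = time.perf_counter() - start
            # Expose the latest measurement for tests and dashboards.
            wrapper.last_elapsed = elapsed
            wrapper.within_budget = elapsed <= seconds
            if not wrapper.within_budget:
                log.warning("%s took %.2f s (budget %.2f s)", func.__name__, elapsed, seconds)
            return result
        wrapper.last_elapsed = None
        wrapper.within_budget = None
        return wrapper
    return decorator


@latency_budget(2.0)  # ordinary requests: two-second budget
def list_question_titles(ids):
    # Hypothetical stand-in for a cheap "view information" request.
    return [f"Question {i}" for i in ids]
```

Display-only pages would use `@latency_budget(5.0)`, matching the five-second limit above.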

• Security Requirement

The system must provide security by limiting what each type of user can access, and the
database must not be accessible to everyone.

• Reliability Requirement

The system must be highly reliable: if the server crashes, no data will be lost, because a
backup will be maintained.
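If the application's data lived in SQLite (an assumption; the report does not name its database), the backup mentioned above could be taken with the online backup method of Python's `sqlite3.Connection`. `backup_database` and `backup_filename` are hypothetical helper names for this sketch.

```python
import sqlite3
from datetime import datetime


def backup_database(src: sqlite3.Connection, dest: sqlite3.Connection) -> None:
    # SQLite's online backup API copies every page of src into dest,
    # and works even while src is being accessed.
    src.backup(dest)


def backup_filename(prefix: str, when: datetime) -> str:
    # Timestamped file name so that older backups are not overwritten.
    return f"{prefix}-{when:%Y%m%d-%H%M%S}.db"
```

In practice the destination would be a file created by a scheduled job, for example `sqlite3.connect(backup_filename("mcq", datetime.now()))`, ideally stored on a different disk or machine from the live database.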

3.3 System Requirements

Using computer software efficiently usually requires specific hardware components or
software resources to be present on the computer. These prerequisites are called system
requirements and serve as a useful guide for users.

3.3.1 Hardware Requirements

For operating systems and software applications, system requirements often include
physical computer resources, that is, hardware. A hardware requirements list is
often accompanied by a hardware compatibility list (HCL), particularly for operating
systems. The HCL lists the hardware devices that have been tested and found
compatible, and sometimes those found incompatible, with a particular operating
system or application. The hardware requirements for this project are listed below.

Hardware Requirements:

• Laptop/PC
• Processor: Intel Core i5 or higher
• RAM: 8 GB or higher
• Hard Disk Space: 50 GB or more
• Display: 14-inch or larger display with a resolution of 1920 x 1080
pixels or higher
• Graphics Card: Any modern graphics card with at least 2 GB of
dedicated memory
• Input Devices: Keyboard and mouse
• Operating System: Windows 10 or later
• Microphone
• Speakers or Headphones
• Sound Card

3.3.2 Software Requirements

To ensure optimal functioning of an application, software requirements entail


specifying the necessary software resources and prerequisites that must be installed on
a computer. Typically, these requirements are not included in the software installation
package and must be installed separately prior to software installation.

Software Requirements:

• Operating System: Windows 8 or above
• Frontend:
    ◦ HTML5
    ◦ CSS3
    ◦ Bootstrap 4
    ◦ JavaScript
• Web Framework: Flask
• IDE: VS Code (Visual Studio Code)
• Programming Language: Python
4. SYSTEM DESIGN

4.1 Introduction

In January 1997, the first version of the Unified Modelling Language (UML) was
released. It was the result of a collaboration between Grady Booch, James Rumbaugh,
and Ivar Jacobson, who combined the most effective aspects of their respective object-
oriented analysis and design techniques. UML's fundamental components are drawn
from the methods of Booch, OMT, and OOSE.

UML aims to achieve several objectives, including:


1. Applying object-oriented principles to model systems effectively.
2. Establishing a clear connection between conceptual and executable aspects.
3. Addressing the challenges of scalability in large and crucial systems.
4. Developing a modeling language that is user-friendly for both humans and
machines.

Basic Building Blocks of UML

UML comprises fundamental elements such as entities and connections, which


can be assembled in various ways according to specific principles to produce diverse
types of diagrams. UML provides nine categories of diagrams, each with a concise
definition. The succeeding sections of this document will concentrate on the first four
diagrams, which are the most common and referred to as the UML core diagrams.

Use case Diagram: It shows the system's use cases and the actors that interact with them.
Class Diagram: It describes the system's structure, which comprises classes,
associations, and other relationships.
Sequence Diagram: It visualizes object interaction through the time-ordered exchange of
messages.
Activity Diagram: It represents the program's flow from a defined starting point to a
finishing point.
State chart Diagram: It illustrates state machines, including states, transitions, events,
and activities.
Object Diagram: It depicts a snapshot of class instances (objects) and their relationships.
Collaboration Diagram: It emphasizes the structural organization of the objects that send
and receive messages.
Component Diagram: It depicts the system's static implementation view by showing
the organization and dependencies among components.
Deployment Diagram: It shows the configuration of runtime processing nodes and the
components that reside on them.

4.2 Class Diagram

A class diagram is a central part of system design. It shows the application's main objects and the relationships between them, and its classes correspond to the classes that will be programmed.

Each class is drawn as a box with three compartments:
1. The top compartment shows the class name, centered, in bold, and starting with a capital letter.
2. The middle compartment lists the class attributes, left-aligned, with names starting in lowercase.
3. The bottom compartment lists the operations (methods) the class can perform, left-aligned, with names starting in lowercase.

Identifying and grouping classes in a class diagram establishes the static relationships between objects, which guides the rest of the system design. During detailed modeling, the conceptual design can be refined further by dividing classes into subclasses.


Fig 3.1: Class Diagram
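To illustrate how the three compartments of a class box map onto code, the sketch below writes a hypothetical User class in Python. Its attributes are the registration details listed in Section 5.3 (ID number, name, email, phone number, password). The class name, the salted PBKDF2 password hashing, and the methods set_password, check_password, and update_profile are assumptions for illustration; they are not taken from Fig 3.1.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass, field


def _hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2-HMAC-SHA256 with a per-user random salt (assumed scheme)
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


@dataclass
class User:
    # Attributes (middle compartment): the registration details
    id_number: str
    name: str
    email: str
    phone: str
    salt: bytes = field(default_factory=lambda: os.urandom(16), repr=False)
    password_hash: bytes = field(default=b"", repr=False)

    # Operations (bottom compartment)
    def set_password(self, password: str) -> None:
        self.password_hash = _hash_password(password, self.salt)

    def check_password(self, password: str) -> bool:
        # Constant-time comparison of the stored and recomputed hashes
        return hmac.compare_digest(self.password_hash,
                                   _hash_password(password, self.salt))

    def update_profile(self, **changes: str) -> None:
        # Only the editable profile fields may change; the ID number stays fixed
        for attr in ("name", "email", "phone"):
            if attr in changes:
                setattr(self, attr, changes[attr])
```

The class name sits in the top compartment, the four data fields in the middle one, and the three methods in the bottom one.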

4.3 Use case Diagram
Use case diagrams are commonly used to analyse the high-level requirements of a system. These requirements are expressed as organized system functionalities known as use cases. Actors are the other key element: they are whatever interacts with the system, such as human users or internal or external applications. To draw a concise use case diagram, identify the functionalities to be represented, the actors, and the relationships between actors and use cases. Each use case should have a name that accurately reflects its functionality, actors should be named appropriately, and relationships and dependencies should be shown clearly. Not every type of relationship needs to appear, since the diagram's main purpose is to identify system requirements. Notes can be added to clarify important points.

Fig 3.2: Use Case Diagram

4.4 Sequence Diagram
A Sequence diagram is a type of interaction diagram that visualizes the order of
processes and their interactions. It presents the objects and classes involved in a scenario
and illustrates the messages exchanged between objects to execute the functionality.
The lifelines on a sequence diagram represent different objects or processes, while
horizontal arrows show the messages exchanged between them in chronological order.

Fig 3.3: Sequence Diagram

4.5 Activity Diagram

Activity diagrams represent the workflow of business processes or of individual class operations. Like flowcharts, they model the flow from one activity to the next. The elements used to build them include activities, decisions, start and end states, states, objects and object flows, swimlanes, synchronization bars, and transitions.

Fig 3.4: Activity Diagram

5. WORKING PROCESS

5.1 Components of Quiz Craft

The web application consists of the following components, designed for user convenience:

• Login and logout pages: Users log in with their credentials and log out when finished.
• Registration page: New users must register before they can use the website.
• User dashboard: Each user has a personalized dashboard that displays their generated MCQs and other relevant information.
• Admin dashboard: Administrators have a dashboard with additional features, such as adding or deleting questions and options, viewing statistics, and managing users.
• Profile pages: Users and administrators have separate profile pages for viewing and managing their account details.

5.2 Features

• Availability: The website is available around the clock and can be reached from anywhere in the world with an internet connection.
• Responsive Design: The layout adapts to different devices, such as desktop computers, laptops, tablets, and mobile phones.
• High Reliability: User data is stored in the server-side database, so it is not lost if the user's device loses power or its internet connection.
• Ease of Access: The website needs only a web browser and an internet connection; no additional software has to be installed.
• Browser Compatibility: The website works in any modern web browser, such as Google Chrome, Mozilla Firefox, or Microsoft Edge.
• Authentication and Authorization: Users must log in before they can use any feature, and administrative functions are available only to administrators.
• User-friendly Interface: The interface makes it simple to generate MCQs and to navigate the website.
• Multiple Input Options: Users can provide text, files, or URLs as the source for MCQ generation.

5.3 Working procedure

The working procedure describes how the system functions and the steps a user goes through after opening the website's URL. First, all users of the system, such as students and examiners, are shown how to access and use the software. During registration, users provide their ID number, name, email address, phone number, and password, and every field is validated to prevent invalid values. Once a user is registered, their details are saved in the database, and the login credentials are sent to the registered email address and phone number.
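The report states that every registration field is validated but does not give the rules. The following is a minimal sketch of such a check, assuming hypothetical formats: an ID number of "S" followed by six digits (the pattern of the student IDs in this report), a 10-digit phone number, a basic email shape, and a password of at least eight characters. The function name validate_registration and the form keys are also assumptions.

```python
import re
from typing import Dict, List

# Hypothetical formats: the report does not state the exact rules used
ID_PATTERN = re.compile(r"S\d{6}")                      # e.g. S170747
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z .]*")
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")
PHONE_PATTERN = re.compile(r"\d{10}")
MIN_PASSWORD_LENGTH = 8

REQUIRED_FIELDS = ("id_number", "name", "email", "phone", "password")


def validate_registration(form: Dict[str, str]) -> List[str]:
    """Return error messages for a registration form; an empty list means valid."""
    missing = [f"{key} is required." for key in REQUIRED_FIELDS
               if not form.get(key, "").strip()]
    if missing:
        return missing

    errors = []
    if not ID_PATTERN.fullmatch(form["id_number"].strip()):
        errors.append("ID number must be 'S' followed by six digits.")
    if not NAME_PATTERN.fullmatch(form["name"].strip()):
        errors.append("Name may contain only letters, spaces and dots.")
    if not EMAIL_PATTERN.fullmatch(form["email"].strip()):
        errors.append("Email address is not valid.")
    if not PHONE_PATTERN.fullmatch(form["phone"].strip()):
        errors.append("Phone number must contain exactly 10 digits.")
    if len(form["password"]) < MIN_PASSWORD_LENGTH:
        errors.append(f"Password must be at least {MIN_PASSWORD_LENGTH} characters long.")
    return errors
```

A registration route could call validate_registration on the submitted form and re-display the form with the returned messages whenever the list is not empty, saving the user to the database only when it is empty.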
Once students receive their credentials, they can log in to the system, which handles authentication and authorization; without registering, a student cannot access any feature of the system. After a successful login, the student sees the dashboard, which displays the generated questions and information from previous sessions. The dashboard content is loaded from the database, and students can view and modify the generated questions as needed.
The profile page contains each student's details, which they can edit. A short summary of these details is also shown at the top right of the dashboard. Students can switch the theme to white, black, or transparent, and they can log out using the logout button at the bottom left of the website.

6. RESULTS AND OUTPUT SCREENS

6.1 Registration Page

Fig 6.1.1: Registration page

6.2 Login Page

Fig 6.2.1: Login page

6.3 Home Page

Fig 6.3.1: Home page

6.4 Insert Data Page

Fig 6.4.1: Insert Text page

Fig 6.4.2: Insert URL page

Fig 6.4.3: File Upload page

6.5 Generated Questions Page

Fig 6.5.1: Generated Questions page

6.6 Admin Dashboard Page

Fig 6.6.1: Admin Dashboard page

Fig 6.6.2: Add or Remove Users page

6.7 Manual Questions Page

Fig 6.7.1: Manual Questions page

Fig 6.7.2: Add/Remove Questions and options

7. TESTING AND VALIDATION

7.1 Introduction

Software testing is carried out to give stakeholders information about the quality of the product or service under test. It provides an independent, objective view of the software, which helps the business understand the risks of implementing it. Testing uses a variety of techniques, including running the application to find software bugs. Its aim is to verify and validate that the software meets the technical and business requirements that guided its design and development, works as expected, and can be implemented with the same characteristics. Testing methods vary, and testing can be performed at any stage of the development process.
Different software development models place testing effort at different points in the process. Newer models such as Agile often use test-driven development, in which developers carry out much of the testing before the formal testing phase. In a traditional model, by contrast, most testing happens after the requirements have been defined and coding is complete. Testing cannot find every defect in a piece of software; instead, it compares the product's state and behavior against criteria, or oracles, such as specifications, comparable products, earlier versions of the product, user or customer expectations, and relevant regulations and standards.

7.2 Types of Testing

7.2.1 Unit Testing

Unit testing is performed on individual modules as soon as they are complete and can be run. Each module is tested in isolation against the requirements its designer specified.
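As an example of a unit test, the sketch below isolates the input-type dispatch performed by convert_to_text() in the appendix source code. detect_source_type() is a hypothetical helper that returns a label instead of calling the OCR, PDF, Word, PowerPoint, and web converters, so it can be tested without those libraries. It follows the same order of checks as convert_to_text() but, as an assumption, ignores letter case in file extensions.

```python
import unittest

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")


def detect_source_type(source: str) -> str:
    """Hypothetical helper: label an input the way convert_to_text() dispatches it."""
    lowered = source.lower()
    if lowered.endswith(IMAGE_EXTENSIONS):
        return "image"
    if lowered.endswith(".pdf"):
        return "pdf"
    if lowered.endswith(".docx"):
        return "docx"
    if lowered.endswith(".pptx"):
        return "pptx"
    if lowered.startswith(("http://", "https://")):
        return "url"
    return "text"  # anything else is treated as plain text


class DetectSourceTypeTest(unittest.TestCase):
    def test_file_extensions(self):
        self.assertEqual(detect_source_type("notes.pdf"), "pdf")
        self.assertEqual(detect_source_type("scan.JPG"), "image")
        self.assertEqual(detect_source_type("report.docx"), "docx")
        self.assertEqual(detect_source_type("slides.pptx"), "pptx")

    def test_urls(self):
        self.assertEqual(detect_source_type("https://example.com/page"), "url")
        self.assertEqual(detect_source_type("http://example.com/page"), "url")

    def test_plain_text_falls_through(self):
        self.assertEqual(detect_source_type("Plants make food by photosynthesis."), "text")
```

Saved as, for example, test_source_type.py, these tests run with python -m unittest test_source_type.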

7.2.2 Black Box Testing

Black box testing derives test cases from the program's functional requirements, choosing input conditions that exercise all of them. It is useful for finding missing or incorrect functions, interface errors, errors in data structures or external database access, performance errors, and initialization and termination errors. However, only the output is checked for correctness; the internal logic and data flow are not examined.
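Test Scenario-1 in Section 7.3 is a black-box test: each case states only an input and the expected output. The sketch below encodes those four cases as data and checks them against a login function. The in-memory user store, the account student1, and the login() function are hypothetical stand-ins for the project's database-backed login, included only so the example runs on its own; the test loop itself compares nothing but inputs and outputs.

```python
import hashlib
import hmac
import os

INVALID_LOGIN = "Invalid username or password. Please try again."
WELCOME_PAGE = "welcome page"

# Hypothetical in-memory user store standing in for the project's database
_SALT = os.urandom(16)


def _hash(password: str) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), _SALT, 100_000)


USERS = {"student1": _hash("correct-password")}


def login(username: str, password: str) -> str:
    """Return where the user ends up: the welcome page or the error line."""
    stored = USERS.get(username)
    if not username or not password or stored is None:
        return INVALID_LOGIN
    return WELCOME_PAGE if hmac.compare_digest(stored, _hash(password)) else INVALID_LOGIN


# Test Scenario-1 as data: (test no, username, password, expected output)
SCENARIO_1 = [
    (1, "", "", INVALID_LOGIN),
    (2, "student1", "", INVALID_LOGIN),
    (3, "student1", "wrong-password", INVALID_LOGIN),
    (4, "student1", "correct-password", WELCOME_PAGE),
]


def run_black_box_tests(cases):
    # Only inputs and outputs are compared; login() is treated as a black box
    return [(number, "Passed" if login(user, pwd) == expected else "Failed")
            for number, user, pwd, expected in cases]


print(run_black_box_tests(SCENARIO_1))
# [(1, 'Passed'), (2, 'Passed'), (3, 'Passed'), (4, 'Passed')]
```

Keeping the cases as data mirrors the test tables in Section 7.3: adding a row to a scenario means adding one tuple, without touching the harness.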

7.2.3 White Box Testing

White Box Testing involves generating test cases based on the internal logic of each
module, typically through the creation of flow diagrams. The purpose is to test all
logical decisions on both their true and false sides, guarantee that all independent paths
have been executed, execute all loops at their boundaries and within their operational
bounds, and ensure the validity of internal data structures.
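For a white-box example, the sketch below takes filter_same_sense_words() from the appendix source code (without its debug print) and chooses test inputs by reading its logic: the loop runs zero times, once with the if-condition true, once with it false, and several times with mixed outcomes. The sense2vec-style keys (word|SENSE) and scores are made-up test data.

```python
def filter_same_sense_words(original, wordlist):
    # Same logic as in Maincode.py: keep neighbours whose sense matches the original's
    filtered_words = []
    base_sense = original.split('|')[1]
    for eachword in wordlist:
        if eachword[0].split('|')[1] == base_sense:  # decision D1
            filtered_words.append(
                eachword[0].split('|')[0].replace("_", " ").title().strip())
    return filtered_words


# Each case names the path it forces through the function's internal logic
WHITE_BOX_CASES = [
    ("loop runs zero times", "photosynthesis|NOUN", [], []),
    ("loop runs once, D1 true", "photosynthesis|NOUN",
     [("cellular_respiration|NOUN", 0.81)], ["Cellular Respiration"]),
    ("loop runs once, D1 false", "photosynthesis|NOUN",
     [("photosynthesis|VERB", 0.77)], []),
    ("loop runs several times, both outcomes", "delhi|GPE",
     [("mumbai|GPE", 0.9), ("delhi_university|ORG", 0.8), ("new_delhi|GPE", 0.7)],
     ["Mumbai", "New Delhi"]),
]


def run_white_box_cases():
    # Returns the failing cases; an empty list means every path behaved as expected
    failures = []
    for name, original, wordlist, expected in WHITE_BOX_CASES:
        actual = filter_same_sense_words(original, wordlist)
        if actual != expected:
            failures.append((name, expected, actual))
    return failures


print(run_white_box_cases())  # []
```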

7.2.4 Integration Testing

Integration testing checks that the software and its supporting systems work together as a unit. It tests the interface of each module to ensure that the modules function correctly when combined.

7.3 Validation

Validation confirms that the implemented system meets all the requirements stated in the software requirements specification. When invalid input is entered, the system displays an appropriate error message. The test scenarios below record the expected and actual results.

Test Scenario-1: Login

Test No | Test Case | Expected Output | Actual Output | Result
1 | Empty input: no username and no password | “Invalid username or password. Please try again.” | “Invalid username or password. Please try again.” | Passed
2 | Username given, no password | “Invalid username or password. Please try again.” | “Invalid username or password. Please try again.” | Passed
3 | Correct username with incorrect password | “Invalid username or password. Please try again.” | “Invalid username or password. Please try again.” | Passed
4 | Correct username and password | Redirects to welcome page | Redirects to welcome page | Passed

Test Scenario-2: Register

Test No | Test Case | Expected Output | Actual Output | Result
1 | Username that already exists | “Username already taken. Please choose a different one.” | “Username already taken. Please choose a different one.” | Passed
2 | All details entered correctly | Redirects to welcome page | Redirects to welcome page | Passed

Test Scenario-3: Dashboard

Test No | Test Case | Expected Output | Actual Output | Result
1 | The dashboard displays the generated questions from the database | Questions are displayed | Questions are displayed | Passed
2 | The user modifies the generated questions and saves the changes | Generated questions are modified | Generated questions are modified | Passed
3 | The user views previous session details and downloads them | Previously generated questions are accessible | Previously generated questions are accessible | Passed
4 | The user navigates to their profile page and edits their details | Profile is updated | Profile is updated | Passed

Test Scenario-4: MCQ generation

Test No | Test Case | Expected Output | Actual Output | Result
1 | Enter valid text in the text area and click Generate MCQ | MCQs are generated and displayed on the screen | MCQs are generated and displayed on the screen | Passed
2 | Enter invalid text in the text area and click Generate MCQ | No MCQs are generated | No MCQs are generated | Passed
3 | Upload a file and click Generate MCQ | MCQs are generated from the file contents and displayed on the screen | MCQs are generated from the file contents and displayed on the screen | Passed

Test Scenario-5: Logout Page

Test No | Test Case | Expected Output | Actual Output | Result
1 | Click the logout button and check that the user is logged out | Redirected to the login page | Redirected to the login page | Passed

Test Scenario-6: Profile Page

Test No | Test Case | Expected Output | Actual Output | Result
1 | Displaying details | Details of the student are visible. | Visible. | Passed
2 | Updating details with a username that already exists | Can’t update. User name already exists. | Can’t update. User name already exists. | Passed
3 | Updating details with a username that does not already exist | Updated successfully. | Updated successfully. | Passed

8. CONCLUSION

In conclusion, the automatic MCQ generator is a useful tool that simplifies the creation of MCQs for purposes such as exams, assessments, and surveys. The system generates questions automatically from user input, saving time and effort, while users can review the questions to keep them accurate and valid.
The system uses natural language processing to analyse the input text and generate relevant questions, and it allows users to edit and modify the generated questions as needed. Its user-friendly interface makes the system easy to navigate, and features such as user registration, login, and profile management help protect the security and privacy of user data.
Overall, the automatic MCQ generator offers a convenient and efficient way to produce MCQs for various purposes, reducing the time and effort required for the task while helping to maintain the quality of the questions. We sincerely thank the readers for their patience and time.

9. REFERENCES

9.1 Automatic Generation of Multiple Choice Questions Using Wikipedia

9.2 An Automated Multiple Choice Question Generation Using Natural Language Processing Techniques

9.3 Automated MCQ Generator Using Natural Language Processing

APPENDIX

The project entitled “Master MCQ’s using Textual Data” addresses some of the problems examiners face when conducting examinations, and we look forward to implementing it further.

PRE-REQUISITES:

• pip install scipy: Installs SciPy, a library for a wide range of scientific and technical computing tasks.
• pip install pytesseract: Installs pytesseract, a Python wrapper for the Tesseract OCR engine that recognizes text in images.
• pip install textract: Installs textract, a library for extracting text from various file formats, such as PDF, DOCX, and images.
• Tesseract OCR engine: pytesseract needs the Tesseract program itself, which is installed outside pip: use the operating system's package manager (for example, sudo apt install tesseract-ocr on Debian or Ubuntu) or a Windows installer. The tesseract executable must be on the PATH, or pytesseract.pytesseract.tesseract_cmd must be set to its location.
• pip install gTTS: Installs gTTS (Google Text-to-Speech), which converts text to speech using Google Translate's text-to-speech API; the code uses it to save the generated questions as an MP3 file.
• pip install playsound: Installs playsound, which plays sound files from Python.
• pip install pygobject: Installs PyGObject, which provides Python bindings for GObject-based libraries such as GTK (graphical user interfaces) and GStreamer (multimedia).
• pip install flashtext: Installs FlashText, which quickly finds and replaces keywords and phrases in text; the code uses it to find keyphrases in the summary.
• pip install git+https://github.com/boudinfl/pke.git: Installs pke (Python Keyphrase Extraction) from its GitHub repository; pke extracts keywords and keyphrases from text.
• pip install transformers: Installs Hugging Face Transformers, a natural language processing (NLP) library that provides a wide range of pre-trained models for tasks such as text classification, question answering, and text generation.
• pip install sentencepiece: Installs SentencePiece, a subword tokenization library that splits text into smaller units for NLP tasks; the T5 tokenizer used in the code depends on it.
• pip install textwrap3: Installs textwrap3, a backport of the Python 3 textwrap module for wrapping text.
• pip install strsim: Installs strsim, which provides string similarity and distance measures such as Levenshtein distance; the code imports its normalized Levenshtein class from the similarity package.
• pip install sense2vec: Installs sense2vec, which provides sense-aware word vectors built on the spaCy library.
• pip install sentence-transformers: Installs sentence-transformers, which provides pre-trained models for computing vector embeddings of sentences or paragraphs. These embeddings can be used for NLP tasks such as sentence similarity and clustering.
• The source code also imports Flask, docx2txt, pdfminer, pptx, requests, bs4, torch, sklearn, PIL, and nltk; install them with pip install flask docx2txt pdfminer.six python-pptx requests beautifulsoup4 torch scikit-learn Pillow nltk.
• nltk.download('punkt'): Run in Python after import nltk, this downloads the Punkt models from the Natural Language Toolkit (NLTK), which are used to tokenize text into sentences or words, a common preprocessing step in NLP.
• nltk.download('brown'): Downloads the Brown Corpus, a collection of English text samples used for linguistic research; it is often used to train language models and evaluate text-processing algorithms.
• nltk.download('wordnet'): Downloads WordNet, a large lexical database of English words organized by semantic relationships. WordNet is often used for tasks such as word sense disambiguation and semantic similarity; the code uses it to find distractors.
• nltk.download('stopwords'): Downloads NLTK's stop word lists. Stop words are common words such as "a", "an", "the", "and", and "but" that are often removed during preprocessing because they carry little meaning for NLP tasks.

• wget and tar: These are command-line tools, not Python packages, so they are not installed with pip. They are available on most Linux systems; otherwise, install them with the operating system's package manager.
Navigate to the directory where you want to store the model and run the following command to download it:
wget https://github.com/explosion/sense2vec/releases/download/v1.0.0/s2v_reddit_2015_md.tar.gz
Once the download is complete, run the following command to extract the contents of the archive:
tar -xvf s2v_reddit_2015_md.tar.gz
This extracts the model into a directory named s2v_old, which is the path Maincode.py loads with Sense2Vec().from_disk('s2v_old').

wget is a command-line tool for downloading files from the internet; here it fetches the sense2vec model archive from the official GitHub repository. tar is a command-line tool for working with tar archives: in tar -xvf, -x extracts the archive, -v lists each file as it is extracted, and -f names the archive file.
Overall, these steps obtain the pre-trained sense2vec vectors, which the system uses to find words similar to each answer; these similar words become the distractors (incorrect options) of the generated MCQs.
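On Windows, where wget is not available by default, the same two steps can be done from Python with only the standard library. This is a minimal sketch, not part of the original project: download_and_extract() is a hypothetical helper that uses urllib.request in place of wget and tarfile in place of tar -xvf, and it skips the download when the archive is already present.

```python
import tarfile
import urllib.request
from pathlib import Path, PurePosixPath
from typing import List

# Archive named in the wget command above
S2V_URL = ("https://github.com/explosion/sense2vec/releases/download/"
           "v1.0.0/s2v_reddit_2015_md.tar.gz")


def download_and_extract(url: str, dest_dir: str = ".") -> List[str]:
    """Fetch a .tar.gz archive (skipped if already downloaded) and extract it into dest_dir.

    Returns the sorted top-level entries of the archive.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / url.rsplit("/", 1)[-1]
    if not archive.exists():
        urllib.request.urlretrieve(url, archive)        # same job as: wget URL
    with tarfile.open(archive, "r:gz") as tar:          # same job as: tar -xvf ARCHIVE
        top_level = {PurePosixPath(m.name).parts[0]
                     for m in tar.getmembers() if PurePosixPath(m.name).parts}
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")         # rejects unsafe member paths
        else:
            tar.extractall(dest)
    return sorted(top_level)
```

Calling download_and_extract(S2V_URL) in the project directory should leave the vectors in s2v_old, the directory Maincode.py loads; because the archive is large, the first call can take a while.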

37
Source Code:

mcq generationflask.py

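import os

from flask import Flask, request, render_template
from werkzeug.utils import secure_filename

# generate_question() and convert_to_text() come from the main module (listed below as Maincode.py)
from code_5 import generate_question, convert_to_text

app = Flask(__name__)

UPLOAD_FOLDER = "uploads"
os.makedirs(UPLOAD_FOLDER, exist_ok=True)


@app.route('/')
def home():
    return render_template('new.html')


@app.route('/submit', methods=['POST'])
def submit():
    # MCQs from text typed into the text area
    text = request.form['text']
    output = generate_question(text, "Sense2vec")
    return 'MCQ: {}'.format(output)


@app.route('/submit2', methods=['POST'])
def submit2():
    # MCQs from an uploaded file (image, PDF, DOCX or PPTX)
    file = request.files["file"]
    path = os.path.join(UPLOAD_FOLDER, secure_filename(file.filename))
    file.save(path)
    text = convert_to_text(path)
    if not text:
        return 'Could not extract text from the uploaded file.', 400
    output = generate_question(text, "Sense2vec")
    return 'MCQ: {}'.format(output)


@app.route('/submit3', methods=['POST'])
def submit3():
    # MCQs from the text of a web page
    url = request.form['url']
    text = convert_to_text(url)
    if not text:
        return 'Could not fetch text from the given URL.', 400
    output = generate_question(text, "Sense2vec")
    return 'MCQ: {}'.format(output)


if __name__ == '__main__':
    app.run()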

Maincode.py

import string
import traceback
from io import StringIO

import docx2txt
import numpy as np
import pke
import pptx
import pytesseract
import requests
import torch
from bs4 import BeautifulSoup
from flashtext import KeywordProcessor
from gtts import gTTS
from nltk.corpus import stopwords
from nltk.corpus import wordnet as wn
from pdfminer.converter import TextConverter
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.pdfpage import PDFPage
from PIL import Image
from sense2vec import Sense2Vec
from sentence_transformers import SentenceTransformer
from similarity.normalized_levenshtein import NormalizedLevenshtein
from sklearn.metrics.pairwise import cosine_similarity
from transformers import T5ForConditionalGeneration, T5Tokenizer

# sense2vec vectors extracted from s2v_reddit_2015_md.tar.gz (see the pre-requisites)
s2v = Sense2Vec().from_disk('s2v_old')
normalized_levenshtein = NormalizedLevenshtein()

IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.bmp')


def convert_to_text(file_path):
    # Determine the input type and call the matching conversion function
    lowered = file_path.lower()
    if lowered.endswith(IMAGE_EXTENSIONS):
        return convert_image_to_text(file_path)
    elif lowered.endswith('.pdf'):
        return convert_pdf_to_text(file_path)
    elif lowered.endswith('.docx'):
        return convert_doc_to_text(file_path)
    elif lowered.endswith('.pptx'):
        return convert_ppt_to_text(file_path)
    elif lowered.startswith(('http://', 'https://')):
        return convert_url_to_text(file_path)
    else:
        # Anything else is treated as plain text
        return file_path

def generate_question(context, radiobutton):
    # Summarize the input, keep keywords that survive summarization, build one MCQ per keyword
    summary_text = summarizer(context, summary_model, summary_tokenizer)
    keywords = get_keywords(context, summary_text)

    output = ""
    audio_output = ""

    for answer in keywords:
        ques = get_question(summary_text, answer, question_model, question_tokenizer)
        if radiobutton == "Wordnet":
            distractors = get_distractors_wordnet(answer)
        else:
            distractors = get_distractors(answer.capitalize(), ques, s2v,
                                          sentence_transformer_model, 40, 0.2)
        output += "<p>" + ques + "</p>"
        output += "<p>" + "Ans: " + answer.capitalize() + "</p>"
        audio_output += ques + '\n'
        audio_output += "Option 1: " + answer.capitalize() + '\n'
        # Up to three distractors become options 2-4
        for i, distractor in enumerate(distractors[:3], 2):
            output += "<p>" + "Option {}: {}".format(i, distractor) + "</p>"
            audio_output += "Option {}: {}".format(i, distractor) + '\n'

    # Save the questions and options as speech (gTTS needs an internet connection)
    if audio_output:
        tts = gTTS(audio_output)
        tts.save('output.mp3')
        print("Audio saved as output.mp3")

    return output

def convert_url_to_text(file_path):
    # Download a web page and return its visible text
    try:
        page = requests.get(file_path, timeout=30)
        if page.status_code == 200:
            soup = BeautifulSoup(page.content, "html.parser")
            for script in soup(["script", "style"]):
                script.decompose()
            text = soup.get_text()
            return text.strip()
        return None
    except Exception:
        return None


def convert_image_to_text(file_path):
    try:
        # Open the image file and extract the text using OCR
        image = Image.open(file_path)
        text = pytesseract.image_to_string(image)
    except Exception as e:
        print(f"Error processing image file: {e}")
        text = None
    return text


def convert_pdf_to_text(file_path):
    try:
        # Extract the text from the PDF file
        resource_manager = PDFResourceManager()
        file_stream = StringIO()
        converter = TextConverter(resource_manager, file_stream)
        interpreter = PDFPageInterpreter(resource_manager, converter)
        with open(file_path, 'rb') as file:
            for page in PDFPage.get_pages(file, caching=True, check_extractable=True):
                interpreter.process_page(page)
        text = file_stream.getvalue()
        converter.close()
    except Exception as e:
        print(f"Error processing PDF file: {e}")
        text = None
    return text


def convert_doc_to_text(file_path):
    try:
        # Extract the text from the Word document
        text = docx2txt.process(file_path)
    except Exception as e:
        print(f"Error processing Word file: {e}")
        text = None
    return text


def convert_ppt_to_text(file_path):
    try:
        # Extract the text from every shape on every slide of the PowerPoint presentation
        presentation = pptx.Presentation(file_path)
        text = ''
        for slide in presentation.slides:
            for shape in slide.shapes:
                if hasattr(shape, 'text'):
                    text += shape.text + '\n'
    except Exception as e:
        print(f"Error processing PowerPoint file: {e}")
        text = None
    return text

def get_nouns_multipartite(content):
    # Extract up to 15 noun / proper-noun keyphrases with pke's MultipartiteRank
    out = []
    try:
        extractor = pke.unsupervised.MultipartiteRank()
        extractor.load_document(input=content, language='en')
        pos = {'PROPN', 'NOUN'}
        stoplist = list(string.punctuation)
        stoplist += ['-lrb-', '-rrb-', '-lcb-', '-rcb-', '-lsb-', '-rsb-']
        stoplist += stopwords.words('english')
        extractor.candidate_selection(pos=pos)
        extractor.candidate_weighting(alpha=1.1, threshold=0.75, method='average')
        keyphrases = extractor.get_n_best(n=15)
        for val in keyphrases:
            out.append(val[0])
    except Exception:
        out = []
        traceback.print_exc()

    return out


def get_keywords(originaltext, summarytext):
    # Keep the keyphrases of the original text that also appear in the summary (at most 4)
    keywords = get_nouns_multipartite(originaltext)
    keyword_processor = KeywordProcessor()
    for keyword in keywords:
        keyword_processor.add_keyword(keyword)

    keywords_found = set(keyword_processor.extract_keywords(summarytext))

    important_keywords = []
    for keyword in keywords:
        if keyword in keywords_found:
            important_keywords.append(keyword)

    return important_keywords[:4]

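# Run the T5 models on a GPU when one is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# T5 model fine-tuned on SQuAD that writes a question for a given context and answer
question_model = T5ForConditionalGeneration.from_pretrained('ramsrigouthamg/t5_squad_v1')
question_tokenizer = T5Tokenizer.from_pretrained('ramsrigouthamg/t5_squad_v1')
question_model = question_model.to(device)

# Assumption: generate_question() uses summarizer, summary_model and summary_tokenizer,
# which are not part of this listing; this minimal stand-in uses the public t5-base model.
summary_model = T5ForConditionalGeneration.from_pretrained('t5-base').to(device)
summary_tokenizer = T5Tokenizer.from_pretrained('t5-base')


def summarizer(text, model, tokenizer):
    encoding = tokenizer("summarize: " + text.strip(), max_length=512,
                         truncation=True, return_tensors="pt").to(device)
    outs = model.generate(input_ids=encoding["input_ids"],
                          attention_mask=encoding["attention_mask"],
                          early_stopping=True, num_beams=3, max_length=300)
    return tokenizer.decode(outs[0], skip_special_tokens=True).strip()


def get_question(context, answer, model, tokenizer):
    text = "context: {} answer: {}".format(context, answer)
    encoding = tokenizer(text, max_length=384, truncation=True,
                         return_tensors="pt").to(device)
    input_ids, attention_mask = encoding["input_ids"], encoding["attention_mask"]

    outs = model.generate(input_ids=input_ids,
                          attention_mask=attention_mask,
                          early_stopping=True,
                          num_beams=5,
                          num_return_sequences=1,
                          no_repeat_ngram_size=2,
                          max_length=72)

    dec = [tokenizer.decode(ids, skip_special_tokens=True) for ids in outs]

    question = dec[0].replace("question:", "")
    return question.strip()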

# Sentence embedding model used to rank distractors
sentence_transformer_model = SentenceTransformer('msmarco-distilbert-base-v3')


def filter_same_sense_words(original, wordlist):
    # Keep only neighbours whose sense tag (after '|') matches the original word's sense
    filtered_words = []
    base_sense = original.split('|')[1]
    for eachword in wordlist:
        if eachword[0].split('|')[1] == base_sense:
            filtered_words.append(eachword[0].split('|')[0].replace("_", " ").title().strip())
    return filtered_words


def get_highest_similarity_score(wordlist, wrd):
    # Highest normalized Levenshtein similarity between wrd and any word in wordlist
    score = []
    for each in wordlist:
        score.append(normalized_levenshtein.similarity(each.lower(), wrd.lower()))
    return max(score)


def sense2vec_get_words(word, s2v, topn, question):
    # Candidate distractors: sense2vec neighbours of the answer that share its sense
    output = []
    try:
        sense = s2v.get_best_sense(word, senses=["NOUN", "PERSON", "PRODUCT", "LOC", "ORG",
                                                 "EVENT", "NORP", "WORK_OF_ART", "FAC",
                                                 "GPE", "NUM", "FACILITY"])
        most_similar = s2v.most_similar(sense, n=topn)
        output = filter_same_sense_words(sense, most_similar)
    except Exception:
        output = []

    threshold = 0.6
    final = [word]
    if question is None:
        return []
    checklist = question.split()
    # Skip candidates spelled too much like an already chosen word, or present in the question
    for x in output:
        if (get_highest_similarity_score(final, x) < threshold
                and x not in final and x not in checklist):
            final.append(x)

    return final[1:]

def mmr(doc_embedding, word_embeddings, words, top_n, lambda_param):
    # Maximal Marginal Relevance: choose words relevant to the document but unlike each other
    word_doc_similarity = cosine_similarity(word_embeddings, doc_embedding)
    word_similarity = cosine_similarity(word_embeddings)

    keywords_idx = [np.argmax(word_doc_similarity)]
    candidates_idx = [i for i in range(len(words)) if i != keywords_idx[0]]

    for _ in range(top_n - 1):
        candidate_similarities = word_doc_similarity[candidates_idx, :]
        target_similarities = np.max(word_similarity[candidates_idx][:, keywords_idx], axis=1)

        mmr_scores = (lambda_param * candidate_similarities
                      - (1 - lambda_param) * target_similarities.reshape(-1, 1))
        mmr_idx = candidates_idx[np.argmax(mmr_scores)]

        keywords_idx.append(mmr_idx)
        candidates_idx.remove(mmr_idx)

    return [words[idx] for idx in keywords_idx]

def get_distractors_wordnet(word):
    # Distractors from WordNet: other hyponyms of the answer's first hypernym
    distractors = []
    try:
        orig_word = word.lower()
        # WordNet joins the words of multi-word lemmas with underscores
        syn = wn.synsets(orig_word.replace(" ", "_"), 'n')[0]
        hypernym = syn.hypernyms()
        if len(hypernym) == 0:
            return distractors
        for item in hypernym[0].hyponyms():
            name = item.lemmas()[0].name().replace("_", " ")
            if name.lower() == orig_word:
                continue
            name = " ".join(w.capitalize() for w in name.split())
            if name not in distractors:
                distractors.append(name)
    except Exception:
        print("Wordnet distractors not found")
    return distractors


def get_distractors(word, origsentence, sense2vecmodel, sentencemodel, top_n, lambdaval):
    # Candidates from sense2vec, re-ranked with MMR against the question plus the answer
    distractors = sense2vec_get_words(word, sense2vecmodel, top_n, origsentence)
    if len(distractors) == 0:
        return distractors
    distractors_new = [word.capitalize()]
    distractors_new.extend(distractors)

    embedding_sentence = origsentence + " " + word.capitalize()
    keyword_embedding = sentencemodel.encode([embedding_sentence])
    distractor_embeddings = sentencemodel.encode(distractors_new)

    max_keywords = min(len(distractors_new), 5)
    filtered_keywords = mmr(keyword_embedding, distractor_embeddings, distractors_new,
                            max_keywords, lambdaval)

    # Drop the correct answer itself; the remaining words are the distractors
    final = []
    for wrd in filtered_keywords:
        if wrd.lower() != word.lower():
            final.append(wrd.capitalize())
    return final
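get_distractors() does not simply take the sense2vec neighbours closest to the answer: it re-ranks them with mmr(), an implementation of Maximal Marginal Relevance, which scores each candidate as lambda × (similarity to the question plus answer) - (1 - lambda) × (highest similarity to any word already selected). generate_question() passes lambda = 0.2, so diversity dominates. The pure-Python sketch below applies the same selection rule to made-up two-dimensional vectors instead of SentenceTransformer embeddings; mmr_select() and cosine() are illustrative names, not functions from the project.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(doc_vec, word_vecs, words, top_n, lambda_param) -> List[str]:
    """Maximal Marginal Relevance, following the same rule as mmr() in Maincode.py."""
    relevance = [cosine(v, doc_vec) for v in word_vecs]
    # Start with the single most relevant word
    selected = [max(range(len(words)), key=lambda i: relevance[i])]
    candidates = [i for i in range(len(words)) if i != selected[0]]
    while candidates and len(selected) < top_n:
        # Reward relevance to the document, penalize similarity to words already chosen
        scores = {
            i: lambda_param * relevance[i]
            - (1 - lambda_param) * max(cosine(word_vecs[i], word_vecs[j]) for j in selected)
            for i in candidates
        }
        best = max(scores, key=scores.get)
        selected.append(best)
        candidates.remove(best)
    return [words[i] for i in selected]


# Toy 2-D "embeddings" (made-up values, not real SentenceTransformer output)
doc = [1.0, 0.0]  # stands in for the question plus the correct answer
words = ["Delhi", "New Delhi", "Mumbai", "Chennai"]
vectors = [[1.0, 0.0], [0.99, 0.14], [0.6, 0.8], [0.8, -0.6]]
print(mmr_select(doc, vectors, words, top_n=3, lambda_param=0.2))
# ['Delhi', 'Mumbai', 'Chennai']
```

With lambda_param=1.0 the redundancy term vanishes and the result becomes ['Delhi', 'New Delhi', 'Chennai'], so "New Delhi", a near-duplicate of the answer, would be offered as an option. As in get_distractors(), the first entry is the correct answer and is dropped, which leaves Mumbai and Chennai as the distractors.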
