
A Project Report on

“QuizCraft: MCQs Generator using AI”

Submitted in partial fulfillment of the requirements

for the award of the degree of


Bachelor of Technology

in
Computer Science and Engineering
by

Rishank Kashyap (2100970100094)

Nishant Kr. Pandey (2100970100071)

Nishant Tiwari (2100970100072)

Semester – VII
Under the Supervision of

Dr. Krishan Kumar Saraswat

Galgotias College of Engineering & Technology


Greater Noida 201306
Affiliated to

Dr. APJ Abdul Kalam Technical University, Lucknow

(Session: 2024-2025)
GALGOTIAS COLLEGE OF ENGINEERING & TECHNOLOGY
GREATER NOIDA, UTTAR PRADESH, INDIA - 201306

CERTIFICATE

This is to certify that the project report entitled “QuizCraft: MCQs Generator using AI” submitted by Mr. RISHANK KASHYAP (2100970100094), Mr. NISHANT KUMAR PANDEY (2100970100071), and Mr. NISHANT TIWARI (2100970100072) to the Galgotias College of Engineering & Technology, Greater Noida, Uttar Pradesh, affiliated to Dr. A.P.J. Abdul Kalam Technical University, Lucknow, Uttar Pradesh, in partial fulfillment of the requirements for the award of the Degree of Bachelor of Technology in Computer Science & Engineering, is a bonafide record of the project work carried out by them under my supervision during the year 2024-2025.

Dr. Krishan Kumar Saraswat
Associate Professor
Dept. of CSE

Dr. Pushpa Chaudhary
Professor and Head
Dept. of CSE


ACKNOWLEDGEMENT

We have put considerable effort into this project. However, it would not have been possible without the kind support and help of many individuals and organizations. We would like to extend our sincere thanks to all of them.

We are highly indebted to Dr. Krishan Kumar Saraswat for his guidance and constant supervision, for providing the necessary information regarding the project, and for his support in completing it.

We are extremely grateful to Dr. Pushpa Chaudhary, HOD, Department of Computer Science and Engineering, GCET, and to Mr. Manish Kumar Sharma and Dr. Sanjay Kumar, Project Coordinators, Department of Computer Science and Engineering, GCET, for their valuable suggestions and constant support throughout our project tenure. We would also like to express our sincere thanks to all faculty and staff members of the Department of Computer Science and Engineering, GCET, for their support in completing this project on time.

We also express gratitude towards our parents for their kind cooperation and encouragement, which helped us in the completion of this project. Our thanks and appreciation also go to our friends who helped in developing the project and to all the people who have willingly helped us with their abilities.

(RISHANK KASHYAP)

(NISHANT KUMAR PANDEY)

(NISHANT TIWARI)
ABSTRACT

This project proposes the development of an automated Multiple Choice Question (MCQ)
generator leveraging OpenAI’s language models, specifically GPT, integrated within the
LangChain framework. The tool is engineered to assist educators in the efficient creation of
MCQs that are contextually relevant and exhibit a range of difficulty levels, suitable across
various educational subjects. Users can upload text or PDF files, and the system will
automatically generate MCQs based on the content provided. A user-friendly interface allows
for easy selection of the number of questions, subject specification, and complexity
customization. The generated questions are then evaluated against educational standards to
ensure quality and relevance. This project aims to enhance the process of educational content
creation through AI, making it more efficient, scalable, and adaptable to diverse learning
environments.

KEYWORDS: MCQ Generator, OpenAI, LangChain, Educational Content, Question


Generation, Language Models, User Interface Design.

CONTENTS
Title Page

CERTIFICATE i
ACKNOWLEDGEMENT ii
ABSTRACT iii
CONTENTS iv
LIST OF FIGURES v
CHAPTER 1: INTRODUCTION 1

CHAPTER 2: LITERATURE REVIEW


Early Approaches 3
Advancements in AI and NLP 3
Challenges 4

Role of LangChain 5
Conclusion 5

CHAPTER 3: PROBLEM FORMULATION


Need and Significance of work 6
Challenges in Manual MCQ Creation 6
Objectives 7

CHAPTER 4: METHODOLOGY / PLANNING OF WORK


Study Design and Requirement Analysis 9
System Architecture Design 9
AI Model Selection 9
Content Processing 10
MCQ Generation 10
Quality Assurance 10
User Testing and Feedback 11
Performance and Scalability Optimization 11

CHAPTER 5: SYSTEM DESIGN
Introduction 12
System Design Overview 12
Architecture 12
System Components 15

Workflow Diagram 16
System Workflow Details 18
Flowchart 18
Deployment 19

CHAPTER 6: IMPLEMENTATION
Setting Up the Development Environment 21
Code Implementation 24
Conclusion 29

CHAPTER 7: RESULT ANALYSIS

Functionality 30

Performance 31

User Experience 32

CHAPTER 8: CONCLUSION, LIMITATION AND FUTURE SCOPE


Conclusion 33
Limitation 34
Future Scope 35
Conclusion 37

REFERENCES 38

CONTRIBUTION OF PROJECT

1. Objective and Relevance of Project 39


2. Expected Outcomes 39
3. Social Relevance 39

LIST OF FIGURES

Figure Title Page

1. Architecture 13

2. System Components 14

3. Workflow Diagram 17

4. Flowchart 19

5. UI Diagram 30

6. Output: Readable 31

7. Output: Downloadable 32

CHAPTER 1
INTRODUCTION

Multiple Choice Questions (MCQs) are a fundamental component of educational assessments,


providing a standardized method for evaluating student knowledge across a wide range of
subjects. MCQs are particularly valued for their scalability and efficiency in large-scale testing
environments. However, the manual creation of high-quality MCQs is often a time-consuming
and resource-intensive process, requiring significant expertise in both subject matter and
question design. The complexity of crafting questions that are not only accurate and clear but
also varied in difficulty and aligned with learning objectives presents a substantial challenge
for educators.

In recent years, advances in Artificial Intelligence (AI) and Natural Language Processing
(NLP) have opened new avenues for automating the generation of educational content,
including MCQs. AI models, particularly those based on deep learning, have demonstrated
remarkable capabilities in understanding and generating human-like text. Among these,
OpenAI's GPT models have emerged as leading tools in the field of text generation, capable of
producing coherent, context-aware text based on prompts provided by users.

This project aims to harness the power of OpenAI's GPT models within the LangChain
framework to create an automated MCQ generator that addresses the challenges faced by
educators. The LangChain framework provides a structured approach to building NLP
applications, enabling the customization and fine-tuning of GPT models to meet specific
educational needs. By integrating these technologies, the proposed system will not only
automate the generation of MCQs but also allow for the customization of questions based on
subject matter, complexity, and educational standards.

(Figure 1: Overview of the MCQ generation process, showing the interaction between the user
interface, content processing module, and GPT-based question generation.)

The system will feature a user-friendly interface where educators can upload text or PDF files,
specify the number of questions, choose the subject area, and set the desired difficulty level.
The AI model will then process the content, generate relevant MCQs, and present them for
review. This approach not only saves time but also ensures that the questions generated are
tailored to the specific educational context, making the tool adaptable to a wide range of
subjects and learning environments.

In summary, this project seeks to address the limitations of manual MCQ creation by leveraging
AI to automate the process, thereby enhancing the efficiency, scalability, and quality of
educational assessments. The following sections will provide a detailed overview of the
existing literature on automated question generation, the specific problem this project
addresses, the objectives of the research, the methodology employed, and the expected
outcomes.

CHAPTER 2

LITERATURE REVIEW

The automated generation of educational content, particularly MCQs, has been a subject of
extensive research and development over the past few decades. Various methodologies have
been proposed and implemented, ranging from traditional rule-based systems to modern AI-
driven approaches. This literature survey explores the evolution of automated question
generation, highlighting key studies, methodologies, and the technological advancements that
have shaped the current landscape.

2.1 Early Approaches: Rule-Based and Template-Driven Systems

The earliest attempts at automating question generation relied heavily on rule-based systems.
These systems used predefined rules and templates to generate questions from structured data
sources, such as textbooks or databases. While these systems were relatively simple to
implement, they were limited in their ability to adapt to different contexts and subject areas.
For example, a rule-based system might generate a question by extracting a sentence from a
text and converting it into a question format by replacing specific words with blanks or adding
question phrases. However, such questions often lacked depth and variety, leading to a limited
pool of questions that could be used in assessments.
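For illustration, such a rule can be sketched in a few lines of Python (the function name and example sentence are our own, not taken from any particular system):

def sentence_to_blank(sentence: str, keyword: str) -> dict:
    # Form the question stem by blanking out the chosen keyword.
    stem = sentence.replace(keyword, "_____")
    return {"question": stem, "answer": keyword}

print(sentence_to_blank("Paris is the capital of France.", "Paris"))
# {'question': '_____ is the capital of France.', 'answer': 'Paris'}

The sketch also shows why such systems produced shallow questions: the rule can only recycle the surface wording of the source sentence.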

Template-driven approaches attempted to address some of these limitations by providing more structured and customizable templates for question generation. These templates could be
designed to accommodate different question types, such as fill-in-the-blank, true/false, or
multiple-choice formats. However, the reliance on templates still constrained the creativity and
variability of the questions generated, often resulting in repetitive and predictable questions.

2.2 Advancements in AI and NLP: The Emergence of GPT Models

The introduction of AI and NLP into the field of automated question generation marked a
significant shift from the rigid rule-based and template-driven systems of the past. Machine
learning models, particularly those based on deep learning, began to demonstrate the ability to
understand and generate human-like text. Among these models, OpenAI's GPT series has been
particularly influential.

GPT-3, one of the most advanced versions of OpenAI's Generative Pretrained Transformer
models, has been widely recognized for its ability to generate coherent, context-aware text
based on a wide range of inputs. The model is trained on a massive dataset comprising diverse
text sources, enabling it to generate text that is not only grammatically correct but also
contextually relevant. Several studies have explored the application of GPT-3 in educational
content generation, including the automatic creation of summaries, explanations, and
questions.

For example, in a study by Brown et al. (2020), GPT-3 was used to generate a variety of
educational materials, including MCQs. The results demonstrated that GPT-3 could produce
questions that were comparable in quality to those created by human educators. However, the
study also highlighted some challenges, such as the need for prompt engineering to guide the
model in generating questions that align with specific educational objectives.

2.3 Challenges in AI-Generated Educational Content

Despite the promising capabilities of AI models like GPT-3, several challenges remain in
ensuring the quality, relevance, and difficulty of AI-generated educational content. One of the
primary concerns is the model's tendency to generate text that is either too simple or too
complex for the intended audience. This issue is particularly important in educational settings,
where questions must be tailored to the appropriate grade level and subject matter.

Another challenge is the potential for bias in AI-generated content. Since GPT-3 is trained on
a vast corpus of text from the internet, it may inadvertently reproduce biases present in the
training data. This can lead to the generation of questions that are culturally insensitive or that
reinforce stereotypes. Ensuring fairness and inclusivity in AI-generated educational materials
is therefore a critical area of focus in the development of automated question generation
systems.

2.4 The Role of LangChain in Structuring AI-Generated Content

The LangChain framework provides a structured approach to building NLP applications, enabling developers to customize and fine-tune AI models for specific use cases. In the context of this project, LangChain will be used to structure the prompts and refine the outputs of GPT models, ensuring that the generated MCQs are aligned with educational standards and objectives.

By incorporating LangChain into the MCQ generation process, this project aims to address
some of the challenges identified in the literature. The framework allows for the customization
of prompts, enabling the model to generate questions that are contextually relevant and
appropriately challenging. Additionally, LangChain's capabilities in prompt engineering will
be leveraged to minimize bias and ensure that the generated questions are fair and inclusive.

2.5 Conclusion

The literature on automated question generation reveals a progression from simple rule-based
systems to sophisticated AI-driven models capable of generating high-quality educational
content. While significant advancements have been made, challenges remain in ensuring the
quality, relevance, and fairness of AI-generated questions. This project builds on the existing
literature by leveraging OpenAI's GPT models within the LangChain framework to create an
automated MCQ generator that addresses these challenges. The following sections will outline
the specific problem this project seeks to solve, the research objectives, the methodology
employed, and the expected outcomes.

CHAPTER 3

PROBLEM FORMULATION

Creating high-quality MCQs that cater to various difficulty levels and educational standards is
a time-consuming and resource-intensive process. Existing automated systems often produce
questions lacking in contextual relevance or appropriate difficulty variation. This project
addresses these gaps by developing an MCQ generator that can efficiently produce diverse and
high-quality questions from user-provided content. The significance of this research lies in its
potential to revolutionize educational content creation, making it more efficient and scalable.
By leveraging advanced AI models and a structured framework for prompt engineering, this
tool aims to enhance educators' ability to generate tailored content for various learning
contexts.

3.1 Need and Significance of Proposed Research Work

The creation of high-quality Multiple Choice Questions (MCQs) is a critical but often resource-
intensive task in the field of education. MCQs are widely used in various educational
assessments due to their ability to evaluate a broad range of knowledge efficiently. However,
the manual process of crafting these questions requires significant expertise in both subject
matter and question design. This process is particularly challenging when there is a need to
produce a large number of questions that are diverse in content, appropriately varied in
difficulty, and aligned with specific educational objectives.

3.2 Challenges in Manual MCQ Creation

One of the primary challenges in manual MCQ creation is the need to ensure that questions are
not only accurate and clear but also contextually relevant and appropriately challenging for the
intended audience. Educators must carefully design questions to test different levels of
understanding, from basic recall of facts to higher-order thinking skills such as analysis and
evaluation. This requires a deep understanding of both the subject matter and the cognitive
processes involved in learning.

Furthermore, the need to create a large pool of questions that can be used across multiple
assessments adds to the complexity of the task.

3.3 Objectives

The objectives of this research are:

1. Develop an Automated MCQ Generator: Utilize OpenAI’s GPT language models, integrated within the LangChain framework, to automate the generation of multiple-choice questions (MCQs) from user-provided content such as text or PDF files. The system aims to reduce the time and effort required for manual question creation.

2. Design a User-Friendly Interface: Create an intuitive and user-friendly interface that allows educators to upload content files effortlessly. The interface will also provide options to select the number of questions, subject matter, and difficulty level, ensuring flexibility and ease of use.

3. Ensure Contextual Relevance and Complexity: The system will be designed to ensure that
the generated MCQs are contextually relevant to the input content. It will also provide a variety
of difficulty levels, from basic recall questions to more complex analytical ones, catering to a
range of cognitive skills.

4. Validate Against Educational Standards: Validate the quality and relevance of the
generated MCQs by comparing them against established educational benchmarks and
standards. This will ensure that the questions meet the necessary criteria for accuracy, clarity,
and educational value.

5. Incorporate Subject-Specific Customization: Implement functionality that allows the generation of MCQs tailored to specific subjects. By fine-tuning the AI model’s prompts, the system will be able to generate questions that align with the learning objectives of particular subjects such as mathematics, science, or humanities.

6. Enable Difficulty Customization: Provide options for users to customize the difficulty level
of the generated MCQs. This will allow educators to create questions appropriate for various
student proficiency levels, from beginner to advanced learners, ensuring a versatile and
adaptable tool.

7. Ensure Scalability and Efficiency: Design the system to be scalable, capable of handling a large volume of content uploads, and generating MCQs in real time. The tool will be optimized for performance, ensuring that it can efficiently process and generate questions for multiple subjects simultaneously.

8. Minimize Bias and Ensure Inclusivity: Focus on minimizing biases in AI-generated questions by using advanced prompt-engineering techniques. This objective aims to ensure that the MCQs are fair, inclusive, and culturally sensitive, avoiding any inadvertent bias from the language model's training data.

9. Facilitate Question Review and Editing: Provide users with the ability to review and
modify the generated MCQs before finalizing them. This will give educators control over the
output, allowing them to fine-tune questions to better meet their specific educational needs.

10. Support Multiple File Formats: Ensure that the system supports multiple file formats for
content uploads, including text files, PDFs, and other document types, making the tool
adaptable to various educational environments and materials.

11. Test Across Diverse Subjects and Educational Levels: Conduct extensive testing of the
system across a wide range of subjects, including science, mathematics, and language arts, as
well as different educational levels. This will ensure that the tool can generate quality MCQs
suitable for different grade levels and areas of study.

12. Provide Continuous System Feedback and Improvements: Incorporate feedback mechanisms that allow educators to provide input on the quality of the generated questions. This feedback will be used to continuously refine the system, improving the accuracy, relevance, and usability of the MCQs over time.

13. Develop a Reliable Question Evaluation Mechanism: Implement a mechanism within the system to evaluate the generated MCQs, ensuring that they not only meet educational standards but are also pedagogically sound and suitable for assessments.

CHAPTER 4

METHODOLOGY / PLANNING OF WORK

The research methodology involves several steps:

4.1 Study Design and Requirement Analysis

• Requirement Gathering: The first phase will involve gathering detailed requirements from educators, content creators, and other stakeholders. The aim is to understand the specific needs for automated MCQ generation, including subject-specific requirements, difficulty levels, and content format preferences (PDFs, text files, etc.).
• Feasibility Study: A feasibility study will be conducted to assess the capability of OpenAI’s GPT language models for MCQ generation. This will include testing different model versions and exploring the suitability of LangChain for structuring and fine-tuning outputs.

4.2 System Architecture Design

• Frontend Development: The user interface (UI) will be designed using modern web development frameworks to provide an intuitive and seamless experience. Educators will be able to upload content, select the number of questions, specify the subject, and adjust the complexity level of the MCQs.
• Backend Development: The backend architecture will be designed to handle content processing, communication with GPT models, and the generation of MCQs. The system will integrate LangChain to structure AI-generated outputs and customize the questions according to the user’s input.

4.3 AI Model Selection and Fine-Tuning

• Model Selection: OpenAI’s GPT-3.5 or GPT-4 models will be leveraged for generating MCQs. The chosen model will undergo rigorous testing to ensure it can generate diverse, contextually relevant, and high-quality questions.
• Fine-Tuning with LangChain: LangChain will be used to structure the prompts fed to the GPT models. By refining the prompts and training the system on educational content, the model will be tailored to produce questions that align with various subjects, difficulty levels, and educational standards.

4.4 Content Processing

• Text Analysis and Segmentation: The uploaded content (PDFs, text, etc.) will be analyzed and segmented into smaller, manageable chunks. These segments will be passed to the language model for MCQ generation, ensuring that the system works efficiently with large files or dense content (a sketch of this step follows this list).
• Keyword Extraction: To ensure relevance, a keyword extraction process will be applied to the input text. These keywords will help guide the GPT model in generating focused and accurate MCQs that reflect the key concepts within the content.
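As a sketch of the segmentation step, LangChain's text splitter can produce overlapping chunks that fit within the model's context window (the chunk size and overlap values here are illustrative assumptions, not values fixed by the project):

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split the extracted document text into overlapping chunks so that each
# request to the GPT model stays within its context window.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)

text = "..."  # full text extracted from the uploaded PDF or txt file
chunks = splitter.split_text(text)  # list of chunk strings passed to the LLM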

4.5 MCQ Generation and Customization

• Prompt Engineering: Custom prompts will be designed to instruct the AI model to generate MCQs of varying complexity. Different prompts will be tested to see which formulations result in the best-quality questions for different educational contexts.
• Question Customization: The tool will allow users to set parameters such as the number of questions, subject area, and difficulty level. Based on these parameters, the AI model will generate appropriate MCQs.

4.6 Quality Assurance and Validation

• Evaluation Against Educational Standards: The generated MCQs will be validated against educational standards to ensure they meet pedagogical quality requirements. Educators will be involved in testing the questions to confirm their relevance, difficulty level, and alignment with learning objectives.
• Bias Testing: The system will undergo testing to identify and mitigate any potential biases in the AI-generated content. Special attention will be given to ensuring that questions are inclusive and free from cultural or gender biases.

4.7 User Testing and Feedback

• Beta Testing with Educators: A beta version of the system will be tested with a selected group of educators from various subjects. Feedback on the user interface, question quality, and customization features will be collected and used to refine the tool.
• Iterative Development: Based on the feedback, iterative improvements will be made to the system, including updates to the user interface, question generation accuracy, and overall performance.

4.8 Performance and Scalability Optimization

• System Optimization: The backend system will be optimized to handle large-scale content processing and real-time MCQ generation. Performance metrics, including response times and system scalability, will be monitored and improved.
• Continuous Learning: A machine learning pipeline will be established to allow the system to continuously learn from user interactions and feedback, improving the accuracy and relevance of future MCQs generated.

4.9 Data Collection and Analysis

• Data Logging: Data on the generated questions, user feedback, and performance metrics will be logged and analyzed. This will help in identifying areas for improvement and ensuring that the system is aligned with its objectives.
• Statistical Analysis: The collected data will undergo statistical analysis to assess the quality of generated questions in terms of accuracy, relevance, and difficulty level. This analysis will provide insights into the system’s effectiveness across different subjects and educational levels.

CHAPTER 5

SYSTEM DESIGN

5.1 Introduction

This project demonstrates the development of an MCQ Generator using OpenAI, LangChain,
and Streamlit, showcasing how cutting-edge generative AI technology can be applied to real-
world problems. The system takes textual input or topics from users and generates meaningful
multiple-choice questions (MCQs) with answer options.

5.2 System Design Overview

The system is divided into three key components:

Frontend: A user interface built using Streamlit to provide input and view results.

Backend Logic: Combines OpenAI APIs with LangChain for natural language processing and
MCQ generation.

Integration Layer: Facilitates seamless interaction between the frontend and the AI model.

5.3 Architecture

Below is the high-level architecture:

Figure - 1 : High Level Architecture

5.4 System Components

Figure - 2 : System Components

Frontend (Streamlit):

Purpose: Provides an intuitive user interface for users to:

Input text or topic.

Configure parameters (difficulty, number of questions).

View or download generated MCQs.

Tools:

Streamlit for UI design.

Python for connecting frontend with backend.

Backend Logic:

LangChain:

Role: Acts as an orchestration layer for prompt engineering and response post-processing.

Key Features:

Constructs dynamic prompts to customize OpenAI API requests.

Formats responses into structured MCQs.

OpenAI API:

Role: Core AI component for text understanding and MCQ generation.

Model: GPT (e.g., GPT-4 or GPT-3.5-turbo).

Workflow:

Input is sent to OpenAI via LangChain.

GPT generates questions and options in a JSON-like structure.

Output Example:

"question": "What is the capital of France?",

15
"options": ["Paris", "London", "Berlin", "Madrid"],

"answer": "Paris"

Integration Layer:

Workflow:

Streamlit sends input to LangChain.

LangChain forwards input to OpenAI and processes the output.

Processed output is returned to Streamlit for display.

Technologies:

HTTP Requests/Responses via Python libraries like requests.

Database (Optional):

Role:

If persistence is required, a database can store user inputs and generated MCQs for later retrieval.

Choice: SQLite for simplicity or MongoDB for scalability.
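As a minimal persistence sketch with SQLite (the table schema and file name are assumptions; the report does not fix a schema):

import json
import sqlite3

conn = sqlite3.connect("mcq_store.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS mcqs (
           id INTEGER PRIMARY KEY AUTOINCREMENT,
           subject TEXT,
           question TEXT,
           options TEXT,   -- JSON-encoded list of options
           answer TEXT
       )"""
)
conn.execute(
    "INSERT INTO mcqs (subject, question, options, answer) VALUES (?, ?, ?, ?)",
    ("Geography", "What is the capital of France?",
     json.dumps(["Paris", "London", "Berlin", "Madrid"]), "Paris"),
)
conn.commit()
conn.close()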

5.5 Workflow Diagram

The following flowchart illustrates the system workflow:

Figure - 3 : Workflow Diagram

5.6 System Workflow Details

Input:

User enters a topic or text via the Streamlit interface.

Selects additional parameters (e.g., difficulty level, number of questions).

Processing:

LangChain builds a prompt using the input.

The prompt is sent to OpenAI via its API.

GPT generates MCQs based on the prompt.

Output:

LangChain formats the GPT output.

The formatted MCQs are displayed on Streamlit.

Users can download the MCQs in formats like CSV or JSON.
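A sketch of how the CSV download could be wired up with Streamlit's st.download_button (table_data stands in for the rows built by the backend; its exact shape here is an assumption):

import pandas as pd
import streamlit as st

# `table_data` is assumed to be a list of dicts, one per generated MCQ.
table_data = [{"MCQ": "What is the capital of France?",
               "Choices": "a: Paris | b: London | c: Berlin | d: Madrid",
               "Correct": "a"}]

df = pd.DataFrame(table_data)

# Offer the generated questions as a CSV file download.
st.download_button(
    label="Download MCQs as CSV",
    data=df.to_csv(index=False).encode("utf-8"),
    file_name="mcqs.csv",
    mime="text/csv",
)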

5.7 Flowchart

Here’s a graphical representation of the flow:

Figure - 4 : Flowchart

5.8 Deployment

Cloud Hosting:

Deploy Streamlit app on Streamlit Cloud, AWS, or Google Cloud.

Ensure API keys are securely stored.

API Key Management:

Use environment variables or a secret manager to handle OpenAI API keys.
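A minimal sketch of this pattern, assuming a local .env file as described in Chapter 6:

import os
from dotenv import load_dotenv

# Load variables from a local .env file into the process environment.
load_dotenv()

# Read the key at runtime instead of hard-coding it in the source.
api_key = os.getenv("OPENAI_API_KEY")
if api_key is None:
    raise RuntimeError("OPENAI_API_KEY is not set")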

5.9 Benefits and Scalability

Benefits:

User-friendly interface.

Highly customizable MCQs.

Quick response time leveraging GPT's capabilities.

Scalability:

Add support for additional languages or question formats.

Integrate with LMS systems for seamless deployment in educational setups.

CHAPTER 6

IMPLEMENTATION

In this chapter, we describe the implementation of the MCQ Generator system using OpenAI,
LangChain, and Streamlit. The system is designed to generate multiple-choice questions
(MCQs) from a given topic using GPT-based models, with integration into a simple web
interface powered by Streamlit. This chapter will cover the entire process from setting up the
development environment to the detailed code flow, database management, testing, and
deployment.

6.1 Setting Up the Development Environment

Before starting with the implementation, it is essential to set up the development environment
correctly. The environment setup is divided into the following steps:

6.1.1 Software Requirements

Python 3.9+: The primary programming language used in the project. It is recommended to use
Python 3.9 or later versions.

Streamlit: This is the framework used for creating the interactive user interface for the project.

OpenAI API: The core of the project is the OpenAI API (GPT models), which will generate
the MCQs based on a given prompt.

LangChain: A tool that allows us to manage prompts dynamically and interact with OpenAI’s
API more efficiently.

SQLite (optional): A lightweight database for storing MCQs and user inputs. This is optional
and used for additional functionality like saving generated MCQs for later use.

Environment Variables: These are used to store sensitive data like API keys.

6.1.2 Installing Dependencies

The first step in the development process is to install the required dependencies. This can be
done using Python's package manager pip.

Create a virtual environment (recommended) and install the necessary libraries:

python -m venv mcq-env

source mcq-env/bin/activate  # On Windows, use mcq-env\Scripts\activate

# sqlite3 ships with the Python standard library, so it does not need to be installed
pip install streamlit openai langchain python-dotenv

6.1.3 Environment Setup

Once the dependencies are installed, you need to set up the environment variables to store the
OpenAI API key securely. Create a .env file in the root directory of your project:

OPENAI_API_KEY=your_openai_api_key

The .env file will be used by the python-dotenv library to securely load the OpenAI API key.

6.2 Code Implementation

6.2.1 Frontend: User Interface with Streamlit

The frontend is implemented using Streamlit, which provides an interactive interface for users
to upload files, input parameters, and generate MCQs.

import os
import json
import traceback

import pandas as pd
import streamlit as st
from dotenv import load_dotenv
from langchain.callbacks import get_openai_callback

from src.mcqgenerator.utils import read_file, get_table_data
from src.mcqgenerator.MCQGenerator import generate_evaluate_chain
from src.mcqgenerator.logger import logging

import ssl

# Work around local SSL certificate issues when calling the OpenAI API.
ssl._create_default_https_context = ssl._create_unverified_context

# Load the JSON template that shows the model the expected response format.
with open(r'C:\Users\risha\OneDrive\Desktop\mcqGen\Response.json', 'r') as file:
    RESPONSE_JSON = json.load(file)

st.title("MCQs Creator Application with LangChain")

# User input form
with st.form("user_inputs"):
    # File upload
    uploaded_file = st.file_uploader("Upload a PDF or txt file")

    # Input fields
    mcq_count = st.number_input("No. of MCQs", min_value=3, max_value=50)

    # Subject
    subject = st.text_input("Insert Subject", max_chars=20)

    # Quiz tone
    tone = st.text_input("Complexity Level Of Questions", max_chars=20,
                         placeholder="Simple")

    # Submit button
    button = st.form_submit_button("Create MCQs")

# Check if the button is clicked and all fields have input
if button and uploaded_file is not None and mcq_count and subject and tone:
    with st.spinner("Loading..."):
        try:
            text = read_file(uploaded_file)

            # Count tokens and the cost of the API call
            with get_openai_callback() as cb:
                response = generate_evaluate_chain({
                    "text": text,
                    "number": mcq_count,
                    "subject": subject,
                    "tone": tone,
                    "response_json": json.dumps(RESPONSE_JSON),
                })
        except Exception as e:
            traceback.print_exception(type(e), e, e.__traceback__)
            st.error("Error")
        else:
            print(f"Prompt Tokens: {cb.prompt_tokens}")
            print(f"Completion Tokens: {cb.completion_tokens}")
            print(f"Total Tokens: {cb.total_tokens}")
            print(f"Total Cost: {cb.total_cost}")

            if isinstance(response, dict):
                # Extract the quiz data from the response
                quiz = response.get("quiz", None)
                if quiz is not None:
                    table_data = get_table_data(quiz)
                    if table_data is not None:
                        df = pd.DataFrame(table_data)
                        df.index = df.index + 1
                        st.table(df)
                        # Display the review in a text box as well
                        st.text_area(label="Review", value=response["review"])
                    else:
                        st.error("Error in the table data")
            else:
                st.write(response)

Explanation of Frontend Code

File Upload: Users can upload a PDF or text file containing the content for MCQ
generation.

Input Fields: Allows users to specify the number of MCQs, subject, and complexity level.

Create Button: Triggers the generation process and displays results in a table format.

Display Results: The generated quiz is displayed in a structured table, along with a review
text area.
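The helpers read_file and get_table_data are imported from src.mcqgenerator.utils, but their source is not listed in this report. A possible read_file implementation is sketched below; it is an assumption based on the PyPDF2 library, not the project's actual code:

from PyPDF2 import PdfReader

def read_file(uploaded_file):
    """Extract plain text from an uploaded .pdf or .txt file."""
    if uploaded_file.name.endswith(".pdf"):
        reader = PdfReader(uploaded_file)
        # Concatenate the extracted text of every page in the PDF.
        return "".join(page.extract_text() or "" for page in reader.pages)
    if uploaded_file.name.endswith(".txt"):
        return uploaded_file.read().decode("utf-8")
    raise ValueError("Unsupported file format: only .pdf and .txt are supported")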

6.2.2 Backend: Logic to Generate and Evaluate MCQs with LangChain and OpenAI

The backend handles quiz generation and evaluation using LangChain and OpenAI's GPT API.

import os
import json
import traceback

import pandas as pd
import openai
from dotenv import load_dotenv

from src.mcqgenerator.utils import read_file, get_table_data
from src.mcqgenerator.logger import logging

# Load the OpenAI API key from the .env file before creating the LLM.
load_dotenv()

from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, SequentialChain
from langchain.callbacks import get_openai_callback

KEY = os.getenv("OPENAI_API_KEY")
openai.api_key = KEY

# Initialize the OpenAI LLM
llm = ChatOpenAI(
    openai_api_key=KEY,
    model_name="gpt-3.5-turbo",
    temperature=0.5,
)

# Template for quiz generation
quiz_generation_template = """
Text:{text}
You are an expert MCQ maker. Given the above text, it is your job to \
create a quiz of {number} multiple choice questions for {subject} students in {tone} tone.
Make sure the questions are not repeated and check all the questions to be conforming to the text as well.
Make sure to format your response like RESPONSE_JSON below and use it as a guide. \
Ensure to make {number} MCQs
### RESPONSE_JSON
{response_json}
"""

quiz_generation_prompt = PromptTemplate(
    input_variables=["text", "number", "subject", "tone", "response_json"],
    template=quiz_generation_template,
)

quiz_chain = LLMChain(llm=llm, prompt=quiz_generation_prompt,
                      output_key="quiz", verbose=True)

# Template for quiz evaluation
quiz_evaluation_template = """
You are an expert English grammarian and writer. Given a Multiple Choice Quiz for {subject} students, \
you need to evaluate the complexity of the questions and give a complete analysis of the quiz. \
Only use at max 50 words for complexity analysis.
If the quiz is not at par with the cognitive and analytical abilities of the students, \
update the quiz questions which need to be changed and change the tone such that it perfectly fits the students' abilities.
Quiz_MCQs:
{quiz}
"""

quiz_evaluation_prompt = PromptTemplate(
    input_variables=["subject", "quiz"],
    template=quiz_evaluation_template,
)

review_chain = LLMChain(llm=llm, prompt=quiz_evaluation_prompt,
                        output_key="review", verbose=True)

# Combined chain: generate the quiz first, then review it
generate_evaluate_chain = SequentialChain(
    chains=[quiz_chain, review_chain],
    input_variables=["text", "number", "subject", "tone", "response_json"],
    output_variables=["quiz", "review"],
    verbose=True,
)

Explanation of Backend Code

Prompt Templates: Define the format for MCQ generation and evaluation using
LangChain's PromptTemplate.

Sequential Chain: Combines quiz generation and evaluation into a single workflow.

Output Handling: Returns a structured response for easy integration with the frontend.

6.3 Conclusion

This chapter has described the detailed implementation process of the MCQ Generator system,
from setting up the development environment and writing code for the backend and frontend,
to testing and deploying the system. The MCQ Generator leverages cutting-edge technologies
such as OpenAI and LangChain for natural language processing, while Streamlit provides a
user-friendly interface for interacting with the system. The optional database integration
provides the capability to store generated questions for future use, enhancing the application's
utility.

CHAPTER 7

RESULT ANALYSIS

The MCQ Generator project underwent a detailed evaluation to assess its functionality,
performance, and user experience. The analysis highlights the tool’s ability to generate
contextually accurate and high-quality multiple-choice questions (MCQs) while identifying
areas for future improvement. Below is the result analysis of the project.

7.1 Functionality

The MCQ Generator was tested for its core features:

Input Handling:

Users can upload .pdf or .txt files containing source text for generating MCQs.

(Fig 5)

Input fields such as the number of questions, subject, and complexity tone are
customizable, enhancing user flexibility.

Figure - 5 : UI Diagram

MCQ Generation:

The system produces unique MCQs directly related to the provided input text.

Questions follow the specified tone and complexity, catering to different audiences.

Generated MCQs are formatted in a structured JSON format, ensuring consistency


and ease of integration.

It has two formats: i. Readable Format (Figure - 6)

ii. Downloadable Format (Figure - 7)

Figure - 6 : Output - Readable Format

Figure - 7 : Output - Downloadable Format

Review and Evaluation:

The built-in review mechanism evaluates the cognitive complexity of the questions
and provides concise feedback.

Suggestions to refine questions improve their relevance and suitability for the target
audience.

7.2 Performance Metrics

Response Time:

The average time to generate and evaluate MCQs is acceptable for small to medium-
sized inputs.

Longer texts slightly increase response times due to the token limits of the OpenAI
API and processing overhead.

Scalability:

The system handles up to 50 MCQs effectively without performance degradation.

It supports dynamic adjustment of parameters such as tone and subject, ensuring


scalability across diverse use cases.

Error Handling:

Error messages are meaningful and assist users in troubleshooting common issues
like unsupported file formats, missing API keys, or invalid inputs.

7.3 User Experience

Frontend Design:

The Streamlit-based interface is clean and user-friendly, ensuring ease of use for
both technical and non-technical users.

The MCQs and review results are displayed in an organized table format, enhancing
clarity.

User Feedback:

The application’s ability to customize quizzes and provide detailed reviews has
been positively received.

Users appreciated the accuracy of MCQs and the seamless integration of file
processing.

CHAPTER 8

CONCLUSION, LIMITATION AND FUTURE SCOPE

8.1 Conclusion

The MCQ Generator project successfully demonstrates the power of generative AI, particularly
using OpenAI’s GPT models and LangChain for natural language processing. It leverages these
technologies to automatically generate multiple-choice questions (MCQs) from a given topic,
with configurable difficulty levels and a customizable number of questions. The integration of
Streamlit allows the project to be easily deployed as a user-friendly web application, where
users can input their desired topic and parameters and receive immediate feedback in the form
of MCQs.

This project is a prime example of the potential of generative AI in automating educational content creation. By utilizing cutting-edge models like OpenAI’s GPT, the MCQ Generator
provides a powerful tool for educators, content creators, and students. With the ability to
generate MCQs on any topic and adapt to various levels of difficulty, this tool can significantly
reduce the time and effort involved in creating quizzes and exams.

Key Features of the MCQ Generator:

Topic Flexibility: The system can generate questions on virtually any topic, making it versatile
for a wide range of educational fields.

Difficulty Control: The user can adjust the difficulty of the generated MCQs, ensuring that the
questions are suited to the learning level of the audience.

Ease of Use: Through a simple web interface built using Streamlit, users can generate MCQs
by simply entering the topic and choosing the desired parameters.

Integration with OpenAI’s GPT: The backend logic is powered by LangChain and OpenAI,
ensuring high-quality and relevant MCQs.

The project has been implemented successfully, and it demonstrates the potential of leveraging
AI to create educational content automatically. By using advanced AI models and cloud
technologies, the system is scalable and adaptable, able to handle increasing amounts of data
and user interactions in the future.

8.2 Limitations

While the MCQ Generator provides a solid foundation for automating question creation, there
are several limitations and challenges that have been encountered or that may arise in future
iterations:

8.2.1 Quality of Questions

Although the OpenAI GPT model is powerful, the quality of the generated questions can
sometimes be inconsistent. Depending on the complexity of the topic or the way it is framed,
the model might generate questions that:

Lack clarity or precision

Include irrelevant options

Have errors in the formatting or structure of the questions

This is particularly true for topics with ambiguous or complex concepts, where the model may
fail to fully capture the nuances.

8.2.2 Limited to Textual Data

The current implementation is limited to generating MCQs based on text input. It does not yet
support generating questions from other media formats such as images, videos, or audio.
Integrating multimodal data processing would expand the capabilities of the MCQ generator
significantly.

8.2.3 API Dependency

The project heavily depends on the OpenAI API, which introduces certain limitations:

Cost: As OpenAI’s API is a paid service, generating a large number of MCQs or running the
application frequently may result in high costs.

Rate Limits: OpenAI imposes rate limits on the number of requests that can be made within a
certain time frame. If the application receives high traffic, it might face restrictions on
generating MCQs in real time.

8.2.4 Complexity in Handling Large Inputs

For more advanced or larger topics, the model might struggle to generate meaningful MCQs
that are both relevant and accurate. Very broad or detailed topics can result in poorly formatted
or overly generic questions.

8.2.5 Limited Customization

While the MCQ generator allows users to select the number of questions and the difficulty
level, further customization is limited. For example, users cannot currently:

Specify the format of questions (e.g., fill-in-the-blank, true/false)

Define the number of options (currently fixed at four)

Add explanatory answers or solutions to the generated questions

8.3 Future Scope

The MCQ Generator project has several exciting opportunities for improvement and expansion.
These enhancements could improve the user experience, increase flexibility, and address some
of the limitations discussed earlier. Here are some possible directions for future development:

8.3.1 Enhanced Question Quality

Improving the quality of the generated questions is crucial. This can be achieved through
several strategies:

Fine-tuning the GPT model: By training the model on a specific dataset of high-quality MCQs,
the generator could produce more relevant and accurate questions.

Post-processing of generated questions: Implementing a review and correction mechanism after the questions are generated can improve the final output, for example by using rule-based filters to remove irrelevant options or malformed questions (a sketch of such a filter follows this list).

Crowdsourcing Feedback: Allowing users to rate the quality of generated questions could help
improve the system by providing feedback that can be used to fine-tune the models.
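A sketch of such a rule-based filter is shown below. It assumes each MCQ is a dict with "question", "options", and "answer" keys, matching the JSON structure shown in Chapter 5; generated_mcqs is a hypothetical input list:

def is_well_formed(mcq: dict) -> bool:
    options = mcq.get("options", [])
    return (
        bool(mcq.get("question", "").strip())  # non-empty question stem
        and len(options) == 4                  # exactly four options
        and len(set(options)) == len(options)  # no duplicate options
        and mcq.get("answer") in options       # answer must be one of the options
    )

generated_mcqs = [
    {"question": "What is the capital of France?",
     "options": ["Paris", "London", "Berlin", "Madrid"],
     "answer": "Paris"},
]
valid_mcqs = [q for q in generated_mcqs if is_well_formed(q)]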

8.3.2 Multimodal Question Generation

Expanding the scope to handle multimedia input is a valuable feature. For example:

Images: Allowing users to upload images and generating MCQs based on the visual content
could be useful for subjects like biology, geography, and art.

Videos and Audio: Generating questions from videos or audio clips would make the tool even
more versatile, especially for subjects that require interpreting media such as history,
languages, or literature.

8.3.3 Integrating More Question Formats

To increase the flexibility of the system, additional question types could be added:

True/False questions

Fill-in-the-blank questions

Matching questions

Short-answer questions

This would allow educators to customize the questions more effectively for different types of
assessments.

8.3.4 Advanced User Personalization

User personalization could be improved by integrating machine learning to adjust question generation based on a user’s past interactions.

For example:

Difficulty Adjustment: The system could learn the user’s proficiency level over time and
generate MCQs that are increasingly difficult.

Topic-based Adaptation: By tracking the user’s preferences or performance in different topics, the system could provide more targeted and relevant questions.

8.3.5 Expand to Other Educational Content

In addition to generating MCQs, the system could be expanded to generate other educational
materials:

Flashcards: Generate flashcards from a given topic, with questions on one side and answers on
the other.

Explanatory Content: Generate detailed explanations or notes for the MCQs, which could be
useful for self-learning or teaching purposes.

8.3.6 Open Source and Community Contributions

Another future direction would be to open-source the project, allowing the community to
contribute:

Custom Prompt Templates: Users could submit their own templates for generating MCQs in
different formats or for specific subject areas.

Additional Features: The community could help add new features, such as a bulk MCQ
generation tool, integration with learning management systems (LMS), or support for multiple
languages.

8.3.7 Integration with Other Platforms

Learning Management Systems (LMS): Integrating the MCQ Generator with popular LMS
platforms like Moodle or Blackboard would allow teachers to automatically generate quizzes
for students based on course content.

AI-Assisted Tutoring: The MCQ Generator could be integrated into AI tutoring systems, where
it would not only generate questions but also evaluate student responses in real-time and
provide feedback.

8.4 Conclusion

In conclusion, the MCQ Generator is a powerful tool that leverages OpenAI’s GPT models and
LangChain for generating educational content, specifically MCQs, with flexibility in topic
selection, difficulty levels, and the number of questions. Despite some limitations regarding
question quality, customization, and multimedia support, the system presents a promising
solution for educators, content creators, and learners. With further development in areas like
multimodal content, advanced personalization, and integration with other platforms, the MCQ
Generator could evolve into an even more robust tool for the education sector.

This project serves as a starting point, and with continued improvements, it could revolutionize
the way educational content is generated, making it faster, more accessible, and customizable
for learners worldwide.

REFERENCES

1. Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., & McCandlish, S. (2020). Language Models are Few-Shot Learners. Advances in Neural Information Processing Systems, 33, 1877-1901.

2. Sun, J., Chen, S., & Chu, W. (2019). An Automated Approach to Multiple-Choice Question
Generation from Text. Journal of Educational Technology & Society, 22(1), 42-53.

3. Van Der Linden, W. J., & Glas, C. A. W. (2018). Elements of Adaptive Testing. Springer.

4. Kumar, R., & Chattopadhyay, S. (2021). Enhancing Educational Assessment with AI: An Overview
of Automated Question Generation. Artificial Intelligence in Education, 12(2), 132-146.

5. LangChain. (2022). LangChain: Building Applications with Large Language Models. Available at: https://langchain.com

6. Harasim, L. (2017). Learning Theory and Online Technologies. Routledge.

7. Molenda, M., & Reigeluth, C. M. (2016). The Role of Technology in Education Reform.
Educational Technology Research and Development, 44(3), 31-53.

CONTRIBUTION OF PROJECT

1. Objective and Relevance of Project

• The main objective of our project is to give educators an easy way to generate multiple-choice questions automatically from their own study material.

• It eases the work of teachers and content creators by reducing the time and effort required for manual question design.

• It also helps students by providing practice questions tailored to the content they are studying.

2. Expected Outcome

• The outcome of our project is a tool that generates contextually relevant, quality-checked MCQs from uploaded text or PDF content, with a configurable number of questions, subject, and difficulty level.

3. Social Relevance

• Our project helps educators and learners create and access assessment material quickly and easily, making it useful to society in many educational settings.
