Documentation
ABSTRACT
“College Enquiry Chatbot with Flask” introduces a chatbot designed to streamline college-related inquiries. Built with Flask, this AI-driven platform integrates natural language processing to offer accurate and accessible information to users. The interactive interface provides details on admission criteria, courses, campus facilities, and events. The chatbot serves as an intuitive and reliable resource for individuals seeking college-related information, fostering informed decision-making for prospective students. The answers are appropriate to what the user queries, and users can ask about any college-related activity through the system without having to visit the college in person for an enquiry. The system analyses each question and then answers the user. The chatbot’s development combines AI algorithms with a user-friendly interface to enhance accessibility and simplify the acquisition of precise and pertinent college details.
TABLE OF CONTENTS
CHAPTER TITLE
1. INTRODUCTION
1.1 GENERAL
1.2 OBJECTIVES
1.3 EXISTING SYSTEM
1.3.1 DISADVANTAGES OF EXISTING
SYSTEM
1.4 PROPOSED SYSTEM
1.4.1 ADVANTAGES OF PROPOSED
SYSTEM
2. SYSTEM REQUIREMENTS
2.1 GENERAL
2.2 HARDWARE REQUIREMENT
2.3 SOFTWARE REQUIREMENT
3. SYSTEM DESIGN
3.1 GENERAL
3.2 USE CASE DIAGRAM
3.3 DATA FLOW DIAGRAM
3.4 PROCESS FLOW DIAGRAM
3.5 SEQUENCE DIAGRAM
3.6 ACTIVITY DIAGRAM
3.7 COLLABORATION DIAGRAM
4. SOFTWARE SPECIFICATION
4.1 GENERAL
5. MODULES
6. ALGORITHMS AND METHODS USED
6.1 GENERAL
7. TESTING
8. DATABASE IMPLEMENTATION
9. INTERFACE
1. INTRODUCTION
1.1 GENERAL
The College Enquiry Chatbot is intended for college students, staff, and parents, offering an easy and time-saving way to interact with the college. This project is mainly targeted at colleges and at synchronizing the sparse and diverse information regarding the regular college schedule. Students often face problems in getting the correct notifications at the correct time; important notices such as campus interviews, training and placement events, holidays, and special announcements are sometimes missed. Smart Campus tries to bridge this gap between students, teachers, and college administrators.
The College bot project is built using artificial intelligence algorithms that analyse users’ queries and understand their messages. The system is a web application that answers students’ queries. Students simply type their query into the chat interface; there is no specific format the user has to follow. The system uses built-in artificial intelligence to answer the query, and the answers are appropriate to what the user asks. Users can ask about any college-related activity through the system and do not have to visit the college in person for an enquiry. The system analyses the question and then answers the user as if the answer were given by a person.
1.2 OBJECTIVE
The objective of this project is to provide an AI-driven chatbot that gives accurate and accessible answers to college-related queries such as admission criteria, courses, campus facilities, and events, so that users do not have to visit the college in person for an enquiry.
1.3 EXISTING SYSTEM
The current process for college inquiries involves manually browsing college websites or contacting admission offices directly for information. Some colleges use basic chatbots, but these systems often lack comprehensive capabilities. Overall, the existing process relies on manual navigation and human interaction, which can be time-consuming and less interactive for individuals seeking college-related details.
1.3.1 DISADVANTAGES OF EXISTING SYSTEM
Manual Navigation
Limited Interaction
Dependency on Human Assistance
Limited Accessibility
2. SYSTEM REQUIREMENTS
2.1 GENERAL
1. Hardware Requirements
2. Software Requirements
The hardware requirements may serve as the basis for a contract for the implementation of the system and should therefore be a complete and consistent specification of the whole system. They are used by software engineers as the starting point for the system design. They show what the system should do, not how it should be implemented.
The software requirements are the specification of the system. They should include both a definition and a specification of requirements. They describe what the system should do rather than how it should do it. The software requirements provide a basis for creating the software requirements specification. They are useful in estimating cost, planning team activities, performing tasks, and tracking the team’s progress throughout the development activity.
1. Data Set
2. Python
3. SYSTEM DESIGN
3.1 GENERAL
System Design deals with the various UML (Unified Modelling Language) diagrams for the implementation of the project. Design is a meaningful engineering representation of a thing that is to be built. Software design is a process through which the requirements are translated into a representation of the software. Design is the place where quality is rendered in software engineering. Design is the means to accurately translate customer requirements into a finished product.
System design is the process of defining the components, modules, interfaces, and data
for a system to satisfy specified requirements. System development is the process of creating
or altering systems, along with the processes, practices, models, and methodologies used to
develop them. System Requirements are the necessary specifications your computer must
have in order to use the software or hardware.
Architectural design
The architectural design of a system emphasizes the design of the system architecture, which describes the structure, behaviour, and other views of that system.
Logical design
The logical design of a system pertains to an abstract representation of the data flows, inputs, and outputs of the system. This is often conducted via modelling, using an over-abstract (and sometimes graphical) model of the actual system. In the context of systems, logical design includes entity-relationship diagrams (ER diagrams).
Physical design
The physical design relates to the actual input and output processes of the system.
This is explained in terms of how data is input into a system, how it is verified or
authenticated, how it is processed, and how it is displayed.
In physical design, the following requirements about the system are decided.
1. Input requirement,
2. Output requirements,
3. Storage requirements,
4. Processing requirements,
5. System control and backup or recovery.
Thus, the physical portion of system design can generally be broken down into three subtasks:
1. User Interface Design
2. Data Design
3. Process Design
User Interface Design is concerned with how users add information to the system and with how the system presents information back to them.
Data Design is concerned with how the data is represented and stored within the system.
Process Design is concerned with how data moves through the system, and with how and where it is validated, secured and/or transformed as it flows into, through, and out of the system.
Use case diagrams overview the usage requirements for a system. They are useful for presentations to management and/or project stakeholders, but for actual development you will find that use cases provide significantly more value because they describe the meaning of the actual requirements. A use case describes a sequence of actions that provide something of measurable value to an actor and is drawn as a horizontal ellipse.
A key concept of use case modelling is that it helps us design a system from the end
user's perspective. It is an effective technique for communicating system behaviour in the
user's terms by specifying all externally visible system behaviour.
Use case: A use case describes a sequence of actions that provides something of measurable value to an actor and is drawn as a horizontal ellipse.
Actor: An actor is a person, organization, or external system that plays a role in one or more interactions with the system.
Use Case Diagram
In a DFD, processes transform input data into output, which is then sent to other processes or
stored in data repositories. The diagram helps to understand the system’s functionality and the
relationship between different entities, processes, and data stores. DFDs are used to analyze
and design systems, providing a clear map of how data is processed and exchanged within the
system.
Elements that may be included are: sequence of actions, materials or services entering or
leaving the process (inputs and outputs), decisions that must be made, people who become
involved, time involved at each step and/or process measurements.
Sequence diagrams model the flow of logic within your system in a visual manner, enabling you both to document and validate your logic, and are commonly used for both analysis and design purposes. Sequence diagrams are the most popular UML artefact for dynamic modelling, which focuses on identifying the behaviour within your system.
4. SOFTWARE SPECIFICATION
4.1 GENERAL
The chatbot assists students and visitors with inquiries related to admissions, courses, fees, faculty, campus facilities, and more. It is built using Flask (a Python web framework) and integrates with a database and, possibly, an NLP engine.
4.2 PYTHON
Python is Interpreted: Python is processed at runtime by the interpreter. You do not need to
compile your program before executing it. This is similar to PERL and PHP.
Python is Interactive: You can actually sit at a Python prompt and interact with the
interpreter directly to write your programs.
Python is Object-Oriented: Python supports an object-oriented style or technique of programming that encapsulates code within objects.
Python is a Beginner's Language: Python is a great language for beginner-level programmers and supports the development of a wide range of applications, from simple text processing to WWW browsers to games.
History of Python
Python was developed by Guido van Rossum in the late eighties and early nineties at
the National Research Institute for Mathematics and Computer Science in the Netherlands.
Python is derived from many other languages, including ABC, Modula-3, C, C++, Algol-68,
Smalltalk, Unix shell, and other scripting languages. Python is copyrighted and, like Perl, its source code is available under an open-source licence (the GPL-compatible Python Software Foundation License). Python is maintained by a core development team, with Guido van Rossum long holding a vital role in directing its progress.
Python Features
Easy-to-learn: Python has few keywords, simple structure, and a clearly defined syntax.
This allows the student to pick up the language quickly.
Easy-to-read: Python code is more clearly defined and visible to the eyes.
Easy-to-maintain: Python's source code is fairly easy-to-maintain.
A broad standard library: Python's bulk of the library is very portable and cross-
platform compatible on UNIX, Windows, and Macintosh.
Interactive Mode: Python has support for an interactive mode which allows interactive
testing and debugging of snippets of code.
Portable: Python can run on a wide variety of hardware platforms and has the same
interface on all platforms.
Extendable: You can add low-level modules to the Python interpreter. These modules
enable programmers to add to or customize their tools to be more efficient.
Databases: Python provides interfaces to all major commercial databases.
GUI Programming: Python supports GUI applications that can be created and ported to
many system calls, libraries, and windows systems, such as Windows MFC, Macintosh, and
the X Window system of Unix.
Scalable: Python provides a better structure and support for large programs than shell
scripting.
Apart from the features mentioned above, Python has a long list of good features; a few are listed below:
It supports functional and structured programming methods as well as OOP.
It can be used as a scripting language or can be compiled to byte-code for building large applications.
It provides very high-level dynamic data types and supports dynamic type checking.
It supports automatic garbage collection.
It can be easily integrated with C, C++, COM, ActiveX, CORBA, and Java.
Python is available on a wide variety of platforms, including Linux and Mac OS X. Let us understand how to set up our Python environment.
4.3.1 Flask:
Flask is a lightweight and versatile web framework for Python, known for its
simplicity and flexibility. It provides developers with the tools to build web applications
quickly and efficiently. Flask follows the WSGI (Web Server Gateway Interface) protocol and
is based on the Werkzeug WSGI toolkit and the Jinja2 template engine. With Flask,
developers can create web applications ranging from simple APIs to complex full-fledged web
applications. Its minimalistic design allows for easy customization and integration with other
Python libraries and frameworks, making it a popular choice for web development projects of
all sizes.
Importing Flask: Start by importing the Flask class from the flask package.

from flask import Flask

Creating an instance: Create an instance of the Flask class. This instance will be the WSGI application.

app = Flask(__name__)

Defining routes: Define routes to specify how the application responds to different URLs. Routes are defined using the @app.route() decorator.

@app.route('/')
def index():
    return 'Hello, World!'
Running the application: Finally, run the Flask application using the run() method.

if __name__ == '__main__':
    app.run(debug=True)

Putting it all together, a basic Flask application structure looks like this:

from flask import Flask

app = Flask(__name__)

@app.route('/')
def index():
    return 'Hello, World!'

if __name__ == '__main__':
    app.run(debug=True)
This structure creates a simple Flask application with a single route (/) that returns the string "Hello, World!" when accessed. The debug=True parameter in app.run() enables debug mode, which provides helpful error messages during development.
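Building on this structure, a chatbot endpoint for this project could be sketched as follows. This is only a hedged illustration: the /chat route, the get_bot_response() helper, and the JSON field names are assumptions made for the example, not the project's actual code.

from flask import Flask, request, jsonify

app = Flask(__name__)

def get_bot_response(message):
    # Placeholder for the NLP-based answer lookup described in the next sections.
    return "Admissions for the next academic year open in June."

@app.route('/chat', methods=['POST'])
def chat():
    # Read the user's message from the JSON body and return the bot's reply.
    data = request.get_json(silent=True) or {}
    user_message = data.get('message', '')
    return jsonify({'reply': get_bot_response(user_message)})

if __name__ == '__main__':
    app.run(debug=True)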
4.3.2 NLP:
Natural Language Processing (NLP) is a field of artificial intelligence (AI) that
focuses on the interaction between computers and human language. It involves the
development of algorithms and models that enable computers to understand,
interpret, and generate human language in a meaningful way.
Text Understanding: Extracting meaning from unstructured text data, such as sentiment
analysis, named entity recognition, and topic modeling.
Text Generation: Creating human-like text, such as language translation, text summarization,
and chatbots.
Text Classification: Categorizing text documents into predefined categories or labels, such as
spam detection, sentiment analysis, and topic classification.
NLP techniques often involve a combination of machine learning, deep learning, and
linguistic principles to analyze and understand human language. With the growing volume of
textual data available online, NLP plays a crucial role in various applications, including search
engines, social media analysis, customer service automation, healthcare, finance, and more.
Advances in NLP have led to significant improvements in the accuracy and efficiency of
language-related tasks, driving innovation in many industries.
4.3.3 NLTK:
NLTK, or the Natural Language Toolkit, is a leading platform for building Python
programs to work with human language data. It provides easy-to-use interfaces and
libraries for tasks such as tokenization, stemming, tagging, parsing, and semantic
reasoning. Developed by researchers and educators in the field of computational
linguistics, NLTK is widely used in academia and industry for teaching, research,
and development of NLP applications.
Tokenization: NLTK offers tools for breaking text into tokens, such as words or
sentences, facilitating further analysis and processing.
Named Entity Recognition (NER): NLTK includes modules for identifying named
entities (e.g., person names, locations, organizations) in text, which is essential for
tasks like information extraction and entity linking.
Text Classification: NLTK provides tools for building and evaluating text
classification models, which are used for tasks like sentiment analysis, spam
detection, and topic classification.
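A minimal sketch of these NLTK tasks is shown below. The resource names passed to nltk.download() are assumptions based on standard NLTK packaging, and the sample sentence is invented for illustration.

import nltk

# One-time download of the tokenizer, tagger and chunker models.
for resource in ('punkt', 'averaged_perceptron_tagger', 'maxent_ne_chunker', 'words'):
    nltk.download(resource, quiet=True)

text = "The admission office of ABC College is located in Chennai."
tokens = nltk.word_tokenize(text)   # tokenization
tags = nltk.pos_tag(tokens)         # part-of-speech tagging
entities = nltk.ne_chunk(tags)      # named entity recognition

print(tokens)
print(entities)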
4.4 ANACONDA NAVIGATOR
The simplest way is with Spyder. From the Navigator Home tab, click Spyder, and
write and execute your code. You can also use Jupyter Notebooks the same way. Jupyter
Notebooks are an increasingly popular system that combine your code, descriptive text,
output, images and interactive interfaces into a single notebook file that is edited, viewed and
used in a web browser.
Features of Python
Python is a simple language that is easy to learn, with a very simple and elegant syntax. It is much easier to read and write Python programs compared to other languages like C++, Java, and C#. Python makes programming fun and allows you to focus on the solution rather than the syntax. If you are a newbie, it is a great choice to start your journey with.
● Portability
You can move Python programs from one platform to another, and
run it without any changes. It runs seamlessly on almost all platforms including Windows,
Mac OS X and Linux.
● Object-oriented
Everything in Python is an object. Object-oriented programming (OOP) helps you solve complex problems intuitively. With OOP, you are able to divide a complex problem into smaller sets by creating objects.
History and Versions:
Python is predominantly a dynamically typed programming language which was initiated by
Guido van Rossum in the year 1989. The major design philosophy that was given more
importance was the readability of the code and expressing an idea in fewer lines of code rather
than the verbose way of expressing things as in C++ and Java [K-8][K-9]. The other design
philosophy that was worth mentioning was that, there should be always a single way and a
single obvious way to express a given task which is contradictory to other languages such as
C++, Perl etc. [K-10]. Python compiles to an intermediary code and this in turn is interpreted
by the Python Runtime Environment to the Native Machine Code. The initial versions of
Python were heavily inspired by Lisp (for its functional programming constructs). Python also borrowed its module system, exception model, and keyword arguments from the Modula-3 language [K-10]. Python's developers strive not to entertain premature optimization, even though it might increase performance by a few basis points [K-9]. During its design, the creators conceptualized the language as being very extensible, and hence they designed it to have a small core library extended by a huge standard library [K-7]. Thus, as a result, Python is widely used as a scripting language, as it can easily be embedded into any application, though it can also be used to develop a full-fledged application.
There are a few advantages in using a dynamically typed language, the most prominent being that the code is more readable because there is less code (in other words, less boiler-plate code). But the main disadvantage of Python being a dynamic programming language is that there is no way to guarantee that a particular piece of code will run successfully for all the different data-type scenarios simply because it has run successfully with one type. Basically, we have no means of finding an error in the code until the code has started running.
Strengths, Weaknesses and Application Areas:
Python is predominantly used as a scripting language alongside standalone applications developed in statically typed languages, because of the flexibility its dynamically typed nature provides. Python favours rapid application development, which qualifies it for prototyping. To a certain extent, Python is also used in developing websites. Due to its dynamic typing and the presence of a virtual machine, there is a considerable overhead, which translates to far lower performance compared with native programming languages [K-13]; hence it is not well suited to performance-critical applications.
4.6 NUMPY
NumPy is the fundamental package for scientific computing in Python. It is a Python
library that provides a multidimensional array object, various derived objects (such as masked
arrays and matrices), and an assortment of routines for fast operations on arrays, including
mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms,
basic linear algebra, basic statistical operations, random simulation, and much more. At the core of the NumPy package is the ndarray object. This encapsulates n-dimensional arrays of homogeneous data types, with many operations being performed in compiled code for performance.
There are several important differences between NumPy arrays and the standard
Python sequences:
• NumPy arrays have a fixed size at creation, unlike Python lists (which can grow
dynamically). Changing the size of an array will create a new array and delete the original.
• The elements in a NumPy array are all required to be of the same data type, and thus will be
the same size in memory. The exception: one can have arrays of (Python, including NumPy)
objects, thereby allowing for arrays of different sized elements.
• NumPy arrays facilitate advanced mathematical and other types of operations on large
numbers of data. Typically, such operations are executed more efficiently and with less code
than is possible using Python’s built-in sequences.
• A growing plethora of scientific and mathematical Python-based packages are using NumPy
arrays; though these typically support Python sequence input, they convert such input to
NumPy arrays prior to processing, and they often output NumPy arrays.
In other words, in order to efficiently use much (perhaps even most) of today’s
scientific/mathematical Python-based software, just knowing how to use Python’s built-in
sequence types is insufficient - one also needs to know how to use NumPy arrays. The points
about sequence size and speed are particularly important in scientific computing. As a simple
example, consider the case of multiplying each element in a 1-D sequence with the
corresponding element in another sequence of the same length. If the data are stored in two
Python lists, a and b, we could iterate over each element:
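Completing the example, a short sketch of the loop over two lists and its NumPy equivalent (the variable names follow the text; the values are invented):

import numpy as np

a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]

# Pure-Python loop: multiply element by element.
c = []
for i in range(len(a)):
    c.append(a[i] * b[i])

# NumPy equivalent: the multiplication runs element-wise in compiled code.
c_np = np.array(a) * np.array(b)

print(c)
print(c_np)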
• A concrete data set makes explaining the behaviour of some functions much easier to
motivate than simply talking about abstract operations on abstract data sets.
• Every reader will have at least an intuition as to the meaning of the data and the organization of image files.
• The result of various manipulations can be displayed simply since the data set has a natural
graphical representation.
All users of NumPy, whether interested in image processing or not, are encouraged to
follow the tutorial with a working NumPy installation at their side, testing the examples, and,
more importantly, transferring the understanding gained by working on images to their
specific domain. The best way to learn is by doing – the aim of this tutorial
is to guide you along this “doing.”
NLTK (Natural Language Toolkit) is a Python library widely used for natural language
processing (NLP) tasks. It provides various tools and resources for tasks such as tokenization,
stemming, tagging, parsing, and semantic reasoning. NLTK is popular among researchers,
educators, and developers for its extensive collection of text processing modules and its ease
of use. It supports numerous corpora, lexical resources, and algorithms, making it a
valuable tool for analyzing and processing human language data in Python-based NLP
projects.
5. Modules
5.1 Pre-processing
Each sentence extracted from a given article is parsed using the LTP software. Specifically, in this step our system performs word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, and semantic role labelling. This information is essential for sentence simplification and question generation, described next.
In this stage, the simplified declarative sentences derived in stage 1 are transformed into a set of questions based on the predefined question generation rules shown in Table 2. A key subtask of question generation is target content selection, i.e. deciding what target content the question is asking about. In our case, we identify answer phrases in the input declarative sentence as potential targets to generate questions about. A question is generated by using an interrogative pronoun to replace the target answer phrase in the declarative sentence. Unlike conventional question generation in English, this approach does not require subject-auxiliary inversion or verb decomposition; in this respect, the question generation process is simpler.
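As a purely illustrative sketch of this replacement rule (the rule table, answer-type labels, and sample sentence below are invented for the example and are not the predefined rules of Table 2):

def generate_question(sentence, answer_phrase, answer_type):
    # Map an answer type to an interrogative pronoun (hypothetical rule table).
    pronouns = {'PERSON': 'who', 'LOCATION': 'where', 'TIME': 'when'}
    pronoun = pronouns.get(answer_type, 'what')
    # Replace the target answer phrase with the pronoun, as described above.
    return sentence.rstrip('.').replace(answer_phrase, pronoun) + '?'

print(generate_question(
    "The placement cell is located in the main administrative block.",
    "in the main administrative block", 'LOCATION'))
# -> The placement cell is located where?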
The previous stages generate questions that vary in their quality with respect to syntax,
semantics or importance. This is unavoidable and happens for different reasons, such as errors
in sentence parsing, named entity recognition, and sentence simplification. To address this
problem, ranking the large pool of questions according to their quality is needed. Stage 3 in
our method implements a learning to rank algorithm to meet this challenge.
6. ALGORITHMS AND METHODS USED
6.1 GENERAL
The approach used for obtaining accurate results in this project is machine learning.
MACHINE LEARNING
Machine learning is a system that can learn from examples through self-improvement, without being explicitly coded by a programmer. The breakthrough comes with the idea that a machine can learn on its own from the data (i.e., examples) to produce accurate results. Machine learning combines data with statistical tools to predict an output. This output is then used by businesses to derive actionable insights. Machine learning is closely related to data mining and Bayesian predictive modelling. The machine receives data as input and uses an algorithm to formulate answers.
A typical machine learning task is to provide a recommendation. For those who have a Netflix account, all recommendations of movies or series are based on the user's historical data. Tech companies use unsupervised learning to improve the user experience with personalized recommendations.
Machine learning is also used for a variety of tasks such as fraud detection, predictive maintenance, portfolio optimization, task automation, and so on.
Machine learning can be grouped into two broad learning tasks: supervised and unsupervised. There are many other algorithms as well.
SUPERVISED LEARNING
An algorithm uses training data and feedback from humans to learn the relationship of given
inputs to a given output. For instance, a practitioner can use marketing expense and weather
forecast as input data to predict the sales of cans. You can use supervised learning when the
output data is known. The algorithm will predict new data.
There are two categories of supervised learning:
● Classification task
● Regression task
Classification
Imagine you want to predict the gender of a customer for a commercial. You will start
gathering data on the height, weight, job, salary, purchasing basket, etc. from your customer
database. You know the gender of each of your customer, it can only be male or female. The
objective of the classifier will be to assign a probability of being a mall or a female (i.e., the
label) based on the information (i.e., features you have collected). When the model learned
how to recognize male or female, you can use new data to make a prediction. For instance,
you just got new information from an unknown customer, and you want to know if it is a male
or female. If the classifier predicts male = 70%, it means the algorithm is sure at 70% that this
customer is a male, and 30% it is a female. The label can be of two or more classes. The
above example has only two classes, but if a classifier needs to predict object, it has dozens or
classes (e.g., glass, table, shoes, etc. each object represents a class)
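A minimal sketch of such a classifier, assuming the scikit-learn library is available (the tiny data set and feature choice are invented for the example):

from sklearn.linear_model import LogisticRegression

# Features: [height in cm, weight in kg]; labels: 1 = male, 0 = female.
X = [[180, 85], [175, 80], [160, 55], [165, 60], [185, 90], [158, 52]]
y = [1, 1, 0, 0, 1, 0]

model = LogisticRegression()
model.fit(X, y)

# Probability that a new, unknown customer is female (index 0) or male (index 1).
new_customer = [[170, 72]]
proba = model.predict_proba(new_customer)[0]
print("female: {:.0%}, male: {:.0%}".format(proba[0], proba[1]))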
Regression
When the output is a continuous value, the task is a regression. For instance, a financial analyst may need to forecast the value of a stock based on a range of features such as equity, previous stock performance, and macroeconomic indices. The system is trained to estimate the price of the stock with the lowest possible error.
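A corresponding regression sketch, again assuming scikit-learn (the feature values are invented for illustration):

from sklearn.linear_model import LinearRegression

# Features: [equity, previous closing price, macroeconomic index]; target: stock price.
X = [[10.0, 95.0, 1.2], [12.5, 101.0, 1.3], [11.0, 98.5, 1.1], [13.0, 104.0, 1.4]]
y = [96.0, 102.5, 99.0, 105.5]

model = LinearRegression().fit(X, y)
# Estimate the price for a new set of feature values (a continuous output).
print(model.predict([[12.0, 100.0, 1.25]]))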
UNSUPERVISED LEARNING
In unsupervised learning, an algorithm explores input data without being given an explicit output variable (e.g., it explores customer demographic data to identify patterns).
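A minimal unsupervised-learning sketch in the same spirit, clustering invented customer demographic data with k-means (scikit-learn assumed):

from sklearn.cluster import KMeans

# Features: [age, annual income in thousands]; no output labels are provided.
X = [[22, 25], [25, 28], [47, 80], [52, 85], [23, 27], [50, 82]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)  # cluster assignment discovered for each customer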