
College Enquiry Chatbot with Flask

ABSTRACT
“College Enquiry Chatbot with Flask” introduces an advanced chatbot designed to
streamline college-related inquiries. Utilizing Flask, an AI-driven platform integrates natural
language processing to offer accurate and accessible information to users.This interactive
interface aims to provide details on admission criteria, courses, campus facilities, and events.
This Chatbot serves as an intuitive and reliable resource for individuals seeking college-
related information, fostering informed decision-making for prospective students. The answers
are appropriate to what the user queries. The User can query any college-related activities
through the system. The user does not have to personally go to the college for enquiry. The
System analyses the question and then answers to the user. The chatbot’s development
involves AI algorithms within a user-friendly interface to enhance accessibility and simplify
the acquisition of precise and pertinent college details.
TABLE OF CONTENTS
CHAPTER TITLE
1. INTRODUCTION

1.1 GENERAL
1.2 OBJECTIVES
1.3 EXISTING SYSTEM
1.3.1 DISADVANTAGES OF EXISTING
SYSTEM
1.4 PROPOSED SYSTEM
1.4.1 ADVANTAGES OF PROPOSED
SYSTEM

2. SYSTEM REQUIREMENTS

2.1 GENERAL
2.2 HARDWARE REQUIREMENT
2.3 SOFTWARE REQUIREMENT

3. SYSTEM DESIGN

3.1 GENERAL
3.2 USE CASE DIAGRAM
3.3 DATA FLOW DIAGRAM
3.4 PROCESS FLOW DIAGRAM
3.5 SEQUENCE DIAGRAM
3.6 ACTIVITY DIAGRAM
3.7 COLLABORATION DIAGRAM

4. SOFTWARE SPECIFICATION

4.1 GENERAL

5. MODULES

6. ALGORITHMS AND METHODS USED

6.1 GENERAL
7. TESTING

8. DATABASE IMPLEMENTATION
9. INTERFACE

10. SOURCE CODE

1. INTRODUCTION
1.1 GENERAL

The College Enquiry Chatbot is intended for college students, staff, and parents, offering an easy and time-saving way to interact with the college. This project is mainly targeted at colleges and at synchronizing all the sparse and diverse information regarding the regular college schedule. Students often face problems in getting the right notifications at the right time; important notices such as campus interviews, training and placement events, holidays, and special announcements are sometimes missed. The system tries to bridge this gap between students, teachers, and college administrators.

The College bot project is built using artificial intelligence algorithms that analyse the user's queries and understand the user's message. The system is a web application that provides answers to students' queries. Students simply ask questions through the chat interface; there is no specific format the user has to follow. The system uses built-in artificial intelligence to answer the query, and the answers are appropriate to what the user asks. The user can enquire about any college-related activity through the system, without having to go to the college in person. The system analyses each question and then answers the user, as if the reply were given by a person.

1.2 OBJECTIVE

This project provides many features, such as:

 Provide Instant Responses: Offer quick and accurate responses to common queries from students, parents, and staff.
 Streamline Enquiries: Reduce the workload of college administrators by automating responses to frequent enquiries.
 Enhance User Experience: Provide a user-friendly interface for users to interact with the college, improving overall satisfaction.

1.3 EXISTING SYSTEM

The current system for college inquiries involves manually browsing college websites or communicating directly with admission offices. Some colleges use basic chatbots, but these systems often lack comprehensive capabilities. Overall, the existing process relies on manual navigation and human interaction, which can be time-consuming and less interactive for individuals seeking college-related details.
1.3.1 DISADVANTAGES OF EXISTING SYSTEM

The disadvantages of the existing system are:

 Manual Navigation
 Limited Interaction
 Dependency on Human Assistance
 Limited Accessibility

1.4 PROPOSED SYSTEM

The proposed system introduces an AI-powered chatbot using Flask to transform college inquiries. It focuses on providing an interactive, 24/7 accessible platform capable of delivering personalized and accurate information about admissions, courses, facilities, and events. By leveraging AI, ML, and NLP, it aims to streamline the process, enhance the user experience, and offer timely and relevant details for informed decision-making about colleges and courses.

1.4.1 ADVANTAGES OF PROPOSED SYSTEM

The advantages of the proposed system are:

 Real-Time and Accurate Information
 Interactive User Experience
 Personalized Responses
 Continuous Improvement
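To make the idea of automated, personalized replies concrete, the sketch below shows one very simple way such a chatbot could map a query to an answer. Every intent name, keyword, and canned response here is a hypothetical example, not the project's actual data:

```python
# Illustrative keyword-based intent matching for a college enquiry bot.
# All intents, keywords, and responses below are hypothetical examples.

INTENTS = {
    "admission": ["admission", "apply", "eligibility", "entrance"],
    "courses": ["course", "courses", "degree", "syllabus", "department"],
    "facilities": ["hostel", "library", "lab", "canteen"],
}

RESPONSES = {
    "admission": "Admissions open in June. Eligibility: 10+2 with 60% marks.",
    "courses": "We offer B.E., B.Tech., MBA and MCA programmes.",
    "facilities": "The campus has a hostel, a central library and labs.",
}

def match_intent(query: str) -> str:
    """Return the intent whose keywords best match the query."""
    words = query.lower().split()
    best, best_score = None, 0
    for intent, keywords in INTENTS.items():
        score = sum(1 for w in words if w in keywords)
        if score > best_score:
            best, best_score = intent, score
    return best or "fallback"

def answer(query: str) -> str:
    """Map a free-form query to a canned response via its intent."""
    return RESPONSES.get(match_intent(query),
                         "Sorry, I could not understand your query.")
```

A full NLP pipeline would replace the keyword counting with tokenization, stemming, and a trained classifier, but the overall query-to-intent-to-response flow stays the same.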
2. SYSTEM REQUIREMENTS
2.1 GENERAL

To be used efficiently, all projects need certain hardware components or other software resources to be present on a computer. These prerequisites are commonly known as system requirements. There are two types of requirements. They are:

1. Hardware Requirements

2. Software Requirements

2.2 HARDWARE REQUIREMENTS

The hardware requirements may serve as the basis for a contract for the implementation of the system and should therefore be a complete and consistent specification of the whole system. They are used by software engineers as the starting point for the system design. They state what the system should do, not how it should be implemented.

1. Windows 7, 8, or 10 (64-bit)

2. 4 GB RAM

2.3 SOFTWARE REQUIREMENTS

The software requirements are the specification of the system. They should include both a definition and a specification of requirements: a statement of what the system should do rather than how it should do it. The software requirements provide a basis for creating the software requirements specification, which is useful in estimating cost, planning team activities, performing tasks, and tracking the team's progress throughout the development activity.

1. Data Set
2. Python
3. SYSTEM DESIGN
3.1 GENERAL

System design deals with the various UML (Unified Modelling Language) diagrams for the implementation of the project. Design is a meaningful engineering representation of a thing that is to be built. Software design is a process through which the requirements are translated into a representation of the software. Design is where quality is rendered in software engineering, and it is the means to accurately translate customer requirements into a finished product.

System design is the process of defining the components, modules, interfaces, and data
for a system to satisfy specified requirements. System development is the process of creating
or altering systems, along with the processes, practices, models, and methodologies used to
develop them. System Requirements are the necessary specifications your computer must
have in order to use the software or hardware.

 Architectural design

The architectural design of a system emphasizes the design of the system architecture
that describes the structure, behaviour and more views of that system and analysis.

 Logical design

The logical design of a system pertains to an abstract representation of the data flows, inputs, and outputs of the system. This is often conducted via modelling, using an abstract (and sometimes graphical) model of the actual system. Logical design includes entity-relationship diagrams (ER diagrams).

 Physical design

The physical design relates to the actual input and output processes of the system.
This is explained in terms of how data is input into a system, how it is verified or
authenticated, how it is processed, and how it is displayed.

In physical design, the following requirements about the system are decided.

1. Input requirement,
2. Output requirements,
3. Storage requirements,
4. Processing requirements,
5. System control and backup or recovery.

Thus, the physical portion of system design can generally be broken down into three subtasks:
1. User Interface Design
2. Data Design
3. Process Design

 User Interface Design is concerned with how users add information to the system
and with how the system presents information back to them.
 Data Design is concerned with how the data is represented and stored within the
system.
 Process Design is concerned with how data moves through the system, and with how and where it is validated, secured, and/or transformed as it flows into, through, and out of the system.

3.2 USE CASE DIAGRAM

Use case diagrams give an overview of the usage requirements for a system. They are useful for presentations to management and/or project stakeholders, but for actual development you will find that use cases provide significantly more value, because they describe the meat of the actual requirements. A use case describes a sequence of actions that provide something of measurable value to an actor and is drawn as a horizontal ellipse.

A key concept of use case modelling is that it helps us design a system from the end
user's perspective. It is an effective technique for communicating system behaviour in the
user's terms by specifying all externally visible system behaviour.

A use case diagram is used to present a graphical overview of the functionality provided by a system in terms of actors, their goals, and any dependencies between those use cases.

A use case diagram consists of two parts:

Use case: A use case describes a sequence of actions that provides something of measurable value to an actor and is drawn as a horizontal ellipse.

Actor: An actor is a person, organization, or external system that plays a role in one or more interactions with the system.
Use Case Diagram

3.3 DATA FLOW DIAGRAM

A data-flow diagram (DFD) is a way of representing the flow of data through a process or a system (usually an information system). The DFD also provides information about the outputs and inputs of each entity and of the process itself. A data-flow diagram has no control flow: there are no decision rules and no loops. Specific operations based on the data can be represented by a flowchart. The data-flow diagram is part of the structured-analysis modelling tools. When using UML, the activity diagram typically takes over the role of the data-flow diagram. A special form of data-flow plan is a site-oriented data-flow plan.

Level 0 of Data Flow Diagram

Level 1 of Data Flow Diagram

In a DFD, processes transform input data into output, which is then sent to other processes or
stored in data repositories. The diagram helps to understand the system’s functionality and the
relationship between different entities, processes, and data stores. DFDs are used to analyze
and design systems, providing a clear map of how data is processed and exchanged within the
system.

3.4 PROCESS FLOW DIAGRAM

It is also called a process flowchart or process flow chart.

Variations: macro flowchart, top-down flowchart, detailed flowchart (also called process map, micro map, service map, or symbolic flowchart), deployment flowchart (also called down-across or cross-functional flowchart), several-level flowchart. A flowchart is a picture of the separate steps of a process in sequential order.

Elements that may be included are: sequence of actions, materials or services entering or
leaving the process (inputs and outputs), decisions that must be made, people who become
involved, time involved at each step and/or process measurements.

The process described can be anything: a manufacturing process, an administrative or service process, a project plan. This is a generic tool that can be adapted for a wide variety of purposes.

WHEN TO USE PROCESS FLOW CHART

• To develop understanding of how a process is done.
• To study a process for improvement.
• To communicate to others how a process is done.
• When better communication is needed between people involved with the same process.
• To document a process.

COMMONLY USED SYMBOLS

The most commonly used symbols in a process flow diagram are:

Start or End: An elongated circle represents the start or end of a process.

Step/Flow-line: An arrow represents the direction of flow/process from one step to another.

Process/Operation: A rectangle/square box shows instructions/actions/activities.

Decision: A diamond box represents a decision on a particular activity.

Storage: Represents storage of material/parts.

Delay/Wait: Represents a delay in an operation/process/activity.

Document: Represents supportive documents required.

PROCESS FLOW DIAGRAM


3.5 SEQUENCE DIAGRAM

Sequence diagrams model the flow of logic within your system in a visual manner, enabling you both to document and to validate your logic, and they are commonly used for both analysis and design purposes. Sequence diagrams are the most popular UML diagrams for dynamic modelling, which focuses on identifying the behaviour within your system.

Sequence Flow Diagram

A sequence diagram shows object interactions arranged in time sequence. It depicts the objects and classes involved in the scenario and the sequence of messages exchanged between the objects needed to carry out the functionality of the scenario. Sequence diagrams are typically associated with use case realizations in the Logical View of the system under development. Sequence diagrams are sometimes called event diagrams or event scenarios.

A sequence diagram shows, as parallel vertical lines, different processes or objects that live simultaneously, and, as horizontal arrows, the messages exchanged between them, in the order in which they occur. This allows the specification of simple runtime scenarios in a graphical manner.
3.6 ACTIVITY DIAGRAM

An activity diagram is a graphical representation of workflows of stepwise activities and actions, with support for choice, iteration, and concurrency. An activity diagram shows the overall flow of control.

The most important shape types:

• Rounded rectangles represent activities.

• Diamonds represent decisions.

• Bars represent the start or end of concurrent activities.

• A black circle represents the start of the work flow.

• An encircled circle represents the end of the work flow.


3.7 COLLABORATION DIAGRAM
UML collaboration diagrams illustrate the relationships and interactions between software objects. They require use cases, system operation contracts, and a domain model to already exist. The collaboration diagram illustrates messages being sent between classes and objects.
4. SOFTWARE SPECIFICATION
4.1 GENERAL

The chatbot assists students and visitors with inquiries related to admissions, courses, fees,
faculty, campus facilities, and more. It will be built using Flask (a Python web framework)
and integrate with a database and possibly an NLP engine.

4.2 PYTHON

Python is a high-level, interpreted, interactive, and object-oriented scripting language. Python is designed to be highly readable. It uses English keywords frequently, whereas other languages use punctuation, and it has fewer syntactic constructions than other languages.

 Python is Interpreted: Python is processed at runtime by the interpreter. You do not need to compile your program before executing it. This is similar to PERL and PHP.
 Python is Interactive: You can actually sit at a Python prompt and interact with the interpreter directly to write your programs.
 Python is Object-Oriented: Python supports the object-oriented style or technique of programming that encapsulates code within objects.
 Python is a Beginner's Language: Python is a great language for beginner-level programmers and supports the development of a wide range of applications, from simple text processing to WWW browsers to games.

History of Python

Python was developed by Guido van Rossum in the late eighties and early nineties at the National Research Institute for Mathematics and Computer Science in the Netherlands. Python is derived from many other languages, including ABC, Modula-3, C, C++, Algol-68, Smalltalk, Unix shell, and other scripting languages. Python is copyrighted. Like Perl, Python source code is now available under the GNU General Public License (GPL). Python is now maintained by a core development team at the institute, although Guido van Rossum still holds a vital role in directing its progress.

Python Features

Python features include:

 Easy-to-learn: Python has few keywords, simple structure, and a clearly defined syntax.
This allows the student to pick up the language quickly.
 Easy-to-read: Python code is more clearly defined and visible to the eyes.
 Easy-to-maintain: Python's source code is fairly easy-to-maintain.
 A broad standard library: Python's bulk of the library is very portable and cross-
platform compatible on UNIX, Windows, and Macintosh.
 Interactive Mode: Python has support for an interactive mode which allows interactive
testing and debugging of snippets of code.
 Portable: Python can run on a wide variety of hardware platforms and has the same
interface on all platforms.
 Extendable: You can add low-level modules to the Python interpreter. These modules
enable programmers to add to or customize their tools to be more efficient.
 Databases: Python provides interfaces to all major commercial databases.
 GUI Programming: Python supports GUI applications that can be created and ported to
many system calls, libraries, and windows systems, such as Windows MFC, Macintosh, and
the X Window system of Unix.
 Scalable: Python provides a better structure and support for large programs than shell
scripting.

Apart from the above-mentioned features, Python has a big list of good features, few are
listed below:
 It supports functional and structured programming methods as well as OOP.
 It can be used as a scripting language or can be compiled to byte-code for building large
applications.
 It provides very high-level dynamic data types and supports dynamic type checking.
 It supports automatic garbage collection.
 It can be easily integrated with C, C++, COM, ActiveX, CORBA, and Java.
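As a small illustration of the dynamic typing and runtime type checking mentioned above (the variable names are arbitrary):

```python
# A variable's type is determined at runtime and can change freely:
x = 42
print(type(x).__name__)        # int
x = "forty-two"
print(type(x).__name__)        # str

# Dynamic type checking: mixing incompatible types fails at runtime,
# not at compile time.
try:
    "2" + 2
except TypeError as exc:
    print("TypeError:", exc)
```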

4.3 PYTHON ENVIRONMENT

Python is available on a wide variety of platforms, including Linux and Mac OS X. Let us understand how to set up our Python environment.

4.3.1 Flask:

Flask is a lightweight and versatile web framework for Python, known for its
simplicity and flexibility. It provides developers with the tools to build web applications
quickly and efficiently. Flask follows the WSGI (Web Server Gateway Interface) protocol and
is based on the Werkzeug WSGI toolkit and the Jinja2 template engine. With Flask,
developers can create web applications ranging from simple APIs to complex full-fledged web
applications. Its minimalistic design allows for easy customization and integration with other
Python libraries and frameworks, making it a popular choice for web development projects of
all sizes.

The basic structure of a Flask application typically includes:

Importing Flask: Start by importing the Flask class from the flask package.

from flask import Flask

Creating an instance: Create an instance of the Flask class. This instance will be the WSGI application.

app = Flask(__name__)

Defining routes: Define routes to specify how the application responds to different URLs. Routes are defined using the @app.route() decorator.

@app.route('/')
def index():
    return 'Hello, World!'

Running the application: Finally, run the Flask application using the run() method.

if __name__ == '__main__':
    app.run(debug=True)

Putting it all together, a basic Flask application structure looks like this:

from flask import Flask

app = Flask(__name__)

@app.route('/')
def index():
    return 'Hello, World!'

if __name__ == '__main__':
    app.run(debug=True)

This structure creates a simple Flask application with a single route (/) that returns the string "Hello, World!" when accessed. The debug=True parameter in app.run() enables debug mode, which provides helpful error messages during development.
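Building on that skeleton, a college enquiry endpoint could look roughly like the sketch below. The /ask route, the ANSWERS dictionary, and its contents are illustrative assumptions, not the project's actual implementation:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

# Hypothetical canned answers keyed by topic.
ANSWERS = {
    "admission": "Admissions open in June.",
    "courses": "We offer B.E., B.Tech., MBA and MCA programmes.",
}

@app.route("/ask", methods=["POST"])
def ask():
    # Expect a JSON body like {"question": "..."}.
    question = request.get_json().get("question", "").lower()
    for topic, reply in ANSWERS.items():
        if topic in question:
            return jsonify({"answer": reply})
    return jsonify({"answer": "Sorry, please contact the college office."})

# Exercising the endpoint with Flask's built-in test client:
client = app.test_client()
reply = client.post("/ask", json={"question": "Tell me about admission"})
print(reply.get_json())
```

As before, the application itself would be started with app.run(debug=True); the test client is just a convenient way to exercise the route without a running server.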

4.3.2 NLP:
Natural Language Processing (NLP) is a field of artificial intelligence (AI) that
focuses on the interaction between computers and human language. It involves the
development of algorithms and models that enable computers to understand,
interpret, and generate human language in a meaningful way.

NLP encompasses a wide range of tasks, including:

Text Understanding: Extracting meaning from unstructured text data, such as sentiment
analysis, named entity recognition, and topic modeling.

Text Generation: Creating human-like text, such as language translation, text summarization,
and chatbots.

Text Classification: Categorizing text documents into predefined categories or labels, such as
spam detection, sentiment analysis, and topic classification.

Information Extraction: Identifying and extracting structured information from unstructured text data, such as named entities, relationships, and events.

Speech Recognition: Converting spoken language into text, enabling voice-controlled applications and virtual assistants.

NLP techniques often involve a combination of machine learning, deep learning, and
linguistic principles to analyze and understand human language. With the growing volume of
textual data available online, NLP plays a crucial role in various applications, including search
engines, social media analysis, customer service automation, healthcare, finance, and more.
Advances in NLP have led to significant improvements in the accuracy and efficiency of
language-related tasks, driving innovation in many industries.

4.3.3 NLTK:
NLTK, or the Natural Language Toolkit, is a leading platform for building Python
programs to work with human language data. It provides easy-to-use interfaces and
libraries for tasks such as tokenization, stemming, tagging, parsing, and semantic
reasoning. Developed by researchers and educators in the field of computational
linguistics, NLTK is widely used in academia and industry for teaching, research,
and development of NLP applications.

Key features of NLTK include:

Corpora: NLTK includes a vast collection of corpora, or linguistic data sets, covering various languages and domains. These corpora serve as valuable resources for training and testing NLP models.

Tokenization: NLTK offers tools for breaking text into tokens, such as words or
sentences, facilitating further analysis and processing.

Part-of-speech Tagging: NLTK provides pre-trained models for tagging words with their respective parts of speech (e.g., noun, verb, adjective), enabling syntactic analysis and feature extraction.

Parsing: NLTK supports syntactic parsing, allowing users to analyze the grammatical structure of sentences and extract syntactic dependencies.

Named Entity Recognition (NER): NLTK includes modules for identifying named
entities (e.g., person names, locations, organizations) in text, which is essential for
tasks like information extraction and entity linking.

WordNet Integration: NLTK integrates WordNet, a lexical database of English words and their semantic relationships, enabling tasks such as synonym detection, word sense disambiguation, and semantic similarity computation.

Text Classification: NLTK provides tools for building and evaluating text
classification models, which are used for tasks like sentiment analysis, spam
detection, and topic classification.
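A minimal taste of NLTK, assuming it is installed (pip install nltk). The rule-based TreebankWordTokenizer needs no downloaded corpora, while pos_tag and WordNet require running nltk.download() first and are therefore only sketched in comments:

```python
from nltk.tokenize import TreebankWordTokenizer

# Tokenization: split a query into word and punctuation tokens.
tokens = TreebankWordTokenizer().tokenize("What courses does the college offer?")
print(tokens)
# ['What', 'courses', 'does', 'the', 'college', 'offer', '?']

# After nltk.download('averaged_perceptron_tagger'), the tokens could be
# tagged with their parts of speech:
#   from nltk import pos_tag
#   pos_tag(tokens)
```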
4.4 ANACONDA NAVIGATOR

Anaconda Navigator is a desktop graphical user interface (GUI) included in the Anaconda distribution that allows you to launch applications and easily manage Anaconda packages, environments, and channels without using command-line commands. Navigator can search for packages on Anaconda Cloud or in a local Anaconda Repository. It is available for Windows, macOS, and Linux.

Why use Navigator?

In order to run, many scientific packages depend on specific versions of other packages. Data scientists often use multiple versions of many packages, and use multiple environments to separate these different versions.

The command-line program conda is both a package manager and an environment manager; it helps data scientists ensure that each version of each package has all the dependencies it requires and works correctly.

Navigator is an easy, point-and-click way to work with packages and environments without needing to type conda commands in a terminal window. You can use it to find the packages you want, install them in an environment, run the packages, and update them, all inside Navigator.

WHAT APPLICATIONS CAN I ACCESS USING NAVIGATOR?

The following applications are available by default in Navigator:


● JupyterLab
● Jupyter Notebook
● QTConsole
● Spyder
● VSCode
● Glueviz
● Orange 3 App
● Rodeo
● RStudio
Advanced Anaconda users can also build their own Navigator applications.

How can I run code with Navigator?

The simplest way is with Spyder. From the Navigator Home tab, click Spyder, and
write and execute your code. You can also use Jupyter Notebooks the same way. Jupyter
Notebooks are an increasingly popular system that combine your code, descriptive text,
output, images and interactive interfaces into a single notebook file that is edited, viewed and
used in a web browser.

What’s new in 1.9?


● Add support for Offline Mode for all environment related actions.
● Add support for custom configuration of main windows links.
● Numerous bug fixes and performance enhancements.
4.5 PYTHON
Python is a general-purpose, versatile, and popular programming language. It is great as a first language because it is concise and easy to read, and it is also a good language to have in any programmer's stack, as it can be used for everything from web development to software development and scientific applications. It has a simple, easy-to-use syntax, making it the perfect language for someone trying to learn computer programming for the first time.

Features of Python

A simple language that is easy to learn, Python has a very simple and elegant syntax. It is much easier to read and write Python programs compared to other languages like C++, Java, or C#. Python makes programming fun and allows you to focus on the solution rather than the syntax. If you are a newbie, it is a great choice to start your journey with.

● Free and open source


You can freely use and distribute Python, even for commercial use. Not only can you use and distribute software written in it, you can even make changes to Python's source code. Python has a large community constantly improving it in each iteration.

● Portability
You can move Python programs from one platform to another and run them without any changes. Python runs seamlessly on almost all platforms, including Windows, Mac OS X, and Linux.

● Extensible and Embeddable


Suppose an application requires high performance. You can easily combine pieces of
C/C++ or other languages with Python code. This will give your application high performance
as well as scripting capabilities which other languages may not provide out of the box.

● A high-level, interpreted language


Unlike C/C++, you don't have to worry about daunting tasks like memory management, garbage collection, and so on. Likewise, when you run Python code, it automatically converts your code to the language your computer understands. You don't need to worry about any lower-level operations.

● Large standard libraries to solve common tasks


Python has a number of standard libraries which make the life of a programmer much easier, since you don't have to write all the code yourself. For example, need to connect to a MySQL database from a web server? You can use the MySQLdb library by writing import MySQLdb. Standard libraries in Python are well tested and used by hundreds of people, so you can be sure they won't break your application.

● Object-oriented
Everything in Python is an object. Object-oriented programming (OOP) helps you solve complex problems intuitively. With OOP, you are able to divide these complex problems into smaller sets by creating objects.

Python History and Versions:
Python is predominantly a dynamically typed programming language, initiated by Guido van Rossum in the year 1989. The major design philosophy given the most importance was the readability of the code and expressing an idea in fewer lines of code, rather than the verbose way of expressing things as in C++ and Java [K-8][K-9]. The other design philosophy worth mentioning was that there should always be a single obvious way to express a given task, which is contradictory to other languages such as C++, Perl, etc. [K-10]. Python compiles to an intermediary code, and this in turn is interpreted by the Python Runtime Environment into native machine code. The initial versions of Python were heavily inspired by Lisp (for functional programming constructs). Python borrowed its module system, exception model, and keyword arguments from the Modula-3 language [K-10]. Python's developers strive not to entertain premature optimization, even though it might increase performance by a few basis points [K-9]. During its design, the creators conceptualized the language as being very extensible, and hence designed it to have a small core library extended by a huge standard library [K-7]. As a result, Python is used as a scripting language, as it can be easily embedded into any application, though it can also be used to develop a full-fledged application.

The reference implementation of Python is CPython. There are also other implementations like Jython and IronPython, which can use Python syntax as well as any Java class (Jython) or .NET class (IronPython).

Versions: Python has two versions, the 2.x version and the 3.x version. The 3.x version is a backward-incompatible release that was made to fix many design issues which plagued the 2.x series. The latest in the 2.x series is 2.7.6 and the latest in the 3.x series is 3.4.0.

Paradigms:

Python supports multiple paradigms: object-oriented, imperative, functional, procedural
and reflective. In the object-oriented paradigm, Python supports most OOP concepts such as
inheritance (including multiple inheritance) and polymorphism, but its lack of support for
encapsulation is a blatant omission: Python doesn't have private or protected members, so all
class members are public [K-11]. Python versions before 2.6 didn't support some OOP
concepts such as abstraction through interfaces and abstract classes [K-19]. Python also
supports the concurrent paradigm, but we cannot build truly multitasking applications with
it, because the built-in threading API is limited by the GIL (Global Interpreter Lock); hence
applications that use the threading API cannot run on multiple cores in parallel [K-12]. The
only remedy is either to use the multiprocessing module, which forks processes, or to use
interpreters that haven't implemented the GIL, such as Jython or IronPython [K-12].

1.5.3 Compilation, Execution and Memory Management:

Just like the other managed languages,
Python compiles to an intermediary code and this in turn is interpreted by the Python Runtime
Environment to the Native Machine Code. The reference implementation (i.e. CPython)
doesn’t come with a JIT compiler because of which the execution speed is slow compared to
native programming languages [K-17]. We can use PyPy interpreter as it includes a JIT
compiler rather than using the Python interpreter that comes by default with the python
language, if speed of execution is one of the important factors [K-18]. The Python Runtime
Environment also takes care of all the allocation and deallocation of memory through the
Garbage Collector. When a new object is created, the GC allocates the necessary memory, and
once the object goes out of its scope, the GC doesn’t release memory immediately but instead
it becomes eligible for Garbage Collection, which would eventually release the memory.
Typing Strategies:

Python is a strongly, dynamically typed language. Python 3 also supports optional static
typing [K-20].

There are a few advantages in using a dynamically typed language; the most
prominent is that the code is more readable because there is less code (in other words, less
boilerplate). The main disadvantage of Python being a dynamic language is that there is no
way to guarantee that a particular piece of code will run successfully for all the different
data-type scenarios simply because it ran successfully with one type. Basically, we have no
means of finding an error in the code until the code has started running.

1.5.4 Strengths, Weaknesses and Application Areas:

Python is predominantly used as a scripting language alongside standalone
applications developed with statically typed languages, because of the flexibility its dynamic
typing provides. Python favours rapid application development, which qualifies it for
prototyping. To a certain extent, Python is also used in developing websites. Due to its
dynamic typing and the presence of a virtual machine, there is considerable overhead, which
translates to much lower performance compared with native programming languages [K-13],
and hence Python is not suited for performance-critical applications.
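The runtime-only error detection described above can be illustrated with a short sketch:

```python
# A function that "works" until it is called with the wrong type.
# Nothing flags the error before the offending call actually runs.
def total_length(items):
    return sum(len(item) for item in items)

print(total_length(["ab", "cde"]))   # works with strings: 5

try:
    total_length([1, 2, 3])          # ints have no len(); fails only at runtime
except TypeError as exc:
    print("caught at runtime:", exc)
```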

4.6 NUMPY
NumPy is the fundamental package for scientific computing in Python. It is a Python
library that provides a multidimensional array object, various derived objects (such as masked
arrays and matrices), and an assortment of routines for fast operations on arrays, including
mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms,
basic linear algebra, basic statistical operations, random simulation and much more. At the
core of the NumPy package is the ndarray object. This encapsulates n-dimensional arrays of
homogeneous data types, with many operations being performed in compiled code for
performance.

There are several important differences between NumPy arrays and the standard
Python sequences:

• NumPy arrays have a fixed size at creation, unlike Python lists (which can grow
dynamically). Changing the size of an array will create a new array and delete the original.

• The elements in a NumPy array are all required to be of the same data type, and thus will be
the same size in memory. The exception: one can have arrays of (Python, including NumPy)
objects, thereby allowing for arrays of different sized elements.
• NumPy arrays facilitate advanced mathematical and other types of operations on large
numbers of data. Typically, such operations are executed more efficiently and with less code
than is possible using Python’s built-in sequences.

• A growing plethora of scientific and mathematical Python-based packages are using NumPy
arrays; though these typically support Python sequence input, they convert such input to
NumPy arrays prior to processing, and they often output NumPy arrays.

In other words, in order to efficiently use much (perhaps even most) of today’s
scientific/mathematical Python-based software, just knowing how to use Python’s built-in
sequence types is insufficient - one also needs to know how to use NumPy arrays. The points
about sequence size and speed are particularly important in scientific computing. As a simple
example, consider the case of multiplying each element in a 1-D sequence with the
corresponding element in another sequence of the same length. If the data are stored in two
Python lists, a and b, we could iterate over each element:
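The element-wise loop the text refers to can be sketched as follows (with illustrative values), alongside its NumPy equivalent:

```python
import numpy as np

a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]

# Pure-Python element-wise multiplication: an explicit loop over indices
c = [a[i] * b[i] for i in range(len(a))]

# The NumPy equivalent: one vectorized expression, executed in compiled code
c_np = np.array(a) * np.array(b)

print(c)               # [4.0, 10.0, 18.0]
print(c_np.tolist())   # [4.0, 10.0, 18.0]
```

For large arrays the vectorized form is both shorter and dramatically faster, which is the point of the size-and-speed argument above.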

The Numeric Python extensions (NumPy henceforth) are a set of extensions to the
Python programming language which allow Python programmers to efficiently manipulate
large sets of objects organized in a grid-like fashion. These sets of objects are called arrays,
and they can have any number of dimensions: one-dimensional arrays are similar to standard
Python sequences, and two-dimensional arrays are similar to matrices from linear algebra.
Note that one-dimensional arrays are also different from any other Python sequence, and that
two-dimensional matrices are also different from the matrices of linear algebra, in ways which
we will mention later in this text. Why are these extensions needed? The core reason is a very
prosaic one, and that is that manipulating a set of a million numbers in Python with the
standard data structures such as lists, tuples or classes is much too slow and uses too much
space. Anything which we can do in NumPy we can do in standard Python – we just may not
be alive to see the program finish. A more subtle reason for these extensions however is that
the kinds of operations that programmers typically want to do on arrays, while sometimes
very complex, can often be decomposed into a set of fairly standard operations. This
decomposition has been developed similarly in many array languages. In some ways, NumPy
is simply the application of this experience to the Python language – thus many of the
operations described in NumPy work the way they do because experience has shown that way
to be a good one, in a variety of contexts. The languages which were used to guide the
development of NumPy include the infamous APL family of languages, Basis, MATLAB,
FORTRAN, S and S+, and others. This heritage will be obvious to users of NumPy who
already have experience with these other languages.
This tutorial, however, does not assume any such background, and all that is
expected of the reader is a reasonable working knowledge of the standard Python language.
This material is adapted from the “official” NumPy documentation, which is both a tutorial
and the most authoritative source of information about NumPy apart from the source code. The
tutorial material will walk you through a set of manipulations of simple, small, arrays of
numbers, as well as image files. This choice was made because:

• A concrete data set makes explaining the behaviour of some functions much easier to
motivate than simply talking about abstract operations on abstract data sets.
• Every reader will have at least an intuition as to the meaning of the data and the
organization of image files.
• The result of various manipulations can be displayed simply since the data set has a natural
graphical representation.

All users of NumPy, whether interested in image processing or not, are encouraged to
follow the tutorial with a working NumPy installation at their side, testing the examples, and,
more importantly, transferring the understanding gained by working on images to their
specific domain. The best way to learn is by doing – the aim of this tutorial
is to guide you along this “doing.”

4.7 Library Used:

NLTK (Natural Language Toolkit) is a Python library widely used for natural language
processing (NLP) tasks. It provides various tools and resources for tasks such as tokenization,
stemming, tagging, parsing, and semantic reasoning. NLTK is popular among researchers,
educators, and developers for its extensive collection of text processing modules and its ease
of use. It supports numerous corpora, lexical resources, and algorithms, making it a
valuable tool for analyzing and processing human language data in Python-based NLP
projects.
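A minimal sketch of NLTK-style preprocessing, assuming NLTK is installed; the sentence is an illustrative chatbot query, and plain `str.split` stands in for `nltk.word_tokenize` (which additionally requires the `punkt` data package to be downloaded):

```python
from nltk.stem import PorterStemmer

# Illustrative input: a hypothetical user query to the chatbot
sentence = "What are the admission requirements for the computer science courses"

# Simple whitespace tokenization; nltk.word_tokenize is the usual choice,
# but it needs the 'punkt' data package, so we keep this sketch dependency-free.
tokens = sentence.lower().split()

# Reduce each token to its stem so that "courses" and "course" match
stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in tokens]

print(stems)
```

Stemming like this lets the chatbot match differently inflected forms of the same query term against its knowledge base.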
5. Modules
5.1 Pre-processing
Each sentence extracted from a given article is parsed using the LTP software.
Specifically, in this step our system performs word segmentation, part-of-speech tagging,
named entity recognition, dependency parsing and semantic role labelling. This
information is essential for sentence simplification and question generation, described next.

5.2 Sentence Simplification


In our approach, a set of transformation operations derives a simpler form of the source
sentence by removing parentheticals (elements which function as explanatory or qualifying
remarks and have no clear dependency relations with the other constituents of the sentence),
adverbial modifiers between commas, and phrase types such as sentence-level modifying
phrases (e.g. manner adverbs). However, in some cases we keep adverbial modifiers if they
contain information about a person name, place, number or time, because this information
can generate potential questions.
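A simplified illustration of one of these operations; removing parenthesized remarks here is done with a regular expression, whereas the actual system relies on parse information:

```python
import re

def simplify(sentence):
    """Remove parenthetical remarks; a crude stand-in for the parse-based
    simplification described in the text."""
    # Drop parenthesized material: "text (aside) text" -> "text text"
    s = re.sub(r"\s*\([^)]*\)", "", sentence)
    # Collapse any doubled spaces left behind
    return re.sub(r"\s{2,}", " ", s).strip()

print(simplify("The college (founded in 1985) offers many courses."))
# -> "The college offers many courses."
```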

5.3 Question Transformation

In this stage, the simplified declarative sentences derived in stage 1 are transformed
into a set of questions based on the predefined question-generation rules shown in Table 2. A
key subtask of question generation is target content selection, i.e. deciding what content the
question is asking about. In our case, we identify answer phrases in the input declarative
sentence as potential targets to generate questions about. A question is then generated by
using an interrogative pronoun to replace the target answer phrase in the declarative
sentence; unlike full question generation in English, this scheme requires no subject-auxiliary
inversion or verb decomposition, which keeps the transformation simple.
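A toy sketch of the replacement idea; the rule table below is hypothetical and stands in for the predefined rules of Table 2:

```python
# Illustrative (hypothetical) rules: an answer-phrase type is mapped to the
# interrogative pronoun that replaces it in the declarative sentence.
RULES = {
    "PERSON": "who",
    "PLACE": "where",
    "TIME": "when",
    "NUMBER": "how many",
}

def make_question(sentence, answer_phrase, phrase_type):
    """Replace the target answer phrase with an interrogative pronoun."""
    pronoun = RULES[phrase_type]
    question = sentence.replace(answer_phrase, pronoun).rstrip(".")
    return question + "?"

print(make_question("The library opens at 8 am.", "at 8 am", "TIME"))
# -> "The library opens when?"
```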

5.4 Question Ranking

The previous stages generate questions that vary in quality with respect to syntax,
semantics and importance. This is unavoidable and happens for different reasons, such as
errors in sentence parsing, named entity recognition and sentence simplification. To address
this problem, the large pool of questions needs to be ranked according to quality. Stage 3 in
our method implements a learning-to-rank algorithm to meet this challenge.
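The ranking step can be sketched with a hand-written scoring function; the real system learns this function from data, so the heuristics below are purely illustrative:

```python
def score(question):
    """Toy quality score: reward a well-formed surface form and a
    reasonable length. A stand-in for a learned ranking model."""
    q = question.lower()
    s = 0.0
    if q.endswith("?"):
        s += 1.0                         # well-formed question mark
    s += 0.1 * min(len(q.split()), 10)   # prefer reasonably long questions
    return s

candidates = [
    "opens library the when",
    "When does the library open?",
    "Where is the office?",
]
ranked = sorted(candidates, key=score, reverse=True)
print(ranked[0])   # the highest-scoring candidate
```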
6. ALGORITHMS AND METHODS USED
6.1 GENERAL
The algorithm used for obtaining accurate results for the project is machine learning.

MACHINE LEARNING
Machine learning is a system that can learn from examples through self-improvement,
without being explicitly coded by a programmer. The breakthrough is the idea that a
machine can learn from data (i.e., examples) on its own to produce accurate results.
Machine learning combines data with statistical tools to predict an output, and this output is
then used by businesses to derive actionable insights. Machine learning is closely related to
data mining and Bayesian predictive modelling. The machine receives data as input and uses
an algorithm to formulate answers.
A typical machine learning task is to provide a recommendation. For those who have a
Netflix account, all recommendations of movies or series are based on the user's
historical data. Tech companies use unsupervised learning to improve the user
experience with personalized recommendations.
Machine learning is also used for a variety of tasks such as fraud detection, predictive
maintenance, portfolio optimization and task automation.
Machine learning can be grouped into two broad learning tasks, supervised and
unsupervised, though many other approaches exist.

SUPERVISED LEARNING
An algorithm uses training data and feedback from humans to learn the relationship of given
inputs to a given output. For instance, a practitioner can use marketing expenses and weather
forecasts as input data to predict the sales of cans. Supervised learning is used when the
output data is known; the algorithm then predicts outputs for new data.
There are two categories of supervised learning:
● Classification task
● Regression task
Classification
Imagine you want to predict the gender of a customer for a commercial. You would start by
gathering data on height, weight, job, salary, purchasing basket, etc. from your customer
database. You know the gender of each of your customers; it can only be male or female. The
objective of the classifier is to assign a probability of being male or female (i.e., the
label) based on the information (i.e., the features) you have collected. Once the model has
learned how to recognize male or female, you can use new data to make a prediction. For
instance, you just got new information from an unknown customer and want to know whether
it is a male or female. If the classifier predicts male = 70%, the algorithm is 70% sure that this
customer is male and 30% sure it is female. The label can have two or more classes. The
above example has only two classes, but if a classifier needs to predict objects, it can have
dozens of classes (e.g., glass, table, shoes; each object represents a class).
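A minimal classification sketch under the same framing; the training data and the k-nearest-neighbour rule here are illustrative stand-ins, not the system's actual model:

```python
import math

# Toy labelled data: (height_cm, weight_kg) -> gender. Values are illustrative.
train = [
    ((180.0, 80.0), "male"),
    ((175.0, 78.0), "male"),
    ((160.0, 55.0), "female"),
    ((165.0, 58.0), "female"),
]

def predict(x, k=3):
    """k-nearest-neighbour classification; the share of 'male' neighbours
    serves as a rough probability, like the 70%/30% split in the text."""
    nearest = sorted(train, key=lambda p: math.dist(x, p[0]))[:k]
    labels = [label for _, label in nearest]
    majority = max(set(labels), key=labels.count)
    return majority, labels.count("male") / k

label, p_male = predict((178.0, 79.0))
print(label, p_male)
```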
Regression
When the output is a continuous value, the task is a regression. For instance, a financial
analyst may need to forecast the value of a stock based on a range of features such as equity,
previous stock performance and macroeconomic indices. The system is trained to estimate
the price of the stock with the lowest possible error.
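The regression task can be sketched with an ordinary least-squares line fit on toy data (the values below are illustrative, not financial data):

```python
import numpy as np

# Toy data: y is roughly 2*x + 1 with small perturbations
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.0, 8.9])

# Fit y = slope*x + intercept by least squares
slope, intercept = np.polyfit(x, y, 1)

# Predict a new, continuous value
y_new = slope * 5.0 + intercept
print(round(slope, 2), round(intercept, 2), round(y_new, 2))
```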

The main supervised learning algorithms can be summarized as follows:

● Linear regression (Type: Regression): Finds a way to correlate each feature to the output
to help predict future values.

● Logistic regression (Type: Classification): An extension of linear regression used for
classification tasks. The output variable is binary (e.g., only black or white) rather than
continuous (e.g., an infinite list of potential colors).

● Decision tree (Type: Regression, Classification): A highly interpretable classification or
regression model that splits data-feature values into branches at decision nodes (e.g., if a
feature is a color, each possible color becomes a new branch) until a final decision output is
made.

● Naïve Bayes (Type: Regression, Classification): A classification method that makes use of
Bayes' theorem. The theorem updates the prior knowledge of an event with the independent
probability of each feature that can affect the event.

● Support Vector Machine (Type: Classification; Regression, though not very common):
SVM finds a hyperplane that optimally divides the classes. It is best used with a non-linear
solver.

● Random forest (Type: Regression, Classification): Built upon decision trees to drastically
improve accuracy. Random forest generates many simple decision trees and uses the
'majority vote' method to decide which label to return. For a classification task, the final
prediction is the one with the most votes; for a regression task, the average prediction of all
the trees is the final prediction.

● AdaBoost (Type: Regression, Classification): A classification or regression technique that
uses a multitude of models to come up with a decision, but weighs them based on their
accuracy in predicting the outcome.
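The majority-vote idea behind random forests can be sketched with stand-in "trees"; the hard-coded rules and features below are purely illustrative:

```python
from collections import Counter

# Three "trees" (here just hand-written rules standing in for trained
# decision trees) each vote on a label; the majority wins.
def tree_a(x): return "spam" if x["caps"] > 5 else "ham"
def tree_b(x): return "spam" if x["links"] > 2 else "ham"
def tree_c(x): return "spam" if x["length"] < 20 else "ham"

def forest_predict(x):
    votes = [tree(x) for tree in (tree_a, tree_b, tree_c)]
    return Counter(votes).most_common(1)[0][0]

print(forest_predict({"caps": 9, "links": 3, "length": 100}))  # "spam" (2 votes to 1)
```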

UNSUPERVISED LEARNING

In unsupervised learning, an algorithm explores input data without being given an explicit
output variable (e.g., explores customer demographic data to identify patterns)

The main unsupervised learning algorithms can be summarized as follows:

● K-means clustering (Type: Clustering): Puts data into k groups, each containing data with
similar characteristics (as determined by the model, not in advance by humans).

● Gaussian mixture model (Type: Clustering): A generalization of k-means clustering that
provides more flexibility in the size and shape of the groups.

● Hierarchical clustering (Type: Clustering): Splits clusters along a hierarchical tree to form
a classification system. Can be used, for example, to cluster loyalty-card customers.

● Recommender system (Type: Clustering): Helps to define the relevant data for making a
recommendation.

● PCA/t-SNE (Type: Dimension Reduction): Mostly used to decrease the dimensionality of
the data. These algorithms reduce the number of features to three or four vectors with the
highest variances.
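The k-means procedure can be sketched in one dimension (toy points and deliberately rough initial centroids, for illustration only):

```python
import numpy as np

# Illustrative 1-D k-means (k=2): alternate between assigning points to the
# nearest centroid and recomputing each centroid as its cluster's mean.
points = np.array([1.0, 1.2, 0.8, 8.0, 8.3, 7.9])
centroids = np.array([0.0, 10.0])   # rough initial guesses

for _ in range(10):
    # Assign each point to the closest centroid
    labels = np.argmin(np.abs(points[:, None] - centroids[None, :]), axis=1)
    # Move each centroid to the mean of its assigned points
    centroids = np.array([points[labels == k].mean() for k in range(2)])

print(centroids.round(2))   # two cluster centres, near 1.0 and 8.07
```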
