Major Project Doc
Mr. S. MUTHULINGAM
Assistant Professor, IT
TO
Jawaharlal Nehru Technological University
Hyderabad
In partial fulfilment of the requirements for award of degree
of BACHELOR OF TECHNOLOGY
IN
INFORMATION TECHNOLOGY
2022-2023
BUILDING SEARCH ENGINE USING MACHINE LEARNING TECHN
PROJECT CERTIFICATE
This is to certify that the industrial oriented major project report titled
“BUILDING SEARCH ENGINE USING MACHINE LEARNING TECHNIQUES” is submitted by
DEEPAK SHARMA (19W91A1215), M. SRINIVAS PRASAD (19W91A1239), J. SAI SHREEJA
(19W91A1225) and G. RAGHU (19W91A1218) of B.Tech in partial fulfillment of the requirements for the
degree of Bachelor of Technology in Information Technology, Department of Information Technology, and
that it has not been submitted for the award of any other degree of this institution.
DECLARATION
We hereby declare that the industrial oriented major project report entitled
“BUILDING SEARCH ENGINE USING MACHINE LEARNING TECHNIQUES”,
submitted to Malla Reddy Institute of Engineering and Technology, affiliated to Jawaharlal Nehru
Technological University Hyderabad (JNTUH), for the award of the degree of Bachelor of
Technology in Information Technology, is the result of an original industrial oriented major project done
by us. It is further declared that the project report or any part thereof has not been previously
submitted to any University or Institute for the award of a degree or diploma.
ACKNOWLEDGEMENT
First and foremost, we are grateful to our principal, Dr. M. Ashok, for providing us
with all the resources in the college to make our project a success. We thank him for his
valuable suggestions during the seminars, which encouraged us to give our best in the
project.
We would also like to thank all the supporting staff of the Department of IT and all
other departments who have helped, directly or indirectly, in making the project a
success. We are extremely grateful to our parents for their blessings and prayers for the
completion of our project, which gave us the strength to do our project better.
G. RAGHU 19W91A1218
INDEX
Abstract I
List of Figures II
List of Screenshots III
CHAPTER NO. CONTENTS PAGE NO.
1 INTRODUCTION 1-2
1.1. Motivation 1
2.1. Introduction 3
3.1. Introduction 7
3.4. Algorithm 10
4.1. Introduction 14
6 SNAPSHOTS 46-48
7.1. Introduction 49
8.1 CONCLUSION 52
9 REFERENCES 54
ABSTRACT
The web is the largest and richest source of information. Search engines are commonly used to retrieve
information from the World Wide Web. They provide a simple interface for entering a user query and
display the results as the web addresses of relevant web pages, but obtaining suitable information with
traditional search engines has become very challenging. This project proposes a search engine using a
machine learning technique that places the most relevant web pages at the top of the results for user queries.
INTRODUCTION
The World Wide Web is a web of individual systems and servers connected using different
technologies and methods. Every site comprises a large number of web pages that are created and deployed
on a server. To find something, a user types a keyword, that is, a set of words extracted from the user's search
input, and this input may be syntactically incorrect. This is where search engines are needed: they provide a
simple interface for entering user queries and displaying the results.
1) Web crawler: Web crawlers collect data about a website and the links related to it. In this project, web
crawlers are used only to collect data and information from the WWW and store it in our database.
2) Indexer: The indexer arranges each term on each web page and stores the resulting list of terms in a
large repository.
3) Query engine: The query engine replies to the user's keyword and shows effective results for it. Within
the query engine, a page ranking algorithm ranks the URLs.
This project uses machine learning techniques to find the most suitable web addresses for the given
keyword. The output of the PageRank algorithm is given as input to the machine learning algorithm.
Section II discusses related work on search engines and the PageRank algorithm. Section III explains the
objective. Section IV deals with the proposed system, which is based on a machine learning technique, and
Section V contains the conclusion.
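The crawling step described above can be sketched with Python's standard library alone. This is a minimal illustration under stated assumptions, not the project's actual crawler: to keep it runnable offline, the "web" is a hypothetical in-memory dictionary of URL to HTML, and the names LinkExtractor and crawl are our own. A real crawler would fetch pages over HTTP (for example with urllib.request), honour robots.txt and store the results in the database.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL
                    self.links.append(urljoin(self.base_url, value))


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from seed; returns {url: [outgoing links]}.

    fetch(url) returns the page HTML, or None if the page is unavailable.
    """
    graph = {}
    queue = deque([seed])
    while queue and len(graph) < max_pages:
        url = queue.popleft()
        if url in graph:
            continue                      # already visited
        html = fetch(url)
        if html is None:
            continue                      # unreachable page
        parser = LinkExtractor(url)
        parser.feed(html)
        parser.close()
        graph[url] = parser.links
        queue.extend(link for link in parser.links if link not in graph)
    return graph


# Hypothetical pages standing in for the WWW
PAGES = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/b">B</a>',
    "http://example.com/b": '<a href="/">Home</a>',
}

link_graph = crawl("http://example.com/", PAGES.get)
for page, links in link_graph.items():
    print(page, "->", links)
```

The resulting dictionary of page to outgoing links is the kind of link graph that a page ranking algorithm consumes.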
1.1 MOTIVATION
In today's internet world, people rely mostly on search engines to find what they are looking for on the
internet. The web is the largest and richest source of information, and search engines are the most common
way to retrieve information from the World Wide Web.
2. LITERATURE SURVEY
3. SYSTEM ANALYSIS
Front-End : Python.
FEASIBILITY STUDY
The feasibility of the project is analyzed in this phase, and a business proposal is put forth with a
very general plan for the project and some cost estimates. During system analysis, the feasibility study of
the proposed system is carried out to ensure that the proposed system is not a burden to the company.
Some understanding of the major requirements for the system is essential for feasibility analysis.
Three key considerations involved in the feasibility analysis are:
ECONOMICAL FEASIBILITY
TECHNICAL FEASIBILITY
SOCIAL FEASIBILITY
ECONOMICAL FEASIBILITY
This study is carried out to check the economic impact that the system will have on the
organization. The amount of funds that the company can pour into the research and development of the
system is limited, so the expenditures must be justified. The developed system is well within the budget,
which was achieved because most of the technologies used are freely available. Only the customized
products had to be purchased.
SOCIAL FEASIBILITY
This aspect of the study checks the level of acceptance of the system by the user. This includes the
process of training the user to use the system efficiently. The user must not feel threatened by the system,
but must instead accept it as a necessity. The level of acceptance by the users depends solely on the methods
employed to educate them about the system and to make them familiar with it. Their confidence must be
raised so that they are also able to offer constructive criticism, which is welcome, as they are the final users
of the system.
4 SOFTWARE ENVIRONMENT
4.1 PYTHON
Python is a general-purpose, interpreted, interactive, object-oriented, high-level programming language.
As an interpreted language, Python has a design philosophy that emphasizes code readability (notably
using whitespace indentation to delimit code blocks rather than curly brackets or keywords), and a syntax
that allows programmers to express concepts in fewer lines of code than might be used in languages such
as C++ or Java. It provides constructs that enable clear programming on both small and large scales. Python
interpreters are available for many operating systems. CPython, the reference implementation of Python,
is open-source software and has a community-based development model, as do nearly all of its variant
implementations. CPython is managed by the non-profit Python Software Foundation. Python features
a dynamic type system and automatic memory management. It supports multiple programming paradigms,
including object-oriented, imperative, functional and procedural, and has a large and
comprehensive standard library.
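Interactive Mode Programming
Invoking the interpreter without passing a script file as a parameter brings up the interactive prompt:
$ python
Python 2.4.3 (#1, Nov 11 2010, 13:34:43)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
Type the following text at the Python prompt and press Enter:
>>> print("Hello, Python!")
This produces the following result:
Hello, Python!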
Script Mode Programming
Invoking the interpreter with a script parameter begins execution of the script and continues until the script
is finished. When the script is finished, the interpreter is no longer active.
Let us write a simple Python program in a script. Python files have extension .py. Type the following source
code in a test.py file −
print("Hello, Python!")
This assumes that the Python interpreter is available in your PATH variable. Now, try to run this program as follows:
$ python test.py
This produces the following result −
Hello, Python!
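Let us try another way to execute a Python script. Here is the modified test.py file:
#!/usr/bin/python
print("Hello, Python!")
Make the file executable and run it directly:
$ chmod +x test.py
$ ./test.py
This produces the following result:
Hello, Python!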
Python Identifiers
A Python identifier is a name used to identify a variable, function, class, module or other object. An
identifier starts with a letter A to Z or a to z or an underscore (_) followed by zero or more letters,
underscores and digits (0 to 9).
Python does not allow punctuation characters such as @, $, and % within identifiers. Python is a case-sensitive
programming language; thus, Manpower and manpower are two different identifiers in Python.
By convention, class names start with an uppercase letter and all other identifiers start with a lowercase letter.
Starting an identifier with a single leading underscore indicates, by convention, that the identifier is private.
Starting an identifier with two leading underscores indicates a strongly private identifier: inside a class, such a
name is mangled with the class name so that it is hard to access from outside.
If the identifier also ends with two trailing underscores, the identifier is a language-defined special name, such as __init__.
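A short Python 3 sketch of these rules (the Account class and its attributes are purely illustrative). It shows that a single leading underscore is only a convention, whereas two leading underscores cause the name to be mangled with the class name:

```python
class Account:
    def __init__(self):
        self.owner = "Zara"      # ordinary public attribute
        self._balance = 100      # single underscore: private by convention only
        self.__pin = 1234        # two underscores: stored as _Account__pin


manpower = 1
Manpower = 2                     # a different identifier: names are case sensitive
print(manpower, Manpower)        # 1 2

acct = Account()
print(acct._balance)             # 100: the convention is not enforced
print(hasattr(acct, "__pin"))    # False: the stored name was mangled
print(acct._Account__pin)        # 1234

# str.isidentifier() checks the naming rules described above
print("total_2".isidentifier())  # True
print("2total".isidentifier())   # False: cannot start with a digit
print("net$".isidentifier())     # False: $ is not allowed
```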
Reserved Words
Python keywords are reserved words, and you cannot use them as constant, variable or any other identifier
names. Except for True, False and None, all Python keywords contain lowercase letters only.
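The exact keyword list depends on the Python version, so it is easiest to print it with the standard keyword module. The helper can_be_variable_name is a hypothetical example that combines the identifier rules above with the keyword check:

```python
import keyword

# keyword.kwlist holds the reserved words of the running interpreter
print(keyword.kwlist)


def can_be_variable_name(name):
    """Hypothetical helper: a valid identifier that is not a reserved word."""
    return name.isidentifier() and not keyword.iskeyword(name)


print(can_be_variable_name("total"))  # True
print(can_be_variable_name("class"))  # False: reserved word
print(can_be_variable_name("None"))   # False: one of the capitalized keywords
```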
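Lines and Indentation
Python provides no braces to indicate blocks of code for class and function definitions or flow control.
Blocks of code are denoted by line indentation, which is rigidly enforced. The number of spaces in the
indentation is variable, but all statements within the block must be indented the same amount. For example:
if True:
    print("True")
else:
    print("False")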
However, the following block raises an IndentationError, because its last line is not indented
consistently with the rest of its block:
if True:
    print("Answer")
    print("True")
else:
    print("Answer")
  print("False")
Thus, in Python, all consecutive lines indented with the same number of spaces form a block. The
following example has various statement blocks:
Note: Do not try to understand the logic at this point. Just make sure you understand the various blocks,
even though they are written without braces.
#!/usr/bin/python
import sys

file_finish = "end"   # word that ends text entry (chosen for this example)
file_name = input("Enter filename: ")
if len(file_name) == 0:
    print("Next time please enter something")
    sys.exit()

try:
    # open file stream for writing
    file = open(file_name, "w")
except IOError:
    print("There was an error writing to", file_name)
    sys.exit()

print("Enter '" + file_finish + "' when finished")
file_text = ""
while file_text != file_finish:
    file_text = input("Enter text: ")
    if file_text == file_finish:
        # close the file
        file.close()
        break
    file.write(file_text)
    file.write("\n")

try:
    file = open(file_name, "r")
except IOError:
    print("There was an error reading file")
    sys.exit()
file_text = file.read()
file.close()
print(file_text)
Multi-Line Statements
Statements in Python typically end with a new line. Python does, however, allow the use of the line
continuation character (\) to denote that the line should continue. For example −
total = item_one + \
        item_two + \
        item_three
Statements contained within the [], {}, or () brackets do not need to use the line continuation character. For
example −
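For instance, a list literal or an arithmetic expression can continue across lines inside its brackets without any backslash (the weekday names and numbers are sample data):

```python
days = ['Monday', 'Tuesday', 'Wednesday',
        'Thursday', 'Friday']

# Parentheses work the same way for long expressions
total = (1 + 2 +
         3 + 4)

print(len(days), total)  # 5 10
```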
Python accepts single ('), double (") and triple (''' or """) quotes to denote string literals, as long as the
same type of quote starts and ends the string. Triple quotes are used to span a string across multiple lines.
For example, all the following are legal:
word = 'word'
sentence = "This is a sentence."
paragraph = """This is a paragraph. It is
made up of multiple lines and sentences."""
Comments in Python
A hash sign (#) that is not inside a string literal begins a comment. All characters after the # and up to the
end of the physical line are part of the comment and the Python interpreter ignores them.
#!/usr/bin/python
# First comment
print("Hello, Python!")  # second comment
This produces the following result −
Hello, Python!
You can type a comment on the same line after a statement or expression:
name = "Zara"  # This is again a comment
You can comment multiple lines as follows:
# This is a comment.
# This is a comment, too.
# This is a comment, too.
# I said that already.
The following triple-quoted string is also effectively ignored by the Python interpreter and can be used as a
multiline comment:
'''
This is a multiline
comment.
'''
Using Blank Lines
A line containing only whitespace, possibly with a comment, is known as a blank line, and Python totally
ignores it.
In an interactive interpreter session, you must enter an empty physical line to terminate a multiline
statement.
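Multiple Statement Groups as Suites
A group of individual statements that makes a single code block is called a suite. Compound statements
such as if, while, def and class require a header line, ending with a colon, followed by a suite:
if expression :
    suite
elif expression :
    suite
else :
    suite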
Command Line Arguments
Many programs can be run to provide you with some basic information about how they should be run.
Python enables you to do this with -h −
$ python -h
usage: python [option] ... [-c cmd | -m mod | file | -] [arg] ...
Options and arguments (and corresponding environment variables):
-c cmd : program passed in as string (terminates option list)
-d : debug output from parser (also PYTHONDEBUG=x)
-E : ignore environment variables (such as PYTHONPATH)
-h : print this help message and exit
You can also write your script so that it accepts various options. Command-line arguments are an
advanced topic and should be studied a bit later, once you have gone through the rest of the
Python concepts.
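As a preview, the standard argparse module builds such option handling, including an automatic -h help message. In this sketch the script's interface (a query argument and a --top option) is hypothetical, and a sample argument list is parsed instead of the real command line so that the example runs on its own:

```python
import argparse


def build_parser():
    # Hypothetical command-line interface for a search script
    parser = argparse.ArgumentParser(description="Search the page index")
    parser.add_argument("query", help="keywords to search for")
    parser.add_argument("-n", "--top", type=int, default=10,
                        help="number of results to show (default: 10)")
    return parser


# Parse a sample argument list instead of sys.argv
args = build_parser().parse_args(["machine learning", "--top", "5"])
print(args.query, args.top)  # machine learning 5
```

In a real script, calling build_parser().parse_args() with no arguments reads sys.argv, so running the script with -h would print the generated usage message.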
Python Lists
The list is the most versatile datatype available in Python; it is written as a list of comma-separated
values (items) between square brackets. An important property of a list is that its items need not be of the
same type.
Creating a list is as simple as putting different comma-separated values between square brackets. For
example −
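A small illustration with sample values, showing mixed item types, indexing, slicing and the fact that lists, unlike tuples, can be changed in place:

```python
list1 = ['physics', 'chemistry', 1997, 2000]   # items of mixed types
list2 = [1, 2, 3, 4, 5, 6, 7]

print(list1[0])      # physics
print(list2[1:5])    # [2, 3, 4, 5]

list1[2] = 2001      # lists are mutable: update an item in place
list1.append('maths')
del list1[0]         # remove an item by index
print(list1)         # ['chemistry', 2001, 2000, 'maths']
```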
Python Tuples
A tuple is a sequence of immutable Python objects. Creating a tuple is as simple as putting different
comma-separated values, optionally between parentheses. For example:
tup1 = ('physics', 'chemistry', 1997, 2000)
tup2 = (1, 2, 3, 4, 5, 6, 7)
tup3 = "a", "b", "c"
The empty tuple is written as two parentheses containing nothing:
tup1 = ()
To write a tuple containing a single value, you have to include a comma, even though there is only one value:
tup1 = (50,)
Like string indices, tuple indices start at 0, and they can be sliced, concatenated, and so on.
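tup1 = ('physics', 'chemistry', 1997, 2000)
tup2 = (1, 2, 3, 4, 5, 6, 7)
print("tup1[0]:", tup1[0])
print("tup2[1:5]:", tup2[1:5])
This produces the following result:
tup1[0]: physics
tup2[1:5]: (2, 3, 4, 5)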
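Updating Tuples
Tuples are immutable, which means you cannot update or change the values of tuple elements. You can,
however, take portions of existing tuples to create new tuples.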
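For example (sample values), assigning to a tuple element raises a TypeError, while concatenation builds a new tuple:

```python
tup1 = (12, 34.56)
tup2 = ('abc', 'xyz')

try:
    tup1[0] = 100            # not allowed: tuples cannot be changed in place
except TypeError as err:
    print("TypeError:", err)

tup3 = tup1 + tup2           # create a new tuple instead
print(tup3)                  # (12, 34.56, 'abc', 'xyz')
```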
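Python Dictionary
Each key is separated from its value by a colon (:), and items are separated by commas. To access
dictionary elements, use square brackets with the key:
student = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}
print("student['Name']:", student['Name'])
print("student['Age']:", student['Age'])
This produces the following result:
student['Name']: Zara
student['Age']: 7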
If we attempt to access a data item with a key that is not part of the dictionary, we get an error:
student = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}
print("student['Alice']:", student['Alice'])
This produces the following result:
Traceback (most recent call last):
  ...
KeyError: 'Alice'
Updating Dictionary
You can update a dictionary by adding a new entry or a key-value pair, modifying an existing entry, or
deleting an existing entry as shown below in the simple example −
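student = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}
student['Age'] = 8                # update an existing entry
student['School'] = "DPS School"  # add a new entry
print("student['Age']:", student['Age'])
print("student['School']:", student['School'])
This produces the following result:
student['Age']: 8
student['School']: DPS School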
Delete Dictionary Elements
You can either remove individual dictionary elements or clear the entire contents of a dictionary. You can
also delete the entire dictionary in a single operation.
To explicitly remove an entire dictionary, just use the del statement. Following is a simple example:
student = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}
del student['Name']   # remove the entry with key 'Name'
student.clear()       # remove all entries
del student           # delete the entire dictionary
print("student['Age']:", student['Age'])
The last line raises a NameError, because after del the dictionary no longer exists.
Note that del is a statement, not a method.
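Properties of Dictionary Keys
Dictionary values have no restrictions, but there are two important points to remember about keys:
(a) More than one entry per key is not allowed, which means no duplicate key is allowed. When duplicate
keys are encountered during assignment, the last assignment wins. For example:
student = {'Name': 'Zara', 'Age': 7, 'Name': 'Manni'}
print("student['Name']:", student['Name'])
This produces the following result:
student['Name']: Manni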
(b) Keys must be immutable, which means you can use strings, numbers or tuples as dictionary keys, but
something like ['key'] is not allowed. For example, the following line raises a TypeError, because a list is
mutable and therefore unhashable:
student = {['Name']: 'Zara', 'Age': 7}
Removing individual tuple elements is not possible. To explicitly remove an entire tuple, just use the del
statement. For example:
tup = ('physics', 'chemistry', 1997, 2000)
del tup   # the name tup no longer exists; using it afterwards raises NameError
4.2 DJANGO
Django is a high-level Python Web framework that encourages rapid development and clean,
pragmatic design. Built by experienced developers, it takes care of much of the hassle of Web development,
so you can focus on writing your app without needing to reinvent the wheel. It’s free and open source.
Django's primary goal is to ease the creation of complex, database-driven websites. Django
emphasizes reusability and "pluggability" of components, rapid development, and the principle of don't
repeat yourself. Python is used throughout, even for settings files and data models.
Django also provides an optional administrative create, read, update and delete interface that is generated
dynamically through introspection and configured via admin models.
Create a Project
Whether you are on Windows or Linux, just get a terminal or a cmd prompt, navigate to the place where you
want your project to be created, and then run:
$ django-admin startproject myproject
This creates a "myproject" folder with a structure like the following:
myproject/
   manage.py
   myproject/
      __init__.py
      settings.py
      urls.py
      wsgi.py
The Project Structure
The “myproject” folder is just your project container; it actually contains two elements:
manage.py: This file is a kind of project-local django-admin for interacting with your project via the
command line (starting the development server, syncing the database, and so on). To get a full list of the
commands accessible via manage.py, you can use:
$ python manage.py help
The “myproject” subfolder: This folder is the actual Python package of your project. It contains four files:
__init__.py: Marks the folder as a Python package.
settings.py: Contains your project settings.
urls.py: All the links of your project and the function to call for each; a kind of table of contents of your project.
wsgi.py: The entry point for WSGI-compatible web servers to serve your project.
Some important options in settings.py are:
DEBUG = True
This option sets whether your project is in debug mode. Debug mode gives you more information
about your project's errors. Never set it to True for a live project. However, it has to be set to True if
you want the Django development server to serve static files; do this only in development mode.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': 'database.sql',
        'USER': '',
        'PASSWORD': '',
        'HOST': '',
        'PORT': '',
    }
}
The database is set in the DATABASES dictionary. The example above is for the SQLite engine. Django
also supports:
MySQL (django.db.backends.mysql)
PostgreSQL (django.db.backends.postgresql)
You can also set other options such as TIME_ZONE, LANGUAGE_CODE, TEMPLATE…
Now that your project is created and configured, make sure it is working:
$ python manage.py runserver
Running it produces output like the following:
Validating models...
0 errors found
September 03, 2015 - 11:41:50
Django version 1.6.11, using settings 'myproject.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
A project is a sum of many applications. Every application has an objective and can be reused in another
project; for example, the contact form on a website can be an application and can be reused for other sites.
See it as a module of your project.
Create an Application
We assume you are in your project folder, that is, in our main “myproject” folder, the same folder as manage.py:
$ python manage.py startapp myapp
This creates a "myapp" folder containing, among others, the following files:
myapp/
   __init__.py
   admin.py
   models.py
   tests.py
   views.py
__init__.py: Just to make sure Python handles this folder as a package.
admin.py: This file helps you make the app modifiable in the admin interface.
To make the project aware of the application, add it to INSTALLED_APPS in myproject/settings.py:
INSTALLED_APPS = (
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'myapp',
)
Creating forms in Django is really similar to creating a model. Here again, we just need to inherit from a
Django class, and the class attributes will be the form fields. Let's add a forms.py file in the myapp folder to
contain our app forms. We will create a login form.
myapp/forms.py
from django import forms

class LoginForm(forms.Form):
    username = forms.CharField(max_length=100)
    password = forms.CharField(widget=forms.PasswordInput())
As seen above, a field can take a "widget" argument for HTML rendering; in our case, we want the
password to be hidden, not displayed. Many other widgets are available in Django: DateInput for dates,
CheckboxInput for checkboxes, and so on.
The form is processed in a login view in myapp/views.py:
from django.shortcuts import render
from myapp.forms import LoginForm

def login(request):
    username = "not logged in"
    if request.method == "POST":
        # Get the posted form
        form = LoginForm(request.POST)
        if form.is_valid():
            username = form.cleaned_data['username']
    else:
        form = LoginForm()
    return render(request, 'loggedin.html', {"username": username})
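The login form template, login.html, can look like this:
<html>
  <body>
    <form name="form" action="{% url 'login' %}" method="POST">
      {% csrf_token %}
      <div>
        <input type="text" placeholder="Username" name="username" />
        <br>
        <input type="password" placeholder="Password" name="password" />
        <br>
        <button type="submit">Login</button>
      </div>
    </form>
  </body>
</html>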
The template displays a login form and posts the result to our login view above. You have probably
noticed the {% csrf_token %} tag in the template, which is there to prevent Cross-Site Request Forgery
(CSRF) attacks on your site.
Once we have the login template, we need the loggedin.html template that will be rendered after the form
is processed.
<html>
<body>
You are : <strong>{{username}}</strong>
</body>
</html>
Now, we just need our pair of URLs to get started, in myapp/urls.py:
from django.urls import path
from django.views.generic import TemplateView
from myapp import views
urlpatterns = [
    path('connection/', TemplateView.as_view(template_name='login.html')),
    path('login/', views.login, name='login'),
]
When accessing "/myapp/connection", we will get the following login.html template rendered −
Setting Up Sessions
In Django, sessions are enabled in your project's settings.py by adding some lines to the MIDDLEWARE
option (MIDDLEWARE_CLASSES in Django versions before 1.10) and the INSTALLED_APPS option. This
should be done while creating the project, but it is always good to know. MIDDLEWARE should have:
'django.contrib.sessions.middleware.SessionMiddleware'
And INSTALLED_APPS should have:
'django.contrib.sessions'
By default, Django saves session information in the database (the django_session table or collection), but
you can configure the engine to store it in other ways, such as in files or in the cache.
When sessions are enabled, every request (the first argument of any view in Django) has a session attribute
that behaves like a dictionary.
Let's create a simple example to see how to create and save sessions, using the login system built above.
We will save the username server side in the session, so that a user who has not signed out does not see
the login form again when accessing the login page. Keeping this information server side is more secure
than keeping it in a browser cookie.
For this, first let's change our login view to save the username server side:
from django.shortcuts import render
from myapp.forms import LoginForm

def login(request):
    username = 'not logged in'
    if request.method == 'POST':
        form = LoginForm(request.POST)
        if form.is_valid():
            username = form.cleaned_data['username']
            # Store the username server side in the session
            request.session['username'] = username
    else:
        form = LoginForm()
    return render(request, 'loggedin.html', {"username": username})

Then add a view for the login form that skips the form if the session is already set:
def formView(request):
    if 'username' in request.session:
        username = request.session['username']
        return render(request, 'loggedin.html', {"username": username})
    else:
        return render(request, 'login.html', {})
Now let us change the urls.py file so that the URL pairs with our new view:
from django.urls import path
from myapp import views
urlpatterns = [
    path('connection/', views.formView, name='loginform'),
    path('login/', views.login, name='login'),
]
When accessing /myapp/connection, you will get to see the following page
4.3 MACHINE LEARNING
In basic technical terms, machine learning uses algorithms that take in empirical or
historical data, analyze it, and generate outputs based on that analysis. In some
approaches, the algorithms work with so-called “training data” first, and then they
learn, predict, and find ways to improve their performance over time.
In supervised learning, the computer is trained on a set of data inputs and outputs, with a goal of
learning a general rule that maps the given inputs to the given outputs. Two main types of supervised
learning are:
1) classification, which entails the prediction of a class label, and
2) regression, which entails the prediction of a numerical value.
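A toy illustration of the two supervised tasks, using only the standard library and made-up data: a 1-nearest-neighbour classifier predicts a class label, and a least-squares line fit predicts a numerical value. Both are simple textbook methods chosen for brevity, not the models used in this project.

```python
def nearest_neighbour(train, x):
    """Classification: return the label of the closest training input."""
    _, label = min(train, key=lambda pair: abs(pair[0] - x))
    return label


def fit_line(xs, ys):
    """Regression: least-squares fit of y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b


# Labelled examples: (input, output) pairs
labelled = [(1.0, "short"), (2.0, "short"), (8.0, "long"), (9.0, "long")]
print(nearest_neighbour(labelled, 7.5))       # long

a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])   # points on y = 2x + 1
print(a, b)                                    # 2.0 1.0
```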
In unsupervised learning, the learning algorithm is not given this type of guidance; instead, it works
to discover the pattern or structure in the input on its own. Two main types of unsupervised learning
are:
1) clustering, which involves discovering groups within the dataset that share similar characteristics,
2) density estimation, which involves evaluating the statistical distribution of the data set.
Unsupervised learning methods also include visualization and projection, which reduces the dimensionality
of the data, a form of simplification.
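As an illustration of clustering, the k-means algorithm (one common method, chosen here for brevity) groups unlabelled one-dimensional points around a fixed number of centroids without being given any labels. The points and starting centroids are made-up sample data:

```python
def k_means_1d(points, centroids, iterations=10):
    """Cluster 1-D points around k centroids (Lloyd's algorithm)."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


points = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]   # unlabelled data
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
print(centroids)   # [1.5, 10.5]
print(clusters)    # [[1.0, 1.5, 2.0], [10.0, 10.5, 11.0]]
```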
In reinforcement learning, an algorithm confronts a problem in a dynamic environment, and as it works
towards a given goal, it receives feedback (rewards) that reinforces its learning and goal-seeking effort.
AlphaGo is an example of reinforcement learning; reinforcement learning algorithms include Q-learning,
temporal-difference learning, and deep reinforcement learning.
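To make the reward-driven update concrete, here is a minimal tabular Q-learning sketch on a hypothetical five-state corridor (a toy environment of our own, not from any library). The agent explores with random actions, and the update Q(s, a) <- Q(s, a) + alpha * (r + gamma * max Q(s', .) - Q(s, a)) gradually makes "move right" the preferred action in every state:

```python
import random

# Hypothetical corridor of 5 states; reaching state 4 gives reward 1 and ends the episode
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)                 # move left or move right
ALPHA, GAMMA = 0.5, 0.9            # learning rate and discount factor

random.seed(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]

for _ in range(500):               # training episodes
    state = 0
    while state != GOAL:
        a = random.randrange(2)    # explore with random actions (Q-learning is off-policy)
        next_state = min(max(state + ACTIONS[a], 0), N_STATES - 1)
        reward = 1.0 if next_state == GOAL else 0.0
        # Move Q(s, a) towards reward + discounted best value of the next state
        best_next = 0.0 if next_state == GOAL else max(Q[next_state])
        Q[state][a] += ALPHA * (reward + GAMMA * best_next - Q[state][a])
        state = next_state

policy = ["right" if q[1] > q[0] else "left" for q in Q[:GOAL]]
print(policy)   # ['right', 'right', 'right', 'right']
```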
In the financial markets, machine learning is used for automation, portfolio optimization, risk management,
and to provide financial advisory services to investors (robo-advisors).
For automation in the form of algorithmic trading, human traders build mathematical models that
analyze financial news and trading activities to discern market trends, including volume, volatility, and
possible anomalies. These models execute trades based on a given set of instructions, enabling activity
without direct human involvement once the system is set up and running.
For portfolio optimization, machine learning techniques can help in evaluating large amounts of data,
determining patterns, and finding solutions for given problems with regard to balancing risk and reward. ML
can also help in detecting investment signals and in time-series forecasting.
For risk management, machine learning can assist with credit decisions and also with detecting suspicious
transactions or behavior, including KYC compliance efforts and prevention of fraud.
For financial advisory services, machine learning has supported the shift towards robo-advisors for some
types of retail investors, assisting them with their investment and savings goals.
Advantages of Machine Learning
1. Easily identifies trends and patterns
Machine Learning can review large volumes of data and discover specific trends and patterns that would not
be apparent to humans. For instance, an e-commerce website like Amazon uses it to understand the
browsing behaviors and purchase histories of its users in order to offer them the right products, deals, and
reminders. It uses the results to show them relevant advertisements.
2. No human intervention needed (automation)
With ML, you don't need to babysit your project every step of the way. Since ML gives machines the
ability to learn, it lets them make predictions and improve the algorithms on their own. A common
example of this is anti-virus software, which learns to filter new threats as they are recognized. ML is also
good at recognizing spam.
3. Continuous Improvement
As ML algorithms gain experience, they keep improving in accuracy and efficiency. This lets them make
better decisions. Say you need to make a weather forecast model. As the amount of data you have keeps
growing, your algorithms learn to make more accurate predictions faster.
4. Handling multi-dimensional and multi-variety data
Machine Learning algorithms are good at handling data that are multi-dimensional and multi-variety, and
they can do this in dynamic or uncertain environments.
5. Wide Applications
You could be an e-tailer or a healthcare provider and make ML work for you. Where it does apply, it holds
the capability to help deliver a much more personal experience to customers while also targeting the right
customers.
Disadvantages of Machine Learning
Despite its advantages and popularity, Machine Learning isn't perfect. The following
factors serve to limit it:
1. Data Acquisition
Machine Learning requires massive data sets to train on, and these should be inclusive/unbiased and of
good quality. There can also be times when one must wait for new data to be generated.
2. Time and Resources
ML needs enough time to let the algorithms learn and develop enough to fulfill their purpose with a
considerable amount of accuracy and relevancy. It also needs massive resources to function, which can mean
additional requirements of computing power for you.
3. Interpretation of Results
Another major challenge is the ability to accurately interpret results generated by the algorithms. You must
also carefully choose the algorithms for your purpose.
4. High error-susceptibility
Machine Learning is autonomous but highly susceptible to errors. Suppose you train an algorithm with data
sets too small to be inclusive. You end up with biased predictions coming from a biased training set, which
can lead, for example, to irrelevant advertisements being displayed to customers.
Machine Learning (ML) has a rich and fascinating history, spanning several decades of research and
development. Here's an overview of the key milestones and advancements in the history of ML:
2020s:
ML applications expanded into various domains, including healthcare, finance, autonomous vehicles, and
robotics.
Explainable AI and fairness in ML became crucial topics for addressing biases and improving transparency.
Generative models, including Generative Adversarial Networks (GANs, first introduced in 2014), became
powerful techniques for generating realistic synthetic data.
ML frameworks and libraries, such as TensorFlow and PyTorch, provided tools for easier model
development and deployment.
Ethical considerations, privacy concerns, and regulation discussions surrounding ML gained traction.
Ethical considerations, privacy concerns, and regulation discussions surrounding ML gained traction.
It's important to note that this overview provides a high-level summary of ML's history, and there are many
more specific developments and breakthroughs that have occurred along the way. The field continues to
evolve rapidly, with ongoing research and advancements shaping the future of ML.
5. SYSTEM DESIGN
1) Web crawler: Web crawlers collect data about a website and the links related to it. We use the web
crawler only to collect data and information from the WWW and store it in our database.
2) Indexer: The indexer arranges each term on each web page and stores the resulting list of terms in a
large repository.
3) Query engine: The query engine replies to the user's keyword and shows effective results for it. Within
the query engine, a page ranking algorithm ranks the URLs.
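The indexer and query engine can be sketched as follows, assuming the crawled pages are available as a dictionary of URL to text. The indexer builds an inverted index (each term mapped to the pages containing it), and this sketch's query engine ranks pages by a simple TF-IDF sum; TF-IDF is an illustrative choice, not necessarily the ranking used in this project, and the URLs and texts are made up:

```python
import math
from collections import Counter, defaultdict


def build_index(pages):
    """Indexer: map each term to {url: term frequency}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for term, count in Counter(text.lower().split()).items():
            index[term][url] = count
    return index


def search(index, n_pages, query, top=10):
    """Query engine: rank pages by the summed TF-IDF of the query terms."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue                                  # term occurs on no page
        idf = math.log(n_pages / len(postings))       # rarer terms weigh more
        for url, tf in postings.items():
            scores[url] += tf * idf
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [url for url, _ in ranked[:top]]


pages = {
    "http://example.com/ml": "machine learning ranks search results",
    "http://example.com/web": "the web crawler collects web pages",
    "http://example.com/search": "search engine search index",
}
index = build_index(pages)
print(search(index, len(pages), "search engine"))
```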
5.2 METHODOLOGY
The objective is to build a search engine that places the web address of the most relevant web page at the
top of the search results for a user query. The main focus of our system is to use a machine learning
technique to increase accuracy compared to available search engines.
Following is the step by step procedure for building the search engine:
After the web crawler collects data from the WWW, the data must be cleaned. In this step, the data is preprocessed with NLP steps so that unnecessary data is removed.
Fig.5.3 NLP steps for data cleaning for Building Search Engine Using Machine Learning Technique
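The individual steps of Fig.5.3 are not spelled out in the text. As an assumption, a typical cleaning pipeline removes HTML tags, converts text to lower case, tokenizes it, drops stop words and stems the remaining words; the standard-library sketch below follows that order. The stop-word list and the suffix-stripping rule are deliberately simplified, hypothetical stand-ins for a full NLP toolkit such as NLTK.

```python
import re

# Hypothetical, deliberately small stop-word list; a real system would use a
# complete list such as the one shipped with NLTK.
STOP_WORDS = {"a", "an", "and", "the", "of", "on", "in", "is", "are", "to", "for"}


def crude_stem(word):
    """Strip one common English suffix (a simplified stand-in for Porter stemming)."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words such as "bus" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean_text(raw_html):
    """Return the cleaned list of terms for one crawled page."""
    text = re.sub(r"<[^>]+>", " ", raw_html)             # 1. remove HTML tags
    text = text.lower()                                  # 2. normalise case
    tokens = re.findall(r"[a-z]+", text)                 # 3. tokenize, dropping digits and punctuation
    tokens = [t for t in tokens if t not in STOP_WORDS]  # 4. remove stop words
    return [crude_stem(t) for t in tokens]               # 5. stem


print(clean_text("<p>The Crawler is indexing pages of the Web!</p>"))
# ['crawler', 'index', 'page', 'web']
```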
The three candidate page ranking algorithms compare as follows:

| Criterion | PageRank | Weighted PageRank | HITS |
|---|---|---|---|
| Working | Calculates the page score at the time the pages are indexed. | Calculates web page weight based on the inbound and outbound links of important web pages. | Calculates a hub score and an authority score for each web page. |
| Input parameters | Incoming links | Incoming and outgoing links | Content, incoming and outgoing links |
| Algorithm complexity | O(log N) | < O(log N) | < O(log N) |
| Quality of results | Good | Better than PageRank | Lower than PageRank |
| Efficiency | Medium | High | Low |

Among all, the Weighted PageRank algorithm is best suited for our system because it gives better accuracy and efficiency than the others.
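A minimal sketch of Weighted PageRank in Python follows. It uses the commonly cited formulation by Xing and Ghorbani and is not the project's own code. For a page u whose in-linking pages form the set B(u):
WPR(u) = (1 − d) + d · Σ over v in B(u) of WPR(v) · W_in(v,u) · W_out(v,u),
where W_in(v,u) is u's number of in-links divided by the total in-links of all pages that v links to, and W_out(v,u) is the same ratio computed with out-links. The damping factor d = 0.85, the fixed 50 iterations and the three-page link graph are assumptions for illustration.

```python
def weighted_pagerank(links, d=0.85, iterations=50):
    """Weighted PageRank; links maps each page to the list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    out_links = {p: set(links.get(p, ())) for p in pages}
    in_links = {p: set() for p in pages}
    for v, targets in out_links.items():
        for u in targets:
            in_links[u].add(v)
    n_in = {p: len(in_links[p]) for p in pages}
    n_out = {p: len(out_links[p]) for p in pages}

    scores = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new_scores = {}
        for u in pages:
            total = 0.0
            for v in in_links[u]:
                refs = out_links[v]  # R(v): every page that v links to (includes u)
                sum_in = sum(n_in[p] for p in refs)    # >= 1 because u has an in-link from v
                sum_out = sum(n_out[p] for p in refs)
                w_in = n_in[u] / sum_in
                w_out = n_out[u] / sum_out if sum_out else 0.0
                total += scores[v] * w_in * w_out
            new_scores[u] = (1 - d) + d * total
        scores = new_scores
    return scores


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
scores = weighted_pagerank(links)
print(sorted(scores, key=scores.get, reverse=True))  # ['A', 'C', 'B']
```

In this formulation a page with no out-links gets W_out = 0 and so receives only the base score (1 − d); a production system would need an explicit policy for such dangling pages.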
In this step, the top-ranked output of the PageRank algorithm is used as input to the machine learning algorithm. The output of the machine learning algorithm is returned to the user as the web addresses of the relevant pages for the query. To implement the machine learning algorithm that finds the most relevant web pages for a user query, we divide the web page features into three parts:
1) Page content
2) Page content of neighbors
3) Link analysis
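The report does not specify how each feature group is computed. Purely as an illustration, the sketch below turns a candidate page into a three-number feature vector, one number per group: the share of query terms found in the page's own text, the average share found in its neighbours' text, and a link-analysis score such as the page's Weighted PageRank value. The helper names, the overlap measure and the sample pages are all hypothetical.

```python
import re


def terms(text):
    """Set of lower-cased alphanumeric terms in a piece of text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def term_overlap(query, text):
    """Fraction of distinct query terms that occur in the text."""
    q = terms(query)
    return len(q & terms(text)) / len(q) if q else 0.0


def page_features(query, url, pages, neighbours, link_score):
    """Return [page content, neighbour content, link analysis] features for one URL."""
    content = term_overlap(query, pages[url])
    near = neighbours.get(url, [])
    neighbour_content = (
        sum(term_overlap(query, pages[n]) for n in near) / len(near) if near else 0.0
    )
    return [content, neighbour_content, link_score.get(url, 0.0)]


# Hypothetical pages, neighbour lists and link scores
pages = {"a": "security news today", "b": "security tips", "c": "sports news"}
neighbours = {"a": ["b", "c"]}
link_score = {"a": 0.59}
print(page_features("news on security", "a", pages, neighbours, link_score))
# approximately [0.667, 0.333, 0.59]
```

Vectors of this kind, paired with relevance labels, are the sort of input a classifier such as SVM or XGBoost is trained on.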
E. Implement the query engine to display efficient results for the user query
Finally, the query engine is implemented: it takes the user's input in the form of a query and displays efficient results for it, namely the web addresses of the relevant pages based on the output of the machine learning algorithm.
The Unified Modeling Language (UML) is a standard language for specifying, visualizing, constructing and documenting the artifacts of a software system, as well as for business modeling and other non-software systems. The UML represents a collection of best engineering practices that have proven successful in the modeling of large and complex systems. The UML is a very important part of developing object-oriented software and of the software development process. The UML mostly uses graphical notations to express the design of software projects.
GOALS:
The primary goals in the design of the UML are as follows:
1. Provide users a ready-to-use, expressive visual modeling language so that they can develop and exchange meaningful models.
2. Provide extensibility and specialization mechanisms to extend the core concepts.
3. Be independent of particular programming languages and development processes.
4. Provide a formal basis for understanding the modeling language.
5. Encourage the growth of the OO tools market.
6. Support higher level development concepts such as collaborations, frameworks, patterns and
components.
7. Integrate best practices.
Fig 5.3.1 Use Case Diagram For Building Search Engine Using Machine Learning Technique
In software engineering, a class diagram in the Unified Modeling Language (UML) is a type of static structure diagram that describes the structure of a system by showing the system's classes, their attributes, operations (or methods), and the relationships among the classes. It explains which class contains which information.
Fig 5.3.2 Class Diagram For Building Search Engine Using Machine Learning Technique
A sequence diagram in Unified Modeling Language (UML) is a kind of interaction diagram that shows how
processes operate with one another and in what order. It is a construct of a Message Sequence Chart.
Sequence diagrams are sometimes called event diagrams, event scenarios, and timing diagrams.
Fig 5.3.3 Sequence Diagram For Building Search Engine Using Machine Learning Technique
Activity diagrams are graphical representations of workflows of stepwise activities and actions with support
for choice, iteration and concurrency. In the Unified Modeling Language, activity diagrams can be used to
describe the business and operational step-by-step workflows of components in a system. An activity
diagram shows the overall flow of control.
Fig 5.3.4 Activity Diagram For Building Search Engine Using Machine Learning Technique
5.4 IMPLEMENTATION
We have used three algorithms in our project:
1. Support Vector Machine
2. Artificial Neural Network
3. XGBoost
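The training code for these three algorithms is not shown here. To make the SVM idea concrete, the sketch below trains a linear soft-margin SVM from scratch with the Pegasos stochastic sub-gradient method; this training technique is swapped in for illustration and is not necessarily what the project used. Labels are +1 (relevant) and -1 (not relevant), a constant 1 is appended to each vector so the last weight acts as a bias, and the two-dimensional data is made up.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss.

    X is a list of feature vectors, y a list of labels in {+1, -1}.
    The appended constant feature makes the last weight a bias term
    (Pegasos then regularises the bias too, a common simplification).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # shrink step from the regulariser
            if margin < 1:  # hinge loss active: move towards the example
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Made-up, linearly separable toy data
X = [[2, 1], [1, 2], [2, 2], [-1, -2], [-2, -1], [-2, -2]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

Library implementations such as scikit-learn's SVC solve the same optimisation problem with more robust solvers; the hand-written version only shows what "finding a maximum-margin hyperplane" means in code.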
5.4.1 MODULES
Manager
User
Admin
Machine learning
Manager:
The manager module holds manager information and task descriptions for the entire experiment. The manager can upload files into the database, specifying the file type, the file name and a particular URL for the file, which is used to get information about the file.
User:
The user module holds user information and task descriptions for the entire experiment. After logging in, the user gets two options: searching for a particular URL or piece of information, or searching for a particular file and getting the weight and rank of that file using the page ranking concept.
Admin:
The admin gives authority to managers and users by activating their accounts. The admin can see the details of all users and managers and can get the accuracy results of the SVM and XGBoost algorithms.
Machine learning:
Machine learning refers to a computer acquiring the ability to make predictive judgments and the best decisions by analyzing and learning from a large amount of existing data. Representative algorithms include deep learning, artificial neural networks, decision trees, enhancement algorithms and so on. Machine learning is the key way for computers to acquire artificial intelligence, and it now plays an important role in many fields of artificial intelligence, such as internet search, biometric identification, autonomous driving, Mars rovers, American presidential elections and military decision assistants. Basically, it is applied wherever there is a need for data analysis.
urls.py:
from django.urls import path

# user, manager and search are the project's view modules; their import
# lines are not reproduced in this report.
urlpatterns = [
    # User pages: login, registration, search and weight lookup
    path('userlogin/', user.userlogin, name='userlogin'),
    path('userregister/', user.userregister, name='userregister'),
    path('userlogincheck/', user.userlogincheck, name='userlogincheck'),
    path('pagerank', user.pagerank, name='pagerank'),
    path('search/', user.search, name='search'),
    path('search1/', user.search1, name='search1'),
    path('usersearchresult/', user.usersearchresult, name='usersearchresult'),
    path('usersearchresult1/', user.usersearchresult1, name='usersearchresult1'),
    path('weight/', user.weight, name='weight'),
    path('logout/', user.logout, name='logout'),
    # Manager pages: login, registration and dataset upload
    path('managerlogin/', manager.managerlogin, name='managerlogin'),
    path('managerregister/', manager.managerregister, name='managerregister'),
    path('managerlogincheck/', manager.managerlogincheck, name='managerlogincheck'),
    path('fileupload/', manager.fileupload, name='fileupload'),
    # Admin pages: login, listing and account activation
    path('admin1/', search.adminlogin, name='admin1'),
    path('adminloginentered/', search.adminloginentered, name='adminloginentered'),
    path('userdetails/', search.userdetails, name='userdetails'),
    path('Managerdetails/', search.managerdetails, name='Managerdetails'),
    path('activateuser/', search.activateuser, name='activateuser'),
    path('activatemanager/', search.activatemanager, name='activatemanager'),
]
views.py:
from django.contrib import messages
from django.http import HttpResponse
from django.shortcuts import render

# managerForm and managerModel are assumed to be defined in this app's
# forms.py and models.py (not reproduced in this report).
from .forms import managerForm
from .models import managerModel


def managerlogin(request):
    return render(request, 'manager/managerlogin.html')


def managerregister(request):
    if request.method == 'POST':
        form1 = managerForm(request.POST)
        if form1.is_valid():
            form1.save()
            print("successfully saved the data")
            return render(request, 'manager/managerlogin.html')
            # return HttpResponse("registration successfully completed")
        else:
            print("form not valid")
            return HttpResponse("form not valid")
    else:
        form = managerForm()
        return render(request, "manager/managerregister.html", {"form": form})


def managerlogincheck(request):
    if request.method == 'POST':
        sname = request.POST.get('email')
        spasswd = request.POST.get('upasswd')
        try:
            # Look up the manager by email and password.
            check = managerModel.objects.get(email=sname, passwd=spasswd)
            status = check.status
            print('status', status)
            # Only managers activated by the admin may log in.
            if status == "Activated":
                request.session['email'] = check.email
                return render(request, 'manager/managerpage.html')
            else:
                messages.success(request, 'manager is not activated')
                return render(request, 'manager/managerlogin.html')
        except Exception as e:
            print('Exception is ', str(e))
        messages.success(request, 'Invalid name and password')
    return render(request, 'manager/managerlogin.html')
models.py:
from django.db import models


class userModel(models.Model):
    # Registered users; status stays 'waiting' until the admin activates the account.
    name = models.CharField(max_length=50)
    email = models.EmailField()
    passwd = models.CharField(max_length=40)
    cwpasswd = models.CharField(max_length=40)
    mobileno = models.CharField(max_length=50, default="", editable=True)
    status = models.CharField(max_length=40, default="", editable=True)

    def __str__(self):
        return self.email

    class Meta:
        db_table = 'userregister'


class weightmodel(models.Model):
    # Uploaded documents together with their computed weight, rank and label.
    filename = models.CharField(max_length=100)
    file = models.FileField(upload_to='files/pdfs/')
    weight = models.CharField(max_length=100)
    rank = models.CharField(max_length=100, default="", editable=False)
    label = models.CharField(max_length=100, default="", editable=False)

    def __str__(self):
        return self.filename

    class Meta:
        db_table = 'weight'
forms.py:
from django import forms
from django.core import validators

from user.models import userModel


class userForm(forms.ModelForm):
    name = forms.CharField(widget=forms.TextInput(), required=True, max_length=100)
    passwd = forms.CharField(widget=forms.PasswordInput(), required=True, max_length=100)
    cwpasswd = forms.CharField(widget=forms.PasswordInput(), required=True, max_length=100)
    email = forms.CharField(widget=forms.TextInput(), required=True)
    # Both validators are set to 10, so the mobile number must be exactly 10 characters.
    mobileno = forms.CharField(widget=forms.TextInput(), required=True, max_length=10,
                               validators=[validators.MaxLengthValidator(10),
                                           validators.MinLengthValidator(10)])
    # New accounts start as 'waiting' until the admin activates them.
    status = forms.CharField(widget=forms.HiddenInput(), initial='waiting', max_length=100)

    class Meta:
        model = userModel
        fields = ['name', 'passwd', 'cwpasswd', 'email', 'mobileno', 'status']
userdetails.html:
{% extends 'adminbase.html' %}
{% load static %}
{% block contents %}
<div class="login-box-body">
<p class="login-box-msg">Sign in to start your session</p>
<div class="form-group">
<form name="" id="loginForm">
<div class="form-group has-feedback">
<!----- username -------------->
<input class="form-control" placeholder="Username" id="loginid" type="text"
autocomplete="off" />
<span style="display:none;font-weight:bold; position:absolute;color: red;position:
absolute;padding:4px;font-size: 11px;background-color:rgba(128, 128, 128, 0.26);z-index: 17; right: 27px;
top: 5px;" id="span_loginid"></span>
<!-- Already exists! -->
<span class="glyphicon glyphicon-user form-control-feedback"></span>
</div>
<div class="form-group has-feedback">
<!----- password -------------->
<input class="form-control" placeholder="Password" id="loginpsw" type="password"
autocomplete="off" />
<span style="display:none;font-weight:bold; position:absolute;color: grey;position:
absolute;padding:4px;font-size: 11px;background-color:rgba(128, 128, 128, 0.26);z-index: 17; right: 27px;
top: 5px;" id="span_loginpsw"></span>
<!-- Already exists! -->
<span class="glyphicon glyphicon-lock form-control-feedback"></span>
</div>
<div class="row">
<div class="col-xs-12">
<div class="checkbox icheck">
<label>
<input type="checkbox" id="loginrem" > Remember Me
</label>
</div>
</div>
<div class="col-xs-12">
<button type="button" class="btn btn-green btn-block btn-flat" onclick="userlogin()">Sign
In</button>
</div>
</div>
</form>
</div>
</div>
</div>
</div>
</div>
</div>
<!--/ Modal box-->
<!--Banner-->
<div class="banner">
<div class="bg-color">
<div class="container">
<div class="row">
<div class="banner-text text-center">
<div class="text-border">
<!--<h2 class="text-dec"></h2>-->
</div>
<div class="intro-para text-center quote">
<p>
Welcome admin page...
</p>
<center><h3>
<table border="2px solid red" align="left">
<tr><th style="color:green">Id</th>
<th style="color:green">name</th>
<th style="color:green">email</th>
<th style="color:green">mobileno</th>
<th style="color:green">status</th>
<th style="color:green">activate</th>
</tr>
{% for x in qs %}
<tr>
<td style="color:red">{{x.id}}</td>
<td style="color:red">{{x.name}}</td>
<td style="color:red">{{x.email}}</td>
<td style="color:red">{{x.mobileno}}</td>
<td style="color:red">{{x.status}}</td>
{% if x.status == 'waiting' %}
<td style="color:orange"> <a href="/activateuser/?pid={{ x.id }}"
>Activate</a></td>
{% else %}
<td style="color:orange"> Activated</td>
{% endif %}
</tr>
{% endfor %}
</table>
</h3>
</center>
</div>
</a>
</div>
</div>
</div>
</div>
</div>-->
{% endblock %}
In this paper, the author uses the machine learning algorithms SVM and XGBoost to predict the search results for a given query, building a search engine with machine learning algorithms. To train these algorithms, the author uses website data, which is converted into numeric vectors called TF-IDF (term frequency-inverse document frequency) vectors. Each entry of a TF-IDF vector is the frequency of a word in a document, weighted down according to how many documents in the collection contain that word, so that common words count for less than distinctive ones.
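As a minimal sketch of the TF-IDF representation (standard library only, not the project's code): each document becomes a vector whose entry for term t is tf(t, d) · log(N / df(t)), that is, the term's relative frequency in the document multiplied by the log of the number of documents N divided by the number of documents containing t. A query is vectorised the same way and documents are ranked by cosine similarity. The documents and URLs are invented for illustration; scikit-learn's TfidfVectorizer uses a smoothed IDF variant by default, so its numbers differ.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return (idf, vectors): one {term: tf-idf weight} dict per document."""
    n = len(docs)
    counts = [Counter(tokenize(d)) for d in docs]
    df = Counter(term for c in counts for term in c)  # documents containing each term
    idf = {t: math.log(n / df[t]) for t in df}
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append({t: (k / total) * idf[t] for t, k in c.items()})
    return idf, vectors


def cosine(a, b):
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(query, urls, docs):
    """Return URLs with a positive cosine score, best match first."""
    idf, vectors = tfidf_vectors(docs)
    q = Counter(t for t in tokenize(query) if t in idf)  # ignore unseen terms
    total = sum(q.values())
    q_vec = {t: (k / total) * idf[t] for t, k in q.items()} if total else {}
    scored = [(cosine(q_vec, v), u) for u, v in zip(urls, vectors)]
    return [u for s, u in sorted(scored, reverse=True) if s > 0]


urls = ["http://example.com/security-news", "http://example.com/sports",
        "http://example.com/security-tips"]
docs = ["latest news on network security breaches",
        "sports news and match results",
        "tips to improve password security"]
print(rank("news on security", urls, docs)[0])  # http://example.com/security-news
```

In the project, vectors like these are what the SVM and XGBoost classifiers are trained on, rather than being ranked directly by cosine similarity.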
In this paper, the author has implemented the following modules:
1) Admin module: the admin logs in to the application using 'admin' as both username and password, accepts (activates) new user registrations and then trains the SVM and XGBoost algorithms.
2) Manager module: the manager logs in to the application using 'Manager' as both username and password and then uploads the dataset to the application.
3) New User Signup: using this module, a new user can sign up with the application.
4) User Login: the user logs in to the application and then performs a search by entering a query.
To run the project, install MySQL and Python 3.7, then copy the content of the DB.txt file and paste it into MySQL to create the database.
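For the Django server to reach the database created from DB.txt, the project's settings.py has to point at MySQL. That file is not shown in this report; the fragment below is a sketch of Django's standard DATABASES setting for MySQL, in which the database name, user and password are hypothetical placeholders to be replaced with the values used in DB.txt and the local MySQL installation. It also assumes a MySQL driver such as mysqlclient is installed.

```python
# settings.py (fragment): hypothetical values, adjust to the local MySQL setup
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",  # Django's built-in MySQL backend
        "NAME": "search_engine",               # hypothetical: database created from DB.txt
        "USER": "root",                        # hypothetical MySQL user
        "PASSWORD": "",                        # hypothetical password
        "HOST": "127.0.0.1",
        "PORT": "3306",                        # default MySQL port
    }
}
```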
Now double-click the ‘run.bat’ file to start the Python Django server and get the screen below.
Fig 6.2.1 Django Server Page Building Search Engine Using Machine Learning Technique
In the above screen, the server has started and built a vector from the dataset, where the first row shows the words and the remaining rows contain the TF-IDF word frequencies. Now open a browser, enter the URL http://127.0.0.1:8000/index.html and press Enter to get the page below.
Fig 6.2.2 user signup page Building Search Engine Using Machine Learning Technique
In the above screen, click on the ‘New User Signup Here’ link to get the screen below.
Fig 6.2.3 Assigning Credentials in the User Screen Building Search Engine Using Machine Learning Technique
In the above screen, the user is signing up; then press the button to get the output below.
Fig 6.2.4 Signup Process Completed Building Search Engine Using Machine Learning Technique
In the above screen, the user signup process is complete; now click on ‘User Login’ to get the screen below.
Fig 6.2.5 User login Screen Building Search Engine Using Machine Learning Technique
Fig 6.2.6 Admin Login Building Search Engine Using Machine Learning Technique
In the above screen, we entered correct login details, but the account has not yet been activated by the admin, so click on the ‘Admin Login’ link to log in as admin and activate the user.
Fig 6.2.7 Login Screen Building Search Engine Using Machine Learning Technique
In the above screen, the admin logs in and, after login, gets the screen below.
Fig 6.2.8 Home Page Building Search Engine Using Machine Learning Technique
In the above screen, the admin can click on the ‘View Users’ link to view all users.
Fig 6.2.9 User Account Building Search Engine Using Machine Learning Technique
In the above screen, we can see the SVM and XGBoost accuracies; of the two algorithms, XGBoost achieved the higher accuracy. Now log out and log in as Manager.
Fig 6.2.10 Manager Login Screen Building Search Engine Using Machine Learning Technique
In the above screen, the manager logs in and, after login, gets the screen below.
Fig 6.2.11 User Uploading Screen Building Search Engine Using Machine Learning Technique
In the above screen, the manager can click on the ‘Upload Dataset’ link to upload a dataset or documents.
Fig 6.2.12 Upload Data Set Building Search Engine Using Machine Learning Technique
In the above screen, the manager is browsing for and uploading the dataset (this file can be found inside the ‘Dataset’ folder); now press the button to save the dataset in the server database.
Fig 6.2.13 Data Set Screen Building Search Engine Using Machine Learning Technique
In the above screen, the dataset file has been saved in the database; now log out and log in as a user to perform a search.
Fig 6.2.14 User Login Screen Building Search Engine Using Machine Learning Technique
In the above screen, the user logs in and, after login, gets the output below.
Fig 6.2.15 Searching Ranking Page Building Search Engine Using Machine Learning Technique
In the above screen, the user can click on the ‘Search with Page Rank’ link to search any data.
Fig 6.2.16 Query Screen Building Search Engine Using Machine Learning Technique
In the above screen, I entered the query ‘news on security’ and pressed the button to get the search result below.
Fig 6.2.17 URL Screen Building Search Engine Using Machine Learning Technique
In the above screen, the machine learning algorithm predicts two URLs for the given query, and the user can click on those URLs to visit the pages.
Fig 6.2.18 Output Page Building Search Engine Using Machine Learning Technique
In the above screen, by clicking on a URL link, the user can visit and view the page. Similarly, the user can enter any query, and if the query is available in the dataset, he will get output. For the above query, we got the result below.
| S.no | Test Case | Expected Result | Result | Remarks (If Fails) |
|---|---|---|---|---|
| 1 | User Register | User registration succeeds. | Pass | If the user's email already exists, it fails. |
| 2 | User Login | If the username and password are correct, a valid page is shown. | Pass | Unknown (unregistered) users will not be logged in. |
| 3 | Manager Login | If the manager name and password are correct, a valid page is shown. | Pass | Unknown (unregistered) managers will not be logged in. |
| 4 | Admin can activate the registered managers | Admin can activate the registered manager ID. | Pass | If the manager is not activated, the manager cannot log in. |
| 5 | Admin Login | Admin can log in with his login credentials; on success, he gets the home page. | Pass | Invalid login details are not allowed. |
| 6 | Admin can activate the registered users | Admin can activate the registered user ID. | Pass | If the user is not activated, the user cannot log in. |
| 7 | Admin can get the SVM results | By clicking SVM, the SVM prediction is displayed. | Pass | The SVM prediction is not shown. |
| 8 | Admin can get the XGBoost results | By clicking XGBoost, the XGBoost prediction is displayed. | Pass | The XGBoost prediction is not shown. |
| 9 | User login page | User can search the weight of a particular document. | Pass | The weight of the document is not returned. |
TYPES OF TESTING
Unit testing
Unit testing involves the design of test cases that validate that the internal program logic is
functioning properly, and that program inputs produce valid outputs. All decision branches and internal code
flow should be validated. It is the testing of individual software units of the application. It is done after the
completion of an individual unit before integration. This is structural testing that relies on knowledge of
its construction and is invasive. Unit tests perform basic tests at component level and test a specific business
process, application, and/or system configuration. Unit tests ensure that each unique path of a business
process performs accurately to the documented specifications and contains clearly defined inputs and
expected results.
Integration testing
Integration tests are designed to test integrated software components to determine if they
actually run as one program. Testing is event driven and is more concerned with the basic outcome of
screens or fields. Integration tests demonstrate that although the components were individually satisfactory,
as shown by successful unit testing, the combination of components is correct and consistent. Integration
testing is specifically aimed at exposing the problems that arise from the combination of components.
Functional test
Functional tests provide systematic demonstrations that functions tested are available as
specified by the business and technical requirements, system documentation, and user manuals.
Functional testing is centered on the following items:
Valid Input : identified classes of valid input must be accepted.
Invalid Input : identified classes of invalid input must be rejected.
Functions : identified functions must be exercised.
Output : identified classes of application outputs must be exercised.
Systems/Procedures : interfacing systems or procedures must be invoked.
Organization and preparation of functional tests is focused on requirements, key functions, or
special test cases. In addition, systematic coverage pertaining to identifying business process flows, data fields,
predefined processes, and successive processes must be considered for testing. Before functional testing is
complete, additional tests are identified and the effective value of current tests is determined.
System Test
System testing ensures that the entire integrated software system meets requirements. It tests a
configuration to ensure known and predictable results. An example of system testing is the configuration
oriented system integration test. System testing is based on process descriptions and flows, emphasizing pre-driven process links and integration points.
Features to be tested
Verify that the entries are of the correct format
No duplicate entries should be allowed
All links should take the user to the correct page.
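As a sketch of how the first two features could be unit tested with Python's built-in unittest module: `validate_registration` below is a hypothetical stand-in that mirrors the registration rule visible in forms.py (a mobile number of exactly 10 characters) and adds the no-duplicate-email rule; it is not the project's code. In the Django project itself, the same checks would more naturally be written against the form and model using Django's test framework.

```python
import unittest


def validate_registration(email, mobileno, existing_emails):
    """Hypothetical stand-in for the registration checks; returns a list of errors."""
    errors = []
    if len(mobileno) != 10:  # mirrors MinLengthValidator(10) + MaxLengthValidator(10)
        errors.append("mobile number must be exactly 10 characters")
    if email in existing_emails:  # no duplicate entries
        errors.append("email already registered")
    return errors


class RegistrationTests(unittest.TestCase):
    def test_valid_entry_is_accepted(self):
        self.assertEqual(validate_registration("new@example.com", "9876543210", []), [])

    def test_wrong_format_is_rejected(self):
        self.assertIn("mobile number must be exactly 10 characters",
                      validate_registration("new@example.com", "12345", []))

    def test_duplicate_email_is_rejected(self):
        self.assertIn("email already registered",
                      validate_registration("user@example.com", "9876543210",
                                            ["user@example.com"]))


if __name__ == "__main__":
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(RegistrationTests)
    unittest.TextTestRunner(verbosity=2).run(suite)
```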
Integration Testing
Software integration testing is the incremental integration testing of two or more integrated
software components on a single platform to produce failures caused by interface defects.
The task of the integration test is to check that components or software applications interact without error.
Test Results: All the test cases mentioned above passed successfully. No defects encountered.
Acceptance Testing
User Acceptance Testing is a critical phase of any project and requires significant participation by the end
user. It also ensures that the system meets the functional requirements.
Test Results: All the test cases mentioned above passed successfully. No defects encountered.
8. CONCLUSION
Search engines are very useful for finding the most relevant URLs for given keywords, which reduces the time users spend searching for relevant web pages. For this, accuracy is a very important factor. From the above observations, it can be concluded that XGBoost achieves better accuracy than SVM and ANN. Thus, a search engine built using XGBoost and the PageRank algorithm gives better accuracy.
9. FUTURE ENHANCEMENT
Building a search engine using machine learning techniques offers a wide range of possibilities for future
enhancements and improvements. Here are some potential areas for future development and enhancements:
1. Query Understanding: Enhancing the search engine's ability to understand user queries more
accurately and interpret their intent. This could involve natural language processing (NLP)
techniques, sentiment analysis, entity recognition, and understanding contextual information to
provide more relevant search results.
2. Personalization: Incorporating personalization features to tailor search results based on user
preferences, search history, location, and demographic information. This could involve building user
profiles and using collaborative filtering or reinforcement learning techniques to provide
personalized recommendations.
3. Multimedia Content Search: Expanding the search engine's capabilities to handle and retrieve
various types of multimedia content, including images, videos, audio files, and documents.
Developing advanced techniques for content analysis, image recognition, speech recognition, and
video understanding can greatly enhance the search experience.
4. Semantic Search: Implementing techniques to understand the meaning and context of the search
query and the indexed content. Utilizing semantic analysis, knowledge graphs, and ontologies can
improve the search engine's ability to retrieve more accurate and contextually relevant results.
5. Real-Time Updates: Enabling the search engine to handle real-time updates and index new content
as it becomes available. This involves efficient indexing algorithms and techniques to handle large-
scale data ingestion and provide up-to-date search results.
6. Explainability and Trust: Addressing the challenges of explainability in machine learning models
used within the search engine. Developing techniques to provide transparency and explanations for
the ranking of search results, helping users understand why certain results are presented and building
trust in the search engine's recommendations.
7. Multilingual and Cross-Language Search: Extending the search engine's capabilities to handle
multiple languages and facilitate cross-language search. Developing language translation models,
language detection, and cross-lingual retrieval techniques can enable users to search and access
information in different languages.
8. Contextual Search: Leveraging user context, such as time, location, device, and user behavior, to
deliver more context-aware search results. This involves incorporating contextual information into
the ranking algorithm and providing personalized recommendations based on the user's current
situation.
10. REFERENCES
[1] Manika Dutta, K. L. Bansal, “A Review Paper on Various Search Engines (Google, Yahoo, Altavista,
Ask and Bing)”, International Journal on Recent and Innovation Trends in Computing and Communication,
2016.
[2] Gunjan H. Agre, Nikita V. Mahajan, “Keyword Focused Web Crawler”, International Conference on
Electronic and Communication Systems, IEEE, 2015.
[3] Tuhena Sen, Dev Kumar Chaudhary, “Contrastive Study of Simple PageRank, HITS and Weighted
PageRank Algorithms: Review”, International Conference on Cloud Computing, Data Science &
Engineering, IEEE, 2017.
[4] Michael Chau, Hsinchun Chen, “A machine learning approach to web page filtering using content and
structure analysis”, Decision Support Systems, vol. 44, pp. 482–494, ScienceDirect, 2008.
[5] Taruna Kumari, Ashlesha Gupta, Ashutosh Dixit, “Comparative Study of Page Rank and Weighted
Page Rank Algorithm”, International Journal of Innovative Research in Computer and Communication
Engineering, February 2014.
[6] K. R. Srinath, “Page Ranking Algorithms – A Comparison”, International Research Journal of
Engineering and Technology (IRJET), Dec. 2017.
[7] S. Prabha, K. Duraiswamy, J. Indhumathi, “Comparative Analysis of Different Page Ranking
Algorithms”, International Journal of Computer and Information Engineering, 2014.
[8] Dilip Kumar Sharma, A. K. Sharma, “A Comparative Analysis of Web Page Ranking Algorithms”,
International Journal on Computer Science and Engineering, 2010.
[9] Vijay Chauhan, Arunima Jaiswal, Junaid Khalid Khan, “Web Page Ranking Using Machine Learning
Approach”, International Conference on Advanced Computing Communication Technologies, 2015.
[10] Amanjot Kaur Sandhu, Tiewei S. Liu, “Wikipedia Search Engine: Interactive Information Retrieval
Interface Design”, International Conference on Industrial and Information Systems, 2014.
[11] Neha Sharma, Rashi Agarwal, Narendra Kohli, “Review of features and machine learning techniques
for web searching”, International Conference on Advanced Computing Communication Technologies, 2016.
[12] Sweah Liang Yong, Markus Hagenbuchner, Ah Chung Tsoi, “Ranking Web Pages using Machine
Learning Approaches”, International Conference on Web Intelligence and Intelligent Agent Technology,
2008.
[13] B. Jaganathan, Kalyani Desikan, “Weighted Page Rank Algorithm based on In-Out Weight of
Webpages”, Indian Journal of Science and Technology, Dec. 2015.