BUILDING SEARCH ENGINE USING MACHINE LEARNING TECHN

An industrial oriented major project report

BUILDING SEARCH ENGINE USING MACHINE LEARNING

TECHNIQUES
Submitted by
DEEPAK SHARMA 19W91A1215
M.SRINIVAS PRASAD 19W91A1239
J.SAI SHREEJA 19W91A1225
G.RAGHU 19W91A1218

Under the Esteemed Guidance of

Mr. S. MUTHULINGAM
Assistant Professor, IT
TO
Jawaharlal Nehru Technological University
Hyderabad
In partial fulfilment of the requirements for award of degree
of BACHELOR OF TECHNOLOGY

INFORMATION TECHNOLOGY

DEPARTMENT OF INFORMATION TECHNOLOGY

MALLA REDDY INSTITUTE OF ENGINEERING AND TECHNOLOGY
(UGC AUTONOMOUS)
(Sponsored by Malla Reddy Educational society)
(Accredited by NBA, Permanent Affiliated to JNTU, Hyderabad)
Maisammaguda, Dhulapally post, Secunderabad-500014.

2022-2023

Department of Information Technology

PROJECT CERTIFICATE

This is to certify that the industrial oriented major project report titled
“BUILDING SEARCH ENGINE USING MACHINE LEARNING TECHNIQUES” was submitted by
DEEPAK SHARMA (19W91A1215), M.SRINIVAS PRASAD (19W91A1239), J.SAI SHREEJA
(19W91A1225) and G.RAGHU (19W91A1218) of B. Tech in
partial fulfilment of the requirements for the degree of Bachelor of Technology in Information
Technology, Department of Information Technology, and that it has not been submitted for the award of
any other degree of this institution.

INTERNAL GUIDE SIGN HEAD OF THE DEPARTMENT SIGN

Mr. S. Muthulingam Dr. P. Srinivas

Assistant Professor, IT Professor, IT

EXTERNAL EXAMINER SIGN


DECLARATION

We hereby declare that the industrial oriented major project report entitled
“BUILDING SEARCH ENGINE USING MACHINE LEARNING TECHNIQUES”
submitted to Malla Reddy Institute of Engineering and Technology, affiliated to Jawaharlal Nehru
Technological University Hyderabad (JNTUH), for the award of the degree of Bachelor of
Technology in Information Technology is the result of original industrial oriented major project work done
by us. It is further declared that the major project report or any part thereof has not been previously
submitted to any University or Institute for the award of a degree or diploma.

DEEPAK SHARMA 19W91A1215

M. SRINIVAS PRASAD 19W91A1239
J. SAI SHREEJA 19W91A1225
G. RAGHU 19W91A1218

ACKNOWLEDGEMENT

First and foremost, we are grateful to the Principal, Dr. M. Ashok, for providing us
with all the resources in the college to make our project a success. We thank him for his
valuable suggestions at the time of seminars, which encouraged us to give our best in the
project.

We would like to express our gratitude to Dr. P. Srinivas, Head of the Department
of Information Technology, for his support and valuable suggestions during the dissertation
work.

We offer our sincere gratitude to our project coordinator, B. Sunil Kumar,
Assistant Professor, and our internal project guide, Mr. S. Muthulingam, Assistant Professor of
Information Technology, who have supported us throughout this project with their patience
and valuable suggestions.

We would also like to thank all the supporting staff of the Department of IT and all
other departments who have been helpful, directly or indirectly, in making the project a
success. We are extremely grateful to our parents for their blessings and prayers for the
completion of our project. This gave us the strength to do our project better.

DEEPAK SHARMA 19W91A1215

M. SRINIVAS PRASAD 19W91A1239

J. SAI SHREEJA 19W91A1225

G. RAGHU 19W91A1218

INDEX
Abstract I
List of Figures II
List of Screenshots III
CHAPTER NO. CONTENTS PAGE NO.

1 INTRODUCTION 1-2

1.1. Motivation 1

1.2. Problem definition 1

1.3. Objective of the Project 1

1.4. Limitations of Project 2

2 LITERATURE SURVEY 3-5

2.1. Introduction 3

2.2. Existing system 4

2.3. Disadvantages of Existing system 4

2.4. Proposed System 4

2.5. Disadvantages of Proposed system 5

3 SYSTEM ANALYSIS 6-10

3.1. Introduction 7

3.2. Software requirement specification 8

3.3. & Flow chart 8

3.4. Algorithm 10

4 SYSTEM DESIGN 14-20

4.1. Introduction 14

4.3. Modules 18-20

5 IMPLEMENTATION 21-45

5.1 Input Design 21

5.2 Source Code 23

6 SNAPSHOTS 46-48

6.1 Cloud Server Login Page 46

6.2 User Login Page 46

6.3 View Registered Users By Admin 47

6.4 View Uploaded File of Users 48

6.5 Cloud Server View 48

7 SOFTWARE TESTING 49-51

7.1. Introduction 49

7.2 Sample Test Cases 51

8 CONCLUSION AND FUTURE ENHANCEMENTS 52

8.1 CONCLUSION 52

8.2 FUTURE ENHANCEMENTS 52-53

9 REFERENCES 54

ABSTRACT

The web is the largest and richest source of data. Search engines are commonly used to retrieve
information from the World Wide Web. They provide a simple interface for entering a user query
and display results as the web addresses of relevant web pages, but obtaining suitable
information with traditional search engines has become very challenging. This project proposes
a search engine that uses machine learning techniques to place more relevant web pages at the
top of the results for user queries.

Index Terms—World Wide Web, Search Engine, PageRank, Machine Learning.


INTRODUCTION

The World Wide Web is a web of individual systems and servers connected through different
technologies and methods. Every site comprises many web pages that are created and hosted on a
server. When a user needs something, he or she types a query, and keywords are the set of words
extracted from that search input. The search input given by a user may be syntactically incorrect, and
this is where search engines are needed. Search engines provide a simple interface for entering user
queries and displaying the results. Their main components are:
1) Web crawler: collects data about a website and the links related to it. We use web crawlers
only for collecting data and information from the WWW and storing it in our database.
2) Indexer: arranges each term on each web page and stores the resulting list of terms in a
large repository.
3) Query engine: responds to the user's keyword and shows the most effective results for that
keyword. In the query engine, a page ranking algorithm ranks the URLs.

This project uses machine learning techniques to discover the most suitable web address for the
given keyword. The output of the PageRank algorithm is given as input to the machine learning algorithm.

Section II discusses related work on search engines and the PageRank algorithm. Section III
explains the objective. Section IV deals with the proposed system, which is based on a machine learning
technique, and Section V contains the conclusion.
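
The crawler, indexer and query engine described above can be illustrated with a toy in-memory version. This is a minimal Python 3 sketch, not the project's implementation: the sample pages, the names build_index and search, and the simple term-count scoring are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical crawled pages (URL -> page text); a real crawler would fetch these.
PAGES = {
    "http://example.com/a": "machine learning for web search",
    "http://example.com/b": "web crawler collects web pages",
    "http://example.com/c": "cooking recipes and kitchen tips",
}


def build_index(pages):
    """Indexer: map each term to the URLs containing it, with term counts."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term][url] = index[term].get(url, 0) + 1
    return index


def search(index, query):
    """Query engine: score each URL by how often the query terms occur in it."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for url, count in index.get(term, {}).items():
            scores[url] += count
    # Highest score first; ties are broken by URL for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))


index = build_index(PAGES)
results = search(index, "web search")
print(results)
```

Term counting is only a stand-in for the ranking step; in the system described here that role is played by PageRank followed by a machine learning model.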


1.1 MOTIVATION
In today's internet world, people mostly rely on search engines to find what they are looking for
on the internet. The web is the largest and richest source of data, and search engines are
commonly used to retrieve information from the World Wide Web.

1.2 PROBLEM DEFINITION

The project we have built provides faster retrieval of information through a search engine that is
implemented using machine learning algorithms. Search engines provide a simple interface for entering a user
query and display results as the web addresses of relevant web pages, but obtaining suitable
information with traditional search engines has become very challenging.

1.3 OBJECTIVE OF PROJECT

To build a search engine which gives the web address of the most relevant web page at the top of the search
results, according to user queries. The main focus of our system is to build a search engine using machine
learning techniques to increase accuracy compared to available search engines.


2. LITERATURE SURVEY

1) Weighted page rank algorithm based on in-out weight of webpages

AUTHORS: Kalyani Desikan, B. Jaganathan.

In its classical formulation, the well known page rank algorithm ranks web pages only based on in-links
between web pages. We propose a new in-out weight based page rank algorithm. In this paper, we have
introduced a new weight matrix based on both the in-links and out-links between web pages to compute the
page ranks. We have illustrated the working of our algorithm using a web graph. We notice that the page
rank values of the web pages computed using the original page rank algorithm and our proposed algorithm
are comparable. Moreover, our algorithm is found to be efficient with respect to the time taken to compute
the page rank values.
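
For reference, the classical PageRank computation that this paper builds on can be sketched by power iteration. This is a minimal Python 3 sketch of the standard in-link formulation, not the weighted in-out variant proposed by the authors; the four-page graph, the damping factor of 0.85 and the iteration count are illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Classical PageRank by power iteration.

    links maps each page to the list of pages it links to; every
    linked page must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # Each out-link receives an equal share of this page's rank.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web graph.
graph = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Page C collects links from three pages and ends up ranked highest, while D, which nothing links to, keeps only the baseline share.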

2) Web Page Ranking Using Machine Learning Approach

AUTHORS: Junaid Khan, Arunima Jaiswal.

One of the key components which ensures the acceptance of web search service is the web page ranker - a
component which is said to have been the main contributing factor to the early successes of Google. It is
well established that a machine learning method such as the Graph Neural Network (GNN) is able to learn
and estimate Google's page ranking algorithm. This paper shows that the GNN can successfully learn many
other web page ranking methods e.g. TrustRank, HITS and OPIC. Experimental results show that GNN may
be suitable to learn any arbitrary web page ranking scheme, and hence, may be more flexible than any other
existing web page ranking scheme. The significance of this observation lies in the fact that it is possible to
learn ranking schemes for which no algorithmic solution exists or is known.


3) Review of features and machine learning techniques for web searching.

AUTHORS: Neha Sharma, Narendra Kohli.

As the amount of information is growing rapidly on world wide web, it has become very difficult to get
relevant information using traditional search engines within a stipulated time. The main reasons for
irrelevant search results are the lack of understanding of user's search intention or user's preferences,
keyword based searching, short queries. In this paper, we will study different features that are used in
information retrieval. We will also discuss various machine learning techniques that are helpful in deciding
the relevance of web page to user. We have done classification on the basis of features. In the end we will
compare different techniques and their pros and cons are also discussed.


3. SYSTEM ANALYSIS

3.1 EXISTING SYSTEM

Information retrieval is the task of retrieving the information resources that we are interested in, or
extracting whatever information we need.
• Information Retrieval (IR) may deal with the organization, storage, retrieval and evaluation of information
from documents, particularly textual information.
• But we cannot assign ranks to those documents.

DISADVANTAGES OF EXISTING SYSTEM

• Information retrieval is very difficult when documents contain large amounts of text.
• It is difficult to identify the important concepts or topics in a collection of documents.
• Explicit rankings are always difficult to obtain, or are not available at all, for many documents.

3.2 PROPOSED SYSTEM

• The proposed search engine is very useful for finding more relevant URLs for given keywords.
• Anyone can easily identify the important documents in a collection of documents and retrieve the related
data.
• It proposes a novel model, named LDA (Linear Discriminant Analysis), that makes it easy to cluster the related
documents based on that ranking.

ADVANTAGES OF PROPOSED SYSTEM

• We will build a search engine which gives the web address of the most relevant web page at the top of the
search result, according to user queries
• The main focus of our system is to build a search engine to discover the utmost suitable web address for
the given keyword by using machine learning techniques for increasing accuracy compared to available
search engines.


3.3 SYSTEM SPECIFICATION:

3.3.1 HARDWARE REQUIREMENTS:

• System : Pentium IV 2.4 GHz.
• Hard Disk : 40 GB.
• Monitor : 14" Colour Monitor.
• Mouse : Optical Mouse.
• RAM : 512 MB.

3.3.2 SOFTWARE REQUIREMENTS:

• Operating system : Windows 7 Ultimate.
• Coding Language : Python.
• Front-End : Python.
• Designing : HTML, CSS, JavaScript.
• Database : MySQL.


3.4 SYSTEM STUDY

FEASIBILITY STUDY

The feasibility of the project is analyzed in this phase, and a business proposal is put forth with a
very general plan for the project and some cost estimates. During system analysis, the feasibility study of
the proposed system is carried out. This is to ensure that the proposed system is not a burden to the
company. For feasibility analysis, some understanding of the major requirements for the system is
essential.

Three key considerations involved in the feasibility analysis are:

• ECONOMICAL FEASIBILITY
• TECHNICAL FEASIBILITY
• SOCIAL FEASIBILITY

3.4.1 ECONOMICAL FEASIBILITY

This study is carried out to check the economic impact that the system will have on the
organization. The amount of funds that the company can pour into the research and development of the
system is limited, so the expenditures must be justified. The developed system is well within the budget,
and this was achieved because most of the technologies used are freely available. Only the customized
products had to be purchased.

3.4.2 TECHNICAL FEASIBILITY

This study is carried out to check the technical feasibility, that is, the technical requirements of
the system. Any system developed must not place a high demand on the available technical resources, as this
would lead to high demands being placed on the client. The developed system must have modest
requirements, as only minimal or no changes are required for implementing this system.


3.4.3 SOCIAL FEASIBILITY

This aspect of the study checks the level of acceptance of the system by the user. This includes the
process of training the user to use the system efficiently. The user must not feel threatened by the system,
but must instead accept it as a necessity. The level of acceptance by the users depends solely on the methods
that are employed to educate users about the system and to make them familiar with it. Their level of
confidence must be raised so that they are also able to offer constructive criticism, which is welcomed, as
they are the final users of the system.


4 SOFTWARE ENVIRONMENT

4.1 PYTHON
Python is a general-purpose, interpreted, interactive, object-oriented, high-level programming language.
As an interpreted language, Python has a design philosophy that emphasizes code readability (notably
using whitespace indentation to delimit code blocks rather than curly brackets or keywords), and a syntax
that allows programmers to express concepts in fewer lines of code than might be used in languages such
as C++ or Java. It provides constructs that enable clear programming on both small and large scales. Python
interpreters are available for many operating systems. CPython, the reference implementation of Python,
is open source software and has a community-based development model, as do nearly all of its variant
implementations. CPython is managed by the non-profit Python Software Foundation. Python features
a dynamic type system and automatic memory management. It supports multiple programming paradigms,
including object-oriented, imperative, functional and procedural, and has a large and
comprehensive standard library.

Interactive Mode Programming

Invoking the interpreter without passing a script file as a parameter brings up the following prompt −

$ python
Python 2.4.3 (#1, Nov 11 2010, 13:34:43)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
Type the following text at the Python prompt and press Enter −

>>> print "Hello, Python!"

In Python 3, print is a function and must be called with parentheses, as in print("Hello, Python!").
In Python 2.4.3, the statement form above produces the following result −

Hello, Python!
Script Mode Programming


Invoking the interpreter with a script parameter begins execution of the script and continues until the script
is finished. When the script is finished, the interpreter is no longer active.

Let us write a simple Python program in a script. Python files have extension .py. Type the following source
code in a test.py file −

print "Hello, Python!"
We assume that you have Python interpreter set in PATH variable. Now, try to run this program as follows −

$ python test.py
This produces the following result −

Hello, Python!
Let us try another way to execute a Python script. Here is the modified test.py file −

#!/usr/bin/python

print "Hello, Python!"

We assume that you have Python interpreter available in /usr/bin directory. Now, try to run this program as
follows −

$ chmod +x test.py # This is to make file executable

$ ./test.py
This produces the following result −

Hello, Python!
Python Identifiers
A Python identifier is a name used to identify a variable, function, class, module or other object. An
identifier starts with a letter A to Z or a to z or an underscore (_) followed by zero or more letters,
underscores and digits (0 to 9).

Python does not allow punctuation characters such as @, $, and % within identifiers. Python is a
case-sensitive programming language. Thus, Manpower and manpower are two different identifiers in Python.

Here are naming conventions for Python identifiers −

Class names start with an uppercase letter. All other identifiers start with a lowercase letter.

Starting an identifier with a single leading underscore indicates that the identifier is private.

Starting an identifier with two leading underscores indicates a strongly private identifier.

If the identifier also ends with two trailing underscores, the identifier is a language-defined special name.

Reserved Words
The following list shows the Python keywords. These are reserved words and you cannot use them as
constant or variable or any other identifier names. All the Python keywords contain lowercase letters only.

and       exec      not
assert    finally   or
break     for       pass
class     from      print
continue  global    raise
def       if        return
del       import    try
elif      in        while
else      is        with
except    lambda    yield
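
The list above reflects Python 2, where exec and print are keywords. The set of reserved words differs between versions (in Python 3, for instance, True, False and None are keywords as well), so it can help to ask the interpreter directly. The following is a minimal Python 3 sketch using the standard keyword module; the sample names being checked are arbitrary.

```python
import keyword

# Keywords reserved by the interpreter that runs this script.
print(keyword.kwlist)

# Check individual names; "print" and "exec" are keywords only in Python 2.
for name in ("while", "lambda", "print", "manpower"):
    print(name, keyword.iskeyword(name))
```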

Lines and Indentation

Python provides no braces to indicate blocks of code for class and function definitions or flow control.
Blocks of code are denoted by line indentation, which is rigidly enforced.

The number of spaces in the indentation is variable, but all statements within the block must be indented the
same amount. For example −


if True:
    print "True"
else:
    print "False"
However, the following block generates an error, because the last line is not indented by the same amount as the rest of its block −

if True:
    print "Answer"
    print "True"
else:
    print "Answer"
  print "False"
Thus, in Python all contiguous lines indented with the same number of spaces form a block. The
following example has various statement blocks −

Note − Do not try to understand the logic at this point. Just make sure you understand the various blocks
even though they are without braces.

#!/usr/bin/python

import sys

# Assumed initial values; the original listing leaves these names undefined.
file_name = raw_input("Enter filename: ")
file_finish = "."
file_text = ""

try:
    # open file stream
    file = open(file_name, "w")
except IOError:
    print "There was an error writing to", file_name
    sys.exit()
print "Enter '", file_finish,
print "' When finished"
while file_text != file_finish:
    file_text = raw_input("Enter text: ")
    if file_text == file_finish:
        # close the file
        file.close()
        break
    file.write(file_text)
    file.write("\n")
file.close()
file_name = raw_input("Enter filename: ")
if len(file_name) == 0:
    print "Next time please enter something"
    sys.exit()
try:
    file = open(file_name, "r")
except IOError:
    print "There was an error reading file"
    sys.exit()
file_text = file.read()
file.close()
print file_text
Multi-Line Statements
Statements in Python typically end with a new line. Python does, however, allow the use of the line
continuation character (\) to denote that the line should continue. For example −

total = item_one + \
        item_two + \
        item_three
Statements contained within the [], {}, or () brackets do not need to use the line continuation character. For
example −

days = ['Monday', 'Tuesday', 'Wednesday',
        'Thursday', 'Friday']
Quotation in Python
Python accepts single ('), double (") and triple (''' or """) quotes to denote string literals, as long as the same
type of quote starts and ends the string.

The triple quotes are used to span the string across multiple lines. For example, all the following are legal −


word = 'word'
sentence = "This is a sentence."
paragraph = """This is a paragraph. It is
made up of multiple lines and sentences."""
Comments in Python
A hash sign (#) that is not inside a string literal begins a comment. All characters after the # and up to the
end of the physical line are part of the comment and the Python interpreter ignores them.

#!/usr/bin/python

# First comment
print "Hello, Python!" # second comment
This produces the following result −

Hello, Python!
You can type a comment on the same line after a statement or expression −

name = "Madisetti" # This is again comment

You can comment multiple lines as follows −

# This is a comment.
# This is a comment, too.
# This is a comment, too.
# I said that already.
The following triple-quoted string is not assigned to any name, so the Python interpreter evaluates it and
discards the result; it can therefore be used as a multiline comment:
'''
This is a multiline
comment.
'''
Using Blank Lines
A line containing only whitespace, possibly with a comment, is known as a blank line and Python totally
ignores it.

In an interactive interpreter session, you must enter an empty physical line to terminate a multiline
statement.

Waiting for the User

The following line of the program displays the prompt, the statement saying “Press the enter key to exit”,
and waits for the user to take action −

#!/usr/bin/python

raw_input("\n\nPress the enter key to exit.")

Here, "\n\n" is used to create two new lines before displaying the actual line. Once the user presses the key,
the program ends. This is a nice trick to keep a console window open until the user is done with an
application.
Multiple Statements on a Single Line
The semicolon ( ; ) allows multiple statements on a single line, provided that no statement starts a new
code block. Here is a sample snippet using the semicolon.
import sys; x = 'foo'; sys.stdout.write(x + '\n')
Multiple Statement Groups as Suites
A group of individual statements that make up a single code block is called a suite in Python. Compound or
complex statements, such as if, while, def, and class, require a header line and a suite.
Header lines begin the statement (with the keyword), terminate with a colon ( : ), and are followed by one
or more lines which make up the suite. For example −

if expression :
    suite
elif expression :
    suite
else :
    suite
Command Line Arguments
Many programs can be run to provide you with some basic information about how they should be run.
Python enables you to do this with -h −


$ python -h
usage: python [option] ... [-c cmd | -m mod | file | -] [arg] ...
Options and arguments (and corresponding environment variables):
-c cmd : program passed in as string (terminates option list)
-d : debug output from parser (also PYTHONDEBUG=x)
-E : ignore environment variables (such as PYTHONPATH)
-h : print this help message and exit
You can also program your script in such a way that it should accept various options. Command Line
Arguments is an advanced topic and should be studied a bit later once you have gone through rest of the
Python concepts.
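
As a small preview of how a script can accept its own options, the standard argparse module can be used. This is a minimal Python 3 sketch; the option names (query, --limit, --verbose) are hypothetical and chosen only to suit a search script, they are not part of this project.

```python
import argparse


def parse_args(argv=None):
    """Parse the hypothetical options of a small search script."""
    parser = argparse.ArgumentParser(description="Toy search command")
    parser.add_argument("query", help="keywords to search for")
    parser.add_argument("-n", "--limit", type=int, default=10,
                        help="maximum number of results to show")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print extra details")
    return parser.parse_args(argv)


# Passing a list stands in for the real command line (sys.argv[1:]).
args = parse_args(["machine learning", "--limit", "5"])
print(args.query, args.limit, args.verbose)
```

Saved as a hypothetical search.py, the script could be called as python search.py "machine learning" --limit 5, and argparse generates the -h help text automatically.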
Python Lists
The list is the most versatile datatype available in Python, and can be written as a list of comma-separated
values (items) between square brackets. An important thing about a list is that its items need not be of the
same type.

Creating a list is as simple as putting different comma-separated values between square brackets. For
example −

list1 = ['physics', 'chemistry', 1997, 2000];

list2 = [1, 2, 3, 4, 5 ];
list3 = ["a", "b", "c", "d"]
Similar to string indices, list indices start at 0, and lists can be sliced, concatenated and so on.
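
Indexing, slicing and concatenation can be tried on the lists defined above. This is a short Python 3 sketch; the variable names first, middle and combined are introduced here only for illustration.

```python
list1 = ["physics", "chemistry", 1997, 2000]
list2 = [1, 2, 3, 4, 5]

first = list1[0]          # indexing starts at 0
middle = list2[1:4]       # slice: items at positions 1, 2 and 3
combined = list1 + list2  # concatenation builds a new list

list1[2] = 2001           # lists are mutable, so items can be replaced

print(first)
print(middle)
print(combined)
print(list1)
```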
Python Tuples
A tuple is a sequence of immutable Python objects. Tuples are sequences, just like lists. The differences
between tuples and lists are that tuples cannot be changed, unlike lists, and that tuples use parentheses, whereas
lists use square brackets.

Creating a tuple is as simple as putting different comma-separated values. Optionally you can put these
comma-separated values between parentheses also. For example −

tup1 = ('physics', 'chemistry', 1997, 2000);

tup2 = (1, 2, 3, 4, 5 );
tup3 = "a", "b", "c", "d";
The empty tuple is written as two parentheses containing nothing −


tup1 = ();
To write a tuple containing a single value you have to include a comma, even though there is only one value −

tup1 = (50,);
Like string indices, tuple indices start at 0, and they can be sliced, concatenated, and so on.

Accessing Values in Tuples

To access values in tuple, use the square brackets for slicing along with the index or indices to obtain value
available at that index. For example −

#!/usr/bin/python

tup1 = ('physics', 'chemistry', 1997, 2000);
tup2 = (1, 2, 3, 4, 5, 6, 7 );
print "tup1[0]: ", tup1[0];
print "tup2[1:5]: ", tup2[1:5];
When the above code is executed, it produces the following result −

tup1[0]:  physics
tup2[1:5]:  (2, 3, 4, 5)

Accessing Values in Dictionary

To access dictionary elements, you can use the familiar square brackets along with the key to obtain its
value. Following is a simple example −

#!/usr/bin/python

dict = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}

print "dict['Name']: ", dict['Name']


print "dict['Age']: ", dict['Age']

When the above code is executed, it produces the following result −

dict['Name']: Zara
dict['Age']: 7
If we attempt to access a data item with a key that is not part of the dictionary, we get an error as follows −

#!/usr/bin/python

dict = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}

print "dict['Alice']: ", dict['Alice']
When the above code is executed, it produces the following result −

dict['Alice']:
Traceback (most recent call last):
  File "test.py", line 4, in <module>
    print "dict['Alice']: ", dict['Alice']
KeyError: 'Alice'
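
When a missing key is a normal situation rather than a bug, the KeyError can be avoided or handled. The following Python 3 sketch shows three common options; the dictionary is named student here (an illustrative choice) so that it does not shadow the built-in dict type.

```python
student = {"Name": "Zara", "Age": 7, "Class": "First"}

# Test for the key before indexing.
if "Alice" in student:
    print(student["Alice"])
else:
    print("no entry for Alice")

# dict.get returns None, or a supplied default, instead of raising KeyError.
missing = student.get("Alice")
fallback = student.get("Alice", "unknown")
print(missing, fallback)

# Catching the exception is the third common option.
try:
    student["Alice"]
except KeyError as error:
    print("KeyError for key:", error)
```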
Updating Dictionary
You can update a dictionary by adding a new entry or a key-value pair, modifying an existing entry, or
deleting an existing entry as shown below in the simple example −

#!/usr/bin/python

dict = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}

dict['Age'] = 8; # update existing entry
dict['School'] = "DPS School"; # Add new entry

print "dict['Age']: ", dict['Age']

print "dict['School']: ", dict['School']
When the above code is executed, it produces the following result −


dict['Age']: 8
dict['School']: DPS School
Delete Dictionary Elements
You can either remove individual dictionary elements or clear the entire contents of a dictionary. You can
also delete entire dictionary in a single operation.

To explicitly remove an entire dictionary, just use the del statement. Following is a simple example −

#!/usr/bin/python

dict = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}

del dict['Name']; # remove entry with key 'Name'
dict.clear(); # remove all entries in dict
del dict ; # delete entire dictionary

print "dict['Age']: ", dict['Age']

print "dict['School']: ", dict['School']
This produces the following result. Note that an exception is raised because after del dict the dictionary no
longer exists, so the name dict refers to the built-in type again −

dict['Age']:
Traceback (most recent call last):
  File "test.py", line 8, in <module>
    print "dict['Age']: ", dict['Age'];
TypeError: 'type' object is unsubscriptable
Note − The del statement is discussed in a subsequent section.

Properties of Dictionary Keys

Dictionary values have no restrictions. They can be any arbitrary Python object, either standard objects or
user-defined objects. However, same is not true for the keys.

There are two important points to remember about dictionary keys −

(a) More than one entry per key is not allowed, which means no duplicate key is allowed. When duplicate
keys are encountered during assignment, the last assignment wins. For example −

#!/usr/bin/python

dict = {'Name': 'Zara', 'Age': 7, 'Name': 'Manni'}

print "dict['Name']: ", dict['Name']
When the above code is executed, it produces the following result −

dict['Name']: Manni
(b) Keys must be immutable, which means you can use strings, numbers or tuples as dictionary keys, but
something like ['key'] is not allowed. Following is a simple example −

#!/usr/bin/python

dict = {['Name']: 'Zara', 'Age': 7}

print "dict['Name']: ", dict['Name']
When the above code is executed, it produces the following result −

Traceback (most recent call last):
  File "test.py", line 3, in <module>
    dict = {['Name']: 'Zara', 'Age': 7};
TypeError: unhashable type: 'list'
Updating Tuples
Tuples are immutable, which means you cannot update or change the values of tuple elements. You can, however,
take portions of existing tuples to create new tuples, as the following example demonstrates −

#!/usr/bin/python

tup1 = (12, 34.56);

tup2 = ('abc', 'xyz');


# Following action is not valid for tuples

# tup1[0] = 100;

# So let's create a new tuple as follows

tup3 = tup1 + tup2;
print tup3;
When the above code is executed, it produces the following result −

(12, 34.56, 'abc', 'xyz')

Delete Tuple Elements
Removing individual tuple elements is not possible. There is, of course, nothing wrong with putting together
another tuple with the undesired elements discarded.
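
Building such a replacement tuple can be sketched as follows. This is a minimal Python 3 example; the names without_index and without_value are introduced here only for illustration.

```python
tup = ("physics", "chemistry", 1997, 2000)

# Drop the item at index 1 by joining the slices on either side of it.
without_index = tup[:1] + tup[2:]

# Drop every item equal to a given value with a generator expression.
without_value = tuple(item for item in tup if item != 1997)

print(without_index)
print(without_value)
print(tup)  # the original tuple is unchanged
```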

To explicitly remove an entire tuple, just use the del statement. For example −

#!/usr/bin/python

tup = ('physics', 'chemistry', 1997, 2000);

print tup;
del tup;
print "After deleting tup : ";
print tup;
This produces the following result. Note that an exception is raised, because after del tup the tuple no longer
exists −

('physics', 'chemistry', 1997, 2000)
After deleting tup :
Traceback (most recent call last):
  File "test.py", line 9, in <module>
    print tup;
NameError: name 'tup' is not defined


4.2 DJANGO

Django is a high-level Python Web framework that encourages rapid development and clean,
pragmatic design. Built by experienced developers, it takes care of much of the hassle of Web development,
so you can focus on writing your app without needing to reinvent the wheel. It’s free and open source.
Django's primary goal is to ease the creation of complex, database-driven websites. Django
emphasizes reusability and "pluggability" of components, rapid development, and the principle of don't
repeat yourself. Python is used throughout, even for settings files and data models.

Django also provides an optional administrative create, read, update and delete (CRUD) interface that is
generated dynamically through introspection and configured via admin models.

Create a Project
Whether you are on Windows or Linux, open a terminal or a command prompt, navigate to the place where you
want your project to be created, then run this command −

$ django-admin startproject myproject

This will create a "myproject" folder with the following structure −

myproject/
   manage.py
   myproject/
      __init__.py
      settings.py
      urls.py
      wsgi.py
The Project Structure
The “myproject” folder is just your project container; it contains two elements −

manage.py − This file is a project-local django-admin for interacting with your project from the
command line (start the development server, sync the database...). To get a full list of the commands
accessible via manage.py, run −

$ python manage.py help

The “myproject” subfolder − This folder is the actual python package of your project. It contains four files −

__init__.py − Just for Python; it tells Python to treat this folder as a package.

settings.py − As the name indicates, your project settings.

urls.py − All links of your project and the function to call. A kind of ToC of your project.

wsgi.py − If you need to deploy your project over WSGI.

Setting Up Your Project

Your project is configured in the file myproject/settings.py. Following are some important options you
might need to set −

DEBUG = True
This option sets whether your project is in debug mode. Debug mode gives you more information
about your project's errors. Never set it to ‘True’ for a live project. However, it has to be set to ‘True’ if
you want the Django development server to serve static files, so do that only in development.

DATABASES = {
   'default': {
      'ENGINE': 'django.db.backends.sqlite3',
      'NAME': 'database.sql',
      'USER': '',
      'PASSWORD': '',
      'HOST': '',
      'PORT': '',
   }
}
The database is configured in the ‘DATABASES’ dictionary. The example above is for the SQLite engine. As
stated earlier, Django also supports −

MySQL (django.db.backends.mysql)
PostgreSQL (django.db.backends.postgresql_psycopg2; named django.db.backends.postgresql in newer Django releases)

Oracle (django.db.backends.oracle)

and, through a third-party backend, the NoSQL database MongoDB (django_mongodb_engine).
Before setting any new engine, make sure you have the correct database driver installed.
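For comparison with the SQLite example, a settings fragment for a server-based engine might look like the sketch below. Every value (database name, account, password, host) is a placeholder we made up, and the engine string assumes a recent Django release.

```python
# Hypothetical settings.py fragment: every value below is a placeholder.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',  # or 'django.db.backends.mysql'
        'NAME': 'searchengine_db',     # assumed database name
        'USER': 'db_user',             # assumed database account
        'PASSWORD': 'change-me',       # load from the environment in practice
        'HOST': '127.0.0.1',
        'PORT': '5432',                # default PostgreSQL port
    }
}
```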

You can also set other options such as TIME_ZONE, LANGUAGE_CODE, TEMPLATE…

Now that your project is created and configured, make sure it's working −

$ python manage.py runserver

You will get something like the following on running the above code −

Validating models...

0 errors found
September 03, 2015 - 11:41:50
Django version 1.6.11, using settings 'myproject.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.

A project is a sum of many applications. Every application has an objective and can be reused in another
project; for example, the contact form on a website can be an application and can be reused on other sites.
See it as a module of your project.

Create an Application
We assume you are in your project folder, that is, the main “myproject” folder that contains manage.py −

$ python manage.py startapp myapp

You just created the myapp application and, as with the project, Django creates a “myapp” folder with the
application structure −

myapp/
   __init__.py
   admin.py
   models.py
   tests.py
   views.py
__init__.py − Just to make sure python handles this folder as a package.

admin.py − This file helps you make the app modifiable in the admin interface.

models.py − This is where all the application models are stored.

tests.py − This is where your unit tests are.

views.py − This is where your application views are.

Get the Project to Know About Your Application

At this stage we have our "myapp" application; now we need to register it with our Django project
"myproject". To do so, update the INSTALLED_APPS tuple in the settings.py file of your project (add your
app name) −

INSTALLED_APPS = (
   'django.contrib.admin',
   'django.contrib.auth',
   'django.contrib.contenttypes',
   'django.contrib.sessions',
   'django.contrib.messages',
   'django.contrib.staticfiles',
   'myapp',
)
Creating forms in Django is very similar to creating a model. Here again, we just need to inherit from a
Django class, and the class attributes will be the form fields. Let's add a forms.py file in the myapp folder
to contain our app forms. We will create a login form.

myapp/forms.py

#-*- coding: utf-8 -*-

from django import forms

class LoginForm(forms.Form):
    # The field is named "username" because the login view reads cleaned_data['username'].
    username = forms.CharField(max_length = 100)
    password = forms.CharField(widget = forms.PasswordInput())
As seen above, a field can take a "widget" argument for HTML rendering; in our case, we want the
password to be hidden, not displayed. Many other widgets are available in Django: DateInput for dates,
CheckboxInput for checkboxes, etc.

Using Form in a View

The two HTTP request methods used most often with forms are GET and POST. In Django, the request object
passed as a parameter to your view has an attribute called "method" that holds the type of the request, and
all data passed via POST can be accessed via the request.POST dictionary.

Let's create a login view in our myapp/views.py −

#-*- coding: utf-8 -*-

from django.shortcuts import render

from myapp.forms import LoginForm

def login(request):
    username = "not logged in"

    if request.method == "POST":
        # Get the posted form
        MyLoginForm = LoginForm(request.POST)

        if MyLoginForm.is_valid():
            username = MyLoginForm.cleaned_data['username']
    else:
        # Any other request gets an empty, unbound form
        MyLoginForm = LoginForm()

    return render(request, 'loggedin.html', {"username" : username})

The view will display the result of the login form posted through the loggedin.html. To test it, we will first
need the login form template. Let's call it login.html.

<html>
<body>

<form name = "form" action = "{% url "myapp.views.login" %}"
   method = "POST" >{% csrf_token %}

   <div style = "max-width:470px;">
      <center>
         <input type = "text" placeholder = "username" name = "username" />
      </center>
   </div>

   <br>

   <div style = "max-width:470px;">
      <center>
         <input type = "password" placeholder = "password" name = "password" />
      </center>
   </div>

   <br>

   <div style = "max-width:470px;">
      <center>
         <button style = "border:0px; background-color:#4285F4; margin-top:8%;
            height:35px; width:80%;margin-left:19%;" type = "submit"
            value = "Login" >
            <strong>Login</strong>
         </button>
      </center>
   </div>

</form>

</body>
</html>
The template will display a login form and post the result to our login view above. You have probably
noticed the following tag in the template, which protects your site against Cross-Site Request Forgery
(CSRF) attacks −

{% csrf_token %}
Once we have the login template, we need the loggedin.html template that will be rendered after the form
has been processed.

<html>

<body>
You are : <strong>{{username}}</strong>
</body>

</html>
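Now, we just need our pair of URLs to get started. In myapp/urls.py (written with the patterns() syntax of
the Django 1.6 release used here; patterns() was removed in Django 1.10) −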

from django.conf.urls import patterns, url

from django.views.generic import TemplateView

urlpatterns = patterns('myapp.views',
   url(r'^connection/', TemplateView.as_view(template_name = 'login.html')),
   url(r'^login/', 'login', name = 'login'))
When accessing "/myapp/connection", we will get the login.html template rendered.

Setting Up Sessions
In Django, sessions are enabled in your project's settings.py by adding some lines to the
MIDDLEWARE_CLASSES and INSTALLED_APPS options. This is normally done when the project is created,
but it's always good to know, so MIDDLEWARE_CLASSES should have −

'django.contrib.sessions.middleware.SessionMiddleware'
And INSTALLED_APPS should have −

'django.contrib.sessions'
By default, Django saves session information in the database (django_session table or collection), but you
can configure the engine to store the information in other ways, for example in a file or in the cache.

When session is enabled, every request (first argument of any view in Django) has a session (dict) attribute.
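To make the idea of server-side sessions concrete, here is a framework-free sketch. It is not how Django implements sessions; SESSION_STORE, create_session and load_session are hypothetical names, and an in-memory dictionary stands in for the django_session table.

```python
import secrets

# In-memory stand-in for the server-side session table (hypothetical).
SESSION_STORE = {}

def create_session(data):
    """Store data on the server and return the opaque id sent to the browser."""
    session_id = secrets.token_hex(16)
    SESSION_STORE[session_id] = dict(data)
    return session_id

def load_session(session_id):
    """Look up the data for a session id; unknown ids give an empty session."""
    return SESSION_STORE.get(session_id, {})

sid = create_session({'username': 'zara'})
print(load_session(sid).get('username', 'not logged in'))
print(load_session('forged-id').get('username', 'not logged in'))
```

Only the random id travels to the browser; the username itself never leaves the server, which is the property the login example relies on.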

Let's create a simple example to see how to create and save sessions. We built a simple login system
above. Let us save the username in the session so that, unless the user has signed out, the login form is
not shown when the login page is accessed. In other words, we make the login system more secure by keeping
this data on the server side instead of in a cookie on the client.

For this, first let's change our login view to save the username on the server side −

def login(request):
    username = 'not logged in'

    if request.method == 'POST':
        MyLoginForm = LoginForm(request.POST)

        if MyLoginForm.is_valid():
            username = MyLoginForm.cleaned_data['username']
            # Keep the username in the server-side session
            request.session['username'] = username
    else:
        MyLoginForm = LoginForm()

    return render(request, 'loggedin.html', {"username" : username})

Then let us create a formView view for the login form, which does not display the form if the session
already holds a username −

def formView(request):
    # Skip the login form when the session already holds a username
    if 'username' in request.session:
        username = request.session['username']
        return render(request, 'loggedin.html', {"username" : username})
    else:
        return render(request, 'login.html', {})
Now let us change the urls.py file so that the URL pairs with our new view −

from django.conf.urls import patterns, url

from django.views.generic import TemplateView

urlpatterns = patterns('myapp.views',
   url(r'^connection/', 'formView', name = 'loginform'),
   url(r'^login/', 'login', name = 'login'))
When accessing /myapp/connection, you will get to see the following page

4.3 MACHINE LEARNING

Machine Learning is a branch of the broader field of artificial intelligence that makes
use of statistical models to develop predictions. It is often described as a form of
predictive modelling or predictive analytics and has traditionally been defined as the
ability of a computer to learn without being explicitly programmed to do so.

In basic technical terms, machine learning uses algorithms that take empirical or
historical data in, analyze it, and generate outputs based on that analysis. In some
approaches, the algorithms work with so-called “training data” first and then they
learn, predict, and find ways to improve their performance over time.

4.3.1 Types of Machine Learning

There are three main approaches to machine learning: supervised, unsupervised, and
reinforcement learning. There are also hybrid approaches including semi-supervised
learning, which can be tailored to the problem a researcher is seeking to solve. Each
approach has specific strengths and weaknesses, and some techniques are better
suited to particular types of problems than others.

• In supervised learning, the computer is trained on a set of data inputs and outputs, with the goal of
learning a general rule that maps the given inputs to the given outputs. Two main types of supervised
learning are:
1) classification, which entails the prediction of a class label, and
2) regression, which entails the prediction of a numerical value.

• In unsupervised learning, the learning algorithm is not given this type of guidance; instead, it works
to discover the pattern or structure in the input on its own. Two main types of unsupervised learning
are:
1) clustering, which involves discovering groups within the dataset that share similar characteristics, and
2) density estimation, which involves evaluating the statistical distribution of the data set.
Unsupervised learning methods also include visualization of the data and projection, which
reduces the dimensions of the data, a form of simplification.

• In reinforcement learning, the algorithm confronts a problem in a dynamic environment and, as it
works towards a given goal, it receives feedback (rewards) that reinforces its learning and
goal-seeking effort. AlphaGo is an example of reinforcement learning; reinforcement learning
algorithms include Q-learning, temporal-difference learning, and deep reinforcement learning.
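The two supervised tasks named above can be sketched in a few lines of plain Python. This is only an illustration on made-up numbers, not part of the project: fit_line is an ordinary least-squares fit (regression) and nearest_neighbour is a one-nearest-neighbour classifier (classification); both function names are ours.

```python
def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def nearest_neighbour(train, point):
    """Classification: return the label of the closest training example."""
    closest = min(train, key=lambda item: abs(item[0] - point))
    return closest[1]

# Regression predicts a number: these points lie exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)

# Classification predicts a class label from labelled examples.
labelled = [(1.0, 'short'), (1.5, 'short'), (8.0, 'long'), (9.5, 'long')]
print(nearest_neighbour(labelled, 7.2))
```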

4.3.2 Application Examples of Machine Learning

In the financial markets, machine learning is used for automation, portfolio optimization, risk management,
and to provide financial advisory services to investors (robo-advisors).

For automation in the form of algorithmic trading, human traders build mathematical models that
analyze financial news and trading activities to discern market trends, including volume, volatility, and
possible anomalies. These models execute trades based on a given set of instructions, enabling activity
without direct human involvement once the system is set up and running.

For portfolio optimization, machine learning techniques can help in evaluating large amounts of data,
determining patterns, and finding solutions for given problems with regard to balancing risk and reward. ML
can also help in detecting investment signals and in time-series forecasting.

For risk management, machine learning can assist with credit decisions and also with detecting suspicious
transactions or behavior, including KYC compliance efforts and prevention of fraud.

For financial advisory services, machine learning has supported the shift towards robo-advisors for some
types of retail investors, assisting them with their investment and savings goals.

4.3.3 Advantages and Disadvantages of Machine Learning

Advantages of Machine learning

1. Easily identifies trends and patterns

Machine Learning can review large volumes of data and discover specific trends and patterns that would not
be apparent to humans. For instance, an e-commerce website like Amazon uses it to understand the
browsing behaviors and purchase histories of its users so that it can offer them the right products, deals,
and reminders. It uses the results to show them relevant advertisements.

2. No human intervention needed (automation)

With ML, you don’t need to babysit your project every step of the way. Since it means giving machines the
ability to learn, it lets them make predictions and also improve the algorithms on their own. A common
example of this is antivirus software, which learns to filter new threats as they are recognized. ML is
also good at recognizing spam.

3. Continuous Improvement

As ML algorithms gain experience, they keep improving in accuracy and efficiency. This lets them make
better decisions. Say you need to make a weather forecast model. As the amount of data you have keeps
growing, your algorithms learn to make more accurate predictions faster.

4. Handling multi-dimensional and multi-variety data

Machine Learning algorithms are good at handling data that are multi-dimensional and multi-variety, and
they can do this in dynamic or uncertain environments.

5. Wide Applications

You could be an e-tailer or a healthcare provider and make ML work for you. Where it does apply, it holds
the capability to help deliver a much more personal experience to customers while also targeting the right
customers.

Disadvantages of Machine Learning

Despite all these advantages and its popularity, Machine Learning isn’t perfect. The following
factors limit it:

1. Data Acquisition

Machine Learning requires massive data sets to train on, and these should be inclusive, unbiased, and of
good quality. There can also be times when practitioners must wait for new data to be generated.

2. Time and Resources

ML needs enough time to let the algorithms learn and develop enough to fulfill their purpose with a
considerable amount of accuracy and relevancy. It also needs massive resources to function, which can
mean additional computing power.

3. Interpretation of Results

Another major challenge is the ability to accurately interpret results generated by the algorithms. You must
also carefully choose the algorithms for your purpose.

4. High error-susceptibility

Machine Learning is autonomous but highly susceptible to errors. Suppose you train an algorithm with data
sets too small to be inclusive. You end up with biased predictions coming from a biased training set.
In the e-commerce example, this leads to irrelevant advertisements being displayed to customers.

4.3.4 History of Machine Learning

Machine Learning (ML) has a rich and fascinating history, spanning several decades of research and
development. Here's an overview of the key milestones and advancements in the history of ML:

1950s and 1960s:

The field of artificial intelligence (AI) emerged, and researchers began exploring the concept of machines
that could learn and mimic human intelligence.
Early work on ML focused on developing algorithms and models for pattern recognition, such as the
Perceptron algorithm by Frank Rosenblatt.

1970s and 1980s:

Researchers started to develop more sophisticated ML techniques, including decision trees, neural networks,
and rule-based systems.
The field faced challenges and criticisms due to limited computing power and data availability, leading to
the "AI winter" where funding and interest in AI and ML declined.
1990s:
The emergence of the World Wide Web and the availability of vast amounts of digital data created new
opportunities for ML research.
Support Vector Machines (SVMs) gained popularity as a powerful algorithm for classification and
regression tasks.
Reinforcement Learning, a branch of ML focused on training agents to interact with environments, gained
attention.
2000s:
The growth of e-commerce, social media, and online services led to an explosion of data, further driving the
development of ML.
Deep Learning gained prominence with the introduction of deep neural networks capable of learning
hierarchical representations.
The availability of large-scale labeled datasets, such as ImageNet, enabled significant advancements in
computer vision tasks.
2010s:
Deep Learning revolutionized several domains, including computer vision, natural language processing
(NLP), and speech recognition.
Convolutional Neural Networks (CNNs) achieved breakthrough performance in image classification tasks.
Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks were successfully
applied to sequence data and language tasks.
Generative Adversarial Networks (GANs), introduced in 2014, emerged as a powerful technique for
generating realistic synthetic data.
ML frameworks and libraries, such as TensorFlow (2015) and PyTorch (2016), provided tools for easier
model development and deployment.
Transfer Learning, which leverages pre-trained models for different tasks, gained popularity.

2020s:
ML applications continued to expand into various domains, including healthcare, finance, autonomous
vehicles, and robotics.
Explainable AI and fairness in ML became crucial topics to address biases and improve transparency.
Ethical considerations, privacy concerns, and regulation discussions surrounding ML gained traction.
It's important to note that this overview provides a high-level summary of ML's history, and there are many
more specific developments and breakthroughs that have occurred along the way. The field continues to
evolve rapidly, with ongoing research and advancements shaping the future of ML.

5. SYSTEM DESIGN

5.1 SYSTEM ARCHITECTURE:

Fig. 5.1. Block Diagram of Search Engine

1) Web crawler: A web crawler collects data about websites and the links related to them. We use the
web crawler only to collect data and information from the WWW and store it in our database.

2) Indexer: The indexer arranges each term on each web page and stores the resulting list of terms in a
large repository.

3) Query engine: It is mainly used to respond to the user’s keyword and show the most relevant results
for it. Within the query engine, the page ranking step ranks the URLs using different algorithms.
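The indexer described in item 2 can be pictured as an inverted index that maps each term to the pages containing it. The sketch below is a simplified illustration under our own assumptions (whitespace tokenization, an in-memory dictionary instead of the project database); build_index and the example URLs are hypothetical.

```python
from collections import defaultdict

def build_index(pages):
    """Map each term to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    return index

# Hypothetical crawled pages (URL -> extracted text).
pages = {
    'http://example.com/a': 'machine learning for search',
    'http://example.com/b': 'search engine ranking',
}
index = build_index(pages)
print(sorted(index['search']))
print(sorted(index['ranking']))
```

With such a structure the query engine only has to look up the query terms instead of scanning every stored page.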

5.2 METHODOLOGY

1) Collect data from WWW using web crawler.
2) Perform data cleaning using NLP.
3) Study and compare the existing page ranking algorithm.
4) Merge the selected page rank algorithm with current technologies in machine learning.
5) Implement query engine to display the efficient results for user query.

A. Collect data from WWW using web crawler

In this step, we use a keyword-based web crawler to collect data and information from the internet. It
starts from a seed URL. After visiting the web page of the seed URL, it extracts all the hyperlinks present
on that page, stores the extracted hyperlinks in a queue, and extracts the data from all the web pages.
Finally, it keeps only the URLs that are relevant to the given keywords.
Algorithm steps:

Step 1: Start with the seed URL.
Step 2: Initialize the queue (q) with the seed URL.
Step 3: Dequeue a URL from the queue (q).
Step 4: Download the web page associated with this URL.
Step 5: Extract all URLs from the downloaded web page.
Step 6: Insert the extracted URLs into the queue (q).
Step 7: Go to Step 3 until enough relevant results are obtained.

Fig. 5.2. Flowchart for keyword focused web crawler
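The algorithm steps can be sketched as a breadth-first crawl. This is a minimal illustration rather than the project's crawler: the fetch function is injected so the example runs against a made-up in-memory "web" (FAKE_WEB), the keyword test is a plain substring match on the raw page, and max_pages is an assumed stopping rule.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.links.append(value)

def crawl(seed_url, fetch, keyword, max_pages=50):
    """Breadth-first crawl from seed_url; return URLs whose page mentions keyword."""
    queue = deque([seed_url])          # Steps 1-2: queue holding the seed URL
    seen = {seed_url}
    relevant = []
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()          # Step 3: dequeue a URL
        html = fetch(url)              # Step 4: download the page
        visited += 1
        if html is None:
            continue
        if keyword.lower() in html.lower():
            relevant.append(url)       # keep pages relevant to the keyword
        parser = LinkExtractor()
        parser.feed(html)              # Step 5: extract all URLs
        for link in parser.links:
            absolute = urljoin(url, link)
            if absolute not in seen:   # Step 6: enqueue unseen URLs
                seen.add(absolute)
                queue.append(absolute)
    return relevant                    # Step 7 is the loop condition above

# Hypothetical in-memory "web" so the sketch runs without network access.
FAKE_WEB = {
    'http://site.test/': '<a href="/ml">ML</a> <a href="/cats">Cats</a>',
    'http://site.test/ml': 'machine learning <a href="/">home</a>',
    'http://site.test/cats': 'pictures of cats',
}
print(crawl('http://site.test/', FAKE_WEB.get, 'machine learning'))
```

The seen set prevents a page from being queued twice, which matters because pages commonly link back to each other.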

B. Perform data cleaning using NLP

In this step, the data collected from the WWW by the web crawler is cleaned: NLP preprocessing steps are
applied so that unnecessary data is removed.

Fig. 5.3. NLP steps for data cleaning for Building Search Engine Using Machine Learning Technique
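The individual steps of the figure are not listed in the text, so the sketch below only assumes a typical cleaning pipeline: strip HTML tags, lowercase, tokenize, and drop stop words. The stop-word list and the function name clean_text are ours.

```python
import re

# A tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {'a', 'an', 'and', 'the', 'is', 'of', 'to', 'in', 'for'}

def clean_text(html):
    """Strip markup, lowercase, tokenize and drop stop words."""
    text = re.sub(r'<[^>]+>', ' ', html)        # remove HTML tags
    text = text.lower()
    tokens = re.findall(r'[a-z0-9]+', text)     # keep alphanumeric tokens only
    return [tok for tok in tokens if tok not in STOP_WORDS]

print(clean_text('<h1>The Basics of Machine Learning</h1> <p>An intro to ML.</p>'))
```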

C. Study and compare the existing page ranking algorithm

PageRank calculates the page score at the time the pages are indexed. In Weighted PageRank, the weight of
a web page is calculated from the inbound and outbound links of important web pages. HITS calculates a hub
score and an authority score for each web page. The three algorithms compare as follows:

Criterion             PageRank        Weighted PageRank            HITS
Input parameter       Incoming links  Incoming and outgoing links  Content, incoming and outgoing links
Algorithm complexity  O(log N)        < O(log N)                   < O(log N)
Quality of results    Good            More than PageRank           Less than PageRank
Efficiency            Medium          High                         Low

Among all, the Weighted PageRank algorithm is best suited for the system because it gives better accuracy
and efficiency than the other algorithms.
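The report does not state the Weighted PageRank formula, so the sketch below assumes the commonly cited formulation by Xing and Ghorbani: PR(u) = (1 - d) + d * sum over pages v linking to u of PR(v) * W_in(v, u) * W_out(v, u), where W_in and W_out are u's share of the in-links and out-links among all pages that v links to. The damping factor 0.85, the iteration count and the three-page graph are illustrative assumptions.

```python
def weighted_pagerank(graph, damping=0.85, iterations=100):
    """Weighted PageRank on graph: dict mapping page -> list of pages it links to."""
    pages = set(graph) | {t for targets in graph.values() for t in targets}
    out_links = {p: list(dict.fromkeys(graph.get(p, []))) for p in pages}
    in_links = {p: [] for p in pages}
    for src, targets in out_links.items():
        for dst in targets:
            in_links[dst].append(src)

    def weight(src, dst):
        """W_in(src, dst) * W_out(src, dst) over the pages that src links to."""
        refs = out_links[src]
        total_in = sum(len(in_links[p]) for p in refs)
        total_out = sum(len(out_links[p]) for p in refs)
        w_in = len(in_links[dst]) / total_in if total_in else 0.0
        w_out = len(out_links[dst]) / total_out if total_out else 0.0
        return w_in * w_out

    ranks = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Each page collects rank from the pages linking to it, scaled by the link weight.
        ranks = {
            p: (1 - damping) + damping * sum(
                ranks[src] * weight(src, p) for src in in_links[p])
            for p in pages
        }
    return ranks

# Hypothetical link graph: A links to B and C, B links to C, C links to A.
graph = {'A': ['B', 'C'], 'B': ['C'], 'C': ['A']}
ranks = weighted_pagerank(graph)
print(sorted(ranks, key=ranks.get, reverse=True))
```

Unlike plain PageRank, a page's rank is not split evenly among its out-links; more popular targets receive a larger share.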

D. Merge the selected page rank algorithm with current technologies in machine learning

In this step, the topmost output of the PageRank algorithm is taken as input for the machine learning
algorithm. The output of the machine learning algorithm is given to the user as the web addresses of
relevant web pages based on the user's query. To implement the machine learning algorithm that finds the
most relevant web pages for a user query, we divide the web features into three parts:
1) Page content
2) Page content of neighbors
3) Link analysis
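The three feature groups can be illustrated with one toy score per group. The project's actual feature scores are not reproduced here; term_overlap, page_features and the three-page example are hypothetical, and a term-overlap fraction and an in-link count merely stand in for the real content and link-analysis features.

```python
def term_overlap(query_terms, text):
    """Fraction of query terms that occur in text."""
    if not query_terms:
        return 0.0
    words = set(text.lower().split())
    return sum(term in words for term in query_terms) / len(query_terms)

def page_features(url, query, pages, links):
    """Return [content score, neighbour content score, in-link count] for url."""
    terms = query.lower().split()
    neighbours = links.get(url, [])
    content = term_overlap(terms, pages[url])                 # 1) page content
    if neighbours:                                            # 2) content of neighbours
        neighbour_content = sum(
            term_overlap(terms, pages[n]) for n in neighbours) / len(neighbours)
    else:
        neighbour_content = 0.0
    in_links = sum(url in targets for targets in links.values())  # 3) link analysis
    return [content, neighbour_content, float(in_links)]

pages = {'a': 'machine learning search', 'b': 'learning to rank', 'c': 'cooking recipes'}
links = {'a': ['b', 'c'], 'b': ['a'], 'c': []}
print(page_features('a', 'machine learning', pages, links))
```

A vector like this, one per candidate page, is the kind of input a relevance classifier would be trained on.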
E. Implement query engine to display the efficient results for user query

Finally, we implement the query engine, which takes the user's input in the form of a query and displays
the most relevant results for it. It displays the web addresses of relevant pages based on the output of
the machine learning algorithm.
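A query engine of this kind can be sketched as "look up candidates, score them, return the best URLs". In the sketch below, answer_query is a hypothetical helper, the index is a made-up term-to-URL mapping, and score_page is only a stand-in for the relevance score produced by the trained model.

```python
def answer_query(query, index, score_page, top_k=3):
    """Return up to top_k URLs for query, best score first."""
    terms = query.lower().split()
    candidates = set()
    for term in terms:
        candidates |= index.get(term, set())
    # Sort by descending score; ties are broken alphabetically for stable output.
    ranked = sorted(candidates, key=lambda url: (-score_page(url, terms), url))
    return ranked[:top_k]

# Hypothetical inverted index and a stand-in for the trained model's score.
index = {
    'machine': {'http://example.com/ml'},
    'learning': {'http://example.com/ml', 'http://example.com/study'},
}

def score_page(url, terms):
    """Toy score: number of query terms indexed for this URL."""
    return sum(url in index.get(term, set()) for term in terms)

print(answer_query('machine learning', index, score_page))
```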

5.3 UML DIAGRAMS

UML stands for Unified Modeling Language. UML is a standardized general-purpose modeling language in
the field of object-oriented software engineering. The standard was created by, and is managed by, the
Object Management Group. The goal is for UML to become a common language for creating models of
object-oriented computer software. In its current form UML comprises two major components: a meta-model
and a notation. In the future, some form of method or process may also be added to, or associated with,
UML.

The Unified Modeling Language is a standard language for specifying, visualizing, constructing
and documenting the artifacts of a software system, as well as for business modeling and other non-software
systems. The UML represents a collection of best engineering practices that have proven successful in the
modeling of large and complex systems. The UML is a very important part of developing object-oriented
software and the software development process. The UML uses mostly graphical notations to express the
design of software projects.

GOALS:
The Primary goals in the design of the UML are as follows:
1. Provide users a ready-to-use, expressive visual modeling Language so that they can develop and
exchange meaningful models.
2. Provide extendibility and specialization mechanisms to extend the core concepts.
3. Be independent of particular programming languages and development process.
4. Provide a formal basis for understanding the modeling language.
5. Encourage the growth of OO tools market.
6. Support higher level development concepts such as collaborations, frameworks, patterns and
components.
7. Integrate best practices.

5.3.1 USE CASE DIAGRAM:

A use case diagram in the Unified Modeling Language (UML) is a type of behavioral diagram defined by
and created from a Use-case analysis. Its purpose is to present a graphical overview of the functionality
provided by a system in terms of actors, their goals (represented as use cases), and any dependencies
between those use cases. The main purpose of a use case diagram is to show what system functions are
performed for which actor. Roles of the actors in the system can be depicted.

Fig 5.3.1 Use Case Diagram For Building Search Engine Using Machine Learning Technique

5.3.2 CLASS DIAGRAM:

In software engineering, a class diagram in the Unified Modeling Language (UML) is a type of static
structure diagram that describes the structure of a system by showing the system's classes, their attributes,
operations (or methods), and the relationships among the classes. It shows which class contains which
information.

Fig 5.3.2 Class Diagram For Building Search Engine Using Machine Learning Technique

5.3.3 SEQUENCE DIAGRAM:

A sequence diagram in Unified Modeling Language (UML) is a kind of interaction diagram that shows how
processes operate with one another and in what order. It is a construct of a Message Sequence Chart.
Sequence diagrams are sometimes called event diagrams, event scenarios, and timing diagrams.

Fig 5.3.3 Sequence Diagram For Building Search Engine Using Machine Learning Technique

5.3.4 ACTIVITY DIAGRAM:

Activity diagrams are graphical representations of workflows of stepwise activities and actions with support
for choice, iteration and concurrency. In the Unified Modeling Language, activity diagrams can be used to
describe the business and operational step-by-step workflows of components in a system. An activity
diagram shows the overall flow of control.

Fig 5.3.4 Activity Diagram For Building Search Engine Using Machine Learning Technique

5.4 IMPLEMENTATION
We have used three algorithms in our project. They are:
1. Support Vector Machine
2. Artificial Neural Network
3. XGBoost

SUPPORT VECTOR MACHINE

SVM is known for its strong classification performance, so an SVM was also used to allow a better approach.
It uses the same set of feature scores to perform classification. The dataset is not linearly separable, so
we use a nonlinear SVM; RBF, polynomial and sigmoid are types of nonlinear kernels. The 14 features above
are given as input to the SVM model and, based on these features, the SVM tries to predict whether each
web page in the testing set is relevant to the given query. The results were stored and used for
performance evaluation.
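For reference, the three nonlinear kernels named above can be written out directly. The sketch uses their standard textbook definitions; the default values of gamma, coef0 and degree are arbitrary choices for the example, not the settings used in the project.

```python
import math

def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def poly_kernel(x, y, gamma=1.0, coef0=1.0, degree=3):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    return (gamma * dot(x, y) + coef0) ** degree

def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    """Sigmoid kernel: tanh(gamma * <x, y> + coef0)."""
    return math.tanh(gamma * dot(x, y) + coef0)

u, v = [1.0, 0.0], [0.0, 1.0]
print(rbf_kernel(u, u), rbf_kernel(u, v))
print(poly_kernel(u, v), sigmoid_kernel(u, v))
```

Each kernel returns a similarity between two feature vectors; the SVM uses these values in place of a plain inner product, which is what lets it separate data that is not linearly separable.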
ARTIFICIAL NEURAL NETWORK:
The neural network consists of three layers, namely an input layer, a hidden layer, and an output layer.
The input layer consists of 14 nodes, one for each of a web page’s 14 feature scores. Only one output node
is required in the output layer to determine the relevancy of a web page. The number of nodes in the
hidden layer was set to 7. These parameters were set using a grid search based on some initial
experimentation. Training was repeated for 150 passes over the data (epochs) with the batch size set to 10.
The results were stored and used for performance evaluation.
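The 14-7-1 layout can be sketched as a single forward pass in plain Python. The report does not state the activation functions or the weight initialization, so ReLU in the hidden layer, a sigmoid output and uniform random weights are assumptions made only for this illustration; no training is shown.

```python
import math
import random

def make_layer(n_inputs, n_outputs, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
               for _ in range(n_outputs)]
    return weights, [0.0] * n_outputs

def forward(features, hidden, output):
    """One forward pass: 14 inputs -> 7 hidden units (ReLU) -> 1 sigmoid output."""
    hidden_w, hidden_b = hidden
    h = [max(0.0, sum(w * x for w, x in zip(row, features)) + b)
         for row, b in zip(hidden_w, hidden_b)]
    out_w, out_b = output
    z = sum(w * a for w, a in zip(out_w[0], h)) + out_b[0]
    return 1.0 / (1.0 + math.exp(-z))   # relevance score between 0 and 1

rng = random.Random(0)
hidden = make_layer(14, 7, rng)
output = make_layer(7, 1, rng)

# Trainable parameters: weights plus biases of both layers.
n_params = 14 * 7 + 7 + 7 * 1 + 1
score = forward([0.1] * 14, hidden, output)
print(n_params, 0.0 < score < 1.0)
```

A page would be labelled relevant when the output score exceeds a chosen threshold such as 0.5.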
XGBOOST
It is a type of boosting-based ensemble learning. It uses gradient-boosted decision trees to improve
accuracy and speed. The input consists of the same 14 features, and we use the gbtree booster.
The number of classifiers is set to 50 and max depth size

5.4.1 MODULES
• Manager
• User
• Admin
• Machine learning

Manager:
The manager module holds manager information and task descriptions for the entire experiment. A manager
can upload a file into the database, giving the file type, the file name and the URL associated with the
file so that information about the file can be retrieved.

User:
The user module holds user information and task descriptions for the entire experiment. After logging in
to a session, the user gets two options: search for a particular URL or piece of information, or search
for a particular file and view its weight and rank.

Admin:
The admin gives authority to managers and users by activating their accounts. The admin can see the
details of all users and managers, and can view the accuracy results of the SVM and XGBoost algorithms.

Machine learning:
Machine learning refers to a computer acquiring the ability to make predictive judgments and the best
decisions by analyzing and learning from a large amount of existing data. Representative algorithms
include deep learning, artificial neural networks, decision trees, boosting algorithms and so on. Machine
learning is the key way for computers to acquire artificial intelligence. Nowadays, machine learning plays
an important role in various fields of artificial intelligence: internet search, biometric identification,
autonomous driving, Mars robots, American presidential elections, military decision support and so on,
basically wherever there is a need for data analysis.

6. IMPLEMENTATION AND RESULTS

6.1 Source Code

urls.py:

from django.urls import path

# user, manager and search are the view modules of the project's apps;
# their import lines are not shown in the source.
urlpatterns = [
    # User routes
    path('userlogin/', user.userlogin, name='userlogin'),
    path('userregister/', user.userregister, name='userregister'),
    path('userlogincheck/', user.userlogincheck, name='userlogincheck'),
    path('pagerank', user.pagerank, name='pagerank'),
    path('search/', user.search, name="search"),
    path('search1/', user.search1, name="search1"),
    path('usersearchresult/', user.usersearchresult, name="usersearchresult"),
    path('usersearchresult1/', user.usersearchresult1, name="usersearchresult1"),
    path('weight/', user.weight, name="weight"),
    path('logout/', user.logout, name='logout'),

    # Manager routes
    path('managerlogin/', manager.managerlogin, name='managerlogin'),
    path('managerregister/', manager.managerregister, name='managerregister'),
    path('managerlogincheck/', manager.managerlogincheck, name='managerlogincheck'),
    path('fileupload/', manager.fileupload, name='fileupload'),

    # Admin routes
    path('admin1/', search.adminlogin, name='admin1'),
    path('adminloginentered/', search.adminloginentered, name='adminloginentered'),
    path('userdetails/', search.userdetails, name='userdetails'),
    path('Managerdetails/', search.managerdetails, name='Managerdetails'),
    path('activateuser/', search.activateuser, name='activateuser'),
    path('activatemanager/', search.activatemanager, name='activatemanager'),
]


views.py:
from django.contrib import messages
from django.http import HttpResponse
from django.shortcuts import render

# managerForm and managerModel are assumed to come from the manager app's
# forms.py and models.py; their import lines are not shown in the source.


def managerlogin(request):
    # Show the manager login page.
    return render(request, 'manager/managerlogin.html')


def managerregister(request):
    if request.method == 'POST':
        form1 = managerForm(request.POST)
        if form1.is_valid():
            form1.save()
            print("successfully saved the data")
            return render(request, 'manager/managerlogin.html')
        else:
            print("form not valid")
            return HttpResponse("form not valid")
    else:
        # First visit: show an empty registration form.
        form = managerForm()
        return render(request, "manager/managerregister.html", {"form": form})


def managerlogincheck(request):
    if request.method == 'POST':
        sname = request.POST.get('email')
        spasswd = request.POST.get('upasswd')
        try:
            check = managerModel.objects.get(email=sname, passwd=spasswd)
            status = check.status
            print('status', status)
            # Only managers activated by the admin may log in.
            if status == "Activated":
                request.session['email'] = check.email
                return render(request, 'manager/managerpage.html')
            else:
                messages.error(request, 'manager is not activated')
                return render(request, 'manager/managerlogin.html')
        except Exception as e:
            # No matching manager was found, or the lookup failed.
            print('Exception is ', str(e))
        messages.error(request, 'Invalid name and password')
    return render(request, 'manager/managerlogin.html')

models.py:
from django.db import models


class userModel(models.Model):
    name = models.CharField(max_length=50)
    email = models.EmailField()
    passwd = models.CharField(max_length=40)
    cwpasswd = models.CharField(max_length=40)
    mobileno = models.CharField(max_length=50, default="", editable=True)
    status = models.CharField(max_length=40, default="", editable=True)

    def __str__(self):
        return self.email

    class Meta:
        db_table = 'userregister'


class weightmodel(models.Model):
    filename = models.CharField(max_length=100)
    file = models.FileField(upload_to='files/pdfs/')
    weight = models.CharField(max_length=100)
    rank = models.CharField(max_length=100, default="", editable=False)
    label = models.CharField(max_length=100, default="", editable=False)

    def __str__(self):
        return self.filename

    class Meta:
        db_table = 'weight'

forms.py:
from django import forms
from django.core import validators

from user.models import *


class userForm(forms.ModelForm):
    name = forms.CharField(widget=forms.TextInput(), required=True, max_length=100)
    passwd = forms.CharField(widget=forms.PasswordInput(), required=True, max_length=100)
    cwpasswd = forms.CharField(widget=forms.PasswordInput(), required=True, max_length=100)
    email = forms.CharField(widget=forms.TextInput(), required=True)
    # The mobile number must be exactly 10 characters long.
    mobileno = forms.CharField(
        widget=forms.TextInput(), required=True, max_length=10,
        validators=[validators.MaxLengthValidator(10), validators.MinLengthValidator(10)])
    # New accounts start as 'waiting' until the admin activates them.
    status = forms.CharField(widget=forms.HiddenInput(), initial='waiting', max_length=100)

    class Meta:
        model = userModel
        fields = ['name', 'passwd', 'cwpasswd', 'email', 'mobileno', 'status']

userdetails.html:
{% extends 'adminbase.html' %}
{% load static %}

{% block contents %}

<div class="modal fade" id="login" role="dialog">

<div class="modal-content">
<div class="modal-header">
<button type="button" class="close" data-dismiss="modal">×</button>
<h4 class="modal-title text-center form-title">Login</h4>
</div>
<div class="modal-body padtrbl">

<div class="login-box-body">
<p class="login-box-msg">Sign in to start your session</p>
<div class="form-group">
<form name="" id="loginForm">
<div class="form-group has-feedback">

<input class="form-control" placeholder="Username" id="loginid" type="text"

autocomplete="off" />
<span style="display:none;font-weight:bold; position:absolute;color: red;position:
absolute;padding:4px;font-size: 11px;background-color:rgba(128, 128, 128, 0.26);z-index: 17; right: 27px;
top: 5px;" id="span_loginid"></span>

<span class="glyphicon glyphicon-user form-control-feedback"></span>
</div>
<div class="form-group has-feedback">

<input class="form-control" placeholder="Password" id="loginpsw" type="password"
autocomplete="off" />
<span style="display:none;font-weight:bold; position:absolute;color: grey;position:
absolute;padding:4px;font-size: 11px;background-color:rgba(128, 128, 128, 0.26);z-index: 17; right: 27px;
top: 5px;" id="span_loginpsw"></span>

<span class="glyphicon glyphicon-lock form-control-feedback"></span>
</div>
<div class="row">
<div class="col-xs-12">
<div class="checkbox icheck">
<label>
<input type="checkbox" id="loginrem" > Remember Me
</label>
</div>
</div>
<div class="col-xs-12">
<button type="button" class="btn btn-green btn-block btn-flat" onclick="userlogin()">Sign
In</button>
</div>
</div>
</form>
</div>
</div>
</div>


</div>

</div>
</div>


<div class="banner">
<div class="bg-color">
<div class="container">
<div class="row">
<div class="banner-text text-center">
<div class="text-border">

</div>
<div class="intro-para text-center quote">
<p>
Welcome admin page...
</p>
<center><h3>
<table border="2px solid red" align="left">
<tr><th style="color:green">Id</th>
<th style="color:green">name</th>

<th style="color:green">email</th>
<th style="color:green">mobileno</th>
<th style="color:green">status</th>
<th style="color:green">activate</th>
</tr>

{% for x in qs %}
<tr>
<td style="color:red">{{x.id}}</td>
<td style="color:red">{{x.name}}</td>


<td style="color:red">{{x.email}}</td>
<td style="color:red">{{x.mobileno}}</td>
<td style="color:red">{{x.status}}</td>

{% if x.status == 'waiting' %}
<td style="color:orange"> <a href="/activateuser/?pid={{ x.id }}"
>Activate</a></td>
{% else %}
<td style="color:orange"> Activated</td>
{% endif %}

</tr>
{% endfor %}

</table>
</h3>
</center>
</div>

</div>
</div>
</div>
</div>
</div>

{% endblock %}


6.2 SCREEN SHOTS

This project uses two machine learning algorithms, SVM and XGBoost, to predict the search results for a
given query and so build a search engine. To train these algorithms, website data is converted into numeric
vectors using TF-IDF (term frequency-inverse document frequency). Each entry of a TF-IDF vector is the
frequency of a word in a document, weighted down for words that occur in many documents.
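
To make the TF-IDF step concrete, the following is a minimal sketch in plain Python, not the project's own code. It assumes whitespace tokenization and the basic weighting tf × log(N/df); library implementations generally use smoothed variants, so exact numbers will differ. The function name `tfidf_vectors` and the three sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) with one TF-IDF vector per document."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = max(len(tokens), 1)  # guard against empty documents
        vector = []
        for term in vocabulary:
            tf = counts[term] / length                # term frequency
            idf = math.log(n_docs / doc_freq[term])   # inverse document frequency
            vector.append(tf * idf)
        vectors.append(vector)
    return vocabulary, vectors


# Toy documents, invented for illustration only.
docs = [
    "news on security",
    "security patch released",
    "cricket news today",
]
vocab, vecs = tfidf_vectors(docs)
```

Under this formula a word that appears in every document gets weight 0, while a word unique to one document gets the largest weight within that document.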
The following modules are implemented:
1) Admin module: the admin can log in to the application using username and password as admin, then
activate new user registrations and train the SVM and XGBoost algorithms.
2) Manager module: the manager can log in to the application using username and password as Manager,
then upload the dataset to the application.
3) New User Signup: using this module, a new user can sign up with the application.
4) User Login: the user can log in to the application and then perform a search by entering a query.

To run the project, install MySQL and Python 3.7, then copy the contents of the DB.txt file and paste them
into MySQL to create the database.
Now double-click the ‘run.bat’ file to start the Django server and get the screen below.

Fig 6.2.1 Django server page

In the above screen the server has started and built a vector from the dataset, in which the first row shows
the words and the remaining rows contain their TF-IDF weights. Now open a browser, enter the URL
http://127.0.0.1:8000/index.html and press Enter to get the page below.


Fig 6.2.2 User signup page

In the above screen, click the ‘New User Signup Here’ link to get the screen below.

Fig 6.2.3 Assigning credentials in the user screen

In the above screen the user fills in the signup details and presses the button to get the output below.


Fig 6.2.4 Signup process completed

In the above screen the user signup process is complete; now click ‘User Login’ to get the screen below.

Fig 6.2.5 User login screen


In the above screen the user logs in and gets the output below.

Fig 6.2.6 Admin Login

In the above screen the login details were correct, but the account has not been activated by the admin, so
click the ‘Admin Login’ link to log in as admin and then activate the user.


Fig 6.2.7 Login Screen

In the above screen the admin logs in; after login the screen below appears.

Fig 6.2.8 Home Page

In the above screen the admin can click the ‘View Users’ link to view all users.

Fig 6.2.9 User Account

In the above screen we can see the SVM and XGBoost accuracy; of the two algorithms, XGBoost achieved the
higher accuracy. Now log out and log in as Manager.
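
Accuracy here means the share of test samples for which the predicted label matches the true one. A small sketch of such a comparison follows; the labels and the names `model_one` and `model_two` are invented for illustration and are not the project's data or results.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical labels, invented for illustration; not the project's results.
true_labels = ["url_a", "url_b", "url_a", "url_c"]
model_one = ["url_a", "url_b", "url_c", "url_c"]
model_two = ["url_a", "url_a", "url_c", "url_c"]

scores = {
    "model_one": accuracy(true_labels, model_one),
    "model_two": accuracy(true_labels, model_two),
}
# The model with the highest accuracy on the same test labels.
best = max(scores, key=scores.get)
```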

Fig 6.2.10 Manager Login Screen

In the above screen the manager logs in; after login the screen below appears.


Fig 6.2.11 User Uploading Screen

In the above screen the manager can click the ‘Upload Dataset’ link to upload a dataset or documents.

Fig 6.2.12 Upload Data Set

In the above screen the manager browses for and uploads the dataset file, which can be found inside the
‘Dataset’ folder; now press the button to save the dataset in the server database.

Fig 6.2.13 Data Set Screen

In the above screen the dataset file has been saved in the database; now log out and log in as a user to
perform a search.


Fig 6.2.14 User Login Screen

In the above screen the user logs in; after login the output below appears.

Fig 6.2.15 Searching Ranking Page

In the above screen the user can click the ‘Search with Page Rank’ link to search for any data.


Fig 6.2.16 Query Screen

In the above screen the query ‘news on security’ is entered; pressing the button gives the search result
below.

Fig 6.2.17 URL Screen


In the above screen the machine learning algorithm predicts two URLs for the given query, and the user can
click on those URLs to visit the pages.
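
The idea of returning the best-matching URLs for a query can be illustrated with a much simpler stand-in for the trained classifiers: ranking pages by cosine similarity between raw word counts. This is a swapped-in technique for illustration only, not the project's SVM or XGBoost model; the function names, URLs and page texts below are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, pages, top_k=2):
    """Return up to top_k URLs whose text is most similar to the query."""
    query_vec = Counter(query.lower().split())
    scored = []
    for url, text in pages.items():
        score = cosine_similarity(query_vec, Counter(text.lower().split()))
        if score > 0:  # skip pages that share no word with the query
            scored.append((score, url))
    # Highest score first; ties are broken alphabetically by URL.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored[:top_k]]


# Hypothetical pages, invented for illustration.
pages = {
    "https://example.com/security-news": "latest news on security breaches",
    "https://example.com/security-tips": "security tips for home users",
    "https://example.com/cricket": "cricket scores and match reports",
}
results = search("news on security", pages)
```

With these sample pages, the query shares three words with the first page and one with the second, so those two URLs are returned in that order and the unrelated page is left out.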

Fig 6.2.18 Output Page

In the above screen, by clicking on a URL link the user can visit and view the page. Similarly, the user can
enter any query and, if the query is available in the dataset, will get the corresponding output. For the
above query we got the result shown.


7. TESTING AND VALIDATION

7.1 SAMPLE TEST CASES

S.no | Test Case | Expected Result | Result | Remarks (If Fails)
-----|-----------|-----------------|--------|-------------------
1. | User Register | User registration completes successfully. | Pass | If the user email already exists, it fails.
2. | User Login | If the username and password are correct, the user reaches a valid page. | Pass | Unregistered users will not be logged in.
3. | Manager login | If the manager name and password are correct, the manager reaches a valid page. | Pass | Unregistered managers will not be logged in.
4. | Admin can activate the registered managers | Admin can activate the registered manager ID. | Pass | If the manager ID is not found, the manager cannot log in.
5. | Admin login | Admin can log in with his login credentials. On success he gets the home page. | Pass | Invalid login details are not allowed.
6. | Admin can activate the registered users | Admin can activate the registered user ID. | Pass | If the user ID is not found, the user cannot log in.
7. | Admin can get the SVM results | Clicking SVM displays the SVM prediction. | Pass | The SVM prediction is not shown.
8. | Admin can get the XGBoost results | Clicking XGBoost displays the XGBoost prediction. | Pass | The XGBoost prediction is not shown.
9. | User login page | User can search the weight of a particular document. | Pass | The weight of the document is not shown.
10. | | | Pass |

7.2 TYPES OF TESTING

Unit testing
Unit testing involves designing test cases that validate that the internal program logic functions
properly and that program inputs produce valid outputs. All decision branches and internal code flow
should be validated. It is the testing of individual software units of the application, done after an
individual unit is completed and before integration. This is structural testing that relies on knowledge of
the unit's construction and is invasive. Unit tests perform basic tests at component level and test a
specific business process, application, and/or system configuration. Unit tests ensure that each unique path
of a business process performs accurately to the documented specifications and contains clearly defined
inputs and expected results.
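
As a sketch of what such a unit test might look like for this project, the example below tests a hypothetical helper, `login_outcome`, that mirrors the activation rule used by the login views (an account can log in only when its status is "Activated"). The helper and its dictionary-based account are assumptions made so that the tests run without Django or a database.

```python
import unittest


def login_outcome(account, password):
    """Hypothetical helper mirroring the login rule of the views.

    `account` is a dict with 'passwd' and 'status' keys, or None when no
    account matches the given email.
    """
    if account is None or account["passwd"] != password:
        return "invalid"
    if account["status"] != "Activated":
        return "not activated"
    return "ok"


class LoginOutcomeTests(unittest.TestCase):
    def setUp(self):
        self.active = {"passwd": "secret", "status": "Activated"}
        self.waiting = {"passwd": "secret", "status": "waiting"}

    def test_activated_account_logs_in(self):
        self.assertEqual(login_outcome(self.active, "secret"), "ok")

    def test_waiting_account_is_refused(self):
        self.assertEqual(login_outcome(self.waiting, "secret"), "not activated")

    def test_wrong_password_is_invalid(self):
        self.assertEqual(login_outcome(self.active, "wrong"), "invalid")

    def test_unknown_account_is_invalid(self):
        self.assertEqual(login_outcome(None, "secret"), "invalid")
```

Saved in a file, these tests can be run with Python's built-in runner, for example `python -m unittest` followed by the file name. Each test covers one decision branch of the helper.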
Integration testing
Integration tests are designed to test integrated software components to determine whether they
actually run as one program. Testing is event driven and is more concerned with the basic outcome of
screens or fields. Integration tests demonstrate that although the components were individually
satisfactory, as shown by successful unit testing, the combination of components is correct and consistent.
Integration testing is specifically aimed at exposing the problems that arise from the combination of
components.
Functional test
Functional tests provide systematic demonstrations that the functions tested are available as
specified by the business and technical requirements, system documentation, and user manuals.
Functional testing is centered on the following items:
Valid Input : identified classes of valid input must be accepted.
Invalid Input : identified classes of invalid input must be rejected.
Functions : identified functions must be exercised.
Output : identified classes of application outputs must be exercised.
Systems/Procedures : interfacing systems or procedures must be invoked.
Organization and preparation of functional tests is focused on requirements, key functions, or
special test cases. In addition, systematic coverage of identified business process flows, data fields,
predefined processes, and successive processes must be considered for testing. Before functional testing is
complete, additional tests are identified and the effective value of current tests is determined.
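
The valid and invalid input classes above can be sketched for the registration form as follows. The rules are assumptions for illustration: the email check is deliberately loose, the ten-digit rule is stricter than the length-only validators in forms.py, and `validate_registration` is a hypothetical helper rather than part of the project.

```python
import re

MOBILE_PATTERN = re.compile(r"[0-9]{10}")                  # assumption: exactly ten digits
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")    # deliberately loose check


def validate_registration(entry, existing_emails):
    """Return a list of error messages; an empty list means the entry is accepted.

    `existing_emails` is assumed to hold already registered emails in lower case.
    """
    errors = []
    email = entry.get("email", "")
    if not EMAIL_PATTERN.fullmatch(email):
        errors.append("email has an invalid format")
    elif email.lower() in existing_emails:
        errors.append("email is already registered")
    if not MOBILE_PATTERN.fullmatch(entry.get("mobileno", "")):
        errors.append("mobile number must be exactly 10 digits")
    if entry.get("passwd") != entry.get("cwpasswd"):
        errors.append("passwords do not match")
    return errors


existing = {"first@example.com"}

# Valid input class: must be accepted.
ok = validate_registration(
    {"email": "new@example.com", "mobileno": "9876543210", "passwd": "a", "cwpasswd": "a"},
    existing)

# Invalid input classes: duplicate email, malformed fields.
duplicate = validate_registration(
    {"email": "First@example.com", "mobileno": "9876543210", "passwd": "a", "cwpasswd": "a"},
    existing)
malformed = validate_registration(
    {"email": "nope", "mobileno": "12345", "passwd": "a", "cwpasswd": "b"},
    existing)
```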
System Test
System testing ensures that the entire integrated software system meets requirements. It tests a
configuration to ensure known and predictable results. An example of system testing is the
configuration-oriented system integration test. System testing is based on process descriptions and flows,
emphasizing pre-driven process links and integration points.


White Box Testing

White Box Testing is testing in which the software tester has knowledge of the inner workings,
structure and language of the software, or at least its purpose. It is used to test areas that cannot be
reached from a black box level.
Black Box Testing
Black Box Testing is testing the software without any knowledge of the inner workings,
structure or language of the module being tested. Black box tests, like most other kinds of tests, must be
written from a definitive source document, such as a specification or requirements document.
Unit Testing
Unit testing is usually conducted as part of a combined code and unit test phase of the
software lifecycle, although it is not uncommon for coding and unit testing to be conducted as two distinct
phases.
Test strategy and approach
Field testing will be performed manually and functional tests will be written in detail.
Test objectives
• All field entries must work properly.
• Pages must be activated from the identified link.
• The entry screen, messages and responses must not be delayed.

Features to be tested
• Verify that the entries are of the correct format.
• No duplicate entries should be allowed.
• All links should take the user to the correct page.
Integration Testing
Software integration testing is the incremental integration testing of two or more integrated
software components on a single platform to produce failures caused by interface defects.
The task of the integration test is to check that components or software applications interact without
error.
Test Results: All the test cases mentioned above passed successfully. No defects encountered.
Acceptance Testing
User Acceptance Testing is a critical phase of any project and requires significant participation by the end
user. It also ensures that the system meets the functional requirements.
Test Results: All the test cases mentioned above passed successfully. No defects encountered.

8. CONCLUSION

Search engines are very useful for finding the most relevant URLs for given keywords, which reduces the
time a user spends searching for the relevant web page. Accuracy is therefore a very important factor. From
the above observations, it can be concluded that XGBoost is better in terms of accuracy than SVM and ANN.
Thus, search engines built using the XGBoost and PageRank algorithms will give better accuracy.
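
For reference, the basic PageRank computation can be sketched with power iteration. This is a generic textbook formulation, assuming a damping factor of 0.85 and an even redistribution of rank from pages without outgoing links; the function name, the link graph and the parameter defaults are hypothetical and the code is not taken from the project.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank scores; `links` maps each page to the pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    # Outgoing links as sets, so repeated links are counted once.
    out = {page: set(links.get(page, ())) for page in pages}
    ranks = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is shared among all pages.
        dangling = sum(ranks[p] for p in pages if not out[p])
        new_ranks = {}
        for page in pages:
            incoming = sum(ranks[src] / len(out[src])
                           for src in pages if page in out[src])
            new_ranks[page] = (1 - damping) / n + damping * (incoming + dangling / n)
        ranks = new_ranks
    return ranks


# Hypothetical four-page link graph, invented for illustration.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
```

In this sample graph, page C receives links from three pages and ends up with the highest score, while page D, which no page links to, keeps only the baseline share (1 - damping) / n.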


9. FUTURE ENHANCEMENT
Building a search engine using machine learning techniques offers a wide range of possibilities for future
enhancements and improvements. Here are some potential areas for future development and enhancements:
1. Query Understanding: Enhancing the search engine's ability to understand user queries more
accurately and interpret their intent. This could involve natural language processing (NLP)
techniques, sentiment analysis, entity recognition, and understanding contextual information to
provide more relevant search results.
2. Personalization: Incorporating personalization features to tailor search results based on user
preferences, search history, location, and demographic information. This could involve building user
profiles and using collaborative filtering or reinforcement learning techniques to provide
personalized recommendations.
3. Multimedia Content Search: Expanding the search engine's capabilities to handle and retrieve
various types of multimedia content, including images, videos, audio files, and documents.
Developing advanced techniques for content analysis, image recognition, speech recognition, and
video understanding can greatly enhance the search experience.
4. Semantic Search: Implementing techniques to understand the meaning and context of the search
query and the indexed content. Utilizing semantic analysis, knowledge graphs, and ontologies can
improve the search engine's ability to retrieve more accurate and contextually relevant results.
5. Real-Time Updates: Enabling the search engine to handle real-time updates and index new content
as it becomes available. This involves efficient indexing algorithms and techniques to handle large-
scale data ingestion and provide up-to-date search results.
6. Explainability and Trust: Addressing the challenges of explainability in machine learning models
used within the search engine. Developing techniques to provide transparency and explanations for
the ranking of search results, helping users understand why certain results are presented and building
trust in the search engine's recommendations.
7. Multilingual and Cross-Language Search: Extending the search engine's capabilities to handle
multiple languages and facilitate cross-language search. Developing language translation models,
language detection, and cross-lingual retrieval techniques can enable users to search and access
information in different languages.
8. Contextual Search: Leveraging user context, such as time, location, device, and user behavior, to
deliver more context-aware search results. This involves incorporating contextual information into
the ranking algorithm and providing personalized recommendations based on the user's current
situation.


