Query and Reporting Tools: Search Engine Architecture

The document discusses query and reporting tools available from Business Objects to access a data warehouse. It describes BusinessObjects, InfoView, InfoBurst, and Data Warehouse List Upload as tools that provide point-and-click interfaces for querying, reporting, refreshing reports, and uploading lists. The document also states that WSU has negotiated a contract with Business Objects for purchasing these tools at a discount.

Uploaded by

Umang Purohit


Query and Reporting Tools

The data warehouse is accessed using an end-user query and reporting tool from Business Objects.

Business Objects provides several tools, each with a point-and-click interface, to securely access the data warehouse or personal data files, including the following:

• BusinessObjects (Reporter and Explorer) - a Microsoft Windows-based query and reporting tool.

• InfoView - a web-based tool that allows reports to be refreshed on demand (but cannot create new reports).

• InfoBurst - a web-based server tool that allows reports to be refreshed, scheduled, and distributed. It can be used to distribute reports and data to users or servers in various formats (e.g., text, Excel, PDF, HTML). For more information, see the documentation below:
– InfoBurst Usage Notes (PDF)
– InfoBurst User Guide (PDF)

• Data Warehouse List Upload - a web-based tool that allows lists of data to be uploaded into the data warehouse for use as input to queries. For more information, see the documentation below:
– Data Warehouse List Upload Instructions (PDF)

WSU has negotiated a contract with Business Objects for purchasing these tools at a discount. View BusObj Rates

Search engine architecture:-

A software architecture consists of software components, the interfaces provided by those components, and the relationships between them. It describes a system at a particular level of abstraction.

The architecture of a search engine is determined by two requirements:

-effectiveness (quality of results)
-efficiency (response time and throughput)

Search engines are one tool used to answer information needs:
• Users express their information needs as queries.

What is a good answer to a query?

• One that is relevant to the user’s information need!
• Search engines typically return ten answers per page,
where each answer is a short summary of a web
document
• Likely relevance to an information need is approximated
by statistical similarity between web documents and the
query
• Users favour search engines that have high precision,
that is, those that return relevant answers in the first
page of results.
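Precision on the first page can be measured as precision at k (k = 10 for a ten-answer page): the fraction of the top k results that are relevant. The sketch below assumes the set of relevant documents is already known from human relevance judgements, as in an evaluation setting; the function name `precision_at_k` and the sample document IDs are hypothetical.

```python
from typing import Sequence, Set


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 10) -> float:
    """Fraction of the top-k returned results that are relevant.

    Assumes relevance judgements (relevant_ids) are known in advance,
    as when evaluating an engine; a live engine does not have them.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Divide by k (not len(top_k)) so a short result page is not rewarded.
    return hits / k


if __name__ == "__main__":
    results = ["d3", "d7", "d1", "d9", "d4", "d2", "d8", "d5", "d6", "d0"]
    judged_relevant = {"d3", "d1", "d4", "d6"}
    print(precision_at_k(results, judged_relevant))  # 0.4
```

Four of the ten answers on the page are relevant, so precision at 10 is 0.4; an engine that moved those four to the top of a longer list would not change this score, which is why precision at k is often reported alongside rank-sensitive measures.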

HOW DO SEARCH ENGINES WORK:-

1. Web Crawler / Spiders
2. Databases & Indexes (Inverted Index)
3. Search Results Ranking

Three main parts:

– Gather the contents of all web pages (using a program called a crawler or spider)
– Organize the contents of pages in a way that allows efficient retrieval (indexing)
– Take in a query, determine which pages match, and show the results (ranking and
display of results)

1. Web Crawlers / Spiders

Crawlers gather pages/sites
– Programs that move from site to site on the web and gather information about the
pages found
– Start with a list of domain names (homepages) and follow the hyperlinks on those homepages.
– Keep a list of URLs visited and those still to be visited.
– At each site, the crawler may focus on breadth or depth
• Breadth – gather the top pages and move on to another site
– Allows it to find more sites
• Depth – gather all pages at the site
– Allows it to index more pages at each site
How frequently a site gets crawled varies from engine to engine.
Web crawlers collect:
-Mostly HTML pages
-PDF
-Word
-PPT, etc.
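The crawler bookkeeping above (a list of URLs still to be visited, a record of those already seen, and a per-site breadth or depth policy) can be sketched as follows. This is a minimal illustration under stated assumptions: the "web" is a hypothetical in-memory link graph instead of real HTTP fetches, and the `max_site_depth` parameter is an invented knob that approximates the breadth policy (small value) and the depth policy (`None`, no limit).

```python
from collections import deque
from typing import Dict, List, Optional
from urllib.parse import urlsplit

# Hypothetical in-memory "web": each URL maps to the hyperlinks on that page.
# A real crawler would download each page over HTTP and parse out its links.
LINKS: Dict[str, List[str]] = {
    "http://a.com/": ["http://a.com/p1", "http://b.com/"],
    "http://a.com/p1": ["http://a.com/p1/x"],
    "http://a.com/p1/x": [],
    "http://b.com/": ["http://b.com/p1"],
    "http://b.com/p1": [],
}


def site_of(url: str) -> str:
    return urlsplit(url).netloc


def crawl(seeds: List[str], max_site_depth: Optional[int] = None, max_pages: int = 100) -> List[str]:
    """Return URLs in the order they are visited.

    max_site_depth limits how many links deep the crawler goes inside one site:
    a small value approximates the "breadth" policy (top pages, then move on),
    None approximates the "depth" policy (gather all pages at each site).
    """
    frontier = deque((url, 0) for url in seeds)  # URLs still to be visited
    seen = set(seeds)  # URLs already queued or visited
    visited: List[str] = []
    while frontier and len(visited) < max_pages:
        url, depth = frontier.popleft()
        visited.append(url)  # a real crawler would fetch and store the page here
        for link in LINKS.get(url, []):
            if link in seen:
                continue
            # Depth counts links followed within one site; a new site restarts at 0.
            link_depth = depth + 1 if site_of(link) == site_of(url) else 0
            if max_site_depth is not None and link_depth > max_site_depth:
                continue  # too deep inside this site for the current policy
            seen.add(link)
            frontier.append((link, link_depth))
    return visited


if __name__ == "__main__":
    print(crawl(["http://a.com/"], max_site_depth=1))
    # ['http://a.com/', 'http://a.com/p1', 'http://b.com/', 'http://b.com/p1']
    print(crawl(["http://a.com/"]))
    # ['http://a.com/', 'http://a.com/p1', 'http://b.com/', 'http://a.com/p1/x', 'http://b.com/p1']
```

With a depth limit of 1 the crawler skips `http://a.com/p1/x` and spends its page budget reaching more sites; with no limit it indexes every page of each site it finds.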

2. Databases & Indexing

Databases
a. Input from crawlers, from submissions by authors, and from related directories
b. Cached pages
c. Page descriptions (indexes)
d. The size of the database is an important issue
e. Even the largest database does not cover the entire Web

-Indexing
– Each page that is included in the database is indexed (automatically), covering:
• “All” the words on the page (for full-text search)
• Stop words (very common words, often omitted)
• Meta tags: title, others
• URL
• Hypertext anchors and links
– Spamming (attempts to manipulate the index)
• Loading keywords into meta tags
• Loading invisible words (e.g., white text on a white background)
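The fields in the indexing list (full text, title, meta tags, URL, anchors and links) can be pulled out of a page with Python's standard `html.parser` module. This is a minimal sketch, not any engine's actual indexer: the class `PageFieldExtractor`, the helpers `words` and `extract_fields`, and the sample page are hypothetical, and a real indexer would also skip `<script>`/`<style>` content and handle malformed markup more carefully.

```python
import re
from html.parser import HTMLParser
from typing import Dict, List


def words(text: str) -> List[str]:
    """Lowercase alphanumeric tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


class PageFieldExtractor(HTMLParser):
    """Collects the parts of one HTML page that the notes list as indexed."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: Dict[str, List[str]] = {"title": [], "meta": [], "body": [], "anchor": []}
        self.links: List[str] = []
        self._in_title = False
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attr_map.get("content"):
            self.fields["meta"].extend(words(attr_map["content"]))
        elif tag == "a":
            self._in_anchor = True
            if attr_map.get("href"):
                self.links.append(attr_map["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        tokens = words(data)
        if self._in_title:
            self.fields["title"].extend(tokens)
        else:
            # Full-text search: every text word goes into the body field.
            # (This sketch does not skip <script>/<style> text.)
            self.fields["body"].extend(tokens)
            if self._in_anchor:
                self.fields["anchor"].extend(tokens)


def extract_fields(url: str, html: str) -> Dict[str, List[str]]:
    parser = PageFieldExtractor()
    parser.feed(html)
    parser.close()
    fields = dict(parser.fields)
    fields["url"] = [url]
    fields["links"] = parser.links
    return fields


SAMPLE_HTML = (
    "<html><head><title>Cat Care</title>"
    '<meta name="keywords" content="cats, kittens"></head>'
    '<body><p>Feeding cats daily.</p><a href="/vets">Find vets</a></body></html>'
)

if __name__ == "__main__":
    print(extract_fields("http://example.com/cats", SAMPLE_HTML))
```

Keeping fields separate lets a ranker later weight a title match differently from a body match. Anchor text collected here describes the pages being linked *to*, so an engine would typically credit it to the target URL rather than to this page.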
Inverted Index
-How to store the words for fast lookup
-Basic steps:
– Make a “dictionary” of all the words in all of the web pages
– For each word, list all the documents it occurs in.
– Often omit very common words
• “stop words”
– Sometimes stem the words
• (also called morphological analysis)
• cats -> cat
• running -> run
-In reality, this index is huge.
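The basic steps above (build a dictionary, list the documents for each word, drop stop words, stem) can be sketched in a few lines of Python. The stop list is a small illustrative one, and `naive_stem` is a crude hypothetical suffix stripper written for this example; it is not the Porter stemmer or any library's stemming algorithm, only enough to map cats -> cat and running -> run.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Small illustrative stop list; real engines use longer, tuned lists.
STOP_WORDS: Set[str] = {"a", "an", "and", "the", "of", "to", "in", "is", "are"}


def naive_stem(word: str) -> str:
    """Crude suffix stripping for illustration only (not the Porter stemmer)."""
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        # running -> runn -> run: undouble a final consonant, except l, s, z
        if word[-1] == word[-2] and word[-1] not in "lsz":
            word = word[:-1]
    elif word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        word = word[:-1]  # cats -> cat
    return word


def tokenize(text: str) -> List[str]:
    """Lowercase, split on non-alphanumerics, drop stop words, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each term (the dictionary) to the sorted IDs of documents containing it."""
    postings: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


if __name__ == "__main__":
    docs = {
        "d1": "The cats are running in the garden",
        "d2": "A cat sleeps",
        "d3": "Running shoes",
    }
    for term, ids in sorted(build_inverted_index(docs).items()):
        print(term, ids)
    # cat ['d1', 'd2']
    # garden ['d1']
    # run ['d1', 'd3']
    # shoe ['d3']
    # sleep ['d2']
```

Because "cats" and "cat" share the entry `cat`, a query for either finds both d1 and d2; the stop words "the", "are", and "in" never enter the dictionary.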

3. Results ranking
a) The search engine receives a query,
b) looks up the words in the index and retrieves many documents, then
c) rank-orders the pages and extracts “snippets” or summaries containing the query words.
– Most web search engines assume the user wants all of the words (Boolean AND, not OR).
d) The ranking algorithms are complex, closely guarded, and unique to each search engine.
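Steps b) and c), under the Boolean AND assumption, can be sketched as intersecting sorted posting lists and then cutting a snippet around the first matching word. Ranking itself is left out here. The posting lists in `INDEX` and the function names are hypothetical, and query terms are assumed to be already normalized the same way as the index.

```python
from typing import Dict, List

# Hypothetical posting lists (term -> sorted doc IDs), e.g. from an inverted index.
INDEX: Dict[str, List[int]] = {
    "web": [1, 2, 4],
    "crawler": [2, 3, 4],
    "index": [1, 4],
}


def intersect(p1: List[int], p2: List[int]) -> List[int]:
    """Merge two sorted posting lists, keeping IDs present in both."""
    i = j = 0
    out: List[int] = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out


def and_query(index: Dict[str, List[int]], terms: List[str]) -> List[int]:
    """Boolean AND: documents that contain every query term."""
    if not terms:
        return []
    # Process the shortest posting list first so intermediate results stay small.
    lists = sorted((index.get(t, []) for t in terms), key=len)
    result = lists[0]
    for postings in lists[1:]:
        if not result:
            break
        result = intersect(result, postings)
    return result


def snippet(text: str, terms: List[str], width: int = 3) -> str:
    """Return up to `width` words either side of the first query-term match."""
    words = text.split()
    for pos, word in enumerate(words):
        if word.lower().strip(".,;:!?") in terms:
            start = max(0, pos - width)
            return " ".join(words[start:pos + width + 1])
    return " ".join(words[: 2 * width + 1])  # no match: fall back to the opening words


if __name__ == "__main__":
    print(and_query(INDEX, ["web", "crawler"]))  # [2, 4]
    print(snippet("A web crawler follows links and feeds the index.", ["crawler"], width=2))
    # A web crawler follows links
```

Merging sorted lists costs time proportional to their combined length, which is why posting lists are usually kept sorted by document ID.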

Some ranking criteria:-

-For a given candidate result page, use:
– Number of matching query words in the page
– Frequency of terms on the page and in general
– Proximity of matching words to one another
– Location of terms within the page
– Location of terms within tags e.g. <title>, <h1>, link text, body text
– Anchor text on pages pointing to this one
– Link analysis of which pages point to this one
– (Sometimes) Click-through analysis: how often the page is clicked on
– How “fresh” the page is
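Two of the criteria above, frequency of terms on the page and in general, and location of terms within tags such as `<title>`, can be combined in a small scoring function. This is a minimal sketch using the textbook TF-IDF weighting (a swapped-in, well-known technique, not any engine's guarded algorithm) plus a hypothetical `TITLE_BOOST` weight; the sample pages are invented.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical weight: extra credit when a query term appears in the <title>.
TITLE_BOOST = 2.0


def rank(query: List[str], pages: Dict[str, Tuple[str, str]]) -> List[Tuple[str, float]]:
    """Score pages by TF-IDF on body text plus a title-match bonus.

    pages maps page id -> (title, body); text is assumed lowercase and
    whitespace-tokenized to keep the sketch short.
    """
    n = len(pages)
    body_tf = {pid: Counter(body.split()) for pid, (_, body) in pages.items()}
    titles = {pid: set(title.split()) for pid, (title, _) in pages.items()}
    scores: Dict[str, float] = {}
    for pid in pages:
        score = 0.0
        for term in set(query):
            # Document frequency: how many pages mention the term anywhere.
            df = sum(1 for p in pages if term in body_tf[p] or term in titles[p])
            if df == 0:
                continue
            idf = math.log(n / df)  # terms that are rare "in general" weigh more
            tf = body_tf[pid][term]
            if tf:
                score += (1 + math.log(tf)) * idf  # damped frequency on this page
            if term in titles[pid]:
                score += TITLE_BOOST * idf  # location: term appears in <title>
        scores[pid] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


PAGES = {
    "p1": ("search engines", "an engine ranks pages by relevance"),
    "p2": ("cooking tips", "search for recipes and spices"),
    "p3": ("web crawlers", "crawlers gather pages for the index"),
}

if __name__ == "__main__":
    for pid, score in rank(["search"], PAGES):
        print(pid, round(score, 3))
```

For the query "search", p1 ranks above p2 because the term sits in its title, even though only p2 uses it in the body. A term found on every page gets an IDF of zero and stops influencing the order. Proximity, anchor text, link analysis, click-through, and freshness would be further terms added to the same score.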

HOW TO SCALE IN MODERN TIMES:-

• Currently
– Efficient index
– Petabyte-scale storage space
– Efficient crawling
– Cost-effectiveness of hardware
• Future
– Qualitative context
• Maintaining localization data
– Perhaps send indexing to clients
– Client computers help gather Google’s index in a distributed,
decentralized fashion?
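"Efficient index" and petabyte-scale storage are listed without detail. One widely used way to shrink an inverted index, shown here as an illustration rather than as what any particular engine does, is to store the gaps between sorted document IDs and encode each gap in variable-byte form: 7 data bits per byte, with the high bit set on the final byte of each number. The function names are hypothetical.

```python
from typing import List


def vbyte_encode(doc_ids: List[int]) -> bytes:
    """Compress a sorted posting list: store gaps, then 7 bits per byte."""
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        if doc_id < prev:
            raise ValueError("posting list must be sorted")
        gap = doc_id - prev  # gaps are small for common terms
        prev = doc_id
        chunk = []
        while True:
            chunk.append(gap & 0x7F)  # low 7 bits first
            gap >>= 7
            if gap == 0:
                break
        chunk[0] |= 0x80  # high bit marks the last byte of this number
        out.extend(reversed(chunk))  # emit most significant bits first
    return bytes(out)


def vbyte_decode(data: bytes) -> List[int]:
    """Invert vbyte_encode: rebuild gaps, then running sums give doc IDs."""
    doc_ids: List[int] = []
    n = 0
    prev = 0
    for byte in data:
        n = (n << 7) | (byte & 0x7F)
        if byte & 0x80:  # final byte of the current gap
            prev += n
            doc_ids.append(prev)
            n = 0
    return doc_ids


if __name__ == "__main__":
    postings = [824, 829, 215406]
    encoded = vbyte_encode(postings)
    print(list(encoded))  # [6, 184, 133, 13, 12, 177]
    print(vbyte_decode(encoded))  # [824, 829, 215406]
```

The three IDs take 6 bytes instead of 12 as 32-bit integers; the savings grow for frequent terms, whose gaps are small enough to fit in a single byte.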

WRAPUP
• Loads of future work
– Even at that time, there were issues of:
• Information extraction from semi-structured sources (such as web pages)
– Still an active area of research
• Search engines as a digital library
– What services, APIs and toolkits should a search engine provide?
– What storage methods are the most efficient?
– From 2005 to 2010 to ???
• Enhancing metadata
– Automatic markup and generation
– What are the appropriate fields?
• Automatic Concept Extraction
– Present the Searcher with a context
• Searching languages: beyond context-free queries
• Other types of search: Facet, GIS, etc.
