Keyword Searching and Browsing in Databases Using BANKS

The document describes the BANKS system for keyword searching and browsing of databases. BANKS allows users to search databases using keywords without needing to learn SQL. It handles challenges like related data being split across tables and joins needing to be computed dynamically. BANKS models the database as a graph and returns trees of connected tuples matching the keywords. It provides fast response times, extensive browsing features, and graphical result displays.

Keyword Searching and Browsing in Databases using BANKS

Gaurav Bhalotia, Arvind Hulgeri, Charuta Nakhe, Soumen Chakrabarti, S. Sudarshan

I.I.T. Bombay
11/25/2018

Motivation

• Keyword search of documents on the Web has been enormously successful
    • Simple and intuitive, no need to learn any query language
• Database querying using keywords is desirable
    • SQL is not appropriate for casual users
    • Form interfaces are cumbersome:
        • Require a separate form for each type of query, which is confusing for casual users of Web information systems
        • Not suitable for ad hoc queries

Motivation

• Many Web documents are dynamically generated from databases
    • E.g. catalog data
• Keyword querying of generated Web documents
    • May miss answers that need to combine information on different pages
    • Suffers from duplication overheads

Examples of Keyword Queries

• On a railway reservation database
    • “mumbai bangalore”
• On an e-store database
    • “camcorder panasonic”
• On a book store database
    • “sudarshan databases”

Differences from IR/Web Search

• Related data split across multiple tuples due to normalization
    • E.g. Paper(paper-id, title, journal), Author(author-id, name), Writes(author-id, paper-id, position), Cites(citing-paper-id, cited-paper-id)
• Different keywords may match tuples from different relations
    • What joins are to be computed can only be decided on the fly

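To make the normalization point concrete, here is a minimal sketch using Python's built-in sqlite3 module. It shows the join a user would have to write by hand to answer a query like “sudarshan databases” over the Paper/Author/Writes schema. The rows and the paper title are hypothetical, and the column names use underscores in place of the hyphens on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE paper  (paper_id TEXT PRIMARY KEY, title TEXT, journal TEXT);
CREATE TABLE author (author_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE writes (author_id TEXT REFERENCES author(author_id),
                     paper_id  TEXT REFERENCES paper(paper_id),
                     position  INTEGER);
-- Hypothetical sample rows, for illustration only.
INSERT INTO paper  VALUES ('p1', 'An Overview of Databases', 'J1');
INSERT INTO author VALUES ('a1', 'S. Sudarshan');
INSERT INTO writes VALUES ('a1', 'p1', 1);
""")

# One keyword matches an author tuple, the other a paper tuple;
# they only meet through the writes relation.
rows = conn.execute("""
SELECT a.name, p.title
FROM author a
JOIN writes w ON w.author_id = a.author_id
JOIN paper  p ON p.paper_id  = w.paper_id
WHERE lower(a.name)  LIKE '%sudarshan%'
  AND lower(p.title) LIKE '%databases%'
""").fetchall()
print(rows)
```

A keyword search system has to discover this join path on its own, since it cannot know in advance which relations the keywords will match.
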
Connectivity

• Tuples may be connected by
    • Foreign key
    • Implicit links (shared words), etc.
    • Tuples belonging to the same relation
• Would like to find sets of (closely) connected tuples that match all given keywords

Basic Model

• Database: modeled as a graph
    • Nodes = tuples
    • Edges = references between tuples
        • Foreign key, other kinds of relationships
        • Edges are directed

[Figure: example graph with paper nodes “BANKS: Keyword search…” and “MultiQuery Optimization”, writes nodes, and author nodes “Charuta”, “S. Sudarshan”, “Prasan Roy”]

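The graph model above can be sketched in a few lines of Python. This is only an illustration: the tuple identifiers (p1, a1, w1, ...) are made up, and the edge list assumes that each writes tuple references one author and one paper.

```python
from collections import defaultdict

# Hypothetical tuple ids mapped to (relation, text).
tuples = {
    "p1": ("paper", "MultiQuery Optimization"),
    "a1": ("author", "S. Sudarshan"),
    "a2": ("author", "Prasan Roy"),
    "w1": ("writes", ""),
    "w2": ("writes", ""),
}

# Foreign-key references as (referencing tuple, referenced tuple).
foreign_keys = [("w1", "a1"), ("w1", "p1"), ("w2", "a2"), ("w2", "p1")]

def build_graph(fks):
    """Return adjacency sets along the references and against them."""
    forward = defaultdict(set)   # u -> nodes that u references
    backward = defaultdict(set)  # v -> nodes that reference v
    for src, dst in fks:
        forward[src].add(dst)
        backward[dst].add(src)
    return forward, backward

forward, backward = build_graph(foreign_keys)
print(sorted(backward["p1"]))  # the writes tuples that point at the paper
```
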
Answer Example

Query: sudarshan roy

[Figure: answer tree with the paper node “MultiQuery Optimization” at the top, two writes nodes below it, and the author nodes “S. Sudarshan” and “Prasan Roy” at the leaves]

Edge Directionality

• Some popular tuples are connected to many other tuples
    • E.g. Students -> departments -> university
• Popular tuples would create misleading shortcuts from every tuple to every other
    • E.g. every student would be closely linked with every other student via the department/university
• Solution: define different forward and backward edge weights
    • Forward edges: in the direction of the foreign key reference

Edge Weight

• Weight of forward edge based on schema
    • E.g. citation link weights > writes link weights
• Weight of backward edge = indegree of edges pointing to the node

[Figure: example graph with edges labeled with weights 1 and 3]

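As a rough illustration of the weighting rule above, the sketch below assigns a raw weight to every directed edge. It assumes a uniform forward weight of 1 (the slide only says forward weights come from the schema) and takes the backward weight to be the indegree of the referenced node. The student and department ids are hypothetical.

```python
from collections import Counter

def edge_weights(foreign_keys, forward_weight=1):
    """Map each directed edge (u, v) to a raw weight.

    Forward edge u -> v follows the foreign-key reference.
    Backward edge v -> u gets the number of edges pointing at v.
    """
    indegree = Counter(dst for _, dst in foreign_keys)
    weights = {}
    for src, dst in foreign_keys:
        weights[(src, dst)] = forward_weight
        weights[(dst, src)] = indegree[dst]
    return weights

# Three hypothetical students referencing one department.
fks = [("s1", "dept"), ("s2", "dept"), ("s3", "dept")]
w = edge_weights(fks)
print(w[("s1", "dept")], w[("dept", "s1")])
```

With these weights, a path from one student to another through the department costs more as the department grows, which is what discourages the misleading shortcuts.
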
Edge Weight Scaling

• Problem: some backward edges have unduly large weights
    • Scale edge weights by using log(1 + raw-edge-weight)
• total-edge-weight = Σ edge-weights
• Edge score E = 1 / total-edge-weight

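A minimal sketch of the edge score computation follows. The slide does not state the base of the logarithm, so the natural log is an assumption here, as is the choice to raise an error for a tree without weighted edges.

```python
import math

def scaled_weight(raw_weight):
    """Dampen a raw edge weight: log(1 + raw-edge-weight)."""
    return math.log(1 + raw_weight)

def edge_score(raw_weights):
    """E = 1 / total-edge-weight over the edges of one answer tree."""
    total = sum(scaled_weight(w) for w in raw_weights)
    if total == 0:
        raise ValueError("no weighted edges; E is undefined in this sketch")
    return 1 / total

# Hypothetical answer tree: two edges of raw weight 1, two of raw weight 3.
score = edge_score([1, 1, 3, 3])
print(score)
```

Smaller trees and trees that avoid heavy backward edges get a higher E.
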
Node Weight

• Nodes have prestige weights too
    • Observation: nodes with intuitively greater prestige tend to have greater indegree
    • Set node weight = indegree
• Problem: nodes with many in-edges result in skewed answers
    • Subdue extreme node weights by using log(1 + indegree)
• Node score N = root-node-weight + Σ leaf-node-weights

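The node score can be sketched the same way. Again the natural log is an assumption, and the indegree values in the example are made up.

```python
import math

def node_weight(indegree):
    """Prestige of a node, dampened: log(1 + indegree)."""
    return math.log(1 + indegree)

def node_score(root_indegree, leaf_indegrees):
    """N = root-node-weight + sum of leaf-node-weights for one answer tree."""
    return node_weight(root_indegree) + sum(node_weight(d) for d in leaf_indegrees)

# Hypothetical tree: root with indegree 7, two leaves with indegree 1 and 3.
n = node_score(7, [1, 3])
print(n)
```
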
Combining Scores

• Problem: how to combine two independent metrics: node weight and edge weight
    • Normalize each to 0-1
    • Combine using weighting factor λ
        • Additive: (1 - λ) E + λ N
        • Multiplicative: E × N^λ
• Performance study to compare alternatives and to find reasonable values for λ

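The two combination rules can be written down directly. This sketch assumes E and N have already been normalized to the range 0 to 1 (the slides do not say how), reads the multiplicative form as E × N^λ, and uses λ = 0.2 as a default because that is the value the deck later reports as working best.

```python
def additive(e, n, lam=0.2):
    """Additive combination: (1 - lam) * E + lam * N."""
    return (1 - lam) * e + lam * n

def multiplicative(e, n, lam=0.2):
    """Multiplicative combination: E * N**lam."""
    return e * n ** lam

# Hypothetical normalized scores for one answer tree.
e, n = 0.5, 1.0
print(additive(e, n), multiplicative(e, n))
```
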
The BANKS Answer Model

• Query: set of keywords {k1, k2, ..., kn}
    • Each keyword ki matches a set of nodes Si
• Answer: rooted, directed tree connecting nodes, with one node from each Si
    • Root node (also referred to as the information node) has special significance and may be restricted to some relations
        • E.g. relations representing entities, not relationships
    • May include intermediate nodes not in any Si, and hence is a Steiner tree
• Multiple answers
    • Ranking based on proximity + prestige

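As a small illustration of the first step, the sketch below computes the node sets S1..Sn for a query. Whole-word, case-insensitive matching is an assumption of this sketch, and the node ids and texts are hypothetical.

```python
import re

def match_keywords(node_text, keywords):
    """Return [S_1, ..., S_n]: for each keyword, the ids of the nodes
    whose text contains it as a whole word (case-insensitive)."""
    tokens = {
        nid: set(re.findall(r"\w+", text.lower()))
        for nid, text in node_text.items()
    }
    return [
        {nid for nid, words in tokens.items() if kw.lower() in words}
        for kw in keywords
    ]

node_text = {
    "a1": "S. Sudarshan",
    "a2": "Prasan Roy",
    "p1": "MultiQuery Optimization",
}
sets = match_keywords(node_text, ["sudarshan", "roy"])
print(sets)
```
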
Finding Answer Trees

• Computation of minimum-weight Steiner trees: NP-complete
• Backward Expanding Search Algorithm:
    • Intuition: find vertices from which a forward path exists to at least one node from each Si
    • Run concurrent single-source shortest-path algorithm from each node matching a keyword
        • Create an iterator for each node matching a keyword
        • Traverse the graph edges in reverse direction
        • Output a node whenever it is on the intersection of the sets of nodes reached from each keyword

Finding Answer Trees

• For each vertex v visited, maintain a nodelist v.Li for each search term ti
• Update the i-th nodelist when the search starting from a vertex u ∈ Si reaches the vertex v
• The new result trees produced correspond to the cross product {u} × Π (j ≠ i) v.Lj

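The backward expanding search described above can be sketched as follows. This is a simplified illustration, not the BANKS implementation: it reports only the root and the keyword node chosen for each term (not the tree edges), it does no ranking or duplicate elimination, and the function name and toy graph are assumptions. A single heap interleaves one Dijkstra-style iterator per keyword node, best-first by path length.

```python
import heapq
from collections import defaultdict
from itertools import product

def backward_expanding_search(edges, keyword_nodes):
    """edges: dict (u, v) -> weight of directed edge u -> v.
    keyword_nodes: list of node sets [S_1, ..., S_n].
    Yields (root, origins), where origins[i] is the node of S_i
    reachable from root by a forward path."""
    incoming = defaultdict(list)  # v -> [(u, weight)] for edges u -> v
    for (u, v), w in edges.items():
        incoming[v].append((u, w))

    n = len(keyword_nodes)
    heap, dist, visited = [], {}, set()
    for i, nodes in enumerate(keyword_nodes):
        for origin in sorted(nodes):
            dist[(i, origin, origin)] = 0
            heapq.heappush(heap, (0, i, origin, origin))

    node_lists = defaultdict(lambda: [[] for _ in range(n)])  # v.L_i
    while heap:
        d, i, origin, v = heapq.heappop(heap)
        if (i, origin, v) in visited:
            continue
        visited.add((i, origin, v))
        # New answers rooted at v: {origin} x product of v.L_j for j != i.
        others = [node_lists[v][j] for j in range(n) if j != i]
        for combo in product(*others):
            origins = list(combo)
            origins.insert(i, origin)
            yield v, tuple(origins)
        node_lists[v][i].append(origin)
        # Expand against the edge direction.
        for u, w in incoming[v]:
            key = (i, origin, u)
            if key not in visited and d + w < dist.get(key, float("inf")):
                dist[key] = d + w
                heapq.heappush(heap, (d + w, i, origin, u))

# Toy graph: writes tuples w1, w2 reference authors a1, a2 and paper p1.
edges = {
    ("w1", "a1"): 1, ("w1", "p1"): 1, ("w2", "a2"): 1, ("w2", "p1"): 1,
    ("a1", "w1"): 1, ("p1", "w1"): 2, ("a2", "w2"): 1, ("p1", "w2"): 2,
}
first = next(backward_expanding_search(edges, [{"a1"}, {"a2"}]))
print(first)
```

On this toy graph the first root produced is the paper node, which matches the answer tree shown for the query “sudarshan roy”. Later answers rooted at other nodes are also generated, which is why the results still need to be ranked.
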
Backward Expanding Search

Query: sudarshan roy

[Figure: the paper node “MultiQuery Optimization”, writes nodes, and the author nodes “S. Sudarshan” and “Prasan Roy”]

Result Ordering

• Answer trees may not be generated in relevance order
• Solution:
    • Best-first search across all iterators, based on path length
    • Output answers to a buffer
        • Eliminate duplicates: isomorphic trees
    • Output highest ranked answer from buffer to user when buffer is full

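The buffering scheme above can be sketched with a heap. The sketch assumes each answer arrives as a (score, tree) pair and that isomorphic trees have already been reduced to the same hashable canonical form (plain strings in the example); both are assumptions, as is the function name.

```python
import heapq

def reorder(answers, buffer_size):
    """Yield (score, tree) pairs in approximately descending score order.

    answers: iterable of (score, tree) in generation order.
    """
    heap = []     # max-heap via negated scores
    seen = set()  # canonical forms already buffered or emitted
    for score, tree in answers:
        if tree in seen:  # duplicate elimination
            continue
        seen.add(tree)
        heapq.heappush(heap, (-score, tree))
        if len(heap) >= buffer_size:  # buffer full: emit the best one
            neg, best = heapq.heappop(heap)
            yield -neg, best
    while heap:  # flush what is left, best first
        neg, best = heapq.heappop(heap)
        yield -neg, best

generated = [(0.2, "t1"), (0.9, "t2"), (0.9, "t2"), (0.5, "t3"), (0.7, "t4")]
ordered = list(reorder(generated, buffer_size=2))
print(ordered)
```

A small buffer only approximates relevance order, as the example shows. A larger buffer improves the ordering at the cost of a longer wait for the first answer.
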
The BANKS System

• BANKS provides keyword search coupled with extensive browsing facilities
    • Schema browsing + data browsing
    • Graphical display of data
• Implemented using Java + servlets
• Keyword search response times typically 1 to 3 seconds on
    • DBLP database with 100,000 tuples/300,000 edges
    • P3 600 MHz, 512 MB RAM
• Try it out at www.cse.iitb.ac.in/banks/

The BANKS Architecture

[Figure: User <-> (HTTP) <-> BANKS Web server + servlets <-> (JDBC) <-> Database]

• Connects to any database using JDBC
    • JDBC metadata features used to provide schema browsing
• No programming needed for customization
    • Minimal preprocessing of database to create indices and give weights to links
• Extensive set of browsing features

Browsing Features

• Hyperlinks are automatically added to all displayed results
• Template facilities to do a variety of tasks
    • Browsing data by grouping and creating crosstabs
        • E.g. theses grouped by department and year
    • Hierarchical views of data
        • Nested XML style, even on relational data
    • Graphical displays
        • Bar charts, pie charts, etc.

Example of Browsing in BANKS

BANKS Query Result Example

• Result of “Soumen Sunita”

Anecdotes

• “Mohan”
    • Returns C. Mohan at top based on prestige (number of papers written)
• “Transaction”
    • Returns Jim Gray’s classic paper and textbook as top answers based on prestige (number of citations)
• “Sunita Seltzer”
    • No common papers, but both have papers with Stonebraker: system finds this connection

Effect of Parameters

• Log scaling of edge weights worked well
• (1 - λ) E + λ N versus E × N^λ: made little difference
• Best with λ = 0.2 (subdue node weights but not entirely)

Related Work

• DataSpot (DTL)/Mercado Intuifind [VLDB 98]
    • Based on patent by Palmon (filed 1995, granted 1998)
    • Similar answer model to ours
    • Differences: our model of backward link weights and prestige
• Proximity Search [VLDB 98]
    • Different model of proximity
    • No edge weights or prestige; different evaluation algorithm
• Information units (linked Web pages) [WWW10]
    • No directionality, only studied in Web context
• Microsoft DBXplorer
    • No ranking, based on SQL generation
    • Addresses efficient construction of text indexes

Some Extensions to BANKS

• Searching for similar results: Template Search
    • Define the notion of similarity between two result trees
    • Perform the restricted search
• Efficiently handling metadata queries
    • Starting the search from each of the tuples in a table is too costly

Template Search

• Feedback in terms of result tree
• Type of a result tree defined in terms of
    • Type of nodes
        • The table to which the node belongs
    • Type of edges
        • The type of nodes which it connects
        • The link information, e.g. ‘cites’ and ‘cited’ links between two papers
• Which nodes to start the search from
    • Only the chosen nodes
    • All the nodes corresponding to a particular keyword

Template Search

• Start the backward search only from the allowed set of nodes
• Follow the edges as defined by the result type
• Example: consider the query “sudarshan database”
    • Two types of results for the above query
        • Papers written by professor sudarshan
        • Papers cited by papers written by professor sudarshan
• The two result types are distinguished by whether to follow the cites/cited link from a paper node

Metadata Keyword Queries

• Metadata keywords: match all the tuples of a relation
• Too costly to start the search from each of the tuples of a table
• First-cut approach: start the forward search from the information node for the non-metadata keywords
    • Selectively choose the nodes from where to start the forward search

Example of Metadata Query

• Consider the query “sudarshan paper”

[Figure: from the “sudarshan” node, through writes table nodes, forward search to the paper table]

Conclusions and Future Work

The next big wave: keyword searching and browsing of databases?

Future work:
• Keyword queries on XML
• Disambiguating queries by selecting
    • Nodes: G.W.Bush: “Bush Jr” or “Bush Sr”
    • Tree structure: “coauthors” or “cites”
• Boolean queries, stemming, thesaurus
• Metadata: column/relation names

Thank You
