Deep Web
INTRODUCTION
The deep web is the content that resides in
searchable databases, the results from which
up of hundreds of thousands of publicly accessible databases
and is approximately
schemas.
CONTENTS
An overview of deep web search
System structure
Schema Mining
Schema Matching
Query Planning
Query Reuse
Incremental Plan Generation
Plan Execution
System design
System details
Exploiting Parallelism
Consider a query that asks for the amino acids occurring at the corresponding position in the orthologous gene of non-human mammals with respect to a particular gene, such as ERCC6 [1].
ref[1]:F. Wang and G. Agrawal, SEEDEEP: A System for Exploring and Querying Scientific Deep web Data Sources
System structure
The system structure comprises two components: the exploring component and the querying component.
There are 6 individual modules, including Schema Mining Module (SMi), Schema Matching Module (SMa), Query Planning
System structure[2]
ref[2]:F. Wang and G. Agrawal, SEEDEEP: A System for Exploring and Querying Scientific Deep web Data Sources
Schema Mining
SMi is used to automatically mine the schemas
of deep web data sources and build data source models to describe the data content hidden behind the data sources. The schema mining module aims to find the complete output schemas of deep web data sources.
Schema Matching
To construct a data source dependence model, we need to consider the input-output schema matching problem for different data sources. Schema matching discovers the semantics ... interfaces.
Query Planning
The query planning module takes a query and system
Query Reuse
The query reuse module takes a query-plan-driven strategy: it generates query plans based on previously cached query plans, which are stored in the plan base. Effective data reuse can greatly reduce the execution time of a query plan.
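The idea of a plan base can be illustrated with a minimal sketch. It assumes the simplest possible reuse rule, an exact match on the set of query terms, which is a stand-in and not the query-plan-driven strategy of the system itself. The names `plan_query`, `plan_base`, and `planner` are hypothetical.

```python
def plan_query(terms, plan_base, planner):
    """Return (plan, reused) for a query given as an iterable of terms.

    plan_base: dict mapping frozenset of terms -> cached plan (assumed layout).
    planner:   hypothetical callable that builds a plan from scratch.
    """
    key = frozenset(terms)           # term order does not matter for lookup
    cached = plan_base.get(key)
    if cached is not None:
        return cached, True          # reuse the cached plan, skip planning
    plan = planner(terms)            # fall back to full plan generation
    plan_base[key] = plan            # remember the plan for later queries
    return plan, False


def demo_planner(terms):
    # Stand-in planner: a "plan" here is just the sorted list of terms.
    return sorted(terms)
```

Calling `plan_query` twice with the same terms, in any order, invokes the planner only once; the second call is answered from the plan base.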
Plan Execution
The query plan is executed by the plan execution module.
System design
A query plan can be modeled by a directed graph
G = (V,E), where the node set V contains the data sources involved in the plan, and the edge set E captures the interdependence between these data sources. To save the time spent waiting for results from data sources, we can query multiple data sources in parallel.
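The graph model can be sketched in a few lines, assuming the plan is acyclic and is given as a mapping from each data source to the set of sources it depends on. The source names and the `demo_query` function are hypothetical placeholders for real deep web form submissions; this is an illustration of the idea, not the system's implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical plan: node -> set of nodes whose output it needs.
PLAN = {
    "gene_lookup": set(),
    "ortholog_db": {"gene_lookup"},
    "snp_db": {"gene_lookup"},
    "alignment": {"ortholog_db", "snp_db"},
}


def demo_query(name, inputs):
    # Stand-in for querying one data source with its predecessors' results.
    return f"{name}<-{sorted(inputs)}"


def execute_plan(plan, query_source, max_workers=4):
    """Run a plan wave by wave; sources in the same wave run in parallel."""
    sorter = TopologicalSorter(plan)
    sorter.prepare()                      # raises CycleError on cyclic plans
    results, waves = {}, []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())   # no unmet dependencies
            waves.append(ready)
            futures = {
                name: pool.submit(
                    query_source,
                    name,
                    {dep: results[dep] for dep in plan.get(name, ())},
                )
                for name in ready
            }
            for name, future in futures.items():
                results[name] = future.result()  # wait for this source
                sorter.done(name)                # unlock its successors
    return results, waves
```

With the plan above, `ortholog_db` and `snp_db` have no interdependence, so they are submitted together once `gene_lookup` has returned, and `alignment` runs last.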
System details
In our design, a querier pool and a data pool are
built to implement the parallel query plan execution.
Querier pool comprises a querier corresponding to each data source. A querier is implemented as an
Exploiting Parallelism
The system has three kinds of parallelism:
Task parallelism: Data sources having no interdependence can be queried in parallel.
Thread parallelism: Multiple input instances can be
CONCLUSION
This paper has described and evaluated a
system that exploits parallelization for accelerating search over multiple deep web data sources. An interactive, two-stage multithreading system design enables us to achieve
References
www.cse.ohio-state.edu
www.portal.acm.com
F. Wang and G. Agrawal, SEEDEEP: A System for Exploring and Querying Scientific Deep Web Data Sources, Statistical
QUESTIONS..?
THANK YOU...