A Motivating Problem: Wrapper Induction: Thai Restaurants in L.A. A-Rated by The L.A. County Health Depart

This document discusses active learning techniques for wrapper induction, which is the task of automatically generating extraction rules to extract structured data from web pages. It introduces three multi-view active learning algorithms - Co-Testing, Co-EMT, and Adaptive View Validation - that can learn accurate wrappers from only a few labeled examples by exploiting redundancy across different representations or views of the data. Co-Testing and Co-EMT actively select the most informative examples to label by considering disagreement between views, while Adaptive View Validation predicts whether a new task is suitable for multi-view learning based on prior tasks.
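
The disagreement-based selection mentioned in the summary can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the two "views" are simply two numeric features, each view's learner is a hypothetical one-dimensional threshold rule, and the function names are invented. Real wrapper induction learns extraction rules over Web pages, not thresholds. The sketch only shows the idea of querying unlabeled examples on which the views' hypotheses disagree.

```python
def learn_threshold(values, labels):
    """Toy single-view learner: midpoint between the two class means.

    Assumes both classes are present and that positives have larger values.
    """
    pos = [v for v, y in zip(values, labels) if y]
    neg = [v for v, y in zip(values, labels) if not y]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def contention_points(labeled, unlabeled):
    """Return the unlabeled examples on which the two views disagree."""
    labels = [y for _, _, y in labeled]
    t1 = learn_threshold([a for a, _, _ in labeled], labels)  # view 1 hypothesis
    t2 = learn_threshold([b for _, b, _ in labeled], labels)  # view 2 hypothesis
    return [(a, b) for a, b in unlabeled if (a > t1) != (b > t2)]


# Each labeled example is (view-1 value, view-2 value, label); data is made up.
labeled = [(1.0, 2.0, False), (9.0, 8.0, True)]
unlabeled = [(2.0, 1.0), (8.0, 9.0), (7.0, 3.0), (4.0, 6.0)]

queries = contention_points(labeled, unlabeled)
print(queries)  # candidates to hand to the user for labeling
```

With this made-up data both thresholds come out at 5.0, so only the last two unlabeled examples are returned as candidates for labeling.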

Uploaded by

Srinivas

Active Learning with Multiple Views

detecting the most informative examples, while also exploiting the remaining unlabeled examples. Second, we discuss Adaptive View Validation (Muslea et al., 2002b), which is a meta-learner that uses the experience acquired while solving past learning tasks to predict whether multi-view learning is appropriate for a new, unseen task.

A Motivating Problem: Wrapper Induction

Information agents such as Ariadne (Knoblock et al., 2001) integrate data from pre-specified sets of Web sites so that they can be accessed and combined via database-like queries. For example, consider the agent in Figure 1, which answers queries such as the following:

Show me the locations of all Thai restaurants in L.A. that are A-rated by the L.A. County Health Department.

To answer this query, the agent must combine data from several Web sources:

• From Zagat’s, it obtains the name and address of all Thai restaurants in L.A.
• From the L.A. County Web site, it gets the health rating of any restaurant of interest.
• From the Geocoder, it obtains the latitude/longitude of any physical address.
• From Tiger Map, it obtains the plot of any location, given its latitude and longitude.

Figure 1. An information agent that combines data from the Zagat’s restaurant guide, the L.A. County Health Department, the ETAK Geocoder, and the Tiger Map service

Information agents typically rely on wrappers to extract the useful information from the relevant Web pages. Each wrapper consists of a set of extraction rules and the code required to apply them. As manually writing the extraction rules is a time-consuming task that requires a high level of expertise, researchers designed wrapper induction algorithms that learn the rules from user-provided examples (Muslea et al., 2001).

In practice, information agents use hundreds of extraction rules that have to be updated whenever the format of the Web sites changes. As manually labeling examples for each rule is a tedious, error-prone task, one must learn high accuracy rules from just a few labeled examples. Note that both the small training sets and the high accuracy rules are crucial to the successful deployment of an agent. The former minimizes the amount of work required to create the agent, thus making the task manageable. The latter is required in order to ensure the quality of the agent’s answer to each query: when the data from multiple sources is integrated, the errors of the corresponding extraction rules get compounded, thus affecting the quality of the final result; for instance, if only 90% of the Thai restaurants and 90% of their health ratings are extracted correctly, the result contains only 81% (90% x 90% = 81%) of the A-rated Thai restaurants.
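
The compounding effect described above can be checked with a short calculation. The sketch below assumes that the extraction rules fail independently of one another, which is the assumption implicit in the 90% x 90% example; the function name `end_to_end_accuracy` and the four-rule case are hypothetical and used only for illustration.

```python
from math import prod


def end_to_end_accuracy(rule_accuracies):
    """Fraction of results that survive every extraction step.

    Assumes (for illustration) that the rules' errors are independent,
    so the per-rule accuracies simply multiply.
    """
    return prod(rule_accuracies)


# Two rules at 90% each, as in the example in the text.
two_rules = end_to_end_accuracy([0.90, 0.90])
print(f"{two_rules:.0%}")   # 81%

# A hypothetical chain of four rules at the same accuracy erodes it further.
four_rules = end_to_end_accuracy([0.90] * 4)
print(f"{four_rules:.1%}")  # 65.6%
```

Under the same independence assumption, the calculation also shows why accuracy per rule matters more as the number of integrated sources grows.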

We use wrapper induction as the motivating problem for this article because, despite the practical importance of learning accurate wrappers from just a few labeled examples, there has been little work on active learning for this task. Furthermore, as explained in Muslea (2002), existing general-purpose active learners cannot be applied in a straightforward manner to wrapper induction.

MAIN THRUST

In the context of wrapper induction, we intuitively describe three novel algorithms: Co-Testing, Co-EMT,
