
This document discusses active learning algorithms for multi-view learning, including Co-Testing and Co-EMT. Co-Testing uses initially labeled data to train classifiers in different views, then queries the user to label examples where the views disagree. Co-EMT improves on Co-Testing by interleaving it with Co-EM, a semi-supervised multi-view learner, allowing the algorithms to benefit from each other's labeled and unlabeled data. The document provides an example of applying these techniques to extract phone numbers from documents using forward-looking and backward-looking rules as different views.


Active Learning with Multiple Views

and Adaptive View Validation. Note that these algorithms are not specific to wrapper induction, and they have been applied to a variety of domains, such as text classification, advertisement removal, and discourse tree parsing (Muslea, 2002).

Co-Testing: Multi-View Active Learning

Co-Testing (Muslea, 2002; Muslea et al., 2000), which is the first multi-view approach to active learning, works as follows:

• First, it uses a small set of labeled examples to learn one classifier in each view.
• Then, it applies the learned classifiers to all unlabeled examples and asks the user to label one of the examples on which the views predict different labels.
• Finally, it adds the newly labeled example to the training set and repeats the whole process.

Intuitively, Co-Testing relies on the following observation: if the classifiers learned in each view predict different labels for an unlabeled example, at least one of them makes a mistake on that prediction. By asking the user to label such an example, Co-Testing is guaranteed to provide useful information for the view that made the mistake.

To illustrate Co-Testing for wrapper induction, consider the task of extracting restaurant phone numbers from documents similar to the one shown in Figure 2. To extract this information, the wrapper must detect both the beginning and the end of the phone number. For instance, to find where the phone number begins, one can use the following rule:

R1 = SkipTo( Phone:<i> )

Figure 2. The forward rule R1 and the backward rule R2 detect the beginning of the phone number. Forward and backward rules have the same semantics and differ only in terms of where they are applied from (the start or the end of the document) and in which direction they scan.
R1: SkipTo( Phone : <i> )    R2: BackTo( Cuisine ) BackTo( (Number) )
Name: <i>Gino’s </i> <p>Phone :<i> (800)111-1717 </i> <p> Cuisine : …

This rule is applied forward, from the beginning of the page, and it ignores everything until it finds the string Phone:<i>. Note that this is not the only way to detect where the phone number begins. An alternative way to perform this task is to use the following rule:

R2 = BackTo( Cuisine ) BackTo( ( Number ) )

which is applied backward, from the end of the document. R2 ignores everything until it finds “Cuisine” and then, again, skips to the first number between parentheses.

Note that R1 and R2 represent descriptions of the same concept (i.e., the beginning of the phone number) that are learned in two different views (see Muslea et al. [2001] for details on learning forward and backward rules). That is, views V1 and V2 consist of the sequences of characters that precede and follow the beginning of the item, respectively. View V1 is called the forward view, while V2 is the backward view. Based on V1 and V2, Co-Testing can be applied in a straightforward manner to wrapper induction. As shown in Muslea (2002), Co-Testing clearly outperforms existing state-of-the-art algorithms, both on wrapper induction and on a variety of other real-world domains.

Co-EMT: Interleaving Active and Semi-Supervised Learning

To further reduce the need for labeled data, Co-EMT (Muslea et al., 2002a) combines active and semi-supervised learning by interleaving Co-Testing with Co-EM (Nigam & Ghani, 2000). Co-EM, which is a semi-supervised, multi-view learner, can be seen as the following iterative, two-step process: first, it uses the hypotheses learned in each view to probabilistically label all the unlabeled examples; then, it learns a new hypothesis in each view by training on the probabilistically labeled examples provided by the other view.

By interleaving active and semi-supervised learning, Co-EMT creates a powerful synergy. On one hand, Co-Testing boosts Co-EM’s performance by providing it with highly informative labeled examples (instead of random ones). On the other hand, Co-EM provides Co-Testing with more accurate classifiers (learned from both labeled and unlabeled data), thus allowing Co-Testing to make more informative queries.

Co-EMT has not yet been applied to wrapper induction, because the existing algorithms are not probabilistic
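The forward and backward extraction rules above can be sketched in a few lines of Python. The `skip_to` and `back_to` helpers below are hypothetical simplifications of the landmark semantics described in the text (real wrapper-induction systems learn such rules rather than hand-code them), and the sample document is adapted from Figure 2:

```python
import re

# Sample document, similar to the one in Figure 2 (cuisine value is assumed).
doc = "Name: <i>Gino's </i> <p>Phone :<i> (800)111-1717 </i> <p> Cuisine : Italian"

def skip_to(doc, landmark, start=0):
    """Forward rule: scan from `start` and return the index just after `landmark`."""
    i = doc.find(landmark, start)
    return None if i == -1 else i + len(landmark)

def back_to(doc, pattern, end=None):
    """Backward rule: scan backward from `end` and return where the nearest match starts."""
    end = len(doc) if end is None else end
    matches = list(re.finditer(pattern, doc[:end]))
    return None if not matches else matches[-1].start()

# R1 = SkipTo( Phone :<i> ): the forward view V1 locates the start of the phone number.
r1 = skip_to(doc, "Phone :<i>")

# R2 = BackTo( Cuisine ) BackTo( (Number) ): the backward view V2 goes back to
# "Cuisine", then back again to the nearest "(digits)" before it.
cuisine = back_to(doc, r"Cuisine")
r2 = back_to(doc, r"\(\d+\)", end=cuisine)

# Both views describe the same concept: the beginning of the phone number.
print(doc[r2:r2 + 13])
```

Both rules land on the same item, which is exactly what makes V1 and V2 usable as redundant views for Co-Testing.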

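The Co-Testing loop described above can also be sketched directly. The toy dataset, the single-feature views, the 1-NN "classifiers", and the oracle below are all invented for illustration; the original work learns extraction rules, not nearest-neighbor classifiers:

```python
# A minimal Co-Testing sketch: two views, one toy 1-NN classifier per view.

def knn_predict(labeled, x):
    """1-NN classifier in a single view: label of the closest labeled value."""
    return min(labeled, key=lambda pair: abs(pair[0] - x))[1]

# Each example is (view1_value, view2_value); true concept: positive iff view1 >= 5.
examples = [(1.0, 0.9), (2.0, 2.2), (6.0, 1.1), (7.0, 7.2), (8.0, 8.1), (3.0, 6.9)]
oracle = {x: int(x[0] >= 5) for x in examples}        # the user we can query

labeled = {examples[0]: 0, examples[3]: 1}            # small initial labeled set
unlabeled = [x for x in examples if x not in labeled]

while unlabeled:
    # Step 1: learn one classifier per view from the labeled examples.
    view1 = [(x[0], y) for x, y in labeled.items()]
    view2 = [(x[1], y) for x, y in labeled.items()]
    # Step 2: find contention points -- unlabeled examples where the views disagree.
    contention = [x for x in unlabeled
                  if knn_predict(view1, x[0]) != knn_predict(view2, x[1])]
    if not contention:
        break                                         # the views agree everywhere
    # Step 3: ask the user to label one contention point, then retrain.
    query = contention[0]
    labeled[query] = oracle[query]
    unlabeled.remove(query)

print(len(labeled), "labels used out of", len(examples))
```

Every query goes to an example on which the views disagree, so at least one view is guaranteed to learn from it; examples the views already agree on are never sent to the user.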