Active Learning with Multiple Views
Ion Muslea, SRI International, USA
INTRODUCTION

Inductive learning algorithms typically use a set of labeled examples to learn class descriptions for a set of user-specified concepts of interest. In practice, labeling the training examples is a tedious, time-consuming, error-prone process. Furthermore, in some applications, the labeling of each example may also be extremely expensive (e.g., it may require running costly laboratory tests). In order to reduce the number of labeled examples required for learning the concepts of interest, researchers have proposed a variety of methods, such as active learning, semi-supervised learning, and meta-learning.

This article presents recent advances in reducing the need for labeled data in multi-view learning tasks; that is, in domains in which there are several disjoint subsets of features (views), each of which is sufficient to learn the target concepts. For instance, as described in Blum and Mitchell (1998), one can classify segments of televised broadcast based either on the video or on the audio information; or one can classify Web pages based on the words that appear either in the pages or in the hyperlinks pointing to them. In summary, this article focuses on using multiple views for active learning and on improving multi-view active learners by using semi-supervised and meta-learning.

BACKGROUND

Active, Semi-Supervised, and Multi-view Learning

Most of the research on multi-view learning focuses on semi-supervised learning techniques (Collins & Singer, 1999; Pierce & Cardie, 2001), that is, learning concepts from a few labeled and many unlabeled examples. By themselves, the unlabeled examples do not provide any direct information about the concepts to be learned. However, as shown by Nigam et al. (2000) and Raskutti et al. (2002), their distribution can be used to boost the accuracy of a classifier learned from the few labeled examples.

Intuitively, semi-supervised, multi-view algorithms proceed as follows: first, they use the small labeled training set to learn one classifier in each view; then, they bootstrap the views from each other by augmenting the training set with unlabeled examples on which the other views make high-confidence predictions. Such algorithms improve the classifiers learned from labeled data by also exploiting the implicit information provided by the distribution of the unlabeled examples.
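To make the bootstrapping step concrete, here is a minimal sketch of such a two-view loop in the spirit of co-training (Blum & Mitchell, 1998). The scikit-learn classifiers, the confidence threshold, and the policy of trusting the more confident view are illustrative assumptions rather than details from the article; Blum and Mitchell's original algorithm instead grows the training set by a fixed number of positive and negative examples per round.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB  # stand-in; any probabilistic learner works


def bootstrap_views(X1, X2, y, X1_u, X2_u, rounds=10, threshold=0.95):
    """Two-view semi-supervised bootstrap (illustrative sketch, not the
    article's pseudocode).

    X1, X2     : labeled examples, described in view 1 and view 2
    y          : their labels
    X1_u, X2_u : the same unlabeled examples in each view
    """
    for _ in range(rounds):
        # Step 1: learn one classifier per view from the current labeled set.
        h1 = GaussianNB().fit(X1, y)
        h2 = GaussianNB().fit(X2, y)
        if len(X1_u) == 0:
            break
        # Step 2: find unlabeled examples that either view labels with high
        # confidence; the prediction becomes a pseudo-label that augments
        # the training set seen by both views in the next round.
        p1, p2 = h1.predict_proba(X1_u), h2.predict_proba(X2_u)
        c1, c2 = p1.max(axis=1), p2.max(axis=1)
        keep = (c1 >= threshold) | (c2 >= threshold)
        if not keep.any():
            break  # no confident predictions left to bootstrap from
        # Take each pseudo-label from whichever view is more confident.
        pseudo = np.where(c1 >= c2,
                          h1.classes_[p1.argmax(axis=1)],
                          h2.classes_[p2.argmax(axis=1)])
        X1 = np.vstack([X1, X1_u[keep]])
        X2 = np.vstack([X2, X2_u[keep]])
        y = np.concatenate([y, pseudo[keep]])
        X1_u, X2_u = X1_u[~keep], X2_u[~keep]
    return h1, h2
```

The key design point is that each view's confident predictions supply training examples that the other view could not have identified from its own features alone.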
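The query-selection idea behind co-testing can be sketched as follows: train one classifier per view and query a contention point, i.e., an unlabeled example on which the views disagree. Because the views disagree there, at least one of them must be wrong, so the user's label is guaranteed to correct a mistake of at least one view. In the sketch below, the choice of classifier and the random selection among contention points (the "naive" variant) are illustrative assumptions.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB


def cotesting_query(X1, X2, y, X1_u, X2_u, rng=None):
    """One co-testing round (illustrative sketch): return the index of
    the unlabeled example whose label should be requested, or None."""
    if rng is None:
        rng = np.random.default_rng(0)
    if len(X1_u) == 0:
        return None
    h1 = GaussianNB().fit(X1, y)
    h2 = GaussianNB().fit(X2, y)
    # Contention points: unlabeled examples on which the views disagree.
    # At least one view errs on each of them, which makes their labels
    # highly informative for the learner.
    contention = np.flatnonzero(h1.predict(X1_u) != h2.predict(X2_u))
    if len(contention) == 0:
        return None  # the views agree everywhere
    return int(rng.choice(contention))  # "naive" variant: pick one at random
```

In a full active-learning loop, the returned example is labeled by the user, moved from the unlabeled pool into the training set, and the procedure repeats until the labeling budget is exhausted.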
Then, we discuss two extensions of co-testing that cope with its main limitations: the inability to exploit the unlabeled examples that were not queried, and the lack of a criterion for deciding whether a task is appropriate for multi-view learning. To address the former, we present Co-EMT (Muslea et al., 2002a), which interleaves co-testing with a semi-supervised, multi-view learner. This hybrid algorithm combines the benefits of active and semi-supervised learning by
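Reusing the two sketches above, a Co-EMT-style loop would interleave the semi-supervised bootstrap with contention-point queries roughly as follows. This is an illustrative reconstruction under the stated assumptions, not Muslea et al.'s exact procedure (which combines co-testing with the Co-EM algorithm); the hypothetical `oracle` callback stands for the user who supplies a label on request.

```python
import numpy as np


def co_emt_loop(X1, X2, y, X1_u, X2_u, oracle, budget=20, rng=None):
    """Interleave semi-supervised multi-view learning with co-testing
    (illustrative sketch): each round refines the views using the
    unlabeled pool, then queries the label of one contention point."""
    if rng is None:
        rng = np.random.default_rng(0)
    # Semi-supervised step: classifiers refined with unlabeled data
    # (bootstrap_views is the sketch defined earlier).
    h1, h2 = bootstrap_views(X1, X2, y, X1_u, X2_u)
    for _ in range(budget):
        if len(X1_u) == 0:
            break
        # Active step: query a contention point of the refined views.
        contention = np.flatnonzero(h1.predict(X1_u) != h2.predict(X2_u))
        if len(contention) == 0:
            break
        i = int(rng.choice(contention))
        # The user labels the contention point; it joins the labeled set.
        label = oracle(X1_u[i], X2_u[i])
        X1 = np.vstack([X1, X1_u[i:i + 1]])
        X2 = np.vstack([X2, X2_u[i:i + 1]])
        y = np.concatenate([y, [label]])
        X1_u = np.delete(X1_u, i, axis=0)
        X2_u = np.delete(X2_u, i, axis=0)
        # Re-bootstrap from the enlarged labeled set and remaining pool.
        h1, h2 = bootstrap_views(X1, X2, y, X1_u, X2_u)
    return h1, h2
```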