
A/C INTERFACE

Machine Learning
and Artificial Intelligence
An Introduction

E. D. Salin
Department of Chemistry
McGill University
801 Sherbrooke St. W.
Montreal, Canada H3A 2K6

Patrick H. Winston
Massachusetts Institute of Technology
Artificial Intelligence Laboratory
545 Technology Square
Cambridge, MA 01239

0003-2700/92/0364-49A/$02.50/0 © 1991 American Chemical Society
ANALYTICAL CHEMISTRY, VOL. 64, NO. 1, JANUARY 1, 1992

What is artificial intelligence? An operational definition from an engineering perspective might be, "A collection of techniques that allows computers to perform tasks that would otherwise require intelligent people." This definition, unfortunately, leads to a curious dilemma. Once a problem has been solved, there is a strong tendency for people to think, "Oh, that wasn't so hard," which often leads to another thought, "That didn't take any intelligence." Thus any computer program that exploits explainable techniques generally does not seem very intelligent.

Accordingly, it is easier to get a feel for artificial intelligence (AI) by studying established techniques and describing their applications rather than by dwelling on definitions and philosophical nuances. In this article we will focus on techniques of machine learning, a phenomenon that we believe will have a substantial effect on the way science is practiced.

There are many other categories of techniques, some of which have had considerable practical applications. Natural language "front ends" (i.e., software interfaces between a program and its user) have been available for 10 years, providing easy communication between nonprogrammers and powerful database management systems. Vision systems locate objects and perform inspections (1). Expert systems (2-4) routinely handle tasks at and above novice levels. For example, American Express has an expert system to rapidly consider all large credit card purchases (5). Toyota has an automobile repair system that translates to a factor of 10 productivity gain for its mechanics (6). The military has a myriad of expert systems that range from those designed for battlefield intelligence fusion (7) to those that advise on tactics (8) and procedures (9). In some organizations, expert systems are small but ubiquitous. Du Pont has more than 200 expert systems (10) performing functions that range from slurry flow diagnostics to complex multistate sales tax calculations. In other organizations, expert systems are bigger, and their individual contributions are larger in proportion. For example, the U.S. Air Force claims to have saved more than $1 billion with an expert system for purchasing (11).

Chemistry literature abounds with descriptions of expert systems that reflect the power of knowledge-based systems. Applications include areas such as chemical synthesis (12), process control (13), atomic line selection (14) or line analysis (15), spectral interpretation (16-18), and chemometrics (19). The trend toward increased laboratory automation is reflected in attempts to manage instrument control and operation (20) and in robotics (21). Expert systems are used everywhere from the lab bench (22) to the boardroom (23, 24).

The potential effects of the use of expert systems are staggering, and it is natural to speculate on how such systems might be applied in the analytical laboratory, especially given the increasing complexity of today's instruments and the burgeoning quantity of data emerging from them. Because it will soon be beyond the ability of a single expert, or even many experts, to evaluate the large quantity of instrument-generated data, a Laboratory Manager's Assistant (LMA) expert system would be of enormous help by storing sample histories, noting conflicting results, and monitoring concentration levels. This LMA could consult spreadsheets, interrogate databases, and generate reports, very much as advanced Laboratory Information Management Systems do now, but with further capabilities that reflect an additional measure of "knowledge."

The AI-enabled component of a total system may be only a small part of that system. To carry out its tasks, the entire system might use much larger components that manage a database, handle network communications, and work out statistics. Esther Dyson, noted AI industry analyst, has called this the "raisin bread" phenomenon. Like raisins in raisin bread, AI components may not occupy much space, but they are nevertheless essential to the system. The LMA is a good example.

Development of expert systems

The development of expert systems for managing and controlling instruments is a natural next stage in the evolution of the practice of chemical analysis (see box below). Each stage encompasses all previous stages, although the utility of the "old" information may decrease. About 20 years ago large-scale computer development began, and during the first 10 years a great deal of effort went into the study of hardware interfaces. During the last decade that effort has shifted to software. Readers may choose to alter the time scale slightly to suit their own experiences, but the trends are quite clear. The science of chemical analysis methodology is evolving at an accelerating pace, making it difficult for scientists and nonscientists to keep up with new developments.

Evolution of chemical analysis methodology

Duration (years)   Method used
3000               Descriptive: color, taste, smell
400                Quantitative: mass, volume
60                 Analog electronics: current, voltage (meters)
30                 Digital electronics: data logger
20                 Computers: data acquisition and control
10                 Computers: control with feedback (decision loops)

Although expert systems appear to have enormous potential, one problem is that they often take a long time to develop. The traditional approach to expert system development involves an iterative process in which a "knowledge engineer" interviews one or more "domain experts." Unfortunately, the typical rate of knowledge extraction by this approach is a few rule-like chunks of knowledge per day, whereas large expert systems may involve thousands of these rules. Thus knowledge engineering is usually the rate-limiting step in the development of large applications. Feigenbaum has called this the "bottleneck" problem (25).

Fortunately, machine-learning techniques offer hope that this knowledge engineering bottleneck can be eliminated. The value of machine learning is illustrated in a nuclear fuel plant report in which control knowledge was automatically extracted from data without traditional knowledge engineering. The control knowledge paid for its development during the first half-day of deployment (26).

We believe that we are on the edge of a big rush to a new generation of AI-enabled applications in which the emphasis will be on complementing human intelligence, not replacing it. Although humans will excel at common-sense reasoning for the foreseeable future, AI systems will be well suited to searching masses of data for regularities in patterns or relationships. These machine-learning techniques are not merely shortcuts that circumnavigate knowledge engineering; instead, they enable us to more fully interpret data and thus may aid in our struggle to understand more about the world around us, facilitating discoveries that could not be made in any other way.

Version space

To illustrate how machine-learning techniques work, we first consider the well-known version space algorithm described by Mitchell (27). This algorithm allows the development of an increasingly sharp model that subsequently can be used for identification.

Table I. Description of samples according to attributes

Client     Color    pH    Appearance   Source
Alumco     None     1-2   Clear        Mine
Tracelab   Yellow   3-4   Cloudy       Pond
NTEX       Blue     5-6   Turbid       River
Royal-M    Green    7-8   Dark         Lake

The basic operation of the version space algorithm requires correct and incorrect examples of the condition of interest, usually described as positive and negative instances. The version space model consists of two description sets. The first, which we will call the S set, consists of highly specific descriptions; the second, G, consists of very general descriptions. For both sets, each description must cover all of the positive instances but



must not admit any negative instances. The S set is kept as specific as possible, and the G set is made as general as possible.

As more instances are fed to the algorithm, the S set descriptions become more general and those in the G set become more specific. The algorithm is considered to have reached a conclusion when the S and G sets are identical. The order of presentation of positive and negative instances does not affect the final outcome.

Description language. An example will introduce the fundamentals of the description language and will also demonstrate the use of symbolic, rather than numeric, data representations such as color, client, and source. Although symbolic studies have been the emphasis in the AI arena for many years, chemists have been actively involved in numeric studies under the guise of chemometrics, statistics, or pattern recognition. As will become clear, a complete representation of a scientific problem often requires that data be represented in both numeric and symbolic formats, thereby presenting a particularly interesting problem.

Consider a case in which a group of samples causes a specific problem in an instrument and the source of the problem must be determined. The characteristics or attributes describing each sample are collected and classified, forming a table with five attributes, each with four possible values, as shown in Table I. Evaluation of every combination of attribute and value would require 1024 different tests (four values for each of five attributes, or 4^5). By making some assumptions, however, it is possible to determine the probable cause of the problem very rapidly.

Suppose the instrument problems actually are caused by NTEX samples from a mine. In our description language, these samples would be reported as (NTEX ? ? ? Mine). The first position, the Client attribute, is occupied by NTEX, indicating that NTEX samples must be involved. The next three positions are occupied by a "?", signifying total indifference to the Color, pH, and Appearance attributes. In general, a ? means that any value can be in that position.

To gain some familiarity with the description language, consider the example (NTEX None 3-4 Turbid Mine), which describes a sample from an NTEX mine that has no color, has a pH in the 3-4 range, and is turbid. This sample fits the more general description (NTEX ? ? ? Mine) of samples that cause the instrument problem. The example also fits the set (NTEX None 3-4 ? Mine) because ? can denote any value of the attribute in that position. However, the sample (Tracelab None 3-4 ? Mine) does not fit the general description of problem-causing samples because Tracelab and NTEX are different clients.

Positive and negative instances. Instance-handling rules are given in the box below. In examining samples that cause instrument problems, each sample can be considered to be an instance. After the initial description language is understood, positive and negative instances can be introduced. In the description language, + at the front of an instance signifies that the instance has the condition of interest or problem, rather than the alternative, satisfactory, denoted by -. (There is no mathematical or symbolic significance to the + or -.)

Instance-handling rules

Initialize the sets S and G, respectively, to the sets of maximally specific and maximally general generalizations that are consistent with the first observed positive training instance.
For each subsequent instance I
Begin
  if I is a negative instance,
  then begin
    Retain in S only those generalizations that do not match I.
    Make generalizations in G that match I more specific, but only to the extent that they no longer match I, and only in such ways that each remains more general than some generalization in S.
    Remove from G any element that is more specific than some other element in G.
  end
  else
  if I is a positive instance,
  then begin
    Retain in G only those generalizations that match I.
    Generalize members of S that do not match I, only to the extent required to match I, and only in such ways that each remains more specific than some generalization in G.
    Remove from S any element that is more general than some other element in S.
  end
end

Consider the sample (NTEX None 5-6 Cloudy Mine). Because this description fits that of the problem-causing samples, we would add + to show that it has the condition of interest: +(NTEX None 5-6 Cloudy Mine). Analysis of this positive instance using the version space algorithm leads to

S = (NTEX None 5-6 Cloudy Mine)

and

G = (? ? ? ? ?)

The most specific description set, S, consists of just one description: the instance description. The most general description set, G, also consists of just one description with ?s in every position, for there are no negative instances that need to be excluded at this point. This process completes the initialization, the first part of the box.

Now we can move through the other two primary loops of the algorithm (box), considering all samples and determining positive and negative instances. Suppose the following negative instance is provided:

-(Alumco Yellow 7-8 Cloudy Pond)

This negative instance can be used to transform the description in the G set into several new descriptions because (? ? ? ? ?) is now too general; it fits everything, including this negative instance. The version space algorithm returns

S = (NTEX None 5-6 Cloudy Mine)

and

G = (NTEX ? ? ? ?)
    (? None ? ? ?)
    (? ? 5-6 ? ?)
    (? ? ? ? Mine)
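The instance-handling rules in the box can be sketched in Python. This is a minimal sketch rather than the authors' program, and it rests on several assumptions: descriptions are purely conjunctive (so S can be held as a single description instead of a set), the instances are free of noise, and the first instance is positive. The names `matches`, `more_general`, and `learn` are illustrative only.

```python
from typing import List, Tuple

Desc = Tuple[str, ...]   # one value per attribute; "?" means "any value"
ANY = "?"

def matches(desc: Desc, sample: Desc) -> bool:
    """True if every position of desc is '?' or equals the sample's value."""
    return all(d == ANY or d == s for d, s in zip(desc, sample))

def more_general(a: Desc, b: Desc) -> bool:
    """True if description a covers everything that description b covers."""
    return all(x == ANY or x == y for x, y in zip(a, b))

def learn(instances):
    """instances: list of (is_positive, sample); the first must be positive."""
    first_positive, first = instances[0]
    assert first_positive, "initialization needs a positive instance"
    s: Desc = first                          # most specific description
    g: List[Desc] = [(ANY,) * len(first)]    # most general descriptions
    for positive, x in instances[1:]:
        if positive:
            # Keep only general descriptions that cover the new positive,
            # then generalize S just enough to cover it as well.
            g = [h for h in g if matches(h, x)]
            s = tuple(a if a == b else ANY for a, b in zip(s, x))
        else:
            if matches(s, x):
                raise ValueError("negative instance matches S (noisy data?)")
            new_g: List[Desc] = []
            for h in g:
                if not matches(h, x):
                    new_g.append(h)          # already excludes the negative
                    continue
                # Fill in one '?' at a time with S's value wherever that value
                # differs from the negative instance, so S stays covered.
                for i, (hv, sv, xv) in enumerate(zip(h, s, x)):
                    if hv == ANY and sv != ANY and sv != xv:
                        new_g.append(h[:i] + (sv,) + h[i + 1:])
            new_g = list(dict.fromkeys(new_g))   # drop duplicates, keep order
            # Remove any member that is more specific than another member.
            g = [h for h in new_g
                 if not any(o != h and more_general(o, h) for o in new_g)]
    return s, g

# The five instances used in the article's walk-through.
data = [
    (True,  ("NTEX", "None", "5-6", "Cloudy", "Mine")),
    (False, ("Alumco", "Yellow", "7-8", "Cloudy", "Pond")),
    (True,  ("NTEX", "Yellow", "7-8", "Clear", "Mine")),
    (False, ("Tracelab", "None", "1-2", "Turbid", "Mine")),
    (False, ("NTEX", "None", "1-2", "Turbid", "Pond")),
]
s, g = learn(data)
print("S =", s)
print("G =", g)
```

Calling `learn(data[:2])` stops after the first negative instance and returns the four-member G set listed above; with all five instances the sketch ends with S and G both equal to (NTEX ? ? ? Mine).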



Because the instance was negative, it is used to transform the description in the G set into more specific descriptions that exclude negative instances. Each new description is just specific enough to reject the negative instance, yet each is a generalization of the description in the S set.

A generalization with a value in the Appearance column (fourth attribute) is missing. Because "Cloudy" appears in both negative and positive instances, there is no way to include an appearance value that excludes the negative instance yet remains a generalization of the description in the S set. This causes the algorithm to assume that the attribute value can be anything, ?, for that attribute.

With another positive instance, +(NTEX Yellow 7-8 Clear Mine), the result becomes

S = (NTEX ? ? ? Mine)

and

G = (NTEX ? ? ? ?)
    (? ? ? ? Mine)

Now we find that the version space has clarified considerably. First, every description in the G set that does not cover the new positive instance is eliminated. Thus the descriptions (? None ? ? ?) and (? ? 5-6 ? ?) are eliminated because neither includes (NTEX Yellow 7-8 Clear Mine).

Positive instances also cause the S set to be transformed. In order for S to always match old instances, the descriptions in S must become more general. Such generalization involves changing the existing description to match the new instance. The algorithm has tried a number of combinations, but more specific permutations such as (NTEX ? 3-4 ? Mine) and (NTEX None ? ? Mine) do not satisfy the requirement of matching both the original description in S and the new positive instance. Thus (NTEX None 5-6 Cloudy Mine) and (NTEX Yellow 7-8 Clear Mine) combine to produce (NTEX ? ? ? Mine) for the most specific description set, S.

With our prior knowledge, we can see that the S set description actually has the correct final description, but the version space algorithm does not know that it has reached the final description until S and G are identical. If we now consider the sample -(Tracelab None 1-2 Turbid Mine) we find that

S = (NTEX ? ? ? Mine)

and

G = (NTEX ? ? ? ?)

Because our second generalization in the previous G set, (? ? ? ? Mine), matches (Tracelab None 1-2 Turbid Mine), we must try to specialize it, recognizing the rule that any such specialization must be a generalization of (NTEX ? ? ? Mine), yet not a specialization of (NTEX ? ? ? ?). Because there is no such specialization, (? ? ? ? Mine) is eliminated.

If we now input -(NTEX None 1-2 Turbid Pond) we must specialize (NTEX ? ? ? ?) so as not to match but produce a result that is a generalization of or equal to (NTEX ? ? ? Mine). This forces convergence to the single value

S = G = (NTEX ? ? ? Mine)

and we know that only NTEX mine samples, independent of all other descriptions, cause problems with our instrument. Thus, with data from only five samples, the group of samples causing the problem can be determined. Five is the minimum number needed because the data have been purposely selected to give a rapid convergence.

Table II. Full set of sample data partitioned by the Client attribute

Client (a)     Color    pH    Appearance   Source   Class (b)
NTEX (23)      Green    1-2   Cloudy       River    S
               Blue     3-4   Clear        Mine     P
               None     7-8   Dark         Lake     S
               Blue     1-2   Clear        Mine     P
               None     5-6   Cloudy       Mine     P
               Yellow   7-8   Cloudy       Pond     S
               Green    3-4   Clear        Mine     P
               None     3-4   Dark         Mine     P
               Blue     7-8   Clear        Mine     P
               None     3-4   Turbid       Mine     P
               Green    7-8   Cloudy       Mine     P
               Yellow   1-2   Dark         Mine     P
               None     5-6   Dark         Mine     P
               Green    3-4   Clear        River    S
               Green    7-8   Cloudy       Mine     P
               Blue     1-2   Cloudy       Mine     P
               None     3-4   Dark         Mine     P
               Yellow   3-4   Clear        Mine     P
               Blue     3-4   Cloudy       River    S
               None     5-6   Turbid       Pond     S
               Blue     5-6   Cloudy       Mine     P
               None     3-4   Dark         Mine     P
               Green    5-6   Turbid       Pond     S
Tracelab (5)   Blue     7-8   Dark         Pond     S
               Green    5-6   Clear        Mine     S
               Green    5-6   Cloudy       Pond     S
               Green    7-8   Clear        Lake     S
               Yellow   7-8   Turbid       River    S
Alumco (7)     Yellow   1-2   Turbid       Mine     S
               Green    7-8   Dark         Mine     S
               Green    5-6   Cloudy       Lake     S
               Yellow   3-4   Turbid       Mine     S
               Green    5-6   Dark         River    S
               Green    5-6   Dark         Mine     S
               Green    7-8   Turbid       River    S
Royal-M (5)    Green    1-2   Cloudy       Mine     S
               None     5-6   Clear        Mine     S
               Yellow   3-4   Cloudy       Pond     S
               None     5-6   Cloudy       Pond     S
               Yellow   1-2   Clear        River    S

(a) The number in parentheses denotes the number of samples with the attribute.
(b) P = (NTEX) and (Mine).

If we wanted to build a rule set for an expert system,



it would be

If Client is NTEX and Source is Mine,
then instrument problem is True

In addition to being elegant, the version space algorithm develops incrementally because the description gets better and better as more information is provided. Even with a partial training or sample set, the version space algorithm can determine unambiguously that certain instances do not belong in the space. This capability is valuable when there is only a limited training set and when the algorithm is expected to learn incrementally from its own experience.

Because the version space algorithm stores only generalizations and specializations, and not the entire training set, the storage requirements are smaller than those for many other algorithms (11). The computational requirements may also be reduced compared with those of several competitive algorithms; however, the computational complexity varies with the size of the training set and the square of both S and G. Computation time increases, reaches a plateau, and finally decreases as the description of the version space becomes more concise (27).

Despite its attractive features, the version space algorithm does not have a great deal of application to real-world problems. The primary reason is that it cannot tolerate uncertainty or noise in the training set. All instances, positive or negative, must be correct. What if one sample from the NTEX mine did not cause a problem? In the real world, things like this happen. A second disconcerting feature is that the algorithm cannot tolerate OR-type situations; for example, if the problem was caused by mine samples from two clients, Alumco and NTEX, the algorithm would erroneously conclude that the Client attribute should be ?.

Unfortunately, the real world is filled with noise from both fundamental sources and human error, as well as from OR-type situations, which must be considered. Thus we move our discussion to a more robust methodology that has received considerable acceptance by workers dealing with real-world problems.

Inductive learning

Decision trees. The most commonly used machine-learning algorithms involve procedures that develop decision trees. Rules of the type discussed previously can be easily developed from these trees. The basic principle behind these algorithms is that "The world should be simple." This means that as few rules as possible should be needed when developing a set of rules; in other words, the smallest decision tree should be developed. The greatest advantage of these algorithms in the real world is that they can tolerate noise.

Given any data set (without noise) of n instances, it is always possible with n-1 rules to separate the data into various classes. This classification is the symbolic counterpart to the curve-fitting adage that it is always possible to fit perfectly a set of n points with a polynomial of n terms. Many of us are familiar with the student who is delighted to have a regression coefficient of 1.0 on a linear regression on two points: "Look! Perfect!" The symbolic counterpart is to separate two instances with one rule. The key, in both domains, is to get the correct representation (a straight line or one rule) as more real data are collected and processed.

Consider the simple descriptor base from the previous example, presented in Table I. Whereas we previously described our instances as being - or +, we will think now in terms of classes. To maintain continuity with the previous example, we will use only two classes, "problem" (P) and "satisfactory" (S), but we are not limited to two classes as we were with the version space method. In this method, there can be many classes. As in our previous example, we know that the class P applies to samples that have the value NTEX for the attribute Client and the value Mine for the attribute Source.

We will use a machine-learning methodology in the form of an algorithm that we call Sprouter (28), which is based on the ID3 algorithm described by Quinlan (29). First we will apply Sprouter to the simple data set from our first example. Then we will make the data more complex and see how Sprouter handles it.

Sprouter works by trying each available attribute to see which one gives the best separation. The keyword is "best," because a variety of criteria could be used. For our first example, we can use a simple intuitive criterion: the "best" attribute will leave the maximum number of instances in leaves or groups that are homogeneous by class. Because there are two classes, P and S, this separation will require a splitting in which P instances are always sorted into leaves with other P instances, and S instances always are grouped together.

Table II lists the sample data grouped by the Client attribute, and Table III contains the data grouped by Appearance. It can be noted immediately from Table II that all of the problem instances are contained in the Client attribute value NTEX. All other Client values contain only S instances. Intuitively we recognize that the Client attribute value NTEX will separate the P from several of the S instances.

Table III. Sample set partitioned by the Appearance attribute

Appearance (a)   Client     Color    pH    Source   Class (b)
Clear (10)       NTEX       Blue     3-4   Mine     P
                 NTEX       Blue     1-2   Mine     P
                 NTEX       Green    3-4   Mine     P
                 Tracelab   Green    5-6   Mine     S
                 NTEX       Blue     7-8   Mine     P
                 NTEX       Green    3-4   River    S
                 Royal-M    None     5-6   Mine     S
                 Tracelab   Green    7-8   Lake     S
                 NTEX       Yellow   3-4   Mine     P
                 Royal-M    Yellow   1-2   River    S
Cloudy (13)      Royal-M    Green    1-2   Mine     S
                 NTEX       Green    1-2   River    S
                 NTEX       None     5-6   Mine     P
                 NTEX       Yellow   7-8   Pond     S
                 Tracelab   Green    5-6   Pond     S
                 NTEX       Green    7-8   Mine     P
                 Alumco     Green    5-6   Lake     S
                 NTEX       Green    7-8   Mine     P
                 NTEX       Blue     1-2   Mine     P
                 Royal-M    Yellow   3-4   Pond     S
                 NTEX       Blue     3-4   River    S
                 NTEX       Blue     5-6   Mine     P
                 Royal-M    None     5-6   Pond     S
Turbid (7)       Alumco     Yellow   1-2   Mine     S
                 NTEX       None     3-4   Mine     P
                 Alumco     Yellow   3-4   Mine     S
                 Tracelab   Yellow   7-8   River    S
                 Alumco     Green    7-8   River    S
                 NTEX       None     5-6   Pond     S
                 NTEX       Green    5-6   Pond     S
Dark (10)        NTEX       None     7-8   Lake     S
                 Alumco     Green    7-8   Mine     S
                 Tracelab   Blue     7-8   Pond     S
                 NTEX       None     3-4   Mine     P
                 NTEX       Yellow   1-2   Mine     P
                 NTEX       None     5-6   Mine     P
                 NTEX       None     3-4   Mine     P
                 Alumco     Green    5-6   River    S
                 Alumco     Green    5-6   Mine     S
                 NTEX       None     3-4   Mine     P

(a) The number in parentheses denotes the number of samples with the attribute.
(b) P = (NTEX) and (Mine).

Another attribute must also be considered, because the NTEX leaf needs additional work. All the other leaves are homogeneous; that is, they contain only one class, but the leaf corresponding to the attribute value NTEX still has a mixture of S and P class instances. The same procedure is thus applied to this leaf, and all of the remaining attributes (pH, Color, Appearance, and Source) are tested. This process goes faster because only the instances in this one leaf need to be considered. Table IV shows that complete separation is achieved after consideration of the Source attribute. Now all the instances with P as a class are in the Mine leaf, and all the S instances are in the other leaves. By inspection we can see that every sample is classified without ambiguity. From this decision tree process we can conclude

If Client is NTEX and Source is Mine,
then Class is P

Table IV. NTEX samples partitioned by the Source attribute

Source (a)   Color    pH    Appearance   Class (b)
Mine (16)    Blue     3-4   Clear        P
             Blue     1-2   Clear        P
             None     5-6   Cloudy       P
             Green    3-4   Clear        P
             None     3-4   Dark         P
             Blue     7-8   Clear        P
             None     3-4   Turbid       P
             Green    7-8   Cloudy       P
             Yellow   1-2   Dark         P
             None     5-6   Dark         P
             Green    7-8   Cloudy       P
             Blue     1-2   Cloudy       P
             None     3-4   Dark         P
             Yellow   3-4   Clear        P
             Blue     5-6   Cloudy       P
             None     3-4   Dark         P
River (3)    Green    1-2   Cloudy       S
             Green    3-4   Clear        S
             Blue     3-4   Cloudy       S
Pond (3)     Yellow   7-8   Cloudy       S
             None     5-6   Turbid       S
             Green    5-6   Turbid       S
Lake (1)     None     7-8   Dark         S

(a) The number in parentheses denotes the number of samples with that attribute.
(b) P = (NTEX) and (Mine).

Figure 1. Decision tree with symbolic attribute leaves. [Tree diagram: Samples branch on Client into Alumco, NTEX, Tracelab, and Royal-M.]

Note that the initial branch could have been on source rather than on client, with the same end result.

Because our example was quite easy, we got away with the simple rule, "Split on the attribute that leaves the maximum number of instances in homogeneous groups." In more difficult situations, more general criteria must be used to establish which attribute to use. Usually these criteria are based on statistics or on information theory (29) in which each test produces a "disorder value" that is compared with the disorder values for the other attributes.
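The two kinds of split criterion can be sketched in Python on the data of Table II. This is an illustrative sketch, not the Sprouter program: `homogeneous_count` implements the simple rule quoted above, while `disorder` uses a size-weighted average entropy, which is one common information-theoretic choice and is assumed here rather than taken from the article. All function and variable names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2

ATTRS = ("Client", "Color", "pH", "Appearance", "Source")

# Table II, one sample per line: the five attributes, then the class (P or S).
RAW = """
NTEX Green 1-2 Cloudy River S
NTEX Blue 3-4 Clear Mine P
NTEX None 7-8 Dark Lake S
NTEX Blue 1-2 Clear Mine P
NTEX None 5-6 Cloudy Mine P
NTEX Yellow 7-8 Cloudy Pond S
NTEX Green 3-4 Clear Mine P
NTEX None 3-4 Dark Mine P
NTEX Blue 7-8 Clear Mine P
NTEX None 3-4 Turbid Mine P
NTEX Green 7-8 Cloudy Mine P
NTEX Yellow 1-2 Dark Mine P
NTEX None 5-6 Dark Mine P
NTEX Green 3-4 Clear River S
NTEX Green 7-8 Cloudy Mine P
NTEX Blue 1-2 Cloudy Mine P
NTEX None 3-4 Dark Mine P
NTEX Yellow 3-4 Clear Mine P
NTEX Blue 3-4 Cloudy River S
NTEX None 5-6 Turbid Pond S
NTEX Blue 5-6 Cloudy Mine P
NTEX None 3-4 Dark Mine P
NTEX Green 5-6 Turbid Pond S
Tracelab Blue 7-8 Dark Pond S
Tracelab Green 5-6 Clear Mine S
Tracelab Green 5-6 Cloudy Pond S
Tracelab Green 7-8 Clear Lake S
Tracelab Yellow 7-8 Turbid River S
Alumco Yellow 1-2 Turbid Mine S
Alumco Green 7-8 Dark Mine S
Alumco Green 5-6 Cloudy Lake S
Alumco Yellow 3-4 Turbid Mine S
Alumco Green 5-6 Dark River S
Alumco Green 5-6 Dark Mine S
Alumco Green 7-8 Turbid River S
Royal-M Green 1-2 Cloudy Mine S
Royal-M None 5-6 Clear Mine S
Royal-M Yellow 3-4 Cloudy Pond S
Royal-M None 5-6 Cloudy Pond S
Royal-M Yellow 1-2 Clear River S
"""
ROWS = [tuple(line.split()) for line in RAW.strip().splitlines()]

def partition(rows, attr):
    """Group rows into leaves keyed by the value of one attribute."""
    i = ATTRS.index(attr)
    leaves = defaultdict(list)
    for row in rows:
        leaves[row[i]].append(row)
    return leaves

def homogeneous_count(rows, attr):
    """Number of instances that land in leaves containing a single class."""
    return sum(len(leaf) for leaf in partition(rows, attr).values()
               if len({r[-1] for r in leaf}) == 1)

def disorder(rows, attr):
    """Average entropy (in bits) of the leaves, weighted by leaf size."""
    total = 0.0
    for leaf in partition(rows, attr).values():
        counts = Counter(r[-1] for r in leaf)
        h = -sum(c / len(leaf) * log2(c / len(leaf)) for c in counts.values())
        total += len(leaf) / len(rows) * h
    return total

for attr in ATTRS:
    print(attr, homogeneous_count(ROWS, attr), round(disorder(ROWS, attr), 3))

ntex = partition(ROWS, "Client")["NTEX"]
print("NTEX leaf split on Source:", homogeneous_count(ntex, "Source"))
```

On these data the simple criterion leaves 17 instances in homogeneous leaves for both Client and Source and none for Appearance, which is consistent with the remark that the first branch could have been made on either attribute. Splitting the 23 NTEX instances on Source places all of them in homogeneous leaves, so the disorder of that split is zero.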



The attribute with the most favorable disorder value is used wherever further clarification is required. Unfortunately, at the present time, there is no perfect disorder criterion.

Figure 2. Decision tree with symbolic attribute leaves and unfinished leaves. [Tree diagram: Samples; Client: Alumco, NTEX, Tracelab, Royal-M; pH: 1-2, 3-4, 5-6, 7-8.]

Figure 3. Decision tree with symbolic and continuous attribute leaves. [Tree diagram: Samples; Client: Alumco, NTEX, Tracelab, Royal-M; threshold branches: <6.3, >6.3.]

Table V. NTEX noisy data sets

Set             Color   pH    Appearance   Source   Class
Noisy Data I            7-8   Dark         River    P
                        1-2   Clear        Mine     P
                        5-6   Cloudy       Mine     P
Noisy Data II           7-8   Dark         Mine     S
                        1-2   Clear        Mine     P
                        5-6   Cloudy       Mine     P

Numbers. As described thus far,

58 A · ANALYTICAL CHEMISTRY, VOL. 64, NO. 1, JANUARY 1, 1992


Sprouter seems to handle symbolic descriptions well, but our world is described by both symbols and numbers. Because numbers are the traditional domain of the statistician, a natural question is, "What about numeric attributes?"

The following approach is commonly used for handling a continuous numeric attribute. The instances are sorted by the numeric value of the attribute. From this sort, a list of possible thresholds is produced, each of which corresponds to the midpoint between adjacent values. For each possible threshold, the sets on both sides are tested for disorder. The best threshold value for the entire process is retained as a reference point for a test on that attribute. Thus this approach produces rules with "greater than" and "less than" relationships.

To illustrate, we extend our example to show the simple way in which Sprouter can deal with a numeric attribute, in this case pH values. Another group of samples causes a problem. By developing a decision tree using attributes of Client and pH as in Figure 1, we can determine that NTEX samples with pH values of 5-6 and 7-8 are P. This result can be identified as an OR relationship: pH = 5-6 OR 7-8. An astute reader of the decision tree might then notice that the numeric ranges 5-6 and 7-8 are adjacent and deduce that the P representation should be 5-8. Consider a slightly modified data set in which some of the NTEX samples with pH values in the range 5-6 are S whereas others are P. The effect on the decision tree is quite interesting, as shown in Figure 2. There is additional branching from the pH 5-6 node. If more specific pH measurements were available, it would be possible to determine a threshold that has real meaning to a chemist (for example, pH > 6.3). The decision tree depicted in Figure 3 illustrates this situation and the power of our basic rule, "The world should be simple."

Uncertainty. One of the endearing features of inductive learning systems is their capacity to tolerate uncertainty or noise. Consider the examples for which data are given in Table V. In the set of Noisy Data I both NTEX Mine samples and NTEX River samples are P. Is this a noise problem? Not necessarily. Here again statistics come into play. If only a small fraction of the data set supports this branch of the tree, then it might be pruned. If it is included in the data set it will cause an OR condition so that the rule will read:

    If Client is NTEX and Source is (Mine OR River), then Class is P

The viability of an OR condition could be determined by a supervising expert or statistical methods.

The situation is slightly different in the set of NTEX Noisy Data II in which two Mine samples are P and one is S. This is also a viable possibility, and, once again, one could resort to statistics or expert advice to decide whether this branch should be pruned or included. The effect on the rule set is somewhat different, however. Two rules result:

    If Client is NTEX and Source is Mine, then Class is P (confidence 66%)

and

    If Client is NTEX and Source is Mine, then Class is S (confidence 33%)

Assuming that this subset of the full set has all of the NTEX Mine cases, then 66% of the time NTEX Mine samples are P, and 33% of the time they are S. This example illustrates the synergistic relationship that can exist between inductive learning systems and rule-based expert systems. The inductive learning system develops the decision trees with appropriate weightings for the branches. The rule-based expert system is designed to handle the "fuzzy logic" (weightings or confidence factors) so often required to describe real-world situations. It is not surprising that inductive learning systems are often integral components of modern expert systems.

Other techniques

We have described just two algorithms for extracting regularity from data; there are many more. Some other algorithms are inspired by the seductive desire to imitate nature, or at least what we think we see in nature. Among the most popular of these are algorithms based on networks of neuron-like elements.

The important point is that no one algorithm is right for all circumstances, so an expert in machine learning must be well versed in a variety of techniques, and any system for helping chemists must include an armamentarium of techniques.

We argue that AI will bring about an important change in instrument design. Within the next 10 years or so, leading-edge instruments will incorporate regularity-spotting learning algorithms in their ubiquitous microprocessors so as to supply not only data but also interpretations of the data and of their own health.
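Two of the procedures described above lend themselves to a short illustration: the midpoint threshold search for a numeric attribute, and the confidence values attached to rules drawn from noisy data. The sketch below is not Sprouter itself. It assumes that "disorder" can be measured as Shannon entropy weighted by subset size, and the function names (`disorder`, `best_threshold`, `rule_confidences`) and the individual pH readings are hypothetical, chosen only so that the search has something to work on.

```python
from collections import Counter
from math import log2


def disorder(labels):
    """Shannon entropy (bits) of a list of class labels (assumed measure)."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return sum(-(n / total) * log2(n / total) for n in counts.values())


def best_threshold(samples):
    """Return (threshold, average_disorder) for (value, label) pairs.

    Candidate thresholds are the midpoints between adjacent distinct
    sorted values. Each candidate is scored by the size-weighted
    disorder of the two sides, and the lowest score is kept.
    """
    ordered = sorted(samples)
    best = None
    for (low, _), (high, _) in zip(ordered, ordered[1:]):
        if low == high:
            continue  # identical values give no usable midpoint
        cut = (low + high) / 2
        left = [label for value, label in ordered if value < cut]
        right = [label for value, label in ordered if value > cut]
        score = (
            len(left) * disorder(left) + len(right) * disorder(right)
        ) / len(ordered)
        if best is None or score < best[1]:
            best = (cut, score)
    return best


def rule_confidences(labels):
    """Fraction of samples in each class for one branch of the tree."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Hypothetical pH readings for NTEX samples (not taken from the article).
ntex_ph = [(5.2, "S"), (5.8, "S"), (6.1, "S"), (6.5, "P"), (7.4, "P"), (7.9, "P")]
threshold, score = best_threshold(ntex_ph)

# Classes of the three NTEX Mine samples in Noisy Data II (Table V).
mine_confidence = rule_confidences(["S", "P", "P"])
```

With these made-up readings the search settles on the midpoint between 6.1 and 6.5, which separates the two classes completely, and the Mine branch yields roughly two-thirds P and one-third S, matching the 66% and 33% confidences quoted above. A different disorder measure could select a different threshold when the data are less cleanly separated.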



E. D. Salin received his B.S. degree from the University of California at Berkeley in 1969 and his Ph.D. in 1972 from Oregon State University under the direction of Jim Ingle, Jr. In 1979, after two years of postdoctoral studies at the University of Alberta with Gary Horlick, he joined the staff of McGill University, where he is now an associate professor. In 1989-90 he was a Visiting Scientist at the MIT Artificial Intelligence Laboratory, where he worked with the director, Patrick H. Winston, on machine learning. His interests include atomic spectroscopy, particularly sample introduction methodologies, and computation in the laboratory, including chemometrics; automation; robotics; and artificial intelligence, which he believes will be a dominant factor in the way science is practiced in the next century.

Patrick H. Winston is professor of computer science and director of the Artificial Intelligence Laboratory at the Massachusetts Institute of Technology. He also received his M.S. degree and his Ph.D. from MIT. His doctoral thesis introduced ideas about computer learning from examples. Current research interests are in AI and allied fields; he is involved in the study of learning by analogy, common-sense problem solving, and application of AI to design. He has lectured widely, both domestically and abroad, and is the author of several books.

