Book Reviews
In this section, the IEEE Control Systems Society publishes reviews of books in the control field and related areas. Readers are invited
to send comments on these reviews for possible publication in the Technical Notes and Correspondence section of this TRANSACTIONS.
The CSS does not necessarily endorse the opinions of the reviewers.
If you have used an interesting book for a course or as a personal reference, we encourage you to share your insights with our readers
by writing a review for possible publication in the TRANSACTIONS. All material published in the TRANSACTIONS is reviewed prior to
publication. Submit a completed review or a proposal for a review to:
D. S. Naidu
Associate Editor—Book Reviews
College of Engineering
Idaho State University
833 South Eighth Street
Pocatello, ID 83209 USA
Neuro-Fuzzy and Soft Computing—A Computational Approach to Learning and Machine Intelligence—J. S. R. Jang, C. T. Sun, and E. Mizutani (Englewood Cliffs, NJ: Prentice-Hall, 1997). Reviewed by Yu-Chi Ho.

The work reported in this paper was supported in part by NSF grants EEC-95-27422 and EID-92-12122, ARO contracts DAA1-03-92-G-0115 and DAAH-04-95-0148, and AFOSR contract F49620-95-1-0131. The reviewer is with Harvard University, Cambridge, MA 02138 USA (e-mail: [email protected]). Publisher Item Identifier S 0018-9286(97)06595-1.

This is a book every modern control engineer should have on his/her reference bookshelf. It is worth the $89.00 price even if you have to pay for it out of your personal funds. First, it collects in one place, in consistent notation, all of the information on Computational Intelligence (CI), such as Neural Networks (NN), Fuzzy Logic (FL), Genetic Algorithms (GA), and other acronyms like simulated annealing (SA), radial basis function networks (RBFN's), etc., that you always wanted to know but were afraid to ask about, given the mass of jargon that has grown up in the subject over the years. Second, this is a thoroughly modern book, complete with Matlab exercises, companion floppy disks (for the asking from the publisher), and websites for the latest updates and resources. Third, the book is remarkably restrained in its hype for the subject, which cannot be said for many other works in this area. Finally, there is useful information here that control engineers should know beyond the traditional tools of the trade such as H∞, robustness, LQG, Lyapunov, and other differential equation/frequency domain-based techniques.

Having said this, let me provide a guided tour for the readers and some reservations.¹ Before doing this, however, we shall first state some well-known facts and self-evident truths (at least to the readers of the IEEE TRANSACTIONS ON AUTOMATIC CONTROL) with which to better understand the structure of CI and to penetrate the jargon.²

¹ Or rather, urges to the authors for their next edition.
² The reviewer hastens to point out that the authors do a very good job of defining and explaining the jargon. However, they have not provided the control theory perspective that TRANSACTIONS readers might like to have in understanding CI.

I. CONVERTING DYNAMIC OPTIMIZATION PROBLEMS TO STATIC ONES

The principal idea in dynamic optimization is to convert long-term effects into short-term considerations. In dynamic programming, one does this through the use of the optimal cost-to-go function V(state); in optimal control, it is the Lagrange multiplier function λ(t). Both devices summarize the future effect of present control in terms of its immediate effect on the next state. In this way a dynamic optimization problem can be reduced to a series of static optimization problems. The price you pay, of course, is that you must solve either the functional equation of dynamic programming or the two-point boundary value problem of optimal control that links the series of static problems.
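As a minimal illustration of this reduction (a Python sketch; the three-stage, two-state costs and transitions below are invented for this review, not taken from the book), the backward recursion turns each stage into a static minimization of immediate cost plus cost-to-go:

```python
# Minimal backward dynamic-programming sketch on a made-up problem.
# cost[t][x][u]: stage cost of control u in state x at stage t.
# nxt[t][x][u]:  state the system moves to at stage t+1.
cost = [[[1.0, 4.0], [2.0, 0.5]],
        [[3.0, 1.0], [1.5, 2.5]],
        [[0.5, 2.0], [2.0, 1.0]]]
nxt  = [[[0, 1], [0, 1]],
        [[1, 0], [1, 0]],
        [[0, 1], [0, 1]]]

T = len(cost)
V = [0.0, 0.0]                      # terminal cost-to-go V_T(x) = 0
policy = [None] * T
for t in reversed(range(T)):        # backward recursion
    newV, act = [0.0, 0.0], [0, 0]
    for x in (0, 1):
        # One STATIC minimization per (t, x): immediate cost plus the
        # cost-to-go V, which summarizes all future effects.
        q = [cost[t][x][u] + V[nxt[t][x][u]] for u in (0, 1)]
        act[x] = q.index(min(q))
        newV[x] = min(q)
    V, policy[t] = newV, act
print("V_0 =", V, " optimal controls by stage:", policy)
```

Each inner minimization is a purely static problem; the dynamics enter only through the stored function V.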
II. ITERATION TRADEOFF BETWEEN ESTIMATION AND UPDATING

When you are trying to solve a stochastic optimization problem, Min (Max) E[L(θ; ξ)], or the equivalent root-finding problem, Grad E[L(θ; ξ)] = 0,³ via successive approximation, you face the following tradeoff. Since the value of any expectation, E[L], must be estimated, one must decide how accurately to estimate the quantity before using it for the purpose of updating or hill climbing. For a finite computing budget, one can either spend most of the budget on estimating each iterative step very accurately and settle for fewer steps, or estimate each step very poorly but use the budget to calculate many iterative steps. In the former, you have accurate information but few chances of improvement, while in the latter you have poorer information but can afford many chances for improvement. Particularly in dynamic system optimization problems, one has the issue of trading off updating (hill climbing) a strategy/policy/control law after every state transition, postponing the update until the entire trajectory based on the policy is played out, or anything in between. The choice of the step length parameter in the Robbins–Monro stochastic approximation algorithm is another quantitative manifestation of this tradeoff consideration for convergence purposes.

³ We should point out that solving the Bellman equation in stochastic dynamic programming is a problem of this type.
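A toy numerical illustration of this tradeoff (Python; the quadratic loss L(θ; ξ) = (θ − ξ)² with ξ ~ N(1, 1) and all constants below are our own invention, so the true minimizer is θ = 1): with a fixed budget of noisy samples, we may average many samples per gradient step and take few steps, or take many Robbins–Monro-style steps with crude one-sample gradients.

```python
import random

random.seed(0)
BUDGET = 2_000                 # total number of noisy samples allowed

def grad_estimate(theta, n):
    # Average n single-sample gradients of L = (theta - xi)^2, xi ~ N(1, 1);
    # the true gradient of E[L] is 2*(theta - 1), minimized at theta = 1.
    return sum(2.0 * (theta - random.gauss(1.0, 1.0)) for _ in range(n)) / n

def optimize(samples_per_step, alpha=0.05):
    theta = 5.0
    for _ in range(BUDGET // samples_per_step):
        theta -= alpha * grad_estimate(theta, samples_per_step)
    return theta

for n in (1, 20, 400):         # many crude steps ... few accurate steps
    print(f"{n:3d} samples/step -> theta = {optimize(n):+.3f} (optimum +1.000)")
```

For these made-up constants, the few very accurate steps leave θ far from the optimum, while many crude steps arrive near it but jitter; the better choices lie in between, exactly the tradeoff described above.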
III. STATISTICAL FUNCTION FITTING

The generic situation is that we have a mass of data and we wish to summarize it using a model with far fewer parameters. Once thus modeled, we can economically use the model in place of the original data and/or generate new data which we hope will have the same statistical properties as the original (the so-called generalization property). The simplest version of this process is linear regression, where you use a linear parametric model to least-squares fit a set of given data.
The fit is declared complete if the "residue" can be identified as i.i.d. noise. This is in the sense that all "information" has been squeezed out of the data, and there is no way to further predict the white-noise residue. More generally, nonlinear models can be used for the fit. And if the black box generating the data is known to have a certain structure, then the same structure should be adopted in the fitting model. This is the basis of the Kalman filter, which in essence says that using polynomials to fit general dynamic system outputs is inefficient and plainly wrong; we should use the system responses themselves as the fitting functions. This is why we see the presence of both the system dynamic equation and the observation equation in the filter. On the other hand, if nothing is known about the source generating the data, then a general model-free architecture is adopted. There are many generic "model-free" candidates for data fitting, ranging from polynomials and Fourier series to wavelets, RBFN's, and other NN's, each with its advantages and disadvantages. For example, wavelets and certain neural networks, unlike the Fourier series or threshold functions, have the property that the responses of individual units/nodes are localized and drop to zero as inputs approach infinity. This property can be important for many modern applications.
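A minimal sketch of the SFF paradigm at its simplest (Python; the data-generating "black box" y = 2x + 1 + white noise is fabricated for illustration): fit a linear model by least squares, then check that the residue looks like i.i.d. noise, here via its lag-one autocorrelation.

```python
import random

random.seed(1)
# Fabricated "black box": y = 2x + 1 + white noise.
xs = [i / 10.0 for i in range(100)]
ys = [2.0 * x + 1.0 + random.gauss(0.0, 0.3) for x in xs]

# Least-squares fit of y = a*x + b (normal equations, one regressor).
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
    sum((x - mx) ** 2 for x in xs)
b = my - a * mx

# The fit is "complete" if the residue is indistinguishable from i.i.d.
# noise; a near-zero lag-1 autocorrelation is one crude check.
res = [y - (a * x + b) for x, y in zip(xs, ys)]
mr = sum(res) / n
acf1 = sum((res[i] - mr) * (res[i + 1] - mr) for i in range(n - 1)) / \
       sum((r - mr) ** 2 for r in res)
print(f"a = {a:.3f}, b = {b:.3f}, lag-1 residual autocorrelation = {acf1:.3f}")
```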
IV. COMBINATORIAL EXPLOSION OF STATE SPACE

Many problems/techniques are conceptually simple and elegant to explain and execute, but they do not scale up well, i.e., the computations involved grow exponentially with the size of the problem. This renders such techniques impractical. The entire spectrum of NP-hard combinatorial optimization problems, such as the traveling salesman problem, belongs in this category. In our realm, this difficulty often crops up as the size of the state space or the "curse of dimensionality."⁴ In fact, without this fundamental difficulty, dynamic programming would have declared the end of control theory research back in the 1960's. In the writing of academic papers, we tend to ignore or sidestep this fundamental but unpleasant fact. However, awareness of it helps to explain why certain research directions are taken and others not, and to assess which techniques are practical and which are merely conceptual.

⁴ We wish to thank W. Gong for pointing out that the problem of CE sometimes arises by way of complex constraints. The contributions of the algorithms of linear and integer programming are due to the way they efficiently handle constraints. For the same reason, uniform sampling in a general irregular region can be very difficult in practice.
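The scaling claim is easy to make concrete (Python; the counts are generic, not examples from the book): the dynamic programming table over n binary state variables, and the number of distinct traveling-salesman tours over n cities, both explode.

```python
from math import factorial

# State-space and tour counts grow exponentially/factorially with n.
for n in (10, 20, 30, 40):
    dp_table = 2 ** n                 # states for n binary state variables
    tours = factorial(n - 1) // 2     # distinct TSP tours on n cities
    print(f"n={n:2d}: 2^n = {dp_table:>15,d}   (n-1)!/2 = {tours:.3e}")
```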
Now we can review the book in terms of these four fundamental ideas described above. We shall use the shorthand notations dynamic-to-static conversion (D→S), estimating versus updating (EvsU), statistical function fitting (SFF), and combinatorial explosion (CE), respectively, to reference these four ideas and the concepts behind them.
The book begins with an excellent introductory chapter explaining what CI is and how it differs from and relates to artificial intelligence (AI). The next three chapters (Chs. 2–4), constituting Part I of the book, deal with FL. We shall postpone its discussion until later. Part II (Chs. 5–7) succinctly reviews techniques of real variable-based optimization, in particular least-squares fit, iterative hill climbing using gradients, and derivative-free optimization searches, including the genetic algorithm and simulated annealing commonly associated with discrete or structureless optimization problems.
This material is immediately put to use in Part III (Chs. 8–11) on NN's and learning. The chapters are well written and comprehensive. This reviewer found that the material is best approached from the following perspective. "Learning" is the root of all intelligence, natural or artificial. We use the word in the sense of the paradigm: DATA → INFORMATION → KNOWLEDGE, or the SFF idea discussed above. Learning with a teacher is often used when both the input and the desired output are explicitly given, as in NN learning or pattern classification. Reinforcement learning refers to cases where the desired output may be only partially known, such as either right or wrong, but not how right or how wrong.⁵ Finally, learning without supervision, such as in clustering, is really based on the implicit criterion of similarity, and principal component analysis, or filtering, is based on the criterion of innovation residue (refer again to the SFF idea above). In short, we shall argue that the distinctions between "learning with a teacher," "reinforcement learning," and "learning without supervision" are somewhat artificial. They are all problems that attempt to convert raw experience into digested knowledge under some form of explicit or implicit guidance via the solution of an optimization problem.

⁵ In the book and in the literature, reinforcement learning is particularly associated with learning in a dynamic setting.
However, learning under dynamic constraints (Ch. 10) imposes additional difficulties. A principal issue is a more involved identification of cause and effect, in the sense that future desired output now depends on all past input. This is sometimes referred to as the credit assignment problem, i.e., which past input deserves credit for the success in the current output. This is where the idea D→S enters. Dynamic programming is invoked to convert the problem to a series of static learning problems. The price one pays is the determination of the optimal cost-to-go function, V(state), and the solution of the Bellman equation. In general, of course, the Bellman equation cannot be solved in closed form, and iterative computations are required. Here the idea of estimation versus updating (EvsU) the V function comes into play. The techniques of policy iteration and TD(λ) discussed in Chapter 10 are simply different tradeoffs of EvsU when attempting to determine the optimal cost-to-go function V(state) by solving the Bellman equation. Q-learning is another related approach to the solution of the Bellman equation. Finally, because of the CE difficulty, instead of trying to solve for V(state) at each state, we attempt to use neural networks as SFF to determine V approximately. This is the rationale behind the neuro-dynamic programming method of Bertsekas–Tsitsiklis.
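The two extremes can be sketched in a few lines (Python; the two-state Markov reward process, its rewards, the discount factor, and the step size are invented for illustration): solving the Bellman equation V = r + γPV by exact value iteration versus estimating V from single sampled transitions with TD(0), the EvsU tradeoff in miniature.

```python
import random

random.seed(2)
# Invented 2-state Markov reward process: P[s][s'] and reward r[s].
P = [[0.9, 0.1], [0.5, 0.5]]
r = [1.0, 0.0]
gamma = 0.9

# (1) Updating with exact expectations: value iteration on the Bellman
#     equation V(s) = r(s) + gamma * sum_s' P(s, s') V(s').
V = [0.0, 0.0]
for _ in range(200):
    V = [r[s] + gamma * sum(P[s][t] * V[t] for t in (0, 1)) for s in (0, 1)]

# (2) Updating from single sampled transitions: TD(0), cheap but noisy.
Vtd, s, alpha = [0.0, 0.0], 0, 0.01
for _ in range(20_000):
    s2 = 0 if random.random() < P[s][0] else 1        # sample a transition
    Vtd[s] += alpha * (r[s] + gamma * Vtd[s2] - Vtd[s])  # bootstrap update
    s = s2

print("value iteration:", [round(v, 3) for v in V])
print("TD(0) estimate :", [round(v, 3) for v in Vtd])
```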
Having reduced the problem of dynamic constraints into static problems, we can concentrate on the static version of the problem of learning. It is here that multilayered perceptrons or neural networks (MLP/NN) found their roles. Chapters 8 and 9 explain the basics of MLP/NN as model-free SFF in the sense described above. Since SFF requires the solution of an optimization (often least-squares fit) problem, techniques of Part II are used effectively here to explain learning with a teacher, back propagation, hybrid learning, and mixed linear and nonlinear least-squares fit. Finally, unsupervised learning in Chapter 11 can be considered as SFF optimization problems with built-in or implicit teachers based on similarity or other criteria.⁶ RBFN's and Hopfield networks are simply additional families of fitting functions used in these settings, although the latter can be viewed as SFF only in a general sense.

⁶ In learning to design a linear/nonlinear filter, the criterion is to reduce the estimation error residue to i.i.d. white noise.

Returning to Part I (Chs. 2–4) on FL, we find something new, i.e., something outside the fundamental ideas 1–4 above. Over the years Zadeh has persuasively maintained that interactions with the human-made world ultimately must be done via human languages, which are imprecise and fuzzy. One way or another, analysis cannot proceed without some translation between the human and the mathematical languages. For example, to capture the experience-based control rule "if the temperature is HIGH and the pressure is LOW, then turn up knob K by a SMALL amount," we must define what we mean by the capitalized words.
FL provides a systematic and quantitative way of doing this, versus the ad hoc approximations practiced in the art of engineering. In traditional system analysis, we are also principally concerned with a mapping y = f(x), where f may represent a very complex composition of functions. In a sentence, Chapters 2–4 deal with the same issues when x, y, and, more importantly, f may be given in human language terms as well as in mathematical languages (e.g., the if–then rule stated above in English). It is this translation ability provided by FL that enables the systematic incorporation of human language-based experience into the learning techniques of Part III reviewed above. This important contribution brings us directly into Part IV (Chs. 12 and 13) and Part V (Chs. 14–16).
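One standard way to define the capitalized words (sketched here in Python; the trapezoidal membership functions, their breakpoints, and the size of a SMALL knob move are invented for illustration and are not the book's ANFIS machinery) is to attach a membership function to each linguistic term and take the min of the antecedent degrees as the AND:

```python
def trapezoid(x, a, b, c, d):
    """Membership degree in [0, 1] for a trapezoidal fuzzy set."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

# Invented linguistic terms: HIGH temperature (deg C), LOW pressure (bar).
def temp_HIGH(t): return trapezoid(t, 60.0, 80.0, 120.0, 140.0)
def pres_LOW(p):  return trapezoid(p, -1.0, 0.0, 1.0, 2.0)

# Rule: IF temperature is HIGH AND pressure is LOW,
#       THEN turn up knob K by a SMALL amount.
# The rule's firing strength (min of the antecedents) scales a SMALL
# increment, taken here as 0.2 turns at full strength.
def knob_increment(t, p, small=0.2):
    strength = min(temp_HIGH(t), pres_LOW(p))
    return strength * small

print(knob_increment(t=90.0, p=0.5))   # rule fully fired  -> 0.2
print(knob_increment(t=70.0, p=1.5))   # partially fired   -> 0.1
```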
FL also enables us to capture human experience quantitatively and systematically in learning models. In particular, we can divine model structures or architectures from various fuzzy if–then rules or human language-based descriptions of the systems. Such "knowledge" helps learning immensely versus the model-free or black-box learning approaches of Part III. The ANFIS and CANFIS architectures in this part demonstrate the advantage of incorporating this structural knowledge.⁷ Chapters 14–16 also address some issues not commonly encountered in continuous variable system modeling, namely, the identification of structure in discrete sets. The pruning of regression trees and structure identification in rule bases have their parallels in the datamining literature.⁸ The central theme here is the problem of overfitting of data, or how to be parsimonious with respect to parameters in SFF.

⁷ For two complementary and very well written references on fuzzy systems, see J. Mendel, "Fuzzy logic systems for engineering—A tutorial," Proc. IEEE, pp. 345–377, Mar. 1995, and L. Zadeh, "Fuzzy logic = computing with words," IEEE Trans. Fuzzy Syst., vol. 4, pp. 103–111, May 1996.
⁸ See the inaugural editorial of the new journal Data Mining and Knowledge Discovery, vol. 1. New York: Kluwer, 1997.
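The overfitting theme admits a small numerical experiment (Python with NumPy; the cubic ground truth, noise level, and sample sizes are fabricated): as the polynomial degree grows, training error keeps falling while error on held-out data typically starts to rise, which is the case for parsimony in SFF.

```python
import numpy as np

rng = np.random.default_rng(3)
f = lambda x: x ** 3 - x                       # fabricated ground truth
x_tr, x_te = rng.uniform(-2, 2, 30), rng.uniform(-2, 2, 200)
y_tr = f(x_tr) + rng.normal(0, 0.5, x_tr.size)
y_te = f(x_te) + rng.normal(0, 0.5, x_te.size)

for deg in (1, 3, 9):
    coef = np.polyfit(x_tr, y_tr, deg)         # least-squares polynomial fit
    mse = lambda x, y: np.mean((np.polyval(coef, x) - y) ** 2)
    print(f"degree {deg}: train MSE {mse(x_tr, y_tr):6.3f}, "
          f"test MSE {mse(x_te, y_te):6.3f}")
```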
Parts VI and VII finally deal with the application of the ideas discussed above to case studies involving the control of dynamic systems, pattern recognition, game playing, and complex industrial problems. It is here that this reviewer finds the promises of CI not completely fulfilled. This is not particularly the fault of the authors but rather an indication of the relative youth of the field, which is sometimes obscured by the enthusiasm of the people working in it. In fact, to the authors' credit, they are honest about the shortcomings of the state of the art of CI in the book (e.g., p. 434 on knowledge acquisition). In the chapters on neuro-fuzzy control and on advanced applications, the original challenge of parking a car, advanced by Zadeh at the start of FL research some 32 years ago, remains untouched.⁹ Except for the part played by the translation ability of FL, it is not entirely convincing to this reviewer that the problems could not have been solved by traditional techniques. Whether it is the "knowledge" about the problem or the power of the techniques of CI that contributed more to the solution of the application cases is less clear.¹⁰ In fact, the importance of knowledge is made abundantly evident when, in Chapter 22, the straightforward application of model-free MLP and genetic algorithms without structural knowledge resulted in inferior performance.¹¹ Knowledge acquisition is of course the $64 million question in AI or CI. We know relatively little about how to capture knowledge, or even what knowledge is.¹² It is not the fault of the book that this problem remains unsolved.

⁹ However, this reviewer is not aware that the parking problem has been treated anywhere else, including in nonfuzzy control theory. An easier version of the problem, backing a truck to a loading dock, was treated by Sastry et al.
¹⁰ This reviewer does concede that human-based knowledge can be more easily and systematically captured via FL than with the traditional "art of engineering practice." This, after all, is the contribution of FL and the CI community. CI and FL also seem to offer a better impedance match to complex system problems than some of the rarefied theories of systems control.
¹¹ Refer also to the remark about the ANFIS and CANFIS structure mentioned in the previous paragraph and Chs. 12 and 13.
¹² We are reminded of the famous observation of Supreme Court Justice Potter Stewart, who remarked that he cannot define pornography but he knows it when he sees it. "Knowledge" fits the same dictum. We also use the word "knowledge" in a more general and amorphous sense than knowledge discovery in datamining for databases.
Lastly, this reviewer feels that insufficient attention or emphasis has been paid to the issue of CE. Not all CI techniques scale up well, and the reader should not read more into the techniques than is there. To mention one particular example, GA requires the evaluation of the "fitness" of a population of possible alternatives. It is routinely assumed in the implementation of GA that this fitness can be easily evaluated. However, in many real-world problems, this fitness evaluation can only be "estimated" via a computationally intensive Monte Carlo simulation, e.g., the performance of a complex semiconductor manufacturing plant, or of a multiservice military logistic supply chain under a given operational strategy. In such cases, one iteration of a GA may impose an infeasible computational burden on the algorithm. Such a computational burden faces the fundamental limit of simulation and cannot be improved.¹³ Such warnings are not often given or emphasized in academic papers and books.

¹³ The confidence interval of a Monte Carlo simulation cannot be improved upon faster than 1/(square root of the length of the simulation). To increase accuracy by one order of magnitude requires a two-order-of-magnitude increase in computation cost.
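The square-root law of footnote 13 is easy to exhibit (Python; the "simulation" is just a made-up noisy performance measure): multiplying the simulation length by 100 shrinks the standard error of the estimate only by a factor of about 10.

```python
import random
import statistics

random.seed(4)

def one_replication():
    # Stand-in for one run of an expensive simulation whose mean we want.
    return random.gauss(10.0, 2.0)

for n in (100, 10_000):                 # 100x more simulation budget...
    estimates = [statistics.fmean(one_replication() for _ in range(n))
                 for _ in range(100)]
    print(f"N={n:6d}: standard error ~ {statistics.stdev(estimates):.4f}")
# ...buys only a ~10x smaller confidence interval: the 1/sqrt(N) law.
```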
In fact, it is the opinion of this reviewer that the resolution of the CE problem is intimately related to the problem of knowledge acquisition and heuristics. NP-hardness is a fundamental limitation on what computation can do. Quantifying heuristics and acquiring structural knowledge seem to be the only salvation for the effective solution of complex real problems. In fact, these are the reasons for the continuing existence of human researchers and problem solvers.

Despite these reservations, this reviewer urges the control community to embrace the subject of CI and recommends this book. CI is mind-broadening and opens up a host of real-world problems beyond those differential equation-based problems that this TRANSACTIONS has lived with for the past 35 years. The methodologies are not hard to learn (in fact, they are quite familiar once you get beyond the jargon and acronyms) when viewed in terms of the perspective of ideas 1–4 above. More importantly, readers of this TRANSACTIONS have a great deal to contribute because of such perspectives. While there are numerous fine books on the subjects of fuzzy logic, genetic algorithms, and neural networks individually, this book puts all the subjects together in one place and shows their respective places in CI. The reviewer congratulates the authors for this timely book.

ACKNOWLEDGMENT

The reviewer wishes to thank students in the DEDS group at Harvard, E. Lau, L.-H. Lee, J. Lee, D. Li, and M. Yang, for studying and discussing the book under review. They helped to clarify many issues. However, the reviewer alone is responsible for the opinions expressed.