
The document discusses the book 'Recursive Identification and Parameter Estimation' by Han-Fu Chen and Wenxiao Zhao, which focuses on mathematical modeling and parameter estimation in various systems. It covers topics such as recursive parameter estimation, identification of ARMAX and nonlinear systems, and other related problems. The book serves as a resource for understanding and applying recursive methods in systems engineering and mathematics.


Recursive Identification and Parameter Estimation


Han-Fu Chen
Wenxiao Zhao



CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2014 by Taylor & Francis Group, LLC


CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed on acid-free paper


Version Date: 20140418

International Standard Book Number-13: 978-1-4665-6884-6 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts
have been made to publish reliable data and information, but the author and publisher cannot assume
responsibility for the validity of all materials or the consequences of their use. The authors and publishers
have attempted to trace the copyright holders of all material reproduced in this publication and apologize to
copyright holders if permission to publish in this form has not been obtained. If any copyright material has
not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted,
or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented,
including photocopying, microfilming, and recording, or in any information storage or retrieval system,
without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com
(http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood
Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and
registration for a variety of users. For organizations that have been granted a photocopy license by the CCC,
a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used
only for identification and explanation without intent to infringe.

Library of Congress Cataloging‑in‑Publication Data

Chen, Hanfu.
Recursive identification and parameter estimation / authors, Han‑Fu Chen, Wenxiao
Zhao.
pages cm
Includes bibliographical references and index.
ISBN 978‑1‑4665‑6884‑6 (hardback)
1. Systems engineering‑‑Mathematics. 2. Parameter estimation. 3. Recursive
functions. I. Zhao, Wenxiao. II. Title.

TA168.C476 2014
620’.004201519536‑‑dc23 2014013190

Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com

and the CRC Press Web site at
http://www.crcpress.com

Contents

Preface
Acknowledgments
About the Authors

1 Dependent Random Vectors
    1.1 Some Concepts of Probability Theory
    1.2 Independent Random Variables, Martingales, and Martingale Difference Sequences
    1.3 Markov Chains with State Space $(\mathbb{R}^m, \mathcal{B}^m)$
    1.4 Mixing Random Processes
    1.5 Stationary Processes
    1.6 Notes and References

2 Recursive Parameter Estimation
    2.1 Parameter Estimation as Root-Seeking for Functions
    2.2 Classical Stochastic Approximation Method: RM Algorithm
    2.3 Stochastic Approximation Algorithm with Expanding Truncations
    2.4 SAAWET with Nonadditive Noise
    2.5 Linear Regression Functions
    2.6 Convergence Rate of SAAWET
    2.7 Notes and References

3 Recursive Identification for ARMAX Systems
    3.1 LS and ELS for Linear Systems
    3.2 Estimation Errors of LS/ELS
    3.3 Hankel Matrices Associated with ARMA
    3.4 Coefficient Identification of ARMAX by SAAWET
    3.5 Order Estimation of ARMAX
    3.6 Multivariate Linear EIV Systems
    3.7 Notes and References

4 Recursive Identification for Nonlinear Systems
    4.1 Recursive Identification of Hammerstein Systems
    4.2 Recursive Identification of Wiener Systems
    4.3 Recursive Identification of Wiener–Hammerstein Systems
    4.4 Recursive Identification of EIV Hammerstein Systems
    4.5 Recursive Identification of EIV Wiener Systems
    4.6 Recursive Identification of Nonlinear ARX Systems
    4.7 Notes and References

5 Other Problems Reducible to Parameter Estimation
    5.1 Principal Component Analysis
    5.2 Consensus of Networked Agents
    5.3 Adaptive Regulation for Hammerstein and Wiener Systems
    5.4 Convergence of Distributed Randomized PageRank Algorithms
    5.5 Notes and References

Appendix A: Proof of Some Theorems in Chapter 1
Appendix B: Nonnegative Matrices
References
Index

Symbols

$\Omega$    basic space
$\omega$    element, or sample
$M^T$    transpose of matrix $M$
$A, B$    sets in $\Omega$
$\emptyset$    empty set
$\sigma(\xi)$    $\sigma$-algebra generated by random variable $\xi$
$P$    probability measure
$A \cup B$    union of sets $A$ and $B$
$A \cap B$    intersection of sets $A$ and $B$
$A \Delta B$    symmetric difference of sets $A$ and $B$
$I_A$    indicator function of set $A$
$E\xi$    mathematical expectation of random variable $\xi$
$\lambda_{\max}(X)$    maximal eigenvalue of matrix $X$
$\lambda_{\min}(X)$    minimal eigenvalue of matrix $X$
$\|X\|$    norm of matrix $X$ defined as $(\lambda_{\max}(X^T X))^{1/2}$
$\xrightarrow{\text{a.s.}}$    almost sure convergence
$\xrightarrow{P}$    convergence in probability
$\xrightarrow{w}$    weak convergence
$\mathbb{R}$    real line including $+\infty$ and $-\infty$
$\mathcal{B}$    1-dimensional Borel $\sigma$-algebra
$m(\xi)$    median of random variable $\xi$
$\|\nu\|_{\mathrm{var}}$    total variation norm of signed measure $\nu$
$a_N \sim b_N$    $c_1 b_N \le a_N \le c_2 b_N$ $\forall N \ge 1$ for some positive constants $c_1$ and $c_2$
$\otimes$    Kronecker product
$\det A$    determinant of matrix $A$
$\operatorname{Adj} A$    adjoint of matrix $A$
$n!$    factorial of $n$
$C_n^k$    combinatorial number of $k$ from $n$: $C_n^k = \frac{n!}{k!(n-k)!}$
$\operatorname{Re}\{a\}$    real part of complex number $a$
$\operatorname{Im}\{a\}$    imaginary part of complex number $a$
$M^+$    pseudo-inverse of matrix $M$
$[a]$    integer part of real number $a$

Abbreviations

AR    autoregression
ARMA    autoregressive and moving average
ARMAX    autoregressive and moving average with exogenous input
ARX    autoregression with exogenous input
DRPA    distributed randomized PageRank algorithm
EIV    errors-in-variables
ELS    extended least squares
GCT    general convergence theorem
LS    least squares
MA    moving average
MFD    matrix fraction description
MIMO    multi-input multi-output
NARX    nonlinear autoregression with exogenous input
PCA    principal component analysis
PE    persistent excitation
RM    Robbins–Monro
SA    stochastic approximation
SAAWET    stochastic approximation algorithm with expanding truncations
SISO    single-input single-output
SPR    strictly positive realness
a.s.    almost surely
iff    if and only if
iid    independent and identically distributed
i.o.    infinitely often
mds    martingale difference sequence

Preface

Building a mathematical model from observed data is a common task for systems in diverse areas, including not only engineering systems but also physical, social, biological, and other systems. It may happen that there is no a priori knowledge concerning the system under consideration; one then faces the “black box” problem. In such a situation the “black box” is usually approximated by a linear or nonlinear system, which is selected by minimizing a performance index depending on the approximation error. However, in many cases, from physical or mechanical reasoning or from human experience, one may have some a priori knowledge about the system. For example, it may be known that the data are statically linearly related, or that they are related by a linear dynamic system with unknown coefficients and orders, or that they can be fit into a certain type of nonlinear system, etc. Then the problem of building a mathematical model is reduced to resolving the uncertainties contained in the a priori knowledge by using the observed data, e.g., estimating the coefficients and orders of a linear system or identifying the nonlinear system on the basis of the data. So, from a practical application point of view, when building a mathematical model, one first has to fix the model class to which the system belongs on the basis of available information. After this, one may apply an appropriate identification method proposed by theoreticians to perform the task.
Therefore, for control theorists the topic of system identification consists of the following tasks: 1) Assume the model class is known. An appropriate algorithm must be designed to identify the system from the given class by using the available data. For example, if the class of linear stochastic systems is assumed, then one has to propose an identification algorithm to estimate the unknown coefficients and orders of the system on the basis of the input–output data of the system. 2) The control theorists then have to justify that the proposed algorithm works well in the sense that the estimates converge to the true values as the data size increases, provided the data are really generated by a system belonging to the assumed class. 3) It also has to be clarified what happens if the data do not completely match the assumed model class, either because the data are corrupted by errors or because the true system is not exactly covered by the assumed class.

When the model class is parameterized, the task of system identification consists of estimating the parameters characterizing the system generating the data and of clarifying the properties of the derived estimates. If the data are generated by a system belonging to the class of linear stochastic systems, then the identification algorithm to be proposed should estimate the coefficients and orders of the system and also the covariance matrix of the system noise. Meanwhile, properties of the estimates such as strong consistency, convergence rate, and others should be investigated. Even if the model class is not completely parameterized, for example, the class of Hammerstein systems, the class of Wiener systems, and the class of nonlinear ARX systems, where each system in the class contains nonlinear functions and the purpose of system identification includes identifying the nonlinear function $f(\cdot)$ concerned, the identification task can still be transformed to a parameter estimation problem. The obvious way is to parameterize $f(\cdot)$ by approximating it with a linear combination of basis functions with unknown coefficients, which are to be estimated. However, it may also be carried out in a nonparametric way. In fact, the value of $f(x)$ at any fixed $x$ can be treated as a parameter to estimate; one can then interpolate the obtained estimates for $f(x)$ at different $x$, and the resulting interpolating function may serve as the estimate of $f(\cdot)$.
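
The parametric route described above can be written out as follows. This is only a sketch: the number of basis functions $N$, the basis functions $\varphi_i$, and the coefficient vector $c$ are notation assumed here for illustration, not taken from the book.

```latex
f(x) \approx \sum_{i=1}^{N} c_i \varphi_i(x) = c^T \varphi(x),
\qquad c = [c_1 \cdots c_N]^T, \quad
\varphi(x) = [\varphi_1(x) \cdots \varphi_N(x)]^T .
```

With the basis fixed, identifying $f(\cdot)$ amounts to estimating the finite-dimensional vector $c$. In the nonparametric route, the role of the parameter is instead played by the single number $f(x)$ at each fixed $x$.
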
As will be shown in the book, not only system identification but also many problems from diverse areas can be transformed to parameter estimation problems. These include adaptive filtering and other problems from signal processing and communication, adaptive regulation and iterative learning control, principal component analysis, and problems connected with network systems such as consensus control of multi-agent systems and PageRank of the web, among many others.
In mathematical statistics there are various types of parameter estimates whose behaviors basically depend on the statistical assumptions made on the data. In the present book, with the possible exception of Sections 3.1 and 3.2 in Chapter 3, the estimated parameter, denoted by $x^0$, is treated as a root of a function $g(\cdot)$, called the regression function. It is clear that an infinite number of functions may serve as such a $g(\cdot)$, e.g., $g(x) = A(x - x^0)$, $g(x) = \sin(x - x^0)$, etc. Therefore, the original problem may be treated as root-seeking for a regression function $g(\cdot)$. Moreover, it is desired that root-seeking can be carried out in a recursive way in the sense that the $(k+1)$th estimate $x_{k+1}$ for $x^0$ can easily be obtained from the previous estimate $x_k$ by using the data $O_{k+1}$ available at time $k+1$. It is important to note that any data $O_{k+1}$ at time $k+1$ may be viewed as an observation on $g(x_k)$, because we can always write $O_{k+1} = g(x_k) + \varepsilon_{k+1}$, where $\varepsilon_{k+1} \triangleq O_{k+1} - g(x_k)$ is treated as the observation noise. It is understandable that the properties of $\{\varepsilon_k\}$ depend not only upon the uncertainties contained in $\{O_k\}$ but also upon the selection of $g(\cdot)$. So it is hard to expect that $\{\varepsilon_k\}$ will satisfy the conditions required by the convergence theorems for the classical root-seeking algorithms, say, for the Robbins–Monro (RM) algorithm. This is why a modified version of the RM algorithm is introduced, which, in fact, is a stochastic approximation algorithm with expanding truncations (SAAWET). It turns out that SAAWET works very well in dealing with the parameter estimation problems transformed from various areas.
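
To make the recursive root-seeking idea concrete, the following is a minimal sketch of the classical Robbins–Monro recursion $x_{k+1} = x_k + a_k O_{k+1}$ with $O_{k+1} = g(x_k) + \varepsilon_{k+1}$. It is not SAAWET and is not taken from the book: the step size $a_k = 1/(k+1)$, the regression function $g(x) = -(x - x^0)$ (the case $A = -1$), the Gaussian noise, and all function names are assumptions chosen for illustration.

```python
import random


def robbins_monro(observe, x_init, n_steps):
    """Run x_{k+1} = x_k + a_k * O_{k+1} with the assumed step size a_k = 1/(k+1)."""
    x = x_init
    for k in range(n_steps):
        a_k = 1.0 / (k + 1)
        # observe(x) returns the noisy observation O_{k+1} = g(x_k) + eps_{k+1}
        x = x + a_k * observe(x)
    return x


def make_observer(x0, noise_std, rng):
    """Hypothetical observation model: g(x) = -(x - x0) plus i.i.d. Gaussian noise."""
    def observe(x):
        return -(x - x0) + rng.gauss(0.0, noise_std)
    return observe


rng = random.Random(0)  # fixed seed so the run is reproducible
x0 = 2.0                # the root to be estimated (unknown to the algorithm)
estimate = robbins_monro(make_observer(x0, 0.5, rng), x_init=-5.0, n_steps=20000)
print(estimate)         # close to x0 for this well-behaved noise
```

With this particular $g$ and step size the recursion reduces to a running average of the noisy values $x^0 + \varepsilon_k$, which is why it converges here. The preface's point is that in applications the noise $\{\varepsilon_k\}$ is rarely this well behaved, which motivates the modified algorithm.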
In Chapter 1, the basic concepts of probability theory and some material on martingales, martingale difference sequences, Markov chains, mixing processes, and stationary processes are introduced in such a way that they are readable without referring to other sources and are kept to the minimum needed for understanding the coming chapters. Except for material easily available from textbooks, most of the results here are proved for those who are interested in the mathematical derivations. The proofs are placed in Appendix A at the end of the book, but the rest of the book remains readable without them. In Chapter 2, root-seeking for functions is discussed, starting from the classical RM algorithm. Then SAAWET is introduced, and its convergence and convergence rate are addressed in detail. This chapter provides the main tool used for the system identification and parameter estimation problems presented in the subsequent chapters.
In Chapter 3, the ARMAX (autoregressive and moving average with exogenous input) system is recursively identified. Since ARMAX is linear with respect to its input, output, and driving noise, the conventional least squares (LS) or extended LS (ELS) methods work well and the estimates are derived in a recursive way. However, for convergence of the estimates given by ELS a restrictive SPR (strictly positive realness) condition is required. After analyzing the estimation errors produced by the recursive LS and ELS algorithms in the first two sections, we turn to the root-seeking approach to identifying ARMAX systems without requiring the SPR condition. Since the coefficients of the AR part and the correlation functions/impulse responses of the system are connected by a linear algebraic equation via a Hankel matrix, it is of crucial importance that the Hankel matrix appearing in the linear equation be of full row rank. This concerns the identifiability of the AR part and is discussed here in detail. Then the coefficients of ARMAX systems are recursively estimated by SAAWET, and the strong consistency of the estimates is established. As concerns order estimation for ARMA, almost all existing methods are based on minimizing some information criterion, so they are nonrecursive. In this chapter a recursive order estimation method for ARMAX systems is presented and is proved to converge to the true orders as the data size increases. Recursive and strongly consistent estimates are also derived for the case where both the input and output of the ARMAX system are observed with additive errors, i.e., the errors-in-variables (EIV) case.
Chapter 4 discusses identification of nonlinear systems. Unlike other identification methods, all estimation algorithms given in this chapter are recursive. The following types of nonlinear systems are considered: 1) the Hammerstein system, composed of a linear subsystem cascaded with a static nonlinearity located at the input side of the system; 2) the Wiener system, also a cascaded system composed of a linear subsystem and a static nonlinearity, but with the nonlinearity following the linear part; 3) the Wiener–Hammerstein system, a cascaded system with the static nonlinearity sandwiched between two linear subsystems; 4) the EIV Hammerstein and EIV Wiener systems; 5) the nonlinear ARX (NARX) system, in which the output depends nonlinearly on a finite number of past system inputs and outputs. For the linear subsystems of these systems, SAAWET is applied to estimate their coefficients, and the strong consistency of the estimates is proved. The nonlinearities in these systems, including the nonlinear function defining the NARX system, are estimated by SAAWET combined with kernel functions, and the strong consistency of the estimates is established as well.
Chapter 5 addresses problems arising from different areas that are solved by SAAWET. We limit ourselves to presenting the most recent results, including principal component analysis, consensus control of multi-agent systems, adaptive regulation for Hammerstein and Wiener systems, and PageRank of the web. As a matter of fact, the proposed approach has successfully solved many other problems, such as adaptive filtering, blind identification, iterative learning control, adaptive stabilization and adaptive control, adaptive pole assignment, etc. We decided not to include all of them, because either they are not the newest results or some of them have been presented elsewhere.
Some information concerning nonnegative matrices is provided in Appendix B, which is used mainly in Sections 5.2 and 5.4.
The book is written for students, researchers, and engineers working in systems and control, signal processing, communication, and mathematical statistics. The book aims not only to show the results on system identification and parameter estimation presented in Chapters 3–5, but more importantly to demonstrate how to apply the proposed approach to solve problems from different areas.

Han-Fu Chen and Wenxiao Zhao


Acknowledgments

The support of the National Science Foundation of China, the National Center for Mathematics and Interdisciplinary Sciences, and the Key Laboratory of Systems and Control, Chinese Academy of Sciences is gratefully acknowledged. The authors would like to express their gratitude to Professor Haitao Fang and Dr. Biqiang Mu for their helpful discussions.

About the Authors

Having graduated from the Leningrad (St. Petersburg) State University, Han-Fu Chen joined the Institute of Mathematics, Chinese Academy of Sciences (CAS). Since 1979, he has been with the Institute of Systems Science, now a part of the Academy of Mathematics and Systems Science, CAS. He is a professor at the Key Laboratory of Systems and Control of CAS. His research interests are mainly in stochastic systems, including system identification, adaptive control, and stochastic approximation and its applications to systems, control, and signal processing. He has authored and coauthored more than 200 journal papers and 7 books.
Professor Chen served as an IFAC Council member (2002–2005), president of the Chinese Association of Automation (1993–2002), and a permanent member of the Council of the Chinese Mathematics Society (1991–1999).
He is an IEEE fellow, an IFAC fellow, a member of TWAS, and a member of CAS.

Wenxiao Zhao earned his BSc degree from the Department of Mathematics, Shandong University, China in 2003 and a PhD degree from the Institute of Systems Science, AMSS, the Chinese Academy of Sciences (CAS) in 2008. After this he was a postdoctoral student at the Department of Automation, Tsinghua University. During this period he visited the University of Western Sydney, Australia, for nine months. Dr. Zhao then joined the Institute of Systems Science, CAS in 2010. He is now with the Key Laboratory of Systems and Control, CAS as an associate professor. His research interests are in system identification, adaptive control, and systems biology. He serves as the general secretary of the IEEE Control Systems Beijing Chapter and an associate editor of the Journal of Systems Science and Mathematical Sciences.

Chapter 1

Dependent Random Vectors

CONTENTS
1.1 Some Concepts of Probability Theory
1.2 Independent Random Variables, Martingales, and Martingale Difference Sequences
1.3 Markov Chains with State Space $(\mathbb{R}^m, \mathcal{B}^m)$
1.4 Mixing Random Processes
1.5 Stationary Processes
1.6 Notes and References

In the convergence analysis of recursive estimation algorithms the noise properties play a crucial role. Depending on the problem under consideration, the noise may have various properties: it may consist of mutually independent random vectors, martingales, martingale difference sequences, Markov chains, mixing sequences, stationary processes, etc. The noise may also be composed of a combination of such processes. In order to understand the convergence analysis to be presented in the coming chapters, the properties of the above-mentioned random sequences are described here. Without any attempt to give a complete theory, we restrict ourselves to presenting the theory at the level that is necessary for reading the book.
Prior to describing random sequences, we first introduce some basic concepts of probability theory. This is mainly to provide a unified framework of notation and language.


1.1 Some Concepts of Probability Theory


Denote by $\Omega$ the basic space. A point $\omega \in \Omega$ is called an element, or sample. Denote by $A$, $B$, $C$, etc. the sets in $\Omega$ and by $\emptyset$ the empty set. The complementary set of $A$ in $\Omega$ is denoted by $A^c \triangleq \{\omega \in \Omega : \omega \notin A\}$ and the symmetric difference of sets $A$ and $B$ by $A \Delta B \triangleq AB^c \cup BA^c$.

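Definition 1.1.1 A set-class $\mathcal{F}$ is called a $\sigma$-algebra or $\sigma$-field if the following conditions are satisfied:
(i) $\Omega \in \mathcal{F}$;
(ii) If $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$;
(iii) If $A_n \in \mathcal{F}$, $n = 1, 2, 3, \cdots$, then $\bigcup_{n=1}^{\infty} A_n \in \mathcal{F}$.

Let $\mathcal{F}_1$ be a $\sigma$-algebra and $\mathcal{F}_1 \subset \mathcal{F}$. Then $\mathcal{F}_1$ is called a sub-$\sigma$-algebra of $\mathcal{F}$.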

It follows immediately from Definition 1.1.1 that the $\sigma$-algebra $\mathcal{F}$ is closed under countable intersection of sets, i.e., $\bigcap_{n=1}^{\infty} A_n \in \mathcal{F}$ if $A_n \in \mathcal{F}$, $n = 1, 2, 3, \cdots$. A set $A$ belonging to $\mathcal{F}$ is said to be measurable with respect to $\mathcal{F}$, or $\mathcal{F}$-measurable, and the pair $(\Omega, \mathcal{F})$ is called a measurable space.
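
The closure under countable intersections is a one-line consequence of De Morgan's law together with conditions (ii) and (iii) of Definition 1.1.1. The derivation is spelled out here as a reading aid:

```latex
A_n \in \mathcal{F} \ \ \forall n
\;\overset{\text{(ii)}}{\Longrightarrow}\;
A_n^c \in \mathcal{F} \ \ \forall n
\;\overset{\text{(iii)}}{\Longrightarrow}\;
\bigcup_{n=1}^{\infty} A_n^c \in \mathcal{F}
\;\overset{\text{(ii)}}{\Longrightarrow}\;
\bigcap_{n=1}^{\infty} A_n
= \Bigl( \bigcup_{n=1}^{\infty} A_n^c \Bigr)^{c} \in \mathcal{F} .
```
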

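Definition 1.1.2 For a sequence $\{A_n\}_{n \ge 1}$ of sets, define

$$\liminf_{n \to \infty} A_n \triangleq \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k, \tag{1.1.1}$$

$$\limsup_{n \to \infty} A_n \triangleq \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k. \tag{1.1.2}$$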

By Definition 1.1.2, it is clear that

$$\liminf_{n \to \infty} A_n = \{\omega \in \Omega : \omega \in A_n \text{ for all but a finite number of indices } n\},$$

$$\limsup_{n \to \infty} A_n = \{\omega \in \Omega : \omega \in A_n \text{ i.o.}\},$$

and $\liminf_{n \to \infty} A_n \subset \limsup_{n \to \infty} A_n$, where the abbreviation “i.o.” designates “infinitely often.”
If $\liminf_{n \to \infty} A_n = \limsup_{n \to \infty} A_n = A$, then $A$ is called the limit of the sequence $\{A_n\}_{n \ge 1}$.
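
For a sequence of sets that repeats with a fixed period, formulas (1.1.1) and (1.1.2) can be evaluated exactly, because every tail $\{A_k, k \ge n\}$ contains exactly the sets of one period. The sketch below assumes such a periodic sequence; the function name and the example sets are hypothetical and serve only to illustrate the definition.

```python
def lim_inf_sup(period_sets):
    """Return (lim inf, lim sup) of the periodic sequence A_n = period_sets[n % p]."""
    p = len(period_sets)

    def A(k):
        return period_sets[k % p]

    # For a periodic sequence, the tail from index n onward is covered by k = n, ..., n + p - 1.
    tail_intersections = [set.intersection(*[A(k) for k in range(n, n + p)]) for n in range(p)]
    tail_unions = [set.union(*[A(k) for k in range(n, n + p)]) for n in range(p)]

    lim_inf = set.union(*tail_intersections)   # union over n of the tail intersections
    lim_sup = set.intersection(*tail_unions)   # intersection over n of the tail unions
    return lim_inf, lim_sup


# Alternating sequence: A_n = {0, 1} for even n and {1, 2} for odd n.
lim_inf, lim_sup = lim_inf_sup([{0, 1}, {1, 2}])
print(lim_inf, lim_sup)
```

In this example the point 1 belongs to every set, so it is in the lim inf, while 0 and 2 each occur infinitely often without occurring eventually always, so they belong only to the lim sup. The two limits differ, and the sequence has no limit in the sense defined above.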

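Definition 1.1.3 A set function $\phi$ on $\mathcal{F}$ is called $\sigma$-additive or countably additive if $\phi\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} \phi(A_n)$ for any sequence of disjoint sets $\{A_n\}_{n \ge 1}$, i.e., $A_n \in \mathcal{F}$, $n \ge 1$, and $A_n \cap A_m = \emptyset$ for any $n \ne m$. A nonnegative $\sigma$-additive set function $\phi$ is called a measure if $\phi(A) \ge 0$ for any $A \in \mathcal{F}$ and $\phi(\emptyset) = 0$. If $\phi$ is a measure, then the triple $(\Omega, \mathcal{F}, \phi)$ is called a measure space. A measure $\phi$ is said to be $\sigma$-finite if there is $\{\Omega_n\}_{n \ge 1} \subset \mathcal{F}$ such that $\Omega = \bigcup_{n=1}^{\infty} \Omega_n$ and $\phi(\Omega_n) < \infty$, $n \ge 1$. For two measures $\phi_1$ and $\phi_2$, $\phi_1$ is said to be absolutely continuous with respect to $\phi_2$ if $\phi_1(A) = 0$ whenever $\phi_2(A) = 0$, $A \in \mathcal{F}$. This is denoted by $\phi_1 \ll \phi_2$.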
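By Definition 1.1.3, the $\sigma$-additive set function $\phi$ is allowed to take values $+\infty$ or $-\infty$. For the $\sigma$-additive function $\phi$ and any $A \in \mathcal{F}$, define

$$\phi_+(A) \triangleq \sup_{B \in \mathcal{F},\, B \subset A} \phi(B), \qquad \phi_-(A) \triangleq -\inf_{B \in \mathcal{F},\, B \subset A} \phi(B). \tag{1.1.3}$$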

Theorem 1.1.1 (Jordan–Hahn Decomposition) For any $\sigma$-additive set function $\phi$ on $\mathcal{F}$, there exists a set $D \in \mathcal{F}$ such that

$$\phi_+(A) = \phi(AD), \qquad \phi_-(A) = -\phi(AD^c), \tag{1.1.4}$$

where both $\phi_+$ and $\phi_-$ are measures on $\mathcal{F}$ and $\phi = \phi_+ - \phi_-$.
The measures $\phi_+$, $\phi_-$, and $\bar{\phi} \triangleq \phi_+ + \phi_-$ are called the upper, lower, and total variation measures of $\phi$.
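
On a finite basic space the decomposition can be checked by brute force. The sketch below assumes a hypothetical signed measure given by point masses on a four-element space with every subset measurable. It computes $\phi_+$ and $\phi_-$ directly from (1.1.3) and compares them with (1.1.4), taking $D$ to be the set of points with nonnegative mass.

```python
from itertools import combinations

# Hypothetical point masses defining a signed measure on Omega = {a, b, c, d}.
weights = {"a": 2.0, "b": -1.5, "c": 0.5, "d": -3.0}
omega = set(weights)


def phi(A):
    """Signed measure of A: the sum of the point masses in A."""
    return sum(weights[w] for w in A)


def subsets(A):
    """Generate all subsets of the finite set A."""
    items = sorted(A)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield set(combo)


def phi_plus(A):
    """Upper variation from (1.1.3): supremum of phi over measurable subsets of A."""
    return max(phi(B) for B in subsets(A))


def phi_minus(A):
    """Lower variation from (1.1.3): minus the infimum of phi over subsets of A."""
    return -min(phi(B) for B in subsets(A))


# Candidate set D for (1.1.4): the points carrying nonnegative mass.
D = {w for w, mass in weights.items() if mass >= 0}

for A in subsets(omega):
    assert phi_plus(A) == phi(A & D)      # phi_+(A) = phi(AD)
    assert phi_minus(A) == -phi(A - D)    # phi_-(A) = -phi(AD^c)
    assert phi(A) == phi_plus(A) - phi_minus(A)

print(phi_plus(omega), phi_minus(omega))
```

The masses are chosen as exactly representable binary fractions so that the equality checks are not disturbed by floating-point rounding.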
Definition 1.1.4 A measure $P$ defined on $(\Omega, \mathcal{F})$ is called a probability measure if $P(\Omega) = 1$. The triple $(\Omega, \mathcal{F}, P)$ is called a basic probability space, and $P(A)$ is called the probability of the set $A$, $A \in \mathcal{F}$.
For a measure space $(\Omega, \mathcal{F}, \phi)$, the subsets of an $\mathcal{F}$-measurable set $A$ with $\phi(A) = 0$ may not belong to $\mathcal{F}$. It is natural to add all subsets of sets with zero measure to the $\sigma$-algebra $\mathcal{F}$ and define their measures to equal zero. Mathematically, this is expressed as

$$\mathcal{F}^* \triangleq \{A \Delta M : A \in \mathcal{F},\ M \subset N \in \mathcal{F},\ \phi(N) = 0\}, \tag{1.1.5}$$

$$\phi^*(A \Delta M) \triangleq \phi(A). \tag{1.1.6}$$

It can be proved that $\mathcal{F}^*$ is a $\sigma$-algebra and $\phi^*$ is a measure on $(\Omega, \mathcal{F}^*)$. The triple $(\Omega, \mathcal{F}^*, \phi^*)$ is called the completion of $(\Omega, \mathcal{F}, \phi)$. In the sequel, we always assume that the measure space and the probability space are completed.
Example 1.1.1 Denote the real line by $\mathbb{R} \triangleq [-\infty, \infty]$. The $\sigma$-algebra $\mathcal{B}$ generated by the class of infinite intervals of the form $[-\infty, x)$ with $-\infty < x < +\infty$ is called the Borel $\sigma$-algebra, the sets in $\mathcal{B}$ are called the Borel sets, and the measurable space $(\mathbb{R}, \mathcal{B})$ is known as the 1-dimensional Borel space.
Theorem 1.1.2 Any nondecreasing finite function $m(\cdot)$ on $(-\infty, \infty)$ determines a complete measure space $(\mathbb{R}, \mathcal{M}_m, \nu_m)$ with $\mathcal{M}_m$ being the completed $\sigma$-algebra generated by the set class $\{\{t : m(t) \in [a, b)\},\ a, b \in \mathbb{R}\}$, where

$$\nu_m([a, b)) = m(b-) - m(a-), \qquad -\infty < a \le b < +\infty,$$

$$\nu_m(\{\infty\}) = \nu_m(\{-\infty\}) = 0,$$

where $m(x-)$ denotes the left limit of $m(\cdot)$ at $x$.
Definition 1.1.5 The measure $\nu_m$ defined by Theorem 1.1.2 is called the Lebesgue–Stieltjes measure determined by $m$, and the complete measure space $(\mathbb{R}, \mathcal{M}_m, \nu_m)$ is called the Lebesgue–Stieltjes measure space determined by $m$. If $m(x) = x$ $\forall x \in (-\infty, \infty)$, then $\nu = \nu_m$ is referred to as the Lebesgue measure and the sets in $\mathcal{M} \triangleq \mathcal{M}_m$ are called the Lebesgue measurable sets.
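
As a small worked case of the formula in Theorem 1.1.2, consider a unit step. The particular choice of $m$ below is an assumption made here for illustration and does not come from the book.

```latex
m(t) = \begin{cases} 0, & t \le 0, \\ 1, & t > 0, \end{cases}
\qquad\Longrightarrow\qquad
m(x-) = \begin{cases} 0, & x \le 0, \\ 1, & x > 0, \end{cases}

\nu_m([a, b)) = m(b-) - m(a-) =
\begin{cases} 1, & a \le 0 < b, \\ 0, & \text{otherwise}, \end{cases}
\qquad -\infty < a \le b < +\infty .
```

So $\nu_m$ assigns mass 1 to exactly those intervals $[a, b)$ that contain the point 0, i.e., it is the unit point mass at 0. With $m(t) = t$ the same formula gives $\nu_m([a, b)) = b - a$, the length of the interval.
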
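Definition 1.1.6 For measurable spaces $(\Omega_i, \mathcal{F}_i)$, $i = 1, \cdots, n$, define

$$\prod_{i=1}^{n} A_i \triangleq \{(\omega_1, \cdots, \omega_n) : \omega_i \in A_i,\ i = 1, \cdots, n\}, \qquad A_i \in \mathcal{F}_i,$$

$$\prod_{i=1}^{n} \mathcal{F}_i \triangleq \sigma\left(\left\{\prod_{i=1}^{n} A_i : A_i \in \mathcal{F}_i,\ i = 1, \cdots, n\right\}\right),$$

$$\prod_{i=1}^{n} (\Omega_i, \mathcal{F}_i) \triangleq \left(\prod_{i=1}^{n} \Omega_i,\ \prod_{i=1}^{n} \mathcal{F}_i\right),$$

where $\prod_{i=1}^{n} \Omega_i$ is called the product space, $\prod_{i=1}^{n} (\Omega_i, \mathcal{F}_i)$ the $n$-dimensional product measurable space, and $\prod_{i=1}^{n} \mathcal{F}_i$ the product $\sigma$-algebra.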
Theorem 1.1.3 Let (Ωi, Fi, φi), i = 1, 2 be two σ-finite measure spaces and let
(Ω, F) with $\Omega = \prod_{i=1}^2 \Omega_i$, $\mathcal{F} = \prod_{i=1}^2 \mathcal{F}_i$ be the product measurable space. Then there exists a
σ-finite measure φ on (Ω, F) for which
$$\varphi(A_1 \times A_2) = \varphi_1(A_1)\cdot\varphi_2(A_2) \quad \forall\, A_i \in \mathcal{F}_i,\ i = 1, 2.$$
The σ-finite measure φ on (Ω, F) is called the product measure of φ1 and φ2 and is
often denoted by φ = φ1 × φ2.
Definition 1.1.7 A real function ξ = ξ(ω) defined on (Ω, F) is called a random
variable if it is F-measurable, i.e., {ω ∈ Ω : ξ(ω) ∈ A} ∈ F for any A ∈ B, and if
it takes finite values almost surely, i.e., P(|ξ(ω)| < ∞) = 1. The distribution function
Fξ(·) of a random variable ξ is defined as
$$F_\xi(x) \triangleq P(\omega : \xi(\omega) < x) \quad \forall\, x \in \mathbb{R}. \tag{1.1.7}$$
A real number m(ξ) is called a median of the random variable ξ if
P(ξ ≤ m(ξ)) ≥ 1/2 ≤ P(ξ ≥ m(ξ)).

From the above definition we see that random variables are in fact measurable
functions from (Ω, F) to (R, B). A measurable function f from (R, B) to
(R, B) is usually called a Borel measurable function.
It can be shown that the distribution function is nondecreasing and left-continuous.
If Fξ is differentiable, then its derivative fξ(x) ≜ dFξ(x)/dx is called
the probability density function of ξ, or, simply, the density function.
The n-dimensional vector ξ = [ξ1 ··· ξn]^T is called a random vector if ξi is a
random variable for each i = 1, ···, n. The n-dimensional distribution function and
density function are, respectively, defined by
$$F_\xi(x_1,\dots,x_n) \triangleq P(\xi_1 < x_1,\dots,\xi_n < x_n), \tag{1.1.8}$$
and
$$F_\xi(x_1,\dots,x_n) = \int_{-\infty}^{x_1}\cdots\int_{-\infty}^{x_n} f_\xi(t_1,\dots,t_n)\,dt_1\cdots dt_n. \tag{1.1.9}$$
Example 1.1.2 The well-known Gaussian density function is given by
$$\frac{1}{\sqrt{2\pi}\,\sigma}\exp\Big\{-\frac{1}{2}\Big(\frac{x-\mu}{\sigma}\Big)^2\Big\}, \quad x \in \mathbb{R} \tag{1.1.10}$$
for fixed scalars μ and σ with σ > 0. In the n-dimensional case the Gaussian density
function is defined as
$$\frac{1}{(2\pi)^{\frac{n}{2}}(\det\Sigma)^{\frac{1}{2}}}\exp\Big\{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\Big\}, \quad x \in \mathbb{R}^n \tag{1.1.11}$$
with fixed μ ∈ R^n and positive definite Σ ∈ R^{n×n}.
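
As a quick numerical sanity check on the scalar density (1.1.10), the following sketch integrates it with a trapezoidal rule. The function names `gaussian_pdf` and `integrate` and the values of μ and σ are illustrative assumptions, not part of the text.

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Scalar Gaussian density (1.1.10); sigma must be positive."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

def integrate(f, a, b, steps=20000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

mu, sigma = 1.5, 2.0                      # hypothetical parameters
lo, hi = mu - 10 * sigma, mu + 10 * sigma  # tails beyond 10 sigma are negligible
mass = integrate(lambda x: gaussian_pdf(x, mu, sigma), lo, hi)
mean = integrate(lambda x: x * gaussian_pdf(x, mu, sigma), lo, hi)
print(mass, mean)
```

The total mass should be close to one and the first moment close to μ, up to quadrature error.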

We now introduce the concept of mathematical expectation of a random variable
ξ, denoted by $E\xi \triangleq \int_\Omega \xi\,dP$, where $\int_\Omega \xi\,dP$ denotes the integral of ξ with respect to
the probability measure P. We first consider nonnegative ξ.
For each n ≥ 1, consider the ω-sets
$$A_{ni} = \Big\{\omega \in \Omega : \frac{i}{2^n} < \xi \le \frac{i+1}{2^n}\Big\}, \quad i = 0,\dots,n2^n - 1,$$
and the sequence
$$S_n = \sum_{i=0}^{n2^n-1} \frac{i}{2^n} P(A_{ni}) + nP(\xi > n). \tag{1.1.12}$$
It can be shown that Sn converges as n → ∞ and the limit is defined as the mathematical
expectation of ξ, i.e., $E\xi = \int_\Omega \xi\,dP \triangleq \lim_{n\to\infty} S_n$.
In the following, for A ∈ F, by $\int_A \xi\,dP$ we mean $\int_\Omega \xi I_A\,dP$, where $I_A$ is the indicator of A:
$$I_A(\omega) \triangleq \begin{cases} 1, & \text{if } \omega \in A,\\ 0, & \text{otherwise.}\end{cases}$$
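
The approximating sums Sn of (1.1.12) can be evaluated explicitly when the distribution of ξ is known. The sketch below assumes, purely for illustration, that ξ is exponentially distributed with unit rate, so that P(a < ξ ≤ b) = e^{−a} − e^{−b} and Eξ = 1; the function name is hypothetical.

```python
import math

def approx_expectation_exp1(n):
    """S_n from (1.1.12) for xi ~ Exp(1), where P(a < xi <= b) = e^{-a} - e^{-b}."""
    h = 2.0 ** (-n)
    total = 0.0
    for i in range(n * 2 ** n):
        p = math.exp(-i * h) - math.exp(-(i + 1) * h)   # P(A_ni)
        total += i * h * p
    return total + n * math.exp(-n)                     # n * P(xi > n)

values = [approx_expectation_exp1(n) for n in range(1, 13)]
print(values[0], values[-1])
```

The values increase with n and approach Eξ = 1 from below, which matches the construction: each Sn is the expectation of a simple function lying below ξ.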
For a random variable ξ, define the nonnegative random variables
$$\xi_+ \triangleq \max\{\xi, 0\}, \quad \xi_- \triangleq \max\{-\xi, 0\}. \tag{1.1.13}$$
It is clear that ξ = ξ+ − ξ−. The mathematical expectation of ξ is defined by
Eξ ≜ Eξ+ − Eξ− if at least one of Eξ+ and Eξ− is finite. If both Eξ+ and Eξ− are finite,
then ξ is called integrable. If both Eξ+ and Eξ− are infinite, then we say that the
mathematical expectation does not exist for ξ.

Theorem 1.1.4 (Fubini) If (Ω, F, P) is the product space of two probability spaces
(Ωi, Fi, Pi), i = 1, 2 and X = X(ω1, ω2) is a random variable on (Ω, F) for which
the mathematical expectation exists, then
$$\int_\Omega X\,dP = \int_{\Omega_1} dP_1 \int_{\Omega_2} X(\omega_1,\omega_2)\,dP_2 = \int_{\Omega_2} dP_2 \int_{\Omega_1} X(\omega_1,\omega_2)\,dP_1. \tag{1.1.14}$$
We now list some basic inequalities related to the mathematical expectation.


Chebyshev Inequality. For any ε > 0,
$$P(|\xi| > \varepsilon) \le \frac{1}{\varepsilon} E|\xi|. \tag{1.1.15}$$
Jensen Inequality. Let the Borel measurable function g be convex, i.e., g(θ1 x1 +
θ2 x2) ≤ θ1 g(x1) + θ2 g(x2) for any x1, x2 ∈ R and any θ1 ≥ 0, θ2 ≥ 0 with θ1 + θ2 = 1.
If ξ is integrable, then
$$g(E\xi) \le Eg(\xi). \tag{1.1.16}$$
Lyapunov Inequality. For any 0 < s < t,
$$(E|\xi|^s)^{\frac{1}{s}} \le (E|\xi|^t)^{\frac{1}{t}}. \tag{1.1.17}$$
Hölder Inequality. Let 1 < p < ∞, 1 < q < ∞, and 1/p + 1/q = 1. For random
variables ξ and η with E|ξ|^p < ∞ and E|η|^q < ∞, it holds that
$$E|\xi\eta| \le (E|\xi|^p)^{\frac{1}{p}} (E|\eta|^q)^{\frac{1}{q}}. \tag{1.1.18}$$
In the case p = q = 2, the Hölder inequality is also called the Schwarz inequality.
Minkowski Inequality. If E|ξ|^p < ∞ and E|η|^p < ∞ for some p ≥ 1, then
$$(E|\xi+\eta|^p)^{\frac{1}{p}} \le (E|\xi|^p)^{\frac{1}{p}} + (E|\eta|^p)^{\frac{1}{p}}. \tag{1.1.19}$$
Cr-Inequality.
$$\Big(\sum_{i=1}^n |\xi_i|\Big)^r \le C_r \sum_{i=1}^n |\xi_i|^r, \tag{1.1.20}$$
where
$$C_r = \begin{cases} 1, & \text{if } r < 1,\\ n^{r-1}, & \text{if } r \ge 1.\end{cases}$$
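
The inequalities above hold for any probability measure, in particular for the empirical measure of a finite sample, where E becomes a sample average. The sketch below checks each of them on simulated data; the sample size, distributions, exponents, and helper names are illustrative assumptions.

```python
import random

random.seed(0)
n = 1000
xi = [random.gauss(0.0, 1.0) for _ in range(n)]
eta = [random.uniform(-2.0, 2.0) for _ in range(n)]

def mean(values):
    return sum(values) / len(values)

def moment(values, p):
    """Empirical E|x|^p."""
    return mean([abs(v) ** p for v in values])

tol = 1e-12                      # slack for floating-point rounding
eps, s, t, p = 0.5, 1.0, 3.0, 3.0
q = p / (p - 1.0)                # conjugate exponent: 1/p + 1/q = 1

chebyshev = mean([1.0 if abs(v) > eps else 0.0 for v in xi]) <= moment(xi, 1) / eps + tol
jensen = mean(xi) ** 2 <= moment(xi, 2) + tol          # convex g(x) = x^2
lyapunov = moment(xi, s) ** (1 / s) <= moment(xi, t) ** (1 / t) + tol
holder = (mean([abs(a * b) for a, b in zip(xi, eta)])
          <= moment(xi, p) ** (1 / p) * moment(eta, q) ** (1 / q) + tol)
minkowski = (moment([a + b for a, b in zip(xi, eta)], p) ** (1 / p)
             <= moment(xi, p) ** (1 / p) + moment(eta, p) ** (1 / p) + tol)

def cr_holds(values, r):
    """C_r-inequality (1.1.20) for a finite list of numbers."""
    c = 1.0 if r < 1 else len(values) ** (r - 1)
    return sum(abs(v) for v in values) ** r <= c * sum(abs(v) ** r for v in values) + tol

print(chebyshev, jensen, lyapunov, holder, minkowski, cr_holds(xi[:5], 0.5), cr_holds(xi[:5], 3.0))
```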
We now introduce the concepts of convergence of random variables.

Definition 1.1.8 Let ξ and {ξn}n≥1 be random variables. The sequence {ξn}n≥1 is
said to converge to ξ with probability one or almost surely, denoted by $\xi_n \xrightarrow[n\to\infty]{\text{a.s.}} \xi$,
if $P\big(\omega : \xi_n(\omega) \xrightarrow[n\to\infty]{} \xi(\omega)\big) = 1$. {ξn}n≥1 is said to converge to ξ in probability,
denoted by $\xi_n \xrightarrow[n\to\infty]{P} \xi$, if P(|ξn − ξ| > ε) = o(1) for any ε > 0. {ξn}n≥1 is said
to weakly converge or to converge in distribution to ξ, denoted by $\xi_n \xrightarrow[n\to\infty]{w} \xi$, if
$F_{\xi_n}(x) \xrightarrow[n\to\infty]{} F_\xi(x)$ at any x where $F_\xi(x)$ is continuous. {ξn}n≥1 is said to converge
to ξ in the mean square sense if $E|\xi_n - \xi|^2 \xrightarrow[n\to\infty]{} 0$.
In what follows iff is the abbreviation of “if and only if.” The relationship between
various types of convergence is demonstrated by the following theorem.

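Theorem 1.1.5 The following results on convergence of random variables hold:
(i) If $\xi_n \xrightarrow[n\to\infty]{\text{a.s.}} \xi$, then $\xi_n \xrightarrow[n\to\infty]{P} \xi$.
(ii) If $\xi_n \xrightarrow[n\to\infty]{P} \xi$, then $\xi_n \xrightarrow[n\to\infty]{w} \xi$.
(iii) If $E|\xi_n - \xi|^2 \xrightarrow[n\to\infty]{} 0$, then $\xi_n \xrightarrow[n\to\infty]{P} \xi$.
(iv) $\xi_n \xrightarrow[n\to\infty]{\text{a.s.}} \xi$ iff $\sup_{j \ge n} |\xi_j - \xi| \xrightarrow[n\to\infty]{P} 0$ iff $\sup_{m > n} |\xi_m - \xi_n| \xrightarrow[n\to\infty]{P} 0$.
(v) $\xi_n \xrightarrow[n\to\infty]{P} \xi$ iff $\sup_{m > n} P(|\xi_m - \xi_n| > \varepsilon) = o(1)$ ∀ ε > 0.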

The following theorems concern the interchangeability of taking expectation and
taking limit.

Theorem 1.1.6 (Monotone Convergence Theorem) If {ξn}n≥1 nondecreasingly
converges to ξ with probability one and ξn ≥ η a.s. for some random variable η with
Eη− < ∞, then lim_{n→∞} Eξn = Eξ.

Theorem 1.1.7 (Fatou Lemma) If ξn ≥ η, n = 1, 2, 3, ··· for some random variable
η with Eη− < ∞, then E lim inf_{n→∞} ξn ≤ lim inf_{n→∞} Eξn.

Theorem 1.1.8 (Dominated Convergence Theorem) If |ξn| ≤ η, n = 1, 2, 3, ···,
Eη < ∞, and $\xi_n \xrightarrow[n\to\infty]{P} \xi$, then E|ξ| < ∞, $E|\xi_n - \xi| \xrightarrow[n\to\infty]{} 0$, and $E\xi_n \xrightarrow[n\to\infty]{} E\xi$.

We now introduce an important concept in probability theory, the conditional
expectation.
Let A and B be F-measurable sets. The conditional probability of set A given
B is defined by P(A|B) = P(AB)/P(B) whenever P(B) ≠ 0. Intuitively, P(A|B) describes
the probability of set A with a priori information of set B. We now generalize this
elementary concept from conditioning on a set to conditioning on a σ-algebra.

Theorem 1.1.9 (Radon–Nikodym Theorem) Let F1 be a sub-σ-algebra of F. For
any random variable ξ with Eξ well defined, there is a unique (up to sets of probability
zero) F1-measurable random variable η such that
$$\int_A \eta\,dP = \int_A \xi\,dP \quad \forall\, A \in \mathcal{F}_1. \tag{1.1.21}$$

Definition 1.1.9 The F1-measurable function η defined by (1.1.21) is called the
conditional expectation of ξ given F1, denoted by E(ξ|F1). In particular, if ξ = IB,
then η is the conditional probability of B given F1 and we write it as P(B|F1).
For random variables ξ and ζ, the conditional expectation of ξ given ζ is defined
by E(ξ|ζ) ≜ E(ξ|σ(ζ)), where σ(ζ) is the σ-algebra generated by ζ, i.e., σ(ζ)
is the smallest σ-algebra containing all sets of the form {ω : ζ(ω) ∈ B}, B ∈ B.
It is worth noting that if ξ = IB, ζ = ID, then σ(ζ) = {∅, Ω, D, D^c} and P(B|ζ) =
P(B|D)ID + P(B|D^c)I_{D^c}, where P(B|D) and P(B|D^c) are defined as in the elementary
case.
For the conditional expectation, the following properties hold.
(i) E[aξ + bη |F1 ] = aE[ξ |F1 ] + bE[η |F1 ] for any constants a and b.
(ii) If ξ ≤ η , then E[ξ |F1 ] ≤ E[η |F1 ].
(iii) There exists a Borel-measurable function f such that E[ξ |η ] = f (η ) a.s.
(iv) E[E(ξ |η )] = E ξ .
(v) E[ηξ |F1 ] = η E[ξ |F1 ] a.s. for any F1 -measurable η .
(vi) If σ -algebras F1 ⊂ F2 ⊂ F , then E[E(ξ |F2 )|F1 ] = E[ξ |F1 ].
(vii) If F1 = {Ω, ∅}, then E[ξ |F1 ] = E ξ a.s.
For a sub-σ -algebra F1 of F , the Chebyshev inequality, Jensen inequality, Lya-
punov inequality, Hölder inequality, Minkowski inequality, and Cr -inequality also
hold if the expectation operator E(·) is replaced by E(·|F1 ). The monotone conver-
gence theorem, Fatou lemma, and the dominated convergence theorem also remain
valid by replacing E(·) with E(·|F1 ). We just need to note that in the case of condi-
tional expectation these inequalities and convergence theorems hold a.s.
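
On a finite probability space, a σ-algebra is generated by a partition and the conditional expectation is the block-wise weighted average, which makes properties such as (iv) and (vi) easy to check exactly. The sketch below uses a hypothetical six-point space with uniform weights; all names and numbers are illustrative assumptions.

```python
from fractions import Fraction

# Hypothetical finite probability space: outcomes 0..5, each with probability 1/6.
prob = {w: Fraction(1, 6) for w in range(6)}
xi = {w: Fraction(w * w) for w in range(6)}       # random variable xi(w) = w^2

def cond_exp(x, partition):
    """E[x | sigma(partition)] as a function on Omega (constant on each block)."""
    out = {}
    for block in partition:
        mass = sum(prob[w] for w in block)
        avg = sum(x[w] * prob[w] for w in block) / mass
        for w in block:
            out[w] = avg
    return out

def expectation(x):
    return sum(x[w] * prob[w] for w in prob)

coarse = [{0, 1, 2, 3}, {4, 5}]          # generates F1
fine = [{0, 1}, {2, 3}, {4}, {5}]        # generates F2, with F1 contained in F2

e_fine = cond_exp(xi, fine)              # E(xi | F2)
e_coarse = cond_exp(xi, coarse)          # E(xi | F1)
tower = cond_exp(e_fine, coarse)         # E[E(xi | F2) | F1], property (vi)
print(e_coarse[0], e_coarse[4], tower == e_coarse)
```

Exact rational arithmetic is used so that the identities hold with equality rather than up to rounding.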

1.2 Independent Random Variables, Martingales, and Martingale Difference Sequences
Throughout the book the basic probability space is always denoted by (Ω, F , P).

Definition 1.2.1 The events Ai ∈ F, i = 1, ···, n are said to be mutually independent
if $P\big(\bigcap_{j=1}^m A_{i_j}\big) = \prod_{j=1}^m P(A_{i_j})$ for any subset [i1 < ··· < im] ⊂ [1, ···, n]. The
σ-algebras Fi ⊂ F, i = 1, ···, n are said to be mutually independent if
$P\big(\bigcap_{j=1}^m A_{i_j}\big) = \prod_{j=1}^m P(A_{i_j})$ for any $A_{i_j} \in \mathcal{F}_{i_j}$, j = 1, ···, m with [i1 < ··· < im] being any subset of
[1, ···, n]. The random variables {ξ1, ···, ξn} are called mutually independent if the
σ-algebras σ(ξi) generated by ξi, i = 1, ···, n are mutually independent. Let {ξi}i≥1
be a sequence of random variables. {ξi}i≥1 is called mutually independent if for any
n ≥ 1 and any set of indices {i1, ···, in}, the random variables $\{\xi_{i_k}\}_{k=1}^n$ are mutually
independent.

A sequence of random variables {ξk}k≥1 is called independent and identically
distributed (iid) if {ξk, k ≥ 1} are mutually independent with the same distribution
function.
Definition 1.2.2 The tail σ-algebra of a sequence {ξk}k≥1 is $\bigcap_{k=1}^\infty \sigma\{\xi_j, j \ge k\}$. The
sets of the tail σ-algebra are called tail events and the random variables measurable
with respect to the tail σ-algebra are called tail variables.

Theorem 1.2.1 (Kolmogorov Zero–One Law) Tail events of an iid sequence {ξk }k≥1
have probabilities either zero or one.

Proof. See Appendix A. 

Theorem 1.2.2 Let f(x, y) be a measurable function defined on (R^l × R^m, B^l × B^m).
If the l-dimensional random vector ξ is independent of the m-dimensional
random vector η and E f(ξ, η) exists, then
$$E[f(\xi,\eta) \mid \sigma(\xi)] = g(\xi) \quad \text{a.s.}, \tag{1.2.1}$$
where
$$g(x) = \begin{cases} Ef(x,\eta), & \text{if } Ef(x,\eta) \text{ exists},\\ 0, & \text{otherwise.}\end{cases}$$

Proof. See Appendix A. 

Theorem 1.2.3 (Kolmogorov Three Series Theorem) Assume the random variables
{ξk}k≥1 are mutually independent. The sum $\sum_{k=1}^\infty \xi_k$ converges almost surely iff the
following three series converge:
$$\sum_{k=1}^\infty P(|\xi_k| > 1) < \infty, \tag{1.2.2}$$
$$\sum_{k=1}^\infty E\xi_k' < \infty, \tag{1.2.3}$$
$$\sum_{k=1}^\infty E(\xi_k' - E\xi_k')^2 < \infty, \tag{1.2.4}$$
where $\xi_k' \triangleq \xi_k I_{[|\xi_k| \le 1]}$.

Theorem 1.2.4 (Marcinkiewicz–Zygmund) Assume {ξk}k≥1 are iid. Then
$$\frac{\sum_{k=1}^n \xi_k - cn}{n^{\frac{1}{p}}} \xrightarrow[n\to\infty]{} 0 \quad \text{a.s.}, \quad p \in (0, 2) \tag{1.2.5}$$
if and only if E|ξk|^p < ∞, where the constant c = Eξk if p ∈ [1, 2), while c is arbitrary
if p ∈ (0, 1).
As will be seen in the later chapters, the convergence analysis of many identification
algorithms relies on the almost sure convergence of a series of random vectors,
which may not satisfy the independence assumption, so that Theorem 1.2.3 is not
directly applicable. Therefore, we need results on a.s. convergence for sums of
dependent random variables, which are summarized in what follows.
We now introduce the concept of martingale, which is a generalization of the
sum of zero-mean mutually independent random variables, and is widely applied in
diverse research areas.

Definition 1.2.3 Let {ξk}k≥1 be a sequence of random variables and {Fk}k≥1 be
a sequence of nondecreasing σ-algebras. If ξk is Fk-measurable for each k ≥ 1,
then we call {ξk, Fk}k≥1 an adapted process. An adapted process {ξk, Fk}k≥1 with
E|ξk| < ∞ ∀k ≥ 1 is called a submartingale if E[ξn|Fm] ≥ ξm a.s. ∀n ≥ m, a
supermartingale if E[ξn|Fm] ≤ ξm a.s. ∀n ≥ m, and a martingale if it is both a
supermartingale and a submartingale, i.e., E[ξn|Fm] = ξm a.s. ∀n ≥ m. An adapted
process {ξk, Fk}k≥1 is called a martingale difference sequence (mds) if
E[ξk+1|Fk] = 0 a.s. ∀k ≥ 1.

Here we give a simple example to illustrate the definition introduced above.
Let {ηk}k≥1 be a sequence of zero-mean mutually independent random variables.
Define Fk = σ{η1, ···, ηk} and $\zeta_k = \sum_{i=1}^k \eta_i$. Then {ηk, Fk}k≥1 is an mds and
{ζk, Fk}k≥1 is a martingale.
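
The example can be verified exactly in the simplest case where each ηk is a fair ±1 coin flip (an assumption made here only for illustration). Conditioning on Fk then amounts to averaging over all paths that share the first k flips.

```python
from fractions import Fraction
from itertools import product

n = 4
paths = list(product([-1, 1], repeat=n))     # all outcomes of (eta_1, ..., eta_n), equally likely

def zeta(path, k):
    """Partial sum zeta_k = eta_1 + ... + eta_k."""
    return sum(path[:k])

def cond_exp_given_prefix(f, k):
    """E[f | F_k]: average f over all paths sharing the first k steps."""
    table = {}
    for prefix in product([-1, 1], repeat=k):
        block = [p for p in paths if p[:k] == prefix]
        table[prefix] = sum(Fraction(f(p)) for p in block) / len(block)
    return table

k = 2
mds_check = cond_exp_given_prefix(lambda p: p[k], k)           # E[eta_{k+1} | F_k]
mart_check = cond_exp_given_prefix(lambda p: zeta(p, n), k)    # E[zeta_n | F_k]
print(mds_check)
print(mart_check)
```

Every entry of `mds_check` is zero, and `mart_check` maps each prefix to its own partial sum ζk, which is the martingale property E[ζn|Fk] = ζk.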

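Theorem 1.2.5 (Doob maximal inequality) Assume {ξk}k≥1 is a nonnegative
submartingale. Then for any λ > 0,
$$P\Big(\max_{1 \le j \le n} \xi_j \ge \lambda\Big) \le \frac{1}{\lambda} \int_{[\max_{1 \le j \le n} \xi_j \ge \lambda]} \xi_n\,dP. \tag{1.2.6}$$
Further,
$$\Big(E \max_{1 \le j \le n} \xi_j^p\Big)^{\frac{1}{p}} \le \frac{p}{p-1} \big(E\xi_n^p\big)^{\frac{1}{p}} \tag{1.2.7}$$
if $E\xi_j^p < \infty$, j = 1, ···, n, where 1 < p < ∞.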

Proof. See Appendix A. 

Definition 1.2.4 Let {Fk}k≥1 be a sequence of nondecreasing σ-algebras. A measurable
function T taking values in {1, 2, 3, ···, ∞} is called a stopping time with
respect to {Fk}k≥1 if
$$\{\omega : T(\omega) = k\} \in \mathcal{F}_k \quad \forall\, k \ge 1. \tag{1.2.8}$$
In addition, if P(T = ∞) = 0 then the stopping time T is said to be finite. A finite
stopping time is also called a stopping rule or stopping variable.
Lemma 1.2.1 Let {ξk, Fk}k≥1 be adapted, T a stopping time, and B a Borel set. Let
TB be the first time at which the process {ξk}k≥1 hits the set B after time T, i.e.,
$$T_B \triangleq \begin{cases} \inf\{k : k > T,\ \xi_k \in B\},\\ \infty, \quad \text{if } \xi_k \notin B \text{ for all } k > T.\end{cases} \tag{1.2.9}$$
Then TB is a stopping time.
Proof. The conclusion follows from the following expression:
$$[T_B = k] = \bigcup_{i=0}^{k-1} \{[T = i] \cap [\xi_{i+1} \notin B, \cdots, \xi_{k-1} \notin B,\ \xi_k \in B]\} \in \mathcal{F}_k \quad \forall\, k \ge 1.$$
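
For a single observed path, the hitting time TB of (1.2.9) is a simple forward scan. The sketch below is a minimal illustration; the function name, the sample path, and the choice B = [1, ∞) are assumptions. On a finite path, a return value of infinity only means that B was not hit within the observed horizon.

```python
import math

def first_hit_after(path, T, in_B):
    """Return inf{k > T : path[k] in B} for a finite path, or math.inf if no such k.

    The text indexes the process from k = 1, so path[0] is unused padding.
    """
    for k in range(T + 1, len(path)):
        if in_B(path[k]):
            return k
    return math.inf

path = [None, 0.2, 1.5, -0.3, 2.0, 0.1, 3.0]   # hypothetical values xi_1, ..., xi_6
in_B = lambda v: v >= 1.0                       # B = [1, infinity)
print(first_hit_after(path, 2, in_B), first_hit_after(path, 4, in_B))
```

The scan only inspects values up to the returned index, which mirrors the fact that the event [TB = k] is determined by the process up to time k.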


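Let {ξk, Fk}, k = 1, ···, N be a submartingale. For a nonempty interval (a, b),
define
$$T_0 \triangleq 0,$$
$$T_1 \triangleq \begin{cases} \min\{1 \le k \le N : \xi_k \le a\},\\ N+1, \quad \text{if } \xi_k > a,\ k = 1,\cdots,N,\end{cases}$$
$$T_2 \triangleq \begin{cases} \min\{T_1 < k \le N : \xi_k \ge b\},\\ N+1, \quad \text{if } \xi_k < b\ \forall k : T_1 < k \le N, \text{ or } T_1 = N+1,\end{cases}$$
$$\vdots$$
$$T_{2m-1} \triangleq \begin{cases} \min\{T_{2m-2} < k \le N : \xi_k \le a\},\\ N+1, \quad \text{if } \xi_k > a\ \forall k : T_{2m-2} < k \le N, \text{ or } T_{2m-2} = N+1,\end{cases}$$
$$T_{2m} \triangleq \begin{cases} \min\{T_{2m-1} < k \le N : \xi_k \ge b\},\\ N+1, \quad \text{if } \xi_k < b\ \forall k : T_{2m-1} < k \le N, \text{ or } T_{2m-1} = N+1.\end{cases}$$
The largest m for which $\xi_{T_{2m}} \ge b$ (so that $T_{2m} \le N$) is called the number of up-crossings of the
interval (a, b) by the submartingale $\{\xi_k, \mathcal{F}_k\}_{k=1}^N$ and is denoted by β(a, b).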
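
The alternating times T1, T2, ··· translate into a two-state scan of the path: wait until the path is at or below a, then wait until it is at or above b, and count each completed pair. The following sketch counts up-crossings for a given finite path; the function name and the sample data are illustrative assumptions.

```python
def upcrossings(xs, a, b):
    """Count up-crossings of (a, b) by xs[0], ..., xs[N-1] (xi_1, ..., xi_N in the text)."""
    if not a < b:
        raise ValueError("need a < b")
    count = 0
    below = False            # True once the path has been <= a since the last crossing
    for x in xs:
        if not below:
            if x <= a:
                below = True         # an odd-indexed time T_{2m-1} has occurred
        elif x >= b:
            count += 1               # an even-indexed time T_{2m} has occurred
            below = False
    return count

print(upcrossings([0, 2, -1, 3, 1, 0, 5, -2], 0, 2))
```

For the sample path, the scan reaches a = 0 three times and each time subsequently reaches b = 2, so three up-crossings are counted; the final visit below a is not followed by a visit above b and does not count.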
Theorem 1.2.6 (Doob) For the submartingale $\{\xi_k, \mathcal{F}_k\}_{k=1}^N$ the following inequalities
hold:
$$E\beta(a,b) \le \frac{E(\xi_N - a)_+}{b-a} \le \frac{E(\xi_N)_+ + |a|}{b-a}, \tag{1.2.10}$$
where (ξN)+ is defined by (1.1.13).
Proof. See Appendix A. 
Theorem 1.2.7 (Doob) Let {ξk, Fk}k≥1 be a submartingale with sup_k E(ξk)+ < ∞.
Then there is a random variable ξ with E|ξ| < ∞ such that
$$\lim_{k\to\infty} \xi_k = \xi \quad \text{a.s.} \tag{1.2.11}$$
Proof. See Appendix A. 

Corollary 1.2.1 If either (i) or (ii) is satisfied, where
(i) {ξk, Fk}k≥1 is a nonnegative supermartingale or nonpositive submartingale,
(ii) {ξk, Fk}k≥1 is a martingale with sup_k E|ξk| < ∞,
then
$$\lim_{k\to\infty} \xi_k = \xi \quad \text{a.s. and} \quad E|\xi| < \infty.$$

We have presented some results on the a.s. convergence of some random series
and sub- or super-martingales. However, a martingale or an mds may converge only
on a subset of Ω rather than on the whole space. In the following we describe sets on
which a martingale or an mds converges.
Let {ξk, Fk}k≥0 with ξk ∈ R^m be an adapted sequence, and let G be a Borel set
in B^m. Then the first exit time T of {ξk}k≥0 from G defined by
$$T = \begin{cases} \min\{k : \xi_k \notin G\},\\ \infty, \quad \text{if } \xi_k \in G\ \forall\, k \ge 0\end{cases}$$
is a stopping time. This is because {T = k} = {ξ0 ∈ G, ξ1 ∈ G, ···, ξk−1 ∈ G, ξk ∉ G} ∈ Fk.

Lemma 1.2.2 Let {ξk, Fk}k≥0 be a martingale (supermartingale, submartingale)
and T be a stopping time. Then the process {ξ_{T∧k}, Fk}k≥0 is again a martingale
(supermartingale, submartingale), where T ∧ k ≜ min(T, k).

Proof. See Appendix A. 

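Theorem 1.2.8 Let {ξk, Fk}k≥0 be a one-dimensional mds. Then as k → ∞, the
sequence $\{\eta_k\}_{k \ge 0}$ with $\eta_k = \sum_{i=0}^k \xi_i$ converges on
$$A \triangleq \Big\{\omega : \sum_{k=1}^\infty E\big(\xi_k^2 \mid \mathcal{F}_{k-1}\big) < \infty\Big\}. \tag{1.2.12}$$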

Proof. See Appendix A. 


Theorem 1.2.9 Let {ξk, Fk}k≥0 be an mds and let $\eta_k = \sum_{i=0}^k \xi_i$, k ≥ 0.

(i) If E(sup_k ξk)+ < ∞, then ηk converges a.s. on A1 ≜ {ω : sup_k ηk < ∞}.

(ii) If E(inf_k ξk)− < ∞, then ηk converges a.s. on A2 ≜ {ω : inf_k ηk > −∞}.

Proof. It suffices to prove (i) since (ii) is reduced to (i) if ξk is replaced by −ξk .
The detailed proof is given in Appendix A. 
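Theorem 1.2.10 (Borel–Cantelli–Lévy) Let {Bk}k≥1 be a sequence of events, Bk ∈ Fk.
Then
$$\sum_{k=1}^\infty I_{B_k} < \infty \tag{1.2.13}$$
iff
$$\sum_{k=1}^\infty P(B_k \mid \mathcal{F}_{k-1}) < \infty, \tag{1.2.14}$$
or equivalently
$$\bigcap_{k=1}^\infty \bigcup_{i=k}^\infty B_i = \Big\{\omega : \sum_{k=1}^\infty P(B_k \mid \mathcal{F}_{k-1}) = \infty\Big\}. \tag{1.2.15}$$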
Proof. See Appendix A. 
Theorem 1.2.11 (Borel–Cantelli) Let {Bk}k≥1 be a sequence of events.
(i) If $\sum_{k=1}^\infty P(B_k) < \infty$, then the probability that Bk, k ≥ 1 occur infinitely often
is zero.
(ii) If Bk, k ≥ 1 are mutually independent and $\sum_{k=1}^\infty P(B_k) = \infty$, then
P(Bk i.o.) = 1.
Proof. See Appendix A. 
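Lemma 1.2.3 Let {yk, Fk} be an adapted process and {bk} a sequence of positive
numbers. Then
$$\Big\{\omega : \sum_{k=1}^\infty y_k \text{ converges}\Big\} \cap A = \Big\{\omega : \sum_{k=1}^\infty y_k I_{[|y_k| \le b_k]} \text{ converges}\Big\} \cap A, \tag{1.2.16}$$
where
$$A = \Big\{\omega : \sum_{k=1}^\infty P\big(|y_k| > b_k \mid \mathcal{F}_{k-1}\big) < \infty\Big\}. \tag{1.2.17}$$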
Proof. See Appendix A. 
The following result is a generalization of Theorem 1.2.3.
Theorem 1.2.12 Denote by S the ω-set where the following three series converge:
$$\sum_{k=1}^\infty P\big(|y_k| > c \mid \mathcal{F}_{k-1}\big) < \infty, \tag{1.2.18}$$
$$\sum_{k=1}^\infty E\big(y_k I_{[|y_k| \le c]} \mid \mathcal{F}_{k-1}\big) < \infty, \tag{1.2.19}$$
$$\sum_{k=1}^\infty \Big\{E\big(y_k^2 I_{[|y_k| \le c]} \mid \mathcal{F}_{k-1}\big) - \big(E\big(y_k I_{[|y_k| \le c]} \mid \mathcal{F}_{k-1}\big)\big)^2\Big\} < \infty, \tag{1.2.20}$$
where c is a positive constant. Then $\eta_k = \sum_{i=1}^k y_i$ converges on S as k → ∞.

Proof. See Appendix A. 


Theorem 1.2.13 Let {ξk, Fk} be an mds. Then $\eta_k \triangleq \sum_{i=1}^k \xi_i$ converges on
$$A \triangleq \Big\{\omega : \sum_{k=1}^\infty E\big(|\xi_k|^p \mid \mathcal{F}_{k-1}\big) < \infty\Big\} \quad \text{for } 0 < p \le 2. \tag{1.2.21}$$

Theorem 1.2.13 generalizes Theorem 1.2.8. For its proof we refer to Appendix A.
For analyzing the asymptotic properties of stochastic systems we often need to
know the behavior of weighted partial sums of an mds. In the sequel we introduce
such a result, which is used frequently in later chapters.
For a sequence of matrices {Mk}k≥1 and a sequence of nondecreasing positive
numbers {bk}k≥1, by Mk = O(bk) we mean
$$\limsup_{k\to\infty} \|M_k\| / b_k < \infty$$
and by Mk = o(bk),
$$\lim_{k\to\infty} \|M_k\| / b_k = 0.$$
We introduce a technical lemma, known as the Kronecker lemma.

Lemma 1.2.4 (Kronecker lemma) If {bk}k≥1 is a sequence of positive numbers
nondecreasingly diverging to infinity and if for a sequence of matrices {Mk}k≥1,
$$\sum_{k=1}^\infty \frac{1}{b_k} M_k < \infty, \tag{1.2.22}$$
then $\sum_{i=1}^k M_i = o(b_k)$.

Proof. See Appendix A. 
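
A scalar illustration of the Kronecker lemma, with sequences chosen here only as an example: take Mk = (−1)^k √k and bk = k. The weighted series Σ Mk/bk = Σ (−1)^k/√k converges as an alternating series, so the lemma predicts that (1/k) Σ_{i≤k} Mi tends to zero.

```python
import math

def kronecker_demo(n):
    """Return (weighted_sum, ratio) for M_k = (-1)^k sqrt(k), b_k = k, k = 1..n."""
    weighted = 0.0     # sum of M_k / b_k, a convergent alternating series
    partial = 0.0      # sum of M_k
    for k in range(1, n + 1):
        m_k = (-1) ** k * math.sqrt(k)
        weighted += m_k / k
        partial += m_k
    return weighted, partial / n          # partial / b_n should tend to zero

ratios = [abs(kronecker_demo(10 ** j)[1]) for j in range(1, 6)]
print(ratios)
```

The printed ratios shrink as n grows, although the partial sums of Mk themselves are unbounded.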


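Based on Lemma 1.2.4, the following estimate for the weighted sum of an mds
holds.

Theorem 1.2.14 Let {ξk, Fk} be an l-dimensional mds and {Mk, Fk} a matrix
adapted process. If
$$\sup_k E\big(\|\xi_{k+1}\|^\alpha \mid \mathcal{F}_k\big) \triangleq \sigma < \infty \quad \text{a.s.}$$
for some α ∈ (0, 2], then as k → ∞
$$\sum_{i=0}^k M_i \xi_{i+1} = O\Big(s_k(\alpha) \big(\log(s_k^\alpha(\alpha) + e)\big)^{\frac{1}{\alpha}+\eta}\Big) \quad \text{a.s.} \quad \forall\, \eta > 0, \tag{1.2.23}$$
where $s_k(\alpha) = \big(\sum_{i=0}^k \|M_i\|^\alpha\big)^{\frac{1}{\alpha}}$.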
Proof. See Appendix A. 

Example 1.2.1 In the one-dimensional case, if {ξk}k≥1 is iid with Eξk = 0 and $E\xi_k^2 < \infty$,
then from Theorem 1.2.14 we have $\sum_{i=1}^k \xi_i = O\big(k^{\frac{1}{2}} (\log k)^{\frac{1}{2}+\eta}\big)$ a.s. ∀ η > 0.
Thus the estimate given by Theorem 1.2.14 is not as sharp as that given by the law
of the iterated logarithm, but the conditions required here are much more general.

1.3 Markov Chains with State Space (Rm , B m )


In systems and control, many dynamic systems are modeled as discrete-time
stochastic systems, which are closely connected with Markov chains. To see this, let
us consider the following example.

Example 1.3.1 The ARX system is given by

yk+1 = a1 yk + · · · + a p yk+1− p + b1 uk + · · · + bq uk+1−q + εk+1 , (1.3.1)

where uk and yk are the system input and output, respectively, and {εk} is a sequence
of mutually independent zero-mean random variables. Define
$$\varphi_k = \begin{bmatrix} y_k \\ \vdots \\ y_{k+1-p} \\ u_k \\ \vdots \\ u_{k+1-q} \end{bmatrix}, \quad
A = \begin{bmatrix}
a_1 & \cdots & a_{p-1} & a_p & b_1 & \cdots & b_{q-1} & b_q \\
1 & & & 0 & 0 & \cdots & \cdots & 0 \\
 & \ddots & & \vdots & \vdots & & & \vdots \\
 & & 1 & 0 & 0 & \cdots & \cdots & 0 \\
0 & \cdots & \cdots & 0 & 0 & \cdots & \cdots & 0 \\
0 & \cdots & \cdots & 0 & 1 & & & 0 \\
\vdots & & & \vdots & & \ddots & & \vdots \\
0 & \cdots & \cdots & 0 & & & 1 & 0
\end{bmatrix}, \quad
\xi_k = \begin{bmatrix} \varepsilon_k \\ 0 \\ \vdots \\ 0 \\ u_k \\ 0 \\ \vdots \\ 0 \end{bmatrix}. \tag{1.3.2}$$

Then (1.3.1) can be rewritten as

ϕk+1 = Aϕk + ξk+1 , (1.3.3)

and hence the regressor sequence {ϕk }k≥0 is a Markov chain valued in (R p+q ,B p+q )
provided {ξk }k≥0 is a sequence of mutually independent random vectors. Thus, in
a certain sense, the analysis of the system (1.3.1) can resort to investigating the
properties of the chain {ϕk }k≥0 .
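
The rewriting of (1.3.1) as (1.3.3) can be checked numerically by running both recursions on the same data. The sketch below builds A as in (1.3.2) using plain lists; the coefficient values, the Gaussian input and noise, the zero initial history, and the helper names `companion`, `step`, and `past` are all assumptions made for illustration.

```python
import random

def companion(a, b):
    """Build A of (1.3.2) for coefficient lists a (length p) and b (length q)."""
    p, q = len(a), len(b)
    n = p + q
    A = [[0.0] * n for _ in range(n)]
    A[0] = list(a) + list(b)              # first row: a_1..a_p, b_1..b_q
    for i in range(1, p):                 # shift the stored outputs down
        A[i][i - 1] = 1.0
    for i in range(p + 1, n):             # shift the stored inputs down
        A[i][i - 1] = 1.0
    return A                              # row p stays zero: u_{k+1} enters via xi_{k+1}

def step(A, phi, eps_next, u_next, p):
    """One step of phi_{k+1} = A phi_k + xi_{k+1}."""
    nxt = [sum(r * v for r, v in zip(row, phi)) for row in A]
    nxt[0] += eps_next
    nxt[p] += u_next
    return nxt

def past(seq, k):
    """Value at time k, with zero history before time 0."""
    return seq[k] if k >= 0 else 0.0

random.seed(1)
a, b = [0.5, -0.2], [1.0, 0.3, 0.1]       # hypothetical coefficients, p = 2, q = 3
p, q = len(a), len(b)
A = companion(a, b)
N = 50
u = [random.gauss(0.0, 1.0) for _ in range(N + 1)]
eps = [random.gauss(0.0, 0.1) for _ in range(N + 1)]

# Direct recursion (1.3.1), started from y_0 = 0.
y = [0.0] * (N + 1)
for k in range(N):
    y[k + 1] = (sum(a[i] * past(y, k - i) for i in range(p))
                + sum(b[j] * past(u, k - j) for j in range(q))
                + eps[k + 1])

# State-space recursion (1.3.3), started from the matching regressor phi_0.
phi = [past(y, -i) for i in range(p)] + [past(u, -j) for j in range(q)]
max_gap = 0.0
for k in range(N):
    phi = step(A, phi, eps[k + 1], u[k + 1], p)
    max_gap = max(max_gap, abs(phi[0] - y[k + 1]))
print(max_gap)
```

The first component of the regressor reproduces the output of the direct recursion up to rounding, and the (p+1)-th row of A is zero because the new input uk+1 enters only through ξk+1.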

In this section we introduce some results on vector-valued Markov chains. These
results, extending the corresponding results for chains valued in a countable state
space, will frequently be used in the later chapters to establish the a.s. convergence
of recursive algorithms.
Assume {xk }k≥0 is a sequence of random vectors valued in Rm . If

P{xk+1 ∈ A|xk , · · · , x0 } = P{xk+1 ∈ A|xk } (1.3.4)

for any A ∈ B m , then the sequence {xk }k≥0 is said to be a Markov chain with the
state space (Rm , B m ). Further, if the right-hand side of (1.3.4) does not depend on
the time index k, i.e.,

P{xk+1 ∈ A|xk } = P{x1 ∈ A|x0 }, (1.3.5)

then the chain {xk}k≥0 is said to be homogeneous.
In the rest of the section, all chains are assumed to be homogeneous unless
stated otherwise.
For the chain {xk}k≥0, denote the one-step transition probability and the k-step
transition probability by P(x, A) = P{x1 ∈ A|x0 = x} and Pk(x, A) = P{xk ∈ A|x0 = x},
respectively, where x ∈ R^m and A ∈ B^m. It holds that
$$P_k(x, A) = \int_{\mathbb{R}^m} P_{k-1}(y, A) P(x, dy). \tag{1.3.6}$$

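Denote by Pk(·) the probability measure induced by xk: Pk(A) = P{xk ∈ A}, A ∈ B^m.
Assume
$$\int_{\mathbb{R}^m} P(x, A) P_0(dx) = P_0(A) \quad \forall\, A \in \mathcal{B}^m \tag{1.3.7}$$
for some initial probability measure P0(·) of x0. Then by (1.3.6) and Theorem 1.1.4,
it can inductively be proved that for any A ∈ B^m,
$$P_1(A) = \int_{\mathbb{R}^m} P(x, A) P_0(dx) = P_0(A),$$
$$\begin{aligned}
P_2(A) &= \int_{\mathbb{R}^m} P_2(x, A) P_0(dx) = \int_{\mathbb{R}^m} \Big(\int_{\mathbb{R}^m} P(y, A) P(x, dy)\Big) P_0(dx) \\
&= \int_{\mathbb{R}^m} P(y, A) \int_{\mathbb{R}^m} P(x, dy) P_0(dx) = \int_{\mathbb{R}^m} P(y, A) P_1(dy) \\
&= \int_{\mathbb{R}^m} P(y, A) P_0(dy) = P_1(A) = P_0(A),
\end{aligned}$$
and further,
$$P_k(A) = P_0(A), \quad k \ge 1.$$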
The initial probability measure P0 (·) of x0 satisfying (1.3.7) is called the invariant
probability measure of the chain {xk }k≥0 .
It should be noted that for a chain {xk}k≥0, an invariant probability measure does
not always exist, and if one exists, it may not be unique.
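
A worked scalar example may help fix ideas. It is not taken from the text: the model x_{k+1} = a x_k + e_{k+1} with iid Gaussian noise e_k of variance s² (independent of x_0) is an assumption chosen because the invariance condition (1.3.7) can be solved by hand.

```latex
% Hypothetical scalar model: x_{k+1} = a x_k + e_{k+1}, e_k iid N(0, s^2), independent of x_0.
% The one-step transition probability is P(x, \cdot) = N(a x, s^2).
% If P_0 = N(0, v), then x_1 = a x_0 + e_1 is Gaussian with
\[
  E x_1 = 0, \qquad E x_1^2 = a^2 v + s^2 .
\]
% Condition (1.3.7), i.e. P_1 = P_0, therefore holds iff
\[
  v = a^2 v + s^2 \quad\Longleftrightarrow\quad v = \frac{s^2}{1 - a^2}, \qquad |a| < 1 .
\]
% For a = 1 (a Gaussian random walk) no invariant probability measure exists:
% its characteristic function \psi would have to satisfy
\[
  \psi(t) = \psi(t)\, e^{-s^2 t^2 / 2} \quad \forall\, t ,
\]
% which forces \psi(t) = 0 for t \neq 0 and contradicts continuity of \psi at t = 0.
```

So for |a| < 1 the measure N(0, s²/(1 − a²)) is invariant, while the case a = 1 illustrates that an invariant probability measure need not exist.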
Denote the total variation norm of a signed measure ν(·) on (R^m, B^m) by ‖ν‖var,
i.e.,
$$\|\nu\|_{\mathrm{var}} = \int_{\mathbb{R}^m} \nu_+(dx) + \int_{\mathbb{R}^m} \nu_-(dx),$$

where ν = ν+ − ν− is the Jordan–Hahn decomposition of ν (see Theorem 1.1.1).

Definition 1.3.1 The chain {xk}k≥0 is called ergodic if there exists a probability
measure PIV(·) on (R^m, B^m) such that
$$\|P_k(x, \cdot) - P_{IV}(\cdot)\|_{\mathrm{var}} \xrightarrow[k\to\infty]{} 0 \tag{1.3.8}$$
for any x ∈ R^m. Further, if there exist constants 0 < ρ < 1 and M > 0 possibly
depending on x, i.e., M = M(x) such that
$$\|P_k(x, \cdot) - P_{IV}(\cdot)\|_{\mathrm{var}} \le M(x) \rho^k, \tag{1.3.9}$$
then the chain {xk}k≥0 is called geometrically ergodic.

The probability measure PIV(·) is, in fact, the invariant probability measure of
{xk}k≥0. It is clear that if the chain {xk}k≥0 is ergodic, then its invariant probability
measure is unique. In what follows we introduce criteria for ergodicity and geometric
ergodicity of the chain {xk}k≥0 valued in (R^m, B^m). For this, we first introduce some
definitions and related results, on which the ergodicity of Markov chains is essentially
based.

Definition 1.3.2 The chain {xk}k≥0 valued in (R^m, B^m) is called μ-irreducible if
there exists a measure μ(·) on (R^m, B^m) such that
$$\sum_{k=1}^\infty P_k(x, A) > 0 \tag{1.3.10}$$
for any x ∈ R^m and any A ∈ B^m with μ(A) > 0. The measure μ(·) is called the
maximal irreducibility measure of {xk}k≥0 if

(i) {xk}k≥0 is μ-irreducible;
(ii) for any other measure μ′(·) on (R^m, B^m), {xk}k≥0 is μ′-irreducible if and
only if μ′(·) is absolutely continuous with respect to μ(·);
(iii) $\mu\big\{x : \sum_{k=1}^\infty P_k(x, A) > 0\big\} = 0$ whenever μ(A) = 0.
Formula (1.3.10) indicates that for the μ -irreducible chain {xk }k≥0 , starting from
any initial state x0 = x ∈ Rm , the probability that in a finite number of steps the
sequence {xk }k≥0 enters any set A with positive μ -measure is always positive. In the
following, when we say that the chain {xk }k≥0 is μ -irreducible, we implicitly assume
that μ (·) is the maximal irreducibility measure of {xk }k≥0 .

Definition 1.3.3 Suppose A1, ···, Ad are disjoint sets in B^m. For the chain {xk}k≥0,
if
(i) P(x, Ai+1) = 1 ∀ x ∈ Ai, i = 1, ···, d − 1,
and
(ii) P(x, A1) = 1 ∀ x ∈ Ad,
then {A1, ···, Ad} is called a d-cycle of {xk}k≥0.
The d-cycle is called maximal if
(iii) there exists a measure ν(·) on (R^m, B^m) such that
$$\nu(A_i) > 0,\ i = 1, \cdots, d \quad \text{and} \quad \nu\Big(\mathbb{R}^m \Big\backslash \bigcup_{i=1}^d A_i\Big) = 0, \tag{1.3.11}$$
and
(iv) for any sets {A′1, ···, A′d′} satisfying (i) and (ii) with d replaced by d′, d′ must
divide d.
The integer d is called the period of {xk}k≥0 if the d-cycle of {xk}k≥0 is maximal.
When the period equals 1, the chain {xk}k≥0 is called aperiodic.

The small set is another concept related to ergodicity of Markov chains valued
in (Rm , B m ). Let us first recall the ergodic criterion for Markov chains valued in a
countable state space.
Suppose that the chain {ϕk}k≥0 takes values in {1, 2, 3, ···} and its transition
probability is denoted by pij = P{ϕk+1 = j|ϕk = i}, i, j = 1, 2, 3, ···. It is known that
if {ϕk}k≥0 is irreducible and aperiodic, and there exist a finite set C ⊂ {1, 2, 3, ···}, a
nonnegative function g(·), and constants K > 0 and δ > 0 such that
$$E[g(\varphi_{k+1}) \mid \varphi_k = j] < g(j) + K \quad \forall\, j \in C, \tag{1.3.12}$$
$$E[g(\varphi_{k+1}) \mid \varphi_k = j] < g(j) - \delta \quad \forall\, j \notin C, \tag{1.3.13}$$
then {ϕk}k≥0 is ergodic, i.e.,
$$\lim_{k\to\infty} P\{\varphi_k = j \mid \varphi_0 = i\} = \pi_j, \quad j = 1, 2, \cdots \tag{1.3.14}$$
with πj ≥ 0 and $\sum_{j=1}^\infty \pi_j = 1$. The concept of a small set can be regarded as an
extension of the above finite subset C.
In the sequel, we adopt the following notation:
$$E(s(x_n) \mid x_0 = x) \triangleq \int_{\mathbb{R}^m} s(y) P_n(x, dy), \quad E_\nu(s) \triangleq \int_{\mathbb{R}^m} s(x) \nu(dx), \tag{1.3.15}$$
where Pn(x, ·) is the n-step transition probability of the chain {xk}k≥0, s(x) is a
measurable function on (R^m, B^m), and ν(·) is a measure on (R^m, B^m).

Definition 1.3.4 Assume {xk }k≥0 is a μ -irreducible chain. We say that {xk }k≥0 sat-
isfies the minorization condition M(m0 , β , s, ν ), where m0 ≥ 1 is an integer, β > 0 a
constant, s(x) a nonnegative measurable function on (Rm , B m ) with Eμ (s) > 0, and
ν (·) a probability measure on (Rm , B m ), if

Pm0 (x, A) ≥ β s(x)ν (A) ∀ x ∈ Rm ∀ A ∈ B m . (1.3.16)

The function s(x) and the probability measure ν (·) are called the small function and
small measure, respectively. If s(x) equals some indicator function, i.e., s(x) = IC (x)
for some C ∈ B m with μ (C) > 0 and

Pm0 (x, A) ≥ β ν (A) ∀ x ∈ C ∀ A ∈ B m , (1.3.17)

then C is called a small set.
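
To see what (1.3.17) requires in a concrete case, consider the following worked example. It is not from the text: the scalar model x_{k+1} = a x_k + e_{k+1}, with iid noise e_k having a positive continuous density f_e, and the choice of the reference interval [−1, 1] are assumptions made for illustration.

```latex
% Hypothetical scalar model: x_{k+1} = a x_k + e_{k+1}, with e_k iid,
% density f_e positive and continuous on R. Then P(x, A) = \int_A f_e(y - a x)\,dy.
% Fix c > 0, let C = [-c, c], and let \ell denote Lebesgue measure on R. For x \in C,
\[
  P(x, A) \;\ge\; \int_{A \cap [-1, 1]} f_e(y - a x)\, dy
          \;\ge\; \kappa\, \ell\big(A \cap [-1, 1]\big),
  \qquad
  \kappa \triangleq \inf_{|z| \le 1 + |a| c} f_e(z) > 0 ,
\]
% since |y - a x| \le 1 + |a| c for y \in [-1, 1], x \in C, and a positive
% continuous function is bounded away from zero on a compact interval.
% With the probability measure \nu(A) \triangleq \tfrac12 \ell(A \cap [-1, 1]) and \beta \triangleq 2\kappa,
\[
  P(x, A) \;\ge\; \beta\, \nu(A) \qquad \forall\, x \in C \ \ \forall\, A \in \mathcal{B},
\]
% which is (1.3.17) with m_0 = 1.
```

Since P(x, A) > 0 for every x whenever A has positive Lebesgue measure, this chain is irreducible with respect to Lebesgue measure, so C has positive measure under the maximal irreducibility measure and is a small set in the sense of Definition 1.3.4.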

Lemma 1.3.1 Suppose that the μ -irreducible chain {xk }k≥0 satisfies the minoriza-
tion condition M(m0 , β , s, ν ). Then
(i) the small measure ν (·) is also an irreducibility measure for {xk }k≥0 , and

(ii) the set C  {x : s(x) ≥ γ } for any constant γ > 0 is small, whenever it is
μ -positive.

Proof. See Appendix A. 

Lemma 1.3.2 Suppose that the chain {xk }k≥0 is μ -irreducible. Then,
(i) for any set B with μ (B) > 0, there exists a small set C ⊂ B;
(ii) if s(x) is small, so is E(s(xn )|x0 = x) ∀ n ≥ 1; and
(iii) if both s(·) and s′(·) are small functions, so is s(·) + s′(·).

Proof. See Appendix A. 


The following results are useful for checking whether a Markov chain is aperiodic
or whether a set is small.

Theorem 1.3.1 Suppose that the chain {xk }k≥0 is μ -irreducible. If either
(i) there exists a small set C ∈ B m with μ (C) > 0 and an integer n, possibly
depending on C, such that

Pn (x,C) > 0, Pn+1 (x,C) > 0 ∀ x ∈ C, (1.3.18)


or

(ii) there exists a set A ∈ B m with μ (A) > 0 such that for any B ⊂ A, B ∈ B m
with μ (B) > 0 and for some positive integer n possibly depending on B,

Pn (x, B) > 0, Pn+1 (x, B) > 0 ∀ x ∈ B, (1.3.19)

then {xk }k≥0 is aperiodic.

Proof. See Appendix A. 

Theorem 1.3.2 Suppose that {xk}k≥0 is a μ-irreducible, aperiodic Markov chain
valued in (R^m, B^m).
(i) Let s(x) be a small function. Then, any set C with μ(C) > 0 satisfying
$$\inf_{x \in C} \sum_{k=0}^l E(s(x_k) \mid x_0 = x) > 0 \tag{1.3.20}$$
for some integer l ≥ 0 is a small set.
(ii) Any set C with μ(C) > 0 satisfying the following condition is a small set:
there exists some A ∈ B^m with μ(A) > 0 such that for any B ⊂ A with μ(B) > 0,
$$\inf_{x \in C} \sum_{k=0}^l P_k(x, B) > 0, \tag{1.3.21}$$
where the integer l ≥ 0 may depend on B.

Proof. See Appendix A. 

Theorem 1.3.3 Assume that the chain {xk}k≥0 is irreducible and aperiodic. If there
exist a nonnegative measurable function g(·), a small set S, and constants ρ ∈ (0, 1),
c1 > 0, and c2 > 0 such that
$$E[g(x_{k+1}) \mid x_k = x] \le \rho g(x) - c_1 \quad \forall\, x \notin S, \tag{1.3.22}$$
$$E[g(x_{k+1}) \mid x_k = x] \le c_2 \quad \forall\, x \in S, \tag{1.3.23}$$
then there exist a probability measure PIV(·) and a nonnegative measurable function
M(x) such that
$$\|P_k(x, \cdot) - P_{IV}(\cdot)\|_{\mathrm{var}} \le M(x) \rho^k. \tag{1.3.24}$$
Further, the nonnegative function M(x) in (1.3.24) can be selected such that M(x) =
a + bg(x), where a ≥ 0, b ≥ 0 are constants, and $\int_{\mathbb{R}^m} g(x) P_{IV}(dx) < \infty$.
Under assumptions different from those required in Theorem 1.3.3 we have dif-
ferent kinds of ergodicity. To this end, we introduce the following definition.

Definition 1.3.5 For the chain {xk }k≥0 , the following property is called the Doeblin
condition: There exist a probability measure ν (·) and some constants 0 < ε < 1, 0 <
δ < 1 such that Pk0 (x, A) ≥ δ ∀ x ∈ Rm for an integer k0 whenever ν (A) > ε .

Theorem 1.3.4 Suppose that the chain {xk}k≥0 is irreducible and aperiodic. If
{xk}k≥0 satisfies the Doeblin condition, then there exist a probability measure PIV(·)
and constants M > 0, 0 < ρ < 1 such that
$$\|P_k(x, \cdot) - P_{IV}(\cdot)\|_{\mathrm{var}} \le M \rho^k. \tag{1.3.25}$$

Theorem 1.3.5 Assume that the chain {xk}k≥0 is irreducible and aperiodic. If there
exist a nonnegative measurable function g(·), a small set S, and constants c1 > 0 and
c2 > 0 such that
$$E[g(x_{k+1}) \mid x_k = x] \le g(x) - c_1 \quad \forall\, x \notin S, \tag{1.3.26}$$
$$E[g(x_{k+1}) \mid x_k = x] \le c_2 \quad \forall\, x \in S, \tag{1.3.27}$$
then there exists a probability measure PIV(·) such that
$$\|P_k(x, \cdot) - P_{IV}(\cdot)\|_{\mathrm{var}} \xrightarrow[k\to\infty]{} 0. \tag{1.3.28}$$

Theorems 1.3.3, 1.3.4, and 1.3.5 are usually called the geometrically ergodic cri-
terion, the uniformly ergodic criterion, and the ergodic criterion, respectively. It is
clear that the geometrical ergodicity is stronger than the ergodicity, but weaker than
the uniform ergodicity. Next, we show that for a large class of stochastic dynamic
systems the geometrical ergodicity takes place if a certain stability condition holds.
By μn (·) we denote the Lebesgue measure on (Rn , B n ).
Let us consider the ergodicity of the single-input single-output (SISO) nonlinear
ARX (NARX) system

yk+1 = f (yk , · · · , yk+1− p , uk , · · · , uk+1−q ) + εk+1 , (1.3.29)

where uk and yk are the system input and output, respectively, εk is the noise, (p, q)
are the known system orders, and f(·) is a nonlinear function.
The NARX system (1.3.29) is a straightforward generalization of the linear ARX
system and covers a large class of dynamic phenomena. This point will be made clear
in the later chapters.
By denoting
$$x_k \triangleq [y_k, \cdots, y_{k+1-p}, u_k, \cdots, u_{k+1-q}]^T,$$
$$\varphi_1(x_k) \triangleq [f(y_k, \cdots, y_{k+1-p}, u_k, \cdots, u_{k+1-q}), y_k, \cdots, y_{k+2-p}]^T,$$
$$\varphi_2(x_k) \triangleq [0, u_k, \cdots, u_{k+2-q}]^T,$$
$$\varphi(x_k) \triangleq [\varphi_1(x_k)^T\ \varphi_2(x_k)^T]^T,$$
and
$$\xi_{k+1} \triangleq [\underbrace{\varepsilon_{k+1}, 0, \cdots, 0}_{p}, \underbrace{u_{k+1}, 0, \cdots, 0}_{q}]^T,$$
the NARX system (1.3.29) is transformed to the following state space model
$$x_{k+1} = \varphi(x_k) + \xi_{k+1}. \tag{1.3.30}$$

Thus {xk }k≥0 is a Markov chain if {ξk }k≥0 satisfies certain probability condi-
tions, e.g., if {ξk }k≥0 is a sequence of mutually independent random variables. Er-
godicity of {xk }k≥0 can be investigated by the results given in the preceding sections.
To better understand the essence of the approach, let us consider the first order (i.e.,
p = q = 1) NARX system:

yk+1 = f (yk , uk ) + εk+1 . (1.3.31)

We need the following conditions.

A1.3.1 The input $\{u_k\}_{k\ge 0}$ is a sequence of iid random variables with $Eu_k = 0$, $Eu_k^2 < \infty$,
and with a probability density function $f_u(\cdot)$, which is positive and continuous on $\mathbb{R}$;

A1.3.2 $\{\varepsilon_k\}_{k\ge 0}$ is a sequence of iid random variables with $E\varepsilon_k = 0$, $E\varepsilon_k^2 < \infty$, and
with a density function $f_\varepsilon(\cdot)$, which is positive and uniformly continuous on $\mathbb{R}$;

A1.3.3 $\{\varepsilon_k\}_{k\ge 0}$ and $\{u_k\}_{k\ge 0}$ are mutually independent;

A1.3.4 $f(\cdot,\cdot)$ is continuous on $\mathbb{R}^2$ and there exist constants $0 < \lambda < 1$, $c_1 > 0$, $c_2 > 0$,
and $l > 0$ such that $|f(\xi_1,\xi_2)| \le \lambda|\xi_1| + c_1|\xi_2|^l + c_2$ $\forall\, \xi = [\xi_1\ \xi_2]^T \in \mathbb{R}^2$,
where $\lambda$, $c_1$, $c_2$, and $l$ may be unknown;

A1.3.5 $E|u_k|^l < \infty$ and the initial value $y_0$ satisfies $E|y_0| < \infty$.

By denoting $x_k \triangleq [y_k\ u_k]^T$, $\varphi(x_k) \triangleq [f(y_k,u_k)\ \ 0]^T$, and $\xi_k \triangleq [\varepsilon_k\ u_k]^T$, the NARX
system (1.3.31) is rewritten as follows:

$$x_{k+1} = \varphi(x_k) + \xi_{k+1}. \tag{1.3.32}$$

Under the conditions A1.3.1–A1.3.3, it is clear that the state vector sequence
$\{x_k\}_{k\ge 0}$ defined by (1.3.32) is a time-homogeneous Markov chain valued in
$(\mathbb{R}^2, \mathscr{B}^2)$. As will be seen in what follows, Assumption A1.3.4 is a kind of stability
condition that guarantees ergodicity of $\{x_k\}_{k\ge 0}$.
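The rewriting of (1.3.31) as the state space model (1.3.32) can be checked numerically. The following is a minimal sketch, not taken from the text: the nonlinearity `f` is a hypothetical choice satisfying A1.3.4 with $\lambda = 0.5$, and the Gaussian input and noise are assumptions that satisfy A1.3.1–A1.3.3. It iterates both forms and lets one confirm that the first state component reproduces the output $y_k$ and the second reproduces $u_k$.

```python
import math
import random


def f(y, u):
    # Hypothetical nonlinearity (an assumption, not from the text):
    # |f(y, u)| <= 0.5*|y| + 1, so A1.3.4 holds with lambda = 0.5.
    return 0.5 * y + math.sin(u)


def phi(x):
    # phi(x_k) = [f(y_k, u_k), 0]^T as in (1.3.32).
    y, u = x
    return (f(y, u), 0.0)


def simulate_state_space(y0, u_seq, eps_seq):
    # Iterate x_{k+1} = phi(x_k) + xi_{k+1} with xi_{k+1} = [eps_{k+1}, u_{k+1}]^T.
    x = (y0, u_seq[0])
    states = [x]
    for k in range(len(u_seq) - 1):
        px = phi(x)
        x = (px[0] + eps_seq[k + 1], px[1] + u_seq[k + 1])
        states.append(x)
    return states


def simulate_direct(y0, u_seq, eps_seq):
    # Iterate y_{k+1} = f(y_k, u_k) + eps_{k+1} as in (1.3.31).
    ys = [y0]
    for k in range(len(u_seq) - 1):
        ys.append(f(ys[-1], u_seq[k]) + eps_seq[k + 1])
    return ys


def make_inputs(n, seed=0):
    # iid standard Gaussian input and noise (positive continuous densities).
    rng = random.Random(seed)
    u_seq = [rng.gauss(0.0, 1.0) for _ in range(n)]
    eps_seq = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return u_seq, eps_seq
```

Calling `simulate_state_space` and `simulate_direct` on the same sequences from `make_inputs` should give identical output trajectories; `eps_seq[0]` is unused because the recursion starts from the given `y0`.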

Lemma 1.3.3 If A1.3.1–A1.3.3 hold, then the chain {xk }k≥0 defined by (1.3.32)
is μ2 -irreducible and aperiodic, and μ2 is the maximal irreducibility measure of
{xk }k≥0 . Further, any bounded set A ∈ B 2 with μ2 (A) > 0 is a small set.

Proof. See Appendix A. 

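Theorem 1.3.6 Assume A1.3.1–A1.3.5 hold. Then

(i) there exist a probability measure $P_{IV}(\cdot)$ on $(\mathbb{R}^2, \mathscr{B}^2)$, a nonnegative measurable
function $M(x)$, and a constant $\rho \in (0,1)$ such that

$$\|P_n(x,\cdot) - P_{IV}(\cdot)\|_{\mathrm{var}} \le M(x)\rho^n \quad \forall\, x = [\xi_1\ \xi_2]^T \in \mathbb{R}^2; \tag{1.3.33}$$

(ii) $\sup_n \int_{\mathbb{R}^2} M(x) P_n(dx) < \infty$ and $\|P_n(\cdot) - P_{IV}(\cdot)\|_{\mathrm{var}} \le c\rho^n$ for some constants
$c > 0$ and $\rho \in (0,1)$;

(iii) $P_{IV}(\cdot)$ has a probability density $f_{IV}(\cdot,\cdot)$, which is positive on $\mathbb{R}^2$, and

$$f_{IV}(s_1, s_2) = \left(\int_{\mathbb{R}^2} f_\varepsilon\big(s_1 - f(\xi_1,\xi_2)\big) P_{IV}(dx)\right) f_u(s_2). \tag{1.3.34}$$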

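Proof. We first prove (i). Define the Lyapunov function $g(x) \triangleq |\xi_1| + \beta|\xi_2|^l$,
where $x = [\xi_1\ \xi_2]^T \in \mathbb{R}^2$ and $\beta > 0$ is a constant to be determined.
By A1.3.1–A1.3.5, we have

$$\begin{aligned}
E[g(x_{k+1}) \,|\, x_k = x] &= E\big[|y_{k+1}| + \beta|u_{k+1}|^l \,\big|\, x_k = x\big] \\
&= E\big[|f(y_k,u_k) + \varepsilon_{k+1}| \,\big|\, x_k = x\big] + \beta E|u_1|^l \\
&\le |f(\xi_1,\xi_2)| + E|\varepsilon_1| + \beta E|u_1|^l \\
&\le \lambda|\xi_1| + c_1|\xi_2|^l + c_2 + E|\varepsilon_1| + \beta E|u_1|^l.
\end{aligned}$$

Let $\beta = c_1/\lambda$. Then from the above inequalities it follows that

$$\begin{aligned}
E[g(x_{k+1}) \,|\, x_k = x] &\le \lambda|\xi_1| + \lambda\beta|\xi_2|^l + c_2 + E|\varepsilon_1| + \beta E|u_1|^l \\
&\le \lambda g(x) + c_3 && (1.3.35) \\
&= \lambda' g(x) - \big[(\lambda' - \lambda) g(x) - c_3\big], && (1.3.36)
\end{aligned}$$

where $c_3 \triangleq c_2 + E|\varepsilon_1| + \beta E|u_1|^l$ and $0 < \lambda < \lambda' < 1$.

Choose $K > 0$ large enough such that $(\lambda' - \lambda)K - c_3 > 0$, and define $S = \{x \in \mathbb{R}^2 : g(x) \le K\}$.
Since $S$ is a bounded set, by Lemma 1.3.3 $S$ is a small set.

From (1.3.36) we have

$$E[g(x_{k+1}) \,|\, x_k = x] \le \lambda' g(x) - \big[(\lambda' - \lambda)K - c_3\big] \triangleq \lambda' g(x) - c_4 \quad \forall\, x \notin S, \tag{1.3.37}$$

and from (1.3.35)

$$E[g(x_{k+1}) \,|\, x_k = x] \le \lambda K + c_3 \triangleq c_5 \quad \forall\, x \in S. \tag{1.3.38}$$

Noticing (1.3.37) and (1.3.38) and applying Theorem 1.3.3, we see that (1.3.33)
holds.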
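The drift inequalities (1.3.35), (1.3.37), and (1.3.38) can be checked on a concrete example. The sketch below is illustrative only and rests on assumptions not made in the text: the nonlinearity $f(y,u) = 0.5y + \sin u$, standard Gaussian $u_k$ and $\varepsilon_k$, $l = 1$, and the particular constants $c_1 = 0.1$, $c_2 = 1$, $\lambda' = 0.75$. For a standard Gaussian $Z$ the conditional mean $E|a + Z|$ has the closed form $a\,\mathrm{erf}(a/\sqrt{2}) + \sqrt{2/\pi}\,e^{-a^2/2}$, so the drift is evaluated exactly rather than by simulation.

```python
import math

# Assumed constants for the hypothetical system f(y, u) = 0.5*y + sin(u):
# |f(y, u)| <= LAM*|y| + C1*|u|**L_EXP + C2.
LAM, C1, C2, L_EXP = 0.5, 0.1, 1.0, 1
E_ABS_N01 = math.sqrt(2.0 / math.pi)  # E|Z| for Z ~ N(0, 1)


def f(y, u):
    return 0.5 * y + math.sin(u)


def mean_abs_shifted_normal(a):
    # Exact value of E|a + Z| for Z ~ N(0, 1) (mean of a folded normal).
    return a * math.erf(a / math.sqrt(2.0)) + E_ABS_N01 * math.exp(-a * a / 2.0)


BETA = C1 / LAM                             # beta = c1 / lambda
C3 = C2 + E_ABS_N01 + BETA * E_ABS_N01      # c3 = c2 + E|eps_1| + beta*E|u_1|^l


def g(x):
    # Lyapunov function g(x) = |xi_1| + beta*|xi_2|^l.
    return abs(x[0]) + BETA * abs(x[1]) ** L_EXP


def drift(x):
    # E[g(x_{k+1}) | x_k = x] = E|f(xi_1, xi_2) + eps| + beta*E|u|^l.
    return mean_abs_shifted_normal(f(x[0], x[1])) + BETA * E_ABS_N01


LAM_P = 0.75                       # lambda' with lambda < lambda' < 1
K = 2.0 * C3 / (LAM_P - LAM)       # makes (lambda' - lambda)*K - c3 = c3 > 0
C4 = (LAM_P - LAM) * K - C3
C5 = LAM * K + C3
```

With these constants, `drift(x) <= LAM_P * g(x) - C4` should hold wherever `g(x) > K`, and `drift(x) <= C5` wherever `g(x) <= K`, mirroring the split into $x \notin S$ and $x \in S$.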

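We now prove (ii). By Theorem 1.3.3, the measurable function $M(x)$ actually can
be taken as $a + bg(x)$, where $a$ and $b$ are positive constants and $g(x)$ is the Lyapunov
function defined above. To prove (ii) we first verify that

$$\sup_{k\ge 0} \int_{\mathbb{R}^2} g(x) P_k(dx) < \infty. \tag{1.3.39}$$

Noticing A1.3.4, A1.3.5, and $g(x) = |\xi_1| + \beta|\xi_2|^l$, we have that

$$\begin{aligned}
\int_{\mathbb{R}^2} g(x) P_k(dx) &= Eg(x_k) = E|y_k| + \beta E|u_k|^l \\
&= E|f(y_{k-1},u_{k-1}) + \varepsilon_k| + \beta E|u_1|^l \\
&\le \lambda E|y_{k-1}| + c_1 E|u_{k-1}|^l + c_2 + E|\varepsilon_1| + \beta E|u_1|^l \\
&= \lambda E|f(y_{k-2},u_{k-2}) + \varepsilon_{k-1}| + \big(c_1 E|u_1|^l + c_2 + E|\varepsilon_1|\big) + \beta E|u_1|^l \\
&\le \lambda^2 E|y_{k-2}| + \lambda\big(c_1 E|u_1|^l + c_2 + E|\varepsilon_1|\big) + \big(c_1 E|u_1|^l + c_2 + E|\varepsilon_1|\big) + \beta E|u_1|^l \\
&\le \cdots \\
&\le \lambda^k E|y_0| + (\lambda^{k-1} + \lambda^{k-2} + \cdots + 1)\big(c_1 E|u_1|^l + c_2 + E|\varepsilon_1|\big) + \beta E|u_1|^l,
\end{aligned}$$

which implies (1.3.39) by noticing $0 < \lambda < 1$.

Then by Lemma 1.3.3 and (1.3.39) and by noticing the basic property of the total
variation norm, for any $A \in \mathscr{B}^2$ we have

$$\begin{aligned}
|P_n(A) - P_{IV}(A)| &\le \int_{\mathbb{R}^2} |P_n(x,A) - P_{IV}(A)| P_0(dx) \\
&\le \int_{\mathbb{R}^2} \|P_n(x,\cdot) - P_{IV}(\cdot)\|_{\mathrm{var}} P_0(dx) \le \rho^n \int_{\mathbb{R}^2} M(x) P_0(dx) \le M\rho^n,
\end{aligned}$$

and

$$\begin{aligned}
\|P_n(\cdot) - P_{IV}(\cdot)\|_{\mathrm{var}} &= \sup_{A\in\mathscr{B}^2} \big(P_n(A) - P_{IV}(A)\big) - \inf_{A\in\mathscr{B}^2} \big(P_n(A) - P_{IV}(A)\big) \\
&\le 2\sup_{A\in\mathscr{B}^2} |P_n(A) - P_{IV}(A)| \le 2M\rho^n.
\end{aligned}$$

Hence (ii) holds.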
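The moment bound leading to (1.3.39) is a geometric-sum argument: writing $m_k = E|y_k|$ and $c = c_1 E|u_1|^l + c_2 + E|\varepsilon_1|$, the recursion $m_k \le \lambda m_{k-1} + c$ yields $m_k \le \lambda^k m_0 + c(1-\lambda^k)/(1-\lambda)$. A small sketch of the worst case of this recursion follows; the function names and the numerical values used to exercise it are illustrative assumptions.

```python
def moment_bounds(m0, lam, c, n):
    # Iterate the worst case m_k = lam*m_{k-1} + c of the recursion
    # m_k <= lam*m_{k-1} + c, starting from m_0 = m0.
    m = m0
    out = [m]
    for _ in range(n):
        m = lam * m + c
        out.append(m)
    return out


def closed_form(m0, lam, c, k):
    # lam^k * m0 + (lam^{k-1} + ... + 1) * c, using the geometric sum.
    return lam ** k * m0 + c * (1.0 - lam ** k) / (1.0 - lam)
```

For any $0 < \lambda < 1$ the iterates stay below $m_0 + c/(1-\lambda)$ uniformly in $k$, which is the uniform bound the proof needs.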


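Finally we prove (iii). Noticing that both $\{u_k\}$ and $\{\varepsilon_k\}$ are sequences of iid
random variables with densities $f_u(\cdot)$ and $f_\varepsilon(\cdot)$, respectively, and

$$P_{IV}(A) = \int_{\mathbb{R}^2} P_n(x,A) P_{IV}(dx) \quad \forall\, A \in \mathscr{B}^2 \ \ \forall\, n \ge 1, \tag{1.3.40}$$

by (A.56) we have

$$\begin{aligned}
P_{IV}(A) &= \int_{\mathbb{R}^2} P(x,A) P_{IV}(dx) \\
&= \int_A \int_{\mathbb{R}^2} f_\varepsilon\big(s_1 - f(\xi_1,\xi_2)\big) f_u(s_2) P_{IV}(dx)\, ds_1 ds_2.
\end{aligned} \tag{1.3.41}$$

Hence, $P_{IV}(\cdot)$ has the density function

$$f_{IV}(s_1,s_2) = \left(\int_{\mathbb{R}^2} f_\varepsilon\big(s_1 - f(\xi_1,\xi_2)\big) P_{IV}(dx)\right) f_u(s_2). \tag{1.3.42}$$

According to A1.3.4, we have $\sup_{\|x\|\le K} |f(x_1,x_2)| < \infty$ for any fixed $K > 0$. As
both $f_u(\cdot)$ and $f_\varepsilon(\cdot)$ are positive, for a large enough $K > 0$ it follows that

$$\begin{aligned}
f_{IV}(s_1,s_2) &= \left(\int_{\mathbb{R}^2} f_\varepsilon\big(s_1 - f(\xi_1,\xi_2)\big) P_{IV}(dx)\right) f_u(s_2) \\
&\ge \left(\int_{\|x\|\le K} f_\varepsilon\big(s_1 - f(\xi_1,\xi_2)\big) P_{IV}(dx)\right) f_u(s_2) \\
&\ge \inf_{\|x\|\le K} \big\{f_\varepsilon\big(s_1 - f(\xi_1,\xi_2)\big)\big\}\, f_u(s_2)\, P_{IV}\{\|x\| \le K\} > 0.
\end{aligned}$$

This proves (iii). □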


We now consider the NARX system (1.3.29) and (1.3.30) with $p > 1$, $q > 1$.
For (1.3.29), the assumptions A1.3.1, A1.3.2, and A1.3.3 remain unchanged,
while A1.3.4 and A1.3.5 change correspondingly to the following A1.3.4' and
A1.3.5'.

A1.3.4' $f(\cdot)$ is continuous on $\mathbb{R}^{p+q}$ and there exist a vector norm $\|\cdot\|_v$ on $\mathbb{R}^p$ and
constants $0 < \lambda < 1$, $c_1 > 0$, $c_2 > 0$, and $l > 0$ such that

$$\|\varphi_1(x)\|_v \le \lambda\|s\|_v + c_1 \sum_{i=1}^{q} |t_i|^l + c_2 \quad \forall\, x \in \mathbb{R}^{p+q}, \tag{1.3.43}$$

where $s \triangleq [s_1 \cdots s_p]^T \in \mathbb{R}^p$, $t \triangleq [t_1 \cdots t_q]^T \in \mathbb{R}^q$, and $x \triangleq [s^T\ t^T]^T \in \mathbb{R}^{p+q}$.

A1.3.5' $E|u_k|^l < \infty$ and $E\|Y_0\| < \infty$, where $Y_0 \triangleq [y_0, y_{-1}, \dots, y_{1-p}]^T$ is the initial
value.
The probabilistic properties of {xk }k≥0 such as irreducibility, aperiodicity, and
ergodicity for the case p > 1, q > 1 can be established as those for the first order
system. In fact, we have the following theorem.

Theorem 1.3.7 If A1.3.1–A1.3.3, A1.3.4', and A1.3.5' hold, then the chain $\{x_k\}_{k\ge 0}$
defined by (1.3.30) is $\mu_{p+q}$-irreducible, aperiodic, and

(i) there exist a probability measure $P_{IV}(\cdot)$ on $(\mathbb{R}^{p+q}, \mathscr{B}^{p+q})$, a nonnegative
measurable function $M(x)$, and a constant $0 < \rho < 1$ such that
$\|P_n(x,\cdot) - P_{IV}(\cdot)\|_{\mathrm{var}} \le M(x)\rho^n$ $\forall\, x \in \mathbb{R}^{p+q}$;

(ii) $\sup_n \int_{\mathbb{R}^{p+q}} M(x) P_n(dx) < \infty$ and $\|P_n(\cdot) - P_{IV}(\cdot)\|_{\mathrm{var}} \le c\rho^n$ for some constants
$c > 0$ and $0 < \rho < 1$.

Further, $P_{IV}(\cdot)$ has a probability density, which is positive on $\mathbb{R}^{p+q}$.

Theorem 1.3.7 can be proved similarly to Lemma 1.3.3 and Theorem 1.3.6. Here
we only give some remarks.

Remark 1.3.1 Set $n_0 \triangleq p \vee q = \max\{p, q\}$. To establish irreducibility and
aperiodicity in the case $p = q = 1$, the one-step transition probability $P(x,A)$, $x \in \mathbb{R}^2$,
$A \in \mathscr{B}^2$ is considered, while for the case $n_0 > 1$, the $n_0$-step transition probability
$P_{n_0}(x,A)$, $x \in \mathbb{R}^{p+q}$, $A \in \mathscr{B}^{p+q}$ should be investigated. To establish the geometrical
ergodicity of $\{x_k\}_{k\ge 0}$, the Lyapunov function may be chosen as
$g(x) = \|s\|_v + \sum_{i=1}^{q} \beta_i |t_i|^l$, where $\beta_1 = \dfrac{qc_1}{\lambda^q}$ and $\beta_{i+1} = \lambda\beta_i - c_1$, $i = 1, \dots, q-1$.

Remark 1.3.2 We note that (1.3.34) gives the expression of the invariant probability
density of the first order NARX system ($(p,q) = (1,1)$). For the general case
$p > 1$ and $q > 1$, the invariant probability density and its properties can similarly be
obtained from the $n_0$-step transition probability $P_{n_0}(x,\cdot)$ with $n_0 = \max(p,q)$. For
example, for the case $(p,q) = (2,1)$ by investigating the two-step transition probability,
we find that the invariant probability density is expressed as follows:

$$f_{IV}(s_1,s_2,s_3) = \int_{\mathbb{R}^3} \left(\int_{-\infty}^{\infty} f_\varepsilon\big(s_1 - f(s_2,x_1,t)\big) f_u(t)\,dt\right) \cdot f_\varepsilon\big(s_2 - f(x_1,x_2,x_3)\big) P_{IV}(dx)\, f_u(s_3),$$

while for the case $(p,q) = (3,2)$ considering the three-step transition probability
leads to the invariant probability density

$$\begin{aligned}
f_{IV}(s_1,s_2,s_3,s_4,s_5) = \int_{\mathbb{R}^5} &\left(\int_{-\infty}^{\infty} f_\varepsilon\big(s_1 - f(s_2,s_3,x_1,s_5,t)\big) f_\varepsilon\big(s_2 - f(s_3,x_1,x_2,t,x_4)\big) f_u(t)\,dt\right) \\
&\cdot f_\varepsilon\big(s_3 - f(x_1,x_2,x_3,x_4,x_5)\big) P_{IV}(dx)\, f_u(s_4) f_u(s_5).
\end{aligned}$$

The properties of $f_{IV}(s_1,s_2)$, $f_{IV}(s_1,s_2,s_3)$, and $f_{IV}(s_1,s_2,s_3,s_4,s_5)$ are derived from
the above formulas by using the assumptions made in Theorem 1.3.7.

Remark 1.3.3 In A1.3.4', a general vector norm rather than the Euclidean norm is adopted.
This is because a general vector norm offers more freedom than the Euclidean norm: for
many NARX systems, $\lambda$ in (1.3.43) can be taken smaller than 1 under a suitably chosen norm. The
fact that $\lambda \in (0,1)$ is of crucial importance for establishing stability and ergodicity
of the NARX system (see the proof of Theorem 1.3.6). It is natural to ask what will
happen if $\lambda \ge 1$. Let us consider the following example:

$$y_{k+1} = y_k + \varepsilon_{k+1},$$

where $\{\varepsilon_k\}$ is iid. It is clear that $y_{k+1} = \sum_{i=1}^{k+1} \varepsilon_i$ if the initial value $y_0 = 0$. It is seen
that for the above system, the constant $\lambda$ equals 1 and $\{y_k\}_{k\ge 1}$ is not ergodic. So,
in a certain sense, the condition $\lambda \in (0,1)$ is necessary for ergodicity of the NARX
system.
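One way to see the contrast between $\lambda = 1$ and $\lambda < 1$ is through the variance of the linear recursion $y_{k+1} = \lambda y_k + \varepsilon_{k+1}$ with $y_0 = 0$, which satisfies $v_{k+1} = \lambda^2 v_k + \sigma^2$. The sketch below iterates this variance recursion; it assumes the noise has finite variance $\sigma^2$, and it only illustrates the absence of a finite-variance limit for the random walk rather than proving non-ergodicity.

```python
def variance_path(lam, sigma2, n, v0=0.0):
    # Variance of y_k for y_{k+1} = lam*y_k + eps_{k+1} with iid noise of
    # variance sigma2: v_{k+1} = lam^2 * v_k + sigma2, v_0 = v0.
    v = v0
    path = [v]
    for _ in range(n):
        v = lam * lam * v + sigma2
        path.append(v)
    return path
```

For `lam = 1.0` the variance grows linearly as $k\sigma^2$, while for `lam = 0.9` it converges to $\sigma^2/(1-\lambda^2)$.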

For ergodicity of nonlinear systems, we assume that both {uk }k≥0 and {εk }k≥0
are with positive probability density functions. In fact, these assumptions are suf-
ficient but not necessary for ergodicity of stochastic systems. Let us consider the
following linear process:

xk+1 = Fxk + Gεk+1 k ≥ 0, (1.3.44)

where xk ∈ Rm , εk ∈ Rr , F ∈ Rm×m , and G ∈ Rm×r .


We make the following assumptions.

A1.3.6 All eigenvalues of $F$ are strictly inside the unit circle;

A1.3.7 $(F, G)$ is controllable, i.e., $\operatorname{rank}\,[G \ \ FG \ \cdots \ F^{m-1}G] = m$;

A1.3.8 $\{\varepsilon_k\}_{k\ge 0}$ is iid with density which is positive and continuous on a set $U \in \mathscr{B}^r$
satisfying $\mu_r(U) > 0$, where $\mu_r(\cdot)$ is the Lebesgue measure on $(\mathbb{R}^r, \mathscr{B}^r)$.
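Conditions A1.3.6 and A1.3.7 are straightforward to verify for a given pair $(F, G)$. The following is a minimal sketch using exact rational arithmetic for the rank test; the helper names and the example matrices are hypothetical, and the spectral radius routine is restricted to the $2\times 2$ case for brevity.

```python
import cmath
from fractions import Fraction


def matmul(A, B):
    # Plain matrix product of two lists-of-rows.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def controllability_matrix(F, G):
    # Build [G, FG, ..., F^{m-1} G] column block by column block.
    m = len(F)
    blocks, cur = [], G
    for _ in range(m):
        blocks.append(cur)
        cur = matmul(F, cur)
    return [sum((blk[i] for blk in blocks), []) for i in range(m)]


def rank(M):
    # Rank by Gauss-Jordan elimination over exact fractions.
    M = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        piv = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                factor = M[i][c] / M[r][c]
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r


def spectral_radius_2x2(F):
    # Largest eigenvalue modulus from the characteristic polynomial (2x2 only).
    tr = float(F[0][0] + F[1][1])
    det = float(F[0][0] * F[1][1] - F[0][1] * F[1][0])
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return max(abs((tr + disc) / 2.0), abs((tr - disc) / 2.0))
```

For the hypothetical pair $F = \begin{bmatrix} 1/2 & 1 \\ 0 & 3/10 \end{bmatrix}$, $G = [0\ \ 1]^T$, the controllability matrix has full rank and the spectral radius is below 1, so A1.3.6 and A1.3.7 both hold; replacing $G$ by $[1\ \ 0]^T$ breaks controllability.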

Theorem 1.3.8 Assume that A1.3.6–A1.3.8 hold. Then the chain {xk }k≥0 defined by
(1.3.44) is geometrically ergodic.

To prove Theorem 1.3.8, we need an auxiliary lemma.

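Lemma 1.3.4 Given a matrix $A \in \mathbb{R}^{n\times n}$ and any $\varepsilon > 0$, there exists a vector norm
$\|\cdot\|_v$ such that

$$\|Ax\|_v \le (\rho(A) + \varepsilon)\|x\|_v \quad \forall\, x \in \mathbb{R}^n, \tag{1.3.45}$$

where $\rho(A) \triangleq \max\{|\lambda_i|,\ i = 1, \dots, n\}$ and $\{\lambda_i,\ i = 1, \dots, n\}$ are the eigenvalues of $A$.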

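Proof. First, for the matrix $A$ there exists a unitary matrix $U$ such that

$$U^{-1}AU = \begin{bmatrix}
\lambda_1 & t_{12} & t_{13} & \cdots & t_{1n} \\
0 & \lambda_2 & t_{23} & \cdots & t_{2n} \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
\vdots & & \ddots & \ddots & t_{n-1,n} \\
0 & \cdots & \cdots & 0 & \lambda_n
\end{bmatrix}. \tag{1.3.46}$$

For any fixed $\delta > 0$, define

$$D_\delta \triangleq \operatorname{diag}\{1, \delta, \dots, \delta^{n-1}\}. \tag{1.3.47}$$

Then it follows that

$$(UD_\delta)^{-1}A(UD_\delta) = \begin{bmatrix}
\lambda_1 & \delta t_{12} & \delta^2 t_{13} & \cdots & \delta^{n-1} t_{1n} \\
0 & \lambda_2 & \delta t_{23} & \cdots & \delta^{n-2} t_{2n} \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
\vdots & & \ddots & \ddots & \delta t_{n-1,n} \\
0 & \cdots & \cdots & 0 & \lambda_n
\end{bmatrix}. \tag{1.3.48}$$

For the given $\varepsilon > 0$, we can choose $\delta > 0$ small enough such that

$$\sum_{j=i+1}^{n} |t_{ij}|\delta^{j-i} < \varepsilon, \quad i = 1, \dots, n-1. \tag{1.3.49}$$

For any $x \in \mathbb{R}^n$, define the vector norm by

$$\|x\|_v \triangleq \|(UD_\delta)^{-1}x\|_\infty, \tag{1.3.50}$$

where $\|v\|_\infty \triangleq \max_{1\le i\le n}\{|v_i|\}$, $v = [v_1 \cdots v_n]^T \in \mathbb{R}^n$.

From (1.3.49) and (1.3.50), we have

$$\begin{aligned}
\|Ax\|_v &= \|(UD_\delta)^{-1}Ax\|_\infty \\
&= \max_{1\le i\le n} \big|\big((UD_\delta)^{-1}Ax\big)_i\big| \\
&= \max_{1\le i\le n} \big|\big((UD_\delta)^{-1}A(UD_\delta)\cdot(UD_\delta)^{-1}x\big)_i\big| \\
&\le \max_{1\le i\le n}\Big(\sum_{j=i+1}^{n} |t_{ij}|\delta^{j-i} + |\lambda_i|\Big) \cdot \max_{1\le i\le n} \big|\big((UD_\delta)^{-1}x\big)_i\big| \\
&\le (\rho(A) + \varepsilon)\|x\|_v \quad \forall\, x \in \mathbb{R}^n.
\end{aligned} \tag{1.3.51}$$

This finishes the proof. □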
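The scaling step of the proof can be illustrated in the special case where $A$ is already upper triangular, so that one may take $U = I$ in (1.3.46). The sketch below makes that simplifying assumption (the general case would additionally require a Schur decomposition, which is not implemented here); the function names and the rule used to pick $\delta$ are illustrative choices consistent with (1.3.49).

```python
def matvec(A, x):
    # Matrix-vector product for a list-of-rows matrix.
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def choose_delta(T, eps):
    # Pick delta in (0, 1] with sum_{j>i} |t_ij| * delta^(j-i) < eps for all i.
    # Since delta <= 1 implies delta^(j-i) <= delta, it suffices that
    # delta * max_i sum_{j>i} |t_ij| <= eps / 2.
    n = len(T)
    s = max(sum(abs(T[i][j]) for j in range(i + 1, n)) for i in range(n))
    return 1.0 if s == 0 else min(1.0, eps / (2.0 * s))


def scaled_norm(x, delta):
    # ||x||_v = ||D^{-1} x||_inf with D = diag(1, delta, ..., delta^(n-1)), U = I.
    return max(abs(xi) / delta ** i for i, xi in enumerate(x))


def spectral_radius_triangular(T):
    # For a triangular matrix the eigenvalues are the diagonal entries.
    return max(abs(T[i][i]) for i in range(len(T)))
```

For an upper triangular matrix with large off-diagonal entries, the sup-norm of $Ax$ can greatly exceed $\rho(A)\|x\|_\infty$, whereas `scaled_norm(matvec(A, x), delta)` stays below `(rho + eps) * scaled_norm(x, delta)` once `delta` is chosen by `choose_delta`.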


Proof of Theorem 1.3.8. We only sketch the proof, since it can be done in similar
fashion to Theorems 1.3.6 and 1.3.7.
From (1.3.44), it follows that

$$x_{k+1} = F^{k+1}x_0 + F^kG\varepsilon_1 + F^{k-1}G\varepsilon_2 + \cdots + G\varepsilon_{k+1}. \tag{1.3.52}$$

Denote by $E$ the vector space spanned by the vectors $\{F^{k+1}x_0(\omega) + F^kG\varepsilon_1(\omega) +
F^{k-1}G\varepsilon_2(\omega) + \cdots + G\varepsilon_{k+1}(\omega),\ k \ge 1,\ \omega \in \Omega\}$. By A1.3.7 and A1.3.8, $E$ is
$\mu_m$-positive. Denote by $\mathscr{C}^m$ the sub-$\sigma$-algebra of $\mathscr{B}^m$ restricted on $E$. In the following
we consider the measurable space $(E, \mathscr{C}^m)$ and the Lebesgue measure on it. For
simplicity of notation, the Lebesgue measure on $(E, \mathscr{C}^m)$ is still denoted by $\mu_m(\cdot)$.

It can be shown that $\{x_k\}_{k\ge 0}$ defined by (1.3.44) is a Markov chain valued in
$(E, \mathscr{C}^m)$. Carrying out a discussion similar to that for Theorems 1.3.6 and 1.3.7 and
noticing that the distribution of $\varepsilon_k$ is absolutely continuous with respect to $\mu_m(\cdot)$, we
can show that $\{x_k\}_{k\ge 0}$ is $\mu_m(\cdot)$-irreducible, aperiodic, and any bounded set in $\mathscr{C}^m$
with a positive $\mu_m$-measure is a small set.
By Lemma 1.3.4 for the matrix $F$ there exist a vector norm $\|\cdot\|_v$ and $0 < \lambda < 1$
such that $\|Fx\|_v \le \lambda\|x\|_v$ $\forall\, x \in \mathbb{R}^m$. Then by choosing the Lyapunov function
$g(\cdot) = \|\cdot\|_v$ and by applying Theorem 1.3.3, it is shown that $\{x_k\}_{k\ge 0}$ is
geometrically ergodic. □

1.4 Mixing Random Processes


Consider the following linear systems

y1,k+1 = b1 uk + εk+1 , (1.4.1)


y2,k+1 = b1 uk + · · · + bq uk+1−q + εk+1 , (1.4.2)
y3,k+1 = a1 y3,k + · · · + a p y3,k+1− p + b1 uk + · · · + bq uk+1−q + εk+1 . (1.4.3)

Suppose that $\{u_k\}_{k\ge 0}$ and $\{\varepsilon_k\}_{k\ge 0}$ are mutually independent and each of them is
a sequence of iid random variables. Further, assume $A(z) = 1 - a_1z - \cdots - a_pz^p$ is
stable, i.e., all roots of $A(z)$ lie strictly outside the unit disk. It is clear that $\{y_{1,k}\}_{k\ge 0}$
and $\{y_{2,qk+l}\}_{k\ge 0}$ are iid sequences for each $l = 0, 1, \dots, q-1$. But this does not hold
for $\{y_{3,k}\}_{k\ge 0}$, since for each $k$, $y_{3,k}$ depends on the past inputs $\{u_i\}_{i=0}^{k-1}$ and noises
$\{\varepsilon_i\}_{i=0}^{k}$. However, since $A(z)$ is stable, we can show that as $l$ tends to infinity, $y_{3,k}$
and $y_{3,k+l}$ are asymptotically independent in a certain sense. In probability theory,
this property is called mixing. In this section, we first introduce definitions of different
types of mixing random processes and the related covariance inequalities, then give
results on the almost sure convergence of mixing random series, and finally present
the connection between the mixing random processes and the geometrically ergodic
Markov chains. It is worth noting that all definitions and results given here can be
applied to random vectors.
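The difference between the finite-memory system (1.4.2) and the infinite-memory system (1.4.3) already shows up in their autocovariances. The sketch below computes them in closed form under added assumptions: input and noise have finite variances `sigma_u2` and `sigma_e2`, and for (1.4.3) only the first order case $p = q = 1$ with $|a_1| < 1$ in its stationary regime is treated. Covariance decay is a weaker statement than mixing, so this is an illustration rather than a verification of the mixing property.

```python
def ma_autocov(b, sigma_u2, sigma_e2, h):
    # Lag-h (h >= 0) autocovariance of
    # y_{k+1} = b_1 u_k + ... + b_q u_{k+1-q} + eps_{k+1}, as in (1.4.2).
    q = len(b)
    cov = 0.0
    if h < q:
        cov = sigma_u2 * sum(b[i] * b[i + h] for i in range(q - h))
    if h == 0:
        cov += sigma_e2
    return cov


def ar1_autocov(a, b1, sigma_u2, sigma_e2, h):
    # Stationary lag-h autocovariance of y_{k+1} = a*y_k + b1*u_k + eps_{k+1},
    # the case p = q = 1 of (1.4.3), assuming |a| < 1.
    return a ** h * (b1 * b1 * sigma_u2 + sigma_e2) / (1.0 - a * a)
```

For (1.4.2) the autocovariance vanishes exactly at every lag $h \ge q$, consistent with the observation that outputs $q$ steps apart depend on disjoint sets of inputs and noises; for the first order case of (1.4.3) it never vanishes but decays geometrically at rate $|a_1|$.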
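Let $\{\varphi_k\}_{k\ge 0}$ be a random sequence and let $\mathscr{F}_0^n \triangleq \sigma\{\varphi_k,\ 0 \le k \le n\}$ and
$\mathscr{F}_n^\infty \triangleq \sigma\{\varphi_k,\ k \ge n\}$ be the $\sigma$-algebras generated by $\{\varphi_k,\ 0 \le k \le n\}$ and $\{\varphi_k,\ k \ge n\}$,
respectively.

Definition 1.4.1 The process $\{\varphi_k\}_{k\ge 0}$ is called an $\alpha$-mixing or strong mixing if

$$\alpha(k) \triangleq \sup_n \sup_{A\in\mathscr{F}_0^n,\, B\in\mathscr{F}_{n+k}^\infty} |P(AB) - P(A)P(B)| \xrightarrow[k\to\infty]{} 0, \tag{1.4.4}$$

a $\beta$-mixing or completely regular if

$$\beta(k) \triangleq \sup_n E\Big[\sup_{B\in\mathscr{F}_{n+k}^\infty} |P(B\,|\,\mathscr{F}_0^n) - P(B)|\Big] \xrightarrow[k\to\infty]{} 0, \tag{1.4.5}$$

and a $\phi$-mixing or uniformly strong mixing if

$$\phi(k) \triangleq \sup_n \sup_{A\in\mathscr{F}_0^n,\, P(A)>0,\, B\in\mathscr{F}_{n+k}^\infty} \frac{|P(AB) - P(A)P(B)|}{P(A)} \xrightarrow[k\to\infty]{} 0. \tag{1.4.6}$$

The sequences $\{\alpha(k)\}_{k\ge 0}$, $\{\beta(k)\}_{k\ge 0}$, and $\{\phi(k)\}_{k\ge 0}$ are called the mixing
coefficients. It can be shown that

$$\alpha(k) \le \beta(k) \le \phi(k). \tag{1.4.7}$$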



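Lemma 1.4.1 (i) Assume $\{\varphi_k\}_{k\ge 0}$ is an $\alpha$-mixing. For $\xi \in \mathscr{F}_0^k$ and $\eta \in \mathscr{F}_{n+k}^\infty$,
if $E[|\xi|^p + |\eta|^q] < \infty$ for some $p > 1$, $q > 1$, and $\frac{1}{p} + \frac{1}{q} < 1$, then

$$|E\xi\eta - E\xi E\eta| \le 10\,(\alpha(n))^{1-\frac{1}{p}-\frac{1}{q}} (E|\xi|^p)^{\frac{1}{p}} (E|\eta|^q)^{\frac{1}{q}}. \tag{1.4.8}$$

(ii) Assume $\{\varphi_k\}_{k\ge 0}$ is a $\phi$-mixing. For $\xi \in \mathscr{F}_0^k$ and $\eta \in \mathscr{F}_{n+k}^\infty$, if
$E[|\xi|^p + |\eta|^q] < \infty$ for some $p > 1$, $q > 1$, and $\frac{1}{p} + \frac{1}{q} = 1$, then

$$|E\xi\eta - E\xi E\eta| \le 2\,(\phi(n))^{\frac{1}{p}} (E|\xi|^p)^{\frac{1}{p}} (E|\eta|^q)^{\frac{1}{q}}. \tag{1.4.9}$$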

Proof. See Appendix A. 


The concept of a mixingale is derived from the mixing property and is defined as
follows.

Definition 1.4.2 Let $\{\mathscr{F}_k\}_{k\ge 0}$ be a sequence of nondecreasing $\sigma$-algebras. The
sequence $\{\varphi_k, \mathscr{F}_k\}_{k\ge 0}$ is called a simple mixingale if $\varphi_k$ is $\mathscr{F}_k$-measurable and if
for two sequences of nonnegative constants $\{c_k\}_{k\ge 0}$ and $\{\psi_m\}_{m\ge 0}$ with $\psi_m \to 0$
as $m \to \infty$, the following conditions are satisfied:

(i) $\big(E|E(\varphi_k\,|\,\mathscr{F}_{k-m})|^2\big)^{\frac{1}{2}} \le \psi_m c_k$ $\forall\, k \ge 0$ and $\forall\, m \ge 0$,
(ii) $E\varphi_k = 0$,

where $\mathscr{F}_k \triangleq \{\emptyset, \Omega\}$ if $k \le 0$.

From the definition, we see that $\{c_k\}_{k\ge 0}$ and $\{\psi_m\}_{m\ge 0}$ reflect the moment and
mixing coefficients of $\{\varphi_k\}_{k\ge 0}$, which are important for the almost sure convergence
of $\sum_{k=0}^{\infty} \varphi_k$. In fact, we have the following result.

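Theorem 1.4.1 Let $\{\varphi_k, \mathscr{F}_k\}_{k\ge 0}$ be a simple mixingale such that

$$\sum_{k=1}^{\infty} c_k^2 < \infty \tag{1.4.10}$$

and

$$\sum_{k=1}^{\infty} (\log k)(\log\log k)^{1+\gamma}\, \psi_k^2 \sum_{j=k}^{\infty} c_j^2 < \infty \quad \text{for some } \gamma > 0. \tag{1.4.11}$$

Then

$$\sum_{k=1}^{\infty} \varphi_k < \infty \quad \text{a.s.} \tag{1.4.12}$$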

Proof. See Appendix A. 

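Theorem 1.4.2 Assume that $\{\varphi_k\}_{k\ge 0}$ is an $\alpha$-mixing with mixing coefficients denoted
by $\{\alpha(k)\}_{k\ge 0}$. Let $\{\Phi_k(\cdot)\}_{k\ge 0}$ be a sequence of functions $\Phi_k(\cdot) : \mathbb{R} \to \mathbb{R}$ with
$E\Phi_k(\varphi_k) = 0$. If there exist constants $\varepsilon > 0$ and $\gamma > 0$ such that

$$\sum_{k=1}^{\infty} \big(E|\Phi_k(\varphi_k)|^{2+\varepsilon}\big)^{\frac{2}{2+\varepsilon}} < \infty \tag{1.4.13}$$

and

$$\sum_{k=1}^{\infty} \log k\,(\log\log k)^{1+\gamma}\, (\alpha(k))^{\frac{\varepsilon}{2+\varepsilon}} < \infty, \tag{1.4.14}$$

then

$$\sum_{k=1}^{\infty} \Phi_k(\varphi_k) < \infty \quad \text{a.s.} \tag{1.4.15}$$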

Proof. See Appendix A. 


By Theorem 1.4.2, to establish the almost sure convergence of series of mixing
random variables, it suffices to verify the convergence of two deterministic series
(1.4.13) and (1.4.14). The first series concerns the (2 + ε )th moment of the variables,
while the second one is related to the mixing coefficients.
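Condition (1.4.14) can be explored numerically for a given decay rate of the mixing coefficients. The sketch below computes partial sums of the series; the two coefficient sequences used to exercise it, $\alpha(k) = 0.9^k$ and $\alpha(k) = 1/k$, are hypothetical examples, and the sum starts at $k = 3$ so that $\log\log k$ is positive. Stabilizing partial sums only suggest convergence and are not a proof.

```python
import math


def series_partial_sum(alpha, eps, gamma, n_terms, k_start=3):
    # Partial sum of log k * (log log k)^(1+gamma) * alpha(k)^(eps/(2+eps)),
    # the series in (1.4.14), over k = k_start, ..., k_start + n_terms - 1.
    expo = eps / (2.0 + eps)
    total = 0.0
    for k in range(k_start, k_start + n_terms):
        log_k = math.log(k)
        total += log_k * math.log(log_k) ** (1.0 + gamma) * alpha(k) ** expo
    return total
```

With geometrically decaying coefficients the partial sums stop changing after a few hundred terms, whereas with $\alpha(k) = 1/k$ and $\varepsilon = 1$ the terms behave like $k^{-1/3}\log k\,(\log\log k)^{1+\gamma}$ and the partial sums keep growing.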
If $\{\varphi_k\}_{k\ge 0}$ is a sequence of mutually independent random variables satisfying the
assumptions required in Theorem 1.4.2, then the mixing coefficients $\alpha(k) = 0$, $k \ge 1$.
By the Lyapunov inequality we have $\big(E|\Phi_k(\varphi_k)|^2\big)^{\frac{1}{2}} \le \big(E|\Phi_k(\varphi_k)|^{2+\varepsilon}\big)^{\frac{1}{2+\varepsilon}}$. Hence
(1.4.13) implies

$$\sum_{k=1}^{\infty} E|\Phi_k(\varphi_k)|^2 < \infty. \tag{1.4.16}$$

Applying Theorem 1.2.8 to the sum of independent random variables, we obtain

$$\sum_{k=1}^{\infty} \Phi_k(\varphi_k) < \infty \quad \text{a.s.}$$

So, Theorem 1.4.2 can be regarded as a generalization of the corresponding results
for the sum of independent random variables.
The following theorem connects the mixing process with the transition probabil-
ity of Markov chains.
reached their ports of destination, would create opportunities and
facilities for smuggling.
It appears that the people of Maryland felt some apprehension that
an unrestricted power to make commercial and fiscal regulations
might result in compelling vessels bound to or from Baltimore to
enter or clear at Norfolk, or some other port in Virginia. The
delegates of Maryland accordingly introduced a proposition, which
embraced two ideas; first, that Congress shall not oblige vessels,
domestic or foreign, to enter or pay duties or imposts in any other
State than in that to which they may be bound, or to clear from any
other State than that in which their cargoes may be laden; secondly,
that Congress shall not induce vessels to enter or clear in one State
in preference to another, by any privileges or immunities.[241] This
proposition became the basis of that clause of the Constitution,
which declares that "no preference shall be given by any regulation
of commerce or revenue to the ports of one State over those of
another; nor shall vessels bound to, or from, one State, be obliged
to enter, clear, or pay duties in another."[242]
It was while this subject of the equal operation of the commercial
and revenue powers upon the different States was under
consideration, that the further provision was devised and
incorporated into the Constitution, which requires all duties, imposts,
and excises to be uniform throughout the United States. This clause,
in the final revision of the instrument, was annexed to the power of
taxation.[243]
The commercial power, besides being subjected to the restrictions
which have been thus described, was extended to a subject not
embraced in it by the report of the committee of detail. They had
included in it "commerce with foreign nations, and among the
several States";—meaning, by the former term, not to include the
Indian tribes upon this continent, but all other communities, civilized
and barbarian, foreign to the people of the United States. By the
system which had always prevailed in the relations of Europeans and
their descendants with the Indians of America, those tribes had
constantly been regarded as distinct and independent political
communities, retaining their original rights, and among them the
undisputed possession of the soil; subject to the exclusive right of
the European nation making the first discovery of their territory to
purchase it. This principle, incorporated into the public law of Europe
at the time of the discovery and settlement of the New World, and
practised by general consent of the nations of Europe, was the basis
of all the relations maintained with the Indian tribes by the imperial
government, in the time of our colonial state, by our Revolutionary
Congress, and by the United States under the Confederation. It
recognized the Indian tribes as nations, but as nations peculiarly
situated, inasmuch as their intercourse and their power to dispose of
their landed possessions were restricted to the first discoverers of
their territory. This peculiar condition drew after it two
consequences;—first, that, as they were distinct nations, they could
not be treated as part of the subjects of any one of the States, or of
the United States; and secondly, that, as their intercourse and trade
were subjected to restraint, that restraint would be most
appropriately exercised by the federal power. So general was the
acquiescence in these necessities imposed by the principle of public
law which defined the condition of the Indian tribes, that during the
whole of the thirteen years which elapsed from the commencement
of the Revolution to the adoption of the Constitution, the regulation
of intercourse with those tribes was left to the federal authority. It
was tacitly assumed by the Revolutionary Congress, and it was
expressly conferred by the Articles of Confederation.
The provision of the Confederation on this subject gave to the
United States the exclusive right and power "of regulating the trade
and managing all affairs with the Indians not members of any of the
States, provided that the legislative right of any State within its own
limits be not infringed or violated." The exception of such Indians as
were members of any State, referred to those broken members of
tribes who had lost their nationality, and had become absorbed as
individuals into the political community of the whites. With all other
Indians, remaining as distinct and self-governing communities, trade
and intercourse were subject to the regulation of Congress; while at
the same time each State retained to itself the regulation of its
commerce with all other nations. The broad distinction thus early
established, and thus perpetuated in the Confederation, between
commerce with the Indian tribes, and commerce with "foreign
nations," explains the origin and introduction of a special provision
for the former, as distinguished from the latter, in the Constitution of
the United States.
For although there might have been some reason to contend that
commerce with "foreign nations"—if the grant of the commercial
power had not expressly embraced the Indian tribes—would have
extended to those tribes, as nations foreign to the United States, yet
the entire history of the country, and the peculiarity of the
intercourse needful for their security, made it eminently expedient
that there should be a distinct recognition of the Indian
communities, in order that the power of Congress to regulate all
commerce with them might not only be as ample as that relating to
foreign nations, but might stand upon a distinct assertion of their
condition as tribes. Accordingly, Mr. Madison introduced the separate
proposition "to regulate affairs with the Indians, as well within as
without the limits of the United States";[244] and the committee to
whom it was referred gave effect to it, by adding the words, "and
with the Indian tribes," to the end of the clause containing the grant
of the commercial power.[245]
The remaining powers of Congress may be considered in the order
in which they were acted upon by the Convention. The powers to
establish a uniform rule of naturalization, to coin money and
regulate the value thereof and of foreign coin, and fix the standard
of weights and measures, were adopted without discussion and with
entire unanimity, as they had been proposed in the draft prepared by
the committee of detail. The power to establish post-offices was
extended to embrace post-roads.[246]
These were succeeded by the subject of borrowing money and
emitting bills on the credit of the United States; a power that was
proposed to be given by the committee of detail, while they at the
same time proposed to restrain the States from emitting bills of
credit. I have not been able to discover upon what ground it was
supposed to be proper or expedient to confer a power of emitting
bills of credit on the United States, and to prohibit the States from
doing the same thing. That the same thing was in contemplation in
the two provisions reported by the committee, sufficiently appears
from the debates and from the history of the times. The object of
the prohibition on the States was to prevent the issue and circulation
of paper money; the object of the proposed grant of power to the
United States was to enable the government to employ a paper
currency, when it should have occasion to do so. But the records of
the discussions that have come down to us do not disclose the
reasons which may have led to the supposition that a paper currency
could be used by the United States with any more propriety or safety
than by a State. One of the principal causes which had led to the
experiment of making a national government with power to prevent
such abuses, had been the frauds and injustice perpetrated by the
States in their issues of paper money; and there was at this very
time a loud and general outcry against the conduct of the people of
Rhode Island, who had kept themselves aloof from the national
Convention, for the express purpose, among others, of retaining to
themselves the power to issue such a currency.
It is possible that the phrase "emit bills on the credit of the United
States" might have been left in the Constitution, without any other
danger than the hazards of a doubtful construction, which would
have confined its meaning to the issuing of certificates of debt under
the power to "borrow money." But this was not the sense in which
the term "bills of credit" was generally received throughout the
country, nor the sense intended to be given to it in the clause which
contained the prohibition on the States. The well-understood
meaning of the term had reference to paper issues, intended to
circulate as currency, and bearing the public promise to pay a sum of
money at a future time, whether made or not made a legal tender in
payment of debts. It would have been of no avail, therefore, to have
added a prohibition against making such bills a legal tender. If a
power to issue them should once be seen in the Constitution, or
should be suspected by the people to be there, wrapt in the power
of borrowing money, the instrument would array against itself a
formidable and probably a fatal opposition. It was deemed wiser,
therefore, even if unforeseen emergencies might in some cases
make the exercise of such a power useful, to withhold it altogether.
It was accordingly stricken out, by a vote of nine States against two,
and the authority of Congress was thus confined to borrowing
money on the credit of the United States, which appears to have
been intended to include the issuing of government notes not
transferable as currency.[247]
The clauses which authorize Congress to constitute tribunals inferior
to the Supreme Court,[248] and to make rules as to captures on land
and water,[249]—the latter comprehending the grant of the entire
prize jurisdiction,—were assented to without discussion.[250] Then
came the consideration of the criminal jurisdiction in admiralty, and
that over offences against the law of nations. The committee of
detail had authorized Congress "to declare the law and punishment
of piracies and felonies committed on the high seas, ... and of
offences against the law of nations." The expression to "declare the
law," &c. was changed to the words "define and punish," for the
following reason. Piracy is an offence defined by the law of nations,
and also by the common law of England. But in those codes a single
crime only is designated by that term.[251] It was necessary that
Congress should have the power to declare whether this definition
was to be adopted, and also to determine whether any other crimes
should constitute piracy. In the same way, the term "felony" has a
particular meaning in the common law, and it had in the laws of the
different States of the Union a somewhat various meaning. It was
necessary that Congress should have the power to adopt any
definition of this term, and also to determine what other crimes
should be deemed felonies. So also there were various offences
known to the law of nations, and generally regarded as such by
civilized States. But before Congress could have power to punish for
any of those offences, it would be necessary that they, as the
legislative organ of the nation, should determine and make known
what acts were to be regarded as offences against the law of
nations; and that the power to do this should include both the power
to adopt from the code of public law offences already defined by
that code, and to extend the definition to other acts. The term
"declare" was therefore adopted expressly with a view to the
ascertaining and creating of offences, which were to be treated as
piracies and felonies committed on the high seas, and as offences
against the law of nations.[252]
The same necessity for an authority to prescribe a previous
definition of the crime of counterfeiting the securities and current
coin of the United States would seem to have been felt; and it was
probably intended to be given by the terms "to provide for the
punishment of" such counterfeiting.[253]
The power to "declare" war had been reported by the committee as
a power to "make" war. There was a very general acquiescence in
the propriety of vesting the war power in the legislature rather than
the executive; but the former expression was substituted in place of
the latter, in order, as it would seem, to signify that the legislature
alone were to determine formally the state of war, but that the
executive might be able to repel sudden attacks.[254] The clause
which enables Congress to grant "letters of marque and reprisal"
was added to the war power, at a subsequent period, on the
recommendation of a committee to whom were referred sundry
propositions introduced by Charles Pinckney, of which this was one.[255]

In addition to the war power, which would seem to involve of itself
the authority to raise all the necessary forces required by the
exigencies of a war, the committee of detail had given the separate
power "to raise armies," which the Convention enlarged by adding
the term to "support."[256] This embraced standing armies in time of
peace, and, as the clause thus amended would obviously allow, such
armies might be enlarged to any extent and continued for any time.
The nature of the government, and the liberties and the very
prejudices of the people, required that some check should be
introduced, to prevent an abuse of this power. A limitation of the
number of troops that Congress might keep up in time of peace was
proposed, but it was rejected by all the States as inexpedient and
impracticable.[257] Another check, capable of being adapted to the
proper exercise of the power itself, was to be found in an idea
suggested by Mr. Mason, of preventing a perpetual revenue.[258]
The application of this principle to the power of raising and
supporting armies would furnish a salutary limitation, by requiring
the appropriations for this purpose to pass frequently under the
review of the representatives of the people, without embarrassing
the exercise of the power itself. Accordingly, the clause now in the
Constitution, which restricts the appropriation of money to the
support of the army to a term not longer than two years, was added
to the power of raising and supporting armies.[259]
Authority "to provide and maintain a navy" was unanimously agreed to
as the most convenient definition of the power, and to this was
added, from the Articles of Confederation, the power "to make rules
for the government and regulation of the land and naval forces."[260]
The next subject which required consideration was the power of the
general government over the militia of the States. There were few
subjects dealt with by the framers of the Constitution exceeding this
in magnitude, importance, and delicacy. It involved not only the
relations of the general government to the States and the people of
the States, but the question whether and how far the whole effective
force of the nation could be employed for national purposes and
directed to the accomplishment of objects of national concern. The
mode in which this question should be settled would determine, in a
great degree, and for all time, whether the national power was to
depend, for the discharge of its various duties in peace and in war,
upon standing armies, or whether it could also employ and rely upon
that great reservation of force that exists in all countries accustomed
to enroll and train their private citizens to the use of arms.
The American Revolution had displayed nothing more conspicuously
than the fact, that, while the militia of the States were in general
neither deficient in personal courage, nor incapable of being made
soldiers, they were inefficient and unreliable as troops. One of the
principal reasons for this was, that, when called into the field in the
service of the federal power, the different corps of the several States
looked up to their own local government as their sovereign; and
being amenable to no law but that of their own State, they were
frequently indisposed to recognize any other authority. But a far
more powerful cause of their inefficiency lay in the fact that they
were not disciplined or organized or armed upon any uniform
system. A regiment of militia drawn from New Hampshire was a very
different body from one drawn from New York, or Pennsylvania, or
New Jersey, or South Carolina. The consequence was, that when
these different forces were brought to act together, there were often
found in the same campaign, and sometimes in the same
engagement, portions of them in a very respectable state of
discipline and equipment, and others in no state of discipline or
equipment at all.
The necessity, therefore, for a uniform system of disciplining and
arming the militia was a thing well ascertained and understood, at
the time of the formation of the Constitution. But the control of this
whole subject was a part of the sovereignty of each State, not likely
to be surrendered without great jealousy and distrust; and one of
the most delicate of the tasks imposed upon the Convention was
that of determining how far and for what purposes the people of the
several States should be asked to confer upon the general
government this very important part of their political sovereignty.
One thing, however, was clear;—that, if the general government was
to be charged with the duty of undertaking the common defence
against an external enemy, or of suppressing insurrection, or of
protecting the republican character of the State constitutions, it must
either maintain at all times a regular army suitable for any such
emergency, or it must have some power to employ the militia. The
latter, when compared with the resource of standing armies, is, as
was said of the institution of chivalry, "the cheap defence of
nations"; and although no nation has found, or will be likely to find,
it sufficient, without the maintenance of some regular troops, the
nature of the liberties inherent in the construction of the American
governments, and the whole current of the feelings of the American
people, would lead them to the adoption of a policy that might
restrain, rather than encourage, the growth of a permanent army. So
far, therefore, it seemed manifest, from the duties which were to be
imposed on the government of the Union, that it must have a power
to employ the militia of the States; and this would of necessity draw
after it, if it was to be capable of a beneficial exercise, the power to
regulate, to some extent, their organization, armament, and
discipline.
But the first draft of the Constitution, prepared by the committee of
detail, contained no express power on this subject, excepting "to call
forth the aid of the militia in order to execute the laws of the Union,
enforce treaties, suppress insurrections, and repel invasions."[261]
Possibly it might have been contended, after the Constitution had
gone into operation, that the general power to make all laws
necessary and proper for the execution of the powers specially
enumerated, would enable Congress to prescribe regulations of the
force which they were authorized to employ, since the authority to
employ would seem to involve the right to have the force kept in a
fit state to be employed. But this would have been a remote
implication of power, too hazardous to be trusted; and it at once
occurred to one of the wisest and most sagacious of the statesmen
composing the Convention, who, though he never signed the
Constitution, exercised a great and salutary influence in its
preparation,—Mr. Mason of Virginia,—that an express and
unequivocal power of regulating the militia must be conferred. He
stated the obvious truth, that, if the disciplining of the militia were
left in the hands of the States, they never would concur in any one
system; and as it might be difficult to persuade them to give up their
power over the whole, he was at first disposed to adopt the plan of
placing a part of the militia under the control of the general
government, as a select force.[262] But he, as well as others,
became satisfied that this plan would not produce a uniformity of
discipline throughout the entire mass of the militia. The question,
therefore, resolved itself practically into this,—what should be the
nature and extent of the control to be given to the general
government, assuming that its control was to be applicable to the
entire militia of the several States. This important question, involved
in several distinct propositions, was referred to a grand committee of
the States.[263] It was by them that the plan was digested and
arranged by which Congress now has the power to provide for
organizing, arming, and disciplining the militia, and for governing
such part of them as may be employed in the service of the United
States, reserving to the States the appointment of the officers, and
the authority of training the militia according to the discipline
prescribed by Congress;[264]—a provision that was adopted by a
large majority of the States. The clause reported by the committee
of detail was also adopted, by which Congress is enabled to provide
for calling forth the militia to execute the laws of the Union,
suppress insurrections, and repel invasions.[265]
The next subject in the order of the report made by the committee
of detail was that general clause now found at the close of the
enumeration of the express powers of Congress, which authorizes
them "to make all laws which may be necessary and proper for
carrying into execution the foregoing powers, and all other powers
vested by this Constitution in the government of the United States,
or in any department or officer thereof."[266] Nothing occurred in the
proceedings on this provision which throws any particular light upon
its meaning, excepting a proposition to include in it, expressly, the
power to "establish all offices" necessary to execute the powers of
the Constitution; an addition which was not made, because it was
considered to be already implied in the terms of the clause.[267]
The subjects of patents for useful inventions and of copyrights of
authors appear to have been brought forward by Mr. Charles
Pinckney. They gave rise to no discussion in the Convention, but
were considered in a grand committee, with other matters, and
there is no account of the views which they took of this interesting
branch of the powers of Congress. We know, however, historically,
that these were powers not only possessed by all the States, but
exercised by some of them, before the Constitution of the United
States was formed. Some of the States had general copyright laws,
not unlike those which have since been enacted by Congress;[268]
but patents for useful inventions were granted by special acts of
legislation in each case. When the power to legislate on these
subjects was surrendered by the States to the general government,
it was surrendered as a power to legislate for the purpose of
securing a natural right to the fruits of mental labor. This was the
view of it taken in the previous legislation of the States, by which
the power conferred upon Congress must of course, to a large
extent, be construed.
Such are the legislative powers of Congress, which are to be
exercised within the States themselves;—and it is at once obvious,
that they constitute a government of limited authority. The question
arises, then, whether that authority is anywhere full and complete,
embracing all the powers of government and extending to all the
objects of which it can take cognizance. It has already been seen,
that, when provision was made for the future acquisition of a seat of
government, exclusive legislation over the district that might be
acquired for that purpose was conferred upon Congress.[269] In the
same clause, the like authority was given over all places that might
be purchased, with the consent of any State legislature, for the
erection of forts, magazines, arsenals, dock-yards, and other needful
buildings.[270] All the other places to which the authority of the
United States can extend are included under the term "territories,"
which are out of the limits and jurisdiction of any State. As this is a
subject which is intimately connected with the power to admit new
States into the Union, we are now to consider the origin and history
of the authority given to Congress for that purpose.
In examining the powers of Congress contained in the first article of
the Constitution, the reader will not find any power to admit new
States into the Union; and while he will find there the full legislative
authority to govern the District of Columbia and certain other places
ceded to the United States for particular purposes, of which I have
already spoken, he will find no such authority there conferred in
relation to the territory which had become the property of the United
States by the cession of certain of the States before and after the
adoption of the Articles of Confederation. If this power of legislation
exists as to the territories, it is to be looked for in another
connection; and although it is not the special province of this work
to discuss questions of construction, it is proper here to state the
history of those portions of the Constitution which relate to this
branch of the authority of Congress.
In the first volume of this work, I have given an account of the
origin of the Northwestern Territory, of its relations to the Union, and
of the mode in which the federal Congress had dealt with it down to
the time when the national Convention was assembled.[271] From
the sources there referred to, and from others to which reference
will now be made, it may be convenient to recapitulate what had
been done or attempted by the Congress of the Confederation.
It appears that during the preparation of the Articles of
Confederation an effort was made to include in them a grant of
express power to the United States in Congress to ascertain and fix
the western boundaries of the existing States, and to lay out the
territory beyond the boundaries that were to be thus ascertained
into new States. This effort totally failed. It was founded upon the
idea that the land beyond the rightful boundaries of the old States
was already, or would by the proposed grant of power to ascertain
those boundaries become, the common property of the Union. But
the States, which then claimed an uncertain extension westward
from their actual settlements, were not prepared for such an
admission, or such a grant; and accordingly the Articles of
Confederation, which were issued in 1777 and took effect in 1781,
contained no express power to deal with landed property of the
United States, and no provision which could safely be construed into
a power to form and admit new States out of then unoccupied lands
anywhere upon the continent. Still, the Articles were successively
ratified by some of the States, and finally became established, in the
express contemplation that the United States should be made the
proprietor of such lands, by the cession of the States which claimed
to hold them. In order to procure such cessions, as the means of
inducing a unanimous accession to the confederacy, the Congress in
1780 passed a resolve, in which they promised to dispose of the
lands for the common benefit of the United States, to settle and
form them into distinct republican States, and to admit such States
into the Union on an equal footing with its present members.[272]
The great cession by Virginia, made in 1784, was immediately
followed by another resolve, for the regulation of the territory thus
acquired.[273]
This resolve, as originally reported by Mr. Jefferson, embraced a plan
for the organization of temporary governments in certain States
which it undertook to describe and lay out in the Western territory,
and for the admission of those States into the Union. In one
particular, also, it undertook, as it was first reported, to regulate the
personal rights or relations of the settlers, by providing that, after
the year 1800, slavery, or involuntary servitude except for crime,
should not exist in any of the States to be formed in the territory.
But this clause was stricken out before the resolve was passed, and
its removal left the measure a mere provision for the political
organization of temporary and permanent governments of States,
and for the admission of such States into the Union. So far as
personal rights or relations were involved in it, the settlers were
authorized to adopt, for a temporary government, the constitution
and laws of any one of the original States, but the laws were to be
subject to alteration by their ordinary legislature. The conditions of
their admission into the Union referred solely to their political
relations to the United States, or to the rights of the latter as the
proprietor of the ungranted lands.
In about a year from the passage of this measure introduced by Mr.
Jefferson, and after he had gone on his mission to France, an effort
was made by Mr. King to legislate on the subject of the immediate
and perpetual exclusion of slavery from the States described in Mr.
Jefferson's resolve. Mr. King's proposition was referred to a
committee, but it does not appear that it was ever acted upon.[274]
The cessions of Massachusetts and Connecticut followed, in 1785
and 1786. Within two years from this period, such had been the
rapidity of emigration and settlement, and so inconvenient had
become the plan of 1784, that Congress felt obliged to legislate
anew on the whole subject of the Northwestern Territory, and
proceeded to frame and adopt the Ordinance of July 13, 1787. This
instrument not only undertook to make political organizations, and
to provide for the admission of new States into the Union, but it also
dealt directly with the rights of individuals. Its exclusion of slavery
from the territory is well known as one of its fundamental articles,
not subject to alteration by the people of the territory, or their
legislature.[275]
The power of Congress to deal with the admission of new States was
not only denied at the time, but its alleged want of such power was
one of the principal reasons which were said to require a revision of
the federal system. It does not appear that the subject of legislation
on the rights or condition of persons attracted particular attention;
nor do we know, from anything that has come down to us, that the
clause relating to slavery was stricken from Mr. Jefferson's resolve in
1784, upon the special ground of a want of constitutional power to
legislate on such a question. But Mr. Jefferson has himself informed
us, that a majority of the States in Congress would not consent to
construe the Articles of Confederation as if they had reserved to nine
States in Congress power to admit new States into the Union from
the territorial possessions of the United States; and that they so
shaped his measure, as to leave the question of power and the rule
for voting to be determined when a new State formed in the territory
should apply for admission.[276] It seems, also, that although the
power to frame territorial governments, to organize States and admit
them into the Union, was assumed in the Ordinance of 1787, the
Congress of the Confederation never acted upon the power so far as
to admit a State.[277] Finally, we are told by Mr. Madison, in the
Federalist, that all that had been done in the Ordinance by the
Congress of the Confederation, including the sale of lands, the
organization of governments, and the prescribing of conditions of
admission into the Union, had been done "without the least color of
constitutional authority";[278]—an assertion which, whether
justifiable or not, shows that the power of legislation was by some
persons strenuously denied.[279]
With regard to the powers of Congress, under the Confederation, to
erect new States in the Northwestern Territory, and to admit them
into the Union, the truth seems to be this. There is no part of the
Articles of Confederation which can be said to confer such a power;
and, in fact, when the Articles were framed, the Union, although it
then existed by an imperfect bond, not only possessed no such
territory, but it did not then appear likely to become the proprietor of
lands, claimed by certain of the States as the successors of the
crown of Great Britain, and lying within what they regarded as their
original chartered limits. The refusal of those States to allow the
United States to determine their boundaries, made it unnecessary to
provide for the exercise of authority over a public domain. But in the
interval between the preparation of the Articles and their final
ratification, a great change took place in the position of the Union. It
was found that certain of the smaller States would not become
parties to the Confederation, if the great States were to persist in
their refusal to cede to the Union their claims to the unoccupied
Western lands; and although the States which thus held themselves
back, for a long time, from the ratification of the Articles, finally
adopted them, before the cessions of Western territory were made,
they did so upon the most solemn assertion that they expected and
confided in a future relinquishment of their claims by the other
States. Those just expectations were fulfilled. By the acts of cession,
and by the proceedings of Congress which invited them, the United
States not only became the proprietors of a great public domain, but
they received that domain upon the express trust that its lands
should be disposed of for the common benefit, and that the country
should be settled and formed into republican States, and that those
States should be admitted into the Union. In these conveyances,
made and accepted upon these trusts, there was a unanimous
acquiescence by the States.
While, therefore, in the formal instrument under which the Congress
was organized, and by which the United States became a corporate
body, there was no article which looked to the admission of new
States into that body, formed out of territory thus acquired, and no
power was conferred to dispose of such lands or govern such
territory, there were, outside of that instrument, and closely
collateral to it, certain great compacts between the States, arising
out of deeds of cession and the formal guaranties by which those
cessions had been invited, and with which they had been received,
which proceeded as if there were a competent authority in the
United States in Congress to provide for the formation of the States
contemplated, and for their admission into the Union. Strictly
speaking, however, there was no such authority. It was to be
gathered, if at all, from public acts and general acquiescence, and
could not be found in the instrument that formed the charter and
established the powers of the Congress. It was an authority,
therefore, liable to be doubted and denied; it was one for the
exercise of which the Congress was neither well fitted nor well
situated; and it was moreover so delicate, so extensive, and so
different from all the other powers and duties of the government, as
to make it eminently necessary to have it expressly stated and
conferred in the instrument under which all the other functions of
the government were to be exercised.[280]
Such was the state of things at the period of the formation of the
Constitution; and as we are to look for the germ of every power
embraced in that instrument in some stage of the proceedings which
took place in the course of its preparation, it is important at once to
resort to the first suggestion of any authority over these subjects. In
doing so, we are to remember that the United States had accepted
cessions of the Northwestern Territory, impressed with two distinct
trusts: first, that the country should be settled and formed into
distinct republican States, which should be admitted into the Union;
secondly, that the lands should be disposed of for the common
benefit of all the States.[281]
Accordingly, we find in the plan of government presented by
Governor Randolph at the opening of the Convention, a resolution
declaring "that provision ought to be made for the admission of
States lawfully arising within the limits of the United States, whether
from a voluntary junction of government and territory or otherwise,
with the consent of a number of voices in the national legislature
less than the whole."[282] This resolution remained the same in
phraseology and in purpose through all the stages to which the
several propositions that formed the outline of the new government
were subjected, down to the time when they were sent to the
committee of detail for the purpose of having the Constitution drawn
out. Looking to the manifest want of power in the Confederation to
admit new States into the Union; to the probability that Vermont,
Kentucky, Tennessee (then called Franklin), and Maine,—none of
which were embraced in any cessions that had then been made to
the United States,—might become separate States; and to the
prospective legislation of the Ordinance of 1787 concerning the
admission of States that were to be formed in the territory northwest
of the Ohio, which had been ceded to the Union;—it seems quite
certain that the purpose of the resolution was to supply a power to
admit new States, whether formed from the territory of one of the
existing States, or from territory that had become the exclusive
property of the United States. The resolution contained, however, no
positive restriction, which would require the assent of any existing
State to the separation of a part of its territory; but as the States to
be admitted were to be those "lawfully arising," it is apparent that
the original intention was that no present State should be
dismembered without its consent. But in order to make this the
more certain, the committee of detail, in the article in which they
carried out the resolution, gave effect to its provisions in these
words:—"New States lawfully constituted or established within the
limits of the United States may be admitted, by the legislature, into
this government; but to such admission the consent of two thirds of
the members present in each house shall be necessary. If a new
State shall arise within the limits of any of the present States, the
consent of the legislatures of such States shall be also necessary to
its admission. If the admission be consented to, the new States shall
be admitted on the same terms with the original States. But the
legislature may make conditions with the new States concerning the
public debt which shall be then subsisting."[283]
In the first draft of the Constitution, therefore, there was contained
a qualified power to admit new States, whether arising within the
limits of any of the old States, or within the territory of the United
States. But in this proposition there was a great omission; for
although the States to be admitted were to be those lawfully arising,
and such a State might be formed out of the territory of an existing
State by the legislative power of the latter, yet it was not ascertained
how a State was "lawfully to arise" in the territory of the United
States. Nor was there, at present, any provision introduced into the
Constitution by which Congress could dispose of the soil of the
national domain. These as well as other omissions at once attracted
the attention of Mr. Madison, who, as we have seen, held the opinion
that the entire legislation of the old Congress in reference to the
Northwestern Territory was without constitutional authority. Before
the article which embraced the admission of new States was
reached, he moved the following among other powers:[284] "to
dispose of the unappropriated lands of the United States"; and "to
institute temporary governments for new States arising therein."
These propositions were referred to the committee of detail, but
before any action upon them, the article previously reported by that
committee was reached and taken up, and there ensued upon it a
course of proceeding which resulted in the provisions that now stand
in the third section of the fourth article of the Constitution.[285]
The first alteration made in the article reported by the committee
was to strike out the clause which declared that the new States
should be admitted on an equal footing with the old ones. The
reason assigned for this change was, that the legislature ought not
to be tied down to such an admission, as it might throw the balance
of power into the Western States.[286] The next modification was to
strike out the clause which required a vote of two thirds of the
members present for the admission of a State.[287] This left the
proposed article a mere grant of power to admit new States,
requiring the consent of the legislature of any State that might be
dismembered, as well as the consent of Congress. An earnest effort
was then made, by some of the members from the smaller States, to
remove this restriction, upon the ground that the United States, by
the treaty of peace with England, had become the proprietor of the
crown lands which were situated within the limits claimed by some
of the States that would be likely to be divided; and it was urged,
that to require the consent of Virginia, North Carolina, and Georgia
to the separation of their Western settlements, might give those
States an improper control over the title of the United States to the
vacant lands lying within the jurisdiction claimed by those States,
and would enable them to retain the jurisdiction unjustly, against the
wish of the settlers. But a large majority of the States refused to
concede a power to dismember a State, without its consent, by
taking away even its claims to jurisdiction. It was considered by
them, that as to municipal jurisdiction over settlements already
made within limits claimed by Virginia, North Carolina, and Georgia,
the Constitution ought not to interfere, without the joint consent of
the settlers and the State exercising such jurisdiction; that if the title
to lands unoccupied at the treaty of peace, lying within the originally
chartered limits of any of the States, was in dispute between them
and the United States, that controversy would be within the reach of
the judicial power, as one between a State and the United States, or
it might be terminated by a voluntary cession of the State claim to
the Union.[288]
The next step taken in the settlement of this subject was to provide
for the case of Vermont, which was then in the exercise of an
independent sovereignty, although it was within the asserted limits
of New York. It was thought proper, in this particular case, not to
make the State of Vermont, already formed, dependent for her
admission into the Union on the consent of New York. For this
reason, the words "hereafter formed" were inserted in the article
under consideration, and the word "jurisdiction" was substituted for
"limits."[289] Thus modified, the article stood as follows:—
"New States may be admitted by the legislature into the Union; but
no new State shall be hereafter formed or erected within the
jurisdiction of any of the present States, without the consent of the
legislature of such State, as well as of the general legislature."
This provision was quite unsatisfactory to the minority. They wished
to have the Constitution assert a distinct power in Congress to erect
new States within, as well as without, the territory claimed by any of
the States, and to admit such new States into the Union; and they
also wished for a saving clause to protect the title of the United
States to vacant lands ceded by the treaty of peace. Luther Martin
accordingly moved a substitute article, embracing these two objects,
but it was rejected.[290] A clause was then added to the article
pending, which declared that no State should be formed by the
junction of two or more States, or parts of States, without the
consent of the States concerned, as well as the consent of Congress.
This completed the substance of what is now the first clause of the
third section of the fourth article of the Constitution.[291]
Mr. Carroll thereupon renewed the effort to introduce a clause saving
the rights of the United States to vacant lands; and after some
modification, he finally submitted it in these words: "Nothing in this
Constitution shall be construed to alter the claims of the United
States, or of the individual States, to the Western territory; but all
such claims shall be examined into, and decided upon, by the
Supreme Court of the United States." Before any vote was taken
upon this proposition, however, Gouverneur Morris moved to
postpone it, and brought forward as a substitute the very provision
which now forms the second clause of the third section of article
fourth, which he presented as follows: "The legislature shall have
power to dispose of, and make all needful rules and regulations
respecting, the territory or other property belonging to the United
States; and nothing in this Constitution contained shall be so
construed as to prejudice any claims, either of the United States or
of any particular State." This provision was adopted, without any
other dissenting vote than that of the State of Maryland.[292]
The purpose of this provision, as it existed at the time in the minds
of the framers of the Constitution, must be gathered from the whole
course of their proceedings with respect to it, and from the
surrounding facts, which exhibit what was then, and what was
afterwards likely to become, the situation of the United States in
reference to the acquisition of territory and the admission of new
States. There were, then, at the time when this provision was made,
three classes of cases in the contemplation of the Convention. The
first consisted of the Northwestern Territory, in which the title to the
soil and the political jurisdiction were already vested in the United
States. The second embraced the case of Vermont, which was then
exercising an independent jurisdiction adversely to the State of New
York, and the case of Kentucky, then a district under the jurisdiction
of Virginia; in both of which the United States neither claimed nor
sought to acquire either the title to the vacant lands or the rights of
political sovereignty, but which would both require to be received as
new and separate States, the former without the consent of New
York, the latter with the consent of Virginia. The third class
comprehended the cessions which the United States in Congress
were then endeavoring to obtain from the States of North Carolina,
South Carolina, and Georgia, and in which were afterwards
established the States of Tennessee, Mississippi, and Alabama.[293]
These cessions, as it then appeared, might or might not all be made.
If made, the title of the United States to the unoccupied lands would
be complete, resting both upon the cessions and upon the treaty of
peace with England; and the political jurisdiction over the existing
settlements, as well as over the whole territory, would be transferred
with the cessions, subject to any conditions which the ceding States
might annex to their grants. If the cessions should not be made, the
claims of the United States to the unoccupied lands would stand
upon the treaty of peace, and would require to be saved by some
clause in the Constitution which should signify that they were not
surrendered; while the claims of the respective States would require
to be protected in like manner.
The reader will now be prepared to understand the following
explanation of the third section of the fourth article of the
Constitution. First, with reference to the Northwestern Territory, the
soil and jurisdiction of which were already completely vested in the
United States, it was necessary that the Constitution should confer
upon Congress power to exercise the political jurisdiction of the
United States, power to dispose of the soil, and power to admit new
States that might be formed there into the Union. Secondly, with
reference to such cases as that of Vermont, it was necessary that
there should be a power to admit new States into the Union without
requiring the assent of any other State, when such new States were
not formed within the actual jurisdiction of any other State. Thirdly,
with reference to such cases as that of Kentucky, which would be
formed within the actual jurisdiction of another State, it was
necessary that the power to admit should be qualified by the
condition of the consent of that State. Fourthly, with reference to
such cessions as were expected to be made by North Carolina,
South Carolina, and Georgia, it was necessary to provide the power
of political government, the power to admit into the Union, and the
power to dispose of the soil, if the cessions should be made; and at
the same time to save the claims of the United States and of the
respective States as they then stood, if the cessions anticipated
should not be made. None of these cases, however, were specifically
mentioned in the Constitution, but general provisions were made,
which were adapted to meet the several aspects of these cases.
From the generality of these provisions, it is held by some that the
clause which relates to "the territory or other property of the United
States," was intended to be applied to all cessions of territory that
might ever be made to the United States, as well as to those which
had been made, or which were then specially anticipated; while
others give to the clause a much narrower application.[294]
There now remain to be considered the restraints imposed upon the
exercise of the powers of Congress, both within the States and in all
other places; both where the authority of the United States is limited
to certain special objects, and where it is unlimited and universal,
excepting so far as it is narrowed by these constitutional restraints.
Some of them I have already described, in tracing the manner in
which they were introduced into the Constitution. We have seen how
far the commercial and revenue powers became limited in respect to
the slave-trade, to taxes on exports, to preferences between the
ports of different States, and to the levying of capitation or other
direct taxes. These restrictions were applicable to these special
powers. But others were introduced, which apply to the exercise of