Weide B.W. - Statistical Methods in Algorithm Design and Analysis (Thesis) (1978)

The thesis explores the application of statistical methods in the design and analysis of discrete algorithms, focusing on techniques such as randomization, ranking, and sampling. It discusses probabilistic approximation algorithms and their behavior, providing empirical results that suggest alternatives to traditional algorithms like Quicksort. Additionally, the work analyzes the use of order statistics in optimization problems and presents new algorithms for various computational challenges.

Statistical Methods in Algorithm Design and Analysis

Bruce W. Weide
Dept. of Computer Science
Carnegie-Mellon University
Pittsburgh, PA 15213

August 1978

Submitted to Carnegie-Mellon University in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Carnegie-Mellon University
CARNEGIE INSTITUTE OF TECHNOLOGY AND MELLON INSTITUTE OF SCIENCE

THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy

TITLE: STATISTICAL METHODS IN ALGORITHM DESIGN AND ANALYSIS
PRESENTED BY: Bruce W. Weide
ACCEPTED BY THE DEPARTMENT OF: Computer Science (Michael Shamos)
APPROVED BY THE COLLEGE COUNCIL

Abstract

The use of statistical methods in the design and analysis of discrete algorithms is explored. Among the design tools are randomization, ranking, sampling and subsampling, density estimation, and "cell" or "bucket" techniques. The analysis techniques include those based on the design methods as well as the use of stochastic convergence concepts and order statistics. The introductory chapter contains a literature survey and background material on probability theory.

In Chapter 2, probabilistic approximation algorithms are discussed with the goal of exposing and correcting some oversights in previous work. Some advantages of the proposed solution are the introduction of a homogeneous model for dealing with random problems, and a set of methods for analyzing the probabilistic behavior of approximation algorithms which permit consideration of fairly complex algorithms in which there are dependencies among the random variables in question.

Chapter 3 contains many useful design and analysis tools such as those mentioned above, and several examples of the uses of the methods. Algorithms which run in linear expected time for a wide range of probabilistic assumptions about their inputs are given for problems ranging from sorting to finding all nearest neighbors in a point set in k dimensions.
Empirical results are presented which indicate that the sorting algorithm, Binsort, is a good alternative to Quicksort under most conditions. There are also new algorithms for some selection and discrete optimization problems. Finally, Chapter 4 describes the uses of results from order statistics to analyze greedy algorithms and to investigate the behavior of parallel algorithms. Among the results reported here are general theorems regarding the distribution of solution values for optimization problems on weighted graphs. Many recent results in the literature, which apply for certain distributions of edge weights and for specific problems, follow as immediate corollaries from these general theorems.

Summary

This summary of the thesis begins with the motivations for expanding some recent work in probabilistic and randomized algorithms, and a short literature survey. Full references can be found in the thesis itself. Summaries of the three main chapters follow.

Only within the last few years have serious attempts been made to investigate probabilistic models of algorithm behavior. One of the most revolutionary ideas, at least to computer scientists and mathematicians, is the notion offered by Karp [1976] and Rabin [1976], among others, that an algorithm need not get the exact answer for every input. While it is, of course, most desirable that an algorithm always get the correct solution, this is not necessarily the most cost-effective approach, since it is widely believed that the NP-hard problems require exponential computing times.

There are three options available to help overcome this difficulty. First, it is possible for an algorithm to produce a good approximation all the time, an alternative which has been recognized for quite a while. Karp [1976] credits Graham [1966] with pioneering such algorithms for NP-hard problems.
Even producing a guaranteed good approximation is NP-hard for certain problems, though (see Garey and Johnson [1976]). A second possibility is for an algorithm to get the exact answer most of the time (Rabin [1976]). Finally, an algorithm could produce a good approximation to the correct answer most of the time (Karp [1976]). A short survey article by Gimady, Glebov, and Perepelica [1976] cites similar ideas in the Russian mathematical literature.

Unfortunately, there is now considerable confusion regarding definitions of certain key terms being used to describe probabilistic algorithms, such as Karp's [1976] algorithm for the Euclidean traveling salesman problem and Posa's [1976] algorithm for the Hamiltonian circuit problem. That this confusion exists is undeniable, even though it is not mentioned in the literature. The source of the difficulty seems to be that there are at least three non-equivalent probabilistic models (and associated definitions of the phrase "almost everywhere"), but results obtained under one model are commonly cited in papers using a different one. Chapter 2 deals with this problem, showing the relationships among the different models and how some proofs could be revised to take advantage of these relationships.

Although there have been many expected-time analyses of discrete algorithms, most authors make very restrictive assumptions about the distributions of input parameters. Two notable exceptions are Spira [1973], who assumes no particular distribution of edge weights for the shortest path problem, but merely that they are independent; and Bentley and Shamos [1978], who make only a very weak technical assumption about the distribution of the points to prove good expected behavior of their planar convex hull algorithm.
Hoare [1962] proposed that the analysis of Quicksort could be freed of the assumption of equally likely input permutations simply by choosing the partitioning element at random. More recently, Yuval [1975b], Rabin [1976], and Carter and Wegman [1977] have suggested that this idea be used to design algorithms which perform well under a wide range of probabilistic assumptions. In Chapter 3 we continue this trend, and show how randomization and sampling can affect both the design and analysis of many algorithms.

Chapter 4 includes new probabilistic models for some problems which can be analyzed by the use of order statistics. Borovkov [1962], Weide [1976], Golden [1977], Baudet [1978], and Robinson [1978] have previously made use of such methods for computer science problems. Some of the general theorems in Chapter 4 have as corollaries a number of special results regarding the asymptotic behavior of the solutions to optimization problems on graphs. The previous cases have been analyzed on an ad hoc basis, depending on the distribution of edge weights.

Summary of Chapter 2: Stochastic Convergence and Probabilistic Algorithms

Chapter 2 is a discussion of the analysis of "probabilistic approximation algorithms" using the concepts of stochastic convergence. Such an algorithm usually, but not necessarily always, produces the exact solution to a problem or at least a good approximate answer. We would like to be able to characterize a particular probabilistic approximation algorithm by a statement of the form, "The algorithm produces an answer having relative error at most ε with probability at least p." A good probabilistic approximation algorithm would have a small value of ε and p near one. Unfortunately, most problems for which probabilistic approximation seems appropriate (such as the NP-hard problems) cannot be solved even in this weak sense by simple algorithms.
As a result, the probabilistic analysis of such an algorithm is typically complicated by the fact that certain steps of the algorithm are not independent, and by the fact that the answer produced by the algorithm is not independent of the true answer. The latter correlation is, of course, desirable (otherwise we would be hard pressed to justify the procedure as an algorithm for solving the problem), but it contributes to making probabilistic analysis extremely difficult in general.

As an alternative, we follow Karp [1976] in proposing that the behavior of a probabilistic approximation algorithm be characterized by the stochastic convergence (to zero) of the sequence of errors in the answers it produces to a sequence of random problems. We show how to deal with dependence among the relevant random variables, and introduce the notions of "strong" and "weak" success of algorithms to describe those which have error sequences which converge to zero almost surely and in probability, respectively.

It is obvious that the probabilistic model of the problem instances can be an important factor in determining how "strongly" an algorithm succeeds in this sense. If the edge weights in a graph are chosen from a uniform distribution, for example, the algorithm might succeed strongly, whereas if they are chosen from a normal distribution the algorithm might not work at all. This possibility is apparently recognized by everyone. It turns out that a much more subtle problem with probabilistic models has gone unnoticed in the literature of the field, and we propose a scheme to correct this deficiency.

The solution involves the distinction between what we call the "incremental problem model", in which the n-th problem of the sequence differs only incrementally from the previous one, and the "independent problem model", in which the n-th problem of the sequence is totally independent of the previous one.
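In the standard notation of probability theory, the "strong" and "weak" success notions just described correspond to the two usual modes of stochastic convergence of the error sequence. Writing E_n for the error of the algorithm's answer on the n-th problem of the sequence (the symbol E_n is ours, chosen for this sketch):

```latex
\begin{aligned}
\textbf{strong success (almost surely):}\quad
  & \Pr\Bigl[\lim_{n\to\infty} E_n = 0\Bigr] = 1,\\[4pt]
\textbf{weak success (in probability):}\quad
  & \lim_{n\to\infty} \Pr\bigl[\,|E_n| > \varepsilon\,\bigr] = 0
    \quad\text{for every } \varepsilon > 0.
\end{aligned}
```

Almost-sure convergence implies convergence in probability but not conversely, which is the direction of the hierarchy discussed in the text.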
Our main result relates the problem models and modes of stochastic success, and demonstrates that strong success in the independent model is a strictly stronger criterion than strong success in the incremental model, which is strictly stronger than weak success in either model. Unfortunately, while there are many probabilistic analyses of algorithms which demonstrate strong success, most of these apply only in the incremental model. After giving a review of many of the papers in this area in an attempt to illustrate the confusion which can arise if the distinction between problem models is not made explicitly, we propose that this difference be recognized, and argue for adoption of the independent problem model as the canonical basis for proving stochastic success of probabilistic approximation algorithms.

Finally, we give a detailed analysis of Karp's [1976] algorithm for the Euclidean traveling salesman problem. The main result here suggests that the algorithm succeeds strongly in the independent model, although that conclusion has yet to be proved. A long and rather detailed proof of our theorem is included to demonstrate the use of several techniques which seem to be of universal utility in dealing with such problems.

Chapter 2 is by far the most difficult reading in this thesis, and its importance to subsequent results rests primarily with the definitions of strong and weak success and the identification of the two problem models. The reader who is familiar with these concepts and understands the hierarchy of problem models and stochastic convergence should have no difficulty interpreting later results which refer to Chapter 2.

Summary of Chapter 3: Randomization and Sampling

Chapter 3 contains many practical techniques and results. We begin with a classification of algorithms into the "probabilistic approximation algorithms" of Chapter 2, "randomized algorithms", and all others.
A randomized algorithm is non-deterministic in the sense that it may not perform exactly the same computation if given the same inputs. The non-determinism is the result of a randomization or sampling step in the algorithm which is designed to give the algorithm good expected behavior over a wide range of input distributions. A natural correspondence between these algorithm classes and parametric and non-parametric statistics is introduced in order to suggest which statistical techniques may be most useful in designing certain types of algorithms.

We then introduce four general techniques for designing randomized and probabilistic approximation algorithms using ideas from statistics. These include "randomization", which is the process of shuffling the input prior to running an algorithm in an attempt to achieve good expected running time regardless of the input permutation; "approximation in rank", which is useful in problems on totally ordered sets; "estimation and subsampling" for designing approximation algorithms; and the use of "empirical distribution functions" to extend the domain of good behavior of certain algorithms from uniformly distributed inputs to all distributions satisfying certain technical conditions. The thesis contains at least two examples of the use of each technique.

An algorithm for sorting real numbers from a wide class of distributions in linear expected time is given. Empirical results show that the algorithm, Binsort, is a practical alternative to Quicksort when more than a few hundred items are to be sorted. We also include new on-line algorithms for selection problems which use very little space, but therefore are necessarily only approximate. Again, empirical results indicate that the approximations are better in practice than can be proved in theory. The techniques are also shown to be effective in designing and analyzing algorithms for discrete optimization problems and for geometrical problems.
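The thesis does not reproduce Binsort at this point, but the bucket idea behind linear-expected-time sorting can be sketched roughly as follows (a hedged illustration of the technique, not Weide's actual implementation; all names are ours). Keys drawn from a distribution with bounded density are mapped to about n buckets, each of which then holds O(1) keys on average:

```python
def binsort(keys):
    """Bucket ("bin") sort sketch: linear expected time when keys are
    i.i.d. from a distribution with bounded density on [lo, hi]."""
    n = len(keys)
    if n <= 1:
        return list(keys)
    lo, hi = min(keys), max(keys)
    if lo == hi:
        return list(keys)          # all keys equal; nothing to do
    buckets = [[] for _ in range(n)]
    scale = (n - 1) / (hi - lo)
    for x in keys:
        # Linear interpolation maps each key to a bucket; under a
        # bounded-density assumption each bucket gets O(1) keys on average.
        buckets[min(n - 1, int((x - lo) * scale))].append(x)
    out = []
    for b in buckets:
        out.extend(sorted(b))      # each bucket is tiny in expectation
    return out
```

A bounded-density input distribution is what keeps the expected bucket occupancy constant; badly skewed inputs push work into the per-bucket sorts, which is where the empirical-distribution-function technique mentioned above comes in.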
They lead to new algorithms for some closest point problems which run in linear expected time for a large class of point distributions. Other geometrical problems, such as finding the convex hull of a set of points in the plane, can be solved in linear expected time by using Binsort to do the sorting. The expected running time of an algorithm for which the worst case is dominated by a sorting step can often be improved by this method.

Summary of Chapter 4: Order Statistics

Some intriguing results from the field of order statistics are used in Chapter 4 to analyze the behavior of solutions to graph optimization problems and to compare these with the behavior of greedy algorithms for such problems. The results include a host of previous results in the literature as special cases. In particular, two theorems relate the value of the optimal solution to a problem defined by an objective function on the edge weights to the existence in a random graph of a subgraph satisfying the structural constraints of the problem. For instance, they relate the length of the optimal traveling salesman tour in a randomly weighted complete graph to the existence of a Hamiltonian circuit in a random graph.

There is also a discussion of the rather surprising fact that a randomized algorithm (as defined in Chapter 3) can, in theory at least, possibly be improved simply by starting up several instantiations of the same problem simultaneously and "time-sharing" the computing resources among the different versions. We state a condition on the distribution of running times of a randomized algorithm which, if satisfied, assures that the algorithm does not have optimal expected running time. Finally, a few easy results from order statistics are used in the analysis of a schema for problem decomposition for asynchronous multiprocessors.
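The time-sharing observation can be illustrated with a toy simulation (entirely our construction, not an experiment from the thesis). Suppose a randomized algorithm finishes in 1 second with probability 0.9 but in 100 seconds with probability 0.1, so a single run takes 10.9 seconds on average. Time-sharing k independent instances on one processor means each runs at 1/k speed and the job completes when the luckiest instance finishes, i.e. at k times the minimum of k independent running times:

```python
import random

def single_run():
    """Heavy-tailed running time of a hypothetical randomized algorithm."""
    return 1.0 if random.random() < 0.9 else 100.0

def time_shared(k):
    """Completion time when k independent instances share one processor:
    each instance runs at 1/k speed, and the first finisher wins."""
    return k * min(single_run() for _ in range(k))

random.seed(1)
trials = 20000
solo = sum(single_run() for _ in range(trials)) / trials      # about 10.9
shared = sum(time_shared(2) for _ in range(trials)) / trials  # about 2 * 1.99
```

With two instances, the minimum is 100 only when both runs are unlucky (probability 0.01), so E[min] = 0.99 + 1 = 1.99 and the time-shared expectation is about 3.98, far below 10.9. A heavy right tail in the running-time distribution is exactly the kind of condition under which a single instance cannot be optimal.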
The results are extended from the case of an ideal multiprocessor to one in which there is overhead associated with the scheduling and dispatching of tasks to the processors.

Contents

Acknowledgements
1. Introduction and Summary
1.1. Previous Work
1.2. Summary of Chapter 2: Stochastic Convergence and Probabilistic Algorithms
1.3. Summary of Chapter 3: Randomization and Sampling
1.4. Summary of Chapter 4: Order Statistics
1.5. Background Material
1.5.1. Notation
1.5.2. Basic Probability Theory
1.5.3. Random Structures
1.6. Conclusions and Further Work
2. Probabilistic Algorithms and Stochastic Convergence
2.1. Stochastic Convergence
2.2. Random Problem Models
2.3. History of Confusion
2.4. Strong Success in the Independent Model
2.5. Example: The Traveling Salesman Problem
2.6. Conclusions
3. Randomization and Sampling
3.1. Classification of Algorithms
3.2. Classification of Statistical Procedures
3.3. Design Principles for Randomized Algorithms
3.3.1. Randomization
3.3.2. Approximation in Rank
3.3.3. Estimation and Subsampling
3.3.4. Empirical Distribution Functions
3.4. Examples
3.4.1. Sorting and Searching
3.4.2. Selection
3.4.3. Discrete Optimization
3.4.4. Geometrical Problems
3.5. Conclusions
4. Order Statistics
4.1. Expected Values and Asymptotic Distributions
4.2. Examples
4.2.1. Greedy Algorithms
4.2.2. Parallelism
4.2.3. Problem Decomposition for Multiprocessors
4.3. Conclusions
5. References

Acknowledgements

The suggestion that there might be something more to statistics than "mere computation" was made by my thesis advisor, Mike Shamos, about three years ago. He proceeded to explore how the algorithm design tools of computer science could be applied to geometrical computations. Meanwhile, I examined the opposite approach: How could statistical tools help in the design and analysis of algorithms for computer science?
I am happy to acknowledge that Mike is responsible for my interest in statistics as well as algorithms. It is enough to ask that one's thesis advisor help with technical matters, but for him to translate articles from the Russian originals is more than should be expected. Nevertheless, that is exactly what Mike did. He is also the source of most of the problem ideas and started me thinking about many of the solutions in this thesis, and I am proud to consider him my friend.

Without the patience and assistance of the other members of my thesis committee, however, this work would still be a proposal. Jon Bentley provided many insightful suggestions regarding the algorithmic aspects, and more than he admits regarding the statistical ones. Bill Eddy put up with my constant questions about probability theory and statistics, answering most of them immediately and spending considerable effort leading me in search of answers to the others. Bill offered many particularly good ideas about the material in Chapter 2, and both he and Jon were always available for consultation about technical and non-technical problems alike. Jay Kadane and H.T. Kung also made many important suggestions, without which I would still be trying to prove some of the theorems of Chapters 2 and 4.

Of course, many other people helped with the technical problems and with my writing style. At the risk of overlooking someone, I would especially like to thank Tom Andrews, Gerard Baudet, Kevin Brown, Peter Denning, Diane Detig, Therese Flaherty, Sam Fuller, Paul Hilfinger, David Jefferson, John Lehoczky, Takao Nishizeki, Larry Rafsky, John Robinson, Jim Saxe, Joe Traub, and Jay Wolf. Also, thanks to Lee Cooprider, Reid, and Mark Sapsford for their assistance with CMU's marvelous document production facilities. In keeping with tradition, I suppose that I should assume responsibility for any remaining errors in the manuscript, which I hereby do.
However, I am confident that there are not too many left because of the careful perusal of early drafts by several of these people.

Generous financial support during my four years at CMU was provided by the National Science Foundation and by IBM in the form of graduate fellowships, and by my parents in forms too numerous to list here.

Finally, I would like to thank my family, especially my parents, Harley and Betty Jo Weide, and several close friends who helped make this experience a pleasant as well as an educational one. Ann Kolwitz, Jay and Ellen Wolf, Diane Detig, Dave and Moddy McKeown, and Jon and Judy Rosenberg saved me from overwork and boredom on several occasions, as did the members of Turing's Machine, the Arpanets, the Jive Turkeys, and last but certainly not least, SIGLUNCH. To all these people I owe a sincere debt of gratitude for their continuing friendship.

1. Introduction and Summary

Until quite recently, research in the design and analysis of discrete algorithms has been devoted to the "worst-case" question: At worst, how bad is this algorithm? The results produced by this effort are of considerable intrinsic interest, and even of occasional practical value, but the label "pessimist" is often attached to computer scientists who pursue these issues. Investigations of the "typical" behavior of algorithms are actually motivated more by pragmatism than by optimism, but they are frustrated by the difficulty of dealing with general probabilistic models.

The major point of this thesis is that certain parts of probability theory and statistics, which are not really difficult to learn, provide valuable tools for the exploration of some practical issues in algorithm design and analysis. Thus, while many of the results reported here are apparently only of a theoretical nature, others are directly applicable to real-world situations.
Although the contributions of this work include these results, they constitute only a minor part of the motivation for it: most are simply demonstrations that results can be produced using the proposed methods. The most important parts of this thesis are the introduction of a homogeneous model for random problems, which should help prevent the kinds of misinterpretations which have appeared in initial efforts to deal with probabilistic models; promotion of the idea that algorithms need not always get the exact answer in order to be viable; and introduction of long-ignored probabilistic and statistical tools to enable design and analysis of algorithms under very general probabilistic assumptions.1

1. The other side of the coin, namely how algorithm analysts can help statisticians, is examined by Shamos [1976].

The introductory chapter begins with a brief review of previous work in the area, although most of the details are left for later chapters. Section 1.2 is a summary of Chapter 2, on stochastic convergence and probabilistic algorithms; Section 1.3 is a summary of Chapter 3, on randomization and sampling; and Section 1.4 is a summary of Chapter 4, on the uses of order statistics. Section 1.5 introduces some basic material which is essential to developing the relationships between computer science and statistics. There is a description of notation and basic probability theory and a unifying concept of "random structures" which will be used throughout the thesis. Finally, in Section 1.6 we mention some open problems and present several points which the reader should keep in mind as he reads the more technical material of later chapters.

1.1. Previous Work

Only within the last few years have serious attempts been made to investigate probabilistic models of algorithm behavior.
One of the most revolutionary ideas, at least to computer scientists and mathematicians, is the notion offered by Karp [1976] and Rabin [1976], among others, that an algorithm need not get the exact answer for every input. While it is, of course, most desirable that an algorithm always get the correct solution, this is not necessarily the most cost-effective approach, since it is widely believed that solving NP-hard problems exactly requires exponential computing time.

There are three options available to help overcome this difficulty. First, it is possible for an algorithm to produce a good approximation all the time, an alternative which has been recognized for quite a while. Karp [1976] credits Graham [1966] with pioneering such algorithms for NP-hard problems. Even producing a guaranteed good approximation, however, is NP-hard for certain problems (see Garey and Johnson [1976]). A second possibility is for an algorithm to get the exact answer most of the time (Rabin [1976]). Finally, an algorithm could produce a good approximation to the correct answer most of the time (Karp [1976]). A short survey article by Gimady, Glebov, and Perepelica [1976] cites similar ideas in the Russian mathematical literature.

Unfortunately, there is now considerable confusion regarding definitions of certain key terms being used to describe probabilistic algorithms, such as Karp's [1976] algorithm for the Euclidean traveling salesman problem and Posa's [1976] algorithm for the Hamiltonian circuit problem. That this confusion exists is undeniable, even though it is not mentioned in the literature. The source of the difficulty seems to be that there are at least three non-equivalent probabilistic models (and associated definitions of the phrase "almost everywhere"), but results obtained under one model are commonly cited in papers using a different one.
Chapter 2 deals with this problem, showing the relationships among the different models and how some proofs could be revised to take advantage of these relationships. Although this may sound like a serious attack on the authors or their results, it is just the opposite. The fact that some subtle problems have been overlooked in such pioneering efforts is not unusual from a historical viewpoint. An opportunity to clear them up at their inception is too good to miss.

Although there have been many expected-time analyses of discrete algorithms, most authors make very restrictive assumptions about the distributions of input parameters. Two notable exceptions are Spira [1973], who assumes no particular distribution of edge weights for the shortest path problem, but merely that they are independent; and Bentley and Shamos [1978], who make only a very weak technical assumption about the distribution of the points to prove good expected behavior of their planar convex hull algorithm. One of the many open questions in this area is to develop a problem model which allows dependence among probabilistic quantities, and then analyze it; this seems feasible since mild dependence is allowed by a variety of statistical theorems.

Hoare [1962] proposed that the analysis of Quicksort could be freed of the assumption of equally likely input permutations simply by choosing the partitioning element at random. More recently, Yuval [1975b], Rabin [1976], and Carter and Wegman [1977] have suggested that this idea be used to design algorithms which perform well under a wide range of probabilistic assumptions. In Chapter 3 we continue this trend, and show how randomization and sampling can affect both the design and analysis of many algorithms.

Chapter 4 includes new probabilistic models for some problems which can be analyzed by the use of order statistics. Borovkov [1962], Weide [1976], Golden [1977], Baudet [1978], and Robinson [1978] have previously made use of such methods for computer science problems.
Some of the general theorems in Chapter 4 have as corollaries a number of special results regarding the asymptotic behavior of the solutions to optimization problems on graphs. The previous cases have been analyzed on an ad hoc basis, depending on the distribution of edge weights.

1.2. Summary of Chapter 2: Stochastic Convergence and Probabilistic Algorithms

Chapter 2 is a discussion of the analysis of "probabilistic approximation algorithms" using the concepts of stochastic convergence defined in Section 1.5.2. Such an algorithm usually, but not necessarily always, produces the exact solution to a problem or at least a good approximate answer. We would like to be able to characterize a particular probabilistic approximation algorithm by a statement of the form, "The algorithm produces an answer having relative error at most ε with probability at least p." A good probabilistic approximation algorithm would have a small value of ε and p near one.

Unfortunately, most problems for which probabilistic approximation seems appropriate (such as the NP-hard problems) cannot be solved even in this weak sense by simple algorithms. As a result, the probabilistic analysis of such an algorithm is typically complicated by the fact that certain steps of the algorithm are not independent, and by the fact that the answer produced by the algorithm is not independent of the true answer. The latter correlation is, of course, desirable (otherwise we would be hard pressed to justify the procedure as an algorithm for solving the problem), but it contributes to making probabilistic analysis extremely difficult in general.

As an alternative, we follow Karp [1976] in proposing that the behavior of a probabilistic approximation algorithm be characterized by the stochastic convergence (to zero) of the sequence of errors in the answers it produces to a sequence of random problems.
We show how to deal with dependence among the relevant random variables, and introduce the notions of "strong" and "weak" success of algorithms to describe those which have error sequences which converge to zero almost surely and in probability, respectively.

It is obvious that the probabilistic model of the problem instances can be an important factor in determining how "strongly" an algorithm succeeds in this sense. If edge weights in a graph are chosen from a uniform distribution, for example, the algorithm might succeed strongly, whereas if they are chosen from a normal distribution the algorithm might not work at all. This possibility is apparently recognized by everyone. It turns out that a much more subtle problem with probabilistic models has gone unnoticed in the literature of the field, and we propose a scheme to correct this deficiency.

The solution involves the distinction between what we call the "incremental problem model", in which the n-th problem of the sequence differs only incrementally from the previous one, and the "independent problem model", in which the n-th problem of the sequence is totally independent of the previous one. Theorem 2.8 is our main result relating the problem models and modes of stochastic success, and demonstrates that strong success in the independent model is a strictly stronger criterion than strong success in the incremental model, which is strictly stronger than weak success in either model.

Unfortunately, of the many probabilistic analyses of algorithms which demonstrate strong success, most apply only in the incremental model. Section 2.3 gives a review of many of the papers in this area in an attempt to illustrate the confusion which can arise if the distinction between problem models is not made explicitly. We therefore propose that this difference be recognized, and argue for adoption of the independent problem model as the canonical basis for proving stochastic success of probabilistic approximation algorithms.
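A standard textbook construction (ours, not the thesis's) shows that the two modes of success genuinely differ. Take independent error indicators with Pr[error at step n] = 1/n: each late error is individually rare, so the errors converge to zero in probability (weak success); yet since the sum of 1/n diverges, the second Borel-Cantelli lemma says errors recur forever along almost every sample path, so there is no almost-sure convergence (no strong success). A small simulation makes the recurrence visible:

```python
import random

def error_indicator(n):
    """Independent error events with Pr[error at step n] = 1/n."""
    return random.random() < 1.0 / n

random.seed(0)

# Weak success: Pr[error at step n] = 1/n -> 0 as n grows.
# Strong success fails: the expected number of errors in each decade
# [10^k, 10^(k+1)) is sum(1/n) over the decade, about ln(10) = 2.3,
# the SAME for every decade, so errors never die out on a sample path.
errors_per_decade = [
    sum(error_indicator(n) for n in range(10**k, 10**(k + 1)))
    for k in range(1, 6)
]
```

Independence of the steps is what makes Borel-Cantelli bite here; in the incremental problem model the successive problems are highly dependent, and this argument no longer applies, which is roughly the gap that the hierarchy of Theorem 2.8 formalizes.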
Finally, in Section 2.5 we give a detailed analysis of Karp's [1976] algorithm for the Euclidean traveling salesman problem. Theorem 2.9 suggests that the algorithm succeeds strongly in the independent model, although that conclusion has yet to be proved. The long and rather detailed proof of Theorem 2.9 is included to demonstrate the use of several techniques which seem to be of universal utility in dealing with such problems.

Chapter 2 is by far the most difficult reading in this thesis, and its importance to subsequent results rests primarily with the definitions of strong and weak success and the identification of the two problem models. The reader who is familiar with these concepts and understands the hierarchy described in Theorem 2.8 should have no difficulty interpreting later results which refer to Chapter 2.

1.3. Summary of Chapter 3: Randomization and Sampling

Chapter 3 contains many practical techniques and results. We begin with a classification of algorithms into the "probabilistic approximation algorithms" of Chapter 2, "randomized algorithms", and all others. A randomized algorithm is non-deterministic in the sense that it may not perform exactly the same computation if given the same inputs. The non-determinism is the result of a randomization or sampling step in the algorithm which is designed to give the algorithm good expected behavior over a wide range of input distributions. A natural correspondence between these algorithm classes and parametric and non-parametric statistics is introduced in order to suggest which statistical techniques may be most useful in designing certain types of algorithms.

Section 3.3 is part of the justification for the title of the thesis. We introduce four general techniques for designing randomized and probabilistic approximation algorithms using ideas from statistics.
These include "randomization", which includes as a special case the process of shuffling the input prior to running an algorithm in an attempt to achieve good expected running time regardless of the input permutation; "approximation in rank", which is useful in problems on totally ordered sets and for discrete optimization problems; "estimation and subsampling" for designing probabilistic approximation algorithms; and the use of "empirical distribution functions" to extend the domain of good behavior of certain algorithms from uniformly distributed inputs to all distributions satisfying certain technical conditions.

In Section 3.4 we present at least two examples of the use of each technique. An algorithm for sorting real numbers from a wide class of distributions in linear expected time is given in Section 3.4.1. Empirical results show that the algorithm, Binsort, is a practical alternative to Quicksort when more than a few hundred items are to be sorted. We also include new on-line algorithms for selection problems which use very little space, but are therefore necessarily only approximate. Again, empirical results indicate that the approximations are better in practice than can be proved in theory. The techniques are also shown to be effective in designing and analyzing algorithms for discrete optimization problems and for geometrical problems. They lead to new algorithms for some closest point problems which run in linear expected time for a large class of point distributions. Other geometrical problems, such as finding the convex hull of a set of points in the plane, can be solved in linear expected time by using Binsort to do the sorting. The expected running time of any algorithm for which the worst case is dominated by a sorting step can often be improved by this method.
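The bucket idea behind Binsort can be sketched as follows (a minimal hypothetical version, assuming inputs roughly uniform on a known interval; the thesis's actual procedure may differ). With n buckets, each bucket holds O(1) items in expectation, so the total expected time is linear:

```python
import random

def binsort(xs, lo=0.0, hi=1.0):
    """Bucket sort: linear expected time for (near-)uniform inputs on [lo, hi)."""
    n = len(xs)
    if n == 0:
        return []
    buckets = [[] for _ in range(n)]
    width = (hi - lo) / n
    for x in xs:
        i = min(int((x - lo) / width), n - 1)  # clamp x == hi into the last bucket
        buckets[i].append(x)
    out = []
    for b in buckets:
        b.sort()  # each bucket is tiny in expectation
        out.extend(b)
    return out

rng = random.Random(2)
data = [rng.random() for _ in range(1000)]
assert binsort(data) == sorted(data)
```

The worst case is still quadratic if all items land in one bucket, which is why the expected-time claim depends on the input distribution.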
1.4. Summary of Chapter 4: Order Statistics

Some intriguing results from the field of order statistics are used in Chapter 4 to analyze the behavior of solutions to graph optimization problems and to compare these with the behavior of greedy algorithms for such problems. The results of Section 4.2.1 include a host of previous results in the literature as special cases. In particular, Theorems 4.8 and 4.9 relate the value of the optimal solution to a problem P defined by an objective function on the edge weights to the existence in a random graph of a subgraph satisfying the structural constraints of the problem. For instance, they relate the length of the optimal traveling salesman tour in a randomly weighted complete graph to the existence of a Hamiltonian circuit in a random graph. There is also a discussion of the rather surprising fact that a randomized algorithm (as defined in Chapter 3) can, in theory at least, possibly be improved simply by starting up several instantiations of the same problem simultaneously. We give a condition on the distribution of running times of a randomized algorithm which, if satisfied, assures that the algorithm does not have optimal expected running time.

Finally, a few easy results from order statistics are used in the analysis of a schema for problem decomposition for asynchronous multiprocessors. The results are extended from the case of an ideal multiprocessor to one in which there is overhead associated with the scheduling and dispatching of tasks to the processors.

1.5. Background Material

This section contains a summary of notation, definitions, and elementary probability theory which will be used throughout the remaining chapters. It is intended only to provide a basis for the models and terminology which we will propose and use later. Most of the new concepts are defined when they arise naturally in later chapters, so only common terms and ideas are reviewed here.
While much of the discussion may seem unnecessarily formal, subsequent issues will be much easier to identify with this foundation.

1.5.1. Notation

When dealing with asymptotic behavior of functions, our notation will essentially follow that used by Knuth [1976]. Specific notations are available to describe the relationships between functions f(n) and g(n), all of which are based on the behavior of the ratio f(n)/g(n) for all sufficiently large values of n. We say that

  f(n) = o(g(n)) iff f(n)/g(n) → 0.
  f(n) = O(g(n)) iff f(n)/g(n) ≤ c for some constant c.
  f(n) = Ω(g(n)) iff f(n)/g(n) ≥ c for some constant c > 0.
  f(n) = Θ(g(n)) iff f(n) = O(g(n)) and f(n) = Ω(g(n)).

Another possibility, f(n)/g(n) → ∞, is not specifically accounted for by this notation, but it turns out that it would be especially useful to have some way of indicating this behavior. By symmetry, the correct choice would seem to be f(n) = ω(g(n)). Rather than adopt this non-standard notation, we will simply say that f(n) grows faster than g(n) if this condition is satisfied.

Other notation is essentially standard. For example, x → A+ means that x approaches A from above, and F(x−) is the limit of F(z) as z approaches x from below. Most of the other terminology commonly used in analysis of discrete algorithms is also used here; for example, log means logarithm to the base 2. See Weide [1977] for similar conventions and descriptions of most of the problems which will be examined here.

1.5.2. Basic Probability Theory

Perhaps the major problem which plagues attempts to use probability and statistics in diverse applications areas is the limited degree to which these ideas are typically developed. One of the goals of this thesis is to define clearly the problem models being analyzed and to attempt to put previous work on a sound footing.

The basis of the probability theory we will need is the probability space (Ω,B,P). The set Ω is called the sample space, and consists of elements ω ∈ Ω called sample points.
B is a σ-field, or σ-algebra, or Borel field, of Ω, which means that it is a class of subsets of Ω which is closed under complementation and countable union (and, as a result of these two, also under countable intersection). Finally, P is a probability measure; that is, P satisfies the following axioms:

  (1) P{Ω} = 1.
  (2) P{E} ≥ 0 for every E ∈ B.
  (3) P{∪_i E_i} = Σ_i P{E_i} whenever E_i ∩ E_j = ∅ (the null set) for every i ≠ j.

Given a probability space (Ω_1,B_1,P_1), we define the infinite product space (Ω,B,P) in which Ω = Ω_1 × Ω_1 × ⋯, and where B is the usual σ-field and P the usual product measure there (see Halmos [1950] for more details). A sample point ω̄ ∈ Ω is an infinite sequence (ω_1, ω_2, …) where ω_n ∈ Ω_1. Such a space turns out to be very useful in several respects, and every space (Ω,B,P) will hereafter be assumed to be this infinite product space. This is very important, since some lemmas and theorems will not make sense in a general probability space.

Random variables are measurable real-valued functions on the sample space Ω. A random variable X has a distribution function F(x) = P{ ω̄: X(ω̄) ≤ x }.2 A distribution function is right-continuous (i.e., F(x+) = F(x) for all values of x), but may not be left-continuous at a countable number of points where it has jump discontinuities. Since F(x) is non-decreasing, it has an inverse F−1(y) = inf{ x: F(x) ≥ y } (see Chung [1974]).

The events E_1, E_2, …, E_n are totally independent iff P{∩_i E_i} = Π_i P{E_i} for every subset of the indices. A sequence of random variables {X_n} will be said to be independent if no two of the functions {X_n} depend on common components of a sample point ω̄. This is a non-standard definition of independence of random variables, but clearly, whenever {E_n} are events involving, respectively, the independent random variables {X_n}, the events {E_n} are totally independent. The random variables {X_n} are identically distributed if the distribution of X_n does not depend on n.
If {X_n} have the same distribution as another random variable X, we write X_n ~ X. Random variables which satisfy both these conditions are independent and identically distributed (iid).

2. Even though P is a function, it is customary to omit the parentheses around its argument, which is an event or set and is usually delimited by braces.

The expected value of a function of a random variable X, say g(X), is denoted E(g(X)). It is defined to be ∫ g(x) dF(x) whenever ∫ |g(x)| dF(x) exists, where F is the distribution function of X. The mean, or expected value, of X is simply E(X). The variance of X, denoted D(X), is just E((X − E(X))²) = E(X²) − E(X)².

The normal distribution F(x) = (2πσ²)^(−1/2) ∫_{−∞}^{x} exp(−(t − μ)²/(2σ²)) dt is given the special name N(μ,σ²). A random variable X having this distribution has mean μ and variance σ². By a slight abuse of notation, this fact will be denoted X ~ N(μ,σ²).

Of primary importance in later chapters will be stochastic convergence of a sequence of random variables. The sequence of random variables {X_n} converges almost surely, or almost everywhere, or with probability one, to X (written X_n →as X) whenever P{ ω̄: lim X_n(ω̄) = X(ω̄) } = 1. Similarly, the sequence {X_n} converges in probability, or in measure, to X (written X_n →pr X) whenever, for all ε > 0, lim P{ ω̄: |X_n(ω̄) − X(ω̄)| < ε } = 1. Finally, {X_n} converges in distribution to X (written X_n →d X) whenever F_n(x) = P{ ω̄: X_n(ω̄) ≤ x } converges to F(x) = P{ ω̄: X(ω̄) ≤ x } at all continuity points of F.3

3. Convergence in distribution is actually a property of the sequence {F_n}, so that the random variables {X_n} need not be defined on the same space. This technical point is of no real concern to us, since all random variables used here will be defined on the same space U described below.

To illustrate these concepts with an example of an infinite product space, we cite perhaps the most useful instance of all.
The sample space Ω_1 is the set of reals 0 ≤ ω_1 < 1, B_1 is the usual Borel field on [0,1), and P_1 is the usual Lebesgue measure. In the infinite product space, ω̄ is a sequence of real numbers, each of which is between 0 and 1. Furthermore, P{ ω_n ≤ x } = x for 0 ≤ x < 1. In more common terms, each ω_n is uniformly distributed between 0 and 1, and a sequence ω̄ consists of independently and uniformly distributed components. This space is so special that we will call it by its own name, U.

By means of the so-called probability-integral transformation, it is possible to define a random variable X having any given distribution function F (not necessarily continuous) by using the probability space U and the functional inverse of F. Briefly, because F is increasing, F(x) = P{ X ≤ x } = P{ F(X) ≤ F(x) }. Letting X = F−1(ω_n) = inf{ x: F(x) ≥ ω_n } for some n gives P{ ω_n ≤ F(x) } = F(x), which is satisfied exactly when ω_n is uniformly distributed between 0 and 1, as it is in the space U. This principle is used by simulation systems to generate random numbers from an exponential distribution, for example, by computing a function of random numbers from the uniform distribution.

One advantage of the infinite product space is the elegant manner in which these concepts fit together. There is no need to talk about balls in urns, or other combinatorial or procedural structures, in order to define or understand such diverse topics as independence and stochastic convergence. Even better, such special models can be defined within the probability-space model in a natural way, a fact which will enable us to see clearly the sources of difficulty in a number of misinterpretations which have recently appeared in the literature.

1.5.3. Random Structures

Since random variables are functions on Ω, they may be defined in arbitrarily complex ways, including functional composition. Specifically, X(ω̄) may be based on an intermediate random structure: the structure is determined by the argument ω̄, and then the final value of X is determined by the structure.
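As a concrete instance of defining a random variable by composition on the components of ω̄, the probability-integral transformation above might be coded as follows (a hypothetical sketch; the exponential target and its rate λ = 1 are illustrative assumptions):

```python
import math
import random

def exp_from_uniform(u, lam=1.0):
    """F^{-1} for the Exponential(lam) distribution: if U is uniform on [0,1),
    then -ln(1 - U)/lam has distribution F(x) = 1 - exp(-lam*x)."""
    return -math.log(1.0 - u) / lam

# Deterministic check: the median of Exponential(1) is ln 2.
assert abs(exp_from_uniform(0.5) - math.log(2.0)) < 1e-12

# Monte Carlo check: Exponential(1) has mean 1.
rng = random.Random(42)
samples = [exp_from_uniform(rng.random()) for _ in range(200000)]
assert abs(sum(samples) / len(samples) - 1.0) < 0.02
```

Here each call consumes one fresh uniform component ω_n, exactly as in the "discarding" construction of sequences of random variables discussed below.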
In this section, we will examine some random structures based on the space U which arise in computer science problems.

The simplest such structure is a natural extension of the idea of a random variable. An ordered list of random variables is a random vector, or random point.4 Suppose that we were interested in the distances from the origin of points uniformly distributed in the unit cube. Our final random variable X might be defined as X(ω̄) = (ω_1² + ω_2² + ω_3²)^(1/2). For this problem, the random vector is "hidden" by the fact that it is supposed to be uniformly distributed in the cube. More generally, if we were interested in points from a given multivariate distribution, then we would use X(ω̄) = (x_1² + x_2² + x_3²)^(1/2), where (x_1, x_2, x_3) is a random vector determined by transforming some components of ω̄ to produce the desired distribution of points.

4. This should not be confused with a sample point ω̄.

Random vectors are useful in probabilistic models of problems from geometry, mathematical programming, polynomial arithmetic, etc. Another structure which frequently appears in computer science problems is the random permutation. We typically would like a model under which each of the possible permutations of n objects is equally likely. A random variable of possible interest is the number of comparisons required to sort n elements of a linearly ordered set, whose expected value we wish to compute under this probability model. It is usually easy to think in procedural terms when generating random structures (see Sedgewick [1977] for more about permutations). In this case, the permutation π defined by ω̄ can simply be that ordering of the integers {1, 2, …, n} for which ω_{π(1)} < ω_{π(2)} < ⋯ < ω_{π(n)}.

The most complex of the random structures which we will encounter are random graphs (see Erdös and Renyi [1959] and Erdös and Spencer [1974]).
A classical problem from random graph theory is the question of connectivity: What is the probability that a random labelled graph with n vertices and m edges is connected? We can define a random variable X of the 0-1 type (0 if the graph is not connected, 1 if it is connected), and find its expected value, assuming that every labelled graph with n vertices and m edges is equally likely.5 Again, a procedural description of X is easy. Consider the first C(n,2) components of ω̄ to be numbered ω_{12}, ω_{13}, …, ω_{1n}, ω_{23}, ω_{24}, …, and so forth, through ω_{n−1,n}. If y is the mth-smallest of these components, then we simply let edge (i,j) be present in the graph iff ω_{ij} ≤ y. Now X is 0 if the resulting graph is not connected and 1 if it is.

5. Other definitions of random graphs are possible. See Chapter 4.

Slight extensions of ordinary random graphs are random directed graphs and random weighted graphs. The latter are especially useful in modeling certain mathematical programming problems, since the weights may be assigned to vertices, or edges, or both, and may have arbitrary distributions.

Up to this point, we have not made use of the fact that ω̄ is infinite-dimensional, since each random structure depends on only a finite number of components of ω̄. However, it is easy to use ω̄ to define sequences of random structures, and thereby sequences of random variables for which we can explore the properties of stochastic convergence. There are at least two different ways of defining such sequences, one of which "re-uses" the components of ω̄ to determine each structure, and the other of which "discards" the used components.6 The structures, and hence the random variables, described by the former method are not independent, whereas those defined by the latter method are independent. Random variables based on these two different sequences of random structures can exhibit different modes of stochastic convergence.

6. This difference, as far as I can determine, has been overlooked by virtually everyone using sequences of random structures, but is very important. Chapter 2 explains the ramifications of the distinction.
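The procedural description of the connectivity variable X can be sketched directly (hypothetical Python; the union-find connectivity test is an implementation convenience, not part of the thesis). One uniform component is drawn per potential edge, and the m edges with the smallest components are kept, which makes every labelled graph with n vertices and m edges equally likely:

```python
import random
from itertools import combinations

def connectivity_indicator(n, m, rng):
    """X = 1 if the random (n, m) labelled graph is connected, else 0."""
    weighted = [(rng.random(), e) for e in combinations(range(n), 2)]
    edges = [e for _, e in sorted(weighted)[:m]]  # keep the m smallest components
    parent = list(range(n))
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v
    components = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return 1 if components == 1 else 0

rng = random.Random(7)
assert connectivity_indicator(3, 3, rng) == 1   # the triangle is always connected
assert connectivity_indicator(4, 0, rng) == 0   # no edges: disconnected
# E(X) = P{connected} can then be estimated by averaging many trials:
est = sum(connectivity_indicator(8, 12, rng) for _ in range(500)) / 500
assert 0.0 <= est <= 1.0
```

Note that each call here draws fresh components, so repeated calls give the independent sequence of structures; re-using one fixed stream of components across calls would give the dependent sequence.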
1.6. Conclusions and Further Work

Several statistical techniques are shown to be useful in the design and analysis of algorithms for computer science problems. The techniques illustrated here, however, are only the simplest used by statisticians. In the case of sampling, for instance, much more sophisticated schemes than the random sampling used in our algorithms can be devised. It remains to be seen which advanced statistical tools can be profitably applied to algorithm design and analysis.

In addition to this general observation, there are several other questions of varying degrees of importance which could be explored in an extension of this work or which remain as open problems. The following is a partial list which the reader may keep in mind as he continues through Chapters 2, 3, and 4.

(1) There are several problems for which it is known that finding an approximate solution with bounded relative error is NP-complete. Is there any problem for which it can be demonstrated that finding an approximate solution with bounded relative error almost surely is NP-complete, in some reasonable probabilistic model?

(2) Prove the missing companion to Theorem 2.9 and show that the result of Beardwood, et al. [1959] holds almost surely in the independent model. Such a proof would, according to Theorem 2.9(b), assure that Karp's [1976] original algorithm for the TSP succeeds strongly in the independent problem model.

(3) Extend the techniques of Section 3.3.4 from distributions of bounded random variables to a more general class of distributions.
This looks somewhat easier than it probably is, because the present form of the algorithm relies on the fact that the expected number of items in any bin is bounded by a constant, and this might not be true if the minimum and maximum elements could wander off arbitrarily far.

(4) Develop a technique for proving lower bounds in a probabilistic setting. (The work of Yao [1977] seems important here.) As a starting point, prove that an on-line selection algorithm displaying the behavior described in Theorem 3.14 must use at least as much space as the procedure "median_es".

(5) Show how to prove non-trivial lower bounds in a model of computation which includes the floor function.

(6) Perform some experiments to evaluate the technique proposed in Section 2.4.3 for λ-opt heuristics.

(7) Extend the results for geometry problems in the plane to higher dimensions. Some of them generalize very naturally, but others do not. In particular, the algorithm for constructing the Voronoi diagram does not seem to extend naturally to three or more dimensions and continue to run in linear expected time.

(8) Using Lemma 4.4 as a starting point, derive the three possible forms of limit distributions for extreme values of random variables. The classical approach to this problem (see David [1970] for a description) uses a very elegant argument to show what limiting distributions are possible, but does not make use of the result of Lemma 4.4. Because of its simplicity, this lemma seems like an appropriate candidate for the seed of an alternative proof.

(9) Find a companion to Theorem 4.8 which gives an almost sure lower bound on the value of the optimal solution.

(10) Exhibit an uncontrived algorithm for a real problem which can be improved by using the method of Section 4.2.2. In the event that one is found, suggest an extension to an existing programming language which permits the programmer to have control over the scheduling of parallel tasks.
(11) Test the conclusions of Section 4.2.3 for a real problem on a real multiprocessor. Preliminary measurements of an integer programming code running on C.mmp inspired the investigation of this problem in the first place, and tend to confirm the conclusions. However, each problem required such an enormous amount of computer time that no statistically significant results were ever obtained.

(12) Find the variance of the solution time for the randomized algorithms presented here or, even better, the distribution of solution times. The results of Sections 4.2.2 and 4.2.3 argue that knowing the distribution might be useful in ways which are not immediately evident.

(13) Give more examples of the uses of any of the statistical techniques suggested in this thesis.

(14) Suggest other algorithm design and analysis techniques based on statistical concepts.

A sequence of random variables {X_n} converges almost surely1 to the random variable X (X_n →as X) iff P{ ω̄: lim X_n(ω̄) = X(ω̄) } = 1. Similarly, the sequence {X_n} converges in probability2 to X iff, for every ε > 0, lim P{ ω̄: |X_n(ω̄) − X(ω̄)| < ε } = 1.

From both an intuitive and a practical standpoint, it is profitable to view stochastic convergence in terms of individual sample points ω̄. The sequence of random variables {X_n} converges almost surely to X if the set of sample points ω̄ for which the sequence of real numbers {X_n(ω̄)} converges to X(ω̄) has probability one. It converges in probability if, for every ε > 0, X_n(ω̄) is within ε of X(ω̄) for sets of sample points whose probabilities approach one as n → ∞. In the latter case, it is possible that the sequence of reals {X_n(ω̄)} does not converge to X(ω̄) for any sample point ω̄.

Consider a case where the X_n are 0-1 random variables, and X is identically 0. If X_n →as X then for each sample point ω̄ in a set of probability one, the sequence {X_n(ω̄)} can have only finitely many "ones".
For X_n →pr X, however, "ones" may continue to appear occasionally (i.e., infinitely often) in every sequence {X_n(ω̄)}, although for most such sequences they cannot appear too frequently.

1. The analogous concept in real function theory uses the phrase "almost everywhere". Since we are dealing with probability, though, the terms "almost surely" and "with probability one" are preferable.

2. Again, this terminology is preferred to the phrase "in measure" for our purposes.

An example of a sequence which converges in probability but not almost surely is provided by X_n(ω̄) = 1 if ω_n < 1/n and X_n = 0 otherwise. It is clear that "ones" can appear infinitely often in this sequence, but it is not obvious that this actually happens for a set of sample points of positive measure. However, Lemma 2.2 shows that convergence of this sequence is not almost sure, so that the probability that the sequence does not converge is strictly positive. On the other hand, the probability of a "one" appearing in position n tends to 0 as n → ∞, so the sequence does converge in probability.

It is clear from these intuitive pictures that almost sure convergence implies convergence in probability, and this can be proved rigorously.

LEMMA 2.1 - If X_n →as X then X_n →pr X.

Proof - See Chung [1974], page 66. □

It is easier in practice to prove convergence in probability, directly from its definition, than it is to prove almost sure convergence. Fortunately, there is an alternate characterization of almost sure convergence, provided by

LEMMA 2.2 - (Borel-Cantelli Lemma) - If Σ_n P{ |X_n − X| > ε } is finite for every ε > 0, then X_n →as X. If X_n →as X and the {X_n} are independent, then Σ_n P{ |X_n − X| > ε } is finite for every ε > 0.

Proof - See Chung [1974], pages 76-78. □

From the definition of convergence in probability and the Borel-Cantelli Lemma, we can prove

LEMMA 2.3 - Let {X_n} be independent.
(a) X_n →pr X iff, for every ε > 0, P{ |X_n − X| > ε } = o(1).

(b) Let log^(i)n be the ith iterated logarithm of n (that is, let log^(0)n = n and log^(i)n = log log^(i−1)n), and let f_k(n) = (Π_{i=0}^{k} log^(i)n)^{−1}. If X_n →as X, then P{ |X_n − X| > ε } = o(f_k(n)) for every ε > 0 and every k ≥ 0. If, for every ε > 0, P{ |X_n − X| > ε } = O(f_k(n)·(log^(k)n)^{−δ}) for any k ≥ 0 and any δ > 0, then X_n →as X.

Proof - Part (a) is just a restatement of the definition of convergence in probability. Part (b) follows from the Borel-Cantelli Lemma, as follows. Note first that Σ_n f_k(n) diverges for all k ≥ 0, since Σ_{i=1}^{n} f_k(i) = Θ(log^(k+1)n), which can be proved by comparing the sum to the integral ∫ f_k(x) dx. Thus, if Σ_n P{ |X_n − X| > ε } is to be finite, P{ |X_n − X| > ε } must approach zero faster than f_k(n) for every k ≥ 0. On the other hand, Σ_n f_k(n)·(log^(k)n)^{−δ} is finite for every k ≥ 0 and every δ > 0, which can also be proved by considering the corresponding integral (see Hardy [1924]). This demonstrates statement (b) of the lemma. □

Lemmas 2.1 and 2.3 clearly illustrate that X_n →as X is a (possibly strictly) stronger statement than X_n →pr X. Even so, it is not strong enough that we can always draw even the seemingly innocent conclusion that E(X_n) → E(X).

LEMMA 2.4 - If {X_n} are uniformly bounded by a constant, then X_n →pr X implies E(|X_n − X|^p) → 0 for every p > 0. In general, however, even X_n →as X does not imply that E(|X_n − X|^p) → 0.

Proof - The first part of the lemma follows directly from Theorem 4.1.4 of Chung [1974]. An example of a case where {X_n} are not uniformly bounded can be constructed using the space U. Let X_n(ω̄) = 0 if ω_n > 1/n², and X_n(ω̄) = 2^n if ω_n ≤ 1/n². Applying the Borel-Cantelli Lemma, we see that X_n →as 0, but E(|X_n|^p) = 2^{np}/n² → ∞ for all p > 0. □

Another somewhat non-intuitive aspect of stochastic convergence is identified by the following lemma, which uses the difference between identical and identically distributed random variables.

LEMMA 2.5 - Let {Z_n} be independent, and let h(n) be an integer-valued function which is non-decreasing and unbounded.
Define {X_n} as a sequence of random variables with the property that X_n = Z_{h(n)}. Define {Y_n} as a sequence of independent random variables with the property that Y_n ~ Z_{h(n)}.

(a) If Z_n →as Z then X_n →as Z.

(b) If Z_n →as Z then Y_n →pr Z, but it is not necessarily true that Y_n →as Z.

Proof - We may assume without loss of generality that Z = 0 by considering the random variables Z_n − Z, X_n − Z, and Y_n − Z. To prove part (a), note that for each ω̄, {Z_n(ω̄)} and {X_n(ω̄)} have a common subsequence, call it {C_n(ω̄)}. The sequence {X_n(ω̄)} contains only the terms {C_n(ω̄)}, possibly repeated, depending on h of course. In particular, we have X_n(ω̄) = C_1(ω̄) for 1 ≤ n ≤ m_1, where m_1 = min{ n: h(n) > h(1) }; similarly, X_n(ω̄) = C_2(ω̄) for m_1 < n ≤ m_2, where m_2 = min{ n: h(n) > h(m_1) }; and so forth. Now if Z_n(ω̄) → 0, then C_n(ω̄) → 0, since {C_n(ω̄)} is a subsequence of {Z_n(ω̄)}. As we have seen above, C_n(ω̄) → 0 implies that X_n(ω̄) → 0. Therefore, X_n(ω̄) → 0 whenever Z_n(ω̄) → 0, and the latter occurs for every ω̄ in some set of probability one. Thus, X_n →as 0.

Part (b) is somewhat different, since the only relationship between {Y_n} and {Z_n} is that P{ Y_n ≤ x } = P{ Z_{h(n)} ≤ x }. Since {Z_n} are independent, Lemma 2.3 applies, showing that P{ |Z_n| > ε } → 0 for every ε > 0. Hence, P{ |Y_n| > ε } → 0 since h(n) is non-decreasing and unbounded, proving that Y_n →pr 0. As a counterexample to the claim that Y_n →as 0, use U to define Z_n(ω̄) = 0 if ω_n > 2^{−n} and Z_n(ω̄) = 1 if ω_n ≤ 2^{−n}, and let h(n) = ⌊log n⌋ + 1. For all ε > 0 we have P{ |Z_n| > ε } ≤ 2^{−n}, so by the Borel-Cantelli Lemma, Z_n →as 0. On the other hand, for 0 < ε < 1, P{ |Y_n| > ε } ≥ 1/(2n). Again, by the Borel-Cantelli Lemma, {Y_n} does not converge almost surely. □

In the computer science literature, convergence concepts have been tied to algorithm behavior in the sense of probabilistic approximation. Typically, a sample point ω̄ describes a sequence of random problems (based on random structures such as those introduced earlier), one of each size n for n ≥ 1.
The random variable X_n is the error produced by the algorithm on a problem of size n; in particular, X_n(ω̄) is the error when the algorithm is applied to the problem of size n specified by ω̄. Under these circumstances, we will say that the algorithm succeeds strongly if X_n →as 0, and that it succeeds weakly if X_n →pr 0. The terms "strong" and "weak" are used by analogy to the strong and weak laws of large numbers.

The following two lemmas are helpful in proving stochastic success. The first is used when X_n is the absolute error, and the second when it is the relative error.

LEMMA 2.6 - (Absolute error lemma) -

(a) If Y_n →pr a and Z_n →pr b for constants a and b, then Y_n + Z_n →pr a + b.

(b) If Y_n →as a and Z_n →as b, then Y_n + Z_n →as a + b.

Proof - This is a standard result, a stronger version of which is proved in Chung [1974], but the proof is instructive because it demonstrates a common technique for proving such conclusions. For part (a), we must show that, for every fixed ε > 0, P{ |Y_n + Z_n − a − b| > ε } → 0.

  P{ |Y_n + Z_n − a − b| > ε } ≤ P{ |Y_n − a| + |Z_n − b| > ε }
    ≤ P{ |Y_n − a| > ε/2 or |Z_n − b| > ε/2 }
    ≤ P{ |Y_n − a| > ε/2 } + P{ |Z_n − b| > ε/2 }

Since Y_n →pr a and Z_n →pr b, both terms on the right-hand side of the last inequality tend to zero as n → ∞, which proves part (a). Part (b) is proved in exactly the same way, with the only difference being the insertion of summation signs before each of the probabilities. Each of the sums converges because Y_n →as a and Z_n →as b. □
6/2} 5 PLIY,-O/%I> 2} + PUI -a/¥> <2) ‘To show that of the probabilities on the right-hand side of the last inequality both tend to zero, we ill concentrate on the first one. The proot for the other is similar. Let 0<8 8) + PLKY,- 0 /Y,1> 2 | N,-el58) 29 PUR, - ets 8) S PLN, e1>3) © PCWY-eD/YI> 72 | My el s8} Ss PLIY,-c1>8) + PLIV,~ > ec-8/2 } The last step follows because Y, is positive and is, in fact, no smaller than ¢ -& Since both of these probabilities approach zero, the lemma is proved. 0 ‘As an example of an algorithm which succeeds in each of these modes using absolute error, consider the problem of finding the arithmetic average of independent observations from (0,1), 2 normal distribution with mean 0 and variance 1. Our probabilistic approximation algorithm will simply be to choose the first Fn) observations from the original n and average them. Assume that r(n) = ofa). Using the space 41, we may think of the components of a sample point @ as being numb in a triangular array, with n elements in row n Row n then determines a sample? of size n through a suitable transformation of the components Wy Let X,(@) be the yronce between the average of all n observations determined by row n, ¥,(@), and the average of the first r(n) observations, Z,(0}. It follows. immediately trom known properties ofthe normal distribution that ¥, = S2(0 ,n°#) and Z~ (0, etny4). 3. A sample is a group of observations; an observation is a specific value of « fandom variable. In this case, a problem of size n Is # sample consisting of observations which are to be averaged. 2-10 A minor complicstion arises in determining the distribution of X, = Y, = 2, since Y, and Z, are not independent. We may not conclude, therefore, that X, ~ R(0, n“! + (a), which would be the case if %, and Z, were Independent. This problem will come up again in many of the algorithms we would like to investigate. 
Fortunately, in this instance, X_n can be written as a linear combination of Z_n and the average of the remaining n − r(n) observations, which are independent. This leads to the conclusion that X_n ~ N(0, (1 − r(n)/n)²·[1/r(n) + 1/(n − r(n))]). Thus P{ |X_n| > ε } = O(r(n)^{−1/2}·exp(−ε²r(n)/2)), so X_n →pr 0 whenever r(n) grows without bound. In this case, the algorithm succeeds weakly. If r(n) grows faster than log n then, by Lemma 2.3, X_n →as 0 and the algorithm succeeds strongly. It is in fact the case that these growth conditions are necessary for weak and strong success, respectively. The reason that r(n) must be unbounded for weak success is clear. It must grow faster than log n for strong success because Σ_n r(n)^{−1/2}·exp(−ε²r(n)/2) is finite for all ε > 0 iff that condition is met.

We could have proved the sufficiency of the conditions on r(n) more easily using Lemma 2.6, the absolute error lemma. Knowing the distributions of Y_n and Z_n, we can show that the conditions given above are sufficient for Z_n →pr 0 and Z_n →as 0, respectively. Since Y_n →as 0 in any event, we may conclude that X_n →pr 0 if r(n) grows without bound, and that X_n →as 0 if r(n) grows faster than log n. What we gain by not having to worry about the independence of Y_n and Z_n is offset by the fact that we cannot say that the growth condition on r(n) is necessary. Furthermore, we know nothing about the distribution of X_n if we take the easy way out. Nevertheless, it is good to have a technique available for proving stochastic success which can easily deal with dependent random variables.

This example illustrates two prime objections to the characterization of probabilistic approximation algorithms by the "strength" of their success. In the first place, even strong success is only of theoretical value. We could let r(n) = 1 for n ≤ C and r(n) = log²n for n > C, and the above algorithm would succeed strongly for any C, even if C were larger than the size of any problem we might encounter in practice.
For this reason, the approach does not enable us to conclude that an algorithm is provably practical. Secondly, neither version of success provides any means of determining, even asymptotically, how well an algorithm works as a function of n. The only guidelines available are those provided by Lemma 2.3, which may be entirely too pessimistic. We therefore have no basis for comparison of two strongly successful algorithms, or even for one strongly successful and one weakly successful algorithm, unless the weakly successful one can be shown not to succeed strongly. Even then the weakly successful algorithm might be preferred in practice, since both characterizations are only asymptotic.

Notice that these are fundamentally different objections to the probabilistic approach than the usual ones: the probabilistic assumptions may be not only reasonable but actually satisfied, yet a strongly successful algorithm can remain impractical. Similarly, the objections are stronger than the typical argument against asymptotic analysis in general, which is that the conclusions are not valid except for very large values of n. In contrast to the "big-O" notation, for example, there is no way to make any meaningful quantitative statement even about asymptotic behavior. These notions of probabilistic success are therefore too weak to allow us to draw any serious conclusions regarding the practical value of an algorithm, although they do permit comparisons among algorithms, as we will see in Section 2.2.

Nevertheless, due in part to the impetus which such analysis has already received from people in the algorithms area, these basically theoretical descriptions of probabilistic approximation algorithms are here to stay. It is important that a firm and comprehensible basis for their study be established while there are not too many different results which depend on one another in a complex fashion.
The goal of the next section is to point out certain difficulties which can arise, and to introduce a random problem model which can serve as the foundation for further work. In Chapter 3 we will return to the objections to stochastic convergence concepts as descriptions of success, and present some alternatives which partially overcome these arguments.

2.2. Random Problem Models

There are at least two possible reasons for wanting to deal with random problems rather than with fixed ones. The first, examined in some detail by Vajda [1972] for mathematical programming problems, is that the problem data may be uncertain; that is, the data may be random perturbations of the true values of the parameters (due to measurement errors, for instance). Among the interesting questions in this scenario are such things as how to optimize the expected value of a linear program knowing only the joint distribution of the objective function coefficients, and not their actual values.

On the other hand, it may be that we would like to solve many instances of some type of problem, and have no prior knowledge of what these instances might be other than distributions of certain characteristics. Suppose that XYZ Company has thousands of customers throughout the United States from which it receives requests for service at random times, but has only one serviceman who is sent out every time 100 requests have accumulated. If XYZ wants to solve the resulting 100-city traveling salesman problems, company management might be interested in knowing how much extra travel cost would be incurred, on the average, by using some approximation algorithm rather than a (very costly!) exact algorithm.

Whatever the possible practical applications, random problems are considered by algorithm analysts in order to make probabilistic statements similar to those made by statisticians about statistical procedures.
These statements include not only the expected resource consumption of an algorithm and the classical questions of computer science, but more recently have centered on the probability that an algorithm will achieve a specified small error. It is, of course, valuable to keep in mind the possible real-world applications of results, but in large part the models considered for analysis must be quite simple in order to be mathematically tractable. In this section, we examine problem models with respect to their suitability for discussion of stochastic success of algorithms.

A random problem is defined by (1) a random structure, such as a set of random points or a random graph; and (2) some function of the structure, such as the mean distance between the points or the chromatic number of the graph. In some cases, a "solution" to a problem may require not only the value of this function, but other information as well. For example, the solution to a traveling salesman problem demands an actual tour of the points and not just the length of the tour. The function value provides the basis for comparison between the exact solution and an approximation.

For dealing with stochastic success, we require sequences of random problems. A sequence can be generated in many ways from a sample point ω, but we will study only two.

The first possibility, which we call the incremental problem model, operates as follows. If R_n(ω) is the nth problem determined by ω, and R_n is fully specified by k(n) components of a sample point, then R_n(ω) depends on the components ω_1, ω_2, ..., ω_{k(n)}. Specifically, R_n(ω) is generated "incrementally" from R_{n-1}(ω) by some random process depending on ω_{k(n-1)+1}, ..., ω_{k(n)}. If the underlying problem structure is a random vector, R_n(ω) might be an n-vector generated by appending one new coordinate to the (n-1)-vector R_{n-1}(ω).
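This vector instance of incremental generation can be sketched as follows (an illustrative fragment, not from the thesis; the function name and the choice of uniform components are mine). With k(n) = n, each problem reuses every component of its predecessor and appends one newly drawn component:

```python
import random

def incremental_problems(num_problems, seed=3):
    """Generate R_1, ..., R_num in the incremental problem model with k(n) = n:
    R_n is the (n-1)-vector R_{n-1} with one fresh coordinate appended."""
    rng = random.Random(seed)
    omega = []       # components of the single sample point, drawn as needed
    problems = []
    for _ in range(num_problems):
        omega.append(rng.random())     # the one new defining component
        problems.append(tuple(omega))  # R_n shares its first n-1 coordinates with R_{n-1}
    return problems

probs = incremental_problems(5)
# Each problem extends the previous one: probs[n][:n] == probs[n-1] for n >= 1.
```

Because successive problems share almost all of their defining components, quantities computed from them are dependent, which is exactly the feature the incremental model is meant to capture.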
If it is a random graph, the incremental change might be the addition of a new vertex and some edges incident to it, with all the edges of the previous graph unchanged.

A second possibility is the independent problem model. Consider the sequence ω to be numbered as in a triangular array, with k(n) elements in row n, where k(n) is defined as above. In this case, R_n(ω) depends on the components ω_{n1}, ω_{n2}, ..., ω_{n,k(n)}. This means that R_n is totally independent of R_{n-1}, being generated by all new defining components. If the underlying problem structure is a random vector, R_n(ω) is defined by creating all new coordinate values, none of which is necessarily the same as its counterpart in R_{n-1}(ω). Similarly, if it is a random graph, an entirely new graph is created.⁴

4. One could also imagine models with some dependence between successive problems, but not strictly incremental generation. Such models have apparently not been used in the literature and appear to be of no value in this context.

With these two problem models, and two possible modes of stochastic success of an algorithm, there are four cases to consider. The following theorem describes the relationships between them.

THEOREM 2.8 - (a) If an algorithm succeeds strongly in either model, then it succeeds weakly in that model, but not necessarily vice versa. (b) If an algorithm succeeds weakly in one model, then it succeeds weakly in the other model. (c) If an algorithm succeeds strongly in the independent problem model, then it succeeds strongly in the incremental model, but not necessarily vice versa.

Proof - Figure 2.1 shows schematically how the four possibilities are related. Throughout the proof, let X_n(ω) = f(ω_1, ω_2, ..., ω_{k(n)}) be the error of the algorithm in the incremental problem model, and let Y_n(ω) = f(ω_{n1}, ω_{n2}, ..., ω_{n,k(n)}) be the error in the independent problem model. The difference between X_n and Y_n is that they are the same function of different components of ω, so that the random variables {X_n} are not independent, whereas {Y_n} are independent.

[Figure: schematic relating strong and weak success in the independent and incremental models.]
FIGURE 2.1 - Problem models and stochastic convergence.

Part (a) of the theorem is clear, since X_n →_as 0 implies that X_n →_pr 0, and similarly for {Y_n}. It is not the case that X_n →_pr 0 implies that X_n →_as 0, which is illustrated by the following example. Suppose that k(n) = n, and that X_n(ω) = 0 if ω_{k(n)} > 1/n, and X_n(ω) = 1 if ω_{k(n)} ≤ 1/n. Although X_n →_pr 0, the Borel-Cantelli Lemma shows that the convergence is not almost sure. The same example suffices for {Y_n}, proving part (a).⁵

5. The reader may note that it is difficult to imagine a useful algorithm with an error that depends in this way on the problem being solved. Nevertheless, such an error function is possible, which is all that is required.

The proof of part (b) is a direct consequence of the fact that X_n ~ Y_n, which is true because both X_n and Y_n are obtained by applying the same function to arguments which are identically distributed. Thus, although X_n and Y_n are not equal, depending as they do on different components of a sample point, P{ X_n ≤ x } = P{ Y_n ≤ x }. If one of P{ |X_n| > ε } or P{ |Y_n| > ε } tends to zero, then so does the other one, since they are equal. This demonstrates part (b) of the theorem.

By far the most interesting point of the theorem is part (c), which shows the difference between the independent problem model and the incremental problem model. The implication is an easy consequence of the Borel-Cantelli Lemma and the fact that X_n ~ Y_n. Since {Y_n} are independent and Y_n →_as 0, Σ_n P{ |Y_n| > ε } is finite for every ε > 0. Therefore, Σ_n P{ |X_n| > ε } is finite for all ε > 0, which means that X_n →_as 0, even though {X_n} are not independent. This argument fails to prove the implication in the other direction, since the lack of independence of {X_n} prevents us from concluding anything about the convergence of Σ_n P{ |X_n| > ε }. However, a very instructive example shows that the reverse implication is sometimes false.
This will be the key to exposing the underlying problem which has not been noticed so far in the literature on probabilistic approximation algorithms.

Suppose that X_n(ω) = k(n)^{-1} Σ_{i=1}^{k(n)} Z_i, where Z_i ~ N(0, 1) is determined by ω_i, and k(n) is non-decreasing and unbounded. Similarly, let Y_n(ω) = k(n)^{-1} Σ_{i=1}^{k(n)} Z_{ni}, where Z_{ni} ~ N(0, 1) is determined by ω_{ni}.⁶

According to the strong law of large numbers (see, for example, Chung [1974], Theorem 5.4.2), the sequence S_n = n^{-1} Σ_{i=1}^{n} Z_i →_as 0. Since X_n = S_{k(n)}, by Lemma 2.5, S_n →_as 0 implies that X_n →_as 0. On the other hand, {Y_n} are independent, with Y_n ~ S_{k(n)}. Lemma 2.5 says that {Y_n} may not converge almost surely, which is in fact the case for this particular sequence. Because Y_n ~ N(0, k(n)^{-1}), we know that Σ_n P{ |Y_n| > ε } is finite for all ε > 0 iff k(n) grows faster than log n. The choice k(n) = ⌊log n⌋ + 1, for example, will result in X_n →_as 0, but {Y_n} will not converge almost surely. This example completes the proof of part (c). □

The proof of the last part of Theorem 2.8 makes an important observation about the strong law of large numbers. At first glance, it may seem slightly amazing that under the simple assumption that {Z_i} are i.i.d. and have a finite mean, S_n = k(n)^{-1} Σ_{i=1}^{k(n)} Z_i →_as μ whenever k(n) is non-decreasing and unbounded. Closer inspection reveals that because {S_n} are partial sums, if the terms of a particular sequence {S_n(ω)} ever get close to zero, they are not likely to deviate greatly from zero thereafter. For the usual case k(n) = n, this is true because S_n = (1 - n^{-1}) S_{n-1} + n^{-1} Z_n, so that as n → ∞, the contribution of the "incremental" summand Z_n gets rapidly smaller. In this sense, then, it is less of a surprise that S_n →_as μ, because the random variables {S_n} are defined in the incremental model.

6. John Lehoczky suggested that the proof be simplified by letting Z_i and Z_{ni} be [...]

2. Stochastic Convergence and Probabilistic Algorithms

Three types of stochastic convergence of random variables were defined in Chapter 1.
Two of these, almost sure convergence and convergence in probability, are important in describing the (stochastic) success of probabilistic approximation algorithms. By a probabilistic approximation algorithm we mean an algorithm which usually, but not always, produces the exact solution or a good approximate answer to a problem. We might characterize such an algorithm, and our knowledge of it, by dividing the class into three categories: algorithms which usually get the exact answer, those which usually get within some error criterion ε, and those which have a certain error distribution F_n. Most of the algorithms we will consider are of the second type, although it is desirable but only occasionally possible to find the error distribution.

The convergence concepts themselves are first examined in some detail, after which we discuss their applications to specific probabilistic approximation algorithms such as those described by Karp [1976].

2.1. Stochastic Convergence

Although convergence in distribution is used in some proofs we will need, the other two modes of stochastic convergence are of more immediate interest. Recall that
An extension of the proof given here, using the central limit theorem, leads to a version of the strong law of large numbers for the independent problem model. The only difference is that two added conditions (that Z_{ni} have finite variance, and that k(n) grow faster than log n) are used to prove almost sure convergence of { k(n)^{-1} Σ_{i=1}^{k(n)} Z_{ni} }.

2.3. History of Confusion

Theorem 2.8, part (c), explains the difference between the incremental problem model and the independent problem model: strong success, or more generally almost sure convergence, in the independent model implies the same in the incremental model, but not vice versa. In this section, we will review the use of such phrases as "almost surely" and "almost everywhere" in the relevant papers, and show that there is considerable confusion not only between problem models, but even between almost sure convergence and convergence in probability.

The fact that the difference between the problem models has been overlooked by computer scientists who are applying probability theory is understandable, since it has apparently not been explicitly pointed out in these terms before. Nevertheless, the essence of the difference is recognized by standard probability theory texts in the framework of the Lindeberg-Feller version of the central limit theorem (see Chung [1974], pages 196-214), where "triangular arrays" of random variables are introduced. In weaker versions of the central limit theorem (Chung [1974], page 169) the partial sums S_n of the previous section are used exclusively. While this difference is noted in passing, it is not emphasized.

In a very difficult paper, Beardwood, Halton, and Hammersley [1959] prove that
