Fitting to the Power-Law Distribution

Michel L. Goldstein, Steven A. Morris, Gary G. Yen

School of Electrical and Computer Engineering, Oklahoma State University, Stillwater, OK 74078
(Receipt date: 02/11/2004)

This paper reviews and compares methods of fitting power-law distributions and methods
to test goodness-of-fit of power-law models. It is shown that the maximum likelihood
estimation (MLE) and Bayesian methods are far more reliable for estimation than using
graphical fitting on log-log transformed data, which is the most commonly used fitting
technique. The Kolmogorov-Smirnov (KS) goodness-of-fit test is explained and a table
of KS values designed for the power-law distribution is given. The techniques presented
here will advance the application of complex network theory by allowing reliable
estimation of power-law models from data and further allowing quantitative assessment
of goodness-of-fit of proposed power-law models to empirical data.

PACS Number(s): 02.50.Ng, 05.10.Ln, 89.75.-k

I. INTRODUCTION

In recent years, a significant amount of research has focused on showing that many
physical and social phenomena follow a power-law distribution. Some examples of these
phenomena are the World Wide Web [1], metabolic networks [2], Internet router
connections [3], journal paper reference networks [4], and sexual contact networks [5].
Often, simple graphical methods are used for establishing the fit of empirical data to a
power-law distribution. Such graphical analysis can be erroneous, especially for data
plotted on a log-log scale. In this scale, a pure power law distribution appears as a straight
line in the plot with a constant slope.
The pure power-law distribution, known as the zeta distribution, or discrete Pareto
distribution [6] is expressed as:

$$p(k) = \frac{k^{-\gamma}}{\zeta(\gamma)} \qquad (1)$$

where: k is an integer usually measuring some variable of interest, e.g., number of links
per network node
p(k) is the probability of observing the value k;
γ is the power-law exponent;
ζ(γ) is the Riemann zeta function.
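
As a quick numerical illustration of Equation (1), the following minimal sketch evaluates p(k) with SciPy's `scipy.special.zeta`. The function name `zeta_pmf` is an illustrative assumption and does not come from the paper.

```python
import numpy as np
from scipy.special import zeta  # zeta(s) is the Riemann zeta function for s > 1


def zeta_pmf(k, gamma):
    """Probability of observing the integer value k >= 1 under Equation (1)."""
    k = np.asarray(k, dtype=float)
    return k ** (-gamma) / zeta(gamma)


# Probabilities of the first few values for gamma = 3.0
print(zeta_pmf([1, 2, 3], 3.0))
```

Summing `zeta_pmf` over a long range of integers should approach 1, which is a simple sanity check that the normalization by ζ(γ) is correct.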
Without a quantitative measure of goodness of fit, it is difficult to make final
conclusions about how well the data approximates a power-law distribution. Moreover, a
quantitative analysis of the goodness of fit enables the identification of possible
interesting external phenomena that could be causing the distribution to deviate from a
power-law. In some cases the underlying process may not actually generate power-law
distributed data, but outside influences, such as data collection techniques, may cause the
data to appear as power-law distributed. Quantitative assessment of the goodness-of-fit for
the power-law distribution can assist in identifying these cases.
In the remainder of this paper, Section II discusses the methods for fitting a power
law. Section III presents two candidate goodness-of-fit tests of a power-law distribution,
the Kolmogorov-Smirnov test, and the χ² goodness-of-fit test. Section IV illustrates the
application of fitting and goodness-of-fit testing to analysis of a series of collections of
journal papers. Finally, Section V presents conclusions about the problem of fitting
power-law distributions and discusses some possible further analysis that can be
implemented.

II. FITTING POWER-LAW DISTRIBUTIONS

Many methods exist in the theory of parameter estimation that can be used for
estimating the exponent of the power-law distribution [7]. This section overviews three
methods, namely maximum likelihood estimation (MLE), Bayesian Estimation, and
linear regression-based methods.

In some cases the head of the distribution may deviate from a power-law, while the
tail appears to be a power-law. A good example of this is the distribution of outbound
links on a webpage [8]. Most pages have only a few links, but some have many more,
especially pages that list other interesting pages, also called hubs [9]. On a log-log
plot, the distribution of the number of outbound links appears linear in the tail,
suggesting a “power-law tail.”
It is necessary to have a strict definition of a power-law tail, and define estimators and
tests for this distribution. It is important to note that the tail usually contains only a small
fraction of the data. Thus, no statistical methods may be available to accurately estimate
the power-law exponent, or even determine that the distribution has a power-law tail. An
analysis of this uncertainty was recently performed by Jones and Handcock [10]. The
scope of this paper is limited to analyzing the fitting of power-law functions to an
entire distribution and applying goodness-of-fit tests for validation and comparison. A
deeper analysis for tail distributions would require first an analysis of the basis of a tail-
only distribution and is beyond the scope of this paper.

A. Maximum likelihood estimation (MLE)

MLE is often used for estimating the exponent of a power-law distribution [6]. It is
based on finding the maximum value of the likelihood function:

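$$l(\gamma \mid x) = \prod_{i=1}^{N} \frac{x_i^{-\gamma}}{\zeta(\gamma)}$$

$$L(\gamma \mid x) = \log l(\gamma \mid x) = \sum_{i=1}^{N} \left( -\gamma \log x_i - \log \zeta(\gamma) \right) = -\gamma \sum_{i=1}^{N} \log x_i - N \log \zeta(\gamma) \qquad (2)$$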

where: l(γ|x) is called the likelihood function of γ given the data x

L(γ|x) is the log-likelihood function.
The log-likelihood is used because it simplifies the calculation and because the log
function is monotonically increasing, so it does not change the point where
the maximum is attained. This maximum can be obtained theoretically for the zeta
distribution by finding the root of the derivative of the log-likelihood function:

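$$\frac{d}{d\gamma} L(\gamma \mid x) = -\sum_{i=1}^{N} \log x_i - N \frac{1}{\zeta(\gamma)} \frac{d}{d\gamma} \zeta(\gamma) = 0$$

$$\Rightarrow \; -\frac{\zeta'(\gamma)}{\zeta(\gamma)} = \frac{1}{N} \sum_{i=1}^{N} \log x_i \qquad (3)$$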

where: ζ′(γ) is the derivative of the Riemann zeta function.

A table with the values of the ratio ζ′(γ)/ζ(γ) can be found in [11], or values can
be generated with most modern mathematical and engineering calculation programs.
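
Rather than looking up ζ′(γ)/ζ(γ) in a table, the maximum of Equation (2) can also be located numerically. The sketch below is one possible way to do this, assuming SciPy is available; the function name `fit_power_law_mle` and the search bounds are illustrative assumptions, not part of the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import zeta


def fit_power_law_mle(data, bounds=(1.01, 20.0)):
    """Return the exponent that maximizes the log-likelihood of Equation (2)."""
    data = np.asarray(data, dtype=float)
    mean_log = np.mean(np.log(data))

    def neg_loglik_per_sample(gamma):
        # -L(gamma | x) / N = gamma * mean(log x) + log zeta(gamma)
        return gamma * mean_log + np.log(zeta(gamma))

    result = minimize_scalar(neg_loglik_per_sample, bounds=bounds, method="bounded")
    return result.x


# Synthetic check: NumPy's zipf sampler draws from the zeta distribution
rng = np.random.default_rng(0)
sample = rng.zipf(2.5, size=20_000)
print(fit_power_law_mle(sample))
```

Minimizing the negative log-likelihood is equivalent to solving Equation (3), since the minimizer is the point where the derivative vanishes.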

Calculation of MLE is very fast and robust; however it only offers a single estimate
of the power-law exponent without information to define the confidence interval of that
estimate. In order to deal with this deficiency, a Bayesian estimator, discussed below, can
be used.

B. Bayesian estimation
A Bayesian estimator, derived from MLE, differs from MLE in the meaning of the
parameter estimate [12]. In a Bayesian approach, the unknown parameter is not a single
value, but a distribution, called the posterior distribution. Moreover, this approach
incorporates what is known about the values of the parameter before any data is analyzed,
speeding up convergence and assuring that the final estimate is constrained to values that
are deemed as reasonable. This is done by the definition of a prior distribution p(θ). The
final posterior distribution is given by a normalized multiplication of the prior and the
likelihood. For the power-law distribution:

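$$p(\gamma \mid x) = \frac{l(\gamma \mid x)\, p(\gamma)}{\int_{-\infty}^{\infty} l(\gamma \mid x)\, p(\gamma)\, d\gamma} = \frac{p(\gamma) \prod_{i=1}^{N} \frac{x_i^{-\gamma}}{\zeta(\gamma)}}{\int_{-\infty}^{\infty} p(\gamma) \prod_{i=1}^{N} \frac{x_i^{-\gamma}}{\zeta(\gamma)}\, d\gamma} \qquad (4)$$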

The choice of the prior, as mentioned, relates to the range and, possibly, the
distribution in this range. Moreover, the prior also defines possible discretization of the
estimated parameter. A discrete prior always generates a discrete posterior distribution.

Another good feature of the Bayesian estimation process is that it is naturally
adaptable to iterative estimation, where one sample or a subset of the samples is analyzed
at a time. This is particularly useful if it is interesting to analyze the influence of each of
the samples, and if the amount of memory available for implementation is low (the only
thing that has to be stored is the posterior at each step).
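
The iterative procedure described above can be sketched on a discrete grid of candidate exponents. This is only an illustration under stated assumptions: the uniform prior on the range 1.5 to 4.0, the grid resolution, the batch count, and the name `update_log_posterior` are all choices made here, not values taken from the paper.

```python
import numpy as np
from scipy.special import zeta


def update_log_posterior(log_post, gammas, batch):
    """Multiply the current posterior by the likelihood of one batch (in log space)."""
    batch = np.asarray(batch, dtype=float)
    log_lik = -gammas * np.log(batch).sum() - batch.size * np.log(zeta(gammas))
    new = log_post + log_lik
    new -= new.max()                         # guard against underflow
    return new - np.log(np.exp(new).sum())   # renormalize so the posterior sums to 1


# Assumed discrete uniform prior over candidate exponents
gammas = np.linspace(1.5, 4.0, 251)
log_prior = np.full(gammas.size, -np.log(gammas.size))

rng = np.random.default_rng(0)
data = rng.zipf(2.5, size=2000)              # synthetic zeta-distributed data

# Only the current posterior is stored between batches
log_post = log_prior
for batch in np.array_split(data, 20):
    log_post = update_log_posterior(log_post, gammas, batch)

posterior = np.exp(log_post)
post_mean = (gammas * posterior).sum()
cdf = np.cumsum(posterior)
lower = gammas[np.searchsorted(cdf, 0.025)]
upper = gammas[np.searchsorted(cdf, 0.975)]
print(post_mean, (lower, upper))
```

Because the prior is defined on a grid, the posterior is discrete as well, and the interval between `lower` and `upper` gives an approximate 95% credible interval for the exponent.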

C. Linear regression-based estimators

For power-law exponent estimation, linear regression is an often used estimation
procedure [13]. Different variations of this technique are all based on the same principle:
a linear fit is made to the data that is plotted on a log-log scale. Actually, with reasonable
accuracy, the linear fit can be made by hand on a log-log plot of the distribution.
However, the linear fitting does not take into consideration that almost all of the data
observed is on the first few points of the distribution. For example, for an exponent, γ, of
3.0, 93.6% of the data is expected to have k=1 or k=2. Therefore, an estimation method
that does not incorporate this fact will fit to the “noise” in the tail, where very few
observations occur [14].
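
The 93.6% figure can be checked directly from Equation (1), using ζ(3) ≈ 1.2021:

$$p(1) + p(2) = \frac{1^{-3} + 2^{-3}}{\zeta(3)} = \frac{1.125}{1.2021} \approx 0.936$$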
Because of this, two modifications to direct linear fitting were proposed: 1) the use of
only the first 5 points for regression and, 2) the use of logarithmically binned data. The
first variation is straightforward to implement. The second variation is based on adding
all values that fall into bins that are logarithmically spaced (same size in the logarithmic
scale), and then performing the linear regression on the log of the quantity of these
groups of data. This method is similar to binning methods used for curve estimation [1].
The advantage of this method is that, by grouping the data points the noise is reduced.
The reduction of noise is dependent on the size chosen for the bins. However, this
method only generates a graph that is approximately linear even when the distribution is a
power-law, as can be seen in Figure 1 for “log2” bins, i.e., bin boundaries at (0, 1, 2, 4,
8, ...). The linearization error decreases the estimation accuracy. More importantly, the
slope obtained is not directly the value of the power-law exponent. This can be observed
by plotting the exponent of the power-law distribution and the slope obtained by
simulating this distribution and using the method explained above. Figure 2 shows this
plot for “log2” bins. Approximating the relation to a line, the following transformation
equation was obtained:

b = 1.094 ⋅ γ − 0.963 (5)

where: b is the measured exponent

γ is the actual exponent.
Another common method of linear estimation uses 5 bins per decade. Using the
same procedure, the transformation equation obtained using 10 bins (2 decades) was:

b = 1.026 ⋅ γ − 0.931 . (6)
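
To make the binned regression concrete, here is a minimal sketch of a log2-binned fit. Several details are assumptions made for illustration, since they are not specified in the text: counts are summed without dividing by the bin width, each bin is plotted at its upper boundary, empty bins are dropped, and the function name `log2_binned_slope` is hypothetical. The slope it returns therefore need not follow Equation (5) exactly.

```python
import numpy as np


def log2_binned_slope(data):
    """Slope magnitude of a log-log linear fit to counts summed over log2 bins."""
    data = np.asarray(data, dtype=np.int64)
    n_upper = int(data.max() - 1).bit_length()  # smallest j with 2**j >= max(data)
    edges = np.concatenate(([0], 2 ** np.arange(n_upper + 1)))  # 0, 1, 2, 4, 8, ...

    # Bin i collects values in (edges[i], edges[i + 1]]
    idx = np.searchsorted(edges, data, side="left") - 1
    counts = np.bincount(idx, minlength=n_upper + 1)

    upper = edges[1:]
    keep = counts > 0  # empty bins cannot be log-transformed
    slope, _ = np.polyfit(np.log2(upper[keep]), np.log2(counts[keep]), 1)
    return -slope
```

In this sketch the measured slope differs from the exponent used to generate the data, which is the reason a transformation such as Equations (5) and (6) is needed before the slope can be read as an estimate of γ.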

The most important observation about the linear fitting methods is that they are not
tied to the definition of a probability distribution. Thus, the integration of the fit line may
not be unity. By using the slope given by this fitted line and forcing an adjustment to the
intercept in order for the fitted function to be a probability distribution, the final function
may end up visually distant from the empirical distribution. Examples in Section IV
illustrate this behavior.

III. GOODNESS-OF-FIT TESTS

In statistical analysis, many methods for assessing the goodness-of-fit of a
distribution have been proposed. Among these methods, the most commonly used for
general distributions are Pearson’s χ² goodness-of-fit test and the Kolmogorov-Smirnov
(KS) type test.

A. Pearson’s χ² goodness-of-fit test

Pearson’s χ² test is the most commonly used test for large samples. It was introduced
by Pearson in 1900 [15] and is defined as follows when testing the hypothesis of a
specific distribution:

$$Q = \sum_{i=1}^{C} \frac{(O_i - E_i)^2}{E_i} \;\dot{\sim}\; \chi^2_{C-1} \qquad (7)$$

where: C is the total number of classes

Oi is the observed value related to class i
Ei is the expected value of class i.
When the distribution is independent of the data, the number of degrees of freedom
of the χ² distribution is, as shown in Equation (7), C-1. However, when the distribution in
the hypothesis has some parameters that are estimated using the same data to which the
test is going to be applied, the number of degrees of freedom decreases [16]. More
specifically, the number of degrees of freedom of the χ² distribution is C-s-1, where s is
the number of parameters that were obtained from the data. This decrease in the number
of degrees of freedom assumes that the parameters were obtained using the MLE method.
Using other methods may cause this number to decrease even more. Thus, it is only
possible to apply the χ² test when MLE is performed. For MLE, the number of degrees of
freedom used for testing for the power-law distribution is C-2.
Another important decision for the χ² test is the number of classes to use. Later
analysis of the χ² test has shown that the test is not valid when the expected value of the
quantity in any of the classes is less than 5 [16]. Therefore, it is necessary to sum all
values from the tail of the distribution into a class whose total expected value is greater
than 5. For example, in a dataset with 5,000 samples and a γ of 3.0, there would be 10
classes: one class for each integer from 1 to 9 and a tail class for all values greater
than 9.
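
The class construction and the test statistic of Equation (7) can be sketched as follows. This is an illustration under assumptions rather than the authors' implementation: the function name `chi2_power_law_test` is hypothetical, the exponent is assumed to come from MLE, and the sketch does not merge classes further if the pooled tail itself has an expected count below the threshold.

```python
import numpy as np
from scipy.special import zeta
from scipy.stats import chi2


def chi2_power_law_test(data, gamma, min_expected=5.0):
    """Pearson chi-square test of integer data against a fitted zeta distribution."""
    data = np.asarray(data)
    n = data.size
    norm = zeta(float(gamma))

    # One class per integer while its expected count stays at or above the threshold
    k_max = 0
    while n * (k_max + 1.0) ** (-gamma) / norm >= min_expected:
        k_max += 1
    ks = np.arange(1, k_max + 1)

    expected_head = n * ks.astype(float) ** (-gamma) / norm
    observed_head = np.array([(data == k).sum() for k in ks])

    # Final class pools every value greater than k_max (the tail)
    expected = np.append(expected_head, n - expected_head.sum())
    observed = np.append(observed_head, n - observed_head.sum())

    q = np.sum((observed - expected) ** 2 / expected)
    dof = expected.size - 2  # C - s - 1 with s = 1 parameter estimated by MLE
    return q, dof, chi2.sf(q, dof)
```

For 5,000 samples and γ = 3.0 this construction yields the 10 classes of the example above, so the test uses 8 degrees of freedom.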
Nicholls [17] points out that the need for all class expected values to be greater than 5
is a rule of thumb with a purely heuristic justification, and that this is the main
criticism researchers have about using this test. For example, another possible solution
would be to use the smallest classes possible for the tail, i.e., instead of grouping all
tail values into one class, they would be grouped into classes such that each class has an
expected value as close to 5 as possible. With this alternative heuristic, which does not
conflict with any of the test assumptions, the χ² test statistic may vary considerably.
Because of this, most analyses tend to employ the Kolmogorov-Smirnov test.

B. Kolmogorov-Smirnov-type test
The KS-type test has recently been applied to testing goodness-of-fit when total
sample size is small. The test is based on the following value:

$$K = \sup_{x} \left| F^{*}(x) - S(x) \right| \qquad (8)$$

where: F*(x) is the hypothesized cumulative distribution function

S(x) is the empirical distribution function based on the sampled data.
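
A minimal sketch of Equation (8) for the zeta distribution is given below. It assumes SciPy's two-argument `scipy.special.zeta(s, q)` (the Hurwitz zeta function) to obtain the cumulative distribution, and it evaluates the difference only at values observed in the data, as in the worked example of Section IV. The name `ks_statistic` is an illustrative assumption.

```python
import numpy as np
from scipy.special import zeta


def ks_statistic(data, gamma):
    """Largest |F*(x) - S(x)| over the values x that occur in the data."""
    data = np.asarray(data)
    xs, counts = np.unique(data, return_counts=True)

    # Empirical distribution function S(x) at each observed value
    s = np.cumsum(counts) / data.size

    # F*(x) = 1 - sum_{k > x} k^-gamma / zeta(gamma), with the tail sum
    # given by the Hurwitz zeta function zeta(gamma, x + 1)
    f = 1.0 - zeta(gamma, xs + 1.0) / zeta(gamma)

    return np.max(np.abs(f - s))
```

Using the Hurwitz zeta function avoids summing Equation (1) term by term, which matters when the data contain very large values.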
Kolmogorov [18] first supplied a table for the quantiles of this distribution for the
case where the probability function was independent of the data points. However, when
there is dependence, other tables must be used. This limitation was not taken into
consideration by Pao and Nicholls in their application [17, 19] of the KS test to power-
laws. Without correcting for this factor, the KS test gives a rejection rate lower than what
is expected [20].
Lilliefors later introduced tables for using the KS test with other distributions, such as
normal and exponential [21, 22]. These tables were obtained using a Monte Carlo
method, which is based on the generation of a large number of distributions with random
parameters and calculating the test statistic for each of the test cases. From these tests,
empirical values for the quantiles can be extracted. The same procedure was used to
obtain these values for the power-law distribution. For each of the logarithmically spaced
sample sizes, 10,000 power-law distributions were simulated, with random exponent
from 1.5 to 4.0. Statistics were collected from these simulations to generate the KS table,
shown in Table I. This table was created assuming that the estimation method used was
the MLE. If other estimation methods are used, it would be necessary to construct a new
KS table.
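
The Monte Carlo procedure described above can be sketched as follows. This is a simplified illustration, not the code used to produce Table I: the helper names, the optimizer bounds, the default of 1,000 simulations, and the evaluation of the KS statistic only at observed values are all assumptions made here, so the resulting quantiles will only roughly resemble the tabulated ones.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import zeta


def fit_mle(data):
    """MLE of the exponent by minimizing the negative log-likelihood per sample."""
    mean_log = np.mean(np.log(np.asarray(data, dtype=float)))
    res = minimize_scalar(lambda g: g * mean_log + np.log(zeta(g)),
                          bounds=(1.01, 20.0), method="bounded")
    return res.x


def ks_statistic(data, gamma):
    """Largest |F*(x) - S(x)| over the observed values."""
    xs, counts = np.unique(data, return_counts=True)
    s = np.cumsum(counts) / len(data)
    f = 1.0 - zeta(gamma, xs + 1.0) / zeta(gamma)
    return np.max(np.abs(f - s))


def ks_quantiles(n_samples, n_sim=1000, quantiles=(0.9, 0.95, 0.99, 0.999), seed=0):
    """Empirical quantiles of the KS statistic when the exponent is fitted by MLE."""
    rng = np.random.default_rng(seed)
    stats = np.empty(n_sim)
    for i in range(n_sim):
        gamma = rng.uniform(1.5, 4.0)             # random true exponent
        data = rng.zipf(gamma, size=n_samples)    # simulated power-law sample
        stats[i] = ks_statistic(data, fit_mle(data))
    return np.quantile(stats, quantiles)


print(ks_quantiles(100, n_sim=200))
```

The key point is that the exponent is re-estimated from each simulated sample before the statistic is computed, which is what makes the resulting quantiles specific to the MLE-fitted case.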
A step by step example of how to apply the KS test for determining the goodness-of-
fit to a power-law is presented in the next section.

IV. EXAMPLES

First a simple example will be given on how the KS table can be used for determining
the goodness-of-fit of an empirical distribution to a power-law distribution. Using data
from a small collection of 131 papers and 359 authors that cover the topic of MEMS RF
switches, the process of using the KS goodness-of-fit test for the papers per author
distribution follows four steps:
1) Use the MLE method for estimating the power-law exponent. In this case, the
estimated exponent was 2.76.
2) Generate the hypothesized cumulative distribution F*(x) using the cumulative
sum of equation (1) and build a table showing side by side the values of F*(x)
and S(x) where there were values observed in the dataset. This table is shown
in Table II.
3) Calculate the absolute difference between each pair of values and find the
maximum. This is the KS test statistic. The absolute differences can be seen in
Table II. The largest difference for this dataset, 0.0313, occurs at x = 1.
4) In the KS table (Table I), the largest difference, 0.0313, is compared with the
values in the row with the closest number of points. For a more conservative
approach, where it is better to accept the hypothesized distribution when there
is a doubt, the row with the lower number of points should be used. In this
case, use the row for 100 points. This row shows that for 90% of the cases
when the distribution was a power-law, the KS statistic was 0.0580 or below.
The maximum observed KS statistic for this example was much lower than
this. In other words, the p value, or Observed Significance Level (OSL), is
greater than 10%. Thus, at a significance level of 5%, there is no statistical
evidence that this distribution is not a power law.
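
Steps 3 and 4 can be reproduced from the published numbers alone. The short sketch below copies the S(x) and F*(x) columns of Table II and the 0.9 quantile from the 100-sample row of Table I; the variable names are illustrative.

```python
import numpy as np

# Columns of Table II
x = np.array([1, 2, 3, 4, 5, 6, 17])
S = np.array([0.7647, 0.9188, 0.9692, 0.9860, 0.9916, 0.9972, 1.0000])
F = np.array([0.7960, 0.9132, 0.9513, 0.9686, 0.9779, 0.9835, 0.9971])

# Step 3: the KS statistic is the largest absolute difference
diffs = np.abs(F - S)
K = diffs.max()
x_at_max = x[diffs.argmax()]

# Step 4: compare with the 0.9 quantile for 100 samples (Table I)
critical_90 = 0.0580
reject_at_10_percent = K > critical_90

print(K, x_at_max, reject_at_10_percent)
```

Because the statistic falls below the 0.9 quantile, the power-law hypothesis is not rejected even at the 10% level, in agreement with the conclusion of step 4.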
This simple example shows two important details about any goodness-of-fit test. First, the
result of the test does not prove that a sample actually comes from a power-law
distribution; it can only indicate when the chance of it being a power-law is low. Second,
as would be expected, a higher p value suggests a higher chance that the sample being
tested comes from a power-law distribution. The latter can be used as a method to
compare samples to infer which are more likely to be generated by a power-law
distribution.
A second example is a collection of papers covering the topic of vibrating sandpiles,
containing 368 papers with 6272 references. The power-law exponent of the paper per
reference distribution was extracted using the four methods discussed above (the
Bayesian method was not used because it generates results that are not easily comparable
to the other results): MLE, linear regression in log-log scale, linear regression using the
first 5 points only and linear regression using logarithmically binned data. A third
example estimates the power-law exponent of the authors per paper distribution of a
collection of 336 papers and 422 authors from a collaboration network associated with
researchers at the University of Maryland Psychiatric Research Center. Figure 3 shows a
comparison of the different results for this dataset. As discussed, the linear regression on
log-log transformed data fits all points with equal weight, and greatly underestimates the
exponent. Using the first 5 points, the method over-estimated the exponent, because it
does not take into consideration the tail, while the log2 binned data underestimated the
exponent because it places too much emphasis on the tail points.
Table III shows a summary of the estimated exponents obtained using each of the
fitting methods applied to 27 different collections of journal papers from 27 different
research topics. These collections were gathered from the Institute for Scientific
Information Web of Science product over a period of two years from queries and seed
references and were used for research in information visualization and knowledge
domain mapping. The characteristics of these collections are summarized in Table IV.
For paper collections two distributions are usually claimed as power-laws: papers per
author (Lotka’s Law [23]) and papers per reference ([24]). The number of papers in each
collection varies from 131 to 14,211 and the MLE power-law exponents vary from 1.99
to 3.71 for the distribution of papers per author and 1.98 to 3.93 for the distribution of
papers per reference.
Using the MLE, the two goodness-of-fit tests discussed above were used to analyze
all 27 datasets. Table V shows the overall result of the number of distributions that were
actually accepted as power laws using both goodness of fit tests described in the previous
section using a 95% confidence level.

These results support the idea that it is not possible to assume in all datasets that these
distributions are actually power-law. Papers per author distributions show a 56%
acceptance rate using the KS test and appear to be more likely to be accepted as actual
power-laws. However, using the KS test of power-law fit on the papers per reference
distribution, the acceptance rate was only 7%, so it is very unlikely to be an actual
power law, as suggested by Naranan [24]. Further analysis on the ability to define power-
law tail distributions would be needed to test if the paper-per-reference distribution is
power-law tail only as reported by Redner [4]. The two collections where paper per
reference distribution was accepted as power-law by both χ² and KS tests are two small
datasets containing 148 papers and 3,767 references, and 131 papers and 1,573
references, respectively.
In Figure 4 to Figure 7, some examples of distributions and their actual goodness of
fit test values are shown. In these examples, it is easy to observe that visual inspection in
a log-log plot is not accurate enough to determine if these distributions are actual power-
laws or not.

V. CONCLUSIONS

This paper presents an analysis of the extent to which empirical data can be assumed
to be power-law distributed. First a brief discussion of possible fitting methods was
presented. Then two well-known goodness-of-fit measurements were used for the
analysis: Pearson’s χ² test and the Kolmogorov-Smirnov test. A KS table for testing the
fit of MLE-estimated power laws was provided.
Using these goodness-of-fit tests on 27 collections of journal papers and testing them
for two distributions that are usually believed to be power-laws, the papers per author and
papers per reference distributions, it was shown that caution must be taken when
assuming power-law distributions. Especially for the papers per reference distributions, it
was observed that in many cases the power-law distribution could not be substantiated.
Importantly, the usual method for observing power-law distributions, that is, plotting on
a log-log scale, offers no support for actually identifying poor goodness of fit.

Most of the fit problems may be caused by external effects that usually affect the
initial points of the distribution only (the head of the distribution). Further analysis would
be required to test if the tail of the distribution is a power-law. However, when testing for
the fit of the tail, it is wise to be cautious about the extreme paucity of sample points that
generate the tail of the distribution. The power of goodness-of-fit tests decreases when
fewer points are sampled. Therefore, it becomes much more difficult to confirm that the
distribution of the tail is power-law and not any other distribution.
Another possible analysis that could be performed with this data is a quantitative
analysis of the modifying external effects. For example, if it is known that, in some
collections of journal papers, there may be some survey papers that reference many papers
that were never referenced before and that are actually external from the dataset, this may
cause an unexpected increase in the number of references appearing in only one paper
(the first value in the distribution). With a goodness of fit test it is possible to establish
some hypothesis on the amount of external references that were added to the database
and, possibly, remove them from further analyses.
Overall, the evaluation of these tests is simple and does not add much to the overall
processing complexity. The insightful understanding of goodness of fit measurements
when testing for power-law distributions enhances the capabilities of analysis of datasets
that may show highly skewed distributions. It is a vital process in order to confirm
assumptions and make meaningful comparisons when modeling the datasets.

Table I - KS test table for power-law distributions

              Quantile
# samples     0.9       0.95      0.99      0.999
10            0.1765    0.2103    0.2835    0.3874
20            0.1257    0.1486    0.2003    0.2696
30            0.1048    0.1239    0.1627    0.2127
40            0.0920    0.1075    0.1439    0.1857
50            0.0826    0.0979    0.1281    0.1719
100           0.0580    0.0692    0.0922    0.1164
500           0.0258    0.0307    0.0412    0.0550
1000          0.0186    0.0216    0.0283    0.0358
2000          0.0129    0.0151    0.0197    0.0246
3000          0.0102    0.0118    0.0155    0.0202
4000          0.0087    0.0101    0.0131    0.0172
5000          0.0073    0.0086    0.0113    0.0147
10000         0.0059    0.0069    0.0089    0.0117
50000         0.0025    0.0034    0.0061    0.0077

Table II - Sample results for using the KS goodness-of-fit test

x     S(x)      F*(x)     |F*(x) - S(x)|
1     0.7647    0.7960    0.0313
2     0.9188    0.9132    0.0056
3     0.9692    0.9513    0.0178
4     0.9860    0.9686    0.0174
5     0.9916    0.9779    0.0137
6     0.9972    0.9835    0.0137
17    1.0000    0.9971    0.0029

Table III – Sample statistics for power-law fitting of all datasets varying the fitting method

               Papers per author     Papers per reference
Method         µ        σ            µ        σ
MLE            2.63     0.47         2.78     0.51
Linear         2.17     0.48         2.04     0.46
Linear (5p)    2.59     0.51         2.77     0.49
Log2 bins      2.73     0.55         2.60     0.46

Table IV - Summary table of a series of 27 paper collections used for demonstrating power-law
fitting and goodness of fit testing.

Index  Topic                                  No. of papers  No. of references  No. of authors
1      agent based models                     148            3767               259
2      angiogenesis                           453            8246               1590
3      anthrax                                2472           25010              4493
4      atrial ablation                        3095           22670              6574
5      biosensors                             5892           32767              11034
6      botox                                  1560           20819              3521
7      cocitation and bibliographic coupling  550            13010              492
8      complex networks                       902            19185              1665
9      distance education                     1391           16603              2472
10     econophysics                           482            6281               588
11     ht supercon                            1631           29044              3001
12     info science                           14211          119289             9413
13     information visualization              2450           56912              5545
14     mems RF switch                         131            1573               359
15     milgrams                               404            6791               465
16     molecular imprinting                   513            5717               785
17     nerve agents                           407            8293               1064
18     neuroimaging                           671            25279              2042
19     ontology                               224            6501               456
20     schizophrenia                          513            20422              1477
21     scientometrics                         3468           70117              2928
22     self organized criticality             1634           27622              2176
23     silicon on insulator semiconductor     2383           23041              4902
24     superstring                            6652           53568              4813
25     TQM                                    1893           28216              2875
26     U of Maryland                          336            5890               422
27     vibrating sandpiles                    368            6272               547

Table V - Overall results for the goodness of fit for all datasets

              Papers per author         Papers per reference
              χ² test     KS test       χ² test     KS test
# rejected    10 (37%)    12 (44%)      26 (96%)    25 (93%)
# accepted    17 (63%)    15 (56%)      1 (4%)      2 (7%)

Figure 1 - Log-2 bin results for a theoretical power-law distribution with N=1000 and γ=2.0

Figure 2 - Empirical transformation between the slope of the log-2 binned data and the power-law
exponent

Figure 3 - Results of fitting using the different fitting methods for papers per reference distribution
for the vibrating sandpiles dataset.

Figure 4 –Papers per author distribution for the University of Maryland dataset. The circles
represent the actual empirical distribution, the line is the Maximum Likelihood fit (gamma = 2.02).
The database has 336 papers, 422 authors, Q = 3.24, pQ = 0.7785, K = 0.0158, pK > 0.1.

Figure 5 – Papers per author distribution for the atrial ablation dataset. The circles represent the
actual empirical distribution, the line is the Maximum Likelihood fit (gamma = 2.11). The database
has 3,095 papers, 6,574 authors, Q = 62.7, pQ = 1.5·10⁻⁵, K = 0.0125, pK < 0.01.

Figure 6 – Reference distribution for the superstring dataset. The circles represent the actual
empirical distribution, the line is the Maximum Likelihood fit (gamma = 1.98). The database has
6,652 papers, 53,568 references, 208,119 citations, Q = 604, pQ = 0, K = 0.0139, pK < 0.001.

Figure 7 – Reference distribution for the angiogenesis dataset. The circles represent the actual
empirical distribution, the line is the Maximum Likelihood fit (gamma = 2.33). The database has 453
papers, 8,246 references, 18,818 citations, Q = 1.74, pQ = 1.2·10⁻⁵, K = 0.0107, pK < 0.01.

[1] R. Albert, H. Jeong, and A.-L. Barabási, Nature 401, 130 (1999).
[2] H. Jeong, B. Tombor, R. Albert, et al., Nature 407, 651 (2000).
[3] M. Faloutsos, P. Faloutsos, and C. Faloutsos, Comput. Commun. Rev. 29, 251 (1999).
[4] S. Redner, Eur. Phys. J. B 4, 131 (1998).
[5] F. Liljeros, C. R. Edling, L. A. N. Amaral, et al., Nature 411, 907 (2001).
[6] N. L. Johnson, S. Kotz, and A. W. Kemp, Univariate Discrete Distributions (John
Wiley & Sons, New York, 1992).
[7] J. M. Mendel, Lessons in Estimation Theory for Signal Processing,
Communications, and Control (Prentice Hall, Upper Saddle River, NJ, 1995).
[8] R. Albert, H. Jeong, and A.-L. Barabási, Nature 401, 130 (1999).
[9] J. M. Kleinberg, J. ACM 5, 604 (1999).
[10] J. H. Jones and M. S. Handcock, Nature 423, 605 (2003).
[11] A. Walther, Acta Math. 48, 393 (1926).
[12] G. E. P. Box and G. C. Tiao, Bayesian Inference in Statistical Analysis (Addison-
Wesley Pub. Co., Reading, Mass., 1973).
[13] R. Albert and A.-L. Barabási, Rev. Mod. Phys. 74, 47 (2002).
[14] J. H. Jones and M. S. Handcock, Proc. R. Soc. Lond. B 270, 1123 (2003).
[15] K. Pearson, 50, 157 (1900).
[16] W. G. Cochran, Ann. Math. Stat. 23, 315 (1952).
[17] P. T. Nicholls, J. Am. Soc. Inform. Sci. 40, 379 (1989).
[18] A. N. Kolmogorov, G. Inst. Ital. Attuari 4, 77 (1933).
[19] M. L. Pao, Info. Proc. Man. 21, 305 (1985).
[20] W. J. Conover, Practical nonparametric statistics (Wiley, New York, 1999).
[21] H. W. Lilliefors, J. Am. Stat. Assoc. 62, 399 (1967).
[22] H. W. Lilliefors, J. Am. Stat. Assoc. 64, 387 (1969).
[23] A. J. Lotka, J. Wash. Acad. Sci. 16, 317 (1926).
[24] S. Naranan, J. Doc. 27, 83 (1971).

7 pages
Predição em Modelos de Tempo de Falha Acelerado Com Efeito Aleatório para Avaliação de Riscos de Falha - (JoaoBC)
No ratings yet
Predição em Modelos de Tempo de Falha Acelerado Com Efeito Aleatório para Avaliação de Riscos de Falha - (JoaoBC)
22 pages
Powerlaw ProgressDiffusion Hilbert
No ratings yet
Powerlaw ProgressDiffusion Hilbert
16 pages
Sta255 Week 11-2 Pre
No ratings yet
Sta255 Week 11-2 Pre
21 pages
Wk04 Machine Learning
No ratings yet
Wk04 Machine Learning
6 pages
Your Energy Bill
No ratings yet
Your Energy Bill
4 pages
Lec22 PDF
No ratings yet
Lec22 PDF
8 pages
Problem
No ratings yet
Problem
25 pages
Experiment 1
No ratings yet
Experiment 1
5 pages
Proceedings ISSI 2021 (Dragged)
No ratings yet
Proceedings ISSI 2021 (Dragged)
6 pages
Estimation of The Generalized Extreme-Value Distribution by The Method of Probability-Weighted Moments
No ratings yet
Estimation of The Generalized Extreme-Value Distribution by The Method of Probability-Weighted Moments
11 pages
Distribuiçao de Pareto
No ratings yet
Distribuiçao de Pareto
29 pages
Minka Gamma
No ratings yet
Minka Gamma
3 pages
A Comparison of Methods For The Estimation of Weibull Distribution Parameters
No ratings yet
A Comparison of Methods For The Estimation of Weibull Distribution Parameters
14 pages
Statistical Inference Based On Pooled Data: A Moment-Based Estimating Equation Approach
No ratings yet
Statistical Inference Based On Pooled Data: A Moment-Based Estimating Equation Approach
23 pages
A320 - 22 Auto Flight
No ratings yet
A320 - 22 Auto Flight
92 pages
Statistical+Inference+1 Shaw2007
No ratings yet
Statistical+Inference+1 Shaw2007
66 pages
Itt459 Individual Assignment
No ratings yet
Itt459 Individual Assignment
28 pages
A Pattern Is An Abstract Object, Such As A Set of Measurements Describing A Physical Object
No ratings yet
A Pattern Is An Abstract Object, Such As A Set of Measurements Describing A Physical Object
12 pages
Learning Models From Data: 1 Parametric Estimation
No ratings yet
Learning Models From Data: 1 Parametric Estimation
14 pages
ExamplesR Power Law
No ratings yet
ExamplesR Power Law
12 pages
Lecture 6 - Power Law Degree Distributions: February 7, 2008
No ratings yet
Lecture 6 - Power Law Degree Distributions: February 7, 2008
4 pages
Chapter 9 Bayesian Methods - Machine Learning For Factor Investing
No ratings yet
Chapter 9 Bayesian Methods - Machine Learning For Factor Investing
11 pages
Unit 2 - Machine Learning - WWW - Rgpvnotes.in PDF
No ratings yet
Unit 2 - Machine Learning - WWW - Rgpvnotes.in PDF
10 pages
11 Parameter Estimation
No ratings yet
11 Parameter Estimation
6 pages
Skyjack 4740 Parts Manual
No ratings yet
Skyjack 4740 Parts Manual
148 pages
Sales Force Techleap Ad PDF
No ratings yet
Sales Force Techleap Ad PDF
137 pages
Course Plan - Linux Lab
No ratings yet
Course Plan - Linux Lab
12 pages
Sectional Weights
No ratings yet
Sectional Weights
1 page
Remote Servicing Suite Security Manual Version A
No ratings yet
Remote Servicing Suite Security Manual Version A
45 pages
LM Chart Cast Alloys Aluminum
0% (1)
LM Chart Cast Alloys Aluminum
2 pages
Roller Chains Catalogue en Kettenwulf
No ratings yet
Roller Chains Catalogue en Kettenwulf
146 pages
Gate QP Ee Set-03 2015
No ratings yet
Gate QP Ee Set-03 2015
39 pages
Compliance Under Case-B'.: Notes
No ratings yet
Compliance Under Case-B'.: Notes
10 pages
Duck Creek Questions, Issues, Concerns v4
No ratings yet
Duck Creek Questions, Issues, Concerns v4
6 pages
ICT in Education
No ratings yet
ICT in Education
26 pages
Wravor Catalog en
No ratings yet
Wravor Catalog en
28 pages
CFS Families
No ratings yet
CFS Families
4 pages
Reservdelar x3m Ventil
No ratings yet
Reservdelar x3m Ventil
10 pages
Mock 3
No ratings yet
Mock 3
7 pages
Whitepaper: Decentralized Finance Global Smart AMM DEX Protocol
No ratings yet
Whitepaper: Decentralized Finance Global Smart AMM DEX Protocol
16 pages
Lesson 2 Introduction of Robot HAT
No ratings yet
Lesson 2 Introduction of Robot HAT
4 pages
XT4N 250 Ekip LS/I in 250A 4p F F
No ratings yet
XT4N 250 Ekip LS/I in 250A 4p F F
3 pages
RIL - List of Subsidiaries
No ratings yet
RIL - List of Subsidiaries
7 pages
Anonymous The Rocks of Bawn For Harp Clarinet 47437
No ratings yet
Anonymous The Rocks of Bawn For Harp Clarinet 47437
4 pages
Bitumen Emulsion Production Plant: Capacity: 10 M /H
No ratings yet
Bitumen Emulsion Production Plant: Capacity: 10 M /H
10 pages
Capacidades de Reabastecimento R1700K
No ratings yet
Capacidades de Reabastecimento R1700K
2 pages
Cover Letter Qatar
No ratings yet
Cover Letter Qatar
1 page
B11 - B12 - B13 - 0141 - MAT2002 - 100318 - Dr. Sheerin Kayenat - Fall 22-23 - TEE
No ratings yet
B11 - B12 - B13 - 0141 - MAT2002 - 100318 - Dr. Sheerin Kayenat - Fall 22-23 - TEE
2 pages
Gauss Nodes Revolution: Numerical Integration Theory Radically Simplified And Generalised
From Everand
Gauss Nodes Revolution: Numerical Integration Theory Radically Simplified And Generalised
Rob Porter
No ratings yet
Mathematical Optimization: Fundamentals and Applications
From Everand
Mathematical Optimization: Fundamentals and Applications
Fouad Sabry
No ratings yet
K Nearest Neighbor Algorithm: Fundamentals and Applications
From Everand
K Nearest Neighbor Algorithm: Fundamentals and Applications
Fouad Sabry
No ratings yet

Fitting To The Power-Law Distribution

Uploaded by

Fitting To The Power-Law Distribution

Uploaded by
