Mathematics: Benford's Law and Other Surprising Distributions
Introduction
Benford’s law is a surprising mathematical concept which at first seems rather counter-intuitive.
It describes the distribution of the leading digits in a large set of data. A simple example displays
its initial peculiarity. Imagine we look at the current share price of every company on the FTSE
350, an index of the 350 largest UK companies. Within this set of data, the first digit of each share
price can be any number between 1 and 9, i.e. \( d \in \{1, \ldots, 9\} \). The average person would
believe that each price has an equal chance of starting with each of these digits, so if one of the
350 prices were selected at random, the probability that the first digit was 1 would be about
\( \frac{1}{9} \) (11.1%) and the probability of the first digit being 9 would also be about
\( \frac{1}{9} \) (11.1%). However, this is in fact not the case at all. If you were to calculate this,
the probability of the first digit being 1 would actually be closer to 30% than 11%. Furthermore, the
probability of the leading digit being 9 would only be just over 4%!

I initially read about this strange distribution in an economics context. I was keen to investigate
the mathematics behind it and to test its limits and applications. The physicist Frank Benford
described the pattern in 1938 (the astronomer Simon Newcomb had reported the same observation in
1881) when he noticed that pages closer to the beginning of his log tables were more worn than those
closer to the end, meaning that people were looking up numbers starting with 1 much more often than
numbers starting with higher digits. Benford went on to test this theory across newspapers,
populations and river lengths. He found more or less the same result every time: numbers starting
with 1 turned up approximately 30% of the time. Eventually, this was formed into a mathematical law
with an equation which gives the exact probability of each number between 1 and 9 occurring as the
leading digit.

The aim of this investigation is to explore the explanations and applications behind Benford’s Law,
to touch upon other equally strange distributions and to examine whether they link to Benford’s law
in any way. NB: For the purposes of this exploration, all logarithms are base 10 if the base is not
stated.
Before discussing the law’s explanations and applications, it is first useful to understand how
leading digits are studied in mathematics and why they matter. Scientific notation (also known as
standard form) is a pivotal part of this. In this notation a positive number \( x \) is expressed in
the form
\[ x = S(x) \times 10^{n} \]
in which
\[ 1 \le S(x) < 10 \]
and \( n \) is an integer, meaning that the number is written as a value of 1 or greater but less
than 10, multiplied by the power of 10 needed to reach the original number. For example, in this
format the number 865,000,000 would be expressed as
\[ 8.65 \times 10^{8} \]
The value before the power of ten is known as the significand [1], and the leading digit of
\( x \) is simply the whole-number part of \( S(x) \). This system allows numbers spanning many
different magnitudes to be expressed in a similar fashion, for example comparing atomic radii to
planetary radii.
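As a rough illustration of the notation above, the sketch below splits a positive number into its significand and exponent. The function names `scientific_notation` and `leading_digit` are my own labels for this example, and reading the two parts from Python's exponent formatting is just one convenient way to do it, not the only one.

```python
def scientific_notation(x):
    """Split a positive number x into (S(x), n) with x = S(x) * 10**n and 1 <= S(x) < 10."""
    if x <= 0:
        raise ValueError("x must be positive")
    # Python's 'e' format always writes one non-zero digit before the decimal point,
    # so the mantissa it prints is the significand.
    mantissa, exponent = f"{x:.16e}".split("e")
    return float(mantissa), int(exponent)


def leading_digit(x):
    """Return the first significant digit of a positive number."""
    significand, _ = scientific_notation(x)
    return int(significand)


significand, exponent = scientific_notation(865_000_000)
print(significand, exponent)   # 8.65 8
print(leading_digit(0.00042))  # 4
```

The same helper works for very small and very large values, which is exactly why scientific notation is convenient when studying leading digits.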
After his experiments, Benford found the approximate probability of each number occurring as the
leading digit. The pattern is shown in the graph below [2]:
[Graph: probability of each digit from 1 to 9 occurring as the leading digit]
For the purposes of explaining the law and deriving its equation, the leading digit between 1 and 9
is represented as \( d \), and the probability of that digit being the leading digit is represented
as \( P(d) \).
The basic explanation of the law states that \( P(d) \) is proportional to the space between \( d \)
and \( d+1 \) on a logarithmic scale. A logarithmic scale is one which is non-linear [3] and based on
orders of magnitude, meaning each step along the scale corresponds to multiplying the previous value
by a constant.
By understanding logarithmic scales we can begin to better understand how the percentages in
Benford’s law are derived. When we are working with many values spanning multiple orders of
magnitude, as Benford’s law does, the basic explanation states that the leading digit of a number
\( x \) will be 1 when
\[ \log 1 \le \log S(x) < \log 2 \]
and similarly, the leading digit will be 9 when
\[ \log 9 \le \log S(x) < \log 10 \]
On a linear scale the difference between 2 and 1 would be equal to the difference between 10
and 9. However, on a logarithmic scale the differences are as follows:
\[ \log 2 - \log 1 = 0.301 \]
\[ \log 10 - \log 9 = 0.0458 \]
If we apply this to all the numbers between 1 and 9 the results are as follows:

Logarithmic Interval    Difference of Interval
log 2 - log 1           0.301
log 3 - log 2           0.176
log 4 - log 3           0.125
log 5 - log 4           0.0969
log 6 - log 5           0.0792
log 7 - log 6           0.0669
log 8 - log 7           0.0580
log 9 - log 8           0.0512
log 10 - log 9          0.0458

These log calculations are in fact the probabilities of each number from 1 to 9 occurring as the
leading digit! This can be seen on the graph of the results below, which follows the exact same
pattern as the graph shown above.
From this we can see that the probability \( P(d) \) is given by the log of the digit subtracted
from the log of the digit plus one, i.e.:
\[ P(d) = \log_{10}(d+1) - \log_{10}(d) = \log_{10}\left(\frac{d+1}{d}\right) \]
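The formula can be checked numerically with a few lines of code. This is only a small sketch; the function name `benford_probability` is my own choice for illustration.

```python
import math


def benford_probability(d):
    """Probability that d (1 to 9) is the leading digit: log10((d + 1) / d)."""
    if not 1 <= d <= 9:
        raise ValueError("the leading digit must be between 1 and 9")
    return math.log10((d + 1) / d)


probabilities = {d: benford_probability(d) for d in range(1, 10)}

for d, p in probabilities.items():
    print(f"P({d}) = {p:.4f}")

# The nine probabilities add up to 1 because the sum telescopes:
# (log 2 - log 1) + (log 3 - log 2) + ... + (log 10 - log 9) = log 10 - log 1 = 1
print(sum(probabilities.values()))
```

The telescoping sum in the final comment is a useful sanity check: it confirms that the nine values really do form a probability distribution.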
Aside from the initial explanation of the law and the derivation of the equation, there are more
detailed explanations and perspectives on the law and how it works. One of these is the Geometric
Explanation [1]. This approach follows the idea that a number growing at a constant rate will spend
a greater amount of time ‘hanging around’ the lower leading digits than the higher ones. To better
explain this, I will refer back to an economically minded example of a geometric sequence, compound
interest. A geometric sequence is one in which there is a constant ratio \( r \) between each term
\( u_n \) and the next term \( u_{n+1} \) [4]. Therefore the general rule is
\( u_n = u_1 r^{n-1} \).

For this compound interest example, let us assume I invest $2000 in 2019 for my retirement in a very
generous savings account with an annual 7% interest rate for the long term of 60 years. This is
modelled by the equation
\[ u_n = 2000 \times 1.07^{n} \]
Note the absence of the subtraction of 1 from \( n \) in the exponent. This is because we want the
balance after interest has been added at the end of each year, so \( n \) counts completed years and
the initial $2000 corresponds to \( n = 0 \). This model shows that the balance in the savings
account at the end of the 60 years, in dollars, will be
\[ 2000 \times 1.07^{60} = 115{,}892.85 \]
However, we are more interested in where the balance lies at the end of each year over the whole
period rather than just at the end. See the appendices for the full balance sheet at the end of each
year. When we examine this table from a Benford perspective, we can see that the leading digit does
indeed linger on low values and then moves quickly through the higher ones. For example, the balance
stays between $10,000 and $20,000, with a first digit of 1, for ten consecutive year-end balances
(2043 to 2052), whereas it starts with a 9 for only two (2075 and 2076). The table below, counted
from the appendix, illustrates this for balances between $10,000 and $99,999.

Leading digit    Years            Number of year-end balances
1                2043 to 2052     10
2                2053 to 2058     6
3                2059 to 2062     4
4                2063 to 2066     4
5                2067 to 2068     2
6                2069 to 2071     3
7                2072 to 2073     2
8                2074             1
9                2075 to 2076     2
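The year-by-year behaviour described above can be reproduced with a short simulation. This is a sketch under a simplifying assumption: it compounds exactly 7% once per year from a starting balance of $2000, so its year-end figures will not match the appendix to the cent (the appendix does not begin at exactly 2000 × 1.07). The names `PRINCIPAL`, `RATE`, `YEARS` and `leading_digit` are my own.

```python
from collections import Counter

PRINCIPAL = 2000.0  # initial investment in dollars
RATE = 0.07         # annual interest rate
YEARS = 60          # length of the investment


def leading_digit(x):
    """First significant digit of a positive number, read from its exponent format."""
    return int(f"{x:.16e}"[0])


# Balance at the end of year n is PRINCIPAL * (1 + RATE) ** n, for n = 1 .. YEARS.
balances = [PRINCIPAL * (1 + RATE) ** n for n in range(1, YEARS + 1)]

# Count how many year-end balances start with each digit.
digit_counts = Counter(leading_digit(b) for b in balances)

for d in range(1, 10):
    print(d, digit_counts[d], "#" * digit_counts[d])
```

The printed bars make the ‘hanging around’ effect visible: a balance growing at a constant rate needs to double to get from a leading 1 to a leading 2, but only needs to grow by about 11% to get from a leading 9 to the next power of ten.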
Scale Invariance
Another aspect of Benford’s Law which adds to its uniqueness is its universality. What I mean by
this is that if a data set follows Benford’s law, it will tend to continue to follow the law when it
is rescaled, that is, when every value is multiplied by the same constant. For example, if I took
the data set used in the previous explanation, the list of the investment balance year upon year,
and converted it into every commonly used currency in the world, from the Euro to the Pound and the
Vietnamese Dong, the chances are the data would continue to satisfy Benford’s law in almost every
single currency. Since each value in the list has the same conversion factor applied to it, the data
still spans just as many orders of magnitude, which is the main condition for Benford’s Law to
apply.
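Scale invariance can be tested numerically. The sketch below is my own illustration, not part of the original investigation: it takes a very long sequence growing by 7% per step, multiplies it by a few conversion factors, and compares the leading-digit shares with Benford's law. The conversion factors are made-up round numbers, not real exchange rates, and the leading digits are found from the fractional part of the logarithm so that the very large values never have to be stored.

```python
import math

GROWTH = 1.07    # constant growth factor per step, as in the savings example
TERMS = 100_000  # a long sequence, so the digit shares settle down

# Hypothetical conversion factors for illustration only (not real exchange rates).
SCALES = {"USD": 1.0, "EUR": 0.9, "GBP": 0.8, "VND": 25_000.0}

BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit_shares(scale):
    """Share of each leading digit in scale * GROWTH**n for n = 0 .. TERMS - 1."""
    step = math.log10(GROWTH)
    offset = math.log10(scale)
    counts = {d: 0 for d in range(1, 10)}
    for n in range(TERMS):
        # log10 of the value is offset + n * step; its fractional part fixes the significand.
        fraction = (offset + n * step) % 1.0
        digit = min(int(10 ** fraction), 9)  # guard against rounding up to 10
        counts[digit] += 1
    return {d: counts[d] / TERMS for d in range(1, 10)}


for currency, scale in SCALES.items():
    shares = first_digit_shares(scale)
    worst = max(abs(shares[d] - BENFORD[d]) for d in range(1, 10))
    print(currency, f"largest gap from Benford: {worst:.4f}")
```

If the law is scale invariant, the largest gap should stay small for every conversion factor, which is what this sketch is designed to check.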
Another aspect of Benford’s Law is that it can be extended to further digits rather than just the
first digit of the number [5]. It is possible to calculate the probability of a digit occurring as
the 2nd or 3rd digit. To do this we must extend the equation into a sum written in sigma notation,
which allows us to express a series of additions compactly. If we have a digit \( d \) between 0 and
9 (NB: zero can now be included, as it cannot be the first digit of a number but it can certainly be
a following digit) then the probability that \( d \) will be the nth digit of a number, for
\( n \ge 2 \), is given by:
\[ \sum_{x=10^{n-2}}^{10^{n-1}-1} \log_{10}\left(1 + \frac{1}{10x + d}\right) \]
in which \( d \) represents a digit between 0 and 9, \( n \) is the position of the digit whose
probability we want, and \( x \) runs over every possible block of digits that can come before that
position. However, this is only particularly useful up to the 3rd digit, as beyond the 3rd digit the
probabilities follow a more expected distribution and tend towards each digit appearing 10% of the
time, i.e. a uniform distribution.
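The later-digit formula is easy to evaluate directly. In this sketch the function name `nth_digit_probability` is my own, and the summation runs from 10^(n-2) up to and including 10^(n-1) - 1, which is the standard form of the generalised law for the second digit onwards.

```python
import math


def nth_digit_probability(d, n):
    """Probability that digit d appears in position n of a number under Benford's law."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n == 1:
        if not 1 <= d <= 9:
            raise ValueError("the first digit must be between 1 and 9")
        return math.log10(1 + 1 / d)
    if not 0 <= d <= 9:
        raise ValueError("later digits must be between 0 and 9")
    lower = 10 ** (n - 2)
    upper = 10 ** (n - 1)  # range() excludes this value, so the last term is upper - 1
    return sum(math.log10(1 + 1 / (10 * x + d)) for x in range(lower, upper))


for n in (2, 3, 4):
    row = [f"{nth_digit_probability(d, n):.4f}" for d in range(10)]
    print(f"digit position {n}:", " ".join(row))
```

Printing positions 2, 3 and 4 side by side shows how quickly the probabilities flatten out towards 10% each.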
Benford’s law has one major application which makes it particularly useful: fraud detection.
Because Benford’s law appears in so many naturally occurring sets of numbers, a large set of data
which does not follow Benford’s Law can be flagged as possibly fraudulent, particularly financial
data. Programs which test for compliance with Benford’s Law are often used by tax authorities or
banks during audits, or to check whether data submitted to them is possibly fraudulent. Benford’s
Law was also used in analysing possible fraud in the 2009 Iranian election [6].
This raises the question of whether it is moral to use mathematical laws in legal proceedings or as
evidence in prosecutions. This debate is even more pressing when there is a degree of uncertainty
within the law, or limitations to the law, as will be discussed below.
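To give an idea of what such a compliance test might look like, the sketch below compares observed leading-digit counts with the counts Benford's law predicts, using a chi-square statistic. This is my own simplified illustration and not the method of any particular tax authority or auditing program. The threshold 15.507 is the standard 5% critical value for a chi-square distribution with 8 degrees of freedom, and both sample data sets are invented for the example.

```python
import math
from collections import Counter

# 5% critical value of the chi-square distribution with 8 degrees of freedom.
CHI2_CRITICAL_5PCT_DF8 = 15.507


def leading_digit(x):
    """First significant digit of a positive number, read from its exponent format."""
    return int(f"{x:.16e}"[0])


def benford_chi_square(values):
    """Chi-square statistic comparing the leading digits of values with Benford's law."""
    counts = Counter(leading_digit(v) for v in values)
    total = len(values)
    statistic = 0.0
    for d in range(1, 10):
        expected = total * math.log10(1 + 1 / d)
        statistic += (counts[d] - expected) ** 2 / expected
    return statistic


def looks_suspicious(values):
    """Flag the data set if its leading digits differ significantly from Benford's law."""
    return benford_chi_square(values) > CHI2_CRITICAL_5PCT_DF8


# Invented examples: balances growing at 7% per step, and the whole numbers 100 to 999,
# in which every leading digit is equally common.
growing_balances = [2000 * 1.07 ** n for n in range(1, 301)]
evenly_spread = list(range(100, 1000))

print(looks_suspicious(growing_balances))  # False
print(looks_suspicious(evenly_spread))     # True
```

A flag from a test like this is only a reason to look more closely at the data, not proof of fraud, which ties in with the limitations of the law.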
Not every set of data will follow Benford’s law, for example telephone numbers, human height in
meters or feet, and page numbers of small documents. Benford’s law also does not apply to data which
is chosen or assigned by people, or which is confined to a specific range. The chance of Benford’s
Law being useful depends heavily on how many orders of magnitude the data set spans. For example,
human height in meters or feet doesn’t follow the law as it only spans about one order of magnitude.
In meters almost all heights will start with a 1, possibly with a few that start with 2 or are less
than 1. The same applies if human height is measured in feet: there would have to be a human over 3
meters tall in order to cross the 10 ft boundary into the next order of magnitude! Spanning many
orders of magnitude is not enough on its own, however. For example, Benford’s law does not hold for
the whole numbers from 1 to 999,999: each digit from 1 to 9 is the leading digit of exactly 111,111
of them, so every leading digit is equally likely.
Benford’s law is surprisingly not alone in its strangeness. Contrary to what one may think after
reading about the uniqueness of Benford’s law, there are a few other patterns and principles which
appear in many different areas of life. Some of these have mathematical patterns which could link to
Benford’s law. One of these is Zipf’s Law, which relates to language and literature rather than
numerical data. Zipf’s Law states that in a large set of words, the second most frequent word will
appear about half as often as the most frequent word, and the third most frequent word will appear
about one third as often as the most frequent word. Essentially, the frequency of a word is inversely
proportional to its rank in the frequency table. For example, the most common word in the English
language, ‘the’, accounts for about 7% of all words and appears twice as often as the second most
common word, ‘of’, which accounts for about 3.5% of all words. An equation for Zipf’s law has been
created in the context of the English language, which states that in a language of \( X \) words, the
frequency of the word with rank \( k \) is:
\[ \frac{1/k}{\sum_{n=1}^{X} 1/n} \]
in which \( X \) is the number of words in the English language and \( k \) is the word’s rank in
order of how common it is. Some have argued that Benford’s law is simply a special case of Zipf’s
law; however, I personally believe they should be held as separate laws. Zipf’s law could better be
considered as literature’s version of Benford’s law.
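The rank-frequency rule can be written out in a few lines. In this sketch the function name `zipf_frequency` and the vocabulary size of 10,000 words are assumptions made purely for illustration; the predicted share of the top word depends on the vocabulary size chosen.

```python
def zipf_frequency(rank, vocabulary_size):
    """Predicted share of all words taken by the word of a given rank: (1/k) / sum(1/n)."""
    if not 1 <= rank <= vocabulary_size:
        raise ValueError("rank must be between 1 and the vocabulary size")
    harmonic_sum = sum(1 / n for n in range(1, vocabulary_size + 1))
    return (1 / rank) / harmonic_sum


VOCABULARY_SIZE = 10_000  # hypothetical vocabulary size, chosen only for this example

for rank in (1, 2, 3, 10):
    print(rank, f"{zipf_frequency(rank, VOCABULARY_SIZE):.4f}")

# The ratio between ranks does not depend on the vocabulary size:
# rank 1 is twice as frequent as rank 2 and three times as frequent as rank 3.
print(zipf_frequency(1, VOCABULARY_SIZE) / zipf_frequency(2, VOCABULARY_SIZE))
print(zipf_frequency(1, VOCABULARY_SIZE) / zipf_frequency(3, VOCABULARY_SIZE))
```

Dividing by the sum of 1/n is what turns the simple 1/k pattern into proper shares that add up to 1 across the whole vocabulary.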
Conclusion
Overall, Benford’s Law is deeply rooted in the way numbers are distributed in the real world and its
useful applications cannot be denied. The law, which at first seems strange and unexplainable, can
indeed be explained and analysed, as I have demonstrated throughout this investigative report. The
geometric analysis behind Benford’s Law is key to its explanation. Understanding Benford’s Law is
extremely useful to me as a student deeply interested in the field of economics and finance. I had
always been curious about how institutions such as HMRC are able to detect fraud and prosecute those
who evade tax or commit fraudulent actions. Through conducting this exploration, I have been able to
gain a greater understanding of mathematics while also being able to explore this economic aspect of
fraud detection. I now also have a greater understanding of how mathematics can connect with other
fields: even literature, which the average person might say is the ‘furthest you can get from
mathematics’, is seen to have a mathematical distribution through Zipf’s Law. This exploration
continues to demonstrate how mathematics is rooted in every part of life, even if we cannot notice
it at first.
Bibliography:
[1] http://assets.press.princeton.edu/chapters/s10527.pdf
[2] https://www.isaca.org/Journal/archives/2011/Volume-3/Pages/Understanding-and-Applying-Benfords-Law.aspx?utm_referrer=
[4] Fannon, P. (2012). Mathematics for the IB Diploma Standard Level. Cambridge: Cambridge
University Press. p. 155
[5] https://digitalcommons.calpoly.edu/cgi/viewcontent.cgi?referer=&httpsredir=1&article=1041&context=rgp_rsr
[6] https://physicsworld.com/a/benfords-law-and-the-iranian-e/
[7] https://aclweb.org/anthology/W98-1218
Appendices:
Year Balance
2019 $2,057.18
2020 $2,201.19
2021 $2,355.27
2022 $2,520.14
2023 $2,696.55
2024 $2,885.31
2025 $3,087.28
2026 $3,303.39
2027 $3,534.63
2028 $3,782.05
2029 $4,046.79
2030 $4,330.07
2031 $4,633.17
2032 $4,957.50
2033 $5,304.52
2034 $5,675.84
2035 $6,073.15
2036 $6,498.27
2037 $6,953.14
2038 $7,439.86
2039 $7,960.65
2040 $8,517.90
2041 $9,114.15
2042 $9,752.14
2043 $10,434.79
2044 $11,165.23
2045 $11,946.80
2046 $12,783.07
2047 $13,677.89
2048 $14,635.34
2049 $15,659.81
2050 $16,756.00
2051 $17,928.92
2052 $19,183.94
2053 $20,526.82
2054 $21,963.70
2055 $23,501.16
2056 $25,146.24
2057 $26,906.47
2058 $28,789.93
2059 $30,805.22
2060 $32,961.59
2061 $35,268.90
2062 $37,737.72
2063 $40,379.36
2064 $43,205.92
2065 $46,230.33
2066 $49,466.45
2067 $52,929.11
2068 $56,634.14
2069 $60,598.53
2070 $64,840.43
2071 $69,379.26
2072 $74,235.81
2073 $79,432.32
2074 $84,992.58
2075 $90,942.06
2076 $97,308.00
2077 $104,119.56
2078 $111,407.93
2079 $115,892.85