Benford's Law Exploration
Internal Assessment
Math Exploration
Benford's Law
Student: Caterina Rende Dominis
Class: 4N
Teacher: Jelena Gusi
Examination session: May 2014
Candidate number: fdt217 (000618-0027)
If we were to assume that first digits are distributed evenly among the digits 1 to 9 (for example, that there are as many numbers starting with 5 as there are numbers starting with 9), then the next few pages will acquaint the reader with a mathematical law that, to this day, is not fully understood. If we believe that the frequency, or better yet the probability, of a number starting with each of the digits 1 to 9 is on average an equal share of 1/9, we are far from the truth.
Over the years it has been observed that many sets of numbers that arise naturally (that is, numbers that are not tampered with or made up by humans) most often start with a 1 and least often with a 9. By the end of this exploration we will see how this law can be used for fraud detection in accounting and similar disciplines.
To get an idea of how first digits are distributed, the table below can be used as a reference for understanding Benford's Law.
P = probability, d = first digit in question
Figure 1: Relative probability of d (with bar chart)
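The values shown in Figure 1 follow from the formula P(d) = log(1 + 1/d) given in the Explanation section. A minimal Python sketch that regenerates them, next to the naive "equal shares" guess of 1/9 from the introduction, might look like this (the function name `benford_probability` is our own):

```python
import math


def benford_probability(d: int) -> float:
    """Probability that the first significant digit is d, using P(d) = log10(1 + 1/d)."""
    if not 1 <= d <= 9:
        raise ValueError("d must be a digit from 1 to 9")
    return math.log10(1 + 1 / d)


# Table of d, the Benford probability P(d), and the naive "equal shares" guess of 1/9.
print(f"{'d':>2} {'P(d)':>7} {'1/9':>7}")
for d in range(1, 10):
    print(f"{d:>2} {benford_probability(d):>7.1%} {1 / 9:>7.1%}")
```

The P(d) column starts at 30.1% for d = 1 and falls to 4.6% for d = 9, while the uniform column stays at 11.1% for every digit.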
This law has been quite controversial and surprising in the world of mathematics, as it cannot be fully explained. Some mathematicians claim that it should not be used to detect fraud because they believe it is unreliable, and that nobody should be convicted on the premise that accounting figures or election results do not match Benford's Law. Experiments are therefore needed to test which of the two views the results support.
The first recorded encounter with this law was by Simon Newcomb in 1881, who described the pattern but did not give a full explanation of it, treating it as a curious observation.
The law's rediscovery came thanks to Frank Benford, a research physicist at General Electric in the 1930s, after whom the law is named. While working, he needed to consult a book of logarithmic tables and noticed something rather odd: the first pages of the book were more worn than the last ones. From this he concluded that numbers starting with the digit 1 were looked up more often than numbers starting with any other digit.
After this discovery Benford started collecting further data in order to show how widespread the pattern actually was. His results were published in 1938 and covered more than 20,000 values taken from sources such as river lengths, numbers in magazine articles and sports statistics.
Explanation
Figure 2: Linear logarithmic scale
Source: http://www.thisisthegreenroom.com/wordpress/wp-content/uploads/2009/04/logs2.png
On a logarithmic scale, the powers of 10 (1, 10, 100, and so on) are equally spaced. To find the position of a number between 1 and 10, or between 10 and 100, and so on, we take its logarithm. An example makes this clearer: log 2 ≈ 0.301, which is the distance between 1 and 2 when the distance from 1 to 10 is taken as 1. This value is exactly the probability that a number starts with 1 according to Benford's Law.
From this we can deduce that the stretch of the scale between 1 and 2 is about 30.1% of the stretch between 1 and 10, just as the stretch between 10 and 20 is about 30.1% of that between 10 and 100, the stretch between 100 and 200 is about 30.1% of that between 100 and 1000, and so on.
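The claim that every decade is split in the same proportions can be checked numerically. This small sketch (the function name `share_of_decade` is ours) measures, on a base-10 logarithmic scale, what share of the stretch from 10^k to 10^(k+1) lies between d × 10^k and (d + 1) × 10^k:

```python
import math


def share_of_decade(d: int, k: int) -> float:
    """Share of the log-scale stretch from 10**k to 10**(k+1) taken up by d*10**k to (d+1)*10**k."""
    start, end = d * 10**k, (d + 1) * 10**k
    decade = math.log10(10 ** (k + 1)) - math.log10(10**k)  # equals 1 for every k
    return (math.log10(end) - math.log10(start)) / decade


# 1 to 2, 10 to 20, 100 to 200 and 1000 to 2000 all take the same share of their decade.
for k in range(4):
    print(f"{10**k} to {2 * 10**k}: {share_of_decade(1, k):.3f}")
```

Each line prints 0.301, which is log 2, regardless of which decade is used.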
This pattern shows that the difference between the logarithm of 2 and the logarithm of 1 gives exactly the frequency of the leading digit 1 in Benford's Law. Likewise, the difference between the logarithm of 3 and the logarithm of 2 gives the frequency of the leading digit 2, and so on, as the first few examples below show:
$\log_{10}(2) - \log_{10}(1) \approx 0.301$
$\log_{10}(3) - \log_{10}(2) \approx 0.176$
and so on, up to $\log_{10}(10) - \log_{10}(9) \approx 0.046$
With that in mind, we can write a formula for the probability that a number starts with a given digit, one that reproduces Benford's Law exactly.
If d is the leading digit in question, with $d \in \{1, 2, \dots, 9\}$, then we arrive at the following formula:
$P(d) = \log_{10}\left(1 + \frac{1}{d}\right)$
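In other words, this probability is equal to the difference between consecutive logarithms, because $\log_{10}\left(1 + \frac{1}{d}\right) = \log_{10}\left(\frac{d+1}{d}\right) = \log_{10}(d+1) - \log_{10}(d)$.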
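Because each probability is the difference between two consecutive logarithms (as in the examples log(2) - log(1) and log(3) - log(2) above), adding all nine probabilities makes the intermediate terms cancel. This confirms that the formula describes a complete probability distribution:

```latex
\sum_{d=1}^{9} P(d)
  = \sum_{d=1}^{9} \left[\log_{10}(d+1) - \log_{10}(d)\right]
  = \log_{10}(10) - \log_{10}(1)
  = 1 - 0 = 1
```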
Applying Benford's Law to Real-life Examples
Given what was concluded above, one might wonder which data sets this law actually applies to. My first experiment used the mathematics book we use in class every day, the Mathematics SL Coursebook. I counted the first digits of the numbers on exactly 10 pages (pages 18 to 27), and the results were very close to the values predicted by Benford's Law. I completed the experiment without the help of technology (as the scanned notes below show), which may have introduced some human error, so I repeated it 3 times; the results below are the average of the 3 trials:
Figure 3: Benford's Law in math book notes.
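The manual count of first digits can be automated. The sketch below uses hypothetical helper names and made-up sample values (not the textbook data); it finds the first non-zero digit of each number, so that 0.045 counts as a 4, and tallies the results:

```python
from collections import Counter
from decimal import Decimal
from typing import Iterable, Optional


def leading_digit(value) -> Optional[int]:
    """First non-zero digit of a number (so 0.045 -> 4 and -720 -> 7); None for zero."""
    # Decimal(str(value)) works from the number's printed form, so a float such as
    # 0.1 is read as 0.1 rather than its binary approximation; the sign is ignored.
    for digit in Decimal(str(value)).as_tuple().digits:
        if digit != 0:
            return digit
    return None


def first_digit_counts(values: Iterable) -> Counter:
    """Tally how often each digit 1-9 appears as a leading digit."""
    counts: Counter = Counter()
    for value in values:
        digit = leading_digit(value)
        if digit is not None:
            counts[digit] += 1
    return counts


# Made-up numbers standing in for values copied from the textbook pages.
sample = [12, 0.045, 1.5, 230, 19, 3.14, 7, 105, 1000, 42]
print(sorted(first_digit_counts(sample).items()))
```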
Fortunately, the following source describes a more accurate and less time-consuming way of applying Benford's Law to data sets using Microsoft Excel: http://www.theiia.org/intAuditor/media/files/Step-bystep_Instructions_for_Using_Benford's_Law[1].pdf
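The Excel steps from the source above are not reproduced here. As an analogous Python sketch, assuming the first-digit counts are already available as a dictionary (the function name and the counts are hypothetical), the two series plotted in the graphs below, Sample Rate and Benford Rate, can be computed like this:

```python
import math


def compare_with_benford(counts: dict) -> list:
    """Rows of (digit, sample rate, Benford rate): the two series plotted in the graphs."""
    total = sum(counts.get(d, 0) for d in range(1, 10))
    if total == 0:
        raise ValueError("no leading digits to compare")
    rows = []
    for d in range(1, 10):
        sample_rate = counts.get(d, 0) / total   # observed share of leading digit d
        benford_rate = math.log10(1 + 1 / d)     # share predicted by Benford's Law
        rows.append((d, sample_rate, benford_rate))
    return rows


# Hypothetical first-digit counts, not the river data from this exploration.
counts = {1: 31, 2: 17, 3: 13, 4: 9, 5: 8, 6: 7, 7: 6, 8: 5, 9: 4}
for d, sample_rate, benford_rate in compare_with_benford(counts):
    print(f"{d}: sample {sample_rate:6.1%}   Benford {benford_rate:6.1%}")
```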
In another experiment I looked at the lengths of rivers around the world. Using Microsoft Excel and the method described in the source above, I attempted to use river lengths to test Benford's Law further. Although this experiment has been done before with considerable success, my results were not what I expected.
The following graph compares the curve predicted by Benford's Law with the results I got from the sample data:
Figure 4: Graph comparing the first-digit rates of global river lengths with Benford's Law
Unfortunately, the data set was limited, as it included only the lengths of the 1000 longest rivers on the planet, so I suspected it might be unsuitable for this kind of experiment.
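One possible reason a restricted data set can break the pattern is that Benford's Law tends to hold for data spread over several orders of magnitude but not for data confined to a narrow band. A minimal simulation illustrates the difference; it uses randomly generated numbers over arbitrarily chosen ranges, not the river data, and the helper name is our own:

```python
import math
import random


def first_digit_rates(values) -> list:
    """Share of positive values whose leading digit is 1, 2, ..., 9."""
    counts = [0] * 10
    for v in values:
        # Scientific notation puts the leading digit first, e.g. 452.7 -> "4.527...e+02".
        counts[int(f"{v:.15e}"[0])] += 1
    return [counts[d] / len(values) for d in range(1, 10)]


random.seed(1)
# Illustrative data spread over six orders of magnitude (log-uniform from 1 to 1,000,000).
wide = [10 ** random.uniform(0, 6) for _ in range(100_000)]
# Illustrative data squeezed into a narrow band (uniform between 300 and 700).
narrow = [random.uniform(300, 700) for _ in range(100_000)]

benford = [math.log10(1 + 1 / d) for d in range(1, 10)]
series = [("wide", first_digit_rates(wide)), ("narrow", first_digit_rates(narrow)), ("Benford", benford)]
for name, rates in series:
    print(f"{name:>8}: " + " ".join(f"{r:.3f}" for r in rates))
```

The "wide" row comes out close to the Benford row, while the "narrow" row has no leading 1s at all, only 3s, 4s, 5s and 6s.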
For that reason I chose a data set that was restricted not by length, as in this case, but by territory, and repeated the same process with the lengths of rivers in Croatia. The following graph shows a striking similarity to the one above.
Figure 5: Graph comparing the first-digit rates of Croatian river lengths with Benford's Law (series: Sample Rate and Benford Rate; first digits 1 to 8 on the horizontal axis, 0% to 35% on the vertical axis)
Despite these anomalies, in the third trial, which used data on social media following, the results are much more in line with Benford's Law. Unlike the previous data sets, this kind of data is quite new and has rarely been used to study Benford's Law.
Figure 6: Graph comparing the first-digit rates of Twitter follower counts with Benford's Law
The Sample Rate curve follows the Benford Rate curve almost perfectly, which supports the validity of Benford's Law for this data set. Unfortunately, my first two trials were not as successful, even though the same kinds of data sets have been used successfully in the past. My guess is that there were some limitations in the data, or that my method of processing the data in Microsoft Excel was flawed, even though it worked well in the last trial.
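Saying that one curve follows another "almost perfectly" can be made more precise with a goodness-of-fit statistic. The sketch below uses Pearson's chi-square test, which is our choice of method rather than the one used in this exploration; 15.507 is the standard 5% critical value for 8 degrees of freedom (nine digits minus one). The counts are hypothetical. With very large samples, even small and practically unimportant deviations can come out as significant.

```python
import math

# 5% critical value of the chi-square distribution with 9 - 1 = 8 degrees of freedom.
CRITICAL_VALUE_5_PERCENT = 15.507


def benford_chi_square(counts: dict) -> float:
    """Pearson chi-square statistic for observed first-digit counts against Benford's Law."""
    n = sum(counts.get(d, 0) for d in range(1, 10))
    if n == 0:
        raise ValueError("no leading digits to test")
    statistic = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)  # count Benford's Law predicts for digit d
        statistic += (counts.get(d, 0) - expected) ** 2 / expected
    return statistic


# Hypothetical counts (not the data from this exploration): one close to Benford, one flat.
close = {1: 301, 2: 176, 3: 125, 4: 97, 5: 79, 6: 67, 7: 58, 8: 51, 9: 46}
flat = {d: 111 for d in range(1, 10)}
for name, counts in [("close", close), ("flat", flat)]:
    chi2 = benford_chi_square(counts)
    verdict = "significant" if chi2 > CRITICAL_VALUE_5_PERCENT else "no significant"
    print(f"{name}: chi-square = {chi2:.2f}, {verdict} deviation at the 5% level")
```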
In contrast to the Twitter Census data sets, there are some cases to which the law does not apply, such as human heights. The range is too restrictive: people are almost never more than 2 m tall, so adult heights measured in metres almost always start with 1 (or, rarely, 2).
Another case that does not follow the law is that of assigned numbers such as postal codes or ID numbers: these are created and assigned by institutions such as the government (in other words, made up and distributed by people), and do not occur naturally.
Conclusion
The relatively few people who know of Benford's Law commonly know it as the "fraud-detection law". It is precisely because the law is so little known that it works in that field. Since people who falsify results usually do not know about Benford's Law, they make up numbers that they consider most plausible (usually trying not to use the same digits too often and to spread them evenly). They do so thinking that this will avoid arousing other people's suspicions, while unknowingly giving themselves away.
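A toy simulation of this argument, under the simplifying assumption (ours, for illustration only) that a fabricator picks three-digit amounts with every value equally likely: the share of leading 1s comes out near 1/9, about 11%, instead of the roughly 30.1% that Benford's Law predicts.

```python
import math
import random

random.seed(2014)
# Hypothetical fabricator: picks 10,000 three-digit amounts, spread evenly from 100 to 999.
fabricated = [random.randint(100, 999) for _ in range(10_000)]
share_of_ones = sum(str(x).startswith("1") for x in fabricated) / len(fabricated)

print(f"Leading 1s in made-up data: {share_of_ones:.1%}")
print(f"Leading 1s under Benford:   {math.log10(2):.1%}")
```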
One of the most famous applications of Benford's Law today is its use in arguing that the 2009 Iranian presidential election, in which Mahmoud Ahmadinejad won with 62.63% of the vote, had been tampered with. The distribution of the first digits of the vote counts was not consistent with the law, so many mathematicians believed the results had been rigged. In the end these claims were not taken into account, as politics is not governed by rules and logic the way mathematics is, and Benford's Law was eventually declared inaccurate by Iranian mathematicians.
Overall, this exploration found evidence both for and against the accuracy of Benford's Law. Some data sets behaved just as Benford's Law predicts for naturally occurring numbers, but others did not. What inclines me to believe that Benford's Law holds are the experiments previously carried out by mathematicians, who found that the same kinds of data sets I used do follow the law. All in all, there is more evidence for the law being accurate than against it, but we still cannot be 100% certain.
References:
http://www.kirix.com/blog/2008/07/22/fun-and-fraud-detection-with-benfords-law/
http://ibmathsresources.com/2013/05/22/benfords-law-using-maths-to-catch-fraudsters/
http://t1.physik.tu-dortmund.de/kierfeld/teaching/CompPhys_09/benford_iran_0906.2789v1.pdf
http://en.wikipedia.org/wiki/Benford's_law
Digital Analysis Using Benford's Law by Mark J. Nigrini
http://www.benfords-law.com/
http://www.thisisthegreenroom.com/wordpress/wp-content/uploads/2009/04/logs2.png
http://www.khanacademy.org/math/trigonometry/exponential_and_logarithmic_func/logarithmic-scale-patterns/v/logarithmic-scale