M A Thesis Anon PDF
William Reginald
1 Introduction
Bitcoin was developed as a proposed solution to the double spending problem faced by all
cash systems. Previously, this problem necessitated a central authority to ensure all transactions
were sent from accounts with sufficient balances, in order to prevent individuals from
paying for multiple transactions, nearly simultaneously, with the same money. Nakamoto’s
security of the cash system (Nakamoto 2009). By aligning the incentives of users, software
developers and mining nodes who maintain the network, Nakamoto has created a global
community connected via a currency supported without a sovereign state. Ironically, the
paper was first published online shortly after the beginning of the 2008 Financial Crisis and
when the public’s trust in traditional financial institutions began to dwindle. Bitcoin has
since gained interest from many adopters, for a variety of use cases spanning remittances,
Bitcoin recently experienced another meteoric price increase as its market capitalization
grew to over $300B USD, surpassing the traditional financial institution Bank of America and
nearly J.P. Morgan too¹. Subsequently, its price decreased substantially and as of July 2018
remains around $7000 USD. This aggregate volatility is reflected by short-term volatility,
as on many days throughout the sample period the highest and lowest prices at which Bitcoin
sold differed by over $500 or roughly 5%. In general, cryptocurrencies have been regarded
as more volatile than other historical risky financial instruments, such as internet stocks
1 https://coinmarketcap.com/currencies/bitcoin/
throughout the late 1990s. The characteristics that could give Bitcoin underlying value,
such as user and merchant adoption rates, energy costs of production, the number of active
wallets, positive regulatory news, innovative use cases etc., do not vary frequently enough to
justify the rapid daily swings in price. Indeed, these short-term swings led Yermack (2013)
to conclude Bitcoin does not satisfy the unit of account criterion, implying it is a speculative
asset rather than a currency. The first known commercial transaction using Bitcoin exemplifies
this: in 2010 someone purchased two pizzas for approximately 10,000 BTC, valued then around $30,
which would be worth around $70,000,000 in July 2018 and over $200,000,000 in December 2017².
This paper aims to study the association between the actions of the richest addresses
and the short-term price volatility of Bitcoin. A common theme in the cryptocurrency
community is that the big players, informally known as Whales, are the characters often
behind price fluctuations. Many early adopters of Bitcoin acquired large quantities at low
prices throughout 2009 to 2012 and became immensely wealthy as the price recently skyrocketed.
Consequently, I propose there are two transmission channels through which their activity can
affect price:
• Direct: Taking large positions, enduring short-run losses, to influence the price in their
preferred direction (i.e. a large player with a bullish outlook could market-buy a large
quantity of Bitcoin, over-paying in the short-run to increase their overall wealth if the
price increases).
2 https://www.coindesk.com/he-paid-how-much-coindesk-releases-bitcoin-pizza-day-price-tracker/
I develop an empirical model that relates the actions of the largest players, visible via
Bitcoin’s blockchain, to price volatility to capture these effects. The main finding is that it is
the players below the very top of the distribution that have the highest degree of association
with increases in price volatility. Further, higher volatility is not monotonically associated
with more activity, as the busiest addresses do not necessarily have the largest, positive
coefficients. Rather, the timing of the activity seems to be more important, indicating that
brains may trump brawn for large players’ signals to reach the broader market. These re-
sults are robust to several measures of volatility, sources of price data and time frequencies
of activity. The baseline empirical procedure yields many interesting extensions and possible
The paper proceeds as follows: section two provides a background to Bitcoin, section
three presents the dataset and summary statistics, section four outlines the empirical model
and baseline results, section five discusses possible extensions and section six concludes.
2 Background
In simple terms, Bitcoin comprises several key components: the blockchain technology,
the Proof-of-Work (PoW) cryptographic process and the mining network (Nakamoto 2009).
The blockchain is a decentralized public ledger containing a record of all transactions and
balances between Bitcoin addresses. In a traditional banking setting, this would be equiva-
lent to the records of each account’s past transactions and balances. An address is simply
a 26-35 character alphanumeric identifier, analogous to an email address where the inbox of emails
was to reduce the trust and reliance the public places on traditional financial institutions.
growing dataset, that is universally agreed upon and updated by the network. In the ag-
gregate, every node has a copy of the same data and trust in Bitcoin has been effectively
decentralized.
In order to maintain trust and align the incentives to maintain the network, Nakamoto im-
plemented the PoW process. Using cryptography, the existing blockchain can be repeatedly
updated and verified for correctness, in an efficient manner, yielding a virtually tamper-proof
system. The PoW process involves an effectively injective (collision-resistant) and non-invertible hashing function, which
takes the existing blockchain and a proposed new block of transactions, that have been suc-
cessfully encrypted, as inputs. The function outputs an updated blockchain ready for the
more complex, but this cursory explanation is sufficient to understand the role cryptography
plays in Bitcoin.
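The puzzle at the heart of PoW can be illustrated with a toy nonce search. This is a minimal sketch rather than Bitcoin's actual implementation: it hashes a text string instead of a binary block header, and the function names `mine` and `verify`, the header layout and the `difficulty_bits` parameter are assumptions made for illustration. It does use double SHA-256, the hash Bitcoin applies to block headers.

```python
import hashlib


def block_hash(prev_hash, transactions, nonce):
    """Double SHA-256 of a toy header string (hypothetical layout)."""
    header = f"{prev_hash}|{transactions}|{nonce}".encode()
    return hashlib.sha256(hashlib.sha256(header).digest()).hexdigest()


def mine(prev_hash, transactions, difficulty_bits):
    """Try nonces in order until the digest falls below the target.

    A larger difficulty_bits means a smaller target and more expected work.
    """
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = block_hash(prev_hash, transactions, nonce)
        if int(digest, 16) < target:
            return nonce, digest
        nonce += 1


def verify(prev_hash, transactions, nonce, difficulty_bits):
    """Checking a solution costs one hash, however long the search took."""
    target = 1 << (256 - difficulty_bits)
    return int(block_hash(prev_hash, transactions, nonce), 16) < target


# Toy difficulty: roughly 2**12 attempts are expected before a solution is found.
nonce, digest = mine("0" * 64, "alice->bob:1", difficulty_bits=12)
```

The asymmetry between `mine` (many hashes) and `verify` (one hash) is what lets every node cheaply confirm work that was expensive to produce.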
Thirdly, the mining network is a set of globally distributed nodes, that each compete
to find the solution to the hashing function, thereby officially recording the new block of
transactions (Nakamoto 2009). One only requires a computer with good internet access and
enough space to store the 150GB blockchain dataset to become a node in the network³. The
node that first solves the computationally-intensive cryptography problem processes the new
3 Download Bitcoin’s core protocol here: https://bitcoin.org/en/download.
block, appends it to the updated blockchain, earns the fees associated with the transactions
recorded in the new block and also earns the newly minted Bitcoin. The latter compensation
method aligns the incentives of the miners and simultaneously forms an ingenious solution
for controlling inflation, from the perspective of Bitcoin as a currency system. The PoW
process can monitor the network capacity, known as the Hash Rate, and scale the degree
of computational difficulty. This ensures blocks are not created too quickly, which would
result in new Bitcoin entering circulation too fast as well. It is this feature that I view
akin to central banks controlling the money supply of their currencies. I find this structure
asset. See Berentsen and Schar (2018) for a comprehensive and approachable background to
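The difficulty scaling described above can be sketched as a simple retargeting rule. The constants below (a ten-minute block target, an adjustment every 2016 blocks, and a factor-of-four cap on any single adjustment) are Bitcoin's protocol parameters; the function name `retarget` is an assumption, and the sketch omits protocol details such as the compact target encoding and the global maximum target.

```python
TARGET_BLOCK_SECONDS = 600   # one block every ten minutes
RETARGET_INTERVAL = 2016     # blocks between difficulty adjustments
EXPECTED_TIMESPAN = TARGET_BLOCK_SECONDS * RETARGET_INTERVAL  # two weeks


def retarget(old_target, actual_timespan):
    """Return the new hash target after one adjustment window.

    A smaller target is harder to hit. If the last 2016 blocks arrived
    faster than expected, the target shrinks and mining gets harder.
    """
    # Limit each adjustment to a factor of four in either direction.
    lower = EXPECTED_TIMESPAN // 4
    upper = EXPECTED_TIMESPAN * 4
    clamped = max(lower, min(actual_timespan, upper))
    return old_target * clamped // EXPECTED_TIMESPAN
```

For example, if hash rate doubles and the window completes in one week rather than two, `retarget` halves the target, which restores the ten-minute average and therefore the rate at which new Bitcoin enters circulation.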
The most relevant features of Bitcoin are: price volatility, availability of data on the public
ledger and the limited supply of Bitcoin in circulation. Figure One plots two measures of
Bitcoin’s price volatility, throughout the sample period of July 2017 through June 2018. As
seen below, the volatility peaks in the December through January period, which corresponds
to the meteoric rise and fall of Bitcoin’s price. There are many instances where the daily
volatility exceeds $500 and occasionally $1000. During the most chaotic period, this daily
volatility often exceeded 10% of the lowest price of the day. The price volatility is treated
more rigorously in Section 3.3 but the primary takeaway is that Bitcoin’s price has been
Blockexplorers allow one to search the blockchain instantly for each address and those
related by transaction history, as well as their balance history. The pseudo-anonymity
feature arises because each address is publicly visible but there is no direct connection to who
owns the address. Software like blockexplorers imbues the community with a higher degree
of connectivity than in traditional banking settings, which supports my argument that the
actions of large players can propagate faster and induce a higher degree of price volatility.
The website Bitinfocharts.com uses software to query addresses in the blockchain by balance
size and provides an income distribution that is constantly updated as blocks are added.
Many other publicly available data sources cover characteristics of Bitcoin relevant to other
literatures, and this ought to motivate further academic interest.
For example, there are data on the aggregate mining capacity of Bitcoin and the distribution
sub-divided by mining pool4 . This data could be relevant for an empirical industrial organi-
zation study of the mining network, its degree of competitiveness and the return of investing
international finance and monetary policy can be explored via the available data.
Moreover, Bitcoin was designed such that the maximum amount that can enter the money
supply via mining is 21,000,000 BTC. There are already over 17,500,000 BTC in circulation,
implying that over time much less Bitcoin enters the money supply with
each new block successfully appended (Nakamoto 2009, Blockchain.info). Since the income
distribution is heavily skewed towards the top, there is relatively much less Bitcoin entering
the circulating supply. Further, Bitcoin costs much more than the prices faced by early
4 https://www.blockchain.com/en/pools
adopters, which implies the barriers to moving up the distribution are quite high. These
features support the argument that the actions of the early adopters, who have largely become
today’s large holders, bear significant weight in influencing Bitcoin’s price fluctuations. As
an example, amidst the price decline of spring 2018 there were concerns of a large holder in
Japan taking 16,000 BTC out of cold storage⁵. Roughly, this is currency that was effectively
taken out of circulation in Bitcoin’s early stages and is being reintroduced now, effectively in-
creasing the money supply by a non-trivial quantity and likely resulting in negative pressure
on its price.
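The 21,000,000 BTC cap follows from the issuance schedule rather than from an explicit limit. A minimal sketch, assuming the protocol's published parameters (a 50 BTC initial block subsidy that halves every 210,000 blocks, with amounts tracked in integer satoshis); the function names are illustrative:

```python
SATOSHIS_PER_BTC = 100_000_000
HALVING_INTERVAL = 210_000                  # blocks between subsidy halvings
INITIAL_SUBSIDY = 50 * SATOSHIS_PER_BTC     # 50 BTC, expressed in satoshis


def block_subsidy(height):
    """Newly minted satoshis awarded to the miner of the block at `height`."""
    halvings = height // HALVING_INTERVAL
    if halvings >= 64:
        return 0
    return INITIAL_SUBSIDY >> halvings      # integer halving rounds down


def max_supply():
    """Total satoshis that can ever be mined under this schedule."""
    total = 0
    subsidy = INITIAL_SUBSIDY
    while subsidy > 0:
        total += subsidy * HALVING_INTERVAL
        subsidy >>= 1
    return total


# Subsidy at the block height used for the distribution snapshot in this paper.
snapshot_subsidy_btc = block_subsidy(530_063) / SATOSHIS_PER_BTC
supply_cap_btc = max_supply() / SATOSHIS_PER_BTC
```

Because the subsidy halves geometrically, the sum converges just below 21,000,000 BTC, and each successive block adds a smaller share to the existing supply.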
Yermack (2013) provides preliminary academic analysis into viewing Bitcoin as an asset
or currency. His work also gives perspective on how far cryptocurrencies have developed
in modern markets. Yermack concluded that Bitcoin failed to satisfy the three criteria of
currencies: medium of exchange, unit of account and store of value. This analysis was
based on a recent hack of the most prominent exchange, Mt. Gox, and the observation
that most other exchanges experienced low volume and liquidity. Berentsen and Schar
(2018) provide an updated outlook and risk analysis, concluding that, despite high price volatility,
cryptocurrencies are a new financial asset class and that the Blockchain technology will likely
have future implications beyond Bitcoin. Together, these contrasting papers highlight the
progression cryptocurrencies have made towards mainstream adoption. While each criterion
of being a currency has not been entirely met, Bitcoin, and the cryptocurrency ecosystem
5 https://cryptoslate.com/tokyo-whale-prepares-8000-btc-dump-as-crypto-bloodbath-continues/
as a whole, have made immense strides forward.
As a cash system, Bitcoin and other decentralized payment networks have many implica-
tions for central banking and monetary economics. Chiu and Koeppl (2017) develop a general
equilibrium monetary model of Bitcoin and find that the resource intensive Proof-of-Work
process yields a welfare loss of consumption. The authors argue that a prominent alterna-
tive, the Proof-of-Stake (PoS) process could be implemented by Bitcoin to utilize resources
more efficiently. Indeed, the annual amount of electricity consumed by the Bitcoin network
is comparable to that of Denmark and larger than that of Greece⁶. This is due to the computationally
difficult task miners undertake when finding the solution to the hashing function. Specialized
mining rigs, known as Application-Specific Integrated Circuits (ASICs), have been developed
to reduce the immense energy consumption used in the mining process. Alternatively, PoS
proposes that the weight a participant carries in validating transactions be proportional to
that participant’s holdings. Proponents of PoW argue that the energy consumption trade-off
should be favoured over PoS, as PoS concentrates the consensus mechanism at the top of the
income distribution. Consequently, smaller users and adopters are at the mercy of trusting
these large holders to maintain the network, rather than PoW relying on competition despite
high energy consumption. Moreover, Raskin and Yermack (2016) study the implications of
cryptocurrencies from the perspective of payments systems and conclude that the technol-
ogy is still yet to be fully developed for other uses. To summarize, central banks seem to
Budish (2018) develops conditions to study the mining incentives and the susceptibility
of the Bitcoin network to a majority attack. This occurs when more than 50% of the network’s
computing power is controlled by one group and an alternate version of the Blockchain, with
as multiple recipients will be waiting for the same funds to arrive. The double-spending
to Bitcoin Gold in May 2018, the 26th largest cryptocurrency at the time⁷. Other smaller
cryptocurrencies that have experienced a majority attack in the past year are Verge and
Krypton. With that said, Bitcoin has a much deeper mining network and estimates suggest
that one would need over $6.8 billion in mining equipment and would have to cover daily
electricity consumption of 93 million kWh⁸. Together, these immense costs align the incentives of the
attacker to maintain the network and preserve the value of Bitcoin, rather than manipulating
the blockchain to steal Bitcoin and cause the community to devalue it.
Furthermore, a key feature of the decentralized mining network is the consensus mecha-
nism, where nodes of the network agree on a common version of the blockchain and compete
to update that copy. Satoshi describes the mechanism’s design as, “nodes can leave and
rejoin the network at will, accepting the proof-of-work chain as proof of what happened
while they were gone. They vote with their CPU power, expressing their acceptance of valid
blocks by working on extending them and rejecting invalid blocks by refusing to work on
them. Any needed rules and incentives can be enforced with this consensus mechanism”
7 http://fortune.com/2018/05/29/bitcoin-gold-hack/
8 https://gobitcoin.io/tools/cost-51-attack/
(Nakamoto 2009). Catalini and Gans (2018) view this consensus mechanism as a new mar-
ket design solution, as it facilitates transactions without a central platform operator. The
authors explain that the costs of verification and networking are reduced by this market
design and conclude that Bitcoin has the opportunity to advance the current approaches
to data, privacy and ownership. I argue that the consensus mechanism, and the associated
culture of information transmission, amplify the activity signals of the big players and their
Moreover, Athey et al. (2016) study the market for Bitcoin, its price determination
and usage from the consumers’ perspective. The authors conduct an empirical analysis of
the blockchain, using heuristics to match multiple addresses to the same wallet. Further,
the authors develop a model of the Bitcoin/USD exchange rate, equivalent to price, as
proportional to the ratio of volume to effective supply in circulation. The price implied by
their model matches some of the overarching trends of Bitcoin’s actual price but does not
capture any of the short-term volatility, shown in Figure Two. Two key implications of their
results are:
• Big players with dormant addresses⁹ could introduce a large quantity of Bitcoin into
the existing circulating supply. This would dramatically reduce velocity and increase
supply, putting downward pressure on the price. This mechanism is consistent with
both channels described above, as merely a transfer from a dormant address could
indicate the circulating supply will soon increase.
• One must incorporate other features into their model to account for the short-term
volatility in Bitcoin’s price.
9 A dormant address is an address that contains a large quantity of Bitcoin, but has no recent transaction
history. The owner of the address could be holding long-term or have lost access, implying those Bitcoin are
effectively removed from the money supply, having the opposite effect to that described.
Together, these observations motivate studying high-frequency data and formulating a
hypothesis, such as the activity of the big players, to capture the high degree of variability.
Zimmerman (2018) is a recent working paper also aimed at addressing Bitcoin’s short-
term price volatility. Zimmerman studies the fees paid by users for each transaction and
posits “a speculator with a stronger signal about future price will pay a higher fee in order
to trade more rapidly on the private information. This means that more extreme signals
are incorporated into the price more quickly, causing price volatility” (Zimmerman 2018).
The author develops a theoretical model with speculators and transactors, yielding the em-
pirical prediction: greater (less) demand for blockchain space reduces (increases) blockchain
capacity and raises (lowers) fees for transactions to be confirmed on the blockchain, causing
price volatility to be increased (decreased). In the data, this would be observed as periods
with blocks using the highest amount of space being associated with high levels of price
volatility. The preliminary results are ironically the opposite; the coefficient on block size is
both negative and statistically significant¹⁰. With that said, Zimmerman uses quite a long
sample, July 2010 through March 2018, during which Bitcoin evolved enormously and this
Zimmerman and I have consistent mechanisms, as the large players are likely to pay
relatively much higher fees for their transactions. This is because fees are priced in levels and
do not scale with the size of the transaction. Hence, as transactions and holdings increase,
fees are a relatively smaller share of the user’s portfolio. Thus, if large holders’ activity
is driving the price volatility and they are also likely to be paying relatively higher fees,
10 That is, blocks with a larger size or higher usage were actually associated with lower price volatility.
then Zimmerman’s model may capture the same causal channel. Since blocks are processed
and appended to the blockchain approximately every ten minutes, the daily average used in
Zimmerman’s model is likely a noisy measure that obscures the results too. Two innovations
I employed to improve precision are using hourly data, which does not suppress the short-
term volatility like daily price measures, and studying a more relevant sample period. The
This research project was originally motivated by a desire to extend the analysis of Gandel et al.
(2018), aiming to address a related question using modern data. The authors examine the
cryptocurrency exchange Mt. Gox, which experienced a hack and subsequent decline in
2014. The exchange’s data was dumped, yielding a matched dataset of 18 million buy and
sell transactions, occurring between April 2011 and November 2013. This dataset includes
trading activity that is settled at the exchange level rather than being mined and recorded
on the blockchain, as well as User ID numbers. The latter is crucial, as this is a feature of
the exchange and allows the authors to match transactions to the same actor¹¹. The authors
aim to study the suspicious trading activity of two players and its association with Bitcoin’s
price movements. The paper finds a large increase in the price of Bitcoin on the days each
player was active. The average daily price volatility was approximately 1% if neither player
11 Using only blockchain data, in order to link transactions one must employ heuristics, such as those used
in Athey et al. (2016). Further, these heuristics use probabilistic matching, whereas matching on User ID
results in near-perfect matches.
was active and increased by 4% for each player, a result that is both economically and
Two things to bear in mind are that the Mt. Gox exchange comprised 80% of the total
Bitcoin transaction volume and the total market capitalization of Bitcoin was around $1
billion. Together, these features imply large actions on Mt. Gox could materialize directly
into price changes. In the modern Bitcoin market, there are many exchanges and no exchange
is responsible for a large share of the global volume. Moreover, due to exchange arbitrage, the
relative pressure from all other exchanges prevents any single exchange from deviating
far from the global average. Consequently, I was motivated to look at the trading activity of
the largest players in the aggregate, to operationalize the long-held theory that large players
The primary research question I study is whether there is a high degree of association between
Bitcoin’s price volatility and the activity of big players. The income distribution for Bitcoin
is highly skewed: less than 0.1% of all addresses contain over
20% of the existing money supply. Consequently, the largest holders may influence the price
in two ways: directly, by putting upward or downward pressure via large trading positions
and indirectly, by signalling to the market that a large quantity of Bitcoin has been moved. In
the latter case, this could be moved from cold storage or another dormant wallet, thereby
increasing the circulating supply of Bitcoin and creating downward pressure on the price.
This paper makes three contributions to the cryptocurrency literature. First, the dataset
of transactions for the wealthiest addresses is publicly accessible and readily available for
related analysis. Moreover, I have extended the analysis of Gandel et al. (2018) to more
recent data and utilized additional modelling specifications, namely multiple time frequencies
and dummy variables for many actors. Lastly, I have progressed the empirical literature on
3 Dataset
Taking an alternative approach to Zimmerman (2018) and Gandel et al. (2018), by using
higher frequency data, I developed an hourly time series of the price of Bitcoin in U.S.
computed a simple average of the hourly high and low prices across four major exchanges:
Kraken, Bitfinex, Coinbase and Bitstamp¹². These exchanges accounted for roughly 60% of
the total Bitcoin volume during the sample period¹³. Further, due to the deep and liquid
Bitcoin market, the arbitrage opportunities across exchanges are relatively low and hence
these measures are likely quite representative of the hourly high and low prices of Bitcoin.
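The cross-exchange averaging described above can be sketched as follows. The data layout (a mapping from exchange name to that hour's high and low) and the function name `hourly_high_low` are assumptions for illustration, and the prices are made-up numbers rather than observations from the dataset.

```python
from statistics import mean

EXCHANGES = ("Kraken", "Bitfinex", "Coinbase", "Bitstamp")


def hourly_high_low(quotes):
    """Average the hourly high and the hourly low across exchanges.

    quotes: dict mapping exchange name -> (high, low) for a single hour.
    Returns (PriceHigh_h, PriceLow_h) as simple, unweighted averages.
    """
    highs = [quotes[name][0] for name in EXCHANGES]
    lows = [quotes[name][1] for name in EXCHANGES]
    return mean(highs), mean(lows)


# Illustrative prices only.
example_hour = {
    "Kraken": (7010.0, 6950.0),
    "Bitfinex": (7020.0, 6960.0),
    "Coinbase": (7000.0, 6940.0),
    "Bitstamp": (7030.0, 6970.0),
}
price_high, price_low = hourly_high_low(example_hour)
```

An unweighted mean treats each exchange equally; weighting by traded volume would be a natural alternative but is not what the text describes.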
12 That is, the highest and lowest price on each exchange $e$ in period $h$, $PriceHigh_{e,h}$ and $PriceLow_{e,h}$.
13 https://data.bitcoinity.org/markets/volume/
3.2 Income Distribution and Transaction Data
The blockchain contains the complete record of balances and transactions between all
public addresses. Consequently, one can study the distribution of Bitcoin across addresses,
as well as track the activity and balances of addresses across time. Figure Three shows the
income distribution for the 73 largest holders included in the empirical analysis.
Note that users can make a new address for each transaction, so the bottom of the distri-
bution is likely over-stated. Further, the income distribution changes over time, as there are
very low barriers to transferring large quantities of Bitcoin. All income distribution data
is as of July 1st, 2018 and specifically block number 530063. In section 5.2, I discuss the
concept of a Continuous Income Distribution that links addresses that transacted a majority
of their balances together, to build a much richer transaction history. For example, the sixth
proximately $500,000,000 on July 1st, 2018 and has no previous transaction history. The
richer transaction history and would be more representative in the analysis. This example
highlights that the results may be sensitive to the income distribution, as it changes over
time and can suppress previous activity. The sample period used in this analysis would in-
clude the former address, while future work would observe an income distribution containing
Moreover, using a blockexplorer, the individual transaction data was extracted for each
address in the blockchain containing over 10,000 BTC in balance. Each address corresponds
to a position in the distribution and the $N$th richest address corresponds to a dummy variable
defined as:
$$
Player_{N,h} =
\begin{cases}
1, & \text{if transaction occurs in hour } h \\
0, & \text{otherwise}
\end{cases}
\tag{2}
$$
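A minimal sketch of how such an hourly activity dummy could be built from an address's transaction list. The `(timestamp, amount)` layout, the function names and the sample transactions are hypothetical; the `min_btc` parameter mirrors the 10 BTC minimum transaction size applied in this section.

```python
from datetime import datetime, timedelta

MIN_BTC = 10.0  # minimum transaction size counted as a signal


def floor_to_hour(ts):
    """Drop minutes, seconds and microseconds from a timestamp."""
    return ts.replace(minute=0, second=0, microsecond=0)


def player_dummy(transactions, hours, min_btc=MIN_BTC):
    """Build the 0/1 activity series for one address.

    transactions: iterable of (timestamp, amount_btc) pairs (assumed layout)
    hours: ordered list of hour-start datetimes covering the sample
    """
    active_hours = {
        floor_to_hour(ts)
        for ts, amount in transactions
        if abs(amount) >= min_btc
    }
    return [1 if h in active_hours else 0 for h in hours]


hours = [datetime(2017, 7, 1) + timedelta(hours=i) for i in range(4)]
txs = [
    (datetime(2017, 7, 1, 0, 15), 250.0),    # counted in hour 0
    (datetime(2017, 7, 1, 2, 40), 3.5),      # ignored: below the threshold
    (datetime(2017, 7, 1, 3, 5), -1200.0),   # outgoing transfer, counted in hour 3
]
dummy = player_dummy(txs, hours)
```

The dummy records only whether the address was active in an hour, not how much it moved, so a single 10 BTC transfer and a 10,000 BTC transfer are treated identically.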
Transactions recorded were limited to a 10 BTC minimum¹⁴, as signals from big players likely
need to be of moderate size. Outlier addresses with large quantities of transactions¹⁵ were
excluded, as there were too many observations concentrated on a few players. Further work
The resulting dataset contains hourly observations for 73 active addresses, accounting for
approximately twelve percent of all Bitcoin in circulation¹⁶, across the sample period of July
1st, 2017 through June 30th, 2018. There are 120 addresses with balances over 10,000 BTC
but the excluded addresses were not active over the sample period. Many addresses acquired
Bitcoin in its early stages and have remained dormant for five years or longer. As previously
argued, this feature reduces the circulating supply and likely amplifies the effects of the
actions of the large players. The key variable of interest, price volatility, was computed two
14 As of July 2018, this was worth approximately $65,000 USD.
15 Typically exceeding 300, but some addresses had over 10,000 transactions.
16 According to bitinfocharts.com; this share is of the entire supply. The addresses contain a much larger share of
the circulating supply when lost Bitcoin and dormant wallets are excluded.
ways and in both levels and percent. The intra-hourly measure, equation (3), captures the
direct effect on price volatility from the actions of the big players, while the inter-hourly
measure, equation (4), aims to capture the indirect market signalling effect of the big players’ activity.
$$
PriceVol1_h = \frac{PriceHigh_h - PriceLow_h}{PriceLow_h} \tag{3}
$$
$$
PriceVol2_h = \frac{PriceHigh_h - PriceHigh_{h-1}}{PriceHigh_{h-1}} \tag{4}
$$
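Both volatility measures can be computed directly from the hourly high and low series. The sketch below is illustrative: the function names and the `percent` switch (returning the numerator alone for the levels version) are assumptions, and the prices are invented for the example.

```python
def intra_hourly(high, low, percent=True):
    """High-low spread within an hour, in percent of the low or in levels."""
    spread = high - low
    return spread / low if percent else spread


def inter_hourly(high, prev_high, percent=True):
    """Change in the hourly high relative to the previous hour."""
    change = high - prev_high
    return change / prev_high if percent else change


def volatility_series(highs, lows, percent=True):
    """Return (PriceVol1, PriceVol2) aligned with the input hours.

    The first inter-hourly value is None because no previous hour exists.
    """
    vol1 = [intra_hourly(h, l, percent) for h, l in zip(highs, lows)]
    vol2 = [None] + [
        inter_hourly(highs[i], highs[i - 1], percent)
        for i in range(1, len(highs))
    ]
    return vol1, vol2


# Illustrative prices only.
highs = [10100.0, 10302.0, 10200.0]
lows = [10000.0, 10100.0, 10000.0]
vol1, vol2 = volatility_series(highs, lows)
vol1_levels, vol2_levels = volatility_series(highs, lows, percent=False)
```

As written, the inter-hourly measure is signed: it is negative whenever the hourly high falls relative to the previous hour.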
Table One summarizes the activity for each of the big players, as well as their balances
and other characteristics. As seen below, it is not the largest holders but rather the medium
ones that are the most active. Interestingly, several addresses with similar balances tend to
act in the same fashion, indicating they could belong to the same holder. Further analysis
akin to Athey et al. (2016) could be conducted to match some addresses to the same owner.
Figure 1: Two measures, introduced in Section 3.3, of the hourly price volatility in $USD of
Bitcoin through the sample period of July 2017 through June 2018. The numbers correspond
to the hourly observation at the beginning of each month, so the hourly observation number
4321 corresponds to December 1st.
Figure 2: Plots the price of Bitcoin from 2012-2015 and two results of the velocity models
from Athey et al. (2016).
Figure 3: The distribution of Bitcoin across the 73 richest addresses, as of July 1st, 2018.
Figure 4: Table containing the position, address name, number of transactions in the same
period, balance in Bitcoin and USD, as of July 1st, 2018.
4 Empirical Model and Results
4.1 Specification
The baseline model studied was inspired by Gandel et al. (2018) and the model specifi-
Their model includes controls for hacking events, which were not statistically significant and
are less relevant in the recent data. Both Zimmerman (2018) and this model relate a measure
of price volatility to a constant, capturing average price volatility, and a key variable that
encapsulates their respective empirical hypothesis. These models highlight the relatively
coefficients in equation (5) represent the additional price volatility conditional on days the
$$
PriceVolJ_h = \gamma_0 + \sum_{k=1}^{73} \gamma_k \, Player_{k,h} + \zeta_h \tag{6}
$$
where $\gamma_0$ represents the average hourly price volatility, conditional on none of the big players
being active. Under the empirical hypothesis, $\gamma_0$ should be positive and relatively low. Further,
each $\gamma_k$ represents the differential hourly price volatility conditional on the hourly observations
in which player $k$ is active; likewise each should be positive and significant. As evident in Figure
Three, the income distribution has a group structure, at different levels of address balances.
Equation (7) aims at capturing the association at the group level. The 6th, 25th, 45th and 73rd
mutually exclusive group cut-offs were selected based on similar balance size, and robustness
checks around other cutoffs did not meaningfully change the results.
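The interpretation of the constant and the dummy coefficients can be checked in the simplest case of a single player. With one 0/1 regressor, OLS returns the mean volatility in inactive hours as the constant and the difference in means as the coefficient. The sketch below uses invented numbers and a hypothetical function name; equation (6) itself has 73 dummies and would require a multivariate OLS routine, where overlapping activity means the coefficients are no longer simple differences in means.

```python
from statistics import mean


def ols_single_dummy(y, d):
    """OLS of y on a constant and one 0/1 regressor d.

    Returns (gamma0, gamma1) using the textbook simple-regression formulas.
    """
    d_bar, y_bar = mean(d), mean(y)
    cov = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
    var = sum((di - d_bar) ** 2 for di in d)
    gamma1 = cov / var
    gamma0 = y_bar - gamma1 * d_bar
    return gamma0, gamma1


# Invented hourly volatility in dollars; the player is active in the last two hours.
vol = [100.0, 120.0, 140.0, 400.0, 500.0]
active = [0, 0, 0, 1, 1]
gamma0, gamma1 = ols_single_dummy(vol, active)

inactive_mean = mean(v for v, a in zip(vol, active) if a == 0)
active_mean = mean(v for v, a in zip(vol, active) if a == 1)
```

Here `gamma0` coincides with `inactive_mean` and `gamma0 + gamma1` with `active_mean`, which is the sense in which each coefficient measures differential volatility relative to the no-activity baseline.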
4.2 Results
Figure Five contains the regression results for the hourly individual address data, across
four measures of price volatility. For the sake of brevity, I have omitted addresses whose
coefficients were not statistically significant. The general result is that addresses that are
significant tend to be significant across all measures of volatility. Secondly, the intra-hourly
measure in levels, shown in the second and seventh columns, has the most statistically
significant coefficients. The signs of the coefficients are stable across the two specifications, further
suggesting accuracy in the results. Lastly, the magnitudes of the intra-hourly volatility tend
to be larger than the inter-hourly, suggesting that the markets tended to fluctuate significantly
within an hour but to trend in the same direction over time. This is because the
latter measure compares high prices across hours and lower volatility implies the high prices
Figure 5: Regression estimates for the hourly frequency data, at the individual address level.
Regarding the signs of coefficients, surprisingly the largest addresses all have negative
coefficients. Generally, most addresses have moderately sized, negative coefficients.
Interestingly, some addresses with large positive coefficients are 24, 27, 40, 41, 50 and most
notably, 62. The constant in the first specification is 128, implying that the average hourly
difference between the high and low price sold across the prominent exchanges was
$128, conditional on none of the large players being active. Under the empirical hypothesis,
this can be viewed as the average price fluctuation absent any signals from the large players’
trading activity. Considering that the price in levels varied immensely across the sample
and on average was approximately $10,000, this equates to around 1% and is in line with
the estimates in Gandel et al. (2018). Each coefficient is interpreted as the differential price
volatility, conditional on the hours the address was active. For example, the coefficient on
address 50 implies that on average the high and low price varied by $460, an additional $330
compared to the baseline with no large player active. Similarly, the coefficient in the eighth column, 0.0497,
implies that the average additional price volatility on the days the 50th actor was active was
Figure Six plots all 73 coefficients, with colours corresponding to the level of statistical
significance. Green indicates that the coefficient is significant at the 1% level, yellow at
5% and red at 10%. Viewing the results as a whole, there are not many large, green, positive
bars. This is a further indication that perhaps a smaller subset of the largest players’ activity
Figure Seven contains the regression output corresponding to equation (7), the group
specification. The same themes highlighted above are broadly present. The richest addresses
maintain negative coefficients and the middle groups tend to have the largest positive
coefficients. One interpretation is that the largest addresses have consistently negative
coefficients due to lower price elasticity. Because these addresses are the wealthiest, they
are the least sensitive to price changes and are associated with lower price volatility.
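The level interpretation of the hourly estimates can be checked with a short calculation. The
sketch below uses only the rounded figures quoted in the text (a constant of 128, an
additional $330 for address 50 and an average price of roughly $10,000). The variable names
are illustrative assumptions and are not taken from the estimation code.

```python
# Rounded figures quoted in the text; variable names are illustrative.
constant = 128.0          # average hourly high-low range (USD), no large address active
coef_address_50 = 330.0   # additional range (USD) in hours when address 50 is active
avg_price = 10_000.0      # approximate average price level over the sample (USD)

# Implied high-low range in the hours when address 50 is active.
range_when_active = constant + coef_address_50

# Baseline range expressed as a share of the average price level.
baseline_share = constant / avg_price

print(f"Range when address 50 is active: ${range_when_active:.0f}")
print(f"Baseline range as a share of price: {baseline_share:.2%}")
```

The implied range of about $458 matches the roughly $460 reported above, and the baseline
share of about 1.3% corresponds to the "around 1%" figure.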
4.3 Robustness
For several robustness checks, I aggregated the transaction and price data to the 6-hourly
and daily time frequencies. Each address dummy variable now equals one if the address was
active during any of the hourly observations within each six-hour observation. The daily
transaction data were defined the same way, and the price volatility measures were simply
averaged for each new observation. Figure Eight contains the individual address estimates
at the 6-hourly frequency, while Figures Nine and Ten contain the group estimates at the
6-hourly and daily frequencies, respectively. The main observations are broadly consistent
across these alternate specifications. The coefficients tend to be smaller in magnitude,
while maintaining their sign and degree of statistical significance. One interesting
observation is that address 62 is highly positive and significant at the hourly frequency
but less so at the coarser time frequencies, across all measures of price volatility.
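The aggregation rule described above (a dummy equals one if the address was active in any
hour of the block, and volatility is averaged within the block) can be sketched as follows.
This is a minimal illustration on toy data, not the code used for the estimates. The field
names "hour", "vol" and "active" and the address labels are hypothetical.

```python
from collections import defaultdict

def aggregate(hourly_rows, window=6):
    """Collapse hourly observations into blocks of `window` consecutive hours.

    Each row is a dict with hypothetical keys:
      "hour"   - integer hour index starting at 0
      "vol"    - hourly price volatility measure
      "active" - dict mapping an address label to a 0/1 activity dummy
    """
    blocks = defaultdict(list)
    for row in hourly_rows:
        blocks[row["hour"] // window].append(row)

    aggregated = []
    for block_id in sorted(blocks):
        rows = blocks[block_id]
        addresses = sorted({a for r in rows for a in r["active"]})
        aggregated.append({
            "block": block_id,
            # Volatility is averaged over the hours in the block.
            "vol": sum(r["vol"] for r in rows) / len(rows),
            # Dummy equals one if the address was active in any hour of the block.
            "active": {a: int(any(r["active"].get(a, 0) for r in rows))
                       for a in addresses},
        })
    return aggregated

# Toy data: 12 hours, with the hypothetical address "addr_50" active only in hour 7.
hourly = [
    {"hour": h, "vol": float(100 + 10 * h),
     "active": {"addr_50": int(h == 7), "addr_62": 0}}
    for h in range(12)
]
six_hourly = aggregate(hourly, window=6)
```

Setting `window=24` gives the daily aggregation under the same rule.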
Another robustness check used alternate daily price data17 to compare the magnitudes to
those from the aggregated data. Most coefficients maintained their sign and significance,
but gained magnitude. I interpreted this as evidence that averaging the volatility measures
suppresses the price volatility.
17
https://www.coindesk.com/price/
Figure 6: A bar graph plotting the coefficients corresponding to columns two and seven of
Figure Five. The dependent variable is PriceVol1, the intrahourly volatility measure in levels.
Figure 7: The regression output corresponding to equation (7), aggregated into groups at
the hourly time frequency.
Figure 8: Regression estimates for the 6-hourly frequency data, at the individual address
level.
Figure 9: The regression output corresponding to equation (7), aggregated into groups at
the 6-hourly time frequency.
Figure 10: The group regression output aggregated at the daily time frequency.
5 Discussion
A common theme across time frequencies is the trade-off between the frequency of
observations and the level of aggregation of addresses. For example, one can only obtain
group estimates when the data is aggregated to the daily level. As most addresses are active
throughout a significant portion of the sample period, the data is not granular enough at
the daily level. Including all 73 addresses with only 365 daily observations induces
multicollinearity across regressors and yields little statistical significance. When
extending similar analysis to a much broader set of addresses, one should pay attention to
the balance between the average number of observations per address and the total number of
time observations. On a related note, care should be exercised when aggregating the price
data. The trade-off is between accurately capturing the short-term volatility representative
of the time frame and allowing time for the transaction activity signals to dissipate.
Further analysis employing alternate measures of price volatility, perhaps using the
logarithm approach in Zimmerman (2018), ought to be conducted too.
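The balance between observations per address and total time observations can be inspected
before estimation. The sketch below is a rough diagnostic under stated assumptions, not part
of the original analysis. It computes the share of periods in which each address is active
and flags dummies that are almost always one or almost always zero, since these carry little
independent variation. The 0.95 threshold and the address labels are arbitrary choices made
for illustration.

```python
def activity_shares(dummies):
    """Share of periods in which each address is active.

    dummies: dict mapping an address label to a list of 0/1 values, one per period.
    """
    return {addr: sum(vals) / len(vals) for addr, vals in dummies.items()}

def flag_near_constant(dummies, threshold=0.95):
    """Addresses whose dummy is one (or zero) in at least `threshold` of periods."""
    shares = activity_shares(dummies)
    return sorted(a for a, s in shares.items()
                  if s >= threshold or s <= 1 - threshold)

# Daily observations per address regressor quoted in the text: 365 days, 73 addresses.
obs_per_regressor = 365 / 73

# Toy data over 20 periods with hypothetical address labels.
toy = {
    "addr_a": [1] * 20,            # active in every period
    "addr_b": [1] * 5 + [0] * 15,  # active in a quarter of periods
    "addr_c": [0] * 20,            # never active
}
shares = activity_shares(toy)
flagged = flag_near_constant(toy)
```

With only about five daily observations per address regressor, dummies that are active on
most days leave little variation with which to separate individual effects.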
Regarding the transactions data, each dummy variable was coded to include any transaction
over 10 BTC. Consequently, the current specification does not account for varying degrees
of activity. This is relevant because a larger transaction has a larger effect on the price
directly and also indirectly, in the form of a larger signal. Yet, this model would view a com-
34,000 BTC, roughly equal to $250,000,000, by 1MAhRt279uYmVC1dUxKR6dWwEULBJT34Nh
on May 18th. On a similar note, multiple transactions within the same hourly period are
regarded together as a single observation. Although an actor may have reason to send more
than one large transaction, this feature of the data is also unaccounted for under the
current specification. Further work is required to develop transaction activity measures that account
Unlike Gandel et al. (2018), who had access to the user IDs in the Mt. Gox exchange
dataset, I have no clear way to link multiple addresses to an individual. As a result, the
empirical analysis was conducted under the assumption that each address is not directly
connected to another. When studying a much broader set of addresses, using heuristics as in
Athey et al. (2016) to match addresses is vital. Further, one should consider that including
many addresses can result in both sides of a transaction being included in the data, yielding
another source of multicollinearity. This was likely not a significant problem in this
analysis, as only 73 addresses were included and most addresses transacted with a much
broader set of excluded addresses.
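To illustrate what address matching could look like, the sketch below implements one widely
used heuristic: addresses that appear together as inputs of the same transaction are treated
as belonging to one entity. This is an assumption made for illustration and is not
necessarily the heuristic applied by Athey et al. (2016). The transactions are toy data with
placeholder labels, and the grouping uses a standard union-find structure.

```python
def cluster_addresses(transactions):
    """Group addresses that ever appear together as inputs of one transaction.

    transactions: iterable of lists, each holding the input addresses of one
    transaction (toy labels here). Returns a sorted list of sorted clusters.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for inputs in transactions:
        for addr in inputs:
            find(addr)                 # register every address, even singletons
        for other in inputs[1:]:
            union(inputs[0], other)    # co-spent inputs share an owner

    clusters = {}
    for addr in list(parent):
        clusters.setdefault(find(addr), []).append(addr)
    return sorted(sorted(members) for members in clusters.values())

# Toy example: A and B are co-spent, B and C are co-spent, so A, B and C merge.
toy_transactions = [["A", "B"], ["B", "C"], ["D"], ["E", "F"]]
clusters = cluster_addresses(toy_transactions)
print(clusters)
```

Each resulting cluster could then replace its member addresses with a single activity dummy,
which would also reduce the risk of both sides of a transaction entering as separate
regressors.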
Lastly, an important set of transactions not observed in this analysis consists of
Over-The-Counter (OTC) transactions. These transactions involve two or more parties privately
agreeing to transfer Bitcoin for fiat currencies or other cryptocurrencies18. OTC
transactions occur off the blockchain and beyond the view of the econometrician. This feature
of the data is akin to traditional financial markets, particularly derivatives markets. The
prime consequence of interest is poor price discovery, which likely adds to the overall price
volatility.
18
Bitcoin is often paired with the USD, EUR, JPY, KRW and CHF fiat currencies, as well as those
of many other countries. It is also the most common currency pair for over 1000 alternate
cryptocurrencies, along with Ethereum (ETH).
Further thought should be given to the income distribution and how it varies over time.
Using the snapshot as of Block 560003 is relatively arbitrary, and further study of the
changes in the income distribution is paramount. In particular, linking addresses that
previously held large quantities of Bitcoin to form a Continuous Income Distribution would be a major
received a large transaction from a series of transactions via one-off addresses, thereby
obfuscating the data19. Consequently, the continuous distribution would include one address
with a linked transaction history, to better capture the effect this actor had on the price
volatility of Bitcoin. To fully appreciate this concept, consider the following explanation:
“We define an electronic coin as a chain of digital signatures. Each owner transfers the
coin to the next by digitally signing a hash of the previous transaction and the public key
of the next owner and adding these to the end of the coin.” (Nakamoto 2009). This quote
highlights that bitcoins are not physical, discrete objects but rather a chain of agreements
on the balance of an address. With this formulation in mind, linking addresses to obtain a
complete, time-continuous income distribution is feasible and would better capture the true effects of
19
1MAhRt... received 34,000 BTC from 1NRDQ8..., which received a nearly identical quantity
from 1GTYSS..., which had only one other transaction, receiving the 34,000 BTC from
1CiAzy.... The last address held a balance for the majority of the sample period, so creating
a continuous income distribution would involve linking these addresses as one.
5.2 Extensions and Roadmap Ahead
There are several ways to extend the empirical analysis with respect to the data. First, a
parallel analysis of the most active accounts ought to be conducted. These accounts were
viewed as outliers, since they often had hundreds or thousands of times as many observations
per address. When they were included, many addresses that were significant with the outliers
excluded were no longer statistically distinguishable from zero. Many of these
addresses belong to exchanges or other Bitcoin companies, such as gambling websites and
mining pools. News about events may materialize in the blockchain, such as a cryptocurrency
exchange sending a large quantity of Bitcoin out of cold storage to help fulfill orders to
leave its exchange, implying money could be leaving the aggregate cryptocurrency markets and
Moreover, developing a model to study the dynamics of the income distribution and
enriching the set of addresses used in the analysis would be meaningful extensions of the
data. Gandel et al. (2018) use dummy variables for important events, such as DDoS attacks on
the Mt. Gox exchange, as control variables in their analysis20. While these controls were not
significant in their analysis and are less relevant in the modern ecosystem, finding data on
other events or characteristics of Bitcoin to control for would help to improve the empirical
model.
20
Essentially attacks via the internet where the exchanges were overloaded with information and
forced temporarily offline, leading to short-term illiquidity.
that have their own blockchains. An important distinction exists between currencies and
tokens, where the former possess their own forms of blockchains and the latter are built upon
other blockchains. The procedure of studying the income distribution and the association
between price volatility and the activity of the big players relies solely on the blockchain
technology. Consequently, using Bitinfocharts.com, one can find the income distribution
and transaction data for many other large cryptocurrencies, namely: Bitcoin Cash, Litecoin,
Dash, Dogecoin and many others. Future work could compute the Gini Coefficient for each
cryptocurrency and draw inferences based on the dynamics of each income distribution. Fur-
ther, comparing the association between price volatility and the activity of big players across
cryptocurrencies may yield a range of magnitudes that are both economically significant and
reasonable in size.
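As a starting point for the Gini comparison suggested above, the coefficient can be computed
directly from a list of address balances. The sketch below uses the standard formula for
sorted data. The balances are toy numbers and are not drawn from Bitinfocharts.com or any
other dataset.

```python
def gini(balances):
    """Gini coefficient of non-negative balances (0 means perfectly equal)."""
    values = sorted(balances)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive balance")
    # Standard formula on ascending data, with ranks i = 1..n:
    #   G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

# Toy balances (hypothetical): equal holdings, mild inequality, one holder owns all.
print(gini([1, 1, 1, 1]))   # equal holdings
print(gini([1, 2, 3, 4]))   # mild inequality
print(gini([0, 0, 0, 10]))  # fully concentrated among four addresses
```

Applying the same function to snapshots of each cryptocurrency’s address balances at several
block heights would give a simple view of how concentration evolves over time.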
6 Conclusion
This paper presents an empirical procedure, leveraging the institutional structure of novel
blockchain technology, for studying the association between Bitcoin’s price volatility and
the activity of the largest holders. I found robust evidence for the association of price
volatility with several key players, notably below the top of the distribution. More
generally, I aimed to motivate studies employing more sophisticated econometric techniques
and broader sets of addresses. Combining these results with those for alternative
cryptocurrencies will yield a range of reasonable magnitudes for empirical results and a
better understanding of Bitcoin’s price volatility.
Bitcoin’s supporters have described it as the next generation’s major innovation, equivalent
in magnitude to the Internet21. The cryptocurrency space has been the subject of a
fast-growing academic literature, formalizing and operationalizing theories that have long
existed on platforms such as Twitter, Medium and Reddit. While progress has been made,
technological advances and regulatory changes still outpace academic research. The second
objective of this paper was to lay out a roadmap for future work. In the presence of many
open questions and an abundance of data, the cryptocurrency ecosystem is ripe for further
academic study. Future contributions, such as a dynamic income distribution, will have
profound implications for the long-debated question of the extent of the Whales’ ability
21
https://medium.com/@andrewcretin/its-2018-blockchain-is-on-it-s-way-to-become-the-new-internet-7055ed6851e
7 References
7.1 Literature
Athey, Susan and Parashkevov, Ivo and Sarukkai, Vishnu and Xia, Jing, Bitcoin Pricing,
Adoption, and Usage: Theory and Evidence (August, 2016). Stanford University Grad-
Berentsen, Aleksander and Schar, Fabian, A Short Introduction to the World of Cryptocur-
rencies (February 2018). Federal Reserve of St. Louis Review, Vol. 100, 1-16.
Catalini, Christian and Gans, Joshua S., Some Simple Economics of the Blockchain (Septem-
ber 2017). Rotman School of Management Working Paper No. 2874598; MIT Sloan
Chiu, Jonathan and Koeppl, Thorsten V., The Economics of Cryptocurrencies – Bitcoin and
Gandel, Neil and Hamrick, JT and Moore, Tyler and Oberman, Tali, Price Manipulation in
the Bitcoin Ecosystem (January 2018). Journal of Monetary Economics, Vol. 95, 86-96.
https://www.bitcoin.org/bitcoin.pdf
Raskin, Max and Yermack, David, Digital Currencies, Decentralized Ledgers, and the Future
Yermack, David, Is Bitcoin a Real Currency? An Economic Appraisal (December 2013).
Zimmerman, Peter, Blockchain and Price Volatility (June 2018). Working Paper, Said Busi-