Unveiling Cryptocurrency Conversations Insights From Data Mining and Unsupervised Learning Across Multiple Platforms


Received 30 October 2023, accepted 16 November 2023, date of publication 20 November 2023,

date of current version 27 November 2023.


Digital Object Identifier 10.1109/ACCESS.2023.3334617

Unveiling Cryptocurrency Conversations: Insights From Data Mining and Unsupervised Learning Across Multiple Platforms
HAE SUN JUNG1 , HAEIN LEE1,2 , AND JANG HYUN KIM 2,3
1 Department of Applied Artificial Intelligence, Sungkyunkwan University, Seoul 03063, South Korea
2 Department of Human-Artificial Intelligence Interaction, Sungkyunkwan University, Seoul 03063, South Korea
3 Department of Interaction Science, Sungkyunkwan University, Seoul 03063, South Korea

Corresponding author: Jang Hyun Kim ([email protected])


This work was supported by the National Research Foundation of Korea (NRF) Grant funded by the Korean Government under Grant
RS-2023-00208278.

ABSTRACT The rapid growth of the cryptocurrency market has led to an increasing interest in the subject.
Cryptocurrency is now recognized as an asset, and laws and financial regulations have begun to emerge
for supporting its practical use. As a result, it has become essential to perform data mining and attain
knowledge from text data related to cryptocurrency. Previous studies have focused on analyzing data from
a single source such as Twitter. However, there are unique insights to be gained from data across multiple
platforms. In the present study, we utilized data mining techniques to extract insights from LexisNexis,
Web of Science, and Reddit, representing the media, academia, and general public, respectively. Among
unsupervised learning technologies, topic modeling was employed for the analysis. Topic modeling is a
methodology that uncovers hidden meanings within the collected data. Among the diverse topic modeling
techniques available, bidirectional encoder representations from transformers topic (BERTopic) was chosen
for the analysis. BERTopic is considered to be state-of-the-art in the field of topic modeling. Dynamic topic modeling
was employed to track changes in themes over time. Our experimental results reveal a tendency in the
news to cover major events related to cryptocurrencies, such as regulatory developments and market trends.
Academic papers, on the other hand, tend to focus on the technology behind cryptocurrencies and related
research. Finally, social media conversations center more around information delivery from an investor’s
psychological perspective, such as market sentiment and investment strategies.

INDEX TERMS Bitcoin, cryptocurrency, data mining, machine learning, natural language processing,
unsupervised learning, topic modeling, BERTopic.

The associate editor coordinating the review of this manuscript and approving it for publication was Rongbo Zhu.
2023 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
VOLUME 11, 2023

I. INTRODUCTION
Bitcoin represents the first cryptocurrency that utilizes decentralized cryptographic technology, namely blockchain, allowing for payments or transfers between parties over a short period without relying on financial institutions [1]. The successful launch and growth of Bitcoin have triggered the creation of many other cryptocurrencies, which in turn has substantially expanded the cryptocurrency market, driven primarily by notable price increases [2].
Despite the growing interest in the cryptocurrency market, there exists a research gap marked by a deficiency in achieving a comprehensive understanding of the major themes and perceptions surrounding cryptocurrency. This gap is particularly pronounced due to the relatively short history of Bitcoin compared to other assets.
To address the existing research gap and contribute to the body of knowledge, the authors employed topic modeling, an unstructured text data mining method, incorporating data from three distinct sources: LexisNexis, Web of Science, and Reddit. For the analysis, the queries "BTC" and "Bitcoin" were employed based on previous research that suggests the
level of interest in Bitcoin can serve as an indicator of the broader interest in cryptocurrencies [3], [4], [5]. This approach enabled the authors to extract valuable insights from a wide spectrum of sources, encompassing mass media, academia, and social media, providing a more comprehensive perspective on the subject.
Topic modeling is a methodology that automatically clusters words based on statistics and machine learning [6]. The method can help attain insights by extracting the main theme from a given data source [7]. Although numerous topic modeling methods, such as latent Dirichlet allocation (LDA) and non-negative matrix factorization (NMF), are available, in this study the state-of-the-art (SOTA) model known as bidirectional encoder representations from transformers topic (BERTopic) was applied [8]. BERTopic is a topic modeling technique that employs Bidirectional Encoder Representations from Transformers (BERT) embeddings and class-based TF-IDF (c-TF-IDF) to formulate compact clusters that are easily explicable while preserving important words in topic explanations [9]. Additionally, BERTopic is recognized for its modularity, enabling the incorporation and utilization of diverse algorithms within its framework. Considering the previously mentioned advantages, the authors have selected modified BERTopic as the most suitable algorithm for investigating perceptions related to Bitcoin.
At the beginning of the analysis on each specific platform, the authors assessed the coherence values of baseline models (LDA, NMF) and the modified BERTopic. This process confirmed the capability of BERTopic to accurately depict topics in the chosen domain. In summary, modified BERTopic demonstrated the highest coherence values in all three analyses, resulting in the subsequent findings derived from its application.
The resulting topic representations revealed that news sources primarily cover major events, academic journals focus on technological advancements, and social media platforms discuss the sentiments of cryptocurrency investors, affirming the distinct characteristics within each domain.
In summary, our research not only addresses the existing research problem but also endeavors to bridge the research gap by offering a comprehensive analysis of Bitcoin-related text data from diverse sources. The insights gained from this study have the potential to advance understanding of the challenges and issues faced by blockchain technology, setting the stage for future developments in this field. In addition, the utilization of modified BERTopic across various platforms constitutes a novel approach that remains notably unexplored within the existing research.

II. RELATED WORKS
The following subsections present brief reviews of literature that aim to provide insight by applying natural language processing (NLP) to Bitcoin. Subsequently, the concept of topic modeling is explained, and studies that sought to obtain knowledge from various domains using topic modeling are exemplified. Lastly, the default structure and advantages of using BERTopic are illustrated.

A. RESEARCH ON BITCOIN USING NATURAL LANGUAGE PROCESSING
NLP is utilized to extract valuable information from texts across various fields, with the objective of deriving insights and practical applications. The current section presents literature reviews on prior studies that utilized NLP to gain insights into Bitcoin.
In [10], the relation between social media topic discussion and cryptocurrency market price fluctuations was analyzed via statistical and NLP models (i.e., DMR). In [11], NLP algorithms were used to measure the relationship between investment sentiment and Bitcoin price fluctuations using data from the subreddits "r/bitcoin" and "r/investing." In [12], a prediction was made on the direction and magnitude of Bitcoin price fluctuations using sentiment analysis and post volume extracted from Twitter data. A relative accuracy of 63% was achieved using a model based on recurrent neural networks (RNN) and convolutional neural networks (CNN). Satarov et al. [13] confirmed that sentiment analysis of tweets related to Bitcoin can be used to predict Bitcoin price changes. An accuracy of 62.48% was attained by a random forest regression model when applying sentiment analysis to Twitter data. Jung et al. [14] aimed to predict Bitcoin price trends by analyzing both the volume and sentiment of Reddit data, as well as technical indicators of chart analysis. The authors achieved an accuracy of 90.57% and an area under the curve (AUC) value of 97.48% using an extreme gradient boosting (XGBoost) model.

B. TOPIC MODELING
Topic modeling is a statistical model used in the field of NLP to discover abstract main themes, referred to as topics, within sets of documents [15]. In other words, it is a text mining technique that is utilized to uncover hidden semantic structures within textual data. Examples of representative topic modeling technologies are DMR and LDA.
Yin and Yuan [16] employed LDA topic modeling to analyze research subjects and progress trends related to blended learning using keyword analysis, with results showing that the ratio of element topics in blended learning has been increasing every year. Moreover, the text analysis provided theoretical and methodological reference materials to facilitate future research. Polyzos and Wang [17] conducted an LDA analysis on Twitter data to quantify energy market efficiency. The extracted topic was then applied to a classification model to measure prediction accuracy for market movements. Sharma and Sharma [18] collected research papers related to blockchain technology from various databases and attempted to create a semantic map using the LDA model. Through a metadata analysis, an abstract perspective of blockchain was attained. Avasthi et al. [19] conducted a comparison of various topical models, including LDA, correlated topic
model (CTM), hierarchical Dirichlet process (HDP), and DMR, using adolescent drug use and depression as keywords. Egger and Yu [20] applied LDA, NMF, and BERTopic to Twitter posts and conducted a comparative analysis for each topic modeling algorithm.

C. DEFAULT BIDIRECTIONAL ENCODER REPRESENTATIONS FROM TRANSFORMERS TOPIC (BERTOPIC) STRUCTURE
BERTopic is a SOTA framework for topic modeling that consists of five submodels, each of which can be selected and used independently [9]. In the current section, the default structure of BERTopic is presented.

1) DOCUMENT EMBEDDING
The default BERTopic approach utilizes sentence-transformers (SBERT) to convert text data into numerical representations, allowing for semantic similarity comparisons that significantly enhance clustering tasks in comparison to LDA [21].

2) DIMENSIONALITY REDUCTION
Because clustering models struggle with high-dimensional data due to the curse of dimensionality, it is essential to perform dimensionality reduction after obtaining the representation [22]. By default, BERTopic utilizes uniform manifold approximation and projection (UMAP) for this task. UMAP is a dimensionality reduction procedure that preserves both local and global structures in data, thereby enabling the clustering of semantically similar documents [23].

3) CLUSTERING DOCUMENTS AND BAG OF WORDS
Default BERTopic uses hierarchical density-based spatial clustering of applications with noise (HDBSCAN), a density-based clustering technique that clusters text data [24]. This approach allows for outlier detection and the identification of different cluster shapes, preventing text data from being forcibly included in the wrong cluster.
Because HDBSCAN generates clusters with varying densities and shapes, centroid-based topic representation techniques may not be suitable. Instead, BERTopic combines all documents within each cluster into a single document to create a cluster-level bag-of-words (BoW) that records the frequency of each word in each cluster. This holds significance since topic modeling primarily examines words at the topic (i.e., cluster) level.

4) ATTAINING TOPIC REPRESENTATION
Finally, L1 normalization is applied to the BoW representation to account for clusters of varying sizes. Term Frequency-Inverse Document Frequency (TF-IDF) for BoW must consider topics or clusters, rather than individual documents. By extracting the most significant words from each cluster, it is possible to obtain an explanation of the topic. This model is known as class-based TF-IDF (c-TF-IDF).

$$W_{x,c} = tf_{x,c} \cdot \log\left(1 + \frac{A}{f_x}\right) \tag{1}$$

In the aforementioned equation, $tf_{x,c}$ is the frequency of word $x$ in class $c$, $f_x$ is the frequency of word $x$ among all classes, and $A$ is the average number of words per class. Similar to the traditional TF-IDF approach, the importance score of a word in each class is obtained by multiplying the term frequency $tf_{x,c}$ and the inverse document frequency $\log(1 + A/f_x)$.
In contrast to traditional topic modeling methodologies, BERTopic stands out by leveraging pretrained language models for document and word representations, making it adept at capturing complex relationships between words and context. Furthermore, its non-linear dimensionality reduction approach and modularity enable BERTopic to enhance the quality of topic representation compared to conventional methods.
This study is meaningful in that it is the first to combine the keyword "Bitcoin" with modified BERTopic, the SOTA in topic modeling, to obtain knowledge from three distinct sources: LexisNexis, Web of Science, and Reddit.

III. METHOD
The following subsections outline the experimental procedures followed in this study. Firstly, the sources and descriptions of collected data are presented. Secondly, the preprocessing steps undertaken for the text data are presented. Finally, the application of BERTopic for deriving topic modeling results is explained.

TABLE 1. Implementation setup.

FIGURE 1. Experimental flow diagram of topic modeling with modified BERTopic.

A. DATA COLLECTION
It is possible to gauge the general sentiment toward cryptocurrency by analyzing data on Bitcoin, which is representative of blockchain technology and cryptocurrency [3], [4], [5]. Accordingly, all data examined in this study were collected using search queries for "Bitcoin" and "BTC." To interpret the sentiment toward cryptocurrency from media, academic, and public perspectives, data were collected from LexisNexis, Web of Science, and Reddit, respectively. Data obtained from LexisNexis comprise the full body of news articles, whereas those collected from Web of Science encompass the abstracts of academic papers, and those collected from Reddit encompass both posts and comments from the r/Bitcoin and r/BTC subreddits. The data encompass a period of six years from March 1, 2017, to March 1, 2023. In total, 17,230 news articles, 9,520 academic papers, and 10,914,149 social media texts were collected.

B. DATA PREPROCESSING
First, any instances of data wherein the acronym BTC was used in extraneous contexts (e.g., Cu-BTC, biliary tract cancer) were eliminated. All texts written in other languages using English spelling, as well as duplicates and missing values, were also eliminated. A total of 6,011 academic papers, 10,925 news articles, and 9,830,800 social media texts were used in the analysis.
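The filtering described in Section III-B can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `preprocess` and the pattern list `OFF_TOPIC_PATTERNS` are hypothetical, only the two off-topic examples named in the text are covered, and the removal of romanized non-English text is not attempted here.

```python
import re

# Hypothetical markers of off-topic uses of the acronym "BTC". The paper names
# Cu-BTC and biliary tract cancer as examples; a real list would be longer.
OFF_TOPIC_PATTERNS = [
    re.compile(r"\bcu-btc\b", re.IGNORECASE),
    re.compile(r"\bbiliary tract cancer\b", re.IGNORECASE),
]


def preprocess(texts):
    """Drop missing values, off-topic BTC mentions, and duplicates."""
    seen = set()
    kept = []
    for text in texts:
        if text is None or not text.strip():
            continue  # missing value
        if any(p.search(text) for p in OFF_TOPIC_PATTERNS):
            continue  # "BTC" used in an extraneous context
        # Assumed duplicate key: case-insensitive, whitespace-normalized text.
        key = " ".join(text.split()).lower()
        if key in seen:
            continue  # duplicate
        seen.add(key)
        kept.append(text)
    return kept
```

Treating texts that differ only in case or spacing as duplicates is an assumption of this sketch; the paper does not state how duplicates were detected.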

C. APPLICATION OF MODIFIED BERTOPIC


In this study, the authors partially modified the default
BERTopic structure to improve the diversity of topic representations.

1) DOCUMENT EMBEDDING
Although SBERT is the default embedding model, the authors
utilized spaCy as an alternative embedding algorithm [25].
One of the motivations for utilizing spaCy lies in its abil-
ity to deliver rapid processing speed while ensuring higher
quality in capturing the textual semantics and maintain-
ing pertinent information during the embedding generation
process.
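As a rough illustration of what a vector-based document embedding does, the sketch below averages word vectors into a document vector and compares documents by cosine similarity. spaCy's `Doc.vector` defaults to an average of token vectors, but the paper does not say which spaCy pipeline was used, so this is an assumption rather than the authors' setup. The table `TOY_VECTORS` and the functions `embed` and `cosine` are hypothetical and stand in for a real pretrained model.

```python
import math

# Hypothetical 3-dimensional word vectors. A real pipeline would supply
# vectors with hundreds of dimensions learned from a large corpus.
TOY_VECTORS = {
    "bitcoin": [0.9, 0.1, 0.0],
    "btc": [0.8, 0.2, 0.0],
    "price": [0.1, 0.9, 0.0],
    "mining": [0.2, 0.1, 0.9],
    "energy": [0.1, 0.2, 0.8],
}
DIM = 3


def embed(text):
    """Average the vectors of known tokens (unknown tokens are skipped,
    which is a simplification of this sketch)."""
    vectors = [TOY_VECTORS[t] for t in text.lower().split() if t in TOY_VECTORS]
    if not vectors:
        return [0.0] * DIM
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(DIM)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
```

With these toy vectors, "bitcoin price" and "btc price" end up closer to each other than either is to "mining energy", which is the property the later clustering step relies on.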

2) DIMENSIONALITY REDUCTION
The authors adopted UMAP in the dimensionality reduction phase over other algorithms as it effectively maintains the non-linear and complex structures in textual data.

3) CLUSTERING DOCUMENTS AND BAG OF WORDS
For clustering, the authors adopted HDBSCAN because it reduces noise and enhances topic representation. Additionally, Scikit-Learn's tokenizer was utilized to generate BoW at the cluster level.

4) ATTAINING TOPIC REPRESENTATION
Finally, topic representations were generated using c-TF-IDF to confirm the results. For each topic, the words with the highest c-TF-IDF score were provided. An experimental diagram has been constructed to illustrate the flow of the experiment (Fig. 1), and the implementation setup and hyperparameters can be found in Table 1.

IV. EXPERIMENTS
The following subsections describe the differences in coherence values between baseline models (LDA, NMF) and BERTopic, as well as the insights gained through BERTopic analysis on data collected from three domains: LexisNexis, Web of Science, and Reddit.

A. COMPARISON OF COHERENCE VALUES BETWEEN BERTOPIC AND BASELINE MODELS
Coherence serves as a measure for assessing the interpretability, consistency, and meaningfulness of topic modeling outcomes. Typically, a higher coherence value in the results of topic modeling is regarded as a sign of a more effective topic model [26]. Additionally, the use of coherence allows for the extraction of the optimal number of topics.
To ensure the validity of the analysis using BERTopic in each domain and to determine the optimal number of topics, the authors first compared it with baseline models based on coherence values (Fig. 2, Fig. 3, Fig. 4).

FIGURE 2. Comparison of coherence values with baseline models on LexisNexis data.

Although differences in values were observed, it was confirmed that BERTopic exhibited the highest coherence values for all three target platforms. It was also confirmed that using 8, 9, and 4 topics for LexisNexis, Web of Science, and Reddit, respectively, was suitable for extracting the optimal representation.
The same tendency has appeared in previous studies comparing topic models. BERTopic demonstrates its value as SOTA by generating insights from short and unstructured text, ensuring high stability and diversity across various domains [20], [27]. Consequently, it outperforms recently introduced models that lack application in diverse domains.
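To make the notion of a coherence value concrete, here is a minimal sketch of a UMass-style coherence score in pure Python. The paper does not state which coherence measure was used, so the choice of UMass is an assumption, and the function name `umass_coherence` is hypothetical. The score rewards topics whose top words tend to occur in the same documents.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents):
    """UMass-style coherence of one topic, averaged over word pairs.

    top_words: topic words ordered from most to least important.
    documents: list of tokenized documents (lists of strings).
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    pairs = 0
    for i, j in combinations(range(len(top_words)), 2):
        earlier, later = top_words[i], top_words[j]
        d_earlier = doc_freq(earlier)
        if d_earlier == 0:
            continue  # word never occurs; the pair is undefined and skipped
        # log((D(w_later, w_earlier) + 1) / D(w_earlier))
        score += math.log((doc_freq(earlier, later) + 1) / d_earlier)
        pairs += 1
    return score / pairs if pairs else 0.0
```

A model-selection loop would compute the mean of this score over all topics for each candidate number of topics and keep the setting with the highest value, which is the kind of comparison summarized in Figs. 2 to 4.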

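The c-TF-IDF weighting of Eq. (1), which produces the keyword lists reported for each topic, can be sketched as follows. This is an illustrative implementation under stated assumptions, not the BERTopic library code: the function names `c_tf_idf` and `top_words` are hypothetical, and the term frequency is L1-normalized per class as described in Section II-C4.

```python
import math
from collections import Counter


def c_tf_idf(class_docs):
    """Compute c-TF-IDF weights.

    class_docs maps a class (topic) id to the list of tokens obtained by
    concatenating all documents assigned to that class.
    """
    if not class_docs:
        return {}
    counts = {c: Counter(tokens) for c, tokens in class_docs.items()}
    # f_x: frequency of word x across all classes
    total = Counter()
    for counter in counts.values():
        total.update(counter)
    # A: average number of words per class
    avg_words = sum(total.values()) / len(counts)
    weights = {}
    for c, counter in counts.items():
        size = sum(counter.values())
        weights[c] = {
            # L1-normalized term frequency times log(1 + A / f_x)
            word: (tf / size) * math.log(1 + avg_words / total[word])
            for word, tf in counter.items()
        }
    return weights


def top_words(weights, c, n=3):
    """Return the n highest-weighted words of class c (ties broken alphabetically)."""
    ranked = sorted(weights[c].items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]
```

Words that are frequent in one class but rare overall receive the highest weights, while words shared by every class, such as "bitcoin" in this corpus, are pushed down the ranking.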


FIGURE 3. Comparison of coherence values with baseline models on Web of Science data.

FIGURE 4. Comparison of coherence values with baseline models on Reddit data.

B. TOPIC ANALYSIS FROM LEXISNEXIS DATA
An analysis of LexisNexis data was conducted to gain insights into the media's perception and knowledge of Bitcoin. Each topic representation was evaluated using a c-TF-IDF score.
Topic 0 was denoted as "Bitcoin as a digital currency" through "cryptocurrency," "currency," "digital," and "bank." There are ongoing discussions regarding whether Bitcoin or other cryptocurrency technologies can function as digital currencies and potentially replace traditional currencies. Grinberg [28] addressed various aspects of Bitcoin, including its innovative nature as a decentralized digital currency, its relationship with the United States (U.S.) dollar as a global reserve currency, and the legal challenges that Bitcoin faces in terms of regulation and adoption. Dwyer [29] emphasized that Bitcoin can be transferred peer-to-peer without bank intervention. The author also mentioned the advantage of Bitcoin as a digital currency that prevents double spending of wire transfer fees using open-source software.
Topic 1 was designated as "Energy consumption of Bitcoin mining" through "energy," "mining," "power," "tesla," "cryptocurrency," and "electricity." The mining of cryptocurrency using the proof-of-work (PoW) methodology and its associated energy consumption have been topics of ongoing discussion among experts in related fields. Li et al. [30] investigated the energy consumption of cryptocurrency mining and emphasized the need for energy conservation and sustainable development despite the promising potential of blockchain technology. O'Dwyer and Malone [31] examined the energy consumption required for Bitcoin mining and found that the aggregate energy consumed by Bitcoin mining was equivalent to that of Ireland.
Topic 2 was summarized as "Cryptocurrency-friendly countries" through "salvador," "ukraine," "country," "government," "bitcoin," "cryptocurrency," "bukele," "dollar," and "Venezuela." An increasing number of countries are designating cryptocurrency as legal tender or expressing a cryptocurrency-friendly stance. For instance, President Nayib Bukele of El Salvador made the move of recognizing Bitcoin as a legal currency, Ukraine has legalized Bitcoin, and Venezuela is seeing a rise in cryptocurrency usage [32], [33], [34].
Topic 3 was named "Taxation of cryptocurrency income in the United States" through "tax," "cryptocurrency," "irs," "government," "income," "bill," and "taxpayer." The United States Treasury Department has required cryptocurrency transactions worth more than $10,000 to be reported to the Internal Revenue Service (IRS) [35], [36]. In addition, plans for cryptocurrency taxation are being continuously discussed globally [37], [38].
Topic 4 was labeled "Conflicts between Berkshire Hathaway and cryptocurrency companies" through "robinhood," "gemini," "buffett," "winklevoss," "investor," and "berkshire." Prominent stock investors Warren Buffett and Charlie Munger are known for their negative views on cryptocurrency [39]. As a result, their remarks often draw opposition from companies in the cryptocurrency industry [40].
Topic 5 was described as "Singapore's cryptocurrency-related regulations" through "Singapore," "mas," "binance," and "regulation." Binance, the largest cryptocurrency exchange, had made efforts to obtain exchange approval from the Singapore government since 2020; however, it eventually ceased its services in 2022 due to the Monetary Authority of Singapore's (MAS) strict regulations regarding the approval of cryptocurrency exchanges [41], [42].
Topic 6 was designated as "Cryptocurrency as a means of donation" through "donor," "donation," "charity," "donate," "cryptocurrency," and "dafs." The faster and easier remittance processes offered by cryptocurrency technology compared to those in conventional finance, as well as the transparency of transactions, have led to the emergence of a donation culture using cryptocurrency [43], [44]. As a representative example, during the Russian-Ukrainian War, donations using
cryptocurrency have been collected on an international scale [45].
Topic 7 was presented as "Cryptocurrency lending platform's bankruptcy" through "celsius," "mashinsky," "customer," "lender," "bankruptcy," "withdrawal," and "deposit." The bankruptcy of multiple cryptocurrency lenders following the collapse of Terra and Luna in 2022 resulted in significant losses for cryptocurrency investors, leading to widespread distrust and negative effects on the cryptocurrency market [46], [47]. These events highlight the need for international cooperation among relevant organizations to establish standards and promote collaboration [48].
Results from LexisNexis revealed that the media predominantly cover major issues related to the cryptocurrency market (Fig. 5, Table 2). Beginning with the function of cryptocurrency as a digital currency, both the advantages and disadvantages have been covered extensively. Additionally, the social and legal challenges faced by cryptocurrency have been identified, providing insight into problems that must be addressed in the future.

TABLE 2. Topic representations of Bitcoin from LexisNexis.

FIGURE 5. Hierarchical clustering from LexisNexis data.

C. TOPIC ANALYSIS FROM WEB OF SCIENCE DATA
An analysis of Web of Science data was conducted to understand the perception and knowledge of Bitcoin in academia. Each topic representation was evaluated using a c-TF-IDF score.
Topic 0 was denoted as "Contributions and applications of blockchain technology" through "blockchain," "technology," "system," "node," and "application." Although M. Hashemi Joo et al. [49] addressed the use of blockchain technology in cryptocurrency, it is important to note that blockchain technology can be utilized in a wide range of fields, and research regarding its use in other industries is actively underway [50], [51], [52].
Topic 1 was designated as "Bitcoin as an investment asset" through "market," "cryptocurrency," "volatility," "asset," "gold," "price," and "portfolio." Cryptocurrency continues to attract more attention as an investment asset rather than a real-life application of technology [53]. Because cryptocurrencies are now considered a new asset class, governments around the world are beginning to introduce relevant regulations. Following this trend, studies have been conducted on whether Bitcoin can be included in the investment portfolio and used as a hedging instrument [54], [55], [56].
Topic 2 was summarized as "Bitcoin's PoW mechanism" through "mining," "btc," "miner," "pool," "block," "energy," and "reward." PoW is a consensus algorithm that reflects task participation by iteratively searching for a hash value below the target threshold [57]. Because this process consumes a significant amount of power, there is ongoing discussion and research to reduce its energy consumption while maintaining the security and integrity of the system [58].
Topic 3 was named "Bitcoin price prediction through sentiment analysis" through "model," "price," "prediction," "sentiment," "learning," "volatility," "forecast," and "machine." Among machine learning models, studies have
H. S. Jung et al.: Unveiling Cryptocurrency Conversations

been actively conducted to predict the price of Bitcoin, espe-


cially using sentiment analysis. In [59], sentiment scores
attained using Valence Aware Dictionary and sEntiment Rea-
soner (VADER) over 2019 and 2020 were found to correlate
with short-term trends in Bitcoin prices. In [14], the trend
of Bitcoin prices was predicted by combining technical and
FIGURE 6. Hierarchical clustering from web of science data.
sentiment analyses.
Topic 4 was labeled ‘‘Studies on the factors affecting the technical challenges that cryptocurrencies must address and
Bitcoin price’’ through ‘‘study,’’ ‘‘research,’’ ‘‘btc,’’ ‘‘anal- improve.
ysis,’’ and ‘‘factor.’’ Because the cryptocurrency market
experienced rapid growth, research on Bitcoin price variables D. TOPIC ANALYSIS FROM REDDIT DATA
has been conducted actively. R. Hakim das Neves [60] found An analysis of Reddit data was conducted to understand the
interest in Bitcoin to precede price hikes, whereas T. Panagi- perception and knowledge of Bitcoin in social media. Each
otidis et al. [61] indicated that changes in interest rates and topic presentation was evaluated using a c-TF-IDF score.
exchange rates also affect the price of Bitcoin. Topic 0 was denoted as ‘‘Cryptocurrency investment rec-
Topic 5 was described as ‘‘The digital signature algorithm ommendations’’ through ‘‘bitcoin,’’ ‘‘btc,’’ ‘‘buy,’’ ‘‘money,’’
in cryptocurrency wallets’’ through ‘‘signature,’’ ‘‘wallet,’’ ‘‘dip,’’ and ‘‘want.’’ The term ‘‘buy the dip’’ is a phrase
‘‘key,’’ ‘‘protocol,’’ ‘‘security,’’ and ‘‘ecdsa.’’ As the possi- commonly used in investing and trading, referring to the
bility of losing cryptocurrency due to bankruptcy or hacking strategy of purchasing an asset at a lower price point during
increased, the security of cryptocurrency wallets became a market downturn, with the expectation that the asset will
more important [62], [63]. Consequently, research has been eventually recover and increase in value [76]. Expressions
conducted on utilizing digital signature algorithms such as such as ‘‘buy,’’ ‘‘money,’’ and ‘‘want’’ can also be seen as
the Elliptic Curve Digital Signature Algorithm (ECDSA) in words that encourage the purchase of bitcoin.
cryptocurrency wallets [64], [65]. Topic 1 was referred to as ‘‘Negative view of the scam
Topic 6 was stated as ‘‘The need of regulations in token coin projects’’ through ‘‘shitcoin,’’ ‘‘altcoin,’’ ‘‘meatcoin,’’
Initial coin offering (ICO)’’ through ‘‘ico,’’ ‘‘token,’’ ‘‘icos,’’ ‘‘fuck,’’ ‘‘trashcoin,’’ and ‘‘bullshit.’’ There are negative
‘‘offering,’’ ‘‘capital,’’ ‘‘coin,’’ ‘‘scam,’’ and ‘‘regulation.’’ opinions towards developers who attract money early in
ICO is a method of raising investment funds for a cryptocur- development or through the promotion of scam coins, only to
rency project by transferring a portion of a newly developed abandon or cease development afterward. The derived words
cryptocurrency to investors in exchange for cash or other can be seen as expressions with negative connotations on the
cryptocurrencies [66]. Whereas the ICO process has led to the issues.
creation of successful tokens, there have also been instances Topic 2 was summarized as ‘‘Cryptocurrency exchange
of scams and abuse, which have highlighted the need for recommendations’’ through ‘‘gemini,’’ ‘‘coinbase,’’ ‘‘trader,’’
appropriate regulation [67], [68], [69].
Topic 7 was presented as ‘‘Hacking attacks on cryptocurrency’’ through ‘‘ransomware,’’ ‘‘ransom,’’ ‘‘malware,’’ ‘‘attack,’’ and ‘‘victim.’’ Despite its recent growth, the cryptocurrency market remains vulnerable to hacking and money laundering owing to the inherent nature of its underlying code structure, leading to a constant need for improved security measures [70], [71], [72].
Topic 8 was classified as ‘‘Resolving transaction processing speed and scalability problems through blockchain sharding’’ through ‘‘shard,’’ ‘‘sharding,’’ ‘‘blockchain,’’ ‘‘transaction,’’ ‘‘performance,’’ and ‘‘scalability.’’ Sharding, which was adopted as a solution to the slow transaction speed and scalability trilemma in blockchain, involves dividing the overhead of transaction processing among smaller groups of nodes [73]. Studies on blockchain sharding are actively being conducted as the cryptocurrency market grows [74], [75].
Insights obtained from the Web of Science data primarily concern the technology, phenomena, and significance of blockchain in an academic context (Fig. 6, Table 3). The applications and conceptions of blockchain-related technology have been extensively discussed, shedding light on the

‘‘use,’’ ‘‘gdax,’’ ‘‘fee,’’ ‘‘blockfi,’’ and ‘‘exchange.’’ As the cryptocurrency market grew, the number of cryptocurrency exchanges also increased significantly. However, investors have suffered significant financial losses with the bankruptcy of certain exchanges [77]. With so many exchanges available, users can now choose among them on the basis of fees, convenience, and stability. The extracted words can be read as the advantages of, and recommendations for, specific exchanges.
Topic 3 was named ‘‘Cryptocurrency testnet and mainnet’’ through ‘‘testnet,’’ ‘‘mainnet,’’ ‘‘test,’’ ‘‘connect,’’ ‘‘network,’’ ‘‘change,’’ and ‘‘server.’’ A blockchain mainnet is a live network that runs a blockchain project, whereas a testnet is a temporary network used during the development phase to build an independent mainnet [78]. News pertaining to a mainnet or testnet frequently has a positive effect on price. The obtained words may reflect investors’ hopes that the launch of a mainnet will have a positive impact on prices.
Data from Reddit tended to feature words related to investor sentiment, rather than serious discussions or explanations of Bitcoin (Fig. 7, Table 4). Posts often contained favorable information such as updates on the mainnet and testnet, which can affect prices, as well as recommendations

TABLE 3. Topic representations of Bitcoin from Web of Science.

FIGURE 7. Hierarchical clustering from Reddit data.

TABLE 4. Topic representations of Bitcoin from Reddit.

FIGURE 8. Temporal visualization of topics from LexisNexis data.
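A temporal view such as the one in Fig. 8 rests on a simple tabulation: for each time bin, count how many documents fall into each topic. The following is a minimal sketch of that step under stated assumptions, not the authors' pipeline. The input format (one `(year, topic_id)` pair per document), the function names `topic_frequency_by_year`, `topic_share_by_year`, and `scale_counts`, and the toy records are all hypothetical. `scale_counts` illustrates reporting a very large corpus in units of 1,000, as is done for the Reddit figure.

```python
from collections import Counter, defaultdict


def topic_frequency_by_year(assignments):
    """Count documents per topic within each year.

    `assignments` is an iterable of (year, topic_id) pairs, one per document.
    Returns {year: Counter({topic_id: count})}.
    """
    table = defaultdict(Counter)
    for year, topic_id in assignments:
        table[year][topic_id] += 1
    return dict(table)


def topic_share_by_year(table):
    """Convert raw counts into each topic's share of that year's documents."""
    shares = {}
    for year, counts in table.items():
        total = sum(counts.values())
        shares[year] = {topic: n / total for topic, n in counts.items()}
    return shares


def scale_counts(table, unit=1000):
    """Express counts in multiples of `unit` (e.g. thousands) for plotting."""
    return {
        year: {topic: n / unit for topic, n in counts.items()}
        for year, counts in table.items()
    }


# Toy records, invented purely for illustration (not data from the study).
records = [
    (2017, 0), (2017, 0), (2017, 1),
    (2018, 0), (2018, 1), (2018, 1), (2018, 2),
]

counts = topic_frequency_by_year(records)
shares = topic_share_by_year(counts)
print(counts[2018][1])             # documents assigned to topic 1 in 2018
print(round(shares[2017][0], 3))   # share of topic 0 among 2017 documents
```

The resulting per-year series (raw, scaled, or as shares) can then be handed to a plotting library such as Plotly, with one line per topic.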

for cryptocurrency exchanges. Consequently, social media can be observed to largely highlight information that investors find significant and relevant.

E. DYNAMIC TOPIC MODEL (DTM)
A dynamic topic model is a generative model that enables the analysis of topic evolution within a collection of documents over time [79]. Dynamic topic modeling generates topics over time, which can then be visualized using the Plotly Python library to track their frequency and evolution.
The changes in topics over time within LexisNexis data are depicted in Fig. 8. Topic 0, which focuses on Bitcoin as a digital currency, is consistently associated with the highest percentage every year, indicating sustained interest in Bitcoin itself. Furthermore, by examining the movement of Topic 0, it becomes apparent that the level of interest in Bitcoin correlates with its price movements [80]. Specifically, interest in cryptocurrencies can be observed to rise during periods of explosive price growth, such as late 2017 to early 2018 and 2021, and to fall during periods of price decline. Similarly, Topic 1, which pertains to Bitcoin mining, exhibited a sharp increase in 2021, coinciding with the peak of Bitcoin prices.
The evolution of topics over time in Web of Science data is illustrated in Fig. 9. Unlike LexisNexis data, which was concentrated on a single topic, the data appear to be


relatively evenly distributed across multiple topics. Moreover, the volume of data appears to fluctuate independently of Bitcoin price movements. Although many studies related to blockchain technology were conducted prior to 2021, the number of papers related to the cryptocurrency market surpassed that of blockchain technology in 2022, which can be interpreted as a result of the growing market. Research on Bitcoin mining has continued with a similar volume of studies. The next most studied topics were Topic 3 (predicting Bitcoin prices using sentiment analysis) and Topic 4 (factors affecting Bitcoin prices), which can be linked to the previously mentioned growth of the cryptocurrency market.

FIGURE 9. Temporal visualization of topics from Web of Science data.

Fig. 10 depicts the changes in topics over time within Reddit data. Because the total volume of data was over 9 million, the frequency was cut into units of 1,000 for visibility. According to [81], the use of social media is influenced to a significant extent by the connections and interactions between users within social networks. Consequently, this social influence has a positive effect on the level of participation among users. In this context, the majority of Reddit posts on Bitcoin relate to purchase and investment recommendations. The total volume of posts, which can be seen to correlate with market interest, covaries more closely with Bitcoin prices than that of LexisNexis data.

FIGURE 10. Temporal visualization of topics from Reddit data.

V. CONCLUSION
Since Bitcoin was first introduced by Satoshi Nakamoto in 2008, the cryptocurrency market has grown rapidly, with interest in it continuing to increase [2], [82]. Cryptocurrency is now recognized as a type of asset, and related laws and financial regulations have emerged [54], [55], [56]. Therefore, it is essential to perform data mining and attain knowledge pertaining to cryptocurrencies. In [83], topic modeling was performed using online Bitcoin forum data. In [84], topic modeling was performed using Twitter data and LDA to investigate the concerns and sentiments of international users. However, these studies provided limited insights due to their use of data from a single source.
To overcome the limitations of existing studies, the present study collected unstructured text data from LexisNexis, Web of Science, and Reddit to represent media, academia, and the general public, respectively. Additionally, an analysis of main themes was conducted using BERTopic, a state-of-the-art (SOTA) model for topic modeling, to obtain knowledge. Finally, DTM was applied to study discourse on Bitcoin over time on each platform, thereby gaining insights and identifying differences.
The present study offers several possible applications. The findings suggest that the use of three distinct platforms, LexisNexis, Web of Science, and Reddit, with Bitcoin queries can provide a comprehensive understanding of sentiments in cryptocurrency. Unique characteristics of discourse were identified for each platform. Specifically, the news mainly covered major issues related to cryptocurrency, academic journals focused on practical improvements in line with technological developments, and social media discussions primarily revolved around investor psychology and the communication of information from an investor’s perspective.
However, this study has several limitations that must be addressed in future research. This study collected and processed solely English-language data. Because each country has different characteristics and perceptions of cryptocurrencies [84], more generalized results could be obtained by conducting research that reflects this diversity. Additionally, this study collected and analyzed data from March 1, 2017, to March 1, 2023, whereas cryptocurrencies have existed for a longer duration. Therefore, future studies may cover longer timespans to grant a more comprehensive understanding of cryptocurrency.

ACKNOWLEDGMENT
The authors appreciate Editage (www.editage.co.kr) for their English editing service.

REFERENCES
[1] S. Nakamoto, ‘‘Bitcoin: A peer-to-peer electronic cash system,’’ Decentralized Bus. Rev., p. 21260, Nov. 2008.
[2] D.-E. Diaconaşu, S. Mehdian, and O. Stoica, ‘‘An analysis of investors’ behavior in Bitcoin market,’’ PLoS ONE, vol. 17, no. 3, Mar. 2022, Art. no. e0264522, doi: 10.1371/journal.pone.0264522.
[3] D. Vujičić, D. Jagodić, and S. Randić, ‘‘Blockchain technology, Bitcoin, and Ethereum: A brief overview,’’ in Proc. 17th Int. Symp. INFOTEH-JAHORINA (INFOTEH), Sarajevo, Bosnia Herzegovina, Mar. 2018, pp. 1–6.
[4] P. Ciaian, M. Rajcaniova, and D. Kancs, ‘‘Virtual relationships: Short- and long-run evidence from Bitcoin and altcoin markets,’’ J. Int. Financial Markets, Inst. Money, vol. 52, pp. 173–195, Jan. 2018, doi: 10.1016/j.intfin.2017.11.001.
[5] A. S. Kumar and T. Ajaz, ‘‘Co-movement in crypto-currency markets: Evidences from wavelet analysis,’’ Financial Innov., vol. 5, no. 1, pp. 1–17, Jul. 2019, doi: 10.1186/s40854-019-0143-3.
[6] B. V. Barde and A. M. Bainwad, ‘‘An overview of topic modeling methods and tools,’’ in Proc. Int. Conf. Intell. Comput. Control Syst. (ICICCS), Madurai, India, Jun. 2017, pp. 745–750.
[7] L. Hong and B. D. Davison, ‘‘Empirical study of topic modeling in Twitter,’’ in Proc. 1st Workshop Social Media Anal., Washington, DC, USA, Jul. 2010, pp. 80–88.
[8] D. M. Blei, A. Y. Ng, and M. I. Jordan, ‘‘Latent Dirichlet allocation,’’ J. Mach. Learn. Res., vol. 3, pp. 993–1022, Mar. 2003.


[9] M. Grootendorst, ‘‘BERTopic: Neural topic modeling with a class-based TF-IDF procedure,’’ 2022, arXiv:2203.05794.
[10] M. Ortu, S. Vacca, G. Destefanis, and C. Conversano, ‘‘Cryptocurrency ecosystems and social media environments: An empirical analysis through Hawkes’ models and natural language processing,’’ Mach. Learn. Appl., vol. 7, Mar. 2022, Art. no. 100229, doi: 10.1016/j.mlwa.2021.100229.
[11] B. Mcmillan, J. Myers, A. Nguyen, D. Robinson, and M. Kennard, ‘‘Analysis and comparison of natural language processing algorithms as applied to Bitcoin conversations on social media,’’ J. Investing, vol. 31, no. 2, pp. 38–59, Jan. 2022, doi: 10.3905/joi.2021.1.213.
[12] J. V. Critien, A. Gatt, and J. Ellul, ‘‘Bitcoin price change and trend prediction through Twitter sentiment and data volume,’’ Financial Innov., vol. 8, no. 1, p. 45, May 2022, doi: 10.1186/s40854-022-00352-7.
[13] O. Sattarov, H. S. Jeon, R. Oh, and J. D. Lee, ‘‘Forecasting Bitcoin price fluctuation by Twitter sentiment analysis,’’ in Proc. Int. Conf. Inf. Sci. Commun. Technol. (ICISCT), Karachi, Pakistan, Nov. 2020, pp. 1–4.
[14] H. S. Jung, S. H. Lee, H. Lee, and J. H. Kim, ‘‘Predicting Bitcoin trends through machine learning using sentiment analysis with technical indicators,’’ Comput. Syst. Sci. Eng., vol. 46, no. 2, pp. 2231–2246, 2023, doi: 10.32604/csse.2023.034466.
[15] D. M. Blei, ‘‘Probabilistic topic models,’’ Commun. ACM., vol. 55, no. 4, pp. 77–84, Apr. 2012, doi: 10.1145/2133806.2133826.
[16] B. Yin and C.-H. Yuan, ‘‘Detecting latent topics and trends in blended learning using LDA topic modeling,’’ Educ. Inf. Technol., vol. 27, no. 9, pp. 12689–12712, Nov. 2022, doi: 10.1007/s10639-022-11118-0.
[17] E. Polyzos and F. Wang, ‘‘Twitter and market efficiency in energy markets: Evidence using LDA clustered topic extraction,’’ Energy Econ., vol. 114, Oct. 2022, Art. no. 106264, doi: 10.1016/j.eneco.2022.106264.
[18] C. Sharma, S. Sharma, and Sakshi, ‘‘Latent Dirichlet allocation (LDA) based information modelling on BLOCKCHAIN technology: A review of trends and research patterns used in integration,’’ Multimedia Tools Appl., vol. 81, no. 25, pp. 36805–36831, Oct. 2022, doi: 10.1007/s11042-022-13500-z.
[19] S. Avasthi, R. Chauhan, and D. P. Acharjya, ‘‘Topic modeling techniques for text mining over a large-scale scientific and biomedical text corpus,’’ Int. J. Ambient Comput. Intell., vol. 13, no. 1, pp. 1–18, Jan. 2022, doi: 10.4018/ijaci.293137.
[20] R. Egger and J. Yu, ‘‘A topic modeling comparison between LDA, NMF, Top2Vec, and BERTopic to demystify Twitter posts,’’ Frontiers Sociol., vol. 7, May 2022, Art. no. 886498, doi: 10.3389/fsoc.2022.886498.
[21] N. Reimers and I. Gurevych, ‘‘Sentence-BERT: Sentence embeddings using Siamese bert-networks,’’ 2019, arXiv:1908.10084.
[22] I. Assent, ‘‘Clustering high dimensional data,’’ Wiley Interdiscipl. Rev., Data Mining Knowl. Discovery, vol. 2, no. 4, pp. 340–350, Jun. 2012, doi: 10.1002/widm.1062.
[23] L. McInnes, J. Healy, and J. Melville, ‘‘UMAP: Uniform manifold approximation and projection for dimension reduction,’’ 2018, arXiv:1802.03426.
[24] L. McInnes, J. Healy, and S. Astels, ‘‘HDBSCAN: Hierarchical density based clustering,’’ J. Open Source Softw., vol. 2, no. 11, p. 205, Mar. 2017, doi: 10.21105/joss.00205.
[25] M. Honnibal and I. Montani, ‘‘SpaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing,’’ To Appear, vol. 7, no. 1, pp. 411–420, 2017.
[26] M. Röder, A. Both, and A. Hinneburg, ‘‘Exploring the space of topic coherence measures,’’ in Proc. 8th ACM Int. Conf. Web Search Data Mining, New York, NY, USA, Feb. 2015, pp. 399–408.
[27] C. Meaney, M. Escobar, T. A. Stukel, P. C. Austin, and L. Jaakkimainen, ‘‘Comparison of methods for estimating temporal topic models from primary care clinical text data: Retrospective closed cohort study,’’ JMIR Med. Informat., vol. 10, no. 12, Dec. 2022, Art. no. e40102.
[28] R. Grinberg, ‘‘Bitcoin: An innovative alternative digital currency,’’ Hastings Sci. Technol. Law J., vol. 4, p. 160, Dec. 2011.
[29] G. P. Dwyer, ‘‘The economics of Bitcoin and similar private digital currencies,’’ J. Financial Stability, vol. 17, pp. 81–91, Apr. 2015, doi: 10.1016/j.jfs.2014.11.006.
[30] J. Li, N. Li, J. Peng, H. Cui, and Z. Wu, ‘‘Energy consumption of cryptocurrency mining: A study of electricity consumption in mining cryptocurrencies,’’ Energy, vol. 168, pp. 160–168, Feb. 2019, doi: 10.1016/j.energy.2018.11.046.
[31] K. J. O’Dwyer and D. Malone, ‘‘Bitcoin mining and its energy footprint,’’ in Proc. 25th IET Irish Signals Syst. Conf. China-Ireland Int. Conf. Inf. Commun. Technol. (ISSC/CIICT), Limerick, Ireland, Jun. 2014, pp. 280–285.
[32] E. Livni and O. Lopez, ‘‘El Salvador’s adoption of Bitcoin is off to a rocky start,’’ The New York Times, Sep. 7, 2021. Accessed: Aug. 4, 2023. [Online]. Available: https://www.nytimes.com/2021/09/07/business/el-salvador-Bitcoin.html?searchResultPosition=2
[33] E. Barrett, ‘‘Ukraine already trades more crypto than fiat currency. Now assets like Bitcoin are officially legal,’’ Fortune, Feb. 18, 2022. Accessed: Aug. 4, 2023. [Online]. Available: https://fortune.com/2022/02/18/ukraine-legalizes-cryptocurrency-Bitcoin-russia-digital-assets/
[34] Triple A, Singapore. (2021). Cryptocurrency Information About Venezuela. Accessed: Aug. 4, 2023. [Online]. Available: https://triple-a.io/crypto-ownership-venezuela-2021/
[35] L. Davison and C. Condon, ‘‘Treasury calls for crypto transfers over $10,000 to be reported to IRS,’’ Bloomberg, May 20, 2021. Accessed: Aug. 4, 2023. [Online]. Available: https://www.bloomberg.com/news/articles/2021-05-20/treasury-calls-for-crypto-transfers-over-10-000-reported-to-irs#xj4y7vzkg
[36] D. A. Liedel, ‘‘The taxation of Bitcoin: How the IRS views cryptocurrencies,’’ Drake Law Rev., vol. 66, no. 1, pp. 107–146, 2018.
[37] K. Solodan, ‘‘Legal regulation of cryptocurrency taxation in European countries,’’ Eur. J. Law Public Admin., vol. 6, no. 1, pp. 64–74, Sep. 2019.
[38] M. Lerer, ‘‘The taxation of cryptocurrency: Virtual transactions bring real-life tax implications,’’ CPA J., vol. 89, no. 1, pp. 40–43, Jan. 2019.
[39] Y. Liu, Z. Yang, and Y. Benslimane, ‘‘Bitcoin data analysis using deep learning and statistical modeling,’’ in Proc. IEEE Int. Conf. Ind. Eng. Eng. Manage. (IEEM), Kuala Lumpur, Malaysia, Dec. 2022, pp. 127–131.
[40] C. Jones, ‘‘Bitcoin debate: Warren Buffett bear vs. Winklevoss twins bull,’’ Forbes, Feb. 23, 2018. Accessed: Aug. 4, 2023. [Online]. Available: https://www.forbes.com/sites/chuckjones/2018/02/23/Bitcoin-debate-warren-buffett-bear-vs-winklevoss-twins-bull/?sh=48452fd51331
[41] G. Ahlstrand and E. Gkritsi, ‘‘Binance Singapore drops crypto license plans in city-state,’’ CoinDesk, Dec. 13, 2021. Accessed: Aug. 4, 2023. [Online]. Available: https://www.coindesk.com/policy/2021/12/13/binance-singapore-drops-crypto-license-plans-in-city-state/
[42] J. Lim, ‘‘MAS orders crypto exchange platform Binance.com to stop services in Singapore,’’ The Straits Times, Sep. 9, 2021. Accessed: Aug. 4, 2023. [Online]. Available: https://www.straitstimes.com/business/banking/binancecom-placed-on-mas-investor-alert-list
[43] A. Singh, R. Rajak, H. Mistry, and P. Raut, ‘‘Aid, charity and donation tracking system using blockchain,’’ in Proc. 4th Int. Conf. Trends Electron. Informat. (ICOEI), Tirunelveli, India, Jun. 2020, pp. 457–462.
[44] E. Shaheen, M. A. Hamed, W. Zaghloul, E. A. Mostafa, A. E. Sharkawy, A. Mahmoud, A. Labeb, M. O. A. Enany, and G. Attiya, ‘‘A track donation system using blockchain,’’ in Proc. Int. Conf. Electron. Eng. (ICEEM), Menouf, Egypt, Jul. 2021, pp. 1–7.
[45] B. Lindrea, ‘‘Ukraine netted $70M in crypto donations since start of Russia conflict,’’ Cointelegraph, Feb. 27, 2023. Accessed: Aug. 4, 2023. [Online]. Available: https://cointelegraph.com/news/ukraine-netted-70m-in-crypto-donations-since-start-of-russia-conflict
[46] Forbes. (Sep. 20, 2022). What Really Happened to LUNA Crypto? Accessed: Aug. 4, 2023. [Online]. Available: https://www.forbes.com/sites/qai/2022/09/20/what-really-happened-to-luna-crypto/?sh=63540a624ff1
[47] S. Lee, J. Lee, and Y. Lee, ‘‘Dissecting the Terra-LUNA crash: Evidence from the spillover effect and information flow,’’ Finance Res. Lett., vol. 53, May 2023, Art. no. 103590, doi: 10.1016/j.frl.2022.103590.
[48] A. Briola, D. Vidal-Tomás, Y. Wang, and T. Aste, ‘‘Anatomy of a Stablecoin’s failure: The Terra-Luna case,’’ Finance Res. Lett., vol. 51, Jan. 2023, Art. no. 103358, doi: 10.1016/j.frl.2022.103358.
[49] M. H. Joo, Y. Nishikawa, and K. Dandapani, ‘‘Cryptocurrency, a successful application of blockchain technology,’’ Managerial Finance, vol. 46, no. 6, pp. 715–733, Aug. 2019, doi: 10.1108/mf-09-2018-0451.
[50] R. B. Fekih and M. Lahami, ‘‘Application of blockchain technology in healthcare: A comprehensive study,’’ in Proc. 18th Int. Conf. Smart Homes Health Telematics (ICOST), Hammamet, Tunisia, 2020, pp. 268–276.
[51] M. H. Miraz and M. Ali, ‘‘Applications of blockchain technology beyond cryptocurrency,’’ 2018, arXiv:1801.03528.
[52] P. Tasatanattakool and C. Techapanupreeda, ‘‘Blockchain: Challenges and applications,’’ in Proc. Int. Conf. Inf. Netw. (ICOIN), Chiang Mai, Thailand, Jan. 2018, pp. 473–475.
[53] D. G. Baur, K. Hong, and A. D. Lee, ‘‘Bitcoin: Medium of exchange or speculative assets?’’ J. Int. Financial Markets, Inst. Money, vol. 54, pp. 177–189, May 2018, doi: 10.1016/j.intfin.2017.12.004.


[54] E. Bouri, P. Molnár, G. Azzi, D. Roubaud, and L. I. Hagfors, ‘‘On the Hedge and safe haven properties of Bitcoin: Is it really more than a diversifier?’’ Finance Res. Lett., vol. 20, pp. 192–198, Feb. 2017, doi: 10.1016/j.frl.2016.09.025.
[55] A. Kliber, P. Marszałek, I. Musiałkowska, and K. Świerczyńska, ‘‘Bitcoin: Safe haven, Hedge or diversifier? Perception of Bitcoin in the context of a country’s economic situation—A stochastic volatility approach,’’ Phys. A, Stat. Mech. Appl., vol. 524, pp. 246–257, Jun. 2019, doi: 10.1016/j.physa.2019.04.145.
[56] S. Corbet, A. Meegan, C. Larkin, B. Lucey, and L. Yarovaya, ‘‘Exploring the dynamic relationships between cryptocurrencies and other financial assets,’’ Econ. Lett., vol. 165, pp. 28–34, Apr. 2018, doi: 10.1016/j.econlet.2018.01.004.
[57] R. Zhang and W. K. V. Chan, ‘‘Evaluation of energy consumption in block-chains with proof of work and proof of stake,’’ J. Phys., Conf. Ser., vol. 1584, no. 1, Jul. 2020, Art. no. 012023, doi: 10.1088/1742-6596/1584/1/012023.
[58] N. Lasla, L. Al-Sahan, M. Abdallah, and M. Younis, ‘‘Green-PoW: An energy-efficient blockchain proof-of-work consensus algorithm,’’ Comput. Netw., vol. 214, Sep. 2022, Art. no. 109118, doi: 10.1016/j.comnet.2022.109118.
[59] T. Pano and R. Kashef, ‘‘A complete VADER-based sentiment analysis of Bitcoin (BTC) tweets during the era of COVID-19,’’ Big Data Cogn. Comput., vol. 4, no. 4, p. 33, Nov. 2020, doi: 10.3390/bdcc4040033.
[60] R. H. D. Neves, ‘‘Bitcoin pricing: Impact of attractiveness variables,’’ Financial Innov., vol. 6, no. 1, pp. 1–18, Apr. 2020, doi: 10.1186/s40854-020-00176-3.
[61] T. Panagiotidis, T. Stengos, and O. Vravosinos, ‘‘The effects of markets, uncertainty and search intensity on Bitcoin returns,’’ Int. Rev. Financial Anal., vol. 63, pp. 220–242, May 2019, doi: 10.1016/j.irfa.2018.11.002.
[62] J. Korn, ‘‘Record $3.8 billion stolen in crypto hacks last year, report says,’’ CNN, Feb. 1, 2023. Accessed: Aug. 4, 2023. [Online]. Available: https://edition.cnn.com/2023/02/01/tech/crypto-hacks-2022/index.html
[63] D. Nelson and N. De, ‘‘FTX U.S. temporarily froze crypto withdrawals, adding to chaos of bankruptcy proceedings,’’ CoinDesk, Nov. 12, 2022. Accessed: Aug. 4, 2023. [Online]. Available: https://www.coindesk.com/business/2022/11/11/ftx-us-freezes-crypto-withdrawals-sending-millions-in-assets-to-bankruptcy-limbo/
[64] D. Johnson, A. Menezes, and S. Vanstone, ‘‘The elliptic curve digital signature algorithm (ECDSA),’’ Int. J. Inf. Secur., vol. 1, no. 1, pp. 36–63, Aug. 2001, doi: 10.1007/s102070100002.
[65] R. Gennaro, S. Goldfeder, and A. Narayanan, ‘‘Threshold-optimal DSA/ECDSA signatures and an application to Bitcoin wallet security,’’ in Proc. 14th Int. Conf. Appl. Cryptogr. Netw. Secur. (ACNS), vol. 14, Guildford, U.K., 2016, pp. 156–174.
[66] A. Feign, ‘‘What is an ICO?’’ CoinDesk, Dec. 12, 2022. [Online]. Available: https://www.coindesk.com/learn/what-is-an-ico/
[67] D. Boreiko and N. K. Sahdev. (2018). To ICO or Not to ICO-Empirical Analysis of Initial Coin Offerings and Token Sales. [Online]. Available: https://ssrn.com/abstract=3209180
[68] L. Rhue. (2018). Trust is All You Need: An Empirical Exploration of Initial Coin Offerings (ICOs) and ICO Reputation Scores. [Online]. Available: https://ssrn.com/abstract=3179723
[69] O. A. Karpenko, T. K. Blokhina, and L. V. Chebukhanova, ‘‘The initial coin offering (ICO) process: Regulation and risks,’’ J. Risk Financial Manage., vol. 14, no. 12, p. 599, Dec. 2021, doi: 10.3390/jrfm14120599.
[70] Y. Tsuchiya and N. Hiramoto, ‘‘How cryptocurrency is laundered: Case study of coincheck hacking incident,’’ Forensic Sci. Int., Rep., vol. 4, Nov. 2021, Art. no. 100241, doi: 10.1016/j.fsir.2021.100241.
[71] K. Grobys, ‘‘When the blockchain does not block: On hackings and uncertainty in the cryptocurrency market,’’ Quant. Finance, vol. 21, no. 8, pp. 1267–1279, Aug. 2021, doi: 10.1080/14697688.2020.1849779.
[72] U. W. Chohan. (2018). The Problems of Cryptocurrency Thefts and Exchange Shutdowns. [Online]. Available: https://ssrn.com/abstract=3131702
[73] L. Wang, ‘‘The challenge and prospect of scalability of blockchain technology,’’ in Proc. 5th Int. Conf. Comput. Sci. Artif. Intell., Beijing, China, Dec. 2021, pp. 296–301.
[74] M. Zamani, M. Movahedi, and M. Raykova, ‘‘RapidChain: Scaling blockchain via full sharding,’’ in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., Toronto, ON, Canada, Oct. 2018, pp. 931–948.
[75] H. Dang, T. T. A. Dinh, D. Loghin, E.-C. Chang, Q. Lin, and B. C. Ooi, ‘‘Towards scaling blockchain systems via sharding,’’ in Proc. Int. Conf. Manage. Data, Amsterdam, The Netherlands, Jun. 2019, pp. 123–140.
[76] S. Bonini, T. Shohfi, and M. Simaan. (2022). Buy the Dip? [Online]. Available: https://ssrn.com/abstract=3835376
[77] N. Sherman and J. Tidy, ‘‘Crypto giant FTX collapses into bankruptcy,’’ BBC, Nov. 11, 2022. Accessed: Aug. 4, 2023. [Online]. Available: https://www.bbc.com/news/business-63601213
[78] Immunebytes, New Delhi, India. (2022). Mainnet vs Testnet in Blockchain. Accessed: Aug. 4, 2023. [Online]. Available: https://www.immunebytes.com/blog/mainnet-vs-testnet-in-blockchain/
[79] D. M. Blei and J. D. Lafferty, ‘‘Dynamic topic models,’’ in Proc. ICML, Pittsburgh, PA, USA, 2006, pp. 113–120.
[80] A.-D. Vo, ‘‘Sentiment analysis of news for effective cryptocurrency price prediction,’’ Int. J. Knowl. Eng., vol. 5, no. 2, pp. 47–52, Dec. 2019, doi: 10.18178/ijke.2019.5.2.116.
[81] S. Boulianne, ‘‘Social media use and participation: A meta-analysis of current research,’’ Inf., Commun. Soc., vol. 18, no. 5, pp. 524–538, May 2015, doi: 10.1080/1369118x.2015.1008542.
[82] S. Bourgi, ‘‘Institutional investors increase their crypto holdings for 5th straight week,’’ Cointelegraph, Sep. 20, 2021. Accessed: Aug. 4, 2023. [Online]. Available: https://cointelegraph.com/news/institutional-investors-increase-their-crypto-holdings-for-5th-straight-week
[83] Y. B. Kim, J. Lee, N. Park, J. Choo, J.-H. Kim, and C. H. Kim, ‘‘When Bitcoin encounters information in an online forum: Using text mining to analyse user opinions and predict value fluctuation,’’ PLoS ONE, vol. 12, no. 5, May 2017, Art. no. e0177630, doi: 10.1371/journal.pone.0177630.
[84] S. Bibi, S. Hussain, and M. I. Faisal, ‘‘Public perception based recommendation system for cryptocurrency,’’ in Proc. 16th Int. Bhurban Conf. Appl. Sci. Technol. (IBCAST), Islamabad, Pakistan, Jan. 2019, pp. 661–665.

HAE SUN JUNG is currently pursuing the Ph.D. degree with the Department of Applied Artificial Intelligence, Sungkyunkwan University. His research interests include natural language processing, computer vision, deep learning, and machine learning.

HAEIN LEE is currently pursuing the Ph.D. degree with the Department of Applied Artificial Intelligence and the Department of Human-Artificial Intelligence Interaction, Sungkyunkwan University. Her research interests include natural language processing, deep learning, and machine learning.

JANG HYUN KIM is currently a Professor with the Department of Human-Artificial Intelligence Interaction, the Department of Interaction Science, and the Department of Applied Artificial Intelligence, Sungkyunkwan University. He has authored more than 50 articles in major journals, such as Information Processing and Management, Telematics and Informatics, Cities, Government Information Quarterly, Technological Forecasting and Social Change, and Journal of Computer-Mediated Communication. His research interests include social/semantic data analysis, social media, and future media.
