BitcoinDataAnalysisCaseStudy - Google Colab
"An Analysis of Bitcoin Trading Data from 2017-2019: A Brief Case Study Demonstrating Expertise
in Predictive Modeling of Closing Prices" This case study aims to present a streamlined approach
for modeling Bitcoin trading data on a minute-by-minute basis, spanning the years 2017 to 2019.
The primary objective is to develop a straightforward yet effective model to forecast Bitcoin's
closing prices, showcasing both a deep understanding of the cryptocurrency market and proficiency
in data analysis techniques.
"Initially, we commence by integrating the datasets from 2017, 2018, and 2019 into a singular
comprehensive dataset."
import pandas as pd

# File paths for the minutely BTC data from each year
file_2017 = '/content/drive/MyDrive/BTC-2017min.csv'
file_2018 = '/content/drive/MyDrive/BTC-2018min.csv'
file_2019 = '/content/drive/MyDrive/BTC-2019min.csv'

# Read each year's file and combine into a single DataFrame (the merge line isn't fully visible in the printout)
merged_data = pd.concat([pd.read_csv(f) for f in (file_2017, file_2018, file_2019)], ignore_index=True)
merged_data.dtypes
unix int64
date object
symbol object
open float64
high float64
low float64
close float64
Volume BTC float64
Volume USD float64
dtype: object
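The raw types are then converted to richer dtypes. The conversion cell itself isn't visible in this printout; the following is a minimal sketch consistent with the dtypes shown below, assuming the unix column holds epoch seconds.

# Sketch of the type-conversion step (original cell not shown in the printout)
merged_data['unix'] = pd.to_datetime(merged_data['unix'], unit='s')   # assuming epoch seconds
merged_data['date'] = pd.to_datetime(merged_data['date'])
merged_data['symbol'] = merged_data['symbol'].astype('string')
merged_data.dtypes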
unix datetime64[ns]
date datetime64[ns]
symbol string
open float64
high float64
low float64
close float64
Volume BTC float64
Volume USD float64
dtype: object
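The array([ True]) output below looks like the result of a sanity check, for example verifying that the unix timestamps and the date column agree after conversion. The original cell isn't visible, so the line below is only an illustrative guess at an equivalent check.

# Hypothetical check that the two timestamp columns agree after conversion
(merged_data['unix'] == merged_data['date']).unique()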
array([ True])
                  date   symbol      open      high       low     close  Volume BTC    Volume USD
0  2017-12-31 23:59:00  BTC/USD  13913.28  13913.28  13867.18  13880.00    0.591748   8213.456549
1  2017-12-31 23:58:00  BTC/USD  13913.26  13953.83  13884.69  13953.77    1.398784  19518.309658
merged_data['symbol'].value_counts()
BTC/USD 1576797
Name: symbol, dtype: Int64
merged_data['symbol'].nunique()
merged_data.corr()
Sample of Data:
merged_data.sample(5)
                       date   symbol     open     high      low    close  Volume BTC   Volume USD
213716  2017-08-05 14:03:00  BTC/USD  3156.07  3156.07  3156.07  3156.07    0.000000     0.000000
174849  2018-09-01 13:50:00  BTC/USD  7041.85  7051.72  7039.02  7051.72    1.109430  7823.389085
merged_data.duplicated().sum()
merged_data.isnull().sum()
date 0
symbol 0
open 0
high 0
low 0
close 0
Volume BTC 0
Volume USD 0
dtype: int64
merged_data['date'].head(5)
0 2017-12-31 23:59:00
1 2017-12-31 23:58:00
2 2017-12-31 23:57:00
3 2017-12-31 23:56:00
4 2017-12-31 23:55:00
Name: date, dtype: datetime64[ns]
merged_data['date'].nunique()
1576797
"Next, we proceed to meticulously organize the minutely data in chronological order. This sorting
by date is crucial for maintaining the integrity of the time series."
merged_data_sorted = merged_data.sort_values(by='date')
"we then transform the minutely data into a daily format. This conversion is aimed at refining the
analysis process. Aggregating the data on a daily basis allows for a clearer, more manageable
overview of trends and patterns, which is particularly beneficial for more effective and insightful
analysis."
import pandas as pd
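# The aggregation cell isn't visible in this printout; a minimal sketch, assuming the
# minutely rows are resampled to daily means (consistent with the output below, where
# the daily open/high/low/close values are nearly identical):
daily_data = (
    merged_data_sorted
    .set_index('date')
    .resample('D')
    .mean(numeric_only=True)
)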
daily_data.reset_index(inplace=True)
daily_data.head(5)
         date         open         high          low        close   Volume BTC    Volume USD
0  2017-01-01   977.256602   977.385233   977.132620   977.276060  6850.593309  6.765936e+06
1  2017-01-02  1012.267604  1012.517181  1011.988826  1012.273903  8167.381030  8.276031e+06
2  2017-01-03  1020.001535  1020.226840  1019.794437  1020.040472  9089.658025  9.276735e+06
The next step is to store this refined dataset, which we accomplish by exporting 'daily_data' to a CSV file.
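The export cell itself is not shown in the printout; a minimal sketch follows, with a hypothetical output path.

# Hypothetical path; the original export cell is not visible in this printout
daily_data.to_csv('/content/drive/MyDrive/BTC-daily.csv', index=False)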
Adding a moving average (or moving mean) to your dataset is a common technique in time series
analysis, especially in financial data analysis. It helps in smoothing out short-term fluctuations and
highlighting longer-term trends or cycles.
import pandas as pd
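# The cells that reload the daily data and compute the moving average aren't visible
# in this printout; a minimal sketch, assuming the CSV exported above is read back
# with 'date' as the index (the path is hypothetical):
daily_data = pd.read_csv('/content/drive/MyDrive/BTC-daily.csv', index_col='date')

# 20-day moving average of the daily closing price
daily_data['moving_average_close'] = daily_data['close'].rolling(window=20).mean()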
# Now the daily_data DataFrame has an additional column with the 20-day moving average of the close price.
daily_data.reset_index('date', inplace=True)
# Replace the first 19 rows, which are NaN because of the 20-day window.
# Here the close price itself is used as the default fill value.
daily_data['moving_average_close'] = daily_data['moving_average_close'].fillna(daily_data['close'])
print(daily_data.head(25))  # Display the first 25 rows to see some of the moving average values
          date         open         high          low        close
0   2017-01-01   977.256602   977.385233   977.132620   977.276060
1   2017-01-02  1012.267604  1012.517181  1011.988826  1012.273903
2   2017-01-03  1020.001535  1020.226840  1019.794437  1020.040472
3 2017-01-04 1076.558840 1077.271167 1075.572542 1076.553639
4 2017-01-05 1043.608646 1044.905549 1042.094125 1043.547951
5 2017-01-06 934.455278 935.419188 933.269312 934.416729
6 2017-01-07 869.618951 870.700465 868.904215 869.738333
7 2017-01-08 914.224917 914.637931 913.597944 913.966083
8 2017-01-09 893.495403 893.856319 893.047132 893.471535
9 2017-01-10 902.637313 902.858104 902.343042 902.638375
10 2017-01-11 846.285701 847.306472 845.153167 846.173313
11 2017-01-12 782.970292 783.542347 782.386604 782.961688
12 2017-01-13 807.222361 807.674451 806.778986 807.177507
13 2017-01-14 827.433958 827.573681 827.271819 827.412431
14 2017-01-15 817.127076 817.298757 816.911528 817.081007
15 2017-01-16 827.983313 828.105757 827.839722 827.977958
16 2017-01-17 876.174264 876.640868 875.760896 876.181472
17 2017-01-18 885.380229 885.695486 885.050083 885.345653
18 2017-01-19 893.326090 893.591750 893.053882 893.294389
19 2017-01-20 895.566618 895.695965 895.415535 895.552688
20 2017-01-21 917.654201 917.797965 917.532965 917.679382
21 2017-01-22 922.678097 922.897590 922.452764 922.689111
22 2017-01-23 920.178285 920.327340 919.997403 920.188507
23 2017-01-24 907.212306 907.418618 906.928014 907.186326
24 2017-01-25 894.505562 894.616382 894.376438 894.509347
daily_data['date'].head()
0 2017-01-01
1 2017-01-02
2 2017-01-03
3 2017-01-04
4 2017-01-05
Name: date, dtype: object
daily_data.head()
         date         open         high          low        close   Volume BTC    Volume USD
0  2017-01-01   977.256602   977.385233   977.132620   977.276060  6850.593309  6.765936e+06
1  2017-01-02  1012.267604  1012.517181  1011.988826  1012.273903  8167.381030  8.276031e+06
2  2017-01-03  1020.001535  1020.226840  1019.794437  1020.040472  9089.658025  9.276735e+06
"After preparing and saving the daily data, we shift our focus to developing a predictive model. For
this purpose, a Linear Regression model is chosen due to its effectiveness in capturing linear
relationships between variables. In this context, the model will be employed to understand and
predict the closing prices of Bitcoin based on the daily data trends observed over the previous
years."
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error
import numpy as np
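The feature-selection and train/test split cell is not included in this printout. The sketch below assumes the daily open, high, low, volume columns, and the moving average serve as features, with the closing price as the target; the exact feature set and split ratio are assumptions.

# Hypothetical feature set and split; the original cell isn't shown in this printout
feature_cols = ['open', 'high', 'low', 'Volume BTC', 'Volume USD', 'moving_average_close']
X = daily_data[feature_cols]
y = daily_data['close']

# Hold out 20% of the days for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)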
Scaling the dataset is an important preprocessing step, especially in regression analysis where features have different scales and units. Although scaling does not change the predictions of ordinary least-squares regression, it makes coefficients comparable, improves numerical conditioning, and is essential for many other algorithms, such as regularized or gradient-based models.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

# Scale the training data and apply the same transformation to the test data
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
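With the data scaled, the model can be fit and evaluated using the metrics imported earlier. The training and evaluation cells are not included in this printout, so the following is a minimal sketch rather than the notebook's exact code.

# Fit the linear regression on the scaled training features
model = LinearRegression()
model.fit(X_train_scaled, y_train)

# Predict on the held-out test set and report error metrics
y_pred = model.predict(X_test_scaled)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
mae = mean_absolute_error(y_test, y_pred)
print(f"Test RMSE: {rmse:.2f}, Test MAE: {mae:.2f}")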