BitcoinDataAnalysisCaseStudy - Google Colab

The document discusses merging Bitcoin trading data from 2017-2019 into a single dataset and transforming the minute-level data into daily aggregates. Key steps include: 1) Importing and merging the minute-level Bitcoin price and volume data from 2017, 2018, and 2019. 2) Converting the data types and checking for duplicates and null values. 3) Sorting the data chronologically by date and setting date as the index. 4) Resampling the minute-level data to daily averages and sums to refine the analysis and make trends more clear.



"An Analysis of Bitcoin Trading Data from 2017-2019: A Brief Case Study Demonstrating Expertise
in Predictive Modeling of Closing Prices" This case study aims to present a streamlined approach
for modeling Bitcoin trading data on a minute-by-minute basis, spanning the years 2017 to 2019.
The primary objective is to develop a straightforward yet effective model to forecast Bitcoin's
closing prices, showcasing both a deep understanding of the cryptocurrency market and proficiency
in data analysis techniques.

from google.colab import drive

drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).

"Initially, we commence by integrating the datasets from 2017, 2018, and 2019 into a singular
comprehensive dataset."

import pandas as pd

# File paths
file_2017 = '/content/drive/MyDrive/BTC-2017min.csv'
file_2018 = '/content/drive/MyDrive/BTC-2018min.csv'
file_2019 = '/content/drive/MyDrive/BTC-2019min.csv'

# Load the datasets
data_2017 = pd.read_csv(file_2017)
data_2018 = pd.read_csv(file_2018)
data_2019 = pd.read_csv(file_2019)

# Merge the datasets
merged_data = pd.concat([data_2017, data_2018, data_2019])

# Save the merged dataset
merged_data.to_csv('/content/drive/My Drive/BTC_2017-2019_merged.csv', index=False)

Data exploration and data cleaning

Column Data Types:

merged_data.dtypes

unix int64
date object
symbol object
open float64
high float64
low float64
close float64
Volume BTC float64
Volume USD float64
dtype: object

Changing data types for usability:

# Convert 'unix' to datetime (assuming 'unix' is in seconds)
merged_data['unix'] = pd.to_datetime(merged_data['unix'], unit='s')

# Convert 'date' to datetime
merged_data['date'] = pd.to_datetime(merged_data['date'])

# Convert 'symbol' to string
merged_data['symbol'] = merged_data['symbol'].astype('string')

# Check the data types again
merged_data.dtypes

unix datetime64[ns]
date datetime64[ns]
symbol string
open float64
high float64
low float64
close float64
Volume BTC float64
Volume USD float64
dtype: object

Check whether 'unix' and 'date' are the same:

# Compare 'unix' and 'date'

# Create a new column 'is_same' to check if 'unix' and 'date' are the same (up to seconds)
merged_data['is_same'] = merged_data['unix'].dt.floor('S') == merged_data['date'].dt.floor('S')

# Check the comparison results
print(merged_data[['unix', 'date', 'is_same']].head())

# Unique values
merged_data['is_same'].unique()

                 unix                date  is_same
0 2017-12-31 23:59:00 2017-12-31 23:59:00     True
1 2017-12-31 23:58:00 2017-12-31 23:58:00     True
2 2017-12-31 23:57:00 2017-12-31 23:57:00     True
3 2017-12-31 23:56:00 2017-12-31 23:56:00     True
4 2017-12-31 23:55:00 2017-12-31 23:55:00     True

array([ True])

Since 'unix' and 'date' are an exact match, drop 'unix':

# Drop the 'unix' and 'is_same' columns
merged_data = merged_data.drop(columns=['unix', 'is_same'])
merged_data.head()

                 date   symbol      open      high       low     close  Volume BTC    Volume USD
0 2017-12-31 23:59:00  BTC/USD  13913.28  13913.28  13867.18  13880.00    0.591748   8213.456549
1 2017-12-31 23:58:00  BTC/USD  13913.26  13953.83  13884.69  13953.77    1.398784  19518.309658

Value Counts for symbol:

merged_data['symbol'].value_counts()

BTC/USD 1576797
Name: symbol, dtype: Int64

Unique Values in a Column:

merged_data['symbol'].nunique()

Correlation Matrix: To check the correlation between different numerical columns:

merged_data.corr()


<ipython-input-54-cc54846d37e8>:1: FutureWarning: The default value of numeric_only in DataFrame.corr is deprecated. In a future version, it will default to False. Select only valid columns or specify the value of numeric_only to silence this warning.
  merged_data.corr()
                open      high       low     close  Volume BTC  Volume USD
open        1.000000  0.999997  0.999996  0.999995    0.027315    0.252212
high        0.999997  1.000000  0.999994  0.999997    0.028008    0.253042
low         0.999996  0.999994  1.000000  0.999996    0.026497    0.251236
close       0.999995  0.999997  0.999996  1.000000    0.027233    0.252119
Volume BTC  0.027315  0.028008  0.026497  0.027233    1.000000    0.831629
Volume USD  0.252212  0.253042  0.251236  0.252119    0.831629    1.000000
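
The FutureWarning above is raised because corr() is being asked to handle the non-numeric 'date' and 'symbol' columns. A minimal sketch of silencing it by restricting the correlation to numeric columns, assuming a pandas version that accepts the numeric_only keyword:

# Compute the correlation over numeric columns only to avoid the FutureWarning
merged_data.corr(numeric_only=True)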

Sample of Data:

merged_data.sample(5)

                       date   symbol     open     high      low    close  Volume BTC   Volume USD
213716  2017-08-05 14:03:00  BTC/USD  3156.07  3156.07  3156.07  3156.07    0.000000     0.000000
174849  2018-09-01 13:50:00  BTC/USD  7041.85  7051.72  7039.02  7051.72    1.109430  7823.389085

Check for duplicate and null entries in the data:

merged_data.duplicated().sum()

merged_data.isnull().sum()

date 0
symbol 0
open 0
high 0
low 0
close 0
Volume BTC 0
Volume USD 0
dtype: int64


merged_data['date'].head(5)

0 2017-12-31 23:59:00
1 2017-12-31 23:58:00
2 2017-12-31 23:57:00
3 2017-12-31 23:56:00
4 2017-12-31 23:55:00
Name: date, dtype: datetime64[ns]

merged_data['date'].nunique()

1576797

"Next, we proceed to meticulously organize the minutely data in chronological order. This sorting
by date is crucial for maintaining the integrity of the time series."

merged_data_sorted = merged_data.sort_values(by='date')

"we then transform the minutely data into a daily format. This conversion is aimed at refining the
analysis process. Aggregating the data on a daily basis allows for a clearer, more manageable
overview of trends and patterns, which is particularly beneficial for more effective and insightful
analysis."

import pandas as pd

# Convert 'date' to datetime if not already done
merged_data_sorted['date'] = pd.to_datetime(merged_data_sorted['date'])

# Set the 'date' column as the index
merged_data_sorted.set_index('date', inplace=True)

# Resample to daily data and aggregate
daily_data = merged_data_sorted.resample('D').agg({
    'open': 'mean',        # mean of open prices
    'high': 'mean',        # mean of high prices
    'low': 'mean',         # mean of low prices
    'close': 'mean',       # mean of close prices
    'Volume BTC': 'sum',   # sum of BTC volumes
    'Volume USD': 'sum'    # sum of USD volumes
})
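
Averaging the intraday opens, highs, lows, and closes is one way to build daily bars; a common alternative, sketched below under a hypothetical daily_ohlc name, keeps conventional OHLC semantics by taking the first open, highest high, lowest low, and last close of each day.

# Alternative daily aggregation with conventional OHLC semantics (a sketch, not used above)
daily_ohlc = merged_data_sorted.resample('D').agg({
    'open': 'first',       # first open of the day
    'high': 'max',         # highest high of the day
    'low': 'min',          # lowest low of the day
    'close': 'last',       # last close of the day
    'Volume BTC': 'sum',   # total BTC volume
    'Volume USD': 'sum'    # total USD volume
})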

Reset the index and inspect the data:


daily_data.reset_index(inplace=True)
daily_data.head(5)

         date         open         high          low        close   Volume BTC    Volume USD
0  2017-01-01   977.256602   977.385233   977.132620   977.276060  6850.593309  6.765936e+06
1  2017-01-02  1012.267604  1012.517181  1011.988826  1012.273903  8167.381030  8.276031e+06
2  2017-01-03  1020.001535  1020.226840  1019.794437  1020.040472  9089.658025  9.276735e+06

The next step is to store this refined dataset. We accomplish this by exporting 'daily_data' to a CSV file.

daily_data.to_csv('/content/drive/MyDrive/daily_data.csv', index = False)

Adding a moving average (or moving mean) to your dataset is a common technique in time series
analysis, especially in financial data analysis. It helps in smoothing out short-term fluctuations and
highlighting longer-term trends or cycles.

import pandas as pd

# Load the daily dataset
daily_data = pd.read_csv('/content/drive/MyDrive/daily_data.csv')

# Set 'date' as the index
daily_data.set_index('date', inplace=True)

# Choose a window size for the moving average: 20 days
window_size = 20

# Calculate the moving average of the 'close' price
daily_data['moving_average_close'] = daily_data['close'].rolling(window=window_size).mean()

# daily_data now has an additional column with the 20-day moving average of the close price
daily_data.reset_index(inplace=True)

# The first 19 rows are null because of the 20-day window;
# fill them with the close price as a default value
daily_data['moving_average_close'].fillna(daily_data['close'], inplace=True)

print(daily_data.head(25))  # Display the first 25 rows to see some of the moving average values

date open high low close \


0 2017-01-01 977.256602 977.385233 977.132620 977.276060
1 2017-01-02 1012.267604 1012.517181 1011.988826 1012.273903

https://colab.research.google.com/drive/1YzxhLK6bzcIR-j5btF8wVutr8BZsToAR#scrollTo=bC29Dc8VaDPR&printMode=true 6/10
1/13/24, 9:58 PM Untitled0.ipynb - Colaboratory
2 2017-01-03 1020.001535 1020.226840 1019.794437 1020.040472
3 2017-01-04 1076.558840 1077.271167 1075.572542 1076.553639
4 2017-01-05 1043.608646 1044.905549 1042.094125 1043.547951
5 2017-01-06 934.455278 935.419188 933.269312 934.416729
6 2017-01-07 869.618951 870.700465 868.904215 869.738333
7 2017-01-08 914.224917 914.637931 913.597944 913.966083
8 2017-01-09 893.495403 893.856319 893.047132 893.471535
9 2017-01-10 902.637313 902.858104 902.343042 902.638375
10 2017-01-11 846.285701 847.306472 845.153167 846.173313
11 2017-01-12 782.970292 783.542347 782.386604 782.961688
12 2017-01-13 807.222361 807.674451 806.778986 807.177507
13 2017-01-14 827.433958 827.573681 827.271819 827.412431
14 2017-01-15 817.127076 817.298757 816.911528 817.081007
15 2017-01-16 827.983313 828.105757 827.839722 827.977958
16 2017-01-17 876.174264 876.640868 875.760896 876.181472
17 2017-01-18 885.380229 885.695486 885.050083 885.345653
18 2017-01-19 893.326090 893.591750 893.053882 893.294389
19 2017-01-20 895.566618 895.695965 895.415535 895.552688
20 2017-01-21 917.654201 917.797965 917.532965 917.679382
21 2017-01-22 922.678097 922.897590 922.452764 922.689111
22 2017-01-23 920.178285 920.327340 919.997403 920.188507
23 2017-01-24 907.212306 907.418618 906.928014 907.186326
24 2017-01-25 894.505562 894.616382 894.376438 894.509347

Volume BTC Volume USD moving_average_close


0 6850.593309 6.765936e+06 977.276060
1 8167.381030 8.276031e+06 1012.273903
2 9089.658025 9.276735e+06 1020.040472
3 21562.456972 2.347651e+07 1076.553639
4 36018.861120 3.619081e+07 1043.547951
5 27916.703099 2.553144e+07 934.416729
6 20401.113591 1.761907e+07 869.738333
7 8937.492708 8.164011e+06 913.966083
8 8716.182941 7.782149e+06 893.471535
9 8535.521688 7.706384e+06 902.638375
10 35893.768368 2.945219e+07 846.173313
11 17400.141555 1.363246e+07 782.961688
12 11409.520330 9.224971e+06 807.177507
13 6614.718992 5.469742e+06 827.412431
14 4231.463903 3.454909e+06 817.081007
15 6166.043977 5.107435e+06 827.977958
16 12264.169385 1.077497e+07 876.181472
17 11181.898878 9.830026e+06 885.345653
18 11094.603298 9.928565e+06 893.294389
19 6618.627764 5.915721e+06 905.154059
20 5865.632031 5.373761e+06 902.174225
21 7166.665479 6.566289e+06 897.694986
22 3514.741429 3.234650e+06 892.702387
23 9405.046565 8.497003e+06 884.234022
24 5291.554742 4.725942e+06 876.782092
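
Filling the first 19 rows with the raw close is one way to handle the leading nulls; another option, sketched below with a hypothetical moving_average_close_alt column, is to let the window shrink at the start via min_periods, so the early values are partial-window means rather than copies of the close.

# Alternative handling of the leading nulls: allow a shrinking window at the start (a sketch)
daily_data['moving_average_close_alt'] = (
    daily_data['close'].rolling(window=window_size, min_periods=1).mean()
)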

daily_data['date'].head()

0 2017-01-01
1 2017-01-02

https://colab.research.google.com/drive/1YzxhLK6bzcIR-j5btF8wVutr8BZsToAR#scrollTo=bC29Dc8VaDPR&printMode=true 7/10
1/13/24, 9:58 PM Untitled0.ipynb - Colaboratory
2 2017-01-03
3 2017-01-04
4 2017-01-05
Name: date, dtype: object
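
Note that after the CSV round trip the 'date' column comes back with dtype object (plain strings). A minimal sketch of converting it back to datetime so date-based operations keep working; re-reading the file with parse_dates=['date'] would be an equivalent option at load time.

# Convert 'date' back to datetime64 after the CSV round trip (a sketch)
daily_data['date'] = pd.to_datetime(daily_data['date'])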

daily_data.head()

         date         open         high          low        close   Volume BTC    Volume USD  moving_average_close
0  2017-01-01   977.256602   977.385233   977.132620   977.276060  6850.593309  6.765936e+06            977.276060
1  2017-01-02  1012.267604  1012.517181  1011.988826  1012.273903  8167.381030  8.276031e+06           1012.273903
2  2017-01-03  1020.001535  1020.226840  1019.794437  1020.040472  9089.658025  9.276735e+06           1020.040472

"After preparing and saving the daily data, we shift our focus to developing a predictive model. For
this purpose, a Linear Regression model is chosen due to its effectiveness in capturing linear
relationships between variables. In this context, the model will be employed to understand and
predict the closing prices of Bitcoin based on the daily data trends observed over the previous
years."


import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error
import numpy as np

# Separate features and target
X = daily_data[['open', 'high', 'low', 'Volume BTC', 'Volume USD', 'moving_average_close']]
y = daily_data['close']

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict and evaluate
predictions = model.predict(X_test)
mae = mean_absolute_error(y_test, predictions)
rmse = np.sqrt(mean_squared_error(y_test, predictions))

print(f'Mean Absolute Error: {mae}')
print(f'Root Mean Squared Error: {rmse}')

Mean Absolute Error: 0.33095838248461557
Root Mean Squared Error: 0.540670528986689
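
To see how each input contributes to the fitted linear relationship, the learned coefficients can be paired with the feature names; a minimal sketch, assuming the model and X defined above:

# Pair the learned coefficients with their feature names (a sketch)
coef_table = pd.Series(model.coef_, index=X.columns)
print(coef_table)
print('Intercept:', model.intercept_)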

Scaling the entire dataset is an important preprocessing step, especially in regression analysis
where features might have different scales and units. This can significantly impact the performance
of many machine learning algorithms, including linear regression.


from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np

# Initialize the StandardScaler
scaler = StandardScaler()

# Scale the training data and apply the same transformation to the test data
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Initialize the Linear Regression model
linear_model = LinearRegression()
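
The fitting and evaluation of the scaled model are not shown above; a minimal sketch of that step, mirroring the earlier evaluation and assuming the same MAE/RMSE metrics are intended:

# Fit the Linear Regression model on the scaled features and evaluate (a sketch of the next step)
linear_model.fit(X_train_scaled, y_train)
scaled_predictions = linear_model.predict(X_test_scaled)

scaled_mae = mean_absolute_error(y_test, scaled_predictions)
scaled_rmse = np.sqrt(mean_squared_error(y_test, scaled_predictions))

print(f'Mean Absolute Error (scaled features): {scaled_mae}')
print(f'Root Mean Squared Error (scaled features): {scaled_rmse}')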

