
BTC Price Prediction Using Linear Regression
Step 1: Reading and Inspecting the Data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Reading the CSV file and loading it into a DataFrame
df = pd.read_csv('BTC-USD.csv')

# Checking basic information about the DataFrame
df.info()

Explanation:
• Imports: The necessary libraries are imported (numpy, pandas, matplotlib, seaborn, sklearn).
• Data Loading: The BTC-USD.csv file is loaded into a Pandas DataFrame (df).
• Data Inspection: df.info() provides basic information about the DataFrame, such as column names, data types, and missing values.

DataFrame Information
The info() method gives a concise summary of the DataFrame. It provides the following details:
• Data types of each column
• Non-null counts
• Memory usage

df.info()

This helps in understanding the structure of the data and identifying any potential issues such as missing values or incorrect data types.
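
Going one step beyond df.info(), a short check like the following (a minimal sketch, assuming the same df loaded above) counts missing values per column and summarises the numeric columns:

# Counting missing values in each column
print(df.isnull().sum())

# Summary statistics (count, mean, std, min/max) for the numeric columns
print(df.describe())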

Step 2: Data Preprocessing and Visualization

Converting the 'Date' Column
# Converting the 'Date' column datatype from object to datetime
df['Date'] = pd.to_datetime(df['Date'])

# Checking updated information after conversion
df.info()

Explanation:
• Date Conversion: The 'Date' column is converted from object type to datetime using pd.to_datetime().
• Updated Information: df.info() confirms the conversion, showing that the 'Date' column now has the datetime datatype.
Visualizing Data with Scatter Plots
# Visualizing data with scatter plots
plt.figure(figsize=(8, 6))
plt.scatter(df['Date'], df['High'])
plt.ylabel('High')
plt.xlabel('Date')
plt.title("Date vs. High (Scatter Plot)")
plt.show()

Explanation:
• Visualization: A scatter plot (plt.scatter) is created to visualize the relationship between 'Date' and 'High' prices, helping to understand the data distribution and trends.

Step 3: Exploring Data Relationships and Trends


Scatter Plot of 'Date' vs. 'Low'
# Scatter plot of 'Date' vs. 'Low'
plt.figure(figsize=(8, 6))
plt.scatter(df['Date'], df['Low'])
plt.ylabel('Low')
plt.xlabel('Date')
plt.title("Date vs. Low (Scatter Plot)")
plt.show()
Line Plot of 'Date' with 'High' and 'Low' Prices
# Line plot of 'Date' with 'High' and 'Low' prices
plt.plot(df['Date'], df['High'], label='High')
plt.plot(df['Date'], df['Low'], label='Low')
plt.xlabel('Date')
plt.ylabel('Price')
plt.title('High and Low Prices Over Time')
plt.legend()
plt.show()

Explanation:
• Visualization Continues: Another scatter plot shows the relationship between 'Date' and 'Low' prices.
• Price Trends: Line plots (plt.plot) are used to visualize the trends of 'High' and 'Low' prices over time, providing insights into price volatility and historical movements.

Step 4: Understanding Data Correlations

Heatmap of Correlations Among Numerical Columns
# Heatmap of correlations among numerical columns
numerical_cols = ['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']
corr_matrix = df[numerical_cols].corr()

sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()

Explanation:
• Correlation Heatmap: sns.heatmap() creates a heatmap to visualize correlations (corr()) among the numerical columns ('Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume'). This helps in understanding how different variables are related, which is crucial for feature selection in modeling.

Detailed Analysis of Correlations
• Strong Positive Correlation: Observing strong correlations between 'High' and 'Close', 'Open' and 'Close', etc.
• Weak Correlation: Identifying columns with weaker correlations, which might not be as useful for prediction (see the sketch below).
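
Building on the heatmap, a short sketch like the following (using the corr_matrix computed above) ranks the features by the strength of their correlation with 'Close', which makes the strong and weak relationships easy to read off:

# Ranking features by the strength of their correlation with 'Close'
close_corr = corr_matrix['Close'].drop('Close')
print(close_corr.abs().sort_values(ascending=False))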

Step 5: Model Preparation and Feature Selection

Selecting Relevant Features for Modeling
# Selecting relevant features for modeling and defining the target variable
X = df[['Open', 'High', 'Low', 'Volume']]
y = df['Close']

# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Checking the first few rows of the training set
print(X_train.head())
print(y_train.head())

Explanation:
• Feature Selection: Features (X) such as 'Open', 'High', 'Low', and 'Volume' are selected for modeling, while 'Close' is chosen as the target variable (y).
• Data Splitting: train_test_split() splits the data into training (X_train, y_train) and testing (X_test, y_test) sets with a test size of 30% and a fixed random state for reproducibility.
• Data Validation: head() displays the first few rows of the training set to verify the correct selection and splitting of the data.

Feature Engineering
• Feature Transformation: Discuss potential feature transformations (e.g., a log transformation) to improve model performance.
• Handling Missing Values: Describe steps to handle any missing values, if present (a rough sketch of both ideas follows below).
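
The two bullets above can be made concrete with a minimal sketch; the forward-fill strategy and the log transform of 'Volume' are illustrative assumptions and are not applied to the model trained later in this chapter:

# Handling missing values: forward-fill gaps, then drop any rows that remain incomplete
df_clean = df.ffill().dropna()

# Feature transformation: log-transform the typically skewed 'Volume' column
# (np.log1p handles zero volumes safely)
df_clean['LogVolume'] = np.log1p(df_clean['Volume'])

print(df_clean[['Volume', 'LogVolume']].head())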


Step 6: Model Training and Evaluation

Initializing and Training the Linear Regression Model
# Initializing and training the Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)

Predicting on the Test Set
# Predicting on the test set
y_pred = model.predict(X_test)

Evaluating Model Performance
# Evaluating model performance
r_squared = model.score(X_test, y_test)
print('Coefficient of determination (R^2):', r_squared)

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)

print("Mean Squared Error:", mse)
print("Root Mean Squared Error:", rmse)
print("Mean Absolute Error:", mae)


Explanation:
• Model Initialization and Training: LinearRegression() initializes a Linear Regression model (model), which is then trained (fit()) on the training data (X_train, y_train).
• Prediction and Evaluation: predict() predicts 'Close' prices on the test set (X_test), and model performance metrics such as R-squared (score()), Mean Squared Error (mean_squared_error()), Root Mean Squared Error (np.sqrt()), and Mean Absolute Error (mean_absolute_error()) are calculated and printed.

Detailed Performance Metrics Analysis
• R-squared Interpretation: Explaining the coefficient of determination and its significance.
• Error Metrics: Detailed interpretation of MSE, RMSE, and MAE, and their implications for model performance (the sketch below recomputes them from their definitions).
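
As a sanity check on the values printed by scikit-learn, the same metrics can be recomputed directly from their definitions, using the y_test and y_pred arrays from Step 6:

# Recomputing the metrics from their definitions
residuals = y_test - y_pred

mse_manual = np.mean(residuals ** 2)       # Mean Squared Error
rmse_manual = np.sqrt(mse_manual)          # Root Mean Squared Error
mae_manual = np.mean(np.abs(residuals))    # Mean Absolute Error
r2_manual = 1 - np.sum(residuals ** 2) / np.sum((y_test - y_test.mean()) ** 2)

print("MSE:", mse_manual, "RMSE:", rmse_manual, "MAE:", mae_manual, "R^2:", r2_manual)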

Step 7: Interpreting Model Results

Extracting Model Coefficients and Intercept
# Extracting model coefficients and intercept
coefficients = model.coef_
intercept = model.intercept_
print("Coefficients (w):", coefficients)
print("Intercept (b):", intercept)

Explanation:
• Model Coefficients: coef_ retrieves the coefficients of the features (Open, High, Low, Volume) in the Linear Regression model (model), while intercept_ retrieves the intercept (b).
• Understanding Impact: Printing these coefficients and the intercept helps in understanding their impact on predicting the 'Close' price based on the selected features.

Interpretation of Coefficients
• Feature Impact: Detailed discussion on how each feature impacts the target variable ('Close').
• Significance Testing: Introduction to significance testing of the coefficients (e.g., p-values); a sketch of one way to obtain them follows below.
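
scikit-learn's LinearRegression does not report p-values directly; one common approach is to refit the same specification with the statsmodels library, as in this sketch (assuming statsmodels is installed):

# Significance testing of the coefficients with statsmodels
import statsmodels.api as sm

X_train_const = sm.add_constant(X_train)   # add an explicit intercept term
ols_model = sm.OLS(y_train, X_train_const).fit()
print(ols_model.summary())                 # coefficients, standard errors, p-values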

Step 8: Making a Prediction

Example Prediction Using the Model
# Example prediction using the model
# Feature values in the order expected by the model: Open, High, Low, Volume
input_data = pd.DataFrame(
    [[3822.384766, 3901.908936, 3797.219238, 4770578575]],
    columns=['Open', 'High', 'Low', 'Volume'])
predicted_close_price = model.predict(input_data)
print("Predicted Closing Price:", predicted_close_price[0])

Explanation:
• Prediction Example: An example input
(input_data) is used to predict the closing price
('Close') using the trained model
(model.predict()), providing a practical
application of the regression model for
forecasting.

Real-World Application
• Use Case Scenarios: Discuss potential real-world scenarios where this model can be applied (e.g., trading strategies, market analysis).
• Model Limitations: Highlight limitations of the model and potential areas for improvement (a sketch of one such check follows below).
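
One concrete limitation is that train_test_split shuffles the rows, so the model is partly evaluated on dates earlier than some of its training data. A minimal sketch of a more realistic, time-ordered evaluation (an assumption about how one might check this, using the same df and feature columns) would be:

# Time-ordered evaluation: train on the earliest 70% of dates, test on the most recent 30%
df_time = df.sort_values('Date')
split = int(len(df_time) * 0.7)

X_time = df_time[['Open', 'High', 'Low', 'Volume']]
y_time = df_time['Close']

time_model = LinearRegression()
time_model.fit(X_time[:split], y_time[:split])
print("Out-of-time R^2:", time_model.score(X_time[split:], y_time[split:]))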

Conclusion
By providing detailed explanations and visualizations, this chapter walks through the process of predicting Bitcoin prices using Linear Regression. It covers data inspection, preprocessing, visualization, model training, evaluation, and practical applications, offering a comprehensive guide for anyone interested in applying Linear Regression to financial forecasting.

