NW INTERNSHIP 10CP
BTC Price Prediction Using Linear Regression
Step 1: Reading and Inspecting the Data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error
# Reading the CSV file and loading it into a DataFrame
df = pd.read_csv('BTC-USD.csv')
# Checking basic information about the DataFrame
df.info()
Explanation:
• Imports: Necessary libraries are imported
(numpy, pandas, matplotlib, seaborn, sklearn).
• Data Loading: The BTC-USD.csv file is loaded
into a Pandas DataFrame (df).
• Data Inspection: df.info() provides basic
information about the DataFrame, such as
column names, data types, and non-null counts
(a count below the row total reveals missing values).
DataFrame Information
The info() method gives a concise summary of the
DataFrame. It provides the following details:
• Data types of each column
• Non-null counts
• Memory usage
df.info()
This helps in understanding the structure of the
data and identifying any potential issues such as
missing values or incorrect data types.
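As a complement to df.info(), the following minimal sketch counts missing values per column and lists the data types explicitly. The small DataFrame is made-up sample data standing in for BTC-USD.csv; its values are illustrative and not taken from the real file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for BTC-USD.csv; values are made up for illustration.
sample = pd.DataFrame({
    'Date': ['2019-01-01', '2019-01-02', '2019-01-03'],
    'Open': [3746.7, 3849.2, np.nan],
    'Close': [3843.5, 3943.4, 3836.7],
})

# Number of missing values in each column.
missing_counts = sample.isna().sum()
print(missing_counts)

# Data type of each column; 'Date' is still text, not datetime, at this point.
print(sample.dtypes)
```

A non-zero entry in missing_counts points at a column that needs attention before modeling.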
Step 2: Data Preprocessing and Visualization
Converting the 'Date' Column
# Converting the 'Date' column datatype from object to datetime
df['Date'] = pd.to_datetime(df['Date'])
# Checking updated information after conversion
df.info()
Explanation:
• Date Conversion: The 'Date' column is
converted from object type to datetime using
pd.to_datetime().
• Updated Information: df.info() confirms the
conversion, showing the 'Date' column now has
datetime datatype.
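If the file might contain malformed dates, the conversion can be made more defensive. The sketch below is one possible approach on made-up rows (the malformed entry is an assumption for illustration): errors='coerce' marks unparseable dates as NaT, and the rows are then sorted chronologically.

```python
import pandas as pd

# Hypothetical sample rows; the last date is deliberately malformed.
sample = pd.DataFrame({
    'Date': ['2019-01-03', '2019-01-01', 'not a date'],
    'Close': [3836.7, 3843.5, 3900.0],
})

# errors='coerce' turns unparseable entries into NaT instead of raising.
sample['Date'] = pd.to_datetime(sample['Date'], errors='coerce')

# Drop rows whose date could not be parsed, then sort chronologically.
clean = sample.dropna(subset=['Date']).sort_values('Date').reset_index(drop=True)
print(clean)
```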
Visualizing Data with Scatter Plots
# Visualizing data with scatter plots
plt.figure(figsize=(8, 6))
plt.scatter(df['Date'], df['High'])
plt.ylabel('High')
plt.xlabel('Date')
plt.title("Date vs. High (Scatter Plot)")
plt.show()
Explanation:
• Visualization: A scatter plot (plt.scatter) is
created to visualize the relationship between
'Date' and 'High' prices, helping to understand
the data distribution and trends.
Step 3: Exploring Data Relationships and Trends
Scatter Plot of 'Date' vs. 'Low'
# Scatter plot of 'Date' vs. 'Low'
plt.figure(figsize=(8, 6))
plt.scatter(df['Date'], df['Low'])
plt.ylabel('Low')
plt.xlabel('Date')
plt.title("Date vs. Low (Scatter Plot)")
plt.show()
Line Plot of 'Date' with 'High' and 'Low' Prices
# Line plot of 'Date' with 'High' and 'Low' prices
plt.plot(df['Date'], df['High'], label='High')
plt.plot(df['Date'], df['Low'], label='Low')
plt.xlabel('Date')
plt.ylabel('Price')
plt.title('High and Low Prices Over Time')
plt.legend()
plt.show()
Explanation:
• Visualization Continues: Another scatter plot
shows the relationship between 'Date' and 'Low'
prices.
• Price Trends: Line plots (plt.plot) are used to
visualize the trends of 'High' and 'Low' prices
over time, providing insights into price volatility
and historical movements.
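Volatility can also be summarized numerically rather than only read off a plot. The sketch below, on made-up High/Low values, computes the daily trading range and a 3-day rolling average of it. The column names Range and RangeMA3 are hypothetical and not part of the original dataset.

```python
import pandas as pd

# Hypothetical High/Low values standing in for the real data.
sample = pd.DataFrame({
    'High': [105.0, 110.0, 108.0, 120.0, 118.0],
    'Low':  [100.0, 101.0, 104.0, 107.0, 112.0],
})

# Daily trading range: a simple proxy for intraday volatility.
sample['Range'] = sample['High'] - sample['Low']

# 3-day rolling average of the range; the first two rows are NaN
# because a full window is not yet available.
sample['RangeMA3'] = sample['Range'].rolling(window=3).mean()
print(sample)
```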
Step 4: Understanding Data Correlations
Heatmap of Correlations Among Numerical Columns
# Heatmap of correlations among numerical columns
numerical_cols = ['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']
corr_matrix = df[numerical_cols].corr()
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
Explanation:
• Correlation Heatmap: sns.heatmap() creates a
heatmap to visualize correlations (corr())
among numerical columns ('Open', 'High', 'Low',
'Close', 'Adj Close', 'Volume'). This helps in
understanding how different variables are
related, which is crucial for feature selection in
modeling.
Detailed Analysis of Correlations
• Strong Positive Correlation: Observing strong
correlations between 'High' and 'Close', 'Open'
and 'Close', etc.
• Weak Correlation: Identifying columns with
weaker correlations which might not be as
useful for prediction.
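To read the heatmap more systematically, the features can be ranked by the strength of their correlation with the target. The following is a minimal sketch on synthetic data, where 'Open' is constructed to track 'Close' and 'Volume' is random noise. Both are assumptions for illustration, not properties measured on the real file.

```python
import numpy as np
import pandas as pd

# Synthetic data (an assumption for illustration): 'Open' tracks 'Close'
# closely, while 'Volume' is unrelated random noise.
rng = np.random.default_rng(0)
close = np.linspace(100, 200, 50)
sample = pd.DataFrame({
    'Open': close + rng.normal(0, 1, 50),
    'Volume': rng.normal(1000, 50, 50),
    'Close': close,
})

# Absolute correlation of every feature with the target, strongest first.
corr_with_close = (
    sample.corr()['Close']
    .drop('Close')
    .abs()
    .sort_values(ascending=False)
)
print(corr_with_close)
```

Features that are almost perfectly correlated with each other carry overlapping information, which is worth keeping in mind when the individual coefficients are interpreted later.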
Step 5: Model Preparation and Feature Selection
Selecting Relevant Features for Modeling
# Selecting relevant features for modeling and defining target variable
X = df[['Open', 'High', 'Low', 'Volume']]
y = df['Close']
# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X,
y, test_size=0.3, random_state=42)
# Checking the first few rows of the training set
print(X_train.head())
print(y_train.head())
Explanation:
• Feature Selection: Features (X) such as 'Open',
'High', 'Low', and 'Volume' are selected for
modeling, while 'Close' is chosen as the target
variable (y).
• Data Splitting: train_test_split() splits the data
into training (X_train, y_train) and testing
(X_test, y_test) sets with a test size of 30% and
a fixed random state for reproducibility.
• Data Validation: head() displays the first few
rows of the training set to verify the correct
selection and splitting of data.
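Because the rows form a time series, a random split mixes past and future observations. One alternative, sketched below on synthetic date-sorted data, is a chronological split: passing shuffle=False to train_test_split keeps the row order, so the test set consists of the most recent rows. This is offered as an option to consider, not as the method used in this document.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic, already date-sorted data standing in for the real DataFrame.
X = pd.DataFrame({'Open': np.arange(10, dtype=float)})
y = pd.Series(np.arange(10, dtype=float) + 0.5, name='Close')

# shuffle=False keeps row order, so the test set is the most recent 30%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, shuffle=False)

print(X_train.index.tolist())  # earliest rows
print(X_test.index.tolist())   # latest rows
```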
Feature Engineering
• Feature Transformation: Transformations (e.g., a
log transformation of a heavily skewed column) may
improve model performance.
• Handling Missing Values: Any missing values
should be dropped or filled before training, since
LinearRegression does not accept NaN inputs.
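Both ideas can be sketched in a few lines. The example below uses made-up rows with one missing 'Volume' entry. The forward-fill strategy and the LogVolume column name are assumptions chosen for illustration; other strategies may suit the real data better.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; one 'Volume' entry is missing.
sample = pd.DataFrame({
    'Open': [3746.7, 3849.2, 3931.0],
    'Volume': [4.3e9, np.nan, 4.5e9],
})

# Forward-fill carries the last known value into the gap, which suits
# ordered time-series rows; dropna() would be the simpler alternative.
sample['Volume'] = sample['Volume'].ffill()

# log1p computes log(1 + x) and compresses the very large range of 'Volume'.
sample['LogVolume'] = np.log1p(sample['Volume'])
print(sample)
```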
Step 6: Model Training and Evaluation
Initializing and Training the Linear Regression Model
# Initializing and training the Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)
Predicting on the Test Set
# Predicting on the test set
y_pred = model.predict(X_test)
Evaluating Model Performance
# Evaluating model performance
r_squared = model.score(X_test, y_test)
print('Coefficient of determination (R^2):',
r_squared)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)
print("Mean Squared Error:", mse)
print("Root Mean Squared Error:", rmse)
print("Mean Absolute Error:", mae)
Explanation:
• Model Initialization and Training:
LinearRegression() initializes a Linear
Regression model (model) which is then trained
(fit()) on the training data (X_train, y_train).
• Prediction and Evaluation: predict() predicts
'Close' prices on the test set (X_test), and
model performance metrics such as R-squared
(score()), Mean Squared Error
(mean_squared_error()), Root Mean Squared
Error (np.sqrt() of the MSE), and Mean Absolute
Error (mean_absolute_error()) are calculated and
printed.
Detailed Performance Metrics Analysis
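• R-squared Interpretation: The coefficient of
determination is the share of the variance in
'Close' that the model explains on the test set.
A value of 1.0 is a perfect fit; 0 means the model
does no better than always predicting the mean.
• Error Metrics: MSE averages the squared errors,
so it penalizes large errors heavily. RMSE is its
square root and is expressed in the same units as
the price. MAE averages the absolute errors and is
less sensitive to outliers.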
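To make these definitions concrete, the sketch below computes all four metrics by hand with NumPy on a tiny made-up example. The numbers are illustrative and are not results from the BTC model.

```python
import numpy as np

# Made-up true values and predictions, for illustration only.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.0, 5.0, 8.0, 9.0])

errors = y_true - y_hat

mse = np.mean(errors ** 2)      # mean of squared errors -> 0.5
rmse = np.sqrt(mse)             # same units as the target
mae = np.mean(np.abs(errors))   # mean of absolute errors -> 0.5

# R^2 = 1 - (residual sum of squares / total sum of squares).
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot        # approximately 0.9

print(mse, rmse, mae, r2)
```

These hand-computed values should agree with mean_squared_error(), mean_absolute_error(), and score() from scikit-learn when given the same inputs.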
Step 7: Interpreting Model Results
Extracting Model Coefficients and Intercept
# Extracting model coefficients and intercept
coefficients = model.coef_
intercept = model.intercept_
print("Coefficients (w):", coefficients)
print("Intercept (b):", intercept)
Explanation:
• Model Coefficients: coef_ retrieves the
coefficients of the features (Open, High, Low,
Volume) in the Linear Regression model
(model), while intercept_ retrieves the intercept
(b).
• Understanding Impact: Printing these
coefficients and intercept helps understand
their impact on predicting the 'Close' price
based on the selected features.
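The link between these values and a prediction is a simple formula: the prediction is the dot product of the inputs with the coefficients, plus the intercept. The sketch below checks this on synthetic two-feature data, an assumption made so the expected result is easy to verify by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic training data (an assumption): y = 2*x1 + 3*x2 + 1 exactly.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = 2 * X[:, 0] + 3 * X[:, 1] + 1

model = LinearRegression().fit(X, y)

# A prediction is the dot product of the inputs with coef_, plus intercept_.
x_new = np.array([5.0, 6.0])
manual = x_new @ model.coef_ + model.intercept_
from_model = model.predict(x_new.reshape(1, -1))[0]

print(manual, from_model)
```

Both values should be close to 2*5 + 3*6 + 1 = 29, up to floating-point error.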
Interpretation of Coefficients
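• Feature Impact: Each coefficient is the expected
change in 'Close' for a one-unit increase in that
feature, holding the other features fixed. Because
'Volume' is on a far larger scale than the price
columns, its raw coefficient is not directly
comparable with theirs.
• Significance Testing: Coefficient significance can
be assessed with t-statistics and p-values, which
scikit-learn's LinearRegression does not report.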
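As a rough illustration of significance testing, the sketch below computes ordinary least squares coefficients, their standard errors, and t-statistics with NumPy alone. The data is synthetic, with y constructed to depend on x1 but not on x2. p-values would follow from a t distribution with n - p degrees of freedom, and a statistics library such as statsmodels reports them directly.

```python
import numpy as np

# Synthetic data (an assumption): y depends on x1 but not on x2.
rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 * x1 + rng.normal(size=n)

# Design matrix with a leading column of ones for the intercept.
A = np.column_stack([np.ones(n), x1, x2])

# Ordinary least squares estimate of [intercept, w1, w2].
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

# Residual variance with n - p degrees of freedom.
residuals = y - A @ beta
dof = n - A.shape[1]
sigma2 = residuals @ residuals / dof

# Standard errors from the diagonal of sigma^2 * (A^T A)^-1.
std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))

# t-statistic: how many standard errors each coefficient is from zero.
t_stats = beta / std_err
print(t_stats)
```

A large absolute t-statistic, as expected here for x1, suggests the coefficient is unlikely to be zero. A small one, as expected for x2, suggests the feature adds little.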
Step 8: Making a Prediction
Example Prediction Using the Model
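# Example prediction using the model.
# The model was trained on four features (Open, High, Low, Volume),
# so the input must contain exactly four values in that order.
input_data = pd.DataFrame(
    [[3822.384766, 3901.908936, 3797.219238, 4770578575]],
    columns=['Open', 'High', 'Low', 'Volume'])
predicted_close_price = model.predict(input_data)
print("Predicted Closing Price:", predicted_close_price[0])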
Explanation:
• Prediction Example: An example input
(input_data) is used to predict the closing price
('Close') using the trained model
(model.predict()), providing a practical
application of the regression model for
forecasting.
Real-World Application
• Use Case Scenarios: Potential real-world
scenarios for this model include trading
strategies and market analysis.
• Model Limitations: Highlight limitations of the
model and potential areas for improvement.
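One limitation can be made concrete. 'High' and 'Low' for a given day are only final once that day's trading is over, at about the same time 'Close' becomes known, so the model describes a same-day relationship rather than a forecast. A possible improvement, sketched below on made-up rows, is to predict the next day's close from the current day's values. The NextClose column is a hypothetical name introduced for this example.

```python
import pandas as pd

# Hypothetical, date-sorted rows standing in for the real data.
sample = pd.DataFrame({
    'Open':  [100.0, 102.0, 101.0, 105.0],
    'High':  [103.0, 104.0, 106.0, 108.0],
    'Low':   [99.0, 100.0, 100.0, 104.0],
    'Close': [102.0, 101.0, 105.0, 107.0],
})

# Target: the NEXT day's close, so each row only uses information that
# is already known when the prediction is made.
sample['NextClose'] = sample['Close'].shift(-1)

# The last row has no next day and is dropped.
supervised = sample.dropna(subset=['NextClose'])

X = supervised[['Open', 'High', 'Low', 'Close']]
y = supervised['NextClose']
print(X.shape, y.tolist())
```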
Conclusion
By providing detailed explanations and
visualizations, this extended version helps in
understanding the process of predicting Bitcoin
prices using Linear Regression. The document
covers data inspection, preprocessing, visualization,
model training, evaluation, and practical
applications, offering a comprehensive guide for
anyone interested in applying Linear Regression for
financial forecasting.