Exp No 2

EXP NO: 2 Exploratory data analysis for a 2-variable linear regression model.

DATE:

Aim:
The aim of this exploratory data analysis is to understand the relationship between two
variables and to prepare the data for a 2-variable linear regression model (one predictor,
one response). We want to identify patterns, check the assumptions of linear regression,
and clean the data as necessary.

Algorithm:

1. Import Libraries: Import necessary libraries for data manipulation, visualization, and regression analysis.
2. Load Data: Load the dataset containing the two variables of interest.
3. Explore Data: Explore the dataset to understand its structure and characteristics.
4. Visualize Data: Plot a scatter plot to visualize the relationship between the two variables.
5. Check Correlation: Check the correlation coefficient to quantify the linear relationship.
6. Handle Missing Values: Check and handle missing values if any.
7. Split Data: Split the data into training and testing sets.
8. Build and Train Model: Build and train a 2-variable linear regression model.
9. Make Predictions: Use the trained model to make predictions on the test set.
10. Evaluate Model: Evaluate the performance of the model using metrics such as Mean Absolute Error, Mean Squared Error, and R-squared.
11. Visualize Results: Visualize the actual vs predicted values.
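
The three evaluation metrics named in the last steps can be computed directly from their standard definitions. The sketch below is only an illustration: the function name regression_metrics and the numbers passed to it are made up for this example and do not come from the experiment's dataset.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, R-squared) computed from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    mae = np.mean(np.abs(residuals))   # average absolute error
    mse = np.mean(residuals ** 2)      # average squared error

    ss_res = np.sum(residuals ** 2)                 # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    r2 = 1.0 - ss_res / ss_tot  # not defined when y_true is constant
    return mae, mse, r2


# Hypothetical values, chosen only to make the arithmetic easy to follow
mae, mse, r2 = regression_metrics([3.0, 5.0, 7.0], [2.5, 5.0, 7.5])
print(f"MAE={mae:.4f}  MSE={mse:.4f}  R-squared={r2:.4f}")
```

On the same inputs these values should agree with metrics.mean_absolute_error, metrics.mean_squared_error, and metrics.r2_score from scikit-learn.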

Program:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics

# Load Data
df = pd.read_csv('your_dataset.csv')

# Explore Data
print(df.head())
df.info()  # info() prints its summary itself and returns None, so no print() is needed
print(df.describe())

# Visualize Data
sns.scatterplot(x='Variable1', y='Variable2', data=df)
plt.title('Scatter Plot of Variable1 vs Variable2')
plt.show()

# Check Correlation
correlation = df['Variable1'].corr(df['Variable2'])
print(f'Correlation Coefficient: {correlation}')
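
# Handle Missing Values
print(df.isnull().sum())
# One simple option: drop rows where either variable is missing,
# because LinearRegression cannot be fitted on NaN values
df = df.dropna(subset=['Variable1', 'Variable2'])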

# Split Data
X = df[['Variable1']]
y = df['Variable2']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Build and Train Model
model = LinearRegression()
model.fit(X_train, y_train)

# Make Predictions
predictions = model.predict(X_test)

# Evaluate Model
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, predictions))
print('Mean Squared Error:', metrics.mean_squared_error(y_test, predictions))
print('R-squared:', metrics.r2_score(y_test, predictions))

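# Visualize Results
# Black points are the actual test values; the blue line is the model's prediction
plt.scatter(X_test, y_test, color='black', label='Actual')
plt.plot(X_test, predictions, color='blue', linewidth=3, label='Predicted')
plt.title('Actual vs Predicted Values')
plt.xlabel('Variable1')
plt.ylabel('Variable2')
plt.legend()
plt.show()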
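
The Aim mentions checking assumptions, which the program above does not do explicitly. One possible check is to inspect the residuals of a straight-line fit: if the linear model is adequate, they should be small and show no pattern. The sketch below assumes the five sample rows listed under Input. It uses np.polyfit instead of LinearRegression only to stay self-contained, and the variable names are illustrative.

```python
import numpy as np

# The five sample rows from your_dataset.csv (Variable1, Variable2)
x = np.array([1.5, 2.0, 3.0, 4.5, 5.0])
y = np.array([2.5, 3.0, 4.0, 5.5, 6.0])

# Least-squares straight line; polyfit returns the highest-degree coefficient first
slope, intercept = np.polyfit(x, y, deg=1)

fitted = slope * x + intercept
residuals = y - fitted  # actual minus fitted value for every row

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print("largest absolute residual:", np.max(np.abs(residuals)))
print("mean residual:", residuals.mean())
```

Every sample row satisfies Variable2 = Variable1 + 1, so the residuals here should be zero up to floating-point rounding. On real data, plotting the residuals against Variable1 is a common way to spot curvature or changing spread.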

Input: (your_dataset.csv)
Variable1,Variable2
1.5, 2.5
2.0, 3.0
3.0, 4.0
4.5, 5.5
5.0, 6.0
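
As a quick sanity check on this sample, the sketch below parses the rows with the standard library, computes the Pearson correlation from its definition, and works out how many rows a test_size of 0.2 leaves for training and testing. It assumes the test count is rounded up, which is how train_test_split is understood to treat a fractional test_size. The name SAMPLE_CSV is introduced here for illustration only.

```python
import csv
import io
import math

# The sample rows shown above, embedded as text so the sketch needs no file
SAMPLE_CSV = """Variable1,Variable2
1.5, 2.5
2.0, 3.0
3.0, 4.0
4.5, 5.5
5.0, 6.0
"""

# skipinitialspace drops the blank that follows each comma
reader = csv.DictReader(io.StringIO(SAMPLE_CSV), skipinitialspace=True)
rows = [(float(r["Variable1"]), float(r["Variable2"])) for r in reader]
xs, ys = zip(*rows)

# Pearson correlation from its definition
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
var_x = sum((a - mean_x) ** 2 for a in xs)
var_y = sum((b - mean_y) ** 2 for b in ys)
r = cov / math.sqrt(var_x * var_y)

# Row counts for test_size=0.2 (assumption: the test count is rounded up)
n_test = math.ceil(0.2 * len(rows))
n_train = len(rows) - n_test

print(f"rows={len(rows)}, r={r:.4f}, train={n_train}, test={n_test}")
```

With only five rows the test set holds a single observation. R-squared is not well defined on fewer than two samples, so a larger dataset is needed before the evaluation step gives meaningful numbers.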

Output:

Result:
Thus, exploratory data analysis to understand the relationship between two variables and
prepare the data for a 2-variable linear regression model was completed successfully.
