Exp No 2
DATE:
Aim:
To perform exploratory data analysis on two variables, understand the relationship between
them, and prepare the data for a 2-variable linear regression model. This involves identifying
patterns, checking the assumptions of linear regression, and cleaning the data as necessary.
Algorithm:
Import Libraries:
Import necessary libraries for data manipulation, visualization, and regression analysis.
Load Data:
Load the dataset containing the two variables of interest.
Explore Data:
Explore the dataset to understand its structure and characteristics.
Visualize Data:
Plot a scatter plot to visualize the relationship between the two variables.
Check Correlation:
Check the correlation coefficient to quantify the linear relationship.
Handle Missing Values:
Check and handle missing values if any.
Split Data:
Split the data into training and testing sets.
Build and Train Model:
Build and train a 2-variable linear regression model.
Make Predictions:
Use the trained model to make predictions on the test set.
Evaluate Model:
Evaluate the performance of the model using metrics such as Mean Absolute Error, Mean
Squared Error, and R-squared.
Visualize Results:
Visualize the actual vs predicted values.
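The three metrics named in the Evaluate Model step can also be computed by hand, which shows what each one measures. The following is a minimal sketch using NumPy; the function name regression_metrics and the sample values are hypothetical and are not part of the experiment's dataset.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, R-squared) for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    mae = np.mean(np.abs(residuals))   # average absolute error
    mse = np.mean(residuals ** 2)      # average squared error

    # R-squared compares the model's squared error with that of
    # always predicting the mean of y_true.
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot         # undefined when y_true is constant
    return mae, mse, r2

# Hypothetical actual and predicted values, for illustration only
mae, mse, r2 = regression_metrics([3.0, 5.0, 7.0], [2.5, 5.0, 7.5])
print('MAE:', mae, 'MSE:', mse, 'R-squared:', r2)
```

These hand-written formulas should agree with metrics.mean_absolute_error, metrics.mean_squared_error, and metrics.r2_score from scikit-learn on the same inputs.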
Program:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
# Load Data
df = pd.read_csv('your_dataset.csv')
# Explore Data
print(df.head())
df.info()  # info() prints its summary directly and returns None
print(df.describe())
# Visualize Data
sns.scatterplot(x='Variable1', y='Variable2', data=df)
plt.title('Scatter Plot of Variable1 vs Variable2')
plt.show()
# Check Correlation
correlation = df['Variable1'].corr(df['Variable2'])
print(f'Correlation Coefficient: {correlation}')
# Handle Missing Values
print(df.isnull().sum())
# Drop rows where either variable is missing (one simple option)
df = df.dropna(subset=['Variable1', 'Variable2'])
# Split Data
X = df[['Variable1']]
y = df['Variable2']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Build and Train Model
model = LinearRegression()
model.fit(X_train, y_train)
# Make Predictions
predictions = model.predict(X_test)
# Evaluate Model
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, predictions))
print('Mean Squared Error:', metrics.mean_squared_error(y_test, predictions))
print('R-squared:', metrics.r2_score(y_test, predictions))
# Visualize Results
plt.scatter(X_test, y_test, color='black')
plt.plot(X_test, predictions, color='blue', linewidth=3)
plt.title('Actual vs Predicted Values')
plt.xlabel('Variable1')
plt.ylabel('Variable2')
plt.show()
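The Handle Missing Values step can be carried out in more than one way. The sketch below compares two common options in pandas, dropping incomplete rows and filling gaps with the column mean. The frame named toy is hypothetical data made up for illustration; it is not the experiment's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one gap in each column (illustration only)
toy = pd.DataFrame({
    'Variable1': [1.0, 2.0, np.nan, 4.0],
    'Variable2': [2.0, np.nan, 4.0, 5.0],
})

# Count the missing entries per column
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# Option 1: drop every row that has a gap in either variable
dropped = toy.dropna(subset=['Variable1', 'Variable2'])

# Option 2: fill each gap with the mean of its column
imputed = toy.fillna(toy.mean())

print(len(dropped), 'rows kept after dropping')
print(len(imputed), 'rows kept after mean imputation')
```

Dropping rows is the simpler choice when few values are missing. Mean imputation keeps every row but can weaken the observed relationship between the two variables, so it should be used with care before fitting a regression.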
Input: (your_dataset.csv)
Variable1,Variable2
1.5, 2.5
2.0, 3.0
3.0, 4.0
4.5, 5.5
5.0, 6.0
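As a quick check on the five sample rows above, the correlation and the fitted line can be worked out directly with NumPy. This is a minimal sketch that fits on all five rows rather than on a train/test split; the variable names are chosen here for illustration.

```python
import numpy as np

# The five sample rows from your_dataset.csv
x = np.array([1.5, 2.0, 3.0, 4.5, 5.0])   # Variable1
y = np.array([2.5, 3.0, 4.0, 5.5, 6.0])   # Variable2

# Pearson correlation coefficient between the two variables
r = np.corrcoef(x, y)[0, 1]

# Least-squares straight line: y = slope * x + intercept
slope, intercept = np.polyfit(x, y, deg=1)

# Residuals: the part of y that the fitted line does not explain
residuals = y - (slope * x + intercept)

print('Correlation:', r)
print('Slope:', slope, 'Intercept:', intercept)
print('Largest absolute residual:', np.max(np.abs(residuals)))
```

Every sample row satisfies Variable2 = Variable1 + 1, so the correlation comes out at approximately 1, the slope and intercept at approximately 1 each, and the residuals at approximately 0. Note that with only five rows, test_size=0.2 leaves a single test point, so R-squared on the test set is not meaningful for this sample; a larger dataset is needed for a useful evaluation.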
Output:
Result:
Thus, exploratory data analysis to understand the relationship between two variables and
prepare the data for a 2-variable linear regression model was completed successfully.