Exp 1
EXPERIMENT NO 1
Title: To implement Linear Regression
Lab Objective: To implement an appropriate machine learning model for the given
application.
Theory:
1. We begin by importing pandas, numpy, and matplotlib, and then load the dataset
using pandas.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
dataset = pd.read_csv('Salary_Data.csv')
dataset.head()
2. Now that we have imported the dataset, we will perform data preprocessing.
X = dataset.iloc[:, :-1].values  # independent variable array
y = dataset.iloc[:, 1].values  # dependent variable vector
X is the independent variable array and y is the dependent variable vector. Note the
difference between an array and a vector here: X must be a two-dimensional array (a matrix
of features with one row per observation and one column per feature), while y is a
one-dimensional vector with one value per observation.
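The array/vector distinction is easiest to see from the shapes. The following is a minimal sketch, assuming a small made-up table in place of Salary_Data.csv; the column names YearsExperience and Salary and all the values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for Salary_Data.csv: one feature column, one target column.
dataset = pd.DataFrame({
    "YearsExperience": [1.1, 2.0, 3.2, 4.5],
    "Salary": [39000.0, 43500.0, 54400.0, 61100.0],
})

X = dataset.iloc[:, :-1].values  # every column except the last: 2-D array
y = dataset.iloc[:, 1].values    # second column only: 1-D vector

print(X.shape)  # (4, 1)
print(y.shape)  # (4,)
```

The slice `:-1` keeps the column dimension even when only one feature is left, which is why X stays two-dimensional while y does not.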
3. Next, we split the dataset into a training set and a test set.
Why is it necessary to perform splitting? We train our model on the years of experience
and salaries in the training set. We then test the model on the test set, which it has not
seen during training.
We check whether the predictions made by the model on the test set match the salaries
given in the dataset.
If they match closely, it implies that our model is accurate and is making the right predictions.
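One common way to perform the split is scikit-learn's train_test_split. The sketch below is an illustration on synthetic data, not the experiment's actual code: the data, the test-set size of 10, and random_state=0 are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 30 observations, one feature.
X = np.arange(30, dtype=float).reshape(-1, 1)
y = 1000.0 * X.ravel() + 25000.0

# test_size=10 holds out exactly 10 observations (assumed value);
# random_state=0 only makes the shuffle reproducible (assumed value).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=10, random_state=0
)

print(X_train.shape, X_test.shape)  # (20, 1) (10, 1)
```

Each observation lands in exactly one of the two sets, so the test set measures how the model behaves on data it was not trained on.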
5. From sklearn’s linear_model module, import the LinearRegression class. Create an object
of this class called regressor.
To fit the regressor to the training set, we call its fit method.
We pass X_train (the matrix of features of the training data) and y_train (the target values)
to fit. The model thus learns the correlation between them and learns how to predict the
dependent variable from the independent variable.
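The fitting step can be sketched as follows. The training data here is synthetic and lies exactly on a line (slope 9000, intercept 26000, both made-up values), so the fitted model should recover those two numbers up to floating-point error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic training data on an exact line (assumed values, not the real dataset).
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])  # 2-D matrix of features
y_train = 9000.0 * X_train.ravel() + 26000.0             # 1-D target vector

regressor = LinearRegression()
regressor.fit(X_train, y_train)  # learns slope and intercept from the training set

print(regressor.coef_)       # learned slope(s), one per feature
print(regressor.intercept_)  # learned intercept
```

After fit, the learned parameters are available as the coef_ and intercept_ attributes of the regressor.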
6. We create a vector containing the predicted salaries for the test set. The predictions
are stored in a vector called y_pred, which contains one prediction for every observation in
the test set.
The predict method makes the predictions for the test set, so its input is the test set. The
parameter of predict must be an array or sparse matrix, hence the input is X_test.
y_pred = regressor.predict(X_test)
y_pred
y-pred output
y_test
y-test output
y_test holds the real salaries of the test set.
y_pred holds the predicted salaries for the same observations.
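To judge how closely the two vectors agree, they can be placed side by side with their differences. This is a minimal sketch using made-up numbers, not the output of this experiment; the mean absolute error is added here as one possible summary and is not part of the original write-up.

```python
import numpy as np

# Hypothetical values for illustration only (not results from the experiment).
y_test = np.array([40000.0, 60000.0, 80000.0, 100000.0])  # real salaries
y_pred = np.array([41000.0, 58500.0, 80500.0, 99000.0])   # predicted salaries

errors = y_test - y_pred
# Columns: real salary, predicted salary, error.
comparison = np.column_stack((y_test, y_pred, errors))
print(comparison)

# Average size of the error, ignoring its sign.
mae = np.mean(np.abs(errors))
print(mae)
```

A small error relative to the size of the salaries suggests the predictions are close to the real values.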
Visualizing the results
Let’s see what the results of our code look like when we visualize them.
1. Plotting the points (observations)
To visualize the data, we plot graphs using matplotlib. We first plot the real observation
points, i.e. the real values given in the dataset.
The X-axis will have the years of experience and the Y-axis will have the salaries.
plt.scatter draws a scatter plot of the data. Its parameters include the x-coordinates and
y-coordinates of the points.
Note: The y-coordinate of the observation points is not y_pred, because y_pred holds the
predicted salaries of the test set observations, not the real salaries.
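A plot of this kind might look like the sketch below. It uses synthetic data rather than Salary_Data.csv, and it also draws a fitted line through the points, which is an addition for illustration. The colors, the noise level, the output file name, and the Agg backend (chosen so the sketch runs without a display) are all assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (not the Salary_Data.csv values).
rng = np.random.default_rng(0)
X_train = np.linspace(1.0, 10.0, 20).reshape(-1, 1)
y_train = 9000.0 * X_train.ravel() + 26000.0 + rng.normal(0.0, 3000.0, size=20)

regressor = LinearRegression().fit(X_train, y_train)

fig, ax = plt.subplots()
# Real observations: the y-coordinate is the real salary, not a prediction.
ax.scatter(X_train, y_train, color="red", label="Real salaries")
# Fitted line: predictions for the same X values that are on the x-axis.
ax.plot(X_train, regressor.predict(X_train), color="blue", label="Fitted line")
ax.set_xlabel("Years of experience")
ax.set_ylabel("Salaries")
ax.legend()
fig.savefig("salary_regression.png")  # hypothetical output file name
```

The line is drawn from regressor.predict(X_train) rather than y_pred because y_pred has one value per test observation and so only pairs with X_test.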
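import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
dataset = pd.read_csv('Salary_Data.csv')
dataset.head()
# data preprocessing
X = dataset.iloc[:, :-1].values  # independent variable array
y = dataset.iloc[:, 1].values  # dependent variable vector
# split, fit, and predict (default split settings assumed; the write-up does not state them)
X_train, X_test, y_train, y_test = train_test_split(X, y)
regressor = LinearRegression().fit(X_train, y_train)
y_pred = regressor.predict(X_test)
y_test
plt.scatter(X_train, y_train)  # real observations
plt.xlabel("Years of experience")
plt.ylabel("Salaries")
plt.show()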
Sample Output:
Program Output: