Lab Experiments VI Sem-1
Name:
Semester: 6th
Year: 3rd
S. No   Name of Experiment                                               Course Outcome
01      Implement and Load data from CSV for finding the most specific   CL601.1
        hypothesis based on a given set of training data samples.
03      Implement and Estimate Mean and Variance for both X and Y.       CL601.1
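Theory: The most specific hypothesis is the most restrictive hypothesis that is still consistent with
the positive training examples. The Find-S algorithm computes it by generalizing an attribute to '?'
whenever a positive example disagrees with the current hypothesis. The same hypothesis serves as the
specific boundary in the Candidate Elimination algorithm, which generalizes it as more positive
examples are encountered.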
Implementation:
import csv

def load_csv(filename):
    # Read all rows and drop the header row
    with open(filename, "r", newline="") as f:
        dataset = list(csv.reader(f))
    dataset.pop(0)
    return dataset

def find_most_specific_hypothesis(data):
    specific_h = None
    for example in data:
        if example[-1] == 'Yes':
            if specific_h is None:
                # Start from the first positive example, not simply the first row
                specific_h = example[:-1]
                continue
            # Generalize every attribute that disagrees with this positive example
            for i in range(len(specific_h)):
                if specific_h[i] != example[i]:
                    specific_h[i] = '?'
    return specific_h

filename = 'data.csv'
data = load_csv(filename)
hypothesis = find_most_specific_hypothesis(data)
print("Most Specific Hypothesis:", hypothesis)
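As a quick check of the procedure, the sketch below applies the same generalization rule to a small in-memory table instead of data.csv. The four rows are the widely used EnjoySport textbook example and are an assumption, not the lab's dataset; the names SAMPLE_CSV and find_s are illustrative only.

```python
import csv
import io

# Hypothetical sample data (not the lab's data.csv); the last column is the label.
SAMPLE_CSV = """Sky,AirTemp,Humidity,Wind,Water,Forecast,EnjoySport
Sunny,Warm,Normal,Strong,Warm,Same,Yes
Sunny,Warm,High,Strong,Warm,Same,Yes
Rainy,Cold,High,Strong,Warm,Change,No
Sunny,Warm,High,Strong,Cool,Change,Yes
"""

def find_s(rows):
    """Return the most specific hypothesis consistent with the positive rows."""
    hypothesis = None
    for row in rows:
        *attributes, label = row
        if label != "Yes":
            continue  # negative examples are ignored
        if hypothesis is None:
            # The first positive example is the starting hypothesis
            hypothesis = list(attributes)
        else:
            # Keep matching attributes, generalize the rest to '?'
            hypothesis = [h if h == a else "?" for h, a in zip(hypothesis, attributes)]
    return hypothesis

rows = list(csv.reader(io.StringIO(SAMPLE_CSV)))[1:]  # skip the header row
hypothesis = find_s(rows)
print("Most Specific Hypothesis:", hypothesis)
# → Most Specific Hypothesis: ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```

Humidity, Water, and Forecast differ across the three positive rows, so only those attributes are generalized.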
Theory: Linear regression models the relationship between two variables by fitting a linear equation
to observed data. The model assumes that the relationship between the dependent variable and the
independent variable is linear.
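For reference, the linear model and its least squares solution can be written compactly. This is the standard normal equation; the symbol X_b (the inputs with a leading column of ones) follows the naming used in the code of this manual, while the theta notation is an assumption about how the missing derivation would be written.

```latex
\hat{y} = \theta_0 + \theta_1 x,
\qquad
\hat{\theta} =
\begin{pmatrix} \theta_0 \\ \theta_1 \end{pmatrix}
= \left( X_b^{\top} X_b \right)^{-1} X_b^{\top} y
```

Here \theta_0 is the intercept and \theta_1 is the slope.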
Implementation:
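import numpy as np

# Synthetic data: y = 4 + 3x plus Gaussian noise
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Prepend a column of ones and solve the normal equation
X_b = np.c_[np.ones((100, 1)), X]
theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)

print("Intercept:", theta_best[0][0])
print("Slope:", theta_best[1][0])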
Theory: The mean is the average of a set of values. The variance measures the spread of the data
points around the mean.
Implementation:
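import numpy as np

X = np.array([1, 2, 3, 4, 5])
Y = np.array([2, 3, 4, 5, 6])

mean_X = np.mean(X)
mean_Y = np.mean(Y)
# np.var defaults to the population variance (divides by n)
var_X = np.var(X)
var_Y = np.var(Y)

print("Mean of X:", mean_X, "Variance of X:", var_X)
print("Mean of Y:", mean_Y, "Variance of Y:", var_Y)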
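To make the definitions concrete, the following sketch computes the mean and variance directly from their formulas in plain Python. The helper names mean and variance are illustrative; the ddof argument mirrors the NumPy parameter of the same name, where 0 gives the population variance (the np.var default) and 1 gives the sample variance.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)

def variance(values, ddof=0):
    """Average squared deviation from the mean.

    ddof=0 divides by n (population variance);
    ddof=1 divides by n - 1 (sample variance).
    """
    m = mean(values)
    squared_deviations = [(v - m) ** 2 for v in values]
    return sum(squared_deviations) / (len(values) - ddof)

X = [1, 2, 3, 4, 5]
print("Mean:", mean(X))                         # → Mean: 3.0
print("Population variance:", variance(X))      # → Population variance: 2.0
print("Sample variance:", variance(X, ddof=1))  # → Sample variance: 2.5
```

The squared deviations of X are 4, 1, 0, 1, 4, which sum to 10; dividing by 5 or by 4 gives the two results.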
Theory: Covariance measures the directional relationship between two random variables. A positive
covariance indicates that the variables increase together, while a negative covariance indicates that
one variable decreases as the other increases.
Implementation:
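import numpy as np
X = np.array([1, 2, 3, 4, 5])
Y = np.array([2, 3, 4, 5, 6])
# np.cov returns the 2x2 covariance matrix; entry [0, 1] is Cov(X, Y)
covariance = np.cov(X, Y)[0, 1]
print("Covariance of X and Y:", covariance)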
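The sign behaviour described in the theory can be seen by computing the covariance from its definition. This is a minimal plain-Python sketch; the function name covariance and the reversed series Y_rev are hypothetical additions for illustration. The default ddof=1 divides by n - 1, which matches the default normalization of np.cov.

```python
def covariance(xs, ys, ddof=1):
    """Covariance of two equally long sequences.

    ddof=1 divides by n - 1 (sample covariance); ddof=0 divides by n.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of paired deviations from the means
    products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - ddof)

X = [1, 2, 3, 4, 5]
Y = [2, 3, 4, 5, 6]      # increases with X
Y_rev = [6, 5, 4, 3, 2]  # hypothetical series that decreases as X increases

print("Cov(X, Y):", covariance(X, Y))          # → Cov(X, Y): 2.5
print("Cov(X, Y_rev):", covariance(X, Y_rev))  # → Cov(X, Y_rev): -2.5
```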
Theory: Linear regression coefficients can be estimated using the least squares method, which
minimizes the sum of the squared differences between the observed and predicted values.
Implementation:
import numpy as np

X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([1, 2, 3, 4, 5])

# Prepend a column of ones so that theta_best[0] is the intercept
X_b = np.c_[np.ones((len(X), 1)), X]
theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)

print("Intercept:", theta_best[0])
print("Slope:", theta_best[1])
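For a single input variable, the least squares solution reduces to two scalar formulas: the slope is the sum of paired deviations divided by the sum of squared deviations of X, and the intercept follows from the means. The sketch below is a plain-Python illustration of that special case; the function name fit_line and the second data set are assumptions added for the example.

```python
def fit_line(xs, ys):
    """Least squares intercept and slope for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations and sum of squared deviations of x
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope

X = [1, 2, 3, 4, 5]
intercept, slope = fit_line(X, [1, 2, 3, 4, 5])
print("Intercept:", intercept, "Slope:", slope)  # → Intercept: 0.0 Slope: 1.0

# Hypothetical second data set: the same line shifted up by one
intercept2, slope2 = fit_line(X, [2, 3, 4, 5, 6])
print("Intercept:", intercept2, "Slope:", slope2)  # → Intercept: 1.0 Slope: 1.0
```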
6. Implement and Estimate Coefficients for Multivariate Linear Regression
Theory: Multivariate linear regression involves more than one independent variable. The coefficients
can be estimated using the same least squares method as in simple linear regression but applied to
multiple variables.
Implementation:
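import numpy as np
X = np.array([[1, 2], [2, 4], [3, 6], [4, 8], [5, 10]])
y = np.array([2, 3, 4, 5, 6])
X_b = np.c_[np.ones((len(X), 1)), X]
# The second feature is exactly twice the first, so X_b.T.dot(X_b) is singular
# and cannot be inverted; the pseudo-inverse gives the minimum-norm solution.
theta_best = np.linalg.pinv(X_b).dot(y)
print("Coefficients:", theta_best)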
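With several input variables, it is worth checking whether the design matrix has full column rank before relying on a matrix inverse. The sketch below is one possible check, using the X and y values from this experiment; np.linalg.matrix_rank and np.linalg.lstsq are standard NumPy functions, and the choice of lstsq as the solver is an assumption rather than the manual's prescribed method.

```python
import numpy as np

X = np.array([[1, 2], [2, 4], [3, 6], [4, 8], [5, 10]], dtype=float)
y = np.array([2, 3, 4, 5, 6], dtype=float)
X_b = np.c_[np.ones((len(X), 1)), X]  # bias column plus the two features

# Rank below the number of columns means some columns are linearly dependent
rank = np.linalg.matrix_rank(X_b)
print("Columns:", X_b.shape[1], "Rank:", rank)

# lstsq handles rank-deficient systems and returns a least squares solution
theta, residuals, lstsq_rank, singular_values = np.linalg.lstsq(X_b, y, rcond=None)
predictions = X_b.dot(theta)
print("Coefficients:", theta)
print("Predictions:", predictions)
```

Because the second feature is twice the first, the individual coefficients are not uniquely determined, although the fitted predictions are.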
Theory: Stochastic Gradient Descent (SGD) is an iterative method for minimizing an objective
function. At each step it estimates the gradient from a single randomly chosen training example (or a
small batch) and moves the model parameters a small step in the direction opposite to that gradient.
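Written out for the squared error on a single training example, one SGD step takes the following form. The symbols are assumptions introduced for this illustration: \eta is the learning rate, i is the index of the randomly chosen example, and x_i includes the constant 1 for the intercept.

```latex
J_i(\theta) = \left( x_i^{\top}\theta - y_i \right)^2,
\qquad
\nabla_{\theta} J_i(\theta) = 2\, x_i \left( x_i^{\top}\theta - y_i \right),
\qquad
\theta \leftarrow \theta - \eta \, \nabla_{\theta} J_i(\theta)
```

The minus sign in the update moves the parameters downhill, which is what reduces the error.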
Implementation:
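import numpy as np

def stochastic_gradient_descent(X_b, y, theta, learning_rate, iterations):
    m = len(X_b)
    for i in range(iterations):
        # Use one randomly chosen sample per update
        idx = np.random.randint(m)
        xi, yi = X_b[idx:idx + 1], y[idx:idx + 1]
        predictions = xi.dot(theta)
        errors = predictions - yi
        theta = theta - learning_rate * 2 * xi.T.dot(errors)
    return theta

# X is missing in the source; the values from the earlier experiments are assumed
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
X_b = np.c_[np.ones((len(X), 1)), X]
y = np.array([2, 3, 4, 5, 6]).reshape(-1, 1)
theta = np.random.randn(2, 1)
learning_rate = 0.01
iterations = 1000
theta = stochastic_gradient_descent(X_b, y, theta, learning_rate, iterations)
print("Theta:", theta.ravel())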
Theory: The Perceptron algorithm is a type of linear classifier that updates its weights based on the
errors made on training examples. It is an early and simple form of a neural network.
Implementation:
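import numpy as np

def step_function(t):
    return 1 if t >= 0 else 0

def perceptron(X, y, learning_rate, epochs):
    weights = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in range(len(X)):
            # Move the weights toward the input only when the prediction is wrong
            prediction = step_function(np.dot(X[i], weights))
            weights += learning_rate * (y[i] - prediction) * X[i]
    return weights

# X is missing in the source; a bias column plus two binary inputs is assumed,
# which makes the given labels the NAND function
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])
y = np.array([1, 1, 1, 0])
learning_rate = 0.1
epochs = 10
weights = perceptron(X, y, learning_rate, epochs)
print("Weights:", weights)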
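The update rule described in the theory can be traced end to end on a tiny example. The sketch below is a self-contained plain-Python version that trains on four points and then checks the predictions against the labels. The input rows (a constant bias term followed by two binary inputs) are an assumption, chosen so that the labels 1, 1, 1, 0 form the NAND function; a learning rate of 1.0 is used here so that all arithmetic stays exact.

```python
def step_function(t):
    return 1 if t >= 0 else 0

def predict(weights, x):
    """Classify x by thresholding the weighted sum."""
    return step_function(sum(w * xi for w, xi in zip(weights, x)))

def train_perceptron(X, y, learning_rate, epochs):
    weights = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, target in zip(X, y):
            # error is 0 when the prediction is right, +1 or -1 when it is wrong
            error = target - predict(weights, x)
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
    return weights

# Assumed inputs: bias term followed by two binary features
X = [[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
y = [1, 1, 1, 0]

weights = train_perceptron(X, y, learning_rate=1.0, epochs=10)
predictions = [predict(weights, x) for x in X]
print("Weights:", weights)          # → Weights: [2.0, -2.0, -1.0]
print("Predictions:", predictions)  # → Predictions: [1, 1, 1, 0]
```

On this data the weights stop changing after the fifth epoch, since every point is then classified correctly.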
Theory: A case study involves detailed analysis of a dataset to understand its characteristics, identify
patterns, and derive insights. In this case, we will use an auto insurance dataset to perform
exploratory data analysis (EDA) and apply machine learning techniques.
Implementation:
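import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Load the dataset
df = pd.read_csv('auto_insurance.csv')

# Summary statistics and missing values per column
print(df.describe())
print(df.isnull().sum())

# Correlation matrix (numeric columns only)
print(df.corr(numeric_only=True))

# Pairwise scatter plots
sns.pairplot(df)
plt.show()

# The feature selection and the split are missing in the source; all numeric
# columns except the target and an 80/20 split are assumed here.
X = df.select_dtypes(include='number').drop(columns=['Claim_Amount'])
y = df['Claim_Amount']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)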
Theory: Linear regression aims to model the relationship between a dependent variable and one or
more independent variables by fitting a linear equation to observed data.
Implementation:
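import numpy as np
import matplotlib.pyplot as plt

# The data and the fit are missing here in the source; the values and the
# normal-equation fit from the least squares experiment are assumed.
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([1, 2, 3, 4, 5])
X_b = np.c_[np.ones((len(X), 1)), X]
theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)

# Make predictions
X_new = np.array([[0], [6]])
X_new_b = np.c_[np.ones((len(X_new), 1)), X_new]
y_predict = X_new_b.dot(theta_best)

# Plot results
plt.plot(X_new, y_predict, "r-", linewidth=2, label="Predictions")
plt.plot(X, y, "b.", label="Actual Data")
plt.xlabel("X")
plt.ylabel("y")
plt.legend()
plt.show()

print("Intercept:", theta_best[0])
print("Slope:", theta_best[1])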