AIML Lab Ex 3-5 - 1
AIML Lab Ex 3-5 - 1
Aim
To implement Naïve bayes classifiers as machine learning models using sklearn library in
python.
Naïve Bayes Algorithm
● Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes
theorem and used for solving classification problems.
● It is mainly used in text classification that includes a high-dimensional training
dataset.
● Naïve Bayes Classifier is one of the simplest and most effective Classification
algorithms which helps in building fast machine learning models that can make quick
predictions.
● It is a probabilistic classifier, which means it predicts based on the probability of
an object.
● Some popular examples of Naïve Bayes Algorithm are spam filtration, Sentimental
analysis, and classifying articles.
The Naïve Bayes algorithm is comprised of two words Naïve and Bayes, which can be
described as:
● Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is
independent of the occurrence of other features. Such as if the fruit is identified on the
bases of color, shape, and taste, then red, spherical, and sweet fruit is recognized as an
apple. Hence each feature individually contributes to identifying that it is an apple
without depending on each other.
● Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.
● Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to
determine the probability of a hypothesis with prior knowledge. It depends on the
conditional probability.
● The formula for Bayes' theorem is given as:
17
Types of Naïve Bayes Model:
There are three types of Naive Bayes Model, which are given below:
● Gaussian: The Gaussian model assumes that features follow a normal distribution.
This means if predictors take continuous values instead of discrete, then the model
assumes that these values are sampled from the Gaussian distribution.
● Multinomial: The Multinomial Naïve Bayes classifier is used when the data is
multinomial distributed. It is primarily used for document classification problems, it
means a particular document belongs to which category such as Sports, Politics,
education, etc.
● Bernoulli: The Bernoulli classifier works like the Multinomial classifier, but the
predictor variables are the independent Booleans variables. Such as if a particular
word is present or not in a document. This model is also famous for document
classification tasks.
Procedure:
1. Import the necessary libraries and dataset (here,”loan_data.csv”)
2. Explore the data to figure out what they look like
3. Pre-process the data
4. Split the data into attributes and labels
5. Divide the data into training and testing sets
6. Train the model using GaussianNB ,MultinomialNB, BernoulliNB algorithms
7. Make some predictions and display confusion matrix for each model and compare
them
8. Evaluate the results of the algorithm and display classification report for train and
test data
Program
CODE:
import pandas as pd
df = pd.read_csv('loan_data.csv')
df.head()
OUTPUT:
credit.policy purpose int.rate installment log.annual.inc
\
18
0 1 debt_consolidation 0.1189 829.10
11.350407
1 1 credit_card 0.1071 228.22
11.082143
2 1 debt_consolidation 0.1357 366.86
10.373491
3 1 debt_consolidation 0.1008 162.34
11.350407
4 1 credit_card 0.1426 102.92
11.299732
delinq.2yrs pub.recnot.fully.paid
0 0 0 0
1 0 0 0
2 0 0 0
3 0 0 0
4 1 0 0
CODE:
df.info()
OUTPUT:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9578 entries, 0 to 9577
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 credit.policy 9578 non-null int64
1 purpose 9578 non-null object
2 int.rate 9578 non-null float64
3 installment 9578 non-null float64
4 log.annual.inc 9578 non-null float64
5 dti 9578 non-null float64
6 fico 9578 non-null int64
7 days.with.cr.line 9578 non-null float64
8 revol.bal 9578 non-null int64
9 revol.util 9578 non-null float64
10 inq.last.6mths 9578 non-null int64
11 delinq.2yrs 9578 non-null int64
12 pub.rec 9578 non-null int64
13 not.fully.paid 9578 non-null int64
dtypes: float64(6), int64(7), object(1)
memory usage: 1.0+ MB
CODE:
19
import seaborn as sns
import matplotlib.pyplot as plt
sns.countplot(data=df,x='purpose',hue='not.fully.paid')
plt.xticks(rotation=45, ha='right');
OUTPUT:
CODE:
pre_df = pd.get_dummies(df,columns=['purpose'],drop_first=True)
pre_df.head()
OUTPUT:
credit.policyint.rate installment log.annual.inc dti fico \
0 1 0.1189 829.10 11.350407 19.48 737
1 1 0.1071 228.22 11.082143 14.29 707
2 1 0.1357 366.86 10.373491 11.63 682
3 1 0.1008 162.34 11.350407 8.10 712
4 1 0.1426 102.92 11.299732 14.97 667
days.with.cr.linerevol.balrevol.util i
inq.last.6mths
nq.last.6mths delinq.2yrs \
0 5639.958333 28854 52.1 0 0
1 2760.000000 33623 76.7 0 0
2 4710.000000 3511 25.6 1 0
3 2699.958333 33667 73.2 1 0
4 4066.000000 4740 39.5 0 1
pub.recnot.fully.paidpurpose_credit_cardpurpose_debt_consolidation \
0 0 0 0
1
1 0 0 1
20
0
2 0 0 0
1
3 0 0 0
1
4 0 0 1
0
purpose_educationalpurpose_home_improvementpurpose_major_purchase \
0 0 0 0
1 0 0 0
2 0 0 0
3 0 0 0
4 0 0 0
purpose_small_business
0 0
1 0
2 0
3 0
4 0
CODE:
from sklearn.model_selection import train_test_split
X = pre_df.drop('not.fully.paid', axis=1)
y = pre_df['not.fully.paid']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33,
random_state=125)
classification_report,)
y_pred1 = model1.predict(X_test)
accuracy1 = accuracy_score(y_pred1, y_test)
print("Accuracy:", accuracy1)
print("F1 Score:", f1)
OUTPUT:
Accuracy: 0.8206263840556786
F1 Score: 0.8686606980013266
CODE:
21
labels = ["Fully Paid", "Not fully Paid"]
cm1 = confusion_matrix(y_test, y_pred1)
disp1 = ConfusionMatrixDisplay(confusion_matrix=cm1,
onMatrixDisplay(confusion_matrix=cm1,
display_labels=labels)
disp1.plot();
OUTPUT:
y_pred2= model2.predict(X_test)
accuracy2
acy2 = accuracy_score(y_pred2, y_test)
print("Accuracy:", accuracy2)
print("F1 Score:", f11)
OUTPUT:
Accuracy: 0.6678266371401456
F1 Score: 0.640426265085445
CODE:
22
cm2 = confusion_matrix(y_test, y_pred2)
disp2 = ConfusionMatrixDisplay(confusion_matrix=cm2,display_labels=labels)
disp2.plot();
OUTPUT:
y_pred3 = model3.predict(X_test)
print("Accuracy:", accuracy3)
print("F1 Score:", f13)
OUTPUT:
Accuracy: 0.8272698513128757
F1 Score: 0.8686606980013266
CODE:
cm3 = confusion_matrix(y_test, y_pred3)
disp3 = ConfusionMatrixDisplay(confusion_matrix=cm3,
display_labels=labels)
disp3.plot();
23
OUTPUT:
#GAUSSIANNB
CODE:
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred1))
OUTPUT:
precision recall f1
f1-score support
#MULTINOMIALNB
CODE:
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred2))
OUTPUT:
precision recall f1
f1-score support
24
0 0.84 0.74 0.79 2625
1 0.20 0.32 0.25 536
#BERNOULLINB
CODE:
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred3))
OUTPUT:
precision recall f1-score support
Result:
25
26
Ex. No. 4 IMPLEMENTATION OF BAYESIAN NETWORKS
Date:
Aim:
To construct a Bayesian network, to demonstrate the diagnosis of heart patients using
standard Heart Disease Data Set.
Bayesian network consists of two major parts: a directed acyclic graph and a set of
conditional probability distributions
Algorithm:
1. Read the training dataset T;
2. Calculate the mean and standard deviation of the predictor variables in each class;
3. Repeat Calculate the probability of fi using the gauss density equation in each class; Until
the probability of all predictor variables (f1, f2, f3, .. , fn) has been calculated.
4. Calculate the likelihood for each class;
5. Get the greatest likelihood;
Program
27
BayesNet()
A BayesNet is a graph (as in the diagram above) where each node represents a random
variable, and the edges are parent→child links. You can construct an empty graph with
BayesNet(), then add variables one at a time with the method call .add(variable_name,
parent_names, cpt), where the names are strings, and each of the parent_names must already
have been .added.
A random variable; the ovals in the diagram above. The value of a variable depends on the
value of the parents, in a probabilistic way specified by the variable's conditional probability
table (CPT). Given the parents, the variable is independent of all the other variables. For
example, if I know whether Alarm is true or false, then I know the probability of JohnCalls,
and evidence about the other variables won't give me any more information about JohnCalls.
Each row of the CPT uses the same order of variables as the list of parents. We will only
allow variables with a finite discrete domain; not continuous values.
ProbDist(mapping)
Factor(mapping)
Evidence(mapping)
A mapping of {Variable: value, ...} pairs, describing the exact values for a set of
variables—the things we know for sure.
CPTable(rows, parents)
A conditional probability table (or CPT) describes the probability of each possible outcome
value of a random variable, given the values of the parent variables. A CPTable is a a
mapping, {tuple: probdist, ...}, where each tuple lists the values of each of the parent
variables, in order, and each probability distribution says what the possible outcomes are,
given those values of the parents. The CPTable for Alarm in the diagram above would be
represented as follows:
28
Take the second row, "(T, F): .94". This means that when the first parent (Burglary) is
true, and the second parent (Earthquake) is fale, then the probability of Alarm being true is
.94. Note that the .94 is an abbreviation for ProbDist({T: .94, F: .06}).
T = Bool(True); F = Bool(False)
In [2]:
from collections import defaultdict, Counter
import itertools
import math
import random
class BayesNet(object):
"Bayesian network: a graph of variables connected by parent links."
def __init__(self):
self.variables = [] # List of variables, in parent-first topological sort order
self.lookup = {} # Mapping of {variable_name: variable} pairs
class Variable(object):
"A discrete random variable; conditional on zero or more parent Variables."
class ProbDist(Factor):
"""A Probability Distribution is an {outcome: probability} mapping.
The values are normalized to sum to 1.
ProbDist(0.75) is an abbreviation for ProbDist({T: 0.75, F: 0.25})."""
def __init__(self, mapping=(), **kwargs):
if isinstance(mapping, float):
29
mapping = {T: mapping, F: 1 - mapping}
self.update(mapping, **kwargs)
normalize(self)
class Evidence(dict):
"A {variable: value} mapping, describing what we know for sure."
class CPTable(dict):
"A mapping of {row: ProbDist, ...} where each row is a tuple of values of the parent
variables."
class Bool(int):
"Just like `bool`, except values display as 'T' and 'F' instead of 'True' and 'False'"
__str__ = __repr__ = lambda self: 'T' if self else 'F'
T = Bool(True)
F = Bool(False)
In [9]:
def normalize(dist):
"Normalize a {key: value} distribution so values sum to 1.0. Mutates dist and returns it."
total = sum(dist.values())
for key in dist:
dist[key] = dist[key] / total
assert 0 <= dist[key] <= 1, "Probabilities must be between 0 and 1."
return dist
def sample(probdist):
"Randomly sample an outcome from a probability distribution."
30
r = random.random() # r is a random point in the probability distribution
c = 0.0 # c is the cumulative probability of outcomes seen so far
for outcome in probdist:
c += probdist[outcome]
if r <= c:
return outcome
def globalize(mapping):
"Given a {name: value} mapping, export all the names to the `globals()` namespace."
globals().update(mapping)
In [4]:
Earthquake = Variable('Earthquake', 0.002)
In [5]:
P(Earthquake)
Out[5]:
{F: 0.998, T: 0.002}
In [6]:
P(Earthquake)[T]
Out[6]:
0.002
In [7]:
alarm_net = (BayesNet()
.add('Burglary', [], 0.001)
.add('Earthquake', [], 0.002)
.add('Alarm', ['Burglary', 'Earthquake'], {(T, T): 0.95, (T, F): 0.94, (F, T): 0.29, (F, F): 0.001})
.add('JohnCalls', ['Alarm'], {T: 0.90, F: 0.05})
.add('MaryCalls', ['Alarm'], {T: 0.70, F: 0.01}))
In [8]:
# Make Burglary, Earthquake, etc. be global variables
globalize(alarm_net.lookup)
alarm_net.variables
Out[8]:
[Burglary, Earthquake, Alarm, JohnCalls, MaryCalls]
In [14]:
# Probability of Alarm going off, given a Burglary and not an Earthquake:
P(Alarm, {Burglary: T, Earthquake: F})
Out[14]:
{T: 0.94, F: 0.06000000000000005}
31
In [15]:
Alarm.cpt
Out[15]:
{(T, T): {T: 0.95, F: 0.050000000000000044},
(T, F): {T: 0.94, F: 0.06000000000000005},
(F, T): {T: 0.29, F: 0.71},
(F, F): {T: 0.001, F: 0.999}}
For a network with n variables, each of which has b values, there are bn rows in the joint
distribution (for example, a billion rows for 30 Boolean variables), making it impractical to
explicitly create the joint distribution for large networks. But for small networks, the function
joint_distribution creates the distribution, which can be instructive to look at, and can be
used to do inference.
In [16]:
def joint_distribution(net):
"Given a Bayes net, create the joint distribution over all variables."
return ProbDist({row: prod(P_xi_given_parents(var, row, net)
for var in net.variables)
for row in all_rows(net)})
def prod(numbers):
"The product of numbers: prod([2, 3, 5]) == 30. Analogous to `sum([2, 3, 5]) == 10`."
result = 1
for x in numbers:
result *= x
return result
In [17]:
32
P(Alarm, {Burglary: F, Earthquake: F})
Out[17]:
{T: 0.001, F: 0.999}
In [18]:
# Probability that "the alarm has sounded, but neither a burglary nor an earthquake has
occurred,
# and both John and Mary call" (page 514 says it should be 0.000628)
print(alarm_net.variables)
joint_distribution(alarm_net)[F, F, T, T, T]
Out[18]:
0.00062811126
Bayes nets allow us to calculate the probability, but the calculation is not just a lookup
in the CPT; it is a global calculation across the whole net. One inefficient but straightforward
way of doing the calculation is to create the joint probability distribution, then pick out just
the rows that match the evidence variables, and for each row check what the value of the
query variable is, and increment the probability for that value accordningly:
In [19]:
In [20]:
# The probability of a Burgalry, given that John calls but Mary does not:
enumeration_ask(Burglary, {JohnCalls: F, MaryCalls: T}, alarm_net)
33
Out[20]:
{F: 0.9931237539265789, T: 0.006876246073421024}
In [21]:
enumeration_ask(Burglary, {JohnCalls: T, MaryCalls: T}, alarm_net)
Out[21]:
{F: 0.7158281646356071, T: 0.28417183536439294}
In [22]:
# The probability of an Alarm, given that there is an Earthquake and Mary calls:
enumeration_ask(Alarm, {MaryCalls: T, Earthquake: T}, alarm_net)
Out[22]:
{F: 0.03368899586522123, T: 0.9663110041347788}
The Cleveland database contains 76 attributes, but all published experiments refer to using a
subset of 14 of them. In particular, the Cleveland database is the only one that has been used
by ML researchers to this date. The “Heartdisease” field refers to the presence of heart disease
in the patient. It is integer valued from 0 (no presence) to 4.
Database: 0 1 2 3 4 Total
Attribute Information:
34
11. slope: the slope of the peak exercise ST segment
1. Value 1: upsloping
2. Value 2: flat
3. Value 3: downsloping
12. thal: 3 = normal; 6 = fixed defect; 7 = reversable defect
13. Heartdisease: It is integer valued from 0 (no presence) to 4.
age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal Heartdisease
63 1 1 145 233 1 2 150 o 2.3 3 o 6 o
67 1 4 160 286 o 2 108 1 1.5 2 3 3 2
67 1 4 120 229 o 2 129 1 2.6 2 2 7 1
41 o 2 130 204 o 2 172 o 1.4 1 o 3 o
62 o 4 140 268 o 2 160 o 3.6 3 2 3 3
60 1 4 130 206 o 2 132 1 2.4 2 2 7 4
heartDisease = pd.read_csv('heart.csv')
heartDisease = heartDisease.replace('?',np.nan)
model=
BayesianModel([('age','heartdisease'),('sex','heartdisease'),('exang','hear
tdisease'),('cp','heartdisease'),('heartdisease','restecg'),('heartdisease'
,'chol')])
print('\nLearning CPD using Maximum likelihood estimators')
model.fit(heartDisease,estimator=MaximumLikelihoodEstimator)
35
print(q2)
Output
Result:
Thus Bayesian network is built and inference for earthquake problem is identified.
36
Ex. No. 5 BUILD REGRESSION MODELS
Date:
AIM:
To build regression models using various datasets.
REGRESSION:
Regression shows a line or curve that passes through all the datapoints on
target-predictor graph in such a way that the vertical distance between the datapoints and the
regression line is minimum.
Some examples:
->Prediction of rain
->Determining Market trends
->Prediction of road accidents due to rash driving.
Terminologies:
• Dependent Variable
• Independent Variable
• Outliers
• Multicollinearity
• Underfitting and Overfitting
Types of regression :
• Linear Regression
• Logistic Regression
• Ridge Regression
• Lasso Regression:
Linear Regression:
Linear Regression is an algorithm that belongs to supervised machine learning. It
tries to apply relations that will predict the outcome of an event based on the independent
variable data points. The relation is usually a straight line that best fits the different data
points as close as possible.
Y= aX+b
Here, Y = dependent variables (target variables),
X= Independent variables (predictor variables),
a and b are the linear coefficients
37
Logistic Regression:
Logistic Regression is one of the supervised learning algorithm. It is used to
calculate or predict the probability of an event occurring.. The formula is given below:
1
𝑓 (𝑥 ) =
1+𝑒
• f(x)= Output between the 0 and 1 value.
• x= input to the function
• e= base of natural logarithm.
Ridge Regression:
Ridge Regression Is a technique Which Is used for analyzing Multiple
Regression where the data suffers from multicollinearity. The Problem which arises due to
multicollinearity is that the basic linear regression model (Least Square Estimates) becomes
unbiased and the variance becomes so large that the predicted values are far from the true
value.
Lasso Regression:
Lasso (least absolute shrinkage and selection operator) is a regression analysis
method that performs both variable selection and regularization in order to enhance the
prediction accuracy and interpretability of the resulting statiscal model.
Algorithm:
38
LINEAR REGRESSION:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data=pd.read_csv('headbrain.csv')
data
OUTPUT:
OUTPUT:
coefficients: 0.26342933948939945 325.57342104944223
39
CODE:
ss_t = 0
ss_r = 0
for i in range(n):
y_pred = b0 + b1 * X[i]
ss_t += (Y[i] - mean_y) ** 2
ss_r += (Y[i] - y_pred) ** 2
r2 = 1 - (ss_r/ss_t)
print("mean squared error:",r2)
OUTPUT:
CODE:
OUTPUT:
40
LOGISTIC REGRESSION:
CODE:(Diabetes Dataset)
import pandas as pd
df=pd.read_csv('diabetes.csv')
df
OUTPUT:
Pregnanci Glucose Blood Skin Insuli BMI Diabetes Age Outco
es Pressure Thickness n Pedigree me
Function
0 6 148 72 35 0 33.6 0.627 50 1
1 1 85 66 29 0 26.6 0.351 31 0
2 8 183 64 0 0 23.3 0.672 32 1
3 1 89 66 23 94 28.1 0.167 21 0
4 0 137 40 35 168 43.1 2.288 33 1
... ... ... ... ... ... ... ... ... ...
763 10 101 76 48 180 32.9 0.171 63 0
764 2 122 70 27 0 36.8 0.340 27 0
765 5 121 72 23 112 26.2 0.245 30 0
766 1 126 60 0 0 30.1 0.349 47 1
767 1 93 70 31 0 30.4 0.315 23 0
768 rows × 9 columns
DATA DESCRIPTION:
41
CODE:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
diabetesDF = pd.read_csv('diabetes.csv')
print(diabetesDF.head())
OUTPUT:
DATA EXPLORATION:
CODE:
corr = diabetesDF.corr()
print(corr)
sns.heatmap(corr,xticklabels=corr.columns,yticklabels=corr.col
umns)
OUTPUT:
42
0.207371
SkinThickness -0.081672 0.057328 0.207371
1.000000
Insulin -0.073535 0.331357 0.088933
0.436783
BMI 0.017683 0.221071 0.281805
0.392573
DiabetesPedigreeFunction -0.033523 0.137337 0.041265
0.183928
Age 0.544341 0.263514 0.239528
-0.113970
Outcome 0.221898 0.466581 0.065068
0.074752
Insulin BMI
DiabetesPedigreeFunction \
Pregnancies -0.073535 0.017683 -
0.033523
Glucose 0.331357 0.221071
0.137337
BloodPressure 0.088933 0.281805
0.041265
SkinThickness 0.436783 0.392573
0.183928
Insulin 1.000000 0.197859
0.185071
BMI 0.197859 1.000000
0.140647
DiabetesPedigreeFunction 0.185071 0.140647
1.000000
Age -0.042163 0.036242
0.033561
Outcome 0.130548 0.292695
0.173844
Age Outcome
Pregnancies 0.544341 0.221898
Glucose 0.263514 0.466581
BloodPressure 0.239528 0.065068
SkinThickness -0.113970 0.074752
Insulin -0.042163 0.130548
BMI 0.036242 0.292695
DiabetesPedigreeFunction 0.033561 0.173844
Age 1.000000 0.238356
Outcome 0.238356 1.000000
<Axes: >
43
LASSO REGRESSION:
TO FIND MODEL SCORE:
CODE:
OUTPUT:
44
CODE:
from sklearn.model_selection import GridSearchCV
param_grid = {'alpha': [0.001, 0.01, 0.1, 1, 10]}
lasso_cv = GridSearchCV(Lasso(), param_grid, cv=5)
lasso_cv.fit(X_train, y_train)
print("Best Parameters:", lasso_cv.best_params_)
print("Best Score:", lasso_cv.be
OUTPUT:
RIDGE REGRESSION:
CODE:(Housing Dataset)
import pandas as pd
import numpy as np
df=pd.read_csv("housing.csv")
df.info()
OUTPUT:
45
url='https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
model = Ridge(alpha=1.0)
model.fit(X, y)
row =[0.00632,18.00,2.310,0,0.5380,6.5750,65.20,4.0900,1,296.0,15.30,396.90,4.98]
yhat = model.predict([row])
print('Predicted: %.3f' % yhat)
OUTPUT:
Predicted: 30.253
CODE:
from numpy import arange
from pandas import read_csv
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RepeatedKFold
from sklearn.linear_model import Ridge
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
model = Ridge()
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
grid = dict()
grid['alpha'] = arange(0, 1, 0.01)
search = GridSearchCV(model, grid, scoring='neg_mean_absolute_error', cv=cv,n_jobs=-1)
results = search.fit(X, y)
print('MAE: %.3f' % results.best_score_)
print('Config: %s' % results.best_params_)
OUTPUT:
MAE: -3.379
Config: {'alpha': 0.51}
VISUALIZATION:
CODE:
46
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
X = 1.0 / (np.arange(1, 11) + np.arange(0, 10)[:, np.newaxis])
y = np.ones(10)
n_alphas = 200
alphas = np.logspace(-10, -2, n_alphas)
coefs = []
for a in alphas:
ridge = linear_model.Ridge(alpha=a, fit_intercept=False)
ridge.fit(X, y)
coefs.append(ridge.coef_)
ax = plt.gca()
ax.plot(alphas, coefs)
ax.set_xscale("log")
ax.set_xlim(ax.get_xlim()[::-1])
plt.xlabel("population")
plt.ylabel("total_rooms")
plt.title("Ridge regression")
plt.axis("tight")
plt.show()
OUTPUT:
Result:
Thus various Linear regression models are implemented.
47