0% found this document useful (0 votes)
19 views31 pages

AIML Lab Ex 3-5 - 1

AIML lab

Uploaded by

srinithiravi27
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
19 views31 pages

AIML Lab Ex 3-5 - 1

AIML lab

Uploaded by

srinithiravi27
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 31

Ex. No.

3 IMPLEMENTATION OF NAÏVE BAYES MODELS


Date:

Aim
To implement Naïve bayes classifiers as machine learning models using sklearn library in
python.
Naïve Bayes Algorithm
● Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes
theorem and used for solving classification problems.
● It is mainly used in text classification that includes a high-dimensional training
dataset.
● Naïve Bayes Classifier is one of the simplest and most effective Classification
algorithms which helps in building fast machine learning models that can make quick
predictions.
● It is a probabilistic classifier, which means it predicts based on the probability of
an object.
● Some popular examples of Naïve Bayes Algorithm are spam filtration, Sentimental
analysis, and classifying articles.

The Naïve Bayes algorithm is comprised of two words Naïve and Bayes, which can be
described as:

● Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is
independent of the occurrence of other features. Such as if the fruit is identified on the
bases of color, shape, and taste, then red, spherical, and sweet fruit is recognized as an
apple. Hence each feature individually contributes to identifying that it is an apple
without depending on each other.
● Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.

● Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to
determine the probability of a hypothesis with prior knowledge. It depends on the
conditional probability.
● The formula for Bayes' theorem is given as:

P(A|B) = P(B|A) * P(A) / P(B)

17
Types of Naïve Bayes Model:
There are three types of Naive Bayes Model, which are given below:

● Gaussian: The Gaussian model assumes that features follow a normal distribution.
This means if predictors take continuous values instead of discrete, then the model
assumes that these values are sampled from the Gaussian distribution.
● Multinomial: The Multinomial Naïve Bayes classifier is used when the data is
multinomial distributed. It is primarily used for document classification problems, it
means a particular document belongs to which category such as Sports, Politics,
education, etc.
● Bernoulli: The Bernoulli classifier works like the Multinomial classifier, but the
predictor variables are the independent Booleans variables. Such as if a particular
word is present or not in a document. This model is also famous for document
classification tasks.

Procedure:
1. Import the necessary libraries and dataset (here,”loan_data.csv”)
2. Explore the data to figure out what they look like
3. Pre-process the data
4. Split the data into attributes and labels
5. Divide the data into training and testing sets
6. Train the model using GaussianNB ,MultinomialNB, BernoulliNB algorithms
7. Make some predictions and display confusion matrix for each model and compare
them
8. Evaluate the results of the algorithm and display classification report for train and
test data

Program
CODE:
import pandas as pd
df = pd.read_csv('loan_data.csv')
df.head()

OUTPUT:
credit.policy purpose int.rate installment log.annual.inc
\

18
0 1 debt_consolidation 0.1189 829.10
11.350407
1 1 credit_card 0.1071 228.22
11.082143
2 1 debt_consolidation 0.1357 366.86
10.373491
3 1 debt_consolidation 0.1008 162.34
11.350407
4 1 credit_card 0.1426 102.92
11.299732

dti fico days.with.cr.linerevol.balrevol.util inq.last.6mths \


0 19.48 737 5639.958333 28854 52.1 0
1 14.29 707 2760.000000 33623 76.7 0
2 11.63 682 4710.000000 3511 25.6 1
3 8.10 712 2699.958333 33667 73.2 1
4 14.97 667 4066.000000 4740 39.5 0

delinq.2yrs pub.recnot.fully.paid
0 0 0 0
1 0 0 0
2 0 0 0
3 0 0 0
4 1 0 0

CODE:
df.info()

OUTPUT:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9578 entries, 0 to 9577
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 credit.policy 9578 non-null int64
1 purpose 9578 non-null object
2 int.rate 9578 non-null float64
3 installment 9578 non-null float64
4 log.annual.inc 9578 non-null float64
5 dti 9578 non-null float64
6 fico 9578 non-null int64
7 days.with.cr.line 9578 non-null float64
8 revol.bal 9578 non-null int64
9 revol.util 9578 non-null float64
10 inq.last.6mths 9578 non-null int64
11 delinq.2yrs 9578 non-null int64
12 pub.rec 9578 non-null int64
13 not.fully.paid 9578 non-null int64
dtypes: float64(6), int64(7), object(1)
memory usage: 1.0+ MB

CODE:

19
import seaborn as sns
import matplotlib.pyplot as plt
sns.countplot(data=df,x='purpose',hue='not.fully.paid')
plt.xticks(rotation=45, ha='right');

OUTPUT:

CODE:
pre_df = pd.get_dummies(df,columns=['purpose'],drop_first=True)
pre_df.head()

OUTPUT:
credit.policyint.rate installment log.annual.inc dti fico \
0 1 0.1189 829.10 11.350407 19.48 737
1 1 0.1071 228.22 11.082143 14.29 707
2 1 0.1357 366.86 10.373491 11.63 682
3 1 0.1008 162.34 11.350407 8.10 712
4 1 0.1426 102.92 11.299732 14.97 667

days.with.cr.linerevol.balrevol.util i
inq.last.6mths
nq.last.6mths delinq.2yrs \
0 5639.958333 28854 52.1 0 0
1 2760.000000 33623 76.7 0 0
2 4710.000000 3511 25.6 1 0
3 2699.958333 33667 73.2 1 0
4 4066.000000 4740 39.5 0 1

pub.recnot.fully.paidpurpose_credit_cardpurpose_debt_consolidation \
0 0 0 0
1
1 0 0 1

20
0
2 0 0 0
1
3 0 0 0
1
4 0 0 1
0

purpose_educationalpurpose_home_improvementpurpose_major_purchase \
0 0 0 0
1 0 0 0
2 0 0 0
3 0 0 0
4 0 0 0

purpose_small_business
0 0
1 0
2 0
3 0
4 0

CODE:
from sklearn.model_selection import train_test_split
X = pre_df.drop('not.fully.paid', axis=1)
y = pre_df['not.fully.paid']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33,
random_state=125)

#GAUSSIAN NAIVE BAYES


from sklearn.naive_bayes import GaussianNB
model1 = GaussianNB()
model1.fit(X_train, y_train);

from sklearn.metrics import


(accuracy_score,confusion_matrix,ConfusionMatrixDisplay,f1_score,

classification_report,)
y_pred1 = model1.predict(X_test)
accuracy1 = accuracy_score(y_pred1, y_test)

f1 = f1_score(y_pred1, y_test, average="weighted")

print("Accuracy:", accuracy1)
print("F1 Score:", f1)

OUTPUT:
Accuracy: 0.8206263840556786
F1 Score: 0.8686606980013266

CODE:

21
labels = ["Fully Paid", "Not fully Paid"]
cm1 = confusion_matrix(y_test, y_pred1)
disp1 = ConfusionMatrixDisplay(confusion_matrix=cm1,
onMatrixDisplay(confusion_matrix=cm1,
display_labels=labels)
disp1.plot();

OUTPUT:

#MULTINOMIAL NAIVE BAYES


CODE:
from sklearn.naive_bayes import MultinomialNB
model2 = MultinomialNB()
model2.fit(X_train, y_train);

y_pred2= model2.predict(X_test)

accuracy2
acy2 = accuracy_score(y_pred2, y_test)

f11 = f1_score(y_pred2, y_test, average="weighted")

print("Accuracy:", accuracy2)
print("F1 Score:", f11)

OUTPUT:
Accuracy: 0.6678266371401456
F1 Score: 0.640426265085445

CODE:

22
cm2 = confusion_matrix(y_test, y_pred2)
disp2 = ConfusionMatrixDisplay(confusion_matrix=cm2,display_labels=labels)
disp2.plot();

OUTPUT:

#BERNOULLI NAIVE BAYES


CODE:
from sklearn.naive_bayes import BernoulliNB
model3 = BernoulliNB()
model3.fit(X_train, y_train);

y_pred3 = model3.predict(X_test)

accuracy3 = accuracy_score(y_pred3, y_test)


f13 = f1_score(y_pred1, y_test, average="weighted")

print("Accuracy:", accuracy3)
print("F1 Score:", f13)

OUTPUT:
Accuracy: 0.8272698513128757
F1 Score: 0.8686606980013266

CODE:
cm3 = confusion_matrix(y_test, y_pred3)
disp3 = ConfusionMatrixDisplay(confusion_matrix=cm3,
display_labels=labels)
disp3.plot();

23
OUTPUT:

#GAUSSIANNB
CODE:
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred1))

OUTPUT:
precision recall f1
f1-score support

0 0.84 0.97 0.90 2625


1 0.38 0.09 0.15 536

accuracy 0.82 3161


macro avg 0.61 0.53 0.52 3161
weighted avg 0.76 0.82 0.77 3161

#MULTINOMIALNB

CODE:
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred2))

OUTPUT:
precision recall f1
f1-score support

24
0 0.84 0.74 0.79 2625
1 0.20 0.32 0.25 536

accuracy 0.67 3161


macro avg 0.52 0.53 0.52 3161
weighted avg 0.73 0.67 0.70 3161

#BERNOULLINB

CODE:
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred3))

OUTPUT:
precision recall f1-score support

0 0.83 0.99 0.91 2625


1 0.25 0.01 0.02 536

accuracy 0.83 3161


macro avg 0.54 0.50 0.46 3161
weighted avg 0.73 0.83 0.75 3161

Result:

Thus, implementation of Naïve Bayes Classifiers on “loan_data” using ml models


was successfully executed.

25
26
Ex. No. 4 IMPLEMENTATION OF BAYESIAN NETWORKS
Date:

Aim:
To construct a Bayesian network, to demonstrate the diagnosis of heart patients using
standard Heart Disease Data Set.

A Bayesian network is a directed acyclic graph in which each edge corresponds to a


conditional dependency, and each node corresponds to a unique random variable.

Bayesian network consists of two major parts: a directed acyclic graph and a set of
conditional probability distributions

 The directed acyclic graph is a set of random variables represented by nodes.


 The conditional probability distribution of a node (random variable) is defined for
every possible outcome of the preceding causal node(s).

Algorithm:
1. Read the training dataset T;
2. Calculate the mean and standard deviation of the predictor variables in each class;
3. Repeat Calculate the probability of fi using the gauss density equation in each class; Until
the probability of all predictor variables (f1, f2, f3, .. , fn) has been calculated.
4. Calculate the likelihood for each class;
5. Get the greatest likelihood;

Program

27
BayesNet()

A BayesNet is a graph (as in the diagram above) where each node represents a random
variable, and the edges are parent→child links. You can construct an empty graph with
BayesNet(), then add variables one at a time with the method call .add(variable_name,
parent_names, cpt), where the names are strings, and each of the parent_names must already
have been .added.

Variable(name, cpt, parents)

A random variable; the ovals in the diagram above. The value of a variable depends on the
value of the parents, in a probabilistic way specified by the variable's conditional probability
table (CPT). Given the parents, the variable is independent of all the other variables. For
example, if I know whether Alarm is true or false, then I know the probability of JohnCalls,
and evidence about the other variables won't give me any more information about JohnCalls.
Each row of the CPT uses the same order of variables as the list of parents. We will only
allow variables with a finite discrete domain; not continuous values.

ProbDist(mapping)
Factor(mapping)

A probability distribution is a mapping of {outcome: probability} for every outcome of a


random variable. You can give ProbDist the same arguments that you would give to the dict
initializer, for example ProbDist(sun=0.6, rain=0.1, cloudy=0.3). As a shortcut for
Boolean Variables, you can say ProbDist(0.95) instead of ProbDist({T: 0.95, F:
0.05}). In a probability distribution, every value is between 0 and 1, and the values sum to 1.
A Factor is similar to a probability distribution, except that the values need not sum to 1.
Factors are used in the variable elimination inference method.

Evidence(mapping)

A mapping of {Variable: value, ...} pairs, describing the exact values for a set of
variables—the things we know for sure.

CPTable(rows, parents)

A conditional probability table (or CPT) describes the probability of each possible outcome
value of a random variable, given the values of the parent variables. A CPTable is a a
mapping, {tuple: probdist, ...}, where each tuple lists the values of each of the parent
variables, in order, and each probability distribution says what the possible outcomes are,
given those values of the parents. The CPTable for Alarm in the diagram above would be
represented as follows:

CPTable({(T, T): .95,


(T, F): .94,
(F, T): .29,
(F, F): .001},
[Burglary, Earthquake])

28
Take the second row, "(T, F): .94". This means that when the first parent (Burglary) is
true, and the second parent (Earthquake) is fale, then the probability of Alarm being true is
.94. Note that the .94 is an abbreviation for ProbDist({T: .94, F: .06}).

T = Bool(True); F = Bool(False)

In [2]:
from collections import defaultdict, Counter
import itertools
import math
import random

class BayesNet(object):
"Bayesian network: a graph of variables connected by parent links."

def __init__(self):
self.variables = [] # List of variables, in parent-first topological sort order
self.lookup = {} # Mapping of {variable_name: variable} pairs

def add(self, name, parentnames, cpt):


"Add a new Variable to the BayesNet. Parentnames must have been added previously."
parents = [self.lookup[name] for name in parentnames]
var = Variable(name, cpt, parents)
self.variables.append(var)
self.lookup[name] = var
return self

class Variable(object):
"A discrete random variable; conditional on zero or more parent Variables."

def __init__(self, name, cpt, parents=()):


"A variable has a name, list of parent variables, and a Conditional Probability Table."
self.__name__ = name
self.parents = parents
self.cpt = CPTable(cpt, parents)
self.domain = set(itertools.chain(*self.cpt.values())) # All the outcomes in the CPT

def __repr__(self): return self.__name__

class Factor(dict): "An {outcome: frequency} mapping."

class ProbDist(Factor):
"""A Probability Distribution is an {outcome: probability} mapping.
The values are normalized to sum to 1.
ProbDist(0.75) is an abbreviation for ProbDist({T: 0.75, F: 0.25})."""
def __init__(self, mapping=(), **kwargs):
if isinstance(mapping, float):

29
mapping = {T: mapping, F: 1 - mapping}
self.update(mapping, **kwargs)
normalize(self)

class Evidence(dict):
"A {variable: value} mapping, describing what we know for sure."

class CPTable(dict):
"A mapping of {row: ProbDist, ...} where each row is a tuple of values of the parent
variables."

def __init__(self, mapping, parents=()):


"""Provides two shortcuts for writing a Conditional Probability Table.
With no parents, CPTable(dist) means CPTable({(): dist}).
With one parent, CPTable({val: dist,...}) means CPTable({(val,): dist,...})."""
if len(parents) == 0 and not (isinstance(mapping, dict) and set(mapping.keys()) == {()}):
mapping = {(): mapping}
for (row, dist) in mapping.items():
if len(parents) == 1 and not isinstance(row, tuple):
row = (row,)
self[row] = ProbDist(dist)

class Bool(int):
"Just like `bool`, except values display as 'T' and 'F' instead of 'True' and 'False'"
__str__ = __repr__ = lambda self: 'T' if self else 'F'

T = Bool(True)
F = Bool(False)

In [9]:

def P(var, evidence={}):


"The probability distribution for P(variable | evidence), when all parent variables are
known (in evidence)."
row = tuple(evidence[parent] for parent in var.parents)
return var.cpt[row]

def normalize(dist):
"Normalize a {key: value} distribution so values sum to 1.0. Mutates dist and returns it."
total = sum(dist.values())
for key in dist:
dist[key] = dist[key] / total
assert 0 <= dist[key] <= 1, "Probabilities must be between 0 and 1."
return dist

def sample(probdist):
"Randomly sample an outcome from a probability distribution."

30
r = random.random() # r is a random point in the probability distribution
c = 0.0 # c is the cumulative probability of outcomes seen so far
for outcome in probdist:
c += probdist[outcome]
if r <= c:
return outcome
def globalize(mapping):
"Given a {name: value} mapping, export all the names to the `globals()` namespace."
globals().update(mapping)

In [4]:
Earthquake = Variable('Earthquake', 0.002)

In [5]:
P(Earthquake)

Out[5]:
{F: 0.998, T: 0.002}

In [6]:
P(Earthquake)[T]

Out[6]:
0.002

In [7]:
alarm_net = (BayesNet()
.add('Burglary', [], 0.001)
.add('Earthquake', [], 0.002)
.add('Alarm', ['Burglary', 'Earthquake'], {(T, T): 0.95, (T, F): 0.94, (F, T): 0.29, (F, F): 0.001})
.add('JohnCalls', ['Alarm'], {T: 0.90, F: 0.05})
.add('MaryCalls', ['Alarm'], {T: 0.70, F: 0.01}))

In [8]:
# Make Burglary, Earthquake, etc. be global variables
globalize(alarm_net.lookup)
alarm_net.variables

Out[8]:
[Burglary, Earthquake, Alarm, JohnCalls, MaryCalls]

In [14]:
# Probability of Alarm going off, given a Burglary and not an Earthquake:
P(Alarm, {Burglary: T, Earthquake: F})

Out[14]:
{T: 0.94, F: 0.06000000000000005}

31
In [15]:
Alarm.cpt

Out[15]:
{(T, T): {T: 0.95, F: 0.050000000000000044},
(T, F): {T: 0.94, F: 0.06000000000000005},
(F, T): {T: 0.29, F: 0.71},
(F, F): {T: 0.001, F: 0.999}}

Bayes Nets as Joint Probability Distributions


A Bayes net is a compact way of specifying a full joint distribution over all the variables in
the network. Given a set of variables {X1, ..., X*n*}, the full joint distribution is:

P(X1=x1, ..., X*n*=x*n*) = Π*i* P(X*i* = x*i* | parents(X*i*))

For a network with n variables, each of which has b values, there are bn rows in the joint
distribution (for example, a billion rows for 30 Boolean variables), making it impractical to
explicitly create the joint distribution for large networks. But for small networks, the function
joint_distribution creates the distribution, which can be instructive to look at, and can be
used to do inference.

In [16]:
def joint_distribution(net):
"Given a Bayes net, create the joint distribution over all variables."
return ProbDist({row: prod(P_xi_given_parents(var, row, net)
for var in net.variables)
for row in all_rows(net)})

def all_rows(net): return itertools.product(*[var.domain for var in net.variables])

def P_xi_given_parents(var, row, net):


"The probability that var = xi, given the values in this row."
dist = P(var, Evidence(zip(net.variables, row)))
xi = row[net.variables.index(var)]
return dist[xi]

def prod(numbers):
"The product of numbers: prod([2, 3, 5]) == 30. Analogous to `sum([2, 3, 5]) == 10`."
result = 1
for x in numbers:
result *= x
return result

In [17]:

32
P(Alarm, {Burglary: F, Earthquake: F})

Out[17]:
{T: 0.001, F: 0.999}

In [18]:
# Probability that "the alarm has sounded, but neither a burglary nor an earthquake has
occurred,
# and both John and Mary call" (page 514 says it should be 0.000628)

print(alarm_net.variables)
joint_distribution(alarm_net)[F, F, T, T, T]

[Burglary, Earthquake, Alarm, JohnCalls, MaryCalls]

Out[18]:

0.00062811126

Inference by Querying the Joint Distribution

Bayes nets allow us to calculate the probability, but the calculation is not just a lookup
in the CPT; it is a global calculation across the whole net. One inefficient but straightforward
way of doing the calculation is to create the joint probability distribution, then pick out just
the rows that match the evidence variables, and for each row check what the value of the
query variable is, and increment the probability for that value accordningly:

In [19]:

def enumeration_ask(X, evidence, net):


"The probability distribution for query variable X in a belief net, given evidence."
i = net.variables.index(X) # The index of the query variable X in the row
dist = defaultdict(float) # The resulting probability distribution over X
for (row, p) in joint_distribution(net).items():
if matches_evidence(row, evidence, net):
dist[row[i]] += p
return ProbDist(dist)

def matches_evidence(row, evidence, net):


"Does the tuple of values for this row agree with the evidence?"
return all(evidence[v] == row[net.variables.index(v)]
for v in evidence)

In [20]:

# The probability of a Burgalry, given that John calls but Mary does not:
enumeration_ask(Burglary, {JohnCalls: F, MaryCalls: T}, alarm_net)

33
Out[20]:
{F: 0.9931237539265789, T: 0.006876246073421024}

In [21]:
enumeration_ask(Burglary, {JohnCalls: T, MaryCalls: T}, alarm_net)

Out[21]:
{F: 0.7158281646356071, T: 0.28417183536439294}

In [22]:
# The probability of an Alarm, given that there is an Earthquake and Mary calls:
enumeration_ask(Alarm, {MaryCalls: T, Earthquake: T}, alarm_net)

Out[22]:
{F: 0.03368899586522123, T: 0.9663110041347788}

Using Variable Elimination - Cleveland database

The Cleveland database contains 76 attributes, but all published experiments refer to using a
subset of 14 of them. In particular, the Cleveland database is the only one that has been used
by ML researchers to this date. The “Heartdisease” field refers to the presence of heart disease
in the patient. It is integer valued from 0 (no presence) to 4.

Database: 0 1 2 3 4 Total

Cleveland: 164 55 36 35 13 303

Attribute Information:

1. age: age in years


2. sex: sex (1 = male; 0 = female)
3. cp: chest pain type
1. Value 1: typical angina
2. Value 2: atypical angina
3. Value 3: non-anginal pain
4. Value 4: asymptomatic
4. trestbps: resting blood pressure (in mm Hg on admission to the hospital)
5. chol: serum cholestoral in mg/dl
6. fbs: (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
7. restecg: resting electrocardiographic results
1. Value 0: normal
2. Value 1: having ST-T wave abnormality (T wave inversions and/or ST
elevation or depression of > 0.05 mV)
3. Value 2: showing probable or definite left ventricular hypertrophy by Estes’
criteria
8. thalach: maximum heart rate achieved
9. exang: exercise induced angina (1 = yes; 0 = no)
10. oldpeak = ST depression induced by exercise relative to rest

34
11. slope: the slope of the peak exercise ST segment
1. Value 1: upsloping
2. Value 2: flat
3. Value 3: downsloping
12. thal: 3 = normal; 6 = fixed defect; 7 = reversable defect
13. Heartdisease: It is integer valued from 0 (no presence) to 4.

Some instance from the dataset:

age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal Heartdisease
63 1 1 145 233 1 2 150 o 2.3 3 o 6 o
67 1 4 160 286 o 2 108 1 1.5 2 3 3 2
67 1 4 120 229 o 2 129 1 2.6 2 2 7 1
41 o 2 130 204 o 2 172 o 1.4 1 o 3 o
62 o 4 140 268 o 2 160 o 3.6 3 2 3 3
60 1 4 130 206 o 2 132 1 2.4 2 2 7 4

Python Program to Implement and Demonstrate Bayesian network using


pgmpy Machine Learning
import numpy as np
import pandas as pd
import csv
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.models import BayesianModel
from pgmpy.inference import VariableElimination

heartDisease = pd.read_csv('heart.csv')
heartDisease = heartDisease.replace('?',np.nan)

print('Sample instances from the dataset are given below')


print(heartDisease.head())

print('\n Attributes and datatypes')


print(heartDisease.dtypes)

model=
BayesianModel([('age','heartdisease'),('sex','heartdisease'),('exang','hear
tdisease'),('cp','heartdisease'),('heartdisease','restecg'),('heartdisease'
,'chol')])
print('\nLearning CPD using Maximum likelihood estimators')
model.fit(heartDisease,estimator=MaximumLikelihoodEstimator)

print('\n Inferencing with Bayesian Network:')


HeartDiseasetest_infer = VariableElimination(model)

print('\n 1. Probability of HeartDisease given evidence= restecg')


q1=HeartDiseasetest_infer.query(variables=['heartdisease'],evidence={'reste
cg':1})
print(q1)

print('\n 2. Probability of HeartDisease given evidence= cp ')


q2=HeartDiseasetest_infer.query(variables=['heartdisease'],evidence={'cp':2
})

35
print(q2)

Output

Result:
Thus Bayesian network is built and inference for earthquake problem is identified.

36
Ex. No. 5 BUILD REGRESSION MODELS
Date:

AIM:
To build regression models using various datasets.

REGRESSION:
Regression shows a line or curve that passes through all the datapoints on
target-predictor graph in such a way that the vertical distance between the datapoints and the
regression line is minimum.

Some examples:
->Prediction of rain
->Determining Market trends
->Prediction of road accidents due to rash driving.

Terminologies:
• Dependent Variable
• Independent Variable
• Outliers
• Multicollinearity
• Underfitting and Overfitting
Types of regression :
• Linear Regression
• Logistic Regression
• Ridge Regression
• Lasso Regression:

Linear Regression:
Linear Regression is an algorithm that belongs to supervised machine learning. It
tries to apply relations that will predict the outcome of an event based on the independent
variable data points. The relation is usually a straight line that best fits the different data
points as close as possible.
Y= aX+b
Here, Y = dependent variables (target variables),
X= Independent variables (predictor variables),
a and b are the linear coefficients

37
Logistic Regression:
Logistic Regression is one of the supervised learning algorithm. It is used to
calculate or predict the probability of an event occurring.. The formula is given below:
1
𝑓 (𝑥 ) =
1+𝑒
• f(x)= Output between the 0 and 1 value.
• x= input to the function
• e= base of natural logarithm.
Ridge Regression:
Ridge Regression Is a technique Which Is used for analyzing Multiple
Regression where the data suffers from multicollinearity. The Problem which arises due to
multicollinearity is that the basic linear regression model (Least Square Estimates) becomes
unbiased and the variance becomes so large that the predicted values are far from the true
value.

Lasso Regression:
Lasso (least absolute shrinkage and selection operator) is a regression analysis
method that performs both variable selection and regularization in order to enhance the
prediction accuracy and interpretability of the resulting statiscal model.

Algorithm:

Step 1:Load the data .


Step 2 : Initialize the parameters.
Step 3 : Predict the value of a dependent variable by given an independent variable.
Step 4: Calculate the error in prediction for all data points.
Step 5:. Check for accuracy.
Step 6: Visualize the results with a graph.
Step 7: Update the values
Step 8: Report your results

38
LINEAR REGRESSION:

CODE:(Head brain dataset)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data=pd.read_csv('headbrain.csv')
data

OUTPUT:

Age Range Head Size(cm) Brain


Gender Weight(grams)
0 1 1 4512 1530
1 1 1 3738 1297
2 1 1 4261 1335
3 1 1 3777 1282
4 1 1 4177 1590
... ... ... ... ...
232 2 2 3214 1110
233 2 2 3394 1215
234 2 2 3233 1104
235 2 2 3352 1170
236 2 2 3391 1120

PREDICTING THE COEFFICIENTS:


CODE:
X = data['Head Size(cm^3)'].values
Y = data['Brain Weight(grams)'].values
mean_x = np.mean(X)
mean_y = np.mean(Y)
n = len(X)
numerator = 0
denominator = 0
for i in range (n):
numerator += (X[i]-mean_x) * (Y[i] - mean_y)
denominator += (X[i] - mean_x) ** 2
b1 = numerator / denominator
b0 = mean_y - (b1 * mean_x)
print("coefficients:",b1,b0)

OUTPUT:
coefficients: 0.26342933948939945 325.57342104944223

FINDING MEAN SQUARED ERROR:

39
CODE:
ss_t = 0
ss_r = 0
for i in range(n):
y_pred = b0 + b1 * X[i]
ss_t += (Y[i] - mean_y) ** 2
ss_r += (Y[i] - y_pred) ** 2
r2 = 1 - (ss_r/ss_t)
print("mean squared error:",r2)

OUTPUT:

mean squared error: 0.6393117199570003

VISUALIZING THE DATA:

CODE:

max_x = np.max(X) + 100


min_x = np.min(X) - 100
x = np.linspace(min_x,max_x,1000)
y = b0+b1*x
plt.plot(x,y,color='#58b970', label = 'Regression line')
plt.scatter(X,Y, c = '#ef5423', label ='Scatter plot')
plt.xlabel('Head Size in cm')
plt.ylabel('Brain weight in grams')
plt.legend()
plt.show()

OUTPUT:

40
LOGISTIC REGRESSION:

CODE:(Diabetes Dataset)

import pandas as pd
df=pd.read_csv('diabetes.csv')
df

OUTPUT:
Pregnanci Glucose Blood Skin Insuli BMI Diabetes Age Outco
es Pressure Thickness n Pedigree me
Function
0 6 148 72 35 0 33.6 0.627 50 1
1 1 85 66 29 0 26.6 0.351 31 0
2 8 183 64 0 0 23.3 0.672 32 1
3 1 89 66 23 94 28.1 0.167 21 0
4 0 137 40 35 168 43.1 2.288 33 1
... ... ... ... ... ... ... ... ... ...
763 10 101 76 48 180 32.9 0.171 63 0
764 2 122 70 27 0 36.8 0.340 27 0
765 5 121 72 23 112 26.2 0.245 30 0
766 1 126 60 0 0 30.1 0.349 47 1
767 1 93 70 31 0 30.4 0.315 23 0
768 rows × 9 columns
DATA DESCRIPTION:

41
CODE:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
diabetesDF = pd.read_csv('diabetes.csv')
print(diabetesDF.head())

OUTPUT:

Pregnancies Glucose BloodPressure SkinThickness Insulin


BMI \
0 6 148 72 35 0
33.6
1 1 85 66 29 0
26.6
2 8 183 64 0 0
23.3
3 1 89 66 23 94
28.1
4 0 137 40 35 168
43.1

DiabetesPedigreeFunction Age Outcome


0 0.627 50 1
1 0.351 31 0
2 0.672 32 1
3 0.167 21 0
4 2.288 33 1

DATA EXPLORATION:

CODE:

corr = diabetesDF.corr()
print(corr)
sns.heatmap(corr,xticklabels=corr.columns,yticklabels=corr.col
umns)

OUTPUT:

Pregnancies Glucose BloodPressure


SkinThickness \
Pregnancies 1.000000 0.129459 0.141282
-0.081672
Glucose 0.129459 1.000000 0.152590
0.057328
BloodPressure 0.141282 0.152590 1.000000

42
0.207371
SkinThickness -0.081672 0.057328 0.207371
1.000000
Insulin -0.073535 0.331357 0.088933
0.436783
BMI 0.017683 0.221071 0.281805
0.392573
DiabetesPedigreeFunction -0.033523 0.137337 0.041265
0.183928
Age 0.544341 0.263514 0.239528
-0.113970
Outcome 0.221898 0.466581 0.065068
0.074752

Insulin BMI
DiabetesPedigreeFunction \
Pregnancies -0.073535 0.017683 -
0.033523
Glucose 0.331357 0.221071
0.137337
BloodPressure 0.088933 0.281805
0.041265
SkinThickness 0.436783 0.392573
0.183928
Insulin 1.000000 0.197859
0.185071
BMI 0.197859 1.000000
0.140647
DiabetesPedigreeFunction 0.185071 0.140647
1.000000
Age -0.042163 0.036242
0.033561
Outcome 0.130548 0.292695
0.173844

Age Outcome
Pregnancies 0.544341 0.221898
Glucose 0.263514 0.466581
BloodPressure 0.239528 0.065068
SkinThickness -0.113970 0.074752
Insulin -0.042163 0.130548
BMI 0.036242 0.292695
DiabetesPedigreeFunction 0.033561 0.173844
Age 1.000000 0.238356
Outcome 0.238356 1.000000
<Axes: >

43
LASSO REGRESSION:
TO FIND MODEL SCORE:

CODE:

from sklearn import datasets


from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler
diabetes = datasets.load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(diabetes.d
ata, diabetes.target, test_size=0.3, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)
y_pred = lasso.predict(X_test)
print("Model Score: ", lasso.score(X_test, y_test))
lasso.coef_

OUTPUT:

Model Score: 0.47815356922835583


array([ 1.364918 , -12.21558692, 26.45121861, 18.40929882,
-30.54131232, 14.55719971, 0. , 11.74486066,
26.79441432, 2.06055063])

44
CODE:
from sklearn.model_selection import GridSearchCV
param_grid = {'alpha': [0.001, 0.01, 0.1, 1, 10]}
lasso_cv = GridSearchCV(Lasso(), param_grid, cv=5)
lasso_cv.fit(X_train, y_train)
print("Best Parameters:", lasso_cv.best_params_)
print("Best Score:", lasso_cv.be

OUTPUT:

Best Parameters: {'alpha': 0.1}


Best Score: 0.45302353092758024

RIDGE REGRESSION:
CODE:(Housing Dataset)
import pandas as pd
import numpy as np
df=pd.read_csv("housing.csv")
df.info()

OUTPUT:

Data columns (total 10 columns):


0 longitude 20640 non-null float64
1 latitude 20640 non-null float64
2 housing_median_age 20640 non-null float64
3 total_rooms 20640 non-null float64
4 total_bedrooms 20433 non-null float64
5 population 20640 non-null float64
6 households 20640 non-null float64
7 median_income 20640 non-null float64
8 median_house_value 20640 non-null float64
9 ocean_proximity 20640 non-null object

TO ANALYSE THE DATASET:


CODE:
from pandas import read_csv
from sklearn.linear_model import Ridge

45
url='https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
model = Ridge(alpha=1.0)
model.fit(X, y)
row =[0.00632,18.00,2.310,0,0.5380,6.5750,65.20,4.0900,1,296.0,15.30,396.90,4.98]
yhat = model.predict([row])
print('Predicted: %.3f' % yhat)

OUTPUT:
Predicted: 30.253

CODE:
from numpy import arange
from pandas import read_csv
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RepeatedKFold
from sklearn.linear_model import Ridge
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
model = Ridge()
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
grid = dict()
grid['alpha'] = arange(0, 1, 0.01)
search = GridSearchCV(model, grid, scoring='neg_mean_absolute_error', cv=cv,n_jobs=-1)
results = search.fit(X, y)
print('MAE: %.3f' % results.best_score_)
print('Config: %s' % results.best_params_)

OUTPUT:
MAE: -3.379
Config: {'alpha': 0.51}

VISUALIZATION:
CODE:

46
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
X = 1.0 / (np.arange(1, 11) + np.arange(0, 10)[:, np.newaxis])
y = np.ones(10)
n_alphas = 200
alphas = np.logspace(-10, -2, n_alphas)
coefs = []
for a in alphas:
ridge = linear_model.Ridge(alpha=a, fit_intercept=False)
ridge.fit(X, y)
coefs.append(ridge.coef_)
ax = plt.gca()
ax.plot(alphas, coefs)
ax.set_xscale("log")
ax.set_xlim(ax.get_xlim()[::-1])
plt.xlabel("population")
plt.ylabel("total_rooms")
plt.title("Ridge regression")
plt.axis("tight")
plt.show()

OUTPUT:

Result:
Thus various Linear regression models are implemented.

47

You might also like