0% found this document useful (0 votes)

19 views31 pages

AIML Lab Ex 3-5 - 1

AIML lab

Uploaded by

srinithiravi27

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

19 views31 pages

AIML Lab Ex 3-5 - 1

AIML lab

Uploaded by

srinithiravi27

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 31

Ex. No.

3 IMPLEMENTATION OF NAÏVE BAYES MODELS

Date:

Aim
To implement Naïve bayes classifiers as machine learning models using sklearn library in
python.
Naïve Bayes Algorithm
● Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes
theorem and used for solving classification problems.
● It is mainly used in text classification that includes a high-dimensional training
dataset.
● Naïve Bayes Classifier is one of the simplest and most effective Classification
algorithms which helps in building fast machine learning models that can make quick
predictions.
● It is a probabilistic classifier, which means it predicts based on the probability of
an object.
● Some popular examples of Naïve Bayes Algorithm are spam filtration, Sentimental
analysis, and classifying articles.

The Naïve Bayes algorithm is comprised of two words Naïve and Bayes, which can be
described as:

● Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is
independent of the occurrence of other features. Such as if the fruit is identified on the
bases of color, shape, and taste, then red, spherical, and sweet fruit is recognized as an
apple. Hence each feature individually contributes to identifying that it is an apple
without depending on each other.
● Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.

● Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to
determine the probability of a hypothesis with prior knowledge. It depends on the
conditional probability.
● The formula for Bayes' theorem is given as:

P(A|B) = P(B|A) * P(A) / P(B)

17
Types of Naïve Bayes Model:
There are three types of Naive Bayes Model, which are given below:

● Gaussian: The Gaussian model assumes that features follow a normal distribution.
This means if predictors take continuous values instead of discrete, then the model
assumes that these values are sampled from the Gaussian distribution.
● Multinomial: The Multinomial Naïve Bayes classifier is used when the data is
multinomial distributed. It is primarily used for document classification problems, it
means a particular document belongs to which category such as Sports, Politics,
education, etc.
● Bernoulli: The Bernoulli classifier works like the Multinomial classifier, but the
predictor variables are the independent Booleans variables. Such as if a particular
word is present or not in a document. This model is also famous for document
classification tasks.

Procedure:
1. Import the necessary libraries and dataset (here,”loan_data.csv”)
2. Explore the data to figure out what they look like
3. Pre-process the data
4. Split the data into attributes and labels
5. Divide the data into training and testing sets
6. Train the model using GaussianNB ,MultinomialNB, BernoulliNB algorithms
7. Make some predictions and display confusion matrix for each model and compare
them
8. Evaluate the results of the algorithm and display classification report for train and
test data

Program
CODE:
import pandas as pd
df = pd.read_csv('loan_data.csv')
df.head()

OUTPUT:
credit.policy purpose int.rate installment log.annual.inc
\

18
0 1 debt_consolidation 0.1189 829.10
11.350407
1 1 credit_card 0.1071 228.22
11.082143
2 1 debt_consolidation 0.1357 366.86
10.373491
3 1 debt_consolidation 0.1008 162.34
11.350407
4 1 credit_card 0.1426 102.92
11.299732

dti fico days.with.cr.linerevol.balrevol.util inq.last.6mths \

0 19.48 737 5639.958333 28854 52.1 0
1 14.29 707 2760.000000 33623 76.7 0
2 11.63 682 4710.000000 3511 25.6 1
3 8.10 712 2699.958333 33667 73.2 1
4 14.97 667 4066.000000 4740 39.5 0

delinq.2yrs pub.recnot.fully.paid
0 0 0 0
1 0 0 0
2 0 0 0
3 0 0 0
4 1 0 0

CODE:
df.info()

OUTPUT:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9578 entries, 0 to 9577
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 credit.policy 9578 non-null int64
1 purpose 9578 non-null object
2 int.rate 9578 non-null float64
3 installment 9578 non-null float64
4 log.annual.inc 9578 non-null float64
5 dti 9578 non-null float64
6 fico 9578 non-null int64
7 days.with.cr.line 9578 non-null float64
8 revol.bal 9578 non-null int64
9 revol.util 9578 non-null float64
10 inq.last.6mths 9578 non-null int64
11 delinq.2yrs 9578 non-null int64
12 pub.rec 9578 non-null int64
13 not.fully.paid 9578 non-null int64
dtypes: float64(6), int64(7), object(1)
memory usage: 1.0+ MB

CODE:

19
import seaborn as sns
import matplotlib.pyplot as plt
sns.countplot(data=df,x='purpose',hue='not.fully.paid')
plt.xticks(rotation=45, ha='right');

OUTPUT:

CODE:
pre_df = pd.get_dummies(df,columns=['purpose'],drop_first=True)
pre_df.head()

OUTPUT:
credit.policyint.rate installment log.annual.inc dti fico \
0 1 0.1189 829.10 11.350407 19.48 737
1 1 0.1071 228.22 11.082143 14.29 707
2 1 0.1357 366.86 10.373491 11.63 682
3 1 0.1008 162.34 11.350407 8.10 712
4 1 0.1426 102.92 11.299732 14.97 667

days.with.cr.linerevol.balrevol.util i
inq.last.6mths
nq.last.6mths delinq.2yrs \
0 5639.958333 28854 52.1 0 0
1 2760.000000 33623 76.7 0 0
2 4710.000000 3511 25.6 1 0
3 2699.958333 33667 73.2 1 0
4 4066.000000 4740 39.5 0 1

pub.recnot.fully.paidpurpose_credit_cardpurpose_debt_consolidation \
0 0 0 0
1
1 0 0 1

20
0
2 0 0 0
1
3 0 0 0
1
4 0 0 1
0

purpose_educationalpurpose_home_improvementpurpose_major_purchase \
0 0 0 0
1 0 0 0
2 0 0 0
3 0 0 0
4 0 0 0

purpose_small_business
0 0
1 0
2 0
3 0
4 0

CODE:
from sklearn.model_selection import train_test_split
X = pre_df.drop('not.fully.paid', axis=1)
y = pre_df['not.fully.paid']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33,
random_state=125)

#GAUSSIAN NAIVE BAYES

from sklearn.naive_bayes import GaussianNB
model1 = GaussianNB()
model1.fit(X_train, y_train);

from sklearn.metrics import

(accuracy_score,confusion_matrix,ConfusionMatrixDisplay,f1_score,

classification_report,)
y_pred1 = model1.predict(X_test)
accuracy1 = accuracy_score(y_pred1, y_test)

f1 = f1_score(y_pred1, y_test, average="weighted")

print("Accuracy:", accuracy1)
print("F1 Score:", f1)

OUTPUT:
Accuracy: 0.8206263840556786
F1 Score: 0.8686606980013266

CODE:

21
labels = ["Fully Paid", "Not fully Paid"]
cm1 = confusion_matrix(y_test, y_pred1)
disp1 = ConfusionMatrixDisplay(confusion_matrix=cm1,
onMatrixDisplay(confusion_matrix=cm1,
display_labels=labels)
disp1.plot();

OUTPUT:

#MULTINOMIAL NAIVE BAYES

CODE:
from sklearn.naive_bayes import MultinomialNB
model2 = MultinomialNB()
model2.fit(X_train, y_train);

y_pred2= model2.predict(X_test)

accuracy2
acy2 = accuracy_score(y_pred2, y_test)

f11 = f1_score(y_pred2, y_test, average="weighted")

print("Accuracy:", accuracy2)
print("F1 Score:", f11)

OUTPUT:
Accuracy: 0.6678266371401456
F1 Score: 0.640426265085445

CODE:

22
cm2 = confusion_matrix(y_test, y_pred2)
disp2 = ConfusionMatrixDisplay(confusion_matrix=cm2,display_labels=labels)
disp2.plot();

OUTPUT:

#BERNOULLI NAIVE BAYES

CODE:
from sklearn.naive_bayes import BernoulliNB
model3 = BernoulliNB()
model3.fit(X_train, y_train);

y_pred3 = model3.predict(X_test)

accuracy3 = accuracy_score(y_pred3, y_test)

f13 = f1_score(y_pred1, y_test, average="weighted")

print("Accuracy:", accuracy3)
print("F1 Score:", f13)

OUTPUT:
Accuracy: 0.8272698513128757
F1 Score: 0.8686606980013266

CODE:
cm3 = confusion_matrix(y_test, y_pred3)
disp3 = ConfusionMatrixDisplay(confusion_matrix=cm3,
display_labels=labels)
disp3.plot();

23
OUTPUT:

#GAUSSIANNB
CODE:
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred1))

OUTPUT:
precision recall f1
f1-score support

0 0.84 0.97 0.90 2625

1 0.38 0.09 0.15 536

accuracy 0.82 3161

macro avg 0.61 0.53 0.52 3161
weighted avg 0.76 0.82 0.77 3161

#MULTINOMIALNB

CODE:
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred2))

OUTPUT:
precision recall f1
f1-score support

24
0 0.84 0.74 0.79 2625
1 0.20 0.32 0.25 536

accuracy 0.67 3161

macro avg 0.52 0.53 0.52 3161
weighted avg 0.73 0.67 0.70 3161

#BERNOULLINB

CODE:
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred3))

OUTPUT:
precision recall f1-score support

0 0.83 0.99 0.91 2625

1 0.25 0.01 0.02 536

accuracy 0.83 3161

macro avg 0.54 0.50 0.46 3161
weighted avg 0.73 0.83 0.75 3161

Result:

Thus, implementation of Naïve Bayes Classifiers on “loan_data” using ml models

was successfully executed.

25
26
Ex. No. 4 IMPLEMENTATION OF BAYESIAN NETWORKS
Date:

Aim:
To construct a Bayesian network, to demonstrate the diagnosis of heart patients using
standard Heart Disease Data Set.

A Bayesian network is a directed acyclic graph in which each edge corresponds to a

conditional dependency, and each node corresponds to a unique random variable.

Bayesian network consists of two major parts: a directed acyclic graph and a set of
conditional probability distributions

 The directed acyclic graph is a set of random variables represented by nodes.

 The conditional probability distribution of a node (random variable) is defined for
every possible outcome of the preceding causal node(s).

Algorithm:
1. Read the training dataset T;
2. Calculate the mean and standard deviation of the predictor variables in each class;
3. Repeat Calculate the probability of fi using the gauss density equation in each class; Until
the probability of all predictor variables (f1, f2, f3, .. , fn) has been calculated.
4. Calculate the likelihood for each class;
5. Get the greatest likelihood;

Program

27
BayesNet()

A BayesNet is a graph (as in the diagram above) where each node represents a random
variable, and the edges are parent→child links. You can construct an empty graph with
BayesNet(), then add variables one at a time with the method call .add(variable_name,
parent_names, cpt), where the names are strings, and each of the parent_names must already
have been .added.

Variable(name, cpt, parents)

A random variable; the ovals in the diagram above. The value of a variable depends on the
value of the parents, in a probabilistic way specified by the variable's conditional probability
table (CPT). Given the parents, the variable is independent of all the other variables. For
example, if I know whether Alarm is true or false, then I know the probability of JohnCalls,
and evidence about the other variables won't give me any more information about JohnCalls.
Each row of the CPT uses the same order of variables as the list of parents. We will only
allow variables with a finite discrete domain; not continuous values.

ProbDist(mapping)
Factor(mapping)

A probability distribution is a mapping of {outcome: probability} for every outcome of a

random variable. You can give ProbDist the same arguments that you would give to the dict
initializer, for example ProbDist(sun=0.6, rain=0.1, cloudy=0.3). As a shortcut for
Boolean Variables, you can say ProbDist(0.95) instead of ProbDist({T: 0.95, F:
0.05}). In a probability distribution, every value is between 0 and 1, and the values sum to 1.
A Factor is similar to a probability distribution, except that the values need not sum to 1.
Factors are used in the variable elimination inference method.

Evidence(mapping)

A mapping of {Variable: value, ...} pairs, describing the exact values for a set of
variables—the things we know for sure.

CPTable(rows, parents)

A conditional probability table (or CPT) describes the probability of each possible outcome
value of a random variable, given the values of the parent variables. A CPTable is a a
mapping, {tuple: probdist, ...}, where each tuple lists the values of each of the parent
variables, in order, and each probability distribution says what the possible outcomes are,
given those values of the parents. The CPTable for Alarm in the diagram above would be
represented as follows:

CPTable({(T, T): .95,

(T, F): .94,
(F, T): .29,
(F, F): .001},
[Burglary, Earthquake])

28
Take the second row, "(T, F): .94". This means that when the first parent (Burglary) is
true, and the second parent (Earthquake) is fale, then the probability of Alarm being true is
.94. Note that the .94 is an abbreviation for ProbDist({T: .94, F: .06}).

T = Bool(True); F = Bool(False)

In [2]:
from collections import defaultdict, Counter
import itertools
import math
import random

class BayesNet(object):
"Bayesian network: a graph of variables connected by parent links."

def __init__(self):
self.variables = [] # List of variables, in parent-first topological sort order
self.lookup = {} # Mapping of {variable_name: variable} pairs

def add(self, name, parentnames, cpt):

"Add a new Variable to the BayesNet. Parentnames must have been added previously."
parents = [self.lookup[name] for name in parentnames]
var = Variable(name, cpt, parents)
self.variables.append(var)
self.lookup[name] = var
return self

class Variable(object):
"A discrete random variable; conditional on zero or more parent Variables."

def init(self, name, cpt, parents=()):

"A variable has a name, list of parent variables, and a Conditional Probability Table."
self.__name__ = name
self.parents = parents
self.cpt = CPTable(cpt, parents)
self.domain = set(itertools.chain(*self.cpt.values())) # All the outcomes in the CPT

def repr(self): return self.name

class Factor(dict): "An {outcome: frequency} mapping."

class ProbDist(Factor):
"""A Probability Distribution is an {outcome: probability} mapping.
The values are normalized to sum to 1.
ProbDist(0.75) is an abbreviation for ProbDist({T: 0.75, F: 0.25})."""
def __init__(self, mapping=(), **kwargs):
if isinstance(mapping, float):

29
mapping = {T: mapping, F: 1 - mapping}
self.update(mapping, **kwargs)
normalize(self)

class Evidence(dict):
"A {variable: value} mapping, describing what we know for sure."

class CPTable(dict):
"A mapping of {row: ProbDist, ...} where each row is a tuple of values of the parent
variables."

def init(self, mapping, parents=()):

"""Provides two shortcuts for writing a Conditional Probability Table.
With no parents, CPTable(dist) means CPTable({(): dist}).
With one parent, CPTable({val: dist,...}) means CPTable({(val,): dist,...})."""
if len(parents) == 0 and not (isinstance(mapping, dict) and set(mapping.keys()) == {()}):
mapping = {(): mapping}
for (row, dist) in mapping.items():
if len(parents) == 1 and not isinstance(row, tuple):
row = (row,)
self[row] = ProbDist(dist)

class Bool(int):
"Just like `bool`, except values display as 'T' and 'F' instead of 'True' and 'False'"
__str__ = __repr__ = lambda self: 'T' if self else 'F'

T = Bool(True)
F = Bool(False)

In [9]:

def P(var, evidence={}):

"The probability distribution for P(variable | evidence), when all parent variables are
known (in evidence)."
row = tuple(evidence[parent] for parent in var.parents)
return var.cpt[row]

def normalize(dist):
"Normalize a {key: value} distribution so values sum to 1.0. Mutates dist and returns it."
total = sum(dist.values())
for key in dist:
dist[key] = dist[key] / total
assert 0 <= dist[key] <= 1, "Probabilities must be between 0 and 1."
return dist

def sample(probdist):
"Randomly sample an outcome from a probability distribution."

30
r = random.random() # r is a random point in the probability distribution
c = 0.0 # c is the cumulative probability of outcomes seen so far
for outcome in probdist:
c += probdist[outcome]
if r <= c:
return outcome
def globalize(mapping):
"Given a {name: value} mapping, export all the names to the `globals()` namespace."
globals().update(mapping)

In [4]:
Earthquake = Variable('Earthquake', 0.002)

In [5]:
P(Earthquake)

Out[5]:
{F: 0.998, T: 0.002}

In [6]:
P(Earthquake)[T]

Out[6]:
0.002

In [7]:
alarm_net = (BayesNet()
.add('Burglary', [], 0.001)
.add('Earthquake', [], 0.002)
.add('Alarm', ['Burglary', 'Earthquake'], {(T, T): 0.95, (T, F): 0.94, (F, T): 0.29, (F, F): 0.001})
.add('JohnCalls', ['Alarm'], {T: 0.90, F: 0.05})
.add('MaryCalls', ['Alarm'], {T: 0.70, F: 0.01}))

In [8]:
# Make Burglary, Earthquake, etc. be global variables
globalize(alarm_net.lookup)
alarm_net.variables

Out[8]:
[Burglary, Earthquake, Alarm, JohnCalls, MaryCalls]

In [14]:
# Probability of Alarm going off, given a Burglary and not an Earthquake:
P(Alarm, {Burglary: T, Earthquake: F})

Out[14]:
{T: 0.94, F: 0.06000000000000005}

31
In [15]:
Alarm.cpt

Out[15]:
{(T, T): {T: 0.95, F: 0.050000000000000044},
(T, F): {T: 0.94, F: 0.06000000000000005},
(F, T): {T: 0.29, F: 0.71},
(F, F): {T: 0.001, F: 0.999}}

Bayes Nets as Joint Probability Distributions

A Bayes net is a compact way of specifying a full joint distribution over all the variables in
the network. Given a set of variables {X1, ..., X*n*}, the full joint distribution is:

P(X1=x1, ..., Xn=xn) = Πi P(Xi = xi | parents(Xi))

For a network with n variables, each of which has b values, there are bn rows in the joint
distribution (for example, a billion rows for 30 Boolean variables), making it impractical to
explicitly create the joint distribution for large networks. But for small networks, the function
joint_distribution creates the distribution, which can be instructive to look at, and can be
used to do inference.

In [16]:
def joint_distribution(net):
"Given a Bayes net, create the joint distribution over all variables."
return ProbDist({row: prod(P_xi_given_parents(var, row, net)
for var in net.variables)
for row in all_rows(net)})

def all_rows(net): return itertools.product(*[var.domain for var in net.variables])

def P_xi_given_parents(var, row, net):

"The probability that var = xi, given the values in this row."
dist = P(var, Evidence(zip(net.variables, row)))
xi = row[net.variables.index(var)]
return dist[xi]

def prod(numbers):
"The product of numbers: prod([2, 3, 5]) == 30. Analogous to `sum([2, 3, 5]) == 10`."
result = 1
for x in numbers:
result *= x
return result

In [17]:

32
P(Alarm, {Burglary: F, Earthquake: F})

Out[17]:
{T: 0.001, F: 0.999}

In [18]:
# Probability that "the alarm has sounded, but neither a burglary nor an earthquake has
occurred,
# and both John and Mary call" (page 514 says it should be 0.000628)

print(alarm_net.variables)
joint_distribution(alarm_net)[F, F, T, T, T]

[Burglary, Earthquake, Alarm, JohnCalls, MaryCalls]

Out[18]:

0.00062811126

Inference by Querying the Joint Distribution

Bayes nets allow us to calculate the probability, but the calculation is not just a lookup
in the CPT; it is a global calculation across the whole net. One inefficient but straightforward
way of doing the calculation is to create the joint probability distribution, then pick out just
the rows that match the evidence variables, and for each row check what the value of the
query variable is, and increment the probability for that value accordningly:

In [19]:

def enumeration_ask(X, evidence, net):

"The probability distribution for query variable X in a belief net, given evidence."
i = net.variables.index(X) # The index of the query variable X in the row
dist = defaultdict(float) # The resulting probability distribution over X
for (row, p) in joint_distribution(net).items():
if matches_evidence(row, evidence, net):
dist[row[i]] += p
return ProbDist(dist)

def matches_evidence(row, evidence, net):

"Does the tuple of values for this row agree with the evidence?"
return all(evidence[v] == row[net.variables.index(v)]
for v in evidence)

In [20]:

# The probability of a Burgalry, given that John calls but Mary does not:
enumeration_ask(Burglary, {JohnCalls: F, MaryCalls: T}, alarm_net)

33
Out[20]:
{F: 0.9931237539265789, T: 0.006876246073421024}

In [21]:
enumeration_ask(Burglary, {JohnCalls: T, MaryCalls: T}, alarm_net)

Out[21]:
{F: 0.7158281646356071, T: 0.28417183536439294}

In [22]:
# The probability of an Alarm, given that there is an Earthquake and Mary calls:
enumeration_ask(Alarm, {MaryCalls: T, Earthquake: T}, alarm_net)

Out[22]:
{F: 0.03368899586522123, T: 0.9663110041347788}

Using Variable Elimination - Cleveland database

The Cleveland database contains 76 attributes, but all published experiments refer to using a
subset of 14 of them. In particular, the Cleveland database is the only one that has been used
by ML researchers to this date. The “Heartdisease” field refers to the presence of heart disease
in the patient. It is integer valued from 0 (no presence) to 4.

Database: 0 1 2 3 4 Total

Cleveland: 164 55 36 35 13 303

Attribute Information:

1. age: age in years

2. sex: sex (1 = male; 0 = female)
3. cp: chest pain type
1. Value 1: typical angina
2. Value 2: atypical angina
3. Value 3: non-anginal pain
4. Value 4: asymptomatic
4. trestbps: resting blood pressure (in mm Hg on admission to the hospital)
5. chol: serum cholestoral in mg/dl
6. fbs: (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
7. restecg: resting electrocardiographic results
1. Value 0: normal
2. Value 1: having ST-T wave abnormality (T wave inversions and/or ST
elevation or depression of > 0.05 mV)
3. Value 2: showing probable or definite left ventricular hypertrophy by Estes’
criteria
8. thalach: maximum heart rate achieved
9. exang: exercise induced angina (1 = yes; 0 = no)
10. oldpeak = ST depression induced by exercise relative to rest

34
11. slope: the slope of the peak exercise ST segment
1. Value 1: upsloping
2. Value 2: flat
3. Value 3: downsloping
12. thal: 3 = normal; 6 = fixed defect; 7 = reversable defect
13. Heartdisease: It is integer valued from 0 (no presence) to 4.

Some instance from the dataset:

age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal Heartdisease
63 1 1 145 233 1 2 150 o 2.3 3 o 6 o
67 1 4 160 286 o 2 108 1 1.5 2 3 3 2
67 1 4 120 229 o 2 129 1 2.6 2 2 7 1
41 o 2 130 204 o 2 172 o 1.4 1 o 3 o
62 o 4 140 268 o 2 160 o 3.6 3 2 3 3
60 1 4 130 206 o 2 132 1 2.4 2 2 7 4

Python Program to Implement and Demonstrate Bayesian network using

pgmpy Machine Learning
import numpy as np
import pandas as pd
import csv
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.models import BayesianModel
from pgmpy.inference import VariableElimination

heartDisease = pd.read_csv('heart.csv')
heartDisease = heartDisease.replace('?',np.nan)

print('Sample instances from the dataset are given below')

print(heartDisease.head())

print('\n Attributes and datatypes')

print(heartDisease.dtypes)

model=
BayesianModel([('age','heartdisease'),('sex','heartdisease'),('exang','hear
tdisease'),('cp','heartdisease'),('heartdisease','restecg'),('heartdisease'
,'chol')])
print('\nLearning CPD using Maximum likelihood estimators')
model.fit(heartDisease,estimator=MaximumLikelihoodEstimator)

print('\n Inferencing with Bayesian Network:')

HeartDiseasetest_infer = VariableElimination(model)

print('\n 1. Probability of HeartDisease given evidence= restecg')

q1=HeartDiseasetest_infer.query(variables=['heartdisease'],evidence={'reste
cg':1})
print(q1)

print('\n 2. Probability of HeartDisease given evidence= cp ')

q2=HeartDiseasetest_infer.query(variables=['heartdisease'],evidence={'cp':2
})

35
print(q2)

Output

Result:
Thus Bayesian network is built and inference for earthquake problem is identified.

36
Ex. No. 5 BUILD REGRESSION MODELS
Date:

AIM:
To build regression models using various datasets.

REGRESSION:
Regression shows a line or curve that passes through all the datapoints on
target-predictor graph in such a way that the vertical distance between the datapoints and the
regression line is minimum.

Some examples:
->Prediction of rain
->Determining Market trends
->Prediction of road accidents due to rash driving.

Terminologies:
• Dependent Variable
• Independent Variable
• Outliers
• Multicollinearity
• Underfitting and Overfitting
Types of regression :
• Linear Regression
• Logistic Regression
• Ridge Regression
• Lasso Regression:

Linear Regression:
Linear Regression is an algorithm that belongs to supervised machine learning. It
tries to apply relations that will predict the outcome of an event based on the independent
variable data points. The relation is usually a straight line that best fits the different data
points as close as possible.
Y= aX+b
Here, Y = dependent variables (target variables),
X= Independent variables (predictor variables),
a and b are the linear coefficients

37
Logistic Regression:
Logistic Regression is one of the supervised learning algorithm. It is used to
calculate or predict the probability of an event occurring.. The formula is given below:
1
𝑓 (𝑥 ) =
1+𝑒
• f(x)= Output between the 0 and 1 value.
• x= input to the function
• e= base of natural logarithm.
Ridge Regression:
Ridge Regression Is a technique Which Is used for analyzing Multiple
Regression where the data suffers from multicollinearity. The Problem which arises due to
multicollinearity is that the basic linear regression model (Least Square Estimates) becomes
unbiased and the variance becomes so large that the predicted values are far from the true
value.

Lasso Regression:
Lasso (least absolute shrinkage and selection operator) is a regression analysis
method that performs both variable selection and regularization in order to enhance the
prediction accuracy and interpretability of the resulting statiscal model.

Algorithm:

Step 1:Load the data .

Step 2 : Initialize the parameters.
Step 3 : Predict the value of a dependent variable by given an independent variable.
Step 4: Calculate the error in prediction for all data points.
Step 5:. Check for accuracy.
Step 6: Visualize the results with a graph.
Step 7: Update the values
Step 8: Report your results

38
LINEAR REGRESSION:

CODE:(Head brain dataset)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data=pd.read_csv('headbrain.csv')
data

OUTPUT:

Age Range Head Size(cm) Brain

Gender Weight(grams)
0 1 1 4512 1530
1 1 1 3738 1297
2 1 1 4261 1335
3 1 1 3777 1282
4 1 1 4177 1590
... ... ... ... ...
232 2 2 3214 1110
233 2 2 3394 1215
234 2 2 3233 1104
235 2 2 3352 1170
236 2 2 3391 1120

PREDICTING THE COEFFICIENTS:

CODE:
X = data['Head Size(cm^3)'].values
Y = data['Brain Weight(grams)'].values
mean_x = np.mean(X)
mean_y = np.mean(Y)
n = len(X)
numerator = 0
denominator = 0
for i in range (n):
numerator += (X[i]-mean_x) * (Y[i] - mean_y)
denominator += (X[i] - mean_x) ** 2
b1 = numerator / denominator
b0 = mean_y - (b1 * mean_x)
print("coefficients:",b1,b0)

OUTPUT:
coefficients: 0.26342933948939945 325.57342104944223

FINDING MEAN SQUARED ERROR:

39
CODE:
ss_t = 0
ss_r = 0
for i in range(n):
y_pred = b0 + b1 * X[i]
ss_t += (Y[i] - mean_y) ** 2
ss_r += (Y[i] - y_pred) ** 2
r2 = 1 - (ss_r/ss_t)
print("mean squared error:",r2)

OUTPUT:

mean squared error: 0.6393117199570003

VISUALIZING THE DATA:

CODE:

max_x = np.max(X) + 100

min_x = np.min(X) - 100
x = np.linspace(min_x,max_x,1000)
y = b0+b1*x
plt.plot(x,y,color='#58b970', label = 'Regression line')
plt.scatter(X,Y, c = '#ef5423', label ='Scatter plot')
plt.xlabel('Head Size in cm')
plt.ylabel('Brain weight in grams')
plt.legend()
plt.show()

OUTPUT:

40
LOGISTIC REGRESSION:

CODE:(Diabetes Dataset)

import pandas as pd
df=pd.read_csv('diabetes.csv')
df

OUTPUT:
Pregnanci Glucose Blood Skin Insuli BMI Diabetes Age Outco
es Pressure Thickness n Pedigree me
Function
0 6 148 72 35 0 33.6 0.627 50 1
1 1 85 66 29 0 26.6 0.351 31 0
2 8 183 64 0 0 23.3 0.672 32 1
3 1 89 66 23 94 28.1 0.167 21 0
4 0 137 40 35 168 43.1 2.288 33 1
... ... ... ... ... ... ... ... ... ...
763 10 101 76 48 180 32.9 0.171 63 0
764 2 122 70 27 0 36.8 0.340 27 0
765 5 121 72 23 112 26.2 0.245 30 0
766 1 126 60 0 0 30.1 0.349 47 1
767 1 93 70 31 0 30.4 0.315 23 0
768 rows × 9 columns
DATA DESCRIPTION:

41
CODE:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
diabetesDF = pd.read_csv('diabetes.csv')
print(diabetesDF.head())

OUTPUT:

Pregnancies Glucose BloodPressure SkinThickness Insulin

BMI \
0 6 148 72 35 0
33.6
1 1 85 66 29 0
26.6
2 8 183 64 0 0
23.3
3 1 89 66 23 94
28.1
4 0 137 40 35 168
43.1

DiabetesPedigreeFunction Age Outcome

0 0.627 50 1
1 0.351 31 0
2 0.672 32 1
3 0.167 21 0
4 2.288 33 1

DATA EXPLORATION:

CODE:

corr = diabetesDF.corr()
print(corr)
sns.heatmap(corr,xticklabels=corr.columns,yticklabels=corr.col
umns)

OUTPUT:

Pregnancies Glucose BloodPressure

SkinThickness \
Pregnancies 1.000000 0.129459 0.141282
-0.081672
Glucose 0.129459 1.000000 0.152590
0.057328
BloodPressure 0.141282 0.152590 1.000000

42
0.207371
SkinThickness -0.081672 0.057328 0.207371
1.000000
Insulin -0.073535 0.331357 0.088933
0.436783
BMI 0.017683 0.221071 0.281805
0.392573
DiabetesPedigreeFunction -0.033523 0.137337 0.041265
0.183928
Age 0.544341 0.263514 0.239528
-0.113970
Outcome 0.221898 0.466581 0.065068
0.074752

Insulin BMI
DiabetesPedigreeFunction \
Pregnancies -0.073535 0.017683 -
0.033523
Glucose 0.331357 0.221071
0.137337
BloodPressure 0.088933 0.281805
0.041265
SkinThickness 0.436783 0.392573
0.183928
Insulin 1.000000 0.197859
0.185071
BMI 0.197859 1.000000
0.140647
DiabetesPedigreeFunction 0.185071 0.140647
1.000000
Age -0.042163 0.036242
0.033561
Outcome 0.130548 0.292695
0.173844

Age Outcome
Pregnancies 0.544341 0.221898
Glucose 0.263514 0.466581
BloodPressure 0.239528 0.065068
SkinThickness -0.113970 0.074752
Insulin -0.042163 0.130548
BMI 0.036242 0.292695
DiabetesPedigreeFunction 0.033561 0.173844
Age 1.000000 0.238356
Outcome 0.238356 1.000000
<Axes: >

43
LASSO REGRESSION:
TO FIND MODEL SCORE:

CODE:

from sklearn import datasets

from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler
diabetes = datasets.load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(diabetes.d
ata, diabetes.target, test_size=0.3, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)
y_pred = lasso.predict(X_test)
print("Model Score: ", lasso.score(X_test, y_test))
lasso.coef_

OUTPUT:

Model Score: 0.47815356922835583

array([ 1.364918 , -12.21558692, 26.45121861, 18.40929882,
-30.54131232, 14.55719971, 0. , 11.74486066,
26.79441432, 2.06055063])

44
CODE:
from sklearn.model_selection import GridSearchCV
param_grid = {'alpha': [0.001, 0.01, 0.1, 1, 10]}
lasso_cv = GridSearchCV(Lasso(), param_grid, cv=5)
lasso_cv.fit(X_train, y_train)
print("Best Parameters:", lasso_cv.best_params_)
print("Best Score:", lasso_cv.be

OUTPUT:

Best Parameters: {'alpha': 0.1}

Best Score: 0.45302353092758024

RIDGE REGRESSION:
CODE:(Housing Dataset)
import pandas as pd
import numpy as np
df=pd.read_csv("housing.csv")
df.info()

OUTPUT:

Data columns (total 10 columns):

0 longitude 20640 non-null float64
1 latitude 20640 non-null float64
2 housing_median_age 20640 non-null float64
3 total_rooms 20640 non-null float64
4 total_bedrooms 20433 non-null float64
5 population 20640 non-null float64
6 households 20640 non-null float64
7 median_income 20640 non-null float64
8 median_house_value 20640 non-null float64
9 ocean_proximity 20640 non-null object

TO ANALYSE THE DATASET:

CODE:
from pandas import read_csv
from sklearn.linear_model import Ridge

45
url='https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
model = Ridge(alpha=1.0)
model.fit(X, y)
row =[0.00632,18.00,2.310,0,0.5380,6.5750,65.20,4.0900,1,296.0,15.30,396.90,4.98]
yhat = model.predict([row])
print('Predicted: %.3f' % yhat)

OUTPUT:
Predicted: 30.253

CODE:
from numpy import arange
from pandas import read_csv
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RepeatedKFold
from sklearn.linear_model import Ridge
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
model = Ridge()
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
grid = dict()
grid['alpha'] = arange(0, 1, 0.01)
search = GridSearchCV(model, grid, scoring='neg_mean_absolute_error', cv=cv,n_jobs=-1)
results = search.fit(X, y)
print('MAE: %.3f' % results.best_score_)
print('Config: %s' % results.best_params_)

OUTPUT:
MAE: -3.379
Config: {'alpha': 0.51}

VISUALIZATION:
CODE:

46
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
X = 1.0 / (np.arange(1, 11) + np.arange(0, 10)[:, np.newaxis])
y = np.ones(10)
n_alphas = 200
alphas = np.logspace(-10, -2, n_alphas)
coefs = []
for a in alphas:
ridge = linear_model.Ridge(alpha=a, fit_intercept=False)
ridge.fit(X, y)
coefs.append(ridge.coef_)
ax = plt.gca()
ax.plot(alphas, coefs)
ax.set_xscale("log")
ax.set_xlim(ax.get_xlim()[::-1])
plt.xlabel("population")
plt.ylabel("total_rooms")
plt.title("Ridge regression")
plt.axis("tight")
plt.show()

OUTPUT:

Result:
Thus various Linear regression models are implemented.

Aosdijfpqoiew
No ratings yet
Aosdijfpqoiew
6 pages
Sampling Distribution PPT To USE
100% (1)
Sampling Distribution PPT To USE
20 pages
FRA Milestone 1 Jupyter Notebook PDF
100% (3)
FRA Milestone 1 Jupyter Notebook PDF
42 pages
Project +Sweta+Kumari+ +FRA+Milestone+1+ July+ 2021
100% (2)
Project +Sweta+Kumari+ +FRA+Milestone+1+ July+ 2021
31 pages
Import: Sys - Executable - M Pip Install
No ratings yet
Import: Sys - Executable - M Pip Install
23 pages
1) Introduction A) Defining Problem Statement:-: ST ST
No ratings yet
1) Introduction A) Defining Problem Statement:-: ST ST
10 pages
Reading CSV Data Python Example
No ratings yet
Reading CSV Data Python Example
5 pages
Observation: As We Can See We Have Threwe Types of Datatypes I.E. (Int, Float, Object) That Means We Have Both Categorical and Numerical Data
No ratings yet
Observation: As We Can See We Have Threwe Types of Datatypes I.E. (Int, Float, Object) That Means We Have Both Categorical and Numerical Data
2 pages
Chi Square Assignment MOHA 570
No ratings yet
Chi Square Assignment MOHA 570
3 pages
FRA Business Report
100% (1)
FRA Business Report
21 pages
Week3 Logistic Regression Post PDF
No ratings yet
Week3 Logistic Regression Post PDF
110 pages
Bank Marketing Ingles
No ratings yet
Bank Marketing Ingles
37 pages
Linear Models Reading
No ratings yet
Linear Models Reading
26 pages
Classification Problems
100% (1)
Classification Problems
25 pages
Credit Card Default
No ratings yet
Credit Card Default
30 pages
Clustering
No ratings yet
Clustering
53 pages
TITLE: Bank Marketing Classification: Submitted To: Dr. Supriya Kumar de Professor XLRI, Jamshedpur
No ratings yet
TITLE: Bank Marketing Classification: Submitted To: Dr. Supriya Kumar de Professor XLRI, Jamshedpur
18 pages
Capstone Report: FIRST NAME: Gopalakrishnan LAST NAME: Kalarikovilagam Subramanian M12821535
No ratings yet
Capstone Report: FIRST NAME: Gopalakrishnan LAST NAME: Kalarikovilagam Subramanian M12821535
17 pages
Note 4
No ratings yet
Note 4
18 pages
Public Policy Report: UC Student Default Python Model: The Council For Education (CED)
No ratings yet
Public Policy Report: UC Student Default Python Model: The Council For Education (CED)
20 pages
Credit Risk Modelling (EDA & Classification) - Kaggle
No ratings yet
Credit Risk Modelling (EDA & Classification) - Kaggle
21 pages
PA v0.21
No ratings yet
PA v0.21
17 pages
Data Analysis in The Banking Sector: Pandas Fundamentals
No ratings yet
Data Analysis in The Banking Sector: Pandas Fundamentals
16 pages
Untitled
No ratings yet
Untitled
29 pages
Danmairo - Analysis - Ipynb - Colaboratory
No ratings yet
Danmairo - Analysis - Ipynb - Colaboratory
18 pages
Fraud Transaction Prediction
No ratings yet
Fraud Transaction Prediction
26 pages
Apex Financial Services Loan Data Automation
No ratings yet
Apex Financial Services Loan Data Automation
18 pages
Lab3.ipynb - Colaboratory
No ratings yet
Lab3.ipynb - Colaboratory
7 pages
Student - Linear Regression Example - Colaboratory
No ratings yet
Student - Linear Regression Example - Colaboratory
6 pages
954/3 STPM (PCS3 2020) : Section A (45 Marks) Answer All Questions
No ratings yet
954/3 STPM (PCS3 2020) : Section A (45 Marks) Answer All Questions
8 pages
Eda Case Study Code
No ratings yet
Eda Case Study Code
40 pages
Week 4 LAB
No ratings yet
Week 4 LAB
26 pages
Germany Credit Analysis
No ratings yet
Germany Credit Analysis
41 pages
IP Practic MINE
No ratings yet
IP Practic MINE
30 pages
DPDZero Assessment
No ratings yet
DPDZero Assessment
12 pages
Data Pre Processing and Cleaning
No ratings yet
Data Pre Processing and Cleaning
56 pages
ML Cops
No ratings yet
ML Cops
17 pages
Predicting Credit Risk 1713295035
No ratings yet
Predicting Credit Risk 1713295035
19 pages
Ensemmmmm
No ratings yet
Ensemmmmm
10 pages
Basic Statistical Concepts
No ratings yet
Basic Statistical Concepts
28 pages
Anova Assignment
No ratings yet
Anova Assignment
11 pages
ECN190 Term Project: Predicting Credit Card Default Risk: Introduction and Literature
No ratings yet
ECN190 Term Project: Predicting Credit Card Default Risk: Introduction and Literature
18 pages
AIML 01 Merged
No ratings yet
AIML 01 Merged
25 pages
Ulo 2 Project Romnickcael
No ratings yet
Ulo 2 Project Romnickcael
13 pages
Confidence Interval
No ratings yet
Confidence Interval
10 pages
Price Utilization and Expense Model
No ratings yet
Price Utilization and Expense Model
52 pages
Hands-On Activity 3.3 Random Forest Mantaring - Ipynb - Mantaring
No ratings yet
Hands-On Activity 3.3 Random Forest Mantaring - Ipynb - Mantaring
13 pages
Econometrics: Assignment 1
No ratings yet
Econometrics: Assignment 1
6 pages
Module 9 Seaborn - Loans MSIS2407 20241113 Filled
No ratings yet
Module 9 Seaborn - Loans MSIS2407 20241113 Filled
38 pages
Credit Card Default
No ratings yet
Credit Card Default
5 pages
Tolerancias Mettler
No ratings yet
Tolerancias Mettler
247 pages
Practical 3
No ratings yet
Practical 3
8 pages
Visualizing Interaction Effects: A Proposal For Presentation and Interpretation
No ratings yet
Visualizing Interaction Effects: A Proposal For Presentation and Interpretation
8 pages
Credit-Card - Notebooks - Preprocessed-Data - Data - Preprocessing - Ipynb at Main Shubhamdongarjal - Credit-Card
No ratings yet
Credit-Card - Notebooks - Preprocessed-Data - Data - Preprocessing - Ipynb at Main Shubhamdongarjal - Credit-Card
15 pages
Ani, Cedric James D. BSEE-2104 18-50158 Engineering Data Analysis
No ratings yet
Ani, Cedric James D. BSEE-2104 18-50158 Engineering Data Analysis
3 pages
Data Frame Notes3
No ratings yet
Data Frame Notes3
39 pages
COMSATS University Islamabad Department of Management Sciences Terminal Exam - Spring 2021
No ratings yet
COMSATS University Islamabad Department of Management Sciences Terminal Exam - Spring 2021
8 pages
Homework 2 With Suggested Answers
No ratings yet
Homework 2 With Suggested Answers
14 pages
AML Project LearnerNotebook LowCode
No ratings yet
AML Project LearnerNotebook LowCode
74 pages
MSML Project 1
No ratings yet
MSML Project 1
8 pages
Hackett 1985
No ratings yet
Hackett 1985
42 pages
Building Logistic Regression Model in Python
No ratings yet
Building Logistic Regression Model in Python
24 pages
I PUC Stats MQP1 IA2
No ratings yet
I PUC Stats MQP1 IA2
3 pages
TQ - Fourth quarter-MATHEMATICS10
No ratings yet
TQ - Fourth quarter-MATHEMATICS10
2 pages
Naive Bayes Vs Logistic Regression
No ratings yet
Naive Bayes Vs Logistic Regression
16 pages
Report On Xtra Care
No ratings yet
Report On Xtra Care
51 pages
2015 Exit Exam - Questions
No ratings yet
2015 Exit Exam - Questions
159 pages
What Drives The Development of Life Insurance Sect
No ratings yet
What Drives The Development of Life Insurance Sect
15 pages
ISLP - Website-135-200 (1) - 1-60
No ratings yet
ISLP - Website-135-200 (1) - 1-60
60 pages
Confusion Matrix in Machine Learning FGVBN
No ratings yet
Confusion Matrix in Machine Learning FGVBN
4 pages
Skewness, Moments and Kurtosis-1
No ratings yet
Skewness, Moments and Kurtosis-1
3 pages
AGE 302 Introductory Notes-1
No ratings yet
AGE 302 Introductory Notes-1
19 pages
1) Download The Binary Classification Dataset For... - Colab
No ratings yet
1) Download The Binary Classification Dataset For... - Colab
6 pages
M1 Lesson 1
No ratings yet
M1 Lesson 1
6 pages
ML Lab Manual 1-10
No ratings yet
ML Lab Manual 1-10
58 pages
Finals MMW Final
No ratings yet
Finals MMW Final
4 pages
Comparative Performance of Three Perennial
No ratings yet
Comparative Performance of Three Perennial
14 pages
Eny Eg LH5 PP LWRJQ AJCb 8 S65 HT0 Ty 8 Q
No ratings yet
Eny Eg LH5 PP LWRJQ AJCb 8 S65 HT0 Ty 8 Q
9 pages
Kunal Assignment 3
No ratings yet
Kunal Assignment 3
19 pages
Screenshot 2024-01-31 at 6.54.16 PM
No ratings yet
Screenshot 2024-01-31 at 6.54.16 PM
8 pages
Lecture Material 2.5 - Bayesian Estimation & Concepts
No ratings yet
Lecture Material 2.5 - Bayesian Estimation & Concepts
12 pages
LoanTap Case Study
No ratings yet
LoanTap Case Study
37 pages
Data Analyst Interview Assignment
No ratings yet
Data Analyst Interview Assignment
26 pages
Dele
No ratings yet
Dele
29 pages
Fraud Transaction Detection - Ipynb - Colab - Rameshkumar
No ratings yet
Fraud Transaction Detection - Ipynb - Colab - Rameshkumar
7 pages
SanatKulkarni - AP22110010183 - Assignment3-1
No ratings yet
SanatKulkarni - AP22110010183 - Assignment3-1
4 pages
A Study On Most Preferred Car Brand
No ratings yet
A Study On Most Preferred Car Brand
7 pages
Microsoft NAV Interview Questions: Unofficial Microsoft Navision Business Solution Certification Review
From Everand
Microsoft NAV Interview Questions: Unofficial Microsoft Navision Business Solution Certification Review
Equity Press
1/5 (1)
Data Mining Models: Techniques and Applications
From Everand
Data Mining Models: Techniques and Applications
Ravi Deshpande
No ratings yet
The DevSecOps Playbook: Deliver Continuous Security at Speed
From Everand
The DevSecOps Playbook: Deliver Continuous Security at Speed
Sean D. Mack
No ratings yet

AIML Lab Ex 3-5 - 1

Uploaded by

AIML Lab Ex 3-5 - 1

Uploaded by

Ex. No.

3 IMPLEMENTATION OF NAÏVE BAYES MODELS

P(A|B) = P(B|A) * P(A) / P(B)

dti fico days.with.cr.linerevol.balrevol.util inq.last.6mths \

#GAUSSIAN NAIVE BAYES

from sklearn.metrics import

f1 = f1_score(y_pred1, y_test, average="weighted")

#MULTINOMIAL NAIVE BAYES

f11 = f1_score(y_pred2, y_test, average="weighted")

#BERNOULLI NAIVE BAYES

accuracy3 = accuracy_score(y_pred3, y_test)

0 0.84 0.97 0.90 2625

accuracy 0.82 3161

accuracy 0.67 3161

0 0.83 0.99 0.91 2625

accuracy 0.83 3161

Thus, implementation of Naïve Bayes Classifiers on “loan_data” using ml models

A Bayesian network is a directed acyclic graph in which each edge corresponds to a

 The directed acyclic graph is a set of random variables represented by nodes.

Variable(name, cpt, parents)

A probability distribution is a mapping of {outcome: probability} for every outcome of a

CPTable({(T, T): .95,

def add(self, name, parentnames, cpt):

def __init__(self, name, cpt, parents=()):

def __repr__(self): return self.__name__

class Factor(dict): "An {outcome: frequency} mapping."

def __init__(self, mapping, parents=()):

def P(var, evidence={}):

Bayes Nets as Joint Probability Distributions

P(X1=x1, ..., X*n*=x*n*) = Π*i* P(X*i* = x*i* | parents(X*i*))

def all_rows(net): return itertools.product(*[var.domain for var in net.variables])

def P_xi_given_parents(var, row, net):

[Burglary, Earthquake, Alarm, JohnCalls, MaryCalls]

Inference by Querying the Joint Distribution

def enumeration_ask(X, evidence, net):

def matches_evidence(row, evidence, net):

Using Variable Elimination - Cleveland database

Cleveland: 164 55 36 35 13 303

1. age: age in years

Some instance from the dataset:

Python Program to Implement and Demonstrate Bayesian network using

print('Sample instances from the dataset are given below')

print('\n Attributes and datatypes')

print('\n Inferencing with Bayesian Network:')

print('\n 1. Probability of HeartDisease given evidence= restecg')

print('\n 2. Probability of HeartDisease given evidence= cp ')

Step 1:Load the data .

CODE:(Head brain dataset)

Age Range Head Size(cm) Brain

PREDICTING THE COEFFICIENTS:

FINDING MEAN SQUARED ERROR:

mean squared error: 0.6393117199570003

VISUALIZING THE DATA:

max_x = np.max(X) + 100

Pregnancies Glucose BloodPressure SkinThickness Insulin

DiabetesPedigreeFunction Age Outcome

Pregnancies Glucose BloodPressure

from sklearn import datasets

Model Score: 0.47815356922835583

Best Parameters: {'alpha': 0.1}

Data columns (total 10 columns):

TO ANALYSE THE DATASET:

You might also like

def init(self, name, cpt, parents=()):

def repr(self): return self.name

def init(self, mapping, parents=()):

P(X1=x1, ..., Xn=xn) = Πi P(Xi = xi | parents(Xi))