Machine Learning External

The document outlines a series of machine learning experiments and algorithms, including FIND-S, Candidate-Elimination, and decision tree algorithms, along with their implementations in Python. It provides an introduction to machine learning concepts, applications, and various approaches such as supervised, unsupervised, and reinforcement learning. Additionally, it includes source code examples for practical implementation of these algorithms using training data from CSV files.


CONTENTS

Sl. No. — Experiments

1. Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples. Read the training data from a .CSV file.
2. For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.
3. Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.
4. Exercises to solve real-world problems using the following machine learning methods: a) Linear Regression b) Logistic Regression c) Binary Classifier.
5. Develop a program for Bias, Variance, Remove Duplicates, Cross-Validation.
6. Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.
7. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.
8. Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to perform this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy, precision, and recall for your data set.
9. Write a program to implement Support Vector Machines and Principal Component Analysis.
10. Write a program to implement Principal Component Analysis.

Introduction

Machine learning

Machine learning is a subset of artificial intelligence in the field of computer science that often
uses statistical techniques to give computers the ability to "learn" (i.e., progressively improve
performance on a specific task) with data, without being explicitly programmed. In the past
decade, machine learning has given us self-driving cars, practical speech recognition, effective
web search, and a vastly improved understanding of the human genome.

Machine learning tasks

Machine learning tasks are typically classified into broad categories, depending on the nature of
the learning "signal" or "feedback" available to the learning system:

1. Supervised learning: The computer is presented with example inputs and their desired
outputs, given by a "teacher", and the goal is to learn a general rule that maps inputs to outputs.
As special cases, the input signal can be only partially available, or restricted to special feedback:

2. Semi-supervised learning: the computer is given only an incomplete training signal: a
training set with some (often many) of the target outputs missing.

3. Active learning: the computer can only obtain training labels for a limited set of instances
(based on a budget), and also has to optimize its choice of objects to acquire labels for. When
used interactively, these can be presented to the user for labeling.

4. Reinforcement learning: training data (in the form of rewards and punishments) is given only as
feedback to the program's actions in a dynamic environment, such as driving a vehicle or playing
a game against an opponent.

5. Unsupervised learning: No labels are given to the learning algorithm, leaving it on its own to
find structure in its input. Unsupervised learning can be a goal in itself (discovering hidden
patterns in data) or a means towards an end (feature learning).

Machine Learning Applications

In classification, inputs are divided into two or more classes, and the learner must produce a
model that assigns unseen inputs to one or more (multi-label classification) of these classes. This
is typically tackled in a supervised manner. Spam filtering is an example of classification, where
the inputs are email (or other) messages and the classes are "spam" and "not spam".

In regression, also a supervised problem, the outputs are continuous rather than discrete.
In clustering, a set of inputs is to be divided into groups. Unlike in classification, the groups are
not known beforehand, making this typically an unsupervised task.

Density estimation finds the distribution of inputs in some space.

Dimensionality reduction simplifies inputs by mapping them into a lower dimensional space.
Topic modeling is a related problem, where a program is given a list of human language
documents and is tasked with finding out which documents cover similar topics.


Machine learning Approaches

1.Decision tree learning


Decision tree learning uses a decision tree as a predictive model, which maps observations about
an item to conclusions about the item's target value.

2.Association rule learning


Association rule learning is a method for discovering interesting relations between variables in
large databases.

3.Artificial neural networks


An artificial neural network (ANN), usually called a "neural network" (NN), is a learning
algorithm vaguely inspired by biological neural networks. Computations are
structured in terms of an interconnected group of artificial neurons, processing information using
a connectionist approach to computation. Modern neural networks are non-linear statistical data
modeling tools. They are usually used to model complex relationships between inputs and
outputs, to find patterns in data, or to capture the statistical structure in an unknown joint
probability distribution between observed variables.

4.Deep learning
Falling hardware prices and the development of GPUs for personal use in the last few years have
contributed to the development of the concept of deep learning which consists of multiple hidden
layers in an artificial neural network. This approach tries to model the way the human brain
processes light and sound into vision and hearing. Some successful applications of deep learning
are computer vision and speech recognition.

5.Inductive logic programming


Inductive logic programming (ILP) is an approach to rule learning that uses logic programming as a
uniform representation for input examples, background knowledge, and hypotheses. Given an
encoding of the known background knowledge and a set of examples represented as a logical
database of facts, an ILP system will derive a hypothesized logic program that entails all positive
and no negative examples. Inductive programming is a related field that considers any kind of
programming language for representing hypotheses (not only logic programming), such as
functional programs.

6.Support vector machines


Support vector machines (SVMs) are a set of related supervised learning methods used for
classification and regression. Given a set of training examples, each marked as belonging to one
of two categories, an SVM training algorithm builds a model that predicts whether a new
example falls into one category or the other.

7.Clustering
Cluster analysis is the assignment of a set of observations into subsets (called clusters) so that
observations within the same cluster are similar according to some predesignated criterion or
criteria, while observations drawn from different clusters are dissimilar. Different clustering
techniques make different assumptions about the structure of the data, often defined by some
similarity metric and evaluated, for example, by internal compactness (similarity between
members of the same cluster) and separation between different clusters. Other methods are based
on estimated density and graph connectivity. Clustering is a method of unsupervised learning
and a common technique for statistical data analysis.

8.Bayesian networks
A Bayesian network, belief network, or directed acyclic graphical model is a probabilistic
graphical model that represents a set of random variables and their conditional independencies
via a directed acyclic graph (DAG). For example, a Bayesian network could represent the
probabilistic relationships between diseases and symptoms. Given symptoms, the network can be
used to compute the probabilities of the presence of various diseases. Efficient algorithms exist
that perform inference and learning.

9.Reinforcement learning
Reinforcement learning is concerned with how an agent ought to take actions in an environment
so as to maximize some notion of long-term reward. Reinforcement learning algorithms attempt
to find a policy that maps states of the world to the actions the agent ought to take in those states.
Reinforcement learning differs from the supervised learning problem in that correct input/output
pairs are never presented, nor sub-optimal actions explicitly corrected.

10.Similarity and metric learning


In this problem, the learning machine is given pairs of examples that are considered similar and
pairs of less similar objects. It then needs to learn a similarity function (or a distance metric
function) that can predict whether new objects are similar. It is sometimes used in
recommendation systems.

11.Genetic algorithms
A genetic algorithm (GA) is a search heuristic that mimics the process of natural selection and
uses methods such as mutation and crossover to generate new genotypes in the hope of finding
good solutions to a given problem. In machine learning, genetic algorithms found some uses in
the 1980s and 1990s. Conversely, machine learning techniques have been used to improve the
performance of genetic and evolutionary algorithms.

12.Rule-based machine learning


Rule-based machine learning is a general term for any machine learning method that identifies,
learns, or evolves "rules" to store, manipulate, or apply knowledge. The defining characteristic of
a rule-based machine learner is the identification and utilization of a set of relational rules that
collectively represent the knowledge captured by the system. This is in contrast to other machine
learners that commonly identify a singular model that can be universally applied to any instance
in order to make a prediction. Rule-based machine learning approaches include learning
classifier systems, association rule learning, and artificial immune systems.

13.Feature selection approach


Feature selection is the process of selecting an optimal subset of relevant features for use in
model construction. It is assumed the data contains some features that are either redundant or
irrelevant, and can thus be removed to reduce calculation cost without incurring much loss of
information. Common optimality criteria include accuracy, similarity and information measures.


Experiment-1:
Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based
on a given set of training data samples. Read the training data from a .CSV file.

FIND-S Algorithm

1. Initialize h to the most specific hypothesis in H.

2. For each positive training instance x:
       For each attribute constraint a_i in h:
           If the constraint a_i is satisfied by x,
           then do nothing;
           else replace a_i in h by the next more general constraint that is satisfied by x.

3. Output hypothesis h.

Training Examples:

Example Sky AirTemp Humidity Wind Water Forecast EnjoySport

1 Sunny Warm Normal Strong Warm Same Yes

2 Sunny Warm High Strong Warm Same Yes

3 Rainy Cold High Strong Warm Change No

4 Sunny Warm High Strong Cool Change Yes

SOURCE CODE:
import pandas as pd
import numpy as np

def find_s(c, t):
    # Initialize the hypothesis with the first positive training example
    specific_hypothesis = None
    for i, val in enumerate(c):
        if t[i] == "Yes":
            specific_hypothesis = c[i].copy()
            break
    # Generalize the hypothesis, attribute by attribute, over every positive example
    for i, val in enumerate(c):
        if t[i] == "Yes":
            for x in range(len(specific_hypothesis)):
                if val[x] != specific_hypothesis[x]:
                    specific_hypothesis[x] = '?'
    return specific_hypothesis

d = pd.read_csv('train.csv')
print("Training Data:")
print(d)

a = np.array(d)[:, :-1]
print("The attributes are: ", a)

t = np.array(d)[:, -1]
print("The target is: ", t)

print("\nMost Specific Hypothesis:")
print(find_s(a, t))

OUTPUT:
Training Data:
Sky AirTemp Humidity Wind Water Forecast EnjoySport
0 Sunny Warm Normal Strong Warm Same Yes
1 Sunny Warm High Strong Warm Same Yes
2 Rainy Cold High Strong Warm Change No
3 Sunny Warm High Strong Cool Change Yes

The attributes are: [['Sunny' 'Warm' 'Normal' 'Strong' 'Warm' 'Same']


['Sunny' 'Warm' 'High' 'Strong' 'Warm' 'Same']
['Rainy' 'Cold' 'High' 'Strong' 'Warm' 'Change']
['Sunny' 'Warm' 'High' 'Strong' 'Cool' 'Change']]

The target is: ['Yes' 'Yes' 'No' 'Yes']

Most Specific Hypothesis:


['Sunny' 'Warm' '?' 'Strong' '?' '?']


Experiment-2:
For a given set of training data examples stored in a .CSV file, implement and demonstrate the
Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent
with the training examples.

Candidate-Elimination Algorithm:
1. Load the data set.
2. G <- maximally general hypotheses in H.
3. S <- maximally specific hypotheses in H.
4. For each training example d = <x, c(x)>:

Case 1: If d is a positive example
• Remove from G any hypothesis that is inconsistent with d.
• For each hypothesis s in S that is not consistent with d:
    • Remove s from S.
    • Add to S all minimal generalizations h of s such that
        • h is consistent with d, and
        • some member of G is more general than h.
    • Remove from S any hypothesis that is more general than another hypothesis in S.

Case 2: If d is a negative example
• Remove from S any hypothesis that is inconsistent with d.
• For each hypothesis g in G that is not consistent with d:
    • Remove g from G.
    • Add to G all minimal specializations h of g such that
        • h is consistent with d, and
        • some member of S is more specific than h.
    • Remove from G any hypothesis that is less general than another hypothesis in G.

SOURCE CODE:
import numpy as np
import pandas as pd

# Loading data from a CSV file
data = pd.read_csv('trainingdata3.csv')
print(data)

# Separating the concept features from the target
concepts = np.array(data.iloc[:, 0:-1])
print(concepts)

# The last column holds the target values
target = np.array(data.iloc[:, -1])
print(target)

def learn(concepts, target):
    '''
    Implements the learning step of the Candidate-Elimination algorithm.

    Arguments:
        concepts - array with all the feature vectors
        target   - array with the corresponding output values
    '''
    # Initialize S with the first instance from concepts;
    # .copy() creates a new array instead of a reference to the same memory
    specific_h = concepts[0].copy()
    print("\nInitialization of specific_h and general_h")
    print(specific_h)

    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    print(general_h)

    # The learning iterations
    for i, h in enumerate(concepts):
        # Positive example: generalize S (and relax G) where attributes differ
        if target[i] == "Yes":
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        # Negative example: specialize G using the attributes of S
        if target[i] == "No":
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
        print("\nSteps of Candidate Elimination Algorithm", i + 1)
        print(specific_h)
        print(general_h)

    # Find the rows of general_h that were never specialized (still all '?')
    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
    for i in indices:
        # Remove those rows from general_h
        general_h.remove(['?', '?', '?', '?', '?', '?'])

    # Return the final boundary sets
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("\nFinal Specific_h:", s_final, sep="\n")
print("\nFinal General_h:", g_final, sep="\n")


OUTPUT:

sky airTemp humidity wind water forecast enjoySport


0 Sunny Warm Normal Strong Warm Same Yes
1 Sunny Warm High Strong Warm Same Yes
2 Rainy Cold High Strong Warm Change No
3 Sunny Warm High Strong Cool Change Yes
[[' Sunny' 'Warm' 'Normal' 'Strong' 'Warm' 'Same']
[' Sunny' 'Warm' 'High' 'Strong' 'Warm' 'Same']
[' Rainy' 'Cold' 'High' 'Strong' 'Warm' 'Change']
[' Sunny' 'Warm' 'High' 'Strong' 'Cool' 'Change']]
['Yes' 'Yes' 'No' 'Yes']

Initialization of specific_h and general_h


[' Sunny' 'Warm' 'Normal' 'Strong' 'Warm' 'Same']
[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?',
'?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'],
['?', '?', '?', '?', '?', '?']]

Steps of Candidate Elimination Algorithm 1


[' Sunny' 'Warm' 'Normal' 'Strong' 'Warm' 'Same']
[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?',
'?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'],
['?', '?', '?', '?', '?', '?']]

Steps of Candidate Elimination Algorithm 2


[' Sunny' 'Warm' '?' 'Strong' 'Warm' 'Same']
[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?',
'?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'],
['?', '?', '?', '?', '?', '?']]

Steps of Candidate Elimination Algorithm 3


[' Sunny' 'Warm' '?' 'Strong' 'Warm' 'Same']
[[' Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?'], ['?',
'?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?',
'?', '?'], ['?', '?', '?', '?', '?', 'Same']]

Steps of Candidate Elimination Algorithm 4


[' Sunny' 'Warm' '?' 'Strong' '?' '?']
[[' Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?'], ['?',
'?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?',
'?', '?'], ['?', '?', '?', '?', '?', '?']]

Final Specific_h:
[' Sunny' 'Warm' '?' 'Strong' '?' '?']

Final General_h:
[[' Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?']]


Experiment-3:
Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an
appropriate data set for building the decision tree and apply this knowledge to classify a new
sample.


SOURCE CODE:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Load data from CSV
data = pd.read_csv('tennisdata.csv')
print("The first 5 values of data is \n", data.head())

# Obtain train data and train output
X = data.iloc[:, :-1]
print("\nThe first 5 values of Train data is \n", X.head())

y = data.iloc[:, -1]
print("\nThe first 5 values of Train output is \n", y.head())

# Convert the categorical attributes into numbers
le_outlook = LabelEncoder()
X.Outlook = le_outlook.fit_transform(X.Outlook)

le_Temperature = LabelEncoder()
X.Temperature = le_Temperature.fit_transform(X.Temperature)

le_Humidity = LabelEncoder()
X.Humidity = le_Humidity.fit_transform(X.Humidity)

le_Windy = LabelEncoder()
X.Windy = le_Windy.fit_transform(X.Windy)

print("\nNow the Train data is\n", X.head())

le_PlayTennis = LabelEncoder()
y = le_PlayTennis.fit_transform(y)
print("\nNow the Train output is\n", y)

# Train the model; criterion='entropy' makes the tree split on information
# gain, the measure ID3 uses (scikit-learn's default is Gini impurity)
classifier = DecisionTreeClassifier(criterion='entropy')
classifier.fit(X, y)

# Encode a new sample with the same encoders used for training
def labelEncoderForInput(list1):
    list1[0] = le_outlook.transform([list1[0]])[0]
    list1[1] = le_Temperature.transform([list1[1]])[0]
    list1[2] = le_Humidity.transform([list1[2]])[0]
    list1[3] = le_Windy.transform([list1[3]])[0]
    return [list1]

# Predict for a new input sample
inp1 = ["Rainy", "Cool", "High", "False"]
pred1 = labelEncoderForInput(inp1)
y_pred = classifier.predict(pred1)
print("\nfor input {0}, we obtain {1}".format(inp1, le_PlayTennis.inverse_transform(y_pred)[0]))

OUTPUT:
The first 5 values of data is
Outlook Temperature Humidity Windy PlayTennis
0 Sunny Hot High False No
1 Sunny Hot High True No
2 Overcast Hot High False Yes
3 Rainy Mild High False YeS
4 Rainy Cool Normal False Yes

The first 5 values of Train data is


Outlook Temperature Humidity Windy
0 Sunny Hot High False
1 Sunny Hot High True
2 Overcast Hot High False
3 Rainy Mild High False
4 Rainy Cool Normal False

The first 5 values of Train output is


0 No
1 No
2 Yes
3 YeS
4 Yes
Name: PlayTennis, dtype: object

Now the Train data is Outlook Temperature Humidity Windy


0 2 1 0 0
1 2 1 0 1
2 0 1 0 0
3 1 2 0 0
4 1 0 1 0

Now the Train data is


[0 0 2 1 2 0 2 0 2 2 2 2 2 0]
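(The script then also prints the predicted class for the new sample via its final print statement,
a line of the form "for input ['Rainy', 'Cool', 'High', 'False'], we obtain ..."; the label shown
depends on the fitted tree.)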


Experiment-4:
Exercises to solve real-world problems using the following machine learning methods: a)
Linear Regression b) Logistic Regression c) Binary Classifier

SOURCE CODE:
a) Linear Regression:

from sklearn.linear_model import LinearRegression
import numpy as np

X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 6, 8, 10])

lr = LinearRegression()
lr.fit(X, y)
y_pred = lr.predict(X)

print('Coefficients: \n', lr.coef_)
print('Intercept: \n', lr.intercept_)
print('Predicted values: \n', y_pred)

OUTPUT:

Coefficients:
[2.]
Intercept:
-1.7763568394002505e-15
Predicted values:
[ 2. 4. 6. 8. 10.]


b) Logistic Regression:
SOURCE CODE:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

data = pd.read_csv("4pr.csv")
print(data.head())
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=0)
model = LogisticRegression()
model.fit(X_train,y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(accuracy)
print("Accuracy: {:.2f}%".format(accuracy * 100))

OUTPUT:
sbp tobacco ldl adiposity famhist typea obesity alcohol age chd
0 160 12.00 5.73 23.11 1 49 25.30 97.20 52 1
1 144 0.01 4.41 28.61 0 55 28.87 2.06 63 0
2 118 0.08 3.48 32.28 1 52 29.14 3.81 46 0
3 170 7.50 6.41 38.03 1 51 31.99 24.26 58 1
4 134 13.60 3.50 27.78 1 60 25.99 57.34 49 1
1.0
Accuracy: 100.00%
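
c) Binary Classifier:
The original manual gives no listing for part (c); the sketch below is one minimal way to complete
it, reusing the 4pr.csv data from part (b) (an assumption; any data set with a 0/1 label column
would do) and training a hinge-loss SGDClassifier as the binary classifier.

SOURCE CODE:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score

# Reuse the data from part (b); the last column (chd) is the 0/1 label
data = pd.read_csv("4pr.csv")
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Hinge loss trains a linear SVM-style binary classifier by stochastic gradient descent
clf = SGDClassifier(loss="hinge", random_state=0)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print("Accuracy: {:.2f}%".format(accuracy_score(y_test, y_pred) * 100))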

Experiment-5: Develop a program for Bias, Variance, Remove Duplicates, Cross-Validation


SOURCE CODE:
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

data = np.loadtxt('prog5.csv', delimiter=',')

X_train, X_test, y_train, y_test = train_test_split(data[:, :-1], data[:, -1], test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)
y_pred_train = model.predict(X_train)
y_pred_test = model.predict(X_test)

train_mse = mean_squared_error(y_train, y_pred_train)
test_mse = mean_squared_error(y_test, y_pred_test)

# Bias: average offset of the test predictions from the true test values
bias = np.mean(y_pred_test) - np.mean(y_test)
# Variance: spread of the test predictions
variance = np.var(y_pred_test)

print(f'Bias: {bias}')
print(f'Variance: {variance}')

# Remove duplicate rows from the dataset
unique_data = np.unique(data, axis=0)

# Perform 5-fold cross-validation on the deduplicated data
cv_scores = cross_val_score(model, unique_data[:, :-1], unique_data[:, -1], cv=5)

print(f'Cross-validation scores: {cv_scores}')

OUTPUT:

Bias: 1.136363636363635
Variance: 0.006448576675849442
Cross-validation scores: [-3.99814069 nan nan nan nan]
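
Note: the all-NaN fold scores above suggest that the default R² scoring was undefined on those
folds, which can happen when a fold is very small or its target values are constant after duplicate
removal; this cannot be confirmed without the prog5.csv data. One way to probe it is to shuffle the
folds, e.g. cross_val_score(model, unique_data[:, :-1], unique_data[:, -1],
cv=KFold(n_splits=5, shuffle=True, random_state=42)) with
from sklearn.model_selection import KFold.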


Experiment-6:
Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same
using appropriate data sets.

Backpropagation Algorithm:

1. Load the data set.
2. Assign all network inputs and outputs.
3. Initialize all weights with small random numbers, typically between -1 and 1.

repeat
    for every pattern in the training set
        Present the pattern to the network

        // Propagate the input forward through the network:
        for each layer in the network
            for every node in the layer
                1. Calculate the weighted sum of the inputs to the node
                2. Add the threshold (bias) to the sum
                3. Calculate the activation of the node
            end
        end

        // Propagate the errors backward through the network:
        for every node in the output layer
            calculate the error signal
        end
        for all hidden layers
            for every node in the layer
                1. Calculate the node's signal error
                2. Update each node's weight in the network
            end
        end

        // Calculate the global error:
        Calculate the Error Function
    end
while ((number of iterations < maximum specified) AND (Error Function > specified threshold))

The network used in the source code below has:

• an input layer with two input neurons,
• one hidden layer with three neurons (hiddenSize = 3 in the code), and
• an output layer with a single neuron.


SOURCE CODE:
import numpy as np

X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)   # training inputs
y = np.array(([92], [86], [89]), dtype=float)          # target outputs

# Scale units
X = X / np.amax(X, axis=0)   # normalize inputs to [0, 1]
y = y / 100                  # normalize targets to [0, 1]

class Neural_Network(object):
    def __init__(self):
        self.inputSize = 2
        self.outputSize = 1
        self.hiddenSize = 3
        # Weight matrices initialized with small random numbers
        self.W1 = np.random.randn(self.inputSize, self.hiddenSize)
        self.W2 = np.random.randn(self.hiddenSize, self.outputSize)

    def forward(self, X):
        # Forward propagation through the network
        self.z = np.dot(X, self.W1)          # weighted sum at the hidden layer
        self.z2 = self.sigmoid(self.z)       # hidden-layer activations
        self.z3 = np.dot(self.z2, self.W2)   # weighted sum at the output layer
        o = self.sigmoid(self.z3)            # network output
        return o

    def sigmoid(self, s):
        return 1 / (1 + np.exp(-s))

    def sigmoidPrime(self, s):
        # Derivative of the sigmoid, expressed in terms of its output s
        return s * (1 - s)

    def backward(self, X, y, o):
        # Backpropagate the error and update the weights
        self.o_error = y - o
        self.o_delta = self.o_error * self.sigmoidPrime(o)
        self.z2_error = self.o_delta.dot(self.W2.T)
        self.z2_delta = self.z2_error * self.sigmoidPrime(self.z2)
        self.W1 += X.T.dot(self.z2_delta)
        self.W2 += self.z2.T.dot(self.o_delta)

    def train(self, X, y):
        o = self.forward(X)
        self.backward(X, y, o)

NN = Neural_Network()
for i in range(1000):
    # Progress is printed every iteration; the output below shows one such block
    print("\nInput: \n" + str(X))
    print("\nActual Output: \n" + str(y))
    print("\nPredicted Output: \n" + str(NN.forward(X)))
    print("\nLoss: \n" + str(np.mean(np.square(y - NN.forward(X)))))
    NN.train(X, y)

OUTPUT:

Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]

Actual Output:
[[0.92]
[0.86]
[0.89]]


Predicted Output:
[[0.73180088]
[0.70478836]
[0.77438585]]

Loss:
0.024292064176480496


Experiment-7:
Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points.
Select an appropriate data set for your experiment and draw graphs.

• Regression is a technique from statistics that is used to predict values of a desired target
quantity when the target quantity is continuous.
• In regression, we seek to identify (or estimate) a continuous variable y associated with a given
input vector x.
• y is called the dependent variable.
• x is called the independent variable.

Loess/Lowess Regression: Loess regression is a nonparametric technique that uses local weighted
regression to fit a smooth curve through points in a scatter plot.

Lowess Algorithm: Locally weighted regression is a very powerful non-parametric model used in
statistical learning. Given a dataset X, y, we attempt to find a model parameter β(x) that
minimizes the residual sum of weighted squared errors. The weights are given by a kernel
function (k or w), which can be chosen arbitrarily.


Locally Weighted Regression Algorithm:

1. Read the given data sample into X and the curve (linear or non-linear) into Y.
2. Set the value of the smoothing (free) parameter τ.
3. Set the point of interest x0, drawn from X.
4. Determine the weights using the Gaussian kernel:

       w(x, x0) = exp( -(x - x0)² / (2τ²) )

5. Determine the value of the model parameter β by weighted least squares:

       β = (XᵀWX)⁻¹ XᵀWy

6. Prediction = x0 * β

SOURCE CODE:
import math
from math import ceil
import numpy as np
from scipy import linalg
import matplotlib.pyplot as plt

def lowess(x, y, f, iterations):
    n = len(x)
    r = int(ceil(f * n))   # number of neighbours used in each local fit
    # Distance to the r-th nearest neighbour of each point (local bandwidth)
    h = [np.sort(np.abs(x - x[i]))[r] for i in range(n)]
    # Tricube weights
    w = np.clip(np.abs((x[:, None] - x[None, :]) / h), 0.0, 1.0)
    w = (1 - w ** 3) ** 3
    yest = np.zeros(n)
    delta = np.ones(n)     # robustness weights
    for iteration in range(iterations):
        for i in range(n):
            weights = delta * w[:, i]
            b = np.array([np.sum(weights * y), np.sum(weights * y * x)])
            A = np.array([[np.sum(weights), np.sum(weights * x)],
                          [np.sum(weights * x), np.sum(weights * x * x)]])
            beta = linalg.solve(A, b)        # local weighted least-squares fit
            yest[i] = beta[0] + beta[1] * x[i]
        # Down-weight points with large residuals (robustifying step)
        residuals = y - yest
        s = np.median(np.abs(residuals))
        delta = np.clip(residuals / (6.0 * s), -1, 1)
        delta = (1 - delta ** 2) ** 2
    return yest

# Noisy sine-wave sample
n = 100
x = np.linspace(0, 2 * math.pi, n)
y = np.sin(x) + 0.3 * np.random.randn(n)

f = 0.25         # smoothing span: fraction of points in each local fit
iterations = 3
yest = lowess(x, y, f, iterations)

plt.plot(x, y, "r.")       # data points
plt.plot(x, yest, "b-")    # fitted curve
plt.show()

OUTPUT:

[<matplotlib.lines.Line2D at 0x1e4c9120640>]
(A scatter plot of the noisy data points with the fitted LOWESS curve is produced.)


Experiment-8:
Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model
to perform this task. Built-in Java classes/API can be used to write the program. Calculate the
accuracy, precision, and recall for your data set.

The dataset is divided into two parts, namely, the feature matrix and the response vector.
 The feature matrix contains all the vectors (rows) of the dataset, in which each vector consists
of the values of the dependent features. In the dataset used here, the features are 'Outlook',
'Temperature', 'Humidity' and 'Windy'.
 The response vector contains the value of the class variable (prediction or output) for each row
of the feature matrix. In the dataset used here, the class variable name is 'Play golf'.

Types of Naive Bayes Algorithm

Gaussian Naive Bayes

When attribute values are continuous, an assumption is made that the values associated with each
class are distributed according to a Gaussian, i.e., Normal, distribution. If an attribute in our
data, say "x", contains continuous values, we first segment the data by class and then compute the
mean and variance of "x" for each class.

Multinomial Naive Bayes

Multinomial Naive Bayes is preferred for data that is multinomially distributed. It is one of the
standard classic algorithms and is used in text categorization (classification), where each event
represents the occurrence of a word in a document.

Bernoulli Naive Bayes

Bernoulli Naive Bayes is used on data that is distributed according to multivariate Bernoulli
distributions, i.e., multiple features may be present, but each one is assumed to be a binary-valued
(Bernoulli, boolean) variable. So it requires features to be binary-valued.
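
As a quick illustration (a sketch added here, not one of the original listings), the three variants
correspond to scikit-learn's GaussianNB, MultinomialNB, and BernoulliNB classes; the tiny arrays
below are made-up examples:

import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

X_cont = np.array([[6.0, 180], [5.9, 190], [5.5, 130], [5.4, 120]])  # continuous attributes
X_counts = np.array([[2, 1, 0], [0, 1, 3], [1, 0, 2], [3, 2, 0]])    # word counts per document
X_bin = (X_counts > 0).astype(int)                                    # word presence/absence
y = np.array([1, 1, 0, 0])

print(GaussianNB().fit(X_cont, y).predict([[5.8, 170]]))      # Gaussian NB: continuous data
print(MultinomialNB().fit(X_counts, y).predict([[1, 1, 0]]))  # Multinomial NB: count data
print(BernoulliNB().fit(X_bin, y).predict([[1, 1, 0]]))       # Bernoulli NB: binary features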

SOURCE CODE:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

msg = pd.read_csv('p4.csv', names=['message', 'label'])
print("Total Instances of Dataset: ", msg.shape[0])
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})

X = msg.message
y = msg.labelnum

Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)

# Build the document-term matrix (bag of words)
count_v = CountVectorizer()
Xtrain_dm = count_v.fit_transform(Xtrain)
Xtest_dm = count_v.transform(Xtest)

# On newer scikit-learn versions, use count_v.get_feature_names_out()
df = pd.DataFrame(Xtrain_dm.toarray(), columns=count_v.get_feature_names())
print(df[0:5])

clf = MultinomialNB()
clf.fit(Xtrain_dm, ytrain)
pred = clf.predict(Xtest_dm)

# Pair each test document with its prediction
# (the original listing zipped Xtrain with pred, mismatching documents and labels)
for doc, p in zip(Xtest, pred):
    p = 'pos' if p == 1 else 'neg'
    print("%s -> %s" % (doc, p))

print('Accuracy Metrics: \n')
print('Accuracy: ', accuracy_score(ytest, pred))
print('Recall: ', recall_score(ytest, pred))
print('Precision: ', precision_score(ytest, pred))
print('Confusion Matrix: \n', confusion_matrix(ytest, pred))

OUTPUT:

Total Instances of Dataset: 18


about am amazing an and awesome bad beers best boss ... today \
0 0 0 0 0 0 0 0 0 1 0 ... 0
1 0 0 0 0 0 0 0 0 0 0 ... 0
2 1 0 0 0 0 0 0 1 0 0 ... 0
3 0 0 0 0 0 0 0 0 0 0 ... 1
4 0 0 0 0 0 0 0 0 0 0 ... 0

tomorrow very view we went what will with work


0 0 0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 0 0 0
2 0 1 0 0 0 0 0 0 0
3 0 0 0 0 1 0 0 0 0
4 0 0 0 0 0 0 0 1 0

[5 rows x 46 columns]
This is my best work -> pos
I love to dance -> neg
I feel very good about these beers -> pos
I went to my enemy's house today -> pos
I can't deal with this -> neg
Accuracy Metrics:

Accuracy: 1.0
Recall: 1.0
Precision: 1.0
Confusion Matrix:
[[2 0]
[0 3]]

Experiment-9:
Write a program to implement Support Vector Machines and Principal Component Analysis


SOURCE CODE:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

iris = datasets.load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

clf = SVC(kernel='linear')
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

acc_score = accuracy_score(y_test, y_pred)
print("Accuracy score: {:.2f}%".format(acc_score * 100))

OUTPUT:

Accuracy score: 97.78%
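
The listing above covers only the SVM half of the experiment title; Principal Component Analysis
on its own is the subject of Experiment-10. A minimal sketch of combining the two (an illustration,
not the original listing) reduces the iris features to two principal components before training
the SVM:

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3, random_state=0)

# Fit PCA on the training set only, then apply the same projection to the test set
pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)

clf = SVC(kernel='linear')
clf.fit(X_train_pca, y_train)
acc = accuracy_score(y_test, clf.predict(X_test_pca))
print("Accuracy with PCA + SVM: {:.2f}%".format(acc * 100))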


Experiment-10:
Write a program to implement Principal Component Analysis

SOURCE CODE:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

cancer = load_breast_cancer()
df = pd.DataFrame(cancer['data'], columns=cancer['feature_names'])
df.head()

# Standardize the features before PCA
scaler = StandardScaler()
scaler.fit(df)
scaled_data = scaler.transform(df)

# Project the 30 standardized features onto the first two principal components
pca = PCA(n_components=2)
pca.fit(scaled_data)
x_pca = pca.transform(scaled_data)
x_pca.shape   # (569, 2)

plt.figure(figsize=(8, 6))
plt.scatter(x_pca[:, 0], x_pca[:, 1], c=cancer['target'], cmap='plasma')
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')

# Each row of pca.components_ gives the loading of every original feature
pca.components_
df_comp = pd.DataFrame(pca.components_, columns=cancer['feature_names'])

plt.figure(figsize=(14, 6))
sns.heatmap(df_comp)

OUTPUT:

<AxesSubplot:>
(A scatter plot of the first two principal components and a heatmap of the component loadings are produced.)
