Machine Learning External
Introduction
Machine learning
Machine learning is a subset of artificial intelligence in the field of computer science that often
uses statistical techniques to give computers the ability to "learn" (i.e., progressively improve
performance on a specific task) with data, without being explicitly programmed. In the past
decade, machine learning has given us self-driving cars, practical speech recognition, effective
web search, and a vastly improved understanding of the human genome.
Machine learning tasks are typically classified into two broad categories, depending on whether
there is a learning "signal" or "feedback" available to a learning system:
1. Supervised learning: The computer is presented with example inputs and their desired
outputs, given by a "teacher", and the goal is to learn a general rule that maps inputs to outputs.
As special cases, the input signal can be only partially available, or restricted to special feedback:
2. Semi-supervised learning: the computer is given only an incomplete training signal: a training
set with some (often many) of the target outputs missing.
3. Active learning: the computer can only obtain training labels for a limited set of instances
(based on a budget), and also has to optimize its choice of objects to acquire labels for. When
used interactively, these can be presented to the user for labeling.
4. Reinforcement learning: training data (in form of rewards and punishments) is given only as
feedback to the program's actions in a dynamic environment, such as driving a vehicle or playing
a game against an opponent.
5. Unsupervised learning: No labels are given to the learning algorithm, leaving it on its own to
find structure in its input. Unsupervised learning can be a goal in itself (discovering hidden
patterns in data) or a means towards an end (feature learning).
In classification, inputs are divided into two or more classes, and the learner must produce a
model that assigns unseen inputs to one or more (multi-label classification) of these classes. This
is typically tackled in a supervised manner. Spam filtering is an example of classification, where
the inputs are email (or other) messages and the classes are "spam" and "not spam".
In regression, also a supervised problem, the outputs are continuous rather than discrete.
In clustering, a set of inputs is to be divided into groups. Unlike in classification, the groups are
not known beforehand, making this typically an unsupervised task.
Dimensionality reduction simplifies inputs by mapping them into a lower dimensional space.
Topic modeling is a related problem, where a program is given a list of human language
documents and is tasked with finding out which documents cover similar topics.
4.Deep learning
Falling hardware prices and the development of GPUs for personal use in the last few years have
contributed to the development of the concept of deep learning which consists of multiple hidden
layers in an artificial neural network. This approach tries to model the way the human brain
processes light and sound into vision and hearing. Some successful applications of deep learning
are computer vision and speech recognition.
5.Inductive logic programming
Inductive logic programming (ILP) is an approach to rule learning using logic programming as a
uniform representation for input examples, background knowledge, and hypotheses. Given an
encoding of the known background knowledge and a set of examples represented as a logical
database of facts, an ILP system will derive a hypothesized logic program that entails all positive
and no negative examples. Inductive programming is a related field that considers any kind of
programming languages for representing hypotheses (and not only logic programming), such as
functional programs.
7.Clustering
Cluster analysis is the assignment of a set of observations into subsets (called clusters) so that
observations within the same cluster are similar according to some predesignated criterion or
criteria, while observations drawn from different clusters are dissimilar. Different clustering
techniques make different assumptions on the structure of the data, often defined by some
similarity metric and evaluated for example by internal compactness (similarity between
members of the same cluster) and separation between different clusters. Other methods are based
on estimated density and graph connectivity. Clustering is a method of unsupervised learning,
and a common technique for statistical data analysis.
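To make the idea concrete (this sketch is an addition for illustration, not one of the lab
experiments), the following fragment clusters a handful of made-up 2-D points with scikit-learn's
KMeans; the data and the choice of two clusters are invented purely for the example:
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical unlabeled 2-D points forming two loose groups
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# Fit two clusters; n_init and random_state fixed so the demo is repeatable
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster index assigned to each point
print(km.cluster_centers_)  # learned cluster centres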
8.Bayesian networks
A Bayesian network, belief network or directed acyclic graphical model is a probabilistic
graphical model that represents a set of random variables and their conditional independencies
via a directed acyclic graph (DAG). For example, a Bayesian network could represent the
probabilistic relationships between diseases and symptoms. Given symptoms, the network can be
used to compute the probabilities of the presence of various diseases. Efficient algorithms exist
that perform inference and learning.
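As a minimal illustration of such an inference (an addition for clarity, not lab material), the
two-node disease -> symptom case below applies Bayes' rule directly; all probabilities are
invented for the example:
# A minimal two-node disease -> symptom network, evaluated by hand with
# Bayes' rule; every probability below is invented for illustration.
p_disease = 0.01              # prior P(D)
p_symptom_given_d = 0.90      # P(S | D)
p_symptom_given_not_d = 0.05  # P(S | not D)

# P(S) by total probability, then P(D | S) by Bayes' rule
p_symptom = (p_symptom_given_d * p_disease
             + p_symptom_given_not_d * (1 - p_disease))
p_d_given_s = p_symptom_given_d * p_disease / p_symptom
print(f"P(disease | symptom) = {p_d_given_s:.3f}")  # ~0.154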
9.Reinforcement learning
Reinforcement learning is concerned with how an agent ought to take actions in an environment
so as to maximize some notion of long-term reward. Reinforcement learning algorithms attempt
to find a policy that maps states of the world to the actions the agent ought to take in those states.
Reinforcement learning differs from the supervised learning problem in that correct input/output
pairs are never presented, nor sub-optimal actions explicitly corrected.
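A small illustrative sketch (not one of the lab experiments) of tabular Q-learning on a made-up
five-state corridor shows the reward-driven update at work; the hyperparameters alpha, gamma and
epsilon are arbitrary demo values:
import numpy as np

# Tabular Q-learning on a toy 1-D corridor: states 0..4, reward only at state 4.
n_states, n_actions = 5, 2          # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.2
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0
    while s != n_states - 1:
        # epsilon-greedy action selection
        a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # move Q[s, a] toward reward plus discounted best next value
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print(Q)  # learned state-action values; "right" dominates in every state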
11.Genetic algorithms
A genetic algorithm (GA) is a search heuristic that mimics the process of natural selection, and
uses methods such as mutation and crossover to generate new genotypes in the hope of finding
good solutions to a given problem. In machine learning, genetic algorithms found some uses in
the 1980s and 1990s. Conversely, machine learning techniques have been used to improve the
performance of genetic and evolutionary algorithms.
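For illustration only (an addition, not part of the lab), the toy genetic algorithm below
maximizes f(x) = x^2 over 5-bit genotypes using the selection, crossover and mutation steps
described above; population size, mutation rate and generation count are arbitrary choices:
import random
random.seed(0)

def fitness(bits):
    # decode the bit string to an integer 0..31 and square it
    return int("".join(map(str, bits)), 2) ** 2

def crossover(a, b):
    point = random.randint(1, len(a) - 1)   # single-point crossover
    return a[:point] + b[point:]

def mutate(bits, rate=0.1):
    # flip each bit independently with the given probability
    return [1 - b if random.random() < rate else b for b in bits]

pop = [[random.randint(0, 1) for _ in range(5)] for _ in range(10)]
for generation in range(20):
    # keep the fitter half as parents, refill with mutated offspring
    pop.sort(key=fitness, reverse=True)
    parents = pop[:5]
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(5)]
    pop = parents + children

best = max(pop, key=fitness)
print(best, "->", int("".join(map(str, best)), 2))  # converges toward 11111 (31)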
Experiment-1:
Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based
on a given set of training data samples. Read the training data from a .CSV file.
FIND-S Algorithm
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
For each attribute constraint ai in h
If the constraint ai is satisfied by x
Then do nothing
Else replace ai in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
Training Examples:
SOURCE CODE:
import pandas as pd
import numpy as np
def find_s(c,t):
    # start with the first positive example as the most specific hypothesis
    for i, val in enumerate(c):
        if t[i] == "Yes":
            specific_hypothesis = c[i].copy()
            break
    # generalize the hypothesis against every remaining positive example
    for i, val in enumerate(c):
        if t[i] == "Yes":
            for x in range(len(specific_hypothesis)):
                if val[x] != specific_hypothesis[x]:
                    specific_hypothesis[x] = '?'
    return specific_hypothesis
d = pd.read_csv('train.csv')
print("Training Data:")
print(d)
a = np.array(d)[:,:-1]
t = np.array(d)[:,-1]
print(find_s(a,t))
OUTPUT:
Training Data:
Sky AirTemp Humidity Wind Water Forecast EnjoySport
0 Sunny Warm Normal Strong Warm Same Yes
1 Sunny Warm High Strong Warm Same Yes
2 Rainy Cold High Strong Warm Change No
3 Sunny Warm High Strong Cool Change Yes
['Sunny' 'Warm' '?' 'Strong' '?' '?']
Experiment-2:
For a given set of training data examples stored in a .CSV file, implement and demonstrate the
Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent
with the training examples.
Candidate-Elimination Algorithm:
1. Load data set
2. G <-maximally general hypotheses in H
3. S <- maximally specific hypotheses in H
4. For each training example d=<x,c(x)>
Case 1: If d is a positive example
Remove from G any hypothesis that is inconsistent with d
For each hypothesis s in S that is not consistent with d
• Remove s from S.
• Add to S all minimal generalizations h of s such that
o h is consistent with d, and
o some member of G is more general than h
• Remove from S any hypothesis that is more general than another hypothesis in S
Case 2: If d is a negative example
Remove from S any hypothesis that is inconsistent with d
For each hypothesis g in G that is not consistent with d
• Remove g from G.
• Add to G all minimal specializations h of g such that
o h is consistent with d, and
o some member of S is more specific than h
• Remove from G any hypothesis that is less general than another hypothesis in G
SOURCE CODE:
import numpy as np
import pandas as pd
data = pd.DataFrame(data=pd.read_csv('trainingdata3.csv'))
print(data)
concepts = np.array(data.iloc[:,0:-1])
print(concepts)
target = np.array(data.iloc[:,-1])
print(target)
def learn(concepts, target):
    '''
    learn() implements the learning method of the Candidate-Elimination algorithm.
    Arguments:
        concepts - a matrix of all the attribute values
        target - a vector of example labels ("Yes"/"No")
    '''
    # .copy() makes sure a new list is created instead of just pointing to the same memory location
    specific_h = concepts[0].copy()
    print(specific_h)
    general_h = [['?' for _ in range(len(specific_h))] for _ in range(len(specific_h))]
    print(general_h)
    for i, h in enumerate(concepts):
        if target[i] == "Yes":
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        if target[i] == "No":
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
    print(specific_h)
    print(general_h)
    # find indices where we have fully general rows, meaning those that are unchanged
    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
    for i in indices:
        general_h.remove(['?', '?', '?', '?', '?', '?'])
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("Final Specific_h:", s_final, sep="\n")
print("Final General_h:", g_final, sep="\n")
OUTPUT:
Final Specific_h:
[' Sunny' 'Warm' '?' 'Strong' '?' '?']
Final General_h:
[[' Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?']]
Experiment-3:
Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an
appropriate data set for building the decision tree and apply this knowledge to classify a new
sample.
SOURCE CODE:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

data = pd.read_csv('tennisdata.csv')
print("The first 5 values of data is\n", data.head())
X = data.iloc[:,:-1]
y = data.iloc[:,-1]
le_outlook = LabelEncoder()
X.Outlook = le_outlook.fit_transform(X.Outlook)
le_Temperature = LabelEncoder()
X.Temperature = le_Temperature.fit_transform(X.Temperature)
le_Humidity = LabelEncoder()
X.Humidity = le_Humidity.fit_transform(X.Humidity)
le_Windy = LabelEncoder()
X.Windy = le_Windy.fit_transform(X.Windy)
le_PlayTennis = LabelEncoder()
y = le_PlayTennis.fit_transform(y)
## Train model
classifier = DecisionTreeClassifier(criterion='entropy')  # entropy-based splits, as in ID3
classifier.fit(X,y)
def labelEncoderForInput(list1):
list1[0] = le_outlook.transform([list1[0]])[0]
list1[1] = le_Temperature.transform([list1[1]])[0]
list1[2] = le_Humidity.transform([list1[2]])[0]
list1[3] = le_Windy.transform([list1[3]])[0]
return [list1]
#inp = ["Rainy","Mild","High","False"]
inp1=["Rainy","Cool","High","False"]
pred1 = labelEncoderForInput(inp1)
y_pred = classifier.predict(pred1)
y_pred
OUTPUT:
The first 5 values of data is
Outlook Temperature Humidity Windy PlayTennis
0 Sunny Hot High False No
1 Sunny Hot High True No
2 Overcast Hot High False Yes
3 Rainy Mild High False Yes
4 Rainy Cool Normal False Yes
Experiment-4:
Exercises to solve the real-world problems using the following machine learning methods: a)
Linear Regression b) Logistic Regression c) Binary Classifier
SOURCE CODE:
a) Linear Regression:
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 6, 8, 10])
lr = LinearRegression()
lr.fit(X, y)
y_pred = lr.predict(X)
print("Coefficients:", lr.coef_, "Intercept:", lr.intercept_, "Predicted values:", y_pred, sep="\n")
OUTPUT:
Coefficients:
[2.]
Intercept:
-1.7763568394002505e-15
Predicted values:
[ 2. 4. 6. 8. 10.]
b)Logistic Regression:
SOURCE CODE:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
data = pd.read_csv("4pr.csv")
print(data.head())
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=0)
model = LogisticRegression()
model.fit(X_train,y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(accuracy)
print("Accuracy: {:.2f}%".format(accuracy * 100))
OUTPUT:
sbp tobacco ldl adiposity famhist typea obesity alcohol age chd
0 160 12.00 5.73 23.11 1 49 25.30 97.20 52 1
1 144 0.01 4.41 28.61 0 55 28.87 2.06 63 0
2 118 0.08 3.48 32.28 1 52 29.14 3.81 46 0
3 170 7.50 6.41 38.03 1 51 31.99 24.26 58 1
4 134 13.60 3.50 27.78 1 60 25.99 57.34 49 1
1.0
Accuracy: 100.00%
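c)Binary Classifier:
The manual gives no separate listing for part (c); as a minimal sketch, any scikit-learn
estimator trained on two-class data serves as a binary classifier. The synthetic data set below
(from make_classification) is assumed purely for the demonstration:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Synthetic two-class data, invented purely for this sketch
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = SVC(kernel='linear')          # any binary estimator would do here
clf.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))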
Experiment-5: Develop a program for Bias, Variance, Remove duplicates, Cross Validation
SOURCE CODE:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# load the data set and remove duplicate rows (the CSV name is assumed)
data = pd.read_csv('data.csv').drop_duplicates().values

X_train, X_test, y_train, y_test = train_test_split(data[:, :-1], data[:, -1], test_size=0.2, random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred_train = model.predict(X_train)
y_pred_test = model.predict(X_test)
# bias is approximated here as the mean squared error on the training data (an assumption,
# since the original line defining it is missing); variance is the spread of test predictions
bias = mean_squared_error(y_train, y_pred_train)
variance = np.var(y_pred_test)
print(f'Bias: {bias}')
print(f'Variance: {variance}')
# Perform cross-validation
scores = cross_val_score(model, data[:, :-1], data[:, -1], cv=5)
print('Cross-validation scores:', scores)
OUTPUT:
Bias: 1.136363636363635
Variance: 0.006448576675849442
Cross-validation scores: [-3.99814069 nan nan nan nan nan]
Experiment-6:
Build an Artificial Neural Network by implementing the Back propagation algorithm and test the same using
appropriate data sets.
SOURCE CODE:
import numpy as np

# training data recovered from the printed Input/Actual Output below
X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)

# scale units
X = X/np.amax(X, axis=0)
y = y/100

class Neural_Network(object):
    def __init__(self):
        self.inputSize = 2
        self.outputSize = 1
        self.hiddenSize = 3
        # weight matrices, initialized randomly
        self.W1 = np.random.randn(self.inputSize, self.hiddenSize)
        self.W2 = np.random.randn(self.hiddenSize, self.outputSize)
    def forward(self, X):
        # forward propagation through the network
        self.z = np.dot(X, self.W1)
        self.z2 = self.sigmoid(self.z)
        self.z3 = np.dot(self.z2, self.W2)
        o = self.sigmoid(self.z3)
        return o
    def sigmoid(self, s):
        return 1/(1+np.exp(-s))
    def sigmoidPrime(self, s):
        # derivative of sigmoid, given sigmoid output s
        return s * (1 - s)
    def backward(self, X, y, o):
        # back-propagate the error and update both weight matrices
        self.o_error = y - o
        self.o_delta = self.o_error*self.sigmoidPrime(o)
        self.z2_error = self.o_delta.dot(self.W2.T)
        self.z2_delta = self.z2_error*self.sigmoidPrime(self.z2)
        self.W1 += X.T.dot(self.z2_delta)
        self.W2 += self.z2.T.dot(self.o_delta)
    def train(self, X, y):
        o = self.forward(X)
        self.backward(X, y, o)

NN = Neural_Network()
for i in range(1000):
    NN.train(X, y)
print("Input:", X, sep="\n")
print("Actual Output:", y, sep="\n")
print("Predicted Output:", NN.forward(X), sep="\n")
print("Loss:", np.mean(np.square(y - NN.forward(X))))
OUTPUT:
Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output:
[[0.92]
[0.86]
[0.89]]
Predicted Output:
[[0.73180088]
[0.70478836]
[0.77438585]]
Loss:
0.024292064176480496
Experiment-7:
Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points.
Select an appropriate data set for your experiment and draw graphs.
• Regression is a technique from statistics that is used to predict values of a desired target
quantity when the target quantity is continuous.
• In regression, we seek to identify (or estimate) a continuous variable y associated with a given
input vector x.
• y is called the dependent variable.
• x is called the independent variable.
Loess/Lowess Regression: Loess regression is a nonparametric technique that uses local weighted
regression to fit a smooth curve through points in a scatter plot.
Lowess Algorithm: Locally weighted regression is a very powerful non-parametric model used in
statistical learning. Given a dataset X, y, we attempt to find a model parameter β(x) that
minimizes the residual sum of weighted squared errors. The weights are given by a kernel
function (k or w), which can be chosen arbitrarily.
1. Read the given data sample into X and the target curve (linear or non-linear) into Y
2. Set the value of the smoothing (free) parameter τ
3. Set the point of interest x0, which is a subset of X
4. Determine the weight matrix using: w(x, x0) = exp(-(x - x0)² / (2τ²))
5. Determine the value of the model parameter β using: β(x0) = (XᵀWX)⁻¹XᵀWy
6. Prediction = x0*β
SOURCE CODE:
from math import ceil
import numpy as np
from scipy import linalg

def lowess(x, y, f, iterations):
    n = len(x)
    r = int(ceil(f * n))
    # distance from each point to its r-th nearest neighbour sets the local bandwidth
    h = [np.sort(np.abs(x - x[i]))[r] for i in range(n)]
    w = np.clip(np.abs((x[:, None] - x[None, :]) / h), 0.0, 1.0)
    w = (1 - w ** 3) ** 3  # tricube kernel
    yest = np.zeros(n)
    delta = np.ones(n)
    for iteration in range(iterations):
        for i in range(n):
            weights = delta * w[:, i]
            b = np.array([np.sum(weights * y), np.sum(weights * y * x)])
            A = np.array([[np.sum(weights), np.sum(weights * x)],
                          [np.sum(weights * x), np.sum(weights * x * x)]])
            beta = linalg.solve(A, b)
            yest[i] = beta[0] + beta[1] * x[i]
        # robustifying weights: down-weight points with large residuals
        residuals = y - yest
        s = np.median(np.abs(residuals))
        delta = np.clip(residuals / (6.0 * s), -1, 1)
        delta = (1 - delta ** 2) ** 2
    return yest

import math
import matplotlib.pyplot as plt

n = 100
x = np.linspace(0, 2 * math.pi, n)
y = np.sin(x) + 0.3 * np.random.randn(n)  # noisy sine curve (assumed demo data)
f = 0.25
iterations = 3
yest = lowess(x, y, f, iterations)
plt.plot(x,y,"r.")
plt.plot(x,yest,"b-")
OUTPUT:
[<matplotlib.lines.Line2D at 0x1e4c9120640>]
Experiment-8:
Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model
to perform this task. Built-in Java classes/API can be used to write the program. Calculate the
accuracy, precision, and recall for your data set.
The dataset is divided into two parts, namely, feature matrix and the response vector.
Feature matrix contains all the vectors (rows) of the dataset, in which each vector consists of
the values of the dependent features. In the above dataset, the features are 'Outlook',
'Temperature', 'Humidity' and 'Windy'.
Response vector contains the value of the class variable (prediction or output) for each row of
the feature matrix. In the above dataset, the class variable name is 'Play golf'.
SOURCE CODE:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, recall_score, precision_score, confusion_matrix

# training documents with pos/neg labels (the CSV name is assumed)
msg = pd.read_csv('document.csv', names=['message', 'label'])
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)
count_v = CountVectorizer()
Xtrain_dm = count_v.fit_transform(Xtrain)
Xtest_dm = count_v.transform(Xtest)
df = pd.DataFrame(Xtrain_dm.toarray(), columns=count_v.get_feature_names_out())
print(df[0:5])
clf = MultinomialNB()
clf.fit(Xtrain_dm, ytrain)
pred = clf.predict(Xtest_dm)
for doc, p in zip(Xtest, pred):
    print(doc, "->", 'pos' if p == 1 else 'neg')
print("Accuracy Metrics:")
print("Accuracy:", accuracy_score(ytest, pred))
print("Recall:", recall_score(ytest, pred))
print("Precision:", precision_score(ytest, pred))
print("Confusion Matrix:\n", confusion_matrix(ytest, pred))
OUTPUT:
about am amazing an and awesome bad beers best boss ... today \
0 0 0 0 0 0 0 0 0 1 0 ... 0
1 0 0 0 0 0 0 0 0 0 0 ... 0
2 1 0 0 0 0 0 0 1 0 0 ... 0
3 0 0 0 0 0 0 0 0 0 0 ... 1
4 0 0 0 0 0 0 0 0 0 0 ... 0
[5 rows x 46 columns]
This is my best work -> pos
I love to dance -> neg
I feel very good about these beers -> pos
I went to my enemy's house today -> pos
I can't deal with this -> neg
Accuracy Metrics:
Accuracy: 1.0
Recall: 1.0
Precision: 1.0
Confusion Matrix:
[[2 0]
[0 3]]
Experiment-9:
Write a program to Implement Support Vector Machines and Principal Component Analysis
SOURCE CODE:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
import numpy as np

iris = datasets.load_iris()
X = iris.data
y = iris.target
# train/test split (an 80/20 split is assumed; the original split line is missing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = SVC(kernel='linear')
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
OUTPUT:
Experiment-10:
Write a program to Implement Principal Component Analysis
SOURCE CODE:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
%matplotlib inline

cancer = load_breast_cancer()
df = pd.DataFrame(cancer['data'], columns=cancer['feature_names'])
df.head()
scalar = StandardScaler()
scalar.fit(df)
scaled_data = scalar.transform(df)
pca = PCA(n_components = 2)
pca.fit(scaled_data)
x_pca = pca.transform(scaled_data)
x_pca.shape
pca.components_
# heatmap of the two principal components against the original features
df_comp = pd.DataFrame(pca.components_, columns=cancer['feature_names'])
sns.heatmap(df_comp)
OUTPUT:
<AxesSubplot:>