Deep Learning Lab
VISION
To become a self-sustainable institution which is recognized for its new age engineering through
innovative teaching and learning culture, inculcating research and entrepreneurial ecosystem, and
sustainable social impact in the community.
MISSION
To offer undergraduate and post-graduate programs that are supported through industry relevant
curriculum and innovative teaching and learning processes that would help students succeed in
their professional careers.
To provide necessary support structures for students, which will contribute to their personal and
professional growth and enable them to become leaders in their respective fields.
To provide faculty and students with an ecosystem that fosters research and development through
strategic partnerships with government organisations and collaboration with industries.
To contribute to the development of the region by using our technological expertise to work with
nearby communities and support them in their social and economic growth.
To provide high quality technical education to students that will enable life-long learning and build
expertise in advanced technologies in Computer Science and Engineering.
To encourage professional development of students that will inculcate ethical values and leadership
skills while working with the community to address societal issues.
Program Educational Objectives (PEOs):
A graduate of the Computer Science and Engineering Program should:
Program Educational Objective 1 (PEO1):
The Graduates will provide solutions to difficult and challenging issues in their profession by applying
computer science and engineering theory and principles.
Program Educational Objective 2 (PEO2):
The Graduates will have successful careers in computer science and engineering fields or will be able to
successfully pursue advanced degrees.
Program Educational Objective 3 (PEO3):
The Graduates will communicate effectively, work collaboratively and exhibit high levels of
professionalism, moral and ethical responsibility.
Program Educational Objective 4 (PEO4):
The Graduates will develop the ability to understand and analyse Engineering issues in a broader
perspective with ethical responsibility towards sustainable development.
Program Outcomes (POs):
PO1 Engineering knowledge: Apply the knowledge of mathematics, science, engineering
fundamentals, and an engineering specialization to the solution of complex engineering problems.
PO2 Problem analysis: Identify, formulate, review research literature, and analyze complex engineering
problems reaching substantiated conclusions using first principles of mathematics, natural
sciences, and engineering sciences.
PO3 Design/development of solutions: Design solutions for complex engineering problems and design
system components or processes that meet the specified needs with appropriate consideration
for the public health and safety, and the cultural, societal, and environmental considerations.
PO5 Modern tool usage: Create, select, and apply appropriate techniques, resources, and
modern engineering and IT tools including prediction and modeling to complex engineering
activities with an understanding of the limitations.
PO6 The engineer and society: Apply reasoning informed by the contextual knowledge to
assess societal, health, safety, legal and cultural issues and the consequent responsibilities
relevant to the professional engineering practice.
PO8 Ethics: Apply ethical principles and commit to professional ethics and responsibilities
and norms of the engineering practice.
PO9 Individual and teamwork: Function effectively as an individual, and as a member or leader
in diverse teams, and in multidisciplinary settings.
Program Specific Outcomes (PSOs):
PSO1 Problem Solving Skills – Graduates will be able to apply computational techniques and
software principles to solve complex engineering problems pertaining to software engineering.
PSO2 Professional Skills – Graduates will be able to think critically, communicate effectively, and
collaborate in teams through participation in co- and extra-curricular activities.
PSO3 Successful Career – Graduates will possess a solid foundation in computer science and
engineering that will enable them to grow in their profession and pursue lifelong learning
through post-graduation and professional development.
LIST OF EXPERIMENTS
Given the following data, which specify classifications for nine combinations
of VAR1 and VAR2, predict a classification for a case where VAR1=0.906
and VAR2=0.606, using the result of k-means clustering with 3 means (i.e., 3
centroids).
credit-worthiness.
Input attributes are (from left to right) income, recreation, job, status, age-
group, home-owner. Find the unconditional probability of `golf' and the
conditional probability of `single' given `medRisk' in the dataset?
INTRODUCTION TO LAB:
Machine Learning is used for everything from automating mundane tasks to offering intelligent insights, and
industries in every sector try to benefit from it. You may already be using a device that utilizes it, for
example, a wearable fitness tracker like Fitbit, or an intelligent home assistant like Google Home. But
there are many more examples of ML in use.
Prediction: Machine learning can also be used in prediction systems. Considering the loan
example, to compute the probability of a fault, the system will need to classify the available data in
groups.
Image recognition: Machine learning can be used for face detection in an image as well. There is a
separate category for each person in a database of several people.
Speech recognition: It is the translation of spoken words into text. It is used in voice searches
and more. Voice user interfaces include voice dialing, call routing, and appliance control. It can
also be used for simple data entry and the preparation of structured documents.
Medical diagnoses: ML is trained to recognize cancerous tissues.
Financial industry and trading: Companies use ML in fraud investigations and credit checks.
1. Supervised Learning
2. Unsupervised Learning
3. Reinforcement Learning
In supervised learning, an AI system is presented with labeled data, which means that each data point is
tagged with the correct label.
The goal is to approximate the mapping function so well that when you have new input data (x), you
can predict the output variables (Y) for that data.
As shown in the above example, we have initially taken some data and marked them as ‘Spam’ or ‘Not
Spam’. This labeled data is used to train the supervised model.
Once it is trained, we can test our model with some new test mails and check whether the model
is able to predict the right output.
Classification: A classification problem is when the output variable is a category, such as “red” or
“blue” or “disease” and “no disease”.
Regression: A regression problem is when the output variable is a real value, such as “dollars” or
“weight”.
In unsupervised learning, an AI system is presented with unlabeled, uncategorized data and the system’s
algorithms act on the data without prior training. The output is dependent upon the coded algorithms.
Subjecting a system to unsupervised learning is one way of testing AI.
Clustering: A clustering problem is where you want to discover the inherent groupings in the data,
such as grouping customers by purchasing behavior.
Association: An association rule learning problem is where you want to discover rules that describe
large portions of your data, such as people that buy X also tend to buy Y.
A reinforcement learning algorithm, or agent, learns by interacting with its environment. The agent
receives rewards by performing correctly and penalties for performing incorrectly. The agent learns
without intervention from a human by maximizing its reward and minimizing its penalty. It is a type of
dynamic programming that trains algorithms using a system of reward and punishment.
In the above example, we can see that the agent is given two options, i.e., a path with water or a path with fire.
A reinforcement algorithm works on a reward system, i.e., if the agent uses the fire path then rewards
are subtracted and the agent learns that it should avoid the fire path. If it had chosen the water path, or
the safe path, then some points would have been added to the reward points; the agent would then
learn which path is safe and which is not.
Essentially, by leveraging the rewards obtained, the agent improves its knowledge of the environment to select the
next action.
PROGRAM 1
The probability that it is Friday and that a student is absent is 3%. Since there are 5 school
days in a week, the probability that it is Friday is 20%. What is the probability that a student is
absent given that today is Friday? Apply Bayes' rule in Python to get the result. (Ans:
15%)
AIM: To find the probability that a student is absent given that today is Friday.
DESCRIPTION:
Machine learning is a method of data analysis that automates analytical model building.
Using algorithms that iteratively learn from data, machine learning allows computers
to find hidden insights without being explicitly programmed where to look. The Naive Bayes algorithm is
one of the most popular machine learning techniques. In this article we will look at how to implement
the Naive Bayes algorithm using Python.
Before someone can understand Bayes’ theorem, they need to know a couple of related concepts first,
namely, the idea of Conditional Probability, and Bayes’ Rule.
Conditional probability is the probability that something will happen, given that something
else has already happened.
Let's say we have a collection of people. Some of them are singers. They are either male or female. If we
select a person at random, what is the probability that this person is a male? What is the probability that
this person is a male and a singer? What is the probability that this person is a singer, given that the
person is male? The last question is a conditional probability. We can calculate it as
P(A | B) = P(A ∩ B) / P(B), where P(B) > 0.
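As a small illustration of conditional probability computed from counts, the sketch below uses made-up numbers for the "male" and "singer" example. The counts (`total_people`, `males`, `male_singers`) are assumptions for illustration only and do not come from this manual.

```python
# Hypothetical counts for illustration only (not taken from the manual).
total_people = 100
males = 60
male_singers = 15

p_male = males / total_people                    # P(male)
p_male_and_singer = male_singers / total_people  # P(male ∩ singer)

# Conditional probability: P(singer | male) = P(male ∩ singer) / P(male)
p_singer_given_male = p_male_and_singer / p_male

print(f"P(male) = {p_male:.2f}")
print(f"P(male and singer) = {p_male_and_singer:.2f}")
print(f"P(singer | male) = {p_singer_given_male:.2f}")
```

With these assumed counts, a quarter of the males are singers, so the conditional probability is larger than the joint probability of being both male and a singer.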
We can define Bayes' rule like this. Let A1, A2, …, An be a set of mutually exclusive events that
together form the sample space S. Let B be any event from the same sample space, such that P(B) > 0.
Then, P(Ak | B) = P(Ak ∩ B) / [P(A1 ∩ B) + P(A2 ∩ B) + . . . + P(An ∩ B)]
Naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes’
theorem with strong (naive) independence assumptions between the features in machine learning.
We can use the above theory and equations for classification problems.
SOURCE CODE:
probAbsentFriday = 0.03   # P(Friday ∩ Absent)
probFriday = 0.2          # P(Friday)
# Bayes' formula:
# p(Absent|Friday) = p(Friday|Absent) * p(Absent) / p(Friday)
# p(Friday|Absent) = p(Friday ∩ Absent) / p(Absent)
# Therefore the result is p(Friday ∩ Absent) / p(Friday):
bayesResult = probAbsentFriday / probFriday
print(bayesResult * 100)
Output: 15
A Bayesian network is a graphical model that represents the probabilistic relationships among a set of
variables. Bayes’ theorem is a way to figure out conditional probability. Conditional probability is the
probability of an event happening, given that it has some relationship to one or more other events. For
example, your probability of getting a parking space is connected to the time of day you park, where
you park, and what conventions are going on at any time. Bayes’ theorem is slightly more nuanced. In a
nutshell, it gives you the actual probability of an event given information about tests.
“Events” are different from “tests.” For example, there is a test for liver disease, but that’s
separate from the event of actually having liver disease.
Tests are flawed: just because you have a positive test does not mean you actually have the
disease. Many tests have a high false positive rate. Rare events tend to have higher false positive
rates than more common events. We’re not just talking about medical tests here. For example, spam
filtering can have high false positive rates. Bayes’ theorem takes the test results and calculates your real
probability that the test has identified the event.
2. Can you give a real-life example using Bayes’ Theorem (liver disease)?
You might be interested in finding out a patient’s probability of having liver disease if they are an
alcoholic. “Being an alcoholic” is the test (kind of like a litmus test) for liver disease.
A could mean the event “Patient has liver disease.” Past data tells you that 10% of patients
entering your clinic have liver disease. P(A) = 0.10.
B could mean the litmus test that “Patient is an alcoholic.” Five percent of the clinic’s patients are
alcoholics. P(B) = 0.05.
You might also know that among those patients diagnosed with liver disease, 7% are alcoholics.
This is your B|A: the probability that a patient is alcoholic, given that they have liver disease, is 7%.
P(A|B)=(0.07*0.1)/0.05=0.14
In other words, if the patient is an alcoholic, their chance of having liver disease is 0.14 (14%).
This is a large increase from the 10% suggested by past data. But it’s still unlikely that any particular
patient has liver disease.
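The liver disease calculation can be checked with a few lines of Python. This is a minimal sketch; the helper name `bayes_posterior` is an assumption introduced here, while the three probabilities are the ones given in the example.

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b

# A = patient has liver disease, B = patient is an alcoholic
p_liver_given_alcoholic = bayes_posterior(p_b_given_a=0.07, p_a=0.10, p_b=0.05)
print(round(p_liver_given_alcoholic, 2))
```

The printed value agrees with the 0.14 worked out by hand.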
3. Bayes’ Theorem Examples 2: what is the probability that they will be prescribed pain pills?
Another way to look at the theorem is to say that one event follows another. Above I said “tests” and
“events”, but it’s also legitimate to think of it as the “first event” that leads to the “second event.”
There’s no one right way to do this: use the terminology that makes most sense to you.
In a particular pain clinic, 10% of patients are prescribed narcotic pain killers. Overall, five percent of
the clinic’s patients are addicted to narcotics (including pain killers and illegal substances). Out of all
the people prescribed pain pills, 8% are addicts. If a patient is an addict, what is the probability that
they will be prescribed pain pills?
Step 1: Figure out what your event “A” is from the question. That information is in the italicized part
of this particular question. The event that happens first (A) is being prescribed pain pills. That’s given as
10%.
Step 2: Figure out what your event “B” is from the question. That information is also in the italicized
part of this particular question. Event B is being an addict. That’s given as 5%.
Step 3: Figure out the probability of event B (Step 2) given event A (Step 1). In other words,
find what P(B|A) is. We want to know “Given that people are prescribed pain pills, what’s the probability
they are an addict?” That is given in the question as 8%, or 0.08.
Step 4: Insert your answers from Steps 1, 2 and 3 into the formula and solve.
P(A|B) = P(B|A) * P(A) / P(B) = (0.08 * 0.1)/0.05 = 0.16
4. Bayes’ Theorem Example 3: the medical test. If a person gets a positive test result,
what are the odds they actually have the genetic defect?
A slightly more complicated example involves a medical test (in this case, a genetic test):
There are several forms of Bayes’ Theorem out there, and they are all equivalent (they are just written
in slightly different ways). In this next equation, “X” is used in place of “B.” In addition, you’ll see
some changes in the denominator. The proof of why we can rearrange the equation like this is beyond
the scope of this article (otherwise it would be 5,000 words instead of 2,000!). However, if you come
across a question involving medical tests, you’ll likely be using this alternative formula to find the
answer:
The first step in solving Bayes’ theorem problems is to assign letters to events:
A = chance of having the faulty gene. That was given in the question as 1%. That also means the
probability of not having the gene (~A) is 99%.
X = A positive test result.
So:
Now we have all of the information we need to put into the equation:
P(A|X) = (.9 * .01) / (.9 * .01 + .096 * .99) = 0.0865 (8.65%).
5. Given the following statistics, what is the probability that a woman has cancer if she has a
positive mammogram result?
Step 1: Assign events to A or X. You want to know what a woman’s probability of having cancer is,
given a positive mammogram. For this problem, actually having cancer is A and a positive test result is
X.
Step 2: List out the parts of the equation (this makes it easier to work the actual equation):
P(A)=0.01
P(~A)=0.99
P(X|A)=0.9
P(X|~A)=0.08
Step 3: Insert the parts into the equation and solve. Note that as this is a medical test, we’re using the
form of the equation from example #2:
(0.9 * 0.01) / ((0.9 * 0.01) + (0.08 * 0.99)) = 0.102, or about 0.10.
The probability of a woman having cancer, given a positive test result, is 10%.
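Both medical-test examples use the same form of Bayes' theorem, P(A|X) = P(X|A)P(A) / (P(X|A)P(A) + P(X|~A)P(~A)). The sketch below evaluates it for the genetic test and the mammogram figures quoted above. The function name `posterior_from_test` and its parameter names are assumptions chosen for readability.

```python
def posterior_from_test(prior, sensitivity, false_positive_rate):
    """P(A|X) = P(X|A)P(A) / (P(X|A)P(A) + P(X|~A)P(~A))."""
    true_pos = sensitivity * prior                     # P(X|A) * P(A)
    false_pos = false_positive_rate * (1.0 - prior)    # P(X|~A) * P(~A)
    return true_pos / (true_pos + false_pos)

# Genetic test: P(A) = 0.01, P(X|A) = 0.9, P(X|~A) = 0.096
gene = posterior_from_test(prior=0.01, sensitivity=0.9, false_positive_rate=0.096)
# Mammogram: P(A) = 0.01, P(X|A) = 0.9, P(X|~A) = 0.08
mammogram = posterior_from_test(prior=0.01, sensitivity=0.9, false_positive_rate=0.08)

print(f"Genetic test: {gene:.4f}")
print(f"Mammogram:    {mammogram:.4f}")
```

In both cases the posterior stays near 10% even though the test detects 90% of true cases, because the condition itself is rare (1%) and false positives therefore outnumber true positives.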
PROGRAM 2
You’ll learn the following MySQL SELECT operations from Python using the ‘MySQL
Connector Python’ module.
Execute the SELECT query and process the result set returned by the query in Python.
Use Python variables in a WHERE clause of a SELECT query to pass dynamic
values.
Use the fetchall(), fetchmany(), and fetchone() methods of a cursor class to fetch all or
limited rows from a table.
Python Select from MySQL Table
Steps to fetch rows from a MySQL database table. Follow these steps:
SOURCE CODE:
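import pymysql

def mysqlconnect():
    # Connect to the MySQL database
    conn = pymysql.connect(
        host='localhost',
        user='root',
        password="pass",
        db='College',
    )
    cur = conn.cursor()
    # Query the server version as a simple connectivity check
    cur.execute("select @@version")
    output = cur.fetchall()
    print(output)
    # Close the connection when the work is done
    conn.close()

if __name__ == "__main__":
    mysqlconnect()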
OUTPUT:
Refer to Python MySQL database connection to connect to a MySQL database from Python using the
MySQL Connector module.
Next, prepare a SQL SELECT query to fetch rows from a table. You can select all or limited rows based
on your requirement. If a WHERE condition is used, it decides which rows are fetched.
For example, SELECT col1, col2, …, colN FROM MySQL_table WHERE id = 10;. This will return the row whose id is
10.
Next, use a connection.cursor() method to create a cursor object. This method creates a new MySQLCursor
object.
After successfully executing a SELECT operation, use the fetchall() method of a cursor object to get all
rows from a query result. It returns a list of rows.
Iterate over the row list using a for loop and access each row individually. (Access each row’s column data
using a column name or index number.)
Use the cursor.close() and connection.close() methods to close open connections after your work completes.
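The fetch-and-close steps can be tried without a running MySQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in, because it follows the same DB-API cursor pattern (execute, fetchone, fetchmany, fetchall, close). The `student` table and its rows are hypothetical. Note one difference: sqlite3 uses `?` as the parameter placeholder, whereas MySQL drivers such as PyMySQL use `%s`.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a MySQL server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table and rows, for illustration only.
cur.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
cur.executemany(
    "INSERT INTO student (id, name) VALUES (?, ?)",
    [(1, "Asha"), (2, "Ravi"), (3, "Meena"), (4, "John")],
)
conn.commit()

# Pass a Python variable into the WHERE clause through a placeholder.
wanted_id = 3
cur.execute("SELECT id, name FROM student WHERE id = ?", (wanted_id,))
one_row = cur.fetchone()          # a single row, or None if nothing matches

cur.execute("SELECT id, name FROM student ORDER BY id")
first_two = cur.fetchmany(2)      # at most two rows
remaining = cur.fetchall()        # every row not fetched yet

for row in first_two + remaining:
    print(row[0], row[1])         # access column data by index

# Close the cursor and the connection once the work is complete.
cur.close()
conn.close()
```

Passing values through placeholders instead of building the SQL string by hand lets the driver handle quoting, which is the usual way to pass dynamic values safely.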
PROGRAM 3
DESCRIPTION:
This algorithm is used to solve classification problems. The K-nearest neighbor (K-NN)
algorithm effectively creates an imaginary boundary to classify the data. When new data points come in,
the algorithm predicts their class from the nearest points around that boundary.
Therefore, a larger k value means smoother curves of separation, resulting in less complex models,
whereas a smaller k value tends to overfit the data, resulting in more complex models.
It’s very important to choose the right k value when analyzing the dataset to avoid overfitting and
underfitting.
SOURCE CODE:
# Import necessary modules
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
3.5 OUTPUT:
KNN (K-nearest neighbors) is a supervised, non-parametric learning algorithm that can be used to
solve both classification and regression problems.
It uses data in which there is a target column present, i.e., labeled data, to model a function that produces an
output for unseen data. It typically uses the Euclidean distance formula to compute the distance between the data
points for classification or prediction.
The main assumption of this algorithm is that similar data points lie close to each other, so it uses
distance to find the points that are most similar to a new one.
The term “non-parametric” refers to not making any assumptions on the underlying data distribution.
These methods do not have any fixed numbers of parameters in the model.
Similarly in KNN, the model parameters grow with the training data by considering each training case as a
parameter of the model. So, KNN is a non-parametric algorithm.
K represents the number of nearest neighbors you want to select to predict the class of a given item, which
is coming as an unseen dataset for the model.
4. Why is the odd value of “K” preferred over even values in the KNN Algorithm?
An odd value of K should be preferred over even values in order to ensure that there are no ties in the
voting. If the square root of the number of data points (a common rule of thumb for K) is even, then add or subtract 1 to make it odd.
5. How does the KNN algorithm make the predictions on the unseen dataset?
The following operations happen for each unseen or test data point. For each such point,
the kNN classifier must:
Step-1: Calculate the distances of test point to all points in the training set and store them
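A from-scratch sketch of the prediction procedure may help here. Step 1 follows the text; the remaining steps (keep the k smallest distances, then take a majority vote) are the usual KNN procedure and are filled in as an assumption, since they are not listed above. The toy points and labels are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 1: distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Step 2: keep the k closest neighbours
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 3: majority vote over their labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data, for illustration only.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (1.1, 1.0), k=3))
print(knn_predict(points, labels, (5.1, 5.0), k=3))
```

An odd k such as 3 avoids tied votes when there are two classes, which is the reason odd values are preferred.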
PROGRAM 4
4. Given the following data, which specify classifications for nine combinations of VAR1
and VAR2, predict a classification for a case where VAR1=0.906 and VAR2=0.606, using
the result of k-means clustering with 3 means (i.e., 3 centroids).
SOURCE CODE:
import numpy as np
y=np.array([0,1,1,0,1,0,1,1,1])
4.2.OUTPUT:
kmeans.predict([[0.906, 0.606]])
The K Means algorithm is a centroid-based clustering (unsupervised) technique. This technique groups the
dataset into k different clusters. Each of the clusters has a
centroid point which represents the mean of the data points lying in that cluster.
The idea of the K-Means algorithm is to find k centroid points such that every point in the dataset belongs to
the one of the k sets whose centroid is at minimum Euclidean distance.
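The idea can be sketched in plain Python as alternating assignment and update steps (Lloyd's algorithm). The nine points below are hypothetical and are NOT the VAR1/VAR2 rows of the exercise. The starting centroids are fixed by hand so that the run is repeatable, and the function names `kmeans` and `predict` are assumptions introduced here.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's algorithm; returns (centroids, assignments)."""
    centroids = [tuple(c) for c in centroids]
    assignments = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance
        new_assignments = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # assignments are stable, so the algorithm has converged
        assignments = new_assignments
        # Update step: each centroid becomes the mean of its points
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments

def predict(centroids, point):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))

# Hypothetical data (not the nine rows of the exercise).
data = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
        (0.9, 0.6), (1.0, 0.7), (0.95, 0.65),
        (0.5, 1.5), (0.6, 1.6), (0.55, 1.55)]
initial = [data[0], data[3], data[6]]  # fixed start so the run is repeatable

centroids, assignments = kmeans(data, initial)
print(assignments)
print(predict(centroids, (0.906, 0.606)))
```

Predicting the cluster of a new case, such as VAR1=0.906 and VAR2=0.606, only requires finding the nearest of the learned centroids.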
Yes, K-Means typically needs to have some form of normalization done on the datasets to work properly
since it is sensitive to both the mean and variance of the datasets.
For performing feature scaling, generally, StandardScaler is recommended, but depending on the specific
use cases, other techniques might be more suitable as well.
For example, suppose we have two variables, age and salary, where age is in the range of 20 to 60 and
salary is in the range of 100-150K. Since the scales of these variables are different, when they are
substituted into the Euclidean distance formula, the variable on the larger scale suppresses the
variable on the smaller scale, so the impact of age will not be captured very clearly. Hence, you
have to scale the variables to the same range using Standard Scaler, Min-Max Scaler, etc.
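A minimal sketch of standard (z-score) scaling for the age and salary example, written with the standard library only. The five ages and salaries are hypothetical values inside the ranges mentioned above. The population standard deviation is used here, which to my understanding matches scikit-learn's StandardScaler.

```python
import statistics

def standardize(values):
    """Z-score scaling: subtract the mean, divide by the standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / std for v in values]

# Hypothetical ages (years) and salaries, for illustration only.
ages = [20, 30, 40, 50, 60]
salaries = [100_000, 110_000, 125_000, 140_000, 150_000]

scaled_ages = standardize(ages)
scaled_salaries = standardize(salaries)

print([round(a, 2) for a in scaled_ages])
print([round(s, 2) for s in scaled_salaries])
```

After scaling, both variables have mean 0 and standard deviation 1, so neither dominates the Euclidean distance.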
3. Which metrics can you use to find the accuracy of the K means Algorithm?
There is no single correct answer to this question because K means, being an unsupervised learning technique,
does not use an output (label) column. As a result, one cannot get an accuracy value
from the algorithm directly.
Advantages:
Disadvantages:
5. What are the ways to avoid the problem of initialization sensitivity in the K means Algorithm?
Repeat K means: Run the algorithm several times with different initial centroids, then pick
the clustering that results in small intracluster distance and large
intercluster distance.
K Means++: It is a smart centroid initialization technique.
PROGRAM 5
5. The following training examples map descriptions of individuals onto high, medium
and low credit-worthiness.
Input attributes are (from left to right) income, recreation, job, status, age-group,
home-owner. Find the unconditional probability of `golf' and the conditional probability of `single' given
`medRisk' in the dataset.
SOURCE CODE:
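totalRecords = 10
numberGolfRecreation = 4
probGolf = numberGolfRecreation / totalRecords
print("Unconditional probability of golf:", probGolf)
# Bayes' formula:
# p(single|medRisk) = p(medRisk|single) * p(single) / p(medRisk)
# p(medRisk|single) = p(medRisk ∩ single) / p(single)
# Therefore p(single|medRisk) = p(medRisk ∩ single) / p(medRisk)
numberMedRiskSingle = 2
numberMedRisk = 3
probMedRiskSingle = numberMedRiskSingle / totalRecords
probMedRisk = numberMedRisk / totalRecords
conditionalProbability = probMedRiskSingle / probMedRisk
print("Conditional probability of single given medRisk:", conditionalProbability)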
OUTPUT:
The K Means++ algorithm is a smart technique for centroid initialization that picks one centroid and then
ensures that the others are far away from those already chosen, which results in faster convergence.
Step-2: Compute the distance of all points in the dataset from the selected centroid. The distance di of
point xi from the nearest centroid chosen so far is used in the next step.
Step-3: Choose the point xi as the new centroid with probability proportional to di (the standard
K Means++ procedure uses the squared distance di²).
Step-4: Repeat the above last two steps till you find k centroids.
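The seeding steps can be sketched as follows. This is an illustrative version of the standard K Means++ procedure, which picks the first centroid uniformly at random and weights each later candidate by its squared distance to the nearest centroid already chosen. The function name `kmeans_pp_init`, the `seed` parameter, and the sample points are assumptions made for this sketch.

```python
import math
import random

def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centroids with K Means++ style seeding."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]  # first centroid: uniform at random
    while len(centroids) < k:
        # Squared distance from each point to its nearest chosen centroid
        d2 = [min(math.dist(p, c) ** 2 for c in centroids) for p in points]
        # Sample the next centroid with probability proportional to d2;
        # points already chosen have weight 0 and cannot be picked again
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids

# Hypothetical points, for illustration only.
data = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
        (0.9, 0.6), (1.0, 0.7), (0.95, 0.65),
        (0.5, 1.5), (0.6, 1.6), (0.55, 1.55)]

initial_centroids = kmeans_pp_init(data, k=3)
print(initial_centroids)
```

Because far-away points receive larger weights, the initial centroids tend to be spread across different groups, which is what reduces sensitivity to initialization.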
where,
“K*N*M”
For prediction, the distance from each record to each cluster centroid must be computed, and the record
is assigned to the nearest cluster.
3. Is it possible that the assignment of data points to clusters does not change between successive
iterations in the K means Algorithm?
When the K-Means algorithm has reached a local or global minimum, the assignment of
data points to clusters does not change between two successive iterations of the algorithm run.
1. Elbow method
steps:
step4: The location of bend in the plot is generally considered an indicator of the approximate number of
clusters.
6. Why do you prefer Euclidean distance over Manhattan distance in the K means Algorithm?
K means uses Euclidean distance because its centroid update, the mean of the points in a cluster, is
exactly the point that minimizes the sum of squared Euclidean distances within that cluster.
Manhattan distance can also be computed in any number of dimensions, but the point that minimizes the
sum of Manhattan distances is the coordinate-wise median, not the mean. Using Manhattan distance therefore
leads to a different algorithm (k-medians) rather than to K means.
PROGRAM 6:
AIM:
DESCRIPTION:
What Is Regression?
Regression analysis is one of the most important fields in statistics and machine
learning. There are many regression methods available. Linear regression is one of
them. Regression searches for relationships among variables. For example, you can observe
several employees of some company and try to understand how their salaries depend on
features such as experience, level of education, role, city they work in, and so on. This is a
regression problem where data related to each employee represent one observation. The
presumption is that the experience, education, role, and city are the independent features,
while the salary depends on them.
Generally, in regression analysis, you consider some
phenomenon of interest and have a number of observations. Each observation has two or
more features. Following the assumption
that (at least) one of the features depends on the others, you try to establish a relation among them. You
need to find a function that maps some features or variables to others sufficiently well. The
dependent features are called the dependent variables, outputs, or responses. The
independent features are called the independent variables, inputs, or predictors.
Linear Regression:
Linear regression is probably one of the most important and widely used regression
techniques. It’s among the simplest regression methods. One of its main advantages is the
ease of interpreting results. When implementing linear regression of some dependent variable
y on the set of independent variables x = (x1, …, xr), where r is the number of predictors, you
assume a linear relationship between y and x: y = β0 + β1x1 + … + βrxr + ε. This equation is
the regression equation. β0, β1, …, βr are the regression coefficients, and ε is the random error.
Linear regression calculates the estimators of the regression coefficients or simply the predicted
weights, denoted with b0, b1, …, br. They define the estimated regression function f(x) = b0 + b1x1 +
… + brxr. This function should capture the dependencies between the inputs and output sufficiently
well.
When implementing simple linear regression, you typically start with a given set of input-output
(x-y) pairs (green circles). These pairs are your observations. For example, the leftmost
observation (green circle) has the input x= 5 and the actual output (response) y= 5. The next one
has x= 15 and y= 20, and so on.
The estimated regression function (black line) has the equation f(x) = b0 + b1x. Your goal is to calculate the
optimal values of the predicted weights b0 and b1 that minimize the sum of squared residuals (SSR) and determine
the estimated regression function. The value of b0, also called the intercept, shows the point where the
estimated regression line crosses the y axis. It is the value of the estimated response f(x) for x = 0. The value of b1
determines the slope of the estimated regression line.
The predicted responses (red squares) are the points on the regression line that correspond to the
input values. For example, for the input x= 5, the predicted response is f (5) = 8.33 (represented
with the leftmost red square).
The residuals can be calculated as yi − f(xi) = yi − b0 − b1xi for i = 1, …, n. They are the distances between the green circles and red squares. When you implement
linear regression, you are actually trying to minimize these distances and make the red squares as
close to the predefined green circles as possible.
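To make the least-squares idea concrete, the sketch below fits f(x) = b0 + b1x with the closed-form estimates b1 = SS_xy / SS_xx and b0 = mean(y) − b1·mean(x), then computes the residuals. Only the first two observations, (5, 5) and (15, 20), are taken from the text; the other four are assumed for illustration. They were chosen so that the fit is consistent with the predicted response f(5) = 8.33 quoted above.

```python
# The first two observations (5, 5) and (15, 20) come from the text;
# the remaining four are assumed for illustration.
x = [5, 15, 25, 35, 45, 55]
y = [5, 20, 14, 32, 22, 38]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares estimates for f(x) = b0 + b1*x
ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_xx = sum((xi - mean_x) ** 2 for xi in x)
b1 = ss_xy / ss_xx
b0 = mean_y - b1 * mean_x

def f(value):
    """Estimated regression function."""
    return b0 + b1 * value

# Residuals: vertical distances between observed and predicted responses
residuals = [yi - f(xi) for xi, yi in zip(x, y)]
ssr = sum(r ** 2 for r in residuals)

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"f(5) = {f(5):.2f}")
print(f"SSR = {ssr:.3f}")
```

Any other choice of b0 and b1 would give a larger SSR on this data, which is the sense in which the fitted line is the best one.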
Implementing Linear Regression in Python
It’s time to start implementing linear regression in Python. Basically, all you should do is apply
the proper packages and their functions and classes.
The package NumPy is a fundamental Python scientific package that allows many
high-performance operations on single- and multi-dimensional arrays. It also offers many mathematical
routines. Of course, it’s open source.
The package scikit-learn is a widely used Python library for machine learning, built on top of
NumPy and some other packages. It provides the means for preprocessing data, reducing
dimensionality, implementing regression, classification, clustering, and more. Like NumPy,
scikit-learn is also open source.
If you want to implement linear regression and need the functionality beyond the scope of scikit-
learn, you should consider statsmodels. It’s a powerful Python package for the estimation of
statistical models, performing tests, and more. It’s open source as well.
Let’s start with the simplest case, which is simple linear regression. There
SOURCE CODE:
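import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)
    # mean of the x and y vectors
    m_x = np.mean(x)
    m_y = np.mean(y)
    # cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x
    # regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plot the actual points as a scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)
    # predicted response vector
    y_pred = b[0] + b[1]*x
    # plot the regression line
    plt.plot(x, y_pred, color="g")
    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()

def main():
    # observations
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    # assumed illustrative responses: the y values are missing from this listing,
    # and the printed coefficients depend on them
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {} \nb_1 = {}".format(b[0], b[1]))
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()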
OUTPUT:
Estimated coefficients:
b_0 = -0.05862068965
It’s a classification algorithm that is used where the target variable is of categorical nature. The main
objective behind Logistic Regression is to determine the relationship between features and the probability
of a particular outcome.
For Example, when we need to predict whether a student passes or fails in an exam given the number of
hours spent studying as a feature, the target variable comprises two values i.e. pass and fail.
Therefore, we can use Logistic Regression, a supervised machine learning technique, to solve
classification problems.
1. Binary Logistic Regression: In this, the target variable has only two possible outcomes.
2. Multinomial Logistic Regression: In this, the target variable can have three or more possible values
without any order.
3. Ordinal Logistic Regression: In this, the target variable can have three or more values with ordering.
The inputs given to a Logistic Regression model need to be numeric. The algorithm cannot handle
categorical variables directly. So, we need to convert the categorical data into a numerical format that is
suitable for the algorithm to process.
Each level of the categorical variable is represented by its own 0/1 indicator column, known as a dummy
variable. These dummy variables are handled by the Logistic Regression model in the same manner as
any other numeric value.
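As a minimal sketch of this conversion, the function below turns a categorical column into dummy variables. The function name and the colour values are hypothetical and used only for illustration.

```python
def one_hot_encode(values):
    """Map each categorical value to a list of 0/1 dummy variables, one per level."""
    levels = sorted(set(values))
    rows = [[1 if value == level else 0 for level in levels] for value in values]
    return levels, rows

# Hypothetical categorical feature with three levels.
levels, rows = one_hot_encode(["red", "green", "blue", "green"])
print(levels)  # ['blue', 'green', 'red']
for row in rows:
    print(row)
```

Each row now contains exactly one 1, in the column of the level it belongs to, so the model only ever sees numeric inputs.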
1. It assumes that there is minimal or no multicollinearity among the independent variables, i.e.,
the predictors are not correlated with each other.
2. There should be a linear relationship between the logit of the outcome and each predictor variable. The
logit function is defined as logit(p) = log(p/(1-p)), where p is the probability of the target outcome.
4. Binary Logistic Regression assumes that the target variable is binary, and ordinal Logistic
Regression requires the target variable to be ordered.
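The logit function mentioned above can be tried on a few probabilities. This short sketch also defines its inverse (the logistic function) to show that the two undo each other; the helper names are illustrative.

```python
import math

def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inverse_logit(z):
    """Logistic function: maps a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

for p in (0.1, 0.5, 0.9):
    print(p, round(logit(p), 4), round(inverse_logit(logit(p)), 4))
```

A probability of 0.5 has log-odds 0, and probabilities above or below 0.5 give positive or negative log-odds of equal size for complementary values.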
5. Can we solve the multiclass classification problems using Logistic Regression? If Yes then How?
Yes, in order to deal with multiclass classification using Logistic Regression, the best-known method is
the one-vs-all approach. In this approach, the number of models trained is equal to the number of
classes. Each model separates one class from all the others.
For example, the first model classifies the data point depending on whether it belongs to class 1 or some
other class (not class 1); the second model classifies the data point into class 2 or some other class (not
class 2), and so on for all other classes. In this manner, each data point is checked against all the classes.
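The one-vs-all idea can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the twelve 2-D points and their three class labels are hypothetical, and each binary model is fitted with simple batch gradient descent.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_binary(points, targets, lr=0.1, epochs=3000):
    """Batch gradient descent for one binary logistic model; returns (weights, bias)."""
    dim = len(points[0])
    w, b = [0.0] * dim, 0.0
    n = len(points)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, t in zip(points, targets):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            for j in range(dim):
                grad_w[j] += err * x[j] / n
            grad_b += err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def fit_one_vs_all(points, labels):
    """Train one 'class k vs. not class k' model per distinct class."""
    models = {}
    for k in sorted(set(labels)):
        binary_targets = [1 if label == k else 0 for label in labels]
        models[k] = fit_binary(points, binary_targets)
    return models

def predict(models, x):
    """Score x with every model and return the class whose model is most confident."""
    def score(k):
        w, b = models[k]
        return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return max(models, key=score)

# Hypothetical data: three well-separated clusters labelled 1, 2 and 3.
points = [(0, 0), (1, 0), (0, 1), (1, 1),
          (5, 0), (6, 1), (5, 1), (6, 0),
          (0, 5), (1, 6), (0, 6), (1, 5)]
labels = [1] * 4 + [2] * 4 + [3] * 4
models = fit_one_vs_all(points, labels)
print(predict(models, (0.5, 0.5)), predict(models, (5.5, 0.5)), predict(models, (0.5, 5.5)))
```

The number of trained models equals the number of classes, and the final prediction is the class whose own model gives the highest probability.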
PROGRAM 7
AIM:
DESCRIPTION:
The challenge of text classification is to attach labels to bodies of text, e.g., tax document, medical form,
etc. based on the text itself. For example, think of your spam folder in your email. How does your email
provider know that a particular message is spam or “ham” (not spam)? We’ll take a look at one natural
language processing technique for text classification called Naive Bayes.
SOURCE CODE:
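import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, recall_score, precision_score, confusion_matrix

# The loading lines are missing from this listing; the column names and the
# pos/neg mapping below are assumed
msg = pd.read_csv('document.csv', names=['message', 'label'])
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum
# Hold out part of the data for testing
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)

# Convert the text into a document-term matrix of word counts
count_v = CountVectorizer()
Xtrain_dm = count_v.fit_transform(Xtrain)
Xtest_dm = count_v.transform(Xtest)
df = pd.DataFrame(Xtrain_dm.toarray(), columns=count_v.get_feature_names_out())

# Train the Naive Bayes classifier and predict on the test set
clf = MultinomialNB()
clf.fit(Xtrain_dm, ytrain)
pred = clf.predict(Xtest_dm)
print('Accuracy Metrics:')
print('Accuracy: ', accuracy_score(ytest, pred))
print('Recall: ', recall_score(ytest, pred))
print('Precision: ', precision_score(ytest, pred))
print('Confusion Matrix:\n', confusion_matrix(ytest, pred))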
document.csv:
This is an amazing place,pos
He is my sworn enemy,neg
My boss is horrible,neg
I love to dance,pos
OUTPUT:
Accuracy Metrics:
Accuracy: 0.6
Recall: 0.6666666666666666
Precision: 0.6666666666666666
Confusion Matrix:
[[1 1]
[1 2]]
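The three metrics printed above can be recomputed from the confusion matrix. The sketch below assumes scikit-learn's layout (rows are true labels, columns are predicted labels) and assumes that label 0 is neg and label 1 is pos.

```python
# Confusion matrix from the output above: [[TN, FP], [FN, TP]] under the stated assumptions.
confusion = [[1, 1],
             [1, 2]]
tn, fp = confusion[0]
fn, tp = confusion[1]

accuracy = (tp + tn) / (tp + tn + fp + fn)   # correct predictions / all predictions
recall = tp / (tp + fn)                      # positives found / actual positives
precision = tp / (tp + fp)                   # positives found / predicted positives
print(accuracy, recall, precision)
```

The results agree with the reported accuracy of 0.6 and with recall and precision of 2/3 each.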
Let’s understand it using an example. Below I have a training data set of weather and corresponding target
variable ‘Play’ (suggesting possibilities of playing). Now, we need to classify whether players will play or
not based on weather condition. Let’s follow the below steps to perform it.
Step 2: Create a likelihood table by finding the probabilities, e.g. the probability of Overcast is 0.29 and
the probability of playing is 0.64.
Step 3: Now, use the Naive Bayes equation to calculate the posterior probability for each class. The class
with the highest posterior probability is the outcome of the prediction.
Here we have P(Sunny | Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36, P(Yes) = 9/14 = 0.64
Now, P(Yes | Sunny) = 0.33 * 0.64 / 0.36 = 0.60, which is higher than P(No | Sunny) = 0.40, so the
prediction is that players will play.
Naive Bayes uses a similar method to predict the probability of different classes based on various attributes.
This algorithm is mostly used in text classification and in problems with multiple classes.
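The posterior probability in the weather example can be reproduced with exact fractions. The counts below are the ones quoted in the text (14 days, 9 with Play = Yes, 5 Sunny days, 3 Sunny days among the Yes days); the variable names are illustrative.

```python
from fractions import Fraction

p_sunny_given_yes = Fraction(3, 9)
p_yes = Fraction(9, 14)
p_sunny = Fraction(5, 14)

# Bayes' theorem: P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
p_no_given_sunny = 1 - p_yes_given_sunny
print(float(p_yes_given_sunny), float(p_no_given_sunny))
```

Working with fractions avoids the rounding in 0.33 * 0.64 / 0.36 and gives exactly 3/5 for Yes against 2/5 for No.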
Real time Prediction: Naive Bayes is an eager learning classifier and it is fast. Thus, it can
be used for making predictions in real time.
Multi class Prediction: This algorithm is also well known for its multi class prediction feature. Here
we can predict the probability of multiple classes of the target variable.
Text classification/ Spam Filtering/ Sentiment Analysis: Naive Bayes classifiers are mostly used in
text classification, where they often have a higher success rate than other algorithms (due to good results
in multi class problems and the independence rule). As a result, they are widely used in spam filtering
(identifying spam e-mail) and sentiment analysis (in social media analysis, to identify positive and
negative customer sentiments).
Recommendation System: A Naive Bayes classifier and Collaborative Filtering together build a
Recommendation System that uses machine learning and data mining techniques to filter unseen
information and predict whether a user would like a given resource or not.
PROGRAM 8
AIM:
DESCRIPTION:
Genetic Algorithms (GAs) are adaptive heuristic search algorithms that belong to the
larger class of evolutionary algorithms. Genetic algorithms are based on the ideas of natural
selection and genetics. They make intelligent use of random search, guided by historical data,
to direct the search into regions of better performance in the solution space. They
are commonly used to generate high-quality solutions for optimization problems and search
problems.
Genetic algorithms simulate the process of natural selection, which means that those
species that can adapt to changes in their environment are able to survive, reproduce, and go
on to the next generation. In simple words, they simulate “survival of the fittest” among the
individuals of consecutive generations for solving a problem. Each generation consists of a
population of individuals, and each individual represents a point in the search space and a
possible solution. Each individual is represented as a string of characters/integers/floats/bits.
This string is analogous to the chromosome.
Genetic algorithms are based on an analogy with the genetic structure and behavior of the
chromosomes of a population. Following is the foundation of GAs based on this analogy –
Once the initial generation is created, the algorithm evolves the generation using the following
operators –
1) Selection Operator: The idea is to give preference to the individuals with good fitness
scores and allow them to pass their genes to successive generations.
2) Crossover Operator: This represents mating between individuals. Two individuals are
selected using the selection operator and crossover sites are chosen randomly. Then the genes at
these crossover sites are exchanged, thus creating a completely new individual (offspring). For
example –
3) Mutation Operator: The key idea is to insert random genes in the offspring to maintain
diversity in the population and avoid premature convergence. For example –
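The crossover and mutation operators can be illustrated on bit-string chromosomes. This is a minimal sketch with hypothetical parents; single-point crossover and independent bit flips are one common choice of operators, assumed here for illustration.

```python
import random

def single_point_crossover(parent_a, parent_b, point):
    """Child takes genes [0, point) from parent_a and the rest from parent_b."""
    return parent_a[:point] + parent_b[point:]

def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - gene if rng.random() < rate else gene for gene in chromosome]

# Hypothetical parents chosen so the origin of each gene is easy to see.
parent_a = [1, 1, 1, 1, 1, 1]
parent_b = [0, 0, 0, 0, 0, 0]
child = single_point_crossover(parent_a, parent_b, point=2)
print(child)  # [1, 1, 0, 0, 0, 0]

rng = random.Random(0)
mutant = mutate(child, rate=0.2, rng=rng)
print(mutant)
```

The child carries the first two genes of one parent and the remaining four of the other; mutation then changes a few genes at random so that new gene values can enter the population.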
ALGORITHM:
1) Randomly initialize population p
2) Determine fitness of population
3) Until convergence repeat:
a) Select parents from population
b) Crossover and generate new population
c) Perform mutation on new population
d) Calculate fitness for new population
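The steps above can be sketched as a short loop. Several things here are assumptions made for illustration: the toy objective (counting 1 bits), the fixed number of generations used in place of a convergence test, keeping the fitter half of the population as parents, and all parameter values.

```python
import random

def fitness(chromosome):
    """Toy objective: the number of 1 bits in the chromosome."""
    return sum(chromosome)

def run_ga(n_genes=20, pop_size=30, generations=60, mutation_rate=0.02, seed=1):
    rng = random.Random(seed)
    # 1) Randomly initialise the population.
    population = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    # 3) Repeat (a fixed number of generations stands in for a convergence test).
    for _ in range(generations):
        # 2) and d) Determine fitness by ranking the population.
        ranked = sorted(population, key=fitness, reverse=True)
        # a) Select the fitter half as parents.
        parents = ranked[: pop_size // 2]
        # b) Crossover: build children from random pairs of parents.
        children = []
        while len(children) < pop_size - len(parents):
            mum, dad = rng.sample(parents, 2)
            point = rng.randint(1, n_genes - 1)
            children.append(mum[:point] + dad[point:])
        # c) Mutation: flip each bit of each child with a small probability.
        children = [[1 - g if rng.random() < mutation_rate else g for g in child]
                    for child in children]
        population = parents + children
        best = max(population + [best], key=fitness)
    return best

best = run_ga()
print(fitness(best), best)
```

Because the parents are carried over unchanged, the best fitness in the population never decreases from one generation to the next.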
USES OF GENETIC ALGORITHM:
Unlike traditional AI, they do not break on a slight change in input or in the presence of noise.
SOURCE CODE:
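import numpy

def cal_pop_fitness(equation_inputs, pop):
    # The fitness function calculates the sum of products between each input and its
    # corresponding weight.
    fitness = numpy.sum(pop*equation_inputs, axis=1)
    return fitness

def select_mating_pool(pop, fitness, num_parents):
    # Selecting the best individuals in the current generation as parents for producing the
    # offspring of the next generation.
    parents = numpy.empty((num_parents, pop.shape[1]))
    for parent_num in range(num_parents):
        max_fitness_idx = numpy.where(fitness == numpy.max(fitness))
        max_fitness_idx = max_fitness_idx[0][0]
        parents[parent_num, :] = pop[max_fitness_idx, :]
        # Mark this individual as used so that it is not selected again.
        fitness[max_fitness_idx] = -99999999999
    return parents

def crossover(parents, offspring_size):
    offspring = numpy.empty(offspring_size)
    # The point at which crossover takes place between two parents. Usually, it is at the center.
    crossover_point = numpy.uint8(offspring_size[1]/2)
    for k in range(offspring_size[0]):
        # Index of the first parent to mate.
        parent1_idx = k%parents.shape[0]
        # Index of the second parent to mate.
        parent2_idx = (k+1)%parents.shape[0]
        # The new offspring will have its first half of its genes taken from the first parent.
        offspring[k, 0:crossover_point] = parents[parent1_idx, 0:crossover_point]
        # The new offspring will have its second half of its genes taken from the second parent.
        offspring[k, crossover_point:] = parents[parent2_idx, crossover_point:]
    return offspring

def mutation(offspring_crossover, num_mutations=1):
    mutations_counter = numpy.uint8(offspring_crossover.shape[1] / num_mutations)
    # Mutation changes num_mutations genes in each offspring by a random amount.
    for idx in range(offspring_crossover.shape[0]):
        gene_idx = mutations_counter - 1
        for mutation_num in range(num_mutations):
            # Random value added to the gene (the range is cut off in this listing; [-1, 1] is assumed).
            random_value = numpy.random.uniform(-1.0, 1.0, 1)
            offspring_crossover[idx, gene_idx] = offspring_crossover[idx, gene_idx] + random_value[0]
            gene_idx = gene_idx + mutations_counter
    return offspring_crossover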
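import numpy

"""
The target is to maximize this equation ASAP:
    y = w1x1+w2x2+w3x3+w4x4+w5x5+w6x6
    where (x1,x2,x3,x4,x5,x6)=(4,-2,3.5,5,-11,-4.7)
We are going to use the genetic algorithm for the best possible values after a
number of generations.
"""
# Inputs of the equation.
equation_inputs = [4,-2,3.5,5,-11,-4.7]
# Number of the weights we are looking to optimize.
num_weights = 6

"""
Genetic algorithm parameters:
    Mating pool size
    Population size
"""
sol_per_pop = 8
num_parents_mating = 4

# Defining the population size.
pop_size = (sol_per_pop, num_weights)
# Creating the initial population (the sampling bounds are missing from this listing; -4.0 and 4.0 are assumed).
new_population = numpy.random.uniform(low=-4.0, high=4.0, size=pop_size)
print(new_population)

"""
new_population[5, :] = [-2, 3, -7, 6, 3, 3]
"""

best_outputs = []
num_generations = 1000
for generation in range(num_generations):
    print("Generation : ", generation)
    # Measuring the fitness of each chromosome in the population.
    fitness = cal_pop_fitness(equation_inputs, new_population)
    print("Fitness")
    print(fitness)

    best_outputs.append(numpy.max(numpy.sum(new_population*equation_inputs, axis=1)))
    # The best result in the current iteration.
    print("Best result : ", numpy.max(numpy.sum(new_population*equation_inputs, axis=1)))

    # Selecting the best parents in the population for mating.
    parents = select_mating_pool(new_population, fitness, num_parents_mating)
    print("Parents")
    print(parents)

    # Generating the next generation using crossover.
    offspring_crossover = crossover(parents,
                                    offspring_size=(pop_size[0]-parents.shape[0], num_weights))
    print("Crossover")
    print(offspring_crossover)

    # Adding some variations to the offspring using mutation.
    offspring_mutation = mutation(offspring_crossover, num_mutations=2)
    print("Mutation")
    print(offspring_mutation)

    # Creating the new population based on the parents and offspring.
    new_population[0:parents.shape[0], :] = parents
    new_population[parents.shape[0]:, :] = offspring_mutation

# Getting the best solution after finishing all generations.
# At first, the fitness is calculated for each solution in the final generation.
fitness = cal_pop_fitness(equation_inputs, new_population)
# Then return the index of the solution corresponding to the best fitness.
best_match_idx = numpy.where(fitness == numpy.max(fitness))
print("Best solution : ", new_population[best_match_idx, :])
print("Best solution fitness : ", fitness[best_match_idx])

import matplotlib.pyplot
matplotlib.pyplot.plot(best_outputs)
matplotlib.pyplot.xlabel("Iteration")
matplotlib.pyplot.ylabel("Fitness")
matplotlib.pyplot.show()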
OUTPUT:
Generation :
Fitness
45.41084308]
Best result : 51.302143629097614
Parents
Crossover
Mutation
Generation : 999
Fitness
Best result : 2554.3935561987346
Parents
Crossover
-1.93705571e+00 -3.36865291e+02]
-1.93705571e+00 -3.36672197e+02]
-1.93705571e+00 -3.37108802e+02]]
Mutation
-1.93705571e+00 -3.36222272e+02]
-1.93705571e+00 -3.37417363e+02]
-1.93705571e+00 -3.36866918e+02]
-1.93705571e+00 -3.37331663e+02]]
-1.93705571e+00 -3.37417363e+02]]]
Let’s get back to the example we discussed above and summarize what we did.
PROGRAM 9
9. IMPLEMENT THE FINITE WORDS CLASSIFICATION SYSTEM USING BACK-PROPAGATION ALGORITHM
AIM:
DESCRIPTION:
What is backpropagation?
We can define the backpropagation algorithm as an algorithm that trains a given feed-
forward Neural Network on input patterns whose classifications are known to us.
As each entry of the sample set is presented to the network, the network compares its
output response with the expected output for that input pattern, and the difference is
measured as an error value. We then adjust the connection weights based on this error value.
The technique was first introduced in the 1960s and was later popularized by David
Rumelhart, Geoffrey Hinton, and Ronald Williams in their famous 1986 paper. In this paper,
they discussed various neural networks. Today, back propagation is the standard way to train
neural networks. With this approach, we fine-tune the weights of a neural net based on the
error rate obtained in the previous run. Applied correctly, this technique reduces error rates
and makes the model more reliable.
Backpropagation trains a neural network by applying the chain rule. In simple
terms, after each feed-forward pass through the network, the algorithm performs a backward
pass that adjusts the model’s parameters (its weights and biases). A typical supervised
learning algorithm attempts to find a function that maps input data to the right output. Back
propagation works with a multi-layered neural network and learns internal representations of
the input-to-output mapping.
The Back propagation algorithm is a supervised learning method for multilayer feed-
forward networks from the field of Artificial Neural Networks.
Feed-forward neural networks are inspired by the information processing of one or more neural
cells, called neurons. A neuron accepts input signals via its dendrites, which pass the electrical
signal down to the cell body. The axon carries the signal out to synapses, which are the
connections of a cell’s axon to other cells’ dendrites.
Technically, the backpropagation algorithm is a method for training the weights in a multilayer
feed-forward neural network. As such, it requires a network structure to be defined of one or
more layers where one layer is fully connected to the next layer. A standard network structure
is one input layer, one hidden layer, and one output layer. Back propagation can be used for
both classification and regression problems.
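A forward pass followed by a backward pass can be sketched for a network with one hidden layer. This is an illustration under stated assumptions rather than the program's implementation: sigmoid units, a squared-error loss, full-batch gradient descent, the XOR data set, and all parameter values are choices made for this example.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output

def train(data, n_hidden=3, lr=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    n_in, n = len(data[0][0]), len(data)
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b_hidden = [0.0] * n_hidden
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b_out = 0.0
    losses = []
    for _ in range(epochs):
        g_wh = [[0.0] * n_in for _ in range(n_hidden)]
        g_bh, g_wo, g_bo, loss = [0.0] * n_hidden, [0.0] * n_hidden, 0.0, 0.0
        for x, target in data:
            hidden, out = forward(x, w_hidden, b_hidden, w_out, b_out)
            loss += 0.5 * (out - target) ** 2
            # Backward pass: chain rule from the output error back to each weight.
            delta_out = (out - target) * out * (1.0 - out)
            g_bo += delta_out
            for j in range(n_hidden):
                g_wo[j] += delta_out * hidden[j]
                delta_h = delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                g_bh[j] += delta_h
                for i in range(n_in):
                    g_wh[j][i] += delta_h * x[i]
        losses.append(loss / n)
        # Gradient-descent update using the averaged gradients.
        b_out -= lr * g_bo / n
        for j in range(n_hidden):
            w_out[j] -= lr * g_wo[j] / n
            b_hidden[j] -= lr * g_bh[j] / n
            for i in range(n_in):
                w_hidden[j][i] -= lr * g_wh[j][i] / n
    return losses

xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
losses = train(xor_data)
print(losses[0], losses[-1])
```

The recorded loss shows the error shrinking as the weights are adjusted; how far it falls depends on the initial weights, the learning rate, and the number of epochs.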
In classification problems, best results are achieved when the network has one neuron in the
output layer for each class value. For example, a 2-class or binary classification problem with
the class values of A and B. These expected outputs would have to be transformed into binary
vectors with one column for each class value. Such as [1, 0] and [0, 1] for A and B respectively.
This is called a one hot encoding.
Let us take a look at how backpropagation works. The example network has four layers: an input
layer, a hidden layer, a second hidden layer (hidden layer II), and a final output layer. These
fall into three main types of layer:
1. Input layer
2. Hidden layer
3. Output layer
Each layer has its own role in producing the desired result. Let us discuss the other
details needed to summarize this algorithm.
SOURCE CODE:
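import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, recall_score, precision_score, confusion_matrix

# The column names and the pos/neg mapping are cut off in this listing and are assumed here
msg = pd.read_csv('document.csv', names=['message', 'label'])
print('Total Instances of Dataset: ', msg.shape[0])
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)

count_v = CountVectorizer()
Xtrain_dm = count_v.fit_transform(Xtrain)
Xtest_dm = count_v.transform(Xtest)
df = pd.DataFrame(Xtrain_dm.toarray(), columns=count_v.get_feature_names_out())

# The classifier settings are missing from this listing; these hyperparameters are assumed
clf = MLPClassifier(solver='lbfgs', hidden_layer_sizes=(5, 2), random_state=1)
clf.fit(Xtrain_dm, ytrain)
pred = clf.predict(Xtest_dm)
print('Accuracy Metrics:')
print('Accuracy: ', accuracy_score(ytest, pred))
print('Recall: ', recall_score(ytest, pred))
print('Precision: ', precision_score(ytest, pred))
print('Confusion Matrix:\n', confusion_matrix(ytest, pred))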
document.csv:
I love this sandwich,pos
This is an amazing place,pos
[...] work,pos
[...] restaurant,neg
I am [...]
[...] this,neg
He is my sworn enemy,neg
My boss is horrible,neg
[...] dance,pos
[...] holiday,pos
[...] tomorrow,pos
I went to my [...]
OUTPUT:
Total Instances of Dataset: 18
Accuracy Metrics:
Accuracy: 0.8
Recall: 1.0
Precision: 0.75
Confusion Matrix:
[[1 1]
[0 3]]
A Multi-Layer Perceptron (MLP) is one of the most basic neural networks that we use for
classification. For a binary classification problem, we know that the output can be either 0 or 1.
This is just like our simple logistic regression, where we use a logistic (sigmoid) function to
generate a probability between 0 and 1.
Simply put, it is just the difference in the threshold function! When we restrict the logistic
regression model to give us either exactly 1 or exactly 0, we get a Perceptron model.
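The difference between the two output functions can be seen by applying both to the same weighted sum z. This is a small sketch; the function names and the threshold at z = 0 are assumptions made for illustration.

```python
import math

def logistic_unit(z):
    """Logistic regression output: a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def perceptron_unit(z):
    """Perceptron output: a hard 0/1 decision from a step threshold at z = 0."""
    return 1 if z >= 0 else 0

for z in (-2.0, 0.0, 2.0):
    print(z, round(logistic_unit(z), 3), perceptron_unit(z))
```

Thresholding the logistic output at 0.5 gives the same decisions as the step function, which is why the two models differ only in the function applied to z.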
2. Can we have the same bias for all neurons of a hidden layer?
Essentially, you can have a different bias value at each layer or at each neuron as well.
However, it is best if we have a bias matrix for all the neurons in the hidden layers as well.
A point to note is that both these strategies would give you very different results.
The main aim of this question is to understand why we need activation functions in a neural
network. You can start off by giving a simple explanation of how neural networks are built:
Step 1: Calculate the sum of all the inputs (X) according to their weights and include the bias
term:
Z = (weights * X) + bias
Step 2: Apply an activation function to this sum to get the output:
Y = Activation(Z)
Steps 1 and 2 are performed at each layer. If you recollect, this is nothing but forward
propagation! Now, what if there is no activation function?
Y = Z = (weights * X) + bias
Wait – isn’t this just a simple linear equation? Yes – and that is why we need activation
functions. A linear equation will not be able to capture the complex patterns in the data – this is
even more evident in the case of deep learning problems.
In order to capture non-linear relationships, we use activation functions, and that is why a
neural network without an activation function is just a linear regression model.
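The collapse of stacked linear layers into a single linear equation can be checked numerically. This is a one-input sketch with hypothetical weights and biases; ReLU is used as an example activation function.

```python
def linear_layer(x, weight, bias):
    return weight * x + bias

def relu(z):
    return max(0.0, z)

w1, b1 = 2.0, -1.0   # hypothetical layer-1 parameters
w2, b2 = -3.0, 0.5   # hypothetical layer-2 parameters

def two_layers_no_activation(x):
    return linear_layer(linear_layer(x, w1, b1), w2, b2)

def single_equivalent_layer(x):
    # w2*(w1*x + b1) + b2 = (w2*w1)*x + (w2*b1 + b2): still one linear equation.
    return linear_layer(x, w2 * w1, w2 * b1 + b2)

def two_layers_with_relu(x):
    return linear_layer(relu(linear_layer(x, w1, b1)), w2, b2)

for x in (-2.0, 0.0, 1.0, 3.0):
    print(x, two_layers_no_activation(x), single_equivalent_layer(x), two_layers_with_relu(x))
```

Without an activation the two layers always match the single equivalent layer, while the ReLU version stays flat for small inputs and then bends, which no single linear equation can do.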
4. In a neural network, what if all the weights are initialized with the same value?
In simplest terms, if all the neurons have the same weights, each hidden unit computes
exactly the same output. While this might work during forward propagation, during backward
propagation the derivative of the cost function is the same for every hidden unit, so the units
receive identical updates and remain copies of one another.
In short, the network cannot learn distinct features! What do you call the phenomenon of
the model being unable to learn the patterns in the data? Yes, underfitting.
Therefore, if all weights have the same initial value, this would lead to underfitting.
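The effect of identical initial weights can be seen by computing the gradients of two hidden units for one training example. This sketch assumes sigmoid units, a squared-error loss, and no biases; the input, target, and weight values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden_gradients(x, target, w_hidden, w_out):
    """Gradient of the squared error w.r.t. each hidden unit's input weights."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
    out = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    delta_out = (out - target) * out * (1.0 - out)
    grads = []
    for j, h in enumerate(hidden):
        delta_h = delta_out * w_out[j] * h * (1.0 - h)
        grads.append([delta_h * xi for xi in x])
    return grads

x, target = [1.0, 2.0], 1.0
# Every weight starts at 0.5: both hidden units get exactly the same gradient.
same = hidden_gradients(x, target, [[0.5, 0.5], [0.5, 0.5]], [0.5, 0.5])
# Different starting weights: the gradients differ, so the units can specialise.
different = hidden_gradients(x, target, [[0.5, 0.5], [-0.3, 0.8]], [0.5, -0.2])
print(same)
print(different)
```

With equal weights the two gradient rows are identical, so after any number of updates the hidden units are still copies of each other; random initialization breaks this symmetry.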
This is a question best explained with a real-life example. Consider that you want to go out
today to play a cricket match with your friends. Now, a number of factors can affect your
decision-making, like:
And so on. These factors can change your decision greatly or not too much. For example, if it is
raining outside, then you cannot go out to play at all. Or if you have only one bat, you can share
it while playing as well. The magnitude by which these factors can affect the game is called the
weight of that factor.
Factors like the weather or temperature might have a higher weight, and other factors like
equipment would have a lower weight.
However, does this mean that we can play a cricket match with only one bat? No – we would
need 1 ball and 6 wickets as well. This is where bias comes into the picture. Bias lets you
assign some threshold which helps you activate a decision-point (or a neuron) only when that
threshold is crossed.