
Machine learning

Abstract:

Machine learning (ML) is a computerized analytical technique that is being increasingly employed in biomedicine.
programmed strategies in the analysis of multidimensional information by recognizing
relationships in the data that were not previously appreciated. As such, the use of ML in
rheumatology is increasing, and numerous studies have employed ML to classify patients
with rheumatic autoimmune inflammatory diseases (RAIDs) from medical records and
imaging, biometric or gene expression data. However, these studies are limited by sample
size, the accuracy of sample labelling, and absence of datasets for external validation. In
addition, there is potential for ML models to overfit or underfit the data and, thereby, these
models might produce results that cannot be replicated in an unrelated dataset. In this
Review, we introduce the basic principles of ML and discuss its current strengths and
weaknesses in the classification of patients with RAIDs. Moreover, we highlight the
successful analysis of the same type of input data (for example, medical records) with
different algorithms, illustrating the potential plasticity of this analytical approach.
Altogether, a better understanding of ML and the future application of advanced analytical
techniques based on this approach, coupled with the increasing availability of biomedical
data, may facilitate the development of meaningful precision medicine for patients with
RAIDs.

Introduction:

Ever since the technological revolution, we have been generating an immeasurable amount
of data. According to some estimates, we generate around 2.5 quintillion bytes of data every single day,
and it was projected that by 2020, 1.7 MB of data would be created every second for every person on
earth. With so much data available, it is finally possible to build predictive models
that can study and analyze complex data to find useful insights and deliver more accurate
results. Top-tier companies such as Netflix and Amazon build such Machine Learning
models using huge volumes of data in order to identify profitable opportunities and avoid unwanted
risks.

Here’s a list of reasons why Machine Learning is so important:


Increase in Data Generation: Due to the excessive production of data, we need a method
that can structure, analyze and draw useful insights from it. This is where
Machine Learning comes in. It uses data to solve problems and find solutions to the most
complex tasks faced by organizations.

Improve Decision Making: By making use of various algorithms, Machine Learning can be
used to make better business decisions. For example, Machine Learning is used to forecast
sales, predict downfalls in the stock market, identify risks and anomalies, etc.

Uncover patterns & trends in data: Finding hidden patterns and extracting key insights
from data is the most essential part of Machine Learning. By building predictive models and
using statistical techniques, Machine Learning allows you to dig beneath the surface and
explore the data at a minute scale. Understanding data and extracting patterns manually would
take days, whereas Machine Learning algorithms can perform such computations in less than
a second.

Solve complex problems: From detecting the genes linked to the deadly ALS
disease to building self-driving cars, Machine Learning can be used to solve the most
complex problems.

Introduction To Machine Learning:

The term Machine Learning was first coined by Arthur Samuel in 1959. Looking back,
that year marked a significant milestone in the history of computing.

If you browse the internet for 'what is Machine Learning', you will find at least a hundred
different definitions. However, one of the most widely cited formal definitions was given by Tom M.
Mitchell:
“A computer program is said to learn from experience E with respect to some class of tasks T
and performance measure P if its performance at tasks in T, as measured by P, improves with
experience E.”
In simple terms, Machine Learning is a subset of Artificial Intelligence (AI) that gives
machines the ability to learn automatically and improve from experience without being
explicitly programmed to do so. In that sense, it is the practice of getting machines to solve
problems by gaining the ability to think. But wait, can a machine think or make decisions?
Well, if you feed a machine a good amount of data, it will learn how to interpret, process and
analyze this data by using Machine Learning algorithms in order to solve real-world
problems.

Before moving any further, let’s discuss some of the most commonly used terminologies in
Machine Learning.

Machine Learning Definitions

Algorithm:

A Machine Learning algorithm is a set of rules and statistical techniques used to learn
patterns from data and draw significant information from it. It is the logic behind a Machine
Learning model. An example of a Machine Learning algorithm is the Linear Regression
algorithm.

Model:

A model is the main component of Machine Learning. A model is trained by using a
Machine Learning algorithm. The algorithm maps all the decisions that the model is supposed
to take, based on the given input, in order to produce the correct output.

Predictor Variable:

It is a feature (or set of features) of the data that can be used to predict the output.

Response Variable:

It is the feature or the output variable that needs to be predicted by using the predictor
variable(s).
Training Data:

The Machine Learning model is built using the training data. The training data helps
the model to identify key trends and patterns essential to predict the output.

Testing Data:

After the model is trained, it must be tested to evaluate how accurately it can predict an
outcome. This is done by the testing data set.

Data Types:

To analyze data, it is important to know what type of data we are dealing with.

We can split the data types into three main categories:

 Numerical

 Categorical

 Ordinal

Numerical data are numbers, and can be split into two numerical categories:

 Discrete Data:

- numbers that are limited to integers. Example: The number of cars passing
by.

 Continuous Data:

- numbers that can take any value within a range. Example: the price of an item, or the size
of an item.

Categorical data are values that cannot be measured up against each other.
Example: a color value, or any yes/no values.

Ordinal data are like categorical data, but can be measured up against each other. Example:
school grades where A is better than B and so on.


Variables in Python:

A Python variable is a reserved memory location used to store values; in other words, a
variable in a Python program gives data to the computer for processing. Containers that store
values are called variables in Python. Variables are simply designated memory spaces for the
storage of values. This implies that you set aside some memory when you create a variable. The
interpreter allocates memory and determines what can be placed in the reserved memory based
on the data type of the variable. Therefore, you may store integers, decimals, or characters in
these variables by giving them different data types. Unlike many other programming languages,
Python does not require variables to be declared or specified beforehand. A lot of values need
to be managed while developing a program, and we use variables to store them; a variable's
value may be modified while the program is running.
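As a small illustration of the behaviour described above (the variable names here are arbitrary and chosen only for the example), Python variables need no prior declaration and can even be rebound to a value of a different type:

    # No declaration needed: the interpreter allocates memory when a value is assigned.
    count = 10          # an integer
    price = 99.5        # a floating-point number
    label = "machine"   # a string

    print(type(count), type(price), type(label))

    # A variable's value (and even its type) may change while the program runs.
    count = "ten"
    print(type(count))  # now a string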

Overview of Machine Learning:

So, for all of you who do not know what Machine Learning is: Machine
Learning, in the simplest of terms, is teaching your machine about something. You collect
data, clean the data, create algorithms, teach the algorithm essential patterns from the data and then expect
the algorithm to give you a helpful answer. If the algorithm lives up to your expectations, you
have successfully taught it. If not, you scrap everything and start from scratch.
That is how it works here. And if you are looking for a formal definition, Machine
Learning is the process of creating models that can perform a certain task without a human
explicitly programming them to do so.

There are 3 types of Machine Learning which are based on the way the algorithms are
created. They are:

 Supervised Learning – You supervise the learning process, meaning the data that
you have collected here is labelled and so you know what input needs to be mapped to
what output. This helps you correct your algorithm if it makes a mistake in giving you
the answer.
 Unsupervised Learning – The data collected here has no labels and you are unsure
about the outputs. So you model your algorithm such that it can understand patterns
from the data and output the required answer. You do not interfere when the
algorithm learns.
 Reinforcement Learning – There is no data in this kind of learning, nor do you teach
the algorithm anything. You model the algorithm such that it interacts with the
environment and if the algorithm does a good job, you reward it, else you punish the
algorithm. With continuous interactions and learning, it goes from being bad to being
the best that it can for the problem assigned to it.

Now that you have a basic idea of what Machine Learning is and the different types of
Machine Learning, let us delve into the actual topic for discussion here and answer: What is
Supervised Learning? Where is Supervised Learning used? What are the types of Supervised
Learning? Supervised Learning algorithms and much more!

What is Supervised Learning?

Supervised Learning is the process of making an algorithm learn to map an input to
a particular output. This is achieved using the labelled datasets that you have collected. If the
mapping is correct, the algorithm has successfully learned. Else, you make the necessary
changes to the algorithm so that it can learn correctly. Supervised Learning algorithms can
help make predictions for new, unseen data that we obtain later in the future. This is similar to
a teacher-student scenario. There is a teacher who guides the student to learn from books and
other materials. The student is then tested and, if correct, the student passes. Else, the teacher
tunes the student and makes the student learn from the mistakes that he or she made in
the past. That is the basic principle of Supervised Learning.

Example of Supervised Learning:

Suppose you have a niece who has just turned 2 years old and is learning to speak.
She knows the words Papa and Mumma, as her parents have taught her what to call
them. You want to teach her what a dog and a cat are. So what do you do? You either show her
videos of dogs and cats, or you bring a dog and a cat and show them to her in real life so that
she can understand how they are different.

Now there are certain things you tell her so that she understands the differences between the 2
animals.
 Dogs and cats both have 4 legs and a tail.
 Dogs come in small to large sizes. Cats, on the other hand, are always small.
 Dogs have a long mouth while cats have smaller mouths.
 Dogs bark while cats meow.
 Different dogs have different ears while cats have almost the same kind of ears.

Now you take your niece back home and show her pictures of different dogs and cats. If she
is able to differentiate between the dog and cat, you have successfully taught her.

So what happened here? You were there to guide her to the goal of differentiating between a
dog and a cat. You taught her every difference there is between a dog and a cat. You then
tested her to see whether she had learned. If she had, she called the dog a dog and the cat a cat.
If not, you taught her again until she got it right. You acted as the supervisor and your niece
acted as the algorithm that had to learn. You knew which animal was a dog and which was a
cat, making sure that she was learning the correct thing. That is the principle that Supervised
Learning follows.

Now with having a basic understanding of what Supervised Learning is, let’s also understand
what makes this kind of learning important.

 Learning gives the algorithm experience which can be used to output the predictions
for new unseen data
 Experience also helps in optimizing the performance of the algorithm
 Real-world computations can also be taken care of by the Supervised Learning
algorithms

With the importance of Supervised Learning understood, let’s take a look at the types of
Supervised Learning along with the algorithms!

Types of Supervised Learning:

Supervised Learning has been broadly classified into 2 types.

 Regression
 Classification
Regression is the kind of Supervised Learning that learns from labelled datasets and is
then able to predict a continuous-valued output for new data given to the algorithm. It is
used whenever the required output is a number, such as money or height.

Classification is the kind of Supervised Learning where the output is a category or label
rather than a number, for example deciding whether an email is spam or not spam.

Some popular Supervised Learning algorithms, such as Linear Regression, Logistic Regression
and Decision Trees, are discussed later in this report.
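To make the regression/classification distinction concrete, here is a minimal sketch using scikit-learn; the tiny toy dataset and column meanings are invented purely for illustration, and the same labelled-data workflow fits either a regressor or a classifier:

    from sklearn.linear_model import LinearRegression, LogisticRegression

    # Regression: predict a number (e.g., salary from years of experience).
    X = [[1], [2], [3], [4], [5]]          # predictor variable
    y_salary = [30, 35, 42, 50, 58]        # continuous response variable
    reg = LinearRegression().fit(X, y_salary)
    print(reg.predict([[6]]))              # a continuous value

    # Classification: predict a label (e.g., buys the product: 1, does not: 0).
    y_buys = [0, 0, 0, 1, 1]               # categorical response variable
    clf = LogisticRegression().fit(X, y_buys)
    print(clf.predict([[6]]))              # a class label (0 or 1)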

Applications of Supervised Learning:

Supervised Learning Algorithms are used in a variety of applications. Let’s go through some
of the most well-known applications.

 BioInformatics – This is one of the most well-known applications of Supervised
Learning because most of us use it in our day-to-day lives. Here it refers to the use of
biological information about humans, such as fingerprints, iris texture, earlobe shape and so
on. Today's cellphones are capable of learning our biological information and are then able to
authenticate us, improving the security of the system. Smartphones such as the iPhone and
Google Pixel are capable of facial recognition, while OnePlus and Samsung devices support
in-display fingerprint recognition.
 Speech Recognition – This is the kind of application where you teach the algorithm
about your voice and it will be able to recognize you. The most well-known real-world
applications are virtual assistants such as Google Assistant and Siri, which wake up to the
keyword only when it is spoken in your voice.
 Spam Detection – This application is used where unsolicited or computer-generated
messages and e-mails are to be blocked. Gmail has an algorithm that learns the
different keywords which could indicate spam, such as "You are the winner of something",
and blocks those messages directly. The OnePlus Messages app gives the
user the task of teaching the application which keywords need to be blocked, and
the app will then block messages containing those keywords.
 Object Recognition for Vision – This kind of application is used when you need to
identify something. You have a huge dataset which you use to teach your algorithm,
and this can be used to recognize a new instance. Object-detection algorithms running
on a Raspberry Pi are a well-known example.

Those were some of the places where Supervised Learning has shone and shown its worth in
the real world of today. With that, let us move on to Unsupervised Learning.

Now that we know what Machine Learning is and the different types of Machine Learning,
let us delve into the actual topic for discussion here and answer: What is Unsupervised
Learning? Where is Unsupervised Learning used? Unsupervised Learning algorithms and
much more.

What is Unsupervised Learning?

Unsupervised Learning, as discussed earlier, can be thought of as self-learning, where the
algorithm can find previously unknown patterns in datasets that do not have any
labels. It helps in modelling probability density functions, finding anomalies in the data, and
much more. To give you a simple example, think of a student who has textbooks and all the
required material to study but has no teacher to guide them. Ultimately, the student will have to
learn by himself or herself to pass the exams. This sort of self-learning is what we have
scaled into Unsupervised Learning for machines.

Let me give you a real-life example of where this kind of unsupervised learning may have
helped you learn about something.

Example of Unsupervised Learning


Suppose you have never watched a cricket match in your entire life and you have been
invited by your friends to hang out at their house for a match between India and Australia.
You have no idea what cricket is, but just for your friends you say yes and head over
with them. The match starts and you just sit there, blank. Your friends are enjoying the way
Virat Kohli plays, and you want to join in the fun. This is when you start learning about the game.
You analyse the screen and come up with certain conclusions that you can use to understand
the game better.

 There are 2 teams, wearing blue and yellow jerseys. Since Virat Kohli plays
for India and you see India's score on the screen, you conclude that India wears the
blue jersey, which means Australia wears the yellow one.

 There are different types of players on the field. 2 which belong to India have bats in
their hand meaning that they are batting. There is someone who runs up and bowls the
ball, making him a bowler. There are around 9 players around the field who try to stop
the ball from reaching the boundary of the stadium. There is someone behind the
wickets and 2 umpires to manage the match.

 If the ball hits the wickets or if the ball is caught by the fielders, the batsman is out
and has to walk back.

 Virat Kohli has the number 18 and his name on the back of his jersey and if this
player scores a 4 or a 6, you need to cheer.

You make these observations one-by-one and now know when to cheer or boo when the
wickets fall. From knowing nothing to knowing the basics of cricket, you can now enjoy the
match with your friends.

What happened here? You had all the material you needed to learn the basics of
cricket: the TV, and when and for whom your friends cheered. This let you learn about cricket by
yourself, without anyone guiding you. This is the principle that Unsupervised
Learning follows. So, having understood what Unsupervised Learning is, let us move on and
understand what makes it so important in the field of Machine Learning.

Why is it important?

So what does Unsupervised Learning help us obtain? Let me tell you all about it.

 Unsupervised Learning algorithms work on datasets that are unlabelled and find
patterns which would previously not be known to us.

 These patterns obtained are helpful if we need to categorize the elements or find
an association between them.

 They can also help detect anomalies and defects in the data which can be taken care of
by us.

Lastly, and most importantly, the data we collect is usually unlabelled, which makes these
algorithms a natural fit for the work.

Now that we know the importance, let us move ahead and understand the different

Types of Unsupervised Learning:

Unsupervised Learning has been split up majorly into 2 types:

 Clustering

 Association

Clustering is the type of Unsupervised Learning where you find patterns in the data that you
are working on. It may be the shape, size, colour etc. which can be used to group data items
or create clusters.

Some popular algorithms in Clustering are discussed below:


 Hierarchical Clustering – This algorithm builds clusters based on the similarity
between different data points in the dataset. It goes over the various features of the
data points and looks for the similarity between them. If the data points are found to
be similar, they are grouped together. This continues until the dataset has been
grouped which creates a hierarchy for each of these clusters.

 K-Means Clustering – This algorithm works step by step, where the main goal is to
obtain clusters that can be labelled to identify them. The algorithm creates clusters of
data points that are as homogeneous as possible by calculating the centroid
of each cluster and making sure that the distance between this centroid and each of its data
points is as small as possible. The smallest distance between a data point and the
centroids determines which cluster it belongs to, while making sure the clusters do not
overlap with each other. The centroid acts like the heart of the cluster. This ultimately
gives us clusters which can be labelled as needed.

 K-Nearest Neighbours (k-NN) – Strictly speaking, k-NN is a supervised classifier rather
than a clustering method, but it is often discussed alongside clustering because it also relies
on distances between points. It is probably the simplest of the Machine Learning
algorithms, as it does not really learn a model but rather classifies a new data
point based on the dataset it has stored. It is also called a lazy learner because the
computation happens only when a new data point arrives. It works well with smaller datasets,
as huge datasets take time to process.
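As a brief sketch of the K-Means idea described above (the two-dimensional points below are invented for illustration), scikit-learn's KMeans computes the centroids and assigns each point to its nearest one:

    from sklearn.cluster import KMeans
    import numpy as np

    # Toy 2-D points forming two rough groups (illustrative data only).
    X = np.array([[1, 2], [1, 4], [1, 0],
                  [10, 2], [10, 4], [10, 0]])

    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(kmeans.labels_)           # cluster index assigned to each point
    print(kmeans.cluster_centers_)  # the centroids ("hearts") of the clusters
    print(kmeans.predict([[0, 0], [12, 3]]))  # nearest-centroid assignment for new points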

Association:

Association is the kind of Unsupervised Learning where you find the dependencies of
one data item to another data item and map them such that they help you profit better. Some
popular algorithms in Association Rule Mining are discussed below:
 Apriori algorithm – The Apriori algorithm is a breadth-first-search-based algorithm that
calculates the support of itemsets. This support basically maps the dependency of
one data item on another, which can help us understand which data item influences
the likelihood of another data item occurring. For example, buying bread may
influence the buyer to also buy milk and eggs, and that mapping helps increase profits for
the store. This sort of mapping can be learnt using this algorithm, which yields association
rules as its output.

 FP-Growth Algorithm – The Frequent Pattern (FP) growth algorithm finds the count of each
pattern that is repeated, adds it to a table and then takes the most frequent
item and sets it as the root of a tree. Other data items are then added to the tree
and the support is calculated. If a particular branch fails to meet the support
threshold, it is pruned. Once all the iterations are completed, a tree rooted at that
item is created, which is then used to derive the association rules. This
algorithm is faster than Apriori, as the support is calculated and checked over successive
iterations rather than by creating a rule and checking its support against the whole dataset.
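As a small illustration of the support and confidence measures just described (the shopping-basket transactions below are invented), a hand-rolled computation shows the kind of rule an Apriori-style algorithm would surface:

    # Invented shopping-basket transactions for illustration.
    transactions = [
        {"bread", "milk", "eggs"},
        {"bread", "milk"},
        {"bread", "eggs"},
        {"milk", "eggs"},
        {"bread", "milk", "eggs"},
    ]

    def support(itemset):
        """Fraction of transactions containing every item in the itemset."""
        return sum(itemset <= t for t in transactions) / len(transactions)

    # Support of the pair and confidence of the rule {bread} -> {milk}.
    pair = {"bread", "milk"}
    print("support(bread, milk) =", support(pair))
    print("confidence(bread -> milk) =", support(pair) / support({"bread"}))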

Now that you have a clear understanding of the two kinds of Unsupervised
Learning, let us learn about some of the applications of Unsupervised Learning.

Applications of Unsupervised Learning:

Unsupervised Learning helps in a variety of ways which can be used to solve various real-
world problems.

 They help us in understanding patterns which can be used to cluster the data points
based on various features.
 Understanding various defects in the dataset which we would not be able to detect
initially.

 They help in mapping the various items based on the dependencies of each other.

 Cleansing the datasets by removing features which are not really required for the
machine to learn from.

This ultimately leads to applications which are helpful to us. Certain examples of where
Unsupervised Learning algorithms are used are discussed below:

 Airbnb – This is a great application which hosts stays and experiences,
connecting people all over the world. The application uses Unsupervised Learning:
the user queries his or her requirements, and Airbnb learns those patterns and
recommends stays and experiences that fall in the same group or cluster.

 Amazon – Amazon also uses Unsupervised Learning to learn the customer's purchases
and recommend products that are most frequently bought together, which is an
example of association rule mining.

 Credit-Card Fraud Detection – Unsupervised Learning algorithms learn the
various patterns of the user and their usage of the credit card. If the card is used in
ways that do not match this behaviour, an alarm is generated, the transaction may be
marked as possible fraud, and you receive a call to confirm whether it was you using the card
or not.

Reinforcement Learning:

Reinforcement Learning Algorithms:

There are three approaches to implementing a Reinforcement Learning algorithm.

Value-Based:

In a value-based Reinforcement Learning method, you try to maximize a value
function V(s). In this method, the agent expects a long-term return from the current state
under policy π.

Policy-based:

In a policy-based RL method, you try to come up with such a policy that the action
performed in every state helps you to gain maximum reward in the future.

Two types of policy-based methods are:


 Deterministic: For any state, the same action is produced by the policy π.

 Stochastic: Every action has a certain probability, which is determined by the
following equation.

Stochastic Policy:

π(a | s) = P[A_t = a | S_t = s]

Model-Based:

In this Reinforcement Learning method, you need to create a virtual model for each
environment. The agent learns to perform in that specific environment.

Characteristics of Reinforcement Learning:

Here are important characteristics of reinforcement learning

 There is no supervisor, only a real number or reward signal

 Sequential decision making

 Time plays a crucial role in Reinforcement problems

 Feedback is always delayed, not instantaneous

 Agent’s actions determine the subsequent data it receives

Types of Reinforcement Learning:

Two types of reinforcement learning methods are:

Positive:

Positive reinforcement is defined as an event that occurs because of a specific behaviour.
It increases the strength and frequency of the behaviour and has a positive impact on the
actions taken by the agent. This type of reinforcement helps you to maximize performance
and sustain the change for a longer period. However, too much reinforcement may lead to
over-optimization of a state, which can affect the results.

Negative:

Negative reinforcement is defined as the strengthening of a behaviour that occurs because
a negative condition is stopped or avoided. It helps you to define the
minimum standard of performance. However, the drawback of this method is that it provides
only enough to meet that minimum behaviour.
Learning Models of Reinforcement:

There are two important learning models in reinforcement learning:

 Markov Decision Process

 Q learning

Markov Decision Process:

The following parameters are used to get a solution:

 Set of actions – A

 Set of states – S

 Reward – R

 Policy – π

 Value – V

The mathematical framework for mapping a solution in Reinforcement Learning is known as a
Markov Decision Process (MDP).

Q-Learning:

Q-learning is a value-based method of supplying information to inform which action an agent
should take.

Let’s understand this method by the following example:

 There are five rooms in a building which are connected by doors.

 The rooms are numbered 0 to 4.

 The outside of the building can be thought of as one big outside area (5).

 Doors 1 and 4 lead into the building from area 5.

Next, you need to associate a reward value with each door:

 Doors which lead directly to the goal have a reward of 100.

 Doors which are not directly connected to the target room give zero reward.

 As doors are two-way, two arrows (one per direction) are assigned between each pair of connected rooms.

 Each arrow carries an instant reward value.

Explanation:

In this setup, each room represents a state, and the agent's movement from one room to
another represents an action. A state can be drawn as a node, while an arrow shows an action.
For example, for an agent to traverse from room 2 to area 5:

 Initial state = state 2

 State 2 -> state 3

 State 3 -> states (2, 1, 4)

 State 4 -> states (0, 5, 3)

 State 1 -> states (5, 3)

 State 0 -> state 4
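Below is a minimal sketch of tabular Q-learning for this five-rooms example. The reward matrix encodes the door layout described above (rooms 1 and 4 open onto area 5, which is the goal); the learning rate, discount factor and episode count are illustrative choices, not values taken from the report.

    import numpy as np

    # Reward matrix R[s, a]: -1 = no door, 0 = door, 100 = door leading to the goal (area 5).
    R = np.array([
        [-1, -1, -1, -1,  0,  -1],   # room 0 connects to 4
        [-1, -1, -1,  0, -1, 100],   # room 1 connects to 3 and to the goal 5
        [-1, -1, -1,  0, -1,  -1],   # room 2 connects to 3
        [-1,  0,  0, -1,  0,  -1],   # room 3 connects to 1, 2, 4
        [ 0, -1, -1,  0, -1, 100],   # room 4 connects to 0, 3 and the goal 5
        [-1,  0, -1, -1,  0, 100],   # area 5 connects to 1, 4 and itself
    ])

    Q = np.zeros_like(R, dtype=float)
    gamma, alpha, episodes = 0.8, 1.0, 500   # illustrative hyperparameters

    rng = np.random.default_rng(0)
    for _ in range(episodes):
        state = rng.integers(0, 6)                       # start in a random room
        while state != 5:                                # until the goal is reached
            actions = np.flatnonzero(R[state] >= 0)      # rooms reachable through a door
            action = rng.choice(actions)                 # explore randomly
            # Q-learning update: reward plus discounted best value of the next state.
            Q[state, action] += alpha * (R[state, action]
                                         + gamma * Q[action].max() - Q[state, action])
            state = action

    print((Q / Q.max() * 100).round())  # normalised Q-table; greedy moves lead to area 5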

Applications of Reinforcement Learning:

Here are applications of Reinforcement Learning:

 Robotics for industrial automation.


 Business strategy planning
 Machine learning and data processing
 It helps you to create training systems that provide custom instruction and materials
according to the requirement of students.
 Aircraft control and robot motion control

Linear Regression:

Linear Regression is a machine learning algorithm based on supervised learning. It
performs a regression task. Regression models a target prediction value based on independent
variables, and it is mostly used for finding relationships between variables and for forecasting.
Different regression models differ in the kind of relationship they assume between the dependent
and independent variables, and in the number of independent variables used.

Linear regression performs the task to predict a dependent variable value (y) based on a given
independent variable (x). So, this regression technique finds out a linear relationship between
x (input) and y(output). Hence, the name is Linear Regression.
For example, X (input) might be a person's work experience and Y (output) their salary;
the regression line is then the best-fit line for our model.

Hypothesis function for Linear Regression:

While training the model we are given:

x: input training data (univariate – one input variable (parameter))

y: labels to the data (supervised learning)

When training the model, it fits the best line to predict the value of y for a given value of x,
using the hypothesis

    y = θ1 + θ2 · x

The model gets the best regression fit line by finding the best θ1 and θ2 values:
θ1: intercept
θ2: coefficient of x

Once we find the best θ1 and θ2 values, we get the best-fit line. So when we finally use
our model for prediction, it will predict the value of y for a given input value of x.

Cost Function (J):

By achieving the best-fit regression line, the model aims to predict the y value such that
the error difference between the predicted value and the true value is minimal. So, it is very
important to update the θ1 and θ2 values to reach the values that minimize the error
between the predicted y value (pred) and the true y value (y).

The cost function J of Linear Regression is the Root Mean Squared Error (RMSE) between the
predicted y value (pred) and the true y value (y):

    J = sqrt( (1/n) · Σ (pred_i − y_i)² )

Gradient Descent:

To update the θ1 and θ2 values in order to reduce the cost function (minimizing the RMSE value)
and achieve the best-fit line, the model uses Gradient Descent. The idea is to start with
random θ1 and θ2 values and then iteratively update them until the minimum cost is reached.
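A minimal sketch of the procedure just described, written with NumPy (the toy experience/salary data and the learning rate are invented for illustration): start from θ1 = θ2 = 0 and repeatedly nudge both values in the direction that lowers the error.

    import numpy as np

    # Toy data: years of work experience (x) and salary in thousands (y).
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([30.0, 35.0, 42.0, 50.0, 58.0])

    theta1, theta2 = 0.0, 0.0      # intercept and coefficient, both start at zero
    lr = 0.01                      # learning rate (illustrative choice)

    for _ in range(5000):
        pred = theta1 + theta2 * x             # hypothesis: y = θ1 + θ2·x
        error = pred - y
        # Gradient of the mean squared error with respect to θ1 and θ2.
        theta1 -= lr * 2 * error.mean()
        theta2 -= lr * 2 * (error * x).mean()

    rmse = np.sqrt(np.mean((theta1 + theta2 * x - y) ** 2))
    print(f"theta1 (intercept) = {theta1:.2f}, theta2 (slope) = {theta2:.2f}, RMSE = {rmse:.2f}")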

Pre-requisite: Linear Regression:

This section discusses the basics of Logistic Regression and its implementation in
Python. Logistic regression is basically a supervised classification algorithm. In a
classification problem, the target variable (or output), y, can take only discrete values for a
given set of features (or inputs), X.

Contrary to popular belief, logistic regression IS a regression model. The model builds a
regression model to predict the probability that a given data entry belongs to the category
numbered "1". Just as Linear Regression assumes that the data follow a linear function,
Logistic Regression models the data using the sigmoid function.

Logistic regression becomes a classification technique only when a decision threshold is
brought into the picture. The setting of this threshold value is a very important aspect of
Logistic Regression and depends on the classification problem itself.

The decision for the value of the threshold value is majorly affected by the values
of precision and recall. Ideally, we want both precision and recall to be 1, but this seldom is
the case.

In the case of a Precision-Recall trade-off, we use the following arguments to decide upon the
threshold:

1. Low Precision/High Recall: In applications where we want to reduce the number of false
negatives without necessarily reducing the number of false positives, we choose a decision
threshold that gives a low value of Precision and a high value of Recall. For example, in a cancer
diagnosis application, we do not want any affected patient to be classified as not affected,
even if that means some healthy patients are wrongly flagged as having cancer. This is
because the absence of cancer can be confirmed by further medical tests, but the presence of
the disease cannot be detected in a candidate who has already been rejected.

2. High Precision/Low Recall: In applications where we want to reduce the number of false
positives without necessarily reducing the number of false negatives, we choose a decision
threshold that gives a high value of Precision and a low value of Recall. For example, if we are
classifying whether customers will react positively or negatively to a personalized
advertisement, we want to be absolutely sure that a customer will react positively to the
advertisement, because otherwise a negative reaction can cause a loss of potential sales from
that customer.
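To make the threshold trade-off concrete, here is a small sketch (the one-feature model and its data are invented for illustration) that sweeps the decision threshold over predicted probabilities and reports precision and recall at each setting:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import precision_score, recall_score

    # Invented one-feature dataset: 1 = positive class (e.g., disease present).
    X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
    y = np.array([0, 0, 0, 1, 0, 1, 1, 1])

    model = LogisticRegression().fit(X, y)
    probs = model.predict_proba(X)[:, 1]     # predicted probability of class "1"

    for threshold in (0.3, 0.5, 0.7):
        preds = (probs >= threshold).astype(int)
        print(f"threshold={threshold:.1f}  "
              f"precision={precision_score(y, preds, zero_division=0):.2f}  "
              f"recall={recall_score(y, preds):.2f}")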
Based on the number of categories, Logistic regression can be classified as:

1. binomial: target variable can have only 2 possible types: “0” or “1” which may
represent “win” vs “loss”, “pass” vs “fail”, “dead” vs “alive”, etc.
2. multinomial: target variable can have 3 or more possible types which are not ordered
(i.e. types have no quantitative significance) like “disease A” vs “disease B” vs
“disease C”.
3. ordinal: it deals with target variables with ordered categories. For example, a test
score can be categorized as:“very poor”, “poor”, “good”, “very good”. Here, each
category can be given a score like 0, 1, 2, 3.

First of all, we explore the simplest form of Logistic Regression, i.e., Binomial Logistic
Regression.
Student Self-Evaluation for the Short Internship

Student Name: Mallepogu Sagar

Registration No: 228A1A0309

Term of Internship: 6 Weeks    From: June 2024    To: July 2024

Date of Evaluation:

Organization Name & Address:

Please rate your performance in the following areas:


Rating Scale: Letter grade of CGPA calculation to be provided

1 Oral communication 1 2 3 4 5

2 Written communication 1 2 3 4 5

3 Proactiveness 1 2 3 4 5

4 Interaction ability with community 1 2 3 4 5

5 Positive Attitude 1 2 3 4 5

6 Self-confidence 1 2 3 4 5

7 Ability to learn 1 2 3 4 5

8 Work Plan and organization 1 2 3 4 5

9 Professionalism 1 2 3 4 5

10 Creativity 1 2 3 4 5

11 Quality of work done 1 2 3 4 5

12 Time Management 1 2 3 4 5

13 Understanding the Community 1 2 3 4 5

14 Achievement of Desired Outcomes 1 2 3 4 5

15 OVERALL PERFORMANCE 1 2 3 4 5

Date: Signature of Student:


Evaluation by the Supervisor of the Intern
Organization
Student Name: Mallepogu Sagar

Registration No: 228A1A0309

Term of Internship: 6 Weeks    From: June 2024    To: July 2024

Date of Evaluation:

Organization Name & Address:


Please rate the student's performance in the following areas:
Please note that your evaluation shall be done independently of the student's self-evaluation.
Rating Scale:
1 Oral communication 1 2 3 4 5
2 Written communication 1 2 3 4 5
3 Proactiveness 1 2 3 4 5
4 Interaction ability with community 1 2 3 4 5
5 Positive Attitude 1 2 3 4 5
6 Self-confidence 1 2 3 4 5
7 Ability to learn 1 2 3 4 5
8 Work Plan and organization 1 2 3 4 5
9 Professionalism 1 2 3 4 5
10 Creativity 1 2 3 4 5
11 Quality of work done 1 2 3 4 5
12 Time Management 1 2 3 4 5
13 Understanding the Community 1 2 3 4 5
14 Achievement of Desired Outcomes 1 2 3 4 5

15 OVERALL PERFORMANCE 1 2 3 4 5

Date: Signature of the Supervisor:


INTERNAL ASSESSMENT STATEMENT

Name of the Student: Mallepogu Sagar

Programme of Study: Bachelor of Technology

Year of Study: 2024 - 2025

Group: Mechanical Engineering

Register No/H.T. No: 228A1A0309

Name of the College: Rise Krishna Sai Prakasam Group of Institutions

University: JNTU Kakinada

Sl. No | Evaluation Criterion | Maximum Marks | Marks Awarded

1. | Activity Log | 25 |

2. | Internship Evaluation | 50 |

3. | Oral Presentation | 25 |

   | GRAND TOTAL | 100 |

Date: Signature of the Faculty Guide

Certified by

Date: Signature of the Head of the Department/Principal


Seal:
The reason for taking x0 = 1 is pretty clear now: we needed to do a matrix product, but there was
no x0 actually multiplied by θ0 in the original hypothesis formula, so we defined x0 = 1, giving the
hypothesis

    h_θ(x) = θ^T x

Now, if we try to apply Linear Regression to the above classification problem, we are likely to get
continuous values using this hypothesis. Also, it does not make sense
for h_θ(x) to take values larger than 1 or smaller than 0.

So, some modifications are made to the hypothesis for classification:

    h_θ(x) = g(θ^T x),   where   g(z) = 1 / (1 + e^(−z))

g(z) is called the logistic function or the sigmoid function. Its plot is an S-shaped curve, from
which we can infer that:

 g(z) tends towards 1 as z → +∞

 g(z) tends towards 0 as z → −∞

 g(z) is always bounded between 0 and 1

So, now, we can define the conditional probabilities for the 2 labels (0 and 1) for an observation x
as:

    P(y = 1 | x; θ) = h_θ(x)
    P(y = 0 | x; θ) = 1 − h_θ(x)

We can write this more compactly as:

    p(y | x; θ) = (h_θ(x))^y · (1 − h_θ(x))^(1−y)

Now, we define another term, the likelihood of the parameters, as:

    L(θ) = ∏_{i=1..m} p(y^(i) | x^(i); θ)

Likelihood is nothing but the probability of the data (training examples), given a model and
specific parameter values (here, θ). It measures the support provided by the data for each
possible value of θ. We obtain it by multiplying all p(y^(i) | x^(i); θ) for a given θ.

And for easier calculations, we take the log-likelihood:

    l(θ) = Σ_{i=1..m} [ y^(i) log h_θ(x^(i)) + (1 − y^(i)) log(1 − h_θ(x^(i))) ]

The cost function for logistic regression is proportional to the inverse of the likelihood of
the parameters. Hence, we can obtain an expression for the cost function J using the log-likelihood
equation as:

    J(θ) = − l(θ)

and our aim is to estimate θ so that the cost function is minimized, using the gradient descent
algorithm.

Firstly, we take the partial derivatives of J(θ) w.r.t. each θ_j to derive the stochastic
gradient descent rule (we present only the final derived value here):

    θ_j := θ_j + α Σ_{i=1..m} ( y^(i) − h_θ(x^(i)) ) x_j^(i)

Here, y and h(x) represent the response vector and the predicted response vector (respectively).
Also, x_j is the vector of observation values for the j-th feature.

Now, in order to reach the minimum of J(θ), we repeat this update for all θ_j until convergence.
Introduction to Decision Tree Algorithm:

The Decision Tree algorithm is one of the popular supervised machine learning
algorithms used for classification. It generates the outcome as an
optimized result based on a tree structure of conditions or rules. The decision tree
algorithm is associated with three major components: Decision Nodes, Decision Links, and
Decision Leaves. It operates through splitting, pruning, and tree-selection processes. It
supports both numerical and categorical data for constructing the decision tree. Decision tree
algorithms are efficient for large datasets and have comparatively low time complexity. This
algorithm is widely used in customer segmentation and marketing strategy implementation in business.

The Decision Tree algorithm is a supervised Machine Learning algorithm in which data is
continuously divided at each node based on certain rules until the final outcome is generated.
Let's take an example: suppose you open a shopping mall and, of course, you would want it to
grow in business over time. For that, you would require both returning customers and
new customers in your mall. To achieve this, you would prepare different business and marketing
strategies, such as sending emails to potential customers, creating offers and deals, targeting
new customers, and so on. But how do we know who the potential customers are? In other words,
how do we classify the categories of customers? Some customers will visit once a
week, others once or twice a month, and some only once in a quarter.
Decision trees are one such classification algorithm that will keep splitting the results into groups
until no more useful distinctions are left.

In this way, the decision tree goes down in a tree-structured format.


The main components of a decision tree are:

 Decision Nodes, where the data is split based on an attribute.
 Decision Links, each of which represents a rule.
 Decision Leaves, which are the final outcomes.

Working of a Decision Tree Algorithm:

There are many steps that are involved in the working of a decision tree:

1. Splitting – It is the process of partitioning the data into subsets. Splitting can be done on
various factors, for example on the basis of gender, height, or class.

2. Pruning – It is the process of shortening the branches of the decision tree, hence
limiting the tree depth.

Pruning is also of two types:

 Pre-Pruning – Here, we stop growing the tree when we do not find any statistically
significant association between the attributes and class at any particular node.
 Post-Pruning – In order to post prune, we must validate the performance of the test
set model and then cut the branches that are a result of overfitting noise from the
training set.
3. Tree Selection – The third step is the process of finding the smallest tree that fits the data.

A learned decision tree of this kind can be visualized as a flowchart of successive splits.

We can also set threshold values at the nodes if the features are continuous.

Entropy in the Decision Tree Algorithm:

In simple words, entropy is the measure of how disordered your data is. While you
might have heard this term in your Mathematics or Physics classes, it’s the same here. The
reason Entropy is used in the decision tree is because the ultimate goal in the decision tree is
to group similar data groups into similar classes, i.e. to tidy the data.

Imagine an initial dataset of red circles and blue crosses to which we are required to
apply a decision tree algorithm in order to group similar data points into one category.
After a decision split, most of the red circles fall under one class
while most of the blue crosses fall under another class. Hence the decision split has classified
the points based on their attributes, and the data on each side of the split is tidier (has lower
entropy) than before.

Now, let us try to do some math over here:

Let us say that we have "N" items in total, and these items fall into two
categories. In order to group the data based on labels, we introduce the ratios:

    p = n / N   (fraction of items in the first category)
    q = m / N = 1 − p   (fraction of items in the second category)

The entropy of our set is then given by the following equation:

    E = −p log2(p) − q log2(q)

Plotting this equation shows that the entropy is 0 when the set is pure (p = 0 or p = 1) and
reaches its maximum value of 1 when the two categories are equally mixed (p = 0.5 and q = 0.5).
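A small sketch tying the two ideas together (the toy labels and two-feature data are invented): compute the entropy formula above by hand, then let scikit-learn's DecisionTreeClassifier, configured to use the same entropy criterion, find the splits.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    def entropy(labels):
        """E = -p*log2(p) - q*log2(q) for a two-class set of labels."""
        p = np.mean(labels)           # fraction of class 1
        if p in (0.0, 1.0):
            return 0.0                # a pure set is perfectly tidy
        return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

    y = np.array([0, 0, 0, 1, 1, 1, 1, 1])
    print("entropy of the full set:", round(entropy(y), 3))

    # Invented two-feature data; the tree uses entropy to choose its splits.
    X = np.array([[1, 5], [2, 4], [1, 6], [7, 1], [8, 2], [9, 1], [8, 3], [7, 2]])
    tree = DecisionTreeClassifier(criterion="entropy", max_depth=2).fit(X, y)
    print(export_text(tree, feature_names=["feature_1", "feature_2"]))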

Random Forest:

Random forest is a commonly used machine learning algorithm, trademarked by Leo
Breiman and Adele Cutler, which combines the output of multiple decision trees to reach a
single result. Its ease of use and flexibility have fueled its adoption, as it handles both
classification and regression problems.

Decision trees:

Since the random forest model is made up of multiple decision trees, it would be
helpful to start by describing the decision tree algorithm briefly. Decision trees start with a
basic question, such as, “Should I surf?” From there, you can ask a series of questions to
determine an answer, such as, “Is it a long period swell?” or “Is the wind blowing offshore?”.
These questions make up the decision nodes in the tree, acting as a means to split the data.
Each question helps an individual to arrive at a final decision, which would be denoted by the
leaf node. Observations that fit the criteria will follow the “Yes” branch and those that don’t
will follow the alternate path. Decision trees seek to find the best split to subset the data, and
they are typically trained through the Classification and Regression Tree (CART) algorithm.
Metrics, such as Gini impurity, information gain, or mean square error (MSE), can be used to
evaluate the quality of the split.
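As a brief sketch of the ensemble idea (using scikit-learn's built-in Iris dataset purely for illustration), a RandomForestClassifier averages many decision trees, each trained on a random subset of the rows and features:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    # 100 trees, each seeing a bootstrap sample of the rows and a random subset of features.
    forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
    print("test accuracy:", accuracy_score(y_test, forest.predict(X_test)))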
An Internship Report on
AWS Academy Cloud Virtual Internship

Submitted in accordance with the requirement for the degree of

BACHELOR OF TECHNOLOGY

IN
MECHANICAL ENGINEERING

submitted by

Mallepogu Sagar
(228A1A0309)

Under the esteemed guidance of
Mr. Ramesh Babu, Assistant Professor

RISE KRISHNA SAI PRAKASAM GROUP OF INSTITUTIONS

(AUTONOMOUS)
(Approved by AICTE, NEW DELHI & Affiliated to JNTU, KAKINADA)
An ISO 9001:2015 certified institute, accredited by NAAC, Ongole, Prakasam Dt., Andhra Pradesh.
RISE KRISHNA SAI PRAKASAM GROUP OF INSTITUTIONS
Approved by AICTE, Permanently Affiliated to JNTUK &Accredited by NAAC with ‘A’ Grade

DEPARTMENT OF MECHANICAL ENGINEERING

CERTIFICATE

This is to certify that the report entitled "MACHINE LEARNING", being submitted by
MALLEPOGU SAGAR of III Year I Semester, bearing Roll No. 228A1A0309, in partial fulfilment of
the requirements for the award of the Degree of Bachelor of Technology in Mechanical Engineering,
Rise Krishna Sai Prakasam Group of Institutions, is a record of bonafide work carried out by him.

Signature of Guide Signature of principal

Signature of Head of the Department Signature of External Examiner


VISION AND MISSION OF THE INSTITUTION

Vision of the Institute:

To be a premier institution in technical education by creating professionals of global standards
with ethics and social responsibility for the development of the nation and mankind.

Mission of the Institute:

 Impart Outcome Based Education through well qualified and dedicated faculty.

 Provide state-of-the-art infrastructure and facilities for application-oriented research.

 Reinforce technical skills with life skills and entrepreneurship skills.

 Promote cutting-edge technologies to produce industry-ready professionals.

 Facilitate interaction with all stakeholders to foster ideas and innovation.

 Inculcate moral values, professional ethics and social responsibility.

VISION AND MISSION OF THE DEPARTMENT

Vision of the Department:

To be a center of excellence in computer science and engineering for value-based education
to serve humanity and contribute to socio-economic development.

Mission of the Department:

 Provide professional knowledge through a student-centric teaching-learning process to
contribute to the software industry.

 Inculcate training on cutting-edge technologies for industry needs.

 Create an academic ambiance leading to research.

 Promote industry-institute interaction for real-time problem solving.
Program Outcomes (POs)

PO1 Engineering Knowledge: Apply the knowledge of mathematics, science, engineering
fundamentals, and an engineering specialization to the solution of complex engineering
problems.

PO2 Problem Analysis: Identify, formulate, review research literature, and analyze complex
engineering problems, reaching substantiated conclusions using first principles of
mathematics, natural sciences, and engineering sciences.

PO3 Design/Development of Solutions: Design solutions for complex engineering problems
and design system components or processes that meet the specified needs with appropriate
consideration for public health and safety, and cultural, societal, and environmental
considerations.

PO4 Conduct Investigations of Complex Problems: Use research-based knowledge and
research methods, including design of experiments, analysis and interpretation of data, and
synthesis of the information, to provide valid conclusions.

PO5 Modern Tool Usage: Create, select, and apply appropriate techniques, resources, and
modern engineering and IT tools, including prediction and modeling, to complex
engineering activities with an understanding of the limitations.

PO6 The Engineer and Society: Apply reasoning informed by contextual knowledge to
assess societal, health, safety, legal and cultural issues and the consequent responsibilities
relevant to professional engineering practice.

PO7 Environment and Sustainability: Understand the impact of professional engineering
solutions in societal and environmental contexts, and demonstrate the knowledge of,
and need for, sustainable development.

PO8 Ethics: Apply ethical principles and commit to professional ethics and responsibilities
and the norms of engineering practice.

PO9 Individual and Team Work: Function effectively as an individual, and as a member
or leader in diverse teams, and in multidisciplinary settings.

PO10 Communication: Communicate effectively on complex engineering activities with the
engineering community and with society at large, such as being able to comprehend and
write effective reports and design documentation, make effective presentations, and give
and receive clear instructions.

PO11 Project Management and Finance: Demonstrate knowledge and understanding of
engineering and management principles and apply these to one's own work, as a
member and leader in a team, to manage projects and in multidisciplinary
environments.

PO12 Life-long Learning: Recognize the need for, and have the preparation and ability to
engage in, independent and life-long learning in the broadest context of technological
change.
Program Educational Objectives (PEOs):
PEO1: Develop software solutions for real world problems by applying Mathematics
PEO2: Function as members of multi-disciplinary teams and to communicate effectively using
modern tools.
PEO3: Pursue career in software industry or higher studies with continuous learning and apply
professional knowledge.
PEO4: Practice the profession with ethics, integrity, leadership and social responsibility.

Program Specific Outcomes (PSOs):

PSO1 Domain Knowledge: Apply the knowledge of Programming Languages, Networks
and Databases for the design and development of Software Applications.

PSO2 Computing Paradigms: Understand the evolutionary changes in computing, possess
knowledge of the context-aware applicability of paradigms, and meet the challenges of the
future.
COMPLETION CERTIFICATE
Acknowledgements

I take this opportunity to express my deep gratitude and appreciation to all those who
encouraged me in the successful completion of the internship.

I wish to express my sincere gratitude to my professor Dr. J KRISHNA, HOD of the
Mechanical Engineering Department, for their consistent help and encouragement to complete the
internship.

I express my sincere thanks to my internship coordinator Mr. CH RAMESH BABU,
Professor, Department of ECE, for his suggestions and for being a constant source of information
for me. I sincerely thank my internship mentors for their excellent
suggestions and extended cooperation towards its success.
I wholeheartedly express my thanks to all ECE department faculty members for their full-
fledged co-operation towards completion of my internship.
I wish to express my deepest sense of gratitude and my sincere thanks to

Dr. A.V. BHASKARA RAO, PRINCIPAL of Rise Krishna Sai Prakasam Group Of
Institutions for his suggestions.

I wish to convey my sincere thanks to the chairman of our college, Sri I.C.
RANGAMANNAR GARU, and the secretary, Sri SIDDA HANUMANTHA RAO
GARU.

I am also thankful to all who helped me directly and indirectly in the successful
completion of this internship.

Project associate:

Mallepogu Sagar
(228A1A0309)
Day & Date | Brief description of the daily activity | Learning Outcome | Person In-Charge Signature

DAY 1 | Introduction | Learned about the introduction to cloud computing |

DAY 2 | Python Basics for ML | Learned about the advantages of cloud computing |

DAY 3 | Data preprocessing | Learned about the introduction to Amazon Web Services |

DAY 4 | Data preprocessing | Learned about the introduction to Amazon Web Services |

DAY 5 | Introduction to Machine Learning | Learned about the AWS Cloud Adoption Framework |

DAY 6 | Introduction to Machine Learning | Learned about the AWS Cloud Adoption Framework |
WEEKLY REPORT
WEEK-1
The objective of the activity done: The main objective of the
first week's activities is to learn about the introduction to cloud
computing and how to complete certain modules as part of the
internship.

Week 1: Introduction to Python and ML Basics

Day 1-2: Python Basics for ML


 Review Python fundamentals: variables, data types, loops, functions,
and classes.
 Learn about key libraries for ML: NumPy (for numerical operations),
Pandas (for data manipulation), and Matplotlib (for visualization).

Day 3-4: Data Preprocessing


 Understand data cleaning: handling missing values, outliers,
duplicates, and normalization.
 Learn how to load and manipulate data using Pandas.
 Learn basic visualization techniques with Matplotlib and Seaborn
(histograms, scatter plots, box plots).

Day 5-7: Introduction to Machine Learning


 Understand the difference between supervised and unsupervised
learning.
 Study the main steps in a machine learning project: problem
definition, data collection, model selection, training, evaluation, and
deployment.
 Learn about key ML algorithms (linear regression, logistic regression,
decision trees, and k-NN).
 Implement a simple ML model using Scikit-Learn: Linear Regression.
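The last item of the plan above ("Implement a simple ML model using Scikit-Learn: Linear Regression") can be sketched as follows; the small DataFrame, its missing value, and the column names are invented for illustration and also show the basic Pandas cleaning step from Days 3-4:

    import pandas as pd
    from sklearn.linear_model import LinearRegression

    # Invented data with one missing value, to show basic cleaning with Pandas.
    df = pd.DataFrame({"hours_studied": [1, 2, 3, 4, 5],
                       "exam_score":    [52, 58, None, 71, 78]})
    df["exam_score"] = df["exam_score"].fillna(df["exam_score"].mean())  # handle missing value

    X = df[["hours_studied"]]     # predictor variable
    y = df["exam_score"]          # response variable

    model = LinearRegression().fit(X, y)
    print("predicted score for 6 hours of study:",
          model.predict(pd.DataFrame({"hours_studied": [6]})))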
Day & Date | Brief description of the daily activity | Learning Outcome | Person In-Charge Signature

DAY 1 | Linear Regression | Learned about the introduction to cloud computing |

DAY 2 | Linear Regression | Learned about the advantages of cloud computing |

DAY 3 | Logistic Regression | Learned about the introduction to Amazon Web Services |

DAY 4 | Logistic Regression | Learned about the introduction to Amazon Web Services |

DAY 5 | K-Nearest Neighbors (k-NN) | Learned about the AWS Cloud Adoption Framework |

DAY 6 | K-Nearest Neighbors (k-NN) | Learned about the AWS Cloud Adoption Framework |
WEEKLY REPORT

WEEK 2
Objective of the activity done: The main objective of the second week's activities
is to learn about Total Cost of Ownership and Technical Support, and how to
complete the given modules as a part of the internship.

Week 2: Supervised Learning - Regression and Classification

Day 8-9: Linear Regression


 Deep dive into linear regression: assumptions, cost function (MSE),
and gradient descent.
 Train and evaluate a linear regression model using Scikit-Learn on a
dataset (e.g., Boston housing data).
 Understand overfitting and underfitting.

Day 10-12: Logistic Regression


 Learn about classification problems and logistic regression.
 Implement a logistic regression model for binary classification (e.g.,
predicting whether a customer buys a product or not).
 Learn about evaluation metrics: accuracy, precision, recall, F1 score,
confusion matrix.

Day 13-14: K-Nearest Neighbors (k-NN)


 Understand the k-NN algorithm: distance metrics, how k is chosen.
 Implement a k-NN classifier for a classification problem (e.g., Iris
dataset).
 Compare the performance of k-NN with logistic regression.
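Days 13-14 above suggest comparing k-NN with logistic regression on the Iris dataset; a minimal sketch of that comparison (the choice of k = 5 and the train/test split are illustrative):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
    logreg = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    print("k-NN accuracy:               ", knn.score(X_test, y_test))
    print("logistic regression accuracy:", logreg.score(X_test, y_test))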
Day & Date | Brief description of the daily activity | Learning Outcome | Person In-Charge Signature

DAY 1 | Decision Trees and Random Forests | Learned about the introduction to cloud computing |

DAY 2 | Decision Trees and Random Forests | Learned about the advantages of cloud computing |

DAY 3 | Support Vector Machines (SVM) | Learned about the introduction to Amazon Web Services |

DAY 4 | Support Vector Machines (SVM) | Learned about the introduction to Amazon Web Services |

DAY 5 | Implement a Random Forest model | Learned about the AWS Cloud Adoption Framework |

DAY 6 | Implement a Random Forest model | Learned about the AWS Cloud Adoption Framework |
WEEKLY REPORT

WEEK-3
The objective of the activity done: The main objective of the third week's
activities is to learn about Lightning Web Components (LWC) and the LWC API.

Detailed Report:

Week 3: Advanced Supervised Learning


Day 15-16: Decision Trees and Random Forests
 Learn how decision trees work: splitting criteria, Gini impurity, and entropy.
 Build and visualize decision trees using Scikit-Learn.
 Understand Random Forests and why they are effective.
 Implement a Random Forest model and compare it to decision trees.

Day 17-18: Support Vector Machines (SVM)

 Learn about SVM: the concept of hyperplanes, kernels, and margin maximization.
 Implement an SVM model for classification (e.g., handwritten digit classification).
 Study SVM evaluation metrics and fine-tuning.

Day 19-21: Model Evaluation and Tuning

 Learn how to split data: training, validation, and test sets.


 Understand cross-validation and bias-variance tradeoff.
 Study model evaluation techniques: ROC-AUC, cross-validation, hyperparameter tuning with
GridSearchCV.
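A minimal sketch of these evaluation and tuning steps, assuming an SVM on the Iris dataset purely for illustration, might look like this:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 5-fold cross-validation on the training set.
print("CV accuracy:", cross_val_score(SVC(), X_train, y_train, cv=5).mean())

# Hyperparameter tuning with GridSearchCV.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5).fit(X_train, y_train)
print("best params:", search.best_params_)
print("held-out test accuracy:", search.score(X_test, y_test))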
Day & Date | Brief description of daily activity | Learning Outcome | Person In-Charge | Signature

DAY 1 | Clustering - K-Means and Hierarchical Clustering | Learned about the introduction to cloud computing | |
DAY 2 | Clustering - K-Means and Hierarchical Clustering | Learned about the advantages of cloud computing | |
DAY 3 | Dimensionality Reduction - PCA | Learned about the introduction to amazon web services | |
DAY 4 | Dimensionality Reduction - PCA | Learned about the introduction to amazon web services | |
DAY 5 | Anomaly Detection | Learned about the aws cloud adoption framework | |
DAY 6 | Anomaly Detection | Learned about the aws cloud adoption framework | |

WEEKLY REPORT

WEEK 4

The objective of the activity done: The main objective of the fourth week is to test the knowledge gained over the course by completing modules on AWS Identity and Access Management (IAM), securing a new AWS account, securing accounts, etc.

Week 4: Unsupervised Learning

Day 22-23: Clustering - K-Means and Hierarchical Clustering

 Learn K-Means clustering: initialization, distance metrics, and convergence.
 Implement K-Means clustering on a dataset (e.g., customer segmentation).
 Understand hierarchical clustering and dendrograms.
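A minimal K-Means sketch follows; since no customer dataset is specified, synthetic blobs stand in for customer segments.

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for customer data with 3 natural segments.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(kmeans.cluster_centers_)   # one centroid per segment
print(kmeans.labels_[:10])       # cluster assignment of the first 10 points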

Day 24-25: Dimensionality Reduction - PCA


 Learn Principal Component Analysis (PCA) for reducing feature
space.
 Apply PCA to a high-dimensional dataset (e.g., digit recognition).
 Understand how PCA helps in visualizing and improving model
performance.
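One way the PCA exercise could look on the scikit-learn digits dataset is sketched below:

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, y = load_digits(return_X_y=True)      # 64-dimensional digit images
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X.shape, "->", X_2d.shape)                           # (1797, 64) -> (1797, 2)
print("explained variance ratio:", pca.explained_variance_ratio_)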

Day 26-28: Anomaly Detection


 Understand anomaly detection techniques.
 Implement anomaly detection with Isolation Forest or One-Class SVM.
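A hedged sketch of the Isolation Forest option, using synthetic data with a few injected outliers, might look like this:

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))   # "normal" points
outliers = rng.uniform(low=-6, high=6, size=(10, 2))     # injected anomalies
X = np.vstack([normal, outliers])

iso = IsolationForest(contamination=0.05, random_state=0).fit(X)
labels = iso.predict(X)        # +1 = inlier, -1 = anomaly
print("flagged anomalies:", (labels == -1).sum())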
Day & Date | Brief description of daily activity | Learning Outcome | Person In-Charge | Signature

DAY 1 | Introduction to Deep Learning | Learned about the introduction to cloud computing | |
DAY 2 | Introduction to Deep Learning | Learned about the advantages of cloud computing | |
DAY 3 | Convolutional Neural Networks (CNN) | Learned about the introduction to amazon web services | |
DAY 4 | Convolutional Neural Networks (CNN) | Learned about the introduction to amazon web services | |
DAY 5 | Recurrent Neural Networks (RNN) | Learned about the aws cloud adoption framework | |
DAY 6 | Recurrent Neural Networks (RNN) | Learned about the aws cloud adoption framework | |

WEEKLY REPORT

WEEK-5
The objective of the activity done: The main objective of this week's activities is to learn about validations, networking basics, Amazon VPC, etc., based on certain modules as part of the internship.

Week 5: Deep Learning and Neural Networks

Day 29-31: Introduction to Deep Learning


 Learn the fundamentals of neural networks: nodes, layers, activation
functions, forward propagation, and backpropagation.
 Understand loss functions, optimizers, and gradient descent.
 Implement a simple neural network using Keras/TensorFlow on a
basic classification problem (e.g., MNIST dataset).
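As a rough sketch of such a network (assuming Keras with the TensorFlow backend is available), the following trains a small fully connected model on MNIST:

from tensorflow import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0        # scale pixels to [0, 1]

model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=3, validation_split=0.1)
print(model.evaluate(x_test, y_test))   # [test loss, test accuracy]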

Day 32-33: Convolutional Neural Networks (CNN)


 Learn about CNNs for image processing tasks.
 Understand the architecture of CNNs: convolution layers, pooling
layers, and fully connected layers.
 Implement a CNN model using Keras/TensorFlow for image
classification.
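A minimal CNN sketch in Keras, again using MNIST as a convenient stand-in image dataset, might look like this:

from tensorflow import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0     # add a channel dimension: (28, 28, 1)
x_test = x_test[..., None] / 255.0

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    keras.layers.Conv2D(32, (3, 3), activation="relu"),   # convolution layer
    keras.layers.MaxPooling2D((2, 2)),                    # pooling layer
    keras.layers.Conv2D(64, (3, 3), activation="relu"),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax"),         # fully connected layer
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, validation_split=0.1)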

Day 34-35: Recurrent Neural Networks (RNN)


 Understand RNNs and their use in sequential data (e.g., time series,
text).
 Learn about LSTM (Long Short-Term Memory) and GRU (Gated
Recurrent Units).
 Build a simple RNN or LSTM model for sequence prediction (e.g.,
sentiment analysis or time-series forecasting).
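One possible LSTM sketch for sequence prediction, using a synthetic sine-wave series rather than a real time series, is shown below:

import numpy as np
from tensorflow import keras

# Toy time series: predict the next value of a sine wave from the previous 20.
t = np.arange(0, 200, 0.1)
series = np.sin(t)
window = 20
X = np.array([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]
X = X[..., None]                      # shape: (samples, timesteps, features)

model = keras.Sequential([
    keras.Input(shape=(window, 1)),
    keras.layers.LSTM(32),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, verbose=0)

print(model.predict(X[:1]))           # predicted next value for the first window
print(y[0])                           # actual next value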
Day & Date | Brief description of daily activity | Learning Outcome | Person In-Charge | Signature

DAY 1 | End-to-End Machine Learning Project | Learned about the introduction to cloud computing | |
DAY 2 | End-to-End Machine Learning Project | Learned about the advantages of cloud computing | |
DAY 3 | Deep Learning Project | Learned about the introduction to amazon web services | |
DAY 4 | Deep Learning Project | Learned about the introduction to amazon web services | |
DAY 5 | Final Evaluation and Review | Learned about the aws cloud adoption framework | |
DAY 6 | Final Evaluation and Review | Learned about the aws cloud adoption framework | |

WEEK-6

The objective of the activity done: The main objective of this week's activities is to clarify doubts and improve soft skills.

Week 6: Project and Deployment

Day 36-38: End-to-End Machine Learning Project


 Choose a real-world dataset (e.g., Kaggle dataset) and define a
problem.
 Preprocess the data, select features, and train different models (both
supervised and unsupervised).
 Evaluate and fine-tune the models.

Day 39-41: Deep Learning Project


 Work on a deep learning project like image classification, sentiment
analysis, or recommendation system.
 Train and test models, experiment with hyperparameters, and optimize
performance.

Day 42-43: Model Deployment


 Learn about model deployment and scalability.
 Use Flask/Django to create an API for your ML model.
 Learn about model versioning and containerization with Docker.
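A minimal Flask sketch of such an API follows; the file name app.py, the /predict route, and the pickled model.pkl are illustrative assumptions rather than a prescribed project structure.

# app.py (hypothetical file name); assumes a model was saved earlier as model.pkl.
import pickle
from flask import Flask, request, jsonify

app = Flask(__name__)

with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expected request body, e.g.: {"features": [[5.1, 3.5, 1.4, 0.2]]}
    features = request.get_json()["features"]
    prediction = model.predict(features).tolist()
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    app.run(port=5000)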

Day 44-45: Final Evaluation and Review


 Review all the models and concepts learned.
 Test your knowledge by revisiting key topics.
 Evaluate your project, document results, and prepare a report or
presentation.
Random forest algorithms have three main hyperparameters, which need to be set
before training. These include node size, the number of trees, and the number of features
sampled. From there, the random forest classifier can be used to solve for regression or
classification problems.

The random forest algorithm is made up of a collection of decision trees, and each
tree in the ensemble is comprised of a data sample drawn from a training set with
replacement, called the bootstrap sample. Of that training sample, one-third of it is set aside
as test data, known as the out-of-bag (oob) sample, which we’ll come

back to later. Another instance of randomness is then injected through feature


bagging, adding more diversity to the dataset and reducing the correlation among decision
trees. Depending on the type of problem, the determination of the prediction will vary. For a
regression task, the individual decision trees will be averaged, and for a classification task, a
majority vote—i.e. the most frequent categorical variable—will yield the predicted class.
Finally, the oob sample is then used for cross-validation, finalizing that prediction.
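To connect these ideas to code, here is a hedged scikit-learn sketch; the hyperparameter names shown (n_estimators, max_features, min_samples_leaf) are scikit-learn's counterparts to the tree count, sampled features, and node size mentioned above, and oob_score exposes the out-of-bag estimate.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# n_estimators = number of trees, max_features = features sampled per split,
# min_samples_leaf relates to node size; oob_score=True uses the out-of-bag
# samples described above as a built-in validation set.
rf = RandomForestClassifier(n_estimators=200,
                            max_features="sqrt",
                            min_samples_leaf=1,
                            oob_score=True,
                            random_state=0).fit(X, y)
print("out-of-bag accuracy:", rf.oob_score_)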

K-Nearest Neighbour:
K-Nearest Neighbours is one of the most basic yet essential classification algorithms
in Machine Learning. It belongs to the supervised learning domain and finds intense
application in pattern recognition, data mining and intrusion detection.
It is widely applicable in real-life scenarios since it is non-parametric, meaning it does not make any underlying assumptions about the distribution of the data (as opposed to other algorithms such as GMM, which assume a Gaussian distribution of the given data).
We are given some prior data (also called training data), which classifies coordinates into
groups identified by an attribute.
As an example, consider the following table of data points containing two features:

Now, given another set of data points (also called testing data), allocate these points a group
by analyzing the training set. Note that the unclassified points are marked as ‘White’.

Intuition:
If we plot these points on a graph, we may be able to locate some clusters or groups.
Now, given an unclassified point, we can assign it to a group by observing what group its
nearest neighbours belong to. This means a point close to a cluster of points classified as
‘Red’ has a higher probability of getting classified as ‘Red’.
Intuitively, we can see that the first point (2.5, 7) should be classified as ‘Green’ and the
second point (5.5, 4.5) should be classified as ‘Red’.
Algorithm
Let m be the number of training data samples. Let p be an unknown point.

1. Store the training samples in an array of data points arr[], where each element of the array represents a tuple (x, y).
2. For i = 0 to m-1, calculate the Euclidean distance d(arr[i], p).
3. Make a set S of the K smallest distances obtained. Each of these distances corresponds to an already classified data point.
4. Return the majority label among S.
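A small from-scratch sketch of this algorithm (with toy 'Green'/'Red' points chosen only to mirror the intuition above) could look like this:

import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, p, k=3):
    # Steps 1-2: Euclidean distance from the unknown point p to every training sample.
    distances = np.linalg.norm(train_X - p, axis=1)
    # Step 3: indices of the K smallest distances.
    nearest = np.argsort(distances)[:k]
    # Step 4: majority label among the K nearest neighbours.
    return Counter(train_y[nearest]).most_common(1)[0][0]

# Toy training data matching the intuition above: a 'Green' and a 'Red' cluster.
train_X = np.array([[1, 7], [2, 8], [3, 7], [6, 3], [7, 4], [6, 5]], dtype=float)
train_y = np.array(["Green", "Green", "Green", "Red", "Red", "Red"])

print(knn_predict(train_X, train_y, np.array([2.5, 7.0])))   # expected: Green
print(knn_predict(train_X, train_y, np.array([5.5, 4.5])))   # expected: Red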

Naive Bayes Classifiers

This section covers the theory behind Naive Bayes classifiers and their implementation.

Naive Bayes classifiers are a collection of classification algorithms based on Bayes’


Theorem. It is not a single algorithm but a family of algorithms where all of them share a
common principle, i.e. every pair of features being classified is independent of each other.
To start with, let us consider a dataset.

Consider a fictional dataset that describes the weather conditions for playing a game of golf.
Given the weather conditions, each tuple classifies the conditions as fit(“Yes”) or unfit(“No”)
for playing golf.

Here is a tabular representation of our dataset.

Outlook Temperature Humidity Windy Play Golf

0 Rainy Hot High False No

1 Rainy Hot High True No


Outlook Temperature Humidity Windy Play Golf

2 Overcast Hot High False Yes

3 Sunny Mild High False Yes

4 Sunny Cool Normal False Yes

5 Sunny Cool Normal True No

6 Overcast Cool Normal True Yes

7 Rainy Mild High False No

8 Rainy Cool Normal False Yes

9 Sunny Mild Normal False Yes

10 Rainy Mild Normal True Yes

11 Overcast Mild High True Yes

12 Overcast Hot Normal False Yes

13 Sunny Mild High True No

The dataset is divided into two parts, namely, feature matrix and the response vector.
 Feature matrix contains all the vectors (rows) of the dataset, in which each vector consists of the values of the features. In the above dataset, the features are 'Outlook', 'Temperature', 'Humidity' and 'Windy'.
 Response vector contains the value of class variable (prediction or output) for
each row of feature matrix. In above dataset, the class variable name is ‘Play
golf’.

Assumption:
The fundamental Naive Bayes assumption is that each feature makes an:
 independent
 equal
contribution to the outcome.

With relation to our dataset, this concept can be understood as:

 We assume that no pair of features are dependent. For example, the temperature
being ‘Hot’ has nothing to do with the humidity or the outlook being ‘Rainy’ has
no effect on the winds. Hence, the features are assumed to be independent.
 Secondly, each feature is given the same weight(or importance). For example,
knowing only temperature and humidity alone can’t predict the outcome
accurately. None of the attributes is irrelevant and assumed to be
contributing equally to the outcome.
Note: The assumptions made by Naive Bayes are not generally correct in real-world
situations. In-fact, the independence assumption is never correct but often works well in
practice.
Now, before moving to the formula for Naive Bayes, it is important to know about Bayes’
theorem.

Bayes’ Theorem
Bayes’ Theorem finds the probability of an event occurring given the probability of another
event that has already occurred. Bayes’ theorem is stated mathematically as the following
equation:

where A and B are events and P(B) ≠ 0.

 Basically, we are trying to find probability of event A, given the event B is true.
Event B is also termed as evidence.
 P(A) is the prior probability of A, i.e. the probability of the event before the evidence is seen. The evidence is an attribute value of an unknown instance (here, it is event B).
 P(A|B) is the posterior probability of A, i.e. the probability of the event after the evidence is seen.

Applied to our dataset, Bayes' theorem becomes P(y|X) = P(X|y) * P(y) / P(X), where y is the class variable and X is a dependent feature vector of size n, X = (x1, x2, ..., xn).
Just to be clear, an example of a feature vector and corresponding class variable can be (refer to the 1st row of the dataset):

X = (Rainy, Hot, High, False)


y = No
So basically, P(y|X) here means, the probability of “Not playing golf” given that the weather
conditions are “Rainy outlook”, “Temperature is hot”, “high humidity” and “no wind”.

Naive assumption:
Now, its time to put a naive assumption to the Bayes’ theorem, which
is, independence among the features. So now, we split evidence into the independent parts.
Now, if any two events A and B are independent, then,

P(A,B) = P(A)P(B)

Now, we need to create a classifier model. For this, we find the probability of a given set of inputs for all possible values of the class variable y and pick the output with maximum probability. This can be expressed mathematically as:

y = argmax over y of P(y) * P(x1 | y) * P(x2 | y) * ... * P(xn | y)

So, finally, we are left with the task of calculating P(y) and P(xi | y).
Please note that P(y) is also called class probability and P(xi | y) is called conditional
probability.
The different naive Bayes classifiers differ mainly by the assumptions they make regarding
the distribution of P(xi | y).
Let us try to apply the above formula manually on our weather dataset. For this, we need to
do some precomputations on our dataset.

We need to find P(xi | yj) for each xi in X and yj in y. All these calculations have been
demonstrated in the tables below:
So, in the figure above, we have calculated P(xi | yj) for each xi in X and yj in y manually in tables 1-4. For example, the probability of playing golf given that the temperature is cool is P(temp = Cool | play golf = Yes) = 3/9.
Also, we need to find class probabilities (P(y)) which has been calculated in the table 5. For
example, P(play golf = Yes) = 9/14.

So now, we are done with our pre-computations and the classifier is ready!

Let us test it on a new set of features (let us call it today):

today = (Sunny, Hot, Normal, False)


So, probability of playing golf is given by:
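Since the frequency tables are not reproduced here, the sketch below recomputes the relevant counts by hand from the 14-row dataset above and carries out the calculation, treating the scores as unnormalized posteriors:

# Hedged sketch: Naive Bayes posterior for today = (Sunny, Hot, Normal, False),
# with counts read off the 14-row golf dataset above.
from fractions import Fraction as F

# Class priors: 9 "Yes" rows and 5 "No" rows out of 14.
p_yes, p_no = F(9, 14), F(5, 14)

# Conditional probabilities: Sunny, Hot, Normal humidity, Windy=False.
yes_score = p_yes * F(3, 9) * F(2, 9) * F(6, 9) * F(6, 9)
no_score  = p_no  * F(2, 5) * F(2, 5) * F(1, 5) * F(2, 5)

p_play = yes_score / (yes_score + no_score)
print(float(yes_score), float(no_score), float(p_play))
# yes_score ≈ 0.021, no_score ≈ 0.005, so P(play golf = Yes | today) ≈ 0.82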

The method that we discussed above is applicable for discrete data. In case of continuous
data, we need to make some assumptions regarding the distribution of values of each feature.
The different naive Bayes classifiers differ mainly by the assumptions they make regarding
the distribution of P(xi | y).
Now, we discuss one of such classifiers here.

Gaussian Naive Bayes classifier


In Gaussian Naive Bayes, continuous values associated with each feature are assumed to be
distributed according to a Gaussian distribution. A Gaussian distribution is also
called Normal distribution. When plotted, it gives a bell-shaped curve which is symmetric
about the mean of the feature values as shown below:

The likelihood of the features is assumed to be Gaussian, hence the conditional probability is given by:

P(xi | y) = (1 / sqrt(2 * pi * sigma_y^2)) * exp(-((xi - mu_y)^2) / (2 * sigma_y^2))

where mu_y and sigma_y are the mean and standard deviation of feature xi over the samples belonging to class y.
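A short scikit-learn sketch of Gaussian Naive Bayes, using the Iris dataset purely as an example of continuous features, might look like this:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)           # continuous features
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

gnb = GaussianNB().fit(X_train, y_train)
print("accuracy:", gnb.score(X_test, y_test))
print("per-class feature means:\n", gnb.theta_)   # the estimated mu_y values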

Support Vector Machine (SVM) is a supervised machine learning algorithm used for both classification and regression. Although it can be applied to regression problems as well, it is best suited for classification. The objective of the SVM algorithm is to find a hyperplane in an N-dimensional space that distinctly classifies the data points. The dimension of the hyperplane depends upon the number of features. If the number of input features is two, then the hyperplane is just a line. If the number of input features is three, then the hyperplane becomes a 2-D plane. It becomes difficult to imagine when the number of features exceeds three.

Let’s consider two independent variables x1, x2 and one dependent variable which is either
a blue circle or a red circle.
Linearly Separable Data points :

From the figure above it is very clear that there are multiple lines (our hyperplane here is a line because we are considering only two input features, x1 and x2) that segregate our data points, i.e. classify the red and blue circles. So how do we choose the best line, or in general the best hyperplane, that segregates our data points?

Selecting the best hyper-plane:

One reasonable choice as the best hyperplane is the one that represents the largest separation
or margin between the two classes.
So we choose the hyperplane whose distance from it to the nearest data point on each side is
maximized. If such a hyperplane exists it is known as the maximum-margin hyperplane/hard
margin. So from the above figure, we choose L2.

Let’s consider a scenario like shown below

Here we have one blue ball in the boundary of the red ball. So how does SVM classify the
data? It’s simple! The blue ball in the boundary of red ones is an outlier of blue balls. The
SVM algorithm has the characteristics to ignore the outlier and finds the best hyperplane that
maximizes the margin. SVM is robust to outliers.
For such data points, what SVM does is find the maximum margin as it did with the previous data sets, and in addition it adds a penalty each time a point crosses the margin. The margins in these cases are called soft margins. When there is a soft margin, the SVM tries to minimize (1/margin + λ·(Σ penalty)). Hinge loss is a commonly used penalty: if there are no violations there is no hinge loss, and if there are violations the hinge loss is proportional to the distance of the violation.

Till now, we were talking about linearly separable data(the group of blue balls and red balls
are separable by a straight line/linear line). What to do if data are not linearly separable?

Say our data is as shown in the figure above. SVM solves this by creating a new variable using a kernel. We call a point xi on the line and create a new variable yi as a function of its distance from the origin o. If we plot this, we get something like the figure shown below.
In this case, the new variable y is created as a function of distance from the origin. A non-linear function that creates such a new variable is referred to as a kernel.

SVM Kernel:

The SVM kernel is a function that takes a low-dimensional input space and transforms it into a higher-dimensional space, i.e. it converts a non-separable problem into a separable problem. It is mostly useful in non-linear separation problems. Simply put, the kernel performs some extremely complex data transformations and then finds the process to separate the data based on the labels or outputs defined.

Advantages of SVM:

 Effective in high dimensional cases


 It is memory efficient, as it uses a subset of training points in the decision function, called support vectors.
 Different kernel functions can be specified for the decision function, and it is possible to specify custom kernels.
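To tie these points together, here is a hedged scikit-learn sketch of an SVM with an RBF kernel on the handwritten digits dataset:

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)          # handwritten digit classification
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# The RBF kernel implicitly maps the data to a higher-dimensional space;
# C controls the soft-margin penalty for points that violate the margin.
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
print("support vectors per class:", clf.n_support_)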

Artificial neural networks are a technology based on studies of the brain and nervous system
as depicted in Fig. 1. These networks emulate a biological neural network but they use a
reduced set of concepts from biological neural systems. Specifically, ANN models simulate
the electrical activity of the brain and nervous system. Processing elements (also known as
either a neurode or perceptron) are connected to other processing elements. Typically the
neurodes are arranged in a layer or vector, with the output of one layer serving as the input to
the next layer and possibly other layers. A neurode may be connected to all or a subset of the
neurodes in the subsequent layer, with these connections simulating the synaptic
connections of the brain. Weighted data signals entering a neurode simulate the electrical
excitation of a nerve cell and consequently the transference of information within the network
or brain. The input values to a processing element, i_n, are multiplied by a connection weight, w_n,m, that simulates the strengthening of neural pathways in the brain. It is through the adjustment of the connection strengths or weights that learning is emulated in ANNs.

All of the weight-adjusted input values to a processing element are then aggregated using a vector-to-scalar function such as summation (i.e., y = Σ w_ij * x_i), averaging, input maximum, or mode value to produce a single input value to the neurode. Once the input value is calculated,
the processing element then uses a transfer function to produce its output (and consequently
the input signals for the next processing layer). The transfer function transforms the neurode's
input value. Typically this transformation involves the use of a sigmoid, hyperbolic-tangent,
or other nonlinear function. The process is repeated between layers of processing elements
until a final output value, o_n, or vector of values is produced by the neural network.
Theoretically, to simulate the asynchronous activity of the human nervous system, the
processing elements of the artificial neural network should also be activated with the
weighted
input signal in an asynchronous manner. Most software and hardware implementations of artificial neural networks, however, implement a more discretized approach that guarantees that each processing element is activated once for each presentation of a vector of input values.
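As a tiny numerical illustration of a single processing element as described above (weighted inputs, summation, then a nonlinear transfer function), consider the following sketch:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Three input values i_n entering one processing element (neurode).
inputs = np.array([0.5, -1.2, 0.8])
weights = np.array([0.4, 0.3, -0.6])     # connection weights w_n,m

net_input = np.sum(weights * inputs)     # aggregation: y = sum(w_ij * x_i)
output = sigmoid(net_input)              # sigmoid transfer function

print(net_input, output)                 # this output feeds the next layer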

A Convolutional Neural Network, also known as CNN or ConvNet, is a class of neural


networks that specializes in processing data that has a grid-like topology, such as an image. A
digital image is a binary representation of visual data. It contains a series of pixels arranged in
a grid-like fashion that contains pixel values to denote how bright and what color each pixel
should be.

The human brain processes a huge amount of information the second we see an image. Each
neuron works in its own receptive field and is connected to other neurons in a way that they
cover the entire visual field. Just as each neuron responds to stimuli only in the restricted
region of the visual field called the receptive field in the biological vision system, each
neuron in a CNN processes data only in its receptive field as well. The layers are arranged in
such a way so that they detect simpler patterns first (lines, curves, etc.) and more complex
patterns (faces, objects, etc.) further along. In this way, a CNN can give computers the ability to see.
Convolutional Neural Network Architecture:

A CNN typically has three layers: a convolutional layer, a pooling layer, and a fully
connected layer.

Architecture of a CNN

Convolution Layer

The convolution layer is the core building block of the CNN. It carries the main portion of the
network’s computational load.

This layer performs a dot product between two matrices, where one matrix is the set of
learnable parameters otherwise known as a kernel, and the other matrix is the restricted
portion of the receptive field. The kernel is spatially smaller than an image but is more in-
depth. This means that, if the image is composed of three (RGB) channels, the kernel height
and width will be spatially small, but the depth extends up to all three channels.
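To make the dot-product idea concrete, here is a small NumPy sketch that slides a 3x3 kernel over a toy 5x5 single-channel image; the specific kernel values are arbitrary.

import numpy as np

image = np.arange(25, dtype=float).reshape(5, 5)   # toy single-channel input
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)       # simple edge-detecting kernel

out_h = image.shape[0] - kernel.shape[0] + 1
out_w = image.shape[1] - kernel.shape[1] + 1
feature_map = np.zeros((out_h, out_w))

# Slide the kernel over the image; each output value is the dot product
# between the kernel and the receptive field it currently covers.
for i in range(out_h):
    for j in range(out_w):
        receptive_field = image[i:i + 3, j:j + 3]
        feature_map[i, j] = np.sum(receptive_field * kernel)

print(feature_map)   # 3x3 feature map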

Recurrent Neural Network:

A recurrent neural network (RNN) is a type of artificial neural network which uses
sequential data or time series data. These deep learning algorithms are commonly used for
ordinal or temporal problems, such as language translation, natural language processing (nlp),
speech recognition, and image captioning; they are incorporated into popular applications
such as Siri, voice search, and Google Translate. Like feedforward and convolutional neural
networks (CNNs), recurrent neural networks utilize training data to learn. They are
distinguished by their “memory” as they take information from prior inputs to influence the
current input and output. While traditional deep neural networks assume that inputs and outputs are independent of each other, the output of a recurrent neural network depends on the prior elements within the sequence.
the output of a given sequence, unidirectional recurrent neural networks cannot account for
these events in their predictions.

Recurrent Neural Network vs. Feedforward Neural Network

Comparison of Recurrent Neural Networks (on the left) and Feedforward Neural Networks
(on the right)

Let’s take an idiom, such as “feeling under the weather”, which is commonly used when
someone is ill, to aid us in the explanation of RNNs. In order for the idiom to make sense, it
needs to be expressed in that specific order. As a result, recurrent networks need to account
for the position of each word in the idiom and they use that information to predict the next
word in the sequence.

Looking at the visual below, the “rolled” visual of the RNN represents the whole neural
network, or rather the entire predicted phrase, like “feeling under the weather.” The
“unrolled” visual represents the individual layers, or time steps, of the neural network. Each
layer maps to a single word in that phrase, such as “weather”. Prior inputs, such as “feeling”
and “under”, would be represented as a hidden state in the third timestep to predict the output
in the sequence, “the”.
Another distinguishing characteristic of recurrent networks is that they share parameters
across each layer of the network. While feedforward networks have different weights across
each node, recurrent neural networks share the same weight parameter within each layer of
the network. That said, these weights are still adjusted through the processes of backpropagation and gradient descent to facilitate learning.
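A minimal NumPy sketch of an unrolled RNN forward pass, showing the same weight matrices being reused at every time step, might look like this (the sizes and random values are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, output_size, steps = 4, 8, 3, 5

W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input-to-hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden-to-hidden (the "memory")
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))  # hidden-to-output

xs = rng.normal(size=(steps, input_size))   # a toy input sequence
h = np.zeros(hidden_size)                   # initial hidden state

outputs = []
for t in range(steps):
    # The same W_xh, W_hh, W_hy are shared across every time step;
    # the hidden state mixes the current input with information from prior steps.
    h = np.tanh(W_xh @ xs[t] + W_hh @ h)
    outputs.append(W_hy @ h)

print(np.array(outputs).shape)  # (5, 3): one output vector per time step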

Recurrent neural networks leverage backpropagation through time (BPTT) algorithm to


determine the gradients, which is slightly different from traditional backpropagation as it is
specific to sequence data. The principles of BPTT are the same as traditional
backpropagation, where the model trains itself by calculating errors from its output layer to
its input layer. These calculations allow us to adjust and fit the parameters of the model
appropriately. BPTT differs from the traditional approach in that BPTT sums errors at each
time step whereas feedforward networks do not need to sum errors as they do not share
parameters across each layer.

Through this process, RNNs tend to run into two problems, known as exploding gradients
and vanishing gradients. These issues are defined by the size of the gradient, which is the
slope of the loss function along the error curve. When the gradient is too small, it continues to
become smaller, updating the weight parameters until they become insignificant—i.e. 0.
When that occurs, the algorithm is no longer learning. Exploding gradients occur when the
gradient is too large, creating an unstable model. In this case, the model weights will grow
too large, and they will eventually be represented as NaN. One solution to these issues is to
reduce the number of
hidden layers within the neural network, eliminating some of the complexity in the RNN
model.
