Sem 6 Ques Data Science

Ques 1.

Distinguish between Skewness and kurtosis and bring out their importance in
describing frequency distribution.

Difference Between Skewness and Kurtosis

Skewness | Kurtosis
Skewness measures the asymmetry of a probability distribution. | Kurtosis measures the tailedness or peakedness of a probability distribution.
Positive skew indicates a right-skewed distribution, with the tail extending to the right. | Positive kurtosis indicates a distribution with heavier tails, often referred to as "leptokurtic".
Negative skew indicates a left-skewed distribution, with the tail extending to the left. | Negative kurtosis indicates a distribution with lighter tails, often referred to as "platykurtic".
A skewness value of zero indicates a symmetric distribution. | A kurtosis value of zero indicates a distribution similar to the normal distribution, often referred to as "mesokurtic".
Used to identify the direction and degree of asymmetry. | Used to identify the presence of outliers or extreme values.
Sensitive to changes in the tails of the distribution. | Sensitive to changes in the center and shoulders of the distribution.
Commonly used in fields such as economics, finance, and the social sciences. | Commonly used in statistics, engineering, and the physical sciences.
Examples: income distribution, stock returns. | Examples: particle physics, image processing.
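
To make the distinction concrete, here is a minimal sketch of measuring both quantities on a synthetic right-skewed sample; it assumes NumPy and SciPy are available, and the data are made up purely for illustration.

```python
# A quick sketch of measuring skewness and kurtosis on a sample (synthetic data).
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)
incomes = rng.lognormal(mean=10, sigma=0.6, size=10_000)   # right-skewed, like income data

print("Skewness:", skew(incomes))               # > 0: tail extends to the right
print("Excess kurtosis:", kurtosis(incomes))    # > 0: heavier tails than the normal (leptokurtic)
```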

Ques 2: What is Correlation? Explain different types of correlation with suitable examples.

Answer: Correlation refers to a process for establishing the relationship between
two variables. A simple way to get a general idea about whether or not two
variables are related is to plot them on a "scatter plot". While there are many
measures of association for variables which are measured at the ordinal or
higher level of measurement, correlation is the most commonly used approach.

Correlation in Statistics

This section shows how to calculate and interpret correlation coefficients for
ordinal and interval level scales. Methods of correlation summarize the
relationship between two variables in a single number called the correlation
coefficient. The correlation coefficient is usually represented using the symbol r,
and it ranges from -1 to +1.

A correlation coefficient quite close to 0, but either positive or negative, implies


little or no relationship between the two variables. A correlation coefficient close
to plus 1 means a positive relationship between the two variables, with increases
in one of the variables being associated with increases in the other variable.

A correlation coefficient close to -1 indicates a negative relationship between two


variables, with an increase in one of the variables being associated with a
decrease in the other variable. A correlation coefficient can be produced for
ordinal, interval or ratio level variables, but has little meaning for variables which
are measured on a scale which is no more than nominal.

For ordinal scales, the correlation coefficient can be calculated by using


Spearman’s rho. For interval or ratio level scales, the most commonly used
correlation coefficient is Pearson’s r, ordinarily referred to as simply the correlation
coefficient.
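
As a hedged illustration of the two coefficients mentioned above, the following sketch computes Pearson's r and Spearman's rho on a small made-up dataset; it assumes NumPy and SciPy are installed, and the values are purely illustrative.

```python
# Computing the correlation coefficients discussed above on toy data.
import numpy as np
from scipy import stats

x = np.array([2, 4, 6, 8, 10])        # e.g. hours studied (made-up values)
y = np.array([55, 60, 68, 74, 85])    # e.g. exam score (made-up values)

pearson_r = np.corrcoef(x, y)[0, 1]            # Pearson's r for interval/ratio data
spearman_rho, _ = stats.spearmanr(x, y)        # Spearman's rho for ordinal (ranked) data

print(f"Pearson's r    = {pearson_r:.3f}")     # close to +1: strong positive relationship
print(f"Spearman's rho = {spearman_rho:.3f}")
```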


What Does Correlation Measure?

In statistics, correlation studies and measures the direction and extent of the
relationship among variables, so correlation measures co-variation, not
causation. Therefore, we should never interpret correlation as implying a cause-and-
effect relation. For example, if there exists a correlation between two variables X and
Y, then when the value of one variable changes in one direction, the
value of the other variable changes either in the same direction (i.e.
positive change) or in the opposite direction (i.e. negative change). Furthermore,
if the correlation exists, it is linear, i.e. we can represent the relative movement of
the two variables by drawing a straight line on graph paper.

Correlation Coefficient

The correlation coefficient, r, is a summary measure that describes the extent of


the statistical relationship between two interval or ratio level variables. The
correlation coefficient is scaled so that it is always between -1 and +1. When r is
close to 0 this means that there is little relationship between the variables and the
farther away from 0 r is, in either the positive or negative direction, the greater the
relationship between the two variables.

The two variables are often given the symbols X and Y. In order to illustrate how
the two variables are related, the values of X and Y are pictured by drawing the
scatter diagram, graphing combinations of the two variables. The scatter
diagram is given first, and then the method of determining Pearson’s r is
presented. From the following examples, relatively small sample sizes are given.
Later, data from larger samples are given.

Scatter Diagram

A scatter diagram is a diagram that shows the values of two variables X and Y,
along with the way in which these two variables relate to each other. The values of
variable X are given along the horizontal axis, with the values of the variable Y
given on the vertical axis.

Later, when the regression model is used, one of the variables is defined as an
independent variable, and the other is defined as a dependent variable. In
regression, the independent variable X is considered to have some effect or
influence on the dependent variable Y. Correlation methods are symmetric with
respect to the two variables, with no indication of causation or direction of
influence being part of the statistical consideration. A scatter diagram is given in
the following example. The same example is later used to determine the
correlation coefficient.

Types of Correlation

The scatter plot explains the correlation between the two attributes or variables. It
represents how closely the two variables are connected. There can be three such
situations to see the relation between the two variables –

• Positive Correlation – when the values of the two variables move in the
same direction so that an increase/decrease in the value of one variable is
followed by an increase/decrease in the value of the other variable.
• Negative Correlation – when the values of the two variables move in the
opposite direction so that an increase/decrease in the value of one
variable is followed by a decrease/increase in the value of the other variable.
• No Correlation – when there is no linear dependence or no relation
between the two variables.
Ques 3: Discuss linear regression with an example.
Answer: Linear regression is a type of machine-learning algorithm, more specifically
a supervised machine-learning algorithm, that learns from labelled datasets and
maps the data points to the most optimised linear function, which can be used for
prediction on new datasets.
First of all, we should know what a supervised machine learning algorithm is. It is a type of
machine learning where the algorithm learns from labelled data. Labelled data means a
dataset whose respective target values are already known. Supervised learning has two
types:
• Classification: It predicts the class of the dataset based on the independent
input variables. Classes are categorical or discrete values, e.g. whether the image of an
animal shows a cat or a dog.
• Regression: It predicts continuous output variables based on the
independent input variables, e.g. the prediction of house prices based on
different parameters like house age, distance from the main road, location,
area, etc.
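As a concrete example, here is a minimal sketch of fitting a linear regression for house-price prediction; it assumes scikit-learn and NumPy are installed, and the feature values and prices are synthetic, chosen only for illustration.

```python
# Minimal linear regression sketch on made-up house-price data.
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: [area in sq. ft, age in years] vs. price (arbitrary units)
X = np.array([[1000, 10], [1500, 5], [2000, 2], [1200, 8], [1800, 3]])
y = np.array([50, 80, 110, 60, 95])

model = LinearRegression()
model.fit(X, y)                       # learn the optimised linear function y = w1*x1 + w2*x2 + b

print("Coefficients:", model.coef_)   # learned weights for area and age
print("Intercept:", model.intercept_)
print("Predicted price for a 1600 sq.ft, 4-year-old house:", model.predict([[1600, 4]]))
```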
Ques 4 : Define Machine learning? Explain types of machine learning with suitable
examples.
Answer: Machine learning is the branch of Artificial Intelligence that focuses on developing
models and algorithms that let computers learn from data and improve from previous experience
without being explicitly programmed for every task. In simple words, ML teaches the systems to
think and understand like humans by learning from the data.

Types of Machine Learning


There are several types of machine learning, each with special characteristics and
applications. Some of the main types of machine learning algorithms are as follows:
1. Supervised Machine Learning
2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning
1. Supervised Machine Learning
Supervised learning is defined as when a model gets trained on a "Labelled Dataset".
Labelled datasets have both input and output parameters. In supervised
learning, algorithms learn to map inputs to the correct outputs. Both the
training and validation datasets are labelled.

Supervised Learning

Let’s understand it with the help of an example.


Example: Consider a scenario where you have to build an image classifier to differentiate
between cats and dogs. If you feed the datasets of dogs and cats labelled images to the
algorithm, the machine will learn to classify between a dog or a cat from these labeled
images. When we input new dog or cat images that it has never seen before, it will use what
it has learned to predict whether it is a dog or a cat. This is how supervised
learning works, and this particular task is image classification.
There are two main categories of supervised learning that are mentioned below:
• Classification
• Regression
Classification
Classification deals with predicting categorical target variables, which represent discrete
classes or labels. For instance, classifying emails as spam or not spam, or predicting
whether a patient has a high risk of heart disease. Classification algorithms learn to map
the input features to one of the predefined classes.
Here are some classification algorithms:
• Logistic Regression
• Support Vector Machine
• Random Forest
• Decision Tree
• K-Nearest Neighbors (KNN)
• Naive Bayes
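As an illustration of classification, here is a small hedged sketch using logistic regression on scikit-learn's built-in breast cancer dataset; the dataset and model choice are assumptions for demonstration, not prescribed by the text.

```python
# A small classification sketch: map input features to one of two discrete classes.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = LogisticRegression(max_iter=5000)   # learns a mapping from features to a class label
clf.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```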
Regression
Regression, on the other hand, deals with predicting continuous target variables, which
represent numerical values. For example, predicting the price of a house based on its size,
location, and amenities, or forecasting the sales of a product. Regression algorithms learn
to map the input features to a continuous numerical value.
Here are some regression algorithms:
• Linear Regression
• Polynomial Regression
• Ridge Regression
• Lasso Regression
• Decision tree
• Random Forest
Advantages of Supervised Machine Learning
• Supervised Learning models can have high accuracy as they are trained
on labelled data.
• The process of decision-making in supervised learning models is often
interpretable.
• Pre-trained supervised models can often be reused, which saves time and resources
compared with developing new models from scratch.
Disadvantages of Supervised Machine Learning
• It has limitations in knowing patterns and may struggle with unseen or
unexpected patterns that are not present in the training data.
• It can be time-consuming and costly as it relies on labeled data only.
• It may lead to poor generalizations based on new data.
Applications of Supervised Learning
Supervised learning is used in a wide variety of applications, including:
• Image classification: Identify objects, faces, and other features in images.
• Natural language processing: Extract information from text, such as
sentiment, entities, and relationships.
• Speech recognition: Convert spoken language into text.
• Recommendation systems: Make personalized recommendations to users.
• Predictive analytics: Predict outcomes, such as sales, customer churn, and
stock prices.
• Medical diagnosis: Detect diseases and other medical conditions.
• Fraud detection: Identify fraudulent transactions.
• Autonomous vehicles: Recognize and respond to objects in the environment.
• Email spam detection: Classify emails as spam or not spam.
• Quality control in manufacturing: Inspect products for defects.
• Credit scoring: Assess the risk of a borrower defaulting on a loan.
• Gaming: Recognize characters, analyze player behavior, and create NPCs.
• Customer support: Automate customer support tasks.
• Weather forecasting: Make predictions for temperature, precipitation, and
other meteorological parameters.
• Sports analytics: Analyze player performance, make game predictions, and
optimize strategies.
2. Unsupervised Machine Learning
Unsupervised learning is a type of machine learning technique in
which an algorithm discovers patterns and relationships using unlabeled data. Unlike
supervised learning, unsupervised learning doesn’t involve providing the algorithm with
labeled target outputs. The primary goal of Unsupervised learning is often to discover
hidden patterns, similarities, or clusters within the data, which can then be used for
various purposes, such as data exploration, visualization, dimensionality reduction, and
more.
Unsupervised Learning

Let’s understand it with the help of an example.


Example: Consider a dataset that contains information about the purchases
customers made from a shop. Through clustering, the algorithm can group customers with
similar purchasing behaviour, revealing customer segments
without predefined labels. This type of information can help businesses target
customers as well as identify outliers.
There are two main categories of unsupervised learning that are mentioned below:
• Clustering
• Association
Clustering
Clustering is the process of grouping data points into clusters based on their similarity.
This technique is useful for identifying patterns and relationships in data without the
need for labeled examples.
Here are some clustering algorithms:
• K-Means Clustering algorithm
• Mean-shift algorithm
• DBSCAN Algorithm
• Principal Component Analysis
• Independent Component Analysis
Association
Association rule learning is a technique for discovering relationships between items in a
dataset. It identifies rules that indicate the presence of one item implies the presence of
another item with a specific probability.
Here are some association rule learning algorithms:
• Apriori Algorithm
• Eclat
• FP-growth Algorithm
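To illustrate the support and confidence ideas behind association rule learning, here is a tiny self-contained sketch on made-up transactions; a real workflow would typically rely on an Apriori or FP-growth implementation rather than this hand calculation.

```python
# Hand-computed support and confidence for a single association rule (toy transactions).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
    {"bread", "milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Rule: {bread} -> {milk}
sup_rule = support({"bread", "milk"})
conf_rule = sup_rule / support({"bread"})
print(f"support({{bread, milk}}) = {sup_rule:.2f}")
print(f"confidence(bread -> milk) = {conf_rule:.2f}")
```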
Advantages of Unsupervised Machine Learning
• It helps to discover hidden patterns and various relationships between the
data.
• Used for tasks such as customer segmentation, anomaly detection, and data
exploration.
• It does not require labeled data and reduces the effort of data labeling.
Disadvantages of Unsupervised Machine Learning
• Without using labels, it may be difficult to predict the quality of the model’s
output.
• Cluster Interpretability may not be clear and may not have meaningful
interpretations.
• Techniques such as autoencoders and dimensionality reduction are often
needed to extract meaningful features from raw data.
Applications of Unsupervised Learning
Here are some common applications of unsupervised learning:
• Clustering: Group similar data points into clusters.
• Anomaly detection: Identify outliers or anomalies in data.
• Dimensionality reduction: Reduce the dimensionality of data while preserving
its essential information.
• Recommendation systems: Suggest products, movies, or content to users
based on their historical behavior or preferences.
• Topic modeling: Discover latent topics within a collection of documents.
• Density estimation: Estimate the probability density function of data.
• Image and video compression: Reduce the amount of storage required for
multimedia content.
• Data preprocessing: Help with data preprocessing tasks such as data
cleaning, imputation of missing values, and data scaling.
• Market basket analysis: Discover associations between products.
• Genomic data analysis: Identify patterns or group genes with similar
expression profiles.
• Image segmentation: Segment images into meaningful regions.
• Community detection in social networks: Identify communities or groups of
individuals with similar interests or connections.
• Customer behavior analysis: Uncover patterns and insights for better
marketing and product recommendations.
• Content recommendation: Classify and tag content to make it easier to
recommend similar items to users.
• Exploratory data analysis (EDA): Explore data and gain insights before
defining specific tasks.
3. Semi-Supervised Learning
Semi-supervised learning is a machine learning approach that sits between
supervised and unsupervised learning, so it uses both labelled and unlabelled data.
It is particularly useful when obtaining labelled data is costly, time-consuming, or
resource-intensive, or when labelling requires specialist skills and
relevant resources in order to train or learn from the data.
We use these techniques when we are dealing with data that is a little bit labeled and
the rest large portion of it is unlabeled. We can use the unsupervised techniques to
predict labels and then feed these labels to supervised techniques. This technique is
mostly applicable in the case of image data sets where usually all images are not
labeled.
Semi-Supervised Learning

Let’s understand it with the help of an example.


Example: Consider that we are building a language translation model; having labelled
translations for every sentence pair can be resource-intensive. Semi-supervised learning allows the model to
learn from both labelled and unlabelled sentence pairs, making it more accurate. This
technique has led to significant improvements in the quality of machine translation
services.
Types of Semi-Supervised Learning Methods
There are a number of different semi-supervised learning methods each with its own
characteristics. Some of the most common ones include:
• Graph-based semi-supervised learning: This approach uses a graph to
represent the relationships between the data points. The graph is then used to
propagate labels from the labeled data points to the unlabeled data points.
• Label propagation: This approach iteratively propagates labels from the
labeled data points to the unlabeled data points, based on the similarities
between the data points.
• Co-training: This approach trains two different machine learning models on
different views (feature subsets) of the labeled data. Each model is then used to
label unlabeled examples for the other.
• Self-training: This approach trains a machine learning model on the labeled
data and then uses the model to predict labels for the unlabeled data. The
model is then retrained on the labeled data and the predicted labels for the
unlabeled data.
• Generative adversarial networks (GANs): GANs are a type of deep learning
algorithm that can be used to generate synthetic data. GANs can be used to
generate unlabeled data for semi-supervised learning by training two neural
networks, a generator and a discriminator.
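As a hedged sketch of the self-training method listed above, the following loop pseudo-labels confident predictions and retrains; the dataset, confidence threshold, and choice of classifier are illustrative assumptions.

```python
# Minimal self-training loop: train on labelled data, pseudo-label confident predictions, repeat.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, random_state=0)
labeled = np.zeros(len(y), dtype=bool)
labeled[:30] = True                       # pretend only 30 examples are labelled

model = LogisticRegression(max_iter=1000)
for _ in range(5):                        # a few self-training rounds
    model.fit(X[labeled], y[labeled])
    proba = model.predict_proba(X[~labeled])
    confident = proba.max(axis=1) > 0.95  # pseudo-label only confident predictions
    idx = np.where(~labeled)[0][confident]
    if len(idx) == 0:
        break
    y[idx] = model.predict(X[idx])        # attach pseudo-labels
    labeled[idx] = True

print("Examples labelled after self-training:", labeled.sum())
```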
Advantages of Semi- Supervised Machine Learning
• It leads to better generalization as compared to supervised learning, as it
takes both labeled and unlabeled data.
• Can be applied to a wide range of data.
Disadvantages of Semi- Supervised Machine Learning
• Semi-supervised methods can be more complex to implement compared to
other approaches.
• It still requires some labeled data that might not always be available or easy
to obtain.
• If the unlabeled data is not representative, it can degrade model performance.
Applications of Semi-Supervised Learning
Here are some common applications of semi-supervised learning:
• Image Classification and Object Recognition: Improve the accuracy of models
by combining a small set of labeled images with a larger set of unlabeled
images.
• Natural Language Processing (NLP): Enhance the performance of language
models and classifiers by combining a small set of labeled text data with a
vast amount of unlabeled text.
• Speech Recognition: Improve the accuracy of speech recognition by leveraging
a limited amount of transcribed speech data and a more extensive set of
unlabeled audio.
• Recommendation Systems: Improve the accuracy of personalized
recommendations by supplementing a sparse set of user-item interactions
(labeled data) with a wealth of unlabeled user behavior data.
• Healthcare and Medical Imaging: Enhance medical image analysis by utilizing
a small set of labeled medical images alongside a larger set of unlabeled
images.
4. Reinforcement Machine Learning
Reinforcement learning is a learning method in which an agent interacts with the
environment by producing actions and discovering errors. Trial, error, and delayed
reward are the most relevant characteristics of reinforcement learning. In this technique, the model keeps
improving its performance using reward feedback to learn the behavior or pattern.
These algorithms are specific to a particular problem, e.g. the Google self-driving car, or
AlphaGo, where a bot competes with humans and even with itself to become a better and better
player of the game of Go. Each time we feed in data, the algorithms learn and add the data to their
knowledge, which becomes training data. So, the more they learn, the better trained and
hence more experienced they become.
Here are some of most common reinforcement learning algorithms:
• Q-learning: Q-learning is a model-free RL algorithm that learns a Q-function,
which maps states to actions. The Q-function estimates the expected reward
of taking a particular action in a given state.
• SARSA (State-Action-Reward-State-Action): SARSA is another model-free
RL algorithm that learns a Q-function. However, unlike Q-learning, SARSA
updates the Q-function for the action that was actually taken, rather than the
optimal action.
• Deep Q-learning: Deep Q-learning is a combination of Q-learning and deep
learning. Deep Q-learning uses a neural network to represent the Q-function,
which allows it to learn complex relationships between states and actions.
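To make the Q-learning update concrete, here is a minimal tabular sketch on a made-up chain environment; the environment, hyperparameters, and reward structure are assumptions used only to show the update rule.

```python
# Tabular Q-learning on a tiny chain: states 0..4, reward 1 for reaching state 4.
import numpy as np

n_states, n_actions = 5, 2          # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def step(state, action):
    """Move left/right along the chain; reaching the last state gives reward 1 and ends the episode."""
    next_state = max(0, min(n_states - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward, next_state == n_states - 1

for episode in range(500):
    state, done = 0, False
    while not done:
        # epsilon-greedy action selection
        action = np.random.randint(n_actions) if np.random.rand() < epsilon else Q[state].argmax()
        next_state, reward, done = step(state, action)
        # Q-learning update: move Q(s, a) towards reward + discounted best future value
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state

print(Q)   # the 'right' action should end up with the higher value in every state
```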

Reinforcement Machine Learning


Let’s understand it with the help of examples.
Example: Consider that you are training an AI agent to play a game like chess. The agent
explores different moves and receives positive or negative feedback based on the
outcome. Reinforcement learning also finds applications in robotics, where agents learn to perform
tasks by interacting with their surroundings.
Types of Reinforcement Machine Learning
There are two main types of reinforcement learning:
Positive reinforcement
• Rewards the agent for taking a desired action.
• Encourages the agent to repeat the behavior.
• Examples: Giving a treat to a dog for sitting, providing a point in a game for a
correct answer.
Negative reinforcement
• Removes an undesirable stimulus to encourage a desired behavior.
• Encourages the agent to repeat the behavior that removes the unpleasant condition.
• Examples: Turning off a loud buzzer when a lever is pressed, avoiding a
penalty by completing a task.
Advantages of Reinforcement Machine Learning
• It supports autonomous decision-making and is well-suited for tasks that
require learning a sequence of decisions, like robotics and game-playing.
• This technique is preferred for achieving long-term results that are otherwise very difficult
to achieve.
• It can be used to solve complex problems that cannot be solved by conventional
techniques.
Disadvantages of Reinforcement Machine Learning
• Training Reinforcement Learning agents can be computationally expensive and
time-consuming.
• Reinforcement learning is not preferable to solving simple problems.
• It needs a lot of data and a lot of computation, which makes it impractical and
costly.
Applications of Reinforcement Machine Learning
Here are some applications of reinforcement learning:
• Game Playing: RL can teach agents to play games, even complex ones.
• Robotics: RL can teach robots to perform tasks autonomously.
• Autonomous Vehicles: RL can help self-driving cars navigate and make
decisions.
• Recommendation Systems: RL can enhance recommendation algorithms by
learning user preferences.
• Healthcare: RL can be used to optimize treatment plans and drug discovery.
• Natural Language Processing (NLP): RL can be used in dialogue systems and
chatbots.
• Finance and Trading: RL can be used for algorithmic trading.
• Supply Chain and Inventory Management: RL can be used to optimize supply
chain operations.
• Energy Management: RL can be used to optimize energy consumption.
• Game AI: RL can be used to create more intelligent and adaptive NPCs in video
games.
• Adaptive Personal Assistants: RL can be used to improve personal assistants.
• Virtual Reality (VR) and Augmented Reality (AR): RL can be used to create
immersive and interactive experiences.
• Industrial Control: RL can be used to optimize industrial processes.
• Education: RL can be used to create adaptive learning systems.
• Agriculture: RL can be used to optimize agricultural operations.

Ques 5: What is the goal of Support Vector Machine? How to compute the margin?

Answer : Support Vector Machine


Support Vector Machine (SVM) is a supervised machine learning algorithm used for both
classification and regression. Though it can handle regression problems as well, it is best suited
for classification. The main objective of the SVM algorithm is to find the
optimal hyperplane in an N-dimensional space that can separate the data points of
different classes in the feature space. The hyperplane is chosen so that the margin between the
closest points of different classes is as large as possible. The dimension of
the hyperplane depends upon the number of features. If the number of input features is
two, then the hyperplane is just a line. If the number of input features is three, then the
hyperplane becomes a 2-D plane. It becomes difficult to imagine when the number of
features exceeds three.
Let’s consider two independent variables x1, x2, and one dependent variable which is
either a blue circle or a red circle.

Linearly Separable Data points

From the figure above it’s very clear that there are multiple lines (our hyperplane here is a
line because we are considering only two input features x1, x2) that segregate our data
points or do a classification between red and blue circles. So how do we choose the best
line or in general the best hyperplane that segregates our data points?
How does SVM work?
One reasonable choice as the best hyperplane is the one that represents the largest
separation or margin between the two classes.
Multiple hyperplanes separate the data from two classes

So we choose the hyperplane whose distance from it to the nearest data point on each
side is maximized. If such a hyperplane exists it is known as the maximum-margin
hyperplane/hard margin. So from the above figure, we choose L2. Let’s consider a
scenario like shown below

Selecting hyperplane for data with outlier

Here we have one blue ball in the boundary of the red balls. So how does SVM classify the
data? It's simple! The blue ball in the boundary of the red ones is an outlier of the blue balls. The
SVM algorithm has the ability to ignore the outlier and find the best hyperplane
that maximizes the margin. SVM is robust to outliers.

Hyperplane which is the most optimized one

For this kind of data, SVM finds the maximum margin as it did for the
previous data sets, and in addition it adds a penalty each time a point crosses the margin.
The margins in these cases are called soft margins. When there is a soft
margin, the SVM tries to minimize (1/margin + λ·∑penalty). Hinge loss is a
commonly used penalty: if there are no violations, the hinge loss is zero; if there are violations,
the hinge loss is proportional to the distance of the violation.
Till now, we were talking about linearly separable data(the group of blue balls and red
balls are separable by a straight line/linear line). What to do if data are not linearly
separable?
Original 1D dataset for classification

Say our data is as shown in the figure above. SVM solves this by creating a new variable
using a kernel. We take a point xi on the line and create a new variable yi as a function
of its distance from the origin o. If we plot this, we get something like what is shown below.

Mapping 1D data to 2D to become able to separate the two classes

In this case, the new variable y is created as a function of distance from the origin. A non-
linear function that creates a new variable is referred to as a kernel.
Support Vector Machine Terminology
1. Hyperplane: Hyperplane is the decision boundary that is used to separate the
data points of different classes in a feature space. In the case of linear
classifications, it will be a linear equation i.e. wx+b = 0.
2. Support Vectors: Support vectors are the closest data points to the
hyperplane, which play a critical role in deciding the hyperplane and margin.
3. Margin: Margin is the distance between the support vector and hyperplane.
The main objective of the support vector machine algorithm is to maximize the
margin. The wider margin indicates better classification performance.
4. Kernel: Kernel is the mathematical function, which is used in SVM to map the
original input data points into high-dimensional feature spaces, so, that the
hyperplane can be easily found out even if the data points are not linearly
separable in the original input space. Some of the common kernel functions are
linear, polynomial, radial basis function(RBF), and sigmoid.
5. Hard Margin: The maximum-margin hyperplane or the hard margin
hyperplane is a hyperplane that properly separates the data points of different
categories without any misclassifications.
6. Soft Margin: When the data is not perfectly separable or contains outliers,
SVM permits a soft margin technique. Each data point has a slack variable
introduced by the soft-margin SVM formulation, which softens the strict
margin requirement and permits certain misclassifications or violations. It
discovers a compromise between increasing the margin and reducing
violations.
7. C: Margin maximisation and misclassification fines are balanced by the
regularisation parameter C in SVM. The penalty for going over the margin or
misclassifying data items is decided by it. A stricter penalty is imposed with a
greater value of C, which results in a smaller margin and perhaps fewer
misclassifications.
8. Hinge Loss: A typical loss function in SVMs is hinge loss. It punishes incorrect
classifications or margin violations. The objective function in SVM is frequently
formed by combining it with the regularisation term.
9. Dual Problem: A dual Problem of the optimisation problem that requires
locating the Lagrange multipliers related to the support vectors can be used to
solve SVM. The dual formulation enables the use of kernel tricks and more
effective computing.
Mathematical intuition of Support Vector Machine
Consider a binary classification problem with two classes, labeled as +1 and -1. We have
a training dataset consisting of input feature vectors X and their corresponding class
labels Y.
The equation for the linear hyperplane can be written as:
$w^T x + b = 0$
The vector w represents the normal vector to the hyperplane, i.e. the direction
perpendicular to the hyperplane. The parameter b in the equation represents the offset or
distance of the hyperplane from the origin along the normal vector w.
The distance between a data point x_i and the decision boundary can be calculated as:
$d_i = \frac{w^T x_i + b}{\|w\|}$
where $\|w\|$ represents the Euclidean norm of the weight vector w (the normal vector to the hyperplane).
For the linear SVM classifier:
$\hat{y} = \begin{cases} 1 & \text{if } w^T x + b \ge 0 \\ 0 & \text{if } w^T x + b < 0 \end{cases}$
Optimization:
• For Hard margin linear SVM classifier:
$\min_{w,b} \frac{1}{2} w^T w = \min_{w,b} \frac{1}{2} \|w\|^2 \quad \text{subject to } y_i (w^T x_i + b) \ge 1 \text{ for } i = 1, 2, 3, \dots, m$
The target variable or label for the i-th training instance is denoted by the symbol $t_i$ in this
statement, with $t_i = -1$ for negative instances (when $y_i = 0$) and $t_i = 1$ for positive instances
(when $y_i = 1$). This is because we require a decision boundary that satisfies the
constraint: $t_i (w^T x_i + b) \ge 1$
• For Soft margin linear SVM classifier:
$\min_{w,b} \frac{1}{2} w^T w + C \sum_{i=1}^{m} \zeta_i \quad \text{subject to } y_i (w^T x_i + b) \ge 1 - \zeta_i \text{ and } \zeta_i \ge 0 \text{ for } i = 1, 2, 3, \dots, m$
• Dual Problem: SVM can also be solved via the dual of the optimisation problem, which requires
finding the Lagrange multipliers associated with the support vectors. The optimal
Lagrange multipliers $\alpha_i$ are those that maximize the following dual objective
function:
$\max_{\alpha} \; \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j t_i t_j K(x_i, x_j)$
where,
• αi is the Lagrange multiplier associated with the ith training sample.
• K(xi, xj) is the kernel function that computes the similarity between two
samples xi and xj. It allows SVM to handle nonlinear classification problems by
implicitly mapping the samples into a higher-dimensional feature space.
• The term ∑αi represents the sum of all Lagrange multipliers.
The SVM decision boundary can be described in terms of these optimal Lagrange
multipliers and the support vectors once the dual problem has been solved and the optimal
Lagrange multipliers have been found. The training samples that have $\alpha_i > 0$ are the
support vectors, while the decision function is given by:
$f(x) = \sum_{i=1}^{m} \alpha_i t_i K(x_i, x) + b, \qquad t_i (w^T x_i - b) = 1 \iff b = w^T x_i - t_i$
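To make the margin computation concrete, here is a small numerical sketch of the distance formula above, together with the standard hard-margin width 2/||w||; the weight vector, bias, and data point are assumed toy values, not outputs of a trained model.

```python
# Numerical illustration of point-to-hyperplane distance and the hard-margin width.
import numpy as np

w = np.array([2.0, 1.0])        # assumed normal vector of a learned hyperplane
b = -3.0                        # assumed bias term
x = np.array([1.0, 2.0])        # an arbitrary data point

distance = abs(w @ x + b) / np.linalg.norm(w)   # d_i = |w^T x_i + b| / ||w||
margin = 2.0 / np.linalg.norm(w)                # width between the two support hyperplanes

print("Distance of x to the hyperplane:", distance)
print("Margin (2 / ||w||):", margin)
```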
Types of Support Vector Machine
Based on the nature of the decision boundary, Support Vector Machines (SVM) can be
divided into two main parts:
• Linear SVM: Linear SVMs use a linear decision boundary to separate the data
points of different classes. When the data can be precisely linearly separated,
linear SVMs are very suitable. This means that a single straight line (in 2D) or a
hyperplane (in higher dimensions) can entirely divide the data points into their
respective classes. A hyperplane that maximizes the margin between the
classes is the decision boundary.
• Non-Linear SVM: Non-Linear SVM can be used to classify data when it cannot
be separated into two classes by a straight line (in the case of 2D). By using
kernel functions, nonlinear SVMs can handle nonlinearly separable data. The
original input data is transformed by these kernel functions into a higher-
dimensional feature space, where the data points can be linearly separated. A
linear SVM is used to locate a nonlinear decision boundary in this modified
space.
Popular kernel functions in SVM
The SVM kernel is a function that takes low-dimensional input space and transforms it
into higher-dimensional space, ie it converts nonseparable problems to separable
problems. It is mostly useful in non-linear separation problems. Simply put, the kernel
does some extremely complex data transformations and then finds out the process to
separate the data based on the labels or outputs defined.
Linear: $K(w, x) = w^T x + b$
Polynomial: $K(w, x) = (\gamma w^T x + b)^N$
Gaussian RBF: $K(w, x) = \exp(-\gamma \|x_i - x_j\|^n)$
Sigmoid: $K(x_i, x_j) = \tanh(\alpha x_i^T x_j + b)$
Advantages of SVM
• Effective in high-dimensional cases.
• Its memory is efficient as it uses a subset of training points in the decision
function called support vectors.
• Different kernel functions can be specified for the decision function, and it is
possible to specify custom kernels.
SVM implementation in Python
Predict whether a tumour is benign or malignant. Using historical data about patients diagnosed
with cancer enables doctors to differentiate malignant cases from benign ones, given a set of
independent attributes.
Steps
• Load the breast cancer dataset from sklearn.datasets
• Separate input features and target variables.
• Build and train the SVM classifier using the RBF kernel.
• Plot the scatter plot of the input features.
• Plot the decision boundary.
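A hedged sketch of these steps is shown below; it assumes scikit-learn, NumPy, and Matplotlib are installed, and it restricts the data to the first two features so the decision boundary can be drawn in 2-D.

```python
# SVM with an RBF kernel on the breast cancer dataset, following the steps above.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC

# Load the dataset and keep the first two features so the boundary can be plotted
data = load_breast_cancer()
X, y = data.data[:, :2], data.target

# Build and train the SVM classifier with an RBF kernel
clf = SVC(kernel="rbf", gamma="scale", C=1.0)
clf.fit(X, y)

# Scatter plot of the two input features, coloured by class
plt.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm", s=15)

# Plot the decision boundary on a grid covering the data
xx, yy = np.meshgrid(np.linspace(X[:, 0].min(), X[:, 0].max(), 200),
                     np.linspace(X[:, 1].min(), X[:, 1].max(), 200))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.2, cmap="coolwarm")
plt.xlabel(data.feature_names[0])
plt.ylabel(data.feature_names[1])
plt.title("SVM (RBF kernel) on two breast cancer features")
plt.show()
```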
Ques 6: What is K-means algorithms? Discuss in details?
Answer

What is K-means Clustering?


Unsupervised Machine Learning is the process of teaching a computer to use unlabeled,
unclassified data and enabling the algorithm to operate on that data without supervision.
Without any previous data training, the machine’s job in this case is to organize unsorted
data according to parallels, patterns, and variations.
K-means clustering assigns data points to one of K clusters depending on their
distance from the centers of the clusters. It starts by randomly placing the cluster
centroids in the space. Each data point is then assigned to one of the clusters based on its
distance from the centroid of that cluster. After assigning each point to one of the clusters, new
cluster centroids are computed. This process runs iteratively until it finds good clusters. In
this analysis we assume that the number of clusters is given in advance and we have to put
the points into one of the groups.
In some cases, K is not clearly defined, and we have to think about the optimal number of
clusters K. K-means clustering performs best when the data is well separated. When data points overlap,
this clustering is not suitable. K-means is faster compared to other clustering techniques.
It provides strong coupling between the data points. K-means clustering does not provide clear
information regarding the quality of the clusters. Different initial assignments of the cluster
centroids may lead to different clusters. Also, the K-means algorithm is sensitive to noise and
may get stuck in local minima.
How k-means clustering works?
We are given a data set of items, with certain features, and values for these features (like
a vector). The task is to categorize those items into groups. To achieve this, we will use
the K-means algorithm, an unsupervised learning algorithm. ‘K’ in the name of the
algorithm represents the number of groups/clusters we want to classify our items into.
(It will help if you think of items as points in an n-dimensional space). The algorithm will
categorize the items into k groups or clusters of similarity. To calculate that similarity, we
will use the Euclidean distance as a measurement.
The algorithm works as follows:
1. First, we randomly initialize k points, called means or cluster centroids.
2. We categorize each item to its closest mean, and we update the mean’s
coordinates, which are the averages of the items categorized in that cluster so
far.
3. We repeat the process for a given number of iterations and at the end, we
have our clusters.
The “points” mentioned above are called means because they are the mean values of the
items categorized in them. To initialize these means, we have a lot of options. An intuitive
method is to initialize the means at random items in the data set. Another method is to
initialize the means at random values between the boundaries of the data set (if for a
feature x, the items have values in [0,3], we will initialize the means with values for x at
[0,3]).
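The following minimal from-scratch sketch follows the steps above (random initialisation at data items, Euclidean-distance assignment, mean update over a fixed number of iterations); the 2-D data are synthetic and chosen only for illustration.

```python
# From-scratch K-means following the algorithm described above.
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 2-D data: three loose blobs
X = np.vstack([rng.normal(loc, 0.5, size=(50, 2)) for loc in ([0, 0], [5, 5], [0, 5])])

k, n_iters = 3, 10
centroids = X[rng.choice(len(X), k, replace=False)]   # 1. initialise the means at random items

for _ in range(n_iters):
    # 2. assign each item to its closest mean (Euclidean distance)
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    # update each mean to the average of the items assigned to it
    centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])

print("Final cluster centroids:\n", centroids)
```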
Ques 7: Describe how principal component analysis is carried out to reduce dimensionality
of data sets?
Answer: As the number of features or dimensions in a dataset increases, the amount of
data required to obtain a statistically significant result increases exponentially. This can
lead to issues such as overfitting, increased computation time, and reduced accuracy of
machine learning models this is known as the curse of dimensionality problems that arise
while working with high-dimensional data.
As the number of dimensions increases, the number of possible combinations of features
increases exponentially, which makes it computationally difficult to obtain a
representative sample of the data, and it becomes expensive to perform tasks such as
clustering or classification. Additionally, some machine
learning algorithms can be sensitive to the number of dimensions, requiring more data to
achieve the same level of accuracy as with lower-dimensional data.
To address the curse of dimensionality, Feature engineering techniques are used which
include feature selection and feature extraction. Dimensionality reduction is a type of
feature extraction technique that aims to reduce the number of input features while
retaining as much of the original information as possible.
In this article, we will discuss one of the most popular dimensionality reduction
techniques i.e. Principal Component Analysis(PCA).
What is Principal Component Analysis(PCA)?
Principal Component Analysis(PCA) technique was introduced by the mathematician Karl
Pearson in 1901. It works on the condition that while the data in a higher dimensional
space is mapped to data in a lower dimension space, the variance of the data in the lower
dimensional space should be maximum.
• Principal Component Analysis (PCA) is a statistical procedure that uses an
orthogonal transformation to convert a set of correlated variables into a set of
uncorrelated variables. PCA is the most widely used tool in exploratory data
analysis and in machine learning for predictive models.
• Principal Component Analysis (PCA) is an unsupervised learning algorithm
technique used to examine the interrelations among a set of variables. It is also
known as a general factor analysis where regression determines a line of best
fit.
• The main goal of Principal Component Analysis (PCA) is to reduce the
dimensionality of a dataset while preserving the most important patterns or
relationships between the variables without any prior knowledge of the target
variables.
Principal Component Analysis (PCA) is used to reduce the dimensionality of a data set by
finding a new set of variables, smaller than the original set of variables, retaining most of
the sample’s information, and useful for the regression and classification of data.

1. Principal Component Analysis (PCA) is a technique for dimensionality


reduction that identifies a set of orthogonal axes, called principal components,
that capture the maximum variance in the data. The principal components are
linear combinations of the original variables in the dataset and are ordered in
decreasing order of importance. The total variance captured by all the principal
components is equal to the total variance in the original dataset.
2. The first principal component captures the most variation in the data, but the
second principal component captures the maximum variance that
is orthogonal to the first principal component, and so on.
3. Principal Component Analysis can be used for a variety of purposes, including
data visualization, feature selection, and data compression. In data
visualization, PCA can be used to plot high-dimensional data in two or three
dimensions, making it easier to interpret. In feature selection, PCA can be used
to identify the most important variables in a dataset. In data compression, PCA
can be used to reduce the size of a dataset without losing important
information.
4. In Principal Component Analysis, it is assumed that the information is carried in
the variance of the features, that is, the higher the variation in a feature, the
more information that features carries.
Overall, PCA is a powerful tool for data analysis and can help to simplify complex
datasets, making them easier to understand and work with.
Step-By-Step Explanation of PCA (Principal Component Analysis)
Step 1: Standardization
First, we need to standardize our dataset to ensure that each variable has a mean of 0
and a standard deviation of 1.
$Z = \frac{X - \mu}{\sigma}$
Here,
• $\mu$ is the mean of the independent features, $\mu = \{\mu_1, \mu_2, \cdots, \mu_m\}$
• $\sigma$ is the standard deviation of the independent features, $\sigma = \{\sigma_1, \sigma_2, \cdots, \sigma_m\}$
Step2: Covariance Matrix Computation
Covariance measures the strength of joint variability between two or more variables,
indicating how much they change in relation to each other. To find the covariance we can
use the formula:
$\mathrm{cov}(x_1, x_2) = \frac{\sum_{i=1}^{n} (x_{1i} - \bar{x}_1)(x_{2i} - \bar{x}_2)}{n - 1}$
The value of covariance can be positive, negative, or zeros.
• Positive: As the x1 increases x2 also increases.
• Negative: As the x1 increases x2 also decreases.
• Zeros: No direct relation
Step 3: Compute Eigenvalues and Eigenvectors of Covariance Matrix to Identify Principal
Components
Let A be a square n×n matrix and X be a non-zero vector for which
$AX = \lambda X$
for some scalar value $\lambda$. Then $\lambda$ is known as an eigenvalue of matrix A and X is known
as the eigenvector of matrix A for the corresponding eigenvalue.
It can also be written as:
$AX - \lambda X = 0 \implies (A - \lambda I)X = 0$
where I is the identity matrix of the same shape as matrix A. The above condition
holds for a non-zero X only if $(A - \lambda I)$ is non-invertible (i.e. a singular matrix). That means,
$|A - \lambda I| = 0$
From the above equation, we can find the eigenvalues $\lambda$, and the corresponding
eigenvectors can then be found using the equation $AX = \lambda X$.
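The following sketch works through the steps above with NumPy on synthetic data (standardise, compute the covariance matrix, take its eigen-decomposition, and project onto the top components); it is an illustrative sketch, not a definitive implementation.

```python
# From-scratch PCA following the standardise -> covariance -> eigen-decomposition -> project steps.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=100)   # make one feature correlated with another

# Step 1: standardisation  Z = (X - mu) / sigma
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: covariance matrix of the standardised data
cov = np.cov(Z, rowvar=False)

# Step 3: eigenvalues and eigenvectors of the covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]            # sort by decreasing variance captured
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Keep the top 2 principal components and project the data onto them
X_reduced = Z @ eigenvectors[:, :2]

print("Fraction of variance captured:", eigenvalues[:2] / eigenvalues.sum())
print("Reduced data shape:", X_reduced.shape)
```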

Ques 8: Write short notes on reinforcement learning?


Answer: Reinforcement learning is an area of Machine Learning. It is about taking suitable
action to maximize reward in a particular situation. It is employed by various software and
machines to find the best possible behavior or path it should take in a specific situation.
Reinforcement learning differs from supervised learning in a way that in supervised
learning the training data has the answer key with it so the model is trained with the correct
answer itself whereas in reinforcement learning, there is no answer but the reinforcement
agent decides what to do to perform the given task. In the absence of a training dataset, it
is bound to learn from its experience.
Reinforcement Learning (RL) is the science of decision making. It is about learning the
optimal behavior in an environment to obtain maximum reward. In RL, the data is
accumulated from machine learning systems that use a trial-and-error method. Data is not
part of the input that we would find in supervised or unsupervised machine learning.
Reinforcement learning uses algorithms that learn from outcomes and decide which action
to take next. After each action, the algorithm receives feedback that helps it determine
whether the choice it made was correct, neutral or incorrect. It is a good technique to use
for automated systems that have to make a lot of small decisions without human guidance.
Reinforcement learning is an autonomous, self-teaching system that essentially learns by
trial and error. It performs actions with the aim of maximizing rewards, or in other words, it
is learning by doing in order to achieve the best outcomes.
Example:
The problem is as follows: We have an agent and a reward, with many hurdles in between.
The agent is supposed to find the best possible path to reach the reward. The following
example explains the problem more easily.

The example involves a robot, a diamond, and fire. The goal of the robot is to get the
reward, which is the diamond, while avoiding the hurdles, which are the fire cells. The robot learns by trying
all the possible paths and then choosing the path which gives it the reward with the
fewest hurdles. Each right step gives the robot a reward and each wrong step
subtracts from the robot's reward. The total reward is calculated when it reaches the
final reward, that is, the diamond.
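A tiny sketch of this robot-and-diamond idea is shown below; the grid layout, reward values, and candidate paths are all made-up assumptions used only to show how a path's total reward is scored.

```python
# Scoring candidate paths on a toy grid: the diamond is rewarded, fire and wasted steps are penalised.
GRID = [
    ["start", ".",    "fire"],
    [".",     "fire", "."],
    [".",     ".",    "diamond"],
]
REWARD = {"diamond": 10, "fire": -10, ".": -1, "start": -1}   # each ordinary step costs a little

def path_reward(path):
    """Total reward of a path given as a list of (row, col) cells."""
    return sum(REWARD[GRID[r][c]] for r, c in path)

# Two candidate paths from the start (0, 0) to the diamond (2, 2)
safe_path = [(1, 0), (2, 0), (2, 1), (2, 2)]
risky_path = [(0, 1), (1, 1), (2, 1), (2, 2)]   # passes through fire

print("Safe path reward :", path_reward(safe_path))    # higher total reward
print("Risky path reward:", path_reward(risky_path))   # penalised by the fire cell
```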
Main points in Reinforcement learning –

• Input: The input should be an initial state from which the model will start.
• Output: There are many possible outputs, as there are a variety of solutions to
a particular problem.
• Training: The training is based upon the input; the model returns a state,
and the user decides whether to reward or punish the model based on its output.
• The model continues to learn.
• The best solution is decided based on the maximum reward.
Difference between Reinforcement learning and Supervised learning:

Reinforcement learning | Supervised learning
Reinforcement learning is all about making decisions sequentially; in simple words, the output depends on the state of the current input, and the next input depends on the output of the previous input. | In supervised learning, the decision is made on the initial input or the input given at the start.
In reinforcement learning the decisions are dependent, so we give labels to sequences of dependent decisions. | In supervised learning the decisions are independent of each other, so labels are given to each decision.
Example: Chess game, text summarization | Example: Object recognition, spam detection

Types of Reinforcement:
There are two types of Reinforcement:
1. Positive: Positive Reinforcement is defined as when an event, occurs due to a
particular behavior, increases the strength and the frequency of the behavior.
In other words, it has a positive effect on behavior.
Advantages of positive reinforcement:
• Maximizes performance
• Sustains change for a long period of time
A drawback is that too much reinforcement can lead to an overload of states,
which can diminish the results.
2. Negative: Negative Reinforcement is defined as strengthening of behavior
because a negative condition is stopped or avoided.
Advantages of negative reinforcement:
• Increases behavior
• Provides defiance to a minimum standard of performance
A drawback is that it only provides enough to meet the minimum behavior.
Elements of Reinforcement Learning
Reinforcement learning elements are as follows:
1. Policy
2. Reward function
3. Value function
4. Model of the environment
Policy: A policy defines the learning agent's behavior at a given time. It is a mapping
from perceived states of the environment to actions to be taken when in those states.
Reward function: The reward function is used to define a goal in a reinforcement learning
problem. A reward function provides a numerical score based on the
state of the environment.
Value function: Value functions specify what is good in the long run. The value of a state
is the total amount of reward an agent can expect to accumulate over the future, starting
from that state.
Model of the environment: Models are used for planning.
• Credit assignment problem: Reinforcement learning algorithms learn to
generate an internal value for the intermediate states as to how good they are
in leading to the goal. The learning decision maker is called the agent. The
agent interacts with the environment that includes everything outside the
agent.
The agent has sensors to decide on its state in the environment and takes action that
modifies its state.
• The reinforcement learning problem is modelled as an agent continuously interacting
with an environment. The agent and the environment interact in a sequence of
time steps. At each time step t, the agent receives the state of the environment
and a scalar numerical reward for the previous action, and then selects an action.
Reinforcement learning is a technique for solving Markov decision problems.
• Reinforcement learning uses a formal framework defining the interaction
between a learning agent and its environment in terms of states, actions, and
rewards. This framework is intended to be a simple way of representing
essential features of the artificial intelligence problem.
Various Practical Applications of Reinforcement Learning –

• RL can be used in robotics for industrial automation.


• RL can be used in machine learning and data processing
• RL can be used to create training systems that provide custom instruction and
materials according to the requirement of students.
Application of Reinforcement Learnings
1. Robotics: Robots with pre-programmed behavior are useful in structured environments,
such as the assembly line of an automobile manufacturing plant, where the task is
repetitive in nature.
2. A master chess player makes a move. The choice is informed both by planning,
anticipating possible replies and counter replies.
3. An adaptive controller adjusts parameters of a petroleum refinery’s operation in real
time.
RL can be used in large environments in the following situations:

1. A model of the environment is known, but an analytic solution is not available;


2. Only a simulation model of the environment is given (the subject of simulation-
based optimization)
3. The only way to collect information about the environment is to interact with it.
Advantages and Disadvantages of Reinforcement Learning
Advantages of Reinforcement learning
1. Reinforcement learning can be used to solve very complex problems that cannot be
solved by conventional techniques.
2. The model can correct the errors that occurred during the training process.
3. In RL, training data is obtained via the direct interaction of the agent with the
environment
4. Reinforcement learning can handle environments that are non-deterministic, meaning
that the outcomes of actions are not always predictable. This is useful in real-world
applications where the environment may change over time or is uncertain.
5. Reinforcement learning can be used to solve a wide range of problems, including those
that involve decision making, control, and optimization.
6. Reinforcement learning is a flexible approach that can be combined with other machine
learning techniques, such as deep learning, to improve performance.
Disadvantages of Reinforcement learning
1. Reinforcement learning is not preferable to use for solving simple problems.
2. Reinforcement learning needs a lot of data and a lot of computation
3. Reinforcement learning is highly dependent on the quality of the reward function. If the
reward function is poorly designed, the agent may not learn the desired behavior.
4. Reinforcement learning can be difficult to debug and interpret. It is not always clear
why the agent is behaving in a certain way, which can make it difficult to diagnose and fix
problems.
