Sem 6 Ques Data Science
Ques 1: Distinguish between skewness and kurtosis and bring out their importance in
describing a frequency distribution.
Skewness | Kurtosis
Measures the asymmetry of a probability distribution. | Measures the tailedness (often loosely called peakedness) of a probability distribution.
Positive skew indicates a right-skewed distribution, with the tail extending to the right. | Positive excess kurtosis (kurtosis above 3) indicates a distribution with heavier tails, often referred to as “leptokurtic”.
Negative skew indicates a left-skewed distribution, with the tail extending to the left. | Negative excess kurtosis (kurtosis below 3) indicates a distribution with lighter tails, often referred to as “platykurtic”.
A skewness value of zero is what a symmetric distribution has. | An excess kurtosis of zero indicates tails similar to the normal distribution, often referred to as “mesokurtic”.
Used to identify the direction and degree of asymmetry. | Used to identify the presence of outliers or extreme values.
Sensitive to which tail of the distribution is longer or heavier. | Sensitive mainly to how heavy the tails are relative to the center and shoulders of the distribution.
Commonly used in fields such as economics, finance, and social sciences. | Commonly used in statistics, engineering, and physical sciences.
Examples: income distribution, stock returns. | Examples: particle physics, image processing.
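As a rough illustration of the two measures, the sketch below computes skewness and excess kurtosis from the central moments of a sample. The function name and the two small samples are made up for this example, and population (biased) moments are assumed; statistical packages may apply small-sample corrections and give slightly different numbers.

```python
def skewness_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) using population central moments."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    # Subtract 3 so that a normal distribution scores 0 (mesokurtic).
    excess_kurt = m4 / m2 ** 2 - 3.0
    return skew, excess_kurt


symmetric = [1, 2, 3, 4, 5]            # hypothetical symmetric sample
right_skewed = [1, 1, 2, 2, 3, 10]     # hypothetical sample with a long right tail

print(skewness_and_excess_kurtosis(symmetric))
print(skewness_and_excess_kurtosis(right_skewed))
```

The symmetric sample gives a skewness of zero and a negative excess kurtosis (flat, light-tailed), while the second sample gives a positive skewness because of the single large value on the right.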
Ques 2: What is correlation? Explain the different types of correlation with suitable examples.
Correlation in Statistics
This section shows how to calculate and interpret correlation coefficients for
ordinal and interval level scales. Methods of correlation summarize the
relationship between two variables in a single number called the correlation
coefficient. The correlation coefficient is usually represented using the symbol r,
and it ranges from -1 to +1.
Correlation Coefficient
The two variables are often given the symbols X and Y. In order to illustrate how
the two variables are related, the values of X and Y are pictured by drawing the
scatter diagram, graphing combinations of the two variables. The scatter
diagram is given first, and then the method of determining Pearson’s r is
presented. The first examples use relatively small samples; data from larger samples
are given later.
Scatter Diagram
A scatter diagram is a diagram that shows the values of two variables X and Y,
along with the way in which these two variables relate to each other. The values of
variable X are given along the horizontal axis, with the values of the variable Y
given on the vertical axis.
Later, when the regression model is used, one of the variables is defined as an
independent variable, and the other is defined as a dependent variable. In
regression, the independent variable X is considered to have some effect or
influence on the dependent variable Y. Correlation methods are symmetric with
respect to the two variables, with no indication of causation or direction of
influence being part of the statistical consideration. A scatter diagram is given in
the following example. The same example is later used to determine the
correlation coefficient.
Types of Correlation
The scatter plot shows the correlation between two attributes or variables, that is,
how closely the two variables are connected. Three situations can arise when looking at
the relation between the two variables:
• Positive Correlation – when the values of the two variables move in the
same direction so that an increase/decrease in the value of one variable is
followed by an increase/decrease in the value of the other variable.
• Negative Correlation – when the values of the two variables move in the
opposite direction so that an increase/decrease in the value of one
variable is followed by decrease/increase in the value of the other variable.
• No Correlation – when there is no linear dependence or no relation
between the two variables.
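The three cases can be checked numerically with Pearson's r. The sketch below is only illustrative: the function name and all four small data sets are invented for this example and do not come from any real study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


hours_studied = [1, 2, 3, 4, 5]        # hypothetical X
exam_score = [52, 58, 61, 70, 74]      # rises with X: positive correlation
hours_of_tv = [9, 7, 6, 4, 2]          # falls as X rises: negative correlation
shoe_size = [7, 9, 6, 9, 7]            # no linear relation with X

print(pearson_r(hours_studied, exam_score))
print(pearson_r(hours_studied, hours_of_tv))
print(pearson_r(hours_studied, shoe_size))
```

The first value is close to +1, the second close to -1, and the third is zero (up to rounding), matching the positive, negative, and no-correlation cases.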
Ques 3: Discuss linear regression with an example.
Answer: Linear regression is a type of machine-learning algorithm, more specifically
a supervised machine-learning algorithm, that learns from labelled datasets and
maps the data points to the best-fitting linear function, which can then be used for
prediction on new datasets.
First, we should know what a supervised machine learning algorithm is. It is a type of
machine learning where the algorithm learns from labelled data. Labelled data means a
dataset whose respective target values are already known. Supervised learning has two
types:
• Classification: It predicts the class of a data point based on the independent
input variables. A class is a categorical or discrete value, for example whether
the image of an animal is a cat or a dog.
• Regression: It predicts a continuous output variable based on the
independent input variables, for example the prediction of house prices based
on parameters such as house age, distance from the main road, location,
area, etc.
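A small worked example may make the regression case concrete. The sketch below fits a straight line by ordinary least squares with a single input feature. The function name and the house data are invented for illustration (the prices were chosen to lie exactly on a line), so it should be read as a sketch rather than a realistic model.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labelled data: house area (input) and price (known target).
area = [50, 80, 100, 120, 150]
price = [110, 170, 210, 250, 310]

slope, intercept = fit_line(area, price)
predicted_price = slope * 90 + intercept   # prediction for a new, unseen house
print(slope, intercept, predicted_price)
```

Here the learned function is price = 2 * area + 10, so a house of area 90 is predicted at 190. With real data the points would not fall exactly on the line, and the fit would minimise the sum of squared errors instead.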
Ques 4: Define machine learning. Explain the types of machine learning with suitable
examples.
Answer: Machine learning is the branch of Artificial Intelligence that focuses on developing
models and algorithms that let computers learn from data and improve from previous experience
without being explicitly programmed for every task. In simple words, ML teaches the systems to
think and understand like humans by learning from the data.
Supervised Learning
Ques 5: What is the goal of Support Vector Machine? How to compute the margin?
From the figure above it’s very clear that there are multiple lines (our hyperplane here is a
line because we are considering only two input features x1, x2) that segregate our data
points or do a classification between red and blue circles. So how do we choose the best
line or in general the best hyperplane that segregates our data points?
How does SVM work?
One reasonable choice for the best hyperplane is the one that gives the largest
separation, or margin, between the two classes.
Figure: Multiple hyperplanes separate the data from two classes
So we choose the hyperplane for which the distance to the nearest data point on each
side is maximized. If such a hyperplane exists, it is known as the maximum-margin
hyperplane (hard margin). So from the above figure, we choose L2. Let’s consider a
scenario like the one shown below.
Here we have one blue ball inside the region of the red balls. So how does SVM classify
the data? The blue ball among the red ones is an outlier of the blue class. A soft-margin
SVM can tolerate such an outlier and still find the hyperplane that maximizes the margin
for the remaining points, which makes it reasonably robust to a few outliers.
For this kind of data, SVM finds the maximum margin as with the previous data sets, but
adds a penalty each time a point crosses the margin. The margins in such cases are called
soft margins. With a soft margin, the SVM tries to minimize (1/margin + λ(∑penalty)),
where λ weights the penalties. Hinge loss is a commonly used penalty: if there is no
violation, the hinge loss is zero; if there is a violation, the hinge loss is proportional
to the distance of the violation.
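The hinge penalty described above can be written in a couple of lines. In this sketch the function name is made up, the label is assumed to be +1 or -1, and the score is assumed to be the value of w·x + b for the point.

```python
def hinge_loss(label, score):
    """Hinge penalty for one point: label is +1 or -1, score is w.x + b."""
    return max(0.0, 1.0 - label * score)


print(hinge_loss(+1, 2.0))    # correct side, outside the margin: no penalty
print(hinge_loss(+1, 0.5))    # correct side, but inside the margin: small penalty
print(hinge_loss(+1, -1.0))   # wrong side of the hyperplane: larger penalty
```

The penalty is zero for points beyond the margin and grows linearly with how far a point intrudes into the margin or onto the wrong side.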
Till now, we were talking about linearly separable data (the group of blue balls and the
group of red balls can be separated by a straight line). What do we do if the data are not
linearly separable?
Figure: Original 1D dataset for classification
Say our data is as shown in the figure above. SVM solves this by creating a new variable
using a kernel. We take each point xi on the line and create a new variable yi as a
function of its distance from the origin o. If we plot this, we get something like what is
shown below.
In this case, the new variable y is created as a function of distance from the origin. A
non-linear function that creates a new variable is referred to as a kernel.
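The idea can be tried on a toy 1D data set. In the sketch below the points, the labels, and the helper name are all invented, and the new variable is assumed to be the squared distance from the origin (y = x²), which is one possible choice of such a function.

```python
def separable_by_threshold(values, labels):
    """True if a single cut-off puts all +1 points on one side and all -1 on the other."""
    pos = [v for v, lab in zip(values, labels) if lab == 1]
    neg = [v for v, lab in zip(values, labels) if lab == -1]
    return max(neg) < min(pos) or max(pos) < min(neg)


# Hypothetical 1D data: the -1 class sits near the origin, the +1 class on both sides.
xs = [-4, -3, -1, 0, 1, 3, 4]
labels = [1, 1, -1, -1, -1, 1, 1]

# New variable created from each point: squared distance from the origin.
ys = [x ** 2 for x in xs]

print(separable_by_threshold(xs, labels))   # original variable
print(separable_by_threshold(ys, labels))   # new variable
```

No single cut-off on x separates the classes, but on the new variable any threshold between 1 and 9 does, so the data become linearly separable after the transformation.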
Support Vector Machine Terminology
1. Hyperplane: The hyperplane is the decision boundary used to separate the
data points of different classes in a feature space. For linear classification
it is given by a linear equation, i.e. w^T x + b = 0.
2. Support Vectors: Support vectors are the data points closest to the
hyperplane; they play a critical role in determining the hyperplane and margin.
3. Margin: The margin is the distance between the hyperplane and the support
vectors. The main objective of the support vector machine algorithm is to
maximize the margin. A wider margin generally indicates better classification
performance on unseen data.
4. Kernel: The kernel is the mathematical function used in SVM to map the
original input data points into a high-dimensional feature space, so that the
hyperplane can be found even if the data points are not linearly
separable in the original input space. Some common kernel functions are
linear, polynomial, radial basis function (RBF), and sigmoid.
5. Hard Margin: The maximum-margin hyperplane or the hard margin
hyperplane is a hyperplane that properly separates the data points of different
categories without any misclassifications.
6. Soft Margin: When the data is not perfectly separable or contains outliers,
SVM permits a soft margin technique. Each data point has a slack variable
introduced by the soft-margin SVM formulation, which softens the strict
margin requirement and permits certain misclassifications or violations. It
discovers a compromise between increasing the margin and reducing
violations.
7. C: The regularisation parameter C balances margin maximisation against
misclassification penalties. It sets the penalty for crossing the margin or
misclassifying data points. A larger value of C imposes a stricter penalty,
which results in a narrower margin and typically fewer misclassifications on
the training data.
8. Hinge Loss: A typical loss function in SVMs is hinge loss. It penalises incorrect
classifications and margin violations. The objective function in SVM is frequently
formed by combining it with the regularisation term.
9. Dual Problem: The SVM optimisation problem can also be solved through its dual
problem, which requires finding the Lagrange multipliers associated with the
support vectors. The dual formulation enables the kernel trick and more
efficient computation.
Mathematical intuition of Support Vector Machine
Consider a binary classification problem with two classes, labeled as +1 and -1. We have
a training dataset consisting of input feature vectors X and their corresponding class
labels Y.
The equation for the linear hyperplane can be written as:
$$w^T x + b = 0$$
The vector w is the normal vector to the hyperplane, i.e. the direction
perpendicular to the hyperplane. The parameter b is the offset: the hyperplane lies at a
distance |b| / ||w|| from the origin, measured along the normal vector w.
The distance between a data point x_i and the decision boundary can be calculated as:
$$d_i = \frac{w^T x_i + b}{\|w\|}$$
where ||w|| represents the Euclidean norm of the weight vector w, the normal vector of
the hyperplane.
For Linear SVM classifier :
$$\hat{y} = \begin{cases} 1 & \text{if } w^T x + b \geq 0 \\ 0 & \text{if } w^T x + b < 0 \end{cases}$$
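The distance formula and the prediction rule can be checked with a few numbers. The weights, offset, and points below are arbitrary values chosen for illustration, and the function names are made up for this sketch.

```python
import math


def signed_distance(w, b, x):
    """Signed distance of point x from the hyperplane w.x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm = math.sqrt(sum(wi * wi for wi in w))   # Euclidean norm ||w||
    return score / norm


def predict(w, b, x):
    """Linear SVM prediction: 1 if w.x + b >= 0, otherwise 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else 0


w, b = [3.0, 4.0], -5.0            # hypothetical hyperplane 3*x1 + 4*x2 - 5 = 0

print(signed_distance(w, b, [3.0, 4.0]), predict(w, b, [3.0, 4.0]))
print(signed_distance(w, b, [0.0, 0.0]), predict(w, b, [0.0, 0.0]))
```

With ||w|| = 5, the point (3, 4) lies 4 units on the positive side and is predicted as class 1, while the origin lies 1 unit on the negative side and is predicted as class 0.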
Optimization:
• For Hard margin linear SVM classifier:
$$\underset{w,b}{\text{minimize}} \; \frac{1}{2} w^T w = \frac{1}{2}\|w\|^2 \quad \text{subject to } t_i (w^T x_i + b) \geq 1 \text{ for } i = 1, 2, 3, \dots, m$$
Here t_i denotes the target label of the ith training instance: t_i = -1 for negative
instances (when y_i = 0) and t_i = +1 for positive instances (when y_i = 1). The constraint
requires every training point to lie on the correct side of the decision boundary, at or
beyond the margin:
$$t_i (w^T x_i + b) \geq 1$$
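Since the question asks how to compute the margin, a short derivation may help. It is only a sketch and assumes the scaling used in the constraint above, under which the closest points (the support vectors, written x_s here) satisfy the constraint with equality.

```latex
t_s (w^T x_s + b) = 1
\;\Longrightarrow\;
d_s = \frac{|w^T x_s + b|}{\|w\|} = \frac{1}{\|w\|},
\qquad
\text{margin width} = 2\, d_s = \frac{2}{\|w\|}
```

So each class boundary lies at distance 1/||w|| from the hyperplane and the full gap between the two classes is 2/||w||. Maximizing the margin is therefore the same as minimizing ||w||, or equivalently (1/2)||w||^2. For example, w = (3, 4) has ||w|| = 5, giving a distance of 0.2 on each side and a total width of 0.4.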
• For Soft margin linear SVM classifier:
$$\underset{w,b}{\text{minimize}} \; \frac{1}{2} w^T w + C \sum_{i=1}^{m} \zeta_i \quad \text{subject to } t_i (w^T x_i + b) \geq 1 - \zeta_i \text{ and } \zeta_i \geq 0 \text{ for } i = 1, 2, 3, \dots, m$$
• Dual Problem: The SVM can also be solved through the dual of this optimisation
problem, which requires finding the Lagrange multipliers associated with the
training samples. The optimal Lagrange multipliers α(i) maximize the following
dual objective function:
$$\underset{\alpha}{\text{maximize}} \; \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j t_i t_j K(x_i, x_j) \quad \text{subject to } 0 \leq \alpha_i \leq C \text{ and } \sum_{i=1}^{m} \alpha_i t_i = 0$$
where,
• αi is the Lagrange multiplier associated with the ith training sample.
• K(xi, xj) is the kernel function that computes the similarity between two
samples xi and xj. It allows SVM to handle nonlinear classification problems by
implicitly mapping the samples into a higher-dimensional feature space.
• The term ∑αi represents the sum of all Lagrange multipliers.
Once the dual problem has been solved and the optimal Lagrange multipliers have been
found, the SVM decision boundary can be described in terms of these multipliers and the
support vectors. The training samples that have α_i > 0 are the support vectors, and the
decision function is given by:
$$f(x) = \sum_{i=1}^{m} \alpha_i t_i K(x_i, x) + b$$
For any support vector x_s that lies exactly on the margin, t_s f(x_s) = 1, which gives
the offset:
$$b = t_s - \sum_{i=1}^{m} \alpha_i t_i K(x_i, x_s)$$
In the linear case this reads t_s (w^T x_s + b) = 1 ⟺ b = t_s - w^T x_s, with
w = ∑ α_i t_i x_i.
Types of Support Vector Machine
Based on the nature of the decision boundary, Support Vector Machines (SVM) can be
divided into two main types:
• Linear SVM: Linear SVMs use a linear decision boundary to separate the data
points of different classes. When the data can be precisely linearly separated,
linear SVMs are very suitable. This means that a single straight line (in 2D) or a
hyperplane (in higher dimensions) can entirely divide the data points into their
respective classes. A hyperplane that maximizes the margin between the
classes is the decision boundary.
• Non-Linear SVM: A non-linear SVM can be used to classify data that cannot
be separated into two classes by a straight line (in the case of 2D). By using
kernel functions, non-linear SVMs can handle data that is not linearly separable.
These kernel functions transform the original input data into a
higher-dimensional feature space, where the data points can be linearly
separated. A linear SVM is then fitted in this transformed space, which
corresponds to a non-linear decision boundary in the original input space.
Popular kernel functions in SVM
The SVM kernel is a function that takes the low-dimensional input space and transforms it
into a higher-dimensional space, i.e. it converts problems that are not linearly separable
into separable ones. It is mostly useful in non-linear separation problems. Simply put,
the kernel performs a (possibly very complex) data transformation, after which the data
can be separated according to the defined labels or outputs.
Linear: $$K(x_i, x_j) = x_i^T x_j$$
Polynomial: $$K(x_i, x_j) = (\gamma x_i^T x_j + b)^N$$
Gaussian RBF: $$K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$$
Sigmoid: $$K(x_i, x_j) = \tanh(\alpha x_i^T x_j + b)$$
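The four kernels can be written out directly for two sample vectors. This is a plain-Python sketch using the commonly quoted forms of each kernel; the function names and the default parameter values (gamma, b, degree, alpha) are arbitrary choices made for illustration.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def linear_kernel(xi, xj):
    return dot(xi, xj)


def polynomial_kernel(xi, xj, gamma=1.0, b=1.0, degree=2):
    return (gamma * dot(xi, xj) + b) ** degree


def rbf_kernel(xi, xj, gamma=0.5):
    squared_distance = sum((p - q) ** 2 for p, q in zip(xi, xj))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(xi, xj, alpha=0.1, b=0.0):
    return math.tanh(alpha * dot(xi, xj) + b)


u, v = [1.0, 2.0], [3.0, 4.0]      # two hypothetical samples
print(linear_kernel(u, v))
print(polynomial_kernel(u, v))
print(rbf_kernel(u, v))
print(sigmoid_kernel(u, v))
```

Each function returns a single similarity score for the pair. The RBF kernel equals 1 for identical points and decays towards 0 as the points move apart, which is why it is often the default choice for non-linear problems.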
Advantages of SVM
• Effective in high-dimensional cases.
• It is memory efficient, as it uses only a subset of the training points (the
support vectors) in the decision function.
• Different kernel functions can be specified for the decision function, and it is
possible to specify custom kernels.
SVM implementation in Python
The task is to predict whether a tumour is benign or malignant. Historical data about
patients diagnosed with cancer lets doctors differentiate malignant cases from benign
ones based on a set of independent attributes.
Steps
• Load the breast cancer dataset from sklearn.datasets.
• Separate input features and target variables.
• Build and train the SVM classifier using the RBF kernel.
• Plot the scatter plot of the input features.
• Plot the decision boundary.
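The first three steps can be sketched with scikit-learn as follows. This is a minimal sketch, not the original implementation: it assumes scikit-learn is installed, keeps only the first two features (mean radius and mean texture) so that the result could be drawn in 2D, adds feature scaling and a train/test split as extra choices, and leaves out the two plotting steps to stay short.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Step 1: load the breast cancer dataset.
data = load_breast_cancer()

# Step 2: separate input features and target (0 = malignant, 1 = benign).
X = data.data[:, :2]          # assumption: use only the first two features
y = data.target

# Hold out part of the data to check the classifier on unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Step 3: build and train an SVM classifier with the RBF kernel.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Test accuracy on two features: {accuracy:.3f}")
```

Scaling is included because the RBF kernel depends on distances between points, so features measured on very different scales would otherwise dominate. Using all 30 features instead of two would normally improve accuracy, at the cost of no longer being able to plot the decision boundary directly.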
Ques 6: What is the K-means algorithm? Discuss in detail.
Answer
The above image shows a robot, a diamond, and fire. The goal of the robot is to get the
reward, that is the diamond, and avoid the hurdles, that is the fire. The robot learns by
trying all the possible paths and then choosing the path which gives it the reward with
the fewest hurdles. Each right step gives the robot a reward and each wrong step
subtracts from its reward. The total reward is calculated when it reaches the
final reward, the diamond.
Main points in Reinforcement learning –
• Input: The input should be an initial state from which the model will start
• Output: There are many possible outputs as there are a variety of solutions to
a particular problem
• Training: The training is based upon the input. The model will return a state,
and the user will decide to reward or punish the model based on its output.
• The model continues to learn.
• The best solution is decided based on the maximum reward.
Difference between Reinforcement learning and Supervised learning:
Reinforcement learning | Supervised learning
Example: Chess game, text summarization | Example: Object recognition, spam detection
Types of Reinforcement:
There are two types of Reinforcement:
1. Positive: Positive reinforcement occurs when an event that follows a
particular behavior increases the strength and the frequency of that behavior.
In other words, it has a positive effect on behavior.
Advantages of positive reinforcement:
• Maximizes performance
• Sustains change for a long period of time
Disadvantage of positive reinforcement:
• Too much reinforcement can lead to an overload of states, which
can diminish the results
2. Negative: Negative reinforcement is defined as the strengthening of a behavior
because a negative condition is stopped or avoided.
Advantages of negative reinforcement:
• Increases behavior
• Helps enforce a minimum standard of performance
Disadvantage of negative reinforcement:
• It only provides enough to meet the minimum behavior
Elements of Reinforcement Learning
Reinforcement learning elements are as follows:
1. Policy
2. Reward function
3. Value function
4. Model of the environment
Policy: The policy defines the learning agent's behavior at a given time. It is a mapping
from perceived states of the environment to actions to be taken when in those states.
Reward function: The reward function defines the goal in a reinforcement learning
problem. It provides a numerical score based on the state of the environment.
Value function: Value functions specify what is good in the long run. The value of a state
is the total amount of reward an agent can expect to accumulate over the future, starting
from that state.
Model of the environment: Models are used for planning.
• Credit assignment problem: Reinforcement learning algorithms learn to
generate an internal value for the intermediate states as to how good they are
in leading to the goal. The learning decision maker is called the agent. The
agent interacts with the environment that includes everything outside the
agent.
The agent has sensors to determine its state in the environment and takes actions that
modify its state.
• The reinforcement learning problem is modelled as an agent continuously interacting
with an environment. The agent and the environment interact in a sequence of
time steps. At each time step t, the agent receives the state of the environment
and a scalar numerical reward for the previous action, and then selects an
action.
Reinforcement learning is a technique for solving Markov decision problems.
• Reinforcement learning uses a formal framework defining the interaction
between a learning agent and its environment in terms of states, actions, and
rewards. This framework is intended to be a simple way of representing
essential features of the artificial intelligence problem.
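The interaction loop of states, actions, and rewards can be sketched with tabular Q-learning on a tiny version of the robot example. Everything concrete here is an assumption made for illustration: a corridor of five cells with the fire at one end and the diamond at the other, rewards of -1 and +1 only at those two cells, and arbitrary values for the learning rate, discount factor, and exploration rate. Q-learning is one common algorithm for this setting, not the only one.

```python
import random

N_STATES = 5            # cells 0..4 in a row
FIRE, DIAMOND = 0, 4    # terminal cells: hurdle and final reward
ACTIONS = (-1, +1)      # move left or move right


def step(state, action):
    """Environment: return (next_state, reward, done) for one move."""
    next_state = state + action
    if next_state == DIAMOND:
        return next_state, 1.0, True
    if next_state == FIRE:
        return next_state, -1.0, True
    return next_state, 0.0, False


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values Q(state, action) by trial and error."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 2, False                 # every episode starts in the middle
        while not done:
            if rng.random() < epsilon:         # explore: try a random action
                action = rng.choice(ACTIONS)
            else:                              # exploit: best known action
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Move the estimate towards reward + discounted value of the next state.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


q = train()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in (1, 2, 3)}
print(policy)
```

After training, the learned policy moves right (towards the diamond) from every non-terminal cell. The table q plays the role of the value function, the dictionary policy is the mapping from states to actions, and step stands in for the environment.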
Various Practical Applications of Reinforcement Learning –