
Name: Ishaan Kapoor    Roll No.: 1/15/FET/BCS/1/055    Class: 8CSA2

MACHINE LEARNING TECHNIQUES


ASSIGNMENT-7

Ques 1:
(a) Write the properties of Expectation?
Ans: In probability theory, the expected value of a random variable is, intuitively, the
long-run average of the values it takes over many repetitions of the same experiment. More
concretely, the expected value of a discrete random variable is the probability-weighted
average of all its possible values.
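Since the question asks for the properties themselves, the standard textbook properties of expectation (all consequences of the definition above) can be summarized as follows:

```latex
% Standard properties of expectation, for random variables X, Y and constants a, b, c:
\begin{align*}
  E[c]      &= c                           && \text{expectation of a constant}\\
  E[aX + b] &= a\,E[X] + b                 && \text{linearity under scaling and shifting}\\
  E[X + Y]  &= E[X] + E[Y]                 && \text{additivity (even if $X$ and $Y$ are dependent)}\\
  E[XY]     &= E[X]\,E[Y]                  && \text{if $X$ and $Y$ are independent}\\
  X \le Y   &\;\Rightarrow\; E[X] \le E[Y] && \text{monotonicity}
\end{align*}
```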
(b) Differentiate between Supervised and Unsupervised learning?
Ans:

Criterion                  Supervised Learning                        Unsupervised Learning
Input data                 Uses known and labelled data as input      Uses unknown (unlabelled) data as input
Computational complexity   Very complex                               Less computationally complex
Analysis                   Uses off-line analysis                     Uses real-time analysis of data
Number of classes          Number of classes is known                 Number of classes is not known
Accuracy of results        Accurate and reliable results              Moderately accurate and reliable results
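The practical difference is easiest to see in code. A minimal sketch (assuming scikit-learn; X and y below are illustrative placeholder data, not from the assignment):

```python
# Supervised vs. unsupervised: the supervised model is given labels y,
# the unsupervised model works from the input data X alone.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = np.array([[1.0, 2.0], [1.5, 1.8], [8.0, 8.0], [9.0, 11.0]])
y = np.array([0, 0, 1, 1])                    # labels exist only in the supervised setting

clf = LogisticRegression().fit(X, y)          # supervised: learns a mapping from X to y
km = KMeans(n_clusters=2, n_init=10).fit(X)   # unsupervised: groups X without labels

print(clf.predict(X))   # predicted class labels
print(km.labels_)       # discovered cluster assignments
```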

(c) Explain Support Vector Machine

Ans: A Support Vector Machine (SVM) is a supervised machine learning algorithm that can be
used for both classification and regression problems, although it is mostly used for
classification. Support vectors are simply the coordinates of individual observations. An
SVM performs classification by finding the hyperplane that maximizes the margin between the
two classes; the vectors (cases) that define this hyperplane are the support vectors.
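A minimal sketch of this idea (assuming scikit-learn; the data points are made up for illustration):

```python
# Sketch of SVM classification: a linear kernel finds the maximum-margin hyperplane.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]])
y = np.array([0, 0, 0, 1, 1, 1])

model = SVC(kernel="linear", C=1.0)
model.fit(X, y)

print(model.support_vectors_)      # the observations that define the margin
print(model.predict([[4, 4]]))     # classify a new point
```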

(d) What is the need of feature selection

Ans: Feature selection (also called data dimension reduction or variable screening) in
predictive analytics refers to the process of identifying the few most important variables or
parameters which help in predicting the outcome. In today's world of high-speed computing,
one might be forgiven for asking why to bother; the most important reasons all come from
practicality.
Reason 1: If two or more of the independent variables (predictors) are strongly correlated
with each other as well as with the dependent (predicted) variable, then the estimates of the
coefficients in a regression model tend to be unstable or counter-intuitive.
Example: y = 45 + 0.8 x1 and y = 45 + 0.1 x2 are two simple linear regression models which
predict y. Both clearly indicate that as the x's increase, y also increases. If x1 and x2 are
each strongly correlated with y (and with each other), then a multiple regression model might
look like y = 45 + 0.02 x1 - 0.4 x2. In this case, because the three variables (x1, x2 and y)
are strongly correlated, interaction effects between x1 and x2 lead to a situation where x2
appears to be in a negative relationship with y, meaning y would decrease as x2 increases.
This is not only the reverse of what was seen in the simple model, but is also counter-intuitive.
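This coefficient instability is easy to reproduce numerically. A small sketch (illustrative synthetic data, assuming numpy and scikit-learn; the exact coefficients vary with the noise, which is the point):

```python
# Strongly correlated predictors make multiple-regression coefficients unstable.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)          # x2 is nearly a copy of x1
y = 45 + 0.8 * x1 + rng.normal(scale=0.5, size=200)

# Each predictor alone gives a clearly positive coefficient.
print(LinearRegression().fit(x1.reshape(-1, 1), y).coef_)
print(LinearRegression().fit(x2.reshape(-1, 1), y).coef_)

# Both correlated predictors together: the individual coefficients become unstable
# and one of them may even flip sign, as described above.
X = np.column_stack([x1, x2])
print(LinearRegression().fit(X, y).coef_)
```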

Reason 2: The law of averages suggests that the larger the set of predictors, the higher the
probability of having missing values in the data. If we choose to delete cases which have
missing values for some predictors, we may end up with a shortage of samples.
Example: A practical rule of thumb used by data miners is to have at least 5(p + 2) samples,
where p is the number of predictors. If your data set is sufficiently large and this rule is
easily satisfied, then you may not be risking much by deleting cases. But if your data comes
from an expensive market survey, for example, a systematic procedure to reduce the number of
variables may leave you in a situation where you don't have to address this problem of losing
samples at all. It is better to lose variables which don't impact your prediction than to lose
the considerably more expensive samples.

There are several other, more technical reasons for reducing data dimensionality, along with
a number of common techniques for actually carrying out the process; one such technique is
sketched below.
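As one concrete illustration (a sketch assuming scikit-learn; SelectKBest with a univariate F-test is just one common technique among many):

```python
# Keep only the k predictors that score highest against the target.
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

# 20 candidate predictors, of which only 5 actually drive the target.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5, random_state=0)

selector = SelectKBest(score_func=f_regression, k=5)
X_reduced = selector.fit_transform(X, y)

print(X.shape, "->", X_reduced.shape)         # (200, 20) -> (200, 5)
print(selector.get_support(indices=True))     # indices of the retained predictors
```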

(e) What is Factor Analysis

Ans : Factor analysis is a statistical method used to describe variability among observed,
correlated variables in terms of a potentially lower number of unobserved variables
called factors. For example, it is possible that variations in six observed variables mainly
reflect the variations in two unobserved variables.
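A minimal sketch of exactly this six-observed / two-latent situation (synthetic data, assuming scikit-learn):

```python
# Six observed, correlated variables generated from two latent factors,
# then recovered with factor analysis.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))       # two unobserved factors
loading = rng.normal(size=(2, 6))        # how the factors drive the observed variables
observed = latent @ loading + 0.1 * rng.normal(size=(500, 6))

fa = FactorAnalysis(n_components=2)
scores = fa.fit_transform(observed)

print(fa.components_.shape)   # (2, 6): estimated factor loadings
print(scores.shape)           # (500, 2): estimated factor scores per observation
```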

Ques 2: Explain Gaussian Distribution and its application.


Ans : Gaussian distribution (also known as normal distribution) is a bell-shaped curve, and it
is assumed that during any measurement values will follow a normal distribution with an equal
number of measurements above and below the mean value. In order to understand normal
distribution, it is important to know the definitions of “mean,” “median,” and “mode.” The
“mean” is the calculated average of all values, the “median” is the value at the center point
(mid-point) of the distribution, while the “mode” is the value that was observed most
frequently during the measurement. If a distribution is normal, then the values of the
mean, median, and mode are the same. However, the value of the mean, median, and
mode may be different if the distribution is skewed (not a Gaussian distribution). Other
characteristics of Gaussian distributions are as follows:

Mean ± 1 SD contains approximately 68.2% of all values.

Mean ± 2 SD contains approximately 95.5% of all values.

Mean ± 3 SD contains approximately 99.7% of all values.
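For reference, the density that produces these coverage figures is the Gaussian probability density function with mean \mu and standard deviation \sigma:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,
       \exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)
```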

A Gaussian distribution has the familiar symmetric bell shape. As an application, a reference
range for an analyte is usually determined by measuring its value in a large number of normal
subjects (at least 100 normal healthy people, but preferably 200–300 healthy individuals);
the mean and standard deviation are then determined from these measurements.
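A minimal sketch of that reference-range calculation (illustrative measurements; the conventional mean ± 2 SD interval is assumed):

```python
# Derive a reference range from measurements in healthy subjects,
# using the conventional mean +/- 2 SD interval (~95.5% of a normal distribution).
import numpy as np

rng = np.random.default_rng(0)
measurements = rng.normal(loc=5.0, scale=0.4, size=200)   # illustrative analyte values

mean = measurements.mean()
sd = measurements.std(ddof=1)                             # sample standard deviation

lower, upper = mean - 2 * sd, mean + 2 * sd
print(f"reference range: {lower:.2f} to {upper:.2f}")
```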

Ques 3: Explain Ensemble Learning.


Ans : In machine learning, ensemble methods use multiple learning algorithms to obtain better
predictive performance than could be obtained from any of the constituent learning algorithms
alone. Unlike a statistical ensemble in statistical mechanics, which is usually infinite, a machine
learning ensemble consists of only a concrete finite set of alternative models, but typically
allows for much more flexible structure to exist among those alternatives. Ensemble learning is
the use of algorithms and tools in machine learning and other disciplines, to form a
collaborative whole where multiple methods are more effective than a single learning method.
Ensemble learning can be used in many different types of research, for flexibility and enhanced
results. Many ensemble learning tools can be trained to produce various results. Individual
algorithms may be stacked on top of each other, or rely on a “bucket of models” method of
evaluating multiple methods for one system. In some cases, multiple data sets are aggregated
and combined. For example, a geographic research program may use multiple methods to
assess the prevalence of items in a geographic space. One of the issues with this type of
research involves making sure that various models are independent, and that the combination
of data is practical and works in a particular scenario.

Ensemble learning methods are included in different types of statistical software packages.
Some experts describe ensemble learning as “crowdsourcing” of data aggregation.
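A minimal sketch of one simple ensemble scheme (hard voting across three different classifiers, assuming scikit-learn and its built-in iris toy dataset):

```python
# Three different classifiers combined by majority vote.
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(max_depth=3)),
        ("nb", GaussianNB()),
    ],
    voting="hard",            # each constituent model gets one vote; majority wins
)

print(cross_val_score(ensemble, X, y, cv=5).mean())   # accuracy of the combined model
```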
Ques 4: Describe how principal component analysis is carried out to
reduce the dimensionality of data sets.
Ans : The main idea of principal component analysis (PCA) is to reduce the dimensionality of
a data set consisting of many variables correlated with each other, either heavily or lightly,
while retaining the variation present in the dataset, up to the maximum extent. The same is done
by transforming the variables to a new set of variables, which are known as the principal
components (or simply, the PCs) and are orthogonal, ordered such that the retention of variation
present in the original variables decreases as we move down in the order. So, in this way, the
1st principal component retains maximum variation that was present in the original
components. The principal components are the eigenvectors of a covariance matrix, and hence
they are orthogonal.
Importantly, the dataset on which the PCA technique is to be used must be scaled, since the
results are sensitive to the relative scaling of the variables. In layman's terms, PCA is a
method of summarizing data. Imagine some wine bottles on a dining table, each wine described
by attributes like colour, strength, age, and so on. Redundancy will arise because many of
these attributes measure related properties. What PCA does in this case is summarize each
wine in the stock with fewer characteristics.
Intuitively, Principal Component Analysis can supply the user with a lower-dimensional
picture, a projection or "shadow" of this object when viewed from its most informative
viewpoint.
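A minimal sketch of the procedure (assuming scikit-learn and its built-in iris dataset; two components are kept purely for illustration):

```python
# PCA-based dimensionality reduction: standardize, then project onto the top components.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)              # 4 original features per sample
X_scaled = StandardScaler().fit_transform(X)   # scale so no feature dominates the variance

pca = PCA(n_components=2)                      # keep the first two principal components
X_reduced = pca.fit_transform(X_scaled)

print(X.shape, "->", X_reduced.shape)          # (150, 4) -> (150, 2)
print(pca.explained_variance_ratio_)           # share of variance retained by each PC
```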

Ques 5: Summarize K-means Algorithm


Ans: K-means clustering is one of the simplest and most popular unsupervised machine learning
algorithms.
Typically, unsupervised algorithms make inferences from datasets using only input vectors,
without referring to known or labelled outcomes.
To process the learning data, the K-means algorithm starts with a first group of randomly
selected centroids, which are used as the starting points for every cluster, and then performs
iterative (repetitive) calculations to optimize the positions of the centroids.

It halts creating and optimizing clusters when either:


• The centroids have stabilized — there is no change in their values because the
clustering has been successful.

• The defined number of iterations has been achieved.
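A minimal sketch of the algorithm in use (assuming scikit-learn; the points below are illustrative):

```python
# k-means: centroids are iterated until they stabilize or max_iter is reached.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.2],
              [8.0, 8.0], [9.0, 11.0], [8.5, 9.0]])

km = KMeans(n_clusters=2, n_init=10, max_iter=300, random_state=0)
km.fit(X)

print(km.cluster_centers_)        # final centroid positions
print(km.labels_)                 # cluster assignment for each point
print(km.predict([[0.0, 0.0]]))   # assign a new point to the nearest centroid
```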
