
Name: Ishaan Kapoor    Roll No.: 1/15/FET/BCS/1/055    Class: 8CSA2

MACHINE LEARNING TECHNIQUES


ASSIGNMENT-7

Ques 1:
(a) Write the properties of Expectation?
Ans: In probability theory, the expected value of a random variable is, intuitively, the
long-run average of the values it takes over many repetitions of the same experiment. More
concretely, the expected value of a discrete random variable is the probability-weighted
average of all its possible values.
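Since the question asks for the properties themselves, the standard textbook properties of expectation (all consequences of the definition above) can be summarized as follows:

```latex
% Standard properties of expectation, for random variables X, Y and constants a, b, c:
\begin{align*}
  E[c]      &= c                           && \text{expectation of a constant}\\
  E[aX + b] &= a\,E[X] + b                 && \text{linearity under scaling and shifting}\\
  E[X + Y]  &= E[X] + E[Y]                 && \text{additivity (even if $X$ and $Y$ are dependent)}\\
  E[XY]     &= E[X]\,E[Y]                  && \text{if $X$ and $Y$ are independent}\\
  X \le Y   &\;\Rightarrow\; E[X] \le E[Y] && \text{monotonicity}
\end{align*}
```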
(b) Differentiate between Supervised and Unsupervised learning?
Ans:

Criterion                  Supervised Learning                        Unsupervised Learning
Input data                 Uses known and labelled data as input      Uses unknown (unlabelled) data as input
Computational complexity   Very complex                               Less computationally complex
Analysis                   Uses off-line analysis                     Uses real-time analysis of data
Number of classes          Number of classes is known                 Number of classes is not known
Accuracy of results        Accurate and reliable results              Moderately accurate and reliable results
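The practical difference is easiest to see in code. A minimal sketch (assuming scikit-learn; X and y below are illustrative placeholder data, not from the assignment):

```python
# Supervised vs. unsupervised: the supervised model is given labels y,
# the unsupervised model works from the input data X alone.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = np.array([[1.0, 2.0], [1.5, 1.8], [8.0, 8.0], [9.0, 11.0]])
y = np.array([0, 0, 1, 1])                    # labels exist only in the supervised setting

clf = LogisticRegression().fit(X, y)          # supervised: learns a mapping from X to y
km = KMeans(n_clusters=2, n_init=10).fit(X)   # unsupervised: groups X without labels

print(clf.predict(X))   # predicted class labels
print(km.labels_)       # discovered cluster assignments
```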

(c) Explain Support Vector Machine

Ans: A Support Vector Machine (SVM) is a supervised machine learning algorithm that can be
used for both classification and regression problems, although it is mostly used for
classification. Support vectors are simply the coordinates of individual observations. An
SVM performs classification by finding the hyperplane that maximizes the margin between the
two classes; the vectors (cases) that define this hyperplane are the support vectors.
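A minimal sketch of this idea (assuming scikit-learn; the data points are made up for illustration):

```python
# Sketch of SVM classification: a linear kernel finds the maximum-margin hyperplane.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]])
y = np.array([0, 0, 0, 1, 1, 1])

model = SVC(kernel="linear", C=1.0)
model.fit(X, y)

print(model.support_vectors_)      # the observations that define the margin
print(model.predict([[4, 4]]))     # classify a new point
```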

(d) What is the need of feature selection

Ans: Feature selection (also called data dimension reduction or variable screening) in
predictive analytics refers to the process of identifying the few most important variables or
parameters which help in predicting the outcome. In today's world of high-speed computing,
one might be forgiven for asking why to bother; the most important reasons all come from
practicality.
Reason 1: If two or more of the independent variables (predictors) are strongly correlated
with each other as well as with the dependent (predicted) variable, then the estimates of the
coefficients in a regression model tend to be unstable or counter-intuitive.
Example: y = 45 + 0.8 x1 and y = 45 + 0.1 x2 are two simple linear regression models which
predict y. Both clearly indicate that as the x's increase, y also increases. If x1 and x2 are
each strongly correlated with y (and with each other), then a multiple regression model might
look like y = 45 + 0.02 x1 - 0.4 x2. In this case, because the three variables (x1, x2 and y)
are strongly correlated, interaction effects between x1 and x2 lead to a situation where x2
appears to be in a negative relationship with y, meaning y would decrease as x2 increases.
This is not only the reverse of what was seen in the simple model, but is also counter-intuitive.
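This coefficient instability is easy to reproduce numerically. A small sketch (illustrative synthetic data, assuming numpy and scikit-learn; the exact coefficients vary with the noise, which is the point):

```python
# Strongly correlated predictors make multiple-regression coefficients unstable.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)          # x2 is nearly a copy of x1
y = 45 + 0.8 * x1 + rng.normal(scale=0.5, size=200)

# Each predictor alone gives a clearly positive coefficient.
print(LinearRegression().fit(x1.reshape(-1, 1), y).coef_)
print(LinearRegression().fit(x2.reshape(-1, 1), y).coef_)

# Both correlated predictors together: the individual coefficients become unstable
# and one of them may even flip sign, as described above.
X = np.column_stack([x1, x2])
print(LinearRegression().fit(X, y).coef_)
```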

Reason 2: The law of averages suggests that the larger the set of predictors, the higher the
probability of having missing values in the data. If we choose to delete cases which have
missing values for some predictors, we may end up with a shortage of samples.
Example: A practical rule of thumb used by data miners is to have at least 5(p + 2) samples,
where p is the number of predictors. If your data set is sufficiently large and this rule is
easily satisfied, then you may not be risking much by deleting cases. But if your data comes
from an expensive market survey, for example, a systematic procedure to reduce the number of
variables may leave you in a situation where you don't have to address this problem of losing
samples at all. It is better to lose variables which don't impact your prediction than to lose
the considerably more expensive samples.

There are several other, more technical reasons for reducing data dimensionality, along with
a number of common techniques for actually carrying out the process; one such technique is
sketched below.
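As one concrete illustration (a sketch assuming scikit-learn; SelectKBest with a univariate F-test is just one common technique among many):

```python
# Keep only the k predictors that score highest against the target.
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

# 20 candidate predictors, of which only 5 actually drive the target.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5, random_state=0)

selector = SelectKBest(score_func=f_regression, k=5)
X_reduced = selector.fit_transform(X, y)

print(X.shape, "->", X_reduced.shape)         # (200, 20) -> (200, 5)
print(selector.get_support(indices=True))     # indices of the retained predictors
```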

(e) What is Factor Analysis

Ans : Factor analysis is a statistical method used to describe variability among observed,
correlated variables in terms of a potentially lower number of unobserved variables
called factors. For example, it is possible that variations in six observed variables mainly
reflect the variations in two unobserved variables.
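A minimal sketch of exactly this six-observed / two-latent situation (synthetic data, assuming scikit-learn):

```python
# Six observed, correlated variables generated from two latent factors,
# then recovered with factor analysis.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))       # two unobserved factors
loading = rng.normal(size=(2, 6))        # how the factors drive the observed variables
observed = latent @ loading + 0.1 * rng.normal(size=(500, 6))

fa = FactorAnalysis(n_components=2)
scores = fa.fit_transform(observed)

print(fa.components_.shape)   # (2, 6): estimated factor loadings
print(scores.shape)           # (500, 2): estimated factor scores per observation
```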

Ques 2: Explain Gaussian Distribution and its application.


Ans : Gaussian distribution (also known as normal distribution) is a bell-shaped curve, and it
is assumed that during any measurement values will follow a normal distribution with an equal
number of measurements above and below the mean value. In order to understand normal
distribution, it is important to know the definitions of “mean,” “median,” and “mode.” The
“mean” is the calculated average of all values, the “median” is the value at the center point
(mid-point) of the distribution, while the “mode” is the value that was observed most
frequently during the measurement. If a distribution is normal, then the values of the
mean, median, and mode are the same. However, the value of the mean, median, and
mode may be different if the distribution is skewed (not a Gaussian distribution). Other
characteristics of Gaussian distributions are as follows:

Mean ± 1 SD contains approximately 68.2% of all values.

Mean ± 2 SD contains approximately 95.5% of all values.

Mean ± 3 SD contains approximately 99.7% of all values.
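For reference, the density that produces these coverage figures is the Gaussian probability density function with mean \mu and standard deviation \sigma:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,
       \exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)
```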

A Gaussian distribution has the familiar symmetric bell shape. As an application, a reference
range for an analyte is usually determined by measuring its value in a large number of normal
subjects (at least 100 normal healthy people, but preferably 200–300 healthy individuals);
the mean and standard deviation are then determined from these measurements.
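A minimal sketch of that reference-range calculation (illustrative measurements; the conventional mean ± 2 SD interval is assumed):

```python
# Derive a reference range from measurements in healthy subjects,
# using the conventional mean +/- 2 SD interval (~95.5% of a normal distribution).
import numpy as np

rng = np.random.default_rng(0)
measurements = rng.normal(loc=5.0, scale=0.4, size=200)   # illustrative analyte values

mean = measurements.mean()
sd = measurements.std(ddof=1)                             # sample standard deviation

lower, upper = mean - 2 * sd, mean + 2 * sd
print(f"reference range: {lower:.2f} to {upper:.2f}")
```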

Ques 3: Explain Ensemble Learning.


Ans : In machine learning, ensemble methods use multiple learning algorithms to obtain better
predictive performance than could be obtained from any of the constituent learning algorithms
alone. Unlike a statistical ensemble in statistical mechanics, which is usually infinite, a machine
learning ensemble consists of only a concrete finite set of alternative models, but typically
allows for much more flexible structure to exist among those alternatives. Ensemble learning is
the use of algorithms and tools in machine learning and other disciplines, to form a
collaborative whole where multiple methods are more effective than a single learning method.
Ensemble learning can be used in many different types of research, for flexibility and enhanced
results. Many ensemble learning tools can be trained to produce various results. Individual
algorithms may be stacked on top of each other, or rely on a “bucket of models” method of
evaluating multiple methods for one system. In some cases, multiple data sets are aggregated
and combined. For example, a geographic research program may use multiple methods to
assess the prevalence of items in a geographic space. One of the issues with this type of
research involves making sure that various models are independent, and that the combination
of data is practical and works in a particular scenario.

Ensemble learning methods are included in different types of statistical software packages.
Some experts describe ensemble learning as “crowdsourcing” of data aggregation.
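A minimal sketch of one simple ensemble scheme (hard voting across three different classifiers, assuming scikit-learn and its built-in iris toy dataset):

```python
# Three different classifiers combined by majority vote.
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(max_depth=3)),
        ("nb", GaussianNB()),
    ],
    voting="hard",            # each constituent model gets one vote; majority wins
)

print(cross_val_score(ensemble, X, y, cv=5).mean())   # accuracy of the combined model
```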
Ques 4: Describe how principal component analysis is carried out to
reduce the dimensionality of data sets.
Ans : The main idea of principal component analysis (PCA) is to reduce the dimensionality of
a data set consisting of many variables correlated with each other, either heavily or lightly,
while retaining the variation present in the dataset, up to the maximum extent. The same is done
by transforming the variables to a new set of variables, which are known as the principal
components (or simply, the PCs) and are orthogonal, ordered such that the retention of variation
present in the original variables decreases as we move down in the order. So, in this way, the
1st principal component retains maximum variation that was present in the original
components. The principal components are the eigenvectors of a covariance matrix, and hence
they are orthogonal.
Importantly, the dataset on which the PCA technique is to be used must be scaled, since the
results are sensitive to the relative scaling of the variables. In layman's terms, PCA is a
method of summarizing data. Imagine some wine bottles on a dining table, each wine described
by attributes like colour, strength, age, and so on. Redundancy will arise because many of
these attributes measure related properties. What PCA does in this case is summarize each
wine in the stock with fewer characteristics.
Intuitively, Principal Component Analysis can supply the user with a lower-dimensional
picture, a projection or "shadow" of this object when viewed from its most informative
viewpoint.
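A minimal sketch of the procedure (assuming scikit-learn and its built-in iris dataset; two components are kept purely for illustration):

```python
# PCA-based dimensionality reduction: standardize, then project onto the top components.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)              # 4 original features per sample
X_scaled = StandardScaler().fit_transform(X)   # scale so no feature dominates the variance

pca = PCA(n_components=2)                      # keep the first two principal components
X_reduced = pca.fit_transform(X_scaled)

print(X.shape, "->", X_reduced.shape)          # (150, 4) -> (150, 2)
print(pca.explained_variance_ratio_)           # share of variance retained by each PC
```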

Ques 5: Summarize K-means Algorithm


Ans: K-means clustering is one of the simplest and most popular unsupervised machine learning
algorithms.
Typically, unsupervised algorithms make inferences from datasets using only input vectors,
without referring to known or labelled outcomes.
To process the learning data, the K-means algorithm starts with a first group of randomly
selected centroids, which are used as the starting points for every cluster, and then performs
iterative (repetitive) calculations to optimize the positions of the centroids.

It halts creating and optimizing clusters when either:


• The centroids have stabilized — there is no change in their values because the
clustering has been successful.

• The defined number of iterations has been achieved.
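A minimal sketch of the algorithm in use (assuming scikit-learn; the points below are illustrative):

```python
# k-means: centroids are iterated until they stabilize or max_iter is reached.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.2],
              [8.0, 8.0], [9.0, 11.0], [8.5, 9.0]])

km = KMeans(n_clusters=2, n_init=10, max_iter=300, random_state=0)
km.fit(X)

print(km.cluster_centers_)        # final centroid positions
print(km.labels_)                 # cluster assignment for each point
print(km.predict([[0.0, 0.0]]))   # assign a new point to the nearest centroid
```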
