Compre FoDS
1. Suppose there are 100 features for a classification problem and you are given one million examples.
Suppose you are given the best possible five principal components v1, v2, v3, v4, v5 (i.e., eigenvectors
of the covariance matrix) after performing PCA, and let the corresponding eigenvalues be m1, m2, m3, m4,
m5. Suppose you are also given a test example x.
a. What would be the representation (or components) of this test example in the five-dimensional
principal-component space?
b. Specify the formula for the percentage of the original data's variance that is captured if all
points are transformed to their five-dimensional components.
c. How many principal components should be considered to achieve 100 percent accuracy (i.e., to capture all of the original variance)?
d. Let us consider another problem. Suppose there is a problem with two features and the four
training data points are (1,0), (2,3), (4,1) and (5,4). The eigenvector (with maximum eigenvalue)
of the corresponding covariance matrix is (2^{-1/2}, 2^{-1/2}). Write down the four data points after they
are transformed to the first principal component. Draw a two-dimensional graph to pictorially
represent your findings. What percentage of the original variance is captured after this
transformation? [2 + 2 + 2 + 6 = 12 Marks]
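A minimal numpy sketch (not part of the question) that checks part (d) numerically; the data and the expected outputs in the comments follow directly from the four given points:

```python
import numpy as np

# The four training points from question 1(d).
X = np.array([[1.0, 0.0], [2.0, 3.0], [4.0, 1.0], [5.0, 4.0]])

# Center the data and form the sample covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)           # [[10/3, 2], [2, 10/3]]

# Eigendecomposition; eigh returns eigenvalues in ascending order,
# so the last column is the first principal component (up to sign).
eigvals, eigvecs = np.linalg.eigh(cov)
v1 = eigvecs[:, -1]                      # ~(1/sqrt(2), 1/sqrt(2)), up to sign

# One-dimensional representation of each point on the first PC.
scores = Xc @ v1
print("first PC:", v1)
print("projected points:", scores)       # ~[-2.83, 0.0, 0.0, 2.83], up to sign

# Fraction of total variance captured by the first component:
# (16/3) / (16/3 + 4/3) = 0.8, i.e., 80 percent.
print("variance captured:", eigvals[-1] / eigvals.sum())
```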
2. a. Let x be a D-variate feature vector and t a single-variate target attribute for a regression problem.
Let p(x, t) be the joint probability distribution from which the data are generated, and suppose you are
given p(x, t). Let L(t, y(x)) be the loss function, taken for this problem to be the squared loss
(y(x) − t)^2. If you are given the freedom to make y(x) as flexible as you wish, find the y(x) that
minimizes the expected loss.
b. With the y(x) found by the above methodology, would the expected loss be zero or non-zero? If
it is zero, prove it; otherwise, derive the remaining expected-loss term. [6 + 6 = 12 Marks]
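A LaTeX sketch of the standard squared-loss argument this question is after; the conditional-mean result and the residual-noise term below are textbook facts, not additions to the question:

```latex
% Requires \usepackage{amsmath,amssymb}.
\begin{align*}
\mathbb{E}[L] &= \iint \{y(\mathbf{x}) - t\}^{2}\, p(\mathbf{x}, t)\, d\mathbf{x}\, dt,\\
\frac{\delta\, \mathbb{E}[L]}{\delta\, y(\mathbf{x})}
  &= 2 \int \{y(\mathbf{x}) - t\}\, p(\mathbf{x}, t)\, dt = 0
  \;\Longrightarrow\;
  y^{*}(\mathbf{x}) = \frac{\int t\, p(\mathbf{x}, t)\, dt}{p(\mathbf{x})}
                    = \mathbb{E}[t \mid \mathbf{x}].
\end{align*}
% Substituting y* back in, the loss that remains is the intrinsic noise,
% which is non-zero in general:
\begin{equation*}
\mathbb{E}[L]_{\min} = \int \operatorname{Var}[t \mid \mathbf{x}]\, p(\mathbf{x})\, d\mathbf{x} \;\ge\; 0.
\end{equation*}
```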
3. a. The objective function in ridge regression can also be derived using a probabilistic approach by
assuming an appropriate prior distribution. Suppose there is a regression problem with ‘D’ features and a
single-variate target attribute, and you are given ‘N’ training data points. By assuming an appropriate
prior distribution, derive the objective function that is typically used in ridge regression.
Note: You may make necessary and appropriate assumptions to solve this problem.
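A sketch of the intended MAP derivation, assuming (as the note allows) a Gaussian noise model and a zero-mean isotropic Gaussian prior on the weights:

```latex
% Requires \usepackage{amsmath,amssymb}. Assumed model:
%   t_n = w^T x_n + eps,  eps ~ N(0, sigma^2),  prior w ~ N(0, alpha^{-1} I).
\begin{align*}
-\ln p(\mathbf{w} \mid \mathcal{D})
  &= -\ln p(\mathcal{D} \mid \mathbf{w}) - \ln p(\mathbf{w}) + \text{const}\\
  &= \frac{1}{2\sigma^{2}} \sum_{n=1}^{N} \bigl(t_n - \mathbf{w}^{\top}\mathbf{x}_n\bigr)^{2}
     + \frac{\alpha}{2}\, \lVert \mathbf{w} \rVert^{2} + \text{const}.
\end{align*}
% Minimizing this is exactly ridge regression with lambda = sigma^2 * alpha.
```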
b. In Bayesian curve fitting, derive the probability density function of the target attribute given the feature
vector of the test example and all training examples. You need not work out the parameters of this
distribution, but the derivation should be complete. [5 + 5 = 10 Marks]
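For part (b), the quantity to derive is the posterior predictive density; a sketch of the standard marginalization step:

```latex
% Requires \usepackage{amsmath,amssymb}.
\begin{equation*}
p(t \mid \mathbf{x}, \mathbf{X}, \mathbf{t})
  = \int p(t \mid \mathbf{x}, \mathbf{w})\,
         p(\mathbf{w} \mid \mathbf{X}, \mathbf{t})\, d\mathbf{w}.
\end{equation*}
% With a Gaussian likelihood and a Gaussian prior, both factors are Gaussian and
% the integral is itself Gaussian, N(t | m(x), s^2(x)); per the question, the
% explicit forms of m(x) and s^2(x) need not be worked out.
```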
4.a. Identify at least two advantages and two disadvantages of using color to visually represent
information.
4.b. Would simple random sampling (without replacement) be a good approach to sampling? Why or why
not?
4.c. Describe how a box plot can give information about whether the values of an attribute are
symmetrically distributed (a numerical sketch follows this question).
4.d. Explain the information conveyed by the star-coordinates plots of the Iris dataset shown in the
following figures. [2 + 2 + 3 + 3 = 10 Marks]
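For part 4(c) above, a minimal Python sketch with synthetic data (illustrative only): a box plot draws the quartiles, so a median that sits midway between Q1 and Q3, with whiskers of similar length, indicates a roughly symmetric attribute:

```python
import numpy as np

# Synthetic samples: one symmetric, one right-skewed.
rng = np.random.default_rng(0)
samples = {
    "symmetric": rng.normal(loc=0.0, scale=1.0, size=1000),
    "right-skewed": rng.exponential(scale=1.0, size=1000),
}

for name, data in samples.items():
    q1, med, q3 = np.percentile(data, [25, 50, 75])
    # For a symmetric attribute, (med - q1) and (q3 - med) are about equal;
    # skew shows up as one half of the box being longer than the other.
    print(f"{name}: Q1={q1:.2f} median={med:.2f} Q3={q3:.2f} "
          f"lower box half={med - q1:.2f} upper box half={q3 - med:.2f}")
```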
5. a. In real-world data, tuples with missing values for some attributes are a common occurrence. Describe
two methods for handling this problem.
5.b. Suppose a group of 12 sales price records has been sorted as follows: 5, 10, 11, 13, 15, 35, 50, 55, 72,
92, 204, 215. Partition them into three bins using each of the following methods: (i) equal-frequency
partitioning; (ii) equal-width partitioning.
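A minimal Python sketch verifying both partitionings in 5(b); with min = 5 and max = 215 the equal-width bin width is (215 − 5)/3 = 70:

```python
import numpy as np

prices = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]
k = 3

# (i) Equal-frequency partitioning: 12 sorted values, so 4 per bin.
size = len(prices) // k
eq_freq = [prices[i * size:(i + 1) * size] for i in range(k)]
print("equal-frequency:", eq_freq)
# -> [[5, 10, 11, 13], [15, 35, 50, 55], [72, 92, 204, 215]]

# (ii) Equal-width partitioning: interior bin edges at 75 and 145.
width = (max(prices) - min(prices)) / k
edges = [min(prices) + width * i for i in range(1, k)]
eq_width = [[] for _ in range(k)]
for p in prices:
    eq_width[int(np.searchsorted(edges, p, side="right"))].append(p)
print("equal-width:", eq_width)
# -> [[5, 10, 11, 13, 15, 35, 50, 55, 72], [92], [204, 215]]
```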
5.c. Robust data loading poses a challenge in database systems because the input data are often dirty. In
many cases, an input record may have several missing values and some records could be contaminated
(i.e., with some data values out of range or of a different data type than expected). Work out an automated
data cleaning and loading algorithm so that the erroneous data will be marked and contaminated data will
not be mistakenly inserted into the database during data loading.
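One possible shape for the algorithm asked for in 5(c), sketched in Python; the schema, value ranges, and CSV input below are illustrative assumptions, not part of the question:

```python
import csv

SCHEMA = {"id": int, "price": float}      # expected column -> expected type
RANGES = {"price": (0.0, 10_000.0)}       # expected value range per column

def clean_load(path):
    """Validate each record; return rows to insert and rows to quarantine."""
    good, rejected = [], []
    with open(path, newline="") as f:
        # start=2: line 1 of the file is the header row.
        for lineno, row in enumerate(csv.DictReader(f), start=2):
            errors, parsed = [], {}
            for col, typ in SCHEMA.items():
                raw = (row.get(col) or "").strip()
                if not raw:                        # missing value
                    errors.append(f"{col}: missing")
                    continue
                try:
                    val = typ(raw)                 # wrong data type
                except ValueError:
                    errors.append(f"{col}: bad type {raw!r}")
                    continue
                lo, hi = RANGES.get(col, (None, None))
                if lo is not None and not (lo <= val <= hi):
                    errors.append(f"{col}: {val} out of range")
                    continue
                parsed[col] = val
            # Erroneous records are marked and kept out of the database.
            (rejected if errors else good).append((lineno, parsed, errors))
    return good, rejected   # insert `good`; log/quarantine `rejected`
```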
5.d. Distinguish between noise and outliers by answering the following questions.
(a) Is noise ever interesting or desirable? Outliers?
(b) Can noise objects be outliers?
(c) Are noise objects always outliers?
(d) Can noise make a typical value into an unusual one, or vice versa? [3 + 3 + 3 + 3 = 12 Marks]
6.a. Suppose there is a linear regression problem with only one feature and one target attribute. Let
(x1,y1), (x2,y2), …, (xN,yN) be ‘N’ training data points, and you are asked to fit a simple linear regression
by minimizing the sum of squared errors. Let the resulting regression model be y = α + βx.
Prove or disprove that the fitted regression line (i.e., y = α + βx) passes through the mean of the
training examples (x̄, ȳ). [10 Marks]
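A sketch of the expected argument for 6(a): the normal equation for α alone settles the claim:

```latex
% Requires \usepackage{amsmath}.
\begin{align*}
\frac{\partial}{\partial \alpha} \sum_{i=1}^{N} (y_i - \alpha - \beta x_i)^{2}
  &= -2 \sum_{i=1}^{N} (y_i - \alpha - \beta x_i) = 0\\
\Longrightarrow\; \sum_{i=1}^{N} y_i = N\alpha + \beta \sum_{i=1}^{N} x_i
  \;&\Longrightarrow\; \bar{y} = \alpha + \beta \bar{x},
\end{align*}
% so the fitted line does pass through the mean point (x-bar, y-bar).
```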
6.b. Suppose there is a learning problem with ‘D’ features and we would like to find the subset of
features that gives optimal results. As the number of subsets of the feature set is finite (2^D), we could in
principle find the best feature subset by computing the validation error for each subset. Give
reasons why we typically make use of heuristics such as forward/backward greedy feature-selection
algorithms instead. [4 Marks]
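A counting sketch for 6(b) (illustrative only): exhaustive search must validate every non-empty subset, while a forward greedy pass fits at most D candidate models per round for at most D rounds:

```python
# Models evaluated: exhaustive subset search vs. a forward greedy pass.
for D in (10, 20, 50, 100):
    exhaustive = 2 ** D - 1          # every non-empty feature subset
    greedy = D * (D + 1) // 2        # <= D rounds, <= D candidates per round
    print(f"D={D:4d}: exhaustive={exhaustive:.3e}  greedy={greedy}")
```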
7. We know that the variance of f(x) is defined as Var[f] = E[(f(x) − E[f(x)])^2]. Show that the variance of f(x)
can also be written as follows: Var[f] = E[f(x)^2] − E[f(x)]^2. [5 Marks]
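A sketch of the required expansion, using only linearity of expectation and the fact that E[f(x)] is a constant:

```latex
% Requires \usepackage{amsmath,amssymb}.
\begin{align*}
\operatorname{Var}[f]
  &= \mathbb{E}\bigl[(f(x) - \mathbb{E}[f(x)])^{2}\bigr]\\
  &= \mathbb{E}\bigl[f(x)^{2}\bigr] - 2\,\mathbb{E}[f(x)]\,\mathbb{E}[f(x)] + \mathbb{E}[f(x)]^{2}\\
  &= \mathbb{E}\bigl[f(x)^{2}\bigr] - \mathbb{E}[f(x)]^{2}.
\end{align*}
```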
8. Let X be a discrete random variable taking the values 1, 2, …, n. We can have numerous discrete
probability distributions for the random variable X. Find a probability distribution with maximum
entropy and another with minimum entropy. [5 Marks]
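A sketch of the expected answer: with H(X) = −Σ_i p_i log p_i, the uniform distribution attains the maximum and a point mass the minimum:

```latex
% Requires \usepackage{amsmath,amssymb}.
\begin{align*}
p_i = \tfrac{1}{n}\;(i = 1,\dots,n)
  &\;\Longrightarrow\; H(X) = \log n \quad \text{(maximum entropy)},\\
p_k = 1,\; p_i = 0 \;(i \neq k)
  &\;\Longrightarrow\; H(X) = 0 \quad \text{(minimum entropy)}.
\end{align*}
```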