QB - Data Science
Unit 1
• Explain the different stages of Data Science.
• Explain Raw Data with example.
• Illustrate Central Limit Theorem with a neat diagram.
• Describe Bayes' theorem in detail with an example.
• Explain Processed Data with example.
• Explain Meta Data (Code Book) with example.
Unit 2
• Differentiate between: Point Estimate, Interval Estimate & Confidence Interval.
• Explain the null and alternative hypotheses using the example of flipping a coin.
• Explain Type 1 & Type 2 errors in hypothesis testing with suitable examples.
• What is the significance level? How does it regulate the probability of Type 1 & Type 2
errors?
• Explain p-values with an example.
• Explain the interrelationship between Margin of Error and Standard Error.
• In the population, the average IQ is 100 with a standard deviation of 15. A team of scientists
wants to test a new medication to see whether it has a positive effect, a negative effect, or
no effect at all on intelligence. A sample of 30 participants who have taken the medication has
a mean IQ of 140. Did the medication affect intelligence?
• Study the data distribution given in the table and answer the questions below.
Value                                            1   2   3   4   5   6   7   8
No. of data points with that value (frequency)   1   0   0   3   4   10  12  8
o What is the mean value?
o How would you describe the data distribution? Why?
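For the IQ medication question in this unit, one possible approach is a two-sided one-sample z-test, since the population standard deviation is given. The sketch below is an illustration under stated assumptions, not a model answer: the significance level of 0.05 and the function name two_sided_z_test are assumptions that do not appear in the question.

```python
import math

def two_sided_z_test(sample_mean, pop_mean, pop_sd, n):
    """Return (z, p) for a two-sided one-sample z-test."""
    standard_error = pop_sd / math.sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Values taken from the question: mu = 100, sigma = 15, n = 30, sample mean = 140.
z, p_value = two_sided_z_test(sample_mean=140, pop_mean=100, pop_sd=15, n=30)

alpha = 0.05  # assumed significance level; the question does not state one
reject_null = p_value < alpha

print(f"z = {z:.3f}, p = {p_value:.3g}, reject H0: {reject_null}")
```

A two-sided test is used because the question asks about either a positive or a negative effect.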
Unit 3
• Explain how gradient descent is used to fit parameterized models.
• Explain the concept of Lp norm.
• State the advantages and disadvantages of using L1 norm.
• Illustrate with an example that the L1 metric distance is always at least as large as the L2 metric distance.
• Draw a typical Hessian matrix. Indicate how it is used in optimization.
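For the Lp norm questions in this unit, a small numeric check can make the relationship between the L1 and L2 distances concrete. The sketch below is only an illustration: the points and the helper name lp_distance are hypothetical and were chosen so the arithmetic is easy to verify by hand.

```python
def lp_distance(a, b, p):
    """Minkowski (Lp) distance between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

# Hypothetical points that differ in both coordinates.
origin = (0.0, 0.0)
point = (3.0, 4.0)
l1 = lp_distance(origin, point, 1)  # |3| + |4|
l2 = lp_distance(origin, point, 2)  # sqrt(3^2 + 4^2)

# Hypothetical points that differ in one coordinate only: the two distances coincide.
axis_point = (3.0, 0.0)
l1_axis = lp_distance(origin, axis_point, 1)
l2_axis = lp_distance(origin, axis_point, 2)

print(l1, l2, l1_axis, l2_axis)
```

The second pair shows why the comparison is "at least as large as" rather than strictly larger: when two points differ in a single coordinate, the L1 and L2 distances are equal.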
Unit 4
• What is machine learning? What is its role in Data Science?
• Explain supervised and unsupervised machine learning.
• Why do we measure the impurity of a resulting node in a decision tree? List the different
measures of impurity used in decision trees.
• There are 4 coins A, B, C and D, of which 3 coins are of equal weight and one coin is heavier.
Find the heavier coin using a decision tree.
Unit 5
• Use the K-Means algorithm to create two clusters. Assume A(2, 2) and C(1, 1) are the centers of
the two clusters.
• Marks scored by 10 students in mathematics and computer science are given in the table below,
along with their result as Pass or Fail. Pappu scores 41 marks in mathematics and 38 marks in
computer science. Using the KNN classifier algorithm, determine whether Pappu has passed or failed
for K = 1, 2, 3, 5 and 7.
Student    Mathematics   Computer Science   Result
Naren      80            80                 Pass
Amit       75            40                 Pass
Deven      65            50                 Pass
Surya      40            40                 Pass
Sanjay     70            40                 Pass
Teja       65            37                 Fail
Akhilesh   70            25                 Fail
Sharad     38            38                 Fail
Ajit       35            59                 Fail
Shivraj    70            65                 Pass
• Using the Naïve Bayes classifier approach and the training data set given in the table,
predict Class = Buy Laptop: Yes or No for the feature set: {Income = Low; Student =
No; Credit Rating = Excellent}
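For the KNN question in this unit (Pappu's marks), the procedure can be sketched as follows. This is a minimal sketch under stated assumptions rather than a model answer: the question does not name a distance metric or a tie-breaking rule, so Euclidean distance is assumed and an even split of votes is reported as "Tie". The names TRAINING, squared_distance and knn_predict are illustrative.

```python
from collections import Counter

# (student, mathematics, computer science, result) rows from the question's table.
TRAINING = [
    ("Naren", 80, 80, "Pass"), ("Amit", 75, 40, "Pass"),
    ("Deven", 65, 50, "Pass"), ("Surya", 40, 40, "Pass"),
    ("Sanjay", 70, 40, "Pass"), ("Teja", 65, 37, "Fail"),
    ("Akhilesh", 70, 25, "Fail"), ("Sharad", 38, 38, "Fail"),
    ("Ajit", 35, 59, "Fail"), ("Shivraj", 70, 65, "Pass"),
]

def squared_distance(row, query):
    """Squared Euclidean distance; it ranks neighbours the same way as the true distance."""
    _, maths, cs, _ = row
    return (maths - query[0]) ** 2 + (cs - query[1]) ** 2

def knn_predict(query, k):
    """Majority vote among the k nearest rows; returns 'Tie' on an even split (assumed rule)."""
    ranked = sorted(TRAINING, key=lambda row: squared_distance(row, query))
    votes = Counter(row[3] for row in ranked[:k]).most_common()
    if len(votes) > 1 and votes[0][1] == votes[1][1]:
        return "Tie"
    return votes[0][0]

pappu = (41, 38)  # mathematics, computer science
predictions = {k: knn_predict(pappu, k) for k in (1, 2, 3, 5, 7)}
for k, label in predictions.items():
    print(f"K = {k}: {label}")
```

An even K such as 2 can split the vote evenly, which is one reason odd values of K are usually preferred for two-class problems.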
Unit 6
• Define Gini impurity and entropy impurity. What will their values be for the purest node?
• How would you execute the k-fold cross-validation strategy? Why is the leave-one-out method a
special case of it?
• A confusion matrix for a classification exercise returns the following values: TP = 0.962,
TN = 0.93, FP = 0.12, FN = 0.07. Calculate accuracy, precision, recall, sensitivity, specificity
and F-score.
• The confusion matrix for a certain classification activity is as shown in Table no. 2
              Predicted: NO   Predicted: YES
Actual: NO    50              10
Actual: YES   5               100
Find the following classifier performance measures –
1. Accuracy
2. Precision
3. Recall
4. Specificity
5. F-Score
6. Error rate
• Explain the following methods used for training and testing –
1. Resubstitution
2. K-fold cross-validation
3. Bootstrapping
3. Bootstrapping
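For the Table no. 2 confusion matrix question in this unit, the six measures follow directly from the four cell counts. The sketch below assumes that "YES" is the positive class, which the question does not state explicitly; under that assumption TN = 50, FP = 10, FN = 5 and TP = 100. The function name classifier_metrics is illustrative.

```python
def classifier_metrics(tp, tn, fp, fn):
    """Standard performance measures derived from confusion-matrix counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f_score": 2 * precision * recall / (precision + recall),
        "error_rate": (fp + fn) / total,
    }

# Counts read from Table no. 2, assuming "YES" is the positive class.
metrics = classifier_metrics(tp=100, tn=50, fp=10, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Accuracy and error rate always sum to 1, which gives a quick check on hand calculations.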