QB - Data Science

The document contains questions divided into six units related to data science topics including data processing, statistical analysis, machine learning algorithms, and model evaluation. Questions cover concepts such as data distributions, hypothesis testing, regression, clustering, naive Bayes classification, and performance metrics. Example problems are provided for many of the concepts.

Data Science – Question Bank

Unit 1
• Explain the different stages of data science.
• Explain raw data with an example.
• Illustrate the Central Limit Theorem with a neat diagram.
• Describe Bayes' theorem in detail with an example.
• Explain processed data with an example.
• Explain metadata (code book) with an example.

Unit 2
• Differentiate between: Point Estimate, Interval Estimate & Confidence Interval.
• Explain the null and alternative hypotheses using the example of flipping a coin.
• Explain Type 1 & Type 2 errors in hypothesis testing with suitable examples.
• What is the significance level? How does it regulate the possibility of Type 1 & Type 2
errors occurring?
• Explain p-values with an example.
• Explain the interrelationship of margin of error and standard error.
• In the population, the average IQ is 100 with a standard deviation of 15. A team of scientists
wants to test a new medication to see whether it has a positive effect, a negative effect, or
no effect at all on intelligence. A sample of 30 participants who have taken the medication has
a mean IQ of 140. Did the medication affect intelligence?

• Study the data distribution given in the table and answer the questions below.
Value       1   2   3   4   5   6   7   8
Frequency   1   0   0   3   4  10  12   8
(Frequency = number of data points with that value.)
o What is the mean value?
o How would you describe the data distribution? Why?
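
For the IQ question in this unit, one possible approach is a two-sided one-sample z-test, since the population standard deviation is given. The sketch below is only an illustration under stated assumptions: the function name one_sample_z_test is hypothetical, and the significance level of 0.05 is assumed because the question does not state one.

```python
import math

def one_sample_z_test(sample_mean, pop_mean, pop_sd, n):
    """Return (z, two_sided_p) for a one-sample z-test with known population SD."""
    standard_error = pop_sd / math.sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # Two-sided p-value under the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# H0: the mean IQ of people taking the medication is 100 (no effect).
z, p = one_sample_z_test(sample_mean=140, pop_mean=100, pop_sd=15, n=30)
alpha = 0.05  # assumed significance level; not given in the question
reject_null = p < alpha
print(f"z = {z:.2f}, p = {p:.3g}, reject H0: {reject_null}")
```

A two-sided test is used here because the question asks about either a positive or a negative effect.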

Unit 3
• Explain how gradient descent is used to fit parameterized models.
• Explain the concept of Lp norm.
• State the advantages and disadvantages of using L1 norm.
• Illustrate with an example that the L1 metric distance is always larger than the L2 metric distance.
• Draw a typical Hessian matrix. Indicate how it is used in optimization.
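
For the Lp norm questions in this unit, a small numeric check can make the comparison between the L1 (Manhattan) and L2 (Euclidean) distances concrete. This is a minimal sketch: the helper lp_distance and the two sample points are illustrative choices, not part of the question bank.

```python
def lp_distance(a, b, p):
    """Lp (Minkowski) distance between two equal-length points, for p >= 1."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

# Illustrative points (hypothetical, chosen so the arithmetic is easy to follow).
a, b = (1, 2), (4, 6)
l1 = lp_distance(a, b, 1)  # |4 - 1| + |6 - 2|
l2 = lp_distance(a, b, 2)  # sqrt(3^2 + 4^2)
print(f"L1 = {l1}, L2 = {l2}")

# When the points differ in only one coordinate, the two distances coincide.
c, d = (0, 0), (3, 0)
print(lp_distance(c, d, 1), lp_distance(c, d, 2))
```

In general the L1 distance is never smaller than the L2 distance, and it is strictly larger whenever the points differ in more than one coordinate.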

Unit 4
• What is machine learning? What is its role in data science?
• Explain supervised and unsupervised machine learning.

• Explain the standard errors of regression coefficients.

• What is the significance of R² in regression? If a regression returns an R² of 0.9354, what is
your interpretation of this value?
• Which regression is used in modeling of sensor characteristics?
• What do you understand by Logistic regression? What are dichotomous variables in the context of
Logistic regression?
• Explain dichotomous variables in the context of logistic regression using suitable examples.
• What is meant by the interpretation of beta coefficients? Explain with examples.
• A certain regression equation yielded the following scores: SSR = 20.24 and SSE = 2.11. What is the
value of R²? Based on the value of R², comment on the relationship between the variables.

• Why do we measure the impurity of a resulting node in a decision tree? List the different
measures of impurity used in decision trees.
• There are 4 coins A, B, C and D out of which 3 coins are of equal weight and one coin is heavier.
Find out the heavier coin using Decision Tree.
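
For the regression question in this unit with SSR = 20.24 and SSE = 2.11, the arithmetic can be sketched as below. It assumes SSR denotes the regression (explained) sum of squares and SSE the error (residual) sum of squares, so that the total is SST = SSR + SSE. Some textbooks use SSR for the residual sum of squares instead, in which case this formula would not apply. The function name r_squared is hypothetical.

```python
def r_squared(ssr, sse):
    """R^2 = SSR / (SSR + SSE), with SSR the explained and SSE the residual sum of squares."""
    sst = ssr + sse  # total sum of squares under the assumed convention
    return ssr / sst

r2 = r_squared(ssr=20.24, sse=2.11)
print(f"R^2 = {r2:.4f}")
```

A value close to 1 indicates that most of the variation in the response is explained by the fitted model.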

Unit 5

• Compare and contrast divisive and agglomerative clustering algorithms.

• How does the KNN algorithm make predictions on an unseen dataset?
• Is Feature Scaling required for the KNN Algorithm? Explain with proper justification.
• Cluster the following eight points (with (x, y) representing locations) into three clusters: A1(2,
10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9). Initial cluster centers are:
A1(2, 10), A4(5, 8) and A7(1, 2). The distance function between two points a = (x1, y1) and b =
(x2, y2) is defined as ρ(a, b) = |x2 – x1| + |y2 – y1|. Use the K-Means algorithm to find the three
cluster centers after the second iteration.
• Apply KNN and predict the class for the test point (3,7) for k=3. The training points, given as
(x,y,class), are: (7,7,2), (7,4,2), (3,4,1), (1,4,1), (2,5,2), (3,8,1)

• Use the K-Means algorithm to create two clusters. Assume A(2, 2) and C(1, 1) are the centers of the
two clusters.

• Marks scored by 10 students in mathematics and computer science are given in the table below along
with their result as Pass or Fail. Pappu scores 41 marks in mathematics and 38 marks in computer
science. Using the KNN classifier algorithm, determine whether Pappu has passed or failed using
K = 1, 2, 3, 5 and 7.
Student     Mathematics   Computer Science   Result
Naren       80            80                 Pass
Amit        75            40                 Pass
Deven       65            50                 Pass
Surya       40            40                 Pass
Sanjay      70            40                 Pass
Teja        65            37                 Fail
Akhilesh    70            25                 Fail
Sharad      38            38                 Fail
Ajit        35            59                 Fail
Shivraj     70            65                 Pass
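
The KNN question above can be checked with a short script. This is a sketch under two assumptions that the question does not specify: Euclidean distance is used as the metric, and a tied vote (possible for even K) goes to the label of the closest neighbour among the tied labels. The function name knn_predict is hypothetical.

```python
import math

# (student, mathematics, computer science, result) rows from the table above
training = [
    ("Naren", 80, 80, "Pass"), ("Amit", 75, 40, "Pass"),
    ("Deven", 65, 50, "Pass"), ("Surya", 40, 40, "Pass"),
    ("Sanjay", 70, 40, "Pass"), ("Teja", 65, 37, "Fail"),
    ("Akhilesh", 70, 25, "Fail"), ("Sharad", 38, 38, "Fail"),
    ("Ajit", 35, 59, "Fail"), ("Shivraj", 70, 65, "Pass"),
]

def knn_predict(query, rows, k):
    """Classify `query` by majority vote among its k nearest rows."""
    # Euclidean distance is an assumption; the question does not name a metric.
    nearest = sorted(rows, key=lambda r: math.dist(query, (r[1], r[2])))[:k]
    votes = {}
    for _, _, _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    top = max(votes.values())
    # Assumed tie-break: the label of the closest neighbour among the tied labels.
    for _, _, _, label in nearest:
        if votes[label] == top:
            return label

pappu = (41, 38)
predictions = {k: knn_predict(pappu, training, k) for k in (1, 2, 3, 5, 7)}
print(predictions)
```

Odd values of K avoid ties in a two-class problem, which is why the tie-break rule only matters for K = 2 here.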

• Using the Naïve Bayes classifier approach on the training data set given in the table,
predict the class Buy Laptop (Yes or No) for the feature set: {Income = Low; Student =
No; Credit Rating = Excellent}

Sr. No.   Income   Student   Credit Rating   Buy Laptop
1         High     No        Fair            No
2         High     No        Excellent       No
3         High     No        Fair            Yes
4         Medium   No        Fair            Yes
5         Low      Yes       Fair            Yes
6         Low      Yes       Excellent       No
7         Low      Yes       Excellent       Yes
8         Medium   No        Fair            No
9         Low      Yes       Fair            Yes
10        Medium   Yes       Fair            Yes
11        Medium   Yes       Excellent       Yes
12        Medium   No        Excellent       Yes
13        High     Yes       Fair            Yes
14        Medium   No        Excellent       No
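
The Naïve Bayes question above can be worked through by counting frequencies in the table. The sketch below assumes plain relative frequencies with no Laplace smoothing, which the question does not specify. The function name naive_bayes_scores is hypothetical.

```python
from collections import Counter

# (income, student, credit rating, buy laptop) rows from the table above
rows = [
    ("High", "No", "Fair", "No"), ("High", "No", "Excellent", "No"),
    ("High", "No", "Fair", "Yes"), ("Medium", "No", "Fair", "Yes"),
    ("Low", "Yes", "Fair", "Yes"), ("Low", "Yes", "Excellent", "No"),
    ("Low", "Yes", "Excellent", "Yes"), ("Medium", "No", "Fair", "No"),
    ("Low", "Yes", "Fair", "Yes"), ("Medium", "Yes", "Fair", "Yes"),
    ("Medium", "Yes", "Excellent", "Yes"), ("Medium", "No", "Excellent", "Yes"),
    ("High", "Yes", "Fair", "Yes"), ("Medium", "No", "Excellent", "No"),
]

def naive_bayes_scores(rows, query):
    """Return the unnormalised posterior P(class) * prod P(feature | class) per class."""
    class_counts = Counter(r[-1] for r in rows)
    scores = {}
    for label, count in class_counts.items():
        score = count / len(rows)  # prior P(class)
        for i, value in enumerate(query):
            matches = sum(1 for r in rows if r[-1] == label and r[i] == value)
            score *= matches / count  # likelihood, no smoothing (assumption)
        scores[label] = score
    return scores

scores = naive_bayes_scores(rows, query=("Low", "No", "Excellent"))
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

The scores are not normalised to sum to 1; only their relative size is needed to pick the predicted class.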

Unit 6

• Define Gini impurity and entropy impurity. What will their values be for the purest node?
• How would you execute the k-fold cross-validation strategy? Why is the leave-one-out method a
special case of it?
• A confusion matrix for a classification exercise returns the following values: TP = 0.962, TN = 0.93,
FP = 0.12, FN = 0.07. Calculate accuracy, precision, recall, sensitivity, specificity and F-score.
• The confusion matrix for a certain classification activity is as shown in Table no. 2
              Predicted: NO   Predicted: YES
Actual: NO    50              10
Actual: YES   5               100
Find the following classifier performance measures –

1. Accuracy
2. Precision
3. Recall
4. Specificity
5. F-Score
6. Error rate
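
The six measures listed above can be computed directly from the four cells of the confusion matrix in Table no. 2. The sketch below assumes YES is the positive class, which gives TN = 50, FP = 10, FN = 5 and TP = 100. The function name classification_metrics is hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute common binary-classification measures from confusion-matrix cells."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f_score": 2 * precision * recall / (precision + recall),
        "error_rate": (fp + fn) / total,
    }

# Cells read from Table no. 2, assuming YES is the positive class.
metrics = classification_metrics(tp=100, tn=50, fp=10, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Accuracy and error rate always sum to 1, which gives a quick sanity check on the arithmetic.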
• Explain the following methods used for training and testing –
1. Resubstitution
2. K-fold cross-validation
3. Bootstrapping
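
For the k-fold cross-validation questions in this unit, the way the data is split can be sketched with plain index lists. This is a minimal illustration: the generator k_fold_indices is hypothetical, and it uses contiguous folds without shuffling, whereas in practice the data is usually shuffled first. Setting k equal to the number of samples gives leave-one-out, where every test fold holds exactly one sample.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    indices = list(range(n_samples))
    # The first (n_samples % k) folds get one extra sample each.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# 3-fold split of 6 samples: each sample is tested exactly once.
folds = list(k_fold_indices(n_samples=6, k=3))
# Leave-one-out is the special case k == n_samples.
leave_one_out = list(k_fold_indices(n_samples=6, k=6))
for train, test in folds:
    print("train:", train, "test:", test)
```

A model would be trained on each train list and scored on the matching test list, and the k scores would then be averaged.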
