Data Science - QB

Uploaded by

Maidul Islam

DATA SCIENCE USING PYTHON TOOLS

Model Question
____________________________________________________________________________

(1 Mark Questions)
Chapter 1
1. What is the primary goal of data science?
2. Define 'Big Data.'
3. What are the key steps in the data science process?
4. What is the significance of machine learning in data science?
5. Name two types of regularization commonly used in linear regression.
6. Define regularization in the context of linear regression.
7. Define cross-validation.
8. What is data science?
9. What are the key components of data science?
10. Name a few programming languages commonly used in data science.
11. Name some common techniques used in EDA.
12. What are outliers?
13. What is regularization?
14. Define linear regression.
15. What is model selection?
16. Name some common techniques for model evaluation.
17. Explain the concept of overfitting.
18. Explain the concept of underfitting.

Chapter 2
1. What does kNN stand for?
2. How does kNN classify a new data point?
3. Name one advantage of kNN.
4. Define a decision tree.
5. What is entropy in decision trees?
6. How does a decision tree handle categorical data?
7. What is the main goal of SVM?
8. Define the maximum margin in SVM.
9. What are kernels used for in SVM?
10. How does a random forest reduce overfitting?
11. Name one application of random forests.
12. What does "naïve" mean in Naïve Bayes?
13. Give an example of when Naïve Bayes is suitable.
14. What kind of problem does logistic regression solve?
15. What function is used in logistic regression?
16. What are the coefficients in logistic regression used for?
17. What is the goal of classification algorithms?
18. Name two main types of classification algorithms.
19. How does the choice of 'k' affect the kNN algorithm?
20. What does the 'k' in kNN represent?
21. What is entropy in the context of decision trees?
22. What kind of problems can SVM solve?
23. Why is maximizing the margin important in SVM?
24. What is the kernel trick in SVM?
25. What are ensemble methods in machine learning?
26. Provide one example of an ensemble method.
27. Name one advantage of using Random Forests over individual decision trees.

Chapter 3
1. What is feature engineering in machine learning?
2. Why is feature selection important in machine learning?
3. What is the main objective of k-means clustering?
4. What is clustering in machine learning?
5. Name one application of clustering algorithms.
6. What is the main goal of clustering?
7. Name one common evaluation metric for clustering.
8. How are initial centroids chosen in k-means?
9. What is the stopping criterion for the k-means algorithm?
10. What is hierarchical clustering?
11. What is a dendrogram in hierarchical clustering?
12. Why is dimensionality reduction important in machine learning?
13. What is the primary goal of dimensionality reduction?
14. Name two common techniques for dimensionality reduction.
15. What is the role of variance in PCA?
16. What are the components of SVD?
17. Name one application of SVD in machine learning or data analysis.

Chapter 4
1. What is text mining?
2. What are the main goals of text mining?
3. Name two common techniques used in text mining.
4. What is information retrieval?
5. What are the main components of an information retrieval system?
6. Name one evaluation metric used in information retrieval.
7. What is network analysis?
8. What are recommender systems?
9. What is content-based filtering?
(5 marks Questions)
Chapter 1
1. Why is data cleaning important in data science? What ethical considerations should data
scientists keep in mind?
2. Why is it important to perform EDA before building a predictive model? Define the terms
"dependent variable" and "independent variable" in the context of linear regression.
3. What is the purpose of the cost function in linear regression? How does regularization prevent
overfitting in linear regression?
Chapter 2
1. What is classification in machine learning? Explain the concept of ensemble learning.
2. What are the advantages and disadvantages of using kNN? How do you choose the optimal
value of 'k' in kNN?
3. What is a Decision Tree in the context of classification? Explain the concepts of entropy and
information gain in Decision Trees.
4. What is a Support Vector Machine (SVM) in classification? Explain the concept of a
hyperplane in SVM.
5. What are support vectors, and how are they determined in SVM? Describe the role of the
kernel function in SVM.
6. What are the advantages and disadvantages of using SVM?
7. What is a Random Forest? What are the advantages of using Random Forests over individual
decision trees?
8. What is the "naïve" assumption made in Naïve Bayes? Explain the principle behind Naïve
Bayes classification.
9. What is the likelihood function in logistic regression? Discuss the difference between logistic
regression and linear regression.
Chapter 3
1. What is feature engineering? What are some common techniques for handling missing values
in feature engineering?
2. What is feature selection, and why is it important? Name a few feature selection methods
commonly used in machine learning.
3. What is clustering in machine learning? Discuss the importance of choosing the appropriate
number of clusters in k-means clustering.
4. What is dimensionality reduction? How does SVD differ from PCA?
5. What are the advantages of using dimensionality reduction techniques like PCA or SVD?
Chapter 4
1. What is text mining, and how does it differ from information retrieval? What are some
common techniques for text normalization?
2. Explain the term "bag of words" model in text mining. Discuss the concept of stemming and
its importance in text mining.
3. What is network analysis? How do you calculate the clustering coefficient of a network?
4. What are recommender systems? Explain the concept of similarity metrics in recommender
systems.

(15 marks Questions)

Chapter 1
1. What is Exploratory Data Analysis (EDA), and why is it considered a crucial step in the data
analysis process? What is linear regression, and how does it differ from other regression
techniques? What are the different types of EDA? Discuss a few advantages of using EDA.
5+5+2+3
2. What is linear regression, and how does it work?
Obtain the equations of the two lines of regression for the following data:
X 65 66 67 67 68 69 70 72
Y 67 68 65 68 72 72 69 71
6+9
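One way to check a hand-worked answer to the regression problem above is the short sketch below. It assumes the usual least-squares definitions (regression coefficient of Y on X is Sxy/Sxx, of X on Y is Sxy/Syy) and that both lines pass through the point of means; the variable names are illustrative only.

```python
# Data copied from the question.
xs = [65, 66, 67, 67, 68, 69, 70, 72]
ys = [67, 68, 65, 68, 72, 72, 69, 71]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sums of squared deviations and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

# Regression coefficients: b_yx for the line of Y on X, b_xy for X on Y.
b_yx = sxy / sxx
b_xy = sxy / syy

# Intercepts follow because both lines pass through (mean_x, mean_y).
a_yx = mean_y - b_yx * mean_x
a_xy = mean_x - b_xy * mean_y

print(f"Y on X: y = {b_yx:.4f} x + {a_yx:.4f}")
print(f"X on Y: x = {b_xy:.4f} y + {a_xy:.4f}")
```

With this data the means are 68 and 69, so the two lines can be written as y - 69 = (2/3)(x - 68) and x - 68 = (6/11)(y - 69).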
3. What is data science? What is the difference between data science and data analytics? What
are the differences between supervised and unsupervised learning? What is logistic regression?
How does it differ from linear regression? What is reinforcement Learning?
2+3+3+2+3+2
4. How does machine learning fit into the field of data science? Explain. What is regularization,
and why is it used in machine learning? Explain the concept of overfitting and underfitting in the
context of model selection. Discuss the trade-offs between bias and variance in model selection.
4+4+4+3
5. a) Define the following terms: precision, recall, sensitivity, F1-score.
b) Suppose a computer program for recognizing dogs in photographs identifies eight dogs in a
picture containing 12 dogs and some cats. Of the eight dogs identified, five actually are dogs
while the rest are cats. Compute the precision and recall of the computer program.
c) Explain the concept of a confusion matrix and its role in model evaluation.
8+4+3
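The dog-recognition numbers in question 5 can be checked with a few lines of arithmetic. This is a minimal sketch that treats "dog" as the positive class; the variable names are illustrative, and the F1 value is included only because the question also asks for its definition.

```python
# Figures taken from the question; "dog" is the positive class.
dogs_in_picture = 12    # actual positives
flagged_as_dog = 8      # predicted positives
correctly_flagged = 5   # true positives

true_positives = correctly_flagged
false_positives = flagged_as_dog - correctly_flagged    # cats flagged as dogs
false_negatives = dogs_in_picture - correctly_flagged   # dogs that were missed

precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)
f1 = 2 * precision * recall / (precision + recall)

print(f"precision = {precision:.3f}")  # 5/8
print(f"recall    = {recall:.3f}")     # 5/12
print(f"F1-score  = {f1:.3f}")
```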
6. What is model selection, and why is it important in machine learning? Discuss the trade-offs
between bias and variance in model selection. Describe the process of cross-validation and its
role in model selection. Discuss the challenges of evaluating imbalanced datasets in
classification tasks.
3+3+5+4
Chapter 2
1. a) Describe the working principle of the k-nearest neighbors algorithm. How does it classify
new data points?
b) Discuss the importance of choosing the right value of 'k' in kNN. How does the choice of 'k'
affect the model's performance?
c) Suppose we have a dataset with the following points in a two-dimensional space:
Point A: (2, 3)
Point B: (3, 5)
Point C: (5, 8)
Point D: (7, 2)
Point E: (8, 6)
Classify a new point, P = (4, 4), using the kNN algorithm with k = 3.
6+4+5=15
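The distance step of part (c) can be sketched as follows, assuming Euclidean distance. The question as printed gives no class labels for points A to E, so the sketch stops at finding the three nearest neighbours; the majority vote would need labels supplied by the examiner.

```python
import math

# Points copied from the question; no class labels are given in the source.
points = {"A": (2, 3), "B": (3, 5), "C": (5, 8), "D": (7, 2), "E": (8, 6)}
query = (4, 4)
k = 3

# Euclidean distance from the query point to every training point.
distances = {name: math.dist(coords, query) for name, coords in points.items()}

# Names of the k closest points, nearest first.
nearest = sorted(distances, key=distances.get)[:k]

for name in nearest:
    print(f"{name}: distance {distances[name]:.3f}")
```

Once labels are known, the predicted class is simply the most common label among the points in `nearest`.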
2. a) Explain the concept of decision trees in classification. How does the decision tree algorithm
construct a tree from the training data?
b) Use ID3 algorithm to construct a decision tree for the data in the following table.

Age   Competition   Type       Class (profit)
Old   Yes           Software   Down
Old   No            Software   Down
Old   Yes           Hardware   Down
Mid   Yes           Software   Down
Mid   Yes           Hardware   Down
Mid   No            Hardware   Up
Mid   No            Software   Up
New   Yes           Software   Up
New   No            Hardware   Up
New   No            Software   Up

6+9=15
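The first step of ID3 for part (b), choosing the root attribute by information gain, can be sketched as below. It assumes base-2 entropy and uses the ten rows of the table in the question; the helper names `entropy` and `information_gain` are illustrative, not taken from any library.

```python
import math
from collections import Counter

# Rows copied from the question: (Age, Competition, Type, Class).
rows = [
    ("Old", "Yes", "Software", "Down"),
    ("Old", "No", "Software", "Down"),
    ("Old", "Yes", "Hardware", "Down"),
    ("Mid", "Yes", "Software", "Down"),
    ("Mid", "Yes", "Hardware", "Down"),
    ("Mid", "No", "Hardware", "Up"),
    ("Mid", "No", "Software", "Up"),
    ("New", "Yes", "Software", "Up"),
    ("New", "No", "Hardware", "Up"),
    ("New", "No", "Software", "Up"),
]
attributes = ["Age", "Competition", "Type"]

def entropy(labels):
    """Base-2 entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, column):
    """Entropy of the class minus the weighted entropy after splitting on one column."""
    labels = [r[-1] for r in rows]
    remainder = 0.0
    for value in {r[column] for r in rows}:
        subset = [r[-1] for r in rows if r[column] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

gains = {name: information_gain(rows, i) for i, name in enumerate(attributes)}
root = max(gains, key=gains.get)

for name, gain in gains.items():
    print(f"Gain({name}) = {gain:.4f}")
print("Root attribute:", root)
```

Age has the largest gain, and its Old and New branches are already pure, so only the Mid branch needs a further split when the tree is completed by hand.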
3. Explain the fundamental principles of Support Vector Machines (SVM). How does SVM find
the optimal hyperplane for classification? Describe the concept of margin in SVM. Explain the
concept of kernel functions in SVM.
4+4+3+4
4. Explain the fundamental principles of the Naïve Bayes algorithm.
Based on the following data, determine the gender of a person with height 6 ft, weight 130 lbs,
and foot size 8 in. (use the Naïve Bayes algorithm).

Person   Height (feet)   Weight (lbs)   Foot size (inches)
Male     6               180            10
Male     5.5             170            8
Male     6               170            10
Female   5               130            8
Female   5.5             150            7
Female   5               130            6
Female   6               150            8

5+10=15
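A possible check for the numerical part of question 4 is sketched below. The question does not say how the likelihoods should be modelled, so this sketch assumes Gaussian Naïve Bayes with per-class sample means and sample variances (divisor n - 1) and priors taken from the class frequencies. A frequency-table treatment of the features is another valid reading and would be worked differently.

```python
import math
from statistics import mean, variance

# Training data copied from the question: (height ft, weight lbs, foot size in).
training = {
    "Male": [(6, 180, 10), (5.5, 170, 8), (6, 170, 10)],
    "Female": [(5, 130, 8), (5.5, 150, 7), (5, 130, 6), (6, 150, 8)],
}
query = (6, 130, 8)

def gaussian_pdf(x, mu, var):
    """Normal density with mean mu and variance var, evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

total = sum(len(rows) for rows in training.values())
scores = {}
for label, rows in training.items():
    score = len(rows) / total  # prior P(class) from class frequency
    # Assumption: features are conditionally independent and Gaussian per class.
    for feature_values, x in zip(zip(*rows), query):
        score *= gaussian_pdf(x, mean(feature_values), variance(feature_values))
    scores[label] = score

# Normalise the prior-times-likelihood scores into posterior probabilities.
evidence = sum(scores.values())
posteriors = {label: s / evidence for label, s in scores.items()}
prediction = max(posteriors, key=posteriors.get)

print(posteriors)
print("Predicted gender:", prediction)
```

Under these assumptions the weight of 130 lbs lies far from the male weights, which drives the prediction towards Female.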
5. Define classification and explain its importance in machine learning. How does it differ from
other tasks like regression or clustering? Discuss the key components of a classification problem,
including features, labels, training data, and evaluation metrics.
(3+4)+8=15
6. Compare and contrast the strengths and weaknesses of kNN with other classification
algorithms, such as decision trees and SVM. Discuss methods for handling overfitting in
decision trees, such as tree pruning, minimum sample split, and maximum depth constraints.
6+9=15
Chapter 3
1. Explain the importance of feature engineering in machine learning. How does feature
engineering contribute to improving the performance of machine learning models? Describe the
process of feature selection. What are the different approaches to feature selection, including
filter methods, wrapper methods, and embedded methods?
3+3+3+6=15
2. Explain the k-means clustering algorithm. How does it partition data into clusters based on the
similarity of data points to cluster centroids? Discuss strategies for evaluating the quality of
clustering results. What are the common metrics used to assess clustering performance?
5+3+3+4=15
3. Explain the concept of dimensionality reduction and its importance in machine learning.
How does dimensionality reduction help address the curse of dimensionality and improve the
efficiency of machine learning algorithms? What are the commonly used dimensionality
reduction techniques in machine learning?
6+5+4=15
4. Describe Principal Component Analysis (PCA) as a technique for dimensionality reduction.
Compare PCA with Singular Value Decomposition (SVD) as dimensionality reduction
techniques.
Given the following data, compute the principal component vectors and the first principal
components:

X    2    3    7
Y   11   14   26

5+3+7=15
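The numerical part of question 4 can be checked with the sketch below. It assumes the sample covariance matrix (divisor n - 1) and uses the closed-form eigen-decomposition of a symmetric 2x2 matrix, so no external library is needed. The sign of each principal component vector is arbitrary; the sketch happens to return the first one with positive entries.

```python
import math

# Data copied from the question.
xs = [2, 3, 7]
ys = [11, 14, 26]
n = len(xs)

# Centre the data.
mean_x, mean_y = sum(xs) / n, sum(ys) / n
dx = [x - mean_x for x in xs]
dy = [y - mean_y for y in ys]

# Sample covariance matrix [[sxx, sxy], [sxy, syy]] (assumed divisor n - 1).
sxx = sum(a * a for a in dx) / (n - 1)
syy = sum(b * b for b in dy) / (n - 1)
sxy = sum(a * b for a, b in zip(dx, dy)) / (n - 1)

# Eigenvalues of a symmetric 2x2 matrix in closed form.
half_trace = (sxx + syy) / 2
half_gap = math.hypot(sxx - syy, 2 * sxy) / 2
lambda1 = half_trace + half_gap
lambda2 = half_trace - half_gap

def unit_eigenvector(lam):
    """Unit eigenvector for eigenvalue lam; this form requires sxy != 0."""
    vx, vy = sxy, lam - sxx
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm)

pc1 = unit_eigenvector(lambda1)
pc2 = unit_eigenvector(lambda2)

# First principal components: projections of the centred data onto pc1.
scores = [a * pc1[0] + b * pc1[1] for a, b in zip(dx, dy)]

print("eigenvalues:", lambda1, lambda2)
print("pc1:", pc1, "pc2:", pc2)
print("first principal components:", scores)
```

For this data the centred Y values are exactly three times the centred X values, so the second eigenvalue is zero and all the variance lies along the first principal direction.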
5. What is clustering? Is clustering supervised learning? Why?
Use k-means algorithm to find 2 clusters in the following data:

No   1   2     3   4   5     6     7
X1   1   1.5   3   5   3.5   4.5   3.5
X2   1   2     4   7   5     5     4.5

(2+4)+9=15
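A minimal sketch of the k-means iteration for question 5 follows. The question does not state the initial centroids, so the sketch assumes points 1 and 4 as starting centroids (a common textbook choice, since they are the two points farthest apart) and uses Euclidean distance with batch updates. A different initialisation or a sequential update rule may pass through different intermediate steps.

```python
import math

# Points 1..7 copied from the question as (X1, X2).
points = [(1, 1), (1.5, 2), (3, 4), (5, 7), (3.5, 5), (4.5, 5), (3.5, 4.5)]

# Assumption: start from points 1 and 4; the question does not specify this.
centroids = [points[0], points[3]]

def assign(points, centroids):
    """Index of the nearest centroid for every point (ties go to the lower index)."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]

def update(points, labels, k):
    """Mean of the points in each cluster; assumes no cluster becomes empty."""
    new_centroids = []
    for j in range(k):
        members = [p for p, label in zip(points, labels) if label == j]
        new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_centroids

labels = None
while True:
    new_labels = assign(points, centroids)
    if new_labels == labels:  # stop when assignments no longer change
        break
    labels = new_labels
    centroids = update(points, labels, 2)

for j, centre in enumerate(centroids):
    members = [i + 1 for i, label in enumerate(labels) if label == j]
    print(f"Cluster {j + 1}: points {members}, centroid {centre}")
```

Under these assumptions the iteration settles with points 1 and 2 in one cluster and points 3 to 7 in the other.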
Chapter 4
1. Discuss the challenges associated with text mining and how they can be addressed.
Explain how text mining techniques such as natural language processing (NLP), sentiment
analysis, and topic modeling are used to extract meaningful insights from textual data.
6+9
2. Describe the key components of an information retrieval system and how they work together
to retrieve relevant information from a large corpus of documents. Compare and contrast
different information retrieval models such as Boolean retrieval, vector space models, and
probabilistic models.
6+9
3. Explain the role of recommender systems in personalized recommendation and content
discovery. Discuss the different types of recommender systems, including collaborative filtering,
content-based filtering, and hybrid approaches.
6+9
