Data Science - QB

Uploaded by

Maidul Islam
Copyright © All Rights Reserved

DATA SCIENCE USING PYTHON TOOLS

Model Question
____________________________________________________________________________

(1 Mark Questions)
Chapter 1
1. What is the primary goal of data science?
2. Define 'Big Data.'
3. What are the key steps in the data science process?
4. What is the significance of machine learning in data science?
5. Name two types of regularization commonly used in linear regression.
6. Define regularization in the context of linear regression.
7. Define cross-validation.
8. What is data science?
9. What are the key components of data science?
10. Name a few programming languages commonly used in data science.
11. Name some common techniques used in EDA.
12. What are outliers?
13. What is regularization?
14. Define linear regression.
15. What is model selection?
16. Name some common techniques for model evaluation.
17. Explain the concept of overfitting.
18. Explain the concept of underfitting.

Chapter 2
1. What does kNN stand for?
2. How does kNN classify a new data point?
3. Name one advantage of kNN.
4. Define a decision tree.
5. What is entropy in decision trees?
6. How does a decision tree handle categorical data?
7. What is the main goal of SVM?
8. Define the maximum margin in SVM.
9. What are kernels used for in SVM?
10. How does a random forest reduce overfitting?
11. Name one application of random forests.
12. What does "naïve" mean in Naïve Bayes?
13. Give an example of when Naïve Bayes is suitable.
14. What kind of problem does logistic regression solve?
15. What function is used in logistic regression?
16. What are the coefficients in logistic regression used for?
17. What is the goal of classification algorithms?
18. Name two main types of classification algorithms.
19. How does the choice of 'k' affect the kNN algorithm?
20. What does the 'k' in kNN represent?
21. What is entropy in the context of decision trees?
22. What kind of problems can SVM solve?
23. Why is maximizing the margin important in SVM?
24. What is the kernel trick in SVM?
25. What are ensemble methods in machine learning?
26. Provide one example of an ensemble method.
27. Name one advantage of using Random Forests over individual decision trees.

Chapter 3
1. What is feature engineering in machine learning?
2. Why is feature selection important in machine learning?
3. What is the main objective of k-means clustering?
4. What is clustering in machine learning?
5. Name one application of clustering algorithms.
6. What is the main goal of clustering?
7. Name one common evaluation metric for clustering.
8. How are initial centroids chosen in k-means?
9. What is the stopping criterion for the k-means algorithm?
10. What is hierarchical clustering?
11. What is a dendrogram in hierarchical clustering?
12. Why is dimensionality reduction important in machine learning?
13. What is the primary goal of dimensionality reduction?
14. Name two common techniques for dimensionality reduction.
15. What is the role of variance in PCA?
16. What are the components of SVD?
17. Name one application of SVD in machine learning or data analysis.

Chapter 4
1. What is text mining?
2. What are the main goals of text mining?
3. Name two common techniques used in text mining.
4. What is information retrieval?
5. What are the main components of an information retrieval system?
6. Name one evaluation metric used in information retrieval.
7. What is network analysis?
8. What are recommender systems?
9. What is content-based filtering?

(5 Marks Questions)
Chapter 1
1. Why is data cleaning important in data science? What ethical considerations should data
scientists keep in mind?
2. Why is it important to perform EDA before building a predictive model? Define the terms
"dependent variable" and "independent variable" in the context of linear regression.
3. What is the purpose of the cost function in linear regression? How does regularization prevent
overfitting in linear regression?
Chapter 2
1. What is classification in machine learning? Explain the concept of ensemble learning.
2. What are the advantages and disadvantages of using kNN? How do you choose the optimal
value of 'k' in kNN?
3. What is a Decision Tree in the context of classification? Explain the concepts of entropy and
information gain in Decision Trees.
4. What is a Support Vector Machine (SVM) in classification? Explain the concept of a
hyperplane in SVM.
5. What are support vectors, and how are they determined in SVM? Describe the role of the
kernel function in SVM.
6. What are the advantages and disadvantages of using SVM?
7. What is a Random Forest? What are the advantages of using Random Forests over individual
decision trees?
8. What is the "naïve" assumption made in Naïve Bayes? Explain the principle behind Naïve
Bayes classification.
9. What is the likelihood function in logistic regression? Discuss the difference between logistic
regression and linear regression.
Chapter 3
1. What is feature engineering? What are some common techniques for handling missing values
in feature engineering?
2. What is feature selection, and why is it important? Name a few feature selection methods
commonly used in machine learning.
3. What is clustering in machine learning? Discuss the importance of choosing the appropriate
number of clusters in k-means clustering.
4. What is dimensionality reduction? How does SVD differ from PCA?
5. What are the advantages of using dimensionality reduction techniques like PCA or SVD?
Chapter 4
1. What is text mining, and how does it differ from information retrieval? What are some
common techniques for text normalization?
2. Explain the term "bag of words" model in text mining. Discuss the concept of stemming and
its importance in text mining.
3. What is network analysis? How do you calculate the clustering coefficient of a network?
4. What are recommender systems? Explain the concept of similarity metrics in recommender
systems.

(15 Marks Questions)


Chapter 1
1. What is Exploratory Data Analysis (EDA), and why is it considered a crucial step in the data
analysis process? What is linear regression, and how does it differ from other regression
techniques? What are the different types of EDA? Discuss a few advantages of using EDA.
5+5+2+3
2. What is linear regression, and how does it work?
Obtain the equations of the two lines of regression for the following data:
X   65  66  67  67  68  69  70  72
Y   67  68  65  68  72  72  69  71
6+9
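The regression-line calculation in question 2 can be checked with a short sketch. It assumes the usual least-squares definitions: the slope of Y on X is S_xy / S_xx, the slope of X on Y is S_xy / S_yy, and both lines pass through the point of means. The function name `regression_lines` is illustrative only.

```python
def regression_lines(xs, ys):
    """Return ((slope, intercept) of Y on X, (slope, intercept) of X on Y)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sums of squared deviations and of cross-products about the means.
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b_yx = s_xy / s_xx  # regression coefficient of Y on X
    b_xy = s_xy / s_yy  # regression coefficient of X on Y
    # Each line passes through (mean_x, mean_y).
    return (b_yx, mean_y - b_yx * mean_x), (b_xy, mean_x - b_xy * mean_y)


X = [65, 66, 67, 67, 68, 69, 70, 72]
Y = [67, 68, 65, 68, 72, 72, 69, 71]

(y_slope, y_icpt), (x_slope, x_icpt) = regression_lines(X, Y)
print(f"Y on X: Y = {y_slope:.3f} X + {y_icpt:.3f}")
print(f"X on Y: X = {x_slope:.3f} Y + {x_icpt:.3f}")
```

For this data the means are 68 and 69, so the two lines intersect at (68, 69).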
3. What is data science? What is the difference between data science and data analytics? What
are the differences between supervised and unsupervised learning? What is logistic regression?
How does it differ from linear regression? What is reinforcement learning?
2+3+3+2+3+2
4. How does machine learning fit into the field of data science? Explain. What is regularization,
and why is it used in machine learning? Explain the concept of overfitting and underfitting in the
context of model selection. Discuss the trade-offs between bias and variance in model selection.
4+4+4+3
5. a) Define the following terms: precision, recall, sensitivity, F1-score.
b) Suppose a computer program for recognizing dogs in photographs identifies eight dogs in a
picture containing 12 dogs and some cats. Of the eight dogs identified, five actually are dogs
while the rest are cats. Compute the precision and recall of the computer program.
c) Explain the concept of a confusion matrix and its role in model evaluation.
8+4+3
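The numbers in part (b) map onto confusion-matrix counts as follows, treating "dog" as the positive class: 5 true positives, 8 - 5 = 3 false positives, and 12 - 5 = 7 false negatives. The sketch below is one way to verify the arithmetic; the helper name `precision_recall_f1` is illustrative.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


identified = 8    # objects the program labelled as dogs
actual_dogs = 12  # dogs really present in the picture
correct = 5       # identified objects that really are dogs

tp = correct
fp = identified - correct   # cats labelled as dogs
fn = actual_dogs - correct  # dogs the program missed

precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Precision works out to 5/8 and recall to 5/12. Sensitivity is another name for recall, so it takes the same value.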
6. What is model selection, and why is it important in machine learning? Discuss the trade-offs
between bias and variance in model selection. Describe the process of cross-validation and its
role in model selection. Discuss the challenges of evaluating imbalanced datasets in
classification tasks.
3+3+5+4
Chapter 2
1. a) Describe the working principle of the k-nearest neighbors algorithm. How does it classify
new data points?
b) Discuss the importance of choosing the right value of 'k' in kNN. How does the choice of 'k'
affect the model's performance?
c) Suppose we have a dataset with the following points in a two-dimensional space:
Point A: (2, 3)
Point B: (3, 5)
Point C: (5, 8)
Point D: (7, 2)
Point E: (8, 6)
You are asked to classify a new point, P = (4, 4), using the kNN algorithm with k = 3.
6+4+5=15
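The distance step of part (c) can be sketched as below, assuming Euclidean distance. The question as printed gives no class labels for points A to E, so the sketch ranks the neighbours and leaves the vote to a helper, `knn_classify`, whose `labels` argument is a hypothetical input that the reader must supply.

```python
from collections import Counter
from math import dist

POINTS = {"A": (2, 3), "B": (3, 5), "C": (5, 8), "D": (7, 2), "E": (8, 6)}


def nearest_neighbours(query, points, k):
    """Return the k (name, distance) pairs closest to query, nearest first."""
    ranked = sorted(
        ((name, dist(query, xy)) for name, xy in points.items()),
        key=lambda pair: pair[1],
    )
    return ranked[:k]


def knn_classify(query, points, labels, k):
    """Majority vote among the k nearest points.

    `labels` maps point name -> class; it is NOT given in the question
    and must be provided by the caller.
    """
    names = [name for name, _ in nearest_neighbours(query, points, k)]
    return Counter(labels[name] for name in names).most_common(1)[0][0]


neighbours = nearest_neighbours((4, 4), POINTS, k=3)
for name, distance in neighbours:
    print(name, round(distance, 3))
```

With k = 3 the nearest points to P are B, A and D, in that order. P takes whichever class is held by the majority of those three.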
2. a) Explain the concept of decision trees in classification. How does the decision tree algorithm
construct a tree from the training data?
b) Use the ID3 algorithm to construct a decision tree for the data in the following table.

Age   Competition   Type       Class (profit)
Old   Yes           Software   Down
Old   No            Software   Down
Old   Yes           Hardware   Down
Mid   Yes           Software   Down
Mid   Yes           Hardware   Down
Mid   No            Hardware   Up
Mid   No            Software   Up
New   Yes           Software   Up
New   No            Hardware   Up
New   No            Software   Up

6+9=15
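The ID3 construction in part (b) can be sketched as follows. This is a minimal version that assumes entropy-based information gain and categorical attributes, and does no pruning. The names `entropy`, `info_gain` and `id3` are illustrative.

```python
from collections import Counter
from math import log2

FEATURES = ("Age", "Competition", "Type")
ROWS = [
    ("Old", "Yes", "Software", "Down"), ("Old", "No", "Software", "Down"),
    ("Old", "Yes", "Hardware", "Down"), ("Mid", "Yes", "Software", "Down"),
    ("Mid", "Yes", "Hardware", "Down"), ("Mid", "No", "Hardware", "Up"),
    ("Mid", "No", "Software", "Up"), ("New", "Yes", "Software", "Up"),
    ("New", "No", "Hardware", "Up"), ("New", "No", "Software", "Up"),
]


def entropy(rows):
    """Shannon entropy (in bits) of the class label, stored last in each row."""
    counts = Counter(row[-1] for row in rows)
    return -sum(c / len(rows) * log2(c / len(rows)) for c in counts.values())


def info_gain(rows, i):
    """Entropy reduction obtained by splitting rows on feature index i."""
    groups = {}
    for row in rows:
        groups.setdefault(row[i], []).append(row)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(rows) - remainder


def id3(rows, feature_idx):
    """Return a class label (leaf) or a nested dict {feature: {value: subtree}}."""
    labels = {row[-1] for row in rows}
    if len(labels) == 1:          # pure node
        return labels.pop()
    if not feature_idx:           # no features left: fall back to majority class
        return Counter(row[-1] for row in rows).most_common(1)[0][0]
    best = max(feature_idx, key=lambda i: info_gain(rows, i))
    rest = [i for i in feature_idx if i != best]
    branches = {}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        branches[value] = id3(subset, rest)
    return {FEATURES[best]: branches}


tree = id3(ROWS, [0, 1, 2])
print(tree)
```

On this table Age has the largest gain at the root (0.6 bits). The Old and New branches are pure, and the Mid branch is split by Competition.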
3. Explain the fundamental principles of Support Vector Machines (SVM). How does SVM find
the optimal hyperplane for classification? Describe the concept of margin in SVM. Explain the
concept of kernel functions in SVM.
4+4+3+4
4. Explain the fundamental principles of the Naïve Bayes algorithm.
Based on the following data, determine the gender of a person with height 6 ft, weight 130 lbs,
and foot size 8 in. (use the Naïve Bayes algorithm).

Person   Height (feet)   Weight (lbs)   Foot size (inches)
Male     6               180            10
Male     6               180            10
Male     5.5             170            8
Male     6               170            10
Female   5               130            8
Female   5.5             150            7
Female   5               130            6
Female   6               150            8

5+10=15
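One way to work question 4 is Gaussian Naïve Bayes. The question does not say which likelihood model to use, so the sketch below makes two assumptions: each feature is normally distributed within a class, and the mean and sample standard deviation are estimated from the table. It uses `statistics.NormalDist` from the Python standard library. `posterior_scores` is an illustrative name.

```python
from statistics import NormalDist

# Rows are (height in feet, weight in lbs, foot size in inches).
DATA = {
    "Male": [(6, 180, 10), (6, 180, 10), (5.5, 170, 8), (6, 170, 10)],
    "Female": [(5, 130, 8), (5.5, 150, 7), (5, 130, 6), (6, 150, 8)],
}


def posterior_scores(query):
    """Return unnormalised posteriors: prior times product of Gaussian likelihoods."""
    total = sum(len(rows) for rows in DATA.values())
    scores = {}
    for label, rows in DATA.items():
        score = len(rows) / total  # class prior
        # zip(*rows) transposes the rows into one column per feature.
        for column, x in zip(zip(*rows), query):
            # Assumption: feature is Gaussian within the class (sample std dev).
            score *= NormalDist.from_samples(column).pdf(x)
        scores[label] = score
    return scores


scores = posterior_scores((6, 130, 8))
prediction = max(scores, key=scores.get)
print(scores)
print("Predicted:", prediction)
```

Under these assumptions the Female score is far larger, mainly because a weight of 130 lbs lies many standard deviations below the male mean of 175 lbs. A frequency-count (categorical) treatment of the same table also favours Female.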
5. Define classification and explain its importance in machine learning. How does it differ from
other tasks like regression or clustering? Discuss the key components of a classification problem,
including features, labels, training data, and evaluation metrics.
(3+4)+8=15
6. Compare and contrast the strengths and weaknesses of kNN with other classification
algorithms, such as decision trees and SVM. Discuss methods for handling overfitting in
decision trees, such as tree pruning, minimum sample split, and maximum depth constraints.
6+9=15
Chapter 3
1. Explain the importance of feature engineering in machine learning. How does feature
engineering contribute to improving the performance of machine learning models? Describe the
process of feature selection. What are the different approaches to feature selection, including
filter methods, wrapper methods, and embedded methods?
3+3+3+6=15
2. Explain the k-means clustering algorithm. How does it partition data into clusters based on the
similarity of data points to cluster centroids? Discuss strategies for evaluating the quality of
clustering results. What are the common metrics used to assess clustering performance?
5+3+3+4=15
3. Explain the concept of dimensionality reduction and its importance in machine learning.
How does dimensionality reduction help address the curse of dimensionality and improve the
efficiency of machine learning algorithms? What are the commonly used dimensionality
reduction techniques in machine learning?
6+5+4=15
4. Describe Principal Component Analysis (PCA) as a technique for dimensionality reduction.
Compare PCA with Singular Value Decomposition (SVD) as dimensionality reduction
techniques.
Given the following data, compute the principal component vectors and the first principal
components:

X    2    3    7
Y   11   14   26

5+3+7=15
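The numerical part of question 4 can be checked with the sketch below. It assumes the sample covariance matrix (divisor n - 1) and uses the closed-form eigen-decomposition of a symmetric 2 x 2 matrix. Using divisor n instead would rescale the eigenvalues but leave the component vectors unchanged. The function name `pca_2d` is illustrative, and the sign of each component vector is arbitrary.

```python
from math import hypot, sqrt


def pca_2d(xs, ys):
    """Return (eigenvalues, unit component vectors, first-PC scores) for 2-D data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Entries of the sample covariance matrix [[s_xx, s_xy], [s_xy, s_yy]].
    s_xx = sum(d * d for d in dx) / (n - 1)
    s_yy = sum(d * d for d in dy) / (n - 1)
    s_xy = sum(a * b for a, b in zip(dx, dy)) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix, largest first.
    half_trace = (s_xx + s_yy) / 2
    root = sqrt(((s_xx - s_yy) / 2) ** 2 + s_xy ** 2)
    eigenvalues = (half_trace + root, half_trace - root)
    components = []
    for lam in eigenvalues:
        # (s_xy, lam - s_xx) is an eigenvector for lam; this assumes s_xy != 0.
        vx, vy = s_xy, lam - s_xx
        norm = hypot(vx, vy)
        components.append((vx / norm, vy / norm))
    # Project the centred data onto the first principal component.
    pc1 = components[0]
    scores = [a * pc1[0] + b * pc1[1] for a, b in zip(dx, dy)]
    return eigenvalues, components, scores


eigenvalues, components, scores = pca_2d([2, 3, 7], [11, 14, 26])
print("eigenvalues:", eigenvalues)
print("components:", components)
print("first principal component scores:", scores)
```

The three points lie exactly on a straight line (each Y deviation is three times the X deviation), so the second eigenvalue is zero and the first component points along (1, 3)/√10.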
5. What is clustering? Is clustering supervised learning? Why?
Use k-means algorithm to find 2 clusters in the following data:

No   1     2     3     4     5     6     7
X1   1     1.5   3     5     3.5   4.5   3.5
X2   1     2     4     7     5     5     4.5

(2+4)+9=15
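The k-means iterations for question 5 can be sketched as below, using squared Euclidean distance. The question does not specify the starting centroids. Taking points 1 and 4 as the initial centroids is an assumption made here for illustration; other starting points may take a different path. The function name `kmeans` is illustrative.

```python
POINTS = {
    1: (1, 1), 2: (1.5, 2), 3: (3, 4), 4: (5, 7),
    5: (3.5, 5), 6: (4.5, 5), 7: (3.5, 4.5),
}


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm; returns (final centroids, clusters of point numbers)."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        # (ties go to the lower-indexed centroid).
        clusters = [[] for _ in centroids]
        for pid, (x, y) in points.items():
            d2 = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            clusters[d2.index(min(d2))].append(pid)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        updated = []
        for members, old in zip(clusters, centroids):
            if members:
                mean_x = sum(points[m][0] for m in members) / len(members)
                mean_y = sum(points[m][1] for m in members) / len(members)
                updated.append((mean_x, mean_y))
            else:
                updated.append(old)
        if updated == centroids:  # stop when the centroids no longer move
            break
        centroids = updated
    return centroids, clusters


# Assumed initial centroids: points 1 and 4 (not specified in the question).
centroids, clusters = kmeans(POINTS, [POINTS[1], POINTS[4]])
print("clusters:", clusters)
print("centroids:", centroids)
```

From this start the algorithm settles on clusters {1, 2} and {3, 4, 5, 6, 7}, with centroids (1.25, 1.5) and (3.9, 5.1).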
Chapter 4
1. Discuss the challenges associated with text mining and how they can be addressed.
Explain how text mining techniques such as natural language processing (NLP), sentiment
analysis, and topic modeling are used to extract meaningful insights from textual data.
6+9
2. Describe the key components of an information retrieval system and how they work together
to retrieve relevant information from a large corpus of documents. Compare and contrast
different information retrieval models such as Boolean retrieval, vector space models, and
probabilistic models.
6+9
3. Explain the role of recommender systems in personalized recommendation and content
discovery. Discuss the different types of recommender systems, including collaborative filtering,
content-based filtering, and hybrid approaches.
6+9
