Data Science QB
M.TECH(CSE)-2ND SEM
UNIT 1
1. Analyze the role of data exploration and modeling in predictive analytics.
2. Discuss the importance of automating the data science workflow. What tools and methods are
commonly used?
3. Elaborate on the complete data science life cycle, explaining each step with examples.
4. Write short notes on:
a) Data retrieval
b) Data presentation
5. What are the challenges in handling big data in the data science process?
6. Differentiate between data modeling and data exploration.
7. Discuss how data science creates value in a big data world. Provide use cases from industries
such as healthcare, finance, or retail.
8. Explain the importance of data preparation and cleaning with examples.
9. What is an API? Give one example.
UNIT 2
1. Explain the differences between string, list, tuple, and dictionary with examples.
2. Discuss the role of major Python libraries (Matplotlib, NumPy, Scikit-learn, NLTK) in data
science projects.
3. What are some common techniques for data cleaning and munging?
4. Describe how to read and parse data from files in Python.
5. Describe dimensionality reduction. Why is it important in data science?
6. Write a Python code snippet to:
a) Plot a line chart
b) Plot a scatter plot
7. Describe the complete workflow of:
a) Cleaning and manipulating raw data
b) Rescaling and dimensionality reduction techniques using Python
8. What are the uses of the following toolkits in Python?
a) Matplotlib
b) NumPy
c) Scikit-learn
UNIT 3
1. What is a confidence interval? How is it interpreted in practice?
2. Explain Bayes’s Theorem with a practical example (e.g., disease testing).
3. Describe the role of linear algebra (vectors and matrices) in data science. Give use cases where
matrix operations are crucial.
4. Explain the difference between:
a) Independence and dependence
b) Discrete and continuous distributions
5. What is Simpson’s Paradox? Explain with an example.
6. Explain vectors and matrices with real-world examples.
7. How does the Central Limit Theorem help in statistical analysis?
8. What is p-hacking, and why is it problematic?
UNIT 4
1. What are neural networks? Explain their role in learning and generalization.
2. Provide an overview of deep learning and its advantages over traditional machine
learning models.
3. Describe the entire process of building a machine learning model, including data
preparation, training, testing, and evaluation.
4. What is the role of SVM in classification tasks?
5. Explain the differences between:
a) Supervised learning and unsupervised learning
b) Reinforcement learning and supervised learning
6. How do decision trees and random forests differ?
7. What are the main causes and consequences of overfitting?
8. Describe the working principle of K-Nearest Neighbors (KNN).
9. Describe a real-world application scenario using:
a) Logistic regression for classification
b) Random forest for decision-making
c) Neural networks for image recognition
UNIT 5
1. Describe the complete process of developing a weather forecasting system using data
science. Include data collection, EDA, visualization, modeling, and accuracy analysis.
2. What is object recognition? Describe the process of implementing it using deep learning
techniques.
3. Compare the accuracy and performance of different models (e.g., Linear Regression,
Random Forest, Neural Networks) in predicting outcomes on a chosen dataset.
4. Explain how sentiment analysis works using data from social media.
5. What are the common tools and libraries used for EDA and prediction in
Python?
6. Describe how data visualization helps in understanding patterns and trends.
7. Write short notes on:
a) Real-time sentiment analysis
b) Object recognition
8. Present a case study of any one data science application (e.g., predicting house
prices, customer churn, or product recommendation). Explain each phase in detail
from EDA to evaluation.