Data Science
Unit 1
Introduction to Data Science and Data Preprocessing:
1. Explain the concept of Data Science and its significance in modern-day industries.
2. Explain the term Data Science and its role in extracting knowledge from data.
3. Discuss three key applications of Data Science in different domains.
4. Compare and contrast Data Science with Business Intelligence (BI) in terms of
goals, methodologies, and outcomes.
5. Differentiate between Artificial Intelligence (AI) and Machine Learning (ML) with
respect to their scope and applications.
6. Analyze the relationship between Data Warehousing/Data Mining (DW-DM) and Data
Science, highlighting their similarities and differences.
7. Discuss the importance of Data Preprocessing in the Data Science pipeline and its
impact on the quality of analysis and modeling outcomes.
Unit 2
Exploratory Data Analysis (EDA):
1. Explain the importance of exploratory data analysis (EDA) in the data science
process.
2. Describe three data visualization techniques commonly used in EDA and their
applications.
3. Discuss the role of histograms, scatter plots, and box plots in understanding the
distribution and relationships within a dataset.
4. Define descriptive statistics and provide examples of commonly used measures
such as mean, median, and standard deviation. OR Define descriptive statistics and
discuss their role in summarizing and understanding datasets. Compare and
contrast measures such as mean, median, mode, and standard deviation.
5. Discuss the significance of histograms, scatter plots, and box plots in visualizing
different types of data distributions.
6. Explain the concept of hypothesis testing and provide examples of situations where
t-tests, chi-square tests, and ANOVA are applicable.
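As a starting point for question 4 in this unit, the following is a minimal sketch of the common descriptive measures, using only Python's standard `statistics` module. The helper name `describe` and the sample values are assumptions chosen for illustration; they do not come from the course material. The sample includes one extreme value to show how the mean and the median can diverge.

```python
import statistics


def describe(values):
    """Return common descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
    }


# Hypothetical sample with one extreme value (40).
sample = [2, 4, 4, 5, 7, 9, 40]
summary = describe(sample)

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

In this sample the single extreme value pulls the mean well above the median, which is one reason the median is often preferred for skewed data.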
Unit 3
Model Evaluation Metrics:
1. Define accuracy, precision, recall, and F1-score as metrics for evaluating
classification models. Discuss their limitations, especially in the presence of
imbalanced datasets. Also discuss scenarios where each metric might be more
appropriate.
2. Explain the concept of the Area Under the Curve (AUC) in ROC curve analysis. How
does AUC help in evaluating the performance of a binary classification model?
3. Discuss the challenges of evaluating models for imbalanced datasets. How do
imbalanced classes affect traditional evaluation metrics?
4. Describe techniques that can be used to address the challenges of evaluating
models on imbalanced datasets and ensure reliable model evaluation.
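The questions in this unit can be explored with a short sketch, given here under stated assumptions. The function names `classification_metrics` and `auc_pairwise`, the labels, and the scores are all hypothetical. Precision, recall, and F1 are set to 0.0 when their denominator is zero, which is one common convention and not the only one. AUC is computed as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Assumption: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def auc_pairwise(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced dataset: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95

# A model that always predicts the majority class.
always_negative = classification_metrics(y_true, [0] * 100)

# A model that finds 4 of 5 positives but raises 10 false alarms.
y_pred = [1, 1, 1, 1, 0] + [1] * 10 + [0] * 85
with_recall = classification_metrics(y_true, y_pred)

# Small hypothetical scoring example for AUC.
auc = auc_pairwise([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

print(always_negative)
print(with_recall)
print(auc)
```

The always-negative model reaches high accuracy while its recall is zero, which illustrates why accuracy alone can mislead on imbalanced classes. The pairwise AUC is quadratic in the number of examples, so it suits small illustrations and not large datasets.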