Q 1. What is data science?
Ans: Data science is the field that combines statistical
analysis, machine learning, and programming to
extract insights from data.
Q 2. What are the key steps in the data science
process?
Ans: The key steps are data collection, data cleaning, data
exploration, modeling, evaluation, and deployment.
Q 3. What is supervised learning?
Ans: Supervised learning is a machine learning technique
where the algorithm learns from labeled data to
make predictions or classifications.
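As a minimal sketch of learning from labeled data, the example below uses a 1-nearest-neighbour classifier. The data set (height and weight mapped to a size label) and the function name `predict_1nn` are made up for illustration.

```python
import math

# Hypothetical labeled training data: (height_cm, weight_kg) -> label
train_X = [(150, 50), (160, 55), (180, 85), (190, 95)]
train_y = ["small", "small", "large", "large"]

def predict_1nn(point):
    """Return the label of the closest training example."""
    distances = [math.dist(point, x) for x in train_X]
    nearest = distances.index(min(distances))
    return train_y[nearest]

print(predict_1nn((155, 52)))  # small
print(predict_1nn((186, 91)))  # large
```

The labels in `train_y` are what make this supervised: the model only ever returns labels it has seen attached to training examples.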
Q 4. What is unsupervised learning?
Ans: Unsupervised learning is a machine learning
technique where the algorithm explores patterns in
unlabeled data without predefined outputs.
Q 5. What is the difference between classification
and regression?
Ans: Classification predicts categorical labels, while
regression predicts continuous values.
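The contrast can be sketched on one toy data set: a least-squares line gives a continuous prediction (regression), and thresholding that prediction gives a categorical one (classification). The data, the 55-point pass mark, and the function names are assumptions for illustration.

```python
# Hypothetical data: hours studied -> exam score
hours = [1.0, 2.0, 3.0, 4.0]
scores = [52.0, 54.0, 56.0, 58.0]

# Ordinary least squares for a single input feature
n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
         / sum((x - mean_x) ** 2 for x in hours))
intercept = mean_y - slope * mean_x

def predict_score(h):
    """Regression: the output is a continuous value."""
    return intercept + slope * h

def predict_pass(h, threshold=55.0):
    """Classification: the output is one of two categories."""
    return "pass" if predict_score(h) >= threshold else "fail"

print(predict_score(5))  # 60.0
print(predict_pass(2))   # fail
print(predict_pass(3))   # pass
```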
Q 6. What is overfitting?
Ans: Overfitting occurs when a model performs well on
the training data but fails to generalize to new data
due to capturing noise or irrelevant patterns.
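A small sketch of this effect, using invented data in which one training label is deliberately wrong: a model that memorizes every training point scores perfectly on the training set but worse on new data than a simple threshold rule.

```python
# Hypothetical data: the true rule is "label 1 when x >= 5"; (3, 1) is a noisy label
train = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
test = [(2.6, 0), (3.4, 0), (5.5, 1), (8.5, 1)]

def memorizer(x):
    """1-nearest neighbour: reproduces every training label, including the noise."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def simple_rule(x):
    """A single threshold: too simple to fit the noisy point."""
    return 1 if x >= 5 else 0

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

print(accuracy(memorizer, train), accuracy(memorizer, test))      # 1.0 0.5
print(accuracy(simple_rule, train), accuracy(simple_rule, test))  # 0.875 1.0
```

The gap between training and test accuracy for `memorizer` is the typical signature of overfitting.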
Q 7. What is feature selection?
Ans: Feature selection is the process of selecting relevant
features to improve model performance and reduce
complexity.
The three steps of feature selection can be
summarized as follows:
1. Data Preprocessing: Clean and prepare the data
for feature selection.
2. Feature Scoring: Compute scores for each feature
to reflect its importance to the target variable.
3. Selection: Select a subset of the most important
features based on their scores, and use them for
training the predictive model.
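The scoring and selection steps can be sketched as follows. This example assumes absolute Pearson correlation as the scoring function, which is only one of many possible choices; the feature names and values are invented, and the data is assumed to be already cleaned.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical, already preprocessed features and target
features = {
    "rooms":   [1, 2, 3, 4, 5],
    "age":     [5, 3, 4, 1, 2],
    "door_id": [2, 1, 3, 1, 2],
}
target = [10, 20, 30, 40, 50]

# Feature scoring: strength of linear association with the target
scores = {name: abs(pearson(col, target)) for name, col in features.items()}

# Selection: keep the two highest-scoring features
selected = sorted(scores, key=scores.get, reverse=True)[:2]
print(selected)  # ['rooms', 'age']
```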
Q 8. What is cross-validation?
Ans: Cross-validation is a technique to assess model
performance by splitting the data into several subsets
(folds), training on all but one fold, evaluating on the
held-out fold, and averaging the results over all folds.
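A minimal sketch of how k-fold cross-validation partitions the data. The helper name `k_fold_indices` is hypothetical, and the folds are contiguous for readability; in practice the data is usually shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(6, 3))
for train_idx, test_idx in folds:
    print(train_idx, test_idx)
# [2, 3, 4, 5] [0, 1]
# [0, 1, 4, 5] [2, 3]
# [0, 1, 2, 3] [4, 5]
```

A model would be trained and scored once per fold, and the scores averaged.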
Q 9. What is regularization?
Ans: Regularization is a technique to prevent overfitting
by adding a penalty term to the model's objective
function.
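As an illustration, the sketch below fits a single weight w with an L2 (ridge) penalty, one common form of regularization. Minimizing sum((y - w*x)^2) + penalty * w^2 has the closed-form solution used here; the data and the function name `fit_slope` are assumptions.

```python
def fit_slope(xs, ys, penalty=0.0):
    """Closed-form minimizer of sum((y - w*x)**2) + penalty * w**2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_plain = fit_slope(xs, ys)                # no penalty
w_ridge = fit_slope(xs, ys, penalty=14.0)  # penalty shrinks the weight toward zero
print(w_plain, w_ridge)  # 2.0 1.0
```

A larger penalty pulls the weight further toward zero, trading some fit on the training data for a simpler model.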
Q 10. What is dimensionality reduction?
Ans: Dimensionality reduction reduces the number of
features in a dataset while preserving important
information and structure.
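One widely used method is principal component analysis (PCA), which the answer above does not name; it is used here only as an example. The sketch reduces hypothetical 2-D points to 1-D by projecting them onto the direction of greatest variance, using the closed-form angle of the principal axis of a 2x2 covariance matrix.

```python
import math

# Hypothetical 2-D data lying along the line y = x
points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]

n = len(points)
mean_x = sum(p[0] for p in points) / n
mean_y = sum(p[1] for p in points) / n
centered = [(x - mean_x, y - mean_y) for x, y in points]

# Entries of the 2x2 covariance matrix
cxx = sum(x * x for x, _ in centered) / n
cyy = sum(y * y for _, y in centered) / n
cxy = sum(x * y for x, y in centered) / n

# Angle of the first principal axis (valid for the 2-D case only)
theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
direction = (math.cos(theta), math.sin(theta))

# One coordinate per point instead of two
projected = [x * direction[0] + y * direction[1] for x, y in centered]
print(round(math.degrees(theta)))  # 45
```

Because the points lie exactly on a line, the single projected coordinate keeps all of their spread.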
Q 11. What is clustering?
Ans: Clustering is the process of grouping similar data
points together based on their characteristics or
patterns.
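A minimal sketch of one clustering algorithm, k-means, on one-dimensional data. The answer above does not name an algorithm, so k-means is an illustrative choice; the values and the starting centers are made up.

```python
def k_means_1d(values, centers, iterations=10):
    """Alternate between assigning points to the nearest center and re-averaging."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = k_means_1d(values, centers=[1.0, 2.0])
print(centers)   # [2.0, 11.0]
print(clusters)  # [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
```

No labels are supplied: the two groups emerge only from the distances between the points.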
Q 12. What are precision and recall?
Ans: Precision is the fraction of positive predictions
that are actually positive, TP / (TP + FP), while recall
is the fraction of actual positive instances that the
model identifies, TP / (TP + FN).
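Both metrics can be computed from counts of true positives (TP), false positives (FP), and false negatives (FN). The labels below are invented for illustration.

```python
# Hypothetical ground truth and model predictions (1 = positive, 0 = negative)
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # predicted positive, actually positive
fp = sum(t == 0 and p == 1 for t, p in pairs)  # predicted positive, actually negative
fn = sum(t == 1 and p == 0 for t, p in pairs)  # predicted negative, actually positive

precision = tp / (tp + fp)  # 2 of 3 positive predictions are correct
recall = tp / (tp + fn)     # 2 of 4 actual positives are found
print(tp, fp, fn)           # 2 1 2
```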
Q 13. What is feature engineering?
Ans: Feature engineering involves transforming raw data
into meaningful features to improve model
performance.
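For example, a raw timestamp string is rarely useful to a model as-is, but the hour, the day of the week, and a weekend flag often are. The function name, feature names, and timestamp format below are assumptions for illustration.

```python
from datetime import datetime

def engineer_features(timestamp):
    """Turn a raw 'YYYY-MM-DD HH:MM' string into model-ready features."""
    dt = datetime.strptime(timestamp, "%Y-%m-%d %H:%M")
    return {
        "hour": dt.hour,
        "day_of_week": dt.weekday(),      # Monday = 0 ... Sunday = 6
        "is_weekend": dt.weekday() >= 5,
    }

print(engineer_features("2024-01-06 14:30"))
# {'hour': 14, 'day_of_week': 5, 'is_weekend': True}
```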
Q 14. What is ensemble learning?
Ans: Ensemble learning is a machine learning technique
that combines the predictions of multiple models to
make predictions that are often more accurate than
those of any individual model on its own.
This is done by training a number of different models
on the same data set (or on resampled versions of it),
and then combining their predictions, for example by
voting or averaging, to get a final prediction.
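A minimal sketch of combining predictions by majority vote. The three "models" are hand-written rules standing in for trained classifiers, purely for illustration.

```python
from collections import Counter

# Three hypothetical weak spam detectors
def model_a(text):
    return "spam" if "free" in text else "ham"

def model_b(text):
    return "spam" if "!" in text else "ham"

def model_c(text):
    return "spam" if len(text) > 40 else "ham"

def ensemble_predict(text):
    """Return the label predicted by the majority of the models."""
    votes = [model(text) for model in (model_a, model_b, model_c)]
    return Counter(votes).most_common(1)[0][0]

print(ensemble_predict("free prize!"))       # spam (2 votes to 1)
print(ensemble_predict("free lunch today"))  # ham  (2 votes to 1)
```

Voting suits classification; for regression the predictions would typically be averaged instead.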
Q 15. What is A/B testing?
Ans: A/B testing compares two or more versions of a
product or feature by randomly assigning users to each
version and measuring a chosen metric, to determine the
best-performing option.
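One common way to decide whether the observed difference is more than chance is a two-proportion z-test, sketched below. The test choice and the conversion counts are assumptions; the answer above does not prescribe a specific test.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided tail probability of the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: 100/1000 conversions for A, 130/1000 for B
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

With these made-up counts the p-value falls below the conventional 0.05 threshold, so version B's higher rate would usually be called statistically significant.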
Q 16. What is the central limit theorem?
Ans: The central limit theorem (CLT) is a theorem in
probability theory that states that, given certain
conditions, the arithmetic mean of a sufficiently
large number of independent, identically distributed
random variables, each with a well-defined expected
value and finite variance, will be approximately
normally distributed, regardless of the underlying
distribution.
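The theorem can be checked with a small simulation. The sketch below draws from a uniform distribution, which is not normal, and looks at the distribution of sample means; the sample size, number of repetitions, and seed are arbitrary choices.

```python
import math
import random
import statistics

random.seed(0)
sample_size = 50

# Mean of 50 uniform(0, 1) draws, repeated 2000 times
sample_means = [
    statistics.fmean(random.random() for _ in range(sample_size))
    for _ in range(2000)
]

center = statistics.fmean(sample_means)
spread = statistics.stdev(sample_means)
expected_spread = math.sqrt(1 / 12 / sample_size)  # sigma / sqrt(n) for uniform(0, 1)

# For a normal distribution about 68% of values lie within one standard deviation
within_one_sd = sum(abs(m - center) <= spread for m in sample_means) / len(sample_means)
print(round(center, 3), round(spread, 4), round(expected_spread, 4), within_one_sd)
```

The sample means cluster around 0.5 with a spread close to sigma / sqrt(n), and roughly 68% of them fall within one standard deviation, as a normal distribution would predict.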
Q 17. What is the difference between correlation
and causation?
Ans: Correlation indicates a statistical association
between variables, while causation means that a change
in one variable directly produces a change in another.
Correlation alone does not establish causation.
Q 18. What is outlier detection?
Ans: Outlier detection identifies data points that
deviate significantly from the expected patterns.
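A simple sketch using z-scores, one of several possible detection rules: a point is flagged when it lies more than a chosen number of standard deviations from the mean. The readings and the threshold of 2 are assumptions for illustration.

```python
import statistics

def z_score_outliers(values, threshold=2.0):
    """Return the values whose distance from the mean exceeds threshold * std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) / std > threshold]

readings = [10, 12, 11, 13, 12, 11, 95]  # hypothetical sensor readings
print(z_score_outliers(readings))  # [95]
```

Because an extreme value also inflates the mean and standard deviation, rules based on the median or the interquartile range are often preferred on small samples.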
Q 19. What is the difference between a parametric
and non-parametric model?
Ans:
Feature             Parametric models                       Non-parametric models
------------------  --------------------------------------  ------------------------------------------------
Assumptions         Assume a specific data distribution     Make few assumptions about the data distribution
Accuracy            More precise when the assumption holds  More flexible when the assumption fails
Robustness          Sensitive to distribution changes       More robust to distribution changes
Complexity          Simpler                                 More complex
Data size           Can work with less data                 Need more data
Computational time  Faster to train                         Slower to train
Q 20. What is data mining?
Ans: Data mining involves discovering patterns and
insights from large datasets.
Q 21. What is the bias-variance trade-off?
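Ans: The bias-variance trade-off refers to the balance
between bias (error from overly simple assumptions,
which causes underfitting) and variance (error from
sensitivity to the particular training sample, which
causes overfitting). Increasing model complexity
typically lowers bias but raises variance, so the goal
is the level of complexity that generalizes best to
new data.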
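The trade-off is commonly written as a decomposition of the expected squared error at a point x. The notation is an assumption, not taken from the text above: the data follow y = f(x) + ε with noise variance σ², and \hat{f} is the model fitted on a random training set.

```latex
\mathbb{E}\left[\left(y - \hat{f}(x)\right)^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Simple models tend to have a large first term, very flexible models a large second term, and no model can remove the third.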
Q 22. What is time series analysis?
Ans: Time series analysis is used to analyze and forecast
data points collected over time.
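As a minimal forecasting sketch, the next value can be estimated as the average of the most recent observations (a moving-average forecast). This is a deliberately naive baseline; the sales figures and the window of 3 are made up.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    recent = series[-window:]
    return sum(recent) / len(recent)

monthly_sales = [10, 12, 14, 16, 18]  # hypothetical observations, oldest first
print(moving_average_forecast(monthly_sales))  # 16.0
```

The forecast lags behind the upward trend in this series, which is why trend-aware and seasonal models are often used instead.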
Q 23. What is natural language processing (NLP)?
Ans: NLP focuses on the interaction between computers
and human language, enabling machines to
understand, interpret, and generate human language.
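A first step in many NLP pipelines is turning text into numbers. The sketch below builds a bag-of-words representation with a very simple tokenizer; the regular expression and the function name are assumptions, and real tokenizers handle far more cases.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

counts = bag_of_words("The cat sat. The cat slept!")
print(counts["the"], counts["cat"], counts["sat"], counts["slept"])  # 2 2 1 1
```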
Q 24. What is deep learning?
Ans: Deep learning is a subset of machine learning that
uses artificial neural networks with multiple layers to
learn complex representations of data.
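To show what stacking layers means, the sketch below runs a forward pass through a tiny network with one hidden layer that computes XOR, a function no single linear layer can represent. The weights are set by hand for illustration; in real deep learning they are learned from data, and networks have many more layers and units.

```python
def relu(v):
    return max(0.0, v)

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W @ inputs + b)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def xor_network(x1, x2):
    # Hidden layer: h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1)
    hidden = dense([x1, x2], weights=[[1.0, 1.0], [1.0, 1.0]],
                   biases=[0.0, -1.0], activation=relu)
    # Output layer: h1 - 2 * h2 (no activation)
    output = dense(hidden, weights=[[1.0, -2.0]], biases=[0.0],
                   activation=lambda v: v)
    return output[0]

outputs = [xor_network(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(outputs)  # [0.0, 1.0, 1.0, 0.0]
```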
Q 25. What is reinforcement learning?
Ans: Reinforcement learning is a type of machine learning
where an agent learns to make decisions based on
feedback from its environment.
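A minimal sketch using tabular Q-learning, one specific reinforcement learning algorithm chosen here for illustration. The environment is a made-up corridor of five states in which the agent starts at state 0 and is rewarded only on reaching state 4; the learning rate, discount factor, and episode count are arbitrary.

```python
import random

random.seed(0)
N_STATES, GOAL = 5, 4   # states 0..4, reward only on reaching state 4
ACTIONS = (-1, +1)      # index 0 = move left, index 1 = move right
alpha, gamma = 0.5, 0.9 # learning rate and discount factor

# q[state][action] estimates the long-term reward of taking action in state
q = [[0.0, 0.0] for _ in range(N_STATES)]

for _ in range(200):                  # episodes
    state = 0
    while state != GOAL:
        a = random.randrange(2)       # explore with purely random actions
        next_state = min(max(state + ACTIONS[a], 0), GOAL)
        reward = 1.0 if next_state == GOAL else 0.0
        # Q-learning update; q[GOAL] is never updated, so it stays at zero
        target = reward + gamma * max(q[next_state])
        q[state][a] += alpha * (target - q[state][a])
        state = next_state

greedy_policy = ["left" if q[s][0] > q[s][1] else "right" for s in range(GOAL)]
print(greedy_policy)  # ['right', 'right', 'right', 'right']
```

The agent is never told that moving right is correct; it infers this from the reward signal alone.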
+91-7260058093 www.algotutor.io