ADS Viva

The document provides foundational knowledge for Data Science, covering key concepts such as Data Science itself, Statistical Learning, Data Visualization, and the application of Linear Algebra. It explains essential statistical principles, including Probability, Hypothesis Testing, and the importance of Optimization, along with various data analysis techniques and visualization methods. Additionally, it discusses matrix operations, eigenvalues, and dimensionality reduction techniques like PCA, emphasizing their relevance in real-world applications.


✅ Module 1: Foundations for Data Science

1. What is Data Science?

Answer:​
Data Science is the process of collecting, analyzing, and interpreting large amounts of data to make better
decisions. It combines tools from statistics, computer science, and domain knowledge.​
Example: Netflix uses data science to recommend shows based on your watch history.

2. What is Statistical Learning?

Answer:​
Statistical learning is about using statistics to learn patterns from data. It includes both supervised
learning (like prediction) and unsupervised learning (like clustering).​
Example: Predicting house prices based on location and size.

3. What is the difference between Modeling and Prediction?

Answer:

●​ Modeling builds a mathematical formula based on data.​

●​ Prediction uses that model to guess future outcomes.​


Example: A model might learn that house prices go up with more bedrooms. It then predicts the
price of a new house.​

4. Why is Data Visualization important?

Answer:​
Data visualization turns numbers into graphs and charts, making it easier to see trends, patterns, and
outliers.​
Example: A line chart showing monthly sales helps a manager quickly spot which months performed
better.

5. How does Linear Algebra apply to Data Science?


Answer:​
Linear algebra is used in machine learning algorithms, image processing, and even data storage.​
Example: A dataset is often stored as a matrix. Matrix operations are used in models like linear
regression.

6. What is the role of Statistics in Data Science?

Answer:​
Statistics helps us understand and describe data, test hypotheses, and build models.​
Example: Mean, median, and standard deviation summarize a dataset.

7. Why is Optimization important in Data Science?

Answer:​
Optimization finds the best solution under constraints. In ML, it's used to minimize error.​
Example: In linear regression, optimization finds the best-fit line that minimizes the distance between
predicted and actual values.

8. What is Structured Thinking in Data Science?

Answer:​
Structured thinking breaks down problems into smaller parts:

1.​ Understand the problem​

2.​ Ask the right questions​

3.​ Identify needed data​

4.​ Choose the right methods​

5.​ Interpret the results​


Example: For reducing delivery time, first understand causes, collect data, then analyze and fix.​

9. What are the Axioms of Probability?

Answer:
1.​ Non-negativity: Probabilities are ≥ 0​

2.​ Normalization: Total probability = 1​

3.​ Additivity: If two events can't happen at the same time, their combined probability is the sum​
Example: The chance of rolling a 1 or a 2 on a die = P(1) + P(2) = 1/6 + 1/6 = 1/3​

10. What is a Random Variable?

Answer:​
A random variable maps outcomes of a random event to numbers.​
Example: When flipping a coin:

●​ Heads = 1​

●​ Tails = 0​
The outcome is random, but we can assign it a number.​

11. What is the difference between Discrete and Continuous Random Variables?

Answer:

●​ Discrete: Finite/countable outcomes (e.g., number of kids in a family)​

●​ Continuous: Infinite values within a range (e.g., temperature)​

12. What is a Probability Distribution?

Answer:​
It describes the likelihood of outcomes for a random variable.​
Example: For a fair die, each number (1–6) has a probability of 1/6.

13. What is the Expectation (Mean) of a Random Variable?


Answer:​
It's the average outcome if the experiment is repeated many times.​
Example: Expected value of a die roll = (1+2+3+4+5+6)/6 = 3.5

14. What is Variance and Standard Deviation?

Answer:​
They measure spread in data.

●​ Variance = average of squared differences from the mean​

●​ Standard deviation = square root of variance​


Example: Small SD means data is tightly grouped around the mean.​

15. What is Bayes' Theorem?

Answer:​
It updates the probability of an event based on new evidence.​
Example: If 1% of emails are spam, and a keyword appears more in spam, Bayes’ Theorem helps
calculate the chance an email with that keyword is actually spam.
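
A minimal Python sketch of that calculation; the keyword rates below are made-up illustrative numbers, not figures from the text:

# Bayes' Theorem: P(spam | keyword) = P(keyword | spam) * P(spam) / P(keyword)
p_spam = 0.01            # 1% of emails are spam (from the example)
p_kw_given_spam = 0.90   # assumed: keyword appears in 90% of spam
p_kw_given_ham = 0.05    # assumed: keyword appears in 5% of non-spam
p_kw = p_kw_given_spam * p_spam + p_kw_given_ham * (1 - p_spam)  # law of total probability
print(p_kw_given_spam * p_spam / p_kw)  # ~0.154: the keyword raises the spam probability from 1% to ~15%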

16. What is Conditional Probability?

Answer:​
Probability of an event given another event has happened.​
Example: P(Rain | Clouds) = Probability of rain given it's cloudy.

17. What is the Law of Large Numbers?

Answer:​
As the number of trials increases, the sample mean gets closer to the true mean.​
Example: Flip a fair coin 1,000 times. Around 50% should be heads.

18. What is Central Limit Theorem (CLT)?


Answer:​
CLT says that if you take the averages of many samples, the distribution of those averages approaches a normal distribution, even if the original data doesn't.
Example: Taking average test scores from many small student groups.

19. What is a Hypothesis Test?

Answer:​
A way to check if a result is statistically significant.​
Example: You want to check if a new drug is better than the old one. You test the difference using
hypothesis testing.

20. What is a Correlation?

Answer:​
It shows how two variables are related.

●​ Positive: Both increase​

●​ Negative: One increases, the other decreases​


Example: Higher temperature correlates with more ice cream sales.​

21. What is Overfitting in modeling?

Answer:​
Overfitting happens when a model learns the training data too well, including noise and outliers. This
makes it perform poorly on new/unseen data.​
Example: A student memorizes past exam questions but can't answer new ones in the actual test.​
Fixes: Use simpler models, cross-validation, or regularization.

22. What is Underfitting?

Answer:​
Underfitting happens when a model is too simple to capture the pattern in data. It performs badly on both
training and test data.​
Example: Predicting house prices using just the number of bathrooms when many other factors (like
location) matter.
23. What is Cross-validation?

Answer:​
Cross-validation is a method to test how well a model works by splitting data into training and test sets
multiple times.​
Example: In 5-fold CV, data is split into 5 parts, and the model trains on 4 while testing on 1. This
repeats 5 times for better accuracy estimate.
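
A minimal sketch of 5-fold cross-validation using scikit-learn; the synthetic dataset and logistic-regression model are illustrative choices:

# 5-fold CV: train on 4 folds, test on the 5th, rotate through all folds
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
scores = cross_val_score(LogisticRegression(), X, y, cv=5)
print(scores, scores.mean())  # five accuracy scores and their average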

24. What is the difference between Supervised and Unsupervised Learning?

Answer:

●​ Supervised: Data has labels. Goal is prediction.​


Example: Predicting if an email is spam.​

●​ Unsupervised: No labels. Goal is to find patterns.​


Example: Grouping customers by behavior (clustering).​

25. What are Descriptive and Inferential Statistics?

Answer:

●​ Descriptive: Summarizes data (mean, median, charts).​

●​ Inferential: Makes predictions about a population from a sample using probability.​


Example: Describing average age vs. predicting average age of all citizens from a small group.​

26. What is a Confounding Variable?

Answer:​
A confounding variable influences both independent and dependent variables, creating a false
relationship.​
Example: Ice cream sales and drowning deaths both increase in summer. The confounding variable is
temperature.
27. What is a Histogram and what does it show?

Answer:​
A histogram is a bar chart that shows the distribution of numerical data. It groups data into bins and
shows how many values fall into each bin.​
Example: A histogram of student test scores shows how many scored between 60–70, 70–80, etc.

28. What is a Boxplot and what does it show?

Answer:​
A box plot shows the spread and skewness of data using five-number summary:

●​ Minimum, Q1, Median, Q3, and Maximum​


It also shows outliers.​
Example: Comparing salaries across different departments.​

29. What is the role of Matrix Operations in Data Science?

Answer:​
Matrices are used to represent and process data efficiently. Operations like multiplication, inversion, and
transpose are used in:

●​ Linear regression​

●​ Neural networks​

●​ PCA (Principal Component Analysis)​

30. What is Eigenvalue and Eigenvector and why are they important?

Answer:

●​ Eigenvector: A direction that doesn’t change under a linear transformation.​

●​ Eigenvalue: How much the vector is stretched/shrunk.​


Use in Data Science: PCA uses them to reduce dimensionality while keeping key patterns.​
31. What is the difference between a Parameter and a Statistic?

Answer:

●​ Parameter: A value that describes a population (unknown).​

●​ Statistic: A value from a sample (used to estimate the parameter).​


Example: Average height of all adults (parameter) vs. average height from a survey of 1,000
adults (statistic).​

32. What is a Normal Distribution and why is it important?

Answer:​
It's a bell-shaped curve where data is symmetrically distributed around the mean. Many natural processes
follow it.​
Example: Human heights, test scores.​
It's used in hypothesis testing and confidence intervals.

33. What is a Null Hypothesis?

Answer:​
The null hypothesis (H₀) is a statement that there is no effect or difference. We test it to see if we can
reject it in favor of the alternative hypothesis.​
Example: H₀: New drug has no effect.​
If results are significant, we reject H₀.

34. What is P-value?

Answer:​
The p-value tells us how likely we would get our data if the null hypothesis were true.

●​ Small p-value (< 0.05) → evidence against H₀​


Example: A p-value of 0.01 means that if H₀ were true, there would be only a 1% chance of seeing a result at least this extreme.

35. What is a Confidence Interval?


Answer:​
A confidence interval gives a range in which we expect the true value to fall, with a certain level of
confidence (usually 95%).​
Example: A confidence interval of [48, 52] for average height means we are 95% confident the true
average is between 48 and 52 inches.

✅ Module 2: Data Analysis & Visualization


1. What is a Matrix in data science?

Answer:​
A matrix is a rectangular arrangement of numbers in rows and columns. It's used to represent datasets,
transformations, and operations in machine learning.​
Example: A table of student scores (rows = students, columns = subjects) is a matrix.

2. What is the Determinant of a matrix and what does it tell us?

Answer:​
The determinant is a single number that gives information about a square matrix.

●​ If the determinant is 0, the matrix is not invertible (i.e., you can't solve equations with it).​

●​ It also tells us if a transformation changes volume or orientation.​


Example: For a 2×2 matrix with rows [a, b] and [c, d], det = ad − bc
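
A quick NumPy check of this formula, using an illustrative 2×2 matrix:

import numpy as np

A = np.array([[2.0, 3.0],
              [1.0, 4.0]])
print(np.linalg.det(A))  # ad - bc = 2*4 - 3*1 = 5.0, so A is invertible
print(np.linalg.inv(A))  # the inverse exists because det != 0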

3. What is the Trace of a matrix?

Answer:​
The trace is the sum of the diagonal elements of a square matrix.​
It’s used in optimization and machine learning cost functions.

4. What is Rank of a matrix?


Answer:​
The rank is the number of independent rows or columns in a matrix.

●​ High rank = more useful information​

●​ Low rank = redundant or dependent data​


Example: A rank-1 matrix has all rows as multiples of one row.​

5. What is Nullity of a matrix?

Answer:​
Nullity is the dimension of the null space of a matrix: the number of independent directions x for which Ax = 0.
Formula (Rank-Nullity Theorem):
Nullity = Number of columns - Rank
It counts how many independent directions are sent to zero when the matrix multiplies a vector.

6. What are Eigenvalues and Eigenvectors?

Answer:

●​ An eigenvector doesn’t change direction when a matrix transforms it.​

●​ An eigenvalue tells how much the eigenvector is stretched or shrunk.​


Used in: PCA, image compression, and system stability.​

7. Give a real-world use case of Eigenvalues and Eigenvectors.

Answer:​
In facial recognition, images are converted to matrices. PCA (which uses eigenvectors) reduces the size
of data while keeping important features, helping recognize faces efficiently.

8. What is Matrix Factorization?

Answer:​
Matrix factorization breaks a matrix into simpler matrices that, when multiplied together, give the
original matrix.​
Used in:

●​ Recommender systems (like Netflix)​

●​ Dimensionality reduction​
Example: A = LU, where L is lower triangular, U is upper triangular.​

9. What is LU Decomposition?

Answer:​
LU stands for Lower-Upper. A matrix is split into:

●​ L (lower triangular)​

●​ U (upper triangular)​
This helps solve equations and find determinants efficiently.​

10. What is QR Decomposition?

Answer:​
QR splits a matrix into:

●​ Q: An orthogonal matrix​

●​ R: An upper triangular matrix​


It’s useful in regression and solving least squares problems.​

11. What is SVD (Singular Value Decomposition)?

Answer:​
SVD breaks a matrix A into three parts:​
A = UΣVᵗ​
Where:

●​ U and V are orthogonal​


●​ Σ is a diagonal matrix of singular values​
Used in: Image compression, noise reduction, recommendation systems​
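
A minimal NumPy sketch of SVD on an illustrative 2×2 matrix, including the rank-1 reconstruction idea behind compression:

import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])
U, s, Vt = np.linalg.svd(A)                 # s holds the singular values, largest first
print(np.allclose(A, U @ np.diag(s) @ Vt))  # True: U Σ Vᵗ multiplies back to A

# Keep only the largest singular value -- the idea behind compression:
A1 = s[0] * np.outer(U[:, 0], Vt[0, :])
print(A1)  # best rank-1 approximation of A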

12. How is SVD used in data science?

Answer:​
In recommendation engines like Netflix, SVD helps predict missing ratings by simplifying the
user-item matrix while keeping key patterns.

13. What is Data Visualization and why is it important?

Answer:​
Data visualization is turning data into charts, graphs, and visuals to make it easier to understand.​
Example: A pie chart of expenses helps you instantly see what you spend the most on.

14. What are the common types of data visualization?

Answer:

●​ Bar chart: Compare categories​

●​ Line graph: Show trends over time​

●​ Histogram: Show data distribution​

●​ Scatter plot: Show relationships​

●​ Boxplot: Show spread and outliers​

15. What is a Scatter Plot and when is it used?

Answer:​
It plots points on an x-y graph to show relationships between two variables.​
Example: Plotting study hours vs. exam scores to see correlation.
16. What is a Heatmap?

Answer:​
A heatmap uses colors to represent values in a matrix.​
Example: A correlation matrix heatmap shows which variables are strongly related.

17. What is Dimensionality Reduction and why is it needed?

Answer:​
It reduces the number of features in data while keeping important information.​
Why:

●​ Faster computation​

●​ Easier to visualize​

●​ Removes noise​
Example: Reducing 1000-pixel features in image recognition to top 50.​

18. What is PCA (Principal Component Analysis)?

Answer:​
PCA is a method that uses eigenvalues and eigenvectors to reduce data dimensions. It keeps
components with the most variation.​
Example: Compressing a 100-feature dataset to top 2 features for visualization.
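
A minimal scikit-learn sketch; the random data stands in for a real 10-feature dataset:

import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(100, 10))  # 100 samples, 10 features
pca = PCA(n_components=2)                            # keep the top 2 components
X2 = pca.fit_transform(X)
print(X2.shape)                       # (100, 2)
print(pca.explained_variance_ratio_)  # share of variance captured by each component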

19. What is a Correlation Matrix and how is it visualized?

Answer:​
A correlation matrix shows how strongly variables are related. It’s often shown using a heatmap.​
Example: In sales data, a correlation matrix might show that advertising spend is strongly correlated
with revenue.

20. What is the role of matplotlib and seaborn in Python for visualization?

Answer:
●​ matplotlib: A basic plotting library (line, bar, scatter, etc.)​

●​ seaborn: Built on matplotlib, provides prettier, high-level charts like boxplots, heatmaps, and
regression plots.​

21. What is an Inner Product in linear algebra?

Answer:​
An inner product is a way to measure the similarity between two vectors.​
For two vectors A and B, the inner product is:

A · B = a₁b₁ + a₂b₂ + ... + aₙbₙ

Example: If A = [1, 2], B = [3, 4],​


A·B = (1×3) + (2×4) = 11​
If the inner product is 0, the vectors are orthogonal (completely unrelated).

22. What is the geometric interpretation of the inner product?

Answer:​
It relates to the angle between two vectors.

A · B = ∥A∥ ∥B∥ cos(θ)

●​ If angle = 0°, vectors point in same direction → inner product is large​

●​ If angle = 90°, vectors are orthogonal → inner product is 0​


Used in: Cosine similarity, projections, and machine learning models.​

23. What is Cosine Similarity and how is it related to inner product?

Answer:​
Cosine similarity measures the angle between two vectors, not their size.

Cosine Similarity = (A · B) / (∥A∥ ∥B∥)

●​ Value ranges from -1 to 1​


●​ 1 = same direction, 0 = orthogonal, -1 = opposite​
Example: Used in document similarity (e.g., spam detection or search).​
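
A minimal NumPy sketch using the vectors from question 21:

import numpy as np

A = np.array([1.0, 2.0])
B = np.array([3.0, 4.0])
cos_sim = A @ B / (np.linalg.norm(A) * np.linalg.norm(B))
print(cos_sim)  # ~0.98: the vectors point in almost the same direction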

24. What is Euclidean Distance?

Answer:​
It’s the straight-line distance between two points in space: Distance = √((x₁ − x₂)² + (y₁ − y₂)²)

Example: Distance between two cities based on coordinates.​


Used in: K-Nearest Neighbors (KNN), clustering.

25. What is Manhattan Distance?

Answer:​
Also called L1 distance, it’s the total distance moved along axes.

Distance = |x₁ − x₂| + |y₁ − y₂|

Example: Like walking city blocks—no diagonal paths allowed.​


Used in: Sparse data scenarios.

26. What is the difference between Euclidean and Manhattan Distance?

Answer:

●​ Euclidean: Straight line (shortest path)​

●​ Manhattan: Axis-aligned (like grid or streets)​


Use Euclidean when direction matters, Manhattan when movements are constrained.​
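
A quick NumPy comparison of the two distances on one illustrative pair of points:

import numpy as np

p = np.array([1.0, 2.0])
q = np.array([4.0, 6.0])
print(np.linalg.norm(p - q))  # Euclidean: sqrt(3² + 4²) = 5.0 (straight line)
print(np.abs(p - q).sum())    # Manhattan: |3| + |4| = 7.0 (city blocks)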

27. What is Minkowski Distance?


Answer:​
A general formula for both Euclidean and Manhattan distances: D(x, y) = (Σᵢ |xᵢ − yᵢ|^p)^(1/p)

●​ p = 1 → Manhattan​

●​ p = 2 → Euclidean​
Can be adjusted for different situations in ML.​

28. What is Mahalanobis Distance?

Answer:​
It measures distance while considering correlations in data.​
Useful when features have different scales or variances.​
Used in: Outlier detection, clustering.

29. What is Jaccard Similarity?

Answer:​
Used to compare two sets.

Jaccard = |Intersection| / |Union|

Example: Comparing user A and B’s movie preferences.​


If they have 3 movies in common out of 6 total, Jaccard = 3/6 = 0.5

30. When should we use Cosine Similarity vs. Euclidean Distance?

Answer:

●​ Cosine Similarity: When direction matters more than magnitude (e.g., document similarity).​

●​ Euclidean Distance: When magnitude and location both matter (e.g., physical location).​
31. How are distance measures used in KNN (K-Nearest Neighbors)?

Answer:​
KNN uses distance measures (usually Euclidean) to find the k closest data points and predicts based on
their labels.​
Example: To classify a fruit as apple or orange based on nearest neighbors in feature space.

32. What is a Distance Matrix?

Answer:​
A table that shows the distances between every pair of points in a dataset.​
Example: If you have 5 cities, the matrix shows pairwise distances between them all.​
Used in: Clustering, path optimization.

33. What is Projection in linear algebra and data science?

Answer:​
Projection means "dropping" a vector onto another vector or plane.​
Used in:

●​ PCA: project data onto principal components​

●​ Inner product: helps calculate projection length​


Example: Shadow of a vector on a line.​

34. How does dimensionality affect distance measures?

Answer:​
In high dimensions, distances become less meaningful (curse of dimensionality).

●​ All points seem equally far.​

●​ Distance-based models like KNN become less accurate.​


Fix: Use dimensionality reduction (e.g., PCA) before applying distance-based methods.​
35. Summarize the difference between similarity and distance.

Answer:

●​ Similarity: Higher = more alike (e.g., Cosine similarity, Jaccard)​

●​ Distance: Lower = more alike (e.g., Euclidean, Manhattan)​


They’re inversely related.​
Example: High cosine similarity → small angle → low "distance" between vectors.​

✅ Module 3: Exploratory Data Analysis


1. What is Exploratory Data Analysis (EDA)?

Answer:​
EDA is the process of analyzing datasets to summarize their main features using statistics and
visualizations.​
Purpose:

●​ Understand patterns​

●​ Spot anomalies​

●​ Test assumptions​
Example: Before building a model on sales data, EDA helps you understand trends and outliers.​

2. What are the elements of structured data?

Answer:​
Structured data is organized in rows and columns (like in a spreadsheet).​
Elements include:

●​ Variables (columns): age, income, gender​

●​ Observations (rows): one entry per person or case​

●​ Data types: numeric, categorical, binary​


Example: Customer database with name, age, purchase history.​
3. What are Estimates of Location?

Answer:​
They show where most values lie in a dataset.​
Common estimates:

●​ Mean (average)​

●​ Median (middle value)​

●​ Mode (most frequent value)​


Example: If the average income of a group is $40,000, that's the mean (estimate of location).​

4. What are Estimates of Variability?

Answer:​
They describe how spread out the data is.​
Common ones:

●​ Range = Max - Min​

●​ Variance = average squared distance from mean​

●​ Standard deviation = square root of variance​


Example: A low SD means most scores are close to the mean (consistent performance).​

5. What is the importance of variability in data analysis?

Answer:​
Variability helps us understand risk, reliability, and differences.​
Example: If two employees have the same average performance but one has high variability, they may
be unreliable.

6. What is an Expectation in statistics?

Answer:​
The expected value is the long-term average of a random variable.​
Formula:​
E[X] = Σ (x * P(x))​
Example: In tossing a fair die, expected value = (1+2+3+4+5+6)/6 = 3.5

7. What is a Moment in statistics?

Answer:​
Moments are values that describe shape and distribution.

●​ 1st moment: Mean​

●​ 2nd moment: Variance​

●​ 3rd moment: Skewness (asymmetry)​

●​ 4th moment: Kurtosis (peakedness)​


They help describe how data behaves beyond just average and spread.​

8. What is a Histogram and how is it useful?

Answer:​
A histogram is a bar chart showing frequency distribution of numerical data.​
Used to:

●​ Check symmetry or skewness​

●​ Spot modes and gaps​


Example: A histogram of exam scores shows if most students scored in the same range.​

9. What is Skewness?

Answer:​
Skewness tells whether the data is leaning left or right (not symmetrical).

●​ Positive skew: Long tail to the right (e.g., income)​

●​ Negative skew: Long tail to the left​


Example: Most people earn average income, but a few earn a lot → positive skew.​
10. What is Kurtosis?

Answer:​
Kurtosis shows how peaked or flat the distribution is.

●​ High kurtosis = tall, narrow peak (heavy tails)​

●​ Low kurtosis = flat top, light tails​


Used to: Detect outliers and assess risk in finance or ML.​

11. How do we explore the distribution of numerical data?

Answer:​
Use tools like:

●​ Histogram​

●​ Box plot​

●​ Density plot​
Also calculate: mean, median, mode, variance, skewness, kurtosis.​
Goal: Understand shape, center, spread, and outliers.​

12. What is a Box Plot and what does it show?

Answer:​
A box plot shows:

●​ Median (line in box)​

●​ Quartiles (box edges)​

●​ Min and Max (whiskers)​

●​ Outliers (dots)​
Example: Box plot of test scores can highlight if a few students performed much worse or better.​
13. How do we explore binary and categorical data?

Answer:​
Use counts and proportions, and visualize with:

●​ Bar charts​

●​ Pie charts​

●​ Frequency tables​
Example: Pie chart showing % of customers who said "yes" or "no" to buying a product.​

14. What is Covariance?

Answer:​
Covariance tells if two variables move together.

●​ Positive: both increase​

●​ Negative: one increases, the other decreases​


Example: More study hours and higher grades → positive covariance​

15. What is Correlation and how is it different from Covariance?

Answer:​
Correlation is a standardized form of covariance (range = -1 to 1).

●​ +1: perfect positive​

●​ 0: no relation​

●​ -1: perfect negative​


Difference: Correlation is scale-independent; covariance is not.​

16. What is a Correlation Matrix?


Answer:​
A table showing pairwise correlation values among multiple variables.​
Used to: Spot relationships or multicollinearity before modeling.​
Example: A matrix showing height, weight, age correlations.

17. What is Multivariate Analysis?

Answer:​
Analyzing two or more variables together to see relationships.​
Techniques include:

●​ Scatter plots​

●​ Pair plots​

●​ Heatmaps​
Example: Exploring relationship between income, education, and job satisfaction.​

18. What is a Pair Plot and when is it useful?

Answer:​
A grid of scatter plots comparing all variable pairs in a dataset.​
Use it to:

●​ Spot trends​

●​ Detect correlations​

●​ Identify outliers​
Tool: sns.pairplot() in Python.​

19. Why is EDA important before building models?

Answer:​
EDA helps to:

●​ Detect missing values​


●​ Spot outliers​

●​ Understand variable types​

●​ Choose right preprocessing​


Without EDA, your model may be built on bad or misunderstood data.​

20. What tools are commonly used for EDA in Python?

Answer:

●​ Pandas: Data loading and stats​

●​ Matplotlib / Seaborn: Charts​

●​ Plotly: Interactive plots​

●​ Scipy / Numpy: Statistics​

●​ Sweetviz / pandas-profiling: Auto-reports​

21. What are outliers and how can we detect them?

Answer:​
Outliers are data points that are significantly different from the rest.​
Detection methods:

●​ Box plot (look for points beyond whiskers)​

●​ Z-score (if > 3 or < -3, likely an outlier)​

●​ IQR method:​

Outliers: values < Q1 − 1.5×IQR or > Q3 + 1.5×IQR

Example: A student's test score of 100 when most score between 50–70.
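
A minimal NumPy sketch of the IQR method on illustrative scores:

import numpy as np

scores = np.array([55, 58, 60, 62, 63, 65, 66, 68, 70, 100])
q1, q3 = np.percentile(scores, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
print(scores[(scores < lower) | (scores > upper)])  # flags only the 100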

22. How can missing values affect data analysis?


Answer:​
Missing values can:

●​ Distort statistical summaries​

●​ Affect model accuracy​

●​ Cause errors in algorithms​


Solution:​

●​ Drop rows/columns​

●​ Impute with mean/median/mode​

●​ Use advanced imputation (e.g., KNN, regression)​

23. What is the difference between univariate and multivariate analysis?

Answer:

●​ Univariate: Analyzing one variable (mean, histogram)​

●​ Multivariate: Analyzing two or more variables (correlation, scatter plot)​


Example:​

●​ Univariate: Examining just "income"​

●​ Multivariate: Examining "income" vs. "education"​

24. What is a scatter plot and what does it show?

Answer:​
A scatter plot shows the relationship between two numeric variables.​
Example: Plotting "hours studied" vs. "exam score" reveals positive correlation.​
You can visually detect trends, clusters, or outliers.

25. How do you handle categorical variables during EDA?


Answer:

●​ Count unique categories​

●​ Use bar plots or pie charts​

●​ Check frequency distribution​


Example: Analyze how many customers fall into each region or age group.​

26. What is cross-tabulation (crosstab)?

Answer:​
It’s a table showing the frequency distribution of two categorical variables.​
Example: Crosstab of "gender" vs. "purchased" shows how many men/women made a purchase.

27. What is a heatmap and how is it useful in EDA?

Answer:​
A heatmap displays values or relationships using color.

●​ Often used to visualize correlation matrices.​


Example: Red cells might show strong positive correlation between "age" and "income".​

28. How can we identify multicollinearity in data?

Answer:​
Multicollinearity = variables are too strongly related.​
Check using:

●​ Correlation matrix​

●​ Variance Inflation Factor (VIF)​


If two variables are highly correlated (e.g., r > 0.8), consider dropping one.​

29. What is the role of visualizations in EDA?


Answer:​
Visuals make it easier to:

●​ Spot patterns and outliers​

●​ Understand distribution​

●​ Identify relationships​
Tools: Bar chart, histogram, box plot, scatter plot, line chart​

30. What is the IQR and why is it important in EDA?

Answer:​
IQR (Interquartile Range) = Q3 - Q1​
It measures spread of the middle 50% of the data and helps find outliers.​
Example:​
Q1 = 30, Q3 = 70 → IQR = 40​
Outlier threshold = Q1 - 1.5×IQR = -30, Q3 + 1.5×IQR = 130

31. What are summary statistics and how are they used in EDA?

Answer:​
Summary statistics = basic metrics describing data

●​ Mean, median, mode​

●​ Min, max, std dev, percentiles​


Used to quickly get a sense of a variable’s behavior.​

32. What is data profiling?

Answer:​
Data profiling is the process of examining data quality and content:

●​ Null counts​

●​ Unique values​
●​ Distribution​

●​ Min/Max values​
Tools: pandas-profiling, Sweetviz​

33. What is the role of correlation in feature selection?

Answer:​
Highly correlated features may give redundant information.​
Removing one helps reduce model complexity and multicollinearity.​
Example: Height (cm) and Height (inches) are perfectly correlated.

34. How can we analyze time series data in EDA?

Answer:​
Use:

●​ Line plots to observe trends​

●​ Rolling averages to smooth fluctuations​

●​ Seasonal decomposition​
Example: Plotting monthly sales to see if they rise every December.​

35. What is a violin plot and when is it better than a box plot?

Answer:​
A violin plot combines box plot and density plot.​
It shows median, IQR, and full distribution shape.​
Useful when: You want to see both summary stats and distribution in one chart.

✅ Module 4: Data and Sampling Distributions


1. What is random sampling and why is it important?
Answer:​
Random sampling means selecting data points in such a way that every item has an equal chance of
being chosen.​
It ensures that the sample represents the whole population and avoids bias.​
Example: Surveying 100 students randomly from a university ensures opinions reflect the entire student
body.

2. What is sample bias?

Answer:​
Sample bias happens when the sample does not represent the full population correctly.​
Causes: Bad sampling methods, missing groups, etc.​
Example: Surveying only morning gym users may miss out on night-time users.

3. What is selection bias? How is it different from sample bias?

Answer:​
Selection bias is a type of sample bias that occurs when some groups are systematically excluded or
more likely to be included.​
Example: If an online survey excludes those without internet access, it suffers from selection bias.

4. What is the Central Limit Theorem (CLT)?

Answer:​
The CLT states that:

If you take many random samples from any population, the distribution of the sample
means will tend to be normal (bell-shaped) as the sample size increases.​
Importance: It allows us to use normal distribution for hypothesis testing even if the data
isn't normal.​
Example: If you repeatedly average samples of people's daily steps, the averages will form
a normal curve.

5. What is the standard error?

Answer:​
Standard error is the standard deviation of the sample mean.​
It tells us how much the sample mean varies from the actual population mean.
SE = Standard Deviation / √n
Example: A small standard error means sample averages are close to the true mean.

6. What are bootstrap confidence intervals?

Answer:​
Bootstrapping is a method to estimate confidence intervals by:

1.​ Resampling your data (with replacement) many times​

2.​ Calculating the statistic each time​

3.​ Using the spread of those statistics to build the interval​


Used when: You can’t assume the data is normal.​
Example: Estimating the average age of users with 95% confidence.​
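
A minimal NumPy sketch of a bootstrap 95% interval for a mean; the ages are illustrative:

import numpy as np

rng = np.random.default_rng(0)
ages = np.array([22, 25, 27, 29, 31, 34, 36, 40, 45, 52])
boot_means = [rng.choice(ages, size=len(ages), replace=True).mean()  # resample with replacement
              for _ in range(10_000)]                                # compute the statistic each time
print(np.percentile(boot_means, [2.5, 97.5]))  # middle 95% of the resampled means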

7. What is a confidence interval?

Answer:​
A range that’s likely to contain the true population parameter (like the mean), based on sample data.​
Example: "The average height is 170cm ± 5cm with 95% confidence" means there's a 95% chance the
real average lies between 165 and 175cm.

8. What is the normal distribution and what are its properties?

Answer:​
The normal distribution is a bell-shaped symmetric curve with:

●​ Mean = Median = Mode​

●​ 68% of values within 1 SD​

●​ 95% within 2 SD​

●​ 99.7% within 3 SD​


Example: Adult heights, IQ scores often follow this distribution.​
9. What is a long-tailed distribution?

Answer:​
It’s a distribution with extreme values (very high or low) that are not rare.​
Right-tailed (positive): Income (few people earn a lot)​
Left-tailed (negative): Some error metrics​
They don’t drop off quickly like the normal curve.

10. What is the Student’s t-distribution?

Answer:​
A bell-shaped curve like the normal distribution, but wider—used when:

●​ Sample size is small (n < 30)​

●​ Population standard deviation is unknown​


Example: Used in t-tests for comparing sample means.​

11. What is the binomial distribution?

Answer:​
Used for binary outcomes (success/failure) repeated n times.

Example: Tossing a coin 10 times and counting heads.

12. What is the Poisson distribution?

Answer:​
Models the number of events in a fixed time or space, where events happen independently.


Example: Number of emails received per hour.

13. What is the exponential distribution?

Answer:​
It models the time between events in a Poisson process.​
Properties:

●​ Skewed​

●​ Mean = 1/λ​
Example: Time between buses arriving at a stop.​

14. What is the Weibull distribution used for?

Answer:​
Used in survival analysis and reliability engineering.​
It models time until failure of a system.​
Example: Predicting the life of a machine component.

15. What does it mean to “fit a model” to data?

Answer:​
It means using statistical or machine learning methods to find a function that explains the data.​
Example: Fitting a line through a scatter plot of sales vs. advertising.

16. What’s the difference between a population and a sample?

Answer:

●​ Population: Entire group (e.g., all voters in a country)​

●​ Sample: Subset of population (e.g., 1,000 voters)​


We use samples to make inferences about the population.​
17. When should we use t-distribution instead of normal distribution?

Answer:​
Use t-distribution when:

●​ Sample size < 30​

●​ Population standard deviation is unknown​


Otherwise, use normal distribution.​

18. How does sample size affect confidence intervals?

Answer:​
Larger sample size → narrower confidence interval (more precise).​
Smaller sample size → wider interval (less reliable).

19. What is meant by “law of large numbers”?

Answer:​
As you increase the number of samples, the sample mean approaches the population mean.​
Example: Rolling a die many times → average approaches 3.5.

20. How is Poisson distribution different from binomial distribution?

Answer:

Property          | Poisson              | Binomial
Outcome type      | Count of events      | Success/failure counts
Trials            | Not fixed            | Fixed (n)
Mean = Variance?  | Yes (both equal λ)   | No (variance = np(1 − p))

Example:

●	Binomial: 10 coin flips
●​ Poisson: Calls received per minute​

21. What is the shape of a binomial distribution?

Answer:

●​ Symmetric if p = 0.5​

●​ Skewed if p ≠ 0.5​
As the number of trials increases, the binomial distribution approaches normal.​
Example: With 100 coin flips, the binomial curve looks bell-shaped.​

22. What does the mean and variance of a binomial distribution depend on?

Answer:​
Mean (μ) = n × p​
Variance (σ²) = n × p × (1 - p)​
Example: For 10 coin tosses (p = 0.5),​
μ = 5, σ² = 2.5

23. What are the assumptions behind Poisson distribution?

Answer:

●​ Events occur independently​

●​ Constant average rate (λ)​

●​ No two events occur at exactly the same time​


Example: Website hits per minute.​

24. What’s the relationship between exponential and Poisson distributions?

Answer:

●​ Poisson models count of events in time​


●​ Exponential models time between events​
They are used together in modeling arrival processes (e.g., queue systems).​

25. What kind of real-life data fits a Weibull distribution?

Answer:

●​ Mechanical failures​

●​ Product lifespans​
Shape varies:​

●​ Shape < 1: failure rate decreases over time (infant mortality)​

●​ Shape > 1: failure rate increases (aging equipment)​

26. Why is standard error smaller than standard deviation?

Answer:​
Because it reflects variability of the mean, not individual data points.​
SE = SD / √n​
Larger samples → smaller SE → more accurate estimate

27. Why is the t-distribution wider than the normal distribution?

Answer:​
Because it accounts for more uncertainty when sample size is small and standard deviation is
unknown.​
As sample size grows, t-distribution becomes more like the normal.

28. How do you choose the right distribution for your data?

Answer:

1.​ Look at the type of variable (discrete vs. continuous)​


2.​ Check data shape (histogram, skewness)​

3.​ Consider context (e.g., arrival time = exponential)​

4.​ Use Q-Q plots or fitting tests​

29. What is skewness and how does it affect distribution?

Answer:​
Skewness measures asymmetry:

●​ Positive skew: long right tail (e.g., income)​

●​ Negative skew: long left tail (e.g., exam scores with few low scores)​
It affects which distribution and statistical methods to use.​

30. What is kurtosis and what does it indicate?

Answer:​
Kurtosis measures "peakedness" or tail thickness:

●​ High kurtosis = heavy tails (more outliers)​

●​ Low kurtosis = light tails​


Example: Normal distribution has kurtosis = 3.​

31. How do histograms help identify data distribution?

Answer:​
Histograms show the shape, skewness, peaks, and spread of the data.​
You can visually detect if data looks:

●​ Normal (bell-shaped)​

●​ Right-skewed (long tail on right)​


●​ Multimodal (multiple peaks)​

32. Why is the Central Limit Theorem powerful in statistics?

Answer:​
It allows us to:

●​ Use normal approximation for sample means​

●​ Make inference and build confidence intervals​


Even if the population is not normal, as long as the sample is large (n ≥ 30), the sampling
distribution is normal.​

33. What does it mean if a distribution has a fat tail?

Answer:​
Fat-tailed distributions (like Cauchy or some power laws):

●​ Have higher probability of extreme values (outliers)​

●​ Don’t drop off as fast as normal distribution​


Example: Stock market returns are often fat-tailed.​

34. How does bootstrapping improve statistical reliability?

Answer:​
Bootstrapping:

●​ Makes no assumptions about data distribution​

●​ Uses the original data to create many pseudo-samples​

●​ Helps estimate standard errors and confidence intervals​


Especially useful when: You have small or non-normal datasets.​
35. What is the role of simulation in understanding distributions?

Answer:​
Simulations can help visualize and understand:

●​ Sampling variability​

●​ Shape of distributions​

●​ Behavior of statistics under different conditions​


Example: Simulating 10,000 coin flips to see binomial distribution in action.​
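
A minimal NumPy sketch: simulate many runs of 10 coin flips and watch the binomial shape appear:

import numpy as np

rng = np.random.default_rng(0)
heads = rng.binomial(n=10, p=0.5, size=10_000)  # heads count for each run of 10 flips
values, counts = np.unique(heads, return_counts=True)
for v, c in zip(values, counts):
    print(v, c / 10_000)  # empirical probabilities peak around 5 heads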

✅ Module 5: Statistics and Significance Testing


1. What is hypothesis testing in statistics?

Answer:​
Hypothesis testing is a method to decide whether a claim about a population is true based on sample
data.​
You start with two statements:

●​ Null hypothesis (H₀): Nothing has changed​

●​ Alternative hypothesis (H₁): There is a difference or effect​


Then, you use data to support or reject H₀.​
Example: A company says its product lasts 100 days. You test if that's still true.​

2. What are p-values?

Answer:​
A p-value tells you how likely your sample data would occur if the null hypothesis were true.

●​ Low p-value (≤ 0.05): Strong evidence against H₀ → reject it​

●​ High p-value (> 0.05): Weak evidence → fail to reject H₀​


Example: p = 0.02 means that if H₀ were true, there would be only a 2% chance of observing a result at least this extreme.

3. What is an A/B test?


Answer:​
A/B testing compares two versions (A and B) to see which performs better.​
Used in marketing, websites, ads.​
Example: Version A of a webpage has a 5% signup rate; B has 8%. You test if the difference is
statistically significant.

4. What is a chi-square test?

Answer:​
Chi-square tests are used to compare observed counts with expected counts to see if there is a
relationship between categorical variables.​
Example: Testing if gender is related to choice of product color (e.g., blue or red).

5. What is a confidence interval?

Answer:​
A confidence interval is a range where we believe the true population value (like the mean) lies.​
Example: If we say average income is $50,000 ± $2,000 with 95% confidence, it means we're 95% confident the real average is between $48,000 and $52,000.

6. What is a t-test and when is it used?

Answer:​
A t-test compares the means of two groups.​
Use it when:

●​ Sample size is small​

●​ Standard deviation is unknown​


Example: Comparing average test scores of two classes.​
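
A minimal SciPy sketch of that comparison; the class scores are illustrative:

from scipy import stats

class_a = [72, 75, 78, 80, 83, 85, 88]
class_b = [65, 68, 70, 72, 74, 77, 79]
t_stat, p_value = stats.ttest_ind(class_a, class_b)  # independent two-sample t-test
print(t_stat, p_value)  # p < 0.05 would suggest the mean scores differ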

7. What is the difference between a one-tailed and a two-tailed test?

Answer:

●​ One-tailed test: Looks for difference in one direction (greater or less)​


●​ Two-tailed test: Checks for any difference (higher or lower)​
Example: Testing if a new drug works better = one-tailed.​
Testing if it works differently (better or worse) = two-tailed.​

8. What is statistical significance?

Answer:​
If results are statistically significant, it means they’re unlikely to have happened by random chance,
usually based on a p-value threshold (e.g., 0.05).

9. What is ANOVA (Analysis of Variance)?

Answer:​
ANOVA checks whether three or more groups have different means.​
Instead of doing multiple t-tests, ANOVA gives a single test.​
Example: Comparing exam scores from 3 different schools.
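
A minimal SciPy sketch of a one-way ANOVA; the school scores are illustrative:

from scipy import stats

school_1 = [70, 72, 75, 78, 80]
school_2 = [65, 67, 70, 71, 73]
school_3 = [80, 82, 85, 86, 88]
f_stat, p_value = stats.f_oneway(school_1, school_2, school_3)
print(f_stat, p_value)  # p < 0.05 → at least one school's mean differs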

10. What are degrees of freedom (df) in statistics?

Answer:​
Degrees of freedom are the number of independent values that can vary when calculating a statistic.​
Example: In a sample of 5 numbers with a fixed mean, only 4 values can freely change → df = 4.

11. Why use ANOVA instead of multiple t-tests?

Answer:​
Using many t-tests increases the chance of false positives (Type I errors).​
ANOVA controls this risk and provides one overall test for all group differences.

12. What are the assumptions for using a t-test?

Answer:

●​ Data is normally distributed​


●​ Variances are equal (for independent t-test)​

●​ Samples are independent​


Violating these can make results unreliable.​

13. What is a Type I and Type II error?

Answer:

●​ Type I error: Rejecting a true null hypothesis (false alarm)​

●​ Type II error: Not rejecting a false null hypothesis (missed detection)​


Example: Type I: Saying a medicine works when it doesn't.​
Type II: Missing a drug effect that is real.​

14. What affects the width of a confidence interval?

Answer:

●​ Sample size: Larger n → narrower interval​

●​ Variability: More variation → wider interval​

●​ Confidence level: 99% interval is wider than 95%​

15. How do we interpret a 95% confidence interval?

Answer:​
If we repeated the experiment 100 times, about 95 of those intervals would contain the true value.​
It does not mean there's a 95% chance the value is in this one interval.

16. What is a white-noise process in statistics?

Answer:​
It’s a sequence of random values with no pattern or correlation.​
Used in time-series to model random variation.​
Example: Random stock market noise.

17. What is statistical power?

Answer:​
The probability of correctly rejecting a false null hypothesis.​
High power = low chance of a Type II error.​
Power increases with:

●​ Larger sample size​

●​ Bigger effect size​

●​ Lower variability​

18. How do sample size and effect size affect significance?

Answer:

●​ Larger sample size → more likely to detect small differences​

●​ Larger effect size → easier to find significant results​


Small samples may miss real effects.​

19. When should you use a paired t-test?

Answer:​
Use a paired t-test when comparing measurements from the same group before and after a change.​
Example: Testing blood pressure before and after medication in the same patients.

20. How is a chi-square test different from a t-test?

Answer:

●​ Chi-square test: For categorical data (counts, frequencies)​


●​ t-test: For numeric data (means)​
Example:​

●​ Chi-square: Testing if male/female preference differs by product​

●​ t-test: Comparing average income between two cities​

21. What is the purpose of conducting A/B testing in marketing?

Answer:​
A/B testing helps marketers decide which version of a product, webpage, or ad performs better by
comparing two variants. It tests hypotheses and helps optimize user experiences.​
Example: Testing two versions of an email subject line to see which one generates more clicks.

22. What are the assumptions behind using ANOVA?

Answer:

●​ The data is normally distributed​

●​ Variances are homogeneous (equal variance)​

●​ Observations are independent​


If assumptions are violated, consider using a non-parametric test like Kruskal-Wallis.​

23. What is a paired t-test and when do you use it?

Answer:​
A paired t-test compares the means of two related groups. It’s used when data points are paired or
matched (e.g., before and after measurements).​
Example: Comparing a person’s weight before and after a diet program.

24. What does it mean if a p-value is less than 0.05?

Answer:​
A p-value < 0.05 means there’s strong evidence against the null hypothesis. This indicates that the
observed result is statistically significant, and we reject H₀.​
Example: A drug’s effectiveness p-value = 0.03 suggests it's likely better than the placebo.
25. What is the difference between confidence level and significance level?

Answer:

●​ Confidence level (e.g., 95%) refers to the degree of certainty that the population parameter lies
within the confidence interval.​

●​ Significance level (e.g., 0.05) is the threshold for rejecting the null hypothesis.​
If the p-value < significance level, we reject H₀.​

26. How do you interpret the results of a one-way ANOVA?

Answer:​
If the p-value from ANOVA is less than 0.05, you conclude that at least one of the group means is
significantly different from the others. To determine which group differs, you would conduct post-hoc
tests like Tukey's HSD.​
Example: Comparing salaries across three industries (Tech, Healthcare, and Education).

27. What is the null hypothesis in A/B testing?

Answer:​
The null hypothesis in A/B testing is that there is no difference between the two versions being tested.​
Example: In a test between Version A (button red) and Version B (button green), the null hypothesis is
that the color doesn’t affect the click-through rate.

28. What does the term "statistical power" mean?

Answer:​
Statistical power is the probability that a test will correctly reject the null hypothesis when it is false
(i.e., detect a true effect). High power reduces the risk of a Type II error.​
Power = 1 - β (β is the probability of a Type II error).

29. What is the relationship between sample size and significance?

Answer:​
Larger sample sizes increase the power of a test and can make it easier to detect significant differences.
Small sample sizes may fail to detect real effects, even if they exist.​
Example: Testing a new treatment with 50 patients vs. 500 patients can yield more reliable results with
the larger sample.

30. What is a non-parametric test?

Answer:​
A non-parametric test doesn’t assume that data follows a specific distribution (e.g., normal distribution).
It is used when the data violates the assumptions of parametric tests like t-tests or ANOVA.​
Example: Mann-Whitney U test (non-parametric alternative to the t-test).

31. What is the difference between a confidence interval and a prediction interval?

Answer:

●​ Confidence interval: Provides a range for the mean of a population based on a sample.​

●​ Prediction interval: Provides a range for a single future observation.​


Prediction intervals are wider than confidence intervals because they account for variability in
individual predictions.​

32. What is the F-distribution used for in ANOVA?

Answer:​
The F-distribution is used in ANOVA to test the ratio of variances between groups. If the F-statistic is
large, it suggests the group means are significantly different.​
Example: Testing if average test scores differ between three teaching methods.

33. How does a Type I error affect decision-making?

Answer:​
A Type I error is when you incorrectly reject a true null hypothesis (a false positive). This could lead
to wrong decisions like thinking a treatment works when it actually doesn’t.​
Example: Approving a drug that has no effect.

34. What does it mean if the confidence interval for a mean includes zero?
Answer:​
If the confidence interval includes zero, it suggests there’s a possibility of no effect. This would
indicate the null hypothesis cannot be rejected at a given confidence level.​
Example: The interval for the difference in weight loss between two groups includes 0, so we fail to
reject the null hypothesis.

35. What is the chi-square goodness-of-fit test?

Answer:​
The chi-square goodness-of-fit test compares the observed frequencies of a categorical variable to the
expected frequencies under a specific hypothesis.​
Example: Testing if a dice is fair by comparing the number of times each face appears to the expected
1/6 probability.
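
A minimal SciPy sketch of this test; the observed face counts are illustrative:

from scipy import stats

observed = [8, 12, 9, 11, 10, 10]  # face counts from 60 rolls
expected = [10] * 6                # a fair die: 60 × 1/6 per face
chi2, p_value = stats.chisquare(observed, expected)
print(chi2, p_value)  # a large p-value gives no evidence the die is unfair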

✅ Module 6: Evaluation and Optimization


1. What is a confusion matrix and why is it important?

Answer:​
A confusion matrix is a table used to evaluate the performance of a classification model. It shows the
counts of true positives, false positives, true negatives, and false negatives.​
Example: For a spam email classifier, it shows how many emails were correctly marked as spam (true
positives) or incorrectly marked (false positives).
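
A minimal scikit-learn sketch with illustrative labels; it also previews the precision and recall defined in the next questions:

from sklearn.metrics import confusion_matrix, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # 1 = spam, 0 = not spam
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
print(confusion_matrix(y_true, y_pred))  # [[TN, FP], [FN, TP]] = [[3, 1], [1, 3]]
print(precision_score(y_true, y_pred))   # TP / (TP + FP) = 0.75
print(recall_score(y_true, y_pred))      # TP / (TP + FN) = 0.75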

2. What is precision, and how is it calculated?

Answer:​
Precision measures the proportion of positive predictions that were correct. It is calculated as:

Precision = TP / (TP + FP)

Where TP = True Positives, FP = False Positives.


Example: In predicting whether a patient has a disease, precision tells you how many of the predicted
positives were actually sick.
3. What is recall, and how is it calculated?

Answer:​
Recall measures the proportion of actual positives that were correctly identified. It is calculated as:

Recall = TP / (TP + FN)

Where TP = True Positives, FN = False Negatives.


Example: If you are predicting customer churn, recall tells you how many of the actual churned
customers were correctly predicted.

4. What is specificity, and how does it differ from recall?

Answer:​
Specificity measures the proportion of actual negatives correctly identified by the model:

Specificity = TN / (TN + FP)

Where TN = True Negatives, FP = False Positives.


Difference from Recall: Recall focuses on identifying positive cases (e.g., detecting cancer), while
specificity focuses on correctly identifying non-cases (e.g., avoiding false alarms for cancer).

5. What is an ROC curve?

Answer:​
An ROC (Receiver Operating Characteristic) curve plots the true positive rate (recall) against the
false positive rate at various thresholds. It helps you understand the trade-off between sensitivity (recall)
and specificity.​
Example: A model with a better ROC curve can better discriminate between classes.

6. What is AUC, and how is it related to the ROC curve?


Answer:​
AUC (Area Under the Curve) measures the area under the ROC curve. A higher AUC indicates better
model performance.

●​ AUC = 1: Perfect classifier​

●​ AUC = 0.5: Random classifier​


Example: A model with AUC = 0.85 is better than one with AUC = 0.70.​

7. What is lift in the context of model evaluation?

Answer:​
Lift measures how much better a model performs compared to random guessing. It’s the ratio of the
results predicted by the model to the baseline performance (random prediction).​
Example: In direct mail marketing, if a model predicts that 30% of the recipients will respond, and only
10% would respond randomly, the lift is 3.

8. What is the difference between global and local optima in optimization?

Answer:

●​ Global optima: The absolute best solution in the entire search space.​

●​ Local optima: A solution that is better than nearby solutions but not necessarily the best overall.​
Example: Imagine a mountain range; the highest peak is the global optimum, while smaller
peaks are local optima.​

9. What is unconstrained optimization?

Answer:​
Unconstrained optimization involves finding the optimal solution for a problem without any restrictions
or constraints.​
Example: Finding the maximum profit in a business without any resource limitations.

10. What is constrained optimization?


Answer:​
Constrained optimization involves finding the optimal solution while satisfying certain conditions or
constraints.​
Example: Maximizing profit subject to resource constraints, like labor or budget.

11. What is the least-squares optimization method?

Answer:​
Least-squares optimization minimizes the sum of squared errors between observed and predicted
values, commonly used in linear regression.

Example: Fitting a straight line to a scatterplot of data points.

12. How do you optimize a machine learning model?

Answer:​
Optimizing a machine learning model involves:

1.​ Hyperparameter tuning: Adjusting model parameters like learning rate or tree depth.​

2.​ Feature selection: Choosing the most important features for the model.​

3.​ Regularization: Preventing overfitting by adding penalty terms.​

4.​ Cross-validation: Testing the model on multiple data splits.​

13. What is the role of gradient descent in optimization?

Answer:​
Gradient descent is an iterative optimization algorithm used to minimize a function (like the loss function
in machine learning). It updates model parameters in the opposite direction of the gradient to find the
minimum.​
Example: In linear regression, gradient descent finds the best line by iteratively adjusting the
coefficients.
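
A minimal NumPy sketch of gradient descent fitting a line y = wx + b; the data and learning rate are illustrative choices:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])  # generated from y = 2x + 1
w, b, lr = 0.0, 0.0, 0.05           # start at zero; lr is the learning rate
for _ in range(2000):
    err = (w * x + b) - y           # prediction error
    w -= lr * 2 * (err * x).mean()  # step opposite the gradient of the MSE
    b -= lr * 2 * err.mean()
print(w, b)  # converges near w = 2, b = 1
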
14. What is the difference between convex and non-convex optimization problems?

Answer:

●​ Convex optimization: The objective function has a single global minimum. It’s easier to solve
because any local minimum is also the global minimum.​

●​ Non-convex optimization: The objective function has multiple local minima and possibly
several global minima.​
Example: Convex: Linear regression. Non-convex: Neural networks.​

15. What are constraints in optimization?

Answer:​
Constraints are conditions that limit the solutions of an optimization problem. They could be in the form
of equality or inequality restrictions.​
Example: In business, constraints may be a limited budget, manpower, or materials.

16. What are Lagrange multipliers used for in optimization?

Answer:​
Lagrange multipliers are used to find the maximum or minimum of a function subject to equality
constraints. They help solve constrained optimization problems.​
Example: Maximizing profit while keeping production cost within a limit.

17. What is a convex function in optimization?

Answer:​
A convex function is one where the line segment between any two points on the graph lies above the
graph itself. In optimization, this property guarantees that any local minimum is the global minimum.​
Example: A simple parabolic function (e.g., f(x)=x2) is convex.

18. What is a local minimum in optimization?

Answer:​
A local minimum is a point where the function value is lower than its neighboring points, but it may not
be the lowest point overall.​
Example: A curve with several dips has several local minima, but only the deepest dip is the global minimum (a bowl-shaped convex curve has just one minimum, which is both local and global).

19. What is the role of the objective function in optimization?

Answer:​
The objective function is the function that needs to be maximized or minimized in an optimization
problem. It’s the central piece in finding the optimal solution.​
Example: In machine learning, the objective function could be the loss function (like mean squared
error) that needs to be minimized.

20. What is overfitting, and how does it affect model optimization?

Answer:​
Overfitting occurs when a model learns the noise in the training data rather than the actual pattern. This
results in a model that performs well on the training data but poorly on new data.​
Example: A decision tree that perfectly classifies training data but fails on test data due to overfitting.

21. What is the difference between precision and specificity?

Answer:​
Precision measures the proportion of true positives among all predicted positives. Specificity, on the
other hand, measures the proportion of true negatives among all actual negatives.​
Precision: How many predicted positives are actually correct?​
Specificity: How many actual negatives are correctly identified?​
Example: In a disease detection model, precision focuses on the accuracy of predicted positive
diagnoses, while specificity focuses on avoiding false alarms.

22. How do you choose an appropriate threshold for classification models?

Answer:​
The threshold is the probability value above which the model predicts the positive class. You can choose
a threshold based on:

●​ ROC curve analysis (maximize AUC).​

●​ Desired balance between precision and recall.​


●​ Business objectives: For example, in fraud detection, you might prioritize recall over precision to
catch more fraud cases, even at the cost of more false positives.​

23. What is the F1-score and when should you use it?

Answer:​
The F1-score is the harmonic mean of precision and recall:

F1 = 2 × (Precision × Recall) / (Precision + Recall)

It's useful when you need to balance the trade-off between precision and recall, especially when dealing with imbalanced datasets.

Example: In medical diagnostics, F1-score is useful when both false positives and false negatives carry
significant costs.

24. What is the difference between the ROC curve and Precision-Recall curve?

Answer:

●​ ROC Curve: Plots True Positive Rate (Recall) vs False Positive Rate. It is useful for balanced
datasets.​

●​ Precision-Recall Curve: Plots Precision vs Recall. It’s more informative when the dataset is
imbalanced (e.g., detecting rare events like fraud).​
Example: In fraud detection, a Precision-Recall curve is preferred because fraud cases are rare.​

25. What are the advantages of using cross-validation in model evaluation?

Answer:​
Cross-validation helps assess the model’s performance by training and testing it on different subsets of
data, providing a more reliable estimate of how the model will generalize to new data.​
Example: If you use k-fold cross-validation, the model is trained on k-1 folds and tested on the
remaining fold, rotating through all folds.
26. What is the role of the AUC-ROC curve in imbalanced classification problems?

Answer:​
The AUC-ROC curve helps evaluate classifier performance in imbalanced datasets by focusing on how
well the model distinguishes between the positive and negative classes, regardless of class distribution.​
Example: In predicting rare diseases, the AUC-ROC curve is useful to ensure the model isn’t biased
toward the more common negative class.

27. What is global optimization, and how does it differ from local optimization?

Answer:​
Global optimization finds the best possible solution across the entire search space. Local optimization
only finds the best solution within a local region of the search space.​
Example: In training deep learning models, local optimization (e.g., gradient descent) might only find
local minima, whereas global optimization seeks the overall best solution.

28. What is a "loss function" in optimization?

Answer:​
A loss function measures the difference between predicted and actual values. It helps guide the
optimization process by indicating how well the model is performing.​
Example: In regression, the Mean Squared Error (MSE) is a commonly used loss function to minimize
during model training.

29. What is the difference between global optima and saddle points?

Answer:

●​ Global optima represents the lowest (minimization problem) or highest (maximization problem)
point in the entire optimization space.​

●​ Saddle points are points where the derivative is zero, but they are neither minima nor maxima.​
Example: A saddle point might appear in a neural network’s loss function where the gradient is
zero, but it’s not the true global minimum.​

30. How can you use the gradient of a function in optimization?


Answer:​
The gradient indicates the direction of the steepest ascent (maximization) or descent (minimization). In
gradient-based optimization algorithms like gradient descent, the model parameters are adjusted in the
direction opposite to the gradient to minimize the loss function.​
Example: In linear regression, the gradient tells you how much to adjust the weights to reduce the error.

31. What is stochastic gradient descent (SGD), and how is it different from regular gradient
descent?

Answer:​
SGD updates the model parameters using only a single random sample (or a small batch) at each
iteration, making it computationally faster. It introduces randomness, which can help escape local optima.​
Example: Training a neural network using SGD instead of batch gradient descent allows faster
convergence with large datasets.

32. What are the limitations of using only precision or recall as evaluation metrics?

Answer:​
Using only precision or recall can be misleading.

●​ Precision ignores false negatives, and might be misleading in cases where false negatives are
important.​

●​ Recall ignores false positives, which can be a problem when false positives are costly.​
Example: In spam email detection, a high recall but low precision might flood users with spam
emails.​

33. What is regularization, and how does it help in optimization?

Answer:​
Regularization adds a penalty term to the optimization function to prevent overfitting by reducing the
complexity of the model.​
Example: In linear regression, L2 regularization (Ridge regression) adds a penalty to large coefficients to
avoid overfitting.

34. What are the main differences between L1 and L2 regularization?


Answer:

●	L1 regularization (Lasso) encourages sparsity by setting some coefficients to zero, effectively selecting features.

●​ L2 regularization (Ridge) discourages large coefficients but does not set them exactly to zero,
leading to a smoother model.​
Example: L1 regularization is useful for feature selection, while L2 regularization is used to
prevent overfitting by shrinking coefficients.​

35. How do you interpret the ROC AUC score in model evaluation?

Answer:

●​ AUC = 1: Perfect model​

●​ AUC > 0.8: Excellent model​

●​ AUC between 0.7 and 0.8: Good model​

●​ AUC = 0.5: No better than random guessing​


Example: If your AUC score is 0.85, it indicates that your model has a good ability to
distinguish between the positive and negative classes.​
