Module-2 Statistical Concepts
● Probability Distributions
● Hypothesis Testing
● Entropy & Information Gain
● Regression & Correlation
● Confusion Matrix
● Bias & Variance
● MCQs on Statistical Techniques
Module Name:- Acquiring Skills on Statistical Concepts
Topic - 2.1 Able to use Descriptive & Inferential Statistics concepts in data analysis
and algorithm development
Artificial intelligence has changed industry after industry by helping machines
process large volumes of data, find the patterns within a data set, and make
smarter decisions. Statistics supports AI models by giving meaning and insight
to such data sets, and at its core lie descriptive and inferential statistics.
Both play vital roles in building and running AI algorithms.
summarize, visualize, and interpret the data. This helps identify patterns,
statistics can help find the average spending amount, detect unusual
Feature Selection: In AI and ML, not all data points (features) contribute
relevant and influential features, improving model accuracy and reducing
computational cost.
Precision & Recall: Helps in imbalanced datasets (e.g., detecting fraud where
fraud cases are rare).
F1 Score: Harmonic mean of precision and recall, balancing false positives and
false negatives.
Example: In a spam email classifier, accuracy alone may not be sufficient.
Precision and recall help evaluate whether important emails are mistakenly
classified as spam or vice versa.
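The metrics above can be sketched in a few lines of Python. This is a minimal illustration, not a production evaluator: the labels `y_true` and `y_pred` are made-up data (1 = spam, 0 = not spam), and the helper names `confusion_counts` and `precision_recall_f1` are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1  # e.g. an important email wrongly flagged as spam
        elif actual == positive:
            fn += 1  # e.g. a spam email that slipped through
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1), using 0.0 when a ratio is undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Assumed toy data: 3 spam emails among 10.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(accuracy, precision, recall, f1)
```

On this toy data the accuracy is 0.8 while precision and recall are both 2/3, which illustrates why accuracy alone can look better than the classifier really is.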
Probability Theory:
of an event occurring.
email being spam based on the frequency of certain words like “lottery” or “win.”
Regularization (L1 & L2): Reduces overfitting by penalizing large
weights to minimize the difference between predicted and actual values (loss
function).
Central Tendency: These are mean, median, and mode that note the central
Dispersion: Standard deviation, variance, and range measure how data points
Data Visualization: Histograms, box plots, and scatter plots provide graphical
Types of Descriptive Statistics in AI:
data.
Common Measures:
Example: A survey finds that 50% of respondents prefer product A, 30%
Common Measures:
Example: If five students score 80, 85, 90, 95, and 100, then:
Mean = (80+85+90+95+100) / 5 = 90
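The same five scores can be checked with Python's standard `statistics` module. This is a small sketch; using the population variance (`pvariance`) rather than the sample variance is an assumption made here for illustration.

```python
import statistics

scores = [80, 85, 90, 95, 100]

mean = statistics.mean(scores)           # central tendency: arithmetic average
median = statistics.median(scores)       # central tendency: middle value
variance = statistics.pvariance(scores)  # dispersion: population variance
std_dev = statistics.pstdev(scores)      # dispersion: square root of variance
data_range = max(scores) - min(scores)   # dispersion: largest minus smallest

print(mean, median, variance, round(std_dev, 2), data_range)
```

Here the mean and median are both 90, the population variance is 50, and the range is 20.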
how spread out the data values are around the central tendency.
Common Measures:
Standard Deviation: Square root of variance, indicating how
d.) Variability: Variability measures the inconsistency in data and how
Common Measures:
Example: In a company, employee salaries may have high variability
inequality.
data distributions.
influence on outcomes.
Types of Inferential statistics
Summary
1️⃣ Normal Distribution (Gaussian Distribution)
Used for: Modeling natural phenomena like heights, exam scores, and errors in
measurements.
Shape: Bell-shaped, symmetric around the mean.
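Formula:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
Where:
x is the value of the random variable, μ is the mean, and σ is the standard deviation.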
Formula:
Example: If you flip a fair coin 10 times, the probability of getting exactly 6
heads follows a binomial distribution.
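The coin-flip probability can be computed directly from the binomial PMF, P(X = k) = C(n, k) p^k (1 - p)^(n - k). The sketch below uses only the standard library; the function name `binomial_pmf` is chosen for illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Exactly 6 heads in 10 flips of a fair coin: C(10, 6) / 2**10 = 210 / 1024.
p_six_heads = binomial_pmf(6, 10, 0.5)
print(round(p_six_heads, 4))  # about 0.2051
```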
Used for: Modeling rare events occurring in a fixed interval (e.g., number of
customer arrivals per minute, number of defects in a product).
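Formula:
$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$$
Where:
λ is the average number of events per interval and k is the number of events observed.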
Example: If a call center gets 5 calls per minute, the Poisson distribution can
predict the probability of receiving exactly 7 calls in a minute.
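The call-center figure can be worked out from the Poisson PMF, P(X = k) = λ^k e^(-λ) / k!. This is a minimal sketch with a hypothetical helper name `poisson_pmf`.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k events when the average rate is lam."""
    return lam**k * exp(-lam) / factorial(k)


# Average of 5 calls per minute; probability of exactly 7 calls in a minute.
p_seven_calls = poisson_pmf(7, 5)
print(round(p_seven_calls, 4))  # about 0.1044
```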
Used for: Predicting a continuous (numeric) outcome based on one or more input
variables.
Formula:
Used for: Predicting a categorical outcome (e.g., Yes/No, 0/1, Spam/Not Spam).
Example: Predicting whether a customer will buy a product (Yes/No) based on
their income and browsing history.
5. Which is a summary statistic?
a) Mean
b) Gradient Descent
c) Confusion Matrix
d) Bayesian Network
Answer: a
11. What is the goal of feature selection?
a) Add features
b) Improve model accuracy
c) Visualize data
d) Summarize data
Answer: b
16. How does feature selection save computing power?
a) Adds variables
b) Removes irrelevant features
c) Increases data size
d) Visualizes data
Answer: b
22. What is hyperparameter tuning for?
a) Summarize data
b) Find best model settings
c) Calculate variance
d) Detect outliers
Answer: b
28. Which measures data spread?
a) Median
b) Variance
c) Mode
d) Correlation
Answer: b
34. Which is a real-time AI use of statistics?
a) Chip design
b) Finding customer patterns
c) Coding without data
d) Removing outliers
Answer: b
40. What does high correlation mean?
a) No relation
b) Strong relation
c) Outliers
d) Identical data
Answer: b
46. What do statistics measure in data relationships?
a) Algorithm design
b) Correlations
c) Data reduction
d) Visualization
Answer: b
2.2 Develop software applications using Probability Concept:
Marginal, Joint & Conditional Probability, Bayes Theorem
🔹 Formula:
For two events A and B, the marginal probability of A is:
$$P(A) = \sum_{b} P(A,\, B = b)$$
📌 Example:
If we have the probability distribution of students' grades based on study hours
and IQ, then the probability of a student getting an "A" regardless of IQ is a
marginal probability.
Joint probability refers to the probability of two (or more) events occurring
together.
🔹 Formula:
For events A and B:
$$P(A \cap B) = P(A \mid B)\, P(B)$$
This is the probability that both A and B happen at the same time.
📌 Example:
The probability that a randomly selected student studies for more than 5 hours
and scores an "A" is a joint probability.
(c) Conditional Probability
🔹 Formula:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0$$
This means we focus only on cases where B has already happened.
📌 Example:
The probability that a student scores an "A" given that they have studied for more
than 5 hours.
📌 Example 2: Spam Filtering
Email spam filters use Bayes' theorem to determine whether an email is spam
based on the occurrence of certain words.
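A single-word version of this idea can be sketched with Bayes' theorem, P(spam | word) = P(word | spam) P(spam) / P(word). All three input probabilities below are assumed values chosen for illustration, not measurements, and `posterior_spam` is a hypothetical helper name.

```python
def posterior_spam(p_spam, p_word_given_spam, p_word_given_ham):
    """Bayes' theorem: probability an email is spam given it contains the word."""
    p_ham = 1 - p_spam
    # Evidence P(word), obtained by marginalizing over spam and non-spam emails.
    p_word = p_word_given_spam * p_spam + p_word_given_ham * p_ham
    return p_word_given_spam * p_spam / p_word


# Assumed numbers: 20% of mail is spam; "lottery" appears in 60% of spam
# and in 5% of legitimate mail.
p = posterior_spam(p_spam=0.2, p_word_given_spam=0.6, p_word_given_ham=0.05)
print(round(p, 2))  # 0.75
```

With these assumed inputs, seeing the word raises the spam probability from the prior of 0.2 to a posterior of 0.75.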
📌 Example 3: Machine Learning
Naïve Bayes classifiers in Natural Language Processing (NLP) use Bayes'
theorem for text classification.
Summary Table
Probability Distributions
Formula:
Question on Normal Distribution
Ques-1) Calculate the probability density function of normal distribution using the
following data. x = 3, μ = 4 and σ = 2.
Standard deviation = 2
Ques-2) If the value of the random variable is 2, mean is 5 and the standard
deviation is 4, then find the probability density function of the gaussian
distribution.
Ans-2) Given,
Variable, x = 2
Mean = 5 and
Standard deviation = 4
$$f(2;\, 5,\, 4) = \frac{1}{4\sqrt{2\pi}}\, e^{-\frac{(2-5)^2}{2 \cdot 4^2}} = \frac{1}{4\sqrt{2\pi}}\, e^{-9/32} \approx 0.0753$$
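Both questions can be checked numerically by evaluating the normal density f(x) = 1 / (σ√(2π)) · e^(-(x - μ)² / (2σ²)). This is a short sketch using only the standard library; the function name `normal_pdf` is chosen for illustration.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and std. dev. sigma."""
    exponent = -((x - mu) ** 2) / (2 * sigma**2)
    return exp(exponent) / (sigma * sqrt(2 * pi))


ques_1 = normal_pdf(3, mu=4, sigma=2)  # x = 3, mean = 4, std. dev. = 2
ques_2 = normal_pdf(2, mu=5, sigma=4)  # x = 2, mean = 5, std. dev. = 4
print(round(ques_1, 4), round(ques_2, 4))  # about 0.1760 and 0.0753
```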
Key Features of Normal Distribution :
Bell-shaped curve.
68-95-99.7 rule (68% within 1 std. dev., 95% within 2, 99.7% within 3).
Example: Heights of people follow a normal distribution.
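The 68-95-99.7 rule can be verified from the normal CDF. For a normal variable, the probability of falling within k standard deviations of the mean is erf(k / √2); the sketch below uses `math.erf` from the standard library, with `prob_within` as an illustrative helper name.

```python
from math import erf, sqrt


def prob_within(k):
    """P(|X - mean| < k standard deviations) for a normal distribution."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(k, round(prob_within(k) * 100, 1))  # about 68.3, 95.4, 99.7
```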
2. Binomial Distribution
Formula:
$$P(X = k) = \binom{n}{k}\, p^k (1 - p)^{n - k}$$
3. Poisson Distribution
1. Regression
Types:
2. Correlation
Definition: Measures the strength and direction of the relationship between two
variables.
0: No correlation.
📌 Example: A strong positive correlation between study hours and exam scores.
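The Pearson correlation coefficient can be computed by hand to see both strength and direction. The study-hours and exam-score values below are made-up illustrative data, and `pearson_r` is a hypothetical helper name.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Assumed toy data: scores rise steadily with study hours.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 71, 80, 88]
r = pearson_r(hours, scores)
print(round(r, 3))  # close to +1, i.e. a strong positive correlation
```

A value near +1 means the variables rise together, a value near -1 means one falls as the other rises, and a value near 0 means no linear relationship.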
Bias-Variance Tradeoff
Key Concepts:
📌 Example:
High Bias: A straight line trying to fit complex data (too simple, underfitting).
High Variance: A very wiggly curve fitting all data points exactly (too
complex, overfitting).
Module-2 Topic-2 Multiple Choice Questions on
Probability Concepts in AI
1.) What is marginal probability?
a) Probability of two events
b) Probability of one event
c) Probability given another event
d) Probability of no event
Answer: b
7.) What percent of data is within one standard deviation in a normal distribution?
a) 50%
b) 68%
c) 95%
d) 99.7%
Answer: b
8.) What type of distribution is binomial?
a) Continuous
b) Discrete
c) Uniform
d) Skewed
Answer: b
10.) If P(A) = 0.4 and P(B) = 0.5, and A and B are independent, what is P(A ∩ B)?
a) 0.9
b) 0.2
c) 0.1
d) 0.4
Answer: b
15.) In Bayes' Theorem, what is P(B)?
a) Posterior
b) Prior
c) Evidence
d) Likelihood
Answer: c
23.) In linear regression, what is the dependent variable?
a) x
b) y
c) m
d) b
Answer: b
31.) If events A and B are mutually exclusive, what is P(A ∩ B)?
a) 0
b) 1
c) P(A)
d) P(B)
Answer: a
39.) A model fitting training data too well has:
a) High bias
b) High variance
c) Low variance
d) No error
Answer: b
47.) What does correlation measure?
a) Causation
b) Strength and direction
c) Mean
d) Outliers
Answer: b
2.3 Able to write applications using Probability Distributions,
Hypothesis Test, Entropy & Information Gain, Regression &
Correlation, Confusion Matrix, Bias & Variance
1. Probability Distributions
Probability distributions are used to model randomness and uncertain events. For
example, the Normal distribution (Gaussian) is symmetric and commonly used
for natural phenomena like height or test scores. Its formula is:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
where μ is the mean and σ² is the variance (σ is the standard deviation).
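The Binomial distribution is used for discrete outcomes like success/failure, with
the PMF:
$$P(X = k) = \binom{n}{k}\, p^k (1 - p)^{n - k}$$
where n is the number of trials, k is the number of successes, and p is the
probability of success on each trial.
The Poisson distribution models rare events, like customer arrivals or system
failures:
$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$$
where λ is the average number of events per interval.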
These distributions are widely applied in risk assessment, quality control, and
prediction models.
2. Hypothesis Testing
Hypothesis testing is used to test assumptions or claims about data. There are two
hypotheses: Null Hypothesis (H0) and Alternative Hypothesis (H1).
We calculate a test statistic (like z-score or t-score) and derive the p-value. If the
p-value < significance level (α), we reject the null hypothesis.
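The decision rule can be sketched with a one-sample, two-sided z-test. The sample figures below (sample mean 52, hypothesized mean 50, known standard deviation 5, sample size 25) are assumed values for illustration, and the helper names `normal_cdf` and `z_test` are hypothetical.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def z_test(sample_mean, mu0, sigma, n):
    """Return (z statistic, two-sided p-value) for H0: mean == mu0."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - normal_cdf(abs(z)))
    return z, p_value


alpha = 0.05  # significance level
z, p_value = z_test(sample_mean=52, mu0=50, sigma=5, n=25)
reject_h0 = p_value < alpha
print(round(z, 2), round(p_value, 4), reject_h0)
```

With these assumed inputs the test statistic is z = 2.0 and the p-value is about 0.0455, which is below α = 0.05, so the null hypothesis is rejected.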
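Information Gain (IG) measures the reduction in entropy after a dataset is split
using an attribute:
$$IG(S, A) = H(S) - \sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|}\, H(S_v)$$
where $H(S) = -\sum_i p_i \log_2 p_i$ is the entropy of dataset S and $S_v$ is the
subset of S for which attribute A takes the value v.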
This is used in decision tree algorithms (like ID3, C4.5) to choose the best features
for splitting.
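Entropy and information gain can be computed directly from label counts. This is a minimal sketch, not the ID3 or C4.5 implementation itself; the toy labels and attribute values are made up, and the function names are chosen for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(labels, attribute_values):
    """Entropy before the split minus the weighted entropy after it."""
    n = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    after = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - after


labels = ["yes", "yes", "no", "no"]
useful = ["a", "a", "b", "b"]    # separates the classes perfectly
useless = ["a", "b", "a", "b"]   # tells us nothing about the class

print(information_gain(labels, useful))   # 1.0
print(information_gain(labels, useless))  # 0.0
```

A decision tree would split on the `useful` attribute first because it gives the larger information gain.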
Regression is used for predicting numerical outcomes. The formula for Simple
Linear Regression is:
y=mx+c
where m is the slope and c is the intercept. This is used in forecasting prices,
sales, etc.
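The slope and intercept of y = mx + c can be estimated by ordinary least squares. The sketch below assumes least squares as the fitting method (the text does not name one) and uses made-up points that lie exactly on y = 2x + 1; `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares estimates of slope m and intercept c for y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    c = mean_y - m * mean_x
    return m, c


xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
m, c = fit_line(xs, ys)
prediction = m * 5 + c  # predict y for a new input x = 5
print(m, c, prediction)  # 2.0 1.0 11.0
```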
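Correlation measures how strongly two variables are related. The Pearson
correlation coefficient is given by:
$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}$$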
5. Confusion Matrix
In machine learning, bias and variance are two main sources of model error.
High bias means the model is too simple and underfits the data.
High variance means the model is too complex and overfits the training data.
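The ideal model balances both; this balance is known as the bias-variance
tradeoff. The total error is:
$$\text{Total Error} = \text{Bias}^2 + \text{Variance} + \text{Irreducible Error}$$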
Module-2 Topic-3 Multiple Choice
Questions on Probability Concepts in AI
1.) What is a probability distribution?
a) A chart
b) Random events model
c) A number
d) A test
Answer: b
8.) Does binomial need fixed trials?
a) Yes
b) No
c) Maybe
d) Never
Answer: a
16.) What is α in testing?
a) Mean
b) Error level
c) Sample size
d) Data
Answer: b
23.) What is information gain?
a) More confusion
b) Less confusion
c) Error
d) Data size
Answer: b
31.) What is linear regression formula?
a) y = mx + c
b) x = my
c) y = x²
d) xy = c
Answer: a
39.) Correlation of +1 is?
a) No link
b) Strong positive
c) Strong negative
d) Weak
Answer: b
47.) What is variance?
a) Fixed model
b) Changing model
c) Wrong data
d) Simple data
Answer: b