
NIELIT EAST DELHI CENTRE
Institutional Area, FC-18, near Kendriya Vidyalaya, Karkardooma, East Delhi, 110092

Course: Certified Artificial Intelligence (AI) Associate "Upskilling"

Module 2: Acquiring Skills on Statistical Concepts

Module 2: Acquiring Skills on Statistical Concepts
Topic 2.1: Descriptive & Inferential Statistics in AI

●​ Role of Statistics in AI/ML
○​ Data Understanding: Summary Stats, Visualization, Outliers
○​ Feature Selection: Correlation, Mutual Info, PCA
○​ Model Evaluation: Accuracy, Precision, Recall, Confusion Matrix
○​ Probability Theory: Naïve Bayes, HMM, Bayesian Networks
○​ Optimization: Gradient Descent, Regularization, Hyperparameter Tuning
●​ Descriptive vs. Inferential Statistics
○​ Descriptive: Mean, Median, Mode, Variance, IQR
○​ Inferential: Hypothesis Testing, Confidence Intervals, Regression
○​ Probability Distributions: Normal, Binomial, Poisson
○​ Regression: Linear, Logistic
●​ MCQs on Statistical Concepts

Topic 2.2: Probability Concepts in Software Applications

●​ Probability Types
○​ Marginal, Joint, Conditional Probability
●​ Bayes' Theorem
○​ Applications: Medical Diagnosis, Spam Filtering
●​ Probability Distributions
○​ Normal, Binomial, Poisson
●​ Regression & Correlation
●​ Bias-Variance Tradeoff
●​ MCQs on Probability Concepts

Topic 2.3: Applications Using Statistical Techniques

●​ Probability Distributions
●​ Hypothesis Testing
●​ Entropy & Information Gain
●​ Regression & Correlation
●​ Confusion Matrix
●​ Bias & Variance
●​ MCQs on Statistical Techniques

Module Name: Acquiring Skills on Statistical Concepts

Topic - 2.1 Able to use Descriptive & Inferential Statistics concepts in data analysis
and algorithm development

a.) Role of statistics in AI and machine learning

Artificial intelligence has transformed industry after industry by enabling machines to process the large amounts of data available, uncover the patterns inside a data set, and make smarter decisions. Statistics gives AI models the means to draw meaning and insight from such data sets, and at its core lie descriptive and inferential statistics. These two branches play vital roles in building and running AI algorithms.

Data Understanding: Before building any AI or ML model, it is essential to

summarize, visualize, and interpret the data. This helps identify patterns,

anomalies, and relationships in the dataset.

Key Techniques in Data Understanding:

Summary Statistics: Mean, Median, Mode, Variance, Standard Deviation

Data Visualization: Histograms, Box Plots, Scatter Plots

Outlier Detection: Using Z-scores or the Interquartile Range (IQR)​

Example: Suppose you have a dataset of customer purchases. Descriptive

statistics can help find the average spending amount, detect unusual

transactions, and visualize spending trends over time.
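A minimal Python sketch of this idea, using the Z-score and IQR rules described above; the purchase amounts are made up purely for illustration:

import statistics

# Hypothetical purchase amounts, including one unusually large transaction
purchases = [118, 132, 125, 140, 127, 115, 150, 122, 138, 129,
             131, 124, 145, 119, 136, 128, 142, 121, 133, 126, 2000]

mean = statistics.mean(purchases)
median = statistics.median(purchases)
stdev = statistics.stdev(purchases)
print(f"Mean: {mean:.2f}, Median: {median:.2f}, Std Dev: {stdev:.2f}")

# Outlier detection with Z-scores (|z| > 3 is the usual rule of thumb)
z_outliers = [x for x in purchases if abs((x - mean) / stdev) > 3]

# Outlier detection with the IQR rule (1.5 * IQR beyond the quartiles)
q1, q2, q3 = statistics.quantiles(purchases, n=4)
iqr = q3 - q1
iqr_outliers = [x for x in purchases if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print("Z-score outliers:", z_outliers)
print("IQR outliers:", iqr_outliers)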

Feature Selection: In AI and ML, not all data points (features) contribute equally to model performance. Feature selection helps identify the most relevant and influential features, improving model accuracy and reducing computational cost.

Key Techniques in Feature Selection:

Correlation Analysis: Measures the relationship between variables.​

Mutual Information: Determines how much knowing one feature reduces


uncertainty about another.​

Principal Component Analysis (PCA): Reduces dimensionality by


transforming data into a smaller set of uncorrelated features.

Example: In predicting house prices, features like location, size, and

number of bedrooms may be more important than wall color or owner's

name. Feature selection helps eliminate irrelevant features.
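The sketch below illustrates this in Python, assuming NumPy and scikit-learn are available; the housing data are synthetic and the feature names are invented for illustration:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical housing features: size (sq ft), bedrooms, and an irrelevant "wall colour" code
size = rng.uniform(500, 3500, 200)
bedrooms = np.round(size / 800 + rng.normal(0, 0.5, 200))
wall_colour = rng.integers(0, 5, 200)                       # unrelated to price
price = 100 * size + 5000 * bedrooms + rng.normal(0, 20000, 200)

# Correlation of each feature with the target: irrelevant features score near 0
for name, col in zip(["size", "bedrooms", "wall_colour"], [size, bedrooms, wall_colour]):
    print(name, "corr with price:", round(np.corrcoef(col, price)[0, 1], 2))

# PCA: project the three features onto two uncorrelated components
X = np.column_stack([size, bedrooms, wall_colour])
X_reduced = PCA(n_components=2).fit_transform(X)            # in practice, standardise features first
print("Reduced shape:", X_reduced.shape)                    # (200, 2)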

Model Evaluation: AI models must be evaluated to determine their

effectiveness. Many evaluation metrics are based on statistical principles.

Common Model Evaluation Metrics:

Accuracy: Measures the proportion of correctly predicted outcomes.​

Precision & Recall: Useful for imbalanced datasets (e.g., detecting fraud where fraud cases are rare).

F1 Score: Harmonic mean of precision and recall, balancing false positives and
false negatives.​

Confusion Matrix: A table used to understand model performance on different


classes.

Example: In a spam email classifier, accuracy alone may not be sufficient.
Precision and recall help evaluate whether important emails are mistakenly
classified as spam or vice versa.
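A small Python illustration of these metrics, computed directly from hypothetical predicted and actual labels:

# Hypothetical spam-classifier outputs: 1 = spam, 0 = not spam
actual    = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 1, 0, 0, 0, 0, 1, 0]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

accuracy = (tp + tn) / len(actual)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"Confusion matrix: TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"Accuracy={accuracy:.2f} Precision={precision:.2f} Recall={recall:.2f} F1={f1:.2f}")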

Probability Theory:

Probability forms the basis of many AI algorithms, particularly in


probabilistic models such as:

Naïve Bayes Classifier: Uses Bayes' Theorem to predict the probability

of an event occurring.

Hidden Markov Models (HMM): Used in speech recognition and

natural language processing (NLP).

Bayesian Networks: Graphical models that represent probabilistic

relationships between variables.

Example: In spam detection, Bayesian probability calculates the likelihood of an

email being spam based on the frequency of certain words like “lottery” or “win.”
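The sketch below works through this word-frequency idea with Bayes' theorem; the email counts are assumed purely for illustration:

# Illustrative word counts from a labelled training set (assumed numbers)
spam_emails, ham_emails = 400, 600
spam_with_lottery, ham_with_lottery = 120, 6

p_spam = spam_emails / (spam_emails + ham_emails)           # P(spam)
p_word_given_spam = spam_with_lottery / spam_emails         # P("lottery" | spam)
p_word_given_ham = ham_with_lottery / ham_emails            # P("lottery" | not spam)

# Bayes' theorem: P(spam | "lottery")
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(f"P(spam | 'lottery') = {p_spam_given_word:.3f}")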

Optimization: Optimization techniques help AI models improve performance

by minimizing errors and maximizing accuracy. Many of these techniques

rely on statistical methods.

Common Optimization Techniques in AI:

Gradient Descent: An iterative method to minimize loss functions in

deep learning models.

Regularization (L1 & L2): Reduces overfitting by penalizing large

weights in neural networks.

Hyperparameter Tuning: Uses techniques like grid search and Bayesian

optimization to find the best model parameters.

Example: In training a neural network, gradient descent helps adjust

weights to minimize the difference between predicted and actual values (loss

function).
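A minimal gradient-descent sketch in Python for a one-parameter linear model; the data, learning rate, and number of steps are illustrative choices:

# Gradient descent for a one-parameter model y ≈ w * x (illustrative data)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]   # roughly y = 2x

w = 0.0                # initial weight
learning_rate = 0.01

for step in range(200):
    # Gradient of the mean squared error loss with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= learning_rate * grad      # move against the gradient

print(f"Learned weight: {w:.3f}")  # approaches about 2.0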

2. Types of statistics: Descriptive vs. Inferential

1. Descriptive Statistics: Descriptive statistics summarizes and arranges data to bring out meaningful patterns and generalizations. It helps make sense of raw data through measures such as the following:

Central Tendency: Mean, median, and mode, which describe the central value around which the data cluster.

Dispersion: Standard deviation, variance, and range measure how data points

spread around the mean.

Data Visualization: Histograms, box plots, and scatter plots provide graphical

representations of data distributions.

Types of Descriptive Statistics in AI:

a.) Measures of Frequency: These measures describe how often a particular

value appears in a dataset. They help in understanding the distribution of

data.

Common Measures:

Count: Number of times a value appears

Percentages: Proportion of occurrences in percentage

Frequency Distributions: Tables or charts showing how often

each value appears

Example: A survey finds that 50% of respondents prefer product A, 30% prefer product B, and 20% prefer product C. This is a measure of frequency.

b.) Measures of Central Tendency: These measures describe the center of

the dataset, representing a typical value.

Common Measures:

Mean (Average): Sum of values divided by the total number

Median: Middle value when data is arranged in order

Mode: Most frequently occurring value

​ Example: If five students score 80, 85, 90, 95, and 100, then:

Mean = (80+85+90+95+100) / 5 = 90

Median = 90 (Middle value)

Mode = None (No repeated values)

c.) Measures of Dispersion (Spread of Data): These measures indicate

how spread out the data values are around the central tendency.

Common Measures:

Range: Difference between the maximum and minimum values

Variance: Average squared difference from the mean

Standard Deviation: Square root of variance, indicating how

much values deviate from the mean

Example: Two datasets:

Set A: 50, 52, 53, 51, 50 (Less spread, small variance)

Set B: 10, 30, 50, 70, 90 (More spread, large variance)​

Set B has higher dispersion than Set A.
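These values can be checked with Python's standard statistics module (population variance is used here for simplicity):

import statistics

set_a = [50, 52, 53, 51, 50]
set_b = [10, 30, 50, 70, 90]

for name, data in [("Set A", set_a), ("Set B", set_b)]:
    print(name,
          "range =", max(data) - min(data),
          "variance =", round(statistics.pvariance(data), 2),
          "std dev =", round(statistics.pstdev(data), 2))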

​ ​
d.) Variability: Variability measures the inconsistency in data and how much values differ from one another.
Common Measures:

Coefficient of Variation (CV): Ratio of standard deviation to the mean,


used for comparing variability across datasets.

Interquartile Range (IQR): Difference between the 75th percentile


and the 25th percentile, removing extreme values (outliers).

Example: In a company, employee salaries may have high variability

(some earn ₹20,000 while others earn ₹200,000), indicating high

inequality.

2.) Inferential Statistics: Whereas descriptive statistics summarize the existing data, inferential statistics allow AI systems to make predictions and generalize insights from a sample to a larger population.

This includes methods such as:

Hypothesis Testing: Determines whether assumptions made about data distributions are valid.

Confidence Intervals: Provides a range within which a population

parameter is expected to lie.

Regression Analysis: Predicts relationships between variables and their

influence on outcomes.

Bayesian Inference: Utilizes prior knowledge to update probabilities,

aiding AI models in decision-making.

Probability Distributions: Normal, Poisson and Binomial distributions

help model uncertainty in AI predictions.

Types of Inferential Statistics

Hypothesis Testing: t-tests, Chi-square test

​ ​

Summary

t-Test → Compares means (numerical data).​

Chi-Square Test → Compares categorical data.
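A short sketch of both tests using SciPy (assuming scipy is installed; the scores and the contingency table are invented for illustration):

from scipy import stats

# t-test: do two groups have the same mean? (illustrative exam scores)
group_a = [72, 75, 78, 80, 69, 74, 77]
group_b = [85, 88, 82, 90, 86, 84, 87]
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")   # small p -> the means differ

# Chi-square test: is product preference independent of group? (illustrative counts)
observed = [[30, 20],    # group 1 preferring product A / product B
            [25, 35]]    # group 2 preferring product A / product B
chi2, p, dof, expected = stats.chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, dof = {dof}")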

Probability Distributions: Normal, Binomial, Poisson distributions

1️⃣ Normal Distribution (Gaussian Distribution)

Used for: Modeling natural phenomena like heights, exam scores, and errors in
measurements.
Shape: Bell-shaped, symmetric around the mean.

Formula:

f(x) = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²))

Where:

μ = Mean (center)

σ = Standard deviation (spread)

Example: IQ scores follow a normal distribution with a mean of 100 and a


standard deviation of 15.

2️⃣ Binomial Distribution

Used for: Modeling the number of successes in a fixed number of independent


trials (e.g., flipping a coin, passing/failing an exam).

Formula:

P(X = k) = C(n, k) · p^k · (1 − p)^(n − k)
Example: If you flip a fair coin 10 times, the probability of getting exactly 6
heads follows a binomial distribution.
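This probability can be computed directly from the binomial formula; a one-line Python check:

from math import comb

n, k, p = 10, 6, 0.5
prob = comb(n, k) * p**k * (1 - p)**(n - k)
print(f"P(exactly 6 heads in 10 flips) = {prob:.4f}")   # about 0.2051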

3️⃣ Poisson Distribution

Used for: Modeling rare events occurring in a fixed interval (e.g., number of
customer arrivals per minute, number of defects in a product).
Formula:

P(X = k) = (λ^k · e^(−λ)) / k!

Where:

λ = Average rate of occurrence


k = Number of occurrences

Example: If a call center gets 5 calls per minute, the Poisson distribution can
predict the probability of receiving exactly 7 calls in a minute.
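A quick Python check of this Poisson probability using the formula above:

from math import exp, factorial

lam, k = 5, 7   # average of 5 calls per minute; probability of exactly 7 calls
prob = (lam**k) * exp(-lam) / factorial(k)
print(f"P(7 calls in a minute) = {prob:.4f}")   # about 0.1044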

3. Regression Analysis: Linear & Logistic Regression

1️⃣ Linear Regression:

Used for: Predicting a continuous (numeric) outcome based on one or more input
variables.
Formula:

Y = mX + b, where m is the slope and b is the intercept.
Types of Linear Regression:

Simple Linear Regression → One independent variable (e.g., predicting salary


based on experience).

Multiple Linear Regression → Multiple independent variables (e.g., predicting


house price based on size, location, and number of rooms).

Example: Predicting house prices based on square footage.
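A minimal least-squares sketch in Python for this kind of prediction; the sizes and prices are invented for illustration:

# Simple linear regression by least squares (illustrative sizes and prices)
sizes  = [800, 1000, 1200, 1500, 1800, 2200]        # square feet
prices = [150, 190, 225, 280, 330, 400]             # in thousands

n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(prices) / n

# slope m and intercept b of the best-fit line y = m*x + b
num = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices))
den = sum((x - mean_x) ** 2 for x in sizes)
m = num / den
b = mean_y - m * mean_x

print(f"price ≈ {m:.3f} * size + {b:.1f}")
print(f"Predicted price for 1600 sq ft: {m * 1600 + b:.0f} thousand")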


2️⃣ Logistic Regression

Used for: Predicting a categorical outcome (e.g., Yes/No, 0/1, Spam/Not Spam).

Formula (Sigmoid Function):

P(y = 1) = 1 / (1 + e^(−z)), where z = b₀ + b₁x
Example: Predicting whether a customer will buy a product (Yes/No) based on
their income and browsing history.
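A small sketch of the sigmoid in Python; the coefficients b0, b1, b2 are assumed (not fitted) purely to show how the predicted probability responds to the inputs:

from math import exp

def sigmoid(z):
    return 1 / (1 + exp(-z))

# Assumed (not fitted) coefficients: z = b0 + b1 * income_in_thousands + b2 * minutes_browsing
b0, b1, b2 = -6.0, 0.05, 0.08

for income, minutes in [(40, 10), (80, 45)]:
    p_buy = sigmoid(b0 + b1 * income + b2 * minutes)
    print(f"income={income}k, browsing={minutes}min -> P(buy) = {p_buy:.2f}")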

Module 2, Topic 2.1: Multiple Choice Questions on Statistical Concepts in AI

1. What do statistics do in AI?


a) Design hardware
b) Give insights into data
c) Write code
d) Remove data
Answer: b

2. Which statistics summarizes data?


a) Inferential
b) Descriptive
c) Predictive
d) Computational
Answer: b

3. What does inferential statistics help with?


a) Summarize data
b) Predict for a population
c) Visualize data
d) Choose features
Answer: b

4. Why is understanding data important in AI?


a) To pick hardware
b) To find patterns
c) To increase data
d) To optimize models
Answer: b

5. Which is a summary statistic?
a) Mean
b) Gradient Descent
c) Confusion Matrix
d) Bayesian Network
Answer: a

6. What does a histogram show?


a) Data frequency
b) Variable relationships
c) Model accuracy
d) Outliers
Answer: a

7. Which plot shows outliers?


a) Scatter Plot
b) Box Plot
c) Histogram
d) Pie Chart
Answer: b

8. How are outliers found using Z-scores?


a) Z-scores > 3 or < -3
b) Equal to mean
c) Inside IQR
d) Most frequent
Answer: a

9. What does Interquartile Range (IQR) find?


a) Mean
b) Outliers
c) Correlation
d) Accuracy
Answer: b

10. What can descriptive statistics find in customer data?


a) Average spending
b) Hardware needs
c) Optimization method
d) Event probability
Answer: a

11. What is the goal of feature selection?
a) Add features
b) Improve model accuracy
c) Visualize data
d) Summarize data
Answer: b

12. Which method checks variable relationships?


a) Correlation Analysis
b) Gradient Descent
c) Standard Deviation
d) Mutual Information
Answer: a

13. What does Mutual Information measure?


a) Data spread
b) Feature uncertainty reduction
c) Dataset mean
d) Value frequency
Answer: b

14. What does PCA do?


a) Increase data size
b) Reduce dimensions
c) Find variance
d) Detect outliers
Answer: b

15. Which feature matters more for house prices?


a) Wall color
b) Location
c) Owner’s name
d) Paint brand
Answer: b

16. How does feature selection save computing power?
a) Adds variables
b) Removes irrelevant features
c) Increases data size
d) Visualizes data
Answer: b

17. Which is NOT a feature selection method?


a) Correlation Analysis
b) Histogram
c) PCA
d) Mutual Information
Answer: b

18. What does optimization do in AI?


a) Summarize data
b) Reduce errors
c) Visualize patterns
d) Count frequencies
Answer: b

19. Which method is used in deep learning optimization?


a) Z-score
b) Gradient Descent
c) Box Plot
d) IQR
Answer: b

20. What does Gradient Descent change in a neural network?


a) Data size
b) Weights
c) Correlations
d) Visualizations
Answer: b

21. Which method reduces large weights?


a) L1 & L2 Regularization
b) PCA
c) Mutual Information
d) Summary Statistics
Answer: a

22. What is hyperparameter tuning for?
a) Summarize data
b) Find best model settings
c) Calculate variance
d) Detect outliers
Answer: b

23. Which method helps tune hyperparameters?


a) Grid Search
b) Histogram
c) Standard Deviation
d) Scatter Plot
Answer: a

24. What does minimizing loss in a neural network do?


a) Increases data
b) Reduces prediction errors
c) Shows trends
d) Finds features
Answer: b

25. What do statistics help AI algorithms do?


a) Replace data
b) Make better decisions
c) Design hardware
d) Remove outliers
Answer: b

26. Which is a visualization method?


a) Gradient Descent
b) Scatter Plot
c) Regularization
d) Bayesian Optimization
Answer: b

27. What can summary statistics find?


a) Model accuracy
b) Unusual data
c) Hardware speed
d) Probabilities
Answer: b

28. Which measures data spread?
a) Median
b) Variance
c) Mode
d) Correlation
Answer: b

29. What does a box plot show?


a) Frequency
b) Spread and outliers
c) Correlations
d) Model performance
Answer: b

30. What does PCA help with?


a) Add features
b) Simplify models
c) Increase cost
d) Find mean
Answer: b

31. Which method uses statistical dependency?


a) Mutual Information
b) Gradient Descent
c) Histogram
d) Z-score
Answer: a

32. What can visualization show in customer data?


a) Hardware needs
b) Spending trends
c) Model settings
d) Success probability
Answer: b

33. What does L1 regularization prevent?


a) Model complexity
b) Overfitting
c) Data visualization
d) Frequency counts
Answer: b

34. Which is a real-time AI use of statistics?
a) Chip design
b) Finding customer patterns
c) Coding without data
d) Removing outliers
Answer: b

35. What does standard deviation measure?


a) Central value
b) Data spread
c) Frequency
d) Accuracy
Answer: b

36. Which is NOT used in data understanding?


a) Summary Statistics
b) Visualization
c) Gradient Descent
d) Outlier Detection
Answer: c

37. What does feature selection improve?


a) Data noise
b) Model accuracy
c) Data size
d) Visualization
Answer: b

38. What does Bayesian optimization do?


a) Calculate variance
b) Tune hyperparameters
c) Visualize data
d) Find outliers
Answer: b

39. Which plot shows variable relationships?


a) Histogram
b) Box Plot
c) Scatter Plot
d) Pie Chart
Answer: c

40. What does high correlation mean?
a) No relation
b) Strong relation
c) Outliers
d) Identical data
Answer: b

41. Why are data patterns important in AI?


a) Choose hardware
b) Make smart decisions
c) Increase data
d) Remove statistics
Answer: b

42. Which is an optimization method?


a) Histogram
b) Regularization
c) Box Plot
d) Frequency Table
Answer: b

43. What does the mean show?


a) Most frequent value
b) Average value
c) Middle value
d) Data range
Answer: b

44. Which method finds data anomalies?


a) Gradient Descent
b) Outlier Detection
c) PCA
d) Correlation
Answer: b

45. What does removing irrelevant features improve?


a) Visualization
b) Model efficiency
c) Data size
d) Hardware
Answer: b

46. What do statistics measure in data relationships?
a) Algorithm design
b) Correlations
c) Data reduction
d) Visualization
Answer: b

47. Which method reduces errors in optimization?


a) Summary Statistics
b) Gradient Descent
c) Z-score
d) Histogram
Answer: b

48. What does a scatter plot show?


a) Frequency
b) Variable relationships
c) Model accuracy
d) Hardware speed
Answer: b

49. Why reduce overfitting in AI?


a) Increase complexity
b) Improve generalization
c) Summarize data
d) Visualize data
Answer: b

50. What do statistics provide in AI?


a) Hardware details
b) Data insights
c) Code syntax
d) Data removal
Answer: b

2.2 Develop software applications using Probability Concepts:
Marginal, Joint & Conditional Probability, Bayes' Theorem

1. Marginal, Joint, and Conditional Probability


(a) Marginal Probability

Marginal probability refers to the probability of a single event occurring, regardless


of other variables. It is obtained by summing (or integrating) over possible values
of other variables.

🔹 Formula:
For two events A and B, the marginal probability of A is:

P(A) = Σ_B P(A ∩ B)

For continuous distributions, the summation is replaced by integration.

📌 Example:
If we have the probability distribution of students' grades based on study hours and IQ, then the probability of a student getting an "A" regardless of IQ is a marginal probability.

(b) Joint Probability

Joint probability refers to the probability of two (or more) events occurring
together.

🔹 Formula:
For events A and B, the joint probability is:

P(A ∩ B) = P(A | B) · P(B)   (for independent events, P(A ∩ B) = P(A) · P(B))

This means the probability that both A and B happen at the same time.

📌 Example:
The probability that a randomly selected student studies for more than 5 hours and scores an "A" is a joint probability.

(c) Conditional Probability

Conditional probability is the probability of an event occurring given that another


event has already occurred.

🔹 Formula:

P(A | B) = P(A ∩ B) / P(B)

This means we focus only on cases where B has already happened.

📌 Example:
The probability that a student scores an "A" given that they have studied for more than 5 hours.
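All three quantities can be read off a joint probability table; the sketch below uses an assumed table over study hours and grades:

# Hypothetical joint probabilities over (study hours, grade)
joint = {
    ("more_than_5h", "A"): 0.25, ("more_than_5h", "not_A"): 0.15,
    ("5h_or_less",  "A"): 0.10, ("5h_or_less",  "not_A"): 0.50,
}

# Marginal: P(grade = A), summing over study hours
p_a = sum(p for (hours, grade), p in joint.items() if grade == "A")

# Joint: P(more than 5h AND grade = A)
p_joint = joint[("more_than_5h", "A")]

# Conditional: P(grade = A | more than 5h) = P(A and >5h) / P(>5h)
p_more5 = sum(p for (hours, grade), p in joint.items() if hours == "more_than_5h")
p_a_given_more5 = p_joint / p_more5

print(f"P(A) = {p_a:.2f}")                    # 0.35
print(f"P(>5h and A) = {p_joint:.2f}")        # 0.25
print(f"P(A | >5h) = {p_a_given_more5:.3f}")  # 0.625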

2. Bayes' Theorem and Applications


Bayes' Theorem provides a way to update probabilities based on new
evidence.
🔹 Formula:

P(A | B) = P(B | A) · P(A) / P(B)

📌 Example 1: Medical Diagnosis
If a test for a disease is 90% accurate but the disease is rare (1% prevalence), Bayes' theorem helps determine the true probability that a person has the disease given a positive test result.
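A short Python check of this example, assuming the "90% accurate" test has 90% sensitivity and 90% specificity (an interpretation made for illustration):

# Assumptions: 90% sensitivity, 90% specificity, 1% prevalence
prevalence = 0.01
sensitivity = 0.90            # P(positive | disease)
false_positive_rate = 0.10    # 1 - specificity = P(positive | no disease)

p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
p_disease_given_positive = sensitivity * prevalence / p_positive

print(f"P(disease | positive test) = {p_disease_given_positive:.3f}")  # about 0.083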

📌 Example 2: Spam Filtering
Email spam filters use Bayes' theorem to determine whether an email is spam based on the occurrence of certain words.

📌 Example 3: Machine Learning
Naïve Bayes classifiers in Natural Language Processing (NLP) use Bayes' theorem for text classification.

Summary Table

Marginal Probability — P(A) = Σ_B P(A ∩ B) — probability of one event, regardless of others
Joint Probability — P(A ∩ B) — probability of two events occurring together
Conditional Probability — P(A | B) = P(A ∩ B) / P(B) — probability of A given that B has occurred
Bayes' Theorem — P(A | B) = P(B | A) · P(A) / P(B) — updates a probability with new evidence

Probability Distributions

1. Normal Distribution (Gaussian Distribution)

Normal distribution, also known as the Gaussian distribution, is a probability


distribution that is symmetric about the mean, showing that data near the mean are
more frequent in occurrence than data far from the mean. The normal distribution
appears as a "bell curve" when graphed.

Formula:

f(x) = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²))
Question on Normal Distribution

Ques-1) Calculate the probability density function of normal distribution using the
following data. x = 3, μ = 4 and σ = 2.

Ans-1) Given, variable, x = 3


Mean = 4 and

Standard deviation = 2

By the formula of the probability density of the normal distribution, we can write:

f(3; 4, 2) = (1 / (2√(2π))) · e^(−(3 − 4)² / (2 · 2²)) = 0.1995 × 0.8825

Hence, f(3, 4, 2) ≈ 0.176.

Ques-2) If the value of the random variable is 2, mean is 5 and the standard
deviation is 4, then find the probability density function of the gaussian
distribution.

Ans-2) Given,
Variable, x = 2

Mean = 5 and

Standard deviation = 4

f(2; 5, 4) = (1 / (4√(2π))) · e^(−(2 − 5)² / (2 · 4²)) = 0.0997 × e^(−0.28125)

f(2; 5, 4) ≈ 0.0753
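Both results can be verified with a small Python function implementing the density formula:

from math import sqrt, pi, exp

def normal_pdf(x, mu, sigma):
    # Probability density of the normal distribution
    return (1 / (sigma * sqrt(2 * pi))) * exp(-((x - mu) ** 2) / (2 * sigma ** 2))

print(round(normal_pdf(3, 4, 2), 4))   # about 0.176
print(round(normal_pdf(2, 5, 4), 4))   # about 0.0753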

Key Features of Normal Distribution :

Bell-shaped curve.​

Mean = Median = Mode.​

68-95-99.7 rule (68% within 1 std. dev., 95% within 2, 99.7% within 3).​
Example: Heights of people follow a normal distribution.

2. Binomial Distribution

Definition: A discrete probability distribution for the number of successes in n


independent trials.​

Formula:

P(X = k) = C(n, k) · p^k · (1 − p)^(n − k), where p is the probability of success on each trial.
3. Poisson Distribution

Definition: A discrete probability distribution that models the number of events


occurring in a fixed interval.​
Formula:

P(X = k) = (λ^k · e^(−λ)) / k!

where λ is the expected number of events in a given time.

Key Features:
* Describes rare events.
* Mean and variance both equal λ.

Example: Number of calls received at a call center per hour.

Regression and Correlation

1. Regression

Definition: A statistical technique to model the relationship between dependent


and independent variables.​

Types:​

Linear Regression: Y = mX + b (predicting a continuous value).

Logistic Regression: Used for classification (predicting probability of


categories).​

📌 Example: Predicting house prices based on size.

2. Correlation

Definition: Measures the strength and direction of the relationship between two
variables.​

Correlation Coefficient (r) Range:

+1: Perfect positive correlation.

0: No correlation.

−1: Perfect negative correlation.

📌 Example: A strong positive correlation between study hours and exam scores.
Bias-Variance Tradeoff

Definition: A fundamental problem in machine learning that describes the


tradeoff between bias and variance.​

Key Concepts:​

Bias: Error due to overly simple models (underfitting).​

Variance: Error due to overly complex models (overfitting).​

Goal: Find a balance to minimize total error.​

📌 Example:
High Bias: A straight line trying to fit complex data (too simple, underfitting).​

High Variance: A very wiggly curve fitting all data points exactly (too
complex, overfitting).
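A small NumPy sketch of the tradeoff; the synthetic sine data and the polynomial degrees (1 and 10) are illustrative choices:

import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 60)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, x.size)   # noisy non-linear data

x_train, y_train = x[::2], y[::2]     # half of the points for training
x_test, y_test = x[1::2], y[1::2]     # the other half for testing

for degree in (1, 10):                # degree 1 = too simple, degree 10 = very flexible
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")

# The straight line has high error on both sets (high bias, underfitting);
# the high-degree polynomial fits the training set much better than the test set
# (high variance, overfitting).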

Module 2, Topic 2.2: Multiple Choice Questions on Probability Concepts in AI
1.) What is marginal probability?​
a) Probability of two events​
b) Probability of one event​
c) Probability given another event​
d) Probability of no event​
Answer: b​

2.) What does joint probability measure?​


a) One event alone​
b) Two events together​
c) Event after another​
d) Event difference​
Answer: b​

3.) What is conditional probability?​


a) Probability of one event​
b) Probability given another event​
c) Probability of both events​
d) Probability of no event​
Answer: b​

4.) What does Bayes' Theorem do?​


a) Finds variance​
b) Updates probabilities​
c) Measures correlation​
d) Calculates mean​
Answer: b​

5.) What shape is a normal distribution?​


a) Flat​
b) Bell-shaped​
c) Skewed​
d) Straight​
Answer: b​

6.) In a normal distribution, what equals the mean?​


a) Variance​
b) Median and mode​
c) Standard deviation​
d) Correlation​
Answer: b​

7.) What percent of data is within one standard deviation in a normal distribution?​
a) 50%​
b) 68%​
c) 95%​
d) 99.7%​
Answer: b​

8.) What type of distribution is binomial?​
a) Continuous​
b) Discrete​
c) Uniform​
d) Skewed​
Answer: b​

9.) What does Poisson distribution model?​


a) Continuous data​
b) Rare events​
c) Fixed trials​
d) Normal data​
Answer: b​

10.) If P(A) = 0.4 and P(B) = 0.5, and A and B are independent, what is P(A ∩ B)?​
a) 0.9​
b) 0.2​
c) 0.1​
d) 0.4​
Answer: b​

11.) What does correlation measure?​


a) Data spread​
b) Relationship strength​
c) Average value​
d) Probability​
Answer: b​

12.) What is the range of the correlation coefficient?​


a) 0 to 1​
b) -1 to 1​
c) -2 to 2​
d) 1 to 2​
Answer: b​

13.) What is logistic regression used for?​


a) Predicting numbers​
b) Classification​
c) Finding variance​
d) Plotting data​
Answer: b​

14.) What does high variance cause in a model?​


a) Underfitting​
b) Overfitting​
c) No error​
d) Low bias​
Answer: b​

15.) In Bayes' Theorem, what is P(B)?​
a) Posterior​
b) Prior​
c) Evidence​
d) Likelihood​
Answer: c​

16.) If A and B are independent, what is P(A ∩ B)?​


a) P(A) + P(B)​
b) P(A) * P(B)​
c) P(A) - P(B)​
d) P(A) / P(B)​
Answer: b​

17.) What is the expected value of a Poisson distribution?​


a) 1​
b) λ​
c) n​
d) σ²​
Answer: b​

18.) Is a normal distribution symmetrical?​


a) Yes​
b) No​
c) Sometimes​
d) Never​
Answer: a​

19.) What rule applies to normal distribution?​


a) Bayes' Rule​
b) 68-95-99.7 Rule​
c) Chebyshev's Rule​
d) Bernoulli's Rule​
Answer: b​

20.) What does regression predict?​


a) Variance​
b) Relationships​
c) Probability​
d) Outliers​
Answer: b​

21.) What distribution models calls per hour?​


a) Normal​
b) Binomial​
c) Poisson​
d) Uniform​
Answer: c

22.) In a binomial experiment, trials are:​


a) Dependent​
b) Independent​
c) Continuous​
d) Uniform​
Answer: b​

23.) In linear regression, what is the dependent variable?​
a) x​
b) y​
c) m​
d) b​
Answer: b​

24.) In Y = mX + b, what is m?​


a) Intercept​
b) Slope​
c) Error​
d) Mean​
Answer: b​

25.) What does high bias cause?​


a) Overfitting​
b) Underfitting​
c) No error​
d) High variance​
Answer: b​

26.) What is the total area under a probability density curve?​


a) 0​
b) 1​
c) 2​
d) Depends​
Answer: b​

27.) What does Poisson distribution assume?​


a) Fixed trials​
b) Rare events​
c) Uniform outcomes​
d) Large population​
Answer: b​

28.) Which distribution is used for binary classification?​


a) Normal​
b) Logistic​
c) Linear​
d) Poisson​
Answer: b​

29.) What does a correlation of 0 mean?​


a) Strong positive​
b) Strong negative​
c) No relationship​
d) Perfect correlation​
Answer: c​

30.) In normal distribution, what is σ?​


a) Mean​
b) Median​
c) Standard deviation​
d) Mode​
Answer: c​

31.) If events A and B are mutually exclusive, what is P(A ∩ B)?​
a) 0​
b) 1​
c) P(A)​
d) P(B)​
Answer: a​

32.) Which is not a valid probability?​


a) 0​
b) 0.5​
c) 1​
d) 1.5​
Answer: d​

33.) When mean equals variance, which distribution is it?​


a) Binomial​
b) Normal​
c) Poisson​
d) Uniform​
Answer: c​

34.) In a positively skewed distribution, what is true?​


a) Mean < Median​
b) Mean > Median​
c) Mean = Median​
d) Median > Mode​
Answer: b​

35.) Where is bias-variance tradeoff important?​


a) Sampling​
b) Machine learning​
c) Hypothesis testing​
d) Distribution modeling​
Answer: b​

36.) What is the goal of bias-variance tradeoff?​


a) Maximize bias​
b) Minimize variance​
c) Minimize total error​
d) Increase correlation​
Answer: c​

37.) What is a perfect positive correlation?​


a) 0​
b) -1​
c) 0.5​
d) +1​
Answer: d​

38.) Which formula is Bayes' Theorem?​


a) P(A ∩ B) = P(A) + P(B)​
b) P(A | B) = P(B | A) P(A) / P(B)​
c) P(A) = P(A | B) / P(B)​
d) P(A) = P(B)​
Answer: b​

39.) A model fitting training data too well has:​
a) High bias​
b) High variance​
c) Low variance​
d) No error​
Answer: b​

40.) What is the outcome of a Bernoulli trial?​


a) Many outcomes​
b) Continuous​
c) Success or failure​
d) Multivariate​
Answer: c​

41.) What is the sum of probabilities in a binomial distribution?​


a) 0​
b) 1​
c) Depends on p​
d) Depends on n​
Answer: b​

42.) If P(A ∩ B) = 0.1 and P(B) = 0.5, what is P(A | B)?​


a) 0.2​
b) 0.5​
c) 0.1​
d) 0.6​
Answer: a​

43.) What is regression used for?​


a) Find probability​
b) Estimate relationships​
c) Calculate variance​
d) Find sample size​
Answer: b

44.) What is not needed in Bayes' Theorem?​


a) Prior probability​
b) Variance​
c) Likelihood​
d) Evidence​
Answer: b

45.) When does overfitting happen?​


a) Model is too simple​
b) Model is too complex​
c) Too little data​
d) No error​
Answer: b​

46.) What is the mean of a binomial distribution?​


a) np​
b) n/p​
c) p/n​
d) n²p​
Answer: a​

47.) What does correlation measure?​
a) Causation​
b) Strength and direction​
c) Mean​
d) Outliers​
Answer: b​

48.) What is the probability of the complement of A?​


a) P(A)​
b) 1 + P(A)​
c) 1 - P(A)​
d) P(A ∩ A)​
Answer: c​

49.) In Bayes' Theorem, P(B) is called:​


a) Posterior​
b) Prior​
c) Evidence​
d) Likelihood​
Answer: c​

50.) In normal distribution, 95% of data is within:​


a) 1 standard deviation​
b) 2 standard deviations​
c) 3 standard deviations​
d) 4 standard deviations​
Answer: b​

2.3 Able to write applications using Probability Distributions,
Hypothesis Test, Entropy & Information Gain, Regression &
Correlation, Confusion Matrix, Bias & Variance

1. Probability Distributions

Probability distributions are used to model randomness and uncertain events. For
example, the Normal distribution (Gaussian) is symmetric and commonly used
for natural phenomena like height or test scores. Its formula is:

f(x) = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²))

where μ is the mean and σ² is the variance.
The Binomial distribution is used for discrete outcomes like success/failure, with the PMF:

P(X = k) = C(n, k) · p^k · (1 − p)^(n − k)

The Poisson distribution models rare events, like customer arrivals or system failures:

P(X = k) = (λ^k · e^(−λ)) / k!

These distributions are widely applied in risk assessment, quality control, and
prediction models.

2. Hypothesis Testing

Hypothesis testing is used to test assumptions or claims about data. There are two
hypotheses: Null Hypothesis (H0​) and Alternative Hypothesis (H1​).​

We calculate a test statistic (like z-score or t-score) and derive the p-value. If the
p-value < significance level (α), we reject the null hypothesis.​

Example: A company may use a two-sample t-test to check whether a new


marketing strategy significantly increases sales compared to the old one.

3. Entropy & Information Gain

Entropy measures uncertainty in a dataset and is calculated as:

H(S) = −Σᵢ pᵢ log₂(pᵢ)

where pᵢ is the proportion of class i.

Information Gain (IG) measures the reduction in entropy after a dataset is split using an attribute A:

IG(S, A) = H(S) − Σᵥ (|Sᵥ| / |S|) · H(Sᵥ)

where the sum runs over the values v of attribute A and Sᵥ is the subset of S taking value v.

This is used in decision tree algorithms (like ID3, C4.5) to choose the best features
for splitting.
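A minimal Python sketch of entropy and information gain, using classic play-tennis-style counts as assumed data:

from math import log2

def entropy(labels):
    total = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return -sum((n / total) * log2(n / total) for n in counts.values())

# Hypothetical dataset: play tennis? (yes/no), split by the attribute "outlook"
parent = ["yes"] * 9 + ["no"] * 5
sunny    = ["yes", "yes", "no", "no", "no"]
overcast = ["yes", "yes", "yes", "yes"]
rainy    = ["yes", "yes", "yes", "no", "no"]

h_parent = entropy(parent)
weighted = sum(len(s) / len(parent) * entropy(s) for s in (sunny, overcast, rainy))
info_gain = h_parent - weighted

print(f"Entropy before split: {h_parent:.3f}")             # about 0.940
print(f"Information gain of 'outlook': {info_gain:.3f}")   # about 0.247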

4. Regression & Correlation

Regression is used for predicting numerical outcomes. The formula for Simple
Linear Regression is:

y=mx+c

where m is the slope and c is the intercept. This is used in forecasting prices, sales, etc.

Correlation measures how strongly two variables are related. The Pearson correlation coefficient is given by:

r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √( Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)² )

A value of r = 1 indicates a perfect positive correlation.
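A short Python computation of r for assumed study-hours and exam-score data:

from math import sqrt

study_hours = [1, 2, 3, 4, 5, 6]
exam_scores = [52, 55, 61, 70, 74, 81]

n = len(study_hours)
mean_x = sum(study_hours) / n
mean_y = sum(exam_scores) / n

cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(study_hours, exam_scores))
sx = sqrt(sum((x - mean_x) ** 2 for x in study_hours))
sy = sqrt(sum((y - mean_y) ** 2 for y in exam_scores))

r = cov / (sx * sy)
print(f"Pearson r = {r:.3f}")   # close to +1: strong positive correlation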

5. Confusion Matrix

A confusion matrix is used to evaluate classification model performance. It


consists of:

True Positive (TP): Correctly predicted positive cases​

True Negative (TN): Correctly predicted negative cases​

False Positive (FP): Incorrectly predicted positive​

False Negative (FN): Missed actual positive​


From this, we calculate:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall (Sensitivity) = TP / (TP + FN)
F1 Score = 2 · (Precision · Recall) / (Precision + Recall)

This helps analyze how well a model is performing in different real-world


scenarios like spam detection or medical diagnosis.

6. Bias & Variance

In machine learning, bias and variance are two main sources of model error.

High bias means the model is too simple and underfits the data.​

High variance means the model is too complex and overfits the training data.​
The ideal model balances both, known as the bias-variance tradeoff. The total error is:

Total Error = Bias² + Variance + Irreducible Error
Module 2, Topic 2.3: Multiple Choice Questions on Statistical Techniques in AI
1.) What is a probability distribution?​
a) A chart​
b) Random events model​
c) A number​
d) A test​
Answer: b​

2.) What is normal distribution's shape?​


a) Flat​
b) Bell​
c) Line​
d) Circle​
Answer: b​

3.) What does binomial count?​


a) Heights​
b) Successes​
c) Rare events​
d) Time​
Answer: b​

4.) What does Poisson count?​


a) Normal data​
b) Rare events​
c) Trials​
d) Numbers​
Answer: b​

5.) What is the mean in normal distribution?​


a) Spread​
b) Middle​
c) Total​
d) Error​
Answer: b​

6.) What is the sum of probabilities?​


a) 0​
b) 1​
c) 2​
d) Any​
Answer: b​

7.) In Poisson, is variance same as mean?​


a) No​
b) Yes​
c) Sometimes​
d) Never​
Answer: b​

8.) Does binomial need fixed trials?​
a) Yes​
b) No​
c) Maybe​
d) Never​
Answer: a​

9.) What models device life?​


a) Binomial​
b) Poisson​
c) Exponential​
d) Normal​
Answer: c​

10.) What is binomial variance?​


a) np​
b) np(1-p)​
c) λ​
d) n​
Answer: b​

11.) What is null hypothesis?​


a) We prove it​
b) We assume it​
c) It’s false​
d) A guess​
Answer: b​

12.) If p-value is low (<0.05), what do we do?​


a) Keep null​
b) Reject null​
c) Stop test​
d) Ignore​
Answer: b​

13.) What is Type I error?​


a) Wrong reject of true null​
b) Wrong keep of false null​
c) Right guess​
d) No data​
Answer: a​

14.) What is a common confidence level?​


a) 50%​
b) 95%​
c) 10%​
d) 80%​
Answer: b​

15.) When is two-tailed test used?​


a) Direction known​
b) No direction​
c) Small data​
d) Big data​
Answer: b​

16.) What is α in testing?​
a) Mean​
b) Error level​
c) Sample size​
d) Data​
Answer: b​

17.) What test compares two means?​


a) t-test​
b) Chi-square​
c) ANOVA​
d) z-test​
Answer: a​

18.) If p-value is high (>0.05), what happens?​


a) Reject null​
b) Keep null​
c) Prove alternative​
d) End test​
Answer: b​

19.) What is hypothesis testing for?​


a) Drawing​
b) Deciding​
c) Guessing​
d) Counting​
Answer: b​

20.) What is alternative hypothesis?​


a) Always true​
b) We test it​
c) Always false​
d) Ignored​
Answer: b​

21.) What is entropy?​


a) Speed​
b) Confusion​
c) Size​
d) Accuracy​
Answer: b​

22.) When is entropy high?​


a) One class wins​
b) Classes are equal​
c) No data​
d) All same​
Answer: b​

23.) What is information gain?​
a) More confusion​
b) Less confusion​
c) Error​
d) Data size​
Answer: b

24.) What is entropy of same-class data?​


a) 0​
b) 1​
c) -1​
d) Big​
Answer: a​

25.) Where is information gain used?​


a) Regression​
b) Decision trees​
c) Graphs​
d) Tests​
Answer: b​

26.) High information gain is?​


a) Bad​
b) Good​
c) Wrong​
d) Same​
Answer: b​

27.) What is entropy formula?​


a) Σp²​
b) -Σp log₂ p​
c) Σx​
d) -Σp​
Answer: b​

28.) Which uses information gain?​


a) KNN​
b) ID3​
c) SVM​
d) Bayes​
Answer: b​

29.) Zero information gain means?​


a) Best split​
b) No help​
c) Error​
d) Overfit​
Answer: b​

30.) What does regression predict?​


a) Labels​
b) Numbers​
c) Groups​
d) Errors​
Answer: b​

31.) What is linear regression formula?​
a) y = mx + c​
b) x = my​
c) y = x²​
d) xy = c​
Answer: a​

32.) In regression, m is?​


a) Start​
b) Slope​
c) Error​
d) Total​
Answer: b​

33.) What does correlation show?​


a) Cause​
b) Link​
c) Error​
d) Size​
Answer: b​

34.) What is correlation range?​


a) 0 to 1​
b) -1 to 1​
c) -2 to 2​
d) 1 to 10​
Answer: b​

35.) Correlation of 0 means?​


a) Strong link​
b) No link​
c) Negative link​
d) Big link​
Answer: b​

36.) High R-squared means?​


a) Bad fit​
b) Good fit​
c) Error​
d) Bias​
Answer: b​

37.) Regression minimizes what?​


a) Errors​
b) Points​
c) Lines​
d) Data​
Answer: a​

38.) What is regression intercept?​


a) y at x = 0​
b) Slope​
c) Mean​
d) Error​
Answer: a​

39.) Correlation of +1 is?​
a) No link​
b) Strong positive​
c) Strong negative​
d) Weak​
Answer: b

40.) Confusion matrix is for?​


a) Numbers​
b) Classes​
c) Graphs​
d) Tests​
Answer: b​

41.) What is accuracy?​


a) TP / Total​
b) (TP + TN) / Total​
c) TP / FP​
d) FN / Total​
Answer: b​

42.) What is precision?​


a) TP / (TP + FP)​
b) TP / (TP + FN)​
c) TN / Total​
d) FP / Total​
Answer: a​

43) Recall is also called?​


a) Error​
b) Sensitivity​
c) Accuracy​
d) Specificity​
Answer: b​

44.) False positives are?​


a) Wrong positive guess​
b) Wrong negative guess​
c) Right guess​
d) No guess​
Answer: a​

45.) What is bias?​


a) Wrong guess​
b) Simple model​
c) Complex model​
d) Noise​
Answer: b​

46.) High bias causes?​


a) Overfit​
b) Underfit​
c) No error​
d) Variance​
Answer: b​

47.) What is variance?​
a) Fixed model​
b) Changing model​
c) Wrong data​
d) Simple data​
Answer: b

48.) High variance causes?​


a) Underfit​
b) Overfit​
c) No error​
d) Bias​
Answer: b​

49.) Bias-variance tradeoff balances?​


a) Errors​
b) Fit and overfit​
c) True and false​
d) Points​
Answer: b​

50.) Total error includes?​


a) Bias only​
b) Variance only​
c) Bias + Variance + Error​
d) Noise only​
Answer: c​
