Module 2 Math Foundation II

The document outlines the mathematical foundations for AI, focusing on probability, statistics, and calculus. It covers key concepts such as Bayes' theorem, various probability distributions, statistical inference, and optimization techniques including gradients and convex optimization. These mathematical principles are essential for developing robust AI systems and improving model performance.

www.covenantuniversity.edu.ng

Raising a new Generation of Leaders

CEN524: MATHEMATICAL
FOUNDATIONS FOR AI

BY
Omoruyi O., Olatimehin O. and Ajilore A.
Module 2
Outline II
• Introduction to Probability and Statistics
- Define probability and statistics, and their importance in AI
- Introduce Bayes' theorem and its significance
• Bayes' Theorem
- Explain the concept of Bayes' theorem and its application
- Provide examples and practice problems
• Distributions
- Introduce different types of distributions, such as Gaussian, Bernoulli, and Poisson
- Explain their importance in AI and provide examples
• Statistical Inference
- Introduce the concept of statistical inference and its application in AI
- Explain the importance of hypothesis testing and confidence intervals

Outline III
• Introduction to Calculus
- Define calculus and its importance in AI
- Introduce the concept of gradients and optimization
• Gradients
- Explain the concept of gradients and their application in optimization
- Provide examples and practice problems
• Optimization Basics
- Introduce the concept of optimization and its application in AI
- Explain the importance of convex optimization and local minima
• Applications of Calculus in AI
- Explain the application of calculus in AI, such as in deep learning and neural networks
- Provide examples and case studies

Defining Probability and Statistics, and Their Importance in AI
• Probability: The study of uncertainty, quantifying the likelihood of
events (e.g., a coin landing heads has a 50% chance). It provides a
mathematical framework to model randomness.
• Statistics: The science of collecting, analyzing, and interpreting data
to uncover patterns or make predictions.
• Importance in AI:
§ AI systems rely on probability to handle uncertainty (e.g., predicting
outcomes in self-driving cars).
§ Statistics enables data-driven decisions, model evaluation, and learning
from patterns (e.g., training machine learning models on datasets).
§ Together, they form the backbone of algorithms like neural networks,
decision trees, and reinforcement learning.

Bayes' Theorem
A fundamental rule in probability that updates beliefs based on
new evidence. Enables reasoning under uncertainty (e.g., spam
email detection).
• P(A|B) = [P(B|A) · P(A)] / P(B)
• P(A|B): Posterior probability (probability of A given B).
• P(B|A): Likelihood (probability of B given A).
• P(A): Prior probability (initial belief about A).
• P(B): Evidence (normalizing constant).
• Foundation for probabilistic models like Naive Bayes classifiers
and Bayesian networks.

Bayes' Theorem and Its Application
Bayes' theorem reverses conditional probabilities, allowing us to update probabilities as
new data arrives.
Example 1: If 1% of people have a disease (P(A) = 0.01), a test is 95% accurate (P(B|A) =
0.95), and has a 10% false positive rate (P(B|¬A) = 0.10), what's the probability of having the
disease given a positive test (P(A|B))?
Use Bayes:
P(A|B) = [P(B|A) · P(A)] / P(B) where
P(B) = P(B|A) · P(A) + P(B|¬A) · P(¬A)
Example 2: A disease affects 2% of people (P(D) = 0.02). A test is 95% accurate for
positives (P(T|D) = 0.95) and has a 10% false positive rate (P(T|¬D) = 0.10). If the test is
positive, what's P(D|T)?
P(T) = P(T|D) · P(D) + P(T|¬D) · P(¬D) = (0.95 · 0.02) + (0.10 · 0.98) = 0.019 + 0.098 =
0.117
P(D|T) = [P(T|D) · P(D)] / P(T) = 0.019 / 0.117 ≈ 0.162 (16.2%).
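The arithmetic above can be checked with a short script. Below is a minimal Python sketch (the function and variable names are illustrative, not from the slides) that reproduces Example 2; the same function applies to the practice problems on the next slide once a false-positive rate is assumed.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Bayes' theorem for a binary test: P(D|T) from P(D), P(T|D), P(T|not D)."""
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)  # P(T)
    return sensitivity * prior / evidence                               # P(D|T)

# Example 2 from the slide: P(D) = 0.02, P(T|D) = 0.95, P(T|not D) = 0.10
print(posterior(0.02, 0.95, 0.10))  # ≈ 0.162, i.e. about a 16.2% chance of disease
```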

Bayes Practice Problems
Problem 1: A robot predicts rain with 80% accuracy.
If it rains 30% of the time, what’s the probability it’s
raining given the robot predicts rain?
• P(Rain|Predict) = (P(Predict|Rain) · P(Rain)) /
P(Predict)
Problem 2: A model detects fraud with 90% accuracy.
Fraud occurs in 2% of transactions. If the model flags
a transaction, what’s the chance it’s fraud?

Different Types of Distributions
Gaussian (Normal) Distribution: Continuous, bell-shaped distribution
defined by mean (μ) and variance (σ²). Example: Test scores or sensor noise.
Bernoulli Distribution: Discrete, models a single trial with two outcomes
(success = 1 with probability p, failure = 0). Example: Whether a user clicks a
link.
Poisson Distribution: Discrete, models the number of events in a fixed interval,
parameterized by λ (average rate). Example: Number of emails received per
hour.
Gaussian: Assumed in many models (e.g., linear regression) and used for data
normalization.
Bernoulli: Core to binary classification tasks (e.g., fraud detection).
Poisson: Useful for modeling event frequencies (e.g., traffic prediction).
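To make the three distributions concrete, the short sketch below draws samples from each and prints their empirical means (NumPy is assumed here; the slides do not name a specific library, and the parameter values are illustrative).

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Gaussian: test scores with mean 70 and standard deviation 10
scores = rng.normal(loc=70, scale=10, size=10_000)

# Bernoulli: a user clicks a link with probability p = 0.3
# (a Bernoulli trial is a binomial trial with n = 1)
clicks = rng.binomial(n=1, p=0.3, size=10_000)

# Poisson: emails per hour with average rate lambda = 4
emails = rng.poisson(lam=4, size=10_000)

print(scores.mean(), clicks.mean(), emails.mean())  # ≈ 70, ≈ 0.3, ≈ 4
```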

Probability Distribution
(Figure slide; the image is not reproduced in this text.)
Probability Distribution in AI
• Gaussian in AI: Neural networks often assume input
features are normally distributed after standardization.
Example: Predicting house prices with normally
distributed errors.
• Bernoulli in AI: Used in logistic regression to predict
binary outcomes. Example: Classifying an image as “cat”
or “not cat.”
• Poisson in AI: Models rare events in time-series data.
Example: Predicting server failures based on historical
crash rates.

Statistical Inference and Its Application in AI
Statistical Inference is the process of using sample data to make
generalizations about a population. Includes estimating
parameters (e.g., mean accuracy of a model) and testing
hypotheses.
Application in AI:
• Parameter Estimation: Inferring weights in a machine
learning model from training data.
• Model Evaluation: Determining if an AI system's
performance is due to skill or chance. Example: Inferring
customer preferences from a sample of purchase data.

Hypothesis Testing
A method to test claims about data using statistical
evidence.
Process:
1. State null hypothesis (H₀, e.g., "no difference in model
performance"),
2. compute a test statistic,
3. and compare to a threshold (p-value < α, typically 0.05).
• Example: Test if a new AI algorithm outperforms an old
one.
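As a sketch of that example, the snippet below uses a two-sample t-test from SciPy to ask whether a new model's accuracies across several runs differ significantly from an old model's (the accuracy numbers are made up for illustration).

```python
from scipy import stats

# Hypothetical accuracies over 8 independent runs of each model
old_model = [0.81, 0.79, 0.80, 0.82, 0.78, 0.80, 0.81, 0.79]
new_model = [0.84, 0.83, 0.85, 0.82, 0.84, 0.86, 0.83, 0.85]

# H0: the two models have the same mean accuracy
t_stat, p_value = stats.ttest_ind(new_model, old_model)
print(p_value)          # compare to alpha = 0.05
print(p_value < 0.05)   # True here -> reject H0: the improvement looks significant
```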

Confidence Interval
A range estimating a parameter with a confidence level
(e.g., 95% CI for accuracy: 88%–92%). Indicates
uncertainty in estimates.
• Example: Reporting an AI’s error rate with a range to
show reliability.
Importance in AI:
• Hypothesis testing validates improvements (e.g., “Is this
model significantly better?”).
• Confidence intervals quantify uncertainty, ensuring
trustworthy AI deployment.
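A common way to produce such a range is the normal approximation for a proportion; the sketch below computes a 95% CI for a model's accuracy, assuming a hypothetical test set of 1,000 examples with 900 correct predictions.

```python
import math

correct, n = 900, 1_000            # hypothetical test-set results
p_hat = correct / n                # estimated accuracy
z = 1.96                           # z-score for a 95% confidence level
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)

print(f"accuracy = {p_hat:.3f}, 95% CI = [{p_hat - margin:.3f}, {p_hat + margin:.3f}]")
# -> roughly [0.881, 0.919]
```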
Conclusion on Probability and Statistics
• Probability and statistics are essential for AI to model
uncertainty, learn from data, and evaluate performance.
• Bayes' theorem enables adaptive reasoning, critical for
real-time AI applications.
• Distributions like Gaussian, Bernoulli, and Poisson
underpin data modeling in AI tasks.
• Statistical inference, hypothesis testing, and confidence
intervals ensure AI systems are robust and reliable.

Calculus
The mathematical study of change and accumulation, divided into:
• Differential Calculus: Focuses on rates of change (e.g., slopes, derivatives).
• Integral Calculus: Deals with accumulation (e.g., areas, sums over intervals).

Importance in AI:
• Enables optimization of models by finding minima or maxima (e.g.,
minimizing error in machine learning).
• Powers gradient-based methods, the backbone of training algorithms like
neural networks.
• Helps model continuous relationships in data, critical for tasks like
regression and deep learning.

Gradient
A vector of partial derivatives representing the
direction and rate of steepest increase of a function.
For a function f(x, y), the gradient is ∇f = (∂f/∂x,
∂f/∂y).
• Optimization: The process of finding the best
solution (e.g., minimum or maximum) of a function,
often called the objective or loss function in AI.
• Connection: Gradients guide optimization by
indicating how to adjust variables to reduce error
or improve performance.
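As a quick numerical check of this definition, the sketch below approximates the gradient of f(x, y) = x² + y² with finite differences and compares it to the analytic result ∇f = (2x, 2y); the choice of function and step size is just an illustration.

```python
def f(x, y):
    return x**2 + y**2

def numerical_gradient(f, x, y, h=1e-6):
    """Approximate (df/dx, df/dy) with central differences."""
    df_dx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    df_dy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return df_dx, df_dy

print(numerical_gradient(f, 3.0, -1.0))  # ≈ (6.0, -2.0), matching (2x, 2y)
```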
Optimization
The gradient points uphill; its negative points downhill. In optimization,
we follow the negative gradient to minimize a function.
• Example: For f(x) = x², the derivative is f'(x) = 2x. At x = 2, the
gradient is 4, so moving opposite (downhill) reduces f(x).
• Application in Optimization:
Gradient Descent: Iteratively update parameters: x_new = x_old - η·∇f(x_old),
where η is the learning rate.
• Used to minimize loss functions in AI (e.g., mean squared error in
regression).
• Intuition: Think of gradient descent as a hiker descending a foggy
mountain by feeling the steepest slope underfoot.

Examples and Practice Problems
Example 1: Minimize f(x) = x² + 2x + 1.
• Derivative: f'(x) = 2x + 2.
• Set f'(x) = 0: 2x + 2 = 0, so x = -1 (minimum).
• Gradient descent: Start at x = 1, f'(1) = 4, step with η
= 0.1: x = 1 - 0.1•4 = 0.6.
Practice Problem: Use gradient descent to minimize
f(x) = 3x² - 6x + 5. Start at x = 0, η = 0.1, 2 steps.
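The sketch below runs the update rule x ← x − η·f'(x) on Example 1 and reproduces the first step quoted above (x goes from 1 to 0.6); swapping in the practice problem's derivative f'(x) = 6x − 6 lets you check your own two steps. The helper name is illustrative.

```python
def gradient_descent(grad, x, lr=0.1, steps=20):
    """Minimise a 1-D function by repeatedly stepping against its derivative."""
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Example 1: f(x) = x^2 + 2x + 1, so f'(x) = 2x + 2
grad_f = lambda x: 2 * x + 2
print(gradient_descent(grad_f, x=1.0, lr=0.1, steps=1))   # 0.6 (first step, as above)
print(gradient_descent(grad_f, x=1.0, lr=0.1, steps=50))  # ≈ -1.0, the minimum
# For the practice problem, use grad = lambda x: 6 * x - 6 and start at x = 0.
```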
Optimization and Its Application in AI
Finding the parameter values that minimize (or maximize) an
objective function. In AI, the objective is often a loss function
(e.g., difference between predicted and actual values).
Application in AI:
• Linear Regression: Minimize squared error to fit a line to data.
• Neural Networks: Adjust weights to minimize prediction error.
• Reinforcement Learning: Maximize cumulative reward.
• Example: Training a model to predict house prices by
minimizing the error between predicted and actual prices.

Convex Optimization:
A function is convex if any line segment between two
points on its graph lies above or on the graph (e.g., f(x) =
x²).
Importance:
• Guarantees a single global minimum, making
optimization reliable and efficient.
• In AI: Convex loss functions (e.g., in logistic regression)
ensure gradient descent finds the best solution.
• Local Minima: Points where the function is lower than
nearby points but not the global minimum.
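The sketch below illustrates the difference: on the convex f(x) = x², gradient descent reaches the global minimum from any start, while on a non-convex function (here f(x) = (x² − 1)² + 0.3x, an illustrative choice) it can settle in a local minimum that depends on the starting point.

```python
def descend(grad, x, lr=0.05, steps=200):
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Convex: f(x) = x^2 -> f'(x) = 2x; any start reaches the single global minimum at 0
print(descend(lambda x: 2 * x, x=5.0))    # ≈ 0.0
print(descend(lambda x: 2 * x, x=-5.0))   # ≈ 0.0

# Non-convex: f(x) = (x^2 - 1)^2 + 0.3x -> f'(x) = 4x^3 - 4x + 0.3
grad_g = lambda x: 4 * x**3 - 4 * x + 0.3
print(descend(grad_g, x=-1.5))  # ≈ -1.03, the global minimum
print(descend(grad_g, x=1.5))   # ≈ 0.96, a local minimum only
```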
Calculus in AI
• Forward Pass: Compute predictions using a function of inputs and weights.
• Backward Pass (Backpropagation): Use gradients to update weights by
minimizing loss.
• Chain rule: ∂L/∂w = ∂L/∂y·∂y/∂w propagates errors backward.
Deep Learning:
Neural networks are compositions of functions (layers), and calculus optimizes
millions of parameters.
• Example: In a 3-layer network, gradients adjust weights to reduce
classification error.
Neural Networks:
• Loss function (e.g., cross-entropy) is differentiated w.r.t. each weight,
enabling learning.
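A minimal sketch of this chain rule for a single "one-neuron" layer is shown below: z = w·x, y = σ(z), and a squared-error loss (used here for simplicity instead of cross-entropy), so ∂L/∂w = ∂L/∂y · ∂y/∂z · ∂z/∂w. The specific numbers are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, target = 2.0, 1.0   # one training example (illustrative values)
w = 0.5                # initial weight
lr = 0.1               # learning rate

for step in range(3):
    # Forward pass
    z = w * x
    y = sigmoid(z)
    loss = (y - target) ** 2

    # Backward pass: chain rule dL/dw = dL/dy * dy/dz * dz/dw
    dL_dy = 2 * (y - target)
    dy_dz = y * (1 - y)        # derivative of the sigmoid
    dz_dw = x
    dL_dw = dL_dy * dy_dz * dz_dw

    w = w - lr * dL_dw         # gradient descent update
    print(step, round(loss, 4), round(w, 4))
```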

Example:
Linear Regression:
• Loss: L = (1/n)∑(yᵢ - (wxᵢ + b))²
• Gradients: ∂L/∂w = -(2/n)∑xᵢ(yᵢ - (wxᵢ + b)),
∂L/∂b = -(2/n)∑(yᵢ - (wxᵢ + b))
• Optimization: Gradient descent fits the line.
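The gradients above translate directly into code; the sketch below fits w and b to a few made-up points lying near y = 2x + 1 (the data and hyperparameters are illustrative).

```python
# Toy data roughly following y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]
n = len(xs)

w, b, lr = 0.0, 0.0, 0.02

for _ in range(2000):
    # Gradients from the slide: dL/dw = -(2/n) * sum(x_i * (y_i - (w*x_i + b)))
    #                           dL/db = -(2/n) * sum(y_i - (w*x_i + b))
    residuals = [y - (w * x + b) for x, y in zip(xs, ys)]
    dw = -(2 / n) * sum(x * r for x, r in zip(xs, residuals))
    db = -(2 / n) * sum(residuals)
    w, b = w - lr * dw, b - lr * db

print(round(w, 2), round(b, 2))  # ≈ 2.0 and ≈ 1.0
```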

Case Study
Image Classification with CNNs:
• Convolutional Neural Networks (CNNs) use calculus to
optimize filters and weights.
• Loss: Cross-entropy between predicted and true labels.
• Backpropagation adjusts millions of parameters to recognize
patterns (e.g., edges, shapes).
GPT Models:
• Transformer-based models (like ChatGPT) rely on gradient
descent to optimize attention weights, enabling language
understanding.
