Module 5
What is a vector?
*A vector is a mathematical quantity with both magnitude and direction.*
The magnitude of vector u, denoted |u|, is its length.
One way to express its direction is to give the angle it makes with a horizontal ray (that points to the
right) that is parallel to the positive x-axis. This is called the vector's direction angle.
from IPython import display  # assuming this runs in a Jupyter notebook

display.IFrame(src='https://www.geogebra.org/classic/xbunqxhv', width=1000, height=800)
Vector Addition
The sum, u + v, of two vectors u and v is constructed by placing u at some arbitrary location, then placing v so
that v's tail point coincides with u's tip point; u + v is then the vector that starts at u's tail point and ends at
v's tip point.
Adding Vectors Geometrically
Let us discover how to geometrically add any two vectors.
1. Determine the components of vectors u and v by using the purple and green sliders.
2. Slide the black slider (bottom right) to geometrically form vectors u and v.
3. Move the YELLOW POINT (initial point of v) ON TOP OF the ORANGE POINT (terminal point of u).
4. Slide the additional slider that appears once you complete the previous step. The vector that appears will be the RESULTANT VECTOR.
5. In this case, the RESULTANT VECTOR is the sum of vectors u and v.
Vector Subtraction
display.IFrame(src='https://www.geogebra.org/classic/thkzrnqa', width=1000, height=800)
Vector Projections
The projection of u (the tree) onto v (the ground) is another vector that is parallel to v and whose length equals
the length of the shadow that u would cast onto the ground.
Orthogonality
Two vectors are orthogonal if they are perpendicular to each other, i.e., the dot product of the two vectors
is zero.
display.IFrame(src='https://www.geogebra.org/classic/hz92uxy3', width=1000, height=800)
Vector Norms
L1 norm
It is defined as the sum of the magnitudes of each component of a = (a₁, a₂, a₃):
L1 norm of vector a = |a₁| + |a₂| + |a₃|
L2 norm
It is defined as the square root of the sum of the squares of each component:
L2 norm of vector a = √(a₁² + a₂² + a₃²)
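These norms can be computed directly with NumPy; here is a minimal sketch (the vector values are chosen arbitrarily for illustration):
import numpy as np

a = np.array([3, -4, 12])          # example vector with arbitrary values
l1 = np.linalg.norm(a, ord=1)      # |3| + |-4| + |12| = 19
l2 = np.linalg.norm(a, ord=2)      # sqrt(3^2 + 4^2 + 12^2) = 13
print("L1 norm:", l1)
print("L2 norm:", l2)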
Types of Matrices:
Square matrices: A square matrix has an equal number of rows and columns.
Symmetric matrices: A symmetric matrix is a square matrix that is equal to its transpose.
Diagonal matrices: A diagonal matrix is a square matrix in which all the off-diagonal elements are zero.
Identity matrices: An identity matrix is a square matrix in which all the diagonal elements are equal to one, and
all the off-diagonal elements are equal to zero.
Singular matrices: A singular matrix is a matrix that does not have an inverse.
Matrix Operations:
There are several operations that can be performed on matrices:
Addition: To add two matrices, we add their corresponding elements.
Subtraction: To subtract two matrices, we subtract their corresponding elements.
Multiplication: To multiply two matrices, each entry of the product is obtained by multiplying the entries of a row of the first matrix by the corresponding entries of a column of the second matrix and summing the products (so the number of columns of the first matrix must equal the number of rows of the second).
Transpose: To find the transpose of a matrix, we interchange its rows and columns.
Inverse: To find the inverse of a matrix, we use a formula that involves the determinant of the matrix.
Examples
Addition:
A =[2 4
1 3]
B =[5 1
2 7]
A + B =[2+5 4+1
1+2 3+7]= [7 5
3 10]
Subtraction:
A =[1 2
3 4]
B =[5 6
7 8]
A - B =[1−5 2−6
3−7 4−8]= [−4 −4
−4 −4]
Multiplication:
A =[1 2
3 4]
B =[5 6
7 8]
A * B =[1×5+2×7 1×6+2×8
3×5+4×7 3×6+4×8]= [19 22
43 50]
Transpose:
A =[1 2
3 4
5 6]
Aᵀ = [1 3 5
2 4 6]
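These operations can be checked with NumPy; here is a minimal sketch using the matrices from the multiplication example above:
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(A + B)    # element-wise addition
print(A - B)    # element-wise subtraction
print(A @ B)    # matrix multiplication: [[19, 22], [43, 50]]
print(A.T)      # transpose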
Inverse:
A = [2 1
4 3]
We first calculate the determinant:
det(A) = 2×3 − 1×4 = 2
Since the determinant is nonzero, we know that A is invertible. We can then use the following formula to find the
inverse of A:
A⁻¹ = (1/det(A)) · adj(A)
= (1/2) [3 −1
−4 2] = [3/2 −1/2
−2 1]
Therefore, the inverse of A is:
[3/2 −1/2
−2 1]
The formula for finding the determinant of a larger square matrix is more complex and involves expanding the
matrix along any row or column using a method called cofactor expansion.
Example:
Suppose we have a 3x3 matrix A, where:
A = [1 2 3
4 5 6
7 8 9]
To find the determinant of A using the formula for a 3x3 matrix, we have:
|A| = 1(5×9 − 6×8) − 2(4×9 − 6×7) + 3(4×8 − 5×7) = −3 + 12 − 9 = 0
*Since the determinant is zero, we know that the matrix A is singular and does not have an inverse.*
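The same determinant can be verified numerically with NumPy (a minimal sketch):
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(np.linalg.det(A))   # numerically 0 (up to floating-point rounding), so A is singular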
Inverse of a matrix
The inverse of a matrix is a matrix that, when multiplied by the original matrix, results in the identity matrix.
The identity matrix is a square matrix that has ones on the main diagonal and zeros elsewhere. The inverse of a
matrix is important in many applications in data science, such as solving systems of linear equations, finding
eigenvalues and eigenvectors, and inverting matrices to transform data.
Formula for Inverse:
The formula for finding the inverse of a matrix A is
A⁻¹ = (1/det(A)) · adj(A)
where |A| or det(A) is the determinant of A, and adj(A) is the adjoint of A. The adjoint of A is the transpose of
the cofactor matrix of A, where the cofactor matrix is formed by taking the determinant of each minor of A.
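In practice, the inverse is usually computed numerically; here is a minimal sketch with NumPy using the 2×2 matrix from the earlier example:
import numpy as np

A = np.array([[2, 1], [4, 3]])
A_inv = np.linalg.inv(A)
print(A_inv)        # [[ 1.5 -0.5], [-2.   1. ]]
print(A @ A_inv)    # approximately the identity matrix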
🔹 Eigenvalues and eigenvectors are important concepts in linear algebra and have various applications in data
science and machine learning.
🔹 An eigenvector of a square matrix A is a non-zero vector v such that when A is multiplied by v, the result is a
scalar multiple of v. The scalar multiple is called the eigenvalue of the matrix A.
🔹 Eigenvalues and eigenvectors are often used to transform and reduce the dimensionality of data in various
algorithms like PCA, SVD, and others.
🔹 The eigenvalues of a matrix can be found by solving the characteristic equation of the matrix, which is det(A -
λI) = 0, where det is the determinant of the matrix, A is the matrix, λ is the eigenvalue and I is the identity
matrix of the same size as A.
🔹 Once we find the eigenvalues, we can find the corresponding eigenvectors by solving the system of linear
equations (A - λI)x = 0.
🔹 Eigenvectors corresponding to distinct eigenvalues are linearly independent; when an n×n matrix A has n of
them, they form a basis for the space on which A acts.
🔹 Eigenvectors can also be normalized to have unit length, and thus form an orthonormal basis for the space.
🔹 A matrix is diagonalizable if it has n linearly independent eigenvectors, where n is the size of the matrix.
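Here is a minimal sketch of finding eigenvalues and eigenvectors with NumPy (the matrix is chosen arbitrarily for illustration):
import numpy as np

A = np.array([[4, 1], [2, 3]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print("Eigenvalues:", eigenvalues)     # 5 and 2
print("Eigenvectors (as columns):")
print(eigenvectors)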
Consider the matrix A = [[3, 1], [1, 3], [2, 2]]. To find the SVD of A, we can follow these steps:
Compute AᵀA and AAᵀ:
AᵀA = [14 10
10 14]
AAᵀ = [10 6 8
6 10 8
8 8 8]
Find the eigenvalues and eigenvectors of AᵀA and AAᵀ:
Eigenvalues of AᵀA: 24, 4
Eigenvectors of AᵀA: [1/√2 1/√2], [1/√2 −1/√2]
Eigenvalues of AAᵀ: 24, 4, 0
Eigenvectors of AAᵀ (for the nonzero eigenvalues): [1/√3 1/√3 1/√3], [1/√2 −1/√2 0]
The nonzero eigenvalues of AᵀA and AAᵀ coincide, and the singular values of A are their square roots: σ₁ = √24 and σ₂ = 2.
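These hand calculations can be checked with NumPy's SVD routine (a minimal sketch):
import numpy as np

A = np.array([[3, 1], [1, 3], [2, 2]])
U, S, Vt = np.linalg.svd(A)
print("Singular values:", S)             # approximately [4.899, 2.0]
print("Squared singular values:", S**2)  # the nonzero eigenvalues of A^T A: 24 and 4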
What is Differentiation?
Differentiation is a method of finding the rate at which a function changes with respect to its input (for
example, over time). It is an important concept in calculus and is used, among other things, to determine the
maximum and minimum values of a function.
The derivative of a function f(x) with respect to x is denoted by f'(x) or dy/dx.
Formulas:
1. Power Rule: d/dx (x^n) = n·x^(n−1)
2. Product Rule: d/dx (f(x)g(x)) = f′(x)g(x) + f(x)g′(x)
3. Quotient Rule: d/dx (f(x)/g(x)) = [f′(x)g(x) − f(x)g′(x)] / (g(x))²
4. Chain Rule: d/dx f(g(x)) = f′(g(x))·g′(x)
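These differentiation rules can be verified symbolically with SymPy; here is a minimal sketch (the functions are chosen arbitrarily for illustration):
import sympy as sp

x = sp.symbols('x')
print(sp.diff(x**3, x))               # power rule: 3*x**2
print(sp.diff(x**3 * sp.sin(x), x))   # product rule: 3*x**2*sin(x) + x**3*cos(x)
print(sp.diff(sp.sin(x**2), x))       # chain rule: 2*x*cos(x**2)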
Integration
Integration is the inverse process of differentiation. It involves finding the function whose derivative is a given
function. The symbol used to represent integration is the integral sign (∫), and the function to be integrated is
placed after the integral sign. The limits of integration are also specified, which determine the range of values
over which the integration is to be performed.
There are different techniques for integration, including:
Integration by Substitution:
Example: Evaluate the integral ∫ 2x·(x² + 1)³ dx.
Let u = x² + 1, so du = 2x dx.
The integral becomes ∫ u³ du = u⁴/4 + C = (x² + 1)⁴/4 + C.
Integration by parts:
∫ u dv = uv − ∫ v du
Example: Evaluate the integral ∫ x·e^x dx.
Solution: Integration by parts involves choosing one function to differentiate and one to integrate.
Let's choose u = x and dv = e^x dx.
Then du = dx and v = e^x, so
∫ x·e^x dx = x·e^x − ∫ e^x dx = x·e^x − e^x + C.
Partial fraction decomposition:
This technique is used for integrating rational functions, where the function is expressed as a sum of simpler
fractions.
Example: Decompose the rational function (3x + 1)/(x² + 2x) into partial fractions.
Solution: To decompose the rational function, we want to express it as a sum of simpler fractions. First, we
factor the denominator:
x² + 2x = x(x + 2).
Now, we want to express the given rational function as a sum of two partial fractions:
(3x + 1)/(x² + 2x) = A/x + B/(x + 2),
where A and B are constants that we need to find.
To solve for A and B, we'll find a common denominator on the right side:
(3x + 1)/(x² + 2x) = A(x + 2)/[x(x + 2)] + Bx/[x(x + 2)].
Now, we'll combine the fractions:
(3x + 1)/(x² + 2x) = [A(x + 2) + Bx] / [x(x + 2)].
We want the numerators to be the same, so we'll set up an equation:
3x + 1 = A(x + 2) + Bx.
Now, let's solve for A and B:
3x + 1 = Ax + 2A + Bx.
Equating coefficients of the x terms and the constant terms:
3 = A + B,
1 = 2A.
From the second equation, we find A = 1/2.
Substitute A back into the first equation to find B:
3 = 1/2 + B,
B = 3 − 1/2,
B = 5/2.
So, we have the partial fraction decomposition:
(3x + 1)/(x² + 2x) = 1/(2x) + 5/(2(x + 2)).
This is the partial fraction decomposition of the given rational function.
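As a quick check, SymPy's apart function reproduces this partial fraction decomposition (a minimal sketch):
import sympy as sp

x = sp.symbols('x')
expr = (3*x + 1) / (x**2 + 2*x)
print(sp.apart(expr, x))   # equivalent to 1/(2*x) + 5/(2*(x + 2))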
Trigonometric substitution:
This technique is used for integrating expressions that involve trigonometric functions.
Example: Evaluate the integral
∫ x/√(1 + x⁴) dx
Solution: We can use trigonometric substitution to simplify this integral. Let's use the
substitution x² = tan(θ), which gives us
2x dx = sec²(θ) dθ, i.e. x dx = (1/2) sec²(θ) dθ.
Now, we need to express the integrand in terms of θ:
√(1 + x⁴) = √(1 + tan²(θ)) = √(sec²(θ)) = sec(θ).
The integral becomes:
∫ x/√(1 + x⁴) dx = (1/2) ∫ sec²(θ)/sec(θ) dθ = (1/2) ∫ sec(θ) dθ = (1/2) ln|sec(θ) + tan(θ)| + C,
where C is the constant of integration.
However, we need to convert our answer back to the original variable x. Recall that we used x² = tan(θ), so
tan(θ) = x² and sec(θ) = √(1 + x⁴).
Substituting back:
(1/2) ln|sec(θ) + tan(θ)| + C = (1/2) ln(√(1 + x⁴) + x²) + C.
So, the final result is:
∫ x/√(1 + x⁴) dx = (1/2) ln(√(1 + x⁴) + x²) + C,
where C is the constant of integration.
Practice Problem:
Q. ∫ x/(x² + 1) dx
Answer. Let u = x² + 1, then du/dx = 2x, which implies dx = du/(2x).
Substituting these values in the integral, we get:
∫ x/(x² + 1) dx = (1/2) ∫ du/u = (1/2) ln|u| + C = (1/2) ln(x² + 1) + C.
Chain Rule:
The chain rule is a formula for computing the derivative of the composition of two or more functions. In other
words, it tells us how to take the derivative of a function that is formed by nesting one function inside another.
Formula:
Let f and g be functions, and let y=f(g(x)) be their composite function. Then the derivative of y with respect
to x is given by:
If y = f(g(x)), then
dy/dx = (dy/du)·(du/dx) = f′(g(x))·g′(x)
dx/dy = 1/(dy/dx) = 1/[f′(g(x))·g′(x)] = 1/[(dy/du)·(du/dx)]
Partial Derivatives:
A partial derivative of a multivariable function is the derivative with respect to one of its variables, keeping all
other variables constant. For example, if we have a function f(x,y), the partial derivative of f with respect to x is
denoted as ∂f/∂x and is defined as the rate of change of f as x changes, with y held constant.
Notation:
The partial derivative of a function f(x,y) with respect to x is denoted by ∂f/∂x. The symbol ∂ is used to indicate
a partial derivative, and it is pronounced "dee" or "del". To denote the partial derivative of f with respect to y,
we write ∂f/∂y.
Interpretation:
The partial derivative of a function gives us the rate at which the function changes as one of its input variables
changes, while holding all other input variables constant. It tells us how much the output of the function changes
when we change only one of the input variables.
Rules:
Just like ordinary derivatives, partial derivatives satisfy many of the same rules. Here are a few of the most
important ones:
1. The sum rule: ∂(f + g)/∂x = ∂f/∂x + ∂g/∂x
2. The product rule: ∂(fg)/∂x = f·∂g/∂x + g·∂f/∂x
3. The chain rule: ∂f(u, v)/∂x = (∂f/∂u)(∂u/∂x) + (∂f/∂v)(∂v/∂x)
Application:
Partial derivatives are used in many areas of mathematics and science, including optimization, physics,
economics, and engineering. For example, in optimization problems, we may need to find the maximum or
minimum value of a function of several variables. To do this, we can take partial derivatives of the function with
respect to each variable, set them equal to zero, and solve for the variables.
Suppose we have the function f(x, y) = x²y + y³, and we want to find the partial derivatives with respect to
x and y at the point (1, 2).
The partial derivative with respect to x can be found by treating y as a constant and taking the derivative of the
function with respect to x:
∂f/∂x = 2xy
At the point (1, 2), this gives us:
∂f/∂x (1, 2) = 2(1)(2) = 4
The partial derivative with respect to y can be found by treating x as a constant and taking the derivative of the
function with respect to y:
∂f/∂y = x² + 3y²
At the point (1, 2), this gives us:
∂f/∂y (1, 2) = (1)² + 3(2)² = 13
Therefore, the partial derivatives of f(x,y) with respect to x and y at the point (1,2) are 4 and 13, respectively.
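The same partial derivatives can be computed symbolically with SymPy (a minimal sketch):
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * y + y**3
df_dx = sp.diff(f, x)             # 2*x*y
df_dy = sp.diff(f, y)             # x**2 + 3*y**2
print(df_dx.subs({x: 1, y: 2}))   # 4
print(df_dy.subs({x: 1, y: 2}))   # 13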
Chapter 2: Descriptive Statistics
Probability Theory
Probability theory is a branch of mathematics that deals with the study of randomness and
uncertainty. It provides a framework for understanding and quantifying the likelihood of events,
making it an essential tool in many fields such as statistics, finance, and engineering.
In probability theory, we define events as subsets of a sample space, which is the set of all possible
outcomes of a random experiment. We use probabilities to assign a numerical measure to the
likelihood of each event occurring. Probability is always a number between 0 and 1, with 0 indicating
that an event is impossible and 1 indicating that an event is certain.
Probability theory provides us with many useful tools for analyzing and quantifying uncertainty, such
as probability distributions, random variables, and expected values. By understanding these concepts,
we can make more informed decisions and better manage risk in our daily lives.
One common example of probability theory in action is in casino games such as roulette or blackjack.
The rules of these games are based on the laws of probability, and understanding these laws can help
players make more strategic bets and increase their chances of winning.
import random
# Simulate rolling a six-sided die
die_roll = random.randint(1, 6)
print("You rolled a", die_roll)
# Simulate flipping a coin
coin_flip = random.choice(["heads", "tails"])
print("The coin landed on", coin_flip)
You rolled a 2
The coin landed on heads
This code uses the random library in Python to simulate the outcomes of rolling a die or flipping a coin, which
are classic examples of random experiments. By repeating these experiments many times, we can estimate the
probabilities of different outcomes and gain a deeper understanding of probability theory.
Probability axioms:
Non-negativity: The probability of an event can never be negative, i.e., P(A) >= 0.
Normalization: The sum of probabilities of all possible outcomes of an experiment is equal to 1, i.e., P(S) = 1
where S is the sample space.
Additivity: The probability of the union of two mutually exclusive events is equal to the sum of their individual
probabilities, i.e., P(A or B) = P(A) + P(B) for A and B that are mutually exclusive.
Probability rules:
Complement rule: The probability of an event not occurring is 1 minus the probability of the event occurring,
i.e., P(A') = 1 - P(A).
Union rule: The probability of the union of two events A and B is given by P(A or B) = P(A) + P(B) - P(A and
B), where P(A and B) is the probability of the intersection of A and B.
Conditional probability rule: The probability of an event A given that event B has occurred is given by P(A|B)
= P(A and B) / P(B), where P(A and B) is the joint probability of A and B occurring together and P(B) is the
probability of event B occurring.
Multiplication rule: The probability of two events A and B occurring together is given by P(A and B) = P(A|B)
* P(B) or P(B|A) * P(A), where P(A|B) and P(B|A) are the conditional probabilities of A given B and B given
A, respectively.
Conditional Probability:
Conditional probability is the probability of an event A given that another event B has already occurred. It is
denoted by P(A|B), which means the probability of A given B. The formula for conditional probability is:
P(A|B) = P(A ∩ B) / P(B)
Where P(A ∩ B) is the probability of both A and B occurring, and P(B) is the probability of B occurring.
Bayes' Theorem:
Bayes' theorem is a fundamental concept in probability theory, and it is used to calculate conditional
probabilities. It states that the probability of an event A given that another event B has occurred can be
calculated as:
P(A|B) = P(B|A) * P(A) / P(B)
Where P(B|A) is the probability of B given A has occurred, P(A) is the prior probability of A, and P(B) is the
prior probability of B. Bayes' theorem can be used to update our beliefs about the probability of an event as new
information becomes available.
Suppose you have two decks of cards, one red and one blue, with 52 cards each. You draw a card from the red
deck and observe that it is a heart. What is the probability that the card is an ace?
Here, A is the event of drawing an ace, and B is the event of drawing a heart. The probability of drawing an ace
and a heart is P(A ∩ B) = 1/52, and the probability of drawing a heart is P(B) = 13/52. Using the formula for
conditional probability, we get P(A|B) = P(A ∩ B) / P(B) = (1/52) / (13/52) = 1/13, which is the probability of
drawing an ace given that the card is a heart.
Suppose a test for a disease is 95% accurate, meaning that if a person has the disease, the test will correctly
identify it 95% of the time. However, the test also has a false positive rate of 5%, meaning that if a person does
not have the disease, the test will incorrectly identify them as having it 5% of the time. If 1% of the population
has the disease, what is the probability that a person who tests positive actually has the disease?
Here, A is the event of having the disease, and B is the event of testing positive. The probability of having the
disease is P(A) = 0.01, and the probability of testing positive given that the person has the disease is P(B|A) =
0.95. The probability of testing positive given that the person does not have the disease is P(B|¬A) = 0.05. Using
Bayes' theorem, we can calculate the probability of having the disease given that the person tests positive as
P(A|B) = P(B|A) * P(A) / (P(B|A) * P(A) + P(B|¬A) * P(¬A)) = (0.95 * 0.01) / (0.95 * 0.01 + 0.05 * 0.99) =
0.16, which is the probability of having the disease given that the person tests positive.
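The Bayes' theorem calculation from the disease-test example can be reproduced in a few lines of Python (a minimal sketch):
# P(A): prior probability of having the disease
p_disease = 0.01
# P(B|A): probability of testing positive given the disease
p_pos_given_disease = 0.95
# P(B|not A): false positive rate
p_pos_given_healthy = 0.05

# Total probability of testing positive, P(B)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)
# Posterior probability, P(A|B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))   # approximately 0.161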
Random variables can be classified into two types: discrete and continuous. A discrete random variable can take
on a countable number of values, while a continuous random variable can take on any value in a continuous
range. Examples of discrete random variables include the number of heads in multiple coin flips, while an
example of a continuous random variable is the height of a randomly selected person.
The properties of a random variable can be described by its probability distribution, which specifies the
probabilities of each possible value of the variable. The probability distribution of a discrete random variable
can be represented by a probability mass function (PMF), while the probability distribution of a continuous
random variable can be represented by a probability density function (PDF).
The expected value of a random variable is the weighted average of its possible values, with the weights given
by their respective probabilities. It is a measure of the central tendency of the variable's probability distribution.
The variance of a random variable measures how much its values deviate from its expected value, and it is a
measure of the variability of the variable's probability distribution.
In Python, we can use libraries like NumPy and SciPy to work with random variables and their properties. For
example, we can generate random samples from a given probability distribution using the random module of
NumPy, and we can calculate the expected value and variance of a given distribution using the functions in the
SciPy library.
Here's an example code snippet to generate a random sample of size 10 from a normal distribution with
mean 0 and variance 1 using NumPy:
import numpy as np
sample = np.random.normal(loc=0, scale=1, size=10)
print(sample)
Here's an example code snippet to calculate the expected value and variance of a normal distribution with mean
0 and variance 1 using SciPy:
from scipy.stats import norm
mu, var = norm.stats(loc=0, scale=1, moments='mv')
print(f"Expected value: {mu}, Variance: {var}")
Law of large numbers:
The law of large numbers is a fundamental concept in probability theory that describes the relationship between
the sample size of a random variable and its expected value.
1. The law of large numbers states that as the sample size of a random variable increases, the sample
mean will approach the expected value of the variable. This means that the more data you have, the
more accurate your estimate of the true underlying probability distribution will be.
2. This law applies to both discrete and continuous random variables, and it is a key concept in many
areas of statistics and machine learning.
3. The law of large numbers is closely related to the central limit theorem, which states that the
distribution of the sample means approaches a normal distribution as the sample size increases.
4. The law of large numbers has important implications for decision-making and risk management. It
suggests that making decisions based on a large sample size is generally more reliable and accurate
than making decisions based on a small sample size.
5. In practice, the law of large numbers is often used in simulations and statistical modeling to generate
more accurate estimates of probabilities and other statistical measures. For example, if we want to
estimate the probability of a rare event occurring, we can use the law of large numbers to simulate
many trials and calculate the proportion of trials in which the event occurs.
Here is an example of using the law of large numbers to estimate the probability of rolling a 6 on a fair six-
sided die:
import random
def roll_die(n):
    count = 0
    for i in range(n):
        if random.randint(1, 6) == 6:
            count += 1
    return count / n
print(roll_die(10)) # output: 0.2
print(roll_die(100)) # output: 0.12
print(roll_die(1000)) # output: 0.173
print(roll_die(10000)) # output: 0.1676
import numpy as np
import matplotlib.pyplot as plt
# Generate random samples
n_samples = 100000
sample_sizes = [1, 2, 5, 10, 50]
samples = np.random.uniform(size=(n_samples, max(sample_sizes)))
# Calculate sample means
sample_means = [np.mean(samples[:, :n], axis=1) for n in sample_sizes]
# Plot distribution of sample means
fig, axs = plt.subplots(ncols=len(sample_sizes), figsize=(15, 5))
for i, ax in enumerate(axs):
    ax.hist(sample_means[i], bins=50, density=True)
    ax.set_title(f"Sample size = {sample_sizes[i]}")
plt.show()
This code generates 100,000 random samples from a uniform distribution, calculates the sample means for
various sample sizes, and plots the distribution of the sample means. As you can see from the resulting plots, as
the sample size increases, the distribution of the sample means becomes increasingly normal, demonstrating the
central limit theorem in action.
Data Summarization
Data summarization is the process of presenting a large dataset in a concise and meaningful way.
Here are some important concepts and techniques in data summarization:
Descriptive statistics:
These are statistical measures that describe the main features of a dataset, including measures of
central tendency (mean, median, mode) and measures of dispersion (range, variance, standard
deviation).
Data visualization:
This involves representing data in a graphical or pictorial form, which can help to reveal patterns and
relationships that may not be apparent from numerical summaries alone. Examples include
histograms, scatter plots, and box plots.
Aggregation:
This involves combining data into groups or categories to facilitate analysis. Common methods of
aggregation include grouping by time periods, geographic regions, or other relevant factors.
Sampling:
This involves selecting a subset of data from a larger dataset in order to gain insights about the larger
population. Various sampling techniques can be used depending on the nature of the data and the
research question.
Dimensionality reduction:
This involves reducing the number of variables in a dataset while still retaining as much information as
possible. Techniques such as principal component analysis and factor analysis can be used for this
purpose.
Machine learning:
This involves using algorithms to automatically identify patterns and relationships in a dataset.
Techniques such as clustering, regression, and classification can be used to summarize data and make
predictions.
Here's an example of using Python's pandas library to calculate some basic descriptive statistics for a
dataset (the values in data below are made up for illustration):
import pandas as pd
data = [12, 15, 14, 10, 18, 20, 15, 14, 16, 19]
print(pd.Series(data).describe())
*These statistics can also be calculated using Python libraries such as NumPy and SciPy. Here's an
example using NumPy on the same data:*
import numpy as np
# Mean
mean = np.mean(data)
print("Mean:", mean)
# Median
median = np.median(data)
print("Median:", median)
# Range (np.ptp returns max - min); avoid shadowing the built-in name range
data_range = np.ptp(data)
print("Range:", data_range)
# Variance
variance = np.var(data)
print("Variance:", variance)
# Standard deviation
std_dev = np.std(data)
print("Standard deviation:", std_dev)
Measures of central tendency: mean, median, and mode:
Measures of central tendency are statistical measures that determine the center of a dataset. The three most
common measures of central tendency are mean, median, and mode.
Here are some important points to consider about these measures:
Mean:
It is the arithmetic average of a dataset and is calculated by summing all values in the dataset and dividing by
the number of observations. Mean can be sensitive to outliers and extreme values in a dataset. The formula for
calculating the mean is:
mean = (sum of all values) / (number of observations)
Median:
It is the middle value in a dataset when the values are arranged in ascending or descending order. Median is less
sensitive to outliers compared to mean. In case of even number of observations, the median is calculated as the
average of the two middle values.
Mode:
It is the most frequently occurring value in a dataset. Mode can be used for both numerical and categorical data.
A dataset can have one or more modes, or it may have no mode at all.
*Here are some examples of code samples in Python to calculate these measures using NumPy library:*
import numpy as np
# Example dataset
data = [1, 2, 3, 4, 5, 5, 6, 7, 8, 8, 8]
# Mean
mean = np.mean(data)
print("Mean:", mean)
# Median
median = np.median(data)
print("Median:", median)
Measures of dispersion: Variance, Standard Deviation, and Range:
Measures of dispersion are statistical values that help to describe how spread out a dataset is.
Range is the simplest measure of dispersion and is defined as the difference between the maximum and
minimum values in a dataset.
Variance is a measure of how much the data points are dispersed around the mean value. It is calculated by
taking the average of the squared differences between each data point and the mean.
Standard deviation is another measure of dispersion that indicates how much the data deviates from the mean
value. It is the square root of the variance and is often used as a more interpretable measure of dispersion
compared to variance.
Skewness measures the asymmetry of a distribution, and kurtosis measures how peaked or heavy-tailed it is
relative to a normal distribution. Kurtosis can be calculated using the kurtosis() function in the scipy.stats
module in Python. The function takes an array of numbers as input and returns the kurtosis value.
Example-
import numpy as np
from scipy.stats import kurtosis
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
kurt = kurtosis(data)
print("Kurtosis:", kurt)
Skewness and kurtosis are useful in understanding the shape of a distribution and can help in
choosing appropriate statistical methods for data analysis.
Types of Skewness:
Positive skewness:
A distribution is said to be positively skewed if its tail is longer on the positive side (to the right) of the
distribution. This means that the majority of the data is on the left side of the distribution.
Negative skewness:
A distribution is said to be negatively skewed if its tail is longer on the negative side (to the left) of the
distribution. This means that the majority of the data is on the right side of the distribution.
Zero skewness:
A distribution is said to have zero skewness if it is symmetric around its mean. This means that the left
and right tails are of equal length.
Types of Kurtosis:
Leptokurtic:
A distribution is said to be leptokurtic if it has a high degree of peakedness. This means that the data
is heavily concentrated around the mean, and the tails are relatively thin.
Mesokurtic:
A distribution is said to be mesokurtic if it has a moderate degree of peakedness. This means that the
data is moderately concentrated around the mean, and the tails are neither too thick nor too thin.
Platykurtic:
A distribution is said to be platykurtic if it has a low degree of peakedness. This means that the data is
widely spread out, and the tails are relatively thick.
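Skewness can be computed in the same way as kurtosis, using the skew() function from scipy.stats; here is a minimal sketch with made-up, right-skewed data:
import numpy as np
from scipy.stats import skew, kurtosis

# Right-skewed example data: most values are small, a few are large
data = np.array([1, 1, 2, 2, 2, 3, 3, 4, 8, 15])
print("Skewness:", skew(data))        # positive value indicates positive skew
print("Kurtosis:", kurtosis(data))    # excess kurtosis relative to a normal distribution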
More Examples:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
# example data
data = [1, 2, 3, 4, 5]
# calculate mean and standard deviation
mu, std = norm.fit(data)
# plot histogram with normal distribution curve
plt.hist(data, bins=10, density=True, alpha=0.6, color='g')
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 100)
p = norm.pdf(x, mu, std)
plt.plot(x, p, 'k', linewidth=2)
plt.title("Histogram with Normal Distribution Curve")
plt.show()
A discrete probability distribution is a statistical function that describes the likelihood of obtaining a
particular value or set of values from a discrete set of possible values.
Examples of discrete probability distributions include the binomial distribution, the Poisson
distribution, and the geometric distribution.
The binomial distribution models the probability of a binary outcome (success/failure) given a fixed
number of trials and a known probability of success.
The Poisson distribution models the probability of a certain number of events occurring in a fixed
interval of time or space, given a known average rate of occurrence.
The geometric distribution models the probability of the number of trials required to obtain the first
success in a sequence of independent trials, given a known probability of success.
Discrete probability distributions can be used in a wide range of fields, including finance, engineering,
and biology, to model real-world phenomena and make predictions based on probability.
Here's an example code snippet in Python for generating a random sample of 100 values from
a binomial distribution with 10 trials and a probability of success of 0.3:
import numpy as np
sample = np.random.binomial(n=10, p=0.3, size=100)
print(sample)
Here's an example code in Python that calculates the CDF of a normal distribution:
import numpy as np
from scipy.stats import norm
mu, sigma = 0, 1 # mean and standard deviation
x = np.linspace(-3,3,1000) # create 1000 evenly spaced points from -3 to 3
cdf = norm.cdf(x, mu, sigma) # calculate the CDF
import matplotlib.pyplot as plt
plt.plot(x, cdf)
plt.title('Normal Cumulative Distribution Function')
plt.xlabel('X')
plt.ylabel('F(X)')
plt.show()
This code generates a plot of the CDF of a normal distribution with mean 0 and standard deviation 1, for values
of X ranging from -3 to 3.
Expected Value and Variance:
Expected value and variance are important concepts in probability theory and statistics. They are used to
measure the central tendency and variability of random variables. The expected value is the average value of a
random variable, while variance measures how spread out the data is around the expected value.
Expected Value:
The expected value of a discrete random variable is the sum of the product of each possible value of the variable
and its probability. Mathematically, it can be represented as E(X) = Σ xᵢ · P(X = xᵢ), where xᵢ is a possible value
of X and P(X = xᵢ) is the probability of X taking the value xᵢ.
Variance:
The variance of a random variable measures how much the values of the variable deviate from its expected
value. It is calculated by taking the sum of the squared differences of each value from the expected value,
multiplied by the probability of that value. Mathematically, it can be represented
as Var(X) = Σ (xᵢ − E(X))² · P(X = xᵢ), where xᵢ is a possible value of X, E(X) is the expected value of X, and
P(X = xᵢ) is the probability of X taking the value xᵢ.
Let's say we have a dice that we roll and get a random number between 1 and 6. We can calculate the
expected value and variance of this random variable as follows:
import numpy as np
# Define the possible values and their probabilities
values = [1, 2, 3, 4, 5, 6]
probs = [1/6, 1/6, 1/6, 1/6, 1/6, 1/6]
# Calculate the expected value
expected_value = np.sum(np.multiply(values, probs))
print("Expected Value:", expected_value)
# Calculate the variance
variance = np.sum(np.multiply(np.square(np.subtract(values,
expected_value)), probs))
print("Variance:", variance)
Binomial Distribution:
Binomial distribution is a type of discrete probability distribution that deals with the probability of a certain
number of successes in a fixed number of independent trials, each with the same probability of success.
Properties:
1. The trials must be independent.
2. The probability of success, denoted by p, must be constant for each trial.
3. The number of trials, denoted by n, must be fixed.
4. The random variable X, which represents the number of successes, can take on integer values from 0 to
n.
Probability Mass Function (PMF):
The PMF of a binomial distribution is given by:
Using the factorial function: P(X = k) = [n! / (k!(n − k)!)] · p^k · (1 − p)^(n−k).
Mean and Variance:
The mean of a binomial distribution is given by:
μ=np
The variance of a binomial distribution is given by:
σ² = np(1 − p)
Example:
Suppose we have a coin that has a probability of landing heads (success) of 0.6. We flip the coin 10 times. What
is the probability of getting exactly 5 heads?
Solution:
Here n = 10, p = 0.6, and k = 5, so
P(X = 5) = [10! / (5! 5!)] · (0.6)⁵ · (0.4)⁵ = 252 × 0.07776 × 0.01024 ≈ 0.2007.
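The same value can be obtained with scipy.stats.binom (a minimal sketch):
from scipy.stats import binom

# k=5 successes in n=10 trials with success probability p=0.6
print(round(binom.pmf(5, 10, 0.6), 4))   # approximately 0.2007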
Poisson Distribution:
Poisson distribution is a discrete probability distribution that describes the number of events occurring in a fixed
interval of time or space, given the average rate of occurrence of the events.
Parameters:
The Poisson distribution is determined by a single parameter, λ, which represents the average rate of occurrence
of the event in the given interval.
Probability Mass Function:
The probability mass function (PMF) of a Poisson distribution is given by:
P(X = k) = (λ^k · e^(−λ)) / k!
where X is the random variable representing the number of events, k is the number of events, λ is the average
rate of occurrence, and e is the base of the natural logarithm.
Uses:
The Poisson distribution is commonly used in many fields such as biology, physics, finance, and economics to
model the occurrence of rare events.
Some examples of the Poisson distribution in real-life applications include:
1. The number of customers arriving at a service desk in a given time interval.
2. The number of defects in a manufacturing process.
3. The number of earthquakes in a region over a given time period.
Geometric Distribution:
Geometric Distribution is a probability distribution that represents the number of Bernoulli trials needed to
obtain the first success in a sequence of independent trials.
Formula:
The probability mass function (PMF) of the Geometric Distribution is given by P(X = k) = q^(k−1) · p, where X is
the number of trials needed to obtain the first success, p is the probability of success in each trial, and q = 1 − p is
the probability of failure.
Mean and Variance:
The mean of the Geometric Distribution is μ = 1/p and the variance is σ² = q/p².
Example:
Suppose we have a coin with probability of heads p=0.3. We want to find the probability of getting heads for the
first time on the third flip. Using the Geometric Distribution, we can calculate P(X = 3) = (1 − 0.3)² × 0.3 = 0.147.
Uses:
The Geometric Distribution is commonly used in situations where we are interested in the number of trials
needed to obtain the first success, such as in models for the time to failure of a product or the number of calls
before a customer reaches a call center.
To generate a random sample of size n from a Geometric Distribution with probability of success p in Python,
we can use the numpy.random.geometric function:
import numpy as np
p = 0.3
n = 1000
samples = np.random.geometric(p, size=n)
Hypergeometric Distribution
Hypergeometric Distribution is a probability distribution used to model the probability of drawing a specified
number of objects from a finite population without replacement. It is a discrete probability distribution and is
widely used in statistical inference, sampling theory, and quality control.
Definition:
The hypergeometric distribution describes the probability of k successes in n draws from a finite population size
N with M total successes and N - M total failures, where the draws are made without replacement.
Formula:
The probability mass function (PMF) for the Hypergeometric Distribution is given by:
P(X = k) = [C(M, k) · C(N − M, n − k)] / C(N, n)
where C(a, b) = a! / (b!(a − b)!) denotes the binomial coefficient,
k = number of successes,
n = number of draws,
M = total number of successes in the population,
N = population size
Properties:
The mean of the hypergeometric distribution is given by E(X) = n * M / N.
The variance of the hypergeometric distribution is given by
Var(X) = n · M · (N − M) · (N − n) / (N² · (N − 1))
In Python, you can use the scipy.stats module to calculate the PMF, mean, and variance of the hypergeometric
distribution.
from scipy.stats import hypergeom
# Define the parameters
M = 10 # Total number of successes
N = 15 # Population size
n = 3 # Number of draws
k = 2 # Number of successes
# Calculate the PMF
pmf = hypergeom.pmf(k, N, M, n)
print("PMF:", pmf)
# Calculate the mean and variance
mean = hypergeom.mean(N, M, n)
var = hypergeom.var(N, M, n)
print("Mean:", mean)
print("Variance:", var)
Here is an example code snippet in Python that calculates the probability mass function of a Poisson
distribution with a given mean:
import math
def poisson_pmf(mean, k):
    return math.exp(-mean) * (mean ** k) / math.factorial(k)
mean = 3
k = 2
pmf = poisson_pmf(mean, k)
print(pmf)
Activity 1:
A fair six-sided die is rolled two times. What is the probability of getting a sum of 7?
Activity 2:
Suppose the heights of a group of people are normally distributed with a mean of 68 inches and a standard
deviation of 3 inches. What is the probability that a randomly selected person from this group is shorter than 70
inches tall?
Activity 3:
A call center receives an average of 30 calls per hour. What is the probability that the call center will receive
exactly 40 calls in a given hour, assuming that the number of calls follows a Poisson distribution?
Activity 4:
A coin is tossed 12 times. What is the probability of getting exactly 7 heads?
Some examples of continuous probability distributions are Normal Distribution, Uniform Distribution,
Exponential Distribution, Beta Distribution, Gamma Distribution, and Weibull Distribution.
*Here is an example of calculating the PDF and CDF of a Normal Distribution using Python:*
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
mu = 0
sigma = 1
x = np.linspace(-5, 5, 1000)
pdf = norm.pdf(x, mu, sigma)
cdf = norm.cdf(x, mu, sigma)
plt.plot(x, pdf, label='PDF')
plt.plot(x, cdf, label='CDF')
plt.legend()
plt.show()
This code generates a plot of the PDF and CDF of a Normal Distribution with mean 0 and standard
deviation 1, for values of x ranging from -5 to 5.
Here's an example code snippet to plot the CDF of a normal distribution with mean 0 and
standard deviation 1:
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt
x = np.linspace(-4, 4, num=100)
y = norm.cdf(x, loc=0, scale=1)
plt.plot(x, y)
plt.title('Cumulative Distribution Function of Normal Distribution')
plt.xlabel('x')
plt.ylabel('Cumulative Probability')
plt.show()
This will plot the CDF of the normal distribution. We can see how the probability of observing a value of X less
than or equal to a certain value changes as we move along the x-axis.
Why is CDF important?
The CDF is an important concept in probability theory and statistics because it allows us to calculate various
properties of a probability distribution. For example, we can use the CDF to calculate the probability of
observing a value of the random variable X within a certain range of values. We can also use the CDF to
calculate the median and quartiles of a probability distribution.
example code snippet in Python to generate random numbers from a uniform distribution:
import random
# generate a random number between 0 and 1 from a uniform distribution
print(random.uniform(0, 1))
# generate a list of 10 random numbers between 1 and 100 from a uniform distribution
print([random.uniform(1, 100) for _ in range(10)])
Uniform distribution is commonly used in simulations, games, and optimization problems.
Normal Distribution
Normal distribution, also known as Gaussian distribution, is a continuous probability distribution that describes
the probability of a random variable taking on a range of values.
The probability density function (PDF) of a normal distribution is defined by two parameters: the mean (μ) and
the standard deviation (σ).
The shape of the normal distribution is symmetric around the mean, with the highest point of the curve
occurring at the mean.
The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1.
The cumulative distribution function (CDF) of the normal distribution is given by the area under the normal
curve to the left of a given value.
The normal distribution is widely used in statistical analysis and modeling, as many natural phenomena follow
this distribution.
The normal distribution can be simulated in Python using the numpy and scipy libraries.
Here's an example code snippet to generate random numbers from a normal distribution with a mean of 0 and a
standard deviation of 1:
import numpy as np
import scipy.stats as stats
# generate 1000 random numbers from a standard normal distribution
samples = np.random.normal(loc=0, scale=1, size=1000)
# calculate the mean and standard deviation of the samples
mean = np.mean(samples)
std_dev = np.std(samples)
# calculate the probability of a value falling within 2 standard deviations of the mean
prob = stats.norm.cdf(2) - stats.norm.cdf(-2)
print(f"Mean: {mean:.2f}, Standard Deviation: {std_dev:.2f}, Probability: {prob:.2f}")
The normal distribution is also used in hypothesis testing, as it is often assumed that the distribution of sample
means is approximately normal, thanks to the central limit theorem.
Exponential Distribution
Exponential Distribution is a type of continuous probability distribution that describes the time between events
in a Poisson process, where events occur continuously and independently at a constant average rate.
Probability Density Function:
The probability density function of the Exponential Distribution is given by f(x) = λ·e^(−λx), where λ is the rate
parameter.
Cumulative Distribution Function:
The cumulative distribution function of the Exponential Distribution is given by F(x) = 1 − e^(−λx)
Expected Value and Variance:
The expected value of the Exponential Distribution is E(X) = 1/λ, and the variance is Var(X) = 1/λ²
Applications:
Exponential Distribution is used in a wide range of applications, such as:
Modeling the time between events, such as the time between calls in a call center.
Reliability analysis, such as predicting the time until a component fails.
Financial modeling, such as modeling the time between trades in financial markets.
To generate random numbers from Exponential Distribution in Python, we can use the
numpy.random.exponential() function. For example, the following code generates 1000 random numbers from
Exponential Distribution with a rate parameter of 0.5:
import numpy as np
x = np.random.exponential(scale=1/0.5, size=1000)
We can also plot the probability density function and cumulative distribution function using scipy.stats.expon
module:
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
x = np.linspace(0, 5, 100)
pdf = stats.expon.pdf(x, scale=1/0.5)
cdf = stats.expon.cdf(x, scale=1/0.5)
plt.plot(x, pdf, label='PDF')
plt.plot(x, cdf, label='CDF')
plt.legend()
plt.show()
Gamma Distribution
Gamma Distribution is a continuous probability distribution that is used to model the waiting time until a
specified number of events occur in a Poisson process. It is a two-parameter family of distributions that has
many applications in fields such as physics, engineering, and finance.
Probability Density Function:
The probability density function (PDF) of the Gamma Distribution is given
by f(x) = [1 / (Γ(α) β^α)] · x^(α−1) · e^(−x/β), where α and β are the shape and scale parameters, respectively, and Γ is
the gamma function.
Cumulative Distribution Function:
The cumulative distribution function (CDF) of the Gamma Distribution is given
by F(x) = P(X ≤ x) = I(α, x/β), where I is the regularized (lower) incomplete gamma function.
Expected Value and Variance:
The expected value of a Gamma Distribution is given by E(X) = αβ, and the variance is given by Var(X) = αβ²
Notes:
1. When α is an integer, the Gamma Distribution reduces to the Erlang Distribution, which models the
waiting time until a specified number of events occur in a Poisson process with a constant rate.
2. When α = 1, the Gamma Distribution reduces to the Exponential Distribution.
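Here is a minimal sketch of working with the Gamma Distribution using scipy.stats.gamma (the parameter values are chosen arbitrarily for illustration):
import numpy as np
from scipy.stats import gamma

alpha, beta = 2.0, 1.5                       # shape and scale parameters
samples = gamma.rvs(alpha, scale=beta, size=10000)
print("Sample mean:", samples.mean())        # close to alpha * beta = 3.0
print("Sample variance:", samples.var())     # close to alpha * beta**2 = 4.5
print("PDF at x=2:", gamma.pdf(2.0, alpha, scale=beta))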
Beta Distribution
Beta distribution is a continuous probability distribution that is widely used in Bayesian statistics, modeling of
proportions and probabilities, and machine learning algorithms. It has two shape parameters, commonly denoted
by α and β, which determine the shape and spread of the distribution.
Probability Density Function (PDF):
The probability density function of the Beta distribution is given by the following equation:
f(x; α, β) = [1 / B(α, β)] · x^(α−1) · (1 − x)^(β−1)
where x ∈ [0, 1], α > 0 and β > 0 are the shape parameters, and B(α, β) is the beta function.
Cumulative Distribution Function (CDF):
The cumulative distribution function of the Beta distribution does not have a closed-form solution. However, it
can be computed numerically using various numerical integration methods.
Important Properties:
The Beta distribution is a flexible distribution that can take on a wide range of shapes, from U-shaped to J-
shaped to bell-shaped, depending on the values of α and β.
When α = β = 1, the Beta distribution reduces to a uniform distribution over [0, 1].
The mean of the Beta distribution is given by α/(α + β), and the variance is given by α·β / [(α + β)²·(α + β + 1)].
The mode of the Beta distribution is given by (α − 1)/(α + β − 2), provided that α > 1 and β > 1. If either α or β is
less than or equal to 1, the mode does not exist.
The Beta distribution is conjugate to the binomial distribution, which means that if the prior distribution of the
probability parameter of a binomial distribution is a Beta distribution, then the posterior distribution is also a
Beta distribution.
Here's an example of how to generate random samples from a Beta distribution using Python's NumPy library:
import numpy as np
# Set the shape parameters
alpha = 2
beta = 5
# Generate 1000 random samples from the Beta distribution
samples = np.random.beta(alpha, beta, size=1000)
# Compute the mean and variance of the samples
mean = np.mean(samples)
variance = np.var(samples)
print("Mean:", mean)
print("Variance:", variance)
Activity 1:
The lifetimes of a certain brand of light bulbs are normally distributed with a mean of 800 hours and a standard
deviation of 100 hours. What is the probability that a randomly selected light bulb will last more than 900
hours?
Activity 2:
The time between arrivals at a certain store follows an exponential distribution with a mean of 8 minutes. What
is the probability that the time between two consecutive arrivals will be less than 5 minutes?
Activity 3:
The scores on a standardized test are normally distributed with a mean of 75 and a standard deviation of 10.
What is the probability that a randomly selected student scores between 70 and 80 on the test?
Activity 4:
Let X be a continuous random variable with the PDF given by:
Joint distribution is an important concept in probability theory and statistics, and it refers to the distribution of
two or more random variables together.
Joint probability mass function:
If we have two discrete random variables, their joint distribution can be described by a joint probability mass
function, which gives the probability of each possible combination of values for the two variables.
Joint probability density function:
If we have two continuous random variables, their joint distribution can be described by a joint probability
density function, which gives the probability density at each point in the joint space.
Here's an example of how to define a joint probability mass function for two discrete random variables X and Y in
Python:
import numpy as np
# Define the joint probability mass function
joint_pmf = np.array([[0.1, 0.2, 0.05],
[0.05, 0.15, 0.2],
[0.05, 0.1, 0.1]])
Marginal distribution:
The marginal distribution of a single random variable can be obtained from the joint distribution by summing (in
the case of discrete variables) or integrating (in the case of continuous variables) over the other variable(s).
Let's break it down further:
Discrete Variables: If you have a joint probability distribution for two or more discrete random variables, you
can obtain the marginal distribution of one of those variables by summing the joint probabilities over all
possible values of the other variable(s). This effectively "marginalizes out" the other variable(s), leaving you
with the distribution of the variable of interest.
Mathematically, if you have a joint distribution P(X, Y) for discrete variables X and Y, then the marginal
distribution P(X) can be obtained by summing over all possible values of Y:
P(X) = ∑ P(X, y) for all y
Continuous Variables: Similarly, for continuous random variables, you would integrate the joint probability
density function over the entire range of values of the other variable(s) to obtain the marginal density function of
the variable of interest.
Mathematically, if you have a joint density function f(X, Y) for continuous variables X and Y, then the marginal
density function f(X) can be obtained by integrating over the entire range of Y:
f(X) = ∫ f(X, y) dy for all y
In both cases, the resulting marginal distribution or density function will satisfy the properties of a valid
probability distribution: it will be non-negative, and the total probability (or density) will sum (or integrate) to 1.
Conditional distribution:
The conditional distribution of one random variable given another can also be obtained from the joint
distribution by dividing by the marginal distribution of the conditioning variable.
Let's delve into this further:
Discrete Variables: Suppose you have two discrete random variables, X and Y, and you want to find the
conditional distribution of X given a particular value y of Y. You can do this by dividing the joint probability
P(X, Y) by the marginal probability P(Y = y):
P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)
Here, P(X = x, Y = y) is the joint probability of X and Y taking on specific values x and y, and P(Y = y) is the
marginal probability of Y taking on the value y.
Continuous Variables: Similarly, if you have two continuous random variables, X and Y, and you want to find
the conditional density of X given a particular value y of Y, you can divide the joint density f(X, Y) by the
marginal density f_Y(y):
f(X = x | Y = y) = f(X = x, Y = y) / f_Y(y)
Here, f(X = x, Y = y) is the joint density of X and Y, and f_Y(y) is the marginal density of Y.
This process of obtaining conditional distributions is known as conditional probability or conditional density,
and it allows you to analyze how one random variable behaves given specific information about another random
variable.
Applications:
Joint distribution is used in a variety of fields, including engineering, physics, and finance, to model the
relationships between multiple random variables and make predictions or decisions based on those relationships.
🔹 Joint PMF is a function that maps each pair of values (x,y) from the joint domain of X and Y to the probability
that X = x and Y = y.
🔹 The joint PMF must satisfy two properties: non-negativity and normalization (summing to 1).
🔹 The sum of all the probabilities in the joint PMF over all possible values of X and Y must be equal to 1.
🔹 Joint PMF can be used to calculate marginal probabilities, which are probabilities of individual random
variables.
🔹 Joint PMF can also be used to calculate conditional probabilities, which are probabilities of one random
variable given that another random variable has a particular value.
🔹 In code, joint PMF can be defined using a two-dimensional array or a dictionary of tuples.
The joint CDF can be calculated using Python's scipy.stats module. Here's an example code:
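Here is a minimal sketch of such a calculation, assuming X and Y are independent discrete uniform random variables on 1 to 6 (as described below):
from scipy.stats import randint

# X and Y: independent discrete uniform random variables on {1, ..., 6}
X = randint(1, 7)
Y = randint(1, 7)

# Joint CDF under independence: F(x, y) = P(X <= x) * P(Y <= y)
F = lambda x, y: X.cdf(x) * Y.cdf(y)
print(F(3, 4))   # (3/6) * (4/6) = 1/3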
This code generates two discrete random variables X and Y with uniform probability distribution between 1 and
6. Then it defines a lambda function F to calculate the joint CDF for X and Y. Finally, it calculates the joint
CDF for x = 3 and y = 4.
Joint Probability Density Function (PDF) is a function that describes the probability of multiple random
variables taking specific values simultaneously. It is used in continuous probability distributions to model the
likelihood of two or more variables occurring together.
Here are some important points regarding Joint PDF:
1. Definition:
The Joint Probability Density Function is a function of two or more variables that define the probability of those
variables taking specific values simultaneously.
2. Properties:
The joint PDF must satisfy certain properties, such as non-negativity and the integral over the entire domain
equals to one.
3. Interpretation:
The value of the joint PDF at any point (x, y) represents the likelihood of the variables X and Y taking on values
close to x and y, respectively.
4. Relationship with Marginal PDFs:
By integrating the joint PDF over one of the variables, we obtain the marginal PDF for the other variable.
5. Relationship with Joint CDF:
The joint PDF is related to the joint CDF by taking the partial derivative with respect to each variable.
Examples:
Some examples of probability distributions that have a joint PDF include the bivariate normal distribution and
the joint uniform distribution.
import numpy as np
import matplotlib.pyplot as plt
# Define the joint PDF function
def joint_pdf(x, y):
    return 3*(1-x**2)*y**2*np.exp(-(x**2+y**2)/2)
# Create a grid of x and y values
x = np.linspace(-3, 3, 100)
y = np.linspace(-3, 3, 100)
X, Y = np.meshgrid(x, y)
# Evaluate the joint PDF at each point in the grid
Z = joint_pdf(X, Y)
# Plot the Joint PDF as a contour plot
plt.contour(X, Y, Z)
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Joint PDF of X and Y')
plt.show()
This code creates a contour plot of the Joint PDF of two random variables X and Y, with X values ranging from
-3 to 3 and Y values ranging from -3 to 3. The joint PDF function is defined using the input variables x and y,
and the plot is generated using the contour function from the matplotlib library. The resulting plot shows the
contours of the Joint PDF, with darker regions indicating higher probability density.
Marginal Distribution:
Marginal distribution refers to the probability distribution of one or more variables in a joint probability
distribution.
The marginal distribution is obtained by summing (or integrating, in the case of continuous variables) the joint
probability distribution over the variables not of interest.
Marginal distributions can be calculated for both discrete and continuous variables.
Marginal distribution is useful in simplifying the analysis of complex problems that involve multiple variables.
The marginal distribution of a single variable can be obtained by summing (or integrating) the joint distribution
over all possible values of the other variables.
The marginal distribution of multiple variables can be obtained by summing (or integrating) the joint
distribution over all possible values of the variables not of interest.
Marginal distributions are important in statistics, as they allow us to study the behavior of individual variables in
a multivariate distribution.
In Python, the marginal distribution can be calculated using the numpy.sum() function for discrete variables and the scipy.integrate.simpson() function (Simpson's rule, formerly simps()) for continuous variables. Here's an example:
import numpy as np
from scipy.integrate import simpson
# Define the joint probability density function
def joint_pdf(x, y):
    return x * y * np.exp(-x * y)
# Marginal PDF for x: integrate the joint PDF over all y values (Simpson's rule)
def marginal_pdf_x(x):
    return simpson(joint_pdf(x, y_vals), x=y_vals)
# Marginal PDF for y: integrate the joint PDF over all x values
def marginal_pdf_y(y):
    return simpson(joint_pdf(x_vals, y), x=x_vals)
# Define the range of values for x and y
x_vals = np.linspace(0, 5, 50)
y_vals = np.linspace(0, 5, 50)
# Calculate the marginal PDFs for x and y
marginal_x = np.array([marginal_pdf_x(x) for x in x_vals])
marginal_y = np.array([marginal_pdf_y(y) for y in y_vals])
The resulting arrays marginal_x and marginal_y contain the marginal probability density functions for the
variables x and y, respectively.
Conditional Distribution:
Conditional distribution is a concept in probability theory that involves calculating the probability distribution of
a random variable given that another random variable takes a certain value. It is a way to understand how the
probability distribution of one variable changes when another variable is known or fixed.
Conditional Probability:
Conditional probability is the probability of an event occurring given that another event has occurred. It is
defined as the probability of event A occurring given that event B has already occurred and is denoted by P(A|
B).
Conditional Probability Mass Function (PMF):
In the case of discrete random variables, the conditional probability mass function (PMF) gives the probability
of a particular value of the random variable, given that another random variable takes a certain value. It is
denoted by P(X = x|Y = y).
Conditional Probability Density Function (PDF):
For continuous random variables, the conditional probability density function (PDF) gives the probability
density of a particular value of the random variable, given that another random variable takes a certain value. It
is denoted by f(x|y).
Examples:
Suppose we have two random variables X and Y, and we want to find the conditional probability of X given that
Y takes a certain value y. We can use the conditional probability formula:
P(X=x|Y=y) = P(X=x, Y=y) / P(Y=y)
where P(X=x, Y=y) is the joint probability of X and Y taking the values x and y respectively, and P(Y=y) is the
marginal probability of Y taking the value y.
Example code for calculating the conditional PMF of X given Y = y:
# Define the joint PMF of X and Y
pmf_XY = {(0, 1): 0.1, (1, 1): 0.3, (0, 2): 0.2, (1, 2): 0.4}
# Define the marginal PMF of Y
pmf_Y = {1: 0.4, 2: 0.6}
# Calculate the conditional PMF of X given Y=1
pmf_X_given_Y1 = {0: pmf_XY[(0, 1)] / pmf_Y[1], 1: pmf_XY[(1, 1)] / pmf_Y[1]}
# Calculate the conditional PMF of X given Y=2
pmf_X_given_Y2 = {0: pmf_XY[(0, 2)] / pmf_Y[2], 1: pmf_XY[(1, 2)] / pmf_Y[2]}
Covariance and correlation are two statistical concepts that measure the relationship between two variables.
Covariance is a measure of how two variables change together, while correlation is a measure of the strength of
their linear relationship. Both are important tools in data analysis and can help us understand the relationship
between different variables.
Covariance:
Covariance is a measure of how two variables vary together. It measures the degree to which two variables are
related to each other. A positive covariance means that the two variables tend to move in the same direction,
while a negative covariance means they tend to move in opposite directions. Covariance is sensitive to the scale
of the variables and is not standardized.
Covariance can be calculated using the following formula: cov(X,Y)=E[(X−E[X])(Y−E[Y])]
Correlation:
Correlation measures the strength of the linear relationship between two variables. It ranges from -1 to 1, where
-1 indicates a perfectly negative correlation, 0 indicates no correlation, and 1 indicates a perfectly positive
correlation. Correlation is standardized, which means it is not sensitive to the scale of the variables.
Correlation can be calculated using the following formula: corr(X,Y)=cov(X,Y)/(std(X)∗std(Y))
Interpretation:
A positive covariance or correlation indicates that two variables tend to move in the same direction, while a
negative covariance or correlation indicates that they tend to move in opposite directions. A correlation of 0
indicates no linear relationship between the variables. The strength of the correlation can be interpreted using
the correlation coefficient.
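As a quick illustration (toy data, not from the text), both quantities can be computed with NumPy:
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 1, 4, 3, 6])

# np.cov returns the covariance matrix; entry [0, 1] is cov(X, Y)
cov_xy = np.cov(x, y)[0, 1]
# np.corrcoef returns the correlation matrix; entry [0, 1] is corr(X, Y)
corr_xy = np.corrcoef(x, y)[0, 1]
print(cov_xy, corr_xy)   # ≈ 2.5 and ≈ 0.82 for this toy data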
🔹 Joint Distribution is an important concept in probability theory and statistics that deals with the probability of
two or more random variables occurring together.
🔹 It involves understanding the joint probability mass function, joint probability density function, joint
cumulative distribution function, and conditional distribution.
🔹 The relationship between random variables can be explored using covariance and correlation.
🔹 Multivariate normal distribution is an example of joint distribution, which is commonly used in statistical
inference and modeling.
🔹 Joint distribution can be used in various applications such as in finance, economics, engineering, and social
sciences.
🔹 Understanding joint distribution is crucial for conducting hypothesis testing, building regression models, and
making predictions in various fields.
🔹 Python provides several packages such as NumPy, SciPy, and Pandas for working with joint distributions,
calculating their properties, and visualizing them.
import numpy as np
from scipy.stats import multivariate_normal
# Generate random data
x = np.random.normal(0, 1, 100)
y = np.random.normal(0, 1, 100)
data = np.column_stack((x, y))
# Calculate mean and covariance matrix
mean = np.mean(data, axis=0)
cov = np.cov(data.T)
# Create a multivariate normal distribution
mnorm = multivariate_normal(mean=mean, cov=cov)
# Calculate probability density function at a point
point = [0.5, 0.5]
pdf = mnorm.pdf(point)
print(f"PDF at {point}: {pdf}")
# Generate random samples from the distribution
samples = mnorm.rvs(10)
print(f"Random samples:\n{samples}")
This code generates random data from a multivariate normal distribution, calculates its mean and covariance
matrix, creates a multivariate_normal object using these parameters, calculates the probability density function
at a point, and generates random samples from the distribution. This is just a simple example to demonstrate
how joint distribution can be used in Python programming.
Activity 1:
A group of students measured the height and weight of 10 classmates. The data is given in the following table:
Student  Height (inches)  Weight (pounds)
1 68 165
2 71 201
3 61 140
4 72 210
5 67 165
6 64 125
7 65 150
8 62 120
9 66 160
10 69 186
a) Calculate the covariance between height and weight.
Activity 2:
Suppose you are a teacher and you have a class of 30 students. You know that the students' heights are normally
distributed with mean μ=64 inches and standard deviation σ=3 inches. Additionally, you know that the heights
of the female students in the class are also normally distributed with mean μf=62 inches and standard
deviation σf=2.5 inches.
a) If a student is chosen at random from the class, what is the probability that their height is between 62 and 66
inches?
b) If you randomly select a female student from the class, what is the probability that her height is between 60
and 64 inches?
Activity 3:
Suppose the heights and weights of a group of 50 students are jointly distributed according to a bivariate normal distribution with mean vector μ = [65, 150] and covariance matrix Σ = [[9, 24], [24, 64]].
What is the probability that a randomly selected student has a height between 62 and 68 inches and a weight
between 140 and 160 pounds?
Activity 4:
Given the following set of data:
X = {2, 4, 6, 8, 10}
Y = {1, 3, 5, 7, 9}
Calculate the covariance and correlation between X and Y.
Activity 5: Insurance Cost Prediction:
In this assignment, we will explore some statistical concepts using the Insurance Cost Prediction dataset from
Kaggle. The dataset contains information on insurance cost for individuals based on their age, sex, BMI, number
of children, smoking habit, and geographic region.
1. Load the dataset into a Pandas dataframe and display the first five rows of the dataset. Dataset
link: https://www.kaggle.com/mirichoi0218/insurance
2. Plot the distribution of insurance charges for male and female policyholders.
3. Calculate the conditional distribution of insurance charges given the policyholder's smoking habit.
4. Create a scatter plot to visualize the relationship between BMI and insurance charges.
5. Compute the covariance and correlation between the policyholder's age and insurance charges.
6. Create a heatmap to visualize the covariance matrix of the dataset.
7. Fit a multivariate normal distribution to the dataset using the maximum likelihood estimation and
visualize the contours of the distribution in a 2D plot.
Sampling refers to the process of selecting a subset, known as a sample, from a larger group, known
as a population. The sample is carefully chosen to be representative of the population's characteristics,
allowing researchers to draw meaningful conclusions about the entire population based on the
analysis of the sample data. Sampling involves techniques that aim to minimize bias and ensure the
sample accurately reflects the diversity and variability present in the population.
Probability Sampling
Probability sampling involves the selection of elements from the population using random selection, in which each element of the population has an equal and independent chance of being chosen.
To put it simply, it is a sampling technique wherein the samples are gathered in a process that gives all the individuals in the population equal chances of being selected.
Imagine you have a big bag of candies, and you want to know what flavors are inside without eating all of them.
Probability sampling is like reaching into the bag and picking out candies in a way that gives every candy an
equal chance to be chosen. This method helps you get a good idea of what flavors are in the bag without having
to check every single candy.
It is also called random sampling.
Random Sample
A set of items selected from a parent population is a random sample if:
the probability that any item in the population is included in the sample is proportional to its frequency in the
parent population, and
the inclusion/exclusion of any item in the sample operates independently of the inclusion/exclusion of any other
item.
Random Sample Notation
A random sample is made up of (iid) random variables and so they are denoted by capital X’s.
We will use the shorthand notation X to denote a random sample, that is, X = (X1, X2, ..., Xn).
An observed sample will be denoted by x=(x1,x2,…,xn).
The population distribution will be specified by a density (or probability function) denoted by f(x;θ), where θ denotes the parameter(s) of the distribution, such as the mean (denoted by μ) and variance (denoted by σ²).
Non-Probability Sampling:
Non-probability sampling is a type of sampling technique where not all members of the population have a
known and equal chance of being selected into the sample.
Unlike probability sampling, non-probability sampling methods do not rely on random selection and may
introduce various forms of bias into the sample.
Non-probability sampling is often used when it is not feasible or practical to implement probability sampling
methods.
Types of Non-Probability Sampling
Judgment or purposive or deliberate sampling
Convenience sampling
Quota sampling
Snowball sampling
Snowball sampling is used when the population is hard to reach, and participants are recruited through
referrals from existing participants.
It's often used in studies involving hidden or marginalized populations.
Example: Imagine a study investigating the experiences of individuals who have experienced homelessness.
The researcher starts with a small group of participants who are willing to share their experiences. Then, these
initial participants refer the researcher to others within the homeless community, forming a "snowball" effect.
This method allows researchers to access a population that may not be easily accessible through traditional
sampling methods.
Sample Size Determination:
Sum of Independent R.V.
Expected Value (Mean) of the Sum of Independent Random Variables
Suppose we have n independent random variables, X1, X2, ..., Xn. Each of these random variables has its own
expected value (mean), denoted as E[Xi], where i ranges from 1 to n.
The expected value of the sum of these independent random variables, denoted as E[X1 + X2 + ... + Xn], is the
sum of their individual expected values:
E[X1+X2+....+Xn] = E[X1]+E[X2]+....+E[Xn]
This means that the expected value of the sum of independent random variables is simply the sum of their
individual expected values.
Variance of the Sum of Independent Random Variables
Similarly, the variance of the sum of independent random variables, denoted as var[X1 + X2 + ... + Xn], is the
sum of their individual variances:
var[X1+X2+....+Xn] = var[X1]+var[X2]+....+var[Xn]
Again, this equation states that the variance of the sum of independent random variables is the sum of their
individual variances.
Example:
Let's say you're rolling two fair six-sided dice 🎲 . Each die has an expected value of (1+2+3+4+5+6)/6 = 3.5 and a variance of ((1−3.5)² + (2−3.5)² + ... + (6−3.5)²)/6 ≈ 2.9167.
Now, if you want to find the expected value and variance of the sum of the two dice rolls:
Expected value: E[X1 + X2] = E[X1] + E[X2] = 3.5 + 3.5 = 7
Variance: var[X1 + X2] = var[X1] + var[X2] = 2.9167 + 2.9167 ≈ 5.8334
This demonstrates how you can use the properties of expected values and variances to analyze the combined
outcomes of independent random variables.
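A small simulation (illustrative only) that checks these two properties for the dice example:
import numpy as np

rng = np.random.default_rng(0)
# Simulate many rolls of two fair six-sided dice
d1 = rng.integers(1, 7, size=100_000)
d2 = rng.integers(1, 7, size=100_000)
total = d1 + d2

print(total.mean())   # close to E[X1] + E[X2] = 7
print(total.var())    # close to var[X1] + var[X2] ≈ 5.83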
Sampling Bias:
Definition of a statistic
A statistic is a function of X only and does not involve any unknown parameters.
Examples:
Sample Mean, X̄ = (1/n) ∗ ΣXi
Sample Variance, S² = (1/(n−1)) ∗ Σ(Xi − X̄)²
Non-Example:
(1/n) ∗ Σ(Xi − μ)² is not a statistic, unless μ is known.
A statistic can be generally denoted by g(X) . Since a statistic is a function of random variables, it will be a
random variable itself and will have a distribution, its sampling distribution.
The t result
In most cases, σ2 is not known and so the z result cannot be used in such cases. We use the t result in such
cases.
t = (X̄ − μ) / (S/√n) ~ t(n−1)
This is the t result or the t sampling distribution.
This result is valid for samples from normal distribution only.
The t distribution is symmetrical about zero.
The F result
If independent random samples of size n1 and n2 respectively are taken from normal populations with variances σ1² and σ2², then
F = (S1²/σ1²) / (S2²/σ2²) ~ F(n1−1, n2−1)
This result is valid for samples from normal distribution only.
Point Estimation:
Method of Moments
The basic principle is to equate population moments (i.e the means, variances, etc of the theoretical model) to
corresponding sample moments (i.e the means, variances, etc of the sample data observed) and solve for the
parameter(s).
One parameter case:
E(X) = (1/n) ∗ ΣXi
Two parameter case:
E(X) = (1/n) ∗ ΣXi and E(X²) = (1/n) ∗ ΣXi²
Method of Moments Example
A random sample from a Exp(λ) distribution is as follows:
14.84, 0.19, 11.75, 1.18, 2.44, 0.53
Calculate the method of moments estimate for λ.
x̄ = (14.84+0.19+11.75+1.18+2.44+0.53)/6 = 5.155
E[X] = 1/λ
According to the method of moments: E[X] = x̄ => 1/λ = 5.155 => λ = 0.194
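The same estimate can be reproduced numerically (a short sketch using the sample above):
import numpy as np

sample = np.array([14.84, 0.19, 11.75, 1.18, 2.44, 0.53])
x_bar = sample.mean()       # sample mean = 5.155
lambda_hat = 1 / x_bar      # method of moments: set E[X] = 1/λ equal to x̄
print(lambda_hat)           # ≈ 0.194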
Summary:
🔹 Random Sample: Understanding the characteristics and importance of random sampling in statistical
analysis.
🔹 Random Sample Notation: Familiarizing yourself with the notation and representation of random samples
using capital X's.
🔹 Sum of Independent R.V.: Learning the properties of sums of independent random variables, facilitating
calculations in statistical studies.
🔹 Normal Approximations Using CLT: Exploring the Central Limit Theorem and its application to
approximating distributions.
🔹 The Sample Variance: Understanding the sample variance as an unbiased estimator of population variance
and its distribution.
🔹 The t result: Recognizing the t-distribution as an alternative when population variance is unknown.
🔹 The F result: Exploring the F-distribution in scenarios with two independent samples from normal
populations.
🔹 Point Estimation: Embracing the method of moments and maximum likelihood as techniques for estimating
parameters.
🔹 Method of Moments Example: Applying the method of moments to estimate parameters using a sample from
an exponential distribution.
🔹 Method of Maximum Likelihood: Understanding the widely acclaimed method for finding estimators based
on the likelihood of samples.
🔹 Properties of Estimators: Uncovering the properties of estimators, including bias, variance, and mean
squared error.
Activity 1:
Consider a sample of 10 students' test scores: {75, 80, 92, 68, 85, 77, 81, 79, 90, 88}. Calculate the sample
mean.
Activity 2:
Consider a sample of 10 students' test scores: {75, 80, 92, 68, 85, 77, 81, 79, 90, 88}. Calculate the sample
variance.
Activity 3:
A sample of 12 observations is collected from a normally distributed population. The sample mean is 68, and
the sample standard deviation is 5. Test the hypothesis that the population mean is 65 at a significance level of
0.01.
Activity 4:
Calculate the probability that, for a random sample of 5 values taken from a N(100, 25²) population,
(i) X̄ will be between 80 and 120, and
(ii) S will exceed 41.7.
Activity 5:
A random sample from an exponential distribution is given as follows:
8.21, 3.47, 5.92, 2.14, 1.05
Calculate the method of moments estimate for the parameter λ of the exponential distribution.
Concept of Confidence
Confidence levels
The confidence level is the probability that an interval estimate, constructed from a sample statistic (e.g., a mean or proportion), contains the true population parameter.
The width of a confidence interval is affected by sample size, confidence level, and standard deviation.
Confidence levels in statistics are like a rollercoaster ride. We start with a hypothesis and take a leap of faith by
making a prediction about our population parameter. Then, we collect data and take a thrilling ride through the
statistical analysis process.
Along the way, we use tools like sample means and standard deviations to estimate the population parameter,
but we know that our estimate may not be perfect due to sampling variability. This is where confidence levels
come in.
Think of a confidence level like a safety harness on a rollercoaster. It helps us to stay within a certain range of
our estimated value, keeping us safe from extreme values that could throw off our analysis. A wider confidence
interval suggests that the estimate is less precise, as there is more variability in the data.
Example:
A 95% confidence level means that we are 95% certain that the true population parameter falls within our
calculated confidence interval. It's like taking 100 rides on a rollercoaster and being confident that 95 of them
will keep us safe within the bounds of our confidence level.
To understand confidence levels better, imagine that you want to estimate the average height of all the people in
your city. You can't measure everyone's height, so you take a sample of people and calculate their average
height. But you know that your sample may not be representative of the entire population, so you want to be
sure that your estimate is accurate within a certain range.
This is where confidence levels come in. You can calculate a confidence interval that gives you a range of
possible values for the true population mean. The confidence level tells you how confident you are that the true
mean falls within that interval.
A higher confidence level means that you're more certain that the true mean falls within the interval, but it also
means that the interval will be wider. A lower confidence level means that the interval will be narrower, but
you'll be less certain that the true mean falls within it.
So, understanding confidence levels in statistics is like knowing the limits of our rollercoaster ride. We can take
risks and make predictions, but with the help of confidence levels, we can stay safe and confident in our results.
Note: If a confidence interval for a difference or an effect includes the value of zero, this typically means there is no statistically significant effect or difference between groups.
The level of confidence c is the probability that the interval estimate contains the population parameter. The remaining area in the tails is 1 − c.
Interpreting Confidence Intervals:
If we require a 95% confidence interval, then we can read off the 2.5% and 97.5% z-values.
This gives us z0.025 = -1.96 and z0.975 = +1.96.
Putting in the values of z0.025 and z0.975 and rearranging, we get
(X̄ − 1.96σ/√n, X̄ + 1.96σ/√n)
This is the 95% confidence interval for the population mean, μ. It is also expressed as:
X̄ ± 1.96σ/√n
Example
The average IQ of a sample of 50 university students was found to be 132. Calculate a symmetrical 95%
confidence interval for the average IQ of university students, assuming that IQs are normally distributed. It is
known from previous studies that the standard deviation of IQs among students is approximately 20.
Here, X ~ N(μ, 20²).
Given, n = 50, x̄ = 132, α=0.05
=> 95% confidence interval for μ is (132 − 1.96 × 20/√50, 132 + 1.96 × 20/√50),
i.e., (126.5, 137.5) => We are 95% confident that the average IQ of the population lies between 126.5 and 137.5
Width of the interval = 2 × MoE (margin of error).
Given a required MoE and the value of σ, we can determine the necessary sample size n.
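A sketch of the IQ-example calculation in Python (scipy assumed), which also shows the margin of error explicitly:
import numpy as np
from scipy.stats import norm

n, x_bar, sigma, alpha = 50, 132, 20, 0.05
z = norm.ppf(1 - alpha / 2)          # ≈ 1.96
moe = z * sigma / np.sqrt(n)         # margin of error ≈ 5.5
print((x_bar - moe, x_bar + moe))    # ≈ (126.5, 137.5)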
Example Calculate a 95% confidence interval for the average height of 10-year-old children 👫 , assuming that
heights have a N(μ, σ²) distribution (where μ and σ are unknown), based on a random sample of 5 children
whose heights are: 124cm, 122cm, 130cm, 125cm and 132cm.
Here, n = 5, x̄ = 126.6, s = 4.22, α = 0.05
=> t0.025,4 = −2.776 => 95% confidence interval for μ is
(126.6 − 2.776 × 4.22/√5, 126.6 + 2.776 × 4.22/√5), i.e., (121.4, 131.8)
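The same interval can be obtained in Python using the t distribution (a sketch, scipy assumed):
import numpy as np
from scipy.stats import t

heights = np.array([124, 122, 130, 125, 132])
n = len(heights)
x_bar = heights.mean()               # 126.6
s = heights.std(ddof=1)              # ≈ 4.22 (sample standard deviation)
t_crit = t.ppf(0.975, df=n - 1)      # ≈ 2.776
moe = t_crit * s / np.sqrt(n)
print((x_bar - moe, x_bar + moe))    # ≈ (121.4, 131.8)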
For a sample proportion p̂ = 0.18 with estimated standard error √(p̂(1 − p̂)/n) = 0.024,
=> 90% confidence interval for p is (0.18 − 1.6449 × 0.024, 0.18 + 1.6449 × 0.024), i.e., (0.140, 0.220)
Choosing the appropriate confidence level in statistics is like choosing the right door to open. You want to be
sure that the door you choose leads to the right answer, but you also don't want to waste time and resources on
unnecessary or inaccurate information.
When choosing a confidence level, you need to balance the trade-off between being confident in your results
and having a narrow interval. Generally, higher confidence levels provide greater confidence in your results, but
they also require larger sample sizes and lead to wider intervals. Lower confidence levels provide narrower
intervals, but with less confidence in your results.
For example, let's say you want to estimate the mean height of all the trees in a large forest. If you choose a
90% confidence level, you can be 90% confident that the true population mean falls within your calculated
confidence interval. However, the interval may be wider, requiring a larger sample size. On the other hand, if
you choose a 99% confidence level, you can be more confident in your results, but the interval may be too wide
to be practical.
In general, a confidence level of 95% is commonly used in many statistical analyses as a balance between
precision and confidence. This means that we can be 95% confident that the true population parameter falls
within our calculated interval. However, the appropriate confidence level depends on the specific context and
goals of the analysis.
Activity 1:
A pharmaceutical company is conducting a study to compare the effectiveness of two different medications, A
and C, in treating a specific medical condition. The company randomly assigns patients to two groups: Group A
receives Medication A, and Group C receives Medication C. The company measures a certain outcome variable
before and after the treatment period for each group. The results are as follows:
Group A: Sample size = 30, Sample mean = 85, Sample standard deviation = 10
Group C: Sample size = 35, Sample mean = 75, Sample standard deviation = 12
Using the provided information, calculate the test statistic for comparing the means of Medication A and
Medication C.
A) 0.97
B) 1.35
C) -0.97
D) -1.35
Activity 2:
A research study investigated the mortality rates of two different groups, Group X and Group Y. In Group X, out
of 80 individuals aged 60, 15 died within a year. In Group Y, out of 120 individuals aged 60, 25 died within a
year. The researchers assume that the number of deaths follows a binomial distribution.
Calculate a symmetrical 95% confidence interval for the difference between the mortality rates of Group X and
Group Y at this age.
A) (-0.060, 0.164)
B) (0.023, 0.126)
C) (-0.105, 0.219)
D) (0.042, 0.107)
Activity 3:
A motor company runs tests to investigate the fuel consumption of cars using a newly developed fuel additive.
Sixteen cars of the same make and age are used, eight with the new additive and eight as controls. The results, in
miles per gallon over a test track under regulated conditions, are as follows:
Obtain a 95% confidence interval for the increase in miles per gallon achieved by cars with the additive. State
clearly any assumptions required for this analysis.
Activity 4:
In a mortality investigation, 25 of the 100 ninety-year-old males and 20 of the 150 ninety-year-old females present at the start of the investigation died before the end of the year. Assuming that the numbers of deaths follow binomial distributions, calculate a symmetrical 95% confidence interval for the difference between male and female mortality rates at this age.
Activity 5:
A pharmaceutical company wants to determine the average effectiveness of a new pain medication. A random
sample of 100 patients who were administered the medication is selected. The company wants to estimate the
average pain reduction with 95% confidence. The sample mean pain reduction is found to be 2.5 units with a
standard deviation of 0.8 units.
What is the 95% confidence interval for the true mean pain reduction of the medication?
Hypothesis Testing
Hypothesis:
A hypothesis can be defined as a proposed explanation for a phenomenon. It is not the absolute truth but a
provisional working assumption. In statistics, a hypothesis is considered to be a particular assumption
about a set of parameters of a population distribution. It is called a hypothesis because it is not known
whether or not it is true.
For example, imagine you notice that your plants are growing taller when you play classical music for
them. You might come up with a hypothesis that the music helps the plants grow. This is just an initial idea
that you can test by playing different genres of music for different groups of plants, measuring their
growth, and comparing the results. If you find that the plants do indeed grow better with classical music,
you can refine your hypothesis and test it further with more experiments.
In statistics, a hypothesis is similar in that it's an assumption that we make about a population based on a
sample of data. We can use statistical tests to evaluate the likelihood that our hypothesis is true, or to
identify other possible explanations for our observations.
Hypothesis Test:
A hypothesis test is a standard procedure for testing a claim about a property of a population.
Rare Event Rule for Inferential Statistics:
If, under a given assumption, the probability of a particular observed event is exceptionally small, we
conclude that the assumption is probably not correct.
For Example:
ProCare Industries, Ltd., once provided a product called “Gender Choice,” which, according to advertising
claims, allowed couples to “increase your chances of having a boy up to 85%, a girl up to 80%.” Gender Choice
was available in blue packages for couples wanting a baby boy and (you guessed it) pink packages for couples
wanting a baby girl.
Suppose we conduct an experiment with 100 couples who want to have baby girls, and they all follow the
Gender Choice “easy-to-use in-home system” described in the pink package. For the purpose of testing the
claim of an increased likelihood for girls, we will assume that Gender Choice has no effect.
Null Hypothesis : Ho
The null hypothesis (denoted by Ho) is a statement that the value of a population parameter (such as proportion,
mean, or standard deviation) is equal to some claimed value.
We test the null hypothesis directly.
Either reject Ho or fail to reject Ho.
The null hypothesis is what we are willing to assume is the case until proven otherwise. We can never claim that
the null hypothesis has been actually proved.
Alternative Hypothesis : HA
The alternative hypothesis (denoted by H1, Ha, or HA) is the statement that the parameter has a value that somehow differs from the value stated in the null hypothesis.
The symbolic form of the alternative hypothesis must use one of these symbols: ≠, <, >.
The alternative hypothesis is typically what researchers are hoping to find evidence for, as it represents a new
theory or idea that could advance our understanding of the world.
There are three types of alternative hypotheses:
One-tailed alternative hypothesis: This type of alternative hypothesis proposes that there is a difference
between two groups in a specific direction.
For example, if we wanted to test the hypothesis that a new medication reduces pain better than a placebo, we
might use a one-tailed alternative hypothesis that says the medication reduces pain more than the placebo.
Two-tailed alternative hypothesis: This type of alternative hypothesis proposes that there is a difference
between two groups, but does not specify a particular direction.
For example, if we wanted to test the hypothesis that a new medication has a different effect on pain than a
placebo, we might use a two-tailed alternative hypothesis that says the medication has a different effect on pain
than the placebo.
Non-inferiority or equivalence alternative hypothesis: This type of alternative hypothesis proposes that there is
no significant difference between two groups or treatment, or that the difference is not clinically meaningful.
For example, if we wanted to test the hypothesis that a new medication is no worse than an existing medication
for treating a certain condition, we might use a non-inferiority alternative hypothesis.
For example: suppose we want to test the hypothesis that a new pain medication is more effective than an
existing medication. We could set up our hypotheses as follows:
Null hypothesis: The new pain medication is no more effective than the existing medication.
One-tailed alternative hypothesis: The new pain medication is more effective than the existing medication.
Two-tailed alternative hypothesis: The new pain medication has a different effect on pain than the existing
medication. Non-inferiority or equivalence alternative hypothesis: The new pain medication is not significantly
worse than the existing medication.
We could then conduct a statistical test to determine whether the data supports one of these hypotheses over the
others. By carefully choosing our hypotheses and analyzing the data, we can draw conclusions about the
effectiveness of the new medication and potentially make improvements to patient care.
How to form your claim or hypothesis?
If you are conducting a study and want to use a hypothesis test to support your claim, the claim must be worded
so that it becomes the alternative hypothesis.
Step 1 : Identify the specific claim or hypothesis to be tested and express it in symbolic form
Step 2 : Give the symbolic form that must be true when the original claim is false
Step 3 : Of the two symbolic expressions obtained so far, let the alternative hypothesis HA be the one not
containing equality so that HA uses the symbol < or > or ≠. Let the null hypothesis Ho be the symbolic expression that the parameter equals the fixed value being considered
Identify the null and alternative hypothesis:
The proportion of drivers who admit to running red lights is greater than 0.5.
Step 1 : We express the given claim as p > 0.5.
Step 2 : We see that if p > 0.5 is false, then p ≤ 0.5 must be true.
Step 3 : We let the alternative hypothesis HA be p > 0.5, and we let Ho be p = 0.5.
Type - I error
A Type I error is the mistake of rejecting the null hypothesis when it is true.
The symbol α (alpha) is used to represent the probability of a type I error.
Type - II error
A Type II error is the mistake of failing to reject the null hypothesis when it is false.
The symbol β (beta) is used to represent the probability of a type II error.
Test Statistic
The test statistic is a value used in making a decision about the null hypothesis, and is found by converting the
sample statistic to a score with the assumption that the null hypothesis is true.
Test Statistic - Formula
Test statistic for proportions:
z = (p̂ − p) / √(pq/n)
Test statistic for the mean:
z = (X̄ − μ) / (σ/√n)
Test statistic for the variance:
χ² = (n−1)S² / σ²
Test Statistic - Example
Problem :
A survey of n = 880 randomly selected adult drivers showed that 56% (or p = 0.56) of those respondents
admitted to running red lights. Find the value of the test statistic for the claim that the majority of all adult
drivers admit to running red lights.
The preceding example showed that the given claim results in the following null and alternative
hypotheses: H0 : p = 0.5 and HA : p > 0.5. Because we work under the assumption that the null hypothesis is
true with p = 0.5, we get the following test statistic:
z = (p̂ − p)/√(pq/n) = (0.56 − 0.5)/√((0.5)(0.5)/880) = 3.56
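This calculation can be checked numerically; the sketch below simply re-evaluates the formula (NumPy assumed):
import numpy as np

p_hat, p0, n = 0.56, 0.5, 880
z = (p_hat - p0) / np.sqrt(p0 * (1 - p0) / n)
print(z)   # ≈ 3.56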
Significance Level
The significance level (denoted by α) defines how much evidence we require to reject H0 in favor of HA
Critical Region
The critical region (or rejection region) is the set of all values of the test statistic that cause us to reject the null
hypothesis.
Critical Value
A critical value is any value that separates the critical region (where we reject the null hypothesis) from the
values of the test statistic that do not lead to rejection of the null hypothesis. The critical values depend on the
nature of the null hypothesis, the sampling distribution that applies, and the significance level α.
Finding Critical Value for α=0.01 (Two-tailed test)
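One way to find such a critical value numerically is sketched below (scipy assumed; norm.ppf is the inverse of the standard normal CDF):
from scipy.stats import norm

alpha = 0.01
# Two-tailed test: split alpha equally between the two tails
z_crit = norm.ppf(1 - alpha / 2)
print(z_crit)   # ≈ 2.576, so the critical values are ±2.576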
In hypothesis testing, a null hypothesis (H₀) is tested against an alternative hypothesis (H₁). The alternative hypothesis can be one-tailed or two-tailed, which determines the direction of the test.
One-tailed test
A one-tailed test is when the alternative hypothesis is directional, meaning it specifies a particular direction
of difference between the sample mean and the population mean.
The critical region is located entirely in one tail of the distribution.
For example, if we want to test whether a new drug increases blood pressure, the one-tailed alternative
hypothesis would be that the drug increases blood pressure.
A one-tailed test, showing the p-value as the size of one tail.
Two-tailed test :
A two-tailed test is when the alternative hypothesis is non-directional, meaning it specifies a difference
between the sample mean and the population mean without specifying a particular direction.
The critical region is located in both tails of the distribution.
For example, if we want to test whether a coin is biased, the two-tailed alternative hypothesis would be
that the coin is not fair.
A two-tailed test applied to the normal distribution.
One-Tailed Test:
Null Hypothesis (H0): μ≤X
Alternative Hypothesis (Ha): μ>X
Test Statistic: z = (X̄ − μ) / (σ/√n)
where, X¯ is the sample mean,
μ is the hypothesized population mean,
σ is the population standard deviation,
n is the sample size.
Critical Value: zα
where, α is the significance level and can be found in a z-table or calculated using a calculator.
Rejection Region:
If, z>zα reject the null hypothesis.
Example:
A shoe manufacturer claims that their shoes have an average lifespan of 12 months. You take a random sample
of 100 shoes and find that the average lifespan is 13 months with a standard deviation of 2 months. Test the
manufacturer's claim at a 5% level of significance.
H0:μ≤12
Ha:μ>12
Test Statistic: z = (13 − 12) / (2/√100) = 5
Critical Value: z0.05=1.645
Rejection Region: If z>1.645, reject the null hypothesis.
Since the calculated test statistic (z=5) is greater than the critical value (z0.05=1.645), we reject the null
hypothesis and conclude that the shoe manufacturer's claim is not supported by the data.
Two-Tailed Test
Null Hypothesis (H0): μ=X
Alternative Hypothesis (Ha): μ≠X
Test Statistic: t = (X̄ − μ) / (s/√n)
where, X̄ is the sample mean, μ is the hypothesized population mean,
s is the sample standard deviation, and n is the sample size.
Degrees of Freedom: n−1
Critical Value: tα/2
where, α is the significance level and can be found in a t-table or calculated using a calculator.
Rejection Region: If t < −tα/2 or t > tα/2, reject the null hypothesis.
Example:
A grocery store claims that their organic apples weigh an average of 150 grams. You take a random sample of
30 organic apples and find that the average weight is 155 grams with a standard deviation of 10 grams. Test the
grocery store's claim at a 5% level of significance.
H0:μ=150
Ha:μ≠150
Test Statistic: t = (155 − 150) / (10/√30) ≈ 2.74
Degrees of Freedom: 30 − 1 = 29
Critical Value: t0.025 = ±2.045
Rejection Region: If t < −2.045 or t > 2.045, reject the null hypothesis.
Since the calculated test statistic (t ≈ 2.74) is greater than the critical value of 2.045, it falls in the rejection region, so we reject the null hypothesis and conclude that the grocery store's claim is not supported by the data.
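A sketch of the same two-tailed t-test computed from the summary statistics of the apple example (scipy assumed; scipy's ttest_1samp expects raw data, so the statistic is computed by hand here):
import numpy as np
from scipy.stats import t

n, x_bar, mu0, s, alpha = 30, 155, 150, 10, 0.05
t_stat = (x_bar - mu0) / (s / np.sqrt(n))     # ≈ 2.74
t_crit = t.ppf(1 - alpha / 2, df=n - 1)       # ≈ 2.045
p_value = 2 * t.sf(abs(t_stat), df=n - 1)     # two-tailed p-value ≈ 0.01
print(t_stat, t_crit, p_value)
# t_stat exceeds t_crit (and p_value < 0.05), so the null hypothesis is rejected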
Right and left-tailed tests are two types of hypothesis tests that are commonly used in statistical analysis.
Right-tailed test
A right-tailed test is used to determine if a sample mean is significantly greater than a hypothesized population
mean. The null hypothesis assumes that the sample mean is less than or equal to the hypothesized population
mean. The alternative hypothesis assumes that the sample mean is greater than the hypothesized population
mean.
Formula:
Null Hypothesis (H0): μ≤X
Alternative Hypothesis (Ha): μ>X
Test Statistic: z = (X̄ − μ) / (s/√n)
where X̄ is the sample mean,
μ is the hypothesized population mean,
s is the sample standard deviation,
n is the sample size.
Critical Value: zα ,where, α is the significance level and can be found in a z-table or calculated using a
calculator.
Rejection Region: If, z>zα, reject the null hypothesis.
Example:
A car manufacturer claims that a new engine design can achieve an average fuel efficiency of 30 miles per
gallon (MPG) or more. You take a random sample of 50 cars with the new engine and find that the average fuel
efficiency is 31 MPG with a standard deviation of 2 MPG. Test the manufacturer's claim at a 5% level of
significance.
H0:μ≤30
Ha:μ>30
Test Statistic: z = (31 − 30) / (2/√50) ≈ 3.54
Critical Value: z0.05 = 1.645
Rejection Region: If z > 1.645, reject the null hypothesis.
Since the calculated test statistic (z ≈ 3.54) is greater than the critical value (z0.05 = 1.645), we reject the null hypothesis and conclude that the car manufacturer's claim is supported by the data.
Left-tailed test
A left-tailed test is used to determine if a sample mean is significantly less than a hypothesized population mean.
The null hypothesis assumes that the sample mean is greater than or equal to the hypothesized population mean.
The alternative hypothesis assumes that the sample mean is less than the hypothesized population mean.
Formula:
Null Hypothesis (H0): μ≥X
Alternative Hypothesis (Ha): μ<X
Test Statistic: z = (X̄ − μ) / (s/√n)
where, X̄ is the sample mean,
μ is the hypothesized population mean,
s is the sample standard deviation,
n is the sample size.
Critical Value: −zα, where α is the significance level and can be found in a z-table or calculated using
a calculator.
Rejection Region: If, z<−zα, reject the null hypothesis.
Example:
A bakery claims that their new recipe for muffins contains no more than 10 grams of sugar. You take a random
sample of 25 muffins and find that the average sugar content is 9 grams with a standard deviation of 1 gram.
Test the bakery's claim at a 1% level of significance.
H0:μ≥10
Ha:μ<10
Test Statistic: z = (9 − 10) / (1/√25) = −5
Critical Value: −z0.01=−2.326
Rejection Region: If z<−2.326, reject the null hypothesis.
Since -5 is less than -2.326, we reject the null hypothesis and conclude that the bakery's claim of no more than
10 grams of sugar per muffin is supported by the data.
A one-tailed test is used when the alternative hypothesis specifies a direction, either an increase or a decrease, in
the parameter being tested. A two-tailed test is used when the alternative hypothesis does not specify a direction,
but only that the parameter is not equal to the hypothesized value.
In a right-tailed test, the critical region is in the right tail of the distribution and the null hypothesis is rejected if
the test statistic falls in the critical region to the right of the mean. In a left-tailed test, the critical region is in the
left tail of the distribution and the null hypothesis is rejected if the test statistic falls in the critical region to the
left of the mean.
In summary, one-tailed and two-tailed tests are concerned with the directionality of the alternative hypothesis,
while left-tailed and right-tailed tests are concerned with the location of the critical region.
P-Values:
The P-value (or p-value or probability value) is the probability of getting a value of the test statistic that is at
least as extreme as the one representing the sample data, assuming that the null hypothesis is true. The null
hypothesis is rejected if the P-value is very small, such as 0.05 or less.
If a P-value is small enough, then we say the results are statistically significant.
We see that for values of z = 3.50 and higher, we use 0.9999 for the cumulative area to the left of the test
statistic.
The P-value is 1 − 0.9999 = 0.0001. Since the P-value of 0.0001 is less than the significance level of α = 0.05, we reject the null hypothesis. There is sufficient evidence to support the claim.
Example 2:
We have a sample of 106 body temperatures having a mean of 98.20°F. Assume that the sample is a simple
random sample and that the population standard deviation is known to be 0.62°F. Use a 0.05 significance level
to test the common belief that the mean body temperature of healthy adults is equal to 98.6°F.
H0 : μ = 98.6 , HA : μ ≠ 98.6, α = 0.05 , X¯ = 98.2 , σ = 0.62
z = (X̄ − μ) / (σ/√n) = (98.2 − 98.6) / (0.62/√106) = −6.64
This is a two-tailed test and the test statistic is to the left of the center, so the P-value is twice the area to the left
of z = –6.64.
Using norm.cdf, the area to the left of z = −6.64 is essentially zero (a z-table reports 0.0001 for such extreme values), so the P-value is at most 2(0.0001) = 0.0002.
Because this P-value is less than the significance level of α = 0.05, we reject the null hypothesis. There is sufficient evidence to conclude that the mean body temperature of healthy adults differs from 98.6°F.
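The area referred to above can be computed directly with scipy's norm.cdf (a sketch; the exact value is far smaller than the 0.0001 that a z-table reports for such extreme z, and the conclusion is unchanged):
import numpy as np
from scipy.stats import norm

n, x_bar, mu0, sigma = 106, 98.2, 98.6, 0.62
z = (x_bar - mu0) / (sigma / np.sqrt(n))   # ≈ -6.64
p_value = 2 * norm.cdf(z)                  # two-tailed p-value, essentially zero
print(z, p_value)                          # far below 0.05, so H0 is rejected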
Activity 1
A researcher wants to investigate whether the average IQ of university students is greater than 100. A sample of
50 university students was selected, and their average IQ was found to be 105. The researcher assumes that IQs
are normally distributed, and previous studies suggest that the standard deviation of IQs among students is
approximately 20.
Is there sufficient evidence to conclude that the average IQ of university students is greater than 100, using a
significance level of 0.05?
Activity 2
A researcher wants to assess whether the standard deviation of the heights of 10-year-old children is equal to
3cm. The researcher randomly selects a sample of 5 heights (in cm): 124, 122, 130, 125, and 132. What
statistical test can be conducted to determine if the standard deviation of heights is equal to 3cm?
Activity 3
In a one-year mortality investigation, 60 out of 300 ninety-year-olds present at the start of the investigation died
before the end of the year. The investigation assumes that the number of deaths follows a binomial distribution
with parameters N = 300 and q = 0.25. What conclusion can be drawn from this data regarding the mortality rate
for this age?
Activity 4
A new gene has been identified that makes individuals particularly susceptible to a specific food allergy. In a
random sample of 100 children from a certain region, 15 were found to be carriers of the gene. Test whether the
proportion of children in that region carrying the gene is greater than 20%.
Null Hypothesis (H0): The proportion of children carrying the gene in the region is equal to or less than 20%.
Alternative Hypothesis (Ha): The proportion of children carrying the gene in the region is greater than 20%.
You are conducting a hypothesis test using the provided information with a significance level of 0.05. What can
you conclude based on the test results regarding the alternative hypothesis?
Experiment Design
Experiment Design:
Experimental design in statistics refers to the process of planning and conducting experiments to answer
research questions or test hypotheses.
Let's say you're a scientist studying the effects of a new fertilizer on the growth of tomato plants. To conduct
your experiment, you need to design a plan that controls for variables that could affect your results, like the
amount of water and sunlight each plant receives.
First, you decide to divide your tomato plants into two groups: an experimental group that will receive the new
fertilizer and a control group that won't receive any fertilizer.
Next, you randomly assign each plant to one of the two groups to avoid any bias in the results.
You also decide to measure the height of each plant at regular intervals over the course of the experiment and
record the data in a spreadsheet.
After the experiment is complete, you can use inferential statistics to analyze your data and determine if there
was a significant difference in the growth of the two groups of tomato plants.
By carefully designing your experiment and controlling for variables, you can draw accurate conclusions about
the effects of the new fertilizer on tomato plant growth.
Overall, experimental design is a crucial step in statistical analysis as it allows researchers to control the
variables and obtain meaningful results that can be used to make informed decisions.
Randomized designs
In this type of experimental design, the subjects are randomly assigned to different treatment groups.
One example of a randomized design is a clinical trial for a new medication, where patients are randomly
assigned to either the treatment group (receiving the new medication) or the control group (receiving a
placebo).
Post-test only design
In this design, subjects are randomly assigned to either the treatment or control group and are measured on the
dependent variable after the treatment is given.
An example of this could be a study measuring the effectiveness of a new exercise program by comparing the
fitness levels of participants who completed the program with those who did not.
Formula:
The formula for the post-test only design is Y = X + ε, where:
Y = the dependent variable measured after the intervention
X = the intervention or treatment
ε = the error term
Example:
Suppose we want to evaluate the effectiveness of a new pain medication. We randomly select 100 patients with
chronic pain and divide them into two groups: treatment and control. The treatment group receives the new
medication while the control group receives a placebo. After two weeks, we measure the level of pain using a
pain scale from 1 to 10.
The data we collected is as follows:
Treatment Group: Y1=X1+ε1
Control Group: Y2=X2+ε2
Suppose we found that the average pain score for the treatment group is 4.5, and the average pain score for the
control group is 6.5. Using the formula above, we can test whether there is a significant difference between the
two groups:
Y1−Y2=(X1+ε1)−(X2+ε2)
Y1−Y2=X1−X2+(ε1−ε2)
If the difference between the two groups is statistically significant, we can conclude that the new medication is
effective in reducing chronic pain.
Pretest-post-test design
This design is similar to the post-test only design, but with the additional step of measuring the dependent variable before the treatment is given.
An example of this could be a study measuring the effect of a new teaching method on student test scores by
comparing their scores before and after the new method is implemented.
Formula:
Treatment effect = (Posttest score of treatment group - Pretest score of treatment group) - (Posttest score of
control group - Pretest score of control group)
Example:
Suppose a researcher wants to test the effectiveness of a new teaching method on students' test scores. They
randomly assign half of the students to receive the new teaching method (treatment group) and half to receive
the traditional teaching method (control group). Before the teaching methods are implemented, all students take
a pretest. After the teaching methods are implemented, all students take a posttest. The results are as follows:
Treatment group: Pretest mean = 65, Posttest mean = 85
Control group: Pretest mean = 62, Posttest mean = 75
Using the formula, the treatment effect can be calculated as:
Treatment effect = (85 − 65) − (75 − 62) = 20 − 13 = 7
Therefore, the new teaching method resulted in a positive treatment effect of 7 points on the posttest scores.
Activity :
A researcher conducts a study to evaluate the effectiveness of a new exercise program on weight loss. The study
includes two groups: the exercise group and the control group. The researcher measures the participants' weights
before and after the program. The results are as follows:
Exercise group: Pretest mean = 75 kg, Posttest mean = 70 kg
Control group: Pretest mean = 73 kg, Posttest mean = 72 kg
Using the treatment effect formula, calculate the treatment effect of the exercise program on weight loss.
A) 1 kg
B) -2 kg
C) 3 kg
D) -5 kg
Example:
A company wants to test a new training program to see if it improves employees' productivity. They randomly
assign their employees to one of the four groups. The scores of the groups are as follows:
Group 1 (treatment and post-test): 80
Group 2 (treatment, pretest, and post-test): 75
Group 3 (pretest and post-test only): 70
Group 4 (no treatment and post-test only): 72
The overall mean score of all the groups is (80 + 75 + 70 + 72) / 4 = 74.25.
The effect of the treatment (τ) is calculated as follows:
τ = [(Y1 − μ) − (Y2 − μ)] − [(Y4 − μ) − (Y3 − μ)]
= [(80 − 74.25) − (75 − 74.25)] − [(72 − 74.25) − (70 − 74.25)]
= (5.75 − 0.75) − (−2.25 + 4.25) = 5 − 2 = 3
The result indicates that the new training program has a positive effect on employees' productivity, with an effect size of 3.
Factorial Design
Factorial design is a type of experimental design that involves the manipulation of two or more independent
variables to study their effects on the dependent variable. In other words, it is a design that looks at how
different levels of multiple factors (variables) affect the outcome.
Formula:
Y = μ + A + B + AB + e, where Y is the outcome, μ is the overall mean, A and B are the main effects of the two factors, AB is their interaction effect, and e is the error term.
Non-equivalent control group designs: Treatment effect = (Y1 − Y3) − (Y2 − Y4)
Example:
A new fertilizer has been developed for a specific type of plant, and a group of researchers wants to test whether
this fertilizer has a significant effect on plant growth compared to the standard fertilizer. They plan to conduct a
randomized controlled trial, where they randomly assign 50 plants to either the new fertilizer group or the
standard fertilizer group.
Pros:
Provides a standardized approach to analyzing data
Allows for objective decision-making based on statistical evidence
Provides a measure of confidence in the results
Cons:
Can be affected by sample size, outliers, and other factors that may not be relevant to the research question
May not provide a complete picture of the data and may oversimplify complex relationships between variables
May be influenced by researcher bias or other factors that are difficult to control for.
Sampling Methods:
Sampling is the process of selecting a representative subset of the population for study in order to make
inferences about the entire population. In experimental design, sampling refers to the process of selecting
individuals or units from the population to be included in the study.
Probability Sampling:
Probability sampling involves selecting individuals or units from the population at random.
Example:
simple random sampling -> Involves selecting individuals or units at random from the population.
stratified random sampling -> Involves dividing the population into strata based on some characteristic and then
randomly selecting individuals or units from each stratum.
cluster sampling -> Involves dividing the population into clusters and then randomly selecting entire clusters for
inclusion in the study.
Systematic Sampling -> involves selecting every nth item from a population. The value of n is determined by
dividing the population size by the desired sample size.
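As an illustration of the first and last of these methods (toy data only), simple random and systematic samples can be drawn with NumPy:
import numpy as np

rng = np.random.default_rng(42)
population = np.arange(1, 1001)          # a toy population of 1000 units

# Simple random sampling: every unit has an equal chance of selection
srs = rng.choice(population, size=50, replace=False)

# Systematic sampling: every n-th unit after a random start,
# where n = population size / desired sample size
step = len(population) // 50             # n = 20
start = rng.integers(0, step)
systematic = population[start::step]

print(len(srs), len(systematic))         # 50 units in each sample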
Non-Probability Sampling:
Non-probability sampling involves selecting individuals or units based on some non-random criterion.
Example:
convenience sampling -> Involves selecting individuals or units based on their availability and
accessibility.
purposive sampling -> Involves selecting individuals or units based on some specific criterion, such as
age, gender, or occupation.
snowball sampling -> Involves selecting individuals or units based on referrals from other individuals
or units already included in the study.
Power Analysis:
Power analysis is used to determine the sample size needed to detect an effect of a given size with a chosen power and significance level.
Example: A researcher plans a study comparing test scores between two groups. Based on previous studies and preliminary data, the researcher estimates the standard deviation of test scores to be 12.
Using a desired power of 0.80 and a significance level of 0.05, the researcher performs a power
analysis and determines that a sample size of 100 participants is needed to detect a 10% difference in
test scores with sufficient power.
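A sketch of how such a sample-size calculation can be done with statsmodels (illustrative; effect_size here is Cohen's d, i.e. the expected difference divided by the standard deviation, so the resulting n depends on exactly how the "10% difference" in scores is quantified):
from statsmodels.stats.power import TTestIndPower

# Assumed inputs for a two-sample t-test power analysis
effect_size = 10 / 12        # hypothesized difference / estimated standard deviation
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=effect_size, power=0.80, alpha=0.05)
print(n_per_group)           # required sample size per group (round up in practice)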
Pros:
Helps researchers determine an appropriate sample size, ensuring adequate statistical power.
Enhances the reliability and validity of research findings by minimizing the risk of Type II errors.
Allows researchers to optimize the allocation of resources, such as time and funding, by focusing on studies that
have a high likelihood of detecting an effect.
Cons:
Power analysis requires assumptions about effect size, variability, and significance level, which may not always
accurately reflect the true population parameters.
Power analysis is based on statistical assumptions and can be sensitive to deviations from those assumptions.
Power analysis does not guarantee that a study will detect an effect if it truly exists, as other factors (e.g., study
design, measurement error) can also influence the results.
Ethical considerations:
Ethical considerations play a crucial role in experimental design as they ensure that the rights and well-being of
participants are protected throughout the study.
Let's explore ethical considerations in experimental design, along with an example and their pros and cons.
Informed Consent:
Researchers must obtain informed consent from participants, ensuring they have a clear understanding of the
study's purpose, procedures, risks, and benefits before voluntarily agreeing to participate. This can be obtained
through written consent forms or verbal agreements.
Example:
Before conducting a study on the effects of a new medication, researchers should inform participants about the
potential side effects, benefits, and any alternative treatments available. Participants can then provide informed
consent to participate in the study.
Pros:
Protects participants' autonomy and right to make decisions about their involvement in research.
Ensures transparency and fosters trust between researchers and participants.
Cons:
Obtaining informed consent may introduce selection bias if certain individuals choose not to participate
based on the study's nature or requirements.
Minimization of Harm:
Researchers should minimize any potential physical or psychological harm to participants. They should
carefully assess the risks and benefits of the study and take appropriate measures to protect participants' well-
being.
Example:
In an experimental study involving medical interventions, researchers must ensure that potential risks to
participants' health are minimized, and proper medical supervision is provided throughout the study.
Pros:
Prioritizes participant safety and well-being.
Demonstrates ethical responsibility towards the participants.
Cons:
It can be challenging to anticipate and mitigate all potential risks, especially in complex research
designs or when studying vulnerable populations.
Activity 1:
A group of researchers is studying the effect of a new exercise routine on weight loss compared to the standard
routine. They randomly assign 60 participants to either the new exercise group or the standard exercise group.
After 8 weeks of treatment, they record the weight loss of each participant and calculate the mean and standard
deviation for both groups.
New exercise group: mean weight loss = 10 kg, standard deviation = 2 kg
Standard exercise group: mean weight loss = 8 kg, standard deviation = 3 kg
Answer the following questions based on the given information:
Question 1: What is the alternative hypothesis in the study comparing weight loss between two exercise groups?
Question 2: Based on the calculated p-value of 0.012, what should be done with the null hypothesis?
Question 3: If the researchers had chosen a significance level of 0.01 instead of 0.05, what decision would they
make regarding the null hypothesis?
Answer to Question 3: If the researchers had chosen a significance level of 0.01, the decision regarding the null
hypothesis would be to fail to reject the null hypothesis, since the calculated p-value (0.012) is greater than the
significance level (0.01).
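As an illustration of the mechanics behind Question 2, here is a minimal sketch of a two-sample t-test computed from summary statistics with scipy, assuming the 60 participants were split 30/30 between the two groups; the quoted p-value of 0.012 is taken as given in the activity, so the output of this sketch is illustrative rather than a reproduction of that exact number.
from scipy.stats import ttest_ind_from_stats

# Summary statistics from the activity (assuming 30 participants per group).
t_stat, p_value = ttest_ind_from_stats(
    mean1=10, std1=2, nobs1=30,   # new exercise group
    mean2=8, std2=3, nobs2=30,    # standard exercise group
    equal_var=False,              # Welch's t-test (unequal variances)
)
print(f"t = {t_stat:.2f}, p-value = {p_value:.4f}")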
Activity 2:
A researcher wants to determine whether the average height of a sample of N individuals is greater than the
average height of the general population, which is μ. The heights are assumed to be normally distributed, and it
is known from previous studies that the standard deviation of heights among individuals is approximately σ.
Write a Python function that performs a statistical test to determine whether the average height of the sample is
greater than μ, given the sample size (N), the sample mean (x̄), the population mean (μ), the population
standard deviation (σ), and the significance level. The function should return True if the null hypothesis can be
rejected (indicating that the average height of the sample is indeed greater than μ), and False otherwise. Use
only the functions imported above to perform the test. To check the function, you can use the following input
combinations:
N = 100, x̄ = 175, μ = 170, σ = 5, significance level = 0.05
N = 50, x̄ = 163, μ = 160, σ = 4, significance level = 0.01
N = 200, x̄ = 180, μ = 175, σ = 6, significance level = 0.05
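Since the notebook's imports are not shown in this excerpt, the following is a minimal sketch that assumes scipy is available for the normal distribution; the function name and the use of a right-tailed one-sample z-test are assumptions based on the problem statement (the population standard deviation is known, so a z-test is appropriate):
import math
from scipy.stats import norm

def reject_null(N, sample_mean, mu, sigma, alpha):
    """Right-tailed one-sample z-test of H0: mean = mu vs Ha: mean > mu.
    Returns True if H0 can be rejected at significance level alpha."""
    z = (sample_mean - mu) / (sigma / math.sqrt(N))   # test statistic
    p_value = 1 - norm.cdf(z)                         # right-tail probability
    return p_value < alpha

# Check with the input combinations listed above.
print(reject_null(100, 175, 170, 5, 0.05))   # True
print(reject_null(50, 163, 160, 4, 0.01))    # True
print(reject_null(200, 180, 175, 6, 0.05))   # True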
Activity 3:
Problem:
A researcher wants to evaluate the effectiveness of a new exercise program in improving cardiovascular fitness.
They conduct a pretest-posttest control group study in which they measure the cardiovascular fitness of two groups:
the treatment group, which participates in the exercise program, and the control group, which does not
participate in any specific exercise program. The researcher measures the cardiovascular fitness level of each
participant before and after the study.
The following table shows the pretest and post-test scores for both the treatment and control groups:
Group - Pretest Score, Post-test Score
Treatment - 60, 75
Control- 62, 65
Using the formula: Treatment effect = (Post-test score of treatment group - Pretest score of treatment group) -
(Post-test score of control group - Pretest score of control group)
Design a Python function to calculate the treatment effect based on the pretest-posttest design. The
function should take the pretest and post-test scores of both the treatment and control groups as input and output
the treatment effect.
Use the provided data to check the function output.
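A minimal sketch of such a function; the function and parameter names are illustrative, and the check uses the scores from the table above:
def treatment_effect(treatment_pre, treatment_post, control_pre, control_post):
    """Treatment effect for a pretest-posttest design with a control group:
    the (post - pre) change in the treatment group minus the (post - pre)
    change in the control group."""
    return (treatment_post - treatment_pre) - (control_post - control_pre)

# Check with the provided data: (75 - 60) - (65 - 62) = 15 - 3 = 12
print(treatment_effect(60, 75, 62, 65))   # 12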
Activity 4:
A study was conducted to investigate the effect of a new drug on the cure rates of a disease. The study enrolled
200 patients, out of which 100 were randomly assigned to receive the drug and 100 were assigned to the control
group. After the treatment, it was found that 60 patients in the drug group were cured, while only 40 patients in
the control group were cured.
Write a Python function to test whether the cure rates of the two groups are significantly different. Assume that
the numbers of cured patients follow binomial distributions and that the null hypothesis is that the cure rates in
both groups are equal.
Formally:
Ho: P(drug) = P(control)
Ha: P(drug) ≠ P(control)
The function should take in the sample sizes and cure rates of the two groups as arguments and output the p-
value for the test. Use a significance level of 0.05.
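The activity does not prescribe a specific test, so the following is a minimal sketch using a two-proportion z-test from statsmodels; the function name and the choice of test are assumptions:
from statsmodels.stats.proportion import proportions_ztest

def compare_cure_rates(n_drug, rate_drug, n_control, rate_control, alpha=0.05):
    """Two-sided two-proportion z-test of H0: P(drug) = P(control).
    Takes the sample sizes and cure rates of both groups and returns the p-value."""
    cured = [round(n_drug * rate_drug), round(n_control * rate_control)]   # cured counts
    nobs = [n_drug, n_control]                                             # group sizes
    z_stat, p_value = proportions_ztest(cured, nobs, alternative="two-sided")
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    print(f"z = {z_stat:.3f}, p-value = {p_value:.4f} -> {decision}")
    return p_value

# Data from the study: 60 of 100 cured in the drug group, 40 of 100 in the control group.
compare_cure_rates(100, 0.60, 100, 0.40)
With these numbers the p-value is well below 0.05, so the null hypothesis of equal cure rates would be rejected.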