
Lecture 1
Probability and Statistics
K.K. Gan
Wikipedia:
* Benjamin Disraeli, British statesman and literary figure (1804-1881):
  - "There are three kinds of lies: lies, damned lies, and statistics."
    + popularized in the US by Mark Twain
    + the statement shows the persuasive power of numbers
      -> use of statistics to bolster weak arguments
      -> tendency of people to disparage statistics that do not support their positions
* The purpose of P3700:
  - how to understand the statistical uncertainty of an observation/measurement
  - how to use statistics to argue against a weak argument (or bolster a weak argument?)
  - how to argue against people disparaging statistics that do not support their positions
  - how to lie with statistics?
Introduction:
* Understanding of many physical phenomena depends on statistical and probabilistic concepts:
  - Statistical Mechanics (physics of systems composed of many parts: gases, liquids, solids)
    + 1 mole of anything contains 6x10^23 particles (Avogadro's number)
    + impossible to keep track of all 6x10^23 particles even with the fastest computer imaginable
      -> resort to learning about the group properties of all the particles
      -> partition function: calculate energy, entropy, pressure... of a system
  - Quantum Mechanics (physics at the atomic or smaller scale)
    + wavefunction = probability amplitude
      -> probability of an electron being located at (x, y, z) at a certain time
* Understanding/interpretation of experimental data depends on statistical and probabilistic concepts:
  - how do we extract the best value of a quantity from a set of measurements?
  - how do we decide if our experiment is consistent/inconsistent with a given theory?
  - how do we decide if our experiment is internally consistent?
  - how do we decide if our experiment is consistent with other experiments?
    -> In this course we will concentrate on the above experimental issues!
Definition of probability:
* Suppose we have N trials and a specified event occurs r times.
  - example: rolling a die, where the event is rolling a 6
  - define the probability (P) of an event (E) occurring as:
      P(E) = r/N  as  N -> infinity
  - examples:
    + six-sided die: P(6) = 1/6
    + coin toss: P(heads) = 0.5
      -> P(heads) should approach 0.5 the more times you toss the coin
         (see the simulation sketch after this list).
      -> for a single coin toss we can never get P(heads) = 0.5!
  - by definition, probability is a non-negative real number bounded by 0 <= P <= 1
    + if P = 0 then the event never occurs
    + if P = 1 then the event always occurs
    + the sum (or integral) of all probabilities, if they are mutually exclusive, must equal 1
  - events are independent if: P(A ∩ B) = P(A)P(B)
    + coin tosses are independent events: the result of the next toss does not depend on the previous toss
  - events are mutually exclusive (disjoint) if: P(A ∩ B) = 0, in which case P(A ∪ B) = P(A) + P(B)
    + in coin tossing, we either get a head or a tail
  - notation: ∩ ≡ intersection, ∪ ≡ union
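A minimal simulation sketch (my illustration, not from the slides) of how the empirical
frequency r/N approaches P(heads) = 0.5 as the number of tosses N grows; the variable
names are made up:

    import random

    # Empirical frequency r/N of heads for increasing numbers of tosses N.
    # As N grows, r/N should settle toward the true probability 0.5.
    for n_tosses in (10, 100, 10_000, 1_000_000):
        r = sum(random.random() < 0.5 for _ in range(n_tosses))  # count heads
        print(f"N = {n_tosses:>9}: P(heads) ~ r/N = {r / n_tosses:.4f}")

Note that for small N the frequency can be far from 0.5; only the limit defines the probability.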
* Probability can be a discrete or a continuous variable.
  - Discrete probability: P can have certain values only.
    + examples:
      . tossing a six-sided die: P(x_i) = P_i, where x_i = 1, 2, 3, 4, 5, 6 and P_i = 1/6 for all x_i
      . tossing a coin: only 2 choices, heads or tails
    + for both of the above discrete examples (and in general),
      when we sum over all mutually exclusive possibilities:
        \sum_i P(x_i) = 1
  - Continuous probability: P can be any number between 0 and 1.
    + define a probability density function (pdf), with x a continuous variable:
        f(x)\,dx = dP(x \le x' \le x + dx)
    + probability for x to be in the range a <= x <= b is:
        P(a \le x \le b) = \int_a^b f(x)\,dx
    + just like the discrete case, the sum of all probabilities must equal 1:
        \int_{-\infty}^{+\infty} f(x)\,dx = 1
      -> f(x) is normalized to one.
    + probability for x to be exactly some number is zero since:
        \int_{x=a}^{x=a} f(x)\,dx = 0
* Notation: x_i is called a random variable.
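A short numerical sketch of these definitions (my example, not from the slides), using
SciPy's quadrature to check normalization and compute P(a <= x <= b); the choice of a
standard Gaussian pdf here is just for concreteness:

    import numpy as np
    from scipy.integrate import quad

    # Example pdf: standard Gaussian, f(x) = exp(-x^2/2) / sqrt(2*pi)
    f = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

    norm, _ = quad(f, -np.inf, np.inf)   # should be 1: f is normalized
    p_ab, _ = quad(f, -1.0, 1.0)         # P(-1 <= x <= 1), ~0.683 for a Gaussian
    p_point, _ = quad(f, 0.5, 0.5)       # P(x exactly 0.5) = 0, zero-width integral

    print(f"normalization = {norm:.6f}, P(-1<=x<=1) = {p_ab:.4f}, P(x=0.5) = {p_point}")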
* Examples of some common P(x)'s and f(x)'s:

      Discrete P(x)   Continuous f(x)
      -------------   ---------------
      binomial        uniform (i.e. constant)
      Poisson         Gaussian
                      exponential
                      chi-square

* How do we describe a probability distribution?
  - mean, mode, median, and variance (a numerical sketch follows below)

      Mean      Mode            Median      Variance
      average   most probable   50% point   width of distribution

  - for a continuous distribution, these quantities are defined by:
      mean:     \mu = \int_{-\infty}^{+\infty} x f(x)\,dx
      mode:     \left.\frac{\partial f(x)}{\partial x}\right|_{x=a} = 0
      median:   0.5 = \int_{-\infty}^{a} f(x)\,dx
      variance: \sigma^2 = \int_{-\infty}^{+\infty} f(x)\,(x - \mu)^2\,dx
  - for a discrete distribution, the mean and variance are defined by:
      mean:     \mu = \frac{1}{n}\sum_{i=1}^{n} x_i
      variance: \sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2
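As a concrete sketch of these four descriptors (my example, not part of the slides), the
following computes them for a small discrete data set using the standard library; the
mode is found by simple counting:

    import statistics

    data = [1, 2, 2, 3, 3, 3, 4]

    mean = statistics.mean(data)            # average
    mode = statistics.mode(data)            # most probable value
    median = statistics.median(data)        # 50% point
    variance = statistics.pvariance(data)   # (1/n) * sum((x_i - mean)^2), population form

    print(mean, mode, median, variance)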
* Some continuous pdfs:
  - Probability is the area under the curves!

  [Figures: a symmetric distribution (Gaussian), where the mean, mode, and median all
  fall at the same x; and an asymmetric distribution, where the mean, median, and mode
  sit at different locations.]

  - For a Gaussian pdf, the mean, mode, and median are all at the same x.
  - For most pdfs, the mean, mode, and median are at different locations.
* Calculation of mean and variance:
  - example: a discrete data set consisting of three numbers: {1, 2, 3}
    + the average (\mu) is just:
        \mu = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1+2+3}{3} = 2
    + complication: suppose some measurements are more precise than others.
      -> if each measurement x_i has a weight w_i associated with it, use the weighted average:
        \mu = \sum_{i=1}^{n} w_i x_i \Big/ \sum_{i=1}^{n} w_i
    + the variance (\sigma^2), or average squared deviation from the mean, is just:
        \sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2
      . \sigma is called the standard deviation
      . variance describes the width of the pdf!
      -> rewrite the above expression by expanding the summations:
        \sigma^2 = \frac{1}{n}\left[\sum_{i=1}^{n} x_i^2 + \sum_{i=1}^{n} \mu^2 - 2\mu\sum_{i=1}^{n} x_i\right]
                 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 + \mu^2 - 2\mu^2
                 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \mu^2
                 = \langle x^2 \rangle - \langle x \rangle^2
        with \langle\ \rangle \equiv average
      . the n in the denominator would be n - 1 if we determined the average (\mu) from the
        data itself (a numerical check of this identity follows below).
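A quick numerical check (my sketch, not from the slides) that the direct variance and
the <x^2> - <x>^2 form agree for {1, 2, 3}, plus a weighted average with made-up weights:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0])
    mu = x.mean()                             # (1+2+3)/3 = 2

    var_direct = ((x - mu) ** 2).mean()       # (1/n) * sum((x_i - mu)^2)
    var_moments = (x ** 2).mean() - mu ** 2   # <x^2> - <x>^2
    print(var_direct, var_moments)            # both ~0.6667

    # Weighted average with hypothetical weights (in practice often w_i = 1/sigma_i^2)
    w = np.array([1.0, 2.0, 1.0])
    mu_w = (w * x).sum() / w.sum()            # sum(w_i x_i) / sum(w_i) = 2.0
    print(mu_w)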
  - using the definition of \mu from above, we have for our example of {1, 2, 3}:
      \sigma^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \mu^2 = 4.67 - 2^2 = 0.67
  - the case where the measurements have different weights is more complicated:
      \sigma^2 = \sum_{i=1}^{n} w_i (x_i - \mu)^2 \Big/ \sum_{i=1}^{n} w_i
               = \sum_{i=1}^{n} w_i x_i^2 \Big/ \sum_{i=1}^{n} w_i - \mu^2
    + \mu is the weighted mean
    + if we calculated \mu from the data, \sigma^2 gets multiplied by a factor n/(n-1)
* example: a continuous probability distribution,
      f(x) = \sin^2 x \quad \text{for } 0 \le x \le 2\pi
  - it has two modes!
  - it has the same mean and median, but they differ from the mode(s).
  - f(x) is not properly normalized:
      \int_0^{2\pi} \sin^2 x\,dx = \pi \ne 1
    -> normalized pdf (a symbolic check follows below):
      f(x) = \sin^2 x \Big/ \int_0^{2\pi} \sin^2 x\,dx = \frac{1}{\pi}\sin^2 x
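A symbolic sketch of this normalization step (my illustration; SymPy is an assumption,
not a tool the slides use):

    import sympy as sp

    x = sp.symbols('x')
    norm = sp.integrate(sp.sin(x)**2, (x, 0, 2*sp.pi))   # = pi, so sin^2(x) alone is not a pdf
    f = sp.sin(x)**2 / norm                               # normalized pdf: (1/pi) sin^2(x)

    print(norm)                                           # pi
    print(sp.integrate(f, (x, 0, 2*sp.pi)))               # 1, as required of a pdf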
  - for continuous probability distributions, the mean, mode, and median are
    calculated using either integrals or derivatives:
      mean:   \mu = \frac{1}{\pi}\int_0^{2\pi} x \sin^2 x\,dx = \pi
      mode:   \frac{\partial}{\partial x}\sin^2 x = 0 \;\Rightarrow\; x = \frac{\pi}{2},\ \frac{3\pi}{2}
      median: \frac{1}{\pi}\int_0^{\alpha} \sin^2 x\,dx = \frac{1}{2} \;\Rightarrow\; \alpha = \pi
* example: Gaussian distribution function, a continuous probability distribution:
      p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}
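A numerical cross-check of the sin^2 results above (my sketch, not from the slides),
integrating the normalized pdf and solving for the median with a root finder:

    import numpy as np
    from scipy.integrate import quad
    from scipy.optimize import brentq

    f = lambda x: np.sin(x)**2 / np.pi                           # normalized pdf on [0, 2*pi]

    mean, _ = quad(lambda x: x * f(x), 0, 2*np.pi)               # should equal pi
    cdf = lambda a: quad(f, 0, a)[0]                             # cumulative probability
    median = brentq(lambda a: cdf(a) - 0.5, 0.1, 2*np.pi - 0.1)  # where the cdf crosses 1/2

    print(mean, np.pi)      # ~3.14159 for both
    print(median, np.pi)    # ~3.14159 for both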
Accuracy and Precision:
* Accuracy: the accuracy of an experiment refers to how close the experimental measurement
  is to the true value of the quantity being measured.
* Precision: this refers to how well the experimental result has been determined, without
  regard to the true value of the quantity being measured.
  - just because an experiment is precise, it does not mean it is accurate!!
    (see the simulation sketch below)

  [Figures: one set of measurements that is accurate but not precise; another that is
  precise but not accurate.]
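A minimal simulation sketch of this distinction (my example; the true value, bias, and
spreads are made up for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    true_value = 10.0

    # Accurate but not precise: centered on the true value, large scatter.
    accurate = rng.normal(loc=true_value, scale=1.0, size=1000)
    # Precise but not accurate: small scatter, but biased away from the true value.
    precise = rng.normal(loc=true_value + 2.0, scale=0.1, size=1000)

    print(f"accurate: mean = {accurate.mean():.2f}, std = {accurate.std():.2f}")
    print(f"precise:  mean = {precise.mean():.2f}, std = {precise.std():.2f}")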
Measurement Errors (Uncertainties)
* Use results from probability and statistics as a way of indicating how good a measurement is.
  - most common quality indicator:
      relative precision = [uncertainty of measurement] / measurement
    + example: we measure a table to be 10 inches with an uncertainty of 1 inch:
      relative precision = 1/10 = 0.1 or 10% (% relative precision)
  - uncertainty in a measurement is usually the square root of the variance:
      \sigma = standard deviation
    + usually calculated using the technique of propagation of errors.

Statistical and Systematic Errors
* Results from experiments are often presented as:
      N ± XX ± YY
  N: value of the quantity measured (or determined) by the experiment.
  XX: statistical error, usually assumed to be from a Gaussian distribution.
  - with the assumption of Gaussian statistics we can say (calculate) something about
    how well our experiment agrees with other experiments and/or theories.
    + expect a 68% chance that the true value is between N - XX and N + XX
      (see the coverage sketch below).
  YY: systematic error. Hard to estimate; the distribution of errors is usually not known.
  - examples: mass of proton = 0.9382769 ± 0.0000027 GeV (only statistical error given)
              mass of W boson = 80.8 ± 1.5 ± 2.4 GeV
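A sketch of where the 68% figure comes from (my illustration): for a Gaussian, the
probability of landing within one standard deviation of the mean is about 0.683:

    from scipy.stats import norm

    # P(mu - sigma <= x <= mu + sigma) for a standard Gaussian: cdf(1) - cdf(-1)
    coverage = norm.cdf(1) - norm.cdf(-1)
    print(f"{coverage:.4f}")   # ~0.6827, i.e. the "68% chance" quoted above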
* What's the difference between statistical and systematic errors?
      N ± XX ± YY
  - statistical errors are random in the sense that if we repeat the measurement enough times:
      XX -> 0
  - systematic errors do not -> 0 with repetition.
    + examples of sources of systematic errors:
      . voltmeter not calibrated properly
      . a ruler that is not the length we think it is (a meter stick might really be < 1 meter!)
  - because of systematic errors, an experimental result can be precise, but not accurate!
* How do we combine systematic and statistical errors to get one estimate of precision?
  -> big problem!
  - two choices (see the sketch after this list):
    + \sigma_{tot} = XX + YY               (add them linearly)
    + \sigma_{tot} = (XX^2 + YY^2)^{1/2}   (add them in quadrature)
* Some other ways of quoting experimental results:
  - lower limit: the mass of particle X is > 100 GeV
  - upper limit: the mass of particle X is < 100 GeV
  - asymmetric errors: mass of particle X = 100^{+4}_{-3} GeV
* Don't quote any measurement to more than three digits, as it is difficult to achieve 0.1% precision:
  - 0.231 ± 0.013,  791 ± 57,  (5.98 ± 0.43)x10^-5
  - measurement and uncertainty should have the same number of digits
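A small sketch of the two combination rules (my illustration, reusing the W-boson
numbers quoted above):

    # Combining statistical (XX) and systematic (YY) errors: W boson, 80.8 +- 1.5 +- 2.4 GeV
    xx, yy = 1.5, 2.4

    sigma_linear = xx + yy                  # add linearly: 3.9 GeV (more conservative)
    sigma_quad = (xx**2 + yy**2) ** 0.5     # add in quadrature: ~2.83 GeV

    print(f"linear: {sigma_linear:.2f} GeV, quadrature: {sigma_quad:.2f} GeV")

Quadrature is appropriate when the two error sources are independent; the linear sum
treats them as fully correlated and so gives a larger, more cautious total.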
