Notes
1) Data are collected everywhere and require statistical knowledge to make the information useful. 2) Statistics is used to make professional and personal decisions. 3) Statistics helps you understand the world and be conversant in your career. In summary, statistics will help you make more effective personal and professional decisions.
STATISTICS DEFINITION: The science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more effective decisions.
TYPES OF STATISTICS: DESCRIPTIVE AND INFERENTIAL
1) Descriptive statistics can be used to organize data into a meaningful form. DEFINITION: Methods of organizing, summarizing, and presenting data in an informative way.
2) Inferential statistics is used to estimate a property of a population on the basis of a sample.
POPULATION VS SAMPLE
POPULATION: The entire set of individuals or objects of interest, or the measurements obtained from all individuals or objects of interest.
SAMPLE: A portion or part of the population of interest.
TWO TYPES OF VARIABLES: QUALITATIVE VARIABLE VS QUANTITATIVE VARIABLE
1) QUALITATIVE: An object or individual is observed and recorded as a non-numeric characteristic or attribute. Examples: gender, state of birth, eye color
2) QUANTITATIVE: A variable that is reported numerically. Examples: balance in your checking account, the life of a car battery, the number of people employed by a company
QUANTITATIVE VARIABLES CAN BE DISCRETE OR CONTINUOUS. DISCRETE: RESULT OF COUNTING. CONTINUOUS: RESULT OF MEASURING SOMETHING. DISCRETE VARIABLES HAVE GAPS BETWEEN THE VALUES, EX: THE NUMBER OF BEDROOMS IN A HOUSE. CONTINUOUS VARIABLES CAN ASSUME ANY VALUE WITHIN A SPECIFIC RANGE, EX: DURATION OF FLIGHTS FROM KL TO SABAH
LEVEL OF MEASUREMENT (NOMINAL, ORDINAL, INTERVAL, AND RATIO): Nominal is the lowest LOM
1) NOMINAL: Data recorded at the nominal level of measurement are represented as labels or names. They have no order. They can only be classified and counted (EX: Gender)
2) ORDINAL: Variables based on this level of measurement are only ranked and counted (EX: The list of top ten states for best business climate, student ratings of professors)
3) INTERVAL (NO NATURAL 0 POINT): For data recorded at the interval level of measurement, the interval or the distance between values is meaningful (EX: Temperature scale, dress size)
4) RATIO (Highest LOM): Data recorded at the ratio level of measurement are based on a scale with a known unit of measurement and a meaningful interpretation of zero on the scale (EX: Wages, changes in stock price, and weight)
CHAPTER 4 DESCRIBING DATA: DISPLAYING AND EXPLORING DATA
Four shapes are commonly observed: symmetric, positively skewed, negatively skewed, and bimodal.
SKEWNESS: A measure of the symmetry of a distribution.
FORMULA SKEWNESS: sk = 3(x̄ − Median)/s (Pearson's coefficient of skewness; x̄ = sample mean, s = sample standard deviation)
1) The coefficient of skewness can range from -3 to +3 2) A value near -3 indicates considerable negative skewness 3) A value of 1.63 indicates moderate positive skewness 4) A value of 0 means the distribution is symmetrical
DESCRIBING THE RELATIONSHIP BETWEEN TWO VARIABLES
A SCATTER DIAGRAM/SCATTER PLOT/SCATTERGRAM: A graphical technique used to show the relationship between variables.
1) A scatter diagram is a graphical tool to portray the relationship between two variables, or bivariate data 2) Both variables are measured with an interval or ratio level scale 3) If the scatter of points moves from the lower left to the upper right, the variables under consideration are directly or positively related 4) If the scatter of points moves from the upper left to the lower right, the variables are inversely or negatively related.
Draw
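The coefficient of skewness from Chapter 4 can be checked on small data sets. The sketch below assumes the formula is Pearson's coefficient, sk = 3(mean − median)/s, which matches the stated range of -3 to +3; the two data sets are made up for illustration.

```python
import statistics

def pearson_skewness(data):
    """Pearson's coefficient of skewness: sk = 3 * (mean - median) / s."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    s = statistics.stdev(data)  # sample standard deviation
    return 3 * (mean - median) / s

# Hypothetical data sets, chosen only to illustrate the sign of sk.
symmetric = [1, 2, 3, 4, 5]              # mean equals median
right_tail = [1, 2, 2, 3, 3, 3, 4, 20]   # one large value pulls the mean up

print(pearson_skewness(symmetric))   # zero: symmetrical
print(pearson_skewness(right_tail))  # positive: positively skewed
```

A long right tail pulls the mean above the median, so sk comes out positive; a long left tail would make it negative.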
CHAPTER 6 DISCRETE PROBABILITY DISTRIBUTIONS: 1) BINOMIAL 2) POISSON
WHAT IS PROBABILITY DIST? 1) It describes the likelihoods for a range of possible future outcomes 2) A listing of all the outcomes of an experiment and the probability associated with each outcome.
CHARACTERISTICS OF PROB DISTR: 1) The probability of a particular outcome is between 0 and 1 inclusive. 2) The outcomes are mutually exclusive. 3) The list of outcomes is exhaustive, so the sum of the probabilities of the outcomes is equal to 1.
WHAT IS A RANDOM VARIABLE: A quantity resulting from an experiment that, by chance, can assume different values. Random variables can measure both quantitative and qualitative variables. Example: for the number of employees absent from the day shift on Monday, the number might be 0, 1, 2, 3, … The number absent is the random variable.
2 TYPES OF RANDOM VARIABLES:
1) DISCRETE RANDOM VAR: A random variable that can assume only certain clearly separated values. EXAMPLE: Tossing a coin three times and counting the number of heads.
2) CONTINUOUS RANDOM VAR: Can assume an infinite number of values within a given range. EXAMPLE: The times of flights between Atlanta and LA are 4.67 hours, 5.13 hours, and so on
MEAN AND VARIANCE OF PROB DISTR
a) MEAN (EXPECTED VALUE): μ = Σ[x P(x)]
b) VARIANCE: σ² = Σ[(x − μ)² P(x)]
c) STANDARD DEVIATION: σ = √σ²
BINOMIAL PROBABILITY DISTR. REQUIREMENTS/EXPERIMENT: 1) An outcome on each trial of an experiment is classified into one of two mutually exclusive categories: a success or a failure. 2) The random variable is the number of successes in a fixed number of trials. 3) The probability of success is the same for each trial. 4) The trials are independent, meaning that the outcome of one trial does not affect the outcome of any other trial.
BINOMIAL FORMULA: P(x) = nCx π^x (1 − π)^(n − x), where n is the number of trials, x is the number of successes, and π is the probability of success on each trial
MEAN OF BINOMIAL DIST: μ = nπ
VARIANCE OF BINOMIAL DISTR.: σ² = nπ(1 − π)
POISSON PROBABILITY DISTRIBUTION: 1) This describes the number of times some event occurs during a specified interval 2) The interval can be time, distance, area, or volume
TWO ASSUMPTIONS: 1) The probability is proportional to the length of the interval 2) The intervals are independent.
POISSON PROBABILITY EXPERIMENT: 1) The random variable is the number of times some event occurs during a defined interval. 2) The probability of the event is proportional to the size of the interval. 3) The intervals do not overlap and are independent.
FORMULA POISSON DISTRIBUTION: P(x) = μ^x e^(−μ) / x!, where μ is the mean number of occurrences in the interval
MEAN OF A POISSON DISTRIBUTION: μ = nπ
VARIANCE OF POISSON IS EQUAL TO MEAN
CHAPTER 8 – SAMPLING METHODS AND THE CENTRAL LIMIT THEOREM
In the earlier chapters the emphasis is on describing the distribution of the data, that is, describing something that has already happened.
Sampling is a process of selecting items from a population so that we can use this information to make judgments or inferences about the population.
REASONS TO SAMPLE: 1) The results of a sample may adequately estimate the value of the population parameter, saving time and money. 2) It may be too time-consuming to contact all members of the population. 3) It may be impossible to check or locate all the members of the population. 4) The cost of studying all the items in the population may be prohibitive. 5) Often testing destroys the sampled item and it cannot be returned to the population.
SAMPLING METHOD: 1) In a simple random sample, all members of the population have the same chance of being selected for the sample. 2) In a systematic sample, a random starting point is selected, and then every kth item thereafter is selected for the sample. If you do not have a list of the entire population to begin with, you can use the systematic random sample. 3) In a stratified sample, the population is divided into several groups based on some characteristic, called strata, and then a random sample is selected from each stratum. 4) In cluster sampling, the population is divided into primary units, then samples are drawn from the primary units.
SAMPLING ERROR: The difference between a sample statistic and its corresponding population parameter.
- It is unlikely the mean of a sample will be exactly equal to the mean of the population. Sometimes these errors are positive values, indicating that the sample mean overestimated the population mean; other times they are negative values, indicating the sample mean was less than the population mean.
SAMPLING DISTRIBUTION OF THE SAMPLE MEAN: When we use the sample mean to estimate the population mean, how can we determine how accurate the estimate is? DEFINITION: A probability distribution of all possible sample means of a given sample size. For a given sample size, the mean of all possible sample means selected from a population is equal to the population mean: μx̄ = μ
There is less variation in the distribution of the sample mean than in the population distribution.
The sampling distribution of the sample mean tends to become bell-shaped.
μx̄ = Sum of all sample means / Total number of sample means
CONCLUSION:
1) The mean of the distribution of the sample mean ($15.43) is equal to the mean of the population 2) The spread in the distribution of the sample mean is less than the spread in the population values 3) The shapes of the population distribution and the sampling distribution of the sample mean are different.
CENTRAL LIMIT THEOREM: If samples of a particular size are selected from any population, the sampling distribution of the sample mean is approximately a normal distribution. The approximation improves with larger samples. If the population follows a normal probability distribution, then for any sample size the sampling distribution of the sample mean will also be normal.
If the population distribution is symmetrical, you will see the normal shape of the distribution of the sample mean emerge with samples as small as 10. If the distribution is skewed or has thick tails, it may require samples of 30 or more to observe the normality feature.
CONCLUSION: 1) The mean of the distribution of sample means will be exactly equal to the population mean if we select all possible samples of the same size from the population: μ = μx̄ 2) The standard deviation of the sampling distribution of the sample mean is also called the standard error of the mean: σx̄ = σ/√n
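The two conclusions about the sampling distribution of the sample mean can be verified by listing every possible sample from a tiny population. This is only a sketch: the population values are hypothetical, and samples are drawn with replacement (an assumption made here so that σ/√n holds exactly rather than approximately).

```python
import itertools
import math
import statistics

population = [2, 4, 6, 8]   # hypothetical population values
n = 2                       # sample size

# Every possible ordered sample of size n, drawn with replacement (assumption).
samples = list(itertools.product(population, repeat=n))
sample_means = [statistics.mean(s) for s in samples]

mu = statistics.mean(population)         # population mean
sigma = statistics.pstdev(population)    # population standard deviation

mean_of_means = statistics.mean(sample_means)   # mean of all sample means
std_error = statistics.pstdev(sample_means)     # spread of the sample means

print(mean_of_means, mu)                  # the two means agree
print(std_error, sigma / math.sqrt(n))    # the spread matches sigma / sqrt(n)
```

Because every possible sample is included, the agreement is exact up to floating-point rounding, and the spread of the sample means is smaller than the spread of the population.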
NORMAL DISTRIBUTION: 1) If the population follows a normal distribution, the sampling distribution of the sample mean will also follow the normal distribution for samples of any size 2) If the population is not normally distributed, the sampling distribution of the sample mean will approach a normal distribution when the sample size is at least 30 3) Assume the population standard deviation is known 4) To determine the probability that a sample mean falls in a particular region, use the following formula: z = (x̄ − μ)/(σ/√n), where x̄ is the sample mean
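As a worked instance of z = (x̄ − μ)/(σ/√n), the sketch below finds the probability that a sample mean exceeds a given value. All numbers are hypothetical, and the table lookup is replaced by `statistics.NormalDist` from the Python standard library.

```python
import math
from statistics import NormalDist

# Hypothetical values chosen for illustration only.
mu = 50.0      # population mean
sigma = 8.0    # known population standard deviation
n = 64         # sample size
x_bar = 52.0   # sample mean of interest

std_error = sigma / math.sqrt(n)   # standard error of the mean
z = (x_bar - mu) / std_error       # z value of the sample mean

# Area to the right of z under the standard normal curve.
p_above = 1 - NormalDist().cdf(z)

print(z)        # 2.0
print(p_above)  # about 0.0228
```

With a z table the same answer comes from looking up z = 2.00 (area .4772) and subtracting it from .5000.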
CHAPTER 7 CONTINUOUS PROBABILITY DISTRIBUTION
CHARACTERISTICS OF NORMAL PROBABILITY DISTRIBUTIONS: 1) Bell-shaped, with a single peak at the center of the distribution 2) The distribution is symmetric 3) Asymptotic, meaning the curve approaches but never touches the X-axis 4) Completely described by its mean and standard deviation (for dispersion)
FAMILY OF NORMAL PROB DISTR:
1) EQUAL MEANS AND DIFFERENT SD
2) DIFFERENT MEANS AND DIFFERENT SD
3) DIFFERENT MEANS AND EQUAL SD
THE STANDARD NORMAL PROBABILITY DISTRIBUTION (used to determine the probabilities for all normal prob dist.) (unique, has a mean of 0 and standard deviation of 1)
Any normal probability distribution can be converted to the standard normal probability distribution with the following formula: z = (x − μ)/σ
EMPIRICAL RULE: 1) The table area for z of 1.00 = .3413, so .3413 * 2 = .6826, or about 68% of observations lie within 1 SD of the mean 2) z of 2.00 = .4772, so .4772 * 2 = .9544, or about 95% lie within 2 SD 3) z of 3.00 = .4987, so .4987 * 2 = .9974, or about 99.7% lie within 3 SD
FINDING A VALUE FOR X USING Z
* Two unknowns: x and z
First, find z: take the given probability (area under the curve) and find it in the body of the table. Then solve z = (x − μ)/σ for x, which gives x = μ + zσ.
APPROXIMATE A BINOMIAL DISTR. USING NORMAL PROB DISTR. under certain conditions: 1) nπ and n(1 − π) must both be at least 5 2) n is the number of observations 3) π is the probability of a success
The four conditions for a binomial probability distribution are
1) There are only two possible outcomes
2) π (pi) remains the same from trial to trial
3) The trials are independent
4) The distribution results from a count of the number of successes in a fixed number of trials
CHAPTER 9 – ESTIMATION AND CONFIDENCE INTERVAL
1) A point estimate is a single value (statistic), computed from sample information, used to estimate a population value (parameter).
2) A confidence interval is a range of values, constructed from sample data, within which the population parameter is likely to occur at a specified probability. The specified probability is called the level of confidence (90% etc.). 95% level of confidence: z = 1.96. 90% level of confidence: z = 1.65.
3) Factors that determine a confidence interval: 1) The level of confidence (ex: 95%) 2) The size/variability of the standard error of the mean (standard dev. of the sample mean) 3) The number of observations in the sample, n.
4) CONFIDENCE INTERVAL FOR A POPULATION MEAN WITH σ KNOWN: x̄ ± z σ/√n
x̄ = sample mean
z = z-value for a particular confidence level
σ = the population standard deviation
n = the number of observations in the sample
5) CONFIDENCE INTERVAL FOR A POPULATION MEAN WITH σ UNKNOWN, using the t distribution: x̄ ± t s/√n, where s is the sample standard deviation
t = (x̄ − μ)/(s/√n) (how to find t values)
Characteristics of the t distribution: 1) It is, like the z distribution, a continuous distribution. 2) It is, like the z distribution, bell-shaped and symmetrical. 3) The t distribution is more spread out and flatter at the center than the standard normal distribution, because the standard deviation of the t distribution is larger than that of the standard normal distribution. 4) There is a family of t distributions. All t distributions have a mean of 0, but their standard deviations differ according to the sample size.
6) CONFIDENCE INTERVAL FOR POPULATION PROPORTION (nominal scale measurement, outcome is limited to two values)
PROPORTION: The fraction, ratio, or percent indicating the part of the sample or the population having a particular trait of interest.
A sample proportion, p, is found by dividing x, the number of successes, by n, the number of observations: p = x/n
A population proportion is identified by π (success). Two requirements:
1) The binomial conditions have been met
2) The values nπ and n(1 − π) should both be greater than or equal to 5
- Confidence interval for a population proportion formula: p ± z √(p(1 − p)/n)
7) DETERMINING SAMPLE SIZE TO ESTIMATE POPULATION MEANS: n = (zσ/E)²
Three factors that determine the sample size when we wish to estimate the mean
1) The margin of error, E 2) The desired level of confidence, for example, 95% 3) The variation in the population
8) DETERMINING SAMPLE SIZE TO ESTIMATE POPULATION PROPORTION: n = π(1 − π)(z/E)²
Three factors that determine the sample size when we wish to estimate a proportion
1) The margin of error, E 2) The desired level of confidence 3) A value for π to calculate the variation in the population
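The two sample-size calculations can be sketched as below, assuming the standard formulas n = (zσ/E)² for a mean and n = π(1 − π)(z/E)² for a proportion. The function names and the example inputs are hypothetical, and the default π = 0.5 is an assumption commonly used when no estimate of π is available. Results are rounded up because a sample size must be a whole number.

```python
import math
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided z value for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

def sample_size_mean(E, level, sigma):
    """n = (z * sigma / E)^2, rounded up to the next whole observation."""
    z = z_for_confidence(level)
    return math.ceil((z * sigma / E) ** 2)

def sample_size_proportion(E, level, pi=0.5):
    """n = pi * (1 - pi) * (z / E)^2, rounded up; pi=0.5 is an assumed default."""
    z = z_for_confidence(level)
    return math.ceil(pi * (1 - pi) * (z / E) ** 2)

# Hypothetical inputs for illustration.
print(sample_size_mean(E=2, level=0.95, sigma=10))   # 97
print(sample_size_proportion(E=0.05, level=0.95))    # 385
```

A smaller margin of error, a higher level of confidence, or more variation in the population each increases the required sample size.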
The size of the standard error is affected by two values. The first is the standard deviation of the population. The larger the population standard deviation, σ, the larger σ∕√n. If the population is
homogeneous, resulting in a small population standard deviation, the standard error will also be small. However, the standard error is also affected by the number of observations in the sample.
A large number of observations in the sample will result in a small standard error of the mean, indicating that there is less variability in the sample means.
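The effect of sample size on the standard error shows up directly in the width of a confidence interval. The sketch below assumes the σ-known interval x̄ ± z σ/√n; the function name and all input values are hypothetical.

```python
import math
from statistics import NormalDist

def confidence_interval_mean(x_bar, sigma, n, level=0.95):
    """Interval x_bar ± z * sigma / sqrt(n), assuming sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    std_error = sigma / math.sqrt(n)   # standard error of the mean
    margin = z * std_error
    return x_bar - margin, x_bar + margin

# Same hypothetical sample mean and sigma, two different sample sizes.
low_small, high_small = confidence_interval_mean(x_bar=100.0, sigma=15.0, n=25)
low_large, high_large = confidence_interval_mean(x_bar=100.0, sigma=15.0, n=100)

print(low_small, high_small)   # wider interval, standard error = 3.0
print(low_large, high_large)   # narrower interval, standard error = 1.5
```

Quadrupling n halves the standard error, so the second interval is half as wide as the first.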