
Notes

1) Statistics is the science of collecting, organizing, and interpreting data to assist in making effective decisions. It has two main branches: descriptive statistics which summarizes and presents data, and inferential statistics which makes estimations about populations from samples. 2) There are two main types of variables: qualitative which are non-numeric attributes, and quantitative which are numeric and can be discrete or continuous. Data can also be measured at nominal, ordinal, interval, or ratio levels. 3) Common measures used to describe data include the mean, median, and mode which represent central tendency, and measures like skewness which describe the symmetry of distributions. Relationships between variables can be shown using tools like scatter plots.

Uploaded by

Nur Hannan Amani
Copyright
© All Rights Reserved

CHAPTER 1 WHAT IS STATISTICS

WHY STUDY STATISTICS? 1) Data are collected everywhere and require statistical knowledge to make the information useful. 2) To make professional and personal decisions. 3) To understand the world and be conversant in your career. In summary, statistics will help you make more effective personal and professional decisions.
STATISTICS DEFINITION: The science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more effective decisions.
TYPES OF STATISTICS: DESCRIPTIVE AND INFERENTIAL
1) Descriptive statistics can be used to organize data into a meaningful form. DEFINITION: Methods of organizing, summarizing, and presenting data in an informative way.
2) Inferential statistics is used to estimate a property of a population on the basis of a sample.
POPULATION VS SAMPLE
POPULATION: The entire set of individuals or objects of interest, or the measurements obtained from all individuals or objects of interest.
SAMPLE: A portion or part of the population of interest.
TWO TYPES OF VARIABLES: QUALITATIVE VARIABLE VS QUANTITATIVE VARIABLE
1) QUALITATIVE: An object or individual is observed and recorded as a non-numeric characteristic or attribute. Examples: gender, state of birth, eye color.
2) QUANTITATIVE: A variable that is reported numerically. Examples: balance in your checking account, the life of a car battery, the number of people employed by a company.
QUANTITATIVE CAN BE DISCRETE OR CONTINUOUS: DISCRETE IS THE RESULT OF COUNTING & CONTINUOUS IS THE RESULT OF MEASURING SOMETHING.
DISCRETE: HAS GAPS BETWEEN THE VALUES. EX: THE NUMBER OF BEDROOMS IN A HOUSE.
CONTINUOUS: CAN ASSUME ANY VALUE WITHIN A SPECIFIC RANGE. EX: DURATION OF FLIGHTS FROM KL TO SABAH.
LEVEL OF MEASUREMENT (LOM): NOMINAL, ORDINAL, INTERVAL, AND RATIO. Nominal is the lowest LOM.
1) NOMINAL: Data recorded at the nominal level of measurement are represented as labels or names. They have no order. They can only be classified and counted (EX: gender).
2) ORDINAL: Variables based on this level of measurement are only ranked and counted (EX: the list of top ten states for best business climate, student ratings of professors).
3) INTERVAL (NO NATURAL 0 POINT): For data recorded at the interval level of measurement, the interval or the distance between values is meaningful (EX: temperature scale, dress size).
4) RATIO (HIGHEST LOM): Data recorded at the ratio level of measurement are based on a scale with a known unit of measurement and a meaningful interpretation of zero on the scale (EX: wages, changes in stock price, and weight).

CHAPTER 4 DESCRIBING DATA: DISPLAYING AND EXPLORING DATA

Four shapes are commonly observed: symmetric, positively skewed, negatively skewed, and bimodal.
SKEWNESS: A measure of the symmetry of a distribution.
FORMULA SKEWNESS: sk = 3(x̄ − Median) / s
1) The coefficient of skewness can range from -3 to +3 2) A value near -3 indicates considerable negative skewness 3) A value of 1.63 indicates moderate positive skewness 4) A value of 0 means the distribution is symmetrical
DESCRIBING THE RELATIONSHIP BETWEEN TWO VARIABLES
A SCATTER DIAGRAM/SCATTER PLOT/SCATTERGRAM: A graphical technique we use to show the relationship between variables is called a scatter diagram.
1) A scatter diagram is a graphical tool to portray the relationship between two variables, or bivariate data 2) Both variables are measured with an interval or ratio level scale 3) If the scatter of points moves from the lower left to the upper right, the variables under consideration are directly or positively related 4) If the scatter of points moves from the upper left to the lower right, the variables are inversely or negatively related.
(Draw)

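A small sketch of the skewness coefficient from Chapter 4 may help. It assumes the intended formula is Pearson's coefficient, sk = 3(mean − median) / s, which is consistent with the stated range of -3 to +3. The function name and both data sets are made up for illustration.

```python
import statistics


def pearson_skewness(values):
    """Pearson's coefficient of skewness: sk = 3 * (mean - median) / s.

    Assumption: s is the sample standard deviation (n - 1 divisor).
    """
    mean = statistics.mean(values)
    median = statistics.median(values)
    s = statistics.stdev(values)
    return 3 * (mean - median) / s


# Hypothetical data: one symmetric set, one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 2, 2, 3, 3, 4, 20]

print("symmetric :", pearson_skewness(symmetric))
print("right tail:", pearson_skewness(right_tail))
```

A result of 0 corresponds to a symmetric distribution, and a positive result means the mean has been pulled above the median by large values.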
CONTINGENCY TABLE: Used to classify nominal scale observations according to two characteristics. Both variables need only be nominal or ordinal. EX: Students at a university are classified by gender and class (freshman, sophomore, junior, or senior).

CHAPTER 3 DESCRIBING DATA: NUMERICAL MEASURES

MEASURES OF LOCATION: A value used to describe the central tendency of a set of data.
Common measures of location: 1) Mean 2) Median 3) Mode 4) Weighted Mean 5) Geometric Mean
1) MEAN
a) Population Mean μ = ΣX / N. A measurable characteristic of a population is a parameter.
b) Sample Mean x̄ = Σx / n. A statistic is a characteristic of a sample.
PROPERTIES OF ARITHMETIC MEAN: 1) Interval or ratio scale of measurement is required 2) All the data values are used in the calculation 3) It is unique: there is only one mean in a set of data 4) The sum of the deviations from the mean equals zero, Σ(x − x̄) = 0 5) A weakness of the mean is that it is affected by extreme values
2) MEDIAN: The midpoint of the values after they have been ordered from the minimum to the maximum values.
CHARACTERISTICS OF MEDIAN: a) The median is the value in the middle of a set of ordered data b) At least the ordinal scale of measurement is required c) It is not influenced by extreme values d) Fifty percent of the observations are larger than the median (50% below the median & 50% above the median) e) It is unique to a set of data
ODD number of observations: MEDIAN = the value in position (n + 1)/2 of the ordered data
EVEN number of observations: MEDIAN = the mean of the values in positions n/2 and n/2 + 1 of the ordered data
3) MODE: The value of the observation that appears most frequently.
CHARACTERISTICS OF MODE: 1) The mode can be found for nominal level data 2) A set of data can have more than one mode 3) It is not affected by extremely high or low values
THE RELATIVE POSITION OF MEAN, MEDIAN, AND MODE: Symmetric distribution: mean = median = mode. Positively skewed: mode < median < mean. Negatively skewed: mean < median < mode.
4) WEIGHTED MEAN: Found by multiplying each observation, x, by its corresponding weight, w, summing the products, and dividing by the sum of the weights.
FORMULA x̄w = Σ(w·x) / Σw
5) GEOMETRIC MEAN: Useful in finding the average change of percentages, ratios, indexes, or growth rates over time.
FORMULA GM = ⁿ√(x₁ · x₂ · … · xₙ)
RATE OF INCREASE OVER TIME, GM: To find an average percentage change over a period of time.
ITS FORMULA, GM = ⁿ√(value at end of period / value at start of period) − 1, where n is the number of periods

DISPERSION: Is about the variation/spread of data.
SEVERAL MEASURES OF DISPERSION
1) Range = Max value − Min value
2) a) Variance, Sample: s² = Σ(x − x̄)² / (n − 1)
b) Variance, Population: σ² = Σ(x − μ)² / N
3) a) Standard Deviation, Sample: s = √[Σ(x − x̄)² / (n − 1)]
b) Standard Deviation, Population: σ = √[Σ(x − μ)² / N]
CONCLUSION: A small value for a measure of dispersion shows that the data are less dispersed and are clustered closely around the arithmetic mean. The mean is therefore considered representative of the data. AND VICE VERSA (for large values, the data are more widely scattered about the mean).
INTERPRETATION AND USES OF STANDARD DEV. Characteristics of SD: 1) It is in the same units as the original data 2) It is the square root of the average squared distance from the mean 3) It cannot be negative 4) It is the most widely used measure of dispersion.
CHEBYSHEV'S THEOREM: For any set of observations (sample or population), the proportion of the values that lie within k standard deviations of the mean is at least 1 − 1/k², where k is any value greater than 1.
EMPIRICAL RULE: For a symmetrical, bell-shaped distribution, approximately 68% of the observations will lie within plus and minus one standard deviation of the mean; about 95% of the observations will lie within plus and minus two standard deviations of the mean; and practically all (99.7%) will lie within plus and minus three standard deviations of the mean.
THE MEAN & SD OF GROUPED DATA: x̄ = Σ(f·M) / n and s = √[Σf(M − x̄)² / (n − 1)], where M is the class midpoint, f is the class frequency, and n is the total number of observations

CHAPTER 17: INDEX NUMBERS

INDEX NUMBER: A number that expresses the relative change in price, quantity, or value compared to a base period. It measures the change in a particular item (typically a product or service) between two time periods.
MAJOR CHARACTERISTICS OF AN INDEX: 1) It is a percentage, but the percent sign is usually omitted 2) It has a base period
REASONS FOR COMPUTING INDEX: 1) It facilitates the comparison of unlike series 2) If the numbers are very large, it is often easier to comprehend the change of the index than the change in the numbers
1) SIMPLE INDEX: The index number is used to measure the relative change in just one variable, such as hourly wages in manufacturing, population index, airport index, and e-commerce sales index.
FORMULA SIMPLE INDEX: P = (pₜ / p₀) × 100, where pₜ is the price in the given period and p₀ is the price in the base period
* If there are base prices for more than one year, use the mean of the prices of those base years.
2) UNWEIGHTED INDEXES: WE DON'T CONSIDER THE QUANTITIES
We may wish to combine several related items and compare this group of items in two different time periods. EX: An index for items related to owning and operating an automobile (tires, oil changes, and gasoline). An index for items related to expenses of a college student (books, tuition, housing, meals, entertainment).
a) SIMPLE AVERAGE OF PRICE INDEXES: We add the simple indexes for each item and divide by the number of items.
FORMULA SIMPLE AVERAGE OF PRICE RELATIVES: P = ΣPᵢ / n
b) SIMPLE AGGREGATE PRICE INDEX: The prices of the items in the group are totaled for both periods and compared.
FORMULA: P = (Σpₜ / Σp₀) × 100
3) WEIGHTED INDEXES: THE QUANTITIES ARE CONSIDERED
a) LASPEYRES METHOD: Base period quantities are used in both the base period and the given period.
FORMULA LASPEYRES PRICE INDEX: P = (Σpₜq₀ / Σp₀q₀) × 100
ADVANTAGE OF LASPEYRES METHOD: Only quantity data from the base period are used, which allows for more meaningful comparison over time. DISADVANTAGES: 1) Does not reflect changes in buying patterns over time 2) It may overweight goods whose prices increase
b) PAASCHE METHOD: Current period quantities are used.
FORMULA PAASCHE PRICE INDEX: P = (Σpₜqₜ / Σp₀qₜ) × 100
ADVANTAGE: Current buying habits are reflected. DISADVANTAGES: 1) It requires quantity data for the current year and it tends to overweight goods whose prices have declined 2) It requires the product of prices and quantities to be recomputed each year
4) FISHER'S IDEAL INDEX: IS THE GEOMETRIC MEAN OF LASPEYRES AND PAASCHE INDEXES
FORMULA FISHER'S IDEAL INDEX = √[(Laspeyres index) × (Paasche index)]
5) VALUE INDEX: Uses both base period and current period prices and quantities.
FORMULA: V = (Σpₜqₜ / Σp₀q₀) × 100
6) a) CPI to determine real income. FORMULA REAL INCOME = (Money income / CPI) × 100
b) CPI to determine purchasing power of the dollar. FORMULA = ($1 / CPI) × 100
7) PPI used to deflate sales. FORMULA DEFLATED SALES = (Actual sales / Index) × 100
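The weighted price indexes from Chapter 17 can be illustrated with a short Python sketch. It assumes the standard textbook formulas: Laspeyres weights prices by base-period quantities, Paasche by current-period quantities, and Fisher's ideal index is the geometric mean of the two. The function names and the two-item basket are hypothetical.

```python
import math


def laspeyres(p0, pt, q0):
    """Laspeyres price index: base-period quantities q0 are the weights."""
    numerator = sum(p * q for p, q in zip(pt, q0))
    denominator = sum(p * q for p, q in zip(p0, q0))
    return numerator / denominator * 100


def paasche(p0, pt, qt):
    """Paasche price index: current-period quantities qt are the weights."""
    numerator = sum(p * q for p, q in zip(pt, qt))
    denominator = sum(p * q for p, q in zip(p0, qt))
    return numerator / denominator * 100


def fisher(p0, pt, q0, qt):
    """Fisher's ideal index: geometric mean of Laspeyres and Paasche."""
    return math.sqrt(laspeyres(p0, pt, q0) * paasche(p0, pt, qt))


# Hypothetical basket of two items (prices and quantities are made up).
base_prices = [1.0, 2.0]
current_prices = [1.5, 2.0]
base_quantities = [10, 5]
current_quantities = [8, 6]

print("Laspeyres:", laspeyres(base_prices, current_prices, base_quantities))
print("Paasche  :", paasche(base_prices, current_prices, current_quantities))
print("Fisher   :", fisher(base_prices, current_prices,
                           base_quantities, current_quantities))
```

In this made-up basket the Fisher value falls between the Laspeyres and Paasche values, as a geometric mean of two positive numbers must.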

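Chebyshev's theorem from Chapter 3 can be checked numerically. The sketch below compares the guaranteed minimum proportion, 1 − 1/k², with the proportion actually observed in a small data set. The data and helper names are hypothetical, and the population standard deviation is used as an assumption.

```python
import statistics


def chebyshev_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


def proportion_within(values, k):
    """Observed proportion of values within k standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation (assumption)
    inside = [x for x in values if abs(x - mean) <= k * sd]
    return len(inside) / len(values)


# Hypothetical observations.
data = [2, 4, 4, 4, 5, 5, 7, 9]

for k in (1.5, 2, 3):
    print(k, chebyshev_bound(k), proportion_within(data, k))
```

The observed proportion may be well above the bound; the theorem only promises "at least".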
CHAPTER 6 DISCRETE PROBABILITY DISTRIBUTIONS: 1) BINOMIAL 2) POISSON

WHAT IS A PROBABILITY DISTRIBUTION? 1) It describes the likelihoods for a range of possible future outcomes 2) A listing of all the outcomes of an experiment and the probability associated with each outcome.
CHARACTERISTICS OF PROB DISTR: 1) The probability of a particular outcome is between 0 and 1 inclusive. 2) The outcomes are mutually exclusive. 3) The list of outcomes is exhaustive, so the sum of the probabilities of the outcomes is equal to 1.
WHAT IS A RANDOM VARIABLE: A quantity resulting from an experiment that, by chance, can assume different values. It can measure both quantitative and qualitative variables. Example: The number of employees absent from the day shift on Monday; the number might be 0, 1, 2, 3, … The number absent is the random variable.
2 TYPES OF RANDOM VARIABLES:
1) DISCRETE RANDOM VAR: A random variable that can assume only certain clearly separated values. EXAMPLE: Tossing a coin three times and counting the number of heads.
2) CONTINUOUS RANDOM VAR: Can assume an infinite number of values within a given range. EXAMPLE: The times of flights between Atlanta and LA are 4.67 hours, 5.13 hours, and so on.
MEAN AND VARIANCE OF PROB DISTR
a) MEAN (EXPECTED VALUE): μ = Σ[x·P(x)]
b) VARIANCE: σ² = Σ[(x − μ)²·P(x)]
c) STANDARD DEVIATION: σ = √σ²
BINOMIAL PROBABILITY DISTR. REQUIREMENTS/EXPERIMENT: 1) An outcome on each trial of an experiment is classified into one of two mutually exclusive categories: a success or a failure. 2) The random variable is the number of successes in a fixed number of trials. 3) The probability of success is the same for each trial. 4) The trials are independent, meaning that the outcome of one trial does not affect the outcome of any other trial.
BINOMIAL FORMULA: P(x) = nCx · π^x · (1 − π)^(n − x)
MEAN OF BINOMIAL DISTR.: μ = nπ
VARIANCE OF BINOMIAL DISTR.: σ² = nπ(1 − π)
POISSON PROBABILITY DISTRIBUTION: 1) This describes the number of times some event occurs during a specified interval 2) The interval can be time, distance, area, or volume
TWO ASSUMPTIONS: 1) The probability is proportional to the length of the interval 2) The intervals are independent.
POISSON PROBABILITY EXPERIMENT: 1) The random variable is the number of times some event occurs during a defined interval. 2) The probability of the event is proportional to the size of the interval. 3) The intervals do not overlap and are independent.
FORMULA POISSON DISTRIBUTION: P(x) = μ^x · e^(−μ) / x!
MEAN OF A POISSON DISTRIBUTION: μ = nπ
VARIANCE OF POISSON IS EQUAL TO MEAN

CHAPTER 8 – SAMPLING METHODS AND THE CENTRAL LIMIT THEOREM

In descriptive statistics the emphasis is on describing the distribution of the data, that is, describing something that has already happened. Sampling is a process of selecting items from a population and using this information to make judgments or inferences about the population.
REASONS TO SAMPLE: 1) The results of a sample may adequately estimate the value of the population parameter, saving time and money. 2) It may be too time-consuming to contact all members of the population. 3) It may be impossible to check or locate all the members of the population. 4) The cost of studying all the items in the population may be prohibitive. 5) Often testing destroys the sampled item and it cannot be returned to the population.
SAMPLING METHODS: 1) In a simple random sample, all members of the population have the same chance of being selected for the sample. 2) In a systematic sample, a random starting point is selected, and then every kth item thereafter is selected for the sample. If you do not have a list of the entire population to begin with, you can use the systematic random sample. 3) In a stratified sample, the population is divided into several groups based on some characteristic, called strata, and then a random sample is selected from each stratum. 4) In cluster sampling, the population is divided into primary units, then samples are drawn from the primary units.
SAMPLING ERROR: The difference between a sample statistic and its corresponding population parameter. It is unlikely the mean of a sample will be exactly equal to the mean of the population. Sometimes these errors are positive values, indicating that the sample mean overestimated the population mean; other times they are negative values, indicating the sample mean was less than the population mean.
SAMPLING DISTRIBUTION OF THE SAMPLE MEAN: When we use the sample mean to estimate the population mean, how can we determine how accurate the estimate is? DEFINITION: A probability distribution of all possible sample means of a given sample size. For a given sample size, the mean of all possible sample means selected from a population is equal to the population mean, μx̄ = μ. There is less variation in the distribution of the sample mean than in the population distribution. The sampling distribution of the sample mean tends to become bell-shaped.
μx̄ = (Sum of all sample means) / (Total number of sample means)
CONCLUSION: 1) The mean of the distribution of the sample mean ($15.43) is equal to the mean of the population 2) The spread in the distribution of the sample mean is less than the spread in the population values 3) The shapes of the population and sample distributions are different.
CENTRAL LIMIT THEOREM: If samples of a particular size are selected from any population, the sampling distribution of the sample mean is approximately a normal distribution. The approximation improves with larger samples. If the population follows a normal probability distribution, then for any sample size the sampling distribution of the sample mean will also be normal. If the population distribution is symmetrical, you will see the normal shape of the distribution of the sample mean emerge with samples as small as 10. If the distribution is skewed or has thick tails, it may require samples of 30 or more to observe the normality feature.
CONCLUSION: 1) The mean of the distribution of sample means will be exactly equal to the population mean if we select all possible samples of the same size from the population, μ = μx̄ 2) The standard deviation of the sampling distribution of the sample mean is also called the standard error of the mean, σx̄ = σ/√n
NORMAL DISTRIBUTION: 1) If the population follows a normal distribution, the sampling distribution of the sample mean will also follow the normal distribution for samples of any size 2) If the population is not normally distributed, the sampling distribution of the sample mean will approach a normal distribution when the sample size is at least 30 3) Assume the population standard deviation is known 4) To determine the probability that a sample mean falls in a particular region, use the following formula: z = (x̄ − μ) / (σ/√n), where x̄ is the sample mean
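The claims about the sampling distribution of the sample mean in Chapter 8 can be checked by listing every possible sample from a tiny population. This is only an illustrative sketch: the population values are hypothetical, and samples are drawn without replacement using `itertools.combinations`.

```python
from itertools import combinations
import statistics

# Hypothetical population of five values.
population = [2, 4, 6, 8, 10]
sample_size = 2

# Every possible sample of the given size, and the mean of each one.
sample_means = [statistics.mean(sample)
                for sample in combinations(population, sample_size)]

mu = statistics.mean(population)          # population mean
mu_xbar = statistics.mean(sample_means)   # mean of all possible sample means

spread_population = statistics.pstdev(population)
spread_sample_means = statistics.pstdev(sample_means)

print("population mean         :", mu)
print("mean of the sample means:", mu_xbar)
print("spread of population    :", spread_population)
print("spread of sample means  :", spread_sample_means)
```

The two means agree, and the sample means are less spread out than the population values, which matches the first two conclusions in the notes.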
CHAPTER 7 CONTINUOUS PROBABILITY DISTRIBUTION

CHARACTERISTICS OF NORMAL PROBABILITY DISTRIBUTIONS: 1) Bell-shaped and has a single peak at the center of the distribution 2) The distribution is symmetric 3) Asymptotic, meaning the curve approaches but never touches the X-axis 4) Completely described by its mean and standard deviation (for dispersion)
FAMILY OF NORMAL PROB DISTR:
1) EQUAL MEANS AND DIFFERENT SD
2) DIFFERENT MEANS AND DIFFERENT SD
3) DIFFERENT MEANS AND EQUAL SD
THE STANDARD NORMAL PROBABILITY DISTRIBUTION (used to determine the probabilities for all normal prob dist.) (unique, has a mean of 0 and standard deviation of 1)
Any normal probability distribution can be converted to the standard normal probability distribution with the following formula: z = (x − μ) / σ
EMPIRICAL RULE: 1) z of 1.00 = .3413, so .3413 × 2 = .6826 or about 68% 2) z of 2.00 = .4772, so .4772 × 2 = .9544 or about 95% 3) z of 3.00 = .4987, so .4987 × 2 = .9974 or about 99.7%
FINDING A VALUE FOR X USING Z
* There are two unknowns, x and z. First, find z: look at the probability under the curve and find it in the body of the table.
APPROXIMATE A BINOMIAL DISTR. USING NORMAL PROB DISTR. Under certain conditions: 1) nπ and n(1 − π) must both be at least 5 2) n is the number of observations 3) π is the probability of a success
The four conditions for a binomial probability distribution are:
1) There are only two possible outcomes
2) π (pi) remains the same from trial to trial
3) The trials are independent
4) The distribution results from a count of the number of successes in a fixed number of trials

CHAPTER 9 – ESTIMATION AND CONFIDENCE INTERVAL

1) A point estimate is a single value (statistic) used to estimate a population value (parameter): the statistic, computed from sample information, that estimates a population parameter.
2) A confidence interval is a range of values within which the population parameter is expected to occur: a range of values constructed from sample data so that the population parameter is likely to occur within that range at a specified probability. The specified probability is called the level of confidence (90% etc.). z values: 95% level of confidence, z = 1.96; 90% level of confidence, z = 1.65
3) Factors that determine the confidence interval: 1) The level of confidence (ex: 95%) 2) The size/variability of the standard error of the mean (standard dev. of the sample mean) 3) The number of observations in the sample, n.
4) CONFIDENCE INTERVAL FOR A POPULATION MEAN WITH σ KNOWN: x̄ ± z·σ/√n
x̄ = sample mean
z = z-value for a particular confidence level
σ = the population standard deviation
n = the number of observations in the sample
5) CONFIDENCE INTERVAL FOR A POPULATION MEAN WITH σ UNKNOWN, using the t-dist.: x̄ ± t·s/√n
t = (x̄ − μ) / (s/√n) (how to find t values)
Characteristics of the t-distribution: 1) It is, like the z distribution, a continuous distribution. 2) It is, like the z distribution, bell-shaped and symmetrical. 3) The t distribution is flatter and more spread out at the center than the standard normal distribution, because the standard deviation of the t distribution is larger than that of the standard normal distribution. 4) There is a family of t distributions. All t distributions have a mean of 0, but their standard deviations differ according to the sample size.
6) CONFIDENCE INTERVAL FOR POPULATION PROPORTION (nominal scale measurement, outcome is limited to two values)
PROPORTION: The fraction, ratio, or percent indicating the part of the sample or the population having a particular trait of interest.
A sample proportion, p, is found by x, the number of successes, divided by n, the number of observations. A population proportion is identified by π (success).
Two requirements: 1) The binomial conditions have been met 2) The values nπ and n(1 − π) should both be greater than or equal to 5
Confidence interval for a population proportion formula: p ± z·√[p(1 − p)/n]
7) DETERMINING SAMPLE SIZE TO ESTIMATE POPULATION MEANS: n = (z·σ/E)²
Three factors that determine the sample size when we wish to estimate the mean: 1) The margin of error, E 2) The desired level of confidence, for example, 95% 3) The variation in the population
8) DETERMINING SAMPLE SIZE TO ESTIMATE POPULATION PROPORTION: n = π(1 − π)(z/E)²
Three factors that determine the sample size when we wish to estimate a proportion: 1) The margin of error, E 2) The desired level of confidence 3) A value for π to calculate the variation in the population

The size of the standard error is affected by two values. The first is the standard deviation of the population. The larger the population standard deviation, σ, the larger σ/√n. If the population is homogeneous, resulting in a small population standard deviation, the standard error will also be small. However, the standard error is also affected by the number of observations in the sample. A large number of observations in the sample will result in a small standard error, indicating that there is less variability in the sample means.
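The confidence intervals in Chapter 9 can be sketched in a few lines of Python. This assumes the usual textbook formulas, x̄ ± z·σ/√n for a mean with σ known and p ± z·√(p(1 − p)/n) for a proportion, with z = 1.96 for 95% confidence as given in the notes. The function names and input numbers are hypothetical.

```python
import math


def ci_mean_known_sigma(xbar, sigma, n, z=1.96):
    """Confidence interval for a population mean when sigma is known."""
    margin = z * sigma / math.sqrt(n)  # z times the standard error of the mean
    return xbar - margin, xbar + margin


def ci_proportion(x, n, z=1.96):
    """Confidence interval for a population proportion (x successes in n)."""
    p = x / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


# Hypothetical inputs: sample mean 50, sigma 10, n = 100; 40 successes in 100.
print(ci_mean_known_sigma(50, 10, 100))
print(ci_proportion(40, 100))
```

Increasing n shrinks the margin in both functions, which matches the note that a large number of observations gives a small standard error.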

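The two sample-size calculations at the end of Chapter 9 can be sketched as below. The formulas assumed are the standard ones, n = (zσ/E)² for a mean and n = π(1 − π)(z/E)² for a proportion, and the result is rounded up to the next whole observation. The function names and example numbers are hypothetical.

```python
import math


def sample_size_mean(sigma, margin_of_error, z=1.96):
    """Sample size needed to estimate a population mean within E."""
    return math.ceil((z * sigma / margin_of_error) ** 2)


def sample_size_proportion(pi, margin_of_error, z=1.96):
    """Sample size needed to estimate a population proportion within E."""
    return math.ceil(pi * (1 - pi) * (z / margin_of_error) ** 2)


# Hypothetical planning values at the 95% level of confidence (z = 1.96).
print(sample_size_mean(sigma=20, margin_of_error=5))
print(sample_size_proportion(pi=0.5, margin_of_error=0.05))
```

Rounding up rather than to the nearest integer is a deliberate choice here, so the margin of error is not exceeded.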