(BA-STAT102) Advanced Statistics – CHAPTER 1: REVIEW OF ESTIMATES & SAMPLE SIZES

Statistics is an efficient and effective system or body of knowledge and processes that deals with the collection, organization and presentation, analysis, and interpretation of all kinds of data pertinent to the study or research being considered.

Four Essential Processes in Statistics


1. Collection of Data
- Refers to the gathering of related information.
- The process involves determining (a) what information is useful and needed, (b) where to get the information, and (c) how to get the information.
2. Presentation of Data
- Refers to the systematic way of organizing data.
- The process involves (a) collecting, (b) classifying, and (c) arraying the gathered data in preparation for analysis.
3. Analysis of Data
- Refers to extracting relevant information from the data at hand.
- The process involves (a) comparison, (b) description, and (c) statistical measurements to arrive at numerical values and/or a qualitative summary as a resulting conclusion.
4. Interpretation of Data
- Refers to drawing logical statements from the analyzed information.
- The process involves (a) generalizing, (b) forecasting, and (c) recommending solutions or interventions related to the study.

Two Major Fields in Statistics


1. Descriptive Statistics
- It is a group of statistical measurements or methods that aim to describe the data.
- It exposes the basic characteristics or summaries of the data.
Descriptive statistics are brief informational coefficients that summarize a given data set, which can be either a representation of the entire population or a sample of a population.
Descriptive statistics are broken down into measures of central tendency and measures of variability (spread). Measures of central tendency include the mean, median, and mode, while measures of variability include the standard deviation, variance, minimum and maximum values, kurtosis, and skewness.

Descriptive statistics, in short, help describe and understand the features of a specific data set by giving short summaries about the sample and measures of the data. The most recognized types of descriptive statistics are the measures of center: the mean, median, and mode, which are used at almost all levels of math and statistics. The mean, or average, is calculated by adding all the figures within the data set and then dividing by the number of figures within the set.
Types of Descriptive Statistics
• Distribution (also called Frequency Distribution) – Data sets consist of a distribution of scores or values. Statisticians use graphs and tables to summarize the frequency of every possible value of a variable, rendered in percentages or numbers. For instance, if you held a poll to determine people’s favorite Beatle, you’d set up one column with all possible values (John, Paul, George, and Ringo) and another with the number of votes.
• Measures of Central Tendency – estimate a dataset's average or center using three methods: mean, mode, and median.
• Mean. The mean is also known as “M” and is the most common method for finding averages. You get the mean by adding all the response values together and dividing the sum by the number of responses, or “N.”
• Mode. The mode is simply the most frequent response value. Datasets may have any number of modes, including zero. You can find the mode by arranging the values in order from lowest to highest and then looking for the most common response.
• Median. Finally, the median is the value in the precise center of the dataset. Arrange the values in ascending order (as for the mode) and look for the number in the middle of the set. If the set has an even number of values, the median is the mean of the two middle values.
• Variability (also called Dispersion) – The measure of variability gives the statistician an idea of how spread out the responses are. Variability has three aspects: range, standard deviation, and variance.
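The measures described above can be computed with Python's standard `statistics` module. The sketch below is only an illustration: the data set is hypothetical, `describe` is a helper name chosen for this example, and `statistics.variance` and `statistics.stdev` use the sample (n – 1) divisor rather than the population (N) divisor.

```python
import statistics
from collections import Counter

def describe(values):
    """Summarize a list of numbers with common descriptive statistics."""
    return {
        "frequency": Counter(values),             # frequency distribution
        "mean": statistics.mean(values),          # sum of values / N
        "median": statistics.median(values),      # middle value after sorting
        "modes": statistics.multimode(values),    # most frequent value(s)
        "range": max(values) - min(values),       # largest minus smallest
        "variance": statistics.variance(values),  # sample variance (n - 1 divisor)
        "std_dev": statistics.stdev(values),      # sample standard deviation
    }

scores = [2, 4, 4, 5, 7, 8, 12]  # hypothetical data set
summary = describe(scores)
print(summary["mean"], summary["median"], summary["modes"])
```

`statistics.multimode` returns a list, so a data set with several equally common values reports all of them.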

2. Inferential Statistics
- It is a group of statistical measurements or methods that aim to draw inferences or make interpretations.
- It makes a concluding statement about the population based on the results derived from the sample data.
- When you have collected data from a sample, you can use inferential statistics to understand the larger population from which the sample was taken.

Inferential statistics have two main uses:
- Making estimates about populations (for example, the mean examination score of all 11th graders in the Philippines).
- Testing hypotheses to draw conclusions about populations (for example, the relationship between examination scores and family income).
Examples: analysis of variance, t-test, regression, chi-square test.
Types of Data
Raw data – data in their original form, as they were collected.
1. Categorical Data – are observations that belong to the same or different classes. To categorize is to make different things equivalent: to group objects, people, and events into classes and to study them according to class, not according to their uniqueness. Categories sometimes reflect “qualitative differences.”
2. Ranked Data – are observations that show their relative positions based on some characteristic, without necessarily yielding a numerical value for that characteristic. For example, the order of finish of a horse race is listed as first, second, third, etc., ranked according to speed.
As the name implies, ranked data concern only “rank order,” or position in a series, and may not include numerical values at all, although the ranking may have been based on numerical values of the characteristic.
3. Numerical Data or Variables – can be classified as discrete or continuous.

Classification of Data
A. According to Nature
1. Quantitative Data – information obtained from variables that are in the form of numbers.
Ex. Age, bills, financial ratios, supply capacity
2. Qualitative Data – information obtained from variables that are in the form of categories, characteristics, names, or labels.
Ex. Gender, type of business, socio-economic status
B. According to Source
1. Primary Data – first-hand information
Ex. Autobiography, financial statement

2. Secondary Data – second-hand information
Ex. Information from business journals, economic indicators from newspapers
C. According to Measurement
1. Discrete Data – set of countable numerical observations
- data obtained through the process of counting
Ex. Number of members of a family, number of students in Mathematics
2. Continuous Data – set of measurable observations
- data obtained through the process of measuring
Ex. Height, weight, length, and temperature

Methods of Gathering Data


The following are the methods of gathering data:
1. Interview Method – Using this method, information is obtained through an oral exchange of questions and answers between the researcher and the respondents of the study.
2. Questionnaire Method – Using this method, information is provided in writing by the respondents, who answer the items of a questionnaire carefully prepared by the researcher.
3. Document Method – Using this method, information that has been stored and documented by certain institutions, whether private or government, is made available for the researcher’s perusal.
4. Observation Method – Using this method, information is acquired or recorded through the researcher’s direct observation of selected respondents or subjects of the study, without prior influence from the researcher.
5. Experiment Method – Using this method, information is gathered as in the observation method, but some manipulation of or influence on the respondents or subjects is allowed, as required by the experimental process.

Levels of Measurement & Scale


1. Nominal Scale: Categorical Data
- Simply uses numbers to label categories.
- It is the lowest level of data measurement.
- The numerical result in measuring variables is used for identification purposes only.
- It does not signify any quantitative value.
Ex. Bank account number, tax identification number, telephone number, car ownership

2. Ordinal Scale: Ranked Data
- It has all the properties of the nominal scale.
- The numbers obtained not only identify variables but also give their order or rank (highest to lowest, or lowest to highest).
- The size of the differences between ranks cannot be determined.
Ex. Managerial positions, expenditure priorities, faculty rank, student class designation
3. Interval Scale: Measurement Data
- It has all the properties of the ordinal scale.
- Quantitative differences can be determined.
- It does not have a true zero.
- Addition and subtraction of measurements can be performed.
Ex. IQ test scores of employees; the Fahrenheit temperature scale
4. Ratio Scale
- It has all the properties of the interval scale.
- There is an absolute (true) zero.
- Multiplication and division of measurements can be performed.
Ex. Earnings per share of common stock in a certain corporation
Sales of RTU Canteen
Voters in a particular area
Four key concepts are important in understanding both descriptive and inferential statistics: the population, the sample, the parameter, and the statistic.

THE POPULATION:
In statistics, population is the entire set of items from which you draw data for a
statistical study.
A population is the entire group that you want to draw conclusions about.
A population can be either finite or infinite. If there is an upper limit to the number of observations it contains, the population is finite; if there is no such limit to its size, the population is infinite.
Examples of finite populations:
(1) The families living in a certain community
(2) The students enrolled in a particular subject in a particular semester
(3) The cards in a deck of playing cards

Examples of infinite populations:
(1) The vehicles passing through the NLEX tollgate at all times
(2) The rocks on the beaches of the Atlantic Ocean
In everyday usage, population refers to the people who live in a particular area at a specific time. In statistics, however, population refers to the complete set of data relevant to your study.
An example of a population would be the entire student body at a school. It would contain all the students who study in that school at the time of data collection. Depending on the problem statement, data are collected from each of these students; for example, the population of interest might be the students of the school who speak English.

When Do You Need to Collect Data from a Population?

You use populations when your research calls for or requires you to collect data from every member of the population.
For larger and more diverse populations, on the other hand, such as in a regional study of people living in Europe, collecting data from every member would give findings representative of the entire population, but it would take a considerable amount of time.
It’s in these instances that you use sampling. Sampling allows you to make valid inferences about the population as a whole while streamlining your research project.
In research, a population doesn’t always refer to people. It can mean a group
containing elements of anything you want to study, such as objects, events,
organizations, countries, species, organisms, etc.

What Is a Sample?
Sampling is the process of collecting data from a small subsection of the population and then using it to generalize over the entire set.
A sample consists of a smaller group of entities taken from the entire population. This creates a subset that is easier to manage and, ideally, has the characteristics of the larger population.
A sample is the specific group that you will collect data from. The size of the
sample is always less than the total size of the population.
This smaller subset is then surveyed to gain information and data. The sample
should reflect the population as a whole, without any bias towards a specific attribute or
characteristic.

Types of Sample
• Probability sampling, also known as random sampling, is a kind of sample selection in which randomization is used instead of deliberate choice.
• Simple random sampling – Every element in the population has an equal chance of being selected as part of the sample.
• Systematic sampling – Also known as systematic clustering. In this method, random selection applies only to the first item chosen; a rule then applies so that every nth item or person after that is picked.
• Stratified random sampling – Random selection is applied within predefined groups (strata).
• Cluster sampling – Groups, rather than individual units of the target population, are selected at random.
Non-probability sampling techniques involve the researcher deliberately picking items or individuals for the sample based on their research goals or knowledge.
• Convenience sampling – People or elements in a sample are selected based on their availability.
• Quota sampling – The sample is formed according to certain groups or criteria.
• Purposive sampling – Also known as judgmental sampling. The sample is formed by the researcher consciously choosing entities based on the survey goals.
• Snowball sampling – Also known as referral sampling. The sample is formed by sample participants recruiting their connections.
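The four probability sampling methods above can be sketched with Python's `random` module. This is a minimal illustration under stated assumptions: the population is a hypothetical list of ID numbers, strata and clusters are supplied as plain lists, the function names are hypothetical helpers rather than a library API, and the systematic version assumes the sample size n is no larger than the population. A fixed seed is used only so that a run can be repeated.

```python
import random

def simple_random_sample(population, n, rng):
    """Every element has an equal chance of being selected."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Random start within the first interval, then every k-th element."""
    k = len(population) // n          # sampling interval
    start = rng.randrange(k)          # only the first pick is random
    return population[start::k][:n]

def stratified_sample(strata, n_per_stratum, rng):
    """Simple random sample drawn separately inside each predefined group."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Whole groups are chosen at random; every member of a chosen group is kept."""
    chosen = rng.sample(clusters, n_clusters)
    return [member for group in chosen for member in group]

rng = random.Random(2024)             # fixed seed so the run is reproducible
population = list(range(1, 101))      # hypothetical IDs 1..100
print(simple_random_sample(population, 5, rng))
print(systematic_sample(population, 10, rng))
```

Non-probability methods are not sketched, since by definition they depend on the researcher's judgment rather than on a random rule.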
Difference Between Population & Sample:
1. Population: all residents of a country. Sample: all residents who live above the poverty line.
2. Population: all residents above the poverty line in a country. Sample: all residents who are millionaires.
3. Population: all employees in an office. Sample: all managers in the office.

Estimating population parameters from sample statistics


The characteristics of populations and samples are described by numbers called parameters and statistics, respectively.

THE PARAMETER
When you collect data from a population or a sample, there are various
measurements and numbers you can calculate from the data.
A parameter is a measure that describes the whole population (e.g., the population mean).
Example: 20% of Philippine senators voted for a specific measure. Since there are only 24 senators, you can count how each of them voted.
Sampling error is the difference between a parameter and the corresponding statistic. Since in most cases you don’t know the real population parameter, you can use inferential statistics to estimate these parameters in a way that takes sampling error into account.
There are two important types of estimates you can make about the population: point
estimates and interval estimates.
 A Point Estimate is a single value estimate of a parameter. For instance, a
sample mean is a point estimate of a population mean.
 An Interval Estimate gives you a range of values where the parameter is
expected to lie. A confidence interval is the most common type of interval
estimate.
Both types of estimates are important for forming a clear idea of where a parameter is likely to lie.

THE STATISTIC
A statistic is a measure that describes the sample (e.g., the sample mean).
Example: 50% of people living in Cebu agree with the latest health care proposal. Researchers can’t ask every resident of Cebu whether they agree, so they take a sample, a part of the population, and use the result to estimate the value for the whole population.

1.2 Estimating a Population Mean: Large Sample


In estimating the population mean µ, the sample mean x̄ usually provides the best estimate, although other sample statistics such as the median, midrange, or mode could also be used. An estimator is a sample statistic (such as the sample mean) that could be used to approximate a population parameter.
There are two important reasons why the sample mean is a better estimator of the population mean than other estimators such as the median or the mode.

1. For many populations, the distribution of the sample mean x̄ tends to be more consistent (with less variation) than the distributions of other sample statistics.
2. For all populations, the sample mean is an unbiased estimator of the population mean µ, meaning that the distribution of the sample mean tends to center about the value of the population mean µ.
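The two reasons above can be checked by simulation. The sketch below assumes a normally distributed population with hypothetical values µ = 50 and σ = 10, draws many samples of size n = 25, and compares how the sample means and sample medians vary. For a normal population the means should center near µ and vary less than the medians; for some other (for example, heavy-tailed) populations the comparison can come out differently, which is why reason 1 says “for many populations.” The function name is a hypothetical helper.

```python
import random
import statistics

def compare_estimators(mu=50.0, sigma=10.0, n=25, reps=2000, seed=1):
    """Draw many samples from a normal population and compare the sample
    mean with the sample median as estimators of mu."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return {
        "center_of_means": statistics.fmean(means),     # should be close to mu
        "spread_of_means": statistics.stdev(means),     # theory: sigma / sqrt(n) = 2
        "spread_of_medians": statistics.stdev(medians),
    }

print(compare_estimators())
```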

CONFIDENCE INTERVAL
A confidence interval (or interval estimate) is a range (or an interval) of values that is likely to contain the true value of the population parameter. A confidence interval is associated with a degree of confidence, which is a measure of how certain you are that the interval contains the population parameter. The degree of confidence is the probability 1 – α (often expressed as a percentage) that the interval actually contains the parameter when the estimation process is repeated many times; α (lowercase Greek alpha) is the probability that corresponds to the area outside the interval. The degree of confidence is also called the level of confidence.
Common choices for the degree of confidence are 90% (with α = 0.10), 95% (with α = 0.05), and 99% (with α = 0.01). The choice of 95% is most common because it provides a good balance between precision and reliability.

CRITICAL VALUE
A critical value is the number on the borderline separating sample statistics that are likely to occur from those that are unlikely to occur. The number Zα/2 is a critical value: it is the z score that separates an area of α/2 in the right tail of the standard normal distribution. There is an area of 1 – α between the vertical borderlines at –Zα/2 and Zα/2.

Example: Find the critical value Zα/2 corresponding to a 95% degree of confidence.
Solution: A 95% degree of confidence corresponds to α = 0.05 (5%). In Fig. 1.2, we can see that the area in each of the shaded tails is α/2 = 0.025. The region between the mean z = 0 and Zα/2 must therefore have an area of 0.5 – 0.025 = 0.4750. We can now find the critical value Zα/2 from Table II, A: the area 0.4750 corresponds exactly to a z value of 1.96. Therefore, for a 95% degree of confidence, the critical value is Zα/2 = 1.96 at the right tail and –1.96 at the left.
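Instead of reading Table II, A, the critical value can be computed from the inverse of the standard normal cumulative distribution function. A minimal sketch using Python's `statistics.NormalDist` (Python 3.8 or later); `z_critical` is a hypothetical helper name. The same function can be used to check the practice problems below.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return the critical value z_(alpha/2) for a two-sided degree of confidence."""
    alpha = 1 - confidence
    # z_(alpha/2) has an area of 1 - alpha/2 to its left:
    # 0.5 to the left of the mean plus the central area read from the table
    return NormalDist().inv_cdf(1 - alpha / 2)

print(round(z_critical(0.95), 2))  # 1.96
```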
Solve the following problems:
1. Find the critical value Zα/2 that corresponds to the given degree of confidence.
a. 99% b. 98%

MARGIN OF ERROR
Margin of Error (E) is the maximum likely (with probability 1 – α) difference between the observed sample mean x̄ and the true value of the population mean µ. The margin of error E, also called the maximum error of the estimate, is found by multiplying the critical value by the standard deviation of the sample mean, σ/√n.
In symbols,
E = Zα/2 · σ/√n

1.2 ESTIMATING A POPULATION MEAN: LARGE SAMPLES (n > 30)


1.2.1 Finding the Confidence Interval for the Population Mean – Based on Large Samples (n > 30)
x̄ – E < µ < x̄ + E, where E = Zα/2 · σ/√n
The values x̄ – E and x̄ + E are called the confidence interval limits.

Example: Given a claimed population mean µ = 98.6 °F, a sample mean x̄ = 98.2 °F, σ = 0.62 °F, and n = 106.
For a 0.95 degree of confidence, find:
a. The margin of error E
b. The confidence interval for µ
c. The confidence interval limits for µ
Solution:
a. The 0.95 degree of confidence means that α = 0.05, so Zα/2 = 1.96 (from Table II, A).
Therefore:
E = Zα/2 · σ/√n
E = (1.96)(0.62/√106)
= (1.96)(0.62/10.295630)
E = 0.12
b. With x̄ = 98.20 and E = 0.12, the confidence interval for the population mean µ is
x̄ – E < µ < x̄ + E
98.20 – 0.12 < µ < 98.20 + 0.12

98.08 < µ < 98.32

Therefore, µ = 98.20 ± 0.12.
c. The confidence interval limits for µ are (98.08, 98.32).
Note that the confidence interval from 98.08 °F to 98.32 °F does not contain 98.6 °F, the claimed value of µ. Therefore, it seems very unlikely that 98.6 °F is the correct value of µ.
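The large-sample computation above can be reproduced in a few lines. This sketch assumes σ is known, as in the example, and computes Zα/2 exactly with `statistics.NormalDist` rather than using the rounded table value 1.96, so E agrees with the hand calculation only to two decimal places; `z_interval` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Large-sample confidence interval for a population mean (sigma known)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value z_(alpha/2)
    e = z * sigma / math.sqrt(n)                         # margin of error E
    return e, (xbar - e, xbar + e)                       # E and the interval limits

e, (lower, upper) = z_interval(98.2, 0.62, 106)
print(f"E = {e:.2f}; {lower:.2f} < mu < {upper:.2f}")  # E = 0.12; 98.08 < mu < 98.32
```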

1.3 ESTIMATING A POPULATION MEAN: SMALL SAMPLES (n < 30)


When the sample size is small (n < 30), the sample mean x̄ is generally the best point estimate of the population mean µ.
For small samples (n < 30), σ is usually UNKNOWN. If σ is unknown and the population is approximately normally distributed, the t distribution is used.
The t distribution is used to find the critical value, denoted by tα/2. For critical values tα/2, use Table III.

1.3.1 CALCULATING MARGIN OF ERROR (E) WHEN n < 30 AND σ IS UNKNOWN

E = tα/2 · s/√n, where tα/2 has n – 1 degrees of freedom (df)
Degrees of freedom: df = n – 1

1.3.2 FINDING CONFIDENCE INTERVAL WHEN n < 30 AND σ IS UNKNOWN

x̄ – E < µ < x̄ + E, where E = tα/2 · s/√n
Example: In a time-use study, 20 randomly selected managers were found to spend a mean of 2.4 hours each day on paperwork. The standard deviation of the 20 scores is 1.3 hours. Also, the sample data appear to have a bell-shaped distribution. Construct the 95% confidence interval for the mean time spent on paperwork by all managers.
Solution: Given: n = 20, x̄ = 2.4 hr, s = 1.3 hr
With a 95% level of confidence, α = 0.05 and df = 20 – 1 = 19.
With α/2 = 0.025 and df = 19, the critical value is tα/2 = t(0.025, 19) = 2.093 (Table III, A).

E = tα/2 · s/√n = 2.093(1.3/√20)
= 2.093(1.3/4.472136)
E = 0.61

x̄ – E < µ < x̄ + E
2.4 – 0.61 < µ < 2.4 + 0.61
1.79 < µ < 3.01

Therefore, the confidence interval is µ = 2.4 ± 0.61,
and the confidence interval limits are (1.79, 3.01).
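A sketch of the small-sample computation follows. Python's standard library has no t-distribution quantile function, so the critical value tα/2 is passed in after being read from Table III (here 2.093 for df = 19); if SciPy is installed, `scipy.stats.t.ppf(1 - alpha/2, df)` returns the same value. `t_interval` is a hypothetical helper name, and the method assumes an approximately normal population, as the bell-shaped sample suggests.

```python
import math

def t_interval(xbar, s, n, t_crit):
    """Small-sample confidence interval for a population mean (sigma unknown).

    t_crit is t_(alpha/2) with df = n - 1, read from a t table such as Table III.
    """
    e = t_crit * s / math.sqrt(n)   # margin of error E
    return e, (xbar - e, xbar + e)

# 95% confidence, df = 20 - 1 = 19, so t_(0.025, 19) = 2.093 from the table
e, (lower, upper) = t_interval(2.4, 1.3, 20, t_crit=2.093)
print(f"E = {e:.2f}; {lower:.2f} < mu < {upper:.2f}")  # E = 0.61; 1.79 < mu < 3.01
```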

1.4 ESTIMATING A POPULATION PROPORTION

Probabilities and percentages can be expressed as proportions. For example, if you survey 1068 Filipino families and find that 694 of them have a telephone, the sample proportion is p = x/n = 694/1068 = 0.650. When the sample percentage is given directly, you can also get the sample proportion p by writing that percentage as a decimal.

1.4.1 CALCULATING MARGIN OF ERROR OF POPULATION PROPORTION P


E = Zα/2 · √(pq/n)
where: P = population proportion
p = sample proportion
q = 1 – p
E = margin of error
n = sample size
Zα/2 = critical value (found in Table II, A)

1.4.2 FINDING CONFIDENCE INTERVAL FOR THE POPULATION PROPORTION P


p – E < P < p + E, where E = Zα/2 · √(pq/n)
Example: In a survey of 1068 Filipinos, 694 stated that they have answering machines. Using this sample result, find the 95% confidence interval for the population proportion of all families who have answering machines.
Solution: Given: n = 1068, x = 694
Therefore, p = x/n = 694/1068 = 0.650
and q = 1 – p = 1 – 0.650 = 0.350
With a 95% degree of confidence, α = 0.05, which is divided equally between the two tails, so the area 0.4750 (0.5 – 0.025 = 0.4750) corresponds to the critical value Zα/2 = 1.96 (Table II, A).
E = Zα/2 · √(pq/n)
= 1.96 · √((0.650)(0.350)/1068)
= 1.96 · √(0.2275/1068)
= 1.96 · √0.000213
= 1.96 · 0.014595
E = 0.029 ≈ 0.03

Therefore, the confidence interval is:



p – E < P < p + E
0.650 – 0.03 < P < 0.650 + 0.03
0.62 < P < 0.68
Confidence interval: P = 0.650 ± 0.03
Confidence interval limits: P = (0.62, 0.68)
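The proportion example can be checked the same way. In this sketch `proportion_interval` is a hypothetical helper; it computes Zα/2 with `statistics.NormalDist` instead of the table value and rounds only when printing.

```python
import math
from statistics import NormalDist

def proportion_interval(x, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion P."""
    p = x / n                                           # sample proportion
    q = 1 - p
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value z_(alpha/2)
    e = z * math.sqrt(p * q / n)                        # margin of error E
    return p, e, (p - e, p + e)

p, e, (lower, upper) = proportion_interval(694, 1068)
print(f"p = {p:.3f}, E = {e:.3f}; {lower:.2f} < P < {upper:.2f}")
# p = 0.650, E = 0.029; 0.62 < P < 0.68
```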

1.5 ESTIMATING SAMPLE SIZE


1.5.1 WHEN AN ESTIMATE p IS KNOWN
n = (Zα/2)² · pq / E²

Example 1. Concerned with the increasing number of vehicular accidents due to texting while driving, the Malayan Insurance Co. wants to estimate, with a margin of error of three percent (3%), the percentage of drivers who text while driving. If the insurance company wants 95% confidence in the results, how many drivers should it survey? Assume that the insurance company has an estimate of p from a previous study that showed that 18% of drivers text while driving.
Solution: Given: p = 18% = 0.18, E = 3% = 0.03
q = 1 – p = 1 – 0.18 = 0.82
With a 95% level of confidence, α = 0.05, so from Table II, A, Zα/2 = 1.96.
n = (Zα/2)² · pq / E² = (1.96)²(0.18)(0.82) / (0.03)²
= 3.8416(0.1476) / 0.0009 = 0.567020 / 0.0009
= 630.0224
Because the sample size must be a whole number and rounding down would give a margin of error slightly larger than 0.03, always round up: n = 631.

1.5.2 WHEN AN ESTIMATE p IS NOT KNOWN


n = (Zα/2)² · (0.5)(0.5) / E²
When the value of p is not known, 0.5 is assigned to each of p and q. Since pq = 0.25 is the largest possible value of pq, the resulting sample size will be at least as large as it needs to be.

Example 2. Suppose that in Example 1 the insurance company has no prior information about the possible value of p. How many drivers must it survey to estimate the percentage of drivers who text while driving with a 95% level of confidence?
Solution: As in Example 1, use Zα/2 = 1.96 and E = 0.03, but with no prior knowledge of p.
n = (Zα/2)² · (0.5)(0.5) / E² = (1.96)²(0.25) / (0.03)²
= 3.8416(0.25) / 0.0009 = 0.9604 / 0.0009
= 1067.11, which rounds up to n = 1068
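Both sample-size formulas can be wrapped in one function that defaults to p = 0.5 when no estimate is available and always rounds up. This is a sketch with a hypothetical function name; it takes the critical value as an argument so that it reproduces the table value 1.96 used in the examples. Using the unrounded z (1.95996...) can shift a borderline result by one: in Example 1 it gives about 629.999, which rounds up to 630 instead of 631.

```python
import math

def sample_size_for_proportion(z, margin, p=None):
    """Smallest whole n whose margin of error (at critical value z) is <= margin.

    p is a prior estimate of the proportion, or None when no estimate exists.
    """
    if p is None:
        p = 0.5                    # pq = 0.25 is the largest possible value
    q = 1 - p
    n = z ** 2 * p * q / margin ** 2
    return math.ceil(n)            # always round up to the next whole number

print(sample_size_for_proportion(1.96, 0.03, p=0.18))  # 631 (Example 1)
print(sample_size_for_proportion(1.96, 0.03))          # 1068 (Example 2)
```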

SEATWORK:

Solve the following problems:


1. The Center for Education Statistics surveyed 4400 college graduates about the lengths of time required to earn their bachelor’s degrees. The mean is 5.15 years, and the standard deviation is 1.68 years. Based on these sample data, construct a 90% confidence interval for the mean time required by all college graduates.
2. Construct a 95% confidence interval for the mean income of all workers in the Filipino Store. A sample of 25 workers shows that the income distribution is approximately normal, with a mean of P35,486 and a standard deviation of P19,825.
3. Assume that the given sample is used to estimate a population proportion P. Find the margin of error E that corresponds to n = 1020, x = 300, and a 95% degree of confidence.
4. How many TV households must be surveyed to estimate the percentage of households tuned to “Darna” of ABS-CBN? Assume a 99% degree of confidence and a margin of error E of 0.02.
1) Assume that p = 25%.
2) p is not known.
