
APPLIED STATISTICS (ESTADÍSTICA APLICADA) - Basic Elements

This document provides an overview of inferential statistics concepts including hypothesis testing using the t-distribution. It discusses the basic steps of hypothesis testing which include formulating the null and alternative hypotheses, selecting a significance level, choosing a test statistic and calculating its value, identifying critical values, and making a decision to reject or fail to reject the null hypothesis. Key concepts explained include the t-distribution, how it is used for hypothesis tests involving sample means, and interpreting p-values. Examples of null and alternative hypotheses for common hypothesis tests are also provided.

Uploaded by Martha Huaman
Copyright © All Rights Reserved

APPLIED STATISTICS (ESTADÍSTICA APLICADA)

STATA ENVIRONMENT
2020-I
WALTER BAZÁN

Walter Bazán - Estadística Aplicada


INFERENTIAL STATISTICS
STATA


Basic Concepts
Experiment. An activity or measurement that results in an outcome
Sample Space. All possible outcomes of an experiment
Event. One or more of the possible outcomes of an experiment; a subset of the sample space
Probability. A number between 0 and 1 that expresses the chance that an event will occur
Classical Approach
Number of possible outcomes in which the event occurs / Total number of possible outcomes
Relative Frequency Approach
Number of trials in which the event occurs / Total number of trials
The relative frequency approach to probability depends on what is known as the law of large
numbers: Over a large number of trials, the relative frequency with which an event occurs will
approach the probability of its occurrence for a single trial.
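The law of large numbers can be seen in a short simulation. The sketch below is an illustration of our own, not part of the original slides: it assumes a fair six-sided die, for which the classical approach gives P(six) = 1/6, and shows the relative frequency of sixes settling near that value as the number of rolls grows.

```python
import random

def relative_frequency(n_trials: int, seed: int = 1) -> float:
    """Relative frequency of rolling a six in n_trials rolls of a fair die."""
    rng = random.Random(seed)  # seeded generator so the run is reproducible
    sixes = sum(1 for _ in range(n_trials) if rng.randint(1, 6) == 6)
    return sixes / n_trials

# Classical approach: 1 favorable outcome out of 6 equally likely outcomes
classical = 1 / 6
for n in (10, 1_000, 100_000):
    print(f"n = {n:>7}: relative frequency = {relative_frequency(n):.4f} "
          f"(classical probability = {classical:.4f})")
```

With only 10 rolls the relative frequency can be far from 1/6; with 100,000 rolls it is typically within a few thousandths.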



Sampling Distribution
We're now going to escalate to a new level and consider the sample mean itself as a random
variable. If we were to take a great many samples of size n from the same population, we would
end up with a great many different values for the sample mean. The resulting collection of
sample means could then be viewed as a new random variable with its own mean and standard
deviation. The probability distribution of these sample means is called the distribution of
sample means, or the sampling distribution of the mean.
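The idea of "a great many samples" can be simulated directly. In this minimal sketch the population (10,000 values drawn uniformly between 0 and 100) and the sample size n = 25 are hypothetical choices. The mean of the simulated sample means should land close to the population mean, and their standard deviation close to σ/√n, the standard error of the mean.

```python
import math
import random
import statistics

rng = random.Random(42)
# Hypothetical population: 10,000 values drawn uniformly between 0 and 100
population = [rng.uniform(0, 100) for _ in range(10_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)  # population standard deviation

n = 25             # size of each sample
n_samples = 5_000  # number of repeated samples
# Each sample is drawn with replacement, so its observations are independent
means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(n_samples)]

print(f"population mean      = {mu:.2f}")
print(f"mean of sample means = {statistics.fmean(means):.2f}")
print(f"sigma / sqrt(n)      = {sigma / math.sqrt(n):.2f}")
print(f"sd of sample means   = {statistics.stdev(means):.2f}")
```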

Hypothesis Testing
Overview
Researchers use statistical techniques to answer scientific questions. A very common statistical
technique for answering scientific questions is called hypothesis testing.
Hypothesis testing is an inferential procedure in which we test to see if we have sufficient evidence
to reject a null hypothesis (H0) in favor of an alternative hypothesis (H1).
We choose between H0 and H1 by computing a test statistic from a set of data, which quantifies the
strength of our evidence against the null hypothesis.
This statistic will follow a known sampling distribution, which in our examples will be the t-
distribution. The t-distribution is very similar to the standard normal z-distribution, just more spread
out for statistics based on small sample sizes.
Since we know the sampling distribution that our test statistic follows, we can calculate the
probability of obtaining a test statistic at least as extreme as the one observed if H0 were true.



Hypothesis Testing
The null hypothesis is a statement about the value of a population parameter and is put up for
testing in the face of numerical evidence. The null hypothesis is either rejected or fails to be rejected.
The null hypothesis tends to be a "business as usual, nothing out of the ordinary is happening"
statement that practically invites you to challenge its truthfulness. In the philosophy of hypothesis
testing, the null hypothesis is assumed to be true unless we have statistically overwhelming evidence
to the contrary.
The alternative hypothesis is an assertion that holds if the null hypothesis is false. For a given test,
the null and alternative hypotheses include all possible values of the population parameter, so either
one or the other must be false.
A directional claim or assertion holds that a population parameter is greater than (>), at least (≥),
no more than (≤), or less than (<) some quantity.
Nondirectional: μ ≠ value
Directional: μ <, >, ≥, ≤ value



Hypothesis Testing
The sampling distribution of the test statistic under H0 shows the distribution of values of the
test statistic that we would expect to obtain if H0 were true. This distribution tells us which
values of the test statistic are likely to occur when H0 is true, and of course which values are
unlikely to occur when H0 is true.
The center of the distribution contains the values that are likely to occur; the tails contain the
values that are unlikely to occur.
For two-tailed tests, we reject H0 if we obtain a test statistic in either the left or the right tail
of the t-distribution. For one-tailed tests, we reject H0 only if we obtain a test statistic in the
relevant tail.
The critical value is the "cutoff" that distinguishes values of the test statistic that result in
rejecting the null from values that result in failing to reject the null.
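The decision rule for one- and two-tailed tests can be sketched in a few lines. Python's standard library has no t quantile function, so this sketch assumes a standard normal (z) reference distribution via `statistics.NormalDist`; for a t test you would substitute the t-table critical value for n − 1 degrees of freedom. The function names are hypothetical.

```python
from statistics import NormalDist

def critical_values(alpha: float, tail: str) -> tuple:
    """Return (lower, upper) cutoffs; None means no cutoff on that side."""
    z = NormalDist()  # standard normal reference distribution (assumption)
    if tail == "two":
        c = z.inv_cdf(1 - alpha / 2)  # alpha is split between both tails
        return (-c, c)
    if tail == "right":
        return (None, z.inv_cdf(1 - alpha))
    if tail == "left":
        return (z.inv_cdf(alpha), None)
    raise ValueError("tail must be 'two', 'left' or 'right'")

def reject_h0(stat: float, alpha: float, tail: str) -> bool:
    """Reject H0 when the statistic falls beyond a cutoff, i.e. in a rejection region."""
    lower, upper = critical_values(alpha, tail)
    return (lower is not None and stat < lower) or (upper is not None and stat > upper)

print(critical_values(0.05, "two"))    # approximately (-1.96, 1.96)
print(reject_h0(1.80, 0.05, "two"))    # False: 1.80 lies between the two cutoffs
print(reject_h0(1.80, 0.05, "right"))  # True: 1.80 exceeds the one-tailed cutoff of about 1.645
```

The same statistic can lead to different decisions depending on whether the test is one- or two-tailed, because the two-tailed test splits α between both tails.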



Hypothesis Testing
Whenever we reject a null hypothesis, there is a chance that we have made a mistake, i.e., that
we have rejected a true statement. Rejecting a true null hypothesis is referred to as a Type I
error, and our probability of making such an error is represented by the Greek letter alpha (α).
This probability, which is referred to as the significance level of the test, is of primary concern
in hypothesis testing.
On the other hand, we can also make the mistake of failing to reject a false null hypothesis; this
is a Type II error. Our probability of making it is represented by the Greek letter beta (β).
The probability of rejecting a false null hypothesis is called the power of the test; it equals 1 − β.
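Both error types can be checked by simulation. This sketch assumes a hypothetical two-tailed z test of H0: μ = 50 with a known σ = 10, n = 25 and α = 0.05. When H0 is true, the rejection rate estimates α (the Type I error rate); when the true mean is 55, it estimates the power 1 − β, which is compared with the exact normal-theory power.

```python
import math
import random
from statistics import NormalDist, fmean

MU0, SIGMA, N, ALPHA = 50.0, 10.0, 25, 0.05  # hypothetical test setup

def simulated_rejection_rate(true_mu: float, reps: int = 10_000, seed: int = 7) -> float:
    """Share of simulated two-tailed z tests of H0: mu = MU0 that reject H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)
    se = SIGMA / math.sqrt(N)  # sigma treated as known, so a z test applies
    rejections = 0
    for _ in range(reps):
        xbar = fmean([rng.gauss(true_mu, SIGMA) for _ in range(N)])
        if abs((xbar - MU0) / se) > z_crit:
            rejections += 1
    return rejections / reps

def power_z(true_mu: float) -> float:
    """Exact power of the same two-tailed z test when the true mean is true_mu."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - ALPHA / 2)
    delta = (true_mu - MU0) / (SIGMA / math.sqrt(N))  # shift of the z statistic
    return nd.cdf(delta - z_crit) + nd.cdf(-delta - z_crit)

type_i_rate = simulated_rejection_rate(MU0)  # H0 true: should be close to alpha
sim_power = simulated_rejection_rate(55.0)   # H0 false: estimates 1 - beta
print(f"simulated Type I error rate: {type_i_rate:.3f}")
print(f"power at mu = 55: simulated {sim_power:.3f}, exact {power_z(55.0):.3f}")
```

Here β is roughly 0.3, so about 30% of samples would lead us to fail to reject the false H0.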



Hypothesis Testing
Basic Steps:
1. Formulate the null and alternative hypotheses. The null hypothesis asserts that a population
parameter is equal to, no more than, or no less than some exact value, and it is evaluated in the
face of numerical evidence. An appropriate alternative hypothesis covers other possible values
for the parameter.
2. Select the significance level. If we end up rejecting the null hypothesis, there's a chance that
we're wrong in doing so, i.e., that we've made a Type I error. The significance level is the
maximum probability that we'll make such a mistake.
3. Select the test statistic and calculate its value. For example, the test statistic will be either z or
t, corresponding to the normal and t distributions, respectively.
4. Identify critical value(s) for the test statistic and state the decision rule. The critical value(s) will
bound rejection and nonrejection regions for the null hypothesis.
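The four steps can be traced through a small example. This sketch borrows the summary figures from the rod example in these notes (n = 30, x̄ = 1.400 inches, known σ = 0.053 inches) but tests a hypothetical claimed mean of μ0 = 1.390 inches; because σ is known, the test statistic is z.

```python
import math
from statistics import NormalDist

# Step 1: hypotheses (mu0 = 1.390 inches is a hypothetical claimed value)
#   H0: mu = 1.390    H1: mu != 1.390   (two-tailed)
mu0 = 1.390
# Step 2: significance level
alpha = 0.05
# Step 3: test statistic; sigma is known, so z is used
n, xbar, sigma = 30, 1.400, 0.053
z = (xbar - mu0) / (sigma / math.sqrt(n))
# Step 4: critical values and decision rule: reject H0 if |z| > z_crit
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
decision = "reject H0" if abs(z) > z_crit else "fail to reject H0"
print(f"z = {z:.3f}, critical values = ±{z_crit:.3f} -> {decision}")
```

Here z is about 1.03, well inside ±1.96, so the sample gives no significant evidence against the hypothetical claim.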



- Assertion: "35% of the riders are senior citizens."
- Null hypothesis: H0: π = 0.35, where π is the population proportion. The null hypothesis is
identical to the assertion, since the assertion claims an exact value for the population parameter.
- Alternative hypothesis: H1: π ≠ 0.35. If the population proportion is not 0.35, then it must
be some other value.
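To see how these hypotheses would be tested, here is a minimal sketch of the one-sample z test for a proportion, using the normal approximation with the standard error computed under H0. The sample (120 seniors among 400 riders) is hypothetical, and nπ0 = 140 and n(1 − π0) = 260 are large enough for the approximation.

```python
import math
from statistics import NormalDist

# Hypothetical sample: 120 senior citizens among 400 randomly chosen riders
n, seniors = 400, 120
pi0 = 0.35                             # value claimed under H0
p_hat = seniors / n                    # sample proportion
se0 = math.sqrt(pi0 * (1 - pi0) / n)   # standard error computed under H0
z = (p_hat - pi0) / se0
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed, because H1: pi != 0.35
print(f"p_hat = {p_hat:.3f}, z = {z:.3f}, p-value = {p_value:.4f}")
```

With z ≈ −2.10 and a p-value of about 0.036, H0 would be rejected at the 0.05 level but not at 0.01.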



Hypothesis Testing (the p-value)
First approach: Using a predetermined level of significance, establish critical value(s), then see
whether the calculated test statistic falls into a rejection region for the test. This is similar to
placing a high-jump bar at a given height, then seeing whether you can clear it.
Second approach: Determine the exact level of significance associated with the calculated value
of the test statistic. In this case, we're identifying the most extreme critical value that the test
statistic would be capable of exceeding. This is equivalent to your jumping as high as you can
with no bar in place, then having the judges tell you how high you would have cleared if there
had been a crossbar.
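The two approaches always agree: the p-value is below α exactly when the statistic lies beyond the critical value. This sketch assumes a two-tailed test with a standard normal reference distribution, since Python's standard library offers no t distribution.

```python
from statistics import NormalDist

def two_tailed_p_value(z: float) -> float:
    """Probability of a z at least this extreme (in either tail) if H0 is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # first approach: fixed "bar height"
for z in (1.50, 1.96, 2.50):
    p = two_tailed_p_value(z)                 # second approach: exact significance level
    print(f"z = {z:.2f}  p-value = {p:.4f}  "
          f"reject by p-value: {p < alpha}  reject by cutoff: {abs(z) > z_crit}")
```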



Hypothesis Testing (t-distribution)
Your t-table gives you critical values based on the t-distribution for all of the most common levels of
significance.
When performing hypothesis tests for means, our test statistic follows a t distribution, and so our
critical value will be in terms of t.
The t-distribution is more spread out when the sample size is smaller. The intuition behind this is
that, while the z-distribution is based on a population standard deviation (σ), the t-distribution is
based on an estimated sample standard deviation (s).
So, our test statistic will be based not only on the mean of a random sample, but also on the
standard deviation of a random sample.
This sample standard deviation introduces an extra amount of natural variability in the possible
values that our test statistic can take on.
The test statistic tells us how much evidence we have against H0. The farther the test statistic lies
from 0 (in the direction specified by H1), the stronger the evidence.
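The extra spread caused by estimating σ with s can be checked by simulation. This sketch draws samples from a standard normal population with H0 true, computes t = (X̄ − μ)/(s/√n), and counts how often |t| exceeds the z cutoff 1.96. With n = 5 (4 degrees of freedom) the exact probability is about 0.12 rather than 0.05, while with n = 50 it is close to 0.05.

```python
import math
import random
import statistics

def share_beyond(cutoff: float, n: int, reps: int = 10_000, seed: int = 3) -> float:
    """Share of simulated t statistics with |t| > cutoff when H0 is true."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # H0 true: samples come from N(0, 1)
    count = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        s = statistics.stdev(sample)  # estimated, so it varies from sample to sample
        t = (statistics.fmean(sample) - mu) / (s / math.sqrt(n))
        if abs(t) > cutoff:
            count += 1
    return count / reps

small_n = share_beyond(1.96, n=5)
large_n = share_beyond(1.96, n=50)
print(f"P(|t| > 1.96): n = 5 -> {small_n:.3f}, n = 50 -> {large_n:.3f}")
```

This is why small-sample tests of means use the larger t critical values instead of 1.96.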



Hypothesis Testing (t-distribution)
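For testing hypotheses (n − 1 degrees of freedom):
$$ t_{n-1} = \frac{\bar{X} - \mu_0}{s_{\bar{X}}} = \frac{\bar{X} - \mu_0}{s / \sqrt{n}} $$
$\mu_{H_0}$ or $\mu_0$ denotes the claimed population mean (for example, a company might be making
a claim about this parameter).

100(1 − α)% confidence interval estimator:
$$ \bar{X} \pm t_{\alpha/2,\,n-1} \, \frac{s}{\sqrt{n}} $$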
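The t statistic above translates directly into code. In this minimal sketch the function name is hypothetical, and the data are the phosphate readings from Exercise 5.1, tested against a hypothetical claimed mean μ0 = 5.0 mg/dl.

```python
import math
import statistics

def one_sample_t(data, mu0):
    """Return (t, df) for H0: mu = mu0, where t = (xbar - mu0) / (s / sqrt(n))."""
    n = len(data)
    xbar = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation, n - 1 divisor
    t = (xbar - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Phosphate readings from Exercise 5.1, tested against a hypothetical mu0 = 5.0
t, df = one_sample_t([5.6, 5.1, 4.6, 4.8, 5.7, 6.4], mu0=5.0)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

For a two-tailed test at α = 0.05 with 5 degrees of freedom, the t-table critical value is 2.571; since |t| ≈ 1.35 is smaller, H0 would not be rejected.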

[Figure: The t Distribution]


An important consideration in choosing a sample
statistic as a point estimate of the value of a population
parameter is that the sample statistic be an unbiased
estimator. An estimator is unbiased if the expected
value of the sample statistic is the same as the actual
value of the population parameter it is intended to
estimate.

Three important point estimators are those for a population mean
(μ), a population variance (σ²), and a population
proportion (π): the sample mean X̄, the sample variance s², and the sample proportion p.
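Unbiasedness can be checked by averaging an estimator over many samples. This sketch assumes a normal population with σ² = 4 and samples of size 5. The sample variance s² (divisor n − 1) averages close to 4, while the version that divides by n averages close to (n − 1)/n · 4 = 3.2, so it is biased.

```python
import random
import statistics

rng = random.Random(11)
sigma2 = 4.0  # true population variance (normal with sd 2), chosen for illustration
n, reps = 5, 20_000
divide_by_n, divide_by_n_minus_1 = [], []
for _ in range(reps):
    x = [rng.gauss(0.0, 2.0) for _ in range(n)]
    divide_by_n.append(statistics.pvariance(x))        # divides by n
    divide_by_n_minus_1.append(statistics.variance(x))  # divides by n - 1 (s^2)

mean_biased = statistics.fmean(divide_by_n)
mean_unbiased = statistics.fmean(divide_by_n_minus_1)
print(f"average with divisor n:     {mean_biased:.3f}")
print(f"average with divisor n - 1: {mean_unbiased:.3f}  (true value {sigma2})")
```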



Construction of the 95% confidence interval
for the population mean, based on a sample
of 30 rods for which the average diameter
is 1.400 inches. From past experience, the
population standard deviation is known to
be 0.053 inches. Because the latter is known,
the normal distribution can be used in
determining the interval limits. We have 95%
confidence that μ is between 1.381 and 1.419
inches.
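The interval limits above can be reproduced from X̄ ± z·σ/√n. This sketch uses only the values stated in the rod example and takes z from `statistics.NormalDist` rather than a printed table.

```python
import math
from statistics import NormalDist

n, xbar, sigma = 30, 1.400, 0.053  # values from the rod example (inches)
z = NormalDist().inv_cdf(0.975)    # about 1.96 for 95% confidence
margin = z * sigma / math.sqrt(n)
lower, upper = xbar - margin, xbar + margin
print(f"95% CI: ({lower:.3f}, {upper:.3f}) inches")  # (1.381, 1.419)
```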

Exercise 5.1
The level of phosphate (in mg/dl) in the blood of a patient undergoing dialysis treatment was
measured on six consecutive visits: 5.6, 5.1, 4.6, 4.8, 5.7, 6.4.
Construct a symmetric 99% confidence interval for the mean phosphate level.
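One possible worked solution, assuming the phosphate levels are approximately normal and the visits independent: with n = 6 and σ unknown, the interval is X̄ ± t·s/√n with n − 1 = 5 degrees of freedom. The value t = 4.032 is the standard t-table entry for 99% confidence and 5 degrees of freedom; it is hard-coded because Python's standard library has no t quantile function.

```python
import math
import statistics

phosphate = [5.6, 5.1, 4.6, 4.8, 5.7, 6.4]  # mg/dl, six consecutive visits
n = len(phosphate)
xbar = statistics.fmean(phosphate)
s = statistics.stdev(phosphate)             # sample standard deviation
t_crit = 4.032                              # t-table value, 99% confidence, 5 df
margin = t_crit * s / math.sqrt(n)
lower, upper = xbar - margin, xbar + margin
print(f"mean = {xbar:.3f}, s = {s:.3f}")
print(f"99% CI: ({lower:.2f}, {upper:.2f}) mg/dl")
```

The interval comes out at roughly 4.27 to 6.46 mg/dl; it is wide because the sample is small and the confidence level is high.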



Survey data analysis
We collect data from a population of interest so that we can describe the population and make
inferences about the population.
Sampling
The goal of sampling is to collect data that represents the population of interest. If the sample does
not reasonably represent the population of interest, then we cannot accurately describe the
population or make inferences.
Sampling weights
- Correctly scaled sampling weights are necessary for estimating population totals.
- They typically provide consistent and approximately unbiased estimates.
- They typically provide more accurate variance estimation when used with the survey design
characteristics.
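A sampling weight can be read as the number of population units that a respondent represents, so a population total is estimated by the weighted sum Σ wᵢyᵢ. This minimal sketch uses hypothetical values and weights; it shows only the point estimates, since correct variance estimation also needs the design information (strata, clusters) that the survey declaration provides.

```python
# Hypothetical sample: each respondent's value and sampling weight
# (weight = number of population units the respondent represents)
values = [3, 0, 5, 2, 4]
weights = [120.0, 80.0, 150.0, 100.0, 50.0]

estimated_total = sum(w * y for w, y in zip(weights, values))  # weighted total
estimated_population = sum(weights)                            # estimated population size
weighted_mean = estimated_total / estimated_population
unweighted_mean = sum(values) / len(values)                    # ignores the design

print(f"estimated total = {estimated_total:.0f}")
print(f"weighted mean = {weighted_mean:.2f}, unweighted mean = {unweighted_mean:.2f}")
```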



STATA - Inferential Statistics
1. Declare survey design for dataset
svy: Not all of Stata's estimation commands are compatible with svy. The svy-compatible commands
intended for descriptive statistics are: mean, ratio, tabulate, proportion, total. For a full
list, type help svy estimation.
2. Confidence Intervals
ci
3. Hypothesis Testing
ttest, prtest, lincom
4. Correlation
corr, pwcorr
5. Regression Analysis
reg
STATA - Inferential Statistics
Confidence Intervals
ci means [varlist] [if] [in] [weight] [, options] → confidence intervals

The ttest command (mean-comparison tests)

ttest varname == # [if] [in] [, level(#)] → one-sample t test
ttest varname [if] [in] , by(groupvar) [options1] → two-sample t test using groups
ttest varname1 == varname2 [if] [in], unpaired [unequal welch level(#)] → two-sample t test using variables
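As a rough illustration of what a two-sample comparison with unequal variances computes, here is a minimal Python sketch of the two-sample t statistic with Satterthwaite's approximate degrees of freedom. Stata's documentation describes `unequal` as using Satterthwaite's approximation and `welch` as switching to Welch's formula; this sketch covers only the Satterthwaite case, is not Stata's implementation, and uses hypothetical data.

```python
import math
import statistics

def unequal_var_t(x, y):
    """Two-sample t statistic (unequal variances) and Satterthwaite degrees of freedom."""
    n1, n2 = len(x), len(y)
    v1 = statistics.variance(x) / n1  # squared standard error of the mean of x
    v2 = statistics.variance(y) / n2  # squared standard error of the mean of y
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical measurements for two groups
group_a = [12.1, 11.4, 13.0, 12.7, 11.9]
group_b = [10.8, 11.2, 10.1, 11.5, 10.9, 10.4]
t, df = unequal_var_t(group_a, group_b)
print(f"t = {t:.3f}, approximate df = {df:.2f}")
```

The degrees of freedom are generally not an integer, so the critical value would be read from a t-table at the nearest df or computed by software.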



STATA - Inferential Statistics
The prtest command (tests of proportions)
prtest varname == #p [if] [in] [, onesampleopts] → one-sample test of proportion
prtest varname [if] [in] , by(groupvar) [twosamplegropts] → two-sample test of proportions using groups
prtest varname1 == varname2 [if] [in] [, level(#)] → two-sample test of proportions using variables
The lincom command (linear combinations of parameters)
lincom exp [, options] → Note: requires a previous estimation
Options:
eform      generic label; exp(b)
or         odds ratio
hr         hazard ratio
irr        incidence-rate ratio
rrr        relative-risk ratio
level(#)   set confidence level; default is level(95)
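The two-sample test of proportions can be sketched in its textbook form, which uses the pooled proportion under H0: π1 = π2 for the standard error. This is a minimal illustration with hypothetical counts; we assume, without checking Stata's source, that prtest reports a z of this form.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1: int, n1: int, x2: int, n2: int):
    """z test of H0: pi1 = pi2, using the pooled proportion for the standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion estimated under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed
    return z, p_value

# Hypothetical counts: 45 of 150 in group 1 and 30 of 150 in group 2
z, p_value = two_proportion_z(45, 150, 30, 150)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

Here the pooled proportion is 0.25, the standard error is 0.05, and z = 2.0, which is just significant at the 0.05 level.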



STATA - Inferential Statistics
Correlation Analysis
correlate [varlist] [if] [in] [weight] [, correlate_options] → display correlation matrix or covariance matrix
pwcorr [varlist] [if] [in] [weight] [, pwcorr_options] → display all pairwise correlation coefficients
Regression Analysis
regress depvar [indepvars] [if] [in] [weight] [, options]
regress performs ordinary least-squares linear regression. regress can also perform weighted
estimation, compute robust and cluster-robust standard errors, and adjust results for complex
survey designs.
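For a single pair of variables, the quantities behind a correlation and a simple regression reduce to a few sums of deviations. This sketch computes the Pearson correlation and the ordinary least-squares slope and intercept for hypothetical data; it illustrates the formulas only and does none of the weighting, robust standard errors or survey adjustments that regress offers.

```python
import math
from statistics import fmean

def pearson_r(x, y):
    """Pearson correlation: covariation of x and y scaled by their spreads."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ols_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mx, my = fmean(x), fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

# Hypothetical paired observations
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = ols_line(x, y)
print(f"r = {pearson_r(x, y):.4f}, fitted line: y = {intercept:.2f} + {slope:.2f} x")
```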



ENAHO and STATA
svyset [psu] [weight] [, design_options options]
svyset → Declare survey design for dataset
For the ENAHO, the sample is clustered (conglome), stratified (estrato), and
probabilistic (factor07/fac500a).
svyset conglome [pw=factor07], strata(estrato)
Depending on how you weight the data and on the level of your dataset (household or individual),
the interpretation of your calculations may vary.
Remember:
- (factor07/fac500a) for household-level statistics → [pw = factor07]
- (factor07/fac500a * mieperho) for individual-level statistics → [pw = factor07*mieperho]
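To see why the choice of weight changes the interpretation, this sketch computes the same weighted mean twice from hypothetical household records: once with the household weight (like [pw = factor07]) and once with the individual weight (like [pw = factor07*mieperho]), treating mieperho as the number of household members. The variable "spending" and all numbers are made up for illustration.

```python
# Hypothetical household records: (factor07, mieperho, monthly spending)
households = [
    (250.0, 2, 1200.0),
    (180.0, 5, 1800.0),
    (320.0, 4, 1500.0),
]

def weighted_mean(values, weights):
    """Weighted mean: sum of w * v divided by the sum of the weights."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

spending = [h[2] for h in households]
household_weights = [h[0] for h in households]        # like [pw = factor07]
person_weights = [h[0] * h[1] for h in households]    # like [pw = factor07*mieperho]

hh_mean = weighted_mean(spending, household_weights)
person_mean = weighted_mean(spending, person_weights)
print(f"average over households: {hh_mean:.2f}")
print(f"average over individuals (spending of the household they live in): {person_mean:.2f}")
```

The second figure is higher because larger households, which here spend more, count once per member instead of once per household.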



Appendix: Linear Regression
