This document provides an overview of inferential statistics concepts including hypothesis testing using the t-distribution. It discusses the basic steps of hypothesis testing which include formulating the null and alternative hypotheses, selecting a significance level, choosing a test statistic and calculating its value, identifying critical values, and making a decision to reject or fail to reject the null hypothesis. Key concepts explained include the t-distribution, how it is used for hypothesis tests involving sample means, and interpreting p-values. Examples of null and alternative hypotheses for common hypothesis tests are also provided.
ESTADISTICA APLICADA - Elementos Básicos
ESTADISTICA APLICADA
ENTORNO STATA 2020-I WALTER BAZÁN
Walter Bazán - Estadística Aplicada
INFERENTIAL STATISTICS STATA 2020-I WALTER BAZÁN
Basic Concepts
Experiment. An activity or measurement that results in an outcome.
Sample Space. All possible outcomes of an experiment.
Event. One or more of the possible outcomes of an experiment; a subset of the sample space.
Probability. A number between 0 and 1 that expresses the chance that an event will occur.
Classical Approach. Probability = Number of possible outcomes in which the event occurs / Total number of possible outcomes.
Relative Frequency Approach. Probability = Number of trials in which the event occurs / Total number of trials.
The relative frequency approach to probability depends on what is known as the law of large numbers: over a large number of trials, the relative frequency with which an event occurs will approach the probability of its occurrence for a single trial.
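The law of large numbers can be illustrated with a small simulation. The following is only a sketch, meant to be run as a do-file: the fair coin, the seed, the 10,000 tosses and the variable names (heads, trial, relfreq) are all illustrative assumptions, not part of the original slides.

```stata
* Sketch: running relative frequency of heads in simulated fair-coin tosses.
* Seed, number of tosses and variable names are illustrative assumptions.
clear
set seed 12345
set obs 10000
generate byte heads = runiform() < 0.5
* trial counter
generate long trial = _n
* sum() in generate is a running sum, so this is the relative frequency so far
generate double relfreq = sum(heads) / trial
list trial relfreq if inlist(trial, 10, 100, 1000, 10000), noobs
```

With more trials, relfreq should settle closer and closer to 0.5, the single-trial probability assumed in the simulation.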
Sampling Distribution
We're now going to escalate to a new level and consider the sample mean itself as a random variable. If we were to take a great many samples of size n from the same population, we would end up with a great many different values for the sample mean. The resulting collection of sample means could then be viewed as a new random variable with its own mean and standard deviation. The probability distribution of these sample means is called the distribution of sample means, or the sampling distribution of the mean.
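One way to see a sampling distribution is to draw many samples and keep each sample mean. The sketch below assumes a normal population with mean 50 and standard deviation 10, samples of size 30 and 2,000 replications; these numbers and the program name onesample are hypothetical choices made only for illustration.

```stata
* Sketch: approximate the sampling distribution of the mean by simulation.
* Population (mean 50, sd 10), n = 30 and 2,000 replications are assumptions.
clear all
set seed 2020

program define onesample, rclass
    * draw one sample of size 30 and return its mean
    drop _all
    set obs 30
    generate double x = rnormal(50, 10)
    summarize x, meanonly
    return scalar xbar = r(mean)
end

* each replication stores one sample mean in the variable xbar
simulate xbar = r(xbar), reps(2000) nodots: onesample

* the mean of xbar should be near 50 and its sd near 10/sqrt(30)
summarize xbar
```

The standard deviation of the simulated means should be close to 10/sqrt(30), roughly 1.83, which is much smaller than the population standard deviation of 10.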
Hypothesis Testing Overview
Researchers use statistical techniques to answer scientific questions. A very common statistical technique for answering scientific questions is called hypothesis testing. Hypothesis testing is an inferential procedure in which we test to see if we have sufficient evidence to reject a null hypothesis (H0) in favor of an alternative hypothesis (H1). We choose between H0 and H1 by computing a test statistic from a set of data, which quantifies the strength of our evidence against the null hypothesis. This statistic will follow a known sampling distribution, which in our examples will be the t-distribution. The t-distribution is very similar to the standard normal z-distribution, just more spread out for statistics based on small sample sizes. Since we know the sampling distribution that our test statistic follows, we can calculate the probability that it would be at least a certain size if H0 were true.
Hypothesis Testing
The null hypothesis is a statement about the value of a population parameter and is put up for testing in the face of numerical evidence. The null hypothesis is either rejected or fails to be rejected. The null hypothesis tends to be a "business as usual, nothing out of the ordinary is happening" statement that practically invites you to challenge its truthfulness. In the philosophy of hypothesis testing, the null hypothesis is assumed to be true unless we have statistically overwhelming evidence to the contrary.
The alternative hypothesis is an assertion that holds if the null hypothesis is false. For a given test, the null and alternative hypotheses include all possible values of the population parameter, so either one or the other must be false.
A directional claim or assertion holds that a population parameter is greater than (>), at least (≥), no more than (≤), or less than (<) some quantity.
Nondirectional: μ ≠ value
Directional: μ <, >, ≥, ≤ value
Hypothesis Testing
The sampling distribution of the test statistic under H0 shows the distribution of values of the test statistic that we would expect to obtain if H0 were true. This distribution tells us which values of the test statistic are likely to occur when H0 is true, and which values are unlikely to occur when H0 is true. The center of the distribution contains the values that are likely to occur. The tails of the distribution are where the unlikely values lie.
For two-tailed tests, we will reject H0 if we obtain a test statistic in either the left or right tail of the t-distribution. For one-tailed tests, we will reject H0 only if we obtain a test statistic in the relevant tail.
The critical value is the "cutoff" that distinguishes values of the test statistic that result in rejecting the null from values that result in failing to reject the null.
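Critical values for the t-distribution can be looked up in a table or computed directly. As a small sketch, Stata's invttail(df, p) function returns the value with upper-tail probability p; the 19 degrees of freedom and the 0.05 significance level used here are assumed purely for illustration.

```stata
* Sketch: t critical values for an assumed alpha = 0.05 and df = 19.
* two-tailed test: put alpha/2 in each tail, reject if |t| exceeds this value
display invttail(19, 0.025)
* one-tailed (upper) test: all of alpha in the right tail
display invttail(19, 0.05)
* one-tailed (lower) test: same cutoff with a negative sign
display -invttail(19, 0.05)
```

These should print about 2.093, 1.729 and -1.729, matching the usual t-table entries for 19 degrees of freedom.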
Hypothesis Testing
Whenever we reject a null hypothesis, there is a chance that we have made a mistake, i.e., that we have rejected a true statement. Rejecting a true null hypothesis is referred to as a Type I error, and our probability of making such an error is represented by the Greek letter alpha. This probability, which is referred to as the significance level of the test, is of primary concern in hypothesis testing.
On the other hand, we can also make the mistake of failing to reject a false null hypothesis; this is a Type II error. Our probability of making it is represented by the Greek letter beta.
The probability of rejecting a false null hypothesis is called the power of the test.
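The meaning of alpha as a Type I error probability can be checked by simulation. The sketch below repeatedly samples from a population where H0 is true and counts how often a 5% t test rejects; the sample size of 25, the standard normal population, the 2,000 replications and the program name nulltest are assumptions chosen for illustration.

```stata
* Sketch: estimate the Type I error rate of a 5% two-sided t test
* when H0 (mu = 0) is actually true. All settings are illustrative assumptions.
clear all
set seed 2020

program define nulltest, rclass
    * one sample drawn under H0, tested against the true mean
    drop _all
    set obs 25
    generate double x = rnormal(0, 1)
    quietly ttest x == 0
    * r(p) is the two-sided p-value stored by ttest
    return scalar reject = (r(p) < 0.05)
end

simulate reject = r(reject), reps(2000) nodots: nulltest

* the mean of reject is the share of false rejections
summarize reject
```

The mean of reject should land near 0.05, the chosen significance level.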
Hypothesis Testing
Basic Steps:
1. Formulate the null and alternative hypotheses. The null hypothesis asserts that a population parameter is equal to, no more than, or no less than some exact value, and it is evaluated in the face of numerical evidence. An appropriate alternative hypothesis covers other possible values for the parameter.
2. Select the significance level. If we end up rejecting the null hypothesis, there's a chance that we're wrong in doing so, i.e., that we've made a Type I error. The significance level is the maximum probability that we'll make such a mistake.
3. Select the test statistic and calculate its value. For example, the test statistic will be either z or t, corresponding to the normal and t distributions, respectively.
4. Identify critical value(s) for the test statistic and state the decision rule. The critical value(s) will bound rejection and nonrejection regions for the null hypothesis.
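The steps above can be sketched in Stata with the auto dataset that ships with the program. The hypothesized mean of 20 miles per gallon, the 5% significance level and the scalar names (tstat, dof, tcrit) are assumptions made only for this illustration.

```stata
* Sketch of the basic steps, using Stata's bundled auto dataset.
sysuse auto, clear

* Step 1: H0: mu = 20 versus H1: mu != 20 for mean mpg (assumed hypotheses)
* Step 2: significance level alpha = 0.05 (assumed)

* Step 3: the test statistic is t; ttest stores it in r(t)
quietly ttest mpg == 20
scalar tstat = r(t)
scalar dof   = r(df_t)
display "t = " tstat
display "degrees of freedom = " dof

* Step 4: two-tailed critical value and decision rule
scalar tcrit = invttail(dof, 0.025)
display "reject H0 if |t| > " tcrit
display "decision: " cond(abs(tstat) > tcrit, "reject H0", "fail to reject H0")
```

The results are copied into scalars right after ttest so that later commands cannot overwrite them.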
Assertion: "35% of the riders are senior citizens."
Null hypothesis: H0: π = 0.35, where π is the population proportion. The null hypothesis is identical to his statement since he's claimed an exact value for the population parameter.
Alternative hypothesis: H1: π ≠ 0.35. If the population proportion is not 0.35, then it must be some other value.
Hypothesis Testing (the p-value)
First approach: Using a predetermined level of significance, establish critical value(s), then see whether the calculated test statistic falls into a rejection region for the test. This is similar to placing a high-jump bar at a given height, then seeing whether you can clear it.
Second approach: Determine the exact level of significance associated with the calculated value of the test statistic. In this case, we're identifying the most extreme critical value that the test statistic would be capable of exceeding. This is equivalent to your jumping as high as you can with no bar in place, then having the judges tell you how high you would have cleared if there had been a crossbar.
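The second approach amounts to computing a p-value from the tail area of the t-distribution. As a sketch, the lines below reproduce by hand the two-sided p-value that ttest reports; the auto dataset, the hypothesized mean of 20 and the scalar names are illustrative assumptions.

```stata
* Sketch: p-value as a tail area, compared with the value ttest stores.
sysuse auto, clear
quietly ttest mpg == 20
scalar tstat = r(t)
scalar dof   = r(df_t)
scalar pval  = r(p)

* ttail(df, t) is the upper-tail probability P(T > t)
display "two-sided p-value by hand:    " 2 * ttail(dof, abs(tstat))
display "two-sided p-value from ttest: " pval
display "upper-tail p-value:           " ttail(dof, tstat)
```

Rejecting H0 when the p-value is below the chosen significance level gives the same decision as comparing the test statistic with the critical value.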
Hypothesis Testing (t-distribution)
Your t-table gives you critical values based on the t-distribution for all of the most common levels of significance. When performing hypothesis tests for means, our test statistic follows a t-distribution, and so our critical value will be in terms of t.
The t-distribution is more spread out when the sample size is smaller. The intuition behind this is that, while the z-distribution is based on a population standard deviation (σ), the t-distribution is based on an estimated sample standard deviation (s). So, our test statistic will be based not only on the mean of a random sample, but also on the standard deviation of a random sample. This sample standard deviation introduces an extra amount of natural variability in the possible values that our test statistic can take on.
The test statistic tells us how much evidence we have against H0. The larger the test statistic in absolute value, the stronger the evidence.
Hypothesis Testing (t-distribution)
For testing hypotheses (n-1 degrees of freedom):
$$ t_{n-1} = \frac{\bar{X} - \mu_0}{s_{\bar{X}}} = \frac{\bar{X} - \mu_0}{s / \sqrt{n}} $$
$\mu_{H_0}$ or $\mu_0$ denotes the claimed population mean (for example, a company might be making a claim about this parameter).
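The t statistic can be computed by hand from summary statistics and checked against Stata's immediate command ttesti. The numbers below (n = 25, sample mean 52, sample standard deviation 5, claimed mean 50) are hypothetical and serve only to illustrate the formula.

```stata
* Sketch: t statistic from hypothetical summary statistics.
clear
scalar nobs = 25
scalar xbar = 52
scalar sdev = 5
scalar mu0  = 50

* t = (xbar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom
scalar tstat = (xbar - mu0) / (sdev / sqrt(nobs))
scalar dof   = nobs - 1
display "t = " tstat
display "degrees of freedom = " dof

* same test with the immediate form: ttesti #obs #mean #sd #val
ttesti 25 52 5 50
```

With these numbers the standard error is 5/sqrt(25) = 1, so the statistic is t = 2 with 24 degrees of freedom.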
$(1-\alpha)100\%$ Confidence Interval Estimator:
$$ \bar{X} \pm t_{\alpha/2,\,n-1} \frac{s}{\sqrt{n}} $$
Hypothesis Testing (t-distribution)
[Figure: The t Distribution]
An important consideration in choosing a sample statistic as a point estimate of the value of a population parameter is that the sample statistic be an unbiased estimator. An estimator is unbiased if the expected value of the sample statistic is the same as the actual value of the population parameter it is intended to estimate.
Three important point estimators introduced in the chapter are those for a population mean (μ), a population variance (σ²), and a population proportion (π).
Construction of the 95% confidence interval for the population mean, based on a sample of 30 rods for which the average diameter is 1.400 inches. From past experience, the population standard deviation is known to be 0.053 inches. Because the latter is known, the normal distribution can be used in determining the interval limits. We have 95% confidence that μ is between 1.381 and 1.419 inches.
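The interval limits in the rod example can be reproduced with a few lines of arithmetic. This is a sketch using only the figures given in the example; the scalar names (zcrit, margin, lo95, hi95) are illustrative.

```stata
* Sketch: 95% confidence interval with known sigma (rod example).
* z critical value for 95% confidence, about 1.96
scalar zcrit  = invnormal(0.975)
* margin of error: z * sigma / sqrt(n)
scalar margin = zcrit * 0.053 / sqrt(30)
scalar lo95   = 1.400 - margin
scalar hi95   = 1.400 + margin
display "lower limit = " %6.3f lo95
display "upper limit = " %6.3f hi95
```

This should print limits of 1.381 and 1.419 inches, in agreement with the interval stated in the example.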
Exercise 5.1
The level of phosphate, in mg/dl, in the blood of a patient undergoing dialysis treatment was measured on six consecutive visits: 5.6, 5.1, 4.6, 4.8, 5.7, 6.4. Construct a symmetric 99% confidence interval for the mean phosphate level.
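One possible way to work the exercise in Stata is to type in the six measurements and use ci means. This is a sketch, assuming the interval is based on the t-distribution with 5 degrees of freedom and a Stata version that supports the ci means syntax shown in these slides; the variable name phosphate is an assumption.

```stata
* Sketch: 99% t-based confidence interval for the mean phosphate level.
clear
input phosphate
5.6
5.1
4.6
4.8
5.7
6.4
end

* mean, standard error and the 99% confidence limits
ci means phosphate, level(99)
```

The sample mean is about 5.37 mg/dl and the interval should come out at roughly 4.27 to 6.46 mg/dl.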
Survey data analysis
We collect data from a population of interest so that we can describe the population and make inferences about the population.
Sampling
The goal of sampling is to collect data that represents the population of interest. If the sample does not reasonably represent the population of interest, then we cannot accurately describe the population or make inferences.
Sampling weights
Correctly scaled sampling weights are necessary for estimating population totals. They typically provide for consistent and approximately unbiased estimates, and for more accurate variance estimation when used with the survey design characteristics.
STATA - Inferential Statistics
1. Declare survey design for dataset: svyset, then prefix estimation commands with svy:
   Not all Stata estimation commands are compatible with svy. Those compatible with svy whose purpose is statistical description are: mean, ratio, tabulate, proportion, total. To see the full list, type help svy estimation.
2. Confidence Intervals: ci
3. Hypothesis Testing: ttest, prtest, lincom
4. Correlation: corr, pwcorr
5. Regression Analysis: reg
Confidence Intervals
ci means [varlist] [if] [in] [weight] [, options]    confidence intervals for means
The ttest command (mean-comparison tests)
ttest varname == # [if] [in] [, level(#)]    one-sample t test
ttest varname [if] [in], by(groupvar) [options1]    two-sample t test using groups
ttest varname1 == varname2 [if] [in], unpaired [unequal welch level(#)]    two-sample t test using variables
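A short sketch of the first two forms, using the auto dataset bundled with Stata. The hypothesized mean of 20 and the choice of mpg and foreign as variables are assumptions made for illustration only.

```stata
* Sketch: ttest on Stata's bundled auto dataset.
sysuse auto, clear

* one-sample t test of H0: mean mpg = 20
ttest mpg == 20

* two-sample t test using groups (domestic versus foreign cars)
ttest mpg, by(foreign)

* same comparison without assuming equal variances
ttest mpg, by(foreign) unequal

* 99% confidence level instead of the default 95%
ttest mpg, by(foreign) level(99)
```

The two-variable form (ttest varname1 == varname2, unpaired) applies when the two groups are stored in separate variables rather than identified by a grouping variable.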
STATA - Inferential Statistics
The prtest command (tests of proportions)
prtest varname == #p [if] [in] [, onesampleopts]    one-sample test of proportion
prtest varname [if] [in], by(groupvar) [twosamplegropts]    two-sample test of proportions using groups
prtest varname1 == varname2 [if] [in] [, level(#)]    two-sample test of proportions using variables
The lincom command (linear combinations of parameters)
lincom exp [, options]    Note: needs a previous estimation
Options:
eform    generic label; exp(b)
or    odds ratio
hr    hazard ratio
irr    incidence-rate ratio
rrr    relative-risk ratio
level(#)    set confidence level; default is level(95)
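A brief sketch of prtest and lincom on the bundled auto dataset. The hypothesized proportion of 0.35, the indicator highmpg and the regression specification are all hypothetical choices used only to show the syntax.

```stata
* Sketch: prtest and lincom on Stata's bundled auto dataset.
sysuse auto, clear

* one-sample test of H0: proportion of foreign cars = 0.35
prtest foreign == 0.35

* two-sample test of proportions using groups:
* highmpg is a hypothetical 0/1 indicator created for this example
generate byte highmpg = mpg > 20
prtest highmpg, by(foreign)

* lincom needs a previous estimation
regress mpg weight foreign

* linear combination of parameters: intercept plus the foreign coefficient
lincom _cons + foreign

* single coefficient with a 90% confidence interval
lincom foreign, level(90)
```

prtest expects variables coded 0/1, which is why the example builds an indicator before the two-group test.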
STATA - Inferential Statistics
Correlation Analysis
correlate [varlist] [if] [in] [weight] [, correlate_options]    display correlation matrix or covariance matrix
pwcorr [varlist] [if] [in] [weight] [, pwcorr_options]    display all pairwise correlation coefficients
Regression Analysis
regress depvar [indepvars] [if] [in] [weight] [, options]
regress performs ordinary least-squares linear regression. regress can also perform weighted estimation, compute robust and cluster-robust standard errors, and adjust results for complex survey designs.
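A minimal sketch of these commands on the bundled auto dataset; the choice of variables and the model are illustrative assumptions.

```stata
* Sketch: correlation and regression on Stata's bundled auto dataset.
sysuse auto, clear

* correlation matrix (uses only observations complete on all listed variables)
correlate mpg weight price

* covariance matrix instead of correlations
correlate mpg weight price, covariance

* pairwise correlations with number of observations and significance levels
pwcorr mpg weight price rep78, obs sig

* ordinary least-squares regression
regress mpg weight foreign

* same model with robust standard errors
regress mpg weight foreign, vce(robust)
```

Because rep78 has missing values in this dataset, pwcorr can report a different number of observations for each pair, while correlate drops any observation with a missing value in the listed variables.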
ENAHO and STATA
svyset [psu] [weight] [, design_options options]    Declare survey design for dataset
ENAHO uses a probabilistic (factor07/fac500a), stratified (estrato) cluster (conglome) sample:
svyset conglome [pw=factor07], strata(estrato)
Depending on how you weight the data and on the level of your dataset (household or individual), the interpretation of your results may vary. Remember:
(factor07/fac500a) for household-level statistics: [pw = factor07]
(factor07/fac500a * mieperho) for individual-level statistics: [pw = factor07*mieperho]
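The two weighting choices can be sketched as follows. This is only an outline: it assumes an ENAHO household-level file is already loaded in memory and contains conglome, estrato, factor07 and mieperho; the outcome variable gashog2d and the generated names facpob and gpc are assumptions for illustration and should be replaced with the variables in your own file.

```stata
* Sketch: household-level versus individual-level weighting with ENAHO.
* Assumes the ENAHO data are in memory; gashog2d, facpob and gpc are
* illustrative names, not taken from the slides.

* household-level statistics: weight by the household expansion factor
svyset conglome [pweight = factor07], strata(estrato)
svy: mean gashog2d

* individual-level statistics: expansion factor times household size
generate double facpob = factor07 * mieperho
svyset conglome [pweight = facpob], strata(estrato)

* per capita version of the outcome, then its population mean
generate double gpc = gashog2d / mieperho
svy: mean gpc
```

Storing the product in a new variable before calling svyset keeps the weight definition explicit and easy to inspect.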
Appendix: Linear Regression