Statistics-Glossary CSE
Glossaries
What Is a Population?
A population is the complete set of individuals in a group, whether that group comprises a
nation or a group of people with a common characteristic.
What Is a Sample?
A sample refers to a smaller, manageable version of a larger group. It is a subset
containing the characteristics of a larger population. Samples are used in statistical
testing when population sizes are too large for the test to include all possible
members or observations. A sample should represent the population as a whole
and not reflect any bias toward a specific attribute.
There are several sampling techniques used by researchers and statisticians, each
with its own benefits and drawbacks.
What Is a Parameter?
A parameter is a numerical characteristic that describes an entire population. When making
an inference about the population, the parameter is usually unknown because it is rarely
possible to collect information from every member of the population. Instead, we use a
statistic computed from a sample drawn from the population to draw a conclusion about the
parameter.
For example, suppose we want to know the average length of a butterfly. This is a parameter
because it states something about the entire population of butterflies.
Populations, Samples, Parameters, and Statistics
The field of inferential statistics enables you to make educated guesses about the numerical
characteristics of large groups. The logic of sampling gives you a way to test conclusions about
such groups using only a small portion of their members.
A population is a group of phenomena that have something in common. The term often refers to
a group of people, as in the following examples:
All Americans who played golf at least once in the past year
Often, researchers want to know things about populations but do not have data for every person
or thing in the population. If a company's customer service division wanted to learn whether its
customers were satisfied, it would not be practical (or perhaps even possible) to contact every
individual who purchased a product. Instead, the company might select a sample of the
population. A sample is a smaller group of members of a population selected to represent the
population. In order to use statistics to learn things about the population, the sample must
be random. A random sample is one in which every member of a population has an equal
chance of being selected. The most commonly used sample is a simple random sample. It
requires that every possible sample of the selected size has an equal chance of being used.
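As a minimal sketch of simple random sampling, the Python snippet below draws a sample without replacement so that every subset of the chosen size is equally likely. The customer IDs, the sample size of 50, and the helper name simple_random_sample are assumptions made for illustration only.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every subset of size n is equally likely."""
    rng = random.Random(seed)          # seeded only so the sketch is repeatable
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical customer IDs standing in for everyone who purchased a product.
customers = list(range(1, 1001))
sample = simple_random_sample(customers, 50, seed=42)

print(len(sample), "customers selected out of", len(customers))
```

The company would then survey only the selected customers and use their answers to learn about all customers.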
Different symbols are used to denote statistics and parameters, as Table 1 shows.
When a statistic is used for estimating a population parameter, the statistic is called an estimator.
A population parameter is any characteristic of a population under study, but when it is not
feasible to directly measure the value of a population parameter, statistical methods are used to
infer the likely value of the parameter on the basis of a statistic computed from a sample taken
from the population. For example, the sample mean is an unbiased estimator of the population
mean. This means that the expected value of the sample mean equals the true population mean.[1]
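One way to see what unbiasedness means is a small simulation: draw many samples, compute the mean of each, and compare the average of those sample means with the population mean. The sketch below assumes a made-up population of 10,000 normally distributed values; the sizes and names are illustrative, not taken from any real study.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population (assumed for illustration). In practice the
# population mean is unknown; here we can compute it because we built the data.
population = [rng.gauss(50, 10) for _ in range(10_000)]
mu = statistics.fmean(population)

def mean_of_sample_means(population, n, repeats, rng):
    """Average the sample mean over many simple random samples of size n."""
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.fmean(means)

avg = mean_of_sample_means(population, n=30, repeats=2000, rng=rng)
print("population mean:", round(mu, 2))
print("average of sample means:", round(avg, 2))
```

Any single sample mean may miss the population mean, but the average over many samples should land very close to it.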
A descriptive statistic is used to summarize the sample data. A test statistic is used in statistical
hypothesis testing. Note that a single statistic can be used for multiple purposes – for example
the sample mean can be used to estimate the population mean, to describe a sample data set, or to
test a hypothesis.
What Is an Estimator?
An estimator is a statistic that estimates some fact about the population. We can also think
of an estimator as the rule that creates an estimate. For example, the sample mean (x̄) is an
estimator for the population mean, μ.
The quantity that is being estimated (i.e., the one you want to know) is called the estimand. For
example, suppose we want to know the average height of children in a certain school with a
population of 1000 students. We take a sample of 30 children, measure them, and find that the
mean height is 56 inches. The sample mean is the estimator, and 56 inches is the estimate it
produces. We use it to estimate that the population mean (our estimand) is about 56 inches.
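The distinction between the rule and the number it produces can be sketched in Python. The heights below are randomly generated stand-ins for the 1000 students, not real measurements, and the names sample_mean, school, estimand, and estimate are chosen for this illustration.

```python
import random
import statistics

def sample_mean(values):
    """The estimator: a rule that turns any sample into a single number."""
    return statistics.fmean(values)

rng = random.Random(1)

# Hypothetical heights in inches for all 1000 students (made-up data).
school = [rng.gauss(56, 3) for _ in range(1000)]

estimand = statistics.fmean(school)   # population mean, normally unknown
sample = rng.sample(school, 30)       # the 30 children we actually measure
estimate = sample_mean(sample)        # the number the estimator produces

print("estimate:", round(estimate, 1), "inches")
print("estimand:", round(estimand, 1), "inches")
```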
What Is a Statistical Hypothesis Test?
A statistical hypothesis test is a method of statistical inference used to decide whether the data
at hand sufficiently support a particular hypothesis. Hypothesis testing allows us to make
probabilistic statements about population parameters.
So when is a one-tailed test appropriate? If you consider the consequences of missing an effect in
the untested direction and conclude that they are negligible and in no way irresponsible or
unethical, then you can proceed with a one-tailed test. For example, imagine that you have
developed a new drug. It is cheaper than the existing drug and, you believe, no less effective. In
testing this drug, you are only interested in testing if it is less effective than the existing drug. You
do not care if it is significantly more effective. You only wish to show that it is not less
effective. In this scenario, a one-tailed test would be appropriate.
Choosing a one-tailed test for the sole purpose of attaining significance is not
appropriate. Choosing a one-tailed test after running a two-tailed test that failed to reject the null
hypothesis is not appropriate, no matter how "close" to significant the two-tailed test was. Using
statistical tests inappropriately can lead to invalid, highly questionable results that cannot
be replicated, which is a steep price to pay for a significance star in your results table!
The default among statistical packages performing tests is to report two-tailed p-values. Because
the most commonly used test statistic distributions (standard normal, Student’s t) are symmetric
about zero, most one-tailed p-values can be derived from the two-tailed p-values.
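For a test statistic whose null distribution is symmetric about zero, the conversion can be sketched as follows: halve the two-tailed p-value when the statistic falls in the direction of the one-sided alternative, and take one minus that half otherwise. The function name and the "greater"/"less" labels are assumptions for this example, not the interface of any particular package.

```python
def one_tailed_p(two_tailed_p, statistic, alternative):
    """Convert a two-tailed p-value to a one-tailed p-value.

    Valid only when the null distribution of the test statistic is
    symmetric about zero (e.g. standard normal, Student's t).
    alternative: "greater" (Ha: diff > 0) or "less" (Ha: diff < 0).
    """
    if not 0.0 <= two_tailed_p <= 1.0:
        raise ValueError("p-value must be between 0 and 1")
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")

    half = two_tailed_p / 2
    # Is the observed statistic on the side the alternative predicts?
    if alternative == "greater":
        in_direction = statistic > 0
    else:
        in_direction = statistic < 0
    return half if in_direction else 1 - half

# A positive statistic with two-tailed p = 0.04:
print(one_tailed_p(0.04, 2.1, "greater"))
print(one_tailed_p(0.04, 2.1, "less"))
```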
Below, we have the output from a two-sample t-test in Stata. The test is comparing the mean
male score to the mean female score. The null hypothesis is that the difference in means is
zero. The two-sided alternative is that the difference in means is not zero. There are two one-
sided alternatives that one could opt to test instead: that the male score is higher than the female
score (diff > 0) or that the female score is higher than the male score (diff < 0). In this instance,
Stata presents results for all three alternatives. Under the headings Ha: diff < 0 and Ha: diff >
0 are the results for the one-tailed tests. In the middle, under the heading Ha: diff != 0 (which
means that the difference is not equal to 0), are the results for the two-tailed test.
The p-value is a number, calculated from a statistical test, that describes how likely you are
to have found observations at least as extreme as yours if the null hypothesis were true.
P-values are used in hypothesis testing to help decide whether to reject the null hypothesis.
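To make the p-value concrete, the sketch below compares two groups and reports a p-value for each of the three alternatives (diff < 0, diff != 0, diff > 0). It is not a t-test: it uses a standard normal approximation to the test statistic, which is rough for samples this small, because that needs only the Python standard library. The scores are invented for illustration and the function name is an assumption.

```python
import math
import statistics

def two_sample_z_pvalues(a, b):
    """P-values for the difference in means under a normal approximation."""
    diff = statistics.fmean(a) - statistics.fmean(b)
    # Standard error of the difference, using each group's sample variance.
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    z = diff / se
    cdf = statistics.NormalDist().cdf(z)   # P(Z <= z) if the null were true
    return {
        "z": z,
        "less": cdf,                        # Ha: diff < 0
        "not_equal": 2 * min(cdf, 1 - cdf), # Ha: diff != 0
        "greater": 1 - cdf,                 # Ha: diff > 0
    }

# Hypothetical scores (made-up data, not from the Stata example).
male = [52, 61, 58, 47, 66, 55, 60, 49, 63, 57]
female = [59, 64, 55, 68, 62, 70, 58, 65, 61, 67]

p = two_sample_z_pvalues(male, female)
for name, value in p.items():
    print(name, round(value, 4))
```

The two one-tailed p-values always sum to one, and the two-tailed p-value is twice the smaller of them, which matches the relationship between one-tailed and two-tailed results for symmetric distributions.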