Statistics Individuals Sample Population Randomly Probability

Simple random sampling is a basic sampling technique where each member of the population has an equal chance of being chosen for the sample. It involves randomly selecting a subset of individuals from a larger population without bias, such that each individual and subset of individuals has the same probability of being selected. While simple random sampling provides an unbiased representation, it does not guarantee the sample will perfectly represent the population.

Uploaded by

Ann Jaleen Catu
Copyright
© Attribution Non-Commercial (BY-NC)

In statistics, a simple random sample is a subset of individuals (a sample) chosen from a larger set (a population).

Each individual is chosen randomly and entirely by chance, such that each individual has the same probability of being chosen at any stage during the sampling process, and each subset of k individuals has the same probability of being chosen for the sample as any other subset of k individuals [1]. This process and technique is known as simple random sampling, and should not be confused with systematic random sampling. Simple random sampling is a basic type of sampling, since it can be a component of other more complex sampling methods. The principle of simple random sampling is that every object has the same probability of being chosen.

For example, N college students want a ticket for a basketball game, but there are only X tickets (X < N), so they decide on a fair way to see who gets to go. Everybody is given a number (0 to N-1), and random numbers are generated, either electronically or from a table of random numbers. Non-existent numbers are ignored, as are any numbers previously selected. The first X numbers drawn identify the lucky ticket winners.

In small populations and often in large ones, such sampling is typically done "without replacement", i.e., one deliberately avoids choosing any member of the population more than once. Although simple random sampling can be conducted with replacement instead, this is less common and would normally be described more fully as simple random sampling with replacement. Sampling done without replacement is no longer independent, but still satisfies exchangeability, hence many results still hold. Further, for a small sample from a large population, sampling without replacement is approximately the same as sampling with replacement, since the probability of choosing the same individual twice is low.

An unbiased random selection of individuals is important so that in the long run, the sample represents the population. However, this does not guarantee that a particular sample is a perfect representation of the population.
Simple random sampling merely allows one to draw externally valid conclusions about the entire population based on the sample. Conceptually, simple random sampling is the simplest of the probability sampling techniques. It requires a complete sampling frame, which may not be available or feasible to construct for large populations. Even if a complete frame is available, more efficient approaches may be possible if other useful information is available about the units in the population. Advantages are that it is free of classification error, and it requires minimum advance knowledge of the population other than the frame. Its simplicity also makes it relatively easy to interpret data collected via SRS. For these reasons, simple random sampling best suits situations where not much information is available about the population and data collection can be efficiently conducted on randomly distributed items, or where the cost of sampling is small enough to make efficiency less important than simplicity. If these conditions are not true, stratified sampling or cluster sampling may be a better choice.
Distinction between a systematic random sample and a simple random sample

In a simple random sample, individuals are drawn from the population at random, with no order or pattern governing which individual is chosen. Let us assume you had a school with 1000 students, divided equally into boys and girls, and you wanted to select 100 of them for further study. You might put all their names in a bucket and then pull 100 names out. Not only does each person have an equal chance of being selected, we can also easily calculate the probability of a given person being chosen, since we know the sample size (n) and the population size (N):

1. In the case that any given person can only be selected once, i.e. after selection the person is removed from the selection pool (basic probability):

$$P = \frac{n}{N} = \frac{100}{1000} = 10\%$$

2. In the case that any selected person is returned to the selection pool, i.e. can be picked more than once (geometric distribution):

$$P = 1 - \left(1 - \frac{1}{N}\right)^{n} = 1 - \left(\frac{999}{1000}\right)^{100} \approx 9.5\%$$
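The two cases can be checked numerically. The following is a minimal sketch, assuming the school example above (n = 100, N = 1000); the function names are made up for illustration.

```python
def inclusion_prob_without_replacement(n, N):
    """Chance that one given individual is among n draws made without replacement."""
    return n / N


def inclusion_prob_with_replacement(n, N):
    """Chance that one given individual is drawn at least once in n draws with replacement."""
    # Complement of "missed on every one of the n independent draws".
    return 1 - (1 - 1 / N) ** n


if __name__ == "__main__":
    n, N = 100, 1000
    print(f"without replacement: {inclusion_prob_without_replacement(n, N):.4f}")
    print(f"with replacement:    {inclusion_prob_with_replacement(n, N):.4f}")
```

Both values come out close to 0.1, which is why the two sampling schemes are often treated as interchangeable when the sample is small relative to the population.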

Either way, every student in the school has approximately a 1 in 10 chance of being selected using this method. Further, all combinations of 100 students have the same probability of selection.

If a systematic pattern is introduced into random sampling, it is referred to as "systematic (random) sampling". For instance, suppose the students in our school had numbers attached to their names ranging from 0001 to 1000, and we chose a random starting point, e.g. 0533, and then picked every 10th name thereafter to give us our sample of 100 (starting over with 0003 after reaching 0993). In this sense, this technique is similar to cluster sampling, since the choice of the first unit determines the remainder. This is no longer simple random sampling, because some combinations of 100 students have a larger selection probability than others: for instance, {3, 13, 23, ..., 993} has a 1/10 chance of selection, while {1, 2, 3, ..., 100} cannot be selected under this method.

Sampling a dichotomous population

If the members of the population come in two kinds, say "red" and "black", one can consider the distribution of the number of red elements in a sample of a given size. That distribution depends on the numbers of red and black elements in the full population. For a simple random sample with replacement, the distribution is a binomial distribution. For a simple random sample without replacement, one obtains a hypergeometric distribution.
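To make the two distributions concrete, here is a small sketch using only the standard library. The population of 10 red and 10 black elements and the sample size of 5 are illustrative assumptions, not figures from the text.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(k red) in n draws WITH replacement when a fraction p of the population is red."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def hypergeom_pmf(k, n, reds, total):
    """P(k red) in n draws WITHOUT replacement from `total` elements, `reds` of them red."""
    return comb(reds, k) * comb(total - reds, n - k) / comb(total, n)


if __name__ == "__main__":
    reds, total, n = 10, 20, 5  # hypothetical population and sample size
    for k in range(n + 1):
        with_repl = binomial_pmf(k, n, reds / total)
        without_repl = hypergeom_pmf(k, n, reds, total)
        print(f"k={k}: with replacement {with_repl:.4f}, without {without_repl:.4f}")
```

The two columns differ noticeably here because the sample is a quarter of the population; they converge as the population grows relative to the sample.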

Definition of 'Simple Random Sample'


A subset of a statistical population in which each member of the subset has an equal probability of being chosen. A simple random sample is meant to be an unbiased representation of a group. An example of a simple random sample would be a group of 25 employees chosen out of a hat from a company of 250 employees. In this case, the population is all 250 employees, and the sample is random because each employee has an equal chance of being chosen.
Read more: http://www.investopedia.com/terms/s/simple-randomsample.asp

Investopedia explains 'Simple Random Sample'


A sampling error can occur with a simple random sample if the sample doesn't end up accurately reflecting the population it is supposed to represent. For example, in our simple random sample of 25 employees, it would be possible to draw 25 men even if the population consisted of 125 women and 125 men. For this reason, simple random sampling is more commonly used when the researcher knows little about the population. If the researcher knew more, it would be better to use a different sampling technique, such as stratified random sampling, which helps to account for the differences within the population (such as age, race or gender).
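How likely is the all-male sample in this example? A rough sketch of the calculation follows; the helper name is hypothetical, and the numbers (125 men, 250 employees, sample of 25) are taken from the example above.

```python
def prob_all_from_one_group(group_size, population_size, sample_size):
    """Probability that every draw (without replacement) comes from one group."""
    p = 1.0
    for i in range(sample_size):
        # After i group members are removed, (group_size - i) remain
        # out of (population_size - i) people still in the hat.
        p *= (group_size - i) / (population_size - i)
    return p


if __name__ == "__main__":
    p = prob_all_from_one_group(125, 250, 25)
    print(f"P(all 25 sampled employees are men) = {p:.3e}")
```

The probability is tiny but not zero, which is the point of the passage: a simple random sample can, by chance, misrepresent the population.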

Simple random sampling


Simple random sampling is the most intuitive sampling approach. If every household in the population has some unique identifier, such as a number or the name of the head of the household, and you know how many households you want to include in the survey sample, then you could simply write this identifier for each household on a separate piece of paper, put all the pieces of paper in a bag, shake well, and draw as many from the bag as you need to achieve your intended sample size. This is simple random sampling.

Simple random sampling:
- Involves selection of households which is independent and random
- Is the basis for most statistical theory, that is:
  o The most common methods to calculate p values and confidence limits
  o The output from most statistics computer programmes assumes simple random sampling

Regardless of what form your data are in, the important characteristic of simple random sampling is that the person doing the selecting has NO CONTROL over which households are selected. The selection is entirely random, and the selection of each household is not dependent on the selection of other households.

Example of simple random sampling of 10 households from a list of 40 households

We have a list of 40 heads of households. Each has a unique number, 1 through 40. We want to select 10 households randomly from this list. Using a random number table, we select consecutive 2-digit numbers starting from the upper left. If a random number matches a household's number, that household is added to the list of selected households. If a random number does not match a household's number (for example, if it is greater than 40), then it does not select a household. After each random number is used, it is crossed out so that it is never used again. We continue to select households until we have 10.

Note that even though the selected households appear somewhat clustered, if the random number table is truly random, the selected households have been randomly selected.
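The household example can be sketched in code as follows. This is an illustration under stated assumptions: a pseudo-random generator stands in for the printed random number table, a household that has already been selected is skipped if its number comes up again, and the function names are made up.

```python
import random


def select_households(two_digit_numbers, population_size=40, sample_size=10):
    """Walk through 2-digit random numbers, keeping those that match a household."""
    selected = []
    for number in two_digit_numbers:
        # Numbers such as 00 or 41-99 match no household and are passed over.
        if 1 <= number <= population_size and number not in selected:
            selected.append(number)
            if len(selected) == sample_size:
                break
    return selected


def two_digit_stream(rng):
    """Endless stream of values 00-99, standing in for a random number table."""
    while True:
        yield rng.randrange(100)


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed only so the run can be repeated
    print(sorted(select_households(two_digit_stream(rng))))
```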

Simple Random Sampling


A simple random sample gives each member of the population an equal chance of being chosen. It is not a haphazard sample as some people think! One way of achieving a simple random sample is to number each element in the sampling frame (e.g. give everyone on the Electoral register a number) and then use random numbers to select the required sample. Random numbers can be obtained using your calculator, a spreadsheet, printed tables of random numbers, or by the more traditional methods of drawing slips of paper from a hat, tossing coins or rolling dice. The optimum sample is the one which maximises precision per unit cost, and by this criterion simple random sampling can often be bettered by other methods.

Advantages
- ideal for statistical purposes

Disadvantages
- hard to achieve in practice
- requires an accurate list of the whole population
- expensive to conduct as those sampled may be scattered over a wide area

Random Numbers from a Calculator or Spreadsheet


Most electronic calculators have a RAN# function that produces a random decimal number between 0 and 1. The formula =RAND( ) in Excel achieves the same result, but to more decimal places. So how can you use these to select a random sample? Suppose you wanted to select a random lottery number between 1 and 49. There are two approaches.

Firstly, you could multiply the electronic random number by 49 to get a random number between 0 and 49. Round this number up to the nearest whole number. For example, if the electronic random number is 0.497, when multiplied by 49 this gives 24.353, which you should round up to 25.

Secondly, you could treat the electronic random number as a series of random digits and use the first two as your random number, ignoring 00 and any that are greater than 49. For example, the electronic random number 0.632 has first two digits 63 and you ignore it, whereas 0.317 gives the random number 31.
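Both approaches can be sketched in a few lines. The function names are hypothetical; the second function takes the number as displayed text (for example "0.632") so that the digits are read exactly as they appear on the calculator, which is an assumption made here to sidestep floating-point rounding.

```python
import math


def lottery_by_scaling(u, top=49):
    """Approach 1: multiply a random decimal u (0 <= u < 1) by `top` and round up."""
    # u == 0.0 exactly would round up to 0, so it is mapped to 1 (an edge case
    # the text does not discuss).
    return max(1, math.ceil(u * top))


def lottery_by_digits(display, top=49):
    """Approach 2: use the first two decimal digits; return None if they must be ignored."""
    digits = display.split(".")[1][:2].ljust(2, "0")
    value = int(digits)
    return value if 1 <= value <= top else None


if __name__ == "__main__":
    print(lottery_by_scaling(0.497))
    print(lottery_by_digits("0.632"))
    print(lottery_by_digits("0.317"))
```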

Random Number Tables


Random number tables consist of a randomly generated series of digits (0-9). To make them easy to read there is typically a space between every 4th digit and between every 10th row. When reading from random number tables you can begin anywhere (choose a number at random) but having once started you should continue to read across the line or down a column and NOT jump about. Here is an extract from a table of random sampling numbers:

3680 2231 8846 5418 0498 5245
7071 2597

If we were doing market research and wanted to sample two houses from a street containing houses numbered 1 to 48 we would read off the digits in pairs

36 80 22 31 88 46 54 18 04 98 52 45 70 71 25 97

and take the first two pairs that lie between 01 and 48, which gives house numbers 36 and 22. If we wanted to sample two houses from a much longer road with 140 houses in it we would need to read the digits off in groups of three:

368 022 318 846 541 804 985 245 707 125 97

and the groups that lie between 001 and 140 would be the ones to visit: 22 and 125.

Houses in a road usually have numbers attached, which is convenient (except where there is no number 13). In many cases, however, one has first to give each member of the population a number. For a group of 10 people we could number them as:

0 Appleyard
1 Banyard
2 Croft
3 Durran
4 Entwhistle
5 Francis
6 Gray
7 Hibbert
8 Jones
9 Lillywhite

By numbering them from 0 to 9 you need only use single digits from the random number table.

36802231884654180498524570712597

In this case the first digit is 3 and so Durran is chosen.
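The table-reading procedure can be expressed as a short sketch. The digit string is the extract quoted above; the function names are made up, and skipping a value that was already chosen is an assumption for sampling without replacement.

```python
TABLE_DIGITS = "36802231884654180498524570712597"  # the extract, read across the lines


def read_groups(digits, width):
    """Split the digit stream into consecutive groups of `width` digits (no jumping about)."""
    return [digits[i:i + width] for i in range(0, len(digits) - width + 1, width)]


def pick_from_table(digits, width, low, high, how_many):
    """Take the first `how_many` distinct groups whose value lies in [low, high]."""
    chosen = []
    for group in read_groups(digits, width):
        value = int(group)
        if low <= value <= high and value not in chosen:
            chosen.append(value)
            if len(chosen) == how_many:
                break
    return chosen


if __name__ == "__main__":
    print(pick_from_table(TABLE_DIGITS, 2, 1, 48, 2))   # street with houses 1-48
    print(pick_from_table(TABLE_DIGITS, 3, 1, 140, 2))  # road with 140 houses
    print(pick_from_table(TABLE_DIGITS, 1, 0, 9, 1))    # ten people numbered 0-9
```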

The most common sampling design in vegetation science is simple random sampling. Simple random sampling is a type of probability sampling where each sampling location is equally likely to be selected, and the selection of one location does not influence which is selected next. In statistical terms, the sampling locations are independent and identically distributed.

Consider an example of simple random sampling (SRS) of canopy forest trees. You have determined that there are 24 canopy trees in the sampling universe of interest, and you want to take measurements from a subset of this group of 24, using simple random sampling. One way to do this is to number each tree (1-24), put numbers in a hat, and pick one. The tree corresponding to the number is now part of your sampling subset. Each number (that is, each tree) is equally likely to get picked and picking one number doesn't change the probability that another number will get picked next time.

There are two versions of random sampling: sampling with replacement and sampling without replacement. In the example of tree numbers in a hat, if you return the selected number to the hat, the corresponding tree has another chance to get selected. (And if selected, you repeat your measurements on the tree.) That is sampling with replacement. If instead you discard a number once it is selected (sampling without replacement), a tree can be selected only once. In vegetation science, SRS without replacement is much more common than SRS with replacement.

Picking numbers out of a hat is perfectly valid, if done correctly, but there are better ways to select random numbers. Even if you are familiar with using random number tables and random number generators in calculators, review the section of the course called How to use random number tables and generators.
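As an illustration of the two versions, the sketch below uses Python's standard `random` module in place of the hat. The sample size of 6 and the function names are assumptions for the example; the 24 trees come from the text.

```python
import random


def pick_trees_without_replacement(tree_count, sample_size, rng):
    """Each tree number can appear at most once (numbers are discarded once drawn)."""
    return rng.sample(range(1, tree_count + 1), sample_size)


def pick_trees_with_replacement(tree_count, sample_size, rng):
    """Each draw is returned to the hat, so a tree may be picked more than once."""
    return rng.choices(range(1, tree_count + 1), k=sample_size)


if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed only so the run can be repeated
    print("without replacement:", pick_trees_without_replacement(24, 6, rng))
    print("with replacement:   ", pick_trees_with_replacement(24, 6, rng))
```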

The general sequence for conducting simple random sampling studies in the field
The general procedures for any simple random sampling study in vegetation science are about the same. First, as has been emphasized in the course, you must determine your ecological objectives. For example, you might wish to know the stand basal area of a community. Then decide on the sampling scheme, such as sampling by individual or sampling by area, as with quadrats. Then pick individuals or locations at random (this is the simple random sampling part), and take your measurements. Finally, you use the data you collected to make inferences about the whole sampling universe, coming up with statements like "the stand basal area is 49 m²/ha." In this section of the course you will learn the procedures for locating random samples in the field and the formulas for analyzing data collected from simple random samples.

Sampling by individual
The first step is to number all the individuals in your sampling universe. In simple random sampling, each of these individuals has an equal chance of being selected. This step is a lot trickier than it might seem. For one thing, you must use an unambiguous definition of what constitutes an individual. Plants that spread vegetatively are notoriously difficult to separate into individuals. You must also enumerate all individuals in your sampling universe; if you don't, you violate the "equal chance of being selected" tenet of simple random sampling. Perhaps the most common use of sampling by individuals is with mature trees, where separate trunks define individuals and it is feasible to number all individuals. Sampling rhizomatous grasses, mosses, and much of the rest of the plant world by individual is usually not feasible. Once your individuals are numbered, the next step is to select among those numbers at random, using a random number table or random number generator. You will make your measurements on the group of selected individuals. It can be inefficient to pick a random number, take measurements on that individual, pick another random number, take measurements on that individual, and so forth. Much better is to pick the numbers for all the individuals to be measured ahead of time. Then you can plot a short path that visits each selected individual, and save yourself a lot of time.

The figure on the right shows how this works. You have selected four trees at random. Don't go to the first tree you selected (marked as 1), make measurements, then traipse to the second tree. Rather, pick an efficient path, as from tree 3 to tree 1 to tree 4 to tree 2.

The wrong way to pick random individuals


As mentioned earlier, the process of selecting random individuals requires an enumeration of all individuals. This enumeration can be an exhausting task. It might be tempting to use other techniques for the random selection of individuals. One of the most tempting shortcuts is to use the coordinate system to find a random location, then select the nearest individual for measurement. Although this sounds good, it is both technically invalid and can produce bad data. Look at the diagram to see how this approach can go wrong. The illustration uses trees, but the principle holds for most any kind of plant. X marks the spot of a point selected at random; tree A is the nearest tree to this point (see left diagram). The problem with this approach is that plants are seldom uniformly distributed throughout vegetation. In the illustration, the trees are distributed in clumps. Look at the diagram on the right. The irregular polygons show all the points that are closest to the enclosed tree. That is, any random point that lands within the polygon results in that tree being selected. The polygon for tree A is much larger than the polygon for tree B, meaning that tree A is more likely to get selected than tree B. This violates the basic assumption of simple random sampling! Whenever plants are distributed in a nonuniform pattern, isolated individuals are more likely to be selected.

In the illustration, using this flawed technique for selecting trees would produce misleading data. Because of crowding, trees within the clumps tend to be stunted and trees on the edge of clumps larger. In the illustration, taking measurements from trees that were selected because they are closest to random points will strongly overestimate tree abundance, because you are more likely to select trees on the edges of clumps.

Locating quadrats using the coordinate system


The coordinate system is easier to explain if we assume that your study area (your sampling universe) is a rectangular tract of vegetation. Later, you'll learn how to relax this requirement. So let's say your study area is 100 m by 60 m, and you want to sample with quadrats selected at random from this area.
Every point in this 100-m by 60-m rectangle corresponds to a pair of Cartesian coordinates. Call the 100-m side the X axis, and the 60-m side the Y axis. By picking a pair of random numbers, one between 0 and 100 and the other between 0 and 60, you are picking a random location within your study area. The figure shows where your quadrat would be located if you picked as your random pair of numbers X = 60.7 and Y = 36.2.

OK, but finding your quadrat in the field is not as easy as finding it on a diagram. The most efficient process is to create one axis of this coordinate system by placing a meter tape along one side of the study area, with the zero end of the tape at one corner. To locate your plot, go to the point on this axis corresponding to the first number in your random number pair. Then run a second tape out at right angles for a distance corresponding to the second number in your random number pair. Do not round the coordinates in the field. Repeat this process for each quadrat location. As usual, it is more efficient to select the series of random numbers first, even in the lab well before going to the field. That way you can rearrange the sequence of quadrats into an efficient order. Once you have your random location for the quadrat, you need a system for actually placing the quadrat on the ground. You want a system that doesn't harm the vegetation and a system that is statistically valid. See the section on 'Hints for dealing with reality' for my advice.
An important note about resolution

The axes in the coordinate system represent continuous numbers from 0 to the end of the axis. When picking random numbers, however, you have to determine how many digits of resolution to use in locating your quadrats. The example used a resolution of whole meters (0 digits). That means that quadrats could not be located at 61.4 m or 60.9 m. What resolution is acceptable? Use a resolution that is at least as fine as your quadrat size. For example, if you are using a 0.5 m by 0.5 m quadrat, use a resolution of at least 0.5 m. If you use a 1-m resolution, as in the example, then 3/4 of your study area will not be available for sampling! (Do you see why?) Because most quadrats in vegetation science are in the range of 0.2 m to 1.0 m, I recommend using a resolution of 0.01 m or sometimes 0.1 m.

Using a resolution of 0.01 m instead of a resolution of 1 m takes no more work with the random number table, except that you read two additional digits. It also takes no more work in the field. If you are using a standard tape in herbs and low shrubs, measuring to the nearest centimeter is just as fast as measuring to the nearest decimeter (0.1 m) or meter. You still find your position along the tape in the same way. So go with the finest resolution on your measuring device feasible in the vegetation you are studying. That's usually 0.01 m.
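A minimal sketch of picking coordinate pairs at a chosen resolution is shown below, assuming the 100 m by 60 m rectangle from the example. The function name and the `decimals` parameter (2 means a resolution of 0.01 m, 0 means whole meters) are illustrative choices, not part of the course material.

```python
import random


def random_quadrat_locations(count, x_length=100.0, y_length=60.0, decimals=2, seed=None):
    """Return `count` random (x, y) coordinate pairs in meters at the given resolution."""
    rng = random.Random(seed)
    scale = 10 ** decimals
    x_steps = round(x_length * scale)  # number of resolution steps along the X axis
    y_steps = round(y_length * scale)  # number of resolution steps along the Y axis
    locations = []
    for _ in range(count):
        x = rng.randint(0, x_steps) / scale
        y = rng.randint(0, y_steps) / scale
        locations.append((x, y))
    return locations


if __name__ == "__main__":
    # Select all locations ahead of time, then sort them so they can be
    # visited in order along the X tape.
    for x, y in sorted(random_quadrat_locations(5, seed=11)):
        print(f"X = {x:.2f} m, Y = {y:.2f} m")
```

Drawing whole steps with `randint` and dividing afterwards keeps every multiple of the resolution equally likely, including both ends of each axis.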

Locating quadrats using the grid system


In the grid system, you divide up your study area into nonoverlapping quadrat-sized rectangles. See the figure for what this looks like. These rectangles make up a grid for your study area. Do this on paper, not on the ground! Each rectangle segment of the resulting grid is a potential location for a quadrat. Number all the grid rectangles. Pick your quadrat locations by selecting from their numbers at random.

To actually find these quadrat locations in the field, use the procedure described for the coordinate system. (Now is a good chance to visit How to use random number tables and generators, if you haven't already. This section explains some nuances about using random numbers in the coordinate and grid systems.)
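The grid system might be sketched as follows. The assumptions are loud ones: the study area is a rectangle whose sides are whole multiples of the quadrat dimensions, cells are numbered row by row starting from 0, and the returned coordinates mark the corner of each cell nearest the origin. The function name is hypothetical.

```python
import random


def grid_quadrat_locations(count, x_length, y_length, quadrat_x, quadrat_y, seed=None):
    """Number all quadrat-sized grid cells, then pick `count` of them at random."""
    columns = round(x_length / quadrat_x)
    rows = round(y_length / quadrat_y)
    rng = random.Random(seed)
    # Sampling cell numbers without replacement means no cell is picked twice.
    cell_numbers = rng.sample(range(columns * rows), count)
    return [((n % columns) * quadrat_x, (n // columns) * quadrat_y) for n in cell_numbers]


if __name__ == "__main__":
    for corner in grid_quadrat_locations(5, 100, 60, 1, 1, seed=4):
        print(corner)
```

Because the cells do not overlap, the selected quadrats cannot overlap either, which the coordinate system does not guarantee.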

Simple random sampling by area for non-rectangular study areas

Many studies in vegetation science do not have the luxury of rectangular study areas. You can still use the coordinate system, but there is some extra work involved. Basically, you pick random coordinates as before but discard any locations that fall outside your sampling universe. This process is a lot easier if you have a map of the area boundary so you can select random locations in the lab.
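The discard-and-redraw idea can be sketched with a standard ray-casting point-in-polygon test. The triangular boundary in the usage example is invented for illustration, as are the function names; a real study would use the mapped boundary of the sampling universe.

```python
import random


def point_in_polygon(x, y, vertices):
    """Ray-casting test: True if (x, y) lies inside the polygon given by `vertices`."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_points_in_area(count, vertices, seed=None):
    """Pick random coordinates in the bounding rectangle, discarding those outside."""
    rng = random.Random(seed)
    xs = [vx for vx, _ in vertices]
    ys = [vy for _, vy in vertices]
    points = []
    while len(points) < count:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, vertices):
            points.append((x, y))
    return points


if __name__ == "__main__":
    boundary = [(0, 0), (100, 0), (0, 60)]  # hypothetical triangular study area
    for x, y in random_points_in_area(5, boundary, seed=8):
        print(f"X = {x:.2f} m, Y = {y:.2f} m")
```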

The grid system for selecting sample locations does not work well for non-rectangular study areas because the study area usually cannot be broken up into equal-sized rectangles.

Using GPS to locate quadrats


The Global Positioning System (GPS), coupled with Geographical Information Systems (GIS), provides an efficient way to locate points in the field. Modern, affordable GPS units can take you to a defined location within 2-5 meters. For locating sites or for locating large sampling plots, GPS can save a lot of effort. For intensive sampling with quadrats less than 200 m² in area, GPS is usually too coarse and you need to stick with measuring tapes. A good procedure is to use a GPS unit to establish the boundaries of your study area, then use tape and stakes to locate sampling quadrats.

Tips on using GPS in vegetation science:

- Use a GPS unit with a high precision (at least within 5 m).
- Be wary of GPS units that drift rapidly, that is, the readings change before your eyes. Drift occurs because the unit has not locked onto enough satellites or does not have good enough software.
- If you have any drift, create a rule for knowing when you have reached your location, otherwise your subjective judgment will creep in and the sampling will no longer be random. A good rule is to stop the first time the GPS unit says that you have reached your destination. That is, do not wait for the unit to "settle down."
- Once you have stopped, have a rule for locating the plot itself. I like to use the mid-point between the toes of my boots.
- If you use your GPS unit to enter the boundary of your study area as a polygon, you can use some GIS systems to help in sampling. For example, many GIS systems will select random coordinates from within a polygon. This automatic process is much faster than the coordinate method when your study area is highly irregular in shape.

Hints for dealing with reality


Sometimes a procedure that sounds straightforward gets tricky in the application. This section presents some hints on dealing with the details of locating your samples.
What if the sample location is damaged?

If your next random sample location is filled with gopher holes or tire ruts, what do you do? If a deer trail runs through it or if the field crew had lunch at that spot, what do you do? There are two important questions involved. First, what is the cause of the damage? Second, what is your sampling universe? If the damage was caused by the process of sampling, skip that location and select another spot (using your procedure for randomization). If the damage was by another agent, like gophers, you then need to decide if locations disturbed by gophers are part of your sampling universe. It is legitimate to exclude damaged locations from your study, but only if you exclude them from your inferences. To see why this is important, think back to our familiar sword-fern example. Originally, the objective was to estimate sword-fern production in the entire tract. If you decide to exclude from your sample locations damaged by skid trails and gravel pits, you must state explicitly that your inferences are only for parts of the forest undamaged by skid trails and gravel pits. After all, if 50% of your forest is damaged by skid trails and gravel pits, extrapolating your measurements from pristine samples to the whole forest would be misleading and wrong.
Avoid self-inflicted damage

It is unavoidable. You have to walk through your study area as you establish its boundaries, as you find your sampling locations, and as you shift from side to side as you collect data. If a plot ends up where your boots have ripped up the vegetation, what do you do? (See the previous paragraph.) Best minimize the damage that you and your crew-mates inflict on your study area.

- Walk on animal trails when you can.
- Know where your future plots will be, so you can avoid walking through those locations.
- Eat your lunch outside the study area.
Warnings and technicalities about the coordinate system.

When using the coordinate system, you need to decide if the selected coordinates designate the center of the quadrat or one of its corners. You also need to pick a plot orientation. Just pick a system (like "put the plot center at the selected coordinate and orient the long dimension of the quadrat north to south") and stick with it. The point of the system is to eliminate any subconscious bias in placing the quadrat frame. For example, in my experience, folks tend to move the frame away from poison oak but toward pretty flowers! Having a system protects your data from your subconscious biases.

The coordinate system has problems along the boundaries. Let's say you are using coordinates to locate the center of a 1-m by 1-m quadrat. Then a coordinate of 0.3 m would place part of the quadrat outside the sampling area. In this system, any coordinate value less than half the length of the quadrat will put part of the quadrat outside the study area. (The same for coordinates near the end.) This is no good! You could just skip quadrat locations that extend beyond the edge of the study area. Or you could shrink the size of the quadrat, cutting it off at the boundary of the study area. Or you could fugedaboudit and ignore any quadrat that extends beyond your study area. None of these solutions is completely satisfactory because in different ways they violate the rules of simple random sampling. But the techniques for dealing with this problem correctly are much too difficult to use in the field. What to do?! I recommend that you skip quadrat locations that put the quadrat beyond the edge of the study area. Realize that this isn't quite legitimate, but it is probably the least bad. Besides, in practice, quadrats are much smaller than the study area and these difficulties with boundaries have little impact. But I thought you should know.
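The recommended rule (skip any location that would put part of the quadrat outside the study area) could be sketched like this. It assumes a square quadrat centered on the selected coordinate with its sides parallel to the axes; the function names are invented for the example.

```python
import random


def quadrat_fits(x, y, side, x_length, y_length):
    """True if a square quadrat of width `side`, centered at (x, y), lies fully inside."""
    half = side / 2
    return half <= x <= x_length - half and half <= y <= y_length - half


def locations_inside_boundary(count, side, x_length=100.0, y_length=60.0, seed=None):
    """Draw random centers, skipping any whose quadrat would cross the boundary."""
    rng = random.Random(seed)
    kept = []
    while len(kept) < count:
        x = round(rng.uniform(0, x_length), 2)
        y = round(rng.uniform(0, y_length), 2)
        if quadrat_fits(x, y, side, x_length, y_length):
            kept.append((x, y))
    return kept


if __name__ == "__main__":
    print(quadrat_fits(0.3, 30.0, 1.0, 100.0, 60.0))  # the 0.3 m case from the text
    print(locations_inside_boundary(3, side=1.0, seed=5))
```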
Overlapping quadrats

Sometimes the selection of random locations leads to quadrats that overlap each other. This is statistically acceptable and goes by the technical name of "sampling with replacement." But overlapping quadrats are hardly ever used in vegetation science. For one thing, the vegetation around the previous quadrat is usually disturbed by the process of sampling. The second, overlapping quadrat would then be damaged and give false data. (See above.) The standard procedure in vegetation science is to drop any random locations that would produce an overlap with a previous quadrat.
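Dropping overlapping locations can be sketched as below, assuming square quadrats of equal size with sides parallel to the axes, each identified by its center. The names and the `max_tries` safety limit are illustrative additions.

```python
import random


def overlaps(a, b, side):
    """Two equal axis-aligned squares overlap when their centers are closer than `side` on both axes."""
    return abs(a[0] - b[0]) < side and abs(a[1] - b[1]) < side


def non_overlapping_quadrats(count, side, x_length=100.0, y_length=60.0, seed=None,
                             max_tries=100_000):
    """Draw random centers, dropping any that would overlap a previously accepted quadrat."""
    rng = random.Random(seed)
    accepted = []
    tries = 0
    while len(accepted) < count and tries < max_tries:
        tries += 1
        candidate = (rng.uniform(0, x_length), rng.uniform(0, y_length))
        if not any(overlaps(candidate, earlier, side) for earlier in accepted):
            accepted.append(candidate)
    return accepted


if __name__ == "__main__":
    for x, y in non_overlapping_quadrats(5, side=1.0, seed=21):
        print(f"X = {x:.2f} m, Y = {y:.2f} m")
```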

Knowing what is important

An important purpose of these guidelines for locating samples is to take the process out of our subjective hands and into an objective set of procedures. So it is important to follow the objective procedure precisely. But it is also important to recognize which parts of your procedures are crucial for maintaining objective, representative, and independent observations -- and which parts are not. Imagine yourself at the end of a hard morning of sampling, when you discover that all your quadrat locations are off by half a meter because the tape establishing one Cartesian axis wasn't pulled quite tight enough. Do you throw away your data from the morning and start over? Not if you're on my crew! As long as the mistake didn't push a location outside your study boundary, everything is OK. The mistake was unintentional, so it couldn't impose a subjective choice on the location of quadrats. The locations are still random and independent of each other. Therefore the data collected from those locations are completely valid. Note the corrected locations, and get ready for the afternoon.

Locating lines for line-intercept measurements of cover


The process of locating lines involves selecting a starting point and a direction. The coordinate system described for locating quadrats also works for locating random starting points. You can then pick a random direction by, for example, picking a random number between 1 and 360 and going in that compass direction. This system has two problems with the boundaries of the study area that are similar to the problems with locating quadrats. The issue is more severe, though, because lines are long and are more likely to extend beyond the edge of the study area.
If you have a rectangular study area, there is a better way to locate lines. Say you have a 50 m by 100 m study area, and you want to locate 8 lines that are 25 m long. Picking a random starting point along the 100-m axis of the study area, and then picking left or right at random (as by flipping a coin), is an efficient way to find line locations. This is valid simple random sampling, because every part of the study area is equally likely to be sampled and the location of one line does not affect the location of any other line.
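A minimal Python sketch of this procedure follows. It rests on one interpretation that the text does not spell out: the starting points lie on a baseline running down the middle of the 50-m width, so that a 25-m line laid to the left or right just reaches the edge. The function name and parameters are hypothetical.

```python
import random

def random_line_locations(n_lines, axis_length, rng=None):
    """Pick a random start along the long axis and a random side for each line."""
    rng = rng or random.Random()
    lines = []
    for _ in range(n_lines):
        start = rng.uniform(0, axis_length)   # position along the 100-m axis
        side = rng.choice(["left", "right"])  # the coin flip
        lines.append((start, side))
    return lines

# Eight lines along a 100-m axis, seeded so the draw can be repeated.
lines = random_line_locations(8, 100.0, random.Random(1))
```

Each line is drawn independently of the others, which matches the independence argument made above.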

What to do with your measurements? Calculations for SRS


Central tendency and variability

With simple random sampling without replacement, the best estimate of the population mean ($\mu$) is usually the sample mean, $\bar{x}$, the mean of your n measurements:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

The best estimate of the population variability is usually the standard deviation of your data:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

There are separate formulas for $\bar{x}$ and $s^2$ for other sampling designs, like stratified random sampling and cluster sampling. Refer to the course references for details. Be sure to keep in mind your scientific objective: You want to make statements about the population mean and about your confidence in that mean. That is, you need to know the variability of your estimate of the mean, not the variability of the data. Lucky for us, statistical theory provides a way to convert from describing data to describing the behavior of your estimates of the mean:

$$s_{\bar{x}} = \frac{s}{\sqrt{n}} \sqrt{\frac{N - n}{N}}$$

where n is the size of your sample, N is the size of the entire population, and $s_{\bar{x}}$ is the amount you expect your estimates of the mean to vary. $s_{\bar{x}}$ is often called the standard error.
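The calculations described so far can be sketched in Python with the standard library. This is an illustration only: it assumes the finite population correction takes the square-root form $\sqrt{(N - n)/N}$ applied to the standard error, and the cover values are invented for the example.

```python
import math
import statistics

def srs_summary(data, population_size=None):
    """Return (mean, standard deviation, standard error) for a simple random sample."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)   # sample standard deviation (n - 1 denominator)
    se = s / math.sqrt(n)        # familiar standard error
    if population_size is not None:
        # Finite population correction, assumed here to be sqrt((N - n) / N).
        se *= math.sqrt((population_size - n) / population_size)
    return mean, s, se

# Hypothetical percent-cover readings from five quadrats.
cover = [22, 28, 25, 30, 20]
mean, s, se = srs_summary(cover, population_size=10000)
```

Leaving `population_size` unset skips the correction, which is what most vegetation studies do in practice.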

But what about the factor on the right in the equation? This factor is called the finite population correction, or fpc. It is necessary because statistical distributions describe infinite populations, but sampling is from a carefully delimited (finite) population. (Reminder: The step of defining your study area / statistical population / sampling universe is the step that makes the sampling population finite.) You can see the effect of sample size on fpc at the extremes. When N is very large and n very small, fpc approaches 1 and the formula reduces to that of the familiar standard error, $s / \sqrt{n}$.

When n = N, fpc = 0, which makes the estimate of variability = 0! But this makes sense because you have measured every member of the population and you now have a census, not a sample. Because you know the whole population, you know the mean exactly and there is no sampling error. Most studies in vegetation science ignore the finite population correction. Although technically incorrect, in practice it has little effect because sampling intensity in vegetation science is typically very low. For example, the sampling intensity of a study using 20 1-m² quadrats per hectare is only 20/10000, so the fpc is

$$\sqrt{\frac{10000 - 20}{10000}} = \sqrt{0.998} \approx 0.999$$

which is very close to 1.0. For the rest of the course, we will follow this grand tradition and usually not bother with the finite population correction factor unless sampling intensity goes above 10%.
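To see how the correction behaves between the two extremes, here is a small Python sketch. It assumes the square-root form of the fpc, $\sqrt{(N - n)/N}$, and a population of 10,000 possible 1-m² quadrat positions in one hectare; the sample sizes are arbitrary.

```python
import math

def fpc(n, N):
    """Finite population correction, assumed here to be sqrt((N - n) / N)."""
    return math.sqrt((N - n) / N)

N = 10000  # possible 1-m² quadrat positions in one hectare
for n in (20, 500, 1000, 5000, 10000):
    print(f"n = {n:>5}  intensity = {n / N:6.1%}  fpc = {fpc(n, N):.3f}")
```

At a sampling intensity of 10% the correction shrinks the standard error by only about 5%, which gives some feel for why that threshold is a reasonable place to start caring.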
Confidence and confidence interval

The next step is to convert your estimates of the population mean and its variability into confidence intervals. The statistical formula for the confidence interval with simple random sampling is the same as the standard formula (see the Statistical Background chapter and the Confidence Interval primer): $\bar{x} - t \, s_{\bar{x}}$ to $\bar{x} + t \, s_{\bar{x}}$.

As before, $\bar{x}$ is usually the best estimate of the population mean; t, the t-statistic, reflects both the number of samples and the level of confidence you have set (like 90%); and $s_{\bar{x}}$, the standard error, reflects the variability in the data.
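A confidence interval along these lines might be computed as in the following Python sketch. It ignores the finite population correction, takes the t-statistic as an input that you look up in a t table for your confidence level and n - 1 degrees of freedom, and uses invented cover values; the function name is hypothetical.

```python
import math
import statistics

def confidence_interval(data, t_value):
    """Return (lower, upper) limits: mean minus and plus t times the standard error."""
    mean = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(len(data))  # no fpc applied
    return mean - t_value * se, mean + t_value * se

# Hypothetical percent-cover readings from five quadrats.
cover = [22, 28, 25, 30, 20]
# Two-sided t for 90% confidence with n - 1 = 4 degrees of freedom, from a t table.
low, high = confidence_interval(cover, t_value=2.132)
```

With more quadrats the standard error and the tabled t both shrink, so the interval narrows.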

More on precision
Before you use your carefully calculated values of central tendency and variability, pause a while to reflect on what contributes to the variability you measure. If your technique of vegetation measurement varied from one time to another (and you know it did), then this measurement variability contributes to overall variability.

If the vegetation itself varied from one sample location to another (and it always does), then this spatial heterogeneity or sampling variability contributes to overall variability.
Here's the important part. You can reduce the effect of sampling variability just by collecting more statistically valid samples. But the only way to reduce measurement variability is to get better at conducting the measurements themselves. That is what a lot of Chapter 3 was about, and what Chapter 9 will state again.

Putting your knowledge to the test


At this point, go to Assignments in Blackboard and select the quiz called "Using random numbers," if you haven't already. Then test your understanding of locating simple random samples with the exercise Locating quadrats.
