Research Methods in Politics: Palgrave
Zig Layton-Henry
Palgrave Macmillan
Chapter 6
Making Inferences
To make an inference is to ask, 'Are my research results applicable more widely than the specific cases I have used to obtain the results; and if so, how?' For instance, if a research project about proportional representation (PR) electoral systems covers the Netherlands, South Africa, Argentina and Israel, inference-making happens when the researchers ask themselves 'To what extent are these research results about PR in the Netherlands, South Africa, Argentina and Israel also true of other PR countries?' Using more technical terminology, political scientists aim to use a sample (i.e., the specific cases included in a research project) to gain knowledge about a particular population (in this example, all countries that use a PR electoral system). As Alan Bryman puts it, 'given that it is rarely feasible to send questionnaires or to interview whole populations (such as all members of a town, or the whole population of a country, or all members of an organization), we have to sample' (Bryman, 2001, p.75). However, at some point we will want to generalize from the sample to the population. This is where inference comes into play. Making inferences is both difficult and important in qualitative and quantitative political science alike, but it is fair to say that it is done more systematically and explicitly in quantitative research. The specialized terminology that goes along with inference-making places it squarely in the quantitative research tradition, and this may be a reason why non-quantitative researchers might feel excused from inference-making. However, inference-making goes on as much in qualitative research as in quantitative: the difference is merely that it is not commonly done explicitly in the qualitative research tradition. Moreover, like all other aspects of research, inference-making must be an explicit act in order to be reasoned and convincing.
The chapter's first section explains why inference matters in political research, and this is followed by a section that distinguishes between descriptive and causal inference. Descriptive inference is the most common form of inference in political science; it is about
systematic description of selected cases, and on the basis of that description an inference may then take place in terms of what other cases might look like, or be like. Causal inference is mostly beyond reach for political scientists, who are rarely in a position to identify causal relationships; inferring causal relationships is therefore also rare. The chapter then turns to inference in quantitative and qualitative political science. The principle that underpins inference is the same for these two types of research, but the application of that principle differs greatly between them. In quantitative research a whole range of statistical tools is available. The Central Limit Theorem and the known properties of the so-called normal (or Gaussian) distribution enable researchers to attach probability statements to the question 'What is the probability that what the sample shows is also true of the population?' The sample mean is our best guess of the true but unknown population mean. If the sample is perfectly representative of the population, then the sample mean and the population mean will be the same. If there is sampling error, then the sample mean will deviate from the population mean. Since our real interest is in the population, not in the sample, it becomes crucial to be able to answer the question: 'How far away is the sample mean from the population mean?' The chi square test and the t-test can provide answers to this question, and this section of the chapter explains how to use these tests. Tools like these, flawed though they may be, are not available in qualitative research. Qualitative inference is therefore much less developed, although it is based on the same principle as quantitative inference: the principle of linking what we observe in a usually small sample to the usually much greater population that the sample was drawn from.
Why inference matters

Although there is sometimes a conception - or more accurately, a misconception - that figures somehow 'speak for themselves', theories of politics are essential in enabling researchers to interpret data. Theories of politics enable researchers to arrange abstract concepts in some relationship to one another, and then to examine some data to see whether the data seem to lend support to the theory. Usually the data do not include all possible cases, but just a sample. For instance, an opinion poll does not ask all the people in the country for their opinion, but only a small proportion of them. Whether or not the data seem to support the theory, the next step is usually to
consider whether the result applies more widely than to the data: if a particular conclusion could be drawn about the thousand or so people in an opinion poll, is it possible to draw the same conclusion about the rest of the population? Differently phrased, is it possible to make an inference from the sample to the population? Inference-making is often difficult and always uncertain, but avoiding it is normally not an easy option either, because non-inferential research is the academic equivalent of navel-gazing. It is only relevant to itself, and is rarely of any importance to anyone else; in the opinion poll illustration above, it would mean that we only learn about 1,000 people, not about public opinion in any broader sense. Therefore, non-inferential research makes no contribution to theory-building or to refining hypotheses about politics, and not much even to knowledge itself. Since it is never possible to collect and analyse all pieces of information about any political phenomenon, virtually all general knowledge in political science has emerged through inference-making. We know what we (think we) know about the political world because we have studied a few cases, and from these cases we hopefully extrapolate general knowledge about other, similar cases, and try to determine under what conditions our research conclusions apply to them, too. In this manner inference-making serves to enhance the potential magnitude of the contribution a piece of research can make to theory-building and refining hypotheses. Most things that political scientists study are simultaneously unique cases and parts of general patterns, which means that case studies, even ones that cover only one or two cases, always need to address two questions: to what extent are these conclusions valid beyond the cases from which they were drawn? And to what extent are the conclusions due to the unique features of these cases?
If theory-building and/or hypothesis development are one's purpose, then it often makes sense to deliberately select cases that seem to fit well into a general pattern, and then extend the conclusions from the case to the general pattern. In more technical language, the case as a sample must be representative of the population from which it was drawn (these terms are explained more fully below). Sometimes selecting a case that does not seem to fit well into a general pattern can also contribute to theory-building, by showing precisely how and why certain cases deviate from the norm in some significant way. For instance, Chapter 3 used the rise of far-right parties such as the Belgian Vlaams Blok and the Austrian Freedom Party to illustrate some important aspects of case selection in comparative research, and this example is relevant here, too.
Studying either the Vlaams Blok and/or the Freedom Party in order to learn about a general pattern of far-right parties would be a natural and appropriate choice. In inference-making, cases that are representative of a general pattern are the most useful ones to study. This does not negate the point made in Chapter 1, that unusual or unique cases are sometimes the most interesting to research. Studying a far-right party that has not had an electoral breakthrough might reveal what it is that makes that party less electorally successful (Chapter 3 introduces some comparative research designs that are useful for research with this type of purpose). Such a case is nevertheless not very useful as a basis for inferences, because it is unrepresentative of the general pattern of successful far-right parties.

Essential terminology

Inference-making has its own terminology. It is derived from statistics and may therefore be not only unfamiliar but also off-putting to qualitative political scientists. However, the terminology is no worse or more difficult to grasp than any other specialist terminology, and having a clear grasp of it helps make inference-making an explicit act. The data used in a research project is the sample. A sample consists of a number of cases or observations (these two terms are synonymous). In political science cases are often countries, institutions, survey respondents, organizations, interviewees or parties. Sometimes a study deals with just a single case, such as a single Parliament, or a single election. Sometimes there are thousands of cases within a single study, such as in a public opinion survey. A sample is drawn from a population: the total universe of possible cases. The relationship between a sample and the population from which it was drawn is absolutely crucial in inference-making. Inference-making is only possible if the sample is representative of the population. Representative how? This depends on the research question.
Say the purpose of a research project is to identify whether US citizens of different ethnic backgrounds have different opinions about how well the President is running the country. In terms of inference-making, then, the US population is the population (i.e., the total universe of cases), and the sample must be representative with respect to ethnic groups in the USA: each ethnic group should constitute roughly the same proportion of the sample as it does in the population. However, it does not matter at all if the sample is representative of the population in terms of people's eye colour, height or favourite food. What matters is that the
sample is representative of the population in all research-relevant aspects. Occasionally, it can make sense to over-sample certain groups deliberately; Chapter 4 discusses this in terms of booster samples. The more representative the sample is of its population, the greater the certainty of any inference made about the population. Vice versa, the less representative the sample is of its population, the more uncertain the inference. Estimating the uncertainty of inferences is a key part of inference-making. This means attaching to the inference some measure of how likely it is that the sample really does teach us something about the population. There are several 'tools' that a researcher can use to manage the sampling process, such as sampling frames and sampling units. Chapter 4 explains what these are and how they are applied. The divergence or discrepancies between a population and a sample drawn from it are known as sampling error. There is always sampling error, and this is the source of the uncertainty that is always part and parcel of inference-making. However, not all kinds of sampling error are problematic: in quantitative analysis there is an important distinction between random and non-random sampling error. Random errors (if they truly are random) in the sample will over-estimate and under-estimate the population to the same extent, so on average random errors cancel each other out. This leaves non-random error to worry about, and the label 'non-random' indicates the presence of some systematically distorting influence on the sample that reduces its representativeness. For example, those who are more interested in politics are more likely to answer a survey about politics. Hence, all surveys overestimate things like political interest, voter turnout and so on. Because this type of error is systematic, it does not cancel itself out. The implication is that inferences to the population are more uncertain.
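The contrast between random and non-random sampling error can be made concrete with a short simulation. The sketch below is an illustrative aside, not material from the chapter: the population of 'political interest' scores and the response rule (in which more interested people are more likely to answer the survey) are invented assumptions.

```python
import random

random.seed(1)

# Hypothetical population of 100,000 citizens: each person's political
# interest is a score centred on 5 with standard deviation 2.
population = [random.gauss(5, 2) for _ in range(100_000)]
true_mean = sum(population) / len(population)

# Random sampling error: each simple random sample misses the true mean
# a little, but over many samples the errors cancel out on average.
random_means = []
for _ in range(200):
    sample = random.sample(population, 1000)
    random_means.append(sum(sample) / len(sample))
avg_random_mean = sum(random_means) / len(random_means)

# Non-random (systematic) error: assume the chance of responding rises
# with political interest, so every sample over-estimates interest and
# the errors do NOT cancel out.
biased_means = []
for _ in range(200):
    respondents = [p for p in random.sample(population, 3000)
                   if random.random() < (p / 10)]
    biased_means.append(sum(respondents) / len(respondents))
avg_biased_mean = sum(biased_means) / len(biased_means)

print(f"true population mean:      {true_mean:.2f}")
print(f"average of random samples: {avg_random_mean:.2f}")  # close to the truth
print(f"average of biased samples: {avg_biased_mean:.2f}")  # systematically too high
```

Averaged over the 200 draws, the purely random samples land on the population mean, while the biased samples consistently overshoot it: exactly the pattern the text describes for surveys about politics.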
Descriptive inferences are not mere descriptions: they are an attempt to use available data to create a systematic description of political phenomena about which there are no available facts (see Box 6.1). An accurate, factual description is obviously one sine qua non of descriptive inference, but the second integral part of descriptive inference is to make a leap from the systematic description to some account of cases not studied (i.e., the population).
Box 6.1 Descriptive inference: the failed coup of August 1991 in the USSR
A handful of Communist hard-liners calling themselves the State Emergency Committee sought to halt and reverse Mikhail Gorbachev's reforms, glasnost and perestroika, by staging a coup against him in August 1991. The coup failed, and Gorbachev proceeded with his reform projects, which reached a kind of culmination on 25 December 1991, when he, as President of the USSR, dissolved the Union and by the same token resigned as its President. Students of transitions to democracy around the world (the former USSR bloc, Africa, Latin America, Asia) might study the August 1991 coup in great detail, to understand what makes anti-reform coups fail. (Ideally, as Chapter 3 explains, such a research design should also include a case of a successful coup.) Differently phrased, this means drawing an inference from the particular case of the August 1991 coup to coups in general. To do this, all the details of the August 1991 coup must be sorted into two categories:

(a) details unique to the August 1991 coup (these details are the non-random error); and
(b) details that are generic to failed coups (these details are the aspects of the sample that are representative of the population).
Inference-making depends on being able to distinguish between these two types of details. Omitting to do so means that there is no attempt to 'leap' from the sample to the population. Failing to do so correctly means that the inference is false.
Causal inferences differ from descriptive ones in one very significant way: they take a 'leap' not only in terms of description, but in terms of some specific causal process. Causality is a fraught topic in political science, but the notion of causality developed by the thinker David Hume (1711-76) has survived relatively unscathed (see Box 6.2). On this notion, causality is at play where the presence of one event is constantly conjoined with the subsequent occurrence of another.
Box 6.2 Causality and political science: the philosophy of David Hume
In A Treatise of Human Nature, David Hume (1711-76) ambitiously attempted to set out a complete system of the sciences. He saw all sciences as relating to human nature. Most importantly for the ideas of causality and causal inferences, he argued for a science of man: that is, explaining human behaviour and action 'from the simplest and fewest causes'. He accepted that if it is possible to observe a 'constant conjunction' between two events or variables, then it may be concluded that there is a process of causality at play: a process whereby one event causes another, subsequent event.
Since the political world is so complex and political scientists are rarely in a position to run experiments (see Chapter 3 for more details about the experimental method), political science tends strongly to be about probabilities rather than law-like regularities. Causal claims are therefore rare in the discipline, as political scientists are more comfortable with correlations than with causality. A correlation is weaker than a causal mechanism in two ways. First, a correlation does not assume any law-like regularity. In this respect correlations are probabilities. A strong, positive correlation between two variables means that a change on one variable tends to 'go with' a change in the same direction on the other variable. A weak correlation means that there is little or no pattern of change between the two variables. A strong, negative correlation means that a change on one variable tends to 'go with' a change in the opposite direction on the other variable. Second, a correlation between two variables leaves the direction of the effect unspecified. In contrast, in a causal relationship the causal direction is clear and constant. Keeping this in mind, causal inferences are naturally difficult to make, and always uncertain, since it can rarely be known whether the variable we think is causing a given political outcome is actually the cause of that outcome. Since causality is so difficult to establish in political science, inference-making usually pertains to descriptive rather than causal inferences. A causal process will not simply reveal itself even when a researcher has data on all possible cases (the only situation that makes a nonsense of inferring from a sample to a population).
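The three kinds of correlation described above (strong positive, strong negative, and weak) can be illustrated numerically. The sketch below computes the standard Pearson correlation coefficient by hand; the data points are invented purely for illustration and are not from the chapter.

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Invented illustrative data: as x rises, y tends to rise (positive),
# fall (negative), or show no clear pattern (weak).
x = [1, 2, 3, 4, 5, 6]
print(round(pearson_r(x, [2, 3, 5, 4, 6, 7]), 2))   # → 0.94  (strong positive)
print(round(pearson_r(x, [7, 6, 4, 5, 3, 2]), 2))   # → -0.94 (strong negative)
print(round(pearson_r(x, [4, 1, 5, 2, 6, 3]), 2))   # → 0.2   (weak)
```

Note that the coefficient is symmetric in x and y: swapping the two lists leaves r unchanged, which mirrors the text's point that a correlation leaves the direction of effect unspecified.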
Box 6.3 Causality and correlation: public opinion in Central and Eastern Europe
In a 2001 opinion poll, people in 13 Central and Eastern European countries applying to join the European Union were asked these two questions: 'Do you think that becoming a member of the European Union would bring (COUNTRY) . . .' Much more disadvantages/More disadvantages; As many advantages as disadvantages; More advantages/Many more advantages; Don't know/No answer. 'Generally speaking, do you think that (COUNTRY'S) membership of the European Union would be . . .' A good thing; Neither good or bad; A bad thing. The cross-tabulation below shows that people who thought that EU membership would bring advantages also tended to think that EU membership would be a good thing. Equally, people who felt EU membership would be a bad thing by and large tended to hold the view that EU membership would bring disadvantages to their country (and people who were undecided about one question were also in the main undecided about the other question).
                                        A Good      Neither Good    A Bad
                                        Thing (%)   or Bad (%)      Thing (%)
More/Many more advantages                  81           20              6
As many advantages as disadvantages        13           46             14
Much more/More disadvantages                2           20             76
Don't know/No answer                        4           14              4
Total                                     100          100            100
Therefore, inferences in research about politics are usually about what unknown cases might look like, or be like. The next section shows that even in quantitative political science, where inference-making is a far more established practice than it is in qualitative research, all that the statistics can normally achieve is to suggest the strength of some theorized relationship, or the absence of such a relationship; statistics-based inference-making does not in itself bring to light a causal process (see Box 6.3).
However, does this cross-tabulation show a causal relationship, or a correlation? Two criteria help to make the distinction: directionality and regularity.

Directionality: causal relationships have clear cause(s) and effect(s).
Question: In the cross-tabulation, is the direction of effect clear?
Answer: No. There is no way of telling whether people's views on membership being 'a good/bad/neither-nor' thing inform their opinion on advantages and disadvantages, or the other way round.

Regularity: the cause(s) have the same effect(s) at all times, all else being equal.
Question: In the cross-tabulation, do all people with the same attitude on one variable have the same attitude as each other on the other variable?
Answer: No. There are cases in all cells of the cross-tabulation. If the relationship were causal, only one cell in each column ('A good thing-Many more/More advantages'; 'Neither good or bad-As many advantages as disadvantages'; and 'A bad thing-More/Many more disadvantages') would have observations. All other cells would be '0'.
The cross-tabulation does not display a causal relationship, but it does display a positive correlation: the more positive someone's view of the EU, the more advantages that person is likely to perceive membership to bring. (A negative correlation would have meant that the more positive someone's view of the EU, the more likely the person would be to perceive EU membership as disadvantageous. No correlation would mean no particular pattern between the two opinions.) Source: http://europa.eu.int, October 2002.
sample was drawn from. This section explains the Central Limit Theorem and the normal distribution, which form the basis for tests such as the chi square test and the t-test. These, and other probability tests, provide answers to key questions such as 'Does the correlation observed in this sample exist in the population, too?' and 'How well does the sample mean measure the population mean?' However, none of this will work unless samples are random, because it is the randomness of the sample that enables the link between the sample and its population. The definition of 'random' is that every case in the population has an equal and independent chance of becoming part of the sample. To take a simple example, this would mean that if 50 people wrote their names on pieces of paper and put them in a hat from which five names were to be drawn, then each name would have the same chance of being drawn in all five draws. This means that each piece of paper pulled out would have to be put back in the hat, so that it would have an equal and independent chance of being drawn again. The important point about random sampling is that it allows the sampling error to be estimated statistically with respect to both known and unknown sources of influence: this means that it is possible to establish the uncertainty of inferences. The normal distribution is actually a whole range of distributions, a 'family' of normal distributions. They share some key features that make them, by definition, normal distributions. Normal distributions are symmetric, smooth, and bell-shaped. It follows from this that they are unimodal and have no skewness. What distinguishes different normal distributions is their means and standard deviations. That is, although two normal distributions are alike in being unimodal and unskewed, they may differ in location (their means) and spread (their standard deviations) (see Chapter 5 for an explanation of modality, skewness and kurtosis).
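The hat example above can be sketched in a few lines of code. This is an illustrative aside rather than material from the chapter: the names are invented, and Python's random module stands in for the hat. The key contrast is between drawing with replacement (each draw is independent, as the definition of 'random' requires) and without replacement (later draws depend on earlier ones).

```python
import random

random.seed(42)

# Hypothetical hat containing 50 names, from which 5 are to be drawn.
names = [f"person_{i}" for i in range(1, 51)]

# Sampling WITH replacement: the name goes back in the hat after each
# draw, so every name keeps the same 1-in-50 chance on every draw.
with_replacement = [random.choice(names) for _ in range(5)]

# Sampling WITHOUT replacement: once drawn, a name cannot be drawn
# again, so later draws are not independent of earlier ones.
without_replacement = random.sample(names, 5)

print(with_replacement)      # duplicates are possible
print(without_replacement)   # all five names are distinct
```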
As the variables that political scientists use are typically samples rather than populations, these variables can be seen as attempts to capture some essential aspects of an unknown population. What is more, each observation on a variable can be seen as a measure of the population, and consequently the sample mean (i.e., the variable mean) represents an estimate of the true population mean. However, this raises the question of how accurately the sample mean estimates the population mean. The particular significance of the normal distribution is that, due to its symmetric, smooth and bell-shaped nature, it is possible to attach a probability to the sample mean being within a certain distance of the true but
Box 6.4
A sample mean is obtained by adding up the value of each case in the sample, and dividing the sum by the number of cases in the sample. This sounds far more complicated than it is, not least because statistical computing software packages will do it at the touch of a button. For small samples it is also very simple to calculate the mean manually. The formula is:
x̄ = (x1 + x2 + . . . + xn) / n

where x1, x2 (etc.) are all the cases in a sample, and n is the number of cases in the sample.
unknown population mean (see Box 6.4). This distance is expressed in standard deviations, a concept that was discussed in Chapter 5, and again below (briefly, the standard deviation measures how far, on average, observations in a sample are from the sample mean: see also Box 6.5). For example, in a normally distributed sample, we can say with 90 per cent certainty that the true population mean will be within 1.645 standard deviations above or below the sample mean (see Box 6.6). If we want to be even more certain than 90 per cent, we can say with 95 per cent certainty that the true population mean will be within 1.96 standard deviations above or below the sample mean. Similarly, the true population mean will be within 2.576 standard deviations above or below the sample mean with 99 per cent certainty. The actual distance that, say, 1.96 standard deviations represents depends of course on the size of a sample's standard deviation: if it is small, cases are tightly clustered around the sample mean and 1.96 or even 2.576 standard deviations may not be very much. However, if the standard deviation is large, then 1.96 standard deviations may cover a very large range. The value of a small standard deviation compared to a large one, then, is that it allows for a more precise estimate of the true population mean, at the same level of certainty. These rules of thumb are known as the 90, 95 and 99 per cent confidence intervals. As an illustration, to say that we have 95 per cent confidence in an inference really means that if 100 samples were drawn from the population, in 95 of those samples the population mean would be located somewhere within 1.96 standard deviations on either side of the sample mean. The phrase '90 per cent confidence' means that in 90
Box 6.5
A sample's standard deviation is obtained by calculating the mean squared residuals, and then taking the square root of that mean. This sounds far more complicated than it is, not least because statistical computing software packages will do it at the touch of a button. For small samples it is also very simple to calculate the standard deviation manually. The formula is:
s = √[ Σ(x − x̄)² / (n − 1) ]

where Σ is 'the sum of', x is the value of a case, x̄ is the sample mean and n is the sample size. Follow these five easy steps:

1. Calculate the sample mean. See the equation in Box 6.4.
2. Obtain the residual for each case. Subtract the sample mean from the value of each case; what you have left are the residuals.
3. Square all residuals. Multiply each residual by its own value.
4. Add the squared residuals and divide the sum by n − 1. Add up the value of all squared residuals, and divide the sum by the number of cases in the sample MINUS 1 (there are statistical reasons to use n − 1 rather than n).
5. Take the square root of the figure obtained. This is because the figure obtained in step 4 is not in the same units as the sample, due to the fact that the residuals were squared in step 3. Taking the square root in step 5 simply transforms the standard deviation into the same unit as the sample. If this is not done then it becomes very complicated to interpret the standard deviation: is it large, small . . .?
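The formulas in Boxes 6.4 and 6.5 translate directly into code. The sketch below mirrors the five steps; the eight-case sample is invented purely for illustration.

```python
def sample_mean(xs):
    """Box 6.4: sum the cases, then divide by the number of cases."""
    return sum(xs) / len(xs)

def sample_sd(xs):
    """Box 6.5: the five steps for the sample standard deviation."""
    mean = sample_mean(xs)                    # step 1: sample mean
    residuals = [x - mean for x in xs]        # step 2: residuals
    squared = [r ** 2 for r in residuals]     # step 3: square them
    variance = sum(squared) / (len(xs) - 1)   # step 4: divide by n - 1
    return variance ** 0.5                    # step 5: square root

data = [2, 4, 4, 4, 5, 5, 7, 9]   # invented illustrative sample
print(sample_mean(data))           # → 5.0
print(round(sample_sd(data), 3))   # → 2.138
```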
out of 100 samples from the same population the population mean will be located within 1.645 standard deviations on either side of the sample mean, whereas 99 per cent confidence means that in 99 out of 100 samples from the same population the population mean will be located within 2.576 standard deviations on either side of the sample mean (Upton and Cook, 2002, pp.76-8). Note that the higher the confidence, the lower the precision.
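The three confidence intervals can be computed mechanically from a sample mean and standard deviation. The sketch below uses invented figures for illustration; only the z-values (1.645, 1.96, 2.576) come from the text.

```python
# z-values for the three conventional confidence levels (from the text).
Z = {90: 1.645, 95: 1.96, 99: 2.576}

def confidence_interval(mean, sd, level):
    """Interval around the sample mean at the given confidence level."""
    half_width = Z[level] * sd
    return (mean - half_width, mean + half_width)

# Invented illustrative figures: sample mean 3.08, standard deviation 0.13.
for level in (90, 95, 99):
    low, high = confidence_interval(3.08, 0.13, level)
    print(f"{level}%: {low:.3f} to {high:.3f}")
```

Notice how the interval widens as the level rises: higher confidence, lower precision, exactly as stated above.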
Box 6.6
Normal distributions
This is clearly extremely useful when a distribution is normal, but what about variables that do not have a normal distribution, and populations whose distributions are unknown? Needless to say, most variables that political scientists use are not normally distributed, and population distributions are hardly ever known; if the nature of the population were known, there would be no reason to draw a sample. However, the Central Limit Theorem holds that if repeated random samples are drawn from a population, then the means of those samples will always produce a normal distribution (Daly et al., 1995, pp.205-10). More formally, 'The sample mean x̄, drawn from a population with mean μ and variance σ², has a sampling distribution which approaches a Normal distribution with mean μ and variance σ²/n, as the sample size approaches infinity' (Barrow, 1996, p.122; note that the variance is simply the standard deviation squared). Since time and other resources usually only allow a researcher to draw one sample, with one sample mean, we typically do not see how this Theorem works. However, a simple demonstration in SPSS or any other statistics software is easy to set up: take any variable with, say, at least 1,000 observations. For the purpose of the demonstration, the variable is treated as if it were a population, and the computer will draw random samples from it (the more samples the better, but 100 should be enough, each consisting of 5 or 10 per cent of the 'population'). Each sample has its own mean, and these means form a normal distribution irrespective of whether the 'population' itself is normally distributed. If you enter each sample mean as an observation on a new variable
you will find that its curve is normal. The more observations this new variable has, the more closely its shape will resemble the normal distribution (in the definition above, this is expressed as 'as the sample size approaches infinity'). That is, in a random sample from a population with an unknown mean, the sample mean is a good (possibly the best) indicator of the unknown population mean, and confidence intervals enable us to establish how accurately and with how much certainty we can infer from the sample mean what the population mean might be. The larger the sample, the better the sample mean becomes as an indicator of the population mean (this means that the Theorem is an asymptotic result). The importance of the sampling being random lies in the fact that random sampling will over- and under-estimate the true population mean by equal amounts. As a consequence, the distribution of the new variable of sample means will take on the symmetric shape of a normal curve, i.e., there will be as many inaccurate estimates above the true population mean as below it. It is surprisingly unimportant what proportion of the population is included in a sample. It is the size of the sample itself that matters in determining how accurately it measures the population, not the proportion of the population included in the sample. See Boxes 6.7 and 6.8 for examples. In normal distributions and under the Central Limit Theorem, it is possible to determine with a specified degree of certainty how close to the sample mean the true population mean is located. What is more, using confidence intervals, we can assess whether two variables that have different means measure two population means that are in fact different from each other.
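The SPSS demonstration described above can be replicated in any language. The sketch below is an illustration with an invented, heavily skewed 'population'; it shows that the means of repeated random samples nonetheless cluster symmetrically around the population mean, which is the Central Limit Theorem at work.

```python
import random
import statistics

random.seed(0)

# A decidedly non-normal 'population' of 10,000 cases: heavily skewed
# scores from 1 to 5 (e.g. most respondents pick the lowest category).
population = [1] * 5000 + [2] * 2500 + [3] * 1500 + [4] * 700 + [5] * 300
pop_mean = statistics.mean(population)

# Draw 200 random samples (each 5 per cent of the population) and record
# each sample's mean.
sample_means = [statistics.mean(random.sample(population, 500))
                for _ in range(200)]

# Although the population is skewed, the sample means cluster
# symmetrically around the population mean.
print(f"population mean:      {pop_mean:.3f}")
print(f"mean of sample means: {statistics.mean(sample_means):.3f}")
over = sum(m > pop_mean for m in sample_means)
print(f"means above / below the population mean: {over} / {200 - over}")
```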
If a survey showed that 50 per cent of people in Italy and 45 per cent in France supported the Euro, we could not be sure whether Italians really were more positive about the single currency than the French, as both results are subject to sampling error. That is, if the 95 per cent confidence intervals for the two distributions overlap substantially, there is a possibility that the two true population means (Italian attitudes toward the Euro; French attitudes toward the Euro) might be exactly the same as each other. We can test the probability of these two means really being different from each other with a t-test (Box 6.9). Alternatively, it is possible to use the chi square test to test the association between two variables (Box 6.10). Both tests are based on the idea of a null hypothesis of no association between two variables, and an alternative hypothesis that involves some kind of association between two variables. Moreover, the result of both tests is expressed in terms of the odds, or the probability, that the association between two variables is true not only of
the sample, but also of the population from which the sample was drawn. The question is what odds are acceptable. The 90, 95 and 99 per cent confidence intervals are frequently used. To use, say, the 95 per cent level means that if the p-value (probability value) is 0.05 or below, we can reject the null hypothesis of no association and accept the alternative hypothesis, that there is some association between the two variables. Equally, at the 99 per cent level a p-value of 0.01 is necessary in order to reject the null hypothesis, and at the 90 per cent level a p-value of 0.1 is sufficient.
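Once a test has produced a p-value, the decision rule is mechanical. The sketch below uses invented p-values for hypothetical pairs of variables; only the thresholds (0.1, 0.05, 0.01) come from the text.

```python
# Thresholds implied by the 90, 95 and 99 per cent confidence levels.
THRESHOLDS = {90: 0.1, 95: 0.05, 99: 0.01}

def reject_null(p_value, confidence_level):
    """Reject the null hypothesis of no association if the p-value is at
    or below the threshold for the chosen confidence level."""
    return p_value <= THRESHOLDS[confidence_level]

# Invented p-values for three hypothetical tests of association.
for pair, p in [("education vs turnout", 0.003),
                ("region vs turnout", 0.04),
                ("eye colour vs turnout", 0.62)]:
    for level in (90, 95, 99):
        verdict = "reject null" if reject_null(p, level) else "retain null"
        print(f"{pair}: p = {p}, {level}% level -> {verdict}")
```

Note how the same p-value can lead to different verdicts at different levels: p = 0.04 rejects the null at the 95 per cent level but not at the 99 per cent level.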
Box 6.7 The Central Limit Theorem I: British Army bases in Northern Ireland
This example demonstrates the Central Limit Theorem by showing that, over repeated random samples from a population, the means of the samples will form a distribution that approximates a normal distribution, even if the population's distribution itself is not normal. As a consequence, the Central Limit Theorem applies even to samples whose distributions are not normal, provided the samples are large. In this example, the population is the respondents to the Northern Ireland Life and Times survey 2001. Some 1,800 people responded to this survey. One of the statements they responded to was: 'Shutting down British army bases is extremely important for peace in Northern Ireland':

Agree strongly                 161
Agree                          492
Neither agree nor disagree     317
Disagree                       433
Disagree strongly              272

Total (N)               1,675
Mean                    3.0973
Standard deviation      1.2561
Drawing 20 random samples from this population (each sample consisting of approximately 5 per cent of the population size) generates 20 sample means:
The 20 sample means can be conceived of as a sample of 20 observations, with a mean of 3.0802 and a standard deviation of 0.1276. The distribution of the 20 observations in this sample is roughly normal:
[Histogram of the 20 sample means, with the horizontal axis running from 2.5 to 3.5]
Consequently, random samples drawn from the population can be analysed on the basis of the Central Limit Theorem, because over repeated samples the sample means form a normal distribution. In this example, the population mean is known (3.0973), so it is possible to check whether the data set consisting of 20 sample means does indeed tell us anything useful about the population. Let's look at three levels of confidence.
1. 90 per cent confidence: this level holds if the true population mean (3.0973) is within 1.645 standard deviations of the sample mean (3.0802). In real figures, this means that with 90 per cent confidence the population mean is within the 2.8702-3.2902 range: 3.0802 ± (1.645 × 0.1276) = 3.0802 ± 0.21 = 2.8702 to 3.2902

2. 95 per cent confidence: this level holds if the true population mean (3.0973) is within 1.96 standard deviations of the sample mean (3.0802). In real figures, this means within the 2.8301-3.3303 range: 3.0802 ± (1.96 × 0.1276) = 3.0802 ± 0.2501 = 2.8301 to 3.3303

3. 99 per cent confidence: this level holds if the true population mean (3.0973) is within 2.576 standard deviations of the sample mean (3.0802). In real figures, this means within the 2.7515-3.4089 range: 3.0802 ± (2.576 × 0.1276) = 3.0802 ± 0.3287 = 2.7515 to 3.4089
Note that as the level of confidence increases, the range becomes wider. This illustrates the trade-off between precision and confidence. Source: Northern Ireland Social and Political Archive (2002).
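The three intervals in Box 6.7 can be reproduced with a few lines of Python (Python is used here only as a convenient calculator; all figures come from the box itself):

```python
# Figures from Box 6.7: the mean (3.0802) and standard deviation (0.1276)
# of the 20 sample means, and the z-values for three confidence levels.
sample_mean = 3.0802
sd = 0.1276
z_values = {90: 1.645, 95: 1.96, 99: 2.576}

intervals = {level: (round(sample_mean - z * sd, 4),
                     round(sample_mean + z * sd, 4))
             for level, z in z_values.items()}
print(intervals)

# Every interval contains the true population mean (3.0973), and the
# intervals widen as the confidence level rises. (The 90 per cent bounds
# differ from the box in the fourth decimal place because the box rounds
# 1.645 * 0.1276 to 0.21.)
```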
Box 6.8

Suppose we have a population (N) of six activists who take part in demonstrations. They have been on this many demonstrations, respectively: 2, 4, 4, 6, 8, and 12. On average, then, they have been on six demonstrations each. If a sample of n = 2 is drawn from the population of N = 6 people (the population size being designated by N, the sample size by n), then our sampling fraction is one-third (n/N, or 2/6): that is, each observation has a one in three chance of being randomly selected. In all, 15 different random samples of n = 2 can be drawn from this small population. If each activist is allocated a letter, so that A has been on 2 demonstrations, B on 4, C on 4, D on 6, E on 8, and F on 12, then we can confirm this as follows:
Sample no:       1    2    3    4    5     6    7    8    9     10   11   12    13   14    15
Observation 1:   A    A    A    A    A     B    B    B    B     C    C    C     D    D     E
Observation 2:   B    C    D    E    F     C    D    E    F     D    E    F     E    F     F
Sample values:   2,4  2,4  2,6  2,8  2,12  4,4  4,6  4,8  4,12  4,6  4,8  4,12  6,8  6,12  8,12
Average demos:   3    3    4    5    7     4    5    6    8     5    6    8     7    9     10
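The enumeration in the table can be verified with a short Python sketch (Python is used here for illustration; it is not part of the original text):

```python
from itertools import combinations

# The six activists and their demonstration counts.
demos = {"A": 2, "B": 4, "C": 4, "D": 6, "E": 8, "F": 12}

# Every possible random sample of n = 2 and each sample's mean.
samples = list(combinations(demos, 2))
means = [(demos[a] + demos[b]) / 2 for a, b in samples]

print(len(samples))             # 15 possible samples
print(sum(means) / len(means))  # the sample means average to 6.0, the
                                # population mean: an unbiased estimator
```

The fact that the 15 sample means average exactly the population mean of six demonstrations is the small-scale version of the claim that random sampling over- and under-estimates the true population mean by equal amounts.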
Normally we draw only one random sample from a population, and if we were to draw only one of the 15 possible random samples from the population of six activists, each of the 15 samples would have an equal chance of being drawn (this is the definition of randomness). Consequently, in the long run (that is, by repeated sampling), each of these samples will be
Chapter 9 shows that sampling in qualitative research often depends on access: sampling when it comes to interviews with policy-makers is, for example, often determined by those to whom the researchers manage to secure access. It is possible that potential interviewees are 'not available for an interview' for particular reasons; if so, this introduces systematic error into the sample of interviewees. Similarly, archival research (Chapter 7) depends not only on what documents have survived over time, but also on what documents the government decides to release. Withheld documents are almost certainly not released for particular reasons, which is also a source of non-random sampling error. Since non-random errors do not cancel themselves out, the sample becomes unrepresentative of its population. It is possible to construct qualitative research designs that allow attempts at inference. For example, Chapter 3 on comparative
selected equally often. This being so, it is possible to construct a distribution showing the relative frequency with which, in the long run, different sample averages will occur. This is known as the sampling distribution of the mean:
[Bar chart: the sampling distribution of the mean, with the sample means (3 to 10) on the horizontal axis and the frequency of a particular sample being drawn on the vertical axis]
If the sample size is increased, then the spread of the sampling distribution of the mean will be reduced. Observations will cluster around the true population mean in the shape of a normal curve. That is, in the activist example, the more repeated random samples we draw from our population of six activists, the more accurately the distribution of means will estimate the population mean of having taken part in six demonstrations.
research sets out some research designs that can be used to make inferences from qualitative research: the most similar and most different research designs are specifically aimed at facilitating inference-making. Research design is not the real problem with inferences in qualitative political science; the problem is to determine the uncertainty of those inferences. Inferences are always more or less uncertain, but within qualitative political science estimating the uncertainty of inferences has not been a central issue, quite probably because of a combination of two factors: first, many qualitative researchers view their task as describing unique cases without attempting to make inferences and consequently they perceive no need to assess uncertainty. Of course, reluctance to make inferences may simply signal a healthy realization that inferences made would be so uncertain that they would be misleading. Second, qualitative sampling processes are typically non-random, and are not effective
Box 6.9
T-tests
T-tests are a class of hypothesis tests that examine whether the means of two distributions (μ1, μ2) are the same or different from each other. The hypothesis that there is no difference between the two means is the null hypothesis (H0: μ1 = μ2). The alternative hypothesis holds that there is some difference between the two means. HA often does not specify anything more than that the two means are different (HA: μ1 ≠ μ2): that is, HA is two-sided. In these instances a two-tailed t-test is necessary. Sometimes HA is one-sided: that is, it specifies which mean is larger or smaller than the other mean (HA: μ1 > μ2; HA: μ1 < μ2). In these instances one-tailed t-tests suffice. The outcome of a t-test is to reject one of the hypotheses, and to accept the other one. Having determined the two variables' distribution (e.g., normal, Poisson, Bernoulli, etc.) it is possible to decide which hypothesis to accept and which to reject on the basis of the test statistic, t. The likelihood of obtaining a t-value that is as extreme as or more extreme than the one that has been obtained is expressed in a p-value (probability value). If we apply 95 per cent confidence intervals, then we look for the p-value to be below 0.05 to reject H0; if we apply 99 per cent confidence intervals, then we look for the p-value to be below 0.01 to reject H0. Two types of mistake are possible in hypothesis-testing: rejecting H0 when it should be accepted (Type I error), and accepting H0 when it should be rejected (Type II error). At the 95 per cent level of confidence, a Type I error will occur 5 per cent of the time, or 1 time in 20; at the 99 per cent level of confidence, 1 per cent of the time, or 1 time in 100, and so on (the probability of a Type II error depends also on the size of the true difference and of the sample).
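A two-tailed t-test of the kind described in Box 6.9 can be sketched in Python. This is a minimal illustration, not the only way to compute the test: the data are invented, and the p-value uses a normal approximation to the t distribution, which is a simplification that is adequate for large samples.

```python
import math
import statistics

def two_tailed_t_test(x, y):
    """Welch's t statistic for two independent samples; the two-tailed
    p-value uses a normal approximation to the t distribution (a
    simplification for illustration, adequate for large samples)."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    var_x, var_y = statistics.variance(x), statistics.variance(y)
    t = (mean_x - mean_y) / math.sqrt(var_x / len(x) + var_y / len(y))
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))
    return t, p

# Two illustrative samples (hypothetical data, not from the text).
group_a = [3, 4, 4, 5, 5, 6, 6, 7, 7, 8]
group_b = [5, 6, 6, 7, 7, 8, 8, 9, 9, 10]

t, p = two_tailed_t_test(group_a, group_b)
print(round(t, 3), round(p, 4))
# p is below 0.05, so H0 (equal means) would be rejected at the
# 95 per cent confidence level.
```

In practice a statistical package would use the exact t distribution with the appropriate degrees of freedom rather than the normal approximation.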
'tools' for assessing uncertainty under such circumstances. It may often be that awareness of the inadequacies and strengths of a sample is the only way that qualitative research can assess uncertainty. It may allow us to 'compensate' intelligently in the interpretation: for example, knowing that the uninterested are likely to refuse an interview, we can infer that survey estimates of interest are overestimates, maxima, and so on.

Conclusion

Inference-making should be a central aim of all forms of political science research because it increases the value and potential audience of a research contribution. Although in some ways inferences and inference-making are very different in qualitative and quantitative political science, in other ways inferences and inference-making unite
Box 6.10

Chi square is denoted χ2. A chi square test is a goodness-of-fit test that can be used to examine whether two variables in a cross-tabulation are independent of each other (often called the null hypothesis, H0), or whether there is a correlation between them (the alternative hypothesis, HA). The variables can be categorical or ordinal (see Chapter 5 for definitions of levels of measurement, as well as for an explanation of how to read cross-tabulations). A chi square test tests H0, a null hypothesis of no correlation between the two variables, against HA, an alternative hypothesis of correlation between the two variables. The null hypothesis is either rejected or accepted on the basis of how different the cross-tabulation's frequencies (i.e., the observed frequencies in the test) are from the frequencies that would be expected if there were no correlation between the two variables. The comparison of observed and expected frequencies yields a chi square value with an associated p-value. At a chosen level of confidence (usually 95 or 99 per cent), H0 is either accepted or rejected. If it is accepted, then the conclusion is that the two variables are statistically independent; if it is rejected, then the two variables are correlated with each other (subject to not making a Type I or Type II error; see Box 6.9). If H0 is rejected, we want to know something about the strength and direction of the correlation. Statistical software packages typically give (or can be programmed to give) a correlation coefficient that ranges from -1 to 1. The closer the coefficient is to either extreme of the -1 to 1 range, the stronger the correlation. Conversely, a coefficient of zero or close to zero indicates a non-existent or very weak correlation. Furthermore, a positive coefficient indicates a positive correlation: that is, as the values on one variable increase, so do the values on the other variable.
A negative coefficient indicates that the variables move in opposite directions: as the values on one variable increase, the values on the other variable decrease. There is a p-value associated with the correlation coefficient, too, indicating at what level of confidence the correlation can be inferred from the sample to the population. Following the usual conventions, the correlation coefficient may be accepted if its p-value is, say, below 0.01 or 0.05.
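The comparison of observed and expected frequencies described in Box 6.10 can be worked through by hand in a short Python sketch (the 2 × 2 table below is invented for illustration; it does not come from the text):

```python
# A chi square test by hand on a hypothetical 2 x 2 cross-tabulation
# (the frequencies below are invented for illustration).
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency under H0: row total * column total / grand total.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected

# A 2 x 2 table has (2-1) * (2-1) = 1 degree of freedom; the critical
# value at the 95 per cent level is 3.841, so here H0 is rejected.
print(round(chi_square, 3), chi_square > 3.841)
```

Statistical software performs exactly this comparison, and reports the associated p-value directly rather than leaving the researcher to consult a table of critical values.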
rather than divide these two branches of research. That is, although there are many more established tools for quantitative inference-making (e.g., random sampling, the normal distribution and the Central Limit Theorem) than for the qualitative equivalent, in both cases making inferences is about acknowledging that one's sample is
not the main focus of interest, and that 'the researcher should always keep in mind that the results of research are only as good as the quality of the data' (Gujarati, 1995, p.27). In this context, 'quality' refers primarily to randomness and representativeness. What matters is what the sample reveals about the population from which it was drawn, and with what degree of confidence it is possible to say that what is observed in the sample is also true of the population.