Practical Research 2 Module 5
ANALYSIS OF VARIANCE (ANOVA)

Analysis of Variance:
- One Way (one factor, fixed effects)
- Two Way (two factors, randomized blocks)
- Two Way with Repeated Observations (two factors, randomized block)
- Fully Nested (hierarchical factors)
- Latin Square (one primary and two secondary factors)
- Crossover (two factors, fixed effects, treatment crossover)
- Kruskal-Wallis (nonparametric one way)
- Friedman (nonparametric two way)
- Homogeneity of Variance (examine the ANOVA assumption of equal variance)
- Normality (examine the ANOVA assumption of normality)
- Agreement (examine agreement of two or more samples)

Basic Concepts

ANOVA is a set of statistical methods used mainly to compare the means of two or more samples. Estimates of variance are the key intermediate statistics calculated, hence the reference to variance in the title ANOVA. The different types of ANOVA reflect the different experimental designs and situations for which they have been developed. Excellent accounts of ANOVA are given by Armitage & Berry (1994) and Kleinbaum et al. (1998). Nonparametric alternatives to ANOVA are discussed by Conover (1999) and Hollander and Wolfe (1999).

ANOVA and regression

ANOVA can be treated as a special case of general linear regression where the independent/predictor variables are the nominal categories or factors. Each value that can be taken by a factor is referred to as a level. k different levels (e.g. three different types of diet in a study of diet on weight gain) are coded not as a single column (e.g. of diet 1 to 3) but as k-1 dummy variables. The dependent/outcome variable in the regression consists of the study observations. General linear regression can be used in this way to build more complex ANOVA models than those described in this section; this is best done under expert statistical guidance.

Fixed vs. random effects

A fixed factor has only the levels used in the analysis (e.g. sex, age, blood group). A random factor has many possible levels and only some are used in the analysis (e.g. time periods, subjects, observers). Some factors that are usually treated as fixed may also be treated as random if the study is looking at them as part of a larger group (e.g. treatments, locations, tests). Most general statistical texts arrange data for ANOVA into tables where columns represent fixed factors, and the one and two way analyses described here are fixed factor methods.

Multiple comparisons

ANOVA gives an overall test for the difference between the means of k groups. StatsDirect enables you to compare all k(k-1)/2 possible pairs of means using methods that are designed to avoid the type I error that would be seen if you used two sample methods such as the t test for these comparisons. The multiple comparison/contrast methods offered by StatsDirect are Tukey(-Kramer), Scheffé, Newman-Keuls, Dunnett and Bonferroni (Armitage and Berry, 1994; Wallenstein, 1980; Liddell, 1983; Miller, 1981; Hsu, 1996; Kleinbaum et al., 1998). See multiple comparisons for more information.
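As an illustration of the simplest design above, the following is a minimal sketch of a one-way (single-factor) ANOVA together with its nonparametric Kruskal-Wallis counterpart. The diet groups and their weight-gain values are invented for illustration, and the sketch assumes the SciPy library is available.

```python
# A minimal sketch of a one-way (single-factor) ANOVA on made-up data,
# assuming SciPy is installed. Values are hypothetical weight gains (kg)
# for three illustrative diet groups.
from scipy import stats

diet_a = [2.1, 2.5, 1.9, 2.8, 2.3]
diet_b = [3.0, 3.4, 2.9, 3.6, 3.1]
diet_c = [1.5, 1.8, 1.2, 1.9, 1.6]

# Null hypothesis: all group means are equal.
f_stat, p_value = stats.f_oneway(diet_a, diet_b, diet_c)
print(f"One-way ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}")

# Nonparametric alternative (Kruskal-Wallis) when the normality
# assumption is doubtful.
h_stat, p_kw = stats.kruskal(diet_a, diet_b, diet_c)
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_kw:.4f}")
```

If the overall test is significant, pairwise multiple comparisons such as the Tukey(-Kramer) procedure mentioned above can follow; recent SciPy releases and the statsmodels package provide implementations of these.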
Further methods

There are many possible ANOVA designs. StatsDirect covers the common designs in its ANOVA section and provides general tools (see general linear regression and dummy variables) for building more complex designs. Other software such as SAS and Genstat provide further specific ANOVA designs. For example, with complete missing blocks you should consider a balanced incomplete block design, provided the number of missing blocks does not exceed the number of treatments; in the layout below each of four blocks omits one of four treatments:

                 Treatments
                 1    2    3    4
    Blocks   A   x    x    x
             B   x    x         x
             C   x         x    x
             D        x    x    x

Complex ANOVA should not be attempted without expert statistical guidance. Beware situations where over-complex analysis is used in order to compensate for poor experimental design. There is no substitute for good experimental design.

> Regression

Regression is a statistical measure used in finance, investing and other disciplines that attempts to determine the strength of the relationship between one dependent variable (usually denoted by Y) and a series of other changing variables (known as independent variables). Regression helps investment and financial managers to value assets and understand the relationships between variables, such as commodity prices and the stocks of businesses dealing in those commodities.

The two basic types of regression are linear regression and multiple linear regression, although there are non-linear regression methods for more complicated data and analysis. Linear regression uses one independent variable to explain or predict the outcome of the dependent variable Y, while multiple regression uses two or more independent variables to predict the outcome.

Regression can help finance and investment professionals as well as professionals in other businesses. Regression can help predict sales for a company based on weather, previous sales, GDP growth or other conditions. The capital asset pricing model (CAPM) is an often-used regression model in finance for pricing assets and discovering costs of capital.

The general form of each type of regression is:

Linear Regression:   Y = a + bX + u
Multiple Regression: Y = a + b1X1 + b2X2 + b3X3 + ... + btXt + u

Where:
Y = the variable that you are trying to predict (dependent variable)
X = the variable that you are using to predict Y (independent variable)
a = the intercept
b = the slope
u = the regression residual

Regression takes a group of random variables thought to be predicting Y, and tries to find a mathematical relationship between them. This relationship is typically in the form of a straight line (linear regression) that best approximates all the individual data points. In multiple regression, the separate variables are differentiated by using subscripted numbers.

Regression in Investing

Regression is often used to determine how specific factors, such as the price of a commodity, interest rates, particular industries or sectors, influence the price movement of an asset. The aforementioned CAPM is based on regression, and it is utilized to project the expected returns for stocks and to generate costs of capital. A stock's returns are regressed against the returns of a broader index, such as the S&P 500, to generate a beta for the particular stock. Beta is the stock's risk in relation to the market or index and is reflected as the slope in the CAPM model. The expected return for the stock in question would be the dependent variable Y, while the independent variable X would be the market risk premium.

Additional variables such as the market capitalization of a stock, valuation ratios and recent returns can be added to the CAPM model to get better estimates for returns. These additional factors are known as the Fama-French factors, named after the professors who developed the multiple linear regression model to better explain asset returns.
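The beta described above is simply the slope b in the regression Y = a + bX + u. The sketch below estimates it by regressing stock returns on market returns; the return figures are invented for illustration, and the SciPy library is assumed to be available.

```python
# A minimal sketch of the linear regression Y = a + bX + u applied to the
# CAPM example: regressing hypothetical stock returns (Y) on market
# returns (X) to estimate beta (the slope). Figures are invented.
from scipy import stats

market_returns = [0.010, -0.005, 0.020, 0.015, -0.010, 0.008]  # X
stock_returns  = [0.012, -0.008, 0.026, 0.018, -0.014, 0.010]  # Y

result = stats.linregress(market_returns, stock_returns)
print(f"beta  (slope b)     = {result.slope:.3f}")
print(f"alpha (intercept a) = {result.intercept:.4f}")
print(f"R-squared           = {result.rvalue ** 2:.3f}")
```

In the Fama-French extension, the additional factors would enter as further X columns in a multiple regression rather than a single predictor.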
SAMPLING PROCEDURE

Sampling is a process or technique of choosing a sub-group from a population to participate in the study; it is the process of selecting a number of individuals for a study in such a way that the individuals selected represent the large group from which they were selected (Ogula, 2005). There are two major sampling procedures in research. These include probability and non-probability sampling.

Probability Sampling Procedures

In probability sampling, everyone has an equal chance of being selected. This scheme is one in which every unit in the population has a chance (greater than zero) of being selected in the sample. There are four basic types of sampling procedures associated with probability samples. These include simple random, systematic, stratified and cluster sampling.

Simple Random Sampling Procedure

Simple random sampling provides the base from which the other more complex sampling methodologies are derived. To conduct a simple random sample, the researcher must first prepare an exhaustive list (sampling frame) of all members of the population of interest. From this list, the sample is drawn so that each person or item has an equal chance of being drawn during each selection round (Kanupriya, 2012). To draw a simple random sample without introducing researcher bias, computerized sampling programs and random number tables are used to impartially select the members of the population to be sampled. Subjects in the population are sampled by a random process, using either a random number generator or a random number table, so that each person remaining in the population has the same probability of being selected for the sample (Friedrichs, 2008).

Systematic Sampling Procedure

The systematic sampling procedure is often used in place of simple random sampling. In systematic sampling, the researcher selects every nth member after randomly selecting the first through nth element as the starting point. For example, if the researcher decides to sample 20 respondents from a sample of 100, every 5th member of the population will systematically be selected. A researcher may choose to conduct a systematic sample instead of a simple random sample for several reasons. Firstly, systematic samples tend to be easier to draw and execute; secondly, the researcher does not have to go back and forth through the sampling frame to draw the members to be sampled; thirdly, a systematic sample may spread the members selected for measurement more evenly across the entire population than simple random sampling. Therefore, in some cases, systematic sampling may be more representative of the population and more precise (Groves et al., 2006).
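The two procedures just described can be sketched in a few lines of code. The example below uses a hypothetical frame of 100 respondents and the 20-in-100 case from the text; only Python's standard random module is assumed.

```python
# A minimal sketch of simple random and systematic sampling on a
# hypothetical sampling frame of 100 respondents, drawing n = 20.
import random

frame = [f"respondent_{i:03d}" for i in range(1, 101)]  # sampling frame
n = 20

# Simple random sampling: every member has an equal chance of selection.
simple_random_sample = random.sample(frame, n)

# Systematic sampling: pick a random start, then take every k-th member,
# where k = population size / sample size (here k = 5).
k = len(frame) // n
start = random.randrange(k)
systematic_sample = frame[start::k]

print("Simple random (first 5):", simple_random_sample[:5])
print("Systematic    (first 5):", systematic_sample[:5])
```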
Stratified Sampling Procedure

The stratified sampling procedure is the most effective method of sampling when a researcher wants to get a representative sample of a population. It involves categorizing the members of the population into mutually exclusive and collectively exhaustive groups. An independent simple random sample is then drawn from each group. Stratified sampling techniques can provide more precise estimates if the population being surveyed is more heterogeneous than the categorized groups. This technique can enable the researcher to determine desired levels of sampling precision for each group, and can provide administrative efficiency. The main advantage of the approach is that it is able to give the most representative sample of a population (Hunt & Tyrrell, 2001).

Cluster Sampling Procedure

In cluster sampling, a cluster (a group of population elements) constitutes the sampling unit, instead of a single element of the population. The sampling in this technique is mainly geographically driven. The main reason for cluster sampling is cost efficiency (economy and feasibility). The sampling frame is also often readily available at cluster level and takes a short time for listing and implementation. The technique is also suitable for surveys of institutions (Ahmed, 2009) or households within a given geographical area. But the design is not without disadvantages; some of the challenges that stand out are: it may not reflect the diversity of the community; other elements in the same cluster may share similar characteristics; it provides less information per observation than an SRS of the same size (redundant information: similar information from the others in the cluster); and standard errors of the estimates are high compared to other sampling designs with the same sample size. A contrast of the two designs is sketched below.
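The following minimal sketch contrasts stratified sampling (an independent simple random sample from every group, allocated proportionally to group size) with cluster sampling (whole groups selected at random). The village and household labels and sizes are invented for illustration.

```python
# A minimal sketch contrasting stratified and cluster sampling on a
# hypothetical frame of households grouped by village (names invented).
import random

villages = {f"village_{v}": [f"household_{v}_{h}" for h in range(50)]
            for v in range(10)}            # 10 groups of 50 households
total = sum(len(h) for h in villages.values())

# Stratified sampling: draw an independent simple random sample from
# every group, allocated in proportion to the group's size.
n_total = 100
stratified = []
for members in villages.values():
    n_group = round(n_total * len(members) / total)
    stratified.extend(random.sample(members, n_group))

# Cluster sampling: randomly select whole groups (clusters) and include
# every element within the chosen clusters.
chosen_clusters = random.sample(list(villages), 2)
clustered = [hh for v in chosen_clusters for hh in villages[v]]

print(len(stratified), "units via stratification;",
      len(clustered), "units via two whole clusters")
```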
Non-Probability Sampling Procedures

Non-probability sampling is used in some situations where the population may not be well defined. In other situations, there may not be great interest in drawing inferences from the sample to the population. The most common reason for using a non-probability sampling procedure is that it is less expensive than a probability sampling procedure and can often be implemented more quickly (Michael, 2011). It includes purposive, convenience and quota sampling procedures.

Purposive/Judgmental Sampling Procedure

In the purposive sampling procedure, the researcher chooses the sample based on who he/she thinks would be appropriate for the study. The main objective of purposive sampling is to arrive at a sample that can adequately answer the research objectives. The selection of a purposive sample is often accomplished by applying expert knowledge of the target population to select, in a non-random manner, a sample that represents a cross-section of the population (Henry, 1990). A major disadvantage of this method is subjectivity, since another researcher is likely to come up with a different sample when identifying important characteristics and picking typical elements to be in the sample. Given the subjectivity of the selection mechanism, purposive sampling is generally considered most appropriate for the selection of small samples, often from a limited geographic area or from a restricted population definition. The knowledge and experience of the researcher making the selections is a key aspect of the "success" of the resulting sample (Michael, 2011). A case study research design, for instance, employs a purposive sampling procedure to arrive at a particular "case" of study and a given group of respondents. Key informants are also selected using this procedure.

Convenience Sampling Procedure

Convenience sampling is sometimes known as opportunity, accidental or haphazard sampling. It is a type of non-probability sampling which involves the sample being drawn from that part of the population which is close to hand, that is, a population which is readily available and convenient. The researcher using such a sample cannot scientifically make generalizations about the total population from this sample because it would not be representative enough (Michael, 2011). This type of sampling is most useful for pilot testing. Convenience sampling differs from purposive sampling in that expert judgment is not used to select a representative sample. The primary selection criterion relates to the ease of obtaining a sample. Ease of obtaining the sample relates to the cost of locating elements of the population, the geographic distribution of the sample, and obtaining the interview data from the selected elements (de Leeuw, Hox & Huisman, 2003).

Sampling Techniques

When sampling, you need to decide what units (i.e., what people, organizations, data, etc.) to include in your sample and which ones to exclude. As you'll know by now, sampling techniques act as a guide to help you select these units, and you will have chosen a specific probability or non-probability sampling technique:

+ If you are following a probability sampling technique, you'll know that you require a list of the population from which you select units for your sample. This raises potential data protection and confidentiality issues because units in the list (i.e., when people are your units) will not necessarily have given you permission to access the list with their details. Therefore, you need to check that you have the right to access the list in the first place.

+ If using a non-probability sampling technique, you need to ask yourself whether you are including or excluding units for theoretical or practical reasons. In the case of purposive sampling, the choice of which units to include and exclude is theoretically driven. In such cases, there are few ethical concerns. However, where units are included or excluded for practical reasons, such as ease of access or personal preferences (e.g., convenience sampling), there is a danger that units will be excluded unnecessarily. For example, it is not uncommon when selecting units using convenience sampling that researchers' natural preferences (and even prejudices) will influence the selection process. For example, maybe the researcher would avoid approaching certain groups (e.g., socially marginalized individuals, people who speak little English, disabled people, etc.). Where this happens, it raises ethical issues because the picture being built through the research can be excessively narrow, and arguably, unethically narrow. This highlights the importance of using theory rather than practical reasons to determine the creation of samples when using non-probability sampling techniques, whenever possible.

Sample size

Whether you are using a probability sampling or non-probability sampling technique to help you create your sample, you will need to decide how large your sample should be (i.e., your sample size). Your sample size becomes an ethical issue for two reasons: (a) over-sized samples and (b) under-sized samples.

> Over-sized samples

A sample is over-sized when there are more units (e.g., people, organizations) in the sample than are needed to achieve your goals (i.e., to answer your research questions robustly). An over-sized sample is considered to be an ethical issue because it potentially exposes an excessive number of people (or other units) to your research. Let's look at where this may or may not be a problem:

> Not an ethical issue

Imagine that you were interested in the career choices of students at your university, and you were only asking students to complete a questionnaire taking no more than 10 minutes; all an over-sized sample would have done was waste a little of the students' time.
Whilst you don't want to be wasting people's time, and should try and avoid doing so, this is not a major ethical issue.

> A potential ethical issue

Imagine that you were interested in the effect of a carbohydrate-free diet on the concentration levels of female university students in the classroom. You know that carbohydrate-free diets (i.e., no bread, pasta, rice, etc.) are a new fad amongst female university students because some female students feel that it helps them lose weight (or not put weight on). However, you have read some research showing that such diets can make people feel lethargic (i.e., low on energy). Therefore, you want to know whether this is affecting students' performance; or more specifically, the concentration levels of female students in the classroom. You decide to conduct an experiment where you measure concentration levels amongst 40 female students that are not on any specific diet. First, you measure their concentration levels. Then, you ask 20 of the students to go on a carbohydrate-free diet whilst the remaining 20 continue with their normal food consumption. After a period of time (e.g., 14 days), you measure the concentration levels of all 40 students to compare any differences between the two groups (i.e., the normal group and the group on the carbohydrate-free diet). You find that the carbohydrate-free diet did significantly impact the concentration levels of the 20 students.

So here comes the ethical issue: What if you could have come to the same conclusion with fewer students? What if you only needed to ask 10 students to go on the carbohydrate-free diet rather than 20? Would this have meant that the performance of 10 students would not have been negatively affected for a 14-day period as a result? The important point is that you do not want to expose individuals to distress or harm unnecessarily.

> Under-sized samples

A sample is under-sized when you are unable to achieve your goals (i.e., to answer your research questions robustly) because you have insufficient units in your sample. The important point is that you fail to answer your research questions not because a potential answer did not exist, but because your sample size was too small for such an answer to be discovered (or interpreted). Let's look at where this may or may not be a problem:

> Not an ethical issue

Let's take the example of the career choices of students at your university. If you did not collect sufficient data, that is, you did not ask enough students to complete your questionnaire, the answers you get back from your sample may not be representative of the population of all students at your university. This is bad from two perspectives, but only one is arguably a potential ethical issue. First, it is bad because your dissertation findings will be of a lower quality; they will not reflect the population of all students at the university that you are interested in, which will most likely lead to a lower mark (i.e., external validity is an important goal of quantitative research). This is bad for you, but not necessarily unethical. However, if the findings from your research are incorrectly taken to reflect the views of all students at your university, and somehow wrongly influence policy within the university (e.g., amongst the Career Advisory Service), your dissertation research could have negatively impacted other students. This is a potential ethical issue.
Despite this, we would expect that the likelihood of this happening is fairly low.

> A potential ethical issue

Going back to the example of the effect of a carbohydrate-free diet on the concentration levels of female university students in the classroom, an under-sized sample does pose potential ethical issues. After all, with the exception of students that just want to help you out, it is likely that most students are taking part voluntarily because they want to know the effect of such a diet on their potential classroom performance. Perhaps they have used the diet before or are thinking about using the diet. Alternately, perhaps they are worried about the effects of such diets and want to further research in this area. In either case, if no conclusions can be made, or the findings are not statistically significant because the sample size was too small, the effort, and the potential distress and harm that these volunteers put themselves through, was all in vain (i.e., completely wasted). This is where an under-sized sample can become an ethical issue.

As a researcher, even when you're an undergraduate or master's level student, you have a duty not to expose an excessive number of people to unnecessary distress or harm. This is one of the basic principles of research ethics. At the same time, you have a duty not to fail to achieve what you set out to achieve. This is not just a duty to yourself or the sponsors of your dissertation (if you have any), but more importantly, to the people that take part in your research (i.e., your sample). To try and minimize the potential ethical issues that come with over-sized and under-sized samples, there are instances where you can make sample size calculations to estimate the required sample size to achieve your goals; a worked sketch of one such calculation follows the discussion of gatekeepers below.

Gatekeepers

Gatekeepers can often control access to the participants you are interested in (e.g., a manager's control over access to employees within an organization). This has ethical implications because of the power that such gatekeepers can exercise over those individuals. For example, they may control what access is (and is not) granted to which individuals, coerce individuals into taking part in your research, and influence the nature of responses. This may affect the level of consent that a participant gives (or is believed to have given) you. Ask yourself: Do I think that participants are taking part voluntarily? How did the way that I gained access to participants affect not only the voluntary nature of individuals' participation, but also the data?

Problems with gatekeepers can also affect the representativeness of the sample. Whilst qualitative research designs are more likely to use non-probability sampling techniques, even quantitative research designs that use probability sampling can suffer from issues of reliability associated with gatekeepers. In the case of quantitative research designs using probability sampling, are gatekeepers providing an accurate list of the population without missing out potential participants (e.g., employees that may give a negative view of an organization)? If non-probability sampling is being used, are gatekeepers coercing participants to take part or influencing their responses?
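As noted above, sample size calculations help guard against both over-sized and under-sized samples. The sketch below uses one common approach, Cochran's formula for estimating a proportion with a finite population correction; the confidence level, margin of error and population size are illustrative assumptions rather than values taken from the module.

```python
# A minimal sketch of a sample-size calculation for estimating a
# proportion: Cochran's formula n0 = z^2 * p * (1 - p) / e^2, followed by
# a finite population correction. Inputs below are illustrative only.
import math

def required_sample_size(population, margin_of_error=0.05,
                         confidence_z=1.96, p=0.5):
    # n0: sample size for an (effectively) infinite population.
    n0 = (confidence_z ** 2) * p * (1 - p) / (margin_of_error ** 2)
    # Finite population correction for a known population size.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

# e.g. a hypothetical population of 1,000 students, 5% margin of error,
# 95% confidence: roughly 278 respondents are suggested.
print(required_sample_size(population=1000))
```

Applied to the career-choices questionnaire example, such a calculation would indicate roughly how many respondents are enough, so that neither too many nor too few students are asked to take part.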
CHECK YOUR KNOWLEDGE (Short Answer Questions) (2 POINTS EACH)

DIRECTIONS: Read each question carefully. Write your answer on the space provided.

1. It is a systematic approach to investigations during which numerical data is collected and/or the researcher transforms what is collected or observed into numerical data. __________
2. A series of questions and other prompts for the purpose of gathering information from respondents. __________
3. A conversation between two or more people (the interviewer and the interviewee) where questions are asked by the interviewer to obtain information from the interviewee; a more structured approach would be used to gather quantitative data. __________
4. A group or single participants are manipulated by the researcher, for example, asked to perform a specific task or action. Observations are then made of their user behavior, user processes, workflows etc., either in a controlled situation (e.g. lab-based) or in a real-world situation (e.g. the workplace). __________
5. Recordings or logs of system or website activity. __________
6. Analysis of documents belonging to an organization. __________
7. The whole units of analysis that might be investigated; these could be students, cats, house prices etc. __________
8. The actual set of units selected for investigation and who participate in the research. __________
9. Characteristics of the units/participants. __________
10. The score/label/value of a variable, not the frequency of occurrence. For example, if age is a characteristic of a participant then the value label would be the actual age, e.g. 21, 22, 25, 30, 18, not how many participants are 21, 22, 25, 30, 18. __________
11. The individual unit/participant of the study/research. __________
12. It is complex and can be done in many ways dependent on 1) what you want to achieve from your research, and 2) practical considerations of who is available to participate. __________
13. To analyze data means to quantify or change the verbally expressed data into numerical information. __________
14. It uses statistical analysis to yield results that describe the relationship of two variables. The results, however, are incapable of establishing causal relationships. __________
15. It is a statistical method used to test differences between two or more means. __________

ACTIVITY 1: SPECULATIVE THINKING (GROUP WORK)

Directions: Questions do not only indicate your curiosity about your world but also signal your desire for clearer explanations about things. Hence, ask one another thought-provoking questions about quantitative data analysis. For proper question formulation, you may draft your questions on the space below.

ACTIVITY 2: INDIVIDUAL WORK

Directions: Recall two or three of the most challenging questions from your classmates shared with the class that you wanted to answer but did not get the chance to do so. Write and answer them on the lines provided.

ACTIVITY 3: MATCHING TYPE

Directions: Match the expressions in A with those in B by writing the letter of your answer on the line before the word.

        A                           B
____ 1. Mean                 a. data-set divider
____ 2. Ratio                b. facts or information
____ 3. Data                 c. part-by-part examination
____ 4. Coding               d. data-preparation techniques
____ 5. Analysis             e. repetitive appearance of an item
____ 6. Mode                 f. sum divided by the number of items
____ 7. Median               g. valuable zero
____ 8. Standard deviation   h. ANOVA
____ 9. Regression           i. shows variable predictor
____ 10. Table               j. data organizer