100% found this document useful (1 vote)
2K views

Practical Research 2 Module 5

Practical Research

Uploaded by

Carl
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF or read online on Scribd
100% found this document useful (1 vote)
2K views

Practical Research 2 Module 5

Practical Research

Uploaded by

Carl
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF or read online on Scribd
You are on page 1/ 38
Module FINDING ANSWERS THROUGH DATA 3 COLLECTION Introduction Data collection is the process of gathering and measuring information on variables of interest, in an established systematic fashion that enables one to answer stated research questions, test hypotheses, and evaluate outcomes. The data collection component of research is common to all fields of study including physical and social sciences, humanities, business, etc. While methods vary by discipline, the emphasis on ensuring accurate and honest collection remains the same, Craddick etal (2003) Intended Learning Outcomes After this lesson, you should be able to: 1. collects data using appropriate instruments. 2. presents and interprets data in tabular and graphical forms, 3. uses statistical techniques to analyze data— study of differe ces and relationships limited for bivaria inalysis. 4, Use descriptive statistics in analyzing data, PERFORMANCE STANDARD ‘The learner is able to; Gather and analyze data with intellectual honesty, using suitable techniques. LESSON 12 QUANTITATIVE DATA ANALYSIS Quantitative Data Analysis It is a systematic approach to investigations during which numerical data is collected and/or the researcher transforms what is collected or observed into numerical data. It often describes a 66 situation or event; answering the ‘what’ and ‘how many’ questions you may have about something, ‘This is research which involves measuring or counting attributes (i.e. quantities) A quantitative approach is often concerned with finding evidence to either support or contradict an idea or hypothesis you might have. A hypothesis is where a predicted answer to a research question is proposed, for example, you might propose that if'you give a student training in how to use a search engine it will improve their success in finding information on the Internet. You could then go on to explain why a particular answer is expected - you put forward a theory, We can gather quantitative data in a variety of ways and from a number of different sources. Many of these are similar to sources of qualitative data, for example: Y Questionnaires ~ a series of questions and other prompts for the purpose of gathering information from respondents, ¥ Interviews - a conversation between two or more people (the interviewer and the imeerviewee) where questions are asked by the interviewer to obtain information from the imerviewee - a more structured approach would be used to gather quantitative data ¥ Observation - a group or single participants are manipulated by the researcher, for example, on, Observations are then made of their user behavior asked to perform a specific task or act user processes, workilows etc, either in a controlled situation (e.g. lab based) or in a real- world situation (e.g. the workplace) ¥- ‘Transaction logs - recordings or logs of system or website activity ¥ Documentary research - analysis of docum is belonging to an organization Why do we do quantitative data analysis? Once you have collected your data you need to make sense of the responses you have got back. Quantitative data analysis enables you to make sense of data by ‘© organizing them + summarizing them © doing exploratory analysis ‘And to communicate the meaning to others by presenting data as: 67 + tables ‘+ graphical displays ‘+ summary statistics ‘We can also use quantitative data analysis to see: + where responses are similar, for example, we might find that the majority of students all g0 to the university library twice a week + if there are differences between the things we have studied, for example, Ist year students might go once a week to the library, 2 nd year students twice a week and 3 rd year students three times a week + if there is a relationship between the things we have studied. So, is there a relationship between the number of times a student goes to the library and their year of study? ising software for statistical analysis Some key concepts Before we look at types of analysis and tools we need to be familiar with a few concepts first: ‘+ Population - the whole units of analysis that might be investigated, this could be students, cals, house prices et. ‘+ Sample - the actual set of units selected for investigation and who participate in the research + Variable - characteristics of the units/participants ‘+ Value - the score/label/value of a variable, not the Lrequency of occurrence. For example, if age is a characteristic of a participant then the value label would be the actual age, eg. 21, 30, 18, 5, 30, 18, not how many participants are 21, ‘+ Case/subject - the individual unit/participant of the study/research, ‘Sampling Sampling is complex and can be done in many ways dependent on 1) what you want to achieve from your research, 2) practical considerations of who is available (o participate! The type of statistical analysis you do will depend on the sample type you have. Most importantly, you cannot generalize your findings to the population as a whole if you do not have 68 a random sample, You can still undertake some in ential statistical analysis but you should report these as results of your sample, not as applicable to the population at large. ‘Common sampling approaches include: + Random sampling ‘+ Stratified sampli © Cluster sampling + Convenience sampling + Accidental sampling Steps in Quantitative Data Analysis According to Baraceros (2016), she identified the different steps in Quantitative data analysis and she quoted that no “data organization means no sound data analysis’ 1. Coding system — to analyzed data means to quantify of change the verbally expressed data into numerical information. Converting the words, images, or pictures into numbers, they become fit for any analytical procedures requiring knowledge of arithmetic and mathematical computations. But it is not possible for the researcher to do the mathematical operations such as division, multiplication, or subtraction in the word level, unless you code the verbal responses and observation categories. For example: As regards gender variable, give number 1 as the code or value for Male and number 2 for Female. As to educational attainment as another variable, give the value of 2 for elementary; 4 for high school, 6 for college, 9 for M.A., and 12 for PhD level. By coding each item with a certain number in a data set, you are able to add the points or values of the respondent answers to a particular interview questionnaire item, 69 Total Sample size: 24 Gender Male: 11 (46%) Female: 13 (54%) Program ‘School Fine Arts: 9 (37%) Architecture: 625%) Journalism: 4 (17%) Com. Arts: 5 (20%) FEU: 3 (12%) MLQU: 4 (17%) UCU: 3 (12%), PUNP: 5 (20%) LNL:4 (17%) PSU: (5 %) ‘Attending in 2017 Summer Arts ‘Seminar-Workshop on Arts Yes: 18 (75%) No: 6 (25%) Role in the 2017 Seminar- Workshop on Arts: Speaker: 2 (17%) Organizer: 3 (12%) Demonstrator: 5 (20%) Participant: 12 (50%) Satisfaction with the demonstration and practice exercises Strongly agree: 11 (46%) Agree:3 (12%) Neutral: 2 (8%) Disagree: 4 (14%) Strongly disagree: 2 (8%) Source: Baraceras 2076 Practical Research 2, RexBoolstore pp-110 Step 2: Analyzing the Data Data coding and tabulation are both essential in preparing the data analysis. Before you interpret every component of the data, the researcher decides first what kind of quantitative analysis to use whether to use a simple descriptive statistical technique or an advance analytical method The first one that college students often use tells some aspects of categories of data such as: frequency of distribution, measure of central tendency (mean, median and mode), and standard 70 deviation. However, this does not give information about population from where the sample came. The second one, on the other hand, fits graduate-level studies because this involves complex statistical analysis requiring a good foundation and thorough knowledge the data- gathering instrument used. The results of the analysis reveal the following aspects of an item in set of data (Mogan 2014; Punch 2014; Walsh 2010) cited by Baraceros (2016): Y Frequency distribution — gives you the frequency of distribution and percentage of the occurrence of an item in asset of data. In other words, it gives you the number of responses fen repeatedly for one question. Question: By and large, do you find the Senators” attendance in 2015 legislative session awful Measurement Frequency | Percent Scale Code pistribution Sirongly agree i ie Agree 2 3 13% Neutral z 8% Disagree 7 T HE Strongly disagree | 5 a 17% Source: Baraceros 2016 Practical Research 2, RexBookstore pp-1I7 Y Measure of Central Tendency — indicates the different positions or values of the items, such that in a category of data, you find an item or items serving as the: ‘Mean — average of all the items or scores Example: 3484942431043 = 38 38 +7=5.43 (Mean) ‘Median — the score in the middle of the set of items that cuts or divides the set into two groups Example: The number in the example for the Mean has 2 as the Median, ‘Mode — refers to the item or score in the data set that has the most repeated ‘appearance in the set. Example: Again, in the example above for the Mean, 3 is the Mode. 1 ¥ Standard Deviation — shows the extent of the difference of the data from the mean, An ‘examination of this gap between the mean and the data gives you an idea about the extent of the similarities and differences between the respondents. There are mathematical operations that you have to determine the standard deviation. Step 1: Compute the Mean, Step 2: Compute the deviation (difference) between each respondent’s answer (data item) and the mean. The positive sign (+) appears before the number if the differe ce is higher; negative sign (), ifthe difference is lower. Step 3: Compute the square of each deviation Step 4: Compute the sum of squares by adding the squared figures, Step 5: Divide the sum of squares by the number of data items to get the variance Step 6: Compute the square root of variance figure to get standard deviation Example: andard Deviation of the category of data collected from selected faculty members of one university (Step 1) Me | (Step 2) (Step 3) | Data Item Deviation Square of Deviation 1 8 68 2 5 25 6 4 1 | 6 “1 1 | 8 48 1 | 6 mi 1 | 6 “1 1 | ‘4 #7 49 i} 16 49 st | Total: 321 (Step 4) Sum of Squares: 321 (Step 5) Variance = 36 (321 +9) (Step 6) Standard Deviation -6 (square root of 6) Advanced Quantitative Analytical Methods ~ An analysis of quantitative data that involves the use of more complex statistical methods needing computer software like the SPSS, STATA, or MINITAB, among others, occurs graduate-level students taking their MA. or PhD degrees, n Some of the advanced method of quantitative data analysis are the following (Argyous 2011 Levin & Fox 2014; Godwin 2014; as cited by Baraceros 2016) 8) Correlation — uses statistical analysis to yield results that describes the relationship of two variables. The results, however are incapable of establishing casual relationships. b) Analysis of Variance (ANOVA) - is a statistical method used to test differences between two or more means. It may seem odd that the technique is called "Analysis of Variance” rather than “Analysis of Means.” As you will see, the name is appropriate because inferences ‘about means are made by analyzing variance. ©) Regression ~ In statistical modeli regression analysis is a statistical process for estimating the relationships among variables. I includes many techniques for modeling and analyzing several variables, when the focus is ‘on the relationship between a dependent variable and one or more independent variables (or ‘predictors), LESSON 13 STATISTICAL METHODS Basic Concept Statistics is a form of mathematical analysis that uses quantified models, representations and synopses for a given set of experimental data or real-life studies. Statistics studies methodologies to gather, review, analyze and draw conclusions from data, Statistical methods analyze large volumes of data and their properties. Statistics is used in various disciplines such as psychology, business, physical and social sciences, humanities, government and manufacturing. Statistical data is gathered using a sample procedure or other method. Two types of statistical methods are used in analyzing data: descriptive statistics and infer ial statistics. Descriptive statistics are used to synopsize data from a sample exercising the mean or standard deviation, Inferential statistics are used when data is viewed a ‘a subclass of a specific population. 2B Statistical Methodologies 1. Descriptive Statisties- Descriptive statistics are brief descriptive coefficients that summarize a given data set, which can be either a representation of the entire population or a sample of it, Descriptive statistics are broken down into measures of central tendency and measures of variability, or spread. Measures of central tendency inelude the mean, median and mode, while measures of variability include the standard deviation or variance, and the ‘minimum and maximum variables. Inferential Statisties - Now, suppose you need to collect data on a very large population For example, suppose you want to know the average height of all the men in a city with a population of so many million residents. It isn't very practical to try and get the height of each man. This is where infer ential statistics comes into play. Inferential statistics makes inferences about populations using data drawn from the population, Instead of using the entire population to gather the data, the statistician will collect a sample or samples from the millions of residents and make inferences about the entire population using the sample. The sample is a set of data taken from the population to represent the population. Probability distributions, hypothesis testing, correlation testing and regression analysis are all fall under the category of inferential statistics. Types of Statistical Data Analysis 1. Univariate Analysis ~ analysis of one variable 2. Bivariate Analysis ~ analysis of two variables (independent and dependent) 3. Multivariate An: is — analysis of multiple relations between multiple variables. Statistical Methods of Bivariate Analysis According to the book of Baraceros (2016) bivariate analysis happens by means of the following methods (Argyrous 2011; Babbie 2013; Punch 2014) 1. Correlation or Covariation (correlated variation) ~ describes the relationship between two variables and also tests the strengths or significance of their linear relation. 14 Covariance is the statistical term (0 measure the extent of the change in the relationship of two random variables. Random variables are data with varied values like those ones in the interval level or scale (Strongly disagree, disagree, neutral, agree, strongly agree) whose values depend on the arbitrariness of the respondents, 2. Cross Tabulation — is also called “crosstab or students-contingency table” that follows the format of a matrix that is made up of lines of numbers, symbols, and other expressions. Similar to one type of graph called table, matrix arranges data in rows and columns. If the table compares data on only two variables, such table is called Bivariate Table. Example: Secondary School Participants who attend the I“ UCNHS Research Conference School MAL ALE, Row Total 152 127 Qua (18.7%) (15.4%) 279 120 98 UNCNHS (14.8%) (1.9%) 218 59 48, PUNP (7.2%) (58%) 107 61 58 ucu (75%) (1%) 9 81 79 LNL (10%) (95%) 159 9 99 U-Pang, (9.7%) (12s) 178 102 120 cLLC (12.6%) (14.5%) 222 9 93 ABE (85%) (13%) 162 83 101 sti (10.2%) (12.2%) 184 Column Total 806 823 100%) 00%) 1,629 Measure of Correlations Correlation is a bivariate analysis that measures the strengths of association between two variables and the direction of the relationship, In terms of the strength of relationship, the value of the correlation coeticient varies between +1 and -1. When the value of the correlation coefficient lies around + 1, then it is suid to be a perfect degree of association between the two variables, As the correlation coefficient value goes towards 0, the relationship between the two variables will be weaker. The direction of the relationship is simply the + (indicating a positive relationship between the variables) or ~ Gndicating @ negative relationship between the variables) sign of the correlation, Usually, in statistics, we measure four types of correlations: Pearson correlation, Kendall rank correlation, Spearman correlation, and the Point-Biserial > PEARSON R CORRELATION Pearson r correlation is the most widely used correlation statistic to measure the degree of the relationship between linearly related variables. For example, in the stock market, if we want to measure how two stocks are related to each other, Pearson rcorrelation is used to measure the degree of relationship between the two. The Point-biserial correlation is conducted with the Pearson correlation formula except that one of the variables is dichotomous. The following formula is used to calculate the Pearson r correlation: Pearson r correlation coefficient N = number of value in each data set ‘Day = sum of the products of paired scores Ix =sum of x s Ly =sum of y scores Dx2+ sum of squared x scores Ly2= sum of squared y scores ‘Types of research questions a Pearson correlation can examine: Is there a statistically significant relationship between age, as measured in years, and height, measured in inches? 76 Is there a relationship between temperature, measure in degree Fahrenheit, and ice ream sales, measured by income? Is there a relationship among job satisfaction, as measured by the JSS, and income, measured in dollars? Assumptions For the Pearson r correlation, both variables should be normally distributed (normally distributed variables have a bell-shaped curve). Other assumptions include linearity and homoscedasticity. Linearity assumes a straight line relationship between each of the variables in the analysis and homoscedasticity assumes that data is normally distributed about the regression line. CONDUCT AND INTERPRET A PEARSON CORRELATION KEY TERMS Effect size: Cohen’s standard will be used to evaluate the correlation coefficient to determine the strength of the relationship, or the effect size, where comelation coefficients between .10 and .29 represent a small association, coefficients between .30 and .49 represent a medium association, and coefficients of 50 and above represent a large association or relationship. Continuous data: Data that is interval or ratio level. This type of data possesses the properties of magnitude and equal interval between adjacent units. Equal intervals between adjacent units’ means that there are equal amounts of the variable being measured between adjacent units on the scale, An example would be age. An increase in age from 21 to 22 would be the same as an increase in age from 60 to 61 > KENDALL RANK CORRELATION Kendall rank correlation is a non-parametric test that measures the strength of dependence between two variables. If we consider two samples, a and b, where each sample size isn, we know that the total number of pairings with ab is n(-1)/2. The fotlos ng formula is used to calculate the value of Kendall rank correlation: Ne= number of concordant Nd= Number of discordant 7 CONDUCT AND INTERPRET A KENDALL CORRELATION KEY TERMS Concordant: Ordered in the same way. Discordant: Ordered differently Spearman rank correlation: Spearman rank correlation is a non-parametric test that is used to measure the degree of association between two variables. It was developed by Spearman, thus itis called the Spearman rank correlation. Spearman rank correlation test does not assume any assumptions about the distribution of the data and is the appropriate correlation analysis when the variables are measured on a scale that is at least ordinal ‘The following formula is used (0 calculate the Spearman rank correlation: P= Spearman rank correlation diz the diffe umber of value in each data set nce between the ranks of comesponding values Xi and Yi Questions Spearman Correlation Answers Is there a statistically significant relationship between participants’ responses to two Likert scales ‘questions? Is there a statistically significant relationship between how the horses rank in the race and the horses’ ages? Assumptions Spearman rank correlation test does not make any assumptions about the distribution, The assumptions of Spearman rho correlation are that data must be at least ordinal and scores on one variable must be montonically related to the other variable. CONDUCT AND INTERPRET A SPEARMAN CORRELATION KEY TERMS Effect size: Cohen’s standard will be used to evaluate the correlation coefficient to determine the strength of the relationship, or the effect size, where coefficients between .10 and .29 represent a small association; coefficients between .30 and .49 represent a medium association; and coefficients of .50 and above represent a large association or relationship. 8 Ordinal dat « Ordinal scales rank order the items that are being measured to indicate if they possess more, less, or the same amount of the variable being measured. An ordinal scale allows us to determine if X > Y, ¥ > X, or ifX = Y. An example would be rank ordering the participants in a dance contest. The dancer who was ranked one was a better dancer than the dancer who was ranked two, The dancer ranked two was a better dancer than the dancer who was ranked three, and so ‘on. Although this scale allows us to determine greater than, less than, or equal to, it still does not define the magnitude of the relationship between units, > Chi-square is the statistical test for bivariate analysis of nominal variables, specifically, to test the null hypothesis. It tests whether or not a relationship exists between or among variables and tells. the probability that the relationship is caused by chance. This cannot in any way show extent of the association between two variables. Types of Data: ‘There are basically two types of random variables and they yield wo types of data suumerical and categorical. A chi square (X°) statistic is used to investigate whether distributions of jorical variable yield data in the categorical variables differ from one another. Basically categories and numerical variables yield data in numerical form. Responses to such questions as ‘What is your major? or Do you own a car?” are categorical because they yield data such as “biology” or “no.” In contrast, responses to such questions as “How tall are you?” or "What is your G.P.A.2" are numerical. Numerical data can be either discrete or continuous. The table below may help you see the differences hetween these two variables. Categorical Whats your sex? male o female Numeral Discrete How many cas 40900 yop thee Numeral Continsous - How tall ae you? 72 inches Notice that discrete data arise from a counting process, while continuous data arise from a ‘measuring process 79 ‘The Chi Square statistic compares the tallies or counts of categorical responses between two (or more) independent groups. (Note: Chi square tests can only be used on actual numbers and not on percentages, proportions, means, etc.) 2x 2 Contingency Table ‘There are several types of chi square tests depending on the way the data was collected and the hypothesis being tested. We'll begin with the simplest case: a 2 x 2 contingency table. If we set the 2 x 2 table to the general notation shown below in Table I using the letters a,b, c, and d to denote the contents ofthe cells, then we would have the following table: Table 1. General notation for a 2 x 2 contingency table. Variable 1 Variable2 | Datatype1 Data type2 //)/Potals) Category I [a 3 Category 2 | tera] a a ar arbtetd=N, For a2 x2 contingency table the Chi Square statistic is calculated by the formula: Note: notice that the four components of the denominator are the four totals from the table columns and rows. Suppose you conducted a drug trial on a group of animals and you hypothesized that the animals receiving the drug would show increased heart rates compared to those that did not receive the drug, You conduct the study and collect the following data: Ho: The proportion of animals whose heart rate increased is independent of drug treatment. Ha: The proportion of animals whose heart rate increased is associated with drug treatment 80 ‘Table 2. Hypothetical drug trial results. Heart Rate |No Heart Rate Increased Increase Treated 36 M4 oO Nottreated 30 25 | Toa 6689S Applying the formula above we get 105 [ (36) (25) - (14) (30) F / (50) (55) (39) (66) Chi square 3418, Before we can proceed we need to know how many degrees of freedom we have. When a ‘comparison is made between one sample and another, a simple rule is that the degrees of freedom ‘equal (number of columns minus one) x (number of rows minus one) not counting the totals for rows of columns. For our data this gives (2-1) x (2-1 We now have our chi square statistic (x? = 3.418), our predetermined alpha level of significance (0.05), and our degrees of freedom (df = 1). Entering the Chi square distribution table with | degree of freedom and reading along the row we find our value of x? (3.418) lies between 2.706 and 3.841. The corresponding probability is between the 0.10 and 0.05 probability levels. ‘That means thatthe p-value is above 0.05 (it is actually 0.065). Since a p-value of 0.65 is greater than the conventionally accepted significance level of 0.05 (i.e. p > 0.05) we fail to reject the null hypothesis. In other words, there is no statistically significant difference in the proportion of animals whose heart rate increased. What would happen if the number of control animals whose heart rate increased dropped to 29 instead of 30 and, consequently, the number of controls whose he: rate did not inerease changed from 25 to 26? Try it. Notice that the new x? value is 4.125 and this value exceeds the table value of 3.841 (at 1 degree of freedom and an alpha level of 0.05). This means that p < 0.05 (it is now0.04) and we reject the null hypothesis in favor of the alternative hypothesis - the heart rate of animals is different between the treatment groups. When p < 0.05 we generally refer to this as a significant difference. 81 ‘Table 3. Chi Square distribution table. Probability level (alpha) 0.455 |2.706 3.841 |5.412 6.635 10.827 1.386 | 4.605 | 5.991 |7.824 9.210 | 13.815 2.366 6.251 7.815 |9.837 11.345 16.268 3.357 7.779 9.488 | 11.668 13.277 18.465 4,351 | 9.236 | 11,070 | 13.388 15.086 20.517 ‘To make the chi square calculations a bit easier, plug you're observed and expected values into the following applet. Click on the cell and then enter the value. Click the compute button on the lower tight corner to see the chi square value printed in the lower left hand corner. Chi Square Goodness of Fit (One Sample Test) ‘This test allows us to compare a collection of categorical data with some theoretical expected distribution. This testis often used in genetics to compare the results of a cross with the theoretical distribution based on genetic theory. Suppose you preformed a simpe monohybrid cross between two individuals that were heterozygous for the trait of interest. Aax Aa ‘The results of your cross are shown in Table 4. ‘Table 4, Results of a monohybrid cross between two heterozygotes for the 'a’ gene. ‘The phenotypic ratio 85 of the “A” type and 15 of the a-type (homozygous recessive). In a monohybrid cross between two heterozygotes, however, we would have predicted a 3:1 ratio of phenotypes. In other words, we would have expected to get 75 A-type and 25 a-type. Are or results different? ‘Calculate the chi square statistic x* by completing the following steps: 1. Foreach observed number in the table subtract the corresponding expected number (0 — E). 2] 3. Divide the squares obtained for each cell in the table by the expected number for that cell [ (O-E)/E}. 2. Square the difference | (O 4. Sum all the values for (O - E)? /E. This is the chi square statistic For our example, the calculation would be: x° = 5.33 Observed Expected (O—B) (O—F)2 (O—E)E Actype 85 75 10 100 1.33 atype 15 25 10 100 40 Total 100 100 5.33 We now have our chi square statistic (x? = 5.33), our predetermined alpha level of significance (0.05), and our degrees of freedom (if =1). Entering the Chi square distribution table with I degree of freedom and reading along the row we find our value of x? 5.33) lies between 3.841 and S412. The comesponding probability is 0.0S ANALYSIS OF VARIANCE (ANOVA) Analysis of Variance ¥ One Way (one factor, fixed effects) ¥ ‘Two Way (two factors, randomized blocks) ¥ Two Way with Repeated Observations (Iwo factors, randomized block) Fully Nested (hierarchical factors) Latin Square (one primary and two secondary factors) “ ¥ Crossover (two factors, fixed effects, treatment crossover) ¥ Kruskal-Wallis (nonparametiie one way) v Friedman (nonparametric two way) ¥ Homogeneity of Variance (examine the ANOVA assumption of equal variance) Normality (examine the ANOVA assumption of normality) ¥ Agreement (examine agreement of two or more samples) Basics Concepts ANOVA is a set of statistical methods used mainly to compare the means of two or more samples. Estimates of variance are the key intermediate statistics calculated, hence the reference to variance in the title ANOVA. The different types of ANOVA reflect the different experimental designs and situations for which they have been developed, Excellent accounts of ANOVA are given by Armitage & Berry (1994) and Kleinbaum et.al (1998). Nonparametric alternatives to ANOVA are discussed by Conover (1999) and Hollander and Wolle (1999). ANOVA and regression ANOVA can be treated as a special case of general linear regression where independ avpredicator variables are the nominal categories or factors. Each value that ean be taken by a factor is referred toas.a level. k different levels (c.g. three different types of diet in a study of diet on weight gain) are coded not as a single column (eg. of diet 1 to 3) but as k-ldummy variables. The dependentioutcome variable inthe regression consists ofthe study observations. General linear regression can be used in this way to build more complex ANOVA models than those described in this section; this is best done under expert statistical guidance. 90 Fixed vs. random effects A fixed factor has only the levels used in the analysis (¢.8. sex, age, blood group). A random factor has many possible levels and some are used in the analysis (e.g. time periods, subjects, observers), Some factors that are usually treated as fixed may also be treated as random if the study is looking at them as part of a larger group (e.g. treatments, locations, tests). Most general statistical texts arrange data for ANOVA into tables whet columns represent fixed factors and the one and two way analyses described are fixed factor methods. Multiple comparisons ANOVA gives an overall test for the difference between the means of k groups. StatsDirect enables you to compare all k(k-1)/2 possible pairs of means using methods that are designed to avoid the type I error that would be seen if you used two sample methods such as t test for these comparisons. The multiple comparison/contrast methods offered by StatsDirect are Tukey(- Kramer), Scheffé, Newman-Keuls, Dunnett and Bonferroni (Amitage and Berry, 1994; Wallenstein, 1980; Liddell, 1983; Miller, 1981; Hsu, 1996; Kleinbaum et al., 1998). See multiple ‘comparisons for more information, Further methods ‘There are many possible ANOVA designs. StatsDirect covers the common designs in its, ANOVA section and provides general tools (see general linear regression and dummy variables) for building more complex designs. Other software such as SAS and Genstat provide further specific ANOVA designs. For example, balanced incomplete block design: - with complete missing blocks you should consider a balanced incomplete block design provided the number of missing blocks does not exceed the number of treatments. Treatments, 1 2 4 A x x x B x x x Blocks: c x x x D x x x 91 Complex ANOVA should not be attempted without expert statistical guidance. Beware situations where over complex analysis is used in order to compensate for poor experimental design. There is no substitute for good experimental desiga. > Regression Regression is a statistical measure used in finance, investing and other disciplines that attempts to determine the strength of the relationship between one dependent variable (usually denoted by Y) and a series of other changing variables (known as independent variables). Regression helps investment and financial managers to value assets and understand the relationships between variables, such as commodity prices and the stocks of businesses dealing in those commodities. The two basic types of regression are linear regression and multiple linear regression, although there are non-linear regression methods for more complicated data and analysis. Linear regression uses one independent variable to explain or predict the outcome of the dependent variable Y, while multiple regressions use two or more independent variables to prediet the outcome. Regression can help finance and investment professionals as well as professionals in other businesses. Regression can help predict sales for a company based on weather, previous sales, GDP growth or other conditions. The capital asset pricing model (CAPM) is an often-used regression model in finance for pricing assets and discovering costs of capital. The general form of each type of regression is: Linear Regression: Y =a +bX tu Multiple Regression: Y =a + b:X, * boX2+bsXs +. + BX. + Where: Y = the variable that you are trying to predict (dependent variable) X =the variable that you are using to predict Y (independent variable) the intercept b= the slope = the regression residual Regression takes a group of random variables thought to be predicting Y, and tries to find a ‘mathematical relationship between them. This relationship is typically in the form of a straight line (linear regression) that best approximates all the individual data points. In multiple regression, the separate variables are differentiated by using numbers with subscript 92 Regression in Investing Regression is often used to determine how many specific factors such as the price of a commodity, interest rates, particular industries or sectors influence the price movement of an asset ‘The aforementioned CAPM is based on regression, and itis utilized to project the expected returns for stocks and to generate costs of capital, A stock's returns are regressed against the returns of a broader index, such as the S&P 500, to generate a beta for the particular stock. Beta is the stock's risk in relation to the market or index and is reflected as the slope in the CAPM model. The ‘expected retum for the stock in question would be the dependent variable Y, while the independent variable X would be the market risk premium. Additional variables such as the market capitalization of a stock, valuation ratios and recent retums can be added to the CAPM model to get better estimates for retums. These additional factors are known as the Fama-French factors, named after the professors who developed the multiple linear regression model to better explain asset retums. SAMPLING PROCEDURE Sampling Procedures Sampling is a process or technique of choosing a sub-group from a population to participate in the study; it is the process of selecting a number of individuals for a study in such a way that the cted (Ogula, 2005). There individuals selected represent the large group from which they wer are two major sampling procedures in research. These include probability and non-probability samp Probability Sampling Procedures In probability sampling, everyone has an equal chance of being selected. This scheme is one in which every unit in the population has a chance (greater than zero) of being selected in the sample. There are four basic types of sampling procedures associated with probability samples. ‘These include simple random, systematic sampling, stratified and cluster 93 nple Random Sampling Procedure Simple random sampling provides the base from which the other more complex sampling methodologies are derived. To conduct a simple random sample, the researcher must first prepare an exhaustive list (sampling frame) of all members of the population of interest. From this list, the sample is drawn so that each person or item has an equal chance of being drawn during each selection round (Kanupriya, 2012). ‘To draw a simple random sample without introducing researcher bias, computerized sampling programs and random number tables are used to impartially select the members of the population to be sampled. Subjects in the population are sampled by a random process, using either a random number generator or a random number table, so that each person remaining in the 2008). population has the same probability of being selected for the sample (Friedrichs, Systematic Sampling Procedure Systematic sampling procedure often used in place of simple random sampling. In systematic sampling, the researcher selects every nth member after randomly selecting the first through nth element as the starting point. For example, if the researcher decides to sample 20 respondents from a sample of 100, every Sth member of the population will systematically be selected. A researcher may choose to conduct a systematic sample instead of a simple random sample for several reasons. Firstly, systematic samples tend to be easier to draw and exe te, secondly, the researcher does not have to go back and forth through the sampling frame to draw the members to ‘be sampled, thindly, a systematic sample may spread the members selected for measurement more evenly across the entire population than simple random sampling. Therefore, in some cases, systematic sampling may be more representative of the population and mote precise (Groves et al., 2006). Stratified Sampling Procedure Stratified sampling procedure is the most effective method of sampling when a researcher wants to get a representative sample of a population. It involves categorizing the members of the population into mutually exclusive and collectively exhaustive groups. An independent simple random sample is then drawn from each group. Stratified sampling techniques can provide more precise estimates if the population is surveyed is more heterogeneous than the cat jorized groups. ‘This technique can enable the researcher to determine desired levels of sampling precision for each group, and can provide administrative efficiency. The main advantage of the approach is that it’s able to give the most representative sample of a population (Hunt & Tyrrell, 2001), Cluster Sampling Procedure In cluster sampling, a cluster (a group of population elements), constitutes the sampling unit, instead of a single element of the population. The sampling in this technique is mainly geographically driven. ‘The main reason for cluster sampling is cost efficiency (economy and feasibility). The sampling frame is also often readily available at cluster level and takes short time for listing and implementation, The technique also suitable for survey of institutions (Abmed, 2009) or households within a given geographical area. But the design is not without disadvantages, some of the challenges that stand out are: it may not reflect the diversity of the community; other elements in the same cluster may share similar characteristics; provides less information per observation than an SRS of the same size (redundant information: similar information from the others in the cluster); standard errors of the estimates are high, compared to other sampling designs with the same sample size. Non Probability Sampling Procedures Non probability sampling is used in some situations, where the population may not be well ‘defined, In other situations, there may not be great interest in drawing inferences from the sample to the population. The most common reason for using non probability sampling procedure is that it is less expensive than probability sampling procedure and can often be implemented more quickly (Michael, 2011). It includes purposive, convenience and quota sampling procedures. 95 Purposive/Judgmental Sampling Procedure In purposive sampling procedure, the researcher chooses the sample based on who he/she thinks would be appropriate for the study. The main objective of purposive sampling is to arrive as ata sample that can adequately answer the research objectives. The selection of a purposive sample is often accomplished by applying expert knowledge of the target population to select in a non- random manner a sample that repre ion of the population (Henry, 1990). ‘A major disadvantage of this method is subjectivity since another researcher is likely to come up with a different sample when identifying important characteristics and picking typical elements to be in the sample. Given the subjectivity of the selection mechanism, purposive sampling is generally considered most appropriate for the selection of small samples often from a limited geographic area or from a restricted population definition. The knowledge and experience of the researcher making the selections is a key aspect of the “success”? of the resulting sample (Michael, 2011). A case study research design for instan employs purposive sampling procedure to amive at a particular “case” of study and a given group of respondents. Key informants are also selected using this procedure. Convenience Sampling Procedure Convenience sampling is sometimes known as opportunity, accidental or haphazard sampling. It is a type of nonprobability sampling which involves the sample being drawn from that part of the population which is close to hand, that is, a population which is readily available and convenient. The researcher using such a sample cannot scientifically make generalizations about the total population from this sample because it would not be representative enough (Michael, 2011), This type of sampling is most useful for pilot testing. Convenience sampling differs from purposive sampling in that expert judgment is not used to select a representative sample. The primary selection criterion relates to the ease of obtaining a sample, Ease of obtaining the sample relates to the cost of locating elements of the population, the geographic distribution of the sample, and obtaining the interview data from the selected elements (de Le 1w, Hox & Huisman, 2003), 96 Sampling Techniques When sampling, you need to decide what units (ie., what people, organizations, data, to inelude in your sample and which ones to exclude. As you'll know by now, sampling techniques act as a guide to help you select these units, and you will have chosen a specific probability or non-probability sampling technique: + If you are following a probability sampling technique, you'll know that you require a list of the population from which you select units for your sample. This raises potential data protection and confidentiality issues because units in the list (i.e, when people are your units) will not necessarily have given you permission to access the list with their details. ‘Therefore, you need to check that you have the right to access the list in the first place. + Ifusing anon-probability sampling technique, you need to ask yourself whether you are including or excluding units for theoretical or practical reasons. In the case of purposive sampling, the choice of which units to include and exclude is theoretically-driven. In such cases, there are few ethical concerns. However, where units are included or excluded for practical reasons, such as ease of access or personal preferences (e.g,, conveni e sampling), there is a danger that units will be excluded unnecessarily. For example, itis not uncommon when select units using convenience sampling that researchers’ natural preferences (and even prejudices) will influence the selection process. For example, maybe the researcher would avoid approaching certain groups (e.g., socially marginalized individuals, people who speak little English, disabled people, etc.). Where this happens, it raises ethical issues because the picture being built through the research can be excessively narrow, and arguably, unethically narrow. This highlights the importance of using theory to. determine the creation of samples when using non-probability sampling techniques rather than praetieal reasons, whenever possible. Sample size Whether you are using a probability sampling or non-probability sampling technique to help you create your sample, you will need to decide how large your sample should be (ic., your sample size). Your sample size becomes an ethical issue for two reasons: (a) over-sized samples and (b) under-sized samples. > Over-sized samples 7 ‘A sample is over-sized when there are more units (e.g., people, organizations) in the sample than are needed to achieve you goals (i.c., to answer your research questions robustly). An over-sized sample is considered to be an ethical issue because it potentially exposes an excessive mumber of people (or other units) to your research. Let's look at where this may or may not be a problem: Not an ethical issue Imagine that you were interested in the career choices of students at your university, and you were only asking students to complete a questionnaire taking no more than 10 minutes, all an over-sized sample would have done was waste a little of the students’ time. Whilst you don't want to be wasting peoples’ time, and should try and avoid doing so, this is not a ‘major ethical issue, A potential ethical issue Imagine that you were interested in the effect of a carbohydrate free diet on the concentration levels of female university students in the classroom. You know that carbohydrate free diets (ie., no breads, pasta, rice, ete.) are a new fad) amongst female university students because some female students feel that it helps them loose weight (or not put weight on). However, you have read some research showing that such diets can make people feel lethargic ( Jow on energy). Therefore, you want to know whether this is affecting students’ performance; or more specifically, the concentration levels of female students in the classroom, You decide to conduct an experiment where you measure concentration levels amongst 40 female students that are not on any specific diet. First, you measure their concentration levels. Then, you ask 20 of the students to go on a carbohydrate free diet and whilst the remaining 20 continue with the normal food consumption. After a period of time (eg. 14 days), you measure the concentration levels of all 40 students to compare any differences between the two groups (ie. the normal group and the group on the carbohydrate free diet). You find that the carbohydrate free diet did significantly impact on the concentration levels of the 20 students. So here comes the ethical issue: What if you could have come to the same conclusion with fewer students? What if you only needed to ask 10 students to go on the carbohydrate free diet rather than 20? Would this have meant that the performance of 10 students would. not have been negatively for a 14 day period as a result? The important point is that you do not want to expose individuals to distress or harm unnecessarily, 98 Under-sized samples A sample is under-sized when you are unable to achieve your goals (ie., to answer your research ques jons robustly) b c you insufficient units in your sample. The important point is that you fail to answer your research questions not because a potential answer did not exist, but because your sample size was too small for such an answer to be discovered (or interpreted). Let's look where this may or may not be a problem: > Not an ethical issue let’s take thi imple of the career choices of students at your university. If you did not collect sufficient data; that is, you did not ask eno students to complete your questionnaire, the answers you get back from your sample may not be representative of the population of all students at your university. This is bad from two perspectives, but only one is arguably a potential ethical issue: First, itis bad because your dissertation findings will be of a lower quality; they will not reflect the population of all students at the university that ‘you are interested in, which will most likely lead to a lower mark (i.e, external validity is an important goal of quantitative research). This is bad for you, but not necessarily unethical. However, if the findings from your research are incorrectly taken to reflect the views of all students at your university, and somehow wrongly influence policy within the university € amongst the Career Advisory Service), your dissertation research could have negatively impacted other students. This is a potential ethical issue. Despite this, we would expect that the likelihood of this happening is fairly low > A potential ethical issue Going back to the example of the effect of a carbohydrate free diet on the concentration levels of female university students in the classroom, an under-sized sample does pose potential ethical issues. After all, with the exception of students that just want to help you out, it is likely that most students are taking part voluntarily because they want to the effect of such a diet on their potential classroom performance. Perhaps they have used the diet before or are thinking about using the diet. Altemately, perhaps they are worried about the effects of such diets, and what to further research in this area. In either if no conclusions can be made or the findings © not statistically significant because 99. the sample size was (oo small, the effort, and potential distress and harm that these volunteers put themselves through was all in vein (i.e., completely wasted). This is where an under-sized sample can become an ethical issue. As a researcher, even when you're an undergraduate or master’s level student, you have a duty not to expose an e sive number of people to unnecessary distress or harm. This is one of the basic principles of research ethics. At the same time, you have a duty not to fail to achieve what You set out to achieve. This is not just a duty to yourself or the sponsors of your dissertation (if you have any), but more importantly, to the people that take part in your research (i.e., your sample). To try and minimize the potential ethical issues that come with over-sized and under-sized samples, there are instances where you can make sample size calculations to estimate the required sample ls, size to achieve your g Gatekeepers Gatekeepers can often control access to the participants you are interested in (e.g, a manager's control over ess to employees within an organization). This has ethical implications because of the power that such gatekeepers can exercise over those individuals. For example, they may control what access is (and is not) granted to which individuals, coerce individuals into taking part in your research, and influence the nature of responses. This may affect the level of consent that 4 participant gives (or is believed to have given) you. Ask yourself: Do I think that participants are taking part voluntari the voluntary nature of individuals? participation, and how will it affect the data? ) How did the way that I gained access to participants affect not only Problems with gatekeepers can also affect the representativeness of the sample. Whilst qualitative research designs are_more likely to use non-probability sampling techniques, even ‘quantitative research designs that use probability sampling can suffer from issues of reliability associated with gatekeepers. In the case of quantitative research designs using probability sampling, are gatekeepers providing an accurate list of the population without missing out potential participants (e.g., employees that may give a negative view of an organization)? If non- probability sampling is being used, are gatekeepers coercing participants to take part or influencing their responses? 100 CHECK YOUR KNOWLEDGE (Short Answer Question) (2 POINTS EACH) DIRECTIONS: Read the question carefully. Write your answer on the space provided. 1. It is a systematic approach to investigations during which numerical data is collected and/or the researcher transforms what is collected or observed into numerical data. _ 2. a series of questions and other prompts for the purpose of gathering information from respondents. ~ conversation between two or more people (the interviewer and the interviewee) where questions are asked by the interviewer to obtain information from the _ =a more structured approach would be used to gather quantitative data 4. a group or single participants are manipulated by the researcher, for example, asked to perform a specific task or action, Observa are then made of their user behavior, user processes, workilows etc, either in a controlled situation (e.g. lab based) or in a real- ‘world situation (e.g. the workplace). recordings or logs of system or website activity analysis of documents belonging to an organization. the whole units of analysis that might be investigated, this could be students, cats, house prices etc. 8. the actual set of units selected for investigation and who participate in the research 9. characteristics of the units/participants. 10, the score/label/walue of a variable, not the frequency of occurrence. For example, if age is a characteristic of a participant then the value label would be the actual age, eg. 21,22, 25, 30, 18, not how ‘many participants are 21, 22, 25, 30, 18. 11. the individual unit/participant of the study/research, = 12, is complex and can be done in many ways dependent on 1) what you want to achieve from your research, 2) practical considerations of who is available to participate. 13, to analyzed data means to quantify of change the verbally expressed data into numerical information. ~ 14, uses. statistical analysis to yield results that describes the relationship of two variables. The results, however are incapable of establishing casual relationships, 15, is a statistical method used to test differences between two or more me SD sem ots arte esting scaled “Analysn of Variance ter tan Means sar =_- = = 101 2? ee VIITY 1: SPECULATIVE THINKING (GROUP WORK) Directions: Question does not only indicate your curiosity about your world but also signal your desire for clearer explanation about things. Hence, ask one another thought-provoking questions about quantitative data analysis. For proper question formulation, you may draft your question on the space below. ACTIVITY 2: INDIVIDUAL WORK: Recall two or three most challenging question from your classmates shared to the class that you wanted to answer but to get the chance to do so, Write and answer them on the lines provided. ACTIVITY 3: MATCHING TYPE Directions: Match the expression in A with those in B by writing the letter of your answer on the line before the word. A B — 1. Mean a. data-set divider 2. Ratio b. facts or information 3. Data ¢. part-by-part examination ____4. Coding 4. data-preparation techniques ____5. Analysis €. repetitive appearance of an item 6. Mode £ sum divided numbers of items ____7. Media . valuable zero _____8. Standard deviation h. ANOVA 9. Regression i. shows variable predictor = 10. Table j. data organizer 102 [REFERENCE | David M. Lane, Online Statistics Education: An Interactive Multimedia Course of Study, Developed by Rice University (Lead Developer) University of Houston Clear Lake, and Tufts University batp:/fonlinestatbook.com/2/analysis_of_variance/intro.hunl butp://www health herts.ac.uik/immunologyAVeb%20programme%20- e20Researchhealthprofessionals/quantitative_data_analysis.htm blip://www.investopedia.com/terms/v/statisties.asp Algina, J., & Keselman, H. J. (1999). Comparing squared multiple correlation coefficients Examination of a confidence interval and a test significance. Psychological Methods, 4(1), 7683. Bobko, P. (2001). Correlation and regression: Applications for industrial organizational psychology and management (2nd ed.). Thousand Oaks, CA: Sage Publications. Bonett, D. G. (2008). Meta-analytic interval estimation for bivariate correlations. Psychological Methods, 13(3), 173-181 P. Y., & Popovich, P. M. (2002). Correlation: Parametric and nonparametric measures: ‘Thousand Oaks, CA: Sage Publications. Cheung, M. W.-L., & Chan, W. (2004). Testing dependent correlation coefficients via structural equation modeling. Organizational Research Methods, 7(2), 206-223. Coffman, D. L., Maydeu-Olivares, A., Arnau, J. (2008). Asymptotic distribution free interval ‘estimation: For an intraclass correlation coefficient with applications to longitudinal data. Methodology, 4(1), 4-9. Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. (rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates. Hatch, J. P., Heame, E. M., & Clark, G. M. (1982). A method of testing for serial correlation in univariate repeated-measures analysis of variance. Behavior Research Methods & Instrumentation, 14(5), 497-498, Kendall, M. G., & Gibbons, J. D. (1990). Rank Correlation Methods (Sth ed.) London: Edward Arnold. Krijnen, W. P. (2004). Positive loadings and factor correlations from positive covariance

You might also like