Chapter 6 Processing and Analysis of Data

The document discusses processing and analysis of data in research. It covers topics like measures of central tendency, dispersion, skewness, and relationship. Measures of central tendency discussed include mode, median, and mean. Measures of dispersion covered are range, variance, and standard deviation. The document provides details on calculating and applying these statistical measures to data.

Uploaded by

solomon tadesse

School of Electrical Engineering and Computing
Department of Electronics and Communication Engineering

Engineering Research and Development Methodology

By Demissie Jobir Gelmecha (PhD)
4/25/2021

Chapter 6: Processing and Analysis of Data
6.1 Elements/Types of Analysis
6.2 Statistics in Research
6.3 Measures of Central Tendency
6.4 Measures of Dispersion
6.5 Measures of Asymmetry (Skewness)
6.6 Measures of Relationship
6.7 Simple Regression Analysis
6.1 Processing Operations
 Before you can interpret your data, you must first organize
and summarize them.
 How you organize your data depends on your research design.
1. Editing: is a process of examining the collected raw data
(especially in surveys) to detect errors and omissions and to
correct these when possible.
2. Coding: is the process of assigning numerals or other symbols
to answers so that responses can be put into a limited number
of categories or classes.
3. Classification: Most research studies result in a large volume
of raw data which must be reduced into homogeneous groups
if we are to get meaningful relationships.
4. Tabulation: When a mass of data has been assembled, it
becomes necessary for the researcher to arrange the same in
some kind of concise and logical order.
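
The four operations above can be sketched in a few lines of Python. This is only an illustration: the codebook, the survey answers, and the names `CODEBOOK`, `code_response`, and `needs_review` are hypothetical and do not come from the course material.

```python
from collections import Counter

# Hypothetical codebook: each answer category is assigned a numeral (coding).
CODEBOOK = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

# Hypothetical raw survey answers, including one omission (empty string).
raw_responses = ["Agree", "neutral", " agree ", "Strongly Agree", "", "disagree", "agree"]


def code_response(answer):
    """Return the numeric code for an answer, or None if it cannot be coded."""
    return CODEBOOK.get(answer.strip().lower())


# Coding: map every answer to its numeral.
coded = [code_response(answer) for answer in raw_responses]

# Editing: list the positions of answers that need to be checked or corrected.
needs_review = [index for index, code in enumerate(coded) if code is None]

# Classification and tabulation: count how many answers fall in each class.
table = Counter(code for code in coded if code is not None)

for code in sorted(table):
    print(code, table[code])
```

Here the empty answer is flagged for review rather than silently dropped, which mirrors the purpose of the editing step.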
6.2 Statistics in Research

 In many research situations, it is convenient to summarize your
data by applying descriptive statistics.
 Two categories of descriptive statistics:
I. Measures of central tendency
II. Measures of dispersion
6.3 Measures of Central Tendency
 A measure of central tendency (center) gives you a single score that
represents the general magnitude of scores in a distribution.
 This score characterizes your distribution by indicating a score
value that falls at or near the middle of the distribution.
 The most common measures of center are the mode, the median, and
the mean (also called the arithmetic average).
 Each measure of center has strengths and weaknesses.

The Mode
 The mode is simply the most frequent score in a distribution.

 To obtain the mode, count the number of scores falling into


each response category.

 The response category with the highest frequency is the mode.

 The mode of the distribution 1, 2, 4, 6, 4, 3, 4 is 4, because 4 is


the most frequent score.

 No mode exists for a distribution in which all the scores are


different.

 Some distributions, called bimodal distributions, have two


modes.
 Although the mode is simple to calculate, it is limited

because it does not take into account the values of scores

outside of the most frequent score.

 The only information yielded by the mode is the most

frequent score.

 Consequently, two distributions may have similar modes

and yet look very different.

 Looking only at the mode, you might conclude that the two

distributions are similar.

 Obviously, this conclusion is incorrect.
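
As a minimal sketch, the mode can be found by counting frequencies with Python's standard library. The first list is the distribution used in the mode example above; the second list, `bimodal_scores`, is a hypothetical illustration of a bimodal distribution.

```python
from collections import Counter
from statistics import multimode

scores = [1, 2, 4, 6, 4, 3, 4]

# Count the number of scores falling into each response category.
frequencies = Counter(scores)

# multimode returns every value tied for the highest frequency.
modes = multimode(scores)
print(frequencies[4])  # 3
print(modes)           # [4]

# Hypothetical bimodal distribution: two scores share the top frequency.
bimodal_scores = [1, 2, 2, 3, 4, 4, 5]
bimodal_modes = multimode(bimodal_scores)
print(bimodal_modes)   # [2, 4]
```

Note that when all scores are different, `multimode` returns every score, which corresponds to the "no mode" case described above.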


The Median
 The median is the middle score in a distribution.

 To calculate the median, follow these steps:

1. Order the scores in your distribution from lowest to

highest (or highest to lowest, it does not matter).

2. Count down through the distribution and find the score

in the middle of the distribution.

 The median of the distribution 7, 5, 2, 9, 4, 8, 1 is 5.
 The ordered distribution is 1, 2, 4, 5, 7, 8, 9, and 5 is the
middle score.
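
The two steps can be written as a short Python function. The function name `median_of` is illustrative. The slides only cover an odd number of scores; for an even number, this sketch assumes the common convention of averaging the two middle scores.

```python
def median_of(scores):
    """Return the middle score of a distribution."""
    if not scores:
        raise ValueError("median of an empty distribution is undefined")
    ordered = sorted(scores)          # Step 1: order the scores.
    n = len(ordered)
    middle = n // 2                   # Step 2: locate the middle position.
    if n % 2 == 1:
        return ordered[middle]
    # Assumed convention for an even count: average the two middle scores.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median_of([7, 5, 2, 9, 4, 8, 1]))  # 5
```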
 The median takes more information into account
than the mode.
 However, it is still a rather insensitive measure of
center because it does not take into account the
magnitudes of the scores above and below the
median.
 As with the mode, two distributions can have the
same median and yet be very different in character.
 For this reason, the median is used primarily when
the mean is not a good choice.
The Mean
 The mean is the most sensitive measure of center because

it takes into account all scores in a distribution when it is

calculated.

 The computational formula for the mean is the sum of the

scores (ΣX) divided by the number of scores in the

distribution (n).

 The major advantage of the mean is that, unlike the mode

and the median, its value is directly affected by the

magnitude of each score in the distribution.


 Assume that distribution A contains the scores 4, 6, 3, 8, 9,

2, 3, and distribution B contains the scores 4, 6, 3, 8, 9, 2, 43.

Although the two distributions differ by only a single

score (3 versus 43), they differ greatly in their means (5

versus 10.7, respectively).

 The mean of 5 appears to be more representative of the

first distribution than the mean of 10.7 is of the second.

 The median is a better measure of center for the second

distribution. The medians of the two distributions are 4

and 6, respectively.
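
The comparison between distributions A and B can be checked with a few lines of Python using the standard `statistics` module. The variable names are illustrative.

```python
from statistics import mean, median

distribution_a = [4, 6, 3, 8, 9, 2, 3]
distribution_b = [4, 6, 3, 8, 9, 2, 43]

# Mean = sum of the scores divided by the number of scores.
mean_a = mean(distribution_a)
mean_b = mean(distribution_b)

# The median ignores how far the extreme score lies from the rest.
median_a = median(distribution_a)
median_b = median(distribution_b)

print(mean_a, round(mean_b, 1))  # 5 10.7
print(median_a, median_b)        # 4 6
```

Changing a single score from 3 to 43 more than doubles the mean but moves the median only from 4 to 6.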
Choosing a Measure of Center
 One of the first things you should do when summarizing your
data is to generate a frequency distribution of the scores.

 If your scores are normally distributed (or at least nearly
normally distributed), then the mean, median, and mode will fall
at the same point in the middle of the distribution.

 When your scores are normally distributed, use the mean as your
measure of center because it is based on the most information.
 As your distribution deviates from normality, the
mean becomes a less representative measure of
center.

 The two graphs show the relationship between the


three measures of center with a positively skewed
distribution and a negatively skewed distribution.
 In a negatively skewed distribution, the mean underestimates the center.

 Conversely, in a positively skewed distribution, the mean overestimates

the center.

 Because the median is much less affected by skew, it provides a more

representative picture of the distribution’s center than does the mean


and should be preferred whenever your distribution is strongly skewed.

 Neither the mean nor the median will accurately represent the center if

your distribution is bimodal.

 With a bimodal distribution, both measures of center underrepresent
one large cluster of scores and overrepresent the other.


6.4 Measures of Dispersion
 If you look again at some of the sample distributions described
thus far, you will notice that the scores in the distributions differ
from each other.

 A measure of spread provides information that helps you to


interpret your data.

 Two sets of scores may have highly similar means yet very
different distributions, as the following example of two baseball
players’ yearly batting averages illustrates.
 The distributions of the two players’ averages are as follows:
• Player 1: .260, .397, .200, .195
• Player 2: .263, .267, .259, .263
 Each player has a .263 batting average over 4 years.


 Which of these two players would you prefer to have
on your team? Most likely, you would pick player 2
because he is more “consistent” than player 1.

 This simple example illustrates an important point


about descriptive statistics.

 When you are evaluating your data, you should take


into account both the center and the spread of the
scores.

 Measures of spread: the range, the variance, and the


standard deviation.
The Range
 The range is the simplest and least informative measure of spread.

 To calculate the range, you simply subtract the lowest

score from the highest score.

 In the baseball example, the range for player 1 is .202, and

the range for player 2 is .008.

 Compare the following two distributions of scores: 1, 2, 3, 4,

5, 6 and 1, 2, 3, 4, 5, 31. The range for the first distribution is

5, and the range for the second is 30.

 The two ranges are highly discrepant despite the fact that the two distributions differ in only a single score.
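
A minimal Python sketch of the range calculation, applied to the batting averages and to the two distributions just compared. The function name `value_range` is illustrative, and the results are rounded only to hide floating-point noise.

```python
def value_range(scores):
    """Range = highest score minus lowest score."""
    return max(scores) - min(scores)


player_1 = [0.260, 0.397, 0.200, 0.195]
player_2 = [0.263, 0.267, 0.259, 0.263]

print(round(value_range(player_1), 3))  # 0.202
print(round(value_range(player_2), 3))  # 0.008

# A single extreme score is enough to change the range drastically.
print(value_range([1, 2, 3, 4, 5, 6]))   # 5
print(value_range([1, 2, 3, 4, 5, 31]))  # 30
```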
The Variance
 The variance is the average squared deviation

from the mean.

 The defining formula is
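
The formula itself did not survive in this copy of the slides. Read literally, "the average squared deviation from the mean" corresponds to the first form below. Many textbooks instead divide by n − 1 when the variance is estimated from a sample, which is the second form. The symbols s² and X̄ are assumed notation; X and n are used as in the formula for the mean.

```latex
s^2 = \frac{\sum (X - \bar{X})^2}{n}
\qquad \text{or, as a sample estimate,} \qquad
s^2 = \frac{\sum (X - \bar{X})^2}{n - 1},
\qquad \text{where } \bar{X} = \frac{\sum X}{n}
```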


The Standard Deviation

 Although the variance is frequently used as a

measure of spread in certain statistical

calculations, it does have the disadvantage of

being expressed in units different from those of

the summarized data.

 However, the variance can be easily converted into a measure of
spread (s) expressed in the same unit of measurement as the
original scores: take the square root of the variance.
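
As a sketch, Python's `statistics` module computes both quantities. `pvariance` and `pstdev` divide by n, while `variance` and `stdev` divide by n − 1; which of the two the slides intend is not stated here, so both are shown. The data are the two players' batting averages from the dispersion example.

```python
from math import isclose
from statistics import pstdev, pvariance, stdev, variance

player_1 = [0.260, 0.397, 0.200, 0.195]
player_2 = [0.263, 0.267, 0.259, 0.263]

for name, scores in (("Player 1", player_1), ("Player 2", player_2)):
    # Variance is in squared units; the standard deviation is its square root
    # and is expressed in the same unit as the original scores.
    print(name, "divide by n:    ", pvariance(scores), pstdev(scores))
    print(name, "divide by n - 1:", variance(scores), stdev(scores))

# The standard deviation squared gives back the variance.
same_units_check = isclose(pstdev(player_1) ** 2, pvariance(player_1))
```

Both players have the same mean, but player 1's standard deviation is far larger, which quantifies the "consistency" argument made earlier.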


6.5 Measures of Asymmetry (Skewness)
 Skewness means lack of symmetry.
 In a skewed distribution, the mean and the median are
pulled away from the mode.
 Mean, median and mode are not equal.

 A skewed distribution is an asymmetrical distribution.

 It has a long tail on one side and short tail on the other

side.
 Test of skewness
 To test whether a distribution is skewed, check the following.
A distribution is skewed if:
1. The mean, median, and mode are not equal.


 Shape can be described by degree of asymmetry (i.e.,

skewness).

◦ mean > median: positive (right) skewness
◦ mean = median: symmetric (zero skewness)
◦ mean < median: negative (left) skewness

 Positive skewness can arise when the mean is increased by

some unusually high values.

 Negative skewness can arise when the mean is decreased by

some unusually low values.
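
The mean-versus-median rule can be sketched as a small Python function. This is a rule of thumb rather than a formal skewness statistic, and the function name `skew_direction`, the tolerance, and the third test list are assumptions made for illustration.

```python
from math import isclose
from statistics import mean, median


def skew_direction(scores, tolerance=1e-9):
    """Classify a distribution by comparing its mean with its median."""
    center_mean = mean(scores)
    center_median = median(scores)
    if isclose(center_mean, center_median, abs_tol=tolerance):
        return "symmetric"
    return "positive" if center_mean > center_median else "negative"


# One unusually high value (43) pulls the mean above the median.
print(skew_direction([4, 6, 3, 8, 9, 2, 43]))   # positive
print(skew_direction([1, 2, 3, 4, 5]))          # symmetric
# Hypothetical data: one unusually low value (1) pulls the mean down.
print(skew_direction([1, 40, 41, 42, 43]))      # negative
```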


6.6 Measures of Relationship
 In some cases, you may want to evaluate the direction and degree of
relationship (correlation) between the scores in two distributions.
 For this purpose, you must use a measure of association.
 The most widely used measure of association is the Pearson product-
moment correlation coefficient, or Pearson r.
 The Pearson correlation coefficient provides an index of the direction
and magnitude of the relationship between two sets of scores.
 The value of Pearson r can range from +1 through 0 to −1. The sign of the
coefficient tells you the direction of the relationship.
 A positive correlation indicates a direct relationship.
 A negative correlation indicates an inverse relationship.
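
A minimal sketch of Pearson r in Python, using the standard definition: the sum of the products of deviations from the two means, divided by the square root of the product of the two sums of squared deviations. The function name and the small data sets are hypothetical and were chosen so that the direct and inverse cases are easy to see.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two sets of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sets with at least two scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of the deviations from each mean.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    squares_x = sum((x - mean_x) ** 2 for x in xs)
    squares_y = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if either set has no spread at all.
    return cross / sqrt(squares_x * squares_y)


x = [1, 2, 3, 4, 5]            # hypothetical scores
y_direct = [2, 4, 6, 8, 10]    # rises with x: direct relationship
y_inverse = [10, 8, 6, 4, 2]   # falls as x rises: inverse relationship

print(pearson_r(x, y_direct))   # 1.0
print(pearson_r(x, y_inverse))  # -1.0
```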
6.7 Simple Regression Analysis
 Simple regression analysis is a quantitative research
method used when a study involves modelling and analyzing
the relationship between a dependent variable and a single
independent variable.
 In simple terms, regression analysis is a quantitative
method used to test the nature of relationships between a
dependent variable and one or more independent
variables.
 The basic form of regression models includes unknown
parameters (β), independent variables (X), and the
dependent variable (Y).
 A regression model specifies the relation of the dependent
variable (Y) to a function of the independent variables (X) and
the unknown parameters (β):
 Y ≈ f (X, β)

 A regression equation can be used to predict the value of ‘y’
for a given value of ‘x’, where ‘x’ and ‘y’ are two sets of
measures from a sample of size ‘n’. The formulae for the
regression equation would be
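
The formulae did not survive in this copy of the slides. A standard least-squares form for a straight-line model, which uses the same column sums (Σx, Σy, Σx², Σxy) tabulated in the example that follows, is sketched below. The symbols β₀ (intercept) and β₁ (slope) are assumed notation for the unknown parameters β.

```latex
\hat{y} = \beta_0 + \beta_1 x, \qquad
\beta_1 = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left( \sum x \right)^2}, \qquad
\beta_0 = \bar{y} - \beta_1 \bar{x}
```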


Simple Regression Example
The following data are diastolic blood
pressure (DBP) measurements taken at
different times after an intervention for n =
5 persons. For each person, the data
available include the time of the
measurement and the DBP level. Of interest
is the relationship between these two
variables.

Patient   Time (x)   x²     DBP (y)   y²       xy
1         0          0      72        5,184    0
2         5          25     66        4,356    330
3         10         100    70        4,900    700
4         15         225    64        4,096    960
5         20         400    66        4,356    1,320
Sum       50         750    338       22,892   3,310
Mean      10                67.6
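
As a hedged sketch, the least-squares line for this example can be computed in Python from the column sums. The slope and intercept formulas are the standard ones for a straight-line fit and are an assumption about what the missing slides showed. The DBP value for patient 5 is taken as 66, which is what the listed y² = 4,356 and xy = 1,320 imply.

```python
times = [0, 5, 10, 15, 20]     # x: time of measurement
dbp = [72, 66, 70, 64, 66]     # y: DBP level (fifth value implied by y² and xy)

n = len(times)
sum_x = sum(times)
sum_y = sum(dbp)
sum_x2 = sum(x * x for x in times)
sum_xy = sum(x * y for x, y in zip(times, dbp))

# Standard least-squares estimates for the line y = intercept + slope * x.
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
intercept = sum_y / n - slope * sum_x / n


def predict(time):
    """Predicted DBP at a given time according to the fitted line."""
    return intercept + slope * time


print(sum_x, sum_x2, sum_y, sum_xy)            # 50 750 338 3310
print(round(slope, 2), round(intercept, 1))    # -0.28 70.4
print(round(predict(10), 1))                   # 67.6
```

Under these assumptions the fitted line is approximately DBP = 70.4 − 0.28 × time, so the predicted DBP falls slightly as time increases, and the line passes through the point of means (10, 67.6).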
Thank You
