School of Electrical Engineering and Computing
Department of Electronics and Communication Engineering
Engineering Research and Development Methodology
By Demissie Jobir Gelmecha (PhD.)
4/25/2021
Chapter 6: Processing and Analysis of Data
6.1 Elements/Types of Analysis
6.2 Statistics in Research
6.3 Measures of Central Tendency
6.4 Measures of Dispersion
6.5 Measures of Asymmetry (Skewness)
6.6 Measures of Relationship
6.7 Simple Regression Analysis
6.1 Processing Operations
Before you can interpret your data, you must first organize
and summarize them.
How you organize your data depends on your research design.
1. Editing: the process of examining the collected raw data
(especially in surveys) to detect errors and omissions and to
correct them when possible.
2. Coding: the process of assigning numerals or other symbols
to answers so that responses can be put into a limited number
of categories or classes.
3. Classification: most research studies produce a large volume
of raw data, which must be reduced to homogeneous groups
if we are to find meaningful relationships.
4. Tabulation: once a mass of data has been assembled, the
researcher must arrange it in a concise and logical order.
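The four operations above can be sketched in a few lines of Python. This is a minimal illustration only: the survey answers, the codebook, and the missing-value code 9 are hypothetical and do not come from the course material.

```python
from collections import Counter

# Hypothetical raw survey answers (illustrative data, not from the slides).
raw_answers = ["Yes", "no", " yes ", "", "No", "YES", "n/a", "no"]

# Hypothetical coding scheme: each answer category gets a numeral.
CODEBOOK = {"yes": 1, "no": 2}
MISSING = 9  # assumed code for omitted or unusable answers


def edit(answer):
    """Editing: remove stray spaces and case differences from one answer."""
    return answer.strip().lower()


def code(answer):
    """Coding: map an edited answer to its numeral, or to MISSING."""
    return CODEBOOK.get(edit(answer), MISSING)


codes = [code(a) for a in raw_answers]

# Classification and tabulation: group identical codes and count them.
table = Counter(codes)

for value in sorted(table):
    print(value, table[value])
```

The resulting frequency table is the kind of concise summary that the later descriptive statistics are computed from.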
6.2 Statistics in Research
In many research situations, it is convenient
to summarize your data by applying
descriptive statistics.
Two categories of descriptive statistics:
I. Measures of central tendency
II. Measures of dispersion
6.3 Measures of Central Tendency
A measure of central tendency (or measure of center) gives you a
single score that represents the general magnitude of scores in a
distribution.
This score characterizes your distribution by
indicating a score value that falls at or near the
middle of the distribution.
The most common measures of center are the mode,
the median, and the mean (also called the
arithmetic average).
Each measure of center has strengths and weaknesses.
The Mode
The mode is simply the most frequent score in a distribution.
To obtain the mode, count the number of scores falling into
each response category.
The response category with the highest frequency is the mode.
The mode of the distribution 1, 2, 4, 6, 4, 3, 4 is 4, because 4 is
the most frequent score.
No mode exists for a distribution in which all the scores are
different.
Some distributions, called bimodal distributions, have two
modes.
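The counting procedure for the mode can be sketched as follows. The function name `modes` is an illustrative choice, and returning an empty list when every score is different is an assumption made here to match the "no mode exists" convention stated above.

```python
from collections import Counter


def modes(scores):
    """Return the most frequent score(s) as a sorted list.

    An empty list means no mode exists (all scores are different).
    Two values in the list indicate a bimodal distribution.
    """
    if not scores:
        return []
    counts = Counter(scores)            # frequency of each response category
    highest = max(counts.values())      # the highest frequency found
    if highest == 1 and len(counts) > 1:
        return []                       # every score occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 2, 4, 6, 4, 3, 4]))  # [4]
print(modes([1, 2, 3]))              # []
print(modes([1, 1, 2, 5, 5]))        # [1, 5]
```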
Although the mode is simple to calculate, it is limited
because it does not take into account the values of scores
outside of the most frequent score.
The only information yielded by the mode is the most
frequent score.
Consequently, two distributions may have similar modes
and yet look very different.
Looking only at the mode, you might conclude that the two
distributions are similar.
Obviously, this conclusion is incorrect.
The Median
The median is the middle score in a distribution.
To calculate the median, follow these steps:
1. Order the scores in your distribution from lowest to
highest (or highest to lowest, it does not matter).
2. Count down through the distribution and find the score
in the middle of the distribution.
For example, the median of the distribution 7, 5, 2, 9, 4, 8, 1 is 5:
the ordered distribution is 1, 2, 4, 5, 7, 8, 9, and 5 is the
middle score.
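The two steps can be written directly in Python. This is a sketch: averaging the two middle scores when the number of scores is even is a common convention assumed here, since the steps above only cover the odd case.

```python
def median(scores):
    """Return the middle score of a distribution."""
    if not scores:
        raise ValueError("median requires at least one score")
    ordered = sorted(scores)      # step 1: order the scores
    n = len(ordered)
    middle = n // 2               # step 2: locate the middle position
    if n % 2 == 1:
        return ordered[middle]
    # Assumed convention for an even count: average the two middle scores.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([7, 5, 2, 9, 4, 8, 1]))  # 5
```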
The median takes more information into account
than the mode.
However, it is still a rather insensitive measure of
center because it does not take into account the
magnitudes of the scores above and below the
median.
As with the mode, two distributions can have the
same median and yet be very different in character.
For this reason, the median is used primarily when
the mean is not a good choice.
The Mean
The mean is the most sensitive measure of center because
it takes into account all scores in a distribution when it is
calculated.
The computational formula for the mean is the sum of the
scores (ΣX) divided by the number of scores in the
distribution (n).
The major advantage of the mean is that, unlike the mode
and the median, its value is directly affected by the
magnitude of each score in the distribution.
Assume that distribution A contains the scores 4, 6, 3, 8, 9,
2, 3, and distribution B contains the scores 4, 6, 3, 8, 9, 2, 43.
Although the two distributions differ by only a single
score (3 versus 43), they differ greatly in their means (5
versus 10.7, respectively).
The mean of 5 appears to be more representative of the
first distribution than the mean of 10.7 is of the second.
The median is a better measure of center for the second
distribution. The medians of the two distributions are 4
and 6, respectively.
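The comparison of distributions A and B can be checked with a short script. The helper `mean` below simply implements ΣX / n; the variable names are illustrative.

```python
from statistics import median


def mean(scores):
    """Sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)


dist_a = [4, 6, 3, 8, 9, 2, 3]
dist_b = [4, 6, 3, 8, 9, 2, 43]   # differs from A only in the last score

print(mean(dist_a), median(dist_a))            # 5.0 4
print(round(mean(dist_b), 1), median(dist_b))  # 10.7 6
```

Changing one score from 3 to 43 more than doubles the mean, while the median moves only from 4 to 6.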
Choosing a Measure of Center
One of the first things you should do when summarizing your
data is to generate a frequency distribution of the scores.
If your scores are normally distributed (or at least nearly
normally distributed), then the mean, median, and mode will fall
at the same point in the middle of the distribution.
When your scores are normally distributed, use the mean as your
measure of center because it is based on the most information.
As your distribution deviates from normality, the
mean becomes a less representative measure of
center.
The two graphs show the relationship between the
three measures of center with a positively skewed
distribution and a negatively skewed distribution.
In a negatively skewed distribution, the mean underestimates the center.
Conversely, in a positively skewed distribution, the mean overestimates
the center.
Because the median is much less affected by skew, it provides a more
representative picture of the distribution’s center than does the mean
and should be preferred whenever your distribution is strongly skewed.
Neither the mean nor the median will accurately represent the center if
your distribution is bimodal.
With a bimodal distribution, both measures of center underrepresent one
large cluster of scores and overrepresent the other.
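A small made-up example shows why neither measure works for a bimodal distribution. The scores below are hypothetical and were chosen to form two clusters, one low and one high.

```python
from statistics import mean, median

# Hypothetical bimodal scores: one cluster near 2, another near 9.
scores = [1, 2, 2, 3, 8, 9, 9, 10]

center_mean = mean(scores)
center_median = median(scores)

# Distance from the mean to the closest actual score.
nearest_gap = min(abs(s - center_mean) for s in scores)

print(center_mean, center_median, nearest_gap)  # 5.5 5.5 2.5
```

Both measures land at 5.5, in the empty region between the clusters, so neither describes a typical score.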
6.4 Measures of Dispersion
If you look again at some of the sample distributions described
thus far, you will notice that the scores in the distributions differ
from each other.
A measure of spread provides information that helps you to
interpret your data.
Two sets of scores may have highly similar means yet very
different distributions, as the following example illustrates.
Consider the yearly batting averages of two baseball players over 4 years:
• Player 1: .260, .397, .200, .195
• Player 2: .263, .267, .259, .263
Each player has a mean batting average of .263 over the 4 years.
Which of these two players would you prefer to have
on your team? Most likely, you would pick player 2
because he is more “consistent” than player 1.
This simple example illustrates an important point
about descriptive statistics.
When you are evaluating your data, you should take
into account both the center and the spread of the
scores.
Three common measures of spread are the range, the variance, and the
standard deviation.
The Range
The range is the simplest and least informative measure of spread.
To calculate the range, you simply subtract the lowest
score from the highest score.
In the baseball example, the range for player 1 is .202, and
the range for player 2 is .008.
Compare the following two distributions of scores: 1, 2, 3, 4,
5, 6 and 1, 2, 3, 4, 5, 31. The range for the first distribution is
5, and the range for the second is 30.
The two ranges are highly discrepant despite the fact that
the two distributions differ in only a single score.
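The range calculations above can be reproduced as follows. The function name `score_range` is an illustrative choice, and the batting results are rounded to three decimals to hide floating-point noise.

```python
def score_range(scores):
    """Highest score minus lowest score."""
    return max(scores) - min(scores)


player_1 = [0.260, 0.397, 0.200, 0.195]
player_2 = [0.263, 0.267, 0.259, 0.263]

print(round(score_range(player_1), 3))    # 0.202
print(round(score_range(player_2), 3))    # 0.008

print(score_range([1, 2, 3, 4, 5, 6]))    # 5
print(score_range([1, 2, 3, 4, 5, 31]))   # 30
```

Because only the two extreme scores enter the calculation, a single unusual score (31) is enough to change the range from 5 to 30.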
The Variance
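The variance is the average squared deviation
from the mean.
For a sample of n scores with mean $\bar{X}$, the defining formula is
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
Dividing by n instead of n − 1 gives the population variance.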
The Standard Deviation
Although the variance is frequently used as a
measure of spread in certain statistical
calculations, it does have the disadvantage of
being expressed in units different from those of
the summarized data.
However, the variance can be easily converted
into a measure of spread (s) expressed in the same
unit of measurement as the original scores: to
obtain the standard deviation, take the square root of the
variance, $s = \sqrt{s^2}$.
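As a sketch, the variance and standard deviation of the two batting-average distributions can be computed as below. The sample form with n − 1 in the denominator is an assumption made here; dividing by n would give the population variance instead.

```python
import math


def variance(scores):
    """Sample variance: sum of squared deviations from the mean over n - 1."""
    n = len(scores)
    if n < 2:
        raise ValueError("sample variance requires at least two scores")
    m = sum(scores) / n
    return sum((x - m) ** 2 for x in scores) / (n - 1)   # assumed n - 1 form


def standard_deviation(scores):
    """Square root of the variance, in the same units as the scores."""
    return math.sqrt(variance(scores))


player_1 = [0.260, 0.397, 0.200, 0.195]
player_2 = [0.263, 0.267, 0.259, 0.263]

print(round(standard_deviation(player_1), 3))
print(round(standard_deviation(player_2), 3))
```

Player 1's standard deviation comes out far larger than player 2's, which puts a number on the intuition that player 2 is the more consistent hitter.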
6.5 Measures of Asymmetry (Skewness)
Skewness means lack of symmetry.
In skewed distribution, the mean and the median are
pulled away from the mode.
Mean, median and mode are not equal.
A skewed distribution is an asymmetrical distribution.
It has a long tail on one side and short tail on the other
side.
Test of skewness
To test whether a distribution is skewed, compare its measures of
center: a distribution is skewed if the mean, median, and mode are
not equal.
The shape can be described by the direction of asymmetry (i.e.,
skewness):
◦ mean > median: positive (right) skewness
◦ mean = median: symmetric (zero skewness)
◦ mean < median: negative (left) skewness
Positive skewness can arise when the mean is increased by
some unusually high values.
Negative skewness can arise when the mean is decreased by
some unusually low values.
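The mean-versus-median rule can be turned into a small check. This is a rough sketch only: the function name `skew_direction` and the sample scores are illustrative, and comparing the two values for exact equality is a simplification that real data would usually need a tolerance for.

```python
from statistics import mean, median


def skew_direction(scores):
    """Classify skewness by comparing the mean with the median."""
    m, md = mean(scores), median(scores)
    if m > md:
        return "positive"    # mean pulled up by unusually high values
    if m < md:
        return "negative"    # mean pulled down by unusually low values
    return "symmetric"


print(skew_direction([4, 6, 3, 8, 9, 2, 43]))   # positive
print(skew_direction([1, 20, 21, 22, 23]))      # negative
print(skew_direction([1, 2, 3, 4, 5]))          # symmetric
```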
6.6 Measures of Relationship
In some cases, you may want to evaluate the direction and degree of
relationship (correlation) between the scores in two distributions.
For this purpose, you must use a measure of association.
The most widely used measure of association is the Pearson product-
moment correlation coefficient, or Pearson r.
The Pearson correlation coefficient provides an index of the direction
and magnitude of the relationship between two sets of scores.
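The value of Pearson r can range from +1 through 0 to −1. The sign of the
coefficient tells you the direction of the relationship, and its absolute
value tells you the magnitude: values near 1 indicate a strong linear
relationship, and values near 0 indicate a weak one.
A positive correlation indicates a direct relationship.
A negative correlation indicates an inverse relationship.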
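A minimal sketch of the Pearson r calculation is given below, using the definitional form: the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations. The paired scores are hypothetical and the function name `pearson_r` is an illustrative choice.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two paired score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two paired lists with at least two scores")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # cross-products
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired scores (illustrative data only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(round(pearson_r(x, y), 3))   # 0.775
```

The positive sign indicates a direct relationship; reversing the order of `y` would produce a negative value of the same size.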
6.7 Simple Regression Analysis
Simple regression analysis is a quantitative research
method used to model and analyze the relationship between a
dependent variable and a single independent variable.
More generally, regression analysis is a quantitative
method used to test the nature of relationships between a
dependent variable and one or more independent
variables.
The basic form of regression models includes unknown
parameters (β), independent variables (X), and the
dependent variable (Y).
A regression model specifies the dependent variable (Y) as a
function of the independent variables (X) and the unknown
parameters (β):
Y ≈ f (X, β)
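The regression equation can be used to predict the value of
y when the value of x is given, where x and y are paired
measures from a sample of size n. For a straight-line fit, the
least-squares regression equation is
$$\hat{y} = a + bx, \qquad b = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}, \qquad a = \bar{y} - b\,\bar{x}$$
where a (intercept) and b (slope) are the estimates of the unknown parameters β.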
Simple Regression Example
The following data are diastolic blood
pressure (DBP) measurements taken at
different times after an intervention for n =
5 persons. For each person, the data
available include the time of the
measurement and the DBP level. Of interest
is the relationship between these two
variables.
Patient   Time (x)   x²     DBP (y)   y²       xy
1         0          0      72        5,184    0
2         5          25     66        4,356    330
3         10         100    70        4,900    700
4         15         225    64        4,096    960
5         20         400    66        4,356    1,320
Sum       50         750    338       22,892   3,310
Mean      10                67.6
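As a sketch, the least-squares line for the DBP example can be computed from the column sums. The slope and intercept below use the standard computational formulas for simple linear regression. The DBP value of 66 for patient 5 is inferred from the column sum of 338, and the names `slope`, `intercept`, and `predict` are illustrative.

```python
time = [0, 5, 10, 15, 20]          # x: time of measurement
dbp = [72, 66, 70, 64, 66]         # y: DBP level (patient 5 inferred from the sum)

n = len(time)
sum_x = sum(time)                                 # 50
sum_y = sum(dbp)                                  # 338
sum_x2 = sum(x * x for x in time)                 # 750
sum_xy = sum(x * y for x, y in zip(time, dbp))    # 3310

# Slope: corrected sum of cross-products over corrected sum of squares of x.
slope = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
# Intercept: mean of y minus slope times mean of x.
intercept = sum_y / n - slope * sum_x / n


def predict(x):
    """Predicted DBP at time x from the fitted line."""
    return intercept + slope * x


print(round(slope, 2), round(intercept, 2))   # -0.28 70.4
print(round(predict(12), 2))                  # 67.04
```

Under these assumptions the fitted line is ŷ = 70.4 − 0.28x, so the predicted DBP falls by 0.28 for each unit increase in time.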
Thank You