
Research Methodology

Data Analysis

April 2023
Data Analysis
• The data, after collection, have to be processed and analysed in
accordance with the outline laid down for the purpose at the time of
developing the research plan.
• This is essential for ensuring that we have all relevant data for
making the contemplated comparisons and analyses.
• Processing implies editing, coding, classification and tabulation of
collected data so that they are amenable to analysis.
• Data analysis is the process of inspecting, transforming, and
modeling data with the goal of discovering useful information,
suggesting conclusions, and supporting decision making.
Data Processing and Presentation
Editing
–Editing of data is a process of examining the collected raw
data (especially in surveys) to detect errors and omissions and
to correct them where possible.
–Editing is done to ensure that the data are accurate,
consistent with other facts gathered, uniformly entered, as
complete as possible, and well arranged to facilitate coding
and tabulation.
Coding
– Coding refers to the process of assigning numerals or
other symbols to answers so that responses can be put
into a limited number of categories or classes.

– Such classes/categories must possess the characteristic


of exhaustiveness and also that of mutual exclusively
(which means that a specific answer can be placed in
one and only one cell in a given category set).
Coding
– Another rule to be observed is that of uni-
dimensionality, which means that every class is
defined in terms of only one concept (it measures a
single attribute).
– Coding decisions should usually be taken at the
design stage of data collection.
Classification
– Classification is the process of arranging data in groups or
classes on the basis of common characteristics.
– Classification can be one of the following two types,
depending upon the nature of the phenomenon involved:
—) Classification According to Attributes
—) Classification According to Class-intervals
- Classification According to Attributes: data are classified on the
basis of common characteristics which can be descriptive (such
as literacy/educational level, sex, etc.).
- Descriptive characteristics refer to qualitative phenomena
which cannot be measured quantitatively.
- Such data are known as statistics of attributes, and their
classification is said to be classification according to attributes.
Classification
– Classification According to Class-intervals: Unlike descriptive
characteristics, numerical characteristics refer to
quantitative phenomena which can be measured in
statistical units.
– Numerical data relating to income, production, age,
weight, etc., come under this category.
– Such data are known as statistics of variables and are
classified on the basis of class intervals.
Tabulation
– When a mass of data (‘big data’) has been assembled, it
becomes necessary for the researcher to arrange it in
some kind of concise and logical order. This procedure is
referred to as tabulation.
– Thus, tabulation is the process of summarizing raw
data and displaying it in compact form (i.e.,
in the form of statistical tables) for further analysis.
– In a broader sense, tabulation is an orderly
arrangement of data in columns and rows.
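
To make this concrete, here is a minimal Python sketch that tabulates
raw categorical responses into a simple frequency table (the variable
name and response values are hypothetical examples, not from the
slides):

```python
# Minimal sketch: tabulating raw responses into a frequency table.
# The variable name and response values are hypothetical examples.
from collections import Counter

education_level = ["primary", "secondary", "tertiary", "secondary",
                   "primary", "secondary", "tertiary", "secondary"]

table = Counter(education_level)        # counts per category

print(f"{'Education level':<16}{'Frequency':>10}")
for category, count in table.most_common():
    print(f"{category:<16}{count:>10}")
```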
Data Analysis

• Two types of data analysis:
—) Quantitative
—) Qualitative
Data Analysis

• Quantitative Data Analysis
–The two most commonly used quantitative data analysis
methods are:
i. Descriptive Statistics
ii. Inferential Statistics
–Descriptive statistics are used to describe, summarize, or
explain a given set of data and to find patterns in it.
–Inferential statistics are used to infer characteristics of the
population from the sample.
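
As an illustration of the difference, the sketch below (standard
library only) first summarizes a sample and then makes an inference
about the population mean. It reuses the sample scores that appear
later in these slides and assumes a simple normal-approximation
interval rather than a t-based one, purely for illustration:

```python
# Minimal sketch contrasting descriptive and inferential statistics.
# Uses a normal-approximation 95% interval as a simplifying assumption.
import math
import statistics

scores = [32, 32, 35, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45]

# Descriptive: summarize the sample itself
mean = statistics.mean(scores)      # 38.0
stdev = statistics.stdev(scores)    # sample standard deviation

# Inferential: estimate the population mean from the sample
margin = 1.96 * stdev / math.sqrt(len(scores))
print(f"sample mean = {mean:.1f}, sample stdev = {stdev:.2f}")
print(f"approx. 95% CI for the population mean: "
      f"({mean - margin:.1f}, {mean + margin:.1f})")
```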
Descriptive analysis
• Descriptive analysis describes the sample data through several
characteristics of a particular array of measurements.

Measures of Central Tendency
• Mean = ∑Xi / n
• Median = middle value in an array of sequentially ordered values
(with repeated values kept as they occur)
• Mode = most frequently occurring value

Measures of Dispersion
• Distributions describe the composition of the data set under
consideration and define the nature of the data at hand. Typical
measures include:
• Mean Deviation = ∑|Xi – Mean| / n (average absolute deviation of
individual scores in the distribution from the mean)
• Variance = ∑(Xi – Mean)² / (n – 1) for sample data, or
∑(Xi – Mean)² / N for a population (mean of squared deviations)
• Standard Deviation = √Variance (square root of the variance)
Note: A measure of central tendency (particularly the mean) should be
reported together with a measure of dispersion (particularly the
standard deviation) to describe data.

Measures of Association
• Chi-square tests – measure goodness of fit (not causality) using
frequency distributions
• Correlation coefficients – show the type and strength of the
relationship between items
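
A minimal Python sketch (standard library only, using the sample
scores from the examples that follow) computes the measures listed
above:

```python
# Minimal sketch of the measures listed above, standard library only.
import statistics

scores = [32, 32, 35, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45]

# Central tendency
mean = statistics.mean(scores)      # 38.0
median = statistics.median(scores)  # 38.5
mode = statistics.mode(scores)      # 39

# Dispersion
mean_dev = sum(abs(x - mean) for x in scores) / len(scores)
variance = statistics.variance(scores)   # divides by n - 1 (sample data)
stdev = statistics.stdev(scores)         # square root of the variance

print(mean, median, mode)
print(round(mean_dev, 2), round(variance, 2), round(stdev, 2))
```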
Descriptive Statistics

Measures of Central Tendency
– Measures of central tendency (or statistical averages) tell
the point about which the items have a tendency to cluster.
– The three most frequently used measures of central
tendency are:
—) Mean
—) Median
—) Mode
Mean
– Also known as the arithmetic average, the mean is the most
commonly used and accepted measure of central tendency.
– It should be used in the case of interval or ratio data.
– If the scores for a given sample distribution are:
32, 32, 35, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45
– the mean of the distribution will be:
[32+32+35+36+37+38+38+39+39+39+40+40+42+45]/14 = 532/14 = 38
Median
– The median is defined as the middle value in an ordered
arrangement of observations.
– The median is often used to summarize the location of a
distribution.
– Further, the median can be used with ordinal, interval,
or ratio measurements.
– If the scores for a given sample distribution are:
32, 32, 35, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45
the median will be the average of the two middle values:
(38 + 39)/2 = 38.5
Mode
– The mode can be defined as the most frequently occurring
value in a group of observations.
– If the scores for a given sample distribution are:
32, 32, 35, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45
then the mode would be 39, because a score of 39 occurs
three times, more often than any other score.
– The mode is a very good measure for ascertaining the
location of a distribution in the case of nominal data.
Measures of Dispersion
• An average fails to give any idea about the scatter of the
values of the items of a variable in the series around the
true value of the average.
• A measure of dispersion is therefore needed to specify the
spread of the distribution.
– The most frequently used statistics measuring
dispersion are:
—) Range
—) Mean deviation
—) Standard deviation
—) Variance
Range
– The range is the difference between the highest and lowest
values.
– It is based solely on the extreme values and thus cannot
truly reveal the body of the measurements.
Variance
– The variance is the average squared deviation of a random
variable from its mean.
– Squaring makes the deviations appear much larger than they
actually are; to remove this effect, they are ‘un-squared’
again.
– Taking the square root of the variance in this way gives
the standard deviation.
Standard Deviation
– The standard deviation provides the best measure of
dispersion for interval/ratio measurements and is the most
widely used statistical measure after the mean.
– The standard deviation for a sample is calculated by the
following formula:
s = √[ ∑(Xi – x̄)² / (n – 1) ]
Example:
– The owner of a café is interested in how much people
spend at her café.
– She examined 10 randomly selected customers and noted
the following amounts:
44, 50, 38, 96, 42, 47, 40, 39, 46, 50
– She calculated the mean by adding the observations and
dividing by 10 to get:
x̄ = 49.2
Example:
– Below is the table for computing the standard deviation:

X        (X – 49.2)    (X – 49.2)²
44          -5.2           27.04
50           0.8            0.64
38         -11.2          125.44
96          46.8         2190.24
42          -7.2           51.84
47          -2.2            4.84
40          -9.2           84.64
39         -10.2          104.04
46          -3.2           10.24
50           0.8            0.64
Total                    2599.60
Example:
– Hence, the sample variance is 2599.6 / 9 ≈ 289 and the
standard deviation is √289 ≈ 17.
– The mean for this example was 49.2 and the standard
deviation was about 17.
– We have:
49.2 - 17 = 32.2
49.2 + 17 = 66.2
– What this means is that most of the customers probably
spend between 32.20 and 66.20.
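
As a quick check, a minimal Python sketch (standard library only)
reproduces the numbers in this example:

```python
# Minimal sketch reproducing the café example with the standard library.
import statistics

spend = [44, 50, 38, 96, 42, 47, 40, 39, 46, 50]

mean = statistics.mean(spend)                      # 49.2
sum_sq_dev = sum((x - mean) ** 2 for x in spend)   # 2599.6 (column total)
variance = statistics.variance(spend)              # sum_sq_dev / (n - 1) ≈ 289
stdev = statistics.stdev(spend)                    # ≈ 17.0

print(mean, round(sum_sq_dev, 1), round(variance, 1), round(stdev, 1))
print(mean - stdev, mean + stdev)                  # roughly 32.2 to 66.2
```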
Skewness

[Figure: three distribution curves illustrating skewness. In a
negatively skewed distribution the mean lies below the median, which
lies below the mode; in a symmetric (not skewed, normal) distribution
the mean, median and mode coincide; in a positively skewed
distribution the mode lies below the median, which lies below the
mean.]
Research Software
– Statistical software packages are specialized computer
programs for statistical and econometric analysis.
– The most commonly used statistical packages in research
include:
—) SPSS
—) STATA
—) SAS
—) Minitab
—) NVivo
Reading Assignment
Inferential statistics
Qualitative Analysis
Quality Assurance
• Research quality is the measure of how facts, problems
and objectives are established, how the researcher
explicates the facts, and how conclusions are drawn from
the facts.
• The research design should ensure the quality of all
these processes.
Research quality by epistemological viewpoint

Validity (construct)
– Positivist: Do the measures correspond closely to reality?
– Relativist: Have a sufficient number of perspectives been included?
– Constructionist: Does the study clearly gain access to the
experiences of those in the research setting?

Reliability
– Positivist: Will the measures yield the same results on other
occasions?
– Relativist: Will similar observations be reached by other observers?
– Constructionist: Is there transparency in how sense was made from
the raw data?

Generalizability
– Positivist: To what extent does the study confirm or contradict
existing findings in the same field?
– Relativist: What is the probability that patterns observed in the
sample will be repeated in the general population?
– Constructionist: Do the concepts and constructs derived from this
study have any relevance to other settings?
Research rigor (quality): how to ensure it (e.g., in case study research)
Measures of quality (rigor) in case study research:

Construct validity
• Data triangulation (both source and collection strategy), for example:
– Documents and archival data (internal reports, minutes or
archives, annual reports, press or other secondary articles)
– Interview data (original interviews carried out by the researchers)
• Review of transcripts and drafts by peers and key informants
• Presenting cases systematically (from research question to
conclusion and vice versa, including citation of specific evidence
sources)
• Indication of data collection circumstances (explanation of how
access to the data was achieved), as well as checking the planned
circumstances of data collection against the actual procedure
(reflection on how the actual course of the research affected the
data collection process)
• Explanation of data analysis (clarification of the data analysis
procedure)
• Reflexivity (account of how the researcher's stance and the
research process have shaped the fact finding and the outcome of
the research)

Internal validity
• Theoretical framework (preferably explicitly derived) to be used
as the basis of the research process
• Pattern matching (matching the patterns identified to those
reported by other authors)
• Theory triangulation (different theoretical lenses and bodies of
literature used, either as the research framework or as a means to
interpret the findings)
• Use of different analysis techniques, such as logic models,
explanation building and rival explanations, in addition to
pattern matching

External validity
• Theory as a basis of generalization (particularly for single case
studies)
• Multiple case studies for replication
• Cross-case analysis (as applicable)
• Rationale for case study selection (explanation of why this case
study was appropriate in view of the research question)
• Details on the case study context (explanation of, e.g., the
industry context)
• Comparison with other literature/studies

Reliability
• Case study protocol (report of how the entire case study was
conducted)
• Case study database (with all available documents, interview
transcripts, archival data, etc.)
• Maintaining a chain of evidence
