Data analysis notes
Data analysis is the process of inspecting, cleaning, transforming and modeling data with the
goal of discovering useful information, suggesting conclusions and supporting decision-making.
Its purpose is to extract useful information from data and to make decisions based on that
analysis. Data analysis has multiple facets and approaches, encompassing diverse techniques
under a variety of names, and is used in business, science, and social science domains. In
today's business world, data analysis plays a role in making decisions more scientific and
helping businesses operate more effectively.
The analysis of data requires a number of closely related operations, such as establishing
categories and data dimensions, applying these categories to the raw data through coding and
tabulation, and then drawing statistical inferences. Unwieldy data must be condensed into a few
manageable groups and tables for further analysis; thus, a researcher should classify the raw
data into purposeful and usable categories.
Data analysis involves ordering and organizing raw data so that useful information can be
extracted from it. This enables one to understand what the data does and does not contain.
Data analysis can be approached in many ways, and it is easy to manipulate data during the
analysis phase to push certain conclusions or agendas. For this reason, it is important to pay
attention when data analysis is presented, and to think critically about the data and the
conclusions that were drawn.
Raw data can take a variety of forms, including measurements, survey responses, and
observations. In its raw form, this information can be incredibly useful, but also overwhelming.
Over the course of the data analysis process, the raw data is ordered in a way which will be
useful. For example, survey results may be tallied, so that people can see at a glance how many
people answered the survey, and how people responded to specific questions. Modeling the data
with mathematics and other tools can accentuate points of interest in the data, making them
easier for the researcher to see.
When people encounter summarized data and conclusions, they should view them critically.
It is important to ask where the data came from, what sampling method was used to collect it,
and how large the sample was. If the source of the data appears to have a conflict of interest
with the type of data being gathered, the results may be called into question. Likewise, data
gathered from a small sample, or from a sample that is not truly random, may be of questionable validity.
Reputable researchers always provide information about the data gathering techniques used, the
source of funding, and the point of the data collection in the beginning of the analysis so that
readers can think about this information while they review the analysis.
Analysis refers to dividing a whole into its separate components for individual examination.
Data analysis is a process for obtaining raw data and subsequently converting it into
information useful for decision-making by users. Data is collected and analyzed to answer
questions, test hypotheses, or disprove theories.
Data analysis is an important stage of the research process. It provides a summary of the process
and explores specific areas of data analysis that might be applicable to learners studying at
undergraduate and postgraduate levels. In any research, one should take some time to carefully
review all of the data collected from the experiment. Use charts and graphs to help you analyze
the data and identify patterns. Did you get the results you had expected? What did you find out from
your experiment? Then, really think about what you have discovered and use your data to help
you explain why you think certain things happened.
Research Process
Researchers who are attempting to answer a research question employ the research process.
Though presented in a linear format, in practice the process of research can be less
straightforward. That said, researchers attempt to follow the process and use it to present their
research findings in research reports and journal articles.
b) Credibility improved through long engagement with the respondents or triangulation in data
collection (internal validity).
c) Transferability achieved through a thick description of the research process to allow a reader
to see if the results can be transferred to a different setting (external validity).
d) Dependability examined through the audit trail (reliability), e.g. member checking.
e) Confirmability established through audit trail categories, e.g. raw data included, data
analysis and reduction processes described, data reconstruction and synthesis (including the
structuring of categories and themes), process notes included, and instrument development
information included.
What if you do not have a question to begin with? Exploring data without a defined question,
sometimes referred to as “data mining”, can reveal interesting patterns in the data that are
worth exploring. Regardless of what leads you to look at data, thinking about your audience
(your staff, supervisor, Board members, etc.) is helpful to shape the story and guide your
thinking about the data.
Whenever you look at data, it is important to be open to unexpected patterns, explanations, and
unusual results. Sometimes the most interesting stories to be told with data are not the ones you
set out to tell.
Data is used to describe things by assigning a value to them. The values are then organized,
processed, and presented within a given context so that they become useful. Data can take
different forms: qualitative and quantitative.
Whether the study employs secondary or primary data, the researcher should formulate
questions that can be addressed with data and collect, organize and display relevant data to
answer them.
Three key questions that should be at the back of your mind when analysing data include:
i) Is this the right data to answer the proposed research questions?
ii) Is the data well organized for the software to know the type of data being analysed such as
cross section data, time series data and panel data?
iii) What is the appropriate technique to analyse the data to provide accurate solutions?
This will guide you to:
● understand the differences among various kinds of studies and which types of inferences can legitimately be drawn from each;
● know the characteristics of well-designed studies, including the role of randomization in surveys and experiments;
● understand the meaning of measurement data and categorical data, of univariate and bivariate data, and of the term variable;
● understand histograms, parallel box plots, and scatter plots and use them to display data;
● compute basic statistics and understand the distinction between a statistic and a parameter.
Quantitative data methods for outlier detection can be used to get rid of data that appears to
have a higher likelihood of having been input incorrectly. For textual data, spell checkers can
be used to lessen the number of mis-typed words; however, it is harder to tell whether the words
themselves are correct.
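As an illustration, below is a minimal sketch of one common screening rule for quantitative outliers, the interquartile-range (Tukey) fence; the variable names, data values and 1.5 multiplier are illustrative assumptions rather than a prescribed method, and numpy is assumed to be available.

import numpy as np

def iqr_outliers(values, k=1.5):
    """Flag values lying more than k * IQR beyond the quartiles (Tukey's fence)."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical measurements containing one likely data-entry error
heights_cm = [150, 152, 149, 155, 148, 153, 510]
print(iqr_outliers(heights_cm))   # flags 510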
Types of Data
Data is mainly categorised into secondary and primary data. It can be further subcategorised into
nominal and ordinal, discrete and continuous, binary and categorical, or countable data. Table 1
summarises data types and their associated measurement levels, plus some examples. It is
important to appreciate that appropriate methods for summary and display depend on the type
of data being used. This is also true for ensuring the appropriate statistical test is employed.
a) Quantitative data
Quantitative data is that which can be easily measured and recorded in numerical form. It is
used extensively in education, in forms such as exam results, SATs results, and absence and
truancy figures. Quantitative data is collected by testing against agreed criteria, as in exams,
or by measuring, as in height or age. Often this data is expressed using percentages rather than
the actual numbers themselves.
“Quantitative data” is data that is expressed with numbers: data which can be counted, measured,
or ranked. Length, weight, age, cost, and rating scales are all examples of quantitative data.
Quantitative data can be represented visually in graphs and tables and can be statistically
analyzed.
Categorical data
Categorical data is data that has been placed into groups. An item cannot belong to more than
one group at a time. Examples of categorical data include the individual’s current living
situation, smoking status, or whether he/she is employed. As discussed in more detail later, the
type of analysis used with categorical data is the Chi-square test.
Continuous data
“Continuous data” is numerical data measured on a continuous range or scale. In continuous
data, all values are possible with no gaps in between. Examples of continuous data are a
person’s height or weight, and temperature. As discussed in more detail later, many types of
analysis can be used with continuous data, including effect size calculations.
Calculations and Summarizing Data
Often, you will need to perform calculations on your raw data in order to get the results from
which you will generate a conclusion. A spreadsheet program such as Microsoft Excel may be
a good way to perform such calculations, and then later the spreadsheet can be used to display
the results. Be sure to label the rows and columns, and don't forget to include the units of
measurement (grams, centimeters, liters, etc.).
You should have performed multiple trials of your experiment. Think about the best way to
summarize your data. Do you want to calculate the average for each group of trials, or
summarize the results in some other way such as ratios, percentages, or error and significance
for really advanced students? Or, is it better to display your data as individual data points?
Do any calculations that are necessary for you to analyze and understand the data from your
experiment.
● Use calculations from known formulas that describe the relationships you are testing.
● Pay careful attention because you may need to convert some of your units to do your
calculations correctly. All of the units for a measurement should be on the same scale
(keep L with L and mL with mL; do not mix L with mL!). A minimal example of this kind of
calculation is sketched below.
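For instance, here is a minimal, hypothetical sketch (the trial values and the unit conversion are invented for illustration) of averaging repeated trials after putting every measurement on the same scale:

# Hypothetical: three trials per group, with one value recorded in mL and converted to L first
trials_litres = {
    "group_A": [1.20, 1.25, 1.22],
    "group_B": [0.950, 980 / 1000, 0.960],   # 980 mL converted to 0.980 L
}

for group, values in trials_litres.items():
    average = sum(values) / len(values)
    print(f"{group}: average = {average:.3f} L over {len(values)} trials")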
b) Qualitative Data
Qualitative data is data that uses words and descriptions. Qualitative data can be observed but
is subjective and therefore difficult to use for the purposes of making comparisons. Descriptions
of texture, taste, or an experience are all examples of qualitative data. Qualitative data collection
methods include focus groups, interviews, or open-ended items on a survey. Qualitative data
is information that is represented by means other than numbers. This could be data on gender,
place of birth, school attended, etc. Data from questionnaires or forms is often of a qualitative
nature, and categories are often used to group the data together, such as questions on racial origin.
Qualitative data is often summarised in numbers or percentages, as in the statement "23% of
Makerere University students are of Ugandan origin."
Continuous data is that which can be any number on a scale. If you were to measure the heights
of the children in the class there would be a range of measurements from the shortest to tallest
child and these measurements could be anywhere on the chosen scale. For practical purposes
the measurements would usually be rounded off to the nearest whole or half unit but could
actually be at any point. Other examples of continuous data are things like rainfall, length of
feet and weight.
Such data are often plotted with line graphs, but are also plotted as bar graphs where times are
clumped together, e.g., days of the week.
a) Nominal: This is used to describe variables that are categorical in nature. The characteristics
of the data you're collecting fall into distinct categories. If there are a limited number of distinct
categories (usually only two), then you're dealing with a discrete variable. If there are an
unlimited or infinite number of distinct categories, then you're dealing with a continuous
variable. Nominal variables include demographic characteristics like sex, race, and religion.
b) Ordinal: This data measurement describes variables that can be ordered or ranked in some
order of importance. It describes most judgments about things, such as big or little, strong or
weak. Most opinion and attitude scales or indexes in the social sciences are ordinal in nature.
c) Interval: This data measurement describes variables that have more or less equal intervals, or
meaningful distances between their ranks. For example, if you were to ask somebody if they
were first, second, or third generation immigrant, the assumption is that the distance, or number
of years, between each generation is the same. All crime rates in criminal justice are interval
level measures, as is any kind of rate.
d) Ratio: This data measurement describes variables that have equal intervals and a fixed
reference point. It is possible to have zero income, zero education, and no involvement in crime,
but rarely do we see ratio level variables in social science since it's almost impossible to have
zero attitudes on things, although "not at all", "often", and "twice as often" might qualify as
ratio level measurement.
Descriptive statistics, such as the average or median, can be generated to aid in understanding
the data. Data visualization is also a useful technique, in which the analyst examines the data
in a graphical format in order to obtain additional insights regarding the messages within
the data.
Inferential statistics includes techniques that measure the relationships between particular
variables. For example, regression analysis may be used to model whether a change in
advertising (independent variable X) provides an explanation for the variation in sales
(dependent variable Y). In mathematical terms, Y (sales) is a function of X (advertising), which
may be written as Y = aX + b + error. Analysts may also attempt to build models that are
descriptive of the data, with the aim of simplifying analysis and communicating results.
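A minimal sketch of such a regression is shown below; the advertising and sales figures are invented purely for illustration, and ordinary least squares (via numpy) is used as the fitting method.

import numpy as np

# Hypothetical advertising spend (X) and sales (Y) for six periods
advertising = np.array([10.0, 12.0, 15.0, 18.0, 20.0, 25.0])
sales = np.array([44.0, 49.0, 58.0, 66.0, 71.0, 88.0])

# Fit Y = aX + b by ordinary least squares
a, b = np.polyfit(advertising, sales, deg=1)
residuals = sales - (a * advertising + b)

print(f"slope a = {a:.2f}, intercept b = {b:.2f}")
print("residuals (the error term):", np.round(residuals, 2))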
Data visualization
Once the data is analyzed, it may be reported in many formats to the users of the analysis to
support their requirements. The users may have feedback, which results in additional analysis.
As such, much of the analytical cycle is iterative.
When determining how to communicate the results, the analyst may consider implementing a
variety of data visualization techniques, to help clearly and efficiently communicate the
message to the audience. Data visualization uses information displays (graphics such as tables
and charts) to help communicate key messages contained in the data. Tables are valuable because
they enable a user to query and focus on specific numbers, while charts (e.g., bar charts or line
charts) may help explain the quantitative messages contained in the data.
Data analysts typically obtain descriptive statistics for study variables, such as the mean
(average), median, and standard deviation. They may also analyze the distribution of the key
variables to see how the individual values cluster around the mean.
Hypothesis testing is used when a particular hypothesis about the true state of affairs is made
by the analyst and data is analysed to determine whether that state of affairs is true or false. For
example, the hypothesis might be that "Unemployment has no effect on inflation", which relates
to an economics concept called the Phillips Curve. Hypothesis testing involves considering the
likelihood of Type I and type II errors, which relate to whether the data supports accepting or
rejecting the hypothesis.
Regression analysis may be used when the analyst is trying to determine the extent to which
independent variable X affects dependent variable Y (e.g., "To what extent do changes in the
unemployment rate (X) affect the inflation rate (Y)?"). This is an attempt to model or fit an
equation line or curve to the data, such that Y is a function of X.
Necessary condition analysis (NCA) may be used when the analyst is trying to determine the
extent to which independent variable X is necessary for dependent variable Y (for example: to
what extent is a certain unemployment rate (X) necessary for a certain inflation rate (Y)?). Whereas
(multiple) regression analysis uses additive logic where each X-variable can produce the
outcome and the X's can compensate for each other (they are sufficient but not necessary),
necessary condition analysis (NCA) uses necessity logic, where one or more X-variables allow
the outcome to exist, but may not produce it (they are necessary but not sufficient). Each single
necessary condition must be present and compensation is not possible.
Barriers to effective analysis may exist among the analysts performing the data analysis or
among the audience. Distinguishing fact from opinion, cognitive biases, and innumeracy are all
challenges to sound data analysis.
As another example, the auditor of a public company must arrive at a formal opinion on whether
financial statements of publicly traded corporations are "fairly stated, in all material respects”.
This requires extensive analysis of factual data and evidence to support their opinion. When
making the leap from facts to opinions, there is always the possibility that the opinion
is erroneous.
Cognitive biases
There are a variety of cognitive biases that can adversely affect analysis. For
example, confirmation bias is the tendency to search for or interpret information in a way that
confirms one’s preconceptions. In addition, individuals may discredit information that does not
support their views. Note that analysts may be trained specifically to be aware of these biases
and how to overcome them.
Innumeracy
Effective analysts are generally adept with a variety of numerical techniques. However,
audiences may not have such literacy with numbers or numeracy; they are said to be innumerate.
Persons communicating the data may also be attempting to mislead or misinform, deliberately
using bad numerical techniques.
For example, whether a number is rising or falling may not be the key factor. More important
may be the number relative to another number, such as the size of government revenue or
spending relative to the size of the economy (GDP) or the amount of cost relative to revenue in
corporate financial statements. This numerical technique is referred to as normalization or
common-sizing. There are many such techniques employed by analysts, whether adjusting for
inflation (i.e., comparing real vs. nominal data) or considering population increases,
demographics, etc. Analysts apply a variety of techniques to address the various quantitative
messages described in the section above.
Qualitative methodology recognizes that the subjectivity of the researcher is intimately
involved in scientific research. Subjectivity guides everything from the choice of topic that one
studies, to formulating hypotheses, to selecting methodologies, and interpreting data. In
qualitative methodology, the researcher is encouraged to reflect on the values and objectives he
brings to his research and how these affect the research project. Other researchers are also
encouraged to reflect on the values that any particular investigator utilizes.
A key issue that arises with the recognition of subjectivity is how it affects objectivity. Two
positions have been articulated. Many qualitative researchers counterpoise subjectivity and
objectivity. Objectivity is said to negate subjectivity since it renders the observer a passive
recipient of external information, devoid of agency. And the researcher's subjectivity is said to
negate the possibility of objectively knowing a social psychological world. The investigator's
values are said to define the world that is studied. One never really sees or talks about the world,
per se. One only sees and talks about what one's values dictate. A world may exist beyond
values, but it can never be known as it is, only as values shape our knowledge of it.
If the data seem valid and reliable, you need to make sure that you have an accurate copy of
the data, especially if you obtained it through an electronic medium. This includes verifying
that you:
Why use secondary data? (Advantages and Disadvantages of Secondary Data Analysis)
It is unobtrusive research
It can be less expensive than gathering the data all over again.
It may allow the researcher to cover a wider geographic or temporal range; that is, using
secondary data ensures the breadth of data available.
It can allow for larger scale studies on a small budget.
It does not exhaust people's good will by re-collecting readily available data.
b) Data may have been intended for consumption by particular groups, which differ from the
present project
The descriptive data analysis techniques include graphics, tabulation, simple summary
statistics, and pictorial and textual methods. Summary statistics include measures of central
tendency (averages: mean, median and mode) and measures of variability about the average
(range and standard deviation). These give the reader a 'picture' of the data collected and used
in the research project.
Inferential statistics are the outcomes of statistical tests, helping deductions to be made from
the data collected, to test hypotheses set and relating findings to the sample or population.
Pictures are often better at communicating ideas than other media forms, and graphic presentations
are pictures. In working with graphics, pay particular attention to axes and/or labels and legends,
which define the data in the graphic. Basic examples of graphs used include the following.
Using graphs for data visualization is very common in day-to-day life; they often appear in
the form of charts and graphs. In other words, data is shown graphically so that it will be easier
for the human brain to understand and process. Data visualization is often used to discover
unknown facts and trends. By observing relationships and comparing datasets, you can find
meaningful information.
i) Scatter diagram
A scatter plot is used to show how two variables are related to each other. By observation
one is able to tell whether they are positively related, negatively related, or show no relationship.
A scatter diagram is useful in data exploration, and when a line of best fit is imposed one is able
to see how well the fitted line describes the relationship between the two variables. An example
of a scatter diagram with a line of best fit is shown in Figure 1 below.
Figure 1: Mean weekly hours worked by the youth by region and residence
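A minimal matplotlib sketch of a scatter diagram with a fitted line is given below; the x and y values are invented for illustration and are unrelated to the figure caption above, and matplotlib is assumed to be available.

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical paired observations
x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([2.1, 2.9, 3.7, 5.2, 5.8, 7.1, 7.9, 9.2])

slope, intercept = np.polyfit(x, y, deg=1)   # line of best fit

plt.scatter(x, y, label="observations")
plt.plot(x, slope * x + intercept, label="line of best fit")
plt.xlabel("X variable")
plt.ylabel("Y variable")
plt.legend()
plt.show()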
iii) Histogram: A histogram is the most commonly used graph to show frequency distributions.
It looks very much like a bar chart, but there are important differences: a bar chart is better for
comparing different groups when the independent variable is not numeric. A histogram can be used
to examine the distribution of a variable, i.e. whether it is normally distributed or positively
or negatively skewed.
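A minimal matplotlib sketch of a histogram, using simulated right-skewed data purely for illustration, is shown below; visual inspection of the shape gives a rough sense of the skewness discussed next.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
values = rng.exponential(scale=2.0, size=500)   # simulated right-skewed data

plt.hist(values, bins=30, edgecolor="black")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.title("Histogram of a right-skewed variable")
plt.show()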
Skewed Distribution
The skewed distribution is asymmetrical because a natural limit prevents outcomes on one side.
The distribution’s peak is off center toward the limit and a tail stretches away from it. For
example, a distribution of analyses of a very pure product would be skewed, because the product
cannot be more than 100 percent pure. Other examples of natural limits are holes that cannot
be smaller than the diameter of the drill bit or call-handling times that cannot be less than zero.
These distributions are called right- or left-skewed according to the direction of the tail.
v) Pie Diagram
A pie diagram is another graphical method of representing data. It is drawn to depict the
total value of a given attribute using a circle; dividing the circle into corresponding degrees
of angle then represents the subsets of the data. Hence, it is also called a divided circle
diagram. Data are defined by labels and/or the legend associated with the chart. The angle for
each category is calculated as (category value / total value) × 360°.
Construction
(a) Mark time series data on X-axis and variable data on Y-axis as per the selected scale.
(b) Plot the data in closed columns.
Data Tabulations
This mainly involves data analysis using tabulation of frequencies, percentages or numbers of
observations. It may be a one-way tabulation, two-way tabulation, three-way tabulation, etc.,
depending on the information one wants to display and report to the readers. Below we focus
on one-way and two-way tabulations. However, in a two-way tabulation one needs to take
precautions by noting whether one is doing row tabulations, column tabulations or cell
tabulations, because these provide different interpretations of the study findings.
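A minimal pandas sketch of one way and two way tabulations is shown below; the small dataset is invented for illustration only.

import pandas as pd

# Hypothetical survey records
df = pd.DataFrame({
    "sex": ["male", "female", "male", "female", "male", "female"],
    "status": ["stable", "satisfactory", "in transition", "stable", "stable", "in transition"],
})

# One way tabulation: frequencies of each employment status
print(df["status"].value_counts())

# Two way tabulation: row percentages of status by sex
print(pd.crosstab(df["sex"], df["status"], normalize="index") * 100)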
One way tabulation
Table 3 presents youth transition stage in their employment status by gender. The results show
that 38.5% of male youth transited into stable jobs compared to 25.4% of their female
counterparts, while more female youth (39.4%) transited to satisfactory jobs than male youth
(32.3%) and more female youth are in transition (35.2%) than male youth (29.2%).
Table 3: Youth transition stage in employment by sex (%)
Sex       Transited to stable job   Transited to satisfactory job   In transition   Total
Male      38.5                      32.3                            29.2            100
Female    25.4                      39.4                            35.2            100
Three way tabulation
Table 4: Tabulation of region by sex by race (race: 1=white, 2=black, 3=other; sex: 1=male, 2=female)
Region         White            Black            Other
               Male    Female   Male    Female   Male    Female
North          962     1017     51      55       5       6
Central        1170    1292     133     162      7       10
South          1076    1208     247     301      9       12
West           1104    1236     69      68       82      69
The mean value is what we typically call the "average." You calculate the mean by adding up
all of the measurements in a group and then dividing by the number of measurements.
Quartiles: The lower (Q1) quartile is the value below which the bottom 25% of the sample data
lie, and the upper (Q3) quartile is the value above which the upper 25% lie. NB. The middle
quartile (Q2) corresponds to the median.
The median is a statistical value that lies mid-way in a data set arranged in ascending or
descending order, and it is less sensitive to “outliers” in the data, that is, data values at the
extremes of a group.
Range: This measures the distance between the lowest and highest values in the data set and
generally describes how spread out data are. The range gives only minimal information about
the spread of the data, by defining the two extremes. It says nothing about how the data are
distributed between those two endpoints. Two other related measures of dispersion, the variance
and the standard deviation, provide a numerical summary of how much the data are scattered.
For example, after an exam, an instructor may tell the class that the lowest score was 65 and
the highest was 95. The range would then be 30. Note that a good approximation of the standard
deviation can be obtained by dividing the range by 4.
Variance is expressed as the sum of the squares of the differences between each observation
and the mean, divided by the sample size. It is a measure of the dispersion of a set of data
points around their mean value: the mathematical expectation of the squared deviations from the
mean. For populations it is designated by the square of the Greek letter sigma (σ²); for samples
it is designated by the square of the letter s (s²). Since this is a quadratic expression, i.e. a
quantity raised to the second power, variance is the second moment of statistics.
Standard deviation is expressed as the positive square root of the variance, i.e. σ for
populations and s for samples. It is a measure of the dispersion of a set of data from its mean:
the more spread apart the data, the higher the deviation. Roughly speaking, it reflects the
typical difference between observed values and the mean. The standard deviation is used when
expressing dispersion in the same units as the original measurements, and it is used more
commonly than the variance in expressing the degree to which data are spread out.
Coefficient of variation measures relative dispersion by dividing the standard deviation by the
mean and then multiplying by 100 to give a percentage. It is a statistical measure of the
dispersion of data points in a data series around the mean, designated as V for populations and
v for samples, and it describes the relative variability of two data sets better than the standard
deviation alone. For example, one data set has a standard deviation of 10 and a mean of 5; its
values vary by two times the mean (a coefficient of variation of 200%). Another data set has the
same standard deviation of 10 but a mean of 5,000; in this case the dispersion is insignificant
relative to the mean (a coefficient of variation of 0.2%).
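The sketch below shows how these summary measures can be computed on a small invented sample; numpy is assumed, and ddof=1 is used to obtain the sample versions of the variance and standard deviation.

import numpy as np

data = np.array([65, 70, 72, 75, 78, 80, 85, 95])   # hypothetical exam scores

mean = data.mean()
median = np.median(data)
q1, q3 = np.percentile(data, [25, 75])
data_range = data.max() - data.min()
var_s = data.var(ddof=1)        # sample variance
sd_s = data.std(ddof=1)         # sample standard deviation
cv = sd_s / mean * 100          # coefficient of variation, in percent

print(f"mean={mean:.1f} median={median:.1f} Q1={q1:.1f} Q3={q3:.1f}")
print(f"range={data_range} variance={var_s:.1f} sd={sd_s:.1f} CV={cv:.1f}%")
print(f"range/4 = {data_range / 4:.1f} (rough approximation of the standard deviation)")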
Percentiles measure the percentage of data points which lie below a certain value when the
values are ordered. For example, a student scores 1280 on the Scholastic Aptitude Test (SAT).
Her scorecard informs her she is in the 90th percentile of students taking the exam. Thus, 90
percent of the students scored lower than she did.
Quartiles group observations such that 25 percent are arranged together according to their
values. The top 25 percent of values are referred to as the upper quartile. The lowest 25 percent
of the values are referred to as the lower quartile. Often the two quartiles on either side of the
median are reported together as the interquartile range. Examining how data fall within quartile
groups describes how deviant certain observations may be from others.
Measures of skew describe how concentrated data points are at the high or low end of the scale
of measurement. Skew is designated by the symbols Sk for populations and sk for samples.
Skew indicates the degree of symmetry in a data set. The more skewed the distribution, the
higher the variability of the measures, and the higher the variability, the less reliable are the
data.
Skew is calculated either by multiplying the difference between the mean and the median by
three and then dividing by the standard deviation, or by summing the cubes of the differences
between each observation and the mean and then dividing by the sample size times the cube of
the standard deviation. Note that the use of cubic quantities helps explain why skew is called
the third moment.
More conceptually, skew defines the relative positions of the mean, median, and mode. If a
distribution is skewed to the right (positive skew), the mean lies to the right of both the mode
(most frequent value and hump in the curve) and median (middle value). That is, mode less
than (<) median less than (<) mean. But, if the distribution is skewed left (negative skew), the
mean lies to the left of the median and the mode. That is, mean < median < mode.
In a perfectly symmetrical distribution, mean = median = mode, and skew is 0. The equations
noted above will indicate left skew with a negative number and right skew with a positive
number.
Measures of kurtosis describe how concentrated data are around a single value, usually the
mean. It is statistical measure used to describe the distribution of observed data around the
mean. It is sometimes referred to as the “volatility of volatility”. Thus, kurtosis assesses how
peaked or flat is the data distribution. The more peaked or flat the distribution, the less normally
distributed the data. And the less normal the distribution, the less reliable the data.
Kurtosis is designated by the letter K for populations and k for samples and is calculated by
summing the fourth powers of the differences between each observation and the mean and then
dividing by the sample size times the fourth power of the standard deviation. Note that the use
of the fourth power explains why kurtosis is called the fourth moment. Three degrees of kurtosis
are noted:
Mesokurtic distributions are, like the normal bell curve, neither peaked nor flat.
Platykurtic distributions are flatter than the normal bell curve. A description of the kurtosis in
a distribution in which the statistical value is negative.
Leptokurtic distributions are more peaked than the normal bell curve. A description of the
kurtosis in a distribution in which the statistical value is positive.
The ideal value rendered by the equation for kurtosis is 3, the kurtosis of the normal bell curve.
The higher the number above 3, the more leptokurtic (peaked) is the distribution. The lower the
number below 3, the more platykurtic (flat) is the distribution.
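As a small illustration, the scipy functions below compute these moment-based measures for a simulated sample; fisher=False is passed so that the kurtosis of a normal distribution is reported as 3, matching the convention used above.

import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(1)
sample = rng.normal(loc=50, scale=10, size=10_000)   # roughly symmetric simulated data

print("skewness:", round(skew(sample), 3))                     # close to 0
print("kurtosis:", round(kurtosis(sample, fisher=False), 3))   # close to 3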
Correlation analysis
Correlation analysis is performed to establish how two variables are associated with one another.
The correlation coefficient ranges from -1 to +1. When it is negative we say that the two
variables are negatively correlated; when it is positive we say that the two variables are
positively correlated; and when it is zero, we say that the two variables are uncorrelated.
Note that correlation analysis is one of the exploratory data analysis techniques and can be
used to establish whether two variables are so highly associated that, if both were used as
independent variables in a regression model, the results would be biased, i.e. inefficient and
not accurate for forecasting. Using the rule of thumb, when the correlation coefficient is equal
to or greater than 0.8, we may suspect that there is a problem of multicollinearity in the data
and thus the data needs to be transformed. We can transform the data using ratios, logarithms,
lags or differencing.
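A minimal pandas sketch of this kind of multicollinearity screen is shown below; the variable names and values are invented for illustration, and 0.8 is used as the rule-of-thumb threshold mentioned above.

import pandas as pd

# Hypothetical dataset with two highly related candidate regressors
df = pd.DataFrame({
    "income": [100, 120, 140, 160, 180, 200],
    "spending": [90, 115, 130, 155, 170, 195],
    "household": [2, 5, 3, 6, 4, 7],
})

corr = df.corr()
print(corr.round(2))

# Flag pairs whose absolute correlation meets or exceeds the 0.8 rule of thumb
high = [(a, b) for a in corr.columns for b in corr.columns
        if a < b and abs(corr.loc[a, b]) >= 0.8]
print("possible multicollinearity:", high)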
Accurate analysis of data using standardized statistical methods in scientific studies is critical
to determining the validity of empirical research. Statistical formulas such as regression,
uncertainty coefficient, t-test, chi square, and various types of ANOVA (analyses of variance)
are fundamental to forming logical, valid conclusions. If empirical data reach significance under
the appropriate statistical formula, the research hypothesis is supported. If not, the null
hypothesis is supported (or, more correctly, not rejected), meaning no effect of the independent
variable(s) was observed on the dependent variable(s).
It is important to understand that the outcome of empirical research using statistical hypothesis
testing is never proof. It can only support a hypothesis, reject it, or do neither. These methods
yield only probabilities.
Among scientific researchers, empirical evidence (as distinct from empirical research) refers to
objective evidence that appears the same regardless of the observer. For example, a
thermometer will not display different temperatures for each individual who observes it.
Temperature, as measured by an accurate, well calibrated thermometer, is empirical evidence.
By contrast, non-empirical evidence is subjective, depending on the observer. Following the
previous example, observer A might truthfully report that a room is warm, while observer B
might truthfully report that the same room is cool, though both observe the same reading on the
thermometer. The use of empirical evidence negates this effect of personal (i.e., subjective)
experience.
Empirical cycle
1. Observation: Collecting and organising empirical facts.
2. Induction: Formulating a hypothesis.
3. Deduction: Deducing the consequences of the hypothesis as testable predictions.
4. Testing: Testing the hypothesis with new empirical material.
5. Evaluation: Evaluating the outcome of testing, which feeds back into new observations.
We are bombarded with information all of our lives. Does the information make sense? Is it
important? Why should I care? Reading and thinking critically involves asking questions of
everything we read or study. The questions listed below are just some of the questions you need
to ask while reading reports of empirical studies. These questions will help you identify key
information in the report and reflect on what purpose the information serves.
In order to critically evaluate a report of empirical research, it is essential to consider the context
in which the research was conducted. The following questions will help you understand the
context for the research:
Who did the research? Where was it published? What are the research questions? Where did
these research questions come from? Is the research important? Why or why not? Researchers
use different methods to address the research questions. The following questions will help you
evaluate the way in which the research was conducted:
Who or what is involved in the study? Are the subjects appropriate for the study? What is the
research design? Is the research design appropriate for the research question(s)? What are the
measures? Are the measures appropriate for addressing the research question(s)? What ethical
considerations are important to address? Are they all addressed in the article?
The results of the study are used by the researchers to answer the research questions. Use the
following questions to help you understand the results and determine whether they answer the
research questions:
What are the main results of the study? Can the results be used to answer the research
question(s)? Can the results be generalized beyond the context of the study?
The conclusions place the results of the study into a broader context. The following
questions will help you understand how the researchers make sense of the results and how they
use the results to better understand the discipline:
What conclusions do the researchers draw from the results? Are the conclusions important?
Why or why not?
Diagnostic Analysis
Diagnostic analysis answers “Why did it happen?” by finding the causes behind the insights
uncovered in statistical analysis. This analysis is useful for identifying behaviour patterns in
data. If a new problem arrives in your business process, you can look to this analysis to find
similar patterns for that problem, and there may be a chance to apply similar prescriptions to
the new problem.
Predictive Analysis
Predictive analysis shows “what is likely to happen” by using previous data. The simplest
example: if last year I bought two dresses based on my savings, and this year my salary doubles,
then I can buy four dresses. Of course, it is not as easy as this, because you have to think about
other circumstances, such as the chance that the price of clothes has increased this year, or that
instead of dresses you want to buy a new bike, or you need to buy a house!
So this analysis makes predictions about future outcomes based on current or past data.
Forecasting is just an estimate; its accuracy depends on how much detailed information you
have and how deeply you dig into it.
Prescriptive Analysis
Prescriptive analysis combines the insights from all the previous analyses to determine which
action to take on a current problem or decision. Most data-driven companies utilize
prescriptive analysis because predictive and descriptive analysis alone are not enough to
improve performance. Based on current situations and problems, they analyze the data and make
decisions.
a) Continuous dependent variable models-These are linear regression models, used only when the
dependent variable is continuous, and estimated by minimising the sum of squared residuals:
y = a + bX + u
where y is height and X is age. The model is estimated to show the relationship between age
and height of an individual.
Regression output (abridged): Number of obs = 10,351; constant (_cons) coefficient = 173.1531,
std. err. = 0.272987, t = 634.29, P>|t| = 0.000, 95% CI = [172.618, 173.6882].
The model has been estimated using 10,351 observations. It has a statistically significant F-
statistic at the 1% level of significance, and thus we see that the model fits the data well. In
terms of the estimated coefficient, from the output above we observe a negative relationship
between age and height, and it is statistically significant at the 1% level of significance.
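A minimal sketch of how such a model could be estimated in Python is given below; the simulated age and height values are invented for illustration and do not reproduce the output above, and statsmodels is assumed to be available.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
age = rng.uniform(20, 70, size=500)                       # hypothetical ages
height = 173 - 0.05 * age + rng.normal(0, 5, size=500)    # hypothetical heights (cm)

X = sm.add_constant(age)          # adds the intercept term a
model = sm.OLS(height, X).fit()   # minimises the sum of squared residuals
print(model.summary())            # coefficients, F-statistic, R-squared, etc.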
b) Binary dependent variable models-These are also known as probability models. This is when
the outcome variable is coded as 0 or 1. In this case the most appropriate estimation technique
is either the logit or the probit technique. When a logit model is estimated we obtain
coefficients expressed as log-odds, which are difficult to interpret directly. Therefore, to get
meaningful results for the logit model, we can report the odds ratios or the marginal effects.
When odds ratios are reported for a logit model, odds ratios greater than 1 imply a positive
effect on the outcome variable, with the effect given by the odds ratio minus 1. On the other
hand, if the odds ratio is less than 1, it implies that the independent variable has a negative
effect on the outcome.
More commonly in empirical analysis, marginal effects can be computed for both the logit and
probit models. These are interpreted as marginal probabilities that give the change in the
likelihood of the outcome occurring with respect to a given independent variable.
Note: logit and probit models can be estimated for both cross-sectional and panel data. For
cross-sectional data we use the commands logit/probit or logistic, and for panel data we use
the command xtlogit/xtprobit.
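Below is a minimal, hypothetical statsmodels sketch of a logit model reporting odds ratios and average marginal effects; the data are simulated for illustration only.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
p = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))   # true probability of the outcome
y = rng.binomial(1, p)                   # binary outcome coded 0/1

X = sm.add_constant(x)
logit_res = sm.Logit(y, X).fit(disp=False)

print("odds ratios:", np.exp(logit_res.params))   # exponentiated coefficients
print(logit_res.get_margeff().summary())          # average marginal effects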
c) Categorical dependent variable-This is when the dependent variable has more than two
outcome choices, for example the level of education coded 0 = no education, 1 = primary,
2 = secondary, 3 = post-secondary.
These models are known as multinomial logit/probit models. Ordinarily the coefficients of these
models do not make sense on their own, hence we usually report relative risk ratios for the
multinomial logit, or marginal probabilities.
Note: in the case of the multinomial logit/probit the dependent variable does not assume any
form of ordering. However, when the dependent variable assumes a given ordering, we estimate
ordered logit or probit models instead.
Note: in case the data have very many zeros, we instead estimate what are known as zero-inflated
negative binomial models or zero-inflated Poisson models.
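A minimal, hypothetical sketch of a multinomial logit with relative risk ratios is shown below; the three-category outcome and the single regressor are simulated purely to illustrate the syntax, with statsmodels assumed to be available.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=1500)
y = rng.choice([0, 1, 2], size=1500, p=[0.5, 0.3, 0.2])   # hypothetical 3-category outcome

X = sm.add_constant(x)
mnl = sm.MNLogit(y, X).fit(disp=False)

print(np.exp(mnl.params))   # relative risk ratios, relative to the base category (0)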
Econometric model evaluation
The precision of the estimate depends on the size of the sample. Clearly the larger the sample
the better the estimate will be. Precision is measured by calculating the standard error of the
estimate or a confidence interval (usually the 95% confidence interval).
Confidence interval
A confidence interval is a range of values within which we are fairly sure the true value of the
parameter being investigated lies. A common confidence interval (CI) is 95%. Thus, for example,
we can be 95% confident that the true population mean lies approximately within the interval
calculated as the sample mean ± 2 x standard error of the mean. The multiplier 2 is an
approximation that depends on the sample size.
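A minimal sketch of this calculation on an invented sample, using the approximate multiplier of 2 described above, is shown below.

import numpy as np

sample = np.array([12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 12.5])   # hypothetical measurements

mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(len(sample))   # standard error of the mean
lower, upper = mean - 2 * se, mean + 2 * se      # approximate 95% confidence interval

print(f"mean = {mean:.2f}, approximate 95% CI = ({lower:.2f}, {upper:.2f})")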
The first hypothesis is usually referred to as the Null Hypothesis because it is the hypothesis of
no effect or no difference between the populations of interest. It is usually given the symbol H0.
The second hypothesis is usually called the Alternative Hypothesis by statisticians, but since it
is often the hypothesis that the researcher would like to be true, it is sometimes referred to as
the Study Hypothesis or Research Hypothesis. Note, however, that in equivalence trials, where a
researcher would like a new (but perhaps cheaper) treatment to be as effective as the current
treatment, it is the null hypothesis that the researcher would like to see supported by the data.
The alternative hypothesis is usually given the symbol H1 or HA, and it states that there is an
effect or that there is a difference between the populations.
However, in some instances the researcher may be interested in a change in one direction only
(e.g. pulse is lower or pain relief is better). The alternative hypothesis in this case is known as a
directional (one-tailed) alternative hypothesis. In this case, the alternative hypothesis will take
the form, for example:
H1: on average, there is greater pain relief from taking drug A than from not taking it.
Note: the null hypothesis is the same for both directional and non-directional cases.
All statistical tests produce a p-value and this is equal to the probability of obtaining the
observed difference, or one more extreme, if the null hypothesis is true. To put it another way
- if the null hypothesis is true, the p-value is the probability of obtaining a difference at least as
large as that observed due to sampling variation.
Consequently, if the p-value is small the data support the alternative hypothesis, and if the
p-value is large the data support the null hypothesis. But how small is 'small' and how large is
'large'? Conventionally, a p-value below 5% is treated as small; this 5% value is called the
significance level of the test. Other significance levels that are commonly used are 1% and 10%.
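A minimal scipy sketch of obtaining a p-value from a two-sample t-test, using invented data and the 5% significance level, is shown below.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(50, 10, size=40)   # hypothetical control scores
group_b = rng.normal(55, 10, size=40)   # hypothetical treatment scores

t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
print("reject H0 at the 5% level" if p_value < 0.05 else "fail to reject H0 at the 5% level")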
Statistical Power
The use of a significance level of 5% controls the probability of erroneously rejecting the null
hypothesis when it is, in fact, true. Rejecting the null hypothesis when it is true is called a Type
I error. However, there is another error that can be made, namely failing to reject the null
hypothesis when it is, in fact, not true. This is called a Type II error.
The power of a test (or statistical power) refers to the probability that a statistical test will
correctly reject a null hypothesis when it is false. In other words, it’s the ability of the test to
detect an effect or difference when one truly exists. The power of a test is an important concept
in hypothesis testing, as it indicates how sensitive the test is to detecting real differences or
relationships in the data.
Significance level (α): The significance level (often set at 0.05) defines the threshold for
rejecting the null hypothesis. It represents the probability of making a Type I error, which
occurs when the null hypothesis is wrongly rejected. The lower the α value, the stricter the
criteria for rejecting the null hypothesis. However, a lower α may reduce the power of the test,
as fewer differences will be considered significant.
Sample size (n): Larger sample sizes generally increase the power of a test. This is because
larger samples provide more information, making it easier to detect a true effect. With smaller
sample sizes, the test may fail to detect differences, even if they exist, because the data is less
precise.
Effect size: The effect size measures the magnitude of the difference or relationship that the
test is trying to detect. A larger effect size makes it easier to detect a true effect, increasing the
power of the test. For example, if the difference between two groups is very large, the test is
more likely to detect that difference. If the effect size is small, the power of the test will
decrease.
Variability (Standard Deviation) in the Data: The less variability (or noise) there is in the
data, the easier it is to detect a significant effect. High variability (i.e., large standard deviations)
can obscure true differences or relationships, reducing the test's power.
Test type and design: The power of a test can also depend on the type of statistical test used
and the design of the experiment. For example, paired tests (such as paired t-tests) tend to have
more power than unpaired tests (like independent t-tests) because paired tests reduce the
variability by comparing the same subjects under different conditions.
Researchers can increase the power of a test in several ways:
Increase the sample size: Larger sample sizes decrease variability and increase the
precision of estimates.
Increase the effect size: Although the effect size is determined by the phenomenon being
studied, a more powerful experimental design can help magnify the effect.
Reduce variability: By controlling for sources of variation or improving measurement
techniques, researchers can reduce the noise in the data and increase power.
Choose a more sensitive test: Some statistical tests are more powerful than others (e.g.,
paired t-tests are more powerful than independent t-tests).
Before conducting a study, researchers can perform a power analysis to determine the required
sample size for a given power level. This helps ensure that the study is adequately powered to
detect meaningful effects while minimizing the risk of Type II errors. Power analysis is also
used to identify the likelihood of detecting an effect when designing experiments, especially
when resources or participants are limited.
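A minimal sketch of such a power analysis using statsmodels is shown below; the medium effect size of 0.5, the 5% significance level and the 80% power target are assumed values for illustration.

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Sample size per group needed to detect the assumed effect with 80% power
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(f"required sample size per group: {n_per_group:.1f}")

# Conversely, the power achieved with a fixed sample of 30 per group
achieved = analysis.solve_power(effect_size=0.5, alpha=0.05, nobs1=30)
print(f"power with n = 30 per group: {achieved:.2f}")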
2. and by being aware of the theoretical positions available on the topic, researchers are 'pre-
figuring the field', i.e. anticipating what they may find.
Rigour
Pre-figuring the field runs the risk of researchers only finding what they want to find, by
only looking for a specific phenomenon or by being blind to other issues that arise. Rigour
involves the checks and balances built into qualitative research to make sure it is believable,
trustworthy and credible.
Reflexivity
Forewarned is forearmed. By being aware of the pitfalls of pre-figuring the field, researchers
can maintain an openness to the situation they are investigating. They can be attentive to issues
that are not expected or do not conform to existing accounts or theories of society. This idea of
being aware of your own values, ideas and pre-judgements as a researcher is known as
reflexivity.
Iteration
Iteration means moving back and forth. In qualitative research it is difficult to cleanly separate
data collection or generation from data analysis, because there is movement back and forth
between generation and analysis.
Researchers usually generate data at a point in time and also write analytical notes to themselves
about that data. These notes are then processed into memos or guiding notes to inform the next
bout of data collection. And so leads the merry dance.
Analytical memos
The sorts of things included are –
1. The identification of patterns;
2. Working out the limitations, exceptions and variations present in whatever is being
investigated;
3. Generating tentative explanations for the patterns and seeing if they are present or absent in
other settings or situations;
4. Working explanations into a theoretical model;
5. Confirming or modifying the theoretical model;
The way this is presented here sounds like it is an inevitable process that follows a straight line
and does not deviate. Of course life is not like that, and these stages are an ideal type meant to
help you get a handle on the topic. What makes qualitative data analysis dynamic, exciting and
intellectually challenging is the iteration between generation and analysis and within the
different types of analytical work.
Triangulation of analysis
It is very rare for qualitative data to be collected all in one go, then processed and analysed. If
this happened we might criticise the project for not being true to the context in which the data
were generated, which would make it a weak piece of work.
One way of producing believable, credible and trustworthy work is to use triangulation. This is
a term 'borrowed' from geography - and in qualitative analysis means more than one perspective
on a situation e.g. patients or service users, their families and friends, and service providers.
Fluency
To analyse texts for their meaning, researchers have to be fluent in the language which the
research participants use.
Not just the formal language, but also the colloquialisms used in everyday talk. Listen carefully
next time you are in a public place to the richness of everyday language that bears little
resemblance to standard English, and check with a friend their interpretation of a phrase or word
against your own. An inability to understand what is said will restrict researchers' abilities to
gain an understanding of participants' motives, meanings and behaviours.
Capturing talk
The act of capturing talk may shape what is said and in turn influence how it is analysed. Using
tape recorders to capture talk means that researchers may attend to the interviewee without
having to focus on writing down their talk verbatim. However, the recording has to be clear to
allow an accurate transcription, so attention to equipment and environment will have a direct
effect on the quality of the analysis.
Processing texts and archiving
The most common way of processing texts is to transcribe taped talk into word processed
documents. These may then be read and re-read to identify meaning, patterns and models.
Analytical notes and memos will be made, and all of these need to be stored carefully:
1. to protect the integrity of the original document,
2. to allow the various components of the current analysis to be identified,
3. to locate the source of the comments made.
There are software programmes which provide an orderly and rigorous framework for data
archival and administrative tasks. Each programme has built-in assumptions about data and how
it should be handled. Researchers need to choose with care a programme that is consistent with
their own perspective and with the characteristics of their data.
This is where qualitative data analysis software programmes come into their own because they
allow researchers to earmark segments of text, apply tags or descriptive labels to the segments,
and build up categories and themes of analysis. When it comes to writing the definitive research
document these segments can then be found easily in the archive, and directly inserted into the
text.