Data Types and Data Analysis
What is Research?
Research is not … cont.
Data Collection
• an intermediate step to gain reliable knowledge
• collecting reliable data is part of the research
process
Research is not … cont.
Searching out published research results in libraries
(or the internet)
• This is an important early step of research
• The research process always includes desk research
and analysis
• But merely reviewing the literature is not research
Research is…
1. Searching for explanations of events, phenomena,
relationships and causes
– What, how and why things occur
– Are there interactions?
2. A process
– Planned and managed – to make the information generated
credible
– The process is creative
The Process of Research
• Define Research Problem
• Review of Literature
• Formulation of Hypothesis
• Preparation of Research design
• Data Collection
• Data Analysis
• Interpretation & Report Writing
Data Collection
The 5 W’s of data collection are:
1. What data is to be collected?
2. From whom data is to be collected?
3. Who will collect data?
4. From where will the data be collected?
5. When is the data to be collected?
Data Collection Methods
Primary data collection method
• Involves data collection directly from the subjects by the researcher or a
trained data collector.
• Data collected are specifically for the purpose of the research.
• E.g. surveys, interviews, observations, etc.
Secondary data collection method
• Involves the use of data that were collected for purposes other than the
current research.
• E.g. diaries, nurses' notes, care plans, patient medication records, statistical
abstracts, census reports; the data may be either published or unpublished.
Primary Data
Advantages
• The researcher can determine exactly what data will be collected
and can identify the specific tools that will be used.
• Completeness of data is ensured.
Disadvantages
• Time consuming
• Relies on subjects' recall and communication abilities
• Bias may occur due to various factors.
• Need to check reliability of raters.
Secondary Data
Advantages
• It is easier and quicker.
• Absence of researcher's biases.
• Economical and time saving
• Participants' co-operation may not be necessary, and it eliminates the
biases related to participant awareness.
Tools & Methods of Data Collection
• Tools
A device/ instrument used by the researcher to
collect data
• Methods
Various steps or strategies used for gathering and
analyzing data in a research study
Tools for Data Collection
• Questionnaire
• Interview checklist
• Observational form
• Attitude/view scale
• Content analysis form
• Field tools and equipment
Data Collection Methods
• Questionnaire
• Observations
• Experiments
• Simulation
• Ethnographic
Questionnaire Method
• Questionnaires are perhaps the most commonly
used method in sport-related research.
Categories of Questionnaire
Questionnaires generally fall into one of three categories:
1 Postal questionnaire. The questionnaire is given or posted to the participant,
who completes it in his or her own time. The participant then posts the
completed questionnaire back to the researcher.
3 Face to face questionnaire. The researcher and participant are in the same
location, and the researcher asks the questions ‘face to face’.
Observation
Observation is often classified as either non-participant or
participant observation.
Non-participant observation is the simplest form, and is where
the researcher will observe the phenomenon ‘from outside’
with no engagement with either the activity or the subjects.
Data can be collected using various techniques, for example
video, photography, or simply watching with the naked eye, and
recording the data on an appropriate sheet.
Experimental
In an experiment, a treatment is applied to part of a
population and responses are observed.
For example, an experiment was performed in which
diabetics took cinnamon extract daily while a control
group took none. After 40 days, the diabetics who had
the cinnamon reduced their risk of heart disease
while the control group experienced no change.
Simulation
A simulation is the use of a mathematical or physical model to
reproduce the conditions of a situation or process. Collecting
data often involves the use of computers.
Simulations allow you to study situations that are impractical or
even dangerous to create in real life, and they often save time
and money.
For example, automobile manufacturers use simulations with
dummies to study the effects of crashes on humans, and
investigators recreate a plane crash in a simulated environment
to find its causes.
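A small sketch of a computer simulation, using a mathematical model instead of a real experiment. The dice example, the function name `estimate_probability` and the seed are assumptions chosen for illustration; they are not from the slides.

```python
import random

def estimate_probability(trials=100_000, seed=1):
    """Estimate P(sum of two dice == 7) by repeated simulated rolls."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Model one roll of two fair six-sided dice.
        if rng.randint(1, 6) + rng.randint(1, 6) == 7:
            hits += 1
    return hits / trials

estimate = estimate_probability()
print(f"simulated: {estimate:.4f}  exact: {1/6:.4f}")
```

The simulated value should land close to the exact answer of 1/6, and it costs far less than rolling real dice 100,000 times.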
Ethnographic Research
In this method, the ethnographer investigates a group
by collecting data over a substantial period of
time, generally using a number of different methods.
The ethnographer studies the group on its own
ground, observing natural behaviours in a natural
setting.
What is Data?
Types of Data
A. Qualitative or attribute data - the characteristic being
studied is nonnumeric.
E.g.: gender, religious affiliation, state of birth, country represented, words, images,
videos
B. Quantitative data - the characteristic being studied is numeric.
E.g.: time (in seconds) for a 400 m race, prize money won by a tennis player, or
number of boundaries scored in a match.
Quantitative Variables - Classifications
Levels of Measurement
Nominal-Level Data
Properties:
• Observations of a
qualitative variable can
only be classified and
counted.
• There is no particular
order to the labels.
Ordinal-Level Data
Properties:
• Data classifications are represented
by sets of labels or names (high,
medium, low) that have relative
values.
• Because of the relative values, the
data classified can be ranked or
ordered.
Interval-Level Data
Properties:
• Data classifications are ordered
according to the amount of the
characteristic they possess.
• Equal differences in the
characteristic are represented
by equal differences in the
measurements.
Example: Women’s dress sizes listed on the table.
Ratio-Level Data
Practically all quantitative data is recorded on the ratio level of
measurement.
Ratio level is the “highest” level of measurement.
Properties:
• Data classifications are ordered according to the amount of the characteristics they
possess.
• Equal differences in the characteristic are represented by equal differences in the
numbers assigned to the classifications.
• The zero point is the absence of the characteristic and the ratio between two
numbers is meaningful.
Four Levels of Measurement
Nominal level - data that is classified into categories
and cannot be arranged in any particular order.
EXAMPLES: eye color, gender, religious affiliation.
Ordinal level - data arranged in some order, but the
differences between data values cannot be
determined or are meaningless.
Interval level - similar to the ordinal level, with the
additional property that meaningful amounts of
differences between data values can be determined.
There is no natural zero point.
Ratio level - the interval level with an inherent zero starting
point. Differences and ratios are meaningful for this
level of measurement.
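As a rough illustration of the four levels, this Python sketch lists which operations are conventionally treated as meaningful at each level. The names `LEVELS`, `NEW_OPERATIONS` and `allowed_operations` are hypothetical, and the operation lists are common textbook guidance rather than a rule stated on the slides.

```python
# Levels in increasing order; each level keeps everything allowed below it.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Operations that first become meaningful at each level (assumed list).
NEW_OPERATIONS = {
    "nominal": ["count", "mode"],
    "ordinal": ["rank", "median"],
    "interval": ["difference", "mean", "standard deviation"],
    "ratio": ["ratio"],
}

def allowed_operations(level):
    """Return the operations that are meaningful at the given level."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    ops = []
    for name in LEVELS[: LEVELS.index(level) + 1]:
        ops.extend(NEW_OPERATIONS[name])
    return ops

print(allowed_operations("ordinal"))
```

For example, a mean is not listed for ordinal data because the differences between ranks are not meaningful.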
Summary of the Characteristics for Levels of
Measurement
Why Know the Level of Measurement of Data?
Data Preparation
Four steps of Data Preparation
• Data Coding
• Data Entry
• Data Checking
• Dealing with missing data
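The four steps can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the sample records, the coding scheme `GENDER_CODES`, the plausible age range and the function name `prepare`.

```python
# Hypothetical raw survey responses; None marks a missing answer.
raw = [
    {"id": 1, "gender": "Female", "age": "23"},
    {"id": 2, "gender": "male", "age": "230"},  # likely a data-entry error
    {"id": 3, "gender": "Male", "age": None},   # missing age
]

GENDER_CODES = {"female": 1, "male": 2}  # assumed coding scheme

def prepare(records, age_range=(10, 100)):
    low, high = age_range
    cleaned, problems = [], []
    for rec in records:
        # Coding and entry: store text answers as numeric codes.
        gender = GENDER_CODES.get(str(rec["gender"]).strip().lower())
        age = int(rec["age"]) if rec["age"] is not None else None
        if age is None:
            # Missing data: keep the record but report the gap.
            problems.append((rec["id"], "age missing"))
        elif not low <= age <= high:
            # Checking: an implausible value is reported and set to missing.
            problems.append((rec["id"], "age out of range"))
            age = None
        cleaned.append({"id": rec["id"], "gender": gender, "age": age})
    return cleaned, problems

cleaned, problems = prepare(raw)
print(cleaned)
print(problems)
```

Reporting problems instead of silently dropping records leaves the decision on how to handle missing values to the analyst.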
Types of Analysis
• Descriptive statistics uses the data to provide
descriptions of the population, either through
numerical calculations or graphs or tables.
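A minimal sketch of numerical descriptions using Python's standard `statistics` module. The race times are made-up values used only to show the calculations.

```python
import statistics

# Hypothetical 400 m race times in seconds.
times = [52.1, 49.8, 51.3, 50.6, 53.4, 49.8]

summary = {
    "n": len(times),
    "mean": statistics.mean(times),
    "median": statistics.median(times),
    "mode": statistics.mode(times),
    "stdev": statistics.stdev(times),  # sample standard deviation
    "min": min(times),
    "max": max(times),
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```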
Choosing summary statistics
[Table: summary statistics for scale and categorical variables]
Flow chart of commonly used descriptive statistics
and graphical illustrations
Categorical data
• Frequency
• Percentage (Row, Column or Total)
Continuous data
Graphical illustrations
• Histogram (can be plotted against a categorical
variable)
• Box & Whisker plot (can be plotted against a
categorical variable)
• Dot plot (can be plotted against a categorical
variable)
• Scatter plot (two continuous variables)
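For categorical data, frequencies and row, column or total percentages can be computed as in this sketch. The survey responses and the function name `percentages` are hypothetical.

```python
from collections import Counter

# Hypothetical (gender, preferred sport) pairs.
responses = [
    ("F", "tennis"), ("F", "cricket"), ("M", "cricket"),
    ("M", "cricket"), ("F", "tennis"), ("M", "tennis"),
]

counts = Counter(responses)                    # cell frequencies
row_totals = Counter(g for g, _ in responses)  # totals per gender
col_totals = Counter(s for _, s in responses)  # totals per sport
total = len(responses)

def percentages(cell):
    """Return (row %, column %, total %) for one (gender, sport) cell."""
    gender, sport = cell
    n = counts[cell]
    return (100 * n / row_totals[gender],
            100 * n / col_totals[sport],
            100 * n / total)

for cell in sorted(counts):
    row, col, tot = percentages(cell)
    print(cell, counts[cell], f"row={row:.1f}% col={col:.1f}% total={tot:.1f}%")
```

The three percentages answer different questions, so a report should state which base (row, column or total) was used.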
Inferential Statistics
The methods of inferential statistics are
• the estimation of parameter(s)
• the testing of statistical hypotheses
Parametric or Non-parametric?
Parametric Test
Non-Parametric Tests
Parametric & Non-Parametric tests
Purpose of test | Parametric Test | Non-Parametric Test
Compare central value (mean / median) with specific value | One sample t / Z test | Wilcoxon Signed Rank
Compare central values of two independent samples | Two sample t / Z test | Mann-Whitney
Compare central values of two dependent samples | Paired t test | Wilcoxon Signed Rank
Compare central values of three or more samples (one variable) | One Way ANOVA | Kruskal-Wallis
Compare central values of three or more samples (two variables) | Two Way ANOVA | Friedman
Compare independence of two categorical variables | - | Chi-Square
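The table can be read as a simple lookup, sketched here in Python. The dictionary keys are shortened labels invented for this example, `choose_test` is a hypothetical helper, and filing Chi-Square under the non-parametric column is an assumption.

```python
# purpose -> (parametric test, non-parametric test), names taken from the table.
TEST_TABLE = {
    "one sample vs specific value": ("One sample t / Z test", "Wilcoxon Signed Rank"),
    "two independent samples": ("Two sample t / Z test", "Mann-Whitney"),
    "two dependent samples": ("Paired t test", "Wilcoxon Signed Rank"),
    "three or more samples, one variable": ("One Way ANOVA", "Kruskal-Wallis"),
    "three or more samples, two variables": ("Two Way ANOVA", "Friedman"),
    # Assumption: Chi-Square has no parametric counterpart in this table.
    "independence of two categorical variables": (None, "Chi-Square"),
}

def choose_test(purpose, parametric=True):
    """Return the test name for a purpose, falling back to the non-parametric one."""
    parametric_test, nonparametric_test = TEST_TABLE[purpose]
    if parametric and parametric_test is not None:
        return parametric_test
    return nonparametric_test

print(choose_test("two independent samples", parametric=False))
```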
P-value
• P-value ≡ the probability the test statistic would take a value
as extreme or more extreme than observed test statistic,
when H0 is true
• Smaller-and-smaller P-values → stronger-and-stronger
evidence against H0
• For typical analysis, using the standard α = 0.05 cutoff, the
null hypothesis is
- rejected when p <= .05 and
- not rejected when p > .05.
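A worked sketch of the definition, assuming a two-sided one-sample Z test with a known population standard deviation. The function name `z_test_p_value` and all the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test_p_value(sample_mean, mu0, sigma, n):
    """Two-sided p-value for a one-sample Z test with known sigma."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Probability of a statistic at least as extreme as |z| when H0 is true.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical numbers: H0 says the mean 400 m time is 52.0 s.
z, p = z_test_p_value(sample_mean=51.2, mu0=52.0, sigma=2.0, n=36)

alpha = 0.05
decision = "reject H0" if p <= alpha else "do not reject H0"
print(f"z = {z:.2f}, p = {p:.4f}, {decision}")
```

With these numbers the p-value falls below 0.05, so H0 is rejected at the standard cutoff.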
Those who rule
DATA will rule
the entire world.
- Masayoshi Son