
Data Types and Data Analysis

This presentation gives an idea of the different types of data and the statistics used to describe them. It also throws light on various data analysis techniques. More importantly, it emphasizes when to use which statistical technique for data analysis.

Uploaded by

Parag Shah
Copyright
© All Rights Reserved

Data Types and Data Analysis

Dr. Parag Shah


H L College of Commerce
www.paragstatistics.wordpress.com
1
1-1
Session Flow
• What is Research?
• Research Process
• Tools & methods of Data Collection
• Types of Data
• Data Analysis:
– Descriptive and Inferential Statistics

2
1-2
What is Research?

“Research is the systematic approach to obtaining and confirming new and reliable knowledge”
– Systematic and orderly (following a series of steps)
– Purpose is new knowledge, which must be reliable

This is a general definition which applies to all disciplines.
3
1-3
Research is not
Accidental discovery:
• Accidental discovery may occur within a structured research process
• It usually takes the form of a phenomenon not previously noticed
• It may lead to a structured research process to verify or understand the observation

4
1-4
Research is not … cont.
Data collection on its own
• It is an intermediate step toward gaining reliable knowledge
• Collecting reliable data is part of the research process, not the whole of it

5
1-5
Research is not … cont.
Searching out published research results in libraries
(or the internet)
• This is an important early step of research
• The research process always includes desk research and analysis
• But merely reviewing the literature is not research

6
1-6
Research is…
1. Searching for explanations of events, phenomena, relationships and causes
– What, how and why things occur
– Are there interactions?
2. A process
– Planned and managed – to make the information generated
credible
– The process is creative

7
1-7
The Process of Research
• Define Research Problem
• Review of Literature
• Formulation of Hypothesis
• Preparation of Research design
• Data Collection
• Data Analysis
• Interpretation & Report Writing
8
1-8
Data Collection
The 5 W’s of data collection are:
1. What data is to be collected?
2. From whom is the data to be collected?
3. Who will collect the data?
4. Where will the data be collected?
5. When will the data be collected?

9
1-9
Data Collection Methods
Primary data collection method
• Involves collecting data directly from the subjects, by the researcher or a trained data collector.
• Data are collected specifically for the purpose of the research.
• E.g. surveys, interviews, observations.

Secondary data collection method
• Involves the use of data that were collected for purposes other than the current research.
• E.g. diaries, nurses’ notes, care plans, patient medication records, statistical abstracts, census reports, and other published or unpublished data.

10
1-10
Primary Data
Advantages
• Tailored to the research needs.
• The researcher can determine exactly what data will be collected and can identify the specific tools that will be used.
• Completeness of data is ensured.

Disadvantages
• Time-consuming.
• Relies on subjects’ recall and communication abilities.
• Bias may occur due to various factors.
• The reliability of raters needs to be checked.

11
1-11
Secondary Data
Advantages
• It is easier and quicker to obtain.
• The researcher’s own biases play no part in collecting the data.
• Economical and time-saving.
• Participants’ co-operation may not be necessary, which eliminates the biases related to participant awareness.

Disadvantages
• Accuracy, completeness and reliability depend upon the individual who originally collected the data.
• The data may not be suitable for answering the current research question.
• Missing data and inaccuracies are common.
• Biases are commonly expected.

12
1-12
Tools & Methods of Data Collection
• Tools
A device or instrument used by the researcher to collect data
• Methods
Various steps or strategies used for gathering and analyzing data in research
13
1-13
Tools for Data Collection
• Questionnaire
• Interview checklist
• Observational form
• Attitude/view scale
• Content analysis form
• Field tools and equipment
14
1-14
Data Collection Methods
• Questionnaire
• Observations
• Experiments
• Simulation
• Ethnographic research

15
1-15
Questionnaire Method
• Questionnaires are perhaps the most commonly
used method in sport-related research.

• Defined simply, a questionnaire is a standardized set of questions used to gain information from a subject.

16
1-16
Categories of Questionnaire
Questionnaires generally fall into one of three categories:
1. Postal questionnaire. The questionnaire is given or posted to the participant, who completes it in his or her own time. The participant then posts the completed questionnaire back to the researcher.

2. Telephone questionnaire. The researcher questions the participant over the telephone and fills in the responses.

3. Face-to-face questionnaire. The researcher and participant are in the same location, and the researcher asks the questions ‘face to face’.
17
1-17
Observation
Observation is often classified as either non-participant or participant observation.
Non-participant observation is the simplest form, and is where
the researcher will observe the phenomenon ‘from outside’
with no engagement with either the activity or the subjects.
Data can be collected using various techniques, for example
video, photography, or simply watching with the naked eye, and
recording the data on an appropriate sheet.

18
1-18
Observation cont.

In participant observation, the researcher actually takes part in the phenomenon being studied.
Data in this instance are recorded by the researcher in the form of field notes, in which the researcher’s own experiences are recorded.

19
1-19
Experimental
In an experiment, a treatment is applied to part of a
population and responses are observed.
For example, an experiment was performed in which
diabetics took cinnamon extract daily while a control
group took none. After 40 days, the diabetics who had
the cinnamon reduced their risk of heart disease
while the control group experienced no change.
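What makes this an experiment is that the researcher decides who receives the treatment. A minimal Python sketch of random assignment, assuming a hypothetical list of ten subject IDs and an even split into a treatment group and a control group (the helper name `randomly_assign` is illustrative, not from the slides):

```python
import random

def randomly_assign(subjects, seed=None):
    """Shuffle subjects and split them into treatment and control groups.

    Hypothetical helper: assumes a 50/50 split; with an odd number of
    subjects the control group receives the extra subject.
    """
    rng = random.Random(seed)      # seeded generator so the split is reproducible
    shuffled = list(subjects)      # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

subjects = [f"S{i:02d}" for i in range(1, 11)]   # ten hypothetical subject IDs
treatment, control = randomly_assign(subjects, seed=42)
print("Treatment (cinnamon extract):", treatment)
print("Control (no extract):", control)
```

Random assignment is a design choice: because chance alone decides group membership, the two groups should, on average, differ only in whether they received the treatment.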
20
1-20
Simulation
A simulation is the use of a mathematical or physical model to reproduce the conditions of a situation or process. Collecting data often involves the use of computers.
Simulations allow you to study situations that are impractical or even dangerous to create in real life, and they often save time and money.
For example, automobile manufacturers use simulations with crash-test dummies to study the effect of crashes on humans, and investigators recreate plane crashes in simulated environments to find their causes.
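A minimal Monte Carlo sketch of a simulation built on a mathematical model. It assumes a hypothetical penalty taker who scores each kick independently with probability 0.75, repeats a five-kick shootout many times on the computer, and estimates how often at least 4 of the 5 kicks are scored. Because this toy model also has an exact binomial answer, the simulated and exact values can be compared. The scenario and function names are illustrative assumptions.

```python
import random
from math import comb

def simulate_shootout(p_score, kicks=5, target=4, trials=100_000, seed=1):
    """Estimate P(at least `target` goals from `kicks` penalties) by repeated simulation."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        # Each kick is scored with probability p_score; True counts as 1 goal.
        goals = sum(rng.random() < p_score for _ in range(kicks))
        if goals >= target:
            successes += 1
    return successes / trials

def exact_probability(p_score, kicks=5, target=4):
    """Exact binomial probability of the same event, for comparison."""
    return sum(comb(kicks, k) * p_score**k * (1 - p_score)**(kicks - k)
               for k in range(target, kicks + 1))

estimate = simulate_shootout(0.75)
exact = exact_probability(0.75)
print(f"simulated: {estimate:.4f}, exact: {exact:.4f}")
```

In real applications, such as crash testing, there is usually no exact formula to check against, which is precisely when simulation is most useful.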
21
1-21
Ethnographic Research
In this method, the ethnographer investigates a group by collecting data over a substantial period of time, generally using a number of different methods.
The ethnographer studies the group on its own
ground, observing natural behaviours in a natural
setting.

22
1-22
23
1-23
What is Data?

Data is a collection of facts or information from which conclusions may be drawn.

24
1-24
Types of Data
A. Qualitative or Attribute data - the characteristic being
studied is nonnumeric.
E.g.: gender, religious affiliation, state of birth, country represented, words, images, videos

B. Quantitative data - information is reported numerically.

E.g.: time (in seconds) for a 400 m race, prize money won by a tennis player, or number of boundaries scored in a match.

1-25
Quantitative Variables - Classifications

Quantitative variables can be classified as either discrete or continuous.

A. Discrete variables can assume only certain values, typically whole-number counts.
EXAMPLE: the number of goals in a football match, or the number of wickets taken by a bowler in a cricket match (0, 1, 2, 3, …).

B. Continuous variables can assume any value within a specified range.
EXAMPLE: the height of an athlete or the weight of a boxer.
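One way to see these distinctions is in how a record might be stored. A minimal Python sketch, assuming a hypothetical athlete record: qualitative variables are text labels, a discrete variable is a whole-number count, and continuous variables are decimal measurements. The class, field names and values are illustrative only, and classifying by Python type is a rough heuristic, not a rule.

```python
from dataclasses import dataclass

@dataclass
class AthleteRecord:
    # Qualitative (attribute) variables: non-numeric labels
    name: str
    country_represented: str
    # Quantitative, discrete: a count that takes only whole-number values
    goals_in_match: int
    # Quantitative, continuous: measurements that can take any value in a range
    height_cm: float
    weight_kg: float

def variable_kind(value):
    """Rough classification by Python type (a heuristic, not a rule)."""
    if isinstance(value, str):
        return "qualitative"
    if isinstance(value, int):
        return "quantitative, discrete"
    if isinstance(value, float):
        return "quantitative, continuous"
    return "unknown"

record = AthleteRecord("Player A", "India", goals_in_match=2, height_cm=178.4, weight_kg=72.35)
for field_name, value in vars(record).items():
    print(f"{field_name:20} {value!r:12} {variable_kind(value)}")
```

The heuristic breaks down as soon as qualitative data are coded as numbers (for example 1 = male, 2 = female): the codes are integers, but the variable is still qualitative. This is why the level of measurement, not the storage format, should guide the analysis.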
1-26
Summary of Types of Variables

1-27
Levels of Measurement

• Categorical: Nominal, Ordinal

• Scale: Interval, Ratio

28
1-28
Nominal-Level Data
Properties:
• Observations of a qualitative variable can only be classified and counted.
• There is no particular order to the labels.

1-29
Ordinal-Level Data
Properties:
• Data classifications are represented by sets of labels or names (high, medium, low) that have relative values.
• Because of the relative values, the data classified can be ranked or ordered.

1-30
Interval-Level Data
Properties:
Example: Women’s dress sizes listed on the table.
• Data classifications are ordered according to the amount of the characteristic they possess.
• Equal differences in the characteristic are represented by equal differences in the measurements.

1-31
Ratio-Level Data
• Practically all quantitative data is recorded on the ratio level of measurement.
• Ratio level is the “highest” level of measurement.

Properties:
• Data classifications are ordered according to the amount of the characteristic they possess.
• Equal differences in the characteristic are represented by equal differences in the numbers assigned to the classifications.
• The zero point represents the absence of the characteristic, so the ratio between two numbers is meaningful.

1-32
Four Levels of Measurement
Nominal level – data that are classified into categories and cannot be arranged in any particular order.
EXAMPLES: eye color, gender, religious affiliation.

Ordinal level – data arranged in some order, but the differences between data values cannot be determined or are meaningless.
EXAMPLE: During a taste test of 4 soft drinks, Thums Up was ranked number 1, Sprite number 2, Seven-up number 3, and Fanta number 4.

Interval level – similar to the ordinal level, with the additional property that meaningful amounts of differences between data values can be determined. There is no natural zero point.
EXAMPLES: temperature on the Fahrenheit scale, size of garment, Likert scale.

Ratio level – the interval level with an inherent zero starting point. Differences and ratios are meaningful for this level of measurement.
EXAMPLES: monthly income of surgeons, or distance traveled by manufacturer’s representatives per month.
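The Fahrenheit example can be checked numerically. A minimal sketch, assuming two illustrative temperatures of 40 °F and 80 °F: differences survive a change of unit, but the ratio does not, because the zero of the Fahrenheit (and Celsius) scale is arbitrary. Kelvin, which starts at absolute zero, is included as a ratio-level comparison.

```python
def fahrenheit_to_celsius(f):
    return (f - 32) * 5 / 9

def fahrenheit_to_kelvin(f):
    return fahrenheit_to_celsius(f) + 273.15

cool_f, warm_f = 40.0, 80.0

# Differences carry over between units: a 40 °F gap is always about a 22.2 °C gap.
diff_c = fahrenheit_to_celsius(warm_f) - fahrenheit_to_celsius(cool_f)

# Ratios do not: "twice as hot" depends on where each scale puts its zero.
ratio_f = warm_f / cool_f                                               # 2.0
ratio_c = fahrenheit_to_celsius(warm_f) / fahrenheit_to_celsius(cool_f)
ratio_k = fahrenheit_to_kelvin(warm_f) / fahrenheit_to_kelvin(cool_f)   # Kelvin has a true zero

print(f"difference: {diff_c:.1f} °C")
print(f"ratio in °F: {ratio_f:.2f}, in °C: {ratio_c:.2f}, in K: {ratio_k:.2f}")
```

The same pair of temperatures gives a ratio of 2 in °F and 6 in °C; only on the Kelvin scale, which has a true zero, is the ratio meaningful. That is the interval versus ratio distinction.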

1-33
Summary of the Characteristics for Levels of Measurement

1-34
Why Know the Level of Measurement of the Data?

• The level of measurement of the data dictates the calculations that can be done to summarize and present the data.
• It determines the statistical tests that should be performed on the data.

1-35
Data Preparation
Four steps of Data Preparation
• Data Coding
• Data Entry
• Data Checking
• Dealing with missing data

36
1-36
37
1-37
Types of Analysis

38
1-38
• Descriptive statistics describe the data that were collected (a sample or a whole population) through numerical calculations, graphs or tables.

• Inferential statistics make inferences and predictions about a population based on a sample of data taken from the population in question.
39
1-39
Descriptive Statistics
Summarizing Data:
– Central Tendency (or Groups’ “Middle Values”)
• Mean
• Median
• Mode
– Variation (or Summary of Differences Within Groups)
• Range
• Interquartile Range
• Variance
• Standard Deviation
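All of these summaries are available in Python's standard `statistics` module. A minimal sketch, assuming hypothetical data on goals scored by a team in ten matches:

```python
import statistics

# Hypothetical data: goals scored by a team in ten matches.
goals = [0, 1, 1, 2, 2, 2, 3, 4, 5, 10]

# Central tendency
mean = statistics.mean(goals)
median = statistics.median(goals)
mode = statistics.mode(goals)

# Variation
data_range = max(goals) - min(goals)
q1, _, q3 = statistics.quantiles(goals, n=4)  # quartiles ('exclusive' method by default)
iqr = q3 - q1
variance = statistics.variance(goals)         # sample variance (divides by n - 1)
sd = statistics.stdev(goals)                  # sample standard deviation

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={data_range}, IQR={iqr}, variance={variance:.3f}, SD={sd:.3f}")
```

The single large value (10 goals) pulls the mean above the median. Note that software packages use different conventions for quartiles, so the IQR reported by another tool may differ slightly.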

1-40
Choosing summary statistics

Which average and measure of spread?

Scale data
• Normally distributed: Mean (Standard deviation)
• Skewed data: Median (Interquartile range)

Categorical data
• Ordinal: Median (Interquartile range)
• Nominal: Mode (None)
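The choice of summary statistics above is a small decision rule. A minimal Python sketch of it, assuming the level of measurement is passed as a string and that, for scale (interval or ratio) data, the analyst has already judged whether the data are roughly normal; how to judge normality is left out. The function name is hypothetical.

```python
def choose_summary(level, normally_distributed=None):
    """Return (average, measure of spread) for a level of measurement.

    level: 'nominal', 'ordinal', 'interval' or 'ratio'.
    normally_distributed: required only for scale (interval/ratio) data.
    """
    level = level.lower()
    if level == "nominal":
        return ("mode", None)                       # no meaningful measure of spread
    if level == "ordinal":
        return ("median", "interquartile range")
    if level in ("interval", "ratio"):              # "scale" data
        if normally_distributed is None:
            raise ValueError("for scale data, state whether it is normally distributed")
        if normally_distributed:
            return ("mean", "standard deviation")
        return ("median", "interquartile range")    # skewed scale data
    raise ValueError(f"unknown level of measurement: {level!r}")

print(choose_summary("ratio", normally_distributed=False))
```

Raising an error for scale data without a normality judgement is a deliberate choice: it forces the distribution to be checked before a mean is reported.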
1-41
Which graph?
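First variable scale
• Only 1 variable: Histogram
• Second variable scale: Scatter plot
• Second variable categorical: Box-plot

First variable categorical
• Only 1 variable: Pie/Bar chart
• Second variable scale: Box-plot
• Second variable categorical: Stacked/multiple bar chart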

1-42
Flow chart of commonly used descriptive statistics and graphical illustrations
Exploring data
• Descriptive statistics
– Categorical data: frequency; percentage (row, column or total)
– Continuous data, measure of location: mean; median
– Continuous data, measure of variation: standard deviation; range (min, max); inter-quartile range (LQ, UQ)
• Graphical illustrations
– Categorical data: bar chart; clustered bar charts (two categorical variables)
– Continuous data: histogram, box & whisker plot and dot plot (each can be plotted against a categorical variable); scatter plot (two continuous variables)
1-43
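For categorical data, the flow chart starts with frequencies and percentages. A minimal sketch using Python's `collections.Counter`, assuming made-up responses about respondents' preferred sport; only total percentages are shown, since row and column percentages need a second variable.

```python
from collections import Counter

# Hypothetical categorical data: preferred sport of 10 respondents.
responses = ["Cricket", "Football", "Cricket", "Tennis", "Cricket",
             "Football", "Hockey", "Cricket", "Tennis", "Football"]

counts = Counter(responses)
total = sum(counts.values())

print(f"{'Category':10} {'Frequency':>9} {'Percent':>8}")
for category, freq in counts.most_common():      # most frequent category first
    print(f"{category:10} {freq:>9} {100 * freq / total:>7.1f}%")
print(f"{'Total':10} {total:>9} {100.0:>7.1f}%")
```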
44
1-44
Inferential Statistics
The methods of inferential statistics are
• the estimation of parameter(s)
• the testing of statistical hypotheses
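Estimation can give a single value (a point estimate) or a range (an interval estimate). A minimal sketch, assuming a hypothetical sample of 400 m race times from a roughly normal population: the sample mean is the point estimate, and a 95% confidence interval is built as mean ± t × standard error, with t = 2.201 (the two-sided 95% critical value of Student's t for 11 degrees of freedom, taken from a t table).

```python
import math
import statistics

# Hypothetical sample: 400 m race times (seconds) for 12 athletes.
times = [48.2, 49.1, 47.8, 50.3, 48.9, 49.5, 48.1, 47.6, 49.8, 48.4, 50.1, 48.7]

n = len(times)
mean = statistics.mean(times)                 # point estimate of the population mean
se = statistics.stdev(times) / math.sqrt(n)   # estimated standard error of the mean

# Interval estimate: mean ± t * SE, using the t critical value for n - 1 = 11 df.
t_crit = 2.201
lower, upper = mean - t_crit * se, mean + t_crit * se
print(f"point estimate = {mean:.2f} s, 95% CI = ({lower:.2f}, {upper:.2f}) s")
```

The critical value must match the sample size: with a different n, look up t for n - 1 degrees of freedom.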

45
1-45
Parametric or Non-parametric?

• Parametric tests are restricted to data that:
1) show a normal distribution
2) are independent of one another
3) are on the same continuous scale of measurement

• Non-parametric tests are used on data that:
1) do not show a normal distribution
2) are dependent or conditional on one another
3) in general, do not have a continuous scale of measurement

1-46
Parametric Tests

47
1-47
48
1-48
Non-Parametric Tests

49
1-49
Parametric & Non-Parametric tests
Purpose of test: parametric test; non-parametric test
• Compare a central value (mean/median) with a specific value: one-sample t/Z test; Wilcoxon signed-rank test
• Compare the central values of two independent samples: two-sample t/Z test; Mann-Whitney test
• Compare the central values of two dependent samples: paired t test; Wilcoxon signed-rank test
• Compare the central values of three or more samples (one variable): one-way ANOVA; Kruskal-Wallis test
• Compare the central values of three or more samples (two variables): two-way ANOVA; Friedman test
• Test the independence of two categorical variables: chi-square test
50
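The chi-square test of independence compares observed counts O with the counts E expected if the two variables were independent, E = row total × column total / n, through χ² = Σ (O − E)² / E. A minimal sketch for a 2×2 table, assuming hypothetical counts; for one degree of freedom the p-value can be computed with the standard library as erfc(√(χ²/2)), which is why the sketch is limited to 2×2 tables.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no continuity correction)."""
    (a, b), (c, d) = table
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # expected count under independence
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: rows and columns are the categories of two variables.
observed = [[10, 20],
            [20, 10]]
chi2, p = chi_square_2x2(observed)
print(f"chi-square = {chi2:.3f}, p = {p:.4f}")
```

In practice a library routine such as SciPy's `scipy.stats.chi2_contingency` would normally be used; by default it applies Yates' continuity correction to 2×2 tables, so its result will differ from this uncorrected sketch.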
1-50
P-value
• P-value ≡ the probability, assuming H0 is true, that the test statistic would take a value as extreme as, or more extreme than, the observed test statistic
• Smaller P-values → stronger evidence against H0
• For a typical analysis using the standard α = 0.05 cutoff, the null hypothesis is
– rejected when p ≤ 0.05, and
– not rejected when p > 0.05.
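The definition of the P-value can be made concrete with an exact permutation test. This technique is not covered in the slides; it is swapped in here because it computes the "as extreme or more extreme, when H0 is true" probability by direct counting. A minimal sketch, assuming two small hypothetical groups and a two-sided test on the difference in means; the function name is illustrative.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test for a difference in means.

    Under H0 (no difference), every split of the pooled values into
    groups of the original sizes is equally likely.
    """
    pooled = group_a + group_b
    n_a = len(group_a)
    observed = abs(mean(group_b) - mean(group_a))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count splits at least as extreme as the observed one (small float tolerance).
        if abs(mean(b) - mean(a)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

p = permutation_p_value([1, 2, 3], [4, 5, 6])
alpha = 0.05
print(f"p = {p:.3f}:", "reject H0" if p <= alpha else "do not reject H0")
```

Here only 2 of the 20 possible splits are as extreme as the observed one, so p = 0.1 and H0 is not rejected at α = 0.05, even though the groups do not overlap at all. With very small samples, the smallest achievable P-value is limited.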
51
1-51
Those who rule
DATA will rule
the entire world.
- Masayoshi Son

52
1-52
