
Statistic Notes

The document discusses different types of data including univariate, bivariate, and multivariate data. It also covers attribute, quantitative, discrete, and continuous data as well as nominal, ordinal, interval, and ratio levels of measurement. The document concludes by defining time series and cross-sectional data.

Uploaded by Waleed Khan
Copyright © All Rights Reserved

Basics of Data

Contents
Basic Elements of Statistics
Data Come From
Data Types: (Basis = Number of Variables)
Data Types in General
Levels of Measurement
Time Series and Cross-sectional Data
Basic Elements of Statistics
Individuals, Variables, Observation(s)
• Individual – an item for study (e.g., an employee in your company).
• Variable – a characteristic of the subject or individual (e.g., the employee’s income).
• Observation – the value of a variable measured for an individual. A set of observations makes a data set.
Data Come From:
• Data is the plural form of the Latin datum (a “given”
fact).
• In scientific research, data arise
from experiments whose results
are recorded systematically.
• In business, data usually arise from
accounting transactions or
management processes.

• Important decisions may depend on data.


McGraw-Hill/Irwin © 2007 The McGraw-Hill Companies, Inc. All rights reserved.
Data Types (Basis = Number of Variables)
• Three types of data sets:

Data Set        Variables        Typical Tasks
Univariate      One              Histograms, descriptive statistics, frequency tallies
Bivariate       Two              Scatter plots, correlations, simple regression
Multivariate    More than two    Multiple regression, data mining, econometric modeling
Data Types: An Example
Individuals, Variables, Data Sets
Consider a multivariate data set with 5 variables and 8 individuals. It contains 5 × 8 = 40 observations.
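The observation count follows directly from the shape of the data set: each (individual, variable) cell holds one observation. In the minimal sketch below, the five variable names are hypothetical placeholders and no values are filled in.

```python
# Hypothetical variable names standing in for the example's 5 variables.
variables = ["age", "income", "gender", "department", "years_employed"]
n_individuals = 8

# One row per individual; each row has one slot per variable (None = not yet measured).
data_set = [{var: None for var in variables} for _ in range(n_individuals)]

# Every (individual, variable) cell is one observation.
n_observations = sum(len(row) for row in data_set)
print(n_observations)  # 40
```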
Data Types in General
Data Types
• A data set may have one or more data types.

Types of Data
• Attribute (qualitative)
- Verbal label: X = economics (your major)
- Coded: X = 3 (i.e., economics)
• Numerical (quantitative)
- Discrete: X = 2 (your siblings)
- Continuous: X = 3.15 (your GPA)
Data Types in General (cont.)
Attribute/Qualitative Data
• Also called categorical, nominal/ordinal, or qualitative data.
• Values are described by words rather than numbers.
• For example,
- Automobile style (e.g., X = full, midsize, compact, subcompact).
- Mutual fund (e.g., X = load, no-load).
• Data obtained when observations are made about a qualitative variable.
Data Types in General (cont.)
Quantitative Data
• Observations about a quantitative variable, obtained by counting, measurement, or some kind of calculation.
• For example,
- Number of auto insurance claims filed in March (e.g., X = 114 claims).
- Ratio of profit to sales for last quarter (e.g., X = 0.0447).
• Can be broken down into two types: discrete or continuous data.
Data Types in General (cont.)
Discrete Data
• A numerical variable with a countable number of values that can be represented by integers (no fractional values).
• For example,
- Number of Medicaid patients (e.g., X = 2).
- Number of takeoffs at O’Hare (e.g., X = 37).
Data Types in General (cont.)
Continuous Data
• A numerical variable that can have any value within an interval (e.g., length, weight, time, sales, price/earnings ratios).
• Any continuous interval contains infinitely many possible values (e.g., 426 < X < 428).
Data Types in General (cont.)
Rounding
• Ambiguity is introduced when continuous data are rounded to whole numbers.
• The underlying measurement scale is still continuous.
• Precision of measurement depends on the instrument.
• Sometimes discrete data are treated as continuous when the range is very large (e.g., SAT scores) and small differences (e.g., 604 vs. 605) aren’t of much importance.
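A small sketch of the rounding ambiguity, using two hypothetical measurements from the interval mentioned earlier: once rounded to whole numbers, values that differ by almost a full unit become indistinguishable.

```python
# Two hypothetical continuous measurements that differ by 0.8 units.
measurements = [426.6, 427.4]

# Rounding to whole numbers discards the fractional part of the information.
rounded = [round(x) for x in measurements]
print(rounded)  # [427, 427]

# A recorded 427 only says the true value was roughly between 426.5 and 427.5.
```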
Level of Measurement
Four levels of measurement for data:

Level of Measurement   Characteristics          Example
Nominal                Categories only          Eye color (blue, brown, green, hazel)
Ordinal                Rank has meaning         Bond ratings (Aaa, Aa, A, Baa, etc.)
Interval               Distance has meaning     Temperature (57°C)
Ratio                  Meaningful zero exists   Accounts payable ($21.7 million)
Level of Measurement (cont.)
Nominal Measurement
• Nominal data merely identify a category.
• Nominal data are qualitative, attribute, categorical, or classification data (e.g., Apple, Compaq, Dell, HP).
• Nominal data are usually coded numerically, but the codes are arbitrary (e.g., 1 = Apple, 2 = Compaq, 3 = Dell, 4 = HP).
• The only meaningful mathematical operations are counting (e.g., frequencies) and simple statistics such as the mode.
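To illustrate which operations make sense for nominal data, the sketch below tallies hypothetical brand responses coded with the arbitrary codes from the text and reports the mode. It uses `collections.Counter` from the Python standard library; the response list is made up.

```python
from collections import Counter

# Arbitrary codes from the text: the numbers are labels, not quantities.
BRANDS = {1: "Apple", 2: "Compaq", 3: "Dell", 4: "HP"}

# Hypothetical coded responses (e.g., "Which brand did you buy?").
responses = [3, 1, 3, 4, 2, 3, 1, 4, 3]

# Counting is meaningful: frequency of each category.
freq = Counter(BRANDS[code] for code in responses)

# The mode (most frequent category) is a valid summary.
mode_brand, mode_count = freq.most_common(1)[0]
print(mode_brand, mode_count)  # Dell 4

# Averaging the codes (about 2.67 here) would be meaningless: 1-4 are arbitrary labels.
```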
Level of Measurement (cont.)
Ordinal Measurement
• Ordinal data codes can be ranked (e.g., 1 = Frequently, 2 = Sometimes, 3 = Rarely, 4 = Never).
• Distance between codes is not meaningful (e.g., the distance between 1 and 2, between 2 and 3, or between 3 and 4 lacks meaning).
• Many useful statistical tests exist for ordinal data. They are especially useful in social science, marketing, and human resource research.
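Because only order matters for ordinal codes, order-based summaries such as the median are appropriate. Here is a minimal sketch with hypothetical responses coded as in the text. It uses `statistics.median_low` so that the result is always one of the actual codes rather than a midpoint between two of them.

```python
import statistics

# Ordinal codes from the text.
LABELS = {1: "Frequently", 2: "Sometimes", 3: "Rarely", 4: "Never"}

# Hypothetical survey responses.
responses = [1, 2, 2, 3, 4, 2, 3]

# The median relies only on ranking, so it is valid for ordinal data.
median_code = statistics.median_low(responses)
print(LABELS[median_code])  # Sometimes
```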
Level of Measurement (cont.)
Interval Measurement
• Data can not only be ranked but also have meaningful intervals between scale points (e.g., the difference between 60°F and 70°F is the same as the difference between 20°F and 30°F).
• Since intervals between numbers represent distances, arithmetic operations such as averaging can be performed.
• The zero point of an interval scale is arbitrary, so ratios are not meaningful (e.g., 60°F is not twice as warm as 30°F).
Level of Measurement (cont.)
Likert Scales
• A special case of interval data frequently used in survey research.
• The coarseness of a Likert scale refers to the number of scale points (typically 5 or 7).
“College-bound high school students should be required to study a foreign language.” (check one)
☐ Strongly Agree
☐ Somewhat Agree
☐ Neither Agree Nor Disagree
☐ Somewhat Disagree
☐ Strongly Disagree
Level of Measurement (cont.)
Likert Scales
• A neutral midpoint (“Neither Agree Nor Disagree”) is possible when an odd number of scale points is used. It can be omitted (by using an even number of points) to force the respondent to “lean” one way or the other.
• Likert data are coded numerically (e.g., 1 to 5), but any equally spaced values will work.

Likert coding: 1 to 5 scale    Likert coding: -2 to +2 scale
5 = Help a lot                 +2 = Help a lot
4 = Help a little              +1 = Help a little
3 = No effect                   0 = No effect
2 = Hurt a little              -1 = Hurt a little
1 = Hurt a lot                 -2 = Hurt a lot
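Since any equally spaced codes work, moving from the 1 to 5 coding to the -2 to +2 coding is just a shift by the midpoint 3. The sketch below uses hypothetical responses to show that the shift moves the mean by exactly 3 while leaving the spacing between categories unchanged.

```python
import statistics

# Hypothetical responses on the 1-to-5 coding (5 = Help a lot ... 1 = Hurt a lot).
responses_1_to_5 = [5, 4, 4, 3, 2, 5, 1, 4]

# Recode to the -2..+2 scale by subtracting the midpoint (3 = No effect becomes 0).
responses_centered = [x - 3 for x in responses_1_to_5]

mean_1_to_5 = statistics.mean(responses_1_to_5)      # 3.5
mean_centered = statistics.mean(responses_centered)  # 0.5: same position, shifted by 3
print(mean_1_to_5, mean_centered)
```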
Level of Measurement (cont.)
Likert Scales
• Careful choice of verbal anchors results in measurable intervals (e.g., the distance from 1 to 2 is “the same” as the interval, say, from 3 to 4).
• Ratios are not meaningful (e.g., here 4 is not twice 2).
• Many statistical calculations can be performed (e.g., averages, correlations).
Level of Measurement (cont.)
Ambiguity
• Grades are usually coded numerically (A = 4, B = 3, C = 2, D = 1, F = 0) and used to calculate a mean GPA.
• Is the interval from 3.0 to 4.0 really the same as the interval from 1.0 to 2.0?
• What underlying reality, ranging from 0 to 4, are we measuring?
• It is best to be conservative and limit statistical tests to those for ordinal data.
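The two treatments discussed above can be compared on a hypothetical transcript. The usual mean GPA treats the grade codes as interval data, while the median grade uses only their ranking. The sketch assumes all courses carry equal credit, so no credit-hour weighting is applied.

```python
import statistics

# Grade coding from the text.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}
POINT_TO_GRADE = {v: k for k, v in GRADE_POINTS.items()}

grades = ["A", "B", "B", "C", "A"]  # hypothetical transcript, equal credit hours
points = [GRADE_POINTS[g] for g in grades]

# Interval treatment: the familiar mean GPA.
gpa = statistics.mean(points)

# Conservative ordinal treatment: the median grade.
median_grade = POINT_TO_GRADE[statistics.median_low(points)]
print(round(gpa, 2), median_grade)  # 3.2 B
```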
Level of Measurement (cont.)
Ratio Measurement
• Ratio data have all the properties of nominal, ordinal, and interval data and also possess a meaningful zero (absence of the quantity being measured).
• Because of this zero point, ratios of data values are meaningful (e.g., $20 million profit is twice as much as $10 million).
• Zero does not have to be observable in the data; it is an absolute reference point.
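Because a ratio scale has a true zero, changing the unit of measurement (here from dollars to millions of dollars) multiplies every value by the same constant and leaves ratios unchanged. Here is a minimal check using the profit figures from the text:

```python
profit_a = 20_000_000  # $20 million
profit_b = 10_000_000  # $10 million

# Ratio in dollars.
ratio_dollars = profit_a / profit_b

# Same ratio after rescaling both values to millions of dollars.
ratio_millions = (profit_a / 1_000_000) / (profit_b / 1_000_000)

print(ratio_dollars, ratio_millions)  # 2.0 2.0
```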
Level of Measurement (cont.)
Use the following procedure to recognize data types. Ask the questions in order; the first “Yes” identifies the level.

Q1. Is there a meaningful zero point?
    If “Yes”: Ratio data (all statistical operations are allowed)
Q2. Are intervals between scale points meaningful?
    If “Yes”: Interval data (common statistics allowed, e.g., means and standard deviations)
Q3. Do scale points represent rankings?
    If “Yes”: Ordinal data (restricted to certain types of nonparametric statistical tests)
Q4. Are there discrete categories?
    If “Yes”: Nominal data (only counting allowed, e.g., finding the mode)
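The question sequence can be written as a small decision function. This is a sketch with some caveats. The function and parameter names are hypothetical, and the analyst must still supply the answers to Q1 to Q4. The fallback return value for four “No” answers is also an addition of this sketch.

```python
def level_of_measurement(meaningful_zero: bool,
                         meaningful_intervals: bool,
                         ranked: bool,
                         discrete_categories: bool) -> str:
    """Ask Q1-Q4 in order; the first 'yes' determines the level."""
    if meaningful_zero:          # Q1
        return "ratio"
    if meaningful_intervals:     # Q2
        return "interval"
    if ranked:                   # Q3
        return "ordinal"
    if discrete_categories:      # Q4
        return "nominal"
    return "undetermined"        # hypothetical fallback, not part of the procedure


# Examples from the level-of-measurement table.
print(level_of_measurement(True, True, True, True))     # accounts payable -> ratio
print(level_of_measurement(False, True, True, True))    # temperature (C) -> interval
print(level_of_measurement(False, False, True, True))   # bond ratings -> ordinal
print(level_of_measurement(False, False, False, True))  # eye color -> nominal
```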
Level of Measurement (cont.)
Changing Data by Recoding
• To simplify data, or when the exact magnitude is of little interest, ratio data can be recoded downward into ordinal or nominal measurements (but not conversely).
• For example, recode systolic blood pressure as “normal” (under 130), “elevated” (130 to 140), or “high” (over 140).
• The recoded data above are ordinal (ranking is preserved), but the intervals are unequal and some information is lost.
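The blood-pressure recoding can be sketched as a function using the cutoffs given above (the function name is hypothetical). Two readings eight points apart land in the same category, which is the information loss the text describes.

```python
def recode_systolic(bp: float) -> str:
    """Recode systolic blood pressure into the ordinal categories from the text."""
    if bp < 130:
        return "normal"      # under 130
    if bp <= 140:
        return "elevated"    # 130 to 140
    return "high"            # over 140


readings = [118, 131, 139, 152]
print([recode_systolic(x) for x in readings])
# ['normal', 'elevated', 'elevated', 'high']
```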
Time Series and Cross-sectional Data
Time Series Data
• Each observation in the sample represents a different, equally spaced point in time (e.g., years, months, days).
• Periodicity may be annual, quarterly, monthly, weekly, daily, hourly, etc.
• We are interested in trends and patterns over time (e.g., annual growth in consumer debit card use from 1999 to 2006).
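For equally spaced annual data, one common way to look at growth is the year-over-year growth rate, (x_t - x_{t-1}) / x_{t-1}. The sketch below applies it to a hypothetical annual series. The values are invented and are not the debit card data mentioned in the text.

```python
# Hypothetical annual series, one value per equally spaced year.
series = {2003: 10.0, 2004: 12.0, 2005: 15.0, 2006: 18.0}

years = sorted(series)

# Year-over-year growth rate: (x_t - x_{t-1}) / x_{t-1}.
growth = {
    year: (series[year] - series[prev]) / series[prev]
    for prev, year in zip(years, years[1:])
}

for year, g in growth.items():
    print(year, f"{g:.1%}")
# 2004 20.0%
# 2005 25.0%
# 2006 20.0%
```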
Time Series and Cross-sectional Data (cont.)
Cross-sectional Data
• Each observation represents a different individual unit (e.g., a person) at the same point in time (e.g., VISA balances of many cardholders for the same month).
• We are interested in
- variation among observations, or in
- relationships.
• We can combine the two data types to get pooled cross-sectional and time series data.
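Pooled data can be stored as one value per (unit, period) pair. A cross-section then fixes the period, and a time series fixes the unit. Here is a minimal sketch with hypothetical cardholders and balances; the names, years, and figures are invented for illustration.

```python
import statistics

# Hypothetical pooled data: balance for each (cardholder, year) pair.
pooled = {
    ("Ann", 2005): 1200.0, ("Ann", 2006): 1350.0,
    ("Bob", 2005): 800.0,  ("Bob", 2006): 650.0,
    ("Cy", 2005): 400.0,   ("Cy", 2006): 520.0,
}

def cross_section(data, year):
    """All units observed at the same point in time."""
    return {unit: v for (unit, y), v in data.items() if y == year}

def time_series(data, unit):
    """One unit observed over time, in chronological order."""
    return [data[(u, y)] for (u, y) in sorted(data) if u == unit]

cs_2005 = cross_section(pooled, 2005)
print(cs_2005)                                    # {'Ann': 1200.0, 'Bob': 800.0, 'Cy': 400.0}
print(statistics.pstdev(list(cs_2005.values())))  # variation among units in 2005
print(time_series(pooled, "Bob"))                 # [800.0, 650.0]
```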
