Understanding Data: Dr. Rohit Vishal Kumar

This document discusses different types of data and how they are classified. It begins by defining data as observations of variables and notes that data can be classified as primary or secondary based on its source. It then examines various statistical classifications of data, including categorical vs measurement data, and nominal, ordinal, interval, and ratio scales. The document also reviews descriptive statistics measures like measures of central tendency (mean, median, mode) and measures of dispersion (range, quartile deviation, mean absolute deviation, standard deviation).

Copyright
© Attribution Non-Commercial (BY-NC)

UNDERSTANDING DATA

Dr. Rohit Vishal Kumar


Reader, Department of Marketing
Xavier Institute of Social Service
PO Box No 7, Purulia Road
Ranchi – 834001, Jharkhand, India
Email: [email protected]

(C) Rohit Vishal Kumar, Presented at WBUT, 25-May-09


What is Data?
• Observations of a set of variables
• Lowest level of abstraction from which information is derived

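• Each discipline has evolved its own method of classifying data

• Two broad classifications of data based on source


– Primary Data:
• Data collected first-hand by the researcher for the study at hand (primary source)
– Secondary Data:
• Data collected earlier by someone else for another purpose, e.g. published reports or databases (secondary source)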



Classification :: Statistics
• Categorical Data
– The Objects are grouped into categories based on some Qualitative Trait
– The resultant data are merely labels or categories
– Example:
• Hair Color: Brown / Black / Red
• Attitude Towards Smoking: Favor / Neutral / Against
• Measurement Data
– The Objects are “measured” on some Quantitative Trait
– The resultant data is a set of numbers
– Example:
• Age of the Students
• JEMAT Score
• Number of Students Not Attending Class



Categorical Data
• Nominal Data
– A type of categorical data in which numbers serve only as labels and carry no quantitative meaning
– Example:
• Male: 1
• Female: 2
• Ordinal Data
– A type of categorical data in which the categories (or the numbers assigned to them) indicate a rank order, such as the level of importance or severity
– Example:
• Mild
• Moderate
• Severe



Measurement Data
• Discrete Data
– Only Certain Values are Possible
– There are gaps between the possible values
– Are generated through the process of Counting
– Example:
• Number of students in the class
• Number of Employees Absent from Work
• Continuous Data
– Any value within an interval is possible with a suitable measuring device
– Theoretically, the number can be accurate to any desired number of
decimal places
– Are generated through the process of Measurement
– Example:
• Height in cm
• Time to complete the assignment



Classification :: Scaling Theory
• Nominal Data (Order: No, Distance: No, Origin: No)
– A type of categorical data in which numbers serve only as labels and carry no quantitative meaning
– Example:
• Male: 1
• Female: 2
• Ordinal Data (Order: Yes, Distance: No, Origin: No)
– A type of categorical data in which the categories (or the numbers assigned to them) indicate a rank order, such as the level of importance or severity
– Example:
• Mild
• Moderate
• Severe


Classification :: Scaling Theory
• Interval Data (Order: Yes, Distance: Yes, Origin: No)
– Quantitative data that does not have a true zero point
– Differences between values can be compared within the scale, but ratios are not meaningful and are not preserved when the data are converted to another scale
– Widely used in social research, although many researchers are unclear about what qualifies as an interval scale
– Example:
• Definitely Will Buy / Probably Will Buy / May or May Not Buy / Probably Will Not Buy / Definitely Will Not Buy
• Ratio Data (Order: Yes, Distance: Yes, Origin: Yes)
– Quantitative data that has a true zero point
– Ratios are meaningful and are preserved when values are converted to another scale (10 km is twice 5 km, and 10,000 m is still twice 5,000 m)
– Example:
• Distance in km



Why understand Data?
• The type of analysis depends on the type of data you have collected
• The general guideline is as follows ("+" means the scale also permits everything allowed for the scales listed before it):

– Nominal Data: Mode, Chi-Square

– Ordinal Data: + Median / Percentiles

– Interval Data: + Mean / SD / Correlation / Regression / ANOVA

– Ratio Scale: + Geometric Mean / Harmonic Mean / Coefficient of Variation / Logarithms
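The cumulative nature of this guideline can be captured in a small lookup table. The following is a minimal sketch: the names `SCALES`, `ADDED_STATS` and `permissible_stats` are hypothetical and do not come from the deck.

```python
# Hypothetical encoding of the guideline: each scale permits its own
# statistics plus everything allowed on the scales listed before it.
SCALES = ["nominal", "ordinal", "interval", "ratio"]

ADDED_STATS = {
    "nominal": ["mode", "chi-square"],
    "ordinal": ["median", "percentiles"],
    "interval": ["mean", "sd", "correlation", "regression", "anova"],
    "ratio": ["geometric mean", "harmonic mean",
              "coefficient of variation", "logarithms"],
}


def permissible_stats(scale):
    """Return every statistic the guideline allows for the given scale."""
    level = SCALES.index(scale.lower())  # raises ValueError for an unknown scale
    allowed = []
    for s in SCALES[: level + 1]:
        allowed.extend(ADDED_STATS[s])
    return allowed


print(permissible_stats("ordinal"))  # ['mode', 'chi-square', 'median', 'percentiles']
```

With this table, asking for the mean of ordinal data shows up as a statistic that is not in the permitted list.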



Some Points to Remember
• Tend to use Interval Scales
• Data need not be comparable with other studies
• Data has to make sense in your context
• Students fail to understand the importance of Data
– Wrong Approach
• “I have collected the data… now what do I do?”
– Right Approach
• “What data do I need? Why do I need it? Where will I get it? How will I analyse it to get the answer?”



Descriptive Statistics
:: A Quick Review



Measures of Central Tendency
• Central tendency is “loosely” defined as the location of the center of a distribution of data
• Three basic measures
– Arithmetic Mean
– Median
– Mode
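The three measures can be computed with Python's standard-library `statistics` module. The following is a minimal sketch that uses a hypothetical sample of ten student ages, which is illustrative only and not taken from the deck. `multimode` returns a list because a distribution can have no single mode, or several modes.

```python
import statistics

# Hypothetical sample: ages (in years) of ten students.
ages = [19, 20, 20, 21, 21, 21, 22, 23, 24, 29]

mean = statistics.mean(ages)        # sum of the values / number of values
median = statistics.median(ages)    # average of the two middle values when n is even
modes = statistics.multimode(ages)  # list of all most-frequent values

print(mean, median, modes)  # 22 21.0 [21]
```

The single large value (29) pulls the mean above the median and the mode, which shows the mean's sensitivity to extreme values.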



Arithmetic Mean
• Advantages:
– Easy to Compute
– Affected by every value in the set of observations
– Defined by rigid mathematical formulation
– It is relatively stable, i.e. it is the least affected by sampling fluctuations
– It represents the “center of gravity” of the data
• Disadvantages:
– Unduly affected by extremely small and / or large values
– Cannot be calculated for data with open-ended classes
– Is a good measure only when the distribution is fairly symmetric



Median
• Advantages
– Refers to the “Middle Value” of the distribution
– It is a “positional measure”
– Useful in case of open ended class
– Not seriously affected by Extreme Values
– Most appropriate for dealing with qualitative rank (ordinal) data
– Has a series of related positional measures like Quartiles, Deciles,
Percentiles
• Disadvantages:
– It does not take every value into consideration
– It is not capable of algebraic treatment
– It is erratic if the number of items is small



Mode
• Advantages:
– It is the most typical or representative value of a distribution
– Not unduly affected by extreme values
– It can be used to describe qualitative phenomena
• Disadvantages:
– A distribution may have no mode, or more than one mode
– Not capable of algebraic treatment
– It is not rigidly defined for calculation



Relation Between the 3 Measures
• In a moderately skewed distribution, the three measures are approximately related by:
$\text{Mode} \approx 3\,\text{Median} - 2\,\text{Mean}$
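This empirical relation can be turned into a rough estimator of the mode. The following is a minimal sketch: the helper name `empirical_mode` and the data are hypothetical. The relation is only an approximation, so on small or strongly skewed samples the estimate can be far from the actual mode.

```python
import statistics


def empirical_mode(data):
    """Approximate the mode as 3 * median - 2 * mean.

    Only a rough guide for moderately skewed data; it is not an exact identity.
    """
    return 3 * statistics.median(data) - 2 * statistics.mean(data)


# Hypothetical, symmetric data: the mean, median and mode all equal 4.
scores = [2, 3, 3, 4, 4, 4, 5, 5, 6]
print(empirical_mode(scores), statistics.mode(scores))  # 4 4
```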



Measures of Dispersion
• Dispersion is defined as the degree to which data tend to spread about a central value
• Four absolute measures, each with a corresponding relative measure:
– Range (relative measure: Coefficient of Range)
– Quartile Deviation (relative measure: Coefficient of Quartile Deviation)
– Mean Absolute Deviation (relative measure: Coefficient of MAD)
– Standard Deviation (relative measure: Coefficient of Variation)

• Range and QD are positional measures of dispersion
• MAD and SD are calculated measures of dispersion (they use every observation)



Range
• Range: $\text{Range} = X_{\max} - X_{\min}$

• Coefficient of Range: $\dfrac{X_{\max} - X_{\min}}{X_{\max} + X_{\min}}$

• Advantages
– Simplest to understand and compute
• Disadvantages:
– Not based on each and every item in the data
– Does not take into account the shape of distribution
– Cannot be computed in case of open ended classes
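The two range formulas translate directly into code. The following is a minimal sketch: the function names and the sample of ages are hypothetical.

```python
def value_range(data):
    """Range: largest value minus smallest value."""
    return max(data) - min(data)


def coefficient_of_range(data):
    """Relative (unit-free) version: (max - min) / (max + min)."""
    largest, smallest = max(data), min(data)
    return (largest - smallest) / (largest + smallest)


ages = [19, 20, 20, 21, 21, 21, 22, 23, 24, 29]  # hypothetical sample
print(value_range(ages), round(coefficient_of_range(ages), 4))  # 10 0.2083
```

Only the two extreme values enter either formula, which is why the range ignores every other item in the data.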



Quartile Deviation
• Inter Quartile Range (IQR): $\text{IQR} = Q_3 - Q_1$

• Quartile Deviation (Semi IQR): $\text{QD} = \dfrac{Q_3 - Q_1}{2}$

• Coefficient of QD: $\dfrac{Q_3 - Q_1}{Q_3 + Q_1}$
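The following is a minimal sketch of the quartile-based measures. It assumes the quartiles come from `statistics.quantiles` with its default `"exclusive"` method. Textbooks and software use several different quartile conventions, so other tools may give slightly different values of $Q_1$ and $Q_3$. The function name and the data are hypothetical.

```python
import statistics


def quartile_measures(data):
    """Return (IQR, quartile deviation, coefficient of QD)."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # default method="exclusive"
    iqr = q3 - q1
    return iqr, iqr / 2, (q3 - q1) / (q3 + q1)


ages = [19, 20, 20, 21, 21, 21, 22, 23, 24, 29]  # hypothetical sample
iqr, qd, coeff_qd = quartile_measures(ages)
print(iqr, qd, round(coeff_qd, 4))  # 3.25 1.625 0.0751
```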



Quartile Deviation
• Advantages:
– Can measure variation in open ended distributions
– It is extremely useful in case of erratic or badly skewed data
– It is not affected by extreme values
• Disadvantages:
– Ignores 50% of the data
– Is not capable of mathematical manipulation
– Is not regarded as a true measure of dispersion, since it only shows the distance between two positional points ($Q_1$ and $Q_3$)



Mean Absolute Deviation
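• Mean Absolute Deviation (MAD) defined as:
$\text{MAD} = \dfrac{1}{n}\sum_{i=1}^{n} \lvert x_i - A \rvert$, where $A$ is the mean or the median

• Coefficient of MAD defined as:
$\text{MAD} / \text{Median}$ or $\text{MAD} / \text{Mean}$, dividing by the average about which the MAD was computed
• Advantages:
– Simple to understand and compute
– Based on each and every item in the data
– Less affected by extreme values than the standard deviation
• Disadvantage:
– It is not capable of further mathematical treatment, because taking absolute values ignores the signs of the deviations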
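The following is a minimal sketch of MAD and its coefficient. The `about` parameter selects the mean or the median as the reference average, matching the two choices on the slide. The function names and the data are hypothetical.

```python
import statistics


def _centre(data, about):
    """Return the average the deviations are measured from."""
    if about == "mean":
        return statistics.mean(data)
    if about == "median":
        return statistics.median(data)
    raise ValueError("about must be 'mean' or 'median'")


def mean_absolute_deviation(data, about="mean"):
    """Average absolute distance of the values from the chosen average."""
    centre = _centre(data, about)
    return sum(abs(x - centre) for x in data) / len(data)


def coefficient_of_mad(data, about="mean"):
    """MAD divided by the average it was measured about."""
    return mean_absolute_deviation(data, about) / _centre(data, about)


ages = [19, 20, 20, 21, 21, 21, 22, 23, 24, 29]  # hypothetical sample
print(mean_absolute_deviation(ages), mean_absolute_deviation(ages, "median"))  # 2.0 1.8
```

The MAD about the median (1.8) is smaller than the MAD about the mean (2.0). This is expected, because the median minimises the sum of absolute deviations.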



Standard Deviation
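• Defined as “Root Mean Squared Deviation from Mean”:
$\sigma = \sqrt{\dfrac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$

• Coefficient of Variation: $\text{CV} = \dfrac{\sigma}{\bar{x}} \times 100$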
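The following is a minimal sketch of the SD formula and the coefficient of variation, using a hypothetical sample. It divides by $n$, as in the definition above, which gives the same result as `statistics.pstdev`. `statistics.stdev` instead divides by $n - 1$ (the sample standard deviation), which many software packages report by default.

```python
import math
import statistics


def standard_deviation(data):
    """Population SD: root of the mean squared deviation from the mean."""
    mean = statistics.mean(data)
    return math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))


def coefficient_of_variation(data):
    """SD expressed as a percentage of the mean."""
    return 100 * standard_deviation(data) / statistics.mean(data)


ages = [19, 20, 20, 21, 21, 21, 22, 23, 24, 29]  # hypothetical sample
print(round(standard_deviation(ages), 4), round(coefficient_of_variation(ages), 2))  # 2.7203 12.36
```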



Standard Deviation
• Advantages:
– Generally regarded as the best measure of dispersion
– Possible to calculate the combined standard deviation of two or more groups
– Chebyshev’s Theorem (P. L. Chebyshev, 1821-1894)
• Whatever the distribution, at least 75% of the values fall within +/- 2 SD of the mean, and at least 89% fall within +/- 3 SD of the mean (in general, at least $1 - 1/k^2$ fall within +/- k SD)
– Has a relation with other measures (for a normal distribution):
• QD ≈ 0.6745 SD (about 2/3 SD)
• MAD ≈ 0.7979 SD (about 4/5 SD)
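Both claims on this slide can be checked numerically. The following is a minimal sketch: the function names and the sample are hypothetical. It compares the observed share of values within k SDs against Chebyshev's bound, and it reproduces the normal-curve constants using `statistics.NormalDist`. For a normal curve, QD/SD equals the 75th percentile of the standard normal, and MAD/SD equals $\sqrt{2/\pi}$.

```python
import math
import statistics


def share_within(data, k):
    """Fraction of values lying within k population SDs of the mean."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    return sum(1 for x in data if abs(x - mean) <= k * sd) / len(data)


def chebyshev_bound(k):
    """Chebyshev's guaranteed minimum share within k SDs (k > 1)."""
    return 1 - 1 / k ** 2


ages = [19, 20, 20, 21, 21, 21, 22, 23, 24, 29]  # hypothetical sample
for k in (2, 3):
    print(k, share_within(ages, k), ">=", round(chebyshev_bound(k), 3))
# 2 0.9 >= 0.75
# 3 1.0 >= 0.889

# Normal-curve ratios: QD / SD and MAD / SD.
z = statistics.NormalDist()  # standard normal, mean 0 and SD 1
print(round(z.inv_cdf(0.75), 4), round(math.sqrt(2 / math.pi), 4))  # 0.6745 0.7979
```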



Skewness
• Refers to the asymmetry in the shape of the distribution

• Important to test skewness in data analysis, as skewed data suggest that the assumption of normality is violated
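The deck does not specify a formula for skewness. The following is a minimal sketch that assumes the moment coefficient of skewness, $m_3 / m_2^{3/2}$, where $m_r$ is the $r$-th central moment. Under this measure, 0 means symmetric data and a positive value means a longer right tail. The function names and the samples are hypothetical.

```python
import statistics


def central_moment(data, r):
    """r-th central moment: mean of (x - mean) ** r."""
    mean = statistics.mean(data)
    return sum((x - mean) ** r for x in data) / len(data)


def skewness(data):
    """Moment coefficient of skewness m3 / m2 ** 1.5."""
    m2 = central_moment(data, 2)
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return central_moment(data, 3) / m2 ** 1.5


print(skewness([1, 2, 3, 4, 5]))  # 0.0 (symmetric)

ages = [19, 20, 20, 21, 21, 21, 22, 23, 24, 29]  # hypothetical sample
print(round(skewness(ages), 2))  # 1.52 (long right tail caused by the value 29)
```

Statistical packages often report a bias-adjusted version of this coefficient, so their numbers can differ slightly on small samples.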



Kurtosis
• Kurtosis means “Bulginess”
• Refers to the degree of flatness or peakedness in the region about the mode of the distribution:
– Leptokurtic: If the curve is more peaked than the Normal Curve
– Mesokurtic: If the curve is as peaked as the Normal Curve
– Platykurtic: If the curve is less peaked than the Normal Curve

• The kurtosis ($\beta_2$) of the Normal Curve is taken as 3; $\beta_2 > 3$ indicates a leptokurtic curve and $\beta_2 < 3$ a platykurtic curve


• Presence of Kurtosis does not violate normality
• Important to check Kurtosis because it shows the
distribution of data around the mode
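The following is a minimal sketch of the kurtosis measure. It assumes the moment coefficient $\beta_2 = m_4 / m_2^2$, which equals 3 for a normal curve. The tolerance used to call a curve "mesokurtic" (0.1) is an arbitrary illustrative choice and not a standard value. The function names and the samples are hypothetical.

```python
import statistics


def central_moment(data, r):
    """r-th central moment: mean of (x - mean) ** r."""
    mean = statistics.mean(data)
    return sum((x - mean) ** r for x in data) / len(data)


def kurtosis(data):
    """Moment coefficient of kurtosis beta_2 = m4 / m2 ** 2 (3 for a normal curve)."""
    m2 = central_moment(data, 2)
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return central_moment(data, 4) / m2 ** 2


def kurtosis_type(data, tolerance=0.1):
    """Classify against the normal value of 3; the tolerance is an assumption."""
    b2 = kurtosis(data)
    if abs(b2 - 3) <= tolerance:
        return "mesokurtic"
    return "leptokurtic" if b2 > 3 else "platykurtic"


print(kurtosis([1, 2, 3, 4, 5]), kurtosis_type([1, 2, 3, 4, 5]))  # 1.7 platykurtic

ages = [19, 20, 20, 21, 21, 21, 22, 23, 24, 29]  # hypothetical sample
print(round(kurtosis(ages), 2), kurtosis_type(ages))  # 4.63 leptokurtic
```

Some software reports "excess kurtosis" ($\beta_2 - 3$), so a normal curve would show 0 rather than 3 there.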



What is Descriptive Statistics?
• The following need to be reported:
– Arithmetic Mean
– Median
– Mode
– Standard Deviation
– Variance
– Kurtosis
– Skewness
– Range
– Minimum
– Maximum
– Sum
– Count
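The following is a minimal sketch that gathers the listed statistics into one report, using a hypothetical function `describe` and a hypothetical sample. It uses population (divide-by-n) SD and variance and the moment coefficients of skewness and kurtosis. Statistical packages and spreadsheets often report sample (n - 1) versions and bias-adjusted skewness and kurtosis instead, so their figures may differ slightly.

```python
import statistics


def describe(data):
    """Collect the descriptive statistics listed on the slide into a dict."""
    data = list(data)
    n = len(data)
    mean = statistics.mean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n
    if m2 == 0:
        raise ValueError("need at least two distinct values")
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return {
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.multimode(data),  # may hold several values
        "standard deviation": statistics.pstdev(data),
        "variance": statistics.pvariance(data),
        "kurtosis": m4 / m2 ** 2,  # beta_2; 3 for a normal curve
        "skewness": m3 / m2 ** 1.5,
        "range": max(data) - min(data),
        "minimum": min(data),
        "maximum": max(data),
        "sum": sum(data),
        "count": n,
    }


ages = [19, 20, 20, 21, 21, 21, 22, 23, 24, 29]  # hypothetical sample
for name, value in describe(ages).items():
    print(f"{name:>18}: {value}")
```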
