Understanding Data: Dr. Rohit Vishal Kumar

This document discusses different types of data and how they are classified. It begins by defining data as observations of variables and notes that data can be classified as primary or secondary based on its source. It then examines various statistical classifications of data, including categorical vs measurement data, and nominal, ordinal, interval, and ratio scales. The document also reviews descriptive statistics measures like measures of central tendency (mean, median, mode) and measures of dispersion (range, quartile deviation, mean absolute deviation, standard deviation).

UNDERSTANDING DATA

Dr. Rohit Vishal Kumar

Reader, Department of Marketing
Xavier Institute of Social Service
PO Box No 7, Purulia Road
Ranchi – 834001, Jharkhand, India
Email: [email protected]

(C) Rohit Vishal Kumar Presented at WBUT 25-May-09 1

What is Data?
• Observations of a set of variables
• Lowest level of abstraction from which information is derived

• Each discipline has evolved its own method of classifying data

• Two Broad Classifications of Data Based on Source

– Primary Data:
• Data Collected from Primary Source
– Secondary Data:
• Data Collected From Secondary Source

Classification :: Statistics
• Categorical Data
– The Objects are grouped into categories based on some Qualitative Trait
– The resultant data are merely labels or categories
– Example:
• Hair Color: Brown / Black / Red
• Attitude to Smoking: Favor / Neutral / Against
• Measurement Data
– The Objects are “measured” on some Quantitative Trait
– The resultant data is a set of numbers
– Example:
• Age of the Students
• JEMAT Score
• Number of Students Not Attending Class

Categorical Data
• Nominal Data
– A type of categorical data in which numbers act as a label without having
any specific meaning
– Example:
• Male : 1
• Female: 2
• Ordinal Data
– A type of categorical data in which numbers act as a guide to the level of
importance (rank order) of the object
– Example:
• Mild
• Moderate
• Severe
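
The difference between nominal and ordinal codes can be sketched in a few lines of Python. This is only an illustration: the names `GENDER_CODES`, `SEVERITY_ORDER` and `severity_rank`, and the patient list, are hypothetical and not part of the slides.

```python
# Nominal: the numbers are labels only; comparing or averaging them is meaningless.
GENDER_CODES = {"Male": 1, "Female": 2}

# Ordinal: the position in the list carries rank information.
SEVERITY_ORDER = ["Mild", "Moderate", "Severe"]


def severity_rank(level: str) -> int:
    """Return the rank of an ordinal severity level (0 = lowest)."""
    return SEVERITY_ORDER.index(level)


patients = ["Severe", "Mild", "Moderate", "Mild"]  # hypothetical observations

# Sorting by rank is meaningful for ordinal data.
sorted_levels = sorted(patients, key=severity_rank)
print(sorted_levels)
```

Sorting works for the severity levels because they have an order; the gender codes could be swapped (Male: 2, Female: 1) without losing any information.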

Measurement Data
• Discrete Data
– Only Certain Values are Possible
– There are gaps between the possible values
– Are generated through the process of Counting
– Example:
• Number of students in the class
• Number of Employees Absent from Work
• Continuous Data
– Any value within an interval is possible with a suitable measuring device
– Theoretically, the number can be accurate to any desired number of
decimal places
– Are generated through the process of Measurement
– Example:
• Height in cm
• Time to complete the assignment

Classification :: Scaling Theory
• Nominal Data (Order: No, Distance: No, Origin: No)
– A type of categorical data in which numbers act as a label without having
any specific meaning
– Example:
• Male : 1
• Female: 2
• Ordinal Data (Order: Yes, Distance: No, Origin: No)
– A type of categorical data in which numbers act as a guide to the level of
importance (rank order) of the object
– Example:
• Mild
• Moderate
• Severe

Classification :: Scaling Theory
• Interval Data (Order: Yes, Distance: Yes, Origin: No)
– Quantitative data that does not have a true zero point
– Allows comparisons within the scale, but not with values measured on a different scale
– Used in social research, although many researchers are not clear about the interval
scale
– Example:
• Definitely Will Buy / Probably Will Buy / May or May Not Buy / Probably Will Not
Buy / Definitely Will Not Buy
• Ratio Data (Order: Yes, Distance: Yes, Origin: Yes)
– Quantitative data that has a true zero point
– Allows conversion to another scale while preserving the magnitude
– Example:
• Distance in km

Why understand Data?
• The type of analysis depends on the type of data you have collected
• The general guideline is as follows (each scale also permits the techniques listed for the scales before it):

– Nominal Data: Mode, Chi-Square

– Ordinal Data: + Median / Percentiles

– Interval Data: + Mean / SD / Correlation / Regression / ANOVA

– Ratio Data: + Geometric Mean / Harmonic Mean / Coefficient of Variation / Logarithms
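
The cumulative nature of this guideline can be sketched as a small lookup in Python. This is a minimal illustration, assuming the "+" on the slide means "everything permitted for the lower scales, plus these"; the names `SCALES`, `ADDED_BY_SCALE` and `permissible_statistics` are hypothetical.

```python
# Scales ordered from weakest to strongest measurement level.
SCALES = ["nominal", "ordinal", "interval", "ratio"]

# Techniques that each scale adds on top of the scales before it.
ADDED_BY_SCALE = {
    "nominal": ["mode", "chi-square"],
    "ordinal": ["median", "percentiles"],
    "interval": ["mean", "sd", "correlation", "regression", "anova"],
    "ratio": ["geometric mean", "harmonic mean",
              "coefficient of variation", "logarithms"],
}


def permissible_statistics(scale: str) -> list[str]:
    """Return every technique allowed for `scale`, including inherited ones."""
    level = SCALES.index(scale.lower())
    allowed = []
    for name in SCALES[: level + 1]:
        allowed.extend(ADDED_BY_SCALE[name])
    return allowed


print(permissible_statistics("ordinal"))
# ['mode', 'chi-square', 'median', 'percentiles']
```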

Some Points to Remember
• Tend to use Interval Scales
• Data need not be comparable with other studies
• Data has to make sense in your context
• Students fail to understand the importance of Data
– Wrong Approach
• “I have collected the data… now what do I do?”
– Right Approach
• “What data do I need? Why do I need it? Where will I get it? How will I
analyse it to get the answer?”

Descriptive Statistics :: A Quick Review

Measures of Central Tendency
• Central tendency is “loosely” defined as the concept of
location of the center of a distribution of data
• Three basic measures
– Arithmetic Mean
– Median
– Mode
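
As a quick illustration, the three measures can be computed with Python's standard `statistics` module. The scores below are made-up values, chosen so that one large observation pulls the mean away from the median.

```python
import statistics

scores = [52, 55, 55, 58, 60, 61, 95]  # hypothetical exam scores with one outlier

mean_score = statistics.mean(scores)      # uses every value, pulled up by 95
median_score = statistics.median(scores)  # middle value of the sorted data
mode_score = statistics.mode(scores)      # most frequent value

print(mean_score, median_score, mode_score)
```

Here the median is 58 and the mode is 55, while the mean is a little above 62 because of the single score of 95.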

Arithmetic Mean
• Advantages:
– Easy to Compute
– Affected by every value in the set of observations
– Defined by rigid mathematical formulation
– It is relatively reliable
– It represents the “center of gravity” of the data
• Disadvantages:
– Unduly affected by extremely small and / or large values
– Cannot be calculated for data with open-ended classes
– Is a good measure only when the distribution is fairly symmetric

Median
• Advantages
– Refers to the “Middle Value” of the distribution
– It is a “positional measure”
– Useful in case of open ended class
– Not seriously affected by Extreme Values
– Most appropriate for dealing with Qualitative Rank Data
– Has a series of related positional measures like Quartiles, Deciles,
Percentiles
• Disadvantages:
– It does not take every value into consideration
– It is not capable of algebraic treatment
– It is erratic if the number of items is small

Mode
• Advantages:
– It is the most typical or representative value of a distribution
– Not unduly affected by extreme values
– It can be used to describe qualitative phenomena
• Disadvantages:
– A distribution may have no mode, or may have more than one
mode
– Not capable of algebraic treatment
– It is not rigidly defined for calculation
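
Two of these points (the mode works for qualitative data, and there may be more than one mode) can be seen with `statistics.multimode` from the Python standard library. The hair-colour observations below are hypothetical.

```python
import statistics

# Hypothetical categorical observations: "Brown" and "Black" both occur twice.
hair_colors = ["Brown", "Black", "Black", "Red", "Brown"]

# multimode returns every value that shares the highest frequency,
# in the order in which the values first appear.
modes = statistics.multimode(hair_colors)
print(modes)  # ['Brown', 'Black']
```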

Relation Between the 3 Measures
• In a moderately skewed distribution, approximately:
Mode = 3 Median – 2 Mean
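
The relation can be turned into a one-line estimator. This is only a sketch of the empirical rule quoted on the slide; the function name `empirical_mode` and the input values are hypothetical, and the result is an approximation, not an exact identity.

```python
def empirical_mode(mean: float, median: float) -> float:
    """Estimate the mode from the relation Mode = 3 Median - 2 Mean."""
    return 3 * median - 2 * mean


# Hypothetical summary values for a moderately skewed distribution.
print(empirical_mode(mean=50.0, median=52.0))  # 3*52 - 2*50 = 56.0
```

For a perfectly symmetric distribution the mean and median coincide, and the estimate collapses to the same value.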

Measures of Dispersion
• Dispersion is defined as the degree to which data tends to
spread about a central value
• Four Absolute Measures and their Relative Counterparts
– Range: Coefficient of Range
– Quartile Deviation (QD): Coefficient of Quartile Deviation
– Mean Absolute Deviation (MAD): Coefficient of MAD
– Standard Deviation (SD): Coefficient of Variation

• Range and QD are positional measures of dispersion

• MAD and SD are calculated measures of dispersion

Range
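• Range = Largest Value – Smallest Value

• Coefficient of Range = (Largest Value – Smallest Value) / (Largest Value + Smallest Value)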

• Advantages
– Simplest to understand and compute
• Disadvantages:
– Not based on each and every item in the data
– Does not take into account the shape of distribution
– Cannot be computed in case of open ended classes
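
A minimal sketch of both quantities, assuming the usual textbook definitions (range = largest value minus smallest value; coefficient of range = that difference divided by the sum of the two extremes). The function names and data are hypothetical.

```python
def value_range(data):
    """Absolute measure: largest value minus smallest value."""
    return max(data) - min(data)


def coefficient_of_range(data):
    """Relative measure: (largest - smallest) / (largest + smallest)."""
    largest, smallest = max(data), min(data)
    return (largest - smallest) / (largest + smallest)


observations = [12, 15, 20, 28, 40]  # hypothetical data

print(value_range(observations))           # 40 - 12 = 28
print(coefficient_of_range(observations))  # 28 / 52
```

Only the two extreme values enter the calculation, which is why the range ignores the shape of the distribution.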

Quartile Deviation
• Inter Quartile Range (IQR) = Q3 – Q1

• Quartile Deviation (Semi IQR) = (Q3 – Q1) / 2

• Coefficient of QD = (Q3 – Q1) / (Q3 + Q1)
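
The three quartile-based measures can be sketched with `statistics.quantiles`, assuming the usual definitions IQR = Q3 – Q1, QD = IQR / 2 and Coefficient of QD = (Q3 – Q1) / (Q3 + Q1). The data are hypothetical, and different quartile conventions (the `method` argument of `quantiles`) can give slightly different values.

```python
import statistics

data = [4, 7, 9, 12, 15, 18, 21, 25, 30, 34, 40]  # hypothetical observations

# With n=4, quantiles returns the three cut points Q1, Q2 (median) and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)

iqr = q3 - q1                           # inter quartile range
quartile_deviation = iqr / 2            # semi inter quartile range
coefficient_of_qd = (q3 - q1) / (q3 + q1)

print(q1, q3, iqr, quartile_deviation, coefficient_of_qd)
```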

Quartile Deviation
• Advantages:
– Can measure variation in open ended distributions
– It is extremely useful in case of erratic or badly skewed data
– It is not affected by extreme values
• Disadvantages:
– Ignores 50% of the data
– Is not capable of mathematical manipulation
– Is sometimes not considered a true measure of dispersion:
• It effectively shows only the distance between two positional points

Mean Absolute Deviation
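• Mean Absolute Deviation (MAD) defined as:
MAD = Σ |x – A| / n, where A is the Mean or the Median and n is the number of observations

• Coefficient of MAD defined as:
Coefficient of MAD = MAD / Median or MAD / Mean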
• Advantages:
– Simple to understand and compute
– Based on each and every item in the data
– Less affected by extreme values than other measures
• Disadvantage:
– It is not capable of mathematical treatment
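
A short sketch of the calculation, assuming the usual definition of MAD as the average absolute distance of each observation from a chosen centre (the mean or the median). The function name and data are hypothetical.

```python
import statistics


def mean_absolute_deviation(data, center):
    """Average absolute distance of each observation from `center`."""
    return sum(abs(x - center) for x in data) / len(data)


data = [2, 4, 6, 8, 10]  # hypothetical observations

mean = statistics.mean(data)
mad_about_mean = mean_absolute_deviation(data, mean)

# Relative measure from the slide: MAD divided by the centre that was used.
coefficient_of_mad = mad_about_mean / mean

print(mad_about_mean, coefficient_of_mad)  # 2.4 and 0.4
```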

Standard Deviation
• Defined as “Root Mean Squared Deviation from Mean”:
SD = √( Σ (x – Mean)² / n )

• Coefficient of Variation = (SD / Mean) × 100
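
The "root mean squared deviation from mean" corresponds to the population standard deviation, which the standard library exposes as `statistics.pstdev`. The sketch below assumes the common definition of the coefficient of variation as SD divided by the mean, expressed as a percentage; the data are hypothetical.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean = 5

mean = statistics.mean(data)

# Population SD: square root of the mean of squared deviations from the mean.
sd = statistics.pstdev(data)

# Relative measure: SD as a percentage of the mean.
coefficient_of_variation = sd / mean * 100

print(sd, coefficient_of_variation)  # 2.0 and 40.0
```

Note that `statistics.stdev` (dividing by n – 1) is the sample version and would give a slightly larger value for the same data.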

Standard Deviation
• Advantages:
– Best Measure of Dispersion
– Possible to calculate the combined standard deviation of two or
more groups
– Chebyshev’s Theorem (P. L. Chebyshev, 1821-1894)
• Whatever the distribution, at least 75% of the values will fall
within +/- 2 SD of the mean and at least 88.9% (about 89%) will
fall within +/- 3 SD of the mean
– Has an approximate relation with other measures (for a roughly normal distribution):
• QD ≈ 0.667 SD
• MAD ≈ 0.80 SD
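
Chebyshev's bound can be checked on any data set. The sketch below compares the guaranteed minimum share, 1 – 1/k², with the share actually observed within k standard deviations of the mean; the function names and the deliberately skewed data are hypothetical.

```python
import statistics


def chebyshev_lower_bound(k: float) -> float:
    """Minimum share of values within k SD of the mean, for any distribution."""
    return 1 - 1 / k**2


def share_within(data, k: float) -> float:
    """Observed share of values within k population SDs of the mean."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)
    inside = [x for x in data if abs(x - mu) <= k * sigma]
    return len(inside) / len(data)


skewed = [1, 1, 1, 2, 2, 3, 4, 5, 9, 30]  # hypothetical, strongly skewed

for k in (2, 3):
    print(k, chebyshev_lower_bound(k), share_within(skewed, k))
```

The observed share is never below the bound, although for most real data it is considerably higher.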

Skewness
• Refers to the asymmetry in the shape of the distribution

• It is important to test for skewness in data analysis, as skewed data suggest that the assumption of normality is violated
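
The slides do not give a formula for skewness. One common choice, assumed here, is the moment coefficient of skewness: the third central moment divided by the 1.5th power of the second central moment. The function name and data are hypothetical.

```python
def skewness(data):
    """Moment coefficient of skewness: m3 / m2**1.5 (population form)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]  # long tail towards large values

print(skewness(symmetric))     # 0.0 for perfectly symmetric data
print(skewness(right_skewed))  # positive for a right-hand tail
```

A value near zero is consistent with symmetry; a positive value indicates a longer right tail and a negative value a longer left tail.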

Kurtosis
• Kurtosis means “Bulginess”
• Refers to the degree of flatness or peaked-ness in the
region about the mode of the distribution:
– Lepto-Kurtic : If the curve is more peaked than Normal Curve
– Meso-Kurtic : If the curve is the same as the Normal Curve
– Platy-Kurtic : If the curve is less peaked than Normal Curve

• The kurtosis of the Normal Curve is taken as 3

• Kurtosis markedly different from 3 indicates a departure from the Normal Curve, even when the distribution is symmetric
• Important to check Kurtosis because it shows the
distribution of data around the mode
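
The three labels can be illustrated with the moment coefficient of kurtosis (fourth central moment divided by the squared second central moment), for which the Normal Curve gives 3. The formula choice, the function names and the tolerance used to decide "close to 3" are assumptions made for this sketch.

```python
def kurtosis(data):
    """Moment coefficient of kurtosis: m4 / m2**2 (Normal Curve = 3)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2


def classify_kurtosis(value, tolerance=0.1):
    """Label a kurtosis value relative to the Normal Curve's value of 3."""
    if abs(value - 3) <= tolerance:  # tolerance is an arbitrary choice
        return "mesokurtic"
    return "leptokurtic" if value > 3 else "platykurtic"


flat = [1, 2, 3, 4, 5]  # evenly spread values, flatter than the Normal Curve
k = kurtosis(flat)
print(k, classify_kurtosis(k))  # 1.7 platykurtic
```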

What is Descriptive Statistics?
• The following need to be reported:
– Arithmetic Mean
– Median
– Mode
– Standard Deviation
– Variance
– Kurtosis
– Skewness
– Range
– Minimum
– Maximum
– Sum
– Count
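
The whole checklist can be produced by one small helper. This is a sketch only: the function name `describe` is hypothetical, population (divide by n) formulas are assumed for the variance and standard deviation, and moment coefficients are assumed for skewness and kurtosis, since the slides do not fix these choices.

```python
import statistics


def describe(data):
    """Return the twelve summary figures listed above as a dictionary."""
    n = len(data)
    mean = statistics.mean(data)
    # Central moments about the mean (population form, dividing by n).
    # Note: constant data gives m2 == 0, so skewness and kurtosis are undefined.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return {
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "standard_deviation": m2 ** 0.5,
        "variance": m2,
        "kurtosis": m4 / m2 ** 2,
        "skewness": m3 / m2 ** 1.5,
        "range": max(data) - min(data),
        "minimum": min(data),
        "maximum": max(data),
        "sum": sum(data),
        "count": n,
    }


report = describe([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical observations
for name, value in report.items():
    print(f"{name}: {value}")
```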
